Multi-instance scheduling algorithm
Jobs are subprocesses carried out in tool executions (tasks). Each job has different requirements in terms of CPU and memory, and so will have a particular class of suitable computation instances on which it can be run. Information about each job's requirements is inherited from the tool description.
By default, CAVATICA schedules jobs so as to minimize the number of instances used, fitting as many jobs as possible onto each instance that has sufficient resources to execute them. A sketch of the scheduling algorithm is reproduced below.
You may override the inherited requirements for a job by specifying the instance type that you want it to run on, using one of the methods documented in Set computation instances.
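```python
def schedule(instances, jobs):
    # Order jobs by the cost of the instances they require.
    unscheduled_jobs = prioritize_jobs(jobs)

    while unscheduled_jobs:
        # Pack as many jobs as possible onto the instances already allocated.
        unscheduled_jobs = fit_jobs(instances, unscheduled_jobs)
        # Allocate one new instance for the first job that did not fit.
        instances += allocate_new_instances(unscheduled_jobs)

    # Release any instances that ended up with no jobs.
    release_empty_instances(instances)
    return instances
```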
In the algorithm above:
- `jobs` is a list of jobs to be scheduled. Each job in `jobs` has associated CPU and memory requirements.
- `instances` is a list of instances allocated for the task, with a reference to the amount of remaining resources each one has at that time.
The requirements of a job may not be known until the task starts, or even until it is running. For instance, a task may run parallel jobs (one per instance) for each contig; in this case, the number of instances required depends on the input to the task, and that input may itself be the output of a previous node in a workflow.
- `prioritize_jobs` is a function that orders a list of jobs by the cost of the instances each one requires, given its required CPU and memory resources.
- `fit_jobs` is a function with two nested loops: an outer loop over each job in `prioritize_jobs(jobs)` and an inner loop over each instance in `instances`. It aims to fit each job to the first suitable instance in `instances`. Since `instances` are ordered so that instances with fewer available resources come before those with more, `fit_jobs` packs jobs densely onto instances.
A job ‘fits’ on an instance if:
- The instance has at least as much available CPU and memory as the job's requirements specify.
- The tool used in the job does not have a different instance type specified for it, by any of the methods documented in Set computation instances. Tools for which you have specified an instance type will always be fitted to the chosen instance type.
- `allocate_new_instances` is a function that allocates a single instance per iteration of the `while` loop over `unscheduled_jobs`. It takes the first unallocated job from the list `prioritize_jobs(jobs)` and allocates it one of the following instances:
- if no `InstanceType` hint applies, the cheapest instance for which `instance_cpu >= job_cpu` and `instance_ram >= job_ram`, chosen from a list of instance types in `instances`.
- if an `InstanceType` hint has been set, using one of the methods documented in Set computation instances, the instance specified by that hint. If this instance doesn't satisfy `instance_cpu >= job_cpu` and `instance_ram >= job_ram`, an error is raised.
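The functions described above can be put together into a runnable sketch. It is a minimal illustration under stated assumptions, not CAVATICA's implementation: the `Job`, `Instance` and `InstanceType` classes, the `small`/`medium`/`large` instance catalogue and its prices, and the `instance_type_hint` field (standing in for the `InstanceType` hint) are all hypothetical. It also assumes that `prioritize_jobs` puts jobs needing the most expensive instances first, and that "fewer available resources" means less free CPU, with free memory as a tie-breaker; the description above specifies neither. Rather than keeping `instances` permanently ordered, this sketch re-sorts them before placing each job.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical data model and instance catalogue, for illustration only;
# these are not CAVATICA's real instance types, prices, or API.
@dataclass(frozen=True)
class InstanceType:
    name: str
    cpu: int
    ram_gib: float
    hourly_cost: float

@dataclass
class Job:
    name: str
    cpu: int
    ram_gib: float
    instance_type_hint: Optional[str] = None  # stands in for the InstanceType hint

@dataclass
class Instance:
    itype: InstanceType
    free_cpu: int
    free_ram_gib: float
    jobs: List[Job] = field(default_factory=list)

INSTANCE_TYPES = [
    InstanceType("small", 2, 8, 0.10),
    InstanceType("medium", 8, 32, 0.40),
    InstanceType("large", 36, 72, 1.50),
]
TYPES_BY_NAME = {t.name: t for t in INSTANCE_TYPES}

def satisfies(itype: InstanceType, job: Job) -> bool:
    return itype.cpu >= job.cpu and itype.ram_gib >= job.ram_gib

def required_type(job: Job) -> InstanceType:
    """Hinted type if set (error if too small), else the cheapest sufficient type."""
    if job.instance_type_hint is not None:
        itype = TYPES_BY_NAME[job.instance_type_hint]
        if not satisfies(itype, job):
            raise ValueError(f"{itype.name} is too small for job {job.name}")
        return itype
    candidates = [t for t in INSTANCE_TYPES if satisfies(t, job)]
    if not candidates:
        raise ValueError(f"no instance type can run job {job.name}")
    return min(candidates, key=lambda t: t.hourly_cost)

def prioritize_jobs(jobs: List[Job]) -> List[Job]:
    # Assumption: jobs needing the most expensive instances come first.
    return sorted(jobs, key=lambda j: required_type(j).hourly_cost, reverse=True)

def fits(job: Job, inst: Instance) -> bool:
    # A hinted job may only run on an instance of its hinted type.
    if job.instance_type_hint not in (None, inst.itype.name):
        return False
    return inst.free_cpu >= job.cpu and inst.free_ram_gib >= job.ram_gib

def fit_jobs(instances: List[Instance], jobs: List[Job]) -> List[Job]:
    unscheduled = []
    for job in jobs:
        # Try the fullest instances first (least free CPU, then least free memory).
        for inst in sorted(instances, key=lambda i: (i.free_cpu, i.free_ram_gib)):
            if fits(job, inst):
                inst.free_cpu -= job.cpu
                inst.free_ram_gib -= job.ram_gib
                inst.jobs.append(job)
                break
        else:
            unscheduled.append(job)  # no existing instance could take this job
    return unscheduled

def allocate_new_instances(unscheduled_jobs: List[Job]) -> List[Instance]:
    # One new, empty instance per call, sized for the first unscheduled job.
    if not unscheduled_jobs:
        return []
    itype = required_type(unscheduled_jobs[0])
    return [Instance(itype, itype.cpu, itype.ram_gib)]

def release_empty_instances(instances: List[Instance]) -> None:
    instances[:] = [i for i in instances if i.jobs]

def schedule(instances: List[Instance], jobs: List[Job]) -> List[Instance]:
    unscheduled_jobs = prioritize_jobs(jobs)
    while unscheduled_jobs:
        unscheduled_jobs = fit_jobs(instances, unscheduled_jobs)
        instances += allocate_new_instances(unscheduled_jobs)
    release_empty_instances(instances)
    return instances
```

For example, scheduling the jobs `align` (8 CPUs, 30 GiB), `sort` (4, 12), `qc` (2, 4) and `index` (2, 4) from an empty pool yields two `medium` instances: `align` fills the first, and `sort`, `qc` and `index` are packed together onto the second. Each loop iteration is guaranteed to place at least the first unscheduled job, because the instance just allocated for it is large enough, so the loop terminates.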
Set a tool's required resources
If you install your own tool on CAVATICA using the SDK, you can set its required CPU and memory in the tool editor. Required resources can also be set using dynamic expressions: for instance, a tool's required memory may be twice the size of its input.
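The arithmetic behind a dynamic requirement such as "memory = 2x input size" can be sketched as a small function. This only mirrors the calculation in Python; it is not the syntax the tool editor accepts, and the function name `required_memory_mib` and its `factor` parameter are hypothetical.

```python
import math

BYTES_PER_MIB = 1024 ** 2

def required_memory_mib(input_size_bytes: int, factor: float = 2.0) -> int:
    """Hypothetical dynamic requirement: memory = factor x input size, in whole MiB."""
    # Round up so the requirement never falls below factor x input size.
    return math.ceil(factor * input_size_bytes / BYTES_PER_MIB)
```

For a 3 GiB input, `required_memory_mib(3 * 1024 ** 3)` returns 6144 (6 GiB). Because the value is computed from the input, the scheduler only learns it once that input is known.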
Setting a tool's required resources means that it will always be allocated an instance that is sufficient for it to run. However, if you opt to override the default instance type by specifying a different one, using one of the methods documented in Set computation instances, you may end up providing your tool with insufficient CPU or memory.
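Before overriding the default by pinning a tool to a particular instance type, it can help to compare that type's CPU and memory against the tool's requirements. The sketch below does this check under assumed values: the `small`/`medium` instance types and their sizes, and the `shortfalls` helper, are hypothetical and do not reflect CAVATICA's actual instance catalogue.

```python
from typing import Dict, List, Tuple

# Hypothetical instance types: (vCPUs, memory in MiB). Not CAVATICA's real catalogue.
INSTANCE_TYPES: Dict[str, Tuple[int, int]] = {
    "small": (2, 8 * 1024),
    "medium": (8, 32 * 1024),
}

def shortfalls(instance_type: str, cpu: int, memory_mib: int) -> List[str]:
    """List the resources a chosen instance type lacks for a tool's requirements."""
    type_cpu, type_mem = INSTANCE_TYPES[instance_type]
    problems = []
    if type_cpu < cpu:
        problems.append(f"CPU: needs {cpu}, {instance_type} has {type_cpu}")
    if type_mem < memory_mib:
        problems.append(f"memory: needs {memory_mib} MiB, {instance_type} has {type_mem} MiB")
    return problems  # an empty list means the instance type is sufficient
```

For example, `shortfalls("small", 4, 6144)` reports only a CPU shortfall, because the hypothetical `small` type has 8192 MiB of memory but only 2 vCPUs.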
