Set computation instances
Each tool that is run in a task is executed on a computation instance in the cloud. Instances are virtual computers; different instance types have different allocations of CPU and memory, so they are suited to workloads with different computational requirements.
CAVATICA uses a scheduling algorithm to select an appropriate computation instance for each tool that is run in a task. The algorithm assigns each tool an instance that has sufficient resources to run it and, for workflows made of multiple tools, is optimized to pack tools onto instances efficiently.
While the scheduling algorithm will select a default instance that is suitable for your task, in some cases you might want to override the algorithm to select a specific instance type to run the task on. This page explains how to set the instance type for a task.
The scheduling algorithm
To see how tool executions are fitted onto instances, take a look at the scheduling algorithm we use to allocate instances to tasks.
Available instance types
As a CAVATICA user, you have access to the AWS US East and Google Cloud Platform West cloud infrastructure.
To choose an instance type, specify it as the value of the sbg:AWSInstanceType or sbg:GoogleInstanceType hint.
Determining instance types
All public tools and workflows have defined requirements for CPU and memory. These are used by the scheduler on CAVATICA to pick a suitable computation instance for the app to be run on. You can override this selection in a number of ways:
- You can set the instance type for an entire workflow. This will override any setting that you have made for any given tool in the workflow.
- You can set the instance type for any tool (either one you have added to CAVATICA yourself, using the SDK, or a public tool) using the Tool Editor. This will override the instance type selected by the scheduler.
- You can set the instance type for any tool(s) in a workflow. This will override any setting you have made on the tool editor.
- You can also set the instance type for a task. This will override any setting you have made at the workflow level, following the priority order task > workflow > node.
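The override order described in the list above can be sketched as a small lookup. This is only an illustration of the stated priority (task > workflow > node, with the Tool Editor setting and the scheduler's choice below those); the function name `resolve_instance_type`, the level names, and the instance type strings are hypothetical and are not part of CAVATICA.

```python
# Levels from highest to lowest priority, following the list above:
# task > workflow > node (tool in a workflow) > tool (Tool Editor setting).
PRECEDENCE = ("task", "workflow", "node", "tool")


def resolve_instance_type(settings, scheduler_default):
    """Return (instance_type, source) for the highest-priority level that is set.

    `settings` maps a level name to the instance type chosen at that level.
    Levels that are missing or empty are skipped. If nothing is set, the
    scheduler's own choice is used.
    """
    for level in PRECEDENCE:
        value = settings.get(level)
        if value:
            return value, level
    return scheduler_default, "scheduler"


if __name__ == "__main__":
    # Hypothetical example: a node-level hint and a workflow-level hint are
    # both set, so the workflow-level value wins.
    chosen = resolve_instance_type(
        {"node": "c4.8xlarge", "workflow": "c4.2xlarge"},
        scheduler_default="c4.2xlarge",
    )
    print(chosen)
```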
Choose an instance type that is sufficient for your task
If you override the instance type that the scheduling algorithm selects on the basis of the app's required resources, and instead pick your own instance, you may inadvertently select one that doesn't have enough resources to run the app successfully. To make sure you pick a suitable instance, check the required resources of the tool you want to use. To do this, open the tool in the Tool Editor, by clicking the Edit button. Note that you can only edit a tool that is in one of your projects.
The Tool Editor contains fields labelled CPU and Memory. These contain the number of CPUs and amount of memory deemed necessary for running the tool by the person who wrapped it.
If you try to set an instance type that fails to meet a tool's required resources, then, wherever possible, you will see a warning notification. However, sometimes a tool's required resources are set dynamically. For instance, the tool may require two times as many CPUs as it has input files, and the number of input files to the tool will depend on the behavior of the tool before it in a pipeline. In this case, it may not be possible to raise an error about insufficient resources before running the app, and you will see an error during its execution.
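As a rough illustration of the check described above, the sketch below compares a tool's required resources with an instance's resources, including the dynamic case in which the CPU requirement is twice the number of input files. The `InstanceSpec` class, the function names, and the instance figures are assumptions made for this example; they do not describe CAVATICA's own validation or any real instance type.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceSpec:
    """Resources offered by an instance type (illustrative values only)."""
    name: str
    cpus: int
    memory_gb: float


def required_cpus(num_input_files):
    """Dynamic requirement from the example above: two CPUs per input file."""
    return 2 * num_input_files


def is_sufficient(instance, cpus_needed, memory_gb_needed):
    """True if the instance meets both the CPU and the memory requirement."""
    return instance.cpus >= cpus_needed and instance.memory_gb >= memory_gb_needed


if __name__ == "__main__":
    # Hypothetical instance with 8 CPUs and 15 GB of memory.
    small = InstanceSpec("example-small", cpus=8, memory_gb=15)

    # With 3 input files the tool needs 6 CPUs, which fits.
    print(is_sufficient(small, required_cpus(3), memory_gb_needed=4))

    # With 5 input files it needs 10 CPUs, which does not fit. Because the
    # file count is only known at run time, this is the kind of shortfall
    # that surfaces during execution rather than as an up-front warning.
    print(is_sufficient(small, required_cpus(5), memory_gb_needed=4))
```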
See the documentation on the Tool Editor for more information on how to describe a tool. In particular, see how to set a tool's required resources. For details of how dynamic expressions may be used, see the documentation on dynamic expressions in tool descriptions.
Running instances in parallel
You can set the maximum number of instances to run in parallel for a workflow. See the instructions below on setting the instance type for an entire workflow for details.
Set the instance type for a workflow
You can set the instance type for an entire workflow. This means that all tools in the workflow run on the selected instance type.
- To set the instance type for a workflow, first add the workflow to a project. Then, on the Apps tab of the project dashboard, click the ellipsis icon next to the workflow and select Edit. The Workflow Editor opens.
- In the top-right corner of the workflow editor click and select Set Hints. You will see the Set Hints popup window.
- Click Add a Hint and enter the following information:
To set the instance type:
- In the Class field select sbg:AWSInstanceType or sbg:GoogleInstanceType, depending on the cloud provider you want to use.
- In the Value field select an instance type from the list of available AWS or GCP instances. In the field on the right, you can set the size of storage attached to the computation instance.
- Click Done. You have successfully set the instance hint.
To set the maximum number of instances to be used in parallel:
- Click Add a Hint to create new empty fields.
- In the Class field select sbg:maxNumberOfParallelInstances.
- In the Value field enter the maximum number of instances to be run in parallel as an integer.
- Click Done. You have successfully set the maximum number of parallel instances.
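The two hints set in the steps above can be pictured as a pair of class/value records. The sketch below builds them as Python dictionaries and prints them as JSON. The hint class names come from this page; the helper functions, the dictionary layout (mirroring the Class and Value fields of the Set Hints window), and the example instance type are assumptions for illustration, not a description of how CAVATICA stores hints.

```python
import json

# Hint classes named on this page, keyed by a hypothetical provider label.
INSTANCE_HINT_CLASSES = {
    "aws": "sbg:AWSInstanceType",
    "gcp": "sbg:GoogleInstanceType",
}


def instance_hint(provider, instance_type):
    """Build a record with the Class and Value of an instance type hint."""
    if provider not in INSTANCE_HINT_CLASSES:
        raise ValueError(f"unknown provider: {provider!r}")
    return {"class": INSTANCE_HINT_CLASSES[provider], "value": instance_type}


def max_parallel_hint(count):
    """Build a record for the maximum number of parallel instances."""
    # The value must be an integer; bool is rejected because it subclasses int.
    if isinstance(count, bool) or not isinstance(count, int) or count < 1:
        raise ValueError("count must be a positive integer")
    return {"class": "sbg:maxNumberOfParallelInstances", "value": count}


if __name__ == "__main__":
    # Example values only; pick the instance type from the list of available
    # AWS or GCP instances.
    hints = [instance_hint("aws", "c4.8xlarge"), max_parallel_hint(4)]
    print(json.dumps(hints, indent=2))
```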
Set the instance type for a tool in a workflow (node)
You can set the instance type(s) of individual tools in a workflow. For instance, you might want to use a smaller, cheaper, instance type for most tools in your workflow, but provide one tool with a more powerful instance.
If you are running the tool on its own and not in a workflow, see the instructions on how to set the instance type for a tool that is not in a workflow.
- Select the workflow that you want to configure. This can be a public workflow (in which case, you must have copied it to a project to edit it), or it can be a workflow you have built yourself.
- Click the ellipsis icon next to the workflow and select Edit. The Workflow Editor opens.
- In the workflow editor, double-click the node representing the tool in the workflow whose instance you want to set. This displays the object inspector on the right. Note that input and output nodes in a workflow don't represent tools, and you can't set their instances.
- In the object inspector, open the Step tab.
- Scroll to the bottom of the tab's contents and click Set Hints. The Set Hints popup window opens.
- Click Add a Hint and enter the following information:
To set the instance type:
- In the Class field select sbg:AWSInstanceType or sbg:GoogleInstanceType, depending on the cloud provider you want to use.
- In the Value field select an instance type from the list of available AWS or GCP instances. In the field on the right, you can set the size of storage attached to the computation instance.
- Click Done. You have successfully set the instance hint.
Set the instance type for a tool
You can set the instance type for a tool in the Tool Editor.
- To set the instance type for a tool, first add it to a project. Then, on the Apps tab of the project dashboard, click the ellipsis icon next to the tool and select Edit. The Tool Editor opens.
- In the Tool Editor scroll down to the Hints section.
- Click Add a Hint and enter the following information:
To set the instance type:
- In the Class field select sbg:AWSInstanceType or sbg:GoogleInstanceType, depending on the cloud provider you want to use.
- In the Value field select an instance type from the list of available AWS or GCP instances. In the field on the right, you can set the size of storage attached to the computation instance.
- Click Done. You have successfully set the instance hint.
Set attached storage size
When setting the sbg:AWSInstanceType or sbg:GoogleInstanceType instance hint, the configuration options are instance type and attached storage size. Attached storage consists of the disks that the computation instance uses as storage capacity during task execution. To set up attached storage, enter the required storage size in the Attached Storage field:
- If you have selected an instance that has its own storage (ephemeral storage), the storage size will be displayed in brackets next to the instance name. However, you can still define a different storage size (from 2 GB to 4096 GB) in the Attached Storage field, in which case CAVATICA will use attached storage instead of the instance's ephemeral storage, and attached storage costs will be added to the cost of running the computation instance. For AWS instances, read more about EBS pricing. For GCP instances, learn more about Persistent disk pricing.
- If you have selected an Amazon EBS-only instance (no storage capacity shown in brackets next to CPU and memory values), you can change attached storage size to any value from 2 GB to 4096 GB in 1 GB increments. Attached storage costs will be added to the compute instance cost, according to the EBS pricing model for AWS and Persistent disk pricing for GCP.
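The storage limits above (2 GB to 4096 GB, in 1 GB increments) lend themselves to a quick sanity check before a value is entered. The following is a minimal sketch of such a check; the function and constant names are hypothetical, and the platform applies its own validation independently of it.

```python
# Limits stated above for the Attached Storage field.
MIN_STORAGE_GB = 2
MAX_STORAGE_GB = 4096


def validate_attached_storage(size_gb):
    """Return size_gb unchanged if it is a whole number of GB within the limits.

    Raises ValueError otherwise. Fractional sizes are rejected because the
    field accepts 1 GB increments.
    """
    if isinstance(size_gb, bool) or not isinstance(size_gb, int):
        raise ValueError("storage size must be a whole number of GB")
    if not MIN_STORAGE_GB <= size_gb <= MAX_STORAGE_GB:
        raise ValueError(
            f"storage size must be between {MIN_STORAGE_GB} and {MAX_STORAGE_GB} GB"
        )
    return size_gb


if __name__ == "__main__":
    print(validate_attached_storage(1024))
```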