Create a new draft task

This call creates a new task. The task can be a single task or a batch task: you can use the app's default batching, override it, or disable batching completely.

A parent task is a task that specifies criteria by which to batch its inputs into a series of further sub-tasks, called child tasks.

See the documentation on batching tasks for more details on batching criteria.

Request

POST https://cavatica-api.sbgenomics.com/v2/tasks

Header fields

Name	Description
X-SBG-Auth-Token required	Your authentication token.

Query parameters

Name	Data type	Description
`fields`	string	Selector specifying a subset of fields to include in the response.
`action`	string	If set to `run`, the task is run immediately upon creation.
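
The header and query parameters above can be combined as in the following minimal sketch, which builds the request without sending it. It uses only the Python standard library. The helper name `build_create_task_request`, the `Content-Type` header, and the use of the POST method (the usual choice for a create call) are assumptions made for illustration, not details confirmed by this page.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://cavatica-api.sbgenomics.com/v2/tasks"


def build_create_task_request(token, body, run_immediately=False, fields=None):
    """Build (but do not send) a request that creates a task.

    Hypothetical helper: the name and signature are not part of the API.
    """
    query = {}
    if fields is not None:
        query["fields"] = fields   # selector for a subset of response fields
    if run_immediately:
        query["action"] = "run"    # run the task immediately upon creation
    url = API_URL
    if query:
        url += "?" + urllib.parse.urlencode(query)
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "X-SBG-Auth-Token": token,           # required header
            "Content-Type": "application/json",  # assumption: JSON request body
        },
        method="POST",                           # assumption: create calls use POST
    )


request = build_create_task_request(
    "<your-token>", {"name": "my task"}, run_immediately=True
)
print(request.full_url)
# Sending is left to the caller, e.g. urllib.request.urlopen(request)
```

Building the request separately from sending it makes it easy to inspect the URL, headers, and body before the task is actually created.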

Request body

The request body should be a JSON object that specifies the app you want to run and assigns input files to its input nodes. It is entered as a set of key-value pairs. The keys specify the name and description of the task to be created, the app to be executed, and details of its input files. The keys and their permitted values are described below.

You can see a list of the app's input nodes on CAVATICA on the Apps page for the project. Specify the files to input to the nodes using the files' IDs, which you can obtain using the call to get files.

Key	Data type	Description
`name`	string	The name of the task.
`description`	string	An optional description of the task.
`project`	string	The short name of the project that you want to create the task in.
`execution_settings`	dictionary	Detailed task execution parameters. Can include the instance type (`instance_type`), the maximum number of parallel instances (`max_parallel_instances`), and the memoization setting (`use_memoization`). • `instance_type`: The specific instance type to use, e.g. `"instance_type": "c4.2xlarge;ebs-gp2;2000"`. • `max_parallel_instances`: Maximum number of instances running at the same time. Takes any integer value equal to or greater than 1, e.g. `"max_parallel_instances": 2`. • `use_memoization`: Set to `false` by default. Set to `true` to enable memoization.
`app`	string	The specification of the app that you want to run. Recall that apps are specified by their projects, in the form `{project_owner}/{project}/{app_name}`.
`inputs`	dictionary	See the section on specifying task inputs for information on creating task input objects.
`output_location`	dictionary	Detailed parameters related to the output location where task outputs will be stored.
`batch`	boolean	This is set to false by default. Set to true to create a batch task and specify the `batch_input` and `batch_by` criteria as described below.
`batch_input`	string	The ID of the input on which you wish to batch. You would typically batch on the input consisting of a list of files. If this parameter is omitted, the default batching criteria defined for the app will be used.
`batch_by`	dictionary	This specifies the criteria on which to batch. It can be in one of two formats. If you wish to batch per item in the app's input (i.e., typically per file in a list of files), specify a dictionary with the following format: `{ "type": "ITEM" }`. If you wish to batch by groups of inputs, specify the criteria satisfied by each group. This should be a common metadata value in one or more metadata fields. To do this, specify a dictionary with the following format: `{ "type": "CRITERIA", "criteria": [ "metadata.<field_1>", "metadata.<field_2>" ] }`. This will group inputs by shared metadata values for `<field_1>` and `<field_2>`, in that order. Arbitrarily many metadata fields may be listed, and the order in which fields are grouped will respect the order of the list.
`use_interruptible_instances`	boolean	This field can be true or false. Set this field to true to allow the use of spot instances.
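
The batching keys in the table above can be assembled as in the sketch below. The helper names (`batch_by_item`, `batch_by_metadata`, `build_task_body`) are hypothetical, the input ID `reads` and the file IDs are placeholders, and the list-of-files input shape simply repeats the single-file object used in the example request body, which is an assumption; see the section on specifying task inputs for the authoritative format.

```python
def batch_by_item():
    """Batch per item in the chosen input (typically per file)."""
    return {"type": "ITEM"}


def batch_by_metadata(*fields):
    """Batch by shared metadata values, grouped in the order given."""
    if not fields:
        raise ValueError("at least one metadata field is required")
    return {"type": "CRITERIA", "criteria": ["metadata." + f for f in fields]}


def build_task_body(name, project, app, inputs,
                    batch=False, batch_input=None, batch_by=None):
    """Assemble a request body; hypothetical helper, not part of the API."""
    body = {"name": name, "project": project, "app": app, "inputs": inputs}
    if batch:
        body["batch"] = True
        # Omitting batch_input leaves the app's default batching criteria in place.
        if batch_input is not None:
            body["batch_input"] = batch_input
        if batch_by is not None:
            body["batch_by"] = batch_by
    return body


body = build_task_body(
    name="batched run",
    project="RFranklin/my-project",
    app="RFranklin/my-project/new-test-app",
    inputs={
        "reads": [  # placeholder input ID and file IDs
            {"class": "File", "path": "<file-id-1>"},
            {"class": "File", "path": "<file-id-2>"},
        ]
    },
    batch=True,
    batch_input="reads",
    batch_by=batch_by_metadata("library_id", "sample_id"),
)
print(body["batch_by"])
```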

Output location

The `output_location` dictionary allows you to define the exact location where your task outputs will be stored. The location can be defined either for the entire task, using the `main_location` parameter, or individually for each output node, by setting the `nodes_override` parameter to `true` and defining individual output node locations within `nodes_location`. See the table below for more details.

Key	Data type	Description
`main_location`	string	Defines the output location for all output nodes in the task. Can be a path within the project in which the task is created, for example `'/Analysis/<task_id>_<task_name>/'` or a path on an attached volume, such as `"volumes://volume_name/<project_id>/html"`. Parts of the path enclosed in angle brackets `<>` are tokens that are dynamically replaced with corresponding values during task execution. See the list of available tokens.
`main_location_alias`	string	The location (path) in the project that will point to the actual location where the outputs are stored. Used if `main_location` is defined as a volume path (starting with `volumes://`), to provide an easy way of accessing output data directly from project files.
`nodes_override`	boolean	Enables defining of output locations for output nodes individually through `nodes_location` (see below). Set to `true` to be able to define individual locations per output node. Default: `false`. Even if `nodes_override` is set to true, it is not necessary to define output locations for each of the output nodes individually. Data from those output nodes that don't have their locations explicitly defined through `nodes_location` is either placed in `main_location` (if defined) or at the project files root if a main output location is not defined for the task.
`nodes_location`	dictionary	Contains output paths for individual task output nodes in the following format for each output node: `"{output-node-id}": { "output_location": "{output-path}", "output_location_alias": "{alias-path}"}` For example: `"b64html": { "output_location": "volumes://outputs/tasks/mar-19", "output_location_alias": "/rfranklin/tasks/picard"}` In the example above, `b64html` is the ID of the output node for which you want to define the output location, while the parameters are defined as follows: • `output_location`: Can be a path within the project in which the task is created, for example `'/Analysis/<task_id>_<task_name>/'` or a path on an attached volume, such as `"volumes://volume_name/<project_id>/html"`. Also accepts tokens. • `output_location_alias`: The location (path) in the project that will point to the exact location where the output is stored. Used if `output_location` is defined as a volume path (starting with `volumes://`).
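
As an illustration of how the keys in this table fit together, the following sketch assembles an `output_location` dictionary. The helper names `node_location` and `build_output_location` are hypothetical, and the check that an alias accompanies a `volumes://` path is a local sanity check chosen for this sketch, not documented API behaviour.

```python
def node_location(output_location, alias=None):
    """Entry for one output node inside nodes_location."""
    entry = {"output_location": output_location}
    if alias is not None:
        entry["output_location_alias"] = alias
    return entry


def build_output_location(main_location=None, main_alias=None, nodes=None):
    """Assemble the output_location dictionary; hypothetical helper."""
    result = {}
    if main_location is not None:
        result["main_location"] = main_location
    if main_alias is not None:
        # Local check only: the alias is described as used with volume paths.
        if main_location is None or not main_location.startswith("volumes://"):
            raise ValueError("main_alias is meant for a volumes:// main_location")
        result["main_location_alias"] = main_alias
    if nodes:
        result["nodes_override"] = True   # enable per-node locations
        result["nodes_location"] = dict(nodes)
    return result


output_location = build_output_location(
    main_location="volumes://rfranklin/task-outputs/mar_19",
    main_alias="/outputs/<app_name>/mar_19",
    nodes={
        "b64html": node_location("/outputs/<app_name>/mar_19/html_reports"),
        "raw_vcf": node_location(
            "volumes://rfranklin/task-outputs/mar_19",
            alias="/outputs/<app_name>/mar_19/vcf",
        ),
    },
)
print(sorted(output_location))
```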

Example request body

{
    "description": "my draft task",
    "name": "RFranklin, Experiment IV",
    "app": "RFranklin/my-project/new-test-app",
    "project": "RFranklin/my-project",
    "use_interruptible_instances": false,
    "execution_settings": {
        "instance_type": "c4.2xlarge;ebs-gp2;2000",
        "max_parallel_instances": 1
    },
    "inputs": {
        "cuffdiff_zip": {
            "class": "File",
            "path": "567895e6e4b00a1d67a8b1cc",
            "name": "example_human_known_indels.vcf"
        }
    },
    "output_location": {
        "main_location": "volumes://rfranklin/task-outputs/mar_19",
        "main_location_alias": "/outputs/<app_name>/mar_19",
        "nodes_override": true,
        "nodes_location": {
            "b64html": {
                "output_location": "/outputs/<app_name>/mar_19/html_reports"
            },
            "raw_vcf": {
                "output_location": "volumes://rfranklin/task-outputs/mar_19",
                "output_location_alias": "/outputs/<app_name>/mar_19/vcf"
            }
        }
    }
}
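
To trace where each output of the example above would be stored, the placement rule described for `nodes_override` can be mirrored locally, as in the sketch below. This is only a reading aid: `effective_location` is a hypothetical helper, `"/"` is this sketch's own notation for the project files root, and tokens such as `<app_name>` are left unexpanded because they are replaced by the platform during task execution.

```python
PROJECT_ROOT = "/"  # assumption: this sketch's notation for the project files root


def effective_location(output_location, node_id):
    """Return the path where the given output node's data would be placed."""
    if output_location.get("nodes_override", False):
        node = output_location.get("nodes_location", {}).get(node_id)
        if node is not None:
            return node["output_location"]       # explicit per-node location
    # Fall back to the main location, or to the project root if none is set.
    return output_location.get("main_location", PROJECT_ROOT)


example = {
    "main_location": "volumes://rfranklin/task-outputs/mar_19",
    "main_location_alias": "/outputs/<app_name>/mar_19",
    "nodes_override": True,
    "nodes_location": {
        "b64html": {"output_location": "/outputs/<app_name>/mar_19/html_reports"},
        "raw_vcf": {
            "output_location": "volumes://rfranklin/task-outputs/mar_19",
            "output_location_alias": "/outputs/<app_name>/mar_19/vcf",
        },
    },
}

print(effective_location(example, "b64html"))
# -> /outputs/<app_name>/mar_19/html_reports
print(effective_location(example, "some_other_node"))
# -> volumes://rfranklin/task-outputs/mar_19
```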

Response

See a list of CAVATICA-specific response codes that may be contained in the body of the response.

The response body will contain information about the task. The content differs slightly depending on whether the task in question is a batch task (a parent task) or a task that is part of a batch (a child task).

The following key-value pairs in the response body indicate the batch status of the task:

Name	Data type	Description
`batch`	boolean	Set to `true` if the task is a parent batch task; otherwise `false`.
`parent`	string	The ID of the parent task, in the case that the task is part of a batch (i.e. a child task).
`batch_group`	dictionary	Present only for child tasks. This describes the structure of the parent task, i.e. the criteria by which tasks are batched. If tasks are batched per item in the input, the structure is as shown in the following example: `"batch_group": { "value": "C18-146.fastq", "fields": {} }` If tasks are batched by metadata fields, the structure is as shown in the following example: `"batch_group": { "value": "hg19, E18127-pool40-L2355", "fields": { "metadata.library_id": "hg19", "metadata.sample_id": "E18127-pool40-L2355" } }`
`execution_status`	dictionary	For a parent task, this describes the number of child tasks in any given state, in the following form: `"execution_status": { "message": "Running", "queued": 1, "running": 5, "completed": 2, "failed": 1, "aborted": 0 }`. For a child task or a single task (not part of a batch), the execution status lists a number of steps.
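
The keys in the table above are enough to tell parent, child, and single tasks apart once the response body has been parsed. The sketch below shows one way to do so. The helper names `task_kind` and `child_counts` are hypothetical, the parent ID is a placeholder, and treating a missing or empty `parent` value as "not a child task" is an assumption.

```python
COUNT_KEYS = ("queued", "running", "completed", "failed", "aborted")


def task_kind(task):
    """Classify a parsed task response as 'parent', 'child' or 'single'."""
    if task.get("batch"):
        return "parent"
    if task.get("parent"):   # assumption: missing or empty for non-child tasks
        return "child"
    return "single"


def child_counts(task):
    """Per-state child task counts from a parent task's execution_status."""
    status = task.get("execution_status", {})
    return {key: status.get(key, 0) for key in COUNT_KEYS}


parent_task = {
    "batch": True,
    "execution_status": {
        "message": "Running",
        "queued": 1, "running": 5, "completed": 2, "failed": 1, "aborted": 0,
    },
}
child_task = {
    "batch": False,
    "parent": "<parent-task-id>",  # placeholder ID
    "batch_group": {"value": "C18-146.fastq", "fields": {}},
}

print(task_kind(parent_task), sum(child_counts(parent_task).values()))
# -> parent 9
print(task_kind(child_task), child_task["batch_group"]["value"])
# -> child C18-146.fastq
```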