Create a new draft task

This call creates a new task. You can create either a single task or a batch task by using the app's default batching, overriding it, or disabling batching completely.

A parent task is a task that specifies criteria by which to batch its inputs into a series of further sub-tasks, called child tasks.

See the documentation on batching tasks for more details on batching criteria.

Request

POST https://cavatica-api.sbgenomics.com/v2/tasks

Header fields

| Name | Description |
| --- | --- |
| X-SBG-Auth-Token (required) | Your authentication token. |

Query parameters

| Name | Data type | Description |
| --- | --- | --- |
| fields | string | Selector specifying a subset of fields to include in the response. |
| action | string | If set to run, the task will be run immediately upon creation. |
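
As an illustration of how the header field and query parameters above fit together, here is a minimal Python sketch. The helper names task_endpoint and auth_headers are hypothetical, and the value passed to fields is only a placeholder, since the selector syntax is not described in this section.

```python
from urllib.parse import urlencode

# Endpoint taken from the Request section above.
BASE_URL = "https://cavatica-api.sbgenomics.com/v2/tasks"


def task_endpoint(fields=None, action=None):
    """Return the task-creation URL with the optional query parameters appended."""
    params = {}
    if fields is not None:
        params["fields"] = fields
    if action is not None:
        params["action"] = action
    # Only add a query string when at least one parameter was supplied.
    return BASE_URL + ("?" + urlencode(params) if params else "")


def auth_headers(token):
    """Return the required header field for the given authentication token."""
    return {"X-SBG-Auth-Token": token}
```

For example, task_endpoint(action="run") yields a URL that asks for the task to be run immediately upon creation, while task_endpoint() leaves the task as a draft.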

Request body

The request body should be a JSON object that specifies the app you want to run and assigns input files to its input nodes. It is entered as a set of key-value pairs. The keys specify the name and description of the task to be created, the app to be executed, and details of its input files. The keys and their permitted values are described below.

You can see a list of the app's input nodes on CAVATICA on the Apps page for the project. Specify the files to input to the nodes using the files' IDs, which you can obtain using the call to get files.

Key

Data type

Description

name

string

The name of the task.

description

string

An optional description of the task.

project

string

The project in which to create the task, identified by its short name in the form {project_owner}/{project}.

execution_settings

dictionary

Detailed task execution parameters. Can include the instance type (instance_type), the maximum number of parallel instances (max_parallel_instances), and the memoization setting (use_memoization).
• instance_type: The specific instance type to use, e.g. "instance_type": "c4.2xlarge;ebs-gp2;2000".
• max_parallel_instances: Maximum number of instances running at the same time. Takes any integer value equal to or greater than 1, e.g. "max_parallel_instances": 2.
• use_memoization: Set to false by default. Set to true to enable memoization.

app

string

The specification of the app that you want to run. Recall that apps are specified by their projects, in the form {project_owner}/{project}/{app_name}.

inputs

dictionary

See the section on specifying task inputs for information on creating task input objects.

output_location

dictionary

Detailed parameters related to the output location where task outputs will be stored.

batch

boolean

This is set to false by default. Set to true to create a batch task and specify the batch_input and batch_by criteria as described below.

batch_input

string

The ID of the input on which you wish to batch. You would typically batch on the input consisting of a list of files. If this parameter is omitted, the default batching criteria defined for the app will be used.

batch_by

dictionary

This specifies the criteria on which to batch. It can be in one of two formats.

  1. If you wish to batch per item in the app's input (i.e., typically per file in a list of files) then specify a dictionary with the following format: { "type": "ITEM" }.

  2. If you wish to batch by groups of inputs, you should specify the criteria satisfied by each group. This should be a common metadata value in one or more metadata fields. To do this, specify a dictionary with the following format:
    { "type": "CRITERIA", "criteria": [ "metadata.<field_1>", "metadata.<field_2>" ] }
    This will group inputs by shared metadata values for <field_1> and <field_2>, in that order. Arbitrarily many metadata fields may be listed, and the order in which fields are grouped will respect the order of the list.

use_interruptible_instances

boolean

This field can be true or false. Set this field to true to allow the use of spot instances.
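
The batch, batch_input and batch_by keys described above could be assembled as in the following Python sketch. The helper names batch_per_item and batch_by_metadata are hypothetical, as is the input ID "reads" used in the usage note; the dictionary shapes follow the two batch_by formats given in the table.

```python
def batch_per_item(input_id):
    """Return the body keys for batching per item of the given input."""
    return {
        "batch": True,
        "batch_input": input_id,
        "batch_by": {"type": "ITEM"},
    }


def batch_by_metadata(input_id, fields):
    """Return the body keys for batching by shared metadata values.

    fields is an ordered list of metadata field names; the order is kept
    because grouping respects the order of the criteria list.
    """
    if not fields:
        raise ValueError("at least one metadata field is needed")
    return {
        "batch": True,
        "batch_input": input_id,
        "batch_by": {
            "type": "CRITERIA",
            "criteria": ["metadata." + name for name in fields],
        },
    }
```

The returned dictionary can be merged into a request body, for instance with body.update(batch_by_metadata("reads", ["library_id", "sample_id"])).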

Output location

The output_location dictionary allows you to define the exact location where your task outputs will be stored. The location can be defined either for the entire task, using the main_location parameter, or individually for each output node, by setting the nodes_override parameter to true and defining individual output node locations within nodes_location. See the table below for more details.

Key

Data type

Description

main_location

string

Defines the output location for all output nodes in the task. Can be a path within the project in which the task is created, for example '/Analysis/<task_id>_<task_name>/' or a path on an attached volume, such as "volumes://volume_name/<project_id>/html". Parts of the path enclosed in angle brackets <> are tokens that are dynamically replaced with corresponding values during task execution. See the list of available tokens.

main_location_alias

string

The location (path) in the project that will point to the actual location where the outputs are stored. Used if main_location is defined as a volume path (starting with volumes://), to provide an easy way of accessing output data directly from project files.

nodes_override

boolean

Enables output locations to be defined individually for each output node through nodes_location (see below). Set to true to define individual locations per output node. Default: false.

Even if nodes_override is set to true, it is not necessary to define output locations for each of the output nodes individually. Data from those output nodes that don't have their locations explicitly defined through nodes_location is either placed in main_location (if defined) or at the project files root if a main output location is not defined for the task.

nodes_location

dictionary

Contains output paths for individual task output nodes in the following format for each output node:
"{output-node-id}": { "output_location": "{output-path}", "output_location_alias": "{alias-path}"}

For example:
"b64html": { "output_location": "volumes://outputs/tasks/mar-19", "output_location_alias": "/rfranklin/tasks/picard"}

In the example above, b64html is the ID of the output node for which you want to define the output location, while the parameters are defined as follows:

• output_location: Can be a path within the project in which the task is created, for example '/Analysis/<task_id>_<task_name>/' or a path on an attached volume, such as "volumes://volume_name/<project_id>/html". Also accepts tokens.

• output_location_alias: The location (path) in the project that will point to the exact location where the output is stored. Used if output_location is defined as a volume path (starting with volumes://).
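
To show how the keys in this table combine, the following Python sketch assembles an output_location dictionary. The function names are hypothetical, and the check that rejects an alias on a non-volume path is a convenience of this sketch based on the descriptions above, not documented API behaviour.

```python
def _check_alias(path, alias):
    """Local check: an alias is only meaningful for a volumes:// path."""
    if alias is not None and (path is None or not path.startswith("volumes://")):
        raise ValueError(f"alias {alias!r} given without a volumes:// path")


def build_output_location(main=None, main_alias=None, nodes=None):
    """Assemble an output_location dictionary.

    nodes maps an output node ID to a (path, alias) pair; alias may be None.
    """
    _check_alias(main, main_alias)
    location = {}
    if main is not None:
        location["main_location"] = main
        if main_alias is not None:
            location["main_location_alias"] = main_alias
    if nodes:
        # Per-node locations are only honoured when nodes_override is true.
        location["nodes_override"] = True
        location["nodes_location"] = {}
        for node_id, (path, alias) in nodes.items():
            _check_alias(path, alias)
            entry = {"output_location": path}
            if alias is not None:
                entry["output_location_alias"] = alias
            location["nodes_location"][node_id] = entry
    return location
```

Calling build_output_location(nodes={"b64html": ("volumes://outputs/tasks/mar-19", "/rfranklin/tasks/picard")}) reproduces the nodes_location example above, with nodes_override set to true.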

Example request body

{
    "description": "my draft task",
    "name": "RFranklin, Experiment IV",
    "app": "RFranklin/my-project/new-test-app",
    "project": "RFranklin/my-project",
    "use_interruptible_instances": false,
    "execution_settings": {
        "instance_type": "c4.2xlarge;ebs-gp2;2000",
        "max_parallel_instances": 1
    },
    "inputs": {
        "cuffdiff_zip": {
            "class": "File",
            "path": "567895e6e4b00a1d67a8b1cc",
            "name": "example_human_known_indels.vcf"
        }
    },
    "output_location": {
        "main_location": "volumes://rfranklin/task-outputs/mar_19",
        "main_location_alias": "/outputs/<app_name>/mar_19",
        "nodes_override": true,
        "nodes_location": {
            "b64html": {
                "output_location": "/outputs/<app_name>/mar_19/html_reports"
            },
            "raw_vcf": {
                "output_location": "volumes://rfranklin/task-outputs/mar_19",
                "output_location_alias": "/outputs/<app_name>/mar_19/vcf"
            }
        }
    }
}
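
A request body like the one above could be prepared for sending with Python's standard library roughly as follows. This is a sketch: it assumes the call is a POST with a Content-Type: application/json header, and build_create_task_request is a hypothetical helper name. Building the request object sends nothing; the network call happens only when urlopen is invoked.

```python
import json
import urllib.request

TASKS_URL = "https://cavatica-api.sbgenomics.com/v2/tasks"


def build_create_task_request(body, token, url=TASKS_URL):
    """Return a urllib Request that would create a task from the given body."""
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),  # JSON object as UTF-8 bytes
        method="POST",  # assumption: creation uses POST
        headers={
            "X-SBG-Auth-Token": token,
            "Content-Type": "application/json",  # assumption
        },
    )


# Usage (requires a valid token and network access, so it is left commented out):
# request = build_create_task_request(body, "<your-token>")
# with urllib.request.urlopen(request) as response:
#     task = json.load(response)
```

Keeping request construction separate from sending makes it possible to inspect the URL, headers and encoded body before anything reaches the API.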

Response

See a list of CAVATICA-specific response codes that may be contained in the body of the response.

The response body will contain information about the task. Its content differs slightly depending on whether the task in question is a batch task (a parent task) or a task that is part of a batch (a child task).

The following key-value pairs in the response body indicate the batch status of the task:

Name

Data type

Description

batch

boolean

Set to true if the task is a parent batch task; otherwise false.

parent

string

The ID of the parent task, in the case that the task is part of a batch (i.e. a child task).

batch_group

dictionary

Present only for child tasks. This describes the structure of the parent task, i.e. the criteria by which tasks are batched.

  1. If tasks are batched per item in the input, the structure is as shown in the following example: 
    "batch_group": { "value": "C18-146.fastq", "fields": {} }

  2. If tasks are batched by metadata fields, the structure is as shown in the following example: 
    "batch_group": { "value": "hg19, E18127-pool40-L2355", "fields": { "metadata.library_id": "hg19", "metadata.sample_id": "E18127-pool40-L2355" } }

execution_status

dictionary

For a parent task, this describes the number of child tasks in any given state, in the following form: 
"execution_status": { "message": "Running", "queued": 1, "running": 5, "completed": 2, "failed": 1, "aborted": 0 }.

For a child task or a single task (not part of a batch), the execution status lists a number of steps.
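
The batch-related keys above can be read back from a parsed response as in the following Python sketch. The function names batch_role and child_counts are hypothetical, and the code assumes the response body has already been decoded into a dictionary.

```python
# Child-task states listed in the execution_status example above.
STATES = ("queued", "running", "completed", "failed", "aborted")


def batch_role(task):
    """Classify a task as 'parent', 'child' or 'single' from its batch keys."""
    if task.get("batch"):
        return "parent"
    if task.get("parent"):
        return "child"
    return "single"


def child_counts(task):
    """Return the number of child tasks per state for a parent task."""
    status = task.get("execution_status", {})
    return {state: status.get(state, 0) for state in STATES}
```

For the execution_status example shown above, sum(child_counts(task).values()) gives the total number of child tasks reported by the parent.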