API overview

The Application Programming Interface (API) can be used to integrate CAVATICA with other applications, and to automate most procedures on it, such as uploading files, querying metadata, and executing analyses. The API uses the REST architectural style to read and write information about projects on CAVATICA.

You can also use our Python and R client libraries to integrate the CAVATICA API with your own applications.

The base URL for all API requests is https://cavatica-api.sbgenomics.com/v2.


API paths

The paths are structured into the following endpoints, which cover different categories of activity on the Platform:

General API information

Format

API requests are made over HTTP, and information is sent and received in JSON format. For this reason, you should set both the Accept and the Content-Type headers of the request to application/json.

Responses also include Platform-specific error codes, in addition to standard HTTP codes. Information about each code is available on the page API status codes.
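The header requirements above can be sketched with Python's standard library. This minimal example only builds a `urllib.request.Request` and does not send it. `json_request` is a hypothetical helper name, authentication is deliberately left out, and any payload you pass is up to you.

```python
import json
import urllib.request


def json_request(url, payload=None, method="GET"):
    """Build (but do not send) a request that exchanges JSON with the API."""
    # Encode the optional body as UTF-8 JSON; GET requests usually have no body.
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    return urllib.request.Request(
        url,
        data=data,
        method=method,
        headers={
            "Accept": "application/json",        # we want JSON back
            "Content-Type": "application/json",  # and we send JSON
        },
    )


req = json_request("https://cavatica-api.sbgenomics.com/v2/projects/john_doe/project1")
# Note: urllib stores header names capitalized (e.g. "Content-type");
# HTTP header names are case-insensitive, so this does not affect the server.
print(req.get_header("Content-type"))
```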

Generic query parameters

All API calls take the optional query parameter fields. This can be set to any of the top-level items listed in the response schema to restrict the response to that information only. For example, GET /v2/projects/john_doe/project1?fields=id,name will return the information about the resource project1 restricted to the fields id and name.
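A short sketch of building such a URL in Python, assuming the base URL given at the top of this page. `project_url` is a hypothetical helper that only builds the string; actually sending the request also requires the authentication header described under Authentication.

```python
from typing import Optional, Sequence
from urllib.parse import urlencode

BASE_URL = "https://cavatica-api.sbgenomics.com/v2"


def project_url(owner: str, short_name: str, fields: Optional[Sequence[str]] = None) -> str:
    """URL for GET /v2/projects/{owner}/{short_name}, optionally restricted to some fields."""
    url = f"{BASE_URL}/projects/{owner}/{short_name}"
    if fields:
        # safe="," keeps the commas literal, matching the example above.
        url += "?" + urlencode({"fields": ",".join(fields)}, safe=",")
    return url


print(project_url("john_doe", "project1", ["id", "name"]))
```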

Identifying projects, users, apps, files, tasks and inputs

Project short names

Projects on CAVATICA have both given names, which you see in the visual interface (for example, in the Projects drop-down menu), and short names, which are human-readable IDs derived from the given names. To refer to a project in an API call, use its short name.

Project short names are based on the name you give to a project when you create it. The short name is derived from the project name by:

  • Converting the name to lower case
  • Omitting characters that are not letters, numbers, spaces or underscores
  • Replacing spaces with hyphens
  • Replacing underscores with hyphens
  • Appending _1 if the resulting name is already assigned to one of your projects.
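The derivation rules above can be sketched as a small Python function. This is only an approximation of the documented rules, not the Platform's own code: it assumes "letters and numbers" means ASCII letters and digits, and it appends _1 only once because the behavior for further collisions is not described here. `derive_short_name` is a hypothetical name.

```python
import re
from typing import Iterable


def derive_short_name(name: str, existing: Iterable[str] = ()) -> str:
    """Approximate the documented short-name rules (may differ from the Platform)."""
    short = name.lower()                       # lower case
    short = re.sub(r"[^a-z0-9 _]", "", short)  # keep ASCII letters, digits, spaces, underscores
    short = short.replace(" ", "-")            # spaces -> hyphens
    short = short.replace("_", "-")            # underscores -> hyphens
    if short in set(existing):                 # already used by one of your projects
        short += "_1"
    return short


print(derive_short_name("RFranklin's experiments"))  # rfranklins-experiments
```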

For example, if you name your project 'RFranklin's experiments', it is automatically assigned the short name 'rfranklins-experiments'.

You can optionally override the auto-assigned short name with one of your choice when you create a project. However, once the project has been created, its short name is immutable. To set your own short name, start creating a project using the drop-down menu at the top of the screen, then click the pencil icon in the Create a project pop-out window.


To check a project's short name, or a task's or file's ID, open the object in the browser and inspect its URL.

Users

CAVATICA users are referred to in the API by their usernames. These are chosen by the user when signing up for the Platform. Usernames are unique and immutable. They are also case sensitive, so make sure you have the right username capitalisation when using the API.


Uniqueness of project names

Every project is uniquely identified by {project_owner_username}/{shortname}.

Apps

Apps (tools and workflows) in projects can be accessed using the API. Like projects, apps have both given names, which are assigned by the users who create them, and short names. An app's short name is derived by the same process as a project's short name.

Each app is identified by the project it is contained in, its short name and its revision number, using the format: {project_owner}/{project}/{app_short_name}/{revision_number}.

For instance, RFranklin/my-project/bamtools-merge-2-4-0/0 identifies an app.
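The app identifier format (which begins with the project identifier {project_owner}/{project}) can be composed and parsed as follows. `AppId` is a hypothetical helper, not part of the CAVATICA client libraries, and it assumes that none of the four components contains a slash.

```python
from typing import NamedTuple


class AppId(NamedTuple):
    """Hypothetical helper for {project_owner}/{project}/{app_short_name}/{revision_number}."""
    owner: str
    project: str
    app: str
    revision: int

    @property
    def project_id(self) -> str:
        # A project is uniquely identified by {project_owner_username}/{shortname}.
        return f"{self.owner}/{self.project}"

    def __str__(self) -> str:
        return f"{self.owner}/{self.project}/{self.app}/{self.revision}"

    @classmethod
    def parse(cls, text: str) -> "AppId":
        # Raises ValueError unless there are exactly four slash-separated parts
        # and the last one is an integer revision number.
        owner, project, app, revision = text.split("/")
        return cls(owner, project, app, int(revision))


app = AppId.parse("RFranklin/my-project/bamtools-merge-2-4-0/0")
print(app.project_id)  # RFranklin/my-project
```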

Tasks

Tasks are referred to in API calls by their IDs. These are UUIDs (hexadecimal strings) assigned to tasks. You can retrieve them by making the API call to list tasks.

Tasks have the following statuses: DRAFT, RUNNING, QUEUED, ABORTED, COMPLETED or FAILED.
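When polling a task, it can be convenient to represent these statuses as an enum. In this sketch, treating ABORTED, COMPLETED and FAILED as final states is an assumption based on their names; this page does not describe transitions between statuses. `TaskStatus` is a hypothetical name.

```python
from enum import Enum


class TaskStatus(str, Enum):
    """Task statuses listed above; the str mixin lets members compare equal to raw JSON strings."""
    DRAFT = "DRAFT"
    QUEUED = "QUEUED"
    RUNNING = "RUNNING"
    ABORTED = "ABORTED"
    COMPLETED = "COMPLETED"
    FAILED = "FAILED"

    @property
    def is_final(self) -> bool:
        # Assumption: a task in one of these statuses will not change status again.
        return self in (TaskStatus.ABORTED, TaskStatus.COMPLETED, TaskStatus.FAILED)


status = TaskStatus("RUNNING")  # e.g. the status string from a task's JSON
print(status.is_final)  # False
```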

Files

Files are referred to in API calls by IDs. These are hexadecimal strings assigned to files. You can retrieve them by making the API call to list files.

Note that file IDs depend on the project the file is stored in. If you copy a file to a different project, the copy will have a new ID in that project.

In calls that return CWL descriptions of tasks, such as the call to GET task details, files are identified by their path field, whose value is identical to the file ID.

Inputs

Task inputs are specified as dictionaries. They pair each input of the app executed in the task with the value that will be passed to it.

The format for an input is:
{input_id}: {object}

Here, {input_id} is the ID of one of the app's inputs. The value of {object} is obtained as follows:
  • If the value to be passed to the input is not a file (for example, an integer or a boolean), enter that value directly as {object}.
  • If the value to be passed to the input is a file, {object} is a dictionary with the format:

{
   "class": "File",
   "path": "file_id",
   "name": "file_name.ext"
}

When multiple files are passed to a single input, enter a list of {object}s, like this:

[
  {
    "class": "File",
    "path": "file_id",
    "name": "file_name.ext"
  },
  {
    "class": "File",
    "path": "file_id",
    "name": "file_name.ext"
  }
]

The following are all examples of inputs:

  1. An input integer:
"Offset": 2
  2. An input file for the known indels:
"cuffdiff_zip": {
    "class": "File",
    "path": "567890abc9b0307bc0414164",
    "name": "example_human_known_indels.vcf"
}
  3. File inputs for a Whole Exome Sequencing workflow, in the form of FASTQ reads:
"Reads_FASTQ": [
    {
      "class": "File",
      "path": "567890abc1e5339df0414123",
      "name": "WES_human_Illumina.pe_1.fastq"
    },
    {
      "class": "File",
      "path": "567890abc4f3066bc3750174",
      "name": "WES_human_Illumina.pe_2.fastq"
    }
]
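A minimal Python sketch of assembling an inputs dictionary in the format described in this section. The dictionary mixes input IDs from two of the examples above purely for illustration; a real app defines its own input IDs. `file_input` is a hypothetical helper, and how the dictionary is embedded in a task request is not covered here.

```python
import json


def file_input(file_id: str, name: str) -> dict:
    """Build a File object in the format described above."""
    return {"class": "File", "path": file_id, "name": name}


# Illustration only: these two input IDs come from different example apps.
inputs = {
    "Offset": 2,  # non-file value: entered directly
    "Reads_FASTQ": [  # several files for one input: a list of File objects
        file_input("567890abc1e5339df0414123", "WES_human_Illumina.pe_1.fastq"),
        file_input("567890abc4f3066bc3750174", "WES_human_Illumina.pe_2.fastq"),
    ],
}

print(json.dumps(inputs, indent=2))
```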


Task inputs

For more examples of task inputs, use the call to get task inputs for some of the tasks you initiate on the CAVATICA visual interface.
To find out which inputs an app accepts and in what format, review the app's page on the CAVATICA visual interface, for example the page for Whole Exome Sequencing GATK 2.3.9.-lite.

Authentication

You will need an authentication token to uniquely identify yourself to the Platform. You can obtain it from the Developer Dashboard.

All API requests must have the HTTP header X-SBG-Auth-Token which you should set to your authentication token. The only call which is exempt from this is the '/' call to list all request paths.
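A minimal sketch of attaching the token with Python's standard library, assuming the base URL given at the top of this page. `authed_request` is a hypothetical helper, the token value is a placeholder, and the request is only built here, not sent.

```python
import urllib.request

BASE_URL = "https://cavatica-api.sbgenomics.com/v2"


def authed_request(path: str, token: str) -> urllib.request.Request:
    """Build a GET request that carries the X-SBG-Auth-Token header."""
    return urllib.request.Request(
        BASE_URL + path,
        headers={
            "X-SBG-Auth-Token": token,  # your authentication token
            "Accept": "application/json",
        },
    )


req = authed_request("/projects/john_doe/project1", token="<your-token>")
# To send it: urllib.request.urlopen(req)
```

urllib normalizes the stored header name to "X-sbg-auth-token"; HTTP header names are case-insensitive, so the server still recognizes it.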

Rate limits

All API calls are rate-limited, which means that you can only perform a limited number of requests within a given time window. All rate limit information is returned to the user in the following HTTP headers:

  1. The header X-RateLimit-Limit represents the rate limit; currently this is 1000 requests per five minutes.
  2. The header X-RateLimit-Remaining represents the number of calls you can still make before hitting the limit.
  3. The header X-RateLimit-Reset represents the time, as a Unix timestamp, at which the limit will be reset.
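These headers can be used to back off before retrying. This sketch assumes X-RateLimit-Reset holds whole seconds since the Unix epoch and that both headers are present; it accepts any mapping of header names to string values (for example, the headers of a urllib response). `seconds_until_reset` is a hypothetical helper.

```python
import time
from typing import Mapping, Optional


def seconds_until_reset(headers: Mapping[str, str], now: Optional[float] = None) -> float:
    """Return 0 while calls remain; otherwise the seconds left until X-RateLimit-Reset."""
    if int(headers["X-RateLimit-Remaining"]) > 0:
        return 0.0
    reset_at = int(headers["X-RateLimit-Reset"])  # assumed: Unix time in seconds
    current = time.time() if now is None else now
    return max(0.0, float(reset_at - current))


# Typical use after a response: time.sleep(seconds_until_reset(response.headers))
```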

Response pagination

All API calls take the pagination query parameters limit and offset to control the number of items returned in a response. These are useful if you are returning information about a resource with many items, such as a list of many files in a project.


Filtering

In addition to controlling the number of items returned using the pagination query parameters, if you are requesting information about files using the call to GET /files, you can filter the items returned by file name, metadata, or originating task.

Specify the number of items to return in a response

You can control how many items are returned by an API call using the query parameter limit. If you do not specify a value for limit, the call returns at most 50 items by default.

The maximum value for the query parameter limit is 100.

Example 1:
Suppose you have 70 files in the project my-project, and you issue the call to GET /files as follows:

GET /v2/files?project=my-project HTTP/1.1
Host: cavatica-api.sbgenomics.com
X-SBG-Auth-Token: 3259c50e1ac5426ea8f1273259740f74

Since no value for limit was specified, this call will return details of 50 of the files, along with a URL to return the next 20.

Example 2:
Again, suppose you have a project my-project with 70 files in it. The following call will return details of all 70 files:

GET /v2/files?project=my-project&limit=70 HTTP/1.1
Host: cavatica-api.sbgenomics.com
X-SBG-Auth-Token: 3259c50e1ac5426ea8f1273259740f74
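The arithmetic in the examples above (70 files with the default limit of 50 needs two calls; a limit of 70 needs one) can be captured in a short helper. `requests_needed` is hypothetical, and the lower bound of 1 for limit is an assumption; only the default of 50 and the maximum of 100 come from this page.

```python
import math

DEFAULT_LIMIT = 50  # used when limit is not specified
MAX_LIMIT = 100     # largest accepted value of limit


def requests_needed(total_items: int, limit: int = DEFAULT_LIMIT) -> int:
    """Number of calls needed to retrieve total_items with a given page size."""
    if not 1 <= limit <= MAX_LIMIT:  # lower bound of 1 is an assumption
        raise ValueError(f"limit must be between 1 and {MAX_LIMIT}")
    return math.ceil(total_items / limit)


print(requests_needed(70))      # 2
print(requests_needed(70, 70))  # 1
```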

Specify the starting point for items to return in a response

You can control the starting point at which to start returning items in an API call using the query parameter offset. If you do not specify a value for offset then the default starting point will be the first item in the specified resource.

Example 1:
Suppose you have a project called my-project containing 70 files, and you want to return the details of the files after the first 30 (that is, starting with the 31st file). To do this, issue the call to GET /files with the query parameter offset set to 30:

GET /v2/files?project=my-project&offset=30 HTTP/1.1
Host: cavatica-api.sbgenomics.com
X-SBG-Auth-Token: 3259c50e1ac5426ea8f1273259740f74

Calls made with the offset query parameter additionally return the header X-Total-Matching-Query which signifies the total number of results.
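Combining offset with the X-Total-Matching-Query header gives a simple way to walk through every item of a resource. In this sketch, `fetch` is a hypothetical callable you supply: given an offset and a limit, it should perform the GET request and return the list of items together with the integer value of X-Total-Matching-Query. Keeping the HTTP call outside the loop makes the paging logic easy to test.

```python
from typing import Callable, Iterator, List, Tuple

# fetch(offset, limit) -> (items on this page, value of X-Total-Matching-Query)
Fetch = Callable[[int, int], Tuple[List[dict], int]]


def iter_by_offset(fetch: Fetch, limit: int = 50) -> Iterator[dict]:
    """Yield every item, advancing offset until the reported total is reached."""
    offset = 0
    while True:
        items, total = fetch(offset, limit)
        yield from items
        offset += len(items)
        # Stop once all items have been seen, or on an empty page as a safeguard.
        if not items or offset >= total:
            break
```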

Example 2:
An example of a call made using both pagination parameters is as follows:

GET /v2/projects?limit=2&offset=2 HTTP/1.1
Host: cavatica-api.sbgenomics.com
X-SBG-Auth-Token: 3259c50e1ac5426ea8f1273259740f74

This returns the following body in JSON:

{
  "href": "https://cavatica-api.sbgenomics.com/v2/projects/",
  "items": [
    {
      "href": "https://cavatica-api.sbgenomics.com/v2/projects/john_doe/project1",
      "id": "john_doe/project1",
      "name": "project1"
    },
    {
      "href": "https://cavatica-api.sbgenomics.com/v2/projects/john_doe/project2",
      "id": "john_doe/project2",
      "name": "Project 2"
    }
  ],
  "links": [
    {
      "href": "https://cavatica-api.sbgenomics.com/v2/projects/?offset=4&limit=2",
      "rel": "next",
      "method": "GET"
    }
  ]
}

The headers returned include X-Total-Matching-Query which lists the total number of results.
The body of the response includes the array links, which indicate how to get the next or previous set of results.
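Alternatively, you can follow the links array page by page. In this sketch, `get_json` is a hypothetical callable that performs an authenticated GET on a URL and returns the decoded JSON body. It also assumes the last page contains no link whose rel is next, which this page does not state explicitly.

```python
from typing import Callable, Iterator, Optional


def iter_items(first_url: str, get_json: Callable[[str], dict]) -> Iterator[dict]:
    """Yield items from every page, following each page's "next" link."""
    url: Optional[str] = first_url
    while url:
        page = get_json(url)
        yield from page.get("items", [])
        # Pick the link with rel == "next"; None (no such link) ends the loop.
        url = next(
            (link["href"] for link in page.get("links", []) if link.get("rel") == "next"),
            None,
        )
```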