{"_id":"57f7726da5ef060e00879703","version":{"_id":"5773dcfc255e820e00e1cd50","__v":27,"project":"5773dcfc255e820e00e1cd4d","createdAt":"2016-06-29T14:36:44.812Z","releaseDate":"2016-06-29T14:36:44.812Z","categories":["5773dcfc255e820e00e1cd51","5773df36904b0c0e00ef05ff","577baf92451b1e0e006075ac","577bb183b7ee4a0e007c4e8d","577ce77a1cf3cb0e0048e5ea","577d11865fd4de0e00cc3dab","578e62792c3c790e00937597","578f4fd98335ca0e006d5c84","578f5e5c3d04570e00976ebb","57bc35f7531e000e0075d118","57f801b3760f3a1700219ebb","5804d55d1642890f00803623","581c8d55c0dc651900aa9350","589dcf8ba8c63b3b00c3704f","594cebadd8a2f7001b0b53b2","59a562f46a5d8c00238e309a","5a2aa096e25025003c582b58","5a2e79566c771d003ca0acd4","5a3a5166142db90026f24007","5a3a52b5bcc254001c4bf152","5a3a574a2be213002675c6d2","5a3a66bb2be213002675cb73","5a3a6e4854faf60030b63159","5c8a68278e883901341de571","5cb9971e57bf020024523c7b","5cbf1683e2a36d01d5012ecd","5dc15666a4f788004c5fd7d7"],"is_deprecated":false,"is_hidden":false,"is_beta":false,"is_stable":true,"codename":"","version_clean":"1.0.0","version":"1.0"},"parentDoc":null,"project":"5773dcfc255e820e00e1cd4d","__v":1,"user":"575e85ac41c8ba0e00259a44","category":{"_id":"5773dcfc255e820e00e1cd51","__v":0,"project":"5773dcfc255e820e00e1cd4d","version":"5773dcfc255e820e00e1cd50","sync":{"url":"","isSync":false},"reference":false,"createdAt":"2016-06-29T14:36:44.838Z","from_sync":false,"order":0,"slug":"documentation","title":"Get started"},"githubsync":"","metadata":{"title":"","description":"","image":[]},"updates":[],"next":{"pages":[],"description":""},"createdAt":"2016-10-07T10:01:17.858Z","link_external":false,"link_url":"","sync_unique":"","hidden":false,"api":{"results":{"codes":[]},"settings":"","auth":"required","params":[],"url":""},"isReference":false,"order":3,"body":"##Prerequisites\nAll the resources used in the QuickStart, including the files and workflow, are available to you when you <a href=\"https://cavatica.sbgenomics.com/\" target=\"blank\">sign 
up for a free account</a>: there is no need to take out a subscription — just use some of your free $150 credits. \n[block:callout]\n{\n  \"type\": \"info\",\n  \"body\": \"We encourage you to follow these steps and to try the analysis for yourself. This is the easiest way to become familiar with Cavatica!\",\n  \"title\": \"Try it out yourself!\"\n}\n[/block]\n##Procedure\nWe'll start by creating a project and populating it with FASTQ files. Then, we'll use one of the whole exome analysis workflows to carry out the analysis. Finally, we'll examine our results. \n[block:callout]\n{\n  \"type\": \"warning\",\n  \"title\": \"On this page:\",\n  \"body\": \"* [Create a project](#section-create-a-project)\\n* [Enter file metadata](#section-enter-file-metadata)\\n* [Add FASTQ files to your project](#section-add-fastq-files-to-your-project)\\n* [Select a public workflow ](#section-select-a-public-workflow)\\n* [Edit the selected workflow](#section-edit-the-selected-workflow)\\n* [Run the analysis](#section-run-the-analysis)\\n* [View the results](#section-view-the-results-of-the-data-analysis)\"\n}\n[/block]\n##Create a project\nThe first step to running an analysis on Cavatica is to create a project. To do this, click **Create a project** under the **Projects** tab in the top navigation bar. \n\nThis will open a new window where you can name your project and select a [billing group](http://docs.sevenbridges.com/v1.0/docs/payments). Let's name our project **quickstart**. We'll use the free **Pilot Funds** as our billing group. 
When you're finished, click **Create**.\n\n[block:image]\n{\n  \"images\": [\n    {\n      \"image\": [\n        \"https://files.readme.io/9dbed46-QS_create_project.png\",\n        \"QS_create project.png\",\n        454,\n        349,\n        \"#605263\"\n      ],\n      \"border\": true\n    }\n  ]\n}\n[/block]\n\n[block:callout]\n{\n  \"type\": \"success\",\n  \"title\": \"Project URL\",\n  \"body\": \"Your project is given a URL based on its name. While you can rename your project at any point in time, the URL cannot be altered after your project has been created.\"\n}\n[/block]\nOnce you create a project, you'll be taken to its [Project Dashboard](http://docs.sevenbridges.com/v1.0/docs/project-dashboard). This page contains all the information about your project, including its files, apps (tools and workflows), tasks (workflow executions), and project members.\n[block:image]\n{\n  \"images\": [\n    {\n      \"image\": [\n        \"https://files.readme.io/5aa5fe8-QS_project_dashboard.png\",\n        \"QS_project dashboard.png\",\n        1440,\n        742,\n        \"#34455e\"\n      ]\n    }\n  ]\n}\n[/block]\n\n[block:callout]\n{\n  \"type\": \"info\",\n  \"body\": \"Learn more about adding project members and specifying their level of access in the documentation on [managing project members](http://docs.sevenbridges.com/v1.0/docs/collaboration).\",\n  \"title\": \"Manage project members\"\n}\n[/block]\n<div align=\"right\"><a href=\"#top\">top</a></div>\n\n##Add FASTQ files to your project\n\nThe next step is to add the FASTQ files to your project. 
The other reference files needed for the analysis will be suggested when you set up the workflow.\n\nTo find the FASTQ files necessary for the analysis, click the **Files** tab on your project dashboard and then **+Add files**.\n[block:image]\n{\n  \"images\": [\n    {\n      \"image\": [\n        \"https://files.readme.io/1fdf816-QS_add_files.png\",\n        \"QS_add files.png\",\n        559,\n        402,\n        \"#313f56\"\n      ],\n      \"border\": true\n    }\n  ]\n}\n[/block]\nClicking **+Add files** opens the file browser. Here you can view the Public Reference Files repository, **Public Files** and any files that you've already added to other projects.\n[block:image]\n{\n  \"images\": [\n    {\n      \"image\": [\n        \"https://files.readme.io/d661298-QS_add_files2.png\",\n        \"QS_add files2.png\",\n        1422,\n        754,\n        \"#f5f5f4\"\n      ],\n      \"border\": true\n    }\n  ]\n}\n[/block]\n\n[block:callout]\n{\n  \"type\": \"info\",\n  \"body\": \"There are several ways to add files to the Platform; you can add them from a [computer](http://docs.sevenbridges.com/v1.0/docs/copy-files-using-the-visual-interface) or via [FTP/HTTP](http://docs.sevenbridges.com/v1.0/docs/copy-files-using-the-api).\"\n}\n[/block]\nFor this analysis, we want to use two paired-end files that contain whole exome sequencing data. We'll select **Public Files** on the top bar and use the search box to quickly locate them.\n\nWe want to find a pair of FASTQ files named **C835.HCC1143.2.converted.pe_1.fastq** and **C835.HCC1143.2.converted.pe_2.fastq**, we'll enter \"C835.HCC1143.2.converted.pe\" into the search box to find them.\n\nIf you don't know the names of the files you need, you can instead browse all files. Learn more about [searching for files ](http://docs.sevenbridges.com/v1.0/docs/search-files-on-the-platform) on the Platform.\n\nSelect both files using the checkboxes adjacent to the filenames, as shown below. 
To copy the files, click **Copy to Project** and confirm. To return to the Project Dashboard, just close the **File** window.\n[block:image]\n{\n  \"images\": [\n    {\n      \"image\": [\n        \"https://files.readme.io/ce981c3-QS_add_files3.png\",\n        \"QS_add files3.png\",\n        1421,\n        379,\n        \"#edf1f0\"\n      ],\n      \"border\": true\n    }\n  ]\n}\n[/block]\n\n[block:callout]\n{\n  \"type\": \"success\",\n  \"body\": \"Copy multiple files at once by checking all files before clicking **Copy to Project**\"\n}\n[/block]\n<div align=\"right\"><a href=\"#top\">top</a></div>\n\n##Enter file metadata\nIt is important to annotate your files with [metadata](http://docs.sevenbridges.com/docs/metadata-on-the-seven-bridges-platform) when you perform an analysis on the Platform so that bioinformatics tools processing files in parallel can group files with identical metadata value(s) in specified fields.\n\nFile metadata includes information about the File (e.g. experimental strategy and library ID), **Sample** (e.g. sample ID), and **General** (e.g. investigation and species) . For more information on the metadata fields used on the Platform, please see the documentation on [file metadata](http://docs.sevenbridges.com/docs/metadata-on-the-seven-bridges-platform).\n\nClick the **Files** tab on your project dashboard to see all the files in the project. Currently our project, QuickStart, only contains the two files that we've just added.\n\nTo edit a file's metadata, select the file and click **Edit Metadata**. You can add (the same) metadata for both files at once. Or, you can do add metadata individually if your files have different metadata. We can edit the metadata for both of the FASTQ files simultaneously.\n\nSelect both of the files and click **Edit Metadata**. This will open a pop-up window with inputs for the different metadata fields. Notice the empty field for Platform unit ID. This needs to be set to run the task. 
Enter 1 in this field, and click **Save**.\n\nThis metadata will inform tools that these files come from the same sample, were produced by the same library, and have been sequenced on the same lane.\n[block:image]\n{\n  \"images\": [\n    {\n      \"image\": [\n        \"https://files.readme.io/d3bb130-QS_edit_metadata.png\",\n        \"QS_edit metadata.png\",\n        1433,\n        766,\n        \"#483c59\"\n      ]\n    }\n  ]\n}\n[/block]\n\n[block:callout]\n{\n  \"type\": \"info\",\n  \"body\": \"Each file used in an analysis on the Cavatica must have their own metadata values. For more information, see the [metadata documentation](http://docs.sevenbridges.com/v1.0/docs/metadata-on-the-seven-bridges-platform) on grouping and distinguishing files by metadata.\\n\\nIn the example here, note that while we have set the same **Library ID**, **Platform unit ID**, and **Platform** values for the two WES_human_Illumina files, the two files come with different **Paired-end** values ('1' and '2') by default.\"\n}\n[/block]\n<div align=\"right\"><a href=\"#top\">top</a></div>\n\n##Select a public workflow \n\nThe next step is selecting a public workflow for running the analysis. We'll use the workflow, **Whole Exome Sequencing GATK 2.3.9.-lite**, which is based on the free version of the GATK tool developed by the Broad Institute.\n\nThis workflow is one of the many open source workflows available to all Cavatica users. 
These workflows have been tested to run efficiently in the cloud environment by the Seven Bridges bioinformatics team.\n\nTo select a public workflow for use in your project, navigate to **Apps** tab on your project dashboard and click **+Add App**.\n[block:image]\n{\n  \"images\": [\n    {\n      \"image\": [\n        \"https://files.readme.io/bddf1be-QS_add_apps.png\",\n        \"QS_add apps.png\",\n        621,\n        410,\n        \"#ebecf2\"\n      ],\n      \"border\": true\n    }\n  ]\n}\n[/block]\nTo add the Whole Exome Sequencing workflow:\n\n1. Type 'whole exome' into the search box. The Whole Exome Sequencing GATK 2.3.9.-lite will be displayed in the search results. .\n\n2. Next, click **Copy **below the workflow.\n[block:image]\n{\n  \"images\": [\n    {\n      \"image\": [\n        \"https://files.readme.io/bda436a-QS_add_apps2.png\",\n        \"QS_add apps2.png\",\n        1425,\n        612,\n        \"#5b9be7\"\n      ]\n    }\n  ]\n}\n[/block]\n3. (Optional) Set the name of the workflow in your project.\n4. Click **Copy** and the workflow will be added to your project.\n[block:image]\n{\n  \"images\": [\n    {\n      \"image\": [\n        \"https://files.readme.io/248377e-QS_add_apps3.png\",\n        \"QS_add apps3.png\",\n        1424,\n        612,\n        \"#f4f5f5\"\n      ]\n    }\n  ]\n}\n[/block]\nTo go back to the project dashboard, close the app browser window.\n\n<div align=\"right\"><a href=\"#top\">top</a></div>\n\n##Edit the selected workflow\nIn many cases, you might want to tweak a workflow to work better with your dataset. This can be done easily using the workflow editor. 
To edit your workflow in your project, navigate to the **Apps** tab and click the pencil icon next to **Whole Exome Analysis - BWA + GATK 2.3.9-lite**.\n[block:image]\n{\n  \"images\": [\n    {\n      \"image\": [\n        \"https://files.readme.io/3347697-QS_edit_workflow.png\",\n        \"QS_edit workflow.png\",\n        1430,\n        339,\n        \"#313e55\"\n      ],\n      \"border\": true\n    }\n  ]\n}\n[/block]\nThis opens the workflow editor containing a graphical representation of the workflow where each tool, input, and reference file is represented as a node. To see a description of the workflow's function and other details such as toolkit name and version, tool author, and its license, you can click Additional Information.\n\nTo the right of the workflow diagram, the panel labeled **APPS** displays a list of all the apps available in your projects (**MyApps**) or among **PublicApps**.\n\nThe **PARAMS** panel describes the parameters of the tools used in this workflow and allows you to make quick edits.\n[block:image]\n{\n  \"images\": [\n    {\n      \"image\": [\n        \"https://files.readme.io/c8e7d45-QS_editor_params.png\",\n        \"QS_editor params.png\",\n        1433,\n        702,\n        \"#d4c2cf\"\n      ],\n      \"border\": true\n    }\n  ]\n}\n[/block]\nOn the workflow editor, click the **BWA-MEM Bundle** node (see the screenshot below). This opens the **PARAMS** tab, which displays the parameters of **BWA-MEM Bundle** sorted into **Input/Output options**, **Scoring options**, **Execution**, etc. Let's find the parameter **use_soft_clipping **and select it.\n\nThis will soft clip the supplementary alignments. To save this change as a new revision of the workflow, click **Save**. Note that clicking **Save** changes the version number from 0 to 1. 
This function allows you to keep track all your workflow edits.\n[block:image]\n{\n  \"images\": [\n    {\n      \"image\": [\n        \"https://files.readme.io/04ce551-QS_editor_params2.png\",\n        \"QS_editor params2.png\",\n        1435,\n        692,\n        \"#314658\"\n      ],\n      \"border\": true\n    }\n  ]\n}\n[/block]\n<div align=\"right\"><a href=\"#top\">top</a></div>\n\n##Run the analysis\n\nNow that the workflow is ready, it's time to run the analysis. We'll click **Run**, in the upper right corner. The pop-up window with the suggested files for this workflow will be displayed.\n[block:callout]\n{\n  \"type\": \"info\",\n  \"body\": \"For all public workflows on Cavatica, our team of bioinformaticians has chosen a set of recommended input files.\"\n}\n[/block]\n\n[block:image]\n{\n  \"images\": [\n    {\n      \"image\": [\n        \"https://files.readme.io/92d4082-QS_suggested_files.png\",\n        \"QS_suggested files.png\",\n        648,\n        322,\n        \"#e3e4e5\"\n      ],\n      \"border\": true\n    }\n  ]\n}\n[/block]\nClick **Copy** and the suggested files will be copied to your project and added as input files to our workflow. 
The files are mapped the following way.\n[block:parameters]\n{\n  \"data\": {\n    \"h-0\": \"Input port\",\n    \"h-1\": \"Input files\",\n    \"0-0\": \"Known_SNPs\\n\\nKnown_Indels\",\n    \"0-1\": \"dbsnp_137.b37.vcf\\n\\nMills_and_1000G_gold_standard.indels.b37.sites.vcf\\n1000G_phase1.indels.b37.vcf\",\n    \"1-0\": \"Target_BED\",\n    \"1-1\": \"exome_targets.b37.bed\",\n    \"2-0\": \"SnpEff_Database\",\n    \"2-1\": \"snpEff_v3_6_GRCh37.75.zip\",\n    \"4-0\": \"Reference or TAR with BWA reference indices\",\n    \"4-1\": \"human_g1k_v37_decoy.fasta\",\n    \"3-0\": \"FASTQ\",\n    \"3-1\": \"C835.HCC1143.2.converted.pe_1.fastq\\nC835.HCC1143.2.converted.pe_2.fastq\",\n    \"h-2\": \"File type\",\n    \"0-2\": \"**VCF** files contain databases of the known genetic variants - SNPs and indels.\",\n    \"1-2\": \"**BED** files contain all target regions which are relevant for our analysis - in this case exomes. It points to the relevant locations of the FASTA file we are using for the analysis.\",\n    \"2-2\": \"**ZIP file (snpEff) **is a specific build of the snpEff database which contains annotations of the genetic variants and their supposed effects.\",\n    \"3-2\": \"**FASTQ** files contain the experiment data for our analysis i.e. they are the output of the high-throughput sequencing instruments; for the purpose of the QuickStart guide, we will use a pair of FASTQ files which represent one whole exome sample from the TCGA dataset\",\n    \"4-2\": \"**FASTA** file is a reference genome which we will use for the alignment of the FASTQ files.\"\n  },\n  \"cols\": 3,\n  \"rows\": 5\n}\n[/block]\nOn the **DRAFT Task page** you will see the following sections under the **Task Inputs** tab: **Inputs** and **App Settings**, as shown in the screenshot below. 
The second tab shows various tool parameters that we've exposed by unlocking them in the workflow editor.\n\nWe'll ignore these for now (for details, see the documentation on [tool settings](http://docs.sevenbridges.com/v1.0/docs/the-tool-editor). The section marked **Inputs** is where you can enter the input files and reference files for your workflow.\n[block:image]\n{\n  \"images\": [\n    {\n      \"image\": [\n        \"https://files.readme.io/fe709d0-cavatica-quickstart-draft-task.png\",\n        \"cavatica-quickstart-draft-task.png\",\n        843,\n        555,\n        \"#425c71\"\n      ],\n      \"border\": true\n    }\n  ]\n}\n[/block]\nThe only remaining files you need to select are FASTQ files. Click **Select file(s)** and choose these files:\n * **C835.HCC1143.2.converted.pe_1.fastq**\n * **C835.HCC1143.2.converted.pe_2.fastq**\n\nThe files will be batched by sample, meaning that files with the same Sample ID metadata field will be processed together in a separate task. In our case, the paired-end files we picked already had the Sample ID field set to the same value.\n\nAfter adding the two FASTQ files, we can start this execution by clicking **Run**.\n\nWhen you start the task, a new page opens displaying the task's properties. To see all the tasks that have run or are running in this project, click **Back to tasks** in the upper left corner.\n\nHere you can see the name of each task, the project member who started it, its initiation time, the execution workflow, its status, and available task actions.\n\nThe status will be a progress bar if the task is still running or a label notifying whether the task has completed, been aborted or failed. 
Additional information, including how to check the status of the task or how to troubleshoot in case of the failed task, is available in the documentation on [task statistics](http://docs.sevenbridges.com/v1.0/docs/view-task-stats).\n\n<div align=\"right\"><a href=\"#top\">top</a></div>\n\n##View the results of the data analysis\n\nOnce the task is completed, you'll be notified via email. The easiest way to access results is to go to the **Tasks** tab. This shows all the information related to this particular execution.\n\nOn the **Tasks** page, the column marked **Outputs** shows the results produced by the tools in the executed workflow. In our example task, take a look at **summary_metrics** report. Clicking on the file name opens the alignment metrics from the task.\n\nAt the bottom of the screen you can see the task's raw output.\n\nThe result of the data analysis is shown in the **raw VCF file**. The raw VCF contains all the variants detected by the workflow. To download it, just click on its filename. This will open a new page displaying the contents of the file and some information describing it. Then click **Download** in the upper right corner.\n[block:image]\n{\n  \"images\": [\n    {\n      \"image\": [\n        \"https://files.readme.io/7b2d261-QS_outputs.png\",\n        \"QS_outputs.png\",\n        1398,\n        675,\n        \"#d6dada\"\n      ],\n      \"border\": true\n    }\n  ]\n}\n[/block]\n\n[block:callout]\n{\n  \"type\": \"success\",\n  \"body\": \"Note that the names of files outputted from a tool incorporate part of the tool's name. This makes it easier to find report files from a list of outputs.\"\n}\n[/block]\nThat’s it! We've executed a data analysis and obtained some results. We encourage you to try this procedure for yourself before getting started on your own data analyses. 
You can also visit the  [Seven Bridges Knowledge Center](http://docs.sevenbridges.com/v1.0) to learn more about the Platform capabilities and bringing your own tools, as well as the rest of the [Cavatica Knowledge Center](http://docs.sevenbridges.com/v1.0/docs/new-for-cavatica) to find out about Cavatica-specific features.\n\n<div align=\"right\"><a href=\"#top\">top</a></div>\n\n<hr><a name=\"tldr\"></a>\n\n###In a nutshell\n* **Create a project** to hold your analyses on Cavatica.\n* **Add files** to your project and supply their metadata to prepare them for analyses. Don't forget to add reference files!\n* **Add and edit a public workflow** (Whole Exome Analysis - BWA + GATK 2.3.9-Lite) to run your whole exome sequencing analysis.\n* **Set up your task** on the **DRAFT** task page by selecting inputs and reference files.\n* The **Outputs** page displays the results of your task.\n\n<hr>\n\n**Suggested pages:**\n\n[Troubleshoot a failed task](http://docs.sevenbridges.com/v1.0/docs/troubleshoot-a-failed-task) \n[API Quickstart](http://docs.sevenbridges.com/v1.0/docs/api-quickstart) \n[Seven Bridges Platform tutorials](http://docs.sevenbridges.com/v1.0/docs/seven-bridges-platform-tutorials)","excerpt":"<a name=\"top\"></a>To introduce you to the major features of the Cavatica, this QuickStart will walk you through the process of a simple whole exome sequencing analysis.\n\n<a href=\"#tldr\">...in a nutshell</a>","slug":"quickstart","type":"basic","title":"Cavatica quickstart"}

Cavatica quickstart

<a name="top"></a>To introduce you to the major features of Cavatica, this QuickStart walks you through a simple whole exome sequencing analysis.

<a href="#tldr">...in a nutshell</a>

## Prerequisites
All the resources used in the QuickStart, including the files and workflow, are available to you when you <a href="https://cavatica.sbgenomics.com/" target="_blank">sign up for a free account</a>. There is no need to take out a subscription; just use some of your free $150 credits.

[block:callout]
{
  "type": "info",
  "body": "We encourage you to follow these steps and try the analysis for yourself. This is the easiest way to become familiar with Cavatica!",
  "title": "Try it out yourself!"
}
[/block]

## Procedure
We'll start by creating a project and populating it with FASTQ files. Then, we'll use one of the whole exome analysis workflows to carry out the analysis. Finally, we'll examine the results.

[block:callout]
{
  "type": "warning",
  "title": "On this page:",
  "body": "* [Create a project](#section-create-a-project)\n* [Add FASTQ files to your project](#section-add-fastq-files-to-your-project)\n* [Enter file metadata](#section-enter-file-metadata)\n* [Select a public workflow](#section-select-a-public-workflow)\n* [Edit the selected workflow](#section-edit-the-selected-workflow)\n* [Run the analysis](#section-run-the-analysis)\n* [View the results](#section-view-the-results-of-the-data-analysis)"
}
[/block]

## Create a project
The first step in running an analysis on Cavatica is to create a project. To do this, click **Create a project** under the **Projects** tab in the top navigation bar.

This opens a new window where you can name your project and select a [billing group](http://docs.sevenbridges.com/v1.0/docs/payments). Let's name our project **quickstart** and use the free **Pilot Funds** as our billing group. When you're finished, click **Create**.

[block:image]
{
  "images": [
    {
      "image": [
        "https://files.readme.io/9dbed46-QS_create_project.png",
        "QS_create project.png",
        454,
        349,
        "#605263"
      ],
      "border": true
    }
  ]
}
[/block]

[block:callout]
{
  "type": "success",
  "title": "Project URL",
  "body": "Your project is given a URL based on its name. While you can rename your project at any time, the URL cannot be changed after the project has been created."
}
[/block]

Once you create a project, you'll be taken to its [Project Dashboard](http://docs.sevenbridges.com/v1.0/docs/project-dashboard). This page contains all the information about your project, including its files, apps (tools and workflows), tasks (workflow executions), and project members.

[block:image]
{
  "images": [
    {
      "image": [
        "https://files.readme.io/5aa5fe8-QS_project_dashboard.png",
        "QS_project dashboard.png",
        1440,
        742,
        "#34455e"
      ]
    }
  ]
}
[/block]

[block:callout]
{
  "type": "info",
  "body": "Learn more about adding project members and specifying their level of access in the documentation on [managing project members](http://docs.sevenbridges.com/v1.0/docs/collaboration).",
  "title": "Manage project members"
}
[/block]
<div align="right"><a href="#top">top</a></div>

## Add FASTQ files to your project

The next step is to add the FASTQ files to your project. The reference files needed for the analysis will be suggested when you set up the workflow.

To find the FASTQ files for the analysis, click the **Files** tab on your project dashboard and then **+Add files**.

[block:image]
{
  "images": [
    {
      "image": [
        "https://files.readme.io/1fdf816-QS_add_files.png",
        "QS_add files.png",
        559,
        402,
        "#313f56"
      ],
      "border": true
    }
  ]
}
[/block]

Clicking **+Add files** opens the file browser. Here you can browse the Public Reference Files repository (**Public Files**) and any files that you've already added to other projects.
[block:image]
{
  "images": [
    {
      "image": [
        "https://files.readme.io/d661298-QS_add_files2.png",
        "QS_add files2.png",
        1422,
        754,
        "#f5f5f4"
      ],
      "border": true
    }
  ]
}
[/block]

[block:callout]
{
  "type": "info",
  "body": "There are several ways to add files to the Platform; for example, you can upload them from your [computer](http://docs.sevenbridges.com/v1.0/docs/copy-files-using-the-visual-interface) or import them via [FTP/HTTP](http://docs.sevenbridges.com/v1.0/docs/copy-files-using-the-api)."
}
[/block]

For this analysis, we'll use two paired-end files that contain whole exome sequencing data. Select **Public Files** on the top bar and use the search box to locate them quickly.

The files are named **C835.HCC1143.2.converted.pe_1.fastq** and **C835.HCC1143.2.converted.pe_2.fastq**, so enter "C835.HCC1143.2.converted.pe" into the search box to find both of them.

If you don't know the names of the files you need, you can browse all files instead. Learn more about [searching for files](http://docs.sevenbridges.com/v1.0/docs/search-files-on-the-platform) on the Platform.

Select both files using the checkboxes next to the filenames, as shown below. To copy the files, click **Copy to Project** and confirm. To return to the Project Dashboard, close the **File** window.

[block:image]
{
  "images": [
    {
      "image": [
        "https://files.readme.io/ce981c3-QS_add_files3.png",
        "QS_add files3.png",
        1421,
        379,
        "#edf1f0"
      ],
      "border": true
    }
  ]
}
[/block]

[block:callout]
{
  "type": "success",
  "body": "Copy multiple files at once by selecting all of them before clicking **Copy to Project**."
}
[/block]
<div align="right"><a href="#top">top</a></div>

## Enter file metadata
It is important to annotate your files with [metadata](http://docs.sevenbridges.com/docs/metadata-on-the-seven-bridges-platform) before you perform an analysis on the Platform: tools that process files in parallel use it to group files that have identical values in specified metadata fields.
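The grouping idea can be illustrated with a short Python sketch. It assumes a simple in-memory representation in which each file is a dictionary with a `name` and a `metadata` mapping; the field names (`sample_id`, `library_id`, `platform_unit_id`, `paired_end`) and placeholder values such as `sample_A` are hypothetical, not the Platform's metadata schema or the real metadata of the QuickStart files.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

Key = Tuple[Optional[str], ...]

# Hypothetical in-memory view of the two QuickStart files. Field names
# and values are placeholders, not the Platform's metadata schema.
FILES = [
    {
        "name": "C835.HCC1143.2.converted.pe_1.fastq",
        "metadata": {"sample_id": "sample_A", "library_id": "lib_1",
                     "platform_unit_id": "1", "paired_end": "1"},
    },
    {
        "name": "C835.HCC1143.2.converted.pe_2.fastq",
        "metadata": {"sample_id": "sample_A", "library_id": "lib_1",
                     "platform_unit_id": "1", "paired_end": "2"},
    },
]


def group_by_metadata(files: Iterable[dict], fields: Tuple[str, ...]) -> Dict[Key, List[str]]:
    """Group file names by their values in the given metadata fields.

    A file that lacks one of the fields gets None in that position of its key.
    Files keep their input order within each group.
    """
    groups: Dict[Key, List[str]] = defaultdict(list)
    for f in files:
        key = tuple(f["metadata"].get(field) for field in fields)
        groups[key].append(f["name"])
    return dict(groups)


if __name__ == "__main__":
    # Same sample, library and lane: a single group holding both reads.
    print(group_by_metadata(FILES, ("sample_id", "library_id", "platform_unit_id")))
    # Grouping by paired-end value instead separates the two files.
    print(group_by_metadata(FILES, ("paired_end",)))
```

Grouping by `sample_id` alone is the same idea as the per-sample batching described in the "Run the analysis" section.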
File metadata includes **File** information (e.g. experimental strategy and library ID), **Sample** information (e.g. sample ID), and **General** information (e.g. investigation and species). For more information on the metadata fields used on the Platform, see the documentation on [file metadata](http://docs.sevenbridges.com/docs/metadata-on-the-seven-bridges-platform).

Click the **Files** tab on your project dashboard to see all the files in the project. Currently, our **quickstart** project contains only the two files we've just added.

To edit a file's metadata, select the file and click **Edit Metadata**. You can apply the same metadata to several files at once, or edit files individually if their metadata differs. Here, we'll edit the metadata for both FASTQ files at the same time.

Select both files and click **Edit Metadata**. This opens a pop-up window with inputs for the different metadata fields. Notice that the **Platform unit ID** field is empty; it must be set before the task can run. Enter 1 in this field and click **Save**.

This metadata tells tools that these files come from the same sample, were produced from the same library, and were sequenced on the same lane.

[block:image]
{
  "images": [
    {
      "image": [
        "https://files.readme.io/d3bb130-QS_edit_metadata.png",
        "QS_edit metadata.png",
        1433,
        766,
        "#483c59"
      ]
    }
  ]
}
[/block]

[block:callout]
{
  "type": "info",
  "body": "Each file used in an analysis on Cavatica must have its own metadata values. For more information, see the [metadata documentation](http://docs.sevenbridges.com/v1.0/docs/metadata-on-the-seven-bridges-platform) on grouping and distinguishing files by metadata.\n\nIn this example, note that while we have set the same **Library ID**, **Platform unit ID**, and **Platform** values for the two FASTQ files, the two files have different **Paired-end** values ('1' and '2') by default."
}
[/block]
<div align="right"><a href="#top">top</a></div>

## Select a public workflow

The next step is to select a public workflow for the analysis. We'll use **Whole Exome Analysis - BWA + GATK 2.3.9-lite**, which is based on the free version of GATK, the toolkit developed by the Broad Institute.

This workflow is one of many open-source workflows available to all Cavatica users. These workflows have been tested by the Seven Bridges bioinformatics team to run efficiently in the cloud.

To select a public workflow for use in your project, navigate to the **Apps** tab on your project dashboard and click **+Add App**.

[block:image]
{
  "images": [
    {
      "image": [
        "https://files.readme.io/bddf1be-QS_add_apps.png",
        "QS_add apps.png",
        621,
        410,
        "#ebecf2"
      ],
      "border": true
    }
  ]
}
[/block]
To add the workflow:

1. Type 'whole exome' into the search box. The **Whole Exome Analysis - BWA + GATK 2.3.9-lite** workflow is displayed in the search results.

2. Click **Copy** below the workflow.
[block:image]
{
  "images": [
    {
      "image": [
        "https://files.readme.io/bda436a-QS_add_apps2.png",
        "QS_add apps2.png",
        1425,
        612,
        "#5b9be7"
      ]
    }
  ]
}
[/block]
3. (Optional) Set the name of the workflow in your project.
4. Click **Copy** to add the workflow to your project.
[block:image]
{
  "images": [
    {
      "image": [
        "https://files.readme.io/248377e-QS_add_apps3.png",
        "QS_add apps3.png",
        1424,
        612,
        "#f4f5f5"
      ]
    }
  ]
}
[/block]
To go back to the project dashboard, close the app browser window.

<div align="right"><a href="#top">top</a></div>

## Edit the selected workflow
You might want to tweak a workflow to better suit your dataset, which you can do in the workflow editor. To edit the workflow in your project, navigate to the **Apps** tab and click the pencil icon next to **Whole Exome Analysis - BWA + GATK 2.3.9-lite**.
[block:image]
{
  "images": [
    {
      "image": [
        "https://files.readme.io/3347697-QS_edit_workflow.png",
        "QS_edit workflow.png",
        1430,
        339,
        "#313e55"
      ],
      "border": true
    }
  ]
}
[/block]
This opens the workflow editor, which shows a graphical representation of the workflow in which each tool, input, and reference file is a node. To see a description of the workflow's function and other details such as the toolkit name and version, the tool author, and the license, click **Additional Information**.

To the right of the workflow diagram, the **APPS** panel lists all the apps available in your projects (**MyApps**) and among the public apps (**PublicApps**).

The **PARAMS** panel describes the parameters of the tools used in this workflow and allows you to make quick edits.

[block:image]
{
  "images": [
    {
      "image": [
        "https://files.readme.io/c8e7d45-QS_editor_params.png",
        "QS_editor params.png",
        1433,
        702,
        "#d4c2cf"
      ],
      "border": true
    }
  ]
}
[/block]
In the workflow editor, click the **BWA-MEM Bundle** node (see the screenshot below). This opens the **PARAMS** tab, which displays the parameters of **BWA-MEM Bundle** sorted into **Input/Output options**, **Scoring options**, **Execution**, etc. Find the parameter **use_soft_clipping** and select it.

This tells BWA-MEM to use soft clipping for supplementary alignments. To save this change as a new revision of the workflow, click **Save**. Saving changes the revision number from 0 to 1, which lets you keep track of all your workflow edits.

[block:image]
{
  "images": [
    {
      "image": [
        "https://files.readme.io/04ce551-QS_editor_params2.png",
        "QS_editor params2.png",
        1435,
        692,
        "#314658"
      ],
      "border": true
    }
  ]
}
[/block]
<div align="right"><a href="#top">top</a></div>

## Run the analysis

Now that the workflow is ready, it's time to run the analysis. Click **Run** in the upper right corner. A pop-up window displays the suggested files for this workflow.
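When the task runs, the metadata entered earlier and the **use_soft_clipping** option selected in the workflow editor are presumably passed on to the aligner. The Python sketch below shows what that might look like for a plain `bwa mem` call. It is an illustration under stated assumptions, not the BWA-MEM Bundle's actual implementation: it assumes that **use_soft_clipping** corresponds to BWA-MEM's `-Y` option and that the metadata becomes a SAM read-group (`@RG`) line, with Sample ID as `SM`, Library ID as `LB`, Platform unit ID as `PU`, and Platform as `PL`. The helper names, the read-group `ID` format, and the thread count are hypothetical.

```python
import shlex
from typing import List


def read_group_line(sample_id: str, library_id: str, platform_unit_id: str,
                    platform: str) -> str:
    """Build an @RG header line suitable for passing to `bwa mem -R`.

    Fields are joined with the two-character sequence backslash + 't',
    which bwa converts to real tabs. The ID format is a hypothetical choice.
    """
    fields = [
        "@RG",
        f"ID:{sample_id}.{platform_unit_id}",  # hypothetical read-group ID
        f"SM:{sample_id}",          # Sample ID
        f"LB:{library_id}",         # Library ID
        f"PU:{platform_unit_id}",   # Platform unit ID (e.g. the lane)
        f"PL:{platform}",           # Platform, e.g. ILLUMINA
    ]
    return "\\t".join(fields)


def build_bwa_mem_command(reference: str, fastq_1: str, fastq_2: str, rg: str,
                          use_soft_clipping: bool = False,
                          threads: int = 8) -> List[str]:
    """Assemble a bwa mem argument list (hypothetical wrapper, not the Bundle's)."""
    cmd = ["bwa", "mem", "-t", str(threads), "-R", rg]
    if use_soft_clipping:
        cmd.append("-Y")  # soft-clip supplementary alignments
    # bwa expects the reference to have been indexed with `bwa index`.
    cmd += [reference, fastq_1, fastq_2]
    return cmd


if __name__ == "__main__":
    rg = read_group_line("sample_A", "lib_1", "1", "ILLUMINA")
    cmd = build_bwa_mem_command(
        "human_g1k_v37_decoy.fasta",
        "C835.HCC1143.2.converted.pe_1.fastq",
        "C835.HCC1143.2.converted.pe_2.fastq",
        rg,
        use_soft_clipping=True,
    )
    print(shlex.join(cmd))  # shell-quoted command line (Python 3.8+)
```

Returning an argument list rather than a single string lets the command be passed directly to `subprocess.run` without shell-quoting issues; `shlex.join` is used only to display it.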
[block:callout]
{ "type": "info", "body": "For all public workflows on Cavatica, our team of bioinformaticians has chosen a set of recommended input files." }
[/block]
[block:image]
{ "images": [ { "image": [ "https://files.readme.io/92d4082-QS_suggested_files.png", "QS_suggested files.png", 648, 322, "#e3e4e5" ], "border": true } ] }
[/block]
Click **Copy** to copy the suggested files to your project and add them as inputs to the workflow. The files are mapped to the workflow's input ports as follows:
[block:parameters]
{ "data": { "h-0": "Input port", "h-1": "Input files", "0-0": "Known_SNPs\n\nKnown_Indels", "0-1": "dbsnp_137.b37.vcf\n\nMills_and_1000G_gold_standard.indels.b37.sites.vcf\n1000G_phase1.indels.b37.vcf", "1-0": "Target_BED", "1-1": "exome_targets.b37.bed", "2-0": "SnpEff_Database", "2-1": "snpEff_v3_6_GRCh37.75.zip", "4-0": "Reference or TAR with BWA reference indices", "4-1": "human_g1k_v37_decoy.fasta", "3-0": "FASTQ", "3-1": "C835.HCC1143.2.converted.pe_1.fastq\nC835.HCC1143.2.converted.pe_2.fastq", "h-2": "File type", "0-2": "**VCF** files contain databases of known genetic variants (SNPs and indels).", "1-2": "**BED** files list the target regions relevant to our analysis, in this case the exome. Their coordinates refer to the reference genome (FASTA file) used for the analysis.", "2-2": "**ZIP file (snpEff)** is a specific build of the snpEff database, which contains annotations of genetic variants and their predicted effects.", "3-2": "**FASTQ** files contain the experimental data for our analysis, i.e. the output of high-throughput sequencing instruments. For the purpose of the QuickStart guide, we will use a pair of FASTQ files that represent one whole exome sample from the TCGA dataset.", "4-2": "The **FASTA** file is the reference genome to which the FASTQ reads will be aligned."
}, "cols": 3, "rows": 5 }
[/block]
On the **DRAFT Task page**, the **Task Inputs** tab contains two sections, **Inputs** and **App Settings**, as shown in the screenshot below. The **App Settings** section shows various tool parameters that we've exposed by unlocking them in the workflow editor. We'll ignore these for now (for details, see the documentation on [tool settings](http://docs.sevenbridges.com/v1.0/docs/the-tool-editor)). The **Inputs** section is where you enter the input files and reference files for your workflow.
[block:image]
{ "images": [ { "image": [ "https://files.readme.io/fe709d0-cavatica-quickstart-draft-task.png", "cavatica-quickstart-draft-task.png", 843, 555, "#425c71" ], "border": true } ] }
[/block]
The only remaining files you need to select are the FASTQ files. Click **Select file(s)** and choose these files:

* **C835.HCC1143.2.converted.pe_1.fastq**
* **C835.HCC1143.2.converted.pe_2.fastq**

The files will be batched by sample, meaning that files with the same value in the Sample ID metadata field are processed together in a separate task. In our case, the two paired-end files already have the same Sample ID, so they will be processed together in a single task. After adding the two FASTQ files, start the task by clicking **Run**.

When you start the task, a new page opens displaying the task's properties. To see all the tasks that have run or are running in this project, click **Back to tasks** in the upper left corner. Here you can see the name of each task, the project member who started it, its start time, the workflow it ran, its status, and the available task actions. The status is shown as a progress bar if the task is still running, or as a label indicating whether the task has completed, been aborted, or failed. For more information, including how to check a task's status or troubleshoot a failed task, see the documentation on [task statistics](http://docs.sevenbridges.com/v1.0/docs/view-task-stats).
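The batching rule can be sketched in a few lines of Python: the reference files are shared by every task, while the FASTQ files are grouped by their Sample ID so that each group becomes one task. This is a minimal illustration, not Cavatica's implementation. The `InputFile` class, the `sample_id` metadata key, the short `Reference` port ID, and the `HCC1143` sample value used in the usage note are assumptions made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class InputFile:
    """Hypothetical stand-in for a project file and its metadata."""
    name: str
    metadata: Dict[str, str] = field(default_factory=dict)


# Reference inputs from the suggested files, shared by every task.
# Port IDs mirror the table above; "Reference" is a hypothetical short ID
# for the "Reference or TAR with BWA reference indices" port.
REFERENCE_INPUTS: Dict[str, List[str]] = {
    "Known_SNPs": ["dbsnp_137.b37.vcf"],
    "Known_Indels": [
        "Mills_and_1000G_gold_standard.indels.b37.sites.vcf",
        "1000G_phase1.indels.b37.vcf",
    ],
    "Target_BED": ["exome_targets.b37.bed"],
    "SnpEff_Database": ["snpEff_v3_6_GRCh37.75.zip"],
    "Reference": ["human_g1k_v37_decoy.fasta"],
}


def batch_by_sample(fastqs: List[InputFile], key: str = "sample_id") -> Dict[str, List[InputFile]]:
    """Group FASTQ files by the value of their Sample ID metadata field."""
    batches: Dict[str, List[InputFile]] = defaultdict(list)
    for f in fastqs:
        sample_id = f.metadata.get(key)
        if not sample_id:
            # This sketch refuses files without a Sample ID instead of guessing one.
            raise ValueError(f"{f.name} has no '{key}' metadata")
        batches[sample_id].append(f)
    return dict(batches)


def plan_tasks(fastqs: List[InputFile], references: Dict[str, List[str]]) -> Dict[str, Dict[str, List[str]]]:
    """Return one input mapping per sample: shared references plus that sample's FASTQs."""
    batches = batch_by_sample(fastqs)
    tasks: Dict[str, Dict[str, List[str]]] = {}
    for sample_id in sorted(batches):
        # Copy each reference list so tasks never share mutable state.
        inputs = {port: list(names) for port, names in references.items()}
        inputs["FASTQ"] = sorted(f.name for f in batches[sample_id])
        tasks[sample_id] = inputs
    return tasks
```

With both QuickStart FASTQ files tagged with the same sample ID (for example `{"sample_id": "HCC1143"}`), `plan_tasks` returns a single entry, matching the single task started above; adding a FASTQ file with a different Sample ID would produce a second, separate task.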
<div align="right"><a href="#top">top</a></div>

##View the results of the data analysis
Once the task is completed, you'll be notified via email. The easiest way to access the results is to go to the **Tasks** tab and click the name of the task. The task page shows all the information related to this particular execution.

On the task page, the column marked **Outputs** shows the results produced by the tools in the executed workflow. In our example task, take a look at the **summary_metrics** report. Clicking its file name opens the alignment metrics from the task.

At the bottom of the screen you can see the task's raw output. The result of the data analysis is the **raw VCF file**, which contains all the variants detected by the workflow. To download it, click its file name. This opens a new page displaying the contents of the file and some information describing it. Then click **Download** in the upper right corner.
[block:image]
{ "images": [ { "image": [ "https://files.readme.io/7b2d261-QS_outputs.png", "QS_outputs.png", 1398, 675, "#d6dada" ], "border": true } ] }
[/block]
[block:callout]
{ "type": "success", "body": "Note that the names of files output by a tool incorporate part of the tool's name. This makes it easier to find report files in a list of outputs." }
[/block]
That’s it! We've executed a data analysis and obtained some results. We encourage you to try this procedure for yourself before getting started on your own data analyses. You can also visit the [Seven Bridges Knowledge Center](http://docs.sevenbridges.com/v1.0) to learn more about the Platform's capabilities and about bringing your own tools, as well as the rest of the [Cavatica Knowledge Center](http://docs.sevenbridges.com/v1.0/docs/new-for-cavatica) to find out about Cavatica-specific features.

<div align="right"><a href="#top">top</a></div>
<hr><a name="tldr"></a>

###In a nutshell
* **Create a project** to hold your analyses on Cavatica.
* **Add files** to your project and supply their metadata to prepare them for analyses. Don't forget to add reference files!
* **Add and edit a public workflow** (Whole Exome Analysis - BWA + GATK 2.3.9-Lite) to run your whole exome sequencing analysis.
* **Set up your task** on the **DRAFT** task page by selecting inputs and reference files.
* The **Outputs** column on the task page displays the results of your task.

<hr>

**Suggested pages:**

[Troubleshoot a failed task](http://docs.sevenbridges.com/v1.0/docs/troubleshoot-a-failed-task)

[API Quickstart](http://docs.sevenbridges.com/v1.0/docs/api-quickstart)

[Seven Bridges Platform tutorials](http://docs.sevenbridges.com/v1.0/docs/seven-bridges-platform-tutorials)