{"_id":"57fd04804002550e004c047c","user":"5613e4f8fdd08f2b00437620","githubsync":"","version":{"_id":"5773dcfc255e820e00e1cd50","__v":26,"project":"5773dcfc255e820e00e1cd4d","createdAt":"2016-06-29T14:36:44.812Z","releaseDate":"2016-06-29T14:36:44.812Z","categories":["5773dcfc255e820e00e1cd51","5773df36904b0c0e00ef05ff","577baf92451b1e0e006075ac","577bb183b7ee4a0e007c4e8d","577ce77a1cf3cb0e0048e5ea","577d11865fd4de0e00cc3dab","578e62792c3c790e00937597","578f4fd98335ca0e006d5c84","578f5e5c3d04570e00976ebb","57bc35f7531e000e0075d118","57f801b3760f3a1700219ebb","5804d55d1642890f00803623","581c8d55c0dc651900aa9350","589dcf8ba8c63b3b00c3704f","594cebadd8a2f7001b0b53b2","59a562f46a5d8c00238e309a","5a2aa096e25025003c582b58","5a2e79566c771d003ca0acd4","5a3a5166142db90026f24007","5a3a52b5bcc254001c4bf152","5a3a574a2be213002675c6d2","5a3a66bb2be213002675cb73","5a3a6e4854faf60030b63159","5c8a68278e883901341de571","5cb9971e57bf020024523c7b","5cbf1683e2a36d01d5012ecd"],"is_deprecated":false,"is_hidden":false,"is_beta":false,"is_stable":true,"codename":"","version_clean":"1.0.0","version":"1.0"},"category":{"_id":"57f801b3760f3a1700219ebb","version":"5773dcfc255e820e00e1cd50","__v":0,"project":"5773dcfc255e820e00e1cd4d","sync":{"url":"","isSync":false},"reference":false,"createdAt":"2016-10-07T20:12:35.170Z","from_sync":false,"order":7,"slug":"browse-datasets","title":"Browse public datasets"},"__v":1,"project":"5773dcfc255e820e00e1cd4d","parentDoc":null,"metadata":{"title":"","description":"","image":[]},"updates":[],"next":{"pages":[],"description":""},"createdAt":"2016-10-11T15:25:52.073Z","link_external":false,"link_url":"","sync_unique":"","hidden":false,"api":{"results":{"codes":[]},"settings":"","auth":"required","params":[],"url":""},"isReference":false,"order":2,"body":"[block:callout]\n{\n  \"type\": \"warning\",\n  \"title\": \"On this page:\",\n  \"body\": \"* [Overview](#overview)\\n* [User responsibilities](#user-responsibilities)\\n* [Authenticate and access 
TCGA](#section-authenticate)\\n * [Step 1: Register for a CGC account](#register)\\n   * [Option 1: register using eRA Commons or NIH cit credentials](#option-1)\\n   * [Option 2: register for the CGC if you do not have eRA Commons credentials](#option-2)\\n * [Step 2: Connect your CGC account with your Cavatica account](#connect)\\n* [What type of TCGA data will I be able to access?](#what-can-i-access)\\n * [Open Data access](#open-data)\\n * [Controlled Data access](#controlled-data)\\n* [How can I get started?](#start)\"\n}\n[/block]\n<a name=\"overview\"></a>\n##Overview\n\nThe Cancer Genome Atlas (TCGA) is made available on Cavatica through an integration with the [Seven Bridges Cancer Genomics Cloud (CGC)](http://www.cancergenomicscloud.org/). TCGA on Cavatica includes both Open and Controlled Data. While all data in TCGA is stripped of direct identifiers, DNA information is inherently unique to an individual. Two data access tiers have been put in place to balance making the data as widely available as possible with ensuring that the rights of study participants are well protected. These two access tiers are described below.\n\nOpen Data includes information that is not unique to an individual. This includes information such as:\n  * De-identified clinical and demographic data\n  * Gene expression data\n  * Copy number alterations in regions of the genome\n  * Epigenetic data\n  * Summaries of data across individuals\n\nControlled Data includes information that is unique to an individual. 
This includes most raw data files and some processed data such as:\n  * Primary sequencing data (BAM and FASTQ files) from DNA, RNA, miRNA or bisulfite sequencing studies\n  * Raw and processed SNP6 array data\n  * Raw and processed Exon array data\n  * Somatic and germ-line mutation calls for an individual (VCF and MAF files)\n\nLearn about your [user responsibilities](#user-responsibilities) and how to [authenticate and access](#section-authenticate) TCGA data on Cavatica.\n\n<a name=\"user-responsibilities\"></a>\n##User responsibilities\n\nSeven Bridges is an [NIH Trusted Partner](https://gds.nih.gov/02dr2.html), and we've made data security a priority. In addition, **users are required to abide by their dbGaP data access requests and the [NIH Genomic Data User Code of Conduct](https://gds.nih.gov/pdf/Genomic_Data_User_Code_of_Conduct.pdf)**, the elements of which are reproduced below:\n1. Investigator(s) will use requested datasets solely in connection with the research project described in the approved Data Access Request for each dataset;\n2. Investigator(s) will make no attempt to identify or contact individual participants from whom these data were collected without appropriate approvals from the relevant IRBs;\n3. Investigator(s) will not distribute these data to any entity or individual beyond those specified in the approved Data Access Request;\n4. Investigator(s) will adhere to computer security practices that ensure that only authorized individuals can gain access to data files;\n5. Investigator(s) will not submit for publication or any other form of public dissemination analyses or other reports on work using or referencing NIH datasets prior to the embargo release date listed for the dataset (or dataset version) on dbGaP;\n6. Investigator(s) acknowledge the Intellectual Property Policies as specified in the Data Use Certification; and,\n7. 
Investigator(s) will report any inadvertent data release in accordance with the terms in the Data Use Certification, breach of data security, or other data management incidents contrary to the terms of data access. \n\nLearn more about [updating your Data Access Request](http://docs.sevenbridges.com/v1.0/page/access-dbgap-controlled-data-on-the-seven-bridges-platform) to list Seven Bridges as the Platform as a Service (PaaS) and include cloud use. For TCGA-specific documents, please refer to the [TCGA publication guidelines](http://cancergenome.nih.gov/publications/publicationguidelines) for point 5 above and the [TCGA Data Use Certifications](http://cancergenome.nih.gov/pdfs/Data_Use_Certv082014) for points 6 and 7 above.\n\n<a name=\"section-authenticate\"></a>\n##Authenticate and access TCGA\n\nBecause TCGA on Cavatica is available through an integration with the [Seven Bridges Cancer Genomics Cloud (CGC)](https://cgc-accounts.sbgenomics.com/auth/login), the CGC is the source for authenticating you with dbGaP and authorizing access to TCGA data. To access TCGA on Cavatica, you will first be directed to create an account on the Seven Bridges CGC. After registering for a CGC account, you can connect your CGC account to your Cavatica account to associate your CGC credentials.\n\n<a name=\"register\"></a>\n###Step 1: Register for a CGC account\n\nYou can sign up for the CGC using (1) your eRA Commons or NIH cit credentials or (2) your email address.\n\nNote that to access TCGA Controlled Data on the CGC, you need to register with eRA Commons or NIH cit credentials that have the appropriate data access permissions through dbGaP. 
If you don't log in with eRA Commons or NIH cit credentials, you will only be able to access TCGA Open Data.\n\nPlease read the following instructions carefully before registering for the CGC.\n  * [Option 1](#option-1): If you have an eRA Commons or NIH cit account, register using these credentials.\n  * [Option 2](#option-2): If you don't have an eRA Commons account, register for a CGC account with your email address.\n\n<a name=\"option-1\"></a>\n**Option 1: register using eRA Commons or NIH cit credentials**\n\nTo register for the CGC using your eRA Commons or NIH cit credentials:\n\n1. Navigate to the login page at https://cgc.sbgenomics.com/login/.\n2. On the left panel of the login page, click **LOGIN VIA ERA COMMONS** to access the external NIH iTrust site for authentication.\n[block:image]\n{\n  \"images\": [\n    {\n      \"image\": [\n        \"https://files.readme.io/bfa4eaa-Screen_Shot_2016-09-19_at_4.20.31_PM.png\",\n        \"Screen Shot 2016-09-19 at 4.20.31 PM.png\",\n        1666,\n        914,\n        \"#f1e8cb\"\n      ]\n    }\n  ]\n}\n[/block]\n3. To complete authentication, enter your eRA Commons or NIH cit username and password.\n4. To complete your registration, enter the additional information required by the CGC and click **PROCEED TO THE CGC PLATFORM**.\n[block:callout]\n{\n  \"type\": \"success\",\n  \"body\": \"We encourage you to read the [CGC Terms of Use](http://www.cancergenomicscloud.org/terms) and [TCGA Data Use policy](http://cancergenome.nih.gov/pdfs/Data_Use_Certv082014) carefully before using the CGC.\"\n}\n[/block]\n<a name=\"option-2\"></a>\n**Option 2: register for the CGC if you do not have eRA Commons credentials**\n\nIf you do not have eRA Commons credentials, create a CGC account using your email and a password of your choice. Note that accounts created this way do not have access to TCGA Controlled Data. 
Register using your eRA Commons or NIH cit credentials if you have approval to use TCGA Controlled Data.\n\nTo register with your email:\n\n1. Navigate to the login page at https://cgc.sbgenomics.com/login/.\n2. Click **Create a free account** below the LOGIN button on the right panel.\n3. Select **Register with good old email/password combo** and provide the information requested.\n4. Check your email to confirm your registration.\n\n<a name=\"connect\"></a>\n###Step 2: Connect your CGC account with your Cavatica account\n\nOnce you've created a CGC account, you can connect your CGC account to your Cavatica account. Your CGC credentials will be associated with your Cavatica account, and you will be able to access TCGA data right away.\n\nTo connect your CGC account, first you must obtain your CGC authentication token:\n\n1. On the CGC, click your username in the upper right corner and choose **Developer** from the menu.\n    The Developer Hub is displayed. \n2. Click the **Auth token** tab.\n3. Click **Generate Token** to create your authentication token.\n4. Copy your authentication token to the clipboard. We'll use it in a later step.\n\nNow that you have your CGC authentication token, you can connect your account as follows:\n\n1. On the Seven Bridges Platform, click your username in the upper right corner and choose **Account Settings** from the menu.\n2. Select the **Dataset access** tab from the menu on the left.\n3. Paste your CGC authentication token into the form and click **Connect accounts**. \n\nYour CGC account, along with your TCGA data access credentials, is now linked to your Platform account, as shown below. 
On this screen, you can also see the datasets available to you.\n[block:image]\n{\n  \"images\": [\n    {\n      \"image\": [\n        \"https://files.readme.io/9029ee5-Screen_Shot_2016-09-23_at_10.10.56_AM.jpeg\",\n        \"Screen Shot 2016-09-23 at 10.10.56 AM.jpeg\",\n        1166,\n        752,\n        \"#e1e2e4\"\n      ]\n    }\n  ]\n}\n[/block]\nNote that your CGC authentication token expires every few months. When it does, you need to reconnect your CGC account to your Cavatica account by following steps 1 through 3 above.\n\n<a name=\"what-can-i-access\"></a>\n##What type of TCGA data will I be able to access?\n\nOnce you register for a CGC account, you'll have access to TCGA data based on your data access approval. TCGA data on Cavatica consists of Open Data and Controlled Data.\n\n<a name=\"open-data\"></a>\n###Open Data access\n\nAll Cavatica users can access Open Data as soon as they create and connect their CGC account and agree to the [TCGA Data Use Certifications](http://cancergenome.nih.gov/pdfs/Data_Use_Certv082014) as well as the [TCGA publication guidelines](http://cancergenome.nih.gov/publications/publicationguidelines).\n\n<a name=\"controlled-data\"></a>\n###Controlled Data access\n\nResearchers who need access to Controlled Data for their studies must obtain an approved Data Access Request through dbGaP and agree to the [TCGA Data Use Certifications](http://cancergenome.nih.gov/pdfs/Data_Use_Certv082014) as well as the [TCGA publication guidelines](http://cancergenome.nih.gov/publications/publicationguidelines).\n\nIf you are either a PI or a downloader in an approved dbGaP application, be sure to [list Seven Bridges as the Platform as a Service (PaaS) in your dbGaP application](http://docs.sevenbridges.com/v1.0/page/access-dbgap-controlled-data-on-the-seven-bridges-platform).\n\nLearn more from our documentation on the CGC Knowledge Center about [TCGA Data](http://docs.cancergenomicscloud.org/docs/tcga-data) and [obtaining 
permissions to access TCGA data](http://docs.cancergenomicscloud.org/docs/tcga-data-access#section-can-i-access-the-data-i-need-on-the-cgc-right-away-).\n\n<a name=\"start\"></a>\n##How can I get started?\n\nTo start querying TCGA data right away, try using the [Data Browser](doc:the-data-browser). This interactive graphical interface allows you to build queries that filter data by various metadata attributes. You can then access the matching files for further analysis.\n\nTo access the Data Browser, click **Data** in the top navigation bar and select **Data Browser**.","excerpt":"","slug":"tcga-data-access","type":"basic","title":"TCGA data access"}