
Metadata on Cavatica


Metadata associated with files makes them searchable, keeping your file collection manageable as it grows. It also enables them to be properly grouped for analyses.

##Overview
One of the most common reasons for failed tasks is missing or improper metadata. Lack of proper metadata increases the chances that tools and workflows will fail to run correctly. We recommend that you set the metadata fields for your input files before executing analyses on Cavatica.

There are three ways to enter or change file metadata:

  * [Using the visual interface](doc:set-metadata-using-the-visual-interface)
  * [Using the command line uploader](doc:set-metadata-using-the-command-line-uploader)
  * [Using the API](doc:set-metadata-using-the-api)

We have 14 metadata fields associated with each file on Cavatica. These are subdivided into three categories ([**File**](#section-file), [**Sample**](#section-sample), and [**General**](#section-general)). The recommended practice is to enter as much metadata as possible when you first upload files to Cavatica.

For instance, for raw sequencing files, you should enter **Platform** (the sequencing platform) and **Sample ID**. Of the 14 fields, there are seven that we highly suggest you set for your data.

While your tasks may run correctly without them, these metadata fields will help optimize your analyses. These fields are labeled in the tables below with a red <span style="color:red"><b>suggested</b></span> tag in the **Name** column.

See the tables below for more details about the metadata fields.

##Metadata categories

###File
In the following table, you will find the name, description, and values of metadata fields for **File**. The second column, **API key**, allows you to access the specified metadata field through the API. Learn more about [accessing metadata via the API](files).

There are six **File** metadata fields that we highly suggest you set for your data. While your tasks may run correctly without them, these metadata fields will help optimize your analyses. These fields are labeled in the table below with a red <span style="color:red"><b>suggested</b></span> tag in the **Name** column.
[block:parameters]
{
  "data": {
    "h-0": "Name",
    "h-1": "API key",
    "h-2": "Description",
    "h-3": "Values",
    "0-0": "**Library ID**\n<span style=\"color:red\"><b>suggested</b></span>",
    "1-0": "**Platform**\n<span style=\"color:red\"><b>suggested</b></span>",
    "2-0": "**Platform unit ID**\n<span style=\"color:red\"><b>suggested</b></span>",
    "3-0": "**Paired end**\n<span style=\"color:red\"><b>suggested</b></span>",
    "4-0": "**File segment number**\n<span style=\"color:red\"><b>suggested</b></span>",
    "5-0": "<a name=\"qualscale\"></a>**Quality scale**\n<span style=\"color:red\"><b>suggested</b></span>",
    "6-0": "**Experimental strategy**",
    "7-0": "**Reference genome**",
    "0-1": "**library_id**",
    "1-1": "**platform**",
    "2-1": "**platform_unit_id**",
    "3-1": "**paired_end**",
    "4-1": "**file_segment_number**",
    "5-1": "**quality_scale**",
    "6-1": "**experimental_strategy**",
    "7-1": "**reference_genome**",
    "0-2": "This is an identifier for the sequencing library preparation.\n\nThe value set in this field does not affect whether or not the workflow runs successfully. However, all files that come from the same sequencing library must have the same value.\n\nThe **Library ID** will be written to the read group tag (@RG:LB) in SAM or BAM files. All aligner apps are programmed to add RG fields to the aligned BAM according to the **Library ID**.",
    "1-2": "This is the version (manufacturer, model, etc.) of the technology that was used for sequencing or assaying.\n\nOnly some tools and workflows require a value for the **Platform** field. However, it is recommended that you set it whenever possible, unless you are certain that your workflow will work without it.",
    "2-2": "This is an identifier for lanes (Illumina) or slides (SOLiD), for cases in which a library was split and run across multiple lanes on the flow cell or across multiple slides. The **Platform unit ID** refers to the lane ID or the slide ID.\n\nThe value supplied in the **Platform unit ID** field will be written to the read group tag (@RG:PU) in SAM or BAM files. All aligner apps add read group fields to the aligned BAM file on the basis of **Platform unit ID** metadata.",
    "3-2": "For paired-end sequencing, this value identifies which end of the fragment was sequenced.\n\nFor paired-end read files, this field indicates whether the read file is left end or right end. Set ‘1’ for left end and ‘2’ for right end reads. This is used to group pairs. Please keep in mind that the data type of these values is **string**. If the FASTQ file contains single-end reads, this field should be left as ‘-’.\n\n<span style=\"color:gray\"><i><b>Note</b></i>: It is important for two members of paired-end reads to have identical <b>Sample ID</b>, <b>Library ID</b>, <b>Platform unit ID</b>, and <b>File segment number</b>.</span>",
    "4-2": "If the sequencing reads for a single library, sample, and lane are divided into multiple (smaller) files, the **File segment number** is used to enumerate these. Otherwise, this field can be left blank.\n\nThis information can be used for batching when processing files with a workflow.",
    "5-2": "For raw reads, this value denotes the sequencing technology and quality format. For BAM and SAM files, this value should always be ‘Sanger’.\n\nEnter this value for all FASTQ files, unless they are used in a workflow with a FASTQ quality scale detector wrapper.",
    "6-2": "This is the method or protocol used to perform the laboratory analysis.",
    "7-2": "The reference assembly (such as HG19 or GRCh37) to which the nucleotide sequence of a case can be aligned.",
    "0-3": "This takes a string.",
    "1-3": "This takes a string.\nSuggested values:\n  * Illumina HiSeq\n  * Illumina GA\n  * ABI capillary sequencer\n  * Illumina MiSeq\n  * ABI SOLiD\n  * Ion Torrent PGM\n  * LS 454\n  * Illumina HiSeq X Ten\n  * Illumina\n  * Helicos\n  * PacBio\n  * Not available",
    "2-3": "This takes a string.",
    "3-3": "This takes a value of '1' or '2'. Please keep in mind that the data type of these values is **string**.\n**Note:** For single-end sequencing, the field should be left as '-'.",
    "4-3": "This takes an integer.",
    "5-3": "Choose from one of the following options:\n  * sanger\n  * illumina13\n  * illumina15\n  * illumina18\n  * solexa\nOr, enter no value.",
    "6-3": "This takes a string.\nSuggested values:\n  * DNA-Seq\n  * WXS\n  * WGS\n  * Amplicon\n  * Bisulfite-Seq\n  * RNA-Seq\n  * miRNA-Seq\n  * Total RNA-Seq\n  * Not available",
    "7-3": "This takes a string.\nSuggested values:\n  * human_g1k_v37\n  * human_g1k_v37_decoy\n  * ucsc.hg19\n  * Homo_sapiens.Ensembl.GRCh37\n  * Homo_sapiens.GRCh38.dna.primary_assembly\n  * ion_torrent.hg19\n  * mouse_mm9_ucsc\n  * ens_mouse_mm9_genome\n  * mouse_mm10_ucsc"
  },
  "cols": 4,
  "rows": 8
}
[/block]
###Sample
In the following table, you will find the name, description, and values of metadata fields for **Sample**. The second column, **API key**, allows you to access the specified metadata field through the API. Learn more about [accessing metadata via the API](files).
[block:parameters]
{
  "data": {
    "h-0": "Name",
    "h-1": "API key",
    "h-2": "Description",
    "h-3": "Value",
    "0-0": "**Sample ID**\n<span style=\"color:red\"><b>suggested</b></span>",
    "1-0": "**Sample type**",
    "0-1": "**sample_id**",
    "1-1": "**sample_type**",
    "0-2": "A human-readable identifier for a sample or specimen, which could contain some metadata information. A sample or specimen is material taken from a biological entity for testing, diagnosis, propagation, treatment, or research purposes, including but not limited to tissues, body fluids, cells, organs, embryos, and body excretory products.\n\nTools use **Sample ID** to separate files that come from different samples.\nFor SAM and BAM files, the value supplied in the **Sample ID** field is written to the read group tag (@RG:SM). All aligners add read group fields to the aligned BAM file using the file’s **Sample ID** metadata.",
    "1-2": "The type of material taken from a biological entity for testing, diagnosis, propagation, treatment, or research purposes. This includes tissues, body fluids, cells, organs, embryos, and body excretory products.",
    "0-3": "This takes a string.",
    "1-3": "This takes a string.\nSuggested values:\n  * Blood normal\n  * Tumor tissue\n  * Normal tissue\n  * Primary cells\n  * Stem cells\n  * Embryo\n  * Cell Line\n  * Saliva\n  * Control\n  * Not available"
  },
  "cols": 4,
  "rows": 2
}
[/block]
###General
In the following table, you will find the name, description, and values of metadata fields for **General**. The second column, **API key**, allows you to access the specified metadata field through the API. Learn more about [accessing metadata via the API](files).
[block:parameters]
{
  "data": {
    "h-0": "Name",
    "h-1": "API key",
    "h-2": "Description",
    "h-3": "Value",
    "0-0": "**Investigation**",
    "1-0": "**Species**",
    "2-0": "**Batch number**",
    "3-0": "**Case ID**",
    "0-1": "**investigation**",
    "1-1": "**species**",
    "2-1": "**batch_number**",
    "3-1": "**case_id**",
    "0-2": "A value denoting the project or study that generated the data.",
    "1-2": "A group of organisms having some common characteristics or qualities that differ from all other groups of organisms, and that are capable of breeding and producing fertile offspring.",
    "2-2": "This is an assigned distinctive alpha-numeric identification code that signifies grouping.",
    "3-2": "This is a human-readable identifier, such as a number or a string, for a subject who has taken part in the investigation or study.",
    "0-3": "This takes a string.",
    "1-3": "This takes a string.\nSuggested values:\n  * Homo sapiens\n  * Mus musculus",
    "2-3": "This takes an integer.",
    "3-3": "This takes a string."
  },
  "cols": 4,
  "rows": 4
}
[/block]

[block:callout]
{
  "type": "success",
  "body": "Apart from the standard set of metadata fields that can be seen through the visual interface, custom metadata fields can be added via the [command line uploader](doc:set-metadata-using-the-command-line-uploader) or via the [API](doc:set-metadata-using-the-api)."
}
[/block]
##Grouping and distinguishing files by metadata

Some apps group and process files based on their metadata. In particular, apps that can process multiple files in parallel will do so in groups of files that have the same metadata value(s) for certain metadata fields. These fields are: **Sample ID**, **Library ID**, **Platform unit ID**, and **File segment number**.

There is a prioritization rule governing how files can be grouped on the basis of metadata: files are grouped by their value for a metadata field only if all values hierarchically above that given field match. Files are grouped following the hierarchy listed below:

1. Sample ID
2. Library ID
3. Platform unit ID
4. File segment number

For example, files batched by **Library** will first be sorted by **Sample ID**, with a secondary sort by **Library ID**. Consider the files described in the table below:
[block:parameters]
{
  "data": {
    "h-0": "File name",
    "h-1": "Sample ID",
    "h-2": "Library ID",
    "0-0": "File 1",
    "1-0": "File 2",
    "2-0": "File 3",
    "3-0": "File 4",
    "4-0": "File 5",
    "5-0": "File 6",
    "0-1": "A",
    "1-1": "A",
    "2-1": "B",
    "3-1": "B",
    "4-1": "B",
    "5-1": "B",
    "0-2": "1",
    "1-2": "1",
    "2-2": "2",
    "3-2": "3",
    "4-2": "2",
    "5-2": "3"
  },
  "cols": 3,
  "rows": 6
}
[/block]
These six files will be sorted into three groups:

  * **Sample ID** "A" with **Library ID** "1" (File 1 and File 2)
  * **Sample ID** "B" with **Library ID** "2" (File 3 and File 5)
  * **Sample ID** "B" with **Library ID** "3" (File 4 and File 6)

Learn more about [grouping files in batch analysis](doc:about-batch-analyses).

Metadata is also used to distinguish files. The rule governing this is that no two files can have the same metadata values for all fields. To use a common example, paired-end read FASTQ files will have the same metadata values for the fields **Sample ID** through **File segment number** in the list above. However, they must take different values for the **Paired end** field.

Remember, it is important for two members of paired-end reads to have identical **Sample ID**, **Library ID**, **Platform unit ID**, and **File segment number**.
[block:callout]
{
  "type": "info",
  "body": "Metadata can often be built into the filename. For instance, files produced by Illumina sequencers follow the naming schema: <sample name>_<barcode sequence>_L<lane>_R<read number>_<set number>.fastq.gz\n\nConsider the following two paired-end files generated by an Illumina sequencer:\n\n  * NA10831_ATCACG_L002_R1_001.fastq.gz\n  * NA10831_ATCACG_L002_R2_001.fastq.gz\n\nIn this example, we can read off some metadata values from the filename: the **Sample ID** of both files is 'NA10831', their **Platform unit ID** value is 002, and their **Paired end** values are 1 and 2 respectively. These values can then be entered on Cavatica."
}
[/block]
<hr>

**Suggested pages:**

[Set metadata using the visual interface](doc:set-metadata-using-the-visual-interface)
[Set metadata using the command line uploader](doc:set-metadata-using-the-command-line-uploader)
[Set metadata using the API](doc:set-metadata-using-the-api)
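
As a rough illustration of the filename convention and the grouping hierarchy described on this page, the sketch below reads metadata values off an Illumina-style filename and groups files by the first fields of the hierarchy. It is plain local Python and does not call Cavatica or its API. The function names `metadata_from_filename` and `group_files`, the regular expression, and the assumption that sample names contain no underscores are all hypothetical choices made for this example; only the dictionary keys (`sample_id`, `library_id`, `platform_unit_id`, `file_segment_number`, `paired_end`) come from the **API key** columns above.

```python
import re
from collections import defaultdict

# Assumed pattern for the naming schema quoted on this page:
# <sample name>_<barcode sequence>_L<lane>_R<read number>_<set number>.fastq.gz
# Assumption: the sample name itself contains no underscores.
ILLUMINA_NAME = re.compile(
    r"^(?P<sample>[^_]+)_(?P<barcode>[ACGTN+-]+)"
    r"_L(?P<lane>\d+)_R(?P<read>[12])_(?P<set>\d+)\.fastq\.gz$"
)

# Grouping hierarchy from this page, highest priority first.
HIERARCHY = ("sample_id", "library_id", "platform_unit_id", "file_segment_number")


def metadata_from_filename(name):
    """Read Sample ID, Platform unit ID and Paired end off a filename."""
    match = ILLUMINA_NAME.match(name)
    if match is None:
        raise ValueError(f"not an Illumina-style FASTQ name: {name}")
    return {
        "sample_id": match["sample"],
        "platform_unit_id": match["lane"],  # lane, kept as written, e.g. "002"
        "paired_end": match["read"],        # a string, "1" or "2"
    }


def group_files(files, depth):
    """Group file names by the first `depth` fields of HIERARCHY.

    `files` maps a file name to its metadata dict. Two files share a group
    only if every field down to the chosen depth matches.
    """
    groups = defaultdict(list)
    for name, meta in files.items():
        key = tuple(meta.get(field) for field in HIERARCHY[:depth])
        groups[key].append(name)
    return dict(groups)


# The six files from the grouping table on this page.
TABLE_FILES = {
    "File 1": {"sample_id": "A", "library_id": "1"},
    "File 2": {"sample_id": "A", "library_id": "1"},
    "File 3": {"sample_id": "B", "library_id": "2"},
    "File 4": {"sample_id": "B", "library_id": "3"},
    "File 5": {"sample_id": "B", "library_id": "2"},
    "File 6": {"sample_id": "B", "library_id": "3"},
}
```

Calling `group_files(TABLE_FILES, depth=2)` groups by **Sample ID** and then **Library ID**, which reproduces the three groups listed for the example table. Calling `metadata_from_filename` on the two `NA10831` files gives dictionaries that differ only in `paired_end`, matching the rule that the two members of a pair share every grouping field.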