Metadata schema
Metadata fields are subdivided into three categories (File, Sample, and General). The recommended practice is to enter as much metadata as possible when you first upload files to CAVATICA. For instance, for raw sequencing files, you should enter Platform (sequencing platform) and Sample ID. We highly suggest that you set seven of these metadata fields for your data. While your tasks may run correctly without them, these fields will help optimize your analyses. They are labeled in the tables below with a suggested tag in the Name column.
Please keep in mind that the fields have to be specified exactly as listed in the tables below under the Name column. If a field is not specified exactly as in the table, CAVATICA will interpret it as a custom metadata field (see below).
File
In the following table, you will find the name, description, and values of metadata fields for File. The second column, API key, allows you to access the specified metadata field through the API. Learn more about accessing metadata via the API.
There are six metadata fields that we highly suggest you set for your data. While your tasks may run correctly without them, these metadata fields will help optimize your analyses. These fields are labeled in the table below with a red suggested tag in the Name column.
Sample
In the following table, you will find the name, description, and values of metadata fields for Sample. The second column, API key, allows you to access the specified metadata field through the API. Learn more about accessing metadata via the API.
Name | API key | Description | Value |
---|---|---|---|
Sample ID (suggested) | sample_id | A human-readable identifier for a sample or specimen, which could contain some metadata information. A sample or specimen is material taken from a biological entity for testing, diagnosis, propagation, treatment, or research purposes, including but not limited to tissues, body fluids, cells, organs, embryos, body excretory products, etc. Tools use Sample ID to separate files that come from different samples. For SAM and BAM files, the value supplied in the Sample ID field is written to the read group tag (@RG:SM). All aligners add read group fields to the aligned BAM file using the file’s Sample ID metadata. | This takes a string. |
Sample type | sample_type | The type of material taken from a biological entity for testing, diagnosis, propagation, treatment, or research purposes. This includes tissues, body fluids, cells, organs, embryos, body excretory products, etc. | This takes a string. Suggested values: Blood normal, Tumor tissue, Normal tissue, Primary cells, Stem cells, Embryo, Cell Line, Saliva, Control, Not available |
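To illustrate how a Sample ID value ends up in the read group, the sketch below formats a SAM `@RG` header line from a metadata dictionary. The function name `read_group_line` and the read group ID `rg1` are hypothetical. The example shows the header format only and is not a description of how the aligners on CAVATICA are implemented.

```python
def read_group_line(metadata, rg_id):
    """Format a SAM @RG header line whose SM tag carries the Sample ID."""
    sample_id = metadata.get("sample_id")
    if not sample_id:
        raise ValueError("sample_id is not set; cannot fill the SM tag")
    # SAM header fields are tab-separated TAG:VALUE pairs.
    return "\t".join(["@RG", f"ID:{rg_id}", f"SM:{sample_id}"])


# Hypothetical metadata for one file; only sample_id is used here.
line = read_group_line({"sample_id": "SAMPLE_001"}, rg_id="rg1")
print(line)
```

Files that share a Sample ID would get the same `SM` value, which is how downstream tools can group reads by sample.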
General
In the following table, you will find the name, description, and values of metadata fields for General. The second column, API key, allows you to access the specified metadata field through the API. Learn more about accessing metadata via the API.
Name | API key | Description | Value |
---|---|---|---|
Investigation | investigation | A value denoting the project or study that generated the data. | This takes a string. |
Species | species | A group of organisms having some common characteristics or qualities that differ from all other groups of organisms and that are capable of breeding and producing fertile offspring. | This takes a string. Suggested values: Homo sapiens, Mus musculus |
Batch number | batch_number | An assigned, distinctive alphanumeric identification code that signifies grouping. | This takes a string. |
Case ID | case_id | A human-readable identifier, such as a number or a string, for a subject who has taken part in the investigation or study. | This takes a string. |
Apart from the standard set of metadata fields that can be seen through the visual interface, custom metadata fields can be added via the command line uploader or via the API.
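As a rough sketch of the distinction between standard and custom fields, the following Python splits a metadata dictionary by API key. The key set covers only the Sample and General fields listed on this page (File keys are not included), and the helper name `split_metadata` and the example key `library_prep_date` are hypothetical. It assumes that exact matching applies to API keys in the same way as to field names, and it flags keys that differ from a standard key only in case or separators, since such a key would presumably be stored as a custom field.

```python
import json

# API keys from the Sample and General tables above; File keys are not included.
STANDARD_KEYS = {
    "sample_id", "sample_type",
    "investigation", "species", "batch_number", "case_id",
}


def _normalize(key):
    """Lowercase a key and turn spaces and hyphens into underscores."""
    return key.strip().lower().replace(" ", "_").replace("-", "_")


def split_metadata(metadata):
    """Return (standard, custom, warnings) for a metadata dictionary."""
    standard, custom, warnings = {}, {}, []
    for key, value in metadata.items():
        if key in STANDARD_KEYS:
            standard[key] = value
            continue
        # Anything that is not an exact match is treated as a custom field.
        custom[key] = value
        if _normalize(key) in STANDARD_KEYS:
            warnings.append(
                f"{key!r} looks like {_normalize(key)!r} but does not match exactly"
            )
    return standard, custom, warnings


standard, custom, warnings = split_metadata({
    "sample_id": "SAMPLE_001",
    "Species": "Homo sapiens",          # wrong case: ends up as custom
    "library_prep_date": "2020-01-15",  # hypothetical custom field
})
print(json.dumps({"standard": standard, "custom": custom}, indent=2))
print(warnings)
```

Running a check like this before upload can catch a misspelled standard field before it is silently recorded as a custom one.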