Metadata manifest file format

CAVATICA uses two different metadata manifest file formats. Which one you should use depends on the feature you are using the manifest with.

Metadata manifest format for the Command Line Uploader

When uploading files to CAVATICA using the Command Line Uploader, a metadata manifest file can be used to define metadata values for those files. The supported file format for this manifest file is CSV, i.e. comma separated values. A CSV file contains a number of rows with columns which are separated with a comma.

The following rules apply:

| Rule | Description |
| --- | --- |
| Line separation | The lines are separated with a line break, while the columns are separated using a comma. |
| First row | The first row (manifest file header) has to contain column names, which are treated as metadata keys (e.g. “sample”, “library”). Within the header, the first column must be File name, while the remaining columns are metadata keys. Keep in mind that metadata keys can be either metadata schema keys or custom metadata keys. |
| Must include File name | It's mandatory to have File name as the first column in the header. |
| First column | The first column has to contain the names of the files which will be uploaded. If the files are not in the same directory as the metadata manifest file, you should also include a path to the files (e.g. ../filename.fastq). |
| Subsequent columns | All subsequent columns should contain metadata fields which will be assigned to the specified files. |
| Case sensitivity | Keys and values are case sensitive. |
| Escape character | To escape a comma in a metadata key or value, enclose that field in double quotation marks (e.g. to set a key to md_value1,2 use "md_value1,2"). To use a double quote character in a metadata key or value, enclose the field in double quotation marks and write each double quote inside it twice (e.g. to set a key to md_value"1" use "md_value""1"""). |
| Maximum size | The maximum size for the metadata manifest file is 5 GiB. |
| Maximum number of key-value pairs | The maximum number of key-value pairs per file is 1000, including null-value keys. |
| Keys and values encoding | Keys and values are UTF-8 encoded strings. |
| Maximum key length | 100 bytes (UTF-8 encoding) |
| Maximum value length | 300 bytes (UTF-8 encoding) |
| Metadata for files only | There is no metadata for folders on CAVATICA. Any folder specified in the manifest file will be skipped and no metadata will be set for it. |
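
The size limits in the rules above are expressed in bytes of the UTF-8 encoding, not in characters, which matters for non-ASCII keys and values. The following is a minimal Python sketch of a local pre-check, not part of any CAVATICA tool. The function name check_metadata_limits and the shape of its input (one dictionary per file) are assumptions made for illustration.

```python
# Limits taken from the rules table above.
MAX_KEY_BYTES = 100
MAX_VALUE_BYTES = 300
MAX_PAIRS_PER_FILE = 1000


def check_metadata_limits(metadata):
    """Return a list of problems found in one file's metadata dictionary.

    Hypothetical helper: `metadata` maps keys to string values (or None
    for null-value keys, which still count towards the pair limit).
    """
    problems = []
    if len(metadata) > MAX_PAIRS_PER_FILE:
        problems.append(f"too many key-value pairs: {len(metadata)}")
    for key, value in metadata.items():
        # Measure the encoded length, since the limits are in bytes.
        if len(key.encode("utf-8")) > MAX_KEY_BYTES:
            problems.append(f"key too long: {key!r}")
        if value is not None and len(value.encode("utf-8")) > MAX_VALUE_BYTES:
            problems.append(f"value too long for key {key!r}")
    return problems
```

A value made of 150 two-byte characters such as "é" fits within the 300-byte limit, while 151 of them does not, even though both are well under 300 characters.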

The following example shows the content of the metadata manifest for three files with three metadata fields.

| File name | sample | library | paired_end |
| --- | --- | --- | --- |
| file1.fastq | sample1 | examplelibrary1 | 1 |
| file2.fastq | sample1 | examplelibrary1 | 2 |
| file3.fastq | sample2 | examplelibrary2 | 1 |

Below is the same example in a comma separated format.

File name,sample,library,paired_end
file1.fastq,sample1,examplelibrary1,1
file2.fastq,sample1,examplelibrary1,2
file3.fastq,sample2,examplelibrary2,1
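
A manifest like this can also be generated programmatically, which avoids escaping mistakes. The sketch below uses Python's standard csv module, whose default settings quote fields containing commas and double each embedded double quote, matching the escape rule described above. The function name build_manifest, the "note" key and its values are hypothetical, and writing a missing value as an empty field is an assumption rather than documented behavior.

```python
import csv
import io


def build_manifest(rows, metadata_keys):
    """Return CSV text with 'File name' as the first header column.

    `rows` is a list of (file_name, metadata_dict) pairs.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["File name", *metadata_keys])
    for file_name, metadata in rows:
        # Assumption: a key missing for a file is written as an empty field.
        writer.writerow([file_name, *(metadata.get(k, "") for k in metadata_keys)])
    return buffer.getvalue()


manifest_text = build_manifest(
    [
        ("file1.fastq", {"sample": "sample1", "note": "md_value1,2"}),
        ("file2.fastq", {"sample": "sample1", "note": 'md_value"1"'}),
    ],
    ["sample", "note"],
)
print(manifest_text)
# File name,sample,note
# file1.fastq,sample1,"md_value1,2"
# file2.fastq,sample1,"md_value""1"""
```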

Metadata manifest format for modifying metadata via the visual interface

This manifest file is used when you are modifying metadata for your project files via the visual interface.

The supported file formats for this manifest file are CSV and TSV. A CSV file contains a number of rows with columns which are separated with a comma, while the TSV file separates them with a tab.

The following rules apply:

| Rule | Description |
| --- | --- |
| Line and column separation | The lines are separated with a line break, while the columns are separated using a comma (CSV) or a tab (TSV). |
| First row | The first row (manifest file header) has to contain column names, which are treated as metadata keys (e.g. “sample”, “library”). Within the header, the first column must be either id or name. Next, there can be project and size columns, which are system metadata fields and are treated as read-only (these fields are present if the manifest file is generated using the Export metadata to a manifest action). The remaining columns are metadata keys, which can be metadata schema or custom metadata keys. Keep in mind that the order of columns is important, e.g. the id column (if present) must be the first one. |
| Must include id or name (along with project path) | It’s mandatory to have either the id or the name column, and it’s allowed to have both of them. If both id and name columns are present, id is used to identify the file, while name is ignored. |
| Name field | This field should also include the file path within the project for files which are not in the project root, so it is effectively path + name. If the id column is not present in the manifest file, this field is used to identify the files whose metadata should be edited. If id is present, this field is ignored, which means it is not possible to change a file name or file path using the Import metadata manifest feature (this is the current behavior, although it might change in the future). This makes it possible to edit metadata by providing only file names (along with the path, for files stored in a folder instead of the project root) instead of having to fetch file IDs. |
| Project field | It’s possible to have a “project” column (e.g. if you have used the Export metadata manifest feature, then edited the generated manifest file and submitted it for import). If present, it is treated as a read-only field (i.e. it is not possible to move or copy files from one project to another using this feature). The value is still validated: if the specified “project” value differs from the current project (i.e. the project in which the Import metadata manifest feature is being used), the file is not edited and is counted among the files that failed as part of this action. The “project” field is optional. |
| Size field | It is possible to have a “size” column (e.g. if you have used the Export metadata manifest feature, edited the generated manifest file and submitted it for import). If present, it is treated as a read-only field (i.e. it is not possible to change the file size using this feature, or any other feature). No validation is performed on this field. |
| Metadata schema fields and custom metadata fields | Following the aforementioned fields (at least one and at most four read-only fields), you can specify the metadata fields which should be edited. There must be at least one metadata column in the manifest file (otherwise the action will fail because there is no metadata field to be edited). The metadata schema fields are specified according to the documented metadata schema. |
| Empty rows | It's allowed to have an empty row in the manifest file. Empty rows are skipped during manifest file processing. |
| Case sensitivity | The manifest file is case-sensitive. |
| Escape character | To escape a comma in a metadata key or value, enclose that field in double quotation marks (e.g. to set a key to md_value1,2 use "md_value1,2"). To use a double quote character in a metadata key or value, enclose the field in double quotation marks and write each double quote inside it twice (e.g. to set a key to md_value"1" use "md_value""1"""). |
| Maximum size | The maximum size for the metadata manifest file is 5 GiB. |
| Maximum number of rows | The maximum number of rows for the metadata manifest file is 40,001, which corresponds to the maximum number of files in a single project on the Platform (plus the header row). |
| Maximum number of key-value pairs | The maximum number of key-value pairs per file is 1000, including null-value keys. |
| Keys and values encoding | Keys and values are UTF-8 encoded strings. |
| Maximum key length | 100 bytes (UTF-8 encoding) |
| Maximum value length | 300 bytes (UTF-8 encoding) |

Note: Columns which are specified after id, name, size and project will be treated as metadata fields.
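
The header rules for this format can be checked locally before importing a manifest. The sketch below is one possible reading of those rules in Python, not the Platform's actual validation. The function name split_header is hypothetical, and the sketch does not enforce any particular order between the project and size columns, since the rules above only state that id (if present) must come first.

```python
SYSTEM_COLUMNS = ("id", "name", "project", "size")


def split_header(header):
    """Split a header row into (system_columns, metadata_keys).

    Raises ValueError when the header breaks one of the rules sketched here.
    Column names are compared exactly, because the manifest is case-sensitive.
    """
    if not header or header[0] not in ("id", "name"):
        raise ValueError("first column must be 'id' or 'name'")
    # Collect the leading run of system columns; everything after it is metadata.
    system = []
    for column in header:
        if column in SYSTEM_COLUMNS and column not in system:
            system.append(column)
        else:
            break
    if "id" in system and system[0] != "id":
        raise ValueError("'id' must be the first column when present")
    metadata_keys = header[len(system):]
    if not metadata_keys:
        raise ValueError("at least one metadata column is required")
    return system, metadata_keys
```

For a header of id, name, quality_scale, paired_end, Read length, this returns id and name as system columns and the remaining three as metadata keys.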

The following example shows a metadata manifest file which contains both id and name columns along with two metadata schema fields (quality_scale, paired_end) and one custom metadata field (Read length). Note that in this case the id field is used to uniquely identify each file within the project, while the name field is ignored.

| id | name | quality_scale | paired_end | Read length |
| --- | --- | --- | --- | --- |
| 581b298d20946e087b2ce503 | file1.fastq | illumina18 | 1 | 26 |
| 581b298d20946e087b2ce51f | file2.fastq | illumina18 | 2 | 26 |
| 581b298d20946e087b2ce50b | file3.fastq | solexa | 1 | 98 |

Below is the same example in a comma separated format.

id,name,quality_scale,paired_end,Read length
581b298d20946e087b2ce503,file1.fastq,illumina18,1,26
581b298d20946e087b2ce51f,file2.fastq,illumina18,2,26
581b298d20946e087b2ce50b,file3.fastq,solexa,1,98

The following example shows a metadata manifest file which contains the name column, one metadata schema field (case_id) and one custom metadata field (Donor ID). Please note that in this case the name field (containing file path along with the name of the file) will be used to uniquely identify the file within the project.

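| name | case_id | Donor ID |
| --- | --- | --- |
| StudyYYZ/file1.bam | cid173 | 1197 |
| StudyYYZ/file2.bam | cid365 | 1198 |
| StudyYYZ/file3.bam | cid882 | 1199 |

Below is the same example in a comma separated format.
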
name,case_id,Donor ID
StudyYYZ/file1.bam,cid173,1197
StudyYYZ/file2.bam,cid365,1198
StudyYYZ/file3.bam,cid882,1199
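
To illustrate how the identification rules play out, here is a minimal Python sketch that reads a CSV or TSV manifest, picks id as the identifying column when it is present and name otherwise, skips empty rows, and separates the read-only columns from the editable metadata. It only mirrors the rules described on this page. The function name read_manifest and the tuple it yields are assumptions, not a CAVATICA API.

```python
import csv
import io

READ_ONLY_COLUMNS = {"id", "name", "project", "size"}


def read_manifest(text, delimiter=","):
    """Yield (identifier_column, identifier, metadata) for each non-empty row.

    Pass delimiter="\t" for a TSV manifest.
    """
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader, None)
    if header is None:
        return
    # id takes precedence; name is only used when id is absent.
    id_column = "id" if "id" in header else "name"
    for row in reader:
        if not any(cell.strip() for cell in row):
            continue  # empty rows are skipped
        record = dict(zip(header, row))
        metadata = {k: v for k, v in record.items() if k not in READ_ONLY_COLUMNS}
        yield id_column, record[id_column], metadata


example = (
    "name,case_id,Donor ID\n"
    "StudyYYZ/file1.bam,cid173,1197\n"
    "\n"
    "StudyYYZ/file2.bam,cid365,1198\n"
)
records = list(read_manifest(example))
```

Here records holds two entries, both identified by the name column, and the empty line between them is ignored.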