CCLE metadata



Metadata is data that describes other data. This page details the CCLE metadata available for viewing and filtering Cancer Cell Line Encyclopedia (CCLE) data in the Data Browser on CAVATICA. The CCLE contains Open Access sequencing data in the form of reads aligned to the hg19 reference genome for nearly 1000 cancer cell line samples, as available from CGHub on May 11, 2016.

CCLE metadata on CAVATICA consist of entities and their properties.

Entities are particular resources with UUIDs.

Properties can either describe an entity or relate that entity to another entity. For instance, properties include an entity's vital status, gender, data format, or experimental strategy.

Entities for CCLE include:

  • CCLE Cell line, which represents data generated for each cell line. Dependent elements include biospecimen data such as Sample and clinical data such as Investigation.
  • Aliquot
  • File

Below, each of these three entities is followed by a table of its related properties.
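To make the entity and property model concrete, here is a minimal sketch in Python. It is not CAVATICA's implementation or API: the `Entity` class, the `resolve` helper, the `kind` labels, and all property values are hypothetical and chosen only for illustration. It assumes each entity carries a UUID, a set of descriptive properties, and a set of relational properties that point to the UUIDs of other entities.

```python
from dataclasses import dataclass, field
from typing import Dict
from uuid import uuid4


@dataclass
class Entity:
    """A resource identified by a UUID (hypothetical model, for illustration only)."""
    kind: str                                                  # "cell_line", "aliquot" or "file"
    properties: Dict[str, str] = field(default_factory=dict)   # properties that describe the entity
    relations: Dict[str, str] = field(default_factory=dict)    # relation name -> UUID of another entity
    uuid: str = field(default_factory=lambda: str(uuid4()))    # generated when the entity is created


# Illustrative values only; they are not taken from the CCLE dataset.
cell_line = Entity("cell_line", {"Gender": "female", "Primary site": "breast"})
aliquot = Entity("aliquot", {"ID": "example-aliquot-1"}, {"cell_line": cell_line.uuid})
bam_file = Entity("file", {"Data format": "BAM"}, {"aliquot": aliquot.uuid})

# Index entities by UUID so that relational properties can be followed.
index = {entity.uuid: entity for entity in (cell_line, aliquot, bam_file)}


def resolve(entity: Entity, relation: str) -> Entity:
    """Return the entity that the given relational property points to."""
    return index[entity.relations[relation]]


# Walk from a file back to the cell line it was derived from.
source = resolve(resolve(bam_file, "aliquot"), "cell_line")
print(source.properties["Primary site"])
```

In this sketch a descriptive property is a plain key and value pair, while a relational property stores the UUID of the related entity, which is why a lookup table keyed by UUID is enough to move from a file to its aliquot and on to its cell line.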

CCLE Cell line

The CCLE Cell line entity represents cell lines: permanently established cell cultures that proliferate indefinitely given appropriate fresh medium and space. The entity contains these cell lines' clinical and biospecimen data. See the table below for the clinical and biospecimen properties of CCLE Cell line and their descriptions.

Property | Description
ID | A human-readable identifier, such as a number or a string that may contain information about the entity. This identifier is often referred to as the submitter ID.
Program | The research program under which the data was generated. See NCI Thesaurus Code: C82662.
Investigation | A value denoting the project or study that generated the data. See NCI Thesaurus Code: C41198.
Gender | The collection of behaviors and attitudes that distinguish people on the basis of the societal roles expected for the two sexes. See NCI Thesaurus Code: C17357.
Disease type | The type of the disease or condition studied. See NCI Thesaurus Code: C2991.
Disease type abbreviation | An acronym or initials for the disease or condition studied. See NCI Thesaurus Code: C2991.
Primary site | The anatomical site where the primary tumor is located in the organism. See NCI Thesaurus Code: C43761.
Histologic diagnosis | Diagnosis of a disease based on the type of tissue, where type is determined by microscopic examination of the tissue. See NCI Thesaurus Code: C61478.
Histology | The study of the structure of the cells and their arrangements to constitute tissues and the association among these to form organs. In pathology, the microscopic process of identifying normal and abnormal morphologic characteristics in tissues, by employing various cytochemical and immunocytochemical stains. See NCI Thesaurus Code: C16681.
Note | A brief written record which provides information on cell line relations. For instance, notes mention if two cell lines come from the same patient. See NCI Thesaurus Code: C42619.
Sample name | A specific name given to material taken from a biological entity for testing, diagnosis, propagation, treatment, or research purposes, including but not limited to tissues, body fluids, cells, organs, embryos, body excretory products, etc. See NCI Thesaurus Code: C70713.
Sample type | The type of material taken from a biological entity for testing, diagnosis, propagation, treatment, or research purposes. This includes tissues, body fluids, cells, organs, embryos, body excretory products, etc. See NCI Thesaurus Code: C70713.
Sample type code | Code that determines the type of material taken from a biological entity for testing, diagnosis, propagation, treatment, or research purposes. This includes tissues, body fluids, cells, organs, embryos, body excretory products, etc. See NCI Thesaurus Code: C70713.
Source | Commercial vendors or academic labs that the cell lines were obtained from.


Aliquot

The aliquot entity in the CCLE metadata schema refers to aliquots: products or units extracted from a portion of a sample or specimen and prepared for analysis. Members of the aliquot entity can be identified by a Universally Unique Identifier (UUID). See below for metadata properties and descriptions relating to the aliquot entity.

Property | Description
ID | A human-readable identifier, such as a number or a string that may contain metadata information. This identifier is often referred to as the submitter ID.


File

The file entity in the CCLE metadata schema refers to the files in CCLE produced by analyzing aliquots. See below for metadata properties and descriptions relating to the file entity.

Property | Description
Analyte type | Defines the type of an analyte on a molecular basis.
File size | Size of a file measured in bytes (B), kilobytes (KB), megabytes (MB), gigabytes (GB), terabytes (TB), and larger values.
Data format | The type of format that determines data content.
Experimental strategy | The method or protocol used to perform the laboratory analysis. See NCI Thesaurus Code: C43622.
Platform | The version (for instance, manufacturer or model) of the technology that was used for sequencing or assaying. See NCI Thesaurus Code: C45378.
Data submitting center | A string denoting the name of the center that submitted the data.
Data submitting center code | Alphanumerical values assigned to the center that submitted the data.
Last modified date | Date the file was last modified.
Published date | Date the file was published.
Storage path | The storage path of the file.
Reference genome | The reference assembly (such as HG19 or GRCh37) to which the nucleotide sequence of a case can be aligned.
Access level | A boolean value indicating Controlled Data or Open Data. Controlled Data is data from public datasets that has limitations on use and requires approval by dbGaP. Open Data is data from public datasets that doesn't have limitations on its use.
Submitter ID | Analytical identification assigned by the center that submitted the data.
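
As a rough illustration of how file properties can drive filtering, the sketch below applies exact-match criteria to a small list of records in Python. It does not use the Data Browser or any CAVATICA API. The record values, the `filter_files` and `human_size` helpers, and the choice to store Access level as the labels "Open" and "Controlled" are all assumptions made for the example. File size is assumed to be stored in bytes and is converted with decimal units (1 KB = 1000 B).

```python
from typing import Any, Dict, List

# Hypothetical file records; keys follow the File property names, values are made up.
files: List[Dict[str, Any]] = [
    {"Submitter ID": "file-a", "Experimental strategy": "RNA-Seq",
     "Data format": "BAM", "Access level": "Open", "File size": 12_500_000_000},
    {"Submitter ID": "file-b", "Experimental strategy": "WXS",
     "Data format": "BAM", "Access level": "Open", "File size": 8_200_000_000},
    {"Submitter ID": "file-c", "Experimental strategy": "RNA-Seq",
     "Data format": "BAM", "Access level": "Controlled", "File size": 950_000},
]


def filter_files(records: List[Dict[str, Any]], criteria: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Keep the records whose properties equal every value in criteria."""
    return [r for r in records if all(r.get(k) == v for k, v in criteria.items())]


def human_size(n_bytes: int) -> str:
    """Format a byte count using decimal units from B up to TB."""
    units = ["B", "KB", "MB", "GB", "TB"]
    size = float(n_bytes)
    for unit in units:
        # Stop at the first unit that keeps the number below 1000, or at TB.
        if size < 1000 or unit == units[-1]:
            return f"{size:.1f} {unit}"
        size /= 1000
    return f"{size:.1f} {units[-1]}"  # not reached; keeps the return type explicit


# Select open-access RNA-Seq files and print their sizes.
selected = filter_files(files, {"Experimental strategy": "RNA-Seq", "Access level": "Open"})
for record in selected:
    print(record["Submitter ID"], human_size(record["File size"]))
```

Passing the criteria as a dictionary, instead of keyword arguments, lets the property names keep their spaces exactly as they appear in the table.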