Archive files (AWS)

Storing large amounts of data in the cloud is a convenient way to keep it available for computation. Instant availability, however, is not always our primary concern.

Sometimes we have to deal with "cold" data: files, such as the input and output files of completed analyses, that are not required for processing and are unlikely to be accessed for a long time, but that must nonetheless remain available to comply with local and federal laws, best-practice guidelines, and internal processes.

Data that will not be used for some time (typically over three months) can be moved into archival storage. Archived files are billed at a significantly lower price than data that is always available. This makes archiving a good solution for infrequently accessed files.

📘
Amazon Glacier
CAVATICA offers Amazon Glacier as the archiving back-end, specifically the S3 Glacier Flexible Retrieval storage class. For up-to-date pricing information, please refer to the official pricing plans at Amazon Glacier.

Cost Savings

Storing data in an archive typically costs around a third as much as storing data that is always available.

As with all other costs for user-uploaded data hosted on the Platform, CAVATICA passes the charges that we incur for archiving data directly to the customer.

📘
In June 2020, Amazon's S3 in the US East region charged $0.021-0.023 per gigabyte of standard storage data per month. A gigabyte of data stored in Amazon's archiving facility, Glacier, in the same region was billed at $0.004 per GB per month. Keeping data in Glacier rather than S3 would yield monthly savings of approximately 80%.
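The arithmetic behind the savings figure can be checked with a short calculation. The sketch below uses only the June 2020 US East rates quoted above. The function name and the 1,000 GB dataset size are illustrative assumptions, and current prices may differ.

```python
def monthly_savings(size_gb, standard_rate, archive_rate):
    """Return (standard_cost, archive_cost, savings_fraction) for one month.

    Rates are in USD per GB per month. Only storage charges are considered;
    archival, restoration, and early deletion fees are not included.
    """
    standard_cost = size_gb * standard_rate
    archive_cost = size_gb * archive_rate
    savings_fraction = 1 - archive_cost / standard_cost
    return standard_cost, archive_cost, savings_fraction


# June 2020 US East figures from the note above (USD per GB per month).
S3_STANDARD_RATES = (0.021, 0.023)
GLACIER_RATE = 0.004

# Hypothetical dataset size, chosen only for illustration.
DATASET_GB = 1000

for rate in S3_STANDARD_RATES:
    standard, archived, saved = monthly_savings(DATASET_GB, rate, GLACIER_RATE)
    print(
        f"standard ${standard:.2f}/month, archived ${archived:.2f}/month, "
        f"saving {saved:.0%}"
    )
```

With these rates, the saving works out to roughly 81% to 83%, which is consistent with the approximate 80% figure.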

In addition to data hosting charges, Amazon Glacier may charge archival, restoration, or early deletion fees. If you incur these costs, we will pass them on to you.

However, if archived data is accessed infrequently over a number of months, these charges are not expected to significantly affect the projected cost savings.
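To illustrate why occasional fees matter less over longer periods, the sketch below spreads a one-off fee across the number of months the data stays archived. The storage rates are the June 2020 figures quoted earlier. The fee amount, the dataset size, and the function name are hypothetical placeholders, not actual Amazon Glacier pricing.

```python
def net_monthly_saving(size_gb, standard_rate, archive_rate, one_off_fees, months):
    """Average monthly saving after spreading one-off fees over `months`.

    Rates are in USD per GB per month; one_off_fees is a total in USD.
    """
    if months <= 0:
        raise ValueError("months must be positive")
    storage_saving = size_gb * (standard_rate - archive_rate) * months
    return (storage_saving - one_off_fees) / months


# Hypothetical inputs: 1,000 GB archived, with a placeholder fee of $10
# standing in for archival/restoration charges (not a real price).
for months in (1, 3, 12):
    saving = net_monthly_saving(1000, 0.023, 0.004, 10.0, months)
    print(f"{months:>2} months archived: average saving ${saving:.2f}/month")
```

Under these assumed numbers, the same fee absorbs more than half of the saving if the data is archived for one month, but only a small fraction of it over a year.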

Limitations of Archiving

Moving data to and from archival storage is not instantaneous. Archiving or restoring large files can take anywhere from several hours to a day or more.

While archived, files cannot be used as task inputs, downloaded, or visualized in the Genome Browser, and their content cannot be accessed in any other way. Archived files must first be restored.

When a file is restored, it remains available for 7 days. This should be enough time to download the file or re-run an analysis if you need to verify previous results. After this period, the file is automatically archived again to reduce costs.
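If you track restore times yourself, the 7-day window is easy to compute. The sketch below is a minimal illustration. The function names are hypothetical, and the assumption that the window starts exactly when the restore completes is ours, not a documented Platform guarantee.

```python
from datetime import datetime, timedelta, timezone

# Restored files remain available for 7 days, as described above.
RESTORE_WINDOW = timedelta(days=7)


def rearchive_time(restored_at):
    """Time at which a restored file is expected to be archived again."""
    return restored_at + RESTORE_WINDOW


def is_still_available(restored_at, now):
    """True while `now` falls inside the 7-day restore window."""
    return restored_at <= now < rearchive_time(restored_at)


# Hypothetical restore completion time, used only for illustration.
restored_at = datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)
print("Expected re-archive time:", rearchive_time(restored_at).isoformat())
print(
    "Available on day 5:",
    is_still_available(restored_at, restored_at + timedelta(days=5)),
)
```

Using timezone-aware timestamps avoids ambiguity when the restore time and the check happen in different time zones.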
