{"_id":"5cab5746d327da000e23e501","project":"5773dcfc255e820e00e1cd4d","version":{"_id":"5773dcfc255e820e00e1cd50","__v":26,"project":"5773dcfc255e820e00e1cd4d","createdAt":"2016-06-29T14:36:44.812Z","releaseDate":"2016-06-29T14:36:44.812Z","categories":["5773dcfc255e820e00e1cd51","5773df36904b0c0e00ef05ff","577baf92451b1e0e006075ac","577bb183b7ee4a0e007c4e8d","577ce77a1cf3cb0e0048e5ea","577d11865fd4de0e00cc3dab","578e62792c3c790e00937597","578f4fd98335ca0e006d5c84","578f5e5c3d04570e00976ebb","57bc35f7531e000e0075d118","57f801b3760f3a1700219ebb","5804d55d1642890f00803623","581c8d55c0dc651900aa9350","589dcf8ba8c63b3b00c3704f","594cebadd8a2f7001b0b53b2","59a562f46a5d8c00238e309a","5a2aa096e25025003c582b58","5a2e79566c771d003ca0acd4","5a3a5166142db90026f24007","5a3a52b5bcc254001c4bf152","5a3a574a2be213002675c6d2","5a3a66bb2be213002675cb73","5a3a6e4854faf60030b63159","5c8a68278e883901341de571","5cb9971e57bf020024523c7b","5cbf1683e2a36d01d5012ecd"],"is_deprecated":false,"is_hidden":false,"is_beta":false,"is_stable":true,"codename":"","version_clean":"1.0.0","version":"1.0"},"category":{"_id":"5a3a52b5bcc254001c4bf152","project":"5773dcfc255e820e00e1cd4d","version":"5773dcfc255e820e00e1cd50","__v":0,"sync":{"url":"","isSync":false},"reference":false,"createdAt":"2017-12-20T12:08:21.441Z","from_sync":false,"order":8,"slug":"run-an-analysis","title":"Run an analysis"},"user":"5767bc73bb15f40e00a28777","__v":0,"parentDoc":null,"updates":[],"next":{"pages":[],"description":""},"createdAt":"2019-04-08T14:14:30.258Z","link_external":false,"link_url":"","sync_unique":"","hidden":false,"api":{"results":{"codes":[]},"settings":"","auth":"required","params":[],"url":""},"isReference":false,"order":3,"body":"[block:callout]\n{\n  \"type\": \"info\",\n  \"body\": \"Memoization is currently in the beta stage. 
Please report any issues to [support:::at:::sbgenomics.com](mailto:support@sbgenomics.com).\"\n}\n[/block]\nBy letting Cavatica reuse the existing outputs of your previous runs, you can significantly reduce the time and cost of your workload.\n\nIf memoization is enabled, tasks use pre-calculated results instead of generating new ones. This, however, relies on the existence of [intermediate files](#section-intermediate-files). Specifically, the results of a previous task can be reused only for as long as that task’s intermediate files are retained.\n\nMemoization takes place at the [job](doc:about-task-execution#section-jobs) level. If a job has already been executed and a job with the same app and the same inputs is scheduled, even in a completely different task, the new job is not executed. Instead, it returns the existing outputs of the original execution, provided that memoization is enabled and the original outputs are still within their retention period.\n\nOnce memoization is triggered and job outputs are reused in a new context, an appropriate job workspace directory containing all intermediate files is also created for the new job. The files are copied, so you are not charged twice for the same intermediate files. This lets you view logs and relevant files from the original job (available via the **View task logs** option in **View stats & logs**).\n\n### Intermediate files\nIntermediate files are files that are created during job execution but are not reported as task outputs. Apps are usually configured to pick up and display only the relevant outputs to the user, while often also creating intermediate products that are not reported. 
These files are, however, crucial for later executions of the jobs that consume or produce them, because the memoization mechanism reuses them instead of executing those jobs again.\n\nCavatica saves intermediate files for 24 hours by default, but this value can be changed in [Project Settings](https://docs.sevenbridges.com/docs/project-settings). The minimum value is 1 hour and the maximum is 120 hours (5 days). Once this setting is changed, the new value applies to tasks executed after the change.\n\n### Task Stats\nWhen a job within a task reuses outputs from a previous run of the same job, it is not shown on the task stats timeline. The example below shows a workflow that is executed again with the same inputs as a previously completed run of the same workflow. The banner above the timeline indicates that jobs have reused already available outputs instead of being executed to produce them again.\n[block:image]\n{\n  \"images\": [\n    {\n      \"image\": [\n        \"https://files.readme.io/233e13f-cgc-memoization-1.png\",\n        \"cgc-memoization-1.png\",\n        1202,\n        680,\n        \"#f0f1f3\"\n      ]\n    }\n  ]\n}\n[/block]\n### Task Logs\nTask Logs also indicate which jobs reused matching precomputed outputs instead of being executed again. Such jobs are clearly labelled by an icon and a tooltip (available on hover), as shown in the image below. Please note that logs are not regenerated either, so the ones you see are copies of the logs generated when the job was first executed with the same inputs.\n[block:image]\n{\n  \"images\": [\n    {\n      \"image\": [\n        \"https://files.readme.io/8896f77-memoization-2.png\",\n        \"memoization-2.png\",\n        1202,\n        731,\n        \"#f7f8f8\"\n      ]\n    }\n  ]\n}\n[/block]\n### Important considerations\n* **Folders on Inputs**: Memoization will not work if a task has folders set up as its inputs or outputs. 
Because folder content is currently not tracked, we cannot guarantee that the inputs are the same.\n* **Non-Deterministic Tools**: Be careful with non-deterministic tools. If you need stochastic results for a non-deterministic tool with the same inputs, you should turn off memoization, because a memoized job returns the outputs of the earlier run.\n* **Tools with dynamic inputs/outputs**: If your tool dynamically pulls inputs from, or pushes outputs to, an external source (i.e. the files are not explicitly set as inputs or outputs in the CWL app), you should turn off memoization.\n\n## Activate Memoization for a project\n[Only project administrators](https://docs.sevenbridges.com/docs/project-settings) can activate Memoization within a project. Memoization can be activated while [creating a project](https://docs.sevenbridges.com/docs/create-a-project), or subsequently within project settings following the procedure below:\n\n1. Go to your project dashboard.\n2. Click the **Settings** tab.\n3. Under **Execution settings** switch **Memoization** to **On**.\n\n## Activate Memoization for a draft task\nPlease note that settings at task level override project-level settings.\n\n1. [Create a draft task](doc:run-a-task-1#section--1-create-draft-task).\n2. On the draft task page, switch **Memoization** to **On** under the **Execution Settings** tab.\n\n## Memoization control via the API\nThe `use_memoization` parameter enables or disables Memoization in projects and tasks, while the `intermediate_files` parameter specifies how long intermediate files remain available on Cavatica and can be used at project level only. 
Please note that project-level settings can be changed only by project administrators.\n\nMemoization is a part of the following API calls:\n\n**Project** (both `use_memoization` and `intermediate_files` parameters are available):\n  * [Create a new project](doc:create-a-new-project)\n  * [Edit a project](doc:edit-a-project)\n  * [Get details of a project](doc:get-details-of-a-project)\n\n**Task** (only `use_memoization` is available):\n  * [Create a new draft task](doc:create-a-new-task)\n  * [Get details of a task](doc:get-details-of-a-task)\n  * [Modify a task](doc:modify-a-task)","excerpt":"","slug":"about-memoization","type":"basic","title":"About Memoization"}
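
As a rough illustration of the API controls described above, the sketch below builds request bodies that carry the `use_memoization` and `intermediate_files` parameters. Only the two parameter names and the 1 to 120 hour range (default 24) come from this page. The helper names, the nesting under a `settings` key, and the `duration` field are assumptions made for the example, so check the linked API call pages for the actual request schema before relying on it.

```python
import json

# Limits stated on this page for intermediate file retention (in hours).
MIN_RETENTION_HOURS = 1
MAX_RETENTION_HOURS = 120
DEFAULT_RETENTION_HOURS = 24


def build_project_settings(use_memoization, retention_hours=DEFAULT_RETENTION_HOURS):
    """Build a hypothetical project-level request body.

    Assumption: both parameters are nested under "settings" and the
    retention period is given as a "duration" in hours.
    """
    if not MIN_RETENTION_HOURS <= retention_hours <= MAX_RETENTION_HOURS:
        raise ValueError(
            f"retention_hours must be between {MIN_RETENTION_HOURS} "
            f"and {MAX_RETENTION_HOURS}, got {retention_hours}"
        )
    return {
        "settings": {
            "use_memoization": bool(use_memoization),
            "intermediate_files": {"duration": retention_hours},
        }
    }


def build_task_settings(use_memoization):
    """Build a hypothetical task-level request body.

    Per this page, only use_memoization is available at task level.
    """
    return {"use_memoization": bool(use_memoization)}


if __name__ == "__main__":
    # Enable memoization for a project and keep intermediate files for 48 hours.
    print(json.dumps(build_project_settings(True, 48), indent=2))
    # Disable memoization for a single draft task, overriding the project setting.
    print(json.dumps(build_task_settings(False), indent=2))
```

Validating the retention period on the client side keeps out-of-range values from being sent at all. Because task-level settings override project-level ones, a task body with `use_memoization` set to `False` is one way to opt a non-deterministic run out of reuse.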