Create and upload your Docker image with a Dockerfile

Overview

Dockerfiles are text files that contain the commands you would otherwise run on the command line inside a container to create a Docker image. Docker automates the build by reading these commands (instructions) from the Dockerfile and executing them one after another to produce the final image.

The benefit of Dockerfiles is that they record the entire procedure used to create an image. They also make it easier to automate the maintenance of tools that are wrapped for use on CAVATICA. A Dockerfile can contain instructions that install the required dependencies on top of the base image, add the tool and its related files from the tool's repository to the container, and install the tool. As a result, when a tool changes and needs to be wrapped for use on the Platform again, its image can be rebuilt automatically from the Dockerfile, with no changes or only minor changes to the Dockerfile itself.

Format

A Dockerfile consists of two kinds of items: instructions, which are followed by their arguments, and comments. The basic Dockerfile format is shown below:

# Comment
INSTRUCTION arguments

Instructions are not case-sensitive, but they are usually written in uppercase so that they are easier to tell apart from arguments. A line is treated as a comment when it begins with the hash symbol (#). A # that appears anywhere else in a line does not start a comment; it is treated as part of the instruction's arguments.
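The short fragment below illustrates the comment rule. It is a minimal sketch; the echo command is only an illustration and not part of any CAVATICA tool.

```dockerfile
# This whole line is a comment and is ignored by Docker
FROM ubuntu:18.04

# The quoted # below is part of the RUN argument, so the shell prints: hello # world
RUN echo "hello # world"
```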

An example of an instruction is shown below:

FROM ubuntu

This instruction and argument assign the ubuntu image as the base image that you will build upon.

Usage

This section will present some of the most common instructions used in Dockerfiles and the way in which they are usually used when wrapping tools for use on CAVATICA. For a full list of instructions and all of their possible formats and uses, please refer to the official Dockerfile reference.

FROM
Docker runs instructions in the order in which they are listed in the Dockerfile. The first instruction in a Dockerfile must be FROM, which specifies the base image from which you will start building your new image (only comments, parser directives and ARG instructions may appear before it). The instruction is entered in one of the following formats:

FROM <image>

FROM <image>:<tag>

FROM <image>@<digest>

The <tag> part of the argument specifies a version of the image. For example, the instruction FROM ubuntu:14.04 loads the image currently tagged as Ubuntu 14.04, which is normally the latest available build of that release. <digest> is more specific: it refers to one exact image, which might not be the latest available build. For example, to use a specific Ubuntu 14.04 image that is not the latest one, you would use an instruction such as:

FROM ubuntu@sha256:45b23dee08af5e43a7fea6c4cf9c25ccf269ee113168c19722f87876677c5cb2

The <image> argument is mandatory when using the FROM instruction, while <tag> and <digest> are optional. If neither is specified, the tag :latest is assumed and the latest available version of the base image is used.

If you want to use the ubuntu base image, your Dockerfile has to start with the following instruction:

FROM ubuntu
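Because a reference without a tag resolves to :latest, the base image can silently change between builds. A common way to keep builds predictable is to pin a tag, as in this sketch (ubuntu:18.04 is only an example version):

```dockerfile
# Equivalent to FROM ubuntu:latest; the resolved image changes whenever a newer release is tagged "latest"
# FROM ubuntu

# Pins the Ubuntu 18.04 release; rebuilding picks up updated 18.04 images but never a new release
FROM ubuntu:18.04
```

For full reproducibility, the digest form shown above pins one exact image instead of a release line.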

Learn more about the FROM instruction.

LABEL
The LABEL instruction is not mandatory, but it is highly recommended because it adds useful metadata to an image, such as the image maintainer, so that authors can be contacted for information and support. A LABEL is a key-value pair entered in the following format:

LABEL <key>=<value>

Here is an example of a LABEL instruction that also includes maintainer info:

# Set maintainer
LABEL description='Dockerfile for Python 2.7 and Sambamba 0.6.6' \
maintainer='Rosalind Franklin, Seven Bridges, <email>'

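As shown above, you can end a line with a backslash (\) to continue a single instruction on the next line. Do not leave a backslash after the last line of an instruction, or Docker will join the following line to it.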

RUN
The RUN instruction is the main executing instruction in a Dockerfile. This instruction is used in the following form:

RUN <command>

The RUN instruction executes the command that is provided as its argument, in a shell (/bin/sh -c on Linux). The result is committed as a new layer of the image, and the resulting image is used for the next instruction listed in the Dockerfile. The example below shows how to use the RUN instruction to download a specific version of SAMtools from its GitHub releases page:

RUN wget https://github.com/samtools/samtools/releases/download/1.2/samtools-1.2.tar.bz2

You can also chain multiple commands within the same RUN instruction:

RUN wget https://github.com/samtools/samtools/releases/download/1.2/samtools-1.2.tar.bz2 \
    && tar jxf samtools-1.2.tar.bz2 \
    && cd samtools-1.2 \
    && make \
    && make install

This code block shows how commands are chained as an argument to a single RUN instruction. This instruction uses:

the && operator to chain commands (a command is executed only if the previous command succeeded),
the \ character at the end of a line (continues the instruction on the next line).

Learn more about the RUN instruction.
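The chained example above assumes that wget and a C build toolchain are already present, which is not the case in the plain ubuntu base image. The sketch below installs build dependencies first. The package list is our assumption for Ubuntu 18.04 and SAMtools 1.2 and may need adjusting.

```dockerfile
FROM ubuntu:18.04

# Build dependencies (assumed package list for Ubuntu 18.04; adjust if the build fails)
RUN apt-get update && apt-get install -y \
        bzip2 \
        ca-certificates \
        gcc \
        libncurses5-dev \
        make \
        wget \
        zlib1g-dev \
    && rm -rf /var/lib/apt/lists/*

# Download, build and install SAMtools 1.2, then delete the sources in the same RUN
RUN wget https://github.com/samtools/samtools/releases/download/1.2/samtools-1.2.tar.bz2 \
    && tar jxf samtools-1.2.tar.bz2 \
    && cd samtools-1.2 \
    && make \
    && make install \
    && cd .. \
    && rm -rf samtools-1.2 samtools-1.2.tar.bz2
```

Deleting the sources inside the same RUN instruction matters: files removed by a later instruction still occupy space in the layer that created them.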

CMD
The CMD instruction also specifies a command. However, unlike RUN, which is executed during the build process, CMD provides the default command that is executed when a container is started from the image. The preferred (exec) form of this instruction is:

CMD ["command","param1","param2"]

Alternatively, you can also use the shell form of the instruction:

CMD command param1 param2 ...

If there is more than one CMD instruction in a Dockerfile, only the last one takes effect.
Containers intended for use on CAVATICA should use the following CMD instruction, since this is how the container is invoked during the execution of a task:

CMD ["/bin/bash"]

If you specify a command after docker run <image>, that command overrides the one set by the CMD instruction in the Dockerfile.

Learn more about the CMD instruction.
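The two forms differ in whether a shell is involved, which affects variable expansion. In the sketch below, the commented-out lines are shown only for comparison; the echo commands are illustrative.

```dockerfile
FROM ubuntu:18.04

# Shell form: run as /bin/sh -c 'echo $HOME', so the shell expands $HOME
# CMD echo $HOME

# Exec form: run directly without a shell, so "$HOME" is printed literally
# CMD ["echo", "$HOME"]

# Only the last CMD takes effect; this is the form used for CAVATICA images
CMD ["/bin/bash"]
```

With this image, docker run <image> ls / would run ls / instead of /bin/bash.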

ADD
The ADD instruction copies files, directories or remote files (specified by URL) from their original location, <source>, to the specified path inside the container, <destination>. The ADD instruction has the following format:

ADD <source>... <destination>

You can specify multiple <source> items. Files and folders must be located within the context of the build. The build context is either a directory on your local file system (the path you pass to the docker build command, for example . for the current directory) or a URL (for example, the location of a Git repository).

You can only specify source paths inside the context directory (including its subdirectories), not paths such as ../directory/subdirectory. The <destination> argument can be either an absolute path or a path relative to the working directory set by WORKDIR.

The <source> parameter can also take wildcards in file names, for example:

ADD sample?.txt /tmp/

This adds files such as sample1.txt and sample2.txt to the /tmp/ folder inside the container. The ? wildcard matches exactly one character, so a file named sample10.txt would not be added.

Wildcard matching follows the rules of Go's filepath.Match function. Learn more about pattern matching in Dockerfiles.

The following rules also apply to the ADD instruction:

If <source> is a URL and <destination> does not end with a slash, then the file is downloaded from the URL and its contents are copied to <destination>. For example:

ADD http://domain.com/sourcefile.txt /tmp/destfile

This would save the entire contents of sourcefile.txt in the /tmp/ folder within the container, as a file named destfile.

If <source> is a URL and <destination> ends with a slash, the file name is taken from the URL and the file is downloaded to <destination>/<filename>. For instance:

ADD http://domain.com/file.txt /tmp/translate/

This would create file.txt in the /tmp/translate/ folder. For this to work, the URL must have a path that ends with the file name; a URL such as http://domain.com on its own will not work.
If authentication is required to download the file, use a tool such as wget or curl with the RUN instruction instead, since the ADD instruction does not support authentication.

If <source> is a local TAR archive in a recognized compression format (identity, gzip, bzip2 or xz), it will be unpacked in the destination directory. However, archives from remote sources (URLs) will not be unpacked.
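The sketch below contrasts the two archive rules. mytool-1.0.tar.gz and the example.com URL are hypothetical placeholders.

```dockerfile
FROM ubuntu:18.04

# Local archive in the build context (hypothetical file): unpacked into /opt/
ADD mytool-1.0.tar.gz /opt/

# Remote archive (hypothetical URL): downloaded as-is to /tmp/mytool-1.0.tar.gz, not unpacked
ADD https://example.com/downloads/mytool-1.0.tar.gz /tmp/mytool-1.0.tar.gz
```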

The rules below apply to both the ADD and COPY instructions:

If <source> is a directory, the entire contents of the directory are copied (but not the directory itself).
If <destination> does not exist, it is created. This also applies to all missing directories in the <destination> path.
If multiple <source> items are specified (either explicitly or due to using a wildcard) then <destination> must be a directory and the path must end with a slash /.
If <destination> does not end with a slash, it is treated as a regular file and the contents of <source> are written to it. For example, if the specified instruction is ADD sourcefile.txt /containertmp/destfile, the entire contents of sourcefile.txt will be saved in the /containertmp/ folder within the container as a file named destfile.
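The following sketch applies these rules with COPY (they behave the same with ADD). The file and directory names are hypothetical; /opt/tool/ does not need to exist beforehand, because missing destination directories are created.

```dockerfile
FROM ubuntu:18.04

# scripts/ is a directory: its contents land in /opt/tool/scripts/, not in /opt/tool/scripts/scripts/
COPY scripts/ /opt/tool/scripts/

# Several sources: the destination must be a directory ending with /
COPY run.sh helper.py /opt/tool/

# No trailing slash: the file is written as /opt/tool/settings.conf
COPY default.conf /opt/tool/settings.conf
```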

Learn more about the ADD instruction.

COPY
There are two major differences between ADD and COPY:

ADD can also take a URL as <source>; COPY cannot.
If the <source> parameter of the ADD instruction is a local archive in a recognized compression format, it is unpacked, whereas the COPY instruction only copies the archive file without unpacking it.
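The second difference is easiest to see side by side. In this sketch mytool-1.0.tar.gz is a hypothetical archive in the build context. Docker's own best-practice guidance is to prefer COPY unless you specifically need ADD's unpacking or URL support, since COPY's behavior is more predictable.

```dockerfile
FROM ubuntu:18.04

# ADD unpacks the local archive into /opt/
ADD mytool-1.0.tar.gz /opt/

# COPY keeps the archive as a single file: /opt/archives/mytool-1.0.tar.gz
COPY mytool-1.0.tar.gz /opt/archives/
```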

The COPY instruction is used to copy files or directories to the container at the specified path. COPY has the following format:

COPY <source>... <destination>

You can specify multiple source items. They must be located within the context of the build: you can only specify source paths inside the context directory (including its subdirectories), not paths such as ../directory/subdirectory. The <destination> argument can be either an absolute path or a path relative to the working directory set by WORKDIR.

The <source> parameter can also take wildcards in file names, for example:

COPY sample*.txt /script/

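This copies files such as sample12.txt, sampleabc.txt and sample_new.txt to the /script/ folder inside the container. The * wildcard matches any sequence of characters, including an empty one, so sample.txt would be copied as well.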

Wildcard matching follows the rules of Go's filepath.Match function. Learn more about pattern matching in Dockerfiles.

The COPY instruction follows the same rules for <source> and <destination> paths as the ADD instruction; see the list of shared rules in the ADD section.

Learn more about the COPY instruction.

ENV
The ENV instruction sets environment variables. Each variable is a key-value pair that is available to the instructions that follow it in the Dockerfile and inside containers started from the image. This instruction has the following forms:

ENV <key> <value>

ENV <key>=<value>

The second form of the instruction allows you to add multiple key-value pairs in the same instruction, by separating the pairs with a space:

ENV <key>=<value> <key>=<value> ...

Environment variables declared through the ENV instruction can also be used as variables by other instructions. For example:

ENV CODE_DIR /code/tmp/
COPY . $CODE_DIR

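When used as a variable by another instruction, an environment variable is written as either $variablename or ${variablename}. Avoid reusing the name of an existing system variable such as PATH for this purpose: overwriting PATH prevents the shell from finding executables in later RUN instructions.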
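The sketch below shows both ways an ENV variable is consumed. TOOL_VERSION and INSTALL_DIR are illustrative names, not variables CAVATICA expects.

```dockerfile
FROM ubuntu:18.04

# Two variables in one instruction (key=value form); the names are illustrative
ENV TOOL_VERSION=0.6.6 INSTALL_DIR=/opt/tool

# Docker substitutes ${INSTALL_DIR} itself in instructions such as WORKDIR and COPY
WORKDIR ${INSTALL_DIR}

# In RUN, the variables are part of the shell's environment, so the shell expands them
RUN echo "Installing version ${TOOL_VERSION} into ${INSTALL_DIR}"
```

Because ENV values persist in the image, running docker run <image> env would also list TOOL_VERSION and INSTALL_DIR.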

WORKDIR
The WORKDIR instruction is used to set the default working directory for the container. Instructions such as ADD, COPY, RUN or CMD that are entered after the WORKDIR instruction in a Dockerfile will be executed in the defined working directory.

Tip: Always use the WORKDIR instruction instead of RUN cd /directory/subdirectory/... to set the working directory. Each RUN instruction runs in a new shell, so a cd has no effect on later instructions.

This instruction can be used multiple times in a Dockerfile. If the directory that is specified as WORKDIR does not exist, it will be created.
You are also able to use variables previously set in the ENV instruction as arguments for WORKDIR. For example:

ENV APP_DIR /app
WORKDIR $APP_DIR

Dockerfiles for images that are intended for use on CAVATICA should include the following WORKDIR instruction:

WORKDIR /
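The sketch below contrasts RUN cd with WORKDIR and shows how a relative WORKDIR path is resolved. The /opt/tool directory is a hypothetical example.

```dockerfile
FROM ubuntu:18.04

# Each RUN starts a new shell, so this cd does not affect later instructions
RUN cd /opt
# Prints / (the ubuntu image's default working directory), not /opt
RUN pwd

# WORKDIR persists, and the directory is created if it does not exist
WORKDIR /opt/tool
# A relative path is resolved against the previous WORKDIR, giving /opt/tool/bin
WORKDIR bin
RUN pwd

# Reset to / for images intended for CAVATICA
WORKDIR /
```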

Dockerfile sample

The following code represents a sample Dockerfile:

#! Dockerfile for installing Java 1.8, Python 2.7 and Sambamba 0.6.6 !#

# Pull base image.
FROM ubuntu:18.04

# Set maintainer.
LABEL description='Dockerfile for installing Java 1.8, Python 2.7 and Sambamba 0.6.6' \
maintainer='Rosalind Franklin, Seven Bridges Genomics Inc., <email>'

# Define the commonly used JAVA_HOME variable (OpenJDK 8 location on 64-bit Ubuntu 18.04).
ENV JAVA_HOME /usr/lib/jvm/java-8-openjdk-amd64

# Install Java 1.8 and remove tmp files.
# OpenJDK 8 from the Ubuntu repositories is used because the ppa:webupd8team/java PPA
# that provided the Oracle Java 8 installer has been discontinued.
RUN apt-get update && apt-get install -y openjdk-8-jdk-headless && \
        apt-get clean && rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*

# Install Python 2.7 and pip, plus wget, bzip2 and CA certificates
# for downloading and unpacking Sambamba, and remove tmp files.
RUN apt-get update && apt-get install -y \
        python \
        python-pip \
        wget \
        bzip2 \
        ca-certificates && \
        apt-get clean && rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*

# Define working directory.
WORKDIR /opt

# Install Sambamba 0.6.6 and remove tmp files.
RUN wget https://github.com/lomereiter/sambamba/releases/download/v0.6.6/sambamba_v0.6.6_linux.tar.bz2 && \
        tar -xjvf sambamba_v0.6.6_linux.tar.bz2 && \
        rm sambamba_v0.6.6_linux.tar.bz2 && \
        chmod +x sambamba_v0.6.6 && \
        ln -s /opt/sambamba_v0.6.6 /bin/sambamba_v0.6.6

# Copy Dockerfile and Changelog.
COPY Dockerfile /opt/
COPY Changelog /opt/

# Reset the working directory to / as recommended for CAVATICA images.
WORKDIR /

Building an image from a Dockerfile and pushing it to the CAVATICA image registry

When you have created a Dockerfile, the image is built using the docker build command. The docker build command requires a Dockerfile and a context to build an image. It is common practice to put the Dockerfile at the root of the build context.

Important: Do not use the root directory of your file system (/) as the build context. The first step of the docker build command is to send the build context to the Docker daemon, so this would transfer the contents of your entire drive to the daemon.

This is how the docker build command is run using the current directory as the context:

docker build .

When building an image containing a tool to be used on CAVATICA, you need to tag the image with the CAVATICA registry address, your username, a repository name and a tag. The docker build command then has the following format:

docker build -t pgc-images.sbgenomics.com/<user_name>/<repository_name>:<tag> .

Note that <user_name> needs to be your CAVATICA username, while <repository_name> must be at least 3 characters long and can only contain lowercase letters, numbers and the characters ., - and _. Learn more about repositories in the CAVATICA image registry.

For example:

docker build -t pgc-images.sbgenomics.com/rosalind_franklin/samtools:v1 .

After the build process has been completed successfully, the next step is to log in to the CAVATICA image registry (pgc-images.sbgenomics.com) from the terminal:

docker login pgc-images.sbgenomics.com

Important: When prompted for a password, enter your authentication token, not your CAVATICA password.

Finally, you need to push the image you have created to the CAVATICA image registry:

docker push pgc-images.sbgenomics.com/rosalind_franklin/samtools:v1

Once the process has been completed, use the Tool Editor to provide a description of the tool on the Platform.

Creating a Dockerfile for an existing Docker image

If you already have a Docker image and want to create a Dockerfile for it, you can store the image on Docker Hub and reference it from a new Dockerfile, as follows:

docker login

When prompted, enter your Docker Hub credentials.
Push your Docker image to Docker Hub:

docker push <image>

Create a Dockerfile containing the following line:

FROM <image>

In this case, <image> is the reference to the image on Docker Hub, e.g. rosalind_franklin/my_image.
You are now able to build the image using the Dockerfile, in the way described above.
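A minimal Dockerfile of this kind might look like the sketch below. rosalindfranklin/my_image is a hypothetical Docker Hub reference (replace it with the image you pushed), and the last two lines apply the WORKDIR and CMD settings recommended above for CAVATICA images.

```dockerfile
# Hypothetical Docker Hub reference; replace with the image you pushed
FROM rosalindfranklin/my_image:latest

# Settings recommended above for images intended for CAVATICA
WORKDIR /
CMD ["/bin/bash"]
```

Building this Dockerfile with docker build -t pgc-images.sbgenomics.com/<user_name>/<repository_name>:<tag> . produces an image that can then be pushed to the CAVATICA image registry as described earlier.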