Binary file added docs/assets/images/6090-g1689.png
301 changes: 293 additions & 8 deletions docs/chapters/chapter_02.md
@@ -1,15 +1,300 @@
## Learning outcomes

**After having completed this chapter you will be able to:**

- fetch and run Docker containers on your computer
- interpret the instructions of a Dockerfile
- create simple Docker containers to run simple Python/R scripts

## Material

TODO: add overview of necessary files, video, etc
TODO: UTF-8 encoding apostrophes?

[:fontawesome-solid-file-pdf: Download the presentation](../assets/pdf/docker_dance.pdf){: .md-button }

* Unix command line [E-utilities documentation](https://www.ncbi.nlm.nih.gov/books/NBK179288/)

## 2.1 Docker Dance

We will use Docker as an example to illustrate the development and use of containers.

### Install Docker

Install the latest version of Docker Desktop for your operating system by following the instructions at [Get Docker](https://docs.docker.com/get-docker/).

TODO: add screenshots of the desired end points per OS

### Introducing the Dockerfile

TODO: add notes about good documentation of the recipe and how to freeze versions of tools in a container image

The Dockerfile is the starting point of the Docker Dance, which is sketched in the figure below.

![Docker Dance](./../assets/images/6090-g1689.png){: style="width:650px;"}

Now, let's focus on the instructions for building Docker container images. They are saved in a text file that is named **Dockerfile** by default.

Here is a basic recipe with three statements: one FROM and two RUN statements.

```dockerfile title="Dockerfile"
FROM ubuntu:18.04

RUN apt-get update && apt-get -y upgrade
RUN apt-get install -y wget
```

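The **FROM** statement specifies the parent image. This is typically an operating system, but you can also use an image provided by other parties as a starting point. This instruction creates the base layer.

```dockerfile
FROM ubuntu:18.04
```

Recommendation: pin the version of the operating system in the base layer (here `ubuntu:18.04` rather than just `ubuntu`, which means `ubuntu:latest`), so that rebuilding the recipe later starts from the same release.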

The **RUN** statement specifies the command to execute inside the image filesystem.

Think about it this way: every RUN line is essentially what you would type to install programs on a freshly installed Ubuntu system. By default, these commands are executed as root while the image is being built.

```dockerfile
RUN apt-get install -y wget
```

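Each instruction in the recipe corresponds to a **layer** of the final image. Strictly speaking, only `RUN`, `COPY` and `ADD` add content to the image filesystem; the other instructions only add metadata.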

TODO: add image such as https://houseofnasheats.com/wp-content/uploads/2019/02/Layered-Rainbow-Jello-11.jpg

### Anatomy of the commands

With this basic Dockerfile, we can already start the build process, which creates an image (see the sketch of the Docker Dance above).

**Building a Docker image**

By default, the build command looks for a file named `Dockerfile` at the root of the build context, here the current directory:

```sh
docker build .

# or by specifying the exact file name

docker build --file Dockerfile .
```

**Syntax**: `--file` / `-f`

`.` stands for the build context (in this case, the current directory): the set of files that Docker sends to the build and that the recipe can use, for instance when copying files from the host filesystem.

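!!! info

    Avoid build contexts (directories) overpopulated with files, even if these files are not used in the recipe: the whole context is sent to the Docker daemon at the start of every build, which slows it down.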
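
When a directory does contain files that the recipe does not need, a `.dockerignore` file at the root of the build context tells Docker not to send them. The sketch below creates a small dedicated context directory; the directory name `mybuild` and the three patterns are hypothetical examples to adapt to your project.

```sh
# A dedicated build context that holds only what the recipe needs
mkdir -p mybuild

# Patterns listed in .dockerignore are excluded from the build context
printf '%s\n' '*.fastq.gz' 'results/' '.git' > mybuild/.dockerignore

cat mybuild/.dockerignore
```

You would then build with `docker build mybuild`, so that `mybuild` is the context.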

You can give the image a specific name during the build process.

**Syntax**: `-t imagename:tag`. If `:tag` is omitted, the tag defaults to `latest`.

```sh
docker build -t mytestimage:v1 .
```

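Once the build process has finished without errors, you are good to go. With the legacy builder, the output ends with `Successfully built ...` (plus `Successfully tagged ...` when you used `-t`); with BuildKit, the default builder in current Docker versions, it ends with the `exporting to image` step instead.

As a next step, check with the command `docker images` that the newly built image appears in the list of images.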

TODO: add output of the command as screenshot

```sh
docker images
```
Note the ID of the image (column `IMAGE ID`), as we will run it later. But first, let's look at some additional statements for the recipes.

**Additional statements for the Dockerfile**

TODO: refine table with following content:

| Instruction | What does it do? | Example |
|-------------|------------------|---------|
| LABEL | Adds metadata, e.g. who is maintaining the container image. | `LABEL maintainer="your name <your.email@domain.org>"` |
| WORKDIR | Sets the working directory in which all subsequent instructions (and the running container) are executed. `~` is not expanded, so use an absolute path. | `WORKDIR /data` |
| COPY | Copies a local file or directory from the build context on your host (the machine from which you are building the image) into the image. | `COPY script.py .` (`COPY source destination`) |
| ADD | Same as COPY, but ADD also works for URLs, and local `.tar` archives are automatically extracted upon being copied. | `ADD data.tar.gz /data/` |
| ARG | Defines a variable that is available only while the image is built. | `ARG VERSION=1.0` |
| ENV | Defines an environment variable that is also available in future running containers. | `ENV LANG=C.UTF-8` |
| ENTRYPOINT | Specifies a command that will always be executed when the container starts. | `ENTRYPOINT ["/usr/bin/wget"]` |
| CMD | Specifies default arguments that are fed to the ENTRYPOINT (or, without ENTRYPOINT, the default command). They are replaced by any arguments given to `docker run` after the image name. | `CMD ["--help"]` |

**Further reading**

The difference between ADD and COPY is explained [here](https://stackoverflow.com/questions/24958140/what-is-the-difference-between-the-copy-and-add-commands-in-a-dockerfile) and [here](https://nickjanetakis.com/blog/docker-tip-2-the-difference-between-copy-and-add-in-a-dockerile).

The difference between ARG and ENV is explained [here](https://vsupalov.com/docker-arg-vs-env/).

### A more complex recipe

Here is a more complex recipe (save it in a text file named `Dockerfile`):

TODO: check this part once R/Python scripts are available

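```dockerfile title="Dockerfile"
FROM ubuntu:18.04

LABEL maintainer="your name <your.email@domain.org>"
# WORKDIR does not expand "~", so use an absolute path
WORKDIR /data

# Update the package index and install in the same RUN statement
RUN apt-get update && apt-get -y upgrade && \
    apt-get install -y wget

ENTRYPOINT ["/usr/bin/wget"]
CMD ["https://cdn.wp.nginx.com/wp-content/uploads/2016/07/docker-swarm-hero2.png"]
```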

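**Tips for Dockerfiles**

Splitting the Dockerfile into several steps (layers) allows for better caching: when you change one instruction, Docker rebuilds only that layer and the ones after it, and reuses the cached layers before it. Place the instructions that change least often at the top.

For example, for `apt-get`:

- Run `apt-get update` and `apt-get install` in the same `RUN` command. Otherwise, the cached `apt-get update` layer is reused when you later change the install line, and the package index may be outdated.
- Always use `apt-get install -y`, because you have no control over the process (you cannot answer prompts) while the image is building.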


**Useful resources**

- [Dockerfile reference](https://docs.docker.com/engine/reference/builder/)
- [Best practices](https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1008316)
- [Ten simple rules for writing Dockerfiles for reproducible data science](https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1008316)

TODO: add more difficult exercises with examples from bioinformatics (see the BioContainers community)

### Running our Docker container

Now we want to use what is inside the image.

`docker run` creates a fresh container (a running instance of the image) from a static Docker image, and starts it.

The format is:

```sh
docker run [docker options] <IMAGE NAME> [image arguments]
```

This means that arguments that affect the way Docker runs must always go before the image name, but arguments that are passed to the image itself must go after the image name.

```sh
docker run ubuntu:18.04 /bin/ls
```

TODO: add command which is from the built container we use above

TODO: add command which is from the example we use in the current R/Python scripts

**Exercise:** Run `ls` in your current working directory on your own computer. Is the result the same?

!!! info

    You can execute any program/command that is stored inside the image.

```sh
docker run ubuntu:18.04 /bin/whoami
docker run ubuntu:18.04 cat /etc/issue
```

??? done "Answer"

    Anything surprising happened?

**List running containers**

```sh
docker ps
```

List all containers (whether they are running or not):

```sh
docker ps -a
```

The IDs that are shown can be useful for other docker commands like `docker stop` and `docker exec`.
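
A sketch of a typical container life cycle, assuming the hypothetical container name `sleepy` (a name is easier to type than an ID): start a long-running container in the background, check it, run a command inside it, then stop and remove it.

```sh
# Start a container in the background (--detach) that just sleeps for 5 minutes
docker run --detach --name sleepy ubuntu:18.04 sleep 300

# It appears in the list of running containers
docker ps --filter name=sleepy

# Run an extra command inside the running container
docker exec sleepy cat /etc/issue

# Stop it (this can take about 10 s: `sleep` runs as PID 1 and ignores the
# default stop signal, so Docker kills it after the timeout), then remove it
docker stop sleepy
docker rm sleepy
```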

### Volumes

Docker containers are isolated from the host system. By default, a container cannot access data on the host, so you cannot use host data in your containers. In addition, all data written inside a container is lost when the container is removed. To handle input/output files, it is therefore necessary to mount volumes.

TODO: check about mount bind statements

You can solve this in two ways:

- `-v /path/in/host:/path/in/container`: this bind-mounts a host file or directory into the container. Writes to one affect the other. Both paths have to be absolute, so you often want to use `$(pwd)/some/path`.
- `-v volume_name:/path/in/container`: this mounts a named volume into the container. The volume is managed by Docker and lives separately from the rest of your files. This is preferred, unless you need to access or edit the files from the host.

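```sh
mkdir datatest
touch datatest/test

# Start a container in the background; `tail -f /dev/null` keeps it running
docker run --detach --volume "$(pwd)"/datatest:/scratch --name fastqc_container biocontainers/fastqc:v0.11.9_cv7 tail -f /dev/null

# Open an interactive shell inside the running container
docker exec -ti fastqc_container /bin/bash
# Inside the container, type:
#   ls -l /scratch    (shows the file "test" created on the host)
#   exit

# Stop and remove the container when you are done
docker rm --force fastqc_container
```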

TODO: Insert example exercises

## 2.2 Container registries (e.g. Docker Hub)

Images can be stored locally or shared in a registry. Docker Hub is the main public registry for Docker images.
Let’s search Docker Hub for the keyword “ubuntu”.

TODO: insert screenshot of the output

There are many alternatives to Docker Hub for image registries, depending on the needs of the organisation or company. Some examples are shown below:

TODO: insert image of the registries


1. Get the latest image (without a tag, Docker pulls the `latest` tag)

```sh
docker pull ubuntu
```

TODO: add output

2. Check which versions (tags) of Ubuntu are available and fetch version 18.04 using its tag

TODO: add screenshot

```sh
docker pull ubuntu:18.04
```

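`docker pull` downloads the image from Docker Hub, the cloud registry of Docker images; only the layers that are not yet present on your machine are transferred. `docker run` does the same automatically: it first looks for the image on your local machine and, when it cannot find it, pulls it from Docker Hub.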

What other repositories are available? Have a look at the website https://biocontainers.pro/, a specific directory of bioinformatics-related tools:

- the images are stored in Docker Hub and/or Quay.io (the Red Hat registry);
- these images are normally created from [Bioconda](https://bioconda.github.io).

Example: FastQC (https://biocontainers.pro/#/tools/fastqc)

TODO: Open solution
```sh
docker pull biocontainers/fastqc:v0.11.9_cv7
```

Images can be listed with either of the following (equivalent) commands:

```sh
docker images
docker image ls
```
Each image has a unique IMAGE ID.

TODO: add image with example

Where are these images stored? On Linux, they are stored by default under `/var/lib/docker/`; Docker Desktop keeps them inside its own virtual machine.
Docker uses a lot of storage, so regular cleaning is necessary. We will see later how to purge unused data.
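
To see how much disk space Docker currently uses, broken down into images, containers, local volumes and build cache, a minimal sketch:

```sh
# Summary of disk usage per type of Docker object
docker system df

# Detailed view, one line per image, container and volume
docker system df --verbose
```
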
Sometimes it is also useful to get more information about an image. You can do this with:

```sh
docker image inspect ubuntu:18.04
```