Docker Series - Docker part II

I'll be more explicit about some commands that you may want to use in your Dockerfile and explain how Docker works through them.
For an introduction to Docker, see Part I

A docker container will keep running in the background as long as the initial command executed within the container is still running.

CMD and ENTRYPOINT

You can define the command that will be run inside the container.

ENTRYPOINT ["/usr/java/latest/bin/java"]
CMD ["-version"]

So this setup defines that when we invoke

$ docker run above/testcontainer

the command executed inside the container will be /usr/java/latest/bin/java -version, which simply prints the Java version. The ENTRYPOINT part cannot be overridden by the arguments passed at runtime, while the CMD part only supplies a default value that can be overridden.

So to be useful, this container needs some extra parameters to do something other than print the Java version.

$ docker run above/testcontainer -jar test.jar


Of course, the test.jar in this case needs to be inside the container for the java process to see it.
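A minimal sketch of how that could look, assuming test.jar sits next to the Dockerfile (the image and file names are just placeholders):

FROM above/testcontainer
# copy the jar from the build context into the image
ADD test.jar /test.jar
# default to running the jar instead of just printing the version
CMD ["-jar", "/test.jar"]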

So the container will be running for as long as the java process is running.
We can terminate it with:

docker stop $container_id

which will issue a SIGTERM and then, after a grace period, a SIGKILL to the process running inside the container.
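The grace period before the SIGKILL can be adjusted with the -t option (it defaults to 10 seconds):

# give the process 30 seconds to shut down cleanly before the SIGKILL
docker stop -t 30 $container_id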

When building the image from the Dockerfile, the commands inside the Dockerfile are not all run in a single (the same) container. After each command, Docker stops the container and saves the delta from the previous step as a new layer. It then starts a container from this newly saved layer in order to execute the next command in the file.

This also explains why you'll see no cd command by itself in a Dockerfile: its effect would not be saved. The cd commands are inlined with the actual commands.
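So instead of a standalone cd you'll typically see something like this sketch (the /opt/app path is just an example):

# the directory change only lives for this single RUN command
RUN cd /opt/app && mvn package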

So this raises the question: how can you run multiple processes inside a container (let's say Tomcat and SSH)? You'd want Tomcat to serve your app and SSH so you can log in and check, say, the Tomcat logs from inside the container.

One might think you can just start SSH as a background service in the container and then start Tomcat, something like:

RUN /usr/bin/sshd -D
RUN service tomcat7 start

but this wouldn't work, since by the time Docker processes the service tomcat7 start command, it will already have shut down the previous container in which sshd was running.

So it seems there is a big problem: Docker can effectively only run a single process per container.
The answer is a tool like Supervisor, which itself takes care of starting the other processes, redirecting their output to log files and restarting them when they die.

Example Dockerfile

FROM java/mvn

# We want our SSH key to be added
RUN mkdir /root/.ssh && chmod 700 /root/.ssh
ADD id_dsa.pub /root/.ssh/authorized_keys

## sshd requires this directory
RUN mkdir /var/run/sshd
RUN chmod 400 /root/.ssh/authorized_keys && chown root:root /root/.ssh/authorized_keys

## we copy a config file from the host to the container
ADD supervisord.conf /etc/supervisor/conf.d/supervisord.conf

EXPOSE 22 5701

## we declare supervisord as the default command for the container
CMD ["/usr/bin/supervisord"]

Here is a simple supervisord.conf example:

[supervisord]
nodaemon=true

[program:sshd]
command=/usr/sbin/sshd -D
stdout_logfile=/var/log/supervisor/%(program_name)s.log
redirect_stderr=true
autorestart=true

[program:hztest]
command=java -jar /root/hazelcast/java/hztest.jar
stdout_logfile=/var/log/supervisor/%(program_name)s.log
redirect_stderr=true
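Building and running the image could then look something like this (the image tag and host ports are just examples):

$ docker build -t balamaci/hztest .
$ docker run -d -p 2022:22 -p 5701:5701 balamaci/hztest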

EXPOSE

# We declare two container internal ports as mappable from the outside
EXPOSE 8080 22

As far as the applications running inside the container are concerned, they are listening on the container's private ports.
You expose a port from the container so it can receive connections from other containers or from the outside. When the container is started with the -p interface:host_port:container_exposed_port run option, we can bind that port to an interface of the host.
For example, we can expose Tomcat inside the container to receive connections directly from the internet, but SSH only for local connections:

docker run -p 0.0.0.0:9000:8080 -p 127.0.0.1:2022:22 <image> <cmd>

and you can access http://host_ip:9000 and the Tomcat inside the container will respond.

If you don't explicitly map the private port, a random port on the default docker interface will be mapped.
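You can also publish all the EXPOSEd ports at once to random host ports with -P, and then check which host port was assigned with docker port (the image and container id below are placeholders):

$ docker run -d -P <image>
$ docker port <container_id> 8080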

Sidenote: running containers are assigned an internal IP address on the docker0 bridge, and you can find their IP (and a lot of other info) by running:

sudo docker inspect CONTAINER_ID


ADD

Copies files from the host filesystem into the temporary build container, so they end up saved inside the container image.

Example: copy the user's public key into the container's authorized_keys for easier SSH login through key authentication.

# And we want our SSH key to be added
RUN mkdir /root/.ssh && chmod 700 /root/.ssh
ADD id_dsa.pub /root/.ssh/authorized_keys

It doesn't matter if you change the files you ADDed after building the image; they are frozen inside the container image as they were at the time of creation. For the container to see the new file version you need to rebuild the image. (For handling dynamically changing files see VOLUME below.)

Remember that when rebuilding an image, Docker caches the commands in the Dockerfile. Since you want the image to contain the new version of the file, the ADD command invalidates the cache for itself and all following commands; therefore it's good practice to place the ADD instructions near the bottom of the Dockerfile.

As of version 0.8, Docker has improved the caching mechanism so that the cache for ADD is not invalidated if the added files did not change (the cache key is timestamp related).
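In practice this means ordering the Dockerfile so that the rarely-changing setup comes first and the frequently-changing files come last. A sketch with hypothetical package and file names:

FROM java/mvn
# rarely changes, so it stays cached across rebuilds
RUN apt-get update && apt-get install -y supervisor openssh-server
# changes often, so it goes last: only this layer and the ones after it get rebuilt
ADD target/app.jar /opt/app/app.jar
CMD ["/usr/bin/supervisord"]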

VOLUME

A volume can be thought of as a directory external to the container's filesystem that can be mounted at runtime.

VOLUME ["app/conf", "/var/app/data", "/var/logs"]

Volumes are immensely useful because:

  • They can be a good way to "configure" a container at runtime. Imagine, for example, passing different configuration directories to each container:
docker run -v /home/balamaci/project/app/conf_1:app/conf -v /home/balamaci/project/logs:/var/logs java/test_app

Through the docker run -v option you can actually bind any host folder to any folder inside the container; the folder doesn't necessarily have to have been declared in the Dockerfile through the VOLUME directive.
So does the VOLUME directive have any "real" use? Yes, it does: it marks that directory, with the content inside it, as not going into the image.

  • They can be a way to share data / deliverables between containers and also the host.
  • More convenient backups of data.

Mounted volumes are not part of the container image. Changes to a data volume are made directly, without the overhead of a copy-on-write mechanism. This is good for very large files.

As a real-world example, I also mapped my project jar / war files from the host as a volume inside the container.
That way, when I need to change something in the sources and rebuild, I only need to do it once on the host and the containers will have the new package version without having to be rebuilt.
The alternative of writing them (through either git pull or ADD) into the image would mean a very tedious process of rebuilding, with leftover images for each rebuild. Of course, once your files are in a stable, releasable state, writing them inside the image makes more sense.

Mapping a /logs volume on the host for each container is, I think, a good idea since it makes checking the log files and backing them up more convenient.
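Something along these lines (the host paths and image name are just placeholders):

docker run -v /home/balamaci/project/target:/opt/app -v /home/balamaci/docker/logs/app1:/var/logs balamaci/test-app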

ENV

#sets an environment property
ENV JAVA_HOME=/usr/java/latest

Subsequent Dockerfile commands will see this property.
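For example, a small sketch (the java path is the one from the ENTRYPOINT example at the top):

ENV JAVA_HOME=/usr/java/latest
# the RUN command below already sees JAVA_HOME
RUN $JAVA_HOME/bin/java -version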

With Docker you can pass different environment properties at runtime, which means added flexibility: you can read them directly in your program or through bash scripts and, for example, generate configuration files for processes.

Something like:

#!/bin/bash

if [ "$RUN_MODE" = "production" ] ; then
sed -e "s#\${user}#$DB_USER#" -e "s#\${password}#$DB_PASSWORD#" env.template.properties > env.properties
fi

This script uses sed to replace, in the env.template.properties template, constructs like the ones below with the ENV values, writing the result to env.properties:

mysql_user=${user}
mysql_password=${password}

and we can run it with:

docker run -e RUN_MODE=production -e DB_USER=john -e DB_PASSWORD=pass balamaci/test-app

Storing the Docker images in a custom location

I'm running a 64GB SSD on my laptop and I'd rather keep the images and containers on a 16GB stick.
By default Docker keeps its data in /var/lib/docker.
We can change that by editing the Docker config file /etc/default/docker to pass extra parameters to the Docker daemon:

DOCKER_OPTS="-g /mnt/my_custom_location"

Alternatively, you can bind mount the custom location over the default directory:

sudo mount --bind /mnt/my_custom_location /var/lib/docker

Restart the docker daemon with

$ sudo service docker restart
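As a quick sanity check (assuming the daemon restarted cleanly), pulling a small image should now populate the custom location:

$ sudo docker pull busybox
$ ls /mnt/my_custom_location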

Docker containers IPs:

$ sudo docker inspect -format='{{.NetworkSettings.IPAddress}}' $INSTANCE_ID

or to show all running containers' IPs:

for i in $(sudo docker ps -q);
   do sudo docker inspect -format='{{.NetworkSettings.IPAddress}}' $i
done

Run docker command without sudo

The docker client (the docker command) communicates with the docker daemon through a Unix socket (/var/run/docker.sock). You can change the ownership of that socket. By default Docker will assign the socket to the docker group if that group exists, so members of that group can run docker commands without sudo.
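The usual steps look like this; you need to log out and back in (or start a new session) for the group change to take effect:

# create the docker group (it may already exist) and add your user to it
sudo groupadd docker
sudo usermod -aG docker $USER
# restart the daemon so the socket gets the new group ownership
sudo service docker restart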

Docker Gotchas

Containers are NOT ephemeral. After a container's process exits, or when you've stopped it explicitly, it still exists (that's also why you cannot delete the image it's based on). The log files and state at the time it was stopped still exist inside the container; if you restart it, it will NOT start fresh from the "clean state" of the snapshot image, as some (including myself) had thought.

Still, it can be helpful to keep separate volumes for the data and logs directories so you can easily back them up or inspect them.

NOT easy to set a static IP for a container. Docker creates a special Linux bridge network called docker0 on startup. All containers are automatically connected to this bridge network and the IP subnet for the containers is picked by Docker. Currently it is not possible to directly influence the particular IP address of a Docker container, and restarted containers might get a different IP (as of version 0.7).

Files like "/etc/hosts" and "/etc/resolv.conf" are readonly inside container Combined with the issue above it's not very easy to reference other containers by name with an easy setup.

UPDATE: you can use docker run --link container_name, or docker-compose, to link containers and then reference a host by name. Or use docker network to create subnetworks for isolation.
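A minimal linking sketch (the container and image names are just examples):

# start a named container, then link it into a second one
docker run -d --name db some/database-image
docker run --link db:db balamaci/test-app
# inside the second container the hostname "db" now points to the first one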

Layer limitations

Dockerfile instructions are repeatable, but at present the AUFS limit of 42 layers means you’re encouraged to group similar commands where possible (i.e. combining separate apt-get install lines into one RUN command). The impact of this is that a single change to a long list of required apt-get packages means invalidating Docker’s build cache for that command and all those which follow it.
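For example, collapsing the installs into one RUN (the package names are just illustrative):

# one layer instead of four; changing any package invalidates this layer and everything after it
RUN apt-get update && apt-get install -y \
    openssh-server \
    supervisor \
    curl \
    vim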

Docker and Ansible

Sometimes you may require extensive configuration for each container. I've seen a lot of people recommend Ansible as a provisioning tool. It's written in Python and seems easier to grasp than other provisioning tools like Chef or Puppet. Some suggest that you start with a basic container, have it configured through Ansible and in the end commit the result, forgetting about the Dockerfile altogether.
Here's a very good starting point for it.

References

Docker guide
Automated deployment with Docker – lessons learnt
