Docker Series - Intro to Docker

Foreword - How I stumbled upon Docker

So all this started when I was impressed after reading about Jepsen. I like that Kyle (the blogger) seems to have stumbled upon a generic framework for testing how distributed systems behave under node failure, network partitioning, etc., and that he was able to use this framework to do some nice analysis of Cassandra, Riak, etc. This got me thinking I wanted to do the same thing with Hazelcast, which I needed to see how it handles network partition scenarios, slow network responses, etc., and share my findings with the HZ dev team in the hope of helping them build a perfect product.

My problem was that I’m running on a not very impressive personal laptop and I don’t have the resources to start up 4 individual VMs while also doing some coding and debugging on the same machine. So I looked at some alternatives (it’s always great to be sidetracked by an interesting technology), and that's when I found out about Docker and got hooked on this thing, which promises great potential for automation, even in development environments.

### Containers and how they compare to VMs

So why would you want to consider using containers more often? Because they offer a lot of what VMs do, while being much lighter on resources.

When running containers, you can bind their ports to ports on the host machine (what is port 80 inside the container can be exposed as, say, port 2000 on the host), you can share host machine folders with them, and they can share folders with other running containers - all in all, very similar to what VMs can do.
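
As a quick sketch (the nginx image and the host path here are just examples I'm picking for illustration): the following runs a container in the background, exposes its port 80 as port 2000 on the host, and mounts a host folder inside the container:

$ docker run -d -p 2000:80 -v /home/me/shared:/shared nginx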

Sidenote: you don’t really need LXC if you just want to keep a process from hogging all the CPU, memory, disk, etc. and bringing down a machine. You can do that with plain cgroups, but I’m a developer, not a sysadmin, and I like the high-level (easier) interface, not the “bare metal” one.

While we won’t work directly with LXC containers, setting up an LXC container does not seem very hard; but as we'll see later, Docker hides the “ugly” plumbing from us and lets us work with a high-level interface.

Docker = LXC + Copy-on-Write FS (AuFS) + iptables + provisioning tools

Just like a VM is based on a VM image, Docker is also based on a Docker image, but Docker makes use of AuFS, a CoW (Copy-on-Write) file system. This is very useful because with a CoW FS, when two (or more) processes use the same files there is no need to create copies of those files; if one of the processes needs to change a file, it gets a separate private copy of the files it modifies, and only a diff is stored. This allows great reuse of disk space if you start more than one (imagine hundreds) of the same type of container. It also works with the concept of layers, which is even more interesting in the sense that you stack changes on top of one another (we’ll see this later).

Sidenote: at the time of writing, Docker requires kernel 3.8+ and the Ubuntu LTS version uses the 3.5 kernel, but you can upgrade the kernel very easily without going through a complicated process like recompiling it yourself.

What Docker does for containers:

You start with a base image. These images live inside the Docker Registry. The registry is open source and you can host your own inside your company. As you work with a container and continue to perform actions on it (e.g. download and install software, configure files, etc.), to have it keep its state (create a savepoint) you need to “commit”, very much like in a version control system. The above was a misunderstanding on my part, and maybe others will think the same, but containers will keep their state even when stopped and restarted. You can, however, commit the state of a container into a new image which you can reference and build upon.
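
To illustrate that last point (a minimal sketch, with $CONTAINER_ID standing in for an id you'd get from docker ps): stopping and then starting the same container brings it back with all the changes you made still in place, no commit needed:

$ docker stop $CONTAINER_ID
$ docker start $CONTAINER_ID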

Docker daemon and client

The docker process runs as a daemon and waits for commands to manage all the containers and Docker images. It needs to run as root so that it has access to everything it needs.

We give commands to the Docker daemon through the docker client, a command-line process.


“By default the Docker daemon listens on unix:///var/run/docker.sock and the client must have root access to interact with the daemon. If a group named “docker” exists on your system, docker applies ownership of the socket to the group.”

You can change the ownership of the Linux socket so it’s easier to invoke the docker client without sudo. See here how.
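
On Ubuntu that usually comes down to something like this (the group may already exist, the service may be named docker or docker.io depending on how you installed it, and you have to log out and back in for the group change to take effect):

$ sudo groupadd docker
$ sudo gpasswd -a ${USER} docker
$ sudo service docker restart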

There is also a REST-like API through which you can use curl to give commands to the daemon - read here.

It can also be exposed over an HTTP socket, so you can communicate with the daemon remotely.
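
As a sketch, assuming you started the daemon with an extra -H tcp://127.0.0.1:2375 so that it also listens on a local TCP port (don't expose this publicly), listing the running containers over the API looks like:

$ curl http://127.0.0.1:2375/containers/json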

We can fetch an image from a remote repository with the docker pull command. Let’s search for the ubuntu image on the docker registry and pull it locally.
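
That comes down to:

$ docker search ubuntu
$ docker pull ubuntu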

Let’s look at what is available locally now.
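
That's the docker images command:

$ docker images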

-i - interactive mode, it keeps STDIN open so commands that need user input work, like typing bash commands into the container.
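
The run command that goes with this would be along these lines (ubuntu being the image we pulled above, and -t allocating a pseudo-terminal):

$ docker run -i -t ubuntu /bin/bash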

and you enter a bash prompt inside the container

# apt-get install software-properties-common
# add-apt-repository -y ppa:webupd8team/java
# apt-get update
# apt-get install -y oracle-java7-installer

So this container now has Java installed, but if we don’t commit this new state, it will be lost as an image (the container itself still keeps its state, but we’d like to save it as an image so we can build other images based on it, or start new containers from it).
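
Committing looks roughly like this (the target image name is just a placeholder I'm using):

$ docker commit $CONTAINER_ID myname/java7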

If we open up another terminal: since bash is still running inside the container, it keeps the container in a running state. To list all running containers:

$ docker ps

Congratulations on your first running Docker container!

So we can use this container as a base for others, building layer upon layer: for example a java7&maven3 image, and on top of this maybe two other images:

  • java7&maven3 + Tomcat7
  • java7&maven3 + Jetty9

Due to AuFS's sharing of files, disk space is reused for the shared layers.

Space is being reused because only the difference between a container instance and its image needs to be kept.

We can check the hierarchy and the dependencies between images as a tree.
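
On the old Docker versions this article was written against, that was (if I remember the flag right) docker images with the --tree option:

$ docker images --tree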

PS: this tree command, it seems, has been removed from recent versions of docker in an effort to keep the core as small as possible.

Dockerfile

Until now I’ve shown how to build an image by typing into the container's bash and committing, but this gets tedious and is not very transparent; it would be nice to just be able to write a “recipe” for how to build an image automatically, like with a provisioning tool. It also makes sense to be able to see what commands created an image, because otherwise I might build upon a “java+tomcat7” public image that also contains some kind of hidden “malware” commands.

And that is exactly what the Dockerfile is, and I think it’s very easy to understand once you have a look at one. Write this in a text editor and save it with the name Dockerfile in a directory:
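
As a minimal sketch mirroring the manual java7 steps from above (the MAINTAINER line and the debconf line that pre-accepts the Oracle license are extras you'd typically want, adjust to taste):

FROM ubuntu
MAINTAINER Your Name <you@example.com>

# pre-accept the Oracle license so the installer doesn't block the build
RUN echo oracle-java7-installer shared/accepted-oracle-license-v1-1 select true | debconf-set-selections

RUN apt-get update
RUN apt-get install -y software-properties-common
RUN add-apt-repository -y ppa:webupd8team/java
RUN apt-get update
RUN apt-get install -y oracle-java7-installer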

Pretty self-explanatory, right? This is only the basic form of one, but it's still quite sufficient; we’ll get into more details in part II.

To build the image based on the recipe file, run docker build from the directory where you saved the Dockerfile, giving the image whatever name you want through the -t parameter.
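
For example (the name here is just a placeholder):

$ docker build -t myname/java7 .

The trailing dot tells docker to use the current directory (where the Dockerfile lives) as the build context.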

and after it’s finished it will output the new image id.

What Docker is actually doing is running each command in the Dockerfile in a new container and committing a new image after each one, so if we look at the newly created image we’ll see a chain of multiple commits that produced it. This is useful for command caching.
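
You can see that chain with docker history (using the image name from the build above):

$ docker history myname/java7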

Caching the commands

It’s very nice that it only takes time on the first run of the build to pull the dependencies; after that it caches the commands, so if you run them again (you rebuild using the same Dockerfile) without changing their order, it’s not going to run the previous steps again - it knows to build on top of the previously committed layers. If, however, you change an instruction in the Dockerfile or re-order some instructions, then Docker will only re-use the previously built containers up to the point where the Dockerfile changed.

Docker Images vs Dockerfiles

To sum up again: Dockerfiles are recipes that, when given to the docker build command, result in Docker images - the immutable images that Docker uses to run your code.

It’s also important to understand because it answers the question: Q: “If two different people use the same Dockerfile to build an image, does it mean they end up with the same image to run on?” A: Not necessarily. Say we both use the same “Java7” Dockerfile, but I build my image a month after you. Even though we both used the same Dockerfile, when my build process pulled java7 from the repository it may have gotten a newer minor version than the one in your image from a month ago. The same is true for apt packages.

So if we really need to make sure we run the same thing, you'd rather build your image and publish it somewhere (a public or private repository). Then I’d pull it from there, and I can be sure we’re running the exact same setup.

Publishing your docker image to a repository

Like pushing to a remote repository in git, publishing is done through docker push.
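
Roughly like this (you need an account on the registry, and the image has to be tagged with your username as a prefix, myname here being a placeholder):

$ docker login
$ docker push myname/java7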

IMPORTANT: When you run docker push to publish an image, you're not publishing your source code (the Dockerfile), you're publishing the Docker image that was built from your source code.

If you go browsing around the Docker public index, you’ll see lots of images listed there, but weirdly, for most of them you can’t see the Dockerfile that built them (for “trusted builds” you can). The image is an opaque asset that is compiled from the Dockerfile.

The lack of quick insight into what exactly went into a build, and the trust issue of whether it contains some other “hidden” functionality, was solved by the “Trusted build” feature: you link your GitHub account, you publish and change your Dockerfile there, and a post-commit build hook triggers the build. That way the published image is guaranteed to be exactly the one built from the Dockerfile.

Deleting the built images gotcha

I can’t finish without telling you about this little gotcha that got me scratching my head.

There are two separate commands for deletion:

A. remove a container - docker rm $CONTAINER_ID
B. remove an image - docker rmi $IMAGE_ID

And if you want to remove a certain image, you cannot do so unless you first explicitly remove the containers that were created from it. What is not obvious is that even if you don't have any active containers running from that image, you still cannot remove it.

A container still exists even when stopped, with its state preserved. By default Docker will not remove it, since it could be restarted from that state if you wanted.

For every command in the Dockerfile, Docker will create an “intermediate” container during the build, so a lot of “leftover” containers pile up. There is a build option to remove these intermediate containers when building from a Dockerfile.
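
That option is the --rm flag on docker build (on newer Docker versions it is on by default), so a build would look like:

$ docker build --rm -t myname/java7 .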

To list all the containers.
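
Passing -a to docker ps shows the stopped ones as well as the running ones:

$ docker ps -a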

you can see what command the container last ran and check its status - whether it's ‘Up for …’ or ‘Exited’.

To save yourself the hassle, this command removes any non-running containers (it will fail for the running ones):
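
One common way to do it, which just feeds every container id into docker rm (the running ones simply error out and are left alone):

$ docker rm $(docker ps -a -q)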

Then you can try again to remove the image you want. All this cleaning up of stopped containers and unused images can be quite a chore.


Please ask any questions if you have them. For now I’m preparing Part II, where I’ll go into more detailed explanations of Dockerfiles, best practices, and gotchas of Docker.