Lab 3
By default all files created inside a container are stored on a writable container layer. That means that:
If the container no longer exists, the data is lost,
The container's writable layer is tightly coupled to the host machine where the container is running, so you cannot easily move the data somewhere else, and
Writing into the container's writable layer requires a storage driver to manage the filesystem. The storage driver provides a union filesystem, using the Linux kernel. This extra abstraction reduces performance compared to data volumes, which write directly to the host filesystem.
Docker provides two options to store files on the host machine: volumes and bind mounts. If you're running Docker on Linux, you can also use a tmpfs mount, and with Docker on Windows you can also use a named pipe.
Volumes are stored in a part of the host filesystem that is managed by Docker.
Bind mounts may be stored anywhere on the host system.
tmpfs mounts are stored in the host system's memory only.
Originally, the `--mount` flag was used for Docker Swarm services and the `--volume` flag was used for standalone containers. From Docker 17.06 onward, you can also use `--mount` for standalone containers, and it is in general more explicit and verbose than `--volume`.
A data volume, or volume, is a directory that bypasses Docker's Union File System.
There are three types of volumes:
anonymous volume,
named volume, and
host volume.
Let's create an instance of a popular open source NoSQL database called CouchDB and use an anonymous volume to store the data files for the database.
To run an instance of CouchDB, use the CouchDB image from Docker Hub at https://hub.docker.com/_/couchdb. The image documentation says that, by default, Docker manages the storage of the CouchDB database files. It writes them to disk on the host system using its own internal volume management.
Run the following command,
Docker creates an anonymous volume for the CouchDB data and gives it a generated hash as its name. Check the volumes on your host system,
Set an environment variable `VOLUME` with the value of the generated name,
Then inspect the volume that was created, using the hash name that was generated for it,
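You see that Docker has created and manages a volume in the Docker host filesystem under `/var/lib/docker/volumes/$VOLUME/_data`. This path is part of the Docker-managed filesystem. With Docker Desktop it lives inside Docker's Linux VM rather than on your host machine, and even on a native Linux host it should not be modified by processes other than Docker.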
Create a new database `mydb` and insert a new document with a `hello world` message.
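One way to do this is with `curl` against CouchDB's HTTP API. This sketch assumes the container publishes port 5984 on `127.0.0.1` and was started with the hypothetical credentials `admin`/`passw0rd1`. The document ID `1` and the field name `msg` are also illustrative.

```shell
AUTH="admin:passw0rd1"          # hypothetical credentials
COUCHDB_URL="http://127.0.0.1:5984"

# Poll the server root until CouchDB accepts requests.
until curl -sf -u "$AUTH" "$COUCHDB_URL/" >/dev/null; do
  sleep 1
done

# Create the database mydb.
curl -s -X PUT -u "$AUTH" "$COUCHDB_URL/mydb"

# Insert a document with ID 1 that holds a hello world message.
curl -s -X PUT -u "$AUTH" \
  -H 'Content-Type: application/json' \
  -d '{"msg": "hello world"}' \
  "$COUCHDB_URL/mydb/1"
```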
Stop the container and start the container again,
Retrieve the document in the database to test that the data was persisted,
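A sketch of the restart and the check, assuming the hypothetical container name `my-couchdb` and credentials `admin`/`passw0rd1`. `docker stop` and `docker start` reuse the same container, so its anonymous volume stays attached.

```shell
AUTH="admin:passw0rd1"          # hypothetical credentials
COUCHDB_URL="http://127.0.0.1:5984"

# Stop and restart the same container; its anonymous volume is kept.
docker stop my-couchdb
docker start my-couchdb

# Wait for CouchDB to come back up, then fetch the document.
until curl -sf -u "$AUTH" "$COUCHDB_URL/" >/dev/null; do
  sleep 1
done
DOC="$(curl -s -u "$AUTH" "$COUCHDB_URL/mydb/1")"
echo "$DOC"
```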
You can share an anonymous volume with another container by using the `--volumes-from` option.
Create a `busybox` container named `busybox1` with an anonymous volume mounted to a directory `/data` in the container and, using shell commands, write a message to a log file.
Make sure the container `busybox1` is stopped but not removed.
Then create a second `busybox` container named `busybox2`, using the `--volumes-from` option to share the volume created by `busybox1`,
Docker created the anonymous volume that you were able to share using the `--volumes-from` option, and created a new anonymous volume.
Clean up the existing volumes and containers.
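A sketch, assuming the hypothetical container names `my-couchdb` and `busybox1`. The `-v` flag of `docker rm` also removes the anonymous volumes attached to each container.

```shell
# Force-remove both containers along with their anonymous volumes.
docker rm -f -v my-couchdb busybox1

# Verify that the anonymous volumes are gone.
docker volume ls
```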
A named volume and an anonymous volume are similar in that Docker manages where they are located. However, a named volume can be referenced by name when mounting it to a container directory. This is helpful if you want to share a volume across multiple containers.
First, create a named volume,
Verify the volume was created,
Now create the CouchDB container using the named volume,
Wait until the CouchDB container is running and the instance is available.
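A sketch with the hypothetical container name `my-couchdb` and credentials `admin`/`passw0rd1`. The `-v my-couchdb-data-volume:/opt/couchdb/data` flag mounts the named volume onto CouchDB's default data directory. The loop at the end polls the server until it responds.

```shell
COUCHDB_USER=admin              # hypothetical credentials
COUCHDB_PASSWORD=passw0rd1

# Mount the named volume onto CouchDB's data directory.
docker run -d --name my-couchdb -p 5984:5984 \
  -e COUCHDB_USER="$COUCHDB_USER" \
  -e COUCHDB_PASSWORD="$COUCHDB_PASSWORD" \
  -v my-couchdb-data-volume:/opt/couchdb/data \
  couchdb:3

# Wait until the instance answers HTTP requests.
until curl -sf -u "$COUCHDB_USER:$COUCHDB_PASSWORD" http://127.0.0.1:5984/ >/dev/null; do
  sleep 1
done
```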
Create a new database `mydb` and insert a new document with a `hello world` message.
It is now easy to share the volume with another container. For instance, read the content of the volume using the `busybox` image, sharing the `my-couchdb-data-volume` volume by mounting it to a directory in the `busybox` container.
You can check the Docker-managed filesystem for volumes by running a `busybox` container with privileged permissions and the process ID namespace set to `host`. This lets you inspect the host system and browse to the Docker-managed directories.
Cleanup,
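A sketch, assuming the hypothetical container name `my-couchdb`. A named volume is not removed together with its container, so it is deleted explicitly.

```shell
# Remove the container, then the named volume it used.
docker rm -f my-couchdb
docker volume rm my-couchdb-data-volume
```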
When you want to access the volume directory easily from the host machine directly, instead of using the Docker-managed directories, you can create a host volume.
Let's use a directory called `data` in the current working directory (indicated with the command `pwd`), or choose your own data directory on the host machine, e.g. `/home/couchdb/data`. We let Docker create the `$(pwd)/data` directory if it does not exist yet. We mount the host volume inside the CouchDB container to the container directory `/opt/couchdb/data`, which is the default data directory for CouchDB.
Run the following command,
Verify that a directory `data` was created,
and that CouchDB has created data files here,
Also check that Docker did not create a managed volume this time, because we are now using a host volume,
Create a new database `mydb` and insert a new document with a `hello world` message.
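You can do this with two `curl` requests against CouchDB's HTTP API, assuming the hypothetical credentials `admin`/`passw0rd1` and port 5984 on `127.0.0.1`. The polling loop waits in case CouchDB is still starting.

```shell
AUTH="admin:passw0rd1"          # hypothetical credentials

# Wait until CouchDB answers, then create mydb and store a document with ID 1.
until curl -sf -u "$AUTH" http://127.0.0.1:5984/ >/dev/null; do sleep 1; done
curl -s -X PUT -u "$AUTH" http://127.0.0.1:5984/mydb
curl -s -X PUT -u "$AUTH" -H 'Content-Type: application/json' \
  -d '{"msg": "hello world"}' http://127.0.0.1:5984/mydb/1
```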
Note that CouchDB created a folder `shards`,
List the content of the `shards` directory,
and the first shard,
A shard is a horizontal partition of data in a database. Partitioning data into shards and distributing copies of each shard to different nodes in a cluster gives the data greater durability against node loss. CouchDB automatically shards databases and distributes the subsets of documents among nodes.
Cleanup,
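A sketch, assuming the hypothetical container name `my-couchdb`. Removing the container does not touch the bind-mounted host directory. Delete `data` only if you no longer need it; files created by CouchDB may require `sudo` to remove.

```shell
# Remove the container; the host directory ./data stays in place.
docker rm -f my-couchdb

# Optionally delete the host data directory as well.
rm -rf data 2>/dev/null || sudo rm -rf data
```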
Docker recommends the `--mount` syntax over the `--volume` syntax. Bind mounts have limited functionality compared to volumes. A file or directory is referenced by its full path on the host machine when it is mounted into a container. Bind mounts rely on the host machine's filesystem having a specific directory structure available, and you cannot use the Docker CLI to manage bind mounts. Note that processes running in a container can change the host filesystem through a bind mount.
Instead of the `-v` syntax, with up to three fields separated by colons (`:`), the `--mount` syntax is more verbose and uses multiple `key=value` pairs:
type: bind, volume or tmpfs,
source: the path to the file or directory on the host machine for a bind mount, or the name of the volume,
destination: path in container,
readonly,
bind-propagation: rprivate, private, rshared, shared, rslave, slave,
consistency: consistent, delegated, cached,
mount.
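To illustrate the key-value syntax, the sketch below bind-mounts a host directory read-only into `busybox`. The host path `$(pwd)/data` and the target `/data` are hypothetical. Unlike `-v`, `--mount type=bind` reports an error instead of creating a missing host path, hence the `mkdir -p`.

```shell
# --mount does not create the host path for bind mounts, so create it first.
mkdir -p "$(pwd)/data"

# Bind-mount it read-only; the write attempt inside the container fails.
docker run --rm \
  --mount type=bind,source="$(pwd)/data",target=/data,readonly \
  busybox sh -c 'touch /data/test || echo "write refused: /data is read-only"'
```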
OverlayFS is a union mount filesystem implementation for Linux. To understand what a Docker volume is, it helps to understand how layers and the filesystem work in Docker.
To start a container, Docker takes the read-only image and creates a new read-write layer on top. To view the layers as one, Docker uses a Union File System or OverlayFS (Overlay File System), specifically the `overlay2` storage driver.
To see the files that Docker manages on the host, you need access to the Docker host's filesystem. Using the `--privileged` and `--pid=host` flags, you can access the host's process ID namespace from inside a container like `busybox`. You can then browse to Docker's `/var/lib/docker/overlay2` directory to see the downloaded layers that are managed by Docker.
To view the current list of layers in Docker,
Pull down the `ubuntu` image and check again,
You see that pulling down the `ubuntu` image implicitly pulled down 4 new layers (the exact number depends on the image version):
a611792b4cac502995fa88a888261dfba0b5d852e72f9db9e075050991423779
d181f1a41fc35a45c16e8bfcb8eee6f768f3b98f82210a43ea65f284a45fcd65
dac2f37f6280a076836d39b87b0ae5ebf5c0d386b6d8b991b103aadbcebaa7c6
f3e921b440c37c86d06cd9c9fb70df50edad553c36cc87f84d5eeba734aae709
The `overlay2` storage driver in essence layers different directories on the host and presents them as a single directory:
base layer or `lowerdir`,
`diff` layer or `upperdir`,
overlay layer (user view), and
`work` dir.
OverlayFS refers to the lower directories as `lowerdir`, which contains the base image and the read-only (R/O) layers that are pulled down.
The upper directory is called `upperdir` and is the read-write (R/W) container layer.
The unified view or `overlay` layer is called `merged`.
Finally, a `workdir` is required, which is an empty directory used by overlay for internal use.
The `overlay2` driver supports up to 128 lower OverlayFS layers. The `l` directory contains shortened layer identifiers as symbolic links.
Cleanup,
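A sketch that removes the image pulled in this section. It fails if a container still uses the image; in that case, remove that container first.

```shell
# Remove the ubuntu image and its now-unused layers.
docker rmi ubuntu
```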