Initial commit
216
vendor/github.com/hyperhq/hypercli/docs/userguide/storagedriver/aufs-driver.md
generated
vendored
Normal file
@@ -0,0 +1,216 @@
|
||||
<!--[metadata]>
|
||||
+++
|
||||
title = "AUFS storage driver in practice"
|
||||
description = "Learn how to optimize your use of AUFS driver."
|
||||
keywords = ["container, storage, driver, AUFS "]
|
||||
[menu.main]
|
||||
parent = "engine_driver"
|
||||
+++
|
||||
<![end-metadata]-->
|
||||
|
||||
# Docker and AUFS in practice
|
||||
|
||||
AUFS was the first storage driver in use with Docker. As a result, it has a
|
||||
long and close history with Docker, is very stable, has a lot of real-world
|
||||
deployments, and has strong community support. AUFS has several features that
|
||||
make it a good choice for Docker. These features enable:
|
||||
|
||||
- Fast container startup times.
|
||||
- Efficient use of storage.
|
||||
- Efficient use of memory.
|
||||
|
||||
Despite its capabilities and long history with Docker, some Linux distributions
|
||||
do not support AUFS. This is usually because AUFS is not included in the
|
||||
mainline (upstream) Linux kernel.
|
||||
|
||||
The following sections examine some AUFS features and how they relate to
|
||||
Docker.
|
||||
|
||||
## Image layering and sharing with AUFS
|
||||
|
||||
AUFS is a *unification filesystem*. This means that it takes multiple
|
||||
directories on a single Linux host, stacks them on top of each other, and
|
||||
provides a single unified view. To achieve this, AUFS uses a *union mount*.
|
||||
|
||||
AUFS stacks multiple directories and exposes them as a unified view through a
|
||||
single mount point. All of the directories in the stack, as well as the union
|
||||
mount point, must exist on the same Linux host. AUFS refers to each
|
||||
directory that it stacks as a *branch*.
|
||||
|
||||
Within Docker, AUFS union mounts enable image layering. The AUFS storage driver
|
||||
implements Docker image layers using this union mount system. AUFS branches
|
||||
correspond to Docker image layers. The diagram below shows a Docker container
|
||||
based on the `ubuntu:latest` image.
|
||||
|
||||

|
||||
|
||||
This diagram shows that each image layer, and the container layer, is
|
||||
represented in the Docker host's filesystem as a directory under
|
||||
`/var/lib/docker/`. The union mount point provides the unified view of all
|
||||
layers. As of Docker 1.10, image layer IDs do not correspond to the names of
|
||||
the directories that contain their data.
|
||||
|
||||
AUFS also supports copy-on-write (CoW) technology. Not all storage drivers
|
||||
do.
|
||||
|
||||
## Container reads and writes with AUFS
|
||||
|
||||
Docker leverages AUFS CoW technology to enable image sharing and minimize the
|
||||
use of disk space. AUFS works at the file level. This means that all AUFS CoW
|
||||
operations copy entire files - even if only a small part of the file is being
|
||||
modified. This behavior can have a noticeable impact on container performance,
|
||||
especially if the files being copied are large, below a lot of image layers,
|
||||
or the CoW operation must search a deep directory tree.
|
||||
|
||||
Consider, for example, an application running in a container that needs to add a
|
||||
single new value to a large key-value store (file). If this is the first time
|
||||
the file is modified, it does not yet exist in the container's top writable
|
||||
layer. So, the CoW must *copy up* the file from the underlying image. The AUFS
|
||||
storage driver searches each image layer for the file. The search order is from
|
||||
top to bottom. When it is found, the entire file is *copied up* to the
|
||||
container's top writable layer. From there, it can be opened and modified.
|
||||
|
||||
Larger files obviously take longer to *copy up* than smaller files, and files
|
||||
that exist in lower image layers take longer to locate than those in higher
|
||||
layers. However, a *copy up* operation only occurs once per file on any given
|
||||
container. Subsequent reads and writes happen against the file's copy already
|
||||
*copied-up* to the container's top layer.
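The copy-up behavior described above can be approximated with ordinary directories. The sketch below is a toy model only: it does not use AUFS, and the function name `copy_up`, the layer directory names, and the `store.db` file are all made up for illustration.

```shell
#!/bin/sh
# Toy model of AUFS copy-up using ordinary directories (no AUFS required).
# All names here (copy_up, layer directories, store.db) are illustrative.

# copy_up FILE WRITABLE_LAYER IMAGE_LAYER...
# Image layers are listed top-most first.
copy_up() {
    file=$1
    top=$2
    shift 2
    # Already copied up: later reads and writes use the top layer's copy.
    if [ -e "$top/$file" ]; then
        return 0
    fi
    # Search the read-only layers from top to bottom.
    for layer in "$@"; do
        if [ -e "$layer/$file" ]; then
            mkdir -p "$(dirname "$top/$file")"
            cp -p "$layer/$file" "$top/$file"   # the entire file is copied
            return 0
        fi
    done
    return 1   # not found in any layer
}

# Demo: the file lives in the lowest image layer.
root=$(mktemp -d)
mkdir -p "$root/container" "$root/layer2" "$root/layer1" "$root/base"
echo "key1=value1" > "$root/base/store.db"

copy_up store.db "$root/container" "$root/layer2" "$root/layer1" "$root/base"
echo "key2=value2" >> "$root/container/store.db"   # only the copy changes
```

The image layer's copy of the file is left untouched, which is what lets other containers keep sharing it.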
|
||||
|
||||
## Deleting files with the AUFS storage driver
|
||||
|
||||
The AUFS storage driver deletes a file from a container by placing a *whiteout
|
||||
file* in the container's top layer. The whiteout file effectively obscures the
|
||||
existence of the file in the read-only image layers below. The simplified
|
||||
diagram below shows a container based on an image with three image layers.
|
||||
|
||||

|
||||
|
||||
In the diagram, `file3` was deleted from the container. So, the AUFS storage driver placed
|
||||
a whiteout file in the container's top layer. This whiteout file effectively
|
||||
"deletes" `file3` from the container by obscuring any of the original file's
|
||||
existence in the image's read-only layers. This works the same no matter which
|
||||
of the image's read-only layers the file exists in.
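The lookup rule behind whiteouts can be sketched with plain directories. This is a simplified model, not the AUFS implementation: it assumes a whiteout is a marker file named `.wh.<name>` in a layer, handles only top-level file names, and uses a hypothetical helper called `visible_in`.

```shell
#!/bin/sh
# Toy model of whiteout lookup using ordinary directories (no AUFS required).
# Assumption: a marker named ".wh.<name>" in a layer hides <name> in every
# layer below it. Only top-level file names are handled here.

# visible_in NAME LAYER...   (layers listed top-most first)
# Prints the layer that provides NAME; returns 1 if NAME is hidden or absent.
visible_in() {
    name=$1
    shift
    for layer in "$@"; do
        if [ -e "$layer/.wh.$name" ]; then
            return 1                      # whited out: stop searching
        fi
        if [ -e "$layer/$name" ]; then
            echo "$layer"
            return 0
        fi
    done
    return 1
}

# Demo: three read-only image layers plus the container's writable layer.
root=$(mktemp -d)
mkdir -p "$root/container" "$root/layer3" "$root/layer2" "$root/layer1"
echo one   > "$root/layer1/file1"
echo three > "$root/layer3/file3"

# "Delete" file3 from the container by adding a whiteout marker on top.
: > "$root/container/.wh.file3"

for f in file1 file3; do
    if where=$(visible_in "$f" "$root/container" "$root/layer3" \
                          "$root/layer2" "$root/layer1"); then
        echo "$f is visible (from $where)"
    else
        echo "$f is hidden or missing"
    fi
done
```

Note that `file3` still exists in the image layer after the "delete"; only the container's view of it changes.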
|
||||
|
||||
## Configure Docker with AUFS
|
||||
|
||||
You can only use the AUFS storage driver on Linux systems with AUFS installed.
|
||||
Use the following command to determine if your system supports AUFS.
|
||||
|
||||
$ grep aufs /proc/filesystems
|
||||
nodev aufs
|
||||
|
||||
This output indicates the system supports AUFS. Once you've verified your
|
||||
system supports AUFS, you must instruct the Docker daemon to use it. You do
|
||||
this from the command line with the `docker daemon` command:
|
||||
|
||||
$ sudo docker daemon --storage-driver=aufs &
|
||||
|
||||
|
||||
Alternatively, you can edit the Docker config file and add the
|
||||
`--storage-driver=aufs` option to the `DOCKER_OPTS` line.
|
||||
|
||||
# Use DOCKER_OPTS to modify the daemon startup options.
|
||||
DOCKER_OPTS="--storage-driver=aufs"
|
||||
|
||||
Once your daemon is running, verify the storage driver with the `docker info`
|
||||
command.
|
||||
|
||||
$ sudo docker info
|
||||
Containers: 1
|
||||
Images: 4
|
||||
Storage Driver: aufs
|
||||
Root Dir: /var/lib/docker/aufs
|
||||
Backing Filesystem: extfs
|
||||
Dirs: 6
|
||||
Dirperm1 Supported: false
|
||||
Execution Driver: native-0.2
|
||||
...output truncated...
|
||||
|
||||
The output above shows that the Docker daemon is running the AUFS storage
|
||||
driver on top of an existing ext-family backing filesystem (reported as `extfs`), such as `ext4`.
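If you want to check the storage driver from a script, one option is to extract the `Storage Driver:` line from the `docker info` text. The following is a minimal sketch; the helper name `storage_driver` is hypothetical, and the demo feeds it a copy of the sample output rather than a live daemon.

```shell
#!/bin/sh
# storage_driver: read `docker info`-style text on stdin and print the value
# of the "Storage Driver:" line. The helper name is hypothetical.
storage_driver() {
    sed -n 's/^[[:space:]]*Storage Driver:[[:space:]]*//p'
}

# Demo using a copy of the sample output instead of a live daemon.
sample='Containers: 1
Images: 4
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Backing Filesystem: extfs'

driver=$(printf '%s\n' "$sample" | storage_driver)
if [ "$driver" = "aufs" ]; then
    echo "daemon is using aufs"
else
    echo "unexpected storage driver: $driver" >&2
fi
```

Against a running daemon, the same helper would be used as `sudo docker info | storage_driver`.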
|
||||
|
||||
## Local storage and AUFS
|
||||
|
||||
As the `docker daemon` runs with the AUFS driver, the driver stores images and
|
||||
containers within the Docker host's local storage area under
|
||||
`/var/lib/docker/aufs/`.
|
||||
|
||||
### Images
|
||||
|
||||
Image layers and their contents are stored under
|
||||
`/var/lib/docker/aufs/diff/`. With Docker 1.10 and higher, image layer IDs do
|
||||
not correspond to directory names.
|
||||
|
||||
The `/var/lib/docker/aufs/layers/` directory contains metadata about how image
|
||||
layers are stacked. This directory contains one file for every image or
|
||||
container layer on the Docker host (though file names no longer match image
|
||||
layer IDs). Inside each file are the names of the directories that exist below
|
||||
it in the stack.
|
||||
|
||||
The command below shows the contents of a metadata file in
|
||||
`/var/lib/docker/aufs/layers/` that lists the three directories that are
|
||||
stacked below it in the union mount. Remember, these directory names do not map
|
||||
to image layer IDs with Docker 1.10 and higher.
|
||||
|
||||
$ cat /var/lib/docker/aufs/layers/91e54dfb11794fad694460162bf0cb0a4fa710cfa3f60979c177d920813e267c
|
||||
d74508fb6632491cea586a1fd7d748dfc5274cd6fdfedee309ecdcbc2bf5cb82
|
||||
c22013c8472965aa5b62559f2b540cd440716ef149756e7b958a1b2aba421e87
|
||||
d3a1f33e8a5a513092f01bb7eb1c2abf4d711e5105390a3fe1ae2248cfde1391
|
||||
|
||||
The base layer in an image has no image layers below it, so its file is empty.
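To make the metadata layout concrete, the sketch below rebuilds a layer stack from files shaped like those in `/var/lib/docker/aufs/layers/`. It runs against a temporary directory with short, made-up names, and it assumes each file lists the layers below it with the nearest parent first. The helper name `layer_stack` is hypothetical.

```shell
#!/bin/sh
# layer_stack LAYERS_DIR NAME
# Prints NAME followed by the entries of its metadata file, one per line.
# Assumption: the file lists the layers below NAME, nearest parent first.
layer_stack() {
    echo "$2"
    cat "$1/$2"
}

# Demo with short, made-up names in a temporary directory.
layers=$(mktemp -d)
: > "$layers/base"                                   # base layer: empty file
printf 'base\n' > "$layers/layer1"
printf 'layer1\nbase\n' > "$layers/layer2"
printf 'layer2\nlayer1\nbase\n' > "$layers/container"

# Print the full stack for the container, top to bottom.
layer_stack "$layers" container
```

For the base layer the function prints only the layer's own name, because its metadata file is empty.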
|
||||
|
||||
### Containers
|
||||
|
||||
Running containers are mounted below `/var/lib/docker/aufs/mnt/<container-id>`.
|
||||
This is the AUFS union mount point, which exposes the container and all
underlying image layers as a single unified view. If a container is not
|
||||
running, it still has a directory here but it is empty. This is because AUFS
|
||||
only mounts a container when it is running. With Docker 1.10 and higher,
|
||||
container IDs no longer correspond to directory names under
|
||||
`/var/lib/docker/aufs/mnt/`.
|
||||
|
||||
Container metadata and various config files that are placed into the running
|
||||
container are stored in `/var/lib/docker/containers/<container-id>`. Files in
|
||||
this directory exist for all containers on the system, including ones that are
|
||||
stopped. When a container is running, the container's log files are
|
||||
also in this directory.
|
||||
|
||||
A container's thin writable layer is stored in a directory under
|
||||
`/var/lib/docker/aufs/diff/`. With Docker 1.10 and higher, container IDs no
|
||||
longer correspond to directory names. However, the container's thin writable
|
||||
layer still exists under here and is stacked by AUFS as the top writable layer
|
||||
and is where all changes to the container are stored. The directory exists even
|
||||
if the container is stopped. This means that restarting a container will not
|
||||
lose changes made to it. Once a container is deleted, its thin writable layer
|
||||
in this directory is deleted.
|
||||
|
||||
## AUFS and Docker performance
|
||||
|
||||
To summarize some of the performance related aspects already mentioned:
|
||||
|
||||
- The AUFS storage driver is a good choice for PaaS and other similar use-cases
|
||||
where container density is important. This is because AUFS efficiently shares
|
||||
images between multiple running containers, enabling fast container start times
|
||||
and minimal use of disk space.
|
||||
|
||||
- The underlying mechanics of how AUFS shares files between image layers and
|
||||
containers use the system's page cache very efficiently.
|
||||
|
||||
- The AUFS storage driver can introduce significant latencies into container
|
||||
write performance. This is because the first time a container writes to any
|
||||
file, the file has to be located and copied into the container's top writable
|
||||
layer. These latencies increase and are compounded when these files exist below
|
||||
many image layers and the files themselves are large.
|
||||
|
||||
One final point. Data volumes provide the best and most predictable
|
||||
performance. This is because they bypass the storage driver and do not incur
|
||||
any of the potential overheads introduced by thin provisioning and
|
||||
copy-on-write. For this reason, you may want to place heavy write workloads on
|
||||
data volumes.
|
||||
|
||||
## Related information
|
||||
|
||||
* [Understand images, containers, and storage drivers](imagesandcontainers.md)
|
||||
* [Select a storage driver](selectadriver.md)
|
||||
* [Btrfs storage driver in practice](btrfs-driver.md)
|
||||
* [Device Mapper storage driver in practice](device-mapper-driver.md)
|
||||
315
vendor/github.com/hyperhq/hypercli/docs/userguide/storagedriver/btrfs-driver.md
generated
vendored
Normal file
@@ -0,0 +1,315 @@
|
||||
<!--[metadata]>
|
||||
+++
|
||||
title = "Btrfs storage in practice"
|
||||
description = "Learn how to optimize your use of Btrfs driver."
|
||||
keywords = ["container, storage, driver, Btrfs "]
|
||||
[menu.main]
|
||||
parent = "engine_driver"
|
||||
+++
|
||||
<![end-metadata]-->
|
||||
|
||||
# Docker and Btrfs in practice
|
||||
|
||||
Btrfs is a next generation copy-on-write filesystem that supports many advanced
|
||||
storage technologies that make it a good fit for Docker. Btrfs is included in
|
||||
the mainline Linux kernel and its on-disk-format is now considered stable.
|
||||
However, many of its features are still under heavy development and users
|
||||
should consider it a fast-moving target.
|
||||
|
||||
Docker's `btrfs` storage driver leverages many Btrfs features for image and
|
||||
container management. Among these features are thin provisioning,
|
||||
copy-on-write, and snapshotting.
|
||||
|
||||
This article refers to Docker's Btrfs storage driver as `btrfs` and the overall
|
||||
Btrfs Filesystem as Btrfs.
|
||||
|
||||
>**Note**: The [Commercially Supported Docker Engine (CS-Engine)](https://www.docker.com/compatibility-maintenance) does not currently support the `btrfs` storage driver.
|
||||
|
||||
## The future of Btrfs
|
||||
|
||||
Btrfs has been long hailed as the future of Linux filesystems. With full
|
||||
support in the mainline Linux kernel, a stable on-disk-format, and active
|
||||
development with a focus on stability, this is now becoming more of a reality.
|
||||
|
||||
As far as Docker on the Linux platform goes, many people see the `btrfs`
|
||||
storage driver as a potential long-term replacement for the `devicemapper`
|
||||
storage driver. However, at the time of writing, the `devicemapper` storage
|
||||
driver should be considered safer, more stable, and more *production ready*.
|
||||
You should only consider the `btrfs` driver for production deployments if you
|
||||
understand it well and have existing experience with Btrfs.
|
||||
|
||||
## Image layering and sharing with Btrfs
|
||||
|
||||
Docker leverages Btrfs *subvolumes* and *snapshots* for managing the on-disk
|
||||
components of image and container layers. Btrfs subvolumes look and feel like
|
||||
a normal Unix filesystem. As such, they can have their own internal directory
|
||||
structure that hooks into the wider Unix filesystem.
|
||||
|
||||
Subvolumes are natively copy-on-write and have space allocated to them
|
||||
on-demand from an underlying storage pool. They can also be nested and snapped.
|
||||
The diagram below shows 4 subvolumes. 'Subvolume 2' and 'Subvolume 3' are
|
||||
nested, whereas 'Subvolume 4' shows its own internal directory tree.
|
||||
|
||||

|
||||
|
||||
Snapshots are a point-in-time read-write copy of an entire subvolume. They
|
||||
exist directly below the subvolume they were created from. You can create
|
||||
snapshots of snapshots as shown in the diagram below.
|
||||
|
||||

|
||||
|
||||
Btrfs allocates space to subvolumes and snapshots on demand from an underlying
|
||||
pool of storage. The unit of allocation is referred to as a *chunk*, and
|
||||
*chunks* are normally ~1GB in size.
|
||||
|
||||
Snapshots are first-class citizens in a Btrfs filesystem. This means that they
|
||||
look, feel, and operate just like regular subvolumes. The technology required
|
||||
to create them is built directly into the Btrfs filesystem thanks to its
|
||||
native copy-on-write design. This means that Btrfs snapshots are space
|
||||
efficient with little or no performance overhead. The diagram below shows a
|
||||
subvolume and its snapshot sharing the same data.
|
||||
|
||||

|
||||
|
||||
Docker's `btrfs` storage driver stores every image layer and container in its
|
||||
own Btrfs subvolume or snapshot. The base layer of an image is stored as a
|
||||
subvolume whereas child image layers and containers are stored as snapshots.
|
||||
This is shown in the diagram below.
|
||||
|
||||

|
||||
|
||||
The high level process for creating images and containers on Docker hosts
|
||||
running the `btrfs` driver is as follows:
|
||||
|
||||
1. The image's base layer is stored in a Btrfs *subvolume* under
|
||||
`/var/lib/docker/btrfs/subvolumes`.
|
||||
|
||||
2. Subsequent image layers are stored as a Btrfs *snapshot* of the parent
|
||||
layer's subvolume or snapshot.
|
||||
|
||||
The diagram below shows a three-layer image. The base layer is a subvolume.
|
||||
Layer 1 is a snapshot of the base layer's subvolume. Layer 2 is a snapshot of
|
||||
Layer 1's snapshot.
|
||||
|
||||

|
||||
|
||||
As of Docker 1.10, image layer IDs no longer correspond to directory names
|
||||
under `/var/lib/docker/`.
|
||||
|
||||
## Image and container on-disk constructs
|
||||
|
||||
Image layers and containers are visible in the Docker host's filesystem at
|
||||
`/var/lib/docker/btrfs/subvolumes/`. However, as previously stated, directory
|
||||
names no longer correspond to image layer IDs. That said, directories for
|
||||
containers are present even for containers with a stopped status. This is
|
||||
because the `btrfs` storage driver mounts a default, top-level subvolume at
|
||||
`/var/lib/docker/btrfs/subvolumes`. All other subvolumes and snapshots exist below
|
||||
that as Btrfs filesystem objects and not as individual mounts.
|
||||
|
||||
Because Btrfs works at the filesystem level and not the block level, each image
|
||||
and container layer can be browsed in the filesystem using normal Unix
|
||||
commands. The example below shows truncated output of an `ls -l` command on an
|
||||
image layer:
|
||||
|
||||
$ ls -l /var/lib/docker/btrfs/subvolumes/0a17decee4139b0de68478f149cc16346f5e711c5ae3bb969895f22dd6723751/
|
||||
total 0
|
||||
drwxr-xr-x 1 root root 1372 Oct 9 08:39 bin
|
||||
drwxr-xr-x 1 root root 0 Apr 10 2014 boot
|
||||
drwxr-xr-x 1 root root 882 Oct 9 08:38 dev
|
||||
drwxr-xr-x 1 root root 2040 Oct 12 17:27 etc
|
||||
drwxr-xr-x 1 root root 0 Apr 10 2014 home
|
||||
...output truncated...
|
||||
|
||||
## Container reads and writes with Btrfs
|
||||
|
||||
A container is a space-efficient snapshot of an image. Metadata in the snapshot
|
||||
points to the actual data blocks in the storage pool. This is the same as with
|
||||
a subvolume. Therefore, reads performed against a snapshot are essentially the
|
||||
same as reads performed against a subvolume. As a result, no performance
|
||||
overhead is incurred from the Btrfs driver.
|
||||
|
||||
Writing a new file to a container invokes an allocate-on-demand operation to
|
||||
allocate a new data block to the container's snapshot. The file is then written to
|
||||
this new space. The allocate-on-demand operation is native to all writes with
|
||||
Btrfs and is the same as writing new data to a subvolume. As a result, writing
|
||||
new files to a container's snapshot operates at native Btrfs speeds.
|
||||
|
||||
Updating an existing file in a container causes a copy-on-write operation
|
||||
(technically *redirect-on-write*). The driver leaves the original data and
|
||||
allocates new space to the snapshot. The updated data is written to this new
|
||||
space. Then, the driver updates the filesystem metadata in the snapshot to
|
||||
point to this new data. The original data is preserved in-place for subvolumes
|
||||
and snapshots further up the tree. This behavior is native to copy-on-write
|
||||
filesystems like Btrfs and incurs very little overhead.
|
||||
|
||||
With Btrfs, writing and updating lots of small files can result in slow
|
||||
performance. More on this later.
|
||||
|
||||
## Configuring Docker with Btrfs
|
||||
|
||||
The `btrfs` storage driver only operates on a Docker host where
|
||||
`/var/lib/docker` is mounted as a Btrfs filesystem. The following procedure
|
||||
shows how to configure Btrfs on Ubuntu 14.04 LTS.
|
||||
|
||||
### Prerequisites
|
||||
|
||||
If you have already used the Docker daemon on your Docker host and have images
|
||||
you want to keep, `push` them to Docker Hub or your private Docker Trusted
|
||||
Registry before attempting this procedure.
|
||||
|
||||
Stop the Docker daemon. Then, ensure that you have a spare block device at
|
||||
`/dev/xvdb`. The device identifier may be different in your environment and you
|
||||
should substitute your own values throughout the procedure.
|
||||
|
||||
The procedure also assumes your kernel has the appropriate Btrfs modules
|
||||
loaded. To verify this, use the following command:
|
||||
|
||||
$ cat /proc/filesystems | grep btrfs
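In a script, an exact match on the filesystem name is a little safer than a plain `grep`, which also matches any name that merely contains the string. The sketch below is one way to do that; the function name `fs_supported` is hypothetical, and the demo runs against a small sample file so that it does not depend on the host kernel.

```shell
#!/bin/sh
# fs_supported NAME [FILE]
# Succeeds if NAME appears as a filesystem type in FILE, which defaults to
# /proc/filesystems. Each line there is "[nodev]<TAB>name", so the last
# field is compared exactly.
fs_supported() {
    awk -v fs="$1" '$NF == fs { found = 1 } END { exit !found }' \
        "${2:-/proc/filesystems}"
}

# Demo against a sample file in the same format.
sample=$(mktemp)
printf 'nodev\tsysfs\nnodev\taufs\n\text4\n\tbtrfs\n' > "$sample"

for fs in btrfs aufs zfs; do
    if fs_supported "$fs" "$sample"; then
        echo "$fs: supported"
    else
        echo "$fs: not listed"
    fi
done
```

On a real host, call `fs_supported btrfs` with no second argument to read `/proc/filesystems` directly.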
|
||||
|
||||
### Configure Btrfs on Ubuntu 14.04 LTS
|
||||
|
||||
Assuming your system meets the prerequisites, do the following:
|
||||
|
||||
1. Install the "btrfs-tools" package.
|
||||
|
||||
$ sudo apt-get install btrfs-tools
|
||||
Reading package lists... Done
|
||||
Building dependency tree
|
||||
<output truncated>
|
||||
|
||||
2. Create the Btrfs storage pool.
|
||||
|
||||
Btrfs storage pools are created with the `mkfs.btrfs` command. Passing
|
||||
multiple devices to the `mkfs.btrfs` command creates a pool across all of those
|
||||
devices. Here you create a pool with a single device at `/dev/xvdb`.
|
||||
|
||||
$ sudo mkfs.btrfs -f /dev/xvdb
|
||||
WARNING! - Btrfs v3.12 IS EXPERIMENTAL
|
||||
WARNING! - see http://btrfs.wiki.kernel.org before using
|
||||
|
||||
Turning ON incompat feature 'extref': increased hardlink limit per file to 65536
|
||||
fs created label (null) on /dev/xvdb
|
||||
nodesize 16384 leafsize 16384 sectorsize 4096 size 4.00GiB
|
||||
Btrfs v3.12
|
||||
|
||||
Be sure to substitute `/dev/xvdb` with the appropriate device(s) on your
|
||||
system.
|
||||
|
||||
> **Warning**: Take note of the warning about Btrfs being experimental. As
|
||||
noted earlier, Btrfs is not currently recommended for production deployments
|
||||
unless you already have extensive experience.
|
||||
|
||||
3. If it does not already exist, create a directory for the Docker host's local
|
||||
storage area at `/var/lib/docker`.
|
||||
|
||||
$ sudo mkdir /var/lib/docker
|
||||
|
||||
4. Configure the system to automatically mount the Btrfs filesystem each time the system boots.
|
||||
|
||||
a. Obtain the Btrfs filesystem's UUID.
|
||||
|
||||
$ sudo blkid /dev/xvdb
|
||||
/dev/xvdb: UUID="a0ed851e-158b-4120-8416-c9b072c8cf47" UUID_SUB="c3927a64-4454-4eef-95c2-a7d44ac0cf27" TYPE="btrfs"
|
||||
|
||||
b. Create an `/etc/fstab` entry to automatically mount `/var/lib/docker`
|
||||
each time the system boots. Either of the following lines will work, just
|
||||
remember to substitute the UUID value with the value obtained from the previous
|
||||
command.
|
||||
|
||||
/dev/xvdb /var/lib/docker btrfs defaults 0 0
|
||||
UUID="a0ed851e-158b-4120-8416-c9b072c8cf47" /var/lib/docker btrfs defaults 0 0
|
||||
|
||||
5. Mount the new filesystem and verify the operation.
|
||||
|
||||
$ sudo mount -a
|
||||
$ mount
|
||||
/dev/xvda1 on / type ext4 (rw,discard)
|
||||
<output truncated>
|
||||
/dev/xvdb on /var/lib/docker type btrfs (rw)
|
||||
|
||||
The last line in the output above shows `/dev/xvdb` mounted at
|
||||
`/var/lib/docker` as Btrfs.
|
||||
|
||||
Now that you have a Btrfs filesystem mounted at `/var/lib/docker`, the daemon
|
||||
should automatically load with the `btrfs` storage driver.
|
||||
|
||||
1. Start the Docker daemon.
|
||||
|
||||
$ sudo service docker start
|
||||
docker start/running, process 2315
|
||||
|
||||
The procedure for starting the Docker daemon may differ depending on the
|
||||
Linux distribution you are using.
|
||||
|
||||
You can force the Docker daemon to start with the `btrfs` storage
|
||||
driver by either passing the `--storage-driver=btrfs` flag to the `docker
|
||||
daemon` at startup, or adding it to the `DOCKER_OPTS` line in the Docker config
|
||||
file.
|
||||
|
||||
2. Verify the storage driver with the `docker info` command.
|
||||
|
||||
$ sudo docker info
|
||||
Containers: 0
|
||||
Images: 0
|
||||
Storage Driver: btrfs
|
||||
[...]
|
||||
|
||||
Your Docker host is now configured to use the `btrfs` storage driver.
|
||||
|
||||
## Btrfs and Docker performance
|
||||
|
||||
There are several factors that influence Docker's performance under the `btrfs`
|
||||
storage driver.
|
||||
|
||||
- **Page caching**. Btrfs does not support page cache sharing. This means that
|
||||
*n* containers accessing the same file require *n* copies to be cached. As a
|
||||
result, the `btrfs` driver may not be the best choice for PaaS and other high
|
||||
density container use cases.
|
||||
|
||||
- **Small writes**. Containers performing lots of small writes (including
|
||||
Docker hosts that start and stop many containers) can lead to poor use of Btrfs
|
||||
chunks. This can ultimately lead to out-of-space conditions on your Docker
|
||||
host and stop it from working. This is a major drawback of current
versions of Btrfs.
|
||||
|
||||
If you use the `btrfs` storage driver, closely monitor the free space on
|
||||
your Btrfs filesystem using the `btrfs filesystem show` command. Do not trust the
|
||||
output of normal Unix commands such as `df`; always use the Btrfs native
|
||||
commands.
|
||||
|
||||
- **Sequential writes**. Btrfs writes data to disk via a journaling technique.
This can impact sequential writes, reducing performance by up to half.
|
||||
|
||||
- **Fragmentation**. Fragmentation is a natural byproduct of copy-on-write
|
||||
filesystems like Btrfs. Many small random writes can compound this issue. It
|
||||
can manifest as CPU spikes on Docker hosts using SSD media and head thrashing
|
||||
on Docker hosts using spinning media. Both of these result in poor performance.
|
||||
|
||||
Recent versions of Btrfs allow you to specify `autodefrag` as a mount
|
||||
option. This mode attempts to detect random writes and defragment them. You
|
||||
should perform your own tests before enabling this option on your Docker hosts.
|
||||
Some tests have shown this option has a negative performance impact on Docker
|
||||
hosts performing lots of small writes (including systems that start and stop
|
||||
many containers).
|
||||
|
||||
- **Solid State Devices (SSD)**. Btrfs has native optimizations for SSD media.
|
||||
To enable these, mount with the `-o ssd` mount option. These optimizations
|
||||
include enhanced SSD write performance by avoiding things like *seek
|
||||
optimizations* that have no use on SSD media.
|
||||
|
||||
Btrfs also supports the TRIM/Discard primitives. However, mounting with the
|
||||
`-o discard` mount option can cause performance issues. Therefore, it is
|
||||
recommended you perform your own tests before using this option.
|
||||
|
||||
- **Use Data Volumes**. Data volumes provide the best and most predictable
|
||||
performance. This is because they bypass the storage driver and do not incur
|
||||
any of the potential overheads introduced by thin provisioning and
|
||||
copy-on-write. For this reason, you should place heavy write workloads on data
|
||||
volumes.
|
||||
|
||||
## Related Information
|
||||
|
||||
* [Understand images, containers, and storage drivers](imagesandcontainers.md)
|
||||
* [Select a storage driver](selectadriver.md)
|
||||
* [AUFS storage driver in practice](aufs-driver.md)
|
||||
* [Device Mapper storage driver in practice](device-mapper-driver.md)
|
||||
412
vendor/github.com/hyperhq/hypercli/docs/userguide/storagedriver/device-mapper-driver.md
generated
vendored
Normal file
@@ -0,0 +1,412 @@
|
||||
<!--[metadata]>
|
||||
+++
|
||||
title="Device mapper storage in practice"
|
||||
description="Learn how to optimize your use of device mapper driver."
|
||||
keywords=["container, storage, driver, device mapper"]
|
||||
[menu.main]
|
||||
parent="engine_driver"
|
||||
+++
|
||||
<![end-metadata]-->
|
||||
|
||||
# Docker and the Device Mapper storage driver
|
||||
|
||||
Device Mapper is a kernel-based framework that underpins many advanced
|
||||
volume management technologies on Linux. Docker's `devicemapper` storage driver
|
||||
leverages the thin provisioning and snapshotting capabilities of this framework
|
||||
for image and container management. This article refers to the Device Mapper
|
||||
storage driver as `devicemapper`, and the kernel framework as `Device Mapper`.
|
||||
|
||||
|
||||
>**Note**: The [Commercially Supported Docker Engine (CS-Engine) running on RHEL and CentOS Linux](https://www.docker.com/compatibility-maintenance) requires that you use the `devicemapper` storage driver.
|
||||
|
||||
|
||||
## An alternative to AUFS
|
||||
|
||||
Docker originally ran on Ubuntu and Debian Linux and used AUFS for its storage
|
||||
backend. As Docker became popular, many of the companies that wanted to use it
|
||||
were using Red Hat Enterprise Linux (RHEL). Unfortunately, because the upstream
|
||||
mainline Linux kernel did not include AUFS, RHEL did not use AUFS either.
|
||||
|
||||
To correct this, Red Hat developers investigated getting AUFS into the mainline
|
||||
kernel. Ultimately, though, they decided a better idea was to develop a new
|
||||
storage backend. Moreover, they would base this new storage backend on existing
|
||||
`Device Mapper` technology.
|
||||
|
||||
Red Hat collaborated with Docker Inc. to contribute this new driver. As a result
|
||||
of this collaboration, Docker's Engine was re-engineered to make the storage
|
||||
backend pluggable. So it was that `devicemapper` became the second storage
|
||||
driver Docker supported.
|
||||
|
||||
Device Mapper has been included in the mainline Linux kernel since version
|
||||
2.6.9. It is a core part of the RHEL family of Linux distributions. This means that
|
||||
the `devicemapper` storage driver is based on stable code that has a lot of
|
||||
real-world production deployments and strong community support.
|
||||
|
||||
|
||||
## Image layering and sharing
|
||||
|
||||
The `devicemapper` driver stores every image and container on its own virtual
|
||||
device. These devices are thin-provisioned copy-on-write snapshot devices.
|
||||
Device Mapper technology works at the block level rather than the file level.
|
||||
This means that the `devicemapper` storage driver's thin provisioning and
|
||||
copy-on-write operations work with blocks rather than entire files.
|
||||
|
||||
>**Note**: Snapshots are also referred to as *thin devices* or *virtual
|
||||
>devices*. They all mean the same thing in the context of the `devicemapper`
|
||||
>storage driver.
|
||||
|
||||
With `devicemapper` the high level process for creating images is as follows:
|
||||
|
||||
1. The `devicemapper` storage driver creates a thin pool.
|
||||
|
||||
The pool is created from block devices or loop mounted sparse files (more
|
||||
on this later).
|
||||
|
||||
2. Next it creates a *base device*.
|
||||
|
||||
A base device is a thin device with a filesystem. You can see which
|
||||
filesystem is in use by running the `docker info` command and checking the
|
||||
`Backing Filesystem` value.
|
||||
|
||||
3. Each new image (and image layer) is a snapshot of this base device.
|
||||
|
||||
These are thin provisioned copy-on-write snapshots. This means that they
|
||||
are initially empty and only consume space from the pool when data is written
|
||||
to them.
|
||||
|
||||
With `devicemapper`, container layers are snapshots of the image they are
|
||||
created from. Just as with images, container snapshots are thin provisioned
|
||||
copy-on-write snapshots. The container snapshot stores all updates to the
|
||||
container. The `devicemapper` driver allocates space to them on demand from the pool
|
||||
as and when data is written to the container.
|
||||
|
||||
The high level diagram below shows a thin pool with a base device and two
|
||||
images.
|
||||
|
||||

|
||||
|
||||
If you look closely at the diagram you'll see that it's snapshots all the way
|
||||
down. Each image layer is a snapshot of the layer below it. The lowest layer of
|
||||
each image is a snapshot of the base device that exists in the pool. This
|
||||
base device is a `Device Mapper` artifact and not a Docker image layer.
|
||||
|
||||
A container is a snapshot of the image it is created from. The diagram below
|
||||
shows two containers - one based on the Ubuntu image and the other based on the
|
||||
Busybox image.
|
||||
|
||||

|
||||
|
||||
|
||||
## Reads with the devicemapper
|
||||
|
||||
Let's look at how reads and writes occur using the `devicemapper` storage
|
||||
driver. The diagram below shows the high level process for reading a single
|
||||
block (`0x44f`) in an example container.
|
||||
|
||||

|
||||
|
||||
1. An application makes a read request for block `0x44f` in the container.
|
||||
|
||||
Because the container is a thin snapshot of an image it does not have the
|
||||
data. Instead, it has a pointer (PTR) to where the data is stored in the image
|
||||
snapshot lower down in the image stack.
|
||||
|
||||
2. The storage driver follows the pointer to block `0xf33` in the snapshot
|
||||
relating to image layer `a005...`.
|
||||
|
||||
3. The `devicemapper` copies the contents of block `0xf33` from the image
|
||||
snapshot to memory in the container.
|
||||
|
||||
4. The storage driver returns the data to the requesting application.
|
||||
|
||||
### Write examples
|
||||
|
||||
With the `devicemapper` driver, writing new data to a container is accomplished
|
||||
by an *allocate-on-demand* operation. Updating existing data uses a
|
||||
copy-on-write operation. Because Device Mapper is a block-based technology
|
||||
these operations occur at the block level.
|
||||
|
||||
For example, when making a small change to a large file in a container, the
|
||||
`devicemapper` storage driver does not copy the entire file. It only copies the
|
||||
blocks to be modified. Each block is 64KB.
|
||||
|
||||
#### Writing new data
|
||||
|
||||
To write 56KB of new data to a container:
|
||||
|
||||
1. An application makes a request to write 56KB of new data to the container.
|
||||
|
||||
2. The allocate-on-demand operation allocates a single new 64KB block to the
|
||||
container's snapshot.
|
||||
|
||||
If the write operation is larger than 64KB, multiple new blocks are
|
||||
allocated to the container's snapshot.
|
||||
|
||||
3. The data is written to the newly allocated block.
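
The allocation arithmetic in the steps above can be sketched in shell. This is
only an illustration, assuming the 64KB block size stated earlier;
`blocks_needed` is a hypothetical helper and not part of Docker or Device
Mapper.

```bash
# Thin-pool block size in bytes (64KB, as stated in this guide).
BLOCK_SIZE=65536

# blocks_needed BYTES: print how many whole blocks a new write of BYTES
# would consume, rounding up to the next block boundary.
blocks_needed() {
    echo $(( ($1 + BLOCK_SIZE - 1) / BLOCK_SIZE ))
}

blocks_needed $((56 * 1024))    # prints 1: 56KB fits in a single block
blocks_needed $((200 * 1024))   # prints 4: 200KB spans four blocks
```

The rounding is the point of the sketch: a write that is even one byte larger
than a multiple of 64KB consumes a whole extra block.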
|
||||
|
||||
#### Overwriting existing data
|
||||
|
||||
To modify existing data for the first time:
|
||||
|
||||
1. An application makes a request to modify some data in the container.
|
||||
|
||||
2. A copy-on-write operation locates the blocks that need updating.
|
||||
|
||||
3. The operation allocates new empty blocks to the container snapshot and
|
||||
copies the data into those blocks.
|
||||
|
||||
4. The modified data is written into the newly allocated blocks.
|
||||
|
||||
The application in the container is unaware of any of these
|
||||
allocate-on-demand and copy-on-write operations. However, they may add latency
|
||||
to the application's read and write operations.
|
||||
|
||||
## Configuring Docker with Device Mapper
|
||||
|
||||
The `devicemapper` is the default Docker storage driver on some Linux
|
||||
distributions. This includes RHEL and most of its forks. Currently, the
|
||||
following distributions support the driver:
|
||||
|
||||
* RHEL/CentOS/Fedora
|
||||
* Ubuntu 12.04
|
||||
* Ubuntu 14.04
|
||||
* Debian
|
||||
|
||||
Docker hosts running the `devicemapper` storage driver default to a
|
||||
configuration mode known as `loop-lvm`. This mode uses sparse files to build
|
||||
the thin pool used by image and container snapshots. The mode is designed to
|
||||
work out-of-the-box with no additional configuration. However, production
|
||||
deployments should not run under `loop-lvm` mode.
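
To see what "sparse" means in practice, the following sketch creates a sparse
file and compares its apparent size with the space actually allocated on disk.
It assumes a Linux host with GNU coreutils (`truncate`, `stat -c`) and a
filesystem that supports sparse files. It illustrates the concept only and is
not how Docker creates its loop files.

```bash
# Create a 100MB sparse file in a scratch directory.
tmpdir=$(mktemp -d)
sparse="$tmpdir/data"
truncate -s 100M "$sparse"

# Apparent size in bytes, as reported to applications.
apparent=$(stat -c %s "$sparse")
# Space actually allocated: block count multiplied by the block unit size.
allocated=$(( $(stat -c %b "$sparse") * $(stat -c %B "$sparse") ))

echo "apparent:  $apparent bytes"
echo "allocated: $allocated bytes"

rm -rf "$tmpdir"
```

The allocated figure grows only as data is written into the file, which is the
same thin provisioning idea the pool relies on.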
|
||||
|
||||
You can detect the mode by viewing the `docker info` command:
|
||||
|
||||
$ sudo docker info
|
||||
Containers: 0
|
||||
Images: 0
|
||||
Storage Driver: devicemapper
|
||||
Pool Name: docker-202:2-25220302-pool
|
||||
Pool Blocksize: 65.54 kB
|
||||
Backing Filesystem: xfs
|
||||
...
|
||||
Data loop file: /var/lib/docker/devicemapper/devicemapper/data
|
||||
Metadata loop file: /var/lib/docker/devicemapper/devicemapper/metadata
|
||||
Library Version: 1.02.93-RHEL7 (2015-01-28)
|
||||
...
|
||||
|
||||
The output above shows a Docker host running with the `devicemapper` storage
|
||||
driver operating in `loop-lvm` mode. This is indicated by the fact that the
|
||||
`Data loop file` and `Metadata loop file` are files under
|
||||
`/var/lib/docker/devicemapper/devicemapper`. These are loopback mounted sparse
|
||||
files.
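
The check described above can be scripted. The sketch below reads `docker info`
text on standard input and reports the inferred mode. It assumes that a
`Data loop file:` line appears only in `loop-lvm` mode, as in the sample
output; field names may differ between Docker versions, and `dm_mode` is a
hypothetical helper.

```bash
# dm_mode: read `docker info` text on stdin and print the inferred mode.
# Assumption: "Data loop file:" is present only when loop-lvm is in use.
dm_mode() (
    info=$(cat)
    if ! printf '%s\n' "$info" | grep -q 'Storage Driver: devicemapper'; then
        echo "not devicemapper"
    elif printf '%s\n' "$info" | grep -q 'Data loop file:'; then
        echo "loop-lvm"
    else
        echo "direct-lvm"
    fi
)

# Example with canned text; prints "loop-lvm".
printf '%s\n' \
    'Storage Driver: devicemapper' \
    ' Data loop file: /var/lib/docker/devicemapper/devicemapper/data' | dm_mode
```

On a real host you might pipe the output of `sudo docker info` into `dm_mode`
instead of the canned text.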
|
||||
|
||||
### Configure direct-lvm mode for production
|
||||
|
||||
The preferred configuration for production deployments is `direct-lvm`. This
|
||||
mode uses block devices to create the thin pool. The following procedure shows
|
||||
you how to configure a Docker host to use the `devicemapper` storage driver in
|
||||
a `direct-lvm` configuration.
|
||||
|
||||
> **Caution:** If you have already run the Docker daemon on your Docker host
|
||||
> and have images you want to keep, `push` them to Docker Hub or your private
|
||||
> Docker Trusted Registry before attempting this procedure.
|
||||
|
||||
The procedure below will create a 90GB data volume and 4GB metadata volume to
|
||||
use as backing for the storage pool. It assumes that you have a spare block
|
||||
device at `/dev/xvdf` with enough free space to complete the task. The device
|
||||
identifier and volume sizes may be different in your environment and you
|
||||
should substitute your own values throughout the procedure. The procedure also
|
||||
assumes that the Docker daemon is in the `stopped` state.
|
||||
|
||||
1. Log in to the Docker host you want to configure and stop the Docker daemon.
|
||||
|
||||
2. If it exists, delete your existing image store by removing the
|
||||
`/var/lib/docker` directory.
|
||||
|
||||
$ sudo rm -rf /var/lib/docker
|
||||
|
||||
3. Create an LVM physical volume (PV) on your spare block device using the
|
||||
`pvcreate` command.
|
||||
|
||||
$ sudo pvcreate /dev/xvdf
|
||||
Physical volume `/dev/xvdf` successfully created
|
||||
|
||||
The device identifier may be different on your system. Remember to
|
||||
substitute your value in the command above.
|
||||
|
||||
4. Create a new volume group (VG) called `vg-docker` using the PV created in
|
||||
the previous step.
|
||||
|
||||
$ sudo vgcreate vg-docker /dev/xvdf
|
||||
Volume group `vg-docker` successfully created
|
||||
|
||||
5. Create a new 90GB logical volume (LV) called `data` from space in the
|
||||
`vg-docker` volume group.
|
||||
|
||||
$ sudo lvcreate -L 90G -n data vg-docker
|
||||
Logical volume `data` created.
|
||||
|
||||
The command creates an LVM logical volume called `data` and an associated
|
||||
block device file at `/dev/vg-docker/data`. In a later step, you instruct the
|
||||
`devicemapper` storage driver to use this block device to store image and
|
||||
container data.
|
||||
|
||||
If you receive a signature detection warning, make sure you are working on
|
||||
the correct devices before continuing. Signature warnings indicate that the
|
||||
device you're working on is currently in use by LVM or has been used by LVM in
|
||||
the past.
|
||||
|
||||
6. Create a new logical volume (LV) called `metadata` from space in the
|
||||
`vg-docker` volume group.
|
||||
|
||||
$ sudo lvcreate -L 4G -n metadata vg-docker
|
||||
Logical volume `metadata` created.
|
||||
|
||||
This creates an LVM logical volume called `metadata` and an associated
|
||||
block device file at `/dev/vg-docker/metadata`. In the next step you instruct
|
||||
the `devicemapper` storage driver to use this block device to store image and
|
||||
container metadata.
|
||||
|
||||
7. Start the Docker daemon with the `devicemapper` storage driver and the
|
||||
`--storage-opt` flags.
|
||||
|
||||
The `data` and `metadata` devices that you pass to the `--storage-opt`
|
||||
options were created in the previous steps.
|
||||
|
||||
$ sudo docker daemon --storage-driver=devicemapper --storage-opt dm.datadev=/dev/vg-docker/data --storage-opt dm.metadatadev=/dev/vg-docker/metadata &
|
||||
[1] 2163
|
||||
[root@ip-10-0-0-75 centos]# INFO[0000] Listening for HTTP on unix (/var/run/docker.sock)
|
||||
INFO[0027] Option DefaultDriver: bridge
|
||||
INFO[0027] Option DefaultNetwork: bridge
|
||||
<output truncated>
|
||||
INFO[0027] Daemon has completed initialization
|
||||
INFO[0027] Docker daemon commit=0a8c2e3 execdriver=native-0.2 graphdriver=devicemapper version=1.8.2
|
||||
|
||||
It is also possible to set the `--storage-driver` and `--storage-opt` flags
|
||||
in the Docker config file and start the daemon normally using the `service` or
|
||||
`systemd` commands.
|
||||
|
||||
8. Use the `docker info` command to verify that the daemon is using the `data` and
|
||||
`metadata` devices you created.
|
||||
|
||||
$ sudo docker info
|
||||
INFO[0180] GET /v1.20/info
|
||||
Containers: 0
|
||||
Images: 0
|
||||
Storage Driver: devicemapper
|
||||
Pool Name: docker-202:1-1032-pool
|
||||
Pool Blocksize: 65.54 kB
|
||||
Backing Filesystem: xfs
|
||||
Data file: /dev/vg-docker/data
|
||||
Metadata file: /dev/vg-docker/metadata
|
||||
[...]
|
||||
|
||||
The output of the command above shows the storage driver as `devicemapper`.
|
||||
The last two lines also confirm that the correct devices are being used for
|
||||
the `Data file` and the `Metadata file`.
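
Because the device name, volume group name, and sizes vary between hosts, it
can help to print the commands from the procedure above with your own values
before running anything. The sketch below is a dry run only: it echoes the
commands from steps 3 to 7 and executes none of them. `plan_direct_lvm` is a
hypothetical helper, and the commands and flags are the ones shown in this
procedure.

```bash
# plan_direct_lvm DEVICE VG DATA_SIZE META_SIZE
# Print (but do not run) the commands from steps 3 to 7.
plan_direct_lvm() {
    echo "pvcreate $1"
    echo "vgcreate $2 $1"
    echo "lvcreate -L $3 -n data $2"
    echo "lvcreate -L $4 -n metadata $2"
    echo "docker daemon --storage-driver=devicemapper" \
         "--storage-opt dm.datadev=/dev/$2/data" \
         "--storage-opt dm.metadatadev=/dev/$2/metadata"
}

# Values used in this guide; substitute your own.
plan_direct_lvm /dev/xvdf vg-docker 90G 4G
```

Review the printed commands, then run each one with `sudo` as shown in the
numbered steps.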
|
||||
|
||||
### Examine devicemapper structures on the host
|
||||
|
||||
You can use the `lsblk` command to see the device files created above and the
|
||||
`pool` that the `devicemapper` storage driver creates on top of them.
|
||||
|
||||
$ sudo lsblk
|
||||
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
|
||||
xvda 202:0 0 8G 0 disk
|
||||
└─xvda1 202:1 0 8G 0 part /
|
||||
xvdf 202:80 0 10G 0 disk
|
||||
├─vg--docker-data 253:0 0 90G 0 lvm
|
||||
│ └─docker-202:1-1032-pool 253:2 0 10G 0 dm
|
||||
└─vg--docker-metadata 253:1 0 4G 0 lvm
|
||||
└─docker-202:1-1032-pool 253:2 0 10G 0 dm
|
||||
|
||||
The diagram below shows the image from prior examples updated with the detail
|
||||
from the `lsblk` command above.
|
||||
|
||||

|
||||
|
||||
In the diagram, the pool is named `docker-202:1-1032-pool` and spans the `data`
|
||||
and `metadata` devices created earlier. The `devicemapper` constructs the pool
|
||||
name as follows:
|
||||
|
||||
```
|
||||
docker-MAJ:MIN-INO-pool
|
||||
```
|
||||
|
||||
`MAJ`, `MIN` and `INO` refer to the major and minor device numbers and inode.
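
The naming scheme can be sketched with two small shell helpers: one assembles a
name from its parts and the other splits a name back into them. Both
`pool_name` and `pool_parts` are hypothetical helpers for illustration, and the
lowercase `docker-` prefix follows the `lsblk` and `docker info` output shown
earlier.

```bash
# pool_name MAJ MIN INO: assemble a thin-pool name from its three parts.
pool_name() {
    printf 'docker-%s:%s-%s-pool\n' "$1" "$2" "$3"
}

# pool_parts NAME: print the MAJ, MIN and INO fields of a pool name.
pool_parts() (
    rest=${1#docker-}      # strip the leading "docker-"
    rest=${rest%-pool}     # strip the trailing "-pool"
    maj=${rest%%:*}        # text before the colon
    rest=${rest#*:}
    min=${rest%%-*}        # text between the colon and the dash
    ino=${rest#*-}         # text after the dash
    echo "$maj $min $ino"
)

pool_name 202 1 1032                  # prints docker-202:1-1032-pool
pool_parts docker-202:1-1032-pool     # prints 202 1 1032
```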
|
||||
|
||||
Because Device Mapper operates at the block level it is more difficult to see
|
||||
diffs between image layers and containers. Docker 1.10 and later no longer
|
||||
matches image layer IDs with directory names in `/var/lib/docker`. However,
|
||||
there are two key directories. The `/var/lib/docker/devicemapper/mnt` directory
|
||||
contains the mount points for image and container layers. The
|
||||
`/var/lib/docker/devicemapper/metadata` directory contains one file for every
|
||||
image layer and container snapshot. The files contain metadata about each
|
||||
snapshot in JSON format.
|
||||
|
||||
## Device Mapper and Docker performance
|
||||
|
||||
It is important to understand the impact that allocate-on-demand and
|
||||
copy-on-write operations can have on overall container performance.
|
||||
|
||||
### Allocate-on-demand performance impact
|
||||
|
||||
The `devicemapper` storage driver allocates new blocks to a container via an
|
||||
allocate-on-demand operation. This means that each time an app writes to
|
||||
somewhere new inside a container, one or more empty blocks have to be located
|
||||
from the pool and mapped into the container.
|
||||
|
||||
All blocks are 64KB. A write that uses less than 64KB still results in a single
|
||||
64KB block being allocated. Writing more than 64KB of data uses multiple 64KB
|
||||
blocks. This can impact container performance, especially in containers that
|
||||
perform lots of small writes. However, once a block is allocated to a container
|
||||
subsequent reads and writes can operate directly on that block.
|
||||
|
||||
### Copy-on-write performance impact
|
||||
|
||||
Each time a container updates existing data for the first time, the
|
||||
`devicemapper` storage driver has to perform a copy-on-write operation. This
|
||||
copies the data from the image snapshot to the container's snapshot. This
|
||||
process can have a noticeable impact on container performance.
|
||||
|
||||
All copy-on-write operations have a 64KB granularity. As a result, updating
|
||||
32KB of a 1GB file causes the driver to copy a single 64KB block into the
|
||||
container's snapshot. This has obvious performance advantages over file-level
|
||||
copy-on-write operations which would require copying the entire 1GB file into
|
||||
the container layer.
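
The comparison above can be worked through with a little shell arithmetic. The
sketch below is a simplified model: it assumes the file's bytes map onto
contiguous 64KB blocks starting at a block boundary, which a real filesystem
does not guarantee. `cow_bytes` is a hypothetical helper.

```bash
BLOCK_SIZE=65536   # 64KB copy-on-write granularity

# cow_bytes OFFSET LENGTH: bytes copied when LENGTH bytes starting at
# byte OFFSET of a file are updated for the first time.
cow_bytes() (
    first=$(( $1 / BLOCK_SIZE ))               # first block touched
    last=$(( ($1 + $2 - 1) / BLOCK_SIZE ))     # last block touched
    echo $(( (last - first + 1) * BLOCK_SIZE ))
)

cow_bytes 0 32768        # prints 65536: an aligned 32KB update copies one block
cow_bytes 49152 32768    # prints 131072: an update straddling a boundary copies two
```

Either way, the amount copied is tiny compared with the 1073741824 bytes that a
file-level copy of a 1GB file would move.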
|
||||
|
||||
In practice, however, containers that perform lots of small block writes
|
||||
(<64KB) can perform worse with `devicemapper` than with AUFS.
|
||||
|
||||
### Other device mapper performance considerations
|
||||
|
||||
There are several other things that impact the performance of the
|
||||
`devicemapper` storage driver.
|
||||
|
||||
- **The mode.** The default mode for Docker running the `devicemapper` storage
|
||||
driver is `loop-lvm`. This mode uses sparse files and suffers from poor
|
||||
performance. It is **not recommended for production**. The recommended mode for
|
||||
production environments is `direct-lvm` where the storage driver writes
|
||||
directly to raw block devices.
|
||||
|
||||
- **High speed storage.** For best performance you should place the `Data file`
|
||||
and `Metadata file` on high speed storage such as SSD. This can be direct
|
||||
attached storage or from a SAN or NAS array.
|
||||
|
||||
- **Memory usage.** `devicemapper` is not the most memory efficient Docker
|
||||
storage driver. Launching *n* copies of the same container loads *n* copies of
|
||||
its files into memory. This can have a memory impact on your Docker host. As a
|
||||
result, the `devicemapper` storage driver may not be the best choice for PaaS
|
||||
and other high density use cases.
|
||||
|
||||
One final point: data volumes provide the best and most predictable
|
||||
performance. This is because they bypass the storage driver and do not incur
|
||||
any of the potential overheads introduced by thin provisioning and
|
||||
copy-on-write. For this reason, you should place heavy write workloads on
|
||||
data volumes.
|
||||
|
||||
## Related Information
|
||||
|
||||
* [Understand images, containers, and storage drivers](imagesandcontainers.md)
|
||||
* [Select a storage driver](selectadriver.md)
|
||||
* [AUFS storage driver in practice](aufs-driver.md)
|
||||
* [Btrfs storage driver in practice](btrfs-driver.md)
|
||||
BIN
vendor/github.com/hyperhq/hypercli/docs/userguide/storagedriver/images/aufs_delete.jpg
generated
vendored
Normal file
|
After Width: | Height: | Size: 38 KiB |
BIN
vendor/github.com/hyperhq/hypercli/docs/userguide/storagedriver/images/aufs_layers.jpg
generated
vendored
Normal file
|
After Width: | Height: | Size: 81 KiB |
BIN
vendor/github.com/hyperhq/hypercli/docs/userguide/storagedriver/images/aufs_metadata.jpg
generated
vendored
Normal file
|
After Width: | Height: | Size: 26 KiB |
BIN
vendor/github.com/hyperhq/hypercli/docs/userguide/storagedriver/images/base_device.jpg
generated
vendored
Normal file
|
After Width: | Height: | Size: 46 KiB |
BIN
vendor/github.com/hyperhq/hypercli/docs/userguide/storagedriver/images/btfs_constructs.jpg
generated
vendored
Normal file
|
After Width: | Height: | Size: 62 KiB |
BIN
vendor/github.com/hyperhq/hypercli/docs/userguide/storagedriver/images/btfs_container_layer.jpg
generated
vendored
Normal file
|
After Width: | Height: | Size: 66 KiB |
BIN
vendor/github.com/hyperhq/hypercli/docs/userguide/storagedriver/images/btfs_layers.png
generated
vendored
Normal file
|
After Width: | Height: | Size: 68 KiB |
BIN
vendor/github.com/hyperhq/hypercli/docs/userguide/storagedriver/images/btfs_pool.jpg
generated
vendored
Normal file
|
After Width: | Height: | Size: 42 KiB |
BIN
vendor/github.com/hyperhq/hypercli/docs/userguide/storagedriver/images/btfs_snapshots.jpg
generated
vendored
Normal file
|
After Width: | Height: | Size: 19 KiB |
BIN
vendor/github.com/hyperhq/hypercli/docs/userguide/storagedriver/images/btfs_subvolume.jpg
generated
vendored
Normal file
|
After Width: | Height: | Size: 30 KiB |
BIN
vendor/github.com/hyperhq/hypercli/docs/userguide/storagedriver/images/container-layers-cas.jpg
generated
vendored
Normal file
|
After Width: | Height: | Size: 136 KiB |
BIN
vendor/github.com/hyperhq/hypercli/docs/userguide/storagedriver/images/container-layers.jpg
generated
vendored
Normal file
|
After Width: | Height: | Size: 45 KiB |
BIN
vendor/github.com/hyperhq/hypercli/docs/userguide/storagedriver/images/dm_container.jpg
generated
vendored
Normal file
|
After Width: | Height: | Size: 50 KiB |
BIN
vendor/github.com/hyperhq/hypercli/docs/userguide/storagedriver/images/driver-pros-cons.png
generated
vendored
Normal file
|
After Width: | Height: | Size: 103 KiB |
BIN
vendor/github.com/hyperhq/hypercli/docs/userguide/storagedriver/images/image-layers.jpg
generated
vendored
Normal file
|
After Width: | Height: | Size: 26 KiB |
BIN
vendor/github.com/hyperhq/hypercli/docs/userguide/storagedriver/images/overlay_constructs.jpg
generated
vendored
Normal file
|
After Width: | Height: | Size: 48 KiB |
BIN
vendor/github.com/hyperhq/hypercli/docs/userguide/storagedriver/images/overlay_constructs2.jpg
generated
vendored
Normal file
|
After Width: | Height: | Size: 83 KiB |
BIN
vendor/github.com/hyperhq/hypercli/docs/userguide/storagedriver/images/saving-space.jpg
generated
vendored
Normal file
|
After Width: | Height: | Size: 56 KiB |
BIN
vendor/github.com/hyperhq/hypercli/docs/userguide/storagedriver/images/shared-uuid.jpg
generated
vendored
Normal file
|
After Width: | Height: | Size: 246 KiB |
BIN
vendor/github.com/hyperhq/hypercli/docs/userguide/storagedriver/images/shared-volume.jpg
generated
vendored
Normal file
|
After Width: | Height: | Size: 48 KiB |
BIN
vendor/github.com/hyperhq/hypercli/docs/userguide/storagedriver/images/sharing-layers.jpg
generated
vendored
Normal file
|
After Width: | Height: | Size: 55 KiB |
BIN
vendor/github.com/hyperhq/hypercli/docs/userguide/storagedriver/images/two_dm_container.jpg
generated
vendored
Normal file
|
After Width: | Height: | Size: 64 KiB |
BIN
vendor/github.com/hyperhq/hypercli/docs/userguide/storagedriver/images/zfs_clones.jpg
generated
vendored
Normal file
|
After Width: | Height: | Size: 22 KiB |
BIN
vendor/github.com/hyperhq/hypercli/docs/userguide/storagedriver/images/zfs_zpool.jpg
generated
vendored
Normal file
|
After Width: | Height: | Size: 30 KiB |
BIN
vendor/github.com/hyperhq/hypercli/docs/userguide/storagedriver/images/zpool_blocks.jpg
generated
vendored
Normal file
|
After Width: | Height: | Size: 42 KiB |
495
vendor/github.com/hyperhq/hypercli/docs/userguide/storagedriver/imagesandcontainers.md
generated
vendored
Normal file
@@ -0,0 +1,495 @@
|
||||
<!--[metadata]>
|
||||
+++
|
||||
title = "Understand images, containers, and storage drivers"
|
||||
description = "Learn the technologies that support storage drivers."
|
||||
keywords = ["container, storage, driver, AUFS, btfs, devicemapper,zvfs"]
|
||||
[menu.main]
|
||||
parent = "engine_driver"
|
||||
weight = -2
|
||||
+++
|
||||
<![end-metadata]-->
|
||||
|
||||
|
||||
# Understand images, containers, and storage drivers
|
||||
|
||||
To use storage drivers effectively, you must understand how Docker builds and
|
||||
stores images. Then, you need an understanding of how these images are used by
|
||||
containers. Finally, you'll need a short introduction to the technologies that
|
||||
enable both images and container operations.
|
||||
|
||||
## Images and layers
|
||||
|
||||
Each Docker image references a list of read-only layers that represent
|
||||
filesystem differences. Layers are stacked on top of each other to form a base
|
||||
for a container's root filesystem. The diagram below shows the Ubuntu 15.04
|
||||
image comprising 4 stacked image layers.
|
||||
|
||||

|
||||
|
||||
The Docker storage driver is responsible for stacking these layers and
|
||||
providing a single unified view.
|
||||
|
||||
When you create a new container, you add a new, thin, writable layer on top of
|
||||
the underlying stack. This layer is often called the "container layer". All
|
||||
changes made to the running container - such as writing new files, modifying
|
||||
existing files, and deleting files - are written to this thin writable
|
||||
container layer. The diagram below shows a container based on the Ubuntu 15.04
|
||||
image.
|
||||
|
||||

|
||||
|
||||
### Content addressable storage
|
||||
|
||||
Docker 1.10 introduced a new content addressable storage model. This is a
|
||||
completely new way to address image and layer data on disk. Previously, image
|
||||
and layer data was referenced and stored using a randomly generated UUID. In
|
||||
the new model this is replaced by a secure *content hash*.
|
||||
|
||||
The new model improves security, provides a built-in way to avoid ID
|
||||
collisions, and guarantees data integrity after pull, push, load, and save
|
||||
operations. It also enables better sharing of layers by allowing many images to
|
||||
freely share their layers even if they didn’t come from the same build.
|
||||
|
||||
The diagram below shows an updated version of the previous diagram,
|
||||
highlighting the changes implemented by Docker 1.10.
|
||||
|
||||

|
||||
|
||||
As can be seen, all image layer IDs are cryptographic hashes, whereas the
|
||||
container ID is still a randomly generated UUID.
|
||||
|
||||
There are several things to note regarding the new model. These include:
|
||||
|
||||
1. Migration of existing images
|
||||
2. Image and layer filesystem structures
|
||||
|
||||
Existing images, those created and pulled by earlier versions of Docker, need
|
||||
to be migrated before they can be used with the new model. This migration
|
||||
involves calculating new secure checksums and is performed automatically the
|
||||
first time you start an updated Docker daemon. After the migration is complete,
|
||||
all images and tags will have brand new secure IDs.
|
||||
|
||||
Although the migration is automatic and transparent, it is computationally
|
||||
intensive. This means it can take time if you have lots of image data.
|
||||
During this time your Docker daemon will not respond to other requests.
|
||||
|
||||
A migration tool exists that allows you to migrate existing images to the new
|
||||
format before upgrading your Docker daemon. This means that upgraded Docker
|
||||
daemons do not need to perform the migration in-band, and therefore avoids any
|
||||
associated downtime. It also provides a way to manually migrate existing images
|
||||
so that they can be distributed to other Docker daemons in your environment
|
||||
that are already running the latest versions of Docker.
|
||||
|
||||
The migration tool is provided by Docker, Inc., and runs as a container. You
|
||||
can download it from [https://github.com/docker/v1.10-migrator/releases](https://github.com/docker/v1.10-migrator/releases).
|
||||
|
||||
While running the "migrator" image you need to expose your Docker host's data
|
||||
directory to the container. If you are using the default Docker data path, the
|
||||
command to run the container will look like this:
|
||||
|
||||
$ sudo docker run --rm -v /var/lib/docker:/var/lib/docker docker/v1.10-migrator
|
||||
|
||||
If you use the `devicemapper` storage driver, you will need to include the
|
||||
`--privileged` option so that the container has access to your storage devices.
|
||||
|
||||
#### Migration example
|
||||
|
||||
The following example shows the migration tool in use on a Docker host running
|
||||
version 1.9.1 of the Docker daemon and the AUFS storage driver. The Docker host
|
||||
is running on a **t2.micro** AWS EC2 instance with 1 vCPU, 1GB RAM, and a
|
||||
single 8GB general purpose SSD EBS volume. The Docker data directory
|
||||
(`/var/lib/docker`) was consuming 2GB of space.
|
||||
|
||||
$ docker images
|
||||
REPOSITORY TAG IMAGE ID CREATED SIZE
|
||||
jenkins latest 285c9f0f9d3d 17 hours ago 708.5 MB
|
||||
mysql latest d39c3fa09ced 8 days ago 360.3 MB
|
||||
mongo latest a74137af4532 13 days ago 317.4 MB
|
||||
postgres latest 9aae83d4127f 13 days ago 270.7 MB
|
||||
redis latest 8bccd73928d9 2 weeks ago 151.3 MB
|
||||
centos latest c8a648134623 4 weeks ago 196.6 MB
|
||||
ubuntu 15.04 c8be1ac8145a 7 weeks ago 131.3 MB
|
||||
|
||||
$ du -hs /var/lib/docker
|
||||
2.0G /var/lib/docker
|
||||
|
||||
$ time docker run --rm -v /var/lib/docker:/var/lib/docker docker/v1.10-migrator
|
||||
Unable to find image 'docker/v1.10-migrator:latest' locally
|
||||
latest: Pulling from docker/v1.10-migrator
|
||||
ed1f33c5883d: Pull complete
|
||||
b3ca410aa2c1: Pull complete
|
||||
2b9c6ed9099e: Pull complete
|
||||
dce7e318b173: Pull complete
|
||||
Digest: sha256:bd2b245d5d22dd94ec4a8417a9b81bb5e90b171031c6e216484db3fe300c2097
|
||||
Status: Downloaded newer image for docker/v1.10-migrator:latest
|
||||
time="2016-01-27T12:31:06Z" level=debug msg="Assembling tar data for 01e70da302a553ba13485ad020a0d77dbb47575a31c4f48221137bb08f45878d from /var/lib/docker/aufs/diff/01e70da302a553ba13485ad020a0d77dbb47575a31c4f48221137bb08f45878d"
|
||||
time="2016-01-27T12:31:06Z" level=debug msg="Assembling tar data for 07ac220aeeef9febf1ac16a9d1a4eff7ef3c8cbf5ed0be6b6f4c35952ed7920d from /var/lib/docker/aufs/diff/07ac220aeeef9febf1ac16a9d1a4eff7ef3c8cbf5ed0be6b6f4c35952ed7920d"
|
||||
<snip>
|
||||
time="2016-01-27T12:32:00Z" level=debug msg="layer dbacfa057b30b1feaf15937c28bd8ca0d6c634fc311ccc35bd8d56d017595d5b took 10.80 seconds"
|
||||
|
||||
real 0m59.583s
|
||||
user 0m0.046s
|
||||
sys 0m0.008s
|
||||
|
||||
The Unix `time` command is prepended to the `docker run` command to produce timings
|
||||
for the operation. As can be seen, migrating 7 images
|
||||
comprising 2GB of disk space took approximately 1 minute. However, this
|
||||
included the time taken to pull the `docker/v1.10-migrator` image
|
||||
(approximately 3.5 seconds). The same operation on an m4.10xlarge EC2 instance
|
||||
with 40 vCPUs, 160GB RAM and an 8GB provisioned IOPS EBS volume resulted in the
|
||||
following improved timings:
|
||||
|
||||
real 0m9.871s
|
||||
user 0m0.094s
|
||||
sys 0m0.021s
|
||||
|
||||
This shows that the migration operation is affected by the hardware spec of the
|
||||
machine performing the migration.
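
To put a number on the difference, the two `real` timings above can be compared
directly. The sketch uses `awk` for floating-point division; `speedup` is a
hypothetical helper, and both figures include the time taken to pull the
migrator image.

```bash
# speedup SLOW FAST: ratio of two wall-clock times, to one decimal place.
speedup() {
    awk -v slow="$1" -v fast="$2" 'BEGIN { printf "%.1f\n", slow / fast }'
}

speedup 59.583 9.871    # prints 6.0
```

In this example the larger instance completed the migration roughly six times
faster.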
|
||||
|
||||
## Container and layers
|
||||
|
||||
The major difference between a container and an image is the top writable
|
||||
layer. All writes to the container that add new or modify existing data are
|
||||
stored in this writable layer. When the container is deleted the writable layer
|
||||
is also deleted. The underlying image remains unchanged.
|
||||
|
||||
Because each container has its own thin writable container layer, and all
|
||||
changes are stored in this container layer, this means that multiple containers
|
||||
can share access to the same underlying image and yet have their own data
|
||||
state. The diagram below shows multiple containers sharing the same Ubuntu
|
||||
15.04 image.
|
||||
|
||||

|
||||
|
||||
The Docker storage driver is responsible for enabling and managing both the
|
||||
image layers and the writable container layer. How a storage driver
|
||||
accomplishes these can vary between drivers. Two key technologies behind Docker
|
||||
image and container management are stackable image layers and copy-on-write
|
||||
(CoW).
|
||||
|
||||
|
||||
## The copy-on-write strategy
|
||||
|
||||
Sharing is a good way to optimize resources. People do this instinctively in
|
||||
daily life. For example, twins Jane and Joseph taking an Algebra class at
|
||||
different times from different teachers can share the same exercise book by
|
||||
passing it between each other. Now, suppose Jane gets an assignment to complete
|
||||
the homework on page 11 in the book. At that point, Jane copies page 11,
|
||||
completes the homework, and hands in her copy. The original exercise book is
|
||||
unchanged and only Jane has a copy of the changed page 11.
|
||||
|
||||
Copy-on-write is a similar strategy of sharing and copying. In this strategy,
|
||||
system processes that need the same data share the same instance of that data
|
||||
rather than having their own copy. At some point, if one process needs to
|
||||
modify or write to the data, only then does the operating system make a copy of
|
||||
the data for that process to use. Only the process that needs to write has
|
||||
access to the data copy. All the other processes continue to use the original
|
||||
data.
|
||||
|
||||
Docker uses a copy-on-write technology with both images and containers. This
|
||||
CoW strategy optimizes both image disk space usage and the performance of
|
||||
container start times. The next sections look at how copy-on-write is leveraged
|
||||
with images and containers through sharing and copying.
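
The strategy can be modelled with plain directories. In the toy sketch below, a
shared `image` directory plays the exercise book and each "container" has its
own, initially empty, writable directory. Reads fall through to the shared copy
until a container writes, at which point the file is copied up first. The
directory names and the `cow_read` and `cow_write` helpers are hypothetical,
and this is a file-level illustration of the idea rather than how any Docker
storage driver is implemented.

```bash
# Shared read-only "image" plus one private writable directory per container.
root=$(mktemp -d)
mkdir "$root/image" "$root/jane" "$root/joseph"
echo "original page 11" > "$root/image/page11"

# cow_read CONTAINER FILE: prefer the container's copy, else the shared one.
cow_read() {
    if [ -e "$root/$1/$2" ]; then
        cat "$root/$1/$2"
    else
        cat "$root/image/$2"
    fi
}

# cow_write CONTAINER FILE TEXT: copy the shared file up on first write,
# then append to the private copy. The shared file is never modified.
cow_write() {
    [ -e "$root/$1/$2" ] || cp "$root/image/$2" "$root/$1/$2"
    echo "$3" >> "$root/$1/$2"
}

cow_write jane page11 "Jane's homework"
cow_read jane page11      # Jane sees the original text plus her change
cow_read joseph page11    # Joseph still sees only the original
```

Remove the scratch directory in `$root` when you are finished experimenting.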
|
||||
|
||||
### Sharing promotes smaller images
|
||||
|
||||
This section looks at image layers and copy-on-write technology. All image and
|
||||
container layers exist inside the Docker host's *local storage area* and are
|
||||
managed by the storage driver. On Linux-based Docker hosts this is usually
|
||||
located under `/var/lib/docker/`.
|
||||
|
||||
The Docker client reports on image layers when instructed to pull and push
|
||||
images with `docker pull` and `docker push`. The command below pulls the
|
||||
`ubuntu:15.04` Docker image from Docker Hub.
|
||||
|
||||
$ docker pull ubuntu:15.04
|
||||
15.04: Pulling from library/ubuntu
|
||||
1ba8ac955b97: Pull complete
|
||||
f157c4e5ede7: Pull complete
|
||||
0b7e98f84c4c: Pull complete
|
||||
a3ed95caeb02: Pull complete
|
||||
Digest: sha256:5e279a9df07990286cce22e1b0f5b0490629ca6d187698746ae5e28e604a640e
|
||||
Status: Downloaded newer image for ubuntu:15.04
|
||||
|
||||
From the output, you'll see that the command actually pulls 4 image layers.
|
||||
Each of the above lines lists an image layer and its UUID or cryptographic
|
||||
hash. The combination of these four layers makes up the `ubuntu:15.04` Docker
|
||||
image.
|
||||
|
||||
Each of these layers is stored in its own directory inside the Docker host's
|
||||
local storage area.
|
||||
|
||||
Versions of Docker prior to 1.10 stored each layer in a directory with the same
|
||||
name as the image layer ID. However, this is not the case for images pulled
|
||||
with Docker version 1.10 and later. For example, the command below shows an
|
||||
image being pulled from Docker Hub, followed by a directory listing on a host
|
||||
running version 1.9.1 of the Docker Engine.
|
||||
|
||||
$ docker pull ubuntu:15.04
|
||||
15.04: Pulling from library/ubuntu
|
||||
47984b517ca9: Pull complete
|
||||
df6e891a3ea9: Pull complete
|
||||
e65155041eed: Pull complete
|
||||
c8be1ac8145a: Pull complete
|
||||
Digest: sha256:5e279a9df07990286cce22e1b0f5b0490629ca6d187698746ae5e28e604a640e
|
||||
Status: Downloaded newer image for ubuntu:15.04
|
||||
|
||||
$ ls /var/lib/docker/aufs/layers
|
||||
47984b517ca9ca0312aced5c9698753ffa964c2015f2a5f18e5efa9848cf30e2
|
||||
c8be1ac8145a6e59a55667f573883749ad66eaeef92b4df17e5ea1260e2d7356
|
||||
df6e891a3ea9cdce2a388a2cf1b1711629557454fd120abd5be6d32329a0e0ac
|
||||
e65155041eed7ec58dea78d90286048055ca75d41ea893c7246e794389ecf203
|
||||
|
||||
Notice how the four directories match up with the layer IDs of the downloaded
|
||||
image. Now compare this with the same operations performed on a host running
|
||||
version 1.10 of the Docker Engine.
|
||||
|
||||
$ docker pull ubuntu:15.04
|
||||
15.04: Pulling from library/ubuntu
|
||||
1ba8ac955b97: Pull complete
|
||||
f157c4e5ede7: Pull complete
|
||||
0b7e98f84c4c: Pull complete
|
||||
a3ed95caeb02: Pull complete
|
||||
Digest: sha256:5e279a9df07990286cce22e1b0f5b0490629ca6d187698746ae5e28e604a640e
|
||||
Status: Downloaded newer image for ubuntu:15.04
|
||||
|
||||
$ ls /var/lib/docker/aufs/layers/
|
||||
1d6674ff835b10f76e354806e16b950f91a191d3b471236609ab13a930275e24
|
||||
5dbb0cbe0148cf447b9464a358c1587be586058d9a4c9ce079320265e2bb94e7
|
||||
bef7199f2ed8e86fa4ada1309cfad3089e0542fec8894690529e4c04a7ca2d73
|
||||
ebf814eccfe98f2704660ca1d844e4348db3b5ccc637eb905d4818fbfb00a06a
|
||||
|
||||
See how the four directories do not match up with the image layer IDs pulled in
|
||||
the previous step.
|
||||
|
||||
Despite the differences between image management before and after version 1.10,
|
||||
all versions of Docker still allow images to share layers. For example, if you
|
||||
`pull` an image that shares some of the same image layers as an image that has
|
||||
already been pulled, the Docker daemon recognizes this, and only pulls the
|
||||
layers it doesn't already have stored locally. After the second pull, the two
|
||||
images will share any common image layers.
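
The decision described above can be sketched in plain shell. This is only an illustration and not how the Docker daemon is implemented: the function `missing_layers` and the two file names are hypothetical, the first four layer IDs are the truncated ones from the 1.9.1 pull output, and `aaaaaaaaaaaa` is an invented ID standing in for a layer that is not yet stored locally.

```shell
# Sketch only: list the layers of an image that are not yet stored locally.
# missing_layers LOCAL_FILE IMAGE_FILE  (both files hold one layer ID per line)
missing_layers() {
    # -F fixed strings, -x whole-line match, -v invert: keep the IDs in
    # IMAGE_FILE that do not appear in LOCAL_FILE.
    grep -Fxv -f "$1" "$2"
}

workdir=$(mktemp -d)

# Layers already present on the host (from the ubuntu:15.04 pull).
printf '%s\n' 47984b517ca9 df6e891a3ea9 e65155041eed c8be1ac8145a \
    > "$workdir/local-layers.txt"

# Layers of a hypothetical second image built on top of ubuntu:15.04.
printf '%s\n' 47984b517ca9 df6e891a3ea9 e65155041eed c8be1ac8145a aaaaaaaaaaaa \
    > "$workdir/image-layers.txt"

to_pull=$(missing_layers "$workdir/local-layers.txt" "$workdir/image-layers.txt")
echo "layers to pull: $to_pull"
```

Only the invented layer is reported, which mirrors the idea that the shared layers are reused rather than downloaded again.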
|
||||
|
||||
You can illustrate this now for yourself. Starting with the `ubuntu:15.04`
|
||||
image that you just pulled, make a change to it, and build a new image based on
|
||||
the change. One way to do this is using a `Dockerfile` and the `docker build`
|
||||
command.
|
||||
|
||||
1. In an empty directory, create a simple `Dockerfile` that starts with the
|
||||
   `ubuntu:15.04` image.
|
||||
|
||||
FROM ubuntu:15.04
|
||||
|
||||
2. Add a new file called "newfile" in the image's `/tmp` directory with the
|
||||
   text "Hello world" in it.
|
||||
|
||||
When you are done, the `Dockerfile` contains two lines:
|
||||
|
||||
FROM ubuntu:15.04
|
||||
|
||||
RUN echo "Hello world" > /tmp/newfile
|
||||
|
||||
3. Save and close the file.
|
||||
|
||||
4. From a terminal in the same folder as your `Dockerfile`, run the following
|
||||
   command:
|
||||
|
||||
$ docker build -t changed-ubuntu .
|
||||
Sending build context to Docker daemon 2.048 kB
|
||||
Step 1 : FROM ubuntu:15.04
|
||||
---> 3f7bcee56709
|
||||
Step 2 : RUN echo "Hello world" > /tmp/newfile
|
||||
---> Running in d14acd6fad4e
|
||||
---> 94e6b7d2c720
|
||||
Removing intermediate container d14acd6fad4e
|
||||
Successfully built 94e6b7d2c720
|
||||
|
||||
> **Note:** The period (.) at the end of the above command is important. It
|
||||
> tells the `docker build` command to use the current working directory as
|
||||
> its build context.
|
||||
|
||||
The output above shows a new image with image ID `94e6b7d2c720`.
|
||||
|
||||
5. Run the `docker images` command to verify the new `changed-ubuntu` image is
|
||||
   in the Docker host's local storage area.
|
||||
|
||||
REPOSITORY TAG IMAGE ID CREATED SIZE
|
||||
changed-ubuntu latest 03b964f68d06 33 seconds ago 131.4 MB
|
||||
ubuntu 15.04 013f3d01d247 6 weeks ago 131.3 MB
|
||||
|
||||
6. Run the `docker history` command to see which image layers were used to
|
||||
   create the new `changed-ubuntu` image.
|
||||
|
||||
$ docker history changed-ubuntu
|
||||
IMAGE CREATED CREATED BY SIZE COMMENT
|
||||
94e6b7d2c720 2 minutes ago /bin/sh -c echo "Hello world" > /tmp/newfile 12 B
|
||||
3f7bcee56709 6 weeks ago /bin/sh -c #(nop) CMD ["/bin/bash"] 0 B
|
||||
<missing> 6 weeks ago /bin/sh -c sed -i 's/^#\s*\(deb.*universe\)$/ 1.879 kB
|
||||
<missing> 6 weeks ago /bin/sh -c echo '#!/bin/sh' > /usr/sbin/polic 701 B
|
||||
<missing> 6 weeks ago /bin/sh -c #(nop) ADD file:8e4943cd86e9b2ca13 131.3 MB
|
||||
|
||||
The `docker history` output shows the new `94e6b7d2c720` image layer at the
|
||||
top. You know that this is the new image layer added because it was created
|
||||
by the `echo "Hello world" > /tmp/newfile` command in your `Dockerfile`.
|
||||
The 4 image layers below it are the exact same image layers
|
||||
that make up the `ubuntu:15.04` image.
|
||||
|
||||
> **Note:** Under the content addressable storage model introduced with Docker
|
||||
> 1.10, image history data is no longer stored in a config file with each image
|
||||
> layer. It is now stored as a string of text in a single config file that
|
||||
> relates to the overall image. This can result in some image layers showing as
|
||||
> "missing" in the output of the `docker history` command. This is normal
|
||||
> behaviour and can be ignored.
|
||||
>
|
||||
> You may hear images like these referred to as *flat images*.
|
||||
|
||||
Notice the new `changed-ubuntu` image does not have its own copies of every
|
||||
layer. As can be seen in the diagram below, the new image is sharing its four
|
||||
underlying layers with the `ubuntu:15.04` image.
|
||||
|
||||

|
||||
|
||||
The `docker history` command also shows the size of each image layer. As you
|
||||
can see, the `94e6b7d2c720` layer is only consuming 12 Bytes of disk space.
|
||||
This means that the `changed-ubuntu` image we just created is only consuming an
|
||||
additional 12 Bytes of disk space on the Docker host - all layers below the
|
||||
`94e6b7d2c720` layer already exist on the Docker host and are shared by other
|
||||
images.
|
||||
|
||||
This sharing of image layers is what makes Docker images and containers so
|
||||
space efficient.
|
||||
|
||||
### Copying makes containers efficient
|
||||
|
||||
You learned earlier that a container is a Docker image with a thin, writable
|
||||
container layer added. The diagram below shows the layers of a container based
|
||||
on the `ubuntu:15.04` image:
|
||||
|
||||

|
||||
|
||||
All writes made to a container are stored in the thin writable container layer.
|
||||
The other layers are read-only (RO) image layers and can't be changed. This
|
||||
means that multiple containers can safely share a single underlying image. The
|
||||
diagram below shows multiple containers sharing a single copy of the
|
||||
`ubuntu:15.04` image. Each container has its own thin RW layer, but they all
|
||||
share a single instance of the ubuntu:15.04 image:
|
||||
|
||||

|
||||
|
||||
When an existing file in a container is modified, Docker uses the storage
|
||||
driver to perform a copy-on-write operation. The specifics of the operation depend
|
||||
on the storage driver. For the AUFS and OverlayFS storage drivers, the
|
||||
copy-on-write operation works roughly as follows:
|
||||
|
||||
* Search through the image layers for the file to update. The process starts
|
||||
at the top, newest layer and works down to the base layer one layer at a
|
||||
time.
|
||||
* Perform a "copy-up" operation on the first copy of the file that is found. A
|
||||
"copy up" copies the file up to the container's own thin writable layer.
|
||||
* Modify the *copy of the file* in the container's thin writable layer.
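
The three steps above can be mimicked with ordinary directories. This is a rough sketch under stated assumptions, not the AUFS or OverlayFS implementation: the function `cow_append` and the directory names `base`, `top`, and `rw` are hypothetical stand-ins for two read-only image layers and a container's writable layer.

```shell
# Sketch only: mimic the search / copy-up / modify steps with directories.
# cow_append RW_DIR FILE TEXT LAYER...   (layers listed newest first)
cow_append() {
    rw=$1; file=$2; text=$3
    shift 3
    mkdir -p "$(dirname "$rw/$file")"
    if [ ! -e "$rw/$file" ]; then
        # Search the read-only layers from the top (newest) down.
        for layer in "$@"; do
            if [ -e "$layer/$file" ]; then
                # Copy-up: copy the first match into the writable layer.
                cp -p "$layer/$file" "$rw/$file"
                break
            fi
        done
    fi
    # Modify only the copy in the writable layer.
    printf '%s\n' "$text" >> "$rw/$file"
}

root=$(mktemp -d)
mkdir -p "$root/base/etc" "$root/top/tmp" "$root/rw"
echo "from base" > "$root/base/etc/motd"
echo "Hello world" > "$root/top/tmp/newfile"

cow_append "$root/rw" etc/motd "changed in container" "$root/top" "$root/base"

cat "$root/rw/etc/motd"      # copied-up file plus the new line
cat "$root/base/etc/motd"    # the image layer copy is unchanged
```

A second call for the same file would skip the search entirely, because the copy already exists in the writable directory.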
|
||||
|
||||
Btrfs, ZFS, and other drivers handle the copy-on-write differently. You can
|
||||
read more about the methods of these drivers later in their detailed
|
||||
descriptions.
|
||||
|
||||
Containers that write a lot of data will consume more space than containers
|
||||
that do not. This is because most write operations consume new space in the
|
||||
container's thin writable top layer. If your container needs to write a lot of
|
||||
data, you should consider using a data volume.
|
||||
|
||||
A copy-up operation can incur a noticeable performance overhead. This overhead
|
||||
is different depending on which storage driver is in use. However, large files,
|
||||
lots of layers, and deep directory trees can make the impact more noticeable.
|
||||
Fortunately, the operation only occurs the first time any particular file is
|
||||
modified. Subsequent modifications to the same file do not cause a copy-up
|
||||
operation and can operate directly on the file's existing copy already present
|
||||
in the container layer.
|
||||
|
||||
Let's see what happens if we spin up 5 containers based on the `changed-ubuntu`
|
||||
image we built earlier:
|
||||
|
||||
1. From a terminal on your Docker host, run the following `docker run` command
|
||||
5 times.
|
||||
|
||||
$ docker run -dit changed-ubuntu bash
|
||||
75bab0d54f3cf193cfdc3a86483466363f442fba30859f7dcd1b816b6ede82d4
|
||||
$ docker run -dit changed-ubuntu bash
|
||||
9280e777d109e2eb4b13ab211553516124a3d4d4280a0edfc7abf75c59024d47
|
||||
$ docker run -dit changed-ubuntu bash
|
||||
a651680bd6c2ef64902e154eeb8a064b85c9abf08ac46f922ad8dfc11bb5cd8a
|
||||
$ docker run -dit changed-ubuntu bash
|
||||
8eb24b3b2d246f225b24f2fca39625aaad71689c392a7b552b78baf264647373
|
||||
$ docker run -dit changed-ubuntu bash
|
||||
0ad25d06bdf6fca0dedc38301b2aff7478b3e1ce3d1acd676573bba57cb1cfef
|
||||
|
||||
This launches 5 containers based on the `changed-ubuntu` image. As each
|
||||
container is created, Docker adds a writable layer and assigns it a random
|
||||
UUID. This is the value returned from the `docker run` command.
|
||||
|
||||
2. Run the `docker ps` command to verify the 5 containers are running.
|
||||
|
||||
$ docker ps
|
||||
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
|
||||
0ad25d06bdf6 changed-ubuntu "bash" About a minute ago Up About a minute stoic_ptolemy
|
||||
8eb24b3b2d24 changed-ubuntu "bash" About a minute ago Up About a minute pensive_bartik
|
||||
a651680bd6c2 changed-ubuntu "bash" 2 minutes ago Up 2 minutes hopeful_turing
|
||||
9280e777d109 changed-ubuntu "bash" 2 minutes ago Up 2 minutes backstabbing_mahavira
|
||||
75bab0d54f3c changed-ubuntu "bash" 2 minutes ago Up 2 minutes boring_pasteur
|
||||
|
||||
The output above shows 5 running containers, all sharing the
|
||||
`changed-ubuntu` image. Each `CONTAINER ID` is a truncated form of the UUID assigned when
|
||||
creating each container.
|
||||
|
||||
3. List the contents of the local storage area.
|
||||
|
||||
$ sudo ls /var/lib/docker/containers
|
||||
0ad25d06bdf6fca0dedc38301b2aff7478b3e1ce3d1acd676573bba57cb1cfef
|
||||
9280e777d109e2eb4b13ab211553516124a3d4d4280a0edfc7abf75c59024d47
|
||||
75bab0d54f3cf193cfdc3a86483466363f442fba30859f7dcd1b816b6ede82d4
|
||||
a651680bd6c2ef64902e154eeb8a064b85c9abf08ac46f922ad8dfc11bb5cd8a
|
||||
8eb24b3b2d246f225b24f2fca39625aaad71689c392a7b552b78baf264647373
|
||||
|
||||
Docker's copy-on-write strategy not only reduces the amount of space consumed
|
||||
by containers, it also reduces the time required to start a container. At start
|
||||
time, Docker only has to create the thin writable layer for each container.
|
||||
The diagram below shows these 5 containers sharing a single read-only (RO)
|
||||
copy of the `changed-ubuntu` image.
|
||||
|
||||

|
||||
|
||||
If Docker had to make an entire copy of the underlying image stack each time it
|
||||
started a new container, container start times and disk space used would be
|
||||
significantly increased.
|
||||
|
||||
## Data volumes and the storage driver
|
||||
|
||||
When a container is deleted, any data written to the container that is not
|
||||
stored in a *data volume* is deleted along with the container.
|
||||
|
||||
A data volume is a directory or file in the Docker host's filesystem that is
|
||||
mounted directly into a container. Data volumes are not controlled by the
|
||||
storage driver. Reads and writes to data volumes bypass the storage driver and
|
||||
operate at native host speeds. You can mount any number of data volumes into a
|
||||
container. Multiple containers can also share one or more data volumes.
|
||||
|
||||
The diagram below shows a single Docker host running two containers. Each
|
||||
container exists inside of its own address space within the Docker host's local
|
||||
storage area (`/var/lib/docker/...`). There is also a single shared data
|
||||
volume located at `/data` on the Docker host. This is mounted directly into
|
||||
both containers.
|
||||
|
||||

|
||||
|
||||
Data volumes reside outside of the local storage area on the Docker host,
|
||||
further reinforcing their independence from the storage driver's control. When
|
||||
a container is deleted, any data stored in data volumes persists on the Docker
|
||||
host.
|
||||
|
||||
For detailed information about data volumes, see
|
||||
[Managing data in containers](https://docs.docker.com/userguide/dockervolumes/).
|
||||
|
||||
## Related information
|
||||
|
||||
* [Select a storage driver](selectadriver.md)
|
||||
* [AUFS storage driver in practice](aufs-driver.md)
|
||||
* [Btrfs storage driver in practice](btrfs-driver.md)
|
||||
* [Device Mapper storage driver in practice](device-mapper-driver.md)
|
||||
38
vendor/github.com/hyperhq/hypercli/docs/userguide/storagedriver/index.md
generated
vendored
Normal file
@@ -0,0 +1,38 @@
|
||||
<!--[metadata]>
|
||||
+++
|
||||
title = "Docker storage drivers"
|
||||
description = "Learn how to select the proper storage driver for your container."
|
||||
keywords = ["container, storage, driver, AUFS, btfs, devicemapper,zvfs"]
|
||||
[menu.main]
|
||||
identifier = "engine_driver"
|
||||
parent = "engine_guide"
|
||||
weight = 7
|
||||
+++
|
||||
<![end-metadata]-->
|
||||
|
||||
|
||||
# Docker storage drivers
|
||||
|
||||
Docker relies on driver technology to manage the storage and interactions associated with images and the containers that run them. This section contains the following pages:
|
||||
|
||||
* [Understand images, containers, and storage drivers](imagesandcontainers.md)
|
||||
* [Select a storage driver](selectadriver.md)
|
||||
* [AUFS storage driver in practice](aufs-driver.md)
|
||||
* [Btrfs storage driver in practice](btrfs-driver.md)
|
||||
* [Device Mapper storage driver in practice](device-mapper-driver.md)
|
||||
* [OverlayFS in practice](overlayfs-driver.md)
|
||||
* [ZFS storage in practice](zfs-driver.md)
|
||||
|
||||
If you are new to Docker containers, make sure you read ["Understand images, containers, and storage drivers"](imagesandcontainers.md) first. It explains key concepts and technologies that can help you when working with storage drivers.
|
||||
|
||||
### Acknowledgement
|
||||
|
||||
The Docker storage driver material was created in large part by our guest author
|
||||
Nigel Poulton with a bit of help from Docker's own Jérôme Petazzoni. In his
|
||||
spare time Nigel creates [IT training
|
||||
videos](http://www.pluralsight.com/author/nigel-poulton), co-hosts the weekly
|
||||
[In Tech We Trust podcast](http://intechwetrustpodcast.com/), and lives it large
|
||||
on [Twitter](https://twitter.com/nigelpoulton).
|
||||
|
||||
|
||||
|
||||
299
vendor/github.com/hyperhq/hypercli/docs/userguide/storagedriver/overlayfs-driver.md
generated
vendored
Normal file
@@ -0,0 +1,299 @@
|
||||
<!--[metadata]>
|
||||
+++
|
||||
title = "OverlayFS storage in practice"
|
||||
description = "Learn how to optimize your use of the OverlayFS driver."
|
||||
keywords = ["container, storage, driver, OverlayFS "]
|
||||
[menu.main]
|
||||
parent = "engine_driver"
|
||||
+++
|
||||
<![end-metadata]-->
|
||||
|
||||
# Docker and OverlayFS in practice
|
||||
|
||||
OverlayFS is a modern *union filesystem* that is similar to AUFS. In comparison
|
||||
to AUFS, OverlayFS:
|
||||
|
||||
* has a simpler design
|
||||
* has been in the mainline Linux kernel since version 3.18
|
||||
* is potentially faster
|
||||
|
||||
As a result, OverlayFS is rapidly gaining popularity in the Docker community
|
||||
and is seen by many as a natural successor to AUFS. As promising as OverlayFS
|
||||
is, it is still relatively young. Therefore caution should be taken before
|
||||
using it in production Docker environments.
|
||||
|
||||
Docker's `overlay` storage driver leverages several OverlayFS features to build
|
||||
and manage the on-disk structures of images and containers.
|
||||
|
||||
>**Note**: Since it was merged into the mainline kernel, the OverlayFS *kernel
|
||||
>module* was renamed from "overlayfs" to "overlay". As a result you may see the
|
||||
> two terms used interchangeably in some documentation. However, this document
|
||||
> uses "OverlayFS" to refer to the overall filesystem, and `overlay` to refer
|
||||
> to Docker's storage-driver.
|
||||
|
||||
## Image layering and sharing with OverlayFS
|
||||
|
||||
OverlayFS takes two directories on a single Linux host, layers one on top of
|
||||
the other, and provides a single unified view. These directories are often
|
||||
referred to as *layers* and the technology used to layer them is known as a
|
||||
*union mount*. The OverlayFS terminology is "lowerdir" for the bottom layer and
|
||||
"upperdir" for the top layer. The unified view is exposed through its own
|
||||
directory called "merged".
|
||||
|
||||
The diagram below shows how a Docker image and a Docker container are layered.
|
||||
The image layer is the "lowerdir" and the container layer is the "upperdir".
|
||||
The unified view is exposed through a directory called "merged" which is
|
||||
effectively the container's mount point. The diagram shows how Docker constructs
|
||||
map to OverlayFS constructs.
|
||||
|
||||

|
||||
|
||||
Notice how the image layer and container layer can contain the same files. When
|
||||
this happens, the files in the container layer ("upperdir") are dominant and
|
||||
obscure the existence of the same files in the image layer ("lowerdir"). The
|
||||
container mount ("merged") presents the unified view.
|
||||
|
||||
OverlayFS only works with two layers. This means that multi-layered images
|
||||
cannot be implemented as multiple OverlayFS layers. Instead, each image layer
|
||||
is implemented as its own directory under `/var/lib/docker/overlay`.
|
||||
Hard links are then used as a space-efficient way to reference data shared with
|
||||
lower layers. As of Docker 1.10, image layer IDs no longer correspond to
|
||||
directory names in `/var/lib/docker/`.
|
||||
|
||||
To create a container, the `overlay` driver combines the directory representing
|
||||
the image's top layer plus a new directory for the container. The image's top
|
||||
layer is the "lowerdir" in the overlay and read-only. The new directory for the
|
||||
container is the "upperdir" and is writable.
|
||||
|
||||
## Example: Image and container on-disk constructs
|
||||
|
||||
The following `docker pull` command shows a Docker host downloading a
|
||||
Docker image comprising four layers.
|
||||
|
||||
$ sudo docker pull ubuntu
|
||||
Using default tag: latest
|
||||
latest: Pulling from library/ubuntu
|
||||
8387d9ff0016: Pull complete
|
||||
3b52deaaf0ed: Pull complete
|
||||
4bd501fad6de: Pull complete
|
||||
a3ed95caeb02: Pull complete
|
||||
Digest: sha256:457b05828bdb5dcc044d93d042863fba3f2158ae249a6db5ae3934307c757c54
|
||||
Status: Downloaded newer image for ubuntu:latest
|
||||
|
||||
Each image layer has its own directory under `/var/lib/docker/overlay/`. This
|
||||
is where the contents of each image layer are stored.
|
||||
|
||||
The output of the command below shows the four directories that store the
|
||||
contents of each image layer just pulled. However, as can be seen, the image
|
||||
layer IDs do not match the directory names in `/var/lib/docker/overlay`. This
|
||||
is normal behavior in Docker 1.10 and later.
|
||||
|
||||
$ ls -l /var/lib/docker/overlay/
|
||||
total 24
|
||||
drwx------ 3 root root 4096 Oct 28 11:02 1d073211c498fd5022699b46a936b4e4bdacb04f637ad64d3475f558783f5c3e
|
||||
drwx------ 3 root root 4096 Oct 28 11:02 5a4526e952f0aa24f3fcc1b6971f7744eb5465d572a48d47c492cb6bbf9cbcda
|
||||
drwx------ 5 root root 4096 Oct 28 11:06 99fcaefe76ef1aa4077b90a413af57fd17d19dce4e50d7964a273aae67055235
|
||||
drwx------ 3 root root 4096 Oct 28 11:01 c63fb41c2213f511f12f294dd729b9903a64d88f098c20d2350905ac1fdbcbba
|
||||
|
||||
The image layer directories contain the files unique to that layer as well as
|
||||
hard links to the data that is shared with lower layers. This allows for
|
||||
efficient use of disk space.
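
The effect of a hard link can be seen with two ordinary directories. This is a small sketch rather than a reproduction of what the `overlay` driver does: the directory names `lower-layer` and `upper-layer` and the file `bin/tool` are hypothetical.

```shell
# Sketch only: two "layer" directories sharing one file through a hard link.
root=$(mktemp -d)
mkdir -p "$root/lower-layer/root/bin" "$root/upper-layer/root/bin"
echo "shared binary" > "$root/lower-layer/root/bin/tool"

# The upper layer references the lower layer's data instead of copying it.
ln "$root/lower-layer/root/bin/tool" "$root/upper-layer/root/bin/tool"

# Both paths report the same inode number, so the data is stored once.
inode_lower=$(ls -i "$root/lower-layer/root/bin/tool" | awk '{print $1}')
inode_upper=$(ls -i "$root/upper-layer/root/bin/tool" | awk '{print $1}')
echo "lower inode: $inode_lower, upper inode: $inode_upper"
```

Because both names point at one inode, the file's data blocks are not duplicated when a higher layer carries the same file.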
|
||||
|
||||
Containers also exist on-disk in the Docker host's filesystem under
|
||||
`/var/lib/docker/overlay/`. If you inspect the directory relating to a running
|
||||
container using the `ls -l` command, you find the following file and
|
||||
directories.
|
||||
|
||||
$ ls -l /var/lib/docker/overlay/<directory-of-running-container>
|
||||
total 16
|
||||
-rw-r--r-- 1 root root 64 Oct 28 11:06 lower-id
|
||||
drwxr-xr-x 1 root root 4096 Oct 28 11:06 merged
|
||||
drwxr-xr-x 4 root root 4096 Oct 28 11:06 upper
|
||||
drwx------ 3 root root 4096 Oct 28 11:06 work
|
||||
|
||||
These four filesystem objects are all artefacts of OverlayFS. The "lower-id"
|
||||
file contains the ID of the top layer of the image the container is based on.
|
||||
This is used by OverlayFS as the "lowerdir".
|
||||
|
||||
$ cat /var/lib/docker/overlay/73de7176c223a6c82fd46c48c5f152f2c8a7e49ecb795a7197c3bb795c4d879e/lower-id
|
||||
1d073211c498fd5022699b46a936b4e4bdacb04f637ad64d3475f558783f5c3e
|
||||
|
||||
The "upper" directory is the container's read-write layer. Any changes made to
|
||||
the container are written to this directory.
|
||||
|
||||
The "merged" directory is effectively the container's mount point. This is where
|
||||
the unified view of the image ("lowerdir") and container ("upperdir") is
|
||||
exposed. Any changes written to the container are immediately reflected in this
|
||||
directory.
|
||||
|
||||
The "work" directory is required for OverlayFS to function. It is used for
|
||||
things such as *copy_up* operations.
|
||||
|
||||
You can verify all of these constructs from the output of the `mount` command.
|
||||
(Ellipses and line breaks are used in the output below to enhance readability.)
|
||||
|
||||
$ mount | grep overlay
|
||||
overlay on /var/lib/docker/overlay/73de7176c223.../merged
|
||||
type overlay (rw,relatime,lowerdir=/var/lib/docker/overlay/1d073211c498.../root,
|
||||
upperdir=/var/lib/docker/overlay/73de7176c223.../upper,
|
||||
workdir=/var/lib/docker/overlay/73de7176c223.../work)
|
||||
|
||||
The output reflects that the overlay is mounted as read-write ("rw").
|
||||
|
||||
## Container reads and writes with overlay
|
||||
|
||||
Consider three scenarios where a container opens a file for read access with
|
||||
overlay.
|
||||
|
||||
- **The file does not exist in the container layer**. If a container opens a
|
||||
file for read access and the file does not already exist in the container
|
||||
("upperdir") it is read from the image ("lowerdir"). This should incur very
|
||||
little performance overhead.
|
||||
|
||||
- **The file only exists in the container layer**. If a container opens a file
|
||||
for read access and the file exists in the container ("upperdir") and not in
|
||||
the image ("lowerdir"), it is read directly from the container.
|
||||
|
||||
- **The file exists in the container layer and the image layer**. If a
|
||||
container opens a file for read access and the file exists in the image layer
|
||||
and the container layer, the file's version in the container layer is read.
|
||||
This is because files in the container layer ("upperdir") obscure files with
|
||||
the same name in the image layer ("lowerdir").
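
The three read scenarios above reduce to a simple precedence rule, sketched here with plain directories. The function `overlay_read` and the file names are hypothetical; the kernel performs this lookup itself inside the "merged" mount, so the code only illustrates the ordering.

```shell
# Sketch only: resolve a read the way the three scenarios describe.
# overlay_read UPPERDIR LOWERDIR FILE
overlay_read() {
    if [ -e "$1/$3" ]; then
        cat "$1/$3"      # container layer wins whenever it has the file
    elif [ -e "$2/$3" ]; then
        cat "$2/$3"      # otherwise fall back to the image layer
    else
        echo "overlay_read: $3: no such file" >&2
        return 1
    fi
}

root=$(mktemp -d)
mkdir -p "$root/upper" "$root/lower"
echo "image only"     > "$root/lower/a.txt"
echo "container only" > "$root/upper/b.txt"
echo "image copy"     > "$root/lower/c.txt"
echo "container copy" > "$root/upper/c.txt"

overlay_read "$root/upper" "$root/lower" a.txt   # read from lowerdir
overlay_read "$root/upper" "$root/lower" b.txt   # read from upperdir
overlay_read "$root/upper" "$root/lower" c.txt   # upperdir obscures lowerdir
```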
|
||||
|
||||
Consider some scenarios where files in a container are modified.
|
||||
|
||||
- **Writing to a file for the first time**. The first time a container writes
|
||||
to an existing file, that file does not exist in the container ("upperdir").
|
||||
The `overlay` driver performs a *copy_up* operation to copy the file from the
|
||||
image ("lowerdir") to the container ("upperdir"). The container then writes the
|
||||
changes to the new copy of the file in the container layer.
|
||||
|
||||
However, OverlayFS works at the file level not the block level. This means
|
||||
that all OverlayFS copy-up operations copy entire files, even if the file is
|
||||
very large and only a small part of it is being modified. This can have a
|
||||
noticeable impact on container write performance. However, two things are
|
||||
worth noting:
|
||||
|
||||
* The copy_up operation only occurs the first time any given file is
|
||||
written to. Subsequent writes to the same file will operate against the copy of
|
||||
the file already copied up to the container.
|
||||
|
||||
* OverlayFS only works with two layers. This means that performance should
|
||||
be better than AUFS which can suffer noticeable latencies when searching for
|
||||
files in images with many layers.
|
||||
|
||||
- **Deleting files and directories**. When files are deleted within a container
|
||||
a *whiteout* file is created in the container's "upperdir". The version of the
|
||||
file in the image layer ("lowerdir") is not deleted. However, the whiteout file
|
||||
in the container obscures it.
|
||||
|
||||
Deleting a directory in a container results in an *opaque directory* being
|
||||
created in the "upperdir". This has the same effect as a whiteout file and
|
||||
effectively masks the existence of the directory in the image's "lowerdir".
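
The masking behavior of a whiteout can be imitated with a marker file. This sketch makes a simplifying assumption: it records the deletion as a file named `.wh.<name>`, whereas OverlayFS itself uses a special character device for whiteouts. The functions `overlay_delete` and `overlay_visible` are hypothetical helpers.

```shell
# Sketch only: a whiteout modelled as a marker file named ".wh.<name>".
# Real OverlayFS uses a special character device instead of a marker file.
# overlay_delete UPPERDIR NAME   (NAME is a file in the top-level directory)
overlay_delete() {
    rm -f "$1/$2"          # drop any copy in the container layer
    : > "$1/.wh.$2"        # record the deletion in the container layer
}

# overlay_visible UPPERDIR LOWERDIR NAME -> succeeds if NAME is still visible
overlay_visible() {
    [ ! -e "$1/.wh.$3" ] && { [ -e "$1/$3" ] || [ -e "$2/$3" ]; }
}

root=$(mktemp -d)
mkdir -p "$root/upper" "$root/lower"
echo "from image" > "$root/lower/config"

overlay_delete "$root/upper" config

if overlay_visible "$root/upper" "$root/lower" config; then
    echo "config is visible"
else
    echo "config is hidden"     # the image copy still exists on disk
fi
ls "$root/lower"
```

The file remains in the lower directory after the delete, which is why removing files inside a container does not shrink the underlying image.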
|
||||
|
||||
## Configure Docker with the overlay storage driver
|
||||
|
||||
To configure Docker to use the overlay storage driver, your Docker host must be
|
||||
running version 3.18 of the Linux kernel (preferably newer) with the overlay
|
||||
kernel module loaded. OverlayFS can operate on top of most supported Linux
|
||||
filesystems. However, ext4 is currently recommended for use in production
|
||||
environments.
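
The kernel requirement can be checked with a short script. This is a minimal sketch, assuming the release string starts with `major.minor` as `uname -r` output normally does; the function name `kernel_supports_overlay` is hypothetical, and the script does not check whether the overlay module is loaded.

```shell
# Sketch only: check a kernel release string against the 3.18 minimum.
# kernel_supports_overlay RELEASE   e.g. "3.19.0-21-generic"
kernel_supports_overlay() {
    major=${1%%.*}              # text before the first dot
    rest=${1#*.}                # text after the first dot
    minor=${rest%%[!0-9]*}      # leading digits of the remainder
    [ "$major" -gt 3 ] || { [ "$major" -eq 3 ] && [ "${minor:-0}" -ge 18 ]; }
}

release=$(uname -r)
if kernel_supports_overlay "$release"; then
    echo "$release: kernel is new enough for overlay"
else
    echo "$release: kernel is older than 3.18"
fi
```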
|
||||
|
||||
The following procedure shows you how to configure your Docker host to use
|
||||
OverlayFS. The procedure assumes that the Docker daemon is in a stopped state.
|
||||
|
||||
> **Caution:** If you have already run the Docker daemon on your Docker host
|
||||
> and have images you want to keep, `push` them to Docker Hub or your private
|
||||
> Docker Trusted Registry before attempting this procedure.
|
||||
|
||||
1. If it is running, stop the Docker `daemon`.
|
||||
|
||||
2. Verify your kernel version and that the overlay kernel module is loaded.
|
||||
|
||||
$ uname -r
|
||||
3.19.0-21-generic
|
||||
|
||||
$ lsmod | grep overlay
|
||||
overlay
|
||||
|
||||
3. Start the Docker daemon with the `overlay` storage driver.
|
||||
|
||||
$ docker daemon --storage-driver=overlay &
|
||||
[1] 29403
|
||||
root@ip-10-0-0-174:/home/ubuntu# INFO[0000] Listening for HTTP on unix (/var/run/docker.sock)
|
||||
INFO[0000] Option DefaultDriver: bridge
|
||||
INFO[0000] Option DefaultNetwork: bridge
|
||||
<output truncated>
|
||||
|
||||
Alternatively, you can force the Docker daemon to automatically start with
|
||||
the `overlay` driver by editing the Docker config file and adding the
|
||||
`--storage-driver=overlay` flag to the `DOCKER_OPTS` line. Once this option
|
||||
is set you can start the daemon using normal startup scripts without having
|
||||
to manually pass in the `--storage-driver` flag.
|
||||
|
||||
4. Verify that the daemon is using the `overlay` storage driver.
|
||||
|
||||
$ docker info
|
||||
Containers: 0
|
||||
Images: 0
|
||||
Storage Driver: overlay
|
||||
Backing Filesystem: extfs
|
||||
<output truncated>
|
||||
|
||||
Notice that the *Backing filesystem* in the output above is showing as
|
||||
`extfs`. Multiple backing filesystems are supported but `extfs` (ext4) is
|
||||
recommended for production use cases.
|
||||
|
||||
Your Docker host is now using the `overlay` storage driver. If you run the
|
||||
`mount` command, you'll find Docker has automatically created the `overlay`
|
||||
mount with the required "lowerdir", "upperdir", "merged" and "workdir"
|
||||
constructs.
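
As an illustration of the `DOCKER_OPTS` approach mentioned in step 3, the fragment below assumes a Debian or Ubuntu host whose startup scripts read `/etc/default/docker`. That path is an assumption; the location of the Docker config file differs between distributions and init systems.

```shell
# /etc/default/docker (assumed location; varies by distribution)
# Pass the storage driver to the daemon on every start.
DOCKER_OPTS="--storage-driver=overlay"
```

After a change like this, restart the daemon with the normal startup scripts and confirm the result with `docker info`.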
|
||||
|
||||
## OverlayFS and Docker Performance
|
||||
|
||||
As a general rule, the `overlay` driver should be fast, almost certainly faster
|
||||
than `aufs` and `devicemapper`. In certain circumstances it may also be faster
|
||||
than `btrfs`. That said, there are a few things to be aware of relative to the
|
||||
performance of Docker using the `overlay` storage driver.
|
||||
|
||||
- **Page Caching**. OverlayFS supports page cache sharing. This means multiple
|
||||
containers accessing the same file can share a single page cache entry (or
|
||||
entries). This makes the `overlay` driver efficient with memory and a good
|
||||
option for PaaS and other high density use cases.
|
||||
|
||||
- **copy_up**. As with AUFS, OverlayFS has to perform copy-up operations any
|
||||
time a container writes to a file for the first time. This can insert latency
|
||||
into the write operation, especially if the file being copied up is
|
||||
large. However, once the file has been copied up, all subsequent writes to that
|
||||
file occur without the need for further copy-up operations.
|
||||
|
||||
The OverlayFS copy_up operation should be faster than the same operation
|
||||
with AUFS. This is because AUFS supports more layers than OverlayFS and it is
|
||||
possible to incur far larger latencies if searching through many AUFS layers.
|
||||
|
||||
- **RPMs and Yum**. OverlayFS only implements a subset of the POSIX standards.
|
||||
This can result in certain OverlayFS operations breaking POSIX standards. One
|
||||
such operation is the *copy-up* operation. Therefore, using `yum` inside of a
|
||||
container on a Docker host using the `overlay` storage driver is unlikely to
|
||||
work without implementing workarounds.
|
||||
|
||||
- **Inode limits**. Use of the `overlay` storage driver can cause excessive
|
||||
inode consumption. This is especially so as the number of images and containers
|
||||
on the Docker host grows. A Docker host with a large number of images and lots
|
||||
of started and stopped containers can quickly run out of inodes.
|
||||
|
||||
Unfortunately, you can only specify the number of inodes in a filesystem at the
|
||||
time of creation. For this reason, you may wish to consider putting
|
||||
`/var/lib/docker` on a separate device with its own filesystem, or manually
|
||||
specifying the number of inodes when creating the filesystem.
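
Inode consumption can be monitored with `df`. The sketch below assumes the column layout that GNU `df -Pi` prints; the helper name `inode_use_percent` is hypothetical, and the sample figures are made up so the example can run without touching a real Docker host.

```shell
# Sketch only: pull the inode usage percentage out of `df -Pi` output.
# Reads df output on stdin and prints the IUse% column of the first filesystem.
inode_use_percent() {
    awk 'NR == 2 { print $5 }'
}

# Sample output in the layout GNU df prints; the numbers are made up.
sample='Filesystem      Inodes  IUsed   IFree IUse% Mounted on
/dev/xvda1      524288 498073   26215   95% /'

usage=$(printf '%s\n' "$sample" | inode_use_percent)
echo "inode usage: $usage"
```

On a real host the same helper could be fed with `df -Pi /var/lib/docker | inode_use_percent`. When creating an ext4 filesystem, the `-N` option of `mkfs.ext4` is one way to set the inode count explicitly.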
|
||||
|
||||
The following generic performance best practices also apply to OverlayFS.
|
||||
|
||||
- **Solid State Devices (SSD)**. For best performance it is always a good idea
|
||||
to use fast storage media such as solid state devices (SSD).
|
||||
|
||||
- **Use Data Volumes**. Data volumes provide the best and most predictable
|
||||
performance. This is because they bypass the storage driver and do not incur
|
||||
any of the potential overheads introduced by thin provisioning and
|
||||
copy-on-write. For this reason, you should place heavy write workloads on data
|
||||
volumes.
|
||||
206
vendor/github.com/hyperhq/hypercli/docs/userguide/storagedriver/selectadriver.md
generated
vendored
Normal file
@@ -0,0 +1,206 @@
|
||||
<!--[metadata]>
|
||||
+++
|
||||
title = "Select a storage driver"
|
||||
description = "Learn how to select the proper storage driver for your container."
|
||||
keywords = ["container, storage, driver, AUFS, btfs, devicemapper,zvfs"]
|
||||
[menu.main]
|
||||
parent = "engine_driver"
|
||||
weight = -1
|
||||
+++
|
||||
<![end-metadata]-->
|
||||
|
||||
# Select a storage driver
|
||||
|
||||
This page describes Docker's storage driver feature. It lists the storage
|
||||
drivers that Docker supports and the basic commands associated with managing
|
||||
them. Finally, this page provides guidance on choosing a storage driver.
|
||||
|
||||
The material on this page is intended for readers who already have an
|
||||
[understanding of the storage driver technology](imagesandcontainers.md).
|
||||
|
||||
## A pluggable storage driver architecture
|
||||
|
||||
Docker has a pluggable storage driver architecture. This gives you the
|
||||
flexibility to "plug in" the storage driver that is best for your environment
|
||||
and use-case. Each Docker storage driver is based on a Linux filesystem or
|
||||
volume manager. Further, each storage driver is free to implement the
|
||||
management of image layers and the container layer in its own unique way. This
|
||||
means some storage drivers perform better than others in different
|
||||
circumstances.
|
||||
|
||||
Once you decide which driver is best, you set this driver on the Docker daemon
|
||||
at start time. As a result, the Docker daemon can only run one storage driver,
|
||||
and all containers created by that daemon instance use the same storage driver.
|
||||
The table below shows the supported storage driver technologies and their
|
||||
driver names:
|
||||
|
||||
|Technology |Storage driver name |
|
||||
|--------------|---------------------|
|
||||
|OverlayFS |`overlay` |
|
||||
|AUFS |`aufs` |
|
||||
|Btrfs |`btrfs` |
|
||||
|Device Mapper |`devicemapper` |
|
||||
|VFS* |`vfs` |
|
||||
|ZFS |`zfs` |
|
||||
|
||||
To find out which storage driver is set on the daemon, use the
|
||||
`docker info` command:
|
||||
|
||||
$ docker info
|
||||
Containers: 0
|
||||
Images: 0
|
||||
Storage Driver: overlay
|
||||
Backing Filesystem: extfs
|
||||
Execution Driver: native-0.2
|
||||
Logging Driver: json-file
|
||||
Kernel Version: 3.19.0-15-generic
|
||||
Operating System: Ubuntu 15.04
|
||||
... output truncated ...
|
||||
|
||||
The `info` subcommand reveals that the Docker daemon is using the `overlay`
|
||||
storage driver with a `Backing Filesystem` value of `extfs`. The `extfs` value
|
||||
means that the `overlay` storage driver is operating on top of an existing
|
||||
(ext) filesystem. The backing filesystem refers to the filesystem that was used
|
||||
to create the Docker host's local storage area under `/var/lib/docker`.
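
As a minimal sketch, the two fields discussed above can be pulled out of `docker info` output with standard text tools. The function name `info_field` is hypothetical, and the sample text is copied from the output shown earlier so that the snippet runs without a Docker daemon.

```shell
#!/bin/sh
# Hypothetical helper: print the value of one "Key: value" field
# read from `docker info`-style text on stdin.
info_field() {
    # $1 is the field name, for example "Storage Driver".
    # Leading whitespace on the key is ignored.
    awk -F': ' -v key="$1" '
        { name = $1; sub(/^[ \t]+/, "", name) }
        name == key { print $2; exit }
    '
}

# Sample text copied from the output above, so no daemon is needed here.
sample='Containers: 0
Images: 0
Storage Driver: overlay
Backing Filesystem: extfs'

driver=$(printf '%s\n' "$sample" | info_field "Storage Driver")
backing=$(printf '%s\n' "$sample" | info_field "Backing Filesystem")
echo "driver=$driver backing=$backing"
```

On a host with Docker installed, piping the live output into the helper (`docker info | info_field "Storage Driver"`) should report the driver currently in use.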
|
||||
|
||||
Which storage driver you use, in part, depends on the backing filesystem you
|
||||
plan to use for your Docker host's local storage area. Some storage drivers can
|
||||
operate on top of different backing filesystems. However, other storage
|
||||
drivers require the backing filesystem to be the same as the storage driver.
|
||||
For example, the `btrfs` storage driver requires a Btrfs backing filesystem. The
|
||||
following table lists each storage driver and whether it must match the host's
|
||||
backing filesystem:
|
||||
|
||||
|Storage driver |Must match backing filesystem |Incompatible with |
|
||||
|---------------|------------------------------|--------------------|
|
||||
|`overlay` |No |`btrfs` `aufs` `zfs`|
|
||||
|`aufs` |No |`btrfs` `aufs` |
|
||||
|`btrfs` |Yes | N/A |
|
||||
|`devicemapper` |No | N/A |
|
||||
|`vfs` |No | N/A |
|
||||
|`zfs` |Yes | N/A |
|
||||
|
||||
|
||||
> **Note**
|
||||
> "Incompatible with" means that the storage driver cannot run on top of the listed
> backing filesystems.
|
||||
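
The "Must match backing filesystem" column can be expressed as a small lookup, which may be handy in provisioning scripts. This is only a sketch: the function name `must_match_backing_fs` is hypothetical, and the values simply mirror the table above.

```shell
#!/bin/sh
# Hypothetical lookup that mirrors the "Must match backing filesystem"
# column of the table above. Prints "yes", "no", or "unknown".
must_match_backing_fs() {
    case "$1" in
        btrfs|zfs)                     echo "yes" ;;
        overlay|aufs|devicemapper|vfs) echo "no" ;;
        *)                             echo "unknown" ;;
    esac
}

for d in overlay btrfs zfs; do
    echo "$d: $(must_match_backing_fs "$d")"
done
```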
|
||||
You can set the storage driver by passing the `--storage-driver=<name>` option
|
||||
to the `docker daemon` command line, or by setting the option on the
|
||||
`DOCKER_OPTS` line in the `/etc/default/docker` file.
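
For illustration, the `DOCKER_OPTS` line might look like the following. This is a sketch that assumes a distribution whose init scripts read `/etc/default/docker`; `devicemapper` is only an example value, and any other flags already present on the line should be kept.

```shell
# /etc/default/docker (fragment)
# Assumption: the init script sources this file and passes DOCKER_OPTS
# to the daemon. Replace devicemapper with the driver you have chosen.
DOCKER_OPTS="--storage-driver=devicemapper"
```

The storage driver is set when the daemon starts, so the daemon needs to be restarted for a change to this line to take effect.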
|
||||
|
||||
The following command shows how to start the Docker daemon with the
|
||||
`devicemapper` storage driver using the `docker daemon` command:
|
||||
|
||||
$ docker daemon --storage-driver=devicemapper &
|
||||
|
||||
$ docker info
|
||||
Containers: 0
|
||||
Images: 0
|
||||
Storage Driver: devicemapper
|
||||
Pool Name: docker-252:0-147544-pool
|
||||
Pool Blocksize: 65.54 kB
|
||||
Backing Filesystem: extfs
|
||||
Data file: /dev/loop0
|
||||
Metadata file: /dev/loop1
|
||||
Data Space Used: 1.821 GB
|
||||
Data Space Total: 107.4 GB
|
||||
Data Space Available: 3.174 GB
|
||||
Metadata Space Used: 1.479 MB
|
||||
Metadata Space Total: 2.147 GB
|
||||
Metadata Space Available: 2.146 GB
|
||||
Udev Sync Supported: true
|
||||
Deferred Removal Enabled: false
|
||||
Data loop file: /var/lib/docker/devicemapper/devicemapper/data
|
||||
Metadata loop file: /var/lib/docker/devicemapper/devicemapper/metadata
|
||||
Library Version: 1.02.90 (2014-09-01)
|
||||
Execution Driver: native-0.2
|
||||
Logging Driver: json-file
|
||||
Kernel Version: 3.19.0-15-generic
|
||||
Operating System: Ubuntu 15.04
|
||||
<output truncated>
|
||||
|
||||
Your choice of storage driver can affect the performance of your containerized
|
||||
applications. So it's important to understand the different storage driver
|
||||
options available and select the right one for your application. Later on this
page, you'll find some advice for choosing an appropriate driver.
|
||||
|
||||
## Shared storage systems and the storage driver
|
||||
|
||||
Many enterprises consume storage from shared storage systems such as SAN and
|
||||
NAS arrays. These often provide increased performance and availability, as well
|
||||
as advanced features such as thin provisioning, deduplication and compression.
|
||||
|
||||
The Docker storage driver and data volumes can both operate on top of storage
|
||||
provided by shared storage systems. This allows Docker to leverage the
|
||||
increased performance and availability these systems provide. However, Docker
|
||||
does not integrate with these underlying systems.
|
||||
|
||||
Remember that each Docker storage driver is based on a Linux filesystem or
|
||||
volume manager. Be sure to follow existing best practices for operating your
|
||||
storage driver (filesystem or volume manager) on top of your shared storage
|
||||
system. For example, if using the ZFS storage driver on top of *XYZ* shared
|
||||
storage system, be sure to follow best practices for operating ZFS filesystems
|
||||
on top of XYZ shared storage system.
|
||||
|
||||
## Which storage driver should you choose?
|
||||
|
||||
Several factors influence the selection of a storage driver. However, these two
|
||||
facts must be kept in mind:
|
||||
|
||||
1. No single driver is well suited to every use-case
|
||||
2. Storage drivers are improving and evolving all of the time
|
||||
|
||||
With these factors in mind, the following points, coupled with the table below,
|
||||
should provide some guidance.
|
||||
|
||||
### Stability
|
||||
For the most stable and hassle-free Docker experience, you should consider the
|
||||
following:
|
||||
|
||||
- **Use the default storage driver for your distribution**. When Docker
|
||||
installs, it chooses a default storage driver based on the configuration of
|
||||
your system. Stability is an important factor influencing which storage driver
|
||||
is used by default. Straying from this default may increase your chances of
|
||||
encountering bugs and nuances.
|
||||
- **Follow the configuration specified on the CS Engine
|
||||
[compatibility matrix](https://www.docker.com/compatibility-maintenance)**. The
|
||||
CS Engine is the commercially supported version of the Docker Engine. Its
|
||||
code-base is identical to the open source Engine, but it has a limited set of
|
||||
supported configurations. These *supported configurations* use the most stable
|
||||
and mature storage drivers. Straying from these configurations may also
|
||||
increase your chances of encountering bugs and nuances.
|
||||
|
||||
### Experience and expertise
|
||||
|
||||
Choose a storage driver that you and your team/organization have experience
|
||||
with. For example, if you use RHEL or one of its downstream forks, you may
|
||||
already have experience with LVM and Device Mapper. If so, you may wish to use
|
||||
the `devicemapper` driver.
|
||||
|
||||
If you do not feel you have expertise with any of the storage drivers supported
|
||||
by Docker, and you want an easy-to-use stable Docker experience, you should
|
||||
consider using the default driver installed by your distribution's Docker
|
||||
package.
|
||||
|
||||
### Future-proofing
|
||||
|
||||
Many people consider OverlayFS to be the future of the Docker storage driver.
|
||||
However, it is less mature, and potentially less stable than some of the more
|
||||
mature drivers such as `aufs` and `devicemapper`. For this reason, you should
|
||||
use the OverlayFS driver with caution and expect to encounter more bugs and
|
||||
nuances than if you were using a more mature driver.
|
||||
|
||||
The following diagram lists each storage driver and provides insight into some
|
||||
of their pros and cons. When selecting which storage driver to use, consider
|
||||
the guidance offered by the diagram below along with the points mentioned above.
|
||||
|
||||

|
||||
|
||||
|
||||
## Related information
|
||||
|
||||
* [Understand images, containers, and storage drivers](imagesandcontainers.md)
|
||||
* [AUFS storage driver in practice](aufs-driver.md)
|
||||
* [Btrfs storage driver in practice](btrfs-driver.md)
|
||||
* [Device Mapper storage driver in practice](device-mapper-driver.md)
|
||||
296
vendor/github.com/hyperhq/hypercli/docs/userguide/storagedriver/zfs-driver.md
generated
vendored
Normal file
@@ -0,0 +1,296 @@
|
||||
<!--[metadata]>
|
||||
+++
|
||||
title = "ZFS storage in practice"
|
||||
description = "Learn how to optimize your use of ZFS driver."
|
||||
keywords = ["container, storage, driver, ZFS "]
|
||||
[menu.main]
|
||||
parent = "engine_driver"
|
||||
+++
|
||||
<![end-metadata]-->
|
||||
|
||||
# Docker and ZFS in practice
|
||||
|
||||
ZFS is a next generation filesystem that supports many advanced storage
|
||||
technologies such as volume management, snapshots, checksumming, compression,
deduplication, replication, and more.
|
||||
|
||||
It was created by Sun Microsystems (now Oracle Corporation) and is open sourced
|
||||
under the CDDL license. Due to licensing incompatibilities between the CDDL
|
||||
and GPL, ZFS cannot be shipped as part of the mainline Linux kernel. However,
|
||||
the ZFS On Linux (ZoL) project provides an out-of-tree kernel module and
|
||||
userspace tools which can be installed separately.
|
||||
|
||||
The ZFS on Linux (ZoL) port is healthy and maturing. However, at this point in
|
||||
time it is not recommended to use the `zfs` Docker storage driver for
|
||||
production use unless you have substantial experience with ZFS on Linux.
|
||||
|
||||
> **Note:** There is also a FUSE implementation of ZFS on the Linux platform.
|
||||
> This should work with Docker but is not recommended. The native ZFS driver
|
||||
> (ZoL) is better tested, more performant, and more widely used. The remainder
|
||||
> of this document will relate to the native ZoL port.
|
||||
|
||||
|
||||
## Image layering and sharing with ZFS
|
||||
|
||||
The Docker `zfs` storage driver makes extensive use of three ZFS datasets:
|
||||
|
||||
- filesystems
|
||||
- snapshots
|
||||
- clones
|
||||
|
||||
ZFS filesystems are thinly provisioned and have space allocated to them from a
|
||||
ZFS pool (zpool) via allocate-on-demand operations. Snapshots and clones are
|
||||
space-efficient point-in-time copies of ZFS filesystems. Snapshots are
|
||||
read-only. Clones are read-write. Clones can only be created from snapshots.
|
||||
This simple relationship is shown in the diagram below.
|
||||
|
||||

|
||||
|
||||
The solid line in the diagram shows the process flow for creating a clone. Step
|
||||
1 creates a snapshot of the filesystem, and step 2 creates the clone from
|
||||
the snapshot. The dashed line shows the relationship between the clone and the
|
||||
filesystem, via the snapshot. All three ZFS datasets draw space from the same
|
||||
underlying zpool.
|
||||
|
||||
On Docker hosts using the `zfs` storage driver, the base layer of an image is a
|
||||
ZFS filesystem. Each child layer is a ZFS clone based on a ZFS snapshot of the
|
||||
layer below it. A container is a ZFS clone based on a ZFS snapshot of the top
|
||||
layer of the image it's created from. All ZFS datasets draw their space from a
|
||||
common zpool. The diagram below shows how this is put together with a running
|
||||
container based on a two-layer image.
|
||||
|
||||

|
||||
|
||||
The following process explains how images are layered and containers created.
|
||||
The process is based on the diagram above.
|
||||
|
||||
1. The base layer of the image exists on the Docker host as a ZFS filesystem.
|
||||
|
||||
This filesystem consumes space from the zpool used to create the Docker
|
||||
host's local storage area at `/var/lib/docker`.
|
||||
|
||||
2. Additional image layers are clones of the dataset hosting the image layer
|
||||
directly below it.
|
||||
|
||||
In the diagram, "Layer 1" is added by making a ZFS snapshot of the base
|
||||
layer and then creating a clone from that snapshot. The clone is writable and
|
||||
consumes space on-demand from the zpool. The snapshot is read-only, maintaining
|
||||
the base layer as an immutable object.
|
||||
|
||||
3. When the container is launched, a read-write layer is added above the image.
|
||||
|
||||
In the diagram above, the container's read-write layer is created by making
|
||||
a snapshot of the top layer of the image (Layer 1) and creating a clone from
|
||||
that snapshot.
|
||||
|
||||
As changes are made to the container, space is allocated to it from the
|
||||
zpool via allocate-on-demand operations. By default, ZFS will allocate space in
|
||||
blocks of 128K.
|
||||
|
||||
This process of creating child layers and containers from *read-only* snapshots
|
||||
allows images to be maintained as immutable objects.
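
The three steps above map onto ordinary `zfs create`, `zfs snapshot`, and `zfs clone` operations. The sketch below prints the corresponding commands rather than running them, so it is safe to try on any machine. All dataset and snapshot names (`base`, `layer1`, `container`) are hypothetical; Docker chooses its own names and performs the equivalent operations itself, so you would not normally run these by hand.

```shell
#!/bin/sh
# Sketch only: prints the zfs commands that correspond to the steps above
# instead of running them. All dataset and snapshot names are hypothetical.
run() { echo "+ $*"; }

POOL="zpool-docker/docker"   # parent dataset, as used later on this page

# Step 1: the base layer is a plain ZFS filesystem.
run zfs create "$POOL/base"

# Step 2: "Layer 1" is a clone of a read-only snapshot of the base layer.
run zfs snapshot "$POOL/base@layer1"
run zfs clone "$POOL/base@layer1" "$POOL/layer1"

# Step 3: the container's read-write layer is a clone of a snapshot
# of the top image layer.
run zfs snapshot "$POOL/layer1@container"
run zfs clone "$POOL/layer1@container" "$POOL/container"
```

Each clone starts out sharing all of its blocks with the snapshot it was created from, which is why adding a layer or starting a container consumes almost no extra space at first.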
|
||||
|
||||
## Container reads and writes with ZFS
|
||||
|
||||
Container reads with the `zfs` storage driver are very simple. A newly launched
|
||||
container is based on a ZFS clone. This clone initially shares all of its data
|
||||
with the dataset it was created from. This means that read operations with the
|
||||
`zfs` storage driver are fast, even if the data being read has not yet been
copied into the container. This sharing of data blocks is shown in the
|
||||
diagram below.
|
||||
|
||||

|
||||
|
||||
Writing new data to a container is accomplished via an allocate-on-demand
|
||||
operation. Every time a new area of the container needs writing to, a new block
|
||||
is allocated from the zpool. This means that containers consume additional
|
||||
space as new data is written to them. New space is allocated to the container
|
||||
(ZFS clone) from the underlying zpool.
|
||||
|
||||
Updating *existing data* in a container is accomplished by allocating new
|
||||
blocks to the container's clone and storing the changed data in those new
|
||||
blocks. The original blocks are unchanged, allowing the underlying image
|
||||
dataset to remain immutable. This is the same as writing to a normal ZFS
|
||||
filesystem and is an implementation of copy-on-write semantics.
|
||||
|
||||
## Configure Docker with the ZFS storage driver
|
||||
|
||||
The `zfs` storage driver is only supported on a Docker host where
|
||||
`/var/lib/docker` is mounted as a ZFS filesystem. This section shows you how to
|
||||
install and configure native ZFS on Linux (ZoL) on an Ubuntu 14.04 system.
|
||||
|
||||
### Prerequisites
|
||||
|
||||
If you have already used the Docker daemon on your Docker host and have images
|
||||
you want to keep, `push` them to Docker Hub or your private Docker Trusted
|
||||
Registry before attempting this procedure.
|
||||
|
||||
Stop the Docker daemon. Then, ensure that you have a spare block device at
|
||||
`/dev/xvdb`. The device identifier may be different in your environment and
|
||||
you should substitute your own values throughout the procedure.
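
Before continuing, it can help to confirm that the path you intend to use really is a block device. The following is a minimal sketch; the function name `is_block_device` and the `DEVICE` variable are hypothetical, and `/dev/xvdb` is just the example device used on this page. The check does not confirm that the device is unused.

```shell
#!/bin/sh
# Hypothetical pre-flight check: succeed only if the path is a block device.
is_block_device() {
    [ -b "$1" ]
}

DEVICE="${DEVICE:-/dev/xvdb}"   # substitute your own device

if is_block_device "$DEVICE"; then
    echo "$DEVICE is a block device"
else
    echo "$DEVICE is missing or is not a block device" >&2
fi
```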
|
||||
|
||||
### Install ZFS on Ubuntu 14.04 LTS
|
||||
|
||||
1. If it is running, stop the Docker `daemon`.
|
||||
|
||||
1. Install the `software-properties-common` package.
|
||||
|
||||
This is required for the `add-apt-repository` command.
|
||||
|
||||
$ sudo apt-get install software-properties-common
|
||||
Reading package lists... Done
|
||||
Building dependency tree
|
||||
<output truncated>
|
||||
|
||||
2. Add the `zfs-native` package archive.
|
||||
|
||||
$ sudo add-apt-repository ppa:zfs-native/stable
|
||||
The native ZFS filesystem for Linux. Install the ubuntu-zfs package.
|
||||
<output truncated>
|
||||
gpg: key F6B0FC61: public key "Launchpad PPA for Native ZFS for Linux" imported
|
||||
gpg: Total number processed: 1
|
||||
gpg: imported: 1 (RSA: 1)
|
||||
OK
|
||||
|
||||
3. Get the latest package lists for all registered repositories and package
|
||||
archives.
|
||||
|
||||
$ sudo apt-get update
|
||||
Ign http://us-west-2.ec2.archive.ubuntu.com trusty InRelease
|
||||
Get:1 http://us-west-2.ec2.archive.ubuntu.com trusty-updates InRelease [64.4 kB]
|
||||
<output truncated>
|
||||
Fetched 10.3 MB in 4s (2,370 kB/s)
|
||||
Reading package lists... Done
|
||||
|
||||
4. Install the `ubuntu-zfs` package.
|
||||
|
||||
$ sudo apt-get install -y ubuntu-zfs
|
||||
Reading package lists... Done
|
||||
Building dependency tree
|
||||
<output truncated>
|
||||
|
||||
5. Load the `zfs` module.
|
||||
|
||||
$ sudo modprobe zfs
|
||||
|
||||
6. Verify that it loaded correctly.
|
||||
|
||||
$ lsmod | grep zfs
|
||||
zfs 2768247 0
|
||||
zunicode 331170 1 zfs
|
||||
zcommon 55411 1 zfs
|
||||
znvpair 89086 2 zfs,zcommon
|
||||
spl 96378 3 zfs,zcommon,znvpair
|
||||
zavl 15236 1 zfs
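
If you want to script the verification in the last step, matching on the first column of the `lsmod` output is slightly stricter than a plain `grep zfs`, which also matches modules that merely list `zfs` as a dependent. The helper name `module_listed` below is hypothetical, and the sample lines are copied from the output above so that the snippet runs without ZFS installed.

```shell
#!/bin/sh
# Hypothetical helper: read `lsmod`-style text on stdin and succeed
# if a module with the given name is listed in the first column.
module_listed() {
    awk -v mod="$1" '$1 == mod { found = 1 } END { exit !found }'
}

# Sample lines copied from the output above.
sample='zfs 2768247 0
zunicode 331170 1 zfs
spl 96378 3 zfs,zcommon,znvpair'

if printf '%s\n' "$sample" | module_listed zfs; then
    echo "zfs module is loaded"
fi
```

On a real host, the equivalent check would be `lsmod | module_listed zfs`.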
|
||||
|
||||
## Configure ZFS for Docker
|
||||
|
||||
Once ZFS is installed and loaded, you're ready to configure ZFS for Docker.
|
||||
|
||||
|
||||
1. Create a new `zpool`.
|
||||
|
||||
$ sudo zpool create -f zpool-docker /dev/xvdb
|
||||
|
||||
The command creates the `zpool` and gives it the name "zpool-docker". The name is arbitrary.
|
||||
|
||||
2. Check that the `zpool` exists.
|
||||
|
||||
$ sudo zfs list
|
||||
NAME USED AVAIL REFER MOUNTPOINT
|
||||
zpool-docker 55K 3.84G 19K /zpool-docker
|
||||
|
||||
3. Create and mount a new ZFS filesystem to `/var/lib/docker`.
|
||||
|
||||
$ sudo zfs create -o mountpoint=/var/lib/docker zpool-docker/docker
|
||||
|
||||
4. Check that the previous step worked.
|
||||
|
||||
$ sudo zfs list -t all
|
||||
NAME USED AVAIL REFER MOUNTPOINT
|
||||
zpool-docker 93.5K 3.84G 19K /zpool-docker
|
||||
zpool-docker/docker 19K 3.84G 19K /var/lib/docker
|
||||
|
||||
Now that you have a ZFS filesystem mounted to `/var/lib/docker`, the daemon
|
||||
should automatically load with the `zfs` storage driver.
|
||||
|
||||
5. Start the Docker daemon.
|
||||
|
||||
$ sudo service docker start
|
||||
docker start/running, process 2315
|
||||
|
||||
The procedure for starting the Docker daemon may differ depending on the
|
||||
Linux distribution you are using. It is possible to force the Docker daemon
|
||||
to start with the `zfs` storage driver by passing the
|
||||
`--storage-driver=zfs` flag to the `docker daemon` command, or to the
|
||||
`DOCKER_OPTS` line in the Docker config file.
|
||||
|
||||
6. Verify that the daemon is using the `zfs` storage driver.
|
||||
|
||||
$ sudo docker info
|
||||
Containers: 0
|
||||
Images: 0
|
||||
Storage Driver: zfs
|
||||
Zpool: zpool-docker
|
||||
Zpool Health: ONLINE
|
||||
Parent Dataset: zpool-docker/docker
|
||||
Space Used By Parent: 27648
|
||||
Space Available: 4128139776
|
||||
Parent Quota: no
|
||||
Compression: off
|
||||
Execution Driver: native-0.2
|
||||
[...]
|
||||
|
||||
The output of the command above shows that the Docker daemon is using the
|
||||
`zfs` storage driver and that the parent dataset is the
|
||||
`zpool-docker/docker` filesystem created earlier.
|
||||
|
||||
Your Docker host is now using ZFS to store and manage its images and containers.
|
||||
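
The check in step 4 can also be scripted. The sketch below looks up which dataset is mounted at `/var/lib/docker` in `zfs list`-style output. The function name `dataset_at_mountpoint` is hypothetical, and the sample text is copied from step 4 so that the snippet runs without ZFS installed.

```shell
#!/bin/sh
# Hypothetical helper: read `zfs list`-style text on stdin and print the
# name of the dataset mounted at the given mountpoint (last column).
dataset_at_mountpoint() {
    awk -v mp="$1" 'NR > 1 && $NF == mp { print $1; exit }'
}

# Sample text copied from step 4 above.
sample='NAME USED AVAIL REFER MOUNTPOINT
zpool-docker 93.5K 3.84G 19K /zpool-docker
zpool-docker/docker 19K 3.84G 19K /var/lib/docker'

dataset=$(printf '%s\n' "$sample" | dataset_at_mountpoint /var/lib/docker)
echo "dataset backing /var/lib/docker: $dataset"
```

The dataset name printed here should match the `Parent Dataset` value that `docker info` reports in step 6.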
|
||||
## ZFS and Docker performance
|
||||
|
||||
There are several factors that influence the performance of Docker using the
|
||||
`zfs` storage driver.
|
||||
|
||||
- **Memory**. Memory has a major impact on ZFS performance. This goes back to
|
||||
the fact that ZFS was originally designed for use on big Sun Solaris servers
|
||||
with large amounts of memory. Keep this in mind when sizing your Docker hosts.
|
||||
|
||||
- **ZFS Features**. Using ZFS features, such as deduplication, can
|
||||
significantly increase the amount of memory ZFS uses. For memory consumption
|
||||
and performance reasons it is recommended to turn off ZFS deduplication.
|
||||
However, deduplication at other layers in the stack (such as SAN or NAS arrays)
|
||||
can still be used as these do not impact ZFS memory usage and performance. If
|
||||
using SAN, NAS or other hardware RAID technologies you should continue to
|
||||
follow existing best practices for using them with ZFS.
|
||||
|
||||
- **ZFS Caching**. ZFS caches disk blocks in a memory structure called the
|
||||
adaptive replacement cache (ARC). The *Single Copy ARC* feature of ZFS allows a
|
||||
single cached copy of a block to be shared by multiple clones of a filesystem.
|
||||
This means that multiple running containers can share a single copy of a cached
|
||||
block. This makes ZFS a good option for PaaS and other high-density use
|
||||
cases.
|
||||
|
||||
- **Fragmentation**. Fragmentation is a natural byproduct of copy-on-write
|
||||
filesystems like ZFS. However, ZFS writes in 128K blocks and allocates *slabs*
|
||||
(multiple 128K blocks) to CoW operations in an attempt to reduce fragmentation.
|
||||
The ZFS intent log (ZIL) and the coalescing of writes (delayed writes) also
|
||||
help to reduce fragmentation.
|
||||
|
||||
- **Use the native ZFS driver for Linux**. Although the Docker `zfs` storage
|
||||
driver supports the ZFS FUSE implementation, it is not recommended when high
|
||||
performance is required. The native ZFS on Linux driver tends to perform better
|
||||
than the FUSE implementation.
|
||||
|
||||
The following generic performance best practices also apply to ZFS.
|
||||
|
||||
- **Use of SSD**. For best performance it is always a good idea to use fast
|
||||
storage media such as solid state devices (SSD). However, if you only have a
|
||||
limited amount of SSD storage available it is recommended to place the ZIL on
|
||||
SSD.
|
||||
|
||||
- **Use Data Volumes**. Data volumes provide the best and most predictable
|
||||
performance. This is because they bypass the storage driver and do not incur
|
||||
any of the potential overheads introduced by thin provisioning and
|
||||
copy-on-write. For this reason, you should place heavy write workloads on data
|
||||
volumes.
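
Two of the recommendations above, turning off deduplication and placing the ZIL on SSD, correspond to standard `zfs` and `zpool` commands. The sketch below prints those commands rather than running them. The pool and dataset names are the ones used earlier on this page, while the log device `/dev/xvdc` is a hypothetical SSD that you would replace with your own.

```shell
#!/bin/sh
# Sketch only: prints the commands instead of running them.
# The log device name below is hypothetical.
run() { echo "+ $*"; }

POOL="zpool-docker"
DATASET="$POOL/docker"
LOG_DEVICE="/dev/xvdc"   # hypothetical SSD reserved for the ZIL

# Inspect, then disable, deduplication on the dataset that backs Docker.
run zfs get dedup "$DATASET"
run zfs set dedup=off "$DATASET"

# Place the ZFS intent log on a separate (SSD) log device.
run zpool add "$POOL" log "$LOG_DEVICE"
```

ZFS deduplication is off by default, so the `zfs set` line only matters if it was enabled earlier.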
|
||||