How is Docker different from a virtual machine?

docker containers virtual-machine virtualization


I keep rereading the Docker documentation to try to understand the difference between Docker and a full VM. How does it manage to provide a full filesystem, isolated networking environment, etc. without being as heavy?

Why is deploying software to a Docker image (if that's the right term) easier than simply deploying to a consistent production environment?




Answer 1 Ken Cochrane


Docker originally used LinuX Containers (LXC), but later switched to runC (formerly known as libcontainer), which runs in the same operating system as its host. This allows it to share a lot of the host operating system resources. Also, it uses a layered filesystem (AuFS) and manages networking.

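AuFS is a layered file system, so you can have a read-only part and a write part which are merged together. One could have the common parts of the operating system as read-only (and shared amongst all of your containers) and then give each container its own mount for writing.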
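To make the read-only/writable split concrete, here is a minimal sketch in Python. It is only a toy model of a union mount, not how AuFS is implemented: the file paths and contents are made up, and `ChainMap` stands in for the merge step.

```python
from collections import ChainMap

# Read-only base layer shared by every container (path -> contents).
# These paths and contents are hypothetical.
base_layer = {
    "/bin/sh": "shell binary",
    "/etc/os-release": "ExampleOS 1.0",
}

def start_container(base):
    """Return (merged view, private writable layer) for a new container."""
    writable = {}  # each container gets its own empty upper layer
    # ChainMap looks keys up left to right and sends every write to the
    # first mapping, loosely mirroring a writable layer over a read-only one.
    return ChainMap(writable, base), writable

view_a, upper_a = start_container(base_layer)
view_b, upper_b = start_container(base_layer)

view_a["/etc/os-release"] = "patched by A"  # lands only in A's writable layer
view_b["/tmp/scratch"] = "B's temp file"

print(view_a["/etc/os-release"])      # patched by A
print(view_b["/etc/os-release"])      # ExampleOS 1.0
print(base_layer["/etc/os-release"])  # ExampleOS 1.0
```

Both containers read the same base dictionary, and each one's changes stay in its own upper layer. File deletion (which a real union filesystem has to track separately) is left out of this sketch.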

So, let's say you have a 1 GB container image; if you wanted to use a full VM, you would need to have 1 GB x number of VMs you want. With Docker and AuFS you can share the bulk of the 1 GB between all the containers and if you have 1000 containers you still might only have a little over 1 GB of space for the containers OS (assuming they are all running the same OS image).
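The arithmetic behind that claim can be sketched as follows. The 64 KiB of private data per container is an assumed figure chosen purely for illustration; real writable layers vary a lot.

```python
KIB_PER_GIB = 1024 * 1024

def full_vm_storage_kib(image_kib, count):
    """Every VM carries its own complete copy of the image."""
    return image_kib * count

def layered_storage_kib(image_kib, count, writable_kib_each):
    """One shared read-only image plus a small private layer per container."""
    return image_kib + count * writable_kib_each

image = 1 * KIB_PER_GIB  # the 1 GB image from the example above
vms = full_vm_storage_kib(image, 1000)
# 64 KiB per container is an assumption, not a measured value.
containers = layered_storage_kib(image, 1000, writable_kib_each=64)

print(f"1000 VMs:        {vms / KIB_PER_GIB:.2f} GiB")
print(f"1000 containers: {containers / KIB_PER_GIB:.2f} GiB")
```

The VM total grows with the image size times the count, while the container total grows only with the per-container writable data.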

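A full virtualized system gets its own set of resources allocated to it and does minimal sharing. You get more isolation, but it is much heavier (requires more resources). With Docker you get less isolation, but the containers are lightweight (require fewer resources). So you could easily run thousands of containers on a host, and it won't even blink. Try doing that with Xen, and unless you have a really big host, I don't think it is possible.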

A full virtualized system usually takes minutes to start, whereas Docker/LXC/runC containers take seconds, and often even less than a second.

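Each type of virtualized system has its pros and cons. If you want full isolation with guaranteed resources, a full VM is the way to go. If you just want to isolate processes from each other and want to run a ton of them on a reasonably sized host, then Docker/LXC/runC seems to be the way to go.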

For more information, check out this set of blog posts which do a good job of explaining how LXC works.

Why is deploying software to a Docker image (if that's the right term) easier than simply deploying to a consistent production environment?

Deploying a consistent production environment is easier said than done. Even if you use tools like Chef and Puppet, there are always OS updates and other things that change between hosts and environments.

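Docker gives you the ability to snapshot the OS into a shared image, and makes it easy to deploy on other Docker hosts. Locally, dev, qa, prod, etc.: all the same image. Sure, you can do this with other tools, but not nearly as easily or as fast.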

This is great for testing; let's say you have thousands of tests that need to connect to a database, and each test needs a pristine copy of the database and will make changes to the data. The classic approach to this is to reset the database after every test either with custom code or with tools like Flyway - this can be very time-consuming and means that tests must be run serially. However, with Docker you could create an image of your database and run up one instance per test, and then run all the tests in parallel since you know they will all be running against the same snapshot of the database. Since the tests are running in parallel and in Docker containers they could run all on the same box at the same time and should finish much faster. Try doing that with a full VM.
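The idea of "one pristine copy per test, all tests in parallel" can be sketched without Docker at all. In the example below an in-memory SQLite database stands in for a database container, and `SEED_SQL` plays the role of the image; both are stand-ins chosen so the sketch is self-contained, not what a real Docker-based setup would use.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor

# The "image": a script that always produces the same pristine database.
SEED_SQL = """
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
INSERT INTO users (name) VALUES ('alice'), ('bob');
"""

def fresh_database():
    """Create a private, pristine database, like one container per test."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SEED_SQL)
    return conn

def run_test(new_name):
    """A test that mutates its own copy and never sees other tests' changes."""
    conn = fresh_database()
    try:
        before = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
        conn.execute("INSERT INTO users (name) VALUES (?)", (new_name,))
        after = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
        return before, after
    finally:
        conn.close()

# No reset step between tests, so they can all run concurrently.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(run_test, [f"user{i}" for i in range(20)]))

print(all(result == (2, 3) for result in results))
```

Every test starts from two rows and ends with three, regardless of ordering, which is the property that lets you drop the serial reset-after-each-test step.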

From the comments...

Interesting! I suppose I'm still confused by the notion of "snapshotting the OS". How does one do that without, well, making an image of the OS?

Well, let's see if I can explain. You start with a base image, and then make your changes, and commit those changes using docker, and it creates an image. This image contains only the differences from the base. When you want to run your image, you also need the base, and it layers your image on top of the base using a layered file system: as mentioned above, Docker uses AuFS. AuFS merges the different layers together and you get what you want; you just need to run it. You can keep adding more and more images (layers) and it will continue to only save the diffs. Since Docker typically builds on top of ready-made images from a registry, you rarely have to "snapshot" the whole OS yourself.
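A rough sketch of "commit only the differences, then layer them back on the base" might look like this. The dictionaries and file names are invented for illustration, and the `commit` and `run` helpers are hypothetical names, not Docker APIs.

```python
def commit(base, current):
    """Return only what changed relative to the base (the new layer)."""
    return {path: data for path, data in current.items() if base.get(path) != data}

def run(*layers):
    """Merge layers bottom-up; later layers override earlier ones."""
    merged = {}
    for layer in layers:
        merged.update(layer)
    return merged

# Hypothetical base image contents.
base = {"/etc/os-release": "ExampleOS 1.0", "/bin/sh": "shell"}

# Make changes on top of the base...
working = dict(base)
working["/app/server.py"] = "print('hello')"
working["/etc/os-release"] = "ExampleOS 1.0 (patched)"

# ...and commit them: the layer holds two entries, not the whole filesystem.
layer = commit(base, working)
print(sorted(layer))
print(run(base, layer) == working)
```

The committed layer is useless on its own, which matches the point above that you need the base as well when you run the image. Deleted files are not handled in this sketch.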




Answer 2 manu97


Good answers. Just to get a visual representation of containers versus VMs, have a look at the one below.

[Image: diagram comparing containers and virtual machines]

Source




Answer 3 Shital Shah


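It might be helpful to understand how virtualization and containers work at a low level. That will clear up a lot of things.

Note: I'm simplifying a bit in the description below. See the references for more information.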

How does virtualization work at a low level?

In this case the VM manager takes over CPU ring 0 (or "root mode" in newer CPUs) and intercepts all privileged calls made by the guest OS to create the illusion that the guest OS has its own hardware. Fun fact: before 1998 it was thought to be impossible to achieve this on the x86 architecture because there was no way to do this kind of interception. The folks at VMware were the first to have the idea of rewriting the executable bytes in memory for the guest OS's privileged calls to achieve this.

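The net effect is that virtualization allows you to run two completely different OSes on the same hardware. Each guest OS goes through the whole process of bootstrapping, loading the kernel, and so on. You can have very tight security; for example, a guest OS can't get full access to the host OS or other guests and mess things up.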
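As a very rough illustration of the interception idea, here is a toy model in Python. The instruction names (`ADD`, `OUT`) and the `Hypervisor` class are invented for this sketch; real hypervisors rely on CPU support or on the binary rewriting mentioned above, not on anything like this loop.

```python
PRIVILEGED = {"OUT"}  # hypothetical privileged instruction set

class Hypervisor:
    """Owns the real device and gives each guest a private virtual one."""

    def __init__(self):
        self.virtual_consoles = {}  # guest name -> values "written to hardware"

    def trap(self, guest, op, arg):
        # The privileged instruction never reaches real hardware; its effect
        # is emulated on that guest's own virtual device.
        if op == "OUT":
            self.virtual_consoles.setdefault(guest, []).append(arg)
        else:
            raise ValueError(f"unsupported privileged op: {op}")

def run_guest(name, program, hypervisor):
    """Run a guest program, trapping privileged instructions."""
    acc = 0
    for op, arg in program:
        if op in PRIVILEGED:
            hypervisor.trap(name, op, arg)  # intercepted by the VM manager
        elif op == "ADD":
            acc += arg                      # unprivileged: runs directly
        else:
            raise ValueError(f"unknown op: {op}")
    return acc

hv = Hypervisor()
result_a = run_guest("guest-a", [("ADD", 2), ("OUT", 65), ("ADD", 3)], hv)
result_b = run_guest("guest-b", [("OUT", 66)], hv)

print(result_a)             # 5
print(hv.virtual_consoles)  # {'guest-a': [65], 'guest-b': [66]}
```

Each guest believes it wrote to its own console, and neither can see the other's output.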

How do containers work at a low level?

Around 2006, people including some Google employees implemented a new kernel-level feature called namespaces (though the idea had existed long before in FreeBSD). One function of the OS is to allow global resources like network and disk to be shared among processes. What if these global resources were wrapped in namespaces so that they are visible only to the processes running in the same namespace? Say you take a chunk of disk and put it in namespace X; processes running in namespace Y can't see or access it. Similarly, processes in namespace X can't access anything in memory that is allocated to namespace Y. Of course, processes in X can't see or talk to processes in namespace Y. This provides a kind of virtualization and isolation for global resources. This is how Docker works: each container runs in its own namespace but uses exactly the same kernel as all other containers. The isolation happens because the kernel knows the namespace that was assigned to the process, and during API calls it makes sure that the process can only access resources in its own namespace.
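The namespace check described above can be modelled in a few lines. This is a toy model only: `ToyKernel`, the pids and the resource names are all made up, and the real kernel keeps separate namespace types (mount, PID, network, and so on) rather than one label per process.

```python
class ToyKernel:
    """Toy model: the kernel tags processes and resources with a namespace."""

    def __init__(self):
        self.process_ns = {}   # pid -> namespace name
        self.resource_ns = {}  # resource name -> namespace name

    def spawn(self, pid, namespace):
        self.process_ns[pid] = namespace

    def add_resource(self, name, namespace):
        self.resource_ns[name] = namespace

    def visible_resources(self, pid):
        """List only the resources in the caller's own namespace."""
        ns = self.process_ns[pid]
        return sorted(r for r, owner in self.resource_ns.items() if owner == ns)

    def open(self, pid, name):
        """Every call is checked against the caller's namespace."""
        if self.resource_ns.get(name) != self.process_ns[pid]:
            # From inside the namespace the resource simply does not exist.
            raise FileNotFoundError(f"{name} does not exist in this namespace")
        return f"pid {pid} opened {name}"

kernel = ToyKernel()
kernel.spawn(100, "X")
kernel.spawn(200, "Y")
kernel.add_resource("disk-x", "X")
kernel.add_resource("disk-y", "Y")

print(kernel.visible_resources(100))  # ['disk-x']
print(kernel.open(200, "disk-y"))     # pid 200 opened disk-y
```

There is still exactly one kernel object here, shared by both "containers"; the isolation comes purely from the lookup it performs on every call.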

The limitations of containers versus VMs should be obvious now: you can't run a completely different OS in a container the way you can in a VM. However, you can run different Linux distros because they share the same kernel. The isolation level is not as strong as in a VM. In fact, there was a way for a "guest" container to take over the host in early implementations. You can also see that when you load a new container, an entire new copy of the OS doesn't start as it does in a VM. All containers share the same kernel. This is why containers are lightweight. Also, unlike with a VM, you don't have to pre-allocate a significant chunk of memory to containers because we are not running a new copy of the OS. This makes it possible to run thousands of containers on one OS while sandboxing them, which might not be possible if each one ran a separate copy of the OS in its own VM.




Answer 4 aholbreich


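I like Ken Cochrane's answer.

But I want to add an additional point of view that is not covered in detail here. In my opinion Docker also differs in the whole process. In contrast to VMs, Docker is not (only) about optimal resource sharing of hardware; more than that, it provides a "system" for packaging applications (preferably, but not necessarily, as a set of microservices).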

To me it fits in the gap between developer-oriented tools like rpm, Debian packages, Maven, npm + Git on one side and ops tools like Puppet, VMware, Xen, you name it...

Why is deploying software to a Docker image (if that's the right term) easier than simply deploying to a consistent production environment?

Your question assumes some consistent production environment. But how to keep it consistent? Consider some amount (>10) of servers and applications, stages in the pipeline.

To keep this in sync you'll start to use something like Puppet, Chef or your own provisioning scripts, unpublished rules and/or lot of documentation... In theory servers can run indefinitely, and be kept completely consistent and up to date. Practice fails to manage a server's configuration completely, so there is considerable scope for configuration drift, and unexpected changes to running servers.

So there is a known pattern to avoid this, the so-called immutable server. But the immutable server pattern was not loved, mostly because of the limitations of the VMs that were used before Docker. Dealing with images several gigabytes in size, and moving those big images around just to change a few fields in the application, was very laborious. Understandable...

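With the Docker ecosystem, you will never need to move gigabytes around for "small changes" (thanks to aufs and the registry), and you don't need to worry about losing performance by packaging applications into a Docker container at runtime. You don't need to worry about versions of that image.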
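Why a small change does not mean moving gigabytes can be sketched like this: if layers are identified by a content digest, only layers the registry has not seen need to be uploaded. The `push` helper and the in-memory `registry` dictionary are hypothetical stand-ins for illustration, not the actual registry protocol.

```python
import hashlib

def digest(layer_bytes):
    """Identify a layer by the SHA-256 of its content."""
    return "sha256:" + hashlib.sha256(layer_bytes).hexdigest()

def push(image_layers, registry):
    """Upload only layers whose digest the registry lacks; return bytes sent."""
    sent = 0
    for blob in image_layers:
        key = digest(blob)
        if key not in registry:
            registry[key] = blob
            sent += len(blob)
    return sent

base = b"B" * 1_000_000              # stand-in for a large OS layer
app_v1 = b"app v1"                   # stand-in for the application layer
app_v2 = b"app v2 with a small fix"  # the "small change"

registry = {}
first = push([base, app_v1], registry)   # base and app both uploaded
second = push([base, app_v2], registry)  # base already present, skipped

print(first, second)
```

The second push sends only the few bytes of the changed application layer, because the base layer's digest is already known to the registry.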

Finally, you will often even be able to reproduce complex production environments on your Linux laptop (don't call me if it doesn't work in your case ;)).

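And of course you can start Docker containers in VMs (it's a good idea). Reduce your server provisioning at the VM level. Everything above that can be managed by Docker.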

P.S. Meanwhile Docker uses its own implementation "libcontainer" instead of LXC. But LXC is still usable.




Answer 5 Ashish Bista


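Docker isn't a virtualization methodology. It relies on other tools that actually implement container-based virtualization, or operating system-level virtualization. For that, Docker initially used the LXC driver, then moved to libcontainer, which has since been renamed runc. Docker primarily focuses on automating the deployment of applications inside application containers. Application containers are designed to package and run a single service, whereas system containers are designed to run multiple processes, like virtual machines. So Docker is considered a container management or application deployment tool on containerized systems.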

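To see how it differs from other kinds of virtualization, let's first go through virtualization and its types. Then it will be easier to understand what the difference is.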

Virtualization

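In its conceived form, it was considered a method of logically dividing mainframes to allow multiple applications to run simultaneously. However, the scenario changed drastically when companies and open source communities were able to provide a way of handling privileged instructions and allow multiple operating systems to run simultaneously on a single x86-based system.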

Hypervisor

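The hypervisor is responsible for creating the virtual environment on which the guest virtual machines run. It supervises the guest systems and makes sure that resources are allocated to the guests as necessary. The hypervisor sits between the physical machine and the virtual machines and provides virtualization services to the virtual machines. To do so, it intercepts the guest operating system's operations on the virtual machines and emulates them on the host machine's operating system.

The rapid development of virtualization technologies, primarily in the cloud, has driven the use of virtualization further: hypervisors such as Xen, VMware Player, and KVM allow multiple virtual servers to be created on a single physical server, and commodity processors have incorporated hardware support such as Intel VT and AMD-V.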

Types of Virtualization

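Virtualization methods can be categorized by how they mimic hardware to a guest operating system and emulate a guest operating environment. There are three main types of virtualization: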

  • Emulation
  • Paravirtualization
  • Container-based virtualization

Emulation

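Emulation, also known as full virtualization, runs the virtual machine's OS kernel entirely in software. The hypervisor used in this type is known as a Type 2 hypervisor. It is installed on top of the host operating system and is responsible for translating guest OS kernel code into software instructions. The translation is done entirely in software and requires no hardware involvement. Emulation makes it possible to run any unmodified operating system that supports the environment being emulated. The downside of this type of virtualization is the additional system resource overhead, which leads to decreased performance compared to other types of virtualization.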

Examples in this category include VMware Player, VirtualBox, QEMU, Bochs, Parallels, etc.

Paravirtualization

Paravirtualization, also known as Type 1 hypervisor, runs directly on the hardware, or “bare-metal”, and provides virtualization services directly to the virtual machines running on it. It helps the operating system, the virtualized hardware, and the real hardware to collaborate to achieve optimal performance. These hypervisors typically have a rather small footprint and do not, themselves, require extensive resources.

Examples in this category include Xen, KVM, etc.

Container-based Virtualization

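Container-based virtualization, also known as operating system-level virtualization, enables multiple isolated executions within a single operating system kernel. It has the best possible performance and density and features dynamic resource management. The isolated virtual execution environment provided by this type of virtualization is called a container and can be viewed as a traced group of processes.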

The concept of a container is made possible by the namespaces feature added in Linux kernel version 2.6.24. The container adds its ID to every process and adds new access control checks to every system call. Namespaces are accessed through the clone() system call, which allows creating separate instances of previously global namespaces.

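Namespaces can be used in many different ways, but the most common approach is to create an isolated container that has no visibility of or access to objects outside the container. Processes running inside the container appear to be running on a normal Linux system, although they share the underlying kernel with processes located in other namespaces; the same holds for other kinds of objects. For instance, when using namespaces, the root user inside the container is not treated as root outside the container, which adds extra security.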

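The Linux Control Groups (cgroups) subsystem, the next major component enabling container-based virtualization, is used to group processes and manage their aggregate resource consumption. It is commonly used to limit the memory and CPU consumption of containers. Since a containerized Linux system has only one kernel, and the kernel has full visibility into the containers, there is only one level of resource allocation and scheduling.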
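To give a feel for what those limits look like, here is a small sketch that parses the text format of two cgroup v2 control files, `memory.max` and `cpu.max`. It assumes cgroup v2 (cgroup v1 uses different file names), and the sample strings are hypothetical values rather than output read from a real system; on a cgroup v2 host these files usually live under `/sys/fs/cgroup`, but that path is an assumption too.

```python
def parse_memory_max(text):
    """Return the memory limit in bytes, or None when unlimited ("max")."""
    value = text.strip()
    return None if value == "max" else int(value)

def parse_cpu_max(text):
    """Return the CPU limit as a number of CPUs, or None when unlimited.

    cpu.max holds "<quota> <period>" in microseconds; the group may use
    <quota> of CPU time per <period>, so quota / period is the CPU share.
    """
    quota, period = text.split()
    return None if quota == "max" else int(quota) / int(period)

# Hypothetical file contents for a container limited to 512 MiB and half a CPU.
print(parse_memory_max("536870912\n"))  # 536870912
print(parse_cpu_max("50000 100000\n"))  # 0.5

# Hypothetical file contents for an unrestricted group.
print(parse_memory_max("max\n"))        # None
print(parse_cpu_max("max 100000\n"))    # None
```

Because the same kernel enforces these numbers for every group, there is no second scheduler or memory manager inside the container, which is the "only one level of resource allocation" point made above.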

Several management tools are available for Linux containers, including LXC, LXD, systemd-nspawn, lmctfy, Warden, Linux-VServer, OpenVZ, Docker, etc.

Containers vs Virtual Machines

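Unlike a virtual machine, a container does not need to boot an operating system kernel, so containers can be created in less than a second. This feature makes container-based virtualization unique and more desirable than other virtualization approaches.

Since container-based virtualization adds little or no overhead to the host machine, it has near-native performance.

For container-based virtualization, no additional software is required, unlike other kinds of virtualization.

All containers on a host machine share the host's scheduler, which saves the need for extra resources.

Container states (Docker or LXC images) are small compared to virtual machine images, so container images are easy to distribute.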

Resource management in containers is achieved through cgroups. Cgroups do not allow containers to consume more resources than are allocated to them. However, as of now, all resources of the host machine are visible inside containers, even though they can't all be used. You can see this by running top or htop in a container and on the host machine at the same time: the output across all environments will look similar.

Update:

How does Docker run containers in non-Linux systems?

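If containers are possible because of features available in the Linux kernel, then the obvious question is how non-Linux systems run containers. Both Docker for Mac and Docker for Windows use Linux VMs to run the containers. Docker Toolbox used to run containers in VirtualBox VMs. But the latest Docker uses Hyper-V on Windows and Hypervisor.framework on Mac.

Now, let me describe in detail how Docker for Mac runs containers.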

Docker for Mac uses https://github.com/moby/hyperkit to emulate the hypervisor capabilities and Hyperkit uses hypervisor.framework in its core. Hypervisor.framework is Mac's native hypervisor solution. Hyperkit also uses VPNKit and DataKit to namespace network and filesystem respectively.

The Linux VM that Docker runs on Mac is read-only. However, you can attach to it by running:

screen ~/Library/Containers/com.docker.docker/Data/vms/0/tty

Now, we can even check the kernel version of this VM:

# uname -a
Linux linuxkit-025000000001 4.9.93-linuxkit-aufs #1 SMP Wed Jun 6 16:86_64 Linux

All containers run inside this VM.

There are some limitations to hypervisor.framework. Because of that Docker doesn't expose docker0 network interface in Mac. So, you can't access containers from the host. As of now, docker0 is only available inside the VM.

Hyper-V is the native hypervisor in Windows. They are also trying to leverage Windows 10's capabilities to run Linux systems natively.