Windows Containers: Blog Series
Docker Swarm Networking

Docker Swarm on Windows

Docker Swarm enables containers to be managed across different hosts. It works on Windows Server 2016 hosts, but the built-in routing mesh is not supported until the newest Windows Server release, version 1709, released in October 2017.

Docker Swarm is the tool for managing containers across separate docker machines. It defines machines as managers or workers. They communicate with each other to implement docker services. A service is a collection of containers running with the same configuration, and following a set of rules to define the service.

Just to complete the picture, Docker Compose is the tool that creates an application from a set of services. The Containers feature in Windows Server 2016 by default includes Docker Swarm but not Docker Compose.

To set up the Swarm cluster we need more than one machine, obviously. Azure Container Service (ACS) does not currently include Windows hosts, although it is changing so fast that this may soon be out of date. Instead we can create a cluster of Windows hosts using the Azure virtual machine scale set with Windows Server 2016 Datacenter - with Containers.

We need to open ports on the Windows firewall on each host to allow communication between the docker machines:

  • TCP port 2377 is for Docker communication between manager and worker.
  • TCP and UDP port 7946 is for the "control plane" communication between hosts (worker to worker). This traffic synchronises the state of a service between hosts.
  • UDP port 4789 is for the "data plane" VXLAN encapsulated traffic between applications in containers.
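As a rough sketch, the port list above can be turned into inbound firewall rules. The Python below only builds netsh advfirewall command strings and prints them; it does not change the firewall. The rule names are my own invention, and the commands would need to be run from an elevated prompt on each host.

```python
# Ports the swarm needs, taken from the list above: (port, protocol, purpose).
SWARM_PORTS = [
    (2377, "TCP", "management"),
    (7946, "TCP", "control plane"),
    (7946, "UDP", "control plane"),
    (4789, "UDP", "data plane"),
]


def firewall_commands(ports=SWARM_PORTS):
    """Return one netsh command per inbound rule. Nothing is executed."""
    commands = []
    for port, protocol, purpose in ports:
        # The rule name is a made-up label, chosen only to be readable.
        name = f"Docker Swarm {purpose} {protocol} {port}"
        commands.append(
            f'netsh advfirewall firewall add rule name="{name}" '
            f"dir=in action=allow protocol={protocol} localport={port}"
        )
    return commands


if __name__ == "__main__":
    for command in firewall_commands():
        print(command)
```

Printing the commands first makes it easy to review them before pasting them into each host.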

To create the swarm, run:

docker swarm init --advertise-addr [IP address of manager]

The default is to listen on all addresses on port 2377 (0.0.0.0:2377), so there is no need to specify it. The command output includes the token that workers use to join the swarm.

To join a host as a worker, run:

docker swarm join --token [the token number returned when creating the swarm] [the listening address of the manager]
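To make the placeholders concrete, here is a small sketch that assembles the two commands above. The IP address and token are invented values for illustration only; the real token is whatever docker swarm init returns on your manager.

```python
SWARM_PORT = 2377  # default management port, as noted above


def init_args(manager_ip):
    """Arguments for creating the swarm on the manager."""
    return ["docker", "swarm", "init", "--advertise-addr", manager_ip]


def join_args(token, manager_ip, port=SWARM_PORT):
    """Arguments for joining a host to the swarm as a worker."""
    return ["docker", "swarm", "join", "--token", token, f"{manager_ip}:{port}"]


if __name__ == "__main__":
    manager_ip = "10.0.0.4"     # hypothetical manager address
    token = "SWMTKN-1-example"  # placeholder, not a real token
    print(" ".join(init_args(manager_ip)))
    print(" ".join(join_args(token, manager_ip)))
```

If the token is lost, docker swarm join-token worker on a manager prints the join command again.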

We can add or remove nodes later, as workers or managers. The documentation for setting up and managing the swarm is here: Docker Swarm.

If we want to use a GUI to see what is going on, we can use Portainer. I have described setting it up here: Windows Containers: Portainer GUI. This is what we see in the dashboard after creating the swarm:

Docker Swarm Portainer Dashboard

In the Swarm section, we can see an overview of the cluster:

Docker Swarm Portainer Swarm Cluster

And the default overlay network:

Docker Swarm Portainer Swarm Network

Before we create a service, we need to decide how external clients will connect to containers, and how containers will connect to each other. The default network type in Docker on Windows is nat. A port on the host is translated to a port on the container so, for example, we use --publish 80:80. But this limits us to one container per host on that port. If we do not define the host port (by using --publish 80), then one is allocated dynamically on the host, and so we can have more than one container listening on the same container port. But then the client does not know which port on the host to connect to. We would need to discover the dynamic ports and put them into an external load balancer. In the case of a docker service, we would need to do this whenever a replica is created or removed.
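The trade-off between a fixed and a dynamic host port can be shown with a toy model. This is only an illustration of the reasoning, not how Docker or Windows NAT is implemented; the class name and the starting value for dynamic ports are assumptions of mine.

```python
import itertools


class NatHost:
    """Toy model of the port table on one host (not Docker's implementation)."""

    def __init__(self, dynamic_start=49152):  # assumed start of the dynamic range
        self.bindings = {}  # host port -> (container name, container port)
        self._dynamic = itertools.count(dynamic_start)

    def publish(self, container, target, published=None):
        """Map a host port to a container port and return the host port."""
        if published is None:
            # No host port given: take the next free dynamic port.
            published = next(p for p in self._dynamic if p not in self.bindings)
        elif published in self.bindings:
            # A fixed host port can only be bound once per host.
            raise ValueError(f"host port {published} is already in use")
        self.bindings[published] = (container, target)
        return published


if __name__ == "__main__":
    host = NatHost()
    print(host.publish("web1", 80, published=80))  # like --publish 80:80
    print(host.publish("web2", 80))                # like --publish 80
    print(host.publish("web3", 80))                # a second dynamic port
```

A second publish of fixed port 80 on the same host raises an error, while the dynamic form succeeds every time but leaves the client needing to find out which port it was given.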

Alternatively we can set up a transparent network, where the container has an externally reachable IP address. This way we can have more than one container listening on the same port. But we would still need to manage the addresses in a load balancer whenever a replica is created or removed.

This is a general problem with service scaling across hosts. The Docker solution is to use an Overlay network for swarm traffic. Connections from external clients arriving at any host are routed to any replica in the service (a "routing mesh"). Connections from one container to another are on a private subnet shared across containers in the swarm, rather than on the subnet shared with the host. 

Windows Server before version 1709 supports the overlay network for communication between containers, but not the routing mesh for communication between external clients and containers. This leads to some confusing documentation.

For version 1709 and later, the command to create a service using the overlay network and routing mesh is built from these parts:

  • docker service create to create a new service
  • --name to give the service a friendly name
  • --replicas to specify the number of replicas to run at any one time
  • --publish if any ports are to be published externally
  • [image name] for the name of the image to run.

We can include other options, both for the configuration of the service, and the configuration of the containers. The full command for an IIS web server would be:

docker service create --name web --replicas 2 --publish 80:80 microsoft/iis

By default the containers are attached to the swarm overlay network (called "ingress"). The publishing mode is also "ingress". Any client connection to any host on port 80 is routed in a round robin to one of the containers on any host participating in the service. The containers can reach each other on their internal network on any port.
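The round-robin behaviour described above can be sketched as a toy model. This is my own simplification to show the idea that the entry host does not decide which replica answers; it is not how the routing mesh is implemented, and the host and container names are invented.

```python
import itertools


class RoutingMesh:
    """Toy model: any host accepts a connection, replicas take turns."""

    def __init__(self, replicas):
        # replicas is a list of (host, container) pairs in the service.
        self._next_replica = itertools.cycle(replicas)

    def route(self, entry_host):
        # The host the client connected to plays no part in the choice.
        return next(self._next_replica)


if __name__ == "__main__":
    mesh = RoutingMesh([("host1", "web.1"), ("host2", "web.2")])
    for entry_host in ["host1", "host1", "host3", "host2"]:
        host, container = mesh.route(entry_host)
        print(f"client -> {entry_host}:80 -> {container} on {host}")
```

Note that host3 in the example runs no replica at all, yet a connection to it is still served.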

Here is the service in Portainer:

Docker Swarm Portainer Service 2

A wide range of parameters is shown in the Service Details:

Docker Swarm Portainer Service Details 2

Portainer shows the published port, in ingress mode:

Docker Swarm Portainer Service Publish Mode Ingress

We can see all the parameters of the service with docker service inspect [service name]. The overlay network has a subnet of 10.255.0.0/16. The service has created a Virtual IP of 10.255.0.4. With docker container inspect [container name] we can see the IP addresses of the containers are 10.255.0.6 and 10.255.0.7.
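As a quick check on those addresses, the standard ipaddress module can confirm that the Virtual IP and both container addresses sit inside the overlay subnet. The addresses are the ones quoted above; the labels and the helper function name are mine.

```python
import ipaddress

# The overlay subnet and the addresses reported by the inspect commands.
overlay = ipaddress.ip_network("10.255.0.0/16")
addresses = {
    "service VIP": "10.255.0.4",
    "container 1": "10.255.0.6",
    "container 2": "10.255.0.7",
}


def on_overlay(address, network=overlay):
    """Return True if the address belongs to the overlay subnet."""
    return ipaddress.ip_address(address) in network


if __name__ == "__main__":
    for label, address in addresses.items():
        print(f"{label}: {address} on overlay = {on_overlay(address)}")
```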

For version 1607 the routing mesh does not work. The approach that works on the earlier build is to publish the ports in host mode. Each host publishes the port directly, and maps it to the container. If we use a defined port on the host, then we can only have one container per host. Instead of defining the number of replicas we need to specify --mode global, so that one container is created on each node. The command to create the service this way is:

docker service create --name web --mode global --publish mode=host,published=80,target=80 microsoft/iis

If we use a dynamic port on the host, then we can have more than one container per host, but we have to discover the port to connect to. The command to create the service this way is:

docker service create --name web --replicas 2 --publish mode=host,target=80 microsoft/iis
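One way to discover the dynamic port is to read it out of the JSON printed by docker container inspect on the host running the container. The sketch below parses a hand-written sample, not real output; the container name and port number are invented, and the NetworkSettings.Ports layout is what I would expect to see, so check it against your own output before relying on it.

```python
import json

# Hand-written sample in the shape of `docker container inspect` output.
SAMPLE = """
[
  {
    "Name": "/web.1.example",
    "NetworkSettings": {
      "Ports": {
        "80/tcp": [{"HostIp": "0.0.0.0", "HostPort": "49153"}]
      }
    }
  }
]
"""


def host_ports(inspect_json, container_port="80/tcp"):
    """Return the host ports bound to the given container port."""
    found = []
    for container in json.loads(inspect_json):
        settings = container.get("NetworkSettings") or {}
        ports = settings.get("Ports") or {}
        # An unpublished port may be missing or null, so fall back to [].
        for binding in ports.get(container_port) or []:
            found.append(int(binding["HostPort"]))
    return found


if __name__ == "__main__":
    print(host_ports(SAMPLE))
```

The resulting port numbers are what would have to be fed into an external load balancer each time a replica is created or removed.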

Doing it this way, the container is created on the "nat" network. Portainer shows the published port, in host mode:

Docker Swarm Portainer Service Publish Mode Host

Now we have containers running as a service. If a container fails, another is created. If a node fails or is shut down, any containers running on it are replaced by new containers on other nodes.
