Docker Swarm services in GCP

Julio Diez
Google Cloud - Community
8 min read · Jan 24, 2022


In this article I describe how to expose services from a Docker Swarm cluster running in GCP. I also analyze some differences between Docker Swarm and GKE in this respect.

Background

Docker Swarm is a container orchestration system to manage multiple containers deployed across multiple hosts. Probably, this definition brings the very popular Kubernetes to the reader's mind. There is a battle between Swarm and Kubernetes, with Swarm claiming to be simpler to use and Kubernetes more powerful, but I won't get into this war.

Nowadays Kubernetes or any of its flavors (like Google Kubernetes Engine, GKE, in GCP) seems to dominate the cloud, but the fact is that Docker Swarm is still in use by many companies. And when moving their workloads to the cloud, not everyone moves away from their chosen orchestration platform. So I think it's useful to know how to run Docker Swarm services in GCP.

I will focus mainly on the networking part. When I looked at how to reach containers in Swarm, I realized its networking uses a different approach from GKE's, which I'm more used to. GKE clusters are first-class citizens in GCP, so containers and services integrate very well. With Docker Swarm a bit more work is needed, as you will see. No fear, I won't get into all the details, only the minimum needed.

But before continuing, a few words about the term "service". A Kubernetes service refers to a network construct that abstracts and exposes a set of pods/containers; for example, LoadBalancer is a type of Kubernetes service. This is the meaning I usually refer to. In contrast, a Docker Swarm service is a container plus all the configuration needed to run it, such as the number of replicas. I hope it will be clear from the context which one I am referring to.

Overlays

Docker has three fundamental types of networks: bridge, host and none. I will only describe the first one. Basically, the bridge network is a private, internal network on the host, where containers typically run. If a container needs to be accessible externally, its port can be published (--publish flag) on the Docker host.
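For example, a quick way to see the publish flag at work, using nginx just as a sample image:

$ sudo docker run -d --name web -p 8080:80 nginx
$ curl -s localhost:8080

The first command attaches the container to the default bridge network and publishes container port 80 on port 8080 of the Docker host.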

The bridge network is implemented in Linux with a virtual bridge device, docker0, and pairs of veth (virtual Ethernet) devices for containers. In general, this is similar to what Kubernetes does, though implementations may vary.
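You can check these devices on any Docker host; a quick look, with output varying per host:

$ ip addr show docker0
$ ip link | grep veth

Each running container attached to the bridge adds one veth device on the host side.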

In this setup, containers on a host can communicate with each other. However, when you have a cluster made of multiple hosts, containers across hosts have no way to communicate, and this is when you may want to use Docker Swarm. It enables communication between Docker services (remember, a Docker service = a container) across worker nodes using overlay networks. An overlay network is a network built on top of another network; it allows systems to communicate as if they were directly connected when they are not.
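Besides the default ingress network described next, you can create your own overlays. A minimal sketch, with network and service names of my own choosing:

manager:~$ sudo docker network create --driver overlay my-overlay
manager:~$ sudo docker service create --name api --network my-overlay gcr.io/google-samples/hello-app:1.0

Containers of services attached to my-overlay can reach each other across nodes by service name.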

When you initialize the swarm, an overlay network named ingress is created for you (among other components). It spans across all worker nodes, allowing communication between containers and making them externally accessible if you publish their ports, and it also provides an internal load balancing mechanism: a request can be received on any node of the cluster, which will forward it to one of the container instances running on any node.

This load balancing part sounds similar to the NodePort service type of Kubernetes. And since NodePort is part of Kubernetes, you may expect it to work on GKE too. There are some differences, but it certainly does; however, GKE has better ways to do load balancing, especially to expose HTTP services. And this is where the differences become key. I will highlight here the two most relevant for our purposes.
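For reference, this is roughly how you would expose a deployment through a NodePort service in Kubernetes; the deployment name here is hypothetical:

$ kubectl expose deployment my-web --type=NodePort --port=8080
$ kubectl get service my-web

The second command shows the node port allocated by Kubernetes, reachable on any node of the cluster.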

First, as I mentioned, Swarm uses an overlay, in particular a VXLAN network. It means Ethernet frames sent by a container get encapsulated within UDP packets before leaving the Docker host. As a result, container IPs are not visible outside the host, only on the internal overlay network. In contrast, GKE can use Alias IPs, which allow configuring secondary IP ranges on the VMs running containers and routing traffic to them natively. This enables more efficient load balancing, where an external load balancer can target not the nodes but the specific containers (pods) that provide the service.
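On the GKE side, Alias IPs come with VPC-native clusters; a sketch of creating one, with name and zone of my own choosing:

$ gcloud container clusters create my-cluster --enable-ip-alias --zone europe-west1-b

Pods then get IPs from a secondary range of the subnet, routable within the VPC.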

The figure above shows two GKE clusters configured differently:

  • On the left-hand side, with a typical Kubernetes deployment, the pods are exposed through a NodePort service, so the load balancer distributes traffic to the nodes and a second layer of load balancing forwards it to a container that may or may not be on the same node. Usually it won't be, which adds latency, especially since nodes can be in different cloud zones.
  • On the right, there is a GKE cluster using Alias IPs and Network Endpoint Groups (NEGs). The load balancer targets the pod endpoints directly instead of the nodes, avoiding the extra hops (see the annotation sketch after this list).
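Container-native load balancing with NEGs is enabled per Kubernetes service through an annotation; as a sketch, for a service named my-web:

$ kubectl annotate service my-web cloud.google.com/neg='{"ingress": true}'

With this, GKE creates NEGs holding the pod endpoints, which the load balancer can then target directly.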

The second difference: even if Docker Swarm wanted to use a routed network model instead of an encapsulated one, it assigns IPs to containers sequentially as they are deployed, not according to the node they land on. Alias IPs, and any routed network model, need to know which IP subranges will be assigned to each node.

In short, with Docker Swarm we will need a setup similar to the one on the left, where the backends are the nodes.

A practical case

Let's deploy a scenario to put things into practice. It will consist of some HTTP services running on Docker Swarm services/containers.

A swarm

I will use three Docker hosts, one manager and two workers. You can get Docker on your Linux machines by following installation instructions like these, or simply by typing:

$ curl -fsSL https://get.docker.com -o get-docker.sh
$ sudo sh get-docker.sh

On the manager host, initialize the swarm:

manager:~$ sudo docker swarm init
...
docker swarm join --token <xxxxx> 10.132.0.2:2377
...

The output of the command tells you how to add a worker to the swarm, using a security token and pointing to the IP address of the manager and port 2377, which is used for cluster management. Paste that line verbatim on the other hosts to join them as workers:

worker1:~$ sudo docker swarm join --token <xxxxx> 10.132.0.2:2377

worker2:~$ sudo docker swarm join --token <xxxxx> 10.132.0.2:2377

Some services

I will deploy several replicas of a simple web server app listening on port 8080:

manager:~$ sudo docker service create --name my-web --replicas 3 -p 8080:8080 gcr.io/google-samples/hello-app:1.0

You can deploy a client VM in the same network to test access to this web server. My nodes have IPs from 10.132.0.2 to .4, so any of those should now respond on port 8080:

client:~$ watch -tn1 curl -s 10.132.0.3:8080

If you watch for a few seconds you will see the different containers responding; you can tell from the hostname:

Hello, world!
Version: 1.0.0
Hostname: 78f61208d8c3

Next I will deploy another similar web server, version 2. Its containers will also listen on port 8080, but since we can't map that port again on the nodes, this time I will use 8081:

manager:~$ sudo docker service create --name my-web-v2 --replicas 3 -p 8081:8080 gcr.io/google-samples/hello-app:2.0

client:~$ watch -tn1 curl -s 10.132.0.2:8081
Hello, world!
Version: 2.0.0
Hostname: 1ba54c792d62

Now that we have several containers deployed, if you look into the details of the ingress overlay network you can see the IPs assigned to them. Unfortunately, it seems you need to go node by node, since every node will show you only the containers running on it (or at least I don't know a better way):

manager:~$ sudo docker network inspect ingress
...
    "Containers": {
        "1ba54c...": {
            "Name": "my-web-v2.3.ngz...",
            "IPv4Address": "10.0.0.29/24",
            ...
        },
        "a8cbe5...": {
            "Name": "my-web.3.5x2...",
            "IPv4Address": "10.0.0.25/24",
            ...

worker1:~$ sudo docker network inspect ingress
...
            "IPv4Address": "10.0.0.27/24",
...
            "IPv4Address": "10.0.0.23/24",

worker2:~$ sudo docker network inspect ingress
...
            "IPv4Address": "10.0.0.28/24",
...
            "IPv4Address": "10.0.0.24/24",

As I mentioned, Docker Swarm doesn't pre-allocate IP subranges to nodes: container IPs are assigned sequentially as containers are deployed, so each node ends up with a scattered set of them.
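A small convenience sketch, assuming you can ssh into the nodes, to dump the container IPs from each one using Docker's Go-template formatting:

$ for node in manager worker1 worker2; do
    echo "== $node =="
    ssh $node "sudo docker network inspect ingress --format '{{range .Containers}}{{.Name}} {{.IPv4Address}}{{println}}{{end}}'"
  done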

A VIP to rule them all

Accessing the node IPs is not very convenient, and neither is having to specify in the request whatever port each service is published on. Let's deploy an internal HTTP(S) Load Balancer to offer a single VIP:port to access the services. I will show the architecture here and leave the actual deployment to the reader.

I put the Docker VMs as members of an unmanaged instance group, and this group as the backend of the two backend services. Members are identified by their VM names, not by IP, so deploying the cluster in GCP doesn't require you to track your worker IPs!

The instance group has two named ports, 'webv1:8080' and 'webv2:8081', and the LB's URL map routes requests for 'svc1.example.com' and 'svc2.example.com' to the corresponding backend services. We have hidden the ports from the users!
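For reference, a sketch of this plumbing with gcloud; the names, zone and region are my own, and I show only one of the two backend services:

$ gcloud compute instance-groups unmanaged create swarm-nodes --zone europe-west1-b
$ gcloud compute instance-groups unmanaged add-instances swarm-nodes \
    --zone europe-west1-b --instances manager,worker1,worker2
$ gcloud compute instance-groups unmanaged set-named-ports swarm-nodes \
    --zone europe-west1-b --named-ports webv1:8080,webv2:8081
$ gcloud compute health-checks create http swarm-hc-v1 --region europe-west1 --port 8080
$ gcloud compute backend-services create my-web-bes \
    --load-balancing-scheme INTERNAL_MANAGED --protocol HTTP --port-name webv1 \
    --health-checks swarm-hc-v1 --health-checks-region europe-west1 --region europe-west1
$ gcloud compute backend-services add-backend my-web-bes --region europe-west1 \
    --instance-group swarm-nodes --instance-group-zone europe-west1-b

The remaining pieces (URL map with the host rules, target HTTP proxy, proxy-only subnet and forwarding rule) follow the standard internal HTTP(S) LB setup.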

Finally, I configured Cloud DNS to point these domains to the LB VIP.
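As a sketch of those records with gcloud, assuming a private zone named example-zone (my own name) and 10.132.0.100 as the LB VIP:

$ gcloud dns record-sets create svc1.example.com. --zone example-zone \
    --type A --ttl 300 --rrdatas 10.132.0.100
$ gcloud dns record-sets create svc2.example.com. --zone example-zone \
    --type A --ttl 300 --rrdatas 10.132.0.100

We can test that it works from the client VM: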

client:~$ watch -tn1 curl -s svc1.example.com
Hello, world!
Version: 1.0.0
Hostname: a8cbe5510a9c
...
client:~$ watch -tn1 curl -s svc2.example.com
Hello, world!
Version: 2.0.0
Hostname: 399d05d4d0ef

Hurray!

Final notes

There are some things I haven't addressed but that I want to mention, even if briefly:

  • Health check: the health check mechanism is not optimal in this deployment. As a result of the double load balancing involved, a health check probe will not, in general, check the service's health on the probed instance. I'd suggest deploying a simple daemon on each node to check the instance's health instead (see the sketch after this list).
  • TCP/UDP services: you can use the same instance group as the backend of multiple backend services, including that of a TCP/UDP load balancer.
  • Adding new services: if a new service comes up after the first deployment, you can create all the pieces needed and update the named ports of the instance group if required. Be aware that updating replaces the existing content, so your list of named ports should include both the existing and the new ones.
  • Node auto-scaling: Docker Swarm doesn't support auto-scaling out of the box. You could leverage managed instance groups to automatically create new worker nodes based on parameters like CPU utilization, and join them to the cluster with docker swarm join when booting. In this case, you should pay even more attention to the health check topic, since instances deemed unhealthy would be recreated.
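For the health check idea above, one option is a Swarm service in global mode (one task per node) published in host mode, which bypasses the routing mesh so each node answers only for itself. A minimal sketch, reusing the sample image and a port of my own choosing:

manager:~$ sudo docker service create --name node-health --mode global \
    --publish mode=host,target=8080,published=9000 \
    gcr.io/google-samples/hello-app:1.0

Pointing the LB health checks at port 9000 then effectively probes whether the node itself can run containers.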

Conclusion

Running Docker Swarm in GCP brings some very good advantages. It is not as smooth as running GKE, but it is not that complicated either, and it's an opportunity to quickly lift-and-shift your Docker workloads to the cloud. Of course, I would suggest you take a look at GKE to see if it fits your use case better ;)

Julio Diez
Google Cloud - Community

Strategic Cloud Engineer at Google Cloud, focused on Networking and Security