What Is Docker Swarm?
Docker Swarm is a container orchestration tool built into Docker. It groups multiple Docker hosts into a single cluster to deploy and manage services. It has a simpler setup than Kubernetes and can be operated using the Docker CLI alone, making it suitable for small to medium-scale services.
As an analogy, if a single Docker host is one chef, Swarm is a kitchen with multiple chefs working together. The head chef (manager node) distributes orders (services), and the cooks (worker nodes) prepare the dishes (containers). If one cook is unavailable, another takes over.
| Component | Role |
|---|---|
| Manager node | Cluster management, scheduling, storing service definitions |
| Worker node | Running containers |
| Service | Deployment unit (image + replica count + network config) |
| Task | Individual container instance of a service |
| Stack | Application unit that bundles multiple services |
Cluster Initialization
Here’s the process of setting up a Swarm cluster.
# === Run on the manager node ===
# Initialize Swarm (current host becomes manager)
docker swarm init --advertise-addr 192.168.1.10
# Swarm initialized: current node (abc123) is now a manager.
#
# To add a worker to this swarm, run the following command:
# docker swarm join --token SWMTKN-1-xxx 192.168.1.10:2377
#
# To add a manager to this swarm, run 'docker swarm join-token manager'
# Check worker join token
docker swarm join-token worker
# SWMTKN-1-xxx...
# Check manager join token (3 managers recommended for high availability)
docker swarm join-token manager
# === Run on the worker node ===
docker swarm join --token SWMTKN-1-xxx 192.168.1.10:2377
# This node joined a swarm as a worker.
# === Check cluster status on manager node ===
docker node ls
# ID HOSTNAME STATUS AVAILABILITY MANAGER STATUS ENGINE VERSION
# abc123 * manager-1 Ready Active Leader 24.0.7
# def456 worker-1 Ready Active 24.0.7
# ghi789 worker-2 Ready Active 24.0.7
For high availability, configure an odd number of manager nodes (3, 5, or 7). The Raft consensus algorithm elects a leader, and a majority of managers must be alive for the cluster to operate normally.
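The majority rule above can be sketched with simple arithmetic: with N managers, floor(N/2)+1 must be reachable, so the cluster tolerates N minus that many failures. This also shows why 2 managers are no better than 1:

```shell
# Raft quorum: with N managers, a majority of floor(N/2)+1 must be up,
# so the cluster tolerates N - (N/2 + 1) manager failures.
for n in 1 2 3 5 7; do
  majority=$(( n / 2 + 1 ))
  tolerance=$(( n - majority ))
  echo "managers=$n majority=$majority tolerates=$tolerance"
done
# managers=2 gains nothing over managers=1: both tolerate 0 failures.
```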
Service Deployment
In Swarm, containers are deployed as services. Manage them with the docker service command.
# Create a service (3 Nginx web servers)
docker service create \
--name web \
--replicas 3 \
--publish 80:80 \
--update-delay 10s \
--update-parallelism 1 \
--restart-condition on-failure \
nginx:alpine
# List services
docker service ls
# ID NAME MODE REPLICAS IMAGE
# xyz123 web replicated 3/3 nginx:alpine
# Detailed service status (which node each task runs on)
docker service ps web
# ID NAME IMAGE NODE DESIRED STATE CURRENT STATE
# aaa111 web.1 nginx:alpine manager-1 Running Running 2 minutes ago
# bbb222 web.2 nginx:alpine worker-1 Running Running 2 minutes ago
# ccc333 web.3 nginx:alpine worker-2 Running Running 2 minutes ago
# View service logs (aggregated from all tasks)
docker service logs -f web
# Scale service (3 → 5)
docker service scale web=5
# web scaled to 5
# overall progress: 5 out of 5 tasks
# Detailed service information
docker service inspect --pretty web
Thanks to Swarm’s Routing Mesh, you can access the service through any node in the cluster. For example, even if the web service is running only on worker-1 and worker-2, a request to manager-1’s port 80 is automatically routed to the appropriate node.
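As a sketch of the routing mesh in action, assuming the three-node cluster above (the worker IPs here are hypothetical, continuing the 192.168.1.x example):

```shell
# Routing mesh sketch: the "web" service publishes port 80, so every
# node answers on port 80, regardless of where the tasks actually run.
curl -s http://192.168.1.10/   # manager-1 — request routed to a web task
curl -s http://192.168.1.11/   # worker-1  (hypothetical IP)
curl -s http://192.168.1.12/   # worker-2  (hypothetical IP) — same service
```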
Rolling Updates
Here’s how to update a service’s image without interruption.
# Update service image (rolling update)
docker service update \
--image nginx:1.25-alpine \
--update-delay 15s \
--update-parallelism 1 \
--update-failure-action rollback \
--update-order start-first \
web
# Check update progress
docker service ps web
# ID NAME IMAGE NODE DESIRED STATE CURRENT STATE
# ddd444 web.1 nginx:1.25-alpine manager-1 Running Running 10 seconds ago
# aaa111 \_web.1 nginx:alpine manager-1 Shutdown Shutdown 15 seconds ago
# bbb222 web.2 nginx:alpine worker-1 Running Running 5 minutes ago
# ccc333 web.3 nginx:alpine worker-2 Running Running 5 minutes ago
# Manual rollback on failure
docker service rollback web
# web rolled back
# Check rollback status
docker service inspect --pretty web
# UpdateStatus:
# State: rollback_completed
Update option descriptions:
| Option | Description |
|---|---|
| --update-parallelism 1 | Update 1 task at a time |
| --update-delay 15s | Wait 15 seconds between each task update |
| --update-failure-action rollback | Automatically roll back on failure |
| --update-order start-first | Start the new task first, then stop the old one |
Stack Deployment
The docker stack command deploys multiple services at once using the Docker Compose file format.
# stack.yml — web application stack
version: "3.8"
services:
  nginx:
    image: nginx:alpine
    ports:
      - "80:80"
      - "443:443"
    networks:
      - frontend
    deploy:
      replicas: 2
      update_config:
        parallelism: 1
        delay: 10s
        failure_action: rollback
      restart_policy:
        condition: on-failure
        delay: 5s
        max_attempts: 3
  api:
    image: my-registry.com/my-app:latest
    environment:
      - NODE_ENV=production
      - DATABASE_URL=postgres://user:pass@db:5432/myapp
    networks:
      - frontend
      - backend
    deploy:
      replicas: 3
      resources:
        limits:
          cpus: "1.0"
          memory: 512M
        reservations:
          cpus: "0.25"
          memory: 128M
      update_config:
        parallelism: 1
        delay: 15s
        order: start-first
  db:
    image: postgres:16-alpine
    environment:
      - POSTGRES_DB=myapp
      - POSTGRES_USER=user
      - POSTGRES_PASSWORD=pass
    volumes:
      - pgdata:/var/lib/postgresql/data
    networks:
      - backend
    deploy:
      replicas: 1
      placement:
        constraints:
          # Run DB only on a manager node (volume consistency)
          - node.role == manager
  redis:
    image: redis:7-alpine
    networks:
      - backend
    deploy:
      replicas: 1
networks:
  frontend:
    driver: overlay
  backend:
    driver: overlay
    internal: true
volumes:
  pgdata:
# Deploy stack
docker stack deploy -c stack.yml myapp
# List stacks
docker stack ls
# NAME SERVICES ORCHESTRATOR
# myapp 4 Swarm
# Check stack services
docker stack services myapp
# ID NAME MODE REPLICAS IMAGE
# aaa myapp_nginx replicated 2/2 nginx:alpine
# bbb myapp_api replicated 3/3 my-registry.com/my-app:latest
# ccc myapp_db replicated 1/1 postgres:16-alpine
# ddd myapp_redis replicated 1/1 redis:7-alpine
# Check all tasks in the stack
docker stack ps myapp
# Update stack (run same command after modifying stack.yml)
docker stack deploy -c stack.yml myapp
# Remove stack
docker stack rm myapp
Node Management
How to manage cluster node states and roles.
# Check node status
docker node ls
# Set node to maintenance mode (tasks migrate to other nodes)
docker node update --availability drain worker-1
# All tasks on worker-1 are rescheduled to other nodes
# Reactivate after maintenance
docker node update --availability active worker-1
# Add labels to nodes (used for placement constraints)
docker node update --label-add zone=ap-northeast-2a worker-1
docker node update --label-add ssd=true worker-2
# Label-based placement (run DB only on SSD nodes)
docker service create \
--name db \
--constraint 'node.labels.ssd == true' \
postgres:16-alpine
# Remove a node from the Swarm
# Run on the worker node:
docker swarm leave
# Remove from manager:
docker node rm worker-1
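One caveat the commands above don’t cover: a manager node shouldn’t simply leave the swarm, because that would shrink the Raft quorum unexpectedly. Demote it to a worker first. A sketch (manager-2 is a hypothetical extra manager):

```shell
# Removing a manager safely: demote it so it exits the Raft quorum first.
# Run on any remaining manager:
docker node demote manager-2
# Then on manager-2 itself:
docker swarm leave
# Finally, back on a remaining manager:
docker node rm manager-2
```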
Health Checks and Service Discovery
Swarm provides built-in health checks and load balancing.
# Add health check to Dockerfile
FROM node:22-alpine
WORKDIR /app
COPY . .
RUN npm ci --omit=dev
# Health check every 30s, 5s timeout, unhealthy after 3 failures
HEALTHCHECK --interval=30s --timeout=5s --retries=3 \
CMD wget --no-verbose --tries=1 --spider http://localhost:3000/health || exit 1
USER node
EXPOSE 3000
CMD ["node", "server.js"]
# Set health check when creating a service
docker service create \
--name api \
--replicas 3 \
--health-cmd "wget --no-verbose --tries=1 --spider http://localhost:3000/health || exit 1" \
--health-interval 30s \
--health-timeout 5s \
--health-retries 3 \
my-app:latest
# Check health check results in service status
docker service ps api
# Only healthy tasks receive traffic
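Service discovery works alongside health checks: services attached to the same overlay network resolve each other by service name through Swarm’s internal DNS, which returns a virtual IP that load-balances across healthy tasks. A sketch, reusing the my-app:latest image from above:

```shell
# Services on a shared overlay network discover each other by name.
docker network create --driver overlay app-net
docker service create --name api --network app-net my-app:latest
docker service create --name db --network app-net postgres:16-alpine
# Inside any "api" container, the name "db" now resolves via DNS:
#   getent hosts db     → the db service's virtual IP
#   psql -h db ...      → routed to a healthy db task
```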
Practical Tips
- Number of manager nodes: In production, configure 3 or more managers (odd number). With 1, it becomes a single point of failure (SPOF). With 2, consensus cannot be reached if 1 goes down. With 3, the cluster can tolerate 1 failure.
- Workloads on manager nodes: Manager nodes can run tasks, but in large clusters it’s more stable to set managers to drain mode so they focus only on management operations.
- Secrets management: Use docker secret to store encrypted secrets in the Raft log and mount them into containers at /run/secrets/. This is more secure than environment variables.
- Rolling update strategy: Using --update-order start-first ensures the new task starts successfully before the old task is stopped, preventing downtime. Set --update-failure-action rollback for automatic rollback on failure.
- Swarm vs Kubernetes: Swarm is suitable for clusters with fewer than 10 nodes, teams familiar with Docker Compose, and situations requiring quick setup. Choose Kubernetes when you need large-scale clusters, complex scheduling, or a rich ecosystem.
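The secrets tip can be sketched as a stack-file fragment. The secret name db_password is a placeholder; create it ahead of time with: printf 'supersecret' | docker secret create db_password -

```yaml
# Sketch: consuming a pre-created Docker secret in a stack file.
version: "3.8"
services:
  db:
    image: postgres:16-alpine
    environment:
      # The postgres image reads the password from the mounted secret file
      - POSTGRES_PASSWORD_FILE=/run/secrets/db_password
    secrets:
      - db_password
secrets:
  db_password:
    external: true   # created beforehand with: docker secret create
```

Unlike an environment variable, the secret never appears in docker inspect output or in the image; it exists only as an in-memory file inside the running container.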