Kubernetes Course Labs

Modelling Stability with StatefulSets

Kubernetes is a dynamic platform where objects are usually created in parallel and with random names. That's what happens with Pods when you create a Deployment, and it's a pattern which scales well.

But some apps need a stable environment, where objects are created in a known order with fixed names. Think of a replicated system like a message queue or a database - there's often a primary node and multiple secondaries. The secondaries depend on the primary starting first and they need to know how to find it so they can sync data. That's where you use a StatefulSet.

StatefulSets are Pod controllers which can create multiple replicas in a stable environment. Replicas have known names, start consecutively and are individually addressable within the cluster.

API specs

The spec is similar to a Deployment's - metadata, a selector and a template for the Pod spec - but with one important addition, the serviceName field:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: simple-statefulset
spec:
  selector:
    matchLabels:
      app: simple-statefulset
  serviceName: simple-statefulset
  replicas: 3
  template:
    # Pod spec

Services are decoupled from other Pod controllers, but every StatefulSet needs a linked Service. The Service uses a special setup with no ClusterIP - this is called a headless Service:

apiVersion: v1
kind: Service
metadata:
  name: simple-statefulset
spec:
  ports:
    - port: 8010
      targetPort: 80
  selector:
    app: simple-statefulset
  clusterIP: None

The StatefulSet links to the Service in the serviceName field. Each Pod has its IP address added to the Service, and each Pod also gets a DNS name of its own.
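
The per-Pod DNS names follow a fixed pattern - the Pod name, then the headless Service name, then the namespace:

# pattern for per-Pod DNS names:
<pod-name>.<service-name>.<namespace>.svc.cluster.local

# e.g. for the first Pod in this StatefulSet:
simple-statefulset-0.simple-statefulset.default.svc.cluster.local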

Deploy a simple StatefulSet

We'll start with a simple(ish) example that runs multiple Nginx Pods. This app doesn't need a StatefulSet, but it shows the pattern without getting too complex.

Let's see it in action:

kubectl apply -f labs/statefulsets/specs/simple

kubectl get po -l app=simple-statefulset --watch

You'll see two differences from a Deployment - the Pods have predictable names (the StatefulSet name plus an ordinal: -0, -1, -2), and each Pod is only created when the previous one is running.

📋 Check the logs for the wait-service container in each of the Pods.

In Pods with multiple containers, you can view the logs for specific containers with the -c flag. These logs will show the startup workflow:

kubectl logs simple-statefulset-0 -c wait-service

kubectl logs simple-statefulset-1 -c wait-service

Pod-0 knows it is the primary because it has the expected -0 hostname; Pod-1 knows it is a secondary because it doesn't have that hostname.
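
The wait-service logic isn't shown in the specs above, but the hostname check is straightforward. This is a minimal sketch of the kind of script the container might run - the real script is in the lab's spec files, so treat the details as assumptions:

# sketch: decide primary/secondary from the StatefulSet hostname
if [ "$(hostname)" = "simple-statefulset-0" ]; then
  echo "I am the primary"
else
  # secondaries wait until the primary's DNS entry resolves
  until nslookup simple-statefulset-0.simple-statefulset; do
    echo "waiting for the primary..."
    sleep 2
  done
  echo "I am a secondary"
fi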

Once they're running these are normal Pods - the StatefulSet just manages their creation differently from a Deployment.

Communication with StatefulSet Pods

StatefulSets add their Pod IP addresses to the Service.

📋 Check all the Pods are registered with the Service.

kubectl get endpoints simple-statefulset

There's one Service with 3 Pod IP addresses, but those Pods can also be reached using individual domain names.

📋 Run a sleep Pod from labs/statefulsets/specs/sleep-pod.yaml and do a DNS lookup for simple-statefulset and simple-statefulset-2.simple-statefulset.default.svc.cluster.local.

kubectl apply -f labs/statefulsets/specs/sleep-pod.yaml

kubectl exec sleep -- nslookup simple-statefulset

kubectl exec sleep -- nslookup simple-statefulset-2.simple-statefulset.default.svc.cluster.local

A lookup on the Service name returns all the Pod IPs, but each Pod also has its own DNS entry built from the Pod name: -0, -1 etc.

This app has LoadBalancer and NodePort Services with the same Pod selector. These make the app available externally and they load-balance requests in the usual way.

Browse to http://localhost:8010 / http://localhost:30010 and Ctrl-refresh - you'll see responses from different Pods.

StatefulSet Pods have their name set in a label, so if you want to avoid load-balancing (e.g. to send all traffic to a secondary) you can pin the external Service to a specific Pod.
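
The label is statefulset.kubernetes.io/pod-name, set by the StatefulSet controller. This sketch shows the idea - the Service name here is an assumption, and the actual change is in the update folder:

apiVersion: v1
kind: Service
metadata:
  name: simple-statefulset-lb   # assumed name - see the update spec for the real one
spec:
  type: LoadBalancer
  ports:
    - port: 8010
      targetPort: 80
  selector:
    app: simple-statefulset
    statefulset.kubernetes.io/pod-name: simple-statefulset-2   # pins traffic to Pod-2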

Deploy the Service change:

kubectl apply -f labs/statefulsets/specs/simple/update

Now browse to the app and the responses will always come from the same Pod.

Deploy a replicated SQL database

We've got the idea of StatefulSets, so now we can deploy an app which really does need them - a Postgres database with primary and secondary nodes, each of which needs a PersistentVolumeClaim (PVC) to store data.

We'll use a Postgres Docker image which has all the initialization scripts, so we don't need to worry about that (if you're interested you'll find it in the sixeyed/widgetario repo).

StatefulSets have a special relationship with PersistentVolumeClaims: you can request a PVC for each Pod, and the claim stays linked to that Pod. Pod-1 will have its own PVC, and when you deploy an update the new Pod-1 will attach to the same PVC as the previous Pod-1.
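
The link comes from the volumeClaimTemplates field in the StatefulSet spec. This is a minimal sketch of the shape - the names and storage size are assumptions, not the lab's actual spec:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: products-db
spec:
  selector:
    matchLabels:
      app: products-db
  serviceName: products-db
  replicas: 2
  template:
    # Pod spec - mounts the "data" volume
  volumeClaimTemplates:
    - metadata:
        name: data            # PVCs are named <template>-<pod>, e.g. data-products-db-0
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 100Mi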

Deploy the database and watch the PVCs being created:

kubectl apply -f labs/statefulsets/specs/products-db

kubectl get pvc -l app=products-db --watch

You'll see the PVC for Pod-0 get created, then when Pod-0 is running another PVC is created for Pod-1.

📋 Check the logs of Pod-0 and you'll see it sets itself up as the primary.

kubectl logs products-db-0

📋 Check Pod-1 and you'll see it sets itself up as the secondary, once the Postgres database is up and running on the primary:

kubectl logs products-db-1

Both Pods should end with a log saying the database is ready to accept connections:

kubectl logs -l app=products-db --tail 3

Lab

StatefulSets are complex and not as common as other controllers, but they have one big advantage over Deployments - they can dynamically provision a PVC for each Pod.

Deployments let you do this with ephemeral volumes, but StatefulSets make it easier with volume claim templates.
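
For comparison, this is the shape of a generic ephemeral volume in a Deployment's Pod spec - a fragment for illustration only, not part of the lab files:

# in the Pod spec of a Deployment:
volumes:
  - name: cache
    ephemeral:
      volumeClaimTemplate:
        spec:
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 50Mi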

The simple-proxy/deployment.yaml is a spec to run an Nginx proxy over the StatefulSet website we have running.

Deploy the proxy:

kubectl apply -f labs/statefulsets/specs/simple-proxy

Test it works at http://localhost:8040 / http://localhost:30040.

The Deployment uses an emptyDir volume for cache files; your task is to replace it with a StatefulSet that uses a PVC for each Pod's cache.

The proxy doesn't need Pods to be managed consecutively, so the spec should be set to create them in parallel.
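
Startup order is controlled with the podManagementPolicy field - OrderedReady is the default, and the alternative creates all the Pods at once. A fragment of the setting you'll need:

spec:
  podManagementPolicy: Parallel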

Stuck? Try hints or check the solution.


EXTRA Testing the replicated database

You may run a SQL database in your test clusters. You don't want it to be publicly available, but you do want to be able to connect and run queries. Running a SQL Client in Kubernetes walks you through that.


Cleanup

kubectl delete svc,cm,secret,statefulset,deployment,pod -l kubernetes.courselabs.co=statefulsets