Kubernetes Rollouts: Promote on SLOs, Not on “Pods Are Ready”

A workflow deep-dive focused on safe Kubernetes change: rollout gates, autoscaling interplay, and rollback.

Why this is worth your time

  • The value is not in the tool; it is in making change safe, observable, and reversible.
  • Many “mystery outages” trace back to missing gates (policy, rollout health, ownership labels).

Architecture pattern

  • Define the change workflow end-to-end: inputs → validation → rollout → verification → rollback.
  • Use a small set of SLIs as gates (error-rate, latency, saturation), not vanity metrics.
  • Automate checks, but keep a human review step for high-risk changes.
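
The gate step of this pattern can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the `Gate` class, the `evaluate_gates` function, the SLI names, and the thresholds are all hypothetical, and it assumes the samples for the observation window have already been fetched from the metrics backend.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Gate:
    """A promotion gate on one SLI (hypothetical structure)."""
    name: str         # SLI name, e.g. "error_rate"
    threshold: float  # maximum acceptable value; higher is worse


def evaluate_gates(
    samples: dict[str, list[float]], gates: list[Gate]
) -> tuple[bool, list[str]]:
    """Return (promote, failures) for one observation window.

    A gate passes only if every sample in the window is at or below
    its threshold. Missing data fails closed: no data means no promotion.
    """
    failures = []
    for gate in gates:
        window = samples.get(gate.name, [])
        if not window:
            failures.append(f"{gate.name}: no data")
            continue
        worst = max(window)
        if worst > gate.threshold:
            failures.append(f"{gate.name}: {worst} > {gate.threshold}")
    return (not failures, failures)
```

Failing closed on missing data is a deliberate choice in this sketch: a rollout that emits no metrics should not be promoted just because nothing looks wrong.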

Sharp edges

  • Aggregate dashboards with no slicing hide localized impact; break SLIs down by labels/dimensions such as service, tenant, and region.
  • Auto-remediation without guardrails increases blast radius.
  • If rollback isn’t rehearsed, it won’t work when needed.
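
To make the slicing point concrete, here is a small Python sketch with made-up numbers. The `error_rates` function, the tenant names, and the 1% threshold are assumptions for illustration only; the point is that an aggregate error rate can sit under the threshold while one slice is badly broken.

```python
from __future__ import annotations


def error_rates(
    counts: dict[str, tuple[int, int]]
) -> tuple[float, dict[str, float]]:
    """Compute the aggregate and per-slice error rates.

    counts maps a slice name (tenant, region, ...) to (errors, total requests).
    Slices with zero traffic are reported as 0.0 rather than dividing by zero.
    """
    total_errors = sum(errors for errors, _ in counts.values())
    total_requests = sum(total for _, total in counts.values())
    aggregate = total_errors / total_requests if total_requests else 0.0
    per_slice = {
        name: (errors / total if total else 0.0)
        for name, (errors, total) in counts.items()
    }
    return aggregate, per_slice


# Illustrative data: one large healthy tenant, one small failing tenant.
counts = {"tenant-a": (1, 10_000), "tenant-b": (50, 100)}
aggregate, per_slice = error_rates(counts)

# Hypothetical 1% gate applied per slice instead of on the aggregate.
breached = [name for name, rate in per_slice.items() if rate > 0.01]
```

With these numbers the aggregate stays below 1% while `tenant-b` fails half of its requests, which is the kind of impact an unsliced dashboard hides.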

Production checklist

  • What’s the blast radius? (service, env, tenant, region)
  • What are the promotion gates? (SLIs + window)
  • What’s the rollback? (fast, tested, documented)
  • What’s the post-change verification? (baseline vs canary deltas)
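
The baseline-versus-canary check in the last item can be sketched as a comparison of deltas against an allowed budget per SLI. This Python example is only an illustration under stated assumptions: `canary_regressions`, the SLI names, and all numbers are hypothetical, and every SLI is assumed to be "higher is worse".

```python
from __future__ import annotations


def canary_regressions(
    baseline: dict[str, float],
    canary: dict[str, float],
    max_delta: dict[str, float],
) -> dict[str, float]:
    """Return {sli: delta} for each SLI where the canary is worse than
    the baseline by more than the allowed delta. Empty dict means pass."""
    regressions = {}
    for sli, allowed in max_delta.items():
        delta = canary[sli] - baseline[sli]
        if delta > allowed:
            regressions[sli] = delta
    return regressions


# Illustrative values measured over the same window for both versions.
baseline = {"error_rate": 0.002, "p99_latency_s": 0.30}
canary = {"error_rate": 0.003, "p99_latency_s": 0.45}
max_delta = {"error_rate": 0.005, "p99_latency_s": 0.10}

regressions = canary_regressions(baseline, canary, max_delta)
```

Comparing against a live baseline rather than a fixed threshold keeps the check meaningful when overall traffic or load shifts during the rollout.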

Copy/paste snippets

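# Wait for the rollout to finish or time out (pods ready is necessary, not sufficient)
kubectl -n <ns> rollout status deploy/<name> --timeout=5m
# PromQL: 5xx error ratio over the last 5 minutes (metric and label names depend on your instrumentation)
sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m]))
# The 20 most recent events in the namespace
kubectl -n <ns> get events --sort-by=.metadata.creationTimestamp | tail -n 20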
