Kubernetes Rollouts: Promote on SLOs, Not on “Pods Are Ready”
A workflow deep-dive focused on safe Kubernetes change: rollout gates, autoscaling interplay, and rollback.
Why this is worth your time
- The value is not in the tool; it’s in making change safe, observable, and reversible.
- Many “mystery outages” trace back to a missing gate: no policy check, no rollout health check, or no ownership labels to show who is responsible for the workload.
Architecture pattern
- Define the change workflow end-to-end: inputs → validation → rollout → verification → rollback.
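- Use a small set of SLIs as gates (error rate, latency, saturation), not vanity metrics. A pod reporting Ready only tells you its readiness probe passed, not that users are being served within the SLO.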
- Automate checks, but keep a human review step for high-risk changes.
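The SLI gate described above can be sketched as a pure function over a window of samples. This is a minimal illustration under stated assumptions, not a definitive implementation: the names `SliSample`, `GateThresholds`, and `evaluate_gate` are hypothetical, and the threshold values are placeholders that would have to come from each service's own SLOs.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class SliSample:
    """One observation of the gate SLIs (hypothetical shape)."""
    error_rate: float       # fraction of failed requests, 0.0 to 1.0
    p99_latency_ms: float   # 99th percentile latency in milliseconds
    saturation: float       # utilization of the busiest resource, 0.0 to 1.0


@dataclass(frozen=True)
class GateThresholds:
    """Illustrative limits only; real values come from the service's SLOs."""
    max_error_rate: float = 0.01
    max_p99_latency_ms: float = 500.0
    max_saturation: float = 0.80


def evaluate_gate(
    window: Sequence[SliSample],
    limits: GateThresholds,
    min_samples: int = 5,
) -> Tuple[bool, List[str]]:
    """Return (promote, reasons). Promote only if every sample is within limits."""
    if len(window) < min_samples:
        # Too little data is a failure, not a pass: the gate fails closed.
        return False, [f"need {min_samples} samples, got {len(window)}"]

    reasons: List[str] = []
    for i, s in enumerate(window):
        if s.error_rate > limits.max_error_rate:
            reasons.append(f"sample {i}: error_rate {s.error_rate} over limit")
        if s.p99_latency_ms > limits.max_p99_latency_ms:
            reasons.append(f"sample {i}: p99 {s.p99_latency_ms}ms over limit")
        if s.saturation > limits.max_saturation:
            reasons.append(f"sample {i}: saturation {s.saturation} over limit")
    return (not reasons), reasons
```

Returning the reasons alongside the verdict keeps the human review step for high-risk changes informed: the reviewer sees which SLI failed and in which sample, instead of a bare pass or fail.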
Sharp edges
- Aggregate dashboards with no slicing hide impact: a regression confined to one tenant or region can disappear in the global average, so labels/dimensions matter.
- Auto-remediation without guardrails increases blast radius.
- If rollback isn’t rehearsed, it won’t work when needed.
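To make the slicing point concrete, here is a small sketch with made-up request counters. The `region` label, the helper name `error_rate_by`, and all numbers are assumptions for illustration only.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping


def error_rate_by(records: Iterable[Mapping], dimension: str) -> Dict[str, float]:
    """Group request counters by one label and return the error rate per value."""
    errors: Dict[str, int] = defaultdict(int)
    requests: Dict[str, int] = defaultdict(int)
    for record in records:
        key = record[dimension]
        errors[key] += record["errors"]
        requests[key] += record["requests"]
    # Skip slices with no traffic rather than dividing by zero.
    return {k: errors[k] / n for k, n in requests.items() if n > 0}


# Made-up counters for one evaluation window.
records = [
    {"region": "us-east", "requests": 10_000, "errors": 10},
    {"region": "eu-west", "requests": 9_800, "errors": 10},
    {"region": "ap-south", "requests": 200, "errors": 60},
]

overall = sum(r["errors"] for r in records) / sum(r["requests"] for r in records)
per_region = error_rate_by(records, "region")

print(f"overall: {overall:.2%}")
for region, rate in sorted(per_region.items()):
    print(f"{region}: {rate:.2%}")
```

With these numbers the overall error rate is 0.40%, which would pass a 1% gate, while the `ap-south` slice is failing 30% of its requests. A gate evaluated per slice would catch what the aggregate hides.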
Production checklist
- What’s the blast radius? (service, env, tenant, region)
- What are the promotion gates? (SLIs, thresholds, and evaluation window)
- What’s the rollback? (fast, tested, documented)
- What’s the post-change verification? (baseline vs canary deltas)
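The baseline-versus-canary check in the last item can be sketched as a comparison of per-SLI deltas over the same window. The function name `verify_canary`, the SLI names, and the allowances are hypothetical, and the sketch assumes that a higher value is worse for every SLI it compares.

```python
from typing import Dict, Mapping, Optional, Tuple


def verify_canary(
    baseline: Mapping[str, float],
    canary: Mapping[str, float],
    max_delta: Mapping[str, float],
) -> Tuple[bool, Dict[str, Optional[float]]]:
    """Compare canary SLIs with baseline SLIs taken from the same window.

    Returns (ok, breaches). A breach maps the SLI name to the delta that
    exceeded its allowance, or to None when the SLI was missing on either side.
    """
    breaches: Dict[str, Optional[float]] = {}
    for name, allowed in max_delta.items():
        if name not in baseline or name not in canary:
            breaches[name] = None  # missing data fails closed
            continue
        delta = canary[name] - baseline[name]
        if delta > allowed:  # only regressions count; improvements pass
            breaches[name] = delta
    return (not breaches), breaches
```

Comparing against a live baseline rather than a fixed threshold filters out problems that affect both versions equally, such as a slow downstream dependency, so the verdict reflects the change itself.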
Copy/paste snippets
kubectl rollout status deploy/<name>
promql: sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m]))
kubectl -n <ns> get events --sort-by=.metadata.creationTimestamp | tail -n 20
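To turn the error-ratio query into an automated gate input, one option is Prometheus's instant-query HTTP endpoint (`/api/v1/query`). The sketch below is an assumption-laden example: the base URL is whatever your Prometheus is reachable at, the function names are hypothetical, and it expects the query to return exactly one series.

```python
import json
import math
import urllib.parse
import urllib.request
from typing import Optional

ERROR_RATIO_QUERY = (
    'sum(rate(http_requests_total{status=~"5.."}[5m]))'
    " / sum(rate(http_requests_total[5m]))"
)


def parse_instant_value(payload: dict) -> Optional[float]:
    """Extract the single sample from an instant-query response.

    Returns None when there is no usable data (error status, empty result,
    or NaN), so the caller can treat "no data" as a failed gate rather than
    a healthy one.
    """
    if payload.get("status") != "success":
        return None
    result = payload.get("data", {}).get("result", [])
    if len(result) != 1:
        return None
    # Each vector element carries "value": [unix_timestamp, "number_as_string"].
    value = float(result[0]["value"][1])
    return None if math.isnan(value) else value


def fetch_error_ratio(base_url: str, timeout_s: float = 5.0) -> Optional[float]:
    """Run the error-ratio query against <base_url>/api/v1/query."""
    params = urllib.parse.urlencode({"query": ERROR_RATIO_QUERY})
    url = f"{base_url}/api/v1/query?{params}"
    with urllib.request.urlopen(url, timeout=timeout_s) as resp:
        return parse_instant_value(json.load(resp))
```

Keeping the parsing separate from the network call makes the "no traffic" and NaN cases easy to test, and those are exactly the cases where a naive gate would pass a rollout that is serving nobody.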