Python Automation

Scripts and tooling that reduce repetitive ops work. Migration utilities, validation frameworks, and API tooling that teams actually use.

What I build

  • Migration scripts: moving workloads between clusters, clouds, or regions with validation at each step.
  • Validation tooling: pre and post-deploy checks that run automatically and produce clear pass/fail output.
  • API wrappers for infra: Python clients over cloud APIs that encode operational runbooks as code.
  • Toil automation: anything that a human does on a schedule and could be wrong gets scripted.
  • Report generation: cost reports, SLO burn rate summaries, and on-call metrics from raw data sources.

Patterns I always use

Idempotency first
Every script should be safe to run twice. Check state before acting, not after. This is the single biggest factor in whether ops scripts cause incidents.
Dry-run mode
Every script that mutates state gets a --dry-run flag that prints what it would do. Non-negotiable for anything that touches production.
Retry with backoff
Cloud API calls fail. Network calls fail. Wrap them in retry logic with exponential backoff and jitter. Don't let a transient error fail a 2-hour migration.
Structured logging
JSON logs with consistent fields. Scripts that run in CI or cron need logs you can query, not print statements you have to grep.

Example projects

  • This portfolio's engagement automation: GitHub Actions workflows that curate SRE reads weekly
  • Kubernetes namespace migration tooling with pre-flight checks and rollback
  • Terraform drift detection CLI that integrates with Slack alerting
  • SLO burn rate calculator from raw Prometheus data