Terraform DAGs: parallelism first, safety second
A practical deep-dive on how Terraform's dependency graph behaves at scale — and how to avoid surprise destroy chains and apply-time blowups.
Why this is worth your time
- At ~200+ resources, the graph gets wide. Review becomes about blast radius, not syntax.
- Implicit dependencies are great until a refactor changes ordering and you ship an unexpected recreate (see the sketch after this list).
- Bad graphs create slow applies, fragile modules, and risky drift handling.
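A minimal sketch of the implicit edge mentioned above, assuming the AWS provider (resource names are illustrative): the graph edge exists only because of the expression reference, so a refactor that swaps the reference for a literal silently drops the ordering.

resource "aws_s3_bucket" "logs" {
  bucket = "example-logs"    # illustrative name
}

resource "aws_s3_bucket_versioning" "logs" {
  # This reference is the implicit dependency: Terraform derives the graph
  # edge from the expression itself. Replace it with a hard-coded bucket name
  # during a refactor and the edge, and the ordering it guaranteed, is gone.
  bucket = aws_s3_bucket.logs.id

  versioning_configuration {
    status = "Enabled"
  }
}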
Architecture pattern
- Module boundaries: define clear inputs/outputs; avoid cross-module data reads that create hidden coupling (see the sketch after this list).
- Promotion gates: policy checks (OPA/Conftest or Checkov) + human review on planned destroys.
- Execution strategy: separate state by blast radius (workspaces or separate stacks) and run applies in controlled stages.
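A minimal sketch of the module-boundary point above, using hypothetical network and app modules: values cross the boundary only as explicit outputs and inputs, so the dependency shows up as a visible edge in the graph rather than hiding inside a data-source lookup in the consumer.

# modules/network: owns the VPC and publishes only what consumers need
resource "aws_vpc" "main" {
  cidr_block = "10.0.0.0/16"
}

output "vpc_id" {
  value = aws_vpc.main.id
}

# modules/app: receives the ID as a plain input, with no data reads into the
# other module's resources
variable "vpc_id" {
  type = string
}

# Root module: the only place the two modules meet
module "network" {
  source = "./modules/network"
}

module "app" {
  source = "./modules/app"
  vpc_id = module.network.vpc_id    # explicit, graph-visible dependency
}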
Sharp edges
- `depends_on` is a last resort; it often signals a leaky abstraction (see the contrast sketched after this list).
- Graph fan-out hides risk: one small change can touch many resources.
- State and provider timeouts become part of reliability (long applies fail at the worst possible moment).
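A runnable contrast for the `depends_on` point, using the built-in terraform_data resource (Terraform 1.4+) so no provider credentials are needed; the resource names are invented for the sketch.

resource "terraform_data" "db_migration" {
  input = "run-migrations"
}

# Preferred: ordering falls out of consuming the other resource's output,
# so the graph edge and the data flow are the same thing.
resource "terraform_data" "app_release" {
  input = terraform_data.db_migration.output
}

# Last resort: an explicit edge with no data flow. This shape usually means
# the upstream resource or module should be exporting a value the consumer
# can reference directly.
resource "terraform_data" "app_release_forced" {
  input      = "deploy"
  depends_on = [terraform_data.db_migration]
}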
Production checklist
- Before refactors: visualize the graph and identify fan-out hotspots.
- Require review on any destroy in plan; treat it like a production change.
- Split stacks by ownership and blast radius (per app/team/env); see the backend sketch after this list.
- Add drift checks and guardrails before apply (policy + static analysis).
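One way to express the stack split from the checklist, assuming an S3 backend (bucket and key names are illustrative): each stack keeps its own state key, so an apply in the app stack can never walk the graph into the network stack's resources.

# stacks/network/backend.tf: shared, slow-moving, high blast radius
terraform {
  backend "s3" {
    bucket = "example-tf-state"
    key    = "prod/network/terraform.tfstate"
    region = "us-east-1"
  }
}

# stacks/app-payments/backend.tf: per-team stack, small blast radius,
# applied on its own cadence with its own reviewers
terraform {
  backend "s3" {
    bucket = "example-tf-state"
    key    = "prod/app-payments/terraform.tfstate"
    region = "us-east-1"
  }
}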
Copy/paste snippets
terraform graph | dot -Tsvg > graph.svg
open graph.svg
terraform plan -out tfplan
terraform show -json tfplan | jq -r '.resource_changes[] | select(.change.actions|index("delete")) | .address'
terraform apply -parallelism=10  # tune based on AWS/GCP API throttling + module fan-out