Workflows

Terraform DAGs: parallelism first, safety second

A practical deep dive into how Terraform's dependency graph behaves at scale, and how to avoid surprise destroy chains and apply-time blowups.

Why this is worth your time

  • At ~200+ resources, the graph gets wide. Review becomes about blast radius, not syntax.
  • Implicit dependencies are great until a refactor changes ordering and you ship an unexpected recreate.
  • Bad graphs create slow applies, fragile modules, and risky drift handling.

Architecture pattern

  • Module boundaries: define clear inputs/outputs; avoid cross-module data reads that create hidden coupling (see the module sketch after this list).
  • Promotion gates: policy checks (OPA/Conftest or Checkov) + human review on planned destroys.
  • Execution strategy: separate state by blast radius (workspaces or separate stacks) and run applies in controlled stages.
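
A minimal sketch of the boundary idea, with hypothetical names (network, app, vpc_id, private_subnet_ids are illustrative, not from any real codebase): the network module exports what downstream modules need, and the app module takes those values as inputs, so the edge shows up in the plan graph instead of hiding inside a data lookup.

# modules/network/outputs.tf (hypothetical layout)
output "vpc_id" {
  value = aws_vpc.main.id
}

output "private_subnet_ids" {
  value = aws_subnet.private[*].id
}

# modules/app/variables.tf: explicit inputs, no data "aws_vpc" lookup inside the module
variable "vpc_id" {
  type = string
}

variable "private_subnet_ids" {
  type = list(string)
}

# root module: the dependency is a visible edge from module.app to module.network
module "network" {
  source = "./modules/network"
}

module "app" {
  source             = "./modules/app"
  vpc_id             = module.network.vpc_id
  private_subnet_ids = module.network.private_subnet_ids
}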

Sharp edges

  • `depends_on` is a last resort; it often signals a leaking abstraction (see the sketch after this list).
  • Graph fan-out hides risk: one small change can touch many resources.
  • State and provider timeouts become part of reliability (long applies fail at the worst possible moment).
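
To make the contrast concrete, a sketch with illustrative AWS resources: the subnet's reference to aws_vpc.main.id is already an edge in the graph, while the Lambda function needs an explicit `depends_on` because nothing it references proves the policy attachment has happened.

# Implicit dependency: the attribute reference is the graph edge.
resource "aws_vpc" "main" {
  cidr_block = "10.0.0.0/16"
}

resource "aws_subnet" "private" {
  vpc_id     = aws_vpc.main.id # ordered after the VPC, no depends_on needed
  cidr_block = "10.0.1.0/24"
}

# Last resort: no attribute ties the function to the attachment,
# but invocations fail until the role actually carries the policy.
resource "aws_iam_role" "lambda" {
  name = "worker-role"
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Action    = "sts:AssumeRole"
      Principal = { Service = "lambda.amazonaws.com" }
    }]
  })
}

resource "aws_iam_role_policy_attachment" "lambda_logs" {
  role       = aws_iam_role.lambda.name
  policy_arn = "arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole"
}

resource "aws_lambda_function" "worker" {
  function_name = "worker"
  role          = aws_iam_role.lambda.arn
  runtime       = "python3.12"
  handler       = "handler.main"
  filename      = "build/worker.zip"

  depends_on = [aws_iam_role_policy_attachment.lambda_logs]
}

If the `depends_on` sits between two of your own modules, that is usually the leaking abstraction mentioned above: expose an output and reference it instead.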

Production checklist

  • Before refactors: visualize the graph and identify fan-out hotspots.
  • Require review on any destroy in plan; treat it like a production change.
  • Split stacks by ownership and blast radius (per app/team/env); see the backend sketch after this list.
  • Add drift checks and guardrails before apply (policy + static analysis).
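
One way to express the split, sketched with placeholder backend values (bucket, keys, and region are examples, not recommendations): each stack owns its own state file, so a refactor in the app stack cannot plan changes against network resources, and applies can be staged stack by stack.

# stacks/network/backend.tf (placeholder names)
terraform {
  backend "s3" {
    bucket = "example-tfstate"
    key    = "network/terraform.tfstate"
    region = "us-east-1"
  }
}

# stacks/app/backend.tf: applied as a separate, later stage
terraform {
  backend "s3" {
    bucket = "example-tfstate"
    key    = "app/terraform.tfstate"
    region = "us-east-1"
  }
}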

Copy/paste snippets

# render the dependency graph and look for fan-out hotspots
terraform graph | dot -Tsvg > graph.svg
open graph.svg

# list every address the plan would delete, before anyone approves it
terraform plan -out tfplan
terraform show -json tfplan | jq -r '.resource_changes[] | select(.change.actions|index("delete")) | .address'

# cap concurrent operations (10 is Terraform's default)
terraform apply -parallelism=10
# tune based on AWS/GCP API throttling + module fan-out

Route: /workflows/terraform-dag