Auto scaling is only as good as the signal you feed it. CPU alone is a misleading signal for I/O-bound web tiers, which can queue requests while CPU stays low; ALB request count per target or custom CloudWatch metrics from your app often track user-facing latency better.
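To make the difference concrete, here is a rough sketch of proportional sizing, the idea behind target tracking: scale capacity by the ratio of the observed metric to its target, then clamp to the caps. This is a simplified model, not AWS's actual algorithm, and the tier, metric values, and targets are made-up numbers for illustration.

```python
import math


def desired_count(current: int, metric: float, target: float,
                  min_cap: int, max_cap: int) -> int:
    """Simplified proportional sizing: capacity * (metric / target), clamped.

    An illustrative model only; real target tracking applies its own
    smoothing and is more conservative when scaling in.
    """
    raw = math.ceil(current * metric / target)
    return max(min_cap, min(max_cap, raw))


# Hypothetical I/O-bound tier running 4 tasks:
# CPU sits at 35% against a 60% target, yet each target is
# handling 900 requests/min against a 500 requests/min target.
by_cpu = desired_count(4, 35.0, 60.0, min_cap=2, max_cap=20)
by_requests = desired_count(4, 900.0, 500.0, min_cap=2, max_cap=20)

print(f"CPU signal suggests {by_cpu} tasks")
print(f"Request-count signal suggests {by_requests} tasks")
```

With these numbers the CPU signal would shrink the tier while the request-count signal would double it, which is the mismatch the paragraph above warns about.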

Choosing primitives

  • ECS/Fargate: scale the service's desired count on CPU, memory, or ALB requests per target, with explicit min/max capacity caps.
  • EC2 ASG: prefer target tracking policies; add warm pools or lifecycle hooks if bootstrapping takes minutes.
  • Set cooldowns and instance scale-in protection during deploys so scaling doesn’t fight your rolling update.
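As a sketch of the ECS/Fargate case, the helpers below build the request parameters for an Application Auto Scaling target tracking policy on ALB requests per target, plus the min/max caps. The cluster, service, and resource label values are hypothetical placeholders, and the cooldown defaults are assumptions to tune, not recommendations. The dictionaries are intended to match the keyword arguments of boto3's `register_scalable_target` and `put_scaling_policy`; check them against the current API reference before use.

```python
def scalable_target(cluster: str, service: str,
                    min_tasks: int, max_tasks: int) -> dict:
    """Min/max caps for an ECS service's desired count."""
    return {
        "ServiceNamespace": "ecs",
        "ResourceId": f"service/{cluster}/{service}",
        "ScalableDimension": "ecs:service:DesiredCount",
        "MinCapacity": min_tasks,
        "MaxCapacity": max_tasks,
    }


def alb_request_policy(cluster: str, service: str, resource_label: str,
                       target_requests: float,
                       scale_out_cooldown: int = 60,
                       scale_in_cooldown: int = 300) -> dict:
    """Target tracking on ALB request count per target."""
    return {
        "PolicyName": f"{service}-alb-requests",  # hypothetical naming scheme
        "ServiceNamespace": "ecs",
        "ResourceId": f"service/{cluster}/{service}",
        "ScalableDimension": "ecs:service:DesiredCount",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": float(target_requests),
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "ALBRequestCountPerTarget",
                "ResourceLabel": resource_label,
            },
            # Slower scale-in than scale-out reduces flapping (assumed values).
            "ScaleOutCooldown": scale_out_cooldown,
            "ScaleInCooldown": scale_in_cooldown,
        },
    }


# Placeholder identifiers, not real resources.
label = "app/my-alb/0123456789abcdef/targetgroup/my-tg/fedcba9876543210"
target = scalable_target("prod", "web", min_tasks=2, max_tasks=20)
policy = alb_request_policy("prod", "web", label, target_requests=500)

# With boto3 installed and credentials configured, these would be applied as:
#   client = boto3.client("application-autoscaling")
#   client.register_scalable_target(**target)
#   client.put_scaling_policy(**policy)
```

Keeping the parameters as plain dictionaries makes them easy to review or diff before anything is applied to a live service.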

In 2025–2026 FinOps reviews, tie scaling alarms to monthly burn dashboards so finance trusts the knobs engineering turns.