
Cloud Cost Optimization: Quick Wins That Actually Matter

Stop chasing pennies—focus on the cloud cost optimizations that actually move the needle for your AWS, Azure, or GCP bill.

Every cloud cost optimization guide tells you to delete unused volumes, right-size instances, and use reserved instances. Then your AWS bill hits $50,000/month, you implement all the textbook advice, and save… $800.

The problem isn’t that the advice is wrong—it’s that most guides focus on optimizations that sound impressive but don’t actually impact your bill. Turning off dev instances at night saves you $200/month. Fixing your data transfer architecture saves you $8,000/month. One of these matters.

I’ve reduced cloud bills by 30-60% for multiple companies. Here’s what actually works, prioritized by impact.

The 80/20 of Cloud Costs

Before you optimize anything, run this analysis:

# AWS
aws ce get-cost-and-usage \
  --time-period Start=2025-01-01,End=2025-01-31 \
  --granularity MONTHLY \
  --metrics BlendedCost \
  --group-by Type=DIMENSION,Key=SERVICE \
  --output table

# Or use AWS Cost Explorer in the console

Sort by cost. The top 3-5 services are where you’ll find real savings. For most companies, this is:

  1. Compute (EC2, ECS, Lambda, EKS)
  2. Data Transfer
  3. RDS/Managed Databases
  4. Storage (S3, EBS)
  5. NAT Gateways (surprisingly expensive)

If you’re spending $30,000/month on EC2 and $200/month on CloudWatch, optimizing CloudWatch is a waste of time. Start at the top.
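If you pull the same data with `--output json`, a few lines of scripting will rank services for you. The response shape below matches what `get-cost-and-usage` returns when grouped by SERVICE; the dollar amounts are made-up sample data:

```python
# Rank services by cost from an `aws ce get-cost-and-usage` JSON response.
# Service names are real AWS billing names; the amounts are illustrative.
sample_response = {
    "ResultsByTime": [{
        "Groups": [
            {"Keys": ["Amazon Elastic Compute Cloud - Compute"],
             "Metrics": {"BlendedCost": {"Amount": "31240.55", "Unit": "USD"}}},
            {"Keys": ["AWS Data Transfer"],
             "Metrics": {"BlendedCost": {"Amount": "8120.10", "Unit": "USD"}}},
            {"Keys": ["Amazon Relational Database Service"],
             "Metrics": {"BlendedCost": {"Amount": "4215.00", "Unit": "USD"}}},
            {"Keys": ["AmazonCloudWatch"],
             "Metrics": {"BlendedCost": {"Amount": "198.40", "Unit": "USD"}}},
        ]
    }]
}

def top_services(response, n=5):
    """Return (service, cost) pairs sorted by monthly cost, highest first."""
    groups = response["ResultsByTime"][0]["Groups"]
    costs = [(g["Keys"][0], float(g["Metrics"]["BlendedCost"]["Amount"]))
             for g in groups]
    return sorted(costs, key=lambda c: c[1], reverse=True)[:n]

for service, cost in top_services(sample_response):
    print(f"{service}: ${cost:,.2f}")
```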

Data Transfer: The Hidden Budget Killer

Data transfer costs sneak up on teams because they’re invisible until they’re not. You deploy a new feature, traffic increases 20%, and suddenly your bill jumps $4,000/month. The culprit is usually one of these:

Cross-AZ traffic. AWS charges $0.01/GB in each direction for data transfer between availability zones, so cross-AZ traffic effectively costs $0.02/GB. If your application makes 100 database calls per request to a database in another AZ, each call moving roughly 50KB, and you serve 10M requests/month, that’s 50TB of transfer: about $1,000/month for traffic that could be free.

The fix: keep services and their databases in the same AZ when possible, or cache aggressively to reduce database calls.

# Bad: Application in us-east-1a, database in us-east-1b
App (us-east-1a) → 100 queries → RDS (us-east-1b) = $0.01/GB both ways

# Better: Application and database in same AZ
App (us-east-1a) → 100 queries → RDS (us-east-1a) = free

# Best: Application caches, reduces queries to 5 per request
App (us-east-1a) → 5 queries → RDS (us-east-1a) = free + faster
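To see how fast this compounds, here’s the arithmetic as a quick sketch. The 50KB-per-call figure is an assumption for illustration; measure your own payload sizes before acting on it:

```python
# Cross-AZ transfer: $0.01/GB charged in each direction, so effectively
# $0.02/GB on every byte that crosses the AZ boundary.
CROSS_AZ_RATE_PER_GB = 0.02

def cross_az_cost(requests_per_month, calls_per_request, kb_per_call):
    """Monthly cross-AZ transfer cost, using decimal GB (1 GB = 1e6 KB)."""
    gb = requests_per_month * calls_per_request * kb_per_call / 1e6
    return gb * CROSS_AZ_RATE_PER_GB

# 10M requests/month, 100 DB calls per request, ~50KB per call
print(f"${cross_az_cost(10_000_000, 100, 50):,.0f}/month")      # $1,000/month
# with aggressive caching, 5 calls per request
print(f"${cross_az_cost(10_000_000, 5, 50):,.0f}/month")        # $50/month
```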

NAT Gateway data transfer. NAT Gateways charge $0.045/GB for data processing on top of the $0.045/hour base charge. A service that downloads 5TB/month from the internet costs $225/month just in NAT Gateway processing fees.

If you’re pulling Docker images, downloading packages, or fetching data from external APIs, you’re paying for it twice—once for internet egress, once for NAT Gateway processing.

The fix: use VPC endpoints for AWS services (S3, DynamoDB, ECR). These are free and bypass the NAT Gateway entirely:

# Before: Lambda → NAT Gateway → Internet → S3 = $0.045/GB NAT + $0.09/GB egress
# After: Lambda → VPC Endpoint → S3 = free

# Create S3 VPC endpoint
aws ec2 create-vpc-endpoint \
  --vpc-id vpc-12345 \
  --service-name com.amazonaws.us-east-1.s3 \
  --route-table-ids rtb-12345

One client was spending $6,000/month on NAT Gateway costs for ECS tasks pulling images from ECR. We added an ECR VPC endpoint and the cost dropped to $180/month (just the base NAT Gateway charge for other traffic).
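The same back-of-envelope math applies to any NAT Gateway bill. A sketch using the us-east-1 rates cited above; the 130TB figure is a hypothetical volume of image pulls, not the client’s actual number:

```python
# NAT Gateway monthly cost: hourly base charge plus per-GB processing.
NAT_HOURLY = 0.045        # $/hour per gateway
NAT_PER_GB = 0.045        # $/GB processed
HOURS_PER_MONTH = 730

def nat_monthly_cost(gb_processed, gateways=1):
    return gateways * NAT_HOURLY * HOURS_PER_MONTH + gb_processed * NAT_PER_GB

before = nat_monthly_cost(130_000)  # 130TB of ECR pulls through the NAT
after = nat_monthly_cost(0)         # VPC endpoint traffic bypasses it entirely
print(f"before: ${before:,.2f}/month, after: ${after:,.2f}/month")
```

The base hourly charge (about $33/month per gateway) remains after the endpoint is added; only the per-GB processing disappears.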

CloudFront vs direct S3. Serving assets directly from S3 costs $0.09/GB for the first 10TB of data transfer. Serving the same assets through CloudFront costs $0.085/GB and you get caching, HTTPS, and better performance. This is a rare case where the better solution is also cheaper.

Compute Right-Sizing (Actually Effective)

Everyone tells you to right-size instances. The problem is they focus on CPU utilization, which is usually the wrong metric.

Don’t optimize based on CPU alone. An instance running at 10% CPU might still be correctly sized if it needs the memory. AWS charges for instance size, not actual resource usage.

The right approach:

  1. Check memory utilization (requires CloudWatch agent installation)
  2. Check network throughput and disk IOPS
  3. Check CPU last

A client was running t3.xlarge instances (4 vCPU, 16GB RAM) at “low” CPU utilization. The recommendation was to downsize to t3.large. But memory utilization was 85% and the application needed that RAM. Downsizing would have caused OOM errors.

Burstable instances (t3, t4g) are underutilized. Most teams use t3.medium for everything because it’s familiar. But many workloads would work fine on t4g.medium (ARM-based Graviton) for 20% less cost.

The catch: your code needs to support ARM. For most web applications (Python, Node, Go, Java), this works out of the box. For compiled binaries or native dependencies, test first.

Spot instances for dev/staging. Production workloads usually need reliability, but dev environments don’t. Spot instances cost 60-90% less than on-demand.

# Auto Scaling Group with spot instances for dev
MixedInstancesPolicy:
  InstancesDistribution:
    OnDemandPercentageAboveBaseCapacity: 0  # 100% spot
    SpotAllocationStrategy: price-capacity-optimized
  LaunchTemplate:
    LaunchTemplateSpecification:
      LaunchTemplateId: lt-12345
      Version: $Latest
    Overrides:
      - InstanceType: t3.medium
      - InstanceType: t3a.medium
      - InstanceType: t4g.medium

This lets AWS pick the cheapest available spot instance. For dev environments that can tolerate occasional interruptions, it’s a no-brainer.
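Roughly what that’s worth, assuming a 70% average spot discount (the midpoint of the range above) and the us-east-1 Linux on-demand rate for t3.medium:

```python
# Dev-environment savings from spot: compare a fleet's monthly on-demand
# cost with the same fleet at an assumed average spot discount.
T3_MEDIUM_HOURLY = 0.0416  # us-east-1 Linux on-demand, $/hour
HOURS_PER_MONTH = 730

def fleet_monthly(instances, hourly_rate, discount=0.0):
    return instances * hourly_rate * HOURS_PER_MONTH * (1 - discount)

on_demand = fleet_monthly(10, T3_MEDIUM_HOURLY)
spot = fleet_monthly(10, T3_MEDIUM_HOURLY, discount=0.70)
print(f"10 dev instances: ${on_demand:,.0f}/month on-demand, "
      f"${spot:,.0f}/month on spot")
```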

Database Costs: RDS, Aurora, and the Alternatives

Managed databases are convenient and expensive. A Multi-AZ db.r5.2xlarge RDS instance costs roughly $1,460/month. The equivalent EC2 instance costs about $380/month. You’re paying the difference (over $1,000/month) for backups, patching, and failover automation.

Sometimes that’s worth it. Sometimes it’s not.

Multi-AZ when you don’t need it. RDS Multi-AZ doubles your database cost for automatic failover. If your SLA allows 10-15 minutes of downtime for database recovery, you don’t need Multi-AZ—use snapshots and restore manually.

We had a client spending $4,200/month on a Multi-AZ RDS instance for a dashboard that 20 employees used. Downtime wasn’t a business risk. We switched to Single-AZ with automated snapshots and saved $2,100/month.

Aurora vs RDS. Aurora costs 20% more than RDS but gives you better performance and read replicas. If you’re not using read replicas, Aurora is just more expensive.

The break-even point: if you need 2+ read replicas, Aurora is usually cheaper because RDS charges full price for each replica while Aurora read replicas are cheaper.
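A toy model of that break-even, with entirely made-up rates (a $500/month instance class, a 20% Aurora premium on instance-hours, and $200/month of storage and I/O that each RDS replica duplicates but Aurora replicas share):

```python
# Hypothetical rates for illustration only; plug in your own pricing.
RDS_INSTANCE = 500      # $/month per instance
AURORA_PREMIUM = 1.20   # Aurora instance-hours ~20% above RDS
STORAGE_PER_COPY = 200  # $/month of storage+I/O each RDS replica duplicates

def rds_cost(replicas):
    # primary + replicas, each paying for its own storage copy
    return (1 + replicas) * (RDS_INSTANCE + STORAGE_PER_COPY)

def aurora_cost(replicas):
    # primary + replicas all share one cluster volume
    return (1 + replicas) * RDS_INSTANCE * AURORA_PREMIUM + STORAGE_PER_COPY

for replicas in range(4):
    print(f"{replicas} replicas: RDS ${rds_cost(replicas):,.0f}, "
          f"Aurora ${aurora_cost(replicas):,.0f}")
```

With these numbers Aurora overtakes RDS at two replicas; the exact crossover depends entirely on your instance class and storage churn.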

Serverless v2 for variable workloads. Aurora Serverless v2 scales from 0.5 ACU to 128 ACU automatically. If your database sits idle most of the time but spikes during business hours, this can cut costs in half:

# Traditional Aurora: db.r5.large running 24/7
Cost: $200/month

# Aurora Serverless v2: scales 0.5-4 ACU based on load
Average cost: $90/month (55% savings)

The catch: it doesn’t scale to zero. You always pay for the minimum ACU (usually 0.5 ACU = ~$45/month). For truly idle databases, consider stopping RDS instances when not in use.
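That floor is easy to compute from the Serverless v2 rate (about $0.12 per ACU-hour in us-east-1):

```python
# Aurora Serverless v2: billed per ACU-hour on the capacity actually held.
ACU_HOUR_RATE = 0.12      # $/ACU-hour, us-east-1
HOURS_PER_MONTH = 730

def serverless_monthly(avg_acu):
    return avg_acu * ACU_HOUR_RATE * HOURS_PER_MONTH

print(f"idle floor (0.5 ACU): ${serverless_monthly(0.5):,.2f}/month")
print(f"1.0 ACU average load: ${serverless_monthly(1.0):,.2f}/month")
```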

Storage: S3 Lifecycle Policies and Intelligent Tiering

S3 storage is cheap—$0.023/GB/month for standard storage. But most teams store everything in standard tier forever.

S3 Intelligent-Tiering is set-it-and-forget-it. It automatically moves objects between access tiers based on usage. No lifecycle policies to write, no risk of moving something important to Glacier and breaking your application.

# Enable Intelligent-Tiering for a bucket
aws s3api put-bucket-intelligent-tiering-configuration \
  --bucket my-bucket \
  --id default-tiering \
  --intelligent-tiering-configuration file://tiering.json

For logs, backups, and historical data that you might need occasionally, this is perfect. Objects you haven’t accessed in 30 days move to Infrequent Access (about 45% cheaper), and after 90 days to Archive Instant Access (over 80% cheaper) with no retrieval delay.

Delete old EBS snapshots. EBS snapshots cost $0.05/GB/month. Snapshots are incremental, so each daily snapshot stores only changed blocks, but on high-churn volumes they can approach full-volume size. In the worst case, a year of daily snapshots of a 100GB volume stores 36.5TB of snapshot data, which is $1,825/month in storage costs.

Most teams need 7-30 days of snapshots, not 365 days. Set up lifecycle rules:

# Delete snapshots older than 30 days
aws dlm create-lifecycle-policy \
  --description "Daily snapshots, 30-day retention" \
  --execution-role-arn arn:aws:iam::123:role/DLMRole \
  --policy-details file://snapshot-policy.json
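The retention math itself, using the $0.05/GB-month rate and the worst case where every daily snapshot stores the full volume (incremental snapshots will land somewhere below this ceiling):

```python
# Snapshot storage cost as a function of retention.
SNAPSHOT_RATE_PER_GB = 0.05  # $/GB-month

def retention_cost(volume_gb, retained_snapshots, change_ratio=1.0):
    """change_ratio=1.0 models the worst case: every snapshot is a full copy."""
    stored_gb = volume_gb * retained_snapshots * change_ratio
    return stored_gb * SNAPSHOT_RATE_PER_GB

print(f"100GB volume, 365 daily snapshots: ${retention_cost(100, 365):,.0f}/month")
print(f"100GB volume, 30 daily snapshots:  ${retention_cost(100, 30):,.0f}/month")
```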

Reserved Instances and Savings Plans (When They Actually Make Sense)

Reserved Instances (RIs) and Savings Plans offer 30-70% discounts in exchange for 1-3 year commitments. The advice you’ll hear: “Always use RIs for steady-state workloads!”

The reality: RIs lock you into specific instance types and regions. If your architecture changes, you’re stuck paying for instances you’re not using.

When to use RIs:

  • You’ve been running the same instance type for 6+ months with no plans to change
  • The workload is truly steady-state (not seasonal, not experimental)
  • You’re comfortable with a 1-year commitment (3-year is rarely worth the extra savings)

When to use Savings Plans instead:

  • You want flexibility to change instance types or switch between EC2 and Lambda
  • Your architecture is still evolving
  • You want to commit to a spend amount ($500/month) rather than specific instances

For most teams, Compute Savings Plans are better than RIs because they apply across instance families, regions, and even Lambda usage.

# Example savings with Compute Savings Plan
On-demand spend: $5,000/month
1-year Compute Savings Plan: $3,500/month (30% savings)
Lock-in risk: Low (applies to any compute)

Don’t buy RIs or Savings Plans in your first year. Wait until your architecture stabilizes.

Reserved Capacity for RDS

RDS Reserved Instances are different from EC2 RIs—they’re usually a better deal because database workloads are more predictable than compute workloads.

If you’re running a production database 24/7, RDS RIs can save you 40-60%:

# db.r5.xlarge RDS instance
On-demand: $730/month
1-year RI (partial upfront): $420/month (42% savings)
3-year RI (all upfront): $275/month (62% savings)

The break-even point for 1-year RIs is usually 7-9 months. If you’re confident you’ll run the same database for a year, buy the RI.
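You can check that break-even yourself. A sketch treating the 1-year RI as roughly $5,040 paid up front (the $420/month effective rate above times 12):

```python
def breakeven_month(on_demand_monthly, upfront, ri_monthly=0.0, cap=60):
    """First full month where cumulative on-demand spend exceeds the RI outlay."""
    for month in range(1, cap + 1):
        if on_demand_monthly * month >= upfront + ri_monthly * month:
            return month
    return None  # the RI never pays off within `cap` months

# $730/month on-demand vs ~$5,040 all-upfront for the year
print(breakeven_month(730, 5040))  # 7
```

Seven months: if there’s any real chance you’ll retire the database before then, the RI loses money.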

For dev/staging databases, don’t use RIs—just stop the instances when not needed.

The Real Quick Wins

If you need to cut cloud costs by 20-30% in the next 30 days, do these in order:

  1. Add VPC endpoints for S3, ECR, and other AWS services (eliminates NAT Gateway data transfer fees)
  2. Review data transfer between AZs (co-locate applications and databases)
  3. Enable S3 Intelligent-Tiering (automatic cost savings with no risk)
  4. Delete old EBS snapshots (audit snapshots, set up lifecycle policies)
  5. Stop non-production resources outside business hours (use Lambda or AWS Instance Scheduler)
  6. Check for unused resources (unattached EBS volumes, old Elastic IPs, forgotten EC2 instances)
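Item 5 is worth quantifying: a 12-hour weekday schedule keeps instances running only 60 of the week’s 168 hours. A sketch, where the $2,000 fleet cost is a made-up example:

```python
# Savings from stopping non-production instances outside business hours.
BUSINESS_HOURS_PER_WEEK = 12 * 5   # e.g. 7am-7pm, weekdays only
HOURS_PER_WEEK = 24 * 7

def running_fraction():
    return BUSINESS_HOURS_PER_WEEK / HOURS_PER_WEEK

def schedule_savings(always_on_monthly_cost):
    return always_on_monthly_cost * (1 - running_fraction())

print(f"running {running_fraction():.0%} of the time")
print(f"$2,000/month of dev EC2 -> save ${schedule_savings(2000):,.0f}/month")
```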

These don’t require architecture changes or code deploys. You can implement them this week.

For deeper savings (40-60%), you need architectural changes:

  • Move to ARM instances (Graviton)
  • Implement aggressive caching (reduce database and API calls)
  • Use CloudFront for static assets (cheaper than direct S3 access)
  • Refactor to serverless where appropriate (Lambda is cheaper than idle EC2)

These take time but have bigger impact.

What We Actually Do

When a client asks for cloud cost optimization, here’s the process:

  1. Cost visibility: Set up tagging and Cost Explorer to see exactly where money is going
  2. Data transfer audit: This is almost always the biggest quick win
  3. Resource right-sizing: Focus on the top 5 most expensive resources first
  4. Lifecycle policies: Automate deletion/archival of old data
  5. Reserved capacity: Only after architecture has stabilized

Most clients see 25-40% cost reduction in the first 90 days. The key is prioritizing high-impact changes over “technically correct” optimizations that save $50/month.

Cloud cost optimization isn’t a one-time project—it’s an ongoing practice. The teams that control costs build visibility, tagging, and budgets into their architecture from the start rather than treating it as a cost-cutting exercise after the bill gets scary.


Need help reducing your cloud bill without sacrificing performance or reliability? We can help.
