Building Production-Ready Multi-Environment AWS Infrastructure with Terraform

The Multi-Environment Challenge

Managing multiple AWS environments—development, staging, production—creates a fundamental dilemma: duplicate everything and triple costs, or share resources and risk environment coupling. Manual AWS deployments compound the problem through configuration drift, slow provisioning (hours instead of minutes), and documentation that never matches reality.

Infrastructure as Code with Terraform solves these problems, but implementation patterns matter. This article documents production-tested approaches for multi-environment AWS infrastructure, demonstrating how to balance cost efficiency, operational simplicity, and team velocity.

Key Results:

40% resource optimization through strategic infrastructure sharing
88% deployment time reduction (120 minutes → 15 minutes)
Each additional environment adds ~40% cost (vs 100% with full duplication)
Complete environment provisioning in minutes via terraform apply

Architecture: Shared Foundation, Isolated Services

The core pattern: share infrastructure where the isolation risk is negligible (VPC, RDS, ECR), and duplicate resources that must stay independent (ECS, CloudFront, S3).

Service Stack:

Compute: ECS on EC2 (40% cheaper than Fargate with acceptable complexity)
Database: Single RDS PostgreSQL with schema-based environment isolation
Storage: S3 + CloudFront per environment (prevents cache pollution between staging/prod)
Serverless: Lambda for image processing, AI content generation (AWS Bedrock)
Queues: SQS for background jobs with dead-letter queues
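
As an illustration of the queue layer, here is a minimal sketch of a background-job queue wired to a dead-letter queue. The queue names, the environment variable, the visibility timeout, and the maxReceiveCount of 5 are assumptions chosen for the example, not values from the deployment described here.

```hcl
# sqs.tf (sketch) -- names and thresholds are illustrative assumptions
variable "environment" { type = string }

# Dead-letter queue: receives messages that repeatedly fail processing
resource "aws_sqs_queue" "jobs_dlq" {
  name                      = "jobs-${var.environment}-dlq"
  message_retention_seconds = 1209600 # 14 days, the SQS maximum
}

# Main job queue: after 5 failed receives a message moves to the DLQ
resource "aws_sqs_queue" "jobs" {
  name                       = "jobs-${var.environment}"
  visibility_timeout_seconds = 60

  redrive_policy = jsonencode({
    deadLetterTargetArn = aws_sqs_queue.jobs_dlq.arn
    maxReceiveCount     = 5
  })
}
```

The visibility timeout should generally exceed the longest expected job duration, otherwise in-flight messages reappear and count toward maxReceiveCount.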

Multi-Environment Structure

Resource Optimization:

Single RDS vs 3 instances: 65% cost reduction
Shared VPC eliminates duplicate networking overhead
ECR with tags (dev-v1.0.0, prod-v1.0.0) vs separate registries
Overall: 40% resource optimization

Implementation: Directory Structure

The implementation organizes Terraform configuration into two distinct layers with clear separation of concerns.

Directory Organization

terraform/
├── shared/              # Deploy once, serves all environments
│   ├── main.tf         # Remote state backend
│   ├── vpc.tf          # VPC, subnets, NAT
│   ├── rds.tf          # PostgreSQL Multi-AZ
│   ├── ecr.tf          # Container registry
│   ├── security.tf     # Security groups
│   └── outputs.tf      # Export IDs
│
├── environments/        # Environment-specific variables
│   ├── dev.tfvars      # Development config
│   ├── staging.tfvars  # Staging config
│   └── prod.tfvars     # Production config
│
├── main.tf             # Remote state references
├── ecs.tf              # Task definitions
├── lb.tf               # Load balancers
├── cloudfront.tf       # CDN distributions
├── s3.tf               # Storage buckets
├── ssm.tf              # Credentials
└── sqs.tf              # Async queues

Shared Infrastructure Fields

VPC Configuration (shared/vpc.tf):

CIDR block allocation (typically /16)
Public subnets across availability zones
Private subnets for database and containers
NAT instances (chosen over NAT Gateway to reduce cost)
Route tables and associations
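
The VPC fields above might be sketched as follows. The CIDR range, availability zones, and resource names are assumptions; the NAT instance, internet gateway, and route tables are omitted to keep the example short.

```hcl
# shared/vpc.tf (sketch) -- CIDR, AZs, and names are illustrative assumptions
variable "availability_zones" {
  type    = list(string)
  default = ["us-east-1a", "us-east-1b"]
}

resource "aws_vpc" "main" {
  cidr_block           = "10.0.0.0/16"
  enable_dns_support   = true
  enable_dns_hostnames = true

  tags = { Name = "shared-vpc" }
}

# One public /24 per AZ: 10.0.0.0/24, 10.0.1.0/24, ...
resource "aws_subnet" "public" {
  count                   = length(var.availability_zones)
  vpc_id                  = aws_vpc.main.id
  cidr_block              = cidrsubnet(aws_vpc.main.cidr_block, 8, count.index)
  availability_zone       = var.availability_zones[count.index]
  map_public_ip_on_launch = true
}

# One private /24 per AZ for database and containers: 10.0.10.0/24, 10.0.11.0/24, ...
resource "aws_subnet" "private" {
  count             = length(var.availability_zones)
  vpc_id            = aws_vpc.main.id
  cidr_block        = cidrsubnet(aws_vpc.main.cidr_block, 8, count.index + 10)
  availability_zone = var.availability_zones[count.index]
}

# shared/outputs.tf -- exported for the environment layer (output names are assumptions)
output "vpc_id" {
  value = aws_vpc.main.id
}

output "private_subnet_ids" {
  value = aws_subnet.private[*].id
}
```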

RDS Configuration (shared/rds.tf):

Instance class sizing (start with db.t3.micro)
Multi-AZ deployment for automatic failover
Backup retention (7-30 days)
Engine version (PostgreSQL 16+)
Subnet groups referencing VPC private subnets
Security groups limiting access to ECS only
Lifecycle prevent_destroy protection
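
A minimal sketch of these RDS settings follows. The identifiers, storage size, and the input variables are assumptions; inside shared/ the variables would normally be direct references to the VPC and security group resources.

```hcl
# shared/rds.tf (sketch) -- variables stand in for resources defined elsewhere in shared/
variable "vpc_id" { type = string }
variable "private_subnet_ids" { type = list(string) }
variable "ecs_security_group_id" { type = string }
variable "db_username" { type = string }

variable "db_password" {
  type      = string
  sensitive = true
}

resource "aws_db_subnet_group" "main" {
  name       = "shared-db"
  subnet_ids = var.private_subnet_ids
}

# Allow PostgreSQL traffic from the ECS security group only
resource "aws_security_group" "rds" {
  name   = "shared-rds"
  vpc_id = var.vpc_id

  ingress {
    from_port       = 5432
    to_port         = 5432
    protocol        = "tcp"
    security_groups = [var.ecs_security_group_id]
  }
}

resource "aws_db_instance" "main" {
  identifier                = "shared-postgres"
  engine                    = "postgres"
  engine_version            = "16"
  instance_class            = "db.t3.micro"
  allocated_storage         = 20
  multi_az                  = true
  backup_retention_period   = 7
  db_subnet_group_name      = aws_db_subnet_group.main.name
  vpc_security_group_ids    = [aws_security_group.rds.id]
  username                  = var.db_username
  password                  = var.db_password
  final_snapshot_identifier = "shared-postgres-final"

  # Terraform refuses any plan that would destroy this instance
  lifecycle {
    prevent_destroy = true
  }
}
```

Because prevent_destroy blocks replacement as well as deletion, changes that force a new instance have to be handled deliberately rather than applied by accident.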

ECR Configuration (shared/ecr.tf):

Repository naming convention
Image tag mutability settings
Lifecycle policies for cleanup (untagged images, old releases)
Encryption configuration (AES256)
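
One way these ECR fields could look is sketched below. The repository name, the choice of immutable tags, and the retention numbers are assumptions; the prod- prefix follows the tag convention mentioned earlier (prod-v1.0.0).

```hcl
# shared/ecr.tf (sketch) -- repository name and retention values are illustrative assumptions
resource "aws_ecr_repository" "app" {
  name                 = "example-app"
  image_tag_mutability = "IMMUTABLE"

  encryption_configuration {
    encryption_type = "AES256"
  }
}

resource "aws_ecr_lifecycle_policy" "app" {
  repository = aws_ecr_repository.app.name

  policy = jsonencode({
    rules = [
      {
        rulePriority = 1
        description  = "Expire untagged images after 7 days"
        selection = {
          tagStatus   = "untagged"
          countType   = "sinceImagePushed"
          countUnit   = "days"
          countNumber = 7
        }
        action = { type = "expire" }
      },
      {
        rulePriority = 2
        description  = "Keep only the 10 most recent prod- images"
        selection = {
          tagStatus     = "tagged"
          tagPrefixList = ["prod-"]
          countType     = "imageCountMoreThan"
          countNumber   = 10
        }
        action = { type = "expire" }
      }
    ]
  })
}
```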

Environment-Specific Fields

ECS Task Definition (ecs.tf):

Family naming with environment suffix
CPU and memory allocation (varies by environment)
Container definitions via templatefile
Network mode (awsvpc for modern configurations)
IAM execution and task roles
Environment variables and secrets from SSM
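
The task definition fields above could be sketched like this. All variable names and the template path templates/app.json.tpl are hypothetical; the template is assumed to render a JSON array of container definitions whose secrets entries use valueFrom to point at SSM parameter ARNs.

```hcl
# ecs.tf (sketch) -- variable names and the template path are hypothetical
variable "environment" { type = string }
variable "task_cpu" { type = number }
variable "task_memory" { type = number }
variable "image" { type = string }
variable "execution_role_arn" { type = string }
variable "task_role_arn" { type = string }
variable "db_password_parameter_arn" { type = string }

resource "aws_ecs_task_definition" "app" {
  family                   = "app-${var.environment}"
  network_mode             = "awsvpc"
  requires_compatibilities = ["EC2"]
  cpu                      = var.task_cpu
  memory                   = var.task_memory
  execution_role_arn       = var.execution_role_arn # pulls images, reads SSM secrets
  task_role_arn            = var.task_role_arn      # permissions used by the app itself

  # Render the container definitions JSON with per-environment values
  container_definitions = templatefile("${path.module}/templates/app.json.tpl", {
    environment               = var.environment
    image                     = var.image
    db_password_parameter_arn = var.db_password_parameter_arn
  })
}
```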

ECS Service Configuration:

Desired task count (scales with environment)
Launch type (EC2 vs Fargate)
Network configuration referencing shared VPC
Load balancer attachment to target groups
Auto-scaling policies based on CPU/memory
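
A sketch of the service and a CPU-based target-tracking policy follows. The container name, port, 70% target, and every variable are assumptions; the subnet and security group inputs would come from the shared layer.

```hcl
# ecs.tf (sketch) -- container name, port, and scaling target are illustrative assumptions
variable "environment" { type = string }
variable "cluster_arn" { type = string }
variable "cluster_name" { type = string }
variable "task_definition_arn" { type = string }
variable "desired_count" { type = number }
variable "max_count" { type = number }
variable "private_subnet_ids" { type = list(string) }
variable "ecs_security_group_id" { type = string }
variable "target_group_arn" { type = string }

resource "aws_ecs_service" "app" {
  name            = "app-${var.environment}"
  cluster         = var.cluster_arn
  task_definition = var.task_definition_arn
  desired_count   = var.desired_count
  launch_type     = "EC2"

  # Required because the task definition uses awsvpc networking
  network_configuration {
    subnets         = var.private_subnet_ids
    security_groups = [var.ecs_security_group_id]
  }

  load_balancer {
    target_group_arn = var.target_group_arn
    container_name   = "app"
    container_port   = 8080
  }
}

resource "aws_appautoscaling_target" "app" {
  service_namespace  = "ecs"
  resource_id        = "service/${var.cluster_name}/${aws_ecs_service.app.name}"
  scalable_dimension = "ecs:service:DesiredCount"
  min_capacity       = var.desired_count
  max_capacity       = var.max_count
}

# Add or remove tasks to hold average service CPU near 70%
resource "aws_appautoscaling_policy" "cpu" {
  name               = "app-${var.environment}-cpu"
  policy_type        = "TargetTrackingScaling"
  service_namespace  = aws_appautoscaling_target.app.service_namespace
  resource_id        = aws_appautoscaling_target.app.resource_id
  scalable_dimension = aws_appautoscaling_target.app.scalable_dimension

  target_tracking_scaling_policy_configuration {
    target_value = 70

    predefined_metric_specification {
      predefined_metric_type = "ECSServiceAverageCPUUtilization"
    }
  }
}
```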

CloudFront Distribution (cloudfront.tf):

Origin configuration pointing to S3 buckets
Cache behavior policies
SSL certificates from ACM
Price class selection (all edges vs regional)
Custom error responses for SPA routing
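
The distribution fields could be expressed roughly as below. This sketch assumes origin access control for the S3 origin, the AWS managed CachingOptimized cache policy, and the cheapest price class; the variable names are hypothetical.

```hcl
# cloudfront.tf (sketch) -- variables and policy choices are illustrative assumptions
variable "environment" { type = string }
variable "domain_name" { type = string }
variable "bucket_regional_domain_name" { type = string }
variable "acm_certificate_arn" { type = string } # certificate must be issued in us-east-1

data "aws_cloudfront_cache_policy" "optimized" {
  name = "Managed-CachingOptimized"
}

resource "aws_cloudfront_origin_access_control" "assets" {
  name                              = "assets-${var.environment}"
  origin_access_control_origin_type = "s3"
  signing_behavior                  = "always"
  signing_protocol                  = "sigv4"
}

resource "aws_cloudfront_distribution" "assets" {
  enabled             = true
  default_root_object = "index.html"
  aliases             = [var.domain_name]
  price_class         = "PriceClass_100" # regional edges; PriceClass_All uses every edge

  origin {
    origin_id                = "s3-assets"
    domain_name              = var.bucket_regional_domain_name
    origin_access_control_id = aws_cloudfront_origin_access_control.assets.id
  }

  default_cache_behavior {
    target_origin_id       = "s3-assets"
    viewer_protocol_policy = "redirect-to-https"
    allowed_methods        = ["GET", "HEAD"]
    cached_methods         = ["GET", "HEAD"]
    cache_policy_id        = data.aws_cloudfront_cache_policy.optimized.id
  }

  # SPA routing: serve index.html when S3 reports a missing object.
  # A private bucket typically answers 403 rather than 404, so both are mapped.
  custom_error_response {
    error_code         = 403
    response_code      = 200
    response_page_path = "/index.html"
  }

  custom_error_response {
    error_code         = 404
    response_code      = 200
    response_page_path = "/index.html"
  }

  restrictions {
    geo_restriction {
      restriction_type = "none"
    }
  }

  viewer_certificate {
    acm_certificate_arn      = var.acm_certificate_arn
    ssl_support_method       = "sni-only"
    minimum_protocol_version = "TLSv1.2_2021"
  }
}
```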

S3 Buckets (s3.tf):

Bucket naming with environment suffix
CORS configuration for asset uploads
Bucket policies restricting access to CloudFront
Encryption at rest configuration
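
A sketch of a per-environment bucket with these settings follows. The bucket name, the allowed origins, and the variables are assumptions, and the bucket policy assumes the distribution uses origin access control.

```hcl
# s3.tf (sketch) -- bucket name and variables are illustrative assumptions
variable "environment" { type = string }
variable "allowed_origins" { type = list(string) }
variable "cloudfront_distribution_arn" { type = string }

resource "aws_s3_bucket" "assets" {
  bucket = "example-assets-${var.environment}"
}

resource "aws_s3_bucket_server_side_encryption_configuration" "assets" {
  bucket = aws_s3_bucket.assets.id

  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm = "AES256"
    }
  }
}

# CORS so browsers on the application's origin can upload assets
resource "aws_s3_bucket_cors_configuration" "assets" {
  bucket = aws_s3_bucket.assets.id

  cors_rule {
    allowed_methods = ["GET", "PUT", "POST"]
    allowed_origins = var.allowed_origins
    allowed_headers = ["*"]
    max_age_seconds = 3000
  }
}

# Only this environment's CloudFront distribution may read objects
data "aws_iam_policy_document" "cloudfront_read" {
  statement {
    actions   = ["s3:GetObject"]
    resources = ["${aws_s3_bucket.assets.arn}/*"]

    principals {
      type        = "Service"
      identifiers = ["cloudfront.amazonaws.com"]
    }

    condition {
      test     = "StringEquals"
      variable = "AWS:SourceArn"
      values   = [var.cloudfront_distribution_arn]
    }
  }
}

resource "aws_s3_bucket_policy" "assets" {
  bucket = aws_s3_bucket.assets.id
  policy = data.aws_iam_policy_document.cloudfront_read.json
}
```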

State Management Pattern

Remote State Backend (shared/main.tf):

S3 bucket for state storage
DynamoDB table for state locking
Encryption enabled
Versioning for rollback capability
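
The backend described above might be configured as in this sketch. The bucket name, lock table name, state key, and region are placeholders, not the values used in the deployment.

```hcl
# shared/main.tf (sketch) -- bucket, table, key, and region are placeholder assumptions
terraform {
  backend "s3" {
    bucket         = "example-terraform-state"
    key            = "shared/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "example-terraform-locks"
    encrypt        = true
  }
}

# Bootstrap resources for the backend itself
resource "aws_s3_bucket" "state" {
  bucket = "example-terraform-state"
}

# Versioning keeps earlier state files available for rollback
resource "aws_s3_bucket_versioning" "state" {
  bucket = aws_s3_bucket.state.id

  versioning_configuration {
    status = "Enabled"
  }
}

# The S3 backend expects a string hash key named LockID
resource "aws_dynamodb_table" "locks" {
  name         = "example-terraform-locks"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }
}
```

The bucket and lock table have to exist before terraform init can use this backend, so they are usually created once with local state (or by hand) and the backend block is added afterwards.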

Remote State References (main.tf):

terraform_remote_state data sources
Consumption of shared layer outputs (VPC ID, subnet IDs, security groups)
Dependency management between layers
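
Consuming the shared layer could look like the sketch below. The bucket, key, and region must match the shared backend and are placeholders here; the output names vpc_id and private_subnet_ids are assumptions about what shared/outputs.tf exports.

```hcl
# main.tf (sketch) -- backend coordinates and output names are assumptions
data "terraform_remote_state" "shared" {
  backend = "s3"

  config = {
    bucket = "example-terraform-state"
    key    = "shared/terraform.tfstate"
    region = "us-east-1"
  }
}

# Local aliases keep the rest of the environment layer readable
locals {
  vpc_id             = data.terraform_remote_state.shared.outputs.vpc_id
  private_subnet_ids = data.terraform_remote_state.shared.outputs.private_subnet_ids
}
```

Only values declared as outputs in the shared layer are visible this way, which keeps the contract between the two layers explicit.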

This structure enables independent deployment of shared infrastructure while allowing each environment to scale and configure resources independently through variable files.
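
To illustrate the variable files, here is a sketch of a few input variables and a matching development file. The variable names and sizes are assumptions, not the values used in the deployment.

```hcl
# variables.tf (sketch) -- variable names are illustrative assumptions
variable "environment" {
  type        = string
  description = "Environment name, used as a suffix in resource names"

  validation {
    condition     = contains(["dev", "staging", "prod"], var.environment)
    error_message = "environment must be one of: dev, staging, prod."
  }
}

variable "task_cpu" { type = number }
variable "task_memory" { type = number }
variable "desired_count" { type = number }
```

```hcl
# environments/dev.tfvars (sketch) -- small sizes for development
environment   = "dev"
task_cpu      = 256
task_memory   = 512
desired_count = 1
```

An environment is then selected at plan or apply time with terraform apply -var-file=environments/dev.tfvars. How state is kept separate per environment (for example workspaces or distinct backend keys) is not specified here and would need to be decided alongside this layout.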

Results and Lessons

Key Metrics

Efficiency Gains:

Infrastructure deployment: 2 hours → 15 minutes (88% reduction)
Resource costs: 40% optimization through strategic sharing
Scaling characteristics: Adding an environment costs about +40% (not +100%)
Environment provisioning: Minutes via terraform apply vs days of manual work

Operational Impact:

Multi-AZ RDS: Automatic failover without manual intervention
Infrastructure as documentation: Terraform files define all resources
Disaster recovery: Rebuild entire platform from code
Team velocity: Developers provision test environments themselves

What Worked

✅ Shared RDS with schemas: Reliable, cost-effective, operationally simple
✅ EC2-based ECS: 40% cheaper than Fargate with acceptable complexity
✅ Remote state in S3: Enables team collaboration with DynamoDB locking
✅ Terraform as documentation: Code always matches reality

Pitfalls to Avoid

❌ State file handling: Requires careful management and regular backups
❌ RDS backup accumulation: Monitor and clean old snapshots
❌ CloudFront invalidations: Use versioned filenames to avoid charges
❌ Shared resource changes: Coordinate carefully, since they affect all environments

Conclusion

In this deployment, multi-environment Terraform infrastructure delivered 40% cost optimization and 88% faster deployments while maintaining proper isolation. The pattern (shared VPC/RDS/ECR, environment-specific ECS/CloudFront/S3) balances efficiency with independence.

Implementation Path:

Deploy shared infrastructure (VPC, RDS, ECR)
Configure one environment completely
Replicate pattern to additional environments
Use remote state and state locking from day one

When This Makes Sense:

Managing 3+ environments
Teams larger than 5 engineers
Frequent infrastructure changes
Need for reproducible deployments
