Building Production-Ready Multi-Environment AWS Infrastructure with Terraform

The Multi-Environment Challenge

Managing multiple AWS environments—development, staging, production—creates a fundamental dilemma: duplicate everything and triple costs, or share resources and risk environment coupling. Manual AWS deployments compound the problem through configuration drift, slow provisioning (hours instead of minutes), and documentation that never matches reality.

Infrastructure as Code with Terraform solves these problems, but implementation patterns matter. This article documents production-tested approaches for multi-environment AWS infrastructure, demonstrating how to balance cost efficiency, operational simplicity, and team velocity.

Key Results:

  • 40% resource optimization through strategic infrastructure sharing
  • 88% deployment time reduction (120 minutes → 15 minutes)
  • Each additional environment adds roughly 40% to cost (vs 100% with full duplication)
  • Complete environment provisioning in minutes via terraform apply

Figure: Deployment Time Comparison (manual: 120 minutes; Terraform: 15 minutes)

Architecture: Shared Foundation, Isolated Services

The core pattern: share infrastructure where the isolation risk is negligible (VPC, RDS, ECR), and duplicate resources that must stay independent (ECS, CloudFront, S3).

Figure: Architecture overview. Traffic enters from the Internet into the AWS Cloud, which is organized into four layers:

  • Content Delivery: CloudFront CDN, S3 Static Assets
  • Application Layer: Application Load Balancer, ECS Containers 1-3
  • Data Layer: PostgreSQL RDS, S3 Data Storage
  • Async Processing: SQS Queues, Lambda Functions

Service Stack:

  • Compute: ECS on EC2 (40% cheaper than Fargate with acceptable complexity)
  • Database: Single RDS PostgreSQL with schema-based environment isolation
  • Storage: S3 + CloudFront per environment (prevents cache pollution between staging/prod)
  • Serverless: Lambda for image processing and AI content generation (Amazon Bedrock)
  • Queues: SQS for background jobs with dead-letter queues
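
The queue item above can be sketched in Terraform roughly as follows. This is a minimal illustration, not the article's actual sqs.tf: the queue names, the `environment` variable, and the `maxReceiveCount` of 5 are assumptions.

```hcl
# Hypothetical input: the environment name (dev, staging, prod).
variable "environment" { type = string }

# Dead-letter queue that receives messages the workers repeatedly fail on.
resource "aws_sqs_queue" "jobs_dlq" {
  name                      = "jobs-${var.environment}-dlq"
  message_retention_seconds = 1209600 # 14 days, the SQS maximum
}

# Main background-job queue; after 5 failed receives (an assumed threshold)
# a message is moved to the dead-letter queue.
resource "aws_sqs_queue" "jobs" {
  name = "jobs-${var.environment}"

  redrive_policy = jsonencode({
    deadLetterTargetArn = aws_sqs_queue.jobs_dlq.arn
    maxReceiveCount     = 5
  })
}
```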

Multi-Environment Structure

Shared Infrastructure (deployed once):

  • VPC & Networking: 10.0.0.0/16
  • RDS PostgreSQL: Multi-AZ, schema isolation
  • ECR Container Registry: tag-based versioning

Development:

  • ECS Cluster: t3.small
  • CloudFront: dev.domain.com
  • S3 Buckets

Staging:

  • ECS Cluster: t3.medium
  • CloudFront: staging.domain.com
  • S3 Buckets

Production:

  • ECS Cluster: t3.large
  • CloudFront: domain.com
  • S3 Buckets

Resource Optimization:

  • Single RDS vs 3 instances: 65% cost reduction
  • Shared VPC eliminates duplicate networking overhead
  • ECR with tags (dev-v1.0.0, prod-v1.0.0) vs separate registries
  • Overall: 40% resource optimization

Implementation: Directory Structure

The implementation organizes Terraform configuration into two distinct layers with clear separation of concerns.

Directory Organization

terraform/
├── shared/              # Deploy once, serves all environments
│   ├── main.tf         # Remote state backend
│   ├── vpc.tf          # VPC, subnets, NAT
│   ├── rds.tf          # PostgreSQL Multi-AZ
│   ├── ecr.tf          # Container registry
│   ├── security.tf     # Security groups
│   └── outputs.tf      # Export IDs
│
├── environments/        # Environment-specific variables
│   ├── dev.tfvars      # Development config
│   ├── staging.tfvars  # Staging config
│   └── prod.tfvars     # Production config
│
├── main.tf             # Remote state references
├── ecs.tf              # Task definitions
├── lb.tf               # Load balancers
├── cloudfront.tf       # CDN distributions
├── s3.tf               # Storage buckets
├── ssm.tf              # Credentials
└── sqs.tf              # Async queues

Shared Infrastructure Fields

VPC Configuration (shared/vpc.tf):

  • CIDR block allocation (typically /16)
  • Public subnets across availability zones
  • Private subnets for database and containers
  • NAT instances rather than a NAT Gateway, for cost optimization
  • Route tables and associations
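
A minimal sketch of what shared/vpc.tf might contain, assuming two availability zones and /24 subnets carved out of the 10.0.0.0/16 block. Resource names, subnet offsets, and the AZ count are illustrative assumptions. The NAT instance and the private route table are left out for brevity.

```hcl
# Look up the availability zones in the current region.
data "aws_availability_zones" "available" {
  state = "available"
}

resource "aws_vpc" "shared" {
  cidr_block           = "10.0.0.0/16"
  enable_dns_support   = true
  enable_dns_hostnames = true

  tags = { Name = "shared-vpc" }
}

# Public subnets: 10.0.0.0/24 and 10.0.1.0/24, one per AZ.
resource "aws_subnet" "public" {
  count                   = 2
  vpc_id                  = aws_vpc.shared.id
  cidr_block              = cidrsubnet(aws_vpc.shared.cidr_block, 8, count.index)
  availability_zone       = data.aws_availability_zones.available.names[count.index]
  map_public_ip_on_launch = true
}

# Private subnets for the database and containers: 10.0.10.0/24 and 10.0.11.0/24.
resource "aws_subnet" "private" {
  count             = 2
  vpc_id            = aws_vpc.shared.id
  cidr_block        = cidrsubnet(aws_vpc.shared.cidr_block, 8, count.index + 10)
  availability_zone = data.aws_availability_zones.available.names[count.index]
}

resource "aws_internet_gateway" "shared" {
  vpc_id = aws_vpc.shared.id
}

# Public route table: default route to the internet gateway.
resource "aws_route_table" "public" {
  vpc_id = aws_vpc.shared.id

  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = aws_internet_gateway.shared.id
  }
}

resource "aws_route_table_association" "public" {
  count          = 2
  subnet_id      = aws_subnet.public[count.index].id
  route_table_id = aws_route_table.public.id
}
```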

RDS Configuration (shared/rds.tf):

  • Instance class sizing (start with db.t3.micro)
  • Multi-AZ deployment for automatic failover
  • Backup retention (7-30 days)
  • Engine version (PostgreSQL 16+)
  • Subnet groups referencing VPC private subnets
  • Security groups limiting access to ECS only
  • Lifecycle prevent_destroy protection
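
The RDS settings listed above could look roughly like this in shared/rds.tf. The identifiers, the admin username, the 7-day retention, and all input variables are assumptions made so the fragment stands alone. In the real shared layer the subnet and security group IDs would more likely come from sibling resources than from variables.

```hcl
# Hypothetical inputs so this fragment is self-contained.
variable "vpc_id" { type = string }
variable "private_subnet_ids" { type = list(string) }
variable "ecs_security_group_id" { type = string }
variable "db_password" {
  type      = string
  sensitive = true
}

resource "aws_db_subnet_group" "shared" {
  name       = "shared-db"
  subnet_ids = var.private_subnet_ids
}

# Only the ECS security group may reach PostgreSQL on 5432.
resource "aws_security_group" "rds" {
  name   = "shared-rds"
  vpc_id = var.vpc_id

  ingress {
    from_port       = 5432
    to_port         = 5432
    protocol        = "tcp"
    security_groups = [var.ecs_security_group_id]
  }
}

resource "aws_db_instance" "shared" {
  identifier                = "shared-postgres"
  engine                    = "postgres"
  engine_version            = "16"
  instance_class            = "db.t3.micro"
  allocated_storage         = 20
  multi_az                  = true
  backup_retention_period   = 7
  db_subnet_group_name      = aws_db_subnet_group.shared.name
  vpc_security_group_ids    = [aws_security_group.rds.id]
  username                  = "app_admin"
  password                  = var.db_password
  final_snapshot_identifier = "shared-postgres-final"

  # Makes Terraform refuse any plan that would destroy the database.
  lifecycle {
    prevent_destroy = true
  }
}
```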

ECR Configuration (shared/ecr.tf):

  • Repository naming convention
  • Image tag mutability settings
  • Lifecycle policies for cleanup (untagged images, old releases)
  • Encryption configuration (AES256)
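
As an illustration of the ECR settings above, the following sketch creates one repository with a two-rule lifecycle policy. The repository name, the choice of immutable tags, the "dev-" tag prefix, and the retention numbers are assumptions, not values from the article.

```hcl
resource "aws_ecr_repository" "app" {
  name                 = "app"
  image_tag_mutability = "IMMUTABLE" # assumed; "MUTABLE" is the other option

  encryption_configuration {
    encryption_type = "AES256"
  }
}

resource "aws_ecr_lifecycle_policy" "app" {
  repository = aws_ecr_repository.app.name

  policy = jsonencode({
    rules = [
      {
        rulePriority = 1
        description  = "Expire untagged images after 7 days"
        selection = {
          tagStatus   = "untagged"
          countType   = "sinceImagePushed"
          countUnit   = "days"
          countNumber = 7
        }
        action = { type = "expire" }
      },
      {
        rulePriority = 2
        description  = "Keep only the 10 most recent dev images"
        selection = {
          tagStatus     = "tagged"
          tagPrefixList = ["dev-"]
          countType     = "imageCountMoreThan"
          countNumber   = 10
        }
        action = { type = "expire" }
      }
    ]
  })
}
```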

Environment-Specific Fields

ECS Task Definition (ecs.tf):

  • Family naming with environment suffix
  • CPU and memory allocation (varies by environment)
  • Container definitions via templatefile
  • Network mode (awsvpc for modern configurations)
  • IAM execution and task roles
  • Environment variables and secrets from SSM
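
A sketch of a task definition along these lines is shown below. The template path, the template variable names, and every input variable are hypothetical. The template file itself is not shown and would need to render a JSON array of container definitions, including a `secrets` entry that points at the SSM parameter.

```hcl
# Hypothetical inputs, typically set per environment.
variable "environment" { type = string }
variable "task_cpu" { type = number }
variable "task_memory" { type = number }
variable "image" { type = string }
variable "execution_role_arn" { type = string }
variable "task_role_arn" { type = string }
variable "db_url_param_arn" { type = string }

resource "aws_ecs_task_definition" "app" {
  family                   = "app-${var.environment}"
  network_mode             = "awsvpc"
  requires_compatibilities = ["EC2"]
  cpu                      = var.task_cpu
  memory                   = var.task_memory
  execution_role_arn       = var.execution_role_arn
  task_role_arn            = var.task_role_arn

  # templates/app.json.tpl is an assumed file name; the values below are
  # substituted into it when the plan is computed.
  container_definitions = templatefile("${path.module}/templates/app.json.tpl", {
    environment      = var.environment
    image            = var.image
    db_url_param_arn = var.db_url_param_arn
  })
}
```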

ECS Service Configuration:

  • Desired task count (scales with environment)
  • Launch type (EC2 vs Fargate)
  • Network configuration referencing shared VPC
  • Load balancer attachment to target groups
  • Auto-scaling policies based on CPU/memory
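
The service settings above might translate into something like the following. The container name and port, the 70% CPU target, the scaling ceiling of three times the base count, and all input variables are assumptions for illustration.

```hcl
# Hypothetical inputs; in the article's layout the network values would come
# from the shared layer's outputs.
variable "environment" { type = string }
variable "cluster_name" { type = string }
variable "cluster_arn" { type = string }
variable "task_definition_arn" { type = string }
variable "desired_count" { type = number }
variable "private_subnet_ids" { type = list(string) }
variable "ecs_security_group_id" { type = string }
variable "target_group_arn" { type = string }

resource "aws_ecs_service" "app" {
  name            = "app-${var.environment}"
  cluster         = var.cluster_arn
  task_definition = var.task_definition_arn
  desired_count   = var.desired_count
  launch_type     = "EC2"

  network_configuration {
    subnets         = var.private_subnet_ids
    security_groups = [var.ecs_security_group_id]
  }

  load_balancer {
    target_group_arn = var.target_group_arn
    container_name   = "app"
    container_port   = 8080
  }

  # Let auto-scaling own the task count after the first apply.
  lifecycle {
    ignore_changes = [desired_count]
  }
}

resource "aws_appautoscaling_target" "app" {
  service_namespace  = "ecs"
  resource_id        = "service/${var.cluster_name}/${aws_ecs_service.app.name}"
  scalable_dimension = "ecs:service:DesiredCount"
  min_capacity       = var.desired_count
  max_capacity       = var.desired_count * 3
}

# Target tracking: add or remove tasks to hold average CPU near 70%.
resource "aws_appautoscaling_policy" "cpu" {
  name               = "app-${var.environment}-cpu"
  policy_type        = "TargetTrackingScaling"
  service_namespace  = aws_appautoscaling_target.app.service_namespace
  resource_id        = aws_appautoscaling_target.app.resource_id
  scalable_dimension = aws_appautoscaling_target.app.scalable_dimension

  target_tracking_scaling_policy_configuration {
    target_value = 70

    predefined_metric_specification {
      predefined_metric_type = "ECSServiceAverageCPUUtilization"
    }
  }
}
```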

CloudFront Distribution (cloudfront.tf):

  • Origin configuration pointing to S3 buckets
  • Cache behavior policies
  • SSL certificates from ACM
  • Price class selection (all edges vs regional)
  • Custom error responses for SPA routing
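
One possible shape for a per-environment distribution is sketched below. It assumes a private S3 origin reached through origin access control, the AWS-managed "CachingOptimized" cache policy, and the cheapest price class. The input variables and the origin ID are hypothetical.

```hcl
# Hypothetical inputs for one environment.
variable "environment" { type = string }
variable "domain_name" { type = string }                 # e.g. dev.domain.com
variable "bucket_regional_domain_name" { type = string } # from the S3 bucket
variable "acm_certificate_arn" { type = string }         # must be in us-east-1

data "aws_cloudfront_cache_policy" "optimized" {
  name = "Managed-CachingOptimized"
}

resource "aws_cloudfront_origin_access_control" "assets" {
  name                              = "assets-${var.environment}"
  origin_access_control_origin_type = "s3"
  signing_behavior                  = "always"
  signing_protocol                  = "sigv4"
}

resource "aws_cloudfront_distribution" "assets" {
  enabled             = true
  default_root_object = "index.html"
  aliases             = [var.domain_name]
  price_class         = "PriceClass_100" # regional edges; "PriceClass_All" uses every edge

  origin {
    domain_name              = var.bucket_regional_domain_name
    origin_id                = "s3-assets"
    origin_access_control_id = aws_cloudfront_origin_access_control.assets.id
  }

  default_cache_behavior {
    target_origin_id       = "s3-assets"
    viewer_protocol_policy = "redirect-to-https"
    allowed_methods        = ["GET", "HEAD"]
    cached_methods         = ["GET", "HEAD"]
    cache_policy_id        = data.aws_cloudfront_cache_policy.optimized.id
  }

  # SPA routing: a private S3 origin typically answers 403 for unknown paths,
  # so serve index.html and let the client-side router handle the URL.
  custom_error_response {
    error_code         = 403
    response_code      = 200
    response_page_path = "/index.html"
  }

  restrictions {
    geo_restriction {
      restriction_type = "none"
    }
  }

  viewer_certificate {
    acm_certificate_arn      = var.acm_certificate_arn
    ssl_support_method       = "sni-only"
    minimum_protocol_version = "TLSv1.2_2021"
  }
}
```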

S3 Buckets (s3.tf):

  • Bucket naming with environment suffix
  • CORS configuration for asset uploads
  • Bucket policies restricting access to CloudFront
  • Encryption at rest configuration
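
The bucket settings above can be sketched as follows. The "myapp-assets" name prefix, the CORS methods, and the input variables are assumptions. The bucket policy grants read access only to a single CloudFront distribution.

```hcl
# Hypothetical inputs for one environment.
variable "environment" { type = string }
variable "domain_name" { type = string }
variable "cloudfront_distribution_arn" { type = string }

resource "aws_s3_bucket" "assets" {
  bucket = "myapp-assets-${var.environment}"
}

resource "aws_s3_bucket_server_side_encryption_configuration" "assets" {
  bucket = aws_s3_bucket.assets.id

  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm = "AES256"
    }
  }
}

# Allow browser uploads from the environment's own domain.
resource "aws_s3_bucket_cors_configuration" "assets" {
  bucket = aws_s3_bucket.assets.id

  cors_rule {
    allowed_methods = ["PUT", "POST"]
    allowed_origins = ["https://${var.domain_name}"]
    allowed_headers = ["*"]
    max_age_seconds = 3000
  }
}

# Read access for CloudFront only, scoped to one distribution.
data "aws_iam_policy_document" "cloudfront_read" {
  statement {
    actions   = ["s3:GetObject"]
    resources = ["${aws_s3_bucket.assets.arn}/*"]

    principals {
      type        = "Service"
      identifiers = ["cloudfront.amazonaws.com"]
    }

    condition {
      test     = "StringEquals"
      variable = "AWS:SourceArn"
      values   = [var.cloudfront_distribution_arn]
    }
  }
}

resource "aws_s3_bucket_policy" "assets" {
  bucket = aws_s3_bucket.assets.id
  policy = data.aws_iam_policy_document.cloudfront_read.json
}
```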

State Management Pattern

Remote State Backend (shared/main.tf):

  • S3 bucket for state storage
  • DynamoDB table for state locking
  • Encryption enabled
  • Versioning for rollback capability
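
A backend along these lines would cover the four points above. The bucket name, table name, state key, and region are placeholders. The state bucket and lock table are shown as ordinary resources here, although they are often bootstrapped in a separate step because the backend needs them to exist first.

```hcl
terraform {
  backend "s3" {
    bucket         = "example-terraform-state" # placeholder bucket name
    key            = "shared/terraform.tfstate"
    region         = "us-east-1"               # placeholder region
    dynamodb_table = "terraform-locks"         # placeholder lock table
    encrypt        = true
  }
}

resource "aws_s3_bucket" "state" {
  bucket = "example-terraform-state"
}

# Versioning keeps earlier state files so a bad apply can be rolled back.
resource "aws_s3_bucket_versioning" "state" {
  bucket = aws_s3_bucket.state.id

  versioning_configuration {
    status = "Enabled"
  }
}

# The S3 backend expects a string hash key named LockID.
resource "aws_dynamodb_table" "locks" {
  name         = "terraform-locks"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }
}
```

Recent Terraform releases also support S3-native locking through the backend's `use_lockfile` setting. The DynamoDB table is shown here because it is what the article describes.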

Remote State References (main.tf):

  • terraform_remote_state data sources
  • Consumption of shared layer outputs (VPC ID, subnet IDs, security groups)
  • Dependency management between layers
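
A minimal sketch of the consuming side, assuming the shared layer stores its state under the placeholder bucket and key shown, and exports outputs named `vpc_id`, `private_subnet_ids`, and `ecs_security_group_id` (hypothetical names; the real ones live in shared/outputs.tf):

```hcl
# Read the shared layer's outputs from its remote state.
data "terraform_remote_state" "shared" {
  backend = "s3"

  config = {
    bucket = "example-terraform-state" # placeholder bucket name
    key    = "shared/terraform.tfstate"
    region = "us-east-1"               # placeholder region
  }
}

# Local aliases keep the rest of the configuration readable.
locals {
  vpc_id                = data.terraform_remote_state.shared.outputs.vpc_id
  private_subnet_ids    = data.terraform_remote_state.shared.outputs.private_subnet_ids
  ecs_security_group_id = data.terraform_remote_state.shared.outputs.ecs_security_group_id
}
```

Because the environment layer only reads these outputs, the shared layer has to be applied first. Any change to an output name there has to be coordinated with every environment that consumes it.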

This structure enables independent deployment of shared infrastructure while allowing each environment to scale and configure resources independently through variable files.
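
As an illustration, environments/dev.tfvars might look like the sketch below. The instance types and domains come from the environment structure described earlier. The variable names, task sizes, and task counts are assumptions.

```hcl
# environments/dev.tfvars (hypothetical variable names)
environment   = "dev"
instance_type = "t3.small"       # staging: t3.medium, prod: t3.large
domain_name   = "dev.domain.com" # staging: staging.domain.com, prod: domain.com
desired_count = 1                # assumed; larger in staging and prod
task_cpu      = 256              # assumed task size in CPU units
task_memory   = 512              # assumed task size in MiB
```

An environment is then selected at apply time with `terraform apply -var-file=environments/dev.tfvars`. Since the three environments share one root configuration, each also needs its own state, for example through workspaces or a per-environment backend key. The article does not say which approach it uses.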

Results and Lessons

Key Metrics

Efficiency Gains:

  • Infrastructure deployment: 2 hours → 15 minutes (88% reduction)
  • Resource costs: 40% optimization through strategic sharing
  • Scaling characteristics: Adding environment = +40% cost (not +100%)
  • Environment provisioning: Minutes via terraform apply vs days of manual work
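
One way to read these figures together, assuming a single fully duplicated environment costs C_1 and each additional shared-foundation environment adds 40% of that. The article does not state how the 40% was derived, so this is a consistency check rather than the author's calculation.

```latex
\begin{align*}
C_{\text{dup}}(n)    &= n\,C_1 \\
C_{\text{shared}}(n) &= C_1\bigl[1 + 0.4\,(n-1)\bigr] \\
C_{\text{shared}}(3) &= 1.8\,C_1, \qquad C_{\text{dup}}(3) = 3\,C_1 \\
\text{saving at } n = 3 &= 1 - \frac{1.8}{3} = 0.40 \\
\text{deployment time reduction} &= \frac{120 - 15}{120} = 0.875 \approx 88\%
\end{align*}
```

Under that reading, three environments cost 1.8 times a single environment instead of 3 times, which matches the 40% overall figure.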

Figure: Scaling Cost Comparison (shared foundation vs full duplication per environment)

Operational Impact:

  • Multi-AZ RDS: Automatic failover without manual intervention
  • Infrastructure as documentation: Terraform files define all resources
  • Disaster recovery: Rebuild entire platform from code
  • Team velocity: Developers self-service test environments

What Worked

  • Shared RDS with schemas: Reliable, cost-effective, operationally simple
  • EC2-based ECS: 40% cheaper than Fargate with acceptable complexity
  • Remote state in S3: Enables team collaboration with DynamoDB locking
  • Terraform as documentation: Code always matches reality

Pitfalls to Avoid

  • State file handling: Requires careful management and regular backups
  • RDS backup accumulation: Monitor and clean old snapshots
  • CloudFront invalidations: Use versioned filenames to avoid charges
  • Shared resource changes: Coordinate carefully, because they affect all environments

Conclusion

In this deployment, multi-environment Terraform infrastructure delivered roughly 40% resource optimization and 88% faster deployments while keeping environments isolated where it matters. The pattern (shared VPC/RDS/ECR, environment-specific ECS/CloudFront/S3) balances efficiency with independence.

Implementation Path:

  1. Deploy shared infrastructure (VPC, RDS, ECR)
  2. Configure one environment completely
  3. Replicate pattern to additional environments
  4. Use remote state and state locking from day one

When This Makes Sense:

  • Managing 3+ environments
  • Teams larger than 5 engineers
  • Frequent infrastructure changes
  • Need for reproducible deployments
