The Operational Challenge
Infrastructure as Code with Terraform solves infrastructure provisioning—creating VPCs, databases, container clusters—but production systems require ongoing operational tasks that fall outside Terraform's declarative model. Database seeding, application configuration, container-level operations, and dynamic environment management require imperative automation that responds to runtime conditions rather than declaring desired state.
Operational tasks Terraform cannot handle effectively:
- Database operations: Running migrations, seeding data, managing schemas per environment
- Container management: Discovering running containers, executing commands, streaming logs
- Secrets retrieval: Fetching credentials from Parameter Store at runtime
- Dynamic discovery: Finding infrastructure resources without hardcoded configuration
- Environment-specific workflows: Different operations for development vs production
The integration challenge: How do operational tools discover and interact with dynamically provisioned infrastructure without maintaining static configuration that drifts from reality?
Key Results:
- Automated database seeding across environments (30 minutes → 5 minutes)
- Zero static configuration through dynamic ECS host discovery
- Secrets never stored in code (100% SSM Parameter Store retrieval)
- Environment-agnostic playbooks (single codebase, all environments)
- 95% operational repeatability through codified workflows
Architecture: Terraform + Ansible Integration
Terraform and Ansible serve complementary roles in infrastructure management: Terraform provisions immutable infrastructure declaratively, while Ansible performs mutable operations imperatively. The integration pattern uses resource tagging and AWS APIs as the contract between the two tools.
Terraform provisions infrastructure:
- VPC, subnets, security groups, routing tables
- RDS PostgreSQL with Multi-AZ configuration
- ECS cluster, task definitions, services
- ECR container registry with lifecycle policies
- S3 buckets, CloudFront distributions
- IAM roles, load balancers
- SSM Parameter Store population with secrets
- Consistent resource tagging (app_name, environment)
Ansible performs operations:
- Database schema setup and seeding
- ECS container discovery and command execution
- SSM parameter retrieval for runtime configuration
- Application-level migrations and deployments
- Multi-environment operational workflows
- Maintenance tasks and troubleshooting
- Query ECS API for running tasks filtered by tags
- Resolve task → container instance → EC2 instance mapping
- Build dynamic inventory without static files
Integration Pattern
The Contract:
- Terraform provisions infrastructure with consistent tags
- Terraform stores secrets in SSM Parameter Store following naming pattern
- Ansible discovers resources via AWS APIs using tag filters
- Ansible retrieves secrets from SSM using constructed parameter paths
- Ansible performs operations on discovered resources
- No hardcoded infrastructure details in Ansible code
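Data Flow: Terraform (tags, SSM parameters) → AWS APIs → Ansible (discovery, secrets retrieval) → operations on discovered resources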
Benefits:
- Infrastructure changes automatically reflected in Ansible
- No static inventory files to maintain
- Secrets never in code or version control
- Single playbook works across all environments
- Stale configuration avoided (each run queries current state)
Implementation: Terraform Output → Ansible Input
The implementation establishes integration through Terraform resource tagging and Ansible dynamic discovery playbooks.
Terraform Configuration for Ansible
SSM Parameter Store Setup:
Terraform creates parameters following a predictable naming convention that Ansible can reconstruct:
# terraform/ssm.tf
# Plain-text connection setting, readable without decryption
resource "aws_ssm_parameter" "postgres_host" {
  name  = "/${var.app_name}/${var.env}/POSTGRES_HOST"
  type  = "String"
  value = data.terraform_remote_state.shared.outputs.rds_address
}

# Secret stored encrypted; readers must request decryption
resource "aws_ssm_parameter" "postgres_password" {
  name  = "/${var.app_name}/${var.env}/POSTGRES_PASSWORD"
  type  = "SecureString"
  value = "${var.db_password}_${var.env}"
}
Parameter Naming Pattern:
/${app_name}/${environment}/${parameter_name}
Examples:
/terraform-letscommerce/dev/POSTGRES_HOST
/terraform-letscommerce/prod/POSTGRES_PASSWORD
ECS Service Configuration:
# terraform/ecs.tf
resource "aws_ecs_service" "app" {
  name            = "${var.app_name}-${var.env}-ecs-service"
  cluster         = "${var.app_name}-ecs-cluster"
  task_definition = aws_ecs_task_definition.app.arn
  desired_count   = var.task_count

  tags = {
    Application = var.app_name
    Environment = var.env
  }
}
Ansible Project Structure
ansible/
├── site.yml # Main orchestration playbook
├── get-ecs-hosts.yml # Dynamic ECS host discovery
├── get-ssm-db-params.yml # SSM parameter retrieval
├── get-docker-container.yml # Container discovery on hosts
├── exec-in-container.yml # Generic command execution
├── seed-db-sql.yml # SQL database initialization
├── seed-db-sequelize.yml # ORM-based migrations/seeding
└── connect-test.yml # Connectivity validation
Main Orchestration (site.yml):
---
# 1. Discover infrastructure
- import_playbook: get-ecs-hosts.yml
# 2. Test connectivity
- import_playbook: connect-test.yml
# 3. Retrieve configuration
- import_playbook: get-ssm-db-params.yml
# 4. Container operations
- import_playbook: get-docker-container.yml
- import_playbook: exec-in-container.yml
# 5. Database operations
- import_playbook: seed-db-sql.yml
- import_playbook: seed-db-sequelize.yml
Dynamic Inventory Pattern
ECS Host Discovery Pattern:
1. Query ECS API for running tasks
→ Filter by: cluster name + service name + environment tag
→ Returns: List of task ARNs
2. Get container instance ARNs from tasks
→ Input: Task ARNs
→ Returns: Container instance ARNs
3. Resolve EC2 instance IDs
→ Input: Container instance ARNs
→ Returns: EC2 instance IDs
4. Get EC2 instance IPs
→ Input: EC2 instance IDs
→ Returns: Public/Private IP addresses
5. Add to dynamic inventory
→ Build in-memory inventory
→ No static files required
Key Point: Discovery chain resolves: Service → Tasks → Container Instances → EC2 Instances → IPs
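The five steps can be sketched as a single play that shells out to the AWS CLI. This is a minimal sketch, not the article's actual get-ecs-hosts.yml. It assumes the AWS CLI is installed and configured on the control node, that tasks use the EC2 launch type, and that app_name and env are passed on the command line. The group name ecs_hosts and the choice of private IPs are assumptions made for illustration. Because `aws ecs list-tasks` filters by cluster and service rather than by tag, the environment is selected here through the service name.

```yaml
---
# get-ecs-hosts.yml (sketch): service -> tasks -> container instances -> EC2 IPs
- name: Discover ECS hosts for one environment
  hosts: localhost
  connection: local
  gather_facts: false
  vars:
    # Names mirror the Terraform naming convention for cluster and service
    ecs_cluster: "{{ app_name }}-ecs-cluster"
    ecs_service: "{{ app_name }}-{{ env }}-ecs-service"
  tasks:
    - name: List running task ARNs for the service
      ansible.builtin.command:
        cmd: >-
          aws ecs list-tasks
          --cluster {{ ecs_cluster }}
          --service-name {{ ecs_service }}
          --desired-status RUNNING
          --query taskArns
          --output json
      register: task_list
      changed_when: false

    - name: Stop early when the service has no running tasks
      ansible.builtin.assert:
        that: "(task_list.stdout | from_json) | length > 0"
        fail_msg: "No running tasks found for {{ ecs_service }}"

    - name: Resolve container instance ARNs from the tasks
      ansible.builtin.command:
        cmd: >-
          aws ecs describe-tasks
          --cluster {{ ecs_cluster }}
          --tasks {{ task_list.stdout | from_json | join(' ') }}
          --query tasks[].containerInstanceArn
          --output json
      register: instance_arns
      changed_when: false

    - name: Resolve EC2 instance IDs from the container instances
      ansible.builtin.command:
        cmd: >-
          aws ecs describe-container-instances
          --cluster {{ ecs_cluster }}
          --container-instances {{ instance_arns.stdout | from_json | unique | join(' ') }}
          --query containerInstances[].ec2InstanceId
          --output json
      register: ec2_ids
      changed_when: false

    - name: Look up the private IP of each EC2 instance
      ansible.builtin.command:
        cmd: >-
          aws ec2 describe-instances
          --instance-ids {{ ec2_ids.stdout | from_json | join(' ') }}
          --query Reservations[].Instances[].PrivateIpAddress
          --output json
      register: ec2_ips
      changed_when: false

    - name: Add each instance to the in-memory inventory (hypothetical group name)
      ansible.builtin.add_host:
        name: "{{ item }}"
        groups: ecs_hosts
      loop: "{{ ec2_ips.stdout | from_json }}"
```

Hosts added with add_host stay available to later plays in the same run, so a run such as `ansible-playbook site.yml -e app_name=terraform-letscommerce -e env=dev` can target the group in subsequent playbooks without an inventory file.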
SSM Parameter Retrieval Pattern:
1. Construct parameter path from variables
→ Pattern: /{app_name}/{environment}/{parameter}
→ Example: /terraform-letscommerce/dev/POSTGRES_HOST
2. Query SSM Parameter Store
→ Request decryption for SecureString parameters
→ Returns: Parameter values
3. Build configuration object
→ Assemble all parameters into structured config
→ Available for all subsequent playbooks
Benefits:
- Same code retrieves dev/staging/prod parameters (the env variable selects the path)
- Secrets decrypted at runtime, never stored in code
- Terraform updates parameters → Ansible picks up the new values on its next run
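A retrieval play along these lines would implement the three steps. It is a sketch under assumptions: it uses the AWS CLI rather than a lookup plugin, the list db_param_names and the fact name db_config are hypothetical, and only the two parameters from the Terraform example are listed.

```yaml
---
# get-ssm-db-params.yml (sketch): read database settings from SSM at runtime
- name: Retrieve database parameters from SSM
  hosts: localhost
  connection: local
  gather_facts: false
  vars:
    # Hypothetical list; names match the Terraform example
    db_param_names:
      - POSTGRES_HOST
      - POSTGRES_PASSWORD
  tasks:
    - name: Fetch each parameter, decrypting SecureString values
      ansible.builtin.command:
        argv:
          - aws
          - ssm
          - get-parameter
          - --name
          - "/{{ app_name }}/{{ env }}/{{ item }}"
          - --with-decryption
          - --query
          - Parameter.Value
          - --output
          - text
      loop: "{{ db_param_names }}"
      register: ssm_results
      changed_when: false
      no_log: true  # keep decrypted values out of task output

    - name: Assemble a name-to-value configuration object
      ansible.builtin.set_fact:
        db_config: "{{ dict(db_param_names | zip(ssm_results.results | map(attribute='stdout') | list)) }}"
      no_log: true
```

Because the fact is set on localhost, plays that run against other hosts would read it as `hostvars['localhost'].db_config.POSTGRES_HOST`.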
Operational Workflows
Production operations are codified as repeatable Ansible playbooks that discover infrastructure dynamically and execute environment-specific tasks.
Workflow 1: Database Seeding
Process Flow:
1. Connect to discovered ECS hosts
→ Uses dynamic inventory from discovery phase
2. Find target Docker container
→ Match by container name pattern
→ Store container ID for operations
3. Detect application directory
→ Search common paths (/app, /usr/src/app, /opt/app)
→ Identify by presence of package.json
4. Run database migrations
→ Execute: docker exec {container} sequelize db:migrate
→ Runs in all environments
5. Seed database (conditional)
→ Execute: docker exec {container} sequelize db:seed:all
→ Only in development environment
Environment-Specific Logic:
- Development: Migrations + full data seeding
- Staging: Migrations + minimal test data
- Production: Migrations only (no seeding)
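The process flow might look like the following play. It is a sketch, not the article's seed-db-sequelize.yml. It assumes a group named ecs_hosts built during discovery (a hypothetical name), a target_container fact already set on each host, a sequelize binary on the container's PATH, a `test` utility inside the image, and a Docker version that supports `docker exec -w`. Staging's minimal test data is left out because the source does not say how it is selected.

```yaml
---
# seed-db-sequelize.yml (sketch): migrate everywhere, seed only in dev
- name: Run Sequelize migrations and conditional seeding
  hosts: ecs_hosts  # hypothetical group built by the discovery play
  become: true
  gather_facts: false
  vars:
    app_dir_candidates: [/app, /usr/src/app, /opt/app]
  tasks:
    - name: Probe candidate directories for package.json inside the container
      ansible.builtin.command:
        argv: [docker, exec, "{{ target_container }}", test, "-f", "{{ item }}/package.json"]
      loop: "{{ app_dir_candidates }}"
      register: dir_probe
      changed_when: false
      failed_when: false  # a missing file is expected for most candidates

    - name: Keep the first directory that contains package.json
      # Fails loudly if no candidate matched, which is the desired behavior here
      ansible.builtin.set_fact:
        app_dir: "{{ dir_probe.results | selectattr('rc', 'equalto', 0) | map(attribute='item') | first }}"

    - name: Run migrations (all environments)
      ansible.builtin.command:
        argv: [docker, exec, "-w", "{{ app_dir }}", "{{ target_container }}", sequelize, "db:migrate"]

    - name: Seed data (development only)
      ansible.builtin.command:
        argv: [docker, exec, "-w", "{{ app_dir }}", "{{ target_container }}", sequelize, "db:seed:all"]
      when: env == 'dev'
```

Keeping the environment check on the seeding task alone means the same playbook runs unchanged in production, where that task is simply skipped.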
Workflow 2: Container Operations
Pattern:
1. Container Discovery
→ Query Docker daemon on ECS hosts
→ Filter by container name pattern (ecs-*-server)
→ Store container ID for operations
2. Command Execution
→ Execute: docker exec {container_id} {command}
→ Capture and return output
→ Log results for audit
3. Common Operations
→ Clear application cache
→ Rebuild search indexes
→ Generate reports
→ Health checks
→ Debug production issues
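A generic execution playbook could be as small as the sketch below. The variable container_command is hypothetical and is supplied at run time. The group name ecs_hosts and the target_container fact are assumed to come from earlier discovery plays.

```yaml
---
# exec-in-container.yml (sketch): run one ad hoc command in the discovered container
- name: Execute a command inside the application container
  hosts: ecs_hosts  # hypothetical group built by the discovery play
  become: true
  gather_facts: false
  tasks:
    - name: Run the requested command
      ansible.builtin.command:
        cmd: "docker exec {{ target_container }} {{ container_command }}"
      register: exec_result
      when: container_command is defined  # skip cleanly when nothing was requested

    - name: Show captured output for the run log
      ansible.builtin.debug:
        var: exec_result.stdout_lines
      when: container_command is defined
```

For example, `ansible-playbook site.yml -e app_name=terraform-letscommerce -e env=dev -e "container_command='node --version'"` would print the Node.js version from each matching container. When the variable is omitted, both tasks are skipped, so the playbook can stay in the main orchestration file.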
Workflow 3: Complete Environment Setup
Execution Flow:
1. Infrastructure Discovery
→ Find ECS hosts running tasks for environment
→ Build dynamic inventory
2. Connectivity Validation
→ Verify SSH access to discovered hosts
→ Test network connectivity
3. Secrets Retrieval
→ Fetch database credentials from SSM
→ Decrypt secure parameters
4. Container Identification
→ Discover application containers on hosts
→ Store container IDs
5. Database Operations
→ Run migrations (all environments)
→ Seed data (development only)
6. Validation
→ Confirm all operations successful
→ Generate audit log
Time Comparison:
- Manual: 30-45 minutes (error-prone, not repeatable)
- Automated: 5 minutes (consistent, repeatable, auditable)
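The connectivity validation step in the flow above can be sketched as a short play that fails before any database work starts. The group name ecs_hosts is an assumption, and the Docker check is an addition that the source does not describe.

```yaml
---
# connect-test.yml (sketch): verify discovered hosts are usable before doing work
- name: Validate connectivity to discovered ECS hosts
  hosts: ecs_hosts  # hypothetical group built by the discovery play
  gather_facts: false
  tasks:
    - name: Wait until an SSH connection can be established
      ansible.builtin.wait_for_connection:
        timeout: 60

    - name: Confirm Ansible can execute modules on the host
      ansible.builtin.ping:

    - name: Confirm the Docker daemon responds (assumed extra check)
      become: true
      ansible.builtin.command:
        argv: [docker, info, "--format", "{{ '{{' }}.ServerVersion{{ '}}' }}"]
      changed_when: false
```

A failure here stops the run on the affected host before later plays touch the database.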
Dynamic Discovery Patterns
Dynamic discovery eliminates static configuration by querying AWS APIs to determine current infrastructure state at runtime.
ECS to EC2 Resolution
Discovery Chain:
ECS Service Name
→ aws ecs list-tasks → Task ARNs
→ aws ecs describe-tasks → Container Instance ARNs
→ aws ecs describe-container-instances → EC2 Instance IDs
→ aws ec2 describe-instances → EC2 IP Addresses
→ ansible add_host → Ansible Inventory
Why This Matters:
- ECS tasks can move between instances
- Instances can be replaced by auto-scaling
- IP addresses change with instance recreation
- Static inventory becomes stale immediately
Dynamic Discovery Ensures:
- Always connects to currently running instances
- Adapts to scaling events automatically
- Survives instance replacements
- No manual inventory updates needed
Container Discovery
Challenge: Multiple containers run on each ECS instance. Which container hosts the application?
Discovery Pattern:
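- name: Find application container
  become: true
  shell: >
    docker ps
    --filter "label=com.amazonaws.ecs.task-definition-family={{ app_name }}-{{ env }}"
    --format "{{ '{{' }}.ID{{ '}}' }}"
  register: container_id
  changed_when: false
# First match selection: keep one ID for all later tasks
- name: Store the first matching container ID
  set_fact:
    target_container: "{{ container_id.stdout_lines | first }}"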
Discovery Criteria:
- ECS task definition family labels
- Container name patterns
- Running state filter
- First match selection
Result: target_container variable set for all subsequent operations
SSM Parameter Discovery
Parameter Path Construction:
# Variables
app_name: terraform-letscommerce
env: dev
parameter: POSTGRES_HOST
# Constructed path
parameter_path: '/{{ app_name }}/{{ env }}/{{ parameter }}'
# Result: /terraform-letscommerce/dev/POSTGRES_HOST
Advantages:
- No hardcoded parameter names
- Environment-specific values automatic
- Terraform and Ansible use same naming convention
- Adding new parameters requires no Ansible changes
Security:
- SecureString parameters encrypted at rest
- Decryption with the --with-decryption flag
- IAM policies control access per environment
- Secrets never appear in code or logs
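The claim that new parameters need no Ansible changes holds most directly when parameters are fetched by path prefix instead of by name. The sketch below shows one way to do that with the AWS CLI. The fact name app_params is hypothetical, and the approach needs the ssm:GetParametersByPath permission in addition to ssm:GetParameter.

```yaml
---
# Sketch: load every parameter under /app_name/env without listing names
- name: Load all parameters below the environment prefix
  hosts: localhost
  connection: local
  gather_facts: false
  tasks:
    - name: Fetch the parameter tree, decrypting SecureString values
      ansible.builtin.command:
        argv:
          - aws
          - ssm
          - get-parameters-by-path
          - --path
          - "/{{ app_name }}/{{ env }}"
          - --with-decryption
          - --query
          - "Parameters[].{key: Name, value: Value}"
          - --output
          - json
      register: ssm_tree
      changed_when: false
      no_log: true  # output contains decrypted secrets

    - name: Index values by the last path segment (for example POSTGRES_HOST)
      ansible.builtin.set_fact:
        app_params: "{{ dict(param_names | zip(param_values)) }}"
      vars:
        params: "{{ ssm_tree.stdout | from_json }}"
        param_names: "{{ params | map(attribute='key') | map('basename') | list }}"
        param_values: "{{ params | map(attribute='value') | list }}"
      no_log: true
```

With this shape, a parameter that Terraform adds under the same prefix appears in app_params on the next run. The no_log settings matter for the "never in logs" point, since decrypted values would otherwise show up in task output.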
Results and Lessons
Key Metrics
Operational Efficiency:
- Database seeding time: 30 minutes → 5 minutes (83% reduction)
- Environment setup: 2 hours → 10 minutes (full operational readiness)
- Configuration drift: 0% (dynamic discovery prevents stale configuration)
- Operational repeatability: 95% (codified workflows, consistent execution)
- Manual intervention: Reduced from 10+ steps to single command
Operational Impact:
- New environment provisioning: Infrastructure (Terraform) + operations (Ansible) fully automated
- Disaster recovery: Complete environment recreation from code
- Team onboarding: Operations documented as executable code
- Debugging: Standardized workflows for container inspection and troubleshooting
What Worked
✅ Dynamic inventory pattern: Zero static configuration, automatic adaptation to infrastructure changes
✅ Tag-based discovery: Terraform tags become Ansible query filters—simple, reliable contract
✅ SSM parameter integration: Secrets never in code, automatic environment-specific retrieval
✅ Idempotent operations: Database seeding safe to re-run, migrations track state
✅ Single playbook pattern: One codebase for all environments, environment parameter selects behavior
✅ Terraform + Ansible separation: Clear responsibilities prevent tool overlap and confusion
Pitfalls to Avoid
❌ Hardcoding infrastructure details: IPs, hostnames, container IDs in playbooks guarantee stale configuration
❌ Static inventory files: Become outdated as soon as ECS scales or instances are replaced
❌ Inconsistent tagging: Terraform resources without standard tags break discovery queries
❌ Missing IAM permissions: Ansible needs ecs:ListTasks, ecs:Describe*, ec2:Describe*, and ssm:GetParameter, plus kms:Decrypt when SecureString parameters use a customer managed KMS key
❌ Non-idempotent operations: Database operations that fail when re-run prevent automation
❌ Ignoring error handling: Production operations need comprehensive failure detection and reporting
Conclusion
Operational automation with Ansible complements Terraform infrastructure provisioning by handling imperative tasks that require runtime state awareness. The integration pattern—Terraform provisions with tags, Ansible discovers dynamically—eliminates static configuration and prevents drift.
Implementation Path:
- Establish Terraform tagging strategy (app_name, environment on all resources)
- Create SSM parameters with predictable naming convention
- Implement ECS host discovery playbook
- Build SSM parameter retrieval for secrets
- Develop operational workflow playbooks (database, containers)
- Test in development, validate in staging, deploy to production
When This Makes Sense:
- Managing operational tasks beyond infrastructure provisioning
- Multi-environment AWS deployments requiring consistent operations
- Teams codifying operational knowledge for repeatability
- Need for automated database seeding, migrations, container operations
- Organizations eliminating manual operational procedures
Resources: