Operational Automation with Ansible on Terraform-Managed AWS Infrastructure

The Operational Challenge

Infrastructure as Code with Terraform solves infrastructure provisioning—creating VPCs, databases, container clusters—but production systems require ongoing operational tasks that fall outside Terraform's declarative model. Database seeding, application configuration, container-level operations, and dynamic environment management require imperative automation that responds to runtime conditions rather than declaring desired state.

Operational tasks Terraform cannot handle effectively:

  • Database operations: Running migrations, seeding data, managing schemas per environment
  • Container management: Discovering running containers, executing commands, streaming logs
  • Secrets retrieval: Fetching credentials from Parameter Store at runtime
  • Dynamic discovery: Finding infrastructure resources without hardcoded configuration
  • Environment-specific workflows: Different operations for development vs production

The integration challenge: How do operational tools discover and interact with dynamically provisioned infrastructure without maintaining static configuration that drifts from reality?

Key Results:

  • Automated database seeding across environments (30 minutes → 5 minutes)
  • Zero static configuration through dynamic ECS host discovery
  • Secrets never stored in code (100% SSM Parameter Store retrieval)
  • Environment-agnostic playbooks (one codebase for all environments)
  • 95% operational repeatability through codified workflows

Architecture: Terraform + Ansible Integration

Terraform and Ansible serve complementary roles in infrastructure management: Terraform provisions immutable infrastructure declaratively, Ansible performs mutable operations imperatively. The integration pattern uses resource tagging and AWS APIs as the contract between tools.

[Diagram: Terraform provisions the cloud infrastructure (RDS, ECS, SSM, tags); Ansible reaches it through the ECS, SSM, and EC2 APIs for discovery, secrets, database seeding, and container operations.]

Terraform provisions infrastructure:

  • VPC, subnets, security groups, routing tables
  • RDS PostgreSQL with Multi-AZ configuration
  • ECS cluster, task definitions, services
  • ECR container registry with lifecycle policies
  • S3 buckets, CloudFront distributions
  • IAM roles, security groups, load balancers
  • SSM Parameter Store population with secrets
  • Consistent resource tagging (Application = app_name, Environment = env)

Ansible performs operations:

  • Database schema setup and seeding
  • ECS container discovery and command execution
  • SSM parameter retrieval for runtime configuration
  • Application-level migrations and deployments
  • Multi-environment operational workflows
  • Maintenance tasks and troubleshooting
  • ECS task discovery through the ECS API (by cluster and service name)
  • Task → container instance → EC2 instance resolution
  • Dynamic in-memory inventory (no static files)

Integration Pattern

The Contract:

  1. Terraform provisions infrastructure with consistent tags
  2. Terraform stores secrets in SSM Parameter Store following naming pattern
  3. Ansible discovers resources via AWS APIs using tag filters
  4. Ansible retrieves secrets from SSM using constructed parameter paths
  5. Ansible performs operations on discovered resources
  6. No hardcoded infrastructure details in Ansible code

Data Flow:

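  1. Terraform → AWS Infrastructure: provision ECS cluster with tags
  2. Terraform → Parameter Store: store database credentials
  3. Ansible → AWS Infrastructure: query ECS API (by cluster and service name)
  4. AWS Infrastructure → Ansible: return running tasks + instances
  5. Ansible → AWS Infrastructure: describe EC2 instances
  6. AWS Infrastructure → Ansible: return instance IPs
  7. Ansible → Parameter Store: get parameters (constructed path)
  8. Parameter Store → Ansible: return decrypted secrets
  9. Ansible → AWS Infrastructure: connect to instances over SSH/SSM
  10. Ansible → AWS Infrastructure: execute operations in containers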

Benefits:

  • Infrastructure changes automatically reflected in Ansible
  • No static inventory files to maintain
  • Secrets never in code or version control
  • Single playbook works across all environments
  • No inventory drift (every run queries current state)

Implementation: Terraform Output → Ansible Input

The implementation establishes integration through Terraform resource tagging and Ansible dynamic discovery playbooks.

Terraform Configuration for Ansible

SSM Parameter Store Setup:

Terraform creates parameters following a predictable naming convention that Ansible can reconstruct:

# terraform/ssm.tf
resource "aws_ssm_parameter" "postgres_host" {
  name  = "/${var.app_name}/${var.env}/POSTGRES_HOST"
  type  = "String"
  value = data.terraform_remote_state.shared.outputs.rds_address
}

resource "aws_ssm_parameter" "postgres_password" {
  name  = "/${var.app_name}/${var.env}/POSTGRES_PASSWORD"
  type  = "SecureString"
  value = "${var.db_password}_${var.env}"
}

Parameter Naming Pattern:

/${app_name}/${environment}/${parameter_name}

Examples:

  • /terraform-letscommerce/dev/POSTGRES_HOST
  • /terraform-letscommerce/prod/POSTGRES_PASSWORD

ECS Service Configuration:

# terraform/ecs.tf
resource "aws_ecs_service" "app" {
  name            = "${var.app_name}-${var.env}-ecs-service"
  cluster         = "${var.app_name}-ecs-cluster"
  task_definition = aws_ecs_task_definition.app.arn
  desired_count   = var.task_count

  # Copy the service tags onto each task so describe-tasks can return them too
  propagate_tags = "SERVICE"

  tags = {
    Application = var.app_name
    Environment = var.env
  }
}

Ansible Project Structure

ansible/
├── site.yml                       # Main orchestration playbook
├── get-ecs-hosts.yml             # Dynamic ECS host discovery
├── get-ssm-db-params.yml         # SSM parameter retrieval
├── get-docker-container.yml      # Container discovery on hosts
├── exec-in-container.yml         # Generic command execution
├── seed-db-sql.yml               # SQL database initialization
├── seed-db-sequelize.yml         # ORM-based migrations/seeding
└── connect-test.yml              # Connectivity validation

Main Orchestration (site.yml):

---
# 1. Discover infrastructure
- import_playbook: get-ecs-hosts.yml

# 2. Test connectivity
- import_playbook: connect-test.yml

# 3. Retrieve configuration
- import_playbook: get-ssm-db-params.yml

# 4. Container operations
- import_playbook: get-docker-container.yml
- import_playbook: exec-in-container.yml

# 5. Database operations
- import_playbook: seed-db-sql.yml
- import_playbook: seed-db-sequelize.yml

Dynamic Inventory Pattern

ECS Host Discovery Pattern:

1. Query ECS API for running tasks
   → Filter by: cluster name + service name (both derived from app_name and env; ListTasks cannot filter by tags)
   → Returns: List of task ARNs

2. Get container instance ARNs from tasks
   → Input: Task ARNs
   → Returns: Container instance ARNs

3. Resolve EC2 instance IDs
   → Input: Container instance ARNs
   → Returns: EC2 instance IDs

4. Get EC2 instance IPs
   → Input: EC2 instance IDs
   → Returns: Public/Private IP addresses

5. Add to dynamic inventory
   → Build in-memory inventory
   → No static files required

Key Point: Discovery chain resolves: Service → Tasks → Container Instances → EC2 Instances → IPs
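A minimal Python sketch of the same chain, assuming boto3-style ECS and EC2 clients (boto3 is the library Ansible's AWS modules build on). The function name and the private-IP default are assumptions; this is not the contents of get-ecs-hosts.yml, which are not shown. Because ListTasks filters by cluster and service name rather than tags, the sketch relies on the names Terraform derives from app_name and env.

```python
from typing import Any, List


def discover_ecs_host_ips(ecs: Any, ec2: Any, cluster: str, service: str,
                          use_private_ip: bool = True) -> List[str]:
    """Resolve ECS service -> tasks -> container instances -> EC2 -> IPs.

    `ecs` and `ec2` are boto3 clients, or any objects exposing the same
    methods. Pagination is omitted: assumes <= 100 running tasks.
    """
    # 1. Running tasks for the service (filtered by cluster + service name).
    task_arns = ecs.list_tasks(cluster=cluster, serviceName=service,
                               desiredStatus="RUNNING")["taskArns"]
    if not task_arns:
        return []

    # 2. Tasks -> container instance ARNs. Fargate tasks have none, so skip them.
    tasks = ecs.describe_tasks(cluster=cluster, tasks=task_arns)["tasks"]
    ci_arns = sorted({t["containerInstanceArn"] for t in tasks
                      if "containerInstanceArn" in t})
    if not ci_arns:
        return []

    # 3. Container instances -> EC2 instance IDs.
    instances = ecs.describe_container_instances(
        cluster=cluster, containerInstances=ci_arns)["containerInstances"]
    instance_ids = sorted({ci["ec2InstanceId"] for ci in instances})

    # 4. EC2 instance IDs -> IP addresses (private by default, for in-VPC access).
    key = "PrivateIpAddress" if use_private_ip else "PublicIpAddress"
    reservations = ec2.describe_instances(InstanceIds=instance_ids)["Reservations"]
    ips = {i[key] for r in reservations for i in r["Instances"] if key in i}
    return sorted(ips)
```

With real clients the call would look like discover_ecs_host_ips(boto3.client("ecs"), boto3.client("ec2"), "terraform-letscommerce-ecs-cluster", "terraform-letscommerce-dev-ecs-service"), following the naming in ecs.tf. Passing the clients in, rather than creating them inside, keeps the function testable with stand-in objects.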

SSM Parameter Retrieval Pattern:

1. Construct parameter path from variables
   → Pattern: /{app_name}/{environment}/{parameter}
   → Example: /terraform-letscommerce/dev/POSTGRES_HOST

2. Query SSM Parameter Store
   → Request decryption for SecureString parameters
   → Returns: Parameter values

3. Build configuration object
   → Assemble all parameters into structured config
   → Available for all subsequent playbooks
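The three steps can be sketched in Python against a boto3 SSM client, assuming the /{app_name}/{env}/{name} convention from ssm.tf. parameter_path and fetch_db_config are hypothetical helpers, not the contents of get-ssm-db-params.yml.

```python
from typing import Any, Dict, Iterable


def parameter_path(app_name: str, env: str, name: str) -> str:
    """Step 1: build /{app_name}/{env}/{name}, the path ssm.tf writes."""
    return f"/{app_name}/{env}/{name}"


def fetch_db_config(ssm: Any, app_name: str, env: str,
                    names: Iterable[str]) -> Dict[str, str]:
    """Steps 2-3: fetch and decrypt parameters, keyed by short name.

    `ssm` is a boto3 SSM client (or a stand-in). GetParameters accepts
    at most 10 names per call.
    """
    paths = {parameter_path(app_name, env, n): n for n in names}
    resp = ssm.get_parameters(Names=list(paths), WithDecryption=True)
    # GetParameters lists unknown names instead of raising, so fail loudly here.
    missing = resp.get("InvalidParameters", [])
    if missing:
        raise KeyError(f"missing SSM parameters: {missing}")
    return {paths[p["Name"]]: p["Value"] for p in resp["Parameters"]}
```

For example, fetch_db_config(boto3.client("ssm"), "terraform-letscommerce", "dev", ["POSTGRES_HOST", "POSTGRES_PASSWORD"]) would return a dict keyed by those two names. Switching the env argument to "prod" is the only change needed for another environment.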

Benefits:

  • Same code retrieves dev/staging/prod parameters (the env variable selects which)
  • Secrets decrypted at runtime, never stored in code
  • Terraform updates parameters → Ansible picks up the new values on its next run

Operational Workflows

Production operations are codified as repeatable Ansible playbooks that discover infrastructure dynamically and execute environment-specific tasks.

Workflow 1: Database Seeding

Process Flow:

1. Connect to discovered ECS hosts
   → Uses dynamic inventory from discovery phase

2. Find target Docker container
   → Match by container name pattern
   → Store container ID for operations

3. Detect application directory
   → Search common paths (/app, /usr/src/app, /opt/app)
   → Identify by presence of package.json

4. Run database migrations
   → Execute: docker exec -w {app_dir} {container} sequelize db:migrate
   → Runs in all environments

5. Seed database (conditional)
   → Execute: docker exec -w {app_dir} {container} sequelize db:seed:all
   → Full seed in development only; staging loads minimal test data, production skips seeding

Environment-Specific Logic:

  • Development: Migrations + full data seeding
  • Staging: Migrations + minimal test data
  • Production: Migrations only (no seeding)
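Steps 3 to 5 and the environment rules reduce to a small planning function. A hedged Python sketch: detect_app_dir takes an injected check (in practice something like docker exec <id> test -f <dir>/package.json), and db_commands returns docker exec argument lists. The env names dev/staging/prod, a sequelize CLI on the container's PATH, and the staging seeder name minimal-test-data are all assumptions.

```python
from typing import Callable, Iterable, List, Optional

# Search order from the workflow description above.
APP_DIR_CANDIDATES = ("/app", "/usr/src/app", "/opt/app")


def detect_app_dir(has_package_json: Callable[[str], bool],
                   candidates: Iterable[str] = APP_DIR_CANDIDATES) -> Optional[str]:
    """Step 3: return the first candidate directory containing package.json."""
    for path in candidates:
        if has_package_json(path):
            return path
    return None


def db_commands(env: str, container_id: str, app_dir: str) -> List[List[str]]:
    """Steps 4-5: docker exec argv lists for migrations plus seeding per env."""
    base = ["docker", "exec", "-w", app_dir, container_id, "sequelize"]
    commands = [base + ["db:migrate"]]           # migrations run everywhere
    if env == "dev":
        commands.append(base + ["db:seed:all"])  # full seed data
    elif env == "staging":
        # Hypothetical seeder file holding the minimal test data set.
        commands.append(base + ["db:seed", "--seed", "minimal-test-data"])
    # prod, and any unrecognized env, get migrations only.
    return commands
```

Falling back to migrations only for an unrecognized environment name is the conservative choice: a typo in env can never seed test data into production.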

Workflow 2: Container Operations

Pattern:

1. Container Discovery
   → Query Docker daemon on ECS hosts
   → Filter by container name pattern (ecs-*-server)
   → Store container ID for operations

2. Command Execution
   → Execute: docker exec {container_id} {command}
   → Capture and return output
   → Log results for audit

3. Common Operations
   → Clear application cache
   → Rebuild search indexes
   → Generate reports
   → Health checks
   → Debug production issues
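A minimal sketch of the execute, capture, and log pattern in Python. exec_in_container is a hypothetical helper; it shells out to docker exec through an injectable runner so it can be exercised without Docker.

```python
import logging
import subprocess
from typing import Callable, List, Sequence

log = logging.getLogger("container-ops")


def exec_in_container(container_id: str, command: Sequence[str],
                      run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
                      ) -> str:
    """Run `docker exec <container_id> <command...>` and return its stdout.

    Each call is logged (command and exit code) as a simple audit trail.
    `run` defaults to subprocess.run and can be swapped out in tests.
    """
    argv: List[str] = ["docker", "exec", container_id, *command]
    log.info("exec: %s", " ".join(argv))
    result = run(argv, capture_output=True, text=True, check=False)
    log.info("exit=%d container=%s", result.returncode, container_id)
    if result.returncode != 0:
        raise RuntimeError(f"{' '.join(argv)} failed: {result.stderr.strip()}")
    return result.stdout
```

Because the audit log records the full command line, secrets should not be passed as command arguments; the container should read them from its own environment instead.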

Workflow 3: Complete Environment Setup

Execution Flow:

1. Infrastructure Discovery
   → Find ECS hosts running tasks for environment
   → Build dynamic inventory

2. Connectivity Validation
   → Verify SSH access to discovered hosts
   → Test network connectivity

3. Secrets Retrieval
   → Fetch database credentials from SSM
   → Decrypt secure parameters

4. Container Identification
   → Discover application containers on hosts
   → Store container IDs

5. Database Operations
   → Run migrations (all environments)
   → Seed data (full in development, minimal in staging, never in production)

6. Validation
   → Confirm all operations successful
   → Generate audit log

Time Comparison:

  • Manual: 30-45 minutes (error-prone, not repeatable)
  • Automated: 5 minutes (consistent, repeatable, auditable)

Dynamic Discovery Patterns

Dynamic discovery eliminates static configuration by querying AWS APIs to determine current infrastructure state at runtime.

ECS to EC2 Resolution

Discovery Chain:

   ECS Service Name
     → aws ecs list-tasks → Task ARNs
     → aws ecs describe-tasks → Container Instance ARNs
     → aws ecs describe-container-instances → EC2 Instance IDs
     → aws ec2 describe-instances → EC2 IP Addresses
     → ansible add_host → Ansible Inventory

Why This Matters:

  • ECS tasks can move between instances
  • Instances can be replaced by auto-scaling
  • IP addresses change with instance recreation
  • Static inventory becomes stale immediately

Dynamic Discovery Ensures:

  • Always connects to currently running instances
  • Adapts to scaling events automatically
  • Survives instance replacements
  • No manual inventory updates needed
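The chain above ends in ansible add_host, which builds the inventory inside a running play. A swapped-in alternative with the same no-static-files property is a dynamic inventory script that prints Ansible's JSON inventory format. This sketch assumes a group named ecs_hosts and the ec2-user login of the ECS-optimized Amazon Linux AMI; the IP list is a placeholder for real discovery output.

```python
import json
import sys
from typing import Any, Dict, List


def build_inventory(group: str, ips: List[str], ssh_user: str) -> Dict[str, Any]:
    """Return Ansible's dynamic-inventory JSON structure for discovered hosts."""
    return {
        group: {"hosts": list(ips), "vars": {"ansible_user": ssh_user}},
        # Supplying _meta.hostvars lets Ansible skip one --host call per host.
        "_meta": {"hostvars": {ip: {} for ip in ips}},
    }


if __name__ == "__main__":
    if "--host" in sys.argv:
        print("{}")  # host vars are already provided through _meta
    else:
        # Placeholder: in practice, feed in the ECS -> EC2 discovery result.
        discovered = ["10.0.1.5"]
        print(json.dumps(build_inventory("ecs_hosts", discovered, "ec2-user")))
```

Saved as an executable file (for example a hypothetical ecs_inventory.py), it would be used as ansible-playbook -i ecs_inventory.py site.yml. The add_host approach keeps everything in one playbook run; the script approach makes the discovered inventory reusable by ad-hoc ansible commands.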

Container Discovery

Challenge: Multiple containers run on each ECS instance. Which container hosts the application?

Discovery Pattern:

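- name: Find application container
  become: true
  ansible.builtin.shell: >
    docker ps
    --filter "label=com.amazonaws.ecs.task-definition-family={{ app_name }}-{{ env }}"
    --filter "status=running"
    --format "{{ '{{' }}.ID{{ '}}' }}"
  register: container_id
  changed_when: false   # read-only query, never reports a change

- name: Select first matching container
  ansible.builtin.set_fact:
    target_container: "{{ container_id.stdout_lines | first }}"
  when: container_id.stdout_lines | length > 0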

Discovery Criteria:

  • ECS task definition family labels
  • Container name patterns
  • Running state filter
  • First match selection

Result: target_container variable set for all subsequent operations

SSM Parameter Discovery

Parameter Path Construction:

# Variables
app_name: terraform-letscommerce
env: dev
parameter: POSTGRES_HOST

# Constructed path
parameter_path: '/{{ app_name }}/{{ env }}/{{ parameter }}'
# Result: /terraform-letscommerce/dev/POSTGRES_HOST

Advantages:

  • No hardcoded parameter paths (only the final name segment is fixed)
  • Environment-specific values automatic
  • Terraform and Ansible use the same naming convention
  • Adding new parameters requires no Ansible changes when they are loaded by path prefix rather than by name
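One way to make new parameters appear without touching Ansible is to load a whole environment prefix with SSM's GetParametersByPath instead of naming each parameter. A sketch against a boto3 SSM client; load_env_parameters is a hypothetical helper, and it follows NextToken because the API returns results in pages.

```python
from typing import Any, Dict


def load_env_parameters(ssm: Any, app_name: str, env: str) -> Dict[str, str]:
    """Load every parameter directly under /{app_name}/{env}, keyed by short name.

    `ssm` is a boto3 SSM client (or a stand-in). Results arrive in pages, so
    the loop follows NextToken until it is absent.
    """
    path = f"/{app_name}/{env}"
    kwargs: Dict[str, Any] = {"Path": path, "Recursive": False,
                              "WithDecryption": True}
    params: Dict[str, str] = {}
    while True:
        resp = ssm.get_parameters_by_path(**kwargs)
        for p in resp["Parameters"]:
            # "/app/dev/POSTGRES_HOST" -> "POSTGRES_HOST"
            params[p["Name"][len(path) + 1:]] = p["Value"]
        token = resp.get("NextToken")
        if not token:
            return params
        kwargs["NextToken"] = token
```

Recursive=False matches the flat /{app_name}/{env}/{name} layout; a deeper hierarchy would need Recursive=True and would keep the nested part of each name.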

Security:

  • SecureString parameters encrypted at rest
  • Decryption with --with-decryption flag
  • IAM policies control access per environment
  • Secrets never appear in code; tasks that handle them should set no_log: true to keep values out of Ansible output
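Per-environment access can be expressed as an IAM policy scoped to one parameter prefix. A hedged Python sketch that emits such a policy document, assuming the /{app_name}/{env} naming used throughout; the helper name and statement IDs are illustrative. Both the prefix ARN and prefix/* are listed so that by-path reads and per-name reads are covered.

```python
from typing import Dict, List


def ansible_operator_policy(region: str, account_id: str,
                            app_name: str, env: str) -> Dict:
    """Build an IAM policy document for one environment's Ansible operator.

    Grants read access to /{app_name}/{env} parameters only, plus the
    read-only ECS/EC2 calls used by the discovery chain.
    """
    prefix_arn = f"arn:aws:ssm:{region}:{account_id}:parameter/{app_name}/{env}"
    param_resources: List[str] = [prefix_arn, f"{prefix_arn}/*"]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ReadEnvironmentParameters",
                "Effect": "Allow",
                "Action": ["ssm:GetParameter", "ssm:GetParameters",
                           "ssm:GetParametersByPath"],
                "Resource": param_resources,
            },
            {
                # Discovery calls; read-only and left unscoped for brevity.
                "Sid": "EcsEc2Discovery",
                "Effect": "Allow",
                "Action": ["ecs:ListTasks", "ecs:DescribeTasks",
                           "ecs:DescribeContainerInstances",
                           "ec2:DescribeInstances"],
                "Resource": "*",
            },
        ],
    }
```

A dev operator role built this way cannot read /terraform-letscommerce/prod/POSTGRES_PASSWORD. If SecureString parameters use a customer-managed KMS key, the role also needs kms:Decrypt on that key.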

Results and Lessons

Key Metrics

Operational Efficiency:

  • Database seeding time: 30 minutes → 5 minutes (83% reduction)
  • Environment setup: 2 hours → 10 minutes (full operational readiness)
  • Configuration drift: 0% (dynamic discovery prevents stale configuration)
  • Operational repeatability: 95% (codified workflows, consistent execution)
  • Manual intervention: Reduced from 10+ steps to a single command

Operational Impact:

  • New environment provisioning: Infrastructure (Terraform) + operations (Ansible) fully automated
  • Disaster recovery: Complete environment recreation from code
  • Team onboarding: Operations documented as executable code
  • Debugging: Standardized workflows for container inspection and troubleshooting

What Worked

Dynamic inventory pattern: Zero static configuration, automatic adaptation to infrastructure changes

Tag-based discovery: Terraform tags become Ansible query filters—simple, reliable contract

SSM parameter integration: Secrets never in code, automatic environment-specific retrieval

Idempotent operations: Database seeding safe to re-run, migrations track state

Single playbook pattern: One codebase for all environments, environment parameter selects behavior

Terraform + Ansible separation: Clear responsibilities prevent tool overlap and confusion

Pitfalls to Avoid

Hardcoding infrastructure details: IPs, hostnames, container IDs in playbooks guarantee stale configuration

Static inventory files: Become outdated immediately when ECS scales or instances replace

Inconsistent tagging: Terraform resources without standard tags break discovery queries

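Missing IAM permissions: Ansible needs ecs:ListTasks, ecs:DescribeTasks, ecs:DescribeContainerInstances, ec2:DescribeInstances, and ssm:GetParameter/GetParameters, plus kms:Decrypt when SecureString parameters use a customer-managed KMS key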

Non-idempotent operations: Database operations that fail when re-run prevent automation

Ignoring error handling: Production operations need comprehensive failure detection and reporting

Conclusion

Operational automation with Ansible complements Terraform infrastructure provisioning by handling imperative tasks that require runtime state awareness. The integration pattern—Terraform provisions with tags, Ansible discovers dynamically—eliminates static configuration and prevents drift.

Implementation Path:

  1. Establish Terraform tagging strategy (app_name, environment on all resources)
  2. Create SSM parameters with predictable naming convention
  3. Implement ECS host discovery playbook
  4. Build SSM parameter retrieval for secrets
  5. Develop operational workflow playbooks (database, containers)
  6. Test in development, validate in staging, deploy to production

When This Makes Sense:

  • Managing operational tasks beyond infrastructure provisioning
  • Multi-environment AWS deployments requiring consistent operations
  • Teams codifying operational knowledge for repeatability
  • Need for automated database seeding, migrations, container operations
  • Organizations eliminating manual operational procedures

Resources: