Understanding Resource Dependencies & Data Sources

Welcome to Day 5! Today we’ll explore how Terraform manages relationships between resources and how to query existing infrastructure using data sources. Understanding dependencies is crucial for building complex, reliable infrastructure.

🎯 Today’s Goals

Understand implicit vs explicit dependencies
Master the depends_on meta-argument
Learn about data sources and their uses
Query existing AWS resources
Build infrastructure that references external resources
Understand the resource graph

🔗 Resource Dependencies

When building infrastructure, resources often depend on each other, and Terraform needs to know the order in which to create and destroy them.

Example Dependency Chain

    VPC
     │
     ├─► Subnet
     │     │
     │     └─► EC2 Instance
     │
     └─► Internet Gateway
           │
           └─► Route Table

🤝 Implicit Dependencies

Implicit dependencies are automatically detected when one resource references another’s attributes.

# VPC is created first (no dependencies)
resource "aws_vpc" "main" {
  cidr_block = "10.0.0.0/16"
}

# Subnet depends on VPC (implicit dependency)
resource "aws_subnet" "public" {
  vpc_id     = aws_vpc.main.id  # ← This creates implicit dependency
  cidr_block = "10.0.1.0/24"
}

# Instance depends on Subnet (implicit dependency)
resource "aws_instance" "web" {
  ami           = "ami-0c55b159cbfafe1f0" # placeholder; AMI IDs are region-specific
  instance_type = "t2.micro"
  subnet_id     = aws_subnet.public.id  # ← Implicit dependency
}

Terraform’s creation order:

VPC
Subnet (waits for VPC)
Instance (waits for Subnet)

Destruction order (reverse):

Instance
Subnet
VPC
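A reference does not have to point at `id`, and it does not have to be the whole argument value. Terraform also detects references nested inside expressions. The sketch below assumes the same 10.0.0.0/16 VPC as above. It derives the subnet CIDR from the VPC's `cidr_block` with the built-in `cidrsubnet` function, and Terraform still orders the subnet after the VPC.

```hcl
resource "aws_vpc" "main" {
  cidr_block = "10.0.0.0/16"
}

resource "aws_subnet" "public" {
  vpc_id = aws_vpc.main.id

  # A reference inside a function call is still an implicit dependency.
  # cidrsubnet("10.0.0.0/16", 8, 1) evaluates to "10.0.1.0/24".
  cidr_block = cidrsubnet(aws_vpc.main.cidr_block, 8, 1)
}
```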

📌 Explicit Dependencies (depends_on)

Sometimes a dependency exists that Terraform can’t detect, because neither resource references the other’s attributes. Use the depends_on meta-argument to declare such dependencies explicitly.

When to Use depends_on

# IAM role must exist before instance profile
resource "aws_iam_role" "instance_role" {
  name = "instance-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Action = "sts:AssumeRole"
      Effect = "Allow"
      Principal = {
        Service = "ec2.amazonaws.com"
      }
    }]
  })
}

# IAM policy attachment
resource "aws_iam_role_policy_attachment" "instance_policy" {
  role       = aws_iam_role.instance_role.name
  policy_arn = "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore"
}

# Instance profile: it references only the role, so Terraform sees no link
# to the policy attachment. depends_on ensures the policy is attached before
# the profile (and anything that uses it) is created.
resource "aws_iam_instance_profile" "instance_profile" {
  name = "instance-profile"
  role = aws_iam_role.instance_role.name

  depends_on = [aws_iam_role_policy_attachment.instance_policy]
}
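A common reason to use depends_on is a dependency hidden inside a string. In the hypothetical sketch below, the instance's boot script copies a file from an S3 bucket whose name is typed as plain text, so Terraform has no reference to follow. Several details are assumptions left out for brevity: the bucket name, the `app.conf` object, and the IAM permissions the instance would need to read it.

```hcl
# Hypothetical bucket name, for illustration only
resource "aws_s3_bucket" "config" {
  bucket = "example-app-config-bucket"
}

resource "aws_instance" "app" {
  ami           = "ami-0c55b159cbfafe1f0" # placeholder; AMI IDs are region-specific
  instance_type = "t2.micro"

  # The bucket name is a literal string here, so Terraform sees no reference
  # to aws_s3_bucket.config and could otherwise launch the instance first.
  user_data = <<-EOF
    #!/bin/bash
    aws s3 cp s3://example-app-config-bucket/app.conf /etc/app.conf
  EOF

  depends_on = [aws_s3_bucket.config]
}
```

You can avoid depends_on here by interpolating `${aws_s3_bucket.config.bucket}` into user_data instead of the literal name. That turns the relationship into an implicit dependency, which is usually the better fix.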

Multiple Dependencies

resource "aws_instance" "web" {
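  ami           = "ami-0c55b159cbfafe1f0" # placeholder; AMI IDs are region-specific
  instance_type = "t2.micro"

  # Syntax demo: depends_on takes a list of resource references
  depends_on = [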
    aws_iam_instance_profile.instance_profile,
    aws_security_group.web,
    aws_subnet.public
  ]
}
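depends_on can also point at a whole module (Terraform 0.13 and later). The minimal sketch below assumes a hypothetical local module at `./modules/network`. With this setting, the instance waits for every resource the module manages.

```hcl
# Hypothetical local module; the path is an assumption for illustration
module "network" {
  source = "./modules/network"
}

resource "aws_instance" "web" {
  ami           = "ami-0c55b159cbfafe1f0" # placeholder; AMI IDs are region-specific
  instance_type = "t2.micro"

  # Orders this instance after all resources inside the module
  depends_on = [module.network]
}
```

This ordering is coarse-grained. When the instance needs only one value from the module (say, a subnet ID), referencing that specific module output gives a more precise implicit dependency.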

🔍 Data Sources

Data sources allow Terraform to query existing infrastructure or external information. They don’t create resources; they only read data.

Data Source Syntax

data "provider_resource" "name" {
  # Filter criteria
}

# Reference with: data.provider_resource.name.attribute

Example: Query Existing VPC

# Query existing VPC by tag
data "aws_vpc" "existing" {
  tags = {
    Name = "production-vpc"
  }
}

# Use the VPC ID in a new subnet (the CIDR must fall inside the existing VPC's range)
resource "aws_subnet" "new_subnet" {
  vpc_id     = data.aws_vpc.existing.id
  cidr_block = "10.0.10.0/24"
}
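Data sources also cover external information that isn't a deployed resource. For example, `aws_iam_policy_document` renders IAM policy JSON locally, without looking anything up in your account. The sketch below produces the same EC2 trust policy as the jsonencode block in the depends_on example above. It passes the policy to a role through the data source's `json` attribute.

```hcl
# Renders the EC2 trust policy as JSON; nothing is looked up in AWS
data "aws_iam_policy_document" "ec2_assume_role" {
  statement {
    effect  = "Allow"
    actions = ["sts:AssumeRole"]

    principals {
      type        = "Service"
      identifiers = ["ec2.amazonaws.com"]
    }
  }
}

resource "aws_iam_role" "instance_role" {
  name               = "instance-role"
  assume_role_policy = data.aws_iam_policy_document.ec2_assume_role.json
}
```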

📚 Common AWS Data Sources

1. AWS AMI (Amazon Machine Image)

# Get latest Amazon Linux 2 AMI
data "aws_ami" "amazon_linux" {
  most_recent = true
  owners      = ["amazon"]

  filter {
    name   = "name"
    values = ["amzn2-ami-hvm-*-x86_64-gp2"]
  }

  filter {
    name   = "virtualization-type"
    values = ["hvm"]
  }
}

resource "aws_instance" "web" {
  ami           = data.aws_ami.amazon_linux.id
  instance_type = "t2.micro"
}
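`most_recent = true` has a side effect. When Amazon publishes a newer matching AMI, the next plan will want to replace the instance, because changing `ami` forces a new EC2 instance. If you don't want that, one option is to ignore later AMI changes. This is a sketch, not the only approach; rebuilding instances on new AMIs is sometimes exactly what you want.

```hcl
resource "aws_instance" "web" {
  ami           = data.aws_ami.amazon_linux.id
  instance_type = "t2.micro"

  lifecycle {
    # Keep the AMI chosen at creation time; a newer AMI returned by the
    # data source will not trigger replacement on later applies.
    ignore_changes = [ami]
  }
}
```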

2. AWS Availability Zones

# Get all available AZs in current region
data "aws_availability_zones" "available" {
  state = "available"
}

resource "aws_subnet" "public" {
  count             = 2
  vpc_id            = aws_vpc.main.id
  cidr_block        = "10.0.${count.index}.0/24"
  availability_zone = data.aws_availability_zones.available.names[count.index]
}
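Instead of hardcoding `count = 2`, you can derive the count from the data source itself. The sketch below assumes the `aws_vpc.main` (10.0.0.0/16) from earlier. The cap of two subnets is an arbitrary choice for illustration.

```hcl
locals {
  # At most two AZs, or fewer if the region has fewer available
  az_count = min(2, length(data.aws_availability_zones.available.names))
}

resource "aws_subnet" "public" {
  count  = local.az_count
  vpc_id = aws_vpc.main.id

  # cidrsubnet("10.0.0.0/16", 8, n) yields "10.0.n.0/24"
  cidr_block        = cidrsubnet(aws_vpc.main.cidr_block, 8, count.index)
  availability_zone = data.aws_availability_zones.available.names[count.index]
}
```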

3. AWS Account Information

data "aws_caller_identity" "current" {}

output "account_id" {
  value = data.aws_caller_identity.current.account_id
}

output "caller_arn" {
  value = data.aws_caller_identity.current.arn
}

4. AWS Region

data "aws_region" "current" {}

output "current_region" {
  value = data.aws_region.current.name
}
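The account ID and region are handy for building names that must be globally unique, such as S3 bucket names. In the sketch below, the `app-logs` prefix is a hypothetical example.

```hcl
data "aws_caller_identity" "current" {}
data "aws_region" "current" {}

# S3 bucket names are globally unique; adding the account ID and region
# helps avoid collisions with buckets in other accounts or regions.
resource "aws_s3_bucket" "logs" {
  bucket = "app-logs-${data.aws_caller_identity.current.account_id}-${data.aws_region.current.name}"
}
```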

5. Existing Security Group

data "aws_security_group" "default" {
  name   = "default"
  vpc_id = aws_vpc.main.id
}

6. Existing Subnet

data "aws_subnet" "selected" {
  filter {
    name   = "tag:Name"
    values = ["production-subnet-1"]
  }
}
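Data sources can feed one another. The sketch below assumes the `data.aws_subnet.selected` lookup above and the `data.aws_ami.amazon_linux` lookup from item 1. It finds the default security group of whichever VPC the selected subnet lives in, then launches an instance there. Every argument is already known before any resource is created, so Terraform can read both data sources during planning.

```hcl
# Chained lookup: the subnet's vpc_id selects the VPC's default security group
data "aws_security_group" "subnet_default" {
  name   = "default"
  vpc_id = data.aws_subnet.selected.vpc_id
}

resource "aws_instance" "app" {
  ami                    = data.aws_ami.amazon_linux.id
  instance_type          = "t2.micro"
  subnet_id              = data.aws_subnet.selected.id
  vpc_security_group_ids = [data.aws_security_group.subnet_default.id]
}
```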

🧪 Hands-On Lab: Dependencies & Data Sources

Let’s build a complete setup that uses implicit dependencies, explicit dependencies, and data sources!

Step 1: Create Project Directory

mkdir terraform-dependencies-lab
cd terraform-dependencies-lab

Step 2: Create data-sources.tf

# data-sources.tf

# Get current AWS region
data "aws_region" "current" {}

# Get current AWS account
data "aws_caller_identity" "current" {}

# Get available availability zones
data "aws_availability_zones" "available" {
  state = "available"
}

# Get latest Amazon Linux 2 AMI
data "aws_ami" "amazon_linux" {
  most_recent = true
  owners      = ["amazon"]

  filter {
    name   = "name"
    values = ["amzn2-ami-hvm-*-x86_64-gp2"]
  }

  filter {
    name   = "virtualization-type"
    values = ["hvm"]
  }

  filter {
    name   = "root-device-type"
    values = ["ebs"]
  }
}

Step 3: Create main.tf

# main.tf

terraform {
  required_version = ">= 1.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}

provider "aws" {
  region = "us-east-1"
}

# VPC
resource "aws_vpc" "main" {
  cidr_block           = "10.0.0.0/16"
  enable_dns_hostnames = true
  enable_dns_support   = true

  tags = {
    Name = "dependencies-lab-vpc"
  }
}

# Internet Gateway (implicit dependency on VPC)
resource "aws_internet_gateway" "main" {
  vpc_id = aws_vpc.main.id

  tags = {
    Name = "dependencies-lab-igw"
  }
}

# Public Subnets using data source for AZs
resource "aws_subnet" "public" {
  count = 2

  vpc_id                  = aws_vpc.main.id
  cidr_block              = "10.0.${count.index + 1}.0/24"
  availability_zone       = data.aws_availability_zones.available.names[count.index]
  map_public_ip_on_launch = true

  tags = {
    Name = "public-subnet-${count.index + 1}"
    AZ   = data.aws_availability_zones.available.names[count.index]
  }
}

# Route Table
resource "aws_route_table" "public" {
  vpc_id = aws_vpc.main.id

  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = aws_internet_gateway.main.id
  }

  tags = {
    Name = "public-route-table"
  }
}

# Route Table Association
resource "aws_route_table_association" "public" {
  count = 2

  subnet_id      = aws_subnet.public[count.index].id
  route_table_id = aws_route_table.public.id
}

# Security Group
resource "aws_security_group" "web" {
  name        = "web-security-group"
  description = "Allow HTTP and SSH"
  vpc_id      = aws_vpc.main.id

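  # Lab only: SSH is open to the world. Restrict cidr_blocks to your own IP,
  # or skip SSH and connect through SSM Session Manager instead.
  ingress {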
    description = "SSH"
    from_port   = 22
    to_port     = 22
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  ingress {
    description = "HTTP"
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

  tags = {
    Name = "web-sg"
  }
}

# IAM Role for EC2
resource "aws_iam_role" "ec2_role" {
  name = "ec2-ssm-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Action = "sts:AssumeRole"
      Effect = "Allow"
      Principal = {
        Service = "ec2.amazonaws.com"
      }
    }]
  })

  tags = {
    Name = "ec2-ssm-role"
  }
}

# Attach SSM policy to role
resource "aws_iam_role_policy_attachment" "ssm_policy" {
  role       = aws_iam_role.ec2_role.name
  policy_arn = "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore"
}

# Instance Profile (explicit dependency on policy attachment)
resource "aws_iam_instance_profile" "ec2_profile" {
  name = "ec2-instance-profile"
  role = aws_iam_role.ec2_role.name

  # Explicit dependency to ensure policy is attached first
  depends_on = [aws_iam_role_policy_attachment.ssm_policy]
}

# EC2 Instance using AMI from data source
resource "aws_instance" "web" {
  ami                    = data.aws_ami.amazon_linux.id
  instance_type          = "t2.micro"
  subnet_id              = aws_subnet.public[0].id
  vpc_security_group_ids = [aws_security_group.web.id]
  iam_instance_profile   = aws_iam_instance_profile.ec2_profile.name

  user_data = <<-EOF
              #!/bin/bash
              yum update -y
              yum install -y httpd
              systemctl start httpd
              systemctl enable httpd
              echo "<h1>Hello from Terraform!</h1>" > /var/www/html/index.html
              echo "<p>Instance in ${data.aws_availability_zones.available.names[0]}</p>" >> /var/www/html/index.html
              echo "<p>AMI: ${data.aws_ami.amazon_linux.id}</p>" >> /var/www/html/index.html
              EOF

  tags = {
    Name = "web-server"
  }

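  # Explicit dependency: the subnet's route to the internet gateway must exist
  # before user_data runs yum, and nothing above references the association
  depends_on = [aws_route_table_association.public]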
}

Step 4: Create outputs.tf

# outputs.tf

output "account_id" {
  description = "AWS Account ID"
  value       = data.aws_caller_identity.current.account_id
}

output "region" {
  description = "AWS Region"
  value       = data.aws_region.current.name
}

output "availability_zones" {
  description = "Available AZs"
  value       = data.aws_availability_zones.available.names
}

output "ami_id" {
  description = "AMI ID used for instance"
  value       = data.aws_ami.amazon_linux.id
}

output "ami_name" {
  description = "AMI name"
  value       = data.aws_ami.amazon_linux.name
}

output "vpc_id" {
  description = "VPC ID"
  value       = aws_vpc.main.id
}

output "subnet_ids" {
  description = "Subnet IDs"
  value       = aws_subnet.public[*].id
}

output "instance_id" {
  description = "EC2 Instance ID"
  value       = aws_instance.web.id
}

output "instance_public_ip" {
  description = "EC2 Instance Public IP"
  value       = aws_instance.web.public_ip
}

output "website_url" {
  description = "Website URL"
  value       = "http://${aws_instance.web.public_ip}"
}

Step 5: Initialize and Plan

terraform init
terraform plan

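Notice in the plan:

Data sources with fully known arguments (region, account, AZs, AMI) are read during planning, so values such as the AMI ID already appear in the planned aws_instance
Attributes that exist only after creation, such as instance IDs and public IPs, show as (known after apply)
The plan lists what will change but does not draw dependencies; Step 6 visualizes them
The summary line should read: Plan: 12 to add, 0 to change, 0 to destroy.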

Step 6: Visualize Dependencies

terraform graph | dot -Tpng > dependencies.png

Open dependencies.png to see the dependency graph! (The dot command comes from Graphviz, which must be installed separately.)

Step 7: Apply Configuration

terraform apply

Type yes to confirm.

Step 8: Test the Website

After apply completes, give user_data a minute or two to install and start httpd, then:

# Get the website URL
terraform output website_url
# Test with curl
curl $(terraform output -raw website_url)

You should see the HTML page with instance details!

Step 9: Examine Data Source Values

terraform output ami_id
terraform output ami_name
terraform output availability_zones

These values were queried from AWS, not hardcoded!

Step 10: Understand the Dependency Chain

Data Sources (read during planning)
├── aws_region.current
├── aws_caller_identity.current
├── aws_availability_zones.available
└── aws_ami.amazon_linux

Resources (one valid creation order; Terraform creates independent resources in parallel)
├── 1. aws_vpc.main
├── 2. aws_internet_gateway.main (depends on VPC)
├── 3. aws_subnet.public[0,1] (depends on VPC, uses AZ data)
├── 4. aws_security_group.web (depends on VPC)
├── 5. aws_iam_role.ec2_role
├── 6. aws_iam_role_policy_attachment.ssm_policy (depends on role)
├── 7. aws_iam_instance_profile.ec2_profile (explicit depends_on policy)
├── 8. aws_route_table.public (depends on VPC and IGW)
├── 9. aws_route_table_association.public[0,1] (depends on subnet and RT)
└── 10. aws_instance.web (depends on subnet, SG, profile, and explicitly on the RT associations; uses AMI data)

Step 11: Clean Up

terraform destroy

Terraform destroys in reverse dependency order!

🎨 Resource Graph

Terraform builds a dependency graph to determine execution order:

# Default: resource dependency graph in DOT format
terraform graph
# Operation graph Terraform would use for a plan
terraform graph -type=plan
# Operation graph for a destroy plan
terraform graph -type=plan-destroy

📊 Data Sources vs Resources

| Data Source | Resource |
|---|---|
| Reads existing infrastructure or external information | Creates new infrastructure |
| `data "aws_vpc" "main"` | `resource "aws_vpc" "main"` |
| Referenced with `data.aws_vpc.main` | Referenced with `aws_vpc.main` |
| Read-only | Create/Update/Delete |
| Results are recorded in state, but nothing is managed | Tracked and managed in state |

🔑 Key Concepts

Implicit Dependencies

Created automatically when referencing attributes
Most common type
Terraform detects them automatically

Explicit Dependencies

Use depends_on meta-argument
For non-obvious dependencies
Accepts a list of resources or modules

Data Sources

Query existing infrastructure
Read external information
Don’t create resources
Usually read during planning; deferred until apply when they depend on values not yet known
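Data sources are not always read before resources. If a data source argument depends on a value that is only known after apply, Terraform waits until the resource it depends on exists. The plan then marks the data source as read during apply. The sketch below uses the default security group that AWS creates with every VPC.

```hcl
resource "aws_vpc" "main" {
  cidr_block = "10.0.0.0/16"
}

# vpc_id is unknown until the VPC is created, so this read is deferred to apply
data "aws_security_group" "default" {
  name   = "default"
  vpc_id = aws_vpc.main.id
}

output "default_sg_id" {
  value = data.aws_security_group.default.id
}
```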

📝 Best Practices

✅ DO:

Prefer implicit dependencies

 subnet_id = aws_subnet.main.id  # ✅ Implicit

Use depends_on sparingly

 # Only when necessary
 depends_on = [aws_iam_role_policy_attachment.policy]

Use data sources for existing resources

 data "aws_ami" "latest" {
   most_recent = true
   owners      = ["amazon"]
   name_regex  = "^amzn2-ami-hvm-.*-x86_64-gp2$"
 }

Document why depends_on is needed

 # Ensure the subnet's internet route is in place before the instance boots
 depends_on = [aws_route_table_association.public]

❌ DON’T:

Don’t use depends_on when implicit dependencies work
Don’t create circular dependencies
Don’t hardcode AMI IDs; use data sources
Don’t assume resource creation order without dependencies
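Circular dependencies often come from two security groups that each allow traffic from the other through inline `ingress` blocks. Terraform rejects that configuration with a cycle error. One way out is to move the rules into separate `aws_vpc_security_group_ingress_rule` resources, which the 5.x AWS provider pinned in the lab includes. The sketch below uses hypothetical `node_a`/`node_b` groups, port 443, and an assumed `aws_vpc.main`.

```hcl
resource "aws_security_group" "node_a" {
  name   = "node-a"
  vpc_id = aws_vpc.main.id
}

resource "aws_security_group" "node_b" {
  name   = "node-b"
  vpc_id = aws_vpc.main.id
}

# Each rule depends on both groups, but neither group depends on the other,
# so the dependency graph has no cycle.
resource "aws_vpc_security_group_ingress_rule" "a_from_b" {
  security_group_id            = aws_security_group.node_a.id
  referenced_security_group_id = aws_security_group.node_b.id
  ip_protocol                  = "tcp"
  from_port                    = 443
  to_port                      = 443
}

resource "aws_vpc_security_group_ingress_rule" "b_from_a" {
  security_group_id            = aws_security_group.node_b.id
  referenced_security_group_id = aws_security_group.node_a.id
  ip_protocol                  = "tcp"
  from_port                    = 443
  to_port                      = 443
}
```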

📝 Summary

Today you learned:

✅ Implicit vs explicit dependencies
✅ When and how to use depends_on
✅ Data sources and their purpose
✅ Common AWS data sources
✅ How Terraform builds the resource graph
✅ Best practices for managing dependencies

🚀 Tomorrow’s Preview

Day 6: State Management Fundamentals

Tomorrow we’ll:

Deep dive into Terraform state
Understand state file structure
Learn about state locking
Configure remote state backends
Master state commands
Implement S3 backend for state storage

💭 Challenge Exercise

Modify today’s lab to:

Use a data source to find an existing S3 bucket
Add a resource that depends on both the VPC and the bucket
Create an explicit dependency between two resources
Add a data source for AWS SSM parameters


Remember: Understanding dependencies is crucial for building reliable, complex infrastructure!
