Day 5: Resource Dependencies & Data Sources


I'm a cloud-native enthusiast and tech blogger, sharing insights on Kubernetes, AWS, CI/CD, and Linux across my blog and Facebook page. Passionate about modern infrastructure and microservices, I aim to help others understand and leverage cloud-native technologies for scalable, efficient solutions.

Welcome to Day 5! Today we’ll explore how Terraform manages relationships between resources and how to query existing infrastructure using data sources. Understanding dependencies is crucial for building complex, reliable infrastructure.

🎯 Today’s Goals

  • Understand implicit vs explicit dependencies

  • Master the depends_on meta-argument

  • Learn about data sources and their uses

  • Query existing AWS resources

  • Build infrastructure that references external resources

  • Understand the resource graph

🔗 Resource Dependencies

When building infrastructure, resources often depend on each other. Terraform needs to know the order to create or destroy them.

Example Dependency Chain

    VPC
     │
     ├─► Subnet
     │     │
     │     └─► EC2 Instance
     │
     └─► Internet Gateway
           │
           └─► Route Table

🤝 Implicit Dependencies

Implicit dependencies are automatically detected when one resource references another’s attributes.

# VPC is created first (no dependencies)
resource "aws_vpc" "main" {
  cidr_block = "10.0.0.0/16"
}

# Subnet depends on VPC (implicit dependency)
resource "aws_subnet" "public" {
  vpc_id     = aws_vpc.main.id  # ← This creates implicit dependency
  cidr_block = "10.0.1.0/24"
}

# Instance depends on Subnet (implicit dependency)
resource "aws_instance" "web" {
  ami           = "ami-0c55b159cbfafe1f0"
  instance_type = "t2.micro"
  subnet_id     = aws_subnet.public.id  # ← Implicit dependency
}

Terraform’s creation order:

  1. VPC

  2. Subnet (waits for VPC)

  3. Instance (waits for Subnet)

Destruction order (reverse):

  1. Instance

  2. Subnet

  3. VPC
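
Resources with no reference path between them have no ordering at all. As a minimal sketch (the `aws_internet_gateway` block and its `igw` name are illustrative additions, not part of the example above), a gateway that references only the VPC can be created alongside the subnet:

```hcl
# Illustrative addition: depends only on the VPC, not on the subnet or instance
resource "aws_internet_gateway" "igw" {
  vpc_id = aws_vpc.main.id # implicit dependency on the VPC only
}

# Resulting graph edges:
#   aws_vpc.main -> aws_subnet.public -> aws_instance.web
#   aws_vpc.main -> aws_internet_gateway.igw
# Once the VPC exists, Terraform is free to create the subnet and the
# gateway concurrently (up to the -parallelism limit, 10 by default).
```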

📌 Explicit Dependencies (depends_on)

Sometimes dependencies exist that Terraform can’t detect automatically. Use depends_on for explicit dependencies.

When to Use depends_on

# IAM role that EC2 instances can assume
resource "aws_iam_role" "instance_role" {
  name = "instance-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Action = "sts:AssumeRole"
      Effect = "Allow"
      Principal = {
        Service = "ec2.amazonaws.com"
      }
    }]
  })
}

# IAM policy attachment
resource "aws_iam_role_policy_attachment" "instance_policy" {
  role       = aws_iam_role.instance_role.name
  policy_arn = "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore"
}

# Instance profile: references the role (implicit), but not the policy attachment
resource "aws_iam_instance_profile" "instance_profile" {
  name = "instance-profile"
  role = aws_iam_role.instance_role.name

  # Explicit dependency: nothing here references the attachment, so without
  # depends_on Terraform may create the profile before the policy is attached
  depends_on = [aws_iam_role_policy_attachment.instance_policy]
}

Multiple Dependencies

resource "aws_instance" "web" {
  ami           = "ami-0c55b159cbfafe1f0"
  instance_type = "t2.micro"

  depends_on = [
    aws_iam_instance_profile.instance_profile,
    aws_security_group.web,
    aws_subnet.public
  ]
}
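
In practice, the subnet, security group, and instance profile would usually be passed as arguments, which already creates implicit dependencies. A hedged sketch of the same instance, assuming the resource names from the earlier examples and a hypothetical boot script that needs the role's policy:

```hcl
resource "aws_instance" "web" {
  ami           = "ami-0c55b159cbfafe1f0"
  instance_type = "t2.micro"

  # These references already create implicit dependencies
  subnet_id              = aws_subnet.public.id
  vpc_security_group_ids = [aws_security_group.web.id]
  iam_instance_profile   = aws_iam_instance_profile.instance_profile.name

  # Keep depends_on only for relationships no argument expresses.
  # Hypothetical case: a boot script needs the role's policy to be attached.
  depends_on = [aws_iam_role_policy_attachment.instance_policy]
}
```

Listing a resource in depends_on that is already referenced by an argument is harmless but redundant.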

🔍 Data Sources

Data sources allow Terraform to query existing infrastructure or external information. They don’t create resources - they only read data.

Data Source Syntax

data "provider_resource" "name" {
  # Filter criteria
}

# Reference with: data.provider_resource.name.attribute
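
To make the pattern concrete, here is a small sketch that looks up the default VPC and exposes one of its attributes. It assumes the account still has a default VPC in the configured region; the output name is illustrative.

```hcl
# Look up the account's default VPC in the current region
data "aws_vpc" "default" {
  default = true
}

# Reference pattern: data.<TYPE>.<NAME>.<ATTRIBUTE>
output "default_vpc_cidr" {
  value = data.aws_vpc.default.cidr_block
}
```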

Example: Query Existing VPC

# Query existing VPC by tag
data "aws_vpc" "existing" {
  tags = {
    Name = "production-vpc"
  }
}

# Use the VPC ID in a new subnet
resource "aws_subnet" "new_subnet" {
  vpc_id     = data.aws_vpc.existing.id
  cidr_block = "10.0.10.0/24"
}
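
A looked-up resource can also feed other data sources. The sketch below lists the subnets that already exist in the queried VPC; the `existing` and `existing_subnet_ids` names are illustrative.

```hcl
# List the IDs of all subnets that already exist in the queried VPC
data "aws_subnets" "existing" {
  filter {
    name   = "vpc-id"
    values = [data.aws_vpc.existing.id]
  }
}

output "existing_subnet_ids" {
  value = data.aws_subnets.existing.ids
}
```

Note that the aws_vpc data source expects exactly one match: if no VPC, or more than one, carries the Name tag "production-vpc", the lookup fails with an error.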

📚 Common AWS Data Sources

1. AWS AMI (Amazon Machine Image)

# Get latest Amazon Linux 2 AMI
data "aws_ami" "amazon_linux" {
  most_recent = true
  owners      = ["amazon"]

  filter {
    name   = "name"
    values = ["amzn2-ami-hvm-*-x86_64-gp2"]
  }

  filter {
    name   = "virtualization-type"
    values = ["hvm"]
  }
}

resource "aws_instance" "web" {
  ami           = data.aws_ami.amazon_linux.id
  instance_type = "t2.micro"
}
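
An alternative worth knowing is the public SSM parameter that AWS maintains for the latest Amazon Linux 2 AMI. This is a sketch, not a replacement for the filter approach above; the `al2_ami` and `web_from_ssm` names are hypothetical, and the caller needs permission to read SSM parameters.

```hcl
# Public parameter maintained by AWS that holds the latest Amazon Linux 2 AMI ID
data "aws_ssm_parameter" "al2_ami" {
  name = "/aws/service/ami-amazon-linux-latest/amzn2-ami-hvm-x86_64-gp2"
}

# Hypothetical instance name, chosen to avoid clashing with aws_instance.web
resource "aws_instance" "web_from_ssm" {
  ami           = data.aws_ssm_parameter.al2_ami.value
  instance_type = "t2.micro"
}
```

The provider marks the parameter value as sensitive, so the plan hides it even though an AMI ID is not secret.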

2. AWS Availability Zones

# Get all available AZs in current region
data "aws_availability_zones" "available" {
  state = "available"
}

resource "aws_subnet" "public" {
  count             = 2
  vpc_id            = aws_vpc.main.id
  cidr_block        = "10.0.${count.index}.0/24"
  availability_zone = data.aws_availability_zones.available.names[count.index]
}
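
The string-interpolated CIDR above works for a fixed 10.0.0.0/16 VPC. A variant of the same subnet block, sketched here with Terraform's built-in cidrsubnet function, derives the ranges from whatever CIDR the VPC actually has:

```hcl
resource "aws_subnet" "public" {
  count  = 2
  vpc_id = aws_vpc.main.id

  # cidrsubnet(prefix, newbits, netnum): adding 8 bits to a /16 yields /24 blocks.
  # With a 10.0.0.0/16 VPC, index 0 gives 10.0.0.0/24 and index 1 gives 10.0.1.0/24.
  cidr_block        = cidrsubnet(aws_vpc.main.cidr_block, 8, count.index)
  availability_zone = data.aws_availability_zones.available.names[count.index]
}
```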

3. AWS Account Information

data "aws_caller_identity" "current" {}

output "account_id" {
  value = data.aws_caller_identity.current.account_id
}

output "caller_arn" {
  value = data.aws_caller_identity.current.arn
}

4. AWS Region

data "aws_region" "current" {}

output "current_region" {
  value = data.aws_region.current.name
}
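
Account and region lookups are most useful when combined to build names and ARNs without hardcoding. A minimal sketch, where the bucket naming scheme and the parameter path are assumptions made up for illustration:

```hcl
locals {
  # Hypothetical naming scheme: account ID and region keep the name unique
  log_bucket_name = "app-logs-${data.aws_caller_identity.current.account_id}-${data.aws_region.current.name}"

  # ARN assembled from looked-up values instead of hardcoded ones
  # (the parameter path "app/config" is an assumption for illustration)
  param_arn = "arn:aws:ssm:${data.aws_region.current.name}:${data.aws_caller_identity.current.account_id}:parameter/app/config"
}
```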

5. Existing Security Group

data "aws_security_group" "default" {
  name   = "default"
  vpc_id = aws_vpc.main.id
}

6. Existing Subnet

data "aws_subnet" "selected" {
  filter {
    name   = "tag:Name"
    values = ["production-subnet-1"]
  }
}
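
Looked-up values can be chained. The sketch below places a hypothetical instance into the existing subnet and uses the default security group of that subnet's VPC; the `subnet_default` and `app` names are illustrative, and the AMI lookup is the one from the first example in this section.

```hcl
# Default security group of the VPC that the looked-up subnet belongs to
data "aws_security_group" "subnet_default" {
  name   = "default"
  vpc_id = data.aws_subnet.selected.vpc_id
}

# Hypothetical instance placed into infrastructure that Terraform only reads
resource "aws_instance" "app" {
  ami                    = data.aws_ami.amazon_linux.id
  instance_type          = "t2.micro"
  subnet_id              = data.aws_subnet.selected.id
  vpc_security_group_ids = [data.aws_security_group.subnet_default.id]
}
```

Running terraform destroy on this configuration removes only the instance; the subnet and security group are read, not managed.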

🧪 Hands-On Lab: Dependencies & Data Sources

Let’s build a complete infrastructure using both implicit/explicit dependencies and data sources!

Step 1: Create Project Directory

mkdir terraform-dependencies-lab
cd terraform-dependencies-lab

Step 2: Create data-sources.tf

# data-sources.tf

# Get current AWS region
data "aws_region" "current" {}

# Get current AWS account
data "aws_caller_identity" "current" {}

# Get available availability zones
data "aws_availability_zones" "available" {
  state = "available"
}

# Get latest Amazon Linux 2 AMI
data "aws_ami" "amazon_linux" {
  most_recent = true
  owners      = ["amazon"]

  filter {
    name   = "name"
    values = ["amzn2-ami-hvm-*-x86_64-gp2"]
  }

  filter {
    name   = "virtualization-type"
    values = ["hvm"]
  }

  filter {
    name   = "root-device-type"
    values = ["ebs"]
  }
}

Step 3: Create main.tf

# main.tf

terraform {
  required_version = ">= 1.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}

provider "aws" {
  region = "us-east-1"
}

# VPC
resource "aws_vpc" "main" {
  cidr_block           = "10.0.0.0/16"
  enable_dns_hostnames = true
  enable_dns_support   = true

  tags = {
    Name = "dependencies-lab-vpc"
  }
}

# Internet Gateway (implicit dependency on VPC)
resource "aws_internet_gateway" "main" {
  vpc_id = aws_vpc.main.id

  tags = {
    Name = "dependencies-lab-igw"
  }
}

# Public Subnets using data source for AZs
resource "aws_subnet" "public" {
  count = 2

  vpc_id                  = aws_vpc.main.id
  cidr_block              = "10.0.${count.index + 1}.0/24"
  availability_zone       = data.aws_availability_zones.available.names[count.index]
  map_public_ip_on_launch = true

  tags = {
    Name = "public-subnet-${count.index + 1}"
    AZ   = data.aws_availability_zones.available.names[count.index]
  }
}

# Route Table
resource "aws_route_table" "public" {
  vpc_id = aws_vpc.main.id

  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = aws_internet_gateway.main.id
  }

  tags = {
    Name = "public-route-table"
  }
}

# Route Table Association
resource "aws_route_table_association" "public" {
  count = 2

  subnet_id      = aws_subnet.public[count.index].id
  route_table_id = aws_route_table.public.id
}

# Security Group
resource "aws_security_group" "web" {
  name        = "web-security-group"
  description = "Allow HTTP and SSH"
  vpc_id      = aws_vpc.main.id

  ingress {
    description = "SSH"
    from_port   = 22
    to_port     = 22
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  ingress {
    description = "HTTP"
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

  tags = {
    Name = "web-sg"
  }
}

# IAM Role for EC2
resource "aws_iam_role" "ec2_role" {
  name = "ec2-ssm-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Action = "sts:AssumeRole"
      Effect = "Allow"
      Principal = {
        Service = "ec2.amazonaws.com"
      }
    }]
  })

  tags = {
    Name = "ec2-ssm-role"
  }
}

# Attach SSM policy to role
resource "aws_iam_role_policy_attachment" "ssm_policy" {
  role       = aws_iam_role.ec2_role.name
  policy_arn = "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore"
}

# Instance Profile (explicit dependency on policy attachment)
resource "aws_iam_instance_profile" "ec2_profile" {
  name = "ec2-instance-profile"
  role = aws_iam_role.ec2_role.name

  # Explicit dependency to ensure policy is attached first
  depends_on = [aws_iam_role_policy_attachment.ssm_policy]
}

# EC2 Instance using AMI from data source
resource "aws_instance" "web" {
  ami                    = data.aws_ami.amazon_linux.id
  instance_type          = "t2.micro"
  subnet_id              = aws_subnet.public[0].id
  vpc_security_group_ids = [aws_security_group.web.id]
  iam_instance_profile   = aws_iam_instance_profile.ec2_profile.name

  user_data = <<-EOF
              #!/bin/bash
              yum update -y
              yum install -y httpd
              systemctl start httpd
              systemctl enable httpd
              echo "<h1>Hello from Terraform!</h1>" > /var/www/html/index.html
              echo "<p>Instance in ${data.aws_availability_zones.available.names[0]}</p>" >> /var/www/html/index.html
              echo "<p>AMI: ${data.aws_ami.amazon_linux.id}</p>" >> /var/www/html/index.html
              EOF

  tags = {
    Name = "web-server"
  }

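  # Explicit dependency: user_data runs yum at boot, which needs the subnet's
  # route to the internet gateway to be in place before the instance starts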
  depends_on = [aws_route_table_association.public]
}

Step 4: Create outputs.tf

# outputs.tf

output "account_id" {
  description = "AWS Account ID"
  value       = data.aws_caller_identity.current.account_id
}

output "region" {
  description = "AWS Region"
  value       = data.aws_region.current.name
}

output "availability_zones" {
  description = "Available AZs"
  value       = data.aws_availability_zones.available.names
}

output "ami_id" {
  description = "AMI ID used for instance"
  value       = data.aws_ami.amazon_linux.id
}

output "ami_name" {
  description = "AMI name"
  value       = data.aws_ami.amazon_linux.name
}

output "vpc_id" {
  description = "VPC ID"
  value       = aws_vpc.main.id
}

output "subnet_ids" {
  description = "Subnet IDs"
  value       = aws_subnet.public[*].id
}

output "instance_id" {
  description = "EC2 Instance ID"
  value       = aws_instance.web.id
}

output "instance_public_ip" {
  description = "EC2 Instance Public IP"
  value       = aws_instance.web.public_ip
}

output "website_url" {
  description = "Website URL"
  value       = "http://${aws_instance.web.public_ip}"
}
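
Outputs can also reshape values. As an optional extra for outputs.tf (the `subnet_ids_by_az` name is illustrative), a for expression maps each availability zone to its subnet ID:

```hcl
output "subnet_ids_by_az" {
  description = "Map of availability zone name to public subnet ID"
  value       = { for s in aws_subnet.public : s.availability_zone => s.id }
}
```

This works here because each of the two subnets sits in a different AZ, so the map keys are unique.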

Step 5: Initialize and Plan

terraform init
terraform plan

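Notice in the plan:

  • Data sources whose arguments are already known are read during the plan

  • Resources to be created are marked with +, and values that depend on other resources show as (known after apply)

  • The plan output lists resources rather than drawing dependency arrows; use terraform graph (Step 6) to see the edges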
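
One nuance about read timing: a data source whose arguments depend on a resource that does not exist yet cannot be read while planning. As a sketch, adding the block below to the lab (it is not part of the lab files) makes the plan report that the data source will be read during apply:

```hcl
# Deferred to apply on the first run: vpc_id is unknown until
# aws_vpc.main has actually been created
data "aws_security_group" "default" {
  name   = "default"
  vpc_id = aws_vpc.main.id
}
```

The four data sources in data-sources.tf have no such dependency, which is why they resolve at plan time.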

Step 6: Visualize Dependencies

terraform graph | dot -Tpng > dependencies.png

Open dependencies.png to see the dependency graph! The dot command comes from Graphviz, which must be installed separately.

Step 7: Apply Configuration

terraform apply

Type yes to confirm.

Step 8: Test the Website

After apply completes:

# Get the website URL
terraform output website_url
# Test with curl
curl $(terraform output -raw website_url)

You should see the HTML page with instance details!

Step 9: Examine Data Source Values

terraform output ami_id
terraform output ami_name
terraform output availability_zones

These values were queried from AWS, not hardcoded!

Step 10: Understand the Dependency Chain

Data Sources (Read First)
├── aws_region.current
├── aws_caller_identity.current
├── aws_availability_zones.available
└── aws_ami.amazon_linux

Resources (One Valid Order; Independent Resources Are Created in Parallel)
├── 1. aws_vpc.main
├── 2. aws_internet_gateway.main (depends on VPC)
├── 3. aws_subnet.public[0,1] (depends on VPC, uses AZ data)
├── 4. aws_security_group.web (depends on VPC)
├── 5. aws_iam_role.ec2_role
├── 6. aws_iam_role_policy_attachment.ssm_policy (depends on role)
├── 7. aws_iam_instance_profile.ec2_profile (explicit depends_on policy)
├── 8. aws_route_table.public (depends on VPC and IGW)
├── 9. aws_route_table_association.public[0,1] (depends on subnet and RT)
└── 10. aws_instance.web (depends on subnet, SG, profile; explicit depends_on route table association; uses AMI data)

Step 11: Clean Up

terraform destroy

Terraform destroys in reverse dependency order!

🎨 Resource Graph

Terraform builds a dependency graph to determine execution order:

# Generate graph in DOT format
terraform graph
# Graph for a specific operation type, here a normal plan
terraform graph -type=plan
# For destroy operations
terraform graph -type=plan-destroy

📊 Data Sources vs Resources

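| Data Source | Resource |
| --- | --- |
| Reads existing infrastructure | Creates new infrastructure |
| data "aws_vpc" "main" | resource "aws_vpc" "main" |
| Referenced with data.aws_vpc.main | Referenced with aws_vpc.main |
| Read-only | Create/Update/Delete |
| Result is recorded in state, but nothing is managed | Manages the object's lifecycle through state |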

🔑 Key Concepts

Implicit Dependencies

  • Created automatically when referencing attributes

  • Most common type

  • Terraform detects them automatically

Explicit Dependencies

  • Use depends_on meta-argument

  • For non-obvious dependencies

  • Accepts a list of resource or module references
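
depends_on is also accepted on module blocks. A minimal sketch, assuming a hypothetical local module at ./modules/app and the policy attachment from today's lab:

```hcl
# Hypothetical local module; the path and its contents are assumptions
module "app" {
  source = "./modules/app"

  # Every resource inside the module waits for the policy attachment
  depends_on = [aws_iam_role_policy_attachment.ssm_policy]
}
```

Because the dependency applies to everything in the module, it can delay more than intended, which is another reason to use it sparingly.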

Data Sources

  • Query existing infrastructure

  • Read external information

  • Don’t create resources

  • Read during plan when their arguments are known; otherwise deferred to apply

📝 Best Practices

DO:

  1. Prefer implicit dependencies

     subnet_id = aws_subnet.main.id  # ✅ Implicit
    
  2. Use depends_on sparingly

     # Only when necessary
     depends_on = [aws_iam_role_policy_attachment.policy]
    
  3. Use data sources for existing resources

     data "aws_ami" "latest" {
       most_recent = true
       owners      = ["amazon"]  # restrict results to a trusted publisher
     }
    
  4. Document why depends_on is needed

     depends_on = [aws_route_table.main]
     # Ensure route exists before instance tries to access internet
    

DON’T:

  1. Don’t use depends_on when implicit dependencies work

  2. Don’t create circular dependencies

  3. Don’t hardcode AMI IDs - use data sources

  4. Don’t assume resource creation order without dependencies
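
A common way to hit a circular dependency is two security groups whose inline ingress blocks reference each other, which Terraform rejects with a Cycle error. One way around it, sketched here with hypothetical `app` and `db` groups and PostgreSQL's port as an example, is to move the rules into separate resources:

```hcl
resource "aws_security_group" "app" {
  name   = "app-sg"
  vpc_id = aws_vpc.main.id
}

resource "aws_security_group" "db" {
  name   = "db-sg"
  vpc_id = aws_vpc.main.id
}

# Rules live in their own resources, so neither group references the other
resource "aws_security_group_rule" "app_to_db" {
  type                     = "ingress"
  from_port                = 5432
  to_port                  = 5432
  protocol                 = "tcp"
  security_group_id        = aws_security_group.db.id
  source_security_group_id = aws_security_group.app.id
}

resource "aws_security_group_rule" "db_to_app" {
  type                     = "ingress"
  from_port                = 8080
  to_port                  = 8080
  protocol                 = "tcp"
  security_group_id        = aws_security_group.app.id
  source_security_group_id = aws_security_group.db.id
}
```

Both groups now depend only on the VPC, and each rule depends on both groups, so the graph has no cycle.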

📝 Summary

Today you learned:

  • ✅ Implicit vs explicit dependencies

  • ✅ When and how to use depends_on

  • ✅ Data sources and their purpose

  • ✅ Common AWS data sources

  • ✅ How Terraform builds the resource graph

  • ✅ Best practices for managing dependencies

🚀 Tomorrow’s Preview

Day 6: State Management Fundamentals

Tomorrow we’ll:

  • Deep dive into Terraform state

  • Understand state file structure

  • Learn about state locking

  • Configure remote state backends

  • Master state commands

  • Implement S3 backend for state storage

💭 Challenge Exercise

Modify today’s lab to:

  1. Use a data source to find an existing S3 bucket

  2. Add a resource that depends on both the VPC and the bucket

  3. Create an explicit dependency between two resources

  4. Add a data source for AWS SSM parameters
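
If you want a starting point for items 1 and 2, here is one possible sketch. The bucket name is a placeholder for a bucket that already exists in your account, and a VPC flow log is only one of several resources that could reference both the VPC and the bucket.

```hcl
# Item 1: look up a bucket that already exists (the name is a placeholder)
data "aws_s3_bucket" "logs" {
  bucket = "my-existing-log-bucket"
}

# Item 2: a VPC flow log references both the lab VPC and the looked-up bucket
resource "aws_flow_log" "vpc" {
  vpc_id               = aws_vpc.main.id
  traffic_type         = "ALL"
  log_destination_type = "s3"
  log_destination      = data.aws_s3_bucket.logs.arn
}
```

Both dependencies here are implicit, so item 3 still needs a separate depends_on of your own.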




Remember: Understanding dependencies is crucial for building reliable, complex infrastructure!


Thank you!

but I found out there is error in outputs.tf. Wrong

value = "http://${aws_instance.web.public_ip}"

It must be

output "website_url" {
  description = "Website URL"
  value       = "http://${aws_instance.web.public_ip}"
}


Thanks for noting that! Yes, the full output block you posted is the correct form for outputs.tf. I appreciate the feedback!
