Starting up...
0%
Back to all articles
Infrastructure as Code: A Comprehensive Guide to Terraform Best Practices
Terraform May 22, 2023

Infrastructure as Code: A Comprehensive Guide to Terraform Best Practices

Kabir Dikko
DevOps Engineer

Infrastructure as Code: A Comprehensive Guide to Terraform Best Practices

In the world of modern infrastructure management, manually provisioning and configuring resources is no longer feasible. Infrastructure as Code (IaC) has revolutionized how we build, change, and version infrastructure, with Terraform emerging as one of the leading tools in this space.

As a DevOps engineer who has implemented Terraform across multiple organizations, I’ve learned that while getting started with Terraform is relatively straightforward, using it effectively at scale requires following certain practices and patterns. This guide shares the lessons I’ve learned along the way.

Why Terraform?

Before diving into best practices, it’s worth highlighting why Terraform has become so popular:

  • Provider Agnostic: Works with AWS, Azure, GCP, and hundreds of other providers
  • Declarative Syntax: You define the desired state, not the steps to get there
  • State Management: Tracks the real-world resources that correspond to your configuration
  • Plan & Apply Model: Shows changes before they’re applied
  • Modular Design: Facilitates code reuse through modules

Organizing Your Terraform Code

1. Use a Consistent Directory Structure

A well-organized directory structure makes your Terraform code easier to navigate and understand:

infrastructure/
├── environments/
│   ├── dev/
│   │   ├── main.tf
│   │   ├── variables.tf
│   │   ├── outputs.tf
│   │   └── terraform.tfvars
│   ├── staging/
│   └── production/
├── modules/
│   ├── networking/
│   ├── compute/
│   └── database/
└── README.md

This structure separates environment-specific configurations from reusable modules, making it easier to maintain consistency across environments.

2. Use Workspaces for Environment Separation

For simpler projects, Terraform workspaces provide an alternative to separate directory structures:

# Create and select workspaces
terraform workspace new dev
terraform workspace new staging
terraform workspace new production

# Select a workspace
terraform workspace select dev

# Reference the workspace in your Terraform code
resource "aws_instance" "web" {
  count = terraform.workspace == "production" ? 3 : 1
  # ... other configuration ...
  tags = {
    Environment = terraform.workspace
  }
}

However, for complex infrastructure, I recommend using separate directories for each environment for clarity and separation of state.

Terraform State Management

1. Use Remote State Storage

Never store state locally or in version control. Instead, use a remote backend:

terraform {
  backend "s3" {
    bucket         = "my-terraform-state"
    key            = "production/terraform.tfstate"
    region         = "us-west-2"
    dynamodb_table = "terraform-locks"
    encrypt        = true
  }
}

This approach provides:

  • Secure storage with encryption
  • State locking to prevent concurrent modifications
  • Versioning for rollback capabilities

2. Implement State Locking

State locking prevents concurrent operations that could corrupt your state file:

terraform {
  backend "s3" {
    # ... other configuration ...
    dynamodb_table = "terraform-locks"
  }
}

With AWS, a DynamoDB table provides this locking mechanism.

Writing Maintainable Terraform Code

1. Use Variables and Locals for Flexibility and DRY

Leverage variables for user input and locals for computed values:

variable "environment" {
  description = "The deployment environment (dev, staging, prod)"
  type        = string
}

locals {
  common_tags = {
    Environment = var.environment
    Project     = "MyProject"
    ManagedBy   = "Terraform"
  }
}

resource "aws_instance" "example" {
  # ... other configuration ...
  tags = merge(local.common_tags, {
    Name = "example-instance"
  })
}

2. Output Important Values

Always output values that other systems might need:

output "vpc_id" {
  description = "The ID of the VPC"
  value       = module.vpc.vpc_id
}

output "public_subnets" {
  description = "List of IDs of public subnets"
  value       = module.vpc.public_subnets
}

3. Use Data Sources for Existing Resources

Integrate with existing infrastructure using data sources:

data "aws_vpc" "existing" {
  id = "vpc-12345678"
}

resource "aws_security_group" "example" {
  vpc_id = data.aws_vpc.existing.id
  # ... other configuration ...
}

Module Design Patterns

1. Create Composable Modules

Design modules that do one thing well and can be composed together:

module "vpc" {
  source = "./modules/networking/vpc"
  # ... parameters ...
}

module "web_servers" {
  source    = "./modules/compute/web"
  vpc_id    = module.vpc.vpc_id
  subnet_ids = module.vpc.private_subnet_ids
  # ... other parameters ...
}

2. Version Your Modules

For shared modules, use versioning to maintain stability:

module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "3.14.0"
  # ... parameters ...
}

3. Document Your Modules

Every module should have:

  • A README explaining its purpose and usage
  • Variable descriptions
  • Output descriptions
  • Example usage
variable "instance_type" {
  description = "The EC2 instance type to use for web servers"
  type        = string
  default     = "t3.micro"
}

Terraform Workflow Best Practices

1. Use a Consistent Workflow

Adopt a consistent workflow for applying changes:

# Initialize the working directory
terraform init

# Format the code
terraform fmt

# Validate the configuration
terraform validate

# Plan the changes
terraform plan -out=tfplan

# Apply the changes
terraform apply tfplan

2. Integrate with CI/CD

Automate your Terraform workflows with CI/CD:

# Example GitHub Actions workflow
name: Terraform

on:
  push:
    branches: [ main ]
  pull_request:
    branches: [ main ]

jobs:
  terraform:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v2
    
    - name: Setup Terraform
      uses: hashicorp/setup-terraform@v1
      
    - name: Terraform Init
      run: terraform init
      
    - name: Terraform Format
      run: terraform fmt -check
      
    - name: Terraform Validate
      run: terraform validate
      
    - name: Terraform Plan
      run: terraform plan
      
    # Only apply on merge to main
    - name: Terraform Apply
      if: github.ref == 'refs/heads/main' && github.event_name == 'push'
      run: terraform apply -auto-approve

3. Implement Testing

Test your Terraform code with tools like Terratest:

package test

import (
	"testing"
	"github.com/gruntwork-io/terratest/modules/terraform"
	"github.com/stretchr/testify/assert"
)

func TestTerraformVPC(t *testing.T) {
	terraformOptions := &terraform.Options{
		TerraformDir: "../examples/vpc",
		Vars: map[string]interface{}{
			"region": "us-west-2",
			"cidr":   "10.0.0.0/16",
		},
	}

	defer terraform.Destroy(t, terraformOptions)
	terraform.InitAndApply(t, terraformOptions)
	
	vpcID := terraform.Output(t, terraformOptions, "vpc_id")
	assert.Contains(t, vpcID, "vpc-")
}

Security Best Practices

1. Manage Secrets Properly

Never store sensitive values in your Terraform code:

# BAD: Hard-coded secrets
resource "aws_db_instance" "example" {
  password = "super-secret-password"  # Don't do this!
}

# BETTER: Use variables with sensitive flag
variable "db_password" {
  description = "Database password"
  type        = string
  sensitive   = true  # Won't show in logs or CLI output
}

# BEST: Use a secrets manager
data "aws_secretsmanager_secret_version" "db_password" {
  secret_id = "database/password"
}

resource "aws_db_instance" "example" {
  password = data.aws_secretsmanager_secret_version.db_password.secret_string
}

2. Implement Least Privilege

Ensure your Terraform service accounts have only the permissions they need:

# Example AWS IAM policy for Terraform
resource "aws_iam_policy" "terraform" {
  name = "terraform-policy"
  
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Action = [
          "ec2:Describe*",
          "ec2:CreateVpc",
          "ec2:DeleteVpc",
          # ... other necessary permissions ...
        ]
        Effect   = "Allow"
        Resource = "*"
      }
    ]
  })
}

3. Use Provider Aliases for Multiple Accounts

Manage multiple accounts or regions securely:

provider "aws" {
  region = "us-west-2"
  alias  = "production"
  
  assume_role {
    role_arn = "arn:aws:iam::123456789012:role/TerraformRole"
  }
}

provider "aws" {
  region = "us-west-2"
  alias  = "development"
  
  assume_role {
    role_arn = "arn:aws:iam::987654321098:role/TerraformRole"
  }
}

resource "aws_s3_bucket" "prod_bucket" {
  provider = aws.production
  bucket   = "my-production-bucket"
}

resource "aws_s3_bucket" "dev_bucket" {
  provider = aws.development
  bucket   = "my-development-bucket"
}

Performance and Cost Optimization

1. Use for_each Instead of count Where Appropriate

The for_each meta-argument provides better handling when resources are added or removed from the middle of a list:

# Using count (not ideal for sets that change frequently)
resource "aws_iam_user" "example_count" {
  count = length(var.user_names)
  name  = var.user_names[count.index]
}

# Using for_each (better for sets that change)
resource "aws_iam_user" "example_foreach" {
  for_each = toset(var.user_names)
  name     = each.value
}

2. Use Terraform’s Import Feature for Existing Resources

Bring existing infrastructure under Terraform’s management:

terraform import aws_s3_bucket.website my-existing-bucket

Then, after import, document the resource in your Terraform files.

3. Implement Cost Estimation

Use the infracost tool to estimate costs before applying changes:

infracost breakdown --path .

Real-World Example: Multi-Tier Application

Let’s put these practices together with a simplified real-world example for a multi-tier application:

# main.tf
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 4.0"
    }
  }
  
  backend "s3" {
    bucket         = "company-terraform-state"
    key            = "environments/production/terraform.tfstate"
    region         = "us-west-2"
    dynamodb_table = "terraform-locks"
    encrypt        = true
  }
}

provider "aws" {
  region = var.region
  default_tags {
    tags = local.common_tags
  }
}

locals {
  common_tags = {
    Environment = var.environment
    Project     = var.project_name
    ManagedBy   = "Terraform"
    Owner       = "DevOps Team"
  }
}

module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "3.14.0"
  
  name = "${var.project_name}-${var.environment}"
  cidr = var.vpc_cidr
  
  azs             = var.availability_zones
  private_subnets = var.private_subnet_cidrs
  public_subnets  = var.public_subnet_cidrs
  
  enable_nat_gateway = true
  single_nat_gateway = var.environment != "production"
}

module "security_groups" {
  source = "./modules/security"
  
  vpc_id = module.vpc.vpc_id
  environment = var.environment
}

module "web_tier" {
  source = "./modules/compute/web"
  
  instance_type  = var.web_instance_type
  instance_count = var.web_instance_count
  subnet_ids     = module.vpc.private_subnets
  security_group_ids = [module.security_groups.web_sg_id]
  
  depends_on = [module.vpc]
}

module "database" {
  source = "./modules/database"
  
  engine_version    = "13.4"
  instance_class    = var.db_instance_class
  subnet_ids        = module.vpc.database_subnets
  security_group_ids = [module.security_groups.db_sg_id]
  
  # Get password from secrets manager
  password = data.aws_secretsmanager_secret_version.db_password.secret_string
  
  depends_on = [module.vpc]
}

# Outputs for other systems
output "vpc_id" {
  description = "The ID of the VPC"
  value       = module.vpc.vpc_id
}

output "web_instance_ids" {
  description = "IDs of the web tier instances"
  value       = module.web_tier.instance_ids
}

output "database_endpoint" {
  description = "The connection endpoint for the database"
  value       = module.database.endpoint
  sensitive   = true
}

Conclusion

Terraform has transformed how we manage infrastructure, but using it effectively requires deliberate practices. By following these guidelines, you can create maintainable, secure, and efficient infrastructure code that scales with your organization.

The key takeaways:

  1. Structure your code logically with clear separation of environments
  2. Manage state carefully with remote backends and locking
  3. Write maintainable code using variables, outputs, and consistent formatting
  4. Design modules to be composable and well-documented
  5. Implement workflows that include validation, planning, and testing
  6. Secure your infrastructure by managing secrets properly and applying least privilege
  7. Optimize for performance and cost

Remember, the goal of Infrastructure as Code isn’t just automation—it’s creating a reliable, versionable, and testable definition of your infrastructure that evolves with your application.


Do you have any Terraform best practices that you’ve found particularly valuable? Share them in the comments below!

Share this article