Infrastructure as Code: A Comprehensive Guide to Terraform Best Practices
In the world of modern infrastructure management, manually provisioning and configuring resources is no longer feasible. Infrastructure as Code (IaC) has revolutionized how we build, change, and version infrastructure, with Terraform emerging as one of the leading tools in this space.
As a DevOps engineer who has implemented Terraform across multiple organizations, I’ve learned that while getting started with Terraform is relatively straightforward, using it effectively at scale requires following certain practices and patterns. This guide shares the lessons I’ve learned along the way.
Why Terraform?
Before diving into best practices, it’s worth highlighting why Terraform has become so popular:
- Provider Agnostic: Works with AWS, Azure, GCP, and hundreds of other providers
- Declarative Syntax: You define the desired state, not the steps to get there
- State Management: Tracks the real-world resources that correspond to your configuration
- Plan & Apply Model: Shows changes before they’re applied
- Modular Design: Facilitates code reuse through modules
Organizing Your Terraform Code
1. Use a Consistent Directory Structure
A well-organized directory structure makes your Terraform code easier to navigate and understand:
infrastructure/
├── environments/
│ ├── dev/
│ │ ├── main.tf
│ │ ├── variables.tf
│ │ ├── outputs.tf
│ │ └── terraform.tfvars
│ ├── staging/
│ └── production/
├── modules/
│ ├── networking/
│ ├── compute/
│ └── database/
└── README.md
This structure separates environment-specific configurations from reusable modules, making it easier to maintain consistency across environments.
2. Use Workspaces for Environment Separation
For simpler projects, Terraform workspaces provide an alternative to separate directory structures:
# Create and select workspaces
terraform workspace new dev
terraform workspace new staging
terraform workspace new production
# Select a workspace
terraform workspace select dev
# Reference the workspace in your Terraform code
resource "aws_instance" "web" {
count = terraform.workspace == "production" ? 3 : 1
# ... other configuration ...
tags = {
Environment = terraform.workspace
}
}
However, for complex infrastructure, I recommend using separate directories for each environment for clarity and separation of state.
Terraform State Management
1. Use Remote State Storage
Never store state locally or in version control. Instead, use a remote backend:
terraform {
backend "s3" {
bucket = "my-terraform-state"
key = "production/terraform.tfstate"
region = "us-west-2"
dynamodb_table = "terraform-locks"
encrypt = true
}
}
This approach provides:
- Secure storage with encryption
- State locking to prevent concurrent modifications
- Versioning for rollback capabilities
2. Implement State Locking
State locking prevents concurrent operations that could corrupt your state file:
terraform {
backend "s3" {
# ... other configuration ...
dynamodb_table = "terraform-locks"
}
}
With AWS, a DynamoDB table provides this locking mechanism.
Writing Maintainable Terraform Code
1. Use Variables and Locals for Flexibility and DRY
Leverage variables for user input and locals for computed values:
variable "environment" {
description = "The deployment environment (dev, staging, prod)"
type = string
}
locals {
common_tags = {
Environment = var.environment
Project = "MyProject"
ManagedBy = "Terraform"
}
}
resource "aws_instance" "example" {
# ... other configuration ...
tags = merge(local.common_tags, {
Name = "example-instance"
})
}
2. Output Important Values
Always output values that other systems might need:
output "vpc_id" {
description = "The ID of the VPC"
value = module.vpc.vpc_id
}
output "public_subnets" {
description = "List of IDs of public subnets"
value = module.vpc.public_subnets
}
3. Use Data Sources for Existing Resources
Integrate with existing infrastructure using data sources:
data "aws_vpc" "existing" {
id = "vpc-12345678"
}
resource "aws_security_group" "example" {
vpc_id = data.aws_vpc.existing.id
# ... other configuration ...
}
Module Design Patterns
1. Create Composable Modules
Design modules that do one thing well and can be composed together:
module "vpc" {
source = "./modules/networking/vpc"
# ... parameters ...
}
module "web_servers" {
source = "./modules/compute/web"
vpc_id = module.vpc.vpc_id
subnet_ids = module.vpc.private_subnet_ids
# ... other parameters ...
}
2. Version Your Modules
For shared modules, use versioning to maintain stability:
module "vpc" {
source = "terraform-aws-modules/vpc/aws"
version = "3.14.0"
# ... parameters ...
}
3. Document Your Modules
Every module should have:
- A README explaining its purpose and usage
- Variable descriptions
- Output descriptions
- Example usage
variable "instance_type" {
description = "The EC2 instance type to use for web servers"
type = string
default = "t3.micro"
}
Terraform Workflow Best Practices
1. Use a Consistent Workflow
Adopt a consistent workflow for applying changes:
# Initialize the working directory
terraform init
# Format the code
terraform fmt
# Validate the configuration
terraform validate
# Plan the changes
terraform plan -out=tfplan
# Apply the changes
terraform apply tfplan
2. Integrate with CI/CD
Automate your Terraform workflows with CI/CD:
# Example GitHub Actions workflow
name: Terraform
on:
push:
branches: [ main ]
pull_request:
branches: [ main ]
jobs:
terraform:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
- name: Setup Terraform
uses: hashicorp/setup-terraform@v1
- name: Terraform Init
run: terraform init
- name: Terraform Format
run: terraform fmt -check
- name: Terraform Validate
run: terraform validate
- name: Terraform Plan
run: terraform plan
# Only apply on merge to main
- name: Terraform Apply
if: github.ref == 'refs/heads/main' && github.event_name == 'push'
run: terraform apply -auto-approve
3. Implement Testing
Test your Terraform code with tools like Terratest:
package test
import (
"testing"
"github.com/gruntwork-io/terratest/modules/terraform"
"github.com/stretchr/testify/assert"
)
func TestTerraformVPC(t *testing.T) {
terraformOptions := &terraform.Options{
TerraformDir: "../examples/vpc",
Vars: map[string]interface{}{
"region": "us-west-2",
"cidr": "10.0.0.0/16",
},
}
defer terraform.Destroy(t, terraformOptions)
terraform.InitAndApply(t, terraformOptions)
vpcID := terraform.Output(t, terraformOptions, "vpc_id")
assert.Contains(t, vpcID, "vpc-")
}
Security Best Practices
1. Manage Secrets Properly
Never store sensitive values in your Terraform code:
# BAD: Hard-coded secrets
resource "aws_db_instance" "example" {
password = "super-secret-password" # Don't do this!
}
# BETTER: Use variables with sensitive flag
variable "db_password" {
description = "Database password"
type = string
sensitive = true # Won't show in logs or CLI output
}
# BEST: Use a secrets manager
data "aws_secretsmanager_secret_version" "db_password" {
secret_id = "database/password"
}
resource "aws_db_instance" "example" {
password = data.aws_secretsmanager_secret_version.db_password.secret_string
}
2. Implement Least Privilege
Ensure your Terraform service accounts have only the permissions they need:
# Example AWS IAM policy for Terraform
resource "aws_iam_policy" "terraform" {
name = "terraform-policy"
policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Action = [
"ec2:Describe*",
"ec2:CreateVpc",
"ec2:DeleteVpc",
# ... other necessary permissions ...
]
Effect = "Allow"
Resource = "*"
}
]
})
}
3. Use Provider Aliases for Multiple Accounts
Manage multiple accounts or regions securely:
provider "aws" {
region = "us-west-2"
alias = "production"
assume_role {
role_arn = "arn:aws:iam::123456789012:role/TerraformRole"
}
}
provider "aws" {
region = "us-west-2"
alias = "development"
assume_role {
role_arn = "arn:aws:iam::987654321098:role/TerraformRole"
}
}
resource "aws_s3_bucket" "prod_bucket" {
provider = aws.production
bucket = "my-production-bucket"
}
resource "aws_s3_bucket" "dev_bucket" {
provider = aws.development
bucket = "my-development-bucket"
}
Performance and Cost Optimization
1. Use for_each
Instead of count
Where Appropriate
The for_each
meta-argument provides better handling when resources are added or removed from the middle of a list:
# Using count (not ideal for sets that change frequently)
resource "aws_iam_user" "example_count" {
count = length(var.user_names)
name = var.user_names[count.index]
}
# Using for_each (better for sets that change)
resource "aws_iam_user" "example_foreach" {
for_each = toset(var.user_names)
name = each.value
}
2. Use Terraform’s Import Feature for Existing Resources
Bring existing infrastructure under Terraform’s management:
terraform import aws_s3_bucket.website my-existing-bucket
Then, after import, document the resource in your Terraform files.
3. Implement Cost Estimation
Use the infracost
tool to estimate costs before applying changes:
infracost breakdown --path .
Real-World Example: Multi-Tier Application
Let’s put these practices together with a simplified real-world example for a multi-tier application:
# main.tf
terraform {
required_providers {
aws = {
source = "hashicorp/aws"
version = "~> 4.0"
}
}
backend "s3" {
bucket = "company-terraform-state"
key = "environments/production/terraform.tfstate"
region = "us-west-2"
dynamodb_table = "terraform-locks"
encrypt = true
}
}
provider "aws" {
region = var.region
default_tags {
tags = local.common_tags
}
}
locals {
common_tags = {
Environment = var.environment
Project = var.project_name
ManagedBy = "Terraform"
Owner = "DevOps Team"
}
}
module "vpc" {
source = "terraform-aws-modules/vpc/aws"
version = "3.14.0"
name = "${var.project_name}-${var.environment}"
cidr = var.vpc_cidr
azs = var.availability_zones
private_subnets = var.private_subnet_cidrs
public_subnets = var.public_subnet_cidrs
enable_nat_gateway = true
single_nat_gateway = var.environment != "production"
}
module "security_groups" {
source = "./modules/security"
vpc_id = module.vpc.vpc_id
environment = var.environment
}
module "web_tier" {
source = "./modules/compute/web"
instance_type = var.web_instance_type
instance_count = var.web_instance_count
subnet_ids = module.vpc.private_subnets
security_group_ids = [module.security_groups.web_sg_id]
depends_on = [module.vpc]
}
module "database" {
source = "./modules/database"
engine_version = "13.4"
instance_class = var.db_instance_class
subnet_ids = module.vpc.database_subnets
security_group_ids = [module.security_groups.db_sg_id]
# Get password from secrets manager
password = data.aws_secretsmanager_secret_version.db_password.secret_string
depends_on = [module.vpc]
}
# Outputs for other systems
output "vpc_id" {
description = "The ID of the VPC"
value = module.vpc.vpc_id
}
output "web_instance_ids" {
description = "IDs of the web tier instances"
value = module.web_tier.instance_ids
}
output "database_endpoint" {
description = "The connection endpoint for the database"
value = module.database.endpoint
sensitive = true
}
Conclusion
Terraform has transformed how we manage infrastructure, but using it effectively requires deliberate practices. By following these guidelines, you can create maintainable, secure, and efficient infrastructure code that scales with your organization.
The key takeaways:
- Structure your code logically with clear separation of environments
- Manage state carefully with remote backends and locking
- Write maintainable code using variables, outputs, and consistent formatting
- Design modules to be composable and well-documented
- Implement workflows that include validation, planning, and testing
- Secure your infrastructure by managing secrets properly and applying least privilege
- Optimize for performance and cost
Remember, the goal of Infrastructure as Code isn’t just automation—it’s creating a reliable, versionable, and testable definition of your infrastructure that evolves with your application.
Do you have any Terraform best practices that you’ve found particularly valuable? Share them in the comments below!