What is a Data Source in Terraform?
In Terraform, a data source lets you read information that is defined outside of Terraform, or managed by another Terraform configuration, and use it in your own configuration. Terraform only reads this information; it does not create, update, or destroy the underlying object. The data can describe infrastructure components, configuration details, or anything else a provider can query from an external system.
Why Use Data Sources?
- Integration: Integrate Terraform configurations with existing infrastructure and resources managed by other tools.
- Reusability: Use information from resources that are not managed by Terraform within your Terraform configurations.
- Dynamic Configuration: Look up values when Terraform plans or applies instead of hard-coding them, which makes your Terraform configurations more flexible and adaptive.
Common Use Cases
- Fetching the latest AMI ID for an EC2 instance.
- Getting information about existing VPCs, subnets, security groups, etc.
- Querying external data sources like DNS records, user information, etc.
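As an illustration of the last use case, the sketch below assumes the hashicorp/dns provider and its dns_a_record_set data source. The hostname example.com and the names site and site_addresses are placeholders chosen for this illustration.

```hcl
terraform {
  required_providers {
    dns = {
      source = "hashicorp/dns"
    }
  }
}

# Resolve the A records for a hostname when Terraform plans.
# "example.com" is a placeholder hostname.
data "dns_a_record_set" "site" {
  host = "example.com"
}

# addrs holds the list of IPv4 addresses returned by the lookup.
output "site_addresses" {
  value = data.dns_a_record_set.site.addrs
}
```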
Structure of a Data Source
A data source in Terraform has the following structure:
data "provider_name_resource" "local_name" {
  # Configuration arguments
}
- provider_name_resource: The type of data source (e.g., aws_ami, aws_vpc).
- local_name: A unique name for the data source within your Terraform configuration.
- Configuration arguments: Parameters used to filter and fetch the required data.
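Once declared, a data source's attributes are read with an expression of the form data.<type>.<local_name>.<attribute>. The sketch below illustrates this with the AWS provider's aws_caller_identity data source, which takes no arguments. The local name current and the output name account_id are arbitrary choices made for this illustration.

```hcl
# Look up the AWS account that the configured credentials belong to.
data "aws_caller_identity" "current" {}

# Reference pattern: data.<type>.<local_name>.<attribute>
output "account_id" {
  value = data.aws_caller_identity.current.account_id
}
```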
Example Data Sources and Usage
Example 1: Fetching the Latest Amazon Linux AMI
provider "aws" {
  region = "us-east-1"
}

data "aws_ami" "latest_amazon_linux" {
  most_recent = true

  filter {
    name   = "name"
    values = ["amzn2-ami-hvm-*-x86_64-gp2"]
  }

  filter {
    name   = "virtualization-type"
    values = ["hvm"]
  }

  owners = ["137112412989"] # Amazon
}

resource "aws_instance" "example" {
  ami           = data.aws_ami.latest_amazon_linux.id
  instance_type = "t2.micro"
}
In this example:
- The aws_ami data source is used to fetch the most recent Amazon Linux 2 AMI ID.
- The aws_instance resource then uses this AMI ID to launch a new EC2 instance.
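One consequence of most_recent = true is that the data source can resolve to a different AMI ID once a newer image is published, and a changed ami value makes Terraform replace the instance. If that is not wanted, one option is to ignore later changes to ami, as in the sketch below. It is a variant of the aws_instance resource above, not an additional resource, and whether it fits depends on how you roll out new images.

```hcl
resource "aws_instance" "example" {
  ami           = data.aws_ami.latest_amazon_linux.id
  instance_type = "t2.micro"

  lifecycle {
    # Keep the AMI chosen at creation time. Newer AMIs returned by the
    # data source will not trigger a replacement of this instance.
    ignore_changes = [ami]
  }
}
```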
Example 2: Fetching Information About an Existing VPC
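provider "aws" {
  region = "us-east-1"
}

data "aws_vpc" "default" {
  default = true
}

resource "aws_subnet" "example" {
  vpc_id = data.aws_vpc.default.id

  # The subnet range must fall inside the VPC's own CIDR block, so derive it
  # from the data source instead of hard-coding it. Assuming the usual
  # 172.31.0.0/16 default VPC, this yields 172.31.255.0/24. The index 255 is
  # an arbitrary choice; pick a range that is not already in use.
  cidr_block = cidrsubnet(data.aws_vpc.default.cidr_block, 8, 255)
}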
In this example:
- The aws_vpc data source fetches information about the default VPC in the region.
- The aws_subnet resource uses the VPC ID from the data source to create a new subnet within the default VPC.
Example 3: Using External Data Source
Terraform also supports external data sources through the external provider. This is useful for fetching data from external scripts or APIs.
provider "external" {}

data "external" "example" {
  program = ["python", "${path.module}/scripts/fetch_data.py"]

  query = {
    param1 = "value1"
    param2 = "value2"
  }
}

output "external_data" {
  value = data.external.example.result
}
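In this example:
- The external data source runs the Python script fetch_data.py and passes the parameters param1 and param2 to it. Terraform sends the query map to the program as a JSON object on standard input.
- The script must print a JSON object whose values are all strings to standard output. Terraform exposes that object as the result attribute, which is used as the value of the external_data output.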
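Because result is a map of strings, individual values can be read by key. The sketch below is meant to be added to the configuration above and assumes the script prints a JSON object containing a key named status. That key is hypothetical and depends entirely on what fetch_data.py returns.

```hcl
# Read a single value from the script's JSON output.
# "status" is a hypothetical key; use whatever keys your script emits.
output "external_status" {
  value = data.external.example.result["status"]
}
```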
Advanced Example: Combining Data Sources
Here’s a more complex example that combines multiple data sources to configure AWS infrastructure.
Example 4: Fetching AMI and VPC Information
provider "aws" {
  region = "us-east-1"
}

# Fetch the latest Amazon Linux 2 AMI
data "aws_ami" "latest_amazon_linux" {
  most_recent = true

  filter {
    name   = "name"
    values = ["amzn2-ami-hvm-*-x86_64-gp2"]
  }

  filter {
    name   = "virtualization-type"
    values = ["hvm"]
  }

  owners = ["137112412989"] # Amazon
}

# Fetch information about the default VPC
data "aws_vpc" "default" {
  default = true
}

# Fetch the IDs of the subnets in the default VPC
data "aws_subnets" "default" {
  filter {
    name   = "vpc-id"
    values = [data.aws_vpc.default.id]
  }
}

# Fetch the existing default security group of the default VPC
data "aws_security_group" "example" {
  vpc_id = data.aws_vpc.default.id

  filter {
    name   = "group-name"
    values = ["default"]
  }
}

# Use the fetched data to create an EC2 instance
resource "aws_instance" "example" {
  ami           = data.aws_ami.latest_amazon_linux.id
  instance_type = "t2.micro"

  # subnet_id expects a subnet ID, not a network ACL ID, so take one of the
  # subnets found in the default VPC.
  subnet_id              = data.aws_subnets.default.ids[0]
  vpc_security_group_ids = [data.aws_security_group.example.id]
}
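To check what the data sources resolved to, output blocks can be appended to the configuration above. This is a small sketch; the output names are arbitrary.

```hcl
# Show the values the data sources resolved to after terraform apply.
output "resolved_ami_id" {
  value = data.aws_ami.latest_amazon_linux.id
}

output "default_vpc_id" {
  value = data.aws_vpc.default.id
}

output "security_group_id" {
  value = data.aws_security_group.example.id
}
```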
Summary
- Data sources in Terraform are used to fetch and use data from existing resources and external systems.
- They provide a way to integrate Terraform configurations with resources not managed by Terraform.
- Data sources enhance the flexibility and reusability of your Terraform configurations.
Here’s an example of using an AWS data source in Terraform to retrieve information about an existing Amazon S3 bucket.
In this example, we use the aws_s3_bucket data source to look up an existing S3 bucket named example-bucket, then expose several of its attributes with output blocks.
The aws_s3_bucket data source exports attributes such as id, arn, region, bucket_domain_name, bucket_regional_domain_name, hosted_zone_id, website_domain, and website_endpoint, as listed in the documentation: https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/s3_bucket
The bucket policy, versioning, logging, and acceleration settings are not among them. The policy, for instance, is available through the separate aws_s3_bucket_policy data source.
data "aws_s3_bucket" "example" {
  bucket = "example-bucket"
}

output "bucket_name" {
  value = data.aws_s3_bucket.example.bucket
}

output "bucket_region" {
  value = data.aws_s3_bucket.example.region
}

output "bucket_arn" {
  value = data.aws_s3_bucket.example.arn
}

output "bucket_domain_name" {
  value = data.aws_s3_bucket.example.bucket_domain_name
}
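Attributes read from a data source can also feed other parts of a configuration. As a standalone sketch that repeats the bucket lookup, the bucket's ARN is used below to build a read-only IAM policy document. The statement contents and the names read_only and read_only_policy_json are assumptions made for illustration.

```hcl
data "aws_s3_bucket" "example" {
  bucket = "example-bucket"
}

# Build a policy document that allows reading objects in the bucket.
data "aws_iam_policy_document" "read_only" {
  statement {
    actions   = ["s3:GetObject"]
    resources = ["${data.aws_s3_bucket.example.arn}/*"]
  }
}

# The rendered JSON can be attached to an IAM policy or role elsewhere.
output "read_only_policy_json" {
  value = data.aws_iam_policy_document.read_only.json
}
```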
Here’s an example of using an AWS data source in Terraform to retrieve information about an existing availability zone.
In this example, we use the aws_availability_zone data source to look up the availability zone named us-west-2a in the US West (Oregon) region, which assumes the AWS provider is configured for us-west-2. We then expose the zone's ID and name with output blocks.
The aws_availability_zone data source exports attributes such as zone_id, name, name_suffix, zone_type, and opt_in_status, as listed in the documentation: https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/availability_zone
The zone name is exposed as name; there is no zone_name attribute.
data "aws_availability_zone" "example" {
  state = "available"
  name  = "us-west-2a"
}

output "zone_id" {
  value = data.aws_availability_zone.example.zone_id
}

output "zone_name" {
  value = data.aws_availability_zone.example.name
}
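When the zone name is not known in advance, the plural aws_availability_zones data source returns every zone in the provider's region that matches the given state. A minimal sketch, with arbitrary local and output names:

```hcl
# List all availability zones currently available in the provider's region.
data "aws_availability_zones" "available" {
  state = "available"
}

# names is the list of zone names, for example to spread subnets across zones.
output "available_zone_names" {
  value = data.aws_availability_zones.available.names
}
```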