State Management & Core Concepts
1. What is the purpose of a remote backend and why is it crucial for team collaboration?
A **remote backend** stores the Terraform state file in a shared, remote location (like an S3 bucket, GCS bucket, or Azure Blob Storage). By default, Terraform stores state in a local file named `terraform.tfstate`.
It’s crucial for teams because:
- Shared State: It provides a single source of truth for the infrastructure’s state, allowing all team members to work from the same up-to-date information.
- State Locking: Most remote backends support locking. When one person runs `terraform apply`, the state is locked, preventing other team members from running concurrent `apply` commands and corrupting the state.
- Security: It keeps potentially sensitive information out of local developer machines and source control.
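As a rough sketch, a remote backend with locking might be configured as follows. This assumes the S3 backend; the bucket, key, and lock table names are hypothetical placeholders, and both the bucket and the table must already exist.

```hcl
terraform {
  backend "s3" {
    bucket  = "example-terraform-state"      # hypothetical bucket name
    key     = "networking/terraform.tfstate" # path of the state object in the bucket
    region  = "us-east-1"
    encrypt = true                           # encrypt the state object at rest

    # Hypothetical DynamoDB table used for state locking.
    dynamodb_table = "example-terraform-locks"
  }
}
```

After adding or changing a backend block, `terraform init` has to be run again so Terraform can configure it. Locking options differ between backends and Terraform versions, so check the backend documentation for the release you use.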
2. Explain the difference between `terraform refresh`, `terraform plan`, and `terraform apply`.
- `terraform refresh`: Reconciles the state file with the real-world infrastructure. It queries the provider APIs to detect “drift” (changes made outside of Terraform) and updates the state file to match reality. A refresh now runs implicitly as part of `plan` and `apply`, and the standalone command is deprecated in favor of `terraform plan -refresh-only` and `terraform apply -refresh-only`.
- `terraform plan`: Creates an execution plan. It compares the desired state (your HCL code) with the current state (in the state file, after a refresh), and shows you what actions Terraform will take (create, update, or destroy) to make the real infrastructure match your code. It does *not* make any changes.
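- `terraform apply`: Performs the actual create, update, and delete operations on your cloud resources. Given a saved plan file (for example, one written by `terraform plan -out=tfplan`), it executes exactly that plan; run without one, it generates a fresh plan and asks for approval before making changes.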
3. What is a provisioner in Terraform and why should they be used as a last resort?
A **provisioner** is a feature that lets you execute scripts or configuration management tools on a local or remote machine as part of resource creation or destruction. Examples include `local-exec` (runs a command on the machine running Terraform) and `remote-exec` (runs a command on the newly created resource via SSH).
They are a last resort because Terraform cannot model the actions they perform, so their effects never appear in a plan or in the state. If a creation-time provisioner fails, Terraform marks the resource as tainted and plans to destroy and recreate it on the next `apply`, while any partial changes the script made go untracked. The preferred approach is to use cloud-native initialization mechanisms like `user_data` scripts (for EC2) or to build custom machine images (e.g., with Packer) that already have the required software installed.
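A minimal sketch of the `user_data` alternative, assuming an EC2 instance. The `ami_id` variable and the bootstrap script are illustrative placeholders, not a recommended setup.

```hcl
variable "ami_id" {
  type        = string
  description = "Hypothetical input: the AMI to launch"
}

resource "aws_instance" "web" {
  ami           = var.ami_id
  instance_type = "t3.micro"

  # Runs once at first boot on the instance itself (via cloud-init on
  # images that support it), so Terraform needs no SSH connection.
  user_data = <<-EOT
    #!/bin/bash
    # Placeholder bootstrap step; replace with real setup commands.
    echo "bootstrapped" > /var/tmp/bootstrap.txt
  EOT
}
```

Because the script is just an argument of the resource, a change to it shows up in `terraform plan` like any other attribute change, which is not the case for a provisioner.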
Read about Provisioners in the official docs.
4. What are the key components of a Terraform provider?
A Terraform provider is a plugin that is responsible for understanding API interactions and exposing resources for a specific service (like AWS, GCP, or even Datadog).
Its key components are:
- Resources: The primary component. A resource definition maps to an object that can be created, read, updated, and deleted (CRUD), like an `aws_instance`.
- Data Sources: Allow Terraform to fetch information about existing resources that are managed outside of the current Terraform configuration.
- Provider Configuration: The block that configures the provider itself, such as setting the region or providing credentials.
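The three components can be seen side by side in a small configuration. This is only a sketch assuming the AWS provider; the version constraint and the AMI name pattern are assumptions chosen for illustration.

```hcl
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0" # illustrative constraint
    }
  }
}

# Provider configuration: settings for the plugin itself.
provider "aws" {
  region = "us-east-1"
}

# Data source: reads an object that this configuration does not manage.
data "aws_ami" "amazon_linux" {
  most_recent = true
  owners      = ["amazon"]

  filter {
    name   = "name"
    values = ["al2023-ami-*-x86_64"] # assumed name pattern
  }
}

# Resource: an object Terraform creates, updates, and deletes.
resource "aws_instance" "web" {
  ami           = data.aws_ami.amazon_linux.id
  instance_type = "t3.micro"
}
```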
5. What does the `terraform init` command do?
`terraform init` is the first command you run in a new or cloned Terraform project. It performs several initialization steps:
- Backend Initialization: Configures the backend where the state file will be stored (e.g., S3).
- Provider Plugin Installation: Scans the code for provider requirements, downloads the necessary plugins from the Terraform Registry, and installs them in the `.terraform` directory.
- Module Installation: If the configuration uses modules from external sources (like Git or the Registry), it downloads them into the `.terraform/modules` directory.
Advanced HCL & Expressions
6. What is the difference between using `count` and `for_each` to create multiple resources? When is `for_each` preferred?
Both `count` and `for_each` create multiple instances of a resource.
- `count`: Creates a list of identical resources. The instances are identified in the state file by an integer index (e.g., `aws_instance.server[0]`, `aws_instance.server[1]`). If you remove an item from the middle of the list, Terraform will see it as a change to all subsequent items, potentially causing them to be destroyed and recreated.
- `for_each`: Iterates over a map or a set of strings. The instances are identified in the state file by the map key or set string (e.g., `aws_instance.server["app-a"]`, `aws_instance.server["app-b"]`).
**`for_each` is almost always preferred** because it creates a stable mapping between your configuration and the resources. If you remove an item, only that specific resource is destroyed, and the others are unaffected. This makes your configuration more resilient to changes.
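The difference in addressing can be sketched with a resource that needs only a name. The user names and the `count-`/`each-` prefixes below are made up for the example.

```hcl
variable "user_names" {
  type    = list(string)
  default = ["alice", "bob", "carol"]
}

# count: instances are addressed as aws_iam_user.by_index[0], [1], [2].
# Removing "bob" from the list moves "carol" from index 2 to index 1,
# so Terraform plans a change for index 1 and a destroy for index 2.
resource "aws_iam_user" "by_index" {
  count = length(var.user_names)
  name  = "count-${var.user_names[count.index]}"
}

# for_each: instances are addressed as aws_iam_user.by_key["alice"], etc.
# Removing "bob" only affects aws_iam_user.by_key["bob"].
resource "aws_iam_user" "by_key" {
  for_each = toset(var.user_names)
  name     = "each-${each.key}"
}
```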
Read the documentation on `for_each`.
7. What are dynamic blocks and what problem do they solve?
A **dynamic block** allows you to dynamically generate nested configuration blocks (like `ingress` rules in a security group or `setting` blocks in an App Service) inside a resource argument.
It solves the problem of needing to create a variable number of these nested blocks. Without dynamic blocks, you would have to resort to complex workarounds or create separate resources for each variation. A dynamic block iterates over a complex data structure (like a list of objects) and renders a configuration block for each item, making your code much cleaner and more reusable.
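For instance, a security group with a variable number of `ingress` rules could be written roughly like this. The `ingress_rules` variable and its default values are hypothetical.

```hcl
variable "ingress_rules" {
  type = list(object({
    port        = number
    cidr_blocks = list(string)
  }))
  default = [
    { port = 80, cidr_blocks = ["0.0.0.0/0"] },
    { port = 443, cidr_blocks = ["0.0.0.0/0"] },
  ]
}

resource "aws_security_group" "web" {
  name = "web-sg"

  # One ingress block is rendered per element of var.ingress_rules.
  # Inside content, the iterator is named after the block: ingress.value.
  dynamic "ingress" {
    for_each = var.ingress_rules
    content {
      from_port   = ingress.value.port
      to_port     = ingress.value.port
      protocol    = "tcp"
      cidr_blocks = ingress.value.cidr_blocks
    }
  }
}
```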
Explore Dynamic Blocks in the documentation.
8. What is the purpose of the `locals` block?
A `locals` block is used to define local variables within a module. Its purpose is to reduce repetition and improve readability. You can use it to assign a short, descriptive name to a complex expression or to combine several values into a new data structure. Unlike input variables, local values cannot be overridden from outside the module. They are purely for making the code inside a module cleaner and easier to maintain.
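A small sketch of both uses, naming an expression and combining values. The variable names, defaults, and tag keys are illustrative.

```hcl
variable "project" {
  type    = string
  default = "shop"
}

variable "environment" {
  type    = string
  default = "staging"
}

locals {
  # A short name for an expression that would otherwise be repeated.
  name_prefix = "${var.project}-${var.environment}"

  # Several values combined into a new data structure.
  common_tags = {
    Project     = var.project
    Environment = var.environment
    ManagedBy   = "terraform"
  }
}

resource "aws_s3_bucket" "logs" {
  bucket = "${local.name_prefix}-logs"
  tags   = local.common_tags
}
```

Note that the block is declared as `locals` but its values are referenced with the singular `local.` prefix.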
9. Explain the `try` and `can` functions.
- `try(…)`: Evaluates its arguments in order and returns the result of the first one that succeeds without errors. If every argument fails, `try` itself returns an error, so the last argument is usually a literal fallback value that cannot fail. It’s useful for gracefully handling cases where an attribute might not exist.
- `can(…)`: Evaluates an expression and returns `true` if it can be evaluated successfully, or `false` if it produces an error. It’s a boolean check used for conditional logic where you need to know if a value is available before attempting to use it.
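A short sketch of typical uses for each function. The `settings` and `cidr` variables are hypothetical.

```hcl
variable "settings" {
  type    = any
  default = {}
}

variable "cidr" {
  type = string

  validation {
    # can() converts an evaluation error into false, so an invalid
    # CIDR string fails validation instead of crashing the expression.
    condition     = can(cidrhost(var.cidr, 0))
    error_message = "The cidr value must be a valid CIDR block, such as 10.0.0.0/16."
  }
}

locals {
  # try() returns the first argument that evaluates without error.
  # The literal at the end acts as the fallback when the attribute is absent.
  instance_type = try(var.settings.instance_type, "t3.micro")
}
```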
10. How do you create conditional resources in Terraform?
The standard way to create a conditional resource is to use the `count` meta-argument. You can set `count` based on a boolean variable. If the condition is true, `count` is set to 1, creating one instance of the resource. If the condition is false, `count` is set to 0, and no instances are created.
Example: `count = var.create_public_ip ? 1 : 0`
For resources created with `for_each`, you can achieve a similar effect by passing a conditionally filtered map to `for_each`.
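Both patterns might look like the following sketch. The `buckets` variable, its `enabled` flag, and the bucket naming are assumptions made for the example.

```hcl
variable "create_public_ip" {
  type    = bool
  default = false
}

variable "buckets" {
  type = map(object({
    enabled = bool
  }))
  default = {
    logs    = { enabled = true }
    scratch = { enabled = false }
  }
}

# count-based condition: zero or one instance.
resource "aws_eip" "public" {
  count = var.create_public_ip ? 1 : 0
}

# for_each-based condition: the for expression keeps only enabled entries.
resource "aws_s3_bucket" "this" {
  for_each = { for name, cfg in var.buckets : name => cfg if cfg.enabled }
  bucket   = "example-${each.key}"
}

# A conditional resource is a list of zero or one elements, so references
# to it must handle the empty case; one() returns null for an empty list.
output "public_ip" {
  value = one(aws_eip.public[*].public_ip)
}
```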
11. What is a complex object and how would you define a variable for it?
A complex object is a data structure with nested attributes, like a map of objects or a list of maps. You define a variable for it using a type constructor like `object(…)` or `map(…)`.
Example for a variable that expects a map of user objects:
    variable "users" {
      type = map(object({
        email    = string
        is_admin = bool
      }))
    }
Defining the type explicitly makes the configuration more robust and provides better validation and IDE support.
Modules & Code Organization
12. What are the key benefits of using Terraform modules?
- Reusability: Package common infrastructure patterns into a reusable unit that can be called multiple times.
- Organization: Break down a complex configuration into smaller, more manageable pieces.
- Encapsulation: Hide the complexity of a set of resources behind a simple interface (the module’s input variables).
- Consistency: Enforce standards and best practices by providing a standardized way to deploy a certain piece of infrastructure.
13. What are the different sources you can use for a module?
You can source modules from various locations:
- Local Paths: A directory on your local filesystem (e.g., `source = "./modules/vpc"`).
- Terraform Registry: The public or a private Terraform Registry (e.g., `source = "terraform-aws-modules/vpc/aws"`).
- Git Repository: Any Git repository over protocols like HTTPS or SSH. You can specify a branch, tag, or commit hash (e.g., `source = "git::https://example.com/vpc.git?ref=v1.0.0"`).
- HTTP URLs: A URL that points to a `.zip` archive of a module.
14. How do you manage outputs from a module?
You define outputs in a module’s `outputs.tf` file using an `output` block. This exposes specific values from the resources created inside the module. In the parent module that calls this module, you can then access these outputs using the syntax `module.<MODULE_NAME>.<OUTPUT_NAME>`.
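A sketch of the round trip, shown as two files in one listing. The module path, the output name, and the `aws_vpc.this` resource assumed to exist inside the child module are all illustrative.

```hcl
# --- modules/vpc/outputs.tf (inside the child module) ---
output "vpc_id" {
  description = "ID of the VPC created by this module"
  value       = aws_vpc.this.id # assumes the module defines aws_vpc.this
}

# --- main.tf (in the calling module) ---
module "vpc" {
  source = "./modules/vpc"
}

resource "aws_subnet" "app" {
  vpc_id     = module.vpc.vpc_id # read the child module's output
  cidr_block = "10.0.1.0/24"
}
```

Only values declared as outputs are visible to the caller; resources inside the module cannot be referenced directly from outside it.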
15. How can you pass provider configurations to a child module?
By default, a child module inherits the provider configurations from its parent. However, if you need to use a different provider configuration (e.g., deploying a resource into a different AWS region or account), you can pass it explicitly. You define multiple provider blocks in the parent module with an `alias`, and then pass the aliased provider to the child module via the `providers` meta-argument in the `module` block.
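As a sketch, passing a second AWS region to a child module might look like this. The alias name and the module path are hypothetical.

```hcl
# Default (unaliased) provider configuration.
provider "aws" {
  region = "us-east-1"
}

# Additional configuration for another region.
provider "aws" {
  alias  = "west"
  region = "us-west-2"
}

module "vpc_west" {
  source = "./modules/vpc"

  # Inside the module, the provider named "aws" now refers to aws.west.
  providers = {
    aws = aws.west
  }
}
```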
Collaboration & Workflows
16. Compare using Terraform Workspaces vs. separate directories for managing environments.
- Directories: Each environment (dev, staging, prod) has its own directory with its own set of `.tf` files and its own state file. This provides strong isolation but can lead to code duplication. This is often the recommended approach for production systems.
- Workspaces: Allows you to manage multiple states from a single directory of configuration files. Each workspace has its own separate state file. This is useful for feature branch development or temporary environments but can be problematic for managing distinct long-lived environments like dev/prod, as all environments share the same code and a change cannot be easily promoted from one to another.
17. What is Terragrunt and what problems does it solve?
**Terragrunt** is a thin wrapper around Terraform that provides extra tools for working with multiple Terraform modules. It helps solve common problems in large-scale Terraform projects:
- DRY (Don’t Repeat Yourself): It allows you to define your backend configuration, provider versions, and input variables once in a parent `terragrunt.hcl` file and inherit them across multiple modules, reducing boilerplate.
- Orchestration: It helps manage dependencies between different Terraform modules, allowing you to `apply` or `destroy` multiple modules in the correct order.
- State Management: It simplifies the management of remote state configuration for multiple environments.
18. How does state locking work and why is it important?
State locking prevents multiple users from running Terraform operations on the same state file at the same time. When a command that could modify the state (like `apply`) is run, Terraform places a “lock” on the state file via the remote backend (e.g., using a DynamoDB table). If another user tries to run a conflicting command, they will receive an error until the first command completes and releases the lock. This is critical for preventing race conditions and state corruption in a team environment.
19. How would you structure a large Terraform project with multiple applications and environments?
A common approach is to use a directory-based structure that separates environments and components.
For example:
    environments/
    ├── prod/
    │   ├── networking/
    │   │   └── main.tf
    │   └── app1/
    │       └── main.tf
    └── staging/
        ├── networking/
        └── app1/
This structure provides strong isolation. Reusable infrastructure patterns (like how to deploy a standard web app) would be defined in a separate `modules` directory and called from the environment-specific code. Tools like Terragrunt are often used to manage a structure like this and keep the code DRY.
Operations & CI/CD
20. What is the `terraform import` command and what are its limitations?
The `terraform import` command is used to bring existing, manually-created infrastructure under Terraform’s management. You provide the command with a resource address in your code and the ID of the real-world resource. Terraform then reads the resource’s current state and writes it to the state file.
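Limitation: The `terraform import` command does **not** generate the HCL code for you. You must write the resource block in your `.tf` files manually first; the command only populates the state file. Terraform 1.5 and later also support `import` blocks in configuration, and `terraform plan -generate-config-out=generated.tf` can draft the matching resource blocks, which should still be reviewed by hand.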
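On Terraform 1.5 and later, an import can also be declared in configuration rather than run as a one-off command. A minimal sketch, with a hypothetical bucket name:

```hcl
# Declarative import: picked up by the next plan/apply.
# Rough CLI equivalent:
#   terraform import aws_s3_bucket.legacy example-legacy-bucket
import {
  to = aws_s3_bucket.legacy
  id = "example-legacy-bucket" # ID of the existing real-world resource
}

# The resource block the imported object is bound to.
resource "aws_s3_bucket" "legacy" {
  bucket = "example-legacy-bucket"
}
```

Unlike the CLI command, an `import` block appears in the plan output, so the import can be reviewed in a pull request before it touches the state.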
Read the documentation for `terraform import`.
21. What does the `-target` flag do and why should it be used with extreme caution?
The `-target` flag restricts Terraform’s `plan` and `apply` commands to a specific resource or module, plus whatever that target depends on, and ignores everything else. It’s intended for exceptional circumstances, like recovering from a previous failed run or working around a bug.
It should be used with caution because it produces a partial apply. Terraform will not update other resources that depend on the targeted resource, potentially leaving the infrastructure and the state inconsistent with your configuration. It sidesteps the full dependency graph, which is one of Terraform’s core safety features.
22. Describe a typical CI/CD workflow for Terraform.
- A developer creates a pull request (PR) with infrastructure changes.
- The CI system automatically runs `terraform init`, `terraform validate`, and `terraform fmt -check` to check for correctness and style.
- The CI system then runs `terraform plan` and posts the output as a comment on the PR for team review.
- Once the PR is approved and merged into the main branch, a separate CI/CD pipeline triggers.
- This pipeline runs `terraform init` and then `terraform apply -auto-approve` to deploy the changes to the target environment.
23. How do you handle sensitive data like passwords or API keys in Terraform?
You should never hardcode secrets in your `.tf` files. The best practice is to fetch them from a dedicated secrets management tool at runtime.
The flow is:
- Store the secret in a service like AWS Secrets Manager, GCP Secret Manager, or HashiCorp Vault.
- Use a `data` source in your Terraform code to read the secret from the secrets manager.
- Pass the value from the data source to the resource argument.
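You can also mark variables and outputs as `sensitive = true` to prevent Terraform from showing their values in CLI output such as plans. The values are still written to the state file in plain text, so the state itself must be protected.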
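The flow above can be sketched as follows, assuming AWS Secrets Manager. The secret name, database settings, and the `api_key` variable are hypothetical.

```hcl
# Step 2: read the secret at plan/apply time instead of hardcoding it.
data "aws_secretsmanager_secret_version" "db" {
  secret_id = "prod/db/password" # hypothetical secret name
}

# Step 3: pass the value to the resource argument.
resource "aws_db_instance" "main" {
  identifier          = "example-db"
  engine              = "postgres"
  instance_class      = "db.t3.micro"
  allocated_storage   = 20
  username            = "app"
  password            = data.aws_secretsmanager_secret_version.db.secret_string
  skip_final_snapshot = true
}

# A sensitive input variable: its value is redacted in plan output.
variable "api_key" {
  type      = string
  sensitive = true
}
```

The secret value read by the data source still ends up in the state file, which is one more reason to keep state in an encrypted, access-controlled remote backend.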
24. What are some strategies for upgrading the Terraform version and provider versions in a large project?
Upgrades should be done incrementally and carefully.
- Read Changelogs: Carefully read the release notes for any breaking changes.
- Use Version Constraints: Pin provider versions using `~>` (pessimistic constraint) to avoid accidentally picking up major breaking changes.
- Upgrade Incrementally: Upgrade one minor version at a time, running `terraform plan` after each step to see the impact.
- Test in a Non-Production Environment: Always apply and test the upgrade in a staging or development environment before touching production.
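A sketch of what pinning with the pessimistic operator might look like. The version numbers are illustrative, not recommendations.

```hcl
terraform {
  # Allows 1.6 and later 1.x releases, but not 2.0.
  required_version = "~> 1.6"

  required_providers {
    aws = {
      source = "hashicorp/aws"
      # Allows 5.40 and later 5.x releases, but not 6.0.
      version = "~> 5.40"
    }
  }
}
```

The exact provider versions selected by `terraform init` are recorded in `.terraform.lock.hcl`; committing that file keeps every team member and CI run on the same versions until someone deliberately runs `terraform init -upgrade`.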
25. What does it mean if a resource is “tainted”?
A tainted resource is one that Terraform has marked as degraded or damaged. When you run `terraform apply`, Terraform will plan to destroy the tainted resource and then recreate it from scratch, even if there are no configuration changes. You can manually taint a resource using the `terraform taint` command (now deprecated in favor of `terraform apply -replace`). It’s useful for forcing the recreation of a resource that has become corrupted due to an external factor or a failed provisioner.
Ecosystem & Advanced Patterns
26. What is Policy as Code and how can it be used with Terraform?
Policy as Code (PaC) is the practice of defining rules and policies for your infrastructure in a high-level, human-readable language. These policies are then checked automatically as part of your CI/CD pipeline.
For Terraform, this can be implemented with tools like:
- HashiCorp Sentinel: Integrated into Terraform Cloud/Enterprise. It allows you to define fine-grained policies (e.g., “no S3 buckets can be public” or “EC2 instances must be tagged with ‘owner’”).
- Open Policy Agent (OPA): An open-source, general-purpose policy engine. You can use it with tools like `conftest` to check the JSON output of a `terraform plan` against your policies.
27. How does Terraform handle dependencies between resources?
Terraform automatically builds a dependency graph by analyzing the references between resources in your code. For example, if an `aws_instance` resource references the ID of an `aws_subnet`, Terraform knows it must create the subnet before it creates the instance. This is called an **implicit dependency**.
In rare cases where there is a dependency but no direct reference in the code, you can define an **explicit dependency** using the `depends_on` meta-argument to tell Terraform the correct creation order.
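Both kinds of dependency can be shown in one small sketch. The CIDR ranges are arbitrary, and the Elastic IP is just one example of a resource that needs something it never references.

```hcl
resource "aws_vpc" "main" {
  cidr_block = "10.0.0.0/16"
}

# Implicit dependency: the reference to aws_vpc.main.id tells Terraform
# to create the VPC before the subnet.
resource "aws_subnet" "app" {
  vpc_id     = aws_vpc.main.id
  cidr_block = "10.0.1.0/24"
}

resource "aws_internet_gateway" "gw" {
  vpc_id = aws_vpc.main.id
}

# Explicit dependency: nothing here references the gateway, but the
# address should only be allocated once the gateway exists.
resource "aws_eip" "nat" {
  depends_on = [aws_internet_gateway.gw]
}
```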
28. What is the lifecycle block in a resource definition?
The `lifecycle` block is a special meta-argument that customizes the behavior of a resource during its lifecycle. Key arguments include:
- `create_before_destroy`: If a change requires a resource to be replaced, this ensures the new resource is created *before* the old one is destroyed, allowing for zero-downtime replacements.
- `prevent_destroy`: A safety mechanism that will cause Terraform to error out if it tries to destroy this resource. Useful for protecting critical stateful resources like databases.
- `ignore_changes`: Tells Terraform to ignore changes to specific attributes that might be managed by an external process.
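A sketch showing the three arguments in use. The resources, the `ami_id` variable, and the bucket name are illustrative.

```hcl
variable "ami_id" {
  type        = string
  description = "Hypothetical input: the AMI to launch"
}

resource "aws_instance" "web" {
  ami           = var.ami_id
  instance_type = "t3.micro"

  tags = {
    Name = "web"
  }

  lifecycle {
    # On replacement, build the new instance before destroying the old one.
    create_before_destroy = true

    # Do not plan updates when tags are changed outside of Terraform.
    ignore_changes = [tags]
  }
}

resource "aws_s3_bucket" "audit_logs" {
  bucket = "example-audit-logs"

  lifecycle {
    # Any plan that would destroy this bucket fails with an error.
    prevent_destroy = true
  }
}
```

`prevent_destroy` only protects the resource while the block remains in the configuration; deleting the whole resource block removes the protection along with it.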
29. How can you write a custom provider for Terraform?
While most developers won’t write a custom provider from scratch, understanding the process is valuable. You would typically use the **Terraform Plugin Framework**, which is a Go framework for building providers. The process involves:
- Defining a schema for your resources and data sources.
- Implementing the CRUD (Create, Read, Update, Delete) functions for each resource. These functions will contain the logic for calling the target service’s API.
- Implementing the Read function for each data source.
- Compiling the Go code into a binary, which is the provider plugin that Terraform Core executes.
30. What is the difference between a `variable` and a `data` source?
- A `variable` is a parameter passed into your Terraform configuration, either from the command line, a `.tfvars` file, or a parent module. It’s for user-provided input.
- A `data` source is used to fetch information from an external source (like a cloud provider’s API) about resources that are *not* managed by your current Terraform configuration. For example, you would use a data source to get the ID of a pre-existing VPC so you can create a new subnet inside it.
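The VPC example can be sketched like this, with one value supplied by the user and one looked up from the provider. The tag value and the default CIDR are hypothetical.

```hcl
# Variable: supplied by whoever runs or calls this configuration.
variable "subnet_cidr" {
  type    = string
  default = "10.0.2.0/24"
}

# Data source: looked up from the provider's API at plan time.
data "aws_vpc" "existing" {
  tags = {
    Name = "shared-vpc" # hypothetical tag on the pre-existing VPC
  }
}

resource "aws_subnet" "new" {
  vpc_id     = data.aws_vpc.existing.id # from the data source
  cidr_block = var.subnet_cidr          # from the variable
}
```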


