---
title: Principles of AI-Driven Infrastructure
date: 2026-03-08
draft: false
---

AI agents are great at generating boilerplate code, parsing logs, and explaining legacy code. But infrastructure is not ordinary code. It has state, irreversibility, cascading dependencies, loose coupling, and secrets right in the workflow. For an LLM, renaming a resource and dropping a database are just string changes.

My principles may not be perfect, but they reflect what I have learned over a year and a half of working with agents in infrastructure.


## Preparing the Environment

### 1. Enforced Constraints

An agent should not have to decide not to break production – it simply should not have the ability to do so.

LLMs follow prohibitions unreliably, especially in long sessions. Therefore, the agent’s environment should interact with infrastructure in read-only mode. Mutating operations should only go through a separate pipeline after explicit human approval.
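One way to make that enforcement concrete is a command gate that only permits whitelisted read-only invocations. A minimal sketch; the tool and verb lists here are illustrative assumptions, not a complete policy:

```python
# Sketch of a read-only command gate for an agent's shell access.
# Mutations never run here at all; they go through the approval pipeline.

READ_ONLY = {
    ("tofu", "plan"), ("tofu", "validate"), ("tofu", "show"),
    ("kubectl", "get"), ("kubectl", "describe"),
    ("helm", "diff"), ("helm", "lint"),
}

MUTATING_VERBS = {"apply", "destroy", "delete", "create", "upgrade", "patch"}

def is_allowed(command: list[str]) -> bool:
    """Allow only explicitly whitelisted read-only invocations."""
    if len(command) < 2:
        return False
    tool, verb = command[0], command[1]
    if verb in MUTATING_VERBS:
        return False  # mutating operations belong to the human-approved pipeline
    return (tool, verb) in READ_ONLY

print(is_allowed(["kubectl", "get", "pods"]))   # True
print(is_allowed(["tofu", "apply"]))            # False
```

Note the deny-by-default shape: anything not explicitly on the list is rejected, which matches the principle that the agent lacks the ability rather than the permission.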

### 2. Secrets Outside the Agent’s Context

The context window is a potential leak channel. OpenTofu/Terraform state files contain passwords, keys, and endpoints in plain text. Helm values and Ansible group_vars often contain credentials. Real-world attacks have shown that agents treat private data the same as any other data by default – making no exceptions for it.

Protection methods:

  • Keep secrets completely inaccessible to the agent
  • Block the agent’s file access to credential stores
  • Avoid mentioning secret files in the context, and place them in non-standard locations
  • Use decoy files
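A small sketch of a complementary measure: a pre-flight redactor that strips anything resembling a secret before text enters the context. The patterns are illustrative only; real scanners such as gitleaks or trufflehog ship far richer rule sets:

```python
import re

# Illustrative secret patterns; not an exhaustive or production-grade list.
SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),                      # AWS access key id
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    re.compile(r"(?i)(password|secret|token)\s*[:=]\s*\S+"),
]

def redact(text: str) -> str:
    """Replace anything that looks like a secret before it enters the context."""
    for pattern in SECRET_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text

state_snippet = 'db_password = "hunter2"'
print(redact(state_snippet))
```

Redaction is a fallback, not a substitute for the first bullet: the safest secret is one the agent's environment cannot read at all.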

### 3. Verification of Sources and Supply Chains

The agent uses MCP servers, skills, third-party modules, community charts, IDE extensions – each of them is a potential attack vector.

Rule: the agent connects only to explicitly allowed sources with pinned versions and lock files. A new MCP/skill/module/extension is added only after a team review.
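A minimal sketch of how the pinning rule might be enforced, assuming a hypothetical lock-file schema that maps each allowed source to a pinned version and a content hash:

```python
import hashlib

def verify_source(name: str, version: str, payload: bytes, lock: dict) -> bool:
    """A source is usable only if it appears in the lock file, at the pinned
    version, and its content hash matches the recorded one."""
    entry = lock.get(name)
    if entry is None or entry["version"] != version:
        return False
    return hashlib.sha256(payload).hexdigest() == entry["sha256"]

# Demo with an invented entry; real lock files would be reviewed and committed.
payload = b"example mcp server bundle"
lock = {"mcp-example": {"version": "1.4.2",
                        "sha256": hashlib.sha256(payload).hexdigest()}}
print(verify_source("mcp-example", "1.4.2", payload, lock))  # True
print(verify_source("mcp-example", "1.5.0", payload, lock))  # False
```

The content hash matters as much as the version pin: it catches a tampered artifact published under an already-approved version number.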


## Task Definition

### 4. Context = Code + State + Decisions

An LLM sees configuration files but does not know what is actually running on the servers right now. Without context, the agent will substitute “reasonable defaults” from training data. Therefore, the agent should receive data from code, from you, from tofu plan, kubectl get, ansible-inventory, from descriptions of dependencies, constraints (budget, SLA, compliance), and the reasoning behind decisions (ADR, SRS). Everything, including what “everyone already knows,” must be in the agent’s context.
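One way to sketch this gathering step. The command list and the `run` callable are placeholder assumptions; in practice `run` would shell out to the real tools:

```python
# Assemble the agent's context from live state plus human-supplied
# constraints and decision records.

CONTEXT_SOURCES = {
    "desired_state": ["tofu", "plan", "-no-color"],
    "live_workloads": ["kubectl", "get", "all", "-A", "-o", "wide"],
    "inventory": ["ansible-inventory", "--list"],
}

def build_context(run, constraints: str, decisions: str) -> str:
    """Concatenate tool output with constraints and ADRs into one document."""
    parts = [f"## constraints\n{constraints}", f"## decisions\n{decisions}"]
    for name, cmd in CONTEXT_SOURCES.items():
        parts.append(f"## {name}\n{run(cmd)}")
    return "\n\n".join(parts)

# Stand-in runner so the sketch is self-contained.
fake_run = lambda cmd: f"<output of {' '.join(cmd)}>"
ctx = build_context(fake_run,
                    "budget: $500/mo, SLA 99.9%",
                    "ADR-012: single NAT gateway per region")
print(ctx.splitlines()[0])   # "## constraints"
```

The point of the structure is completeness, not format: constraints and decision records ride alongside live state, so the "everyone already knows" knowledge is actually in the window.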

### 5. Only Delegate What Is Cheap to Verify

An agent can always make a mistake – the question is how quickly and reliably you will catch it.

The cost of verification is composed of several criteria:

| Criterion | Cheap to verify | Expensive to verify |
|---|---|---|
| Automated checks | Tools exist: tofu validate, helm lint, molecule test | Manual analysis and expert judgment only |
| Change visibility | Changes are fully visible in plan/diff | Side effects are hidden (DNS propagation, IAM policy evaluation, network routes) |
| Number of dependencies | Isolated resource, few connections | Cascading dependencies between components |
| Reversibility | Rollback in seconds: tofu apply of the previous state | Rollback is impossible or requires downtime (data migration, CIDR change) |
| Required expertise | Any team engineer can understand the plan | Deep context needed (network topology, migration order, compliance) |
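The criteria above can be turned into a crude delegation gate. The boolean flags and the threshold here are arbitrary illustrations, not a calibrated model:

```python
# Toy scoring over the table's five criteria: True = the task lands on
# the "cheap to verify" side for that criterion.

CRITERIA = ["automated_checks", "change_visibility",
            "few_dependencies", "reversibility", "common_expertise"]

def cheap_to_verify(task: dict, threshold: int = 4) -> bool:
    """Delegate only when most criteria land on the cheap side."""
    score = sum(1 for c in CRITERIA if task.get(c, False))
    return score >= threshold

add_subnet = dict.fromkeys(CRITERIA, True)        # everything is cheap
cidr_migration = {"automated_checks": True}       # everything else is expensive
print(cheap_to_verify(add_subnet))      # True
print(cheap_to_verify(cidr_migration))  # False
```

Even a crude gate like this forces the delegation question to be asked explicitly per task instead of defaulting to "let the agent try".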

### 6. Explicit Priority Hierarchy

An agent does not “feel” what data loss or downtime means. By default, these matter less to it than its instructions, and less still than the user’s latest request.

Without a hierarchy, the agent may sacrifice data while trying, for example, to optimize for speed or code “cleanliness.” Set an explicit priority order in AGENTS.md:

  1. Data integrity and service availability
  2. Security
  3. Standards compliance
  4. Implementation correctness
  5. Speed and convenience

Each level must not be violated for the sake of a lower one. The agent does not delete or recreate an RDS for “cleanliness” (convenience vs. data integrity). It does not open 0.0.0.0/0 to “make it work” (correctness vs. security). It does not set force_destroy = true on S3 to simplify code (convenience vs. data integrity).
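Two of these violations can be caught deterministically before apply. A sketch, assuming a plan structure loosely modeled on tofu show -json output (field names simplified for illustration):

```python
# Deterministic guard for two priority violations: force_destroy on storage
# (convenience over data integrity) and world-open ingress (correctness
# over security).

def priority_violations(plan: dict) -> list[str]:
    violations = []
    for res in plan.get("resource_changes", []):
        after = res.get("change", {}).get("after") or {}
        if after.get("force_destroy") is True:
            violations.append(
                f"{res['address']}: force_destroy trades data integrity for convenience")
        for rule in after.get("ingress", []):
            if "0.0.0.0/0" in rule.get("cidr_blocks", []):
                violations.append(
                    f"{res['address']}: 0.0.0.0/0 ingress trades security for correctness")
    return violations

plan = {"resource_changes": [
    {"address": "aws_s3_bucket.logs",
     "change": {"after": {"force_destroy": True}}},
]}
print(priority_violations(plan))
```

In practice this is exactly the kind of rule OPA/Sentinel policies express; the sketch just shows that the priority list is mechanically checkable, not merely advisory.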

Independent automated reviewers should be configured with the same priority order; they should treat changes as “someone else’s” code and not be inclined to rationalize them.


## Code Generation

### 7. Small Steps, Frequent Checkpoints

LLMs degrade on long chains of reasoning, and infrastructure changes cascade, with less type safety and fewer explicit connections than regular code.

One iteration = one logical change = one tofu plan / helm diff / ansible --check = one commit. Not “migrate the entire VPC,” but “add a subnet in a new AZ, show the plan.” Each checkpoint is a place where a human can stop and revert a single commit rather than an entire refactoring.
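The iteration discipline can be sketched as a tiny loop: plan one change, commit only if the plan succeeded and actually found changes. Here `run` is an injected placeholder for shelling out:

```python
def checkpoint(run, message: str) -> bool:
    """Plan exactly one logical change; commit only when the plan ran cleanly
    and found changes (tofu plan -detailed-exitcode returns 2 in that case)."""
    code = run(["tofu", "plan", "-detailed-exitcode"])
    if code != 2:               # 0 = nothing to change, 1 = plan failed
        return False
    run(["git", "commit", "-am", message])
    return True

calls = []
def fake_run(cmd):              # stand-in for subprocess; records invocations
    calls.append(cmd)
    return 2                    # pretend the plan found changes
print(checkpoint(fake_run, "add subnet in a new AZ"))   # True
```

Each successful call is one revertable commit; a failed plan stops the loop before anything is recorded, which is exactly the human-sized checkpoint the principle asks for.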


## Result Verification

### 8. Deterministic Verification Instead of Trust

LLMs are poor at finding their own mistakes – asking them to “check” often leads to a convincing justification of the wrong answer.

The agent’s output is verified by a chain of tools. The key amplifier is a closed feedback loop: the agent sees the output, fixes the error, and reruns the check. The quality of the result is proportional to the quality of the feedback.
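A minimal sketch of such a loop, with `check` and `agent_fix` injected as placeholders for the real deterministic tools and the real agent:

```python
def feedback_loop(check, agent_fix, config: str, max_iters: int = 3):
    """Return a config that passes `check`, or None after max_iters attempts.
    The agent always sees the raw tool output, never its own self-assessment."""
    for _ in range(max_iters):
        ok, output = check(config)
        if ok:
            return config
        config = agent_fix(config, output)   # closed loop: tool output in
    return None

# Toy deterministic check: the config must declare a region.
check = lambda cfg: ("region" in cfg, "error: missing region")
agent_fix = lambda cfg, out: cfg + '\nregion = "eu-west-1"'
print(feedback_loop(check, agent_fix, 'instance_type = "t3.micro"'))
```

The bounded iteration count matters: an agent that cannot converge on a passing check should surface to a human rather than loop forever.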

### 9. Automated Blast Radius Assessment

AI does not understand consequences – changing an instance type and deleting an RDS are equivalent to it. Therefore, the blast radius should be computed automatically from tool output: OPA/Sentinel policies on tofu plan, operation classification (create/update/destroy), counting affected resources and their criticality.

The agent can highlight a disproportionate blast radius relative to the task at hand, but it should not assess the degree itself, as this lulls the operator into complacency.
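A sketch of computing the blast radius from plan output. The criticality weights and the resource-change shape (loosely modeled on tofu show -json) are assumptions:

```python
# Criticality-weighted blast radius: destructive actions dominate the score.
# Weights are invented for illustration; a real map would be team-maintained.

CRITICALITY = {"aws_db_instance": 10, "aws_s3_bucket": 5, "aws_instance": 3}

def blast_radius(resource_changes: list[dict]) -> int:
    score = 0
    for res in resource_changes:
        actions = res["change"]["actions"]
        weight = CRITICALITY.get(res["type"], 1)
        if "delete" in actions:
            score += weight * 10        # destroys are an order of magnitude worse
        elif "update" in actions or "create" in actions:
            score += weight
    return score

changes = [
    {"type": "aws_instance", "change": {"actions": ["update"]}},              # resize
    {"type": "aws_db_instance", "change": {"actions": ["delete", "create"]}}, # replace!
]
print(blast_radius(changes))   # 3 + 100 = 103
```

The number itself is less important than who produces it: a tool emits the score, a human judges whether it is proportionate to the task.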

### 10. Cost Control

An AI agent generates a “reasonable” configuration from training data but has no idea about your budget.

A GPU instance instead of CPU, a NAT Gateway in every AZ, an RDS with excessive resources – all of this looks correct in the plan, passes validation, but can cost many times more than necessary.

Cost estimation (infracost, usage-based estimation) should be part of the feedback loop on par with tofu validate and plan. The agent sees the cost delta and stops if it exceeds the threshold.
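A sketch of that stop condition over infracost's JSON report; the top-level diffTotalMonthlyCost field name follows infracost's JSON output, but treat the exact schema as an assumption:

```python
import json

def cost_gate(infracost_json: str, max_monthly_delta: float) -> bool:
    """Stop the agent when the projected monthly cost delta exceeds budget."""
    report = json.loads(infracost_json)
    delta = float(report.get("diffTotalMonthlyCost", "0"))
    return delta <= max_monthly_delta

report = json.dumps({"diffTotalMonthlyCost": "412.50"})
print(cost_gate(report, max_monthly_delta=100.0))   # False: over budget
```

Wired into the same feedback loop as validate and plan, a failing gate gives the agent the cost delta as tool output to react to, instead of leaving budget as unstated context.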


## Team and Process

### 11. Counteracting Skill Degradation

AI-generated code looks professional, passes linters – and dulls vigilance.

Research shows weaker cognitive engagement among users of AI tools: they remember less and feel less ownership. A team that has stopped understanding its own infrastructure will not be able to independently respond to an incident when the agent is unavailable or cannot get full context.

Countermeasures: regular “manual days” without the agent, on-call rotation with mandatory infrastructure walkthroughs, reviewing AI code with the same scrutiny as a junior’s code, periodic incident drills without AI access.


All of these principles boil down to one idea: an AI agent is a powerful generator, but not an arbiter. It writes the code, while deterministic tools and humans decide whether that code can be applied. The cheaper it is to verify the result, the more you can delegate.