Developer AI · May 7, 2026 · 7 min read

AI Agents That Deploy Themselves: The Infrastructure Stack for Autonomous Agents

AI agents can now create cloud accounts, buy domains, and deploy to production without human setup. Here is the emerging infrastructure stack and what it means for developers.

By NeuralStackly


Coding agents write code. That part works. What happens after the code is written (the deploy, the domain, the billing, the cloud account) has always needed a human. That constraint is dissolving. In the last few weeks, a stack of infrastructure designed specifically for autonomous agents has started to take shape. Agents can now create cloud accounts, purchase domains, provision databases, and deploy to production with no human copying API tokens or filling out signup forms.

This is not a theoretical pipeline. It is shipping today. And it changes what "done" means when you tell an agent to build something.

What changed: agents as cloud customers

On April 30, 2026, Cloudflare and Stripe co-announced a protocol under Stripe Projects that lets agents provision cloud infrastructure on behalf of a user. The flow:

1. You start a project with `stripe projects init`

2. You tell your coding agent what to build

3. The agent queries a catalog of available services (Cloudflare Workers, domain registration, storage, databases)

4. It provisions a new Cloudflare account if one does not exist

5. It registers a domain and deploys the application

6. You get a running production app on a real domain
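The six steps above can be sketched as a single pipeline. Everything here (the step labels, the return shape, the example URL) is an illustrative assumption; the real flow lives behind the Stripe CLI and provider APIs.

```python
# Hypothetical sketch of the provisioning flow; none of these names are
# real Stripe or Cloudflare APIs.

def deploy_project(spec: str, has_account: bool) -> dict:
    """Simulate the init -> catalog -> provision -> deploy sequence."""
    steps = ["init",                    # 1. `stripe projects init`
             f"plan:{spec}",            # 2. tell the agent what to build
             "catalog"]                 # 3. query available services
    if not has_account:
        steps.append("create-account")  # 4. provision a provider account
    steps.append("register-domain")     # 5. register a domain and deploy
    steps.append("deploy")              # 6. app is live on a real domain
    return {"steps": steps, "url": "https://app.example.dev"}

run = deploy_project("link shortener", has_account=False)
print(" -> ".join(run["steps"]))
```

Note that the account-creation step only appears when no provider account exists, which is the branch the authorization section below walks through.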

The human accepts the terms of service and approves payment, but no manual setup is required. Stripe handles identity (the user is already signed in) and provides a payment token, so the agent never touches raw credit card data. A default spending limit of $100/month per provider prevents runaway costs.

This is not Cloudflare-specific. The protocol is designed to be generic. Any platform with signed-in users can act as the "orchestrator" and integrate with infrastructure providers the same way Stripe does. PlanetScale is already integrated for Postgres provisioning directly from Cloudflare.

The three pillars of agent infrastructure

The Cloudflare/Stripe protocol formalizes three primitives that agents need to operate autonomously in production:

Discovery

Before an agent can provision anything, it needs to know what services exist. The `stripe projects catalog` command returns a JSON catalog of available services from all providers. For an agent, this is context: it reads the catalog, understands what it can provision, and makes decisions based on what the user asked for. No documentation browsing, no guessing.
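The catalog-as-context idea can be shown with a small sketch. The JSON shape below is an assumption for illustration; the real schema returned by `stripe projects catalog` may differ.

```python
import json

# Assumed catalog shape: a list of {service, provider, kind} entries.
CATALOG_JSON = """
[
  {"service": "workers",  "provider": "cloudflare",  "kind": "compute"},
  {"service": "r2",       "provider": "cloudflare",  "kind": "storage"},
  {"service": "postgres", "provider": "planetscale", "kind": "database"}
]
"""

def services_for(kinds: set[str]) -> list[str]:
    """Pick catalog entries whose kind matches what the task needs."""
    catalog = json.loads(CATALOG_JSON)
    return [e["service"] for e in catalog if e["kind"] in kinds]

# An agent asked to "build an app with a database" needs compute + database.
print(services_for({"compute", "database"}))  # ['workers', 'postgres']
```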

This mirrors how MCP (Model Context Protocol) servers expose tools to agents: a machine-readable description of capabilities. The difference is that MCP describes what an agent can do within a session, while service catalogs describe what an agent can provision in the world.

Authorization

Creating accounts without sending humans through signup flows requires an identity provider. Stripe attests to the user's identity. If no Cloudflare account exists, one is created automatically and credentials are returned to the agent. If an account already exists, a standard OAuth flow grants access.

This is the "agent as first-class customer" pattern. The agent is not impersonating a human. It is acting on behalf of an authenticated user through a delegated, auditable flow.
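The create-or-OAuth branch can be sketched in a few lines. The function name and return values are illustrative assumptions, not the actual protocol.

```python
# Sketch of the account-or-OAuth branch: auto-create when no provider
# account exists, delegated OAuth when one does. Names are hypothetical.

def authorize(user_id: str, existing_accounts: dict[str, str]) -> dict:
    """Return an account handle plus how access was granted."""
    if user_id not in existing_accounts:
        # No provider account: create one on the user's behalf and
        # hand credentials back to the agent.
        existing_accounts[user_id] = f"acct_{user_id}"
        return {"account": existing_accounts[user_id], "via": "auto-create"}
    # Account exists: a standard OAuth flow grants scoped access instead.
    return {"account": existing_accounts[user_id], "via": "oauth"}

accounts: dict[str, str] = {}
print(authorize("u1", accounts)["via"])  # auto-create
print(authorize("u1", accounts)["via"])  # oauth
```

Either way the grant is delegated and auditable: the agent holds credentials tied to an authenticated user, never a borrowed human login.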

Payment with limits

The hardest problem is trust. Agents are unpredictable. The protocol solves this with payment tokens (never raw card data) and default spending caps. The $100/month limit is conservative enough to prevent catastrophic bills while generous enough for real prototyping and small production workloads.

When the agent needs more room, budget alerts can be configured on the provisioned account.
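The budget arithmetic behind the cap is simple enough to sketch. Real enforcement happens on the provider side; this guard only illustrates the idea of refusing a charge that would breach the monthly limit.

```python
# Hypothetical client-side spending guard mirroring the $100/month
# default cap described above.

class BudgetExceeded(Exception):
    pass

class SpendGuard:
    def __init__(self, monthly_cap_usd: float = 100.0):
        self.cap = monthly_cap_usd
        self.spent = 0.0

    def charge(self, amount_usd: float) -> float:
        """Record a charge, refusing anything that would breach the cap."""
        if self.spent + amount_usd > self.cap:
            raise BudgetExceeded(f"{self.spent + amount_usd:.2f} > {self.cap:.2f}")
        self.spent += amount_usd
        return self.spent

guard = SpendGuard()
guard.charge(60.0)
try:
    guard.charge(50.0)   # would total $110: refused
except BudgetExceeded:
    print("blocked")
```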

Safe execution: the sandbox problem

Giving agents production access raises an obvious question: what happens when something goes wrong? Two new tools address this from different angles.

Tilde.run: transactional agent filesystems

Tilde.run launched as a sandbox where every agent run is a transaction you can roll back. It composes data from multiple sources (GitHub repos, S3 buckets, Google Drive folders) into a single versioned filesystem mounted at `~/sandbox`. The agent reads and writes to this filesystem normally. On clean exit, changes commit atomically. On failure, nothing changes.
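The commit-or-discard semantics can be approximated with a scratch copy and a directory swap. This is only the shape of the idea, not Tilde.run's implementation (which is a versioned filesystem), and the rename-based swap assumes both paths sit on one filesystem.

```python
import os
import shutil
import tempfile

# Minimal sketch: run the agent against a copy; on success swap the copy
# into place, on failure discard it so the original is untouched.

def run_transactionally(workdir: str, agent) -> bool:
    """Run `agent` against a copy of workdir; commit the copy only on success."""
    scratch = tempfile.mkdtemp()
    staged = os.path.join(scratch, "staged")
    shutil.copytree(workdir, staged)      # agent works on a copy
    try:
        agent(staged)
    except Exception:
        shutil.rmtree(scratch)            # failure: original untouched
        return False
    backup = workdir + ".bak"
    os.rename(workdir, backup)            # success: swap the copy in
    os.rename(staged, workdir)
    shutil.rmtree(backup)
    shutil.rmtree(scratch, ignore_errors=True)
    return True

# A failing run leaves the original exactly as it was.
work = tempfile.mkdtemp()
with open(os.path.join(work, "a.txt"), "w") as f:
    f.write("v1")
def flaky(d):
    raise RuntimeError("mid-run failure")
print(run_transactionally(work, flaky))          # False
print(open(os.path.join(work, "a.txt")).read())  # v1
```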

The isolation goes beyond filesystem transactions. Every outbound network call is policy-checked and logged. Cloud metadata endpoints (169.254.169.254) are blocked by default. Unauthorized hosts are denied. This prevents the three main agent security risks: data exfiltration, credential abuse, and prompt-injected callouts.

Key properties:

- POSIX filesystem: any language, any tool, no SDK lock-in
- Atomic commits and instant rollbacks for any agent run
- Network egress policies with default-deny
- Mounts from GitHub, S3, Google Drive, and local storage
- Per-action policies and human approval gates
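A default-deny egress check like the one described above reduces to an allowlist lookup. The blocked metadata IP comes from the article; the allowlist entries and the function itself are illustrative assumptions.

```python
from urllib.parse import urlparse

# Default-deny egress sketch: only explicitly allowed hosts pass, and the
# cloud metadata endpoint is blocked outright. Allowlist is hypothetical.
ALLOWED_HOSTS = {"api.github.com", "registry.npmjs.org"}
BLOCKED_HOSTS = {"169.254.169.254"}   # cloud metadata endpoint

def egress_allowed(url: str) -> bool:
    """Return True only for hosts on the allowlist; everything else is denied."""
    host = urlparse(url).hostname or ""
    if host in BLOCKED_HOSTS:
        return False
    return host in ALLOWED_HOSTS

print(egress_allowed("https://api.github.com/repos"))             # True
print(egress_allowed("http://169.254.169.254/latest/meta-data"))  # False
print(egress_allowed("https://evil.example.com/exfil"))           # False
```

The last case is the important one: a host that is neither allowed nor explicitly blocked is still denied, which is what shuts down prompt-injected callouts to attacker-controlled domains.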

Agent-skills-eval: measuring whether agent skills help

A separate problem: adding MCP tools and agent skills does not automatically improve outputs. The agent-skills-eval project on GitHub (featured on HN with 31 points) provides a framework for testing whether specific agent skills actually improve task completion. It runs the same task with and without a given skill and measures the difference.

This matters because the infrastructure stack is growing fast. Every new MCP server, every new catalog entry, every new sandbox tool adds complexity. Without evaluation, teams are just guessing that their agent stack is better than a simpler setup.
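The with/without comparison is easy to express as a tiny harness. The `run_task` stub below fakes agent behavior so the arithmetic is visible; a real harness would invoke the agent itself, and the skill name is made up.

```python
import statistics

# Sketch of an A/B skill eval: run the same tasks with and without a
# skill and compare pass rates. `run_task` is a stub, not a real agent.

def run_task(task: str, skills: set[str]) -> bool:
    """Pretend a hypothetical 'sql-expert' skill helps only SQL tasks."""
    needs_sql = "sql" in task
    return (not needs_sql) or ("sql-expert" in skills)

def pass_rate(tasks: list[str], skills: set[str]) -> float:
    return statistics.mean(1.0 if run_task(t, skills) else 0.0 for t in tasks)

tasks = ["write sql migration", "sql report query", "format readme"]
baseline = pass_rate(tasks, set())
with_skill = pass_rate(tasks, {"sql-expert"})
print(f"baseline={baseline:.2f} with_skill={with_skill:.2f} "
      f"delta={with_skill - baseline:+.2f}")
```

A positive delta justifies keeping the skill; a flat or negative one is evidence the extra tooling is just context overhead.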

The bottleneck shifts again

Simon Willison wrote about vibe coding and agentic engineering converging. A separate essay on The Typical Set made a sharper point: the bottleneck was never the code. The bottleneck was always people trying to agree on what to build.

With agents handling provisioning and deployment, the bottleneck shifts further. When an agent can go from a spec to a running production app in minutes, the spec becomes the limiting factor. Not the code. Not the deploy. The spec.

This has implications for how teams structure work:

- Specifications get more precise. Vague tickets do not work when the implementer is an agent that will happily build exactly what you asked for, even if it is wrong.
- Review focuses on intent, not syntax. Code review with agents is about "is this the right thing to build?", not "is this variable named correctly?"
- Infrastructure choices become agent decisions. When the agent can query a catalog and provision what it needs, the team no longer needs to pre-configure cloud accounts, CI pipelines, and deployment targets.

What developers should do now

If you are building with agents today, here is the practical stack:

1. Agent framework: Claude Code, Cursor, OpenCode, or Copilot for implementation. Pick based on your workflow, not benchmarks. See the NeuralStackly benchmarks for current comparisons.

2. Service catalog + provisioning: Stripe Projects with Cloudflare is the most complete option today. Run `stripe projects init` and your agent has a path from code to production.

3. Sandbox for safety: Tilde.run for transactional execution with rollbacks. Essential if your agent touches production data or makes network calls you cannot fully predict.

4. Evaluation: agent-skills-eval or a custom harness to test whether your agent's tooling actually improves outcomes. Do not assume more tools equals better results.

5. Observability: LangSmith, Braintrust, or Helicone for tracing agent runs. You need to know what the agent did, not just that it finished.

The infrastructure is here. The discipline is not.

The tools for fully autonomous agent deployment exist right now. An agent can build an app, provision a cloud account, buy a domain, and ship to production. The safety rails (transactional sandboxes, spending limits, network policies) are catching up.

What is still missing is organizational discipline. Agents ship fast. They ship too fast if you let them. Jevons Paradox applies: when code gets cheaper to produce, teams produce more of it, not the same amount faster. More prototypes, more internal tools, more features nobody asked for. Focus is about saying no. That discipline is harder now, not easier.

The developers who benefit most from this infrastructure will be the ones who treat the agent as an accelerator for decisions they have already made, not as a replacement for making those decisions.


For more on AI developer tools, agent frameworks, and the infrastructure stack, browse the NeuralStackly tools directory or compare agents side-by-side on the comparisons page.
