Securing Your Agent Runtime

Harden your agent runtime before production: sandbox security, credential isolation, an egress proxy for agents, and dangerous-command approval gates.

Agent Security · AI Operations · Guide

Kimmo Nurmisto

Founder, Grolea · 7 min read

You have agents running. They read from a database, call a few APIs, maybe push code or move money. The prototype works. Now comes the part that decides whether this thing is allowed anywhere near production: making the runtime safe to leave unattended.

This is the gap most teams hit second. The first problem is getting an agent to do the task at all. The second is realizing that the same agent, given a wide token and a shell, can leak a credential, call an endpoint it should never touch, or run a destructive command because a prompt told it to. A secure agent runtime is what closes that gap. Below is the threat model worth taking seriously, the specific controls that address it, and where packaged configuration saves you the weeks it takes to assemble these by hand.

What goes wrong when an agent runs unsandboxed

An unsandboxed agent is a program that decides its own next action from untrusted input. That is a different risk shape than a normal service, and it fails in ways worth naming concretely.

Credential leakage through the model. Agents need secrets to do real work: a database URL, an API key, a deploy token. If those secrets sit in the agent's environment or get pasted into its context, they are one prompt injection away from exfiltration. A malicious string in a fetched web page or a poisoned ticket can convince an agent to print its environment, summarize its own config, or POST a key to an attacker's endpoint. The model does not know the token is secret. It only knows the instruction looked plausible.

Unbounded egress. By default an agent can reach the entire internet. That is convenient until it is the delivery mechanism. Unrestricted outbound traffic is how a leaked credential actually leaves the building, how a compromised dependency phones home, and how exfiltrated data gets where it is going. If you cannot say which hosts your agent is allowed to talk to, you cannot say what it is doing with your data.

Destructive commands with no gate. An agent with shell access and a vague instruction will eventually run something irreversible: rm -rf on the wrong path, a DROP TABLE against prod instead of staging, a force-push that erases history, a payment call with an extra zero. The failure is rarely malice. It is an agent confidently acting on a misread instruction, with nothing between the decision and the side effect.

Blast radius with no walls. When all of an agent's tools share one identity and one environment, a single bad step contaminates everything. There is no boundary that says this task may read but not write, or may touch the test database but not the production one. One compromised step owns the whole runtime.

The throughline: an agent's autonomy is exactly what makes it useful and exactly what makes an ungoverned runtime dangerous. The fix is not to make the agent less capable. It is to put walls around what any single action can reach.

Mitigation patterns that actually hold

Three controls do most of the work. Each maps directly to one of the first three failure modes above, and they compound: credential isolation contains what an action can authenticate as, an egress proxy contains where it can send, and approval gates contain what it can irreversibly do. The fourth failure mode, blast radius, is the job of the sandbox that sits underneath them.

Credential isolation

The goal of agent credential isolation is simple to state: an agent should never hold a secret it does not need for the action in front of it, and a leaked secret should buy an attacker as little as possible.

In practice that means a few things working together. Secrets are injected per-task and scoped to that task, not loaded globally into a long-lived agent environment. Credentials are short-lived and brokered, so the agent receives a narrow, expiring token rather than a root key. Where you can, the agent never sees the raw secret at all: it calls through a broker that holds the credential and makes the privileged call on its behalf. Read paths and write paths get different identities, so a task that only needs to read cannot mutate anything even if it is hijacked.
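The brokered, short-lived, scoped token idea can be sketched in a few lines. This is a minimal in-memory illustration, not a production broker: `CredentialBroker`, `ScopedToken`, the five-minute default TTL, and the `"read"`/`"write"` scope names are all assumptions made for the example. A real deployment would more likely sit on top of an existing secrets manager.

```python
import secrets
import time
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class ScopedToken:
    value: str         # opaque random string handed to the agent
    task_id: str       # the single task this token was issued for
    scope: str         # "read" or "write" (hypothetical scope names)
    expires_at: float  # seconds since the epoch


class CredentialBroker:
    """Hypothetical broker: keeps the real secret, hands out narrow tokens."""

    def __init__(self, root_secret: str, ttl_seconds: float = 300.0):
        self._root_secret = root_secret  # never returned to the agent
        self._ttl = ttl_seconds
        self._issued: Dict[str, ScopedToken] = {}

    def issue(self, task_id: str, scope: str,
              now: Optional[float] = None) -> ScopedToken:
        if scope not in ("read", "write"):
            raise ValueError(f"unknown scope: {scope}")
        now = time.time() if now is None else now
        token = ScopedToken(secrets.token_urlsafe(16), task_id, scope,
                            now + self._ttl)
        self._issued[token.value] = token
        return token

    def authorize(self, token_value: str, action: str,
                  now: Optional[float] = None) -> bool:
        """Allow the action only for a known, unexpired token of that scope."""
        now = time.time() if now is None else now
        token = self._issued.get(token_value)
        if token is None or now >= token.expires_at:
            return False
        # Read and write are separate identities: a read token cannot write.
        return token.scope == action
```

In this sketch the agent only ever holds `ScopedToken.value`. If its context leaks, the attacker gets a token for one task and one scope that stops working once the TTL passes, which is the property the paragraph above asks for.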

The test is a question you should be able to answer for any agent you run: if this agent's full context leaked right now, what could an attacker do with what is in it? If the answer is "reach production with admin rights," the runtime is not hardened. If the answer is "use a read-only token that expires in five minutes," you have isolation that holds.

An egress proxy for agents

Credential isolation limits what an action can authenticate as. An egress proxy limits where it can send. Routing all agent outbound traffic through an egress proxy turns an open door into a guest list. Instead of "any host on the internet," the agent gets an allowlist: these package registries, these API hosts, this model endpoint, nothing else.

A well-configured egress proxy for agents does three things: enforce a default-deny allowlist, log every outbound request, and give you a single revocation point. The allowlist turns a new outbound destination into a deliberate decision rather than a silent capability. The logs convert "we think the agent only talked to our API" into an audit trail you can actually read. And the revocation point means when something looks wrong, you cut egress in one place instead of chasing it across every tool. The payoff is direct: a leaked credential is far less dangerous when there is nowhere unapproved to send it.
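The three properties (default-deny allowlist, a log of every request, one revocation point) can be sketched as a small policy object. This is only the decision logic, under assumptions: `EgressPolicy` and the host names are hypothetical, and a real setup would enforce the decision in a forward proxy or at the network layer rather than inside the agent's own process.

```python
import logging
from typing import Iterable, List, Tuple
from urllib.parse import urlsplit

logger = logging.getLogger("egress")


class EgressPolicy:
    """Hypothetical default-deny egress check with an audit trail."""

    def __init__(self, allowed_hosts: Iterable[str]):
        self._allowed = {h.lower() for h in allowed_hosts}
        self._revoked = False
        self.audit_log: List[Tuple[str, str]] = []  # (host, decision)

    def revoke_all(self) -> None:
        """Single revocation point: every later request is denied."""
        self._revoked = True

    def is_allowed(self, url: str) -> bool:
        # Exact host match only, so "pypi.org.evil.example" is not "pypi.org".
        host = (urlsplit(url).hostname or "").lower()
        allowed = (not self._revoked) and host in self._allowed
        decision = "ALLOW" if allowed else "DENY"
        # Log host and decision only; full URLs can carry secrets in queries.
        self.audit_log.append((host, decision))
        logger.info("egress %s host=%s", decision, host)
        return allowed
```

A short usage example, with made-up hosts:

```python
policy = EgressPolicy(["pypi.org", "api.example.com"])
policy.is_allowed("https://pypi.org/simple/requests/")  # allowed: host is listed
policy.is_allowed("https://pypi.org.evil.example/x")    # denied: not an exact match
policy.revoke_all()
policy.is_allowed("https://pypi.org/simple/requests/")  # denied after revocation
```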

Dangerous-command approval gates

Some actions should never happen on autopilot. A dangerous-command approval step puts a human (or a stricter policy) between the agent's decision and an irreversible side effect.

The pattern is a classifier plus a gate. You define which operations are high-consequence: deletes, production writes, financial transactions, force-pushes, anything you cannot cleanly undo. Routine actions run autonomously, because gating everything trains people to approve without reading. High-consequence actions pause for explicit approval, with the full command and its context shown to the approver before anything executes. The result is autonomy where it is cheap and a checkpoint where a mistake is expensive. This is the control that lets you actually leave an agent running, because the worst-case action is one a person sees first.
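As a rough sketch of the classifier-plus-gate pattern: the function names and regex patterns below are illustrative assumptions, not a complete or evasion-proof classifier. Matching on raw command strings is easy to sidestep, so a real gate would more likely classify structured tool calls and would carry whatever list of high-consequence operations fits your business.

```python
import re
from typing import Callable

# Hypothetical, deliberately small list of high-consequence patterns.
HIGH_CONSEQUENCE_PATTERNS = [
    re.compile(r"\brm\s+-[a-zA-Z]*[rR]"),                  # recursive delete
    re.compile(r"\bdrop\s+table\b", re.IGNORECASE),        # destructive SQL
    re.compile(r"\bgit\s+push\b.*(--force|\s-f\b)"),       # history rewrite
]


def is_high_consequence(command: str) -> bool:
    """Classifier: does this command match any high-consequence pattern?"""
    return any(p.search(command) for p in HIGH_CONSEQUENCE_PATTERNS)


def run_with_gate(command: str,
                  execute: Callable[[str], str],
                  approve: Callable[[str], bool]) -> str:
    """Gate: routine commands run directly, risky ones need approval first."""
    if is_high_consequence(command) and not approve(command):
        # Nothing has executed at this point; the side effect never happens.
        raise PermissionError(f"approval denied for: {command}")
    return execute(command)
```

Here `approve` stands in for whatever shows the full command and its context to a person (or a stricter policy) and returns their decision, and `execute` is the only path by which a command can actually run.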

Underneath all three sits the sandbox itself. Agent sandbox security means each task runs in an isolated execution environment with a constrained filesystem, scoped network access, and no standing path to anything outside its boundary. The sandbox is what makes the other three controls enforceable rather than advisory. Without it, credential scoping, egress rules, and approval gates are conventions an agent can step around. With it, they are walls.
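To make one piece of that concrete, here is a minimal sketch of launching a task with a scrubbed environment, a throwaway working directory, and a time limit. It is loudly not a sandbox on its own: it does nothing to restrict filesystem or network access, which in practice needs containers, VMs, or OS-level isolation. `run_task_isolated` is a hypothetical helper name, and the sketch assumes a POSIX-like system.

```python
import os
import subprocess
import tempfile
from typing import List


def run_task_isolated(argv: List[str],
                      timeout_seconds: float = 10.0
                      ) -> "subprocess.CompletedProcess[str]":
    """Run one task with no inherited secrets and a disposable working dir."""
    with tempfile.TemporaryDirectory() as workdir:
        return subprocess.run(
            argv,
            cwd=workdir,               # scratch directory, deleted afterwards
            env={"PATH": os.defpath},  # parent environment is not inherited
            capture_output=True,
            text=True,
            timeout=timeout_seconds,   # raises TimeoutExpired if exceeded
            check=False,
        )
```

The point of the sketch is the default: the child process starts with nothing from the parent's environment, so a secret only reaches a task if something deliberately hands it over.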

What Paperclip pre-configures, and what stays custom to your stack

The open-source Paperclip runtime is built around these boundaries rather than bolting them on afterward. It's public, open-source, and I run it in production. Out of the box it gives you the enforceable primitives: tasks can run in isolated execution environments, secrets live in a vault and reach an agent as references rather than ambient environment variables, and high-consequence actions can be gated through native approval types and task-level review gates. That's the scaffolding the three controls above hang on.

What it doesn't hand you (and shouldn't, because it can't know your stack) is the configuration that turns those primitives into a security posture. The egress allowlist is the clearest example: routing agent traffic through a default-deny proxy is infrastructure you stand up, and only you know which hosts belong on the list. The same goes for the finer-grained credential work (short-lived brokered tokens, separate read and write identities, how your secrets broker is wired), for which operations count as high-consequence in your business, and for which identity maps to which task. This is the part that takes weeks, because it's judgment about your environment, not a library you install. That is exactly why it's worth getting right before anything runs unattended.

If you're hardening an agent runtime for production, the controls above are the short list worth getting right first: sandbox isolation, credential isolation, a default-deny egress proxy, and approval gates on the actions you can't undo. Get those four in place and you have a runtime you can actually leave running.

I write about building and running agent companies in production, including the security work, as I do it. Get on the list for the next deep-dives, and email me if you're working through agent runtime hardening on your own stack.