How to Set Up an AI Agent Company

An operator's guide to setting up an AI agent company: org design, agent mandates, budgets, governance, and recovery when an agent gets stuck.

Agent Companies · Agent Governance · AI Operations · Operator Guide

Kimmo Nurmisto

Founder, Grolea · 9 min read

If you've decided to run part of your company with AI agents, the "set up" part turns out to be the hard part. The demos make it look easy: write a clever prompt, point it at a tool, walk away. Then you try to run real work through it and the cracks show. Two agents do the same job. Nobody owns the result. Costs run away on the third task. The whole thing quietly drifts from what you intended.

I've stood up several agent companies now, and one lesson keeps repeating: you are not configuring a chatbot, you are setting up a company. The unit of work is the org, not the prompt. That means roles, budgets, approval gates, and a way to recover when something breaks. Get that scaffolding right and the agents earn their keep. Skip it and you've built an expensive way to generate plausible-looking nonsense.

This is the guide I wish I'd had when I started: how to set up an AI agent company that actually runs, written for the operator doing the configuring rather than an audience of researchers.

Start with the org, not the agents

The instinct is to open a blank file and start writing your first agent. Resist it. The first thing to get right in agent company operations is the same thing you'd get right in a human company: who does what, and who decides.

Before you define a single agent, write down the shape of the company: the goals, meaning the outcomes you're actually trying to produce; the functions you need to hit them, such as research, writing, review, and delivery; and where authority sits, meaning which decisions an agent can make alone and which need a human or a more senior agent to sign off.

This sounds like overhead when you have two agents. It stops sounding like overhead the moment you have eight, because by then the failure mode isn't a bad answer from one agent. It's two agents stepping on each other, or a task that finished with nobody accountable for whether it was right. An org chart, even a tiny one, is what prevents that. Map the company first; the agents are just the staffing of a structure you've already designed.
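A tiny org map is easy to check mechanically. The sketch below is one way to do it, assuming a hypothetical layout where each agent lists the functions it owns; the function and agent names are illustrative only. It flags the two failures described above: a function with more than one owner, and a function with none.

```python
from collections import defaultdict


def ownership_problems(functions, assignments):
    """Return a list of problems in an org map.

    functions:   list of function names the company needs
    assignments: dict mapping agent name -> list of functions it owns
    """
    owners = defaultdict(list)
    for agent, owned in assignments.items():
        for function in owned:
            owners[function].append(agent)

    problems = []
    for function in functions:
        who = owners.get(function, [])
        if not who:
            problems.append(f"{function}: no owner")
        elif len(who) > 1:
            problems.append(f"{function}: multiple owners ({', '.join(sorted(who))})")
    # Functions an agent claims that were never part of the org design.
    for function in sorted(set(owners) - set(functions)):
        problems.append(f"{function}: owned but not in the org design")
    return problems


functions = ["research", "writing", "review", "delivery"]
assignments = {
    "researcher": ["research"],
    "writer": ["writing", "review"],
    "editor": ["review"],
}
print(ownership_problems(functions, assignments))
# ['review: multiple owners (editor, writer)', 'delivery: no owner']
```

Running a check like this before writing any agent definition is cheap, and an empty list is a reasonable bar to clear before staffing the structure.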

Write mandates, not prompts

A prompt tells a model what to say next. A mandate tells an agent what it owns, what it must never do, and who it answers to. That difference is everything once the agent is going to act repeatedly, unsupervised, over days.

For each role, I write down four things:

Scope. The work this agent owns end to end, stated concretely enough that you could tell whether a given task belongs to it.
Boundaries. The things it must escalate rather than decide. This is where you encode judgement: never publish to the live site, never commit spend above the cap, never imply a product is available when it isn't.
Handoffs. Who it receives work from and who it hands work to. Agents that don't know their handoffs invent them, and invented handoffs are where work falls on the floor.
Decision rights. The explicit list of what it can do without asking, kept separate from what it must escalate.

Writing this for every role is also the cheapest quality control you'll ever do. Most "the agent went rogue" stories are really "nobody wrote down what the agent wasn't allowed to do." A clear mandate is a guardrail you author once instead of a fire you fight every week.
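The four parts of a mandate map naturally onto a small record. What follows is a minimal sketch, not a Paperclip schema: the class, its field names, and the example actions are all assumptions made for illustration. The one design choice worth noting is that an action the mandate doesn't list is treated as a gap to escalate, not as permission.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mandate:
    """One role's mandate. Field names are illustrative, not a real schema."""
    role: str
    scope: str                              # work owned end to end
    reports_to: str                         # who escalations go to
    receives_from: tuple = ()               # handoffs in
    hands_to: tuple = ()                    # handoffs out
    may_decide: frozenset = frozenset()     # decision rights
    must_escalate: frozenset = frozenset()  # boundaries

    def classify(self, action: str) -> str:
        # Boundaries win over decision rights if an action is listed in both.
        if action in self.must_escalate:
            return f"escalate to {self.reports_to}"
        if action in self.may_decide:
            return "allow"
        # An unlisted action is a gap in the mandate, not permission.
        return f"unlisted: escalate to {self.reports_to} and update the mandate"


writer = Mandate(
    role="writer",
    scope="Draft and revise articles from an approved brief",
    reports_to="editor",
    receives_from=("researcher",),
    hands_to=("editor",),
    may_decide=frozenset({"choose_headline", "revise_draft"}),
    must_escalate=frozenset({"publish_to_live_site", "commit_spend_above_cap"}),
)

for action in ("revise_draft", "publish_to_live_site", "email_customer"):
    print(action, "->", writer.classify(action))
```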

Give every agent a budget and an approval gate

Two controls do more to keep an agent company sane than anything else: a spending budget and an approval gate. Neither is glamorous, and both are the difference between a system you can leave running and one you have to babysit.

A budget is a hard ceiling on what an agent can spend, whether that's tokens, API calls, or money, before it has to stop and ask. Agents are cheerful and tireless, which is exactly the problem. An agent in a loop will happily spend your monthly cap in an afternoon if nothing tells it to stop. The budget is what tells it to stop. In Paperclip that is the per-agent Budget tab, backed by two loop caps most people miss: Max turns per run ships at 1000 (bring it down to what a real task finishes inside, around 30), and Max concurrent runs ships at 20 (cap it to one or two, and to one for a decision-maker like a CEO agent so two runs can't make conflicting calls at once).
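To make the idea of a hard ceiling concrete, here is a minimal sketch of a budget plus a turn cap wrapped around an agent loop. It is not how Paperclip implements its caps; the class, the function, and the fixed cost per turn are assumptions chosen to keep the example small. The cost is reserved before each turn, so the agent stops before it overspends instead of after.

```python
class BudgetExceeded(Exception):
    """Raised when a charge would take spending past the ceiling."""


class Budget:
    def __init__(self, limit: int):
        if limit < 0:
            raise ValueError("limit must be non-negative")
        self.limit = limit
        self.spent = 0

    def charge(self, amount: int) -> None:
        if amount < 0:
            raise ValueError("amount must be non-negative")
        if self.spent + amount > self.limit:
            raise BudgetExceeded(f"{self.spent} + {amount} exceeds {self.limit}")
        self.spent += amount


def run(step, budget: Budget, cost_per_turn: int, max_turns: int = 30):
    """Call step() until it returns True, the budget runs out, or turns run out.

    Returns (outcome, turns_completed).
    """
    for turn in range(1, max_turns + 1):
        try:
            budget.charge(cost_per_turn)  # reserve before acting
        except BudgetExceeded:
            return "budget_exhausted", turn - 1
        if step():
            return "done", turn
    return "turn_limit", max_turns


def stuck_agent() -> bool:
    return False  # never finishes: simulates a loop


print(run(stuck_agent, Budget(100), cost_per_turn=30))    # ('budget_exhausted', 3)
print(run(stuck_agent, Budget(10_000), cost_per_turn=1))  # ('turn_limit', 30)
```

Either outcome other than "done" is a signal to stop and ask, which is the whole point of the ceiling.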

An approval gate is a checkpoint where work pauses for sign-off before it goes any further. This is the heart of practical AI agent governance, and I mean governance of your agent fleet and the actions it's allowed to take in the world, not the model-safety sense of the term. The questions are concrete and operational, and the first is which actions are reversible and which aren't. Sending an email, merging code, publishing a page, spending money: these are hard to take back once they've happened, so none of them should happen without a gate. Reading a file or drafting a document can run wide open.

A useful pattern here is the policy gate: a rule evaluated before an agent is allowed to take a sensitive action, so the check happens automatically instead of relying on the agent to remember its own boundaries. The mandate says what's not allowed; the policy gate enforces it. You want both, because an agent having a bad day will cheerfully ignore its own instructions, and the gate doesn't care how the agent is feeling. In Paperclip these are built-in approval types and per-task review gates, plus a board hiring gate (requireBoardApprovalForNewAgents in .paperclip.yaml) that makes adding a new agent itself require sign-off. How tightly you gate, from loose to tight, is a posture you set deliberately rather than a default you inherit. The safest starting posture is fail-closed: deny by default, and open one allow at a time. The same class of gate, aimed at destructive shell commands rather than business actions, is the backbone of a secure agent runtime.
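A fail-closed policy gate fits in a few lines. This is a generic sketch, not Paperclip's approval mechanism; the class, the verdict names, and the action names are hypothetical. The property that matters is the last line of check: anything nobody has explicitly opened is denied.

```python
from enum import Enum


class Verdict(Enum):
    ALLOW = "allow"
    NEEDS_APPROVAL = "needs_approval"
    DENY = "deny"


class PolicyGate:
    """Evaluated before an action runs; independent of what the agent intends."""

    def __init__(self, allow=(), needs_approval=()):
        self._allow = set(allow)
        self._needs_approval = set(needs_approval)

    def check(self, action: str) -> Verdict:
        # Approval wins if an action was listed in both sets by mistake.
        if action in self._needs_approval:
            return Verdict.NEEDS_APPROVAL
        if action in self._allow:
            return Verdict.ALLOW
        return Verdict.DENY  # fail closed: unknown actions are denied


gate = PolicyGate(
    allow={"read_file", "draft_document"},
    needs_approval={"send_email", "merge_code", "publish_page", "spend_money"},
)

for action in ("draft_document", "publish_page", "drop_database"):
    print(action, "->", gate.check(action).value)
```

Opening one allow at a time then means adding a single entry to the allow set and reviewing that change like any other.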

Plan for the agent that gets stuck

Everything above assumes things go right. The real test of a setup is what happens when they don't, and with agents that's not an edge case. It's a Tuesday. So before you go live, decide what to do when an AI agent gets stuck.

Agents get stuck in recognisable ways:

It loops, repeating the same failing action and burning budget each time.
It produces confident output that's subtly wrong, or reports a task done when it isn't, and the next agent downstream builds on it.
It hits a dependency that isn't ready (a file, an approval, or another agent's output) and either stalls or hallucinates its way past the gap.

Good AI agent error recovery comes down to three things you set up in advance. First, bounded retries: an agent that fails should try again a fixed number of times and then escalate, not loop forever. Second, a recovery path, a defined "if you're blocked, do this." Mark the work blocked, name what you're waiting on, hand it to whoever owns the unblock, and stop. An agent that knows how to be blocked correctly is worth more than one that never admits it's stuck. Third, state you can inspect: a record of what each agent did and why, so when something comes out wrong you can find where it went wrong instead of guessing.
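The three pieces can be sketched together: a fixed number of attempts, a blocked state that names what the work is waiting on and who owns the unblock, and a log you can read afterwards. The Task record and function below are assumptions for illustration, not a specific platform's API.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    status: str = "open"        # open -> done | blocked
    waiting_on: str = ""        # what the task is blocked on
    unblock_owner: str = ""     # who owns the unblock
    log: list = field(default_factory=list)  # inspectable record of each attempt


def run_with_bounded_retries(task, action, max_attempts=3, unblock_owner="operator"):
    """Try action() up to max_attempts times, then mark the task blocked and stop."""
    last_error = ""
    for attempt in range(1, max_attempts + 1):
        try:
            result = action()
        except Exception as exc:
            last_error = str(exc)
            task.log.append(f"attempt {attempt} failed: {last_error}")
            continue
        task.status = "done"
        task.log.append(f"attempt {attempt} succeeded")
        return result

    # Recovery path: mark blocked, name the dependency, hand off, stop.
    task.status = "blocked"
    task.waiting_on = last_error
    task.unblock_owner = unblock_owner
    task.log.append(f"escalated to {unblock_owner} after {max_attempts} attempts")
    return None


def publish_draft():
    raise RuntimeError("draft.md not ready")  # simulated missing dependency


task = Task("publish weekly article")
run_with_bounded_retries(task, publish_draft, max_attempts=3, unblock_owner="editor")
print(task.status, "|", task.waiting_on, "|", task.unblock_owner)
for line in task.log:
    print(line)
```

The agent in this sketch never hallucinates past the gap: after the third failure it records why, names an owner, and stops spending.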

The operators who are happy with their agent companies aren't the ones whose agents never fail. They're the ones who designed for failure on purpose, so a stuck agent is a logged, recoverable event instead of a silent corruption that surfaces a week later.

Treat setup as a repeatable artifact, not a one-off

Here's the trap that catches people who've done all of the above well. They do it once, by hand, for one company. Then the second company starts from a blank file again. Every mandate gets re-derived, slightly differently. The budgets and gates that worked the first time get half-remembered the second. The setup drifts, and drift is where the subtle errors live.

The fix is to treat your setup as an artifact instead of an act. Write the org design, the mandates, the budgets, and the gates down as structured files you can review, version, and reuse. The point isn't bureaucracy. A company you can read back is a company you can correct, and a company you can correct is one you can run more than one of.

This is the reasoning behind a tool I built and open-sourced, Paperclip Blueprints: a small CLI that turns a markdown brief into a complete, schema-validated Paperclip company, with identity, agent definitions, projects, operations rules, and governance, and with the platform's best practices already wired in. The generated company still needs wiring after import: the secrets, models, and run-policy caps that a portable bundle can't carry. That step is what turns it from imported to running. The tool exists because doing this scaffolding by hand kept producing the same inconsistencies and burning the same hours. If you want the longer version of why I moved Grolea toward building these operating systems instead of advising on them, that's in the Rewrite.

You don't need a tool to get this right, and you certainly don't need mine. If you are weighing platforms, the field spans several different categories, and the right one depends on the layer you're solving for. What you do need is an agent company setup checklist you can run more than once: org before agents, mandates before prompts, a budget and a gate on every agent, a recovery path for the ones that get stuck, and the whole thing written down well enough that the next company starts from your best work instead of a blank page.
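A checklist you can run more than once can be as plain as a function over a saved file. The sketch below assumes a hypothetical JSON layout (it is not the Paperclip or Paperclip Blueprints format, and every key name is made up for illustration). It round-trips the setup through JSON, as it would be stored in version control, and reports which agents are missing a mandate, budget, gate, or recovery path.

```python
import json

REQUIRED_PER_AGENT = ("mandate", "budget", "gate", "recovery")


def checklist(company: dict) -> list:
    """Return the checklist items this company setup still fails."""
    problems = []
    if not company.get("goals"):
        problems.append("org: no goals written down")
    agents = company.get("agents", {})
    if not agents:
        problems.append("org: no agents defined")
    for name in sorted(agents):
        for key in REQUIRED_PER_AGENT:
            if not agents[name].get(key):
                problems.append(f"{name}: missing {key}")
    return problems


company = {
    "goals": ["publish two reviewed articles a week"],
    "agents": {
        "writer": {
            "mandate": "Draft articles from an approved brief",
            "budget": {"max_turns": 30, "max_concurrent_runs": 1},
            "gate": "editor approves before publish",
            "recovery": "mark blocked, name the dependency, hand to editor",
        },
        "researcher": {
            "mandate": "Collect and summarise sources for each brief",
            "budget": {"max_turns": 30, "max_concurrent_runs": 2},
        },
    },
}

# Round-trip through JSON, as the setup would be when saved and reloaded.
saved = json.dumps(company, indent=2, sort_keys=True)
reloaded = json.loads(saved)
print(checklist(reloaded))
# ['researcher: missing gate', 'researcher: missing recovery']
```

Because the setup is data, the second company can start from a copy of this file and a clean checklist run instead of a blank page.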

The short version

Setting up an AI agent company is less about the models than the operating system you wrap around them. Decide the org first. Give each agent a mandate, a budget, and a gate. Plan for the agent that gets stuck before it does. Capture the whole setup as something you can reuse, so your second company is better than your first instead of just different.

Do that, and the agents stop being a demo and start being a company.