Rewrite Your CLI for Agents (Or Get Replaced)

How to build CLI tools for AI agents: the interface shift developers are missing

Mar 10, 2026

The most important interface shift in a decade is happening right now, and most teams are sleepwalking through it.

AI agents are the fastest-growing consumer of developer tooling. They don’t click buttons. They don’t read man pages. They invoke commands, parse output, and move on. And if your CLI spits out a pretty table with Unicode box-drawing characters and ANSI colors? Congratulations — you’ve built something an agent has to hallucinate its way through.

The post that crystallized the moment landed on Hacker News recently: Justin Poehnelt’s “You Need to Rewrite Your CLI for AI Agents,” written from the experience of building Google’s new Workspace CLI — agents-first from day one. It hit the front page and the comments exploded. Everyone felt the pain it describes.

The thesis is simple: the primary consumer of your CLI is no longer a human. Act accordingly or get wrapped, forked, or replaced by someone who does.

The Interface Mismatch

Here’s what “human-first” CLI design looks like:

my-cli spreadsheet create \
  --title "Q1 Budget" \
  --locale "en_US" \
  --sheet-title "January" \
  --frozen-rows 1 \
  --frozen-cols 2

Five flags. Flat namespace. Can’t express nesting without inventing bespoke flag hierarchies. A human can tab-complete their way through it. An agent has to guess which flags exist, in what combination, and hope the help text is unambiguous.

Now the agent-first version:

gws sheets spreadsheets create --json '{
  "properties": {"title": "Q1 Budget", "locale": "en_US"},
  "sheets": [{"properties": {"title": "January",
    "gridProperties": {"frozenRowCount": 1, "frozenColumnCount": 2}}}]
}'

One flag. The full API payload. An LLM generates this trivially because it maps directly to the schema. Zero translation loss.
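
To make "first-class alongside your convenience flags" concrete, here is a minimal Python sketch, assuming an argparse-based tool. The flag names mirror the two examples above; build_parser and build_payload are hypothetical helpers, not part of gws or any real CLI. Both paths end in the same nested payload, so the human flags become sugar over the raw one.

```python
import argparse
import json


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="my-cli spreadsheet create")
    # Raw-payload path: the full request body as one JSON string.
    parser.add_argument("--json", dest="payload", help="full request body as JSON")
    # Convenience flags for humans; each maps onto one field of the payload.
    parser.add_argument("--title")
    parser.add_argument("--locale")
    parser.add_argument("--sheet-title")
    parser.add_argument("--frozen-rows", type=int)
    parser.add_argument("--frozen-cols", type=int)
    return parser


def build_payload(argv: list[str]) -> dict:
    """Return the request body, whichever way the caller expressed it."""
    args = build_parser().parse_args(argv)
    if args.payload is not None:
        # Agent path: pass the payload through untouched.
        return json.loads(args.payload)

    # Human path: assemble the same nested structure from flat flags.
    properties = {}
    if args.title is not None:
        properties["title"] = args.title
    if args.locale is not None:
        properties["locale"] = args.locale

    grid = {}
    if args.frozen_rows is not None:
        grid["frozenRowCount"] = args.frozen_rows
    if args.frozen_cols is not None:
        grid["frozenColumnCount"] = args.frozen_cols

    sheet_properties = {}
    if args.sheet_title is not None:
        sheet_properties["title"] = args.sheet_title
    if grid:
        sheet_properties["gridProperties"] = grid

    payload = {"properties": properties}
    if sheet_properties:
        payload["sheets"] = [{"properties": sheet_properties}]
    return payload
```

Everything downstream of build_payload sees one shape, which is the point: the flags can change without changing what gets sent.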

This isn’t about abandoning human ergonomics. It’s about making the raw-payload path a first-class citizen alongside your convenience flags. The practical minimum: --output json, an OUTPUT_FORMAT=json env var, or — better yet — NDJSON by default when stdout isn’t a TTY.
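
A rough sketch of that practical minimum in Python. The precedence (explicit flag, then the OUTPUT_FORMAT variable, then TTY detection) is an assumption, as are the helper names resolve_format, render, and emit; adjust to taste.

```python
import json
import os
import sys


def resolve_format(flag_value, environ, stdout_is_tty):
    """Pick an output format: explicit flag, then env var, then TTY detection."""
    if flag_value:
        return flag_value
    env_value = environ.get("OUTPUT_FORMAT")
    if env_value:
        return env_value
    # No explicit choice: humans at a terminal get a table, pipes get NDJSON.
    return "table" if stdout_is_tty else "ndjson"


def render(rows, fmt):
    """Render a list of flat dicts in the chosen format."""
    if fmt == "json":
        return json.dumps(rows)
    if fmt == "ndjson":
        return "\n".join(json.dumps(row) for row in rows)
    # Fallback: a plain aligned table, no box-drawing characters or colors.
    headers = list(rows[0]) if rows else []
    widths = [max(len(h), *(len(str(r[h])) for r in rows)) for h in headers]
    lines = ["  ".join(h.ljust(w) for h, w in zip(headers, widths))]
    lines += ["  ".join(str(r[h]).ljust(w) for h, w in zip(headers, widths)) for r in rows]
    return "\n".join(lines)


def emit(rows, flag_value=None):
    """Print rows using the format resolved for this process."""
    fmt = resolve_format(flag_value, os.environ, sys.stdout.isatty())
    print(render(rows, fmt))
```

Keeping resolve_format free of global state makes the decision easy to test without faking a terminal.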

Schema Introspection > Static Docs

Agents can’t google your documentation without blowing their token budget. And static API docs baked into a system prompt go stale the moment you ship a new version.

The Google Workspace CLI solved this with runtime schema introspection:

gws schema drive.files.list
gws schema sheets.spreadsheets.create

Each call dumps the full method signature — params, request body, response types, required OAuth scopes — as machine-readable JSON. The agent self-serves. No pre-stuffed documentation. No 50-page system prompt.

This is the pattern that matters: make the CLI itself the documentation, queryable at runtime. Your tool should be able to answer “what do you accept?” and “what will you return?” without the agent ever leaving the terminal.
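
What might sit behind a schema subcommand? A minimal sketch, assuming a hand-written registry. The notes.* methods and the SCHEMAS layout are invented for illustration; this is not how gws implements it, and a real tool would more likely generate the registry from an API description.

```python
import json

# Hypothetical registry: one entry per method the CLI exposes.
SCHEMAS = {
    "notes.create": {
        "params": {
            "title": {"type": "string", "required": True},
            "body": {"type": "string", "required": False},
        },
        "returns": {"id": {"type": "string"}, "title": {"type": "string"}},
    },
    "notes.list": {
        "params": {"limit": {"type": "integer", "required": False}},
        "returns": {"notes": {"type": "array"}},
    },
}


def describe(method):
    """Return JSON text describing one method, or a structured error."""
    schema = SCHEMAS.get(method)
    if schema is None:
        # Even the failure is machine-readable and lists what does exist.
        return json.dumps(
            {"error": "unknown_method", "method": method, "known": sorted(SCHEMAS)}
        )
    return json.dumps({"method": method, **schema}, indent=2)
```

The error branch matters as much as the happy path: an agent that guesses a wrong method name gets the list of valid ones back in the same call.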

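The gh CLI already does a version of this: pass a bare --json to a command like gh pr list and it prints the fields you can request. docker does a related thing with docker inspect, which returns object state as JSON. The tools that don’t are the ones getting wrapped by shim layers, and every shim is a maintenance liability waiting to happen.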

Context Window Discipline

Here’s a cost that should scare you: a single Gmail API response can consume a meaningful chunk of an agent’s context window. Humans scroll past irrelevant fields. Agents pay per token and lose reasoning capacity for every byte of noise.

Two mechanisms matter:

Field masks limit what the API returns. gws drive files list --params '{"fields": "files(id,name,mimeType)"}' — only get what you need.

NDJSON pagination emits one JSON object per line, stream-processable without buffering an entire response into memory. The agent processes page by page instead of choking on a 200KB blob.

This is context window discipline, and it’s non-negotiable. If your CLI dumps everything and expects the consumer to filter, you’re burning tokens that could be spent on reasoning.
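
Both mechanisms fit in a few lines. The sketch below is a simplification under stated assumptions: the mask is applied client-side and only to top-level fields (a real API field mask like the one above is evaluated by the server and supports nesting), and apply_mask, ndjson_lines, and stream are hypothetical names.

```python
import json
import sys


def apply_mask(record, fields):
    """Keep only the requested top-level fields of a record."""
    return {name: record[name] for name in fields if name in record}


def ndjson_lines(pages, fields):
    """Lazily yield one NDJSON line per record, page by page."""
    for page in pages:
        for record in page:
            yield json.dumps(apply_mask(record, fields))


def stream(pages, fields, out=sys.stdout):
    """Write each line as soon as it is ready instead of buffering the lot."""
    for line in ndjson_lines(pages, fields):
        out.write(line + "\n")
        out.flush()
```

Because pages can be any iterable, including a generator that fetches the next page on demand, the consumer can stop reading early and the remaining pages are never requested.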

The MCP Question

“But what about MCP?” Fair question. Anthropic’s Model Context Protocol was supposed to be the universal connector — a clean, structured protocol for agents to talk to any tool. And it works. But there’s a cost nobody talks about.

Jannik Reinhard ran the numbers in a real-world comparison. A compliance-checking task against Microsoft Graph:

MCP approach: ~145,000 tokens (28K just for schema injection before asking a single question)
CLI approach: ~4,150 tokens

That’s a 35x reduction. A typical MCP server ships dozens or hundreds of tool definitions, all of which get dumped into the agent’s context whether it needs them or not. Stack a few MCP servers for a real enterprise workflow — GitHub, a database, Microsoft Graph, Jira — and you’re burning 150K+ tokens on plumbing alone.
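
The arithmetic, for anyone who wants to check it. The inputs are the approximate figures quoted above, so treat the results as rough.

```python
# Approximate token counts quoted from the comparison above.
mcp_tokens = 145_000
cli_tokens = 4_150
schema_tokens = 28_000  # MCP schema injection, spent before any question is asked

reduction = mcp_tokens / cli_tokens
schema_share = schema_tokens / mcp_tokens

print(f"reduction: {reduction:.1f}x")       # reduction: 34.9x
print(f"schema share: {schema_share:.1%}")  # schema share: 19.3%
```

So roughly a fifth of the MCP run went to tool definitions alone, before the task started.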

MCP isn’t wrong. But it’s an abstraction layer, and abstraction layers carry a tax. For many workflows, a well-designed CLI with --json output and schema introspection is faster, cheaper, and more reliable than routing through a protocol server. The CLI is the tool call.

The Checklist

If you maintain a CLI and you’re not thinking about agent consumers, here’s the minimum viable checklist. It’s not long:

--json flag everywhere. Structured output to stdout, human messages to stderr.
Meaningful exit codes. Not just 0/1. Agents need to branch on failure modes.
Idempotent operations. Agents retry. Your tool should handle that gracefully.
Schema introspection. mytool schema <command> should describe what the command accepts and what it returns.
NDJSON pagination. Stream large result sets. Don’t buffer.
Noun-verb command structure. mytool resource action — it turns discovery into a tree search instead of a guessing game.
TTY detection. Pretty output for humans, JSON for pipes. Automatically.
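
Three of those items (stream separation, exit codes, idempotency) can be sketched together in Python. Everything here is illustrative: the exit code values are assumptions (only 2 for usage errors follows common convention), putting a structured error on stdout as well as a human message on stderr is one design choice among several, and the in-memory _STORE stands in for whatever backend a real tool talks to.

```python
import enum
import json
import sys


class Exit(enum.IntEnum):
    """Hypothetical exit codes an agent can branch on."""
    OK = 0
    USAGE = 2
    NOT_FOUND = 3
    AUTH = 4
    RETRYABLE = 5


def outcome(result=None, error=None, code=Exit.OK):
    """Return (stdout_line, stderr_line, exit_code) for one command run."""
    if error is not None:
        stdout_line = json.dumps({"ok": False, "code": code.name, "error": error})
        return stdout_line, f"error: {error}", int(code)
    return json.dumps({"ok": True, "result": result}), "", int(code)


def finish(result=None, error=None, code=Exit.OK):
    """Write data to stdout, diagnostics to stderr, then exit with the code."""
    stdout_line, stderr_line, exit_code = outcome(result, error, code)
    print(stdout_line)
    if stderr_line:
        print(stderr_line, file=sys.stderr)
    sys.exit(exit_code)


# Stand-in for a backend, keyed by a caller-supplied idempotency key.
_STORE = {}


def create_note(key, title):
    """Idempotent create: a retry with the same key returns the first result."""
    if key not in _STORE:
        _STORE[key] = {"id": f"note-{len(_STORE) + 1}", "title": title}
    return _STORE[key]
```

An agent that times out and retries create_note with the same key gets the original note back instead of a duplicate, and a RETRYABLE exit code tells it when retrying is worth the tokens at all.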

None of this is exotic. Most of it is just good Unix hygiene that we’ve been lazy about for years. The difference is that now there’s a consumer — a very fast-growing, very demanding consumer — that will route around your tool if you don’t provide it.

The Bottom Line

RTK, a Show HN post on Hacker News from last month, wraps existing CLI commands and strips human-oriented formatting from their output before it hits an agent’s context. It reportedly cuts token usage by 60-90%. That tool exists because your CLI doesn’t output clean data by default.

Google just shipped a Workspace CLI built agents-first. CLIWatch is building benchmarks that score tools on agent-readiness — pass rates, token efficiency, turn counts — with badges for your README.

The migration is happening. The question isn’t whether your CLI needs an agent-friendly interface. It’s whether you build it yourself or someone else builds a wrapper that makes you a dependency they’d rather not have.

Your CLI’s next power user doesn’t read your README. It reads your --help output, introspects your schema, and parses your JSON. Design for that user, or watch them move on to someone who did.

The Undercurrent
