Evolution & governance

These guidelines are living. Agent capabilities change fast, and CLI patterns for agents should change with them. This page makes that explicit: how the document is versioned, the capability assumptions each part rests on, and the questions still open.

Status: v0.1 (draft), 2026-06-23.

Versioning

The standard is versioned MAJOR.MINOR.PATCH:

MAJOR — a breaking normative change: an invariant added, removed, or tightened such that a previously-conformant tool no longer conforms.
MINOR — new non-breaking guidance: a pattern, an antipattern, a SHOULD, a clarified level.
PATCH — editorial clarifications, examples, fixes.

A tool cites the version and level it targets (see Conformance). Numeric anchors like default output bounds are tunables, not invariants — they can move in a MINOR when the capability assumptions behind them shift.

Changelog

v0.1.0 (2026-06-23) — Initial draft: 10 invariants; patterns for foundations, safety, self-description, token economy, and auth; the antipattern catalogue; two conformance levels.

Capability assumptions

Every rule rests on an assumption about what LLM agents can and can’t do. When an assumption changes, the dependent rules should be revisited — not silently kept.

Rule / area	Assumption today	Revisit when
Bounded output / token caps	Context is scarce; recall degrades as it fills (“context rot”).	Windows grow and recall stays flat → raise default bounds (keep the principle).
`schema` + `agent` self-description	Models learn a tool at runtime, not from training; they benefit from a machine-readable contract.	Models reliably infer the full surface from `--help` alone → demote toward SHOULD.
Prompt-injection fencing	Agents may follow instructions found in fetched content.	Models become reliably immune to injected instructions → relax fencing.
Never require a TTY prompt	Agents have no TTY and can’t answer prompts.	Durable for headless agents; unlikely to change.
Read-only by default	Agents act on inference, can be wrong, can be steered.	Durable — this is about consequence, not capability.
Structured output / errors / exit codes	Machines parse fields and branch on codes.	Durable.

If you find yourself wanting to break a rule because “the model is good enough now,” that’s the signal to open a proposal against the assumption — not to quietly ignore the rule.

Open questions

The CLI ⇄ MCP boundary. Where exactly does a CLI stop being the right transport? The field’s rule of thumb (high-frequency/local → CLI; infrequent/SaaS → MCP) is useful but not precise. See Philosophy.
Identity, audit, and revocation. The strongest argument against CLIs for agents in multi-user/enterprise settings: when an agent runs a CLI, whose identity is it, and how is that call audited and revoked? CLIs have no clean answer yet. See Auth.
A standard shape for schema --json. There is no ratified cross-tool schema for a CLI’s self-description. Converging on one would let agents introspect any conformant tool uniformly.
Output bounds as windows grow. The right default cap is empirical and tied to recall, not just window size. Needs measurement, not guesswork.

Governance

This is an opinionated, evidence-backed standard maintained in the open. To propose a change:

Open an issue on GitHub naming the rule and the assumption you’re challenging (with evidence where possible).
Normative changes (invariants, levels) ship in a MAJOR/MINOR with rationale recorded in the changelog; clarifications ship as PATCH.
Disagreement is expected and welcome — a living standard earns its authority by being revised in public, not by being frozen.