a scaffolder for phase-gated, eval-first agent missions
A CLI that scaffolds phase-gated, worktree-isolated, eval-first agent missions, built by an agent team running the discipline it encodes.
Context
The method is the asset. Every build in this portfolio ran the same way: a mission brief written before any code, phases with human gates, disposable workers scoped to their own subtree, reviewers who never grade their own work, and a failing eval that defines done. I had been hand-rolling that structure each time. orchestrate turns it into one command.
What it does
orchestrate init
How it was built
The part worth seeing: orchestrate was built by the exact method it scaffolds. An orchestrator dispatched scoped Sonnet workers into isolated git worktrees, each owning a non-overlapping part of the tree. Reviews were done by Opus reviewers who wrote none of the code under review. The build ran in five gated phases: spec and a failing eval first, then templates, then the CLI, then a worked example wired to the eval, then a clean-clone review gate. Workers never ran git; the manager committed at each boundary. The tool that scaffolds agent missions was itself the output of an agent mission, run under the discipline it encodes.
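The non-overlap rule for worker scopes is easy to check mechanically. What follows is a minimal sketch in stdlib Python, not orchestrate's own code: the function names and the shape of the scope map (worker name to repo-relative path) are assumptions made for illustration.

```python
from __future__ import annotations

from itertools import combinations
from pathlib import PurePosixPath


def scopes_overlap(a: str, b: str) -> bool:
    """True if one repo-relative path equals or contains the other."""
    pa, pb = PurePosixPath(a), PurePosixPath(b)
    return pa == pb or pa in pb.parents or pb in pa.parents


def find_overlaps(scopes: dict[str, str]) -> list[tuple[str, str]]:
    """Return pairs of worker names whose declared subtrees collide.

    `scopes` maps a hypothetical worker name to the subtree it owns.
    """
    return [
        (w1, w2)
        for (w1, p1), (w2, p2) in combinations(sorted(scopes.items()), 2)
        if scopes_overlap(p1, p2)
    ]
```

An empty result means every worker owns a disjoint subtree. Comparing path components rather than string prefixes keeps src/cli and src/cli2 from being mistaken for nested scopes.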
What's proven
Eval-first throughout: the suite was proven failing first, then driven to green, ending at eleven passing tests. The generated output is byte-identical across two runs. A mission produced by init passes check: the tool eats its own cooking. The final gate was a review from a clean clone, and the falsification table came back clean on every criterion: the dogfood round-trip holds, the suite is green from a fresh checkout, output is deterministic, no worker touched git, and nothing outside the declared scope was added. I re-ran the suite and the round-trip first-hand to confirm.
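One way to test a byte-identical claim like the one above is to hash each generated tree and compare the digests. This is a hedged sketch in stdlib Python, not the project's actual test: tree_digest is a hypothetical helper, and it assumes the scaffolded output is a plain directory of regular files.

```python
import hashlib
from pathlib import Path


def tree_digest(root: Path) -> str:
    """Hash every file's relative path and bytes in a stable order."""
    h = hashlib.sha256()
    files = [p for p in root.rglob("*") if p.is_file()]
    # Sort by POSIX-style relative path so traversal order never matters.
    for path in sorted(files, key=lambda p: p.relative_to(root).as_posix()):
        rel = path.relative_to(root).as_posix().encode("utf-8")
        data = path.read_bytes()
        # Length-prefix each field so adjacent fields cannot blur together.
        h.update(len(rel).to_bytes(8, "big"))
        h.update(rel)
        h.update(len(data).to_bytes(8, "big"))
        h.update(data)
    return h.hexdigest()
```

Running the scaffolder twice into two empty directories and asserting that the two digests are equal gives a single pass/fail determinism check. Note that this sketch ignores empty directories and file modes.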
It is a brutally scoped v1: two commands, stdlib only, no platform around it. That is the point. The value is not surface area; it is a repeatable method made concrete, with the discipline proven on its own construction.
Demo