A governed terminal multiplexer for the Claude Code CLI
A terminal HUD that watches the Claude Code CLI from out-of-band taps and puts every permission decision in the operator's hands.
Context
Every Claude Code session is a black box. You see the output, not the cost, the tool sequence, or whether the agent is working, waiting, or stuck. I built Orchestra so I could run sessions unattended and read the machine's state at a glance when I came back.
Orchestra is a Python application that hosts the real interactive claude CLI inside a pseudo-terminal pane and overlays interpreted side panels built from structured taps. It is a terminal multiplexer, not a wrapper. The CLI runs exactly as it would in a normal terminal. Orchestra watches from outside.
Every interpretation comes from two out-of-band sources: Claude Code's hook system and its OpenTelemetry stream. Nothing is read off the screen. That keeps the data clean and the host invisible to the model.
Architecture
Three layers.
The pane layer hosts the real claude CLI in a pseudo-terminal, drives a pyte VT emulator with the byte stream, renders the screen, and forwards keystrokes. It interprets nothing. That boundary is a design rule, not a convention.
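The byte-stream half of that boundary can be sketched with the standard library alone. This is a minimal sketch, not Orchestra's pane code: run_in_pty is a hypothetical helper, it assumes a POSIX system, and it only collects the raw bytes a child writes to its pseudo-terminal. In the real pane those bytes would be fed to the pyte emulator instead of being returned.

```python
import os
import pty
import subprocess
import sys


def run_in_pty(argv: list[str]) -> bytes:
    """Run argv attached to a pseudo-terminal and return the raw bytes it wrote."""
    master_fd, slave_fd = pty.openpty()
    try:
        proc = subprocess.Popen(
            argv, stdin=slave_fd, stdout=slave_fd, stderr=slave_fd, close_fds=True
        )
    except BaseException:
        os.close(master_fd)
        raise
    finally:
        os.close(slave_fd)  # the parent keeps only the master end

    chunks: list[bytes] = []
    try:
        while True:
            try:
                data = os.read(master_fd, 4096)
            except OSError:
                break  # Linux raises EIO once the child has closed its end
            if not data:
                break  # other platforms signal EOF with an empty read
            chunks.append(data)  # passed along untouched, never interpreted
    finally:
        os.close(master_fd)
        proc.wait()
    return b"".join(chunks)


if __name__ == "__main__":
    print(run_in_pty([sys.executable, "-c", "print('hello')"]))
```

The loop copies bytes without looking at them, which is the point: anything that needs meaning comes from the taps, not from this stream.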
The interpretation layer takes structured events from two independent taps: a hooks server that accepts Claude Code hook events over HTTP, and an OTLP receiver that accepts the CLI's metric and log exports at a local endpoint. Both feeds normalise into one internal vocabulary in model/events.py. State accumulates in model/state.py, the single source of truth for all four panels. The hook and OTel views of the same tool call reconcile on tool_use_id, a join key I confirmed empirically across four tool calls.
The control layer is Phase 2, live in the current branch. A PreToolUse hook blocks the tool call while the relay round-trips to Orchestra's gate, where I approve, deny, or rewrite the call from the HUD. A pre-gate fire test confirmed that a native permissions.deny rule beats a hook allow from the same settings file, so the destructive-class deny floor is enforced by the platform, not by Orchestra.
The stack is Python 3.12, Textual for the TUI, pyte for terminal emulation, and stdlib for the PTY, HTTP server, and OTLP parsing. No external observability dependencies.
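Parsing OTLP with the standard library is feasible because the http/json transport delivers plain JSON. The sketch below walks one metrics export and totals a sum metric per attribute value. It is an illustration, not Orchestra's parser: sum_metric is a hypothetical helper, and the metric name and the type attribute used in the example are assumptions about what the CLI exports. The nesting (resourceMetrics, scopeMetrics, metrics, dataPoints) and the string-encoded asInt follow the OTLP JSON encoding.

```python
import json
from collections import defaultdict


def sum_metric(payload: dict, metric_name: str, group_by: str) -> dict[str, float]:
    """Total one sum metric in a single OTLP/JSON export, grouped by an attribute."""
    totals: dict[str, float] = defaultdict(float)
    for resource in payload.get("resourceMetrics", []):
        for scope in resource.get("scopeMetrics", []):
            for metric in scope.get("metrics", []):
                if metric.get("name") != metric_name:
                    continue
                for point in metric.get("sum", {}).get("dataPoints", []):
                    attrs = {
                        a["key"]: a.get("value", {}).get("stringValue")
                        for a in point.get("attributes", [])
                    }
                    # OTLP/JSON encodes 64-bit integers as strings, so coerce.
                    value = float(point.get("asDouble", point.get("asInt", 0)))
                    totals[attrs.get(group_by) or "unknown"] += value
    return dict(totals)


if __name__ == "__main__":
    # Hand-written sample in the OTLP/JSON shape; the names are assumed.
    sample = json.loads("""
    {"resourceMetrics": [{"scopeMetrics": [{"metrics": [{
        "name": "claude_code.token.usage",
        "sum": {"dataPoints": [
            {"asInt": "120", "attributes": [{"key": "type", "value": {"stringValue": "input"}}]},
            {"asInt": "45", "attributes": [{"key": "type", "value": {"stringValue": "output"}}]}
        ]}
    }]}]}]}
    """)
    print(sum_metric(sample, "claude_code.token.usage", "type"))
```

The function totals the points inside one payload only. Accumulating across exports depends on whether the exporter sends delta or cumulative values, which this sketch does not handle.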
Governance
Three governance properties, stated plainly.
Out-of-band observation. The hooks and OTel taps are invisible to the model. There is no MCP tool calling back to Orchestra, which would add tokens to every turn and make observation depend on the model's cooperation. A hook fires deterministically on a lifecycle event. It does not wait for the model to choose to report.
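The receiving side of a hook tap can be very small. The following is a minimal sketch of an HTTP endpoint that records whatever JSON a hook posts to it, using only the standard library. HookHandler, start_server, post_event, the /hooks path, and the payload fields in the usage example are all hypothetical; post_event is a stand-in for the sender, not the CLI's delivery mechanism.

```python
import http.client
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

received: list[dict] = []  # events in arrival order


class HookHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        try:
            event = json.loads(self.rfile.read(length))
        except json.JSONDecodeError:
            self.send_response(400)
            self.end_headers()
            return
        received.append(event)  # recorded before the sender gets its reply
        self.send_response(204)
        self.end_headers()

    def log_message(self, format: str, *args) -> None:
        pass  # keep the terminal clean; the HUD owns the screen


def start_server() -> HTTPServer:
    """Bind to an ephemeral local port and serve in a daemon thread."""
    server = HTTPServer(("127.0.0.1", 0), HookHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def post_event(port: int, event: dict) -> int:
    """Stand-in sender: POST one event and return the HTTP status."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        conn.request(
            "POST",
            "/hooks",
            body=json.dumps(event).encode(),
            headers={"Content-Type": "application/json"},
        )
        return conn.getresponse().status
    finally:
        conn.close()


if __name__ == "__main__":
    server = start_server()
    print(post_event(server.server_address[1], {"hook_event_name": "PreToolUse"}))
    print(received)
    server.shutdown()
```

Nothing in this path involves the model: the event arrives because the lifecycle point was reached, and the handler stores it without sending anything back into the conversation.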
Subscription guard. Orchestra launches claude with ANTHROPIC_API_KEY and ANTHROPIC_AUTH_TOKEN removed from the child environment. For a session launched this way, claude auth status --json reports subscriptionType: max, which confirms a real subscription session.
Gate ordering. A matching permissions.deny rule in the scoped settings file blocks a tool call even when the hook returns allow. I confirmed this with a three-run differential test on claude 2.1.175 on 13 June 2026. The hook can only tighten permissions, never loosen them, and that is enforced by the platform, not by convention.
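The observed ordering can be written down as a small decision function. This is a model of the behaviour described above, not the CLI's implementation: Decision and effective_decision are hypothetical names, and the rule match is reduced to a boolean so that no rule syntax is assumed.

```python
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"


def effective_decision(native_deny_matches: bool, hook: Decision) -> Decision:
    """Model of the observed ordering: a matching native deny always wins."""
    if native_deny_matches:
        return Decision.DENY  # the hook's allow cannot override this
    return hook  # with no matching deny rule, the hook's decision stands


if __name__ == "__main__":
    for deny in (True, False):
        for hook in Decision:
            print(deny, hook.value, "->", effective_decision(deny, hook).value)
```

In this model the hook can turn an allow into a deny but never the reverse, which is what tighten-only means for the gate.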
What shipped
Phase 0 proved the substrate: the real interactive CLI embedded in a Textual PTY pane, confirmed on Textual 8.2.7, pyte 0.8.2, Python 3.12. It also confirmed the http/json OTLP transport and the --settings merge behaviour.
Phase 1 shipped the full observe-only HUD: four panels (activity log, session status, token usage, topology), the ctrl+\ focus toggle, auth-mode display, and a responsive layout. The live exit test passed on claude 2.1.173 and was clean on 2.1.175. Test suite: 131 passed, 1 skipped. Pyright strict clean. The usage panel accumulated live token totals from a real session; the numbers come from the Ledger export, never typed in.
Phase 2 is underway. The gate spine, the D2 gate matrix, and the D3 hybrid timeout are in phase2/gate-spine. The pre-gate fire test is resolved.
Artifacts
The artifacts above link to the source documents. The Ledger and Eval now produce real exports, and every number on this site comes from one of them. A number that does not come from a real run is not a number.
Evidence