Orchestrator — autograd-style multi-agent explorer

Phase 6.D introduced a sqlite-backed multi-agent orchestrator under src/stock_core/orchestrator/ that runs the research search itself. The design docs live in docs/decisions/orchestrator/.

Motivation

After three strategies and three "no edge" verdicts in Phase 5, the bottleneck was no longer ideas — it was the throughput of trying variants. Manually invoking each loss-function or feature-ablation experiment one at a time, waiting for it to finish, and writing the verdict to a markdown file was the slowest step. The orchestrator turns experiment exploration into a deterministic, parallel, self-resuming pipeline.

Architecture

Five "teams" run in parallel on the production box, each owning one research axis:

Team-Loss explores loss-function variants (negative_sharpe, sharpe_with_position_floor, negative_sortino, negative_mean_pnl, ...)
Team-Features explores feature ablations and feature-engineering tweaks
Team-Architecture explores neural-network shape variants (hidden ∈ {8, 16, 32}, depth, dropout, ...)
Team-Universe waits for one working trunk strategy, then scales to more stocks
Team-Personality waits for a trunk, then composes the three personalities (return-max / Sharpe-max / drawdown-averse)

Each team runs as a tmux session under the stockwork service account. A bash driver claims pending nodes from state.db atomically, invokes claude --print to run the experiment, and parses the verdict back into the database. Two background sessions support the teams:

Supervisor — every 5 minutes, restarts dead team sessions, unsticks stale jobs.
Persister — every 30 minutes, commits state.db snapshots + state to git, syncs artifacts to S3 via rclone.
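
The atomic claim performed by each team driver can be sketched as follows. This is a minimal illustration, not the code in state.py: the function name claim_next, the four-column table used here, and the choice of BEGIN IMMEDIATE as the locking strategy are all assumptions made for the example.

```python
import sqlite3
from typing import Optional


def claim_next(conn: sqlite3.Connection, team: str) -> Optional[str]:
    """Move one pending node for `team` to running and return its id.

    Hypothetical helper: the real claim logic lives in state.py and may differ.
    """
    # BEGIN IMMEDIATE takes the write lock before the SELECT, so two drivers
    # sharing the same database file cannot both pick the same pending row.
    conn.execute("BEGIN IMMEDIATE")
    try:
        row = conn.execute(
            "SELECT id FROM nodes WHERE team = ? AND status = 'pending' "
            "ORDER BY id LIMIT 1",
            (team,),
        ).fetchone()
        if row is None:
            conn.execute("COMMIT")
            return None
        conn.execute(
            "UPDATE nodes SET status = 'running', started_at = datetime('now') "
            "WHERE id = ?",
            (row[0],),
        )
        conn.execute("COMMIT")
        return row[0]
    except Exception:
        conn.execute("ROLLBACK")
        raise


# isolation_level=None puts the connection in autocommit mode, so the
# explicit BEGIN / COMMIT statements above control the transaction.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE nodes (id TEXT PRIMARY KEY, team TEXT, status TEXT, started_at TEXT)"
)
conn.executemany(
    "INSERT INTO nodes (id, team, status) VALUES (?, ?, 'pending')",
    [("loss/a", "loss"), ("loss/b", "loss")],
)

first = claim_next(conn, "loss")
second = claim_next(conn, "loss")
third = claim_next(conn, "loss")  # nothing left to claim
print(first, second, third)
```

Taking the write lock up front trades a little concurrency for simplicity: with only a handful of team drivers, serialising claims is unlikely to be the bottleneck.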

Autograd-style backward rules

The state graph is a DAG: each verdict is a node, each child_of relationship is an edge. Backward rules fire on verdict status to spawn new nodes — proven_accepted spawns universe + personality children; proven_rejected with high sharpe_std spawns a "tighter regularisation" sibling; and so on.
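
A backward rule of this kind might look like the sketch below. The Node dataclass, the rule names, the child-ID scheme (parent ID plus a suffix), and the sharpe_std threshold of 1.0 are all hypothetical; only the status names and the two behaviours (accepted nodes spawn universe and personality children, noisy rejections spawn a tighter-regularisation sibling) come from the description above.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Node:
    id: str
    team: str
    status: str = "pending"
    verdict: Dict[str, float] = field(default_factory=dict)
    parent_id: Optional[str] = None


# Arbitrary illustrative cut-off for "high sharpe_std"; the real value is not
# given in this document.
SHARPE_STD_THRESHOLD = 1.0


def rule_accepted(node: Node) -> List[Node]:
    """proven_accepted spawns one universe child and one personality child."""
    if node.status != "proven_accepted":
        return []
    return [
        Node(id=f"{node.id}/universe", team="universe", parent_id=node.id),
        Node(id=f"{node.id}/personality", team="personality", parent_id=node.id),
    ]


def rule_noisy_rejection(node: Node) -> List[Node]:
    """proven_rejected with high sharpe_std spawns a tighter-regularisation sibling."""
    if node.status != "proven_rejected":
        return []
    if node.verdict.get("sharpe_std", 0.0) <= SHARPE_STD_THRESHOLD:
        return []
    # A sibling shares the rejected node's parent rather than hanging off it.
    return [Node(id=f"{node.id}-tighter-reg", team=node.team, parent_id=node.parent_id)]


RULES: List[Callable[[Node], List[Node]]] = [rule_accepted, rule_noisy_rejection]


def backward(node: Node) -> List[Node]:
    """Run every rule against a finished node and collect the spawned nodes."""
    spawned: List[Node] = []
    for rule in RULES:
        spawned.extend(rule(node))
    return spawned


accepted = Node("arch/r1", "architecture", "proven_accepted")
noisy = Node("loss/sharpe-pf", "loss", "proven_rejected", {"sharpe_std": 2.0}, parent_id="loss")
print([child.id for child in backward(accepted)])
print([child.id for child in backward(noisy)])
```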

Two production bugs surfaced and were patched within minutes of launch (L33):

Inconclusive cascade — rules fired on tool-failure verdicts → infinite ever-longer node IDs. Patched all 5 rules to early-return on inconclusive.
Ping-pong cycle — rules fired on proven_rejected too → 2-loss pool oscillated forever. Added _depth(parent.id) >= 3 depth-cap guard.

State schema

The sqlite schema lives at src/stock_core/orchestrator/state.py. Core columns on nodes:

id — slash-separated path encoding lineage (e.g. loss/sharpe-pf/alpha-0p5-hidden-8)
team — owning team string
status — pending | running | proven_accepted | proven_rejected | inconclusive | blocked
verdict — JSON blob with sharpe_mean, mean_pnl_mean, all_4_leakage_passed, post_cost_proxy, notes, etc.
finished_at, started_at, parent_id
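
A table with these columns could be declared roughly as below. The column types, the CHECK constraint, the default status, and the foreign key are assumptions for illustration; the authoritative definition is the one in state.py. The verdict values inserted here are made-up placeholders, not experiment results.

```python
import json
import sqlite3

# Hypothetical DDL reconstructed from the column list above.
SCHEMA = """
CREATE TABLE nodes (
    id          TEXT PRIMARY KEY,            -- slash-separated lineage path
    team        TEXT NOT NULL,
    status      TEXT NOT NULL DEFAULT 'pending'
                CHECK (status IN ('pending', 'running', 'proven_accepted',
                                  'proven_rejected', 'inconclusive', 'blocked')),
    verdict     TEXT,                        -- JSON blob, NULL until finished
    parent_id   TEXT REFERENCES nodes(id),
    started_at  TEXT,
    finished_at TEXT
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Placeholder verdict: the keys follow the document, the values are invented.
placeholder_verdict = {
    "sharpe_mean": 0.0,
    "mean_pnl_mean": 0.0,
    "all_4_leakage_passed": True,
    "notes": "placeholder",
}
conn.execute(
    "INSERT INTO nodes (id, team, verdict) VALUES (?, ?, ?)",
    ("loss/sharpe-pf/alpha-0p5-hidden-8", "loss", json.dumps(placeholder_verdict)),
)

status, verdict_json = conn.execute(
    "SELECT status, verdict FROM nodes WHERE id = ?",
    ("loss/sharpe-pf/alpha-0p5-hidden-8",),
).fetchone()
verdict = json.loads(verdict_json)
print(status, verdict["all_4_leakage_passed"])
```

Constraining status in the schema means a typo in a driver or rule fails at insert time instead of leaving a node that no team will ever claim.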

The orchestrator package has four modules: seed.py seeds the initial nodes, runners.py exposes the experiment runners, backward.py holds the backward rules, and state.py owns the schema and the atomic claim semantics.

What it produced

The orchestrator ran 367 experiments end-to-end across the 12-phase lifetime of the project. 160+ rejected nodes shared the same structural pattern (mean_pnl = 0 and sharpe_std >> sharpe_mean), which was the empirical evidence for L32 (position-aware losses).

It also surfaced the first leakage-clean strategies in the codebase (L35): the arch-r1-hidden-8 family, with all four classic leakage tests passing and an honest Sharpe of +0.124 with non-degenerate positions. The family was marked proven_rejected only because the post-cost proxy fell below 0.4; the methodology was sound at this scale, and the universe was the blocker.

Live state

The live/ pages on this site are regenerated every 10 minutes from orchestrator/state.db on the production box. See Orchestrator state for the current snapshot.
