Orchestrator — autograd-style multi-agent explorer
Phase 6.D introduced a SQLite-backed multi-agent orchestrator under src/stock_core/orchestrator/ that runs the research search itself instead of waiting for each experiment to be launched by hand. The design docs live in docs/decisions/orchestrator/.
Motivation
After three strategies and three "no edge" verdicts in Phase 5, the bottleneck was no longer ideas but the throughput of trying variants. Running each loss-function or feature-ablation experiment by hand, one at a time, waiting for it to finish, and writing the verdict to a markdown file was the slowest step. The orchestrator turns experiment exploration into a deterministic, parallel, self-resuming pipeline.
Architecture
Five "teams" run in parallel on the production box, each owning one research axis:
- Team-Loss explores loss-function variants (negative_sharpe, sharpe_with_position_floor, negative_sortino, negative_mean_pnl, ...)
- Team-Features explores feature ablations and feature-engineering tweaks
- Team-Architecture explores neural-network shape variants (hidden ∈ {8, 16, 32}, depth, dropout, ...)
- Team-Universe waits for one working trunk strategy, then scales to more stocks
- Team-Personality waits for a trunk, then composes the three personalities (return-max / Sharpe-max / drawdown-averse)
Each team runs as a tmux session under the stockwork service account. A bash driver atomically claims pending nodes from state.db, invokes claude --print to run the experiment, and parses the verdict back into the database. Two background sessions support the teams:
- Supervisor: every 5 minutes, restarts dead team sessions and unsticks stale jobs.
- Persister: every 30 minutes, commits state.db snapshots + state to git and syncs artifacts to S3 via rclone.
Autograd-style backward rules
The state graph is a DAG: each verdict is a node, and each child_of relationship is an edge. Backward rules fire on a node's verdict status to spawn new nodes. For example, proven_accepted spawns universe + personality children, and proven_rejected with high sharpe_std spawns a "tighter regularisation" sibling.
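The rule dispatch described above can be sketched as a list of functions. Each function inspects a finished node and returns the ids of nodes to spawn. This is a minimal illustration, not the contents of backward.py. The rule names, the `-tighter-reg` naming scheme and the absolute sharpe_std threshold are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Node:
    id: str        # slash-separated lineage path
    status: str    # e.g. "proven_accepted", "proven_rejected"
    verdict: Dict[str, float] = field(default_factory=dict)


def spawn_scale_out(parent: Node) -> List[str]:
    """proven_accepted: spawn universe + personality children."""
    if parent.status != "proven_accepted":
        return []
    return [f"{parent.id}/universe", f"{parent.id}/personality"]


def spawn_tighter_regularisation(parent: Node, std_threshold: float = 1.0) -> List[str]:
    """proven_rejected with high sharpe_std: spawn a tighter-regularisation sibling."""
    if parent.status != "proven_rejected":
        return []
    if parent.verdict.get("sharpe_std", 0.0) <= std_threshold:
        return []
    # Sibling: same lineage prefix, suffixed name (hypothetical naming scheme).
    return [f"{parent.id}-tighter-reg"]


RULES: List[Callable[[Node], List[str]]] = [spawn_scale_out, spawn_tighter_regularisation]


def backward(parent: Node) -> List[str]:
    """Fire every rule on a finished node and collect the ids of nodes to spawn."""
    spawned: List[str] = []
    for rule in RULES:
        spawned.extend(rule(parent))
    return spawned
```

Each rule checks its own trigger and returns an empty list otherwise. Because of that, backward() can fire every rule on every finished node. New rules are appended to RULES without touching the dispatcher.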
Two production bugs surfaced and were patched within minutes of launch (L33):
- Inconclusive cascade: rules fired on tool-failure verdicts, producing an unbounded chain of nodes with ever-longer IDs. Patched all 5 rules to early-return on inconclusive.
- Ping-pong cycle: rules also fired on proven_rejected, so a 2-loss pool oscillated forever. Added a _depth(parent.id) >= 3 depth-cap guard.
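Both patches above can be expressed as a guard around a backward rule. This sketch factors them into a decorator for clarity, whereas the text says the inconclusive check was added inside each of the 5 rules. lineage_depth is a hypothetical stand-in for _depth. It counts ancestors by walking parent_id links, which is an assumption about how depth is measured.

```python
import functools
from typing import Callable, Dict, List, Optional

MAX_DEPTH = 3  # the depth cap from the ping-pong fix


def lineage_depth(node_id: str, parent_of: Dict[str, Optional[str]]) -> int:
    """Hypothetical stand-in for _depth: count ancestors via parent_id links."""
    depth, seen = 0, {node_id}
    current = parent_of.get(node_id)
    while current is not None and current not in seen:  # `seen` guards against bad cycles
        seen.add(current)
        depth += 1
        current = parent_of.get(current)
    return depth


def guarded(rule: Callable[[dict], List[str]]) -> Callable[[dict, Dict[str, Optional[str]]], List[str]]:
    """Wrap a backward rule with both patches."""
    @functools.wraps(rule)
    def wrapper(parent: dict, parent_of: Dict[str, Optional[str]]) -> List[str]:
        if parent["status"] == "inconclusive":
            return []  # tool failure carries no signal: never spawn (stops the cascade)
        if lineage_depth(parent["id"], parent_of) >= MAX_DEPTH:
            return []  # depth cap: stops rules oscillating between variants forever
        return rule(parent)
    return wrapper
```

With the cap at 3, a rule that keeps spawning a swapped-loss child from each rejected node stops after three generations. Without the cap, it would run indefinitely.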
State schema
The SQLite schema lives at src/stock_core/orchestrator/state.py. Core columns of the nodes table:
- id: slash-separated path encoding lineage (e.g. loss/sharpe-pf/alpha-0p5-hidden-8)
- team: owning team string
- status: pending | running | proven_accepted | proven_rejected | inconclusive | blocked
- verdict: JSON blob with sharpe_mean, mean_pnl_mean, all_4_leakage_passed, post_cost_proxy, notes, etc.
- finished_at, started_at, parent_id
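A minimal DDL consistent with the column list above might look like this. It is inferred from the list, so the real schema in state.py may use other types, constraints, indexes, or extra columns. The CHECK constraint on status, the open_state and record_verdict helpers, and storing timestamps as TEXT are all assumptions.

```python
import json
import sqlite3

# Hypothetical DDL inferred from the column list above; the real schema in
# src/stock_core/orchestrator/state.py may use other types, indexes, or columns.
SCHEMA = """
CREATE TABLE IF NOT EXISTS nodes (
    id          TEXT PRIMARY KEY,              -- slash-separated lineage path
    team        TEXT NOT NULL,                 -- owning team
    status      TEXT NOT NULL DEFAULT 'pending'
                CHECK (status IN ('pending', 'running', 'proven_accepted',
                                  'proven_rejected', 'inconclusive', 'blocked')),
    verdict     TEXT,                          -- JSON blob, NULL until finished
    parent_id   TEXT REFERENCES nodes(id),     -- the child_of edge
    started_at  TEXT,
    finished_at TEXT
);
"""


def open_state(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a state database and ensure the nodes table exists."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def record_verdict(conn: sqlite3.Connection, node_id: str, status: str,
                   verdict: dict, finished_at: str) -> None:
    """Store a finished node's status and verdict JSON in one transaction."""
    with conn:
        conn.execute(
            "UPDATE nodes SET status = ?, verdict = ?, finished_at = ? WHERE id = ?",
            (status, json.dumps(verdict), finished_at, node_id),
        )
```

The CHECK constraint makes SQLite reject any status outside the six listed values. This catches a driver that mis-parses a verdict before the bad value reaches the backward rules.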
In the orchestrator package, seed.py seeds the initial nodes, runners.py exposes the experiment runners, backward.py holds the backward rules, and state.py owns the schema and the atomic claim semantics.
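The atomic claim that state.py owns, and that the bash driver relies on, can be sketched with Python's sqlite3 module. This shows one common way to get atomicity in SQLite, not necessarily the project's. BEGIN IMMEDIATE takes the write lock before the SELECT, so two drivers cannot pick the same pending row. The function name, the per-team filter and the ORDER BY id tie-break are assumptions. The connection must be opened with isolation_level=None so Python does not manage transactions itself.

```python
import sqlite3
from typing import Optional


def claim_next(conn: sqlite3.Connection, team: str, now: str) -> Optional[str]:
    """Atomically move one pending node for `team` to 'running' and return its id.

    Expects a connection opened with isolation_level=None (explicit transactions).
    Returns None when the team has no pending work.
    """
    conn.execute("BEGIN IMMEDIATE")  # acquire the write lock before reading
    try:
        row = conn.execute(
            "SELECT id FROM nodes WHERE team = ? AND status = 'pending' "
            "ORDER BY id LIMIT 1",
            (team,),
        ).fetchone()
        if row is not None:
            conn.execute(
                "UPDATE nodes SET status = 'running', started_at = ? WHERE id = ?",
                (now, row[0]),
            )
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return None if row is None else row[0]
```

A concurrent driver that reaches BEGIN IMMEDIATE blocks until the first one commits, or fails with "database is locked" after the connection's timeout. By then the claimed row is already running.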
What it produced
The orchestrator ran 367 experiments end-to-end across the 12-phase lifetime of the project. More than 160 rejected nodes shared the same structural pattern (mean_pnl = 0 and sharpe_std >> sharpe_mean), which became the empirical evidence for L32 (position-aware losses).
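A check for the rejected-node pattern above might look like the sketch below. It assumes the verdict JSON carries sharpe_std alongside sharpe_mean and mean_pnl_mean. It turns the informal "sharpe_std >> sharpe_mean" into a hypothetical ratio test, and both the ratio and the zero tolerance are illustrative values, not ones taken from the project.

```python
import json
from typing import Iterable, Optional, Tuple


def is_degenerate(verdict: dict, ratio: float = 5.0, eps: float = 1e-12) -> bool:
    """True when mean P&L is ~0 and Sharpe dispersion dwarfs the mean Sharpe."""
    pnl = verdict.get("mean_pnl_mean")
    std = verdict.get("sharpe_std")
    mean = verdict.get("sharpe_mean")
    if pnl is None or std is None or mean is None:
        return False  # incomplete verdicts are not classified
    return abs(pnl) <= eps and std > ratio * abs(mean)


def count_degenerate(rows: Iterable[Tuple[str, Optional[str]]]) -> int:
    """Count proven_rejected rows whose verdict JSON matches the pattern."""
    return sum(
        1
        for status, raw in rows
        if status == "proven_rejected" and raw and is_degenerate(json.loads(raw))
    )
```

Against a live database this could be fed directly from a cursor, e.g. `count_degenerate(conn.execute("SELECT status, verdict FROM nodes"))`.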
The orchestrator also surfaced the first leakage-clean strategies in the codebase (L35): the arch-r1-hidden-8 family, with all four classic leakage tests passing and an honest Sharpe of +0.124 with non-degenerate positions. The family was marked proven_rejected only because the post-cost proxy fell below 0.4. The methodology was sound at this scale, and the universe was the blocker.
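The acceptance gate implied above can be sketched as a function from verdict to status. Only two criteria come from the text: all four leakage tests passing, and a post-cost proxy at or above 0.4. The real gate may check more. Treating a leakage failure as an outright rejection is an assumption, and the function name is hypothetical.

```python
from typing import Any, Mapping

POST_COST_FLOOR = 0.4  # the post-cost proxy threshold cited above


def gate(verdict: Mapping[str, Any]) -> str:
    """Map a finished run's verdict to a terminal status (hypothetical, partial gate)."""
    if not verdict.get("all_4_leakage_passed", False):
        return "proven_rejected"  # assumption: any leakage failure rejects
    proxy = verdict.get("post_cost_proxy")
    if proxy is None or float(proxy) < POST_COST_FLOOR:
        return "proven_rejected"  # clean but not profitable enough after costs
    return "proven_accepted"
```

Under this gate a strategy can be methodologically clean and still be rejected. That is the situation the text describes for the arch-r1-hidden-8 family.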
Live state
The live/ pages on this site are regenerated every 10 minutes from orchestrator/state.db on the production box. See Orchestrator state for the current snapshot.
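Regenerating the live/ pages starts with a read-only query against state.db. The sketch below is minimal and assumes a per-status node count is one of the numbers shown, since the actual page contents are not described here. Opening the file through SQLite's mode=ro URI means the page generator can never write to the database the drivers are using.

```python
import sqlite3
from pathlib import Path
from typing import Dict


def open_readonly(path: str) -> sqlite3.Connection:
    """Open an existing SQLite file read-only via a file: URI."""
    return sqlite3.connect(Path(path).resolve().as_uri() + "?mode=ro", uri=True)


def status_snapshot(conn: sqlite3.Connection) -> Dict[str, int]:
    """Count nodes per status, e.g. for a regenerated live/ page."""
    rows = conn.execute(
        "SELECT status, COUNT(*) FROM nodes GROUP BY status ORDER BY status"
    )
    return {status: n for status, n in rows}
```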