The journey

A one-page diary of what it actually takes to test whether a small team — well, one person plus AI coding agents — can beat the Indian stock market with machine learning.

The starting hypothesis

If you give a neural network enough features about Indian large-cap stocks — prices, news, macro signals, calendars — and train it directly against profit-and-loss instead of against intermediate proxies, it should find tradeable patterns. Other people do this professionally. The infrastructure to try it costs less than $200 a month.

That's the bet. The goal was modest by professional standards: a Sharpe ratio between 0.4 and 0.7 — translating roughly to "consistently makes a bit more than safe alternatives, after costs." Anything above that would be suspect; anything below that would be statistical noise.
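For readers who want the target in concrete terms, here is a minimal sketch of an annualised Sharpe ratio computed from daily returns. The 252-day annualisation factor, the function name, and the sample returns are assumptions made for illustration; the project's own calculation may differ in its conventions.

```python
import math
import statistics

TRADING_DAYS = 252  # assumed annualisation factor, not taken from the project


def annualised_sharpe(daily_returns, daily_risk_free=0.0):
    """Annualised Sharpe ratio of a series of daily returns (ideally net of costs)."""
    excess = [r - daily_risk_free for r in daily_returns]
    mean = statistics.fmean(excess)
    std = statistics.stdev(excess)  # sample standard deviation
    if std == 0:
        raise ValueError("returns have zero variance; Sharpe is undefined")
    return mean / std * math.sqrt(TRADING_DAYS)


# Illustrative numbers only, not project data.
sample_returns = [0.001, -0.002, 0.003, 0.0005, -0.001]
print(round(annualised_sharpe(sample_returns), 2))
```

The ratio is unchanged if every return is multiplied by the same positive constant, which is why it measures consistency rather than raw size of profit.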

What "differentiable trading" means in plain English

Most quant-finance models do this:

  1. Build a model that predicts something — like "will this stock go up tomorrow?"
  2. Then write rules — "if predicted up, buy 1% of portfolio."
  3. Backtest the rules. Tweak. Repeat.

Differentiable trading skips step 2. The same neural network is responsible for both the prediction and the position size, and it's trained directly against the only thing that matters: did this trade make money? Gradients flow backward from the P&L all the way to the model's weights, through the entire simulated trading process.
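The idea can be sketched in a few lines. This is a toy illustration, not the project's model: a linear score passed through tanh stands in for the neural network, the data is synthetic, transaction costs are ignored, and the gradient is written out by hand so the example needs no ML library. All names are hypothetical.

```python
import math
import random


def pnl_loss_and_grad(weights, features, next_returns):
    """Return the negative mean P&L of positions tanh(w . x) and its gradient in w."""
    n = len(features)
    loss = 0.0
    grad = [0.0] * len(weights)
    for x, r in zip(features, next_returns):
        score = sum(w * xi for w, xi in zip(weights, x))
        position = math.tanh(score)        # position in [-1, 1], chosen by the model
        loss -= position * r / n           # loss is minus the mean P&L
        dpos = 1.0 - position * position   # derivative of tanh at the score
        for j, xi in enumerate(x):
            grad[j] -= r * dpos * xi / n   # chain rule: P&L -> position -> weights
    return loss, grad


def train(features, next_returns, steps=200, lr=0.5):
    """Plain gradient descent on the negative P&L."""
    weights = [0.0] * len(features[0])
    for _ in range(steps):
        _, grad = pnl_loss_and_grad(weights, features, next_returns)
        weights = [w - lr * g for w, g in zip(weights, grad)]
    return weights


# Synthetic data: the first feature weakly predicts the next return.
rng = random.Random(0)
features = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(500)]
next_returns = [0.01 * x[0] + rng.gauss(0, 0.01) for x in features]

start_loss, _ = pnl_loss_and_grad([0.0, 0.0], features, next_returns)
weights = train(features, next_returns)
end_loss, _ = pnl_loss_and_grad(weights, features, next_returns)
print(start_loss, end_loss, weights)
```

There is no separate "if predicted up, buy" rule anywhere: the position is an output of the model, and the only training signal is the simulated profit.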

It's a clean idea. It produces beautiful learning curves. The training loss drops cleanly. The numbers look great.

And almost always, the numbers are wrong — because the model has found a way to cheat.

The leakage problem (and why most ML-finance projects don't survive scrutiny)

"Leakage" is when the model accidentally sees something it shouldn't have at prediction time. Some examples that took weeks to catch:

  • The model memorised tickers. When you tell it "this is RELIANCE", it doesn't actually need the features — it can just remember that RELIANCE rose 12% in 2023 and predict similarly. Solved by stripping ticker identity from inputs.
  • Per-stock normalisation leaked the future. Normalising each stock's features using that stock's historical mean accidentally encoded each stock's identity into the scale itself. Caught by Learning #57.
  • A "cheat column" planted as a test sometimes failed silently. Standardising features after planting the cheat made the cheat too small to be detected — which meant the test passed when leakage was actually present elsewhere. Caught by Learning #58.
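The planted-cheat idea in the last bullet can be sketched as follows. This is not the project's harness: the detector here is a simple correlation check, and the column names, threshold, and data are assumptions made for illustration. The point is the pattern: plant a column that obviously leaks the target, then require that the detector flags it. A detector that misses the plant cannot be trusted when it reports "clean".

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def flag_leaky_columns(columns, target, threshold=0.5):
    """Names of columns whose absolute correlation with the target exceeds threshold."""
    return [name for name, values in columns.items()
            if abs(pearson(values, target)) > threshold]


rng = random.Random(42)
n = 1000
target = [rng.gauss(0, 0.01) for _ in range(n)]       # next-day returns, unknown at prediction time
columns = {
    "momentum": [rng.gauss(0, 1) for _ in range(n)],  # clean feature, unrelated to the target
    "volume_z": [rng.gauss(0, 1) for _ in range(n)],  # clean feature, unrelated to the target
}
# Plant the cheat: the target itself plus a little noise.
columns["planted_cheat"] = [t + rng.gauss(0, 0.002) for t in target]

flagged = flag_leaky_columns(columns, target)
assert "planted_cheat" in flagged, "detector is blind: it missed the planted cheat"

# Shrinking the cheat's scale must not hide it from the detector.
columns["planted_cheat"] = [v * 1e-6 for v in columns["planted_cheat"]]
assert "planted_cheat" in flag_leaky_columns(columns, target)
print(flagged)
```

Correlation is used in this sketch because it is unaffected by rescaling a column, so a later standardisation step cannot make the planted signal look smaller to the check.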

Every time one of these surfaces, part of the "edge" disappears. After many rounds of this, nothing of the apparent edge is left.

[Figure: What "no measurable edge" looks like. X-axis: honest Sharpe, from -2 to +5. A shaded band marks statistical noise at this scale, and the target range 0.4–0.7 is marked. Leakage-clean points (every test passes): per-stock momentum at -0.51 and linear momentum at -0.35. Artifact-positive points (a test fails): an artifact at +5 that failed window-robustness, and an artifact that failed static-features.]

The honest result, after 12 phases, 367 controlled experiments, and 54 numbered learnings:

No measurable edge on Indian large-caps at either the 5-stock × 3-year or the 46-stock × 3-year scale, across 8 different loss functions, 12 default-configurations, and many hyperparameter variants.

The two strategies that survived every single test produced Sharpes of -0.35 and -0.51 — both inside the statistical noise band. Honest negative results.
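To see why Sharpes of this size are hard to tell apart from zero on three years of data, here is a rough sketch of the standard error of an annualised Sharpe ratio. It uses the common approximation for independent, identically distributed returns and assumes 252 trading days a year; the project's noise band may be defined differently.

```python
import math


def sharpe_standard_error(annual_sharpe, years, periods_per_year=252):
    """Approximate standard error of an annualised Sharpe ratio, assuming IID returns."""
    n = years * periods_per_year
    per_period = annual_sharpe / math.sqrt(periods_per_year)
    se_per_period = math.sqrt((1 + 0.5 * per_period ** 2) / n)
    return se_per_period * math.sqrt(periods_per_year)


for observed in (-0.35, -0.51):
    se = sharpe_standard_error(observed, years=3)
    # True when the observed Sharpe is within two standard errors of zero.
    print(observed, round(se, 2), abs(observed) < 2 * se)
```

Under these assumptions the standard error on a three-year window is roughly 0.58, so both surviving results sit within one standard error of zero.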

[Figure: A year, twelve phases. Each phase is a published learning, even when the answer is "no edge here either".]

  • Phase 1: Differentiable backtest
  • Phases 2-4: Data + loss
  • Phases 5-6: Leakage harness (8 strategies, no edge)
  • Phases 7-8: Nifty 50 scale
  • Phases 9-10: Per-stock baseline
  • Phase 11: 5-agent orchestrator
  • Phase 12: Window verdict
  • Pivot: OpenBracket v1 build

Why is "no edge found" still worth a year?

A few reasons:

  1. The infrastructure is the deliverable. The leakage tests, the walk-forward harness, the multi-agent orchestrator that runs experiments overnight — all of that is reusable. It's why phase 2 can be built in weeks instead of months.
  2. It rules out one whole class of approaches at this scale and feature set. Future research can start where this one ended, instead of where it started.
  3. The pattern of failure is the finding. Each phase failed the same way — the model finds something that looks like an edge, the new leakage test catches the artifact, the edge disappears. Knowing that pattern is more valuable than the seventh attempt at the same thing.

The pivot: OpenBracket

After the negative result was solid, the project pivoted. The new framing:

Don't try to trade. Just forecast a bracket. Let a human decide.

Concretely: instead of an end-to-end model that decides positions, build a pre-open forecaster (OpenBracket) that runs once a day before the market opens, and outputs a low-to-high bracket for each stock in NIFTY 100, with calibrated confidence intervals and explanations.
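One simple way to check whether brackets like these are calibrated is to count how often the realised price lands inside them. The sketch below is an assumption about how such a check could look, not OpenBracket's code; the function name and the prices are made up for illustration.

```python
def bracket_coverage(brackets, realised):
    """Fraction of realised prices that fell inside their forecast (low, high) bracket."""
    hits = sum(low <= price <= high
               for (low, high), price in zip(brackets, realised))
    return hits / len(realised)


# Illustrative numbers only: four forecasts and what actually happened.
brackets = [(98.0, 102.0), (49.0, 51.5), (200.0, 210.0), (10.0, 10.8)]
realised = [101.2, 52.0, 204.5, 10.1]

coverage = bracket_coverage(brackets, realised)
print(coverage)
```

If the brackets are advertised as, say, 80% intervals, the coverage measured over many days should settle close to 0.8. Coverage well below that means the brackets are too narrow, and well above means they are too wide to be useful.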

That's the product side. The methodology — leakage testing, walk-forward only, bit-exact reproducibility — carries over directly. What changes is the model (XGBoost replaces neural networks, because the year showed trees handle the tabular structure just as well with less drama) and the deliverable (a JSON file you read at 8:15 AM, not an autonomous trading bot).
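"Walk-forward only" means every model is evaluated on data that comes strictly after the data it was trained on. A minimal sketch of a rolling walk-forward splitter, with hypothetical names and window sizes, might look like this:

```python
def walk_forward_splits(n_samples, train_size, test_size):
    """Yield (train, test) index ranges where every test index follows every train index."""
    start = 0
    while start + train_size + test_size <= n_samples:
        train = range(start, start + train_size)
        test = range(start + train_size, start + train_size + test_size)
        yield train, test
        start += test_size  # roll the window forward by one test block


for train, test in walk_forward_splits(n_samples=10, train_size=4, test_size=2):
    print(list(train), list(test))
```

Unlike a shuffled train/test split, no test sample ever precedes a training sample, so the model cannot be scored on a day it has effectively already seen the future of.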

What didn't change

The hardest lesson from year one was epistemic, not technical:

A leakage-clean Sharpe of -0.5 is more honest research output than a leakage-leaking Sharpe of +5.

The negative results are documented carefully on the Technical side, with each phase getting its own page. Every numbered learning (there are 54 of them) is cross-referenced. If someone else wants to try the same hypothesis, every dead-end is mapped.

Some open questions that the year didn't close:

  • Universe size. The work covered 5 and 46 stocks. Nifty 100, BSE 500 — untried.
  • Event-driven signals. Earnings-announcement drift in particular looked plausibly real but never ran at scale.
  • Tighter statistical bounds. Bootstrap-calibrated leakage thresholds (Learning #53) would make borderline results easier to call.
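As a rough illustration of what bootstrap-derived bounds look like, here is a generic percentile bootstrap for the annualised Sharpe ratio. It is not the procedure from Learning #53, which is not reproduced here. It resamples days independently, which ignores any autocorrelation in returns, and the data is synthetic.

```python
import math
import random


def sharpe(returns, periods_per_year=252):
    """Annualised Sharpe ratio using the sample standard deviation."""
    n = len(returns)
    mean = sum(returns) / n
    var = sum((r - mean) ** 2 for r in returns) / (n - 1)
    return mean / math.sqrt(var) * math.sqrt(periods_per_year)


def bootstrap_sharpe_interval(returns, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the annualised Sharpe (IID resampling of days)."""
    rng = random.Random(seed)
    n = len(returns)
    stats = sorted(sharpe(rng.choices(returns, k=n)) for _ in range(n_boot))
    lo = stats[int(alpha / 2 * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


# Synthetic example: three years of zero-mean daily returns.
rng = random.Random(1)
returns = [rng.gauss(0.0, 0.01) for _ in range(756)]
low, high = bootstrap_sharpe_interval(returns)
print(round(low, 2), round(high, 2))
```

A result is easier to call when its interval clearly excludes zero. On a three-year window the interval is wide, which is the practical reason borderline results are hard to judge.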

The forecaster might find any of those. Or not. The infrastructure to find out exists now.


Read on:

OpenBracket v0.6 — methodology release-ready; v1 forecaster in active build.