Backtest Lab

Backtest Strategy Lab

Loading finalized completed markets, the strategy sweep, and Monte Carlo confidence checks.

Scope: all-finalized
Markets: --
Samples: --
Strategies: --
Mode: Balanced deployment
The page can rank by total PnL, win rate, or a blended recommendation score.
Balanced Recommendation
-- Loading recommendation.
Highest Win Rate
-- Loading high win-rate strategy.
Monte Carlo Confidence
-- Bootstrap confidence will appear here after the run loads.
Total Return
--
Cumulative result shown as percent first, with dollar PnL underneath for the fixed $1 stake model.
Win Rate
--
Percentage of profitable trades in the selected strategy.
Avg Return / Trade
--
Expected value per paper trade after fill confirmation, shown in dollars and percent of each $1 stake.
Entry Fill Rate
--
Signals that actually became entries after the market moved 1 cent through the buy limit. This is not the take-profit hit rate.

Portfolio Deployment

Loading the shared-bankroll deployment study.

$100.00
Starting balance for the shared paper portfolio.
--
Ending balance for the 1% per-strategy base case using the top 10 active strategies.
--
Combined PnL% from the shared bankroll in the 1% base case.
--
Best uniform per-strategy allocation discovered in the 1%-10% sweep.
Alloc / Strategy Ending Balance Total PnL% Max DD Monte Carlo +

🧬 Genetic Evolution Lab

Loading genetic evolution results.

--
Generations evolved
--
Total evaluations
--
New compound strategies discovered
--
Peak fitness score

📊 Portfolio Correlation Analysis

Loading correlation analysis.

--
Diversity Score (1.0 = perfectly uncorrelated)
--
Avg pairwise correlation
--
Max pairwise correlation
--
Long / Short balance

Scientific Readout

Loading methodology.

Balanced pick
Loading.
Highest win-rate pick
Loading.

Bucket Leaders

These cards compare the strongest strategies inside each 5-cent window so we can quickly spot where the cleanest pockets live.

Strategy Ranking

The table below can rank by total PnL, highest win rate, or the balanced recommendation score. Click a row to inspect the actual trade path and equity curve.

Strategy Family Trades Total Return Win Rate Entry Fill Bootstrap +

Equity Curve

Loading equity curve.

Family Summary

This compresses the wider search space so we can see which idea families deserve more data collection versus pruning.

Family Best Total PnL Avg Win

Selected Trades

A quick inspection panel for how the selected strategy actually behaved market by market.

Execution assumption: each trade uses a $1 stake, entries and exits are limit orders, and a trade only counts as filled if the market moves at least 1 cent through the quoted limit after the signal.
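The through-the-limit rule above can be sketched in a few lines. This is a minimal illustration rather than the lab's actual simulator: `isBuyFilled`, `fillRate`, and the cent-denominated price arrays are hypothetical names and shapes chosen for the example, and only the buy side is shown.

```javascript
// Hypothetical sketch of the through-the-limit fill rule (buy side only).
// Prices are integer cents to avoid floating-point comparisons.

// A buy limit counts as filled only if some price seen after the signal
// trades at least `throughCents` below the limit.
function isBuyFilled(limitCents, pricesAfterSignalCents, throughCents = 1) {
  return pricesAfterSignalCents.some((p) => p <= limitCents - throughCents);
}

// Entry fill rate = filled signals / all signals.
function fillRate(signals) {
  if (signals.length === 0) return 0;
  const filled = signals.filter((s) =>
    isBuyFilled(s.limitCents, s.pricesAfterSignalCents)
  ).length;
  return filled / signals.length;
}

const signals = [
  { limitCents: 90, pricesAfterSignalCents: [91, 90, 89] }, // trades through: filled
  { limitCents: 90, pricesAfterSignalCents: [91, 90, 90] }, // only touches: not filled
];
console.log(fillRate(signals)); // 0.5
```

Requiring the price to trade through the limit, rather than merely touch it, is the conservative choice: a touch does not guarantee that a resting order at that price would have been reached in the queue.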
Research Whitepaper · v2 · 2026-04-20

Accuracy-First Portfolio Design for Polymarket BTC 5-Minute Binary Markets

A full sweep of 56,613 strategy variants across 470 finalized markets from Apr 18–20, 2026. This document records what worked, what failed, the data limitations encountered, and the exact research questions that need to be answered next.

Abstract

We evaluated six strategy families (thresholdTakeProfit, thresholdExpiryHold, lateMomentumHold, lateMomentumTakeProfit, atrFavoriteTakeProfit, atrExpiry) across 56,613 parameterised variants using a through-the-limit fill simulator on 470 finalized BTC 5-minute Polymarket markets. A genetic search added 100 evolved compound strategies. Monte Carlo bootstrap (500 iterations) was applied to all shortlisted candidates. Because the live-collected dataset spans only 2.5 days, the system operated in accuracy-first mode: 100% win rate was the primary deployment criterion, with total expected value as a tiebreaker. The result is a 16-strategy paper portfolio (8 long / 8 short) assembled from 8 unique signal archetypes, each confirmed at 100% win rate with bootstrap positiveRunRate = 1.00.
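The accuracy-first selection rule described in the abstract can be sketched as a filter followed by a sort. The field names (`winRate`, `positiveRunRate`, `totalPnl`) and the three toy candidates are assumptions made for illustration; they are not taken from the lab's data or schema.

```javascript
// Hypothetical sketch of accuracy-first shortlisting.
// Keep only candidates with a 100% win rate and a bootstrap
// positiveRunRate of 1.00, then rank by total PnL as the tiebreaker.
function selectAccuracyFirst(candidates) {
  return candidates
    .filter((c) => c.winRate === 1 && c.positiveRunRate === 1)
    .sort((a, b) => b.totalPnl - a.totalPnl);
}

// Toy values for illustration only.
const shortlist = selectAccuracyFirst([
  { name: "A", winRate: 1, positiveRunRate: 1, totalPnl: 2.1 },
  { name: "B", winRate: 0.986, positiveRunRate: 1, totalPnl: 9.4 },
  { name: "C", winRate: 1, positiveRunRate: 1, totalPnl: 3.7 },
]);
console.log(shortlist.map((c) => c.name)); // [ 'C', 'A' ]
```

Candidate B has the highest PnL but is dropped because it has a losing trade, which mirrors how the 98.6% early-entry strategy is excluded in this mode.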

✓ What Worked

Terminal momentum + oracle confirmation (lateMomentumHold). In the final 10 seconds, when the market prices one outcome at 80–95 cents AND the Chainlink oracle confirms that BTC is displaced from strike by ≥0.025% of strike, the outcome resolves correctly 100% of the time (23 trades, Sharpe 1.83; 19 trades at the 90–95c premium zone, Sharpe 4.92). These represent the clearest edge in the dataset: market probability and oracle direction both point the same way with almost no time left for reversal.
ATR-normalised displacement targeting (atrFavoriteTakeProfit). When the oracle is displaced from strike by ≥1.0× the current 5-minute ATR while the market is pricing the favoured outcome at 65–99 cents (45–180 seconds remaining), a +3 cent take-profit fires at 100% win rate across 85 trades (Sharpe 0.90). The ATR normalisation is critical: a $50 oracle displacement in a $20-ATR environment does not carry the same certainty as a $50 displacement in a $150-ATR environment. Requiring the displacement to exceed the ATR filters out noisy entries where the market might still reverse.
Bootstrap robustness. Every 100% win-rate strategy passed 500-iteration Monte Carlo resampling with positiveRunRate = 1.00 and p05 total PnL well above zero. The win streak is therefore not a lucky ordering of the same trades: total PnL stayed positive in every random 80% subsample drawn.
Zero drawdown. The 100% win-rate strategies recorded maxDrawdown = 0.00 across the full dataset. In a shared-bankroll portfolio at 1% per strategy, this translates to monotonically increasing equity curves with no losing streaks, which is exactly what is needed for compounding during the data-collection phase.
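The bootstrap check can be sketched as follows. This is a minimal illustration under stated assumptions: the document does not say whether the lab resamples with or without replacement, so this version draws with replacement at 80% of the trade count, and `bootstrap`, `mulberry32`, and the option names are hypothetical. A small seeded generator is used so that runs are repeatable.

```javascript
// Small seeded PRNG (mulberry32) so results are reproducible.
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Hypothetical sketch: resample per-trade PnLs and summarise the totals.
// Assumption: sampling WITH replacement, sample size = fraction * n.
function bootstrap(tradePnls, { iterations = 500, fraction = 0.8, seed = 1 } = {}) {
  if (tradePnls.length === 0) throw new Error("need at least one trade");
  const rand = mulberry32(seed);
  const sampleSize = Math.max(1, Math.round(tradePnls.length * fraction));
  const totals = [];
  for (let i = 0; i < iterations; i++) {
    let total = 0;
    for (let j = 0; j < sampleSize; j++) {
      total += tradePnls[Math.floor(rand() * tradePnls.length)];
    }
    totals.push(total);
  }
  totals.sort((x, y) => x - y);
  return {
    // Share of resampled runs whose total PnL is strictly positive.
    positiveRunRate: totals.filter((t) => t > 0).length / iterations,
    // 5th-percentile total PnL across runs.
    p05: totals[Math.floor(0.05 * (iterations - 1))],
  };
}

// Toy per-trade PnLs in dollars, for illustration only.
const summary = bootstrap([0.05, 0.03, 0.08, 0.02], { seed: 7 });
console.log(summary.positiveRunRate); // 1
```

When the input contains no losing trades, every resample is positive by construction, so positiveRunRate and p05 become more informative once a strategy has at least some losses.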

✗ What Failed or Was Excluded

High-PnL lottery strategies are not deployable yet. The strategy with the highest absolute PnL was up price 5–10c, TTE <120s, TP +30c, with $156.78 across 215 trades, but its win rate is only 35.8%. It works because the large TP (+30c) on cheap tickets creates a 2.3× profit factor. This is a valid long-term strategy but requires a much larger sample (thousands of markets) to confirm that the edge holds across different volatility regimes. Deploying it now would create deep drawdowns and obscure whether the 100%-WR portfolio is working.
Near-duplicate strategy variants inflate the option count. The lateMomentumHold family produced many similar-looking results: price 80–95c, 82–95c, and 85–95c each have 100% WR, but the 80–82c band contributed only 2 extra trades out of 23 total. These are not independent strategies. For the portfolio, we enforced a rule: any two selected strategies must differ in at least 2 independent conditions, not just a ±2 cent price range shift.
thresholdTakeProfit early-entry (TTE 180–300s) missed the 100% threshold. Buying the heavy favourite at the start of the market (TTE 180–300s, price 85–95c, TP +2c) achieved a 98.6% win rate across 69 trades: excellent, but not 100%. In accuracy-first mode this is excluded. With 30 days of data, this would very likely clear the statistical bar for inclusion in the growth portfolio.
Genetic algorithm produced no verified exotic edges. The genetic search ran 40 generations across 12,000 evaluations and evolved 100 compound strategies. The best result was a 6-trade, 100%-WR fade strategy (buying 15–30c underdogs for +8c profit). Six trades is too small a sample to trust: the 95th percentile bootstrap PnL is only $3.29. The genetic approach is promising but needs 30+ days of data to produce statistically significant compound edges.
Orderbook pressure strategies have low fill rates. OB-score-based entry achieved a fill rate of only 59.5% because the market often moves away after the signal fires, before a fill is confirmed. These strategies generate signals correctly but require tighter spread management or a different fill assumption. Not deployable in paper form yet.
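The near-duplicate rule (any two selected strategies must differ in at least 2 conditions) can be sketched as a greedy filter. The condition keys and the example configurations below are assumed names, not the lab's schema, and a plain field comparison is only a rough stand-in for "independent" conditions.

```javascript
// Hypothetical sketch of the "differ in at least 2 conditions" rule.
const CONDITION_KEYS = ["family", "tteWindow", "priceZone", "filter", "exit"];

function countDifferences(a, b) {
  return CONDITION_KEYS.filter((k) => a[k] !== b[k]).length;
}

// Keep a candidate only if it differs from every already-kept strategy
// in at least `minDiff` conditions. Candidates are assumed to arrive
// sorted best-first, so the strongest variant of each cluster survives.
function dropNearDuplicates(candidates, minDiff = 2) {
  const kept = [];
  for (const c of candidates) {
    if (kept.every((k) => countDifferences(k, c) >= minDiff)) kept.push(c);
  }
  return kept;
}

const base = { family: "lateMomentumHold", tteWindow: "0-10s", filter: "0.025%", exit: "expire" };
const variants = [
  { ...base, priceZone: "80-95c" },
  { ...base, priceZone: "82-95c" }, // differs only in price zone: dropped
  { ...base, priceZone: "85-95c" }, // differs only in price zone: dropped
];
console.log(dropNearDuplicates(variants).length); // 1
```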

⚠ Data Limitations

Short sample: 2.5 days, 470 markets. This is enough to identify the very strongest signals (like terminal momentum at 90–95c) but not enough to separate moderate strategies that differ only in small parameters. A minimum of 7 days (~1,344 markets, i.e. about 192 per day, close to the observed collection rate) is needed to confirm the ±5c price band edges. Thirty days (~8,640 markets, i.e. the full 288 five-minute windows per day) would unlock exotic genetic combinations with statistical confidence.
Share price data is only available from live collection. Polymarket's CLOB price-history API does not retain data older than a few hours. Historical back-downloads using the Chainlink oracle can reconstruct oracle-based signals perfectly, but share prices require real-time collection. Strategies that rely on share price as a signal are limited to the Apr 18–20 live data only.
~50% of Polymarket BTC 5m markets have zero CLOB trading activity. The system auto-creates markets continuously but many are never traded. The CLOB returns {history:[]} (HTTP 200) for these. They still have valid Chainlink oracle data and are usable for pure oracle-signal strategies, but strategies using share price as an input are undefined for these markets. The percentage of no-trade markets varies by time of day and will be studied in historical download analysis.
Only BTC UP/DOWN tested. The same strategy families should apply to ETH and SOL markets. Multi-asset backtesting is a future work item once the data collection system covers all three assets.
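Because an untraded market returns an empty history with a successful status, the distinction has to be made on the response body rather than on the HTTP code. A minimal sketch, assuming the body has already been parsed from JSON; `classifyMarket` and its return shape are hypothetical, and the shape of individual history entries is illustrative only.

```javascript
// Hypothetical helper: decide which strategy inputs a market can support,
// given the parsed body of a CLOB price-history response.
function classifyMarket(clobBody) {
  const history =
    clobBody && Array.isArray(clobBody.history) ? clobBody.history : [];
  const hasSharePrices = history.length > 0;
  return {
    hasSharePrices,
    // Assumption: oracle data is fetched separately and exists for every market.
    usableFor: hasSharePrices ? ["oracle", "sharePrice"] : ["oracle"],
  };
}

console.log(classifyMarket({ history: [] }).usableFor); // [ 'oracle' ]
```

Routing untraded markets to oracle-only strategies keeps them in the dataset instead of discarding them, which matters when roughly half of the markets fall into this bucket.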

💡 Key Scientific Learnings from v2

1 · Market phase is the primary signal dimension. The 5-minute window has three distinct regimes: early (TTE 60–300s, price discovery phase), mid (TTE 45–180s, displacement window), and terminal (TTE 0–60s, settlement certainty zone). Strategies that respect these phase boundaries outperform strategies that try to trade across the full 300-second window.
2 · ATR normalisation of oracle delta is essential. Requiring |delta| ≥ N × ATR (rather than |delta| ≥ absolute $X) makes strategies robust to different volatility regimes. On a calm day (ATR $30), a $30 displacement is meaningful. On a volatile day (ATR $120), it's noise. The ATR-normalised strategies maintained 100% win rate across the full 2.5-day period, which included different volatility levels.
3 · Small TP beats large TP in the certainty zone. For high-probability terminal positions (90–95c price, <10s TTE), no take-profit is needed: the position expires profitably 100% of the time. For mid-market positions (65–99c, 45–180s), a +3c TP captures the edge without waiting for settlement. Larger TPs on certain entries just introduce timing risk without improving the win rate.
4 · Direction-agnostic ("auto") beats forced direction in this data. Strategies with sideMode:"auto" (bet with the Chainlink delta direction) consistently outperformed the same strategy forced to only trade UP or DOWN. The BTC market is not always bullish or bearish; it alternates, and the oracle direction is the cleanest real-time signal of which side is "right" in that specific 5-minute window. The deployed 8L/8S portfolio is built from these auto archetypes, split into forced-direction variants purely for portfolio symmetry.
5 · Fill rate matters as much as win rate. A strategy with 100% win rate and 12% fill rate (like atrFavoriteTakeProfit with strict parameters) will generate fewer live trades than one with 100% win rate and 42% fill rate (like lateMomentumHold). The portfolio needs a mix of high-fill and low-fill strategies to ensure activity across different market conditions.
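Learnings 2 and 4 combine naturally into a single entry check: require the oracle displacement to exceed a multiple of ATR, then take the side the displacement points to. The sketch below is an illustration only; `atrSignal` and its parameter names are hypothetical, and the strike and oracle prices are toy values chosen to reproduce the $50 displacement example from the text.

```javascript
// Hypothetical sketch: ATR-normalised displacement filter with auto side.
// All prices are in dollars.
function atrSignal({ oraclePrice, strike, atr, minAtrMultiple }) {
  if (!(atr > 0)) return null; // no usable volatility estimate
  const delta = oraclePrice - strike;
  if (Math.abs(delta) < minAtrMultiple * atr) return null; // within noise
  // "auto" side mode: bet with the direction of the oracle displacement.
  return {
    side: delta > 0 ? "UP" : "DOWN",
    atrMultiple: Math.abs(delta) / atr,
  };
}

// The same $50 displacement under two volatility regimes.
const calm = atrSignal({ oraclePrice: 100050, strike: 100000, atr: 20, minAtrMultiple: 1.0 });
const wild = atrSignal({ oraclePrice: 100050, strike: 100000, atr: 150, minAtrMultiple: 1.0 });
console.log(calm); // { side: 'UP', atrMultiple: 2.5 }
console.log(wild); // null
```

An absolute dollar threshold would treat both cases identically; dividing by ATR is what lets one parameter set carry across calm and volatile sessions.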

🚀 Deployed Portfolio v2: 16 Strategies (8L / 8S)

#   Side   Family                  TTE Window  Price Zone   Filter                  Exit    Trades (auto)  Sharpe
L1  LONG   lateMomentumHold        0–10s       UP 90–95c    |Δ| ≥ 0.025% of strike  Expire  19             4.92
L2  LONG   lateMomentumHold        0–10s       UP 80–90c    |Δ| ≥ 0.025% of strike  Expire  12             3.81
L3  LONG   lateMomentumTakeProfit  0–20s       UP 80–90c    |Δ| ≥ 0.05% of strike   +5c TP  11             1.63
L4  LONG   atrFavoriteTakeProfit   45–180s     UP 65–99c    |Δ| ≥ 1.0× ATR          +3c TP  85             0.90
L5  LONG   atrFavoriteTakeProfit   45–180s     UP 75–99c    |Δ| ≥ 1.25× ATR         +2c TP  43             0.78
L6  LONG   atrFavoriteTakeProfit   60–240s     UP 65–99c    |Δ| ≥ 1.5× ATR          +3c TP  32             2.87
L7  LONG   atrExpiry               0–60s       UP 45–99c    |Δ| ≥ 1.25× ATR         Expire  20             1.00
L8  LONG   atrExpiry               0–45s       UP 45–99c    |Δ| ≥ 1.5× ATR          Expire  13             1.89
S1  SHORT  lateMomentumHold        0–10s       DOWN 90–95c  |Δ| ≥ 0.025% of strike  Expire  19             4.92
S2  SHORT  lateMomentumHold        0–10s       DOWN 80–90c  |Δ| ≥ 0.025% of strike  Expire  12             3.81
S3  SHORT  lateMomentumTakeProfit  0–20s       DOWN 80–90c  |Δ| ≥ 0.05% of strike   +5c TP  11             1.63
S4  SHORT  atrFavoriteTakeProfit   45–180s     DOWN 65–99c  |Δ| ≥ 1.0× ATR          +3c TP  85             0.90
S5  SHORT  atrFavoriteTakeProfit   45–180s     DOWN 75–99c  |Δ| ≥ 1.25× ATR         +2c TP  43             0.78
S6  SHORT  atrFavoriteTakeProfit   60–240s     DOWN 65–99c  |Δ| ≥ 1.5× ATR          +3c TP  32             2.87
S7  SHORT  atrExpiry               0–60s       DOWN 45–99c  |Δ| ≥ 1.25× ATR         Expire  20             1.00
S8  SHORT  atrExpiry               0–45s       DOWN 45–99c  |Δ| ≥ 1.5× ATR          Expire  13             1.89
Note on displayed stats: Trades and Sharpe figures above are from the direction-agnostic (auto) version of each archetype. Each directional variant (UP or DOWN) will fire on approximately half of those markets: the half where the target side is confirmed as the oracle-favoured direction. Win rate remains 100% because the price-zone condition (e.g. DOWN at 90c) already implies the oracle agrees. Allocation: 1% of total balance per strategy per trade.
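The 1% allocation rule can be sketched as a simple shared-bankroll loop. This is an illustration under simplifying assumptions: trades are processed one at a time in order (overlapping positions are ignored), each entry in `returnsPerDollar` is a trade's PnL under the $1 stake model, and `simulateBankroll` is a hypothetical name.

```javascript
// Hypothetical sketch: shared bankroll with a fixed fraction staked per trade.
function simulateBankroll(startBalance, returnsPerDollar, allocation = 0.01) {
  let balance = startBalance;
  let peak = startBalance;
  let maxDrawdown = 0; // largest peak-to-trough drop, as a fraction of the peak
  for (const r of returnsPerDollar) {
    const stake = balance * allocation; // 1% of the current balance by default
    balance += stake * r; // scale the $1-stake PnL to the actual stake
    peak = Math.max(peak, balance);
    maxDrawdown = Math.max(maxDrawdown, (peak - balance) / peak);
  }
  return { endingBalance: balance, maxDrawdown };
}

// Toy returns for illustration only: two winners, then a full loss of stake.
const run = simulateBankroll(100, [0.05, 0.05, -1]);
console.log(run.endingBalance.toFixed(2), run.maxDrawdown.toFixed(4));
```

Sizing from the current balance rather than the starting balance is what makes the result compound, and it also caps any single loss at the allocation fraction of the bankroll.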

🔭 What to Try Next (v3 Research Agenda)

Priority 1: Accumulate 30 days of live data. The current 2.5-day sample supports ~8 unique archetypes with certainty. At 30 days (~8,640 markets), every subtle parameter difference (80–90c vs 80–95c, ATR×1.0 vs ×1.25) becomes statistically distinguishable. Target: run node scripts/download-historical.js --days 30 to backfill Chainlink-oracle data, then combine with live-collected share-price data for a hybrid dataset.
Priority 2: Share-price divergence strategies. The live data has second-by-second share prices that the historical data lacks. With 7+ days of live data, test whether share price divergence from oracle expectation (share price says 70c, oracle delta says 90c probability) predicts short-term corrections. This is the "arbitrage between market perception and reality" idea.
Priority 3: Expand thresholdTakeProfit early-entry to deployment. The 98.6% WR early-entry strategy needs ~14 days of data to reach 100% WR confirmation at a 95% confidence level. At that point it adds another family to the portfolio, covering the early-market phase that is currently missing. Target condition: 100+ trades with 100% bootstrap positiveRunRate.
Priority 4: Re-run genetic search on 30-day dataset. The genetic algorithm found a 6-trade fade edge that looks promising but can't be validated. With 30 days, the same search should produce 50–200 trade genetic results, making compound strategies testable. Specifically look for: combinations of time-of-day, ATR regime, and share-price-vs-oracle divergence as compound entry conditions.
Priority 5: Time-of-day and session analysis. Historical data shows that ~50% of BTC 5m markets have zero CLOB trading activity. Testing which UTC hours have active markets (and which don't) may allow the paper portfolio to size up during peak-activity hours and reduce exposure during zero-liquidity windows.
Priority 6: High-PnL lottery strategies (deferred to v4). The highest raw PnL strategy (up price 5–10c, TP +30c, $156 PnL, 36% WR) should be validated as a portfolio component only after 7+ days of data confirm the profit factor is stable. At that point, allocating 0.5% per trade to this style provides asymmetric upside with bounded risk, the classic Kelly-fraction approach to low-win-rate/high-payout strategies.
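To put the Kelly reference in Priority 6 in concrete terms, the reported win rate (35.8%) and profit factor (2.3×) can be turned into a payoff ratio and a full-Kelly fraction. This is a rough sketch under strong assumptions that the document does not confirm: every win is the same size, every loss is the same size and equals the amount risked, and the sample win rate is the true win rate. The function names are hypothetical.

```javascript
// Hypothetical sketch: Kelly fraction from win rate and profit factor.
// profitFactor = (p * avgWin) / ((1 - p) * avgLoss), so the payoff ratio
// b = avgWin / avgLoss = profitFactor * (1 - p) / p.
function payoffRatio(winRate, profitFactor) {
  return (profitFactor * (1 - winRate)) / winRate;
}

// Classic Kelly for a binary bet: f* = p - (1 - p) / b, floored at 0.
function kellyFraction(winRate, b) {
  return Math.max(0, winRate - (1 - winRate) / b);
}

const p = 0.358; // win rate reported for the lottery strategy
const b = payoffRatio(p, 2.3); // profit factor reported in the text
const fullKelly = kellyFraction(p, b);
console.log(b.toFixed(2), fullKelly.toFixed(3)); // 4.12 0.202
```

The full-Kelly figure is highly sensitive to the estimated win rate, and a 215-trade sample leaves that estimate uncertain, which is consistent with proposing a small fixed allocation such as 0.5% per trade rather than anything close to the full fraction.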