The /audit production book showed 0 profitable asset classes. Research-grade edge existed only in isolated labs and tournament paper books and never reached the live, policy-clean layer. This was not a "wait longer" problem. It was a research-to-production translation failure compounded by resolver/label contamination.
production_scanner.py ingests from 11 signal sources, most without walk-forward proof. The lab's own verification engine showed all strategies with Monte Carlo p ≈ 0.45 to 0.52; none was statistically significant at the 95% confidence level. Yet hundreds of strategies still emitted into production.
Three separate resolvers used 48h, 120h, and 7 days as the time-exit horizon for the same FOREX picks. As a result, a pick could be marked a WIN by one resolver and a LOSS by another, depending on which ran first.
A legacy 0.1 basis point WIN threshold classified spread noise as profit, so 63% of FOREX wins and 67% of COMMODITY wins were resolver flicker, not real edge.
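
The two resolver problems above (inconsistent horizons and a WIN threshold below spread noise) can be illustrated with a minimal sketch. It assumes a single shared horizon table and a minimum-move rule; `TIME_EXIT_HOURS`, `MIN_MOVE_BPS`, `time_exit_deadline`, and `classify_outcome` are hypothetical names, not the actual resolver API. Only the 72h FOREX horizon and the 2bps FOREX cost figure come from this report.

```python
from datetime import datetime, timedelta

# Single source of truth for time exits. Only the 72h FOREX value comes from
# the report; the dict layout is a hypothetical illustration.
TIME_EXIT_HOURS = {"FOREX": 72}

# Assumed minimum move (basis points) before a pick counts as WIN or LOSS.
# 2.0 mirrors the FOREX cost figure quoted for cost_model.py; the rule itself
# is an assumption, not the production threshold.
MIN_MOVE_BPS = {"FOREX": 2.0}


def time_exit_deadline(asset_class: str, opened_at: datetime) -> datetime:
    """Every resolver derives the exit deadline from the same table."""
    return opened_at + timedelta(hours=TIME_EXIT_HOURS[asset_class])


def classify_outcome(asset_class: str, entry: float, exit_price: float,
                     direction: str) -> str:
    """Label a closed pick; moves inside the noise band are FLAT."""
    sign = 1.0 if direction == "LONG" else -1.0
    move_bps = sign * (exit_price - entry) / entry * 10_000
    threshold = MIN_MOVE_BPS[asset_class]
    if move_bps > threshold:
        return "WIN"
    if move_bps < -threshold:
        return "LOSS"
    return "FLAT"
```

Under this sketch, a 0.18bps FOREX move that the legacy 0.1bps threshold would have counted as a WIN resolves as FLAT, and the label no longer depends on which resolver runs first.
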
paper_trading/strategies/ has 56 strategy files (~150 individual strategies). Only 6 had verified forward proof. The rest generated noise that diluted any real edge.
Single-symbol concentration (e.g., BNBUSDT) inflated strategy performance. A surface dominated by one source/system looked amazing but was fragile or fake.
| Fix | Files | Status |
|---|---|---|
| Unified FOREX TIME_EXIT: 48h/120h/7d → 72h across all 7 resolvers | force_close_breached.py, universal_pick_resolver.py, outcome_resolver.py, check_resolver_health.py, resolve_stale_open_picks.py, orphan_resolver_dryrun.py, prune_active_picks.py | Done |
| Source provenance tagging: _resolver_version + _resolver_source on every resolved pick | universal_pick_resolver.py | Done |
| Enabled crypto VWAP/Bollinger: changed env defaults from "0" to "1" | crypto_verified_wf.py | Done |
| Theme B contamination documented: root cause already patched (v2, 2026-04-28); historical re-resolution remains | outcome_resolver.py (analysis) | Documented |
| Module | What it does | Size |
|---|---|---|
| admissibility_pipeline.py | Unified 10-step standard replacing 6 fragmented validators. Every strategy must pass pre-registration, purged-embargoed walk-forward, DSR/PBO correction, block bootstrap, regime robustness, forward evidence, and stability checks before capital allocation. | ~480 lines |
| cost_model.py | Per-asset-class cost curves: CRYPTO 13bps, EQUITY 7bps, ETF 3bps, FOREX 2bps, COMMODITY 7bps, FUTURES 5.5bps, BOND 4.5bps. | ~100 lines |
| Block bootstrap | Integrated into admissibility_pipeline.py (Step 6). Preserves temporal dependence. | n/a |
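
As an illustration of why a block bootstrap preserves temporal dependence, here is a minimal moving-block sketch for a confidence interval on the mean return. It is not the code in admissibility_pipeline.py; the function name, block length, and resample count are assumptions chosen for the example.

```python
import random
from statistics import mean


def block_bootstrap_ci(returns, block_len=5, n_resamples=2000,
                       alpha=0.05, seed=0):
    """Percentile CI for the mean return using a moving block bootstrap.

    Resampling contiguous blocks (rather than single observations) keeps
    short-range autocorrelation intact inside each block.
    """
    n = len(returns)
    if n < block_len:
        raise ValueError("need at least block_len observations")
    rng = random.Random(seed)
    starts = range(n - block_len + 1)   # valid block start indices
    n_blocks = -(-n // block_len)       # ceil(n / block_len)
    means = []
    for _ in range(n_resamples):
        sample = []
        for _ in range(n_blocks):
            s = rng.choice(starts)
            sample.extend(returns[s:s + block_len])
        means.append(mean(sample[:n]))  # trim to the original length
    means.sort()
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi
```

A strategy whose interval for the mean per-trade return includes zero would not show evidence of positive edge under this check. The right block length depends on how quickly autocorrelation decays in the series, so the default of 5 is only a placeholder.
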
| Tool | What it measures | Alert threshold |
|---|---|---|
| concentration_monitor.py | Herfindahl-Hirschman Index for symbol and source concentration across all active pick sources | HHI > 0.25 (alert), HHI > 0.20 (warning) |
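
The Herfindahl-Hirschman Index is the sum of squared shares, so it can be sketched in a few lines. The thresholds below are the ones in the table; the function names and the pick representation (a flat list of symbol or source labels) are assumptions, not the interface of concentration_monitor.py.

```python
from collections import Counter


def hhi(labels):
    """Herfindahl-Hirschman Index: sum of squared shares, in (0, 1]."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum((c / total) ** 2 for c in counts.values())


def concentration_level(value, warn=0.20, alert=0.25):
    """Map an HHI value onto the thresholds from the table above."""
    if value > alert:
        return "ALERT"
    if value > warn:
        return "WARNING"
    return "OK"
```

A book whose picks all come from a single symbol such as BNBUSDT has an HHI of 1.0, while picks spread evenly across ten symbols give 0.1.
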
| Module | How it gates | Impact |
|---|---|---|
| emitter_discipline.py | Blocks picks from KILL/MONITOR_ONLY strategies BEFORE quality gates. | 25 strategies hard-killed, 8 monitor-only, 42 proven |
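
The gate described above can be sketched as a status lookup that runs before any quality scoring. This is a hypothetical illustration: the `StrategyStatus` enum, the registry layout, the strategy ids, and the choice to block unregistered strategies by default are all assumptions, not the behaviour of emitter_discipline.py.

```python
from enum import Enum


class StrategyStatus(Enum):
    PROVEN = "PROVEN"
    MONITOR_ONLY = "MONITOR_ONLY"
    KILL = "KILL"


def filter_emittable(picks, registry):
    """Split picks into (allowed, blocked) before any quality gate runs.

    Assumption: a strategy missing from the registry is treated as KILL,
    so nothing unreviewed can emit by accident.
    """
    allowed, blocked = [], []
    for pick in picks:
        status = registry.get(pick["strategy"], StrategyStatus.KILL)
        if status is StrategyStatus.PROVEN:
            allowed.append(pick)
        else:
            blocked.append(pick)
    return allowed, blocked
```

Blocked picks from MONITOR_ONLY strategies could still be logged for paper tracking; the sketch only decides what is allowed to proceed to the quality gates.
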
ETF dual momentum already wired and enabled. CRYPTO gatekeeper confidence-inversion gate at 70 already active. Crypto VWAP/Bollinger strategies now default-enabled.
| Asset Class | Best Pick / Strategy | Evidence | PF | WR | Sample | Money-Ready? |
|---|---|---|---|---|---|---|
| CRYPTO | deepseek_v4 SHORT (BTC/ETH) | AI tournament #1; EAGLE3 SHORT 67% WR vs LONG 33% (n=216) | 3.46 | 57.7% | 273 | PAPER ONLY |
| Why: deepseek_v4 is #1 ranked across 46 models with highest PF. The SHORT directional edge is backed by 216 tournament picks. EAGLE-4 flip is active in scanner. Production CRYPTO PF remains 0.97; this is PAPER edge, not real money. | | | | | | |
| EQUITY | BAC, JPM, MSFT, NVDA | EAGLE3 tournament rankings; individual ~64% WR in paper | N/A | ~64% | paper | NO |
| Why NOT: Production EQUITY has PF 0.33, WR 26.9% on n=52. These symbols show edge in the paper book but collapse in live production. WATCH, don't trade. | | | | | | |
| ETF | ETF Dual Momentum (EEM, IWM, GLD) | Only Tier-2 PASS: PF 1.60, WR 53.8%, n=104; WF OOS PF 1.21 | 1.60 | 53.8% | 104 | SHADOW PILOT |
| Why best candidate: ONLY strategy passing Tier-2 admissibility. Simple 12-1 month momentum with SPY-trend guard. Lowest concentration risk. Blocker: forward paper n<30. Shadow-sizing at 0.2% is the next step. | | | | | | |
| FOREX | HARD-DISABLED | Indistinguishable from random | 0.48 | 33.3% | 45 | NO |
| Why: 63% of wins were spread noise. After fixing TIME_EXIT + threshold, needs complete rebuild. Lift criteria: WR ≥ 55% on n ≥ 150, PF ≥ 1.5. | | | | | | |
| COMMODITY | COT rehabilitation needed | PF 0.69, WR 40.4%, n=712 but contaminated | 0.69 | 40.4% | 712 | NO |
| FUTURES | INSUFFICIENT | Concentration artifacts | N/A | 15.4% | 13 | NO |
| BOND | NO DATA | No live sample | N/A | N/A | 0 | NO |
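
The PF and WR columns, and the FOREX lift criteria (WR ≥ 55% on n ≥ 150, PF ≥ 1.5), can be made concrete with a short sketch. It assumes PF means gross profit divided by gross loss over closed picks, which is the usual definition but is not spelled out in this report; the function names are hypothetical.

```python
def profit_factor(pnls):
    """Gross profit divided by gross loss (assumed definition of PF)."""
    gross_profit = sum(p for p in pnls if p > 0)
    gross_loss = -sum(p for p in pnls if p < 0)
    if gross_loss == 0:
        return float("inf") if gross_profit > 0 else 0.0
    return gross_profit / gross_loss


def win_rate(pnls):
    """Share of closed picks with positive PnL."""
    return sum(1 for p in pnls if p > 0) / len(pnls) if pnls else 0.0


def meets_lift_criteria(pnls, min_wr=0.55, min_n=150, min_pf=1.5):
    """Check the FOREX lift criteria quoted in the table above."""
    return (len(pnls) >= min_n
            and win_rate(pnls) >= min_wr
            and profit_factor(pnls) >= min_pf)
```

For example, 90 wins of +2 and 60 losses of -1 give n=150, WR 60%, and PF 3.0, which passes. A 45-pick sample fails on sample size alone, whatever its PF.
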
| Statistical Test | Implemented? | Enforced at emission? |
|---|---|---|
| Bonferroni Correction | Yes, 10+ implementations | Yes, via admissibility pipeline |
| Benjamini-Hochberg FDR | Yes | Yes, via admissibility pipeline |
| DSR / PBO / SPA | Yes, now in pipeline | Yes, Step 5 of 10 |
| Monte Carlo (permutation) | Yes, 20+ implementations | Yes, via admissibility pipeline |
| Block Bootstrap CI | Yes | Yes, Step 6 of 10 |
| Walk-Forward Validation | Yes | Yes, Step 3 of 10 |
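
To show what enforcing the first two corrections at emission might look like, here is a minimal sketch of Bonferroni and Benjamini-Hochberg applied to a list of per-strategy p-values. The function names are illustrative and the code is not taken from the pipeline.

```python
def bonferroni(pvals, alpha=0.05):
    """Reject H0 only where p <= alpha / m (controls family-wise error)."""
    m = len(pvals)
    return [p <= alpha / m for p in pvals]


def benjamini_hochberg(pvals, alpha=0.05):
    """Benjamini-Hochberg step-up procedure (controls false discovery rate)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    # Largest rank k such that p_(k) <= (k / m) * alpha.
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank / m * alpha:
            k_max = rank
    reject = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= k_max:
            reject[i] = True
    return reject
```

With p-values in the 0.45 to 0.52 range reported by the lab's verification engine, neither procedure rejects anything, so a gate built on either would have emitted no strategies.
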
The LiteLLM proxy at http://localhost:4000/v1 is operational with 16 models. Three new modes tested and verified:
| Mode | Status | Best use |
|---|---|---|
| ollama-cloud-large | Working | Deep strategy research, backtest methodology design |
| ollama-cloud | Working | Brainstorming, quick analysis, swarm synthesis |
| ollama-cloud-local | Working | Safety checks, conservative analysis |
We fixed the plumbing. The FOREX resolver was broken at the pipe joints. The emitter system was a firehose with no pressure valve. There were six different validation blueprints for the same house. The statistical arsenal was fully stocked, but nobody was required to use it.
Where the real edge lives: AI tournament (deepseek_v4 PF 3.46), ETF dual momentum lab (PF 1.60), crypto VWAP/Bollinger walk-forward. None are money-ready yet, but they now have a clear, enforced path to get there.
What's still needed: Forward paper evidence. The admissibility pipeline gates the door. Shadow-sizing puts strategies through the flight simulator. Only then do they graduate to real capital.