SUPREME PLAN — 90-Day Execution Roadmap (May 15 – Aug 15, 2026)
Date: 2026-05-15
Status: Living Document | Author: Grok (Senior Quant) + Subagent Fleet
Goal: Turn the platform into a focused, edge-compounding machine capable of supporting real-money allocation in at least one diversified asset class within 90 days.
Core Principle:
One production-grade, diversified asset class with trustworthy forward results beats seven half-baked ones. Cotton (CT=F) is promising but is a single symbol. We must diversify or explicitly cap risk.
Latest Update (2026-05-15, post deep_dive_cotton_2026-05-15.md + PRs #1060/#1061/#1065/#1068 + swarm)
- The Phase 2-D kill of CT=F was data-flawed: the panel cited 8.3% WR / n=12, but the actual PRE-kill 12 picks resolved to 66.7% WR / PF 3.50 / +32.30% in the resolver-v2 ledger.
- Verdict: HOLD_KILLED_PENDING_DATA (corrected from an earlier REVIVE_SHADOW draft after 3-engine swarm review). Raw n=41 collapses to effective independent n ≈ 3-4 because 39/41 picks share weekly COT signals. 100% SHORT in a downtrend = regime-survivorship risk. No revival until: effective-n ≥ 20 by signal-cluster, regime decomposition passes, friction-adjusted DSR, survivorship check. See reports/deep_dive_cotton_2026-05-15.md.
- The Phase 2-D panel's cited sample sizes do not reconcile with the resolver-v2 ledger at all (GC=F cited n=91 → ledger n=3; CT=F/KC=F cite identical "n=12 WR 8.3%"). Kill verdicts are not reproducible. See reports/phase2d_kill_audit_2026-05-15.md.
- Real root cause (swarm + peer consensus): kill-threshold mis-calibration — small-n rolling-window kills with no statistical guardrail; thresholds tuned on in-sample dead strategies → a self-reinforcing ratchet that destroys evidence faster than it accumulates.
- M-055 shipped (PR #1068): audit_trail/kill_gate.py, a statistical kill-gate (min-n + binomial p-value + Wilson 95% CI). Replayed against the Phase 2-D kills, it blocks 4 of 5 (CL=F/CT=F/KC=F on min-n, SI=F on non-significance). Opt-in sidecar; wire-in is the next step.
- Correct sequence: M-055 (done) → wire kill_gate into quality_gates → M-056 incubator → M-057 decouple score_booster from crypto-gating → M-058 auto-spawn last.
- Priority shift: EQUITY (VIX-regime 12-1 momentum on clean large-caps) + ETF (sector rotation + VIX gate) as near-term primary pilots. COMMODITY secondary (data hygiene + diversify beyond CT=F; cotton stays HOLD_KILLED). BOND scanner gap (2/14 symbols, TLT 75% concentration) is genuinely actionable.
- Meta-lesson: 5 senior-quant AIs converged on "cotton = real-money pilot" — all wrong, all reading the same stale inputs. Convergence ≠ verification.
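As a rough illustration of the three checks named for M-055 (min-n, binomial p-value, Wilson 95% CI), a gate of this shape could look like the sketch below. It is not the contents of audit_trail/kill_gate.py: the function names, the min_n=30 floor, the 50% null win rate, and alpha=0.05 are all assumptions chosen for the example.

```python
from dataclasses import dataclass
from math import comb, sqrt
from typing import Optional, Tuple


@dataclass
class GateResult:
    allow_kill: bool                 # True only if the kill is statistically supported
    reason: str
    p_value: Optional[float]         # None when the min-n check already blocked the kill
    wilson_ci: Tuple[float, float]   # 95% interval for the true win rate


def binom_lower_tail(wins: int, n: int, p0: float) -> float:
    """One-sided p-value: P(X <= wins) for X ~ Binomial(n, p0)."""
    return sum(comb(n, k) * p0**k * (1 - p0) ** (n - k) for k in range(wins + 1))


def wilson_ci(wins: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if n == 0:
        return (0.0, 1.0)
    phat = wins / n
    denom = 1 + z * z / n
    centre = (phat + z * z / (2 * n)) / denom
    half = z * sqrt(phat * (1 - phat) / n + z * z / (4 * n * n)) / denom
    return (max(0.0, centre - half), min(1.0, centre + half))


def kill_gate(wins: int, n: int, min_n: int = 30,
              p0: float = 0.5, alpha: float = 0.05) -> GateResult:
    """Block a kill unless the sample is large enough AND the win rate is
    significantly below the null win rate p0 (all thresholds are assumed)."""
    ci = wilson_ci(wins, n)
    if n < min_n:
        return GateResult(False, "min_n", None, ci)
    p = binom_lower_tail(wins, n, p0)
    if p >= alpha:
        return GateResult(False, "not_significant", p, ci)
    return GateResult(True, "significant_underperformance", p, ci)
```

Under these assumed thresholds, a sample like the cited n=12 at 8.3% WR (1 win in 12) is blocked on min-n regardless of how bad the win rate looks, which is the guardrail the small-n rolling-window kills lacked.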
1. Current State Snapshot (Verified, Not Trusted from Old Reports)
Using latest dashboard_data.json (generated 2026-05-15) + recent audits + DB forensics:
Top Performers (Live / Recent Closed):
- COMMODITY — Strongest risk-adjusted edge (PF often 2.0+ in recent windows). Driven heavily by cotton COT positioning. n is reasonable but universe is narrow.
- EQUITY — Best diversified candidate. Decent sample size, improving metrics in some regimes.
- ETF — Respectable win rates in pockets but low overall n. Not yet scalable.
- CRYPTO — Highest volume by far, but quality is poor. Many toxic systems and low-liquidity symbols drag the class.
- FOREX & BOND — Statistically weak or insufficient sample. Not ready for material capital.
Critical Systemic Gaps (The Real Blockers):
- Outcome tracking is still terrible (at_signal_outcomes coverage well below 5% in many periods; paper_trades nearly empty).
- DB freshness and ghost rows remain a problem (DB Freshness Guardian is partially built but not fully enforced).
- Symbol universe coverage per class is poorly documented and likely contains too many low-liquidity names (especially crypto/penny).
- No systematic pruning/inversion process for failing strategies.
- Execution realism (slippage, position sizing, concentration, drift breakers) is still mostly scaffolding.
2. 90-Day Strategic Priorities (Ranked)
Priority 1: Foundation (Weeks 1-3) — Do Not Skip
- Make DB Freshness Guardian production-grade with real GHA enforcement (fail on RED).
- Force reliable outcome tracking (at_signal_outcomes + paper_trades + paper_portfolio_daily).
- Run full symbol universe + liquidity audit per major asset class (free data only: yfinance, CoinGecko, FRED, CME delayed, etc.).
- Clean up test suite to <5 failures on main.
Priority 2: COMMODITY Pilot — Diversify or Explicitly Limit (Weeks 1-12)
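Primary bet in the original plan; per the 2026-05-15 update above, COMMODITY is now secondary to the EQUITY and ETF pilots and cotton stays HOLD_KILLED.
- Cotton (CT=F) COT edge is unverified pending the data fixes listed in the update, and one symbol = unacceptable concentration for serious money.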
- Must expand to 4–6 other liquid commodity futures with decent history (GC gold, SI silver, CL crude, NG natural gas, ZS/ZC grains, etc.).
- Deliver full COT lag-correction + MATCH gate + friction-adjusted DSR (fix the 0.08 vs 0.0008 bug).
- Wire real position sizing, slippage model, kill switches, and concentration limits (Phase J safety modules from the clean cherry-pick).
- Run a disciplined paper pilot with 0.5–1% portfolio risk per trade and a 30–60 day track record.
Go/No-Go at Day 60: If we cannot diversify beyond cotton with positive expectancy after costs, treat COMMODITY as a high-conviction single-strategy sleeve with hard limits rather than a full asset class.
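The 0.5 to 1% portfolio risk per trade in the paper pilot translates into a contract count once a stop distance is fixed. The following is a minimal fixed-fractional sizing sketch, not the Phase J safety modules: the function name, the stop-based risk definition, and the example point value are assumptions made for illustration.

```python
from math import floor


def contracts_for_risk(equity: float, risk_fraction: float,
                       entry: float, stop: float, point_value: float,
                       max_risk_fraction: float = 0.01) -> int:
    """Whole contracts such that a stop-out loses at most
    equity * risk_fraction (before slippage, which is not modelled here).

    point_value is the currency value of a 1.0 price move per contract
    (hypothetical input; look it up per instrument).
    """
    if not 0 < risk_fraction <= max_risk_fraction:
        raise ValueError("risk_fraction must be in (0, max_risk_fraction]")
    risk_per_contract = abs(entry - stop) * point_value
    if risk_per_contract <= 0:
        raise ValueError("entry and stop must differ and point_value must be > 0")
    # Round down so the realised risk never exceeds the budget.
    return floor(equity * risk_fraction / risk_per_contract)


# Example with made-up numbers: 100,000 equity, 0.5% risk budget (500),
# a 2.0-point stop and a point value of 50 give 100 risk per contract.
example_size = contracts_for_risk(100_000, 0.005, entry=100.0, stop=98.0,
                                  point_value=50.0)
```

A result of zero is a valid answer: it means the stop is too wide for the risk budget at this account size, and the trade should be skipped rather than rounded up.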
Priority 3: EQUITY — Build the Broad Book (Weeks 4-12, Parallel)
- Strongest candidate for a diversified, multi-strategy book.
- Focus on earnings surprise + post-earnings drift (PEAD) using free SEC EDGAR + Yahoo data.
- Add VIX + yield curve regime filters.
- Aggressive pruning of low-quality equity strategies (delete or invert).
Priority 4: ETF — Lower-Effort Parallel Track
- Sector rotation + macro regime models.
- Only scale if COMMODITY pilot shows real promise.
Priority 5: CRYPTO — Major Cleanup, Not Expansion
- Shrink universe to top 20–30 liquid coins only.
- Kill or quarantine the toxic systems dragging the class (many identified in previous audits).
- Do not allocate new capital until quality improves dramatically.
De-prioritize (Maintenance Mode)
- FOREX & BOND: Data collection and small research only. Not ready for production capital in this 90-day window.
3. Per-Asset-Class 90-Day Plans — Full Coverage Complete
All asset classes in the findtorontoevents.ca/audit dropdown now have dedicated, subagent-generated 90-day plans (using the money-maker-continual-improve skill). Full set:
- COMMODITY — Strongest surface edge but 73% concentration on CT=F + COT over-emission falsification (true n tiny). Must diversify or de-risk heavily.
- EQUITY — Best diversified candidate. Powerful unshipped VIX-regime 12-1 momentum (backtest PF 5.37). Production universe too narrow + speculative.
- ETF — High-ROI activate-now opportunity (sector rotation + VIX gate = Tier-1 backtests). Low effort, good diversification.
- CRYPTO — Highest volume, lowest quality. Requires brutal universe shrink + toxic system quarantine.
- FOREX — Sub-floor performance. Data improvements + carry/momentum research only for 90 days.
- BOND — n=11, PF 0.66, extreme TLT concentration. De-prioritize for 90 days (research/opt-in sidecar only).
- FUTURES — n=0 in the FUTURES tile (commodity futures routed to COMMODITY via classification bug). Merge financial futures into unified CTA or deprecate the separate class.
- Penny Stocks + Meme Coins — PF 0.19–0.50, WR 7–16%, massive drag. Full quarantine (0% allocation, research-only, no dedicated sleeve).
Additional deep-dive subagents are queued for FOREX, BOND, FUTURES, and Low-Quality Equities (Penny Stocks + Meme Coins) — the categories explicitly noted as weak or missing in the /audit asset class dropdown.
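The COMMODITY note about COT over-emission (true n tiny) comes down to counting independent signal clusters instead of raw picks. One crude way to approximate that, assuming each pick can be keyed by symbol and the ISO week of its signal, is sketched below. The keying rule, function names, and sample dates are hypothetical; the deep-dive report may cluster differently.

```python
from collections import Counter
from datetime import date
from typing import Iterable, Tuple

Pick = Tuple[str, date]  # (symbol, signal date); assumed minimal schema


def cluster_key(symbol: str, signal_date: date) -> Tuple[str, int, int]:
    """Picks on the same symbol in the same ISO week are treated as one
    cluster, on the assumption that they share one weekly COT signal."""
    iso = signal_date.isocalendar()
    return (symbol, iso[0], iso[1])  # (symbol, ISO year, ISO week)


def effective_n(picks: Iterable[Pick]) -> Tuple[int, Counter]:
    """Return (number of distinct clusters, picks per cluster)."""
    clusters = Counter(cluster_key(sym, d) for sym, d in picks)
    return len(clusters), clusters


def passes_effective_n(picks: Iterable[Pick], min_clusters: int = 20) -> bool:
    """Revival criterion from the plan: effective-n >= 20 by signal-cluster."""
    n_eff, _ = effective_n(picks)
    return n_eff >= min_clusters


# Made-up picks: five raw picks that fall into only two weekly clusters.
sample = [
    ("CT=F", date(2026, 3, 2)),
    ("CT=F", date(2026, 3, 3)),
    ("CT=F", date(2026, 3, 5)),
    ("CT=F", date(2026, 3, 9)),
    ("CT=F", date(2026, 3, 10)),
]
n_eff, per_cluster = effective_n(sample)
```

Counting clusters is a lower-resolution stand-in for a proper dependence adjustment, but it is enough to show when a raw n is mostly repeated emissions of one signal.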
Master Summary View (Current Assessment):
- COMMODITY: Best edge but cotton concentration is the #1 risk to solve. Diversify or cap.
- EQUITY: Best "broad" candidate for a multi-strategy book. Focus here for diversification.
- ETF: Marginal — good WR in spots, low n. Worth light parallel work.
- CRYPTO: Volume king, quality problem. Needs brutal universe reduction.
- FOREX / BOND: Not investment-grade yet. Data only.
4. Success Metrics (End of 90 Days — Measurable)
- At least one asset class (target: diversified COMMODITY or strong EQUITY) with:
- 60+ day trustworthy paper trading track record
- PF > 1.6 and Sortino > 1.8 after realistic costs
- Max drawdown < 12–15%
- Clear, documented edge with known failure modes
- Outcome tracking coverage > 40% for new signals
- Main CI green (< 5 failures)
- Symbol universe + liquidity audit complete and documented for top 4 classes
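The PF, Sortino, and drawdown targets above are easiest to hold to account when each is computed from the same series of net-of-cost daily returns. The sketch below shows one common set of definitions; Sortino conventions in particular vary, and the zero target return, the 252-day annualisation, and the choice of the stricter 12% drawdown bound are assumptions, not the platform's definitions.

```python
from math import inf, sqrt
from typing import Sequence


def profit_factor(returns: Sequence[float]) -> float:
    """Gross gains divided by gross losses."""
    gains = sum(r for r in returns if r > 0)
    losses = -sum(r for r in returns if r < 0)
    return inf if losses == 0 else gains / losses


def sortino(returns: Sequence[float], target: float = 0.0,
            periods_per_year: int = 252) -> float:
    """Annualised mean excess return over downside deviation."""
    if not returns:
        raise ValueError("returns must be non-empty")
    n = len(returns)
    mean_excess = sum(r - target for r in returns) / n
    downside = sqrt(sum(min(0.0, r - target) ** 2 for r in returns) / n)
    return inf if downside == 0 else mean_excess / downside * sqrt(periods_per_year)


def max_drawdown(returns: Sequence[float]) -> float:
    """Largest peak-to-trough loss of the compounded equity curve, as a fraction."""
    equity = peak = 1.0
    worst = 0.0
    for r in returns:
        equity *= 1 + r
        peak = max(peak, equity)
        worst = max(worst, 1 - equity / peak)
    return worst


def meets_targets(returns: Sequence[float], min_pf: float = 1.6,
                  min_sortino: float = 1.8, max_dd: float = 0.12) -> bool:
    """True when all three 90-day targets hold on the given return series."""
    return (profit_factor(returns) > min_pf
            and sortino(returns) > min_sortino
            and max_drawdown(returns) < max_dd)
```

Feeding the same function both gross and friction-adjusted returns makes the cost of execution realism visible as a change in pass/fail rather than a footnote.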
5. Immediate Next Actions (This Week)
- Finish and enforce DB Freshness Guardian.
- Fix outcome tracking pipeline (resolver + paper trading).
- Launch symbol universe audit scripts (free data sources).
- Clean cherry-pick of Phase J safety modules (d3995f5ac4d) for COMMODITY pilot.
- Begin COT lag + MATCH productionization (fix friction rate bug).
- Start subagent-generated per-class deep dive reports (already in flight).
This document is the single source of truth for the next 90 days. All other M-IDs in the old master plan are de-prioritized unless they directly support the above.
Detailed per-asset-class execution plans (with specific symbols, data sources, backtest frameworks, and pruning lists) will be linked here as the subagents complete their work.
Generated with the money-maker-continual-improve skill + live subagent fleet.
Source: reports/SUPREME_PLAN_90days.md · 90-Day Plan, May 15 2026 Edition · generated by tools/generate_90day_plan_pages.py