Ten angles surfaced for durable cross-class edge. 4 already built or partially built, 6 net-new. Highest-value net-new pair: #1 rolling-window profiling × #2 edge-decay heatmap (shared compute kernel, single dashboard widget, no production-strategy risk).
System already has: rolling-30d metrics, walk-forward by_class, hf_decay_watchlist, regime annotations (VIX/BTC.D/DXY), concept_drift KS statistics, TA baseline benchmark grid, walk-forward Tier-1 promotion gate, benchmark-relative 30d excess return per system.
What's missing: rolling-window panels at multiple horizons, edge-decay heatmap, top-N portfolio slice simulator, meta-learning gate-pass predictor, formal peer-review rubric.
#1 Rolling-window profiling (PARTIAL, ~1d work)
hf_stats.rolling_metrics already emits a 14-row history, but only at one window (window_days=30). Most recent: 2026-04-22 net_sharpe 0.1605 / WR 42.98% / n=3134 / max_drawdown_pct 721.91 / ulcer_index 362.7. Earlier 2026-04-09: net_sharpe 0.3695 / WR 48.05% / n=768 / max_drawdown_pct 105.17. In 13 days the Sharpe more than halved and the ulcer index rose roughly 6×. The drop is visible only because the 30d rolling series exists; at every other horizon it is invisible because no series is emitted.
- Define ROLLING_WINDOWS_DAYS = [7, 30, 90, 365, 1095] in audit_trail/dashboard_generator.py.
- Add a _compute_rolling_metrics(closed, window_days) helper; emit hf_stats.rolling_by_window[window_days] as a list of timestamp-keyed rows.
- Partition closed by asset_class and run the same kernel n_classes × n_windows times. Skip windows where n_trades < 30 (statistical floor).
- Risk: LOW. Read-only on closed picks; no production-gate change. CPU adds <3s per dashboard build (estimated by extrapolation from the current 14-row build).
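The multi-horizon kernel described above might look roughly like the sketch below. It is illustrative only: the pick schema (`timestamp`, `pnl_pct`, `asset_class`), the `as_of` argument, and the per-trade (non-annualized) Sharpe are assumptions, not the actual dashboard_generator.py code.

```python
from datetime import datetime, timedelta
from statistics import mean, stdev

ROLLING_WINDOWS_DAYS = [7, 30, 90, 365, 1095]
MIN_TRADES = 30  # statistical floor named in the plan


def compute_rolling_metrics(closed, window_days, as_of):
    """Metrics for picks closed in (as_of - window_days, as_of], or None below the floor.

    Assumed schema: each pick is a dict with `timestamp` (datetime),
    `pnl_pct` (float) and `asset_class` (str).
    """
    start = as_of - timedelta(days=window_days)
    pnl = [p["pnl_pct"] for p in closed if start < p["timestamp"] <= as_of]
    if len(pnl) < MIN_TRADES:
        return None
    sd = stdev(pnl)
    return {
        "n": len(pnl),
        "win_rate_pct": 100.0 * sum(1 for x in pnl if x > 0) / len(pnl),
        # Assumption: per-trade Sharpe (mean / stdev), not annualized.
        "sharpe": mean(pnl) / sd if sd > 0 else 0.0,
    }


def rolling_by_window_and_class(closed, as_of):
    """Run the same kernel n_classes x n_windows times, skipping thin windows."""
    out = {}
    for cls in sorted({p["asset_class"] for p in closed}):
        subset = [p for p in closed if p["asset_class"] == cls]
        out[cls] = {}
        for window in ROLLING_WINDOWS_DAYS:
            row = compute_rolling_metrics(subset, window, as_of)
            if row is not None:
                out[cls][window] = row
    return out
```

Returning `None` below the floor, rather than a row of zeros, keeps thin windows out of the payload so a renderer cannot mistake "no data" for "no edge".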
#2 Edge-decay heatmap (NOT BUILT, ~0.5d on top of #1)
Pairs naturally with #1: same rolling-window kernel, different cell aggregation. Highlights classes where edge erodes fast (CRYPTO post-quarantine) vs. classes that stay robust (ETF walk-forward consistency 100% on 4 folds).
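A minimal sketch of the heatmap cell aggregation follows, using the definition excess = sum(pnl_pct in window) − benchmark_return(class, window_days). The benchmark is passed in as a callable so the example stays self-contained; the pick schema and the function name `edge_decay_heatmap` are assumptions for illustration.

```python
from datetime import datetime, timedelta

WINDOWS_DAYS = [7, 30, 90, 365, 1095]


def edge_decay_heatmap(closed, benchmark_return, as_of, windows=WINDOWS_DAYS):
    """Build excess[class][window_days] from closed picks and a benchmark callable.

    Assumed schema: each pick is a dict with `timestamp` (datetime),
    `pnl_pct` (float) and `asset_class` (str). `benchmark_return` is any
    callable taking (asset_class, window_days) and returning a percent.
    """
    heatmap = {}
    for cls in sorted({p["asset_class"] for p in closed}):
        heatmap[cls] = {}
        for window in windows:
            start = as_of - timedelta(days=window)
            total = sum(
                p["pnl_pct"]
                for p in closed
                if p["asset_class"] == cls and start < p["timestamp"] <= as_of
            )
            heatmap[cls][window] = round(total - benchmark_return(cls, window), 4)
    return heatmap
```

A row whose cells shrink or turn negative as the window shortens is the decay pattern the widget is meant to surface.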
- excess_return[class][window_days], where excess = sum(pnl_pct in window) − benchmark_return(class, window_days).
- Benchmark already wired in tools/live_market_fetcher.benchmark_return(); commit cf229ea31ba added per-system 30d. Extend for 7/30/90/365/1095.
- edge_decay_heatmap top-level key in dashboard_data.json.

#3 Cross-symbol variance (PARTIAL, ~0.5d)
tools/run_tv_backtest_benchmark.py already emits per-symbol PF/Sharpe/WR/MDD/trades across 7 symbols. Sample 7-symbol full run (reports/tv_backtest_benchmark_20260511T173937Z.json): only QQQ:rsi passes robustness≥0.60 AND trades≥5 across the entire grid. That's the cross-symbol variance signal — 6/7 winners are statistical noise.
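One way to turn the cross-symbol variance signal into a number is a per-class standard deviation of each metric across symbols. The sketch below assumes the benchmark output can be reshaped into a nested dict `{class: {symbol: {metric: value}}}`; that shape and the function name are hypothetical, not the benchmark tool's actual format.

```python
from statistics import stdev


def cross_symbol_std(results):
    """Map {class: {symbol: {metric: value}}} to {class: {metric: std across symbols}}.

    Uses the sample standard deviation; a metric reported by fewer than
    two symbols is skipped because its spread is undefined.
    """
    out = {}
    for cls, by_symbol in results.items():
        metrics = sorted({m for row in by_symbol.values() for m in row})
        out[cls] = {}
        for metric in metrics:
            values = [row[metric] for row in by_symbol.values() if metric in row]
            if len(values) >= 2:
                out[cls][metric] = stdev(values)
    return out
```

High spread on PF with low spread on WR would then read as symbol luck, while low spread on both suggests a class-wide signal.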
- tools/run_tv_backtest_benchmark.py: CRYPTO=BTC/ETH/SOL/AVAX/MATIC/LINK/DOGE/XRP/ADA/BNB; EQUITY=SPY/QQQ/IWM/AAPL/MSFT/NVDA/META/AMZN/GOOGL/TSLA; etc.
- std_dev_by_class[class][metric] = standard deviation across symbols. High std on PF + low std on WR = symbol-luck issue; low std on both = class-wide signal.
- by_class.cross_symbol_std into TA-baseline panel renderer (Opt A, commit 4ea32d227cf).

#4 Strategy clustering (NOT BUILT, ~2d)
System has 30+ named strategies but no canonical feature taxonomy. The peer-research orchestrator (PR 3) keyword-routes spec.entry text to 6 signal handlers (sma_cross / rsi_mr / momentum / mean_reversion_zscore / breakout / buy_and_hold) — that's the closest existing taxonomy. alpha_engine/ml_ranker.py + feature_health.py exist but operate per-pick, not per-strategy-family.
- [uses_ma, uses_rsi, uses_volume, uses_sentiment, uses_orderbook, uses_funding, uses_breakout, uses_mean_reversion]. Hand-curated for ~30 strategies, ~1h work.
- strategy_clusters top-level key. Render as a small grouped bar chart per cluster.
- Hand-curated feature tags are subjective. Consider auto-tagging via LLM (cheap-engine call per strategy with spec text) once the #6 v3 spec-translator from the research orchestrator is shipped.
#5 Regime-sensitive weighting (PARTIAL, ~1d)
Heavy infrastructure already exists. tools/live_market_fetcher.py classifies VIX (COMPLACENCY/NORMAL/ELEVATED/PANIC), BTC.D (RISING_STRONG/RISING_MILD/FALLING), DXY (USD_WEAK/STRONG/FLAT), equity regime (RIPPING/GAINING/FLAT/FALLING). audit_trail/quality_gates.py:4111 annotates picks at gate time. regime_validation block exists in dashboard_data.json but currently TRENDING_UP/DOWN/RANGING/HIGH_VOL/CRASH all show total=0 — the regime tag isn't being persisted into closed-pick rows.
- audit_trail/quality_gates.py computes regime tags but doesn't persist them to pick.regime_tag. Add a persistence step in passes_active_gate.
- Look up closed.timestamp against the historical regime cache. This needs a regime-history file: today live_market_regime.json is point-in-time. Add a daily snapshot to tools/live_market_fetcher.py writing to audit_dashboard/data/regime_history/.json .
- per_regime_metrics[regime][class] = {sharpe, wr, n_trades}. Strategies that only deliver in one regime get a fragile=true flag.
- The concept-drift root-cause report (reports/concept_drift_root_cause_2026-05-11.md) confirmed that the VIX -44.64% / 30d collapse is the real driver. Most current "edge" was earned in a PANIC-vol regime that no longer exists. Without per-regime tagging, every Tier-2 claim is implicitly regime-conditioned.
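The timestamp-to-regime lookup and the per-regime breakdown could be sketched as below. Everything here is an assumption for illustration: the history is modeled as a sorted list of `(date, regime_tag)` daily snapshots, picks carry `timestamp`, `asset_class` and `pnl_pct`, and "delivers" is read as a positive mean pnl_pct. Sharpe is omitted to keep the sketch short.

```python
from bisect import bisect_right
from collections import defaultdict
from datetime import date, datetime


def regime_on(day, history_days, history_tags):
    """Tag of the latest snapshot at or before `day`; None if history starts later."""
    i = bisect_right(history_days, day)
    return history_tags[i - 1] if i else None


def per_regime_metrics(closed, history):
    """Return {regime: {class: {"n_trades", "wr"}}} from picks and daily snapshots.

    `history` is assumed to be a date-sorted list of (date, regime_tag).
    """
    days = [d for d, _ in history]
    tags = [t for _, t in history]
    buckets = defaultdict(list)
    for pick in closed:
        tag = regime_on(pick["timestamp"].date(), days, tags)
        if tag is not None:  # picks older than the history stay untagged
            buckets[(tag, pick["asset_class"])].append(pick["pnl_pct"])
    out = defaultdict(dict)
    for (tag, cls), pnl in buckets.items():
        out[tag][cls] = {
            "n_trades": len(pnl),
            "wr": 100.0 * sum(x > 0 for x in pnl) / len(pnl),
        }
    return {tag: dict(by_class) for tag, by_class in out.items()}


def is_fragile(mean_pnl_by_regime):
    """Assumed reading of fragile=true: positive mean pnl in exactly one regime."""
    return sum(1 for v in mean_pnl_by_regime.values() if v > 0) == 1
```

Picks that predate the first snapshot are dropped rather than guessed at, which is why starting the daily snapshot early matters.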
#6 Locked-window OOS (BUILT, ~0d, wire-in only)
Walk-forward by_class in alpha_engine/walkforward_validator.py already does locked-window train/test splits. dashboard_data.json::walkforward.by_class: ETF folds=4 consistency=100% oos_sharpe=11.41; EQUITY folds=8 consistency=75% oos_sharpe=6.43; CRYPTO folds=25 consistency=84% oos_sharpe=2.57; FOREX folds=52 consistency=48.1% oos_sharpe=-3.74. Opt B (commit cf4e924744a) wired this into Tier-1 promotion gate.
- walk_forward_by_strategy() would isolate which strategies pass/fail OOS within a class. Useful as a kill-list seed.
- d884694ace2 (2026-05-10 P0 quarantine) as cutoff.
- walkforward.by_strategy in dashboard payload + table in /audit.

#7 Top-N portfolio Monte Carlo (NOT BUILT, ~1.5d)
Settles concentration-vs-diversification debate empirically. Current 6 Tier-2 verified systems span PF 1.84 to PF 19.19 — equal-weight is probably wrong, max-weight on PF 19.19 (multi_asset_cot n=130) is probably also wrong.
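To make the concentration question concrete, a top-N slice can be scored by ranking systems and blending their returns. The sketch below is a deterministic equal-weight slice, not a Monte Carlo simulation; the `returns_pct` series per system, the additive equity curve, and the per-period Sharpe are all assumptions introduced for illustration.

```python
from statistics import mean, stdev


def top_n_slice(systems, n, rank_key="excess_return_30d_pct"):
    """Equal-weight the top-n systems by `rank_key` and summarize the blend.

    Assumed schema: each system is a dict with `name`, the ranking field,
    and `returns_pct`, a list of per-period returns in percent.
    """
    ranked = sorted(systems, key=lambda s: s[rank_key], reverse=True)[:n]
    length = min(len(s["returns_pct"]) for s in ranked)
    blended = [mean(s["returns_pct"][t] for s in ranked) for t in range(length)]

    # Max drawdown on the additive cumulative-return curve, in percentage points.
    equity, peak, mdd = 0.0, 0.0, 0.0
    for r in blended:
        equity += r
        peak = max(peak, equity)
        mdd = max(mdd, peak - equity)

    sd = stdev(blended) if len(blended) > 1 else 0.0
    return {
        "systems": [s["name"] for s in ranked],
        "sharpe": mean(blended) / sd if sd > 0 else 0.0,
        "mdd": mdd,
    }
```

Comparing the output for N=1 against larger N shows directly how much drawdown the single best-ranked system carries relative to a diversified slice.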
- excess_return_30d_pct (already wired by W4, commit cf229ea31ba).
- portfolio_topN[class][N] = {sharpe, mdd, n_trades, holding_period_mean, turnover_pct}.

#8 Feature-drift KS (PARTIAL, ~1.5d)
hf_stats.concept_drift emits ONE system-wide KS_D=0.313 (vs critical 0.047). That's output-distribution drift on pnl_pct. Input-feature drift is not computed. alpha_engine/feature_health.py + ml_drift_repair_workflow.py exist but are not wired to dashboard_data.json.
- 1d_realized_vol, 7d_realized_vol, volume_z, funding_rate, oi_chg_24h, btc_d, vix, dxy_chg_30d.
- audit_dashboard/data/feature_dist_history/.parquet (or .json if no parquet available).
- feature_drift[class][feature] = {ks_D, ks_critical, alert_on}.
- Snapshot history needs at least 90d of accumulated data before this is actionable. Start writing snapshots today so the readout is meaningful in Aug.
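A per-feature drift cell with the {ks_D, ks_critical, alert_on} shape can be computed with a plain two-sample Kolmogorov-Smirnov test. The sketch below uses only the standard library and the large-sample critical value c(α)·sqrt((n+m)/(n·m)) with c(0.05) ≈ 1.358; the function names and the choice of α = 0.05 are assumptions, and the existing feature_health.py may compute this differently.

```python
from bisect import bisect_right
from math import sqrt


def ks_two_sample(a, b):
    """Two-sample KS statistic D: the largest gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    d = 0.0
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


def ks_critical(n, m, c_alpha=1.358):
    """Large-sample critical value; c_alpha=1.358 corresponds to alpha=0.05."""
    return c_alpha * sqrt((n + m) / (n * m))


def feature_drift(baseline, current):
    """One feature_drift[class][feature] cell for a baseline vs. current snapshot."""
    d = ks_two_sample(baseline, current)
    critical = ks_critical(len(baseline), len(current))
    return {"ks_D": d, "ks_critical": critical, "alert_on": d > critical}
```

Because the critical value shrinks with sample size, large snapshots will flag even small shifts, so the alert is best read alongside the magnitude of ks_D.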
#9 Meta-learning gate-pass (NOT BUILT, ~3d)
- gate_pass = 1 if walk-forward consistency ≥ 60 AND oos_sharpe > 0 (Opt B gate definition), else 0.
- mlfinlab purged-CV shim in repo (per project_next_phase_integrations_2026_04_22.md).
- p_gate_pass per system in dashboard_data.json.
- project_performance_reality.md).
- Risk: MEDIUM. An ML estimator over a small training set (n < 200 strategies) can overfit. Use the walk-forward gate as the ground-truth label only after Opt B (commit cf4e924744a) accumulates ~3mo of demotions for label balance.
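The label definition is simple enough to state as code. In the sketch below, the function encodes the Opt B rule as written above; applying it to the class-level walkforward.by_class figures is purely illustrative, since the plan labels individual systems, and the `label_balance` helper is a hypothetical addition.

```python
def gate_pass(consistency_pct, oos_sharpe):
    """Opt B label: 1 if consistency >= 60 and OOS Sharpe > 0, else 0."""
    return 1 if consistency_pct >= 60 and oos_sharpe > 0 else 0


def label_balance(labels):
    """Fraction of positive labels; values near 0 or 1 signal an unbalanced training set."""
    return sum(labels) / len(labels)


# Class-level values from walkforward.by_class, used here only as sample input.
by_class = {
    "ETF": (100.0, 11.41),
    "EQUITY": (75.0, 6.43),
    "CRYPTO": (84.0, 2.57),
    "FOREX": (48.1, -3.74),
}
labels = {cls: gate_pass(c, s) for cls, (c, s) in by_class.items()}
```

On this sample only FOREX fails the gate, which illustrates the label-balance concern: a classifier needs enough failing examples before its p_gate_pass output means much.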
#10 Peer-review rubric (NOT BUILT, ~0.5d for the rubric; ongoing labor for reviews)
- docs/EDGE_REVIEW_RUBRIC.md with 6-axis scoring 1-5: (1) data-quality, (2) regime-fit, (3) statistical significance (n, p-value), (4) backtest-leakage controls (purge gap, embargo), (5) cross-symbol generalization, (6) cost-honesty (slippage + funding + execution latency modeled).
- audit_dashboard/data/peer_review/_.json . Surface latest score on /audit system card.

| # | Investigation | State | Effort | Risk | Pairs with |
|---|---|---|---|---|---|
| 1 | Rolling-window profiling | PARTIAL | 1d | LOW | #2, #5, #6 |
| 2 | Edge-decay heatmap | NEW | 0.5d | LOW | #1 |
| 6 | Locked-window OOS | BUILT | 0d | LOW | Opt B already shipped |
| 5 | Regime-sensitive weighting | PARTIAL (bug-fix) | 1d | LOW | #1, #8 |
| 3 | Cross-symbol variance | PARTIAL | 0.5d | LOW | Opt A panel |
| 7 | Top-N portfolio Monte Carlo | NEW | 1.5d | MED | W4 (excess_return_30d) |
| 10 | Peer-review rubric | NEW | 0.5d + labor | LOW | #9 |
| 4 | Strategy clustering | NEW | 2d | MED | Research orchestrator v3 |
| 8 | Feature-drift KS | PARTIAL | 1.5d | MED | #5 regime history |
| 9 | Meta-learning gate-pass | NEW | 3d | MED-HIGH | #10, #3 |
Wave 1 (1.5d): #1 + #2. Shared kernel, single dashboard widget, zero production-strategy risk. Surfaces edge durability per class at a glance.
Wave 2 (1d): #5 bug-fix + #3. Persists regime tag on picks (closes the regime_validation.regime_wr_breakdown all-zero rows). Extends Opt A panel with cross-symbol std-dev.
Wave 3 (1.5d): #7. Top-N portfolio simulator. Settles concentration debate empirically before any real-money sizing decision.
Total: ~4d effort for 5 of the 10 angles (#1, #2, #3, #5, #7), no production-gate risk, additive to dashboard only.
- cf4e924744a: Opt B walk-forward Tier-1 promotion gate (consistency≥60 + sharpe>0); satisfies #6 wiring requirement
- cf229ea31ba: W4 benchmark-relative 30d return per system; foundation for #2 + #7
- 4ea32d227cf: Opt A TA-baseline panel; surface for #3
- 82a34bc0fdb: tools/live_market_fetcher.py; foundation for #5 + #8
- reports/concept_drift_root_cause_2026-05-11.md: documents WHY #5 + #1 matter (VIX -44.64% / 30d regime collapse since 2026-04-22)

- audit_dashboard/data/dashboard_data.json: payload contract
- docs/PERFORMANCE_CHARTER.md: tier thresholds
- reports/tradingview_backtest_benchmark_2026-05-11.md: Opt A/B origin
- reports/financial_datasets_edge_recommendations_2026-05-10T07Z.md: W4 origin
- reports/concept_drift_root_cause_2026-05-11.md: T4 conclusion
- reports/quarantine_verification_2026-05-11T19Z_16h_plus.md: 37h post-quarantine check

feat/audit-dashboard-enhancements-hermes-2026-05-09

Generated 2026-05-11. Research surface, not financial advice. See /audit/ for live dashboard.