AI Hedge Fund Simulation — Full Methodology & Results

1. Methodology

Data Source

All picks are queried from the tournament_picks table (MySQL: ejaguiar1_stocks; 3,149 rows, 34 AI models, 9 asset classes). Only forward-test picks with OPEN status are used. Synthetic and backtest data (SYNTHETIC_SEED_ENRICHED, BACKTEST_VERIFIED) and KILLED personas (ml_pattern, relative_strength, dividend_compound) are excluded. FOREX is blocked by the kill gate (57.3% WR, -0.39% avg PnL).

Confidence Assignment

Method	Description	Picks
PERSONA WR	When persona has n≥20 resolved picks, confidence = persona win rate	6 picks (PG 64%, SOL 65%, TLT/SPY/GLD/SHY 62.5%)
MODEL-REPORTED	Pick comes from model that reported its own confidence (HIGH/MEDIUM/LOW string or 0-1 float)	7 picks (MSFT 30%, XOM 30%, CL=F 80%)
IMPUTED	No WR and no model-reported confidence — coin-flip assumption	9 picks (SI=F 50%, PENNY/FUTURES 0%)
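
The three-tier assignment in the table can be sketched as a simple fallback chain. This is a minimal illustration, not the project's actual code: the function name, the numeric values mapped to HIGH/MEDIUM/LOW, and the precedence of persona WR over a model-reported value are all assumptions, since the source does not specify them.

```python
from typing import Optional, Tuple, Union

# Hypothetical mapping for string labels; the source gives no numeric values.
LABEL_TO_CONFIDENCE = {"HIGH": 0.80, "MEDIUM": 0.50, "LOW": 0.30}

# Threshold from the table: persona WR is used when n >= 20 resolved picks.
MIN_RESOLVED_FOR_PERSONA_WR = 20


def assign_confidence(
    persona_wr: Optional[float],
    persona_n: int,
    model_reported: Union[str, float, None],
) -> Tuple[float, str]:
    """Return (confidence, method) using the three methods in table order."""
    # 1. PERSONA WR: enough resolved picks to trust the persona's win rate.
    if persona_wr is not None and persona_n >= MIN_RESOLVED_FOR_PERSONA_WR:
        return persona_wr, "PERSONA WR"

    # 2. MODEL-REPORTED: a HIGH/MEDIUM/LOW string or a float in [0, 1].
    if isinstance(model_reported, str):
        label = model_reported.strip().upper()
        if label in LABEL_TO_CONFIDENCE:
            return LABEL_TO_CONFIDENCE[label], "MODEL-REPORTED"
    elif isinstance(model_reported, (int, float)) and 0.0 <= model_reported <= 1.0:
        return float(model_reported), "MODEL-REPORTED"

    # 3. IMPUTED: no usable information, so fall back to a coin flip.
    return 0.5, "IMPUTED"
```

Note that the table lists PENNY/FUTURES picks at 0% under IMPUTED, which a pure coin-flip fallback would not produce; the sketch does not attempt to reproduce that special case.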

Ranking Formula

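Composite Score = Confidence × WR × RR × ln(n+1), normalized per asset class, where WR is the win rate, RR is the reward-to-risk ratio, and n is the number of resolved picks. Picks with n=0 score zero by definition, because ln(0+1) = 0.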
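
A minimal sketch of the formula follows. The source does not say how scores are normalized within an asset class, so dividing by the class maximum is an assumption here, as are the function names and the dictionary layout of a pick.

```python
import math
from collections import defaultdict


def raw_score(confidence, wr, rr, n):
    """Confidence x WR x RR x ln(n+1); zero whenever n == 0."""
    return confidence * wr * rr * math.log(n + 1)


def normalized_scores(picks):
    """Scale each raw score by the best raw score in its asset class.

    `picks` is a list of dicts with keys:
    ticker, asset_class, confidence, wr, rr, n.
    Max-scaling is an assumed normalization, not taken from the source.
    """
    raw = {
        p["ticker"]: raw_score(p["confidence"], p["wr"], p["rr"], p["n"])
        for p in picks
    }
    class_max = defaultdict(float)
    for p in picks:
        cls = p["asset_class"]
        class_max[cls] = max(class_max[cls], raw[p["ticker"]])
    return {
        p["ticker"]: raw[p["ticker"]] / class_max[p["asset_class"]]
        if class_max[p["asset_class"]] > 0
        else 0.0
        for p in picks
    }


# ETF rows from the results table.
etf_picks = [
    {"ticker": "SPY", "asset_class": "ETF", "confidence": 0.625, "wr": 0.625, "rr": 1.9, "n": 124},
    {"ticker": "GLD", "asset_class": "ETF", "confidence": 0.625, "wr": 0.625, "rr": 1.2, "n": 124},
]
etf_scores = normalized_scores(etf_picks)
```

Because SPY and GLD share the same confidence, WR, and n, their ratio under this scheme reduces to the ratio of their RR values.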

Debate Process

Round 1 (7 models): Risk Manager, Portfolio Manager, Cerebras GPT-OSS-120B, DeepSeek V4, KiloCode, Kimi, and Cursor debated which picks are safest and which to veto. The round produced a consensus top 5 and a list of systemic issues.

Round 2 (2 models): Multi-Asset Allocator + Financial Data Architect — expanded to all 9 asset classes, identified IPO/mutual fund infrastructure gaps.

Round 3 (3 models): Quant Researcher (EV/Sharpe/Kelly) + Behavioral Analyst (market narrative) + Hedge Fund PM ($500k AUM allocation). Produced 8-position risk-parity portfolio.

Round 4: IPO Lockup Expiry Strategy — SHORT 30 days before 180-day lockup expiry. Currently data-starved (needs live SEC EDGAR scraper).

Rounds 5-14 (10 agents): Pick-by-pick review, cross-round pattern analysis, devil's advocate audit, entry criteria standardization, data gap ranking, statistical edge recalculation, model attribution scorecard, final executive synthesis.

2. Results Per Asset Class

EQUITY 3 TOP PICKS

Rank	Pick	Direction	Entry	WR	n	RR	Conf	Status
1	META	LONG	$620.71	60%	124	1.7	50%	PROVEN
2	PG	SHORT	$167.37	64%	164	1.5	64%	VERIFIED
3	GOOGL	LONG	$186.63	60%	124	2.5	30%	ESTIMATED

Best equity edge: PG SHORT, the only equity pick with a verified WR (64%, n=164). META LONG is the AI monetization flywheel play. GOOGL has the highest RR (2.5) but low confidence (30%). Gap: the UEPS fundamental screen ranks ADBE (score 0.839) as theoretically the highest-quality name, but ADBE has no forward-test data in tournament_picks.

CRYPTO 2 TOP PICKS

Rank	Pick	Direction	Entry	WR	n	RR	Conf	Status
1	SOLUSDT	LONG	$157.39	65%	23	2.1	65%	VERIFIED
2	AVAXUSDT	SHORT	$22.83	65%	23	1.8	65%	SMALL N

SOLUSDT LONG is the only crypto pick with a verified WR (65%, n=23). The vol_arb persona has only 23 resolved picks, which is statistically insufficient (95% CI: about ±20 percentage points). Risk: the crypto shorts (AVAX, BTC, ETH) conflict directionally with the SOL long, so if risk-on sentiment returns, all of the shorts get run over.
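
The ±20% figure is consistent with a normal-approximation (Wald) interval for a proportion. The source does not say which interval it used, so the sketch below is one plausible reading; the function name is hypothetical.

```python
import math


def wr_margin_95(wr, n):
    """Half-width of a 95% normal-approximation interval for a win rate."""
    return 1.96 * math.sqrt(wr * (1 - wr) / n)


crypto_margin = wr_margin_95(0.65, 23)          # vol_arb persona, n=23
risk_parity_margin = wr_margin_95(0.625, 124)   # risk_parity persona, n=124
```

For 65% at n=23 the margin comes out near 0.195, so the interval runs from roughly 45% to 85% and includes a coin flip. At n=124 the margin shrinks to about 0.085. The Wald interval is rough at small n; a Wilson interval would give somewhat different bounds.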

ETF 2 TOP PICKS

💡 ELI5: SPY is the whole US stock market. Shorting it means you think stocks will go DOWN. GLD is gold — people buy it when they're scared about inflation or war. Together they're betting that stocks fall and gold rises, which is the "stagflation" playbook.

Rank	Pick	Direction	Entry	WR	n	RR	Conf
1	SPY	SHORT	$726.80	62.5%	124	1.9	62.5%
2	GLD	LONG	$257.93	62.5%	124	1.2	62.5%

SPY SHORT + GLD LONG is the textbook stagflation pair. Both are anchored by the risk_parity persona (n=124). SPY SHORT has the highest composite score in the entire book. GLD has the weaker RR of the pair (1.2) but adds diversification value.

BOND 3 PICKS

Rank	Pick	Direction	Entry	WR	n	RR	Conf
1	TLT	LONG	$87.66	62.5%	124	1.8	62.5%
2	SHY	SHORT	$82.36	62.5%	124	1.6	62.5%

TLT LONG is the strongest consensus pick (7/7 models). SHY SHORT + TLT LONG is a curve flattener: it profits when long-dated yields fall relative to short-dated yields. Risk: bond Sharpe ratios are inflated by the low volatility assumption (0.5% daily); real bond strategies do not sustain a Sharpe of 19.
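
To see why a Sharpe of 19 is implausible, it helps to back out the average daily return it implies at 0.5% daily volatility. The sketch assumes the usual square-root-of-252 annualization and a zero risk-free rate; neither convention is stated in the source, and the function names are hypothetical.

```python
import math

TRADING_DAYS_PER_YEAR = 252  # assumption: common annualization convention


def annualized_sharpe(mean_daily_return, daily_vol):
    """Annualized Sharpe ratio from daily mean and volatility (risk-free = 0)."""
    return mean_daily_return / daily_vol * math.sqrt(TRADING_DAYS_PER_YEAR)


def implied_mean_daily_return(sharpe, daily_vol):
    """Daily mean return needed to reach a given annualized Sharpe ratio."""
    return sharpe * daily_vol / math.sqrt(TRADING_DAYS_PER_YEAR)


implied = implied_mean_daily_return(19, 0.005)
```

Under these assumptions the implied mean is roughly 0.6% per day, which is larger than the assumed daily volatility itself. That is far outside what a long-bond position earns on average.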

COMMODITY 3 PICKS

Rank	Pick	Direction	Entry	WR	n	RR	Conf
1	SI=F	SHORT	$34.93	62.5%	124	1.4	50%
2	CL=F	SHORT	$68.25	65%	23	1.1	80%

CL=F SHORT has near-zero EV (0.05 risk units) and RR=1.1, so its risk/reward sits barely above breakeven. Gap: the COMMODITY pipeline may be broken, since the top systems (multi_asset_cot, PF=4.72) have n=0 in the resolved DB. All commodity picks are SHORT, which should be checked for regime bias. SI=F SHORT replaced CL=F in the adjusted portfolio.
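
The EV figure can be checked with the standard expectancy formula in risk units, EV = p × RR − (1 − p), alongside the Kelly fraction p − (1 − p)/RR. The source does not state which win probability it used. The quoted 0.05 is reproduced with p = 0.5, not with the 65% WR shown in the table, so treat the choice of p below as an inference.

```python
def ev_in_risk_units(p_win, rr):
    """Expected value per unit risked: win RR with prob p, lose 1 otherwise."""
    return p_win * rr - (1.0 - p_win)


def kelly_fraction(p_win, rr):
    """Kelly-optimal fraction of capital for a binary bet with payoff RR."""
    return p_win - (1.0 - p_win) / rr


ev_coin_flip = ev_in_risk_units(0.50, 1.1)   # matches the quoted 0.05
ev_table_wr = ev_in_risk_units(0.65, 1.1)    # what the 65% WR would imply
kelly_coin_flip = kelly_fraction(0.50, 1.1)
```

With a coin-flip probability the Kelly fraction is under 5% of capital, which fits the "near-zero EV" description. If the 65% WR held up, the EV would be about 0.365 risk units, so the two inputs lead to very different conclusions.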

PENNY & FUTURES BLOCKED

All 6 picks have n=0, so the reported WR of 0% reflects missing data rather than realized losses. MVST, KULR, QBTS (penny stocks) and ES=F, GC=F, CL=F (futures) have zero resolved trades, leaving no statistical basis for inclusion. They should be removed from the active pick list until each has n≥50 resolved trades. They are currently listed as PAPER ONLY, with no real capital allocated.
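
A sample-size gate of this kind can be sketched as a simple partition of the pick list. The function name and the dictionary layout are hypothetical; only the 50-trade threshold comes from the text.

```python
MIN_RESOLVED_TRADES = 50  # threshold named in the text


def split_by_sample_gate(picks, min_n=MIN_RESOLVED_TRADES):
    """Partition picks into (active, paper_only) by resolved-trade count."""
    active = [p for p in picks if p["n"] >= min_n]
    paper_only = [p for p in picks if p["n"] < min_n]
    return active, paper_only


sample = [
    {"ticker": "SPY", "n": 124},
    {"ticker": "SOLUSDT", "n": 23},
    {"ticker": "MVST", "n": 0},
]
active, paper_only = split_by_sample_gate(sample)
```

Applied strictly, a 50-trade gate would paper-track not only the n=0 penny and futures picks but also any pick backed by 23 resolved trades.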

FOREX BLOCKED

Kill gate active: 57.3% WR, -0.39% avg PnL, 253 resolved picks. Statistical trap confirmed: many small wins offset by occasional large losses (3.2:1 loss-to-win ratio). 63% of FOREX wins are 1-basis-point "resolver flicker." Zero allocation until the asymmetric TP/SL fix is validated.
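
The trap can be made concrete by computing the win rate needed to break even. The sketch reads "3.2:1 loss-to-win ratio" as average loss size divided by average win size, which is an interpretation the source does not spell out; the function names are hypothetical.

```python
def breakeven_win_rate(loss_to_win):
    """Win rate at which expectancy is zero for a given loss/win size ratio."""
    return loss_to_win / (1.0 + loss_to_win)


def expectancy_in_avg_wins(win_rate, loss_to_win):
    """Expected PnL per trade, measured in units of the average win."""
    return win_rate - (1.0 - win_rate) * loss_to_win


forex_breakeven = breakeven_win_rate(3.2)
forex_expectancy = expectancy_in_avg_wins(0.573, 3.2)
```

Under that reading, break-even requires a win rate of about 76%, so a 57.3% WR leaves the strategy with clearly negative expectancy despite winning more often than it loses.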

3. What's Broken (P0: Must Fix Before Real Money)

#	Issue	Severity	Fix
1	ML confidence INVERTED: 0.85-0.90 band has 20% WR	CRITICAL	Flip scoring: high confidence → sell signal. Recalibrate against realized outcomes.
2	No n-threshold gate: 0-data personas generating live signals	CRITICAL	Require ≥50 resolved trades per source before any signal passes.
3	PENNY/FUTURES with 0 resolved data	CRITICAL	Drop from active pick list. Paper-track until n≥50.
4	CL=F duplicated at 2 prices ($68.25 / $73)	HIGH	Keep commodity entry, drop futures entry.
5	FOREX resolver bug: 63% of wins are 1bp "flicker"	HIGH	Already blocked by kill gate. Fix asymmetric TP/SL before re-enabling.
6	COMMODITY pipeline broken: top systems have n=0 in resolved DB	HIGH	Investigate table mismatch. multi_asset_cot PF=4.72 invisible to OOS validator.

4. Areas for Improvement

findtorontoevents.ca/audit/

findtorontoevents.ca/audit/ai-tournament.html

Overall System

5. 1-Week Prediction

Pick	Direction	Prediction	Confidence	Rationale
TLT LONG	BULLISH	WIN	HIGH	7/7 model consensus. Cleanest macro expression.
SPY SHORT	BEARISH	WIN	HIGH	Foundation of risk-off book. Market correction continuing.
GLD LONG	BULLISH	WIN	HIGH	Stagflation hedge thesis intact.
PG SHORT	BEARISH	WIN	MEDIUM	Only verified-WR equity pick (64%).
CL=F SHORT	BEARISH	LOSS	LOW	Near-zero EV. Oil geopolitical risk.
SHY SHORT	BEARISH	LOSS	LOW	Coin flip. Rate cut priced in.

6. Key Files

File	Content
`reports/AI_HEDGE_FUND_SIMULATION_EXECUTIVE_SUMMARY_2026-05-24.md`	Complete 159-line executive summary
`audit_dashboard/hedge_fund_simulation_20260524.html`	3-round debate results (7 models, per-agent insights)
`audit_dashboard/curated_picks_20260524.html`	Top 3 picks per asset class
`updates/2026-05-24-cross-asset-statistical-analysis.md`	Per-pick EV/Sharpe/Kelly, correlation matrix
`reports/CONFIDENCE_METHODOLOGY_2026-05-24.md`	3 confidence methods, thresholds, calibration gaps

⚠️ Not financial advice. Educational/research simulation only. Zero real money deployed.
