🗜 Git Bloat Lockdown

11 gitignore rounds · history rewrite · 1,522 drift files evicted · 2.3 GB → 327 MB

Comprehensive guide · 2026-05-25

Headline numbers

Metric | Before | After | Δ
.git/ server-side size | 2.3 GB | ~327 MB (post-GitHub-gc) | −85%
Bloat per training/scan cycle | ~1 GB/day | 0 (gitignored) | stopped
Tracked-but-now-ignored files | 1,522 | 0 | evicted
Deploy-required files preserved | n/a | 16 (explicit ! negations) | verified
Workflow patches needed | n/a | 1 (train_crypto_models.yml) | shipped
Validator checks | n/a | 8/8 PASS | repeatable

How to recover your clone

Every other PC that has a clone of this repo must follow the recovery doc before the next git pull. The full guide is at:

→ docs/GIT_HISTORY_REWRITE_2026-05-25.md (also viewable via gh api repos/eltonaguiar/findtorontoevents_antigravity.ca/contents/docs/GIT_HISTORY_REWRITE_2026-05-25.md)

TL;DR: pick one

Option A (recommended): move your existing clone aside, fresh-clone the repo.
Option B (keeps untracked): replace .git/ in place with a bare clone, git reset --hard origin/main.
Option C (had unpushed work): git format-patch your commits, follow A or B, then git am the patches on top of new main.
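
Option C could be scripted roughly as below. This is a sketch, not the recovery doc's procedure: it assumes the stale clone has not fetched since the rewrite (so its origin/main still names the pre-rewrite tip), and the function names and directory arguments are placeholders. Pass absolute paths, because git -C changes directory before the patch paths are resolved.

```shell
# Sketch of Option C. The function names and their arguments are placeholders.

# Export every local commit that is not on the (stale) origin/main.
export_unpushed() {
    old_clone=$1; patch_dir=$2          # both absolute paths
    mkdir -p "$patch_dir"
    git -C "$old_clone" format-patch -q -o "$patch_dir" origin/main..HEAD
}

# Replay the exported patches on top of a fresh clone of the rewritten repo.
apply_patches() {
    new_clone=$1; patch_dir=$2          # both absolute paths
    set -- "$patch_dir"/*.patch
    # An unmatched glob stays literal, so test that the first entry exists.
    [ -e "$1" ] || { echo "no patches in $patch_dir" >&2; return 0; }
    git -C "$new_clone" am "$@"
}
```

Run export_unpushed against the old clone first, recover via Option A or B, then run apply_patches against the recovered clone. If git am stops on a conflict, git am --abort returns the clone to its pre-apply state.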

Validator script

After recovery, run scripts/validate-history-rewrite.sh from the repo root. It runs 8 checks and prints PASS or FAIL for each, plus an overall verdict:

  1. Recovery doc present
  2. Local HEAD matches origin/main
  3. git fsck clean (no missing blobs)
  4. Post-rewrite gitignore commits in lineage
  5. No scrubbed bloat paths tracked in index
  6. Gitignore covers the scrubbed paths
  7. No unresolved merge conflicts
  8. git pull --ff-only origin main would succeed

All 8 must PASS. If any FAIL, the script shows the specific fix command.
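
For readers who want to see what such checks involve, here is a hypothetical re-creation of checks 2, 3, 5 and 7 using stock git commands. It is not the contents of scripts/validate-history-rewrite.sh; the function names are invented for illustration, and check 5 is approximated as "no tracked file matches the current ignore rules".

```shell
# Hypothetical re-creation of checks 2, 3, 5 and 7; the real script may differ.

# check <label> <command...>: run the command, print PASS or FAIL for it.
check() {
    label=$1; shift
    if "$@" >/dev/null 2>&1; then
        echo "PASS  $label"
    else
        echo "FAIL  $label"
        return 1
    fi
}

head_matches_origin() {
    head=$(git rev-parse -q --verify HEAD) || return 1
    [ "$head" = "$(git rev-parse -q --verify origin/main)" ]
}
# Tracked files that the ignore rules would exclude count as drift.
no_ignored_tracked() { [ -z "$(git ls-files -ci --exclude-standard)" ]; }
# Unmerged index entries mean a conflict is still unresolved.
no_unmerged_paths()  { [ -z "$(git ls-files --unmerged)" ]; }

run_checks() {
    failed=0
    check "2. Local HEAD matches origin/main" head_matches_origin || failed=1
    check "3. git fsck clean" git fsck || failed=1
    check "5. No scrubbed bloat paths tracked in index" no_ignored_tracked || failed=1
    check "7. No unresolved merge conflicts" no_unmerged_paths || failed=1
    [ "$failed" -eq 0 ] && echo "OVERALL: PASS" || echo "OVERALL: FAIL"
    return "$failed"
}
```

Calling run_checks from the repo root prints one line per check and returns non-zero if any check failed, which makes it usable as a gate in other scripts.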

The 11 gitignore rounds

Each round was driven by a different audit: local heuristics, mirror-clone history scan, Roo on paid-mode-fast, Roo on free-mode-fast, Cursor on the same PC, GitHub Copilot, multi-AI fan-out. Cumulative effect: a defense-in-depth pattern that catches every category of regenerable bloat the repo's CI/training pipelines emit.

Round | Trigger | What it caught
v2 | Initial bloat-stop after 2026-05-23 stripped reset only covered 3 paths | production_models/*.pkl, dashboard_data.json variants, events.json, tmp_*.json, Kimi_Agent_*.zip, actionlint.exe
v3 | Top-30 history blob audit | closed_picks.json (+ enriched), audit_edge_review_live.json, snapshots/dashboard_data_*.json
v4 | Post-scrub blob re-audit: found 1,746 .joblib files (3.6 GB!) | enhanced_models/models/*.joblib, closed_picks.archive.jsonl
v5 | Round-3 working-tree audit (tracked files >1 MB) | tmp/, *.bak, *.log, **/*.db, parallel_agent guess_models.pkl, KIMI rf_model.pkl, gatekeeper joblibs, kimi_edge_audit CSVs, baby_strats dupes, audit_full.html
v6 | Multi-agent + multi-AI audit. Biggest single win: hindsight/ had 2,304 timestamped JSONs (71 MB) | alpha_engine/data/hindsight/**/*.json, live-state JSON patterns (**/data/active_picks, live_*, *_signals, scan_*, tournament, elimination_state, ml_weights, winner_history, regime_report, outcome_resolver_log), universal ML extensions (*.npy/*.npz/*.h5/*.pt/*.pth/*.onnx/*.ckpt/*.parquet/*.feather), Python/JS hygiene (__pycache__, .pytest_cache, etc.)
v7 | Roo on paid-mode-fast (ELTONSVLLM_SERVER) independent audit | backtest *.sql + JSONs, test screenshots/artifacts, *.bak-<date> variants, swing model *.pkl, audit quarantine/, kimi_attachments_*/, database/*/, STOCKS/competition file-level adds
v8 | Comprehensive lockdown after 2 deploy regressions from bare-dir ignores | Replaced bare STOCKS/competition/ with file-glob + explicit !negations for all 7 deploy files. Catch-all: **/*.{pt,pth,h5,npz,docx,pdf,tar,tgz,7z,sql.gz,sqlite,sqlite3} (brace shorthand for this table; gitignore has no brace expansion, so the file needs one pattern per extension). 1,522 drift files mass-evicted via git ls-files -ci --exclude-standard | xargs git rm --cached.
v9 | Roo on free-mode-fast (ELTONSVLLM_SERVERFREE): 70 .pkl files still tracked at 53 MB total | ml_crypto_predictor/models/*.pkl + workflow patch: train_crypto_models.yml changed git add → git add -f so the canonical writer still works
v10 | Peer review (5 AI engines): preemptive ML-experiment-tracking patterns | wandb/, mlruns/, .dvc/cache/, **/checkpoints/, **/tb_events/, **/runs/events.out.tfevents.*
v11 | Cursor independent audit on this PC: found 23 stale-tracked files + .venv/ top-level gap | /.venv/ + **/.venv/, alpha_engine/data/{closed_picks_fast,forex_walk_forward,precursor_history,prediction_quality_history,momentum_tracker_picks,theory_portfolios_history,strategy_performance,extensive_backtest_results}.json, pick_funnel_*.json, trade_logs/*.json, skyrocket_detector/*.joblib, **/*.backup, **/*_OLD.html, CHATWITHIT_ARCHIVE_*
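
The v8 mass-eviction one-liner splits on whitespace, so it can misfire on paths that contain spaces. A NUL-delimited variant with a preview step might look like the sketch below; list_drift and evict_drift are illustrative names, not scripts from this repo.

```shell
# Preview: tracked files that the current ignore rules would exclude.
list_drift() {
    git ls-files -ci --exclude-standard
}

# Untrack those files but leave the working-tree copies on disk.
evict_drift() {
    # Nothing to do when the drift list is empty.
    [ -n "$(git ls-files -ci --exclude-standard)" ] || return 0
    # -z / -0 keep paths with spaces or newlines intact.
    git ls-files -ci --exclude-standard -z | xargs -0 git rm --cached -q --
}
```

Run list_drift first and read the output; evict_drift only updates the index, so the removal still has to be committed afterwards.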

The "parent-dir excludes block file-level negations" gotcha

Most expensive lesson of the session

Git docs: "It is not possible to re-include a file if a parent directory of that file is excluded." So if your .gitignore has STOCKS/competition/ (a bare directory exclude), no later !STOCKS/competition/foo.json negation can re-include foo.json. Git stops descending into the directory at the parent-level exclude.

This caused two production-deploy regressions during the session: super_signals.json, then competition-stocks.json, went missing from fresh CI checkouts. Each time the FTP put failed with "No such file or directory".

The fix pattern

Replace bare directory ignores with file-globs, then add file-level negations:

# DOES NOT WORK: the negation is silently ignored
STOCKS/competition/
!STOCKS/competition/forward_picks.json

# WORKS: file-glob + negations
STOCKS/competition/*.json
!STOCKS/competition/competition-crypto.json
!STOCKS/competition/competition-forex.json
!STOCKS/competition/forward_picks.json
# ... etc for every deploy-required file in the dir
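
The difference between the two rule sets can be reproduced in a throwaway repo. The sketch below is illustrative only: staged_with_rules is a made-up helper, and scratch.json stands in for any regenerable file in the directory. It writes the given rules to .gitignore, runs git add -A, and prints what ended up staged under STOCKS/.

```shell
# Build a scratch repo with two JSON files, apply the given ignore rules,
# and print which paths under STOCKS/ `git add -A` ends up staging.
staged_with_rules() {
    rules=$1
    scratch=$(mktemp -d)
    (
        cd "$scratch" && git init -q . 2>/dev/null &&
        mkdir -p STOCKS/competition &&
        echo '{}' > STOCKS/competition/forward_picks.json &&
        echo '{}' > STOCKS/competition/scratch.json &&
        printf '%s\n' "$rules" > .gitignore &&
        git add -A && git ls-files -- STOCKS
    )
    rm -rf "$scratch"
}

# Bare directory exclude: the negation never gets a chance to apply.
staged_with_rules 'STOCKS/competition/
!STOCKS/competition/forward_picks.json'
# prints nothing

# File-glob plus negation: only the deploy file is staged.
staged_with_rules 'STOCKS/competition/*.json
!STOCKS/competition/forward_picks.json'
# prints STOCKS/competition/forward_picks.json
```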

The "gitignore vs git add -f" dance for CI writers

Some workflows are the canonical writer for files that should NOT be tracked by devs locally. For example, .github/workflows/train_crypto_models.yml retrains and commits ML model weights every day, but if a dev runs git add -A locally, we don't want their stale copies leaking into the index.

Pattern: gitignore the path AND have the canonical writer use git add -f to bypass gitignore for its specific commit. The v9 fix from this session shows the template:

# In .gitignore (blocks local dev adds):
ml_crypto_predictor/models/*.pkl

# In the workflow (canonical writer keeps working):
git add -f ml_crypto_predictor/models/ \
           ml_crypto_predictor/production_models/ \
           ml_crypto_predictor/optim/ \
           backtest_results/ updates/data/ || true

An audit of the 20+ workflows that run git add against gitignored paths showed they already use -f wherever it is needed. quick-guess-ml.yml, now-scanner.yml and alpha-trend-catcher.yml, among others, all follow the pattern.
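
One thing to watch with the multi-path form: a pathspec that matches no files makes git add exit with an error, and || true then hides that error from the workflow log. A more defensive variant, offered as a sketch and not as what train_crypto_models.yml actually does, stages each path separately and reports the ones that are missing. force_add_existing is a hypothetical helper name.

```shell
# Force-stage each path that exists; report and skip the rest
# instead of letting one missing directory fail the whole add.
force_add_existing() {
    for path in "$@"; do
        if [ -e "$path" ]; then
            git add -f -- "$path"
        else
            echo "skip (missing): $path" >&2
        fi
    done
}
```

In the workflow this would be called with the same directories as the git add -f command above, for example force_add_existing ml_crypto_predictor/models/ backtest_results/.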

Critical files preserved via !negations (don't break these)

Each of these is referenced by a deploy workflow (FTP put / scp / cp) and MUST stay tracked. The gitignore explicitly un-ignores them via ! after a broader pattern that would otherwise catch them:

File | Workflow that needs it
STOCKS/competition/competition-{stocks,crypto,forex,meme_coins,penny_stocks,slim}.json | deploy-competition-to-site.yml, algorithm-competition-refresh.yml
STOCKS/competition/forward_picks.json | same
cross_aggregation/data/super_signals.json | deploy-competition-to-site.yml
KIMI_RISEOFTHECLAW/data/live_competition.json | deploy-riseoftheclaw.yml
KIMI_RISEOFTHECLAW/data/signal_tracking.json | same
KIMI_RISEOFTHECLAW/data/live_signals_now.json | same
KIMI_RISEOFTHECLAW/data/active_picks.json | same
KIMI_RISEOFTHECLAW/data/closed_picks.json | same
regime_terminal/data/active_signals.json | same
crypto_ml_edge/data/active_picks.json | deploy-riseoftheclaw.yml, deploy-pages.yml
crypto_signal_engine/data/active_picks.json | same
quan_engine/data/active_signals.json | same
data/ai_tournament/picks_latest.json | ai-tournament-pipeline.yml, audit_dashboard reads this
audit_dashboard/data/ai_tournament_picks_latest.json | audit_dashboard reads this
alpha_engine/db/schema.sql | defense against **/*.sql catch-all (file is .sql, not .sqlite; harmless redundancy)

If a deploy starts failing

Diagnostic flow when a deploy workflow fails with "No such file or directory":

  1. Identify the missing file from the error log.
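  2. Run git check-ignore -v <file>. It prints the matching .gitignore line if the file is being blocked.
  3. If matched, add an explicit !negation right after the matching pattern in .gitignore. If the match is a bare directory pattern, first convert it to a file-glob as described in the gotcha section above, otherwise the negation has no effect.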
  4. git add -f <file>, commit, push.
  5. The recovery doc has an "open PRs" section explaining how to rebase PRs onto new main if needed.
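
Step 2 can be wrapped in a small helper that says which rule is responsible. The sketch below is an illustration, not part of the repo's tooling; why_ignored is a made-up name. It relies on git check-ignore -v printing source:line:pattern, a tab, then the path, and it treats a reported pattern starting with ! as a re-include, not as a block.

```shell
# why_ignored <path>: report which ignore rule, if any, hides an untracked path.
why_ignored() {
    file=$1
    match=$(git check-ignore -v -- "$file") ||
        { echo "not ignored: $file"; return 1; }
    rule=$(printf '%s\n' "$match" | cut -f1)   # "source:line:pattern"
    rest=${rule#*:}                            # drop the source file name
    pattern=${rest#*:}                         # drop the line number
    case $pattern in
        '!'*) echo "re-included by $rule"; return 1 ;;
        *)    echo "ignored by $rule" ;;
    esac
}
```

For a missing deploy file, why_ignored cross_aggregation/data/super_signals.json would name the .gitignore line to fix. Note that check-ignore consults the index by default, so an already-tracked file is reported as not ignored.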

Open work post-session

Status | Item
🟡 deferred | Second history scrub for ~3 GB cumulative historical blob weight (Copilot flagged). Bleed is stopped so existing weight just sits; .git may shrink naturally after GitHub's gc cycle runs (~1-2 days).
🟡 deferred | Bluesmind paid Claude/Kimi models (require Anthropic-native /v1/messages format, not openai-compat)
🟢 auto-recovers | HuggingFace monthly quota (first of next month), Cloudflare daily quota (UTC midnight), OpenAI quota (top-up)
🔴 needs you | Re-clone every other PC before next git pull. The recovery doc + validator are the source of truth.
🔴 needs you | xAI key rotation at console.x.ai (current direct-key invalid; Bluesmind-routed Grok still works)