11 gitignore rounds · history rewrite · 1,522 drift files evicted · 2.3 GB → 327 MB
Comprehensive guide · 2026-05-25
| Metric | Before | After | Δ |
|---|---|---|---|
| `.git/` server-side size | 2.3 GB | ~327 MB (post-GitHub-gc) | −85% |
| Bloat per training/scan cycle | ~1 GB/day | 0 (gitignored) | stopped |
| Tracked-but-now-ignored files | 1,522 | 0 | evicted |
| Deploy-required files preserved | n/a | 16 (explicit `!` negations) | verified |
| Workflow patches needed | n/a | 1 (`train_crypto_models.yml`) | shipped |
| Validator checks | n/a | 8/8 PASS | repeatable |
Every other PC that has a clone of this repo must follow the recovery doc before the next git pull. The full guide is at:
→ `docs/GIT_HISTORY_REWRITE_2026-05-25.md` (also viewable via `gh api repos/eltonaguiar/findtorontoevents_antigravity.ca/contents/docs/GIT_HISTORY_REWRITE_2026-05-25.md`)
- Option A (recommended): move your existing clone aside, then fresh-clone the repo.
- Option B (keeps untracked files): replace `.git/` in place with a bare clone, then `git reset --hard origin/main`.
- Option C (you had unpushed work): `git format-patch` your commits, follow A or B, then `git am` the patches on top of the new main.
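Option C has an ordering trap, so here is a minimal end-to-end sketch of it in throwaway repos. It assumes bash and a reasonably recent git. `upstream` stands in for GitHub and `repo` for an old clone that holds one unpushed commit. The rewrite is simulated with `git commit --amend` on the upstream's root commit. Those names, the `gitc` helper and the demo identity are all illustrative. On a real machine, only the three numbered Option C steps apply, pointed at your own clone and the GitHub remote.

```shell
#!/usr/bin/env bash
# Hypothetical walk-through of recovery Option C in throwaway repos under a
# temp dir: "upstream" plays GitHub, "repo" plays an old clone.
set -euo pipefail

# Run git with a throwaway identity so commit/am work on any machine.
gitc() { git -c user.name=demo -c user.email=demo@example.invalid "$@"; }

work=$(mktemp -d)

# --- setup: upstream, an old clone, one unpushed commit, then a rewrite ---
git init -q "$work/upstream"
git -C "$work/upstream" symbolic-ref HEAD refs/heads/main
echo v1 > "$work/upstream/data.json"
git -C "$work/upstream" add -A
gitc -C "$work/upstream" commit -q -m "original history"
git clone -q "$work/upstream" "$work/repo"
echo "local fix" > "$work/repo/notes.md"
git -C "$work/repo" add notes.md
gitc -C "$work/repo" commit -q -m "unpushed work"
# Simulated rewrite: upstream main is replaced by a brand-new root commit.
gitc -C "$work/upstream" commit -q --amend -m "rewritten history"

# --- Option C ---
# 1. Export the unpushed commits BEFORE any fetch/pull, while origin/main in
#    the old clone still points at the pre-rewrite history.
git -C "$work/repo" format-patch -q -o "$work/patches" origin/main..HEAD
# 2. Move the old clone aside and fresh-clone (Option A).
mv "$work/repo" "$work/repo.old"
git clone -q "$work/upstream" "$work/repo"
# 3. Replay the exported commits on top of the new main.
gitc -C "$work/repo" am -q "$work/patches"/*.patch

git -C "$work/repo" log --format=%s
```

Step 1 carries the key ordering constraint. The range `origin/main..HEAD` isolates only your own commits while `origin/main` still points at the old history. If `git am` stops on a conflict, `git am --abort` returns the fresh clone to its clean state.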
After recovery, run `scripts/validate-history-rewrite.sh` from the repo root. It runs 8 checks and prints PASS/FAIL for each one, plus an overall verdict. The checks include:

- `git fsck` is clean (no missing blobs)
- `git pull --ff-only origin main` would succeed

All 8 must PASS. If any check fails, the script shows the specific fix command.
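The validator's source isn't reproduced here. The following is a hypothetical sketch of how two of its checks could work; it is not the actual `scripts/validate-history-rewrite.sh`. The first check requires `git fsck` to exit cleanly. The second relies on the fact that `git pull --ff-only` can only succeed when the local `HEAD` is an ancestor of the freshly fetched `origin/main`, and `git merge-base --is-ancestor` tests exactly that. The function names and the simulated upstream are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical validator-style checks (not the real validate script),
# exercised against an old clone and a fresh clone of a rewritten upstream.
set -euo pipefail

gitc() { git -c user.name=demo -c user.email=demo@example.invalid "$@"; }

# run_check NAME CMD...: print one PASS/FAIL line for a check.
run_check() {
  local name=$1
  shift
  if "$@"; then echo "PASS $name"; else echo "FAIL $name"; fi
}

# The object store is intact (no missing or corrupt objects).
check_fsck() { git fsck --no-progress >/dev/null 2>&1; }

# A fast-forward-only pull needs HEAD to be an ancestor of (or equal to)
# the freshly fetched origin/main.
check_ff_only() {
  git fetch -q origin &&
    git merge-base --is-ancestor HEAD origin/main
}

validate() { run_check fsck check_fsck; run_check ff-only check_ff_only; }

work=$(mktemp -d)
git init -q "$work/upstream"
git -C "$work/upstream" symbolic-ref HEAD refs/heads/main
echo v1 > "$work/upstream/data.json"
git -C "$work/upstream" add -A
gitc -C "$work/upstream" commit -q -m "original history"
git clone -q "$work/upstream" "$work/old-clone"
# Simulated rewrite: upstream main now points at a brand-new root commit.
gitc -C "$work/upstream" commit -q --amend -m "rewritten history"
git clone -q "$work/upstream" "$work/fresh-clone"

old_report=$(cd "$work/old-clone" && validate)
fresh_report=$(cd "$work/fresh-clone" && validate)
printf '%s\n--\n%s\n' "$old_report" "$fresh_report"
```

In the pre-rewrite clone, the ff-only check fails because the old commits are no longer ancestors of the rewritten `main`. That is why old clones must be replaced rather than pulled.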
Each round was driven by a different audit: local heuristics, a mirror-clone history scan, Roo on paid-mode-fast, Roo on free-mode-fast, Cursor on the same PC, GitHub Copilot, and a multi-AI fan-out. The cumulative effect is a defense-in-depth pattern that covers every category of regenerable bloat these audits found in the output of the repo's CI/training pipelines.
| Round | Trigger | What it caught |
|---|---|---|
| v2 | Initial bloat-stop; the 2026-05-23 stripped reset had only covered 3 paths | `production_models/*.pkl`, `dashboard_data.json` variants, `events.json`, `tmp_*.json`, `Kimi_Agent_*.zip`, `actionlint.exe` |
| v3 | Top-30 history blob audit | `closed_picks.json` (+ enriched), `audit_edge_review_live.json`, `snapshots/dashboard_data_*.json` |
| v4 | Post-scrub blob re-audit: found 1,746 `.joblib` files (3.6 GB!) | `enhanced_models/models/*.joblib`, `closed_picks.archive.jsonl` |
| v5 | Round-3 working-tree audit (tracked files >1 MB) | `tmp/`, `*.bak`, `*.log`, `**/*.db`, parallel_agent `guess_models.pkl`, KIMI `rf_model.pkl`, gatekeeper joblibs, kimi_edge_audit CSVs, baby_strats dupes, `audit_full.html` |
| v6 | Multi-agent + multi-AI audit. Biggest single win: `hindsight/` had 2,304 timestamped JSONs (71 MB) | `alpha_engine/data/hindsight/**/*.json`; live-state JSON patterns (`**/data/active_picks`, `live_*`, `*_signals`, `scan_*`, tournament, elimination_state, ml_weights, winner_history, regime_report, outcome_resolver_log); universal ML extensions (`*.npy`/`*.npz`/`*.h5`/`*.pt`/`*.pth`/`*.onnx`/`*.ckpt`/`*.parquet`/`*.feather`); Python/JS hygiene (`__pycache__`, `.pytest_cache`, etc.) |
| v7 | Independent audit by Roo on paid-mode-fast (ELTONSVLLM_SERVER) | backtest `*.sql` + JSONs, test screenshots/artifacts, `*.bak-<date>` variants, swing model `*.pkl`, audit `quarantine/`, `kimi_attachments_*/`, `database/*/`, `STOCKS/competition` file-level adds |
| v8 | Comprehensive lockdown after 2 deploy regressions caused by bare-directory ignores | Replaced the bare `STOCKS/competition/` with a file-glob plus explicit `!` negations for all 7 deploy files. Catch-all for `**/*.pt`, `.pth`, `.h5`, `.npz`, `.docx`, `.pdf`, `.tar`, `.tgz`, `.7z`, `.sql.gz`, `.sqlite`, `.sqlite3` (one `**/*.<ext>` line per extension, because .gitignore globs do not support `{a,b}` brace expansion). 1,522 drift files mass-evicted via `git ls-files -ci --exclude-standard \| xargs git rm --cached`. |
| v9 | Roo on free-mode-fast (ELTONSVLLM_SERVERFREE): 70 `.pkl` files (53 MB total) still tracked | `ml_crypto_predictor/models/*.pkl`, plus a workflow patch: `train_crypto_models.yml` changed `git add` → `git add -f` so the canonical writer still works |
| v10 | Peer review (5 AI engines): preemptive ML-experiment-tracking patterns | `wandb/`, `mlruns/`, `.dvc/cache/`, `**/checkpoints/`, `**/tb_events/`, `**/runs/events.out.tfevents.*` |
| v11 | Independent Cursor audit on this PC: found 23 stale-tracked files + a top-level `.venv/` gap | `/.venv/` + `**/.venv/`, `alpha_engine/data/{closed_picks_fast,forex_walk_forward,precursor_history,prediction_quality_history,momentum_tracker_picks,theory_portfolios_history,strategy_performance,extensive_backtest_results}.json`, `pick_funnel_*.json`, `trade_logs/*.json`, `skyrocket_detector/*.joblib`, `**/*.backup`, `**/*_OLD.html`, `CHATWITHIT_ARCHIVE_*` |
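The v8 eviction pipes `git ls-files` into `xargs`, which splits on whitespace, so any tracked path containing a space would be broken into separate arguments. The sketch below shows a hardened variant in a throwaway repo. It uses NUL-delimited output (`-z` with `xargs -0`) and assumes GNU `xargs`, whose `-r` flag skips running `git rm` when nothing matches. `git rm --cached` only touches the index, so the evicted files stay on disk. The file names are illustrative.

```shell
#!/usr/bin/env bash
# Sketch of the v8 mass eviction in a throwaway repo, NUL-delimited so
# paths with spaces survive. Assumes GNU xargs (-r).
set -euo pipefail

repo=$(mktemp -d)
cd "$repo"
git init -q
mkdir -p logs
echo keep > README.md
echo drift > "logs/run 1.log"   # space in the name on purpose
git add -A
git -c user.name=demo -c user.email=demo@example.invalid commit -q -m "initial"

# A new ignore rule lands after the file is already tracked.
echo '*.log' > .gitignore
git add .gitignore

# Evict every tracked file that the current rules now ignore (index only;
# the working-tree copy stays on disk).
git ls-files -z --cached --ignored --exclude-standard |
  xargs -0 -r git rm --cached --quiet --

git ls-files
```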
Git docs: "It is not possible to re-include a file if a parent directory of that file is excluded." So if your .gitignore has STOCKS/competition/ (a bare directory exclude), no later !STOCKS/competition/foo.json negation can re-include foo.json. Git stops descending into the directory at the parent-level exclude.
This caused two production-deploy regressions during the session: first `super_signals.json`, then `competition-stocks.json` went missing from fresh CI checkouts. Each time, the FTP put failed with "No such file or directory".
Replace bare directory ignores with file-globs, then add file-level negations:
```gitignore
# DOES NOT WORK: the negation is silently ignored
STOCKS/competition/
!STOCKS/competition/forward_picks.json
```

```gitignore
# WORKS: file-glob + negations
STOCKS/competition/*.json
!STOCKS/competition/competition-crypto.json
!STOCKS/competition/competition-forex.json
!STOCKS/competition/forward_picks.json
# ... etc for every deploy-required file in the dir
```
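The pitfall is easy to reproduce. This sketch builds two throwaway repos that differ only in their .gitignore. It runs `git add -A` in each and lists what got staged under `STOCKS/competition/`. The file names mirror the example above, but the repos are temporary, and `staged_files` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: reproduce the bare-directory negation pitfall in throwaway repos.
# The real repository is never touched.
set -euo pipefail

# staged_files LINE...: build a fresh repo whose .gitignore holds the given
# lines, run `git add -A`, and print what got staged under STOCKS/competition.
staged_files() {
  local repo
  repo=$(mktemp -d)
  git -C "$repo" init -q
  mkdir -p "$repo/STOCKS/competition"
  touch "$repo/STOCKS/competition/forward_picks.json" \
        "$repo/STOCKS/competition/scratch.json"
  printf '%s\n' "$@" > "$repo/.gitignore"
  git -C "$repo" add -A
  git -C "$repo" ls-files -- STOCKS/competition
  rm -rf "$repo"
}

# Bare directory exclude: Git never descends, so the negation is dead.
bare_dir=$(staged_files 'STOCKS/competition/' '!STOCKS/competition/forward_picks.json')

# File glob: the directory is still walked, so the negation re-includes the file.
file_glob=$(staged_files 'STOCKS/competition/*.json' '!STOCKS/competition/forward_picks.json')

echo "bare-dir exclude stages:  ${bare_dir:-<nothing>}"
echo "file-glob exclude stages: ${file_glob:-<nothing>}"
```

Only the file-glob variant stages `forward_picks.json`. `scratch.json` stays ignored in both variants.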
The `git add -f` dance for CI writers

Some workflows are the canonical writer for files that developers should NOT track locally. For example, `.github/workflows/train_crypto_models.yml` retrains and commits ML model weights every day. If a developer runs `git add -A` locally, we don't want their stale copies leaking into the index.

Pattern: gitignore the path AND have the canonical writer use `git add -f` to bypass .gitignore for its own commit. The v9 fix from this session shows the template:

```gitignore
# In .gitignore (blocks local dev adds):
ml_crypto_predictor/models/*.pkl
```

```shell
# In the workflow (canonical writer keeps working):
# Caution: if any listed path does not exist, git add exits with an error,
# and `|| true` hides that failure.
git add -f ml_crypto_predictor/models/ \
  ml_crypto_predictor/production_models/ \
  ml_crypto_predictor/optim/ \
  backtest_results/ updates/data/ || true
```

An audit of the 20+ workflows that run `git add` on gitignored paths found that they already use `-f` where it is needed. `quick-guess-ml.yml`, `now-scanner.yml`, `alpha-trend-catcher.yml` and others all follow the pattern.
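Here is a throwaway-repo sketch of this pattern. A blanket `git add -A` skips the ignored model file, while `git add -f`, which the canonical writer runs, stages it. The sketch also shows one caveat: .gitignore only affects untracked files. Once the workflow has committed a `.pkl`, a developer's local `git add -A` will stage edits to that now-tracked file. The repo layout and identity are illustrative.

```shell
#!/usr/bin/env bash
# Sketch of "gitignore + canonical writer uses git add -f" in a throwaway
# repo; the path mirrors ml_crypto_predictor/models/*.pkl.
set -euo pipefail

repo=$(mktemp -d)
cd "$repo"
git init -q
echo 'ml_crypto_predictor/models/*.pkl' > .gitignore
mkdir -p ml_crypto_predictor/models
echo weights-v1 > ml_crypto_predictor/models/model.pkl

# Developer's blanket add: the ignore rule keeps the model out of the index.
git add -A
dev_staged=$(git ls-files -- ml_crypto_predictor/models)

# Canonical writer (the CI workflow) forces the add for its own commit.
git add -f ml_crypto_predictor/models/
ci_staged=$(git ls-files -- ml_crypto_predictor/models)
git -c user.name=ci -c user.email=ci@example.invalid commit -q -m "retrain models"

# Caveat: ignore rules only apply to untracked paths. Now that model.pkl is
# tracked, a local `git add -A` stages edits to it.
echo stale-local-weights > ml_crypto_predictor/models/model.pkl
git add -A
local_staged=$(git diff --cached --name-only)

printf 'add -A: %s\nadd -f: %s\nafter commit, add -A: %s\n' \
  "${dev_staged:-<nothing>}" "$ci_staged" "$local_staged"
```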
Deploy-required `!negations` (don't break these)

Each of these files is referenced by a deploy workflow (FTP put / scp / cp) and MUST stay tracked. The .gitignore explicitly un-ignores each one with a `!` negation placed after the broader pattern that would otherwise catch it:

| File | Workflow that needs it |
|---|---|
| `STOCKS/competition/competition-{stocks,crypto,forex,meme_coins,penny_stocks,slim}.json` | `deploy-competition-to-site.yml`, `algorithm-competition-refresh.yml` |
| `STOCKS/competition/forward_picks.json` | same |
| `cross_aggregation/data/super_signals.json` | `deploy-competition-to-site.yml` |
| `KIMI_RISEOFTHECLAW/data/live_competition.json` | `deploy-riseoftheclaw.yml` |
| `KIMI_RISEOFTHECLAW/data/signal_tracking.json` | same |
| `KIMI_RISEOFTHECLAW/data/live_signals_now.json` | same |
| `KIMI_RISEOFTHECLAW/data/active_picks.json` | same |
| `KIMI_RISEOFTHECLAW/data/closed_picks.json` | same |
| `regime_terminal/data/active_signals.json` | same |
| `crypto_ml_edge/data/active_picks.json` | `deploy-riseoftheclaw.yml`, `deploy-pages.yml` |
| `crypto_signal_engine/data/active_picks.json` | same |
| `quan_engine/data/active_signals.json` | same |
| `data/ai_tournament/picks_latest.json` | `ai-tournament-pipeline.yml`; audit_dashboard reads this |
| `audit_dashboard/data/ai_tournament_picks_latest.json` | audit_dashboard reads this |
| `alpha_engine/db/schema.sql` | Defensive negation against a `**/*.sql` catch-all. The file is `.sql`, not `.sqlite`, so the v8 catch-all does not match it and the negation is harmless redundancy |
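Both regressions came from deploy files silently dropping out of the index, so a pre-deploy guard can check each required path mechanically. The sketch below is hypothetical; `check_deploy_files` is not an existing script in the repo. It flags a path that is untracked. It also flags a path that is still tracked but matched by an ignore rule, because `git check-ignore --no-index` evaluates the rules regardless of the index, and a later `git ls-files -ci --exclude-standard` sweep would evict such a file. The demo repo and its two files are illustrative.

```shell
#!/usr/bin/env bash
# Hypothetical pre-deploy guard, demoed in a throwaway repo.
set -euo pipefail

# check_deploy_files PATH...: print ok/FAIL per path; return the failure count.
check_deploy_files() {
  local failures=0 path
  for path in "$@"; do
    if ! git ls-files --error-unmatch -- "$path" >/dev/null 2>&1; then
      echo "FAIL untracked: $path"
      failures=$((failures + 1))
    elif git check-ignore -q --no-index "$path"; then
      # -v names the .gitignore line that matched, which points at the fix.
      echo "FAIL ignored:   $path ($(git check-ignore -v --no-index "$path"))"
      failures=$((failures + 1))
    else
      echo "ok              $path"
    fi
  done
  return "$failures"
}

repo=$(mktemp -d)
cd "$repo"
git init -q
mkdir -p STOCKS/competition cross_aggregation/data
echo '{}' > STOCKS/competition/forward_picks.json
echo '{}' > cross_aggregation/data/super_signals.json
printf '%s\n' 'STOCKS/competition/*.json' '!STOCKS/competition/forward_picks.json' > .gitignore
git add -A

# Later, a broad pattern lands without a matching negation.
echo 'cross_aggregation/data/*.json' >> .gitignore

rc=0
report=$(check_deploy_files STOCKS/competition/forward_picks.json \
  cross_aggregation/data/super_signals.json) || rc=$?
echo "$report"
echo "failures: $rc"
```

In a workflow, a guard like this could run before the FTP step, using the real path list from the table above. It would fail the job whenever the return value is non-zero.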
Diagnostic flow when a deploy workflow fails with "No such file or directory":

1. Run `git check-ignore -v <file>`. It prints the matching .gitignore line if the file is being blocked.
2. Add a `!` negation for the file right after the matching pattern in .gitignore. If that pattern is a bare directory exclude, switch it to a file-glob first, or the negation has no effect.
3. `git add -f <file>`, commit, push.

| Status | Item |
|---|---|
| 🟡 deferred | Second history scrub for ~3 GB of cumulative historical blob weight (flagged by Copilot). The bleed is stopped, so the existing weight just sits; `.git` may shrink on its own after GitHub's gc cycle runs (~1-2 days). |
| 🟡 deferred | Bluesmind paid Claude/Kimi models (these require the Anthropic-native `/v1/messages` format, not the OpenAI-compatible one) |
| 🟢 auto-recovers | HuggingFace monthly quota (first of next month), Cloudflare daily quota (UTC midnight), OpenAI quota (top-up) |
| 🔴 needs you | Re-clone every other PC before the next `git pull`; the recovery doc and validator are the source of truth |
| 🔴 needs you | Rotate the xAI key at console.x.ai (the current direct key is invalid; Bluesmind-routed Grok still works) |