# Git history rewrite — 2026-05-25 18:00 UTC — Recovery guide for every clone

**Short version:** the entire git history of `main` and `gh-pages` was rewritten on 2026-05-25 ~18:00 UTC. Every commit on those branches now has a new SHA. **If you have a clone of this repo on any other machine — including CI runners, dev VMs, your laptop — you MUST follow this recovery guide before your next `git pull`, or your local copy will get into an inconsistent state.**

## TL;DR

Pick the recovery option that matches your situation:

| Situation | Recovery |
|---|---|
| Clean clone, no local changes you need | **Option A (re-clone)** — fastest, safest |
| Have uncommitted local changes | **Option B (preserve + reset)** |
| Have local commits not yet pushed | **Option C (cherry-pick onto new main)** |
| You're a CI runner / fresh worktree | Just delete the cached clone — next workflow run will re-clone |

---

## What changed and why

| Metric | Before | After | Δ |
|---|---|---|---|
| `.git/` size | 2.3 GB | **327 MB** | −85% |
| Working tree | — | unchanged | — |
| Commit SHAs on `main` | original | **all rewritten** | 100% changed |
| Commit SHAs on `gh-pages` | original | **all rewritten** | 100% changed |
| Code content | — | identical | nothing lost |
| Other branches (PRs, feature) | original | unchanged | not rewritten |

**The rewrite removed these large/regenerable files from history** (none of the historical copies is needed to reproduce the current working tree, and the paths are now gitignored so they won't come back):

- `ml_crypto_predictor/production_models/*.pkl` (~316 MB / 14 files, retrained periodically)
- `ml_crypto_predictor/enhanced_models/models/*.joblib` (~3.6 GB / 1,746 files, retrained periodically)
- `alpha_engine/data/closed_picks.json` + `closed_picks_enriched.json` + `closed_picks.archive.jsonl` (~60 MB combined, regenerated by pipeline)
- `audit/data/dashboard_data.json`, `audit_dashboard/data/dashboard_data.json`, `audit_trail/data/dashboard_payload.json` (regenerated hourly by `audit-dashboard.yml`)
- `events.json`, `next/events.json` (~21 MB each, scraped feeds)
- `tools/data/snapshots/dashboard_data_*.json` (archived snapshots)
- `tools/data/audit_edge_review_live.json` (~20 MB)
- `edge_analysis_results.json`, `data/live_picks.db`, `tmp_live5.json`, `actionlint_dir/actionlint.exe`, `Kimi_Agent_*.zip`

**These are still on disk in production / your working tree** — only their *historical* committed copies are gone. They're regenerated by the existing CI workflows and scrapers.

**Why**: `.git/` had grown to 2.3 GB (was 19 GB before the prior 2026-05-23 stripped reset) because regenerable binary files (ML model weights, JSON snapshots) were being committed every time the pipeline retrained / re-scraped. New `.gitignore` entries stop the bleed; the history rewrite recovered the 2 GB of accumulated bloat. The repo is now ~7× smaller to clone.
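The percentages quoted above follow from the two sizes in the table. A quick check, assuming 2.3 GB is read as 2300 MB (decimal units are an assumption here, not something the measurements state):

```bash
# Sanity-check the quoted reduction from the before/after .git sizes.
before_mb=2300   # 2.3 GB taken as 2300 MB (assumption: decimal units)
after_mb=327

summary=$(awk -v b="$before_mb" -v a="$after_mb" \
  'BEGIN { printf "saved %d MB, reduction %.1f%%, ratio %.1fx", b - a, (b - a) / b * 100, b / a }')
echo "$summary"
```

The result lines up with the "2 GB recovered", "−85%" and "~7×" figures once rounding is taken into account.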

---

## Option A — Re-clone (recommended for most cases)

Fastest and lowest-risk if you don't have uncommitted local work.

```bash
# 1. Move your existing clone aside (don't delete yet — recovery insurance)
cd ~                     # or wherever the parent of your repo is
mv findtorontoevents_antigravity.ca findtorontoevents_antigravity.ca.OLD-pre-2026-05-25-scrub

# 2. Fresh clone
git clone git@github.com:eltonaguiar/findtorontoevents_antigravity.ca.git
cd findtorontoevents_antigravity.ca

# 3. Verify you got the new history
git log -1 --format='%h %s'
# expected: a recent commit from origin/main with the rewritten SHA

# 4. Once you're sure nothing was lost from the OLD copy, delete it
# (recommended: wait 24h before this step — you may discover untracked files
# that didn't come over)
# rm -rf ~/findtorontoevents_antigravity.ca.OLD-pre-2026-05-25-scrub
```

**Disk saved**: a fresh clone will be **~327 MB** once GitHub's server-side gc has run. **For the first 24-48 hours after the rewrite, clones will still be ~2 GB** because GitHub keeps both old and new packfiles until its next internal maintenance cycle. The full ~85% shrink then shows up automatically. If you don't want to wait, run `git gc --prune=now` after cloning to compact your local copy immediately.

---

## Option B — Reset existing clone (preserve local untracked + uncommitted changes)

Use this if you have working-tree changes or untracked files you don't want to lose.

```bash
cd /path/to/your/findtorontoevents_antigravity.ca

# 1. Save any uncommitted work (-u also stashes untracked files)
git stash push -u -m "pre-scrub-recovery $(date +%F)"

# 2. Back up the old .git, then fetch the rewritten history in place.
#    Do NOT move .git away: the stash from step 1 is stored inside it.
cp -a .git ".git-pre-scrub-$(date +%s)"    # safety copy; needs ~2 GB of free disk
git fetch origin                           # force-updates origin/main and origin/gh-pages

# 3. Point your local main at the rewritten history
git checkout main
git reset --hard origin/main               # tracked files now match the new main

# 4. Restore stashed work if you had any
git stash pop                              # may have conflicts; resolve normally

# 5. Verify your work is intact
git log --oneline -5
git status

# Note: the old objects stay in .git until their reflog entries expire.
# If you need the small .git right away, use Option A instead.
```

If any step here misbehaves, **Option A is genuinely easier**: just move the directory aside and clone fresh.

---

## Option C — You had local commits not yet pushed

If you had commits on a feature branch or on `main` that hadn't been pushed:

```bash
# 1. BEFORE fetching: save the SHAs of your unpushed commits. Your local origin/main must
#    still point at the old history here (the old parent commits no longer exist on origin).
git log --oneline origin/main..HEAD > /tmp/my-unpushed-commits.txt
cat /tmp/my-unpushed-commits.txt

# 2. Save patches of each commit
git format-patch origin/main --output-directory /tmp/my-patches/

# 3. Follow Option A or Option B to get the rewritten history

# 4. Re-apply your commits onto the new main
git checkout -b my-recovery-branch origin/main
git am /tmp/my-patches/*.patch

# 5. Push your branch
git push -u origin my-recovery-branch

# 6. Open a normal PR against new main, merge as usual
```

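If your local commits touched files that were scrubbed from history (model weights, dashboard JSON, etc.), `git am` will not skip those changes by itself: a patch that modifies a file that is no longer tracked stops with an apply error. Drop those paths when applying, for example `git am --exclude='ml_crypto_predictor/production_models/*' /tmp/my-patches/*.patch`. They're gitignored now and shouldn't be in commits anyway.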
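To see how a patch that touches a no-longer-tracked path can be applied without that path, here is a minimal sketch in throwaway repos. Everything in it (`old`, `new`, `app.txt`, `models/m.pkl`) is hypothetical and only stands in for your clone before and after the rewrite. It relies on `git am --exclude=<path>`, which is passed through to `git apply`.

```bash
#!/usr/bin/env bash
set -euo pipefail
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid
work=$(mktemp -d)

# "Old" clone: the base commit tracks both code and a model file.
git init -q "$work/old"
cd "$work/old"
mkdir models
echo v1 > app.txt
echo weights-v1 > models/m.pkl
git add . && git commit -q -m "base"

# An unpushed commit that touches both files, exported as a patch.
echo v2 > app.txt
echo weights-v2 > models/m.pkl
git commit -q -am "feature: update app and model"
git format-patch -q -1 -o "$work/patches"

# "New" clone: same code, but the model file was never tracked.
git init -q "$work/new"
cd "$work/new"
echo v1 > app.txt
git add . && git commit -q -m "base (rewritten)"

# Apply the patch while leaving out everything under models/.
git am -q --exclude='models/*' "$work"/patches/*.patch
git log --format='%s' -1
git ls-files
```

In this demo only the `app.txt` change lands in the recovered commit; the model file change is left out of it.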

---

## CI / GitHub Actions runners

Workflow runs that started **before 18:00 UTC 2026-05-25** may have cached `actions/checkout` state with old SHAs. New runs after that time will fresh-clone from origin and pick up the new history automatically. No action needed except:

- If you see a CI run with `git fetch` failures like `error: missing blob abc123`, cancel + retry — the next run will start fresh.
- If a workflow uses a fixed-SHA reference (e.g. `actions/checkout@<sha>`) those still work — only OUR repo's SHAs changed, not GitHub's action SHAs.

---

## Open PRs

Any PR that branched off `main` before 18:00 UTC 2026-05-25 has stale base SHAs. To recover a PR:

```bash
# In your PR branch
git fetch origin
git rebase --onto origin/main <old-main-merge-base>
# Resolve any conflicts (likely none — the rewrite only removed gitignored files)
git push --force-with-lease
```

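GitHub may offer an "Update branch" button on the PR page, but it is not a substitute for the rebase above: by default it merges the new `main` into your branch, which keeps the pre-rewrite commits in the branch's history, and it may not work at all if the branch no longer shares commits with `main`. Prefer the manual rebase.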
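Finding `<old-main-merge-base>` is the least obvious part of that rebase. One way to get it, sketched below, is to record the old `origin/main` before fetching and compute the merge base against it. The throwaway repos, file names and the `my-pr` branch are hypothetical and exist only to make the demo self-contained.

```bash
#!/usr/bin/env bash
set -euo pipefail
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid
work=$(mktemp -d)

# Hypothetical origin: one commit on main that tracks a large file.
git init -q "$work/origin"
cd "$work/origin"
git symbolic-ref HEAD refs/heads/main
echo v1 > app.txt
echo bloat > big.bin
git add . && git commit -q -m "base"

# A clone with a PR branch carrying one commit.
git clone -q "$work/origin" "$work/clone"
cd "$work/clone"
git checkout -q -b my-pr
echo feature > feature.txt
git add feature.txt && git commit -q -m "feature work"

# Simulate the rewrite on origin: same message, big.bin untracked, new SHA.
cd "$work/origin"
git rm -q --cached big.bin
git commit -q --amend -m "base"

# Recovery in the clone.
cd "$work/clone"
old_main=$(git rev-parse origin/main)        # record BEFORE fetching
git fetch -q origin                          # origin/main now points at the rewritten commit
old_base=$(git merge-base HEAD "$old_main")  # where my-pr forked from the old main
git rebase -q --onto origin/main "$old_base"
git log --format='%h %s'
```

If you have already fetched, the previous value may still be visible in `git reflog show origin/main`, though that depends on your clone having reflog entries for the remote-tracking branch.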

---

## Verification — how to confirm you're on the new history

### Quick one-shot validate script

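Save this as a script and run it from the repo root (avoid pasting it into an interactive shell, where the `exit` calls would close your session). It runs 8 checks and prints PASS / FAIL per check plus an overall verdict. **All 8 must PASS.**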

```bash
#!/usr/bin/env bash
# Save as /tmp/validate-rewrite.sh, chmod +x, run inside the repo root.
cd "$(git rev-parse --show-toplevel 2>/dev/null)" || { echo "FAIL: not in a git repo"; exit 1; }

pass=0; fail=0
ok()   { echo "  ✓ PASS — $1"; pass=$((pass+1)); }
bad()  { echo "  ✗ FAIL — $1   (expected: $2)"; fail=$((fail+1)); }

echo "=== Git history rewrite recovery validator (2026-05-25) ==="
echo

# 1) Recovery doc is present (proves you fetched after the rewrite)
test -f docs/GIT_HISTORY_REWRITE_2026-05-25.md \
  && ok "recovery doc present" \
  || bad "recovery doc missing" "docs/GIT_HISTORY_REWRITE_2026-05-25.md should exist on main"

# 2) Local HEAD matches origin (you've actually fetched the new history)
git fetch origin main --quiet 2>/dev/null
LOCAL=$(git rev-parse HEAD 2>/dev/null)
REMOTE=$(git rev-parse origin/main 2>/dev/null)
[ -n "$LOCAL" ] && [ "$LOCAL" = "$REMOTE" ] \
  && ok "HEAD matches origin/main ($LOCAL)" \
  || bad "HEAD diverges from origin/main" "local=$LOCAL  remote=$REMOTE — run git pull or git reset --hard origin/main"

# 3) No missing-blob corruption
fsck_out=$(git fsck --no-progress 2>&1 | grep -iE 'missing blob|missing tree|missing commit' | head -3)
[ -z "$fsck_out" ] \
  && ok "fsck clean (no missing blobs/trees)" \
  || bad "fsck reports missing objects" "$fsck_out"

# 4) Post-rewrite gitignore commits are in lineage
matches=$(git log --oneline | grep -cE 'gitignore v4|gitignore regenerable bloat|gitignore v3')
[ "$matches" -ge 3 ] \
  && ok "post-rewrite gitignore commits present ($matches found, expected ≥3)" \
  || bad "post-rewrite gitignore commits missing" "expected to see commits 'chore: gitignore regenerable bloat', 'gitignore v3', 'gitignore v4'"

# 5) Scrubbed paths NOT tracked
leaked=$(git ls-files \
  | grep -cE '(^|/)ml_crypto_predictor/production_models/.*\.pkl$|(^|/)ml_crypto_predictor/enhanced_models/models/.*\.joblib$|^audit/data/dashboard_data\.json$|^events\.json$|^next/events\.json$|^data/live_picks\.db$|^alpha_engine/data/closed_picks\.json$|^alpha_engine/data/closed_picks_enriched\.json$|^alpha_engine/data/closed_picks\.archive\.jsonl$|^tools/data/audit_edge_review_live\.json$|^tools/data/snapshots/dashboard_data_.*\.json$')
[ "$leaked" -eq 0 ] \
  && ok "no scrubbed bloat paths tracked in index" \
  || bad "$leaked bloat path(s) still tracked" "run: git ls-files | grep production_models — those should not exist"

# 6) Scrubbed paths ARE gitignored
gi_lines=$(grep -cE '^ml_crypto_predictor/production_models/\*\.pkl|^ml_crypto_predictor/enhanced_models/models/\*\.joblib|^audit/data/dashboard_data\.json' .gitignore 2>/dev/null)
[ "${gi_lines:-0}" -ge 3 ] \
  && ok "gitignore covers the scrubbed paths" \
  || bad "gitignore missing scrub-path entries" "expected ≥3 of the bloat patterns in .gitignore — pull origin/main again"

# 7) No unmerged (conflicted) paths left in the index
conflicts=$(git ls-files --unmerged 2>/dev/null | wc -l)
[ "$conflicts" -eq 0 ] \
  && ok "no unresolved merge conflicts" \
  || bad "$conflicts unmerged path(s)" "resolve with git status / git mergetool"

# 8) A `git pull --ff-only` would succeed (proves no history divergence)
git fetch origin main --quiet 2>/dev/null
if git merge-base --is-ancestor HEAD origin/main 2>/dev/null; then
  ok "git pull --ff-only origin main would succeed (no divergence)"
else
  bad "git pull --ff-only origin main would fail" "your local main has commits not in origin OR diverged history — see Option B in this doc"
fi

echo
total=$((pass+fail))
if [ "$fail" -eq 0 ]; then
  echo "RESULT: ✓✓✓ ALL ${total} CHECKS PASSED — recovery complete"
  exit 0
else
  echo "RESULT: ✗ ${fail}/${total} CHECKS FAILED — see fixes above"
  exit 1
fi
```

### Quick manual checks (if you don't want to save a script)

```bash
# Run all four commands; the expected output is in the comment to the right of each
git rev-parse --verify HEAD                                          # any commit SHA
git fsck --no-progress 2>&1 | grep -c 'missing'                      # 0
git ls-files | grep -E '\.pkl$|\.joblib$' | wc -l                    # 0
git log --oneline | grep -cE 'gitignore v[34]'                       # ≥2
```

### Quick remote-side check (without cloning)

You can also verify origin is on the new history without any local repo:

```bash
gh api 'repos/eltonaguiar/findtorontoevents_antigravity.ca/commits?path=docs/GIT_HISTORY_REWRITE_2026-05-25.md&per_page=1' --jq '.[0].sha + " " + .[0].commit.message' 2>/dev/null | head -1
# Expected: a SHA + "docs: recovery guide for the 2026-05-25 git history rewrite"
# If empty: the rewrite hasn't propagated to origin/main yet
```

### Disk size sanity

```bash
du -sh .git
```

- **If you re-cloned today (Option A)**: expect ~2 GB initially (GitHub server still keeps both old + new packs). Run `git gc --prune=now` locally to compact your copy to **~330 MB immediately**, regardless of GitHub's gc state.
- **After GitHub's server-side gc cycle (1-2 days)**: new clones will get ~330 MB directly, no local gc needed.
- **If still 2.3+ GB after a fresh clone + local `git gc --prune=now`**: something went wrong, you're somehow still on pre-rewrite history. Re-do Option A.

---

## What was preserved (no loss)

- All code, configs, scripts, HTML, docs
- All commit messages (just with new SHAs)
- All branches except `main` and `gh-pages` (only those two were rewritten)
- All tags (not affected)
- All issues, PRs, comments, releases on GitHub
- All current copies of the gitignored data files on disk (they just stop being committed)

## What was lost (intentionally)

- All historical versions of the gitignored bloat files (you can no longer `git show` an old version of `production_models/SOL_USDT_production.pkl`, etc.)
- Old commit SHAs (every collaborator needs to update their local refs)

---

## Why this was necessary

Without the rewrite, `.git/` would have continued growing by ~1 GB/day. The 2026-05-23 stripped reset (commit `4aaa6ff84`) caught the original 19 GB problem but only gitignored 3 paths (`audit_dashboard/data/dashboard_data.json`, `audit_trail/data/dashboard_payload.json`, `swarm_runs/*`). Subsequent audits found 1,746 model weights and several large JSON files were still bleeding into history every commit cycle. This rewrite fixes both:

1. **Stop the bleed** (Stages A + A2): expand `.gitignore` + `git rm --cached` to prevent re-commits
2. **Recover accumulated bloat** (Stage B): rewrite history to delete the orphaned blob copies
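The "stop the bleed" stage can be illustrated in a throwaway repo. This is a sketch of the general `.gitignore` plus `git rm --cached` technique, not the exact commands the operator ran; the `models/demo.pkl` path and the commit messages are hypothetical.

```bash
#!/usr/bin/env bash
set -euo pipefail
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid
repo=$(mktemp -d)
cd "$repo"
git init -q

# A regenerable artifact that was committed by mistake.
mkdir -p models
echo weights > models/demo.pkl
git add . && git commit -q -m "model committed by mistake"

# Stop the bleed: ignore the pattern and untrack the file, keeping it on disk.
echo 'models/*.pkl' >> .gitignore
git rm -q --cached models/demo.pkl
git add .gitignore
git commit -q -m "chore: gitignore regenerable bloat (demo)"

git ls-files                          # only .gitignore is tracked now
git check-ignore -v models/demo.pkl   # shows which ignore rule matches
ls -l models/demo.pkl                 # the file itself is still on disk
```

This only prevents new copies from being committed. The copies already in history remain until a history rewrite such as Stage B removes them.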

For full implementation details see commits with messages starting with `chore: gitignore regenerable bloat`, `chore: gitignore v3`, `chore: gitignore v4`, and the merge commits that followed the `git-filter-repo` force-push.

---

## Questions / problems

If you hit something unusual:

- **"missing blob" errors after recovery** — you're still on old `.git` somehow. Use Option A.
- **`git pull` fails with non-fast-forward** — expected. Use Option A or B above, not `git pull`.
- **Your old `.git-pre-scrub-*` backup is huge** — that's the old 2 GB+ `.git`. Delete after 24h once confident nothing's missing.

Operator notes: see commits `cbbe44c6e` (Stage 1 gitignore), `0395e6776` (proxy session work), the Stage A2 gitignore v3/v4 commits, then the filter-repo force-push at ~20:24 UTC. A backup of the pre-scrub `.git` is at `/tmp/git-backup-pre-scrub-1779733679.tar` on the operating machine in case rollback is needed within the first 24 hours.
