Follow-Up Run Program (March 2026)
This run program is the operational answer to the follow-up thesis constraints in site/blog/wonton-soup-follow-up/index.md:
- broaden theorem coverage to hundreds of shared items,
- pair centralized and distributed MCTS on matched slices,
- expand basin coverage (wide and deep),
- add external-backend pilots,
- score with explicit locked references.
Entry Point
Use:
cd dossiers/wonton-soup
scripts/followup_run_program.shThe script defaults to dry-run and prints every command. Run with --execute to actually launch.
For deepseek provider on an external artifacts volume, set:
DEEPSEEK_ARTIFACT_ROOT=/path/to/specter-archiveThis root must contain: wonton-soup/models/ntp-mathlib-deepseek-1.3b-mlx-bf16.
Cohort Contract
- Program namespace:
PROGRAM_ID(default:followup-2026-03) - All run ids are nested under
PROGRAM_ID/.... - Lake/postprocess/K commands are scoped to one logs root:
resolve_logs_root()/<PROGRAM_ID> - Run-id date prefix is frozen at program start, so cohorts do not split at UTC date rollover.
- Use one pinned Lean corpus ref (
LEAN_CORPUS_REF) across phases. Prefer a pinned artifact ref such aslean:<corpus_id>@<build_id>#feasible.
Phases
scripts/followup_run_program.sh --phase p1 --execute
p1: shared Lean panel (centralized MCTS), provider parity (reprover/heuristic/deepseek)p2: paired centralized vs distributed MCTS on matched theorem slicep3: basin wide (--seeds 10 --blind) on the broad panelp4: basin deep (--seeds 50 --blind) on a focused subsetp5: external backend pilots (e,vampire,z3, optionalcoq stdlib)p6:postprocess+lake reconcilescoped to this cohort logs rootp7: locked-reference K jobs via lake job presets
Run all phases in order:
scripts/followup_run_program.sh --executeK Calibration Presets
Phase p7 executes:
analysis/lake/presets/73_followup_k_ref_reprover_v1.jsonanalysis/lake/presets/74_followup_k_ref_deepseek_v1.jsonanalysis/lake/presets/75_followup_k_ref_heuristic_v1.jsonanalysis/lake/presets/76_followup_k_ref_pooled_v1.json
Each preset:
- selects completed Lean runs from this cohort logs root,
- builds an explicit goal-outcomes reference (provider-specific or pooled),
- scores K against that reference,
- exports deterministic JSONL datasets for row-level and aggregate analysis.
External Backend Inputs
p5 is opt-in (ENABLE_EXTERNAL_PILOTS=1) and uses:
TPTP_ROOTforeandvampireSMTLIB_ROOTforz3ENABLE_COQ_STDLIB=1for Coq stdlib pilot
Unset inputs are skipped explicitly (fail-loud behavior by phase output).