Follow-Up Run Program (March 2026)
Provides the operational answer to the follow-up thesis constraints in site/blog/wonton-soup-follow-up/index.md:
- broaden theorem coverage to hundreds of shared items,
- pair centralized and distributed MCTS on matched slices,
- expand basin coverage (wide and deep),
- add external-backend pilots,
- score with explicit locked references.
The intended read of the phases follows the main dossier framing. Centralized MCTS establishes the structural baseline for the shared theorem slice. Distributed MCTS then tests whether coordination changes access to that same landscape. The program is not organized around two unrelated search systems; it is organized around one proof-space model with two control regimes.
Entry Point
Use:
cd dossiers/wonton-soup
scripts/followup_run_program.shThe script defaults to dry-run and prints every command. Run with --execute to actually launch.
For deepseek provider on an external artifacts volume, set:
DEEPSEEK_ARTIFACT_ROOT=/path/to/specter-archiveThis root must contain: wonton-soup/models/ntp-mathlib-deepseek-1.3b-mlx-bf16.
Cohort Contract
- Program namespace:
PROGRAM_ID(default:followup-2026-03) - All run ids are nested under
PROGRAM_ID/.... - Lake/postprocess/K commands are scoped to one logs root:
resolve_logs_root()/<PROGRAM_ID> - Run-id date prefix is frozen at program start, so cohorts do not split at UTC date rollover.
- Use one pinned Lean corpus ref (
LEAN_CORPUS_REF) across phases. Prefer a pinned artifact ref such aslean:<corpus_id>@<build_id>#feasible.
Phases
scripts/followup_run_program.sh --phase p1 --execute
p1: shared Lean panel (centralized MCTS), provider parity (reprover/heuristic/deepseek)p2: paired centralized vs distributed MCTS on matched theorem slice to measure coordination effects against thep1baselinep3: basin wide (--seeds 10 --blind) on the broad panelp4: basin deep (--seeds 50 --blind) on a focused subsetp5: external backend pilots (e,vampire,z3, optionalcoq stdlib)p6:postprocess+lake reconcilescoped to this cohort logs rootp7: locked-reference K jobs via lake job presets
Run all phases in order:
scripts/followup_run_program.sh --executeK Calibration Presets
Phase p7 executes:
analysis/lake/presets/73_followup_k_ref_reprover_v1.jsonanalysis/lake/presets/74_followup_k_ref_deepseek_v1.jsonanalysis/lake/presets/75_followup_k_ref_heuristic_v1.jsonanalysis/lake/presets/76_followup_k_ref_pooled_v1.json
Each preset:
- selects completed Lean runs from this cohort logs root,
- builds an explicit goal-outcomes reference (provider-specific or pooled),
- scores K against that reference,
- exports deterministic JSONL datasets for row-level and aggregate analysis.
External Backend Inputs
p5 is opt-in (ENABLE_EXTERNAL_PILOTS=1) and uses:
TPTP_ROOTforeandvampireSMTLIB_ROOTforz3ENABLE_COQ_STDLIB=1for Coq stdlib pilot
Unset inputs are skipped explicitly (fail-loud behavior by phase output).