SPECTER Labs
Technical Docs ops

Follow-Up Run Program (March 2026)

This run program is the operational answer to the follow-up thesis constraints in site/blog/wonton-soup-follow-up/index.md:

  • broaden theorem coverage to hundreds of shared items,
  • pair centralized and distributed MCTS on matched slices,
  • expand basin coverage (wide and deep),
  • add external-backend pilots,
  • score with explicit locked references.

Entry Point

Use:

cd dossiers/wonton-soup
scripts/followup_run_program.sh

The script defaults to dry-run and prints every command. Run with --execute to actually launch.

For deepseek provider on an external artifacts volume, set:

DEEPSEEK_ARTIFACT_ROOT=/path/to/specter-archive

This root must contain: wonton-soup/models/ntp-mathlib-deepseek-1.3b-mlx-bf16.

Cohort Contract

  • Program namespace: PROGRAM_ID (default: followup-2026-03)
  • All run ids are nested under PROGRAM_ID/....
  • Lake/postprocess/K commands are scoped to one logs root: resolve_logs_root()/<PROGRAM_ID>
  • Run-id date prefix is frozen at program start, so cohorts do not split at UTC date rollover.
  • Use one pinned Lean corpus ref (LEAN_CORPUS_REF) across phases. Prefer a pinned artifact ref such as lean:<corpus_id>@<build_id>#feasible.

Phases

scripts/followup_run_program.sh --phase p1 --execute

  • p1: shared Lean panel (centralized MCTS), provider parity (reprover/heuristic/deepseek)
  • p2: paired centralized vs distributed MCTS on matched theorem slice
  • p3: basin wide (--seeds 10 --blind) on the broad panel
  • p4: basin deep (--seeds 50 --blind) on a focused subset
  • p5: external backend pilots (e, vampire, z3, optional coq stdlib)
  • p6: postprocess + lake reconcile scoped to this cohort logs root
  • p7: locked-reference K jobs via lake job presets

Run all phases in order:

scripts/followup_run_program.sh --execute

K Calibration Presets

Phase p7 executes:

  • analysis/lake/presets/73_followup_k_ref_reprover_v1.json
  • analysis/lake/presets/74_followup_k_ref_deepseek_v1.json
  • analysis/lake/presets/75_followup_k_ref_heuristic_v1.json
  • analysis/lake/presets/76_followup_k_ref_pooled_v1.json

Each preset:

  • selects completed Lean runs from this cohort logs root,
  • builds an explicit goal-outcomes reference (provider-specific or pooled),
  • scores K against that reference,
  • exports deterministic JSONL datasets for row-level and aggregate analysis.

External Backend Inputs

p5 is opt-in (ENABLE_EXTERNAL_PILOTS=1) and uses:

  • TPTP_ROOT for e and vampire
  • SMTLIB_ROOT for z3
  • ENABLE_COQ_STDLIB=1 for Coq stdlib pilot

Unset inputs are skipped explicitly (fail-loud behavior by phase output).