Verification Matrix

PR-01 freezes the current Wonton Soup surface before structural refactors. This is not a feature spec; it is a compact map of what exists today, which tests exercise it, and which families should keep representative smoke coverage as the runtime code shrinks.

Command Families

Family	Representative commands	Primary coverage today	Notes
Lean proving workflows	`lean run`, `lean basin`, `lean suite`	`tests/test_cli_args.py`, `prover/tests/test_wonton_cli_wrappers.py`	Core orchestration surface; wrapper and help smoke should stay intact through refactors.
External backends	`run`, `e`, `vampire`, `z3`, `coq`	`prover/tests/test_wonton_cli_wrappers.py`, `prover/tests/test_wonton_validate_backend_args.py`	Mostly wrapper-level today; keep one representative CLI smoke path.
Corpus pipeline	`corpus build-`, `corpus validate`, `corpus list`, `corpus sweep-`	`corpus/tests/test_build_1000_plus.py`, `corpus/tests/test_selection.py`, `corpus/tests/test_artifacts_list.py`	Data-heavy builders do not need full end-to-end goldens; representative help and selection checks are enough.
Surface preservation	`spctr surface status`, `spctr surface refresh`, `spctr surface checkpoint`	`ops/spctr/tests/surface.rs`, `prover/tests/test_runtime_paths.py`	Agents should use the manifest-declared `wonton-lake` surface for cross-project preservation.
Lake indexing and exports	`lake reconcile`, `lake export-parquet`, `lake score-k`	`analysis/tests/test_lake_smoke.py`, `analysis/tests/test_export_parquet.py`, `analysis/tests/test_k_reference_score.py`	Best current synthetic end-to-end surface.
Lake references and jobs	`lake reference build-outcomes`, `lake job run`, `lake job presets`	`analysis/tests/test_lake_job.py`, `analysis/tests/test_lake_job_config.py`	Good target for typed-schema refactors; keep job config compatibility visible.
Analysis and reporting	`postprocess`, `verify-run-local`, `analyze`, `analysis`, `inspect-proof-ir`	`analysis/tests/test_verify_run_local.py`, `analysis/tests/test_inspect_proof_ir.py`, `analysis/tests/test_export_dashboard.py`	Shared payload logic should keep these outputs aligned.
Comparison and benchmarks	`compare`, `compare-cross-assistant`, `benchmark-cross-assistant`	`analysis/tests/test_cross_assistant_alignment.py`, `analysis/tests/test_cross_assistant_paired_benchmark.py`	High-value read model for future manifest/cohort work.
Meta and operator UX	`watch`, `list`, `setup`	`tests/test_cli_args.py`, `prover/tests/test_lean_setup.py`, `prover/tests/test_wonton_logs_dir.py`	These commands are thin, but they are part of the public operator surface.

Representative Golden Slices

The current suite is strong on wrappers and synthetic lake fixtures. Before larger refactors, the most important behavior slices to preserve are:

Lean run surface: command construction, help rendering, and argument wiring.
Sync surface: one representative status command and one aggregate push/pull command.
Lake surface: reconcile, reference build, and job run over a tiny synthetic log root.
Comparison surface: one cross-assistant benchmark/alignment path over synthetic fixtures.

Anything deeper than that should pay for itself with deletion, not with more scaffolding.