Verification Matrix

PR-01 freezes the current Wonton Soup surface before structural refactors. This is not a feature spec; it is a compact map of what exists today, which tests exercise it, and which families should keep representative smoke coverage as the runtime code shrinks.

Command Families

FamilyRepresentative commandsPrimary coverage todayNotes
Lean proving workflowslean run, lean basin, lean suitetests/test_cli_args.py, prover/tests/test_wonton_cli_wrappers.pyCore orchestration surface; wrapper and help smoke should stay intact through refactors.
External backendsrun, e, vampire, z3, coqprover/tests/test_wonton_cli_wrappers.py, prover/tests/test_wonton_validate_backend_args.pyMostly wrapper-level today; keep one representative CLI smoke path.
Corpus pipelinecorpus build-*, corpus validate, corpus list, corpus sweep-*corpus/tests/test_build_1000_plus.py, corpus/tests/test_selection.py, corpus/tests/test_artifacts_list.pyData-heavy builders do not need full end-to-end goldens; representative help and selection checks are enough.
Surface preservationspctr surface status, spctr surface refresh, spctr surface checkpointops/spctr/tests/surface.rs, prover/tests/test_runtime_paths.pyAgents should use the manifest-declared wonton-lake surface for cross-project preservation.
Lake indexing and exportslake reconcile, lake export-parquet, lake score-kanalysis/tests/test_lake_smoke.py, analysis/tests/test_export_parquet.py, analysis/tests/test_k_reference_score.pyBest current synthetic end-to-end surface.
Lake references and jobslake reference build-outcomes, lake job run, lake job presetsanalysis/tests/test_lake_job.py, analysis/tests/test_lake_job_config.pyGood target for typed-schema refactors; keep job config compatibility visible.
Analysis and reportingpostprocess, verify-run-local, analyze, analysis, inspect-proof-iranalysis/tests/test_verify_run_local.py, analysis/tests/test_inspect_proof_ir.py, analysis/tests/test_export_dashboard.pyShared payload logic should keep these outputs aligned.
Comparison and benchmarkscompare, compare-cross-assistant, benchmark-cross-assistantanalysis/tests/test_cross_assistant_alignment.py, analysis/tests/test_cross_assistant_paired_benchmark.pyHigh-value read model for future manifest/cohort work.
Meta and operator UXwatch, list, setuptests/test_cli_args.py, prover/tests/test_lean_setup.py, prover/tests/test_wonton_logs_dir.pyThese commands are thin, but they are part of the public operator surface.

Representative Golden Slices

The current suite is strong on wrappers and synthetic lake fixtures. Before larger refactors, the most important behavior slices to preserve are:

  1. Lean run surface: command construction, help rendering, and argument wiring.
  2. Sync surface: one representative status command and one aggregate push/pull command.
  3. Lake surface: reconcile, reference build, and job run over a tiny synthetic log root.
  4. Comparison surface: one cross-assistant benchmark/alignment path over synthetic fixtures.

Anything deeper than that should pay for itself with deletion, not with more scaffolding.