Goal Identity, Deduplication, and Preview/Commit
Canonical reference for:
- checkpoint-scoped goal identity in logs and MCTS
- semantic cycle-breaking via goal signatures
- preview/commit tactic application in the Lean adapter
Goal Identity in This Repo
Lean metavariable IDs (mvar_id) are stable only within a given proof state. With checkpoint rollback, raw IDs can be reused.
To prevent collisions, wonton-soup logs and tree nodes use checkpoint-scoped IDs:
cp<checkpoint_id>:<lean_mvar_id>
Example:
cp12:_uniq.100099.72
Implications:
MCTSTree.nodes_by_mvarkeys by checkpoint-scoped goal ID.- adapter state maps (
LeanAdapter.states) key by checkpoint-scoped goal ID. - history/graph/assembly artifacts record checkpoint-scoped IDs in
mvar_idfields.
Why Identity Alone Is Not Enough
Checkpoint-scoped IDs prevent collisions, but they do not merge semantically equivalent goals. Equivalent goals reached by different rewrites still produce distinct IDs.
That caused an observed failure mode: repeated rewrites cycling equivalent goals indefinitely (e.g., mul_add_distrib repeatedly rewriting commutative forms).
Goal Signatures for Cycle Breaking
Search uses an AST-based coarse goal signature for branch-local cycle breaking.
Input surface:
- goal type AST
- hypothesis ASTs
Normalization intentionally removes distinctions that are noisy for cycle detection:
- binder/fvar/mvar identity
- hypothesis ordering
- commutative surface forms for selected connectives
Tradeoff:
- prevents infinite equivalent-goal cycles
- may occasionally over-merge distinct states
A strict signature variant is also recorded for analysis payloads (goal_sig_strict) to preserve full structural distinctions for diagnostics.
Preview/Commit Tactic Application
The adapter API (prover/adapters/lean.py) separates speculative execution from state mutation.
preview_tactic(mvar_id, tactic) -> TacticPreview | None- rolls back to the parent checkpoint
- applies tactic
- returns child goals/terms/checkpoint info without mutating adapter maps
commit_tactic(preview) -> list[str]- mutates adapter bookkeeping (state map, partial term, assembly trace)
- creates child states and returns child checkpoint-scoped IDs
This split allows MCTS to test multiple tactics before committing to one.
MCTS Expansion Contract
At expansion time (prover/mcts.py), tactics are evaluated in order:
- preview tactic
- compute goal signatures for child goals
- discard children whose signatures already appear on the current root-to-node path or earlier siblings from the same preview
- commit only the first tactic that yields novel children
- mark node dead only when all tactics are exhausted
This avoids the earlier failure where the first duplicate-producing tactic prematurely killed a node, while also avoiding global pruning based on a deliberately lossy equivalence class.
Dead-Node Cascading
When no tactic can yield novel children on the current branch:
- node is marked dead
- dead children are skipped by selection
- parents become dead when all descendants are dead
- search stops when root is dead
This ensures iteration budget is not spent on branch-local cycles. Coarse signatures are not used to prove that the same semantic shape is globally dead across unrelated branches.
Behavioral Impact
- Search distribution: tree semantics preserved (no state-graph merge).
- Stability: collisions from rollback reuse are eliminated.
- Metrics/logging: node counts may shift upward because reused raw IDs no longer collapse.
- Comparability: goal-ID scheme is explicit (
checkpoint) in run metadata.
Why Not Full State-Graph Merging?
Alternatives considered:
- path-specific opaque IDs only
- full state-graph merge by goal signature
- hybrid shared statistics by signature
Current design keeps tree semantics while adding bounded semantic deduplication for cycle safety on the active branch. This minimizes search-policy drift while fixing concrete failure modes without making the search incomplete through global coarse-signature pruning.
Relation to Proof Artifacts
Goal identity and dedup semantics directly influence:
*_history.jsonselected paths and attempts*_graph.jsonnode identity*_assembly.json.gzstep target IDs- GED and trajectory-derived analyses that consume those artifacts
Implementation Surface
| File | Responsibility |
|---|---|
prover/adapters/lean.py |
checkpoint-scoped state IDs, preview/commit, reconstruction |
prover/mcts.py |
signature-aware expansion, dead-node handling, tree bookkeeping |
prover/goal_signature.py |
goal signature computation schemes |
prover/goal_cache.py |
signature mapping and per-signature outcome aggregation |
| Log File Schemas | artifact fields affected by goal-ID scheme |
| ADR: Checkpoint-Scoped Goal IDs | design decision record |