ADR: Checkpoint-Scoped Goal IDs
Context
Raw Lean metavariable IDs are not globally unique across rollback-heavy search. The search stack rolls REPL state back to prior checkpoints, and Lean can reuse raw mvar_id values after rollback.
Collision risk surfaces:
MCTSTree.nodes_by_mvar- adapter state maps (
LeanAdapter.states) - run artifacts keyed by
mvar_id(history/graph/goal-cache/assembly payloads)
Failure modes:
- hard failures (duplicate-key behavior)
- silent overwrite of logically distinct goal states
Decision
Use checkpoint-scoped goal IDs in all Lean search bookkeeping and logs:
cp<checkpoint_id>:<lean_mvar_id>
Example:
- raw ID at one branch:
_uniq.560 - same raw ID after rollback elsewhere:
_uniq.560 - logged IDs:
cp4:_uniq.560andcp9:_uniq.560(distinct)
Options Considered
- Path-specific opaque IDs
- Checkpoint-scoped IDs (chosen)
- Full state-graph merge by semantic signature
- Hybrid (unique IDs + shared stats)
Chosen option preserves tree semantics while fixing rollback collisions.
Behavioral Impact
- Search policy semantics: unchanged (tree of attempts remains tree-shaped)
- Node cardinality: may increase because raw-ID reuse no longer collapses states
- Structural metrics: absolute counts may shift; post-migration runs remain comparable under one scheme
- Goal-cache accounting: now counts checkpoint-scoped occurrences explicitly
Schema and Migration Impact
Affected artifact fields:
*_history.json:mvar_id,selected_path[],child_mvar_ids[]*_graph.json: node identifiersgoal_cache.json.gz:mvar_to_sigkeys and occurrence IDs*_assembly.json.gz: step-level target IDs and opened/closed lists
Compatibility guidance:
- do not combine raw-ID and checkpoint-ID runs in one metric population without explicit migration logic
- rely on
run_status.json.goal_id_schemeto gate mixed-run analyses
Implementation Notes
- adapter identity generation and state keys:
prover/adapters/lean.py - MCTS node identity handling:
prover/mcts.py - artifact readers should treat
mvar_idas opaque, scheme-tagged identity
Consequences
Pros:
- collision safety under rollback
- explicit scheme visible in run metadata
- reproducible identity semantics for downstream analysis
Cons:
- larger ID strings
- historical comparability requires scheme awareness
References
prover/adapters/lean.py(_state_id,preview_tactic,commit_tactic)prover/mcts.py(MCTSNode.mvar_id,nodes_by_mvar)- Goal Identity, Deduplication, and Preview/Commit
- Log File Schemas