Distributed MCTS
Active when --mcts-mode distributed. Centralized semantics remain in UCB1 and Blind-Uniform Search.
Distributed mode is not a second proof-space model. It is the collective-control layer over the same proof-space morphology studied in centralized runs. Centralized MCTS establishes the structural landscape; distributed mode tests how coordination, contention, and scheduling change access to that landscape.
The Tree Is an AND/OR Search Over Proof Obligations
Each node in the MCTS tree is a single proof obligation (a Lean metavariable with a goal type). Each edge is a tactic application. The tree has AND/OR semantics:
- OR: at any node, the search needs any one tactic to work.
- AND: if that tactic splits the goal into N subgoals, all N children must be solved.
node: a /\ b (OR: pick any tactic)
|
+-- tactic "constructor"
| +-- child: a (AND: both must be solved)
| +-- child: b
|
+-- tactic "simp"
+-- (closed, no children) terminal success

MCTSNode.children is a dict[str, list[MCTSNode]] keyed by tactic string. An empty list means the tactic closed the goal entirely. This encoding makes the AND/OR structure explicit in the data: each key is an OR-branch, each value list is the AND-set.
MCTSTree.is_solved() walks this structure with iterative DFS: a node is solved if it is terminal, or if at least one tactic edge has all its children solved. extract_winning_tactics() follows the solved path to produce the final tactic sequence.
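The AND/OR recursion can be sketched as follows. This is a minimal stand-in, not the project's implementation: the real MCTSTree.is_solved() uses iterative DFS over MCTSNode objects, and the Node shape here is simplified to just the fields the check needs.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    # Simplified stand-in for MCTSNode: children maps tactic -> AND-set.
    children: dict = field(default_factory=dict)
    terminal: bool = False

def is_solved(node: Node) -> bool:
    # A node is solved if it is terminal, or if at least one tactic edge (OR)
    # has all of its child obligations solved (AND). An empty child list
    # means the tactic closed the goal, so all() over it is vacuously true.
    if node.terminal:
        return True
    return any(
        all(is_solved(child) for child in kids)
        for kids in node.children.values()
    )
```

The recursion mirrors the data encoding directly: `any` over the dict keys is the OR, `all` over each value list is the AND.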
Checkpoint-Based Branching Over a Sequential REPL
The Lean REPL is sequential – one current proof state at a time. Tree search requires branching. The adapter resolves this with checkpoint rollback.
Each committed tactic creates a LeanEnvironmentCheckpoint. When the search wants to try a different tactic at the same node, the adapter rolls back to the parent checkpoint, restoring the REPL to the pre-tactic state.
Checkpoint 1 goal: a + b = b + a
|
+-- apply "ring" --> Checkpoint 2 (closed, terminal)
|
+-- (rollback to CP1)
|
+-- apply "simp" --> Checkpoint 3 (new subgoals)

Tactic evaluation is two-phase:
1. adapter.preview_tactic(mvar_id, tactic) rolls back, applies the tactic, and returns a TacticPreview with child goals and a new checkpoint. It does not mutate adapter bookkeeping.
2. adapter.commit_tactic(preview) writes the state change into the adapter's maps.
This split lets the search test multiple tactics before committing. Preview is speculative; commit is permanent. See Goal Identity, Deduplication, and Preview/Commit for the identity and cycle-breaking contracts around this mechanism.
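A minimal sketch of this contract, with a stub standing in for the real LeanAdapter: only the preview_tactic / commit_tactic names come from the text above, and the stub's internals (an outcomes table, a committed map) are purely illustrative.

```python
class StubAdapter:
    """Stand-in for LeanAdapter, illustrating the preview/commit split."""

    def __init__(self, outcomes):
        self.outcomes = outcomes   # tactic -> child goals, or None = failure
        self.committed = {}        # bookkeeping, written only on commit

    def preview_tactic(self, mvar_id, tactic):
        # Speculative: report child goals without touching bookkeeping.
        children = self.outcomes.get(tactic)
        return None if children is None else (tactic, children)

    def commit_tactic(self, preview):
        # Permanent: record the state change.
        tactic, children = preview
        self.committed[tactic] = children

def expand(adapter, mvar_id, candidates):
    # Try candidates in order; commit the first success, skip failures.
    for tactic in candidates:
        preview = adapter.preview_tactic(mvar_id, tactic)
        if preview is not None:
            adapter.commit_tactic(preview)
            return preview
    return None  # all candidates failed: caller marks the node dead
```

The point of the split is visible in `expand`: any number of previews can be taken speculatively, but only the winning one is written back.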
Node identity ties a goal to a specific checkpoint: mvar_id = "cp{checkpoint_id}:{lean_mvar_id}". This prevents collisions when the same raw Lean mvar ID appears at different points in the search.
Shared State and Concurrency Model
Multiple async agents (typically 4–8) share one Python process, one MCTSTree, and one LeanAdapter:
+---------- shared state ----------+
| MCTSTree tree_lock |
| LeanAdapter expansion_lock |
| TacticProvider inflight set |
| stop_event |
+---------+----+----+----+---------+
| | | |
agent0    agent1    agent2    agent3

Two locks partition the work:
- tree_lock guards tree reads and writes (selection, backpropagation, node creation).
- expansion_lock serializes tactic evaluation against the single Lean REPL.
There is only one REPL subprocess. Agents do not evaluate tactics concurrently. The throughput gain comes from overlapping the fast phases (selection, backpropagation, tactic suggestion) of one agent with the slow phase (Lean tactic evaluation) of another.
Agent Iteration Lifecycle
Each agent runs a loop: reserve, expand, backpropagate, release. One pass through this loop is one iteration.
1. Reserve (under tree_lock)
The agent walks from root to a leaf using UCB-like scoring. At each level, it picks the highest-scoring live child that is not already in the inflight set. If all live children at some level are inflight, the agent backs off with a short sleep and retries.
On success, the selected node is added to inflight and the entire root-to-node path gets its inflight counts incremented (used for virtual loss, see below).
2. Expand (under expansion_lock)
The agent retrieves the goal state from the adapter, asks the tactic provider for scored candidates, and tries them in descending probability order:
- Tactic fails: skip, try the next candidate.
- Tactic succeeds with zero children: goal is closed. Commit, mark node terminal, break.
- Tactic succeeds with N children: check for cycles (see below), commit, create child nodes in the tree, break.
The agent commits the first successful tactic and stops. If all tactics fail, the node is marked dead.
3. Backpropagate (under tree_lock)
Walk from the expanded node to root, incrementing visit_count at every ancestor. If the expansion succeeded, also increment success_count.
After backpropagation, check tree.is_solved(). If the root is solved, set stop_event and all agents exit their loops.
4. Release
Remove the node from inflight, decrement inflight counts along the path. Other agents can now select this node or its neighbors without virtual-loss penalty.
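The four steps above can be sketched as one asyncio coroutine per agent. This is a hedged sketch, not the actual agent loop in experiments/distributed_mcts/core.py: tree_lock, expansion_lock, the inflight set, and stop_event follow the text, while the tree's select/expand/backpropagate helpers are hypothetical names.

```python
import asyncio

async def agent_loop(agent_id, tree, tree_lock, expansion_lock, inflight, stop_event):
    while not stop_event.is_set():
        async with tree_lock:                      # 1. reserve
            node = tree.select(exclude=inflight)   # UCB-like walk to a live leaf
            if node is not None:
                inflight.add(node)
                tree.add_inflight_path(node)       # raise virtual-loss counts
        if node is None:
            await asyncio.sleep(0.01)              # all live children inflight: back off
            continue
        async with expansion_lock:                 # 2. expand (single Lean REPL)
            success = await tree.expand(node)
        async with tree_lock:                      # 3. backpropagate
            tree.backpropagate(node, success)
            if tree.is_solved():
                stop_event.set()                   # root solved: all agents exit
        inflight.discard(node)                     # 4. release
        tree.remove_inflight_path(node)
```

Note that only step 2 holds expansion_lock, which is what lets one agent's selection and backpropagation overlap with another agent's Lean evaluation.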
Timeline
Agent 0: [reserve root] [--- expand: ring(fail), simp(ok) ---] [backprop] [reserve child]...
Agent 1: [reserve: root inflight, retry] [reserve root] [--- expand: omega ---] [backprop]...
Agent 2: [reserve child_1] [--- expand ---] [backprop]...
Agent 3: [reserve child_2] [--- expand ---]...
|
expansion_lock serializes all Lean calls

Virtual Loss Prevents Redundant Selection
Without virtual loss, all agents would select the same most-promising node. Virtual loss applies a temporary pessimistic bias to nodes currently being worked on:
visits_eff = visit_count + inflight_count * virtual_loss
success_eff = success_count - inflight_count * virtual_loss

When inflight_count >= 1 and virtual_loss > 0, the node's UCB score drops sharply, so other agents prefer different nodes.
The penalty applies to the entire selected path, not just the leaf. If agent 0 is working deep in the left subtree, agent 1 is nudged toward the right subtree because every ancestor on agent 0’s path also has elevated inflight counts.
When agent 0 finishes and releases, inflight counts drop back and scores return to their true values.
Effective parent visits for the exploration term:
parent_visit_eff = parent.visit_count + inflight(parent) * virtual_loss

Optional additive score modifiers:
- depth_bias * node.depth
- path_bias when the node lies on the agent's previous selected path
Unvisited and non-inflight nodes keep first-touch priority (+inf).
When virtual_loss = 0, agents can collide on the same node. The expansion_lock serializes them, and the second agent skips tactics already in node.children.
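Putting the formulas together, a sketch of the adjusted score might look like this; the exploration constant c, the UCB variant, and the exact field names on the node objects are assumptions, not the project's implementation.

```python
import math

def score(node, parent, virtual_loss, c=1.4,
          depth_bias=0.0, path_bias=0.0, on_prev_path=False):
    # Effective counts: inflight work is treated as virtual losses.
    visits_eff = node.visit_count + node.inflight_count * virtual_loss
    if visits_eff == 0:
        return math.inf  # first-touch priority: unvisited and not inflight
    success_eff = node.success_count - node.inflight_count * virtual_loss
    parent_eff = parent.visit_count + parent.inflight_count * virtual_loss
    ucb = (success_eff / visits_eff
           + c * math.sqrt(math.log(parent_eff + 1) / visits_eff))
    # Optional additive modifiers from the text.
    return ucb + depth_bias * node.depth + (path_bias if on_prev_path else 0.0)
```

With virtual_loss > 0, an inflight node both inflates its effective visits and deflates its effective successes, so its score drops on both the exploitation and exploration terms at once.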
Cycle Prevention via Goal Signatures
A tactic can transform a goal into a semantically equivalent form (e.g. rw [add_comm] turning a + b into b + a, which a later rewrite could reverse). Without cycle detection, the search enters infinite loops.
Before expanding a node, the search collects goal signatures of all ancestors from root to the current node. Any child whose signature matches an ancestor is skipped.
Goal signatures are 12-hex hashes of the canonicalized goal AST, with commutative normalization applied to selected connectives. A strict variant (goal_sig_strict) without commutative normalization is recorded separately for analysis.
See Goal Identity, Deduplication, and Preview/Commit for signature computation details and the dead-node cascading contract.
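A sketch of the signature scheme, under the assumption that goal ASTs are encoded as nested (head, args...) tuples; the actual AST encoding, hash choice, and the set of connectives that get commutative normalization live in prover/goal_signature.py and are assumptions here.

```python
import hashlib

# Illustrative: which heads are treated as commutative is an assumption.
COMMUTATIVE = {"And", "Or", "HAdd.hAdd", "HMul.hMul"}

def canonicalize(ast):
    # ast is a nested tuple (head, child, ...); leaves are strings.
    if isinstance(ast, str):
        return ast
    head, *args = ast
    args = [canonicalize(a) for a in args]
    if head in COMMUTATIVE:
        args = sorted(args, key=repr)  # order-insensitive under commutative heads
    return (head, *args)

def goal_sig(ast) -> str:
    # 12-hex signature of the canonicalized goal.
    return hashlib.sha256(repr(canonicalize(ast)).encode()).hexdigest()[:12]

def goal_sig_strict(ast) -> str:
    # Strict variant: no commutative normalization.
    return hashlib.sha256(repr(ast).encode()).hexdigest()[:12]
```

Under this scheme a + b and b + a share a signature, which is exactly what lets the ancestor check catch a rewrite cycle that only permutes commutative arguments.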
Intervention Schedules
Blocking, delay, and reroute schedules are research tools for studying how the search degrades and recovers under damage. They do not affect Lean tactic correctness.
Blocking (BlockSchedule):
- probabilistic first-touch blocking by node
- finite-duration and permanent blocks (duration < 0)
- optional immovable subset and unfreeze controls
Delay (DelaySchedule):
- probabilistically delays selected nodes for fixed windows
Reroute:
- when blocked or delayed, selection can retry alternate nodes up to max_attempts
- reroute trail is recorded in trace payloads
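The first-touch blocking rule might look like the following sketch; the class shape, field names, and decision logic are illustrative, not the actual BlockSchedule implementation.

```python
import random

class BlockSketch:
    """Probabilistic first-touch blocking with finite or permanent durations."""

    def __init__(self, fraction, duration, seed):
        self.fraction = fraction    # probability a node is blocked on first touch
        self.duration = duration    # iterations to stay blocked; < 0 = permanent
        self.rng = random.Random(seed)
        self.blocked_until = {}     # node id -> iteration at which the block lifts

    def is_blocked(self, node_id, iteration):
        if node_id not in self.blocked_until:  # first touch: roll the dice once
            if self.rng.random() < self.fraction:
                lift = float("inf") if self.duration < 0 else iteration + self.duration
            else:
                lift = iteration  # never blocked
            self.blocked_until[node_id] = lift
        return iteration < self.blocked_until[node_id]
```

Seeding the RNG makes the damage pattern reproducible across runs, which matters when comparing recovery behavior between configurations.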
CLI Contract
| Flag(s) | Requirement | Validation |
|---|---|---|
| --mcts-agents, --mcts-inflight | required in distributed mode | both must be provided |
| distributed options in centralized mode | forbidden | parser error |
| --mcts-block-fraction | enables block schedule | must be in (0,1) |
| --mcts-block-duration with block fraction | required | must be non-zero |
| --mcts-block-seed with block fraction | required | must be provided |
| --mcts-block-immovable-fraction | optional with block schedule | must be in [0,1]; requires positive duration |
| --mcts-unfreeze-after | optional with block schedule | must be >= 1 |
| --mcts-unfreeze-prob | optional with block schedule | must be in (0,1] |
| --mcts-reroute-blocked + --mcts-reroute-max | paired | max required when reroute enabled; max >= 1 |
| delay triplet (--mcts-delay-prob, --mcts-delay-duration, --mcts-delay-seed) | all-or-none | prob in (0,1), duration >= 1, seed >= 0 |
| --mcts-virtual-loss | optional | >= 0 |
| --mcts-depth-bias, --mcts-path-bias | optional | each >= 0 |
| --mcts-history-cache | optional | boolean switch |
Cross-mode constraints:
- --basin-blind requires --basin-seeds.
- --basin-blind currently requires centralized mode.
- Multi-provider runs do not support --basin-seeds.
Provider and Ranker Compatibility
Distributed mode uses the same provider stack as centralized mode. Provider output is probability-sorted before ranker application. The ranker is reorder-only and must preserve count and tactic multiset. Violations fail fast.
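The reorder-only contract amounts to a count and multiset check on the candidate list. The check_ranker guard below is a hypothetical fail-fast validator illustrating the rule, not the project's actual one.

```python
from collections import Counter

def check_ranker(before: list[str], after: list[str]) -> list[str]:
    # A ranker may permute candidates, but must preserve the count and
    # the tactic multiset; anything else fails fast.
    if len(after) != len(before) or Counter(after) != Counter(before):
        raise ValueError("ranker must be reorder-only (count or multiset changed)")
    return after
```

Comparing Counters rather than sorted lists keeps the check O(n) and makes the failure mode (dropped, duplicated, or invented tactics) explicit.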
Provenance
Run metadata captures distributed settings (distributed_mcts) and policy choices. Trace payloads capture per-iteration distributed context: agent_id, inflight state, and block/delay/reroute snapshots. These fields allow post-hoc reconstruction without relying on CLI history.
Implementation Surface
| File | Responsibility |
|---|---|
| experiments/distributed_mcts/core.py | distributed_mcts_search, agent loop, virtual loss, intervention schedules |
| prover/mcts.py | MCTSNode, MCTSTree, AND/OR solving, centralized baseline |
| prover/adapters/lean.py | LeanAdapter, checkpoint rollback, preview/commit |
| prover/goal_signature.py | goal signature hashing with commutative normalization |
| prover/providers/base.py | TacticProvider, GoalAwareTacticProvider |
| orchestrator/lean.py | run_corpus, run_theorem, distributed config wiring |