Distributed MCTS Semantics
Canonical semantics for distributed (cell-view) MCTS in wonton-soup.
Scope and Enablement
Distributed mode is active when the run is started with:
- `--mcts-mode distributed`
Centralized semantics remain documented in "UCB1 and Blind-Uniform Search in Wonton-Soup".
Execution Model
Distributed mode has multiple logical controllers (agents) sharing one MCTSTree.
Shared primitives:
- `MCTSTree`, `MCTSNode`
- `ExplorationHistory`, `TacticAttempt`
- `MCTSTraceWriter`
Control loop:
- Agent reserves a candidate node/path under a tree lock.
- Reservation marks inflight counts on node/path.
- Expansion uses one adapter (`adapter_mode="single"`), so Lean interaction is serialized.
- Backprop updates tree statistics and decrements inflight counters.
This is logical parallelism over one shared search state, not concurrent Lean execution.
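The reserve, expand, and backprop steps above can be sketched as follows. This is a minimal illustration, not the project's implementation: `NodeSketch`, `SharedTreeSketch`, and the `run_tactic` callback are hypothetical names, and modeling the single adapter as a second lock is an assumption.

```python
import threading
from dataclasses import dataclass


@dataclass
class NodeSketch:
    """Hypothetical stand-in for a tree node (not the real MCTSNode)."""
    name: str
    visit_count: int = 0
    success_count: int = 0
    inflight: int = 0


class SharedTreeSketch:
    """Hypothetical stand-in for the shared tree (not the real MCTSTree)."""

    def __init__(self):
        # Tree lock: guards reservation and backprop bookkeeping.
        self.tree_lock = threading.Lock()
        # Assumption: one adapter means one expansion at a time.
        self.adapter_lock = threading.Lock()

    def reserve(self, path):
        """Mark every node on the reserved path as inflight."""
        with self.tree_lock:
            for node in path:
                node.inflight += 1

    def expand(self, node, run_tactic):
        """Run the (serialized) expansion for a reserved node."""
        with self.adapter_lock:
            return run_tactic(node)

    def backprop(self, path, success):
        """Update statistics and release the reservation."""
        with self.tree_lock:
            for node in path:
                node.visit_count += 1
                node.success_count += int(success)
                node.inflight -= 1
```

Each agent would call `reserve`, `expand`, and `backprop` in that order. Because `expand` holds the adapter lock, several agents can hold reservations at once while only one of them talks to Lean.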
Reservation Score and Virtual Loss
Reservation uses a UCT-like score with optional distributed modifiers:
- base exploitation/exploration from node stats (or `AND_MIN` value)
- virtual-loss penalties from inflight contention
- additive depth/path bias
Effective parent visits:
parent_visit_eff = parent.visit_count + inflight(parent) * virtual_loss
Effective child stats:
visits_eff = child.visit_count + inflight(child) * virtual_loss
success_eff = child.success_count - inflight(child) * virtual_loss
Optional additives:
- `depth_bias * node.depth`
- `path_bias` when the node lies on the agent’s previously selected path
Unvisited and non-inflight nodes keep first-touch priority (+inf).
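A minimal sketch of how these pieces could combine into one score. The effective-count formulas, the additive biases, and the first-touch rule follow the text above. The UCB1-style shape, the `exploration_c` constant, and the guards against zero counts are assumptions, and `reservation_score` is a hypothetical name. The `AND_MIN` variant is not modeled.

```python
import math


def reservation_score(child_visits, child_successes, child_inflight,
                      parent_visits, parent_inflight,
                      virtual_loss=1.0, exploration_c=math.sqrt(2),
                      depth=0, depth_bias=0.0,
                      on_previous_path=False, path_bias=0.0):
    """Hypothetical UCT-like reservation score with virtual loss."""
    # First-touch priority: unvisited nodes that nobody has reserved.
    if child_visits == 0 and child_inflight == 0:
        return math.inf

    # Effective counts, as defined in the text.
    visits_eff = child_visits + child_inflight * virtual_loss
    success_eff = child_successes - child_inflight * virtual_loss
    parent_visit_eff = parent_visits + parent_inflight * virtual_loss

    # Assumption: with virtual_loss == 0 an inflight, unvisited node
    # still has zero effective visits; treat it like first touch.
    if visits_eff <= 0:
        return math.inf

    exploit = success_eff / visits_eff
    # Assumption: clamp the parent count so the logarithm is defined.
    explore = exploration_c * math.sqrt(
        math.log(max(parent_visit_eff, 1.0)) / visits_eff
    )

    score = exploit + explore + depth_bias * depth
    if on_previous_path:
        score += path_bias
    return score
```

With these definitions, a node that other agents have already reserved scores lower than the same node when idle, which steers concurrent agents toward different parts of the tree.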
Intervention Schedules
Blocking (BlockSchedule):
- probabilistic first-touch blocking by node
- finite duration and permanent blocks (`duration < 0`)
- optional immovable subset and unfreeze controls
Delay (DelaySchedule):
- probabilistically delays selected nodes for fixed windows
Reroute:
- when blocked or delayed, selection can retry alternate nodes up to `max_attempts`
- reroute trail is recorded in trace payloads
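The blocking and reroute behavior can be illustrated with a simplified sketch. `BlockScheduleSketch` and `select_with_reroute` are hypothetical names, not the real `BlockSchedule` API. Measuring time in integer steps, and counting the first try toward `max_attempts`, are both assumptions. Immovable subsets, unfreeze controls, and delays are left out.

```python
import random


class BlockScheduleSketch:
    """Hypothetical, simplified block schedule."""

    def __init__(self, fraction, duration, seed):
        self.fraction = fraction
        self.duration = duration          # duration < 0 means permanent
        self.rng = random.Random(seed)    # seeded for reproducible draws
        self.seen = set()
        self.blocked_until = {}           # node id -> expiry step, None if permanent

    def is_blocked(self, node_id, step):
        # The blocking decision is drawn once, on first touch.
        if node_id not in self.seen:
            self.seen.add(node_id)
            if self.rng.random() < self.fraction:
                expiry = None if self.duration < 0 else step + self.duration
                self.blocked_until[node_id] = expiry
        if node_id not in self.blocked_until:
            return False
        expiry = self.blocked_until[node_id]
        return expiry is None or step < expiry


def select_with_reroute(candidates, schedule, step, max_attempts):
    """Try candidates in score order; return (choice, reroute trail)."""
    trail = []
    for node_id in candidates[:max_attempts]:
        if not schedule.is_blocked(node_id, step):
            return node_id, trail
        trail.append(node_id)  # record each blocked attempt
    return None, trail
```

The returned trail lists the blocked nodes that were skipped, which is the kind of information a trace payload would need to explain why a lower-scored node was selected.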
CLI Contract and Validation
| Flag(s) | Requirement | Validation |
|---|---|---|
| `--mcts-agents`, `--mcts-inflight` | required in distributed mode | both must be provided |
| distributed options in centralized mode | forbidden | parser error |
| `--mcts-block-fraction` | enables block schedule | must be in (0,1) |
| `--mcts-block-duration` with block fraction | required | must be non-zero |
| `--mcts-block-seed` with block fraction | required | must be provided |
| `--mcts-block-immovable-fraction` | optional with block schedule | must be in [0,1]; requires positive duration |
| `--mcts-unfreeze-after` | optional with block schedule | must be >= 1 |
| `--mcts-unfreeze-prob` | optional with block schedule | must be in (0,1] |
| `--mcts-reroute-blocked` + `--mcts-reroute-max` | paired | max required when reroute enabled; max >= 1 |
| delay triplet (`--mcts-delay-prob`, `--mcts-delay-duration`, `--mcts-delay-seed`) | all-or-none | prob in (0,1), duration >= 1, seed >= 0 |
| `--mcts-virtual-loss` | optional | >= 0 |
| `--mcts-depth-bias`, `--mcts-path-bias` | optional | each >= 0 |
| `--mcts-history-cache` | optional | boolean switch |
Cross-mode constraints:
- `--basin-blind` requires `--basin-seeds`.
- `--basin-blind` currently requires centralized mode.
- Multi-provider runs do not support `--basin-seeds`.
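A few of the table's rules can be sketched with `argparse` as below. Only a subset of flags is covered. The flag names come from the table, but the `centralized` mode value, the argument types, the program name, and the function names are assumptions, and the real parser may be organized differently.

```python
import argparse


def build_parser():
    p = argparse.ArgumentParser(prog="wonton-soup-sketch")
    # Assumption: the non-distributed mode value is spelled "centralized".
    p.add_argument("--mcts-mode", choices=["centralized", "distributed"],
                   default="centralized")
    p.add_argument("--mcts-agents", type=int)
    p.add_argument("--mcts-inflight", type=int)
    p.add_argument("--mcts-block-fraction", type=float)
    p.add_argument("--mcts-block-duration", type=int)
    p.add_argument("--mcts-block-seed", type=int)
    p.add_argument("--mcts-virtual-loss", type=float)
    return p


def parse_and_validate(argv):
    parser = build_parser()
    args = parser.parse_args(argv)

    distributed_only = [
        args.mcts_agents, args.mcts_inflight, args.mcts_block_fraction,
        args.mcts_block_duration, args.mcts_block_seed, args.mcts_virtual_loss,
    ]
    if args.mcts_mode != "distributed":
        # Distributed options are forbidden in centralized mode.
        if any(v is not None for v in distributed_only):
            parser.error("distributed options require --mcts-mode distributed")
        return args

    if args.mcts_agents is None or args.mcts_inflight is None:
        parser.error("--mcts-agents and --mcts-inflight are both required")

    if args.mcts_block_fraction is not None:
        if not 0.0 < args.mcts_block_fraction < 1.0:
            parser.error("--mcts-block-fraction must be in (0,1)")
        if args.mcts_block_duration is None or args.mcts_block_duration == 0:
            parser.error("--mcts-block-duration is required and must be non-zero")
        if args.mcts_block_seed is None:
            parser.error("--mcts-block-seed is required with a block fraction")

    if args.mcts_virtual_loss is not None and args.mcts_virtual_loss < 0:
        parser.error("--mcts-virtual-loss must be >= 0")
    return args
```

`parser.error` prints a usage message and exits with status 2, which matches the "parser error" outcome in the table. A call such as `parse_and_validate(["--mcts-mode", "distributed", "--mcts-agents", "4", "--mcts-inflight", "2"])` passes validation.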
Provider and Ranker Compatibility
Distributed mode uses the same provider stack as centralized mode:
- provider output is probability-sorted before ranker application
- ranker is reorder-only and must preserve count + tactic multiset
- violations fail fast
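The reorder-only contract amounts to a multiset comparison, sketched below. `apply_ranker` is a hypothetical helper, and representing provider output as `(tactic, probability)` pairs is an assumption made for illustration.

```python
from collections import Counter


def apply_ranker(candidates, ranker):
    """Sort provider output by probability, then apply a reorder-only ranker.

    candidates: list of (tactic, probability) pairs (assumed shape).
    ranker: callable taking and returning a list of tactics.
    """
    # Provider output is probability-sorted before the ranker sees it.
    ordered = sorted(candidates, key=lambda pair: pair[1], reverse=True)
    tactics = [tactic for tactic, _ in ordered]

    ranked = list(ranker(tactics))

    # Multiset equality covers both the count and the tactic contents.
    if Counter(ranked) != Counter(tactics):
        raise ValueError("ranker must only reorder tactics")  # fail fast
    return ranked
```

A ranker that drops, duplicates, or invents a tactic raises immediately instead of silently changing the candidate set.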
Provenance and Reproducibility
Run metadata captures distributed settings (distributed_mcts) and policy choices.
Trace payloads capture distributed context:
- `agent`
- inflight state
- block/delay/reroute snapshots
These fields allow post-hoc reconstruction without relying on CLI history.