Analysis Metrics Reference
Canonical semantics for metric families used in wonton-soup.
Structural Metrics
GED Families
- ged_search_graph: hard matching on goal signature + tactic family (Lean search graphs)
- ged_search_graph_soft: soft node substitution cost via normalized goal-AST TED
- ged_proof_graph: proof-object graph distance (external backends)
- ged_trace_graph: trace-derived proxy graph distance
Family-specific validity metadata:
- valid
- validity_notes
- trace_source
- trace_completeness
Goal-AST Ordered TED
Normalized ordered tree edit distance over goal ASTs.
Used by:
- ged_search_graph_soft
- goal_novelty
- solution_path_soft_distance
- root_goal_similarity
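As an illustration of what a normalized ordered tree edit distance computes, here is a minimal sketch using the classic rightmost-root forest recursion with unit costs. The tree encoding (`(label, children)` tuples), the function names, and the normalization by combined node count are assumptions made for this example; the project's actual AST representation and normalizer are not specified here.

```python
from functools import lru_cache

# A tree is (label, children), where children is a tuple of trees.
# A forest is a tuple of trees. Both are hashable, so results can be memoized.

def size(forest):
    """Count all nodes in a forest."""
    return sum(1 + size(children) for _, children in forest)

@lru_cache(maxsize=None)
def forest_dist(f, g):
    """Unit-cost ordered edit distance between two forests."""
    if not f and not g:
        return 0
    if not f:
        return size(g)  # insert every node of g
    if not g:
        return size(f)  # delete every node of f
    (lv, cv), (lw, cw) = f[-1], g[-1]
    rest_f, rest_g = f[:-1], g[:-1]
    return min(
        forest_dist(rest_f + cv, g) + 1,  # delete rightmost root of f
        forest_dist(f, rest_g + cw) + 1,  # insert rightmost root of g
        # match the two rightmost roots (rename if labels differ)
        forest_dist(cv, cw) + forest_dist(rest_f, rest_g) + (lv != lw),
    )

def normalized_ted(a, b):
    """Assumed normalization: divide by combined node count, giving [0, 1]."""
    return forest_dist((a,), (b,)) / (size((a,)) + size((b,)))

goal_a = ("eq", (("x", ()), ("y", ())))
goal_b = ("eq", (("x", ()), ("z", ())))
distance = normalized_ted(goal_a, goal_b)  # one rename over six nodes
```

This recursion is exponential in the worst case without memoization and is intended for small goal ASTs only; a production implementation would more likely use a Zhang-Shasha style algorithm.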
Structure Hash (WL)
Fast labeled-graph structural hash used in basin-style convergence summaries.
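A minimal sketch of a Weisfeiler-Lehman style structural hash, assuming node labels are strings and edges are treated as undirected. The function name `wl_hash`, the iteration count, and the digest length are illustrative choices, not the project's actual implementation.

```python
import hashlib
from collections import Counter

def wl_hash(nodes, edges, iterations=3):
    """nodes: dict of node_id -> label; edges: iterable of (u, v) pairs."""
    neighbors = {n: [] for n in nodes}
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)

    labels = {n: str(label) for n, label in nodes.items()}
    for _ in range(iterations):
        refined = {}
        for n in nodes:
            # Combine a node's label with the sorted labels of its neighbors.
            around = ",".join(sorted(labels[m] for m in neighbors[n]))
            payload = f"{labels[n]}|{around}"
            refined[n] = hashlib.sha256(payload.encode()).hexdigest()[:16]
        labels = refined

    # The hash depends only on the multiset of final labels, not on node ids.
    histogram = sorted(Counter(labels.values()).items())
    return hashlib.sha256(repr(histogram).encode()).hexdigest()[:16]

h1 = wl_hash({1: "goal", 2: "tactic", 3: "goal"}, [(1, 2), (2, 3)])
h2 = wl_hash({"x": "goal", "y": "tactic", "z": "goal"}, [("z", "y"), ("y", "x")])
h3 = wl_hash({1: "goal", 2: "goal", 3: "goal"}, [(1, 2), (2, 3)])
```

Equal hashes indicate that WL refinement cannot distinguish the two graphs, which is a necessary but not sufficient condition for isomorphism. That trade-off is what makes the hash fast enough for convergence summaries.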
Search-Dynamics Metrics
Trajectory Metrics
Depth/backtrack/uniqueness summaries from exploration history.
Detour Metrics
Attempt/success/failure summaries for search effort off the final solved path.
Trajectory Comparison (Recovery)
Divergence/reconvergence timing against wild-type solution-path signatures.
Solution-Path Soft Distance
Soft sequence distance between wild/intervention solved-path goal signatures.
Basin and Efficiency Metrics
Basin Analysis (Multi-Seed)
Repeated seeded runs measuring stability of solved structure distribution.
Core outputs:
- solve_rate
- unique_structures
- dominant_structure_frequency
- structure_distribution
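The core outputs can be sketched from a list of per-seed results. This example assumes each run is summarized by a structure hash, with `None` marking an unsolved run, and that the dominant frequency is taken relative to solved runs only; both are assumptions for illustration, and `basin_summary` is a hypothetical helper name.

```python
from collections import Counter

def basin_summary(structure_hashes):
    """structure_hashes: one entry per seeded run; None marks an unsolved run."""
    total = len(structure_hashes)
    solved = [h for h in structure_hashes if h is not None]
    counts = Counter(solved)
    dominant = counts.most_common(1)[0][1] if counts else 0
    return {
        "solve_rate": len(solved) / total if total else 0.0,
        "unique_structures": len(counts),
        # Assumption: frequency is measured over solved runs, not all runs.
        "dominant_structure_frequency": dominant / len(solved) if solved else 0.0,
        "structure_distribution": dict(counts),
    }

summary = basin_summary(["a", "a", "b", None])
```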
K-Style Search Efficiency
K = log10(tau_blind / tau_agent) with trace-derived blind baseline estimates.
Interpretation guardrail:
- configuration-dependent efficiency score
- not an intrinsic theorem property
For paired blind baseline experiments, use basin mode with explicit blind runs (paper_k).
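A minimal sketch of the K computation, assuming tau_blind and tau_agent are positive search-effort estimates expressed in the same unit (for example, node expansions). The function name `k_efficiency` is hypothetical.

```python
import math

def k_efficiency(tau_blind, tau_agent):
    """K = log10(tau_blind / tau_agent); both inputs must be positive."""
    if tau_blind <= 0 or tau_agent <= 0:
        raise ValueError("tau_blind and tau_agent must be positive")
    return math.log10(tau_blind / tau_agent)

k = k_efficiency(1000, 10)  # agent needs 100x less effort than the blind baseline
```

Under this definition K = 0 means the agent matches the blind baseline, positive K means fewer estimated steps than blind search, and negative K means more. Because tau_blind is a trace-derived estimate, K changes whenever the search configuration changes.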
Tactic/Goal Behavior Metrics
Goal-Type / Tactic Matrix
Aggregate outcome counts by goal surface and normalized tactic.
Tactic Fingerprint
Attempted/successful/solution-path tactic profile.
Goal Novelty
Novel and dropped goal-signature sets under intervention, with min-distance summaries.
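A minimal sketch of the set logic, assuming goal signatures are hashable values and that a pairwise distance function (such as a normalized goal-AST TED) is supplied by the caller. The function name and output keys are hypothetical.

```python
def goal_novelty(wild, intervention, distance):
    """Compare goal-signature sets from a wild-type run and an intervention run."""
    wild, intervention = set(wild), set(intervention)
    novel = intervention - wild    # seen only under intervention
    dropped = wild - intervention  # seen only in the wild-type run
    # For each novel goal, distance to its nearest wild-type goal.
    min_distance = (
        {g: min(distance(g, w) for w in wild) for g in novel} if wild else {}
    )
    return {
        "novel": sorted(novel),
        "dropped": sorted(dropped),
        "novel_min_distance": min_distance,
    }

# Toy distance for demonstration only: difference in signature length.
result = goal_novelty(
    {"a", "bb"}, {"bb", "cccc"}, lambda g, w: abs(len(g) - len(w))
)
```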
Sheaf Metrics
Sheaf-inspired consistency/residual summaries over signature/tactic behavior.
Cross-Metric Comparability Rules
- Do not compare GED values across families without explicit normalization rationale.
- ged_trace_graph is proxy-level evidence and must be labeled as such in reporting.
- Compare metrics only within consistent goal-signature and goal-ID schemes.
- Explicitly report missing or partial metrics via validity fields.
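These rules could be enforced with a small guard before any comparison is reported. The sketch below assumes each metric value is carried in a dict; the `valid` key mirrors the validity metadata listed earlier, while `family` and `signature_scheme` are hypothetical keys introduced only for this example.

```python
def can_compare(a, b):
    """Return (ok, reasons) for comparing two GED metric records."""
    reasons = []
    if a.get("family") != b.get("family"):
        reasons.append("different GED families")
    if not (a.get("valid") and b.get("valid")):
        reasons.append("at least one value is invalid or missing")
    if a.get("signature_scheme") != b.get("signature_scheme"):
        reasons.append("different goal-signature schemes")
    return (not reasons, reasons)

soft = {"family": "ged_search_graph_soft", "valid": True, "signature_scheme": "v1"}
trace = {"family": "ged_trace_graph", "valid": True, "signature_scheme": "v1"}
ok, reasons = can_compare(soft, trace)
```

Returning the reasons alongside the boolean lets a report state why a comparison was skipped instead of silently omitting it.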
Computation Ownership
In-run metrics are generated by core orchestration. Heavy metrics are filled by postprocess.
- In-run: trajectory/detour, many base per-variant metrics
- Postprocess: soft GED, goal novelty, solution-path soft distance, K-style efficiency
See Analysis Lexicon (Wonton-Soup) and Log File Schemas for ownership and schema details.