SPECTER Labs
Technical Docs concepts

ProofGraphIR

ProofGraphIR is the cross-assistant graph abstraction used by analysis/cross_assistant_alignment.py.

Why It Exists

Raw graph artifacts from different assistants are not directly comparable:

  • Lean wild-type graphs are small tactic-search traces.
  • Coq extraction graphs are proof-term DAGs with structural edges (fn, arg, …).

Without an IR layer, distance is dominated by raw scale mismatch (node/edge counts), which can collapse nearest-neighbor retrieval.

IR Fields

Per theorem graph, ProofGraphIR records:

  • graph_family: search_trace | proof_term_dag | external_proof | unknown
  • structural size: node/edge counts, max depth, root count
  • topology ratios: leaf ratio, branching ratio, mean branch factor
  • edge-role profile:
    • search traces: fam:<tactic_family>
    • term DAGs: structural roles (fn, arg, binder_type, body, value)
  • assistant-agnostic operator profile (shared ontology), for example:
    • apply, bind, branch, rewrite, normalize, close, automation, value, other
  • assistant-agnostic motif profile (action motifs), for example:
    • motif:term_apply, motif:term_bind, motif:bind_open, motif:branch_split, motif:rewrite_step, motif:close_goal
  • action-layer summaries derived from TacticActionIR:
    • action_kind_profile
    • effect_profile
    • continuation_profile
    • coupling_profile
  • Lean search traces can also carry explicit step dependencies and spawned-goal summaries through edge metadata and assembly-trace actionMetadata
  • shape hash: WL hash over canonicalized graph

Run-relative rank features are derived separately at comparison time:

  • node_rank, edge_rank, depth_rank, leaf_rank, branching_rank

These are computed per graph family within a run and are not intrinsic IR fields.

Kind Inference

Inference combines:

  • edge-role composition (term-edge fraction)
  • ordered expansion edges (order) as a search-trace signal
  • node goal_type hints for proof-term DAGs (app, const, lam, …)
  • backend hint from run config when available

Legacy Coq proof_graph artifacts are also repaired here when they are actually term-structural: if an unversioned graph is dominated by structural term edges and node hints, the IR upgrades it to proof_term_dag semantics instead of treating it as a generic external-proof trace.

When proof_term_graph is available, the Coq side can carry richer node metadata than wild_type_graph, including constructor names and binder metadata. That metadata feeds the current term-action enrichment for constructors such as Case, Prod, and Cast.

Distance Contract

Graph distance is kind-aware:

  • Same-kind:
    • rank-distance + operator-profile L1 + motif-profile L1 + edge-role L1
    • plus action_kind, effect, continuation, and coupling profile deltas
    • plus shape mismatch
  • Cross-kind:
    • rank-distance + operator-profile L1 + motif-profile L1
    • plus action/effect/continuation/coupling profile deltas
    • plus a small fixed mismatch penalty
    • role/shape channels are still suppressed because their semantics are incompatible

Full pair distance then combines graph, lexical token Jaccard, and connective-profile distance, with higher lexical/connective weight in cross-kind mode.

The aligner can now ablate lexical channels explicitly:

  • --lexical-ablation drop_tokens: remove theorem/display/constant token overlap while keeping connective profiles
  • --lexical-ablation graph_only: remove both lexical tokens and connective profiles

What This Enables

  • Explicit comparability semantics instead of implicit assumptions
  • Diagnostics that separate structural mismatch from lexical mismatch
  • Reduced retrieval collapse caused by absolute size bias
  • Theorem-level inspection via wonton.py inspect-proof-ir, including action traces and pair-distance component breakdowns

Current Limits

  • Cross-kind matching still relies heavily on lexical/statement signals.
  • Coq proof_term_graph is the richer surface for cross-kind diagnostics; older wild_type_graph logs can preserve the right topology while still dropping constructor metadata.
  • Goal coupling is only as good as the producer path; some legacy or imported graphs still lack explicit coupling data.
  • Current motif ontology is lightweight and requires expansion for richer proof-process semantics.

Next Steps

  • Expand TacticActionIR so Lean search traces can distinguish explicit continuation builders and real goal coupling.
  • Add proof-state transition motifs (for search traces) and beta/eta-normalized motifs (for terms).
  • Evaluate with translated theorem pairs where statements are intentionally obfuscated, so graph channels are stress-tested independently of theorem names. The new lexical ablation modes provide the first local benchmark lever for that stress test.
  • Treat TacticActionIR as the current step-level contract feeding this graph IR, while the addendum stays focused on the longer-term research question about richer tactic representation.

External Research Anchors