# Log Query Cookbook
Operational query recipes for inspecting run artifacts. Canonical file schemas live in Log File Schemas.
## Reading Gzipped Files
Note: Use `gzcat` on macOS, `zcat` on Linux.
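If neither tool is available, Python's standard-library `gzip` module reads the same files anywhere. A minimal sketch (the helper name and example path are illustrative, not part of the project):

```python
import gzip
import json

def load_gz_json(path):
    """Read a gzipped JSON file, roughly equivalent to `gzcat path | jq .`."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return json.load(f)

# Example (hypothetical path): data = load_gz_json("summary.json.gz")
```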
```bash
# View full contents
gzcat file.json.gz | jq .

# Pretty-print with less
gzcat file.json.gz | jq . | less

# Query specific fields
gzcat summary.json.gz | jq '.theorems[] | select(.wild_type.solved) | .name'

# Count nodes in proof term
gzcat wild_type_proof_term.json.gz | jq '.nodes | length'

# Search for text patterns in gzipped files
gzcat goal_cache.json.gz | rg "pattern"

# Extract just the keys
gzcat goal_cache.json.gz | jq 'keys'
```

For Python scripts, the analysis modules handle gzip transparently:
```bash
uv run python -c "
from analysis.summary import CorpusRun
from pathlib import Path
run = CorpusRun.load(Path('logs/corpus-.../'))
print(f'Solved: {len([t for t in run.theorems if t.wild_type_solved])}/{len(run.theorems)}')
"
```

## Interpretation Workflows
### 1. “How well did the run go?”
Start with: `summary.json.gz`
```bash
gzcat summary.json.gz | jq '{
  total: .aggregates.theorem_count,
  solve_rate: .aggregates.wild_type_solve_rate,
  interventions: .aggregates.intervention_count,
  int_rate: .aggregates.intervention_solve_rate
}'
```

Check:
- Wild-type solve rate (baseline capability)
- Intervention solve rate (robustness)
- If wild-type is low, check `failure_analysis.json` for patterns
Red flags:

- `MODEL_HALLUCINATION` > 0 suggests inference bugs (check provider settings)
- High `TACTIC_DESERT` suggests the model lacks tactics for these goal types
- Wild-type much lower than expected suggests corpus mismatch or model issues
### 2. “Why did theorem X fail?”
Start with: `<theorem>/wild_type_history.json`
```bash
cat theorem_name/wild_type_history.json | jq '.iterations[-5:] | map({iteration, attempts: [.attempts[]? | {tactic, outcome}]})'
```

Look for:
- Last few tactics tried - are they sensible?
- `attempts[].outcome == "failure"` patterns - is the same tactic failing repeatedly?
- `attempts[].goal_type` near the end - is it within the model’s capability envelope?
Then check: `<theorem>/wild_type_metrics.json`
```bash
cat theorem_name/wild_type_metrics.json | jq '{
  failure_ratio: .detour.failure_ratio,
  unique_tactics: .trajectory.tactic_diversity,
  max_depth: .trajectory.max_depth_reached
}'
```

Interpretation:
- `failure_ratio` > 0.8 with low `tactic_diversity` = model stuck in a rut
- `failure_ratio` > 0.8 with high `tactic_diversity` = genuinely hard problem
- `max_depth` = 0 means it couldn’t even make progress on the root goal
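The three cases above can be expressed as a small classifier. A sketch (the function name and the cutoff for "low" diversity, 5 unique tactics, are illustrative assumptions; the 0.8 ratio comes from the list above):

```python
def diagnose(metrics):
    """Classify a parsed wild_type_metrics.json dict using the rules above.

    The diversity cutoff (<= 5 unique tactics counts as "low") is an
    illustrative assumption; tune it against your own runs.
    """
    failure_ratio = metrics["detour"]["failure_ratio"]
    diversity = metrics["trajectory"]["tactic_diversity"]
    max_depth = metrics["trajectory"]["max_depth_reached"]

    if max_depth == 0:
        return "no progress on root goal"
    if failure_ratio > 0.8:
        return "stuck in a rut" if diversity <= 5 else "genuinely hard problem"
    return "normal search"
```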
### 3. “Did blocking tactic X change the proof?”
Start with: `<theorem>/<intervention>_comparison.json`
```bash
cat theorem_name/block_simp_comparison.json | jq '{
  ged: .ged_search_graph.value,
  iter_delta: .trajectory_diff.iteration_diff,
  new_axioms: .axiom_delta
}'
```

Interpretation:
- `ged` = 0: Identical proof structure (tactic wasn’t used anyway)
- `ged` small (1-3): Minor variation, same approach
- `ged` large (>5): Significantly different proof found
- `axiom_delta` non-empty: Different lemmas used (interesting!)
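For batch triage, the same bucketing can be written out directly. A sketch (band boundaries 0, 1-3, and >5 come from the list above; the 4-5 band is unlabeled there, so the "moderate change" label is my assumption):

```python
def classify_ged(ged):
    """Bucket a graph-edit-distance value per the bands above."""
    if ged == 0:
        return "identical proof structure (tactic wasn't used)"
    if ged <= 3:
        return "minor variation, same approach"
    if ged <= 5:
        # Unlabeled band in the cookbook; "moderate" is an assumption.
        return "moderate change"
    return "significantly different proof"
```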
For deeper analysis: Compare `wild_type_graph.json` vs `block_X_graph.json`
```bash
# Compare proof lengths
echo "Wild:" && cat wild_type_graph.json | jq '.nodes | length'
echo "Intervention:" && cat block_simp_graph.json | jq '.nodes | length'
```

### 4. “Is the model behaving consistently?”
Start with: `sheaf_analysis.json`
```bash
cat sheaf_analysis.json | jq '{
  consistency: .equiv_consistency,
  problematic_families: [.per_family_residual | to_entries[] | select(.value > 20)]
}'
```

Interpretation:
- `equiv_consistency` > 0.95: Model responds consistently to similar goals (good)
- `equiv_consistency` < 0.90: High variance, model is unreliable
- High `per_family_residual` for a tactic family = that tactic has unpredictable effects
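The jq filter above translates directly to Python for scripting. A sketch (the residual threshold of 20 is the one used in the query; key names as in `sheaf_analysis.json`):

```python
def problematic_families(sheaf, threshold=20):
    """Return tactic families whose residual exceeds the threshold,
    mirroring the jq `select(.value > 20)` above.

    sheaf: parsed sheaf_analysis.json dict with a per_family_residual map.
    """
    return sorted(
        family
        for family, residual in sheaf["per_family_residual"].items()
        if residual > threshold
    )
```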
Check inconsistent cases:
```bash
cat sheaf_analysis.json | jq '.inconsistent_cases[:5]'
```

These are goal signatures where the model behaves erratically.
### 5. “What tactics work for what goals?”
Start with: `goal_cache.json.gz`
```bash
# Count the unique signatures
gzcat goal_cache.json.gz | jq '.entries | keys | length'

# See which signatures have the most occurrences
gzcat goal_cache.json.gz | jq '.entries | to_entries | map({sig: .key[:30], occs: (.value.occurrences | length)}) | sort_by(-.occs) | .[:10]'

# Check outcomes for a specific signature (family 2 = intro)
gzcat goal_cache.json.gz | jq '.entries["<sig>"].occurrences | to_entries[0].value.outcomes'
```

From `summary.json.gz` (aggregated tactic matrix):
```bash
# Which goal types saw the most tactic attempts
gzcat summary.json.gz | jq '.aggregates.goal_type_tactic_matrix | to_entries | map({goal: .key[:40], tactics: (.value | keys | length)}) | sort_by(-.tactics) | .[:10]'

# Success rate for "intro" on goals containing "→"
gzcat summary.json.gz | jq '.aggregates.goal_type_tactic_matrix | to_entries | map(select(.key | contains("→"))) | .[0].value | to_entries | map({tactic: .key, success: .value.success, fail: .value.failure})'
```

### 6. “How did the proof actually get constructed?”
Start with: `<theorem>/wild_type_assembly.json.gz`
```bash
# See the theorem and step count
gzcat theorem_name/wild_type_assembly.json.gz | jq '{theorem: .theorem[:60], steps: (.steps | length)}'

# Trace each step with goals closed/opened
gzcat theorem_name/wild_type_assembly.json.gz | jq '.steps | to_entries | .[] | "\(.key + 1). \(.value.tactic) closed=\(.value.goalsClosed | length) opened=\(.value.goalsOpened | length)"'

# See which tactics closed goals
gzcat theorem_name/wild_type_assembly.json.gz | jq '.steps | map(select(.goalsClosed | length > 0)) | .[] | {tactic, closed: (.goalsClosed | length)}'
```

What to look for:
- Steps that close multiple goals (powerful tactics)
- Steps that open many goals (case splits, inductions)
- The order tactics are applied (strategy)
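The step trace above is also easy to reproduce in Python when you want it in a script rather than a pipeline. A sketch (field names `steps`, `tactic`, `goalsClosed`, `goalsOpened` are taken from the jq queries; the helper name is illustrative):

```python
def trace_steps(assembly):
    """One line per step - tactic plus goals closed/opened counts -
    mirroring the jq trace above.

    assembly: parsed wild_type_assembly.json.gz dict.
    """
    return [
        f"{i + 1}. {step['tactic']} "
        f"closed={len(step['goalsClosed'])} opened={len(step['goalsOpened'])}"
        for i, step in enumerate(assembly["steps"])
    ]
```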
For proof term analysis:
```bash
# Count node types in final proof
gzcat theorem_name/wild_type_assembly.json.gz | jq '.finalTerm.nodes | group_by(.[1].kind) | map({kind: .[0][1].kind, count: length}) | sort_by(-.count)'

# Total nodes in proof term
gzcat theorem_name/wild_type_assembly.json.gz | jq '.finalTerm.nodes | length'

# Check for incomplete proofs (mvar nodes = holes)
gzcat theorem_name/wild_type_assembly.json.gz | jq '.finalTerm.nodes | map(select(.[1].kind == "mvar")) | length'
```

Common patterns:
- Many `app` nodes = lots of function application (typical)
- Many `lam` nodes = proof involves constructing functions
- `mvar` nodes = incomplete proof term (solved proofs are expected to have zero)
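The node-kind census above in Python, for scripted sweeps over many theorems. A sketch assuming, as the jq paths `.[1].kind` suggest, that each node is a `[id, {kind: ...}]` pair (the helper name is illustrative):

```python
from collections import Counter

def node_kinds(final_term):
    """Count proof-term nodes by kind, most common first.

    final_term: the .finalTerm object, whose nodes are [id, {kind: ...}] pairs.
    A nonzero 'mvar' count indicates an incomplete proof term.
    """
    counts = Counter(node[1]["kind"] for node in final_term["nodes"])
    return counts.most_common()
```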
### 7. “Compare two corpus runs”
```bash
# Set up aliases for the two runs
RUN1="logs/corpus-2026-01-06-190731"  # ReProver
RUN2="logs/corpus-2026-01-07-120200"  # DeepSeek

# Compare overall solve rates
echo "Run 1:" && gzcat $RUN1/summary.json.gz | jq '.aggregates | {theorems: .theorem_count, rate: .wild_type_solve_rate}'
echo "Run 2:" && gzcat $RUN2/summary.json.gz | jq '.aggregates | {theorems: .theorem_count, rate: .wild_type_solve_rate}'

# Theorem names that are solved in run1 but failed in run2
comm -23 \
  <(gzcat $RUN1/summary.json.gz | jq -r '.theorems[] | select(.wild_type.solved) | .name' | sort) \
  <(gzcat $RUN2/summary.json.gz | jq -r '.theorems[] | select(.wild_type.solved) | .name' | sort)

# Theorem names that failed in run1 but are solved in run2
comm -13 \
  <(gzcat $RUN1/summary.json.gz | jq -r '.theorems[] | select(.wild_type.solved) | .name' | sort) \
  <(gzcat $RUN2/summary.json.gz | jq -r '.theorems[] | select(.wild_type.solved) | .name' | sort)

# Compare intervention solve rates
gzcat $RUN1/summary.json.gz | jq '{run: "run1", interventions: .aggregates.intervention_count, rate: .aggregates.intervention_solve_rate}'
gzcat $RUN2/summary.json.gz | jq '{run: "run2", interventions: .aggregates.intervention_count, rate: .aggregates.intervention_solve_rate}'
```

### 8. “Find interesting intervention effects”
Anisomorphic proofs (structurally different when tactic blocked):
```bash
cat analysis_report.json | jq '.by_theorem[] | .theorem as $t | .interventions[] | select(.classification == "TYPE_B_ANISOMORPHIC") | {
  theorem: $t,
  intervention: .name,
  ged: .search_dynamics.ged_search_graph,
  wild_dag: .complexity.wild.dag_size,
  int_dag: .complexity.intervention.dag_size
}'
```

Interventions that found simpler proofs:
```bash
cat analysis_report.json | jq '.by_theorem[] | .theorem as $t | .interventions[] | select(.complexity.intervention.dag_size < .complexity.wild.dag_size) | {
  theorem: $t,
  intervention: .name,
  reduction: (.complexity.wild.dag_size - .complexity.intervention.dag_size)
}' | head -20
```

### 9. “Debug a specific proof attempt”
Full replay of what happened:
```bash
# 1. See the theorem
cat theorem_name/wild_type_history.json | jq -r '.theorem'

# 2. Trace each step
cat theorem_name/wild_type_history.json | jq -r '.iterations[] | .iteration as $i | .attempts[]? | "\($i): \(.tactic) -> \(.outcome)"'

# 3. Check where it got stuck
cat theorem_name/wild_type_history.json | jq '.iterations | map(select([.attempts[].outcome] | index("success") | not)) | .[0]'

# 4. See the proof graph structure
cat theorem_name/wild_type_graph.json | jq '{
  nodes: (.nodes | length),
  edges: (.edges | length),
  root_type: .nodes[0].type
}'
```

## Quick Reference: What File For What Question
| Question | File(s) |
|---|---|
| Overall success rate | `summary.json.gz` |
| Why did X fail? | `X/wild_type_history.json`, `failure_analysis.json` |
| Did blocking Y matter? | `X/block_Y_comparison.json` |
| Model consistency | `sheaf_analysis.json`, `goal_cache.json.gz` |
| Proof structure | `X/wild_type_assembly.json.gz` |
| Tactic patterns | `goal_cache.json.gz`, `summary.json.gz` (`goal_type_tactic_matrix`) |
| Compare interventions | `X/ged_matrix.json`, `analysis_report.json` |