CLI And Reporting
The CLI now supports four compute-facing entry points:
compute: validate one JSON request and emit one JSON resultsweep: validate a batch of JSON/JSONL requests and emit summary rowsreport: render summary rows, or raw compute requests, as JSONL, CSV, or Markdownbenchmark: run fixed reference demo cases and emit timings plus summary rows
Compute Schema
Top-level keys:
trials: required list of paired trial recordsproblem_space: optional structured problem-space objectH,H_unit,w,w_unit: legacy scalar inputs whenproblem_spaceis omittedagent_policy,blind_policy: optional policy namesagent_policy_spec,blind_policy_spec: optional structured policy specsbootstrap_samples,bootstrap_ci_level,bootstrap_seed: optional uncertainty controls
w may be a scalar or an object:
{
"default": 1.0,
"by_operator": {"move_b": 3.0},
"description": "toy operator costs",
"state_dependent": false,
"unit": "joule"
}Batch Formats
sweep --input accepts:
- a single JSON object
- a JSON array of objects
- a JSON object with
cases: [...] - JSONL with one object per line
report --input accepts either summary rows or raw compute requests.
benchmark is intentionally narrow. It runs a small deterministic suite of reference demos with fixed seeds so runtime changes and formatting regressions are easy to spot in review.
Example:
uv run python -m paper_k benchmark --case sorting-small --case bitstring-small --format markdownBenchmark rows include the same K_restricted_mean_at_stop and solve-rate columns as sweep/report output, plus mean_wall_sec, min_wall_sec, and max_wall_sec.
Python Surface
If you need executable problem spaces or exact finite-horizon calculations, use the Python API directly:
ProblemExecutorExecutablePolicycompare_policies_in_problem_space(...)exact_finite_horizon_metrics(...)
Golden Report Guard
The CLI tests pin an exact Markdown rendering in tests/fixtures/report_markdown_golden.md. When the report layout or numeric formatting changes, update that fixture deliberately in the same patch so the review surface stays explicit.