Task Selection

This document defines which history-mined tasks belong in phase-0 and which do not.

Selection Goals

  • keep the corpus small enough to inspect by hand
  • prefer tasks with a strong history trail
  • prefer tasks that can be pinned to one commit
  • prefer tasks with a single explicit acceptance command
  • prefer tasks whose runtime assumptions can be recorded plainly
  • prefer tasks where a maintainer-only gold resolution can be captured

Inclusion Criteria

A task is eligible when all of the following are true:

  • it comes from tinygrad git history, an issue thread, or a PR thread
  • the starting repository state can be described as repo_remote plus repo_commit
  • the intended change can be summarized as a bounded patch
  • the success condition can be checked by one deterministic command or a tightly defined command set
  • the required hardware and runtime assumptions can be recorded
  • the gold resolution can be kept in the private ledger without leaking into the public row
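
The inclusion checks above can be sketched as a predicate over a candidate record. This is a minimal sketch: the `Candidate` field names and the cap of three acceptance commands are illustrative assumptions, not a fixed schema.

```python
from dataclasses import dataclass, field

@dataclass
class Candidate:
    # Field names are illustrative assumptions, not a fixed schema.
    source_ref: str               # git commit, issue, or PR thread reference
    repo_remote: str              # starting repository remote
    repo_commit: str              # pinned starting commit
    patch_summary: str            # bounded description of the intended change
    acceptance_cmds: list[str] = field(default_factory=list)  # deterministic check(s)
    runtime_notes: str = ""       # recorded hardware and runtime assumptions
    gold_in_ledger: bool = False  # gold resolution captured privately

def is_eligible(c: Candidate) -> bool:
    """True when every inclusion criterion holds."""
    return all([
        bool(c.source_ref),                     # traceable to history
        bool(c.repo_remote and c.repo_commit),  # pinned starting state
        bool(c.patch_summary),                  # bounded patch
        0 < len(c.acceptance_cmds) <= 3,        # one command or a tight set
        bool(c.runtime_notes),                  # runtime assumptions recorded
        c.gold_in_ledger,                       # gold kept private
    ])
```

A candidate missing any single field fails the predicate, which keeps eligibility an all-or-nothing decision.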

Exclusion Criteria

Exclude tasks that:

  • depend on moving upstream branches
  • require manual intervention during evaluation
  • have no stable acceptance command
  • are only meaningful under undocumented local machine state
  • depend on GPU hardware before the CPU lane is stable
  • cannot be made reproducible inside a fixed time window
  • cannot be evaluated without network access
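
The exclusion list can likewise be expressed as a reason collector over candidate flags, which is more useful for curation logs than a bare boolean. The flag names here are hypothetical; the time-window check is folded into a single `bounded_runtime` flag for brevity.

```python
def exclusion_reasons(flags: dict) -> list[str]:
    """Collect exclusion reasons from candidate flags (flag names are illustrative)."""
    checks = [
        ("tracks_moving_branch", "depends on moving upstream branches"),
        ("needs_manual_steps",   "requires manual intervention"),
        ("undocumented_state",   "only meaningful under undocumented local state"),
        ("needs_gpu",            "depends on GPU hardware"),
        ("needs_network",        "cannot be evaluated without network access"),
    ]
    reasons = [msg for key, msg in checks if flags.get(key, False)]
    # These two are requirements, so their absence is the problem.
    if not flags.get("stable_acceptance_cmd", False):
        reasons.append("no stable acceptance command")
    if not flags.get("bounded_runtime", False):
        reasons.append("not reproducible inside a fixed time window")
    return reasons
```

An empty list means no exclusion criterion fired; a non-empty list doubles as the rejection note for the provenance record.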

Curation Procedure

  1. Start from git history and issue or PR threads.
  2. Identify isolated, legible fixes or feature changes.
  3. Record the public task row without gold solution material.
  4. Rewrite the task statement so it describes the file and test intent without copying the raw commit subject.
  5. Record the maintainer-only ledger entry with gold commit, optional gold patch, stripped commit metadata, and leakage notes.
  6. Verify the task can be exported as a model-visible prompt packet without repo provenance.
  7. Verify the task can be frozen with a deterministic identifier and manifest.
  8. Reject tasks that need live discovery or ambiguous success criteria.
  9. Freeze the resulting corpus into the benchmark manifest.
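
The split between the public task row (step 3) and the maintainer-only ledger entry (step 5) can be sketched with two records plus a leakage check; every field name here is an assumption for illustration, not the actual schema.

```python
from dataclasses import dataclass, asdict
from typing import Optional

@dataclass
class PublicTaskRow:
    item_id: str
    statement: str       # rewritten intent, not the raw commit subject
    acceptance_cmd: str  # deterministic success check
    runtime_notes: str   # recorded runtime assumptions

@dataclass
class LedgerEntry:
    item_id: str               # keys back to the public row
    gold_commit: str
    gold_patch: Optional[str]  # optional captured patch
    commit_metadata: dict      # stripped commit metadata
    leakage_notes: str

def check_no_leakage(row: PublicTaskRow, entry: LedgerEntry) -> bool:
    """The gold commit id must never appear anywhere in the public row."""
    public_text = " ".join(str(v) for v in asdict(row).values())
    return entry.gold_commit not in public_text
```

A real exporter would scan for more than the commit id (patch hunks, commit subject fragments), but the shape is the same: the public row is validated against the ledger before freezing.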

Preferred Task Shapes

The first corpus favors small, technical, self-contained work:

  • bug fixes with a targeted regression test
  • narrow API or runtime adjustments
  • small refactors that preserve behavior and are checkable by tests
  • setup or packaging fixes that unblock a known command path
  • tasks whose changed tests live in stable local suites such as test/unit, test/null, test/backend, or similarly small tests/ layouts

Mining Heuristics

When mining from git history, prefer commits that satisfy most of the following:

  • non-merge commit with exactly one parent
  • small patch footprint
  • source files changed alongside localized tests
  • one clear behavioral intent
  • deterministic local acceptance command
  • no vendored, generated, or formatting-dominated noise

The mined candidate set also carries an explicit quality score and review priority so manual curation can start from the strongest fixes first.
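
The heuristics and quality score above can be sketched as a simple additive scorer; the weights and dict keys are illustrative assumptions, not the miner's actual scoring rule.

```python
def score_commit(c: dict) -> int:
    """Heuristic quality score for a mined commit; weights are illustrative."""
    score = 0
    if c.get("parent_count", 2) == 1:           score += 2  # non-merge commit
    if c.get("lines_changed", 10**6) <= 200:    score += 2  # small patch footprint
    if c.get("touches_localized_tests", False): score += 2  # tests changed with sources
    if c.get("single_intent", False):           score += 1  # one behavioral intent
    if c.get("deterministic_cmd", False):       score += 2  # local acceptance command
    if c.get("noise_dominated", True):          score -= 3  # vendored/generated/formatting
    return score

def review_order(candidates: list[dict]) -> list[dict]:
    """Sort strongest candidates first so manual curation starts from the best fixes."""
    return sorted(candidates, key=score_commit, reverse=True)
```

Unknown flags default pessimistically, so a commit with no recorded evidence sinks to the bottom of the review queue rather than slipping through.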

For the initial tinygrad corpus, prefer miner configurations that include tinygrad/ sources and test/ suites. Explicitly exclude extra/, examples/, docs/, and unsuitable test prefixes such as test/external/, test/models/, test/speed/, test/web/, test/amd/, test/device/, and test/mockgpu/.
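
That include/exclude configuration reduces to a prefix filter over changed file paths. The prefixes below are taken directly from the list above; treating exclusion as higher priority than inclusion is the one design assumption.

```python
# Include/exclude prefixes taken from the miner configuration described above.
INCLUDE_PREFIXES = ("tinygrad/", "test/")
EXCLUDE_PREFIXES = (
    "extra/", "examples/", "docs/",
    "test/external/", "test/models/", "test/speed/",
    "test/web/", "test/amd/", "test/device/", "test/mockgpu/",
)

def path_in_scope(path: str) -> bool:
    """True when a changed file falls inside the mined corpus scope.

    Exclusions win over inclusions so that test/external/ stays out
    even though it sits under the included test/ prefix.
    """
    if path.startswith(EXCLUDE_PREFIXES):  # str.startswith accepts a tuple
        return False
    return path.startswith(INCLUDE_PREFIXES)
```

A commit is then in scope only when every changed path passes the filter (or at least the paths that carry the behavioral change, depending on miner policy).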

Reject or down-rank commits that are mostly:

  • merges or reverts
  • docs-only or rename-only changes
  • broad refactors
  • flaky or network-dependent integration work
  • GPU-only validation without pinned hardware
  • tests under external, model-heavy, speed, web, or hardware-specific suites

Avoid For Phase-0

  • large architectural rewrites
  • tasks that require multiple subsystems to change at once
  • performance-only claims
  • subjective-review tasks
  • bespoke GPU setup or host tuning
  • tasks whose evaluation requires browsing git history from inside the model workspace

Provenance Record

Each selected task retains enough evidence to explain why it was chosen:

  • source thread or commit reference
  • why the task is reproducible
  • why the acceptance command is stable
  • why the task belongs in the CPU correctness lane
  • what gold resolution was used for comparison
  • any known caveats or environment assumptions
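
One field per evidence item above gives a minimal provenance record; the class and field names are illustrative assumptions for this sketch.

```python
from dataclasses import dataclass

@dataclass
class ProvenanceRecord:
    # One field per evidence item; names are illustrative, not a fixed schema.
    source_ref: str            # source thread or commit reference
    reproducibility: str       # why the task is reproducible
    acceptance_stability: str  # why the acceptance command is stable
    lane_rationale: str        # why it belongs in the CPU correctness lane
    gold_resolution: str       # what gold resolution was used for comparison
    caveats: str               # known caveats or environment assumptions
```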

The public benchmark row stays clean: leakage-sensitive material belongs in the maintainer-only ledger, keyed by item_id. Model-visible prompt packets are cleaner still: no repo remote, no commit id, no source refs, and no miner provenance metadata.
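
That last stripping step can be sketched as a key filter on the task row before export; the key names in `PROVENANCE_KEYS` are assumptions matching the fields named above, not the real packet schema.

```python
# Keys that must never reach a model-visible prompt packet;
# key names are illustrative assumptions.
PROVENANCE_KEYS = frozenset({"repo_remote", "commit_id", "source_refs", "miner_metadata"})

def export_prompt_packet(task_row: dict) -> dict:
    """Strip provenance keys before exporting the model-visible packet."""
    return {k: v for k, v in task_row.items() if k not in PROVENANCE_KEYS}
```

A stricter exporter would use an allow-list of known-safe keys instead of a deny-list, so a newly added provenance field fails closed rather than leaking by default.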