Local DB
Every number, with a receipt. Show me where this came from.
Research credibility lives or dies on whether you can trace a number back to its inputs. This page indexes every chart and stat on the site with: where the number comes from, how it's derived, and whether the source is deterministic (computed at prebuild), self-reported (the agent put it in TRACE.json), human-rated (submitted via the rubric CLI), or authored (hand-curated content).
Per GAD-D-69 (programmatic-eval priority), every new metric must answer "can this be collected programmatically?" before "how do we score it?". The push is to move self-reported sources toward deterministic ones; the gaps are tracked in .planning/docs/GAPS.md.
9
2
4
1
Trust levels explained
Hero
Source
PLAYABLE_INDEX (lib/eval-data.generated.ts)
Formula
Object.keys(PLAYABLE_INDEX).length
Set at prebuild from auditPlayable(), which counts directories under apps/portfolio/public/evals/<project>/<version>/.
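As a rough illustration of how a deterministic count like this can be derived, the sketch below builds an index keyed by "<project>/<version>" from a list of directory paths and then counts its keys. The function name buildPlayableIndex, the index shape, and the example paths are hypothetical; the real auditPlayable() and PLAYABLE_INDEX may be structured differently.

```typescript
// Hypothetical shape: the real PLAYABLE_INDEX is generated at prebuild and may differ.
type PlayableIndex = Record<string, string>;

// Build an index keyed by "<project>/<version>" from directory paths given
// relative to the evals root. Paths that are not exactly two segments deep
// (for example a bare project directory) are ignored.
function buildPlayableIndex(relativeDirs: string[]): PlayableIndex {
  const index: PlayableIndex = {};
  for (const dir of relativeDirs) {
    const segments = dir.split("/").filter((s) => s.length > 0);
    if (segments.length !== 2) continue;
    const [project, version] = segments;
    // Assumed public URL for the build; illustrative only.
    index[`${project}/${version}`] = `/evals/${project}/${version}/`;
  }
  return index;
}

// Made-up example input, not real site data.
const exampleIndex = buildPlayableIndex([
  "demo-project/v1",
  "demo-project/v2",
  "other-project/v1",
  "demo-project", // the project directory itself, not a build
]);

// Same formula as the Hero stat: count the keys of the index.
const playableCount = Object.keys(exampleIndex).length;
console.log(playableCount); // 3
```

Because the count is a pure function of what is on disk at prebuild, rerunning it on the same tree gives the same number, which is what makes this source deterministic.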
Source
EVAL_RUNS
Formula
EVAL_RUNS.filter(r => r.scores.composite != null).length
Source
ALL_DECISIONS
Formula
ALL_DECISIONS.length (parseAllDecisions() over .planning/DECISIONS.xml)
Per-run card
Source
TRACE.json scores.composite
Formula
Σ_dimensions (weight * dimension_score), capped by gate failures
Composite is currently agent-self-reported in TRACE.json. Programmatic alternative tracked under GAPS.md G1 (deferred until UI stabilizes per gad-99).
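The composite formula can be sketched as a weighted sum followed by a cap. This is a minimal sketch under stated assumptions: the Dimension shape, the function name compositeScore, and the idea that each failed gate contributes a numeric ceiling are all hypothetical, since the page does not specify how gate failures cap the score.

```typescript
// Hypothetical dimension shape; field names are assumptions.
interface Dimension {
  weight: number; // relative weight of this dimension
  score: number;  // dimension score, assumed to lie in [0, 1]
}

// Weighted sum over dimensions, then capped.
// Assumption: each failed gate supplies a ceiling and the lowest ceiling wins.
function compositeScore(dimensions: Dimension[], gateCaps: number[] = []): number {
  const weighted = dimensions.reduce((sum, d) => sum + d.weight * d.score, 0);
  // With no failed gates, Math.min(weighted) simply returns the weighted sum.
  return Math.min(weighted, ...gateCaps);
}

// Made-up example values, not taken from any real run.
const dims: Dimension[] = [
  { weight: 0.5, score: 0.8 },
  { weight: 0.3, score: 0.6 },
  { weight: 0.2, score: 1.0 },
];

const uncapped = compositeScore(dims);      // weighted sum, about 0.78
const capped = compositeScore(dims, [0.4]); // one failed gate caps the result at 0.4
console.log(uncapped.toFixed(2), capped.toFixed(2));
```

Computing the composite this way from recorded dimension scores, rather than reading a self-reported value, is the kind of move toward a deterministic source that the page describes.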
Source
TRACE.json human_review (rubric form)
Formula
Σ_dimensions (weight * score) per project's human_review_rubric
Submitted via `gad eval review --rubric '{...}'`. Per-dimension scoring per gad-61 / decision gad-70.
Roadmap
Source
pressureForRound() and constants in app/roadmap/roadmap-shared.ts
Formula
f(requirement complexity, ambiguity, constraint density, iteration budget, failure cost), currently authored
Will become programmatic when the pressure-score-formula open question resolves. See gad-75.
Emergent
Source
TRACE.json human_review.dimensions.skill_inheritance_effectiveness
Formula
Human-rated 0.0–1.0 on whether the run productively inherited + evolved + authored skills
The compound-skills hypothesis test signal. Hygiene component (file-mutation events + CHANGELOG validity) is queued as GAPS G11 and is automatable.
Per-run page
Source
TRACE.json derived.tool_use_mix
Formula
Counts of tool_use events per tool name from the trace stream
Reference pattern for all new programmatic metrics; see GAPS.md G4.
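A derivation of this kind might look like the following sketch, which tallies tool_use events by tool name. The TraceEvent shape (a type field and an optional tool field) and the function name toolUseMix are assumptions for illustration; the actual trace stream format is not shown on this page.

```typescript
// Hypothetical trace event shape; the real trace stream may use different fields.
interface TraceEvent {
  type: string;  // event kind, e.g. "tool_use"
  tool?: string; // tool name, assumed present on tool_use events
}

// Count tool_use events per tool name; all other event types are skipped.
function toolUseMix(events: TraceEvent[]): Record<string, number> {
  const mix: Record<string, number> = {};
  for (const event of events) {
    if (event.type !== "tool_use" || !event.tool) continue;
    mix[event.tool] = (mix[event.tool] ?? 0) + 1;
  }
  return mix;
}

// Made-up example stream, not from a real run.
const mix = toolUseMix([
  { type: "tool_use", tool: "Read" },
  { type: "message" },
  { type: "tool_use", tool: "Edit" },
  { type: "tool_use", tool: "Read" },
]);
console.log(mix); // { Read: 2, Edit: 1 }
```

The metric needs no judgment from the agent or a human reviewer: it is recomputed from the recorded events, which is presumably why it serves as the reference pattern for programmatic metrics.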
Source
TRACE.json derived.plan_adherence_delta
Formula
(tasks_committed - tasks_planned) / tasks_planned
Source
TRACE.json gitAnalysis (git log over the run's worktree)
Formula
Counts of commits, batch vs per-task, ratio of task-id-prefixed commits to total
/decisions
Source
ALL_DECISIONS
Formula
parseAllDecisions() walks .planning/DECISIONS.xml
/planning (tasks tab)
Source
ALL_TASKS
Formula
parseAllTasks() walks .planning/TASK-REGISTRY.xml
/planning (phases tab)
Source
ALL_PHASES
Formula
parseAllPhases() walks .planning/ROADMAP.xml
/glossary
Source
GLOSSARY
Formula
data/glossary.json terms[]
/questions
Source
OPEN_QUESTIONS
Formula
data/open-questions.json questions[]
/planning (bugs tab)
Source
BUGS
Formula
data/bugs.json bugs[]