portfolio-bare/v3
Composite score
0.944
Dimension scores
Where the composite came from
Each dimension is scored 0.0 – 1.0 and combined using the weights in evals/portfolio-bare/gad.json. Human review dominates on purpose — process metrics alone can't rescue a broken run.
| Dimension | Score |
|---|---|
| Planning quality | 1.000 |
| Per-task discipline | 1.000 |
| Skill accuracy | 0.800 |
| Time efficiency | 0.950 |
Composite formula
How 0.944 was calculated
The composite score is a weighted sum of the dimensions above. Weights come from evals/portfolio-bare/gad.json. Contribution = score × weight; dimensions are sorted by contribution so you can see what actually moved the needle.
| Dimension | Weight | Score | Contribution |
|---|---|---|---|
| requirement_coverage | 0.40 | 0.000 | 0.0000 (0%) |
| task_alignment | 0.25 | 0.000 | 0.0000 (0%) |
| state_hygiene | 0.20 | 0.000 | 0.0000 (0%) |
| decision_coverage | 0.15 | 0.000 | 0.0000 (0%) |
| Weighted sum | 1.00 | — | 0.0000 |
Note: The weighted sum above (0.0000) doesn't exactly match the stored composite (0.9440). The difference is usually the v3 low-score cap (composite < 0.20 → 0.40, composite < 0.10 → 0.25) or a run with an older scoring pass.
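The formula and the cap behavior can be sketched in a few lines. This is a sketch, not the scorer's actual code: the weights mirror the table above (assumed to match evals/portfolio-bare/gad.json), and the low-score cap thresholds are the ones quoted in the note.

```python
# Assumed weights, taken from the contribution table above.
WEIGHTS = {
    "requirement_coverage": 0.40,
    "task_alignment": 0.25,
    "state_hygiene": 0.20,
    "decision_coverage": 0.15,
}

def composite_v3(scores: dict[str, float]) -> float:
    """Weighted sum of dimension scores with the v3 low-score cap applied."""
    raw = sum(WEIGHTS[dim] * scores.get(dim, 0.0) for dim in WEIGHTS)
    # v3 low-score cap, per the note: composite < 0.10 -> 0.25,
    # composite < 0.20 -> 0.40.
    if raw < 0.10:
        return 0.25
    if raw < 0.20:
        return 0.40
    return round(raw, 4)
```

With all dimension scores at 0.000, this sketch yields 0.25 via the cap, which is still far from the stored 0.9440, consistent with the note's "older scoring pass" explanation.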
Skill accuracy breakdown
Did the agent invoke the right skills at the right moments?
Tracing gap
This run stored only the aggregate skill_accuracy: 0.80; there is no per-skill trigger breakdown in its TRACE.json. We can't tell you which of the expected skills fired and which were missed. This is exactly the failure mode gad-50 calls out: the trace schema is too lossy to explain a score like this after the fact.
Phase 25 of the GAD framework work ships trace schema v4, which records every tool use, every skill invocation with its trigger context, and every subagent spawn with its inputs and outputs. Older runs like this one keep their aggregate score, but new runs will land with the full breakdown.
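To make the gap concrete, here is a hypothetical shape for a v4 skill-invocation record and how a per-skill breakdown would let the 0.80 explain itself. Every field and class name here is an assumption for illustration, not the shipped v4 schema.

```python
from dataclasses import dataclass, field

@dataclass
class SkillInvocation:
    # All field names are hypothetical, not the real v4 schema.
    skill: str            # which skill fired
    trigger_context: str  # the event or text that caused the invocation

@dataclass
class TraceV4:
    run_id: str
    skill_invocations: list[SkillInvocation] = field(default_factory=list)

    def skill_accuracy(self, expected_skills: set[str]) -> float:
        """Fraction of expected skills that actually fired at least once."""
        if not expected_skills:
            return 1.0
        fired = {inv.skill for inv in self.skill_invocations}
        return len(fired & expected_skills) / len(expected_skills)
```

With records like these, a 0.80 stops being opaque: the set difference between expected skills and fired skills names exactly which invocation was missed, and each hit carries the trigger context that caused it.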