Bare · v3 · requirements unknown · 2026-04-07 · pre-gate requirements

portfolio-bare/v3

Composite score

0.944

Dimension scores

Where the composite came from

Each dimension is scored from 0.0 to 1.0 and combined using the weights in evals/portfolio-bare/gad.json. Human review dominates on purpose: process metrics alone can't rescue a broken run.

Dimension             Score
Planning quality      1.000
Per-task discipline   1.000
Skill accuracy        0.800
Time efficiency       0.950

Composite formula

How 0.944 was calculated

The composite score is a weighted sum of the dimensions above. Weights come from evals/portfolio-bare/gad.json. Each dimension's contribution is score × weight, and the dimensions are sorted by contribution so you can see which ones moved the composite most.

Dimension              Weight   Score   Contribution
requirement_coverage   0.40     0.000   0.0000 (0%)
task_alignment         0.25     0.000   0.0000 (0%)
state_hygiene          0.20     0.000   0.0000 (0%)
decision_coverage      0.15     0.000   0.0000 (0%)
Weighted sum           1.00             0.0000
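The arithmetic behind the table can be sketched in a few lines of Python. This is a minimal illustration, not the scorer itself: the weights and scores are copied from the table, the dictionary layout is an assumption (the actual structure of gad.json is not shown here), and the function names are hypothetical. The v3 low-score cap is deliberately left out.

```python
# Weights and scores copied from the table above. The dict layout is an
# assumption; the real values live in evals/portfolio-bare/gad.json.
WEIGHTS = {
    "requirement_coverage": 0.40,
    "task_alignment": 0.25,
    "state_hygiene": 0.20,
    "decision_coverage": 0.15,
}
SCORES = {name: 0.0 for name in WEIGHTS}  # every dimension scored 0.000 in this run


def contributions(weights, scores):
    """Return (dimension, weight, score, contribution) rows, largest contribution first."""
    rows = [
        (name, weight, scores[name], weight * scores[name])
        for name, weight in weights.items()
    ]
    return sorted(rows, key=lambda row: row[3], reverse=True)


def weighted_sum(weights, scores):
    """Sum of score x weight over all dimensions (no low-score cap applied)."""
    return sum(weight * scores[name] for name, weight in weights.items())


rows = contributions(WEIGHTS, SCORES)
total = weighted_sum(WEIGHTS, SCORES)

for name, weight, score, contribution in rows:
    print(f"{name:<22} {weight:.2f}  {score:.3f}  {contribution:.4f}")
print(f"{'Weighted sum':<22} {sum(WEIGHTS.values()):.2f}         {total:.4f}")
```

With every score at 0.000 the weighted sum is 0.0000 no matter how the weights are set, which is why this table cannot by itself account for the stored composite.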

Note: The weighted sum above (0.0000) does not match the stored composite (0.9440). A mismatch usually comes from the v3 low-score cap (composite < 0.20 → 0.40, composite < 0.10 → 0.25) or from a run scored with an older scoring pass.

Skill accuracy breakdown

Did the agent invoke the right skills at the right moments?

Tracing gap

This run stored only the aggregate skill_accuracy: 0.80; its TRACE.json has no per-skill trigger breakdown, so there is no way to tell which of the expected skills fired and which were missed. This is exactly the failure mode gad-50 calls out: the trace schema is too lossy to explain scores like this after the fact.

Phase 25 of the GAD framework work ships trace schema v4, which records every tool use, every skill invocation with its trigger context, and every subagent spawn with its inputs and outputs. Older runs like this one will keep their aggregate score, but new runs will land with the full breakdown.
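To make concrete what a per-skill breakdown adds over a single aggregate, here is a rough Python sketch. Everything in it is an assumption for illustration: the SkillTrigger record, its field names, and the definition of accuracy as "expected skills that fired divided by expected skills" are hypothetical and are not taken from trace schema v4 or from this run's TRACE.json. The skill names are placeholders.

```python
from dataclasses import dataclass


@dataclass
class SkillTrigger:
    """Hypothetical per-skill record; not the actual trace schema."""
    skill: str      # placeholder skill name
    expected: bool  # the eval expected this skill to be invoked
    fired: bool     # the agent actually invoked it


def skill_breakdown(triggers):
    """Split expected skills into fired vs missed and derive an aggregate.

    Assumes accuracy = expected skills that fired / expected skills.
    """
    expected = [t for t in triggers if t.expected]
    fired = sorted(t.skill for t in expected if t.fired)
    missed = sorted(t.skill for t in expected if not t.fired)
    accuracy = len(fired) / len(expected) if expected else None
    return {"accuracy": accuracy, "fired": fired, "missed": missed}


# Placeholder records, NOT data from this run.
example = [
    SkillTrigger("skill_a", expected=True, fired=True),
    SkillTrigger("skill_b", expected=True, fired=True),
    SkillTrigger("skill_c", expected=True, fired=True),
    SkillTrigger("skill_d", expected=True, fired=True),
    SkillTrigger("skill_e", expected=True, fired=False),
]

report = skill_breakdown(example)
print(report)
```

Under these assumptions, an aggregate like 0.80 could correspond to four of five expected skills firing, but the aggregate alone cannot say which one was missed. That missing detail is what a per-skill record would supply.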

Process metrics

How the agent actually worked

Primary runtime: older runs may not carry runtime attribution yet
Agent lanes: 0 (0 root · 0 subagent · source missing)
Observed depth: 0 traced event(s) with agent lineage
Wall clock: 24m (5 phases · 12 tasks)
Started: Apr 7, 3:42 AM (run start captured in TRACE timing metadata)
Ended: Apr 7, 4:06 AM (a missing end time usually means the run was scaffolded but never finalized)
Tool uses: 140 (90,181 tokens)
Commits: 13 (12 with task id · 1 batch)
Planning docs: 0 decisions captured · 5 phases planned