eval project
tooling
baseline: v1

cli-efficiency


Measures the token efficiency of the gad CLI versus raw file reads when assembling context for a coding agent. It compares two workflows: (1) CLI-first, using gad context/session/state/phases/tasks, and (2) the baseline grep+read pattern used by GSD/RP agents.
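As a rough illustration of what comparing the two workflows could look like, the sketch below computes a token-reduction ratio from the text each workflow feeds to the agent. Everything in it is an assumption made for illustration: the function names are hypothetical, the 4-characters-per-token estimate stands in for a real tokenizer, and the project's actual measurement method is not described here.

```python
from typing import Iterable


def estimate_tokens(text: str) -> int:
    """Rough token estimate (assumption: about 4 characters per token)."""
    if not text:
        return 0
    return max(1, len(text) // 4)


def token_reduction(cli_outputs: Iterable[str], baseline_outputs: Iterable[str]) -> float:
    """Fraction of tokens saved by the CLI-first workflow relative to the baseline.

    Returns 1 - (cli_tokens / baseline_tokens). A negative value means the
    CLI-first workflow used more tokens than the grep+read baseline.
    """
    cli_tokens = sum(estimate_tokens(t) for t in cli_outputs)
    baseline_tokens = sum(estimate_tokens(t) for t in baseline_outputs)
    if baseline_tokens == 0:
        raise ValueError("baseline produced no tokens; ratio is undefined")
    return 1.0 - cli_tokens / baseline_tokens


if __name__ == "__main__":
    # Hypothetical captured outputs: one CLI call versus several raw file reads.
    cli = ["x" * 400]
    baseline = ["y" * 1000, "z" * 600]
    print(f"token reduction: {token_reduction(cli, baseline):.2f}")
```

A ratio like this only covers the cost side of the comparison; it says nothing about whether the smaller context still contains what the agent needs.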

Catalog scope

What skills this project can use

This project has no framework skills in its bootstrap set. The agent must author its own methodology from scratch.

Requirements history

5 versions — each change triggers a new round

5 of 5
v5
2026-04-09
Current

core shift from v4

Playtest-driven expansion. v4 was a designer rewrite; v5 comes entirely from user play of Bare v5 (0.805 rubric), Emergent v4 (rescored to 0.885 after user beat floor 1), GAD v9 (rate-limited), and GAD v10 (api-interrupted). Everything in v4 still applies — v5 adds 21 new/amended requirements (R-v5.01..21) on top as a structured `<addendum>` section inside the same template XML.

changes from v4

- **R-v5.01** Training via encounter, not menu: affinity rises from casting, not selecting. Training Dummy encounter room type.
- **R-v5.02** Rune discovery as a gameplay loop: starter subset only, rest found in world, one rune per floor gated behind non-combat.
- **R-v5.03** Merchants with buy/sell/trade: at least one per floor, gold as tracked resource.
- **R-v5.04** NPC dialogue with branching outcomes: 3+ NPCs, 2+ branches each, choices change game state.
- **R-v5.05** Inventory/bag with grid + equippable items: weapon/off-hand/body/trinket slots affecting stats.
- **R-v5.06** Visible character sheet + skill tree: physical/combat skills separate from spells, distinct resource.
- **R-v5.07** Spell and skill loadout slots: forced specialization as a build-pressure mechanic.
- **R-v5.08** Progression sources sufficient to reach end boss (amends G1): guaranteed mana-max / spell-power upgrade per floor.
- **R-v5.09** Save checkpoints + continue-after-death (amends G1): Continue must never hard-brick.
- **R-v5.10** Notification lifecycle (amends G3): auto-dismiss, clear on new game, no persistence across sessions.
- **R-v5.11** Rest rooms must offer rest: forge rooms combining Forge+Train+Rest must expose Rest as an action.
- **R-v5.12** Navigation and map usability (amends G3): minimum 2D graph layout, one-click navigation.
- **R-v5.13** Combat model must be explicitly chosen: **Model A (rule-based simulation, Unicorn-Overlord-style)** preferred over direct-control.
- **R-v5.14** Action policies driven by entity traits: applies to enemies AND NPCs; dialogue changes with trait shifts.
- **R-v5.15** Real-time game-time model: 1hr real = 1 day game, remove tick system, UI time-shading is soft.
- **R-v5.16** Affinity reward loop: visible payoff for boosting a rune, not just a hidden stat.
- **R-v5.17** Central visual navigation map with player token (stronger form of R-v5.12).
- **R-v5.18** Visual player vs enemy identity: Pokemon / Unicorn Overlord style (UO preferred).
- **R-v5.19** Spells as craftable ingredients: spells + runes both combine, procedural-but-semantic naming. Explicitly mirrors the emergent-evolution hypothesis (gad-68) as an in-game narrative analogue.
- **R-v5.20** Rune uniqueness within a single spell: bug fix for Bare v5's double-affinity exploit.
- **R-v5.21** Event-driven rendering: kill the per-tick redraw glitches observed across ALL round-4 builds.
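The real-time model in R-v5.15 (1 real hour = 1 game day) amounts to scaling elapsed wall-clock time by 24 instead of counting ticks. The sketch below shows one way that conversion might be written. The function names are hypothetical and none of this is taken from the eval templates.

```python
from datetime import timedelta

# R-v5.15: one real hour corresponds to one game day (24 game hours).
GAME_HOURS_PER_REAL_HOUR = 24


def game_time_elapsed(real_elapsed: timedelta) -> timedelta:
    """Scale real elapsed time into game time (hypothetical helper)."""
    return real_elapsed * GAME_HOURS_PER_REAL_HOUR


def game_day_and_hour(real_elapsed: timedelta) -> tuple[int, int]:
    """Return (completed game days, hour of the current game day)."""
    total_seconds = int(game_time_elapsed(real_elapsed).total_seconds())
    day = total_seconds // 86_400
    hour = (total_seconds % 86_400) // 3_600
    return day, hour


if __name__ == "__main__":
    # 90 real minutes is 36 game hours: one full day plus 12 hours.
    print(game_day_and_hour(timedelta(minutes=90)))
```

Deriving game time from elapsed real time on demand, rather than advancing a counter, also fits the event-driven rendering goal in R-v5.21, since nothing needs to run every tick.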

scoring impact

- v4 gates remain (G1, G2, G3, G4). v5 amendments tighten G1 (death/continue, end-boss reachable), G2 (training is encounter-driven), G3 (notification lifecycle, map usability).
- New scored dimensions: `inventory_and_equipment_present`, `npc_dialogue_present`. These are not gates, but missing them costs meaningful score.
- Rubric weights unchanged from v4.

deferred to v6

- Deep evolution trees (multi-stage mutations).
- Rune affinity decay when unused.
- Multi-character party play (out of scope for the escape-the-dungeon family).

brownfield vs greenfield

- Greenfield v5 applies to escape-the-dungeon, escape-the-dungeon-bare, escape-the-dungeon-emergent (same three templates all updated together).
- Brownfield v5 extensions are not yet authored; round 5 starts greenfield-first.

round 5 unblock

This version is the trigger for round 5. Round 5 runs serially (gad-67) against this requirements set. The HTTP 529 investigation and the GAD v11 retry are queued; see the task registry.

source

`evals/_v5-requirements-addendum.md` holds the prose version with full user rationale quotes. The XML addendum in each template is the machine-readable form.

decision references

gad-65 (CSH), gad-68 (emergent-evolution), gad-71 (data/ pseudo-database for bug tracking), gad-72 (rounds framework — this is now round 5), gad-73 (fundamental skills triumvirate — the R-v5.19 spells-as-ingredients mechanic is the in-game analogue).

Changed from v4

The requirements version change from v4 to v5 defines a new round boundary.

Scoring weights

How this project is scored

Defined in evals/cli-efficiency/gad.json. The composite score is a weighted sum of these dimensions. See /methodology for the formula and caps.

| Dimension | Weight |
| --- | --- |
| token_reduction | 0.40 |
| context_completeness | 0.35 |
| information_loss | 0.25 |
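The weighted sum can be sketched as follows, using the three weights listed above. This is a minimal illustration, not the project's scoring code. It assumes each dimension score is already normalized to the range 0 to 1 with higher meaning better, and it does not model the caps described in /methodology. The function name `composite_score` is hypothetical.

```python
# Weights as listed for cli-efficiency; they sum to 1.0.
WEIGHTS = {
    "token_reduction": 0.40,
    "context_completeness": 0.35,
    "information_loss": 0.25,
}


def composite_score(scores: dict[str, float]) -> float:
    """Weighted sum of per-dimension scores.

    Assumption: every score is in [0, 1] and higher is better.
    Caps from the methodology page are not applied here.
    """
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise KeyError(f"missing dimension scores: {sorted(missing)}")
    return sum(weight * scores[name] for name, weight in WEIGHTS.items())


if __name__ == "__main__":
    # Hypothetical per-dimension scores, for illustration only.
    example = {
        "token_reduction": 0.5,
        "context_completeness": 1.0,
        "information_loss": 0.0,
    }
    print(f"composite: {composite_score(example):.3f}")
```

Because the weights sum to 1.0, a run that scores 1.0 on every dimension gets a composite of 1.0 before any caps are applied.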