cli-efficiency

Measures the token efficiency of the gad CLI versus raw file reads when building context for a coding agent. Compares two workflows: (1) CLI-first, using gad context/session/state/phases/tasks, and (2) the baseline grep+read pattern used by GSD/RP agents.
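As a rough illustration of what such a comparison could compute, the sketch below expresses the saving of the CLI-first workflow as a fraction of the baseline's token count. The function name, the formula, and the token counts are all assumptions made for illustration; the project's actual metric is whatever its own config and methodology define.

```python
def token_reduction(cli_tokens: int, baseline_tokens: int) -> float:
    """Fraction of baseline tokens saved by the CLI-first workflow.

    Assumed definition: 1 - cli / baseline. Not taken from the project.
    """
    if baseline_tokens <= 0:
        raise ValueError("baseline_tokens must be positive")
    return 1.0 - cli_tokens / baseline_tokens


if __name__ == "__main__":
    # Hypothetical token counts, for illustration only.
    print(token_reduction(cli_tokens=2_500, baseline_tokens=10_000))  # 0.75
```

A negative result under this definition would mean the CLI-first workflow used more tokens than the baseline.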

Catalog scope

What skills this project can use

This project has no framework skills in its bootstrap set. The agent must author its own methodology from scratch.

Requirements history

5 versions — each change triggers a new round

5 of 5

2026-04-09

Current

core shift from v4

Playtest-driven expansion. v4 was a designer rewrite; v5 comes entirely from user play of Bare v5 (0.805 rubric), Emergent v4 (rescored to 0.885 after user beat floor 1), GAD v9 (rate-limited), and GAD v10 (api-interrupted). Everything in v4 still applies — v5 adds 21 new/amended requirements (R-v5.01..21) on top as a structured `<addendum>` section inside the same template XML.

changes from v4

- **R-v5.01** Training via encounter, not menu: affinity rises from casting, not selecting. Training Dummy encounter room type.
- **R-v5.02** Rune discovery as a gameplay loop: starter subset only, rest found in world, one rune per floor gated behind non-combat.
- **R-v5.03** Merchants with buy/sell/trade: at least one per floor, gold as tracked resource.
- **R-v5.04** NPC dialogue with branching outcomes: 3+ NPCs, 2+ branches each, choices change game state.
- **R-v5.05** Inventory/bag with grid + equippable items: weapon/off-hand/body/trinket slots affecting stats.
- **R-v5.06** Visible character sheet + skill tree: physical/combat skills separate from spells, distinct resource.
- **R-v5.07** Spell and skill loadout slots: forced specialization as a build-pressure mechanic.
- **R-v5.08** Progression sources sufficient to reach end boss (amends G1): guaranteed mana-max / spell-power upgrade per floor.
- **R-v5.09** Save checkpoints + continue-after-death (amends G1): Continue must never hard-brick.
- **R-v5.10** Notification lifecycle (amends G3): auto-dismiss, clear on new game, no persistence across sessions.
- **R-v5.11** Rest rooms must offer rest: forge rooms combining Forge+Train+Rest must expose Rest as an action.
- **R-v5.12** Navigation and map usability (amends G3): minimum 2D graph layout, one-click navigation.
- **R-v5.13** Combat model must be explicitly chosen: **Model A (rule-based simulation, Unicorn-Overlord-style)** preferred over direct-control.
- **R-v5.14** Action policies driven by entity traits: applies to enemies AND NPCs; dialogue changes with trait shifts.
- **R-v5.15** Real-time game-time model: 1hr real = 1 day game, remove tick system, UI time-shading is soft.
- **R-v5.16** Affinity reward loop: visible payoff for boosting a rune, not just a hidden stat.
- **R-v5.17** Central visual navigation map with player token (stronger form of R-v5.12).
- **R-v5.18** Visual player vs enemy identity: Pokemon / Unicorn Overlord style (UO preferred).
- **R-v5.19** Spells as craftable ingredients: spells + runes both combine, procedural-but-semantic naming. Explicitly mirrors the emergent-evolution hypothesis (gad-68) as an in-game narrative analogue.
- **R-v5.20** Rune uniqueness within a single spell: bug fix for Bare v5's double-affinity exploit.
- **R-v5.21** Event-driven rendering: kill the per-tick redraw glitches observed across ALL round-4 builds.
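The time model in R-v5.15 (one real hour equals one game day, with no tick system) reduces to a direct conversion from elapsed real time. The sketch below is one possible reading: the function name and the 24-hour game day are assumptions, not part of the requirement text.

```python
# R-v5.15: 1 hour of real time corresponds to 1 game day.
REAL_SECONDS_PER_GAME_DAY = 3600
# Assumption: a game day is divided into 24 game hours.
GAME_HOURS_PER_DAY = 24


def game_clock(elapsed_real_seconds: float) -> tuple[int, float]:
    """Return (completed game days, current game hour of day).

    Derived from elapsed real time on demand, so no per-tick counter is needed.
    """
    if elapsed_real_seconds < 0:
        raise ValueError("elapsed_real_seconds must be non-negative")
    days, remainder = divmod(elapsed_real_seconds, REAL_SECONDS_PER_GAME_DAY)
    hour_of_day = remainder / REAL_SECONDS_PER_GAME_DAY * GAME_HOURS_PER_DAY
    return int(days), hour_of_day


if __name__ == "__main__":
    # 90 real minutes: one full game day plus half of the next.
    print(game_clock(90 * 60))  # (1, 12.0)
```

Because the clock is computed from a timestamp difference, a renderer could query it only when an event fires, which fits the event-driven rendering goal in R-v5.21.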

scoring impact

- v4 gates remain (G1, G2, G3, G4). v5 amendments tighten G1 (death/continue, end-boss reachable), G2 (training is encounter-driven), G3 (notification lifecycle, map usability).
- New scored dimensions: `inventory_and_equipment_present`, `npc_dialogue_present`. These are not gates, but meaningful score hits if missing.
- Rubric weights unchanged from v4.

deferred to v6

- Deep evolution trees (multi-stage mutations).
- Rune affinity decay when unused.
- Multi-character party play (out of scope for the escape-the-dungeon family).

brownfield vs greenfield

- Greenfield v5 applies to escape-the-dungeon, escape-the-dungeon-bare, escape-the-dungeon-emergent (same three templates all updated together).
- Brownfield v5 extensions are not yet authored; round 5 starts greenfield-first.

round 5 unblock

This version is the trigger for round 5. Round 5 runs serially (gad-67) against this requirements set. An HTTP 529 investigation and a GAD v11 retry are queued; see the task registry.

source

`evals/_v5-requirements-addendum.md` holds the prose version with full user rationale quotes. The XML addendum in each template is the machine-readable form.

decision references

gad-65 (CSH), gad-68 (emergent-evolution), gad-71 (data/ pseudo-database for bug tracking), gad-72 (rounds framework — this is now round 5), gad-73 (fundamental skills triumvirate — the R-v5.19 spells-as-ingredients mechanic is the in-game analogue).

Changed from v4

The requirements version change from v4 to v5 defines a new round boundary.

Scoring weights

How this project is scored

Defined in evals/cli-efficiency/gad.json. The composite score is a weighted sum of these dimensions. See /methodology for the formula and caps.

| Dimension | Weight |
| --- | --- |
| token_reduction | 0.40 |
| context_completeness | 0.35 |
| information_loss | 0.25 |
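The weighted sum described above can be sketched as follows. The weights are the ones listed in the table; everything else is an assumption: the function name is hypothetical, each dimension score is assumed to be already normalized to the range 0 to 1 with higher meaning better, and the caps mentioned under /methodology are not modeled here.

```python
import math

# Weights as listed in the scoring table (source: evals/cli-efficiency/gad.json).
WEIGHTS = {
    "token_reduction": 0.40,
    "context_completeness": 0.35,
    "information_loss": 0.25,
}


def composite_score(scores: dict[str, float]) -> float:
    """Weighted sum of per-dimension scores.

    Assumes scores are pre-normalized to [0, 1]; caps are NOT applied.
    """
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise KeyError(f"missing dimension scores: {sorted(missing)}")
    return sum(weight * scores[name] for name, weight in WEIGHTS.items())


if __name__ == "__main__":
    # Hypothetical per-dimension scores, for illustration only.
    example = {
        "token_reduction": 0.8,
        "context_completeness": 0.9,
        "information_loss": 0.6,
    }
    print(round(composite_score(example), 3))
    # The weights sum to 1, so a perfect run scores 1 before any caps.
    assert math.isclose(sum(WEIGHTS.values()), 1.0)
```

Since the three weights sum to 1.0, the uncapped composite stays in the same 0 to 1 range as its inputs.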