eval project
tooling
subagent-utility
Formal evaluation of subagent utility vs single-session — informs gad-16 revision
Catalog scope
What skills this project can use
This project has no framework skills in its bootstrap set. The agent must author its own methodology from scratch.
Requirements history
5 versions — each change triggers a new round
v5
2026-04-09 (Current)
core shift from v4
Playtest-driven expansion. v4 was a designer rewrite; v5 comes entirely from user play of Bare v5 (0.805 rubric), Emergent v4 (rescored to 0.885 after user beat floor 1), GAD v9 (rate-limited), and GAD v10 (api-interrupted). Everything in v4 still applies — v5 adds 21 new/amended requirements (R-v5.01..21) on top as a structured `<addendum>` section inside the same template XML.
changes from v4
- **R-v5.01** Training via encounter, not menu — affinity rises from casting, not selecting. Training Dummy encounter room type.
- **R-v5.02** Rune discovery as a gameplay loop — starter subset only, rest found in world, one rune per floor gated behind non-combat.
- **R-v5.03** Merchants with buy/sell/trade — at least one per floor, gold as tracked resource.
- **R-v5.04** NPC dialogue with branching outcomes — 3+ NPCs, 2+ branches each, choices change game state.
- **R-v5.05** Inventory/bag with grid + equippable items — weapon/off-hand/body/trinket slots affecting stats.
- **R-v5.06** Visible character sheet + skill tree — physical/combat skills separate from spells, distinct resource.
- **R-v5.07** Spell and skill loadout slots — forced specialization as a build-pressure mechanic.
- **R-v5.08** Progression sources sufficient to reach end boss (amends G1) — guaranteed mana-max / spell-power upgrade per floor.
- **R-v5.09** Save checkpoints + continue-after-death (amends G1) — Continue must never hard-brick.
- **R-v5.10** Notification lifecycle (amends G3) — auto-dismiss, clear on new game, no persistence across sessions.
- **R-v5.11** Rest rooms must offer rest — forge rooms combining Forge+Train+Rest must expose Rest as an action.
- **R-v5.12** Navigation and map usability (amends G3) — minimum 2D graph layout, one-click navigation.
- **R-v5.13** Combat model must be explicitly chosen — **Model A (rule-based simulation, Unicorn-Overlord-style)** preferred over direct-control.
- **R-v5.14** Action policies driven by entity traits — applies to enemies AND NPCs; dialogue changes with trait shifts.
- **R-v5.15** Real-time game-time model — 1 real-time hour = 1 in-game day, remove tick system, UI time-shading is soft.
- **R-v5.16** Affinity reward loop — visible payoff for boosting a rune, not just a hidden stat.
- **R-v5.17** Central visual navigation map with player token (stronger form of R-v5.12).
- **R-v5.18** Visual player vs enemy identity — Pokemon / Unicorn Overlord style (UO preferred).
- **R-v5.19** Spells as craftable ingredients — spells + runes both combine, procedural-but-semantic naming. Explicitly mirrors the emergent-evolution hypothesis (gad-68) as an in-game narrative analogue.
- **R-v5.20** Rune uniqueness within a single spell — bug fix for Bare v5's double-affinity exploit.
- **R-v5.21** Event-driven rendering — kill the per-tick redraw glitches observed across ALL round-4 builds.
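As an illustration of how R-v5.01 (affinity rises from casting) and R-v5.20 (rune uniqueness within a single spell) could interact, here is a minimal sketch. The function names, the rune names, and the flat `gain` of 1 per cast are all assumptions made for the example. None of them come from the templates, and the actual cause of Bare v5's double-affinity exploit is not documented here.

```python
from collections import Counter


def validate_spell_runes(runes: list[str]) -> list[str]:
    """Return the runes unchanged if each appears at most once; otherwise raise.

    Hypothetical helper illustrating R-v5.20.
    """
    duplicates = sorted(r for r, n in Counter(runes).items() if n > 1)
    if duplicates:
        raise ValueError(f"duplicate runes in one spell: {duplicates}")
    return runes


def apply_cast(affinity: dict[str, int], runes: list[str], gain: int = 1) -> dict[str, int]:
    """Return a new affinity table after casting a spell built from `runes`.

    Illustrates R-v5.01: affinity changes only as a side effect of casting.
    The `gain` value is an assumed placeholder, not a tuned number.
    """
    validate_spell_runes(runes)
    updated = dict(affinity)  # copy so the caller's table is not mutated
    for rune in runes:
        updated[rune] = updated.get(rune, 0) + gain
    return updated
```

With this shape, a spell listing the same rune twice is rejected before any affinity is granted, so a single cast cannot raise one rune's affinity more than once.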
scoring impact
- v4 gates remain (G1, G2, G3, G4). v5 amendments tighten G1 (death/continue, end-boss reachable), G2 (training is encounter-driven), G3 (notification lifecycle, map usability).
- New scored dimensions: `inventory_and_equipment_present`, `npc_dialogue_present` — not gates, but meaningful score hits if missing.
- Rubric weights unchanged from v4.
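The difference between gates and scored dimensions can be sketched as follows. This is a rough illustration and not the project's rubric. It assumes that any failed gate zeroes the score and that a missing dimension counts as 0. The weights passed in are placeholders, since the real v4 weights are not listed here.

```python
GATES = ("G1", "G2", "G3", "G4")


def score_build(
    gate_results: dict[str, bool],
    dimension_scores: dict[str, float],
    weights: dict[str, float],
) -> float:
    """Combine gate results and weighted dimension scores into one number.

    Assumptions (not taken from the rubric): a failed or unreported gate
    yields 0.0, and the result is a weight-normalised average in [0, 1].
    """
    if not all(gate_results.get(gate, False) for gate in GATES):
        return 0.0
    total_weight = sum(weights.values())
    if total_weight == 0:
        return 0.0
    # A dimension absent from dimension_scores contributes 0, which is the
    # "score hit if missing" behaviour described for the new dimensions.
    weighted = sum(w * dimension_scores.get(name, 0.0) for name, w in weights.items())
    return weighted / total_weight
```

For example, with two equally weighted dimensions, a build that passes every gate but ships no NPC dialogue scores half of what a complete build would on those two dimensions.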
deferred to v6
- Deep evolution trees (multi-stage mutations).
- Rune affinity decay when unused.
- Multi-character party play — out of scope for escape-the-dungeon family.
brownfield vs greenfield
- Greenfield v5 applies to escape-the-dungeon, escape-the-dungeon-bare, escape-the-dungeon-emergent (same three templates all updated together).
- Brownfield v5 extensions are not yet authored — round 5 starts greenfield-first.
round 5 unblock
This version is the trigger for round 5. Round 5 runs serially (gad-67) against this requirements set. HTTP 529 investigation + GAD v11 retry queued, see task registry.
source
`evals/_v5-requirements-addendum.md` holds the prose version with full user rationale quotes. The XML addendum in each template is the machine-readable form.
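To show how the machine-readable form might be consumed, here is a small sketch using Python's standard `xml.etree.ElementTree`. Only the `<addendum>` element and the `R-v5.NN` identifiers come from the text above. The `<template>` root, the `<requirement>` element, and the `id` and `amends` attributes are a hypothetical schema invented for the example, and the real template layout may differ.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Hypothetical sample; the real template XML schema is not shown in this document.
SAMPLE_TEMPLATE = """
<template version="v5">
  <addendum>
    <requirement id="R-v5.09" amends="G1">Save checkpoints + continue-after-death</requirement>
    <requirement id="R-v5.20">Rune uniqueness within a single spell</requirement>
  </addendum>
</template>
""".strip()


def load_addendum(xml_text: str) -> dict[str, dict[str, Optional[str]]]:
    """Map requirement id -> {"amends": gate or None, "text": summary}."""
    root = ET.fromstring(xml_text)
    addendum = root.find("addendum")
    if addendum is None:
        return {}  # template has no addendum section
    return {
        req.get("id"): {
            "amends": req.get("amends"),  # None when the requirement amends no gate
            "text": (req.text or "").strip(),
        }
        for req in addendum.findall("requirement")
    }
```

Calling `load_addendum(SAMPLE_TEMPLATE)` returns a dictionary keyed by requirement id, which makes it straightforward to check, for instance, which requirements amend G1.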
decision references
gad-65 (CSH), gad-68 (emergent-evolution), gad-71 (data/ pseudo-database for bug tracking), gad-72 (rounds framework — this is now round 5), gad-73 (fundamental skills triumvirate — the R-v5.19 spells-as-ingredients mechanic is the in-game analogue).
Changed from v4
Requirements version change from v4 → v5 defines a new round boundary.