escape-the-dungeon-planning-only

Planning-only condition: agent gets the full .planning/ XML scaffold (ROADMAP, TASK-REGISTRY, DECISIONS, STATE) but only bootstrap skills (create-skill, find-sprites) — NO GAD framework skills. Tests whether the planning STRUCTURE alone (without skill methodology) improves outcomes vs bare. Isolates the variable: does having a roadmap + task registry + decision log help, even without the skills to execute against them?

planning.zip (28.9 KB)template.zip (28.9 KB)Source on GitHub

Catalog scope

What skills this project can use

This project inherits a minimal bootstrap skill set from the framework. The agent can apply these but must author its own methodology beyond them.

Bootstrap set — 2 skills inherited

create-skill

>- Capture a reusable pattern, recipe, or failure-mode fix as a skill document so future agents (including you after a context reset) can apply it without rediscovering it. Use this skill whenever you solve a non-obvious problem, discover a working pattern after two or more failed attempts, hit a bug whose fix isn't self-evident from the code, or finish a piece of work that future runs will likely repeat. Write the skill the moment you learn the lesson — not at the end. In bare/emergent eval conditions this is the primary mechanism for agent-authored methodology. The agent IS the workflow author, and skills are how that authorship persists.

find-sprites

Source visual assets (sprites, icons, tilesets, portraits) for a game or UI-heavy build in a way that yields a coherent, intentional look instead of a debug-console aesthetic. Use this skill when you need art for rooms, entities, UI controls, HP bars, spell icons, status effects, or any other visual element; when the build is failing its UI-quality gate because it looks unintentional; or when you're about to fall back to raw ASCII/text UI. The goal is "looks like someone designed it" — not photorealism, not 1:1 with AAA games, but internally consistent and readable.

Requirements history

5 versions — each change triggers a new round

5 of 5

2026-04-09

Current

core shift from v4

Playtest-driven expansion. v4 was a designer rewrite; v5 comes entirely from user play of Bare v5 (0.805 rubric), Emergent v4 (rescored to 0.885 after user beat floor 1), GAD v9 (rate-limited), and GAD v10 (api-interrupted). Everything in v4 still applies — v5 adds 21 new/amended requirements (R-v5.01..21) on top as a structured `<addendum>` section inside the same template XML.

changes from v4

- **R-v5.01** Training via encounter, not menu — affinity rises from casting, not selecting. Training Dummy encounter room type. - **R-v5.02** Rune discovery as a gameplay loop — starter subset only, rest found in world, one rune per floor gated behind non-combat. - **R-v5.03** Merchants with buy/sell/trade — at least one per floor, gold as tracked resource. - **R-v5.04** NPC dialogue with branching outcomes — 3+ NPCs, 2+ branches each, choices change game state. - **R-v5.05** Inventory/bag with grid + equippable items — weapon/off-hand/body/trinket slots affecting stats. - **R-v5.06** Visible character sheet + skill tree — physical/combat skills separate from spells, distinct resource. - **R-v5.07** Spell and skill loadout slots — forced specialization as a build-pressure mechanic. - **R-v5.08** Progression sources sufficient to reach end boss (amends G1) — guaranteed mana-max / spell-power upgrade per floor. - **R-v5.09** Save checkpoints + continue-after-death (amends G1) — Continue must never hard-brick. - **R-v5.10** Notification lifecycle (amends G3) — auto-dismiss, clear on new game, no persistence across sessions. - **R-v5.11** Rest rooms must offer rest — forge rooms combining Forge+Train+Rest must expose Rest as an action. - **R-v5.12** Navigation and map usability (amends G3) — minimum 2D graph layout, one-click navigation. - **R-v5.13** Combat model must be explicitly chosen — **Model A (rule-based simulation, Unicorn-Overlord-style)** preferred over direct-control. - **R-v5.14** Action policies driven by entity traits — applies to enemies AND NPCs; dialogue changes with trait shifts. - **R-v5.15** Real-time game-time model — 1hr real = 1 day game, remove tick system, UI time-shading is soft. - **R-v5.16** Affinity reward loop — visible payoff for boosting a rune, not just a hidden stat. - **R-v5.17** Central visual navigation map with player token (stronger form of R-v5.12). - **R-v5.18** Visual player vs enemy identity — Pokemon / Unicorn Overlord style (UO preferred). - **R-v5.19** Spells as craftable ingredients — spells + runes both combine, procedural-but-semantic naming. Explicitly mirrors the emergent-evolution hypothesis (gad-68) as an in-game narrative analogue. - **R-v5.20** Rune uniqueness within a single spell — bug fix for Bare v5's double-affinity exploit. - **R-v5.21** Event-driven rendering — kill the per-tick redraw glitches observed across ALL round-4 builds.

scoring impact

- v4 gates remain (G1, G2, G3, G4). v5 amendments tighten G1 (death/continue, end-boss reachable), G2 (training is encounter-driven), G3 (notification lifecycle, map usability). - New scored dimensions: `inventory_and_equipment_present`, `npc_dialogue_present` — not gates, but meaningful score hits if missing. - Rubric weights unchanged from v4.

deferred to v6

- Deep evolution trees (multi-stage mutations). - Rune affinity decay when unused. - Multi-character party play — out of scope for escape-the-dungeon family.

brownfield vs greenfield

- Greenfield v5 applies to escape-the-dungeon, escape-the-dungeon-bare, escape-the-dungeon-emergent (same three templates all updated together). - Brownfield v5 extensions are not yet authored — round 5 starts greenfield-first.

round 5 unblock

This version is the trigger for round 5. Round 5 runs serially (gad-67) against this requirements set. HTTP 529 investigation + GAD v11 retry queued, see task registry.

source

`evals/_v5-requirements-addendum.md` holds the prose version with full user rationale quotes. The XML addendum in each template is the machine-readable form.

decision references

gad-65 (CSH), gad-68 (emergent-evolution), gad-71 (data/ pseudo-database for bug tracking), gad-72 (rounds framework — this is now round 5), gad-73 (fundamental skills triumvirate — the R-v5.19 spells-as-ingredients mechanic is the in-game analogue).

Changed from v4

Requirements version change from v4 → v5 defines a new round boundary.

Known bugs

1 bug reported

open

Glitchy redraws on button clicks across all round-4 builds

Found in escape-the-dungeon/v10

UI visibly glitches/flickers on button clicks as if full per-tick redraws are running. Observed consistently across GAD v9, GAD v10, Bare v5, and Emergent v4.

Scoring weights

How this project is scored

Defined in evals/escape-the-dungeon-planning-only/gad.json. The composite score is a weighted sum of these dimensions. See /methodology for the formula and caps.

Dimension	Weight
human_review	0.30
requirement_coverage	0.20
implementation_quality	0.15
skill_reuse	0.15
workflow_quality	0.10
iteration_evidence	0.05
time_efficiency	0.05