escape-the-dungeon-planning-only
Planning-only condition: the agent gets the full .planning/ XML scaffold (ROADMAP, TASK-REGISTRY, DECISIONS, STATE) but only the bootstrap skills (create-skill, find-sprites), with no GAD framework skills. This tests whether the planning structure alone, without skill methodology, improves outcomes over the bare condition. It isolates one variable: does having a roadmap, task registry, and decision log help even without the skills to execute against them?
Catalog scope
What skills this project can use
This project inherits a minimal bootstrap skill set from the framework. The agent can apply these but must author its own methodology beyond them.
create-skill: Capture a reusable pattern, recipe, or failure-mode fix as a skill document so future agents (including you after a context reset) can apply it without rediscovering it. Use this skill whenever you solve a non-obvious problem, discover a working pattern after two or more failed attempts, hit a bug whose fix isn't self-evident from the code, or finish a piece of work that future runs will likely repeat. Write the skill the moment you learn the lesson, not at the end. In bare/emergent eval conditions this is the primary mechanism for agent-authored methodology. The agent IS the workflow author, and skills are how that authorship persists.
find-sprites: Source visual assets (sprites, icons, tilesets, portraits) for a game or UI-heavy build in a way that yields a coherent, intentional look instead of a debug-console aesthetic. Use this skill when you need art for rooms, entities, UI controls, HP bars, spell icons, status effects, or any other visual element; when the build is failing its UI-quality gate because it looks unintentional; or when you're about to fall back to raw ASCII/text UI. The goal is "looks like someone designed it": not photorealism, not 1:1 with AAA games, but internally consistent and readable.
Requirements history
5 versions — each change triggers a new round
core shift from v4
changes from v4
scoring impact
deferred to v6
brownfield vs greenfield
round 5 unblock
source
decision references
Changed from v4
Requirements version change from v4 → v5 defines a new round boundary.
Known bugs
1 bug reported
Glitchy redraws on button clicks across all round-4 builds
Found in escape-the-dungeon/v10
UI visibly glitches/flickers on button clicks as if full per-tick redraws are running. Observed consistently across GAD v9, GAD v10, Bare v5, and Emergent v4.
Scoring weights
How this project is scored
Defined in evals/escape-the-dungeon-planning-only/gad.json. The composite score is a weighted sum of these dimensions. See /methodology for the formula and caps.
| Dimension | Weight |
|---|---|
| human_review | 0.30 |
| requirement_coverage | 0.20 |
| implementation_quality | 0.15 |
| skill_reuse | 0.15 |
| workflow_quality | 0.10 |
| iteration_evidence | 0.05 |
| time_efficiency | 0.05 |
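
As a rough illustration of the weighted sum, the sketch below combines per-dimension scores using the weights from the table. It is not the project's scoring code: the function name `composite_score` is hypothetical, each dimension score is assumed to lie in [0, 1], and the caps described in /methodology are not modeled because they are not given here.

```python
# Weights copied from the table above.
WEIGHTS = {
    "human_review": 0.30,
    "requirement_coverage": 0.20,
    "implementation_quality": 0.15,
    "skill_reuse": 0.15,
    "workflow_quality": 0.10,
    "iteration_evidence": 0.05,
    "time_efficiency": 0.05,
}


def composite_score(scores: dict[str, float]) -> float:
    """Plain weighted sum of dimension scores (hypothetical helper).

    Assumes every score is in [0, 1]. Any caps from /methodology
    are NOT applied, since this page does not specify them.
    """
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing dimension scores: {sorted(missing)}")
    return sum(WEIGHTS[dim] * scores[dim] for dim in WEIGHTS)


# Made-up scores, for illustration only.
example = {dim: 0.5 for dim in WEIGHTS}
example["human_review"] = 1.0
result = composite_score(example)
```

Because the weights sum to 1.0, a run that scores 1.0 on every dimension gets a composite of 1.0 under this uncapped sketch, and human_review alone accounts for up to 0.30 of the total.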