The framework

GAD is planning + evaluation for AI coding agents.

A small CLI, a strict five-step loop, and an experiment harness stapled to the side. The CLI re-hydrates context in one command. Skills tell the agent what to do. Subagents do the expensive work off the main thread. Templates scaffold new projects. Runtime-specific command wrappers are generated only when a coding agent needs them. Evals measure whether any of this actually helps. Read the loop diagram for how it fits together.


Core concepts

Three moving parts

Skills

Methodology docs the agent follows. Official consumer skills live under sdk/skills and are the installable public surface; root skills are internal methodology for evolving GAD itself.

-> browse skills

Subagents

Specialised workers the framework spawns for planning, research, verification, UI audits, and more. Subagents receive a task plus context and return a concrete artifact such as PLAN.md, RESEARCH.md, or VERIFICATION.md.

-> browse subagents

Templates

The planning-doc scaffolding and workflow templates the CLI uses to bootstrap new projects. These live under sdk/templates and ship with the framework's runtime assets.

-> download pack

Skill bootstrap sets

Framework-level vs eval-inherited

GAD ships 93 official skills as the canonical consumer/runtime surface. Eval projects (bare, emergent) do not get the full framework, however: they start with a minimal bootstrap set copied into their template/skills/ directory. The rest of the framework is withheld by design so we can see what these projects build without it.
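As a rough illustration of the bootstrap idea, the sketch below copies only a named subset of skills into a project's template/skills/ directory and leaves everything else behind. It is not GAD's actual implementation: the function name copy_bootstrap_skills, its parameters, and the one-directory-per-skill layout are all assumptions made for the example.

```python
import shutil
from pathlib import Path


def copy_bootstrap_skills(skills_root: Path, project_dir: Path, bootstrap: list[str]) -> list[str]:
    """Copy only the named skills into <project_dir>/template/skills/.

    Hypothetical helper: assumes each skill is a directory under skills_root.
    Returns the names that were actually copied; everything else is withheld.
    """
    target = project_dir / "template" / "skills"
    target.mkdir(parents=True, exist_ok=True)

    copied = []
    for name in bootstrap:
        src = skills_root / name
        if not src.is_dir():
            # Skip names that do not exist in the source tree.
            continue
        # dirs_exist_ok lets the copy be re-run without failing.
        shutil.copytree(src, target / name, dirs_exist_ok=True)
        copied.append(name)
    return copied
```

Calling it with a bootstrap list such as ["create-skill"] would leave the eval project with just that skill, which matches the "withheld by design" setup described above.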

bootstrap

Inherited by bare + emergent (10)

create-skill

Capture a reusable pattern, recipe, or failure-mode fix as a skill document so future agents (including you after a context reset) can apply it without rediscovering it. Use this skill whenever you solve a non-obvious problem, discover a working pattern after two or more failed attempts, hit a bug whose fix isn't self-evident from the code, or finish a piece of work that future runs will likely repeat. Write the skill the moment you learn the lesson, not at the end. In bare/emergent eval conditions this is the primary mechanism for agent-authored methodology. The agent IS the workflow author, and skills are how that authorship persists.

inherited by: escape-the-dungeon, escape-the-dungeon-bare, escape-the-dungeon-emergent, escape-the-dungeon-gad-emergent, escape-the-dungeon-planning-only

gad:debug

Systematic debugging using the scientific method — form hypotheses, test them, eliminate dead ends, and find root causes. Use this skill whenever the user reports a bug, unexpected behavior, test failure, build error, or anything that "should work but doesn't." Also use it when execution hits an unexpected blocker mid-phase, when a verification command produces confusing output, or when multiple debugging attempts have failed and you need a structured approach. Maintains a persistent debug session file so investigation survives context resets.