
gad:eval-run

gad-eval-run

Run an eval project in an isolated git worktree

Execute a GAD eval project in an isolated git worktree. Each run gets its own worktree, a numbered output directory with a timestamped RUN.md, and an agent stub invocation. The worktree is removed after the run, and results are saved in `evals/<name>/v<N>/`.

Parse from $ARGUMENTS:

  • --project <name> (required)
  • --baseline <sha> (optional, defaults to HEAD)
  • --agent <model> (optional, defaults to the configured default model)
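The skill does not show how `$ARGUMENTS` is split into these flags. Here is a minimal bash sketch. It assumes `$ARGUMENTS` is one whitespace-separated string and that no flag value contains spaces. `parse_eval_args` is a hypothetical helper, and `GAD_DEFAULT_MODEL` is a hypothetical stand-in for "the configured default model":

```bash
# Hypothetical helper: sets PROJECT, BASELINE and AGENT_MODEL from a flag string.
# Assumes values contain no whitespace; GAD_DEFAULT_MODEL is a hypothetical env var.
parse_eval_args() {
  local -a args
  read -r -a args <<< "$1"          # split on whitespace without glob expansion
  PROJECT=""
  BASELINE="HEAD"                   # documented default
  AGENT_MODEL="${GAD_DEFAULT_MODEL:-}"
  local i=0
  while [ "$i" -lt "${#args[@]}" ]; do
    local flag="${args[$i]}"
    local value="${args[$((i + 1))]:-}"
    case "$flag" in
      --project|--baseline|--agent)
        if [ -z "$value" ]; then
          echo "Missing value for $flag" >&2
          return 1
        fi
        case "$flag" in
          --project)  PROJECT="$value" ;;
          --baseline) BASELINE="$value" ;;
          --agent)    AGENT_MODEL="$value" ;;
        esac
        i=$((i + 2))                # consume the flag and its value
        ;;
      *)
        echo "Unknown argument: $flag" >&2
        return 1
        ;;
    esac
  done
  # --project is the only required flag
  if [ -z "$PROJECT" ]; then
    echo "--project <name> is required" >&2
    return 1
  fi
}
```

Call it once as `parse_eval_args "$ARGUMENTS" || exit 1`. That sets the variables the steps below rely on.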
  1. Validate that the project exists:

    [ -d "evals/$PROJECT" ] || echo "NOT_FOUND"
    

    If not found, report: Eval project '<name>' not found. Run `gad eval list` to see available projects.

  2. Read project REQUIREMENTS.md:

    cat "evals/$PROJECT/REQUIREMENTS.md"
    
  3. Compute the next run number: take the highest N among existing evals/<name>/v<N>/ directories and add 1 (start at 1 if there are none).

  4. Create git worktree:

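    BASELINE="${BASELINE:-HEAD}"
    # Abort before creating anything if the baseline does not resolve to a commit
    git rev-parse --verify --quiet "${BASELINE}^{commit}" >/dev/null || { echo "Invalid baseline: $BASELINE" >&2; exit 1; }
    WORKTREE_PATH=$(mktemp -d "/tmp/gad-eval-XXXXXX")
    git worktree add --detach "$WORKTREE_PATH" "$BASELINE"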
    
  5. Scaffold output directory:

    mkdir -p "evals/$PROJECT/v$RUN_NUM"
    cat > "evals/$PROJECT/v$RUN_NUM/RUN.md" << EOF
    # Eval Run v$RUN_NUM
    
    project: $PROJECT
    baseline: $BASELINE
    started: $(date -u +%Y-%m-%dT%H:%M:%SZ)
    agent: $AGENT_MODEL
    status: running
    EOF
    
  6. Execute the agent (stub) in the worktree. Use a subshell so the caller stays in the main repo, where steps 7 and 8 expect to run:

    (
      cd "$WORKTREE_PATH"
      # Stub: agent invocation goes here
      # Full implementation in gad-eval plan
      echo "STUB: agent run for $PROJECT" > eval-output.txt
    )
    
  7. Collect results: Copy relevant output files from worktree to evals/<name>/v<N>/:

    cp "$WORKTREE_PATH/eval-output.txt" "evals/$PROJECT/v$RUN_NUM/"
    
  8. Remove worktree:

    git worktree remove --force "$WORKTREE_PATH"
    
  9. Update RUN.md with the result: set status: completed and add ended: <timestamp>.

  10. Summary:

    ✓ Eval run complete
    
    Project:  $PROJECT
    Run:      v$RUN_NUM
    Baseline: $BASELINE
    Output:   evals/$PROJECT/v$RUN_NUM/
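Step 3 has no code. Here is one way to compute the run number in bash. It assumes run directories are named `v` followed by digits only; anything else, such as `vnext`, is skipped. `next_run_num` is a hypothetical helper:

```bash
# Hypothetical helper: print (highest existing v<N>) + 1 for a project directory, or 1 if none.
next_run_num() {
  local project_dir="$1"
  local max=0 dir n
  for dir in "$project_dir"/v*/; do
    [ -d "$dir" ] || continue       # the glob matched nothing
    n="${dir%/}"                    # drop the trailing slash
    n="${n##*/v}"                   # keep what follows the last "/v"
    case "$n" in
      ''|*[!0-9]*) continue ;;      # not a numbered run directory
    esac
    if [ "$((10#$n))" -gt "$max" ]; then
      max=$((10#$n))                # 10# avoids octal parsing of e.g. "08"
    fi
  done
  echo $((max + 1))
}
```

Use it as `RUN_NUM=$(next_run_num "evals/$PROJECT")`. Taking the maximum rather than counting directories avoids reusing an existing number when runs have been deleted (for example, v1 and v3 give v4, not v3).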
    
  • Always remove the worktree, even if the run fails (use `trap` in bash).
  • Never modify files in the main worktree during an eval run.
  • If the baseline SHA is invalid, abort before creating the worktree.
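Here is a sketch of the first rule combined with step 9: an `EXIT` trap that always removes the worktree and then records the outcome in RUN.md. `finish_eval_run` and `RUN_MD` are hypothetical names. The sketch assumes `RUN_MD` is set to `evals/$PROJECT/v$RUN_NUM/RUN.md`. On a non-zero exit it writes `failed`, which is a status value the skill itself does not name:

```bash
# Hypothetical cleanup hook. Assumes WORKTREE_PATH and RUN_MD were set by earlier steps.
finish_eval_run() {
  local exit_code=$?                # status of the script (or of the preceding command)
  local status="completed"
  [ "$exit_code" -eq 0 ] || status="failed"   # "failed" is an assumed status value
  # Remove the worktree unless step 8 already did
  if [ -n "${WORKTREE_PATH:-}" ] && [ -d "$WORKTREE_PATH" ]; then
    git worktree remove --force "$WORKTREE_PATH" || { rm -rf "$WORKTREE_PATH"; git worktree prune; }
  fi
  # Step 9: record the final status and end time
  if [ -n "${RUN_MD:-}" ] && [ -f "$RUN_MD" ]; then
    # Rewrite via a temp file to sidestep GNU/BSD differences in `sed -i`
    sed "s/status: running/status: $status/" "$RUN_MD" > "$RUN_MD.tmp" && mv "$RUN_MD.tmp" "$RUN_MD"
    printf 'ended: %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" >> "$RUN_MD"
  fi
  return "$exit_code"
}
trap finish_eval_run EXIT
```

Install the trap right after step 4 creates the worktree, so a failure in any later step still triggers cleanup. The `[ -d ]` check keeps the hook harmless when step 8 has already removed the worktree.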
Source: vendor/get-anything-done/sdk/skills/gad-eval-run/SKILL.md
