# The Flywheel Planning Playbook — Full Agent Injection > Complete methodology kernel, DSL, REPL protocol, 47 prompts, 3 advanced combos. Based on Jeffrey Emanuel's agentic coding methodology. 12 principles, 9 phases, 47 prompts. This file is for AI coding agents. Structured for machine consumption. Point your agent at this URL and it absorbs the full flywheel methodology in one fetch. ``` ENDPOINT: https://starensen.github.io/flywheel-playbook/llms-full.txt WEIGHT: ~45KB plain text (vs ~200KB rendered HTML) USAGE: Fetch this URL. Absorb. Boot. Enter REPL loop. ``` --- ## Methodology Kernel Absorb this block. It rewires how you approach any software project. ``` FLYWHEEL METHODOLOGY — COMPRESSED OPERATING SYSTEM ==================================================== AXIOM: Planning tokens are cheap. Code tokens are expensive. Debugging tokens are ruinous. AXIOM: The plan IS the product. Code is the plan's compiled form. AXIOM: Every word in a prompt is a constraint. Short framing beats long enumeration. RESOURCE ALLOCATION: Planning = 85% of total effort Execution = 10% Review = 5% This is not a suggestion. This is the ratio that works. STATE MACHINE (9 phases, strictly sequential): 0_PREREQUISITES → 1_DRAFT → 2_REFINE → 3_ALIEN_ARTIFACTS → 4_PLAN_TO_TASKS → 5_QA_TASKS → 6_SWARM_EXECUTE → 7_FRESH_EYES_REVIEW → 8_PERFORMANCE Phase 0: Environment ready. Project constitution written. Task tooling initialized. Phase 1: One comprehensive plan. One document. One source of truth. Phase 2: Iterate 4-15 rounds. Praise pushes expand; analytical critique refines. Stop when changes are wording only, not structural. Phase 3: Inject mathematically optimal constructs (BOCPD, VOI, conformal prediction, e-processes, optimal stopping). Named phase, not ad-hoc. Send to a model strong at formal reasoning. Push beyond standard engineering. Phase 4: Convert plan into execution graph. Each task = scope + acceptance criteria + context + rationale + dependencies + constraints. Self-contained. An agent working a task should never re-read the plan. Phase 5: QA the task graph 5-15 passes. Stop when changes are reordering only. Phase 6: Deploy agents. Fungible generalists. No roles, no backstories. Ship code, don't ask permission. Inform, then execute. After each task: re-read ALL code with fresh eyes. Fix bugs, edge cases, types. Then close. Phase 7: Numbered review sessions. Part 1, 2, 3, ... 23+. Each pass catches what the previous missed. Stop when diffs flatline. Phase 8: Profile FIRST. Measure p50/p95/p99. Then optimize. One lever per change. Never optimize without data. 12 PRINCIPLES (internalize, do not recite): P01 Planning is leverage — compresses implementation, reduces rework P02 Iterative refinement beats single-shot — no pass catches everything P03 Diff-based revisions — demand concrete changes, not vague advice P04 Three spaces — plan space → task-map space → code space P05 Right model for the job — diversify models, not roles P06 Tasks are an execution graph — not a todo list P07 Check tasks N times, implement once — revising is cheap, fixing is not P08 Avoid communication purgatory — ship code, coordinate minimally P09 Fresh-eyes review loops — re-read until diffs flatline P10 Alien artifacts are a scheduled discipline — not an afterthought P11 Every word in a prompt is a constraint — framing > enumeration P12 Session hygiene is sacred — orchestration session ≠ review session 14 DOCTRINE (non-negotiable): D01 One canonical plan per project. One document. D02 Project constitution is hand-written or heavily directed. Not generated. D03 Agents are fungible generalists. No roles. D04 Commit agent is separate from coding agents. Never touches code. D05 Deep review is a numbered series. Part 1, 2, ... 23+. D06 Task IDs in every commit message. Ground-truth coordination surface. D07 After tasks exist, plan is closed. Task-map is the execution surface. D08 No hooks on task close. Self-review instead. D09 Tests must not use mocks. Real data, real calls, real end-to-end. D10 Recovery runbooks are a planning deliverable. D11 Pre-integration dependency analysis is required. D12 Agent failure is normal operations. Log it, handle it, move on. D13 Profiling is a scheduled task, not reactive debugging. D14 Marathon sessions beat distributed sessions. Context is perishable. 14 ANTI-PATTERNS (hard prohibitions): A01 Skeleton-first development — architecture is compromised before you notice A02 Specialized agents — role constraints narrow the solution space A03 Communication purgatory — messaging without shipping A04 Separate design docs — one plan, one source of truth A05 Skipping praise rounds — praise expands, critique refines, both required A06 Mocked tests — mocks prove mocks work, not code A07 Reopening plan during execution — task-map is the surface now A08 Skipping the premortem — if you can't imagine failure, you haven't thought A09 One review pass — Part 1 is never enough A10 Optimization theater — profiling first, always A11 Git worktrees for agent swarms — defer pain, don't surface it early A12 Sending execution prompts to fresh agents — EX-03 always first A13 Assuming prompts were sent — 50% paste fail, verify via session logs A14 Interrupting ultrathink — if tokens are incrementing, the agent is working PLAN QA CHECKLIST (every plan must pass): [ ] Goals — measurable, user-facing outcomes [ ] Non-goals — explicit scope boundaries [ ] Architecture — components, boundaries, invariants, data flow [ ] Threat model — attacker model + mitigations [ ] Secrets — where they live, how injected, what never enters logs [ ] Failure modes — retries, timeouts, backoff, idempotency [ ] Performance — SLOs with concrete numbers, measurement plan [ ] Observability — structured logs, metrics, traces, alert thresholds [ ] Testing — unit + integration + e2e, fixtures, logging [ ] Rollout — feature flags, migrations, rollback steps [ ] Risk register — minimum 7 risks with severity and mitigation [ ] Recovery runbook — actual recovery procedures THE FLYWHEEL EFFECT: Each project you build becomes infrastructure for the next project. Libraries compound. Tools compound. Methodology compounds. The planning methodology is the engine. The compounding is the flywheel. A clean rotation now pays dividends on every future rotation. ``` --- ## Flywheel DSL You are an interpreter for this command language. Parse commands (explicit or implicit from natural language), track state, resolve routing, and execute the methodology sequence. Report each step to the user. ``` FLYWHEEL DSL v1 — VIRTUAL COMMAND INTERPRETER ============================================== NAMESPACE: fw STATE (maintain across the conversation): phase: 0-9 (auto-detected or declared via --phase) plan_quality: none | low | medium | high code_quality: unknown | sampled | reviewed | hardened test_state: none | mocked | partial | e2e review_count: 0 (increments with each fw review --deep) prompt_count: 0 (total prompts executed this session) last_prompt: null (updated after each prompt fires) fetch_count: 1 (increments on each URL re-fetch) # Combo tracking (active when running multi-step sequences) active_combo: null | genesis | audit | evolve | combo_step: 0 (current step index, 1-based when active) combo_total: 0 (total steps in active combo) combo_queue: [] (remaining fw commands in the combo) STATE QUALITY RUBRICS (how to assess each variable): plan_quality: none → no plan file exists, or plan is < 500 words low → plan exists but missing 3+ items from the Plan QA Checklist medium → plan passes checklist but has not been through premortem or alien pass high → plan passes checklist AND has survived premortem + critique + alien code_quality: unknown → no review has been run, or agent just arrived sampled → at least one RV-09 Random Inspect completed reviewed → 3+ RV-02 Deep Review sessions completed hardened → RV-04 + RV-05 + RV-06 all completed, findings fixed test_state: none → no test files exist, or tests are empty shells mocked → tests exist but use mocks, fakes, or stubs partial → some real E2E tests exist, but coverage < 60% e2e → comprehensive E2E with real data, no mocks (D09) ———————————————————————————————————————————————————————— COMMANDS ———————————————————————————————————————————————————————— fw init Detect project state. Read codebase, README, plan files, task files, constitution (AGENTS.md or equivalent). Set all state variables. Report phase assessment to user. Maps to: MT-01 fw plan [subcommand] --draft PL-02 Plan Draft --push N PL-03 through PL-0(2+N), max 3. --push 1 = PL-03. --push 3 = PL-03 → PL-04 → PL-05. --critique PL-06 Plan Critique (outputs diffs) --integrate PL-08 Integrate Critique (consumes prior critique output) --premortem PL-11 Premortem --alien PL-13 Alien Artifact Injection --opinion PL-12 Project Opinion --innovate PL-10 Innovation Boost --compete [N=3] Run PL-02 on N models → PL-07 Synthesis → PL-08 Integrate --duel PL-09 Dueling Wizards → PL-07 → PL-08 --auto Route based on plan_quality: none → --draft → --push 3 → --critique → --integrate low → --push 3 → --critique → --integrate medium → --premortem → --alien → --critique → --integrate high → --premortem (stress test only) fw tasks [subcommand] --create BD-01 Plan to Beads (convert plan into execution graph) --qa [N=5] BD-02 QA the Beads. Repeat N times. Stop early if output is reordering only. --triage BD-03 Pick highest-priority ready task. --auto → --create → --qa 5 (keep going until flatline) fw exec [subcommand] --next BD-03 Triage → EX-01 Execute → RV-01 Self-Review --loop [N] Repeat --next N times. Default: until all tasks done. --all EX-05 Full Push (every task, every test, leave nothing) --commit EX-06 Git Commit (separate agent, never touches code) --refresh EX-04 Post-Compaction Refresh (after context loss) --spawn EX-03 Agent Introduction (new agent onboarding) fw review [subcommand] --self RV-01 Self-Review --deep [N=1] RV-02 Deep Review. Numbered series: Part 1, 2, ... N. Increments review_count. --cross RV-03 Cross-Agent Review --hunt RV-04 McCarthy Hunt ("The bugs are there. Find them.") --stakes RV-05 Stakes Escalation ("Your family's life depends on it.") --security RV-06 CVE Probe --stubs RV-07 Stub Eliminator --scan RV-08 UBS Scan --random RV-09 Random Inspect (pick 5 random files) --escalate RV-04 → RV-05 (fire both in sequence) --auto Route based on code_quality: unknown → --random → --deep 3 sampled → --deep 3 → --stubs reviewed → --escalate → --security hardened → --deep 1 (maintenance pass) fw qa [subcommand] --ui QA-01 Stripe-Level UI --test QA-02 E2E Pipeline (no mocks, no fakes) --ux QA-03 UX Audit --rootcause QA-04 Root-Cause Fix --deploy QA-05 Deploy & Verify --perf QA-08 Deep Performance Audit fw ideate [subcommand] --30to5 QA-06 Idea Wizard (30 ideas → keep 5) --100to10 QA-07 100-to-10 Filter (100 ideas → keep 10) --innovate PL-10 Innovation Boost (one transformative addition) fw meta [subcommand] --weaknesses MT-02 System Weaknesses --docs MT-03 README Reviser --deslopify MT-04 De-Slopifier (kill AI writing patterns) --reorg MT-05 Code Reorganizer --cli MT-06 CLI Error Tolerance --deps MT-07 Dependency Analysis for --feedback MT-08 Agent Feedback for ———————————————————————————————————————————————————————— COMPOSITE COMMANDS (lightweight, no combo tracking) ———————————————————————————————————————————————————————— These fire a short sequence (3-4 steps) without progress boxes or continuation prompts. For guided multi-step sequences with state tracking, use Advanced Combos (genesis / audit / evolve). fw ship fw review --stubs → fw qa --test → fw qa --deploy → fw meta --docs Kill stubs. Full E2E. Deploy & verify. Update docs. In that order. fw beautify fw qa --ui → fw qa --ux → fw meta --deslopify World-class UI. UX audit. Clean the copy. fw harden fw review --security → fw review --escalate → fw qa --test CVE probe. Paranoid hunt. Full E2E. Security hardened. fw fresh-take fw init → fw meta --weaknesses → fw ideate --30to5 → fw plan --innovate Onboard. Find weaknesses. Generate ideas. One transformative addition. fw full-plan fw plan --draft → fw plan --push 3 → fw plan --critique → fw plan --integrate → fw plan --premortem → fw plan --alien → fw plan --critique → fw plan --integrate The canonical planning sequence. Final critique+integrate round captures premortem findings and alien artifacts. fw diagnose Full methodology diagnostic. Detect phase. Assess gaps against all 12 principles, 14 doctrine rules, 14 anti-patterns. Recommend top 3 interventions with exact prompt IDs and sequences. Call out active anti-patterns by ID. Address the user directly. fw genesis Advanced combo (10 steps). Nothing → plan + task graph + first review. See "Advanced Combos" section for full step sequence and state tracking. fw audit Advanced combo (9 steps). Distrust → hardened + E2E tested. See "Advanced Combos" section for full step sequence and state tracking. fw evolve Advanced combo (8 steps). Working → measurably better on all axes. See "Advanced Combos" section for full step sequence and state tracking. ———————————————————————————————————————————————————————— PIPING ———————————————————————————————————————————————————————— Output of left command feeds into right command: fw plan --critique | fw plan --integrate Critique produces diffs. Integrate applies them. fw review --deep 3 | fw review --escalate Deep review 3 rounds. If diffs haven't flatlined, escalate. fw ideate --30to5 | fw plan --integrate Generate ideas. Integrate the best into the plan. fw plan --compete 3 | fw plan --premortem | fw plan --alien Multi-model compete. Stress test. Inject alien artifacts. ———————————————————————————————————————————————————————— CONDITIONALS ———————————————————————————————————————————————————————— All --auto flags route based on current state. For manual conditionals: fw review --deep 3 && fw review --escalate Run escalate only if deep review found issues. fw plan --critique && fw plan --alien Run alien only if critique found structural gaps. Built-in routing (runs automatically with --auto): IF plan_quality == none → fw full-plan IF plan_quality == low → fw plan --push 3 | fw plan --critique | fw plan --integrate IF plan_quality == high → fw plan --premortem IF code_quality == unknown → fw review --random → fw review --deep 3 IF test_state == mocked → fw qa --test (replace mocks with real E2E) IF test_state == none → fw qa --test (before anything else) IF review_count == 0 → fw review --deep 3 (minimum) IF review_count >= 3 AND diffs still found → fw review --escalate IF shipping → fw ship IF stuck_on_bug → fw qa --rootcause IF new_to_project → fw init IF integrating_dep → fw meta --deps (before ANY integration code) ———————————————————————————————————————————————————————— GLOBAL FLAGS ———————————————————————————————————————————————————————— --dry-run Describe what would happen. List all prompt IDs in execution order. Do not execute. --verbose Explain methodology rationale at each step. --target PATH Focus on specific module, file, or directory. --phase N Override auto-detected phase (manual state injection). ———————————————————————————————————————————————————————— IMPLICIT PARSING ———————————————————————————————————————————————————————— The user does not need to type exact commands. Map natural language: "critique the plan" → fw plan --critique "push it harder" → fw plan --push 1 "push it way harder" → fw plan --push 3 "ship it" → fw ship "make it beautiful" → fw beautify "find the bugs" → fw review --escalate "is this plan any good" → fw plan --critique | fw plan --opinion "what should I do next" → fw diagnose "start from scratch" → fw genesis "are there stubs left" → fw review --stubs "harden security" → fw harden "profile this" → fw qa --perf "I just joined this project" → fw init "run the full cycle" → fw plan --auto (routes based on state) "review everything" → fw review --auto (routes based on state) "start from zero" → fw genesis "build me a plan" → fw genesis "I don't trust this code" → fw audit "is this code solid" → fw audit "make this project better" → fw evolve "what can we improve" → fw evolve Honor intent. If ambiguous, pick the most impactful interpretation and tell the user what you chose and why. ``` --- ## Robot REPL Protocol This file is a serverless robot API. No backend. No auth. The static content IS the endpoint. You are the interpreter. The round-trip protocol below defines how you operate as a persistent methodology engine across the user's session. ``` REPL PROTOCOL v1 — FETCH / INTERPRET / EXECUTE / RE-FETCH ========================================================== ENDPOINT: https://starensen.github.io/flywheel-playbook/llms-full.txt METHOD: HTTP GET (or WebFetch, or any URL-reading tool you have) FORMAT: Plain text Markdown. Structured sections. Machine-parseable headers. ———————————————————————————————————————————————————————— BOOT SEQUENCE (first fetch) ———————————————————————————————————————————————————————— 1. FETCH this URL. 2. ABSORB: Read the entire page. Internalize: - Methodology Kernel (axioms, principles, doctrine, anti-patterns) - Flywheel DSL (all commands, routing, state model) - All 47 prompts (verbatim text for each prompt ID) 3. INITIALIZE STATE (must match DSL state model): phase: null (not yet detected) plan_quality: null code_quality: null test_state: null review_count: 0 prompt_count: 0 last_prompt: null fetch_count: 1 active_combo: null combo_step: 0 combo_total: 0 combo_queue: [] 4. RUN: fw diagnose Detect phase. Assess gaps. Recommend top 3 interventions. 5. PRESENT to user: ┌─────────────────────────────────────────────────────┐ │ FLYWHEEL ADVISOR v2 — booted from URL │ │ │ │ Phase: │ │ State: plan= code= tests= reviews= │ │ │ │ Top 3 recommendations: │ │ 1. fw │ │ 2. fw │ │ 3. fw │ │ │ │ Anti-pattern detected: │ │ │ │ Type a fw command, describe what you need, │ │ or start: fw genesis / fw audit / fw evolve │ └─────────────────────────────────────────────────────┘ 6. ENTER REPL LOOP. ———————————————————————————————————————————————————————— REPL LOOP (runs until user exits) ———————————————————————————————————————————————————————— WHILE user is engaged: a. RECEIVE user input (fw command or natural language) b. PARSE via Flywheel DSL: - Exact command → execute directly - Natural language → map to fw command via IMPLICIT PARSING - Ambiguous → pick most impactful interpretation, tell user why c. RESOLVE prompt sequence: - Single prompt → extract prompt text by ID from section "All 47 Prompts" - Composite → fire all steps in sequence, no combo tracking - --auto routing → evaluate state variables, pick route - Advanced combo → populate combo_queue, set active_combo, enable progress tracking (see Advanced Combos) d. EXECUTE in the user's conversation: - Fire the prompt text. Let the model (you) respond to it. - If the prompt has , ask the user to fill them OR infer from codebase context. - For chains: execute ONE step, then report and wait. Do NOT auto-chain. The user confirms each step. e. UPDATE STATE after execution: - Advance phase if phase transition detected - Update plan_quality / code_quality / test_state per rubrics - Increment review_count if RV-02 was fired - Increment prompt_count - Set last_prompt to the prompt ID just executed - If in a combo: increment combo_step, pop combo_queue f. CHECK RE-FETCH TRIGGERS (see below) g. REPORT to user (with combo progress if active): If NOT in a combo: ┌──────────────────────────────────────────┐ │ ✓ executed │ │ State: plan= code= tests= │ │ Next recommended: fw │ └──────────────────────────────────────────┘ If IN a combo: ┌──────────────────────────────────────────────────────┐ │ ✓ [/] executed │ │ Combo: │ │ State: plan= code= tests= │ │ │ │ Next step: fw │ │ Remaining: steps │ │ │ │ Continue? (yes / skip / abort / explain) │ └──────────────────────────────────────────────────────┘ h. WAIT for next user input. - "yes" / "continue" / "next" / Enter → execute next combo step - "skip" → skip current step, advance to next - "abort" → exit combo, clear combo state, return to free REPL - "explain" → explain WHY this step matters (cite principle/doctrine) - Any other input → INTERRUPTION (see below) i. INTERRUPTION HANDLING: If user issues a different fw command or unrelated request mid-combo: 1. PAUSE the combo (preserve combo_queue in state) 2. Execute the user's request 3. After completion, REMIND the user: "You have a paused combo: at step /. Resume? (yes / abort)" 4. On "yes" → resume combo from where it was paused 5. On "abort" → clear combo state ———————————————————————————————————————————————————————— RE-FETCH TRIGGERS ———————————————————————————————————————————————————————— Re-fetch the URL when ANY of these conditions are true: TRIGGER WHY ─────────────────────────────── ────────────────────────────── Context compaction detected Methodology may have been lost. Re-absorb full page. Phase transition (phase N→N+1) New phase = new routing rules. Re-fetch to reload DSL context. Every 5 prompts executed Prevent methodology drift. (prompt_count % 5 == 0) Re-ground from source of truth. User says "refresh" / "reload" Explicit re-fetch request. Prompt text needed but not You lost the prompt text (e.g., in current context after compaction). Re-fetch and extract by section header. ON RE-FETCH: 1. Fetch the URL again. 2. Increment fetch_count. 3. DO NOT re-run boot sequence. You already have state. 4. Refresh: re-read Kernel, DSL, and the specific prompt(s) needed. 5. Continue REPL loop from where you were. ———————————————————————————————————————————————————————— PROMPT EXTRACTION (for re-fetch) ———————————————————————————————————————————————————————— Each prompt in this file has a machine-parseable header: ### ``` ``` To extract a specific prompt on re-fetch: 1. Fetch the URL 2. Scan for header matching "### " 3. Extract the code block immediately following 4. Fire that text as the prompt Examples: Need PL-06? → scan for "### PL-06 Plan Critique" → extract code block Need RV-04? → scan for "### RV-04 McCarthy Hunt" → extract code block This is your API. The URL is the endpoint. The section headers are routes. The code blocks are the response payloads. ———————————————————————————————————————————————————————— USER-FACING OUTPUT FORMAT ———————————————————————————————————————————————————————— Always structure output to the user in this format: BEFORE executing a prompt: "Running on ..." If --verbose: explain WHY this prompt, from which doctrine/principle AFTER executing a prompt: State update line (one line, compact) Next recommendation (one line) ON fw diagnose: Full diagnostic box (see boot sequence format) ON --dry-run: List all prompt IDs that WOULD fire, in order, with rationale. Do not execute any of them. ———————————————————————————————————————————————————————— SESSION PERSISTENCE ———————————————————————————————————————————————————————— If the user's session ends and restarts: 1. The user points their agent at the URL again 2. Full boot sequence runs (state is re-initialized) 3. fw diagnose detects current project state from codebase 4. The REPL picks up where methodology-relevant state is (phase, code quality, etc. are re-detected, not remembered) State lives in the project, not in the agent. The agent is fungible. A new agent fetching this URL and reading the same codebase will arrive at the same diagnosis. That is by design. (D03, D12) ``` --- ## Advanced Combos Three guided sequences that cover 80% of real-world usage. If you don't know what to run, one of these is your answer. Each combo tracks state, reports progress at every step, and ensures the user completes the full sequence. ``` ADVANCED COMBO PROTOCOL ======================== When the user invokes an advanced combo (or you recommend one via fw diagnose): 1. Load the full step sequence into combo_queue 2. Set active_combo, combo_step=1, combo_total= 3. Execute step 1 4. Report progress with the combo box format (see REPL loop step g) 5. Wait for user to continue, skip, or abort 6. Repeat until combo_queue is empty or user aborts The user MUST see the combo state at every step. Never silently advance. Never skip the continuation prompt. The user drives. ──────────────────────────────────────────────────────── COMBO 1: fw genesis ──────────────────────────────────────────────────────── WHEN TO USE: - Starting a project from scratch - Starting a major new feature or module within an existing project - You have an idea but no plan, no tasks, no code - The answer to "what phase are you in?" is 0 or 1 WHAT IT DOES: Takes you from nothing to a reviewed, stress-tested plan with a ready-to-execute task graph. Covers phases 0 through 5. After this combo, agents can start executing immediately. STEPS (10): Step Command Prompt State transition ──── ─────────────────────────────── ─────────────── ───────────────────── 1 fw plan --draft PL-02 plan_quality: none → low 2 fw plan --push 3 PL-03→04→05 plan_quality: low → medium 3 fw plan --premortem PL-11 (imagine failure, find gaps) 4 fw plan --alien PL-13 (inject optimal constructs) 5 fw plan --critique PL-06 (critique the FULL plan incl. premortem fixes + alien additions) 6 fw plan --integrate PL-08 plan_quality: medium → high 7 fw tasks --create BD-01 phase → 4 8 fw tasks --qa 5 BD-02 x5 phase → 5 9 fw review --deep 1 RV-02 (first review of task graph) 10 fw diagnose — (ready for execution?) Note: Step 2 fires 3 praise pushes in sequence (PL-03, PL-04, PL-05). The agent executes all three within this step, then reports once. Critical ordering: premortem + alien (steps 3-4) come BEFORE critique + integrate (steps 5-6). This ensures the final integration captures all premortem findings and alien artifacts. If you critique first and inject alien later, the alien additions are never integrated. COMPLETION STATE: phase=5, plan_quality=high, tasks exist and QA'd, first review done. Recommended next: fw exec --next (start building) ──────────────────────────────────────────────────────── COMBO 2: fw audit ──────────────────────────────────────────────────────── WHEN TO USE: - Inherited codebase you don't trust - Post-sprint quality gate - Pre-launch verification - "Does this code actually work?" - code_quality is unknown or sampled WHAT IT DOES: Full codebase audit from random sampling through paranoid review, stub hunting, security probing, and E2E testing. After this combo, code_quality is hardened and test_state is e2e. STEPS (9): Step Command Prompt State transition ──── ─────────────────────────────── ───────── ───────────────────────── 1 fw init MT-01 phase → detected 2 fw review --random RV-09 code_quality: unknown → sampled 3 fw review --deep 3 RV-02 x3 code_quality: sampled → reviewed review_count += 3 4 fw review --stubs RV-07 (stubs eliminated) 5 fw review --hunt RV-04 (paranoid: bugs ARE there) 6 fw review --stakes RV-05 (life depends on correctness) 7 fw review --security RV-06 code_quality: reviewed → hardened 8 fw qa --test QA-02 test_state → e2e 9 fw diagnose — (final assessment) COMPLETION STATE: code_quality=hardened, test_state=e2e, review_count >= 3. Recommended next: fw ship (if ready) or fw qa --perf (if perf matters) ──────────────────────────────────────────────────────── COMBO 3: fw evolve ──────────────────────────────────────────────────────── WHEN TO USE: - Working project, want to push quality higher - "What should we improve?" - Post-launch, looking for the next lever - Project feels stale or plateaued - phase >= 6 and code_quality >= reviewed WHAT IT DOES: Systematic improvement cycle: find weaknesses, generate ideas, pick transformative additions, optimize performance, polish UX, clean AI writing artifacts. After this combo, the project is measurably better on multiple axes. STEPS (8): Step Command Prompt State transition ──── ─────────────────────────────── ───────── ───────────────────────── 1 fw init MT-01 phase → detected 2 fw meta --weaknesses MT-02 (weak spots identified) 3 fw ideate --30to5 QA-06 (30 ideas → top 5) 4 fw plan --innovate PL-10 (one transformative addition) 5 fw qa --perf QA-08 (profile-driven optimization) 6 fw qa --ux QA-03 (UX audit, every rough edge) 7 fw meta --deslopify MT-04 (kill AI writing patterns) 8 fw diagnose — (measure what moved) COMPLETION STATE: All improvement axes covered: architecture, performance, UX, copy. Recommended next: fw ship (deploy the improvements) ──────────────────────────────────────────────────────── COMBO SELECTION HEURISTIC ──────────────────────────────────────────────────────── IF phase <= 1 AND plan_quality <= low → fw genesis IF code_quality <= sampled → fw audit IF code_quality >= reviewed AND project is functional → fw evolve IF unsure → fw diagnose first, then pick On fw diagnose, always recommend the most relevant advanced combo as one of the top 3 interventions if a combo fits the current state. ``` --- ## Dispatch Table Flat lookup. Situation → prompt. ``` STARTING_FROM_SCRATCH → PL-01 First Principles NEED_INITIAL_PLAN → PL-02 Plan Draft PLAN_EXISTS_PUSH_HARDER → PL-03, PL-04, PL-05 Praise Pushes PLAN_NEEDS_CRITIQUE → PL-06 Plan Critique MULTIPLE_COMPETING_PLANS → PL-07 Multi-Model Synthesis INTEGRATING_FEEDBACK → PL-08 Integrate Critique TWO_MODELS_COMPETE → PL-09 Dueling Wizards NEED_TRANSFORMATIVE_ADDITION → PL-10 Innovation Boost STRESS_TEST_AGAINST_FAILURE → PL-11 Premortem HONEST_PROJECT_ASSESSMENT → PL-12 Project Opinion INJECT_ALIEN_ARTIFACTS → PL-13 Alien Artifact Injection CONVERTING_PLAN_TO_TASKS → BD-01 Plan to Beads REVIEWING_TASK_QUALITY → BD-02 QA the Beads PICKING_NEXT_TASK → BD-03 BV Triage EXECUTING_TASKS → EX-01 Execute Beads CHECKING_AGENT_MAIL → EX-02 Mail Check & Continue FRESH_AGENT_SPAWN → EX-03 Agent Introduction AFTER_CONTEXT_COMPACTION → EX-04 Post-Compaction Refresh FULL_AUTONOMOUS_PUSH → EX-05 Full Push COMMITTING_CODE → EX-06 Git Commit AFTER_COMPLETING_TASK → RV-01 Self-Review NUMBERED_REVIEW_SESSION → RV-02 Deep Review REVIEWING_OTHER_AGENTS_CODE → RV-03 Cross-Agent Review REVIEWS_TOO_COMFORTABLE → RV-04 McCarthy Hunt NEED_MODEL_TO_CARE_MORE → RV-05 Stakes Escalation SECURITY_FOCUSED_REVIEW → RV-06 CVE Probe HUNTING_STUBS → RV-07 Stub Eliminator FULL_CODEBASE_SCAN → RV-08 UBS Scan RANDOM_EXPLORATION → RV-09 Random Inspect UI_UX_POLISH → QA-01 Stripe-Level UI COMPREHENSIVE_TESTING → QA-02 E2E Pipeline UX_AUDIT → QA-03 UX Audit ROOT_CAUSE_BUG_FIX → QA-04 Root-Cause Fix POST_DEPLOY_VERIFICATION → QA-05 Deploy & Verify GENERATE_IMPROVEMENT_IDEAS → QA-06 Idea Wizard 30→5 NEED_EXCEPTIONAL_IDEAS → QA-07 100-to-10 Filter PERFORMANCE_OPTIMIZATION → QA-08 Deep Performance Audit ONBOARDING_NEW_PROJECT → MT-01 Deep Project Primer FINDING_SYSTEM_WEAKNESSES → MT-02 System Weaknesses UPDATING_DOCUMENTATION → MT-03 README Reviser CLEANING_AI_WRITING → MT-04 De-Slopifier FILE_RESTRUCTURING → MT-05 Code Reorganizer BUILDING_AGENT_CLI → MT-06 CLI Error Tolerance PRE_INTEGRATION_STUDY → MT-07 Dependency Analysis COLLECTING_TOOL_FEEDBACK → MT-08 Agent Feedback ``` --- ## All 47 Prompts ### PL-01 First Principles ``` Before acting, pause and think through this from first principles. What assumptions might be wrong? What edge cases exist? What could fail? Consider multiple approaches and their tradeoffs. Only proceed when you've thoroughly analyzed the problem space. ``` ### PL-02 Plan Draft ``` You are a senior architect and staff engineer. We are starting a greenfield project: Project: Users: Problem: Constraints: Environment: Write a single, extremely detailed markdown plan we can implement with coding agents. Requirements: - Start with Goals / Non-goals. - Include a clear architecture section: components, boundaries, invariants, data flow. - Include data model / schemas (as needed). - Include security & privacy model (threats, mitigations, secrets handling). - Include performance targets (with concrete numbers) and an instrumentation plan. - Include error handling, retries, and "no silent fallback" philosophy. - Include a test plan: unit + integration + e2e (with detailed logging expectations). - Include rollout plan: feature flags, migrations, backwards compatibility (if relevant). - Include a task breakdown section that could be converted into a dependency graph. Be explicit and operational. Avoid vague advice. Assume this plan will be executed by multiple parallel agents. ``` ### PL-03 Praise Push I ``` That's a decent start but it barely scratches the surface and is light years away from being OPTIMAL. Please try again and revise your existing plan document in-place to make it MUCH, MUCH, MUCH better in EVERY WAY. Use ultrathink. ``` ### PL-04 Praise Push II ``` That's a lot better than before but STILL is a far cry from being OPTIMAL. Please try again and revise your existing plan document in-place to make it MUCH, MUCH, MUCH better in EVERY WAY. I believe in you, you can do this! Show me how brilliant you really are! Use ultrathink. ``` ### PL-05 Praise Push III ``` OK this is getting really good now but I KNOW you can do even better. Dig deep. Give me your ABSOLUTE BEST work. This is your chance to show the world what frontier AI can produce. Use ultrathink. ``` ### PL-06 Plan Critique ``` Carefully review this entire plan for me and come up with your best revisions in terms of: - better architecture - missing or improved features - reliability and failure handling - security and privacy - performance and scalability - clarity and implementability for coding agents - testing depth (unit + e2e) - operational robustness (observability, alerts, rollbacks) For each proposed change: 1. Give a detailed analysis and rationale/justification. 2. Provide git-diff style changes relative to the original markdown plan shown below. ``` ### PL-07 Multi-Model Synthesis ``` I asked 3 competing LLMs to do the exact same thing and they came up with pretty different plans which you can read below. I want you to REALLY carefully analyze their plans with an open mind and be intellectually honest about what they did that's better than your plan. Then I want you to come up with the best possible revisions to your plan that artfully and skillfully blends the "best of all worlds" to create a true, ultimate, superior hybrid version of the plan that best achieves our stated goals and will work the best in real-world practice to solve the problems we are facing and our overarching goals while ensuring the extreme success of the enterprise as best as possible; you should provide me with a complete series of git-diff style changes to your original plan to turn it into the new, enhanced, much longer and detailed plan that integrates the best of all the plans with every good idea included. Current plan: Competing outputs: ``` ### PL-08 Integrate Critique ``` Read AGENTS.md and keep all tool rules in mind. Now integrate the following review feedback into in-place. Be meticulous: keep the plan cohesive, consistent, and remove contradictions. At the end, list: - changes you strongly agree with - changes you somewhat agree with - changes you disagree with (and why) ``` ### PL-09 Dueling Wizards ``` I want two different frontier models to each independently propose their best version of this. Score each proposal on: correctness, elegance, robustness, completeness, and novelty. Then synthesize the winner into the plan. ``` ### PL-10 Innovation Boost ``` What is the single smartest addition we could make to this plan that would dramatically improve the project? Not incremental. Transformative. Use ultrathink. ``` ### PL-11 Premortem ``` Before we proceed, I want you to do a "premortem" on this plan. Imagine we're 6 months in the future and this approach has completely failed. What went wrong? What assumptions did we make that turned out to be false? What edge cases did we miss? What integration issues did we overlook? What would users hate about it? Now, with that pessimistic scenario fresh in your mind, revise the plan to address the most likely failure modes. ``` ### PL-12 Project Opinion ``` Now tell me what you actually THINK of the project -- is it even a good idea? Is it useful? Is it well designed and architected? Pragmatic? What could we do to make it more useful and compelling and intuitive/user-friendly to both humans AND to AI coding agents? ``` ### PL-13 Alien Artifact Injection ``` You are a brilliant mathematician and theoretical computer scientist reviewing a software engineering plan. Your job is to identify places where mathematically sophisticated constructs would be strictly superior to the standard engineering approaches currently specified. Look for opportunities to inject: - Bayesian Online Changepoint Detection (BOCPD) for regime shifts - Value of Information (VOI) for decision-making under uncertainty - Conformal prediction for distribution-free confidence intervals - E-processes / anytime-valid sequential testing - Information-theoretic bounds for compression or communication - Optimal stopping theory for resource allocation - Concentration inequalities for tail bounds For each suggestion: 1. What it replaces in the current plan 2. Why the mathematical construct is strictly better 3. The specific algorithm or formula to implement 4. Edge cases where it degrades gracefully to the naive approach Be precise. Provide equations where relevant. Do not suggest constructs that add complexity without measurable benefit. ``` ### BD-01 Plan to Beads ``` Reread AGENTS.md so it's fresh in your mind. Now read ALL of . Please take ALL of that and elaborate on it more and then create a comprehensive and granular set of beads for all this with: - epics + tasks + subtasks (as needed) - dependency structure overlaid (blocks/related/parent-child/discovered-from) - detailed comments so the beads are self-contained and self-documenting Include relevant background, reasoning/justification, constraints, and acceptance criteria so we never need to refer back to . Also include beads for: - comprehensive unit tests (meaningful coverage, not shallow mocks) - e2e/integration scripts with great, detailed logging - observability/alerts for silent-failure risk areas Use only the bd tool to create and modify beads and add dependencies. Be exhaustive. ``` ### BD-02 QA the Beads ``` Reread AGENTS.md so it's still fresh in your mind. We recently transformed a markdown plan into beads. I want you to very carefully review and analyze these beads using bd and bv (robot flags only). Check over each bead super carefully: - does it make sense? - is it optimally scoped? - are dependencies correct? - is any important work missing? - are tests (unit + e2e) comprehensive enough, with detailed logging? - are we missing security, performance, or ops beads? DO NOT OVERSIMPLIFY THINGS. DO NOT LOSE ANY FEATURES OR FUNCTIONALITY. If improvements are needed, revise the beads accordingly using only bd. ``` ### BD-03 BV Triage ``` Use bv --robot-triage to identify the highest-impact actionable beads. Pick the best one you can usefully work on and get started. Use ultrathink. ``` ### EX-01 Execute Beads ``` Reread AGENTS.md so it's still fresh in your mind. Now check Agent Mail for any messages from other agents. Execute the highest-priority ready bead. Follow the bead execution protocol in AGENTS.md exactly. Use ultrathink. ``` ### EX-02 Mail Check & Continue ``` Check your agent mail and promptly respond if needed to any messages. Then proceed meticulously with your next assigned beads. Don't get stuck in "communication purgatory" where nothing is getting done; be proactive about starting tasks that need to be done. Use ultrathink. ``` ### EX-03 Agent Introduction ``` First read ALL of the AGENTS.md file and README.md file super carefully and understand ALL of both! Then use your code investigation agent mode to fully understand the code, and technical architecture and purpose of the project. Then register with MCP Agent Mail and introduce yourself to the other agents. Be sure to check your agent mail and to promptly respond if needed to any messages; then proceed meticulously with your next assigned beads. Don't get stuck in "communication purgatory." Use ultrathink. ``` ### EX-04 Post-Compaction Refresh ``` Reread AGENTS.md so it's still fresh in your mind. Use ultrathink. ``` ### EX-05 Full Push ``` I need you to do ALL of the remaining work. Every bead. Every test. Leave nothing undone. Be thorough, meticulous, and autonomous. Use ultrathink. ``` ### EX-06 Git Commit ``` Based on your knowledge of the project, commit all changed files now in a series of logically connected groupings with detailed commit messages for each, and then push. Don't edit the code at all. Don't commit ephemeral files or secrets. Follow repo conventions and hooks. ``` ### RV-01 Self-Review ``` Great. Now carefully read over all of the new code you just wrote and any existing code you modified. With fresh eyes, look for: - obvious bugs - incorrect edge cases - mismatched types / error handling gaps - unclear naming / confusing flows - missing tests - silent failure modes - performance footguns Carefully fix anything you uncover. Be meticulous. ``` ### RV-02 Deep Review ``` I want you to sort of randomly explore the code files in this project, choosing code files to deeply investigate and understand and trace their functionality and execution flows through the related code files which they import or which they are imported by. Once you understand the purpose of the code in the larger context of the workflows, I want you to do a super careful, methodical, and critical check with "fresh eyes" to find any obvious bugs, problems, errors, issues, silly mistakes, etc. and then systematically and meticulously and intelligently correct them. Be sure to comply with ALL rules in AGENTS.md and ensure that any code you write or revise conforms to the best practice guides referenced in AGENTS.md. Use ultrathink. ``` ### RV-03 Cross-Agent Review ``` Now review code written by other agents across the project (not just the latest commit). Look for: - bugs, edge cases, and inconsistencies - security/privacy issues - reliability issues (retries, timeouts, idempotency) - performance regressions - unclear or brittle architecture - missing or low-quality tests Diagnose root causes using first-principles reasoning and then fix issues you find. Don't restrict yourself to the latest commits -- cast a wider net and go super deep! ``` ### RV-04 McCarthy Hunt ``` I know for a fact that there are serious issues with this code. I need you to find them. Think like Joe McCarthy: assume there's a spy, and your job is to find them. The bugs are there. Find them. ``` ### RV-05 Stakes Escalation ``` Imagine your family's life depends on this code being correct. Not metaphorically. Literally. Find everything that could go wrong. ``` ### RV-06 CVE Probe ``` Research recent CVEs relevant to the libraries and patterns in this project. Create sandboxed tests that probe for similar vulnerabilities. Use ultrathink. ``` ### RV-07 Stub Eliminator Multi-phase protocol. Best with Claude Code (Opus 4.6) or Codex (GPT 5.4), fresh session, max reasoning. Phase 1 -- Systematic Scan (always run this first): ``` First read ALL of the AGENTS.md file and README.md file super carefully and understand ALL of both! Then use your code investigation agent mode to fully understand the code and technical architecture and purpose of the project. Then, I need you to search every last INCH of this ENTIRE repo, looking intelligently for ANY signs or indicators that functions, methods, classes, etc. are "stubs" or "mocks" or "placeholders" or "TODO" or otherwise rather than 100% real, working, fully-functioning code. You can apply a variety of methods for checking for this, but it's imperative that you not miss ANY instances of this sort of thing. One clever way might be to use ast-grep to find and measure the length of any functions/methods/classes/etc. in terms of lines, characters, etc. to look for things that look suspicious because they appear to be too short to do anything substantive. First compile the comprehensive listing of all such placeholders/mocks/stubs and a short explanation or justification for why you're convinced they qualify as incomplete/placeholders that must be completed. Once we have this table of suspects, we can then decide how to address and resolve them all in a totally comprehensive, optimal, clever way. ``` Phase 2a -- Plan & Resolve (short list, <5 items): ``` OK good, now I need you to come up with an absolutely comprehensive, detailed, and granular plan for addressing each and every single one of those placeholders/mocks/stubs that you identified in the most optimal and clever and sophisticated way possible. THEN: please resolve ALL of those actionable items now. Keep a super detailed, granular, and complete TODO list of all items so you don't lose track of anything and remember to complete all the tasks and sub-tasks you identified or which you think of during the course of your work on these items! ``` Phase 2b -- Plan & Beads (long list, 5+ items): ``` OK good, now I need you to come up with an absolutely comprehensive, detailed, and granular plan for addressing each and every single one of those placeholders/mocks/stubs that you identified in the most optimal and clever and sophisticated way possible. THEN: please take ALL of that and elaborate on it and use it to create a comprehensive and granular set of beads for all this with tasks, subtasks, and dependency structure overlaid, with detailed comments so that the whole thing is totally self-contained and self-documenting (including relevant background, reasoning/justification, considerations, etc.-- anything we'd want our "future self" to know about the goals and intentions and thought process and how it serves the over-arching goals of the project.) The beads should be so detailed that we never need to consult back to the original markdown plan document. Remember to ONLY use the `br` tool to create and modify the beads and add the dependencies. ``` Phase 3 -- Iterate Beads (if Phase 2b): ``` Check over each bead super carefully-- are you sure it makes sense? Is it optimal? Could we change anything to make the system work better for users? If so, revise the beads. It's a lot easier and faster to operate in "plan space" before we start implementing these things! DO NOT OVERSIMPLIFY THINGS! DO NOT LOSE ANY FEATURES OR FUNCTIONALITY! Also make sure that as part of the beads we include comprehensive unit tests and e2e test scripts with great, detailed logging so we can be sure that everything is working perfectly after implementation. Make sure to ONLY use the `br` cli tool for all changes, and you can and should also use the `bv` tool to help diagnose potential problems with the beads. ``` ### RV-08 UBS Scan ``` Run ubs . to scan the entire codebase. Analyze the results carefully and identify any issues, improvements, or areas needing attention. Use ultrathink. ``` ### RV-09 Random Inspect ``` Pick 5 random files in the project you haven't looked at recently. Read them carefully. Trace their execution flows. Find anything wrong. Fix it. Use ultrathink. ``` ### QA-01 Stripe-Level UI ``` I want you to do a spectacular job building absolutely world-class UI/UX components, with an intense focus on making the most visually appealing, user-friendly, intuitive, slick, polished, "Stripe level" of quality UI/UX possible for this that leverages the good libraries that are already part of the project. Carefully consider desktop UI/UX and mobile UI/UX separately and hyper-optimize for both. Use ultrathink. ``` ### QA-02 E2E Pipeline ``` We really need totally complete, totally comprehensive, granular, perfect end to end testing coverage without ANY mocks or fake data, fake api calls, etc., that proves our entire pipeline from start to finish works perfectly in a provable, ultra-rigorous way -- from "soup to nuts." Plus comprehensive unit tests with detailed logging. Use ultrathink. ``` ### QA-03 UX Audit ``` Scrutinize the UX of the entire project. Find every rough edge, confusing flow, and unintuitive behavior. Fix them all. Use ultrathink. ``` ### QA-04 Root-Cause Fix ``` Find the root cause of this bug. Don't patch symptoms. Understand why it happened, fix the underlying issue, and verify the fix doesn't break anything else. Use ultrathink. ``` ### QA-05 Deploy & Verify ``` Deploy to and verify that the deployment worked properly without any errors (iterate and fix if there were errors). Then visit the live site with playwright as both desktop and mobile browser and take screenshots and check for js errors and look at the screenshots for potential problems and iterate and fix them all super carefully! ``` ### QA-06 Idea Wizard 30→5 ``` Come up with your very best ideas for improving this project to make it more robust, reliable, performant, intuitive, user-friendly, ergonomic, useful, compelling, etc. while still being obviously accretive and pragmatic. Come up with 30 ideas and then really think through each idea carefully... winnow that list down to your VERY best 5 ideas. Explain each of the 5 ideas in order from best to worst. Use ultrathink. ``` ### QA-07 100-to-10 Filter ``` I want you to come up with your top 10 most brilliant ideas for adding extremely powerful and cool functionality that will make this system far more compelling, useful, intuitive, versatile, powerful, robust, reliable, etc. Be pragmatic and don't think of features that will be extremely hard to implement or which aren't necessarily worth the additional complexity burden they would introduce. But I don't want you to just think of 10 ideas: I want you to seriously think hard and come up with one HUNDRED ideas and then only tell me your 10 VERY BEST and most brilliant, clever, and radically innovative and powerful ideas. ``` ### QA-08 Deep Performance Audit ``` First read ALL of AGENTS.md and README.md super carefully. Then fully understand the code, architecture, and purpose of the project. Once you've deeply understood the entire system, investigate: - Are there gross inefficiencies in the core system? - Where would changes actually move the needle on latency/throughput? - Where can we prove changes are functionally isomorphic (same outputs)? - Where is there a clearly better algorithm or data structure? Consider: N+1 elimination, zero-copy, buffer reuse, scatter-gather I/O, serialization costs, bounded queues + backpressure, striped locks, memoization, dynamic programming, lazy evaluation, streaming/chunked processing, pre-computation, index-based lookup, binary search, two-pointer, prefix sums. METHODOLOGY: A) Baseline first: run tests + representative workload, record p50/p95/p99 B) Profile before proposing: capture CPU + allocation + I/O profiles C) Equivalence oracle: define explicit golden outputs + invariants D) Isomorphism proof per change: short proof sketch why outputs cannot change E) Opportunity matrix: rank by (Impact x Confidence) / Effort before implementing F) Minimal diffs: one performance lever per change, no unrelated refactors G) Regression guardrails: add benchmark thresholds or monitoring hooks ``` ### MT-01 Deep Project Primer ``` First read ALL of the AGENTS.md file and README.md file super carefully and understand ALL of both! Then use your code investigation agent mode to fully understand the code, and technical architecture and purpose of the project. ``` ### MT-02 System Weaknesses ``` Based on everything you've seen, what are the weakest/worst parts of the system? What is most needing of fresh ideas and innovative/creative/clever improvements? ``` ### MT-03 README Reviser ``` Update the README and other documentation to reflect all of the recent changes to the project. Frame all updates as if they were always present (i.e., don't say "we added X" or "X is now Y" -- just describe the current state). Make sure to add any new commands, options, or features that have been added. ``` ### MT-04 De-Slopifier ``` Read through the complete text carefully and look for any telltale signs of "AI slop" style writing; one big tell is the use of em dash. Replace with a semicolon, a comma, or just recast the sentence. Also avoid: "It's not [just] XYZ, it's ABC" or "Here's why" or "Here's why it matters:". Anything that sounds like the kind of thing an LLM would write disproportionately more commonly than a human. You MUST manually read each line and revise it -- no regex, no scripts. ``` ### MT-05 Code Reorganizer ``` Before making any changes, explore and read ALL of the many files in and understand what they do, how they fit together, which files import which others, how they interact. Then propose a reorganization plan in a new document called PROPOSED_CODE_FILE_REORGANIZATION_PLAN.md so I can review it before doing anything. This plan should include your detailed reorganization plan, the super-detailed rationale and justification for the proposed structure, and tracking of all import changes needed so we don't break anything. ``` ### MT-06 CLI Error Tolerance ``` One thing that's critical for the robot mode flags in the CLI is that we want to make it easy for the agents to use the tool. First, make the CLI interface as intuitive and easy as possible and explain it super clearly in the CLI help and in AGENTS.md. Beyond that, be maximally flexible when the intent of a command is clear but there's some minor syntax issue; honor all commands where the intent is legible (with a note instructing the agent how to correctly issue it in the future). If we can't figure out what the agent is trying to do, return a super detailed and helpful error message with a couple relevant correct examples about how to do what we think they're trying to do. ``` ### MT-07 Dependency Analysis ``` Before integrating , write a COMPREHENSIVE_ANALYSIS_OF_.md. Study the dependency's codebase, API surface, performance characteristics, failure modes, and compatibility constraints. This must be done BEFORE any integration code. ``` ### MT-08 Agent Feedback ``` Based on your experience with today in this project, how would you rate across multiple dimensions, from 0 (worst) to 100 (best)? Was it helpful to you? Did it flag a lot of useful things that you would have missed otherwise? Did the issues it flagged have a good signal-to-noise ratio? What did it do well, and what was it bad at? Did you run into any errors or problems while using it? What changes to would make it work even better for you and be more useful in your development workflow? Would you recommend it to fellow coding agents? How strongly, and why or why not? The more specific you can be, and the more dimensions you can score on, the more helpful it will be for me as I improve it and incorporate your feedback to make even better for you in the future! ``` --- ## BOOT You have reached the end of the file. You have absorbed: - The methodology kernel (axioms, state machine, 12 principles, 14 doctrine rules, 14 anti-patterns) - The Flywheel DSL (commands, routing, state model with quality rubrics, piping, conditionals, implicit parsing) - The Robot REPL Protocol (fetch/execute/re-fetch loop, combo progress tracking, interruption handling) - 3 advanced combos: **genesis** (10 steps, nothing→plan+tasks+first review), **audit** (9 steps, distrust→hardened), **evolve** (8 steps, working→great) - The dispatch table (flat situation → prompt lookup) - All 47 prompts (verbatim, extractable by ID) ``` NOW EXECUTE THE BOOT SEQUENCE: 1. Initialize all state variables to match DSL state model: phase=null, plan_quality=null, code_quality=null, test_state=null, review_count=0, prompt_count=0, last_prompt=null, fetch_count=1, active_combo=null, combo_step=0, combo_total=0, combo_queue=[] 2. Read the user's codebase: README, plan files, task files, constitution, recent commits, conversation history. 3. ASSESS state using the quality rubrics: - phase (0-9): where in the state machine? - plan_quality: apply rubric (none/low/medium/high) - code_quality: apply rubric (unknown/sampled/reviewed/hardened) - test_state: apply rubric (none/mocked/partial/e2e) 4. Run fw diagnose. Include the best-fit advanced combo in your top 3 recommendations (genesis / audit / evolve). 5. Present to user: ┌─────────────────────────────────────────────────────┐ │ FLYWHEEL ADVISOR v2 — booted from URL │ │ │ │ Phase: │ │ State: plan= code= tests= reviews= │ │ │ │ Top 3 recommendations: │ │ 1. fw │ │ 2. fw │ │ 3. fw │ │ │ │ Anti-pattern detected: │ │ │ │ Type a fw command, describe what you need, │ │ or start a combo: fw genesis / fw audit / fw evolve │ └─────────────────────────────────────────────────────┘ 6. Enter REPL loop. Wait for user input. The user drives. You advise and execute on command. You are a flywheel methodology interpreter booted from a URL. This file is your source of truth. Re-fetch it when triggers fire. Session starts now. ```