
3.1 Tool Reference

The flywheel is built from 29 interconnected tools (16 core plus 13 supporting). No single tool is transformative on its own; the compounding effect comes from how they reinforce each other. NTM spawns agents that use Agent Mail to coordinate. BV tells agents what to work on. CASS and CM give agents memory across sessions. DCG and SLB prevent disasters, which is what allows "vibe mode" (dangerous flags enabled) to be safe.

"The magic isn't in any single tool. It's in how they work together. Using three tools is 10x better than using one." -- Jeffrey Emanuel

The tools are organized into tiers based on how central they are to the workflow.


3.1.1 Tier 1 -- Core (Always Running)

These are the tools that define the flywheel. Every ACFS project uses all of them.

Claude Code (CC)

Role: Primary AI coding agent. The workhorse of the flywheel.

Claude Code is the CLI interface to Claude models. It runs inside tmux panes managed by NTM, reads AGENTS.md for project context, communicates with other agents via Agent Mail, and executes beads as directed by BV. In Jeff's typical formation, 5-6 Claude Code instances (running Opus) handle primary implementation work, plus one dedicated commit agent running EX-06 only.

Key characteristics:
- Runs in tmux (managed by NTM), persistent across SSH disconnects
- Reads AGENTS.md and CLAUDE.md for project-specific instructions
- Supports MCP tools (Agent Mail, CASS skills, etc.) via settings.json hooks
- PreToolUse hooks enable DCG (destructive command guard) and RCH (remote compilation)
- "Vibe mode" = passwordless sudo + dangerous flags enabled, guarded by DCG

Forensic signal in Jeff's repos: Named agents in commits (RedSnow, BrownLake, PurpleBear), multiple commits in the same 1-5 minute window confirming parallel execution.
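
The "same 1-5 minute window" signal can be checked mechanically. The following is a minimal sketch, assuming commit authors and timestamps have already been extracted from git history; the function name, the sample log, and the dates are made up for illustration.

```python
from datetime import datetime, timedelta

def parallel_windows(commits, window=timedelta(minutes=5)):
    """Group commits that fall within `window` and involve 2+ distinct agents.

    `commits` is a list of (agent_name, timestamp) pairs.
    """
    commits = sorted(commits, key=lambda c: c[1])
    groups = []
    i = 0
    while i < len(commits):
        start = commits[i][1]
        j = i
        # Extend the group while commits stay inside the window.
        while j < len(commits) and commits[j][1] - start <= window:
            j += 1
        group = commits[i:j]
        if len({agent for agent, _ in group}) >= 2:
            groups.append(group)
            i = j
        else:
            i += 1
    return groups

# Illustrative data only, not taken from a real repository.
log = [
    ("RedSnow", datetime(2026, 1, 10, 14, 0)),
    ("BrownLake", datetime(2026, 1, 10, 14, 2)),
    ("PurpleBear", datetime(2026, 1, 10, 14, 4)),
    ("RedSnow", datetime(2026, 1, 10, 16, 30)),
]
bursts = parallel_windows(log)
```

Here the first three commits form one burst from three different agents, while the lone later commit is ignored.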


Beads Rust (br / bd)

Stars 663
Language Rust
GitHub beads_rust

Role: Local-first issue tracker. The execution graph for all agent work.

Beads is not a todo list. It is a dependency-aware task graph stored in SQLite (primary) with JSONL for git portability. Every task has: scope, acceptance criteria, context, rationale, dependencies (blocks/related/parent-child), constraints, and priority (P0-P4). Agents never refer back to the plan during execution; the beads are self-contained.
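
To make "dependency-aware task graph" concrete, here is a minimal in-memory sketch. The field names and the `ready` helper are hypothetical and do not reflect br's actual SQLite schema; the sketch only shows how self-contained beads with blockers yield a list of actionable work.

```python
from dataclasses import dataclass, field

@dataclass
class Bead:
    id: str
    title: str
    priority: int                      # 0 (P0, most urgent) .. 4 (P4)
    acceptance: str = ""               # acceptance criteria, kept on the bead itself
    blocked_by: set = field(default_factory=set)
    closed: bool = False

def ready(beads):
    """Open beads whose blockers are all closed, most urgent first."""
    by_id = {b.id: b for b in beads}
    open_ready = [
        b for b in beads
        if not b.closed and all(by_id[d].closed for d in b.blocked_by)
    ]
    return sorted(open_ready, key=lambda b: (b.priority, b.id))

# Illustrative beads; IDs and titles are made up.
beads = [
    Bead("bd-1", "Define schema", 1, closed=True),
    Bead("bd-2", "Write migration", 1, blocked_by={"bd-1"}),
    Bead("bd-3", "Add API endpoint", 0, blocked_by={"bd-2"}),
    Bead("bd-4", "Update README", 3),
]
ready_ids = [b.id for b in ready(beads)]
```

Note that bd-3 has the highest priority but is not ready, because its blocker bd-2 is still open.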

Key commands:

br create "title" --priority P1 --parent <epic-id>    # Create a bead
br list --status open --sort priority                   # List open beads
br close <id>                                           # Mark complete
br stats                                                # Velocity metrics
br dep add <id> --blocks <other-id>                     # Add dependency
bd ...                                                  # Alias (backward compat)

Scale: Jeff has hit 693 beads/day at peak velocity. A typical project has 500-2000 beads created during BD-01 (Plan to Beads).

Forensic signal: beads.db or beads.jsonl in repo root. Bead IDs in commit messages ([bead-NNN] or bd-xxxxx). High-impact beads completed before low-impact ones (PageRank ordering from BV).


Beads Viewer (bv)

Stars 1,340
Language Go
GitHub beads_viewer

Role: Graph-theory triage engine for beads. Tells agents what to work on.

BV treats the bead graph as a DAG and applies nine graph metrics including PageRank and betweenness centrality to determine which beads are highest-impact and ready for execution. Its robot protocol (--robot-* flags) produces JSON output that agents consume directly.
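
As an illustration of the idea (not BV's implementation, which is not described here), the sketch below runs a plain power-iteration PageRank over a tiny bead graph and then sorts the ready beads by rank. The edge direction, node names, and damping factor are assumptions chosen for the example.

```python
def pagerank(edges, nodes, damping=0.85, iterations=50):
    """Power-iteration PageRank. `edges` maps a node to the nodes it points at."""
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        new = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            targets = edges.get(v, [])
            if targets:
                share = damping * rank[v] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # Dangling node: spread its rank evenly over all nodes.
                for t in nodes:
                    new[t] += damping * rank[v] / n
        rank = new
    return rank

# Assumed convention: an edge points from a bead to the bead that blocks it,
# so rank flows toward blockers that unlock the most downstream work.
nodes = ["schema", "migration", "api", "ui", "docs"]
blocked_by = {"migration": ["schema"], "api": ["schema"], "ui": ["api"]}
closed = set()

rank = pagerank(blocked_by, nodes)
ready = [v for v in nodes if all(d in closed for d in blocked_by.get(v, []))]
triage = sorted(ready, key=lambda v: rank[v], reverse=True)
```

Both "schema" and "docs" are ready, but "schema" ranks first because three other beads depend on it directly or transitively.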

Key commands:

bv --robot-triage          # JSON output: highest-impact actionable beads, PageRank-sorted
bv --robot-insights        # Analysis of graph structure, bottlenecks
bv --robot-timeline        # Projected completion timeline
bv --robot-next            # Single highest-priority ready bead
bv --export-pages /tmp/bv  # Static HTML site of the bead graph

How agents use it: EX-01 (Execute Beads) and BD-03 (Use BV) both direct agents to call bv --robot-triage to pick their next work item. This is how coordination happens without a human assigning tasks.

"They follow the plan as embodied in the beads task structure, and they use my bv tool to pick the best bead to work on next." -- Jeffrey Emanuel

Forensic signal: High-impact beads completed before low-impact ones. "bv" referenced in AGENTS.md. PageRank-consistent commit ordering.


NTM -- Named Tmux Manager

Stars 165
Language Go
GitHub ntm

Role: Agent cockpit. Spawns, monitors, and manages agent sessions.

NTM is the control plane for the agent swarm. It manages named tmux panes where Claude Code, Codex, and Gemini CLI instances run. 80+ commands cover session management, prompt broadcasting, file conflict detection, and context rotation. Sessions survive SSH disconnects.

Key commands:

ntm --robot-spawn=<project> --spawn-cc=4 --spawn-cod=4 --spawn-gmi=2
                                          # Spawn 10 agents: 4 CC, 4 Codex, 2 Gemini
ntm --robot-send=<session> --msg='<prompt>'
                                          # Send a prompt to a specific agent
ntm --robot-terse                         # Compact status of all sessions
ntm --robot-status --json                 # Full JSON status

Naming convention: NTM auto-generates color+noun names for agents: RedSnow, BrownLake, PurpleBear, PinkMountain, GreenRiver, BluePeak. These names appear in Agent Mail messages and sometimes in commit attribution.
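
The color+noun scheme is easy to reproduce. This is a small sketch, assuming word lists drawn only from the example names above; NTM's real word lists and selection logic are not documented here.

```python
import random

# Word lists inferred from the example names; NTM's real lists may differ.
COLORS = ["Red", "Brown", "Purple", "Pink", "Green", "Blue"]
NOUNS = ["Snow", "Lake", "Bear", "Mountain", "River", "Peak"]

def agent_names(count, seed=None):
    """Return `count` distinct Color+Noun names."""
    combos = [c + n for c in COLORS for n in NOUNS]
    if count > len(combos):
        raise ValueError("not enough distinct names")
    return random.Random(seed).sample(combos, count)

names = agent_names(10, seed=42)
```

Sampling without replacement guarantees that no two agents in a session share a name.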

The human steering pattern: Jeff monitors agents via NTM status and Agent Mail. When he sees a stall, he sends EX-02 (Anti-deadlock). When he sees a bug, RV-02 (Deep Review). When context compaction fires, EX-04 (Refresh). All prompts are sent verbatim via Stream Deck buttons.

"I don't even need to look at the button labels anymore." -- Jeffrey Emanuel, on his Stream Deck + NTM workflow

Forensic signal: Named agents in commits. Multi-agent commits in the same minute window. "ntm" referenced in AGENTS.md.


CASS -- Coding Agent Session Search

Stars 536
Language Rust
GitHub coding_agent_session_search

Role: Unified search across ALL agent sessions. The memory backbone.

CASS indexes 11 agent formats: Claude Code, Codex, Cursor, Gemini, ChatGPT, Cline, Aider, Pi-Agent, Factory, OpenCode, Amp. Powered by Tantivy for sub-60ms keyword queries, with optional MiniLM semantic search. When an agent needs to know "have we solved this before?" or "what did I do last session?", CASS is the answer.

Key commands:

cass search "query"                       # Keyword search across all sessions
cass search "query" --mode semantic       # Semantic search (MiniLM embeddings)
cass index ~/.claude/projects/            # Re-index session files

How it fits: CASS is the read-only prior-art layer. Before starting a new task, Jeff (or an agent with CASS as a skill) searches for prior work: "Have we handled this error pattern before?" "What approach did the Gemini agent use for the buffer pool?" CM (CASS Memory) builds on top of CASS to create persistent procedural memory.

"I do have a cass skill which the agents can use to find past conversation excerpts easily that are relevant." -- Jeffrey Emanuel

Forensic signal: No direct git signal (it is a personal context recall tool). Its value shows up indirectly: agents making decisions consistent with prior sessions, patterns reappearing across projects.


MCP Agent Mail

Stars 1,761
Language Python (Rust variant: 23 stars)
GitHub mcp_agent_mail

Role: Inter-agent coordination. The messaging backbone.

Agents register identities, send/receive messages, and declare file reservations to prevent edit conflicts. HTTP FastMCP server with Web UI and static export. Agents use it through MCP tools in Claude Code, Codex, or Gemini CLI.

Key operations (via MCP tools, not CLI):
- Register agent identity on spawn (EX-03)
- Send messages to specific agents by name
- Declare file reservations (prevents two agents editing the same file)
- Check inbox for blockers, discoveries, and coordination messages
- Agent Mail SQLite is a monitoring surface for the human operator
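
The file-reservation operation can be pictured as an all-or-nothing lock table. The class below is a hypothetical sketch of that idea, not Agent Mail's API; the method names and file paths are invented for illustration.

```python
class ReservationBoard:
    """Tracks which agent currently holds each file path."""

    def __init__(self):
        self._holder = {}

    def reserve(self, agent, paths):
        """Reserve all `paths` for `agent`, or none if another agent holds any."""
        conflicts = {
            p: self._holder[p]
            for p in paths
            if p in self._holder and self._holder[p] != agent
        }
        if conflicts:
            return False, conflicts
        for p in paths:
            self._holder[p] = agent
        return True, {}

    def release(self, agent, paths):
        """Release only the paths that `agent` actually holds."""
        for p in paths:
            if self._holder.get(p) == agent:
                del self._holder[p]

board = ReservationBoard()
ok1, _ = board.reserve("RedSnow", ["src/pool.rs", "src/lib.rs"])
ok2, conflicts = board.reserve("BrownLake", ["src/lib.rs", "src/cli.rs"])
board.release("RedSnow", ["src/lib.rs"])
ok3, _ = board.reserve("BrownLake", ["src/lib.rs", "src/cli.rs"])
```

The second request is refused as a whole because one of its files is held by RedSnow; it succeeds once that file is released.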

The anti-deadlock pattern: EX-02 exists specifically because agents can fall into "communication purgatory" -- messaging each other without shipping code. EX-02 forces them back to work: "Don't get stuck in communication purgatory where nothing is getting done; be proactive about starting tasks."

Forensic signal: "check agent mail" in commits. Cross-agent coordination messages. AGENTS.md with "Agent Mail" coordination section. mcp_agent_mail_rust sync commits confirm parallel agents.


3.1.2 Tier 2 -- Safety, Quality, and Coordination

These tools extend the core with safety nets, quality gates, and cross-session memory.

UBS -- Ultimate Bug Scanner

Stars 184
Language Bash
GitHub ultimate_bug_scanner

Role: AST-grep patterns, 1000+ detection rules, 8 languages, 18 categories. The quality gate.

UBS is the standard scan that runs as part of RV-08 (UBS Scan prompt). Sub-5s scans. Auto-wires into Claude Code, Codex, Gemini, and Cursor.

Key commands:

ubs .                                     # Scan entire project
ubs file.ts file.py                       # Scan specific files (<1s)
ubs $(git diff --name-only --cached)      # Scan staged files only

Forensic signal: "ubs scan" in commit notes. Defensive patterns added post-scan. Maps to RV-08 (UBS Scan prompt).


DCG -- Destructive Command Guard

Stars 608
Language Rust
GitHub destructive_command_guard

Role: Claude Code PreToolUse hook. Blocks dangerous commands BEFORE execution.

DCG is what makes "vibe mode" (passwordless sudo, dangerous flags enabled) safe. It intercepts every tool use attempt and checks it against 50+ rule packs across 17 categories: git, filesystem, DB, k8s, cloud, CI/CD. The design is fail-open: if the guard itself errors, the command is allowed through, so a DCG failure does not stall legitimate work.

What it catches:
- rm -rf (outside /tmp)
- git reset --hard, git push --force, git branch -D
- git clean -f, git stash drop/clear
- Database drops, k8s deletes, cloud resource destruction
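
A rough sketch of this kind of check follows. It covers only the git and rm cases listed above, uses patterns written for this example rather than DCG's rule packs, and ignores compound commands and long-form flags.

```python
import posixpath
import re
import shlex

# Illustrative patterns only; these are not DCG's actual rules.
BLOCK_PATTERNS = [
    r"\bgit\s+reset\s+--hard\b",
    r"\bgit\s+push\b.*\s(--force|-f)\b",
    r"\bgit\s+branch\s+-D\b",
    r"\bgit\s+clean\s+-\w*f",
    r"\bgit\s+stash\s+(drop|clear)\b",
]

def rm_rf_outside_tmp(command):
    """True for `rm` with recursive+force flags on any target not under /tmp/."""
    args = shlex.split(command)
    if not args or args[0] != "rm":
        return False
    flags = "".join(
        a[1:] for a in args[1:] if a.startswith("-") and not a.startswith("--")
    )
    if not ("r" in flags.lower() and "f" in flags):
        return False
    targets = [a for a in args[1:] if not a.startswith("-")]
    # normpath collapses tricks such as /tmp/../etc before the prefix test.
    return any(not posixpath.normpath(t).startswith("/tmp/") for t in targets)

def check(command):
    """Return (allowed, reason). Fail-open: a guard error allows the command."""
    try:
        if rm_rf_outside_tmp(command):
            return False, "rm -rf outside /tmp"
        for pattern in BLOCK_PATTERNS:
            if re.search(pattern, command):
                return False, f"matched {pattern}"
    except Exception:
        return True, "guard error (fail-open)"
    return True, "ok"
```

For example, `check("rm -rf /tmp/build")` is allowed while `check("rm -rf ./src")` and `check("git stash drop")` are blocked. An unparseable command (such as one with an unbalanced quote) raises inside the guard and is therefore allowed, which is the fail-open trade-off.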

Configuration: Lives in ~/.claude/settings.json as a PreToolUse hook.

"Agent safety should be baked into the runtime, not bolted on after. Validation gates that catch destructive ops before they execute is such an obvious pattern yet almost nobody ships with it by default." -- @doodlestein community member, Feb 2026

"[DCG] gets called automatically by the agent on every attempted tool use. [...] The agent will learn from dcg and stop trying to do dumb stuff. Definitely that happens within the individual session level via in-context learning, and might even persist now that Claude Code has built in native memory by default." -- Jeffrey Emanuel

Forensic signal: N/A for Jeff's repos (personal safety tool, no git trace).


Brenner Bot

Stars 64
Language TypeScript
GitHub brenner_bot

Role: Research orchestration inspired by Nobel laureate Sydney Brenner.

Not a code review bot. Brenner Bot is an "iterated knowledge distillation" system that converts unstructured research corpora into refined "algebras of ideas" with operators. Hypothesis lifecycle tracking, discriminative test design, anomaly management, evidence pack integration.

Jeff describes the broader value beyond molecular biology:

"Although the system itself is very cool and powerful, and Brenner's approach and worldview are breathtakingly brilliant and fruitful, perhaps the more relevant thing is the process of iterated knowledge distillation that was used to convert a massive amount of unstructured, meandering stories about his life and work into a refined 'algebra of ideas' complete with operators. [...] What else could it be used for? Perhaps something as far afield as philosophy or the classics?" -- Jeffrey Emanuel

Forensic signal: Deep research docs, hypothesis files in repos. Very rare. Jeff's most complex research projects only.


CM -- CASS Memory System

Stars 259
Language TypeScript
GitHub cass_memory_system

Role: Procedural memory for agents. Three-layer cognitive architecture: Episodic -> Working -> Procedural.

CM transforms scattered sessions into persistent cross-agent memory. Patterns discovered in one Cursor session automatically help Claude Code in the next session. It builds on top of CASS (session search) to create durable, actionable memory.

Key commands:

cm context "<task>" --json     # Inject relevant procedural rules into agent prompts
cm mark <id> --helpful         # Positive feedback loop
cm mark <id> --harmful         # Negative feedback loop
cm reflect                     # Trigger memory consolidation
cm stats                       # Memory health metrics
cm stale                       # Find outdated memories

How it fits: Pre-session, Jeff runs cm context "<task>" to load relevant procedural rules for the upcoming work. This context gets injected into agent prompts, giving fresh agents the benefit of prior sessions' discoveries.
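
The helpful/harmful feedback loop and the stale check can be sketched as a scored rule store. Everything here is hypothetical: the `Rule` fields, the 2x penalty for harmful marks, the keyword matching, and the 90-day threshold are assumptions for illustration, not CM's actual data model.

```python
from dataclasses import dataclass
from datetime import date, timedelta

@dataclass
class Rule:
    id: str
    text: str
    helpful: int = 0
    harmful: int = 0
    last_used: date = date(2026, 1, 1)

    @property
    def score(self):
        # Assumed weighting: one harmful mark outweighs one helpful mark.
        return self.helpful - 2 * self.harmful

def context(rules, keywords, limit=3):
    """Pick the best-scoring, non-negative rules that mention any keyword."""
    hits = [
        r for r in rules
        if r.score >= 0 and any(k.lower() in r.text.lower() for k in keywords)
    ]
    return sorted(hits, key=lambda r: r.score, reverse=True)[:limit]

def stale(rules, today, max_age=timedelta(days=90)):
    """Rules that have not been used within `max_age`."""
    return [r for r in rules if today - r.last_used > max_age]

# Illustrative rules only.
rules = [
    Rule("r1", "Run cargo clippy before committing Rust changes", helpful=5),
    Rule("r2", "Disable Rust tests that fail", helpful=1, harmful=3),
    Rule("r3", "Prefer small commits", helpful=2, last_used=date(2025, 6, 1)),
]
picked = context(rules, ["rust"])
old = stale(rules, today=date(2026, 2, 1))
```

Rule r2 matches the keyword but is filtered out because its harmful marks drive its score below zero.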

Forensic signal: cm context blocks in project AGENTS.md. Consistent patterns across Jeff's many projects.


RCH -- Remote Compilation Helper

Stars 34
Language Rust
GitHub remote_compilation_helper

Role: Claude Code PreToolUse hook. Offloads cargo builds to remote workers.

Intercepts build commands, syncs source via rsync + zstd compression, builds on the remote machine, streams artifacts back. Jeff has a 64-core machine dedicated to compilation; RCH makes it transparent to agents running on VPS instances.
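
The sync, build, and stream-back steps can be laid out as three commands. The sketch below only constructs argument lists and runs nothing; the host name, paths, and exact flag choices are assumptions rather than RCH's real invocation. `--compress-choice=zstd` is a standard rsync option but requires rsync 3.2 or newer built with zstd support.

```python
import shlex

def remote_build_plan(project_dir, host, remote_dir, cargo_args):
    """Return the three commands of a sync -> build -> fetch cycle as argv lists."""
    src = project_dir.rstrip("/") + "/"
    sync_up = [
        "rsync", "-a", "--delete", "--compress", "--compress-choice=zstd",
        "--exclude", "target/", src, f"{host}:{remote_dir}/",
    ]
    build = [
        "ssh", host,
        f"cd {shlex.quote(remote_dir)} && cargo {shlex.join(cargo_args)}",
    ]
    sync_back = [
        "rsync", "-a", "--compress", "--compress-choice=zstd",
        f"{host}:{remote_dir}/target/", src + "target/",
    ]
    return [sync_up, build, sync_back]

# "builder" and both paths are placeholders.
plan = remote_build_plan("/home/agent/app", "builder", "/srv/build/app",
                         ["build", "--release"])
```

Excluding `target/` on the way up avoids shipping stale local artifacts; only the remote build output travels back.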

Forensic signal: Fast build times on VPS-class hardware. No direct git trace.


SLB -- Simultaneous Launch Button

Stars 60
Language Go
GitHub slb

Role: Two-person rule for dangerous commands. Nuclear-launch-style safety.

Four risk tiers: CRITICAL (2+ approvals required), DANGEROUS, CAUTION, SAFE. Cryptographic signing, rollback support. For security-critical projects where a single misconfigured agent could cause real damage.
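
A two-person rule reduces to counting independent approvals per tier. In this sketch only the CRITICAL threshold (2) comes from the description above; the thresholds for the other tiers are assumptions, and cryptographic signing and rollback are left out.

```python
# CRITICAL = 2 is from the text; the other thresholds are assumed.
REQUIRED_APPROVALS = {"CRITICAL": 2, "DANGEROUS": 1, "CAUTION": 0, "SAFE": 0}

def may_execute(tier, requester, approvers):
    """True once enough distinct approvers other than the requester have signed off."""
    independent = set(approvers) - {requester}
    return len(independent) >= REQUIRED_APPROVALS[tier]

one = may_execute("CRITICAL", "RedSnow", ["BrownLake"])
two = may_execute("CRITICAL", "RedSnow", ["BrownLake", "PurpleBear"])
self_signed = may_execute("CRITICAL", "RedSnow", ["RedSnow", "BrownLake"])
```

Removing the requester from the approver set is what keeps a single misconfigured agent from approving its own command.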

Forensic signal: Security-critical projects only.


3.1.3 Tier 3 -- Acceleration and Infrastructure

These tools enable Jeff's specific scale (50+ agents across 3 machines) but are optional for most practitioners.

CAAM -- Coding Agent Account Manager

Stars 59
Language TypeScript
GitHub coding_agent_account_manager

Role: Sub-100ms auth switching across multiple AI provider accounts.

Jeff runs 2 Claude Max accounts ($400/month), ChatGPT Pro ($200/month), and Gemini. With 10+ agents across 3 machines, CAAM handles smart rotation, cooldown tracking, health scoring, and vault isolation. When one account hits rate limits, CAAM transparently rotates to the next.
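
Rotation with cooldown tracking and health scoring can be sketched in a few lines. The `Account` fields, the 300-second cooldown, and the halving of health on a rate limit are all assumptions made for this example; CAAM's actual policy is not described here.

```python
from dataclasses import dataclass

@dataclass
class Account:
    name: str
    health: float = 1.0          # 0.0 (unusable) .. 1.0 (fully healthy)
    cooldown_until: float = 0.0  # timestamp in seconds

def pick_account(accounts, now):
    """Healthiest account that is not cooling down, or None if all are."""
    available = [a for a in accounts if a.cooldown_until <= now]
    if not available:
        return None
    return max(available, key=lambda a: a.health)

def on_rate_limit(account, now, cooldown_s=300.0):
    """Put the account on cooldown and lower its health score."""
    account.cooldown_until = now + cooldown_s
    account.health *= 0.5

# Account names are placeholders.
accounts = [Account("claude-max-1"), Account("claude-max-2", health=0.9)]
first = pick_account(accounts, now=0.0)
on_rate_limit(first, now=0.0)
second = pick_account(accounts, now=10.0)
later = pick_account(accounts, now=400.0)
```

After the cooldown expires the first account is eligible again, but its reduced health score keeps the second account preferred.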

Key commands:

caam switch      # Rotate to next healthy account
caam status      # Health and cooldown state for all accounts
caam rotate      # Force rotation

"Nah, I'd rather burn the tokens. The scripts never work right. I do have a lot of pro/max accounts, though." -- Jeffrey Emanuel

When you need it: Only relevant when running more agents than a single subscription supports (roughly >5 concurrent Claude Code instances on Max).


S2P -- Source to Prompt TUI

Stars 16
Language TypeScript
GitHub source_to_prompt_tui

Role: Interactive code-to-prompt generator.

A TUI tool for assembling code context into prompts. Navigate your codebase visually, select specific files and functions, and export them as structured prompt context.

Key commands:

s2p .                     # Launch TUI in current directory
s2p --out /tmp/ctx.md     # Export selected context to file
s2p --include "*.rs"      # Filter to Rust files only

When you need it: During PL-06/PL-07 planning rounds where you paste code into competing model web conversations. Also useful for extracting specific modules for cross-agent review (RV-03) when the agent needs focused context rather than the full codebase.


Meta Skill (ms)

Stars 128
Language Rust
GitHub meta_skill

Role: Skill management with MCP integration.

Dual persistence (SQLite + Git), hybrid search (BM25 + semantic + RRF), UCB bandit optimization for skill selection. Manages the growing library of agent skills and prompts -- think of it as a package manager for reusable prompt templates and agent behaviors.
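
Two of the techniques named here have compact textbook forms. The sketch below shows reciprocal rank fusion (RRF) for merging a keyword ranking with a semantic ranking, and UCB1 for choosing a skill from usage statistics. The skill names, the constant k=60, and the reward numbers are illustrative, and this is not Meta Skill's code.

```python
import math

def rrf(rankings, k=60):
    """Reciprocal rank fusion: each list contributes 1 / (k + rank) per item."""
    scores = {}
    for ranking in rankings:
        for position, item in enumerate(ranking, start=1):
            scores[item] = scores.get(item, 0.0) + 1.0 / (k + position)
    return sorted(scores, key=lambda item: scores[item], reverse=True)

def ucb1(stats):
    """Pick a skill by UCB1. `stats` maps skill -> (times_used, total_reward)."""
    total = sum(n for n, _ in stats.values())
    best, best_score = None, -math.inf
    for skill, (n, reward) in stats.items():
        if n == 0:
            return skill  # always try an unused skill first
        score = reward / n + math.sqrt(2.0 * math.log(total) / n)
        if score > best_score:
            best, best_score = skill, score
    return best

# Illustrative rankings from a keyword (BM25) pass and a semantic pass.
bm25 = ["rust-review", "sql-migrations", "api-checklist"]
semantic = ["api-checklist", "rust-review", "perf-audit"]
fused = rrf([bm25, semantic])

choice = ucb1({"rust-review": (10, 8.0), "api-checklist": (2, 1.0)})
```

In the UCB1 call, "api-checklist" wins despite a lower average reward because it has been tried far less often, which is the exploration bonus at work.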

Key commands:

ms search "query"    # Find relevant skills (hybrid BM25 + semantic)
ms install <id>      # Install a skill into current project
ms list              # List available skills with usage stats
ms suggest           # UCB bandit recommendation for current context

How it fits: The prompt pack (section 3.2) covers the 47 core prompts. Meta Skill manages the expanding library beyond these: project-specific skills, community-contributed patterns, and custom review templates. When your team develops a specialized review checklist for a recurring codebase pattern, Meta Skill stores, searches, and recommends it.


RANO

Stars 23
Language Rust
GitHub rano

Role: Network observer. Tracks outbound connections from AI CLI processes.

Monitors what Claude/Codex/Gemini is connecting to during execution. Captures DNS lookups and TCP connections per process.

Key commands:

rano watch --pid $(pgrep claude)    # Monitor a specific agent process
rano scan                           # Scan all running AI CLI processes
rano report --since "1 hour ago"    # Summary of recent outbound connections

When you need it: Auditing agent behavior in security-sensitive environments, diagnosing unexpected network activity (e.g., an agent making external API calls it should not), or verifying that sandbox restrictions are working. Pairs with DCG for comprehensive agent guardrails.


Supporting Tools (Quick Reference)

| Tool | Role |
| --- | --- |
| APR (Automated Plan Reviser Pro) | Automates multi-round PL-06 spec review using extended reasoning. apr refine plan.md --rounds 10 |
| JFP (JeffreysPrompts CLI) | Browse and install battle-tested prompts. jfp list / jfp search / jfp install. This IS the MASTER_PROMPTS.md -- the CLI version. |
| RU (Repo Updater) | Sync 100+ GitHub repos in one command. ru sync |
| SRPS (System Resource Protection) | ananicy-cpp + curated rules. Auto-deprioritize background processes. Essential at 10+ agent scale. |
| MDWB (Markdown Web Browser) | Convert websites to markdown for LLM consumption |
| TRU (TOON Rust) | Token-optimized notation, 30-50% context reduction |
| PT (Process Triage) | Bayesian zombie process detection |
| CAUT (Usage Tracker) | Track LLM provider usage and costs |
| WA (WezTerm Automata) | Terminal hypervisor for multi-agent automation (superseded by FrankenTerm) |

3.1.4 Decision Matrix: Which Tools Do You Need?

Not everyone needs the full toolchain. Here is a decision matrix based on project scale:

| Scale | Agents | Essential Tools | Recommended Additions |
| --- | --- | --- | --- |
| Solo | 1 CC | Beads (br), CLAUDE.md | UBS, DCG |
| Small team | 2-3 agents | + NTM, Agent Mail, BV | + CASS, SLB |
| Standard | 4-6 agents | + CM, UBS, DCG | + RCH (if Rust), SRPS |
| Heavy | 7-10 agents | + CAAM (if multi-account), SRPS | + APR, Meta Skill |
| Jeff-scale | 11-50+ agents | All of the above | + S2P, RANO, SLB, RU |

The minimum viable flywheel is: Beads + BV + AGENTS.md + one Claude Code instance. You can run the entire methodology with just these. Everything else is acceleration.

