3.4 Scaling Patterns
3.4.1 Formation Taxonomy
The number of agents you run is not a dial you turn to "fast." It is a formation decision that depends on project size, bead graph shape, budget, and hardware. Jeff uses a six-tier formation taxonomy:
| Formation | Agents | Typical Use | Model Mix (example) |
|---|---|---|---|
| A | 1 | Single-bead work, prototyping, learning the methodology | 1 CC (Opus) |
| B | 2--3 | Small projects, 30-50 beads | 2 CC + 1 Codex |
| C | 4--6 | Standard projects, 50-100 beads (Jeff's typical) | 3 CC + 2 Codex + 1 Gemini |
| D | 7--10 | Heavy projects, 100-200 beads | 5 CC + 3 Codex + 1 Gemini + 1 commit |
| E | 11--15 | Major systems (frankensqlite scale) | 6 CC + 4 Codex + 2 Gemini + 1 commit + 1-2 review |
| F+ | 15+ | Full-scale swarms (frankenglibc) | 8+ CC + 8+ Codex + Gemini, across 3 machines |
Source: ACFS methodology documentation (FLYWHEEL.md).
The commit agent is always a singleton. Review agents scale with the coder count: 1 reviewer per 4-5 coders. The "reality checker" role (an agent that validates against real data rather than tests) is optional, needed only for data-intensive projects.
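The sizing rules above (singleton commit agent, roughly 1 reviewer per 4-5 coders, optional reality checker) can be sketched as a small helper. This is my illustration of the stated ratios, not code from the methodology; the choice of 4 as the divisor is an assumption (the conservative end of "4-5").

```python
import math

def reviewers_needed(coders: int) -> int:
    """Roughly 1 reviewer per 4-5 coders; 4 chosen here (assumption)."""
    return math.ceil(coders / 4) if coders > 0 else 0

def formation(coders: int, data_intensive: bool = False) -> dict:
    """Compose a formation from the scaling rules in the text."""
    return {
        "coders": coders,
        "reviewers": reviewers_needed(coders),
        "commit": 1,  # the commit agent is always a singleton
        "reality_checker": 1 if data_intensive else 0,  # data-heavy projects only
    }

print(formation(10))
# -> {'coders': 10, 'reviewers': 3, 'commit': 1, 'reality_checker': 0}
```

At Formation E scale (say 10 coders) this yields 2-3 reviewers, matching the "1-2 review" column in the taxonomy table.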
"I have 16 GPT Pro accounts now. But they're both great and highly complementary. Claude has more of a 'can do' attitude. I also use Gemini for code review. More brains is better than one." -- Jeffrey Emanuel
By February 2026, Jeff's operational formation was 3 Opus agents (Claude Code) + 3 Codex agents + 3 Gemini agents per project, sometimes running across 3 machines simultaneously:
"8 Codex + 8 Claude Code per project across 3 machines = ~50+ agents" -- FLYWHEEL.md formation scale table (reconstructed from tweets)
3.4.2 Model Mix Ratios
Jeff's observed model allocation as of late February 2026, per his own statements:
| Model | Share of Work | Strengths |
|---|---|---|
| Codex | ~50-60% | Follows instructions precisely, thorough, catches things Claude misses |
| Claude (Opus) | ~10-30% | "Can do" attitude, strong architecture reasoning, only model that reliably handles git hooks |
| Gemini | ~20% | Review duties, fresh-eyes perspective that makes both Codex and Claude look incomplete |
"Gemini earns 20% of mix right now, for my workload. Claude between 10-30% depending on what I am doing. Codex the rest." -- Jeffrey Emanuel
"I thought codex was a thorough, judicious, genius engineer, and claude was competent but a little behind. After bringing gemini into the fold, codex looks lazy and claude seems totally inept." -- Jeffrey Emanuel
Behavioral differences: Claude self-corrects iteratively (fix, test, fix, test in rapid cycles). Codex front-loads its thinking (long pause, then dumps a large coherent change). During multi-model competition (PL-07), Jeff asks each model for two documents: a revised plan AND an argumentative narrative against the other's approach. This "two-document debate protocol" surfaces disagreements that a simple review would miss.
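The two-document debate protocol can be sketched as a pair of prompt templates. Only the structure (a revised plan plus an argumentative narrative against the rival's approach) comes from the text; the prompt wording and function name are illustrative assumptions.

```python
def debate_prompts(model: str, rival: str, rival_plan: str) -> dict:
    """Hypothetical prompt pair for the PL-07 two-document debate protocol."""
    return {
        # Document 1: a revised plan, written after reading the rival's plan
        "revised_plan": (
            f"Here is {rival}'s plan for this task:\n{rival_plan}\n\n"
            f"As {model}, produce your own revised plan for the same task."
        ),
        # Document 2: an argumentative narrative against the rival's approach
        "rebuttal": (
            f"Write an argumentative narrative against {rival}'s approach, "
            "naming concrete weaknesses rather than general concerns."
        ),
    }

docs = debate_prompts("claude", "codex", "Use a single global lock.")
```

Asking for the rebuttal as a separate document forces each model to commit to specific objections, which is what surfaces the disagreements a simple review would miss.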
Model-role matrix (operational rules):
| Task | Recommended Models | Avoid |
|---|---|---|
| Coding (EX-01) | Codex, Claude | Gemini (shell restriction issues) |
| Review (RV-02) | Gemini, Claude | -- |
| Commit (EX-06) | Claude ONLY | Codex, Gemini (git issues) |
| QA / Reality Check | Codex, Gemini | -- |
| Alien artifacts (PL-13) | Gemini, o3 | -- (formal reasoning strength needed) |
Source: ACFS methodology documentation (ACFS_WIZARD.md).
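The model-role matrix is effectively a routing table. A minimal transcription of it, assuming lowercase model names and simple task keys (the dictionary shape and function are mine, not from ACFS_WIZARD.md):

```python
# Routing table transcribed from the model-role matrix above.
ROLE_MODELS = {
    "coding":  {"use": ["codex", "claude"], "avoid": ["gemini"]},  # EX-01; shell restrictions
    "review":  {"use": ["gemini", "claude"], "avoid": []},         # RV-02
    "commit":  {"use": ["claude"], "avoid": ["codex", "gemini"]},  # EX-06; git issues
    "qa":      {"use": ["codex", "gemini"], "avoid": []},          # reality check
    "alien":   {"use": ["gemini", "o3"], "avoid": []},             # PL-13; formal reasoning
}

def pick_model(task: str) -> str:
    """Return the first recommended model for a task; KeyError if unknown."""
    return ROLE_MODELS[task]["use"][0]
```

The one hard rule in the matrix is the commit role: `pick_model("commit")` can only ever return `"claude"`, since the other two are explicitly avoided for git operations.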
3.4.3 Cost Modeling
Subscription-based cost (Jeff's actual stack, Feb 2026):
| Item | Monthly Cost |
|---|---|
| Claude Max x2 (2 accounts) | $400 |
| ChatGPT Pro (16 accounts at scale) | $3,200+ |
| Gemini (included in Google account) | $0 |
| VPS (Contabo, 64GB RAM) | $40-56 |
| Dedicated bare metal x2 (32-core AMD, 256GB RAM each) | $680 |
| Total (full scale) | ~$4,300+/month |
| Total (starter, 1 of each) | ~$440-456/month |
The starter cost for a solo practitioner: VPS ($40-56) + Claude Max ($200) + ChatGPT Pro ($200) = ~$440-456/month.
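A back-of-envelope check of the two totals in the table. The prices are the figures quoted above (not current vendor pricing), and the function signature is my own framing:

```python
def monthly_cost(claude_max: int = 1, chatgpt_pro: int = 1,
                 vps: int = 40, bare_metal: int = 0) -> int:
    """Sum the subscription stack: $200/Claude Max, $200/ChatGPT Pro,
    $340 per Contabo bare-metal server, plus the VPS (low end $40)."""
    return claude_max * 200 + chatgpt_pro * 200 + vps + bare_metal * 340

starter = monthly_cost()                                       # -> 440
full = monthly_cost(claude_max=2, chatgpt_pro=16, bare_metal=2)  # -> 4320
```

The full-scale figure of $4,320 matches the table's "~$4,300+/month"; ChatGPT Pro accounts dominate at roughly 74% of the total.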
"Before claude code max i was spending ~$100 a day on sonnet" -- Jeffrey Emanuel
"Opus basically compresses a whole engineering team into a context window. When velocity 10x's, the $200 feels less like cost and more like leverage." -- Jeffrey Emanuel
"My token usage cost on some of these tools far exceeds the subscription cost. Makes me think the cost of tokens are subsidized by the model providers to a significant degree." -- Jeffrey Emanuel
Cost optimization levers:
- Use flat-rate subscriptions (Max/Pro) instead of API billing. Jeff's entire methodology is built on subscription tiers. At 50+ agents, per-token pricing would be ruinous.
- Use Sonnet/Codex for bulk execution, Opus/Gemini for review. The cheaper, faster models handle the volume; the expensive models handle the judgment.
- Kill stalled agents early. An agent burning tokens in a loop produces no value. Monitor via `git log --since="45 min ago"` and kill anything with 0 commits in that window.
- Context compaction is not free. Every compaction event degrades quality. For long sessions, terminate and restart rather than accumulating compactions.
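The stalled-agent check can be scripted around the `git log --since` command mentioned above. The wrapper below is a sketch under my own naming; only the 45-minute window and the zero-commit kill criterion come from the text.

```python
import subprocess

def recent_commit_count(repo: str, window: str = "45 min ago") -> int:
    """Count commits in `repo` since `window` via `git log --since --oneline`."""
    out = subprocess.run(
        ["git", "-C", repo, "log", "--since", window, "--oneline"],
        capture_output=True, text=True, check=True,
    ).stdout
    return len(out.splitlines())

def is_stalled(commit_count: int) -> bool:
    """An agent with 0 commits in the window is presumed stuck and killable."""
    return commit_count == 0
```

Run per agent worktree on a timer; any repo where `is_stalled(recent_commit_count(path))` is true is a candidate for termination.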
3.4.4 Hardware Scaling
Single-machine ceiling: Jeff routinely runs 50+ agents on one machine.
"I am often running 50+ agents on a single machine, and if you don't have something like that set up, the machine quickly becomes an unresponsive mess." -- Jeffrey Emanuel
Hardware progression (from Jeff's tweets):
| Stage | Hardware | Agent Ceiling |
|---|---|---|
| Starter | VPS, 64GB RAM, 8-16 cores | 5-10 agents |
| Intermediate | Workstation, AMD 5975wx (32 cores) | 20-30 agents |
| Advanced | 64-core workstation + VPS fleet | 40-50 agents |
| Jeff's Feb 2026 | 3 machines (64-core primary + 2x bare metal Contabo) | 50-80+ agents |
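A rough capacity heuristic consistent with the progression table. The per-agent resource figures (8 GB RAM, 0.6 cores) are my assumptions fitted to the table's ceilings, not numbers Jeff has stated:

```python
def agent_ceiling(cores: int, ram_gb: int,
                  cores_per_agent: float = 0.6,
                  gb_per_agent: float = 8.0) -> int:
    """Estimate max concurrent agents as the tighter of the CPU and RAM limits.
    Per-agent costs are fitted assumptions, not measured values."""
    return int(min(cores / cores_per_agent, ram_gb / gb_per_agent))

print(agent_ceiling(16, 64))    # starter VPS      -> 8
print(agent_ceiling(32, 256))   # Contabo bare metal -> 32
```

Under these assumptions the starter VPS is RAM-bound (8 agents, within the table's 5-10 range), while the 32-core bare-metal boxes are CPU-bound, which is consistent with Jeff needing RCH to offload compilation rather than more memory.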
"I have two dedicated bare metal servers for around $340/month each from Contabo, hosted in the US. They each have a 32-core AMD CPU and 256gb of RAM and NVME SSD." -- Jeffrey Emanuel
Scaling infrastructure (tools Jeff built to cope):
| Problem | Tool | Purpose |
|---|---|---|
| Rust builds crush the primary machine | RCH (Remote Compilation Helper) | Offloads cargo build to a fleet of dedicated VPS machines |
| Disk fills up from build artifacts | SBH (Storage Ballast Helper) | Intelligently clears junk so RCH keeps working |
| Zombie/runaway processes | PT (Process Triage) | Bayesian zombie process detection |
| RAM/CPU exhaustion from swarm | SRPS (System Resource Protection) | Auto-deprioritize background processes |
| Temp files and stuck tests | Dedicated maintenance agent | Has SSH to all machines, clears junk and kills runaways |
"Running big agent swarms can be hazardous to your machine's health. It has become such a problem for me that I've been forced to create a suite of programs to help me cope with this, including: (1) remote_compilation_helper for offloading builds to a fleet of dedicated VPS machines (this is the single most critical one). (2) storage_ballast_helper for intelligently clearing out junk so you avoid filling up your primary machine. Without this, rch stops working after a few hours. (3) process_triage for helping to diagnose messed up runaway or zombie processes." -- Jeffrey Emanuel
"It's so incredibly useful to have a dedicated agent with knowledge of and ssh access to all your machines, that is highly skilled in the art of maintaining machines that are used for big agent swarms. Basically, clearing out junk temp files and killing stuck tests and runaways." -- Jeffrey Emanuel