Worktree-isolated cross-model agent orchestration
with blind divergent verification and auto-learning
curl -fsSL https://raw.githubusercontent.com/project820/be-my-butler/main/install.sh | bash
Every architectural decision traces back to these non-negotiable principles.
git worktree. No shared state. No index.lock conflicts. True filesystem-level isolation.
.bmb/ and CLAUDE.md. Never touches code. Context protection is paramount: Lead is the bottleneck.
session-prep.md captures state for next session. Cross-session context preserved. No work is ever truly lost.
council and verify are read-only. No more --full-auto everywhere.
9 specialized agents, each with a clear role and strict boundaries. From brainstorming to production-ready code.
graph TB
User["👤 User"]
Lead["🎯 Lead<br/>Opus · Orchestrator"]
Consultant["💬 Consultant<br/>Sonnet · Persistent"]
Architect["📐 Architect<br/>Opus · Council"]
Executor["⚙️ Executor<br/>Opus · Backend"]
Frontend["🎨 Frontend<br/>Opus · UI"]
Tester["🧪 Tester<br/>Opus · Blind"]
Verifier["✅ Verifier<br/>Opus · Blind"]
Simplifier["🧹 Simplifier<br/>Opus · Cleanup"]
Writer["📝 Writer<br/>Sonnet · Docs"]
BMB[".bmb/<br/>Handoffs · Artifacts"]
CrossModel["🌐 Cross-Model<br/>Codex / Gemini"]
User <-->|"brainstorm<br/>approve"| Lead
Lead <-->|"SendMessage<br/>bidirectional"| Consultant
Lead -->|"briefing"| Architect
Architect <-->|"council<br/>debate"| CrossModel
Lead -->|"plan-to-exec"| Executor
Lead -->|"plan-to-exec"| Frontend
Lead -->|"test request"| Tester
Tester -.->|"blind wall"| CrossModel
Lead -->|"verify request"| Verifier
Verifier -.->|"blind wall"| CrossModel
Lead -->|"simplify"| Simplifier
Lead -->|"write docs"| Writer
Architect -->|"writes"| BMB
Executor -->|"writes"| BMB
Frontend -->|"writes"| BMB
Tester -->|"writes"| BMB
Verifier -->|"writes"| BMB
Lead -->|"reads only"| BMB
classDef opus fill:#1a1030,stroke:#7c3aed,stroke-width:2px,color:#a78bfa
classDef sonnet fill:#0a2015,stroke:#16a34a,stroke-width:2px,color:#22c55e
classDef cross fill:#1a1500,stroke:#d97706,stroke-width:2px,color:#f59e0b
classDef user fill:#0a1628,stroke:#3b82f6,stroke-width:2px,color:#60a5fa
classDef storage fill:#111827,stroke:#1e3a5f,stroke-width:2px,color:#8494a7
class Lead,Architect,Executor,Frontend,Tester,Verifier,Simplifier opus
class Consultant,Writer sonnet
class CrossModel cross
class User user
class BMB storage
.bmb/. Never writes code. The thin conductor.
analytics.db, classifies events by Bird’s Law severity (critical/warn/info), surfaces pattern_counts promotion candidates.
From user intent to production-ready code.
Step 3 (User Approval): YES → bmb_learn PRAISE. MODIFY → bmb_learn CORRECTION + update. NO → cancel.
Step 5.5 (Merge Worktrees): merge conflict → bmb_learn MISTAKE + escalate to user.
Step 6 (Blind Testing): Claude Tester reads plan-to-exec.md. Cross-Model Tester reads briefing.md. Different framing, separate worktrees, separate timeouts. Neither reads the other's results.
Step 8 (Reconciliation): failure → bmb_learn MISTAKE + classified loop-back.
Step 9 (Simplify + Re-verify): Simplifier breaks tests → bmb_learn MISTAKE + revert + proceed with original.
Step 10.5 (Retrospective Analysis): analytics.db for the current session. Classifies events by Bird’s Law severity (critical / warn / info). Cross-references pattern_counts to find recurring failures (≥2 occurrences) eligible for CLAUDE.md promotion. Writes analyst-report.md (3–5 min timeout; pipeline continues on timeout).
Step 11 (Cleanup + Session Prep): bmb_learn PRAISE on success. Check recurrence ≥2 → propose CLAUDE.md promotion. Git commit/push. FTS5 indexing. Generate session-prep.md for next session.
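The recurrence rule above (patterns seen at least twice become candidates for CLAUDE.md promotion) can be pictured with a small sketch. This is not BMB's implementation: the event shape, the pattern names, and both function names are hypothetical, and events are assumed to arrive already labeled with one of the three severities.

```python
from collections import Counter

SEVERITIES = ("critical", "warn", "info")  # buckets named in the source
PROMOTION_THRESHOLD = 2                    # the ">=2 occurrences" rule


def promotion_candidates(events):
    """Return patterns seen at least PROMOTION_THRESHOLD times, most frequent first."""
    counts = Counter(pattern for _severity, pattern in events)
    return [p for p, n in counts.most_common() if n >= PROMOTION_THRESHOLD]


def severity_summary(events):
    """Count events per severity, always reporting all three buckets."""
    counts = Counter(severity for severity, _pattern in events)
    return {s: counts.get(s, 0) for s in SEVERITIES}


# Hypothetical (severity, pattern) pairs for one session.
events = [
    ("critical", "merge-conflict"),
    ("warn", "flaky-test"),
    ("warn", "flaky-test"),
    ("info", "slow-install"),
    ("critical", "merge-conflict"),
    ("warn", "flaky-test"),
]

candidates = promotion_candidates(events)
summary = severity_summary(events)
```

Here `slow-install` occurs once and is not proposed, while the two recurring patterns are.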
flowchart TD
S1["1. Setup<br/>tmux, session, learnings"]
S2["2. Brainstorm + Consultant<br/>min 2 rounds"]
S3{"3. User Approval"}
S4["4. Architecture Council<br/>2-4 debate rounds"]
S5["5. Execution<br/>worktree-isolated"]
S55["5.5 Merge Worktrees"]
S6["6. Blind Testing<br/>divergent framing"]
S7["7. Blind Verification<br/>divergent framing"]
S8{"8. Reconciliation"}
S9["9. Simplify + Re-verify"]
S10["10. Docs Update"]
S105["10.5 Retrospective Analysis<br/>Bird's Law severity"]
S11["11. Cleanup + Session Prep"]
S1 --> S2
S2 --> S3
S3 -->|"YES"| S4
S3 -->|"MODIFY"| S2
S3 -->|"NO"| CANCEL["Cancel"]
S4 -->|"skip: bugfix/infra"| S5
S4 --> S5
subgraph parallel ["Parallel Worktrees"]
direction LR
EX["Executor"]
FE["Frontend<br/>if detected"]
end
S5 --> parallel
parallel --> S55
S55 --> S6
S6 --> S7
S7 --> S8
S8 -->|"PASS"| S9
S8 -->|"IMPL fail"| S5
S8 -->|"ARCH fail"| S4
S8 -->|"REQ fail"| S2
S8 -->|"ENV fail"| S1
S8 -->|"TEST fail"| S6
S9 --> S10
S10 --> S105
S105 --> S11
classDef decision fill:#1a1500,stroke:#d97706,color:#f59e0b
classDef cancel fill:#2a0a0a,stroke:#ef4444,color:#ef4444
classDef step fill:#111827,stroke:#1e3a5f,color:#e8edf5
classDef parallel fill:#0a1628,stroke:#3b82f6,color:#60a5fa
classDef analyst fill:#0a2010,stroke:#22c55e,color:#4ade80
class S3,S8 decision
class CANCEL cancel
class S1,S2,S4,S5,S55,S6,S7,S9,S10,S11 step
class EX,FE parallel
class S105 analyst
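The reconciliation edges in the flowchart amount to a lookup from failure class to the step the pipeline returns to. A minimal sketch of that routing, read directly off the diagram; the function name and the error handling for unknown outcomes are assumptions, not BMB code:

```python
# Failure class -> step to loop back to, taken from the Step 8 edges above.
LOOP_BACK = {"IMPL": 5, "ARCH": 4, "REQ": 2, "ENV": 1, "TEST": 6}
SIMPLIFY_STEP = 9  # where a PASS continues


def next_step(outcome: str) -> int:
    """Return the step that follows reconciliation for a given outcome."""
    if outcome == "PASS":
        return SIMPLIFY_STEP
    try:
        return LOOP_BACK[outcome]
    except KeyError:
        # Unknown classes are not covered by the diagram; this sketch rejects them.
        raise ValueError(f"unknown reconciliation outcome: {outcome!r}") from None
```

An implementation failure reruns execution (step 5), while a requirements failure goes all the way back to brainstorming (step 2).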
How artifacts flow between agents through the .bmb/ directory. Lead never touches code — only reads summaries.
flowchart LR
User["👤 User"]
Lead["🎯 Lead"]
Brief["📋 briefing.md"]
Arch["📐 Architect"]
Plan["📄 plan-to-exec.md"]
ExFe["⚙️ Executor<br/>🎨 Frontend"]
Merge["🔀 Merge"]
Test["🧪 Tester<br/>blind"]
Verify["✅ Verifier<br/>blind"]
Recon["⚖️ Reconcile"]
Simp["🧹 Simplifier"]
Write["📝 Writer"]
Output["✨ Output"]
User -->|"intent"| Lead
Lead -->|"brainstorm"| Brief
Brief -->|"briefing"| Arch
Arch -->|"council"| Plan
Plan -->|"instructions"| ExFe
ExFe -->|"worktrees"| Merge
Merge -->|"merged code"| Test
Merge -->|"merged code"| Verify
Test -->|"test-summary"| Recon
Verify -->|"verify-summary"| Recon
Recon -->|"PASS"| Simp
Simp -->|"cleaned"| Write
Write --> Output
classDef artifact fill:#1a2234,stroke:#3b82f6,color:#60a5fa
classDef agent fill:#111827,stroke:#1e3a5f,color:#e8edf5
class Brief,Plan artifact
class User,Lead,Arch,ExFe,Merge,Test,Verify,Recon,Simp,Write,Output agent
gantt
title Consultant Monitoring (Steps 2–11)
dateFormat X
axisFormat %s
section Brainstorm
Bidirectional with Lead :active, 0, 2
section Approval
Monitor user decision :1, 3
section Council
Observe debate rounds :2, 4
section Execution
Track progress via feed file :3, 6
section Testing
Monitor blind test results :5, 7
section Verification
Monitor blind verify results :6, 8
section Reconciliation
Observe failure classification :7, 9
section Simplify
Track re-verify outcome :8, 10
section Docs
Validate doc consistency :9, 11
section Cleanup
Final session summary :10, 12
Not just blind on results — divergent on problem framing. Workspace-level blind cross-model verification.
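To illustrate what divergent framing means in practice, the sketch below models two blind runs and checks that they share neither a framing file nor a worktree. Everything here is hypothetical: the `BlindRun` type, the paths, the timeout values, and the assumption that the Claude-side run is the one framed by plan-to-exec.md while the cross-model run is framed by briefing.md.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlindRun:
    name: str
    framing_file: str  # the only artifact that frames the task for this run
    worktree: str      # hypothetical path
    timeout_s: int     # hypothetical value; each run tracks its own deadline


def check_blind_wall(a: BlindRun, b: BlindRun) -> list[str]:
    """Return violations of the divergence rules; an empty list means none found."""
    problems = []
    if a.framing_file == b.framing_file:
        problems.append("both runs share the same framing file")
    if a.worktree == b.worktree:
        problems.append("both runs share a worktree")
    return problems


claude = BlindRun("tester-claude", ".bmb/plan-to-exec.md", "/tmp/wt-tester-claude", 600)
cross = BlindRun("tester-cross", ".bmb/briefing.md", "/tmp/wt-tester-cross", 900)
violations = check_blind_wall(claude, cross)
```

A check like this only covers inputs and workspaces; keeping each run from reading the other's results would still have to be enforced by how the runs are launched.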
Every parallel agent gets its own git worktree. No shared state, no index.lock conflicts, true filesystem isolation.
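As a rough illustration of per-agent worktrees, the helpers below build `git worktree add` and `git worktree remove` invocations as argument lists. The directory layout, the `bmb/<agent>` branch naming, and the function names are assumptions made for this sketch; the source does not say how BMB names its worktrees.

```python
from pathlib import Path


def worktree_path(repo: Path, agent: str) -> Path:
    """Hypothetical layout: a sibling directory per agent next to the repository."""
    return repo.parent / f"{repo.name}-wt-{agent}"


def add_cmd(repo: Path, agent: str, base: str = "HEAD") -> list[str]:
    """Build `git worktree add -b <branch> <path> <base>` for one agent."""
    return ["git", "-C", str(repo), "worktree", "add",
            "-b", f"bmb/{agent}", str(worktree_path(repo, agent)), base]


def remove_cmd(repo: Path, agent: str) -> list[str]:
    """Build `git worktree remove <path>` for cleanup after the agent finishes."""
    return ["git", "-C", str(repo), "worktree", "remove",
            str(worktree_path(repo, agent))]


repo = Path("/tmp/proj")  # hypothetical repository location
executor_add = add_cmd(repo, "executor")
frontend_add = add_cmd(repo, "frontend")
```

Each list can be passed to `subprocess.run(cmd, check=True)`. Because every agent gets its own directory and branch, two agents never contend for the same index file.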
gantt
title Worktree Lifecycle per Pipeline Run
dateFormat X
axisFormat Step %s
section Executor
Create worktree :e1, 4, 5
Work in worktree :e2, 5, 6
Merge to main :crit, e3, 6, 7
section Frontend
Create worktree :f1, 4, 5
Work in worktree :f2, 5, 6
Merge to main :crit, f3, 6, 7
section Tester-Claude
Create worktree :tc1, 6, 7
Run tests :tc2, 7, 8
Cleanup :tc3, 8, 9
section Tester-Cross
Create worktree :tx1, 6, 7
Run tests :tx2, 7, 8
Cleanup :tx3, 8, 9
section Verifier-Claude
Create worktree :vc1, 7, 8
Verify :vc2, 8, 9
Cleanup :vc3, 9, 10
section Verifier-Cross
Create worktree :vx1, 7, 8
Verify :vx2, 8, 9
Cleanup :vx3, 9, 10
Council debate uses Claude ↔ Codex/Gemini file exchange. Profile-based permissions keep the cross-model read-only where needed.
sequenceDiagram
participant A as Architect<br/>(Claude Opus)
participant F as Council Files<br/>(.bmb/council/)
participant X as Cross-Model<br/>(Codex / Gemini)
Note over A,X: Round 1 — Initial Proposals
A->>F: Write claude-proposal.md
X->>F: Write cross-proposal.md
Note over A,X: Round 2 — Critique
A->>F: Read cross-proposal.md
A->>F: Write claude-critique.md
X->>F: Read claude-proposal.md
X->>F: Write cross-critique.md
Note over A,X: Round 3 — Synthesis (optional)
A->>F: Read cross-critique.md
A->>F: Write claude-synthesis.md
X->>F: Read claude-critique.md
X->>F: Write cross-synthesis.md
Note over A,X: Round 4 — Final Decision
A->>F: Read all files
A->>F: Write plan-to-exec.md ✅
Note right of X: Cross-Model uses<br/>--profile read-only
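The profile rules for the cross-model (council and verify may write only under .bmb/, test and exec-assist may write within worktree scope) can be expressed as a small path check. This is a sketch under assumptions: the function name is hypothetical, .bmb/ is assumed to sit at the worktree root, and the real enforcement mechanism is not described in the source.

```python
from pathlib import PurePosixPath

BMB_ONLY_PROFILES = {"council", "verify"}    # read code, write to .bmb/ only
WORKTREE_PROFILES = {"test", "exec-assist"}  # write within worktree scope


def may_write(profile: str, target: str, worktree: str) -> bool:
    """Decide whether the cross-model may write `target` under the given profile."""
    target_path = PurePosixPath(target)
    if ".." in target_path.parts:
        return False  # refuse paths that could climb out of the worktree
    try:
        relative = target_path.relative_to(PurePosixPath(worktree))
    except ValueError:
        return False  # outside the worktree entirely
    if profile in BMB_ONLY_PROFILES:
        return len(relative.parts) > 0 and relative.parts[0] == ".bmb"
    return profile in WORKTREE_PROFILES
```

Under these rules a council run can write `.bmb/council/cross-proposal.md` but not a source file, and unknown profiles are denied by default.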
council and verify profiles: cross-model can read code and write to .bmb/ only. No production writes.
test and exec-assist profiles: cross-model can write tests and helper code within worktree scope.
bmb-config.sh. Independent deadline tracking.
Lead's context window is a shared resource. Three layers ensure it never exceeds 50% capacity.
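One way to picture the context budget is a cap on each summary plus a check against the 50% target. The sketch below is illustrative only: it approximates tokens by whitespace-separated words, which is an assumption (a real tokenizer would count differently), and the function names are hypothetical.

```python
MAX_SUMMARY_TOKENS = 300       # the per-summary cap stated in this section
CONTEXT_BUDGET_FRACTION = 0.5  # the "never exceeds 50% capacity" target


def rough_token_count(text: str) -> int:
    """Crude stand-in for a tokenizer: counts whitespace-separated words."""
    return len(text.split())


def clamp_summary(text: str, limit: int = MAX_SUMMARY_TOKENS) -> str:
    """Keep at most `limit` words so a summary cannot grow without bound."""
    return " ".join(text.split()[:limit])


def within_budget(used_tokens: int, window_tokens: int) -> bool:
    """True while usage stays at or below the budgeted share of the window."""
    return used_tokens <= window_tokens * CONTEXT_BUDGET_FRACTION
```

Short summaries such as "PASS: 12, FAIL: 0" pass through unchanged; only oversized ones are cut.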
.compressed/*.summary.md only (max 300 tokens). Raw artifacts accessed only during disputes.
.tool-cache/. Only summaries kept: "Modified: auth.ts (47 lines)", "PASS: 12, FAIL: 0".
Mistakes, corrections, and successes automatically recorded. Past pitfalls injected into future sessions across all projects.
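A one-line-per-learning log with chronological append could look roughly like this. The line layout (date, type, optional project tag, message) is a guess made for illustration; the source only states that each learning is one line, that entries are appended in order, and that the global file adds a [project_name] tag. The three types mirror the PRAISE, CORRECTION, and MISTAKE labels used with bmb_learn.

```python
from datetime import date
from pathlib import Path
from typing import Optional

LEARNING_TYPES = {"PRAISE", "CORRECTION", "MISTAKE"}


def format_learning(kind: str, message: str, day: date,
                    project: Optional[str] = None) -> str:
    """Render one learning as a single line; `project` adds the global-file tag."""
    if kind not in LEARNING_TYPES:
        raise ValueError(f"unknown learning type: {kind!r}")
    one_line = " ".join(message.split())  # collapse newlines so the entry stays on one line
    tag = f" [{project}]" if project else ""
    return f"{day.isoformat()} {kind}{tag}: {one_line}"


def append_learning(path: Path, line: str) -> None:
    """Append in arrival order, creating the file and its directory if needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as fh:
        fh.write(line + "\n")


local_line = format_learning("MISTAKE", "merge conflict\n in auth.ts", date(2025, 1, 2))
global_line = format_learning("MISTAKE", "merge conflict in auth.ts", date(2025, 1, 2), "demo")
```

The same formatter serves both files: the project-level log omits the tag, and the global log includes it.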
.bmb/learnings.md: one line per learning, chronological append. Loaded at Step 1 for this project.
~/.claude/bmb-system/learnings-global.md: same format + [project_name] tag. Shared across all BMB projects.
Six task types, each with its own optimized pipeline path. Consultant is always present regardless of recipe.
4 slash commands expose BMB capabilities at different scales — from full pipeline to focused brainstorming.
flowchart LR
Setup["/BMB-setup\n⚙️ Config"]
BMB["/BMB\n🔧 Full Pipeline"]
Brainstorm["/BMB-brainstorm\n💡 Ideation"]
Refactoring["/BMB-refactoring\n🔄 Code Quality"]
Setup -->|"prerequisite"| BMB
Setup -->|"prerequisite"| Brainstorm
Setup -->|"prerequisite"| Refactoring
Brainstorm -.->|"feeds into"| BMB
Refactoring -.->|"standalone"| BMB
style Setup fill:#111827,stroke:#22c55e,color:#e8edf5
style BMB fill:#111827,stroke:#3b82f6,color:#e8edf5,stroke-width:3px
style Brainstorm fill:#111827,stroke:#22d3ee,color:#e8edf5
style Refactoring fill:#111827,stroke:#a78bfa,color:#e8edf5
| Phase | /BMB | /BMB-brainstorm | /BMB-refactoring |
|---|---|---|---|
| Setup / Config | ✓ | ✓ | — |
| Consultant Session | ✓ | ✓ | — |
| Brainstorm / Analysis | ✓ | ✓ | ✓ Parallel |
| Council Debate | ✓ | — | ✓ Synthesis |
| Architecture Plan | ✓ | — | — |
| Execution (Worktree) | ✓ | — | ✓ |
| Testing | ✓ | — | — |
| Verification (Blind) | ✓ | — | ✓ Review |
| Fix Cycle | ✓ | — | ✓ |
| Simplification | ✓ | — | — |
| Merge / Cleanup | ✓ | ✓ Summary | ✓ |
The pipeline never blocks. Every failure mode has a defined fallback.
| Scenario | Behavior |
|---|---|
| Cross-model unavailable (council) | Solo design (Claude only), noted in session log |
| Cross-model unavailable (testing) | Claude-only test results, noted in reconciliation |
| Cross-model unavailable (verification) | Claude-only verification, noted in reconciliation |
| Claude tester timeout | Log timeout, continue with cross-model results |
| Cross-model timeout | Proceed with Claude-only results |
| Merge conflict | bmb_learn MISTAKE + escalate to user |
| Simplifier breaks tests | bmb_learn MISTAKE + revert + proceed with original |
| Telegram env unset | Skip notifications silently |
| knowledge.db missing | Skip indexing/search |
| Frontend not detected | Skip Frontend agent, Executor only |
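The timeout and availability rows in the table share one shape: use whatever results arrived and record a note for the side that is missing. A minimal sketch of that behavior follows; the function name and note wording are hypothetical, and the case where both sides are missing is not covered by the table, so this sketch simply raises.

```python
from typing import Optional


def select_results(claude: Optional[str], cross: Optional[str]):
    """Return (results actually available, notes to record for reconciliation)."""
    used, notes = {}, []
    if claude is not None:
        used["claude"] = claude
    else:
        notes.append("claude timeout: continuing with cross-model results")
    if cross is not None:
        used["cross"] = cross
    else:
        notes.append("cross-model unavailable: proceeding with Claude-only results")
    if not used:
        # Not specified in the fallback table; treated as an error in this sketch.
        raise RuntimeError("no results from either model")
    return used, notes
```

With both sides present the notes list is empty, so a degraded run is always visible in the reconciliation record.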
The .bmb/ directory is the single source of truth for all pipeline artifacts.