Multi-Agent Orchestration System
v0.2.0 What's New in v0.2

BMB
Be-my-butler

Worktree-isolated cross-model agent orchestration
with blind divergent verification and auto-learning

Quick Start:
curl -fsSL https://raw.githubusercontent.com/project820/be-my-butler/main/install.sh | bash
11.5 Pipeline Steps · 9 Agents · 3 Compression Layers · 4 Skills · 6 Recipes

Core Design Principles

Every architectural decision traces back to these non-negotiable principles.

Worktree Isolation
Every write-capable parallel agent gets its own git worktree. No shared state. No index.lock conflicts. True filesystem-level isolation.
Blind Divergent Framing
Cross-model tracks read different context documents. They are not just blind to each other's results; they diverge on the problem framing itself.
Lead as Thin Orchestrator
Lead reads only .bmb/ and CLAUDE.md. Never touches code. Context protection is paramount because Lead is the bottleneck.
Bidirectional Consultant
SendMessage enables Consultant → Lead feedback. Business rules from user conversations actually reach the pipeline.
Auto-Learning
Mistakes, corrections, and successes are automatically recorded. Past pitfalls are injected into future sessions. 3-tier: local → global → CLAUDE.md.
Session Continuity
session-prep.md captures state for the next session. Cross-session context is preserved. No work is ever truly lost.
3-Layer Compression
Read-Time summaries, Write-Time caching, Reference-Time FTS5. Lead context stays below 50% through the full pipeline.
Profile-Based Permissions
Cross-model invocations use role-specific profiles: council and verify are read-only. No more --full-auto everywhere.
Graceful Degradation
Cross-model unavailable? Pipeline continues Claude-only. Simplifier breaks tests? Revert and proceed. Never blocks.

Agent Roster

9 specialized agents, each with a clear role and strict boundaries. From brainstorming to production-ready code.

graph TB
  User["👤 User"]
  Lead["🎯 Lead<br/>Opus · Orchestrator"]
  Consultant["💬 Consultant<br/>Sonnet · Persistent"]
  Architect["📐 Architect<br/>Opus · Council"]
  Executor["⚙️ Executor<br/>Opus · Backend"]
  Frontend["🎨 Frontend<br/>Opus · UI"]
  Tester["🧪 Tester<br/>Opus · Blind"]
  Verifier["✅ Verifier<br/>Opus · Blind"]
  Simplifier["🧹 Simplifier<br/>Opus · Cleanup"]
  Writer["📝 Writer<br/>Sonnet · Docs"]
  BMB[".bmb/<br/>Handoffs · Artifacts"]
  CrossModel["🌐 Cross-Model<br/>Codex / Gemini"]

  User <-->|"brainstorm<br/>approve"| Lead
  Lead <-->|"SendMessage<br/>bidirectional"| Consultant
  Lead -->|"briefing"| Architect
  Architect <-->|"council<br/>debate"| CrossModel
  Lead -->|"plan-to-exec"| Executor
  Lead -->|"plan-to-exec"| Frontend
  Lead -->|"test request"| Tester
  Tester -.->|"blind wall"| CrossModel
  Lead -->|"verify request"| Verifier
  Verifier -.->|"blind wall"| CrossModel
  Lead -->|"simplify"| Simplifier
  Lead -->|"write docs"| Writer
  Architect -->|"writes"| BMB
  Executor -->|"writes"| BMB
  Frontend -->|"writes"| BMB
  Tester -->|"writes"| BMB
  Verifier -->|"writes"| BMB
  Lead -->|"reads only"| BMB

  classDef opus fill:#1a1030,stroke:#7c3aed,stroke-width:2px,color:#a78bfa
  classDef sonnet fill:#0a2015,stroke:#16a34a,stroke-width:2px,color:#22c55e
  classDef cross fill:#1a1500,stroke:#d97706,stroke-width:2px,color:#f59e0b
  classDef user fill:#0a1628,stroke:#3b82f6,stroke-width:2px,color:#60a5fa
  classDef storage fill:#111827,stroke:#1e3a5f,stroke-width:2px,color:#8494a7
  class Lead,Architect,Executor,Frontend,Tester,Verifier,Simplifier opus
  class Consultant,Writer sonnet
  class CrossModel cross
  class User user
  class BMB storage
Lead
Claude Opus • Full Pipeline
Orchestration, decisions, relay, brainstorming. Reads only .bmb/. Never writes code. The thin conductor.
orchestrator · brainstorms in-process
Consultant
Claude Sonnet • Step 2–11
Coordinator identity: full situational awareness, zero command authority. Dual-channel (feed file + SendMessage). Receives lifecycle events during blind phase; gets full post-briefing after reconciliation.
dual-channel · persistent · post-briefing
Architect
Claude Opus • Step 4
Design + cross-model council debate. 2–4 rounds of Claude vs Codex/Gemini. Writes plan-to-exec.md. Queries Context7 for live library docs before designing.
council debate · cross-model · Context7
Executor
Claude Opus • Step 5
Backend implementation. Works in isolated git worktree. Queries Context7 for current library docs before writing. Commits only within worktree scope.
worktree-isolated · Context7
Frontend
Claude Opus • Step 5
React/Next.js + shadcn/Tailwind specialist. Separate worktree from Executor. Queries Context7 for current framework docs. Spawned only if frontend scope detected.
worktree-isolated · conditional · Context7
Tester
Claude Opus • Step 6
Unit, integration, edge-case tests. Part of blind cross-model testing with divergent framing.
blind · worktree-isolated
Verifier
Claude Opus • Step 7
Evidence-based verification + code review in one agent. Blind cross-model verification.
blind · review + verify
Simplifier
Claude Opus • Step 9
Post-work code cleanup. Must re-verify (build + tests) after changes. Failure triggers auto-revert.
re-verify
Writer
Claude Sonnet • Step 10
Documentation update + cross-validation. Sonnet is sufficient for docs.
Analyst
Claude Sonnet • Step 10.5
Retrospective analysis: queries analytics.db, classifies events by Bird’s Law severity (critical/warn/info), surfaces pattern_counts promotion candidates.
bypassPermissions · read-only

The 11.5-Step Pipeline

From user intent to production-ready code.

1. Setup
Lead
tmux guard, session ID, directory structure, source bmb-learn.sh, load past MISTAKE entries (local + global), config, session-prep check, conversation logger start.
auto-learning · session continuity
2. Brainstorm + Consultant
Lead · Consultant
Lead brainstorms directly with user (no separate brainstormer agent). Spawn persistent Consultant pane with bidirectional SendMessage. Minimum 2 rounds. Write briefing with Known Pitfalls section.
in-process · bidirectional
3. User Approval
Lead
Present compressed briefing. YES → bmb_learn PRAISE. MODIFY → bmb_learn CORRECTION + update. NO → cancel.
auto-learning · 3-way branch
4. Architecture Council
Architect · Cross-Model
Create git worktrees for execution. Spawn Architect for Claude-Codex/Gemini council debate (2–4 rounds). Write plan-to-exec.md. Skip for bugfix/infra recipes.
council debate · cross-model · skip: bugfix, infra
5. Execution
Executor · Frontend
Spawn Executor + Frontend (conditional) in separate worktrees. Parallel execution with zero git conflicts. Frontend only if scope detected (React, Vue, Svelte, etc.).
worktree-isolated · parallel · conditional frontend
5.5 Merge Worktrees
Lead
Commit in worktree → merge to main → remove worktree. Conflict? bmb_learn MISTAKE + escalate to user.
auto-learning
6. Cross-Model Testing (Blind)
Tester · Cross-Model
Claude Tester reads plan-to-exec.md. Cross-Model Tester reads briefing.md. Different framing, separate worktrees, separate timeouts. Neither reads the other's results.
blind wall · divergent framing · worktree-isolated
7. Cross-Model Verification (Blind)
Verifier · Cross-Model
Same blind divergent pattern as Step 6. Consultant isolation: no results shared until reconciliation.
blind wall · divergent framing
8. Reconciliation
Lead
Read structured summaries. 5-category failure classifier: IMPL→Step 5, ARCH→Step 4, REQ→Step 2, ENV→Step 1, TEST→Step 6. FAIL triggers bmb_learn MISTAKE + classified loop-back.
auto-learning · failure classification
9. Simplification + Re-verify
Simplifier
Minimal safe improvements. Build + tests must pass (re-verification). Failure → bmb_learn MISTAKE + revert + proceed with original.
re-verify · auto-learning
10. Docs Update
Writer
Writer updates documentation, removes dead references, cross-validates consistency across all modified files.
10.5 Retrospective Analysis
Analyst
Analyst queries analytics.db for the current session. Classifies events by Bird’s Law severity (critical / warn / info). Cross-references pattern_counts to find recurring failures (≥2 occurrences) eligible for CLAUDE.md promotion. Writes analyst-report.md (3–5 min timeout; pipeline continues on timeout).
Bird’s Law · pattern_counts · read-only agent
11. Cleanup + Session Prep
Lead
bmb_learn PRAISE on success. Check recurrence ≥2 → propose CLAUDE.md promotion. Git commit/push. FTS5 indexing. Generate session-prep.md for next session.
auto-learning · session continuity · CLAUDE.md promotion

Pipeline Flow Diagram

flowchart TD
  S1["1. Setup<br/>tmux, session, learnings"]
  S2["2. Brainstorm + Consultant<br/>min 2 rounds"]
  S3{"3. User Approval"}
  S4["4. Architecture Council<br/>2-4 debate rounds"]
  S5["5. Execution<br/>worktree-isolated"]
  S55["5.5 Merge Worktrees"]
  S6["6. Blind Testing<br/>divergent framing"]
  S7["7. Blind Verification<br/>divergent framing"]
  S8{"8. Reconciliation"}
  S9["9. Simplify + Re-verify"]
  S10["10. Docs Update"]
  S105["10.5 Retrospective Analysis<br/>Bird's Law severity"]
  S11["11. Cleanup + Session Prep"]

  S1 --> S2
  S2 --> S3
  S3 -->|"YES"| S4
  S3 -->|"MODIFY"| S2
  S3 -->|"NO"| CANCEL["Cancel"]
  S4 -->|"skip: bugfix/infra"| S5
  S4 --> S5

  subgraph parallel ["Parallel Worktrees"]
    direction LR
    EX["Executor"]
    FE["Frontend<br/>if detected"]
  end

  S5 --> parallel
  parallel --> S55
  S55 --> S6
  S6 --> S7
  S7 --> S8
  S8 -->|"PASS"| S9
  S8 -->|"IMPL fail"| S5
  S8 -->|"ARCH fail"| S4
  S8 -->|"REQ fail"| S2
  S8 -->|"ENV fail"| S1
  S8 -->|"TEST fail"| S6
  S9 --> S10
  S10 --> S105
  S105 --> S11

  classDef decision fill:#1a1500,stroke:#d97706,color:#f59e0b
  classDef cancel fill:#2a0a0a,stroke:#ef4444,color:#ef4444
  classDef step fill:#111827,stroke:#1e3a5f,color:#e8edf5
  classDef parallel fill:#0a1628,stroke:#3b82f6,color:#60a5fa
  classDef analyst fill:#0a2010,stroke:#22c55e,color:#4ade80
  class S3,S8 decision
  class CANCEL cancel
  class S1,S2,S4,S5,S55,S6,S7,S9,S10,S11 step
  class EX,FE parallel
  class S105 analyst

Handoff Data Flow

How artifacts flow between agents through the .bmb/ directory. Lead never touches code — only reads summaries.

flowchart LR
  User["👤 User"]
  Lead["🎯 Lead"]
  Brief["📋 briefing.md"]
  Arch["📐 Architect"]
  Plan["📄 plan-to-exec.md"]
  ExFe["⚙️ Executor<br/>🎨 Frontend"]
  Merge["🔀 Merge"]
  Test["🧪 Tester<br/>blind"]
  Verify["✅ Verifier<br/>blind"]
  Recon["⚖️ Reconcile"]
  Simp["🧹 Simplifier"]
  Write["📝 Writer"]
  Output["✨ Output"]

  User -->|"intent"| Lead
  Lead -->|"brainstorm"| Brief
  Brief -->|"briefing"| Arch
  Arch -->|"council"| Plan
  Plan -->|"instructions"| ExFe
  ExFe -->|"worktrees"| Merge
  Merge -->|"merged code"| Test
  Merge -->|"merged code"| Verify
  Test -->|"test-summary"| Recon
  Verify -->|"verify-summary"| Recon
  Recon -->|"PASS"| Simp
  Simp -->|"cleaned"| Write
  Write --> Output

  classDef artifact fill:#1a2234,stroke:#3b82f6,color:#60a5fa
  classDef agent fill:#111827,stroke:#1e3a5f,color:#e8edf5
  class Brief,Plan artifact
  class User,Lead,Arch,ExFe,Merge,Test,Verify,Recon,Simp,Write,Output agent

Consultant Feed Timeline

gantt
  title Consultant Monitoring (Steps 2–11)
  dateFormat X
  axisFormat %s

  section Brainstorm
    Bidirectional with Lead          :active, 0, 2

  section Approval
    Monitor user decision            :1, 3

  section Council
    Observe debate rounds            :2, 4

  section Execution
    Track progress via feed file     :3, 6

  section Testing
    Monitor blind test results       :5, 7

  section Verification
    Monitor blind verify results     :6, 8

  section Reconciliation
    Observe failure classification   :7, 9

  section Simplify
    Track re-verify outcome          :8, 10

  section Docs
    Validate doc consistency         :9, 11

  section Cleanup
    Final session summary            :10, 12
      

Blind Divergent Protocol

Not just blind on results — divergent on problem framing. Workspace-level blind cross-model verification.

Claude Track

Reads: plan-to-exec.md + diff
Perspective: Implementation fidelity
Worktree: tester-claude / verifier-claude
Timeout: claude_agent (1200s)

NEVER reads: *-cross.md
BLIND WALL

Cross-Model Track

Reads: briefing.md + diff
Perspective: User intent fidelity
Worktree: tester-cross / verifier-cross
Timeout: cross_model (3600s)

NEVER reads: *-claude.md
Why divergent framing matters: Claude sees the detailed execution plan and checks "did we build what we planned?" Cross-model sees the original user briefing and checks "did we build what the user actually wanted?" This catches plan-faithful but user-unfaithful implementations.
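
The two NEVER-reads rules can be pictured as a guard that is consulted before a track opens a handoff file. This is a rough sketch under stated assumptions: the function name `may_read` and the track labels `claude` and `cross` are hypothetical, and the text does not say how BMB actually enforces the wall.

```shell
#!/bin/sh
# Hypothetical guard: may a given track read a given handoff file?
# Returns 0 when allowed, 1 when the blind wall blocks the read.
may_read() {
  track=$1
  file=$(basename "$2")
  case "$track:$file" in
    claude:*-cross.md) return 1 ;;  # Claude track never reads *-cross.md
    cross:*-claude.md) return 1 ;;  # cross-model track never reads *-claude.md
    *)                 return 0 ;;  # everything else is left unrestricted here
  esac
}
```

For example, `may_read claude .bmb/handoffs/test-cross.md` fails, while `may_read cross .bmb/handoffs/briefing.md` succeeds. The sketch encodes only the two explicit prohibitions; which documents each track is given in the first place is decided elsewhere.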

Worktree Lifecycle

Every parallel agent gets its own git worktree. No shared state, no index.lock conflicts, true filesystem isolation.

gantt
  title Worktree Lifecycle per Pipeline Run
  dateFormat X
  axisFormat Step %s

  section Executor
    Create worktree     :e1, 4, 5
    Work in worktree    :e2, 5, 6
    Merge to main       :crit, e3, 6, 7

  section Frontend
    Create worktree     :f1, 4, 5
    Work in worktree    :f2, 5, 6
    Merge to main       :crit, f3, 6, 7

  section Tester-Claude
    Create worktree     :tc1, 6, 7
    Run tests           :tc2, 7, 8
    Cleanup             :tc3, 8, 9

  section Tester-Cross
    Create worktree     :tx1, 6, 7
    Run tests           :tx2, 7, 8
    Cleanup             :tx3, 8, 9

  section Verifier-Claude
    Create worktree     :vc1, 7, 8
    Verify              :vc2, 8, 9
    Cleanup             :vc3, 9, 10

  section Verifier-Cross
    Create worktree     :vx1, 7, 8
    Verify              :vx2, 8, 9
    Cleanup             :vx3, 9, 10
      
Step 4: Create executor + frontend worktrees from HEAD
  git worktree add .bmb/worktrees/executor bmb-executor-{SESSION}
  git worktree add .bmb/worktrees/frontend bmb-frontend-{SESSION}
Step 5: Agents work in isolated worktrees (zero conflict possible)
Step 5.5: Merge worktrees → main
  commit in worktree → merge --no-edit → remove worktree
  Conflict? → bmb_learn MISTAKE + escalate to user
Step 6: Create 2 tester worktrees from merged HEAD
  tester-claude + tester-cross
Step 7: Create 2 verifier worktrees from merged HEAD
  verifier-claude + verifier-cross
Step 8+: Remove all remaining worktrees
  git worktree list | grep '.bmb/worktrees' | awk '{print $1}' | xargs -I{} git worktree remove {}
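
For the final cleanup step, a slightly more defensive variant reads the machine-readable output of `git worktree list --porcelain`, where each worktree appears on a line of the form `worktree <path>`. The sketch below is an assumption about how one might do this, not BMB's own script; the function name `bmb_worktree_paths` is hypothetical.

```shell
#!/bin/sh
# Hypothetical filter: read `git worktree list --porcelain` on stdin and
# print only the paths that live under a .bmb/worktrees/ directory.
bmb_worktree_paths() {
  sed -n 's/^worktree //p' | grep '/\.bmb/worktrees/'
}

# Intended use (not executed here):
#   git worktree list --porcelain | bmb_worktree_paths |
#     while IFS= read -r path; do git worktree remove "$path"; done
```

Reading one path per line keeps paths that contain spaces intact, and the main checkout is skipped because its path does not contain `.bmb/worktrees/`.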

Cross-Model Protocols

Council debate uses Claude ↔ Codex/Gemini file exchange. Profile-based permissions keep read-only where needed.

sequenceDiagram
  participant A as Architect<br/>(Claude Opus)
  participant F as Council Files<br/>(.bmb/council/)
  participant X as Cross-Model<br/>(Codex / Gemini)

  Note over A,X: Round 1 - Initial Proposals
  A->>F: Write claude-proposal.md
  X->>F: Write cross-proposal.md

  Note over A,X: Round 2 - Critique
  A->>F: Read cross-proposal.md
  A->>F: Write claude-critique.md
  X->>F: Read claude-proposal.md
  X->>F: Write cross-critique.md

  Note over A,X: Round 3 - Synthesis (optional)
  A->>F: Read cross-critique.md
  A->>F: Write claude-synthesis.md
  X->>F: Read claude-critique.md
  X->>F: Write cross-synthesis.md

  Note over A,X: Round 4 - Final Decision
  A->>F: Read all files
  A->>F: Write plan-to-exec.md ✅

  Note right of X: Cross-Model uses<br/>--profile read-only
Read-Only Profiles
council and verify profiles: cross-model can read code and write to .bmb/ only. No production writes.
Write Profiles
test and exec-assist profiles: cross-model can write tests and helper code within worktree scope.
Per-Track Timeouts
Claude: 1200s default. Cross-model: 3600s default. Configurable via bmb-config.sh. Independent deadline tracking.
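
Independent deadline tracking can be pictured as follows. The defaults of 1200 and 3600 seconds come from the text; the environment variable names `BMB_TIMEOUT_CLAUDE` and `BMB_TIMEOUT_CROSS` and both function names are assumptions made for this sketch, since the actual keys in bmb-config.sh are not shown.

```shell
#!/bin/sh
# Hypothetical lookup of the per-track timeout in seconds.
# The override variable names are assumed, not taken from bmb-config.sh.
track_timeout() {
  case "$1" in
    claude) echo "${BMB_TIMEOUT_CLAUDE:-1200}" ;;
    cross)  echo "${BMB_TIMEOUT_CROSS:-3600}" ;;
    *)      echo "unknown track: $1" >&2; return 1 ;;
  esac
}

# Each track gets its own deadline, computed from its own start time.
# $1 = track, $2 = start time in epoch seconds
deadline_for() {
  echo $(( $2 + $(track_timeout "$1") ))
}
```

With the defaults, a cross-model track started at epoch second 1000 has a deadline of 4600, while a Claude track started at the same moment expires at 2200, so one track timing out does not end the other.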

3-Layer Context Compression

Lead's context window is a shared resource. Three layers ensure it never exceeds 50% capacity.

L1
Read-Time Compression
Lead reads .compressed/*.summary.md only (max 300 tokens). Raw artifacts accessed only during disputes.
L2
Write-Time Caching
Agents cache tool output >50 lines to .tool-cache/. Only summaries kept: "Modified: auth.ts (47 lines)", "PASS: 12, FAIL: 0".
L3
Reference-Time Indexing
FTS5 database indexes past council decisions and handoffs. Queried before starting new work to reuse past decisions.
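
The L2 rule (cache anything over 50 lines, keep only a summary) might look roughly like the function below. It is a sketch under assumptions: the name `cache_tool_output`, its arguments, and the one-line summary format are invented for illustration, and real agents would likely emit richer summaries such as the "PASS: 12, FAIL: 0" example.

```shell
#!/bin/sh
# Hypothetical L2 helper: read tool output on stdin. If it is longer than
# 50 lines, keep the raw text in the cache directory and print a one-line
# pointer; otherwise print the output unchanged and discard the temp file.
# $1 = label used in the cache file name, $2 = cache dir (assumed default)
cache_tool_output() {
  label=$1
  cache_dir=${2:-.bmb/.tool-cache}
  mkdir -p "$cache_dir"
  tmp=$(mktemp "$cache_dir/$label.XXXXXX")
  cat > "$tmp"
  lines=$(wc -l < "$tmp" | tr -d ' ')
  if [ "$lines" -gt 50 ]; then
    echo "cached: $tmp ($lines lines)"
  else
    cat "$tmp"
    rm -f "$tmp"
  fi
}
```

A call such as `npm test 2>&1 | cache_tool_output tests` would then place at most a single line in the agent's context whenever the output is long, while the full log stays on disk for disputes.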

Auto-Learning System

Mistakes, corrections, and successes automatically recorded. Past pitfalls injected into future sessions across all projects.

T1
Project-Local
.bmb/learnings.md — One line per learning, chronological append. Loaded at Step 1 for this project.
T2
Global Cross-Project
~/.claude/bmb-system/learnings-global.md — Same format + [project_name] tag. Shared across all BMB projects.
T3
CLAUDE.md Promotion
Recurrence ≥2 → propose to user → permanent rule. Never auto-edits. User always approves.
# Example learnings.md entries
[2026-03-10 14:32] MISTAKE (step 8): Missing input validation → Always validate at API boundary
[2026-03-10 15:01] CORRECTION (step 3): User changed auth to OAuth → Confirm auth strategy in brainstorm
[2026-03-10 16:45] PRAISE (step 11): Pipeline completed successfully → Current approach works
# Context cost: ~150 tokens (5 lines × ~30 tokens). Negligible.
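
The entry format and the "recurrence ≥2" promotion check can be sketched with two small functions. These are illustrative stand-ins, not the real bmb-learn.sh: the names `format_learning` and `promotion_candidates` are hypothetical, and matching on the lesson text after the arrow is a simplification of whatever recurrence tracking BMB actually uses.

```shell
#!/bin/sh
# Hypothetical formatter for one learnings.md line.
# $1 = timestamp, $2 = MISTAKE|CORRECTION|PRAISE, $3 = step,
# $4 = what happened, $5 = lesson
format_learning() {
  printf '[%s] %s (step %s): %s → %s\n' "$1" "$2" "$3" "$4" "$5"
}

# Print each lesson that appears at least twice in the given learnings
# file; these would be the candidates to propose for CLAUDE.md.
promotion_candidates() {
  sed -n 's/.* → //p' "$1" | sort | uniq -d
}
```

Appending is then a matter of `format_learning "$(date '+%Y-%m-%d %H:%M')" MISTAKE 8 "..." "..." >> .bmb/learnings.md`. The candidate list is only a proposal; as stated above, the user always approves before anything is written to CLAUDE.md.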

Recipe Matrix

Six task types, each with its own optimized pipeline path. Consultant is always present regardless of recipe.

Feature
Brainstorm → Architect → Exec+Frontend → Test → Verify → Simplify → Write
council · cross: test+verify · frontend: if detected
🔧 Bugfix
Brainstorm → Exec → Test → Verify → Write
no council · cross: test+verify · no frontend
♻️ Refactor
Brainstorm → Architect → Exec+Frontend → Verify → Simplify → Write
council · cross: verify only · frontend: if detected
🔍 Research
Brainstorm only
council: optional · no cross-model · no frontend
🔎 Review
Brainstorm → Verifier (review mode)
no council · cross: verify only · no frontend
⚙️ Infra
Brainstorm → Exec → Verify → Write
no council · cross: verify only · no frontend
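
The matrix can be read as a lookup from recipe name to an ordered list of stages. The following sketch shows one way to encode it; the function names and lowercase recipe keys are assumptions, and the optional council for Research is deliberately not modeled.

```shell
#!/bin/sh
# Hypothetical lookup: print the ordered stages for a recipe,
# as listed in the recipe matrix.
recipe_steps() {
  case "$1" in
    feature)  echo "Brainstorm Architect Exec+Frontend Test Verify Simplify Write" ;;
    bugfix)   echo "Brainstorm Exec Test Verify Write" ;;
    refactor) echo "Brainstorm Architect Exec+Frontend Verify Simplify Write" ;;
    research) echo "Brainstorm" ;;
    review)   echo "Brainstorm Verifier" ;;
    infra)    echo "Brainstorm Exec Verify Write" ;;
    *)        echo "unknown recipe: $1" >&2; return 1 ;;
  esac
}

# Succeeds when the recipe always runs the Architect council stage.
uses_council() {
  case " $(recipe_steps "$1" 2>/dev/null) " in
    *" Architect "*) return 0 ;;
    *)               return 1 ;;
  esac
}
```

An orchestrator loop could then iterate with `for stage in $(recipe_steps bugfix); do ...; done` and skip Step 4 whenever `uses_council` fails, which matches the "skip: bugfix, infra" note in the pipeline.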

Skill Commands

4 slash commands expose BMB capabilities at different scales — from full pipeline to focused brainstorming.

/BMB
Full Pipeline
The complete 11.5-step A-to-Z pipeline. Cross-model council, blind verification, analytics, simplification, and session continuity. Use for any non-trivial feature or bug fix.
11.5 steps · cross-model · worktree · blind verify
/BMB-brainstorm
Ideation
Lead + Consultant bidirectional brainstorming with conversation logging. Explores intent, requirements, and design before any code is written.
5 phases · consultant · no code
/BMB-refactoring
Code Quality
Parallel analysis with cross-model review, worktree-isolated execution, review cycle, and merge. Focused on improving existing code without feature changes.
6 phases · cross-model · worktree
/BMB-setup
Configuration
One-time project setup: prerequisites check, config generation, gitignore rules, and confirmation. Run once per project before first pipeline use.
5 steps · prerequisite

Skill Relationships

flowchart LR
    Setup["/BMB-setup\n⚙️ Config"]
    BMB["/BMB\n🔧 Full Pipeline"]
    Brainstorm["/BMB-brainstorm\n💡 Ideation"]
    Refactoring["/BMB-refactoring\n🔄 Code Quality"]

    Setup -->|"prerequisite"| BMB
    Setup -->|"prerequisite"| Brainstorm
    Setup -->|"prerequisite"| Refactoring
    Brainstorm -.->|"feeds into"| BMB
    Refactoring -.->|"standalone"| BMB

    style Setup fill:#111827,stroke:#22c55e,color:#e8edf5
    style BMB fill:#111827,stroke:#3b82f6,color:#e8edf5,stroke-width:3px
    style Brainstorm fill:#111827,stroke:#22d3ee,color:#e8edf5
    style Refactoring fill:#111827,stroke:#a78bfa,color:#e8edf5
      

Step Coverage Comparison

Phase /BMB /BMB-brainstorm /BMB-refactoring
Setup / Config
Consultant Session
Brainstorm / Analysis ✓ Parallel
Council Debate ✓ Synthesis
Architecture Plan
Execution (Worktree)
Testing
Verification (Blind) ✓ Review
Fix Cycle
Simplification
Merge / Cleanup ✓ Summary

Graceful Degradation

The pipeline never blocks. Every failure mode has a defined fallback.

| Scenario | Behavior |
|---|---|
| Cross-model unavailable (council) | Solo design (Claude only), noted in session log |
| Cross-model unavailable (testing) | Claude-only test results, noted in reconciliation |
| Cross-model unavailable (verification) | Claude-only verification, noted in reconciliation |
| Claude tester timeout | Log timeout, continue with cross-model results |
| Cross-model timeout | Proceed with Claude-only results |
| Merge conflict | bmb_learn MISTAKE + escalate to user |
| Simplifier breaks tests | bmb_learn MISTAKE + revert + proceed with original |
| Telegram env unset | Skip notifications silently |
| knowledge.db missing | Skip indexing/search |
| Frontend not detected | Skip Frontend agent, Executor only |
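
The "cross-model unavailable" rows amount to probing for a usable CLI and falling back when none is found. A minimal sketch follows, assuming the probe is a simple PATH lookup; the function name `pick_cross_model` and the `claude-only` marker are hypothetical, and the text does not say which binaries BMB actually checks, so the candidates are passed in by the caller.

```shell
#!/bin/sh
# Hypothetical availability probe: print the first candidate command found
# on PATH, or "claude-only" when none of the candidates is installed.
pick_cross_model() {
  for cli in "$@"; do
    if command -v "$cli" >/dev/null 2>&1; then
      echo "$cli"
      return 0
    fi
  done
  echo "claude-only"
}
```

A caller might write `mode=$(pick_cross_model codex gemini)` and, when the result is `claude-only`, note the fallback in the session log and continue, so the pipeline never blocks on a missing tool.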

Directory Layout

The .bmb/ directory is the single source of truth for all pipeline artifacts.

.bmb/
├── handoffs/                  ── Agent-to-agent artifacts
│   ├── briefing.md            ── Lead's brainstorm output
│   ├── plan-to-exec.md        ── Architect's execution plan
│   ├── test-claude.md         ── Claude tester results
│   ├── test-cross.md          ── Cross-model tester results
│   ├── verify-claude.md       ── Claude verifier results
│   ├── verify-cross.md        ── Cross-model verifier results
│   └── .compressed/           ── L1 summaries for Lead
│       ├── briefing.summary.md
│       ├── test-result-claude.summary.md
│       └── verify-result-claude.summary.md
├── councils/                  ── Cross-model debate files
│   ├── LEGEND.md              ── Index of all past debates
│   └── {topic}/
│       ├── round-01-claude.md
│       ├── round-01-cross.md
│       └── CONSENSUS.md
├── worktrees/                 ── Git worktree mount points
│   ├── executor/
│   ├── frontend/
│   ├── tester-claude/
│   ├── tester-cross/
│   ├── verifier-claude/
│   └── verifier-cross/
├── sessions/                  ── Per-session state
│   └── {session_id}/
│       ├── session-prep.md    ── Next-session continuity
│       └── conversation.log   ── Consultant feed
├── .tool-cache/               ── L2 write-time cache
├── config.json                ── Project configuration
├── learnings.md               ── T1 project-local learnings
├── session-log.md             ── Current session event log
├── consultant-feed.md         ── Lead → Consultant feed
└── knowledge.db               ── L3 FTS5 reference index

~/.claude/bmb-system/          ── Global BMB installation
├── scripts/
│   ├── cross-model-run.sh     ── Cross-model invocation
│   ├── bmb-learn.sh           ── Shared learning functions
│   ├── knowledge-index.sh     ── FTS5 indexer
│   └── knowledge-search.sh    ── FTS5 search
├── config/
│   └── defaults.json          ── Default configuration
└── learnings-global.md        ── T2 cross-project learnings