Division Swarm

The OS for AI in production. Architected for control, safety, and scale.

Division Swarm runs long-running AI systems at fleet scale. Composable flows, per-entity state, forkable and audited end to end. Safe to deploy, engineered to stay under your control.

Why Division Swarm

Six things production AI actually needs.

  • 01

    Control

    You shape the flow, not the LLM. Strict gates validate every transition. Hallucinations stop at the gate.

  • 02

    Safety

    Cost ceilings per entity. Schema-validated events. Tenant isolation. Static checks before deploy.

  • 03

    Scale

    Thousands of concurrent agent teams. Each with its own durable state, surviving crashes and restarts.

  • 04

    Composability

    Flows nest into systems of any depth. Sub-flows materialize per entity on demand.

  • 05

    Observability

    Every event, every decision, every dollar traced end to end.

  • 06

    Determinism

    Replay any past timeline exactly. Test contract changes against real production data.

The model

The LLM never runs the system. Agents reason. Code decides. State survives.

Work is an entity moving through a state machine you declared: named states, guarded transitions, gates that must clear before the next step. Every transition (data writes, emitted events, state change) commits in one atomic transaction.

Agents reason inside scoped sessions and emit events. Deterministic code routes them. Nothing happens because an LLM said so. The runtime decides what each agent result actually changes.

State survives crashes, restarts, and weeks of waiting on a timer or a human. Hundreds of entities advance concurrently in isolated workspaces. Any run can be replayed turn by turn or forked from the log.
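The atomic-commit idea above can be sketched with Python's built-in sqlite3 module. This is an illustration only: the table layout, the `ALLOWED` transition set, and the helpers `open_store`, `create_entity`, `apply_transition`, and `get_entity` are hypothetical names, not Division Swarm's schema or API.

```python
import json
import sqlite3

# Hypothetical transition table: (from_state, to_state) pairs a contract allows.
ALLOWED = {("validating", "processing"), ("processing", "closed")}


def open_store() -> sqlite3.Connection:
    """Create an in-memory store with one table for entities and one for events."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE entities (
            id TEXT PRIMARY KEY,
            state TEXT NOT NULL,
            data TEXT NOT NULL
        );
        CREATE TABLE events (
            seq INTEGER PRIMARY KEY AUTOINCREMENT,
            entity_id TEXT NOT NULL,
            name TEXT NOT NULL
        );
        """
    )
    return conn


def create_entity(conn, entity_id, state):
    """Insert a new entity with empty data."""
    with conn:
        conn.execute(
            "INSERT INTO entities (id, state, data) VALUES (?, ?, ?)",
            (entity_id, state, "{}"),
        )


def get_entity(conn, entity_id):
    """Return (state, data) for an entity."""
    row = conn.execute(
        "SELECT state, data FROM entities WHERE id = ?", (entity_id,)
    ).fetchone()
    if row is None:
        raise KeyError(f"unknown entity: {entity_id}")
    return row[0], json.loads(row[1])


def apply_transition(conn, entity_id, to_state, writes, emit):
    """Commit data writes, the state change, and the emitted event together."""
    # `with conn` commits if the block succeeds and rolls back if it raises,
    # so a failure at any step leaves the entity exactly as it was.
    with conn:
        from_state, data = get_entity(conn, entity_id)
        if (from_state, to_state) not in ALLOWED:
            raise ValueError(f"transition {from_state} -> {to_state} is not declared")
        data.update(writes)
        conn.execute(
            "UPDATE entities SET state = ?, data = ? WHERE id = ?",
            (to_state, json.dumps(data), entity_id),
        )
        conn.execute(
            "INSERT INTO events (entity_id, name) VALUES (?, ?)",
            (entity_id, emit),
        )
```

If the event insert fails after the entity row has been updated, the update is rolled back with it, so a reader of the store never sees a state change without its event.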

A handler declaration

contracts/ticket.yaml
ticket-orchestrator:
  event_handlers:
    ticket.validated:
      guard:
        check: "entity.priority in ['high', 'urgent']"
        on_fail: discard
      data_accumulation:
        writes: [order_summary, validation_context]
      sets_gate: g1_validation
      advances_to: processing
      emit: work.assigned
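
One way to read the declaration above is as data that deterministic code interprets. The Python sketch below is a hypothetical interpreter for a handler of that shape, not the runtime's implementation. The `Entity` class, the `handle` function, and the starting state `validating` are assumptions, and the guard is written as a Python predicate because the source does not specify the expression language used in `check`.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    priority: str
    state: str = "validating"  # assumed starting state, for illustration
    data: dict = field(default_factory=dict)
    gates: set = field(default_factory=set)


# Mirrors the fields of the ticket.validated handler shown above.
HANDLER = {
    "guard": lambda e: e.priority in ("high", "urgent"),
    "on_fail": "discard",
    "writes": ["order_summary", "validation_context"],
    "sets_gate": "g1_validation",
    "advances_to": "processing",
    "emit": "work.assigned",
}


def handle(entity, payload, handler=HANDLER):
    """Apply one handler to one event; return the list of events to emit."""
    if not handler["guard"](entity):
        if handler["on_fail"] == "discard":
            return []  # the event is dropped and the entity is left untouched
        raise NotImplementedError(handler["on_fail"])
    # Only fields the handler declares may be written to the entity.
    for key in handler["writes"]:
        if key in payload:
            entity.data[key] = payload[key]
    entity.gates.add(handler["sets_gate"])
    entity.state = handler["advances_to"]
    return [handler["emit"]]
```

In this sketch an agent's payload can carry any fields it likes, but only the declared writes reach the entity, and a failed guard changes nothing.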

Runtime architecture

[Diagram. Labels: events; contracts (flows.yaml); boot verification (static analyzer); engine (orchestrator); Postgres event store (durable truth); scoped sessions (three agents, each in a sandbox).]

Composition

Compose small flows into massive AI systems.

A flow is a complete, importable package: state machine, system nodes, agents, tools, policy. Its boundary is declared by typed input and output pins, so composition is a declaration, not a refactor.

A parent flow imports children and wires their pins. The static analyzer verifies every connection at boot. Hundreds of flows run as one coordinated system on a single deployment, with the same durable persistence, isolation, and audit trail as a single flow.

Build a flow once; reuse it across teams, projects, and systems. Composition is how a small contract becomes a platform without becoming a monolith.
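
To make the pin-wiring idea concrete, here is a minimal Python sketch of the kind of check a static analyzer could run over declared connections. The `Flow` class, the `check_wiring` function, and the example pin and type names are hypothetical; the actual analyzer's rules are defined by the platform specification, not by this code.

```python
from dataclasses import dataclass, field


@dataclass
class Flow:
    name: str
    inputs: dict = field(default_factory=dict)   # pin name -> type name
    outputs: dict = field(default_factory=dict)  # pin name -> type name


def check_wiring(flows, wires):
    """Validate wires given as (src_flow, out_pin, dst_flow, in_pin) tuples.

    Returns a list of human-readable errors; an empty list means every
    connection refers to existing pins with matching types.
    """
    errors = []
    for src, out_pin, dst, in_pin in wires:
        if src not in flows or dst not in flows:
            errors.append(f"{src}.{out_pin} -> {dst}.{in_pin}: unknown flow")
            continue
        out_type = flows[src].outputs.get(out_pin)
        in_type = flows[dst].inputs.get(in_pin)
        if out_type is None:
            errors.append(f"{src} has no output pin '{out_pin}'")
        elif in_type is None:
            errors.append(f"{dst} has no input pin '{in_pin}'")
        elif out_type != in_type:
            errors.append(
                f"{src}.{out_pin} ({out_type}) does not match {dst}.{in_pin} ({in_type})"
            )
    return errors
```

Because the check reads only declarations, it can run before any flow executes, which is what lets a mismatch fail at boot rather than in production.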

[Diagram: a package with 1 parent flow and 3 child flows.]

Rewind anything

git bisect for agent runs.

Because every event and every state change is persisted, any run can be rewound to any point in its history. Change one thing (a model, a prompt, a policy value, a tool's response) and re-execute the counterfactual. Compare it against the original, turn by turn.

The cost of iterating on a system drops to the cost of forking a run. Debugging stops being archaeology. And when a regulator or an auditor asks what happened, you don't show them logs; you show them the run.

run #4f2a · original      | run #4f2a · forked
--------------------------+--------------------------
ingest_ticket             | ingest_ticket
classify_intent           | classify_intent
retrieve_context          | retrieve_context
fork point: prompt: triage@v1 → triage@v2
draft_reply · cautious    | draft_reply · decisive
request_approval          | auto_resolve
escalated to human        | closed

// one variable changed at the fork point: every downstream turn re-executed
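
The fork shown above can be sketched in a few lines of Python. This is an illustration under strong assumptions: the steps are deterministic stubs standing in for agent turns, and `PIPELINE`, `execute`, `fork`, and `diff` are hypothetical names rather than Division Swarm's API. The step names follow the diagram, except `resolve`, which is an invented label for the turn that yields `request_approval` or `auto_resolve`.

```python
def draft_reply(config, ctx):
    # Stub: the prompt version alone decides the tone of the draft.
    return "decisive" if config["prompt"] == "triage@v2" else "cautious"


def resolve(config, ctx):
    # Stub: the downstream turn depends on the upstream draft.
    return "auto_resolve" if ctx["draft_reply"] == "decisive" else "request_approval"


PIPELINE = [
    ("ingest_ticket", lambda config, ctx: "ingested"),
    ("classify_intent", lambda config, ctx: "refund_request"),
    ("retrieve_context", lambda config, ctx: "3 documents"),
    ("draft_reply", draft_reply),
    ("resolve", resolve),
]


def execute(config, recorded=(), pipeline=PIPELINE):
    """Reuse recorded turns verbatim, then execute the remaining steps."""
    log = list(recorded)
    ctx = dict(log)  # context is rebuilt from the persisted turns
    for name, step in pipeline[len(log):]:
        output = step(config, ctx)
        ctx[name] = output
        log.append((name, output))
    return log


def fork(original_log, at, config):
    """Keep turns before index `at`, change the config, re-execute the rest."""
    return execute(config, recorded=original_log[:at])


def diff(a, b):
    """Return (turn index, original turn, forked turn) for turns that differ."""
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]
```

Forking at turn 3 with `{"prompt": "triage@v2"}` leaves the first three turns identical and re-executes only what follows, so `diff` reports exactly the turns the changed variable affected.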

Where it fits

Use a workflow runner for bounded tasks. Use Division Swarm when the system has to last.

Most agent frameworks aim at bounded workflows: a code-review pipeline, a research-then-synthesize chain, a task that runs and finishes. If that's your problem, use one.

Division Swarm is for the tier above: systems that run continuously, hold state across crashes, coordinate dozens of agents over days, and have to be auditable and replayable end to end. When your multi-agent system needs to be a durable system of record, not a workflow invocation, that's where Division Swarm lives.

               | Workflow runners                    | Division Swarm
---------------+-------------------------------------+-----------------------------------------
Shape          | A workflow that runs once and exits | A stateful system that runs continuously
State          | In-memory, gone on exit             | Event-sourced, durable in Postgres
Crash recovery | Re-run from the top                 | Resume from last committed transaction
Long waits     | Not the model                       | Entities wait days, then resume
History        | Logs                                | Auditable, replayable, forkable
Best for       | Bounded tasks                       | Long-running autonomous systems

Technical facts

Single Go binary.
SQLite for local and dev, Postgres for production. No sprawling dependency tree.
Model-neutral.
Anthropic API, Claude CLI, OpenAI-compatible Chat Completions, and native OpenAI Responses ship today; the runtime is provider-agnostic.
Contract-driven.
Flows declared in YAML, validated by a static analyzer before the runtime boots.
MCP-native.
Division Swarm both consumes upstream MCP servers and exposes its own MCP endpoint.
Open and inspectable.
Apache 2.0. A single machine-readable platform specification governs every public surface.

Integration services

Engineering partnership for enterprise Swarm adoption.

For teams running Division Swarm in regulated or high-stakes contexts, the maintainer's team is available for direct engineering partnership. Engagements are scoped, time-boxed, and led by the people who designed the runtime, not a sales force.

Fit assessment

A focused review of the workload, the existing infrastructure, and the decision boundary between Swarm and a bounded workflow runner. Output: a written recommendation, not a deck.

Flow authoring

Contract design and implementation for the business processes the team intends to run on Swarm. Includes the contract bundle, a passing conformance plan, and the operator runbook.

Production hardening

Postgres deployment topology, observability wiring, identity and secret-manager integration, and a staged rollout from a single flow to a multi-flow surface.

Team enablement

Onboarding for the engineers who will own the system day-to-day: state-machine authoring, analyzer-driven iteration, replay and fork in operations, and the audit story end to end.

Direct engagement

Direct introduction to the maintainer's team. Replies within two business days.

Discuss an engagement

Engineering rigor

Read the spec. Check the bundles. Run verify yourself.

Division Swarm doesn't ask for trust. Every claim on this page is bound to a single machine-readable platform specification, enforced by a static analyzer that runs before the runtime can boot, and proved against a conformance suite that the maintainer can execute without paying an LLM bill.

Boot a flow with a half-finished contract and the analyzer refuses to start the runtime. Change a payload field on one event and 46 structural checks ripple across the bundle until every consumer is reconciled. Open the repository and the same checks that run in CI run on your laptop in under a second.

Platform specification, complete: v1.6.0
Analyzer checks before runtime boot: 46
Conformance bundles across 12 tiers: 200+
Public source, public spec: Apache 2.0
~/contracts verifier
$ swarm verify --contracts ./my-flow
[1/46] event_runtime_wiring_validation ok
[2/46] single_node_per_event ok
[3/46] transition_reference_validation ok
[4/46] entity_writer_coverage ok
[46/46] mcp_gateway_route_coverage ok
verify ok: contracts=my-flow
$

Output of the static analyzer against an example bundle. Reproducible locally; no runtime, no database, no LLM credential required.
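
As a rough picture of how a payload change can ripple to consumers, the Python sketch below implements one hypothetical structural check: every field a consumer reads must still be declared by the event it consumes. The function name `check_payload_coverage` and the data shapes are assumptions for illustration; this is not one of the 46 analyzer checks, whose definitions live in the platform specification.

```python
def check_payload_coverage(events, consumers):
    """Report consumers that read fields an event does not declare.

    events:    {event name: set of declared payload fields}
    consumers: list of (consumer name, event name, fields the consumer reads)
    """
    errors = []
    for consumer, event, fields in consumers:
        declared = events.get(event)
        if declared is None:
            errors.append(f"{consumer}: consumes undeclared event '{event}'")
            continue
        # sorted() keeps the report stable from run to run
        for name in sorted(set(fields) - declared):
            errors.append(f"{consumer}: reads '{event}.{name}', which is not declared")
    return errors
```

Renaming a field in the `events` declaration makes every consumer that still reads the old name appear in the report, and the report empties only once each consumer is updated.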

Run your first system in minutes.

terminal
# install
go install github.com/division-sh/swarm/cmd/swarm@latest

# verify and run your own bundle (see docs.division.sh/quickstart)
swarm verify --contracts ./my-flow
swarm run --contracts ./my-flow --event order.created --payload ./payload.json
