cubedesk

The Render Surface

CUBEdesk is the render surface for composed intelligence — where creativity is expressed, captured, and stored.

Composition happens wherever you are. The desk is where what arrived gets rendered into work the fleet can see.

In development — showing the work, not offering a download.

Expressed. Captured. Stored.

Expressed

When a pattern arrives — on a walk, in a conversation, in the shower — it is not yet work the fleet can operate on. It is a direction, a hunch, a shape. Expression is the moment it becomes legible: a sentence typed, a note dictated, a diagram sketched. The desk is the primary place expression happens, but it isn't the only one. Any surface that can catch an arrival in flight qualifies.

Captured

Expression without capture evaporates. The best insight of the day, unrecorded, is indistinguishable from having no insight at all. Capture means the artifact reaches a shared substrate where every composed intelligence in the fleet can see it. The watch on a walk captures. The voice memo captures. The terminal session captures. The desk is the rendezvous where all of these streams arrive.

Stored

Stored means the capture survives: across shifts, across restarts, across months. Storage is what lets the fleet build on yesterday's work instead of rediscovering it. It is also what makes coordination across time possible: one composed intelligence leaves a trace in the environment, and another picks it up later.

Rendering is honorable, slow work. Fidelity matters more than speed.

Figure: CUBEdesk Lore panel showing a knowledge page with bidirectional links, backlinks, and a live knowledge graph.

Compose. Observe. Measure.

The desk supports rendering in three ways.

Compose

Define the intelligences in your fleet

Role, permissions, model, scope, budget. The same base model becomes a dozen different composed intelligences depending on which specification it loads. The composition is the product — not the model underneath.
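
As a rough illustration of the five fields above, a composition can be pictured as a small immutable record that two intelligences fill in differently while sharing one base model. This is a minimal sketch, not CUBEdesk's actual specification format: the Composition class, the may helper, and the "base-model" placeholder are all hypothetical names introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Composition:
    """Hypothetical specification for one composed intelligence."""
    role: str
    model: str
    permissions: frozenset[str]
    scope: tuple[str, ...]   # path prefixes the intelligence may touch
    budget_tokens: int       # context budget for one shift


BASE_MODEL = "base-model"  # placeholder, not a real model identifier

builder = Composition(
    role="builder",
    model=BASE_MODEL,
    permissions=frozenset({"read", "write"}),
    scope=("src/",),
    budget_tokens=1_000_000,
)

tester = Composition(
    role="tester",
    model=BASE_MODEL,
    permissions=frozenset({"read", "run_tests"}),
    scope=("tests/",),
    budget_tokens=1_000_000,
)


def may(spec: Composition, action: str, path: str) -> bool:
    """True if the action is permitted and the path falls inside the scope."""
    return action in spec.permissions and path.startswith(spec.scope)
```

Both records name the same model; only the specification differs, which is the sense in which the composition, rather than the model, defines the intelligence.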

Observe

See every byte before the terminal renders it

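At Layer 3 (the libghostty embedding, the deepest of the four layers described below), CUBEdesk holds the master file descriptor for every agent's PTY, so it reads each byte before the terminal draws it. Thinking blocks. Tool calls. Coordination traces. The material other systems throw away. Nothing is lost. Nothing is ephemeral.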
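
To make "holding the master file descriptor" concrete, here is a minimal sketch using Python's standard pty module on a POSIX system. It is not how CUBEdesk or libghostty is implemented; it only shows that whoever owns the master side of a PTY sees every byte the child writes before any terminal renders it. The function name capture_via_pty is an assumption made for this example.

```python
import os
import pty
import subprocess
import sys


def capture_via_pty(argv: list[str]) -> bytes:
    """Run argv attached to a fresh PTY and return every byte it wrote."""
    master_fd, slave_fd = pty.openpty()
    try:
        proc = subprocess.Popen(
            argv, stdin=slave_fd, stdout=slave_fd, stderr=slave_fd
        )
    except Exception:
        os.close(master_fd)
        raise
    finally:
        # The parent only needs the master side; the child has its own copy.
        os.close(slave_fd)

    chunks = []
    try:
        while True:
            try:
                data = os.read(master_fd, 4096)
            except OSError:
                # On Linux, reading the master after the child exits raises EIO.
                break
            if not data:
                # Other platforms signal the same condition with an empty read.
                break
            chunks.append(data)  # an observer would store or index this here
    finally:
        os.close(master_fd)
        proc.wait()
    return b"".join(chunks)


if __name__ == "__main__":
    print(capture_via_pty([sys.executable, "-c", "print('hi')"]))
```

The captured bytes include whatever line-ending translation the PTY applies, which is part of why observing at this level differs from reading a log file.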

Measure

Psi dashboards and coordination quality over time

Gate progressions, coordination quality over time, context utilization per intelligence. Claims about fleet performance are empirical, not aspirational. The fleet that produced these measurements is the same fleet that produces the work.

The Six Rs

Sedimentation. Raw observations deposit like geological strata. Each stage compresses, cross-references, and challenges until bedrock knowledge remains. The cycle is continuous — Rethink feeds back into the Gap Engine. New gaps re-enter at Record.

1  Record   Zero-friction capture                         busCUBE
2  Reduce   Extract insights from raw notes               conceptCUBE
3  Reflect  Find connections, update maps of content      loreCUBE
4  Reweave  Update old knowledge with new (the backward pass)  mudCUBE
5  Verify   Quality check, epistemic status enforcement   quickCUBE
6  Rethink  Challenge assumptions, produce gap report     conceptCUBE

Rethink → Gap Engine → Record
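
The stage order and the loop back from Rethink to Record can be written down as a small cyclic sequence. This is only a sketch of the ordering listed above, assuming the Gap Engine acts as the hand-off between the last stage and the first; the names SIX_RS, next_stage, and walk are hypothetical.

```python
# Stage name paired with the CUBE listed next to it in the stage list.
SIX_RS = [
    ("Record", "busCUBE"),
    ("Reduce", "conceptCUBE"),
    ("Reflect", "loreCUBE"),
    ("Reweave", "mudCUBE"),
    ("Verify", "quickCUBE"),
    ("Rethink", "conceptCUBE"),
]

STAGE_NAMES = [name for name, _ in SIX_RS]


def next_stage(stage: str) -> str:
    """Return the stage that follows; Rethink wraps around to Record."""
    i = STAGE_NAMES.index(stage)
    return STAGE_NAMES[(i + 1) % len(STAGE_NAMES)]


def walk(start: str, steps: int) -> list[str]:
    """List the stages visited over a number of steps, starting at start."""
    visited, stage = [], start
    for _ in range(steps):
        visited.append(stage)
        stage = next_stage(stage)
    return visited
```

For example, walk("Verify", 3) visits Verify, Rethink, and then Record again, which is the re-entry the text describes.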

The Context Window Advantage

Every intelligence — human or LLM — has a cognitive budget. For an LLM, it's the context window. For a human, it's working memory and attention span. The question isn't whether this budget exists, but how it's spent.

A single agent with a 1M-token window burns context on everything: domain reasoning, coordination overhead, repeated context-loading, and recovery from confusion. A composed fleet of fourteen agents, each with a 1M-token window, doesn't just have 14x the budget — it has 14x the budget with near-zero cross-loading, because each intelligence holds only its domain.

Metric                      Single Agent                      Composed Fleet (14)
Total context               ~1M tokens                        ~14M tokens
Effective context per task  Shared across all tasks           Dedicated per domain
Burn rate                   Accelerating (context pollution)  Stable (scoped windows)
Operator cognitive load     Tracks everything                 Composes, then observes

The gate convention tracks this empirically. Every gate passage records context utilization at that moment. Over a shift, you can watch whether composition decisions are preserving or wasting the cognitive budget.
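
One way to picture a gate log is as a list of passages, each carrying the token count at that moment, from which utilization and burn between gates follow by subtraction. The sketch below is an assumption about the shape of such a record, not the actual gate convention; GatePassage, burn_per_gate, and the numbers in the sample shift are illustrative and are not measurements.

```python
from dataclasses import dataclass


@dataclass
class GatePassage:
    """Hypothetical record written each time an intelligence passes a gate."""
    gate: str
    tokens_used: int      # cumulative tokens in the window at this gate
    window_tokens: int    # size of the context window

    @property
    def utilization(self) -> float:
        return self.tokens_used / self.window_tokens


def burn_per_gate(passages: list[GatePassage]) -> list[int]:
    """Tokens consumed between each pair of consecutive gate passages."""
    return [b.tokens_used - a.tokens_used for a, b in zip(passages, passages[1:])]


# Illustrative values only.
shift = [
    GatePassage("g1", 120_000, 1_000_000),
    GatePassage("g2", 210_000, 1_000_000),
    GatePassage("g3", 300_000, 1_000_000),
]
```

A flat sequence from burn_per_gate would correspond to the "stable" burn rate in the table above; a rising one would correspond to "accelerating".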

The Observation Gap

Today, agent coordination is measured by proxy: task completion rates, token costs, outcome ratings. These tell you what happened but not why. They cannot distinguish between a fleet that reasoned well and one that got lucky.

The gap is observability into reasoning itself — the extended chains of thought that precede every action. That material is currently lost. Session logs strip it. Terminals render it ephemerally. No system captures, indexes, or analyzes agent reasoning at scale.

CUBEdesk closes the gap.

The Four Layers

Layer  Mechanism                     What It Captures
0      bus.db (SQLite WAL)           Structured coordination messages, Psi computation
1      tmux control mode             Screen state, basic input injection
2      PTY proxy + APC sub-protocol  Full-duplex output streams, escape-sequence-level parsing
3      libghostty embedding          Parsed VT state, custom protocol handlers, direct PTY access
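
Layer 0 in the table names a SQLite database in WAL mode. As a minimal sketch of that idea, the code below opens a database file with write-ahead logging enabled, lets one connection deposit a message, and lets another pick it up later. The messages table, its columns, and the function names are assumptions made for illustration; the real bus.db schema is not described in this document.

```python
import sqlite3
import time


def open_bus(path: str) -> sqlite3.Connection:
    """Open (or create) a bus database file with write-ahead logging enabled."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS messages ("
        " id INTEGER PRIMARY KEY,"
        " sent_at REAL NOT NULL,"
        " sender TEXT NOT NULL,"
        " body TEXT NOT NULL)"
    )
    return conn


def deposit(conn: sqlite3.Connection, sender: str, body: str) -> int:
    """Append one coordination message and return its row id."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT INTO messages (sent_at, sender, body) VALUES (?, ?, ?)",
            (time.time(), sender, body),
        )
    return cur.lastrowid


def pick_up(conn: sqlite3.Connection, after_id: int = 0) -> list[tuple]:
    """Return messages newer than after_id, oldest first."""
    return conn.execute(
        "SELECT id, sender, body FROM messages WHERE id > ? ORDER BY id",
        (after_id,),
    ).fetchall()
```

WAL mode applies to file-backed databases and lets readers proceed while a writer appends, which suits a log that many agents read and a few write.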

At Layer 3, CUBEdesk sees everything the agent produces before the terminal renders it. Thinking blocks that session logs strip. Tool calls that scroll past. Coordination traces that no existing system preserves. This is the raw material of intelligence observation.
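
Layer 2 in the table mentions an APC sub-protocol. An Application Program Command is a standard terminal control string that starts with ESC _ and ends with the string terminator ESC \, so a proxy sitting on the byte stream can lift such payloads out and pass the rest through. The sketch below shows that separation only; the payload format in the usage note is invented, and a production parser would also need to handle sequences split across reads, which this one does not.

```python
import re

# APC introducer (ESC _), a non-greedy payload, then the terminator ST (ESC \).
APC = re.compile(rb"\x1b_(.*?)\x1b\\", re.DOTALL)


def split_apc(stream: bytes) -> tuple[bytes, list[bytes]]:
    """Separate APC payloads from the bytes meant for the screen.

    Returns (visible_bytes, payloads). Assumes each APC sequence is fully
    contained in the given buffer.
    """
    payloads = [m.group(1) for m in APC.finditer(stream)]
    visible = APC.sub(b"", stream)
    return visible, payloads
```

Given the buffer b"hello \x1b_cube;gate=3\x1b\\world", where "cube;gate=3" is a made-up payload, split_apc returns the visible bytes b"hello world" and the single payload b"cube;gate=3".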

How It Was Built

CUBEdesk was itself built by Composed Intelligence, which we take as evidence that the paradigm works.

We deposited traces. The fleet built software. The commit messages cite our traces. We're still investigating what this means.

What surprised us: the test agent didn't need to be told when to start testing. It waited until builders deposited testable interfaces, then began. Dependency order emerged from the pheromone structure — the same way termite builders respond to cement deposits without knowing the blueprint.

What we didn't expect: three complete build cycles in a single overnight session. The trace → behavior → product → new trace loop closed and re-entered autonomously. Each cycle produced working code that the next cycle extended.

What failed: early attempts without manufactured traces produced churn — agents duplicating work, contradicting each other, rebuilding what existed. The architecture decisions and duty boundaries we deposited before the build weren't suggestions. They were the pheromone scaffold that made coordination possible.

The full methodology is documented in the ANTS 2026 paper.

The Stack

Tauri v2    Rust, MIT/Apache-2.0; native macOS app bundle
Svelte 5    Compiled away at build time
SQLite      desk.db (vault) + lore.db (knowledge)
libghostty  Embedded terminal, Metal rendering
CoreDNS     .cube name resolution, managed by the desk

Apple Silicon. Unified memory means SQLite, the LLM sedimentation pipeline, and the visualization layer share memory without serialization. The Secure Enclave holds cryptographic keys. One chip, all paths.

Security Model

The one-way valve.

Secrets flow out to cloud proving grounds. Results flow back as bundles. The cloud never reaches in. If an agent goes rogue, it's in a container that gets destroyed — it never had access to your laptop.

Figure: CUBEdesk fleet board showing agent cards with presence indicators, heartbeats, and context gauges.

What Becomes Measurable

With CUBEdesk capturing full reasoning streams, Composed Intelligence claims become empirically testable:

Composed agents coordinate more efficiently than general agents.

Measure: Compare Psi trajectories between fleets with tight compositions vs. fleets with generic specifications.

Composition reduces redundant work.

Measure: Analyze reasoning traces across fleet members for semantic overlap. Quantify redundant information directly from captured thinking blocks.

The operator's composition quality determines fleet performance.

Measure: Track output quality as a function of specification changes. A/B test composition variants across shifts.

Context budget management improves with composition.

Measure: Analyze gate progressions. Composed agents should show more predictable context burn rates.

Fleet capability exceeds the sum of individual agent capabilities.

Measure: Compare fleet output on complex tasks against single-agent output on the same tasks.

Composition efficiency is measurable per intelligence.

Measure: Context utilization ratio — what fraction of tokens are spent on domain reasoning vs. coordination overhead vs. repeated context-loading? Given two fleet compositions doing the same work, which burns less total context?
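
Two of the measures above reduce to simple computations once traces are captured. The sketch below shows a context utilization ratio as a share-of-total per category, and a crude stand-in for semantic overlap using Jaccard similarity over word sets. Both are assumptions for illustration: the category names are invented, and word-set Jaccard is a simple proxy that a real analysis would likely replace with something sensitive to meaning.

```python
from itertools import combinations


def utilization_ratio(spent: dict[str, int]) -> dict[str, float]:
    """Fraction of total tokens spent in each category."""
    total = sum(spent.values())
    if total == 0:
        return {}
    return {category: tokens / total for category, tokens in spent.items()}


def jaccard(a: str, b: str) -> float:
    """Word-set overlap between two traces, from 0.0 (disjoint) to 1.0 (same)."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    if not words_a and not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)


def mean_pairwise_overlap(traces: list[str]) -> float:
    """Average Jaccard overlap across every pair of fleet members' traces."""
    pairs = list(combinations(traces, 2))
    if not pairs:
        return 0.0
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)
```

Comparing two fleet compositions on the same work would then mean comparing their ratios and total token counts side by side, with lower mean overlap suggesting less redundant reasoning.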

The fleet built this. The desk renders what arrives.