CUBEdesk
The Render Surface
CUBEdesk is the render surface for composed intelligence — where creativity is expressed, captured, and stored.
Composition happens wherever you are. The desk is where what arrived gets rendered into work the fleet can see.
In development — showing the work, not offering a download.
Expressed. Captured. Stored.
Expressed
When a pattern arrives — on a walk, in a conversation, in the shower — it is not yet work the fleet can operate on. It is a direction, a hunch, a shape. Expression is the moment it becomes legible: a sentence typed, a note dictated, a diagram sketched. The desk is the primary place expression happens, but it isn't the only one. Any surface that can catch an arrival in flight qualifies.
Captured
Expression without capture evaporates. The best insight of the day, unrecorded, is indistinguishable from having no insight at all. Capture means the artifact reaches a shared substrate where every composed intelligence in the fleet can see it. The watch on a walk captures. The voice memo captures. The terminal session captures. The desk is the rendezvous where all of these streams arrive.
Stored
Stored means the capture survives — across shifts, across restarts, across months. Storage is what lets the fleet build on yesterday's work instead of rediscovering it. It is also what makes coordination across time possible: one composed intelligence leaves a trace in the environment, and another picks it up later. Without storage, there is no coordination across time.
Rendering is honorable, slow work. Fidelity matters more than speed.
Compose. Observe. Measure.
The desk supports this work in three ways.
Define the intelligences in your fleet
Role, permissions, model, scope, budget. The same base model becomes a dozen different composed intelligences depending on which specification it loads. The composition is the product — not the model underneath.
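One way to picture this: the specification is data the base model loads. The field names below come straight from the list above (role, permissions, model, scope, budget); the schema itself is a hypothetical sketch, not CUBEdesk's actual format.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Composition:
    """Hypothetical spec a base model loads to become a composed intelligence."""
    role: str                      # what this intelligence is for
    model: str                     # the base model underneath
    scope: tuple[str, ...]         # the domain it may touch
    permissions: tuple[str, ...]   # the actions it may take
    budget_tokens: int             # cognitive budget for the shift

# The same base model, two different composed intelligences:
builder = Composition(
    role="builder",
    model="base-model",            # placeholder name
    scope=("src/",),
    permissions=("read", "write"),
    budget_tokens=1_000_000,
)
tester = Composition(
    role="tester",
    model="base-model",            # same model underneath
    scope=("tests/",),
    permissions=("read", "write", "execute"),
    budget_tokens=1_000_000,
)
```

Same `model` field in both; everything that differs is the composition. That difference is the product.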
See every byte before the terminal renders it
At Layer 3, CUBEdesk holds the master file descriptor for every agent's PTY. Thinking blocks. Tool calls. Coordination traces. The material other systems throw away. Nothing is lost. Nothing is ephemeral.
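The mechanism is ordinary Unix plumbing: whoever holds the master side of a PTY sees every byte the process on the slave side writes, before any terminal renders or discards it. A minimal sketch of that capture loop, not CUBEdesk's implementation (Unix-only, using a `printf` child as a stand-in for an agent process):

```python
import os
import pty
import subprocess

# Hold the master side of a pseudo-terminal; give the child the slave side.
master_fd, slave_fd = pty.openpty()

proc = subprocess.Popen(
    ["printf", "thinking...\\ntool_call...\\n"],  # stand-in for an agent
    stdin=slave_fd, stdout=slave_fd, stderr=slave_fd,
    close_fds=True,
)
os.close(slave_fd)  # only the child holds the slave end now

# Every byte the child emits crosses master_fd before any rendering.
captured = b""
while True:
    try:
        chunk = os.read(master_fd, 1024)
    except OSError:   # raised on Linux once the slave side closes
        break
    if not chunk:
        break
    captured += chunk
proc.wait()
os.close(master_fd)

print(captured.decode())  # the raw stream, byte for byte
```

Because the read happens at the file-descriptor level, thinking blocks and tool calls arrive in the same stream as everything else; nothing depends on the terminal choosing to keep them.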
Psi dashboards and coordination quality over time
Gate progressions, coordination quality over time, context utilization per intelligence. Claims about fleet performance are empirical, not aspirational. The fleet that produced these measurements is the same fleet that produces the work.
The Six Rs
Sedimentation. Raw observations deposit like geological strata. Each stage compresses, cross-references, and challenges the deposits until bedrock knowledge remains. The cycle is continuous — Rethink feeds back into the Gap Engine. New gaps re-enter at Record.
The Context Window Advantage
Every intelligence — human or LLM — has a cognitive budget. For an LLM, it's the context window. For a human, it's working memory and attention span. The question isn't whether this budget exists, but how it's spent.
A single agent with a 1M-token window burns context on everything: domain reasoning, coordination overhead, repeated context-loading, and recovery from confusion. A composed fleet of fourteen agents, each with a 1M-token window, doesn't just have 14x the budget — it has 14x the budget with near-zero cross-loading, because each intelligence holds only its domain.
The gate convention tracks this empirically. Every gate passage records context utilization at that moment. Over a shift, you can watch whether composition decisions are preserving or wasting the cognitive budget.
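A sketch of what such a record might look like. The schema and numbers are hypothetical; the text only says that each gate passage records context utilization at that moment:

```python
from dataclasses import dataclass

@dataclass
class GatePassage:
    """Hypothetical record written when an intelligence passes a gate."""
    gate: str
    tokens_used: int
    context_window: int

    @property
    def utilization(self) -> float:
        # Fraction of the cognitive budget spent at this moment.
        return self.tokens_used / self.context_window

# A shift's trajectory, invented for illustration:
shift = [
    GatePassage("gate-1", 120_000, 1_000_000),
    GatePassage("gate-2", 310_000, 1_000_000),
    GatePassage("gate-3", 520_000, 1_000_000),
]

# Burn rate between gates: roughly constant means predictable spending.
burn_rates = [b.tokens_used - a.tokens_used for a, b in zip(shift, shift[1:])]

print([round(p.utilization, 2) for p in shift])  # [0.12, 0.31, 0.52]
print(burn_rates)                                # [190000, 210000]
```

A composition that is working shows a steady, explainable burn; spikes in the trajectory point at coordination overhead or context reloading.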
The Observation Gap
Today, agent coordination is measured by proxy: task completion rates, token costs, outcome ratings. These tell you what happened but not why. They cannot distinguish between a fleet that reasoned well and one that got lucky.
The gap is observability into reasoning itself — the extended chains of thought that precede every action. That material is currently lost. Session logs strip it. Terminals render it ephemerally. No system captures, indexes, or analyzes agent reasoning at scale.
CUBEdesk closes the gap.
The Four Layers
At Layer 3, CUBEdesk sees everything the agent produces before the terminal renders it. Thinking blocks that session logs strip. Tool calls that scroll past. Coordination traces that no existing system preserves. This is the raw material of intelligence observation.
How It Was Built
CUBEdesk was itself built by Composed Intelligence — proof that the paradigm works.
We deposited traces. The fleet built software. The commit messages cite our traces. We're still investigating what this means.
What surprised us: the test agent didn't need to be told when to start testing. It waited until builders deposited testable interfaces, then began. Dependency order emerged from the pheromone structure — the same way termite builders respond to cement deposits without knowing the blueprint.
What we didn't expect: three complete build cycles in a single overnight session. The trace → behavior → product → new trace loop closed and re-entered autonomously. Each cycle produced working code that the next cycle extended.
What failed: early attempts without manufactured traces produced churn — agents duplicating work, contradicting each other, rebuilding what existed. The architecture decisions and duty boundaries we deposited before the build weren't suggestions. They were the pheromone scaffold that made coordination possible.
The full methodology is documented in the ANTS 2026 paper.
The Stack
Apple Silicon. Unified memory means SQLite, the LLM sedimentation pipeline, and the visualization layer share memory without serialization. The Secure Enclave holds cryptographic keys. One chip, all paths.
Security Model
The one-way valve.
Secrets flow out to cloud proving grounds. Results flow back as bundles. The cloud never reaches in. If an agent goes rogue, it's in a container that gets destroyed — it never had access to your laptop.
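The shape of the valve can be sketched in a few lines. This is an illustration of the protocol's asymmetry, with invented names, not CUBEdesk's security code: the local side initiates every transfer, and the only inbound surface is a queue of inert result bundles.

```python
class OneWayValve:
    """Hypothetical sketch of the one-way valve.

    The desk pushes jobs out and pulls bundles back. There is no
    method a remote party can call to execute anything locally.
    """

    def __init__(self):
        self._outbox = []   # jobs pushed to the cloud proving ground
        self._inbox = []    # result bundles awaiting local pickup

    def push_job(self, job: dict) -> None:
        # Local -> cloud: the desk decides what leaves.
        self._outbox.append(job)

    def deliver_bundle(self, bundle: dict) -> None:
        # Cloud -> local, but only into a queue; nothing runs on arrival.
        self._inbox.append(bundle)

    def pull_results(self) -> list:
        # Local initiates the pickup; bundles are data, not commands.
        results, self._inbox = self._inbox, []
        return results

valve = OneWayValve()
valve.push_job({"task": "run tests", "container": "ephemeral-1"})
valve.deliver_bundle({"container": "ephemeral-1", "status": "destroyed", "report": "ok"})
results = valve.pull_results()
print(results)
```

The design choice is the absence of an API: the cloud side has nowhere to reach in because no inbound call path exists, only a mailbox.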
What Becomes Measurable
With CUBEdesk capturing full reasoning streams, Composed Intelligence claims become empirically testable:
Composed agents coordinate more efficiently than general agents.
Measure: Compare Psi trajectories between fleets with tight compositions vs. fleets with generic specifications.
Composition reduces redundant work.
Measure: Analyze reasoning traces across fleet members for semantic overlap. Quantify redundant information directly from captured thinking blocks.
The operator's composition quality determines fleet performance.
Measure: Track output quality as a function of specification changes. A/B test composition variants across shifts.
Context budget management improves with composition.
Measure: Analyze gate progressions. Composed agents should show more predictable context burn rates.
Fleet capability exceeds the sum of individual agent capabilities.
Measure: Compare fleet output on complex tasks against single-agent output on the same tasks.
Composition efficiency is measurable per intelligence.
Measure: Context utilization ratio — what fraction of tokens are spent on domain reasoning vs. coordination overhead vs. repeated context-loading? Given two fleet compositions doing the same work, which burns less total context?
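The last measure reduces to simple arithmetic once tokens are bucketed by category. A minimal sketch, with the three categories taken from the text and all counts invented for illustration:

```python
def utilization_ratio(tokens_by_category: dict[str, int]) -> float:
    """Fraction of total context spent on domain reasoning."""
    total = sum(tokens_by_category.values())
    return tokens_by_category.get("domain", 0) / total if total else 0.0

# Two hypothetical fleet compositions doing the same work:
composed = {"domain": 700_000, "coordination": 60_000, "reloading": 40_000}
generic  = {"domain": 500_000, "coordination": 250_000, "reloading": 250_000}

print(round(utilization_ratio(composed), 2))       # 0.88
print(round(utilization_ratio(generic), 2))        # 0.5
print(sum(composed.values()) < sum(generic.values()))  # True: less total burn
```

Given captured reasoning streams, the bucketing itself is the hard part; the comparison between compositions is then a direct, per-shift number.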
The fleet built this. The desk renders what arrives.