cubedesk

The Composer's Desk

Where Composed Intelligence happens. Define your agents. Observe their reasoning. Measure their coordination. Compose nine communication modalities into a coherent system. One application. Your laptop. No cloud required.

In development — showing the work, not offering a download.

[Screenshot: CUBEdesk Lore panel showing a knowledge page with bidirectional links, backlinks, and live knowledge graph]

The Observation Gap

Today, agent coordination is measured by proxy: did the task complete? How many tokens did it cost? These are outcome metrics. They tell you what happened but not why. They cannot distinguish an agent that reasoned well from one that reasoned poorly and got lucky.

The gap is observability into reasoning itself. LLMs produce thinking — extended chains of reasoning that precede every action. Today, this material is lost. Session logs strip it. Terminals render it ephemerally. No system captures, indexes, or analyzes agent reasoning at scale.

CUBEdesk closes that gap.

What It Does

CUBEdesk is a desktop application. Tauri (Rust) + Svelte + SQLite. It runs on your laptop. It does three things.

Compose

Nine modalities, one desk

CUBEdesk doesn't just compose agents — it composes communication modalities. Each of the nine cubes is a surface through which composed intelligences interact: mudCUBE for immersive text, matrixCUBE for real-time messaging, loreCUBE for knowledge, biblioCUBE for research. Define agents along five axes: role, permissions, model, scope, budget. The desk keeps them all flowing.

Observe

Four-layer capture architecture

At Layer 3, CUBEdesk holds the master file descriptor for every agent's PTY. Every thinking block. Every tool call. Every coordination trace. Before the terminal renders it. The Six Rs sedimentation pipeline transforms raw observation into durable knowledge.

Measure

Psi dashboards and coordination quality

Psi dashboards. Gate progressions. Coordination quality over time. The fleet that produced these measurements is the same fleet that produces the work. Claims about performance are empirical, not aspirational.

The Six Rs

Sedimentation. Raw observations deposit like geological strata. Each stage compresses, cross-references, and challenges until bedrock knowledge remains. The cycle is continuous — Rethink feeds back into the Gap Engine. New gaps re-enter at Record.

1. Record: Zero-friction capture (busCUBE)
2. Reduce: Extract insights from raw notes (conceptCUBE)
3. Reflect: Find connections, update maps of content (loreCUBE)
4. Reweave: Update old knowledge with new, the backward pass (mudCUBE)
5. Verify: Quality check, epistemic status enforcement (quickCUBE)
6. Rethink: Challenge assumptions, produce gap report (conceptCUBE)
Rethink → Gap Engine → Record
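The six stages above form a closed loop. A minimal sketch of that cycle as a state machine, assuming nothing beyond the stage order given in the list (the enum and `next` method are illustrative, not CUBEdesk's actual types):

```rust
// Illustrative model of the Six Rs cycle. Stage names and cube pairings
// come from the pipeline description; the types are hypothetical.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Stage {
    Record,  // zero-friction capture (busCUBE)
    Reduce,  // extract insights (conceptCUBE)
    Reflect, // find connections (loreCUBE)
    Reweave, // the backward pass (mudCUBE)
    Verify,  // epistemic status enforcement (quickCUBE)
    Rethink, // challenge assumptions, produce gap report (conceptCUBE)
}

impl Stage {
    /// The next stage. Rethink feeds the Gap Engine, and new gaps
    /// re-enter the cycle at Record, so the loop never terminates.
    fn next(self) -> Stage {
        match self {
            Stage::Record => Stage::Reduce,
            Stage::Reduce => Stage::Reflect,
            Stage::Reflect => Stage::Reweave,
            Stage::Reweave => Stage::Verify,
            Stage::Verify => Stage::Rethink,
            Stage::Rethink => Stage::Record, // gap report → Gap Engine → Record
        }
    }
}

fn main() {
    // Walking one full cycle returns to Record after six steps.
    let mut s = Stage::Record;
    for _ in 0..6 {
        s = s.next();
    }
    println!("{:?}", s); // Record
}
```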

The Context Window Advantage

Every intelligence — human or LLM — has a cognitive budget. For an LLM, it's the context window. For a human, it's working memory and attention span. The question isn't whether this budget exists, but how it's spent.

A single agent with a 1M-token window burns context on everything: domain reasoning, coordination overhead, repeated context-loading, and recovery from confusion. A composed fleet of fourteen agents, each with a 1M-token window, doesn't just have 14x the budget — it has 14x the budget with near-zero cross-loading, because each intelligence holds only its domain.

Metric                    | Single Agent                     | Composed Fleet (14)
Total context             | ~1M tokens                       | ~14M tokens
Effective context per task | Shared across all tasks         | Dedicated per domain
Burn rate                 | Accelerating (context pollution) | Stable (scoped windows)
Operator cognitive load   | Tracks everything                | Composes, then observes

The gate convention tracks this empirically. Every gate passage records context utilization at that moment. Over a shift, you can watch whether composition decisions are preserving or wasting the cognitive budget.
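A sketch of what that empirical tracking could look like: each gate passage records context utilization at that moment, and the delta between passages gives a burn rate per gate. The `GatePassage` struct and field names are hypothetical, not CUBEdesk's actual schema.

```rust
// Hypothetical gate-passage record: utilization is the fraction of the
// context window consumed at the moment the gate was passed.
struct GatePassage {
    gate: u32,
    utilization: f64, // 0.0 ..= 1.0
}

/// Average per-gate burn across a shift: (last - first) / gates spanned.
/// A stable value suggests scoped windows; an accelerating series of
/// these suggests context pollution.
fn burn_per_gate(passages: &[GatePassage]) -> Option<f64> {
    let (first, last) = (passages.first()?, passages.last()?);
    let gates = (last.gate - first.gate) as f64;
    if gates == 0.0 {
        return None;
    }
    Some((last.utilization - first.utilization) / gates)
}

fn main() {
    let shift = [
        GatePassage { gate: 1, utilization: 0.10 },
        GatePassage { gate: 2, utilization: 0.22 },
        GatePassage { gate: 3, utilization: 0.34 },
    ];
    // ≈ 0.12 of the window burned per gate over this shift.
    println!("{:?}", burn_per_gate(&shift));
}
```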

The Four Layers

Layer | Mechanism                    | What It Captures
0     | bus.db (SQLite WAL)          | Structured coordination messages, Psi computation
1     | tmux control mode            | Screen state, basic input injection
2     | PTY proxy + APC sub-protocol | Full-duplex output streams, escape-sequence-level parsing
3     | libghostty embedding         | Parsed VT state, custom protocol handlers, direct PTY access

At Layer 3, CUBEdesk sees everything the agent produces before the terminal renders it. Thinking blocks that session logs strip. Tool calls that scroll past. Coordination traces that no existing system preserves. This is the raw material of intelligence observation.
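To make "escape-sequence-level parsing" concrete: APC (Application Program Command) sequences are framed in a VT byte stream as `ESC _ ... ESC \`, so a proxy that holds the PTY can lift their payloads out before the terminal renders anything. A minimal sketch under that standard framing; the payload format and function name are illustrative, not CUBEdesk's actual sub-protocol:

```rust
// Extract APC payloads (ESC _ ... ESC \) from a raw PTY byte stream,
// leaving normal output untouched. Unterminated sequences at the end of
// the buffer are truncated; a real proxy would buffer across reads.
fn extract_apc(stream: &[u8]) -> Vec<Vec<u8>> {
    let mut payloads = Vec::new();
    let mut i = 0;
    while i + 1 < stream.len() {
        if stream[i] == 0x1b && stream[i + 1] == b'_' {
            // Found APC start; scan for the ESC \ terminator (ST).
            let start = i + 2;
            let mut j = start;
            while j + 1 < stream.len() && !(stream[j] == 0x1b && stream[j + 1] == b'\\') {
                j += 1;
            }
            payloads.push(stream[start..j].to_vec());
            i = j + 2;
        } else {
            i += 1;
        }
    }
    payloads
}

fn main() {
    // Normal output interleaved with one APC-framed coordination trace.
    let bytes = b"hello\x1b_trace:gate=3\x1b\\world";
    for p in extract_apc(bytes) {
        println!("{}", String::from_utf8_lossy(&p)); // trace:gate=3
    }
}
```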

How It Was Built

CUBEdesk was itself built by Composed Intelligence — proof that the paradigm works.

We deposited traces. The fleet built software. The commit messages cite our traces. We're still investigating what this means.

What surprised us: the test agent didn't need to be told when to start testing. It waited until builders deposited testable interfaces, then began. Dependency order emerged from the pheromone structure — the same way termite builders respond to cement deposits without knowing the blueprint.

What we didn't expect: three complete build cycles in a single overnight session. The trace → behavior → product → new trace loop closed and re-entered autonomously. Each cycle produced working code that the next cycle extended.

What failed: early attempts without manufactured traces produced churn — agents duplicating work, contradicting each other, rebuilding what existed. The architecture decisions and duty boundaries we deposited before the build weren't suggestions. They were the pheromone scaffold that made coordination possible.

The full methodology is documented in the ANTS 2026 paper.

The Stack

Tauri v2   | Rust, MIT/Apache-2.0; native macOS app bundle
Svelte 5   | Compiled away at build time
SQLite     | desk.db (vault) + lore.db (knowledge)
libghostty | Embedded terminal, Metal rendering
CoreDNS    | .cube name resolution, managed by the desk

Apple Silicon. Unified memory means SQLite, the LLM sedimentation pipeline, and the visualization layer share memory without serialization. The Secure Enclave holds cryptographic keys. One chip, all paths.

Security Model

The one-way valve.

Secrets flow out to cloud proving grounds. Results flow back as bundles. The cloud never reaches in. If an agent goes rogue, it's in a container that gets destroyed — it never had access to your laptop.

[Screenshot: CUBEdesk fleet board showing agent cards with presence indicators, heartbeats, and context gauges]

What Becomes Measurable

With CUBEdesk capturing full reasoning streams, Composed Intelligence claims become empirically testable:

Composed agents coordinate more efficiently than general agents.

Measure: Compare Psi trajectories between fleets with tight compositions vs. fleets with generic specifications.

Composition reduces redundant work.

Measure: Analyze reasoning traces across fleet members for semantic overlap. Quantify redundant information directly from captured thinking blocks.

The operator's composition quality determines fleet performance.

Measure: Track output quality as a function of specification changes. A/B test composition variants across shifts.

Context budget management improves with composition.

Measure: Analyze gate progressions. Composed agents should show more predictable context burn rates.

Fleet capability exceeds the sum of individual agent capabilities.

Measure: Compare fleet output on complex tasks against single-agent output on the same tasks.

Composition efficiency is measurable per intelligence.

Measure: Context utilization ratio — what fraction of tokens are spent on domain reasoning vs. coordination overhead vs. repeated context-loading? Given two fleet compositions doing the same work, which burns less total context?
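The utilization ratio above reduces to simple arithmetic once spend is bucketed. A sketch, assuming the three buckets named in the text; the `TokenSpend` struct and its fields are hypothetical, not a real CUBEdesk API:

```rust
// Hypothetical per-intelligence token accounting, split into the three
// buckets from the text: domain reasoning, coordination overhead, and
// repeated context-loading.
struct TokenSpend {
    domain: u64,
    coordination: u64,
    reload: u64,
}

impl TokenSpend {
    /// Fraction of the total budget spent on domain reasoning.
    fn utilization_ratio(&self) -> f64 {
        let total = self.domain + self.coordination + self.reload;
        if total == 0 {
            return 0.0;
        }
        self.domain as f64 / total as f64
    }
}

fn main() {
    // Two fleet compositions doing the same work: the one with the
    // higher ratio (and the lower total) burns less context per unit
    // of domain reasoning.
    let a = TokenSpend { domain: 800_000, coordination: 150_000, reload: 50_000 };
    let b = TokenSpend { domain: 600_000, coordination: 250_000, reload: 150_000 };
    println!("{:.2} vs {:.2}", a.utilization_ratio(), b.utilization_ratio()); // 0.80 vs 0.60
}
```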

The fleet built this. The desk measures how.