Meechi Context Design

Scenario: Sovereign Context & Agentic Orchestration
Status: Active (Phase 1 Complete)
Timeline: June 2026 - Present

This is a living record documenting the implementation of Context-Pipe and Semantic-Sift in the Meechi application, framing it under the Studio of Two framework.

Under the Studio of Two philosophy, the human is the Dreamer bringing the "Why" (autonomy, simple ergonomics, Brazilian intuition) while the AI is the Agent bringing the "How" (code execution, precise search, parsing). A true partnership in a personal canvas requires more than a simple chat interface - it requires a dedicated context architecture that allows seamless delegation without overwhelming the Agent's cognitive capacity.

The scenario: A user engaging with Meechi - a local-first journal and research canvas that unifies writing, reflection, and conversational intelligence (acting as a dynamic dialogue with one's past and present self). The user queries agentic personas (like the Wise Peer or the Librarian) across years of personal journals ("Lore") and external source annotations ("Echoes"), running entirely on local small language models (SLMs) via WebAssembly and llama.cpp on-device.

The problem being solved: Typical personal knowledge bases and AI journals suffer from Context Pressure. As a user imports PDFs, reads books, and writes daily logs, the raw historical context grows into an unmanageable snowball. When sent directly to local SLMs, this results in:

  • Context Bloat: Prompts become flooded with raw, verbose HTML scrapings, repetitive log timestamps, and duplicate journal templates.
  • Token Overload: High CPU/GPU processing overhead on local devices, leading to slow response generation (high latency and first-token lag).
  • Attention Decay: Local SLMs suffer severe reasoning decay when drowned in noisy, uncompressed text dumps (the "Lost in the Middle" phenomenon).
  • Schema Bloat: Exposing multiple background sifting tools, PDF parsers, and file system commands directly to the model consumes precious context space just teaching the agent how to invoke them.

The expected outcome: A clean, on-device context engineering pipeline that silently intercepts document uploads, scrapings, and historical notes. It routes them through local Rust-based sifting routines to deliver a highly condensed, high-SNR (Signal-to-Noise Ratio) prompt to the active agentic persona. The local SLM remains fast, responsive, and highly accurate, acting as a true cognitive partner without cloud dependency.


The Solution: "Prime the Rules, Stream the Data"

To address context pressure and schema bloat, Meechi implements a decoupled context architecture based on the core rule: Prime the Rules, Stream the Data. By separating the static rules from dynamic data streams, the context window remains dense, cheap, and extremely high-fidelity.

1. Context Priming (The Static Quality Envelope)

Before any dynamic data enters the session, Meechi establishes a static, structural foundation that dictates how the active persona thinks and acts. This ensures the local SLM has a consistent mental model and baseline rules before processing the user's stream of thoughts:

  • Lore & Echoes: Priming the agent with the user's foundational values, ancestry, and biography (the "Life Hierarchy" from Mythos and Epic down to Story and Moments). This provides the static "ground" the AI partner uses to understand the user's unique personal history.
  • Persona Directives: Setting explicit guidelines defining the voice, role, and philosophical boundaries of the active persona (such as the Wise Peer for dialectical feedback, or the Scrivener for long-form narrative structure).
  • Core Constraints: Establishing baseline communication boundaries and memory rules, ensuring the AI behaves as an autonomous cognitive partner without straying into unsolicited suggestions or generic chatbot behaviors.
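
The split between a static envelope and streamed data can be sketched as a small prompt builder. This is an illustration under stated assumptions, not Meechi's actual code: the QualityEnvelope class, its field names, the section labels, and the word-count budget are all hypothetical stand-ins for whatever the application uses internally.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QualityEnvelope:
    """Static context, assembled once per session (field names are illustrative)."""
    lore: str
    persona_directives: str
    core_constraints: str

    def render(self) -> str:
        return "\n\n".join([
            "## Lore\n" + self.lore,
            "## Persona\n" + self.persona_directives,
            "## Constraints\n" + self.core_constraints,
        ])


def build_prompt(envelope: QualityEnvelope, sifted_chunks: list[str],
                 user_message: str, max_words: int = 400) -> str:
    """Prime the rules first, then stream in sifted data until the budget is spent."""
    primed = envelope.render()
    # Whitespace word count is a crude proxy for tokens (assumption).
    budget = max_words - len(primed.split()) - len(user_message.split())
    streamed = []
    for chunk in sifted_chunks:
        cost = len(chunk.split())
        if cost > budget:
            break  # the dynamic stream is trimmed; the static rules never are
        streamed.append(chunk)
        budget -= cost
    return "\n\n".join([
        primed,
        "## Context\n" + "\n".join(streamed),
        "## User\n" + user_message,
    ])


envelope = QualityEnvelope(
    lore="Values: autonomy.",
    persona_directives="You are the Wise Peer.",
    core_constraints="No unsolicited advice.",
)
prompt = build_prompt(
    envelope,
    sifted_chunks=["note one", "note two " * 500],
    user_message="What did I write?",
    max_words=50,
)
```

In this sketch the oversized second chunk is dropped while the envelope is left intact, which mirrors the idea that rules are primed once and only the data stream competes for the remaining budget.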

2. Context-Pipe: The Switchboard (The Unix Engine)

Context-Pipe is the dynamic orchestration layer, built as a local, high-performance Rust MCP sidecar. It chains utilities following the Unix philosophy (programs do one thing well and pass their output forward) and manages what data gets routed to the agent:

  • Semantic Enums: The LLM only sees a single tool (pipe_run) to express intent (such as sift-web-clip), while pipes.json silently resolves the execution topology in the background.
  • Shadow Tools as Zero-Code Expansibility: Rather than writing brittle parsing code, context-pipe leverages native system CLI utilities (like jq, rg, and pandoc) to parse and shape data on-the-fly.
  • Net-Negative Maintenance: Moving text processing to native utilities allowed us to delete thousands of lines of fragile parsing and cleaning code from the TypeScript core, keeping the frontend lean and stable.
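
The single-tool idea can be illustrated with a minimal sketch. The schema of the embedded pipes.json, the step names, and the pure-Python step functions are assumptions made for the example; the source does not show the real file format, and the actual sidecar is written in Rust and shells out to utilities such as jq, rg, and pandoc rather than calling in-process functions.

```python
import json
import re

# Hypothetical stand-in for pipes.json: intent name -> ordered list of step names.
PIPES_JSON = """
{
  "sift-web-clip": ["strip_tags", "collapse_whitespace"]
}
"""


def strip_tags(text: str) -> str:
    """Replace anything that looks like an HTML tag with a space."""
    return re.sub(r"<[^>]+>", " ", text)


def collapse_whitespace(text: str) -> str:
    """Squeeze runs of whitespace into single spaces and trim the ends."""
    return " ".join(text.split())


# Registry of available steps; in the real system these would be CLI utilities.
STEPS = {
    "strip_tags": strip_tags,
    "collapse_whitespace": collapse_whitespace,
}


def pipe_run(intent: str, payload: str) -> str:
    """The only tool the model sees: it names an intent, never the topology."""
    topology = json.loads(PIPES_JSON)
    if intent not in topology:
        raise ValueError(f"unknown intent: {intent}")
    for step_name in topology[intent]:
        # Unix-style chaining: each step's output feeds the next step.
        payload = STEPS[step_name](payload)
    return payload


clean = pipe_run("sift-web-clip", "<div><p>Hello   <b>world</b></p></div>")
print(clean)
```

Because the model only ever emits an intent string, steps can be added or reordered in the configuration without changing the tool schema the model has to learn.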

3. Semantic-Sift: The Local Refinery (The Neural Sanitation Tier)

Operating as a local MCP server, semantic-sift uses LLMLingua-2 to compress dynamic data streams on-device. It prunes low-information filler tokens while aiming to preserve the semantic intent of PDF uploads, journals, and log files, shielding the local model's attention from redundant boilerplate.


Current State

Phase 1: Context Isolation & Silent Routing

  • Status: Complete
  • Context Design Settings: Introduced a dedicated settings view to configure and validate custom prompt sifting pipelines. Standardized the installation, running, and disable states of on-device sidecars (Semantic-Sift and MarkItDown) with automated starting locks.
  • Silent Tool Interception: Configured conversational chat and log modes to run web searches and file parsers silently in the background. The raw, verbose payload is routed through sift-web-clip, stripping [DIRECT_DISPLAY] overrides. The LLM only receives the refined text to generate its response, keeping the user interface clean.
  • Log Splitter Optimization: Fixed a critical log merger bug by switching the chat splitter to match digit-based timestamp headers (### (?=\d)). This prevents markdown H3 headers in assistant outputs from prematurely truncating files, stabilizing the history sync.
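
The effect of the lookahead in the splitter pattern can be shown with a short sketch. The sample log and its HH:MM timestamp format are assumptions made for illustration; only the pattern itself comes from the fix described above, here anchored to line starts.

```python
import re

# Split only on H3 headers whose text starts with a digit (timestamp headers).
# The lookahead (?=\d) checks for the digit without consuming it.
ENTRY_HEADER = re.compile(r"^### (?=\d)", flags=re.MULTILINE)


def split_log(log: str) -> list[str]:
    """Return one string per log entry, dropping empty fragments."""
    parts = ENTRY_HEADER.split(log)
    return [part.strip() for part in parts if part.strip()]


# Hypothetical log: the assistant's reply contains its own H3 header.
sample = (
    "### 09:15\n"
    "User: summarize my notes\n"
    "### 09:16\n"
    "Assistant: here is a summary\n"
    "### Key Themes\n"
    "- autonomy\n"
)

entries = split_log(sample)
print(len(entries))
```

With a plain "### " splitter the assistant's "### Key Themes" heading would start a third, bogus entry; with the lookahead it stays inside the second entry where it belongs.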

Phase 2: Dynamic Piping & Handoffs

  • Status: Planned
  • Ad-Hoc Topology: Enabling the agent to construct dynamic pipe graphs on-the-fly (pipe_run_dynamic) when a predefined pipeline does not match the prompt constraints.
  • A2A Handoff Sifting (The Context Shield): Implementing auto-distill triggers on multi-agent communication boundaries using pipe_agent_handoff. This prevents context flooding by sifting verbose output from one agent down to its core signal before it enters another agent's prompt context.

The Result: Resiliency & System Optimization

Meechi is built on three core behind-the-scenes systems that guarantee operational performance and structural stability on-device:

  • Dual-Engine Compression & Heuristic Fallback: Baked directly into the core, the sifting pipeline employs a dual-engine architecture. Payload checks determine whether to use the neural tier (LLMLingua-2) or fall back to fast heuristic sifting (such as structural noise sieves, markdown optimization, and log-header splits). This bypasses heavy neural processing on smaller text segments to save CPU cycles and ensures the system remains functional even if neural sidecars are temporarily overloaded.
  • Hot-Swappable Nodes: The pipeline topology is not hardcoded. Sifting steps can be reorganized by hot-swapping nodes in pipes.json (such as switching from a web scraper to a local file parser) or, once Phase 2 lands, by using pipe_run_dynamic to adapt to active context constraints on-the-fly.
  • Declarative Resiliency Gauntlets: The context engine is battle-tested against cascading failures. If a node in the chain encounters a read timeout or script failure, context-pipe executes declarative fallback schemas (gracefully exiting with partial data or clean stubs) to prevent the agentic loop from hanging or getting trapped in recursive retry cycles.
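
The engine selection and single-step fallback described above can be sketched as follows. The size threshold, the function names, and the specific heuristic (dropping blank and duplicate lines) are assumptions for illustration; the neural tier is passed in as a plain callable so the example stays self-contained and does not depend on a running sidecar.

```python
from typing import Callable

NEURAL_MIN_CHARS = 2000  # hypothetical threshold; the real payload check is not documented


def heuristic_sift(text: str) -> str:
    """Cheap structural cleanup: drop blank lines and exact duplicate lines."""
    seen = set()
    kept = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line in seen:
            continue
        seen.add(line)
        kept.append(line)
    return "\n".join(kept)


def sift(text: str, neural: Callable[[str], str],
         min_chars: int = NEURAL_MIN_CHARS) -> tuple[str, str]:
    """Return (sifted_text, engine_used)."""
    if len(text) < min_chars:
        # Small payloads skip the neural tier entirely to save CPU cycles.
        return heuristic_sift(text), "heuristic"
    try:
        return neural(text), "neural"
    except Exception:
        # Sidecar overloaded or unavailable: degrade once, with no retry loop.
        return heuristic_sift(text), "heuristic-fallback"


def overloaded_sidecar(text: str) -> str:
    """Simulates a neural sidecar that never answers."""
    raise TimeoutError("neural sidecar did not respond")


small = sift("a\n\na\nb", overloaded_sidecar)
large = sift("line\n" * 1000, overloaded_sidecar)
print(small, large)
```

Falling back exactly once, rather than retrying, is one way to keep a failing node from trapping the agentic loop; the caller always gets some usable text back along with a label saying which engine produced it.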

Key Findings & ROI

  • Operational ROI: Sifting search outputs down to high-signal markdown reduces natural language noise by over 60% with no loss in grounding precision, dramatically improving local SLM reasoning.
  • Latency Gains: Shorter, compressed prompts yield faster response generation and reduce first-token lag.
  • Intended Separation: Treating the context window as a budget shifts the pair-programming dynamic: the human and AI collaborate on orchestrating the pipeline architecture itself, rather than constantly triaging raw context errors.
  • Design vs. Engineering (The Quality Envelope): Low-level data transportation (Context Engineering) is decoupled from the curation of the cognitive environment (Context Design). While local sifting sidecars automate memory and token limits, the user focuses on designing the static Quality Envelope (the user's Lore, Mythos, and persona directives) to support consistent, high-fidelity reasoning.

Explore the Context-Pipe architecture.


© 2026 Luis Kobayashi