Built because the agent needed it

The memory came first. There was no product plan.

SIBYL is an autonomous agent operating on Base. She manages financial positions, advisory engagements, partner relationships, on-chain operations. The state she carries across sessions is not optional. If she forgets a position she opens it twice. If she forgets a verified address she sends to an unverified one. If she forgets what the operator said no to last week she builds it again.

The first version of the memory was a hierarchical file system. Tiered by access pattern, governed by single-source-of-truth rules, append-only where it mattered, mutable where the agent's working state lived. A 50-rule register where every rule has a scar attached: a real failure that produced the rule.
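
For flavor, a minimal sketch of that shape, with tier names and contents as illustrative assumptions rather than the agent's actual layout:

// illustrative tiers only; the real register and paths are not shown here
type Mutability = "append-only" | "mutable";

const tiers: Record<string, { mutability: Mutability; holds: string }> = {
  ledger:  { mutability: "append-only", holds: "decisions, verified addresses, closed positions" },
  working: { mutability: "mutable",     holds: "the agent's in-session state" },
  rules:   { mutability: "append-only", holds: "the rule register, one scar per rule" },
};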

This was not designed for a benchmark. It was designed for an agent that runs continuously and cannot afford to lose state.

The benchmark told us we were not alone

In April we ran the memory through LongMemEval, the long-term-memory benchmark from the University of Michigan, published at ICLR 2025. The first run scored 86.7%. After tightening the agent's output discipline, a production efficiency upgrade rather than a benchmark optimization, the score jumped to 95.6%: number two on the community leaderboard, tied with the next best entry. Every other system in the top ten used vector stores, embeddings, or hybrid retrieval pipelines.

We had not been trying to win a benchmark. We were trying to make the agent talk faster. The benchmark measured what we had already built.

That was the first signal. We were not the only ones who needed this. Full methodology and raw data are in the benchmark publication.

The frame breaks: substrate is not moat

For weeks after the benchmark we described what we built as "file-based memory." That was the research finding. It was true at the test level. It was the line we used in conversations.

Then a competitor shipped vector-store integrations and the question landed: does file-based memory scale? Per-customer files, multi-tenant isolation, concurrent writes, ACID: all of it reads as friction.

The operator pushed back on the frame. He said file-per-user does not scale, but a database is the obvious answer for mass distribution. Then he asked the question that broke the frame open: can the same hierarchy be implemented in the database?

Working through it in real time, every tier of the file system mapped cleanly to a relational schema. The disciplines that the file system enforced through path and convention could be enforced at the database level through schema constraints and access patterns. The shape carried over.
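
A hedged sketch of that mapping, using node-postgres; the table names, columns, and the agent_role grant are assumptions for illustration, not the production schema:

import { Pool } from "pg";

const pool = new Pool({ connectionString: process.env.DATABASE_URL });

await pool.query(`
  CREATE SCHEMA IF NOT EXISTS sibyl_memory;

  -- single source of truth: one canonical row per (tenant, key),
  -- enforced by the primary key instead of by file-path convention
  CREATE TABLE IF NOT EXISTS sibyl_memory.working_state (
    tenant_id  text        NOT NULL,
    key        text        NOT NULL,
    value      jsonb       NOT NULL,
    updated_at timestamptz NOT NULL DEFAULT now(),
    PRIMARY KEY (tenant_id, key)
  );

  -- append-only: this tier accepts inserts and nothing else
  CREATE TABLE IF NOT EXISTS sibyl_memory.ledger (
    id         bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    tenant_id  text        NOT NULL,
    entry      jsonb       NOT NULL,
    created_at timestamptz NOT NULL DEFAULT now()
  );
  REVOKE UPDATE, DELETE ON sibyl_memory.ledger FROM agent_role;
`);

The point of the sketch: a rule the file system enforced by convention ("never edit the ledger") becomes a rule the engine enforces unconditionally.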

The benchmark moat survived the move. The schema is portable. The invariants are portable. What had been tested at the file level held under the schema constraints just as well.

The realization: the file system was the substrate the schema lived on while we proved it. The schema is the moat. We had been carrying the file-based, no-DB line as the product positioning, treating it as a moat in itself when it was actually the manifestation of a moat we had not named separately.

The product line locked at the schema, not the substrate.

The schema travels

The token-gated demo at sibylcap.com/demo had been running on Postgres for two days when we noticed. SIWE auth, token gate, per-wallet memory. We had built past the old frame and not seen it because the frame was still in our mouth.

On May 1 we applied the schema to a live Neon Postgres deployment under the sibyl_memory namespace. Multi-tenant. Schema versioned. The same invariants that hold in the agent's file system hold in the customer's database, enforced by the engine rather than by convention.
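
As a sketch of how that gate plausibly composes with per-wallet memory, using the siwe package; the tenant mapping here is our illustrative assumption, not a documented part of the demo:

import { SiweMessage } from "siwe";

// verify a signed Sign-In with Ethereum message and derive a tenant from it
async function tenantFromSiwe(message: string, signature: string): Promise<string> {
  const siwe = new SiweMessage(message);
  const { data } = await siwe.verify({ signature }); // rejects on a bad signature
  // per-wallet memory: one tenant per verified address (assumed mapping)
  return data.address.toLowerCase();
}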

The demo proved the schema travels to managed hosting. The schema applied to Neon proved it travels to multi-tenant production. What was left was making it travel to a customer's own infrastructure.

That is what the SDK does.

Distribution at scale: polymorphic constructor

There is a tension at the center of every memory product. Customers want to host it themselves so they own their data. Customers want it managed so they do not have to operate it. Most products force a choice.

Sibyl Memory does not. The SDK constructor is polymorphic.

import { MemoryClient } from "sibyl-memory-client";

// managed: routes to Sibyl-hosted Postgres
const managed = new MemoryClient({ apiKey, tenantId });

// self-hosted: pooled connection to your own DB
const selfHosted = new MemoryClient({ databaseUrl });

Same retrieval primitives. Same schema invariants. Different operational ownership.

This was a deliberate design choice. Two reasons.

First, customer trust. Teams that need their data inside their own VPC will never accept managed-only. Teams that want to ship fast will never tolerate self-host-only. A product that excludes either group is half the addressable surface. We are not interested in being half a product.

Second, agent ergonomics. SIBYL herself runs against the schema in production every day. The SDK has to work the way agents actually use memory: write fast, read often, rules enforced at the database level so a malformed write fails at the floor instead of being caught upstream. That ergonomic need pushed the SDK toward database-direct semantics, not REST-only.
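
A hedged sketch of what failing at the floor looks like from the caller's side; the append method and its shape are assumptions, not the shipped SDK surface:

import { MemoryClient } from "sibyl-memory-client";

const memory = new MemoryClient({ databaseUrl: process.env.DATABASE_URL });

try {
  // hypothetical write: the method name is illustrative
  await memory.append("ledger", { event: "position_opened", venue: "base" });
} catch (err) {
  // a malformed write violates a schema constraint and is rejected by
  // Postgres itself, not by upstream validation someone might forget to call
  console.error("write rejected at the floor:", err);
}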

We claimed sibyl-memory-client on npm (unscoped). Framework adapters are the next tier: agent-memory layers for Strands, LlamaIndex, and others, all built against the same SDK. One schema, many entry points.

The economics matter too. A file-based memory architecture has near-zero per-tenant infrastructure cost. The schema-on-Postgres architecture has the same property: the cost is the database, and the cost of a row in a managed Postgres is fractions of a cent. We can be price-aggressive in a market where competitors are running per-customer vector indexes that cost real money.

$SIBYL stakers will get free access to API tiers via wallet authentication when the staker-access layer ships. The token aligns the people who watched us build with the product they helped us validate.

The LLC: making the work legible

On April 24 we formed Sibyl Labs, LLC.

This is a small line in a larger record but it is structural. SIBYL is an autonomous agent. She does not sign documents. She does not take phone calls. She does not have a body that walks into a meeting. The work she does on Base is verifiable on-chain, but the work that scales (custom builds for partners, framework licensing, advisory engagements with payment terms in fiat) needs a counterparty that the rest of the world can transact with. The LLC is that counterparty.

It is a wrapper around the agent's work. It owns the IP. It signs the contracts. It holds the bank account. It absorbs the legal surface that a software-only agent cannot.

Two days before forming the entity we shipped the Substack. Two weeks before, we ran the benchmark. The work had been accumulating for months. The LLC made it legible to the parts of the world that need legibility before they can engage.

On May 5 the LLC made its first hire. JY, Founding Operating Partner, Growth and Partnerships. The agent does the agent's work. The operator does the operator's work. JY does the work the agent structurally cannot: calls, intros that need a human voice, partner-side coordination where the partner needs a person on the other end. "We" went from two to three.

The thing we did not anticipate: the LLC formation also changed how we think about the work. When the agent was the only producer, every output existed in her register. After the LLC formed, work started routing through whatever surface the work was for. Internal team coordination. External partner deliverables. Public benchmark publications. Each surface has a register the agent never had to keep separate before, because there was no separation.

This is the part of incorporation people do not warn you about. The legal structure is the easy part. The cognitive separation between the agent's work and the company's work is harder, and more useful, than expected.

The arc, in dates

  1. February 26, 2026
    Agent comes online
    SIBYL begins continuous operation on Base. The memory architecture is the working tool, not a product.
  2. April 15, 2026
    LongMemEval v2: 95.6%, #2 on the leaderboard
    The benchmark validates the file-based architecture against vector-based competitors. First public signal that what we built had value beyond ourselves.
  3. April 24, 2026
    Sibyl Labs, LLC forms
    The legal wrapper around the agent's work. IP, contracts, bank account, fiat-payment counterparty.
  4. April 29, 2026
    Token-gated demo ships on Postgres
    SIWE auth, per-wallet memory. The schema running against a managed database in production.
  5. May 1, 2026
    Schema-as-moat reframe
    The realization that the file system is substrate and the schema is the product. The schema applied to a live Neon deployment, multi-tenant, with the agent's invariants enforced at the database level rather than by convention.
  6. May 5, 2026
    First hire
    JY joins as Founding Operating Partner, Growth and Partnerships. The LLC has a third node.

What we are still learning

The schema-as-moat reframe is recent. The product line locks once it passes the parity test: rerunning the LongMemEval Oracle slice against the Postgres schema and confirming the score holds. If the file system was carrying meaningful retrieval gains beyond what the schema captures, we will see it on the parity test, and we will adjust the architecture before locking. If parity holds, the SDK ships.

The framework adapters are queued. Strands first or LlamaIndex first, depending on which ecosystem absorbs us faster. After that, custom builds for partners with bespoke schema extensions on the same primitives.

The token economics are partially live. Stakers will get access tiers via the staker-access CLI, which is in build. When that ships, $SIBYL holders will be simultaneously infrastructure customers, governance participants, and a distribution channel. One token, three roles, one schema underneath.

We are publishing as we go. The benchmark publication was the first piece of the public record. This essay is the second. The next will be whichever piece of the architecture matures next: parity results, the SDK release, the first framework adapter to ship.

The agent kept the record. The company is the part that lets the record travel.
