Memory-Building

Build memory once; recall it every session.

An agent that forgets between sessions re-discovers the same facts, re-reads the same emails, and repeats the same mistakes. Memory-building turns each session’s work into structured, recallable memory — facts that stay pinned, patterns that accumulate into reflections, and evidence that clusters into a casefile story. The next session starts knowing what the last one learned.

capture · recall · semantic search · story-building · time-decay ranking · compaction survival

Three modes

RECALL  (session start)  →  WRITE  (during + after work)  →  QUERY  (when researching)

Recall — at the start of every session, load pinned facts, active reflections, last-session context, and the hottest cases. Cheap, once, ~2K tokens.
Write — capture only what’s genuinely new: a note, a pinned fact, a reflection, or a draft.
Query — search memory by keyword or by meaning, then act on what comes back.

Key concept — four kinds of memory

Every memory row carries exactly one kind, and that kind decides what survives. It’s the type system the whole skill turns on:

Kind	What it is	Lifespan
note	one observation from one email or document	lives with its case
fact	permanent, unchanging — pinned	loaded at every cold start
reflection	a pattern seen across 3+ events	evolving, long-lived
draft	a working draft (reply, report)	until it’s finalised

The discipline is in what you don’t write. Pin equipment types and serials, recurring-vendor contacts, confirmed survey cycles, sister-ship cross-references. Never pin status updates, email summaries, opinions, or anything with a date that goes stale — those are notes or reflections, not facts.
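
The four kinds can be pictured as a small type on each memory row. The following is a minimal sketch, not the skill's actual schema: the names `Kind`, `MemoryRow`, `case_id`, `evidence`, and `loaded_at_cold_start` are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Kind(Enum):
    NOTE = "note"              # one observation, lives with its case
    FACT = "fact"              # permanent and pinned
    REFLECTION = "reflection"  # a pattern seen across 3+ events
    DRAFT = "draft"            # working draft, kept until finalised


@dataclass
class MemoryRow:
    kind: Kind
    text: str
    case_id: Optional[str] = None                       # hypothetical field
    evidence: List[str] = field(default_factory=list)   # hypothetical field

    @property
    def pinned(self) -> bool:
        # Assumption for this sketch: only facts are pinned.
        return self.kind is Kind.FACT


def loaded_at_cold_start(rows: List[MemoryRow]) -> List[MemoryRow]:
    """Return the rows that a cold start would load without any search."""
    return [row for row in rows if row.pinned]
```

Keeping the kind as a closed enumeration means a row cannot be written without deciding, at write time, how long it should live.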

Worked example — MV ONE AURORA

Discovers the main engine’s type and serial → writes a fact, pinned. It loads free at every future cold start, so no agent re-researches it.
Notices the turbocharger has vibrated three times in four months — and cites all three reports → writes one reflection, not three loose notes.
Sees a single invoice escalation on one email → writes a note, which stays with its case.
Clusters the case’s notes into a story (~200 words, first-person), then finalises it.

Next session, recall surfaces the pinned fact and the vibration reflection before any work starts — the agent picks up where the last one left off instead of starting cold.

Under the hood

When to write what — the decision guide

You discovered…	Write as
An equipment serial or maker	`fact` (pinned)
A recurring-vendor contact	`fact` (pinned)
A confirmed class-survey date	`fact` (pinned)
A pattern across 3+ emails	`reflection` (cite the evidence)
A single-email observation	`note` (the default)
Nothing new this session	write nothing — silence is fine

Reflections must cite evidence — “three vibration events on reports 1042, 1187, 1305” — not just “there’s a pattern.”
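
The decision guide above can be read as a short rule chain. The sketch below is one possible encoding under stated assumptions: the function `choose_kind` and its parameters are hypothetical, and the only threshold taken from the guide is "3+" distinct pieces of evidence for a reflection.

```python
from typing import Optional, Sequence

REFLECTION_MIN_EVENTS = 3  # the "3+ emails" threshold from the guide


def choose_kind(is_new: bool, is_permanent: bool,
                evidence: Sequence[str]) -> Optional[str]:
    """Pick a memory kind for a discovery, or None to write nothing.

    evidence holds the ids of the reports or emails that support it.
    """
    if not is_new:
        return None            # nothing new this session: stay silent
    if is_permanent:
        return "fact"          # serials, makers, recurring contacts
    if len(set(evidence)) >= REFLECTION_MIN_EVENTS:
        return "reflection"    # a pattern, with its evidence cited
    return "note"              # the default for a single observation
```

For example, `choose_kind(True, False, ["1042", "1187", "1305"])` yields `"reflection"`, while the same call with a single report id yields `"note"`.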

Recall at cold start

Three lookups run before any work, totalling ~2K tokens:

Pinned facts — permanent vessel/domain knowledge from earlier sessions.
Active reflections — patterns already noticed, so they aren’t re-discovered.
Last-session context — the final exchanges of the previous session, to continue mid-thread.

For fleet agents, a ranked hot-cases list replaces reading every case index — most important cases first.
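
A cold-start recall with a fixed token budget might look like the sketch below. It is illustrative only: `build_recall` and `estimate_tokens` are hypothetical names, and the four-characters-per-token estimate is a crude assumption, not how the skill counts tokens.

```python
from typing import List, Tuple


def estimate_tokens(text: str) -> int:
    # Crude assumption: roughly 4 characters per token.
    return max(1, len(text) // 4)


def build_recall(pinned_facts: List[str], reflections: List[str],
                 last_session: List[str],
                 budget: int = 2000) -> Tuple[List[str], int]:
    """Fill the recall context in priority order until the budget is spent."""
    context: List[str] = []
    used = 0
    sources = (
        ("fact", pinned_facts),
        ("reflection", reflections),
        ("last-session", last_session),
    )
    for label, items in sources:
        for item in items:
            cost = estimate_tokens(item)
            if used + cost > budget:
                return context, used   # stop at the first item that overflows
            context.append(f"[{label}] {item}")
            used += cost
    return context, used
```

Ordering the sources this way means that, if the budget runs short, pinned facts are the last thing to be dropped.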

Two kinds of search

Keyword — find memory rows by the words they contain.
Semantic — find rows that mean the same thing in different words. A search for “turbocharger bearing failure” also surfaces “T/C rotor vibration” and “exhaust gas temp abnormal after turbo service.”

Semantic search is what makes a years-old note findable when nobody remembers the exact wording used.
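
The difference between the two searches can be shown with a toy example. This is a sketch under heavy assumptions: the three-number vectors are hand-made stand-ins for real embeddings, and `keyword_search`, `semantic_search`, and `cosine` are hypothetical helpers, not the skill's API.

```python
import math
from typing import Dict, List


def keyword_search(rows: List[str], query: str) -> List[str]:
    """Return rows that contain every word of the query."""
    words = query.lower().split()
    return [row for row in rows if all(w in row.lower() for w in words)]


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def semantic_search(index: Dict[str, List[float]], query_vec: List[float],
                    top_k: int = 2) -> List[str]:
    """Return the top_k rows whose vectors point closest to the query."""
    ranked = sorted(index, key=lambda text: cosine(index[text], query_vec),
                    reverse=True)
    return ranked[:top_k]


# Hand-made vectors; the axes loosely mean (turbocharger, fault, invoicing).
index = {
    "T/C rotor vibration": [0.9, 0.8, 0.0],
    "exhaust gas temp abnormal after turbo service": [0.8, 0.5, 0.0],
    "invoice escalated to vendor": [0.0, 0.1, 0.9],
}
query_vec = [1.0, 0.7, 0.0]  # stand-in for "turbocharger bearing failure"

keyword_hits = keyword_search(list(index), "turbocharger bearing failure")
semantic_hits = semantic_search(index, query_vec)
```

Here the keyword search finds nothing, because no row contains the query's exact words, while the semantic search returns the two turbocharger rows and leaves the invoice row out.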

What runs automatically

Decay ranking — cases are scored on recency + importance + recent activity, so hot cases rise and stale ones fade with no manual curation.
Semantic indexing — new notes are embedded for meaning-based search shortly after they’re written.
Story versioning — rewriting a story archives the previous version first; the last several are kept.
Write locks — if two agents try to build the same case at once, the second waits instead of clobbering the first.
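
Decay ranking can be sketched as a score that sums the three signals named above. The exact formula is not given here, so everything in this example is an assumption: the half-life, the activity cap, the equal weighting, and the names `Case`, `hot_score`, and `rank`.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Case:
    name: str
    days_since_update: float
    importance: float       # assumed to be scaled to 0..1
    events_last_week: int


def hot_score(case: Case, half_life_days: float = 14.0,
              activity_cap: int = 10) -> float:
    """Sum recency, importance, and recent activity into one score."""
    # Recency halves every half_life_days (exponential decay).
    recency = 0.5 ** (case.days_since_update / half_life_days)
    # Activity saturates at activity_cap events.
    activity = min(case.events_last_week, activity_cap) / activity_cap
    return recency + case.importance + activity


def rank(cases: List[Case]) -> List[Case]:
    """Hottest cases first."""
    return sorted(cases, key=hot_score, reverse=True)
```

Because recency decays continuously, a case that stops receiving updates slides down the list on its own, which is what removes the need for manual curation.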

Surviving compaction

When context fills and the runtime compacts, an agent loses its short-term memory. A tiered budget plus hooks keep it oriented:

Tier 0 — identity + essential pins, frozen in the system prompt (cache-stable).
Tier 1 — final story, leave-note, active stories, pinned facts, recent notes — ~10K tokens, injected once at cold start.
Tier 2 — everything else, pulled on demand via recall/query.

A pre-compaction hook drops a mechanical leave-note (no model call, so it never fails); a post-compaction hook re-loads Tier 1 and replays the last slice of the conversation.
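
The two hooks can be sketched as plain functions over session state. This is an illustration under assumptions, not the runtime's hook interface: `Session`, `pre_compaction`, `post_compaction`, and the leave-note wording are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Session:
    tier1: List[str]                                   # story, pins, recent notes
    messages: List[str] = field(default_factory=list)  # conversation so far
    leave_note: str = ""


def pre_compaction(session: Session, open_case: str) -> str:
    """Write a leave-note by string formatting only, with no model call."""
    session.leave_note = (
        f"Working on {open_case}; "
        f"{len(session.messages)} messages before compaction."
    )
    return session.leave_note


def post_compaction(session: Session, replay_last: int = 3) -> List[str]:
    """Rebuild context: Tier 1, the leave-note, then the conversation tail."""
    tail = session.messages[-replay_last:] if replay_last > 0 else []
    return session.tier1 + [session.leave_note] + tail
```

Keeping the pre-compaction step free of model calls is the design point: it has no external dependency that could time out or fail at the moment the context is about to be cut.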