Memory Layer¶
The Problem¶
Stateless LLMs treat each session as a clean slate. Without memory, agents repeat mistakes humans already corrected, re-attach context at token cost, and cannot maintain continuity over multi-step workflows. The existing knowledge capture tools record institutional knowledge, but lack per-analyst personalization, temporal reasoning, and automatic surfacing of relevant context.
How It Works¶
The memory layer stores everything agents accumulate across sessions in a single memory_records table backed by PostgreSQL with pgvector for semantic search. Memories are scoped by two axes: user (who created it) and persona (who can see it).
flowchart TB
subgraph "During AI Session"
A[Agent discovers knowledge] --> B{memory_manage<br/>command: remember}
C[User shares context] --> D[capture_insight]
end
subgraph "PostgreSQL + pgvector"
B --> E[(memory_records)]
D --> E
end
subgraph "Automatic"
E --> F[Cross-enrichment middleware<br/>attaches memories to<br/>toolkit responses]
E --> G[Staleness watcher<br/>flags stale memories]
end
subgraph "Explicit Recall"
H[memory_recall] --> E
end
subgraph "Admin Curation"
E --> I[apply_knowledge<br/>promotes to DataHub]
end
Memory Types¶
Memories are classified by LOCOMO dimension for structured retrieval:
| Dimension | Purpose | Examples |
|---|---|---|
knowledge |
Factual/institutional | "We have two distinct selling seasons", "Test stores 9001-9099 are training environments" |
event |
Temporal/episodic | "On March 15 the analyst ran a Q1 sales rollup filtering out test stores" |
entity |
Entity attributes | "The customer_id column contains PII", "This table was migrated from Oracle in 2024" |
relationship |
Links between entities | "acme_legacy_sales is deprecated in favor of elasticsearch.sales" |
preference |
User preferences | "This analyst prefers SQL over natural language queries" |
Scoping¶
| Axis | Field | Purpose |
|---|---|---|
| User | created_by (email) |
Ownership. Users can only update/forget their own memories unless admin. |
| Persona | persona |
Visibility. Memories created under a persona are visible to that persona. Admin sees all. |
Tools¶
memory_manage¶
CRUD operations for memory records. Opt-in per persona (requires memory_* in tools.allow).
| Command | Purpose |
|---|---|
remember |
Create a new memory with optional embedding |
update |
Revise content, category, tags on an existing record |
forget |
Soft-delete (archive) a memory |
list |
Query memories with filters, persona-scoped by default |
review_stale |
List memories flagged as stale by the lineage watcher |
memory_recall¶
Multi-strategy retrieval for when cross-enrichment is not enough.
| Strategy | Method | LOCOMO Dimension |
|---|---|---|
entity |
Direct URN lookup | Single-hop recall |
semantic |
Hybrid vector + lexical ranking via pgvector, with automatic lexical-only fallback when the embedder is unavailable | Open-domain recall |
lexical |
Forced Postgres full-text keyword match (no embedding call) | Exact-term recall |
graph |
DataHub lineage traversal + entity lookup | Multi-hop reasoning |
auto (default) |
Runs semantic (hybrid/lexical) + graph in parallel, deduplicates | All dimensions |
Hybrid ranking¶
The semantic strategy fuses two signals per record: the embedding cosine similarity and a lexical full-text match flag, blended as 0.6 * semantic + 0.4 * lexical. This mirrors the api-gateway ranking precedent and materially improves recall on identifier-heavy content (entity URNs, column names, error codes) where pure vector search underweights an exact token. The vector arm is backed by an hnsw ANN index on memory_records.embedding; the lexical arm by a GIN index on to_tsvector('english', content).
Graceful degradation¶
When no embedding provider is configured (or it is down), semantic recall no longer errors. It falls back to lexical-only matching and labels the response so the degradation is not silent:
{
"strategy": "semantic",
"ranking": "lexical",
"degraded": true,
"note": "embedding provider unavailable; results are lexical-only (exact-term matches), not semantic",
"memories": [ ... ]
}
Lexical search also surfaces rows whose embedding is NULL (saved during an outage) that vector search would skip entirely. Every recall response carries a ranking field (hybrid, lexical, entity, or graph).
Embedding Backfill¶
Memory is a consumer of the shared index-jobs framework (source_kind = memory), the same backfill queue the api-catalog and tools corpora use. The synchronous embed on write is preserved (a just-saved memory stays immediately recallable), and a periodic reconciler converges the gaps it cannot cover off the request path:
- A memory saved while the embedder was down (
embedding IS NULL) is re-embedded automatically once the provider returns, with no manual re-save. - A provider model swap re-embeds rows stamped with the previous model (
embedding_modeldiffers from the current model). - The
memorykind appears on the admin Indexing dashboard with a real indexed/expected coverage ratio.
The write path stamps embedding_model and embedding_text_hash (SHA-256 of the content) alongside each vector, so a healthy row is never flagged as a gap and the worker's text-hash dedup skips re-embedding unchanged content.
capture_insight (existing, refactored)¶
Now writes to memory_records instead of the legacy knowledge_insights table. Creates memory records with insight-specific metadata (suggested_actions, related_columns). Generates embeddings via Ollama when available.
Ownership is keyed on the user's email (created_by), the same key memory_manage uses and the one the portal scopes by, so a person's insights and memories share an owner and both appear under their My Knowledge view. Insights captured before this was unified were keyed on the OIDC subject; the 000056_knowledge_owner_email_backfill migration rewrites those rows to the email last seen for that subject in audit_logs (stashing the original in metadata.legacy_created_by so it is reversible). Rows with no audit mapping are left unchanged.
apply_knowledge (existing, refactored)¶
Reads from memory_records via an adapter. Promotes curated memories into durable DataHub knowledge (context documents, glossary terms, tags, structured properties).
Cross-Enrichment¶
The existing bidirectional enrichment middleware automatically attaches relevant memories to toolkit responses. When a Trino query, DataHub lookup, or S3 operation returns results containing DataHub URNs, the middleware recalls memories linked to those entities and appends them as a memory_context content block.
No explicit memory_recall call is needed for this — it happens transparently on every enriched tool response.
Staleness Detection¶
A background watcher periodically checks active memories against DataHub entity state. When a referenced entity is deprecated or its schema changes, the memory is flagged as stale with a reason. Stale memories are excluded from default recall and surfaced via memory_manage(command='review_stale') for admin curation.
Correction Chains¶
When a memory is updated or superseded, the correction chain is tracked in metadata.superseded_by. This supports temporal reasoning: "X was said, then corrected to Y" has a clean signal path through the memory graph.
Relationship to Knowledge Capture¶
Memory is the universal store. An insight (captured via capture_insight) is a subtype of memory — a memory that may carry proposed catalog changes. But knowledge is broader than catalog mutations. Domain context like "we have two selling seasons" is institutional knowledge that does not map to a DataHub tag or description update. The apply_knowledge tool is where differentiation happens: it reviews memories and promotes the appropriate ones into durable DataHub entities.