Memory Pipeline

Every call to memory.store runs through a multi-stage pipeline before the text reaches the vector database. This page explains each stage so you can reason about why your memory ended up with the importance, dedup action, categories, and temporal classification it did.

Stages

Raw text
  │
  ▼
1. PII detection ──► reject or redact
  │
  ▼
2. Embedding (text-embedding-3-small)
  │
  ▼
3. Semantic deduplication
   ├─ ≥ 0.98 cosine    → duplicate_exact (no new row)
   ├─ ≥ 0.70 cosine    → LLM merge decision
   └─ < 0.70 cosine    → stored_new
  │
  ▼
4. Importance scoring (LLM)
  │
  ▼
5. Temporal classification (persistent vs. temporal)
  │
  ▼
6. Category extraction (multi-tag)
  │
  ▼
7. pgvector insert (HNSW index)

1. PII detection

A regex-based detector inspects the text for passwords, API keys, credit card numbers, SSNs, phone numbers, and other sensitive patterns. Behaviour:

Detected and redactable → the offending substring is replaced with a placeholder. The redacted text continues through the pipeline.
Detected but not safely redactable (keyword-only flags like “password” without a clear span) → request rejected with 422 pii_rejected.

The PII detector is a defense layer, not a substitute for client-side hygiene. Don’t rely on it to sanitize untrusted input — pre-filter on your side too.
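A client-side pre-filter along the lines suggested above might look like the sketch below. The patterns and placeholder format are hypothetical and chosen for illustration only; they are not the rules the service's own detector uses, and a real pre-filter would need patterns matched to your data.

```python
import re

# Hypothetical patterns for illustration; not the service's actual detector rules.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
    "api_key": re.compile(r"\bsk-[A-Za-z0-9]{16,}\b"),
}


def prefilter(text: str) -> tuple[str, list[str]]:
    """Replace matches with placeholders and report which pattern kinds fired."""
    found = []
    for kind, pattern in PATTERNS.items():
        # subn returns the rewritten text and the number of substitutions made.
        text, count = pattern.subn(f"[REDACTED_{kind.upper()}]", text)
        if count:
            found.append(kind)
    return text, found
```

Running this before calling memory.store means the server-side detector acts as a second check rather than the only one.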

2. Embedding

The text (post-redaction) is embedded using OpenAI text-embedding-3-small. The resulting vector is used for the dedup comparison (stage 3) and is also stored alongside the memory for future retrieval queries.

3. Semantic deduplication

The embedding is compared against existing memories in the workspace by cosine similarity. Two thresholds govern behaviour:

Cosine similarity	Action	`dedup.action` value
≥ 0.98	Exact-enough duplicate. No new row inserted. The response returns the existing memory’s ID.	`duplicate_exact`
≥ 0.70 and < 0.98	Near-duplicate. An LLM compares old and new text and decides: merge into existing, replace with richer version, or store both.	`updated_existing` or `stored_new`
< 0.70	Distinct enough. New memory is inserted.	`stored_new`

When dedup.action is duplicate_exact or updated_existing, the response’s dedup.existing_id field carries the pre-existing memory’s ID.
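A caller that wants the ID of the row now holding its content could branch on dedup.action as sketched below. The dedup.action values and dedup.existing_id come from this page; the top-level "id" field for a newly stored memory is an assumption, so check the store response schema before relying on it.

```python
def resolve_memory_id(response: dict) -> str:
    """Return the ID of the memory row that holds the stored content."""
    dedup = response.get("dedup") or {}
    if dedup.get("action") in ("duplicate_exact", "updated_existing"):
        # No new row was created (or an existing one was updated).
        return dedup["existing_id"]
    # Assumption: a newly stored memory exposes its ID as a top-level "id" field.
    return response["id"]
```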

The 0.70 threshold is the interesting one. Cosine similarity alone isn’t reliable enough to decide a merge in this band: “client likes morning calls” and “client likes morning meetings” might score 0.85 and mean the same thing, while “Postgres uses pgvector” and “Postgres is a database” might score 0.75 and mean different things. The LLM disambiguates.
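The threshold routing can be sketched as follows. This is a minimal illustration of the two documented cut-offs, not the service's implementation; the function names and the "llm_merge_decision" label are hypothetical, and the LLM step itself is left out.

```python
import math

EXACT_THRESHOLD = 0.98  # at or above: treated as a duplicate
NEAR_THRESHOLD = 0.70   # at or above (but below exact): LLM decides


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def dedup_route(similarity: float) -> str:
    """Map a similarity score to the pipeline branch it would take."""
    if similarity >= EXACT_THRESHOLD:
        return "duplicate_exact"
    if similarity >= NEAR_THRESHOLD:
        # Hypothetical label: the LLM later resolves this branch to
        # updated_existing or stored_new.
        return "llm_merge_decision"
    return "stored_new"
```

Checking the higher threshold first keeps the bands non-overlapping: a score of exactly 0.98 is a duplicate, and a score of exactly 0.70 goes to the LLM.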

4. Importance scoring

An LLM rates the memory’s business value. The output is an integer surfaced as the importance field on the response. Higher numbers indicate higher-impact memories.

Importance is used both as a ranking factor in retrieval (see Hybrid Scoring) and as a decay signal: low-importance, stale memories are candidates for cleanup. Importance is not a hard gate. Low-importance memories aren’t blocked from storage; they only rank lower.

5. Temporal classification

The same LLM pass also classifies the memory as:

Type	Meaning	Examples
`persistent`	A durable fact, preference, or rule. No expiry.	“Client prefers async standups”, “Postgres is the primary database”.
`temporal`	Time-bound state. Has (or implies) an expiry.	“Meeting at 3pm tomorrow”, “On-call rotation through Friday”.

For temporal memories with parseable dates, temporal.valid_until carries the expiry as an ISO 8601 timestamp. Expired temporal memories rank lower in retrieval but aren’t deleted — they remain in the store as historical context.
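On the client side, an expiry check against temporal.valid_until might look like this sketch. It assumes the memory arrives as a dict with a nested "temporal" object, and it treats timestamps without an offset as UTC; both are assumptions rather than documented behaviour.

```python
from datetime import datetime, timezone
from typing import Optional


def is_expired(memory: dict, now: Optional[datetime] = None) -> bool:
    """True if the memory carries a valid_until timestamp that has passed."""
    temporal = memory.get("temporal") or {}
    valid_until = temporal.get("valid_until")
    if not valid_until:
        return False  # persistent, or temporal without a parseable date
    # datetime.fromisoformat rejects a trailing "Z" before Python 3.11.
    expiry = datetime.fromisoformat(valid_until.replace("Z", "+00:00"))
    if expiry.tzinfo is None:
        expiry = expiry.replace(tzinfo=timezone.utc)  # assumption: naive means UTC
    now = now or datetime.now(timezone.utc)
    return expiry <= now
```

Because expired memories stay in the store, a check like this is useful when a caller wants to separate current state from historical context in retrieval results.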

6. Category extraction

The LLM emits a list of semantic tags — ["project_management", "team_workflow", "database"] etc. Categories surface on both the store response and retrieve response (under meta.categories when present) and are intended for downstream filtering and analytics.

The category vocabulary isn’t fixed — the LLM derives tags from the content itself. Common categories tend to cluster across a workspace as it grows.
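Since the vocabulary is open-ended, downstream code usually has to discover which tags exist before filtering on them. The sketch below assumes retrieved memories are dicts with categories under meta.categories, as described above; the helper names are hypothetical.

```python
from collections import Counter


def _categories(memory: dict) -> list:
    """Categories of one memory, or an empty list when meta.categories is absent."""
    return (memory.get("meta") or {}).get("categories") or []


def category_counts(memories: list) -> Counter:
    """Count how often each tag appears across a set of memories."""
    counts = Counter()
    for memory in memories:
        counts.update(_categories(memory))
    return counts


def filter_by_category(memories: list, category: str) -> list:
    """Keep only the memories tagged with the given category."""
    return [m for m in memories if category in _categories(m)]
```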

7. Storage

Finally, the memory is inserted into the workspace’s agent_memory table:

Embedding stored in a pgvector column with an HNSW index for fast cosine queries.
Importance, categories, and temporal classification stored in the meta JSONB column.
agent_id set to the workspace ID (logical reference — there’s no foreign key, by design, to keep the memory layer portable).

What happens after storage

The memory is immediately available to memory.retrieve. It will also start participating in:

Hybrid retrieval scoring — see Hybrid Scoring.
Health diagnostics — see Self-Optimization.
Automatic decay — temporal memories whose valid_until has passed, plus persistent memories that haven’t been retrieved within the workspace’s decay window, are flagged as stale.
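The staleness rule in the last item can be sketched as a predicate. The field names (type, valid_until, last_retrieved_at, created_at) and the fallback to created_at for never-retrieved memories are assumptions made for illustration; the service's actual decay logic may differ.

```python
from datetime import datetime, timedelta


def is_stale(memory: dict, now: datetime, decay_window: timedelta) -> bool:
    """Apply the two documented staleness conditions to one memory."""
    if memory["type"] == "temporal":
        # Temporal memories are stale once valid_until has passed.
        valid_until = memory.get("valid_until")
        return valid_until is not None and valid_until < now
    # Persistent memories are stale when not retrieved within the decay window.
    # Assumption: a never-retrieved memory is measured from its creation time.
    last_seen = memory.get("last_retrieved_at") or memory["created_at"]
    return now - last_seen > decay_window
```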

There’s no manual delete endpoint on the public API. Memory cleanup is curated through the dashboard’s optimization recommendations.
