Rate Limits

Mnemexa enforces two kinds of limits: a per-minute token bucket for burst protection, and per-billing-cycle unit quotas for plan enforcement. Both surface as 429 responses with a Retry-After header.

Per-minute rate limit (token bucket)

A Redis-backed token bucket runs per API key and per workspace. Each accepted request consumes one token; tokens refill at the configured rate.

Setting                       Default               Configurable
Refill rate (rpm_limit)       30 requests / minute  Yes, via plan or admin override
Burst capacity (burst_limit)  15 requests           Yes, via plan or admin override
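
To make the interaction between the two settings concrete, here is a minimal in-memory sketch of a token bucket using the default values from the table. It is only an illustration: the real limiter is Redis-backed, and the class name, the continuous refill, and the assumption that the bucket starts full are choices made for this example, not documented behavior.

```python
from dataclasses import dataclass


@dataclass
class TokenBucket:
    """Illustrative in-memory token bucket (not the service's implementation)."""

    rpm_limit: float = 30.0    # refill rate in tokens per minute (documented default)
    burst_limit: float = 15.0  # bucket capacity (documented default)
    tokens: float = 15.0       # assumption: the bucket starts full
    last: float = 0.0          # time of the last refill, in seconds

    def try_acquire(self, now: float) -> bool:
        # Refill in proportion to elapsed time, capped at the burst capacity.
        elapsed = now - self.last
        self.tokens = min(self.burst_limit, self.tokens + elapsed * self.rpm_limit / 60.0)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0  # each accepted request consumes one token
            return True
        return False            # an empty bucket corresponds to a 429


bucket = TokenBucket()
# Twenty requests at the same instant: only the burst capacity is accepted.
accepted = sum(bucket.try_acquire(now=0.0) for _ in range(20))
```

Under these assumptions, `accepted` is 15, and one more token becomes available every two seconds afterwards (30 per minute).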

Your workspace’s actual limits depend on your plan — check Settings → Plan in the dashboard, or call status and review your subscription.

When the bucket empties

You get a 429 Too Many Requests response with these headers:

Retry-After: 3
X-RateLimit-Limit: 30
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 3

Wait Retry-After seconds, then retry. The Python SDK does not auto-retry 429s — it raises RateLimitError with the retry_after value populated so your code can decide how to handle the backoff. (Auto-retrying rate limits is harmful in fan-out workloads — it amplifies the problem.)
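
If your code does decide to retry, one option is a small, bounded helper that waits for retry_after plus random jitter. The sketch below is an assumption-laden illustration rather than SDK functionality: call_with_backoff, max_attempts, and jitter are hypothetical names, and the RateLimitError class is a local stand-in for mnemexa.RateLimitError so the example runs on its own.

```python
import random
import time


class RateLimitError(Exception):
    """Stand-in for mnemexa.RateLimitError so this sketch is self-contained."""

    def __init__(self, retry_after: float):
        super().__init__(f"rate limited, retry after {retry_after}s")
        self.retry_after = retry_after


def call_with_backoff(fn, max_attempts=3, jitter=0.5, sleep=time.sleep):
    """Call fn(); on RateLimitError wait retry_after plus jitter, up to max_attempts."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except RateLimitError as exc:
            if attempt == max_attempts:
                raise  # give up and let the caller decide what to do next
            # Random jitter keeps a fleet of workers from retrying in lockstep.
            sleep(exc.retry_after + random.uniform(0, jitter))
```

Keeping max_attempts small and re-raising on the last attempt leaves the final decision with the caller, which is the reason the SDK does not retry on its own.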

Per-cycle unit quotas (plan limits)

In addition to the per-minute throttle, your subscription has hard caps on total operations per billing cycle:

Quota                  Endpoint
memory_write_limit     POST /v1/memory/store
memory_retrieve_limit  POST /v1/memory/retrieve

When you hit a quota:

  • Memory endpoints return 429.
  • The workspace’s workspace_status flips to limit_reached (visible via status).
  • optimize.health and status continue working — they’re free diagnostics.
  • The workspace resets to active at the start of the next billing cycle (or sooner if the admin grants an override).

Distinguishing a rate-limit 429 from a quota 429

Both return 429. The difference matters because retrying after Retry-After won’t help with a quota:

import mnemexa

client = mnemexa.Client()
try:
    client.memory.store(text="…")
except mnemexa.RateLimitError as exc:
    # Both kinds of 429 raise RateLimitError; workspace_status tells them apart.
    status = client.status()
    if status.workspace_status == "limit_reached":
        print("Plan quota exhausted: upgrade or wait for the next cycle.")
    else:
        print(f"Rate limited: back off {exc.retry_after}s and retry.")

Per-API-key vs per-workspace

The per-minute token bucket has two layers:

  1. Per API key — each key has its own bucket. Two keys on the same workspace don’t compete for the same per-minute budget.
  2. Per workspace — a separate, larger bucket caps the total across all keys for the workspace.

A request is rejected with a 429 as soon as either layer's bucket is empty. In practice, the per-key limit is what most callers hit; the per-workspace limit is a safety rail against runaway parallelism.
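
The two-layer check can be sketched as follows. This is a simplified model, not the service's logic: refill is omitted, the bucket sizes are hypothetical, and it assumes a rejected request consumes no token from either layer, which the text above does not specify.

```python
class Bucket:
    """Minimal counter standing in for one token bucket (refill omitted)."""

    def __init__(self, tokens: int):
        self.tokens = tokens


def admit(key_bucket: Bucket, workspace_bucket: Bucket) -> bool:
    """Accept a request only if both layers have a token; consume one from each."""
    if key_bucket.tokens < 1 or workspace_bucket.tokens < 1:
        return False  # either empty layer is enough to reject the request
    key_bucket.tokens -= 1
    workspace_bucket.tokens -= 1
    return True


# Hypothetical sizes: two keys with 15 tokens each, a workspace bucket of 20.
key_a, key_b, workspace = Bucket(15), Bucket(15), Bucket(20)
a_ok = sum(admit(key_a, workspace) for _ in range(15))  # limited by key_a only
b_ok = sum(admit(key_b, workspace) for _ in range(15))  # limited by the workspace
```

With these made-up numbers, key A gets all 15 of its requests through, while key B is cut off after 5 because the shared workspace bucket runs out first.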

Free endpoints

status and optimize.health are free diagnostics — they don’t count against your plan’s unit quotas and are exempted from workspace-level lifecycle rejection (they work even when the workspace is suspended or limit_reached).

They still pass through the per-minute rate limiter, so a script polling either endpoint at >30 RPM will get throttled.

Don’t poll optimize.health at sub-minute intervals. The signals it computes refresh once per optimization sweep (every few minutes) — high-frequency polling won’t surface new data and will burn through your rate-limit budget.
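
One way to enforce a sensible polling interval on the client side is to cache the last result and only refetch after a minimum interval. The sketch below is a generic helper with hypothetical names (CachedPoll, min_interval); the 300-second default is an assumption based on "every few minutes", not a documented sweep period.

```python
import time


class CachedPoll:
    """Return a cached result unless min_interval seconds have passed."""

    def __init__(self, fetch, min_interval=300.0, clock=time.monotonic):
        self._fetch = fetch                # zero-argument callable that does the request
        self._min_interval = min_interval  # assumed interval, in seconds
        self._clock = clock
        self._value = None
        self._fetched_at = None

    def get(self):
        now = self._clock()
        if self._fetched_at is None or now - self._fetched_at >= self._min_interval:
            self._value = self._fetch()    # the only place a real request is made
            self._fetched_at = now
        return self._value
```

You would pass in whatever callable performs the optimize.health request in your code; callers can then invoke get() as often as they like without spending rate-limit budget.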

Best practices

  • Catch RateLimitError and back off instead of auto-retrying. This matters most in worker fleets, where auto-retry on 429 amplifies the storm.
  • Check workspace_status periodically. A daily call to status catches limit_reached transitions before your application logic does.
  • Use optimize.health for memory-quality monitoring, not for connection health checks. Use status for the latter — it’s lighter.
  • Spread bursts. The burst capacity is intentionally smaller than the per-minute refill rate (15 vs 30), so you can’t spend a full minute’s budget in one second.
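
As an illustration of spreading bursts, the sketch below paces calls so they are at least 60 / rpm seconds apart. Pacer is a hypothetical client-side helper, not part of the SDK, and it only coordinates calls within a single process; workers sharing a key would need to split the budget between them.

```python
import time


class Pacer:
    """Space calls at least 60 / rpm seconds apart (single process only)."""

    def __init__(self, rpm=30.0, clock=time.monotonic, sleep=time.sleep):
        self._interval = 60.0 / rpm  # 2 seconds at the default 30 requests / minute
        self._clock = clock
        self._sleep = sleep
        self._next_slot = None       # earliest time the next call may proceed

    def wait(self):
        now = self._clock()
        if self._next_slot is not None and now < self._next_slot:
            self._sleep(self._next_slot - now)
            now = self._next_slot
        self._next_slot = now + self._interval
```

Calling wait() before each request trades a little latency for a steady request rate that never drains the burst capacity.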

Upgrading

To raise limits, change your plan from Settings → Plan in the dashboard. Upgrades apply immediately (pro-rated); downgrades apply at the start of the next billing cycle. The new rpm_limit_snapshot / burst_limit_snapshot / unit quotas take effect on the new subscription record (see Workspaces).