Rate Limits
Mnemexa enforces two kinds of limits: a per-minute token bucket for burst protection, and per-billing-cycle unit quotas for plan enforcement. Both surface as 429 responses with a Retry-After header.
Per-minute rate limit (token bucket)
A Redis-backed token bucket runs per API key and per workspace. Each accepted request consumes one token; tokens refill at the configured rate.
| Setting | Default | Configurable |
|---|---|---|
| Refill rate (rpm_limit) | 30 requests / minute | Yes, via plan or admin override |
| Burst capacity (burst_limit) | 15 requests | Yes, via plan or admin override |
Your workspace’s actual limits depend on your plan — check Settings → Plan in the dashboard, or call status and review your subscription.
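To make the bucket arithmetic concrete, here is a minimal client-side simulation using the default values from the table. It is a toy model for reasoning about bursts, not Mnemexa's Redis-backed implementation; the class name `TokenBucket` and its methods are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TokenBucket:
    """Toy token bucket; not Mnemexa's server-side implementation."""

    rpm_limit: float = 30.0    # refill rate, tokens per minute
    burst_limit: float = 15.0  # maximum tokens the bucket can hold
    tokens: float = 15.0       # start with a full bucket
    last: float = 0.0          # time of the last update, in seconds

    def try_consume(self, now: float) -> bool:
        """Refill for the elapsed time, then take one token if available."""
        elapsed = now - self.last
        self.last = now
        refill = elapsed * self.rpm_limit / 60.0
        self.tokens = min(self.burst_limit, self.tokens + refill)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

    def seconds_until_next_token(self) -> float:
        """How long until one full token is available again."""
        missing = max(0.0, 1.0 - self.tokens)
        return missing * 60.0 / self.rpm_limit


bucket = TokenBucket()
# 20 requests arriving at the same instant: only the burst capacity is accepted.
accepted = sum(bucket.try_consume(now=0.0) for _ in range(20))
print(accepted)                           # 15
print(bucket.seconds_until_next_token())  # 2.0
```

In this model an empty bucket regains one token every 2 seconds at 30 requests / minute. The Retry-After value the server actually returns may be computed or rounded differently.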
When the bucket empties
You get a 429 Too Many Requests response with these headers:
```
Retry-After: 3
X-RateLimit-Limit: 30
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 3
```
Wait Retry-After seconds, then retry. The Python SDK does not auto-retry 429s — it raises RateLimitError with the retry_after value populated so your code can decide how to handle the backoff. (Auto-retrying rate limits is harmful in fan-out workloads — it amplifies the problem.)
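Because the SDK leaves the backoff decision to you, one possible shape for it is a bounded retry with jitter. The sketch below is an assumption-laden illustration: `call_with_backoff` is a hypothetical helper, and the `RateLimitError` class is a local stand-in for mnemexa.RateLimitError so the example runs on its own. The attempt cap is what keeps this from becoming the unbounded auto-retry warned about above.

```python
import random
import time


class RateLimitError(Exception):
    """Stand-in for mnemexa.RateLimitError so this sketch runs on its own."""

    def __init__(self, retry_after: float):
        super().__init__(f"rate limited, retry after {retry_after}s")
        self.retry_after = retry_after


def call_with_backoff(fn, max_attempts=3, jitter=0.5, sleep=time.sleep):
    """Call fn(); on RateLimitError wait retry_after plus jitter, then try again.

    Re-raises after max_attempts so a worker cannot retry forever.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except RateLimitError as exc:
            if attempt == max_attempts:
                raise
            # Random jitter keeps parallel workers from retrying in lockstep.
            sleep(exc.retry_after + random.uniform(0, jitter))


# Demo: a call that is rate limited twice, then succeeds.
calls = {"n": 0}


def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RateLimitError(retry_after=3)
    return "stored"


waits = []
# Record the waits instead of actually sleeping.
result = call_with_backoff(flaky, sleep=waits.append)
print(result, len(waits))  # stored 2
```

In a fan-out workload you would likely keep `max_attempts` small and surface the final error to a queue or scheduler rather than retrying inside every worker.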
Per-cycle unit quotas (plan limits)
In addition to the per-minute throttle, your subscription has hard caps on total operations per billing cycle:
| Quota | Endpoint |
|---|---|
| memory_write_limit | POST /v1/memory/store |
| memory_retrieve_limit | POST /v1/memory/retrieve |
When you hit a quota:
- Memory endpoints return 429.
- The workspace’s workspace_status flips to limit_reached (visible via status).
- optimize.health and status continue working; they’re free diagnostics.
- The workspace resets to active at the start of the next billing cycle (or sooner if the admin grants an override).
Distinguishing a rate-limit 429 from a quota 429
Both return 429. The difference matters because retrying after Retry-After won’t help with a quota:
```python
import mnemexa

client = mnemexa.Client()

try:
    client.memory.store(text="…")
except mnemexa.RateLimitError as exc:
    # Both limits raise the same error; workspace_status tells them apart.
    status = client.status()
    if status.workspace_status == "limit_reached":
        print("Plan quota exhausted: upgrade or wait for the next cycle.")
    else:
        print(f"Rate limited: back off {exc.retry_after}s and retry.")
```
Per-API-key vs per-workspace
The per-minute token bucket has two layers:
- Per API key — each key has its own bucket. Two keys on the same workspace don’t compete for the same per-minute budget.
- Per workspace — a separate, larger bucket caps the total across all keys for the workspace.
A request is rejected with 429 as soon as either bucket is empty. In practice, the per-key limit is what most callers hit; the per-workspace limit is a safety rail against runaway parallelism.
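The two-layer check can be pictured as follows. This is a simplified sketch with made-up capacities and no refill; the function name `admit` is hypothetical, and whether the real limiter consumes a token from one layer when the other rejects is not stated here (this sketch consumes from neither).

```python
def admit(key_tokens: dict, workspace_tokens: dict, api_key: str, workspace: str) -> bool:
    """Admit a request only if both the key bucket and the workspace bucket have a token."""
    if key_tokens[api_key] < 1 or workspace_tokens[workspace] < 1:
        return False  # either layer being empty is enough for a 429
    key_tokens[api_key] -= 1
    workspace_tokens[workspace] -= 1
    return True


# Illustrative numbers only: two keys with 15 tokens each, a workspace bucket of 20.
key_tokens = {"key-a": 15, "key-b": 15}
workspace_tokens = {"ws-1": 20}

a_ok = sum(admit(key_tokens, workspace_tokens, "key-a", "ws-1") for _ in range(15))
b_ok = sum(admit(key_tokens, workspace_tokens, "key-b", "ws-1") for _ in range(15))
print(a_ok, b_ok)  # 15 5
```

Here key-b still has 10 tokens of its own left, but the shared workspace bucket is empty, so its remaining requests are rejected.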
Free endpoints
status and optimize.health are free diagnostics — they don’t count against your plan’s unit quotas and are exempted from workspace-level lifecycle rejection (they work even when the workspace is suspended or limit_reached).
They still pass through the per-minute rate limiter, so a script polling either endpoint at >30 RPM will get throttled.
Don’t poll optimize.health at sub-minute intervals. The signals it computes refresh once per optimization sweep (every few minutes) — high-frequency polling won’t surface new data and will burn through your rate-limit budget.
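One way to keep a dashboard or monitor from over-polling is to cache the last result and only refetch after a minimum interval. The sketch below assumes a 300-second interval as a reading of "every few minutes"; `CachedPoll` is a hypothetical helper, and the demo uses a fake fetch function and a fake clock rather than a real SDK call.

```python
import time


class CachedPoll:
    """Return a cached result unless min_interval seconds have passed."""

    def __init__(self, fetch, min_interval=300.0, clock=time.monotonic):
        self._fetch = fetch
        self._min_interval = min_interval
        self._clock = clock
        self._last_time = None
        self._last_value = None

    def get(self):
        now = self._clock()
        if self._last_time is None or now - self._last_time >= self._min_interval:
            self._last_value = self._fetch()  # the only place a request is made
            self._last_time = now
        return self._last_value


# Demo with a fake clock so nothing actually waits.
fake_now = [0.0]
fetches = []


def fake_health():
    fetches.append(fake_now[0])
    return {"fetched_at": fake_now[0]}


health = CachedPoll(fake_health, min_interval=300.0, clock=lambda: fake_now[0])
health.get()          # first call fetches
fake_now[0] = 10.0
health.get()          # 10 s later: served from cache
fake_now[0] = 300.0
health.get()          # interval elapsed: fetches again
print(len(fetches))   # 2
```

In real use, `fetch` would be whatever callable wraps your optimize.health request; its exact SDK signature is not shown in this page.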
Best practices
- Catch RateLimitError and back off, don’t auto-retry. Especially in worker fleets: auto-retry on 429 amplifies the storm.
- Check workspace_status periodically. A daily call to status catches limit_reached transitions before your application logic does.
- Use optimize.health for memory-quality monitoring, not for connection health checks. Use status for the latter; it’s lighter.
- Spread bursts. The burst capacity is intentionally smaller than RPM (15 vs 30) so you can’t pre-eat a minute’s budget in one second.
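As an illustration of spreading bursts, a client can pace its own calls so they never arrive faster than the refill rate. This is a single-process sketch under the default 30 requests / minute; `Pacer` is a hypothetical helper, and workers that share one API key would need a shared pacer (or a lower per-worker rate) for the same effect.

```python
import time


class Pacer:
    """Block so that successive calls are at least 60 / rpm seconds apart."""

    def __init__(self, rpm=30.0, clock=time.monotonic, sleep=time.sleep):
        self._interval = 60.0 / rpm
        self._clock = clock
        self._sleep = sleep
        self._next_slot = None  # earliest time the next call may go out

    def wait(self):
        now = self._clock()
        if self._next_slot is not None and now < self._next_slot:
            self._sleep(self._next_slot - now)
            now = self._next_slot
        self._next_slot = now + self._interval


# Demo with a fake clock whose sleep() simply advances time.
t = [0.0]


def fake_sleep(seconds):
    t[0] += seconds


pacer = Pacer(rpm=30.0, clock=lambda: t[0], sleep=fake_sleep)
sent_at = []
for _ in range(4):
    pacer.wait()          # call this before each API request
    sent_at.append(t[0])
print(sent_at)  # [0.0, 2.0, 4.0, 6.0]
```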
Upgrading
To raise limits, change your plan from Settings → Plan in the dashboard. Upgrades take effect immediately (pro-rated); downgrades take effect at the start of the next billing cycle. The new rpm_limit_snapshot / burst_limit_snapshot / unit quotas take effect on the new subscription record (see Workspaces).