Performance envelope
Where the Dazzle runtime is fast, where it degrades, and by how much — measured on a reproducible benchmark rather than estimated.
This page reports the results of the SP6 benchmark (evaluator briefing #9). The harness is committed at benchmarks/; the raw numbers cited here come from benchmarks/results/results.md and results.json.
Scope & method
The benchmark boots examples/invoice_ops
— a multi-tenant invoice app with six entity tables and a tenant_id scope
predicate on every surface — against a pre-seeded PostgreSQL database and
measures four representative probes:
| Probe | What it calls | What it isolates |
|---|---|---|
| `list` | `GET /invoices` | Scope-filtered paginated list (full auth + predicate path) |
| `read` | `GET /invoices/{id}` | Single-row primary-key lookup with tenant scope check |
| `search` | `GET /invoices?q=<term>` | LIKE-based search within tenant scope |
| `aggregate` | `Repository.aggregate` | Raw `GROUP BY status` SQL (no HTTP overhead) |
Run parameters (from results.json):
- Host: mini, PostgreSQL: 17.9 (Homebrew)
- Tenants: 3, the minimum that forces the planner to evaluate the `tenant_id` predicate at ~33% selectivity rather than eliding it
- Iterations: 200 timed calls per probe (first 10 discarded as warm-up); p50 / p95 / p99 reported
In-process measurement. The app is booted in-process via the Dazzle ASGI test transport — there is no network hop. This is deliberate: it measures the framework's request path (auth, scope-predicate compile, repository, SQL, response marshaling) without the variance of an external HTTP layer. Real deployments add network latency on top of these numbers.
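The iteration scheme in the run parameters (warm-up calls discarded, 200 timed calls, p50 / p95 / p99 reported) can be sketched in a few lines. This is a minimal illustration, not the harness in benchmarks/: the names `measure` and `summarize` and the nearest-rank percentile method are assumptions, and the sketch assumes the 10 warm-up calls run in addition to the 200 timed ones.

```python
import time
from typing import Callable, Dict, List


def percentile(sorted_ms: List[float], pct: int) -> float:
    """Nearest-rank percentile of an ascending list (pct is an integer 1..100)."""
    rank = max(1, -(-pct * len(sorted_ms) // 100))  # integer ceiling division
    return sorted_ms[rank - 1]


def summarize(samples_ms: List[float]) -> Dict[str, float]:
    """Reduce raw per-call latencies to the three reported percentiles."""
    ordered = sorted(samples_ms)
    return {name: percentile(ordered, pct)
            for name, pct in (("p50", 50), ("p95", 95), ("p99", 99))}


def measure(probe: Callable[[], object], timed: int = 200, warmup: int = 10) -> Dict[str, float]:
    """Call `probe` warmup + timed times and summarize only the timed calls."""
    for _ in range(warmup):
        probe()  # warm caches and connections; not recorded
    samples = []
    for _ in range(timed):
        start = time.perf_counter()
        probe()
        samples.append((time.perf_counter() - start) * 1000.0)  # seconds -> ms
    return summarize(samples)


if __name__ == "__main__":
    # Hypothetical stand-in for an in-process request to the app under test.
    print(measure(lambda: sum(range(10_000))))
```

Because the probe is an ordinary callable, the same loop works whether it wraps an in-process ASGI request or a direct repository call such as the `aggregate` probe.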
Two schema configs. Every probe is run against two schemas:
- `default`: the framework-generated schema exactly as `src/dazzle/back/runtime/sa_schema.py` emits it, with primary-key indexes, FK constraints, and unique constraints. No index on `tenant_id` or FK columns.
- `indexed`: the `default` schema plus `benchmarks/indexes.sql`, which adds single-column b-tree indexes on every `tenant_id` column and every FK column.
Comparing the two answers a concrete question: does adding the obvious indexes change the latency?
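As a rough illustration of what the `indexed` config adds, the sketch below generates one single-column b-tree index per scoped column. The table names, column names, and index naming scheme are hypothetical, and the actual contents of benchmarks/indexes.sql may be organised differently.

```python
from typing import Dict, List


def single_column_indexes(tables: Dict[str, List[str]]) -> List[str]:
    """Build one CREATE INDEX statement per (table, column) pair.

    PostgreSQL creates a b-tree when no index method is named.
    """
    statements = []
    for table, columns in tables.items():
        for column in columns:
            statements.append(
                f'CREATE INDEX IF NOT EXISTS ix_{table}_{column} '
                f'ON "{table}" ("{column}");'
            )
    return statements


# Hypothetical tables: a tenant_id column plus the FK columns of each table.
EXAMPLE = {
    "invoice": ["tenant_id", "customer_id"],
    "invoice_line": ["tenant_id", "invoice_id"],
}

for ddl in single_column_indexes(EXAMPLE):
    print(ddl)
```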
The operating envelope
Latency in milliseconds. Scale is invoices per tenant; with 3 tenants the
Invoice table holds 3× that many rows (so the 1,000,000 row of each table
corresponds to 3,000,000 Invoice rows in total).
list: GET /invoices
| Scale | default p50 | default p95 | default p99 | indexed p50 | indexed p95 | indexed p99 |
|---|---|---|---|---|---|---|
| 1,000 | 15.0 | 16.6 | 23.5 | 14.1 | 16.4 | 22.8 |
| 10,000 | 15.8 | 18.3 | 27.2 | 15.0 | 17.9 | 23.9 |
| 100,000 | 26.5 | 29.4 | 34.4 | 23.1 | 26.2 | 32.0 |
| 1,000,000 | 79.8 | 84.9 | 87.8 | 79.9 | 84.9 | 86.9 |
read: GET /invoices/{id}
| Scale | default p50 | default p95 | default p99 | indexed p50 | indexed p95 | indexed p99 |
|---|---|---|---|---|---|---|
| 1,000 | 16.9 | 21.2 | 26.5 | 15.9 | 18.8 | 22.1 |
| 10,000 | 15.8 | 19.2 | 22.9 | 16.1 | 19.2 | 24.3 |
| 100,000 | 16.2 | 18.5 | 24.1 | 16.1 | 19.2 | 24.8 |
| 1,000,000 | 16.1 | 18.7 | 24.5 | 17.5 | 18.6 | 23.0 |
search: GET /invoices?q=<term>
| Scale | default p50 | default p95 | default p99 | indexed p50 | indexed p95 | indexed p99 |
|---|---|---|---|---|---|---|
| 1,000 | 14.2 | 16.7 | 17.4 | 14.2 | 16.2 | 23.1 |
| 10,000 | 15.8 | 18.7 | 24.1 | 15.4 | 19.2 | 23.3 |
| 100,000 | 26.7 | 29.3 | 33.7 | 22.9 | 25.8 | 32.1 |
| 1,000,000 | 80.2 | 83.9 | 87.9 | 79.1 | 84.0 | 85.4 |
aggregate: Repository.aggregate (GROUP BY status)
| Scale | default p50 | default p95 | default p99 | indexed p50 | indexed p95 | indexed p99 |
|---|---|---|---|---|---|---|
| 1,000 | 2.5 | 2.9 | 3.0 | 2.7 | 3.0 | 3.2 |
| 10,000 | 6.3 | 7.3 | 10.3 | 6.2 | 7.2 | 7.6 |
| 100,000 | 20.3 | 21.3 | 22.0 | 20.3 | 21.4 | 22.3 |
| 1,000,000 | 141.4 | 143.2 | 144.5 | 140.4 | 142.0 | 142.4 |
Where it degrades
Two findings, both honest and both load-bearing.
1. The runtime degrades roughly linearly with dataset size
Three of the four probes climb with row count:
- `list` p95: ~17 ms (1k) → ~18 ms (10k) → ~29 ms (100k) → ~85 ms (1M).
- `search` p95: ~17 ms → ~19 ms → ~29 ms → ~84 ms, the same curve.
- `aggregate` p95: ~3 ms → ~7 ms → ~21 ms → ~143 ms. Steeper, because it scans the whole table.
- `read` is the exception: flat at ~18–21 ms p95 at every scale. A single-row lookup on the primary key does not degrade; the PK index Dazzle already emits is doing its job.
So the practical envelope: for an entity with tens of thousands of rows per tenant, scope-filtered list and search stay comfortably under ~30 ms p95. At a million rows per tenant (3M total) the same calls reach ~85 ms and a full-table aggregate reaches ~143 ms. These are healthy numbers for a single request, but they grow with the data, and the framework emits no index that arrests the growth.
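One way to read the curve is to compute how much p95 grows at each 10× step in rows. The sketch below does that from the `default` p95 columns of the tables above; the helper name `step_factors` is an assumption made for this illustration. A factor of 10 per step would be strictly proportional growth.

```python
# default-config p95 latency in ms at 1k, 10k, 100k, and 1M invoices per tenant,
# copied from the tables above.
P95_DEFAULT = {
    "list": [16.6, 18.3, 29.4, 84.9],
    "read": [21.2, 19.2, 18.5, 18.7],
    "search": [16.7, 18.7, 29.3, 83.9],
    "aggregate": [2.9, 7.3, 21.3, 143.2],
}


def step_factors(values):
    """Ratio between consecutive scales; each step is a 10x increase in rows."""
    return [round(after / before, 2) for before, after in zip(values, values[1:])]


for probe, values in P95_DEFAULT.items():
    print(f"{probe:9s} {step_factors(values)}")
```

The factors for `read` stay near 1, while those for `list`, `search`, and `aggregate` rise with each step but remain below 10. That pattern is consistent with a fixed per-request cost plus a component that grows with row count, though the benchmark itself does not break the latency down that way.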
2. Single-column tenant_id / FK indexes do not change this
The indexed config — single-column b-tree indexes on every tenant_id and FK
column — is within measurement noise of default at every scale. At 1M
invoices per tenant the two configs are effectively identical: list p95 84.9
vs 84.9, search 83.9 vs 84.0, aggregate 143.2 vs 142.0. The small apparent
gap at 100k (list 29.4 vs 26.2) is timing noise: it vanishes at 1M, where the
table is 10× larger.
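The comparison can be made explicit by expressing the `indexed` p95 as a signed percentage change relative to `default`. The figures are the ones quoted in the paragraph above; the function name `relative_gap_pct` is an assumption made for this sketch.

```python
# p95 latency in ms as (default, indexed), copied from the tables above.
P95 = {
    ("list", "100k"): (29.4, 26.2),
    ("list", "1M"): (84.9, 84.9),
    ("search", "1M"): (83.9, 84.0),
    ("aggregate", "1M"): (143.2, 142.0),
}


def relative_gap_pct(default_ms, indexed_ms):
    """Signed change of indexed relative to default, in percent."""
    return round((indexed_ms - default_ms) / default_ms * 100, 2)


gaps = {key: relative_gap_pct(*pair) for key, pair in P95.items()}
for (probe, scale), gap in gaps.items():
    print(f"{probe:9s} {scale:>4s}  {gap:+.2f}%")
```

At 1M rows the gap is under 1% on all three probes. The 100k `list` gap is about 11%, which the text attributes to timing noise.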
Why the single-column indexes don't help is part structural certainty, part honest open question.
Structurally certain:
- `search` matches with a leading-wildcard `LIKE '%term%'`. A plain b-tree cannot serve a leading-wildcard pattern under any conditions, so the `tenant_id` b-tree in the `indexed` config is simply irrelevant to it.
- `aggregate` is a full-table `GROUP BY status`: it reads every row by definition, so no single-column index changes its cost.
- `read` is a primary-key lookup, already served optimally by the PK index Dazzle emits, which is why it stays flat at every scale.
- A `list` surface that applies a default sort (`ORDER BY created_at DESC LIMIT n`) needs an index ordered by `created_at`; a single-column `tenant_id` index cannot satisfy that ordering, and Dazzle emits no `created_at` index.
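The leading-wildcard point can be observed in any engine with b-tree indexes. The sketch below uses SQLite (from Python's standard library) as a stand-in for PostgreSQL, purely so that it runs without a server. The `invoice` table, `number` column, and index name are hypothetical, and SQLite's planner differs from PostgreSQL's in detail, so treat it as an illustration of the principle rather than a reproduction of the benchmark.

```python
import sqlite3


def like_plan(pattern: str) -> str:
    """Return SQLite's query plan for a LIKE filter on an indexed text column."""
    conn = sqlite3.connect(":memory:")
    # NOCASE collation lets SQLite's default (case-insensitive) LIKE use the index.
    conn.execute(
        "CREATE TABLE invoice (id INTEGER PRIMARY KEY, number TEXT COLLATE NOCASE)"
    )
    conn.execute("CREATE INDEX ix_invoice_number ON invoice (number)")
    rows = conn.execute(
        f"EXPLAIN QUERY PLAN SELECT id FROM invoice WHERE number LIKE '{pattern}'"
    ).fetchall()
    conn.close()
    return "; ".join(row[3] for row in rows)  # column 3 holds the plan detail text


prefix_plan = like_plan("acme%")   # anchored prefix: a range search on the index
infix_plan = like_plan("%acme%")   # leading wildcard: every entry must be visited

print("prefix:", prefix_plan)
print("infix: ", infix_plan)
```

The prefix pattern is planned as a `SEARCH` over an index range, while the leading-wildcard pattern falls back to a `SCAN`.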
Honest open question: for the scope-filtered list and search probes the
benchmark establishes the outcome — a single-column tenant_id b-tree does
not move end-to-end p95 at any scale — but not a complete per-component cost
breakdown. Isolated EXPLAIN of individual sub-queries shows plan-level
differences that do not translate into the measured request latency, and the
single-threaded harness does not attribute the remaining wall-clock. The
load-bearing claim here is the measured outcome, not a mechanism.
One factor that is clear: the benchmark runs 3 tenants, so tenant_id
matches ~⅓ of the table — low selectivity, where an index earns little. An app
with hundreds of tenants has a far more selective predicate and would benefit
more from indexing it; the benchmark deliberately measures the low-tenant-count
case.
The real lever
The fix for a sorted list path is a composite (tenant_id, created_at)
index — one index that covers both the scope predicate and the sort, so the
planner can serve WHERE tenant_id = $1 ORDER BY created_at DESC LIMIT n
straight from the index with no sort node. For search, the lever is a PostgreSQL full-text
(tsvector + GIN) index, not a plain b-tree. Neither is what
benchmarks/indexes.sql contains, and — more importantly — neither is what the
framework's schema builder generates today. This gap is tracked as
issue #1202.
Until the schema builder closes that gap, an app expecting large per-tenant tables should add the composite index by hand via an Alembic migration (ADR-0017).
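The effect of the composite index on the sorted list query can be illustrated with a query plan. The sketch below again uses SQLite from the standard library as a stand-in for PostgreSQL so that it runs anywhere. The table layout and index names are hypothetical, and a real Dazzle app would add the index through an Alembic migration against PostgreSQL rather than through this code.

```python
import sqlite3

# The sorted, scope-filtered list query described above.
QUERY = (
    "SELECT id FROM invoice WHERE tenant_id = ? "
    "ORDER BY created_at DESC LIMIT 20"
)


def query_plan(index_ddl: str) -> str:
    """Create a fresh table with one index and return the plan for QUERY."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE invoice (id INTEGER PRIMARY KEY, tenant_id INTEGER, "
        "status TEXT, created_at TEXT)"
    )
    conn.execute(index_ddl)
    rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY, (1,)).fetchall()
    conn.close()
    return "; ".join(row[3] for row in rows)  # column 3 holds the plan detail text


single = query_plan(
    "CREATE INDEX ix_invoice_tenant ON invoice (tenant_id)"
)
composite = query_plan(
    "CREATE INDEX ix_invoice_tenant_created ON invoice (tenant_id, created_at)"
)

print("single-column:", single)
print("composite:    ", composite)
```

With only the single-column index the plan includes a separate sort step (`USE TEMP B-TREE FOR ORDER BY`); with the composite index the rows come out of the index already in order and the sort step disappears.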
Boot & surface complexity
examples/invoice_ops has 7 entities and ~24 entity/surface definitions and
boots in-process for every benchmark cell with no measurable warm-up penalty —
the first 10 probe iterations are discarded and the p50/p99 spread stays tight,
which means boot and schema reflection complete well before the timed window.
The benchmark does not isolate a dedicated boot-time-vs-entity-count number; for
runtime tracing of boot phases and slow framework stages, use dazzle perf
(see Performance observability).
What is not benchmarked
The envelope above covers single-request read latency on one entity. It does not cover:
- Write / `INSERT` throughput: the harness seeds via PostgreSQL `COPY` and only times reads. Insert and update latency under load is unmeasured.
- Concurrent-user capacity: the harness is single-threaded; it issues one request at a time. It says nothing about throughput under N concurrent clients, connection-pool saturation, or lock contention.
- Events per day / workflow executions per day: the event subsystem, job loop, and process executor are not exercised. See the Observability Guide for their operational surface.
- JOIN-heavy multi-table queries: all four probes are single-table-dominant on `Invoice`. The `indexed` config includes FK-column indexes, but because no probe exercises a multi-table JOIN, the value of FK indexes for JOIN paths is not measured here. An app with heavy relational fan-out should benchmark its own JOIN paths before drawing conclusions.
Do not over-read the envelope: it is a clean, reproducible baseline for scope-filtered single-entity reads, not a capacity model for a whole deployment.
Reproducing it
The benchmark is fully reproducible. See
benchmarks/README.md
for prerequisites and the full sweep command. In short:
createdb dazzle_invoice_ops_bench
python -m benchmarks.run # full sweep, all scales, both configs
python -m benchmarks.run --scales 1000 # fast smoke test
Results are written to benchmarks/results/results.json and results.md.
Companion reading: Performance observability covers
dazzle perf local tracing for diagnosing a slow path in your own app; the
Scaling Guide covers production infrastructure
configuration.