PII & Privacy Primitives
Added: v0.61.0 (Phase 1 of the analytics / consent / privacy design —
see docs/superpowers/specs/2026-04-24-analytics-privacy-design.md.)
Dazzle treats personal data as a first-class DSL concept. Two primitives carry the weight:
- pii(): a field modifier that classifies personal data by category and sensitivity.
- subprocessor: a top-level construct declaring a third party that handles personal data on the app's behalf.
Together they feed every compile-time privacy artefact: the privacy page, the GDPR Record of Processing Activities (ROPA), the cookie policy, and the SOC 2 / ISO 27001 subprocessor list. They also drive the runtime PII stripping applied before any analytics event leaves the framework.
The pii() modifier
Syntax

    entity User "User":
      id: uuid pk
      email: str(200) pii  # bare → standard
      phone: str(50) pii(category=contact)  # category only
      dob: date pii(category=identity, sensitivity=high)  # both kwargs
      ssn: str(20) pii(category=identity, sensitivity=special_category)
      notes: text pii(category=freeform) required  # with other modifiers

Categories
Closed vocabulary: the parser rejects any value outside this list.
| Category | Meaning | Examples |
|---|---|---|
| contact | Ways to reach a subject | email, phone, postal address |
| identity | Identifying attributes | name, DOB, national ID, passport |
| location | Where a subject is | precise geolocation, IP address |
| biometric | Biological templates | fingerprint, face template |
| financial | Money-related | bank account, card number, salary |
| health | Medical data (GDPR Art. 9) | diagnoses, prescriptions |
| freeform | Unstructured text that may contain PII | notes, descriptions |
| behavioral | Inferred from usage | preferences, browsing history |
Sensitivities
| Sensitivity | Meaning | Handling |
|---|---|---|
| standard | Ordinary personal data (default) | Stripped from analytics unless opted-in |
| high | Higher-risk (DOB, full name, precise location) | Same as standard; flagged in audit |
| special_category | GDPR Art. 9 / 10: health, biometric, criminal, political, religious | Second opt-in gate, mandatory audit-on-read |
What the annotation does
- Compile-time: feeds the privacy page, ROPA, and cookie policy generators.
- Runtime (analytics): PII-annotated values are stripped from events unless the surface opts in with analytics: include_pii: [email] (a per-surface whitelist; there is no blanket app-level opt-in).
- RBAC: unchanged. PII annotation is orthogonal to permit: / scope:.
- Audit: special_category fields receive mandatory audit logging on read.
- GDPR exports: right-to-access / right-to-portability handlers auto-populate from annotated fields.
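The runtime rule above (strip annotated fields unless the surface whitelists them) can be illustrated with a small sketch. This is not the framework's implementation: `PiiTag`, `strip_for_analytics`, and the `include_pii` argument shape are hypothetical names chosen for the example, and the second opt-in gate for special_category fields is deliberately left out.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass(frozen=True)
class PiiTag:
    """Hypothetical stand-in for the metadata a pii() modifier attaches."""
    category: Optional[str] = None
    sensitivity: str = "standard"  # bare `pii` defaults to standard


def strip_for_analytics(
    payload: Mapping[str, Any],
    pii_fields: Mapping[str, PiiTag],
    include_pii: frozenset = frozenset(),
) -> dict:
    """Drop pii()-annotated keys unless the surface whitelists them."""
    return {
        key: value
        for key, value in payload.items()
        if key not in pii_fields or key in include_pii
    }


fields = {"email": PiiTag("contact"), "dob": PiiTag("identity", "high")}
event = {"plan": "pro", "email": "a@example.com", "dob": "1990-01-01"}

# No opt-in: every annotated field is removed.
default_view = strip_for_analytics(event, fields)
# Surface opted in to email only; dob is still removed.
opted_in_view = strip_for_analytics(event, fields, frozenset({"email"}))
```

The whitelist is passed per call rather than stored globally, which mirrors the per-surface (never app-wide) opt-in described above.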
Relationship to the sensitive modifier
Dazzle's existing sensitive modifier remains as a coarse boolean flag. The
pii() modifier is richer. A field may be both sensitive and pii(); the two
are independent. Over time, framework code should migrate to reading .pii
instead of .is_sensitive for PII-aware behaviour.
The subprocessor construct
Syntax

    subprocessor google_analytics "Google Analytics 4":
      handler: "Google LLC"
      handler_address: "1600 Amphitheatre Parkway, Mountain View, CA 94043, USA"
      jurisdiction: US
      data_categories: [pseudonymous_id, device_fingerprint, page_url]
      retention: "14 months"
      legal_basis: legitimate_interest
      consent_category: analytics
      dpa_url: "https://business.safety.google/adsprocessorterms/"
      scc_url: "https://business.safety.google/sccs/"
      cookies: [_ga, _ga_*, _gid]
      purpose: "Product and web usage analytics."

Required keys
- handler: legal entity processing the data
- jurisdiction: ISO country code or region (EU, EEA, UK, US, APAC)
- retention: free-form retention period string
- legal_basis: GDPR Article 6 basis (see below)
- consent_category: Dazzle-native consent category (see below)
Optional keys
- handler_address: postal address of the handler entity
- data_categories: list of DataCategory values (closed vocabulary)
- dpa_url: link to the signed Data Processing Agreement
- scc_url: link to Standard Contractual Clauses (required when jurisdiction is outside EEA)
- cookies: cookie names / glob patterns this subprocessor sets
- purpose: short human description
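The scc_url rule (needed when the jurisdiction is outside the EEA) is easy to check mechanically. The sketch below is an illustration only: `Subprocessor` is a trimmed-down hypothetical record, not the framework's IR type, and treating exactly `EU` and `EEA` as "inside the EEA" is an assumption made for the example.

```python
from dataclasses import dataclass
from typing import Optional

# Assumption: only these jurisdiction values count as inside the EEA.
EEA_JURISDICTIONS = frozenset({"EU", "EEA"})


@dataclass(frozen=True)
class Subprocessor:
    """Hypothetical, trimmed-down mirror of a subprocessor declaration."""
    name: str
    jurisdiction: str
    scc_url: Optional[str] = None


def missing_scc(sub: Subprocessor) -> bool:
    """True when data leaves the EEA and no SCC link is declared."""
    return sub.jurisdiction not in EEA_JURISDICTIONS and not sub.scc_url


ga = Subprocessor("google_analytics", "US",
                  scc_url="https://business.safety.google/sccs/")
plausible = Subprocessor("plausible", "EU")
undeclared = Subprocessor("example_vendor", "US")  # made-up entry

needs_attention = [s.name for s in (ga, plausible, undeclared) if missing_scc(s)]
```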
Legal basis (GDPR Article 6)
- consent
- contract
- legal_obligation
- vital_interests
- public_task
- legitimate_interest
Consent categories
Dazzle uses four consent categories that map cleanly to Consent Mode v2:
| Dazzle category | Meaning | Consent Mode v2 equivalent |
|---|---|---|
| analytics | Product and web usage telemetry | analytics_storage |
| advertising | Ad targeting and measurement | ad_storage + ad_user_data + ad_personalization |
| personalization | User-preference customisation | ad_personalization / functionality_storage |
| functional | Essential for the service to work | functionality_storage + security_storage |
Data categories
Closed vocabulary used inside data_categories: [...]. Overlaps with PII
categories where meaningful; extended with web-specific values.
- contact, identity, location, behavioral, financial, health
- device_fingerprint: browser/device signals used for tracking
- pseudonymous_id: unique IDs that aren't directly identifying
- page_url: URLs of pages visited
- session_data: within-session timing / flow
- content: content of user messages (email/SMS bodies, etc.)
Framework-provided subprocessor registry
Dazzle ships declarations for common third-party services. App-level
declarations override registry entries by matching name.
| Name | Label | Jurisdiction | Consent category |
|---|---|---|---|
| google_analytics | Google Analytics 4 | US | analytics |
| google_tag_manager | Google Tag Manager | US | analytics |
| plausible | Plausible Analytics | EU | analytics |
| stripe | Stripe Payments | US | functional |
| twilio | Twilio | US | functional |
| sendgrid | SendGrid | US | functional |
| aws_ses | Amazon SES | US | functional |
| firebase_cloud_messaging | Firebase Cloud Messaging | US | functional |
See src/dazzle/compliance/analytics/registry.py for the full declaration
including DPA / SCC links.
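Override-by-name can be pictured as a dictionary merge in which the app's entry replaces the registry's. The following sketch uses plain dicts and a hypothetical `merge_subprocessors` helper; it also records which of consent_category, jurisdiction, and legal_basis an override changed, since those are the fields the audit command reports on. The registry values shown are taken from the syntax example earlier in this page.

```python
from typing import Dict, List, Tuple

# Fields whose divergence from the registry default is worth surfacing.
WATCHED_KEYS = ("consent_category", "jurisdiction", "legal_basis")

Decl = Dict[str, str]


def merge_subprocessors(
    framework: Dict[str, Decl], app: Dict[str, Decl]
) -> Tuple[Dict[str, Decl], Dict[str, List[str]]]:
    """Return (effective declarations, changed watched keys per override)."""
    effective = {**framework, **app}  # app entries win on matching name
    collisions: Dict[str, List[str]] = {}
    for name in framework.keys() & app.keys():
        changed = [
            key for key in WATCHED_KEYS
            if framework[name].get(key) != app[name].get(key)
        ]
        if changed:
            collisions[name] = changed
    return effective, collisions


registry = {
    "google_analytics": {
        "jurisdiction": "US",
        "consent_category": "analytics",
        "legal_basis": "legitimate_interest",
    },
}
app_decls = {
    "google_analytics": {
        "jurisdiction": "US",
        "consent_category": "analytics",
        "legal_basis": "consent",  # the app tightens the legal basis
    },
}

effective, collisions = merge_subprocessors(registry, app_decls)
```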
The dazzle analytics audit command

    dazzle analytics audit                     # human-readable table (default)
    dazzle analytics audit --format json       # machine-readable
    dazzle analytics audit --project-dir ./my  # audit a different project

Produces two reports:
- PII annotation audit: flags entity fields whose names strongly suggest PII (email, phone, dob, ssn, etc.) but lack a pii() annotation. Uses a conservative substring heuristic; false positives are expected, false negatives are the real failure mode.
- Subprocessor audit: lists every subprocessor in effect (framework defaults + app-declared), flags collisions where an app override differs from the framework default in consent_category / jurisdiction / legal_basis, and marks subprocessors that require SCCs for EU→non-EU data transfers.
The audit never fails the build. Treat it as advisory. Future phases will add enforcement (e.g. require SCC URL when transfer detected).
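To make the substring heuristic concrete, here is a minimal sketch. The hint list contains only the four names mentioned above (the real list is presumably longer), and `suspicious_fields` and its input shape are hypothetical. The example also shows why false positives are expected.

```python
from typing import Dict, List

# Only the hints named in the text; the framework's list may differ.
PII_NAME_HINTS = ("email", "phone", "dob", "ssn")


def suspicious_fields(fields: Dict[str, bool]) -> List[str]:
    """Return unannotated field names that contain a PII-looking substring.

    `fields` maps field name -> whether the field carries a pii() annotation.
    """
    return sorted(
        name
        for name, annotated in fields.items()
        if not annotated
        and any(hint in name.lower() for hint in PII_NAME_HINTS)
    )


flagged = suspicious_fields({
    "email": True,           # annotated, so not reported
    "work_phone": False,     # genuine miss
    "phoneme_count": False,  # false positive: contains "phone"
    "notes": False,          # no hint matches, so not reported
})
```

Substring matching over-reports (as with `phoneme_count`) but rarely misses a field that merely prefixes or suffixes a known name, which fits the stated preference for false positives over false negatives.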
Analytics provider abstraction (Phase 3)
Declare analytics providers in the DSL:

    analytics:
      providers:
        gtm:
          id: "GTM-XXXXXX"
        plausible:
          domain: "example.com"
      consent:
        default_jurisdiction: EU
        consent_override: denied

The framework resolves the registered ProviderDefinition for each provider
name, unions its required CSP origins into the response's
Content-Security-Policy, and renders its script snippets into the HTML,
all gated on the current consent state. GTM bootstraps even under
deny-defaults so Consent Mode v2 can signal the container on later grant;
Plausible only loads when analytics consent is granted.
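The consent gating described above can be sketched as a union over the providers that are allowed to load. `ProviderSketch` and its `loads_without_consent` flag are hypothetical simplifications of ProviderDefinition, and the origins are illustrative values rather than the ones the framework ships.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class ProviderSketch:
    """Hypothetical, reduced view of a provider definition."""
    name: str
    csp_origins: Tuple[str, ...]
    loads_without_consent: bool = False  # e.g. GTM bootstrapping under deny


def allowed_origins(
    providers: Iterable[ProviderSketch], analytics_granted: bool
) -> List[str]:
    """Union the CSP origins of every provider permitted to load."""
    origins = set()
    for provider in providers:
        if analytics_granted or provider.loads_without_consent:
            origins.update(provider.csp_origins)
    return sorted(origins)


providers = [
    ProviderSketch("gtm", ("https://www.googletagmanager.com",),
                   loads_without_consent=True),
    ProviderSketch("plausible", ("https://plausible.io",)),
]

denied = allowed_origins(providers, analytics_granted=False)
granted = allowed_origins(providers, analytics_granted=True)
```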
Provider registry
Shipped in v0.61.0:
| Name | Label | Consent category | Required params | Links to subprocessor |
|---|---|---|---|---|
| gtm | Google Tag Manager | analytics | id | google_tag_manager |
| plausible | Plausible Analytics | analytics | domain | plausible |
Adding a provider: define a ProviderDefinition in
src/dazzle/compliance/analytics/providers/registry.py, add the matching
Jinja snippet templates, and update the linked_subprocessor_name to point
at the matching subprocessor declaration.
Client event vocabulary (Phase 4)
Dazzle auto-emits six structured events on window.dataLayer when analytics
is enabled. The vocabulary is versioned (dz/v1); additive changes stay in
v1, breaking changes cut a new version.
| Event | Fires on | Core parameters |
|---|---|---|
| dz_page_view | htmx swap + initial load | workspace, surface |
| dz_action | click on [data-dz-action] | action_name, entity, surface |
| dz_transition | state-machine transition | entity, from_state, to_state, trigger |
| dz_form_submit | form POST 2xx | form_name, entity, surface |
| dz_search | filterable_table input / swap | surface, entity, result_count |
| dz_api_error | htmx 4xx/5xx response | status_code, surface |
Every event carries dz_schema_version="1" and (in multi-tenant apps)
dz_tenant. Full schema in src/dazzle/compliance/analytics/event_vocabulary.py;
drift is pinned by tests/unit/test_event_vocabulary_v1.py.
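As a rough picture of what a dz/v1 payload looks like, the sketch below assembles one from the core parameters in the table. It is written in Python for illustration even though the real bus runs in the browser. `build_event` is hypothetical, the top-level `event` key follows the usual dataLayer convention and is an assumption here, and treating core parameters as mandatory is also an assumption.

```python
from typing import Any, Dict, Mapping, Optional

# Core parameters per event, copied from the table above.
CORE_PARAMS: Dict[str, tuple] = {
    "dz_page_view": ("workspace", "surface"),
    "dz_action": ("action_name", "entity", "surface"),
    "dz_transition": ("entity", "from_state", "to_state", "trigger"),
    "dz_form_submit": ("form_name", "entity", "surface"),
    "dz_search": ("surface", "entity", "result_count"),
    "dz_api_error": ("status_code", "surface"),
}


def build_event(name: str, params: Mapping[str, Any],
                tenant: Optional[str] = None) -> Dict[str, Any]:
    """Assemble a dz/v1 payload; raises if a core parameter is absent."""
    missing = [p for p in CORE_PARAMS[name] if p not in params]
    if missing:
        raise ValueError(f"{name} is missing {missing}")
    event = {"event": name, "dz_schema_version": "1", **params}
    if tenant is not None:  # only multi-tenant apps carry dz_tenant
        event["dz_tenant"] = tenant
    return event


example = build_event(
    "dz_transition",
    {"entity": "Order", "from_state": "paid", "to_state": "shipped",
     "trigger": "ship"},
    tenant="acme",
)
```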
Disable semantics
Analytics emission is suppressed when:
- DAZZLE_ENV is dev / development / test
- DAZZLE_MODE is trial / qa
Override with DAZZLE_ANALYTICS_FORCE=1 (for framework devs testing the
stack). This keeps automated agent runs from polluting production
analytics.
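The suppression rules reduce to a small predicate over the environment. The sketch below is one possible reading, not the framework's code: `analytics_enabled` is a hypothetical name, values are compared exactly (case handling is not specified in the text), and unset variables are assumed to mean "enabled".

```python
import os
from typing import Mapping

SUPPRESSED_ENVS = frozenset({"dev", "development", "test"})
SUPPRESSED_MODES = frozenset({"trial", "qa"})


def analytics_enabled(env: Mapping[str, str] = os.environ) -> bool:
    """Apply the disable rules, with DAZZLE_ANALYTICS_FORCE=1 as override."""
    if env.get("DAZZLE_ANALYTICS_FORCE") == "1":
        return True
    if env.get("DAZZLE_ENV") in SUPPRESSED_ENVS:
        return False
    if env.get("DAZZLE_MODE") in SUPPRESSED_MODES:
        return False
    return True  # assumption: nothing set means emission is on


in_tests = analytics_enabled({"DAZZLE_ENV": "test"})
forced = analytics_enabled({"DAZZLE_ENV": "test", "DAZZLE_ANALYTICS_FORCE": "1"})
```

Passing the environment in as a mapping keeps the predicate easy to exercise without mutating `os.environ`.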
PII safety of events
The JS bus reads from data-dz-* attributes only, never from input
values. The server-side template layer decides what lands in those
attributes, honouring pii() annotations. Entity IDs, user IDs, and email
addresses never appear in events unless a surface explicitly opts in.
Server-side analytics sinks (Phase 5)
Server-side analytics forwards business events from the framework's event bus to provider ingest endpoints (GA4 Measurement Protocol, Plausible Events API, etc.). It runs alongside the client-side dataLayer and captures events that don't depend on browser JS or user consent (state transitions, audit events, completed orders).

    analytics:
      server_side:
        sink: ga4_measurement_protocol
        measurement_id: "G-XXXXXX"
        bus_topics: [audit.*, transition.**, order.completed]

- sink: name of a registered AnalyticsSink. Ships: ga4_measurement_protocol.
- measurement_id: provider-specific default; per-tenant overrides in Phase 6.
- bus_topics: glob list (* = one segment, ** = any remainder).
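The bus_topics glob rules can be illustrated with a short matcher. This is a sketch of the stated semantics, not the framework's matcher: `topic_matches` is a hypothetical name, topics are assumed to be dot-separated, `**` is assumed to appear only as the final segment, and "any remainder" is read as zero or more segments.

```python
def topic_matches(pattern: str, topic: str) -> bool:
    """Match a dot-separated topic against a glob pattern.

    "*" matches exactly one segment; a trailing "**" matches whatever
    is left (assumed here to include nothing at all).
    """
    pattern_parts = pattern.split(".")
    topic_parts = topic.split(".")
    for index, part in enumerate(pattern_parts):
        if part == "**":
            return True  # swallows the rest of the topic
        if index >= len(topic_parts):
            return False  # pattern is longer than the topic
        if part != "*" and part != topic_parts[index]:
            return False
    # Every pattern segment matched; the topic must not have extras.
    return len(pattern_parts) == len(topic_parts)


subscribed = ["audit.*", "transition.**", "order.completed"]


def is_forwarded(topic: str) -> bool:
    return any(topic_matches(pattern, topic) for pattern in subscribed)
```

With the example configuration, `audit.login` and `transition.order.shipped` would be forwarded, while `audit.login.failed` would not, because `*` covers a single segment only.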
API secret handling
The GA4 API secret never lives in the DSL or TOML. It comes from the
DAZZLE_GA4_API_SECRET environment variable at runtime. A sink
constructed without the secret logs a warning and drops events — it does
not fail the publisher.
Reliability
Each sink emission:
- 2xx response → success, increment metrics.success_total.
- 4xx response → log + drop; metrics.dropped_total. (Bad event shape won't be fixed by retry.)
- 5xx / network error → retry with exponential backoff, up to 3 attempts; metrics.failure_total if exhausted.
- All outcomes are non-fatal to the originating publisher. Use metrics to monitor delivery reliability.
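The outcome handling above maps onto a small retry loop. The sketch below is an approximation under stated assumptions: `emit` and `SinkMetrics` are hypothetical names (only the three counter names come from the text), "3 attempts" is read as three sends in total, the 0.5 s base delay is invented for the example, and network failures are modelled as `OSError`.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class SinkMetrics:
    success_total: int = 0
    dropped_total: int = 0
    failure_total: int = 0


def emit(send: Callable[[], int], metrics: SinkMetrics,
         max_attempts: int = 3, base_delay: float = 0.5,
         sleep: Callable[[float], None] = time.sleep) -> None:
    """Send one event; never raises to the caller."""
    for attempt in range(max_attempts):
        try:
            status = send()  # returns an HTTP status code
        except OSError:
            status = None  # network error: handled like a 5xx
        if status is not None and 200 <= status < 300:
            metrics.success_total += 1
            return
        if status is not None and 400 <= status < 500:
            metrics.dropped_total += 1  # retrying a bad payload is pointless
            return
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)  # 0.5 s, 1 s, ...
    metrics.failure_total += 1


# Usage with a fake transport that recovers on the third send.
responses = iter([503, 500, 204])
delays = []
metrics = SinkMetrics()
emit(lambda: next(responses), metrics, sleep=delays.append)
```

Injecting `send` and `sleep` keeps the loop testable without real HTTP calls or real waiting.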
PII stripping
When the bridge knows the entity schema (via entity_specs_by_name),
event payloads pass through strip_pii() before the sink sees them.
pii()-annotated fields are dropped; opt-in happens per-surface as in
Phase 1.
Per-tenant analytics resolution (Phase 6)
Multi-tenant Dazzle apps can route analytics per tenant — tenant A fires to GTM-ABC, tenant B to GTM-DEF, tenant C has analytics disabled entirely. Every analytics-touching request flows through a single resolver call.
Register a resolver

    from dazzle.compliance.analytics import (
        set_tenant_analytics_resolver,
        TenantAnalyticsConfig,
    )
    from dazzle.core.ir import AnalyticsProviderInstance


    def resolve(request):
        host = request.url.hostname
        if host == "acme.example.com":
            return TenantAnalyticsConfig(
                tenant_slug="acme",
                providers=[
                    AnalyticsProviderInstance(
                        name="gtm", params={"id": "GTM-ACMEXX"}
                    ),
                ],
                data_residency="US",
            )
        # Default: no analytics for unknown tenants
        return TenantAnalyticsConfig()


    set_tenant_analytics_resolver(resolve)

Call this once during app startup. The default app-wide resolver is installed automatically — your custom one wins.
Fail-closed contract
If the resolver raises or returns anything other than a
TenantAnalyticsConfig, the framework silently falls back to the strict
default (no analytics, consent defaults left at their default values). It
never propagates the error to the publisher. Use
TenantAnalyticsConfig(providers=[]) to explicitly disable analytics for a
tenant.
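The fail-closed behaviour amounts to wrapping the resolver so that any exception or wrong return type yields the empty config. The sketch below illustrates that contract with a stand-in class so it runs on its own; `TenantConfigStub` and `fail_closed` are hypothetical and do not reflect how the framework implements the fallback.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List


@dataclass
class TenantConfigStub:
    """Stand-in for TenantAnalyticsConfig so the sketch is self-contained."""
    providers: List[Any] = field(default_factory=list)


def fail_closed(
    resolver: Callable[[Any], Any]
) -> Callable[[Any], TenantConfigStub]:
    """Wrap a resolver so errors degrade to 'no analytics'."""
    def safe(request: Any) -> TenantConfigStub:
        try:
            result = resolver(request)
        except Exception:
            return TenantConfigStub()  # resolver raised: strict default
        if not isinstance(result, TenantConfigStub):
            return TenantConfigStub()  # wrong type: strict default
        return result
    return safe


broken = fail_closed(lambda request: 1 / 0)
wrong_type = fail_closed(lambda request: {"providers": ["gtm"]})
healthy = fail_closed(lambda request: TenantConfigStub(providers=["gtm"]))
```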
Cross-tenant isolation
The CSP header, script tags, consent cookie defaults, and
data-dz-tenant body attribute are all resolved per-request. Two
tenants on the same process receive independent configs — tenant A's
response never leaks tenant B's GTM ID into the Content-Security-Policy
header.
What's resolved per-tenant
- providers: which analytics scripts load (+ CSP origins)
- data_residency: EU/UK → denied defaults; else granted
- consent_override: hard override for consent defaults
- privacy_page_url / cookie_policy_url: banner link targets
- tenant_slug: populates data-dz-tenant for event tagging
What's NOT in Phase 6
Tenant-resolution is a runtime concern. The framework doesn't dictate your tenant entity shape, your hostname → tenant mapping, or your caching layer. The resolver callable is the boundary. If you add a database-backed tenant lookup, wrap it with your own cache — the framework calls the resolver on every analytics-touching request.
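Because the resolver runs on every analytics-touching request, a database-backed lookup usually wants a small cache in front of it. One way to do that is a time-based wrapper like the sketch below. `cached_by_key` is a hypothetical helper, not part of Dazzle; in a real app the key function might be `lambda request: request.url.hostname`, and the TTL is an arbitrary choice.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


def cached_by_key(
    lookup: Callable[[Any], Any],
    key_of: Callable[[Any], Hashable],
    ttl_seconds: float = 60.0,
    clock: Callable[[], float] = time.monotonic,
) -> Callable[[Any], Any]:
    """Wrap `lookup` so repeated requests with the same key reuse the result."""
    cache: Dict[Hashable, Tuple[float, Any]] = {}

    def resolve(request: Any) -> Any:
        key = key_of(request)
        now = clock()
        hit = cache.get(key)
        if hit is not None and now - hit[0] < ttl_seconds:
            return hit[1]  # still fresh
        value = lookup(request)  # e.g. the expensive database query
        cache[key] = (now, value)
        return value

    return resolve


# Usage with a fake clock and a lookup that records its calls.
calls = []
fake_now = [0.0]


def slow_lookup(host: str) -> dict:
    calls.append(host)
    return {"tenant": host}


resolver = cached_by_key(slow_lookup, key_of=lambda host: host,
                         ttl_seconds=10, clock=lambda: fake_now[0])
resolver("acme.example.com")
resolver("acme.example.com")  # served from cache
```

The wrapped callable has the same one-argument shape as the original, so it could be handed to set_tenant_analytics_resolver in place of the uncached function.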
See the design spec at docs/superpowers/specs/2026-04-24-analytics-privacy-design.md
for the full roadmap.