# tracking-setup — PLANNING

This is the running architecture + status doc. The original plan that drove the
build is at `/home/rian/.claude/plans/i-d-like-to-create-foamy-ocean.md`.

## v1 status (2026-05-07)

**v1 = analysis-only MVP**: crawl → analyze → review → approve flow, no Google
API writes. Built end-to-end in one session.

### What works

- ✅ Multi-tenant data model: Organizations / Clients / Users / RBAC.
- ✅ Internal auth (email + bcrypt password) with bootstrap-admin flow.
- ✅ Magic-link auth for client reviewers (single-use, 48-hour TTL).
- ✅ Playwright crawler — homepage + up to 25 same-origin pages, full-page
  screenshots, DOM snapshots, structured `interactive.json` (forms, CTAs,
  buttons, mailto/tel links, video embeds).
- ✅ Claude analyzer — single Opus call with system prompt cached, homepage
  screenshot for visual context, structured-JSON output mapped to Action rows.
- ✅ Internal review UI — edit title/summary/actions, delete actions, send to
  client.
- ✅ Client review UI — per-action approve/reject with comments, mobile-first
  CSS, status auto-flips to "approved" when every action has been decided.
- ✅ WeasyPrint PDF export of the approved (or in-flight) plan.
- ✅ Audit log per client (`/clients/{id}/audit`).
- ✅ Per-CrawlRun spend cap and page cap.
- ✅ `/healthz` for monitoring.
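
The single-use, 48-hour magic-link behaviour listed above can be sketched roughly as follows. This is a minimal illustration, not the project's code: the in-memory `_links` dict stands in for the `magic_link` table, and the function names `issue_magic_link` / `redeem_magic_link` are hypothetical.

```python
import secrets
from datetime import datetime, timedelta, timezone

MAGIC_LINK_TTL = timedelta(hours=48)  # TTL stated in this doc

# Hypothetical in-memory stand-in for the magic_link table
# (token, expires_at, used_at).
_links: dict[str, dict] = {}


def issue_magic_link(now: datetime) -> str:
    """Create a token backed by 32 random bytes, URL-safe encoded."""
    token = secrets.token_urlsafe(32)
    _links[token] = {"expires_at": now + MAGIC_LINK_TTL, "used_at": None}
    return token


def redeem_magic_link(token: str, now: datetime) -> bool:
    """Return True exactly once per token, and only before it expires."""
    row = _links.get(token)
    if row is None:
        return False  # unknown token
    if row["used_at"] is not None:
        return False  # already used (single-use)
    if now >= row["expires_at"]:
        return False  # past the 48-hour TTL
    row["used_at"] = now
    return True
```

Passing `now` in explicitly (for example `datetime.now(timezone.utc)`) keeps the expiry logic easy to test without patching the clock.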

### What's pending in v1

- ⏳ **Email wiring (Resend or Postmark).** Magic links are not emailed yet;
  they are printed to the logs (`srv-gw logs`). Pick a provider and wire
  `app/email/sender.py`.
- ⏳ **End-to-end verification** with `ANTHROPIC_API_KEY` set on a real client
  site. Required to validate the analyzer prompt produces useful output.
- ⏳ **PlusROI branding** in PDF + UI (logo, colors, copy). Default styling
  works; rebrand before showing real clients.
- ⏳ **Domain rebrand.** Currently on `tracking-setup.demoing.info` (gated by
  the demoing.info auth cookie — fine for staging). Move to e.g.
  `setup.plusroi.ca` before sending real client magic links.

### What's NOT in v1 (and planned for later)

| Phase | Scope |
|---|---|
| v2 | Google OAuth + GA4 Admin API + GTM API. "Apply" button — pick an approved Report, push tags/triggers/variables to a *draft* GTM workspace, mark GA4 conversions, surface a workspace diff. **GTM rule: never auto-publish.** |
| v3 | Google Ads conversion import + Search Console verification + brand-new GA4 / GTM container creation flows. |
| v4 | Public SaaS path (only if traction warrants) — Google Ads API standard-access application, OAuth verification, Stripe billing. |

## Architecture (v1)

Two layers, deliberately separated. (v1 only ships the analysis layer.)

**Analysis layer (read-only, AI-driven)**
- `app/crawler/playwright_runner.py` — sync Playwright crawl, runs in a
  background daemon thread. Saves per-page artifacts to `/data/crawls/<id>/`.
- `app/reports/builder.py::analyze_and_build_report` — single Claude Opus call,
  system prompt cached. Input: textual summary of all pages' interactive
  inventory + the homepage screenshot. Output: strict JSON matching a fixed
  schema, mapped 1:1 to `action` rows.
- Per-CrawlRun spend tracking (Anthropic usage → cents → DB column). The run
  hard-aborts if `CRAWL_MAX_SPEND_USD` is exceeded.
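
As a rough illustration of the usage → cents → cap check described above, here is a minimal sketch. The per-token prices are placeholders, not real pricing, and the names `usage_to_cents`, `check_spend`, and `SpendCapExceeded` are hypothetical; cached-token discounts are not modelled.

```python
# Assumed prices in cents per million tokens. Placeholders only:
# substitute the real rates for the model in use.
INPUT_CENTS_PER_MTOK = 1500
OUTPUT_CENTS_PER_MTOK = 7500


class SpendCapExceeded(RuntimeError):
    """Raised when a crawl run has spent more than its cap."""


def usage_to_cents(input_tokens: int, output_tokens: int) -> int:
    """Convert token usage to whole cents, rounding up (integer math only)."""
    micro_cents = (
        input_tokens * INPUT_CENTS_PER_MTOK
        + output_tokens * OUTPUT_CENTS_PER_MTOK
    )
    return -(-micro_cents // 1_000_000)  # ceiling division


def check_spend(spent_cents: int, max_spend_usd: float) -> None:
    """Abort the run if accumulated spend exceeds the cap (given in USD)."""
    cap_cents = round(max_spend_usd * 100)
    if spent_cents > cap_cents:
        raise SpendCapExceeded(
            f"spent {spent_cents} cents, cap is {cap_cents} cents"
        )
```

Rounding up means the stored figure never understates spend, so the cap errs on the side of aborting early.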

**Configuration layer (write, API-driven)** — *deferred to v2*

## Data model

See `app/db.py::init_db()` for the source of truth. Tables:

```
organization        master|single_client; name unique
google_account_ctx  stub in v1 (no OAuth yet); reserved for v2
client              N:1 organization; primary_url; permission_confirmed_by/at
user                role admin|internal|client; password_hash nullable
user_client_access  user × client × role(owner|editor|reviewer)
user_organization_access  user × organization
crawl_run           per Playwright run; status, spend, error
report              N:1 client, 1:1 crawl_run; status draft→sent→in_review→approved
action              N:1 report; kind, title, proposed_spec_json, status, sort_order
action_comment      thread per action
magic_link          token, expires_at, used_at, optional client_id_scope
audit_event         actor, client, kind, payload_json
```
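
The `report` status progression (draft→sent→in_review→approved) and the auto-flip to "approved" once every action is decided could be expressed as a small pure function. This is a sketch under assumptions: the action status values `pending` / `approved` / `rejected` and the rule that the first decision moves `sent` to `in_review` are guesses for illustration, not taken from the code.

```python
# Assumed action status values; the real column may use different names.
DECIDED = {"approved", "rejected"}


def next_report_status(current: str, action_statuses: list[str]) -> str:
    """Return the report status implied by the current action decisions."""
    if current not in ("sent", "in_review"):
        return current  # drafts and already-approved reports do not move here
    if action_statuses and all(s in DECIDED for s in action_statuses):
        return "approved"  # every action has been decided
    if current == "sent" and any(s in DECIDED for s in action_statuses):
        return "in_review"  # assumed: first decision starts the review
    return current
```

Note that "approved" here means the review is complete: a report whose actions were all decided, including rejections, still flips to `approved`, which matches the behaviour described for the client review UI.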

## Notable design decisions

1. **Single Opus call (not per-page).** Per-page analyzer calls would be more
   token-efficient with caching, but the synthesis step (deduplicating events
   across pages, prioritizing them) has to happen in the model anyway, so it
   is simpler to do everything in one call. Per-page calls become worth it if
   the cost per crawl exceeds the spend cap on 25-page sites.

2. **No real job queue.** Crawls run in `threading.Thread(daemon=True)`. For v1
   load (one or two concurrent crawls) this is fine. If we ever run >5
   concurrent crawls or need restart-resilient jobs, swap to RQ/Celery/Arq.

3. **Background-task ⇄ DB.** The crawl thread doesn't share a DB connection
   with the request handler; it goes through `db.execute()` / `db.query()`
   like everything else. Each call opens a connection, executes, and closes
   it, which is fine for SQLite in WAL mode.

4. **Magic-link tokens stored as plain values in DB.** Acceptable because
   each token is 32 bytes of URL-safe random data and the DB has no other
   readers. If we move to Postgres or share the DB more broadly, hash tokens
   before storing them.

5. **No real markdown rendering.** `notes_md` and `summary_md` are rendered as
   `<pre class="notes-md">` (preserving line breaks). Adding `markdown-it` or
   `mistune` is a v1.1 enhancement.
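
Decisions 2 and 3 fit together, and a minimal sketch of the pattern looks like this. It is not the project's `app/db.py`: the `execute` / `query` helpers, the `db_path` parameter, and the `run_crawl` body are hypothetical stand-ins, with the actual Playwright crawl replaced by a single status update.

```python
import sqlite3
import threading


def execute(sql: str, params: tuple = (), *, db_path: str) -> None:
    """Open a connection, run one write statement, commit, close."""
    conn = sqlite3.connect(db_path)
    try:
        conn.execute("PRAGMA journal_mode=WAL")  # persists in the DB file
        conn.execute(sql, params)
        conn.commit()
    finally:
        conn.close()


def query(sql: str, params: tuple = (), *, db_path: str) -> list[tuple]:
    """Open a connection, run one read statement, close."""
    conn = sqlite3.connect(db_path)
    try:
        return conn.execute(sql, params).fetchall()
    finally:
        conn.close()


def run_crawl(crawl_id: int, db_path: str) -> None:
    """Placeholder for the real crawl; only records completion."""
    execute(
        "UPDATE crawl_run SET status = ? WHERE id = ?",
        ("done", crawl_id),
        db_path=db_path,
    )


def start_crawl(crawl_id: int, db_path: str) -> threading.Thread:
    """Start the crawl in a daemon thread so the request can return."""
    thread = threading.Thread(
        target=run_crawl, args=(crawl_id, db_path), daemon=True
    )
    thread.start()
    return thread
```

Because no connection object crosses the thread boundary, the sketch avoids SQLite's same-thread check entirely. The trade-off is the one noted in decision 2: a daemon thread dies with the process, so an in-flight crawl is lost on restart.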

## Open questions (sensible defaults picked, override anytime)

1. **Brand / domain.** Default: `tracking-setup.demoing.info` for staging;
   rebrand before showing real clients.
2. **Email provider.** Default: Resend. Confirm before wiring.
3. **PDF stack.** Default: WeasyPrint. Working.
4. **Client editing of actions.** Default v1: approve/reject/comment only —
   no freeform spec editing for clients. Add in v1.1 if reviewers ask.
5. **Crawler permission.** Soft policy: client-creation form has a required
   "I have authorization to crawl this site" checkbox; timestamp + user
   stored in `client.permission_confirmed_by/at`.

## Verification plan (next session)

1. Set `ANTHROPIC_API_KEY` in `.env`; `srv-gw deploy --project tracking-setup`.
2. Claim bootstrap admin via the magic link in logs.
3. Create Organization (PlusROI master), Client (a site you own, e.g. plusroi.ca).
4. Hit "Start crawl"; wait 1-5 min for completion; inspect:
   - `/data/crawls/<id>/pages/*/screenshot.png` — full pages captured?
   - `/data/crawls/<id>/pages/*/interactive.json` — forms/CTAs found?
   - `/data/crawls/<id>/analyzer_raw.json` — Claude's raw response.
   - `/reports/<id>` — Action rows look reasonable?
5. Compare AI output against your manual audit of the same site. Note false
   positives / negatives. Iterate on `SYSTEM_PROMPT` in `app/reports/builder.py`.
6. Send report to a personal email; receive magic link from logs; review as
   "client"; approve some, reject one with a comment.
7. Download PDF; check rendering.
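
The artifact inspection in step 4 could be partly automated with a small helper along these lines. This is a sketch: `summarize_crawl` is a hypothetical name, the directory layout is taken from the paths listed above, and nothing is assumed about the contents of `interactive.json` beyond it being valid JSON.

```python
import json
from pathlib import Path


def summarize_crawl(crawl_dir) -> dict:
    """Report which per-page artifacts are missing or unreadable."""
    crawl_dir = Path(crawl_dir)
    pages_dir = crawl_dir / "pages"
    page_dirs = (
        sorted(p for p in pages_dir.iterdir() if p.is_dir())
        if pages_dir.is_dir()
        else []
    )

    missing_screenshots = []
    bad_inventories = []
    for page in page_dirs:
        shot = page / "screenshot.png"
        if not shot.is_file() or shot.stat().st_size == 0:
            missing_screenshots.append(page.name)
        try:
            json.loads((page / "interactive.json").read_text(encoding="utf-8"))
        except (OSError, ValueError):  # missing file or invalid JSON
            bad_inventories.append(page.name)

    return {
        "pages": len(page_dirs),
        "missing_screenshots": missing_screenshots,
        "bad_inventories": bad_inventories,
        "has_analyzer_raw": (crawl_dir / "analyzer_raw.json").is_file(),
    }
```

Calling `summarize_crawl("/data/crawls/<id>")` with a real run id gives a quick pass/fail view before opening individual screenshots; it does not replace the manual comparison in step 5.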
