Categories
Every finding belongs to one of six categories. Each is produced by a dedicated analyzer that runs on the pages where it's relevant. Here's what each one observes — and why patient-facing pages weigh more.
Tracking & Third-Party Exposure · 30%
An inventory of third-party scripts and pixels — analytics, advertising, session-replay tools, live-chat widgets, and tracking embeds — matched against a maintained signature catalog. The differentiator: the same tracker is flagged at much higher severity on a page classified as intake_form, appointment, or portal than on a generic marketing page. Consent-gated scripts are reported separately at reduced severity.
What it observes
- GA4 / Universal Analytics
- Meta (Facebook) Pixel
- TikTok / LinkedIn / Pinterest / Reddit and other ad-network tags
- Hotjar / FullStory and other session-replay tools
- Live-chat widgets (Intercom, Drift, Tawk.to, Tidio, Crisp, LiveChat, Zendesk)
- YouTube embeds in cookie mode (served from youtube.com rather than the privacy-enhanced youtube-nocookie.com)
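To make the catalog matching concrete, here is a minimal sketch of how third-party resources on a page might be matched against host-based signatures. The `SIGNATURES` table, the `ResourceCollector` class, and `match_trackers` are hypothetical names invented for illustration; the real catalog is larger and its format is not described here.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Hypothetical, heavily abbreviated signature catalog: host suffix -> (name, kind).
SIGNATURES = {
    "googletagmanager.com": ("Google Analytics / GTM", "analytics"),
    "connect.facebook.net": ("Meta Pixel", "advertising"),
    "hotjar.com": ("Hotjar", "session_replay"),
    "fullstory.com": ("FullStory", "session_replay"),
    "youtube.com": ("YouTube embed (cookie mode)", "embed"),
}


class ResourceCollector(HTMLParser):
    """Collects the src URLs of <script>, <img>, and <iframe> tags."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "img", "iframe"):
            src = dict(attrs).get("src")
            if src:
                self.urls.append(src)


def match_trackers(html):
    """Return (name, kind) for every resource whose host matches a signature."""
    collector = ResourceCollector()
    collector.feed(html)
    found = []
    for url in collector.urls:
        host = urlparse(url).hostname or ""
        for suffix, signature in SIGNATURES.items():
            # Match the host itself or any subdomain of it.
            if host == suffix or host.endswith("." + suffix):
                found.append(signature)
    return found
```

Matching on host suffix means `www.youtube-nocookie.com` does not trip the `youtube.com` signature, which is the distinction the last bullet draws. Inline snippets and consent gating would need additional handling that this sketch omits.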
Privacy Policy & Disclosures · 20%
An analysis of the privacy-policy page (located by the crawler) against a fixed disclosure schema. It checks whether the policy exists and whether it makes the disclosures expected of a patient-facing site, listed below.
What it observes
- Policy exists and is reachable
- Mentions HIPAA / PHI
- Discloses third-party sharing
- Discloses analytics / advertising use
- Has a last-updated date and contact method
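As an illustration of checking policy text against a fixed schema, the sketch below runs one keyword pattern per disclosure. The pattern table and the `check_policy` name are assumptions made for this example, not the analyzer's actual schema, and a keyword hit only shows that a topic is mentioned, not that the disclosure is adequate.

```python
import re

# Hypothetical disclosure schema: check name -> pattern that suggests the
# topic is addressed somewhere in the policy text.
DISCLOSURE_PATTERNS = {
    "mentions_hipaa_phi": r"\bHIPAA\b|protected health information|\bPHI\b",
    "third_party_sharing": r"third[- ]part(?:y|ies)",
    "analytics_advertising": r"\banalytics\b|\badvertis(?:ing|ers?)\b",
    "last_updated": r"last\s+(?:updated|revised|modified)|effective\s+date",
    "contact_method": r"[\w.+-]+@[\w-]+\.[\w.-]+|\(?\d{3}\)?[\s.-]\d{3}[\s.-]\d{4}",
}


def check_policy(text):
    """Return {check name: True/False} for the extracted policy text."""
    return {
        name: bool(re.search(pattern, text, re.IGNORECASE))
        for name, pattern in DISCLOSURE_PATTERNS.items()
    }
```

A check that comes back `False` would map to one finding in this category; whether the policy page exists and is reachable has to be established before any text is available to scan.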
Forms & PHI Exposure · 20%
An enumeration of forms on classified pages, with field-name heuristics for PHI-shaped inputs (date of birth, SSN, insurance ID, symptoms, medications, reason for visit) and checks on how those forms transmit data. Severity reflects what the channel can actually leak: free-text entries that end up in a URL are scored more severely than fixed dropdown choices.
What it observes
- method="GET" on a PHI-shaped form
- mailto: form actions (submissions as plain email)
- Form action pointing to a third-party domain
- Missing autocomplete controls on sensitive fields
- Form submitted over plain HTTP
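The transmission checks above can be sketched with the standard-library HTML parser. The `PHI_HINT` pattern, the `FormAuditor` class, and the finding names are hypothetical; the autocomplete check is left out, and the third-party test compares exact hostnames, where a fuller version would compare registrable domains.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urlparse

# Hypothetical field-name heuristic for PHI-shaped inputs.
PHI_HINT = re.compile(
    r"dob|birth|ssn|social|insurance|member_?id|symptom|medication|reason", re.I
)


class FormAuditor(HTMLParser):
    """Records each form's method, action, and PHI-shaped field names."""

    def __init__(self):
        super().__init__()
        self.forms = []
        self._current = None

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "form":
            # HTML forms default to GET when no method is given.
            self._current = {
                "method": (a.get("method") or "get").lower(),
                "action": a.get("action") or "",
                "phi_fields": [],
            }
            self.forms.append(self._current)
        elif tag in ("input", "textarea", "select") and self._current is not None:
            name = a.get("name") or a.get("id") or ""
            if PHI_HINT.search(name):
                self._current["phi_fields"].append(name)

    def handle_endtag(self, tag):
        if tag == "form":
            self._current = None


def audit_forms(html, page_url):
    """Return finding names for every PHI-shaped form on the page."""
    page = urlparse(page_url)
    auditor = FormAuditor()
    auditor.feed(html)
    findings = []
    for form in auditor.forms:
        if not form["phi_fields"]:
            continue
        action = urlparse(form["action"])
        # An empty or relative action inherits scheme and host from the page.
        scheme = action.scheme or page.scheme
        host = action.hostname or page.hostname
        if form["method"] == "get":
            findings.append("phi_form_uses_get")
        if scheme == "mailto":
            findings.append("mailto_action")
        elif scheme == "http":
            findings.append("plain_http_submission")
        if scheme != "mailto" and host != page.hostname:
            findings.append("third_party_action")
    return findings
```

Forms with no PHI-shaped fields, such as a site search box, are skipped entirely, which keeps a `method="GET"` search form from being reported.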
Transport Security (TLS/HTTPS) · 15%
Transport-layer checks using the standard library: certificate validity and expiry, negotiated and accepted protocol versions, hostname match, whether the plain-HTTP version of the site redirects to HTTPS, and detection of mixed content (HTTP sub-resources loaded on an HTTPS page).
What it observes
- Expired or soon-to-expire certificate
- Hostname / certificate mismatch
- Outdated TLS protocol negotiated, or legacy TLS (1.1 and older) still accepted
- Plain-HTTP version not redirecting to HTTPS
- Mixed content on a secure page
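For a sense of what a standard-library certificate check involves, here is a sketch that assumes Python's `ssl` module (the text does not name the language). `days_until_expiry` and `inspect_tls` are hypothetical helper names. Probing whether legacy protocol versions are still accepted would need separate connections with a pinned maximum version and is not shown.

```python
import socket
import ssl
import time


def days_until_expiry(not_after, now=None):
    """Days left on a certificate, given the 'notAfter' string that
    SSLSocket.getpeercert() returns (e.g. 'Jan  5 09:34:43 2018 GMT')."""
    expires = ssl.cert_time_to_seconds(not_after)
    now = time.time() if now is None else now
    return (expires - now) / 86400


def inspect_tls(host, port=443, timeout=5.0):
    """Connect once and report the negotiated protocol and days to expiry.

    create_default_context() verifies the chain and the hostname, so an
    expired or mismatched certificate raises ssl.SSLCertVerificationError
    instead of returning a result.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            cert = tls.getpeercert()
            return {
                "protocol": tls.version(),
                "days_left": days_until_expiry(cert["notAfter"]),
            }
```

A caller might treat a small `days_left` value as "soon to expire" and catch `ssl.SSLCertVerificationError` to report an expired or mismatched certificate; the exact thresholds are a policy choice this sketch does not make.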
Security Headers · 10%
Presence and configuration of the security response headers that harden a site against common web attacks, plus the flags on cookies the site sets. Policies delivered via <meta> tags are recognized.
What it observes
- Strict-Transport-Security
- Content-Security-Policy
- X-Frame-Options / X-Content-Type-Options
- Referrer-Policy / Permissions-Policy
- Cookies without Secure, session cookies without HttpOnly
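A presence check for these headers and cookie flags could look like the sketch below. The function names are hypothetical, the header check only tests presence (not whether the configured value is sound), and "session cookie" is assumed here to mean a cookie with no `Expires` or `Max-Age` attribute.

```python
from http.cookies import SimpleCookie

EXPECTED_HEADERS = [
    "Strict-Transport-Security",
    "Content-Security-Policy",
    "X-Frame-Options",
    "X-Content-Type-Options",
    "Referrer-Policy",
    "Permissions-Policy",
]


def missing_headers(headers):
    """Return expected headers absent from a response (names are case-insensitive)."""
    present = {name.lower() for name in headers}
    return [h for h in EXPECTED_HEADERS if h.lower() not in present]


def cookie_findings(set_cookie_values):
    """Inspect raw Set-Cookie header values for missing flags."""
    findings = []
    for raw in set_cookie_values:
        jar = SimpleCookie()
        jar.load(raw)
        for name, morsel in jar.items():
            if not morsel["secure"]:
                findings.append((name, "missing Secure"))
            # Assumption: no Expires/Max-Age means a session cookie.
            is_session = not morsel["expires"] and not morsel["max-age"]
            if is_session and not morsel["httponly"]:
                findings.append((name, "missing HttpOnly"))
    return findings
```

For example, `cookie_findings(["sid=abc; Path=/"])` reports both flags missing for `sid`, while a persistent preference cookie marked `Secure` produces no findings.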
Infrastructure Hygiene · 5%
Passive, standard-crawler-equivalent checks: a small set of well-known paths, version disclosure on the root page, and DNS-only checks of the domain's email anti-spoofing records. Deliberately limited: no brute forcing, no auth bypass, nothing that could be construed as intrusion testing.
What it observes
- Exposed /.git/HEAD or /.env
- Open directory listings
- Common backup-file patterns (backup.sql, site.zip)
- CMS / server version disclosure
- Missing SPF or DMARC records (patient-phishing exposure)
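The email anti-spoofing check reduces to looking for two TXT records: an SPF record on the domain itself and a DMARC record on its `_dmarc` subdomain. The sketch below assumes the TXT strings have already been fetched (Python's standard library has no TXT lookup, so the resolver is out of scope), and `email_spoofing_findings` is a hypothetical name.

```python
def email_spoofing_findings(root_txt, dmarc_txt):
    """Report missing SPF / DMARC records.

    root_txt:  TXT strings published at the domain itself (example.com)
    dmarc_txt: TXT strings published at _dmarc.example.com
    """

    def is_spf(record):
        # An SPF record is exactly "v=spf1" or starts with "v=spf1 ".
        value = record.strip().lower()
        return value == "v=spf1" or value.startswith("v=spf1 ")

    def is_dmarc(record):
        # Lenient on case; DMARC records begin with the version tag.
        return record.strip().lower().startswith("v=dmarc1")

    findings = []
    if not any(is_spf(r) for r in root_txt):
        findings.append("missing_spf")
    if not any(is_dmarc(r) for r in dmarc_txt):
        findings.append("missing_dmarc")
    return findings
```

This only tests for presence. Judging whether a published policy is strict enough (for instance `p=none` versus `p=reject`) would be a further check not covered here.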
Why patient-facing pages weigh more
The crawler classifies each page (privacy_policy, intake_form, appointment, portal, general, unknown). That classification doesn’t just decide which analyzers run — it scales severity. A tracker that sends data from a page where a patient enters symptoms or books a visit is a materially different risk from the same tracker on a homepage, and the score reflects that.
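One way to picture the scaling is as a step adjustment applied to a finding's base severity. Everything numeric in this sketch is an assumption made for illustration: the five-level `Severity` scale, the two-step bump on patient-facing pages, and the one-step reduction for consent-gated scripts are not the product's actual values.

```python
from enum import IntEnum


class Severity(IntEnum):
    INFO = 0
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


# Page classes on which a patient enters or views health information.
PATIENT_FACING = {"intake_form", "appointment", "portal"}


def scaled_severity(base, page_class, consent_gated=False):
    """Adjust a finding's base severity for page class and consent gating.

    The step sizes (+2 and -1) are illustrative assumptions only.
    """
    level = int(base)
    if page_class in PATIENT_FACING:
        level += 2
    if consent_gated:
        level -= 1
    # Clamp to the defined range.
    level = max(Severity.INFO, min(Severity.CRITICAL, level))
    return Severity(level)
```

Under these assumed steps, a tracker rated `LOW` on a `general` page would be reported as `HIGH` on an `intake_form` page, and as `MEDIUM` on a `portal` page when it only loads after consent.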