Categories

Every finding belongs to one of six categories. Each is produced by a dedicated analyzer that runs on the pages where it's relevant. Here's what each one observes — and why patient-facing pages weigh more.

Tracking & Third-Party Exposure · 30%

An inventory of third-party scripts and pixels — analytics, advertising, session-replay tools, live-chat widgets, and tracking embeds — matched against a maintained signature catalog. The differentiator: the same tracker is flagged at much higher severity on a page classified as intake_form, appointment, or portal than on a generic marketing page. Consent-gated scripts are reported separately at reduced severity.

What it observes

GA4 / Universal Analytics
Meta (Facebook) Pixel
TikTok / LinkedIn / Pinterest / Reddit and other ad-network tags
Hotjar / FullStory and other session-replay tools
Live-chat widgets (Intercom, Drift, Tawk.to, Tidio, Crisp, LiveChat, Zendesk)
Cookie-mode YouTube embeds
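
The catalog match described above can be sketched as a hostname-suffix lookup. This is a minimal illustration, not the analyzer's actual implementation: the `SIGNATURES` table, its labels, and the function names `match_tracker` and `inventory` are assumptions made for the example, and a real catalog would cover far more vendors and likely match on more than hostnames.

```python
from urllib.parse import urlparse

# Hypothetical signature catalog: hostname suffix -> tracker label.
SIGNATURES = {
    "google-analytics.com": "Google Analytics",
    "googletagmanager.com": "Google Tag Manager",
    "connect.facebook.net": "Meta Pixel",
    "hotjar.com": "Hotjar",
    "fullstory.com": "FullStory",
}


def match_tracker(script_url):
    """Return the tracker label for a script URL, or None if unmatched."""
    host = (urlparse(script_url).hostname or "").lower()
    for suffix, label in SIGNATURES.items():
        # Match the domain itself or any subdomain of it.
        if host == suffix or host.endswith("." + suffix):
            return label
    return None


def inventory(script_urls):
    """Return the sorted, de-duplicated tracker labels found on a page."""
    labels = {match_tracker(url) for url in script_urls}
    labels.discard(None)
    return sorted(labels)
```

Matching on a dot-prefixed suffix keeps a look-alike host such as `nothotjar.com` from being mistaken for `hotjar.com`.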

Privacy Policy & Disclosures · 20%

An analysis of the privacy-policy page (located by the crawler) against a fixed disclosure schema. It checks whether the policy exists and whether it discloses the things a patient-facing site usually should.

What it observes

Policy exists and is reachable
Mentions HIPAA / PHI
Discloses third-party sharing
Discloses analytics / advertising use
Has a last-updated date and contact method
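
One way to picture a fixed disclosure schema is a set of named checks, each a pattern searched for in the policy text. The sketch below assumes simple keyword matching; the `DISCLOSURE_CHECKS` names and patterns and the `check_policy` function are hypothetical, and the real analyzer may use a different method.

```python
import re

# Hypothetical disclosure schema: check name -> regex over the policy text.
DISCLOSURE_CHECKS = {
    "mentions_hipaa_phi": r"\bHIPAA\b|protected health information|\bPHI\b",
    "third_party_sharing": r"third[- ]part(y|ies)",
    "analytics_advertising": r"\banalytics\b|\badvertis",
    "last_updated": r"last updated|last revised|effective date",
    "contact_method": r"[\w.+-]+@[\w-]+\.[\w.-]+|\bcontact us\b",
}


def check_policy(text):
    """Return a dict of check name -> bool for one privacy-policy text."""
    if not text or not text.strip():
        # No reachable policy: every disclosure check fails.
        return {"policy_exists": False, **{k: False for k in DISCLOSURE_CHECKS}}
    result = {"policy_exists": True}
    for name, pattern in DISCLOSURE_CHECKS.items():
        result[name] = re.search(pattern, text, re.IGNORECASE) is not None
    return result
```

Keyword matching only shows that a topic is mentioned, not that the disclosure is adequate, so results like these are best read as prompts for human review.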

Forms & PHI Exposure · 20%

An enumeration of forms on classified pages, with field-name heuristics for PHI-shaped inputs (date of birth, SSN, insurance ID, symptoms, medications, reason for visit) and checks on how those forms transmit data. Severity reflects what the channel can actually leak: free-text entries that end up in a URL are scored more severely than fixed dropdown choices.

What it observes

method="GET" on a PHI-shaped form
mailto: form actions (submissions as plain email)
Form action pointing to a third-party domain
Missing autocomplete controls on sensitive fields
Form submitted over plain HTTP
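
The form checks above can be approximated with the standard-library HTML parser. In this sketch the `PHI_HINTS` substrings, the finding labels, and the names `FormCollector` and `audit_forms` are all assumptions made for illustration. It also reports only on forms that contain a PHI-shaped field, which may be narrower than what the real analyzer does, and it leaves out the autocomplete check.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Hypothetical field-name hints for PHI-shaped inputs.
PHI_HINTS = ("dob", "birth", "ssn", "insurance", "symptom", "medication", "reason")


class FormCollector(HTMLParser):
    """Collect each form's method, action, and PHI-shaped field names."""

    def __init__(self):
        super().__init__()
        self.forms = []
        self._current = None

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "form":
            # HTML forms default to GET when no method is given.
            self._current = {"method": (a.get("method") or "get").lower(),
                             "action": a.get("action") or "",
                             "phi_fields": []}
            self.forms.append(self._current)
        elif tag in ("input", "textarea", "select") and self._current is not None:
            name = (a.get("name") or a.get("id") or "").lower()
            if any(hint in name for hint in PHI_HINTS):
                self._current["phi_fields"].append(name)

    def handle_endtag(self, tag):
        if tag == "form":
            self._current = None


def audit_forms(html, page_url):
    """Return finding labels for PHI-shaped forms on one page."""
    collector = FormCollector()
    collector.feed(html)
    page_host = urlparse(page_url).hostname
    findings = []
    for form in collector.forms:
        if not form["phi_fields"]:
            continue
        # Resolve relative actions against the page URL before inspecting them.
        target = urlparse(urljoin(page_url, form["action"]))
        if form["method"] == "get":
            findings.append("phi_form_uses_get")
        if target.scheme == "mailto":
            findings.append("mailto_action")
        elif target.scheme == "http":
            findings.append("submits_over_http")
        if target.hostname and target.hostname != page_host:
            findings.append("third_party_action")
    return findings
```

Resolving the action with `urljoin` means a relative action on a plain-HTTP page is still caught as an HTTP submission.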

Transport Security (TLS/HTTPS) · 15%

Transport-layer checks using the standard library: certificate validity and expiry, negotiated and accepted protocol versions, hostname match, whether the plain-HTTP version redirects to HTTPS, and detection of mixed content (HTTP sub-resources loaded on an HTTPS page).

What it observes

Expired or soon-to-expire certificate
Hostname / certificate mismatch
Outdated TLS protocol negotiated, or legacy TLS (1.1 and older) still accepted
Plain-HTTP version not redirecting to HTTPS
Mixed content on a secure page
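
Several of these checks map onto Python's standard `ssl`, `socket`, and `html.parser` modules. The sketch below assumes Python is the implementation language, which the text does not state. The function names, the `LEGACY_PROTOCOLS` set, and the choice of which tags count as sub-resources are illustrative.

```python
import socket
import ssl
import time
from html.parser import HTMLParser

# Protocol names as reported by SSLSocket.version().
LEGACY_PROTOCOLS = {"SSLv3", "TLSv1", "TLSv1.1"}


def days_until_expiry(not_after, now=None):
    """Days left on a certificate, given its 'notAfter' string."""
    expires = ssl.cert_time_to_seconds(not_after)
    now = time.time() if now is None else now
    return int((expires - now) // 86400)


def is_legacy(protocol):
    """True if the protocol name is TLS 1.1 or older."""
    return protocol in LEGACY_PROTOCOLS


def probe_tls(host, port=443, timeout=5.0):
    """Connect once; report the negotiated protocol and days to expiry.

    An expired certificate or a hostname mismatch raises
    ssl.SSLCertVerificationError, which a caller can record as a finding.
    """
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            cert = tls.getpeercert()
            return {"protocol": tls.version(),
                    "days_left": days_until_expiry(cert["notAfter"])}


class MixedContentFinder(HTMLParser):
    """Collect http:// sub-resource URLs (src attributes and <link href>)."""

    def __init__(self):
        super().__init__()
        self.insecure = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            is_resource = name == "src" or (tag == "link" and name == "href")
            if is_resource and value and value.lower().startswith("http://"):
                self.insecure.append(value)


def find_mixed_content(html):
    finder = MixedContentFinder()
    finder.feed(html)
    return finder.insecure
```

A single probe with the default context only shows what the client and server negotiated. Detecting that a server still accepts legacy TLS would need a separate connection attempt restricted to older versions, which many modern OpenSSL builds refuse by default.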

Security Headers · 10%

Presence and configuration of the security response headers that harden a site against common web attacks, plus the flags on cookies the site sets. Policies delivered via <meta> tags are recognized.

What it observes

Strict-Transport-Security
Content-Security-Policy
X-Frame-Options / X-Content-Type-Options
Referrer-Policy / Permissions-Policy
Cookies without Secure, session cookies without HttpOnly
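
A presence check for the headers and cookie flags listed above might look like the following sketch. The function names and finding strings are hypothetical. The sketch also reads "session cookie" as a cookie with no Expires or Max-Age attribute, which is an assumption about what the analyzer means, and it does not evaluate header values or policies delivered via <meta> tags.

```python
REQUIRED_HEADERS = (
    "Strict-Transport-Security",
    "Content-Security-Policy",
    "X-Frame-Options",
    "X-Content-Type-Options",
    "Referrer-Policy",
    "Permissions-Policy",
)


def missing_headers(headers):
    """Return required headers absent from a response (case-insensitive)."""
    present = {name.lower() for name in headers}
    return [h for h in REQUIRED_HEADERS if h.lower() not in present]


def cookie_findings(set_cookie_values):
    """Inspect raw Set-Cookie header values for missing flags."""
    findings = []
    for raw in set_cookie_values:
        parts = [p.strip() for p in raw.split(";")]
        name = parts[0].split("=", 1)[0]
        # Attribute names after the first name=value pair, lowercased.
        flags = {p.split("=", 1)[0].lower() for p in parts[1:]}
        if "secure" not in flags:
            findings.append((name, "missing Secure"))
        # Assumption: no Expires/Max-Age means a browser-session cookie.
        is_session = not ({"expires", "max-age"} & flags)
        if is_session and "httponly" not in flags:
            findings.append((name, "session cookie missing HttpOnly"))
    return findings
```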

Infrastructure Hygiene · 5%

Passive, standard-crawler-equivalent checks: a small set of well-known paths, version disclosure on the root page, and DNS-only checks of the domain's email anti-spoofing records. Deliberately limited: no brute forcing, no auth bypass, nothing that could be construed as intrusion testing.

What it observes

Exposed /.git/HEAD or /.env
Open directory listings
Common backup-file patterns (backup.sql, site.zip)
CMS / server version disclosure
Missing SPF or DMARC records (patient-phishing exposure)
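
Two of these checks reduce to inspecting text that has already been fetched. The sketch below assumes the TXT records and the response body were retrieved elsewhere, since the Python standard library has no TXT-record lookup. The function names and finding labels are hypothetical.

```python
import re


def email_spoofing_findings(domain_txt, dmarc_txt):
    """Classify TXT records for a domain.

    domain_txt: TXT strings published at the domain itself (SPF lives here).
    dmarc_txt:  TXT strings published at _dmarc.<domain>.
    """
    findings = []
    if not any(r.strip().lower().startswith("v=spf1") for r in domain_txt):
        findings.append("missing_spf")
    if not any(r.strip().lower().startswith("v=dmarc1") for r in dmarc_txt):
        findings.append("missing_dmarc")
    return findings


def looks_like_git_head(body):
    """True if a response body resembles a real .git/HEAD file.

    A genuine HEAD file holds either a symbolic ref or a 40-hex commit id,
    which helps rule out custom 404 pages that return HTTP 200.
    """
    text = body.strip()
    return text.startswith("ref: refs/") or bool(re.fullmatch(r"[0-9a-f]{40}", text))
```

Checking the body shape rather than the status code alone avoids flagging sites that answer every unknown path with a friendly 200 page.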

Why patient-facing pages weigh more

The crawler classifies each page (privacy_policy, intake_form, appointment, portal, general, unknown). That classification doesn’t just decide which analyzers run — it scales severity. A tracker that sends data from a page where a patient enters symptoms or books a visit is a materially different risk from the same tracker on a homepage, and the score reflects that.
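
As a rough illustration of how page classification could scale severity, the sketch below multiplies a base severity by a per-class factor. The multiplier values, the consent-gating factor, and the `scaled_severity` function are invented for the example and are not the product's actual numbers. Only the page-class names come from the text above.

```python
# Hypothetical multipliers; the real values are not given in this section.
PAGE_MULTIPLIER = {
    "intake_form": 3.0,
    "appointment": 3.0,
    "portal": 3.0,
    "privacy_policy": 1.0,
    "general": 1.0,
    "unknown": 1.0,
}

# Hypothetical reduction applied to consent-gated scripts.
CONSENT_GATED_FACTOR = 0.5


def scaled_severity(base, page_type, consent_gated=False):
    """Scale a finding's base severity by the class of page it was seen on."""
    score = base * PAGE_MULTIPLIER.get(page_type, 1.0)
    if consent_gated:
        score *= CONSENT_GATED_FACTOR
    return score
```

With these example values, the same tracker scores 2.0 on a general page and 6.0 on a portal page, which is the kind of gap the paragraph above describes.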
