Categories

Every finding belongs to one of six categories, each produced by a dedicated analyzer that runs on the pages where it's relevant. The percentage beside each category name is its weight in the overall score. Below is what each analyzer observes, followed by why patient-facing pages weigh more.

Tracking & Third-Party Exposure · 30%

An inventory of third-party scripts and pixels (analytics, advertising, session-replay tools, live-chat widgets, and tracking embeds), matched against a maintained signature catalog. What sets this category apart is page context: the same tracker is flagged at much higher severity on a page classified as intake_form, appointment, or portal than on a generic marketing page. Consent-gated scripts are reported separately, at reduced severity.

What it observes

  • GA4 / Universal Analytics
  • Meta (Facebook) Pixel
  • TikTok / LinkedIn / Pinterest / Reddit and other ad-network tags
  • Hotjar / FullStory and other session-replay tools
  • Live-chat widgets (Intercom, Drift, Tawk.to, Tidio, Crisp, LiveChat, Zendesk)
  • YouTube embeds in cookie mode (youtube.com rather than the privacy-enhanced youtube-nocookie.com)
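Here is a minimal Python sketch of signature matching with page-context severity. It assumes the catalog maps host suffixes to tracker names. The catalog entries, the severity ladder, the one-step reduction for consent-gated scripts, and the function names (`match_signature`, `classify_trackers`) are all hypothetical illustrations, not the analyzer's actual rules.

```python
from dataclasses import dataclass
from urllib.parse import urlparse

# Hypothetical signature catalog: host suffix -> (tracker name, kind).
# Illustrative entries only; the real catalog is maintained separately.
SIGNATURES = {
    "google-analytics.com": ("Google Analytics", "analytics"),
    "connect.facebook.net": ("Meta Pixel", "advertising"),
    "analytics.tiktok.com": ("TikTok Pixel", "advertising"),
    "static.hotjar.com": ("Hotjar", "session_replay"),
    "widget.intercom.io": ("Intercom", "live_chat"),
    "youtube.com": ("YouTube embed (cookie mode)", "embed"),
}

# Page classes the docs treat as patient-facing.
SENSITIVE_PAGES = {"intake_form", "appointment", "portal"}
SEVERITIES = ["info", "low", "medium", "high"]  # hypothetical ladder


@dataclass
class TrackerFinding:
    url: str
    tracker: str
    kind: str
    severity: str
    consent_gated: bool


def match_signature(url):
    """Return (name, kind) if the URL's host matches a catalog entry."""
    host = (urlparse(url).hostname or "").lower()
    for suffix, info in SIGNATURES.items():
        if host == suffix or host.endswith("." + suffix):
            return info
    return None


def classify_trackers(resource_urls, page_class, consent_gated=frozenset()):
    """Flag known trackers among the script/iframe URLs found on one page."""
    findings = []
    for url in resource_urls:
        info = match_signature(url)
        if info is None:
            continue
        name, kind = info
        # Same tracker, very different severity depending on page context.
        level = SEVERITIES.index("high" if page_class in SENSITIVE_PAGES else "low")
        gated = url in consent_gated
        if gated:
            level = max(level - 1, 0)  # consent-gated: one step lower
        findings.append(TrackerFinding(url, name, kind, SEVERITIES[level], gated))
    return findings
```

With these assumptions, the same Meta Pixel URL is reported as `high` on an intake_form page and `low` on a general page. A `youtube-nocookie.com` embed does not match at all.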

Privacy Policy & Disclosures · 20%

An analysis of the privacy-policy page (located by the crawler) against a fixed disclosure schema. It checks whether the policy exists and whether it makes the disclosures a patient-facing site is usually expected to make.

What it observes

  • Policy exists and is reachable
  • Mentions HIPAA / PHI
  • Discloses third-party sharing
  • Discloses analytics / advertising use
  • Has a last-updated date and contact method
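One simple way to check a policy against a fixed disclosure schema is keyword matching, sketched below in Python. The schema keys and regular expressions are hypothetical. A real schema would likely need more than keywords, because a policy can mention a term only to say it does not apply.

```python
import re

# Hypothetical disclosure schema: check name -> case-insensitive regex.
# Keyword matching is a deliberate simplification.
DISCLOSURE_SCHEMA = {
    "mentions_hipaa_phi": r"\bHIPAA\b|protected health information|\bPHI\b",
    "third_party_sharing": r"third[- ]part(?:y|ies)",
    "analytics_advertising": r"\banalytics\b|\badvertis",
    "last_updated": r"last\s+(?:updated|revised|modified)|effective\s+date",
    "contact_method": r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+|\bcontact us\b",
}


def check_policy(policy_text):
    """policy_text is None when the crawler found no reachable policy page."""
    if policy_text is None:
        # No policy: every disclosure check fails along with "exists".
        return {"exists": False, **{name: False for name in DISCLOSURE_SCHEMA}}
    results = {"exists": True}
    for name, pattern in DISCLOSURE_SCHEMA.items():
        results[name] = re.search(pattern, policy_text, re.IGNORECASE) is not None
    return results
```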

Forms & PHI Exposure · 20%

An enumeration of forms on classified pages. It uses field-name heuristics to spot PHI-shaped inputs (date of birth, SSN, insurance ID, symptoms, medications, reason for visit) and checks how those forms transmit data. Severity reflects what the channel can actually leak: free-text entries carried in a URL score higher than fixed dropdown choices.

What it observes

  • method="GET" on a PHI-shaped form
  • mailto: form actions (submissions as plain email)
  • Form action pointing to a third-party domain
  • Missing autocomplete controls on sensitive fields
  • Form submitted over plain HTTP
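Below is a Python sketch of the form checks that uses only `html.parser` from the standard library. Several parts are assumptions made for illustration: the PHI name hints, the severity labels, and treating an absent `autocomplete` attribute as the "missing control".

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Hypothetical substrings that mark a field name as PHI-shaped.
PHI_NAME_HINTS = ("dob", "birth", "ssn", "insurance", "symptom", "medication", "reason")


class FormCollector(HTMLParser):
    """Collects each <form> with its method, action, and named fields."""

    def __init__(self):
        super().__init__()
        self.forms = []
        self._in_form = False

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "form":
            self._in_form = True
            self.forms.append({
                "method": (a.get("method") or "get").lower(),  # HTML default is GET
                "action": a.get("action") or "",
                "fields": [],
            })
        elif tag in ("input", "textarea", "select") and self._in_form:
            self.forms[-1]["fields"].append({
                "tag": tag,
                "name": (a.get("name") or a.get("id") or "").lower(),
                "autocomplete": a.get("autocomplete"),
            })

    def handle_endtag(self, tag):
        if tag == "form":
            self._in_form = False


def audit_forms(html, page_url):
    collector = FormCollector()
    collector.feed(html)
    page_host = urlparse(page_url).hostname
    findings = []
    for form in collector.forms:
        phi = [f for f in form["fields"] if any(h in f["name"] for h in PHI_NAME_HINTS)]
        action = urljoin(page_url, form["action"])  # empty action = same page
        parts = urlparse(action)
        if phi and form["method"] == "get":
            # Typed values in the query string leak more than fixed <select> choices.
            typed = any(f["tag"] != "select" for f in phi)
            findings.append(("get_method", "high" if typed else "medium"))
        if parts.scheme == "mailto":
            findings.append(("mailto_action", "high"))
        elif parts.scheme == "http":
            findings.append(("plain_http_submit", "high"))
        if parts.scheme in ("http", "https") and parts.hostname != page_host:
            findings.append(("third_party_action", "medium"))
        for f in phi:
            if f["autocomplete"] is None:
                findings.append(("missing_autocomplete:" + f["name"], "low"))
    return findings
```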

Transport Security (TLS/HTTPS) · 15%

Transport-layer checks using the standard library: certificate validity and expiry, negotiated and accepted protocol versions, hostname match, whether the plain-HTTP version redirects to HTTPS, and mixed content (HTTP sub-resources loaded on an HTTPS page).

What it observes

  • Expired or soon-to-expire certificate
  • Hostname / certificate mismatch
  • Outdated TLS protocol negotiated, or legacy TLS (1.1 and older) still accepted
  • Plain-HTTP version not redirecting to HTTPS
  • Mixed content on a secure page
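The certificate and protocol checks could look roughly like the sketch below, assuming "the standard library" means Python's `ssl` module. `ssl.create_default_context()` already rejects expired certificates and hostname mismatches during the handshake, so the sketch reports those through `SSLCertVerificationError`. The 30-day threshold and the finding names are hypothetical. The probe for legacy-protocol acceptance is omitted, and the mixed-content check scans only `src` attributes.

```python
import re
import socket
import ssl
import time

LEGACY_PROTOCOLS = {"SSLv3", "TLSv1", "TLSv1.1"}
EXPIRY_WARNING_DAYS = 30  # hypothetical "soon to expire" threshold


def days_until_expiry(not_after, now=None):
    """not_after uses getpeercert()'s format, e.g. 'Jan  1 00:00:00 2030 GMT'."""
    now = time.time() if now is None else now
    return (ssl.cert_time_to_seconds(not_after) - now) / 86400


def probe_tls(host, port=443, timeout=10.0):
    """One verified handshake; network errors other than verification propagate."""
    ctx = ssl.create_default_context()  # checks chain, validity dates, hostname
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with ctx.wrap_socket(sock, server_hostname=host) as tls:
                cert = tls.getpeercert()
                return {"protocol": tls.version(),
                        "days_left": days_until_expiry(cert["notAfter"])}
    except ssl.SSLCertVerificationError as exc:
        # Expired certificates and hostname mismatches both surface here.
        return {"error": exc.verify_message}


def tls_findings(probe):
    if "error" in probe:
        return [("certificate_invalid", probe["error"])]
    findings = []
    if probe["days_left"] < EXPIRY_WARNING_DAYS:
        findings.append(("certificate_expiring", f"{probe['days_left']:.0f} days left"))
    if probe["protocol"] in LEGACY_PROTOCOLS:
        findings.append(("legacy_protocol_negotiated", probe["protocol"]))
    return findings


# Sub-resources loaded over http:// from an HTTPS page (src attributes only).
HTTP_SRC = re.compile(r"""\bsrc\s*=\s*["'](http://[^"']+)""", re.IGNORECASE)


def mixed_content(html, page_url):
    if not page_url.lower().startswith("https://"):
        return []
    return HTTP_SRC.findall(html)
```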

Security Headers · 10%

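Presence and configuration of the security response headers that harden a site against common web attacks (such as clickjacking, MIME sniffing, cross-site scripting, and protocol downgrades), plus the flags on any cookies the site sets. Policies delivered through <meta> tags, such as a Content-Security-Policy set with http-equiv, are also recognized.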

What it observes

  • Strict-Transport-Security
  • Content-Security-Policy
  • X-Frame-Options / X-Content-Type-Options
  • Referrer-Policy / Permissions-Policy
  • Cookies without Secure, session cookies without HttpOnly
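The next Python sketch covers the header and cookie checks. It tests whether each header is present, not what value it carries. Only Content-Security-Policy and Referrer-Policy can be delivered through a `<meta>` tag, so the sketch accepts only those two from markup. The session-cookie name heuristic and the function names are hypothetical.

```python
EXPECTED_HEADERS = (
    "strict-transport-security",
    "content-security-policy",
    "x-frame-options",
    "x-content-type-options",
    "referrer-policy",
    "permissions-policy",
)
# Only these can come from markup: CSP via <meta http-equiv="Content-Security-Policy">,
# Referrer-Policy via <meta name="referrer"> (the caller maps it to "referrer-policy").
META_CAPABLE = {"content-security-policy", "referrer-policy"}
# Hypothetical name heuristic for session cookies.
SESSION_NAME_HINTS = ("sess", "sid", "auth", "token")


def missing_headers(headers, meta_policies=()):
    """headers: response header mapping; meta_policies: policy names found in <meta> tags."""
    present = {name.lower() for name in headers}
    present |= {p.lower() for p in meta_policies} & META_CAPABLE
    return [h for h in EXPECTED_HEADERS if h not in present]


def cookie_findings(set_cookie_values):
    """Check each raw Set-Cookie value for the Secure and HttpOnly flags."""
    findings = []
    for raw in set_cookie_values:
        name = raw.split("=", 1)[0].strip()
        attrs = {part.split("=", 1)[0].strip().lower() for part in raw.split(";")[1:]}
        if "secure" not in attrs:
            findings.append((name, "missing Secure"))
        looks_like_session = any(h in name.lower() for h in SESSION_NAME_HINTS)
        if looks_like_session and "httponly" not in attrs:
            findings.append((name, "session cookie missing HttpOnly"))
    return findings
```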

Infrastructure Hygiene · 5%

Passive checks that go no further than a standard crawler would: a small set of well-known paths, version disclosure on the root page, and DNS-only lookups of the domain's email anti-spoofing records. The scope is deliberately limited: no brute forcing, no authentication bypass, nothing that could be construed as intrusion testing.

What it observes

  • Exposed /.git/HEAD or /.env
  • Open directory listings
  • Common backup-file patterns (backup.sql, site.zip)
  • CMS / server version disclosure
  • Missing SPF or DMARC records (patient-phishing exposure)
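Once a response or record is in hand, the passive checks reduce to small content tests, sketched here in Python. Matching on content as well as status code guards against sites that return a 200 page for every path. The fingerprints, the version regex, and the function names are illustrative assumptions. Fetching the paths and resolving the TXT records are left out.

```python
import re


# Content fingerprints, so a catch-all page served with status 200 is not a hit.
def looks_like_git_head(body):
    return body.startswith("ref: refs/") or re.fullmatch(r"[0-9a-f]{40}\s*", body) is not None


def looks_like_directory_listing(html):
    # Apache and nginx auto-index pages both title themselves "Index of /..."
    return re.search(r"<title>\s*Index of /", html, re.IGNORECASE) is not None


# Product/version pairs as seen in Server, X-Powered-By, or <meta name="generator">.
VERSION_RE = re.compile(
    r"\b(Apache|nginx|Microsoft-IIS|PHP|WordPress)[/ ](\d+(?:\.\d+)+)", re.IGNORECASE)


def version_disclosures(values):
    found = []
    for value in values:
        found.extend(VERSION_RE.findall(value))
    return found


def email_auth_findings(apex_txt, dmarc_txt):
    """apex_txt: TXT strings at the domain; dmarc_txt: TXT strings at _dmarc.<domain>.
    Fetching them needs a DNS library (e.g. dnspython); the stdlib has no TXT lookup."""
    findings = []
    spf = [t.strip().lower() for t in apex_txt]
    if not any(t == "v=spf1" or t.startswith("v=spf1 ") for t in spf):
        findings.append("missing SPF")
    if not any(re.match(r"v\s*=\s*DMARC1\s*(?:;|$)", t.strip()) for t in dmarc_txt):
        findings.append("missing DMARC")
    return findings
```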

Why patient-facing pages weigh more

The crawler classifies each page as privacy_policy, intake_form, appointment, portal, general, or unknown. That classification does more than decide which analyzers run; it also scales severity. A tracker that sends data from a page where a patient enters symptoms or books a visit is a materially different risk from the same tracker on a homepage, and the score reflects that.
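The Python sketch below shows one way page classification and the category weights above might combine into a score. Only the six percentages come from this section. The per-severity penalties, the 3x multiplier for patient-facing pages, and the subtract-from-100 scoring are placeholders chosen for illustration.

```python
# Category weights (percent), as listed in this section.
CATEGORY_WEIGHTS = {
    "tracking": 30, "privacy_policy": 20, "forms": 20,
    "transport": 15, "headers": 10, "infrastructure": 5,
}
# Hypothetical values: the section does not publish penalties or multipliers.
BASE_PENALTY = {"info": 0, "low": 2, "medium": 5, "high": 10}
PAGE_MULTIPLIER = {"intake_form": 3.0, "appointment": 3.0, "portal": 3.0}  # others: 1.0


def category_score(findings):
    """findings: (severity, page_class) pairs for one category; returns 0-100."""
    penalty = sum(BASE_PENALTY[sev] * PAGE_MULTIPLIER.get(page, 1.0) for sev, page in findings)
    return max(0.0, 100.0 - penalty)


def overall_score(findings_by_category):
    """Weighted average of the category scores, on a 0-100 scale."""
    total = sum(weight * category_score(findings_by_category.get(cat, []))
                for cat, weight in CATEGORY_WEIGHTS.items())
    return total / 100
```

With these placeholder numbers, one high-severity tracker on a portal page lowers the overall score by 9 points. The same tracker on a general page lowers it by 3.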