How it works

Crawl, classify, score — then tell you what to fix.

Sift Health reads the public pages of a healthcare website the way a careful visitor would, looks for the risk indicators that have driven real enforcement, and hands you a prioritized list. It is passive, public-pages-only, and it never claims to be a compliance audit.

The scanning pipeline

  1. Crawl the public site

    A bounded crawl starts from your root domain: it reads robots.txt and the sitemap, then probes a curated list of patient-facing paths such as /privacy-policy, /appointments, /book, /patient-forms, and /portal. It stops at roughly 15–20 pages. Only publicly accessible pages are requested; nothing behind a login is fetched.

  2. Classify each page

    Every page is classified heuristically — privacy_policy, intake_form, appointment, portal, general, or unknown. This classification is the hinge of the whole product: it decides which analyzers run on a page and how heavily any finding there is weighted.

  3. Run the analyzers

    Page by page, the relevant analyzers run: a third-party tracker and chat-widget inventory everywhere, TLS and security-header checks everywhere, a form analyzer on intake and appointment pages, a privacy-policy analyzer on the policy page, passive infrastructure-hygiene checks against well-known paths, and DNS-only checks of the domain's email anti-spoofing records.

  4. Score the risk

    Findings roll up into six category scores and a single 0–100 overall score with an A–F grade. Severity and page type both shape the penalty, so a tracker on an appointment page costs far more than the same tracker on a homepage.

  5. Recommend fixes

    Every finding links to a remediation-catalog entry: what it is, why it matters, the step-by-step fix, and references. You leave with a prioritized, actionable list — not just a number.
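The first three steps above can be sketched in Python as a minimal, hypothetical illustration, not Sift Health's actual implementation. The curated paths are the examples named in step 1; everything else is an assumption: the 20-page budget (the upper end of the stated range), honoring robots.txt Disallow rules, putting curated paths ahead of sitemap entries, the keyword rules in `classify_page`, the analyzer names, and running the well-known-path and DNS checks once per domain. Fetching and the analyzers themselves are left out so the planning logic runs offline, with robots.txt text and sitemap URLs passed in as plain data.

```python
import re
from urllib.parse import urljoin, urlparse
from urllib.robotparser import RobotFileParser

# Example paths from step 1; the real curated list and budget are not published.
CURATED_PATHS = ["/privacy-policy", "/appointments", "/book", "/patient-forms", "/portal"]
MAX_PAGES = 20  # assumption: upper end of the "roughly 15-20 pages" budget

def plan_crawl(root, robots_txt, sitemap_urls, user_agent="sift-health-sketch"):
    """Step 1: return ordered, de-duplicated, same-host URLs to request."""
    robots = RobotFileParser()
    robots.parse(robots_txt.splitlines())
    host = urlparse(root).netloc
    # Assumption: curated patient-facing paths go before sitemap entries
    # so the page budget cannot run out before they are probed.
    candidates = [root] + [urljoin(root, p) for p in CURATED_PATHS] + list(sitemap_urls)
    plan = []
    for url in candidates:
        if urlparse(url).netloc != host or url in plan:
            continue  # stay on the scanned domain; skip duplicates
        if not robots.can_fetch(user_agent, url):
            continue  # assumption: robots.txt Disallow rules are honored
        plan.append(url)
        if len(plan) == MAX_PAGES:
            break
    return plan

# Step 2: hypothetical keyword rules, checked in order against path and title.
RULES = [
    ("privacy_policy", re.compile(r"privacy", re.I)),
    ("portal", re.compile(r"portal|mychart|log-?in", re.I)),
    ("appointment", re.compile(r"appointment|book|schedul", re.I)),
    ("intake_form", re.compile(r"intake|patient-form|new-patient|registration", re.I)),
]

def classify_page(path, title="", parsed=True):
    """Return one of the six page types named in step 2."""
    if not parsed:
        return "unknown"  # e.g. the response could not be read as HTML
    text = f"{path} {title}"
    for page_type, pattern in RULES:
        if pattern.search(text):
            return page_type
    return "general"

# Step 3: hypothetical analyzer names, routed by page type.
EVERY_PAGE = ("tracker_inventory", "tls", "security_headers")
BY_PAGE_TYPE = {
    "intake_form": ("form_analyzer",),
    "appointment": ("form_analyzer",),
    "privacy_policy": ("privacy_policy_analyzer",),
}
# Assumption: well-known-path and DNS checks run once per domain, not per page.
PER_DOMAIN = ("infra_hygiene", "email_dns")

def analyzers_for(page_type):
    return EVERY_PAGE + BY_PAGE_TYPE.get(page_type, ())

def plan_jobs(pages):
    """pages: list of (url, page_type); returns (target, analyzer) pairs."""
    jobs = [(url, name) for url, page_type in pages for name in analyzers_for(page_type)]
    jobs += [("domain", name) for name in PER_DOMAIN]
    return jobs
```

Chaining them, `plan_crawl` produces the URL list, `classify_page(urlparse(url).path)` labels each URL, and `plan_jobs` expands the labeled pages into analyzer jobs. Keeping classification separate from routing mirrors the point made in step 2: the page type alone decides which analyzers a page gets.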

Six categories, weighted by real risk

The overall score is a weighted roll-up of six categories. The weights aren’t arbitrary — they put the most pressure on the exposures that have actually drawn regulator and plaintiff attention: third-party trackers on patient-facing pages.

30%

Tracking & Third-Party Exposure

Analytics, advertising, session-replay scripts, chat widgets, and tracking embeds — flagged hardest on patient-facing pages.

20%

Privacy Policy & Disclosures

Whether the policy exists and discloses analytics, advertising, and third-party sharing.

20%

Forms & PHI Exposure

Intake and appointment forms that collect PHI-like fields: how they transmit data and where they send it, including GET submissions, mailto: actions, and third-party endpoints.

15%

Transport Security (TLS/HTTPS)

Certificate validity, legacy protocol acceptance, hostname match, HTTP→HTTPS redirect, and mixed content.

10%

Security Headers

HSTS, CSP, X-Frame-Options, X-Content-Type-Options, Referrer-Policy, Permissions-Policy, and cookie flags.

5%

Infrastructure Hygiene

Passive checks for exposed files, open directories, version disclosure, and the domain's email anti-spoofing records (SPF/DMARC).

Draft weights, tuned with pilot data. The exact math is in the scoring concept doc.
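A minimal sketch of the roll-up, assuming a simple subtract-from-100 model per category. Only the six category weights come from this page; the severity points, page-type multipliers, grade cutoffs, and function names are hypothetical stand-ins, since the actual math lives in the scoring concept doc. Weights are kept as integer percentages so the weighted sum stays exact.

```python
# Category weights as listed above (draft), in percent.
WEIGHTS = {
    "tracking": 30, "privacy": 20, "forms": 20,
    "tls": 15, "headers": 10, "infrastructure": 5,
}
# Hypothetical penalty inputs; not the real scoring constants.
SEVERITY_POINTS = {"low": 2, "medium": 5, "high": 10, "critical": 20}
PAGE_MULTIPLIER = {
    "appointment": 3, "intake_form": 3, "portal": 2,
    "privacy_policy": 1, "general": 1, "unknown": 1,
}
GRADE_CUTOFFS = [(90, "A"), (80, "B"), (70, "C"), (60, "D")]  # hypothetical

def penalty(severity, page_type):
    """Severity and page type both shape the penalty."""
    return SEVERITY_POINTS[severity] * PAGE_MULTIPLIER[page_type]

def category_scores(findings):
    """findings: iterable of (category, severity, page_type); each category starts at 100."""
    scores = {category: 100 for category in WEIGHTS}
    for category, severity, page_type in findings:
        scores[category] = max(0, scores[category] - penalty(severity, page_type))
    return scores

def overall_score(scores):
    """Weighted roll-up to 0-100; integer weights summing to 100 keep it exact."""
    return sum(WEIGHTS[c] * s for c, s in scores.items()) / 100

def grade(score):
    """Map the overall score to a letter using the hypothetical cutoffs."""
    for cutoff, letter in GRADE_CUTOFFS:
        if score >= cutoff:
            return letter
    return "F"

def prioritized(findings):
    """Order findings by penalty, largest first, to form the fix list."""
    return sorted(findings, key=lambda f: penalty(f[1], f[2]), reverse=True)
```

With these made-up numbers, and treating the homepage as a general page, one high-severity tracker costs the tracking category 30 points on an appointment page but only 10 on the homepage, so the overall score lands at 91.0 versus 97.0.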

The scope, stated plainly

Public pages only

The scanner requests the same pages any visitor or search crawler can already reach. It never logs in or touches anything behind authentication.

Passive only

Standard crawler-equivalent requests. No brute forcing, no auth-bypass attempts, nothing that could be construed as intrusion testing.

Indicators, not verdicts

Results are risk indicators and recommendations — never a compliance pass/fail. Compliance depends on safeguards no external scan can see.

Authorized targets

Scan sites you own or are engaged to assess. Recurring monitoring requires domain verification.
