Dagger — Detection Engine

Dagger is Sonomos’s PII detection engine. It combines deterministic pattern analysis with on-device AI to identify sensitive data across 62+ categories, all running locally in your browser.

How detection works

When text appears on a page — whether typed, pasted, or loaded — Dagger processes it through multiple detection layers:

Pattern analysis

Deterministic matchers target structured PII formats. They are fast and precise, and they include built-in validation (such as checksum verification for financial identifiers). Examples include:

Social Security numbers
Credit card numbers
Email addresses
Phone numbers (US and international)
IP addresses
Dates of birth
Medical record numbers
Driver’s license numbers
Passport numbers
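
To illustrate what checksum verification can look like for a financial identifier, here is a minimal sketch of the Luhn check that payment card numbers use. It is not Dagger's actual implementation: the function names and the candidate pattern (13 to 19 digits with optional space or hyphen separators) are assumptions chosen for the example.

```python
import re


def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    if len(digits) < 2:
        return False
    total = 0
    # Walk from the rightmost digit and double every second digit.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9  # same as summing the two digits of the product
        total += d
    return total % 10 == 0


# Hypothetical candidate pattern: 13 to 19 digits, optionally separated
# by single spaces or hyphens.
CARD_CANDIDATE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def find_card_numbers(text: str) -> list[str]:
    """Keep only the candidates that also pass the checksum."""
    return [
        m.group()
        for m in CARD_CANDIDATE.finditer(text)
        if luhn_valid(m.group())
    ]
```

The pattern alone would flag any 16-digit run, such as an order number. The checksum step is what rejects most of those, which is why a matcher of this kind can stay both fast and precise.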

AI-powered recognition

On-device small language models identify unstructured PII that pattern analysis alone can’t catch:

Person names in free text
Organization names
Location references
Context-dependent identifiers

All AI models run locally on your device — no data is sent to external servers for analysis.
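
The layered flow described above can be sketched as a list of detector functions whose results are merged. This is only an illustration under stated assumptions: the `Match` type, the function names, and the email pattern are hypothetical, and the "model" layer is a fixed name lookup standing in for an on-device model so that the example runs without any model files.

```python
import re
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Match:
    category: str
    start: int
    end: int
    text: str


# A detection layer is any function from text to a list of matches.
Detector = Callable[[str], list[Match]]

# Simplified email pattern, for illustration only.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def pattern_layer(text: str) -> list[Match]:
    """Deterministic layer: structured formats matched by pattern."""
    return [
        Match("email", m.start(), m.end(), m.group())
        for m in EMAIL.finditer(text)
    ]


def stub_model_layer(text: str) -> list[Match]:
    """Stand-in for a model layer: a fixed lookup, NOT a real model."""
    known_names = ["Ada Lovelace"]
    matches = []
    for name in known_names:
        for m in re.finditer(re.escape(name), text):
            matches.append(Match("person_name", m.start(), m.end(), m.group()))
    return matches


def detect(text: str, layers: list[Detector]) -> list[Match]:
    """Run every layer on the same text and return matches in document order."""
    found = [match for layer in layers for match in layer(text)]
    return sorted(found, key=lambda m: m.start)
```

Calling `detect("Ada Lovelace wrote to ada@example.com.", [pattern_layer, stub_model_layer])` returns a person-name match followed by an email match. Everything runs in-process, which mirrors the local-only design described above.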

Detection categories

Dagger currently includes 62 detectors with comprehensive test coverage. Each detector classifies matches into severity tiers:

Severity	Color	Examples
High	🔴 Red	SSN, credit card, passport, medical record
Medium	🟡 Amber	Full name, date of birth, address
Low	🟢 Green	Email, phone number, IP address

Severity drives the risk widget color and determines whether Cloak auto-masks or prompts the user.
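
As a rough sketch of how severity tiers could feed the widget color and the Cloak decision, the example below maps the categories from the table to tiers. Several parts are assumptions rather than documented behavior: the category keys are hypothetical identifiers, the widget is assumed to show the color of the most severe match, and the threshold at which Cloak auto-masks is left as a required parameter because this page does not specify it.

```python
from enum import IntEnum


class Severity(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


# Tiers follow the table above; the keys are hypothetical identifiers.
CATEGORY_SEVERITY = {
    "ssn": Severity.HIGH,
    "credit_card": Severity.HIGH,
    "passport": Severity.HIGH,
    "medical_record": Severity.HIGH,
    "full_name": Severity.MEDIUM,
    "date_of_birth": Severity.MEDIUM,
    "address": Severity.MEDIUM,
    "email": Severity.LOW,
    "phone_number": Severity.LOW,
    "ip_address": Severity.LOW,
}

SEVERITY_COLOR = {
    Severity.HIGH: "red",
    Severity.MEDIUM: "amber",
    Severity.LOW: "green",
}


def widget_color(categories):
    """Assumption: the widget reflects the most severe match on the page.

    Returns None when nothing was detected.
    """
    if not categories:
        return None
    worst = max(CATEGORY_SEVERITY[c] for c in categories)
    return SEVERITY_COLOR[worst]


def cloak_action(severity, auto_mask_from):
    """Assumption: matches at or above `auto_mask_from` are masked
    automatically and everything below prompts the user."""
    return "auto-mask" if severity >= auto_mask_from else "prompt"
```

For example, `widget_color(["email", "ssn"])` returns `"red"` because the SSN match outranks the email match.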

Image and document detection

For PII embedded in images (screenshots, scanned documents), Dagger uses optical character recognition to extract text before running it through the same detection pipeline. PDF content is also parsed and analyzed automatically.

Known limitations

ZIP code detection: In certain address formats, some edge cases involving street numbers and ZIP codes are not yet handled correctly. An improved version is in development.
Short-token false positives: Very short common words may occasionally be flagged. These are excluded from Cloak masking automatically.
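
The ZIP code ambiguity is easy to reproduce with a naive matcher. The sketch below is not Dagger's detector: both patterns are hypothetical and serve only to show why a five-digit street number and a ZIP code look identical without context, and how a context rule trades false positives for missed matches.

```python
import re

# Naive matcher: any standalone five-digit run, with an optional ZIP+4 suffix.
NAIVE_ZIP = re.compile(r"\b\d{5}(?:-\d{4})?\b")

# Context heuristic (an assumption): only accept five digits that follow
# a two-letter uppercase state abbreviation.
STATE_ANCHORED_ZIP = re.compile(r"\b[A-Z]{2}\s+(\d{5}(?:-\d{4})?)\b")

address = "12345 Main Street, Springfield, IL 62704"

naive_hits = NAIVE_ZIP.findall(address)              # street number and ZIP
anchored_hits = STATE_ANCHORED_ZIP.findall(address)  # ZIP only
```

Here `naive_hits` contains both `12345` and `62704`, while `anchored_hits` contains only `62704`. The anchored pattern in turn misses a ZIP code written without a state abbreviation, which is the kind of format-dependent edge case described above.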