Practitioner Research · Insurance AI

What we build,
what we learn, and
what the evidence shows.

We architect governed AI systems for insurers and MGAs. This is where we publish the work — implemented workflows, benchmark studies, and the reasoning behind every architectural decision.

10
Published field use cases across insurance operations
4
Domains: Underwriting, Claims, Operations, Actuarial
0 LLM
Required for 80%+ of structured insurance document fields
10
Products in the governed intelligence stack
Perspectives

What we believe about insurance AI.

Not as a vendor. As practitioners who have built these systems, debugged them in production, and had to explain them to underwriters and regulators.

01 · The real objection

Clients don't object to token cost. They object to the audit trail.

When an insurer says "we're concerned about LLM cost," the underlying question is: can you show a regulator exactly why a specific value was extracted from a specific document? Cost is the proxy for a governance concern that's harder to articulate.

02 · The extraction problem

Most extraction stacks use an LLM where regex would be more accurate.

A 70B-parameter model extracting a VIN from a police report is engineering theater. Compiled regex against a 17-character alphanumeric pattern achieves 99% exact match, zero variance, zero hallucination, and produces a verifiable character offset. The LLM is worse on this task by every measure that matters.
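A minimal sketch of what deterministic VIN extraction looks like in practice. The pattern follows the standard VIN alphabet (17 characters, with I, O, and Q excluded); the function name and the sample text are illustrative, not taken from any production stack:

```python
import re

# VINs are 17 characters drawn from [A-HJ-NPR-Z0-9]; the letters I, O, and Q
# are excluded by the standard to avoid confusion with 1 and 0.
VIN_PATTERN = re.compile(r"\b[A-HJ-NPR-Z0-9]{17}\b")

def extract_vins(text: str) -> list[dict]:
    """Return each candidate VIN with a verifiable character offset."""
    return [
        {"value": m.group(0), "start": m.start(), "end": m.end()}
        for m in VIN_PATTERN.finditer(text)
    ]

report = "Vehicle 1: VIN 1HGCM82633A004352, plate ABC-1234."
hits = extract_vins(report)
# Every hit carries start/end offsets that point back into the source text,
# so the extraction can be verified character-by-character.
```

The offsets are the audit trail: `report[hit["start"]:hit["end"]]` reproduces the value exactly, with no inference step in between.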

03 · Data sovereignty

PHI to external APIs isn't a cost decision — it's a compliance decision.

FNOL documents, police reports, and IA reports contain DOB, DL#, medical codes, and policy numbers. Every external LLM API call with these documents is a potential HIPAA exposure. No anonymization wrapper fixes this — the architecture has to ensure PHI never leaves the carrier's environment.

04 · What governed AI means

Governance isn't a compliance layer you add at the end. It's the architecture.

A governed system means every field value traces to a document position, every rule traces to a versioned definition, and every decision can be reconstructed from first principles. You can't retrofit this onto a black-box LLM pipeline. It requires the right structure from the start.

Deterministic first

Every field that can be extracted without inference should be. An LLM adds cost and variance without improving quality on structured fields.

Every value traced

Char offset for text, bounding box for images, section ID for documents. "The model said so" is not an audit trail for insurance operations.
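A sketch of what such a traced value might look like as a data structure. All field names, the class name, and the sample values are illustrative assumptions, not the actual schema:

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class TracedValue:
    """Illustrative provenance record: every extracted field carries a
    pointer back to exactly where in the source it came from."""
    field: str                          # canonical field name
    value: str                          # the extracted value itself
    source_doc: str                     # source document identifier
    char_start: Optional[int] = None    # text extraction: character offset
    char_end: Optional[int] = None
    bbox: Optional[Tuple[float, float, float, float]] = None  # image: box
    section_id: Optional[str] = None    # structured docs: section pointer
    rule_version: str = ""              # versioned rule that produced it

v = TracedValue(
    field="policy_number",
    value="POL-2024-00871",
    source_doc="fnol_1182.pdf",
    char_start=412,
    char_end=426,
    rule_version="policy_number_regex@v3",
)
```

The point of the record is that an auditor can follow `source_doc` plus the offset (or box, or section) back to the evidence, and `rule_version` back to the definition that was in force.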

Infrastructure sovereign

For carrier-cloud and air-gapped deployments, the full pipeline runs within the carrier's environment. Nothing leaves.

Human escalation is a feature

Low-confidence fields surface to handlers by design. AMICA said it directly: "duplicative, unclear data escalated to a representative." That's the right model.

Field Research

10 implemented workflows.
Documented as practitioners.

Each case documents the real problem, why generic approaches fall short, and what an architecture that actually solves it looks like. No NDA — but no implementation blueprints either.

Underwriting · 10 min read

Prior Carrier Loss Runs: From PDF Chaos to Underwriting Intelligence

What risk signals are buried in prior carrier loss runs that underwriters never see because the data arrives as PDFs with carrier-specific codes that don't map to your coverage lines?

Loss Runs · Coverage Gaps
Read →
Underwriting · 10 min read

Municipal Budgets Tell Risk Stories Underwriters Never Have Time to Decode

Counties operate jails, water plants, law enforcement, and healthcare facilities — each with distinct liability profiles. Yet underwriters get 200-page PDFs and 15 minutes. What signal is lost, and what happens after a loss when that signal surfaces?

Public Sector · Budget Analysis
Read →
Claims · 11 min read

WC FNOL: Jurisdiction Rules That Run Consistently Across All 50 States

Workers Comp FNOL routing requires applying the exact statutory rules for the state and date of injury — statute of limitations, benefit schedules, reporting deadlines. LLM-based routing introduces variance that can't be audited against the version of law that applied at time of injury.

Workers Comp · Jurisdiction · Zero LLM
Read →
Operations · 16 min read

Your AI Is Making Decisions on Data Nobody Has Certified

Your enterprise is deploying AI agents that autonomously consume data from structured databases and unstructured documents. When a regulator asks "was this data certified before your AI used it to deny this claim?" — what do you show them?

AI Certification · Data Quality · 10 Dimensions
Read →
Operations · 12 min read

What if Audits Prevented Problems Instead of Just Documenting Them?

Traditional audit happens after binding — when it's too late to prevent problems and too expensive to fix them. This case documents a shift from compliance checkbox to preventive risk intelligence that evaluates 100% of submissions before binding.

Audit · Governance · Pre-bind
Read →
Operations · 14 min read

MDM Platforms Fail When Nobody Assessed Data Readiness First

Your enterprise acquires operations in APAC. Their customer names follow different cultural patterns, addresses use unfamiliar formats, and source schemas don't align with your MDM target model. What happens when the 5–6 week manual mapping process per region becomes the bottleneck preventing global expansion?

MDM Readiness · Entity Resolution
Read →
Operations · 16 min read

Enterprise MDM Provides No Sandbox to Test Configurations Before Production

You set your blocking scheme and auto-merge threshold — but will that generate 50,000 candidate pairs or 10 million false positives? Will golden records pick current addresses or stale CRM data? You discover these answers in production, with real customer data at risk.

Golden Records · Matching Engine
Read →
Actuarial · 12 min read

AI-Driven Cohort Analysis: What if Cohort Design Was Evidence-Based, Not Traditional?

Most actuaries inherit cohort segmentation from predecessors without rigorous testing. Cohort selection is foundational to reserve accuracy — yet it's rarely questioned. This case documents automating hypothesis generation, construction, and statistical validation to find objectively better segmentations.

Reserving · Loss Triangles · Statistical Testing
Read →
Evidence Lab

How we evaluate tools.
Our benchmarks, open for scrutiny.

Framework assessments and benchmark studies grounded in insurance document realities — not vendor marketing. Updated as we conduct new research.

Published

The Six-Tier Extraction Stack

Why the question "which tool should we use?" is wrong. The right question is "which tier does this field require?" — and the tier determines the tool, the infrastructure, and the sovereignty posture.

T0 pdfplumber / PyMuPDF — No LLM · No GPU
T1 Tesseract / PaddleOCR — No LLM · Optional GPU
T2 Regex / spaCy / Lookup tables — No LLM · No GPU
T2.5 Granite-Docling-258M (IBM, Apache 2.0) — Doc VLM · 8GB
T4 Ollama + Llama 3.3 70B — Local LLM · Cat 4 only
T5 Groq / OpenAI API — External LLM · PHI risk
Tool Selection · Architecture · Cost Model
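The tier-first idea reduces to a lookup: the field determines the tier, and the tier determines the tool and sovereignty posture. The mapping below is an assumed illustration, not the published assignment:

```python
# Illustrative sketch of tier-first routing. Field names and tier
# assignments are assumed examples only.
TIER_FOR_FIELD = {
    "vin": "T2",                   # fixed pattern -> regex, no LLM, no GPU
    "policy_number": "T2",
    "scanned_table": "T2.5",       # layout-heavy -> document VLM
    "subrogation_narrative": "T4", # genuine inference -> local LLM only
}

def tier_for(field: str, default: str = "T4") -> str:
    """Route a field to its extraction tier; unknown fields escalate
    to the most capable (and most expensive) local tier by default."""
    return TIER_FOR_FIELD.get(field, default)
```

Once the tier is fixed, the tool choice, GPU requirement, and data-sovereignty posture follow from the table above rather than from per-project debate.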
In Progress

Cat 1 Field Study: Regex vs. LLM on Structured Fields

Direct comparison of compiled regex against Groq LLM on Cat 1 fields (VIN, plate, DOB, DL#, policy number) across 200 FNOL and police report documents. Metrics: field-level exact match, hallucination rate, run variance.

Hypothesis: regex F1 ≥ 0.97, hallucination = 0.00, variance = 0.00. LLM F1 0.85–0.93, non-zero variance.
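The three metrics can be pinned down precisely; these definitions are our reading of the study design (function names and the abstention convention of `None` are assumptions), under which deterministic extraction scores zero variance by construction:

```python
from typing import List, Optional

def exact_match(preds: List[Optional[str]], golds: List[str]) -> float:
    """Fraction of fields where the prediction equals ground truth exactly."""
    return sum(p == g for p, g in zip(preds, golds)) / len(golds)

def hallucination_rate(preds: List[Optional[str]], golds: List[str]) -> float:
    """Fraction of fields where a value was produced but is wrong.
    Abstaining (None) is not a hallucination; a confident wrong value is."""
    wrong = sum(p is not None and p != g for p, g in zip(preds, golds))
    return wrong / len(golds)

def run_variance(runs: List[List[Optional[str]]]) -> float:
    """Fraction of fields whose value changes across repeated runs.
    Deterministic extraction should score exactly 0.0 here."""
    changed = sum(len(set(col)) > 1 for col in zip(*runs))
    return changed / len(runs[0])

# Three identical regex runs over two fields: variance is zero by construction.
regex_runs = [["1HGCM82633A004352", "ABC1234"] for _ in range(3)]
print(run_variance(regex_runs))  # deterministic -> 0.0
```

The distinction between `exact_match` and `hallucination_rate` is what makes the comparison fair to both sides: a model that abstains on a hard field loses accuracy but does not hallucinate.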

FNOL · Police Reports · Deterministic
Notify when published →
In Progress

Scanned Document Quality: Tesseract vs. Granite-Docling-258M

Degradation-curve study at 300, 200, 150, and 75 DPI. Metrics: table TEDS, Cat 1 exact match, and DPI sensitivity coefficient across 50 scanned insurance documents.

Hypothesis: T2.5 degrades <10% vs. Tesseract at 25–30%. Table TEDS 0.90+ vs. 0.72.

OCR Quality · Scanned Docs · Table Extraction
Notify when published →
Planned

Local vs. Cloud LLM for Cat 4: Subrogation Analysis

Accuracy comparison of Ollama + Llama 3.3 70B (T4, on-premise) versus Groq API (T5, cloud) on subrogation indicator detection in IA reports. Includes a T2 trigger pre-filter effectiveness study.

Local LLM · Subrogation · Data Sovereignty
The Stack

What we build on.

The governed intelligence stack behind the use cases above. Ten capabilities across four layers — from AI-ready data pipelines to personified workbenches.

This isn't a product pitch — it's context for what makes the architectures in the field cases possible.

AI-Ready Pipelines

Assure

Data quality certification — structured and unstructured — before AI consumes it. 10-dimension composite trust score.

GA
AI-Ready Pipelines

Redact

1,100+ sensitive data types detected and redacted in-pipeline before egress. Covers PHI, PII, and PCI data under HIPAA and GDPR.

GA
AI-Ready Pipelines

DataDNA

Pipeline-native lineage registry. The trust control plane for AI data consumption across Snowflake and Databricks.

Preview
Semantic

Semantic Hub

Insurance ontology + document intelligence + knowledge repository. The context layer for governed AI.

GA
Semantic

Resolve

AI-powered entity resolution. MDM pre-assessment in days, not weeks. Customer identity that carriers trust.

GA
Governance

AI Compliance Hub

Every AI decision reconstructible from governing chunks, version, effective date, and trace path.

Preview
Workflow

Recipe Packs

Insurance agentic workflows for WC/Auto/Property claims and commercial underwriting. Each tool invocation is traceable.

GA
Apps

Workbench

Claims examiner intelligence workspace. Facts, intelligence, and evidence in a single cockpit from FNOL intake.

GA

If the problems in these cases
sound familiar, let's talk.

We work with insurers and MGAs who are serious about the architecture — not just the demo. Conversations start with the problem, not the product.