Organizations select extraction tools the same way they select software platforms: by brand recognition, benchmark scores, or vendor relationships. The result is architectural mismatch: LLMs deployed on fields that regex handles deterministically, cloud APIs processing PHI that should never leave the building, and expensive GPU inference running on documents where a 400 KB Python library would have done the same job at a fraction of the cost.

A VIN extracted from a clean PDF requires a different tier than a dollar amount from a handwritten form photographed on a phone. Choosing the wrong tier adds unnecessary LLM calls, external API exposure, and hallucination risk to fields that a deterministic method would have handled. The six-tier model gives teams a decision framework that starts with the field, not the tool.

The Six Tiers

Each tier is defined by its sovereignty posture and the field categories it handles reliably.

| Tier | Name | Tools | Best for | Sovereignty |
|---|---|---|---|---|
| T0 | Native PDF Text | pdfplumber, PyMuPDF | Clean digital PDFs: policy docs, ACORD forms, endorsements | Full (no API, no model) |
| T1 | Rule-Based OCR | Tesseract, EasyOCR | Scanned docs, structured forms, machine-print | Full |
| T2 | Regex Extraction | Python re, custom parsers | Cat 1 fields: VIN, DL#, DOB, policy#, claim#, ICD-10 codes | Full |
| T2.5 | Document VLM (Local) | Granite-Docling-258M (IBM, Apache 2.0) | Complex layouts, tables, mixed content: loss run schedules, IA reports | Full (8GB, runs on-premise) |
| T4 | Local LLM | Ollama + Llama 3.3 70B | Cat 3 and Cat 4 fields: narrative interpretation, context-dependent extraction, cross-document synthesis | Full (on-premise, no egress) |
| T5 | External LLM API | Groq, OpenAI, Anthropic | Complex reasoning, multi-document synthesis, cases where T4 latency is prohibitive | Posture B/C (PHI leaves boundary) |
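
One way to make the sovereignty column actionable is to encode the tiers as data and filter them before any tool is chosen. The sketch below is illustrative only: the `Tier` class, the `data_leaves_boundary` flag, and `allowed_tiers` are hypothetical names, and the rule that PHI blocks T5 unless egress is explicitly approved is an assumption drawn from the table's "PHI leaves boundary" note.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Tier:
    code: str
    name: str
    tools: Tuple[str, ...]
    data_leaves_boundary: bool  # True only when documents go to an external API


# Mirrors the tier table; nothing here calls the tools themselves.
TIERS = (
    Tier("T0", "Native PDF Text", ("pdfplumber", "PyMuPDF"), False),
    Tier("T1", "Rule-Based OCR", ("Tesseract", "EasyOCR"), False),
    Tier("T2", "Regex Extraction", ("Python re", "custom parsers"), False),
    Tier("T2.5", "Document VLM (Local)", ("Granite-Docling-258M",), False),
    Tier("T4", "Local LLM", ("Ollama + Llama 3.3 70B",), False),
    Tier("T5", "External LLM API", ("Groq", "OpenAI", "Anthropic"), True),
)


def allowed_tiers(contains_phi: bool, egress_permitted: bool = False) -> List[str]:
    """Return tier codes usable for a document.

    Assumption: a tier that sends data outside the boundary is excluded
    whenever the document contains PHI and egress has not been approved.
    """
    return [
        t.code
        for t in TIERS
        if not (t.data_leaves_boundary and contains_phi and not egress_permitted)
    ]
```

Calling `allowed_tiers(contains_phi=True)` leaves only the local tiers, so a T5 call on PHI has to be an explicit, reviewed decision rather than a default.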

Field categorization drives tier selection

The tier-to-tool mapping is downstream of field categorization. Before selecting a tool, classify the target fields:

| Category | Field type | Examples | Recommended tier |
|---|---|---|---|
| Cat 1 | Structured, regex-deterministic | VIN, DL#, DOB, policy#, claim#, ICD-10, ZIP, phone | T2 (regex) after T0/T1 text extraction |
| Cat 2 | Semi-structured, layout-dependent | Table cells, form fields, endorsement amounts, loss run rows | T2.5 (Granite-Docling) for complex layouts; T1 + regex for simple ones |
| Cat 3 | Narrative, layout-flexible | Injury descriptions, cause of loss, coverage summaries, legal disclaimers | T4 (local LLM); keep PHI on-premise |
| Cat 4 | Contextual, judgment-required | Fraud indicators, liability assignment, cross-document reconciliation, complex coverage questions | T4 first; T5 only when T4 latency exceeds workflow SLA and PHI posture permits |
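
The category table can be read as a routing rule. The following is a minimal sketch of that rule, not a reference implementation: `recommend_pipeline` and all of its parameters (`scanned`, `complex_layout`, `t4_latency_s`, `sla_s`, `phi_egress_permitted`) are hypothetical names, and the choice of T0 versus T1 for Cat 1 based on whether the document is scanned is an assumption inferred from the tier descriptions.

```python
from typing import List


def recommend_pipeline(
    category: int,
    *,
    scanned: bool = False,
    complex_layout: bool = False,
    t4_latency_s: float = 0.0,
    sla_s: float = float("inf"),
    phi_egress_permitted: bool = False,
) -> List[str]:
    """Map a field category (1-4) to an ordered list of tier codes."""
    if category == 1:
        # Text extraction first (OCR only if scanned), then regex.
        return ["T1" if scanned else "T0", "T2"]
    if category == 2:
        # Local document VLM for complex layouts, OCR + regex otherwise.
        return ["T2.5"] if complex_layout else ["T1", "T2"]
    if category == 3:
        # Narrative fields stay on the local LLM so PHI never leaves.
        return ["T4"]
    if category == 4:
        # T5 needs BOTH conditions: T4 misses the SLA and egress is permitted.
        if t4_latency_s > sla_s and phi_egress_permitted:
            return ["T5"]
        return ["T4"]
    raise ValueError(f"unknown field category: {category}")
```

Note that a slow T4 alone never escalates to T5: `recommend_pipeline(4, t4_latency_s=30, sla_s=10)` still returns `["T4"]` until `phi_egress_permitted=True` is passed as well.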

T5 is never the first choice for Cat 1 fields. Regex is faster, cheaper, more accurate, and keeps data on-premise. Using an LLM to extract a VIN is architectural malpractice.
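
To illustrate why a Cat 1 field needs no model, here is a hedged sketch of VIN extraction using only Python's standard `re` module. It assumes text has already been produced by T0 or T1, and it applies the position-9 check digit used for North American VINs; VINs from other markets may not carry a valid check digit, so that validation step is an assumption to relax where needed. The function names are illustrative.

```python
import re

# 17 characters, letters I, O and Q are never used in a VIN.
VIN_RE = re.compile(r"\b[A-HJ-NPR-Z0-9]{17}\b")

# Letter-to-number transliteration used by the check digit calculation.
_TRANSLIT = dict(zip("ABCDEFGH", range(1, 9)))
_TRANSLIT.update(zip("JKLMN", range(1, 6)))
_TRANSLIT.update({"P": 7, "R": 9})
_TRANSLIT.update(zip("STUVWXYZ", range(2, 10)))

# Positional weights; position 9 (the check digit itself) has weight 0.
_WEIGHTS = (8, 7, 6, 5, 4, 3, 2, 10, 0, 9, 8, 7, 6, 5, 4, 3, 2)


def vin_check_digit_ok(vin: str) -> bool:
    """Validate the 9th character against the weighted sum modulo 11."""
    total = sum(
        (int(ch) if ch.isdigit() else _TRANSLIT[ch]) * weight
        for ch, weight in zip(vin, _WEIGHTS)
    )
    remainder = total % 11
    expected = "X" if remainder == 10 else str(remainder)
    return vin[8] == expected


def extract_vins(text: str) -> list:
    """Return every 17-character candidate that passes the check digit."""
    return [v for v in VIN_RE.findall(text.upper()) if vin_check_digit_ok(v)]
```

The check digit gives a deterministic accept-or-reject signal that an LLM cannot: a misread character from OCR fails validation instead of being silently returned as a plausible-looking VIN.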

Questions about tier selection for your document types?

We're happy to walk through field categorization for your specific FNOL, submission, or claims document set before you commit to a tooling decision.

gps@elevatenow.tech →
