Five evaluation groups purpose-built for insurance. Group 5 is a compliance gate, not a performance metric — and it's where most tools fail.
Standard OCR benchmarks — DocVQA, SROIE, FUNSD — were built for document understanding tasks, not insurance extraction. They measure whether a model can answer questions about a document. That's not the problem. The problem is whether a VIN extracted from a scanned police report is byte-perfect, whether the same document produces the same output on every run, and whether that output can be traced back to a specific location in the source when a regulator asks.
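The three properties named above (byte-perfect values, identical output on every run, and traceability to a source location) can each be written as a small check. The following is a minimal sketch, not the evaluation harness itself: the `Extraction` shape, the function names, the five-run default, and the sample VIN are all illustrative assumptions.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass(frozen=True)
class Extraction:
    """Hypothetical result shape: an extracted value plus where it was found."""
    value: str
    page: Optional[int] = None  # 1-based page number in the source (assumed convention)
    bbox: Optional[Tuple[float, float, float, float]] = None  # (x0, y0, x1, y1) on that page


def is_byte_perfect(extracted: str, ground_truth: str) -> bool:
    # Exact comparison on UTF-8 bytes: no case folding, no trimming, no fuzzy matching.
    return extracted.encode("utf-8") == ground_truth.encode("utf-8")


def is_deterministic(
    extract: Callable[[bytes], Extraction], document: bytes, runs: int = 5
) -> bool:
    # Hash the full result of each run; a deterministic tool yields exactly one digest.
    digests = {
        hashlib.sha256(repr(extract(document)).encode("utf-8")).hexdigest()
        for _ in range(runs)
    }
    return len(digests) == 1


def is_traceable(result: Extraction) -> bool:
    # The value must point back to a page and a region in the source document.
    return result.page is not None and result.bbox is not None


# Sample VIN used purely for illustration; "O" vs "0" is a typical OCR confusion.
truth = "1HGCM82633A004352"
print(is_byte_perfect("1HGCM82633AOO4352", truth))
```

A fuzzy-match score would rate the misread VIN above as nearly correct, which is why the comparison is done on bytes rather than on similarity.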
The framework below is what we actually use to evaluate tools. We share it because the evaluation criteria are as important as the results — and because vendors who cherry-pick generic benchmarks are counting on you not asking the right questions.
The first four groups map to standard document intelligence dimensions. The fifth, audit and compliance, is the one that determines whether output is deployable in a regulated insurance workflow.
Group 5 is a compliance gate, not a performance metric. A tool can pass Groups 1–4 and be disqualified by Group 5 alone.
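The difference between a gate and a metric can be shown in a few lines. This is a sketch under stated assumptions: the 0 to 1 score scale, the averaging of Groups 1–4, and the `Verdict` and `evaluate` names are hypothetical and are not taken from the framework. The only behavior drawn from the text is that a Group 5 failure disqualifies a tool regardless of its other scores.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Verdict:
    deployable: bool
    performance_score: float
    reason: str


def evaluate(performance: Dict[int, float], compliance_passed: bool) -> Verdict:
    """Combine Groups 1-4 scores with the Group 5 gate.

    `performance` maps group number (1-4) to a score in [0, 1]; the scale
    and the simple mean are assumptions made for this illustration.
    """
    score = sum(performance.values()) / len(performance)
    # Group 5 is never averaged in: it can only allow or block deployment.
    if not compliance_passed:
        return Verdict(False, score, "failed Group 5 compliance gate")
    return Verdict(True, score, "passed Group 5 compliance gate")


perfect_but_unauditable = evaluate({1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0}, compliance_passed=False)
print(perfect_but_unauditable.deployable)
```

Keeping the gate out of the average means a strong performance score cannot offset a compliance failure.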
Request the full evaluation dataset
The complete dataset includes 50 test documents across scan quality classes A–D, per-document results for each group, and the evaluation harness scripts.
gps@elevatenow.tech
We work with insurers and MGAs who are serious about the architecture, not just the demo. Conversations start with the problem, not the product.