ElevateNow Platform · Architecture White Paper · Intelligence Infrastructure

Ontology-Anchored Text-to-SQL:
A Platform-Agnostic Middleware
for Reliable Natural Language Analytics

Text-to-SQL systems fail not because the models are inadequate, but because they lack the institutional context to resolve join ambiguity, column-level semantics, and domain-specific aggregation. This paper presents a middleware architecture that separates intelligence from execution — making SQL generation accurate, auditable, and portable across any analytical platform.

Date: May 2026 · Version: 1.0 · Classification: Strategic Architecture · Domain: Insurance · Data Intelligence

The Hypothesis

Two Observations from Research — One Combined Solution

This architecture is grounded in two independent bodies of work that converge on the same conclusion: the information needed to generate correct SQL already exists inside your organization. The problem is that it has never been made programmatically accessible to the generation layer.

Observation 1 · Query Log Research

Institutional join knowledge lives in your query history — not in your schema

SQL query logs are not just an audit trail. They are a compressed record of every correct analytical decision your organization has ever made. Each successful query encodes which tables were joined, in which order, with which filters — the exact institutional knowledge that a language model lacks when generating SQL from scratch.

A semantic layer describes what data means. Query logs record how data has been correctly accessed. Both are required. Without the log-derived patterns, a model operating on schema alone is prone to hallucinating joins: it picks a structurally valid path that is institutionally wrong.

The hypothesis: Mine the logs. Validate each pattern against the ontology. Surface the validated patterns as generation-time context. The model stops hallucinating because it is no longer guessing.

Observation 2 · Self-Improving Text-to-SQL

Static verified examples decay — an evolutionary feedback loop compounds accuracy over time

Manual curation of example queries does not scale. The question space grows faster than human curators can document it. Research into autonomous self-improvement for text-to-SQL demonstrates that systems can evaluate the quality of their own SQL output — through execution success, result set validity, and iterative refinement — without requiring a human-provided ground truth for every query.

An Elo-style scoring mechanism can rank competing SQL candidates generated for the same natural language question. The winning variant, once confirmed by execution and ontology conformance, becomes a new verified example — automatically. The system's accuracy ceiling rises with every query run.

The hypothesis: Replace static curation with a scored feedback loop. Promote only what the ontology validates. The example library becomes a living institutional memory — not a snapshot frozen at build time.

Combined solution in one sentence: Build a middleware layer that extracts institutional query patterns from historical logs, anchors them to an ontology that enforces which patterns are correct, and operates a self-improving feedback loop that continuously promotes successful patterns into the generation context — on any SQL platform, without rewriting the intelligence when the platform changes.

The Problem

Three Root Causes of SQL Generation Failure — The Model Is Not One of Them

Accuracy problems in production text-to-SQL deployments are consistently attributable to missing context, not model capability. The same model that fails on your schema performs well on standard benchmarks — because benchmark schemas have clean, unambiguous join paths. Yours do not.

~60% of text-to-SQL errors are join path errors: the model picked a valid but wrong traversal.

~25% are aggregation level errors: wrong grain, missing GROUP BY, incorrect window scope.

~15% are column disambiguation errors: multiple similarly named columns, wrong one chosen.

Root Cause 1

Join Path Ambiguity

Your schema has multiple valid join paths between the same two tables. The semantic model describes all possibilities. The LLM picks one based on column name proximity — not on which path is institutionally correct for the question type asked.

Root Cause 2

Static Schema Context

The YAML semantic model (Cortex Analyst, Unity Catalog) describes what data is — tables, columns, relationships. It cannot encode which join path is intended for a given question type, or what the institutional pattern is for "loss ratio by line."

Root Cause 3

No Institutional Memory

Every query starts from zero. The system has no memory of which queries ran successfully last month, which join patterns are trusted for which business questions, or which shortcuts are forbidden. Verified Queries fixes this manually — but doesn't scale.

The central insight: Your organization already has the institutional knowledge that the model lacks. It exists in three places: your ontology (what entities mean and how they relate), your query history (how data has been successfully accessed), and your domain experts (who know which joins are correct). The middleware architecture makes this knowledge programmatically accessible to any SQL generator — on any platform.

Architecture — Conceptual View

Three Zones, One Principle: Own the Intelligence, Swap the Platform

Consumer Input

"What is the loss ratio by line of business?"

BI dashboard · chat UI · agentic workflow · API

↓

① Pre-Generation Intelligence

You Own This · Platform-Agnostic

Ontology + Query Memory

Resolves the correct join path from the ontology before the model sees anything. Retrieves verified examples from the query library. Assembles a constrained, context-rich prompt so the generator operates on institutional knowledge, not guesswork.

Schema Linker · Path Resolver · Example Retriever · Prompt Assembler

↓ enriched context: schema subset · canonical join path · verified examples

② SQL Generator

Pluggable Adapter

Any Platform — One Interface

A thin, swappable adapter that receives the enriched context and produces a SQL candidate. Replacing Snowflake with Databricks requires changing only this layer — no intelligence logic is rewritten.

Snowflake Cortex Analyst · Databricks Genie · Generic LLM · Local / OSS

↓ SQL candidate

③ Post-Generation Validation + Learning

You Own This · Platform-Agnostic

Validate · Execute · Correct · Remember

Checks the SQL against the ontology before it touches the warehouse. Executes, self-corrects on failure, and scores the result. Successful patterns are promoted back to the Query Library — improving every future query of the same type.

Conformance Checker · Execution Engine · Self-Corrector · Feedback Capture

↓

Verified Result

Ontology-conformant SQL · Executed · Results returned to consumer

Feedback Loop

⟳

Successful patterns scored and promoted to the Query Library

Feeds Example Retriever on the next query. Accuracy compounds with every run.

Architecture — Detailed View

Eight Middleware Blocks Across Three Zones — Each Eliminating a Specific Failure Mode

The same three zones from the conceptual view, broken into their constituent blocks. Each block has a single responsibility and a measurable impact on SQL accuracy. Blocks 1–4 run before generation; Blocks 5–8 run after.

① Pre-Generation Intelligence · Platform-Agnostic · You Own This

Block 1

Schema Linker

Extracts entity signals from the question. Returns only the tables and columns relevant to this query — reducing hallucination surface area by 60–80%.

Block 2

Path Resolver

Extracts concept signals from the question, traverses DEPENDS_ON chains in the ontology, resolves physical tables from the Schema Registry, and emits the join sequence. The path is derived, not looked up.

Block 3

Example Retriever

Fetches the top-K verified queries semantically similar to the current question from the query library. Provides few-shot anchors from real, validated history.

Block 4

Prompt Assembler

Combines schema subset, canonical path, and examples into a structured context block formatted for the target platform adapter.

↓ structured context block → schema subset · canonical join path · verified examples

② SQL Generator · Pluggable Adapter · Swap Without Touching Intelligence

Adapter A

Snowflake Cortex Analyst

YAML semantic model + Verified Queries block. Context injected via description fields.

Adapter B

Databricks Genie / DBRX

Unity Catalog + Trusted Assets + Genie Instructions. Same context, different injection point.

Adapter C

Generic LLM (Claude / GPT)

Raw schema + system prompt + few-shot examples. Full control. No platform dependency.

Adapter D

Local / OSS (SQLCoder, Defog)

Fine-tunable on domain queries. Full data residency. Maximum latency control.

↓ generated SQL candidate

③ Post-Generation Validation + Learning · Platform-Agnostic · You Own This

Block 5

Conformance Checker

Parses generated SQL. Validates the actual join graph against the ontology canonical path. Blocks forbidden shortcuts before a single byte reaches the warehouse.

Block 6

Execution Engine

Routes conformance-passed SQL to the platform execution adapter. Returns a uniform result record — status, row count, timing, query ID — regardless of platform.

Block 7

Self-Corrector

On failure, constructs a targeted correction prompt from the structured error, re-routes through the generator, and retries. Resolves ~80% of syntax errors autonomously.

Block 8

Feedback Capture

Scores successful results on four dimensions. Promotes high-scoring patterns to the query library after K=5 confirmations. The system's accuracy ceiling rises automatically.

↓ verified, executed result

④ Execution Adapter · Thin, Swappable

Snowflake

Snowpark Connector

Submits SQL, returns resultset + query ID

Databricks

DBX SQL Connector

SparkSQL execution, DBFS result storage

Generic

SQLAlchemy / JDBC

Dialect-aware, connection-pooled

⑤ Data Foundation · Platform-Agnostic Knowledge Store

Store A

Ontology Graph

Entities, canonical join paths, forbidden shortcuts, required filters. Source of truth for all path decisions.

Store B

Query Library

Verified examples mined from historical logs, scored by success + conformance, indexed for semantic retrieval.

Store C

Schema Registry

Platform-neutral table/column metadata. Feeds Block 1. Updated on schema change events.

Intelligence Layer Deep Dive

Each Block Eliminates a Specific Class of SQL Error — Before Generation Begins

The intelligence layer runs before the SQL generator sees the question. By the time a prompt is assembled, the generator operates on a constrained, ontology-validated context — not an open-ended schema dump.

Block 1 · Intelligence Layer

Schema Linker

Extracts named entities and domain signals from the natural language question. Maps them to specific tables and columns in the schema registry using a combination of lexical matching, embedding similarity, and domain taxonomy. Returns a focused schema subset — only the tables and columns relevant to this question.

Value: Reduces the schema context injected into the generator by 60–80%. Smaller context means fewer hallucination opportunities, faster generation, and lower token cost. The generator no longer has to "guess" which of 200 tables are relevant — it's told exactly which 4 to use.

Input: "loss ratio by line, Q1 2026"

Output: tables: [policy, claim, premium] · cols: [lob, written_premium, incurred_loss, loss_dt]
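A minimal sketch of the lexical-matching part of the Schema Linker, assuming a hand-built term index. `TERM_INDEX` and `link_schema` are hypothetical names, and the embedding-similarity and taxonomy steps described above are left out.

```python
import re

# Hypothetical term index: business term -> (table, column) pairs it implies.
# A real linker would build this from the Schema Registry and domain taxonomy.
TERM_INDEX = {
    "loss ratio": [("premium", "written_premium"), ("claim", "incurred_loss")],
    "line": [("policy", "lob")],
    "q1": [("claim", "loss_dt")],
    "agent": [("agent", "agent_name")],
}


def link_schema(question: str) -> dict:
    """Return only the tables and columns whose terms occur in the question."""
    text = question.lower()
    subset = {}
    for term, targets in TERM_INDEX.items():
        # Whole-word match so "line" does not fire on "baseline".
        if re.search(rf"\b{re.escape(term)}\b", text):
            for table, column in targets:
                columns = subset.setdefault(table, [])
                if column not in columns:
                    columns.append(column)
    return subset


subset = link_schema("loss ratio by line, Q1 2026")
print(subset)
```

The `agent` table never reaches the generator for this question, which is the context reduction the block is meant to deliver.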

Block 2 · Intelligence Layer

Path Resolver

Given the concept signals extracted by the Schema Linker, traverses the ontology's DEPENDS_ON chains to determine which business concepts are required and in what order. Then resolves each concept to its physical table and join key via the Schema Registry. Returns a derived join sequence — computed at runtime from two separate, independently maintained sources of truth.

Value: Eliminates the single largest error class: wrong join path. Instead of the generator inferring joins from column name similarity (which produces plausible but wrong SQL), it receives the exact join sequence that domain experts have encoded as correct for this question type. This is the ontology's highest-leverage contribution to SQL accuracy.

Input: entities: [policy, claim, premium] · intent: "loss_ratio"

Output: path: policy→premium→claim · join: policy.id = claim.policy_id · filter: policy_status='active' · forbidden: [claim→reserve direct]
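The derivation can be sketched as a depth-first walk over DEPENDS_ON edges followed by a registry lookup. This is an illustration under assumptions: the `DEPENDS_ON` and `REGISTRY` dictionaries, the mapping of LossIncurred to a `claim` table, and the `policy.id` anchor key are hypothetical, and the graph is assumed to be acyclic.

```python
# Hypothetical ontology: concept -> concepts it DEPENDS_ON, in required order.
DEPENDS_ON = {
    "LossRatio": ["WrittenPremium", "LossIncurred"],
    "WrittenPremium": [],
    "LossIncurred": [],
}

# Hypothetical Schema Registry: concept -> physical table and join key.
REGISTRY = {
    "WrittenPremium": {"table": "written_premium", "entity_key": "policy_id"},
    "LossIncurred": {"table": "claim", "entity_key": "policy_id"},
}


def resolve_concepts(concept, seen=None):
    """Depth-first walk; dependencies are emitted in their declared order.

    Assumes the DEPENDS_ON graph has no cycles.
    """
    seen = [] if seen is None else seen
    for dep in DEPENDS_ON.get(concept, []):
        resolve_concepts(dep, seen)
        if dep not in seen:
            seen.append(dep)
    return seen


def derive_join_path(concept, anchor_table="policy", anchor_key="id"):
    """Combine the ontology order with registry metadata into join conditions."""
    joins = []
    for dep in resolve_concepts(concept):
        entry = REGISTRY[dep]
        joins.append(
            f"{anchor_table}.{anchor_key} = {entry['table']}.{entry['entity_key']}"
        )
    return joins


path = derive_join_path("LossRatio")
print(path)
```

Neither dictionary contains a join statement; the join conditions exist only in the function's output, which is the sense in which the path is derived rather than looked up.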

Block 3 · Intelligence Layer

Example Retriever

Searches the query library using dense vector similarity (embedding of the current question vs. embeddings of stored verified queries). Returns the top-K most semantically similar examples — full question + verified SQL pairs. These examples are sourced from historical query logs that passed ontology conformance checks, plus any manually curated entries.

Value: Implements few-shot prompting with domain-specific, institutionally verified examples. The generator sees exactly how a similar question was successfully answered before — including the correct aggregation level, the right filter syntax, and the proper window function. This eliminates aggregation errors and column disambiguation errors simultaneously.

Input: question embedding · top_k=3

Output: 3× {question, verified_sql, score} · source: query_library · conformance: ✓ ontology-validated
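As an illustration of the retrieval step, the sketch below ranks library entries by cosine similarity. The three-dimensional vectors and the `LIBRARY` contents are toy stand-ins; a real deployment would presumably use an embedding model and a vector index, neither of which is specified here.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Hypothetical query library with toy embeddings and placeholder SQL labels.
LIBRARY = [
    {"question": "loss ratio by line of business",
     "verified_sql": "-- verified SQL A", "embedding": [0.9, 0.1, 0.0]},
    {"question": "claim count by adjuster",
     "verified_sql": "-- verified SQL B", "embedding": [0.1, 0.9, 0.0]},
    {"question": "combined ratio by line of business",
     "verified_sql": "-- verified SQL C", "embedding": [0.7, 0.3, 0.1]},
    {"question": "policies expiring next month",
     "verified_sql": "-- verified SQL D", "embedding": [0.0, 0.0, 1.0]},
]


def retrieve(query_embedding, top_k=3):
    """Return the top_k most similar {question, verified_sql, score} records."""
    scored = [(cosine(query_embedding, e["embedding"]), e) for e in LIBRARY]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [
        {"question": e["question"], "verified_sql": e["verified_sql"],
         "score": round(s, 3)}
        for s, e in scored[:top_k]
    ]


results = retrieve([1.0, 0.0, 0.0], top_k=2)
print(results)
```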

Block 4 · Intelligence Layer

Prompt Assembler

Combines the outputs of Blocks 1–3 into a structured context payload. The format is adapter-specific: for Cortex Analyst, it generates YAML-compatible description injections; for a generic LLM, it assembles a system prompt with schema, path hint, and examples; for Databricks Genie, it formats Trusted Asset entries. One assembler function per adapter — all intelligence logic remains shared.

Value: The assembler is the portability bridge. By encapsulating format translation here, every upstream intelligence block (Schema Linker, Path Resolver, Example Retriever) is written once and reused across all platform adapters. Adding a new platform requires only a new assembler format function — not a rewrite of any intelligence logic.

Input: schema_subset · canonical_path · examples[3] · target_adapter

Output: structured_context_block · adapter_format: yaml | prompt | json
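One way to keep format translation isolated is a registry of formatter functions keyed by adapter format. The sketch below assumes that design; `FORMATTERS`, `to_prompt`, and `to_json` are hypothetical names, the YAML formatter is omitted, and the context contents are placeholders.

```python
import json


def to_prompt(ctx):
    """Format the shared context as a plain system prompt for a generic LLM."""
    lines = ["Use only these tables and columns:"]
    for table, cols in ctx["schema_subset"].items():
        lines.append(f"- {table}: {', '.join(cols)}")
    lines.append("Join path: " + " -> ".join(ctx["canonical_path"]))
    for ex in ctx["examples"]:
        lines.append(f"Q: {ex['question']}\nSQL: {ex['verified_sql']}")
    return "\n".join(lines)


def to_json(ctx):
    """Format the same context as JSON for adapters that accept structured input."""
    return json.dumps(ctx, sort_keys=True)


# One entry per adapter format; upstream blocks never see this table.
FORMATTERS = {"prompt": to_prompt, "json": to_json}


def assemble(ctx, adapter_format):
    formatter = FORMATTERS.get(adapter_format)
    if formatter is None:
        raise ValueError(f"no formatter registered for {adapter_format!r}")
    return formatter(ctx)


ctx = {
    "schema_subset": {"policy": ["lob"], "premium": ["written_premium"],
                      "claim": ["incurred_loss"]},
    "canonical_path": ["policy", "premium", "claim"],
    "examples": [{"question": "loss ratio by line of business",
                  "verified_sql": "SELECT 1  -- placeholder"}],
}
print(assemble(ctx, "prompt"))
```

Supporting another platform in this sketch means adding one function and one dictionary entry; `ctx` is produced the same way regardless.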

Post-Generation Layer Deep Dive

Validate, Execute, Correct, and Learn — Every Query Improves the System

The post-generation layer closes the loop. SQL that passes conformance is executed; failures trigger automated self-correction. Successful patterns are captured and promoted — so tomorrow's generator starts from a better baseline than today's.

Block 5 · Post-Generation Layer

Conformance Checker

Parses the generated SQL using a lightweight AST parser. Extracts the actual join graph from the FROM/JOIN clauses. Compares it against the canonical path returned by the Path Resolver. Flags any deviation: missing required filter, presence of a forbidden shortcut, or join to a table not in the canonical path.

Value: Catches ontology violations before execution — so bad SQL never reaches the warehouse. In insurance analytics, a forbidden join (e.g., Claim → Reserve bypassing ClaimPayment) can produce a result that looks plausible but is institutionally wrong. The conformance check is the institutional knowledge defense layer that no SQL executor provides natively.

Input: generated_sql · canonical_path (from Block 2)

Output: conformance: pass | fail · violations: [{type, detail}]
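The three checks can be illustrated with a deliberately simplified sketch. It uses a regular expression instead of the AST parser described above, so it would misread constructs such as `EXTRACT(YEAR FROM col)`, subqueries, or CTEs. The function name, the `forbidden_shortcuts` shape (table pair mapped to its required bridge table), and the sample SQL are assumptions.

```python
import re

# Simplification: capture the identifier that follows FROM or JOIN.
TABLE_REF = re.compile(r"\b(?:FROM|JOIN)\s+([A-Za-z_][\w.]*)", re.IGNORECASE)


def check_conformance(sql, canonical_tables, required_filters=(),
                      forbidden_shortcuts=None):
    forbidden_shortcuts = forbidden_shortcuts or {}
    tables = {t.lower() for t in TABLE_REF.findall(sql)}
    compact = re.sub(r"\s+", "", sql).lower()
    violations = []
    # 1. Tables that are not on the canonical path.
    for table in sorted(tables - set(canonical_tables)):
        violations.append({"type": "off_path_table", "detail": table})
    # 2. Forbidden shortcuts: both ends present, required bridge table missing.
    for (left, right), bridge in forbidden_shortcuts.items():
        if {left, right} <= tables and bridge not in tables:
            violations.append({"type": "forbidden_shortcut",
                               "detail": f"{left}->{right}"})
    # 3. Required filters, compared with whitespace removed.
    for flt in required_filters:
        if re.sub(r"\s+", "", flt).lower() not in compact:
            violations.append({"type": "missing_filter", "detail": flt})
    return {"conformance": "fail" if violations else "pass",
            "violations": violations}


rules = dict(
    canonical_tables={"policy", "premium", "claim"},
    required_filters=["policy_status='active'"],
    forbidden_shortcuts={("claim", "reserve"): "claim_payment"},
)
good = check_conformance(
    "SELECT p.lob FROM policy p JOIN premium pr ON p.id = pr.policy_id "
    "JOIN claim c ON p.id = c.policy_id WHERE p.policy_status = 'active'",
    **rules)
bad = check_conformance(
    "SELECT * FROM claim c JOIN reserve r ON c.id = r.claim_id", **rules)
print(good["conformance"], bad["violations"])
```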

Block 6 · Post-Generation Layer

Execution Engine

Routes conformance-passed SQL to the platform execution adapter. Manages connection pooling, timeout handling, and result pagination. Returns a structured execution record: status (success/error), row count, execution time, error message (if any), and the platform-native query ID for audit trails.

Value: Provides a uniform execution interface regardless of which platform is running the query. Callers never interact with Snowflake Snowpark vs. Databricks SQL Connector directly — they receive the same structured result format. This is what makes the system genuinely platform-agnostic at the execution layer, not just at the generation layer.

Input: sql (conformance: pass) · adapter: snowflake|databricks|generic

Output: {status, rows, time_ms, query_id, error?}
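To make the uniform result record concrete, the sketch below wraps an in-memory SQLite database, used here only as a stand-in for a warehouse connection. The `execute` function and the generated UUID query ID are assumptions; a real adapter would return the platform-native query ID instead.

```python
import sqlite3
import time
import uuid


def execute(conn, sql):
    """Run SQL and return {status, rows, time_ms, query_id, error}."""
    query_id = str(uuid.uuid4())  # stand-in for a platform-native query ID
    start = time.perf_counter()
    try:
        rows = conn.execute(sql).fetchall()
        status, error = "success", None
    except sqlite3.Error as exc:
        rows, status, error = [], "error", str(exc)
    elapsed_ms = round((time.perf_counter() - start) * 1000, 2)
    return {"status": status, "rows": len(rows), "time_ms": elapsed_ms,
            "query_id": query_id, "error": error}


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE claim (policy_id INTEGER, incurred_loss REAL)")
conn.executemany("INSERT INTO claim VALUES (?, ?)", [(1, 100.0), (2, 250.0)])

ok = execute(conn, "SELECT policy_id FROM claim")
bad = execute(conn, "SELECT missing_col FROM claim")
print(ok["status"], ok["rows"], bad["status"], bad["error"])
```

Callers see the same keys whether the query succeeded or failed, which is what lets the Self-Corrector consume the error without knowing which platform produced it.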

Block 7 · Post-Generation Layer

Self-Corrector

Triggered when the Execution Engine returns an error. Constructs a correction prompt that includes: the original question, the failed SQL, the structured error message, the canonical path hint (re-injected), and an instruction to address the specific error class. Re-routes through the SQL Generator adapter. Retries up to N=3 times before escalating to a human-readable error response.

Value: Implements the core iterative refinement mechanism from the RoboPhD paper. Rather than surfacing cryptic database errors to end users, the system attempts self-repair autonomously. In practice, ~80% of syntax errors and ~50% of semantic errors are corrected on the first retry. This dramatically reduces the "text-to-SQL failed — ask your data team" fallback rate.

Input: failed_sql · error_payload · attempt_count

Output: corrected_sql (retry) · or: escalation_response (max retries)
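The retry loop can be sketched with the generator and executor passed in as plain callables. Everything below is illustrative: `self_correct`, `build_correction_prompt`, and the fake generator and executor are hypothetical, and only the N=3 retry cap comes from the text.

```python
MAX_RETRIES = 3  # N=3, as stated above


def build_correction_prompt(question, failed_sql, error, path_hint):
    """Bundle the question, failed SQL, error, and re-injected path hint."""
    return (f"Question: {question}\nFailed SQL: {failed_sql}\n"
            f"Error: {error}\nCanonical join path: {path_hint}\n"
            "Fix the error above and return corrected SQL only.")


def self_correct(question, sql, path_hint, generate, execute):
    """Retry through the generator until success or MAX_RETRIES is reached."""
    result = execute(sql)
    attempts = 0
    while result["status"] == "error" and attempts < MAX_RETRIES:
        attempts += 1
        prompt = build_correction_prompt(question, sql, result["error"], path_hint)
        sql = generate(prompt)
        result = execute(sql)
    if result["status"] == "error":
        return {"escalated": True, "attempts": attempts,
                "message": "Could not produce a working query."}
    return {"escalated": False, "attempts": attempts, "sql": sql}


# Fakes for demonstration only; no model or warehouse is called.
def fake_execute(sql):
    if "incured" in sql:
        return {"status": "error", "error": "no such column: incured_loss"}
    return {"status": "success", "error": None}


def fake_generate(prompt):
    return "SELECT incurred_loss FROM claim"


fixed = self_correct("total incurred loss", "SELECT incured_loss FROM claim",
                     "policy -> claim", fake_generate, fake_execute)
stuck = self_correct("total incurred loss", "SELECT incured_loss FROM claim",
                     "policy -> claim",
                     lambda prompt: "SELECT incured_loss FROM claim",
                     fake_execute)
print(fixed, stuck)
```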

Block 8 · Post-Generation Layer

Feedback Capture

For every successful execution, computes a composite quality score: ontology conformance rate (binary), execution success (binary), result set quality (non-empty, within expected row count bounds), and optionally a user acceptance signal (did the user act on the result? did they re-query?). High-scoring question/SQL pairs become candidates for the query library. After K=5 successful runs of the same pattern, auto-promotion occurs.

Value: Closes the self-improvement loop. Every query that succeeds makes the next similar query more likely to succeed — because the Example Retriever now has a new, validated example to return. The query library grows automatically from production traffic. No human curation required after initialization. This is the difference between a static verified-query system and a continuously improving one.

Input: execution_record · user_signal (optional)

Output: score: 0.0–1.0 · promotion_candidate: bool · library_updated: bool

The Ontology Foundation

Two Layers, Cleanly Separated — Ontology for Concepts, Registry for SQL Retrieval

The ontology encodes what business concepts semantically depend on. A separate Schema Registry encodes how those concepts are physically realized in tables. The Path Resolver combines both at query time to derive the join path. Neither layer encodes SQL.

Why two layers? Because the people who know the domain (actuaries, underwriters) and the people who know the data (engineers) should be able to evolve their layer independently. An actuary can redefine what Loss Ratio requires without touching a table name. An engineer can rename a column without touching a business rule. The Path Resolver synthesizes both at runtime — the join path is derived, never declared.

Existing ontology edges (what you likely have): CLASSIFIES (entity categorization), GOVERNS (rule applicability), INFORMS (signal relationships), SUPERSEDES (version control). These describe what knowledge means — they are governance edges. Keep them as-is.

New edge type: DEPENDS_ON — encodes which business concepts a given metric semantically requires, and in what order. No table names. No column references. No SQL. A domain expert reads and writes this.

# Ontology layer — concepts only, owned by domain experts
{
  edge_type: "DEPENDS_ON",
  source: "Concept:LossRatio",
  depends_on: [
    "Concept:WrittenPremium",
    "Concept:LossIncurred"
  ],
    # WrittenPremium must be resolved before LossIncurred
    # — actuarial rule, not a join hint
  ordering: "WrittenPremium before LossIncurred",
  invalid_paths: ["LossRatio → Reserve"]
    # semantically wrong by definition — Reserve is not
    # an incurred loss, it is a future liability estimate
}

# Schema Registry — physical realization, owned by engineers
{
  concept: "Concept:WrittenPremium",
  table: "written_premium",
  entity_key: "policy_id",
    # how this concept joins to the Policy entity
  platform_overrides: {
    databricks: "gold.written_premium"
  }
    # table name changes per platform — the concept does not
}
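As a small illustration of how a registry entry like the one above might be consumed, the sketch below resolves a concept to its physical table for a given platform. The `physical_table` helper and the in-memory `REGISTRY` dictionary are hypothetical; the field names mirror the example record.

```python
# Registry entry mirroring the example above, held in memory for illustration.
REGISTRY = {
    "Concept:WrittenPremium": {
        "table": "written_premium",
        "entity_key": "policy_id",
        "platform_overrides": {"databricks": "gold.written_premium"},
    }
}


def physical_table(concept, platform):
    """Return the platform-specific table name, falling back to the default."""
    entry = REGISTRY[concept]
    return entry.get("platform_overrides", {}).get(platform, entry["table"])


print(physical_table("Concept:WrittenPremium", "databricks"))
print(physical_table("Concept:WrittenPremium", "snowflake"))
```

The concept identifier is the only thing the ontology layer refers to, so a table rename or a platform change touches this record and nothing else.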

How this scales

Questions are infinite. Business concepts are not. An insurance analytics domain has roughly 50–150 named concepts: Loss Ratio, Combined Ratio, Written Premium, Earned Premium, Loss Development Factor, Frequency, Severity, ALAE, and so on.

Every question — no matter how phrased — maps to one or a few of these concepts. The Schema Linker's job is to extract which concepts the question is about. The Path Resolver looks up those concepts' DEPENDS_ON chains and resolves tables from the Schema Registry. The join path is computed, not looked up.

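Semantic layers such as dbt Semantic Layer, Cube.dev, and AtScale take a broadly similar approach: metrics are defined once and queries are compiled from those definitions. The innovation here is anchoring the concept definitions in a governed ontology rather than in a static config file.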

What this enables

Path Resolver (Block 2) — extracts concept signals from the question, traverses DEPENDS_ON chains, resolves tables from the registry, emits the join sequence. Join path is derived at runtime.
Conformance Checker (Block 5) — validates the generated SQL's join graph against the derived concept path. Flags joins that violate the ordering constraint or use an invalid concept path.
Independent evolution — domain experts update DEPENDS_ON edges without touching the registry. Engineers update table names in the registry without touching ontology rules.
Governance audit — every SQL execution cites the DEPENDS_ON edge ID that governed the join derivation — a complete trail from question to concept to table to result.

Anti-pattern to avoid: Do not collapse concept dependencies and physical join paths into a single declaration. If the business rule and the table name live in the same record, a domain expert editing business logic can silently break table resolution — and an engineer renaming a table can silently invalidate a business rule.

Query Library

Mine Your Query History — Automate What Verified Queries Does Manually

Every platform exposes a query history system table. Your best examples are already there — they just haven't been extracted, validated, and made available to your SQL generator. This pipeline does that automatically.

Step 1: Extract

Pull successful SELECTs from platform query history. Filter by: execution_status=SUCCESS, duration within bounds, query_type=SELECT. Each platform has its own system table.

Step 2: Cluster

Group queries by join pattern fingerprint: which tables appear in FROM/JOIN, in what order, with what filter shapes. Queries in the same cluster answer the same question type.

Step 3: Validate

For each cluster, derive the expected join path from the ontology's DEPENDS_ON chains + Schema Registry. Clusters whose join graph matches the derived path are promoted. Non-conforming clusters are flagged for review.

Step 4: Annotate & Promote

Reverse-generate a natural language question for each cluster's representative SQL (using LLM). Store as a {question, sql} pair in the query library. Now available to the Example Retriever.
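Steps 2 and 3 can be sketched as follows, assuming the extracted history is already a list of SQL strings. The fingerprint here is only the ordered list of tables in FROM/JOIN clauses, found with a regular expression; filter shapes are ignored, and `fingerprint`, `cluster`, and `split_by_conformance` are hypothetical names.

```python
import re
from collections import defaultdict

TABLE_REF = re.compile(r"\b(?:FROM|JOIN)\s+([A-Za-z_][\w.]*)", re.IGNORECASE)


def fingerprint(sql):
    """Ordered tuple of tables as they appear in FROM/JOIN clauses."""
    return tuple(t.lower() for t in TABLE_REF.findall(sql))


def cluster(queries):
    """Group queries that share the same join-pattern fingerprint."""
    clusters = defaultdict(list)
    for sql in queries:
        clusters[fingerprint(sql)].append(sql)
    return dict(clusters)


def split_by_conformance(clusters, derived_paths):
    """Promote clusters matching a derived path; flag the rest for review."""
    promoted = [fp for fp in clusters if fp in derived_paths]
    flagged = [fp for fp in clusters if fp not in derived_paths]
    return promoted, flagged


history = [
    "SELECT p.lob, SUM(c.incurred_loss) FROM policy p "
    "JOIN claim c ON p.id = c.policy_id GROUP BY p.lob",
    "select * from policy join claim on policy.id = claim.policy_id "
    "where policy.policy_status = 'active'",
    "SELECT * FROM claim c JOIN reserve r ON c.id = r.claim_id",
]
clusters = cluster(history)
promoted, flagged = split_by_conformance(clusters, {("policy", "claim")})
print(promoted, flagged)
```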

Platform	Query History Source	Key Columns	Native Verified Example System	Pipeline Integration
Snowflake	`SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY`	QUERY_TEXT, EXECUTION_STATUS, TOTAL_ELAPSED_TIME	Verified Queries in Cortex Analyst YAML	Auto-populate YAML verified_queries block from library
Databricks	`system.query.history` (Unity Catalog)	statement_text, execution_status, total_duration_ms	Trusted Assets in Genie space config	Auto-populate Trusted Assets from library on refresh
BigQuery	`INFORMATION_SCHEMA.JOBS`	query, state, total_slot_ms	None native — inject via system prompt	Include top-K examples in generator system prompt
Generic LLM	Application-layer query log (your store)	Custom schema	None native — full control	Few-shot examples injected directly in prompt

Self-Improvement

The Feedback Loop That Compounds Accuracy Over Time

This loop is inspired by the RoboPhD autonomous agent evolution framework (BIRD-Bench, 2025). An Elo-style scoring mechanism replaces manual curation with automated quality signal propagation.

The flywheel: Every successful query run generates a new candidate for the query library → the library grows → future similar questions get better examples → the generator produces better SQL → more queries succeed → the library grows further. The system improves with scale rather than degrading.
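The text does not specify which Elo variant is used, so the sketch below shows the standard Elo update as an assumption. The K-factor of 32 and the 400-point scale are conventional defaults rather than values from this paper, and how a "win" between two SQL candidates is decided (for example, by execution and conformance) is left to the caller.

```python
def expected_score(rating_a, rating_b):
    """Probability that candidate A beats candidate B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update(rating_a, rating_b, a_won, k=32.0):
    """Return the new (A, B) ratings after one head-to-head comparison."""
    expected_a = expected_score(rating_a, rating_b)
    actual_a = 1.0 if a_won else 0.0
    new_a = rating_a + k * (actual_a - expected_a)
    new_b = rating_b + k * ((1.0 - actual_a) - (1.0 - expected_a))
    return new_a, new_b


# Two SQL candidates for the same question start level; A wins the comparison.
new_a, new_b = update(1500.0, 1500.0, a_won=True)
print(new_a, new_b)
```

Ratings are zero-sum in this form: whatever one candidate gains, the other loses, so repeated comparisons separate consistently better variants from the rest.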

Scoring Dimensions

Ontology conformance (40%): Does the SQL's join graph match the concept dependency chain derived from DEPENDS_ON edges?
Execution success (30%): Did the query run without error?
Result quality (20%): Non-empty result, within expected row count bounds.
User acceptance (10%): Did the user act on or export the result?

Promotion Rules

Score ≥ 0.85: Immediate candidate for library.
K=5 confirmations: Same pattern confirmed 5× → auto-promoted to query library.
Ontology conformance required: No promotion without the join graph matching a derivable concept path. Domain knowledge gates the library.
Periodic revalidation: Library entries re-scored on schema change events.
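The weights and promotion rules above can be combined in a short sketch. This is one possible reading, in which a run counts as a confirmation only if it is ontology-conformant and scores at least 0.85, and the pattern enters the library on its fifth confirmation. `PromotionTracker` and the boolean signal names are hypothetical; revalidation on schema change is not shown.

```python
WEIGHTS = {"conformance": 0.40, "execution": 0.30,
           "result_quality": 0.20, "user_acceptance": 0.10}
SCORE_THRESHOLD = 0.85
K_CONFIRMATIONS = 5


def composite_score(signals):
    """Weighted sum of boolean signals; missing signals count as False."""
    total = sum(WEIGHTS[name] * float(bool(signals.get(name))) for name in WEIGHTS)
    return round(total, 2)


class PromotionTracker:
    def __init__(self):
        self.confirmations = {}
        self.library = set()

    def record(self, pattern, signals):
        """Count a confirmation if eligible; return True once promoted."""
        eligible = (bool(signals.get("conformance"))
                    and composite_score(signals) >= SCORE_THRESHOLD)
        if eligible:
            self.confirmations[pattern] = self.confirmations.get(pattern, 0) + 1
            if self.confirmations[pattern] >= K_CONFIRMATIONS:
                self.library.add(pattern)
        return pattern in self.library


good_run = {"conformance": True, "execution": True, "result_quality": True}
tracker = PromotionTracker()
outcomes = [tracker.record("loss_ratio_by_lob", good_run) for _ in range(5)]
print(composite_score(good_run), outcomes)
```

Without the optional user signal, a conformant, successful, well-formed run scores 0.90 and still clears the threshold; a non-conformant run can reach at most 0.60 and never counts.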

Expected Trajectory

Week 1–2: Seed library from historical logs (~50–200 examples). Baseline accuracy established.
Month 1: 300–500 library entries from production traffic. Most common question types fully covered.
Month 3: Long-tail coverage. Rare question types get examples from similar patterns via embedding retrieval.
Steady state: Generator accuracy plateaus at the ontology-constrained ceiling — the system is as accurate as your DEPENDS_ON concept definitions are complete.

Platform Comparison

The Middleware Works Across Platforms — Only the Adapter Changes

Each platform's native text-to-SQL offering has a different injection point for context. The middleware produces the same structured context block; the adapter formats it for the platform's API.

Capability	Snowflake Only (Cortex Analyst + Verified Queries)	Databricks Only (Genie + Trusted Assets)	This Middleware (Platform-Agnostic)
Verified example source	Manual curation only	Manual curation only	Auto-mined from query logs
Example scale	~20–50 practical limit	~20–50 practical limit	Hundreds, auto-maintained
Join path validation	None — LLM still picks path	None — LLM still picks path	Ontology conformance check
Forbidden join enforcement	None	None	Pre-execution block
Self-correction on failure	None — surfaces error to user	None	Auto-retry with error context
Example freshness	Static until manually updated	Static until manually updated	Continuously promoted from traffic
Platform portability	Snowflake-only	Databricks-only	Swap adapter, keep all intelligence
Audit trail	Platform query log only	Platform query log only	DEPENDS_ON edge ID + derived path + conformance record
Intelligence ownership	Vendor-controlled	Vendor-controlled	Fully yours — ontology + query library

Implementation Roadmap

Four Phases — Value Delivered at Each Milestone

Each phase is independently deployable and produces measurable accuracy improvement. Phase 1 alone is expected to eliminate 60–70% of current SQL errors.

Phase 1

Foundation

Ontology

Add DEPENDS_ON edges; define 20–50 business concepts

Schema Linker

Entity/concept extraction + schema subset selection

Path Resolver

Traverse DEPENDS_ON + Schema Registry; derive join path

Target Platform

Snowflake (Cortex Analyst adapter)

Phase 2

Query Library

Log Miner

Extract & cluster from ACCOUNT_USAGE.QUERY_HISTORY

Validator

Check each cluster against derived concept paths

Example Retriever

Vector similarity search on library

Seed Size

Target: 100–200 verified examples

Phase 3

Self-Correction & Loop

Conformance Checker

AST-based join path validation

Self-Corrector

Error-context retry loop (max N=3)

Feedback Capture

Score + promotion pipeline

Improvement

Library grows from production traffic

Phase 4

Multi-Platform

Databricks Adapter

Genie / DBRX + Unity Catalog integration

Generic LLM Adapter

Claude / GPT with full prompt control

Schema Registry

Cross-platform neutral metadata store

Portability

Zero intelligence rewrite on swap

Strategic Directives

Five Rules That Define the Architecture's Integrity

These are not preferences — they are the constraints that make the system maintainable, portable, and trustworthy over time.

Concept dependencies live in the ontology. Physical joins live in the Schema Registry. Never collapse the two.

DEPENDS_ON edges encode business rules — which concepts a metric requires and in what order. The Schema Registry encodes physical realization — which table and key implements each concept. The Path Resolver synthesizes both. Collapsing them into a single record means domain experts and engineers can no longer evolve their layer independently.

Intelligence blocks are platform-agnostic. NEVER platform-specific.

Schema Linker, Path Resolver, Example Retriever, Conformance Checker, Execution Engine, Self-Corrector, and Feedback Capture contain zero platform-specific code. The Prompt Assembler shares its logic across platforms and isolates platform differences in one format function per adapter. Only the thin adapter layer (SQL generator call, execution call, example format) is platform-specific.

No query reaches the warehouse without passing conformance. NEVER bypass.

The Conformance Checker is not optional. Even in development mode, conformance is run — it can be set to warn-only, but it always runs. A query whose join graph violates the concept dependency ordering (e.g., skipping WrittenPremium when computing Loss Ratio) can produce a plausible but materially incorrect result, which is worse than an error.

Query library promotion requires ontology conformance. NEVER promote non-conforming queries.

A query that produces a correct-looking result via a join path that violates the concept dependency ordering must not enter the query library — it will poison future few-shot examples and propagate the incorrect pattern. Conformance against the derived concept path is a prerequisite for library promotion, not just a preference check.

Platform migration is an adapter swap, not an architecture change.

When moving from Snowflake to Databricks (or adding a second platform), the implementation scope is: write a new Prompt Assembler format function, a new SQL Generator adapter call, and a new Execution adapter call. All intelligence blocks, the ontology, and the query library carry over without modification. Migration is measured in days, not months.
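The migration scope described above maps onto a three-method interface. The sketch below assumes that shape; `PlatformAdapter`, `EchoAdapter`, and `answer` are hypothetical names, and the toy adapter calls no real platform API.

```python
from typing import Protocol


class PlatformAdapter(Protocol):
    """The only platform-specific surface: format, generate, execute."""

    def format_context(self, context: dict) -> str: ...
    def generate_sql(self, formatted_context: str, question: str) -> str: ...
    def execute(self, sql: str) -> dict: ...


class EchoAdapter:
    """Toy adapter used only to show the swap; it calls no real platform."""

    def __init__(self, name, canned_sql):
        self.name = name
        self.canned_sql = canned_sql

    def format_context(self, context):
        return f"[{self.name}] path={' -> '.join(context['canonical_path'])}"

    def generate_sql(self, formatted_context, question):
        return self.canned_sql

    def execute(self, sql):
        return {"status": "success", "rows": 0, "query_id": f"{self.name}-1"}


def answer(question, context, adapter: PlatformAdapter):
    """Platform-agnostic step: identical code path for every adapter."""
    sql = adapter.generate_sql(adapter.format_context(context), question)
    return adapter.execute(sql)


context = {"canonical_path": ["policy", "premium", "claim"]}
on_snowflake = answer("loss ratio by line", context,
                      EchoAdapter("snowflake", "SELECT 1"))
on_databricks = answer("loss ratio by line", context,
                       EchoAdapter("databricks", "SELECT 1"))
print(on_snowflake["query_id"], on_databricks["query_id"])
```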

ElevateNow Platform · Intelligence Infrastructure · Confidential

Informed by: RoboPhD Self-Improving Text-to-SQL (arXiv:2601.01126) · VentureBeat Query Log Research (2025) · Production experience with Snowflake Cortex Analyst

Version 1.0 · May 2026
