Gestell - Best in the world at FinanceBench

March 4th, 2025

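Gestell is the best in the world at FinanceBench, the benchmark from Patronus AI, with a score of 88%. That outperforms the closest competition on the full database (67%) by 21 percentage points, or roughly 30% in relative terms.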

Not only did we beat FinanceBench on the ‘Shared Vector Store’ setting, which lets an LLM reason across an entire database, we also beat the FinanceBench ‘Oracle’ results, where the model is given the exact page containing the answer as context.

On FinanceBench, Gestell’s ETL for LLMs has now been shown to outperform giving your model the exact page it needs as context.

How did we do it?

Gestell is an integrated end-to-end ETL, covering chunking, vectorization, graph creation and more. We take a fundamentally different approach to FinanceBench: we focus on structuring the data so that it serves as the basis of model search, and thus of the reasoning process itself. Gestell mimics the ‘Oracle’ style of data provisioning while pulling from the entire dataset, and even improves upon it by giving your model additional data context. As a result, Gestell beats the Oracle results on FinanceBench while working from a dataset that is ~50,000x larger (50k+ pages rather than a single page).
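For readers unfamiliar with the terms, here is a toy sketch of what chunking and vectorization stages look like in a generic search pipeline. It is not Gestell's implementation: the sample documents, the function names (`chunk`, `vectorize`, `cosine`, `search`) and the bag-of-words scoring are all assumptions made for this illustration, and graph creation is left out entirely.

```python
import math
import re
from collections import Counter


def chunk(text: str, size: int = 12) -> list[str]:
    """Split text into chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def vectorize(text: str) -> Counter:
    """Bag-of-words term counts, a stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[term] * b[term] for term in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query: str, index: list[tuple[str, Counter]], k: int = 1) -> list[str]:
    """Return the k chunks whose vectors are most similar to the query."""
    q = vectorize(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


# Hypothetical sample documents, invented for this example only.
documents = {
    "acme": "Acme Corp reported revenue of 120 million dollars for fiscal year 2022.",
    "globex": "Globex Inc recorded capital expenditures of 15 million dollars during fiscal year 2022.",
}

# "Load" step: every chunk from every document goes into one shared index.
index = [(c, vectorize(c)) for doc in documents.values() for c in chunk(doc)]

print(search("Acme revenue", index))
```

The point of the sketch is only that the quality of the chunks and their representation decides what the model gets to reason over. A production pipeline would swap in learned embeddings and much richer structure.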

Why does this matter?

FinanceBench is the gold standard for testing search-based reasoning systems. Unlike other benchmarks that over-optimize the data environment (with exact context, formatted JSON, or simple QA questions), FinanceBench presents a realistic challenge: a database of 368 documents and 50k+ pages that all require ingestion and structuring, queried with complex financial questions. That makes it the benchmark most similar to the real-life use cases your business might be facing.

Who did we beat?

Our results are below. The closest competition on the full database is from Databricks, whose fine-tuned embeddings scored 67%.

| Framework | Testing Style | % Correct | % Did not Answer | % Incorrect |
|---|---|---|---|---|
| Gestell: Integrated ETL | Full Database | 88.0% | n/a | 12.0% |
| GPT-4 Turbo: Patronus | Oracle | 85.3% | n/a | 14.7% |
| FT e5-mistral-7b: Databricks, Fine-tuned Embeddings | Full Database | 67.0% | n/a | 33.0% |
| GPT-4 Turbo: Patronus, Shared Vector Store | Full Database | 19.3% | 67.3% | 13.3% |
| Llama2: Patronus, Shared Vector Store | Full Database | 19.3% | 11.3% | 69.3% |
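
The margins quoted in this post can be recomputed from the % Correct column. The short script below is just a sanity check: the scores are copied from the results above, while the dictionary labels and the `margins` helper are our own naming for this example.

```python
# % Correct values copied from the results table in this post.
scores = {
    "Gestell (Full Database)": 88.0,
    "GPT-4 Turbo, Patronus (Oracle)": 85.3,
    "FT e5-mistral-7b, Databricks (Full Database)": 67.0,
    "GPT-4 Turbo, Patronus, Shared Vector Store": 19.3,
    "Llama2, Patronus, Shared Vector Store": 19.3,
}


def margins(ours: float, theirs: float) -> tuple[float, float]:
    """Return (gap in percentage points, relative improvement in percent)."""
    gap = ours - theirs
    relative = 100.0 * gap / theirs
    return gap, relative


gestell = scores["Gestell (Full Database)"]
for name, score in scores.items():
    if name.startswith("Gestell"):
        continue
    gap, relative = margins(gestell, score)
    print(f"{name}: +{gap:.1f} points, +{relative:.0f}% relative")
```

Against the Databricks result this gives a 21-point gap, which is about 31% in relative terms, in line with the ~30% figure quoted at the top of the post.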

Can this apply generally to your own business?

Yes. Gestell is an integrated ETL that is entirely generalizable and customizable to your specific domain. There is no need for 5 different point solutions to barely get your ‘RAG’ working: Gestell does all of this, with deep customization enabled in-app. Among ETL and data structuring providers, Gestell is the only one that has demonstrated the capability to structure data to enable true search-based reasoning across large databases. Let us handle your data so you can deliver the model and end-product to your customers.

The gap between walled-garden benchmarks and human business processes is currently only reliably spanned by Gestell. If you are interested in Enframing your data with Gestell, reach out today.