Gestell is best in the world at FinanceBench, by Patronus AI, with a score of 88%, outperforming the closest full-database competition by 21 percentage points (~30% in relative terms)
Not only did we beat the FinanceBench baselines in the ‘Shared Vector Store’ setting, which lets an LLM reason across the entire database, we also beat the ‘Oracle’ results, in which the model was given the exact page containing the answer as context
Gestell’s ETL for LLMs is now proven better than giving your model the exact page it needs as context
Gestell is an integrated end-to-end ETL, covering chunking, vectorization, graph creation and more. We take a fundamentally different approach to FinanceBench: we focus on data structuring as the basis of model search, and thus of the reasoning process itself. Gestell mimics the ‘Oracle’ style of data provisioning while pulling from the entire dataset, and improves upon it by giving your model additional data context. This is how Gestell beats Oracle testing on FinanceBench while working on a dataset that is ~50,000x more complex
FinanceBench is the gold standard for testing search-based reasoning systems. Unlike other benchmarks that over-optimize the data environment (with exact context, formatted JSON, or simple QA questions), FinanceBench presents a realistic challenge: a database of 368 documents and 50k+ pages that all require ingestion and structuring, paired with complex financial queries. That makes it the closest match to the real-life use cases your business might be facing
Our results are below. The closest competition on the full database is from Databricks with fine-tuned embeddings scoring 67%
Framework | Testing Style | % Correct | % Did not Answer | % Incorrect |
Gestell: Integrated ETL | Full Database | 88.0% | n/a | 12.0% |
GPT-4 Turbo: Patronus | Oracle | 85.3% | n/a | 14.7% |
FT e5-mistral-7b: Databricks, Fine-tuned Embeddings | Full Database | 67.0% | n/a | 33.0% |
GPT-4 Turbo: Patronus, Shared Vector Store | Full Database | 19.3% | 67.3% | 13.3% |
Llama2: Patronus, Shared Vector Store | Full Database | 19.3% | 11.3% | 69.3% |
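To make the comparison in the table concrete, here is a minimal Python sketch that computes both the percentage-point gap and the relative gap between Gestell and each other row. The scores are copied from the table above; the `gaps` helper and the dictionary labels are illustrative names chosen for this sketch, not part of FinanceBench or of Gestell.

```python
# "% Correct" values copied from the results table above.
# The labels are shorthand for the table rows (illustrative only).
scores = {
    "Gestell (Full Database)": 88.0,
    "GPT-4 Turbo, Oracle (Patronus)": 85.3,
    "FT e5-mistral-7b (Databricks, Full Database)": 67.0,
    "GPT-4 Turbo, Shared Vector Store (Patronus)": 19.3,
    "Llama2, Shared Vector Store (Patronus)": 19.3,
}


def gaps(baseline, challenger):
    """Return (percentage-point gap, relative gap in %) of challenger over baseline."""
    point_gap = challenger - baseline
    relative_gap = 100.0 * point_gap / baseline
    return point_gap, relative_gap


gestell = scores["Gestell (Full Database)"]
for name, score in scores.items():
    if name == "Gestell (Full Database)":
        continue  # skip comparing Gestell with itself
    points, relative = gaps(score, gestell)
    print(f"{name}: +{points:.1f} points, +{relative:.1f}% relative")
```

Reporting both numbers avoids ambiguity: a lead of 21 points over a 67% baseline and a lead of roughly 31% in relative terms describe the same result.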
Gestell is an integrated ETL that is entirely generalizable and customizable to your specific domain. There is no need for 5 different point solutions to barely get your ‘RAG’ working: Gestell does all of this with deep customization enabled in-app. Among ETL and data structuring providers, Gestell is the only one that has demonstrated the capability to structure data to enable true search-based reasoning across large databases. Let us handle your data so you can deliver the model and end-product to your customers
Gestell is currently the only provider that reliably spans the gap between walled-garden benchmarks and human business processes. If you are interested in Enframing your data with Gestell, reach out today