DataVault Research
The data-readiness gap: why AI projects fail, and what actually fixes them
Across two years of independent studies, the verdict is consistent: most AI spend returns nothing measurable, and the cause sits upstream of the model, in the data. This brief gathers the evidence and what it means for anyone deciding whether to start.
Most AI spend returns nothing
The most-cited finding of 2025 was also the bluntest. MIT's Project NANDA, reviewing more than 300 AI initiatives, concluded that roughly 95% of generative-AI pilots produced no measurable impact on the bottom line. The authors call this the "GenAI Divide": a thin 5% capture nearly all the value.
RAND's 2024 study cited estimates that more than 80% of AI projects fail, twice the rate of IT projects that do not involve AI. S&P Global found abandonment accelerating: 42% of companies walked away from most of their AI initiatives in 2025, up from 17% a year earlier, and the average organisation scrapped about 46% of its proofs-of-concept before they reached production.
The cause is the data, not the model
What unites these post-mortems is the diagnosis. MIT, Gartner and RAND independently point past the algorithm to the foundation beneath it: data that was never discovered, governed, cleaned or connected well enough to build on. The model is rarely why a project fails.
Where the time goes: discovery, governance, cleaning
For the projects that do start, the cost shows up as time. Anaconda's State of Data Science benchmark found that data professionals spend about 45% of their time on data preparation (loading and cleaning), more than on model training, selection and deployment combined. The expensive talent hired to build models spends nearly half its time reconciling formats and scrubbing errors.
The cost compounds. Governance and fragmentation are repeatedly named among the top barriers, so before any cleaning begins, teams can lose weeks locating trusted sources and agreeing on what the data means.
The projects that never begin
The hardest failures to see are the projects that never begin: questions shelved before a line of code is written, because the organisation simply doesn't hold the data. Cloudera and Harvard Business Review Analytic Services found that only 7% of enterprises consider their data completely ready for AI; the other 93% concede that their foundation is not fully in place.
Gartner expects organisations to abandon 60% of AI projects unsupported by AI-ready data through 2026, and reports that 63% of organisations either lack the data management practices AI requires or are unsure whether they have them. Informatica's survey of data leaders placed data quality and readiness among the top obstacles to AI, ahead of skills shortages.
What the few who succeed do differently
The pattern among the winners is consistent: they fix the data first. McKinsey's 2025 research found that organisations reporting significant financial returns from AI were twice as likely to have redesigned their end-to-end data workflows before selecting a model. The catch is cost and time: building that foundation in-house can mean a 12-to-24-month data-engineering programme before the first answer arrives.
Where DataVault fits
Dare to ask. We bring the answer.
DataVault removes the exact step that kills these projects. The open and official data the world publishes is already collected, stored and shaped into a common form that is discovered, governed and kept current, so you skip the foundation-building entirely and go straight to the question. The teams who can't start today can start tomorrow.
No re-collecting the world
Sources are gathered once, for everyone — no per-project pipelines, no data-janitor hours, no multi-year foundation to fund before the first result.
Start at the question
The data is already loaded, cleaned to a shared shape, governed and kept current. The months of prep collapse into a query you can run today.
History & joins you couldn't get
Every version is kept, and sources that never lived together can finally be joined, answering questions that were impossible while your data sat in silos.
Sources
· MIT Project NANDA — The GenAI Divide: State of AI in Business 2025
· RAND Corporation — The Root Causes of Failure for Artificial Intelligence Projects (2024)
· S&P Global Market Intelligence — AI initiative abandonment survey (2025)
· Anaconda — State of Data Science (time-allocation benchmark)
· Cloudera & Harvard Business Review Analytic Services — Taming the Complexity of AI Data Readiness (2026)
· Gartner — Lack of AI-Ready Data Puts AI Projects at Risk (press release, 2024–2025)
· Informatica — CDO Insights (2025)
· McKinsey & Company — The State of AI (2025)