DataVault Research

The data-readiness gap: why AI projects fail, and what actually fixes them

Across two years of independent studies, the verdict is consistent: most AI spend returns no measurable value, and the cause sits upstream of the model, in the data. This brief gathers the evidence and what it means for anyone deciding whether to start.

Briefing · ~6 min read · Sources cited throughout

Most AI spend returns nothing

The most-cited finding of 2025 was also the bluntest. MIT's Project NANDA, drawing on a review of more than 300 publicly disclosed AI initiatives, concluded that roughly 95% of generative-AI pilots produced no measurable impact on the bottom line. The authors call this the "GenAI Divide": a thin 5% captures nearly all the value.

95%
of generative-AI pilots return zero measurable P&L impact
MIT Project NANDA, 2025
80%
of AI projects fail — twice the rate of non-AI IT efforts
RAND Corporation, 2024
42%
of companies scrapped most AI initiatives in a year (up from 17%)
S&P Global, 2025

RAND's 2024 study cited estimates that more than 80% of AI projects fail, double the rate of traditional IT projects. S&P Global found abandonment accelerating: 42% of companies walked away from most of their AI initiatives in 2025, up from 17% a year earlier, and the average organisation scrapped about 46% of its proofs-of-concept before they ever reached production.

The cause is the data, not the model

What unites these post-mortems is the diagnosis. MIT, Gartner and RAND independently point past the algorithm to the foundation beneath it: data that was never discovered, governed, cleaned or connected well enough to build on. The model is rarely why a project fails.

The failure is almost never the model. It is data readiness, workflow integration, and the absence of a defined outcome before build starts. — Analysis of MIT Project NANDA findings, 2025

Where the time goes: discovery, governance, cleaning

For the projects that do start, the cost shows up as time. Anaconda's State of Data Science benchmark found data professionals spend about 45% of their time on data preparation — loading and cleaning — more than model training, selection and deployment combined. The expensive talent hired to build models spends its days reconciling formats and scrubbing errors.

~45%
of a data professional's time spent loading & cleaning data
Anaconda, State of Data Science
62%
name weak data governance as their single biggest barrier
industry survey, 2025
56%
cite siloed, fragmented data as a top obstacle to readiness
Cloudera / HBR, 2026

The cost compounds. Governance and fragmentation are repeatedly named as the top barriers, so before any cleaning begins, teams can lose weeks just locating trusted sources and agreeing on what the data means.

The projects that never begin

The hardest failures are invisible: questions shelved before a line of code is written, because the organisation simply doesn't hold the data. Cloudera and Harvard Business Review Analytic Services found that only 7% of enterprises consider their data completely ready for AI. The other 93% acknowledge that the foundation is not yet fully in place.

7%
of enterprises say their data is completely ready for AI
Cloudera / HBR, 2026
60%
of AI projects without AI-ready data forecast to be abandoned (through 2026)
Gartner, 2024–2025
43%
name data quality & readiness as their top obstacle, ahead of skills
Informatica CDO Insights, 2025

Gartner expects organisations to abandon 60% of AI projects unsupported by AI-ready data through 2026, and reports that 63% of organisations either lack the data management practices AI requires or aren't sure whether they have them. Informatica's survey of data leaders put data quality and readiness at the top of the obstacle list, ahead of skills.

What the few who succeed do differently

The pattern among the winners is unambiguous: they fix the data first. McKinsey's 2025 research found that organisations reporting significant financial returns from AI were twice as likely to have redesigned their end-to-end data workflows before selecting a model. The catch is cost and time — building that foundation in-house typically means a 12-to-24-month data-engineering programme before the first answer arrives.

Fix the data first, then apply the AI. The most cost-effective path is to build the data foundation before model selection, but for most teams that foundation is a one-to-two-year project. — Synthesis of McKinsey 2025 AI survey and CDO research

Where DataVault fits

Dare to ask. We bring the answer.

DataVault removes the exact step that kills these projects. The open and official data the world publishes is already collected, retained and shaped into a common form: discovered, governed and current. For questions that draw on that data, you skip the foundation-building and go straight to the question. The teams who can't start today can start tomorrow.

Lower cost

No re-collecting the world

Sources are gathered once, for everyone — no per-project pipelines, no data-janitor hours, no multi-year foundation to fund before the first result.

Faster

Start at the question

The data is already loaded, cleaned to a shared shape, governed and kept current. The months of prep collapse into a query you can run today.

More value

History & joins you couldn't get

Every version is kept, and sources that never lived side by side can be joined, answering questions that were impossible while your data sat in silos.

Sources

· MIT Project NANDA — The GenAI Divide: State of AI in Business 2025
· RAND Corporation — The Root Causes of Failure for Artificial Intelligence Projects (2024)
· S&P Global Market Intelligence — AI initiative abandonment survey (2025)
· Anaconda — State of Data Science (time-allocation benchmark)
· Cloudera & Harvard Business Review Analytic Services — Taming the Complexity of AI Data Readiness (2026)
· Gartner — Lack of AI-Ready Data Puts AI Projects at Risk (press release, 2024–2025)
· Informatica — CDO Insights (2025)
· McKinsey & Company — The State of AI (2025)

These figures are drawn from published third-party research and reflect industry-wide findings, not DataVault results. Several headline numbers (notably the 80% and 95% failure rates) are widely cited but also debated; they are reproduced here as reported by the named sources.