In short: the return on AI usually doesn’t vanish because it isn’t there - it vanishes because no one measures it. Fewer than one in five companies track well-defined KPIs for AI solutions, yet that practice is exactly what correlates most strongly with real bottom-line impact. Companies that measure from the start mostly do see the return. This post lays out a simple method: set a baseline before you start, pick the right metrics, measure business outcome rather than activity, and count over the right horizon.
Why can’t most companies prove a return on AI?
Because they don’t build the measurement - they count on the effect “being visible”. A McKinsey study found that out of all 25 practices tested, tracking well-defined KPIs for AI has the biggest effect on the bottom line - yet fewer than one in five companies do it. In other words: the thing that most determines the return is also the thing most often skipped.
The result is that a project may genuinely save money, but no one can show it to the board - so the funding dries up. It’s one of the main ways AI rollouts fail: no number means no proof, and no proof means no decision to scale. Measurement isn’t a formality at the end - it’s the condition for knowing whether it’s even worth continuing.
Want to know what return a specific process will actually give you before you commit to it? We’ll work it out together in a free consultation, using your numbers.
What should you actually measure - which metrics?
Business outcome, not activity. “How many employees use AI” or “how many queries hit the model” are vanity metrics - they go up even when the company gains nothing. What counts is the change in a number the board already tracks. The choice of metric depends on the process:
| Process type | What to measure (outcome, not activity) |
|---|---|
| Repetitive office work | hours per task × rate - i.e. the cost of the process |
| Customer service | handling time, cases closed without escalation |
| Quality and control | rate of errors, complaints, rework |
| Outsourced processes | cost of agencies, BPO, subcontractors |
| Manufacturing | downtime, throughput, machine availability |
There is one rule: every metric has to convert into money, or into time that turns into money. If you can’t say how much money a change in a given indicator is worth, it’s not yet an ROI metric - it’s a curiosity.
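As a minimal sketch of that rule, the snippet below converts a change in an indicator into money per month. The function name, the rates, and all figures are hypothetical and only there for illustration; plug in your own process data.

```python
def monthly_value_of_change(before: float, after: float, money_per_unit: float) -> float:
    """Money value of moving a monthly indicator from `before` to `after`.

    `money_per_unit` is what one unit of the indicator costs the company
    (an hourly rate, the cost of one error, and so on). Hypothetical helper.
    """
    return (before - after) * money_per_unit


# Hypothetical figures: hours spent on a process per month, at 40 per hour.
labour_saving = monthly_value_of_change(before=500, after=375, money_per_unit=40.0)

# Hypothetical figures: errors per month, each costing 120 in rework.
rework_saving = monthly_value_of_change(before=40, after=25, money_per_unit=120.0)

print(labour_saving)  # 5000.0
print(rework_saving)  # 1800.0
```

If you cannot fill in `money_per_unit` for an indicator, that is the signal from the rule above: the indicator is not an ROI metric yet.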
Why is the “before” baseline the most important part?
Because without a number from before the rollout you can’t calculate the difference - and the difference is the entire return. The most common mistake is to start measuring only once the pilot “works”. By then you have nothing to compare against, and all that’s left is an impression that things are better. Impressions don’t pass a budget review.
So the order is the reverse of what it seems: first measure today’s state of one process - how many hours it takes, what it costs, how many errors it produces - and only then switch on AI. We laid out that same simple “hours times rate” calculation in our post on how much adopting AI costs. Measuring the “before” state takes one afternoon - and without that one number, any later proof of return has nothing to stand on.
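The before-and-after comparison can be sketched as two snapshots of the same process. This is an illustration under stated assumptions, not a prescribed model: the `ProcessSnapshot` class, its fields, and every number are hypothetical, and the cost is simplified to labour ("hours times rate") plus rework from errors.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcessSnapshot:
    """State of one process in one month (hypothetical structure)."""
    hours_per_task: float
    tasks_per_month: int
    hourly_rate: float
    errors_per_month: int
    cost_per_error: float

    def monthly_cost(self) -> float:
        # "Hours times rate" for the labour part, plus the cost of rework.
        labour = self.hours_per_task * self.tasks_per_month * self.hourly_rate
        rework = self.errors_per_month * self.cost_per_error
        return labour + rework


def monthly_saving(before: ProcessSnapshot, after: ProcessSnapshot) -> float:
    """The difference between the two states; without `before` it cannot be computed."""
    return before.monthly_cost() - after.monthly_cost()


# Measured BEFORE switching on AI (made-up numbers).
baseline = ProcessSnapshot(0.5, 800, 40.0, 30, 50.0)
# Measured a few weeks after the rollout (made-up numbers).
with_ai = ProcessSnapshot(0.25, 800, 40.0, 20, 50.0)

print(baseline.monthly_cost())            # 17500.0
print(with_ai.monthly_cost())             # 9000.0
print(monthly_saving(baseline, with_ai))  # 8500.0
```

The point of the structure is that `monthly_saving` takes two arguments: if the baseline snapshot was never recorded, there is nothing to pass in.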
Productivity isn’t yet a return - how do you avoid mistaking activity for outcome?
This is the most common measurement trap: “faster” doesn’t always mean “cheaper for the company”. A worker can finish a task twice as fast while the end result doesn’t change - because the bottleneck was a different step. Software delivery data shows it clearly: Google’s DORA 2024 report found that AI adoption raises individual productivity and satisfaction, yet every 25% increase in AI adoption was associated with an estimated 1.5% drop in delivery throughput and a 7.2% drop in delivery stability. The authors’ conclusion is directly about measurement: improving the work process itself doesn’t automatically improve the end result.
The practical takeaway: measure at the end of the chain, where value for the customer and the P&L is created - not the feeling of speed in the middle. A faster step midway through the process means nothing if the end of the process shows neither lower cost nor extra sales.
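The bottleneck effect can be illustrated with a deliberately simplified model: a process made of serial steps, where end-to-end throughput is capped by the slowest step. The step names and capacities below are hypothetical, and real processes have queues and variability this sketch ignores.

```python
def end_to_end_throughput(step_capacity: dict[str, float]) -> float:
    """Tasks per day the whole chain can finish, assuming serial steps.

    Simplifying assumption: the chain is limited by its slowest step.
    """
    return min(step_capacity.values())


# Hypothetical capacities in tasks per day; "review" is the bottleneck.
before = {"draft": 40, "review": 25, "approve": 60}

# AI doubles the speed of drafting, which was never the bottleneck.
after = {**before, "draft": 80}

print(end_to_end_throughput(before))  # 25
print(end_to_end_throughput(after))   # 25
```

A metric taken at the drafting step would report a 100% improvement here, while the number measured at the end of the chain does not move at all.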
Where should you look for the biggest return?
Not necessarily where it’s easiest to measure - the two are different things. A report from MIT (the NANDA project, 2025) found that more than half of generative AI budgets go to sales and marketing, while the back office often delivers a higher return - because that’s where AI eliminates real external costs, like agencies or process handling. The authors note that this skew reflects how easy the effect is to attribute, not its actual value.
For you that’s a concrete hint: before you pick a process “for show”, work out where money is actually leaking - repetitive work, outsourced tasks, costly errors. The most visible process and the most profitable process are rarely the same process.
When do you count the return - over what horizon?
Early enough to correct course, and long enough for the effect to compound. A single week shows nothing; a quarter on one process - quite a lot. The good news is that once measurement is in place, the return is usually there: according to Deloitte, nearly three-quarters of companies say their most advanced generative AI deployment is meeting or exceeding expectations on return.
It’s also worth remembering that the price of the tool itself drops fast, so it isn’t what decides the maths. According to the Stanford HAI report, the cost of querying a model at the quality of the old GPT-3.5 fell more than 280-fold in about eighteen months. That’s why the return is counted from the process - from the time and costs that genuinely disappear - not from a subscription that gets cheaper month over month.
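Counting over a horizon can be sketched as a running total of net savings per month. Everything here is an assumption made for illustration: the function names, the ramp-up in savings, the tool cost, and the one-off setup cost are invented numbers, not figures from any of the reports cited.

```python
from itertools import accumulate
from typing import Optional


def cumulative_net_return(monthly_savings: list[float],
                          monthly_tool_cost: float,
                          setup_cost: float) -> list[float]:
    """Running total of (process saving - tool cost), starting from -setup_cost."""
    net = [saving - monthly_tool_cost for saving in monthly_savings]
    # accumulate(..., initial=...) yields the starting value first; drop it.
    return list(accumulate(net, initial=-setup_cost))[1:]


def payback_month(cumulative: list[float]) -> Optional[int]:
    """First month (1-based) in which the running total is no longer negative."""
    return next((m for m, total in enumerate(cumulative, start=1) if total >= 0), None)


# Hypothetical quarter on one process: savings ramp up as the team adapts.
cumulative = cumulative_net_return(
    monthly_savings=[1000, 3000, 5000],
    monthly_tool_cost=500,
    setup_cost=4000,
)

print(cumulative)                 # [-3500, -1000, 3500]
print(payback_month(cumulative))  # 3
```

In this made-up case a review after month one would show a loss, while the same process breaks even within the quarter, and the tool cost is a small term next to the process saving.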
Frequently asked questions
Where do I start measuring if we measure nothing today? With one process and one number. Pick a repetitive task, measure what it costs today (time, cost, errors), switch on AI and compare after a few weeks. We cover which process to pick first in our post on where to start.
Can the ROI of AI even be measured reliably? On a narrow process - yes, and precisely. The difficulty only appears with “the whole company’s general productivity”, where too many variables come in. That’s why we measure process by process, not the company all at once.
How do you measure soft benefits like quality or satisfaction? Through numeric proxies: quality by the rate of errors and complaints, customer satisfaction by response time and the number of repeat contacts. A soft benefit that can’t be translated into any counter usually won’t survive a budget conversation.
Key takeaways
- The return usually doesn’t vanish - it just isn’t measured. Fewer than one in five companies track KPIs for AI, yet that practice correlates most strongly with the bottom line.
- Measure business outcome, not activity - user or query counts are vanity metrics.
- Set the “before” baseline before you start - without a pre-rollout number you can’t prove the difference.
- Productivity isn’t a return - “faster” only counts when the process genuinely costs less or delivers more.
- The biggest return is often where it’s hardest to measure - don’t mistake the most visible process for the most profitable one.
- Count the return from the process, not the tool’s price - the subscription price drops fast, while the process saving compounds month over month.