
How to measure AI success in your organisation.

Most organisations have no idea whether their AI investment is working. Here's how to find out, and fix it.

The gap between paying for AI and benefiting from it is enormous. And growing.

~40%

Of medium and large Australian firms that have adopted AI describe their use as still 'minimal.'

Reserve Bank of Australia, 2025 Survey

'Are we getting ROI from AI?' is a reasonable question. But it's the third question, not the first. Ask it before you've answered the first two and you'll get a number that's either meaningless or misleading.

Two-thirds of Australian businesses have adopted AI in some form. But the largest group, nearly 40% of respondents in the RBA's 2025 survey of medium and large firms, describe that use as 'minimal.' Meanwhile, EY's Australian AI Workforce Blueprint found that while 68% of Australian workers now use AI, 42% haven't been given a clear reason or purpose to use it.

That's not a technology problem. It's a capability and sequencing problem. And it can't be solved by measuring ROI, because there isn't yet meaningful ROI to measure.

The organisations pulling ahead aren't spending more. They're measuring earlier, at the right layer, and using what they find to move faster. This is that framework.

Measure in this order.
Not the other way around.

AI success lives across three distinct layers, each one a prerequisite for the next. Organisations that skip to layer three without establishing layers one and two are measuring outcomes they haven't yet earned.

01
Leading Indicator

Usage & Capability

This is your leading indicator, and it has two components that are easy to conflate but critically different. Usage tells you whether AI has become part of how people work. The target: 80% of your desk workers using their enterprise AI tool on any given workday. Below that, you have an adoption problem no downstream measurement will fix.

Capability tells you whether usage is translating into value. Someone using AI to ask questions they used to Google is not the same as someone using it to compress three hours of research into twenty minutes of structured analysis. Usage without capability is expensive habit formation.

Pro tip: Self-reported capability is almost always inflated. The only reliable way to assess it is to ask people to complete actual tasks with AI and evaluate the outputs. Surveys tell you what people think they can do. Task-based assessment tells you what they actually can. Neoma's expert benchmarking and assessment tools support exactly this.

L1

AI Aware

Rarely or never uses AI at work. Unfamiliar with prompting or AI’s capabilities, risks and benefits.

L2

AI Capable

Uses AI as a search engine replacement. Limited prompting skill. Can’t reliably ensure accuracy.

L3

AI Proficient

Strong, role-specific use cases. Iterates on prompts. Handles intermediate to advanced tasks.

↑ Target: 80% of workforce

L4

AI Practitioner

Builds reusable automations and scalable AI systems that others benefit from.

80%
Daily active users target on any given workday
80%
Workforce at L3 capability or above
Platform
Data from Copilot Admin, ChatGPT Enterprise, etc.

ROI at this layer: 80% daily active users × 10% conservative productivity gain = the equivalent of 8% of total headcount capacity recovered without adding a single role.
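To make that arithmetic concrete, here's a minimal sketch in Python. The usage-export shape and field names are illustrative assumptions, not any platform's actual schema; map them to whatever your admin console actually exports.

```python
from datetime import date

# Illustrative usage export: one record per desk worker per workday.
# Field names are hypothetical; map them to your platform's actual
# export (Copilot Admin, ChatGPT Enterprise, etc.).
usage_records = [
    {"user": "a.nguyen", "date": date(2025, 6, 2), "active": True},
    {"user": "b.singh",  "date": date(2025, 6, 2), "active": False},
]

def daily_active_rate(records, workday):
    """Share of desk workers active in the enterprise AI tool on one workday."""
    day = [r for r in records if r["date"] == workday]
    return sum(r["active"] for r in day) / len(day) if day else 0.0

def capacity_recovered_fte(headcount, dau_rate, productivity_gain=0.10):
    """FTE-equivalent capacity recovered: active users x assumed per-user gain."""
    return headcount * dau_rate * productivity_gain

# A 1,000-person workforce at the 80% target with a conservative 10% gain
# recovers the equivalent of 80 roles, i.e. 8% of headcount.
print(capacity_recovered_fte(1_000, 0.80))  # 80.0
```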

02
Where ROI Accelerates

Workflow Automation

Individual capability is valuable. Workflow automation is where ROI stops being incremental and starts compounding. A skilled individual using AI well might save two hours a day. An automated workflow running 500 times a month saves those hours without anyone needing to remember to use AI at all. That's structural advantage, not personal productivity.

Two metrics matter here. First, your worker-to-automation ratio: one meaningful automation per employee is a credible primary target. Not all will be high-value or long-lived; the point is widespread experimentation at a scale that reliably surfaces the ones that are. Second, whether each business function (Sales, Marketing, Finance, Operations, HR) has at least one significant agentic workflow running with consistent weekly usage.

1:1
Worker-to-automation ratio as primary target
1–3
Active agentic applications per business function
Self-reported
Monthly cadence from functional leads

ROI at this layer: 1,000 automations × 2 hours saved per week = 2,000 hours weekly, the equivalent of 50 FTEs at a 40-hour week, without hiring or restructuring.
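As a sketch of how both Layer 2 metrics roll up, assuming the 40-hour FTE week implied by the figures above (function names are illustrative):

```python
def worker_to_automation_ratio(active_automations: int, headcount: int) -> float:
    """Primary Layer 2 target: at least 1.0, i.e. one meaningful automation per employee."""
    return active_automations / headcount

def automation_fte_equivalent(automations: int, hours_saved_weekly: float,
                              fte_week_hours: float = 40) -> float:
    """Weekly hours saved across all automations, expressed as full-time roles."""
    return automations * hours_saved_weekly / fte_week_hours

print(worker_to_automation_ratio(1_000, 1_000))  # 1.0 -> target met
print(automation_fte_equivalent(1_000, 2))       # 50.0 FTEs without hiring
```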

03
Business Outcomes

Department-Level Impact

This is where AI measurement connects to the metrics your board cares about. And it only means something if you've done the work at layers one and two first. The mistake most organisations make is trying to measure business impact before they've built the capability and automation infrastructure that creates it.

The approach is disciplined and simple. Before deploying an AI-automated process, define the specific KPI it should move. Establish a four-week baseline of that KPI and its current cost. Deploy the new process. Measure at 30, 60 and 90 days.

Sales
Faster lead response time, higher meeting booking rate
Creative
More concepts per brief, faster time to first draft
Operations
Lower cost per transaction, faster resolution times
Finance
Reduced month-end close time, automated reporting

Define before you deploy, not after. A four-week current-state baseline collected mid-deployment is better than nothing. Starting before is significantly better than starting late. The discipline is in the sequencing.
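Here is a minimal sketch of that sequence, with a hypothetical tracker and illustrative readings; the KPI, field names and numbers are assumptions, not prescriptions.

```python
from dataclasses import dataclass, field
from statistics import mean

@dataclass
class KpiTracker:
    """One KPI through baseline -> deploy -> 30/60/90-day checkpoints."""
    kpi_name: str
    baseline_weeks: list  # four weekly readings taken before deployment
    checkpoints: dict = field(default_factory=dict)  # day -> reading

    @property
    def baseline(self) -> float:
        return mean(self.baseline_weeks)

    def record(self, day: int, value: float) -> None:
        self.checkpoints[day] = value

    def delta_pct(self, day: int) -> float:
        """Change vs the four-week baseline at a checkpoint, as a percentage."""
        return (self.checkpoints[day] - self.baseline) / self.baseline * 100

# e.g. average lead response time in hours, measured weekly pre-deployment
lead_response = KpiTracker("lead_response_hours", [6.1, 5.8, 6.4, 6.0])
lead_response.record(30, 4.9)
lead_response.record(60, 4.1)
print(f"{lead_response.delta_pct(60):.0f}%")  # negative = faster responses
```

Negative deltas mean improvement for a cost- or time-based KPI; for rate-based KPIs like meeting bookings, you want the delta positive.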

Three common measurement problems, and how to solve them.

Real-world measurement is messy. Here's how to handle the three issues that come up most often.

Problem 01

“We don’t have access to our LLM usage data.”

Fix it

Start with self-reporting from team leads combined with KPI impact tracking. Imperfect, but not nothing. At scale, a centralised AI intelligence dashboard that aggregates across tools is worth the investment.

Problem 02

“We don’t have baseline metrics.”

Fix it

Start measuring now, even if you’re mid-deployment. A four-week current-state snapshot gives you something to measure against. Waiting for the perfect baseline is how organisations end up with no baseline at all twelve months later.

Problem 03

“We can’t attribute results to AI specifically.”

Fix it

Run controlled pilots where possible: same role, same function, one group with AI and one without. Where that's not practical, strong usage and capability metrics combined with directional KPI improvement make for a credible story even without perfect attribution.
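Once the pilot is set up, the comparison itself is simple. A sketch with illustrative numbers (the KPI and values are assumptions):

```python
from statistics import mean

def pilot_vs_control(pilot_kpis: list, control_kpis: list) -> float:
    """Relative difference in mean KPI: same role, same function, same period."""
    p, c = mean(pilot_kpis), mean(control_kpis)
    return (p - c) / c * 100

# e.g. weekly meetings booked per rep over a four-week pilot
with_ai    = [14, 16, 15, 17]
without_ai = [11, 12, 13, 12]
print(f"{pilot_vs_control(with_ai, without_ai):+.0f}%")  # about +29%
```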

Start measuring capability before you measure outcomes.

Most Australian organisations are being asked to demonstrate AI ROI before they've established whether their workforce is capable of generating it.

The organisations getting ahead of Australia's projected shortfall of 60,000 AI-capable professionals by 2027 aren't waiting for the market to supply talent. They're building it deliberately, measuring it rigorously, and scaling what works. That's the work. And it starts with an honest picture of where you actually are.

Not sure where your organisation actually stands?

Neoma's AI capability benchmarking gives you an evidence-based baseline across all three layers: usage, automation and business impact. We use task-based assessment, not self-reporting, so the picture you get reflects what your workforce can actually do.

Our Review-Reskill-Retain methodology gives you a clear path from your current state to genuine, measurable AI capability. Whether you're mid-market, enterprise or government, we can show you what the path forward looks like.