A researcher at MIT gave two KPIs a voice. What they said to each other changed how he understood his own organisation.
Michael Schrage, a research fellow at the MIT Initiative on the Digital Economy, was exploring what generative AI could do with performance data. He took two of the most common measures in any customer-facing business — Customer Lifetime Value and Churn Rate — and gave each of them a persona. Then he asked them to have a conversation.
Not a conversation with him. A conversation with each other.
What emerged was not a novelty. It was, as Schrage put it, a roadmap — surfacing tensions, contradictions, and strategic implications that neither metric had revealed independently, and that no human analyst had surfaced because nobody had thought to ask the right question. The two metrics knew things about the business, in combination, that the business had not known about itself.
Something important had shifted. The KPI was no longer a mirror. It had become an interlocutor.
The experiment reconstructed

To understand what Schrage discovered, it helps to see the kind of dialogue he was generating. The following is a reconstruction of the logic — not a transcript, but a faithful representation of the argument structure that emerged when two metrics were given permission to disagree with each other.
This is what Schrage meant when he said the experiment produced a roadmap. The value was not in either metric alone. It was in the relationship between them — the tension, the contradiction, the implication that surfaced only when they were given permission to interrogate each other.
Exploration · Part One

For the entire history of management accounting, the metric has occupied a specific epistemological role. It is an instrument of observation. It tells you where something is. Its job is to be accurate, consistent, and timely. It does not argue. It does not interpret. It does not push back.
This passivity was not an accident. It was a design choice, and a reasonable one. In a world where data was scarce, collection was expensive, and computation was slow, the most valuable thing a metric could do was exist — to provide a reliable number where previously there was only intuition. The organisational energy went into getting the number right. Interpretation was left to the humans.
That world no longer exists.
The traditional KPI is passive. It reports what happened, waits to be read, and is interpreted in isolation: one number, one meaning. The human provides the question.

The AI-enabled KPI is active. It generates hypotheses, interrogates related measures, surfaces tensions, and reveals what was not asked. The human steers the dialogue.
When AI can process every metric simultaneously, hold the relationships between them in memory, surface anomalies in real time, and engage in natural language dialogue about what the numbers mean — the passivity of the traditional KPI is no longer technically necessary. The question is whether organisations will use that capacity wisely or simply produce more sophisticated versions of the same passive mirror, now updated in real time and displayed on a larger screen.
Exploration · Part Two

The MIT Sloan Management Review and Boston Consulting Group conducted the most rigorous study of this transition to date: a 2024 global survey of more than 3,000 managers across 25 industries, supplemented by 17 executive interviews at companies including Schneider Electric, Sanofi, General Electric, and Wayfair. The findings are clear and uncomfortable in equal measure.
The 66% of organisations not yet using AI to revise their KPIs are, in Schrage's framing, running legacy measurement infrastructure. Their metrics are mirrors. They report accurately. They do not argue.
But the more important insight from the research is what the 34% are actually discovering — and it is not simply that their numbers are more accurate. They are discovering metrics they did not know to ask for.
The online furniture retailer Wayfair used algorithmic analysis to re-examine what was actually driving marketing campaign performance. The leadership team had a set of established KPIs they believed were the most critical to optimise — the standard marketing metrics: impressions, click-through rates, conversion rates, basket size.
The algorithm found something different. The most influential driver of campaign success was the percentage of impressions in which a person both viewed and heard a full advertisement. If that percentage fell below a certain threshold, campaigns consistently failed regardless of how well all other metrics performed.
This measure had not appeared in any dashboard. The marketing team had been optimising with confidence and precision for years — but they had been optimising the wrong things because they had not known to look for this one. Within six months of implementing the algorithm's recommendations, desired performance increased by 30 percentage points.
The metric that mattered most was one the organisation did not know to track. The algorithm did not improve performance on known measures — it discovered the measure that was actually causal.
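The mechanism behind a discovery like Wayfair's can be shown in miniature. The sketch below uses entirely synthetic data and invented metric names — it is not Wayfair's method or data. The point is only the logic: for each candidate metric, scan for the cut point that best separates successful from failed campaigns, and a hidden threshold driver stands out sharply against metrics that merely correlate with nothing.

```python
import random

random.seed(0)

# Synthetic campaign data. The hidden driver is completed_view_rate
# (the share of impressions both seen and heard in full). All names,
# ranges, and the 0.4 threshold are illustrative assumptions.
campaigns = []
for _ in range(500):
    row = {
        "click_through_rate": random.uniform(0.01, 0.05),
        "conversion_rate": random.uniform(0.01, 0.10),
        "completed_view_rate": random.uniform(0.0, 1.0),
    }
    # Campaigns consistently fail below the hidden threshold.
    row["success"] = row["completed_view_rate"] > 0.4
    campaigns.append(row)

def best_split(metric):
    """Find the cut point on `metric` that best separates successes
    from failures, scored by simple classification accuracy."""
    values = sorted(r[metric] for r in campaigns)
    best = (-1.0, 0.0)  # (accuracy, cut point)
    for cut in values[::10]:  # coarse scan of candidate cut points
        correct = sum((r[metric] > cut) == r["success"] for r in campaigns)
        best = max(best, (correct / len(campaigns), cut))
    return best

scores = {m: best_split(m) for m in
          ("click_through_rate", "conversion_rate", "completed_view_rate")}
driver = max(scores, key=lambda m: scores[m][0])
# driver is the metric whose best single cut explains success almost
# perfectly; the noise metrics top out near the base rate.
```

A real system would use proper feature-importance or causal-discovery methods over far messier data; the sketch only shows why a threshold driver is findable even when no dashboard tracks it.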
Schneider Electric, the global energy management company, built what its chief governance officer Hervé Coureil described as a fundamental commitment: they want their KPIs to evolve over time. Not be replaced when they become irrelevant — to evolve continuously as the environment changes.
Schneider built dynamic KPIs that compute how volatile weather patterns influence the trade-offs between energy costs, distributive load-balancing risks, and carbon footprints in real time. These are not three separate metrics reported in three separate boxes. They are a single dynamic relationship — a measure that updates as the interdependencies between its components shift.
When weather patterns change, the optimal trade-off between energy cost and carbon footprint changes. The KPI tells Schneider managers not just where the measures currently are — but how the relationship between them is shifting, and what response it implies.
The KPI is no longer a number. It is a model of a relationship — updated in real time as the inputs that define that relationship change.
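What "a model of a relationship" means in code can be sketched as follows. This is an illustration only — Schneider's actual model is not public, and the weighting scheme, field names, and normalising constants below are all assumptions. The essential idea is that the composite reading changes when an external input (here, weather volatility) shifts the trade-off between its components, even when the components themselves do not move.

```python
from dataclasses import dataclass

@dataclass
class DynamicTradeoffKPI:
    """A KPI expressed as a relationship between components,
    re-weighted as an external input (weather volatility) shifts.
    Hypothetical structure, not Schneider's actual model."""
    energy_cost: float   # $ per MWh
    load_risk: float     # 0..1 load-balancing risk score
    carbon_kg: float     # kg CO2 per MWh

    def score(self, weather_volatility: float) -> float:
        # Higher volatility pushes weight toward load-balancing risk;
        # calmer conditions weight cost and carbon more heavily.
        # Lower is better for all three components.
        w_risk = 0.2 + 0.6 * weather_volatility
        w_cost = w_carbon = (1 - w_risk) / 2
        return (w_cost * self.energy_cost / 100      # normalised cost
                + w_risk * self.load_risk
                + w_carbon * self.carbon_kg / 500)   # normalised carbon

kpi = DynamicTradeoffKPI(energy_cost=80, load_risk=0.3, carbon_kg=350)
calm, storm = kpi.score(0.1), kpi.score(0.9)
# Same underlying measures, different composite reading: the KPI
# reports how the relationship shifts, not just where the numbers are.
```

The design choice worth noticing is that the three inputs never change between the two calls; only the relationship between them does, which is exactly what a static three-box dashboard cannot express.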
Schrage's experiment and the corporate cases point toward three specific capabilities that the emerging generation of AI-assisted measurement makes possible — capabilities that did not exist when the Balanced Scorecard was designed, and that require a fundamentally different relationship between the organisation and its measures.
Two metrics can be placed in conversation and asked what their relationship implies. Not simply whether they correlate, but what the nature of their relationship tells you about the organisation's strategic position. Customer Lifetime Value and Churn Rate arguing with each other is the simplest version. A full Balanced Scorecard with 15 measures, each interrogating every other, produces a map of interdependencies no human analyst could construct manually — and surfaces the contradictions that a flat dashboard hides.
The specific value here is not correlation analysis, which organisations have been capable of for decades. It is something harder to replicate manually: identifying when two metrics are moving in the same direction for opposite reasons, or when a metric that looks healthy is only healthy because an unhealthy metric is subsidising it.
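The "subsidised metric" pattern is easy to demonstrate with toy numbers. In the sketch below (all figures invented), average Customer Lifetime Value rises quarter over quarter for the worst possible reason: churn is concentrated among low-value customers, so every departure flatters the average. A flat dashboard would show CLV improving; the dialogue between the two metrics reveals why.

```python
# Synthetic example: two customer segments with assumed per-segment
# lifetime values. Churn strips out low-value customers each quarter.
HV_CLV, LV_CLV = 5000, 400  # assumed lifetime value per segment, $

quarters = [
    # high-value customers, low-value customers, low-value churned
    {"hv": 1000, "lv": 4000, "churned": 200},
    {"hv": 1000, "lv": 3400, "churned": 600},
    {"hv": 1000, "lv": 2400, "churned": 1000},
]

for q in quarters:
    customers = q["hv"] + q["lv"]
    # Average CLV across the remaining base.
    q["avg_clv"] = (q["hv"] * HV_CLV + q["lv"] * LV_CLV) / customers
    # Churn rate against the base at the start of the quarter.
    q["churn"] = q["churned"] / (customers + q["churned"])

# Both series rise together: the aggregate CLV looks healthier each
# quarter precisely because churn is hollowing out the customer base.
```

This is the kind of contradiction that surfaces only when the two measures are interrogated jointly: CLV looks healthy, but its health is being paid for by the metric sitting in the box next to it.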
A dashboard equipped with generative AI can be asked: what would our financial measures look like if the customer satisfaction trend of the last six months continues for another twelve? What if it reverses? The metric becomes a scenario engine — not predicting the future with certainty but generating structured hypotheses about possible futures that management can evaluate and challenge.
What makes this qualitatively different from traditional scenario planning is speed and specificity. A strategy offsite generates two or three scenarios in two days. An AI-assisted dashboard can generate twenty in twenty minutes — each anchored to the organisation's actual historical data rather than a consultant's generic market model.
The Wayfair case represents the most disruptive capability: AI can identify metrics that matter that the organisation did not know to track. This is not optimising known measures. This is discovering previously unimagined performance drivers — ones that lay hidden in the data, connected to outcomes through causal paths that human intuition never mapped.
As Schrage puts it, this cracks the human monopoly on KPI design. Choosing what to measure is no longer purely a human act of strategic judgement. It is increasingly a human-machine collaboration in which the machine finds what the human did not know to look for.
The constraint has never been computational.
Sixty percent of managers, according to the MIT research, believe their current KPIs need improving. Only 34% are doing anything about it. That gap is not a technology gap. The tools to interrogate, revise, and reconstitute performance measures are now accessible to any organisation with the appetite to use them. The gap is something more fundamental: the willingness to be wrong about what you have been measuring.
That willingness is rarer than it sounds. The KPIs in your current scorecard were chosen by people who knew the business well. They were debated, agreed upon, communicated downward, and written into performance contracts. When Wayfair's algorithm found that the most important driver of campaign performance was a measure nobody had been tracking, the implicit discovery was also that all the years of optimising every other measure were years spent with the right intention and the wrong instruments.
Three things determine which outcome you get. None of them are technological:
For thirty years, the Balanced Scorecard asked organisations to be more systematic about what they measured. AI-assisted measurement now asks something more unsettling: to be genuinely uncertain about whether they are measuring the right things at all — and to have the institutional courage to find out.
The question is not what your KPIs would say if they could talk back. It is whether you have built the kind of organisation that could hear the answer without filtering it through the conclusions it has already reached.
The capability to make your measures argue has never been easier to acquire.
The capacity to hear what they say has never been harder to build.