A researcher at MIT gave two KPIs a voice. What they said to each other changed how he understood his own organisation.
Michael Schrage, a research fellow at the MIT Initiative on the Digital Economy, was exploring what generative AI could do with performance data. He took two of the most common measures in any customer-facing business — Customer Lifetime Value and Churn Rate — and gave each of them a persona. Then he asked them to have a conversation.
Not a conversation with him. A conversation with each other.
What emerged was not a novelty. It was, as Schrage put it, a roadmap — surfacing tensions, contradictions, and strategic implications that neither metric had revealed independently, and that no human analyst had surfaced because nobody had thought to ask the right question. The two metrics knew things about the business, in combination, that the business had not known about itself.
Something important had shifted. The KPI was no longer a mirror. It had become an interlocutor.
The experiment reconstructed

To understand what Schrage discovered, it helps to see the kind of dialogue he was generating. The following is a reconstruction of the logic — not a transcript, but a faithful representation of the argument structure that emerged when two metrics were given permission to disagree with each other.
This is what Schrage meant when he said the experiment produced a roadmap. The value was not in either metric alone. It was in the relationship between them — the tension, the contradiction, the implication that surfaced only when they were given permission to interrogate each other.
Exploration · Part One

For the entire history of management accounting, the metric has occupied a specific epistemological role. It is an instrument of observation. It tells you where something is. Its job is to be accurate, consistent, and timely. It does not argue. It does not interpret. It does not push back.
This passivity was not an accident. It was a design choice, and a reasonable one. In a world where data was scarce, collection was expensive, and computation was slow, the most valuable thing a metric could do was exist — to provide a reliable number where previously there was only intuition. The organisational energy went into getting the number right. Interpretation was left to the humans.
That world no longer exists.
The traditional KPI is passive. It reports what happened, waits to be read, and is interpreted in isolation: one number, one meaning. The human provides the question.

The AI-enabled KPI is active. It generates hypotheses, interrogates related measures, surfaces tensions, and reveals what was not asked. The human steers the dialogue.
When AI can process every metric simultaneously, hold the relationships between them in memory, surface anomalies in real time, and engage in natural language dialogue about what the numbers mean — the passivity of the traditional KPI is no longer technically necessary. The question is whether organisations will use that capacity wisely or simply produce more sophisticated versions of the same passive mirror, now updated in real time and displayed on a larger screen.
Exploration · Part Two

The MIT Sloan Management Review and Boston Consulting Group conducted the most rigorous study of this transition to date: a 2024 global survey of more than 3,000 managers across 25 industries, supplemented by 17 executive interviews at companies including Schneider Electric, Sanofi, General Electric, and Wayfair. The findings are clear and uncomfortable in equal measure.
The 66% of organisations not yet using AI to revise their KPIs are, in Schrage's framing, running legacy measurement infrastructure. Their metrics are mirrors. They report accurately. They do not argue.
But the more important insight from the research is what the 34% are actually discovering — and it is not simply that their numbers are more accurate. They are discovering metrics they did not know to ask for.
The online furniture retailer Wayfair used algorithmic analysis to re-examine what was actually driving marketing campaign performance. The leadership team had a set of established KPIs they believed were the most critical to optimise — the standard marketing metrics: impressions, click-through rates, conversion rates, basket size.
The algorithm found something different. The most influential driver of campaign success was the percentage of impressions in which a person both viewed and heard a full advertisement. If that percentage fell below a certain threshold, campaigns consistently failed regardless of how well all other metrics performed.
This measure had not appeared in any dashboard. The marketing team had been optimising with confidence and precision for years — but they had been optimising the wrong things because they had not known to look for this one. Within six months of implementing the algorithm's recommendations, desired performance increased by 30 percentage points.
The metric that mattered most was one the organisation did not know to track. The algorithm did not improve performance on known measures — it discovered the measure that was actually causal.
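The mechanism behind a discovery like Wayfair's can be shown in miniature. The sketch below uses entirely synthetic data and invented metric names — it is not Wayfair's method or data. The point is only the logic: for each candidate metric, scan for the cut point that best separates successful from failed campaigns, and a hidden threshold driver stands out sharply against metrics that merely correlate with nothing.

```python
import random

random.seed(0)

# Synthetic campaign data. The hidden driver is completed_view_rate
# (the share of impressions both seen and heard in full). All names,
# ranges, and the 0.4 threshold are illustrative assumptions.
campaigns = []
for _ in range(500):
    row = {
        "click_through_rate": random.uniform(0.01, 0.05),
        "conversion_rate": random.uniform(0.01, 0.10),
        "completed_view_rate": random.uniform(0.0, 1.0),
    }
    # Campaigns consistently fail below the hidden threshold.
    row["success"] = row["completed_view_rate"] > 0.4
    campaigns.append(row)

def best_split(metric):
    """Find the cut point on `metric` that best separates successes
    from failures, scored by simple classification accuracy."""
    values = sorted(r[metric] for r in campaigns)
    best = (-1.0, 0.0)  # (accuracy, cut point)
    for cut in values[::10]:  # coarse scan of candidate cut points
        correct = sum((r[metric] > cut) == r["success"] for r in campaigns)
        best = max(best, (correct / len(campaigns), cut))
    return best

scores = {m: best_split(m) for m in
          ("click_through_rate", "conversion_rate", "completed_view_rate")}
driver = max(scores, key=lambda m: scores[m][0])
# driver is the metric whose best single cut explains success almost
# perfectly; the noise metrics top out near the base rate.
```

A real system would use proper feature-importance or causal-discovery methods over far messier data; the sketch only shows why a threshold driver is findable even when no dashboard tracks it.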
Schneider Electric, the global energy management company, built what its chief governance officer Hervé Coureil described as a fundamental commitment: they want their KPIs to evolve over time. Not be replaced when they become irrelevant — to evolve continuously as the environment changes.
Schneider built dynamic KPIs that compute how volatile weather patterns influence the trade-offs between energy costs, distributive load-balancing risks, and carbon footprints in real time. These are not three separate metrics reported in three separate boxes. They are a single dynamic relationship — a measure that updates as the interdependencies between its components shift.
When weather patterns change, the optimal trade-off between energy cost and carbon footprint changes. The KPI tells Schneider managers not just where the measures currently are — but how the relationship between them is shifting, and what response it implies.
The KPI is no longer a number. It is a model of a relationship — updated in real time as the inputs that define that relationship change.
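What "a model of a relationship" means in code can be sketched as follows. This is an illustration only — Schneider's actual model is not public, and the weighting scheme, field names, and normalising constants below are all assumptions. The essential idea is that the composite reading changes when an external input (here, weather volatility) shifts the trade-off between its components, even when the components themselves do not move.

```python
from dataclasses import dataclass

@dataclass
class DynamicTradeoffKPI:
    """A KPI expressed as a relationship between components,
    re-weighted as an external input (weather volatility) shifts.
    Hypothetical structure, not Schneider's actual model."""
    energy_cost: float   # $ per MWh
    load_risk: float     # 0..1 load-balancing risk score
    carbon_kg: float     # kg CO2 per MWh

    def score(self, weather_volatility: float) -> float:
        # Higher volatility pushes weight toward load-balancing risk;
        # calmer conditions weight cost and carbon more heavily.
        # Lower is better for all three components.
        w_risk = 0.2 + 0.6 * weather_volatility
        w_cost = w_carbon = (1 - w_risk) / 2
        return (w_cost * self.energy_cost / 100      # normalised cost
                + w_risk * self.load_risk
                + w_carbon * self.carbon_kg / 500)   # normalised carbon

kpi = DynamicTradeoffKPI(energy_cost=80, load_risk=0.3, carbon_kg=350)
calm, storm = kpi.score(0.1), kpi.score(0.9)
# Same underlying measures, different composite reading: the KPI
# reports how the relationship shifts, not just where the numbers are.
```

The design choice worth noticing is that the three inputs never change between the two calls; only the relationship between them does, which is exactly what a static three-box dashboard cannot express.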
Schrage's experiment and the corporate cases point toward three specific capabilities that the emerging generation of AI-assisted measurement makes possible — capabilities that did not exist when the Balanced Scorecard was designed, and that require a fundamentally different relationship between the organisation and its measures.
Two metrics can be placed in conversation and asked what their relationship implies. Not simply whether they correlate, but what the nature of their relationship tells you about the organisation's strategic position. Customer Lifetime Value and Churn Rate arguing with each other is the simplest version. A full Balanced Scorecard with 15 measures, each interrogating every other, produces a map of interdependencies no human analyst could construct manually — and surfaces the contradictions that a flat dashboard hides.
The specific value here is not correlation analysis, which organisations have been capable of for decades. It is something harder to replicate manually: identifying when two metrics are moving in the same direction for opposite reasons, or when a metric that looks healthy is only healthy because an unhealthy metric is subsidising it.
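The "subsidised metric" pattern is easy to demonstrate with toy numbers. In the sketch below (all figures invented), average Customer Lifetime Value rises quarter over quarter for the worst possible reason: churn is concentrated among low-value customers, so every departure flatters the average. A flat dashboard would show CLV improving; the dialogue between the two metrics reveals why.

```python
# Synthetic example: two customer segments with assumed per-segment
# lifetime values. Churn strips out low-value customers each quarter.
HV_CLV, LV_CLV = 5000, 400  # assumed lifetime value per segment, $

quarters = [
    # high-value customers, low-value customers, low-value churned
    {"hv": 1000, "lv": 4000, "churned": 200},
    {"hv": 1000, "lv": 3400, "churned": 600},
    {"hv": 1000, "lv": 2400, "churned": 1000},
]

for q in quarters:
    customers = q["hv"] + q["lv"]
    # Average CLV across the remaining base.
    q["avg_clv"] = (q["hv"] * HV_CLV + q["lv"] * LV_CLV) / customers
    # Churn rate against the base at the start of the quarter.
    q["churn"] = q["churned"] / (customers + q["churned"])

# Both series rise together: the aggregate CLV looks healthier each
# quarter precisely because churn is hollowing out the customer base.
```

This is the kind of contradiction that surfaces only when the two measures are interrogated jointly: CLV looks healthy, but its health is being paid for by the metric sitting in the box next to it.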
A dashboard equipped with generative AI can be asked: what would our financial measures look like if the customer satisfaction trend of the last six months continues for another twelve? What if it reverses? The metric becomes a scenario engine — not predicting the future with certainty but generating structured hypotheses about possible futures that management can evaluate and challenge.
What makes this qualitatively different from traditional scenario planning is speed and specificity. A strategy offsite generates two or three scenarios in two days. An AI-assisted dashboard can generate twenty in twenty minutes — each anchored to the organisation's actual historical data rather than a consultant's generic market model.
The Wayfair case represents the most disruptive capability: AI can identify metrics that matter that the organisation did not know to track. This is not optimising known measures. This is discovering previously unimagined performance drivers — ones that lay hidden in the data, connected to outcomes through causal paths that human intuition never mapped.
As Schrage puts it, this cracks the human monopoly on KPI design. Choosing what to measure is no longer purely a human act of strategic judgement. It is increasingly a human-machine collaboration in which the machine finds what the human did not know to look for.
The constraint has never been computational.
Sixty percent of managers, according to the MIT research, believe their current KPIs need improving. Only 34% are doing anything about it. That gap is not a technology gap. The tools to interrogate, revise, and reconstitute performance measures are now accessible to any organisation with the appetite to use them. The gap is something more fundamental: the willingness to be wrong about what you have been measuring.
That willingness is rarer than it sounds. The KPIs in your current scorecard were chosen by people who knew the business well. They were debated, agreed upon, communicated downward, and written into performance contracts. When Wayfair's algorithm found that the most important driver of campaign performance was a measure nobody had been tracking, the implicit discovery was also that all the years of optimising every other measure were years spent with the right intention and the wrong instruments.
Three things determine which outcome you get. None of them are technological:
For thirty years, the Balanced Scorecard asked organisations to be more systematic about what they measured. AI-assisted measurement now asks something more unsettling: to be genuinely uncertain about whether they are measuring the right things at all — and to have the institutional courage to find out.
The question is not what your KPIs would say if they could talk back. It is whether you have built the kind of organisation that could hear the answer without filtering it through the conclusions it has already reached.
The capability to make your measures argue has never been easier to acquire.
The capacity to hear what they say has never been harder to build.