AI Learned What You Measure. Not What You Mean.

Joseph Noujaim
2 days ago

A certain kind of confusion sits quietly inside most organizations, and it shows up most clearly when measurement becomes the answer to everything. Someone says the firm needs “more control”, and the response is to add reporting, add dashboards, add compliance checks, add layers of review, as though structure and control were the same thing, as though a more elaborate org chart could stand in for the harder work of governing behavior.

William G. Ouchi’s paper, written in the mid-1970s and published in Administrative Science Quarterly in 1977, begins by separating these two ideas with unusual discipline. Structure is what the organization looks like when it is drawn, and control is what the organization does when it tries to steer, evaluate, and correct its own activity. That distinction matters because organizations routinely change one while expecting the benefits of the other, and then wonder why drift continues anyway.

Ouchi frames control not as a property of bureaucracy but as an evaluation process: something is monitored, compared against a standard, and then rewarded or adjusted. When that is the starting point, control collapses into a surprisingly simple fork. There are only two things an organization can reliably observe in most work settings: what people do, and what those actions produce. From there, the familiar opposition between behavior control and output control follows naturally. Behavior control depends on watching and shaping actions; output control depends on evaluating results.

The paper then makes the claim that remains central to contemporary governance debates. Neither form of control is simply a managerial preference. Each requires a condition that must already be true for the control to be “rational”. Behavior control requires at least an agreed understanding of the transformation process, meaning how inputs become outputs, what good work looks like, and what specific actions tend to produce it. Output control requires reliable and legitimate measures of output, meaning that outcomes can be measured in ways the organization accepts as meaningful.

This is a deceptively sharp model. It does not say “measure more” or “supervise more”. It says: if the organization does not know the transformation process, then supervising behavior becomes arbitrary, even performative. If the organization cannot measure outputs well, then output control becomes a narrowing device, pulling attention to the small set of things that can be counted and silently discarding the rest.
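The fork described above can be sketched as a small lookup. This is only an illustration: the two booleans flatten what are really matters of degree, and the names `Control` and `rational_control` are hypothetical, not taken from the paper. The label for the fourth case follows the essay's own wording, where control becomes ritual or ceremony.

```python
from enum import Enum


class Control(Enum):
    BEHAVIOR_OR_OUTPUT = "behavior or output control"
    BEHAVIOR = "behavior control"
    OUTPUT = "output control"
    RITUAL = "ritual"  # neither antecedent condition holds


def rational_control(knows_transformation: bool, can_measure_outputs: bool) -> Control:
    """Map the two antecedent conditions to the control form they can support.

    Assumption: each condition is treated as simply true or false.
    """
    if knows_transformation and can_measure_outputs:
        return Control.BEHAVIOR_OR_OUTPUT
    if knows_transformation:
        return Control.BEHAVIOR
    if can_measure_outputs:
        return Control.OUTPUT
    return Control.RITUAL


# An agent whose process is opaque but whose results can be counted.
print(rational_control(knows_transformation=False, can_measure_outputs=True).value)
```

The point of writing it this way is that the inputs are facts about the organization, not choices: the function cannot return a control form the conditions do not support.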

To test the relationship between structure and control, Ouchi studies 78 retail department store companies, using the organization as the unit of analysis. A retail setting sounds distant from algorithmic systems, yet the logic is close to the problems now faced in enterprise AI. Department stores have multiple goals, many of which matter to long-term performance, but only a subset of those goals is easily measurable. Sales volume can be counted; the quality of service is harder. Cleanliness, mentoring new staff, steady attention to detail, the prevention of small errors that later compound: all of these resist neat measurement.

The study finds that roughly a third of the variance in control mechanisms can be explained by structural characteristics, along with a feature of the environment, the nature of the clientele. More differentiated structures, both vertically and horizontally, are associated with more complete output measures. That is not because complexity makes measurement intrinsically truer. It is because complexity makes comparison harder. When subunits differ, output measures look attractive because they appear commensurable. A number can travel across divisions without explanation, which makes it politically and administratively usable.

But that is where Ouchi’s argument becomes more than a structure-control correlation. Measurement completeness is not the same thing as good control. Output measures are typically incomplete, and the incompleteness is not a minor technical issue. It is a governance vulnerability. When a firm relies heavily on output control built from incomplete measures, it risks driving out unmeasured goals. The store that evaluates only on sales volume will eventually see the tactics that raise sales volume, including pushing unwanted merchandise, neglecting stockwork, refusing to train new colleagues, or reducing service to anything not tied to the metric. The organization is not merely measuring; it is telling people what is safe to ignore.
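A toy illustration of that crowding-out effect, under loudly stated assumptions: the two tactics, their scores, and the `full_mandate` scoring rule are all invented for this sketch and do not come from Ouchi's data. The only thing the example is meant to show is that the preferred tactic flips once unmeasured goals enter the evaluation.

```python
# Hypothetical tactics with invented scores. Only "sales" is visible to the metric.
tactics = {
    "balanced service": {"sales": 100, "stockwork": 1.0, "training": 1.0},
    "push merchandise": {"sales": 120, "stockwork": 0.2, "training": 0.0},
}


def best_by(score):
    """Return the name of the tactic that maximizes the given scoring function."""
    return max(tactics, key=lambda name: score(tactics[name]))


# Output control built from the single countable measure.
measured_only = best_by(lambda t: t["sales"])

# Assumed stand-in for the full mandate: sales discounted by the weakest
# of the unmeasured goals.
full_mandate = best_by(lambda t: t["sales"] * min(t["stockwork"], t["training"]))

print(measured_only)  # push merchandise
print(full_mandate)   # balanced service
```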

This is the part of the paper that reads like a warning label for the AI era. Enterprise AI systems and agentic workflows are naturally attracted to output control because outputs are the only scalable interface between human intent and machine behavior. A delegated agent can be asked to hit a target, meet a service-level objective, reduce cost, increase throughput, raise conversion, or shorten cycle time. The outputs can be counted. The question is what disappears when those outputs become the dominant basis for evaluation.

Mandate drift, in the thesis sense, is competent behavior that becomes misaligned with organizational intent or authorization boundaries. It often looks like success because the system is delivering outputs that have been named as goals. Ouchi’s paper gives a mechanism for why that happens, and why it is so persistent. When the measurement set is incomplete, output control does not merely steer behavior. It teaches the system what counts as reality.

In that frame, governability of AI agency becomes less about whether the agent is intelligent, and more about whether the organization can supply the antecedent conditions that make any control scheme meaningful. If the organization cannot specify the transformation process, then behavior control over the agent becomes a ritual, a set of review meetings and policies that feel responsible but cannot reliably distinguish correct from incorrect behavior. If the organization cannot construct output measures that represent the full mandate, then output control becomes a drift engine, accelerating optimization toward what is measured and away from what is authorized.

Ouchi’s most interesting empirical nuance is that output control becomes safer when complementary control systems exist outside the organization’s formal measurement apparatus. In the department store case, aggressive and sophisticated clients create a form of external control. They complain, they demand service, they impose consequences for mistreatment. Professional norms can play a similar role, especially in work where output measures are narrow but practitioners carry a code that constrains behavior.

Translated to AI governance, this is a reminder that measurement completeness is not the only path to control. When internal metrics are thin, external forces can stabilize behavior, but only if those forces are real, legible, and capable of imposing cost. Regulators, customers with credible complaint channels, internal red teams, audit functions that can trigger remediation, professional communities that sanction misuse, and even well-designed appeal mechanisms that allow affected people to contest an agent’s decisions can all function as complementary controls.

Yet Ouchi’s argument also implies a hard edge. External controls do not replace internal governance. They merely compensate for what the metric set cannot represent. If the organization does not build a way to detect, adjudicate, and correct drift, then external signals become noise, and the system becomes legitimized by ritual rather than constrained by reality.

This is where the governability loop becomes concrete. Detection requires logs, traceability, and measurement of not only outcomes but boundary conditions. Adjudication requires an explicit articulation of mandate, meaning what the agent was authorized to do, what it was not authorized to do, and what evidence counts when those boundaries are contested. Correction requires the power to change the system, not only to re-label it. Ouchi’s framework makes clear that without these capabilities, the organization will slide into the final cell of the two-by-two matrix formed by his two antecedent conditions, the one where neither transformation knowledge nor output measures are sufficient, and control becomes ceremony, the performance of oversight.
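The three steps of that loop can be sketched in a few lines. Everything here is a hypothetical illustration: the `Mandate` and `Event` types, the action names, and the spend limit are assumptions made for the example, not an existing system or anything specified in the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mandate:
    allowed_actions: frozenset  # what the agent is authorized to do
    spend_limit: float          # a measurable boundary, not a goal statement


@dataclass(frozen=True)
class Event:
    action: str
    spend: float


def detect(log, mandate):
    """Detection: surface logged events that cross a boundary, not just outcomes."""
    return [e for e in log
            if e.action not in mandate.allowed_actions or e.spend > mandate.spend_limit]


def adjudicate(event, mandate):
    """Adjudication: say which boundary was crossed, with the event as evidence."""
    if event.action not in mandate.allowed_actions:
        return f"unauthorized action: {event.action}"
    if event.spend > mandate.spend_limit:
        return f"spend {event.spend} exceeds limit {mandate.spend_limit}"
    return None


def correct(enabled_actions, violations):
    """Correction: change what the agent can do, rather than re-labelling the log."""
    return enabled_actions - {e.action for e in violations}


mandate = Mandate(frozenset({"quote", "refund"}), spend_limit=50.0)
log = [Event("quote", 0.0), Event("refund", 80.0), Event("delete_account", 0.0)]

violations = detect(log, mandate)
findings = [adjudicate(e, mandate) for e in violations]
enabled = correct({"quote", "refund", "delete_account"}, violations)
```

In this sketch the correction step is deliberately blunt: any action involved in a violation is switched off. A real program would need a more graduated response, but the design point stands that correction acts on the agent's capabilities and not on the report.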

For ALiEn, the Agency Licensing & Enforcement idea, this paper supplies a pragmatic constraint. A license cannot be only a goal statement. It has to be a measurable boundary. Enforcement cannot be only a policy. It has to be linked to detection mechanisms that can see violation and to correction mechanisms that can impose consequence. Otherwise, the enterprise is simply expanding structure and hoping control appears.

It is also worth noticing what Ouchi does not claim. The paper does not suggest that formalization (rules and procedures) neatly substitutes for output control. In the empirical results, formalization is not a strong negative predictor of output control. That matters for current governance programs that assume more policy automatically means more control. Policies are a form of structure, and they can become a substitute for thinking about the antecedent conditions. The firm can write rules for what it cannot measure and cannot truly supervise, and then treat the existence of those rules as evidence that control exists.

Ouchi’s contribution is to insist that control is an evaluation process, and evaluation requires a reality interface. In the AI era, the reality interface is measurement plus interpretive judgment plus enforceable authority boundaries, stitched together into a loop that can operate at scale.

The literature gets us here. The rest depends on you:

If an AI agent in an enterprise is evaluated primarily on a small set of outputs, because those are the only outputs that travel cleanly across the org chart, what would have to be true for the organization to credibly claim the agent is still operating within mandate, rather than merely within the metric?

Source

Official paper title: The Relationship Between Organizational Structure and Organizational Control
Authors: William G. Ouchi
Journal / venue: Administrative Science Quarterly, 22(1), 95–113
Year: 1977
DOI: 10.2307/2391748 (https://doi.org/10.2307/2391748)
