
Your AI Is Hitting Every Target. That Does Not Mean It Is Doing the Right Thing.

  • Writer: Joseph Noujaim
  • 6 min read

The important move in Ouchi and Maguire’s 1974 study is not the familiar claim that organizations control either through direct supervision or through metrics. The important move is that these two modes are not substitutes. They are different organizational functions, activated by different informational conditions, and they can coexist. That simple distinction matters because much of contemporary governance, including the governance of delegated AI agency, quietly relies on the opposite assumption: that if a supervisor cannot see the work, then a dashboard will stand in for it, and that if the dashboard is strong enough, the organization has done its job.


In “Organizational Control: Two Functions,” Ouchi and Maguire separate behavior control from output control. Behavior control is personal surveillance and instruction. It works when means and ends are understood, when a manager can say what good work looks like, and when the work can be observed in a way that yields reliable judgment. Output control is the measurement and recording of outputs, the use of files and records as evidence when performance must be communicated upward and across a differentiated hierarchy. It is not merely an efficiency device. It is also a legitimacy device. When a manager expects to be challenged, when attribution is contested, or when a superior lacks the technical expertise to evaluate the work directly, output measures become a form of self-defense, an attempt to produce commensurable proof that can travel through the organization.


The study’s setting is almost disarmingly concrete: retail department stores, multiple levels of hierarchy, thousands of survey responses, and managers who are asked how much weight they place on output records when making promotion decisions, and how often they see the people who report to them. Yet the empirical patterns point to an enduring governance problem. Output control increases as one moves up the hierarchy, while behavior control decreases. Lower-level workers can face heavy exposure to both forms. And most importantly, output control tends to be used most when it is least appropriate, precisely when tasks are complex and interdependent and when supervisors lack expertise. In other words, when measurement is least likely to represent real contribution, organizations are most motivated to demand it, because they need something that can stand as evidence.


Seen this way, output control is not a neutral mirror held up to work. It is an organizational artifact that stabilizes accountability narratives. It helps a superior justify resource allocation decisions. It helps a manager defend their unit. It helps the organization pretend that performance is commensurable even when the underlying production is messy, cooperative, and causally opaque. That is why the “substitution” assumption fails. An organization does not swap surveillance for metrics the way an engineer swaps one component for another. It layers mechanisms, because different audiences need different kinds of reassurance, and because organizations are not only machines for efficiency, but also social systems for legitimate decision-making.


This distinction becomes sharper when translated into the governability of AI agency. Delegating authority to an AI agent creates a new version of the same informational problem Ouchi and Maguire describe, but in a more acute form. In many enterprise settings, decision makers cannot evaluate agent behavior directly. Even when logs exist, the cognitive and technical burden of interpreting them is unevenly distributed. The organization therefore reaches, almost automatically, for output measures that can travel: task completion rates, resolution time, cost savings, ticket deflection, revenue uplift, “automation percentage,” and the tidy KPIs that make a system feel governable.


But if output control is partly about legitimacy, then the metrics that govern AI agents will tend to be selected for their rhetorical force, not for their alignment value. They will be the numbers that can win arguments upward, that can justify budgets, that can be placed into executive packs, that can be compared across units. That is not a moral failure. It is a structural tendency. It is the same tendency the 1974 paper finds in human hierarchies: when a superior lacks expertise, the subordinate amasses objective evidence. When performance is interdependent, the subordinate gathers more evidence, not less, because contested attribution increases the need for defensible claims.


This is one of the cleanest pathways to mandate drift. Mandate drift is not incompetence. It is competent behavior that becomes misaligned with authorization boundaries and organizational intent, often because the system is optimizing for what is measured, rewarded, and defensible, rather than for what is truly intended. If an AI agent’s mandate is framed as “reduce average handle time” or “increase throughput,” then the organization has already built an output control regime that privileges what can be recorded over what should be done. The agent’s best behavior, in that environment, will often be the behavior that produces clean evidence and avoids messy judgment calls, including the judgment call of escalating uncertainty to a human.


Ouchi and Maguire also surface a second dysfunction that translates directly: the “double-bind.” In the department store, salespeople are paid on output while also being closely supervised and forced to spend time on non-selling tasks that compete with selling. The organization gives mixed signals, and the worker is controlled in multiple, partially inconsistent ways. In AI governance, double-binds appear when an agent is rewarded on outcomes while also being constrained by process controls that consume its action budget or slow its response. The agent is asked to be fast, but also to be safe. It is asked to be helpful, but also to never expose risk. It is asked to close tickets, but also to seek approval in ambiguous cases. Under these conditions, the output metric does not disappear. It becomes more salient, because it is what travels upward, and it becomes the thing the agent’s sponsors feel compelled to defend.


This is where the thesis idea of a governability loop becomes practical. A governable agent is not one with impressive outputs. It is one whose delegated authority remains within mandate over time. That requires at least three intertwined capacities. First, there must be a clear articulation of what the agent is authorized to do, and what it must not do, in language that can be encoded and enforced. Second, there must be inspection evidence that is intelligible to the people who must judge alignment, not merely evidence that is easy to aggregate. Third, there must be a correction pathway that does not rely on outcomes alone, because outcomes can look excellent while the agent is quietly expanding its de facto mandate.
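To make the first of those capacities concrete, here is a minimal sketch, in Python and with entirely hypothetical names, of what it could mean to express a mandate in a form a system can check rather than only describe in prose. It is an illustration of the idea, not a reference implementation.

```python
from dataclasses import dataclass, field

@dataclass
class Mandate:
    """Illustrative only: a mandate expressed so a system can check it, not just describe it."""
    agent_id: str
    authorized_actions: set[str]           # what the agent may do
    prohibited_actions: set[str]           # what it must never do
    escalation_conditions: list[str] = field(default_factory=list)  # when a human must decide

    def permits(self, action: str) -> bool:
        # An action is in-mandate only if it is explicitly authorized and not prohibited.
        return action in self.authorized_actions and action not in self.prohibited_actions

# Example: a support agent licensed to handle tickets but not to touch money or ownership.
support_mandate = Mandate(
    agent_id="support-agent-01",
    authorized_actions={"summarize_ticket", "draft_reply", "close_ticket"},
    prohibited_actions={"issue_refund", "change_account_owner"},
    escalation_conditions=["customer disputes a charge", "low confidence in resolution"],
)
print(support_mandate.permits("close_ticket"))   # True
print(support_mandate.permits("issue_refund"))   # False
```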


Ouchi and Maguire help explain why organizations struggle with the second capacity. When expertise is low, or when the work is complex, output records become the default evidence. The organization can convince itself that it is governing because it is measuring. Yet governability is not measurement. Governability is the ability to detect and adjudicate boundary violations, including those that occur in the name of performance.

For AI agents, this suggests that “output control” needs to be redesigned, not rejected. If the enterprise insists on metrics, as it will, then the metrics must be paired with records that speak to authorization and behavior, not only to results. The analog of output records cannot simply be a KPI dashboard. It must include traceable reasons for action, decision provenance, tool-use logs, and explicit representations of constraints and overrides. Without that, output control becomes ceremonial. It produces the appearance of control while increasing the opportunity for drift, because the organization becomes less curious about how results are achieved.
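As a hedged sketch of what such a record might contain, consider the structure below. The field names are assumptions for illustration, not a standard schema; the point is that the familiar KPI sits next to reasons, tool use, constraints, and overrides instead of replacing them.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class DecisionRecord:
    """Illustrative only: one entry in an evidence trail that pairs the KPI with provenance."""
    agent_id: str
    action: str                                  # what the agent did
    stated_reason: str                           # the agent's recorded rationale
    tools_used: list[str]                        # tool-use trail behind the action
    constraints_checked: list[str]               # which mandate rules were evaluated
    overrides: list[str] = field(default_factory=list)   # human or policy overrides applied
    outcome_metric: float | None = None          # the familiar KPI, demoted to one field among several
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

record = DecisionRecord(
    agent_id="support-agent-01",
    action="close_ticket",
    stated_reason="Customer confirmed the suggested workaround resolved the issue.",
    tools_used=["crm.lookup_customer", "ticketing.close_ticket"],
    constraints_checked=["close_ticket is authorized", "no open billing dispute"],
    outcome_metric=1.0,   # e.g., counts toward the resolution rate
)
```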


Behavior control, in the AI case, also changes shape. Direct personal surveillance and instruction cannot operate as they did in the 1974 retail setting. Yet behavior control remains relevant as a design principle: where means and ends are understood, one can specify permitted procedures. In agentic systems, this becomes policy, workflows, and enforced patterns of tool use. It becomes “you may do X only if Y is true,” “you must escalate when Z occurs,” and “you must log and explain in a particular schema.” It is not supervision as watching. It is supervision as constraint and guidance, embedded into the agent’s operational environment.
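A minimal sketch of that embedded supervision, again with hypothetical names and conditions, is a gate that decides before any action runs whether it is allowed, denied, or escalated to a human. The specific predicates are assumptions chosen only to mirror the phrasing above.

```python
from typing import Callable

Condition = Callable[[dict], bool]

def gate_action(action: str, context: dict,
                authorized: set[str],
                preconditions: dict[str, Condition],
                escalation_triggers: list[Condition]) -> str:
    """Return 'allow', 'escalate', or 'deny' before the action is ever executed."""
    if action not in authorized:
        return "deny"                                        # outside the licensed mandate
    if any(trigger(context) for trigger in escalation_triggers):
        return "escalate"                                    # "you must escalate when Z occurs"
    precondition = preconditions.get(action)
    if precondition is not None and not precondition(context):
        return "deny"                                        # "you may do X only if Y is true"
    return "allow"

# Example: closing a ticket requires customer confirmation, and any open billing
# dispute forces escalation no matter how good the outcome metric would look.
decision = gate_action(
    action="close_ticket",
    context={"customer_confirmed": True, "open_dispute": False},
    authorized={"summarize_ticket", "draft_reply", "close_ticket"},
    preconditions={"close_ticket": lambda ctx: ctx.get("customer_confirmed", False)},
    escalation_triggers=[lambda ctx: ctx.get("open_dispute", False)],
)
print(decision)   # "allow"
```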


What the paper ultimately contributes to a modern governance conversation is a warning about the political life of evidence. Output measures do not merely help managers allocate incentives. They help organizations narrate competence. They help individuals defend themselves. They help hierarchies coordinate when they cannot share deep understanding. Those are real needs. But if those needs dominate, the organization will choose evidence that is portable rather than evidence that is faithful, and it will reward systems that are legible rather than systems that are aligned.


This is the point at which ALiEn, the idea of Agency Licensing and Enforcement, becomes less abstract. Licensing is not simply granting an agent capability. It is specifying what counts as authorized action, and under what conditions. Enforcement is not simply monitoring outputs. It is maintaining the organization’s right to inspect behavior, to interpret intent, and to intervene. If the only evidence the organization can interpret is output evidence, then enforcement collapses into performance management, and performance management collapses into drift management only by accident. A governable agent requires an evidence regime that can support legitimate oversight without forcing the organization to pretend that outcomes are equivalent to alignment.


The literature gets us here. The rest depends on you:

If output measures are structurally pulled toward legitimacy and self-defense, especially when expertise is low and interdependence is high, what evidence would an enterprise need to demand from its AI agents so that “proof of performance” does not become a substitute for “proof of mandate,” and so that people retain a real right to appeal an agent’s decisions?


Source

  • Official paper title: Organizational Control: Two Functions (Former title: Organizational Control: Offense and Defense)

  • Authors: William G. Ouchi; Mary Ann Maguire

  • Journal / venue: Stanford University Graduate School of Business, Research Paper 222 (August 21, 1974)

  • Year: 1974
