Goodhart’s Law: How Measuring The Wrong Things Drives Immoral Behaviour
A few years back, a friend of mine had a burglary at his home. Valuable artefacts had been stolen. But the police refused to file an FIR (First Information Report), even after much pleading. They listed the items that had been stolen, but in no way could we persuade them to file an FIR. They gave us all sorts of reasons, some useless, others false. At that time we couldn’t figure out why they were so hell-bent on not registering the FIR.
You see, the crime rate in an area is measured by the number of FIRs registered at the police station. A large number of registered FIRs reflects badly on the police and hurts their careers immensely. So they game the system. If there are no FIRs, the area is theoretically crime-free. Bingo! How can we explain such behaviour?
It has been established that when you measure effectiveness solely by quantitative indicators, the people involved have a strong incentive to behave less ethically, and most likely to produce less effective results as well. This is called Campbell’s Law.
“The more any quantitative social indicator is used for social decision-making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor.”
— Campbell’s Law
“Incentive structures work,” as Steve Jobs put it. “So you have to be very careful of what you incent people to do, because various incentive structures create all sorts of consequences that you can’t anticipate.” Sam Altman, president of Y Combinator, echoes Jobs’s words of caution: “It really is true that the company will build whatever the CEO decides to measure.”
But measurements are important nonetheless. Measurement is a tool that helps us get results. It shows up both in self-improvement techniques and in business, with KPIs, OKRs, and analogous approaches. Measurements are effective because they force you to take a hard look at what you are actually trying to do, and to hold yourself accountable to it.
“Measurement is the first step that leads to control and eventually to improvement. If you can’t measure something, you can’t understand it. If you can’t understand it, you can’t control it. If you can’t control it, you can’t improve it.”
— H. James Harrington
Unfortunately, there’s also strong evidence that measurements can be dangerous. In fact, it’s incredibly difficult to come up with incentives or measurements that do not have some kind of perverse effect.
If a job-placement firm measures your effectiveness by the number of interviews you conduct, it motivates you to rush through the meetings as quickly as possible, without actually helping your clients find a job.
A factory that focusses on production metrics encourages you to neglect maintenance and repairs, thereby setting up a future catastrophe.
With Google ads, when you are paid on a cost-per-thousand-impressions basis, your incentive is to show as many ads as possible on every page and to use clickbait so that visitors see as many pages of the site as possible. UX is thrown out of the window. Your website might make a little more money in the short term, but ad-crammed articles, slow-loading multi-page slide shows, and sensationalist clickbait headlines will drive away readers in the long run.
If college admissions require essays, you will pay essay-writers to write those essays. If journalism is fuelled by clicks, you are going to write sensationalist pieces.
Measurement and incentives are closely related.
The classic example of this happened in India. The government offered rewards to people who caught and killed snakes. The unexpected result: people started to breed snakes in order to collect the rewards. This is referred to as the Cobra Effect: an attempted solution to a problem makes the problem worse, a type of unintended consequence.
A close cousin of Campbell’s Law is Goodhart’s Law.
“When a measure becomes a target, it ceases to be a good measure.”
— Goodhart’s Law
All metrics of evaluation are bound to be abused. Goodhart’s Law states that when a feature of the economy is picked as an indicator of performance, it inexorably ceases to function as that indicator because people start to game it.
This leads to problems when other equally important aspects of a situation are neglected. In our job-placement firm example, the number of interviews conducted was a reasonable objective, and you dutifully strove to increase your numbers. However, choosing only one metric to measure success motivated you (knowingly or unknowingly) to sacrifice quality in the name of quantity. It’s a big blunder to set just one metric as a proxy for overall quality of performance. You respond to incentives, and your natural inclination is to maximise the measures by which you are judged.
You can see its effects in many areas. In school, you were given one objective: maximise your grade. This focus on one number was detrimental to your actual learning. You were incentivised to memorise content for a test, then promptly forget all of it so that you could get ready for the next one, without any consideration of whether you actually understood the concepts. You gamed the system by finding strategies to mug up facts. This strategy worked quite well given how success was measured in school, but it is hardly the best approach to a good education.
It’s easy for you to mentally process a single number that summarises an analysis, but in most situations it is better to report multiple measures. There are times when a single well-designed metric can encourage the behaviour you want, for example increasing savings rates for retirement, but it is important to keep in mind that people will always try to maximise whatever measurement you choose, sometimes unknowingly.
In the end, if you achieve a single goal at the expense of other equally important factors, your solution will not really help the situation. One of the first steps in solving a problem is determining the right measure to gauge success. Rather than a single number, the best assessment is usually a set of measurements, a mix of both qualitative and quantitative. By choosing multiple metrics, you can design a solution that avoids many of the unintended consequences that occur when optimising for a narrow objective.
A good example of the side effects of a bad metric is any social media product. If the most important metric a social media product tracks is engagement, you reward people for a particular measure (clicks, upvotes, likes, views, comments, etc.), and people always find a way to “game” the system. Ring any bells?