Why Punishments Don’t Always Work

The job of a reward is to promote certain behaviour. But what if we wish to discourage certain behaviour? We’ve talked about positive reinforcement, the carrot. But what about the stick?

We touched upon the importance of disincentivising undesirable behaviour earlier. Here we discuss the various methods and their implications. Let’s start with Punishments.

Punishments are responses that decrease the likelihood that we repeat a behaviour. For example, when pushing a lever in a Skinner Box delivers a shock instead of food, rats stop pushing it.

Societal systems rely on threats and punishments to keep us in line. Despite that, we keep arriving late, spitting in public, skipping homework, and investing in the wrong places. That’s because punishments have some inherent problems. Unwanted behaviours reappear as soon as we remove the punishment. They aren’t forgotten; they only remain suppressed. For example, we adhere to speed limits only when we are monitored. If a road has no speed limit, we have the option to go berserk, and we usually do.

Another problem is that punishments deter us from learning the right behaviour. Punishments teach us what not to do, not what to do. This inhibits our ability to learn newer and better responses.

Punishments also provoke a variety of reactions in us, such as rage, humiliation, and a feeling of helplessness. None of these are part of our learning process.

On top of that, biased punishment systems can have adverse effects on the human psyche. We punish kids more than adults. We scold boys more than girls. We rebuke janitors more than executives. All these experiences can give the sense of a prevalent injustice in the system.

Punishments also trigger our fight-or-flight response. They can make us aggressive when all the exits get blocked. In some cases, they create fear of desirable behaviours. For example, fear of punishment for not doing homework can create fear of school itself.

Having said that, I do not suggest that we rule punishments out of incentive systems completely. Acts such as bullying, homicide, or larceny cannot go unpunished. But punishments alone cannot help us with behaviour modification.

So, what else should we do?

Let’s say that every time you reach the office late, you have to pay $5. You will reach the office on time to avoid paying $5, which strengthens the behaviour of arriving on time. This is an example of Negative Reinforcement: the removal of an unpleasant experience to strengthen a desired behaviour. Negative reinforcement can encourage learning in two ways.

B. F. Skinner demonstrated its power through experiments. He placed a rat in his Skinner Box and subjected it to an electric current that caused it some discomfort.

As the rat moved about in the box, it would knock a lever, and the electric current would switch off immediately. After being put in the box a few times, the rat learned to go straight to the lever. This form of learning is known as Escape Learning. It is the behaviour we perform to stop (or escape from) an ongoing, unpleasant, aversive stimulus. It is because of escape learning that we know to head for the exit to escape a fire in a building.

Skinner went further and taught rats to avoid electric currents altogether. He placed a rat in a Skinner Box with an electrified floor. A light turned on, followed by an electric current passing through the floor. The rat soon learned to find an escape, such as a pole to climb or a barrier to jump over onto a non-electrified floor. It also learned to press a lever when the light came on to stop the electric current from switching on at all.

At first, the rats responded only when the shock began. But as the pattern repeated, the rats learned to avoid the shock by responding to the warning signal. This is known as Avoidance Learning. It is the behaviour we perform to avoid an unpleasant experience. This is what makes us run for the exit on hearing the fire alarm. We don’t wait for the fire to appear.

Negative reinforcements are like punishments, yet they yield very different outcomes. Punishments suppress bad behaviours. Negative reinforcements, like positive reinforcements, encourage desired behaviours. It is only by combining the two, reinforcing desired behaviours and punishing undesired ones, that we will be able to change user behaviour.

The following are three useful methods of behaviour modification. They work by changing the events surrounding a person’s behaviour.

1. Behaviour Extinction:

Behaviour Extinction is the process of decreasing a problem behaviour by discontinuing its reinforcements. It is easier said than done, because humans find novel ways to get reinforcements when the natural reinforcers stop.

A whining child may double down on the whining, or find other ways to get the attention of a parent. A better extinction procedure is twofold. Parents should withhold their attention during whining, and also reward more desirable behaviours with extra attention in the absence of whining. Not paying attention to the whining makes it pointless, and eventually extinct.

Serious misbehaviours like bullying or theft cannot be ignored. They need to be punished. Along with that, their positive counterparts, such as being helpful or giving away unused toys, need to be rewarded more.

2. Behaviour Shaping:

B. F. Skinner proposed the idea of shaping desired behaviour through successive approximation. Rewards should be delivered in succession as subjects progress towards the desired behaviour.

The conditions required to receive the reward should shift each time subjects move a step closer. For example, to get someone to run 5 km daily, we should start by incentivising them to just get up and run daily. After this, we incentivise them for increasing the distance run by 500 m every week, until they reach the 5 km goal.

If they fail to achieve it, we take back all their rewards, and make them start from the beginning.
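The shifting reward condition in the running example can be sketched in a few lines of Python. This is only an illustration: the function names `shaping_targets` and `earns_reward` are hypothetical, and the 500 m starting distance is an assumption, since the example above only says to start by getting up and running.

```python
def shaping_targets(start_m=500, step_m=500, goal_m=5000):
    """Return the weekly distance targets, in metres, up to the goal.

    start_m is an assumed starting distance; the 500 m weekly step and
    the 5 km goal come from the running example.
    """
    targets = []
    target = start_m
    while target < goal_m:
        targets.append(target)
        target += step_m
    targets.append(goal_m)  # the final target is the goal itself
    return targets


def earns_reward(distance_run_m, target_m):
    """The reward is given only when the current target is met."""
    return distance_run_m >= target_m


weekly_targets = shaping_targets()
for week, target in enumerate(weekly_targets, start=1):
    print(f"Week {week}: run at least {target} m to earn the reward")
```

Under these assumptions the runner reaches the 5 km goal in the tenth week, and a distance that earned a reward in one week no longer earns it in the next.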

3. Premack’s Principle:

Premack’s principle states that more probable behaviours will reinforce less probable behaviours. A variation of this is known as a Token Economy. Targeted behaviours are reinforced with tokens that can later be exchanged for rewards.

For example, people who enjoy exercise often use a daily run as a reward for getting other chores, such as buying groceries, done. Similarly, children learn to sit still in class by being rewarded with the chance to run around and make noise at recess. If they make noise during class, recess gets cancelled.
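A token economy, as described above, can be sketched as a small ledger: targeted behaviours add tokens, and rewards cost tokens. The class name `TokenEconomy`, its methods, and the behaviours, rewards, and token amounts below are all hypothetical, chosen only to illustrate the idea.

```python
class TokenEconomy:
    """A minimal token ledger (illustrative only)."""

    def __init__(self, token_values, reward_costs):
        self.token_values = dict(token_values)  # behaviour -> tokens earned
        self.reward_costs = dict(reward_costs)  # reward -> tokens required
        self.balance = 0

    def record(self, behaviour):
        """Add tokens for a targeted behaviour; other behaviours earn nothing."""
        earned = self.token_values.get(behaviour, 0)
        self.balance += earned
        return earned

    def redeem(self, reward):
        """Exchange tokens for a reward if the balance covers its cost."""
        cost = self.reward_costs[reward]
        if cost > self.balance:
            return False
        self.balance -= cost
        return True


# Assumed example values: 2 tokens per finished homework, 3 tokens for extra recess.
economy = TokenEconomy({"finish homework": 2}, {"extra recess": 3})
economy.record("finish homework")
first_try = economy.redeem("extra recess")   # not enough tokens yet
economy.record("finish homework")
second_try = economy.redeem("extra recess")  # balance now covers the cost
```

The reward stays out of reach until the targeted behaviour has been repeated often enough, which is what ties the tokens to the behaviour rather than to the reward itself.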

Punishments and Negative Reinforcements are crucial in designing incentive systems. A good system that discourages bad behaviour is a mix of both. Punishments alone cannot achieve this.