Operant Conditioning: How to Incentivise and Reinforce Desired Behaviour
In the 1930s, B.F. Skinner conducted a series of experiments to study how rewards and punishments affect behaviour. Unlike many psychologists of his day, Skinner refused to hypothesise about what happened inside the minds of people and animals, preferring to focus on what can be observed. He put more faith in objective measurements, such as how much people ate, than in subjective matters, such as how hungry they felt or how much pleasure they got from eating.
“If we don’t like the consequences of an action we’ve taken, we’re less likely to do it again. If we do like the consequences, we’re more likely to do it again.” That assumption is the basis of operant conditioning theory, which describes a type of learning in which the strength of a behaviour is modified by its consequences, such as reward or punishment.
Behaviour which is reinforced tends to be repeated or strengthened, and behaviour which is not reinforced tends to die out.
Skinner studied operant conditioning through experiments on lab animals, which he placed in an operant conditioning chamber (popularly known as the ‘Skinner Box’) to observe the effects of reinforcers.
The rats in the box had to figure out how to do a task, such as pushing a lever, that would reward them with food. Such an automated system allowed Skinner to study conditioned behaviour in a controlled setting.
Skinner found that the consistency and timing of incentives (reinforcements) play important roles in shaping new behaviours. The best way to learn a new behaviour is through Continuous Reinforcement, in which the desired behaviour is reinforced every time it is performed.
If you want to teach your dog a new trick, it is good to reward him for every correct response. At the beginning of the learning curve, the dog may interpret a missing or delayed reward as a sign that the behaviour was wrong. Continuous Reinforcement is important when developing new personal habits as well.
On the other hand, Intermittent Reinforcement is the best way to maintain an already learnt behaviour. Here the reinforcement is given only some of the time after the desired behaviour occurs. It can vary in interval and quantity.
Skinner conducted experiments to understand how variability impacted animal behaviour. First, he placed pigeons inside a box rigged to deliver a food pellet to the birds every time they pressed a lever. The pigeons learned the cause-and-effect relationship between pressing the lever and receiving the food pretty easily.
In the next part of the experiment Skinner added variability. Instead of providing a pellet every time a pigeon tapped the lever, the machine discharged food after a random number of taps. Sometimes the lever dispensed food, other times not.
Skinner found that the intermittent reward dramatically increased the number of times the pigeons tapped the lever. A hungry pigeon would tap the lever 12,000 times an hour while being rewarded, on average, only once every 110 taps. Adding variability increased how often the pigeons completed the intended action.
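To get a feel for those numbers, here is a minimal Python sketch of a variable reward dispenser. It is only an illustration, not a reconstruction of Skinner's apparatus: it assumes every tap pays out independently with a probability of 1 in 110, which gives one reward per 110 taps on average, and the function name and parameters are made up for this example.

```python
import random

def simulate_variable_ratio(taps, mean_ratio, seed=0):
    """Return the tap numbers on which a reward was dispensed."""
    rng = random.Random(seed)
    rewarded = []
    for tap in range(1, taps + 1):
        # Assumption: each tap pays out independently with probability 1/mean_ratio.
        if rng.random() < 1 / mean_ratio:
            rewarded.append(tap)
    return rewarded

rewards = simulate_variable_ratio(taps=12_000, mean_ratio=110)

# Number of taps the pigeon had to make for each successive reward.
gaps = [b - a for a, b in zip([0] + rewards, rewards)]

print("rewards in one hour:", len(rewards))  # expected value is 12,000 / 110, about 109
print("shortest and longest wait:", min(gaps), max(gaps))
```

The total should land near 109 rewards for the hour, but the waits between rewards range from a handful of taps to several hundred. That unpredictability is what the pigeon experiences: any single tap might be the one that pays.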
Skinner’s pigeons tell us a great deal about what helps drive our own behaviours. More recent experiments reveal that variability increases activity in the nucleus accumbens and spikes levels of the neurotransmitter dopamine, driving our hungry search for rewards.
Intermittent reinforcement can follow two kinds of schedule, ratio schedules and interval schedules, each with its own degree of effectiveness and situations to which it is suited. Ratio schedules are based on the number of responses or the amount of work done, whereas interval schedules are based on the amount of time that has passed.
Fixed Ratio-Interval Reinforcement
Behaviour is reinforced only after it occurs a specified number of times, such as five lever pushes, or at the first response after a fixed interval has passed, such as five minutes.
Even a pigeon or a rat inside a Skinner Box programmed for a fixed ratio-interval schedule learns that lever presses beyond the required minimum are just a waste of energy. Similarly, many 9-5 employees on a fixed salary conclude that working beyond 5 o’clock earns them nothing extra.
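The two fixed schedules can be sketched in a few lines of Python. This is a simplified model with hypothetical function names: the ratio version rewards every fifth press, and the interval version assumes the common definition of a fixed-interval schedule, where the first press made once the interval has elapsed since the last reward is the one that pays.

```python
def fixed_ratio(responses, ratio):
    """Return the response numbers that earn a reward: every `ratio`-th one."""
    return [n for n in range(1, responses + 1) if n % ratio == 0]

def fixed_interval(response_times, interval):
    """Return the times of rewarded responses.

    A response is rewarded only if at least `interval` has passed
    since the previous reward (or since the start, for the first one).
    """
    rewarded = []
    last_reward = 0
    for t in response_times:
        if t - last_reward >= interval:
            rewarded.append(t)
            last_reward = t
    return rewarded

# Twelve lever presses on a "reward every 5th press" schedule.
print(fixed_ratio(12, 5))  # [5, 10]

# Lever presses at these minutes, on a "5 minutes between rewards" schedule.
presses = [1, 2, 4, 5, 6, 9, 10, 11, 16]
print(fixed_interval(presses, 5))  # [5, 10, 16]
```

In the interval example only three of the nine presses are rewarded. The presses at minutes 1, 2, 4, 6, 9 and 11 earn nothing, which is exactly the wasted effort an animal on this schedule learns to avoid.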
Variable Ratio-Interval Reinforcement
Behaviour is reinforced after an unpredictable number of responses or an unpredictable amount of time. Telemarketers, salespeople, and slot machine players are on this schedule because they never know when the next sale or win will occur, or how big it will be.
Social media apps, such as Facebook, Instagram, and Snapchat, use this model to engage you. Once you post something online, you never know when or how many likes, or comments, or followers you’re going to get. The rewards vary with every post, and you are constantly trying to figure out what works and what doesn’t. Unsurprisingly, this is the type of reinforcement that normally produces more responses than any other schedule.
While the design of reinforcements can be a powerful technique for continuing or amplifying a specific behaviour, it is important to recognise one further aspect of reinforcement: individual preferences for specific rewards. A cricket fan wouldn’t care for a football match. A vegan wouldn’t care for a cheeseburger.
But most of us want the same things in varying degrees: more power, more freedom, more fun, more thrill, more recognition, and so on. As long as we have the impression that pushing the lever is bringing us nearer to our goal, we are hooked.