Schedules of Reinforcement | Some Thoughts About Dogs

This post is part of the series in response to Dunbar’s 2012 Australian seminars. See index.

When training dogs, consciously or not, we follow a pattern of delivering rewards. Sometimes we give a dog a reward for every correct behaviour, sometimes the dog gets a reward for every 3rd time they get it right, and sometimes we mix it up: we may reward a sit-stay after 3 seconds, then 8 seconds, then 5 seconds, and so forth.

Believe it or not, at some point, all the different patterns of rewards (“schedules of reinforcement”) have been given names and classified. First, I’ll describe these schedules of reinforcement, and then the ‘better’ alternative (according to Dunbar but also according to me!).

Continuous Reinforcement

Continuous reinforcement means rewarding a dog every time a response is performed.

Example: Every time you call your dog, you reward them with a treat.

Pros: This often builds up a very high level of response, as the dog understands that a treat is coming and so is highly motivated to perform the behaviour.

Cons: It’s easy to run out of treats! Because of the nature of the schedule, you reward everything – including ‘sloppy’ or ‘slow’ responses.

Fixed Schedule – Ratio or Interval

A fixed schedule means that the reward is delivered on a consistent basis, though intermittently. Ratio refers to the number of responses between rewards, and interval refers to the time that passes between rewards.

Example: Fixed Ratio: When your dog barks at the door, you normally ignore it the first and second time, but when they bark a third time, you let them in. Fixed Interval: This is difficult to conceptualise in dog training, as it assumes that the dog is maintaining a behaviour. The best example is training the stay: For every 10 seconds that the dog maintains a stay, you return and give the dog a treat.

Pros: Quick way for a dog to learn. Good way to maintain behaviours. Prepares a dog to work without always getting rewarded (which is particularly useful for sports like obedience, where the dog cannot be rewarded in the ring).

Cons: Not as quick for dogs to understand as continuous reinforcement. The dog’s behaviour may become ‘scalloped’, with more enthused and motivated behaviours nearer to the reward (e.g. if the dog catches on that they only get rewarded for every 3rd sit, they will be more enthused about doing the 3rd one than the 1st and 2nd). When the intervals or ratios are too long, the dog may ‘strike’ (i.e. quit performing the behaviour). Ratios are difficult to implement in practice, as it is normally hard to count and train at the same time! Moving away from a fixed schedule often demotivates the dog (e.g. if the dog is used to being rewarded at every 3rd behaviour and this ceases, the dog will often abruptly stop performing the behaviour).
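To make the two fixed schedules concrete, here is a minimal Python sketch. The function names and the idea of numbering responses and seconds are my own assumptions for illustration, not anything from Dunbar's seminars. It simply marks which responses (ratio) or which moments in a stay (interval) would earn a treat.

```python
def fixed_ratio_rewards(num_responses, ratio):
    """Return one boolean per response: True where a treat is given.

    Hypothetical helper: under a fixed ratio, every `ratio`-th correct
    response is rewarded. A ratio of 1 is continuous reinforcement.
    """
    return [i % ratio == 0 for i in range(1, num_responses + 1)]


def fixed_interval_reward_times(stay_seconds, interval):
    """Return the seconds into a stay at which a treat is delivered.

    Hypothetical helper modelling the stay example: one treat for every
    full `interval` seconds the dog holds the stay.
    """
    return list(range(interval, stay_seconds + 1, interval))


# Reward every 3rd sit out of 6 sits.
print(fixed_ratio_rewards(6, 3))            # [False, False, True, False, False, True]

# A 35-second stay, rewarded every 10 seconds.
print(fixed_interval_reward_times(35, 10))  # [10, 20, 30]
```

Seen this way, the 'scalloping' problem is easy to picture: the pattern of `True` values is perfectly predictable, so the dog can learn which repetitions are worth the effort.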

Variable Schedules – Ratio or Interval

Variable schedules are when the animal is rewarded on a changing basis rather than a consistent one. Ratio refers to the number of repetitions in between rewards, and interval refers to the time period in between rewards.

Example: Ratio: When you reward your dog for ‘sit’, you don’t reward them every time. Instead, you reward them for their first successful behaviour, then you miss a couple and reward the 4th one, then you reward the 6th behaviour, then the 9th behaviour, and so forth. (This is the principle that poker machines work on.) Interval: Again, if you have a dog in a stay, you could reward the dog after 5 seconds have passed, then 10 seconds, then 8 seconds, then 15 seconds, then 7 seconds, and so forth. So you are rewarding the dog based on a variable time interval schedule.

Pros: This reward schedule works well and is good for maintaining behaviours.

Cons: Too complex for any person to apply in practice; it is very much a laboratory/computer type of training. If you stretch the ratio too far, the dog will ‘strike’.
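Since variable schedules are described as a laboratory/computer type of training, here is a rough Python sketch of how a computer might generate one. The function name, the `mean_ratio` parameter, and the choice to draw each gap uniformly between 1 and `2 * mean_ratio - 1` are all assumptions made for illustration; other ways of varying the gaps would be just as valid.

```python
import random


def variable_ratio_rewards(num_responses, mean_ratio, seed=None):
    """Return the response numbers that earn a treat.

    Hypothetical helper: the gap between rewards is drawn at random from
    1 to 2 * mean_ratio - 1, so gaps average out to `mean_ratio`.
    """
    rng = random.Random(seed)  # a seed makes the schedule repeatable
    max_gap = 2 * mean_ratio - 1
    rewarded = []
    next_reward = rng.randint(1, max_gap)
    for response in range(1, num_responses + 1):
        if response == next_reward:
            rewarded.append(response)
            # Pick how many responses to wait before the next treat.
            next_reward = response + rng.randint(1, max_gap)
    return rewarded


# Roughly one treat per 3 sits, over 20 sits.
print(variable_ratio_rewards(20, 3, seed=42))
```

The same function could stand in for a variable interval schedule if the numbers are read as seconds into a stay rather than as repetitions.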

Dunbar’s Criticisms

In Dunbar’s eyes, these reinforcement schedules aim to maintain behaviours, but can allow the quality of those behaviours to diminish. In fact, he says,

“The various schedules of reinforcement are theoretically interesting but the majority of them have limited practical application.” He doesn’t think these schedules are of use in puppy or dog training.

Dunbar’s Alternative: Differential Reinforcement

Instead, Dunbar calls for differential reinforcement. In this schedule, we reward only the best 10% of responses, and the rest are ignored. The best of the best responses get the best rewards. He argues that this system “maintain[s] high-levels of responding and ensure[s] ongoing… improvement”.
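As a rough illustration of the 'best 10%' idea, here is a small Python sketch. It assumes each response has already been given a quality score (say, out of 10 for speed and precision); that scoring, the function names, and the `fraction` parameter are my own assumptions, as Dunbar's description does not specify how responses are ranked.

```python
def top_fraction_threshold(scores, fraction=0.10):
    """Return the lowest score that still counts among the best `fraction`.

    Hypothetical helper: at least one response always qualifies.
    """
    if not scores:
        raise ValueError("need at least one score")
    ranked = sorted(scores, reverse=True)
    keep = max(1, round(len(ranked) * fraction))
    return ranked[keep - 1]


def differential_rewards(scores, fraction=0.10):
    """Return one boolean per response: True where a treat is given."""
    threshold = top_fraction_threshold(scores, fraction)
    return [score >= threshold for score in scores]


# Ten drops, each scored out of 10; only the best one is rewarded.
scores = [3, 7, 5, 9, 4, 6, 2, 8, 1, 10]
print(differential_rewards(scores))
# [False, False, False, False, False, False, False, False, False, True]
```

Note that tied scores at the threshold are all rewarded, so slightly more than 10% of responses can earn a treat. Unlike the ratio and interval schedules, what decides the reward here is the quality of the response, not its position in a count or on a clock.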

My Thoughts

I think Dunbar has been overly critical of the schedules of reinforcement. The truth is, we often reward dogs on ‘differential reinforcement’ anyway. It’s very natural for us to be more rewarding to dogs when they perform the behaviours we want in the way we want.

In my house, it’s not uncommon for me to ask my dogs to perform behaviours around the house. The common one is with my young dog Myrtle, who I often ask to ‘drop’. If she drops slowly, lazily, or just in an undesirable way, she gets a “good dog” and is released. If she drops fast, enthusiastically, instantly, at a distance, and with great focus and attention, she gets a “good dog”, she gets to chew on me, we run to the treat cupboard and get some food, and I generally make a great fuss. We also have multiple levels in between, with moderate responses getting more moderate rewards.

I don’t think I’m exceptional in this way. Indeed, many positive trainers use ‘jackpots’ (‘awesome rewards’) when a dog does something exceptional in training. A long time ago, I posted on ‘mini-jackpots’, where I reward more successful attempts with more treats than average responses.

I agree with Dunbar that differential reinforcement works well, but disagree with his assertion that we are not currently implementing it. Most people naturally get enthused when their dog behaves well, and reward these responses more elaborately. Furthermore, the term and use of ‘jackpotting’ is common. For these reasons, differential reinforcement, though effective, isn’t particularly novel or unique to Dunbar.