Schedules of Reinforcement
This post is part of a series responding to Dunbar’s 2012 Australian seminars. See the index.
When training dogs, consciously or not, we follow a pattern for delivering rewards. Sometimes we reward every correct behaviour; sometimes the dog gets a reward only every third time they get it right; and sometimes we mix it up, rewarding a sit-stay after 3 seconds, then 8 seconds, then 5 seconds, and so forth.
Believe it or not, these different patterns of reward (“schedules of reinforcement”) have all been named and classified. First, I’ll describe these schedules of reinforcement, and then the ‘better’ alternative (according to Dunbar, but also according to me!).
Continuous Reinforcement
Continuous reinforcement means rewarding a dog every time a response is performed.
Example: Every time you call your dog, you reward them with a treat.
Pros: This often builds a very high level of response, as the dog learns that a treat always follows and so is highly motivated to perform the behaviour.
Cons: It’s easy to run out of treats! And because every response is rewarded, you also end up rewarding ‘sloppy’ or ‘slow’ responses.
Fixed Schedules – Ratio or Interval
A fixed schedule means that the reward is delivered on a consistent, though intermittent, basis. ‘Ratio’ counts the number of responses, and ‘interval’ measures the time that has passed during the response.
Example: Fixed ratio: When your dog barks at the door, you normally ignore the first and second bark, but when they bark a third time, you let them in. Fixed interval: This is harder to conceptualise in dog training, as it assumes that the dog is maintaining a behaviour. The best example is training the stay: for every 10 seconds that the dog holds the stay, you return and give the dog a treat.
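If it helps to see the two fixed schedules as plain rules, here is a minimal Python sketch. The function names and numbers (every 3rd response, every 10 seconds of stay) are hypothetical illustrations built from the examples above, not a standard tool. It also shows that continuous reinforcement is simply a fixed ratio of 1.

```python
def fixed_ratio_rewards(n_responses: int, ratio: int) -> list[int]:
    """Hypothetical helper: return the (1-based) response numbers that earn
    a reward on a fixed-ratio schedule, i.e. every `ratio`-th response."""
    return [i for i in range(1, n_responses + 1) if i % ratio == 0]


def fixed_interval_rewards(stay_seconds: int, interval: int) -> list[int]:
    """Hypothetical helper: return the times (in seconds) at which a dog
    holding a stay is rewarded, once every `interval` seconds."""
    return list(range(interval, stay_seconds + 1, interval))


# Fixed ratio 3: only every third response (e.g. bark or sit) pays off.
print(fixed_ratio_rewards(9, 3))       # [3, 6, 9]
# Fixed interval 10 s: a 35-second stay earns treats at 10, 20 and 30 s.
print(fixed_interval_rewards(35, 10))  # [10, 20, 30]
# Continuous reinforcement is a fixed ratio of 1: every response is rewarded.
print(fixed_ratio_rewards(4, 1))       # [1, 2, 3, 4]
```

The predictability is the point: because the dog can "learn the rule", the schedule is prone to the scalloping and striking described below.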
Pros: A quick way for a dog to learn, and a good way to maintain behaviours. It prepares a dog to work without always being rewarded (which is particularly useful for sports like obedience, where the dog cannot be rewarded in the ring).
Cons: Not as quick for dogs to understand as continuous reinforcement. The dog’s behaviour may become ‘scalloped’, with more enthusiastic and motivated responses nearer to the reward (e.g. if the dog catches on that they are only rewarded for every 3rd sit, they will be more enthusiastic about the 3rd sit than the 1st and 2nd). When the intervals or ratios are too long, the dog may ‘strike’ (i.e. stop performing the behaviour). Ratios are difficult to implement in practice, as it is normally hard to count and train at the same time! Moving away from a fixed schedule often demotivates the dog (e.g. if the dog is used to being rewarded for every 3rd behaviour and the rewards stop, the dog will often abruptly stop performing the behaviour).
Variable Schedules – Ratio or Interval
On a variable schedule, the animal is rewarded at changing, rather than consistent, points. ‘Ratio’ refers to the number of repetitions in between rewards, and ‘interval’ refers to the time period in between rewards.
Example: Ratio: When you reward your dog for ‘sit’, you don’t reward them every time. Instead, you reward their first successful sit, then skip a couple and reward the 4th, then the 6th, then the 9th, and so forth. (This is the principle that poker machines work on.) Interval: Again, with a dog in a stay, you could reward the dog after 5 seconds have passed, then 10 seconds, then 8 seconds, then 15 seconds, then 7 seconds, and so forth. So you are rewarding the dog on a variable time interval schedule.
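For comparison, here is a minimal Python sketch of the two variable schedules. It assumes that "mixing it up" can be modelled as a random draw: for the ratio version, the number of responses needed for the next reward is drawn uniformly from 1 to 2 × `mean_ratio` − 1 (so it averages `mean_ratio`); for the interval version, each gap is drawn between `min_gap` and `max_gap` seconds. The function and parameter names are hypothetical, chosen for illustration only.

```python
import random


def variable_ratio_rewards(n_responses: int, mean_ratio: int, seed: int = 0) -> list[int]:
    """Hypothetical helper: return the response numbers rewarded on a
    variable-ratio schedule. After each reward, the number of responses
    required for the next one is redrawn at random (averaging mean_ratio)."""
    rng = random.Random(seed)  # seeded so a "session" can be replayed
    rewarded = []
    required = rng.randint(1, 2 * mean_ratio - 1)
    count = 0
    for response in range(1, n_responses + 1):
        count += 1
        if count >= required:
            rewarded.append(response)
            count = 0  # start counting towards the next, different requirement
            required = rng.randint(1, 2 * mean_ratio - 1)
    return rewarded


def variable_interval_rewards(stay_seconds: int, min_gap: int, max_gap: int,
                              seed: int = 0) -> list[int]:
    """Hypothetical helper: return the times (in seconds) at which a dog
    holding a stay is rewarded, with each gap drawn at random."""
    rng = random.Random(seed)
    times = []
    t = rng.randint(min_gap, max_gap)
    while t <= stay_seconds:
        times.append(t)
        t += rng.randint(min_gap, max_gap)
    return times


print(variable_ratio_rewards(20, 3))         # e.g. which of 20 sits get a treat
print(variable_interval_rewards(60, 5, 15))  # e.g. treat times during a 1-minute stay
```

A computer tracks the shifting requirement effortlessly, which is exactly why this schedule is easier to run in a laboratory than in the middle of a training session.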
Pros: This reward schedule works well and is good for maintaining behaviours.
Cons: Too complex for people to apply in practice; it is very much a laboratory/computer type of training. If you stretch the ratio too far, the dog will ‘strike’.