11/4/12

Schedules of Reinforcement

This post is part of the series in response to Dunbar’s 2012 Australian seminars. See index.

 

When training dogs, consciously or not, we follow a pattern when delivering rewards. Sometimes we reward every correct behaviour; sometimes the dog is rewarded only after getting it right 3 times; and sometimes we mix it up, rewarding a sit-stay after 3 seconds, then 8 seconds, then 5 seconds, and so forth.

Believe it or not, these different patterns of reward (“schedules of reinforcement”) have all been named and classified. First, I’ll describe these schedules of reinforcement, and then the ‘better’ alternative (according to Dunbar but also according to me!).

 

Continuous Reinforcement

Continuous reinforcement means rewarding a dog every time a response is performed.

Example:  Every time you call your dog, you reward them with a treat.

Pros: This often builds a very high level of response: the dog knows a treat is coming, so they are highly motivated to perform the behaviour.

Cons: It’s easy to run out of treats! Because this schedule rewards every response, you also reward ‘sloppy’ or ‘slow’ responses.

 

Fixed Schedule – Ratio or Interval

On a fixed schedule, the reward is delivered on a consistent basis, though intermittently. ‘Ratio’ and ‘interval’ refer, respectively, to the number of responses and the amount of time that has passed.

Examples: Fixed ratio: When your dog barks at the door, you normally ignore the first and second barks, but when they bark a third time, you let them in. Fixed interval: This is difficult to conceptualise in dog training, as it assumes that the dog is maintaining a behaviour. The best example is training the stay: for every 10 seconds that the dog maintains a stay, you return and give the dog a treat.

Pros: A quick way for a dog to learn. A good way to maintain behaviours. Prepares a dog to work without always being rewarded (which is particularly useful for sports like obedience, where the dog cannot be rewarded in the ring).

Cons: Not as quick for dogs to understand as continuous reinforcement. The dog’s behaviour may become ‘scalloped’, with more enthusiastic and motivated responses nearer to the reward (e.g. if the dog catches on that they are only rewarded for every 3rd sit, they will be more enthusiastic about the 3rd sit than the 1st and 2nd). When the intervals or ratios are too long, the dog may ‘strike’ (i.e. quit performing the behaviour). Ratios are difficult to implement in practice, as it is normally hard to count and train at the same time! Moving away from a fixed schedule often demotivates the dog (e.g. if the dog is used to being rewarded for every 3rd behaviour and those rewards stop, the dog will often abruptly stop performing the behaviour).

 

Variable Schedules – Ratio or Interval

On a variable schedule, the gap between rewards keeps changing. ‘Ratio’ refers to the number of repetitions in between rewards, and ‘interval’ refers to the time period in between rewards.

Examples: Ratio: When you reward your dog for ‘sit’, you don’t reward them every time. Instead, you reward their first successful sit, then skip a couple and reward the 4th, then the 6th, then the 9th, and so forth. (This is the principle that poker machines work on.) Interval: Again, if your dog is in a stay, you could reward them after 5 seconds, then 10 seconds, then 8 seconds, then 15 seconds, then 7 seconds, and so forth. You are rewarding the dog on a variable interval schedule.

Pros: This reward schedule works well and is good for maintaining behaviours.

Cons: It is very much a laboratory/computer type of training, and too complex for people to apply in practice. If you stretch the ratio too far, the dog will ‘strike’.
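To make the difference between the two ratio schedules concrete, here is a minimal Python sketch of my own (not something Dunbar presented). It assumes each correct sit counts as one response and simply lists which repetitions would earn a treat. The function names fixed_ratio and variable_ratio, and the way the variable gaps are drawn, are illustrative assumptions rather than any standard method.

```python
import random


def fixed_ratio(n_responses, ratio):
    """Return the (1-based) responses that earn a reward when every
    `ratio`-th response is rewarded. ratio=1 is continuous reinforcement."""
    if ratio < 1:
        raise ValueError("ratio must be at least 1")
    return [i for i in range(1, n_responses + 1) if i % ratio == 0]


def variable_ratio(n_responses, mean_ratio, seed=None):
    """Return the (1-based) responses that earn a reward when the number of
    responses between rewards changes at random but averages `mean_ratio`.

    Assumption for illustration only: each gap is a whole number drawn
    uniformly from 1 to 2*mean_ratio - 1, which averages out to mean_ratio.
    """
    if mean_ratio < 1:
        raise ValueError("mean_ratio must be at least 1")
    rng = random.Random(seed)  # seeded so the same plan can be reproduced
    rewarded = []
    count = 0
    target = rng.randint(1, 2 * mean_ratio - 1)
    for i in range(1, n_responses + 1):
        count += 1  # the dog performed the behaviour once more
        if count >= target:
            rewarded.append(i)  # this repetition earns a treat
            count = 0
            target = rng.randint(1, 2 * mean_ratio - 1)  # pick the next gap
    return rewarded


if __name__ == "__main__":
    print("Fixed ratio 3:    ", fixed_ratio(12, 3))
    print("Variable ratio ~3:", variable_ratio(12, 3, seed=42))
```

fixed_ratio(9, 3) returns [3, 6, 9], the same every-3rd-sit pattern described under fixed schedules, and a ratio of 1 gives continuous reinforcement. The variable version produces a different, unpredictable list each time (unless you fix the seed), which is exactly what makes it so hard to keep track of in your head while training.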


10/13/12

The #@*$ing Four Quadrants (Dunbar)

This post is part of the series in response to Dunbar’s 2012 Australian seminars. See index.

 

Dunbar has a clear opinion on the four quadrants of operant conditioning: Ditch them! Dunbar feels we have entered into a time of ‘quadrant worship’ when, in reality, the quadrants were only ever designed to be a memory aid. The quadrants have also led to a division in the dog community, with half the people worshipping positive reinforcement and negative punishment (i.e. “positive trainers”), and the other half worshipping negative reinforcement and positive punishment.

 

Here’s a little theory: in the quadrants, positive means “you give” (add something) and negative means “negate” (take something away).

Dunbar used this table to illustrate the quadrants:

       | Start                  | Stop
Reward | Positive Reinforcement | Negative Punishment
Punish | Positive Punishment    | Negative Reinforcement

 

Dunbar thinks this is a complicated way of viewing things. He says that the dog doesn’t assess anything other than “did the environment get better or worse?” He believes dogs have a binary outlook on life: they see things as good or bad.

06/17/12

Reward Training Techniques (Dunbar)

This post is part of the series in response to Dunbar’s 2012 Australian seminars. See index.

Dunbar described five reward training techniques:

 

 

Lure Reward Training
He called lure reward techniques ‘techniques that cause the behaviour’ and the ‘Plan A’ of dog training: they should be the first option when teaching a dog a behaviour. More about this method is outlined in my lure reward training post.

 

All or none reward training
Dunbar created ‘all or none’ reward training after thinking about dogs in shelter situations. These dogs need to default to good behaviour, or just be ‘good’ without any verbal cues. In all or none reward training, you simply wait for the animal to do what you want, and reward it. For example, if you have a dog on leash and wait long enough, they’ll eventually sit. The term ‘all or none’ comes from the behaviour: he’s either sitting, or he’s not. Dunbar advocates this approach for inattentive or ‘crazy’ dogs, and suggests it should be the ‘Plan B’ in dog training.

 


Life rewards: Running, playing fetch. Much better than any boring treat!

 

Shaping (often with clickers)

11/19/11

McGreevy on Rewarding Dogs

This post is part of the McGreevy seminar series. Click here for the index.

 

McGreevy had a lot to say about rewarding dogs. Reward training is his preferred method for training dogs.

Most importantly, knowing what a dog wants and likes can help us in our training. Dogs value a range of things, and each can be used as a reward. However, what a dog wants and likes varies in different contexts. McGreevy was big on appreciating animals as individuals in order to get the best out of them.

McGreevy believes in allowing dogs to pick their own rewards, and in letting dogs be ‘creative’ in their reward choice. The speed and strength of a dog’s learning can indicate how attractive the reward is.

Rewards can be innate (i.e. a primary reinforcer) or learned (i.e. a secondary reinforcer).

We can also influence the value of rewards. For example, playing with a ball before we throw it may make it a stronger reinforcer. Also, fasting a dog increases its drive for food.

He listed a number of things that could be used as reinforcers.  They are what dogs consider to be resources, and so they value them and will work for them.


Fun, surprises, and play

Dogs like fun surprises, such as unpredictable or concealed rewards. They enjoy the ‘fun’ of being rewarded with things that magically appear.

Dogs are opportunistic and playful. They like to play, and it can take time to learn to play with dogs effectively (he mentioned Steve Austin as ‘great at playing with dogs’). Dogs can value each other as resources and play companions. (He mentioned Alexandra Horowitz’s book, Inside of a Dog, for more insights on dog play.)

McGreevy emphasised that, when playing with dogs, we need to avoid dogs putting their teeth on humans. Chasing and using teeth are innately rewarding for dogs, and we need to deny them the opportunity to learn that humans are appropriate to chase and to put their teeth on.

 

Food

McGreevy called a bowl of dog food “a bowlful of training opportunities”. He did note, however, that some people are of the mindset that it is ‘wrong’ to make dogs work for meals, and that dogs should instead have an innate desire to please.

 

Other Rewards

As domesticated animals, dogs are social, so they can be rewarded with social interaction.

Some dogs can also be rewarded with exercise, training, water, sex, liberty, sanctuary, and comfort.

 

Personal Experiences

I have found so much diversity in my dogs and what they find rewarding. I think this has made me a better trainer, because I have had to work with dogs as individuals rather than taking a ‘one size fits all’ approach.

With Clover, I spent a lot of time ensuring that she would work well for both food and toys. She loves her tennis ball, but she sometimes gets over-aroused and stops thinking when training. For this reason, I normally use food rewards with her, as they keep her motivated but not over-aroused. She does, however, receive a tennis ball reward at the end of tracking.

Chip is a dog that I can reward with a pat, praise, and a cuddle. He likes food, and he likes toys, but he often gets over-aroused with both of these rewards. For Chip, when we track, his reward at the end of the track is a cuddle and praise. He must like it, otherwise he wouldn’t track!

So what do your dogs find rewarding? What are your more ‘creative’ rewards?

 

Further reading: Ian Dunbar on Reward Training Techniques