11/4/12

Schedules of Reinforcement

This post is part of the series in response to Dunbar’s 2012 Australian seminars. See index.

 

When training dogs, consciously or not, we follow a pattern of delivering rewards. Sometimes we give a dog a reward for every correct behaviour, sometimes the dog gets a reward for every 3 times they get it right, and sometimes we mix it up and reward a sit-stay after 3 seconds, then 8 seconds, then 5 seconds, and so forth.

Believe it or not, at some point, all the different patterns of rewards (“schedules of reinforcement”) have been given names and classified. First, I’ll describe these schedules of reinforcement, and then the ‘better’ alternative (according to Dunbar but also according to me!).

 

Continuous Reinforcement

Continuous reinforcement means rewarding a dog every time a response is performed.

Example:  Every time you call your dog, you reward them with a treat.

Pros: This often builds up a very high level of response, as the dog understands they will get a treat, and so are highly motivated to accomplish the behaviour.

Cons: It’s easy to run out of treats!  Because of the nature of the schedule, you reward everything – including ‘sloppy’ or ‘slow responses’.

 

Fixed Schedule – Ratio or Interval

A fixed schedule means that the reward is delivered on a consistent basis, though intermittently. Ratio refers to the number of responses between rewards, and interval refers to the time that passes between rewards.

Example: Fixed Ratio: When your dog barks at the door, you normally ignore it the first and second time, but when they bark a third time, you let them in. Fixed Interval: This is difficult to conceptualise in dog training, as it assumes that the dog is maintaining a behaviour. The best example is training the stay: For every 10 seconds that the dog maintains a stay, you return and give the dog a treat.

Pros: Quick way for dog to learn. Good way to maintain behaviours. Prepares a dog to work without always getting rewarded (which is particularly useful for sports like obedience, where the dog cannot get rewarded in the ring).

Cons: Not as quick for dogs to understand as with continuous reinforcement. The dog’s behaviour may become ‘scalloped’, with more enthused and motivated behaviours nearer to the reward (e.g. if the dog catches on that they only get rewarded for every 3rd sit, they will be more enthused about doing the 3rd one than the 1st and 2nd). When the intervals or ratios are too long, the dog may ‘strike’ (i.e. quit performing the behaviour). Ratios are difficult to implement in practice, as it is normally hard to count and train at the same time! Moving away from a fixed schedule often demotivates the dog (e.g. if the dog is used to being rewarded at every 3rd behaviour and this ceases, the dog will often abruptly stop performing the behaviour).
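To make the two fixed schedules concrete, here is a minimal sketch in Python. It is only an illustration of the examples above (reward every 3rd response, or a treat for every 10 seconds of a stay); the function names and parameters are made up for this post and do not come from Dunbar's seminar.

```python
def fixed_ratio_rewards(num_responses, ratio):
    """Return the 1-based response numbers that earn a reward.

    With ratio=3, every 3rd correct response is rewarded.
    """
    return [n for n in range(1, num_responses + 1) if n % ratio == 0]


def fixed_interval_rewards(stay_seconds, interval):
    """Return the times (in seconds) at which a treat is given during a stay.

    With interval=10, the dog gets a treat for every 10 seconds held.
    """
    return list(range(interval, stay_seconds + 1, interval))


print(fixed_ratio_rewards(9, 3))       # [3, 6, 9]
print(fixed_interval_rewards(35, 10))  # [10, 20, 30]
```

Seeing the reward points laid out like this also shows why the dog can 'catch on': the pattern is completely predictable.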

 

Variable schedules – ratio or interval

Variable schedules are when the animal is rewarded intermittently, but with the gap between rewards changing each time. Ratio refers to the number of repetitions in between rewards, and interval refers to the time period in between rewards.

Example: Ratio: When you reward your dog for ‘sit’, you don’t reward them every time. Instead, you reward them for their first successful behaviour, then you miss a couple and reward the 4th one, then you reward the 6th behaviour, then the 9th behaviour, and so forth. (This is the principle that poker machines work on.) Interval: Again, if you have a dog in a stay, you could reward the dog after 5 seconds have passed, then 10 seconds, then 8 seconds, then 15 seconds, then 7 seconds, and so forth. So you are rewarding the dog based on a variable time interval schedule.

Pros: This reward schedule works well and is good for maintaining behaviours.

Cons: Too complex for any person to apply; it is very much a laboratory/computer type of training. If you stretch the ratio too far, the dog will ‘strike’.
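A variable ratio schedule is easy for a computer to produce, which is part of why it feels like laboratory training. The sketch below is one possible way to generate such a schedule in Python; the function name, the `mean_ratio` parameter, and the choice to draw each gap uniformly between 1 and `2 * mean_ratio - 1` are all assumptions made for illustration, not a method described in the seminar.

```python
import random


def variable_ratio_rewards(num_responses, mean_ratio, seed=None):
    """Return the 1-based response numbers that earn a reward.

    The gap between rewards is drawn at random from 1 to 2*mean_ratio - 1,
    so it averages mean_ratio but is unpredictable on any single repetition.
    """
    rng = random.Random(seed)  # a seed makes the schedule repeatable
    max_gap = 2 * mean_ratio - 1
    rewarded = []
    next_reward = rng.randint(1, max_gap)
    while next_reward <= num_responses:
        rewarded.append(next_reward)
        next_reward += rng.randint(1, max_gap)
    return rewarded


# Which of 20 sits get rewarded, averaging one reward per 3 sits
print(variable_ratio_rewards(20, mean_ratio=3, seed=1))
```

Capping the largest gap is a deliberate choice here: it is a simple guard against stretching the ratio so far that the dog 'strikes'.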


09/3/12

Food in Dog Training (Dunbar)

This post is part of the series in response to Dunbar’s 2012 Australian seminars. See index.

Food is very useful in dog training.

My notes are a little brief in this section, but I think (!) that Dunbar described four principal roles of food in dog training:

 

Brindle crossbreed dog eyes off a rawhide treat.

Photograph copyright Ravyk Photography.

1. Lure
Food can be used to lure desirable behaviours.  This is very effective for pet owners, who often do need food to make up for deficiencies in other areas (e.g. poor training, poor vocal control, etc).  Read more about lure-reward training.

2. Reward
Food can be used to reward desirable behaviours.

3. Classical conditioning
Classical conditioning is associating something good with something else.  For example, feeding dogs every time they see another dog means that the dog is more likely to associate other dogs with good things.

4. Distraction
Food can be used as a distraction in training exercises, which is otherwise known as ‘proofing’.

 

What if the dog doesn’t like food?

If a dog doesn’t like food, they should be trained to like food! Feed the dog by hand instead of from a bowl, or turn food into a secondary reinforcer: “you have to eat the kibble for you to be allowed to do fun things”. Food is too useful to not have in your toolbox for behaviour modification.

 

11/19/11

McGreevy on Rewarding Dogs

This post is part of the McGreevy seminar series. Click here for the index.

 

McGreevy had a lot to say about rewarding dogs.  Reward training is his preferred method of training for dogs.

Most importantly, knowing what a dog wants and likes can help us in our training. Dogs value a range of things, and each can be used as a reward. However, what a dog wants and likes varies in different contexts. McGreevy was big on appreciating animals as individuals in order to get the best out of them.

McGreevy believes in allowing dogs to pick their own rewards, and allowing dogs to be ‘creative’ in their reward choice. The speed and strength of a dog’s learning can indicate how attractive the reward is.

Rewards can be innate (i.e. a primary reinforcer) or learned (i.e. a secondary reinforcer).

We can also influence the value of rewards. For example, if we play with a ball before we throw it, it may act as a greater reinforcer. Also, fasting a dog gives them a higher drive for food.

He listed a number of things that could be used as reinforcers.  They are what dogs consider to be resources, and so they value them and will work for them.


Fun, surprises, and play

Dogs like fun surprises, such as unpredictable or concealed rewards. Dogs like the ‘fun’ of being rewarded with magically appearing stuff.

Dogs are opportunistic and playful. They like to play, and it can take time to learn to play with dogs effectively (he mentioned Steve Austin as ‘great at playing with dogs’). Dogs can value each other as resources and play companions. (He mentioned Alexandra Horowitz’s book, Inside of a Dog, for more insights on dog play.)

McGreevy emphasised that, when playing with dogs, we need to avoid dogs putting teeth on humans. Chasing and using teeth are innately rewarding for dogs, and we need to prevent the opportunity for them to learn that humans are appropriate to chase and put teeth on.

 

Food

McGreevy called a bowl of dog food “a bowlful of training opportunities”. He did note, however, that some people are of the mindset that it is ‘wrong’ to make dogs work for meals and that dogs should instead have an innate ability to please.

 

Other Rewards

Dogs, as domesticated animals, are social, so they can be rewarded with social interaction.

Some dogs can also be rewarded with exercise, training, water, sex, liberty, sanctuary, and comfort.

 

Personal Experiences

I have found so much diversity in my dogs and what they find rewarding. I think this has made me a better trainer, in having to work with dogs as individuals and not taking a ‘one size fits all’ approach.

I spent a lot of time with Clover to ensure that she would work well for both food and toys. She loves her tennis ball, but she sometimes gets over-aroused and stops thinking when training. For this reason, I normally use food rewards with her, as they keep her motivated but not over-aroused. She does, however, receive a tennis ball reward at the end of tracking.

Chip is a dog that I can reward with a pat, praise, and a cuddle. He likes food, and he likes toys, but he often gets over-aroused with both of these rewards. For Chip, when we track, he has a reward of a cuddle and praise at the end of the track. He must like it, otherwise he wouldn’t track!

So what do your dogs find rewarding? What are your more ‘creative’ rewards?

 

Further reading: Ian Dunbar on Reward Training Techniques

11/2/11

McGreevy on Operant Conditioning

This post is part of the McGreevy seminar series. Click here for the index.

 

Please note: This article assumes some prior knowledge of operant or instrumental conditioning, as it mostly focuses on McGreevy’s comments on operant and instrumental conditioning, rather than on explaining these terms itself. If you are lacking a comprehensive understanding of Operant Conditioning, then I suggest this page from Crystal at Reactive Champion blog.  If you already have some idea of operant conditioning, come on in.  This may be confusing, but we can only hope it may add to your understanding.

Operant conditioning, also called instrumental conditioning, is when the animal’s voluntary response is instrumental (i.e. important) in establishing the consequence (i.e. reinforcement or punishment).  (By voluntary, we mean responses that the animal has control over.  Involuntary would be things like salivating or growing hair.)

McGreevy used the diagram below to consider operant conditioning.

Here, the ‘x’ marks the spot of a neutral stimulus that does not modify behaviours. That is, a neutral experience. From here, stimuli can either be reinforcing and increase the probability of behaviours, or they can be punishing and decrease the probability of the responses in question. The purple arrows indicate negative punishment (-P) and negative reinforcement (-R). Negative punishment uses the removal of attractive stimuli to make a response less probable. Negative reinforcement uses the removal of aversive stimuli to make a response more probable.
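For readers who find a lookup easier than a diagram, the four quadrants can be sketched as a tiny Python function. This is only an illustration and not McGreevy's diagram: the function name and argument values are invented here, and the `+R` and `+P` labels are assumed by extending the `-R` and `-P` notation above to the two cases where a stimulus is added.

```python
def quadrant(stimulus_change, behaviour_effect):
    """Label an operant conditioning quadrant.

    stimulus_change:  'added' or 'removed' (what happens to the stimulus)
    behaviour_effect: 'increases' or 'decreases' (what happens to the response)
    """
    sign = {"added": "+", "removed": "-"}[stimulus_change]
    kind = {"increases": "R", "decreases": "P"}[behaviour_effect]
    return sign + kind


# Removing something attractive so the response becomes less probable
print(quadrant("removed", "decreases"))  # -P

# Removing something aversive so the response becomes more probable
print(quadrant("removed", "increases"))  # -R
```

The point of splitting it this way is that 'negative' only ever describes removal of a stimulus, while 'reinforcement' or 'punishment' is decided by what happens to the behaviour afterwards.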

01/5/11

Mini-Jackpotting

I’ve always been somewhat sceptical of the concept of jackpotting. I don’t know why it has never sat well with me. It just seems a bit much to believe that dogs can have an understanding of a degree of success.

That being said, my experience does indicate some benefits in jackpotting. I guess the best description of what I do is ‘mini-jackpotting’. This is what I use when free shaping behaviours, and I reward ‘more successful’ attempts with more food.

Over the last couple of days, I have been training scent identification and indication.  The process was very slow, until I started mini-jackpotting. In this example, my scent was a teabag and I wanted my dog to scratch/dig at the teabag.

Over the session, I rewarded different interactions in different ways. My dog would be rewarded with one piece of kibble if they looked at or moved towards the teabag. I rewarded touching the object with a paw with numerous bits (about 5 pieces), and an actual scratch or dig with about 10 pieces.
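Written out as a lookup table, the reward tiers from that session look something like the Python sketch below. The interaction names are labels invented for this example, and the amounts are the approximate piece counts mentioned above rather than anything I measured precisely.

```python
# Approximate kibble pieces per interaction, from least to most 'successful'
REWARD_TIERS = {
    "look_or_approach": 1,   # looked at or moved towards the teabag
    "paw_touch": 5,          # touched the teabag with a paw
    "scratch_or_dig": 10,    # the target behaviour
}


def kibble_for(interaction):
    """Return how many pieces of kibble an interaction earns (0 if unlisted)."""
    return REWARD_TIERS.get(interaction, 0)


# A made-up run of attempts within one session
session = ["look_or_approach", "paw_touch", "sniff_elsewhere", "scratch_or_dig"]
for attempt in session:
    print(attempt, kibble_for(attempt))

print(sum(kibble_for(a) for a in session))  # 16
```

The steep jump between tiers is the 'mini-jackpot': each step closer to the scratch or dig pays noticeably more than the one before it.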

My dog was very slow at first, but mini-jackpotting seemed to very much speed up the learning process.  There are several reasons that this may be the case…

  1. I read once that dogs understand the duration of a reward more than the quantity of the reward, i.e. dogs find it more rewarding to be given 5 treats in a row, one after the other, than to be given a handful of 5 treats. So, dogs find a long reward more rewarding. (Unfortunately I don’t recall the source of this suggestion.) As it takes more time to eat numerous treats, perhaps the dog understands this as more rewarding.
  2. Another approach on the time front is that when the dog is eating numerous treats, they actually have time to think. Perhaps when I reward with many treats at once, the dog has more of an opportunity to think things through, and the improvements I see towards my target behaviour are actually from this thinking time, rather than from the reward itself.
  3. The dog might actually understand that if they do x they get more treats than if they do y!

This is the most thought I’ve ever given to ‘mini-jackpotting’, and I haven’t been very logical in its implementation. Whether this system came about by accident or from a subconscious desire to jackpot, I am unsure. However, I have found it to be quite successful and I would be interested to see if anyone has had similar success.

Further reading: Schedules of Reinforcement