11/4/12

Schedules of Reinforcement

This post is part of the series in response to Dunbar’s 2012 Australian seminars. See index.

 

When training dogs, consciously or not, we undertake a pattern of delivering rewards.  Sometimes we give a dog a reward for every correct behaviour, sometimes the dog gets a reward for every 3 times they get it right, and sometimes we mix it up and reward a sit-stay after 3 seconds, then 8 seconds, then 5 seconds, and so forth.

Believe it or not, at some point, all the different patterns of rewards (“schedules of reinforcement”) have been given names and classified. First, I’ll describe these schedules of reinforcement, and then the ‘better’ alternative (according to Dunbar but also according to me!).

 

Continuous Reinforcement

Continuous reinforcement means rewarding a dog every time a response is performed.

Example:  Every time you call your dog, you reward them with a treat.

Pros: This often builds up a very high level of response, as the dog understands they will get a treat, and so are highly motivated to accomplish the behaviour.

Cons: It’s easy to run out of treats!  Because of the nature of the schedule, you reward everything – including ‘sloppy’ or ‘slow’ responses.

 

Fixed Schedule – Ratio or Interval

A fixed schedule means that the reward is delivered on a consistent basis, though intermittently.  ‘Ratio’ refers to the number of responses between rewards, and ‘interval’ refers to the time that passes between rewards.

Example: Fixed Ratio: When your dog barks at the door, you normally ignore it the first and second time, but when they bark a third time, you let them in. Fixed Interval: This is difficult to conceptualise in dog training, as it assumes that the dog is maintaining a behaviour.  The best example is training the stay: For every 10 seconds that the dog maintains a stay, you return and give the dog a treat.

Pros: Quick way for dog to learn. Good way to maintain behaviours. Prepares a dog to work without always getting rewarded (which is particularly useful for sports like obedience, where the dog cannot get rewarded in the ring).

Cons: Not as quick for dogs to understand as with continuous reinforcement.  The dog’s behaviour may become ‘scalloped’, with more enthused and motivated behaviours nearer to the reward (e.g. if the dog catches on that they only get rewarded for every 3rd sit, they will be more enthused about doing the 3rd one than the 1st and 2nd).  When the intervals or ratios are too long, the dog may ‘strike’ (i.e. quit performing the behaviour).  Ratios are difficult to implement in practice, as it is normally hard to count and train at the same time!  Moving away from a fixed schedule often demotivates the dog (e.g. if the dog is used to being rewarded at every 3rd behaviour and this ceases, the dog will often abruptly stop performing the behaviour).
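For readers who like to see things spelled out, here is a rough sketch of the two fixed schedules in Python.  It is only an illustration of the definitions above, not anything Dunbar presented, and the function names and numbers are made up for the example.

```python
# Illustrative sketch only: function names and numbers are hypothetical.

def fixed_ratio_rewards(num_responses, ratio):
    """Return the (1-based) response numbers that earn a reward
    when every `ratio`-th correct response is rewarded."""
    return [n for n in range(1, num_responses + 1) if n % ratio == 0]


def fixed_interval_rewards(stay_seconds, interval):
    """Return the elapsed seconds at which a stay is rewarded
    when a reward is given every `interval` seconds."""
    return list(range(interval, stay_seconds + 1, interval))


# Reward every 3rd sit out of 9 sits.
print(fixed_ratio_rewards(9, 3))       # [3, 6, 9]

# Reward a 35-second stay every 10 seconds.
print(fixed_interval_rewards(35, 10))  # [10, 20, 30]
```

Note that a ratio of 1 rewards every response, which is simply continuous reinforcement.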

 

Variable schedules – ratio or interval

Variable schedules are when the animal is rewarded on a changing basis rather than a fixed one.  Ratio refers to the number of repetitions between rewards, and interval refers to the time period between rewards.

Example:  Ratio: When you reward your dog for ‘sit’, you don’t reward them every time. Instead, you reward them for their first successful behaviour, then you miss a couple, and reward the 4th one, then you reward the 6th behaviour, then you reward the 9th behaviour, and so forth.  (This is the principle that poker machines work on.)  Interval: Again, if you have a dog in a stay, you could reward the dog for 5 seconds having passed, then 10 seconds, then 8 seconds, then 15 seconds, then 7 seconds, and so forth. So you are rewarding the dog based on a variable time interval schedule.

Pros: This reward schedule works well and is good for maintaining behaviours.

Cons: Too complex for a person to apply; it is very much a laboratory/computer type of training.  If you stretch the ratio too far, the dog will ‘strike’.
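To show why this is described as a ‘computer type’ of training, here is a minimal sketch of how a program might generate a variable schedule.  The function name, the way the gaps are randomised, and the numbers are all assumptions made for illustration; they are not taken from Dunbar or from any particular laboratory set-up.

```python
import random

# Illustrative sketch only: the name and the randomisation rule are hypothetical.

def variable_schedule(total, mean_gap, seed=None):
    """Return the points (response numbers or seconds) at which to reward.

    Each gap between rewards is drawn at random from 1 to 2 * mean_gap - 1,
    so the gaps average out to `mean_gap` but are not predictable.
    """
    rng = random.Random(seed)
    reward_points = []
    position = 0
    while True:
        position += rng.randint(1, 2 * mean_gap - 1)
        if position > total:
            return reward_points
        reward_points.append(position)


# Variable ratio: 30 sits, rewarded on average every 3rd sit.
print(variable_schedule(30, 3))

# Variable interval: a 60-second stay, rewarded on average every 10 seconds.
print(variable_schedule(60, 10))
```

The same function covers both cases: count responses for a variable ratio, or count seconds for a variable interval.  A computer can do this effortlessly, which is rather the point; a person juggling a lead, treats and a dog cannot.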


10/23/12

Dog training doesn’t happen in a laboratory!

This post is part of the series in response to Dunbar’s 2012 Australian seminars. See index.

Along with Dunbar’s criticisms of the four quadrants of operant conditioning, he also criticised learning theory for being “mostly irrelevant” to pet dog training.  ‘These days’, learning theory is common knowledge for most dog trainers, but Dunbar considers it to be mostly irrelevant in the ‘real world’ of dog training.

 

Outside of the laboratory is a whole wide world of training environments and possible rewards. So why are we so caught up on learning theory?

Much of learning theory has been established through computer-controlled delivery of reinforcements and punishments.  To Dunbar, this means the findings of learning theory, as produced in a lab, are only relevant to lab settings.  In a laboratory, the subjects are normally rats or pigeons, computers control the training, and the animals are contained.  In the real world of dog training, humans are not computers (they are inconsistent), dogs are more complex than rats and pigeons, dogs escape from people (they aren’t contained), and dogs bite!

But humans have an advantage: Humans have a voice and can moderate their tone to reward and punish.  Computers cannot use verbal rewards or punishments, and so research on verbal feedback is almost entirely neglected.  Dunbar encourages verbal feedback to train recalls, and claims it is easy to do.  He believes that verbals are more expressive than clicks, jerks and shocks.  Verbals can describe how desirable a behaviour was and also suggest an appropriate alternative behaviour.

Punishment may be effective in a laboratory, but (to quote his handout) “people are inconsistent and so the dog quickly learns those times when he will not be punished, i.e., when the owner is physically-absent (dog at home alone), physically-present but functionally absent (dog off leash), or physically-present but mentally absent (owner day-dreaming or making a phone call).”  On top of this, owners normally have bad timing, and dogs learn to be separated from their owners to avoid punishments.  (See also: Dunbar’s thoughts on punishment.)  Dunbar described people as “screwed before we start” if we seek to replicate laboratory settings in real-world dog training.

10/13/12

The #@*$ing Four Quadrants (Dunbar)

This post is part of the series in response to Dunbar’s 2012 Australian seminars. See index.

 

Dunbar has a clear opinion on the four quadrants of operant conditioning: Ditch them!  Dunbar feels we have entered into a time of ‘quadrant worship’ when, in reality, the quadrant was only ever designed to be a memory aid. The quadrants have also led to a division in the dog community, with half the people worshipping positive reinforcement and negative punishment (i.e. “positive trainers”), and the other half worshipping negative reinforcement and positive punishment.

 

Here’s a little theory:  In the quadrants, positive means “you give” and negative means “negate” or take away.

Dunbar used this table to illustrate the quadrants:

          Start                    Stop
Reward    Positive Reinforcement   Negative Punishment
Punish    Positive Punishment      Negative Reinforcement

 

Dunbar thinks this is a complicated way of viewing things.  He says that the dog doesn’t assess anything other than “did the environment get better or worse?”  He believes dogs have a binary outlook on life.  They see things as good or bad.

09/3/12

Food in Dog Training (Dunbar)

This post is part of the series in response to Dunbar’s 2012 Australian seminars. See index.

Food is very useful in dog training.

My notes are a little brief in this section, but I think (!) that Dunbar described four principal roles of food in dog training:

 

Brindle crossbreed dog eyes off rawhide treat.

Photograph copyright Ravyk Photography.

1. Lure
Food can be used to lure desirable behaviours.  This is very effective for pet owners, who often do need food to make up for deficiencies in other areas (e.g. poor training, poor vocal control, etc).  Read more about lure-reward training.

2. Reward
Food can be used to reward desirable behaviours.

3. Classical conditioning
Classical conditioning is associating something good with something else.  For example, feeding dogs every time they see another dog means that the dog is more likely to associate other dogs with good things.

4. Distraction
Food can be used as a distraction in training exercises, otherwise known as ‘proofing’.

 

What if the dog doesn’t like food?

If a dog doesn’t like food, they should be trained to like food!  Feed the dog by hand instead of from a bowl, or turn food into a secondary reinforcer – “you have to eat the kibble for you to be allowed to do fun things”. Food is too useful to not have in your toolbox for behaviour modification.

 

08/29/12

Put your Problem on Cue (Dunbar)

This post is part of the series in response to Dunbar’s 2012 Australian seminars. See index.

Dunbar advocates putting the 8 big behavioural problems on cue, and then training the opposite.  The idea is that you can cue the ‘opposite’ (non-problematic) behaviour when the dog is displaying the problem behaviour.  The problem behaviour should be taught first, as he thinks dogs are more likely to display ‘the most recently taught’ behaviour.  These 8 behaviour problems, and their opposites, are:

 

Large white and brindle wire haired cross breed sleeping

“Settle down” – a useful behaviour to cue dogs to perform when they’re jazzed up or over enthused.

1. Jazz Up / Settle Down

Often dogs can be over excited, over stimulated, or generally ‘worked up’ and this can be problematic for owners.  For this reason it is useful to have a ‘settle down’ cue, but Dunbar of course suggests that you teach the opposite, too – a ‘jazz up’ cue.  You could turn this into a class game where the winner is the person who settles down their dog the fastest, or meets a 3 second deadline.  Teaching a dog to ‘jazz up’ is also easy, and often inspires and motivates class members to train.

‘Settle down’ is useful when trying to prevent problematic behaviours, such as excitement at the front door, or fence-fighting behaviour.  ‘Jazz up’ could also, potentially, be a useful reward in the obedience ring.  Diane Baumann, in her traditional training book Beyond Basic Obedience, encourages owners to have an exciting cue (like ‘jazz up’) to mean an exercise is finished.

 

2. Woof / Shush