Blue Merle Minion: Operant Conditioning

(A rather more technical piece than I usually publish on here. This is a piece of coursework I produced and I'm so pleased with it that I wanted to share it 😊)

It is a perhaps sad truth that the modern domestic dog lives in a world largely designed for the convenience of its human inhabitants. Living in close proximity with humans places behavioural requirements on dogs if they are to continue living somewhat harmoniously alongside us. As a result, most humans desire a way to control the behaviours that the dogs around them display.

‘Operant conditioning is a science that comes complete with a difficult body of knowledge to master and apply with skill,’ (Burch and Bailey, 1999. p. 46)

One way in which behaviours can be increased or decreased systematically is by providing consequences that follow the behaviour. Operant learning is a form of conditioning, defined as ‘a behaviour change process wherein behaviours become more or less likely to occur across subsequent occasions due to the consequences that the behaviours have generated,’ (O’Heare, 2017. p. 54). The concept of operant conditioning received its first full definition in the work of B.F. Skinner, specifically in his book ‘The Behavior of Organisms’ (Skinner, 1991). Skinner extended the work of Edward Thorndike, building on his Law of Effect, which arose from demonstrations that food placed outside a box encouraged a cat placed inside the box to escape more quickly (Thorndike, 1927).

Skinner added the word reinforcement to describe one of the two factors that may affect whether a particular behaviour occurs on future occasions, the other being punishment. ‘Reinforcement occurs when a behaviour, followed by a consequent stimulus, is strengthened or becomes more likely to occur again,’ (Burch and Bailey, 1999. p.27). Punishment works in the opposite direction: if the consequence of a behaviour is the application of a punisher, the behaviour is less likely to occur again in a similar situation (Burch and Bailey, 1999).

As well as the terms reinforcement and punishment, the principles of operant behaviour are described using the terms ‘positive’ and ‘negative’ (Burch and Bailey, 1999). These words are often misunderstood as meaning good or bad effects. In the context of operant conditioning, they mean adding a stimulus (positive) or removing a stimulus (negative) to affect the likelihood of the behaviour being repeated.

The selections of positive or negative, reinforcement or punishment combine to give four options available for modification of behaviour: positive reinforcement, positive punishment, negative reinforcement and negative punishment (Burch and Bailey, 1999). Often, dog training professionals may refer to these four different outcomes as the four quadrants of operant conditioning (Starling et al., 2013). To sum up these options briefly:

Positive reinforcement results in a favourable consequent stimulus occurring, increasing the likelihood of behaviour repetition. In this quadrant, the dog receives something that they like and value on performing the desired behaviour. Most often, this comes in the form of food, but may also be a toy, a game, or human attention. “The evocative stimulus evokes the behaviour because of a history of reinforcement associated with the behaviour occurring immediately following that stimulus,” (O’Heare, 2017. p. 45).
Positive punishment results in an unfavourable consequent stimulus occurring, decreasing the likelihood of behaviour repetition. This includes aversive measures such as shouting, the use of shock, prong, or choke collars, leash ‘corrections’, and physically striking a dog.
Negative reinforcement results in the removal of an unfavourable stimulus, increasing the likelihood of behaviour repetition. In this category, we can include the stopping of shock when the desired behaviour occurs, or the lessening of pain from a prong or choke collar when the dog stops pulling on the leash, for example. This utilises escape conditioning or avoidance conditioning (Burch and Bailey, 1999).
Negative punishment results in the removal of a favourable stimulus, decreasing the likelihood of behaviour repetition. Examples include the removal of a toy or food item, or the removal of attention from a favoured human.
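
For readers who find it easier to see the four quadrants as a lookup, here is a minimal Python sketch of the two-by-two naming scheme described above. The names `StimulusChange`, `BehaviourEffect` and `quadrant` are my own illustrative labels and do not come from any of the cited sources.

```python
from enum import Enum


class StimulusChange(Enum):
    """What happens to a stimulus as a consequence of the behaviour."""
    ADDED = "positive"    # a stimulus is added
    REMOVED = "negative"  # a stimulus is removed


class BehaviourEffect(Enum):
    """What happens to the likelihood of the behaviour being repeated."""
    INCREASES = "reinforcement"
    DECREASES = "punishment"


def quadrant(change: StimulusChange, effect: BehaviourEffect) -> str:
    """Name the quadrant for a given stimulus change and behavioural effect."""
    return f"{change.value} {effect.value}"


# Lead pressure stops when the dog stops pulling, and pulling less becomes more likely.
print(quadrant(StimulusChange.REMOVED, BehaviourEffect.INCREASES))
```

Note that the label depends only on whether a stimulus was added or removed and on what the behaviour then does, not on whether the outcome seems "good" or "bad" to the human.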

From an ethical standpoint, these quadrants can be organised into levels of appropriateness for use and their impacts on the welfare of the dog (Vieira de Castro et al., 2020). Of the quadrants listed above, positive reinforcement is the most ethically sound, as it does not involve any stimuli that the dog may find aversive. Another important note is that the dog is the individual who decides whether a particular stimulus is aversive or not (Burch and Bailey, 1999), and the dog decides how reinforcing any particular stimulus is to them (Vicars, Miguel and Sobie, 2014). If the presentation of that stimulus does not result in the strengthening of the behaviour, it is not reinforcing.

Finn's awaiting reinforcement delivery face 😍

Proponents of so-called ‘balanced’ training claim that effective trainers should utilise all of the quadrants. One study compared the use of an aversive training device, the electric shock collar, with the performance of positive reinforcement based trainers (China, Mills and Cooper, 2020). The study used three groups of dogs and trainers. One group was trained using electric shock collars. A control group (control group one) was trained by the electric collar trainers but without shock collars. A second control group (control group two) was trained by positive reinforcement based trainers. Control group one used more hand and lead signals (including negative reinforcement methods such as lead pressure) than the electric collared group, while control group two used the fewest. Control group two also had the shortest latency to respond to cues, and a higher proportion of obedient responses to a single cue than the other groups. Given that these results indicate that electric shock collars perform, at best, no better than positive reinforcement, the ethical concerns around aversive measures indicate that canine training professionals should use positive reinforcement.

Reinforcers used for positive reinforcement fall into two categories (O’Heare, 2017). Primary reinforcers are those linked to biological imperatives, such as food and mating opportunities. Other primary reinforcers can include the opportunity to carry out and complete specific behaviour patterns, such as modal action patterns (Burch and Bailey, 1999). Secondary reinforcers are also known as conditioned reinforcers (Burch and Bailey, 1999), as their function as reinforcers requires some conditioning to take place. Common examples of secondary reinforcers include verbal praise and marker sounds such as a particular word or a clicker. The timing of the marker is vital: if it comes too soon, the dog may be interrupted before fully showing the behaviour, and if it comes too late, the dog may already have offered several other behaviours. The marker must be ‘charged’ before it is used for training so that the dog understands what it means. To do this, the trainer clicks or says the chosen marker word and immediately gives the dog a reward. Through the process of associating the marker sound with a primary reinforcer, most typically a food reward or praise and attention from the human, the formerly neutral stimulus becomes conditioned. It then functions as an indicator that the behaviour occurring at the exact moment of the marker sound is a desired behaviour, and that something the dog values will appear imminently.

Consequences can be intrinsic or extrinsic (O’Heare, 2017). Intrinsic consequences are those generated directly by the behaviour, often within the dog’s body. Chasing prey can allow a dog to complete a modal action pattern, a natural and instinctive chain of movements that for some dogs may be an intrinsic reinforcer. If the dog captures the prey, the chase can also provide a primary reinforcer in the form of a meal. Extrinsic consequences are provided by another individual, and may be referred to as socially mediated (O’Heare, 2017). Food, a toy, or praise provided by a human are common extrinsic consequences.

Within positive reinforcement training, there are a number of techniques available to train dogs in specific behaviours.

Luring involves taking behaviours the dog offers naturally, or those already trained, and using food rewards to lure the dog into altering the behaviour (Hiby, Rooney and Bradshaw, 2004). Luring is a common method used to train puppies to sit, for example: a food reward is held in front of the puppy’s nose and lifted up and back over their head until their hind end lowers to the ground.

Capturing is marking and rewarding behaviours that the dog offers naturally when they occur (O’Heare, 2017). This is very low stress for the dog, which is an advantage, but depends on the behaviour being something natural that the dog offers.

Shaping involves taking a previously trained behaviour or one that the dog offers naturally and withholding reinforcement until the dog makes a movement that progresses towards a desired end behaviour (Burch and Bailey, 1999; O’Heare, 2017). The difference between the starting behaviour and final behaviour is broken down into a series of small steps. Once the dog has become proficient at each increased step, the reinforcement is withheld once more until the dog offers a behaviour closer to the desired result.
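
The step-by-step logic of shaping can be sketched as a small loop. This is only an illustration under my own assumptions: each offered behaviour is scored from 0 to 1 for how close it is to the end behaviour, and "proficient" is taken to mean three reinforced attempts at the current step. Neither the scoring nor the threshold comes from the cited sources, and `shape` is a hypothetical name.

```python
def shape(offered, steps, successes_needed=3):
    """Reinforce offered behaviours that meet the current criterion.

    offered: closeness scores (0 to 1) for each behaviour the dog offers
    steps: ascending criteria, one per small step towards the end behaviour
    successes_needed: assumed number of reinforced attempts that counts as proficient
    """
    step = 0
    successes = 0
    log = []
    for score in offered:
        criterion = steps[step]
        reinforced = score >= criterion  # below the criterion, reinforcement is withheld
        log.append((score, criterion, reinforced))
        if reinforced:
            successes += 1
            # Once proficient, raise the criterion to the next small step.
            if successes >= successes_needed and step < len(steps) - 1:
                step += 1
                successes = 0
    return log, steps[step]


log, final_criterion = shape(
    offered=[0.2, 0.3, 0.25, 0.1, 0.5, 0.6, 0.5],
    steps=[0.2, 0.5, 0.8],
)
for score, criterion, reinforced in log:
    print(score, criterion, "reinforced" if reinforced else "withheld")
```

In this run the fourth attempt (0.1) goes unreinforced because the criterion has just been raised, which mirrors the moment in shaping where the dog has to offer something a little closer to the end behaviour.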

Another operant conditioning process is extinction. “Extinction occurs when a behaviour that has been previously reinforced is no longer reinforced, and the result is that the behaviour no longer occurs,” (Burch and Bailey, 1999. p. 49). The process can occur deliberately, with reinforcement consistently withheld following the behaviour’s demonstration, or accidentally, when the rate of reinforcement is not high enough to maintain the strength of the behaviour. Extinction is most often thought of as withholding the pleasant consequence of positive reinforcement, but it could also be achieved through the withholding of negative reinforcement. In that case, rather than a food reward being withheld following the behaviour, the unpleasant stimulus is no longer removed as a consequence of the behaviour occurring.

When utilising extinction as a process, the behaviour may strengthen and become more frequent for a period, known as an extinction burst (Burch and Bailey, 1999). Although an extinction burst may be difficult to tolerate, it will be short lived if no reinforcement takes place. If reinforcement does occur during an extinction burst, further attempts at extinction may be less likely to succeed, as intermittently reinforced behaviours are more resistant to extinction. In some cases, after a period in which a particular behaviour has not been demonstrated, that behaviour will start being shown again in what is called spontaneous recovery (Burch and Bailey, 1999). If no reinforcement takes place, the behaviour should soon disappear once more.

Even the use of the most ethical of the operant processes raises questions about how much we should be manipulating canine behaviour. The ethics surrounding the entire concept of keeping companion animals are beginning to come under question (Bekoff, 2018), and this questioning can extend to every factor involved in caring for and living with companion animals, including dogs. Who is the training or behaviour modification designed to benefit? If a dog is demonstrating normal, natural dog behaviour that is inconvenient to the people around them, does that constitute ‘bad’ behaviour? There is an argument that there is no good or bad behaviour when it comes to dogs, only behaviour, and that changing those behaviours purely for human convenience does not necessarily keep the best interests of the dog central.

Another ethical consideration is whether what we are doing with dogs during training and behaviour modification is coercion and, if so, whether we should be using these methods at all. The dictionary definition of coerce is to ‘persuade an unwilling person to do something by force or threats’ (Waite, 2012. p.133). It is simple to see how this description may apply to positive punishment, as the dog tries to avoid the unpleasant stimulus, and to negative reinforcement, as the dog tries to escape the aversive stimulus by performing the desired behaviour. Negative punishment, although not involving physical pain for the dog, carries a threat of removal of something the dog values.

Positive reinforcement carries no connection with a threat as the desired result produces something the dog desires. Although the threat is not present, the pleasant consequence still functions as a persuader. The definition of operant conditioning is that it shapes behaviour due to consequences that follow the behaviour. To use operant conditioning for the purpose of reinforcing or reducing a behaviour we manipulate those consequences to influence the choices that the dog makes, which perhaps means we need to consider carefully the behaviours we require from the dogs around us.

Reference list:

Bekoff, M. (2018). Canine confidential: why dogs do what they do. Chicago: University of Chicago Press.

Burch, M. and Bailey, J. (1999). How dogs learn. Hoboken: John Wiley & Sons.

China, L., Mills, D.S. and Cooper, J.J. (2020). Efficacy of Dog Training With and Without Remote Electronic Collars vs. a Focus on Positive Reinforcement. Frontiers in Veterinary Science, 7.

Hiby, E.F., Rooney, N.J. and Bradshaw, J.W.S. (2004). Dog training methods: their use, effectiveness and interaction with behaviour and welfare. Animal Welfare, 13(1), pp.63–69.

O’Heare, J. (2017). Science and technology of dog training. 2nd ed. Ottawa: BehaveTech Publishing.

Skinner, B.F. (1991). The behavior of organisms: an experimental analysis. Acton, Massachusetts: Copley Publishing Group.

Starling, M., Branson, N., Cody, D. and McGreevy, P. (2013). Conceptualising the Impact of Arousal and Affective State on Training Outcomes of Operant Conditioning. Animals, 3(2), pp.300–317.

Thorndike, E.L. (1927). The Law of Effect. The American Journal of Psychology, 39(1/4), p.212.

Vicars, S.M., Miguel, C.F. and Sobie, J.L. (2014). Assessing preference and reinforcer effectiveness in dogs. Behavioural Processes, 103, pp.75–83.

Vieira de Castro, A.C., Fuchs, D., Morello, G.M., Pastur, S., de Sousa, L. and Olsson, I.A.S. (2020). Does training method matter? Evidence for the negative impact of aversive-based methods on companion dog welfare. PLOS ONE, 15(12), p.e0225023.

Waite, M. (2012). Paperback Oxford English dictionary. Oxford: Oxford University Press.

Saturday, 16 October 2021
