Operant conditioning can be found everywhere, from school to work to your own home. It is a form of learning in which those being conditioned are taught either to do something or not to do something, depending on the consequences of their actions. The learning involved is described as voluntary. This contrasts with classical conditioning, which produces involuntary learned responses: for example, feeling ill at the sight of a hospital because of a previous experience of being ill in one. Operant conditioning is also a major part of behaviourist theory, in which Skinner held that an action always leads to a consequence. For example, a criminal who steals a car is arrested and sent to prison.
Operant (or instrumental) conditioning is founded on reinforcement. There are two types of reinforcement: positive and negative. Positive reinforcement is where a particular action is rewarded, in the hope that the person will repeat it. An example would be giving a child an ice cream for completing their homework. Because the homework earned a reward, the child is conditioned to do it again when the need arises. Negative reinforcement also strengthens a behaviour, but it does so by removing an unpleasant stimulus once the action is performed. For example, a child who tidies their room to make their parents stop nagging is more likely to tidy it again. The opposite of reinforcement is punishment, in which an action is followed by an unpleasant consequence so that the behaviour will not happen again. A suitable example is a child who has written on the walls and has been shouted at by the parents as a result. Here the parents have used the shouting as a deterrent. To avoid facing it again, the child learns not to write on the walls.
The most influential study of operant conditioning was Skinner's work with rats. In the experiment, the rats were kept in cages, and the only way they could obtain food was by pushing a lever, which would release a pellet. At first it took the rats a long time to discover that pushing the lever released food. However, with each pellet they received, they took less and less time to press it again. Skinner (1938) argued that “an operant (the response played out by the rats) followed by a positive consequence was more likely to be emitted in the future”. This is clearly borne out by the observations from his experiment. It has been questioned, however, whether the rats gained any knowledge from this, or had instead simply been conditioned to repeat the behaviour.
Skinner also proposed that operant responses are emitted by animals and humans rather than elicited by an external stimulus, as they are in classical conditioning. Nevertheless, external stimuli do exert a level of control over operant behaviour through what are known as discriminative stimuli. To explain, imagine a cat that scratches the wallpaper in the presence of its owners and is scolded as a punishment. The owners' presence acts as a positive discriminative stimulus, signalling that scratching will be punished. When the cat scratches without the owners present, their absence acts as a negative discriminative stimulus, and the cat's action goes largely without consequence. The cat then learns to scratch the wallpaper only when the owners are absent, so as to avoid the punishment.
Although Skinner is considered the most influential psychologist in instrumental learning theory, the pioneer of this area of study was Edward Thorndike, who conducted experiments on the conditioning of cats in 1898. His method was the puzzle box. A pedal inside the box was connected by a rope to the door, so the cat simply had to step on the pedal to open the door and escape. The cats showed a learning curve very similar to that of Skinner's rats, and so provided evidence for a lack of understanding. The curve declined gradually throughout, rather than showing a point at which the cat suddenly realised that stepping on the pedal would let it out.
Thorndike’s law of effect states that responses followed by a satisfying consequence become more likely to recur. It explains why the cat's innate behaviour in the box, such as meowing and scratching in an attempt to escape, was slowly replaced over repeated trials by the intended conditioned response of stepping on the pedal.
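The gradual curve Thorndike observed can be illustrated with a toy model. The Python sketch below is an assumption made purely for illustration, not Thorndike's own formulation. In it, the cat chooses at random among its actions in proportion to hypothetical "strength" weights. The pedal response starts out weak, and each escape adds a fixed increment to its strength. If each attempt hits the pedal with probability p, the expected number of attempts before escaping is 1/p, and the function reports this value for every trial. The function name and all parameter values are hypothetical.

```python
def law_of_effect_curve(trials, other_strength=9.0, pedal_strength=1.0, increment=1.0):
    """Toy model (hypothetical parameters): expected attempts per trial before escape.

    other_strength: combined strength of innate actions such as meowing and scratching.
    pedal_strength: initial strength of stepping on the pedal.
    increment:      strength added to the pedal response after each escape.
    """
    curve = []
    for _ in range(trials):
        # Choices are made in proportion to strength, so P(pedal) = pedal / (pedal + other)
        # and the expected number of attempts until the pedal is pressed is (pedal + other) / pedal.
        curve.append((pedal_strength + other_strength) / pedal_strength)
        # Law of effect: the satisfying consequence (escape) strengthens the pedal response.
        pedal_strength += increment
    return curve


if __name__ == "__main__":
    for trial, attempts in enumerate(law_of_effect_curve(8), start=1):
        print(f"trial {trial}: {attempts:.2f} expected attempts")
```

Under these assumptions, the expected attempts fall smoothly (10, 5.5, 4, 3.25, ...) with no single trial on which performance suddenly jumps. This gradual decline is the pattern taken above as evidence of a lack of understanding.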
Both experiments come down to the fundamental rule of operant conditioning. Because of the reward of either food or freedom, the animals were positively reinforced to continue the behaviour, despite their apparent lack of understanding.
Punishment is another aspect of operant conditioning. In punishment, an individual or group is conditioned to stop a particular behaviour by having it followed by an unpleasant consequence. The earlier example of the child being shouted at can be criticised, because that punishment, like most punishments, brings ethics into the equation. The problem with punishment is that it must be severe enough to deter a behaviour and condition it out effectively, but not so severe that it becomes unethical. Research by Azrin and Holz (1966) and Church (1969) indicates that a severe initial punishment is more effective at conditioning than small punishments that gradually increase in strength. The latter can end up amounting to a greater punishment overall, which raises more concerning ethical implications.
Stimuli can act as reinforcers for different reasons, not only because they are obviously rewarding. “Some stimuli serve as reinforcers because of their biological significance. These so-called primary reinforcers include food, water, escape from the scent of a predator, and so on – all stimuli with obvious biological importance” (Gleitman et al., 2007).
The final important point concerns schedules of reinforcement. There are four main schedules, and each delivers reinforcement in a different way. The first is fixed interval. Here the reinforcer becomes available after a set amount of time, and the first correct response after that time has passed produces it. The term fixed refers to the fact that the time delay is always constant. Its counterpart, variable interval, works in the same way, except that the time between each available reinforcer is random. This tends to produce a steadier rate of responding than a fixed-interval schedule. Third, in a fixed-ratio schedule the time between reinforcers is irrelevant. Instead, the reinforcer is delivered once a certain number of responses has been made. As the term suggests, the number of responses required remains constant. Finally, variable ratio works like fixed ratio, except that the number of responses required varies. It is generally regarded as the most effective schedule, producing high, steady response rates that are resistant to extinction. A slot machine is an example of this schedule. Pushing the button (the response) might need to be repeated once or a hundred times to win money, which acts as the reinforcer.
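Each of the four schedules is a rule for when a response earns a reinforcer, so each can be written as a few lines of code. The Python sketch below is purely illustrative. The class names, the `respond` method, the `count_rewards` helper, the uniform distributions used for the two variable schedules, and the hypothetical subject who responds at a steady rate are all assumptions made for this example, not part of any standard model.

```python
import random


class FixedRatio:
    """FR-n: reinforce every n-th response."""

    def __init__(self, n):
        self.n = n
        self.count = 0

    def respond(self, t):
        self.count += 1
        if self.count == self.n:
            self.count = 0
            return True
        return False


class VariableRatio:
    """VR-n: reinforce after a random number of responses that averages n."""

    def __init__(self, mean, rng):
        self.mean, self.rng = mean, rng
        self.count = 0
        self.required = self._draw()

    def _draw(self):
        # Assumption: requirement drawn uniformly from 1 .. 2*mean - 1 (average = mean).
        return self.rng.randint(1, 2 * self.mean - 1)

    def respond(self, t):
        self.count += 1
        if self.count == self.required:
            self.count = 0
            self.required = self._draw()
            return True
        return False


class FixedInterval:
    """FI-x: the first response at least x time units after the last reinforcer is reinforced."""

    def __init__(self, interval):
        self.interval = interval
        self.available_at = interval

    def respond(self, t):
        if t >= self.available_at:
            self.available_at = t + self.interval
            return True
        return False


class VariableInterval:
    """VI-x: like FI, but each waiting time is random with mean x."""

    def __init__(self, mean, rng):
        self.mean, self.rng = mean, rng
        self.available_at = self._draw()

    def _draw(self):
        # Assumption: waiting time drawn uniformly from 0 .. 2*mean (average = mean).
        return self.rng.uniform(0, 2 * self.mean)

    def respond(self, t):
        if t >= self.available_at:
            self.available_at = t + self._draw()
            return True
        return False


def count_rewards(schedule, n_responses, gap=1.0):
    """A hypothetical subject responds every `gap` time units; count reinforcers earned."""
    return sum(schedule.respond(k * gap) for k in range(1, n_responses + 1))


if __name__ == "__main__":
    rng = random.Random(1)
    print("FR-5 :", count_rewards(FixedRatio(5), 100))      # 20
    print("FI-10:", count_rewards(FixedInterval(10), 100))  # 10
    print("VR-5 :", count_rewards(VariableRatio(5, rng), 100))
    print("VI-10:", count_rewards(VariableInterval(10, rng), 100))
    # Responding twice as fast over the same 100 time units:
    print("FR-5 , twice as fast:", count_rewards(FixedRatio(5), 200, gap=0.5))      # 40
    print("FI-10, twice as fast:", count_rewards(FixedInterval(10), 200, gap=0.5))  # 10
```

Under these assumptions, doubling the response rate doubles the payoff on the fixed-ratio schedule but leaves the fixed-interval payoff unchanged. This is one way to see why ratio schedules, and variable ratio in particular, encourage rapid responding.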
To summarise, operant conditioning at its most basic level can be separated into four categories:
1. Positive Reinforcement
2. Negative Reinforcement
3. Positive Punishment
4. Negative Punishment
Positive reinforcement adds a pleasant stimulus after a response, while negative reinforcement removes an unpleasant one. In both cases the response is repeated, either to obtain the pleasant stimulus or to end the unpleasant one. Positive punishment is the introduction of an unpleasant stimulus after a specific response, so as to decrease that response. Finally, negative punishment is the removal of a favoured stimulus after a specific response, again to decrease that behaviour. In short, reinforcement always increases a response, and punishment always decreases it.
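The four categories form a two-by-two grid. One dimension is whether a stimulus is added (positive) or removed (negative) after the response. The other is whether that stimulus is pleasant or unpleasant. The small Python sketch below encodes this grid. The `Change` enum and the `classify` function are hypothetical names chosen for illustration.

```python
from enum import Enum


class Change(Enum):
    """Whether a stimulus is added or removed after the response."""
    ADDED = "positive"
    REMOVED = "negative"


def classify(change: Change, stimulus_is_pleasant: bool) -> str:
    """Name the operant procedure and the direction it pushes the response."""
    # Adding something pleasant, or removing something unpleasant, is reinforcement;
    # the other two combinations are punishment.
    reinforcing = (change is Change.ADDED) == stimulus_is_pleasant
    kind = "reinforcement" if reinforcing else "punishment"
    effect = "increases" if reinforcing else "decreases"
    return f"{change.value} {kind}: {effect} the response"


if __name__ == "__main__":
    print(classify(Change.ADDED, True))     # positive reinforcement: increases the response
    print(classify(Change.REMOVED, False))  # negative reinforcement: increases the response
    print(classify(Change.ADDED, False))    # positive punishment: decreases the response
    print(classify(Change.REMOVED, True))   # negative punishment: decreases the response
```

For instance, an ice cream given for finished homework corresponds to `classify(Change.ADDED, True)`. Shouting at a child for writing on the walls corresponds to `classify(Change.ADDED, False)`.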