Operant conditioning is a form of psychological learning during which an individual modifies the occurrence and form of its own behavior due to the association of the behavior with a stimulus. Operant conditioning is distinguished from classical conditioning (also called respondent conditioning) in that operant conditioning deals with the modification of "voluntary behavior" or operant behavior. Operant behavior "operates" on the environment and is maintained by its consequences, while classical conditioning deals with the conditioning of reflexive (reflex) behaviors which are elicited by antecedent conditions. Behaviors conditioned via a classical conditioning procedure are not maintained by consequences.[1]
Reinforcement and punishment, the core tools of operant conditioning, are either positive (delivered following a response), or negative (withdrawn following a response). This creates a total of four basic consequences, with the addition of a fifth procedure known as extinction (i.e. no change in consequences following a response).
It is important to note that actors are not spoken of as being reinforced, punished, or extinguished; it is the actions that are reinforced, punished, or extinguished. Additionally, reinforcement, punishment, and extinction are not terms whose use is restricted to the laboratory. Naturally occurring consequences can also be said to reinforce, punish, or extinguish behavior and are not always delivered by people.
Here the terms positive and negative are not used in their popular sense, but rather: positive refers to addition, and negative refers to subtraction.
What is added or subtracted is a stimulus, which may be either reinforcing or aversive. Hence positive punishment is sometimes a confusing term, as it denotes the "addition" of a stimulus, or an increase in the intensity of a stimulus, that is aversive (such as spanking or an electric shock). The four procedures are positive reinforcement, negative reinforcement, positive punishment, and negative punishment.
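The two-by-two naming scheme described above can be sketched as a small lookup. This is only an illustration: the names StimulusChange, BehaviorEffect, and name_procedure are hypothetical, and the sketch assumes the standard usage in which reinforcement denotes an increase in the behavior and punishment a decrease.

```python
from enum import Enum


class StimulusChange(Enum):
    ADDED = "positive"    # a stimulus is delivered following the response
    REMOVED = "negative"  # a stimulus is withdrawn following the response
    NONE = "none"         # consequences do not change


class BehaviorEffect(Enum):
    INCREASES = "reinforcement"  # assumption: the response becomes more frequent
    DECREASES = "punishment"     # assumption: the response becomes less frequent


def name_procedure(change, effect=None):
    """Name the procedure for a stimulus change and its effect on behavior."""
    if change is StimulusChange.NONE:
        # No change in consequences following a response.
        return "extinction"
    if effect is None:
        raise ValueError("an effect is required when a stimulus is added or removed")
    return f"{change.value} {effect.value}"


print(name_procedure(StimulusChange.ADDED, BehaviorEffect.DECREASES))
print(name_procedure(StimulusChange.REMOVED, BehaviorEffect.INCREASES))
print(name_procedure(StimulusChange.NONE))
```

The first call corresponds to the spanking or electric shock example: an aversive stimulus is added and the behavior decreases, which is why the procedure is called positive punishment.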
Also:
Operant conditioning, sometimes called instrumental conditioning or instrumental learning, was first extensively studied by Edward L. Thorndike (1874–1949), who observed the behavior of cats trying to escape from home-made puzzle boxes.[6] When first constrained in the boxes, the cats took a long time to escape. With experience, ineffective responses occurred less frequently and successful responses occurred more frequently, enabling the cats to escape in less time over successive trials. In his law of effect, Thorndike theorized that successful responses, those producing satisfying consequences, were "stamped in" by the experience and thus occurred more frequently. Unsuccessful responses, those producing annoying consequences, were stamped out and subsequently occurred less frequently. In short, some consequences strengthened behavior and some consequences weakened behavior. Thorndike produced the first known learning curves through this procedure.
B.F. Skinner (1904–1990) formulated a more detailed analysis of operant conditioning based on reinforcement, punishment, and extinction. Following the ideas of Ernst Mach, Skinner rejected Thorndike's mediating structures required by "satisfaction" and constructed a new conceptualization of behavior without any such references. While experimenting with some homemade feeding mechanisms, Skinner invented the operant conditioning chamber, which allowed him to measure rate of response as a key dependent variable using a cumulative record of lever presses or key pecks.[7]
The first scientific studies identifying neurons that responded in ways suggesting they encode conditioned stimuli came from work by Mahlon DeLong[8][9] and by R.T. "Rusty" Richardson.[9] They showed that nucleus basalis neurons, which release acetylcholine broadly throughout the cerebral cortex, are activated shortly after a conditioned stimulus, or after a primary reward if no conditioned stimulus exists. These neurons are equally active for positive and negative reinforcers, and have been demonstrated to cause plasticity in many cortical regions.[10] Evidence also exists that dopamine pathways are activated at similar times, and there is considerable evidence that dopamine participates in both reinforcement and aversive learning.[11] Dopamine pathways project much more densely onto frontal cortex regions. Cholinergic projections, in contrast, are dense even in posterior cortical regions such as the primary visual cortex. A study of patients with Parkinson's disease, a condition attributed to the insufficient action of dopamine, further illustrates the role of dopamine in positive reinforcement.[12] It showed that while off their medication, patients learned more readily with aversive consequences than with positive reinforcement. Patients who were on their medication showed the opposite pattern: positive reinforcement proved to be the more effective form of learning when the action of dopamine was high.
When using consequences to modify a response, the effectiveness of a consequence can be increased or decreased by various factors. These factors can apply to either reinforcing or punishing consequences.
Most of these factors exist for biological reasons. The biological purpose of the Principle of Satiation is to maintain the organism's homeostasis. When an organism has been deprived of sugar, for example, the effectiveness of the taste of sugar as a reinforcer is high. However, as the organism reaches or exceeds its optimum blood-sugar level, the taste of sugar becomes less effective, perhaps even aversive.
The Principles of Immediacy and Contingency exist for neurochemical reasons. When an organism experiences a reinforcing stimulus, dopamine pathways in the brain are activated. This network of pathways "releases a short pulse of dopamine onto many dendrites, thus broadcasting a rather global reinforcement signal to postsynaptic neurons."[13] This results in the plasticity of these synapses allowing recently activated synapses to increase their sensitivity to efferent signals, hence increasing the probability of occurrence for the recent responses preceding the reinforcement. These responses are, statistically, the most likely to have been the behavior responsible for successfully achieving reinforcement. But when the application of reinforcement is either less immediate or less contingent (less consistent), the ability of dopamine to act upon the appropriate synapses is reduced.
Operant variability is what allows a response to adapt to new situations. Operant behavior is distinguished from reflexes in that its response topography (the form of the response) is subject to slight variations from one performance to another. These slight variations can include small differences in the specific motions involved, differences in the amount of force applied, and small changes in the timing of the response. If a subject's history of reinforcement is consistent, such variations will remain stable because the same successful variations are more likely to be reinforced than less successful variations. However, behavioral variability can also be altered when subjected to certain controlling variables.[14]
Avoidance learning belongs to negative reinforcement schedules. The subject learns that a certain response will result in the termination or prevention of an aversive stimulus. There are two kinds of commonly used experimental settings: discriminated and free-operant avoidance learning.
In discriminated avoidance learning, a novel stimulus such as a light or a tone is followed by an aversive stimulus such as a shock (CS-US, similar to classical conditioning). During the first trials (called "escape trials") the animal usually experiences both the CS (conditioned stimulus) and the US (unconditioned stimulus), and performs the operant response to terminate the aversive US. During later trials, the animal learns to perform the response during the presentation of the CS, thus preventing the aversive US from occurring. Such trials are called "avoidance trials."
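The distinction between escape and avoidance trials can be sketched as a comparison of response latency with the CS-US interval. The function name classify_trial, the "no response" label, and the latencies below are illustrative assumptions, not data or terminology from any study.

```python
def classify_trial(response_latency, cs_us_interval):
    """Classify one discriminated-avoidance trial.

    response_latency: seconds from CS onset to the operant response,
        or None if the animal never responded (hypothetical convention).
    cs_us_interval: seconds from CS onset to US onset.
    """
    if response_latency is None:
        return "no response"
    if response_latency < cs_us_interval:
        # Response made during the CS, so the US is prevented.
        return "avoidance"
    # Response made after US onset, so it terminates the US.
    return "escape"


# Made-up latencies that shorten over successive trials.
latencies = [12.0, 9.5, 6.0, 4.0, 3.2]
labels = [classify_trial(t, cs_us_interval=5.0) for t in latencies]
print(labels)
```

With these invented numbers, the early trials are classified as escape trials and the later ones as avoidance trials, matching the progression described above.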
In free-operant avoidance learning, no discrete stimulus is used to signal the occurrence of the aversive stimulus. Rather, the aversive stimuli (mostly shocks) are presented without explicit warning stimuli. Two crucial time intervals determine the rate of avoidance learning. The first is the S-S interval (shock-shock interval): the amount of time that passes between successive presentations of the shock when no operant response is performed. The second is the R-S interval (response-shock interval), which specifies the length of time following an operant response during which no shocks will be delivered. Note that each time the organism performs the operant response, the R-S interval without shocks begins anew.
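The interplay of the S-S and R-S intervals can be sketched as a small scheduling routine. This is a minimal sketch under stated assumptions: the function name shock_times is hypothetical, the first shock is assumed to be due one S-S interval after the session starts, and a response that coincides exactly with a scheduled shock is assumed not to prevent it.

```python
def shock_times(response_times, ss_interval, rs_interval, session_length):
    """Return the times at which shocks are delivered in one session.

    response_times: times (seconds) of the operant responses.
    ss_interval: shock-shock interval, used when no response intervenes.
    rs_interval: response-shock interval, restarted by every response.
    session_length: total session duration in seconds.
    """
    if ss_interval <= 0 or rs_interval <= 0:
        raise ValueError("intervals must be positive")

    responses = sorted(response_times)
    shocks = []
    next_shock = ss_interval  # assumption: first shock is due after one S-S interval
    i = 0
    while True:
        has_response = i < len(responses) and responses[i] <= session_length
        if has_response and responses[i] < next_shock:
            # A response postpones the next shock by the R-S interval.
            next_shock = responses[i] + rs_interval
            i += 1
        elif next_shock <= session_length:
            # No response arrived in time: deliver the shock and
            # schedule the following one an S-S interval later.
            shocks.append(next_shock)
            next_shock += ss_interval
        else:
            break
    return shocks


# One response at 3 s postpones the first shock to 23 s; shocks then
# recur every 5 s until the 40 s session ends.
print(shock_times([3], ss_interval=5, rs_interval=20, session_length=40))
# → [23, 28, 33, 38]
```

In this sketch, a subject that responds at least once per R-S interval receives no shocks at all, which is the contingency that maintains the avoidance response.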
The two-process theory of avoidance was originally established to explain learning in discriminated avoidance procedures. It assumes that two processes take place:
In 1957, Skinner published Verbal Behavior, a theoretical extension of the work he had pioneered since 1938. This work extended the theory of operant conditioning to human behavior previously assigned to language, linguistics, and other fields. Verbal Behavior is the logical extension of Skinner's ideas, in which he introduced new functional relationship categories such as intraverbals, autoclitics, mands, tacts and the controlling relationship of the audience. All of these relationships were based on operant conditioning and relied on no new mechanisms despite the introduction of new functional categories.
Applied behavior analysis, the discipline directly descended from Skinner's work, holds that behavior is explained in four terms: a conditional stimulus (SC), a discriminative stimulus (Sd), a response (R), and a reinforcing stimulus (Srein or Sr for reinforcers, sometimes Save for aversive stimuli).[15]
Operant hoarding refers to the choice made by a rat, on a compound schedule called a multiple schedule, that maximizes its rate of reinforcement in an operant conditioning context. More specifically, rats were shown to allow food pellets to accumulate in a food tray by continuing to press a lever on a continuous reinforcement schedule instead of retrieving those pellets. Retrieval of the pellets always instituted a one-minute period of extinction during which no additional food pellets were available but those that had been accumulated earlier could be consumed. This finding appears to contradict the usual finding that rats behave impulsively in situations in which there is a choice between a smaller food object right away and a larger food object after some delay. See schedules of reinforcement.[16]
However, an alternative perspective has been proposed by R. Allen and Beatrix Gardner.[17][18] Under this idea, which they called "feedforward," animals learn during operant conditioning by simple pairing of stimuli, rather than by the consequences of their actions. Skinner asserted that a rat or pigeon would only manipulate a lever if rewarded for the action, a process he called "shaping" (reward for approaching then manipulating a lever).[19] However, in order to prove the necessity of reward (reinforcement) in lever pressing, a control condition in which food is delivered without regard to behavior must also be conducted. Skinner never published results from such a control group. Only much later was it found that rats and pigeons do indeed learn to manipulate a lever when food comes irrespective of behavior. This phenomenon is known as autoshaping.[20] Autoshaping demonstrates that consequence of action is not necessary in an operant conditioning chamber, and it contradicts the law of effect. Further experimentation has shown that rats naturally handle small objects, such as a lever, when food is present.[21] Rats seem to insist on handling the lever when free food is available (contra-freeloading)[22][23] and even when pressing the lever leads to less food (omission training).[24][25] Whenever food is presented, rats handle the lever, regardless of whether lever pressing leads to more food. On this account, handling a lever is a natural behavior that rats perform as a preparatory feeding activity, and in turn, lever pressing cannot logically be used as evidence that reward or reinforcement has occurred. In the absence of evidence for reinforcement during operant conditioning, the learning that occurs during operant experiments would be only Pavlovian (classical) conditioning, and the dichotomy between Pavlovian and operant conditioning would therefore be an inappropriate separation.