Reinforcement
From Wikipedia, the free encyclopedia
- For the reinforcement of construction materials, see Rebar.
In operant conditioning, reinforcement is any change in an organism's surroundings that:
- occurs regularly when the organism behaves in a given way (that is, is contingent on a specific response),
- is contiguous with the behavior (associated in time and space), and
- is associated with an increase in the probability that the response will be made or in another measure of its strength.
For example, suppose a dog is given food every time it sits on command. If the dog becomes more likely to sit when commanded, sitting is considered to have been reinforced by the food delivered contingent on it.
Note that it is the behavior that is reinforced, not the dog. The food serves as a reinforcer by strengthening that behavior: because of it, sitting subsequently occurs more often or more quickly in similar situations. Reinforcement can therefore be confirmed only retrospectively. An object, food item, or other stimulus can be called a reinforcer only after it has been shown to increase the behavior it follows.
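Because a reinforcer is identified only by its effect on behavior, the retrospective check can be framed as a comparison of response rates. The sketch below assumes a simple reversal (ABA) arrangement, which is not described in the text above: a baseline phase without food, a phase with food delivered contingent on sitting, and a phase with food withdrawn again. The function name `is_reinforcer` and the session counts are hypothetical, and a plain comparison of means ignores the variability that a real experimental analysis would have to address.

```python
from statistics import mean


def is_reinforcer(baseline: list[float],
                  contingent: list[float],
                  withdrawn: list[float]) -> bool:
    """Hypothetical ABA check: the consequence counts as a reinforcer only if
    responding rises while it is delivered contingent on the response and
    falls again once it is withdrawn."""
    during = mean(contingent)
    return during > mean(baseline) and during > mean(withdrawn)


# Illustrative (invented) counts of sits per 10 commands, one value per session.
baseline = [2, 3, 2]        # no food
with_food = [6, 8, 9]       # food after each sit on command
food_withdrawn = [4, 3, 3]  # food stopped again
print(is_reinforcer(baseline, with_food, food_withdrawn))  # True
```

Requiring the rate to fall again after withdrawal guards, in a crude way, against crediting the food with an increase that would have happened anyway.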
The study of reinforcement has produced an enormous body of reproducible experimental results. Reinforcement is the central concept and procedure in the experimental analysis of behavior.
Types of reinforcement
There are two types of reinforcement and two types of punishment. Each is classified by whether a stimulus is added to or removed from the surroundings after the behavior.
- Positive reinforcement changes the surroundings by adding a stimulus that increases the likelihood of the behavior occurring in the future. Stimuli that commonly act as positive reinforcers include food, recreational drugs, direct stimulation of pleasure centers in the brain, and conditioned reinforcers such as money.
- Negative reinforcement changes the surroundings by removing an aversive stimulus after a behavior, which increases the likelihood of that behavior occurring in the future. Examples include turning off a painful electric current, or removing a conditioned aversive stimulus, as when changing the channel during commercials. There are two types of negative reinforcement. Escape conditioning occurs when the aversive stimulus has already begun and the behavior terminates it; examples include scratching an itch or hitting the snooze button on an alarm clock. Avoidance conditioning occurs when the behavior prevents an aversive stimulus before it starts; examples include eating to avoid hunger and taking an alternative route to avoid a traffic jam.
- Positive punishment changes the surroundings by adding an aversive stimulus following a behavior, which decreases the likelihood of that behavior occurring in the future. For example, a dog is given an electric shock whenever it barks at a novel object or a stranger.
- Negative punishment changes the surroundings by removing a stimulus that currently functions as a reinforcer following a behavior, which decreases the likelihood of that behavior occurring in the future. For example, candy is taken away from a child whenever the child behaves inappropriately.
| stimulus | decreases likelihood of behavior | increases likelihood of behavior |
|---|---|---|
| presented | positive punishment | positive reinforcement |
| taken away | negative punishment | negative reinforcement |
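The table amounts to a two-way lookup: the polarity depends only on whether a stimulus is presented or taken away, and the process depends only on whether the behavior becomes more or less likely. A minimal sketch, in which the function name `classify_consequence` is illustrative rather than standard terminology:

```python
def classify_consequence(stimulus_presented: bool, behavior_increases: bool) -> str:
    """Return the cell of the 2x2 table that a consequence falls into."""
    # "Positive"/"negative" describes what happens to the stimulus...
    polarity = "positive" if stimulus_presented else "negative"
    # ...while "reinforcement"/"punishment" describes the effect on behavior.
    process = "reinforcement" if behavior_increases else "punishment"
    return f"{polarity} {process}"


# Candy taken away after misbehavior, which then becomes less frequent:
print(classify_consequence(stimulus_presented=False, behavior_increases=False))
# negative punishment
```

Keeping the two questions separate makes it explicit that "negative" does not mean "bad": negative reinforcement still increases behavior.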
Distinguishing "positive" from "negative" in these cases is largely a matter of emphasis. For example, in a very warm room, a current of external air serving as reinforcement could be described as positive because it adds cooler air, or as negative because it removes uncomfortably hot air. Some reinforcement is both positive and negative at once: a drug addict may take drugs both for the added euphoria and to get rid of withdrawal symptoms, and eating both adds pleasurable flavors and removes feelings of hunger. For this reason, some behavioral psychologists simply refer to reinforcement or punishment, without polarity, to cover all consequent environmental changes.
Other reinforcement terms
- An unconditioned reinforcer, sometimes called a primary reinforcer, is a stimulus or situation considered to be inherently reinforcing, generally for biological reasons (such as affection, food, or opportunity for sleep).
- A conditioned reinforcer, sometimes called a secondary reinforcer, is a stimulus or situation that has acquired reinforcing power after being paired many times with an unconditioned reinforcer or with an earlier conditioned reinforcer. Examples include money and praise. The analogous process in classical conditioning is called second-order conditioning.
- A generalized reinforcer is a conditioned reinforcer that has been paired with many other reinforcers. Money is a common example of a generalized conditioned reinforcer.
- Differential reinforcement of incompatible behavior (DRI) reduces an already frequent behavior without punishing it, by reinforcing a specific response that cannot occur at the same time (for example, leaving a room, which makes fighting with someone in it impossible).
- In differential reinforcement of other behavior (DRO), any behavior other than some undesired behavior is reinforced.
- Differential reinforcement of low response rate (DRL): a behavior is reinforced only if it occurs infrequently, that is, only if enough time has passed since the previous response. "If you ask me for a potato chip no more than once every 10 minutes, I will give it to you. If you ask more often, I will give you none."
- Differential reinforcement of alternative behavior (DRA): the reinforcers that maintained an undesirable behavior are delivered instead for a more desirable one. For example, a teacher may pay more attention to students who sit quietly than to those who talk in class (this assumes that attention from the teacher is reinforcing).
- In reinforcer sampling, a potentially reinforcing but unfamiliar stimulus is presented to an animal without regard to any prior behavior, so that the stimulus can later be used more effectively in reinforcement.
- Social reinforcement involves various sorts of access to and interaction with others.
- Satiation occurs when a stimulus that had reinforced some behavior no longer does so. Food is an example: its flavor is less reinforcing if one is already full.
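The DRL rule in the potato-chip example is a timing contingency: what matters is the time since the previous request (the inter-response time), and every request restarts the clock, whether or not it is granted. A minimal sketch, assuming the clock starts at the beginning of the session; the class `DRLSchedule` and its `respond` method are hypothetical names, not a standard API:

```python
class DRLSchedule:
    """Differential reinforcement of low rate: a response is reinforced only if
    at least `min_irt` time units have passed since the previous response
    (or since the session began)."""

    def __init__(self, min_irt: float, start_time: float = 0.0) -> None:
        self.min_irt = min_irt
        self.last_response = start_time

    def respond(self, t: float) -> bool:
        reinforced = (t - self.last_response) >= self.min_irt
        self.last_response = t  # every response resets the timer, even unreinforced ones
        return reinforced


drl = DRLSchedule(min_irt=10)  # minutes, as in the potato-chip example
print([drl.respond(t) for t in (12, 15, 30, 38)])  # [True, False, True, False]
```

Resetting the timer on unreinforced responses is what makes asking too often costly: the request at minute 15 not only fails but also postpones the next possible chip.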
Schedules of reinforcement
Main article: Schedule of reinforcement
When enough of the variations in an animal's surroundings are reduced or "controlled," its behavior patterns after reinforcement are remarkably predictable. When rates of reinforcement are adjusted in particular ways, even very complex behavior patterns can be predicted. A schedule of reinforcement is the protocol for determining which responses (i.e., which individual occurrences of a given behavior) will be reinforced. The two extremes are continuous reinforcement, in which every response results in reinforcement, and extinction, in which no response is reinforced.
Other schedules include:
- Fixed ratio (FR), in which every nth response is reinforced.
- Fixed interval (FI), in which a specified length of time must pass, measured from the beginning of training or from the last reinforcement, before a response can be reinforced; the first response after that time is reinforced.
- Variable ratio (VR), in which the number of responses required between reinforcements varies, but on average equals a predetermined number.
- Variable interval (VI), in which the required length of time varies from one reinforcement to the next around a predetermined average; again, the first response after that time is reinforced.
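Each schedule is a rule that decides, for a response made at time `t`, whether it is reinforced. The sketch below is a minimal illustration under stated assumptions: the class names and the `respond` method are hypothetical, and the variable schedules draw their requirements from uniform distributions chosen only so that the mean equals the nominal value (laboratory VI schedules often use other distributions). FR 1 corresponds to continuous reinforcement.

```python
import random


class FixedRatio:
    """FR n: every nth response is reinforced (FR 1 = continuous reinforcement)."""

    def __init__(self, n: int) -> None:
        self.n = n
        self.count = 0

    def respond(self, t: float) -> bool:
        # Time is ignored; only the number of responses matters.
        self.count += 1
        if self.count >= self.n:
            self.count = 0
            return True
        return False


class VariableRatio:
    """VR n: the response requirement is redrawn after each reinforcer.
    Assumption: requirements are uniform on 1..2n-1, so their mean is n."""

    def __init__(self, n: int, rng: random.Random) -> None:
        self.n, self.rng, self.count = n, rng, 0
        self.required = self._draw()

    def _draw(self) -> int:
        return self.rng.randint(1, 2 * self.n - 1)

    def respond(self, t: float) -> bool:
        self.count += 1
        if self.count >= self.required:
            self.count = 0
            self.required = self._draw()
            return True
        return False


class FixedInterval:
    """FI: the first response after `interval` has elapsed since the session
    start or the last reinforcer is reinforced; earlier responses are not."""

    def __init__(self, interval: float, start: float = 0.0) -> None:
        self.interval = interval
        self.available_at = start + interval

    def respond(self, t: float) -> bool:
        if t >= self.available_at:
            self.available_at = t + self.interval  # timing restarts at reinforcement
            return True
        return False


class VariableInterval:
    """VI: like FI, but each interval is redrawn around a mean.
    Assumption: intervals are uniform on [0, 2 * mean_interval]."""

    def __init__(self, mean_interval: float, rng: random.Random, start: float = 0.0) -> None:
        self.mean_interval, self.rng = mean_interval, rng
        self.available_at = start + self._draw()

    def _draw(self) -> float:
        return self.rng.uniform(0, 2 * self.mean_interval)

    def respond(self, t: float) -> bool:
        if t >= self.available_at:
            self.available_at = t + self._draw()
            return True
        return False


fr = FixedRatio(5)
print(sum(fr.respond(t) for t in range(50)))  # 10 reinforcers for 50 responses

fi = FixedInterval(10)
print([t for t in range(0, 40, 3) if fi.respond(t)])  # [12, 24, 36]
```

The FI run shows why early responses are wasted on interval schedules: the responses at times 3, 6 and 9 earn nothing, and only the first response after time 10 is reinforced.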
Ratio schedules produce higher rates of responding than interval schedules, and variable schedules produce higher rates than fixed schedules. The variable ratio schedule produces both the highest rate of responding and the greatest resistance to extinction (that is, resistance to "petering out" when reinforcement stops); gambling is a commonly cited example. On a fixed ratio schedule, responding pauses briefly after each reinforcer is delivered; this is called the post-reinforcement pause. Fixed interval schedules also produce post-reinforcement pauses, followed by responding that accelerates as the interval ends, giving the cumulative record a characteristic "scalloped" shape. Because responses made before the interval has elapsed are never reinforced, the subject learns to respond slowly at first and faster as the time of reinforcement approaches. If an organism on a fixed ratio schedule faces a sudden increase in the number of responses needed to obtain a reinforcer (say, from FR50 to FR250), it is observed to pause periodically before the reinforcer is delivered. This phenomenon, called ratio strain, contrasts with the usual FR pattern of a post-reinforcement pause followed by a steady "ratio run" that ends in reinforcement.
Concerning extinction, partial (intermittent) reinforcement schedules are more resistant than continuous reinforcement schedules, a phenomenon called the partial reinforcement extinction effect (PREE). Ratio schedules tend to be more resistant than interval schedules, and variable schedules more resistant than fixed ones.
Shaping
Shaping involves reinforcing successive, increasingly accurate approximations of a response desired by a trainer. In training a rat to press a lever, for example, simply turning toward the lever will be reinforced at first. Then, only turning and stepping toward it will be reinforced. As training progresses, the response reinforced becomes progressively more like the desired behavior.
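Shaping can be read as a moving criterion: any attempt that meets the current criterion is reinforced, and the criterion is then tightened so that only closer approximations earn reinforcement. A minimal sketch, assuming approximations can be scored by a single number (here a hypothetical distance between the rat's paw and the lever); the function `shape` and its parameters are illustrative. Real trainers also relax the criterion if responding stops, which this sketch omits.

```python
def shape(distances_cm: list[float], start: float, step: float, goal: float) -> list[bool]:
    """Reinforce any attempt at least as close to the target as the current
    criterion, then tighten the criterion so that only closer approximations
    are reinforced from then on (never tighter than `goal`)."""
    criterion = start
    decisions = []
    for d in distances_cm:
        reinforced = d <= criterion
        decisions.append(reinforced)
        if reinforced:
            criterion = max(goal, criterion - step)
    return decisions


# Hypothetical distances (cm) between the paw and the lever on successive attempts.
print(shape([40, 35, 32, 20, 25, 10, 0], start=40, step=10, goal=0))
# [True, False, False, True, False, True, True]
```

Attempts at 35 and 32 cm would have been reinforced at the start but no longer qualify once the criterion has moved to 30 cm, mirroring how simply turning toward the lever stops being enough.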
Chaining
Chaining involves linking discrete behaviors together in a series, such that the result of each behavior serves both as the reinforcement (or consequence) for that behavior and as the stimulus (or antecedent) for the next behavior. There are several ways to teach a chain: forward chaining (starting from the first behavior in the chain), backward chaining (starting from the last behavior), and total task chaining (in which the entire sequence is taught from beginning to end on every trial, rather than one step at a time).
An example is opening a locked door: first the key is inserted, then it is turned, then the door is opened. Forward chaining would first teach the subject to insert the key. Once that step is mastered, the subject is told to insert the key and is taught to turn it. Once that step is mastered, the subject is told to perform the first two steps and is then taught to open the door. Backward chaining would begin with the teacher inserting and turning the key while the subject is taught to open the door. Once that is learned, the teacher inserts the key, and the subject is taught to turn it and then opens the door as the next step. Finally, the subject is taught to insert the key, and then turns it and opens the door. Once this first step is mastered, the entire task has been taught. Total task chaining would involve teaching the entire task as a single series, prompting the subject through all steps; the prompts are faded (reduced) at each step as it is mastered.
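The difference between forward and backward chaining is only the order in which steps are added to the chain. The sketch below lists the training stages for the locked-door example; the function names and tuple layouts are hypothetical, and total task chaining (all steps on every trial, with fading prompts) is not modeled.

```python
STEPS = ["insert key", "turn key", "open door"]


def forward_chaining(steps: list[str]) -> list[tuple[list[str], str]]:
    """Stages taught first-step-first: (mastered steps the subject performs, new step)."""
    return [(steps[:i], steps[i]) for i in range(len(steps))]


def backward_chaining(steps: list[str]) -> list[tuple[list[str], str, list[str]]]:
    """Stages taught last-step-first:
    (steps the teacher performs, new step, mastered steps the subject then completes)."""
    return [(steps[:i], steps[i], steps[i + 1:]) for i in reversed(range(len(steps)))]


for done, new in forward_chaining(STEPS):
    print(f"forward: subject does {done}, is taught '{new}'")
for teacher, new, rest in backward_chaining(STEPS):
    print(f"backward: teacher does {teacher}, subject is taught '{new}', then does {rest}")
```

In the backward stages, the newly taught step is always followed by steps the subject has already mastered, so each new step leads straight into a familiar sequence that ends with the door opening.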
Controversies
The standard idea of behavioral reinforcement has been criticized as circular, since it appears to argue that response strength is increased by reinforcement while defining reinforcement as something which increases response strength. Other definitions have been proposed, such as F. D. Sheffield's "consummatory behavior contingent on a response," but these are not broadly used in psychology.
History of the terms
In the 1920s Russian physiologist Ivan Pavlov may have been the first to use the word reinforcement with respect to behavior, but (according to Dinsmoor) he used its approximate Russian cognate sparingly, and even then it referred to strengthening an already-learned but weakening response. He did not use it, as it is today, for selecting and strengthening new behavior. Pavlov's introduction of the word extinction (in Russian) approximates today's psychological use.
In popular use, positive reinforcement is often used as a synonym for reward, with people (not behavior) thus being "reinforced," but this is contrary to the term's consistent technical usage. Negative reinforcement is often used by laypeople, and even by social scientists outside psychology, as a synonym for punishment. This is contrary to modern technical use, but it was B. F. Skinner who first used the term this way, in his 1938 book. By 1953, however, he had followed others in using the word punishment for this, and he recast negative reinforcement to mean the removal of aversive stimuli.
References
- Dinsmoor, James A. (2004) "The etymology of basic concepts in the experimental analysis of behavior." Journal of the Experimental Analysis of Behavior, 82 (3): 311-316.
- Michael, Jack. (1975) "Positive and negative reinforcement, a distinction that is no longer necessary; or a better way to talk about bad things." Behaviorism, 3 (1): 33-44.
- Skinner, B. F. (1938) The behavior of organisms. New York: Appleton-Century-Crofts.
- Chance, Paul. (2003) Learning and Behavior. 5th ed. Toronto: Thomson-Wadsworth.