Using the Principles of Learning to Understand Everyday Behavior

Arlene Lacombe; Kathryn Dumper; Marilyn Lovett; Marion Perlmutter; Rose M. Spielman; William Jenkins

36 Using the Principles of Learning to Understand Everyday Behavior

Learning Objectives

Review the ways that learning theories can be applied to understanding and modifying everyday behavior.
Describe the situations under which reinforcement may make people less likely to enjoy engaging in a behavior.
Explain how principles of reinforcement are used to understand social dilemmas such as the prisoner’s dilemma and why people are likely to make competitive choices in them.

The principles of learning are some of the most general and most powerful in all of psychology. It would be fair to say that these principles account for more behavior using fewer principles than any other set of psychological theories. The principles of learning are applied in numerous ways in everyday settings. For example, operant conditioning has been used to motivate employees, to improve athletic performance, to improve the functioning of people with developmental disabilities, and to help parents successfully toilet train their children (Simek & O’Brien, 1981; Pedalino & Gamboa, 1974; Azrin & Foxx, 1974; McGlynn, 1990). In this section we will consider how learning theories are used in advertising, in education, and in understanding competitive relationships between individuals and groups.

Using Classical Conditioning in Advertising

Classical conditioning has long been, and continues to be, an effective tool in marketing and advertising (Hawkins, Best, & Coney, 1998). The general idea is to create an advertisement with positive features that make it enjoyable for the person exposed to it. The enjoyable ad serves as the unconditioned stimulus (US), and the enjoyment is the unconditioned response (UR). Because the product appears in the ad, it becomes associated with the US and thereby becomes a conditioned stimulus (CS). In the end, if everything has gone well, seeing the product online or in the store will produce a positive response in the buyer (the conditioned response, or CR), making him or her more likely to purchase the product.

A similar strategy is used by corporations that sponsor teams or events. For instance, if people enjoy watching a college basketball team, and that team is sponsored by a product such as Pepsi, then people may come to experience positive feelings when they see a can of Pepsi. Of course, sponsors prefer to back only good teams and good athletes, because these create more pleasurable responses.

Advertisers use a variety of techniques to create positive advertisements, including enjoyable music, cute babies, attractive models, and funny spokespeople. In one study, Gorn (1982) showed research participants pictures of pens in two different colors, pairing one pen with pleasant music and the other with unpleasant music. When later given a choice of pen as a free gift, more people chose the color that had been paired with the pleasant music. Similarly, Schemer, Matthes, Wirth, and Textor (2008) found that people were more interested in products that had been embedded in music videos by artists they liked, and less interested when the products appeared in videos by artists they did not like.

Another type of ad that is based on principles of classical conditioning is one that associates fear with the use of a product or behavior, such as those that show pictures of deadly automobile accidents to encourage seatbelt use or images of lung cancer surgery to discourage smoking. These ads have also been found to be effective (Das, de Wit, & Stroebe, 2003; Perloff, 2003; Witte & Allen, 2000), due in large part to conditioning. When we see a cigarette and the fear of dying has been associated with it, we are hopefully less likely to light up.

Taken together, then, there is ample evidence of the utility of classical conditioning in advertising, using both positive and negative stimuli. This does not, however, mean that we are always influenced by these ads. Conditioning is more likely to succeed for products that we do not know much about, when the differences between products are relatively minor, and when we do not think too carefully about our choices (Schemer et al., 2008).

Psychology in Everyday Life: Operant Conditioning in the Classroom

John B. Watson and B. F. Skinner believed that all learning was the result of reinforcement, and thus that reinforcement could be used to educate children. For instance, Watson wrote in his book on behaviorism,

Give me a dozen healthy infants, well-formed, and my own specified world to bring them up in and I’ll guarantee to take any one at random and train him to become any type of specialist I might select—doctor, lawyer, artist, merchant-chief and, yes, even beggar-man and thief, regardless of his talents, penchants, tendencies, abilities, vocations, and race of his ancestors. I am going beyond my facts and I admit it, but so have the advocates of the contrary and they have been doing it for many thousands of years (Watson, 1930, p. 82).

Skinner promoted the use of programmed instruction, an educational tool that consists of self-teaching with the aid of a specialized textbook or teaching machine that presents material in a logical sequence (Skinner, 1965). Programmed instruction allows students to progress through a unit of study at their own rate, checking their own answers and advancing only after answering correctly. Programmed instruction is used today in many classes, for instance to teach computer programming (Emurian, 2009).
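The procedure just described (material presented in a fixed sequence, answers checked right away, and advancement only after a correct response) can be sketched as a short Python loop. This is a minimal illustration, not a reconstruction of Skinner’s teaching machine: the `Frame` class, the `run_programmed_instruction` function, and the sample questions are hypothetical, and the learner’s answers are passed in as a list so the example runs without typed input.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    """One small step of material ending in a question (hypothetical content)."""
    prompt: str
    answer: str


def run_programmed_instruction(frames, responses):
    """Present frames in order, advancing only after a correct answer.

    `responses` stands in for the learner's answers. Returns a log of
    (frame index, response, correct?) tuples plus the number of frames completed.
    """
    log = []
    index = 0
    for response in responses:
        if index == len(frames):
            break  # every frame has already been answered correctly
        correct = response.strip().lower() == frames[index].answer.lower()
        log.append((index, response, correct))  # immediate feedback on each attempt
        if correct:
            index += 1  # move on only after a correct answer
    return log, index


frames = [
    Frame("A stimulus that naturally produces a response is the ___ stimulus.", "unconditioned"),
    Frame("A once-neutral stimulus that comes to produce a response is the ___ stimulus.", "conditioned"),
]

log, completed = run_programmed_instruction(frames, ["conditioned", "unconditioned", "conditioned"])
for attempt in log:
    print(attempt)
print("frames completed:", completed)
```

A wrong answer simply repeats the same frame, so each learner moves through the unit at his or her own rate; in this run the first answer is wrong, the second unlocks the next frame, and the third completes the unit.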

Although reinforcement can be effective in education, and teachers make use of it by awarding gold stars, good grades, and praise, there are also substantial limitations to using rewards to improve learning. To be most effective, rewards must be contingent on appropriate behavior. In some cases teachers may distribute rewards indiscriminately, for instance by giving praise or good grades to children whose work does not warrant it, in the hope that they will “feel good about themselves” and that this self-esteem will lead to better performance. Studies indicate, however, that high self-esteem alone does not improve academic performance (Baumeister, Campbell, Krueger, & Vohs, 2003). When rewards are not earned, they become meaningless and no longer provide motivation for improvement.

Another potential limitation of rewards is that they may teach children that the activity should be performed for the reward, rather than for one’s own interest in the task. If rewards are offered too often, the task itself can become less appealing. Mark Lepper and his colleagues (Lepper, Greene, & Nisbett, 1973) studied this possibility by leading some children to think that they engaged in an activity for a reward, rather than because they simply enjoyed it. First, they placed some fun felt-tipped markers in the classroom of the children they were studying. The children loved the markers and played with them right away. Then, the markers were taken out of the classroom, and the children were given a chance to play with the markers individually at an experimental session with the researcher. At the research session, the children were randomly assigned to one of three experimental groups. One group of children (the expected reward condition) was told that if they played with the markers they would receive a good drawing award. A second group (the unexpected reward condition) also played with the markers and also got the award, but they were not told ahead of time that they would be receiving it; the award came as a surprise after the session. The third group (the no reward condition) played with the markers too, but got no award.

Then, the researchers placed the markers back in the classroom and observed how much the children in each of the three groups played with them. As you can see in the figure “Undermining Intrinsic Interest,” the children who had been led to expect a reward for playing with the markers during the experimental session played with the markers less at the second session than they had at the first session. The idea is that, when the children had to choose whether or not to play with the markers when the markers reappeared in the classroom, they based their decision on their own prior behavior. The children in the no reward condition and the children in the unexpected reward condition realized that they had played with the markers because they liked them. Children in the expected reward condition, however, remembered that they had been promised a reward for the activity the last time they played with the markers. These children were therefore more likely to infer that they played with the markers only for the external reward, and because they did not expect to get an award for playing with the markers in the classroom, they concluded that they didn’t like them. Expecting to receive the award at the session had undermined their initial interest in the markers.

Undermining Intrinsic Interest

Mark Lepper and his colleagues (1973) found that giving rewards for playing with markers, which the children naturally enjoyed, could reduce their interest in the activity.
Adapted from Lepper, M. R., Greene, D., & Nisbett, R. E. (1973). Undermining children’s intrinsic interest with extrinsic reward: A test of the “overjustification” hypothesis. Journal of Personality and Social Psychology, 28(1), 129–137.

This research suggests that, although giving rewards may in many cases lead us to perform an activity more frequently or with more effort, reward may not always increase our liking for the activity. In some cases reward may actually make us like an activity less than we did before we were rewarded for it. This outcome is particularly likely when the reward is perceived as an obvious attempt on the part of others to get us to do something. When children are given money by their parents to get good grades in school, they may improve their school performance to gain the reward. But at the same time their liking for school may decrease. On the other hand, rewards that are seen as more internal to the activity, such as rewards that praise us, remind us of our achievements in the domain, and make us feel good about ourselves as a result of our accomplishments are more likely to be effective in increasing not only the performance of, but also the liking of, the activity (Hulleman, Durik, Schweigert, & Harackiewicz, 2008; Ryan & Deci, 2002).

Research findings also support the general principle that punishment is less effective than reinforcement in changing behavior. In a meta-analysis, Gershoff (2002) found that although children who were spanked by their parents were more likely to comply immediately with the parents’ demands, they were also more aggressive, showed less ability to control aggression, and had poorer mental health in the long term than children who were not spanked. The problem seems to be that children who are punished for bad behavior are likely to change their behavior only to avoid the punishment, rather than by internalizing the norms of being good for its own sake. Punishment also tends to generate anger, defiance, and a desire for revenge. Moreover, punishment models the use of aggression and ruptures the important relationship between the teacher and the learner (Kohn, 1993).

Reinforcement in Social Dilemmas

The basic principles of reinforcement, reward, and punishment have been used to help understand a variety of human behaviors (Rotter, 1945; Bandura, 1977; Miller & Dollard, 1941). The general idea is that, as predicted by principles of operant learning and the law of effect, people act in ways that maximize their outcomes, where outcomes are defined as the presence of reinforcers and the absence of punishers.

Consider, for example, a situation known as the commons dilemma, as proposed by the ecologist Garrett Hardin (1968). Hardin noted that in many European towns there was at one time a centrally located pasture, known as the commons, which was shared by the inhabitants of the village to graze their livestock. But the commons was not always used wisely. The problem was that each individual who owned livestock wanted to be able to use the commons to graze his or her own animals. However, when each group member took advantage of the commons by grazing many animals, the commons became overgrazed, the pasture died, and the commons was destroyed.

Although Hardin focused on the particular example of the commons, the basic dilemma of individual desires versus the benefit of the group as a whole can also be found in many contemporary public goods issues, including the use of limited natural resources, air pollution, and public land. In large cities most people may prefer the convenience of driving their own car to work each day rather than taking public transportation. Yet this behavior uses up public goods (the space on limited roadways, crude oil reserves, and clean air). People are lured into the dilemma by short-term rewards, seemingly without considering the potential long-term costs of the behavior, such as air pollution and the necessity of building even more highways.

A social dilemma such as the commons dilemma is a situation in which the behavior that creates the most positive outcomes for the individual may in the long term lead to negative consequences for the group as a whole. The dilemmas are arranged such that it is easy to be selfish, because the personally beneficial choice (such as using water during a water shortage or driving to work alone in one’s own car) produces reinforcements for the individual. Furthermore, social dilemmas tend to work on a type of “time delay.” Because the long-term negative outcome (such as the extinction of fish species or dramatic changes in the earth’s climate) lies far in the future while the individual benefits occur right now, it is difficult for an individual to see how large the costs really are. The paradox, of course, is that if everyone makes the personally selfish choice in an attempt to maximize his or her own outcomes, the long-term result is poorer outcomes for every individual in the group. Each individual prefers to make use of the public goods for himself or herself, whereas the best outcome for the group as a whole is to use the resources more slowly and wisely.
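The time delay in the commons dilemma can be illustrated with a toy simulation in Python. This is a minimal sketch built on made-up assumptions rather than a model from Hardin (1968): the pasture size, the 25% regrowth rate, the grazing amounts, and the `simulate_commons` function are all hypothetical. Each season, every herder tries to graze a fixed amount of grass and keeps whatever he or she gets; the grass left over then regrows by a fixed percentage, up to the pasture’s capacity.

```python
def simulate_commons(takes, seasons, capacity=100.0, regrowth=0.25):
    """Return each herder's total harvest after a number of seasons.

    takes[i] is how much grass herder i tries to graze per season (hypothetical units).
    When the pasture cannot meet total demand, the remaining grass is split in
    proportion to what each herder asked for.
    """
    grass = capacity
    totals = [0.0] * len(takes)
    demand = sum(takes)
    for _ in range(seasons):
        taken = min(demand, grass)
        fraction = taken / demand if demand > 0 else 0.0
        for i, want in enumerate(takes):
            totals[i] += want * fraction
        grass -= taken
        # Only the grass that survives grazing can regrow, so overgrazing compounds.
        grass = min(capacity, grass * (1 + regrowth))
    return totals


scenarios = {
    "all restrained": [5, 5, 5, 5],
    "all greedy": [10, 10, 10, 10],
    "one greedy herder": [10, 5, 5, 5],
}
for label, takes in scenarios.items():
    short_run = simulate_commons(takes, seasons=3)
    long_run = simulate_commons(takes, seasons=10)
    print(f"{label:18} 3 seasons: {[round(x, 1) for x in short_run]}  "
          f"10 seasons: {[round(x, 1) for x in long_run]}")
```

With these invented numbers, greedy herders collect twice as much as restrained ones over the first three seasons (30 versus 15 units each), but after ten seasons the all-greedy group ends up with about 31 units each against 50 for the all-restrained group, because the overgrazed pasture is exhausted in the fourth season and never recovers. The third scenario shows the temptation: a lone greedy herder does better than anyone in the all-restrained group, at the expense of the herders who hold back.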

One method of understanding how individuals and groups behave in social dilemmas is to create such situations in the laboratory and observe how people react to them. The best known of these laboratory simulations is called the prisoner’s dilemma game (Poundstone, 1992). This game represents a social dilemma in which the goals of the individual compete with the goals of another individual (or sometimes with a group of other individuals). Like all social dilemmas, the prisoner’s dilemma assumes that individuals will generally try to maximize their own outcomes in their interactions with others.

In the prisoner’s dilemma game, the participants are shown a payoff matrix in which numbers are used to express the potential outcomes for each of the players in the game, given the decisions each player makes. The payoffs are chosen beforehand by the experimenter to create a situation that models some real-world outcome. Furthermore, in the prisoner’s dilemma game, the payoffs are normally arranged as they would be in a typical social dilemma, such that each individual is better off acting in his or her immediate self-interest, and yet if all individuals act according to their self-interests, then everyone will be worse off.

In its original form, the prisoner’s dilemma game involves a situation in which two prisoners (we’ll call them Frank and Malik) have been accused of committing a crime. The police believe that the two worked together on the crime, but they have only been able to gather enough evidence to convict each of them of a more minor offense. In an attempt to gain more evidence, and thus to be able to convict the prisoners of the larger crime, each of the prisoners is interrogated individually, with the hope that he will confess to having been involved in the more major crime, in return for a promise of a reduced sentence if he confesses first. Each prisoner can make either the cooperative choice (which is to not confess) or the competitive choice (which is to confess).

The incentives for either confessing or not confessing are expressed in a payoff matrix such as the one shown in the figure “The Prisoner’s Dilemma.” The top of the matrix represents the two choices that Malik might make (to either confess that he did the crime or not confess), and the side of the matrix represents the two choices that Frank might make (also to either confess or not confess). The payoffs that each prisoner receives, given the choices of each of the two prisoners, are shown in each of the four squares.

The Prisoner’s Dilemma

If both prisoners make the cooperative choice by not confessing (the situation represented in the upper left square of the matrix), there will be a trial, the limited available information will be used to convict each prisoner, and they each will be sentenced to a relatively short prison term of 3 years. However, if either of the prisoners confesses, turning “state’s evidence” against the other prisoner, then there will be enough information to convict the other prisoner of the larger crime, and that prisoner will receive a sentence of 30 years, whereas the prisoner who confesses will get off free. These outcomes are represented in the lower left and upper right squares of the matrix. Finally, it is possible that both players confess at the same time. In this case there is no need for a trial, and in return the prosecutors offer a somewhat reduced sentence (of 10 years) to each of the prisoners.

The prisoner’s dilemma has two interesting characteristics that make it a useful model of a social dilemma. For one, the prisoner’s dilemma is arranged such that a positive outcome for one player does not necessarily mean a negative outcome for the other player. If you consider again the matrix in the figure “The Prisoner’s Dilemma,” you can see that if one player makes the cooperative choice (to not confess) and the other takes the competitive choice (to confess), then the prisoner who cooperates loses, whereas the other prisoner wins. However, if both prisoners make the cooperative choice, each remaining quiet, then neither gains more than the other, and both prisoners receive a relatively light sentence. In this sense both players can win at the same time.

Second, the prisoner’s dilemma matrix is arranged such that each individual player is motivated to make the competitive choice, because this choice leads to a higher payoff regardless of what the other player does. Imagine for a moment that you are Malik, and you are trying to decide whether to cooperate (don’t confess) or to compete (confess). And imagine that you are not really sure what Frank is going to do. Remember that the goal of the individual is to maximize outcomes. The values in the matrix make it clear that if you think that Frank is going to confess, you should confess yourself (to get 10 rather than 30 years in prison). And it is also clear that if you think Frank is not going to confess, you should still confess (to get 0 rather than 3 years in prison). So the matrix is arranged such that the “best” alternative for each player, at least in the sense of pure reward and self-interest, is to make the competitive choice, even though in the end both players would prefer the combination in which both players cooperate to the one in which they both compete.
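This reasoning can be checked mechanically. The Python sketch below encodes the four outcomes described in the text as prison sentences in years (lower is better), with Frank as the row player and Malik as the column player, and asks which choice minimizes each prisoner’s own sentence given the other’s choice. The names `SENTENCES`, `CHOICES`, and the two best-response functions are illustrative only.

```python
# Sentences in years (lower is better), from the outcomes described in the text.
# Key: (Frank's choice, Malik's choice) -> (Frank's years, Malik's years)
SENTENCES = {
    ("not confess", "not confess"): (3, 3),
    ("not confess", "confess"): (30, 0),
    ("confess", "not confess"): (0, 30),
    ("confess", "confess"): (10, 10),
}
CHOICES = ("not confess", "confess")


def malik_best_response(frank_choice):
    """Malik's choice that minimizes his own sentence, given Frank's choice."""
    return min(CHOICES, key=lambda malik: SENTENCES[(frank_choice, malik)][1])


def frank_best_response(malik_choice):
    """Frank's choice that minimizes his own sentence, given Malik's choice."""
    return min(CHOICES, key=lambda frank: SENTENCES[(frank, malik_choice)][0])


for frank in CHOICES:
    print(f"If Frank chooses '{frank}', Malik's best response is '{malik_best_response(frank)}'")

# Yet both prisoners do better when both stay silent than when both confess.
both_cooperate = SENTENCES[("not confess", "not confess")]
both_compete = SENTENCES[("confess", "confess")]
print(all(c < d for c, d in zip(both_cooperate, both_compete)))  # True
```

Because confessing is Malik’s best response whatever Frank does (and, by symmetry, Frank’s best response whatever Malik does), it is what game theorists call a dominant strategy, even though mutual silence (3 years each) beats mutual confession (10 years each) for both players.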

Although initially specified in terms of the two prisoners, similar payoff matrices can be used to predict behavior in many different types of dilemmas involving two or more parties and including choices of helping and not helping, working and loafing, and paying and not paying debts. For instance, we can use the prisoner’s dilemma to help us understand roommates living together in a house who might not want to contribute to the housework. Each of them would be better off if they relied on the other to clean the house. Yet if neither of them makes an effort to clean the house (each making the competitive choice), the house becomes a mess and they will both be worse off.

Summary

Classical conditioning was first studied by physiologist Ivan Pavlov. In classical conditioning a person or animal learns to associate a neutral stimulus (the conditioned stimulus, or CS) with a stimulus (the unconditioned stimulus, or US) that naturally produces a behavior (the unconditioned response, or UR). As a result of this association, the previously neutral stimulus comes to elicit the same or similar response (the conditioned response, or CR).

Classically conditioned responses show extinction if the CS is repeatedly presented without the US. The CR may reappear later in a process known as spontaneous recovery.

Organisms may show stimulus generalization, in which stimuli similar to the CS may produce similar behaviors, or stimulus discrimination, in which the organism learns to differentiate between the CS and other similar stimuli.

Second-order conditioning occurs when a second CS is conditioned to a previously established CS.

Psychologist Edward Thorndike developed the law of effect: the idea that responses that are reinforced are “stamped in” by experience and thus occur more frequently, whereas responses that are punished are “stamped out” and subsequently occur less frequently.

B. F. Skinner (1904–1990) expanded on Thorndike’s ideas to develop a set of principles to explain operant conditioning.

Positive reinforcement strengthens a response by presenting something pleasant after the response, and negative reinforcement strengthens a response by reducing or removing something unpleasant. Positive punishment weakens a response by presenting something unpleasant after the response, whereas negative punishment weakens a response by reducing or removing something pleasant.

Shaping is the process of guiding an organism’s behavior to the desired outcome by reinforcing successive approximations of that behavior.

Reinforcement may be either partial or continuous. Partial-reinforcement schedules are determined by whether the reward is presented on the basis of the time that elapses between rewards (interval) or on the basis of the number of responses that the organism engages in (ratio), and by whether the reinforcement occurs on a regular (fixed) or unpredictable (variable) schedule.

Not all learning can be explained through the principles of classical and operant conditioning. Insight is the sudden understanding of the components of a problem that makes the solution apparent, and latent learning refers to learning that is not reinforced and not demonstrated until there is motivation to do so.

Learning by observing the behavior of others and the consequences of those behaviors is known as observational learning. Aggression, altruism, and many other behaviors are learned through observation.

Learning theories can be and have been applied to change behaviors in many areas of everyday life. Some advertising uses classical conditioning to associate a pleasant response with a product.

Rewards are frequently and effectively used in education but must be carefully designed to be contingent on performance and to avoid undermining interest in the activity.

Social dilemmas, such as the prisoner’s dilemma, can be understood in terms of a desire to maximize one’s outcomes in a competitive relationship.

License


Introduction to Psychology Copyright © 2022 by LOUIS: The Louisiana Library Network is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License, except where otherwise noted.