applying the behavioral perspective LO20774

Jon Krispin (jkrispin@prestolitewire.com)
Tue, 02 Mar 1999 16:49:46 -0500

replying to LO20678 (Leo Minnigh)

Greetings LOr's,

in LO20678, Leo wrote the following:

>In other words, positive reinforcers as attractors seem less effective
>than punishment, according to the 4:1 ratio.

...snip...

>Does this mean that we must generate four times as much energy to guide
>behaviour with this method, than if we use punishment?? If this is true,
>and honestly I do not have reasons to doubt, it explains much. ...snip...
>However, I still cannot link this observation with my thoughts on the
>push/pull-principles.

Greetings Leo,

I wanted to add some additional background to the explanation that John
Gunkler provided on the seeming paradox in the 4:1 (positive
reinforcement:punishment) ratio that I mentioned in LO20649. I will do
this by introducing the idea that behavior has inertia, or momentum and
explaining some of the basics of schedules of reinforcement to which John
and I have alluded in several messages. By doing this, I think it
will become even clearer why positive reinforcement is greatly
preferred to negative reinforcement, and why, despite appearances,
the 4:1 ratio does not mean that we are required to expend far more
energy to use positive reinforcement (although this is generally the
case in the short term). It
also will help to provide a greater understanding of the various costs (in
terms of the consumption of free energy) required to change the flow of a
system of behavioral energy using the push (negative reinforcement and
aversive controls) and the pull (positive reinforcement).

The typical practice in behavioral research is to examine the effects that
different schedules of reinforcement have on the flow of behavior within
an individual subject. This is in contrast to the between-subjects,
"snapshot" designs that are more typically used in psychology. The flow
of behavior is most often measured using the rate (or frequency) of the
occurrence of the behavior over time. The essence of a schedule of
reinforcement is to specify the relationship between the occurrence
of behavior and the occurrence of reinforcement. The terms of the
relationship are most often expressed as rules regarding how often,
or under what conditions, a particular behavior will be reinforced.

The first distinction that can be made among the different types of
schedules of reinforcement is between continuous reinforcement (CRF)
and intermittent reinforcement (INT). The simplest way to explain this
distinction is to say that, under a CRF schedule, every time the behavior
occurs, it is followed by reinforcement. In an INT schedule, not every
occurrence of the behavior is reinforced.

Within INT schedules, there are two different types, ratio schedules and
interval schedules of reinforcement (discussed below). And each of these
schedules can be even further defined as being either a fixed schedule or
a variable schedule.

The CRF schedule is actually a 1:1 (behavior:reinforcement) ratio schedule
- every time the behavior occurs, it is reinforced (the highest degree of
certainty of reinforcement possible). Many examples of CRF exist in our
immediate surroundings. For example, every time we flip a light switch,
the lights come on (or go off). Every time we turn the faucet, the water
begins to flow. In many other situations, not every occurrence is
reinforced, but each subsequent occurrence of the behavior brings us
closer to the occurrence of reinforcement. For example, a 10:1 fixed
ratio schedule (usually expressed as FR10) tells us that it takes 10
occurrences of the behavior to attain the reinforcer (BTW, a CRF
schedule may also be expressed as FR1). One illustration of this type of schedule
that sometimes occurs in "real life" is the piece rate method of
compensation. The more widgets we produce, the more we will get paid.
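To make the ratio idea concrete, here is a little Python sketch of my
own (the names are mine and purely illustrative, not anything from
the behavioral literature):

    def fixed_ratio(n):
        """Reinforce every nth response (FR-n); note FR1 is just CRF."""
        count = 0
        def respond():
            nonlocal count
            count += 1
            if count == n:
                count = 0
                return True    # reinforcer delivered
            return False       # response counted, no reinforcer yet
        return respond

    fr10 = fixed_ratio(10)
    outcomes = [fr10() for _ in range(30)]
    print(outcomes.count(True))    # -> 3 reinforcers for 30 responses

Every response moves the count forward, so each widget produced
really does bring the piece-rate worker closer to the next payoff.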

In a 10:1 variable ratio schedule (VR10), ON AVERAGE, every 10th response
is reinforced. Sometimes the reinforcer follows the 8th response, other
times the 12th response, then the 5th, then the 13th, then the 9th..., but
on average, reinforcement is provided for every 10th response (as opposed
to definitely following the 10th response). Sales representatives often
are on a VR schedule in relation to the frequency that their sales calls
are reinforced with an actual sale. They can't be sure whether or not the
next sales call will result in a sale, but they may also know that, on
average, every 8th sales call produces a sale (VR8). And they also know
that, generally speaking, the more sales calls that they make, the more
sales they will generate.
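The same sketch needs only one change to become a VR schedule: the
response requirement is re-drawn around the average after each
reinforcer (again my own toy code, with an arbitrary spread):

    import random

    def variable_ratio(mean, spread):
        """Reinforce after a requirement drawn around `mean` (VR-mean)."""
        requirement = random.randint(mean - spread, mean + spread)
        count = 0
        def respond():
            nonlocal count, requirement
            count += 1
            if count >= requirement:
                count = 0
                requirement = random.randint(mean - spread, mean + spread)
                return True    # the sale is made
            return False       # no sale on this call
        return respond

    vr8 = variable_ratio(8, 4)    # roughly the sales representative's VR8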

[The following paragraphs (between the brackets) address the phenomenon of
gambling addiction from a behavioral understanding. If you wish to
maintain the cohesiveness of the presentation of the basics of schedules
of reinforcement, please skip ahead to the paragraph following the end
bracket.

A special kind of ratio schedule of reinforcement applies to the
slot machine (and actually to most games of chance), and here's where
the gambling addiction comes into play. Every time the arm of the
slot machine is pulled (or the dice are rolled, or the wheel is
spun, ...) the probability of hitting the jackpot stays the same.
Since each cycle is statistically independent of all the others, more
pulls of the arm do not necessarily
bring you any closer to the realization of reinforcement. This is
sometimes called a random ratio of reinforcement.
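In code, the distinction between a VR and a random ratio schedule is
easy to see: under a random ratio, every pull is an independent draw
with the same fixed odds (a sketch of mine; the 1-in-10 odds are
invented):

    import random

    def random_ratio(p=0.10):
        """Each response is reinforced with the same fixed probability
        p, independent of everything that came before."""
        return random.random() < p

    # After 50 losing pulls in a row, the odds on the next pull are
    # still exactly p - no pull ever brings you "closer" to the jackpot.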

These schedules can powerfully attract behavior, and the casinos use this
knowledge to their advantage. The cheaper slot machines will pay out
on more combinations of icons (and they may also have fewer possible
combinations of icons, which likewise increases the likelihood that a
"hit" will occur). They also vary the amount of payout
(reinforcement) given for each winning combination. I have heard
that some slot machines are actually programmed to pay out more than
they take in (for example, those in the high traffic areas) in order
to "hook" passers-by and get them
to play the machines that will make money for the casinos. In all cases,
the machines can be programmed to always provide a profit to the casino
(if you play long enough, you will lose all of your money for the pleasure
of pulling the handle). In most cases, as the amount of money required to
cycle the slot machine goes up, either the number of possible combinations
for the machine is increased (reducing the likelihood that you will have a
winning combination) or the number of winning combinations is reduced.
While the payout, when it happens, is increased, the margin of profit
generated by the machine for the casino generally goes up with the more
costly slot machines.

Another area related to reinforcement that enters into the picture
with gambling is the interrelationship between the types of
reinforcement that are available. Most individuals begin gambling
because of the potential
positive reinforcement that comes from the possibility of winning more
than you came with. Casinos will further enhance this by providing
antecedents (in their advertisements they will point out how many
other winners there have been, and maybe how much they have won; they
set off all kinds of sirens and bells when someone "hits" on a slot
machine to let others know that someone has won - this also enhances
the winning experience for the gambler). However, at some point, a
new dynamic enters into the
picture. The gambling behavior will escalate as people begin to take
new risks to try to compensate for their losses (escaping the losses
acts as a negative reinforcer).
They become entrapped in an escalating cycle from which they find it
harder and harder to extricate themselves.

There is a training exercise that captures the essence of this dynamic
that goes something like this: An auction is set up where people in the
session will begin bidding to "purchase" a $20 bill (or some other
amount of currency - I am trying to stay mindful of the fact that
this list has an international subscribership). The highest bid will
get the money. If these are the only rules, the bidding will never
exceed $20 (an even exchange), but if you add one additional rule, the
bidding will often go much higher. The additional rule is that the person
with the second highest bid will have to also pay in the amount that they
bid, but they will get nothing in return. The highest bidder gets
the $20 bill, often at a loss (if they bid more than $20).

Adding this dynamic introduces the element of bidding in order to minimize
the loss, as opposed to optimizing the gain. Assuming that bidding is
only allowed in increments of $1, generally what happens is that the
party that bid $18 for the $20 bill is outbid by some party that is
looking to make a $1 profit, bidding $19. At this point, the person
who now stands to pay $18 and get nothing in return will bid $20,
hoping to break even. Now the person who bid $19 will generally
raise the bid to $21 in order to avoid losing the $19 outright
(winning at $21 costs them only $1 net), and the spiral continues to
escalate.
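The loss-minimizing arithmetic that drives the spiral can be laid out
step by step (a little sketch of mine, assuming the $20 bill and $1
increments from the exercise above):

    PRIZE = 20
    # (my_bid, rival_bid): at each step, I am the trailing bidder.
    for my_bid, rival_bid in [(18, 19), (19, 20), (20, 21)]:
        walk_away = -my_bid                       # pay my bid, get nothing
        raise_and_win = PRIZE - (rival_bid + 1)   # best case if I outbid by $1
        print(f"raise to {rival_bid + 1}: "
              f"walk away = {walk_away}, raise = {raise_and_win}")

    # raise to 20: walk away = -18, raise = 0
    # raise to 21: walk away = -19, raise = -1
    # raise to 22: walk away = -20, raise = -2
    # At every step, raising loses less than quitting, so each
    # trailing bidder is negatively reinforced into escalating.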

Generally, participants in the exercise above will not want to participate
anymore after they have done it once. In gambling, the person who gets
themselves involved initially by the lure of quick money may find
themselves deep in a hole and feel that they have to continue to escalate
their commitment (in the form of the amount that they are gambling) in the
hopes that they can get themselves out of the hole. If they do hit, now
they have been positively reinforced for their actions, and they will be
more likely to continue their gambling behavior (again drawn by the lure
of quick, easy money).

While gambling does provide an example of a situation where both negative
and positive reinforcement schedules are operating concurrently to
influence behavior (often the case in the complexity of reality), this
does not invalidate my suggestion in LO20649 that we generally avoid
mixing negative reinforcement and positive reinforcement schedules when we
are "concocting our behavioral cocktail" or "baking our behavioral cake".
If we have no morals or ethics (read: no concern for the persons who we
are trying to influence), we may not be concerned with the side effects
that are generated, and may be willing to exploit the situation (casinos
provide a nice illustration of this - they are most profitable when they
are exploiting their patrons). From our previous discussions, we also
know that the use of negative reinforcement to "push" behavior will
preclude the possibility that emergence will occur for the system of
behavioral energy.]

The second basic type of schedule is the interval schedule. In this type
of schedule, the realization of reinforcement is dependent on a single
response after a passage of time (the defined interval). For
example, in a 1-minute fixed interval schedule (FI 1-min), the first
specified behavioral response after the one minute interval has
passed will be reinforced. Responses made within the 1 minute
interval do nothing to
influence the occurrence of reinforcement, and the passage of the interval
alone does nothing. If the experimental subject had a watch and knew
the exact interval, they could literally respond once after the
interval elapsed and receive the reinforcement, relaxing or doing
whatever they wanted to do during the interval (and actually, this is
almost exactly the pattern of responding that is shaped using pure FI
schedules).
An example of a pure FI schedule in real life is not easy to find, but
there are many examples that have characteristics of FI schedules. For
example, in companies that still hand out pay checks on pay day, the
presentation of the pay check can be interpreted as a reinforcer for
attendance at work that day (typically, absenteeism rates are lower than
average on pay day). This artifact will likely be reduced as more
and more companies make direct deposit of pay an option. Another
example is the procrastination phenomenon (actually an example of a
negatively reinforced interval schedule). We generally engage in
much higher levels of studying behavior (for example, for a weekly
test), monthly bill-paying behavior, or weekly report-filing behavior
as the deadline approaches than we do when the deadline is far
removed in time.
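An FI schedule is simple to sketch with a logical clock (my own toy
code again; t is just elapsed time in seconds, and the response
itself is abstracted away):

    def fixed_interval(interval):
        """Reinforce the first response at or after the boundary (FI)."""
        available_at = interval
        def respond(t):
            nonlocal available_at
            if t >= available_at:
                available_at = t + interval   # next interval starts now
                return True
            return False   # responses inside the interval do nothing
        return respond

    fi = fixed_interval(60)          # an FI 1-min schedule
    print(fi(10), fi(30), fi(61))    # -> False False True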

Variable interval (VI) schedules work on the same principle as FI
schedules, with the difference being that the actual interval that
passes before reinforcement is available varies around some average,
rather than being fixed. A VI 2-min schedule provides a reinforcer
for the first response following an average 2-minute interval.
However, the actual interval may be 30 seconds in some cases, 3
minutes in others, and so on, so the respondent does not know how
long they have to wait before reinforcement will next be available.
This type of schedule is used to shape the behavior of security
guards who monitor banks of video screens showing the activity in
different areas, and of air traffic controllers who constantly
monitor radar screens. It is also the type of schedule that
maintains behavior
such as trying to look busy in case the boss happens to wander by your
office. This type of schedule maintains persistent levels of behavior and
builds patience, but it does not support particularly high rates of
responding.
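The VI version differs only in that each new interval is drawn around
the average (same caveats as my other sketches; the spread is
arbitrary):

    import random

    def variable_interval(mean, spread):
        """Like FI, but each interval is drawn around `mean` (VI-mean)."""
        available_at = random.uniform(mean - spread, mean + spread)
        def respond(t):
            nonlocal available_at
            if t >= available_at:
                available_at = t + random.uniform(mean - spread,
                                                  mean + spread)
                return True
            return False
        return respond

    vi = variable_interval(120, 90)   # a VI 2-min schedule; the actual
                                      # waits vary between 30 seconds
                                      # and 3.5 minutes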

Behavioral research has focused extensively on examining numerous
dimensions of behavior, two of which are important in this discussion.
They are 1) the rate of behavior that will be supported by a given
schedule of reinforcement, and 2) the momentum in behavior that is
produced by the
schedule. Generally speaking, continuous reinforcement is required to
shape the acquisition of a new behavior, and this schedule will support
high rates of behavior, but it is prone to several different problems
(that will be mentioned shortly in the momentum discussion). Of the
intermittent schedules, the highest rates of behavior are realized with
variable ratio schedules. This is followed (in descending order of rates)
by fixed ratio schedules, variable interval schedules, and finally, fixed
interval schedules.

With the variable schedules, the rate of responding becomes very steady,
varying very little over time. However, with the fixed schedules, there
is more variability in the steadiness of responding. This was mentioned
in the paragraph on fixed interval schedules (in the procrastination
phenomenon, behavior is usually nonexistent at the beginning of the
interval, gradually increasing in rate as the end of the interval
approaches, until it may reach even a frantic pace immediately before the
reinforcement becomes available). A related phenomenon that occurs
with FR schedules has been labeled the post-reinforcement pause. In
this
case, there is a short pause in the performance of the subject immediately
following the presentation of reinforcement. This pause is reliable, but
of limited duration, and is typically followed by the resumption of the
behavior at the pre-reinforcement rate.

The rate and steadiness of behavior that is supported by the different
schedules does not present the whole picture. Another characteristic of
behavior shaped by the various schedules that has been extensively
investigated is the momentum or inertia of the behavior. For purposes of
research, this is studied by shaping the behavior of the experimental
subject using a particular schedule until the rate and pattern of
behavioral response has stabilized for some time and then removing the
availability of reinforcement (extinction). By measuring the number of
responses and the length of time that passes before the behavior stops (is
extinguished), we can gain some understanding of the momentum in behavior
that the schedule is able to produce (which will take the form of a curve
that follows the path of the behavior's extinction).

The short story here is that variable schedules (either VR or VI) are much
more robust to extinction than their fixed counterparts, producing much
more momentum in the behavior that they maintain. Fixed schedules
extinguish quickly, likely because the responder realizes almost
immediately that something has changed when the reinforcement is
withdrawn - with a CRF schedule, the subject generally notices the
change right away and often stops at once (CRF schedules are also
prone to reinforcement satiation, where the reinforcer loses its
ability to reinforce. An example of this occurs when food is used to
reinforce behavior. At some point, if the food is coming at a fast
enough rate, the subject becomes satiated and will no longer find
food reinforcing). The variability in the occurrence of
reinforcement in variable schedules makes the removal of
reinforcement less apparent, and the behavior therefore more robust
to extinction. The performer will persist in the behavior much
longer when variable schedules have been used.
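A quick back-of-the-envelope calculation shows why variable schedules
mask the change (my own toy model, treating VR10 as roughly a 1-in-10
chance per response):

    # Under CRF, a single unreinforced response is unambiguous
    # evidence that the schedule has changed. Under VR10, even a
    # long dry spell is business as usual:
    p = 0.10                       # rough per-response odds under VR10
    dry_spell = 20
    print((1 - p) ** dry_spell)    # -> ~0.12: a 20-response run with no
                                   #    reinforcer happens ~12% of the
                                   #    time anyway

So the subject on a variable schedule has no clear signal that
extinction has begun, and the behavior coasts on its momentum.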

To summarize the paragraphs above:

CRF - high rates and fairly steady rates of performance, but not very
robust to extinction

VR - high and steady rates of behavior, very robust to extinction

FR - fairly high rates of behavior, less steady than VR, not as robust to
extinction

VI - low to moderate rates of behavior, very steady rates, very robust to
extinction

FI - low rates of behavior, unsteady rates, not robust to extinction

Now that we have covered some of the basics of schedules, we can
return to the differences between positive and negative reinforcement
within the context of schedules of reinforcement (these can be
summarized in general terms). All other things being equal, positive
reinforcement will
sustain higher rates of behavior than negative reinforcement, regardless
of the schedule of reinforcement. This is because of the pull/attraction
of the reinforcer and the focus that is generated. Negative
reinforcement, with its spreading effect, will get the behavior to occur,
but will usually also generate other kinds of activity (for example, the
search to find ways to completely avoid the possibility of the negative
outcome that have nothing to do with the behavior that is desired). The
performer will look for opportunities to engage in the behavior when
positive reinforcement is used, but will look for opportunities to avoid
the outcome if negative reinforcement is used.

Another observation regarding negative reinforcement that I would like to
make is that, while it is possible to have negative reinforcement ratio
schedules, the much more prevalent use of negative reinforcement takes the
form of either fixed or variable interval schedules. There are a
number of factors contributing to this. One is that negative
reinforcement is often tied to deadlines (if we/you don't have this
done by this time, we/you are in trouble) and is therefore event
driven (occurring at some interval). This may be unavoidable when
due dates are assigned unilaterally by a customer or some other
source outside of the system which we influence, but in many other
cases we create for ourselves circumstances that exacerbate the
problem (e.g., through our performance appraisal process, and through
MBO activities).

Another contributing factor is the lack of time taken to pinpoint
(accurately define) the behaviors that are desired or needed to attain a
given result (which must happen before positive reinforcement can be
applied contingent upon the occurrence of the behavior). Given that there
is often a poor understanding of what behaviors are required to produce a
result, the spreading/dispersive effect of the negative reinforcement push
is used as a crutch. By sending behavior in all kinds of
directions, for example with a threat tied to a poor outcome/result,
the chances that we will get at least some of the behavior that we
need are increased. From
this perspective, negative reinforcement in the form of an interval
schedule is "easier" for the influencer to use (it requires less
effort), especially in the heat of the moment, and, as a result, it
becomes the
"default" method of reinforcement. However, we are stuck with the
limitation that interval schedules (FI and VI) only support low to
moderate rates of behavior (and, when they are negatively reinforced,
never more than is necessary to get by).

Positive reinforcement, on the other hand, pairs up very well with ratio
schedules. The use of a variable ratio, positive reinforcement schedule
will produce much higher rates of behavior, and this behavior will also be
very robust to extinction. In fact, positive reinforcement, applied using
variable ratio schedules, produces behavior that is MOST robust to
extinction.

This brings us to yet another concept that is fundamental to understanding
the benefits of positive reinforcement in contrast to negative
reinforcement - thinning the schedule of reinforcement. Simply put,
thinning is the process of gradually reducing the likelihood that
reinforcement will occur, given a behavior. It may seem unlikely that a
VR200 schedule will be able to support much behavior, and in some
conditions this is true, but there are other conditions where this ratio
(and even much "leaner" schedules) will sustain very high levels of
behavior. It is very unlikely that a new behavior will be acquired
using a VR200 schedule - it will "feel" like there is no reinforcer
for the
behavior (which is actually close to the truth). In general, frequent
reinforcement is necessary to shape the acquisition of a new behavior.
However, once the behavior has been shaped to a given rate and steadiness,
it can be maintained on a thinner schedule. You would not want to
go directly from a CRF schedule to a VR200 schedule (extinction is
more likely to result), but if you step the transition from one
schedule to another in gradual increments, it is possible to maintain
the behavior at
a high and steady rate without any deterioration, even on very thin/lean
schedules. It is even possible to change the type of schedule without any
change in the rate or steadiness of the behavior (e.g., from a VR schedule
to a VI schedule - when the behavior is occurring at a high and steady
rate, the subject will not be able to sense that the schedule has
changed).
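A thinning plan, then, is just a stepped sequence of schedules rather
than a jump (the particular steps below are invented for
illustration):

    # Hold at each step until the rate and steadiness of the behavior
    # have stabilized; only then move on. Skipping steps (e.g.,
    # straight from FR1/CRF to VR200) invites extinction.
    thinning_steps = [1, 2, 5, 10, 25, 50, 100, 200]   # FR1 (CRF) ... VR200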

Here again we have another dimension where positive reinforcement is
superior to negative reinforcement. Positive reinforcement can be thinned
to a much leaner schedule of reinforcement than negative reinforcement
before performance will begin to deteriorate. Clearly, positive
reinforcement is indeed a much more efficient force for influencing
behavior than negative reinforcement, and we have not lost the parallels
to the understanding of force in physics.

So now we return to the 4:1 ratio, to see if we can understand it from a
much broader conceptual context. Your question is actually a very
penetrating one, and, as you might infer from the length of this post (my
own verbosity aside), one that cannot be answered quickly. In my own
thinking, the 4:1 ratio is important for several reasons. First, it
defines the level of effort that is necessary to "turn the tide", or
change the paradigm through which the flow of behavior is shaped. As I
mentioned above, frequent reinforcement is required to encourage the
acquisition of a new behavior. This, paired with the fact that many of
the behaviors that are occurring in any given setting are moving in the
wrong (undesired) direction, and have some momentum of their own, begins
to explain why the seemingly incongruent ratio of 4:1 (positive
reinforcement:punishment) has some validity.

The 4:1 ratio is also a minimum ratio. The higher the positive
reinforcement:punishment ratio is in a company, the better. In this
sense, it is also a good heuristic for monitoring your work environment
over time. As new behaviors are established and require less frequent
reinforcement to be sustained, the occurrence of the use of punishment
should also decrease. If we find that we are reinforcing less, and
using punishment just as frequently as (or more frequently than) we
have in the past, it is a
good sign that there is yet more work to do and that there is a good
possibility that negative reinforcement is still the predominant means of
"motivating" behavior. In this situation, we can take the energy that we
were infusing into the change of the flow of behavior in one area (which
now has gained some momentum of its own) and focus it on encouraging other
behaviors. In either case, the 4:1 ratio is not an indication of the
relative efficiency of positive reinforcement in influencing behavior
versus punishment and negative reinforcement, so much as it is a
representation that changing the flow of behavior requires an intensive
effort initially (but, thankfully, less effort in the long run) and an
indication of the "reinforcement health" of an organization over time.
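As a monitoring heuristic, the ratio itself is trivial to track (a
toy sketch of mine; the counts would come from whatever observation
method you actually use):

    def reinforcement_health(reinforcers, punishers):
        """Treat 4:1 as a floor, not a target - higher is better."""
        if punishers == 0:
            return True
        return reinforcers / punishers >= 4

    print(reinforcement_health(12, 3))   # -> True: exactly meets the minimum
    print(reinforcement_health(8, 3))    # -> False: a sign there is
                                         #    more work to do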

To quote the great Forrest Gump, "And that's all I have to say about
that."

I hope you are doing well.

Jon Krispin

-- 

"Jon Krispin" <jkrispin@prestolitewire.com>
