Dopamine and Reward Prediction: What your brain looks like on Rickroll

Today Sci is going to blog a paper that she has been meaning to blog for a long time. It’s one of those papers that people who do certain kinds of science snuggle with when they go to sleep at night.
[snuggle.jpg: Sci and this paper]
But the real reason that Sci loves this paper is that it’s the neurobiological equivalent of a Rickroll.
[rickastley.jpg]
And the question behind this paper is: what is the mechanism behind reward prediction?
Schultz, Dayan, and Montague. “A neural substrate of prediction and reward.” Science, 1997.


Now at this point you might be asking yourself: what the heck is reward prediction and why does anyone care about it? Reward prediction is in fact an extremely important thing in any organism’s life. If you can’t predict where and when you’re going to get food, shelter, or sex in response to specific stimuli, you’re going to be a very hungry, chilly, and undersexed organism. The ability to predict a reward is especially useful because it allows you to gauge your behavioral reactions accordingly. For example, Sci’s reaction to being told she’s going to get a Hershey’s kiss is going to be markedly different from her response to being told she gets an entire Snickers bar.
Rewarding stimuli in particular elicit very specific behaviors. For example, if you’re a rat, a signal for food is going to trigger “approach” behavior, in which the rat is going to head over and get himself a sandwich.
[templeton_l.jpg]
But his experience with rewarding objects is going to be different depending on what he EXPECTS. If, for example, he’s been given to expect a medium sized reward, like half a sandwich, and gets a WHOLE sandwich, he’s going to react more strongly. If he thinks he’s going to get a half sandwich and all he gets is half a tomato, well, that’s just disappointing. These responses have been established through many long years of conditioning experiments, a la Pavlov’s dogs.
Basically (for those who haven’t heard of Pavlov and his pooches), Pavlov gave the dogs meat powder, and every time he did, he rang a bell. Obviously, when dogs taste meat powder, they salivate, and start drooling all over the place as only dogs can. By pairing the meat powder with the bell, pretty soon, every time he just rang the BELL, the dogs started salivating whether or not the meat powder was there, because they had come to expect it.
This phenomenon is called classical conditioning. The meat powder is the unconditioned stimulus. The bell is the conditioned stimulus. The dogs’ initial salivation to the meat powder is the unconditioned response. And when the dogs learn to associate the unconditioned stimulus of the meat with the conditioned stimulus of the bell, their salivation to the bell alone becomes the conditioned response.
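For readers who like their Pavlov with equations: the standard bookkeeping for this kind of learning is the Rescorla-Wagner update, where the bell’s associative strength climbs toward the maximum the meat powder supports, fastest when the pairing is most surprising. Here’s a minimal sketch (the learning rate and trial count are made-up illustration values, not from the paper):

```python
# Toy Rescorla-Wagner-style sketch of Pavlovian conditioning.
# The bell's associative strength grows with each bell + meat-powder
# pairing, in proportion to the remaining surprise.
ALPHA = 0.3    # learning rate (hypothetical)
LAMBDA = 1.0   # maximum associative strength the meat powder supports

v_bell = 0.0   # how strongly the bell predicts meat (drives salivation)
history = []
for pairing in range(10):
    v_bell += ALPHA * (LAMBDA - v_bell)  # big jumps early, smaller later
    history.append(round(v_bell, 3))
```

After a handful of pairings, `v_bell` has climbed most of the way to its ceiling, which is why the dogs drool to the bell alone.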
For a long time now, scientists have known that the neurotransmitter dopamine is involved in the rewarding aspects of things, including food and drugs. Right now, it is thought that dopamine neuron firing helps to process and construct information about possible rewarding events. But it was this paper that showed, for the first time, that dopamine neurons were really involved in the PREDICTION of reward. And here’s what they did:
First, they took a bunch of monkeys (it has also been done in rats) and implanted electrodes to record neurons in the Ventral Tegmental Area of the brain, an area that contains lots of dopamine neurons. With these electrodes, they could watch the neurons fire. In this case, they gave the monkeys an unexpected reward, fruit juice.
[reward prediction1.png]
See that spike above the “R”? That spike is a burst of dopamine neuron activity when the monkeys got unexpected fruit juice. The neurological equivalent of “w00t!”.
They then trained the monkeys on a conditioned stimulus paradigm. Basically, they paired a tone or light with a dose of fruit juice. This meant that, when the monkey was done learning, it knew that when it got the tone or light, fruit juice was forthcoming. And the neurons in the monkeys’ brains SHOWED the result of the learning. It looked like this:
[reward prediction2.png]
This is a condition where the monkey was given the tone (or light), and got the reward it expected. You can see that here, the spike in dopamine neuron activity has shifted, this time corresponding to the tone (woohoo! juice is on the way!) rather than to the reward itself.
But then, what happens when the conditioned stimulus of the tone or light is given, and no juice arrives?
[reward prediction3.png]
OOOOH. BUUUUUURN.
The spike to the cue is there; the monkey is waiting for juice. But no juice arrives. And when the expected reward fails to appear, instead of normal firing there’s a DECREASE in dopamine neuron firing (the circled portion).
That monkey’s been Rickrolled.

(Does that hurt you like it hurts me?)
But this Rickroll is cool. This was the first time anyone had shown that learning a conditioned stimulus (in this case, the light or tone paired with the juice) actually shifted dopamine neuron activity to the stimulus, rather than the reward. It also showed that the TIMING of the reward was encoded: that monkey knew WHEN the reward was expected, and knew when it had been had. Dopamine neurons don’t just encode reward; they encode the expectation of reward, and respond to whether or not that expectation is met, with a spike if the reward is better than expected and a dip if it’s worse.
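This three-part pattern — spike to an unexpected reward, spike migrating to the cue after training, dip when a predicted reward is omitted — is exactly what a temporal-difference (TD) prediction error produces, which is the modeling framework the 1997 paper draws on. Here’s a minimal tabular TD(0) sketch; the timings, learning rate, and trial count are invented for illustration, and this is a toy reconstruction, not the authors’ actual model code:

```python
import numpy as np

T = 15                 # time steps per trial; t = 0 is a pre-cue baseline
CUE, REWARD_T = 1, 10  # cue onset and juice delivery (hypothetical timings)
ALPHA = 0.1            # learning rate

V = np.zeros(T)        # V[t]: learned prediction of upcoming reward at step t

def trial(V, reward=1.0, learn=True):
    """Run one trial; return the TD error at each step (gamma = 1).
    delta[t] = r[t] + V[t] - V[t-1] is the model's stand-in for phasic
    dopamine firing: positive = better than expected, negative = worse."""
    r = np.zeros(T)
    r[REWARD_T] = reward
    delta = np.zeros(T)
    for t in range(1, T):
        delta[t] = r[t] + V[t] - V[t - 1]
        if learn and t > CUE:   # the unpredictable pre-cue baseline stays at 0
            V[t - 1] += ALPHA * delta[t]
    return delta

naive = trial(V.copy(), learn=False)       # untrained: spike at the juice only
for _ in range(1000):                      # Pavlovian training: cue -> juice
    trial(V)
trained = trial(V.copy(), learn=False)     # the spike has moved to the cue
omitted = trial(V.copy(), reward=0.0, learn=False)  # cue but no juice: a dip
```

Compare the three arrays to the three figures: `naive` spikes only at the juice, `trained` spikes at the cue and stays flat at the (fully predicted) juice, and `omitted` keeps the cue spike but dips negative right when the juice should have arrived.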
This paper is pretty old in scientific terms (1997! Come on, now), but it remains the foundation for much of today’s reward expectation research. And why not? A neurobiological Rickroll is the stuff of which great science is made!
Schultz, W., Dayan, P., & Montague, P. R. (1997). A neural substrate of prediction and reward. Science, 275(5306), 1593-1599. DOI: 10.1126/science.275.5306.1593

9 Responses

  1. I’d be interested to see a mathematical model of how strong the various reward/lack-of-reward effects are, and how they play out with random rewards. I’d be willing to bet that the specifics of how this mechanism works could be used to explain basic superstition (the attribution of specific positive effects to uncorrelated events).

  2. I’ve written about this before, too.

  3. I wonder if this tells us anything new about ADHD, because I’m an egoist. We know it’s characterised by low dopamine levels, and it manifests in failure to regulate executive function.

  4. Meat powder! That’s what I’m missing from my herb & spice rack!

  5. I’ve heard it framed that the firing activity in studies like this reflects “surprise,” such that the bigger the difference between what’s expected and what actually happens, the bigger the response (increased or decreased firing). So in the first panel the reward is a surprise, because nothing predicts it; in the second, the CS is a surprise; and in the third, the lack of reward is a surprise (but the bad kind!)

  6. How does this play into the fact that intermittent reinforcement is the best way to set a behavior?

  7. Dr. Becca, I would probably rephrase your “surprise” as “positive contrast” and “negative contrast” (there may be lots of other kinds of surprise). Some time ago I studied “hyperdopaminergic animals,” based on microdialysis studies. When you looked at their responses to sucrose, they weren’t really different until you did contrast studies: both positive and negative contrast were enhanced. The whole thing (e.g., all of Schultz’s work) really begs the question of what dopamine is doing in the first place. I still think that Schultz’s description of dopamine as a “salience” system has something to it, which includes expected and omitted rewards, but a bunch of other things too, especially before you grow to have expectations in the first place.

  8. Maybe, when this pattern of reward expectation with no reward to follow occurs early in life and is reinforced by lowered expectations over time, it changes the way we problem-solvers focus on building solutions: we become process- rather than goal-oriented, receiving our jollies from untangling the knots and avoiding dopamine shortage by linking reward expectation to internal rather than external actions.

  9. Are there any studies showing how long it takes to extinguish the expectation/disappointment pattern once the reward has been stopped? How does this extinction time compare to the time it took to establish the pattern?