This simulation illustrates the dynamic gating of information into PFC active maintenance by the basal ganglia (BG). It uses a simple Store-Ignore-Recall (SIR) task, where the BG system learns via phasic dopamine signals and trial-and-error exploration, discovering what needs to be stored, ignored, and recalled as a function of reinforcement of correct behavior, and learned reinforcement of useful working memory representations. The model is the current incarnation of our original PBWM framework (O'Reilly & Frank, 2006).
The SIR task requires the network to Recall (when the R unit is active) the letter (A-D) that was present when a Store (S) input was active earlier. Ignore (I) trials also have a letter input, but, as you might guess, these are to be ignored. Trials are randomly generated, and there can be a random number of Ignore trials between a Store and Recall trial, so the model must learn to maintain the stored information in robust working memory representations in the PFC, until the next Recall trial, with variable numbers of intervening and unpredictable distractors between task-relevant events.
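The trial structure described above can be sketched in a few lines of Python. This is only an illustrative generator, not the simulation's own environment code: the function name `generate_sir_trials`, the equal probabilities for the allowed controls, the absence of a letter on Recall trials, and the use of the current letter as the target on Store and Ignore trials are all assumptions made for the sketch.

```python
import random

LETTERS = ["A", "B", "C", "D"]

def generate_sir_trials(n_trials, seed=0):
    """Return a list of (control, letter, target) tuples for the SIR task.

    Hypothetical sketch: a Store is only allowed when nothing is waiting
    to be recalled, and a Recall is only allowed after a Store.
    """
    rng = random.Random(seed)
    trials = []
    stored = None  # letter waiting to be recalled, or None
    for _ in range(n_trials):
        if stored is None:
            control = rng.choice(["S", "I"])  # cannot recall before a store
        else:
            control = rng.choice(["I", "R"])  # cannot store again before a recall
        if control == "R":
            letter = None        # assumption: no letter shown on Recall trials
            target = stored      # correct answer is the stored letter
            stored = None
        else:
            letter = rng.choice(LETTERS)
            target = letter      # assumption: respond with the current letter
            if control == "S":
                stored = letter
        trials.append((control, letter, target))
    return trials

if __name__ == "__main__":
    for trial in generate_sir_trials(8, seed=3):
        print(trial)
```

Because each Ignore trial is followed by another Ignore or a Recall with equal probability in this sketch, the number of distractors between a Store and its Recall is random, which is what forces robust maintenance.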
You will notice that the network is configured with the input and output information at the top of the network instead of the usual convention of having the input at the bottom -- this is because all of the basal ganglia mechanisms associated with the gating system are located in an anatomically appropriate "subcortical" location below the cortical layers associated with the rest of the model.
The main processing of information in the model follows the usual path from Input to Hidden to Output. However, to make appropriate responses based on the information that came on earlier trials, the Hidden layer needs access to the information maintained in the PFC (prefrontal cortex) layer. The PFC will maintain information in an active state until it receives a gating signal from the basal ganglia gating system, at which point it will update to encode (and subsequently maintain) information from the current trial. In this simple model, the PFC acts just like a copy of the sensory input information, by virtue of having direct one-to-one projections from the Input layer. This makes it easy to see directly what the PFC is maintaining -- the model also functions well if the PFC representations are distributed and learned, as is required for more complex tasks. Although only one PFC "stripe" is theoretically needed for this specific task (but see the end of this documentation for a link to more challenging tasks), the system works much better by having a competition between multiple stripes, each of which attempts to learn a different gating strategy, searching the space of possible solutions in parallel instead of only serially -- hence, this model has four PFC maintenance stripes that each can encode the full set of inputs. Each such stripe corresponds to a hypercolumn in the PFC biology.
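The maintain-until-gated behavior of a single PFC stripe can be summarized with a toy update rule. The sketch below treats the stripe as holding a copy of one input symbol, which mirrors the one-to-one Input-to-PFC projections described above; the function name `run_stripe` and the boolean gating signal are assumptions for illustration and say nothing about the actual unit dynamics.

```python
def run_stripe(inputs, gates, initial=None):
    """Track what one PFC stripe maintains over a sequence of trials.

    inputs: the input symbol presented on each trial
    gates:  True on trials where the BG gating signal fires for this stripe
    Returns the maintained content after each trial.
    """
    state = initial
    history = []
    for current_input, gate_fired in zip(inputs, gates):
        if gate_fired:
            state = current_input  # update to encode the current trial
        # otherwise the stripe keeps maintaining its previous content
        history.append(state)
    return history

if __name__ == "__main__":
    # Gate on the first (Store) trial only; later inputs are ignored.
    print(run_stripe(["A", "C", "D", "B"], [True, False, False, False]))
```

With gating only on the first trial, the stripe still holds "A" after three distractor inputs, which is the behavior the network has to discover for Store and Ignore trials.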
Within each hypercolumn/stripe, we simulate the differential contributions of the superficial cortical layers (2 and 3) versus the deep layers (5 and 6) -- the superficial are labeled as PFCmnt and the deep as PFCmntD in the model. The superficial layers receive broad cortical inputs from sensory areas (i.e., Input in the model) and from the deep layers within their own hypercolumn, while the deep layers have more localized connectivity (just receiving from the corresponding superficial layers in the model). Furthermore, the deep layers participate in thalamocortical loops, and have other properties that enable them to more robustly maintain information through active firing over time. Therefore, these deep layers are the primary locus of robust active maintenance in the model, while the superficial layers reflect more of a balance between other (e.g., sensory) cortical inputs and the robust maintenance activation from the deep layers. The deep layers also ultimately project to subcortical outputs, and other cortical areas, so we drive the output of the model through these deep layers into the Hidden layer.
As discussed in the Executive Chapter, electrophysiological recordings of PFC neurons typically show three broad categories of neural responses (see Figure 10.3 in chapter 10, from Sommer & Wurtz, 2000): phasic responders to sensory inputs; sustained active maintenance; and phasic activity for motor responses or other kinds of cognitive action. The PFCmnt neurons can capture the first two categories -- it is possible to configure the PFCmnt units to have different temporal patterns of responses to inputs, including phasic, ramping, and sustained. However, the third category of neurons requires a separate BG-gating action to drive an appropriate (and appropriately timed) motor action, and thus we have a separate population of output gating stripes in the model, called PFCout (superficial) and PFCoutD (deep). It is these PFCoutD neurons that project to the posterior cortical Output layers of the model, and drive overt responding. For simplicity, we have configured a topographic one-to-one mapping between corresponding PFCmnt and PFCout stripes -- so the model must learn to gate the appropriate PFCout stripe that corresponds to the PFCmnt stripe containing the information relevant to driving the correct response.
In summary, correct performance of the task in this model requires BG gating of Store information into one of the PFCmnt stripes, and then not gating any further Ignore information into that same stripe, and finally appropriate gating of the corresponding PFCout stripe on the Recall trial. This sequence of gating actions must be learned strictly through trial-and-error exploration, shaped by a simple Rescorla-Wagner (RW) style dopamine-based reinforcement learning system located on the left-bottom area of the model (see Chapter 7 for details). The key point is that this system can learn the predicted reward value of cortical states and use errors in predictions to trigger dopamine bursts and dips that train striatal gating policies.
Matrix: this is the dynamic gating system representing the matrix units within the dorsal striatum of the basal ganglia. The bottom layer contains the "Go" (direct pathway) units, while the top layer contains the "NoGo" (indirect pathway) units. As in the earlier BG model, the Go units, expressing more D1 receptors, increase their weights from dopamine bursts, and decrease weights from dopamine dips, and vice-versa for the NoGo units with more D2 receptors. As is more consistent with the BG biology than earlier versions of this model, most of the competition to select the final gating action happens in the GPe and GPi (with the hyperdirect pathway to the subthalamic nucleus also playing a critical role, but not included in this more abstracted model), with only a relatively weak level of competition within the Matrix layers. Note that we have combined the maintenance and output gating stripes all in the same Matrix layer -- this allows these stripes to all compete with each other here, and more importantly in the subsequent GPi and GPe stripes -- this competitive interaction is critical for allowing the system to learn to properly coordinate maintenance when it is appropriate to update/store new information for maintenance vs. when it is important to select from currently stored representations via output gating.
GPeNoGo: provides a first round of competition between all the NoGo stripes, which critically prevents the model from driving NoGo to all of the stripes at once. Indeed, there is physiological and anatomical evidence for NoGo unit collateral inhibition onto other NoGo units. Without this NoGo-level competition, models frequently ended up in a state where all stripes were inhibited by NoGo, and when nothing happens, nothing can be learned, so the model essentially fails at that point!
GPiThal: has a strong competition for selecting which stripe gets to gate, based on projections from the MatrixGo units, and the NoGo influence from GPeNoGo, which can effectively veto a few of the possible stripes to prevent gating. As discussed in the BG model, here we have combined the functions of the GPi (or SNr) and the Thalamus into a single abstracted layer, which has the excitatory kinds of outputs that we would expect from the thalamus, but also implements the stripe-level competition mediated by the GPi/SNr. If there is more overall Go than NoGo activity, then the GPiThal unit gets activated, which then effectively establishes an excitatory loop through the corresponding deep layers of the PFC, with which the thalamus neurons are bidirectionally interconnected.
Rew, RWPred, SNc: The Rew layer represents the reward activation driven on the Recall trials based on whether the model gets the problem correct or not, with either a 0 (error, no reward) or 1 (correct, reward) activation. RWPred is the prediction layer that learns based on dopamine signals to predict how much reward will be obtained on this trial. The SNc is the final dopamine unit activation, reflecting reward prediction errors. When outcomes are better (worse) than expected or states are predictive of reward (no reward), this unit will increase (decrease) activity. For convenience, tonic (baseline) states are represented here with zero values, so that phasic deviations above and below this value are observable as positive or negative activations. (In the real system negative activations are not possible, but negative prediction errors are observed as a pause in dopamine unit activity, such that firing rate drops from baseline tonic levels). Biologically the SNc actually projects dopamine to the dorsal striatum, while the VTA projects to the ventral striatum, but there is no functional difference in this level of model.
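The relationship between Rew, RWPred, and SNc can be illustrated with a textbook Rescorla-Wagner update. This is a scalar sketch under stated assumptions, not the layer code used in the simulation: the function names and the learning rate value are hypothetical, and it only covers the prediction error at the time of reward.

```python
def rw_step(pred, reward, lrate=0.2):
    """One Rescorla-Wagner step.

    pred:   current reward prediction (the role played by RWPred)
    reward: obtained reward, 0 or 1 (the role played by Rew)
    Returns (da, new_pred), where da is the reward prediction error
    (the role played by SNc, with tonic baseline represented as zero).
    """
    da = reward - pred
    new_pred = pred + lrate * da
    return da, new_pred

def simulate(rewards, lrate=0.2):
    """Run rw_step over a sequence of rewards, starting from pred = 0."""
    pred = 0.0
    das = []
    for r in rewards:
        da, pred = rw_step(pred, r, lrate)
        das.append(da)
    return das, pred

if __name__ == "__main__":
    das, pred = simulate([1.0] * 10)
    print([round(d, 3) for d in das], round(pred, 3))
```

As the prediction approaches the obtained reward, the dopamine signal shrinks toward zero; an unexpected omission of reward after learning then produces a negative value, corresponding to a dip below the tonic baseline.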
In this model, Matrix learning is driven exclusively by dopamine firing at the time of rewards (i.e., on Recall trials), and it uses a synaptic-tag-based trace mechanism to reinforce/punish all prior gating actions that led up to this dopaminergic outcome. Specifically, when a given Matrix unit fires for a gated action (we assume it receives the final gating output from the GPi / Thalamus either via thalamic or PFC projections -- this is critical for proper credit assignment in learning), we hypothesize that structural changes in the synapses that received concurrent excitatory input from cortex establish a synaptic tag. Extensive research has shown that these synaptic tags, based on actin fiber networks in the synapse, can persist for up to 90 minutes, and when a subsequent strong learning event occurs, the tagged synapses are also strongly potentiated (Redondo & Morris, 2011; Rudy, 2015; Bosch & Hayashi, 2012). This form of trace-based learning is very effective computationally, because it does not require any other mechanisms to enable learning about the reward implications of earlier gating events. (In earlier versions of the PBWM model, we relied on CS (conditioned stimulus) based phasic dopamine to reinforce gating, but this scheme requires that the PFC maintained activations function as a kind of internal CS signal, and that the amygdala learn to decode these PFC activation states to determine if a useful item had been gated into memory. Compared to the trace-based mechanism, this CS-dopamine approach is much more complex and error-prone. Nevertheless, there is nothing in the current model that prevents it from also contributing to learning. However, in the present version of the model, we have not focused on getting this CS-based dopamine signal working properly -- there are a couple of critical issues that we are addressing in newer versions of the PVLV model that should allow it to function better.)
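A minimal sketch of the trace idea may help make the credit assignment concrete. The class below is hypothetical and heavily simplified: the class name, the initial weight of 0.5, the learning rate, tagging by sender activity alone, and clearing the trace after each dopamine outcome are all assumptions, and no weight bounding is applied. It only shows how tags laid down at gating time are converted into weight changes when dopamine arrives later, with opposite signs for Go (D1) and NoGo (D2) units.

```python
class MatrixUnit:
    """Toy Matrix unit with a synaptic-tag trace (illustrative only)."""

    def __init__(self, n_inputs, is_go, lrate=0.1):
        self.w = [0.5] * n_inputs      # synaptic weights (assumed start value)
        self.trace = [0.0] * n_inputs  # synaptic tags since the last outcome
        self.is_go = is_go             # True for Go (D1), False for NoGo (D2)
        self.lrate = lrate

    def tag(self, sender_acts):
        """Call when this unit took part in a gating action: tag active inputs."""
        for i, act in enumerate(sender_acts):
            self.trace[i] += act

    def learn(self, da):
        """Apply the dopamine outcome to all tagged synapses, then clear tags."""
        sign = 1.0 if self.is_go else -1.0  # NoGo units learn with opposite sign
        for i in range(len(self.w)):
            self.w[i] += self.lrate * sign * da * self.trace[i]
            self.trace[i] = 0.0

if __name__ == "__main__":
    go = MatrixUnit(3, is_go=True)
    go.tag([1.0, 0.0, 0.0])   # gated while the first input (e.g., S) was active
    go.learn(da=1.0)          # later dopamine burst on the Recall trial
    print(go.w)
```

A burst strengthens the tagged Go synapse so the same gating action becomes more likely in that context, while the same burst would weaken a tagged NoGo synapse; a dip does the reverse.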
To explore the model's connectivity, click on r.Wt and on various units within the layers of the network.
SIR Task Learning
Now, let's step through some trials to see how the task works.
- Switch back to viewing activations, then click Step Trial in the toolbar.
The task commands (Store, Ignore, Recall) are chosen completely at random (subject to the constraint that you can't store until after a recall, and you can't recall until after a store) so you could get either an ignore or a store input. You should see either the S or I task control input, plus one of the stimuli (A-D) chosen at random. The target output response should also be active, as we're looking at the plus phase information (stepping by trials).
Notice that if the corresponding GPiThal unit is active, the PFC stripe will have just been updated to maintain this current input information.
You should now see a new input pattern. The GPiThal gating signal triggers the associated PFC stripe to update its representations to reflect this new input. But if the GPiThal unit is not active (due to more overall NoGo activity), PFC will maintain its previously stored information. Often one stripe will update while another does not; the model has to learn how to manage its updating so that it can translate the PFC representations into appropriate responses during recall trials.
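The Go versus NoGo comparison that decides whether a stripe's GPiThal unit becomes active can be sketched as follows. This is a deliberately reduced illustration: the function name `gpi_thal_gates` is hypothetical, activity is simply summed per stripe, and the competition between stripes that the real GPiThal layer implements is left out.

```python
def gpi_thal_gates(go_acts, nogo_acts):
    """Decide, per stripe, whether the gating unit becomes active.

    go_acts / nogo_acts: one list of unit activations per stripe.
    A stripe gates only if its overall Go activity exceeds its overall
    NoGo activity. Competition between stripes is NOT modeled here.
    """
    gates = []
    for go, nogo in zip(go_acts, nogo_acts):
        gates.append(sum(go) > sum(nogo))
    return gates

if __name__ == "__main__":
    go = [[0.8, 0.1], [0.2, 0.1]]
    nogo = [[0.1, 0.1], [0.6, 0.3]]
    print(gpi_thal_gates(go, nogo))  # first stripe gates, second maintains
```

In this example the first stripe would update to the current input while the second keeps whatever it was already maintaining.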
- Keep hitting Step Trial and noticing the pattern of updating and maintenance of information in PFCmnt, and output gating in PFCout, and how this is driven by the activation of the GPiThal unit (which in turn is driven by the Matrix Go vs. NoGo units, which in turn are being modulated by dopamine from the SNc to learn how to better control maintenance in the PFC!).
When you see an R (Recall) trial, look at the SNc (dopamine) unit in the bottom layer. If the network is somehow able to correctly recall (or guess!), then this unit will have a positive (yellow) activation, indicating better-than-expected performance. Most likely, it instead will be teal blue and inverted, indicating a negative dopamine signal from worse-than-expected performance (producing the wrong response). This is the reinforcement training signal that controls the learning of the Matrix units, so that they can learn when information in PFC is predictive of reward (in which case that information should be updated in future trials), or whether having some information in PFC is not rewarding (in which case that information should not be updated and stored in future trials). It is the same learning mechanism that has been extensively investigated (and validated empirically) as a fundamental rule for learning to select actions in corticostriatal circuits, applied here to working memory.
- You can continue to Step Trial and observe the dynamics of the network. When your mind is sufficiently boggled by the complexity of this model, go ahead and hit Step Run, and switch to the training plot.
You will see three different values being plotted as the network learns:
PctErr (dark green line): shows the overall percent of errors per epoch (one epoch is 100 trials in this case), which quickly drops as the network learns.
AbsDA (lighter green line): shows dopamine for Recall trials (when the network's recall performance is directly rewarded or punished). As you can see, this value starts high and decreases as the network learns, because DA reflects the difference from expectation, and the system quickly adapts its expectations based on how it is actually doing. The main signals to notice here are when the network suddenly starts doing better than on the previous epoch (PctErr drops) -- this should be associated with a peak in DA, whereas a sudden increase in errors (worse performance) results in a dip in DA. As noted above, these DA signals are training up the Matrix gating actions since the last Recall trial.
RewPred (blue line): plots the RWPred Rescorla-Wagner reward prediction activity, which cancels out the rewards in the Rew layer, causing DA to decrease. As the model does better, this line goes up, reflecting increased expectations of reward.
The network can take roughly 5-50 epochs to train (it will stop when PctErr gets to 0 errors 5 times in a row).
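The stopping rule can be written out explicitly. The helper below is a hypothetical sketch of such a criterion (the function name and arguments are assumptions, not the simulation's API): it scans a per-epoch error series and reports the epoch at which the required run of zero-error epochs is completed.

```python
def epochs_until_criterion(pct_err_per_epoch, n_zero_needed=5):
    """Return the 1-based epoch at which training would stop, or None.

    Training stops once PctErr has been 0 for n_zero_needed epochs in a row.
    """
    run = 0  # length of the current streak of zero-error epochs
    for epoch, err in enumerate(pct_err_per_epoch, start=1):
        run = run + 1 if err == 0 else 0
        if run >= n_zero_needed:
            return epoch
    return None

if __name__ == "__main__":
    errs = [10, 0, 0, 3, 0, 0, 0, 0, 0, 0]
    print(epochs_until_criterion(errs))
```

Note that a single error resets the streak, so an isolated zero-error epoch early in training does not end the run.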
- Once it has trained to this criterion, you can switch back to viewing the network, and Step Trial through trials to see that it is indeed performing correctly. You can also do a Test All and look at the TstTrlPlot and click on the TstTrlLog to see a record of a set of test trials. Pay particular attention to the GPiThal activation and what the PFC is maintaining and outputting as a result -- you should see Go firing on Store trials for one of the stripes, and NoGo on Ignore trials for that same stripe. The other PFCmnt stripes may gate for Ignore trials -- they can afford to do so given the capacity of this network relative to the number of items that needs to be stored -- but typically the model will not do output gating in PFCout for these.
Question 10.7: Report the patterns of dopamine (DA) firing in relation to the PctErr performance of the model, and explain how this makes sense in terms of how the network learns.
Now we will explore how the Matrix gating is driven in terms of learned synaptic weights. Note that we have split out the SIR control inputs into a separate CtrlInput layer that projects to the Matrix layers -- this control information is all that the Matrix layer requires. It can also learn with the irrelevant A-D inputs, but just takes a bit longer.
- Click on s.Wt in the NetView tab, and then click on the individual SIR units in the CtrlInput layer to show the learned sending weights from these units to the Matrix layers.
Question 10.8: Explain how these weights from the S, I, R inputs to the various Matrix stripes make sense in terms of how the network actually solved the task, including where the Store information was maintained, where it was output, and why the Ignore trials did not disturb the stored information.
Note that for this simple task, the number of items that needs to be maintained at any one time is just one, which is why the network still gates Ignore trials (it just learns not to output gate them). If you're feeling curious you can use the Wizard in the software to change the number of PBWM stripes to 1, and there you should see that the model can still learn this task but is now pressured to do so by ignoring I trials at the level of input gating. However, by taking away the parallel learning abilities of the model, it can take longer to learn.
If you want to experience the full power of the PBWM learning framework, you can check out the sir52_v50 model, which takes the SIR task to the next level with two independent streams of maintained information. Here, the network has to store and maintain multiple items and selectively recall each of them depending on other cues, which is a very demanding task that networks without selective gating capabilities cannot achieve. This version more strongly stresses the selective maintenance gating aspect of the model (and indeed this problem motivated the need for a BG in the first place).
Bosch, M., & Hayashi, Y. (2012). Structural plasticity of dendritic spines. Current Opinion in Neurobiology, 22(3), 383–388. https://doi.org/10.1016/j.conb.2011.09.002
O'Reilly, R. C., & Frank, M. J. (2006). Making working memory work: A computational model of learning in the frontal cortex and basal ganglia. Neural Computation, 18, 283–328.
Redondo, R. L., & Morris, R. G. M. (2011). Making memories last: The synaptic tagging and capture hypothesis. Nature Reviews Neuroscience, 12(1), 17–30. https://doi.org/10.1038/nrn2963
Rudy, J. W. (2015). Variation in the persistence of memory: An interplay between actin dynamics and AMPA receptors. Brain Research, 1621, 29–37. https://doi.org/10.1016/j.brainres.2014.12.009
Sommer, M. A., & Wurtz, R. H. (2000). Composition and topographic organization of signals sent from the frontal eye field to the superior colliculus. Journal of Neurophysiology, 83(4), 1979–2001.