Introduction
Learning the consequences of actions is an adaptive behavior in humans and animals, which allows them to control their environment in a goal-directed manner. Instrumental learning (operant learning) is the ability to learn from consequences and optimize actions by acquiring action–outcome (A–O) associations, which requires both reward processing and causal integration (Dowd & Barch, 2012; Maia, 2009; Miyata, 2019). Reward processing and reinforcement learning impairments are key characteristics of schizophrenia (SZ) and other psychotic disorders and are strongly associated with functional and clinical outcomes (Waltz et al., 2018). In fact, the literature examining reward processing in psychosis has largely relied on instrumental learning tasks, in which participants must make responses first, and rewards only occur after correct and/or rapid response execution (Bouton, Maren, & McNally, 2021; Dowd et al., 2016). In these tasks, the ability to anticipate a reward depends upon the ability to earn the reward by responding appropriately. This requires not only reward processing but also causal integration, either of which may be impaired in individuals with psychosis (Dowd et al., 2016; Waltz et al., 2018). These deficits have been shown across a wide variety of tasks and have been associated with negative symptoms, such as anhedonia and avolition (Dowd & Barch, 2012; Morris et al., 2012).
Numerous studies to date have reported an association between A–O learning deficits and the severity of negative symptoms, suggesting, to some extent, that changes in adaptive response to environmental stimuli may play an important part in the onset of SZ. According to ideomotor theories of action control, the anticipation of action goals emerges from the acquisition of bidirectional A–O associations (Hommel, 2009; Shin, Proctor, & Capaldi, 2010); once this capacity is impaired, individuals are unable to generate enough motivation to sustain a behavior pattern for a desired goal, which is associated with amotivation symptoms in SZ (Watson et al., 2015).
The fact that individuals with psychosis exhibit dysfunctional reinforcement processing mechanisms, which may contribute to negative symptoms, is further supported by evidence that the mesolimbic dopamine system, which is known to be important in psychotic symptoms and modulates instrumental learning, is disrupted in psychosis (Waltz et al., 2009). As such, investigating the neural representations of instrumental learning will contribute to understanding the relationship between the neurobiology of psychosis and its subjective experience and behavioral performance, as well as informing illness diagnosis (Murray et al., 2008). Several studies have attempted to examine the neural basis for A–O learning abnormalities in psychosis and have identified the meso-cortico-limbic circuit, including the ventral (VS) and dorsal striatum (DS), amygdala, insula, dorsolateral prefrontal cortex (DLPFC), orbital prefrontal cortex (OFC), medial prefrontal cortex (mPFC), and anterior cingulate cortex (ACC), as key brain regions involved (Gradin et al., 2013; Juckel et al., 2006a; Liu et al., 2023; Romaniuk et al., 2010; Waltz et al., 2009). These BOLD (blood oxygen level dependent) signal activations in the meso-cortico-limbic reward circuit are thought to communicate information about reward contingencies in the environment that guide action selection and learning. However, the results in individuals with psychosis are still inconsistent.
Although most studies have reported reduced VS activity in individuals with psychosis when encoding action value (Gradin et al., 2011; Schlagenhauf et al., 2014; Waltz et al., 2018), some studies have reported no differences between psychosis and healthy control (HC) groups (Culbreth et al., 2016b; Dowd et al., 2016; Waltz et al., 2013). In terms of the prefrontal cortex (PFC), numerous imaging studies have demonstrated altered cortical activity in the mPFC and OFC (Hernaus et al., 2018; Waltz et al., 2018), which are implicated in both valuation and converting value to actions (Balleine & O’Doherty, 2010; Tanaka, Balleine, & O’Doherty, 2008), while other studies have shown intact activity in either the OFC or mPFC during these processes (Culbreth, Gold, Cools, & Barch, 2016a; Koch et al., 2010; Morris et al., 2015).
Furthermore, the neural signals of the expected value of the outcome are significantly correlated with negative symptoms (Katthagen et al., 2020; Waltz et al., 2009); however, there is also evidence suggesting correlations with the severity of psychiatric symptoms (Gradin et al., 2011). Additionally, some studies have shown that antipsychotic drugs contribute to the recovery of the activation of different brain regions within the meso-cortico-limbic circuit. The heterogeneity of the included samples and RL (reinforcement learning) paradigms, combined with the bias introduced by including region of interest analyses, may explain the inconsistencies across studies.
To address this issue, we performed a voxel-based meta-analysis to investigate the neural representations elicited by instrumental learning in individuals with psychosis compared to HCs. Our focus on the neural representations of aberrant behavior–outcome associations distinguishes this work from several previous meta-analyses. One meta-analysis focused on prediction errors (PEs) in patients with SZ and patients with major depressive disorder (Yaple, Tolomeo, & Yu, 2021). Another meta-analysis compared reward anticipation signals in individuals with schizophrenia using several paradigms, such as the monetary incentive delay (MID) task and the instrumental reward learning task (Leroy et al., 2020). Our recent meta-analysis focused on reward processing in individuals with schizophrenia during the anticipation and outcome stages using the MID task (Zeng et al., 2022). To minimize the heterogeneity of functional imaging paradigms, we included only studies that employed the instrumental learning task, in which subjects acquire A–O associations via their own selections and subsequent feedback (see Description of the reinforcement learning task in the Supplementary Materials). Specifically, in the instrumental learning paradigm, participants first earn rewards through correct actions and then repeat the same actions in subsequent trials when the situation is similar. If a stimulus precedes the individual's response, this learning is called discriminated operant; if there is no stimulus, it is called free operant (Bouton et al., 2021) (see Supplementary Table S1). From the literature reviewed above, we expect that individuals with psychosis will show blunted activation within meso-cortico-limbic pathways during instrumental learning.
Moreover, we expect that abnormal neural activations during instrumental learning would be closely related to the severity of symptoms and clinical variables.
Methods
See the Methods in the Supplementary Materials.
Results
Brain activation differences between individuals with psychosis and HCs during instrumental learning
Included studies and sample characteristics
For the instrumental learning task, a total of 18 studies involving 456 individuals with psychosis and 454 HCs met the meta-analysis inclusion criteria (Figure 1). The mean age of individuals with psychosis (35.83 years) and HCs (34.93 years) was not significantly different (t = 0.4645, p = 0.6453). There was no significant difference (χ2 = 32.0000, p = 0.232) in the percentage of males between individuals with psychosis (69.57% male) and controls (63.77% male) (Table 1).

Figure 1. Flow diagram of the inclusion and exclusion process of selected articles. Of 667 articles initially identified, a total of 18 studies were included in the meta-analysis. Notes: fMRI, ‘functional magnetic resonance imaging’; ROI, ‘regions of interest’; VOI, ‘volume of interest’.
Table 1. Demographic and clinical characteristics of the studies included in the meta-analysis

Notes: ICD-10, International Classification of Diseases, 10th Edition; DSM-IV, Diagnostic and Statistical Manual of Mental Disorders, 4th Edition; SZ, schizophrenia; FEP, first episode psychosis; HC, healthy control; T, typical antipsychotic drugs; A, atypical antipsychotic drugs; Olanzapine Equivalents are calculated according to the DDD (defined daily doses) method; PRL, probabilistic reversal learning task; PIL, probabilistic instrumental learning task; Y, Yes; N, No; NA, not available. The IQ scores were assessed using the (a) Wechsler Test of Adult Reading (WTAR), (b) Culture Fair matrices test, (c) National Adult Reading Test, (d) Wechsler Abbreviated Scale of Intelligence-II (WASI), (e) Wortschatztest (WST), and (f) verbal IQ test.
Included paradigms and behavioral indicators
As illustrated in Supplementary Table S2, after checking all the studies, we found that the experimental paradigms used in the included studies can be summarized into the following three types: the probabilistic instrumental learning (PIL) task (in 12 studies), the probabilistic reversal learning (PRL) task (in 5 studies), and the probabilistic trial-and-error task (in 1 study).
Additionally, we have summarized some behavioral indicators related to instrumental learning deficits in psychosis in Supplementary Table S2. Among them, ‘correct choice’ (available in 6 datasets) refers to the percentage of trials in which participants choose the commonly rewarded stimuli in the PIL and PRL paradigms. ‘Total reward’ (available in 3 datasets) indicates the total amount of money that participants earned through correct responses during the whole task. The ‘learning rate’ (available in 6 datasets) is an important parameter in RL models that captures how strongly participants learn from PEs, that is, how much each PE changes the updated expected value. In addition, ‘win-stay’ and ‘lose-shift’ (available in 5 datasets) are used in the PIL and PRL paradigms, respectively, and refer to the percentage of trials in which participants reselected the stimulus that was rewarded on the previous trial or avoided the stimulus that was unrewarded on the previous trial. ‘Reversals’ (available in 4 datasets) are only used in the PRL task and refer to the number of reward contingency reversals during the whole task.
Summarizing the behavioral indicators we extracted, we found that, compared to HCs, individuals with psychosis required more trials to learn the reward contingencies, achieved fewer reversals, showed less responsivity to feedback (win-stay and lose-shift probability), had fewer correct choices and total rewards, and showed attenuated learning rates. In brief, individuals with psychosis showed reduced task performance compared with HCs, which may reflect a cognitive decline when instrumental associations are built and value representations are created.
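The behavioral indicators described in this subsection can be made concrete with a minimal sketch. It assumes a two-option task with one choice and one binary reward per trial; the function names, the input format, and the simple Rescorla-Wagner update are illustrative assumptions and are not taken from the included studies, whose exact definitions and RL models vary.

```python
from __future__ import annotations

from typing import Sequence


def win_stay_lose_shift(choices: Sequence[int], rewards: Sequence[int]) -> tuple[float, float]:
    """Return (win-stay, lose-shift) proportions for one participant.

    Hypothetical helper: choices[t] is the option chosen on trial t and
    rewards[t] is 1 if that trial was rewarded, 0 otherwise.
    """
    if len(choices) != len(rewards):
        raise ValueError("choices and rewards must have the same length")
    wins = stays_after_win = losses = shifts_after_loss = 0
    for t in range(1, len(choices)):
        repeated = choices[t] == choices[t - 1]
        if rewards[t - 1]:
            wins += 1
            stays_after_win += int(repeated)
        else:
            losses += 1
            shifts_after_loss += int(not repeated)
    win_stay = stays_after_win / wins if wins else float("nan")
    lose_shift = shifts_after_loss / losses if losses else float("nan")
    return win_stay, lose_shift


def rescorla_wagner(rewards: Sequence[float], alpha: float, v0: float = 0.0) -> list[float]:
    """Track the expected value of one option under a Rescorla-Wagner rule.

    alpha is the learning rate: the fraction of each prediction error (PE)
    that is added to the expected value.
    """
    values = [v0]
    for r in rewards:
        pe = r - values[-1]  # prediction error: outcome minus expectation
        values.append(values[-1] + alpha * pe)
    return values


if __name__ == "__main__":
    print(win_stay_lose_shift([0, 0, 1, 1, 0], [1, 0, 1, 1, 0]))
    print(rescorla_wagner([1, 1, 0], alpha=0.5))
```

In this sketch, a smaller `alpha` means that each PE moves the expected value less, which is one way an attenuated learning rate would slow the acquisition of reward contingencies.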
Main meta-analysis
According to the instrumental learning meta-analysis, the psychosis group presented increased activity in the left middle occipital gyrus (MOG), insula, lingual gyrus (ING), and postcentral gyrus (PoCG) and decreased activity in the cortico-striato-thalamo-cortical (CSTC) circuit, including the right DS, insula, thalamus, middle and posterior cingulate cortices (MCC and PCC), dorsolateral prefrontal cortex (DLPFC), OFC, left cerebellum, mPFC, and associated sensory areas (inferior and middle temporal gyrus and inferior [IFG] and superior parietal gyrus [SPG]) (Table 2 and Figure 2).
Table 2. Results of the meta-analyses for brain activation difference between individuals with psychosis and HCs during instrumental learning

Notes: Results were thresholded at p = 0.005, with a peak height threshold of 1 and an extent threshold of 10. BA, Brodmann area; MOG, middle occipital gyrus; ING, lingual gyrus; PoCG, postcentral gyrus; SPG, superior parietal gyrus; PCC, posterior cingulate cortex; MCC, middle cingulate cortex; IFG, inferior frontal gyrus; MFG, middle frontal gyrus; mPFC, medial prefrontal cortex; ITG, inferior temporal gyrus; dlPFC, dorsolateral prefrontal cortex; OFC, orbital prefrontal cortex; SDM, signed differential mapping; MNI, Montreal Neurological Institute.

Figure 2. Instrumental learning-evoked activation differences between individuals with psychosis and HCs in the meta-analysis. Brain regions that showed significant differences in instrumental learning-related activation in individuals with psychosis relative to HCs. Red and blue indicate hyperactivity and hypoactivity, respectively, in individuals with psychosis compared to HCs, and the color scale represents probability values from statistical randomization testing (z values). During instrumental learning, the psychosis group showed hyperactivation in the middle occipital gyrus (MOG), insula, lingual gyrus, and postcentral gyrus, and hypoactivation in the CSTC circuit, including the dorsal striatum (DS), insula, thalamus, middle and posterior cingulate cortex (MCC/PCC), dorsolateral prefrontal cortex (DLPFC), orbital prefrontal cortex (OFC), cerebellum, medial prefrontal cortex (mPFC), and associated sensory areas (inferior and middle temporal gyrus, inferior and superior parietal gyrus). Notes: P, ‘individuals with psychosis’; HC, ‘healthy control’.
Sensitivity analysis
Whole-brain jackknife sensitivity analysis revealed that the results in the left MOG, left ING, right SPG, left cerebellum, right PCC, right IFG, right precentral gyrus, right middle frontal gyrus (MFG), right DLPFC, and right thalamus were highly replicable and preserved in all combinations of datasets. The increased activation in the left insula and PoCG remained significant in all but one and three combinations, respectively. The decreased activation in the right striatum, left mPFC, and right OFC remained significant in all but one combination (Supplementary Table S2).
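The leave-one-out logic behind a jackknife sensitivity analysis can be sketched for a single region as follows. This is a simplified illustration and not the SDM procedure used in the meta-analysis: it assumes one effect size and one variance per study, pools them with fixed-effect inverse-variance weights, and uses a hypothetical cutoff `z_crit` (2.576 corresponds to a one-sided p = 0.005 under a standard normal distribution). The function names and the example numbers are invented for illustration.

```python
import math
from typing import List, Sequence


def pooled_z(effects: Sequence[float], variances: Sequence[float]) -> float:
    """Fixed-effect inverse-variance pooled estimate divided by its standard error."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    standard_error = math.sqrt(1.0 / sum(weights))
    return pooled / standard_error


def jackknife_significance(
    effects: Sequence[float],
    variances: Sequence[float],
    z_crit: float = 2.576,  # assumed cutoff, not taken from the paper
) -> List[bool]:
    """Repeat the pooled test once per study, leaving that study out each time."""
    effects = list(effects)
    variances = list(variances)
    results = []
    for i in range(len(effects)):
        remaining_effects = effects[:i] + effects[i + 1:]
        remaining_variances = variances[:i] + variances[i + 1:]
        results.append(abs(pooled_z(remaining_effects, remaining_variances)) > z_crit)
    return results


if __name__ == "__main__":
    # Made-up effect sizes for four hypothetical studies.
    outcome = jackknife_significance([-0.5, -0.4, -0.6, 0.1], [0.04] * 4)
    print(outcome, "significant in", sum(outcome), "of", len(outcome), "combinations")
```

A finding that stays significant in every leave-one-out combination does not depend on any single study, whereas a finding that is lost in one or more combinations is more sensitive to the exact set of included datasets.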
Subgroup analysis
The subgroup analyses for the instrumental learning meta-analysis were repeated for studies that included individuals with chronic SZ only, those involving patients receiving medication, those using monetary stimuli, those including psychosis patients diagnosed by the DSM, those using a 3.0-T MR scanner, and those including only English-speaking individuals. The above results remained largely unchanged when the analyses were repeated, except for studies that included individuals with chronic SZ only, studies involving patients receiving medication, and studies that included individuals with SZ diagnosed by the DSM (Supplementary Table S2).
Meta-regression analysis
We used meta-regression analysis to explore the effects of mean age, duration of illness, percentage of male patients, symptom severity, dose equivalent, medication, and task performance variables. Notably, meta-regression analyses revealed that the percentage of first-generation antipsychotic (FGA) users was negatively associated with mPFC hypoactivation (Montreal Neurological Institute [MNI] coordinates: x = 6, y = 24, z = 60, r = −0.473, p = 0.035), and the percentage of medicated individuals was negatively associated with insula hyperactivation (MNI coordinates: x = −34, y = −8, z = 8, r = −0.480, p = 0.048) (Figure 3).

Figure 3. (A–B). Meta-regression analyses between clinical symptoms and brain activity during instrumental learning. (a) Scatter plot showing a significantly negative association between instrumental learning-evoked activity in the mPFC (MNI coordinates: x = 6, y = 24, z = 60, r = −0.473, p = 0.035) and the % (percentage) of FGA users (the proportion of individuals with psychosis who had ever received FGA). (b) Scatter plot showing a significantly negative association between instrumental learning-evoked activity in the insula (MNI coordinates: x = −34, y = −8, z = 8, r = −0.480, p = 0.048) and the % (percentage) of medicated individuals (the proportion of individuals with psychosis who had ever received medication). Notes: mPFC, ‘medial prefrontal cortex’; MNI, ‘Montreal Neurological Institute’.
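As a rough illustration of what a meta-regression estimates, the sketch below fits a weighted least-squares line relating a study-level moderator (for example, the percentage of FGA users) to a study-level effect size at one location. It is a simplified, fixed-effect sketch under stated assumptions and not the voxel-wise procedure used here; the function name, the inverse-variance weighting, and the example numbers are hypothetical.

```python
from typing import Sequence, Tuple


def weighted_meta_regression(
    moderator: Sequence[float],
    effects: Sequence[float],
    variances: Sequence[float],
) -> Tuple[float, float]:
    """Return (intercept, slope) of an inverse-variance weighted linear fit."""
    if not (len(moderator) == len(effects) == len(variances)):
        raise ValueError("all inputs must have the same length")
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    x_mean = sum(w * x for w, x in zip(weights, moderator)) / total
    y_mean = sum(w * y for w, y in zip(weights, effects)) / total
    s_xx = sum(w * (x - x_mean) ** 2 for w, x in zip(weights, moderator))
    if s_xx == 0:
        raise ValueError("moderator has no variance across studies")
    s_xy = sum(
        w * (x - x_mean) * (y - y_mean)
        for w, x, y in zip(weights, moderator, effects)
    )
    slope = s_xy / s_xx
    intercept = y_mean - slope * x_mean
    return intercept, slope


if __name__ == "__main__":
    # Made-up values: percentage of FGA users and effect size per study.
    pct_fga = [0.0, 20.0, 40.0, 60.0]
    effect = [0.1, -0.1, -0.3, -0.5]
    var = [0.04, 0.05, 0.04, 0.05]
    print(weighted_meta_regression(pct_fga, effect, var))
```

A negative slope in such a fit means that studies with a higher value of the moderator tend to report a lower effect size, which is the direction of the associations shown in Figure 3.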
Discussion
In the coordinate-based meta-analysis, we found a widely distributed reduced brain response during instrumental learning in the CSTC circuit, including the DS, insula, thalamus, MCC, PCC, cerebellum, mPFC, dorsolateral prefrontal cortex (DLPFC), OFC, and associated sensory areas, and higher activity in the left MOG, insula, ING, and PoCG. Moreover, mPFC hypoactivation was negatively associated with the percentage of first-generation antipsychotic (FGA) users, and insula hyperactivation was negatively associated with the percentage of medicated patients. These findings in the psychosis group confirmed reward processing abnormalities when the subjects performed the instrumental learning task, providing further evidence of impaired action–outcome (A–O) learning in psychosis. Additionally, summarizing the behavioral indicators we extracted, we found that, compared to HCs, the psychosis groups required more trials to learn the A–O contingencies and achieved fewer rewards in these tasks, which perhaps reflects a cognitive decline when building instrumental associations and creating value representations.
Our findings were consistent with the positive valence system within the Research Domain Criteria (RDoC) project, which is a framework for research on mental disorders that focuses on dimensions of behavioral and psychological functioning and their implementation in neural circuits (Cuthbert & Insel, 2013; Insel et al., 2010). In the RDoC, functions associated with processing reward-related information are fundamental drivers of motivation, learning, and goal-directed behavior and are classified as positive valence systems (Dexter et al., 2025). As is central to the RDoC framework, identifying the distinct mechanisms underlying instrumental learning task performance may provide a better understanding of the antecedents and processes that lead to different forms of psychopathology (Michelini et al., 2021). In addition, in the Hierarchical Taxonomy of Psychopathology (HiTOP), a system that proposes a hierarchical dimensional classification of mental health, reward learning is also associated with the distress and substance abuse subfactor and the thought disorder spectrum, which has been supported by a series of behavioral and neuroimaging experiments; thus, identifying the aberrant brain mechanisms of instrumental learning is also highly important for further understanding potential biobehavioral systems underlying psychopathology and ultimately informing future classifications (Kotov et al., 2021; Michelini et al., 2021; Ruggero et al., 2019).
In instrumental learning, reward and punishment exist as opposite behavioral outcomes, enabling animals to build associations between stimuli, action, and outcomes, providing information for future decisions, and adapting to the changing environment (Bouton et al., 2021; Dexter et al., 2025; Taylor, Pearlstein, & Stein, 2020). It is a biological instinct to seek benefits and avoid harm, so the action will be reinforced if it results in reward and suppressed if it results in punishment. However, a decline in the reward learning ability of individuals with schizophrenia has been confirmed in many previous studies and is correlated with reduced working memory, impaired executive function, and increased negative symptoms (Nestor et al., 2014; Woodberry, Giuliano, & Seidman, 2008). The disruption in reward learning makes it difficult for participants with psychosis to develop goal-directed behavior toward a specific outcome, ultimately leading to amotivation in clinical practice, which is also considered a core symptom of psychosis (Waltz et al., 2009). Taken together, these results suggest that impaired motivational processes, induced by reward learning deficits, may represent a common denominator uniting the neuropsychological and clinical manifestations of psychosis. Summarizing our findings from the perspective of reward processing, we also propose a network neurobiological model, including the prefrontal cortex and striatal dopamine circuit.
Within the prefrontal cortex circuit, when an action is planned in a given context, the medial prefrontal cortex first signals an outcome prediction, and then, after the action is executed, the prediction is updated by comparing it to the actual outcome to produce a discrepancy (PE) (Krawitz, Braver, Barch, & Brown, 2011). Current evidence also suggests a role of the thalamus in schizophrenia. For example, disrupted thalamo-cortical connections would be reflected in poor cognitive focus, e.g., impaired attention (Liu et al., 2023; Paul et al., 2024). In the striatal reward system, striatal dopaminergic activity signals unexpected rewards, and reciprocal feedback loops with frontal regions such as the DLPFC and OFC facilitate the formation of value representations (Robinson et al., 2012; Robinson, Frank, Sahakian, & Cools, 2010). Understanding the brain mechanisms of deficits in reward learning may provide insight into important motivational deficits that may be a future target for the treatment of SZ-spectrum disorders.
Compared with HCs, individuals with psychosis presented hyperactivity in the visual centers and somatosensory areas, including the MOG, ING, and PoCG. Consistent with our results, Tu and colleagues reported elevated functional connectivity in the postcentral, precentral, and lateral occipital gyri (Tu et al., 2019). In fact, abnormally increased activation related to A–O associations in sensory areas would imply a heightened salience of irrelevant stimuli and impair goal-directed behavior through associations with reinforcing events (Zeng et al., 2022). Hyperactivation in the left insula was also observed in our meta-analysis. As an important part of the salience network (SN), the insula is widely involved in marking salient stimuli for additional processing and increasing the incentive salience of irrelevant or inappropriate stimuli (Manoliu et al., 2014; Palaniyappan & Liddle, 2012; Singer, Critchley, & Preuschoff, 2009), which, in turn, leads to delusions and hallucinations (Kapur, 2003). Our findings suggest that overactivation of salience processing in psychosis may cause inappropriate associations or delusions.
We detected hypoactivation in the PFC, including the mPFC, DLPFC, and OFC, in the psychosis groups, revealing a central role of the PFC in A–O learning and motivated behavior in dynamic environments. Furthermore, the PFC has been divided into multiple subregions that play complementary roles in reward-based A–O learning (Brown & Bowman, 2002; Buckley et al., 2009; Gläscher et al., 2012; Luk & Wallis, 2013). Specifically, the OFC has been implicated in encoding outcome/state representations (Fellows, 2011; Morrison & Salzman, 2009) that change quickly after changes in reward contingencies (Fiuzat, Rhodes, & Murray, 2017), which contributes to the brain’s representation of a ‘cognitive map’ (Whyte et al., 2019). A cognitive map is analogous to a spatial map in that it organizes knowledge about the relationships between an action and a possible outcome in a particular state (Behrens et al., 2018; Niv, 2019; Wilson, Takahashi, Schoenbaum, & Niv, 2014), which plays an important role in guiding individual selection. Experiments across rodent and primate species also suggest that OFC lesions impair the ability to adapt choices based on updated valuations (Bradfield et al., 2015; Rhodes & Murray, 2013).
The mPFC is generally considered to be involved in processing and monitoring behaviors by predicting the outcomes of actions (Matsumoto, Matsumoto, Abe, & Tanaka, 2007; Rudebeck et al., 2008); afterward, it detects discrepancies between actual and predicted outcomes to generate PE and update the outcome anticipation appropriately (Alexander & Brown, 2011, 2014). Consistent with our findings, a series of studies revealed reduced activation in the mPFC in individuals with psychosis when processing PE (Jessup, Busemeyer, & Brown, 2010; Krawitz et al., 2011). Finally, the DLPFC is directly involved in the use of value information to guide action selection and the production of the ‘sense of agency’ (SoA) (Khalighinejad, Di Costa, & Haggard, 2016), which refers to the experience of being in control of one’s own actions and their consequences (Moore & Fletcher, 2012). In addition, SoA has also been related to many neurological and psychiatric disorders, especially the positive symptoms (delusions and hallucinations) of psychosis, which involve misattributing one’s own thoughts, feelings, and actions to external factors (Moore & Fletcher, 2012; Penton et al., 2018). The observed dysfunction may suggest the impairment of value representation and A–O learning in individuals with psychosis.
We also found reduced neural activation in the striatal reward system, including the DS, insula, and thalamus, which are associated with reward processing during instrumental learning, in the psychosis groups. The striatum is a key structure of the basal ganglia that projects to frontal regions through dopamine neurotransmitters, collectively participating in A–O learning and reward processing (Delgado, Miller, Inati, & Phelps, 2005; Haber, 2003). Previous studies on reward processing have focused mainly on the VS, where more reward-related neurons are found, and the role of the DS in reward learning has gradually been confirmed in recent years (Ravel, Legallet, & Apicella, 2003; Takikawa, Kawagoe, & Hikosaka, 2002; Watanabe, Lauwereyns, & Hikosaka, 2003). Many neuroimaging studies have supported the impact of the DS on the processing of all types of rewards and punishments, such as money, liquids, and odors (O’Doherty et al., 2004; O’Doherty et al., 2003). The DS, which connects with frontal and sensory cortices, is critical for acquiring and executing motivated behavior, which shifts to selection or action strategies if the state value changes (Burton, Nakamura, & Roesch, 2015; Foerde, 2018; Kesby, Eyles, McGrath, & Scott, 2018); the VS, which projects to the PFC and ACC regions, is required for creating value representations that form associations between the predictive outcome and action (Patterson & Knowlton, 2018; Porrino et al., 2004).
The reward circuit provides necessary value information for the formation of A–O associations and voluntary actions, and impairment of the reward circuit leads to poor performance in instrumental learning behaviors in individuals with psychosis, which has been confirmed in many neuroimaging and meta-analysis studies (Katthagen et al., 2020; Vanes et al., 2018; Yang et al., 2024; Zeng et al., 2022). For example, our previous meta-analyses revealed hypoactivity in the reward circuit in individuals with SZ during the reward anticipation and PE processing phases (Yang et al., 2024; Zeng et al., 2022). Furthermore, there is a correlation between abnormal activation related to reward processing in the striatum and the severity of negative symptoms in unmedicated individuals with SZ (Katthagen et al., 2020). In summary, our results revealed that reward processing dysfunction is an important factor for aberrant A–O learning in individuals with psychosis.
Attenuated activation in the MCC, PCC, and cerebellum was also found in individuals with psychosis during the instrumental learning task. Owing to the connectivity of the cingulate cortex, its subregions participate as a whole in A–O learning (Rolls, Reference Rolls2019). The outcome inputs from the OFC to the ACC and the action information from the parietal cortex to the PCC are brought together in the MCC, after which A–O associations are integrated to guide behavior toward the desired goal (Bush, Reference Bush2011; Rolls & Wirth, Reference Rolls and Wirth2018; Vogt, Reference Vogt2016). Additionally, recent anatomical work has revealed bidirectional connections between the cerebellum and the basal ganglia, which possibly indicates a critical role of the cerebellum in reinforcement learning (Bostan & Strick, Reference Bostan and Strick2010). It has been proposed that the cerebellum could contribute to anticipating action outcomes by predicting and transmitting the action state to the basal ganglia (Miall & Galea, Reference Miall and Galea2016). This evidence supports the involvement of the MCC, PCC, and cerebellum in A–O learning during instrumental learning tasks.
Correlations between instrumental learning-related responses and medication status
Our meta-analysis revealed that hypoactivity in the mPFC during instrumental learning was negatively associated with the percentage of FGA users. Dysfunction in the mPFC may result in hallucinations and delusions (Schlagenhauf et al., Reference Schlagenhauf, Wuestenberg, Schmack, Dinges, Wrase, Koslowski and Heinz2008). For example, previous studies revealed that hyperconnectivity between the mPFC and default mode network was correlated with more severe positive symptoms in individuals with psychosis (Whitfield-Gabrieli et al., Reference Whitfield-Gabrieli, Thermenos, Milanovic, Tsuang, Faraone, McCarley and Seidman2009). Brent and colleagues reported that delusional thinking was negatively correlated with connectivity between the lateral temporal cortex and ventral mPFC, which was possibly mediated by social cognition dysfunction (Brent et al., Reference Brent, Coombs, Keshavan, Seidman, Moran and Holt2014). Notably, as potent antagonists of D2-class dopamine receptors (Del’guidice & Beaulieu, Reference Del’guidice and Beaulieu2008), FGAs are effective for positive symptoms (hallucinations and delusions) (Garver, Reference Garver2006). Consistent with this, a systematic review revealed the superiority of FGAs over second-generation antipsychotics (SGAs) in the pharmacological treatment of delusional disorders (Muñoz-Negro & Cervilla, Reference Muñoz-Negro and Cervilla2016). In conclusion, these results may suggest that FGAs play a key role in the treatment of positive symptoms and that the relevant physiological function is related to the mPFC.
We also found that insula hyperactivation was negatively associated with the percentage of medicated individuals with psychosis. In line with this finding, previous studies have reported that left insula activation is negatively correlated with cumulative antipsychotic medication (Walter et al., Reference Walter, Suenderhauf, Smieskova, Lenz, Harrisberger, Schmidt and Borgwardt2016), and anatomical evidence has also indicated an association between antipsychotic exposure and reduced insula volume in individuals with psychosis (Palaniyappan & Liddle, Reference Palaniyappan and Liddle2012). According to the aberrant incentive salience hypothesis, dysfunction in the insula would cause inappropriate assignment of motivational salience and novelty and contribute to delusion and hallucination symptoms (Kapur, Reference Kapur2004; White, Joseph, Francis, & Liddle, Reference White, Joseph, Francis and Liddle2010). A meta-analysis of data from 7450 individuals with SZ who were treated with common, typical, and atypical antipsychotics revealed improvements in core psychotic symptoms such as hallucinatory behavior (Bertolino et al., Reference Bertolino, Blasi, Caforio, Latorre, De Candia, Rubino and Nardini2004; Mendrek et al., Reference Mendrek, Laurens, Kiehl, Ngan, Stip and Liddle2004). Our meta-regression results suggest that antipsychotic drugs have a positive influence on the abnormal salience attribution and positive symptoms of psychosis. Notably, although the regression analysis results during instrumental learning are statistically significant, they remain preliminary and require confirmation through longitudinal studies.
Clinical implications
From the RDoC perspective, the biological markers of instrumental learning may help elucidate the complex and multifaceted symptoms as well as the neurobehavioral disruptions observed in individuals with psychosis. Instrumental learning behavior depends on multiple component processes, including reward processing, the integration of action–outcome associations, and the signaling of mismatches between expected and obtained outcomes, called PE (Waltz et al., Reference Waltz, Xu, Brown, Ruiz, Frank and Gold2018; Yang et al., Reference Yang, Song, Zou, Li and Zeng2024). Dysfunction in reward processing is regarded in DSM-5 as a key factor in the anhedonic symptoms of schizophrenia (Francesmonneris, Pincus, & First, Reference Francesmonneris, Pincus and First2013). This relationship is consistent with the findings of several instrumental learning studies in psychosis, which have shown that increased negative symptoms, particularly anhedonia and avolition, are associated with reduced striatal responses to reward-predicting cues (Dowd & Barch, Reference Dowd and Barch2012; Juckel et al., Reference Juckel, Schlagenhauf, Koslowski, Wüstenberg, Villringer, Knutson and Heinz2006b; Simon et al., Reference Simon, Biller, Walther, Roesch-Ely, Stippich, Weisbrod and Kaiser2010). In other words, the structure of anhedonia is closely related to the processes of reward evaluation, prediction, and motivation. In addition, within the RDoC framework, which aims to identify pathophysiological mechanisms that are common across multiple psychiatric disorders as well as those that are unique to specific psychiatric symptoms, reward processing abnormalities in the dopaminergic system could account for neurobiological dysfunctions observed in psychotic disorders (Cuthbert, Reference Cuthbert2022; Insel et al., Reference Insel, Cuthbert, Garvey, Heinssen, Pine, Quinn and Wang2010).
In our meta-analysis, we found that reward learning deficits in patients with psychosis were associated with reduced activation in the CSTC circuitry. As psychosis is linked to changes in reward processing, probing neural processes of the reward system may improve the present understanding of the different profiles of motivational deficits and related neurobiological abnormalities associated with psychosis. Therefore, identifying specific profiles of abnormal reward processing during instrumental learning may be useful for identifying the brain–behavior dimensions of psychopathology and for supporting broader definitions of psychiatric symptoms.
Furthermore, our findings of abnormal neural representations of instrumental learning can help to better understand the effects of antipsychotic drugs in psychosis. Our study suggested that dysfunction of the mPFC and insula was associated with the medication state in psychosis, which may explain the antipsychotic medication effect on reinforcement learning. In patients responding to a treatment-induced blockade of dopamine D2 receptors, psychotic symptoms may be ameliorated by normalizing salience abnormalities in the reward system. In line with this, longitudinal studies have shown an improvement in attenuated striatal signaling in antipsychotic-naive patients with SZ when they receive monotherapy with a selective dopamine D2/3 antagonist (Nielsen et al., Reference Nielsen, Rostrup, Wulff, Bak, Broberg, Lublin and Glenthoj2012). One recent study revealed that reward processing in the caudate was normalized only after 6 weeks of aripiprazole monotherapy in individuals with FEP (Tangmose et al., Reference Tangmose, Rostrup, Bojesen, Sigvard, Glenthoj and Nielsen2023). Findings have also shown that patients with SZ treated with SGA drugs exhibit normalization of reward-related nucleus accumbens activation (Juckel et al., Reference Juckel, Schlagenhauf, Koslowski, Wüstenberg, Villringer, Knutson and Heinz2006b). These findings suggest that antipsychotic drugs may have a positive influence on abnormal salience attribution, which will help clarify the mechanisms of instrumental learning in psychosis, thereby guiding the development of effective interventions.
Limitations and future directions
Some limitations of this study need to be highlighted. First, publication bias was almost inevitable despite our comprehensive literature search (Cheung & Vijayakumar, Reference Cheung and Vijayakumar2016), and our meta-analyses were based on peak coordinates and effect sizes from published studies rather than raw statistical brain maps. Second, the correlation between brain activity and behavioral performance could not be determined due to insufficient data, and exploring the relationship between the brain and behavior will be our future research direction. Similarly, different subcategorical diagnoses in groups may affect between-subject variability, potentially affecting our findings. We also cannot rule out the potential influence of illness severity and stage on our results. Third, we did not distinguish studies based on the types of stimuli used, which is likely to affect our results. For example, some studies give only rewards to participants when they respond correctly, while others also punish participants when they make mistakes; the rewards they use include money, liquids, and scores. Fourth, the complicated effects of treatment, such as drug types, clinical response, and side effect profiles, cannot be ignored in our meta-analysis, and future longitudinal studies need to investigate the effects of medication and illness stage on neurological dysfunction in reward learning. Fifth, our study included only adult participants, so care should be taken when applying our research results to the child/adolescent population. Finally, diverse imaging acquisition techniques (e.g., different MRI field strengths, MRI scanners, and imaging parameters) may lead to methodological heterogeneity and potentially limit our ability to detect robust group differences.
In this study, we used a voxel-wise meta-analysis to investigate the neural responses during instrumental learning in participants with psychosis. Within the RDoC system, which seeks to identify pathophysiological mechanisms that are unique to specific psychiatric symptoms, reward processing abnormalities in the dopaminergic system could account for neurobiological dysfunctions observed in psychotic disorders. Our findings of functional alterations in psychosis may serve as state markers of psychosis that reflect the pathophysiological processes of the illness. Notably, although the meta-regression results relating brain activation to medication status are statistically significant, they are only preliminary, and further large-scale longitudinal studies are needed to understand the effects of antipsychotic drugs, and of changes in them, on brain activity. Moreover, the direct correlation between behavioral responses and brain activation in participants with SZ needs further exploration, especially with respect to the relationships between different behavioral stages and brain activity, to better investigate the relationships among the efficacy of drugs, patients’ behavior, and brain activity. Additionally, it would be interesting to explore the relevant concepts of reward learning and its neural representations, including the learning rate, reward sensitivity, and PE, in the future using theories and models of reinforcement learning, such as model-free versus model-based decision-making.
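To make the quantities named above concrete, the following is a minimal sketch of a model-free learner of the kind commonly fitted to instrumental learning data. It is illustrative only and was not part of the present meta-analysis: the function names (`softmax`, `q_learning_update`), the parameter names (`alpha` for the learning rate, `rho` for reward sensitivity, `beta` for the inverse temperature), and the example values are all assumptions introduced here.

```python
import math


def softmax(q_values, beta=1.0):
    """Map action values to choice probabilities.

    beta is a hypothetical inverse-temperature parameter: larger values
    make choices more deterministic.
    """
    m = max(q_values)  # subtract the maximum for numerical stability
    exps = [math.exp(beta * (q - m)) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


def q_learning_update(q_values, action, reward, alpha=0.1, rho=1.0):
    """Apply one model-free update and return (new_values, prediction_error).

    The prediction error (PE) is the mismatch between the obtained outcome,
    scaled by the reward sensitivity rho, and the expected value of the
    chosen action. The learning rate alpha sets how far the expectation
    moves toward the outcome.
    """
    pe = rho * reward - q_values[action]
    updated = list(q_values)  # copy so the caller's list is not mutated
    updated[action] += alpha * pe
    return updated, pe


# One rewarded trial for action 0, starting from neutral expectations.
q = [0.0, 0.0]
q, pe = q_learning_update(q, action=0, reward=1.0, alpha=0.5, rho=1.0)
probs = softmax(q, beta=2.0)
```

In such a sketch, a lower `alpha` or `rho` yields slower or weaker acquisition of A–O values; whether either parameter captures the deficits discussed here would have to be tested by fitting models to behavioral data.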
Conclusion
The present study examined the neural mechanisms during A–O learning in individuals with psychosis and their relevance to clinical symptomatology. Our meta-analysis revealed hyperactivity in sensory areas and hypoactivity in the CSTC circuit in patients with psychosis during instrumental learning tasks. Additionally, instrumental learning-evoked mPFC hypoactivation was linked to the percentage of FGA users, and insula hyperactivation was linked to the percentage of medicated individuals. Our findings provide evidence for dysfunctions in value representation and A–O association integration in psychosis and have the potential to clarify the complex brain–behavior relationships in psychosis.
Supplementary material
The supplementary material for this article can be found at http://doi.org/10.1017/S0033291725101323.
Data availability statement
The data supporting the findings of this study are available from the corresponding author upon reasonable request.
Author contribution
J. Zeng and X. Yang contributed to the study conception and design and supervised the study. Y. Song and B. Cheng contributed significantly to the analysis and manuscript preparation. Y. Song and H. Cao performed the analysis with constructive discussions. X. Yang, Y. Song, and J. Zeng wrote the manuscript, which was reviewed by all authors and approved for publication.
Funding statement
This work was supported by the Fundamental Research Funds for the Central Universities (grant no. 2024CDJSKZK12); the Social Science Foundation of Chongqing (grant no. 2023NDYB93); Natural Science Foundation Project of Chongqing (grant no. CSTB2024NSCQ-MSX1116); the Venture and Innovation Support Program for Chongqing Overseas Returnees (grant nos. CX2019154 and CX2020119); the Social Science Foundation of Chongqing (grant no. 2020YBGL80); and Sichuan Science and Technology Program (grant no. 2024YFFK0361).
Competing interests
We declare no potential conflict of interest.