INTRODUCTION
Semantic verbal fluency (SVF) is a staple of neuropsychological assessment that has been shown to be highly sensitive to a wide range of neurological conditions, such as focal and diffuse brain injuries (Henry & Crawford, 2004a, 2004b), as well as neurodegenerative diseases such as Alzheimer’s disease (AD) (Henry, Crawford, & Phillips, 2004) and Parkinson’s disease (PD) (Henry & Crawford, 2004c). This task requires the retrieval of words from a given semantic category (e.g., animals) within a set time limit, and the total number of words generated is tallied. Like many neuropsychological tools, this task relies on multiple cognitive abilities. At least two main interacting neural systems are involved in semantic memory retrieval (Ralph, Jefferies, Patterson, & Rogers, 2017). The first is a semantic store, akin to a mental encyclopedia, which is supported by the temporal lobes. The second is an executive control system, supported by prefrontal regions, which guides the selection of concepts within the semantic store and monitors retrieved information. Thus, reduced SVF may indicate temporal and/or frontal lobe dysfunction, and the “total words” measure does not indicate the degree to which each of the two systems is compromised (Mayr, 2002; Reverberi, Cherubini, Baldinelli, & Luzzi, 2014).
To disentangle these contributions, Troyer and colleagues (Troyer et al., 1997) developed a method to characterize verbal output, which tends to be organized in clusters of semantically related words. For instance, responses to an “animal” cue may begin with pets (cat, dog, goldfish) and then switch to farm animals (cow, horse, pig). This method yields two indices: (1) Troyer mean cluster size (TMCS), the average number of consecutive words from the same subcategory (e.g., pets, farm animals), thought to reflect the temporal-lobe semantic store system; and (2) Troyer number of switches (TSW) between subcategories, thought to reflect the fronto-executive control system.
Support for this neuroanatomical dissociation comes mainly from studies of neurological populations. Notably, TSW is reduced following frontal lobe lesions (Troyer, Moscovitch, Winocur, Alexander, & Stuss, 1998), and TMCS is reduced in groups with temporal lobe dysfunction, such as patients with temporal lesions, AD, and amnestic mild cognitive impairment (aMCI) (Ober, Dronkers, Koss, Delis, & Friedland, 1986; Price et al., 2012; Troyer, Moscovitch, Winocur, Alexander, & Stuss, 1998; Troyer, Moscovitch, Winocur, Leach, & Freedman, 1998). However, these findings have not been consistently replicated, and other studies have found that TMCS is not reduced in AD and aMCI compared to controls (Bertola et al., 2014; Epker, Lacritz, & Munro Cullum, 1999; Haugrud, Crossley, & Vrbancic, 2011; Raoux et al., 2008).
In PD, where SVF is frequently affected (Henry & Crawford, 2004c; Koerts et al., 2013; Raskin, Sliwinski, & Borod, 1992; Tröster et al., 1998; Williams-Gray et al., 2009), a few studies show reduced TSW, particularly in patients with PD-related mild cognitive impairment (PD-MCI) (Galtier, Nieto, Lorenzo, & Barroso, 2017) and PD dementia (Epker et al., 1999; Koerts et al., 2013; Troyer, Moscovitch, Winocur, Leach, et al., 1998). In contrast, TMCS is unaffected regardless of cognitive status (Donovan, Siegert, McDowall, & Abernethy, 1999; Epker et al., 1999; Galtier et al., 2017; Troyer, Moscovitch, Winocur, Leach, et al., 1998). This pattern suggests that only the fronto-executive component, and not the semantic store component, is impaired.
While this is consistent with the classic dopamine-related fronto-executive deficits in PD, it is at odds with a recent proposal that poor SVF signals temporal cortical involvement in this group. This interpretation is grounded in neuroimaging evidence of associations between temporo-parietal alterations and cognitive vulnerability in PD (for review, see Ray & Strafella, 2012), together with findings that deficits in SVF, but not in canonical executive functions (e.g., working memory, planning, and phonemic fluency), predict progression to PD dementia (Williams-Gray, Foltynie, Brayne, Robbins, & Barker, 2007). However, the absence of a PD-related deficit in TMCS does not necessarily indicate that temporal lobe function is intact; instead, it may reflect a shortcoming of this index as a valid and sensitive measure of the semantic store component. As noted above, findings in patients with temporal dysfunction are inconsistent, and this variability may be due to a floor effect, as TMCS values are near zero even in healthy controls [e.g., M=0.75, SD=0.57 in controls (Troyer, 2000)].
Importantly, the main criticism of the Troyer method is its reliance on subjective, experimenter-based judgements of semantic similarity between words, which may not capture individuals’ intuitive associations between words. To address this, Pakhomov and colleagues (Pakhomov, Eberly, & Knopman, 2016; Pakhomov & Hemmy, 2014; Pakhomov, Hemmy, & Lim, 2012; Pakhomov, Jones, & Knopman, 2015; Pedersen, Pakhomov, Patwardhan, & Chute, 2007) developed an automated mean cluster size index (AMCS) based on semantic relatedness values derived from a computational analysis of word co-occurrence in text corpora such as Wikipedia. As a result, words that are not members of the same subcategory (e.g., pets) may be included in the same cluster if they frequently co-occur with each other or with a common third word. For example, the words “tortoise” and “hare” frequently co-occur because of the famous children’s fable, but would not form a cluster under the Troyer procedure.
In addition to AMCS, which is based on the relatedness between sequential words (SeqRel), the cumulative relatedness (CuRel) among all words generated during SVF is computed irrespective of order. CuRel indexes the semantic diversity of responses: a large value denotes a restricted exploration of the semantic store (i.e., only highly related words), which may be due to a truncated semantic store or a deficit in search strategies. To date, these novel measures (CuRel and AMCS) have been applied only to MCI (amnestic and multi-domain combined) and AD, where they show promise. Indeed, CuRel is greater in AD than in MCI and normal aging and increases with AD progression, while AMCS is associated with the risk of dementia but does not track progression (Pakhomov et al., 2016; Pakhomov & Hemmy, 2014).
Furthermore, both measures correlate with fMRI language network connectivity in these groups, with AMCS relating to parietal and temporal regions, and CuRel to parietal and frontal regions (Pakhomov et al., 2015). The latter finding, together with evidence that CuRel correlates with tests of executive function and attention (Pakhomov et al., 2012), suggests that CuRel measures the executive control system. In contrast, AMCS may reflect the integrity of the semantic store, as it relates to temporal regions of the language network and shows no relationship with tests of attention and executive function (Pakhomov et al., 2016). AMCS also shows concurrent validity with the Troyer TMCS (r=0.57; Pakhomov et al., 2015).
Taken together, these data-driven indices have the potential to enhance our ability to assess the respective integrity of the semantic store and executive control systems. However, they have yet to be applied to clinical groups other than AD and MCI, such as PD. Their construct validity has not been established, as no studies have compared these automated indices to neuropsychological tests of executive function and semantic knowledge. Lastly, this computational method has not been used to derive an automated number of switches (ASW) to substitute for the subjective and potentially biased TSW. Thus, our general aim is to assess the validity and usefulness of these indices in the context of cognitive decline in advanced PD. First, we investigate psychometric properties by comparing the automated with the experimenter-dependent SVF indices (concurrent validity), and all SVF indices with measures of executive function and semantic knowledge (construct validity). Second, we examine whether individual differences in SVF in PD, and group differences between PD with normal cognition (PD-NC) and PD-MCI, are related to specific SVF indices. Finally, we determine whether SVF indices predict the presence of PD-MCI.
METHODS
Participants and Diagnostic Group Classification
Fifty patients with advanced PD, diagnosed according to the UK Brain Bank Criteria (Hughes, Daniel, Kilford, & Lees, 1992), who were evaluated to determine their candidacy for deep brain stimulation, were included in this retrospective study. This study was completed in accordance with the Helsinki Declaration and was approved by the Research Ethics Board at the University Health Network in Toronto. Exclusion criteria were onset of PD before 40 years of age, severe depression, PD dementia, low premorbid IQ or poor English proficiency (Wechsler Test of Adult Reading [WTAR] standard score <80), previous neurosurgical intervention or other neurological diagnosis, and treatment with anticholinergic or cholinesterase inhibitor medications.
Of this sample, 22 patients were identified with PD-NC and 28 with PD-MCI (n=5 single-domain and n=23 multiple-domain). Performance on SVF tasks was not considered in diagnostic determination. Instead, these cognitive diagnoses were reached based on a consensus review (by M.C. and M.S.) of their comprehensive neuropsychological assessment, which included a clinical interview and psychometric testing sampling multiple cognitive domains (general mental status, IQ, attention, visuospatial function, language, executive function, and memory), using the MDS task force diagnostic Level 2 criteria for PD-MCI (Litvan et al., 2012).
Note that the specific tests administered varied to some extent across patients, and for brevity, we do not report here specific scores on all measures used to support cognitive diagnoses. All neuropsychological tests were administered on a single day when patients were ON medications. Demographic (age, sex, years of education) and clinical characteristics [PD duration, age at PD onset, levodopa-equivalent daily dose (LEDD), Unified Parkinson’s Disease Rating Scale (UPDRS) part 3 – Total score ON and OFF medications, specific cognitive domains affected in PD-MCI] are presented in Table 1.
Table 1 Demographic and clinical characteristics

LEDD=Levodopa-equivalent daily dose; UPDRS=Unified Parkinson’s Disease Rating Scale; PD-NC=Parkinson’s disease with normal cognition; PD-MCI=Parkinson’s disease with mild cognitive impairment; Mdn=median; IQR=interquartile range.
* Indicates a significant difference between PD-MCI and PD-NC (Mann-Whitney U test).
Semantic Verbal Fluency Measures
Total Words
The total number of words each participant generated to the animal cue during a 1-min interval was tallied, excluding repetitions and set-loss errors (i.e., words outside the animal category).
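As a minimal illustration of this scoring rule, the Python sketch below counts valid responses, assuming a hypothetical reference set of acceptable animal names (`category_members`) and treating any word outside that set as a set-loss error. It does not handle plurals, spelling variants, or superordinate terms, which in practice require clinical judgement.

```python
def total_words(responses, category_members):
    """Count SVF responses, excluding repetitions and set-loss errors.

    `category_members` is a hypothetical reference set of acceptable animal
    names; any response outside it is treated as a set-loss error.
    """
    seen = set()
    for word in responses:
        w = word.strip().lower()   # normalize case and whitespace before matching
        if w in category_members and w not in seen:
            seen.add(w)            # first valid mention counts; repeats are ignored
    return len(seen)
```

For example, the responses cat, dog, Cat, chair, cow scored against {cat, dog, cow} give a total of 3: "Cat" is a repetition and "chair" is a set-loss error.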
Experimenter-dependent SVF indices
SVF responses were transcribed. Using the method of Troyer et al. (1997), the number of switches (TSW) between subcategories and the mean cluster size (TMCS) were derived. Examples of subcategories include pets, birds, fish, and African animals. The size of each cluster is the number of consecutive words that belong to the same subcategory minus 1; for example, the sequence cat-dog-parrot has a cluster size of 2, and a cluster consisting of one word has a size of 0. TMCS is the sum of all cluster sizes divided by the total number of clusters, including single-word clusters. The number of switches (TSW) is the number of clusters minus 1.
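The Troyer scoring arithmetic described above can be sketched in Python as follows. This is a simplified sketch, assuming each word has already been assigned a single subcategory label by the rater (the hypothetical `subcategories` list); the full Troyer rules, which allow overlapping subcategories, are not implemented.

```python
from itertools import groupby


def troyer_indices(subcategories):
    """Return (TMCS, TSW) from rater-assigned subcategory labels, in response order.

    Simplification: one label per word, so clusters are runs of identical labels.
    """
    if not subcategories:
        return 0.0, 0
    # Each run of consecutive identical labels is one cluster.
    run_lengths = [len(list(group)) for _, group in groupby(subcategories)]
    sizes = [n - 1 for n in run_lengths]     # cluster size = words in cluster minus 1
    tmcs = sum(sizes) / len(sizes)           # single-word clusters (size 0) included
    tsw = len(sizes) - 1                     # switches = number of clusters minus 1
    return tmcs, tsw
```

For the sequence cat, dog, goldfish, cow, horse, pig, labelled as three pets followed by three farm animals, the sketch returns a TMCS of 2.0 and a TSW of 1.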
Data-driven, automated SVF indices
Automated indices, including ASW, AMCS, and CuRel, were calculated using the ‘VFClust’ Python package (version 2) developed by Pakhomov et al. (Pakhomov & Hemmy, 2014; Pakhomov et al., 2015; Ryan, 2013). This method is based on latent semantic analysis (LSA), which relies on the assumption that related words co-occur more frequently than unrelated words in general texts. LSA also detects latent associations between words that do not appear in the same context or text but frequently co-occur with a common third word. The VFClust package uses Wikipedia articles about animals as its text source. Animal names were lemmatized using the WordNet lemmatizer in the Python Natural Language Toolkit (NLTK; Bird, 2006), and noise-reduction techniques were applied to derive a semantic space. A co-occurrence matrix was constructed in which each animal word is represented as a vector. The semantic relatedness between a given pair of animal names is the cosine of the angle between the corresponding vectors, which ranges from −1 (low relatedness) to 1 (high relatedness).
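To make the LSA step concrete, the sketch below projects a toy term-by-document count matrix into a two-dimensional latent space with a truncated singular value decomposition (SVD) and compares words by cosine. The corpus, counts, and dimensionality (`k=2`) are hypothetical and chosen for illustration only; this is not the VFClust pipeline, whose Wikipedia corpus, weighting, and noise-reduction steps are not reproduced here.

```python
import numpy as np

# Hypothetical toy corpus: rows are terms, columns are documents (raw counts).
TERMS = ["cat", "dog", "cow", "horse", "tortoise", "hare"]
COUNTS = np.array([
    # doc0 doc1 doc2 doc3
    [2, 0, 0, 0],   # cat
    [2, 0, 0, 1],   # dog
    [0, 2, 0, 0],   # cow
    [0, 2, 0, 1],   # horse
    [0, 0, 3, 0],   # tortoise
    [0, 0, 3, 0],   # hare
], dtype=float)


def lsa_term_vectors(term_doc, k):
    """Project each term (row) into a k-dimensional latent space via truncated SVD."""
    u, s, _ = np.linalg.svd(term_doc, full_matrices=False)
    return u[:, :k] * s[:k]   # term coordinates scaled by the top-k singular values


def cosine(a, b):
    """Cosine of the angle between two vectors, ranging from -1 to 1."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


vectors = dict(zip(TERMS, lsa_term_vectors(COUNTS, k=2)))
```

In the raw counts, "cat" and "horse" never share a document, so their cosine is 0; in the latent space they become highly related because both co-occur with "dog", illustrating the third-word associations described above. "Tortoise" and "hare" are maximally related because they always appear together.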
These semantic relatedness values are used to derive automated SVF indices. ASW and AMCS require setting cluster boundaries using a predetermined threshold. Here, we used two thresholds, as it is unclear which cutoff is best suited for our sample: (1) 0.60, which was used by Pakhomov and Hemmy (2014) in their previous study of AD and MCI; and (2) 0.50, which represented both the mean relatedness of all word pairs generated in our overall sample (M=0.51; SD=0.18) and the mean relatedness of sequential words (SeqRel) per participant (M=0.51; SD=0.05). Sequential words with a semantic relatedness value higher than the threshold are considered members of the same cluster, and lower values indicate a switch to a different cluster. As in the Troyer method, ASW is the number of clusters minus 1, and AMCS is the sum of cluster sizes divided by the number of clusters, including single-word clusters. In addition to the ASW and AMCS indices derived from SeqRel values, we also derived CuRel, the average semantic relatedness between all words generated by a participant during the SVF test, irrespective of their order.
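The threshold rule and CuRel can be sketched as below. This is a minimal illustration, not the VFClust implementation: the `relatedness` argument is a hypothetical caller-supplied function returning the cosine for a word pair, a pair scoring exactly at the threshold is treated as a switch (the text specifies only that values higher than the threshold join a cluster), and CuRel is taken as the unweighted mean over all unordered word pairs.

```python
import math
from itertools import combinations


def automated_indices(words, relatedness, threshold=0.5):
    """Return (ASW, AMCS, CuRel) for an ordered list of SVF responses.

    `relatedness(a, b)` is a hypothetical, caller-supplied function returning the
    cosine relatedness of two words (e.g., from an LSA space).
    """
    if not words:
        return 0, 0.0, math.nan
    # SeqRel: relatedness of each adjacent pair, in response order.
    seq_rel = [relatedness(a, b) for a, b in zip(words, words[1:])]
    run_lengths = [1]
    for r in seq_rel:
        if r > threshold:          # above threshold: stay in the current cluster
            run_lengths[-1] += 1
        else:                      # at or below threshold: start a new cluster
            run_lengths.append(1)
    sizes = [n - 1 for n in run_lengths]   # cluster size = words in cluster minus 1
    asw = len(sizes) - 1
    amcs = sum(sizes) / len(sizes)
    # CuRel: mean relatedness over all unordered pairs, ignoring response order.
    pairs = list(combinations(words, 2))
    curel = sum(relatedness(a, b) for a, b in pairs) / len(pairs) if pairs else math.nan
    return asw, amcs, curel
```

With hypothetical sequential relatedness values of 0.8, 0.3, and 0.7 for cat-dog-cow-horse, the 0.50 threshold yields two clusters (ASW=1, AMCS=1.0), whereas a stricter 0.75 threshold also separates cow from horse (ASW=2, AMCS≈0.33).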
Measures of Executive Function and of Language/Semantic Knowledge
A subset of tests administered during the clinical neuropsychological assessment included measures of: (1) executive function involving mental set-shifting: errors on the Wisconsin Card Sorting Task (WCSTerrors) and Trail Making Test B minus A (TMTB-A); and (2) language/semantic knowledge: the vocabulary (VOCAB) subtest of the Wechsler Abbreviated Scale of Intelligence (WASI), the WTAR, and the Boston Naming Test (BNT) (for test descriptions, see Strauss, Sherman, & Spreen, 2006). Raw scores on neuropsychological tests are presented in Table 2 as medians and interquartile ranges.
Table 2 Neuropsychological tests and verbal fluency indices

Note. Median values are reported followed by interquartile range in parentheses. Higher scores for WCSTerrors and TMTB-A indicate poorer performance.
WCSTerrors=Wisconsin Card Sorting Task - errors; TMTB-A=Trail Making Test B minus A; VOCAB=vocabulary; BNT=Boston-Naming Test; WTAR=Wechsler Test of Adult Reading; SVF=Semantic Verbal Fluency; TMCS=Troyer mean cluster size; TSW=Troyer switch; AMCS=automated mean cluster size (0.5/0.6 thresholds); ASW=automated switches (0.5/0.6 thresholds); CuRel=cumulative relatedness; SeqRel=sequential relatedness; Mdn=median; IQR=interquartile range; PD-NC=Parkinson’s disease with normal cognition; PD-MCI=Parkinson’s disease with mild cognitive impairment.
* Indicates a significant difference between PD-MCI and PD-NC groups (Mann-Whitney U test).
Statistical Analysis
Several variables were not normally distributed (e.g., education, age at PD onset, PD duration, UPDRS-part 3, TSW, TMCS, TMTB-A, WCSTerrors, WTAR, VOCAB, and BNT). Therefore, descriptive statistics are reported as medians and interquartile ranges for demographic and clinical variables (Table 1) and for neuropsychological variables, including SVF (Table 2). We used nonparametric statistics: Mann-Whitney U tests to compare diagnostic groups (PD-NC vs. PD-MCI) across these variables, and Spearman rank correlations between the experimenter-dependent and automated SVF indices (Table 3) and between SVF indices and neuropsychological tests of executive function and language/semantic knowledge (Table 4). To test whether two correlations differed significantly in magnitude, we used a method based on Fisher’s r-to-z transformation (Steiger, 1980).
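For readers wishing to reproduce such comparisons, the sketch below implements one common form of Steiger’s (1980) test for two dependent correlations that share a variable (for example, total words correlated with a switching index and with a cluster size index, given the correlation between those two indices). The formula uses the averaged correlation in the covariance term; we cannot confirm that this is the exact variant used in the present analyses, and treating Spearman coefficients as Pearson-type correlations is an approximation.

```python
import math


def steiger_z(r_jk, r_jh, r_kh, n):
    """Steiger (1980) Z for H0: rho_jk == rho_jh, where variable j is shared.

    r_jk, r_jh: correlations of j with k and with h; r_kh: correlation of k with h;
    n: sample size. Uses the averaged correlation r_bar in the covariance term.
    """
    z_jk, z_jh = math.atanh(r_jk), math.atanh(r_jh)   # Fisher r-to-z
    r_bar = (r_jk + r_jh) / 2.0
    psi = r_kh * (1 - 2 * r_bar**2) - 0.5 * r_bar**2 * (1 - 2 * r_bar**2 - r_kh**2)
    s_bar = psi / (1 - r_bar**2) ** 2                  # covariance term for z_jk, z_jh
    return (z_jk - z_jh) * math.sqrt(n - 3) / math.sqrt(2 - 2 * s_bar)


def two_tailed_p(z):
    """Two-tailed p-value from the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))
```

A call such as `steiger_z(r_words_switch, r_words_mcs, r_switch_mcs, n=50)` (hypothetical variable names) returns a positive Z when total words correlate more strongly with switching than with mean cluster size.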
Table 3 Convergent validity of automated and experimenter-dependent verbal fluency measures

TMCS=Troyer Mean Cluster Size; TSW=Troyer Switch; AMCS=automated mean cluster size (0.5/0.6 thresholds); ASW=automated switches (0.5/0.6 thresholds).
*** p<.001.
** p<.01.
* p<.05.
Table 4 Construct validity of SVF indices against total word count and neuropsychological tests

Note. Higher scores for WCSTerrors and TMTB-A indicate poorer performance.
TMCS=Troyer Mean Cluster Size; TSW=Troyer Switch; AMCS=automated mean cluster size (0.5/0.6 thresholds); ASW=automated switches (0.5/0.6 thresholds); WCSTerrors=Wisconsin Card Sorting Task - errors; TMTB-A=Trail Making Test B minus A; BNT=Boston-Naming Test; WTAR=Wechsler Test of Adult Reading; CuRel=cumulative relatedness.
* p<.05.
Finally, to determine whether SVF indices predict diagnostic group (PD-NC vs. PD-MCI), we performed separate binary logistic regressions using experimenter-based indices (TSW and TMCS) and automated SVF indices derived at the .50 and .60 thresholds (ASW-.50 and AMCS-.50 in one model; ASW-.60 and AMCS-.60 in another). For each model, the two predictors were entered simultaneously in one block, with the classification cutoff set at 0.5. Odds ratios are reported with reference to membership in the PD-MCI group. To identify which of the three methods provides the best prediction model of PD-MCI, we used the Bayesian information criterion (BIC) (Schwarz, 1978), computed with the “vcdExtra” package in RStudio version 1.0.143 (Friendly, 2013); the smaller value indicates the preferred model. All other analyses were completed using IBM SPSS Statistics version 23. Although age and education may be contributing factors, we did not include these variables in our correlational and regression analyses for reasons of simplicity and limited power. However, the same pattern of results was found when age or education was included as a covariate (see the Supplementary Materials).
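The logistic models and BIC comparison can be sketched in Python with NumPy. This sketch fits an unpenalized logistic regression by Newton-Raphson and computes BIC as −2 log-likelihood + k ln n, where k counts the intercept and slopes. It is an illustration under these assumptions, not the SPSS or R implementation used in the study, and it will not converge if the two groups are perfectly separable.

```python
import numpy as np


def fit_logistic(X, y, n_iter=25):
    """Unpenalized maximum-likelihood logistic regression via Newton-Raphson.

    X: (n, p) predictor matrix (an intercept column is added here); y: 0/1 outcomes.
    Returns coefficients (intercept first), fitted probabilities, and log-likelihood.
    """
    X1 = np.column_stack([np.ones(len(y)), X])
    beta = np.zeros(X1.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X1 @ beta))
        grad = X1.T @ (y - p)                          # score vector
        info = X1.T @ (X1 * (p * (1.0 - p))[:, None])  # information matrix
        beta = beta + np.linalg.solve(info, grad)
    p = 1.0 / (1.0 + np.exp(-X1 @ beta))
    loglik = float(np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))
    return beta, p, loglik


def bic(loglik, n_params, n_obs):
    """Schwarz (1978) BIC; the smaller value indicates the preferred model."""
    return -2.0 * loglik + n_params * np.log(n_obs)


def classification_accuracy(p, y, cutoff=0.5):
    """Proportion correctly classified, predicting 1 when p exceeds the cutoff."""
    return float(np.mean((p > cutoff) == (y == 1)))
```

Odds ratios for PD-MCI membership correspond to `np.exp(beta[1:])`; comparing `bic(...)` across the three two-predictor models (n=50, k=3 parameters each) is analogous to the BIC-based model comparison described above.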
RESULTS
Associations Between Automated and Experimenter-Dependent SVF Indices
As shown in Table 3, automated measures of switching at both thresholds (ASW-.50 and ASW-.60) correlated moderately with the experimenter-dependent switching measure (TSW). In contrast, the correlations between automated and experimenter-dependent cluster size indices were small and positive, and only the correlation between TMCS and AMCS at the .50 threshold (and not at the .60 threshold) reached statistical significance. Together, these results support the concurrent validity of the switching measures but provide weaker support for the mean cluster size indices.
CuRel was not related to measures derived using the experimenter-dependent method (TSW and TMCS), but was positively correlated with AMCS and negatively correlated with ASW at both thresholds (Table 3). This lack of independence was also noted between switching and mean cluster size using all methods (TSW vs. TMCS; ASW-.50 vs. AMCS-.50; ASW-.60 vs. AMCS-.60) as evidenced by the moderate negative correlations between them (Table 3). Thus, despite suggestions that clustering and switching reflect the two systems preferentially, our data do not support this double dissociation.
Associations Between SVF Indices and Neuropsychological Tests
As shown in Table 4, SVF total words showed moderate correlations with tests of both executive function and language/semantic knowledge, supporting the idea that this measure involves multiple cognitive abilities. SVF switching indices correlated mildly to moderately with executive functioning (TSW with WCSTerrors; ASW-.50 and ASW-.60 with TMTB-A). However, none of the SVF indices correlated significantly with language/semantic knowledge tasks, including those thought to reflect the integrity of the semantic store (TMCS, AMCS). The mean cluster size indices and CuRel also did not relate to executive functioning. These findings support the conclusion that switching during SVF reflects the executive control system, but fail to support the notion that mean cluster size (MCS) reflects semantic knowledge or that CuRel relates to either cognitive domain.
Individual Differences in SVF Total Words and Indices
SVF total words correlated moderately with all switching indices (TSW, ASW-.50, and ASW-.60) and weakly with CuRel, but not with any of the mean cluster size measures (TMCS, AMCS-.50, AMCS-.60; see Table 3). The correlations between total words and switching were significantly different from those between total words and mean cluster size (TMCS vs. TSW, Z=2.37; p=.01; AMCS-.50 vs. ASW-.50, Z=3.99; p<.0001; AMCS-.60 vs. ASW-.60, Z=4.94; p<.0001). This suggests that switching abilities, rather than clustering, underlie individual differences in SVF total words in PD.
Diagnostic Group Differences PD-NC Versus PD-MCI
As shown in Table 1, PD-NC patients were slightly younger (U=191; Z=−2.3; p=.02) and had an earlier age of PD onset (U=184; Z=−2.4; p=.02) than PD-MCI patients, but these differences were numerically small (approximately 3 years). There were no significant group differences in disease severity, as measured by disease duration (U=297; Z=−0.22; p=.83), LEDD (U=274; Z=−0.66; p=.51), and UPDRS-part 3 ON (U=307.5; Z=−0.10; p=.99) or OFF medication (U=301.5; Z=−0.13; p=.90). Years of education were also comparable (a 2-year difference), although a trend favoring PD-NC was noted (U=212.5; Z=−1.9; p=.06). Finally, the proportion of males was significantly higher in the PD-MCI group than in the PD-NC group (U=217.0; Z=−2.11; p=.03).
With respect to performance on neuropsychological measures (Table 2), PD-MCI patients had poorer executive functioning than PD-NC patients [WCSTerrors (U=194.5; Z=−2.2; p=.03), TMTB-A (U=199; Z=−2.3; p=.03)]. This was expected given that performance on these tasks supported a diagnosis of PD-MCI (i.e., executive function was one of the cognitive domains impaired in 24 of 28 PD-MCI patients). In contrast, language/semantic knowledge did not differ significantly between groups [vocabulary (U=242; Z=−2.2; p=.20), BNT (U=286.0; Z=−0.42; p=.67), and WTAR (U=216.5; Z=−1.8; p=.070)]. This mirrors the finding that the language domain was impaired (based on PD-MCI criteria) in only 3 of 28 PD-MCI patients.
With respect to SVF measures (which were not used for diagnostic purposes), total words (U=154; Z=−3.0; p=.003) and all measures of switching [TSW (U=209; Z=−1.9; p=.050), ASW-.50 (U=209; Z=−1.9; p=.050), and ASW-.60 (U=184.5; Z=−0.45; p=.015)] were reduced in PD-MCI relative to PD-NC. However, there was no significant group difference in mean cluster size across techniques [TMCS (U=239; Z=−1.3; p=.18), AMCS-.50 (U=248; Z=−1.17; p=.24), and AMCS-.60 (U=285; Z=−0.45; p=.65)], nor in CuRel (U=305; Z=−0.06; p=.95).
SVF Indices’ Prediction of PD-MCI
Classification accuracy ranged from 62% to 68% across models (Table 5). Regardless of the method used, SVF indices predicted PD-MCI diagnosis in logistic regression models [TSW and TMCS: χ²(2, N=50)=8.1, p=.018; ASW-.60 and AMCS-.60: χ²(3, N=50)=10.5, p=.015; ASW-.50 and AMCS-.50: χ²(3, N=50)=11.7, p=.009]. The odds ratios and confidence intervals for individual predictors in each model are shown in Table 6. In all models, switching (TSW, ASW-.60, ASW-.50) was predictive of PD cognitive diagnosis, and only the mean cluster size measure derived at the 0.50 threshold (AMCS-.50) was also a significant predictor. This is again consistent with findings that individual differences in SVF in our PD group relate to a greater extent to executive functioning.
Table 5 Classification rates for logistic regression models

PD-NC=Parkinson’s disease with normal cognition; PD-MCI=Parkinson’s disease with mild cognitive impairment.
Table 6 Logistic regression model to predict PD-MCI

TMCS=Troyer Mean Cluster Size; TSW=Troyer Switch; AMCS=automated mean cluster size (0.5/0.6 thresholds); ASW=automated switches (0.5/0.6 thresholds).
Based on the Bayesian information criterion (BIC), models including automated SVF indices are preferred to the model including experimenter-based indices. In order of preference, these were the automated model with the 0.50 threshold (BIC=68.70), the automated model with the 0.60 threshold (BIC=70.67), and the experimenter-dependent model (BIC=72.25).
DISCUSSION
Our main goals were to investigate the validity of automated, LSA-based SVF indices and to extend their use to characterizing cognition in advanced PD. This LSA-based approach addresses the subjective nature of previous methods by providing indices based on the objective, quantifiable degree of relatedness between words derived from text corpora. Our findings pertaining to the switching index are particularly compelling, while findings related to the other measures, including the mean cluster size (MCS) index and CuRel, are equivocal. Specifically, ASW showed evidence of validity in that it correlated with the experimenter-based TSW (concurrent validity) and with executive function tasks that preferentially tap mental flexibility (construct validity). Furthermore, all measures of switching (TSW, ASW-.60, ASW-.50) characterized individual differences in advanced PD. In contrast, measures of MCS failed to demonstrate concurrent validity, or construct validity when compared with tasks of semantic knowledge. They also did not consistently differentiate between PD-NC and PD-MCI. Lastly, a model based on the automated measures outperformed the Troyer-based indices in differentiating cognitive status in PD. Here, we discuss findings pertaining to the validity of these indices and their application to the characterization of cognition in PD.
Our study is the first to derive automated switching indices (ASW), to provide evidence for their concurrent validity against the experimenter-based TSW index, and to demonstrate their construct validity against neuropsychological measures of executive function. The latter is consistent with a recent study in PD showing a positive correlation between experimenter-based TSW and the number of categories on the WCST (Galtier et al., 2017), and with another PD study showing a moderate correlation (r=.47), albeit nonsignificant due to low power, between TSW and a composite executive functioning score based on TMTB and measures of conceptualization (Demakis et al., 2003). Studies in other populations also support the validity of TSW as a measure of executive function by establishing its relationship to mental flexibility (TMTB-A) in healthy older adults using a 10-min semantic fluency task (Rosen et al., 2005), and to mental flexibility (TMTB, Stroop test, and errors on intra/extra-dimensional set shifting) in individuals with Huntington’s disease (Ho et al., 2002). In summary, our study corroborates the validity of using switching as an indicator of executive ability.
However, our findings challenge the validity of mean cluster size as a useful marker of semantic memory deterioration. Automated MCS indices showed weak correlations (r=0.19 to 0.29) with the experimenter-dependent TMCS, in contrast with the moderate relationship shown in one AD study (r=0.57) (Pakhomov et al., 2015). This discrepancy may reflect inherent differences between AD and PD, or variable reproducibility, which has also been noted across studies using TMCS in the AD literature. Furthermore, MCS derived with any method did not correlate with measures of semantic memory or language in our study, despite the use of established, valid measures of language ability and semantic knowledge such as the BNT, WTAR, and a vocabulary test. Although the range of BNT and WTAR scores is limited in our sample, this was not the case for the vocabulary test. Therefore, a restricted range of scores on tests of semantic knowledge is unlikely to have concealed relationships between these measures and MCS.
Surprisingly, this evidence of poor construct validity is consistent with the limited existing literature. Indeed, very few studies of experimenter-based SVF indices have related mean cluster size to neuropsychological measures of semantic knowledge or language. In the few studies that did, TMCS did not correlate with the BNT in AD (Weakley & Schmitter-Edgecombe, 2014), with the Information subtest in PD and PD-MCI (Galtier et al., 2017), or with a composite language index based on naming and vocabulary tests in PD (Demakis et al., 2003). Instead, the latter study found that TMCS correlated with executive functioning in PD, which further challenges the validity of TMCS as a measure of semantic memory, although this relationship was not found in our sample.
This is also in line with our finding that mean cluster size and switching indices were correlated, suggesting that they are not independent despite the assumption that they reflect different systems. Others have also shown this lack of independence between the experimenter-based indices, which partly underpinned their criticisms of the method (Abwender, Swan, Bowerman, & Connolly, 2001; Mayr, 2002; Pakhomov et al., 2012; Reverberi et al., 2014; Reverberi, Laiacona, & Capitani, 2006). Thus, although mean cluster size has proven helpful in characterizing AD (Pakhomov et al., 2016; Pakhomov & Hemmy, 2014), the lack of psychometric evidence for its construct validity, its lack of independence from switching, and the variable findings in patients with temporal dysfunction call into question its utility and whether it validly reflects the integrity of the semantic memory store.
In our advanced PD group, switching was particularly informative. It differentiated PD-NC from PD-MCI regardless of the method used, whereas MCS did not. This replicates the findings of a previous study in PD-MCI in which only TSW, and not TMCS, contributed to predicting PD-MCI group membership (Galtier et al., 2017). This replication is important given several differences between the two samples. Compared with that study, our groups had a longer disease duration (12 vs. 8 years), more education (14.2 vs. 8.5 years), and better overall SVF performance (18 vs. 15 total words in PD-MCI). Mean TSW and TMCS values also differed significantly between the two studies.
Although our sample consists of candidates for deep brain stimulation and excludes patients with PD dementia, a study using the Troyer method found a similar pattern of reduced switching but intact mean cluster size in PD dementia relative to healthy older adults (Troyer, Moscovitch, Winocur, Leach, et al., 1998). This suggests that our findings are not due to selection bias. Instead, they are characteristic of PD across different degrees of cognitive decline. Importantly, the automated indices, particularly switching, predicted PD-MCI better than the experimenter-based ones, which provides additional support for using the automated method over the Troyer method.
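A switching index can be derived without an experimenter assigning subcategories. To illustrate this, the sketch below chains consecutive words into a cluster as long as the cosine similarity between their semantic vectors stays at or above a threshold, and it counts a switch whenever the similarity drops below that threshold. This is a generic, hypothetical rule for illustration only. It is not claimed to be the algorithm implemented in VfClust. The two-dimensional word vectors and the 0.5 threshold are invented toy values. In practice, vectors would come from a model such as LSA trained on a large corpus.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def automated_clusters(words, vectors, threshold=0.5):
    """Chain consecutive words while adjacent similarity stays >= threshold.

    vectors: dict mapping each word to a semantic vector (hypothetical here).
    threshold: hypothetical cut-off; a drop below it starts a new cluster
    and counts as one switch.
    """
    if not words:
        return [], 0
    clusters = [[words[0]]]
    for prev, curr in zip(words, words[1:]):
        if cosine(vectors[prev], vectors[curr]) >= threshold:
            clusters[-1].append(curr)  # still "close" to the previous word
        else:
            clusters.append([curr])  # similarity dropped: start a new cluster
    return clusters, len(clusters) - 1


# Toy 2-D vectors: first axis ~ "pet-like", second axis ~ "farm-like".
vectors = {
    "cat": [1.0, 0.1], "dog": [0.9, 0.2], "goldfish": [0.8, 0.3],
    "cow": [0.1, 1.0], "horse": [0.2, 0.9], "pig": [0.15, 0.95],
}
words = ["cat", "dog", "goldfish", "cow", "horse", "pig"]
clusters, switches = automated_clusters(words, vectors, threshold=0.5)
print(clusters, switches)
```

In this toy output, the similarity between "goldfish" and "cow" (about 0.44) falls below the threshold, so the list splits into a pet cluster and a farm cluster with one switch. The choice of threshold directly controls how many switches are counted.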
Other techniques allow SVF analyses that capture the semantic store with a finer-grained potential for dissociating its contribution. LSA-based methods, such as the one used here, have shown promise. Notably, CuRel provides a dimensional approach to analyzing the breadth of exploration of the semantic store across an entire SVF output. Previous work in AD and MCI showed it to be related to the language network as measured with fMRI, including left temporal, parietal, and frontal regions (Pakhomov et al., 2015), and to track and predict AD progression (Pakhomov et al., 2016; Pakhomov & Hemmy, 2014).
However, despite the initial hypothesis that CuRel could serve as an improved index of semantic knowledge, our findings did not support this notion, as CuRel did not correlate with measures of language or semantic knowledge. CuRel also did not correlate with executive function in our sample, although such a relationship was found in AD and MCI (Pakhomov et al., 2012). Although CuRel did not differentiate between diagnostic groups in this study, it has previously shown value in the long-term prediction of cognitive outcomes in AD (Pakhomov et al., 2016). Its value for predicting dementia in PD therefore warrants further investigation.
To conclude, we explored the utility of a novel LSA-based computational method in comparison with a traditional procedure that relies on the subjective categorization of words into semantic categories. The LSA-based method infers semantics by analyzing the statistical co-frequency of words in available linguistic data. We demonstrated that this approach outperformed the experimenter-dependent method in predicting cognitive decline in PD. Important advantages of this LSA-based method are its scalability and its comparability across applications in diverse clinical groups, languages, and cultures.
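A deliberately tiny example can illustrate the idea that semantic relatedness can be inferred from co-frequency. The sketch below represents each word by its raw counts across a few hypothetical "documents", which stand in for a corpus such as Wikipedia. It then compares words by cosine similarity, so words that tend to occur in the same documents come out as related. The sketch shows only the co-occurrence step that LSA builds on. LSA proper typically also applies a term weighting and a truncated singular value decomposition to the term-by-document matrix. Both steps are omitted here, so this is not the method used in this study.

```python
import math
from collections import Counter


def term_document_vectors(documents):
    """Map each word to its vector of raw counts, one entry per document."""
    counts = [Counter(doc.lower().split()) for doc in documents]
    vocabulary = sorted(set().union(*counts))
    return {word: [c[word] for c in counts] for word in vocabulary}


def cosine(u, v):
    """Cosine similarity between two equal-length count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Hypothetical mini-corpus standing in for a large text source.
documents = [
    "the cat and the dog live in the house",
    "a dog and a cat are common pets",
    "the cow and the pig live on the farm",
    "a farm has a cow a pig and a horse",
]
vectors = term_document_vectors(documents)

print(round(cosine(vectors["cat"], vectors["dog"]), 3))    # 1.0: same documents
print(round(cosine(vectors["cat"], vectors["cow"]), 3))    # 0.0: no shared documents
print(round(cosine(vectors["cow"], vectors["horse"]), 3))  # 0.707: partial overlap
```

With raw counts, two words are related only if they literally share documents. The dimensionality reduction in LSA is what allows words that never co-occur directly, but appear in similar contexts, to be judged as related.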
One limitation of this method is its reliance on a specific written text source (i.e., Wikipedia), which may not fully mirror participants’ internal semantic representations. Although we focused on a computational analysis of co-frequency, several other lexical factors that influence word retrieval (e.g., word frequency, familiarity, concreteness, source language) may prove useful in future investigations (Clark et al., 2016; Juhasz, Chambers, Shesler, Haber, & Kurtz, 2012; Reverberi et al., 2014; Taler, Johns, Young, Sheppard, & Jones, 2013). These factors are also amenable to computational analysis. Future research applying such computational methods is needed to maximize the information extracted from this widely used and simple clinical neuropsychological test.
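Word frequency is one lexical factor that is amenable to computational analysis. As an example, the sketch below scores an SVF output by the mean log-transformed frequency of its words. The frequency table, its values, the per-million unit, and the floor applied to unlisted words are all hypothetical. A real analysis would draw on published frequency norms and would need explicit rules for plurals, spelling variants, and repetitions.

```python
import math


def mean_log_frequency(words, frequency_per_million, floor=0.01):
    """Mean log10 frequency (per million words) of the items in one SVF output.

    Words missing from the table, or listed with a frequency below `floor`,
    are assigned `floor` so the logarithm is always defined (an arbitrary,
    hypothetical choice).
    """
    if not words:
        return float("nan")
    logs = [math.log10(max(frequency_per_million.get(w, 0.0), floor)) for w in words]
    return sum(logs) / len(logs)


# Hypothetical frequency norms (occurrences per million words).
norms = {"dog": 100.0, "cat": 10.0, "aardvark": 0.1}

print(round(mean_log_frequency(["dog", "cat", "aardvark"], norms), 3))  # 0.667
print(round(mean_log_frequency(["dog", "okapi"], norms), 3))  # 0.0 ("okapi" is unlisted)
```

A lower mean indicates that a participant retrieved rarer words. Like the co-occurrence measures, this kind of index could be computed automatically for every output, without experimenter judgment.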
Overall, our findings suggest that executive function plays an important role in SVF tasks and can be adequately operationalized with switching. MCS, however, may not be a good indicator of the contribution of the semantic memory component. Our findings also suggest that generative SVF tasks such as animal fluency are neither adequate tests of semantic memory in patient groups with executive dysfunction nor an index of posterior temporal dysfunction in PD.
ACKNOWLEDGMENTS
We thank Dr. Serguei Pakhomov for providing us with the VfClust program used to perform the automated analysis. The authors have no conflicts of interest to disclose. This study did not receive any financial support.
SUPPLEMENTARY MATERIALS
To view supplementary material for this article, please visit https://doi.org/10.1017/S1355617718000759