Imagery ability assessments: a cross-disciplinary systematic review and quality evaluation of psychometric properties
BMC Medicine volume 20, Article number: 166 (2022)
Over the last two centuries, researchers have developed several assessments to evaluate the multidimensional construct of imagery. However, no comprehensive systematic review (SR) of imagery ability evaluation methods, including an in-depth quality evaluation of their psychometric properties, exists.
We performed a comprehensive systematic search in six databases covering the disciplines of sport, psychology, medicine, and education: SPORTDiscus, PsycINFO, Cochrane Library, Scopus, Web of Science, and ERIC. Two reviewers independently identified and screened articles for selection. The COSMIN checklist was used to evaluate the methodological quality of the studies. All included assessments were then evaluated for quality using the criteria for good measurement properties. The evidence synthesis was summarised using the GRADE approach.
In total, 121 articles reporting 155 studies and describing 65 assessments were included. We categorised assessments based on their construct into: (1) motor imagery (n = 15), (2) mental imagery (n = 48), and (3) mental chronometry (n = 2). The methodological quality of the studies was mainly doubtful or inadequate, and the psychometric properties of most assessments were insufficient or indeterminate. The best-rated assessments with sufficient psychometric properties were the MIQ, MIQ-R, MIQ-3, and VMIQ-2 for the evaluation of motor imagery ability. Regarding mental imagery evaluation, only the SIAQ and VVIQ showed sufficient psychometric properties.
Various assessments exist to evaluate an individual’s imagery ability within different dimensions or modalities of imagery in different disciplines. However, the psychometric properties of most assessments are insufficient or indeterminate, and several assessments should be revised and further validated. Moreover, most assessments were evaluated only with students; further cross-disciplinary validation studies are needed that include older populations with a larger age range. Our findings allow clinicians, coaches, teachers, and researchers to select a suitable imagery ability assessment for their setting and goals based on information about the focus and quality of the assessments.
Systematic review registration
Imagery, defined as the representation and the accompanying experience of any sensory information without a direct external stimulus, or ‘seeing with the mind’s eye’ and ‘hearing with the mind’s ear’, is a fundamental cognitive process. For example, imagery can be helpful in decision-making or problem-solving processes, in emotion regulation, and for motor learning and performance. In sports, a strong imagery ability in athletes is associated with more successful performance [6, 7]. At the same time, several psychological disorders, such as posttraumatic stress disorder, depression, or social phobia, are associated with dysfunctions in imagery ability [8, 9]. In this context, the application of different imagery techniques has shown positive effects in the treatment of psychological disorders, in pain treatment (guided imagery), and in enhancing motor rehabilitation in patients with neurological and orthopaedic disorders [11,12,13,14,15,16,17,18], as well as in enhancing psychomotor skills or various aspects of performance in athletes (motor imagery). The benefits of imagery depend on the individual capability to imagine, and it is deemed essential to assess imagery abilities prior to interventions.
Imagery is a multidimensional construct with wide individual differences regarding imagery preference (verbal versus visual style), imagery control, and imagery vividness [23, 24]. The pioneering work of Betts in 1909 already described and measured the vividness of imagery in seven sensory modalities: visual, auditory, cutaneous, kinaesthetic, gustatory, olfactory, and organic (e.g. feeling or emotion). Further research focused on additional dimensions: imagery clarity [26, 27], controllability, the ease and accuracy with which an image can be manipulated mentally [29, 30], and imagery perspective [7, 31]. Moreover, studies in cognitive science and neuroscience [32, 33] assert that imagery is not unitary and distinguish two types: spatial imagery and object imagery. Object imagery is defined as representations of the visual appearance of objects or scenes in terms of their precise form, size, shape, and colour, whereas spatial imagery refers to more abstract representations of the spatial relations among objects, parts of objects, locations of objects in space, movements of objects and object parts, and other complex spatial transformations [34, 35].
Watt and Cumming et al. proposed a hierarchical model to explain the imagery process and the components of imagery ability in sports. However, their model does not include types of imagery. We have therefore revised this model and expanded it with the object and spatial types of imagery (Fig. 1).
The measurement of this multidimensional and multimodal construct has proven to be complex, and each type of assessment evaluates a different aspect of imagery ability. Over the past century, various assessments have been developed to evaluate an individual’s imagery ability, considering different dimensions, sensory modalities, perspectives, image manipulation, or the temporal coupling between real and imagined movements [7, 26, 27, 34, 40,41,42,43,44]. Most of these assessments are self-report questionnaires (subjective assessments) and focus on object imagery. In contrast, objective assessments focus more on spatial imagery. However, the literature lacks a systematic review of imagery evaluation methods and an evaluation of their measurement properties. Two previous narrative reviews [45, 46] and one systematic review mainly focused on assessments of a single imagery technique: motor imagery. In addition, these reviews only included assessments of motor imagery in the fields of neurology or sports. Further, only two reviews reported the assessments’ psychometric properties [45, 47]. White et al. evaluated self-report assessments of imagery, but all assessments developed or modified since then are missing from that review.
The aim of the present extensive and comprehensive systematic literature review was therefore to evaluate all available imagery ability assessments across four disciplines, regardless of the imagery technique used, to answer the question: What imagery ability assessments exist in the fields of sports, psychology, medicine, and education, and what are their psychometric properties? For the interested clinician, coach, teacher, and researcher, our review provides (1) a systematic classification of the imagery ability assessments based on their construct, (2) a summary of the current level of evidence for the psychometric properties of the selected imagery ability assessments, and (3) all specific characteristics of the imagery ability assessments: version, subscales, scoring, equipment needed, etc.
In order to provide a comprehensive overview, we included all assessments that cover any aspect of the imagery process and the ability to vividly generate, transform, inspect, and maintain a mental image. Moreover, we also included assessments that evaluate the frequency of imagery use, the preference to think in words or images, and the temporal coupling of mental and physical practice.
This systematic review provides interested readers with a quick overview to select an appropriate imagery ability assessment for their current setting and goals based on information provided regarding the focus and quality of the imagery ability assessments.
Study design and registration
The protocol for this review was registered with the International Prospective Register of Systematic Reviews (PROSPERO; https://www.crd.york.ac.uk/prospero/, registration number CRD42017077004) and published. The present systematic review was written and reported following the Preferred Reporting Items for Systematic reviews and Meta-Analyses (PRISMA) guidelines, the PRISMA checklist, and the PRISMA abstract checklist [50, 51]. Additionally, we followed the recommendations for systematic reviews on measurement properties [52, 53].
We searched four fields of interest: sports, psychology, medicine, and education. One author (ZS) and a librarian from the medical library of the University of Zurich independently performed the electronic search between September and October 2017 in SPORTDiscus (1892 to current date of search), PsycINFO (1887 to current date of search), Cochrane Library (current issue), Scopus (1996 to current date of search), Web of Science (1900 to current date of search), and ERIC (1966 to current date of search). The search strategy combined (1) construct: motor imagery, mental imagery, mental rehearsal, movement imagery, mental practice, mental training; (2) instrument: measure, questionnaire, scale, assessment; and (3) the filter for measurement properties by Terwee et al., adapted for each database (Additional file 1: AF_1_Example search strategy_ Web of Science). An update of the search in all databases was performed in January 2021.
There was no limitation on a specific population (e.g. healthy individuals, adults, children, and patients). Additionally, there was no restriction on age, gender, or health status. We included all original articles published in English and German, which either developed mental or motor imagery assessments or validated their psychometric properties.
Articles were excluded if the authors only used neurophysiological methods to evaluate imagery ability (e.g. functional magnetic resonance imaging, electroencephalography, or brain-computer interface technology).
Figure 2 provides an overview of all databases and identified references. All citations were imported into the reference management software EndNote (version X7; Thomson Reuters, New York, USA). De-duplication was performed by the librarian who had performed the original search. To examine agreement and disagreement regarding studies’ eligibility between the two reviewers (ZS and CSA) in the preselection phase, 10% of all articles were randomly selected and screened by both reviewers. After preselection, titles, abstracts, and full texts of all identified articles were independently screened. Full texts were ordered if no decision could be made based on the available information. If no full text was available, the corresponding authors of the articles were contacted to obtain the missing papers. Disagreements about selected full texts were discussed by both reviewers; had the two reviewers been unable to agree on a decision, a third reviewer would have been consulted to decide on inclusion or exclusion (which was not necessary in this review). The kappa statistic was calculated and interpreted in accordance with Landis and Koch’s benchmarks for inter-reviewer agreement: poor (< 0), slight (0.0 to 0.20), fair (0.21 to 0.40), moderate (0.41 to 0.60), substantial (0.61 to 0.80), and almost perfect (0.81 to 1.0). The percentage agreement between the raters was also calculated.
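The agreement statistics used here can be made concrete with a short sketch. The following Python snippet (illustrative only, with hypothetical screening decisions; not the software used in the review) computes Cohen's kappa, the percentage agreement, and the Landis and Koch label:

```python
def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa: chance-corrected agreement between two raters."""
    n = len(ratings_a)
    categories = set(ratings_a) | set(ratings_b)
    # Observed proportion of agreement
    p_o = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement from each rater's marginal proportions
    p_e = sum((ratings_a.count(c) / n) * (ratings_b.count(c) / n)
              for c in categories)
    return (p_o - p_e) / (1 - p_e)

def landis_koch(kappa):
    """Landis and Koch benchmark label for a kappa value."""
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"),
                         (0.80, "substantial"), (1.00, "almost perfect")]:
        if kappa <= upper:
            return label
    return "almost perfect"

def percentage_agreement(ratings_a, ratings_b):
    """Raw percentage agreement between the raters."""
    return 100 * sum(a == b for a, b in zip(ratings_a, ratings_b)) / len(ratings_a)

# Hypothetical include/exclude decisions of two reviewers for six articles
rev1 = ["include", "include", "include", "exclude", "exclude", "exclude"]
rev2 = ["include", "include", "exclude", "exclude", "exclude", "exclude"]
k = cohens_kappa(rev1, rev2)
print(round(k, 2), landis_koch(k))  # 0.67 substantial
```

Note that kappa corrects for chance agreement, which is why it is reported alongside the raw percentage agreement: two raters can agree 83% of the time here yet reach only "substantial" kappa.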
Four researchers (ZS, SG, LM, and VZ) performed the data extraction into Microsoft Excel (Version 14.0, 2010, Microsoft Corp., Redmond, California, USA). ZS checked all data for accuracy. The following data were extracted: (1) characteristics of included articles: first author, year of publication, country of origin, study design, and number and main characteristics of participants (e.g. age, gender, and target population); (2) general characteristics of the assessment instrument: name, language, version, construct of evaluation, number of items, subscales, scoring, assessment format, time and equipment needed, examiner qualifications, and costs; and (3) data on the psychometric properties of the assessments: validity, reliability, and responsiveness.
Studies’ methodological quality: risk of bias rating
Two researchers (ZS and CSA) independently carried out the COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN) evaluation. One study was evaluated by ZS and FB, because CSA was its first author. The COSMIN Risk of Bias checklist was applied to assess the methodological quality of studies on measurement properties. The checklist contains ten boxes with standards for Patient-Reported Outcome Measure (PROM) development and for nine measurement properties: content validity, structural validity, internal consistency, cross-cultural validity, reliability, measurement error, criterion validity, hypotheses testing for construct validity, and responsiveness. A 4-point rating system (‘very good’, ‘adequate’, ‘doubtful’, and ‘inadequate’) was used for study evaluation (Additional file 2: AF_2_COSMIN_RoB_checklist). The overall rating of the quality of each study was determined by the lowest rating of any standard in the box (the ‘worst score counts’ principle).
Quality assessment of included instruments and GRADE approach
Based on the quality criteria for measurement properties proposed by Terwee et al. and updated by Prinsen et al. (Table 1), the measurement properties reported in the included studies were rated as positive, negative, or indeterminate. However, no criteria are defined to assess the quality of structural validity when authors only performed an exploratory factor analysis (EFA). In this case, we followed the recommendations of de Vet et al., Izquierdo et al., and Watkins and considered: (1) the number of extracted factors; (2) the factor loadings, which should be > 0.40; (3) items loading ≥ 0.30 on at least two factors, which should be candidates for deletion; (4) the correlation between factors; and (5) the variance explained by the factors, which should be > 50%. Guidelines for judging the psychometric properties of imagery instruments by McKelvie were also taken into account if there were any uncertainties.
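The EFA criteria listed above can be expressed as a simple screening routine. This is an illustrative Python sketch (item names, loadings, and the function itself are hypothetical, not part of the review's actual workflow):

```python
def screen_efa(loadings, variance_explained):
    """Screen an exploratory factor analysis against the review's criteria:
    a primary loading should exceed 0.40, items loading >= 0.30 on two or
    more factors are candidates for deletion, and the total variance
    explained should exceed 50%."""
    report = {
        "variance_ok": variance_explained > 0.50,
        "weak_items": [],           # no loading above 0.40
        "cross_loading_items": [],  # load >= 0.30 on at least two factors
    }
    for item, row in loadings.items():
        if max(abs(l) for l in row) <= 0.40:
            report["weak_items"].append(item)
        if sum(abs(l) >= 0.30 for l in row) >= 2:
            report["cross_loading_items"].append(item)
    return report

# Hypothetical two-factor solution for three questionnaire items
loadings = {"item1": [0.72, 0.10], "item2": [0.35, 0.33], "item3": [0.12, 0.65]}
print(screen_efa(loadings, variance_explained=0.55))
```

In this toy example, "item2" is flagged twice: its highest loading (0.35) does not clear the 0.40 threshold, and it loads above 0.30 on both factors.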
Regarding the testing of construct validity, the reviewer team formulated hypotheses about the expected relations and differences between instruments:
Strong correlation (at least 0.50) was expected if a related construct was measured with the comparator instrument.
Correlation between different modalities or dimensions of imagery, e.g. between vividness and auditory imagery, should be very low (< 0.30).
Correlation between subjective and objective assessments of imagery ability should be very low (< 0.30).
Regarding known-group validity, based on previous evidence, no sex differences in imagery ability were expected.
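The correlation hypotheses above can be encoded as a small check. The sketch below is illustrative (the function name, relation labels, and example correlations are hypothetical); the thresholds are the ones stated in the hypotheses:

```python
def hypothesis_met(r, relation):
    """Check an observed correlation r against the reviewers' a priori hypotheses.
    relation is one of:
      'related_construct'       -> expect a strong correlation (|r| >= 0.50)
      'different_modality'      -> expect a very low correlation (|r| < 0.30)
      'subjective_vs_objective' -> expect a very low correlation (|r| < 0.30)
    """
    if relation == "related_construct":
        return abs(r) >= 0.50
    if relation in ("different_modality", "subjective_vs_objective"):
        return abs(r) < 0.30
    raise ValueError(f"unknown relation: {relation}")

# Hypothetical reported correlations from a validation study
results = [(0.62, "related_construct"),
           (0.18, "different_modality"),
           (0.41, "subjective_vs_objective")]
met = [hypothesis_met(r, rel) for r, rel in results]
print(met)  # [True, True, False]
```

In COSMIN-based reviews, construct validity is typically rated sufficient when at least 75% of the results are in accordance with the hypotheses, so checks like these are aggregated across all reported comparisons.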
Recently, a modified Grading of Recommendations Assessment, Development, and Evaluation (GRADE) approach for grading the quality of the evidence in systematic reviews of PROMs was introduced. Four of the five GRADE factors have been adopted for evaluating measurement properties in systematic reviews of PROMs: risk of bias (e.g. the methodological quality of the studies), inconsistency (e.g. unexplained inconsistency of results across studies), imprecision (e.g. total sample size of the available studies), and indirectness (e.g. evidence from populations other than the population of interest in the review). The GRADE approach was applied if studies evaluated the same instrument regarding language and version in the same population. Studies reporting psychometric properties of assessments tested with athletes and with students were not pooled. Using the modified GRADE approach, the quality of the evidence is graded as high, moderate, low, or very low (Table 2) [53, 64].
In total, 3922 references were retrieved in October 2017. The search update in January 2021 resulted in 1616 additional references. We identified 78 additional references through reference list screening. The kappa statistic after screening of titles and abstracts was 0.83 (almost perfect), and the percentage agreement between the raters was 98%. After full-text selection, the kappa was 0.76 (substantial) and the percentage agreement was 85%. All disagreements between the reviewers were discussed until consensus on a decision was reached.
Finally, 121 articles reporting 155 studies and describing 65 assessments from four disciplines were included in the present review. We categorised assessments based on their construct:
Motor imagery = movement imagery without engaging in its physical execution
Mental imagery, in four sub-categories:
General mental imagery in any sensory modality,
Spatial imagery or mental rotation = the ability to rotate or manipulate mental images,
Cognitive style = distinguishing between the use of different cognitive styles (e.g. verbal versus visual), and
Use of mental imagery (frequency of use in daily life).
Mental chronometry = the temporal coupling between real and imagined movements.
Most studies were carried out in the fields of psychology and sport. We identified many assessments that have been evaluated only with psychology students. It was therefore unclear whether those assessments should accordingly only be applied in the field of psychology; we defined such assessments as ‘not discipline specific’. Moreover, most studies evaluated several psychometric properties, and according to COSMIN, each evaluation of a measurement property was separately assessed for its methodological quality. The overall rating of the quality of each study was determined by taking the lowest rating of any standard in the box (the ‘worst score counts’ principle). Furthermore, it was difficult to define a reasonable ‘gold standard’ for assessing criterion validity. If the authors correlated the score of a new instrument with an already established, widely used, and well-known instrument, we considered the comparison a test of construct validity. Only if a shortened version was compared with the original version did we consider the comparison a test of criterion validity (as proposed by COSMIN).
Motor imagery assessments
In total, 33 out of the 121 articles focused on 15 motor imagery assessments: Florida Praxis Imagery Questionnaire (FPIQ), Imaprax, Kinesthetic and Visual Imagery Questionnaire (KVIQ-20) and short version KVIQ-10, Movement Imagery Questionnaire (MIQ), Revised Movement Imagery Questionnaire (MIQ-R), Movement Imagery Questionnaire-Revised second version (MIQ-RS), Movement Imagery Questionnaire-3 (MIQ-3), Movement Imagery Questionnaire for Children (MIQ-C), Test of Ability in Movement Imagery (TAMI), Test of Ability in Movement Imagery with Hands (TAMI-H), Vividness of Movement Imagery Questionnaire (VMIQ), Vividness of Haptic Movement Imagery Questionnaire (VHMIQ), Revised Vividness of Movement Imagery Questionnaire-2 (VMIQ-2) and the Wheelchair Imagery Ability Questionnaire (WIAQ). The characteristics of the included studies, their ‘risk of bias assessment/rating’, and their psychometric properties are presented in Tables 3 and 4. The general characteristics of included instruments are presented in the Additional file 3: Table 1S.
Motor imagery assessments: validity
Risk of bias rating
In total, 30 out of the 33 motor imagery articles reported structural, criterion, or construct validity. Only ten studies [6, 43, 73, 74, 77,78,79,80, 83, 89] were rated as very good or adequate, and 12 studies [27, 67,68,69, 75, 76, 82, 84, 85, 88, 92, 93] were rated as inadequate regarding their methodological quality. The ‘risk of bias assessment/rating’ could not be applied to the study by Hall et al. due to insufficient reporting on the statistical methods performed.
There is high evidence for sufficient structural validity of the MIQ-R, MIQ-3, and VMIQ-2. The MIQ-C also showed sufficient structural validity, but with moderate evidence (only one study of very good methodological quality). The construct validity of the MIQ and WIAQ was sufficient, but with low evidence (one study per assessment, of doubtful quality). The FPIQ and Imaprax were not evaluated for validity. Further, the structural and construct validity of the KVIQ (original and short versions) for different language versions ranged from insufficient to sufficient across studies. These psychometric properties were evaluated in different populations (e.g. healthy individuals, patients after a stroke, patients with Parkinson’s disease (PD) or multiple sclerosis (MS), and patients with orthopaedic problems). However, only one study per subgroup was identified, so pooling the data was not feasible. Furthermore, the construct validity of the KVIQ was sufficient in two studies (with PD or MS patients), but both studies had a very small sample size (N < 15) and were therefore downgraded for imprecision. Moreover, the structural and construct validity of the MIQ-RS, TAMI, TAMI-H, and VMIQ reported in several studies were rated as indeterminate.
Motor imagery assessments: reliability
Risk of bias rating
In total, 29 out of the 33 motor imagery articles reported development, internal consistency or test-retest reliability. Nine studies [7, 31, 73, 79,80,81,82, 85, 90] were rated as very good or adequate regarding their methodological quality. A total of 15 studies [27, 43, 67, 71, 72, 74,75,76, 78, 83, 84, 86,87,88,89] showed doubtful methodological quality and five studies [66, 68,69,70, 77] were rated as inadequate.
The test-retest reliability of several assessments was insufficient or indeterminate due to a lack of detail reported in the studies, e.g. how reliability was calculated. For example, the authors of several studies did not calculate the intraclass correlation coefficient (ICC) and stated only that a ‘reliability coefficient’ or ‘reliabilities’ were calculated, without specifying the type of coefficient (e.g. ICC, Pearson, or Spearman correlation). In most cases, internal consistency was insufficient or indeterminate due to low evidence for sufficient structural validity. Only the MIQ-R, MIQ-3, and VMIQ-2 revealed clearly sufficient internal consistency with high evidence (multiple studies of at least adequate methodological quality), corresponding to sufficient structural validity. The KVIQ showed sufficient test-retest reliability, but with low evidence; moreover, the results were summarised only for patients after a stroke.
Only two studies [76, 83] reported a sample size calculation. For the MIQ, MIQ-R, MIQ-3, VMIQ, VMIQ-2, KVIQ, and TAMI, the results were qualitatively summarised and reported in the Summary of Findings (SoF) Table (Additional file 4: Table 2S).
Mental imagery assessments
In total, 90 out of 121 articles reported mental imagery assessments. Based on their construct, we divided the assessments into three subgroups:
General mental imagery ability assessments (n = 24): Auditory Imagery Scale (AIS), Auditory Imagery Questionnaire (AIQ), Bucknell Auditory Imagery Scale (BAIS), Betts Questionnaire Upon Mental Imagery (150 items, QMI), Betts Questionnaire Upon Mental Imagery (shortened 35-item version, SQMI), Clarity of Auditory Imagery Scale (CAIS), Gordon Test of Visual Imagery Control (TVIC), Imaging Ability Questionnaire (IAQ), Imagery Questionnaire by Lane, Kids Imaging Ability Questionnaire (KIAQ), Mental Imagery Scale (MIS), Plymouth Sensory Imagery Questionnaire (Psi-Q), Sport Imagery Ability Measure (SIAM), Revised Sport Imagery Ability Measure (SIAM-R), Sport Imagery Ability Questionnaire (SIAQ), Survey of mental imagery, Visual Elaboration Scale (VES), Vividness of Olfactory Imagery Questionnaire (VOIQ), Vividness of Object and Spatial Imagery Questionnaire (VOSI), Vividness of Visual Imagery Questionnaire (VVIQ), revised version of the Vividness of Visual Imagery Questionnaire (VVIQ-2), Vividness of Visual Imagery Questionnaire-Revised version (VVIQ-RV), Vividness of Visual Imagery Questionnaire-Modified (VVIQ-M), Vividness of Wine Imagery Questionnaire (VWIQ).
Assessments to evaluate the ability to rotate or manipulate mental images (mental rotation) (n = 12): Card Rotation Test, Cube-cutting Task (CCT), German Test of the Controllability of Motor Imagery (TKBV), Hand laterality task, Judgement test of foot and trunk laterality, Map Rotation Ability Test (MRAT), Mental Paper Folding (MPF), Mental Rotation of Three-Dimensional Objects, Measure of the Ability to Form Spatial Mental Imagery (MASMI), Measure of the Ability to Rotate Mental Images (MARMI), Shoulder-specific left/right judgement task (LRJT), Spatial Orientation Skills Test (SOST).
Assessments of mental imagery to distinguish between the use of different cognitive styles (n = 7): Object-Spatial Imagery Questionnaire (OSIQ), Object-Spatial Imagery and Verbal Questionnaire (OSVIQ), Paivio’s Individual Differences Questionnaire (3 IDQ versions with 86 items, 72 items and 34 items), Sussex Cognitive Styles Questionnaire (SCSQ), Verbalizer-Visualizer Questionnaire (VVQ).
Assessments to evaluate use of imagery (n = 5): Children’s Active Play Imagery Questionnaire (CAPIQ), Exercise Imagery Questionnaire - Aerobic Version (EIQ-AV), Sport Imagery Questionnaire (SIQ), Sport Imagery Questionnaire for Children (SIQ-C), Spontaneous Use of Imagery Scale (SUIS).
Tables 5 and 6 present the characteristics of included studies, the ‘risk of bias assessment/rating’ and the psychometric properties. The general characteristics of included instruments as well as SoF are presented in Additional files 5 and 6: Tables 3S and 4S.
Mental imagery assessments: validity
Risk of bias rating
In total, 68 out of the 90 articles reported validity. A total of 18 studies [28, 42, 96, 102, 106, 111, 124, 125, 130, 141, 142, 146, 148, 150, 153, 157, 161, 166] were rated as very good or adequate and 21 studies [22, 35, 94, 98, 104, 109, 112, 115, 118, 119, 121, 127, 136, 145, 151, 152, 160, 162, 163, 165, 168] were rated as inadequate regarding their methodological quality.
The structural, construct, content, and criterion validity of most assessments were indeterminate due to a lack of detail reported in the studies regarding statistical methods and analyses (for more details see Tables 5 and 6). Some information about the performed factor analyses, such as the factor loadings from the EFA or the correlations between factors, was not reported. In other cases, the authors conducted an EFA in which several items loaded on more than one factor, which could indicate that these items should be deleted. However, for most assessments, a confirmatory factor analysis (CFA) to confirm the number of extracted factors is missing. For the rating of construct validity, the reviewers formulated their own hypotheses depending on the comparator instruments and the constructs measured. However, it was not possible to formulate a hypothesis in all cases, as in some studies the information on the comparator instrument and the construct to be measured was insufficient. Consequently, the construct validity was rated as indeterminate. Finally, only the SIAQ revealed sufficient structural and construct validity in several studies of at least adequate methodological quality. There is moderate evidence (two studies with at least adequate methodological quality) for sufficient structural validity of the SIQ. The SIQ-C, on the other hand, has low evidence for an insufficient rating of structural validity (only two studies with doubtful methodological quality available).
Mental imagery assessments: reliability
Risk of bias rating
In total, 74 out of the 90 articles reported reliability. A total of 34 studies [29, 94,95,96,97, 102, 103, 105,106,107, 111, 112, 116, 118, 119, 124,125,126, 133, 137,138,139,140, 142, 145, 148, 150, 152,153,154, 157, 158, 168, 169] were rated as very good or adequate. A total of 22 studies [30, 34, 35, 41, 42, 98, 99, 101, 104, 108, 114, 115, 121, 122, 129, 132, 141, 143, 146, 156, 160, 170] were rated as inadequate regarding their methodological quality.
The internal consistency (Cronbach’s alpha) values of most assessments were reported as very high. However, for a quality rating of internal consistency, the structural validity must also be taken into account, which ultimately led to an insufficient or indeterminate rating of this psychometric property. Another reason for an insufficient rating was that in several studies Cronbach’s alpha was calculated as a multidimensional total score and not for each subscale. Only the SIAQ showed sufficient internal consistency with high evidence (multiple studies of very good methodological quality). Test-retest reliability was insufficient or indeterminate for most assessments due to an inappropriate time interval between measurement sessions and poor reporting of the reliability coefficient calculation.
Only one study evaluated two assessments of mental chronometry: the Time-dependent motor imagery screening test (TDMI) and the Temporal Congruence Test (TCT) (Table 7). Both assessments showed sufficient test-retest reliability. No information about validity was provided. However, the methodological quality of this study was considered doubtful due to the small sample size.
Quality of studies and assessments
The aim of this systematic review was to evaluate all available assessments measuring individual imagery ability and their psychometric properties. Assessments were categorised based on their construct: motor imagery, mental imagery, and mental chronometry. A summary of the current level of evidence regarding the psychometric properties of the selected assessments is provided in Tables 3, 4, 5, 6, and 7. All specific characteristics of the included assessments are presented in the supplementary material (Tables S1 and S3). In total, 121 articles were included, reporting 155 studies that evaluated the psychometric properties of 65 assessments in four different disciplines. Articles reported data either on reliability or on validity. No study evaluated responsiveness, which is defined as the ability of an instrument to detect change over time in the construct to be measured. One possible reason for not reporting responsiveness might be that imagery ability or different imagery techniques are used for motor learning, to enhance performance, or to treat different psychological disorders. Hence, the outcome measured is not an improvement of imagery ability, and therefore responsiveness was not evaluated.
We included in our SR only assessments whose items solely focus on imagery ability. Assessments like the Sport Mental Training Questionnaire (SMTQ) were excluded, as the majority of their items focus on mental skills, such as performance, foundation, or interpersonal skills; only three items of the SMTQ focus on imagery ability.
The methodological quality of most included studies was rated low. Reasons for this rating included a small sample size, inadequate statistical analysis, or insufficient reporting. In particular, several studies calculated Cronbach’s alpha as a multidimensional total score for internal consistency rather than for each subscale of the assessment. This lack of reporting can lead to inaccuracy, because it is important to know the degree of inter-item correlation among the items of each subscale. Furthermore, some studies calculated the split-half reliability to report internal consistency. With this method, the correlation coefficient may not represent an accurate measure of reliability, because splitting a single scale into two scales decreases the reliability of the measure as a whole. As proposed by COSMIN, we recommend calculating and reporting the internal consistency coefficient (usually Cronbach’s alpha for continuous scores) for each subscale separately. Specifically for structural validity, the authors did not report all details about the number of factors extracted by the EFA, the correlations among factors, the rotation methods applied, and the model fits from the CFA (if performed). Furthermore, regarding construct validity, in some cases no information about the comparator instrument was available, so the reviewers could not formulate a hypothesis to evaluate construct validity. Regarding test-retest reliability, several studies calculated Pearson’s or Spearman’s correlation coefficient instead of an ICC. COSMIN recommends calculating the ICC with a two-way random-effects model, as this takes into account the variance within individuals and between time points (e.g. systematic differences). Pearson’s and Spearman’s correlation coefficients do not take systematic error into account.
Moreover, the time interval for test-retest reliability was sometimes not appropriate (more than 3 weeks apart), which could explain a low (< 0.70) correlation coefficient.
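The reliability coefficients discussed above can be made concrete. The following is a minimal sketch in Python (NumPy only; the function names are ours, for illustration) of Cronbach’s alpha computed for a single subscale and of an ICC from a two-way random effects model (ICC(2,1), absolute agreement, single measure):

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for one subscale.

    items: (n_subjects, k_items) array. Compute this per subscale,
    not over the pooled items of a multidimensional assessment.
    """
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars / total_var)

def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single measure.

    scores: (n_subjects, k_timepoints) array. Unlike Pearson's r, the
    column (time point) variance term penalises systematic shifts
    between test and retest.
    """
    x = np.asarray(scores, dtype=float)
    n, k = x.shape
    grand = x.mean()
    msr = k * ((x.mean(axis=1) - grand) ** 2).sum() / (n - 1)  # subjects
    msc = n * ((x.mean(axis=0) - grand) ** 2).sum() / (k - 1)  # time points
    sse = ((x - grand) ** 2).sum() - (n - 1) * msr - (k - 1) * msc
    mse = sse / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
```

For example, a retest in which every participant scores exactly one point higher than at test (a uniform practice effect) yields a Pearson’s r of 1.0 but an ICC(2,1) of only about 0.67, which illustrates why COSMIN prefers the ICC for test-retest reliability.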
One possible reason for poor reporting is that the majority of the instruments were developed during the early 1990s, whereas practical guides for conducting and reporting such studies were published much later [52, 57, 58, 64, 174].
Further, reporting deficits in the selected studies resulted in only substantial agreement on the kappa statistic calculated between the ratings of ZS and CSA after full-text selection. For example, some reports did not use the usual terms for psychometric properties when describing the study aim [129, 167]. This caused confusion for the two reviewers (ZS and CSA) when determining which psychometric properties had been evaluated.
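The chance-corrected agreement statistic referred to here can be sketched as follows (Python, NumPy only; an illustrative implementation, not the review’s actual computation). On the Landis and Koch scale, 0.61–0.80 counts as “substantial” and 0.81–1.00 as “almost perfect” agreement:

```python
import numpy as np

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters' categorical decisions
    (e.g. include/exclude after full-text screening)."""
    a = np.asarray(rater_a)
    b = np.asarray(rater_b)
    categories = np.union1d(a, b)
    p_observed = (a == b).mean()
    # expected chance agreement from the raters' marginal proportions
    p_expected = sum((a == c).mean() * (b == c).mean() for c in categories)
    return (p_observed - p_expected) / (1 - p_expected)
```

Kappa is 1.0 for perfect agreement and 0.0 when the observed agreement is no better than chance, which is why it is a stricter summary of reviewer concordance than raw percent agreement.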
The psychometric properties of most assessments regarding construct validity (e.g. correlation with other measures) and criterion validity were rated as indeterminate or insufficient. These findings correspond to previous studies [39, 48]. A possible explanation is that most of these questionnaires are self-reports in which individuals rate the ease or vividness of their imagery on a Likert scale. There are no references or standards against which reports of imagery experience can be validated. This is not trivial, considering that the idea of what a vivid image is can vary greatly from person to person. Moreover, objective and subjective assessments showed low correlations with each other, suggesting that these two types of measures are not closely related; previous studies reported the same findings [22, 34, 35]. The structural validity of most assessments was also considered indeterminate or insufficient. For example, in several studies evaluating the Betts Questionnaire, the GTVIC, or the CAIS, only an EFA was conducted and reported, and the number of extracted factors varied greatly depending on the method of analysis used; no study conducted a CFA to confirm the number of factors identified. Further, the evaluation of the Betts Questionnaire by various studies [102, 104, 161] showed that some items seem to be unstable on the kinaesthetic and visual scales and should be removed. This is noteworthy, as most of the other assessments for measuring individual differences in imagery were developed based on the Betts Questionnaire as a pioneer assessment, whose structural validity may itself be considered indeterminate.
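How strongly the “number of extracted factors” depends on the retention rule chosen can be illustrated with two common criteria. A minimal sketch in Python (NumPy only; function names are ours, for illustration) of the Kaiser eigenvalue-greater-than-one criterion and Horn’s parallel analysis:

```python
import numpy as np

def eigenvalues(data):
    """Eigenvalues of the item correlation matrix, sorted descending."""
    corr = np.corrcoef(np.asarray(data, dtype=float), rowvar=False)
    return np.sort(np.linalg.eigvalsh(corr))[::-1]

def kaiser_count(data):
    """Kaiser criterion: retain factors with eigenvalue > 1."""
    return int((eigenvalues(data) > 1.0).sum())

def parallel_count(data, n_sim=100, seed=0):
    """Horn's parallel analysis: retain factors whose eigenvalue exceeds
    the mean eigenvalue of equally sized random-normal data."""
    x = np.asarray(data, dtype=float)
    rng = np.random.default_rng(seed)
    sims = np.array([eigenvalues(rng.standard_normal(x.shape))
                     for _ in range(n_sim)])
    return int((eigenvalues(x) > sims.mean(axis=0)).sum())
```

With a clean two-factor structure the rules agree, but with weak or noisy loadings the Kaiser criterion typically retains more factors than parallel analysis, which is one reason reported EFA solutions diverge across studies unless a CFA confirms the structure.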
Almost all studies, when reporting psychometric properties of the comparator or ‘gold standard’ instrument, reported only reliability (e.g. internal consistency), which was in most cases very high. Such instruments often lacked evidence for structural or criterion validity, but the authors did not discuss this critically. In addition, most studies were conducted only with students aged 12–28 years, who received course credit for participation.
The best-rated assessments with sufficient psychometric properties were the MIQ, MIQ-R, MIQ-3, and VMIQ-2 for the evaluation of motor imagery ability. They are mostly applied in the field of sport. All are self-reports, very easy to use, and evaluate vividness in two modalities: visual and kinaesthetic. Moreover, the MIQ-3 and VMIQ-2 also evaluate the perspective used during imagination: external or internal. The MIQ-3 has been translated into several languages, which enables wide use. The SIAQ, as a mental imagery assessment in sport, showed sufficient psychometric properties, but it cannot distinguish between ease of imaging and vividness. The VVIQ was evaluated only with psychology students, and only its internal consistency was sufficient. In the field of medicine, the KVIQ is the most evaluated assessment, focusing on vividness in two modalities: visual and kinaesthetic. The original version, the KVIQ-20, has been translated into several languages, but due to the number of items, applying it can be quite time-consuming. Structural validity is particularly critical, and further studies with large sample sizes and the use of a CFA are needed. Although all assessments described above are self-reports, easy to use, and cost-effective, a general limitation is that they do not allow controlling for imagery ability before or during an experiment.
Our results demonstrate that a number of published instruments exist for measuring imagery ability in different disciplines. We categorised all assessments based on their construct, making a clear differentiation between the terms ‘motor imagery’ and ‘mental imagery’, which are often confused in the literature.
Limitations regarding the COSMIN recommendations
As proposed by COSMIN, sample sizes are not taken into account when assessing study quality in terms of reliability. It is recommended, however, that sample size be taken into account at a later step of the review process, when the results of all available studies can be summarised (e.g. as imprecision, which refers to the total sample size). Hence, the evidence pooled from many small studies can together provide strong evidence for good reliability. In our review, however, it was not possible to pool or qualitatively summarise the results of all small studies (n ≤ 30) because of their different patient subgroups, different language versions, and inconsistent results. Therefore, we downgraded every study with a small sample size for imprecision as having a risk of bias, using the ‘other flaws’ option to take this into account. For other psychometric properties, such as content validity or structural validity, standards concerning the sample size exist. However, some measures were developed and evaluated only for a specific population (e.g. patients) [68, 69]. A large sample size is then often not feasible, but robust data can be expected due to homogeneity. In cases where we judged the sample size to be low, most of these studies were of inadequate methodological quality [67,68,69]. On the other hand, several studies with a large sample size (e.g. students), in which the target population for a specific measure was not clearly described, were rated as ‘adequate’ or ‘very good’ [141, 142].
In our opinion, evaluation following the COSMIN guideline should differentiate more clearly between studies with healthy individuals (students, athletes, etc.) and studies with patients.
Systematic review limitations and strengths
A limitation of our systematic review is that we did not focus on the content validity of the evaluated assessments. We rated content validity only when the authors specified it as one of their study aims and included a sufficient description of the procedures performed. However, there were some questionnaire development studies that could be considered to assess content validity. Nevertheless, most of these studies lacked important information about whether the target population was asked about the relevance, comprehensiveness, and comprehensibility of the questionnaire under development; the authors focused on reporting the validation steps. Therefore, we could not determine whether the evaluation of content validity was not performed or simply not reported. Furthermore, we used the COSMIN evaluation tool, a widely accepted and valid tool for rating the methodological quality of studies. However, the COSMIN evaluation of methodology is strictly based on the information published in the studies. As most identified articles were published more than 20 years ago, authors could not be contacted to request additional details, so some ‘doubtful’ ratings could have been inequitable. In addition, our search was limited to English or German, so relevant articles may have been excluded. We applied the filter published by Terwee et al. and adapted it for each database. However, we identified many articles by screening reference lists. The main reason our filter did not find these articles is that measurement properties are sometimes poorly reported in the abstract, and some authors did not use any commonly used term for measurement properties in the title or abstract of their article. There is large variation in terminology for measurement properties; for reliability, for example, many synonyms can be found in the literature (e.g. reproducibility, repeatability, precision, variability, consistency, dependability, stability, agreement, and measurement error). However, the composition of the search strategy and the search itself were conducted by a professional research librarian from the University of Zurich in accordance with the review protocol, providing a comprehensive search and detailed knowledge of the different databases in all four disciplines. The search was therefore easily reproduced and verified by ZS, resulting in the same number of identified records. Moreover, all references were selected by two authors (ZS and CSA), and several reviewers extracted and double-checked all data from the included articles, which limited the risk of errors in the extraction process.
Over the last century, various assessments have been developed to evaluate an individual’s imagery ability within different dimensions or modalities of imagery: vividness or image clarity, controllability, the ease and accuracy with which an image can be mentally manipulated, the perspective used, the frequency of imagery use, and imagery preferences (verbal or visual style). However, the validity of many assessments is insufficient or indeterminate. Although the reliability, in particular the internal consistency, of most assessments was reported as high (Cronbach’s alpha > 0.70), this property should also be regarded critically because of insufficient or indeterminate structural validity. Furthermore, following the COSMIN recommendations, most studies were classified as inadequate or doubtful due to small sample sizes, inadequate statistical analyses, or insufficient reporting. Most studies were conducted with young students, and further studies are needed in other fields and with wider age ranges.
Despite the limitations described, the present systematic review enables clinicians, coaches, teachers, and researchers to select a suitable imagery ability assessment for their settings and goals based on information provided regarding the assessment’s focus and quality.
Availability of data and materials
For the present systematic literature review, we used data from already published articles. All data from our further analysis can be found within the report.
Abbreviations
CFA: Confirmatory factor analysis
COSMIN: COnsensus-based Standards for the selection of health Measurement Instruments
EFA: Exploratory factor analysis
GRADE: Grading of Recommendations Assessment, Development, and Evaluation
PROMs: Patient-Reported Outcome Measures
SoF: Summary of Findings
Pearson J, Naselaris T, Holmes EA, Kosslyn SM. Mental imagery: functional mechanisms and clinical applications. Trends Cogn Sci. 2015;19(10):590–602.
Kosslyn SM, Ganis G, Thompson WL. Neural foundations of imagery. Nat Rev Neurosci. 2001;2(9):635–42.
Ghaem O, Mellet E, Crivello F, Tzourio N, Mazoyer B, Berthoz A, et al. Mental navigation along memorized routes activates the hippocampus, precuneus, and insula. Neuroreport. 1997;8(3):739–44.
Dalgleish T, Navrady L, Bird E, Hill E, Dunn BD, Golden A-M. Method-of-loci as a mnemonic device to facilitate access to self-affirming personal memories for individuals with depression. Clin Psychol Sci. 2013;1(2):156–62.
Lotze M, Halsband U. Motor imagery. J Physiol Paris. 2006;99(4-6):386–95.
Robin N, Dominique L, Toussaint L, Blandin Y, Guillot A, Her ML. Effects of motor imagery training on service return accuracy in tennis: the role of imagery ability. Int J Sport Exerc Psychol. 2007;5(2):175–86.
Roberts R, Callow N, Hardy L, Markland D, Bringer J. Movement imagery ability: development and assessment of a revised version of the vividness of movement imagery questionnaire. J Sport Exerc Psychol. 2008;30(2):200–21.
Blackwell SE. Mental imagery: from basic research to clinical practice. J Psychother Integration. 2019;29(3):235–47.
Pearson DG, Deeprose C, Wallace-Hadrill SM, Burnett Heyes S, Holmes EA. Assessing mental imagery in clinical psychology: a review of imagery measures and a guiding framework. Clin Psychol Rev. 2013;33(1):1–23.
Graffam S, Johnson A. A comparison of two relaxation strategies for the relief of pain and its distress. J Pain Symptom Manage. 1987;2(4):229–31.
Braun S, Kleynen M, van Heel T, Kruithof N, Wade D, Beurskens A. The effects of mental practice in neurological rehabilitation; a systematic review and meta-analysis. Front Hum Neurosci. 2013;7:390.
Zimmermann-Schlatter A, Schuster C, Puhan MA, Siekierka E, Steurer J. Efficacy of motor imagery in post-stroke rehabilitation: a systematic review. J Neuroeng Rehabil. 2008;5:8.
Cramer SC, Orr EL, Cohen MJ, Lacourse MG. Effects of motor imagery training after chronic, complete spinal cord injury. Exp Brain Res. 2007;177(2):233–42.
Lebon F, Guillot A, Collet C. Increased muscle activation following motor imagery during the rehabilitation of the anterior cruciate ligament. Appl Psychophysiol Biofeedback. 2012;37(1):45–51.
Marusic U, Grospretre S, Paravlic A, Kovac S, Pisot R, Taube W. Motor imagery during action observation of locomotor tasks improves rehabilitation outcome in older adults after total hip arthroplasty. Neural Plasticity. 2018;2018:9.
Cupal DD, Brewer BW. Effects of relaxation and guided imagery on knee strength, reinjury anxiety, and pain following anterior cruciate ligament reconstruction. Rehabil Psychol. 2001;46(1):28–43.
Christakou A, Zervas Y, Lavallee D. The adjunctive role of imagery on the functional rehabilitation of a grade II ankle sprain. Hum Mov Sci. 2007;26(1):141–54.
Sordoni C, Hall C, Forwell L. The use of imagery by athletes during injury rehabilitation. J Sport Rehabil. 2000;9(4):329–38.
Martin KA, Moritz SE, Hall CR. Imagery use in sport: a literature review and applied model. Sport Psychol. 1999;13(3):245–68.
Munzert J, Krüger B. Motor and visual imagery in sports; 2013. p. 319–41.
Cumming J, Ramsey R, Mellalieu S, Hanton S. Imagery interventions in sport. Advances in applied sport psychology: a review; 2009. p. 5–36.
Lequerica A, Rapport L, Axelrod BN, Telmet K, Whitman RD. Subjective and objective assessment methods of mental imagery control: construct validation of self-report measures. J Clin Exp Neuropsychol. 2002;24(8):1103–16.
Galton F. Inquiries into human faculty and its development. MacMillan Co. 1883. https://doi.org/10.1037/14178-000.
Hall CR. Individual differences in the mental practice and imagery of motor skill performance. Can J Appl Sport Sci. 1985;10(4):17–21.
Betts GH. The distribution and functions of mental imagery. New York: Teachers College, Columbia University; 1909. p. 112.
Marks DF. Visual imagery differences in the recall of pictures. Br J Psychol. 1973;64(1):17–24.
Isaac A, Marks DF, Russell DG. An instrument for assessing imagery of movement: The Vividness of Movement Imagery Questionnaire (VMIQ). J Ment Imagery. 1986;10(4):23–30.
McKelvie SJ. Consistency of interform content for the Gordon Test of Visual Imagery Control. Percept Mot Skills. 1992;74(3 Pt 2):1107–12.
Schott N. German test of the controllability of motor imagery in older adults. Zeitschrift Gerontol Geriatr. 2013;46(7):663–72.
Hirschfeld G, Thielsch MT, Zernikow B. Reliabilities of mental rotation tasks: limits to the assessment of individual differences. Biomed Res Int. 2013;2013:340568. https://doi.org/10.1155/2013/340568.
Williams SE, Cumming J, Ntoumanis N, Nordin-Bates SM, Ramsey R, Hall C. Further validation and development of the movement imagery questionnaire. J Sport Exerc Psychol. 2012;34(5):621–46.
Kosslyn SM. Image and brain: the resolution of the imagery debate. Cambridge: MIT Press; 1994.
Kosslyn SM, Koenig OM. Wet mind—the new cognitive neuroscience. New York: Free Press; 1992. p. 13.
Blajenkova O, Kozhevnikov M, Motes MA. Object-spatial imagery: new self-report imagery questionnaire. Appl Cogn Psychol. 2006;20(2):239–63.
Blazhenkova O, Kozhevnikov M. The New Object-Spatial-Verbal Cognitive Style Model: theory and measurement. Appl Cogn Psychol. 2009;23(5):638–63.
Watt A. Development and validation of the sport imagery ability measure: Doctoral dissertation, Victoria University of Technology; 2003. Retrieved from http://citeseerx.ist.psu.edu
Cumming J, Eaves DL. The nature, measurement, and development of imagery ability. Imagination Cogn Pers. 2018;37(4):375–93.
Durio HF. The measurement of mental imagery ability [microform]: single or multidimensional construct? Washington, D.C.: Distributed by ERIC Clearinghouse; 1979.
McAvinue LP, Robertson IH. Measuring visual imagery ability: a review. Imagination Cogn Pers. 2007;26(3):191–211.
Galton F. Statistics of mental imagery. Mind. 1880;os-V(19):301–18.
Sheehan PW. A shortened form of Betts’ questionnaire upon mental imagery. J Clin Psychol. 1967;23(3):386–9.
Kwekkeboom KL. Measuring imaging ability: psychometric testing of the imaging ability questionnaire. Res Nurs Health. 2000;23(4):301–9.
Malouin F, Richards CL, Jackson PL, Lafleur MF, Durand A, Doyon J. The kinesthetic and visual imagery questionnaire (KVIQ) for assessing motor imagery in persons with physical disabilities: a reliability and construct validity study. J Neurol Phys Ther. 2007;31(1):20–9.
Malouin F, Richards CL, Durand A, Doyon J. Reliability of mental chronometry for assessing motor imagery ability after stroke. Arch Phys Med Rehabil. 2008;89(2):311–9.
McAvinue LP, Robertson IH. Measuring motor imagery ability: a review. Eur J Cogn Psychol. 2008;20(2):232–51.
Di Rienzo F, Collet C, Hoyek N, Guillot A. Impact of neurologic deficits on motor imagery: a systematic review of clinical evaluations. Neuropsychol Rev. 2014;24(2):116–47.
Melogno-Klinkas M, Nunez-Nagy S, Ubillos S. Outcome measures on motor imagery ability: use in neurorehabilitation. In: The 2nd International Congress on Neurorehabilitation and Neural Repair: 2017; Maastricht, Netherlands; 2017. p. 172.
White K, Sheehan PW, Ashton R. Imagery assessment: a survey of self-report measures. J Ment Imagery. 1977;1(1):145–69.
Suica Z, Platteau-Waldmeier P, Koppel S, Schmidt-Trucksaess A, Ettlin T, Schuster-Amft C. Motor imagery ability assessments in four disciplines: protocol for a systematic review. BMJ Open. 2018;8(12):e023439.
Moher D, Liberati A, Tetzlaff J, Altman DG. Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. PLoS Med. 2009;6(7):e1000097.
Page MJ, McKenzie JE, Bossuyt PM, Boutron I, Hoffmann TC, Mulrow CD, et al. The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. Br Med J. 2021;372:n71.
De Vet H, Terwee C, Mokkink L, Knol D. Measurement in Medicine: A Practical Guide (Practical Guides to Biostatistics and Epidemiology). Cambridge: Cambridge University Press; 2011. https://doi.org/10.1017/CBO9780511996214.
Prinsen CAC, Mokkink LB. COSMIN guideline for systematic reviews of patient-reported outcome measures. Qual Life Res. 2018;27(5):1147–57.
Terwee CB, Jansma EP, Riphagen II, de Vet HC. Development of a methodological PubMed search filter for finding studies on measurement properties of measurement instruments. Qual Life Res. 2009;18(8):1115–23.
Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics. 1977;33(1):159–74.
McHugh ML. Interrater reliability: the kappa statistic. Biochem Med. 2012;22:276–82.
Mokkink LB, de Vet HCW, Prinsen CAC, Patrick DL, Alonso J, Bouter LM, et al. COSMIN Risk of Bias checklist for systematic reviews of Patient-Reported Outcome Measures. Qual Life Res. 2018;27(5):1171–9.
Terwee CB, Mokkink LB, Knol DL, Ostelo RW, Bouter LM, de Vet HC. Rating the methodological quality in systematic reviews of studies on measurement properties: a scoring system for the COSMIN checklist. Qual Life Res. 2012;21(4):651–7.
Terwee CB, Bot SD, de Boer MR, van der Windt DA, Knol DL, Dekker J, et al. Quality criteria were proposed for measurement properties of health status questionnaires. J Clin Epidemiol. 2007;60(1):34–42.
Prinsen CAC, Vohra S, Rose MR, Boers M, Tugwell P, Clarke M, et al. How to select outcome measurement instruments for outcomes included in a “Core Outcome Set” – a practical guideline. Trials. 2016;17(1):449.
Izquierdo I, Olea J, Abad FJ. Exploratory factor analysis in validation studies: uses and recommendations. Psicothema. 2014;26(3):395–400.
Watkins MW. Exploratory factor analysis: a guide to best practice. J Black Psychol. 2018;44(3):219–46.
McKelvie SJ. Guidelines for judging psychometric properties of imagery questionnaires as research instruments: a quantitative proposal. Percept Mot Skills. 1994;79(3):1219–31.
Mokkink LB, Prinsen CA, Patrick DL, Alonso J, Bouter LM, de Vet HC, et al. COSMIN methodology for systematic reviews of Patient-Reported Outcome Measures (PROMs)- user Manual; 2018.
Ochipa C, Rapcsak SZ, Maher LM, Gonzales Rothi LJ, Bowers D, Heilman KM. Selective deficit of praxis imagery in ideomotor apraxia. Neurology. 1997;49:474–80.
Fournier J. Imagix: multimedia software for evaluating the vividness of movement imagery. Percept Mot Skills. 2000;90:367–70.
Schuster C, Lussi A, Wirth B, Ettlin T. Two assessments to evaluate imagery ability: Translation, test-retest reliability and concurrent validity of the German KVIQ and Imaprax. BMC Med Res Methodol. 2012;12(1):1–3.
Randhawa B, Harris S, Boyd LA. The kinesthetic and visual imagery questionnaire is a reliable tool for individuals with Parkinson disease. J Neurol Phys Ther. 2010;34(3):161–7.
Tabrizi MY, Zangiabadi N, Mazhari S, Zolala F. The reliability and validity study of the Kinesthetic and Visual Imagery Questionnaire in individuals with Multiple Sclerosis. Brazilian J Phys Ther. 2013;17(6):588–92.
Demanboro A, Sterr A, dos Anjos SM, Conforto AB. A Brazilian-Portuguese version of the Kinesthetic and Visual Motor Imagery Questionnaire. Arq Neuro Psiquiatr. 2018;76(1):26–31.
Nakano H, Kodama T, Ukai K, Kawahara S, Horikawa S, Murata S. Reliability and validity of the Japanese version of the Kinesthetic and Visual Imagery Questionnaire (KVIQ). Brain Sci. 2018;8(5):79.
Hall C, Pongrac J, Buckholz E. The measurement of imagery ability. Hum Mov Sci. 1985;4(2):107–18.
Atienza F, Balaguer I, Garcia-Merita ML. Factor analysis and reliability of the Movement Imagery Questionnaire. Percept Mot Skills. 1994;78(3 Pt 2):1323–8.
Monsma EV, Short SE, Hall CR, Gregg M, Sullivan P. Psychometric properties of the revised Movement Imagery Questionnaire (MIQ-R). J Imagery Res Sport Phys Act. 2009;4(1). https://doi.org/10.2202/1932-0191.1027.
Gregg M, Hall C, Butler A. The MIQ-RS: a suitable option for examining movement imagery ability. Evid Based Complement Altern Med. 2010;7(2):249–57.
Butler AJ, Cazeaux J, Fidler A, Jansen J, Lefkove N, Gregg M, et al. The movement imagery questionnaire-revised, second edition (MIQ-RS) is a reliable and valid tool for evaluating motor imagery in stroke populations. Evid Based Complement Altern Med. 2012;2012:497289.
Loison B, Moussaddaq AS, Cormier J, Richard I, Ferrapie AL, Ramond A, et al. Translation and validation of the French Movement Imagery Questionnaire - Revised Second version (MIQ-RS). Ann Phys Rehabil Med. 2013;56(3):157–73.
Budnik-Przybylska D, Szczypinska M, Karasiewicz K. Reliability and validity of the Polish version of the Movement Imagery Questionnaire-3 (MIQ-3). Curr Issues Pers Psychol. 2016;4(4):253–67.
Paravlić A, Pišot S, Mitić P. Validation of the Slovenian version of motor imagery questionnaire 3 (MIQ-3): promising tool in modern comprehensive rehabilitation practice. Slovenian J Public Health. 2018;57(4):201–10.
Dilek B, Ayhan C, ve Yakut Y. Reliability and validity of the Turkish version of the movement imagery questionnaire-3: Its cultural adaptation and psychometric properties. Neurol Sci Neurophysiol. 2020;37(4):221-7. https://doi.org/10.4103/NSN.NSN_30_20.
Robin N, Coudevylle GR, Dominique L, Rulleau T, Champagne R, Guillot A, Toussaint L. Translation and validation of the movement imagery questionnaire-3 second French version. J Bodyw Mov Ther. 2021;28:540-6. https://doi.org/10.1016/j.jbmt.2021.09.004.
Trapero-Asenjo S, Gallego-Izquierdo T, Pecos-Martín D, Nunez-Nagy S. Translation, cultural adaptation, and validation of the Spanish version of the Movement Imagery Questionnaire-3 (MIQ-3). Musculoskelet Sci Pract. 2021;51:102313.
Martini R, Carter MJ, Yoxon E, Cumming J, Ste-Marie DM. Development and validation of the Movement Imagery Questionnaire for Children (MIQ-C). Psychol Sport Exerc. 2016;22:190–201.
Madan CR, Singhal A. Introducing TAMI: an objective test of ability in movement imagery. J Motor Behav. 2013;45(2):153–66.
Campos A, López A, Pérez MJ. Vividness of visual and haptic imagery of movement. Percept Mot Skills. 1998;87(1):271–4.
Eton DT, Gilner FH, Munz DC. The measurement of imagery vividness: a test of the reliability and validity of the Vividness of Visual Imagery Questionnaire and the Vividness of Movement Imagery Questionnaire. J Ment Imagery. 1998;22(3-4):125–36.
Ziv G, Lidor R, Arnon M, Zeev A. The Vividness of Movement Imagery Questionnaire (VMIQ-2) - translation and reliability of a Hebrew version. Israel J Psychiatry Relat Sci. 2017;54(2):48–52.
Qwagzeh A, Albtoush A, Alzoubi M, Aldeghidi M, Al-Awamleh A. A comparison of movement imagery ability among undergraduates sport students. Sport Sci. 2018;11:92–6.
Dahm SF, Bart VKE, Pithan JM, Rieger M. Deutsche Übersetzung und Validierung des VMIQ-2 zur Erfassung der Lebhaftigkeit von Handlungsvorstellungen. Zeitschrift Sportpsychol. 2019;26(4):151–8.
Faull AL, Jones ES. Development and validation of the Wheelchair Imagery Ability Questionnaire (WIAQ) for use in wheelchair sports. Psychol Sport Exerc. 2018;37:196–204.
Hall CR, Martin KA. Measuring movement imagery abilities: A revision of the Movement Imagery Questionnaire. Journal of Mental Imagery. 1997;21(1-2):143–54.
Madan CR, Singhal A. Improving the TAMI for use with athletes. J Sports Sci. 2014;32(14):1351–6.
Donoff CM, Madan CR, Singhal A. Handedness effects of imagined fine motor movements. Laterality. 2018;23(2):228-48. https://doi.org/10.1080/1357650X.2017.1354870.
Gissurarson LR. Reported auditory imagery and its relationship with visual imagery. J Ment Imagery. 1992;16(3-4):117–22.
Campos A. A research note on the factor structure, reliability, and validity of the Spanish Version of Two Auditory Imagery Measures. Imagination Cogn Pers. 2017;36(3):301–11.
Campos A. Spatial imagery: a new measure of the visualization factor. Imagination Cogn Pers. 2009;29(1):31–9.
Halpern AR. Differences in auditory imagery self-report predict neural and behavioral outcomes. Psychomusicol Music Mind Brain. 2015;25(1):37–47.
Sheehan PW. Reliability of a short test of imagery. Percept Mot Skills. 1967;25(3):744.
Juhasz JB. On the reliability of two measures of imagery. Percept Mot Skills. 1972;35(3):874.
Evans IM, Kamemoto WS. Reliability of the Short Form of Betts’ Questionnaire on Mental Imagery: replication. Psychol Rep. 1973;33(1):281–2.
Westcott TB, Rosenstock E. Reliability of two measures of imagery. Perceptual and Motor Skills. 1976;42(3, Pt 2):1037–8.
Baranchok JS. The linguistic and statistical equivalence of Spanish and English versions of Betts Questionnaire upon mental imagery. US: ProQuest Information & Learning; 1995.
Sacco GR, Reda M. The Italian form of the Questionnaire Upon Mental Imagery (QMI). J Ment Imagery. 1998;22(3-4):213–28.
Campos A, Pérez-Fabello MJ. The Spanish version of Betts’ questionnaire upon mental imagery. Psychol Rep. 2005;96(1):51–6.
Willander J, Baraldi S. Development of a new Clarity of Auditory Imagery Scale. Behav Res Methods. 2010;42(3):785–90.
Campos A. Internal consistency and construct validity of two versions of the revised vividness of Visual Imagery Questionnaire. Percept Mot Skills. 2011;113(2):454–60.
Tużnik P, Francuz P. Factor structure and test-retest reliability of the Polish version of the Clarity of Auditory Imagery Scale. Curr Psychol. 2021;40:4364–71. https://doi.org/10.1007/s12144-019-00367-x.
McKelvie SJ, Gingras PP. Reliability of two measures of visual imagery. Percept Mot Skills. 1974;39(1):417–8.
Hiscock M. Imagery assessment through self-report: what do imagery questionnaires measure? J Consult Clin Psychol. 1978;46(2):223–30.
LeBoutillier N, Marks D. Inherent response leniency in the modified Gordon Test of Visual Imagery Control questionnaire. Imagination Cogn Pers. 2002;21(4):311–8. https://doi.org/10.2190/JWAQ-VMV3-AB4B-CVQG.
Perez-Fabello MJ, Campos A. Factor structure and internal consistency of the Spanish version of the Gordon Test of Visual Imagery Control. Psychol Rep. 2004;94(3 Pt 1):761–6.
Lane JB. Problems in assessment of vividness and control of imagery. Percept Mot Skills. 1977;45(2):363–8.
Kwekkeboom KL, Maddox MA, West T. Measuring imaging ability in children. J Pediatr Health Care. 2000;14(6):297-303. https://doi.org/10.1067/mph.2000.106896.
D’Ercole M, Castelli P, Giannini AM, Sbrilli A. Mental imagery scale: a new measurement tool to assess structural features of mental representations. Meas Sci Technol. 2010;21(5):054019.
Andrade J, May J, Deeprose C, Baugh SJ, Ganis G. Assessing vividness of mental imagery: the plymouth sensory imagery questionnaire. Br J Psychol. 2014;105(4):547–63.
Pérez-Fabello MJ, Campos A. Spanish version of the Plymouth Sensory Imagery Questionnaire. Front Psychol. 2020;11:916.
Williams SE, Cumming J. Measuring Athlete Imagery Ability: The Sport Imagery Ability Questionnaire. J Sport Exerc Psychol. 2011;33(3):416-40. https://doi.org/10.1123/jsep.33.3.416.
Switras JE. An alternate-form instrument to assess vividness and controllability of mental imagery in seven modalities. Percept Mot Skills. 1978;46(2):379–84.
Grebot E. Validation with a French sample of the four scales of Switras’s survey of mental imagery. Percept Mot Skills. 2003;97(3 I):763–9.
Slee JA. The perceptual nature of visual imagery. Unpublished doctoral dissertation, Australian National Univer., Canberra, Australia, 1976.
Gilbert AN, Crouch M, Kemp SE. Olfactory and visual mental imagery. J Ment Imagery. 1998;22(3-4):137–46.
Blazhenkova O. Vividness of object and spatial imagery. Percept Mot Skills. 2016;122(2):490–508.
Rossi JS. Reliability of a measure of visual imagery. Percept Mot Skills. 1977;45(3):694.
Campos A, González M, Amor A. The Spanish version of the Vividness of Visual Imagery Questionnaire: factor structure and internal consistency reliability, vol. 90; 2002.
LeBoutillier NM, David F. The factorial validity and reliability of the Eyes-Open version of the Vividness of Visual Imagery Questionnaire. J Ment Imagery. 2001;25(3-4):107–14.
Campos A, Perez-Fabello MJ. Psychometric quality of a revised version vividness of visual imagery questionnaire. Percept Mot Skills. 2009;108(3):798–802.
Croijmans I, Speed LJ, Arshamian A, Majid A. Measuring multisensory imagery of wine: the vividness of Wine Imagery Questionnaire. Multisens Res. 2019;32(3):179–95.
Ekstrom RB, French JW, Harman HH, Dermen D. Manual for kit of factor-referenced cognitive tests. Educational Testing Service. 1976.
Bray H, Moseley GL. Disrupted working body schema of the trunk in people with back pain. Br J Sports Med. 2011;45(3):168–73.
Zimney KJ, Wassinger CA, Goranson J, Kingsbury T, Kuhn T, Morgan S. The reliability of card-based and tablet-based left/right judgment measurements. Musculoskelet Sci Pract. 2018;33:105–9.
Williams LJ, Braithwaite FA, Leake HB, McDonnell MN, Peto DK, Lorimer Moseley G, Hillier SL. Reliability and validity of a mobile tablet for assessing left/right judgements. Musculoskelet Sci Pract. 2019;40:45-52. https://doi.org/10.1016/j.msksp.2019.01.010.
Linder M, Michaelson P, Roijezon U. Laterality judgments in people with low back pain - a cross-sectional observational and test-retest reliability study. Man Ther. 2016;21:128–33.
Campos A, Campos-Juanatey D. Measure of the ability to mentally rotate maps. N Am J Psychol. 2020;22:289–98.
Shepard RN, Feng C. A chronometric study of mental paper folding. Cogn Psychol. 1972;3(2):228–43. https://doi.org/10.1016/0010-0285(72)90005-9.
Shepard RN, Metzler J. Mental rotation of three-dimensional objects. Science. 1971;171(3972):701–3. https://doi.org/10.1126/science.171.3972.701.
Vandenberg SG, Kuse AR. Mental rotations, a group test of three-dimensional spatial visualization. Percept Mot Skills. 1978;47(2):599–604.
Campos A, Campos-Juanatey D. Measure of spatial orientation ability. Imagination Cogn Pers. 2020;39(4):348–57.
Campos A. Reliability and percentiles of a measure of spatial imagery. Imagination Cogn Pers. 2013;32(4):427–31.
Campos A. Measure of the ability to rotate mental images. Psicothema. 2012;24(3):431–4.
Breckenridge JD, McAuley JH, Butler DS, Stewart H, Moseley GL, Ginn KA. The development of a shoulder specific left/right judgement task: validity & reliability. Musculoskeletal Sci Pract. 2017;28:39–45.
Paivio A, Harshman R. Factor analysis of a questionnaire on imagery and verbal habits and skill, vol. 37; 1983.
Kardash CA, Amlund JT, Stock WA. Structural analysis of Paivio’s Individual Differences Questionnaire. J Exp Educ. 1986;55(1):33–8.
Mealor AD, Simner J, Rothen N, Carmichael D, Ward J. Different dimensions of cognitive style in typical and atypical cognition: new evidence and a new measurement tool. PLoS One. 2016;11(5):e0155483.
Stevens MJ, Rapp BJ, Pfost KS, Johnson JJ. Further evidence of the stability of the Verbalizer-Visualizer Questionnaire. Percept Mot Skills. 1986;62(1):301–2. https://doi.org/10.2466/pms.1986.62.1.301.
Campos A, Lopez A, Gonzalez MA, Amor A. Imagery factors in the Spanish version of the Verbalizer-Visualizer Questionnaire. Psychol Rep. 2004;94(3):1149–54.
Wedell F, Roeser F, Hamburger K. Visualizer verbalizer questionnaire: evaluation and revision of the German translation, vol. 15; 2014.
Cooke L, Munroe-Chandler K, Hall C, Tobin D, Guerrero M. Development of the children's active play imagery questionnaire. J Sports Sci. 2014;32(9):860-9. https://doi.org/10.1080/02640414.2013.865250.
Kashani V, Mohamadi B, Mokaberian M. Psychometric properties of the Persian version of Children’s Active Play Imagery Questionnaire. Ann Appl Sport Sci. 2017;5:49–59.
Hausenblas HA, Hall CR, Rodgers WM, Munroe KJ. Exercise imagery: Its nature and measurement. J Appl Sport Psychol. 1999;11(2):171-80. https://doi.org/10.1080/10413209908404198.
Pérez-Fabello M, Campos A. Psychometric properties of the Spanish version of the Exercise Imagery Questionnaire (EIQ). Cuad Psicol Deporte. 2020;20:41–54.
Hall C, Mack D, Paivio A, Hausenblas H. Imagery use by athletes: development of the sport imagery questionnaire, vol. 29; 1998.
Vurgun N, Dorak R, Ozsaker M. Validity and reliability study of the sport imagery questionnaire for Turkish athletes. Int J Approximate Reasoning. 2012;4:32–8.
Ruiz MC, Watt AP. Psychometric characteristics of the Spanish version of the Sport Imagery Questionnaire. Psicothema. 2014;26(2):267–72.
Hall RC, Munroe-Chandler KJ, Fishburne GJ, Hall ND. The Sport Imagery Questionnaire for Children (SIQ-C), vol. 13; 2009.
Reisberg D, Pearson D, Kosslyn S. Intuitions and introspections about imagery: the role of imagery experience in shaping an investigator's theoretical views. Appl Cogn Psychol. 2003;17(2):147-60.
Nelis S, Holmes EA, Griffith JW, Raes F. Mental imagery during daily life: psychometric evaluation of the spontaneous use of imagery scale (SUIS). Psychol Belg. 2014;54(1):19–32.
Görgen SM, Hiller W, Witthöft M. The spontaneous use of imagery scale (SUIS) - development and psychometric evaluation of a German adaptation. Diagnostica. 2016;62(1):31–43.
Tanaka Y, Yoshinaga N, Tsuchiyagaito A, Sutoh C, Matsuzawa D, Hirano Y, et al. Mental imagery in social anxiety disorder: the development and clinical utility of a Japanese version of the Spontaneous Use of Imagery Scale (SUIS-J). Asia Pac J Couns Psychother. 2018;9(2):171–85.
Allbutt J, Ling J, Heffernan TM, Shafiullah M. Self-Report Imagery Questionnaire Scores and Subtypes of Social-Desirable Responding. J Individ Differ. 2008;29(4):181-8. https://doi.org/10.1027/1614-0001.29.4.181.
Hishitani S. Auditory Imagery Questionnaire: its factorial structure, reliability, and validity. J Ment Imagery. 2009;33(1-2):63–80.
White K, Ashton R, Law H. Factor analyses of the shortened form of Betts’ questionnaire upon mental imagery. Aust J Psychol. 1974;26(3):183–90.
Lorenz C, Neisser U. Factors of imagery and event recall. Mem Cogn. 1985;13(6):494–500.
Kihlstrom JF, Glisky ML, Peterson MA, Harvey EM, et al. Vividness and control of mental imagery: a psychometric analysis. J Ment Imagery. 1991;15(3-4):133–42.
Campos A, Pérez MJ. Visual Elaboration Scale as a measure of imagery. Percept Mot Skills. 1988;66(2):411–4. https://doi.org/10.2466/pms.1988.66.2.411.
Richardson A. The meaning and measurement of memory imagery. Br J Psychol. 1977;68(1):29–43.
Wallwork SB, Butler DS, Fulton I, Stewart H, Darmawan I, Moseley GL. Left/right neck rotation judgments are affected by age, gender, handedness and image rotation. Man Ther. 2013;18(3):225–30.
Bowering KJ, Butler DS, Fulton IJ, Moseley GL. Motor imagery in people with a history of back pain, current back pain, both, or neither. Clin J Pain. 2014;30(12):1070–5.
Campos A, Perez-Fabello MJ. Factor structure of the Spanish version of the Object-Spatial Imagery and Verbal Questionnaire. Psychol Rep. 2011;108(2):470–6.
Campos A, Pérez-Fabello MJ. Some psychometric properties of the Spanish version of the Clarity of Auditory Imagery Scale. Psychol Rep. 2011;109(1):139–46.
White KD. The measurement of imagery vividness: normative data and their relationship to sex, age, and modality differences. Br J Psychol. 1977;68(2):203–11.
Mokkink LB, Terwee CB, Patrick DL, Alonso J, Stratford PW, Knol DL, et al. The COSMIN study reached international consensus on taxonomy, terminology, and definitions of measurement properties for health-related patient-reported outcomes. J Clin Epidemiol. 2010;63(7):737–45.
Behnke M, Tomczak M, Kaczmarek LD, Komar M, Gracz J. The Sport Mental Training Questionnaire: development and validation. Curr Psychol. 2019;38(2):504–16.
Frey B, editor. The SAGE encyclopedia of educational research, measurement, and evaluation (Vols. 1–4). Thousand Oaks: SAGE Publications; 2018. https://doi.org/10.4135/9781506326139.
Mokkink LB, Terwee CB, Patrick DL, Alonso J, Stratford PW, Knol DL, et al. The COSMIN checklist for assessing the methodological quality of studies on measurement properties of health status measurement instruments: an international Delphi study. Qual Life Res. 2010;19(4):539–49.
Acknowledgements
We would like to thank Dr. Sabine Klein, Librarian, who helped with the search strategy. We would also like to thank Prof. Alfredo Campos for providing literature and the necessary assessments. Finally, we are grateful to Ladina Matter, Luca Beugger, and Valerie Zumbrunnen for their valuable support during data extraction.
Funding
This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.
Ethics approval and consent to participate
Ethics approval was not required for this systematic review, as we analysed only previously published literature.
Consent for publication
Not applicable; no individual person’s data are included.
Competing interests
The authors declare that they have no competing interests.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Example search strategy for Web of Science.
COSMIN Risk of Bias checklist.
Characteristics of the Included Measurement Tools for Motor Imagery.
Motor imagery: Summary of Findings using modified GRADE.
Characteristics of the Included Measurement Tools for Mental Imagery.
Mental imagery assessments: Summary of Findings using modified GRADE.
Cite this article
Suica, Z., Behrendt, F., Gäumann, S. et al. Imagery ability assessments: a cross-disciplinary systematic review and quality evaluation of psychometric properties. BMC Med 20, 166 (2022). https://doi.org/10.1186/s12916-022-02295-3