Table 6 Mental imagery assessments: The characteristics of the included studies - Validity

From: Imagery ability assessments: a cross-disciplinary systematic review and quality evaluation of psychometric properties

Columns: Tool; Disciplines; Study; Country; Language; Study population (Participants, N, Age mean (years), Sex); Validity (Design, Results); COSMIN; Quality Criteria; Comments

a. General mental imagery in any sensorial modality

Auditory Imagery Scale (AIS)

n.d.s.

Gissurarson 1992 [94]

IS

E

Volunteers

160

33.0

70♀, 90♂

Construct validity- structural validity

PCA conducted. All seven items loaded on a single dimension. Item loadings ranged 0.50–0.77.

Adequate

?

Only EFA conducted.

*Not all information reported for quality criteria rating. CFA should be the next step.

Construct validity- hypothesis testing

Corr. AIS with VVIQ

r=0.48

Corr. AIS with GTVIC

r=−0.23

Known-groups validity

Sex differences on the AIS were not significant.

Inadequate

?

Psychometric properties of comparator instrument not reported.

Participants' characteristics not reported.

Low corr. indicates that there are two unrelated modalities: visual and auditory. However, no corr. was calculated with an instrument that measures the same construct.

n.d.s.

Allbutt et al. 2008 [159]

UK

E

Students

113

25.2

31♀, 82♂

Construct validity- hypothesis testing

Corr. AIS with VVIQ-2

r=−0.35

Doubtful

?

Psychometric properties of comparator instrument insufficiently reported.

Very low negative corr. between assessments. See comment above.

n.d.s.

Campos 2017 [95]

ES

S

Students

444

20.4

190♀, 254♂

Construct validity- structural validity

CFA performed using one-factor model: χ2/df=2.05, CFI=0.91, GFI=0.98, NNFI=0.80, RMSEA=0.05 and SRMR=0.04.

Doubtful

+

CFA performed but rotation method used was not described.

Accepted model fit: CFI >0.95, or SRMR <0.08, or RMSEA <0.06.

Construct validity- hypothesis testing

Corr. AIS with CAIS

r=−0.49

Corr. AIS with Betts' QMI

r=0.37

Doubtful

?

Psychometric properties of comparator instrument insufficiently reported. Not all results in accordance with the hypotheses. Corr. with comparator instrument <0.50.

Auditory Imagery Questionnaire (AIQ)

n.d.s.

Hishitani 2009¹ [160]

JP

E

Students

193

20.3

146♀, 47♂

Construct validity- structural validity

PCA with oblimin rotation conducted. 3 factors extracted: relaxing sound, human voice, unpleasant sound. Factor loadings ranged 0.31–0.74. Corr. between factors 1 and 2 was 0.47, between factors 2 and 3 was 0.47, and between factors 1 and 3 was 0.66. CFA performed using two-factor model (factor 1=human voice; factor 2=relaxing and unpleasant sound): GFI=0.92, CFI=0.93, RMSEA=0.07.

CFA performed using hierarchical model composed of four factors: relaxing sound, human voice, mind's ear, unpleasant sound. GFI=0.94, CFI=0.96, RMSEA=0.06.

Very good

+

Steps of FA well described. Very good sample size. CFA with hierarchical model showed acceptable fit to the data.

Accepted model fit: CFI >0.95, or SRMR <0.08, or RMSEA <0.06.

Auditory Imagery Questionnaire (AIQ)

n.d.s.

Hishitani 2009² [160]

JP

E

Students

131

19.9

107♀, 24♂

Construct validity- hypothesis testing

Corr. AIQ with VVIQ

r=0.48

Known-groups validity

Two subgroups were formed depending on whether the participants practiced music or not. A sig. difference between groups was found (p<0.05).

Inadequate

Doubtful

?

Psychometric properties of comparator instrument not reported.

No corr. with comparator instrument which measures the same construct.

Participants' characteristics not described.

n.d.s.

Campos 2017 [95]

ES

S

Students

444

20.4

190♀, 254♂

Construct validity- structural validity

CFA performed using two-factor model: χ2/df=3.83, CFI=0.84, GFI=0.92, NNFI=0.86, RMSEA=0.08 and SRMR=0.07.

Doubtful

+

CFA performed but rotation method used not described. Accepted model fit: CFI >0.95, or SRMR <0.08, or RMSEA <0.06.

n.d.s.

Campos 2017 [95]

ES

S

Students

444

20.4

190♀, 254♂

Construct validity- hypothesis testing

Corr. AIQ with AIS

r=0.44

Corr. AIQ with CAIS

r=−0.48

Corr. ASI with Betts' QMI

r=0.59

Doubtful

?

Psychometric properties of comparator instrument insufficiently reported. Results are not in accordance with the hypotheses. Stronger corr. between AIS and CAIS expected.

Bucknell Auditory Imagery Scale (BAIS)

n.d.s.

Halpern 2015 [97]

USA

E

Volunteers

76

22.6

22♀, 54♂

Construct validity- structural validity

EFA using PCA with varimax rotation performed. 3 components/factors defined: environmental sound, voice and music.

BAIS-V: loading for environmental sound 0.48–0.81, for voice 0.42–0.77, for music 0.48–0.89. Total variance explained by 58%. BAIS-C: loading for environmental sound 0.55–0.82, for voice 0.44–0.73, for music 0.45–0.84. Total variance explained by 59%. Some items loaded on more than one factor but this loading <0.50.

Doubtful

?

Sample size doubtful.

Some items showed instability and loaded on two factors.

CFA should be conducted to confirm these three components.

Construct validity- hypothesis testing

Corr. BAIS (both scales) with VVIQ-M

r=0.62

Known-groups validity

No sig. difference between men and women on the BAIS score. Sig. difference between men and women on the VVIQ-M.

Doubtful

?

Psychometric properties of comparator instrument insufficiently reported. Participants insufficiently described.

No hypotheses defined.

Betts Questionnaire Upon Mental Imagery (shortened 35-item version, SQMI)

Psy

Sheehan P. W., 1967 [98]

AU

E

Students

62

NR

62♀

Cross-cultural validity

American and Australian students compared. No sig. difference between students regarding vividness over all items established.

Inadequate

?

Low sample size. Population not described. Unclear which group difference analysis was performed.

60

NR

28♀, 32♂

Construct validity- structural validity

A correlation of r=0.99 was obtained between total scores based on the complete scale and the shortened form. One factor was established: a general imagery ability for all sensory modalities. All 35 items in the scale loaded highly on the factor, with an average loading of 0.57.

Inadequate

?

Sample size for this analysis inadequate.

*Not all information reported for quality criteria rating.

Betts Questionnaire Upon Mental Imagery (shortened 35-item version, SQMI)

n.d.s.

White et al. 1974 [161]

AU

E

Students

1562

22.3♀

600♀

Construct validity- structural validity

PCA with varimax rotation; one factor with several modalities: auditory, kinaesthetic, gustatory, olfactory, organic, cutaneous, visual. Total variance explained by 51.8%. Factor loadings ranged from 0.43 to 0.89. Only one item ‘sun’ on visual subscale loaded very low (<0.20).

Adequate

?

One item on visual subscale 'sun' should be removed from questionnaire.

20.4♂

962♂

n.d.s.

Lorenz & Neisser 1985 [162]

USA

E

Students

46

NR

NR

Construct validity- structural validity

PCA with varimax rotation used to extract 3 factors: Factor 1. Vividness and control, Factor 2. Spatial manipulation, Factor 3. childhood memory. Betts QMI loaded on 1st factor with loading 0.81.

Inadequate

−

Sample size inadequate for this analysis.

n.d.s.

Kihlstrom et al. 1991 [163]

USA

E

Students

2036

NR

NR

Construct validity- structural validity

PCA with orthogonal rotation showed 7 factors corresponding closely to the subscales.

Doubtful

?

#, Participants not described. *Not all information reported for quality criteria rating.

Construct validity- hypothesis testing

Corr. Betts QMI with GTVIC

r=0.25

Inadequate

?

Measurement properties of the comparator instrument not reported.

The corr. with the comparison instrument that measures the same construct is missing.

n.d.s.

Campos & Pérez-Fabello 2005 [104]

ES

S

Students

562

20.2

148♀, 414♂

Construct validity- structural validity

PCA followed by varimax orthogonal rotation identified 8 factors, which together accounted for 58.4% of total variance. Factor loadings 0.42–0.79. Three items referring to different senses loaded on the 7th factor. Item 5 loaded on the 8th factor, which was a kind of visual image.

Adequate

?

Some items seem to be unstable and could be removed.

Removing items could influence the number of factors/modalities identified.

Construct validity- hypothesis testing

Corr. Betts QMI and GTVIC

r=−0.34

Correlation Betts QMI and VVIQ

r=0.58

Inadequate

?

Measurement properties of the comparator instrument not reported.

Corr. Betts QMI with VVIQ reported, but unclear which modality of Betts QMI has a strong corr. with VVIQ.

n.d.s.

Baranchok John 1995 [102]

USA +

MX

S + E

Mexican students¹

350

NR

159♀, 191♂

Cross-cultural validity

The t-test, t(12)=0.71, p>0.10, supported the null hypothesis, suggesting that there was no difference between students from the USA and Mexico. The Spanish version of the QMI seems linguistically and statistically equivalent to the English version.

Very good

+

Very good sample size and good description of study population and procedures.

US students²

307

130♀, 177♂

Construct validity- structural validity

PCA with varimax rotation identified one general imagery factor with 7 modality-specific factors. The explained variance was 51.1% for the USA students and 49.9% for the Mexican students. Factor loadings ranged 0.25–0.83 for the USA students (only one item on the visual subscale loaded <0.20) and 0.25–0.80 for the Mexican students (one item on the visual and two items on the kinaesthetic subscale loaded <0.20).

Adequate

−

Some items loaded very low.

These results confirmed findings by White (1974) [161] and Campos & Pérez-Fabello 2005 [104].

Kinaesthetic subscale seems the most unstable, and item 5 on visual subscale should be evaluated again.

Clarity of Auditory Imagery Scale (CAIS)

n.d.s.

Willander & Baraldi 2010 [105]

SE

E/Se

Students

212

25.9

58♀, 154♂

Construct validity- structural validity

EFA with principal axis factoring was conducted and one factor was extracted. Factor loadings of 16 items ranged from 0.40 to 0.67. The total variance explained was 31.63%.

Adequate

?

Following COSMIN recommendation EFA should be rated as adequate.

CFA should be performed too.

Explained variance just above 0.30.

Clarity of Auditory Imagery Scale (CAIS)

Construct validity- hypothesis testing

Known-groups validity

No difference established between men and women (p > 0.05).

Doubtful

+

Results are in accordance with the hypotheses but participants characteristics insufficiently described.

n.d.s.

Campos 2011 [106]

ES

S

Students

234

19.6

47♀, 187♂

Construct validity- structural validity

PCA with varimax orthogonal rotation was conducted. 5 factors with eigenvalues >1 identified. Factor 1: items 5, 11, 12, 13, 14, 15, 16; factor 2: items 6, 8, 9; factor 3: items 7 and 10; factor 4: items 1 and 2; factor 5: items 3 and 4. Factor loadings ranged 0.41–0.79. The five factors explained 57.4% of total variance.

Adequate

?

According to COSMIN recommendations EFA should be rated as adequate.

EFA identified 5 factors, but the factors were not explained. CFA should be performed too.

Construct validity- hypothesis testing

Corr. CAIS with VVIQ-2

r=0.42

Corr. CAIS with MASMI

r=−0.12

Corr. CAIS with Betts’ QMI

visual r=−0.31, auditory r=−0.46, cutaneous r=−0.37, kinaesthetic r=−0.36, gustatory r=−0.42, olfactory r=−0.41, organic r=−0.25

Doubtful

?

Measurement properties of the comparator instrument insufficiently reported.

Very low corr. with other measures.

The corr. with the comparison instrument that measures the same construct is missing.

Edu

Tuznik & Francuz 2019 [107]

PL

PO

Musicians

39

22.5

21♀, 18♂

Construct validity- structural validity

PCA was conducted by forcing a one-factor solution. The factor loadings of 16 items ranged from 0.46 to 0.74. All factor loadings were >0.32. The total variance was explained by 34.48%.

Doubtful

?

Doubtful sample size.

Non-musicians

40

24.5

20♀, 20♂

Construct validity- hypothesis testing

Known-group validity

Neither the gender of participants (p=0.372) nor their level of musical expertise (p=0.114) differentiated the scores obtained.

Very good

?

Participants characteristics well described.

Not all results are in accordance with hypotheses.

Gordon Test of Visual Imagery Control (GTVIC)

n.d.s.

Lorenz & Neisser 1985 [162]

USA

E

Students

46

NR

NR

Construct validity- structural validity

PCA with the varimax rotation was used to extract 3 factors: Factor 1: Vividness and control, Factor 2: Spatial manipulation, Factor 3: childhood memory. GTVIC loaded on the 1st factor with loading 0.81.

Inadequate

−

Sample size inadequate for this analysis.

n.d.s.

Kihlstrom et al. 1991 [163]

USA

E

Students

2805

NR

NR

Construct validity- structural validity

PCA with orthogonal rotation performed twice and showed:

 1. 4 factors: car in colour or not, car in normal motion or car in unusual positions or motions.

 2. 2 factors: car in normal motion or car in unusual positions or motions.

Doubtful

?

#, Participants not described. Unclear factor structure: four or two?

*Not all information reported for quality criteria rating.

Construct validity- hypothesis testing

Corr. GTVIC with Betts QMI

r=0.25

Corr. GTVIC with VVIQ

r=0.45

Inadequate

?

No information on measurement properties of the comparator instrument available.

See comment above about Betts QMI.

n.d.s.

Lequerica et al. 2002 [22]

USA

E

Students

80

22.1

39♀, 41♂

Construct validity- hypothesis testing

Corr. GTVIC with VMIQ visual subscale

r=0.72

Corr. GTVIC with MIQ visual subscale

r=0.45

Sign. corr. among subjective measures of mental imagery. No corr. between objective and subjective measures of mental imagery providing evidence for the multidimensional nature of imagery.

Adequate

+

# Students received extra credits in their psychology courses for participation. Results in accordance with the hypothesis.

n.d.s.

Pérez-Fabello & Campos 2004 [111]

ES

S

Students

479

20.5

70♀, 409♂

Construct validity- structural validity

PCA followed by varimax orthogonal rotation identified four factors: movement, misfortune, colour, stationarity. Total variance explained: 55.6%. Factor loadings ranged from 0.43 to 0.88.

Adequate

−

Statement of four-factor structure should be rejected. Item 6 loaded on two factors.

Fewer than 3 items loaded on factors 3 and 4.

Gordon Test of Visual Imagery Control (GTVIC)

n.d.s.

Pérez-Fabello & Campos 2004 [111]

ES

S

Students

479

20.5

70♀, 409♂

Construct validity- hypothesis testing

Corr. GTVIC with VVIQ

r=−0.40

Corr. GTVIC with VVQ

r=0.05

Adequate

?

Authors calculated corr. between different measures (construct validity), which measured different constructs. The corr. with the comparison instrument that measures the same construct is missing.

Alternate Form of the Gordon Test of Visual Imagery Control (TVIC)

n.d.s.

McKelvie 1992 [28]

CA

E

Students

116

NR

49♀, 67♂

Criterion validity

Corr. GTVIC alternate form with GTVIC original

Pearson corr. r=0.52

Very good

−

Author calculated corr. between alternate form and original version of GTVIC, which belongs to criterion validity.

However, corr. between measures <0.70.

Imaging Ability Questionnaire (IAQ)

Med

Kwekkeboom 2000 [42]

USA

E

Participants from different sources

200

48.7

NR

Construct validity- structural validity

CFA with PCA and oblique rotation was performed and two factors confirmed: absorption and image generation. Factor loadings >0.44. The corr. between two factors was r=0.42.

Adequate

?

Adequate sample size for factor analysis.

*Not all information reported for quality criteria rating.

Imagery Questionnaire by Lane

n.d.s.

Lane 1977 [112]

CA

E

Students

320

NR

122♀, 198♂

Construct validity- structural validity

PCA with varimax rotation of modality yielded one factor: imagery control. Loadings ranged from 0.59 to 0.76. 11 factors were obtained in the component analysis of the individual items. While the composition of four of these factors approximated the content of four of the modalities, no factor completely and exclusively represented any given modality.

Doubtful

?

Insufficient information about factor analysis; quality criteria rating not possible.

60

NR

22♀, 38♂

Construct validity- hypothesis testing

Corr. Imagery by Lane with:

GTVIC r=0.53

Betts QMI r=0.57

Inadequate

−

Why comparison with Betts QMI, when not the same domains/constructs were investigated?

Kids Imaging Ability Questionnaire (KIAQ)

Med

Kwekkeboom et al. 2000 [113]

USA

E

Experts

3

NR

NR

Content validity

All reviewers agreed that the items adequately represented the construct of ‘imaging ability’. The content and language of items were assessed to be appropriate for 6- to 14-year-olds. The format, either self-administered or reading items to the child, was also agreed to be satisfactory.

Doubtful

?

Only 3 experts reviewed the KIAQ for relevance, comprehensiveness and comprehensibility.

Target population was not considered for evaluation of content validity.

Children

58

9.9

19♀, 39♂

Construct validity- hypothesis testing

Corr. KIAQ with SFPI

Time 1, N=54: r=0.31

Time 2, N=44: r=0.46

Doubtful

−

Doubtful whether comparator instrument covers the same construct.

Corr. <0.50.

Mental Imagery Scale (MIS)

n.d.s.

Dercole et al. 2010 [114]

IT

I

Participants characteristics NR

262

29.0

92♀, 170♂

Construct validity- structural validity

EFA with oblimin rotation produced a six-factor solution: stability, perspective, distance, level of details, dimensions, rapidity. Total variance explained: 54.6%. Factor loadings 0.52–0.80.

Doubtful

+

Sample size very good but participants not described. CFA should be performed.

Plymouth Sensory Imagery Questionnaire (Psi-Q)

n.d.s.

Andrade et al. 2014¹ [115]

UK

E

Students

404

NR

NR

Construct validity- structural validity

EFA with maximum likelihood extraction and oblimin rotation found seven factors with eigenvalues >1. Goodness of fit test: χ2(371)=889. Factors loaded very strongly, all >0.50 (range 0.53–0.87).

Very good

?

This article reported results from 3 studies.

*Not all information reported for quality criteria rating.

Plymouth Sensory Imagery Questionnaire (Psi-Q)

n.d.s.

Andrade et al. 2014² [115]

UK

E

Students

209

NR

NR

Construct validity- structural validity

CFA with 7 factor model provided a good model fit: χ2/df=1.51, CFI=0.93, RMSEA=0.05.

Doubtful

+

Accepted model fit: CFI >0.95, or SRMR <0.08, or RMSEA <0.06.

n.d.s.

Andrade et al. 2014³ [115]

UK

E

Students

212

23.4 (median)

59♀, 153♂

Construct validity- hypothesis testing

Corr. Psi-Q long version with VVIQ-2

r=0.67

Corr. Psi-Q short version with VVIQ-2

r=0.66

Inadequate

?

Measurement properties of the comparator instrument not reported.

Several modalities are covered by the Psi-Q. Unclear which modality correlates strongly (>0.50) with VVIQ-2.

n.d.s.

Pérez-Fabello & Campos 2020 [116]

ES

S

Students

394

21.0

101♀, 293♂

Construct validity- structural validity

CFA for long version with 7-factor model provided a good model fit: χ2=733.95, df=413, GFI=0.89, CFI=0.92, NNFI=0.91, RMSEA=0.04, SRMR=0.05.

Very good

+

Accepted model fit: CFI >0.95, or SRMR <0.08, or RMSEA <0.06.

Construct validity- hypothesis testing

Corr. Psi-Q with Betts QMI was sign. (p<0.01), r=0.40–0.56

Corr. Psi-Q with VVIQ was sign. (p<0.01)

r=−0.30–0.41

Corr. Psi-Q with OSIVQ object was sign.

r=0.19–0.34

Doubtful

+

Measurement properties of the comparator instruments insufficiently reported.

75% of the results are in accordance with the hypothesis.

Sport Imagery Ability Measure (SIAM)

Sport

Watt 2003¹ [36]

AU

E

Students

5

Range

15–16

NR

Content validity

Items were selected through examination of relevant imagery theories, analysis of research work in the field of imagery ability, and review and analysis of a number of existing measures of imagery ability, used in the areas of sport and general psychology. Students were asked about comprehensibility, professionals were asked about relevance and comprehensiveness. 6 experts reviewed all items. Comments and suggested modifications were analysed and incorporated into the final draft.

Doubtful

?

This article reported results from 4 studies.

Data recording and analysis are not clearly described.

Relevance, comprehensiveness and comprehensibility not evaluated by the population of interest.

Experts

6

NR

Revised Sport Imagery Ability Measure (SIAM-R)

Sport

Watt 2003¹ [36]

AU

E

Students

474

18.42

268♀, 206♂

Construct validity- structural validity

EFA with oblimin rotation, two factors: 1. dimensions and visual modality; 2. modalities minus visual modality. Total variance explained: 75%. Factor loadings were greater than 0.50 (0.50–0.92). Only the emotion variable had no loading greater than 0.50; its loadings on factor 1 (0.45) and factor 2 (0.43) were very close.

Adequate

?

This article reported results from 4 studies; 2003¹ = study 1.

Subscales emotion and kinaesthetic loaded on both factors with >0.40.

Sport

Watt 2003² [36]

AU

E

Athletes and students

633

18.77

334♀, 299♂

Construct validity- structural validity

CFA performed. The model of 4 factors (visual/dimensions, body feeling, chemical, emotion/auditory) produced the best fit indices for the data. Nonetheless, the combination of the emotion and auditory variables as a latent construct was considered implausible. The three-factor model involving auditory sense grouped with the other single organ senses of taste and smell, visual/dimensions, and bodily feeling had the greatest conceptual coherence as a representation of sport imagery ability. χ2 (df)=617.63 (51), CFI=0.92, NFI=0.91, TLI=0.89, RMSEA=0.13.

Doubtful

−

2003² = study 2.

Rotation method by CFA not described.

Accepted model fit: CFI, NFI and TLI >0.95, or RMSEA <0.06.

Revised Sport Imagery Ability Measure (SIAM-R)

Sport

Watt 2003³ [36]

AU

E

Athletes and students

436

18.35

232♀, 204♂

Construct validity- convergent and discriminant validity

Corr. SIAM-R with GTVIC, VMIQ-2, SQMI

All correlations between all the imagery tests and subscales were significant. Small to moderate correlations (r=0.27 to 0.48) were found for the SIAM control, vividness, visual, and kinaesthetic subscales with a number of the related dimension modalities variables of the other imagery measures, providing support for the convergent validity of these subscales of the SIAM.

Corr. SIAM with MAB

Very low to small correlations (r=0.01 to 0.20) reported between the SIAM subscales and (a) the cognitive ability measures and (b) unrelated dimension and modality variables of the other imagery measures, supporting the discriminant validity.

Very good

+

2003³ = study 3.

Appropriate sample size. The results are in accordance with the hypothesis.

Sport

Watt 2003⁴ [36]

AU

E

Athletes

33

17.91

19♀, 14♂

Criterion validity- concurrent validity

Corr. SIAM with CV Imagery characteristic

visual=0.04, kinaesthetic=0.13, auditory=0.29, tactile=−0.20, emotion=0.19

Inadequate

−

2003⁴ = study 4.

Low sample size.

For criterion validity a valid measure should be considered as 'gold standard'.

Sport Imagery Ability Questionnaire (SIAQ)

Sport

Williams & Cumming 2011 [117]

UK

E

Athletes

403

20.2

198♀, 205♂

Content validity

5 sport psychology experts, who were experienced in designing questionnaires, and 5 athletes systematically examined the wording and the content of items. Content validity index was calculated.

Doubtful

?

Pilot study (SIAQ development).

Results from 4 studies reported in this article.

Insufficient information about test procedures: how data were collected (individually or in groups).

Data collection regarding relevance, comprehensiveness and comprehensibility doubtful.

Sport

Williams & Cumming 2011¹ [117]

UK

E

Athletes

375

24.7

179♀, 196♂

Construct validity- structural validity

20-item version was evaluated. Principal axis factoring with oblimin rotation resulted in 4 factors/subscales: skill imagery, strategy imagery, goal imagery and affect imagery. Final SIAQ included 12 items with 3 items per factor. Eigenvalues ranged from 1.13–4.05, together accounting for 69.63% of the variance.

Adequate

+

Following COSMIN recommendation EFA should be rated as adequate.

Sport

Williams & Cumming 2011² [117]

UK

E

Athletes

363

24.8

175♀, 188♂

Construct validity- structural validity

12-item version evaluated.

CFA with maximum likelihood performed. The four-factor model demonstrated adequate model fit: χ2=96.19, CFI=0.96, TLI=0.95, SRMR=0.05, RMSEA=0.05. Factor loadings 0.58–0.86.

Very good

+

Accepted model fit: CFI, TLI>0.95, or SRMR<0.08, or RMSEA <0.06.

Sport

Williams & Cumming 2011³ [117]

UK

E

Athletes

426

NR

199♀, 227♂

Construct validity- structural validity

Modified version (15 items and 5 subscales) evaluated. CFA with maximum likelihood performed. An adequate fit to the data was established for a final five-factor model: χ2=204.53, CFI=0.96, TLI=0.95, SRMR=0.04, RMSEA=0.06. Factor loadings 0.62–0.88.

Very good

+

Accepted model fit: CFI, TLI>0.95, or SRMR<0.08, or RMSEA<0.06.

Sport

Williams & Cumming 2011⁴ [117]

UK

E

Athletes

220

19.5

86♀, 134♂

Construct validity- structural validity

Modified version (15 items and 5 subscales) evaluated with second population. CFA with maximum likelihood performed. An adequate fit to the data was established for a five-factor model: χ2=108.59, CFI=0.98, TLI=0.97, SRMR=0.04, RMSEA=0.04. Factor loadings 0.62–0.88.

Very good

+

Accepted model fit: CFI, TLI>0.95, or SRMR<0.08, or RMSEA<0.06.

Sport

Williams & Cumming 2011⁴ [117]

UK

E

Athletes

220

19.5

86♀, 134♂

Construct validity- hypothesis testing

Corr. SIAQ with MIQ-3

Small to moderate corr. ranged from 0.14–0.24 suggesting that imagery ability of movement imagery and sport imagery content are not the same trait.

Doubtful

+

Authors used term concurrent validity, but criterion validity was evaluated.

The results are in accordance with the hypothesis.

Survey of Mental Imagery

n.d.s.

Switras 1978 [118]

USA

E

Students

350

NR

129♀, 221♂

Construct validity- convergent and discriminant validity

Convergent and discriminant validity were supported by the fact that the corr. between both main dimensions (controllability and vividness) on the same test forms were lower (discriminant) than the corr. between the same factors on the different test forms (convergent).

Doubtful

?

*Insufficient information reported for COSMIN and quality criteria evaluation.

28

NR

NR

Construct validity- structural validity

PCA with the orthogonal varimax rotation. 7 factors were extracted: visual, olfactory, somesthetic, kinaesthetic-tactile controllability, gustatory, kinaesthetic-tactile vividness, and auditory imagery. Factor loadings greater than 0.50. Form A: 0.60–0.81. Form B: 0.58–0.82.

Inadequate

−

FA performed only with 28 subtests (14 for each form).

n.d.s.

Grebot 2003 [119]

FR

F

Teachers

162

36.0

31♀, 131♂

Construct validity- structural validity

Factor analysis, performed on 4 modality-factor subtest scores, yielded four specific factors corresponding to 4 modalities of imagery for controllability, vividness and formation. Explained variance for controllability ranged from 7.3–13% for all four subscales, for vividness from 8.7–14.2% and for formation from 8.0–13.9%.

Inadequate

−

Sample size for this analysis insufficient.

Visual Elaboration Scale (VES)

n.d.s.

Campos & Pérez 1988 [164]

ES

S

Students

147

19.8

60♀, 87♂

Construct validity- hypothesis testing

Corr. VES with MEIQ (MEIQ consists of 2 parts, visual scenes and personal actions, and three scales for each part: image, absorption and effort)

r ranged from −0.28 to −0.43 for both parts and the image + effort subscales. Only the absorption subscale showed no sign. corr.

Corr. VES with IDQ

r=0.21 (VES and verbal scale of IDQ)

r=0.27 (VES and imagery scale of IDQ)

Doubtful

?

Some information about comparator instrument provided, but no information on measurement properties of the comparator instrument.

Test procedures not described.

No hypothesis defined. Insufficient information about comparator instrument.

Vividness of Olfactory Imagery Questionnaire (VOIQ)

n.d.s.

Gilbert et al. 1998 [121]

USA

E

Fragrance expertsᵃ

122

NR

63♀, 59♂

Construct validity- hypothesis testing

Corr. VOIQ with VVIQ

Experts r=0.18

Non-experts r=0.44

Known-groups validity

Sig. difference between experts and non-experts on the VOIQ score. No difference between men and women.

Inadequate

−

Psychometric properties of comparator instrument not reported.

Corr. with comparator instrument <0.50.

Participants described. Results in accordance with hypothesis.

Non-expert controlsᵇ

95

50♀, 45♂

Very good

+

Vividness of Object and Spatial Imagery Questionnaire (VOSI)

n.d.s.

Blazhenkova Olesya 2016² [122]

TR

NR

Students

205

21.0

95♀, 110♂

Construct validity- structural validity

CFA confirmed 2 factors: object and spatial imagery. Object items loaded above 0.45 and spatial items loaded above 0.44. Two-factor model χ2 (349)=759.30, p<.001, CFI=0.77, GFI=0.77, RMSEA=0.08.

Doubtful

−

Participants completed the study online.

Accepted model fit: CFI and GFI >0.95, or RMSEA <0.06.

Construct validity- hypothesis testing

Corr. VOSI and OSIQ

object imagery r=0.64

spatial imagery r=0.45

Adequate

+

Participants completed the study online.

Results are in accordance with the hypothesis.

Vividness of Visual Imagery Questionnaire (VVIQ)

n.d.s.

Rossi 1977 [123]

USA

E

Students

119

NR

NR

Construct validity- structural validity

PCA performed. A single component explained 42% of variance at the first administration and 52% at the second. Items loaded >0.50.

Doubtful

?

Rotation method used not described.

*Not all information reported for quality criteria rating. Sample size doubtful.

n.d.s.

Lorenz & Neisser 1985 [162]

USA

E

Students

46

NR

NR

Construct validity- structural validity

PCA with the varimax rotation used to extract 3 factors: Factor 1: Vividness and control, Factor 2: Spatial manipulation, Factor 3: childhood memory. VVIQ loaded on the 1st factor with loading 0.78.

Inadequate

−

Sample size inadequate for this analysis.

n.d.s.

Kihlstrom et al. 1991 [163]

USA

E

Students

2805

NR

NR

Construct validity- structural validity

PCA with orthogonal rotation performed and showed 4 factors corresponding to the 4 content clusters of the VVIQ.

Doubtful

?

#, Participants not described. *Not all information reported for quality criteria rating.

n.d.s.

Campos et al. 2002 [124]

ES

S

Secondary school students

850

13.3

428♀, 422♂

Construct validity- structural validity

PCA with varimax orthogonal rotation confirmed a single factor, vividness of visual imagery. All items loaded over 0.50 (0.53–0.66) which explained 37 % of total variance.

Adequate

?

Test procedures only briefly reported.

*Insufficient information reported for quality criteria rating.

n.d.s.

Leboutillier & Marks 2001 [125]

UK

E

Students

198

23.86

75♀, 123♂

Construct validity- structural validity

PCA with oblique rotation confirmed 3 factors (nature scenes, person scene, shop scene) and explained 58.6% of the variance.

Adequate

?

*Not all information reported for quality criteria rating.

n.d.s.

Campos & Pérez-Fabello, 2009 [126]

ES

S

Students

279

20.1

117♀, 162♂

Construct validity- hypothesis testing

Corr. VVIQ and Gordon Test

r=−0.24

Corr. VVIQ and Betts’ QMI

r=0.49

Corr. VVIQ and VVIQ-2

r=−0.55

Doubtful

+

Some information on measurement properties of the comparator instrument. Results are in accordance with the hypotheses.

Revised version Vividness of Visual Imagery Questionnaire (VVIQ-2)

n.d.s.

Campos & Pérez-Fabello, 2009 [126]

ES

S

Students

279

20.1

117♀, 162♂

Construct validity- hypothesis testing

Corr. VVIQ-2 and Gordon Test

r=−0.23

Corr. VVIQ-2 and Betts’ QMI

r=−0.54

Corr. VVIQ and VVIQ-2

r=−0.55

Doubtful

+

Some information provided on measurement properties of the comparator instrument.

Results are in accordance with the hypotheses.

n.d.s.

Campos 2011 [106]

ES

S

Students

206

19.7

43♀, 163♂

Construct validity- hypothesis testing

Corr. VVIQ-2 and VVIQ-RV

r=0.67

Corr. VVIQ-2 and Betts’ QMI

r=−0.53

Corr. VVIQ-2 and MASMI

r=0.19

Corr. VVIQ-2 and OSIVQ

verbal scale r=0.07

Corr. VVIQ-2 and OSIVQ

object imagery scale r=0.51

Corr. VVIQ-2 and OSIVQ

spatial imagery scale r=0.04

Adequate

+

# Sufficient information provided on measurement properties of the comparator instrument.

Results are in accordance with the hypothesis: high corr. with Betts’ QMI and object imagery scale of OSIVQ, low corr. with MASMI and verbal + spatial scale of OSIVQ.

Vividness of Visual Imagery Questionnaire- Revised version (VVIQ-RV)

n.d.s.

Campos 2011 [106]

ES

S

Students

206

19.7

43♀, 163♂

Construct validity- hypothesis testing

Corr. VVIQ-RV and VVIQ-2

r=0.67

Corr. VVIQ-RV and Betts’ QMI

r=−0.53

Corr. VVIQ-RV and MASMI

r=0.16

Corr. VVIQ-RV and OSIVQ

verbal scale r=0.06

Corr. VVIQ-RV and OSIVQ

object imagery scale r=0.53

Corr. VVIQ-RV and OSIVQ

spatial imagery scale r=0.02

Adequate

+

#Only students participated and were reimbursed with course credits.

Sufficient information provided on measurement properties of the comparator instrument.

The results are in accordance with the hypothesis (see comment above).

Vividness of Wine Imagery Questionnaire (VWIQ)

Edu

Croijmans et al. 2019 [127]

NL

E

Volunteers with experience with wine

83

40.8

71♀,12♂

Construct validity- structural validity

PCA with oblique rotation suggested 3 components: smell, taste, vision. The components explained 68.8% of the variance. Factor loadings for smell 0.41–0.58, for taste 0.82–0.94, for vision 0.62–0.83.

Inadequate

−

Small sample size. Instability is evident in the smell items, which loaded on 2 factors (smell and taste).

Construct validity- hypothesis testing

Corr. VWIQ with PSI-Q

smell r=0.36, taste r=0.43, vision r=0.51

Corr. VWIQ-vision with VVIQ

r=−0.51

Corr. VWIQ-smell with VOIQ

r=−0.43

Inadequate

−

No description of participants.

No information about the measurement properties of comparator instrument.

Not all results are in accordance with the hypotheses.

b. Assessments of mental rotation

Cube-Cutting Task (CCT)

n.d.s.

Lorenz & Neisser 1985 [162]

USA

E

Students

46

NR

NR

Construct validity- structural validity

PCA with varimax rotation was used to extract 3 factors: Factor 1: vividness and control, Factor 2: spatial manipulation, Factor 3: childhood memory. Cube loaded on Factor 2 with a loading of 0.86.

Inadequate

−

Sample size inadequate for this analysis.

n.d.s.

Richardson 1977 [165]

UK

E

Students

60

19.0 (male)

26♀

Construct validity- hypothesis testing

Sig. corr. for male established for:

CCT and Rated Imagery Vividness r=0.68

CCT and MPFB r=0.42

CCT and Paper Folding r=0.43

CCT and Controllability of Imagery r=0.36

CCT and Personal Reaction Inventory r=−0.41

Sig. corr. for female established for:

CCT and Rated Imagery Vividness r=0.56

CCT and Necker Cube Fluctuations r=0.46

CCT and Memory for Designs r=0.34

CCT and Concealed Figures r=0.36

CCT and MPFB r=0.35

Inadequate

?

No information on measurement properties of the comparator instrument. No hypothesis defined. Insufficient information about comparator instrument.

20.0 (female)

34♂

n.d.s.

Lequerica et al. 2002 [22]

USA

E

Students

80

22.1

39♀, 41♂

Construct validity- hypothesis testing

Corr. CCT with MRT

r=0.58

Corr. CCT with PFT

r=0.47

Corr. CCT with JOLO

r=0.40

Corr. CCT with HVOT

r=0.50

Corr. CCT with WAIS-R

r=0.59

Inadequate

+

No information on measurement properties of the comparator instrument. The results are in accordance with the hypothesis: no sig. corr. between subjective and objective measures of mental imagery.

German Test of the Controllability of Motor Imagery in older adults (TKBV)

n.d.s.

Schott 2013 [29]

DE

G

Healthy

195

57.3

102♀, 93♂

Construct validity- structural validity

EFA with orthogonal varimax rotation showed a two-factor structure: recognition and free recall. The factors explained 42% of total variance. Factor loadings ranged from 0.57 to 0.85.

Adequate

−

Adequate methodological quality because no CFA performed.

Variance explained by two factors < 50%.

Construct validity- hypothesis testing

Corr. TKBV Recognition and TUG

r=−0.31

Corr. TKBV Recognition and MIQ visual

r=0.143

Corr. TKBV Recognition and MIQ kinaesthetic

r=0.13

Corr. TKBV Free recall and TUG

r=−0.33

Corr. TKBV Free recall and MIQ visual

r=0.14

Corr. TKBV Free recall and MIQ kinaesthetic

r=0.11

No gender difference established.

Doubtful

?

Some information about comparator instrument provided, but no information on measurement properties of the comparator instrument.

No hypothesis defined.

Construct validity-hypothesis testing

Corr. TKBV Recognition with Corsi block tapping test

r=0.45

Corr. TKBV Free recall with Corsi block tapping test

r=0.38

Corr. TKBV Recognition with physical activity

r=0.50

Corr. TKBV Free recall with physical activity

r=0.36

Very good

−

Low corr. with comparator instrument <0.50.

Left/Right Judgements (LRJ)

Med

Bray & Mosley 2011 [129]

AU

E

Patients with back painᵃ

5

46.0

1♀, 4♂

Construct validity- hypothesis testing

Known-groups validity

Patients with back pain made more errors overall than controls (p<0.015).

The patients made more mistakes on the trunk rotation judgement task than on the hand judgement task (p<0.001).

Doubtful

+

Results are in accordance with hypothesis.

However, sample size very small.

Healthyᵇ

5

40.0

2♀, 3♂

n.d.s.

Wallwork et al. 2013 [166]

AU

E

Volunteers

1737

40.0

520♀, 1130♂

Construct validity- hypothesis testing

Known-groups validity

Response time increased with age, was greater in females than in males and was greater in left-handers than in right-handers (p<0.001). Accuracy reduced with age (p<0.001), but was unaffected by gender or handedness (p=0.493).

Very good

?

Sample size very good but gender imbalance (many more female than male participants).

This should be taken into account in a known-groups validity analysis.

Left/Right Judgements (LRJ)

Med

Bowering et al. 2014 [167]

AU

E

Patients with back pain + healthy

1008

37.0

324♀, 684♂

Construct validity- hypothesis testing

Known-groups validity

Response time was not affected by back pain status. Patients who had back pain at the time of testing were less accurate than pain-free controls (p=0.027), as were patients who were pain free but had a history of back pain (p<0.01).

Doubtful

−

Insufficient description of participants' characteristics (both groups). Results are not in accordance with hypothesis.

n.d.s.

Zimney et al. 2018 [130]

USA

E

Students

50

24.3

15♀, 35♂

Criterion validity

Corr. card-based with tablet version of LRJ

Accuracy left r=0.46

Accuracy right r=0.26

RT r=0.78

Very good

?

Corr. between card-based version and ‘gold standard’ only for response time >0.70.

Should be evaluated with a larger sample size.

n.d.s.

Williams et al. 2019¹ [131]

AU

E

Healthy

20

55.3

5♀, 15♂

Criterion validity

Corr. between tablet and desktop version

Hand judgements ICC=0.84 for RT and ICC=0.91 for accuracy

Doubtful

+

Sample size could be doubtful for both studies.

However, corr. between tablet version and desktop as ‘gold standard’ very good.

n.d.s.

Williams et al. 2019² [131]

AU

E

Healthy

37

38.5

9♀, 28♂

Criterion validity

Corr. between tablet and desktop version

Back, foot, and neck judgements

ICC=0.88 for RT and ICC=0.78 for accuracy

Doubtful

+

Map Rotation Ability Test (MRAT)

n.d.s.

Campos & Campos-Juanatey 2020 [133]

ES

S

Students

257

19.7

86♀, 171♂

Construct validity- hypothesis testing

Corr. MRAT with MRT

r=0.42

Corr. MRAT with MASMI

r=0.40

Corr. MRT with SOST

r=0.35

Corr. MRAT with VVIQ

r=0.08

Doubtful

+

Some information on measurement properties of the comparator instrument reported.

Structural validity not mentioned.

Results are in accordance with hypothesis.

Mental Rotation of Three-Dimensional Objects (MRT)

n.d.s.

Vandenberg & Kuse 1978 [136]

USA

E

Students

312

NR

115♀,197♂

Construct validity- hypothesis testing

Corr. Mental Rotation with spatial relation r=0.50

Corr. Mental Rotation with Chair-Window r=0.45

Corr. Mental Rotation with Identical Blocks r=0.54

Inadequate

?

No information on constructs measured by the comparator instrument.

No information on measurement properties of the comparator instrument.

Measure of the Ability to Form Spatial Mental Imagery (MASMI)

n.d.s.

Campos 2009 [96]

ES

S

Students

138

20.1

63♀, 75♂

Construct validity- hypothesis testing

Corr. MASMI and PMA

r=0.44

Corr. MASMI and VVIT

r=0.14

Corr. MASMI and GTVIC

r=0.02

Corr. MASMI and VVIQ

r=−0.15

Corr. MASMI and VVIQ-2

r=0.13

Corr. MASMI and Betts’ QMI

r=−0.02

Adequate

?

Some information on measurement properties of the comparator instrument provided.

Structural validity not mentioned.

Corr. between tests calculated but no hypotheses defined.

n.d.s.

Campos & Campos-Juanatey 2020 [137]

ES

S

Students

281

19.8

97♀, 184♂

Construct validity- hypothesis testing

Corr. MASMI with MRT

r=0.42

Corr. MASMI with OSIVQ

object r=−0.06, spatial r=0.38, verbal r=−0.09

Corr. MASMI with SOST

r=0.35

Doubtful

?

Some information on measurement properties of the comparator instrument provided.

Structural validity not mentioned.

Not all results are in accordance with hypotheses.

Measure of the Ability to Rotate Mental Images (MARMI)

n.d.s.

Campos 2012 [139]

ES

S

Students

354

19.5

45♀, 309♂

Construct validity- hypothesis testing

Corr. MARMI with MRT

r=0.40

Corr. MARMI with PMA

r=0.38

Corr. MARMI with MASMI

r=0.48

Corr. MARMI with VVIQ-2

r=0.10

Sign. difference between women and men (p<0.05). Men obtained sig. higher image rotation scores than women.

Doubtful

?

Some information about comparator instrument provided, but no information on measurement properties of the comparator instrument.

Not all results are in accordance with hypotheses.

c. Assessments of mental imagery to distinguish between different types of imagers

Object-Spatial Imagery Questionnaire (OSIQ)

n.d.s.

Blajenkova et al. 2006¹ [34]

USA

E

Students

25

NR

NR

Content validity

Students were interviewed about all items of the OSIQ. 3 experts in the field of mental imagery reviewed the OSIQ object and spatial items. Agreement among judges was 97%.

Doubtful

?

This article reported results from 4 studies.

No details reported about interviews.

Unclear if students were asked about relevance, comprehensiveness and comprehensibility.

Experts

3

n.d.s.

Blajenkova et al. 2006² [34]

USA

E

Students

164ᵃ

range (18–50)ᵃ

63♀, 83♂ᵃ

Construct validity- hypothesis testing

Corr. OSIQ object with:

Paper Folding r=-0.10

Vandenberg-Kuse r=0.11

DPT r=0.19

VVIQ r=0.48

Corr. OSIQ spatial with:

Paper Folding r=0.22

Vandenberg-Kuse r=0.26

Degraded Pictures r=0.05

VVIQ r=0.18

Doubtful

−

ᵃ=study 2a.

Corr. between OSIQ object and Degraded Pictures as well as VVIQ was sign. but <0.70.

Corr. between OSIQ spatial and Paper Folding as well as Vandenberg-Kuse was sign. but <0.50.

49ᵇ

Range

17–47ᵇ

19♀, 30♂ᵇ

Construct validity- hypothesis testing

Corr. OSIQ object with:

Paper Folding r=-0.33

Vandenberg-Kuse r=-0.19

Spatial Imagery Test r=-0.24

DPT r=0.31

Corr. OSIQ spatial with:

Paper Folding r=0.51

Vandenberg-Kuse r=0.49

Spatial Imagery Test r=0.47

Degraded Pictures r=-0.05

Doubtful

−

ᵇ=study 2b.

Sample size doubtful; stronger corr. found than in study 2a.

Sign. corr. between OSIQ object and Degraded Pictures was established, but the corr. was very weak (<0.50).

Sign. corr. between OSIQ spatial and other measures of spatial imagery was established, but it was also very weak (<0.50).

n.d.s.

Blajenkova et al. 2006³ [34]

USA

E

Students

45

Range

18–30

18♀, 27♂

Construct validity: discriminant validity

Corr. OSIQ object with:

APM r=-0.24

WAIS: Similarities r=-0.00

Advanced Vocabulary r=-0.12

Corr. OSIQ spatial with:

APM r=0.20

WAIS: Similarities r=-0.20

Advanced Vocabulary r=-0.25

Doubtful

+

Sample size doubtful.

OSIQ scales did not sig. correlate with measures of verbal and non-verbal intelligence.

The results are in accordance with the hypothesis.

n.d.s.

Blajenkova et al. 2006⁴ [34]

USA

E

Visual artists

28

NR

11♀, 17♂

Construct validity- hypothesis testing

Known-groups validity

Visual artists scored higher than scientists and humanities professionals on the object imagery scale. Scientists scored higher than visual artists and humanities professionals on the spatial scale.

Doubtful

+

The authors used the term 'criterion validity', although the relationship between imagery abilities among different professions (subgroups) was investigated.

However, characteristics of the groups were poorly described. The results are in accordance with the hypothesis.

Natural scientists

24

19♀, 5♂

Humanities professionals

23

9♀, 14♂

Object-Spatial Imagery and Verbal Questionnaire (OSIVQ)

n.d.s.

Blazhenkova & Kozhevnikov 2009¹ [35]

USA

E

Experts

3

NR

NR

Content validity

3 experts reviewed the verbal items with regard to their relevance to verbal cognitive style. After excluding all of the items on which there was a disagreement between the judges, items were administered to a sample of 166 students.

Doubtful

?

This article reported results from 2 studies.

No details reported about interviews.

Not clear if students were asked about relevance, comprehensiveness and comprehensibility.

Experts were asked only about relevance.

Students and professionals from different fields

625

24.0

251♀,374♂

Construct validity- structural validity

First PCA revealed 18 factors with eigenvalues above 1.

Only three factors (object, spatial, verbal imagery) had eigenvalues markedly higher than the others. These first 3 factors explained 31.95% of the variance. Based on the results from the initial PCA, a second PCA with varimax rotation was performed. The 45 OSIVQ items had loadings of 0.13–0.73.

Adequate

−

# Several factors loaded lower than 0.45 and variance explained by factors <50%.

n.d.s.

Blazhenkova & Kozhevnikov 2009² [35]

USA

E

Students

128

24.0

93♀,35♂

Construct validity- structural validity

CFA: the fit values suggest that the estimated three-factor model fits the data well: χ²=27.61, df=24.00, p=0.28, χ²/df=1.15, CFI=0.97, RMSEA=0.03.

Inadequate

?

Sample size not appropriate for this analysis.

Accepted model fit: CFI>0.95, or RMSEA <0.06.

But several factors from the previous PCA loaded very low.

Construct validity- hypothesis testing

OSIVQ spatial positively corr. with spatial measures: PFT r=0.47 and MRT r=0.31.

OSIVQ verbal positively corr. with verbal measures: arranging words r=0.17 and SAT verbal r=0.20.

OSIVQ object positively corr. with VVIQ: r=0.41.

Doubtful

+

Some information on measurement properties of the comparator instrument reported.

The results are in accordance with the hypothesis.

n.d.s.

Campos & Pérez-Fabello 2011 [168]

ES

S

Students

213

19.6

62♀,151♂

Construct validity- structural validity

A first PCA with varimax rotation identified 13 factors, but only 3 factors had eigenvalues above 3.0; these explained 33.1% of the variance. A second PCA with varimax rotation, forced to three factors, was performed. Factor loadings were 0.07–0.80.

Inadequate

−

Sample size not appropriate for this analysis.

Several factors loaded very low and variance explained by factors < 50%.

Paivio’s Individual Differences Questionnaire (IDQ, 86 items)

n.d.s.

Paivio & Harshman 1983 [141]

CA

E

Students

713

NR

NR

Construct validity- structural validity

FA with the oblique six-factor model (factors: good verbal expression fluency, habitual use of imagery, concern with correct use of words, self-reported reading difficulties, use of images to solve problems, vividness of daydreams/dreams) provided a better fit to the data than the two-factor model.

Adequate

?

Data were collected in 1968 and 1970 with two samples. In total, data from 713 students (collected in both years) were analysed, but no details about the samples are available. *Insufficient data for quality criteria rating proposed by COSMIN.

Paivio’s Individual Differences Questionnaire (shortened IDQ, 34 items)

n.d.s.

Kardash et al. 1986 [142]

USA

E

Students

189

NR

99♀, 90♂

Construct validity- structural validity

CFA with the oblique five-factor model (factors: good verbal expression fluency, habitual use of imagery, concern with correct use of words, self-reported reading difficulties, vividness of daydreams/dreams) provided the highest values: χ²=811.36, df=517, AGFI=0.77. Variance explained: 71–77%. Factor loadings 0.25–0.80. Only one item <0.25.

Adequate

−

AGFI value>0.95.

Several factors loaded lower than 0.45.

Revised Paivio’s Individual Differences Questionnaire (IDQ, 72 items)

n.d.s.

Hiscock 1978² [109]

USA

E

Students

123

NR

55♀, 68♂

Construct validity- hypothesis testing

Corr. IDQ imagery scale with:

GTVIC r=0.21

Betts QMI visual scale r=0.49

Betts QMI auditory scale r=0.21

Corr. with the Marlowe-Crowne scale did not exceed r=0.11.

Doubtful

−

This article reported results from 4 studies.

Construct measured by the comparator instrument unclear. The corr. with the comparison instrument that measures the same construct is missing.

n.d.s.

Hiscock 1978³ [109]

USA

E

Students

79

NR

36♀, 43♂

Construct validity- hypothesis testing

Corr. IDQ imagery scale with:

GTVIC r=0.56

Betts QMI visual scale r=0.46

Betts QMI auditory scale r=0.24

Corr. Betts QMI visual scale with GTVIC

r=0.47

Inadequate

−

Construct measured by the comparator instrument not clear and measurement properties of the comparator instrument not reported. See comment above.

Two measures (Visual Memory Scale and Visual Manipulation Scale) developed specifically for use in the present study.

Revised Paivio’s Individual Differences Questionnaire (IDQ, 86 items)

n.d.s.

Hiscock 1978⁴ [109]

USA

E

NR

81

NR

81♀

Construct and criterion validity

Corr. IDQ imagery scale with Study of Values

r=0.35

Corr. IDQ verbal scale with Quick Word Test

r=0.41

Inadequate

−

Different validity terms may be misunderstood in this study: construct and criterion validity.

The author described the aim of the study as assessing construct validity (various tests were correlated, but the expected relationships were not stated).

However, the author used the same measures to predict the findings, which is part of criterion rather than construct validity.

The relevance of this study is doubtful.

Sussex Cognitive Styles Questionnaire (SCSQ)

n.d.s.

Mealor et al. 2016¹ [143]

UK

E

Students

1542

27.0

586♀, 956♂

Construct validity- structural validity

EFA with an oblique rotation suggested a six-factor solution: imagery ability, technical/spatial, language and word forms, need for organisation, global bias, systemising tendency.

The reduced version of the questionnaire contained 60 items, which explained 32% of total variance. Factor loadings ranged from 0.31 to 0.74.

Adequate

?

2016¹=study 1.

Several items loaded <0.50.

These items should be considered for deletion. CFA should be performed.

Construct validity- hypothesis testing

Known-groups validity

Females scored higher on imagery ability and males scored higher on technical/spatial.

Doubtful

?

Participants' characteristics insufficiently described and not all results are in accordance with hypothesis.

n.d.s.

Mealor et al. 2016³ [143]

UK

E

Volunteers

121

35.0

24♀,97♂

Construct validity- hypothesis testing

Known-groups validity

Females scored higher on imagery ability, and males scored higher on both technical/spatial, and systemising tendency.

The differences observed between grapheme-colour and sequence-space synaesthetes on SCSQ scales show that different forms of synaesthesia may predict different aspects of cognition.

Very good

?

2016³=study 3.

Participants with sequence-space synaesthesia, grapheme-colour synaesthesia, or both. Participants' characteristics described but not all results are in accordance with hypothesis.

Verbalizer-Visualiser Questionnaire (VVQ)

n.d.s.

Campos et al. 2004 [145]

ES

S

Students

969

14.2

496♀, 473♂

Construct validity- structural validity

PCA with varimax orthogonal rotation yielded 5 factors: Factor 1=interest in words, Factor 2=dream vividness and frequency, Factor 3=verbal fluency, Factor 4=task performance difficulty, Factor 5=ways of thinking and acting. Factors loaded 0.43–0.77.

This test does not have a clear factorial structure.

Adequate

−

Only high school students tested.

Not all information reported for quality criteria rating.

However, this finding contrasts with previous studies, which obtained only 2 factors.

Construct validity- hypothesis testing

Corr. VVQ with GTVIC

r=0.08

Inadequate

−

No information on the measurement properties of the comparator instrument. The corr. found was very weak, as expected. However, the corr. with a comparator instrument that measures the same construct is missing.

n.d.s.

Wedell et al. 2014 [146]

DE

G

Volunteers

476

24.1

99♀, 377♂

Construct validity- structural validity

FA with varimax rotation yielded 2 factors: visualizer and verbalizer. However, a large deviation between the original and translated versions was found. 7 items could not be clearly attributed to either of the two factors.

Adequate

?

Quality criteria for good measurement properties cannot be rated.

d. Assessments of use of mental imagery

Children’s Active Play Imagery Questionnaire (CAPIQ)

Sport

Cooke et al. 2014¹ [147]

CA

E

Experts

7

NR

NR

Content validity

The assessment of item-content relevance and comprehensiveness was conducted by experts. Target population was not involved in this step. Not clear if data were analysed by 2 researchers independently.

Doubtful

?

Relevance, comprehensiveness and comprehensibility not evaluated in this phase.

Sport

Cooke et al. 2014² [147]

CA

E

Children

302

10.0

145♀, 157♂

Construct validity- structural validity

PCA with oblimin rotation identified a three-factor solution with 11 items. Factor 1=capability imagery. Factor 2=social imagery. Factor 3=fun imagery. The factors explained 61.4% of the variance. The interfactor correlations were low to moderate (1+2 r=0.23, 1+3 r=0.30, 2+3 r=0.44).

Adequate

?

Very good sample size. Factor loadings not reported.

Children’s Active Play Imagery Questionnaire (CAPIQ)

Sport

Cooke et al. 2014³ [147]

CA

E

Children

252

10.4

118♀, 134♂

Construct validity- structural validity

CFA with three-factor model provided acceptable model fit: CFI=0.95, NFI=0.92, TLI=0.93, RMSEA=0.07.

Very good

−

Accepted model fit: CFI>0.95, or SRMR<0.08, or RMSEA<0.06

Almost all fits just below cut-off.

Construct validity- hypothesis testing

Known-group validity

No significant effects of age group (7–10 vs 11–14) were noted for any of the imagery functions. A significant main effect of gender was found for capability imagery (p=0.052), with females reporting more use of this imagery function.

Doubtful

?

Insufficient description of participants' characteristics. Not all results are in accordance with hypothesis.

Sport

Kashani et al. 2017 [148]

IR

Pe

Students

190

11.5

85♀, 85♂

Construct validity- structural validity

CFA based on the structural equation model confirmed the three-factor model with acceptable model fit: χ²=88.59, df=41, CFI=0.94, TLI=0.93, RMSEA=0.08.

Very good

−

Accepted model fit: CFI>0.95, or SRMR<0.08, or RMSEA<0.06

Almost all fits just below cut-off.

Exercise Imagery Questionnaire-Aerobic Version (EIQ-AV)

Sport

Hausenblas et al. 1999² [149]

CA

E

Experts

3

NR

NR

Content validity

3 exercise professionals and 3 exercise participants commented on the wording, phraseology, and scoring of the questionnaire items. Minor revisions were made to the questionnaire items based on their comments.

Doubtful

?

This article reported results from 3 studies.

No information whether experts and athletes were asked about relevance and comprehensiveness and how data were analysed.

Athletes

3

Athletes

307¹

22.9¹

9♀, 296♂¹

Construct validity- structural validity

PCA with varimax rotation conducted for each sample to reduce items. From this analysis a three-factor structure emerged accounting for 63.8% of the variance in sample 1 and 67.6% of the variance in sample 2. The three factors are: energy, appearance, and technique.

Very good

?

*Insufficient information (e.g. factor loadings) reported for quality criteria rating.

Athletes

171²

22.4²

3♀, 168♂²

Hausenblas et al. 1999³ [149]

CA

E

Athletesᵃ

144

22.0

16♀,128♂

Construct validity- structural validity

CFA was conducted. Some items were removed. The revised model yielded good fit indices: Athletesᵃ: χ²=40.5, χ²/df=1.69, RMSR=0.05, SRMSR=0.05, GFI=0.94, AGFI=0.89, NFI=0.92, NNFI=0.95, GFI=0.97. Athletesᵇ: χ²=49.6, χ²/df=2.06, RMSR=0.05, SRMSR=0.05, GFI=0.96, AGFI=0.93, NFI=0.95, NNFI=0.96, GFI=0.97. The final version consists of 9 items.

Very good

+

Very good sample size.

Steps of data analysis very clearly described. Accepted model fit: CFI, TLI>0.95, or SRMR<0.08, or RMSEA<0.06.

Athletesᵇ

267

22.4

5♀,262♂

Sport

Pérez-Fabello & Campos 2020 [150]

ES

S

Students

166

20.1

127♀,39♂

Construct validity- structural validity

CFA with a two-factor model (only the factors energy and technique; the factor appearance was eliminated) revealed better fit indices: χ² (df=8)=14.95, GFI=0.97, CFI=0.97, NNFI=0.94, RMSEA=0.07, SRMR=0.04.

Very good

+

Accepted model fit: CFI, TLI>0.95, or SRMR<0.08, or RMSEA<0.06.

Construct validity- hypothesis testing

Sign. corr. among the three EIQ scales: technique with appearance imagery r=0.52, technique with energy imagery r=0.56, energy with appearance imagery r=0.48

No corr. found between EIQ and MIQ-R, VMIQ, or VVIQ. Only low corr. (r=0.26) was found between EIQ technique and GTVIC.

Very good

−

Most of the results are not in accordance with the hypothesis.

Sport Imagery Questionnaire (SIQ)

Sport

Hall et al. 1998¹ [151]

CA

E

Experts

4

NR

NR

Content validity

4 research experts in the area of sport psychology and 4 in cognitive psychology assessed content validity. The content, format, wording of the items and usage within athletic populations were determined and evaluated by experts.

Doubtful

?

This article reported results from 3 studies.

No details reported about interviews, insufficient information about data analysis.

Unclear whether athletes were asked about relevance, comprehensiveness and comprehensibility.

Sport

Hall et al. 1998¹ [151]

CA

E

Athletes

113

23.6

53♀,60♂

Construct validity- structural validity

46-item version

PCA and maximum likelihood with oblique rotation were employed. MG separated into two factors representing two distinct subscales: MG-A=motivational general arousal and MG-M=motivational general mastery.

Inadequate

?

Sample size for this analysis not appropriate.

Quality criteria for good measurement properties cannot be rated.

Sport

Hall et al. 1998² [151]

CA

E

Students

161

NR

NR

Construct validity- structural validity

30-item version, 5 scales

PCA and maximum likelihood with oblique rotation were employed. Results showed that the items loaded very cleanly onto 5 factors (cognitive general, cognitive specific, motivational specific, motivational general arousal, motivational general mastery) and all items loaded above the criterion level (>0.35). Factor loadings ranged from 0.45 to 0.97.

Adequate

?

EFA performed. Sample size doubtful.

Variance explained by factors not reported.

Sport

Hall et al. 1998³ [151]

CA

E

Athletes

271

NR

184♀,87♂

Construct validity- structural validity

30-item version, 5 scales

PCA revealed the existence of 5 distinct factors: cognitive general, cognitive specific, motivational specific, motivational general arousal, motivational general mastery.

Factors loaded >0.45. Total variance explained: 57.5%.

Adequate

+

EFA with adequate sample size performed.

Sport

Vurgun et al. 2012 [152]

TR

Tu

Athletes

142

21.8

100♀,42♂

Construct validity- structural validity

EFA with varimax rotation identified 30 items and 5 factors, which explained 65.48% of the variance. CFA with maximum likelihood estimation was performed, and the model found with the EFA showed a good fit to the data: χ² (395)=632.55, GFI=0.77, CFI=0.88, NNFI=0.87, RMSEA=0.06, SRMR=0.07.

Inadequate

+

Sample size inadequate for this analysis.

Accepted model fit: CFI, TLI>0.95, or SRMR<0.08, or RMSEA<0.06.

Sport

Ruiz & Watt 2014 [153]

Not clear

S

Athletes

361

24.1

234♀,29♂

Construct validity- structural validity

The CFA representing the 30-item five-factor SIQ model revealed acceptable fit to the data: χ² (378)=694.60, CFI=0.91, TLI=0.90, RMSEA=0.05, SRMR=0.05. Factors loaded 0.41–0.83.

Very good

+

Accepted model fit: CFI, TLI>0.95, or SRMR<0.08, or RMSEA<0.06.

Sport Imagery Questionnaire for Children (SIQ-C)

Sport

Hall et al. 2009¹ [154]

CA

E

Young athletes

428

10.9

137♀,291♂

Construct validity- structural validity

CFA approached a reasonable fit for the hypothesised five-factor model; Q=3.08, CFI=0.89, GFI=0.89, RMSEA=0.07.

Doubtful

−

This article reported results from 3 studies.

Rotation method not described.

Accepted model fit: CFI, TLI>0.95, or SRMR<0.08, or RMSEA<0.06.

Sport

Hall et al. 2009² [154]

CA

E

Young athletes

628

NR

283♀,345♂

Construct validity- structural validity

CFA performed, with a five-factor model of imagery use being hypothesised: (Q=3.33, CFI=0.89, GFI=0.91, RMSEA=0.06) indicated that the measurement model was tenable.

Doubtful

−

Rotation method not described.

Model fits were at the limit. Accepted model fit: CFI, TLI>0.90, or RMSEA<0.10.

Sport

Hall et al. 2009³ [154]

CA

E

Young athletes

82

11.5

21♀,61♂

Construct validity- hypothesis testing

Corr. for MG-M and self-confidence r=0.73 and for MG-M and self-efficacy r=0.61.

Corr. for CS imagery and self-confidence r=0.39 and self-efficacy r=0.41, CG imagery and self-confidence r=0.38 and self-efficacy r=0.38.

Adequate

+

Confidence was measured with the CSAI-2, self-efficacy with the SEQ-S.

Some information on measurement properties of comparator instrument provided.

Results are in accordance with the hypothesis.

Spontaneous Use of Imagery Scale (SUIS)

n.d.s.

Nelis et al. 2014 [156]

UK

E/D

Studentsᵃ

491

18.6

88♀,403♂

Construct validity- structural validity

EFA in group a suggested two components.

CFA was conducted in groups b and c evaluating a one- and two-factor model. The one-factor model was accepted as final for the following reasons: fit indices did not strongly differ between the two models, and in the two-factor model, the factors were highly correlated. Fit indices group b: CFI=0.93, TLI=0.92, RMSEA=0.06, χ²=115.50, df=54, p<.001. Factor loadings 0.35–0.98. 2 items (1 and 6) did not reach 0.30. Fit indices group c: CFI=0.91, TLI=0.89, RMSEA=0.07, χ²=174.19, df=54, p<.001. Factor loadings 0.40–0.71. 2 items (1 and 6) did not reach 0.30.

Very good

+

# Very good sample size. The steps of data analysis very clearly described. Accepted model fit: CFI, TLI>0.95, or SRMR<0.08, or RMSEA<0.06.

Volunteersᵇ

373

34.9

119♀,254♂

Studentsᶜ

433

18.4

82♀,351♂

Construct validity- hypothesis testing

Corr. SUIS with VVIQ

r(350)=−0.35, p<.001

Corr. SUIS with visual subscale of the QMI

r(338)=−0.38, p<.001.

Doubtful

+

The results are in accordance with hypothesis. Incomplete information on measurement properties of the comparator instrument.

n.d.s.

Görgen et al. 2016¹ [157]

DE

G

Students

216

23.7

60♀,156♂

Construct validity- structural validity

CFA one-factor model revealed acceptable fit indices: χ2 (df=54)=86.91, p<.01, RMSEA=0.05, CFI=0.92, TLI=0.90. Factor loadings 0.21–0.64. One item (item 6) reached −0.05.

Very good

−

This article reported results from two studies.

Good sample size.

Several items had very low factor loadings.

Accepted model fit: CFI, TLI>0.95, or SRMR<0.08, or RMSEA<0.06.

Construct validity- hypothesis testing

Corr. SUIS with TABS

r=0.43, p<0.001

Corr. SUIS with RSQ

r=0.14, p<0.05

Adequate

?

Sufficient information on measurement properties of the comparator instrument. Very low corr., no hypothesis defined. Insufficient information about comparator instrument.

n.d.s.

Görgen et al. 2016² [157]

DE

G

Students

447

24.9

161♀,286♂

Construct validity- structural validity

SUIS 17-item version

CFA one-factor model revealed acceptable fit indices: χ2 (df=119)=413.71, p<.001, RMSEA=0.07, CFI=0.92, TLI=0.91. Factor loadings 0.26–0.73.

Very good

−

Very good sample size.

One item loaded <0.40. Accepted model fit: CFI, TLI>0.95, or SRMR<0.08, or RMSEA<0.06.

n.d.s.

Görgen et al. 2016² [157]

DE

G

Students

447

24.9

161♀,286♂

Construct validity- hypothesis testing

Corr. SUIS 17-item with STAI-T

r=0.16, p<0.01

Corr. SUIS 17-item with TABS

r=0.42, p<0.001

Adequate

?

Sufficient information on measurement properties of the comparator instrument. Very low corr., no hypothesis defined. Insufficient information about comparator instrument.

n.d.s.

Tanaka et al. 2018¹ [158]

JP

J

Students

126

20.6

66♀,60♂

Construct validity- structural validity

CFA with a single-factor model was performed. The model fit indices were marginally acceptable: RMSEA=0.09, GFI=0.88, AGFI=0.82, CFI=0.66.

Doubtful

−

Rotation methods for CFA not described.

Accepted model fit: CFI, TLI>0.95, or SRMR<0.08, or RMSEA<0.06.

n.d.s.

Tanaka et al. 2018² [158]

JP

J

Patients with SAD

20

30.9

12♀,8♂

Construct validity- hypothesis testing

Known-groups validity

No significant difference in mean SUIS-J score between patients with SAD (38.7, SD=5.06) and healthy controls (36.1, SD=6.9), p=0.92.

Very good

?

2018²=study 2. SAD=social anxiety disorder.

Presumably, data from the healthy participants in study 1 were analysed.

No hypothesis defined.

  1. Legend: The superscript numbers were used to distinguish the results per group
  2. Disciplines: field in which the tool was evaluated. Edu education, Med medicine, Psy psychology, n.d.s. not discipline-specific (healthy participants/students)
  3. Language of the tool: E English, F French, G German, D Dutch, I Italian, S Spanish, Se Swedish, Tu Turkish
  4. Country abbreviations: AU Australia, CA Canada, DE Germany, ES Spain, FR France, IR Iran, IT Italy, JP Japan, MX Mexico, NL Netherlands, SE Sweden, TR Turkey, PL Poland, UK United Kingdom, USA United States of America
  5. Advanced Vocabulary Advanced Vocabulary Test, AGFI adjusted goodness of fit index, APM Advanced Progressive Matrices, CFA confirmatory factor analysis, CI confidence interval, CFI comparative fit index, corr. correlation, COSMIN Consensus-based Standards for the selection of health Measurement Instruments Risk of Bias Checklist, CV Water Polo Imagery Concurrent Verbalisation Activity (developed by Watt 2003 [36] only for evaluating criterion validity), DPT Degraded Pictures Test (measures object imagery), df degrees of freedom, EFA exploratory factor analysis, HVOT Hooper Visual Orientation Test, ICC intraclass correlation coefficient, JOLO Judgement of Line Orientation, MAB Multidimensional Aptitude Battery (Spatial Ability and Verbal Comprehension), MEIQ Mental Imagery Questionnaire, MIQ-3 Movement Imagery Questionnaire-3, MPFB Minnesota Paper Form Board, MRT Mental Rotation of Three-dimensional Objects, N sample size, NFI normed fit index, NNFI non-normed fit index, NR not reported, PMA Spatial Test of Primary Mental Abilities, PCA principal component analysis, PFT, RT response time, SEQ-S Self-Efficacy Questionnaire-Soccer, SFPI Singer Fantasy Proneness Interview, SRMR standardised root mean square residual, STAI-T trait anxiety scale of the State-Trait Anxiety Inventory, TLI Tucker-Lewis index, VKMRT Vandenberg-Kuse Mental Rotation Test, WAIS Similarities test of the conceptual similarity between two words, TABS Tellegen Absorption Scale, RSQ Response Styles Questionnaire, sign. significant, WAIS Wechsler Adult Intelligence Scale, WAIS-R Wechsler Adult Intelligence Scale-Revised, χ2 chi-square
  6. Quality Criteria: see Table 1 legend for an explanation of the quality criteria. # methods could be doubtful: students received course credits for participation. This could be interpreted as a certain dependency/necessity to participate, but it was not taken into account by the COSMIN evaluation
  7. Quality Criteria: ‘+’ = sufficient, ‘−’ = insufficient, ‘?’ = indeterminate. *See Table 1 and Legend for explanation of quality criteria
  8. For criteria of EFA see de Vet et al. 2011 [52], Izquierdo et al. 2014 [61] and Watkins 2018 [62]