
Table 1 The specific evaluation criteria of IVS

From: Artificial intelligence in the risk prediction models of cardiovascular disease and development of an independent validation screening tool: a systematic review

| Score items | Grade | Specific evaluation criteria | References |
| --- | --- | --- | --- |
| Transparency of algorithms | I | Post the trained models that can be directly loaded by other researchers for a contiguous independent validation or online/mobile user-friendly calculators that can allow batch processing of participant information (e.g., a prediction software or tool) | APPRAISE-AI [31]; MI-CLAIM [32]; AI-TREE [33] |
| | II | Apply and report the classic algorithms that can be found in some common tools/platforms OR report complete codes and hyperparameters and required description, allowing independent researchers to run the pipeline end to end | |
| | III | Report formulas and/or incomplete hyperparameters without required description, leading to difficulties in replication or incomplete reproducibility | |
| | IV | Incomplete reports that cannot be used for reproduction | |
| Performance of models | I | At least report the discrimination (preferably c-index) and calibration (preferably calibration plot/table) of the model, and the performance index version is clearly reported and index is excellent (e.g., 0.9 < c-index ≤ 1.0; calibration intercept close to 0 and calibration slope close to 1) | TRIPOD [34]; CHARMS checklist [35]; Official statement [36]; AI-TREE [33]; Expert comment [37] |
| | II | At least report the discrimination (preferably c-index) and calibration (preferably calibration plot/table) of the model, and the performance index version is clearly reported and index is good (e.g., 0.7 < c-index ≤ 0.9; calibration intercept deviates moderately from 0, and calibration slope deviates moderately from 1) | |
| | III | Do not report the discrimination or calibration of the models; OR the performance index version is not clearly reported; OR the value of the index is unknown | |
| | IV | The model performance is at a low accuracy (e.g., c-index ≤ 0.7; calibration intercept deviates severely from 0 and calibration slope deviates severely from 1) | |
| Feasibility of reproduction | I | The office-based models without requirement for laboratory and inspection data (also known as non-laboratory models) | Validation and evaluation framework [38]; AI standardization [39]; AI-TREE [33]; MI-CLAIM [32]; CONSORT-AI [40]; MAIC-10 [41]; SR of validity and clinical utility [11]; WHO laboratory-based and non-laboratory models [42]; Laboratory-based and non-laboratory models [43] |
| | II | The laboratory-based models only requiring routine clinical structured data, which are easy to obtain and do not need secondary operation (e.g., image pre-processing or annotation, etc.) | |
| | III | Include data derived from unconventional laboratory and inspection, complex gene-related testing, tissue specimen, and other resource-limiting extensive applications, which are hard to obtain or require secondary operation (e.g., labeling) | |
| | IV | Do not report the variables | |
| Risk of reproduction | I | No domain high risk (evaluated by using PROBAST) | PROBAST [30] |
| | II | Only one domain is high risk (evaluated by using PROBAST) | |
| | III | Two domains are high risk (evaluated by using PROBAST) | |
| | IV | Over two domains are high risk (evaluated by using PROBAST) | |
| Clinical implication | I | Identified novel risk markers or novel risk standards, which will optimize existing clinical preventive strategies and contribute to patient benefit for the general population and major CVDs, similar to classical T-Ms (e.g., Framingham Score) | SR of T-Ms [29]; Biomedical research AI guideline [44]; BS30440 [45]; APPRAISE-AI [31]; Consolidated AI reporting guideline [46]; AI-TREE [33]; SR of validity and clinical utility [11]; Rare CVD [47, 48] |
| | II | Do not identify novel risk markers or novel risk standards, but enhance the predictive capacity beyond that of existing methods, which may optimize existing clinical preventive measures or offer additional benefits for the non-rare population and non-rare subset of CVDs (more than 1/2000 of the general population) | |
| | III | Only enhance the predictive capacity beyond that of existing methods, but cannot alter the existing preventive interventions or provide additional benefits for the non-rare population and non-rare subset of CVDs (more than 1/2000 of the general population) | |
| | IV | Do not enhance the predictive performance beyond that of existing methods OR only target a rare population or subset of CVDs (fewer than 1/2000 of the general population, e.g., infiltrative cardiac diseases), leading to inadequate validation and a lack of clinical utility for a broader population | |
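
The two criteria in Table 1 that use explicit numeric cut-offs ("Performance of models" and "Risk of reproduction") can be illustrated with a minimal Python sketch. This is not the authors' tool: the function names `performance_grade` and `risk_grade` and their parameters are hypothetical. Two simplifying assumptions are made. First, the performance grade is derived from the c-index alone, because the table describes calibration only qualitatively ("close to", "deviates moderately", "deviates severely"). Second, missing reporting is checked before the c-index value, since the table does not state which rule takes precedence.

```python
from typing import Optional


def performance_grade(
    c_index: Optional[float],
    calibration_reported: bool,
    index_version_reported: bool,
) -> str:
    """Map reported performance to grade I-IV using the example c-index cut-offs.

    Assumption: calibration quality is not scored numerically here, because
    Table 1 gives no numeric calibration thresholds.
    """
    # Grade III: discrimination or calibration not reported, index version
    # unclear, or index value unknown (assumed to take precedence).
    if c_index is None or not calibration_reported or not index_version_reported:
        return "III"
    if not 0.0 <= c_index <= 1.0:
        raise ValueError("c-index must lie in [0, 1]")
    if c_index <= 0.7:
        return "IV"  # low accuracy: c-index <= 0.7
    if c_index <= 0.9:
        return "II"  # good: 0.7 < c-index <= 0.9
    return "I"       # excellent: 0.9 < c-index <= 1.0


def risk_grade(n_high_risk_domains: int) -> str:
    """Map the number of PROBAST domains rated high risk to grade I-IV."""
    # PROBAST assesses four domains: participants, predictors, outcome, analysis.
    if not 0 <= n_high_risk_domains <= 4:
        raise ValueError("PROBAST has four domains; expected a count from 0 to 4")
    if n_high_risk_domains == 0:
        return "I"
    if n_high_risk_domains == 1:
        return "II"
    if n_high_risk_domains == 2:
        return "III"
    return "IV"  # more than two domains at high risk


if __name__ == "__main__":
    print(performance_grade(0.85, True, True))
    print(risk_grade(1))
```

The remaining three criteria (transparency, feasibility, clinical implication) rest on reviewer judgment rather than numeric thresholds, so they are left out of the sketch.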