
Table 6 High ICCs between the model and pathologists across four independent testing cohorts indicate high consistency and comparable performance

From: Deep learning-based six-type classifier for lung cancer and mimics from histopathological whole slide images: a retrospective study

Six-type classification model, ICC^a (95% CI^b) per testing cohort:

| Raters | SYSU1 | SYSU2 | SZPH | TCGA |
|---|---|---|---|---|
| Ground truth | 0.941 (0.691, 0.991) | 0.959 (0.776, 0.994) | 0.927 (0.453, 0.995) | 0.946 (0.715, 0.992) |
| Pathologist 1 (+++)^c | 0.938 (0.677, 0.991) | 0.957 (0.767, 0.994) | 0.878 (0.215, 0.991) | 0.918 (0.592, 0.988) |
| Pathologist 2 (++)^c | 0.873 (0.422, 0.981) | 0.960 (0.783, 0.994) | 0.909 (0.356, 0.994) | 0.928 (0.633, 0.989) |
| Pathologist 3 (++)^c | 0.945 (0.709, 0.992) | 0.945 (0.709, 0.992) | 0.928 (0.460, 0.995) | 0.922 (0.608, 0.988) |
| Pathologist 4 (+)^c | 0.944 (0.707, 0.992) | 0.800 (0.200, 0.969) | 0.905 (0.538, 0.986) | 0.754 (0.086, 0.961) |
| P value^d | < 0.05 | < 0.05 | < 0.05 | < 0.05 |

  1. ^a ICCs were computed with the ‘irr’ package in R v3.6.1 using the ‘oneway’ model to measure the reliability and consistency of diagnoses among raters.
  2. ^b CIs were obtained by bootstrapping the samples 10,000 times.
  3. ^c The ‘+’ symbols indicate pathologist seniority: + denotes junior, ++ junior attending, and +++ senior attending.
  4. ^d ICC ranges from 0 to 1, and a higher ICC indicates better consistency. Conventionally, ICC > 0.75 with P < 0.05 indicates high reliability, repeatability, and consistency.
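Footnotes a and b describe the statistics behind each cell. The following is a minimal pure-Python illustration, not the authors' code. It reimplements the single-rater one-way ICC, ICC(1) = (MS_between - MS_within) / (MS_between + (k - 1) * MS_within), using the same mean squares as `irr::icc(model = "oneway")` with its default `unit = "single"` (the unit setting is an assumption, since the footnote names only the model). It pairs this with a percentile bootstrap that resamples subjects with replacement 10,000 times. The percentile method, the subjects-by-raters layout, the function names `icc_oneway` and `bootstrap_ci`, and the toy ratings are all hypothetical; the study does not say which bootstrap interval it used or exactly what units were rated.

```python
import math
import random
import statistics
from typing import List, Sequence, Tuple

# Rows are subjects (rated items), columns are raters (e.g. model, pathologist).
Ratings = Sequence[Sequence[float]]


def icc_oneway(ratings: Ratings) -> float:
    """Single-rater one-way random-effects ICC(1).

    Uses the same mean squares as irr::icc(model = "oneway", unit = "single").
    Requires at least 2 subjects and 2 raters.
    """
    k = len(ratings[0])  # raters per subject
    # Between-subjects mean square: k times the variance of the subject means.
    ms_between = k * statistics.variance([statistics.fmean(row) for row in ratings])
    # Within-subjects mean square: average of the per-subject rating variances.
    ms_within = statistics.fmean([statistics.variance(row) for row in ratings])
    denom = ms_between + (k - 1) * ms_within
    if denom == 0:
        return math.nan  # degenerate sample with no variance at all
    return (ms_between - ms_within) / denom


def bootstrap_ci(
    ratings: Ratings, n_boot: int = 10_000, level: float = 0.95, seed: int = 0
) -> Tuple[float, float]:
    """Percentile bootstrap CI for ICC(1), resampling subjects with replacement."""
    rng = random.Random(seed)
    estimates: List[float] = []
    for _ in range(n_boot):
        sample = rng.choices(ratings, k=len(ratings))  # whole rows are resampled
        value = icc_oneway(sample)
        if not math.isnan(value):
            estimates.append(value)
    estimates.sort()
    alpha = 1 - level
    last = len(estimates) - 1
    lo = estimates[round(alpha / 2 * last)]
    hi = estimates[round((1 - alpha / 2) * last)]
    return lo, hi


# Hypothetical toy scores (model, pathologist) for six subjects; not study data.
ratings = [
    [0.92, 0.90],
    [0.85, 0.88],
    [0.78, 0.75],
    [0.95, 0.97],
    [0.60, 0.66],
    [0.88, 0.84],
]

icc = icc_oneway(ratings)
lo, hi = bootstrap_ci(ratings)
print(f"ICC(1) = {icc:.3f} (95% CI {lo:.3f}, {hi:.3f})")  # ICC(1) = 0.954 here
# Footnote d threshold; the accompanying P value is not computed in this sketch.
print("ICC > 0.75" if icc > 0.75 else "ICC <= 0.75")
```

Resampling whole rows keeps each subject's ratings together, so the interval reflects between-subject sampling variability; with only a handful of subjects, the resulting intervals are wide even when the point estimate is high.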