  • Research article
  • Open access

Deep learning radiomics based on contrast-enhanced ultrasound images for assisted diagnosis of pancreatic ductal adenocarcinoma and chronic pancreatitis

Abstract

Background

Accurate and non-invasive diagnosis of pancreatic ductal adenocarcinoma (PDAC) and chronic pancreatitis (CP) can avoid unnecessary puncture and surgery. This study aimed to develop a deep learning radiomics (DLR) model based on contrast-enhanced ultrasound (CEUS) images to assist radiologists in identifying PDAC and CP.

Methods

Patients with PDAC or CP were retrospectively enrolled from three hospitals. Detailed clinicopathological data were collected for each patient. Diagnoses were confirmed pathologically using biopsy or surgery in all patients. We developed an end-to-end DLR model for diagnosing PDAC and CP using CEUS images. To verify the clinical application value of the DLR model, two rounds of reader studies were performed.

Results

A total of 558 patients with pancreatic lesions were enrolled and split into a training cohort (n=351), an internal validation cohort (n=109), and external validation cohorts 1 (n=50) and 2 (n=48). The DLR model achieved an area under the curve (AUC) of 0.986 (95% CI 0.975–0.994), 0.978 (95% CI 0.950–0.996), 0.967 (95% CI 0.917–1.000), and 0.953 (95% CI 0.877–1.000) in the training, internal validation, and external validation cohorts 1 and 2, respectively. The sensitivity and specificity of the DLR model were higher than or comparable to the diagnoses of the five radiologists in the three validation cohorts. With the aid of the DLR model, the diagnostic sensitivity of all radiologists improved further, with little or no decrease in specificity, in the three validation cohorts.

Conclusions

The findings of this study suggest that our DLR model can be used as an effective tool to assist radiologists in the diagnosis of PDAC and CP.


Background

According to Global Cancer Statistics 2020, pancreatic cancer is the seventh leading cause of cancer-related death, with a five-year survival rate of less than 10% [1, 2]. Approximately 85–95% of pancreatic cancer patients have pancreatic ductal adenocarcinoma (PDAC) [3, 4]. Previous studies have shown that pancreatic cancer occurs more frequently in European and North American countries. Its etiology is mainly attributed to genetic and environmental factors, particularly diet and lifestyle, as well as combinations of risk factors such as obesity, smoking, and alcohol consumption [5, 6]. The poor prognosis of pancreatic cancer is largely due to late diagnosis or misdiagnosis resulting from an overlap of symptoms with other conditions, such as chronic pancreatitis (CP) [7,8,9,10].

Imaging methods used in PDAC diagnosis include ultrasound (US), multidetector computed tomography (MDCT), magnetic resonance imaging (MRI), and positron emission tomography-computed tomography (PET-CT). Among them, contrast-enhanced ultrasound (CEUS) is convenient, poses no radiation risk, and provides excellent spatial and temporal resolution for displaying the microcirculatory perfusion of a pancreatic mass relative to the parenchyma [11,12,13,14,15,16]. Moreover, studies have shown that PDAC can be distinguished from CP by comparing the enhancement intensity of the lesion with that of the pancreatic parenchyma during the venous phase [17,18,19]. However, the diagnostic performance of CEUS depends largely on the experience of the radiologist. Furthermore, subjective imaging features and persistent inter- and intra-observer variability remain challenging factors in the interpretation of CEUS images [20, 21]. At present, few human experts can consistently diagnose pancreatic disorders based on CEUS.

Radiomics is a method that extracts high-throughput quantitative features from medical images and primarily uses two artificial intelligence (AI) analytical strategies: machine learning and deep learning [22,23,24,25,26,27]. The feasibility of radiomics in the diagnosis of PDAC has been demonstrated using MRI, computed tomography (CT), and endoscopic ultrasonography (EUS) images. Deng et al. [28] proposed a multi-parameter MRI radiomics model based on 119 patients, with a best area under the curve (AUC) of 0.902 in the validation cohort, to distinguish PDAC from CP. Ren et al. [29] verified the ability of texture analysis on unenhanced CT to distinguish PDAC from CP, with a best accuracy of 0.821. Tonozuka et al. [30] analyzed EUS images of 139 patients to distinguish among PDAC, CP, and normal pancreas; their deep learning radiomics (DLR) model achieved AUCs of 0.924 and 0.940 in the validation and test cohorts, respectively. Although these studies show that radiomics models can perform well in identifying PDAC and CP, several common limitations remain unaddressed. First, machine learning-based radiomics studies require labor-intensive and time-consuming lesion delineation, which is inevitably affected by inter- and intra-operator variability, especially in US images with unclear boundaries [23]. Second, these studies did not investigate the actual benefits radiologists obtain from radiomics in real diagnostic scenarios. Third, the feasibility of CEUS-based radiomics for diagnosing PDAC remains unverified.

This study was designed with these limitations in mind and aimed to (1) develop a DLR model for the automatic and accurate diagnosis of PDAC and CP using CEUS images and (2) validate the applicability of the DLR model as an effective tool to assist radiologists in diagnosing PDAC and CP. Additionally, the effect of the DLR model on radiologists’ decision-making was measured to assess its real clinical benefits. A two-round reader study with five radiologists was conducted to compare the diagnostic performance of the model and the radiologists. More importantly, the ability of the model to assist different radiologists in identifying PDAC and CP was investigated, demonstrating its potential usefulness in real clinical practice.

Methods

Patients

This retrospective multicenter study was conducted using data from three hospitals in China (Hospital 1: First Affiliated Hospital, Zhejiang University School of Medicine; Hospital 2: Cancer Hospital of the University of Chinese Academy of Sciences; Hospital 3: West China Hospital, Sichuan University). It was conducted in accordance with the Declaration of Helsinki and approved by the ethics committee of each participating hospital. The requirement for informed consent was waived owing to the retrospective study design. This study followed the Standards for Reporting of Diagnostic Accuracy (STARD) guidelines for diagnostic studies.

The inclusion criteria were (I) pathologically confirmed CP (with follow-up of at least 6 months without progression to pancreatic cancer) or PDAC without distant metastasis, (II) CEUS examination performed within three days before biopsy or surgery, and (III) availability of CEUS videos or images. The exclusion criteria were (I) multiple lesions in the pancreas, (II) a history of pancreatic surgery or chemotherapy, and (III) inadequate CEUS image quality. All histopathological findings were confirmed by pathologists with more than 10 years of experience in pancreatic pathology.

Data from Hospital 1, which enrolled the largest number of patients, were used as the primary cohort to reduce overfitting and bias in the analysis. Patients at Hospital 1 were enrolled between January 2020 and April 2021; patients admitted in 2020 formed the training cohort, and patients admitted in 2021 formed the internal validation cohort. Data from Hospitals 2 and 3 were used as independent external validation cohorts. The detailed research process is illustrated in Fig. 1. Baseline characteristics, including age, sex, lesion location and size, histological type, and carbohydrate antigen 19-9 (CA19-9) and carcinoembryonic antigen (CEA) levels, were collected from the hospital databases.

Fig. 1

Retrospective workflow. CEUS, contrast-enhanced ultrasound; PDAC, pancreatic ductal adenocarcinoma; CP, chronic pancreatitis

Contrast-enhanced ultrasound image acquisition

Four US devices (MyLab 90, ESAOTE, Italy; Aloka, HITACHI, Japan; LOGIQ E20, GE, USA; Resona 7, Mindray, China) equipped with an abdominal probe were used to capture the CEUS videos and/or images. Examinations were performed by one of six radiologists with over 10 years of experience in abdominal CEUS. Before each examination, the contrast-mode settings, including gain, depth, acoustic window, mechanical index, and focal zone, were adjusted. First, 2.4 mL of the contrast agent (SonoVue®; Bracco, Milan, Italy) was injected, followed by a 5-mL saline flush. A timer was started at the moment of contrast injection. The probe was then held in a stable position for 120 s to observe the pancreatic lesion and the surrounding pancreatic parenchyma. Finally, the video was recorded in DICOM format.

In this study, only one key CEUS image per patient was selected for analysis. CEUS images of pancreatic lesions are mainly divided into three phases: the vascular phase (0–30 s), pancreatic phase (31–60 s), and delayed phase (61–120 s) [14, 18]. Previous studies have shown that the diagnosis of PDAC and CP using CEUS is mainly based on the different enhancement patterns of the lesions. Studies have confirmed that during the pancreatic phase (30–40 s), the enhancement pattern can be high, equal, or low enhancement, depending on the contrast in enhancement intensity between the lesion and the pancreatic parenchyma [13, 14, 31, 32]. Based on these principles, we developed criteria for selecting key CEUS images. Owing to the retrospective nature of the study, dynamic CEUS video data were not completely preserved for all patients (half of the patients had no video). To make maximal use of the existing data, image selection followed two schemes. For cases without dynamic video, 15–20 images, including important static CEUS images of the three phases, were generally retained in the workstation during routine CEUS examinations at the three participating hospitals; a typical static CEUS image of the pancreatic phase, showing the maximum diameter of the lesion at approximately the 35th second, was selected for analysis. For cases with dynamic video, we directly selected a single frame at approximately the 35th second of the video as the typical CEUS image for model development after preprocessing.
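For cases with dynamic video, the key-frame rule above reduces to picking the frame nearest the 35-second mark. The sketch below illustrates this; the per-frame timestamp array is a hypothetical stand-in for whatever timing metadata the DICOM video provides, since the paper does not describe its frame-extraction code.

```python
import numpy as np

def key_frame_index(frame_times_s, target_s=35.0):
    """Return the index of the frame closest to the ~35 s mark of the
    pancreatic phase, per the paper's key-image criterion.
    frame_times_s : per-frame timestamps in seconds (hypothetical input)."""
    frame_times_s = np.asarray(frame_times_s, dtype=float)
    return int(np.argmin(np.abs(frame_times_s - target_s)))
```

For a video recorded at a constant 2 frames per second, for example, this selects frame 70.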

Region of interest extraction and preprocessing

The raw CEUS images were obtained by selecting the key frame from the CEUS videos or by using the existing raw CEUS images extracted from the videos. Since two-dimensional (2D) grayscale US and CEUS images were displayed simultaneously in one view (Additional file 1: Fig. S1), we defined a rectangular region of interest (ROI) covering the lesion on the raw CEUS image to eliminate interference from irrelevant information and non-lesion areas. The radiologist first determined the lesion area according to the 2D grayscale images in the raw CEUS images, after which the ROI was marked at the same location on the CEUS images. The open-source software labelme was used to label the ROI with a rectangular bounding box, and the ROI image was then cropped from the CEUS image [33]. In principle, the ROI image included the lesion and surrounding tissues. After ROI extraction, further preprocessing was performed to obtain resized grayscale ROI images for model development. All colored ROI images were converted to grayscale, considering the color differences among CEUS images collected from different US devices (Additional file 1: Fig. S2) and the minimal correlation between the enhancement pattern and color; this improves the robustness of the DLR model across equipment, because only the distribution of image gray values can then affect the model output. Finally, the grayscale ROI images were resized to 224×224 and fed into the DLR model. The ROI extraction and preprocessing workflow is shown in Fig. 2.
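The crop-grayscale-resize pipeline described above can be sketched as follows. The luminosity weights and nearest-neighbour interpolation are assumptions for illustration; the paper does not specify which conversion or resize method was used.

```python
import numpy as np

def preprocess_roi(frame_rgb, box, size=224):
    """Sketch of the ROI pipeline: crop the labelme bounding box, convert
    to grayscale, and resize to size x size for the DLR model.

    frame_rgb : H x W x 3 uint8 CEUS frame
    box       : (x0, y0, x1, y1) rectangle covering lesion plus margin
    """
    x0, y0, x1, y1 = box
    roi = frame_rgb[y0:y1, x0:x1].astype(float)        # crop the lesion region
    gray = roi @ np.array([0.299, 0.587, 0.114])       # RGB -> grayscale (assumed weights)
    h, w = gray.shape
    rows = np.arange(size) * h // size                 # nearest-neighbour row map
    cols = np.arange(size) * w // size                 # nearest-neighbour column map
    return gray[np.ix_(rows, cols)].astype(np.float32) # 224 x 224 model input
```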

Fig. 2

Workflow of ROI extraction and preprocessing and of our DLR model. The ROI image is extracted from the raw CEUS video, if available; otherwise, it is extracted directly from the existing CEUS images. The resized grayscale ROI images are fed into our model, which outputs an AI score and a heatmap for each lesion. The radiologists provide an initial decision on each lesion and then adjust their decisions, if uncertain, based on the additional information provided by the DLR model. CEUS, contrast-enhanced ultrasound; PDAC, pancreatic ductal adenocarcinoma; CP, chronic pancreatitis; ROI, region of interest; AI, artificial intelligence; DLR, deep learning radiomics

Deep learning radiomics model development

The DLR model used a ResNet-50 [34] backbone to extract deep learning features for classification (Fig. 2). Two fully connected layers, with outputs of 512 and 2 neurons, respectively, and a softmax activation layer were placed on top of the convolutional layers to generate the AI scores for PDAC and CP. The softmax activation layer gives the AI score the meaning of a probability, ensuring that the AI scores for the PDAC and CP categories sum to 1 for each lesion. A dropout layer with a probability of 0.5 was added between the two fully connected layers to alleviate overfitting. Additional file 1: Table S1 illustrates the detailed architecture of our DLR model. We also tested other typical image classification backbones, including Inception-v3 [35], VGG-16 [36], and DenseNet-121 [37]. The performance differences between the networks were very small in every cohort (Additional file 1: Fig. S3). Because ResNet-50 achieved the highest AUC in most validation cohorts, we chose it as the backbone for feature extraction. The detailed training process is provided in Additional file 1: Method S1 [38, 39].
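The classification head described above (pooled backbone features → 512 → 2 → softmax) can be sketched in plain NumPy at inference time. The ReLU between the two fully connected layers is an assumption, and dropout is omitted because it is only active during training; this is not the authors' code.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a logit vector."""
    e = np.exp(z - z.max())
    return e / e.sum()

def classifier_head(features, w1, b1, w2, b2):
    """Sketch of the head on top of the ResNet-50 backbone: 2048-d pooled
    features -> FC(512) -> FC(2) -> softmax AI scores for (PDAC, CP).
    The resulting two scores are non-negative and sum to 1."""
    h = np.maximum(features @ w1 + b1, 0.0)  # FC 2048 -> 512 (assumed ReLU)
    logits = h @ w2 + b2                     # FC 512 -> 2
    return softmax(logits)                   # probability-like AI scores
```

Because of the softmax, a high PDAC score directly implies a low CP score, which is what lets the model emit the "extreme" scores discussed in the Results.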

Two-round reader study

A two-round reader study was conducted to investigate the clinical benefits radiologists actually obtained with the assistance of the DLR model (Fig. 2). Five radiologists with an average of 9 years of CEUS experience (range, 3–15 years) participated. A total of 207 lesions (150 positive) from the internal validation cohort and external validation cohorts 1 and 2 were presented in random order. Throughout the process, the radiologists were blinded to each other’s decisions, the original diagnostic reports, and the final pathology results. The details of the two-round reader study are provided in Additional file 1: Method S2 [40].

Statistical analysis

Statistical analyses were performed using SPSS (version 23.0; IBM Corp., Armonk, NY, USA) and Python 3.7. Continuous variables are described as mean and standard deviation (SD), and categorical variables as number and percentage. Between-group comparisons were performed using Student’s t-test or the Mann–Whitney U test for quantitative variables and the chi-squared test for qualitative variables. The 95% confidence interval (CI) was calculated using bootstrapping with 2000 resamples. McNemar’s test was used to assess whether the DLR model and the radiologists differed significantly in sensitivity and specificity. All statistical tests were two-sided, with statistical significance set at P <.05.
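The bootstrap CI described above can be sketched as follows, using sensitivity as the example metric. The percentile method and the fixed seed are assumptions; the paper states only that 2000 resamples were used.

```python
import numpy as np

def sensitivity(y_true, y_pred):
    """Fraction of true PDAC cases (label 1) that are called positive."""
    pos = y_true == 1
    return float(np.mean(y_pred[pos] == 1))

def bootstrap_ci(y_true, y_pred, metric, n_boot=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap 95% CI with 2000 resamples, as in the paper.
    Resamples patients with replacement and takes empirical quantiles."""
    rng = np.random.default_rng(seed)
    n = len(y_true)
    stats = [metric(y_true[idx], y_pred[idx])
             for idx in (rng.integers(0, n, n) for _ in range(n_boot))]
    lo, hi = np.percentile(stats, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return float(lo), float(hi)
```

The same resampling loop applies to specificity or AUC by swapping in a different `metric` function.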

Results

Clinical data

In total, 558 patients with pancreatic lesions were enrolled (Fig. 1). Pathological findings showed PDAC lesions in 414 cases and CP lesions in 144 cases. Table 1 summarizes the detailed patient demographics and pancreatic lesion characteristics.

Table 1 Patient demographics and characteristics of pancreatic lesions

Comparison between deep learning radiomics model and radiologists

The radiologists’ decisions from the first-round reading were compared with those of the DLR model. The receiver operating characteristic (ROC) curves of the DLR model, the diagnoses of each radiologist, and the average diagnostic results of all radiologists in the different cohorts are shown in Fig. 3. Our DLR model achieved a high AUC of 0.986 (95% CI 0.975–0.994), 0.978 (95% CI 0.950–0.996), 0.967 (95% CI 0.917–1.000), and 0.953 (95% CI 0.877–1.000) in the training, internal validation, and external validation cohorts 1 and 2, respectively. The sensitivities in the internal validation, external validation 1, and external validation 2 cohorts were 97.3% (95% CI 93.2%–100%), 87.2% (95% CI 76.3%–97.2%), and 97.4% (95% CI 91.4%–100%), and the corresponding specificities were 83.3% (95% CI 70.0%–94.3%), 100% (95% CI 100%–100%), and 70.0% (95% CI 37.5%–100%). These sensitivity and specificity values were based on an operating point of 0.5 [41]. The confusion matrices of the DLR model are presented in Additional file 1: Fig. S4. The diagnoses of the five radiologists were worse than or comparable to those of the model: almost no green points lay above the ROC curve of the model. Furthermore, the average reader diagnoses in all three validation cohorts were located below the ROC curve of the model (Fig. 3, green crosses), indicating that our model was generally superior to the radiologists. The confusion matrices of the comprehensive diagnoses of the five readers without DLR assistance are presented in Additional file 1: Fig. S4.

Fig. 3

Comparison between performance of the DLR model and radiologists. The figure shows the identification of PDAC and CP in the training cohort, internal validation cohort, and external validation cohorts 1 and 2 using the DLR model and by individual radiologists. The performance of our DLR model is compared with each of the five readers and the average reader. DLR, deep learning radiomics; AUC, area under the curve; PDAC, pancreatic ductal adenocarcinoma; CP, chronic pancreatitis

For a more specific comparison, we also compared the sensitivity and specificity of the model and each radiologist. For fairness, we adjusted the operating point of the DLR model so that its specificity (sensitivity) matched the specificity (sensitivity) of each radiologist when comparing sensitivity (specificity). Since radiologists provide direct qualitative classifications, their sensitivity and specificity are fixed, whereas those of the DLR model can be changed by adjusting the classification threshold. On this basis, we achieved a specific comparison between the diagnostic performance of the DLR model and the radiologists. Detailed results are shown in Additional file 1: Table S2. In the internal validation cohort, the DLR model achieved better sensitivity and specificity than all radiologists, with a significantly higher sensitivity than three of the five radiologists (P <.05 for Reader-1, Reader-2, and Reader-5) and a significantly higher specificity than three of the five radiologists (P <.05 for Reader-2, Reader-3, and Reader-5). In external validation cohort 1, the DLR model also achieved better sensitivity and specificity than all radiologists, with a significantly higher sensitivity than two of the five radiologists (P <.05 for Reader-2 and Reader-5) and a significantly higher specificity than Reader-1 (P <.05). In external validation cohort 2, the DLR model achieved better sensitivity and specificity than all radiologists except Reader-3. It showed a significantly higher sensitivity than two of the five radiologists (P <.05 for Reader-2 and Reader-5), but not a significantly higher specificity.
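The matched-operating-point comparison described above can be sketched as follows: sweep the model's classification threshold, keep only thresholds whose specificity is at least the radiologist's, and report the best sensitivity among them. The exact tie-breaking rule is an assumption not stated in the paper.

```python
import numpy as np

def sensitivity_at_matched_specificity(ai_scores, y_true, reader_spec):
    """Model sensitivity at the operating point whose specificity matches
    (or exceeds) a given radiologist's specificity.

    ai_scores : per-lesion PDAC scores in [0, 1]
    y_true    : ground-truth labels (1 = PDAC, 0 = CP)
    """
    best = 0.0
    for t in np.unique(ai_scores):
        pred = (ai_scores >= t).astype(int)            # threshold at t
        spec = float(np.mean(pred[y_true == 0] == 0))  # specificity on CP
        if spec >= reader_spec:                        # matches the reader
            best = max(best, float(np.mean(pred[y_true == 1] == 1)))
    return best
```

The symmetric comparison (specificity at matched sensitivity) follows the same sweep with the roles of the two classes exchanged.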

Enhanced diagnosis with AI assistance

The changes in the diagnoses given by the five radiologists before and after AI assistance were analyzed in the two-round reader study. Detailed changes in their decisions, sensitivity, and specificity are shown in Table 2, and the confusion matrices of each radiologist without and with AI assistance are shown in Additional file 1: Figs. S5 and S6. In the internal validation cohort, all radiologists achieved higher sensitivity, and four of the five radiologists achieved higher specificity with AI assistance. Three and two of the five radiologists had a significant improvement in sensitivity (P <.05 for Reader-1, Reader-2, and Reader-4) and specificity (P <.05 for Reader-2 and Reader-4), respectively. In external validation cohort 1, all radiologists achieved higher sensitivity, and two of the five radiologists achieved higher specificity with AI assistance. In external validation cohort 2, all radiologists achieved higher sensitivity, and one of the five radiologists achieved higher specificity with AI assistance; Reader-5 had a significantly higher sensitivity than in the first round (P <.05). In all three validation cohorts, we found a positive effect of the DLR model in assisting radiologists to enhance their average accuracy (Fig. 3, orange points and crosses). Additionally, the confusion matrices of the comprehensive diagnoses of the five radiologists with AI assistance are given in Additional file 1: Fig. S4.

Table 2 Summary of the changes in the decision-making of radiologists before and after AI assistance

To illustrate the clinical value of our DLR model more concretely, some successful and unsuccessful examples in which radiologists changed their first-round decisions because of AI assistance are shown in Figs. 4 and 5. Although the AI scores and heatmaps given by the DLR model misled the radiologists’ decisions in some cases, the total scores of the five radiologists for all lesions in the validation cohorts before and after DLR assistance exhibited a clear trend of enhanced diagnostic performance (Fig. 6). The total score was calculated as follows: if a radiologist identified a patient as a PDAC case, one point was awarded; therefore, with five readers, the highest possible score was five and the lowest was zero. The higher the score, the more experts believed that the lesion was PDAC. The total scores demonstrated that a systematic improvement in diagnostic accuracy was achieved in both the PDAC and CP groups for all human experts with the help of the DLR model.
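The total-score calculation just described is a simple per-lesion vote count, which can be made explicit in a few lines:

```python
import numpy as np

def total_scores(reader_calls):
    """Total score per lesion: one point for each of the five readers who
    labels it PDAC, so scores range from 0 to 5.

    reader_calls : 5 x N binary matrix (rows = readers, columns = lesions,
    1 = reader calls the lesion PDAC)."""
    calls = np.asarray(reader_calls)
    return calls.sum(axis=0)  # column sums = per-lesion vote counts
```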

Fig. 4

Typical cases in which our DLR model guided radiologists to correct decisions. The top panel shows two PDAC lesions. Most radiologists consider these lesions CP in the first reading, but 100% accuracy is achieved with access to the additional information generated by the DLR model. In these two cases, the DLR model’s score for PDAC is significantly higher than that for CP, and the highlighted regions in the heatmaps are large and mostly distributed inside the tumor, consistent with the regular pattern found for PDAC lesions. The bottom panel shows two CP lesions. Most radiologists consider these lesions PDAC in the first reading, and 100% accuracy is achieved with access to the additional information generated by the DLR model. In these two cases, the DLR model scores CP significantly higher than PDAC, and the highlighted regions in the heatmaps are small and mostly distributed at the boundary of the ROI image, consistent with the regular pattern found for CP lesions. PDAC, pancreatic ductal adenocarcinoma; CP, chronic pancreatitis; ROI, region of interest; AI, artificial intelligence; DLR, deep learning radiomics

Fig. 5

Typical cases in which our DLR model misled radiologists into incorrect decisions. The top panel shows two PDAC lesions. All radiologists consider these two lesions PDAC in the first reading; however, with access to the information from the DLR model, Reader-5 changes to the incorrect decision, considering them CP lesions. In these two cases, the DLR model’s score for PDAC is significantly higher than that for CP, and the highlighted regions in the heatmaps are large and mostly distributed inside the tumor, consistent with other PDAC cases. Since Reader-5 is a junior radiologist, we believe that Reader-5’s mistakes may be due to lack of experience or carelessness. The bottom panel shows two CP lesions for which the radiologists’ first-reading diagnoses are inconsistent; however, with access to the information provided by the DLR model, all radiologists make the wrong decision. The misjudgment in the first case may be due to the large highlighted area of the generated heatmap, which is relatively rare in CP lesions, although most of the highlighted areas are still located at the boundary of the image. In the second case, the DLR model’s PDAC score is significantly higher than its CP score, representing a case of AI misjudgment that misled the radiologists. PDAC, pancreatic ductal adenocarcinoma; CP, chronic pancreatitis; ROI, region of interest; AI, artificial intelligence; DLR, deep learning radiomics

Fig. 6

A summary of the total scores of the five radiologists before and after DLR model assistance for every lesion in the validation cohorts. Red and green circles indicate the total score without and with DLR model assistance, respectively. Blue circles indicate that the lesion has the same score before and after AI assistance. Arrows indicate the direction of change in the total score after AI assistance. The total score is the sum of the five radiologists’ individual scores: if a radiologist believes that a lesion is PDAC, it is scored as one point, giving a maximum score of five. The higher the score, the more experts believe that the lesion is PDAC. PDAC, pancreatic ductal adenocarcinoma; CP, chronic pancreatitis; AI, artificial intelligence; DLR, deep learning radiomics

Notably, the heatmaps generated by gradient-weighted class activation mapping for model visualization showed different patterns in PDAC and CP images [40]. More specifically, the highlighted region in PDAC cases was larger than that in CP cases, and most of those regions were located inside the lesions. In contrast, the highlighted regions in CP heatmaps were mainly distributed at the boundary of the lesion. Additionally, radiologists noticed that for PDAC lesions, the highlighted regions were mainly distributed in the low-enhancement area inside the tumor, frequently adjacent to a high-enhancement region. Some heatmap examples of ROI images for PDAC and CP are shown in Fig. 7.
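The cited visualization method (Grad-CAM) weights each channel of the last convolutional layer by the spatial average of its gradient and sums the result. A minimal NumPy sketch of that computation, not the authors' code, is:

```python
import numpy as np

def grad_cam(activations, gradients):
    """Minimal sketch of gradient-weighted class activation mapping.

    activations, gradients : C x H x W arrays from the backbone's last
    convolutional layer, taken for the predicted class.
    Returns an H x W heatmap normalized to [0, 1]."""
    weights = gradients.mean(axis=(1, 2))             # channel importance (GAP of grads)
    cam = np.tensordot(weights, activations, axes=1)  # weighted sum over channels
    cam = np.maximum(cam, 0.0)                        # ReLU keeps positive evidence
    if cam.max() > 0:
        cam = cam / cam.max()                         # scale to [0, 1]
    return cam
```

In practice, the low-resolution map is upsampled to the 224×224 ROI and overlaid as a color heatmap for the readers.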

Fig. 7

Examples of heatmaps generated by our DLR model for PDAC and CP lesions. Generally, the highlighted area of PDAC lesions is larger than that of CP lesions, and most highlighted regions are distributed inside the tumor, dominated by low-enhancement regions with adjacent high-enhancement regions. The highlighted regions of CP lesions are mainly distributed at the boundary of the image, possibly because of the lack of PDAC features in the center of the ROI. PDAC, pancreatic ductal adenocarcinoma; CP, chronic pancreatitis; ROI, region of interest; DLR, deep learning radiomics

Discussion

In this study, we investigated for the first time the performance of CEUS-based DLR in the diagnosis of PDAC and CP. Compared with human experts, our model achieved overall better performance in all validation cohorts. Furthermore, we demonstrated that by incorporating AI scores and heatmaps, radiologists improved their decision-making, revealing the value of applying the DLR model in clinical practice. Compared with other radiomics studies, a major highlight of this work is the two-round reader study with five radiologists based on multicenter data.

The performance of our DLR model based on CEUS images was better than or comparable to that of models using other modalities, including MDCT, MRI, PET-CT, and EUS [28,29,30, 42]. There are two possible reasons. First, compared with the machine learning methods used in most of these studies [28, 29], the DLR model can automatically learn adaptive features for a specific task (effective identification of PDAC and CP) and is more flexible. Second, the diagnostic value of CEUS for PDAC has been demonstrated in previous studies [13, 43,44,45], confirming that the enhancement pattern in the lesion area contributes to qualitative diagnosis; it may therefore contribute even more to quantitative diagnosis.

Our DLR model achieved sensitivity and specificity that were higher than, and in several cases significantly higher than, or comparable to those of the five radiologists in our first-round reader study. Although radiologists can identify lesions based on enhancement patterns, PDAC and CP may be difficult to distinguish when they exhibit similar CEUS enhancement patterns, mainly owing to the presence of abundant fibrous tissue within PDAC lesions or necrosis within CP lesions. The DLR model can further learn and use high-level abstract features that are unrecognizable to humans to identify PDAC and CP, thus surpassing the diagnostic performance of human experts [24, 46,47,48].

Furthermore, we explored the benefits that radiologists actually obtained from DLR assistance in clinical practice. We believe this is particularly important because DLR models will play a supporting role for the foreseeable future. Although AI and radiomics models have their advantages, human experts will still make the final decision. One major reason is that the interpretability of deep learning features is still in its infancy [49, 50], and the biological mechanisms behind these radiomics features remain underexplored. However, this should not stop radiologists from using radiomics methods to enhance their diagnoses. In our design, AI scores notified radiologists of patients for whom their diagnosis differed from the quantitative computer analysis. Heatmaps offered extra information that guided their attention to the highlighted areas in the CEUS images, so that they could re-examine the images more efficiently and decide whether to revise their decisions. With this assisting strategy in the second-round image reading, the human experts showed an overall increase in sensitivity for PDAC assessment with little or no loss of specificity.

Investigating the AI scores and heatmaps more thoroughly shows how they helped radiologists. The AI score can be regarded as the probability of PDAC or CP predicted by the DLR model. As seen from the frequency distribution histogram in Additional file 1: Fig. S7, our DLR model produced a large proportion of extreme AI scores (e.g., greater than 0.9 for PDAC and less than 0.1 for CP lesions). As shown in Figs. 4 and 5, when the model provided an extreme score and strongly suggested that a lesion was PDAC or CP, the AI score itself served as a strong indicator signal to the radiologists. The small proportion of ambiguous AI scores certainly helped with this “alarm” effect. Furthermore, the heatmaps generated by the DLR model reflected different patterns in the PDAC and CP lesions. For PDAC lesions, the highlighted areas were concentrated in low-enhancement regions adjacent to high-enhancement areas within the tumor, likely because the DLR model learned key features from low-enhancement patterns associated with lower microvascular density, abundant fibrous tissue, and large amounts of necrotic tissue [51,52,53,54]. For CP lesions, since the model did not find important features indicating PDAC, the highlighted area was relatively small and mainly distributed at the boundary of the ROI [55,56,57,58,59]. Therefore, the “alarm” effect and the interpretable heatmap patterns together helped radiologists achieve real diagnostic benefits.

Another potential clinical value of the DLR model is that it may help junior radiologists more effectively. Although all radiologists obtained positive assistance from the model, Reader-5, the most junior radiologist, benefited the most. Therefore, this approach holds the potential to shorten the learning curve of less experienced radiologists.

Our study had several limitations. First, although this was a multicenter study, the dataset was not large, especially for the external validation cohort. Second, owing to the retrospective nature of the study, we did not use CEUS videos, which probably weakened the performance of the DLR strategy [14, 18]. Nevertheless, the strong performance of our model was sufficient to show that the use of static CEUS images provided effective clinical assistance.

Conclusion

A DLR model for the diagnosis of PDAC and CP was developed from a multicenter retrospective dataset of CEUS images. Further, a two-round reader study demonstrated that the model effectively assisted radiologists in improving their diagnoses.

Availability of data and materials

The datasets analyzed during the current study are not publicly available because the metadata contain information that could compromise patient privacy, but they are available from the corresponding author on reasonable request.

Abbreviations

PDAC:

Pancreatic ductal adenocarcinoma

CP:

Chronic pancreatitis

DLR:

Deep learning radiomics

CEUS:

Contrast-enhanced ultrasound

AUC:

Area under the curve

US:

Ultrasound

MDCT:

Multidetector computed tomography

MRI:

Magnetic resonance imaging

PET-CT:

Positron emission tomography-computed tomography

CT:

Computed tomography

AI:

Artificial intelligence

STARD:

Standards for Reporting of Diagnostic Accuracy

CA19-9:

Carbohydrate antigen 19-9

CEA:

Carcinoembryonic antigen

2D:

Two-dimensional

SDs:

Standard deviations

CI:

Confidence interval

ROC:

Receiver operating characteristic

ROI:

Region of interest

References

  1. Sung H, Ferlay J, Siegel RL, Laversanne M, Soerjomataram I, Jemal A, et al. Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin. 2021;71(3):209–49.

  2. Rahib L, Smith BD, Aizenberg R, Rosenzweig AB, Fleshman JM, Matrisian LM. Projecting cancer incidence and deaths to 2030: the unexpected burden of thyroid, liver, and pancreas cancers in the United States. Cancer Res. 2014;74(11):2913–21.

  3. Hariharan D, Saied A, Kocher HM. Analysis of mortality rates for pancreatic cancer across the world. HPB (Oxford). 2008;10(1):58–62.

  4. Brown ZJ, Cloyd JM. Surgery for pancreatic cancer: recent progress and future directions. Hepatobiliary Surg Nutr. 2021;10(3):376–8.

  5. Huang J, Lok V, Ngai CH, Zhang L, Yuan J, Lao XQ, et al. Worldwide burden of, risk factors for, and trends in pancreatic cancer. Gastroenterology. 2021;160(3):744–54.

  6. Hensrud DD, Heimburger DC. Diet, nutrients, and gastrointestinal cancer. Gastroenterol Clin North Am. 1998;27(2):325–46.

  7. Chen R, Pan S, Cooke K, Moyes KW, Bronner MP, Goodlett DR, et al. Comparison of pancreas juice proteins from cancer versus pancreatitis using quantitative proteomic analysis. Pancreas. 2007;34(1):70–9.

  8. Lowenfels AB, Maisonneuve P, Cavallini G, Ammann RW, Lankisch PG, Andersen JR, et al. Pancreatitis and the risk of pancreatic cancer. International Pancreatitis Study Group. N Engl J Med. 1993;328(20):1433–7.

  9. Malka D, Hammel P, Maire F, Rufat P, Madeira I, Pessione F, et al. Risk of pancreatic adenocarcinoma in chronic pancreatitis. Gut. 2002;51(6):849–52.

  10. Ferlay J, Partensky C, Bray F. More deaths from pancreatic cancer than breast cancer in the EU by 2017. Acta Oncol. 2016;55(9-10):1158–60.

  11. D'Onofrio M, Barbi E, Dietrich CF, Kitano M, Numata K, Sofuni A, et al. Pancreatic multicenter ultrasound study (PAMUS). Eur J Radiol. 2012;81(4):630–8.

  12. Ozawa Y, Numata K, Tanaka K, Ueno N, Kiba T, Hara K, et al. Contrast-enhanced sonography of small pancreatic mass lesions. J Ultrasound Med. 2002;21(9):983–91.

  13. Grossjohann HS, Rappeport ED, Jensen C, Svendsen LB, Hillingsø JG, Hansen CP, et al. Usefulness of contrast-enhanced transabdominal ultrasound for tumor classification and tumor staging in the pancreatic head. Scand J Gastroenterol. 2010;45(7-8):917–24.

  14. Tanaka S, Fukuda J, Nakao M, Ioka T, Ashida R, Takakura R, et al. Effectiveness of contrast-enhanced ultrasonography for the characterization of small and early stage pancreatic adenocarcinoma. Ultrasound Med Biol. 2020;46(9):2245–53.

  15. Kobayashi A, Yamaguchi T, Ishihara T, Tadenuma H, Nakamura K, Saisho H. Evaluation of vascular signal in pancreatic ductal carcinoma using contrast enhanced ultrasonography: effect of systemic chemotherapy. Gut. 2005;54(7):1047.

  16. Piscaglia F, Bolondi L. The safety of Sonovue in abdominal applications: retrospective analysis of 23188 investigations. Ultrasound Med Biol. 2006;32(9):1369–75.

  17. D'Onofrio M, Crosara S, Signorini M, De Robertis R, Canestrini S, Principe F, et al. Comparison between CT and CEUS in the diagnosis of pancreatic adenocarcinoma. Ultraschall Med. 2013;34(4):377–81.

  18. Xu J, Zhang M, Cheng G. Comparison between B-mode ultrasonography and contrast-enhanced ultrasonography for the surveillance of early stage pancreatic cancer: a retrospective study. J Gastrointest Oncol. 2020;11(5):1090–7.

  19. Takeshima K, Kumada T, Toyoda H, Kiriyama S, Tanikawa M, Ichikawa H, et al. Comparison of IV contrast-enhanced sonography and histopathology of pancreatic cancer. AJR Am J Roentgenol. 2005;185(5):1193–200.

  20. Ryu SW, Bok GH, Jang JY, Jeong SW, Ham NS, Kim JH, et al. Clinically useful diagnostic tool of contrast enhanced ultrasonography for focal liver masses: comparison to computed tomography and magnetic resonance imaging. Gut Liver. 2014;8(3):292–7.

  21. Muskiet MHA, Emanuel AL, Smits MM, Tonneijck L, Meijer RI, Joles JA, et al. Assessment of real-time and quantitative changes in renal hemodynamics in healthy overweight males: contrast-enhanced ultrasonography vs para-aminohippuric acid clearance. Microcirculation. 2019;26(7):e12580.

  22. Lambin P, Rios-Velazquez E, Leijenaar R, Carvalho S, van Stiphout RG, Granton P, et al. Radiomics: extracting more information from medical images using advanced feature analysis. Eur J Cancer. 2012;48(4):441–6.

  23. Wang K, Lu X, Zhou H, Gao Y, Zheng J, Tong M, et al. Deep learning radiomics of shear wave elastography significantly improved diagnostic performance for assessing liver fibrosis in chronic hepatitis B: a prospective multicentre study. Gut. 2019;68(4):729–41.

  24. Qian X, Pei J, Zheng H, Xie X, Yan L, Zhang H, et al. Prospective assessment of breast cancer risk from multimodal multiview ultrasound images via clinically applicable deep learning. Nat Biomed Eng. 2021;5(6):522–32.

  25. Gulshan V, Peng L, Coram M, Stumpe MC, Wu D, Narayanaswamy A, et al. Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs. JAMA. 2016;316(22):2402–10.

  26. Ma QP, He XL, Li K, Wang JF, Zeng QJ, Xu EJ, et al. Dynamic contrast-enhanced ultrasound radiomics for hepatocellular carcinoma recurrence prediction after thermal ablation. Mol Imaging Biol. 2021;23(4):572–85.

  27. Gu J, Tong T, He C, Xu M, Yang X, Tian J, et al. Deep learning radiomics of ultrasonography can predict response to neoadjuvant chemotherapy in breast cancer at an early stage of treatment: a prospective study. Eur Radiol. 2021. Online ahead of print.

  28. Deng Y, Ming B, Zhou T, Wu JL, Chen Y, Liu P, et al. Radiomics model based on MR images to discriminate pancreatic ductal adenocarcinoma and mass-forming chronic pancreatitis lesions. Front Oncol. 2021;11:620981.

  29. Ren S, Zhao R, Zhang J, Guo K, Gu X, Duan S, et al. Diagnostic accuracy of unenhanced CT texture analysis to differentiate mass-forming pancreatitis from pancreatic ductal adenocarcinoma. Abdom Radiol (NY). 2020;45(5):1524–33.

  30. Tonozuka R, Itoi T, Nagata N, Kojima H, Sofuni A, Tsuchiya T, et al. Deep learning analysis for the detection of pancreatic cancer on endosonographic images: a pilot study. J Hepatobiliary Pancreat Sci. 2021;28(1):95–104.

  31. Wang Y, Yan K, Fan Z, Ding K, Yin S, Dai Y, et al. Clinical value of contrast-enhanced ultrasound enhancement patterns for differentiating focal pancreatitis from pancreatic carcinoma: a comparison study with conventional ultrasound. J Ultrasound Med. 2018;37(3):551–9.

  32. Dietrich CF, Braden B, Hocke M, Ott M, Ignee A. Improved characterisation of solitary solid pancreatic tumours using contrast enhanced transabdominal ultrasound. J Cancer Res Clin Oncol. 2008;134(6):635–43.

  33. Wada K. Labelme: image polygonal annotation with Python. GitHub repository. 2016.

  34. He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. arXiv preprint arXiv:1512.03385. 2015.

  35. Szegedy C, Vanhoucke V, Ioffe S, Shlens J, Wojna Z. Rethinking the inception architecture for computer vision. arXiv preprint arXiv:1512.00567. 2015.

  36. Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556. 2014.

  37. Huang G, Liu Z, van der Maaten L, Weinberger KQ. Densely connected convolutional networks. arXiv preprint arXiv:1608.06993. 2016.

  38. Russakovsky O, Deng J, Su H, Krause J, Satheesh S, Ma S, et al. ImageNet large scale visual recognition challenge. Int J Comput Vis. 2015;115(3):211–52.

  39. Kingma DP, Ba J. Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980. 2014.

  40. Selvaraju RR, Cogswell M, Das A, Vedantam R, Parikh D, Batra D. Grad-CAM: visual explanations from deep networks via gradient-based localization. In: Proceedings of the IEEE International Conference on Computer Vision; 2017. p. 618–26.

  41. Guo X, Liu Z, Sun C, Zhang L, Wang Y, Li Z, et al. Deep learning radiomics of ultrasonography: identifying the risk of axillary non-sentinel lymph node involvement in primary breast cancer. EBioMedicine. 2020;60:103018.

  42. Norton ID, Zheng Y, Wiersema MS, Greenleaf J, Clain JE, Dimagno EP. Neural network analysis of EUS images to differentiate between pancreatic malignancy and pancreatitis. Gastrointest Endosc. 2001;54(5):625–9.

  43. Li XZ, Song J, Sun ZX, Yang YY, Wang H. Diagnostic performance of contrast-enhanced ultrasound for pancreatic neoplasms: a systematic review and meta-analysis. Dig Liver Dis. 2018;50(2):132–8.

  44. Ran L, Zhao W, Zhao Y, Bu H. Value of contrast-enhanced ultrasound in differential diagnosis of solid lesions of pancreas (SLP): a systematic review and a meta-analysis. Medicine (Baltimore). 2017;96(28):e7463.

  45. Vitali F, Pfeifer L, Janson C, Goertz RS, Neurath MF, Strobel D, et al. Quantitative perfusion analysis in pancreatic contrast enhanced ultrasound (DCE-US): a promising tool for the differentiation between autoimmune pancreatitis and pancreatic cancer. Z Gastroenterol. 2015;53(10):1175–81.

  46. Che H, Li J, Li Y, Ma C, Liu H, Qin J, et al. p16 deficiency attenuates intervertebral disc degeneration by adjusting oxidative stress and nucleus pulposus cell cycle. eLife. 2020;9:52570.

  47. Bronstein YL, Loyer EM, Kaur H, Choi H, David C, DuBrow RA, et al. Detection of small pancreatic tumors with multiphasic helical CT. AJR Am J Roentgenol. 2004;182(3):619–23.

  48. Yoon SH, Lee JM, Cho JY, Lee KB, Kim JE, Moon SK, et al. Small (≤ 20 mm) pancreatic adenocarcinomas: analysis of enhancement patterns and secondary signs with multiphasic multidetector CT. Radiology. 2011;259(2):442–52.

  49. Castelvecchi D. Can we open the black box of AI? Nature. 2016;538(7623):20–3.

  50. Wang S, Liu Z, Rong Y, Zhou B, Bai Y, Wei W, et al. Deep learning provides a new computed tomography-based prognostic biomarker for recurrence prediction in high-grade serous ovarian cancer. Radiother Oncol. 2019;132:171–7.

  51. Ashida R, Tanaka S, Yamanaka H, Okagaki S, Nakao K, Fukuda J, et al. The role of transabdominal ultrasound in the diagnosis of early stage pancreatic cancer: review and single-center experience. Diagnostics (Basel). 2018;9(1):2.

  52. Tanaka S, Nakaizumi A, Ioka T, Takakura R, Uehara H, Nakao M, et al. Periodic ultrasonography checkup for the early detection of pancreatic cancer: preliminary report. Pancreas. 2004;28(3):268–72.

  53. Tanaka S, Nakaizumi A, Ioka T, Oshikawa O, Uehara H, Nakao M, et al. Main pancreatic duct dilatation: a sign of high risk for pancreatic cancer. Jpn J Clin Oncol. 2002;32(10):407–11.

  54. Tanaka S, Nakao M, Ioka T, Takakura R, Takano Y, Tsukuma H, et al. Slight dilatation of the main pancreatic duct and presence of pancreatic cysts as predictive signs of pancreatic cancer: a prospective study. Radiology. 2010;254(3):965–72.

  55. Dong Y, D'Onofrio M, Hocke M, Jenssen C, Potthoff A, Atkinson N, et al. Autoimmune pancreatitis: imaging features. Endosc Ultrasound. 2018;7(3):196–203.

  56. Hocke M, Ignee A, Dietrich CF. Contrast-enhanced endoscopic ultrasound in the diagnosis of autoimmune pancreatitis. Endoscopy. 2011;43(2):163–5.

  57. Yamashita Y, Kato J, Ueda K, Nakamura Y, Kawaji Y, Abe H, et al. Contrast-enhanced endoscopic ultrasonography for pancreatic tumors. Biomed Res Int. 2015;2015:491782.

  58. Ardelean M, Şirli R, Sporea I, Bota S, Martie A, Popescu A, et al. Contrast enhanced ultrasound in the pathology of the pancreas - a monocentric experience. Med Ultrason. 2014;16(4):325–31.

  59. Fan Z, Li Y, Yan K, Wu W, Yin S, Yang W, et al. Application of contrast-enhanced ultrasound in the diagnosis of solid pancreatic lesions--a comparison of conventional ultrasound and contrast-enhanced CT. Eur J Radiol. 2013;82(9):1385–90.

Acknowledgements

Not applicable.

Funding

This study was supported by the Ministry of Science and Technology of China under Grant No. 2017YFA0205200, the National Key R&D Program of China under Grant No. 2018YFC0114900, the Development Project of National Major Scientific Research Instrument No. 82027803, the National Natural Science Foundation of China under Grant Nos. 62027901, 81930053, 81227901, 82027803, 81971623, and 82171937, the Chinese Academy of Sciences under Grant Nos. YJKYYQ20180048 and QYZDJ-SSW-JSC005, the Zhejiang Provincial Association Project for Mathematical Medicine No. LSY19H180015, the Youth Innovation Promotion Association CAS, and the Project of High-Level Talents Team Introduction in Zhuhai City.

Author information

Authors and Affiliations

Authors

Contributions

Conceiving the study and design: T.T. and J.G. Expert radiologist reads: Q.Z., D.X., Z.Y., and S.T. Collection and curation of the clinical datasets: J.G., L.S., and F.C. Development, training, validation, and artistic representation of neural networks: T.T., J.G., and K.W. Data analysis and interpretation: T.T., J.G., and X.Y. Drafting of the manuscript: T.T., J.G., K.W., J.T., and T.J. Critical analysis and manuscript revision: all authors. The authors read and approved the final manuscript.

Corresponding authors

Correspondence to Jie Tian, Kun Wang or Tian’an Jiang.

Ethics declarations

Ethics approval and consent to participate

This study was approved by the ethics committee of each participating hospital. The requirement for informed consent was waived. This study followed the Standards for Reporting of Diagnostic Accuracy (STARD) guideline for diagnostic studies.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1: Method S1.

Detailed training process of our DLR model. Method S2. Details of the two-round reader study. Figure S1. One example of the raw CEUS image generated from the US device. Figure S2. Resized color and grayscale CEUS ROI images extracted from raw CEUS images generated by different US devices. Figure S3. Performance of different deep learning backbones on training and validation cohorts. Figure S4. Confusion matrices for the comprehensive results from five readers with and without DLR assistance and the DLR model on internal and external validation cohorts. Figure S5. Confusion matrices for Readers 1–5 without DLR assistance on internal and external validation cohorts. Figure S6. Confusion matrices for Readers 1–5 with DLR assistance on internal and external validation cohorts. Figure S7. Histogram representing the PDAC score output from the DLR model on CP and PDAC lesions. Table S1. The detailed architecture of our DLR model. Table S2. Sensitivity and specificity comparison between the diagnoses from the DLR model and that of each reader in the validation cohorts.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Tong, T., Gu, J., Xu, D. et al. Deep learning radiomics based on contrast-enhanced ultrasound images for assisted diagnosis of pancreatic ductal adenocarcinoma and chronic pancreatitis. BMC Med 20, 74 (2022). https://doi.org/10.1186/s12916-022-02258-8


Keywords