Deep learning radiomics based on contrast-enhanced ultrasound images for assisted diagnosis of pancreatic ductal adenocarcinoma and chronic pancreatitis
BMC Medicine volume 20, Article number: 74 (2022)
Accurate and non-invasive diagnosis of pancreatic ductal adenocarcinoma (PDAC) and chronic pancreatitis (CP) can avoid unnecessary puncture and surgery. This study aimed to develop a deep learning radiomics (DLR) model based on contrast-enhanced ultrasound (CEUS) images to assist radiologists in identifying PDAC and CP.
Patients with PDAC or CP were retrospectively enrolled from three hospitals. Detailed clinicopathological data were collected for each patient. Diagnoses were confirmed pathologically using biopsy or surgery in all patients. We developed an end-to-end DLR model for diagnosing PDAC and CP using CEUS images. To verify the clinical application value of the DLR model, two rounds of reader studies were performed.
A total of 558 patients with pancreatic lesions were enrolled and were split into the training cohort (n=351), internal validation cohort (n=109), and external validation cohorts 1 (n=50) and 2 (n=48). The DLR model achieved an area under curve (AUC) of 0.986 (95% CI 0.975–0.994), 0.978 (95% CI 0.950–0.996), 0.967 (95% CI 0.917–1.000), and 0.953 (95% CI 0.877–1.000) in the training, internal validation, and external validation cohorts 1 and 2, respectively. The sensitivity and specificity of the DLR model were higher than or comparable to the diagnoses of the five radiologists in the three validation cohorts. With the aid of the DLR model, the diagnostic sensitivity of all radiologists was further improved at the expense of a small or no decrease in specificity in the three validation cohorts.
The findings of this study suggest that our DLR model can be used as an effective tool to assist radiologists in the diagnosis of PDAC and CP.
According to Global Cancer Statistics 2020, pancreatic cancer is the seventh leading cause of cancer-related death, with a five-year survival rate of less than 10% [1, 2]. Approximately 85–95% of pancreatic cancer patients have pancreatic ductal adenocarcinoma (PDAC) [3, 4]. Previous studies have shown that pancreatic cancer occurs more frequently in European and North American countries. The etiology is mainly attributed to genetic and environmental factors, especially diet and lifestyle, as well as a combination of factors such as obesity combined with smoking and alcohol [5, 6]. The poor prognosis in pancreatic cancer is due to a late diagnosis or misdiagnosis resulting from an overlap of symptoms with other conditions, such as chronic pancreatitis (CP) [7,8,9,10].
Imaging methods used in PDAC diagnosis include ultrasound (US), multidetector computed tomography (MDCT), magnetic resonance imaging (MRI), and positron emission tomography-computed tomography (PET-CT). Among them, contrast-enhanced ultrasound (CEUS) is convenient, poses no risk of radiation, and provides excellent spatial and temporal resolution to display microcirculatory perfusion of the pancreatic mass with parenchyma [11,12,13,14,15,16]. Moreover, studies have shown that PDAC can be distinguished from CP by comparing the enhancement intensity of the lesion to the pancreatic parenchyma during the venous phase [17,18,19]. However, the diagnostic performance of CEUS is largely dependent on the experience of radiologists. Furthermore, subjective imaging features and persistent inter- and intra-observer variability remain challenging factors in the interpretation of CEUS images [20, 21]. At present, there are few human experts who can consistently diagnose pancreatic disorders based on CEUS.
Radiomics is a method that extracts high-throughput quantitative features from medical images, primarily using one of two analytical strategies in artificial intelligence (AI): machine learning or deep learning [22,23,24,25,26,27]. The feasibility of radiomics in the diagnosis of PDAC has been demonstrated using MRI, computed tomography (CT), and endoscopic ultrasonography (EUS) images. Deng et al. [28] proposed a multi-parameter MRI radiomics model based on 119 patients, with a best area under the curve (AUC) of 0.902 in the validation cohort, to distinguish PDAC from CP. Ren et al. [29] verified the ability of texture analysis on unenhanced CT to distinguish PDAC from CP, with a best accuracy of 0.821. Tonozuka et al. [30] analyzed EUS images of 139 patients to distinguish among PDAC, CP, and normal pancreas; the proposed deep learning radiomics (DLR) model achieved AUCs of 0.924 and 0.940 in the validation and test cohorts, respectively. Although these studies show that radiomics models can achieve good performance in the identification of PDAC and CP, several common limitations remain unaddressed. First, machine learning-based radiomics studies require labor-intensive and time-consuming lesion delineation, which is inevitably affected by inter- and intra-operator variability, especially in US images with unclear boundary definition. Second, these studies did not investigate the actual benefits of using radiomics in real diagnostic scenarios for radiologists. Third, the feasibility of radiomics using CEUS imaging in diagnosing PDAC remains unverified.
This study was designed considering these limitations and aimed to (1) develop a DLR model for the automatic and accurate diagnosis of PDAC and CP using CEUS images and (2) validate the applicability of the DLR model as an effective tool to assist radiologists in the diagnosis of PDAC and CP. Additionally, the effect of this DLR model in assisting radiologists in decision-making was measured to assess its real clinical benefits. A two-round reader study with five radiologists was conducted to compare the diagnostic performance between the model and radiologists. More importantly, the ability of the model to assist different radiologists in identifying PDAC and CP was investigated, which demonstrated its potential usefulness in real clinical practice.
This retrospective multicenter study was conducted using data from three hospitals in China (Hospital 1: First Affiliated Hospital, Zhejiang University School of Medicine; Hospital 2: Cancer Hospital of the University of Chinese Academy of Sciences; Hospital 3: West China Hospital, Sichuan University). It was conducted in accordance with the Declaration of Helsinki and approved by the ethics committee of each participating hospital. The requirement for informed consent was waived owing to the retrospective study design. This study followed the Standards for Reporting of Diagnostic Accuracy (STARD) guidelines for diagnostic studies.
The inclusion criteria were (I) patients with pathologically confirmed CP (followed up for at least 6 months without progression to pancreatic cancer) or PDAC without distant metastasis, (II) patients whose CEUS examination was performed within three days before biopsy or surgery, and (III) availability of CEUS video or CEUS images. The exclusion criteria were (I) multiple lesions in the pancreas, (II) a history of pancreatic surgery or chemotherapy, and (III) inadequate CEUS image quality. All histopathological findings were confirmed by pathologists with more than 10 years of experience in pancreatic pathology.
Data derived from Hospital 1 with the largest number of enrolled patients were used as the primary cohort to reduce overfitting or bias in the analysis. In this study, patients of Hospital 1 were enrolled between January 2020 and April 2021. We selected the patients admitted in 2021 as the internal validation cohort and the patients admitted in 2020 as the training cohort. Data from Hospitals 2 and 3 were used as independent external validation cohorts. The detailed research process is illustrated in Fig. 1. Baseline characteristics including age, sex, lesion location and size, histological type, and carbohydrate antigen 19-9 (CA19-9) and carcinoembryonic antigen (CEA) levels were collected from the hospital database.
Contrast-enhanced ultrasound image acquisition
Four different US devices (MyLab 90, ESAOTE, Italy; Aloka, HITACHI, Japan; LOGIQ E20, GE, USA; Resona 7, Mindray, China) equipped with an abdominal probe were used to capture the CEUS videos and/or images. Examinations were performed by one of the six radiologists with over 10 years of experience in abdominal CEUS. Before each examination, the contrast mode settings, including gain, depth, acoustic window, mechanical index, and focal zone, were adjusted. First, 2.4 mL of the contrast agent (SonoVue®; Bracco, Milan, Italy) was injected, followed by a 5-mL saline flush. The timer was started at the moment the contrast agent was injected. Subsequently, the probe was kept in a stable state for 120 s to detect the pancreatic lesion and the surrounding pancreatic parenchyma. Finally, the video was recorded in DICOM format.
In this study, only one key CEUS image of each patient was finally selected for analysis. CEUS images of pancreatic lesions were mainly divided into three phases: vascular phase (0–30 s), pancreatic phase (31–60 s), and delayed phase (61–120 s) [14, 18]. Previous studies have shown that diagnosis of PDAC and CP using CEUS is mainly based on different enhancement patterns of the lesions. Studies have confirmed that during the pancreatic phase (30–40 s), the enhancement pattern could be high enhancement, equal enhancement, or low enhancement depending on the contrast of enhancement intensity between lesions and pancreatic parenchyma [13, 14, 31, 32]. Based on the above principles, we developed the criteria for the selection of key CEUS images. Owing to the retrospective nature of the study, dynamic CEUS video data of all patients were not completely preserved (half of the patients had no video). For maximal use of the existing data, image selection mainly included two schemes. For cases without dynamic video, 15–20 images were generally retained in the workstation during routine clinical work of CEUS examination in three participating hospitals, including important static CEUS images of three different phases. A typical static CEUS image of the pancreatic phase was selected for analysis, which showed the maximum diameter of the lesion at approximately the 35th second in duration. For cases with dynamic video, we directly selected the single frame around the 35th second in the dynamic video as a typical CEUS image for model development after preprocessing.
Region of interest extraction and preprocessing
The raw CEUS images were obtained by selecting the key frame from the CEUS videos or existing raw CEUS images extracted from the CEUS videos. Since two-dimensional (2D) grayscale US and CEUS images were displayed simultaneously in one view (Additional file 1: Fig. S1), we defined a rectangular region of interest (ROI) covering the lesion on the raw CEUS image, to eliminate the interference of irrelevant information from the image and non-lesion areas. The radiologist first determined the lesion area according to the 2D grayscale images in the raw CEUS images, following which the ROI was marked at the same location on the CEUS images. The open-source software labelme was used to label the ROI with a rectangular bounding box, and then the ROI image was cropped from the CEUS image. In principle, the ROI image included the lesion and surrounding tissues. After the ROI extraction, further preprocessing was performed to obtain the resized and grayscale ROI images for model development. All colored ROI images were converted to grayscale, considering the color difference of the CEUS images collected from different US devices (Additional file 1: Fig. S2) and the minimal correlation between the enhancement pattern and color, to improve the robustness of the DLR model for different equipment. Thus, only the distribution of the image gray values could affect the DLR model output. Finally, the grayscale ROI images were resized to 224×224 and inputted into the DLR model. The ROI extraction and preprocessing workflow is shown in Fig. 2.
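The crop, grayscale conversion, and resize steps described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the bounding-box layout `(left, top, right, bottom)`, the BT.601 luma weights, and nearest-neighbor interpolation are choices made for this example, since the article does not specify how the conversion or the resizing was implemented, and all function names are hypothetical.

```python
def crop_roi(image, box):
    """Crop a rectangular ROI from an image stored as rows of pixels.

    box is (left, top, right, bottom) with right/bottom exclusive
    (an assumed layout for this sketch).
    """
    left, top, right, bottom = box
    return [row[left:right] for row in image[top:bottom]]


def to_grayscale(rgb_image):
    """Convert RGB pixels to gray values using BT.601 luma weights (assumed)."""
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in rgb_image
    ]


def resize_nearest(gray, out_h=224, out_w=224):
    """Resize a grayscale image with nearest-neighbor sampling (assumed)."""
    in_h, in_w = len(gray), len(gray[0])
    return [
        [gray[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]


def preprocess(rgb_image, box, size=224):
    """Crop the ROI, drop color information, and resize to size x size."""
    return resize_nearest(to_grayscale(crop_roi(rgb_image, box)), size, size)
```

In practice an imaging library would handle these steps; the point of the sketch is only that color is discarded before the model sees the image, so that device-specific color maps cannot influence the output.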
Deep learning radiomics model development
The DLR model was based on the Resnet-50 backbone to extract deep learning features for classification (Fig. 2). Two fully connected layers with outputs of 512 and 2 neurons, respectively, and a softmax activation layer were placed on top of the convolutional layers to generate the AI scores for PDAC and CP. The softmax activation layer gives the AI score the meaning of a probability, ensuring that the AI scores for the PDAC and CP categories of one lesion sum to 1. A dropout layer with a probability of 0.5 was added between the two fully connected layers to alleviate overfitting. Additional file 1: Table S1 illustrates the detailed architecture of our DLR model. We also tested other typical image classification backbones, including Inception-v3, VGG-16, and Densenet-121. The performance difference between the networks was very small in every cohort (Additional file 1: Fig. S3). Because Resnet-50 achieved the highest AUC in most validation cohorts, we chose Resnet-50 as the backbone for feature extraction. The detailed training process is provided in Additional file 1: Method S1 [38, 39].
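To make the probabilistic reading of the AI score concrete, the following minimal sketch applies a softmax to the two outputs of the final fully connected layer. The class order (PDAC first, CP second) and the function names are assumptions made for illustration; the article does not state how the two output neurons were ordered.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ai_scores(logits):
    """Map the two final-layer outputs to AI scores.

    Assumes index 0 is PDAC and index 1 is CP (hypothetical ordering).
    """
    p_pdac, p_cp = softmax(logits)
    return {"PDAC": p_pdac, "CP": p_cp}
```

Because the two scores always sum to 1, reporting the PDAC score alone is sufficient, and a single threshold on that score defines the operating point.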
Two-round reader study
A two-round reader study was conducted to investigate the clinical benefits radiologists actually obtained through the assistance of the DLR model (Fig. 2). Five radiologists with an average of 9 years of CEUS experience (3–15 years) participated in this study. A total of 207 lesions (150 positives) from the internal validation cohort and the external validation cohorts 1 and 2 were presented in random order. During the whole process, the radiologists were blinded to each other, the original diagnostic reports, and the final pathology results. The details of the two-round reader study are provided in Additional file 1: Method S2.
Statistical analyses were performed using SPSS (version 23.0; IBM Corp., Armonk, NY, USA) and Python 3.7. Continuous variables were described as mean and standard deviation (SD), and categorical variables as number and percentage. Between-group comparisons were performed using the Student’s t-test or Mann–Whitney U test for quantitative variables and the chi-squared test for qualitative variables. The 95% confidence interval (CI) was calculated using bootstrapping with 2000 resamples. McNemar’s test was used to determine whether the DLR model and the radiologists differed significantly in sensitivity and specificity. All statistical tests were two-sided, with statistical significance set at P <.05.
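The two less routine procedures named above, the bootstrap CI and McNemar's test, can be sketched in plain Python as follows. Several details are assumptions for this example, because the article does not report them: a percentile bootstrap is used, resamples containing only one class are redrawn, and McNemar's test is computed in its exact binomial form from the two discordant counts. All function names are hypothetical.

```python
import random
from math import factorial


def auc(labels, scores):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_ci(labels, scores, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the AUC (percentile method is an assumption)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_resamples:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # redraw resamples that lack one of the classes
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lo = stats[int(round(alpha / 2 * n_resamples))]
    hi = stats[int(round((1 - alpha / 2) * n_resamples)) - 1]
    return lo, hi


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value.

    b and c are the discordant counts: cases one rater got right and the
    other got wrong, in each direction.
    """
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    tail = sum(factorial(n) // (factorial(i) * factorial(n - i)) for i in range(k + 1))
    return min(1.0, 2 * tail / 2 ** n)
```

For a sensitivity comparison, `b` and `c` would be counted among PDAC cases only; for specificity, among CP cases only.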
In total, 558 patients with pancreatic lesions were enrolled (Fig. 1). Pathological findings showed PDAC lesions in 414 cases and CP lesions in 144 cases. Table 1 summarizes the detailed patient demographics and pancreatic lesion characteristics.
Comparison between deep learning radiomics model and radiologists
The radiologists’ decisions from the first-round reading were compared with the DLR model. The receiver operating characteristic (ROC) curve of the DLR model, the diagnoses of each radiologist, and the average diagnostic results of all radiologists in the different cohorts are shown in Fig. 3. Our DLR model achieved a high AUC of 0.986 (95% CI 0.975–0.994), 0.978 (95% CI 0.950–0.996), 0.967 (95% CI 0.917–1.000), and 0.953 (95% CI 0.877–1.000) in the training, internal validation, and external validation cohorts 1 and 2, respectively. The sensitivity values in the internal validation cohort, external validation cohort 1, and external validation cohort 2 were 97.3% (95% CI 93.2%–100%), 87.2% (95% CI 76.3%–97.2%), and 97.4% (95% CI 91.4%–100%), and the specificity values were 83.3% (95% CI 70.0%–94.3%), 100% (95% CI 100%–100%), and 70.0% (95% CI 37.5%–100%), respectively. The sensitivity and specificity results were based on an operating point of 0.5. The confusion matrices of the DLR model are presented in Additional file 1: Fig. S4. The diagnoses of the five radiologists were either worse than or comparable to those of the model, as shown by almost no green point reaching the upper left region of the ROC curve. Furthermore, the average reader diagnoses in the three validation cohorts were located below the ROC curve of the model (Fig. 3, green crosses), revealing that our model was generally superior to the radiologists. The confusion matrices of the comprehensive diagnoses from the five readers without DLR assistance are presented in Additional file 1: Fig. S4.
For a more specific comparison, we also compared the sensitivity and specificity between the model and each radiologist. For fairness, we adjusted the operating point of the DLR model so that the specificity (sensitivity) matched the specificity (sensitivity) of each radiologist when comparing sensitivity (specificity). Since radiologists provide direct qualitative classification reports, their sensitivity and specificity are fixed, whereas the sensitivity and specificity of the DLR model can be changed by adjusting the classification threshold. Based on the above principles, we achieved a specific comparison between the diagnostic performance of the DLR model and that of the radiologists. Detailed results are shown in Additional file 1: Table S2. In the internal validation cohort, the DLR model achieved better sensitivity and specificity than all radiologists, with a significantly higher sensitivity than three out of the five radiologists (P <.05 for Reader-1, Reader-2, and Reader-5) and a significantly higher specificity than three out of the five radiologists (P <.05 for Reader-2, Reader-3, and Reader-5). In the external validation cohort 1, the DLR model also achieved better sensitivity and specificity than all radiologists, with a significantly higher sensitivity than two out of the five radiologists (P <.05 for Reader-2 and Reader-5) and significantly higher specificity than Reader-1 (P <.05). In the external validation cohort 2, the DLR model achieved better sensitivity and specificity than all radiologists, except Reader-3. It showed a significantly higher sensitivity than two out of the five radiologists (P <.05 for Reader-2 and Reader-5), but not a significantly higher specificity.
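The matched comparison described above can be illustrated with a short sketch that moves the model's threshold until its specificity reaches a reader's fixed specificity and then reads off the model's sensitivity. This is a hedged example, not the authors' procedure: with a finite number of cases an exact match may not exist, so the sketch assumes the lowest threshold whose specificity is at least the reader's value. A score at or above the threshold is treated as a PDAC call, and the function names are hypothetical.

```python
def sens_spec(labels, scores, threshold):
    """Sensitivity and specificity when scores >= threshold are called PDAC (label 1)."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def sensitivity_at_matched_specificity(labels, scores, reader_specificity):
    """Return (threshold, sensitivity, specificity) at the matched operating point.

    Scans thresholds in ascending order; specificity never decreases as the
    threshold rises, so the first threshold that reaches the reader's
    specificity is also the one with the highest remaining sensitivity.
    """
    candidates = sorted(set(scores)) + [float("inf")]
    for t in candidates:
        sens, spec = sens_spec(labels, scores, t)
        if spec >= reader_specificity:
            return t, sens, spec
```

The mirror-image comparison (matching sensitivity and comparing specificity) follows the same pattern with the scan run from the highest threshold downward.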
Enhanced diagnosis with AI assistance
The changes in the diagnoses given by the five radiologists before and after AI assistance were analyzed in the two-round reader study. Detailed changes in their decisions, sensitivity, and specificity are shown in Table 2, and the confusion matrices of each radiologist without and with AI assistance are shown in Additional file 1: Figs. S5 and S6. In the internal validation cohort, all radiologists achieved higher sensitivity, and four out of the five radiologists achieved higher specificity with AI assistance. Three and two of the five radiologists had a significant improvement in sensitivity (P <.05 for Reader-1, Reader-2, and Reader-4) and specificity (P <.05 for Reader-2 and Reader-4), respectively. In external validation cohort 1, all radiologists achieved higher sensitivity, and two out of the five radiologists achieved higher specificity with AI assistance. In external validation cohort 2, all radiologists achieved higher sensitivity, and one out of the five radiologists achieved higher specificity with AI assistance. Reader-5 had a significantly higher sensitivity than in the first-round results (P <.05). In all three validation cohorts, we found a positive effect of the DLR model in assisting radiologists to enhance their average accuracy (Fig. 3, orange points and crosses). Additionally, the confusion matrices of the comprehensive diagnoses of the five radiologists with AI assistance are given in Additional file 1: Fig. S4.
To illustrate the clinical value of our DLR model more vividly, some successful and unsuccessful examples where radiologists changed their first-round decisions due to AI assistance are shown in Figs. 4 and 5. Although AI scores and heatmaps given by the DLR model misled the radiologists’ decisions in some cases, the total scores of the five radiologists for all lesions in the validation cohorts before and after DLR assistance exhibited a clear trend of enhanced diagnostic performance (Fig. 6). The total score was calculated as follows: if a patient was identified as a PDAC case by a radiologist, one point was awarded. Therefore, for five reads, the highest score was five, and the lowest score was zero. The higher the score, the more experts believed that the lesion was PDAC. The total scores demonstrated that a systematic improvement of the diagnostic accuracy was achieved in both PDAC and CP groups for all human experts with the help of the DLR model.
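The total score described above is a simple vote count per lesion, which the following sketch makes explicit. The data layout (one list of 0/1 calls per reader, aligned by lesion, with 1 meaning PDAC) and the function name are assumptions made for illustration.

```python
def total_scores(reader_calls):
    """Sum the PDAC votes per lesion.

    reader_calls holds one list per reader; each entry is 1 if that reader
    called the lesion PDAC and 0 if CP. With five readers the result for
    each lesion ranges from 0 to 5.
    """
    return [sum(calls) for calls in zip(*reader_calls)]
```

Computing these totals once for the first-round calls and once for the second-round calls, and comparing them within the PDAC and CP groups, gives the kind of before-and-after view summarized in Fig. 6.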
Noticeably, the heatmaps generated by gradient-weighted class activation mapping for model visualization had different patterns in PDAC and CP images. More specifically, the highlighted region for PDAC cases was greater than that of CP cases in the heatmaps, and most of those regions were located inside the lesions. In contrast, highlighted regions were mainly distributed at the boundary of the lesion in CP heatmaps. Additionally, radiologists noticed that for PDAC lesions, the highlighted regions were mainly distributed in the low-enhancement area inside the tumor, frequently adjacent to a high-enhancement region. Some heatmap examples of ROI images for PDAC and CP are shown in Fig. 7.
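For readers unfamiliar with gradient-weighted class activation mapping, the sketch below shows only its final combination step: each feature map is weighted by the spatial average of the class-score gradient for that map, the weighted maps are summed, and negative values are clipped. It assumes the activations and gradients of the last convolutional layer have already been obtained (in a real pipeline they come from a forward and backward pass through the network); the peak normalization and the function name are additions for this example, not details from the article.

```python
def grad_cam(activations, gradients):
    """Combine K feature maps into one heatmap.

    activations and gradients are lists of K maps, each an H x W nested list.
    gradients[k][i][j] is the derivative of the class score with respect to
    activations[k][i][j].
    """
    h, w = len(activations[0]), len(activations[0][0])
    # Channel weights: global average of each gradient map.
    weights = [sum(sum(row) for row in g) / (h * w) for g in gradients]
    # Weighted sum over channels followed by ReLU.
    cam = [
        [
            max(0.0, sum(wk * a[i][j] for wk, a in zip(weights, activations)))
            for j in range(w)
        ]
        for i in range(h)
    ]
    # Scale to [0, 1] for display (assumed convention for this sketch).
    peak = max(max(row) for row in cam)
    if peak > 0:
        cam = [[v / peak for v in row] for row in cam]
    return cam
```

The resulting low-resolution map is normally upsampled to the input size and overlaid on the ROI image.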
In this study, we attempted for the first time to investigate the performance of CEUS-based DLR in the diagnosis of PDAC and CP. Compared with human experts, our model achieved an overall better performance in all validation cohorts. Furthermore, we demonstrated that by incorporating AI scores and heatmaps, radiologists improved their decision-making, revealing the clinical value of applying the DLR model in clinical practice. Compared with other radiomics studies, a major highlight here was the use of the two-round reader investigation with five radiologists based on multicenter data.
The performance of our DLR model based on the CEUS images was better than or comparable to that of different models using other modalities, including MDCT, MRI, PET-CT, and EUS [28,29,30, 42]. This could be due to two possible reasons. First, compared with machine learning methods used in most of these studies [28, 29], the DLR model can automatically learn the adaptive features based on a specific task (effective identification of PDAC and CP) and it is flexible. Second, the diagnostic value of CEUS for PDAC has been demonstrated in previous studies [13, 43,44,45], confirming that the enhancement pattern in the lesion area contributes to qualitative diagnosis. Thus, it may contribute more to quantitative diagnosis.
Our DLR model achieved significantly higher, higher, or comparable sensitivity and specificity compared with the five radiologists in our first-round reader study. Although radiologists can identify lesions based on enhancement patterns, PDAC and CP may be difficult to distinguish when they exhibit similar CEUS enhancement patterns, mainly due to the presence of abundant fibrous tissue within PDAC lesions or necrosis within CP lesions. The DLR model can further learn and use high-level abstract features that are unrecognizable to humans to identify PDAC and CP, thus surpassing the diagnostic performance of human experts [24, 46,47,48].
Furthermore, we explored the benefits that radiologists actually obtained from the DLR assistance in clinical practice. We believe this is particularly important because DLR models will play a supporting role in the foreseeable future. Although AI and radiomics models have their advantages, human experts will still make the final decision. One major reason is that the interpretability of deep learning features is still in its infancy [49, 50], and the biological mechanisms behind these radiomics features remain underexplored. However, this should not stop radiologists from utilizing radiomics methods to enhance their diagnosis. In our design, AI scores alerted radiologists to patients for whom their diagnosis differed from the quantitative computer analysis. Heatmaps offered extra information by guiding their attention to the highlighted areas in the CEUS images, so that they could re-examine the images more efficiently and decide whether to revise their decision. With this assisting strategy in the second-round image reading, human experts showed an overall increase in sensitivity to PDAC assessment with little or no loss of specificity.
Investigating the AI scores and heatmaps more thoroughly helps explain how they assisted radiologists effectively. The AI score can be regarded as the predicted probability of PDAC and CP by the DLR model. As can be seen from the frequency distribution histogram in Additional file 1: Fig. S7, our DLR model provided a large ratio of extreme AI scores (e.g., greater than 0.9 for PDAC and less than 0.1 for CP lesions). As shown in Figs. 4 and 5, when the model provided an extreme score and strongly suggested that the lesion was PDAC or CP, the AI score itself served as a strong indicator signal to the radiologists. The small ratio of ambiguous AI scores certainly helped with this “alarm” effect. Furthermore, heatmaps generated by the DLR model reflected different patterns in the PDAC and CP lesions. For PDAC lesions, the highlighted areas were more concentrated in the low-enhancement region adjacent to the high-enhancement area within the tumor, likely because the DLR model learned key features from low-enhancement patterns related to lower microvascular density, abundant fibrous tissue, and large amounts of necrotic tissue [51,52,53,54]. For CP lesions, since the model did not find important features pointing towards PDAC, the highlighted area was relatively small and mainly distributed at the boundary of the ROI [55,56,57,58,59]. Therefore, the “alarm” effect and interpretable heatmap patterns together helped radiologists achieve real diagnostic benefits.
Another potential clinical value of the DLR model is that it may help junior radiologists more effectively. Although all radiologists obtained positive assistance from the model, Reader-5, the junior radiologist, benefited the most. Therefore, this approach holds the potential to steepen the learning curve of radiologists with less experience.
Our study had several limitations. First, although this was a multicenter study, the dataset was not large, especially for the external validation cohort. Second, owing to the retrospective nature of the study, we did not use CEUS videos, which probably weakened the performance of the DLR strategy [14, 18]. Nevertheless, the strong performance of our model was sufficient to show that the use of static CEUS images provided effective clinical assistance.
A DLR model for the diagnosis of PDAC and CP was developed from a multicenter retrospective dataset based on CEUS images. Further, a two-round reader study demonstrated that the model was effective in assisting radiologists to improve diagnosis.
Availability of data and materials
The datasets analyzed during the current study are not publicly available due to the metadata containing information that could compromise the patients but are available from the corresponding author on reasonable request.
Abbreviations
PDAC: Pancreatic ductal adenocarcinoma
CP: Chronic pancreatitis
DLR: Deep learning radiomics
AUC: Area under the curve
MDCT: Multidetector computed tomography
MRI: Magnetic resonance imaging
PET-CT: Positron emission tomography-computed tomography
STARD: Standards for Reporting of Diagnostic Accuracy
CA19-9: Carbohydrate antigen 19-9
ROC: Receiver operating characteristic
ROI: Region of interest
Sung H, Ferlay J, Siegel RL, Laversanne M, Soerjomataram I, Jemal A, et al. Global Cancer Statistics 2020: GLOBOCAN Estimates of Incidence and Mortality Worldwide for 36 Cancers in 185 Countries. CA Cancer J Clin. 2021;71(3):209–49.
Rahib L, Smith BD, Aizenberg R, Rosenzweig AB, Fleshman JM, Matrisian LM. Projecting cancer incidence and deaths to 2030: the unexpected burden of thyroid, liver, and pancreas cancers in the United States. Cancer Res. 2014;74(11):2913–21.
Hariharan D, Saied A, Kocher HM. Analysis of mortality rates for pancreatic cancer across the world. HPB (Oxford). 2008;10(1):58–62.
Brown ZJ, Cloyd JM. Surgery for pancreatic cancer: recent progress and future directions. Hepatobiliary Surg Nutr. 2021;10(3):376–8.
Huang J, Lok V, Ngai CH, Zhang L, Yuan J, Lao XQ, et al. Worldwide Burden of, Risk Factors for, and Trends in Pancreatic Cancer. Gastroenterology. 2021;160(3):744–54.
Hensrud DD, Heimburger DC. Diet, nutrients, and gastrointestinal cancer. Gastroenterol Clin North Am. 1998;27(2):325–46.
Chen R, Pan S, Cooke K, Moyes KW, Bronner MP, Goodlett DR, et al. Comparison of pancreas juice proteins from cancer versus pancreatitis using quantitative proteomic analysis. Pancreas. 2007;34(1):70–9.
Lowenfels AB, Maisonneuve P, Cavallini G, Ammann RW, Lankisch PG, Andersen JR, et al. Pancreatitis and the risk of pancreatic cancer. International Pancreatitis Study Group. N Engl J Med. 1993;328(20):1433–7.
Malka D, Hammel P, Maire F, Rufat P, Madeira I, Pessione F, et al. Risk of pancreatic adenocarcinoma in chronic pancreatitis. Gut. 2002;51(6):849–52.
Ferlay J, Partensky C, Bray F. More deaths from pancreatic cancer than breast cancer in the EU by 2017. Acta Oncol. 2016;55(9-10):1158–60.
D'Onofrio M, Barbi E, Dietrich CF, Kitano M, Numata K, Sofuni A, et al. Pancreatic multicenter ultrasound study (PAMUS). Eur J Radiol. 2012;81(4):630–8.
Ozawa Y, Numata K, Tanaka K, Ueno N, Kiba T, Hara K, et al. Contrast-enhanced sonography of small pancreatic mass lesions. J Ultrasound Med. 2002;21(9):983–91.
Grossjohann HS, Rappeport ED, Jensen C, Svendsen LB, Hillingsø JG, Hansen CP, et al. Usefulness of contrast-enhanced transabdominal ultrasound for tumor classification and tumor staging in the pancreatic head. Scand J Gastroenterol. 2010;45(7-8):917–24.
Tanaka S, Fukuda J, Nakao M, Ioka T, Ashida R, Takakura R, et al. Effectiveness of contrast-enhanced ultrasonography for the characterization of small and early stage pancreatic adenocarcinoma. Ultrasound Med Biol. 2020;46(9):2245–53.
Kobayashi A, Yamaguchi T, Ishihara T, Tadenuma H, Nakamura K, Saisho H. Evaluation of vascular signal in pancreatic ductal carcinoma using contrast enhanced ultrasonography: effect of systemic chemotherapy. Gut. 2005;54(7):1047.
Piscaglia F, Bolondi L. The safety of Sonovue in abdominal applications: retrospective analysis of 23188 investigations. Ultrasound Med Biol. 2006;32(9):1369–75.
D'Onofrio M, Crosara S, Signorini M, De Robertis R, Canestrini S, Principe F, et al. Comparison between CT and CEUS in the diagnosis of pancreatic adenocarcinoma. Ultraschall Med. 2013;34(4):377–81.
Xu J, Zhang M, Cheng G. Comparison between B-mode ultrasonography and contrast-enhanced ultrasonography for the surveillance of early stage pancreatic cancer: a retrospective study. J Gastrointest Oncol. 2020;11(5):1090–7.
Takeshima K, Kumada T, Toyoda H, Kiriyama S, Tanikawa M, Ichikawa H, et al. Comparison of IV contrast-enhanced sonography and histopathology of pancreatic cancer. AJR Am J Roentgenol. 2005;185(5):1193–200.
Ryu SW, Bok GH, Jang JY, Jeong SW, Ham NS, Kim JH, et al. Clinically useful diagnostic tool of contrast enhanced ultrasonography for focal liver masses: comparison to computed tomography and magnetic resonance imaging. Gut Liver. 2014;8(3):292–7.
Muskiet MHA, Emanuel AL, Smits MM, Tonneijck L, Meijer RI, Joles JA, et al. Assessment of real-time and quantitative changes in renal hemodynamics in healthy overweight males: Contrast-enhanced ultrasonography vs para-aminohippuric acid clearance. Microcirculation. 2019;26(7):e12580.
Lambin P, Rios-Velazquez E, Leijenaar R, Carvalho S, van Stiphout RG, Granton P, et al. Radiomics: extracting more information from medical images using advanced feature analysis. Eur J Cancer. 2012;48(4):441–6.
Wang K, Lu X, Zhou H, Gao Y, Zheng J, Tong M, et al. Deep learning Radiomics of shear wave elastography significantly improved diagnostic performance for assessing liver fibrosis in chronic hepatitis B: a prospective multicentre study. Gut. 2019;68(4):729–41.
Qian X, Pei J, Zheng H, Xie X, Yan L, Zhang H, et al. Prospective assessment of breast cancer risk from multimodal multiview ultrasound images via clinically applicable deep learning. Nat Biomed Eng. 2021;5(6):522–32.
Gulshan V, Peng L, Coram M, Stumpe MC, Wu D, Narayanaswamy A, et al. Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs. JAMA. 2016;316(22):2402–10.
Ma QP, He XL, Li K, Wang JF, Zeng QJ, Xu EJ, et al. Dynamic contrast-enhanced ultrasound radiomics for hepatocellular carcinoma recurrence prediction after thermal ablation. Mol Imaging Biol. 2021;23(4):572–85.
Gu J, Tong T, He C, Xu M, Yang X, Tian J, et al. Deep learning radiomics of ultrasonography can predict response to neoadjuvant chemotherapy in breast cancer at an early stage of treatment: a prospective study. Eur Radiol. 2021. Online ahead of print.
Deng Y, Ming B, Zhou T, Wu JL, Chen Y, Liu P, et al. Radiomics model based on MR images to discriminate pancreatic ductal adenocarcinoma and mass-forming chronic pancreatitis lesions. Front Oncol. 2021;11:620981.
Ren S, Zhao R, Zhang J, Guo K, Gu X, Duan S, et al. Diagnostic accuracy of unenhanced CT texture analysis to differentiate mass-forming pancreatitis from pancreatic ductal adenocarcinoma. Abdom Radiol (NY). 2020;45(5):1524–33.
Tonozuka R, Itoi T, Nagata N, Kojima H, Sofuni A, Tsuchiya T, et al. Deep learning analysis for the detection of pancreatic cancer on endosonographic images: a pilot study. J Hepatobiliary Pancreat Sci. 2021;28(1):95–104.
Wang Y, Yan K, Fan Z, Ding K, Yin S, Dai Y, et al. Clinical value of contrast-enhanced ultrasound enhancement patterns for differentiating focal pancreatitis from pancreatic carcinoma: a comparison study with conventional ultrasound. J Ultrasound Med. 2018;37(3):551–9.
Dietrich CF, Braden B, Hocke M, Ott M, Ignee A. Improved characterisation of solitary solid pancreatic tumours using contrast enhanced transabdominal ultrasound. J Cancer Res Clin Oncol. 2008;134(6):635–43.
Wada K. Labelme: Image polygonal annotation with Python. GitHub repository. 2016.
He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. arXiv preprint arXiv:1512.03385. 2015.
Szegedy C, Vanhoucke V, Ioffe S, Shlens J, Wojna Z. Rethinking the inception architecture for computer vision. arXiv preprint arXiv:1512.00567. 2015.
Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556. 2014.
Huang G, Liu Z, van der Maaten L, Weinberger KQ. Densely connected convolutional networks. arXiv preprint arXiv:1608.06993. 2016.
Russakovsky O, Deng J, Su H, Krause J, Satheesh S, Ma S, et al. ImageNet large scale visual recognition challenge. Int J Comput Vis. 2015;115(3):211–52.
Kingma DP, Ba J. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980. 2014.
Selvaraju RR, Cogswell M, Das A, Vedantam R, Parikh D, Batra D. Grad-CAM: visual explanations from deep networks via gradient-based localization. In: Proceedings of the IEEE International Conference on Computer Vision; 2017. p. 618–26.
Guo X, Liu Z, Sun C, Zhang L, Wang Y, Li Z, et al. Deep learning radiomics of ultrasonography: identifying the risk of axillary non-sentinel lymph node involvement in primary breast cancer. EBioMedicine. 2020;60:103018.
Norton ID, Zheng Y, Wiersema MS, Greenleaf J, Clain JE, Dimagno EP. Neural network analysis of EUS images to differentiate between pancreatic malignancy and pancreatitis. Gastrointest Endosc. 2001;54(5):625–9.
Li XZ, Song J, Sun ZX, Yang YY, Wang H. Diagnostic performance of contrast-enhanced ultrasound for pancreatic neoplasms: a systematic review and meta-analysis. Dig Liver Dis. 2018;50(2):132–8.
Ran L, Zhao W, Zhao Y, Bu H. Value of contrast-enhanced ultrasound in differential diagnosis of solid lesions of pancreas (SLP): a systematic review and a meta-analysis. Medicine (Baltimore). 2017;96(28):e7463.
Vitali F, Pfeifer L, Janson C, Goertz RS, Neurath MF, Strobel D, et al. Quantitative perfusion analysis in pancreatic contrast enhanced ultrasound (DCE-US): a promising tool for the differentiation between autoimmune pancreatitis and pancreatic cancer. Z Gastroenterol. 2015;53(10):1175–81.
Che H, Li J, Li Y, Ma C, Liu H, Qin J, et al. p16 deficiency attenuates intervertebral disc degeneration by adjusting oxidative stress and nucleus pulposus cell cycle. Elife. 2020;9:52570.
Bronstein YL, Loyer EM, Kaur H, Choi H, David C, DuBrow RA, et al. Detection of small pancreatic tumors with multiphasic helical CT. AJR Am J Roentgenol. 2004;182(3):619–23.
Yoon SH, Lee JM, Cho JY, Lee KB, Kim JE, Moon SK, et al. Small (≤ 20 mm) pancreatic adenocarcinomas: analysis of enhancement patterns and secondary signs with multiphasic multidetector CT. Radiology. 2011;259(2):442–52.
Castelvecchi D. Can we open the black box of AI? Nature. 2016;538(7623):20–3.
Wang S, Liu Z, Rong Y, Zhou B, Bai Y, Wei W, et al. Deep learning provides a new computed tomography-based prognostic biomarker for recurrence prediction in high-grade serous ovarian cancer. Radiother Oncol. 2019;132:171–7.
Ashida R, Tanaka S, Yamanaka H, Okagaki S, Nakao K, Fukuda J, et al. The role of transabdominal ultrasound in the diagnosis of early stage pancreatic cancer: review and single-center experience. Diagnostics (Basel). 2018;9(1):2.
Tanaka S, Nakaizumi A, Ioka T, Takakura R, Uehara H, Nakao M, et al. Periodic ultrasonography checkup for the early detection of pancreatic cancer: preliminary report. Pancreas. 2004;28(3):268–72.
Tanaka S, Nakaizumi A, Ioka T, Oshikawa O, Uehara H, Nakao M, et al. Main pancreatic duct dilatation: a sign of high risk for pancreatic cancer. Jpn J Clin Oncol. 2002;32(10):407–11.
Tanaka S, Nakao M, Ioka T, Takakura R, Takano Y, Tsukuma H, et al. Slight dilatation of the main pancreatic duct and presence of pancreatic cysts as predictive signs of pancreatic cancer: a prospective study. Radiology. 2010;254(3):965–72.
Dong Y, D'Onofrio M, Hocke M, Jenssen C, Potthoff A, Atkinson N, et al. Autoimmune pancreatitis: imaging features. Endosc Ultrasound. 2018;7(3):196–203.
Hocke M, Ignee A, Dietrich CF. Contrast-enhanced endoscopic ultrasound in the diagnosis of autoimmune pancreatitis. Endoscopy. 2011;43(2):163–5.
Yamashita Y, Kato J, Ueda K, Nakamura Y, Kawaji Y, Abe H, et al. Contrast-enhanced endoscopic ultrasonography for pancreatic tumors. Biomed Res Int. 2015;2015:491782.
Ardelean M, Şirli R, Sporea I, Bota S, Martie A, Popescu A, et al. Contrast enhanced ultrasound in the pathology of the pancreas - a monocentric experience. Med Ultrason. 2014;16(4):325–31.
Fan Z, Li Y, Yan K, Wu W, Yin S, Yang W, et al. Application of contrast-enhanced ultrasound in the diagnosis of solid pancreatic lesions: a comparison of conventional ultrasound and contrast-enhanced CT. Eur J Radiol. 2013;82(9):1385–90.
This study was supported by the Ministry of Science and Technology of China under Grant No. 2017YFA0205200, the National Key R&D Program of China under Grant No. 2018YFC0114900, the Development Project of National Major Scientific Research Instrument No. 82027803, the National Natural Science Foundation of China under Grant Nos. 62027901, 81930053, 81227901, 82027803, and 81971623, the National Natural Science Foundation of China No. 82171937, the Chinese Academy of Sciences under Grant Nos. YJKYYQ20180048 and QYZDJ-SSW-JSC005, Zhejiang Provincial Association Project for Mathematical Medicine No. LSY19H180015, the Youth Innovation Promotion Association CAS, and the Project of High-Level Talents Team Introduction in Zhuhai City.
Ethics approval and consent to participate
This study was approved by the ethics committee of each participating hospital. The requirement for informed consent was waived. This study followed the Standards for Reporting of Diagnostic Accuracy (STARD) guideline for diagnostic studies.
Consent for publication
Competing interests
The authors declare that they have no competing interests.
Method S1. Detailed training process of our DLR model. Method S2. Details of the two-round reader study. Figure S1. One example of the raw CEUS image generated from the US device. Figure S2. Resized color and grayscale CEUS ROI images extracted from raw CEUS images generated by different US devices. Figure S3. Performance of different deep learning backbones on training and validation cohorts. Figure S4. Confusion matrices for the comprehensive results from five readers with and without DLR assistance and the DLR model on internal and external validation cohorts. Figure S5. Confusion matrices for Readers 1–5 without DLR assistance on internal and external validation cohorts. Figure S6. Confusion matrices for Readers 1–5 with DLR assistance on internal and external validation cohorts. Figure S7. Histogram representing the PDAC score output from the DLR model on CP and PDAC lesions. Table S1. The detailed architecture of our DLR model. Table S2. Sensitivity and specificity comparison between the diagnoses from the DLR model and those of each reader in the validation cohorts.
Cite this article
Tong, T., Gu, J., Xu, D. et al. Deep learning radiomics based on contrast-enhanced ultrasound images for assisted diagnosis of pancreatic ductal adenocarcinoma and chronic pancreatitis. BMC Med 20, 74 (2022). https://doi.org/10.1186/s12916-022-02258-8