  • Research article
  • Open access

Cervical lymph node metastasis prediction from papillary thyroid carcinoma US videos: a prospective multicenter study

Abstract

Background

Prediction of lymph node metastasis (LNM) is critical for individualized management of papillary thyroid carcinoma (PTC) patients to avoid unnecessary overtreatment as well as undesired under-treatment. Artificial intelligence (AI) trained by thyroid ultrasound (US) may improve prediction performance.

Methods

From September 2017 to December 2018, patients with suspicious PTC from the first medical center of the Chinese PLA general hospital were retrospectively enrolled to pre-train the multi-scale, multi-frame, and dual-direction deep learning (MMD-DL) model. From January 2019 to July 2021, PTC patients from four different centers were prospectively enrolled to fine-tune and independently validate MMD-DL. Its diagnostic performance and auxiliary effect on radiologists were analyzed in terms of receiver operating characteristic (ROC) curves, areas under the ROC curve (AUC), accuracy, sensitivity, and specificity.

Results

In total, 488 PTC patients were enrolled in the pre-training cohort, and 218 PTC patients were included for model fine-tuning (n = 109), internal test (n = 39), and external validation (n = 70). MMD-DL achieved AUCs of 0.85 (95% CI: 0.73, 0.97) and 0.81 (95% CI: 0.73, 0.89) in the test and validation cohorts, respectively, and US radiologists significantly improved their average diagnostic accuracy (57% vs. 60%, P = 0.001) and sensitivity (62% vs. 65%, P < 0.001) by using the AI model for assistance.

Conclusions

The AI model using US videos can provide accurate and reproducible prediction of cervical lymph node metastasis in papillary thyroid carcinoma patients preoperatively, and it can be used as an effective assisting tool to improve diagnostic performance of US radiologists.

Trial registration

This study was registered with the Chinese Clinical Trial Registry under the number ChiCTR1900025592.


Background

Papillary thyroid carcinoma (PTC) is the most common endocrine malignant tumor with persistently increasing incidence worldwide [1]. Lymph node metastasis (LNM) has been found in about 30–80% of PTC patients by pathologic examination [2]. It is considered a risk factor for local recurrence, distant metastases, and decreased survival rates [3, 4].

Ultrasound (US) is recommended as the first-line method to diagnose cervical LNM in PTC [5]. However, US is limited for deep structures and those acoustically shielded by air or bone, including patients with morbid obesity, poor neck extension, and remote cervical adenopathy (high level II, VII, substernal, posterior tracheal, etc.). For lateral cervical LNM, it can provide relatively reliable information to assist in surgical management [6], but for central cervical LNM 42% can be misdiagnosed [7]. Previous studies showed that clinical characteristics combined with US images had limited predictive power with the prediction AUC ranging from 71.5 to 75.8% [8, 9]. US-guided biopsy can be used to confirm the diagnosis. However, it is an invasive examination with the drawbacks of a possible inadequate specimen or misdiagnosis [5].

Therefore, prophylactic central compartment neck dissection is recommended for the detection of occult LNM [10]. It can be used to refine the prognosis and follow-up, reducing the risk of loco-regional recurrence [11] and allowing for a more tailored use of radioiodine therapy [12]. However, the related complications, including permanent recurrent laryngeal nerve injury and permanent hypoparathyroidism, may significantly affect quality of life [10]. Active surveillance and US-guided thermal ablation may be considered as alternative treatment options for low-risk papillary thyroid microcarcinoma [5, 13]. However, occult or missed LNM still exists, leading to 6.0% postoperative recurrence [14]. Therefore, accurate noninvasive prediction of LNM is critical for individualized management of PTC patients to avoid unnecessary overtreatment as well as undesired under-treatment.

Artificial intelligence (AI), especially deep learning (DL) based radiomics approaches, enables automatic and quantitative extraction of high throughput features from medical images to establish imaging markers for disease classification or prediction. Recently, AI models trained by thyroid US images have been increasingly applied to predict cervical LNM [15,16,17,18,19,20,21], but most of them are single-center retrospective studies with a relatively small sample size. Yu et al. [22] conducted a study that investigated the diagnostic value of US radiomics in multi-center, cross-machine, multi-operator conditions, and the results showed that the highest sensitivity and specificity reached 94% and 77%, respectively, in predicting LNM in PTC patients. Unfortunately, it was still based on retrospectively collected data, and a higher-level clinical trial is needed to verify the effectiveness of DL models.

Compared with static US images, real-time US videos can cover all sections of a thyroid lesion with much richer diagnostic information. DL was applied on US videos to classify benign and malignant thyroid nodules, which achieved satisfactory accuracy [23]. However, such an approach has not been used for LNM diagnosis yet. Moreover, some studies proved that US with AI could outperform skilled radiologists in diagnosing thyroid cancer [24], but whether AI could actually help radiologists to improve their prediction of LNM still remains questionable.

To address all those issues properly, we conducted a multi-center prospective clinical trial for PTC patients. A newly developed DL model was applied on dynamic US videos to predict cervical LNM preoperatively. The primary goal was to verify its performance by comparing it with the patients’ pathological report after surgery. The secondary goal was to investigate the impact of AI on the performance of radiologists with different experiences.

Methods

This study has two phases: a retrospective model pre-training phase and a prospective model fine-tuning phase. The fine-tuning phase had three steps: training, test, and validation. Figure 1 shows the structure and development process of our multi-scale, multi-frame, and dual-direction deep learning (MMD-DL) model. In both the pre-training and fine-tuning phases, patients received a thyroidectomy after US examination, and the postoperative pathological reports were used as the gold standard to determine whether the thyroid cancer was metastatic.

Fig. 1

Illustration of the multi-scale, multi-frame, and dual-direction deep learning (MMD-DL) model. a Flowchart of the training stages of MMD-DL. b Architecture of the pre-trained feature extractor. c Architecture of MMD-DL with transverse and longitudinal ultrasound videos as inputs and lymph node metastasis probability as the output

Retrospective model pre-training phase

From September 2017 to December 2018, PTC patients who underwent thyroid examinations and surgeries from the first medical center of the Chinese PLA general hospital were enrolled in this study to pre-train the DL model. Maximum transverse and longitudinal gray-scale US images were collected by radiologists with more than five years of US experience.

The inclusion criteria were (1) patients confirmed to have PTC after thyroidectomy; (2) patients who underwent thyroid US examination within 2 weeks before surgery; (3) patients who received a thyroidectomy and lymph node dissection consistent with the Chinese Guidelines [25], with the ground truth of LNM evaluated by pathology.

The exclusion criteria were (1) patients who received a biopsy before US examination; (2) insufficient US image quality or an incomplete set of US videos; (3) other pathological types of thyroid cancer, such as medullary carcinoma and undifferentiated carcinoma; (4) presence of distant metastases; and (5) patients who underwent surgery in other hospitals.

Both transverse and longitudinal US images were involved for pre-training, so that our feature extractors learned basic perception ability for diagnosing LNM.

Prospective model fine-tuning phase

Patient enrollment and sample size

The multicenter prospective study was approved by the institutional ethics committee of all involved hospitals, with the ethics committee approval number of S2019-212–06 and a clinical trial registration number of ChiCTR1900025592.

Patients with suspicious PTC from four different centers, including the first medical center of the Chinese PLA general hospital, the fourth medical center of the Chinese PLA general hospital, Beijing Tongren hospital, and China–Japan Friendship hospital, were consecutively enrolled from January 2019 to July 2021. All of the centers are located in Beijing.

All patients were operated on by surgeons with more than 15 years of experience in thyroid surgery and an annual surgical volume of more than 1000. All pathological specimens were sent to the pathology department for paraffin fixation and histological analysis by two or more experienced pathologists. Inclusion and exclusion criteria were as listed above.

We assumed that at least 30% of enrolled patients would have cervical LNM. On that basis, we calculated that no fewer than 217 patients were needed to estimate a receiver operating characteristic (ROC) curve (α: 0.05, 1-β: 0.85, width of the confidence interval: 0.125, confidence level: 0.95). Given an expected dropout rate of 20%, at least 261 patients needed to be enrolled.
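The text does not state how the 20% dropout allowance was applied to the 217 required patients. The sketch below, with a hypothetical helper name, compares two common conventions. It is only an arithmetic check and not the authors' calculation.

```python
import math

def inflate_for_dropout(n_required: int, dropout_rate: float) -> dict:
    """Return two common ways of padding a sample size for expected dropout.

    Hypothetical helper for illustration; the paper does not say which
    convention was used.
    """
    return {
        # Add dropout_rate * n on top of the required sample.
        "multiply": math.ceil(n_required * (1 + dropout_rate)),
        # Enroll enough that (1 - dropout_rate) of enrollees still reaches n.
        "divide": math.ceil(n_required / (1 - dropout_rate)),
    }

targets = inflate_for_dropout(217, 0.20)
print(targets)
```

Under these assumptions, the multiplicative convention reproduces the stated minimum of 261 patients, while the more conservative division convention gives 272.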

Clinical pathological data and US features

Clinical characteristics including age, sex, number of tumors, tumor size, location, presence of Hashimoto thyroiditis, type of thyroidectomy, type of lymph node dissection, Clinical T stage, and N stage were obtained from the patients’ medical records. Pathological T stage and lymph node metastatic results were obtained from the patients’ pathological report after surgery. The American Joint Committee on Cancer staging of thyroid cancer was applied to evaluate the TNM stage [26].

The multicenter standardized US videos were acquired with a Supersonic Aixplorer System using an S15–4 linear-array transducer (SuperSonic Imaging, France), with a center frequency of 8 MHz (ranging from 4 to 15 MHz), by radiologists with more than 6 years of experience. Patients were supine with the neck extended and the head turned toward the contralateral side. The gain was 40%, the depth was 4 cm, the frame rate was 40 Hz, and the focus was at the target depth. Dynamic collection started from the edge of one side of the thyroid lobe, sweeping evenly and slowly until it reached the other side of the lobe. The scanning direction was fixed, from top to bottom and from left to right, with no back-and-forth scanning. More details on standardized US video acquisition are given in Additional file 1: Method S1.

US features of the tumors were obtained from US examinations according to the American College of Radiology Thyroid Imaging, Reporting and Data System [27].

DL model development

DL model development is divided into two stages, as shown in Fig. 1a. In the first stage, we pre-train a feature extractor using the retrospective US images, the structure of which is shown in Fig. 1b. In the second stage, based on the pre-trained feature extractor, we build a multi-scale, multi-frame, and dual-direction deep learning (MMD-DL) model and fine-tune it. The structure of MMD-DL is shown in Fig. 1c.

In the first stage, the feature extractor adopts three networks to extract feature vectors of the US images in three scales, namely large, middle, and small. Here, ResNet18 is adopted as the network because of its popularity and resistance to overfitting. The design of the multi-scale structure helps the model to focus on the lesion characteristics of its exterior, edge, and interior areas and avoid the omission of features in the important regions. The features are fused by several fully connected layers to output the diagnostic results.
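As a rough illustration of the multi-scale idea, the sketch below computes three nested crop boxes around a nodule, one per scale. The function name, the scale factors (0.5, 1.0, 1.5), and the use of a nodule center and size are all assumptions made for this example. The source does not describe how the three scales are cropped.

```python
def multi_scale_boxes(cx, cy, nodule_w, nodule_h, img_w, img_h,
                      factors=(("small", 0.5), ("middle", 1.0), ("large", 1.5))):
    """Return one (left, top, right, bottom) crop box per scale, clipped to the image.

    Assumed mapping: 'small' stays inside the nodule (interior), 'middle'
    matches the nodule extent (edge), and 'large' adds surrounding tissue
    (exterior). The factors are hypothetical.
    """
    boxes = {}
    for name, factor in factors:
        half_w = nodule_w * factor / 2
        half_h = nodule_h * factor / 2
        boxes[name] = (
            max(0, round(cx - half_w)),
            max(0, round(cy - half_h)),
            min(img_w, round(cx + half_w)),
            min(img_h, round(cy + half_h)),
        )
    return boxes

boxes = multi_scale_boxes(cx=200, cy=150, nodule_w=100, nodule_h=80,
                          img_w=640, img_h=480)
print(boxes)
```

In a setup like this, each crop would be resized and passed to its own ResNet18 before the fully connected fusion layers.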

In the second stage, MMD-DL uses two branches to extract features from the transverse and longitudinal scans, respectively, after US video preprocessing (Additional file 2: Method S2). Each branch consists of a multi-scale feature extractor, which has the same structure as the pre-trained feature extractor and is initialized with the same weights at the beginning of fine-tuning. In order to fuse temporal features, the feature extractor processes, one by one, five frames obtained from the preprocessing of each US video. Finally, a fully connected layer fuses all features extracted from the multi-scale, multi-frame, and dual-direction video frames and outputs the diagnostic probability. Details of the model and the training strategy are given in Additional file 3: Method S3.
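The bookkeeping implied by this design (two directions, five frames per video, three scales per frame) can be sketched as follows. The evenly spaced frame selection and both function names are assumptions for illustration. The actual frame selection is described in the supplementary methods and is not reproduced here.

```python
def pick_frames(n_frames: int, k: int = 5) -> list:
    """Pick k evenly spaced frame indices from a sweep of n_frames frames.

    Assumption: frames are taken at the center of k equal segments.
    """
    if n_frames < k:
        raise ValueError("video has fewer frames than requested")
    return [int((i + 0.5) * n_frames / k) for i in range(k)]

def enumerate_inputs(n_transverse: int, n_longitudinal: int) -> list:
    """List every (direction, frame index, scale) crop that feeds the fusion layer."""
    scales = ("large", "middle", "small")
    videos = {"transverse": n_transverse, "longitudinal": n_longitudinal}
    return [
        (direction, frame, scale)
        for direction, n_frames in videos.items()
        for frame in pick_frames(n_frames)
        for scale in scales
    ]

# Example: a 5 s transverse sweep and a 4 s longitudinal sweep at 40 frames per second.
inputs = enumerate_inputs(200, 160)
print(len(inputs))
```

Under these assumptions, each patient contributes 2 × 5 × 3 = 30 crops, whose feature vectors are combined by the final fully connected layer.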

Then, the model was applied to the prospective US videos for test and validation. The training, test, and validation steps used different patient populations. Details of how model performance was measured can be found in Additional file 4: Method S4.

The impact of AI assistance on radiologists with different levels of experience

Two junior radiologists (Yi Mao and Guozheng Zhao) with 1 year of experience in thyroid US, two intermediate radiologists (Yan Wang and Lin Yuan) with 5 years of experience in thyroid US, and two senior radiologists (Mingbo Zhang and Mengjie Song) with over 8 years of experience in thyroid US were invited to interpret the same US videos of the test and validation cohorts. The radiologists were shown ultrasound videos that they had not seen before. After they gave the prediction of LNM based on their evaluation of US videos, the AI-predicted probability and AI-generated heatmap were provided to them as assisting information (Additional file 5: Method S5). Then, they performed the second-round diagnosis. Their predictive performances with and without AI assistance were compared.

Statistical analysis

Categorical and normally distributed continuous variables were presented as frequency (percentage) and mean with a 95% confidence interval (CI), respectively. Categorical variables were compared by the χ2 test. Student’s t-test was used for comparison between normally distributed continuous variables. The area under the ROC curve (AUC) was used to measure the performance of prediction. All the statistical analyses above were performed with SPSS software (version 26, Chicago, IL). The DeLong test was employed to compare different AUCs using GraphPad Prism (version 8, CA, USA). A two-sided P < 0.05 was considered to indicate statistical significance.
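For readers less familiar with these metrics, the sketch below computes AUC, accuracy, sensitivity, and specificity in plain Python. It is not the authors' analysis, which was run in SPSS and GraphPad Prism. The function names, the 0.5 threshold, and the toy scores and labels are invented for illustration.

```python
def auc_rank(scores, labels):
    """AUC as the probability that a random positive case outscores a random
    negative case, with ties counted as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def confusion_metrics(scores, labels, threshold=0.5):
    """Accuracy, sensitivity, and specificity at an assumed probability threshold."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }

# Toy data: 1 = pathologically confirmed LNM, 0 = no LNM (invented values).
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]
auc = auc_rank(scores, labels)
metrics = confusion_metrics(scores, labels)
print(auc, metrics)
```

AUC does not depend on the threshold, whereas sensitivity and specificity trade off against each other as the threshold moves, which is why both are reported alongside the ROC curve.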

Results

Study population and baseline characteristics

A total of 725 patients with suspicious PTC were retrospectively enrolled (Fig. 2a), but 237 patients were excluded based on our exclusion criteria, resulting in 488 patients and 976 B-mode US images used for pre-training the MMD-DL model.

Fig. 2

Flowchart of the retrospective and prospective patient enrollment and cohort buildings. a Inclusion and exclusion process of the retrospective patient enrollment. b Inclusion and exclusion process of the multicenter prospective patient enrollment. PTC, papillary thyroid carcinoma; BJTR, Beijing Tong Ren; FMC, the fourth medical center; CJF, China-Japan friendship

A total of 272 patients with suspicious PTC were prospectively enrolled (Fig. 2b), and 54 patients were excluded due to various reasons. Finally, 218 PTC patients (more than the minimum sample size required) and 436 US videos were included for model fine-tuning (training cohort n = 109), internal test (test cohort n = 39), and external validation (validation cohort n = 70), which were from different hospitals.

Table 1 shows that all demographic and ultrasound characteristics were well balanced between the training, test, and validation cohorts (P > 0.05), except for tumor size and height-to-width ratio, for which the validation cohort differed significantly from the training cohort (P = 0.03 and P < 0.001, respectively). This was expected, because the patients in these two cohorts were collected from different hospitals (Fig. 2b).

Table 1 Demographic and ultrasound characteristics of prospectively enrolled patients

The diagnostic performance of MMD-DL

Table 2 shows the diagnostic performance of the MMD-DL model in the pre-training, training, test, and validation cohorts. It achieved AUCs of 0.91 (95% CI: 0.89, 0.94) and 0.85 (95% CI: 0.78, 0.92) in the pre-training and training cohorts (Fig. 3a). In the internal test and external validation cohorts, AUCs reached 0.85 (95% CI: 0.73, 0.97) and 0.81 (95% CI: 0.73, 0.89) (Fig. 3b and c). There were no statistically significant differences between the training, test, and validation cohorts (pairwise comparison P > 0.99, P = 0.25, P = 0.35). Moreover, there were no significant differences in AUCs (pairwise comparison P = 0.10, P = 0.63, P = 0.56) among the three independent hospitals in the validation cohort, suggesting that the high LNM diagnostic efficacy of MMD-DL was also highly reproducible.

Table 2 Comparison of LNM predictions in different cohorts using the MMD-DL model
Fig. 3

Performances of MMD-DL, radiologists, and radiologists with AI assistance in predicting lymph node metastasis. a Receiver operating characteristic (ROC) curves of MMD-DL in the pre-training and training cohorts. b, c ROC curves of MMD-DL and diagnostic performances of radiologists with and without AI assistance in the test and validation cohorts, respectively. AUC, area under the curve

Although we found that sensitivities were consistently lower than specificities for using MMD-DL in all four cohorts, such behavior was not the same in different hospitals. In the validation cohort, the AI model showed higher sensitivities in two hospitals, but a higher specificity in the other hospital (Table 2), which was likely caused by different US operators.

We also explored combining clinical features with the AI model to predict lymph node metastasis. Two clinical features (age and US suspicious lymph node) were selected using multivariable logistic regression, as shown in Additional file 6: Table S1. Fully connected layers were used to fuse them with the diagnostic scores from MMD-DL; however, this produced no statistically significant difference in performance in either the test cohort or the validation cohort (Additional file 7: Table S2).

Benefits from AI assistance

The AI model also generated heat maps, examples of which are displayed in the Supplementary Materials (Additional file 7: Fig. S1 and Fig. S2). However, the radiologists did not find regular patterns in the heat maps. Figure 3b and c show that the six green symbols, representing the six radiologists' performance in diagnosing LNM, were mostly under the ROC curves given by MMD-DL in the test and validation cohorts, respectively. However, with MMD-DL assistance, most orange symbols moved closer to the curves, and one radiologist in each cohort was even located above the corresponding curve.

The quantitative comparison in the validation cohort (Table 3) further revealed that the average accuracy and sensitivity of all radiologists were improved from 57% (95% CI: 52%, 62%) to 60% (95% CI: 55%, 64%), as well as from 62% (95% CI: 55%, 69%) to 65% (95% CI: 58%, 72%), respectively by using MMD-DL as assistance, and the improvements were significant (P < 0.001). The specificity was also improved from 53% (95% CI: 46%, 59%) to 55% (95% CI: 48%, 61%), but the difference was not significant (P = 0.28).

Table 3 Comparison of LNM predictions between radiologists with and without AI assistance in the validation cohort

Figure 4 demonstrates the diagnostic accuracies given by three groups of radiologists with different experiences in the test and validation cohorts together. Interestingly, the junior and senior radiologists (Fig. 4a and c) showed more distinct improvement than the intermediate radiologists (Fig. 4b). However, with AI assistance, the accuracy of intermediate radiologists changed from an elongated distribution to a more concentrated distribution, suggesting the LNM diagnosis was more stable with AI. The LNM diagnostic performances of all six radiologists from different hospitals in the validation cohort are demonstrated in Additional file 10: Fig. S3 a to c. In general, most radiologists achieved better performances after using AI, confirming that the benefits from AI assistance were highly reproducible.

Fig. 4

Violin plots of the diagnostic accuracy given by a junior, b intermediate, and c senior radiologists with and without AI assistance in the test and validation cohorts together. ACC, accuracy

Discussion

In this multi-center prospective control trial, we verified the performance of our newly developed multi-scale, multi-frame, and dual-direction deep learning (MMD-DL) model for predicting cervical lymph node metastasis (LNM) in papillary thyroid carcinoma (PTC) patients, which achieved AUCs of 0.85 (95% CI: 0.73, 0.97) and 0.81 (95% CI: 0.73, 0.89) in the internal test and external validation cohorts, respectively. To the best of our knowledge, this is the first multi-center prospective clinical trial conducted to verify the actual performance of a deep learning (DL) based artificial intelligence (AI) model in such a clinical scenario. Moreover, we proved that US radiologists with different work experiences, who did not have any prior knowledge of using AI for computer-assisted diagnosis, significantly improved their average diagnostic accuracy (57% vs. 60%, P = 0.001) and sensitivity (62% vs. 65%, P < 0.001) by using the AI model for assistance. We believe this study provides high-level clinical evidence of how much an AI model can achieve and how much radiologists can benefit when adopting DL for assisted prediction of cervical LNM in PTC patients.

PTC is a high-incidence, low-invasiveness tumor, and its multimodal imaging diagnosis and cytopathological diagnosis have been studied in depth [28,29,30]. However, PTC patients with pathologic LNM still require aggressive treatment, including lymph node dissection and radioactive iodine therapy, to prevent local recurrence and distant metastases. In contrast, the absence of LNM is one of the characteristics of low-risk PTC, which can be treated with active surveillance or US-guided thermal ablation [5]. Therefore, preoperative identification of LNM is crucial for establishing appropriate management strategies. US is the first-line imaging method for non-invasive assessment of cervical LNM in PTC patients [5]. However, its diagnostic accuracy is affected by the capability of radiologists, obscuration by bone and gas, the presence of occult lymph nodes, and many other factors in clinical practice [31]. There is an urgent need for a reliable method to improve the prediction accuracy of cervical LNM during preoperative US evaluation of PTC patients, and our MMD-DL model was proposed to meet this need.

In recent years, investigations of applying AI and radiomic strategies on US images for cervical LNM prediction in PTC patients have drawn great attention [22]. Chang et al. developed an LNM nomogram combining DL signatures, clinical characteristics, and US features, which achieved an AUC of 0.83 in the validation cohort [32]. Wang et al. introduced a DL model with an AUC of 0.78 in their test set [33]. Liu et al. proposed a radiomics model integrating B-mode and strain elastography US images, offering an AUC of 0.90 [15]. Jiang established a nomogram based on shear-wave elastography images with an AUC of 0.83 [17]. All these studies provided valuable insights about the potential and effectiveness of developing DL models to analyze US images for accurate preoperative prediction of LNM in PTC patients. However, they were all retrospective studies that lacked independent multicenter validations. Yu et al. conducted a multicenter study with 1894 PTC patients involved in the training and validation of their transfer learning radiomics model, and the AUC reached 0.93 [22]. This study reported the highest AUC in cervical LNM prediction so far, but it was still a retrospective study with unbalanced patient characteristics in different cohorts, no standardized US image acquisition and no strict quality control.

Unlike previous studies, this multi-center prospective trial used a standardized US video acquisition protocol, and the data from all participating hospitals were reviewed by two senior radiologists to guarantee quality control. In total, 18 demographic and US characteristics of enrolled patients were recorded, which were much more comprehensive, and most of them were well-balanced between the training, test, and validation cohorts, minimizing possible biases in cross-cohort comparisons. Unlike some studies using needle biopsy as references, all patients in this study received a thyroidectomy, and the final pathological reports were used as the only gold standard. The validation cohort consisted of three hospitals, which were independent from the training and test cohorts. This was better than some of the previous studies for evaluating the reproducibility of an AI model. Because of those reasons, although our proposed MMD-DL model did not achieve the highest AUC compared with other reported studies, its LNM prediction performance was still more credible and reliable for radiologists, head and neck surgeons, and endocrinologists.

The MMD-DL model was designed and trained differently from other reported DL models. First, it adopted the transfer learning (TL) strategy [34] and used 976 retrospectively collected B-mode US images for pre-training, whereas the other TL model was pre-trained on natural images from ImageNet rather than US images [22]. Therefore, MMD-DL eliminated potential adverse impacts from non-US images while retaining the essence of TL. Second, it integrated multi-scale (large, middle, and small) and dual-direction (transverse and longitudinal) analysis of a PTC nodule, so that the center, periphery, and adjacent areas of the nodule were separately interpreted by DL algorithms in two directions, making better use of spatial features hidden in US images [35]. Our study showed that using fewer frames or only some of the scales seriously weakened the AUC, by around 20%, and that the simultaneous use of scanning data in both directions improved the performance by around 10% (Additional file 11: Table S3). Third, instead of using one or two static US images of a nodule, the inputs of MMD-DL were down-sampled frames from US videos covering the entire nodule area. Therefore, it was able to analyze much richer information about a PTC nodule when predicting cervical LNM, which previous DL models were not capable of [15, 17, 22, 32, 33]. All those efforts effectively helped MMD-DL overcome the overfitting problem, increase training efficiency, and reduce instability caused by different US operators.

After verifying the performance of MMD-DL, we further investigated the actual clinical benefits of AI assistance for US radiologists. It should be pointed out that none of the six participating radiologists had any experience of using AI or had seen any AI heatmaps before this trial. Nevertheless, their diagnosis of cervical LNM still improved in average accuracy, sensitivity, and specificity (Table 3) when additional information from the AI model was introduced, although the improvement in specificity was not statistically significant. This provided solid evidence for the clinical significance of using AI for assisted diagnosis. However, such use of AI had certain limitations. Because the interpretation of heatmaps varied widely among radiologists, this study did not find a simple and recognizable pattern in these AI images, which may explain why the intermediate group benefitted less than the other groups. Therefore, it is necessary to establish an appropriate guideline for interpreting AI heatmaps so that all radiologists can make the most of AI assistance, which is the next step of our work.

Our study has limitations. First, only US videos of the thyroid gland were applied to predict cervical LNM, but US images of cervical lymph nodes and laboratory indices, such as genetic testing and thyroid function, were not included in the prediction. Second, only one type of US instrument was used in this study, and whether the results were consistent between different US devices was not investigated. Third, the tumor size and height-to-width ratio were significantly different between the training and validation cohorts, which may cause differences in diagnostic performance. Fourth, the sensitivity and specificity still had some variations across different hospitals. The stability of MMD-DL needs to be further verified in a larger sample size.

Conclusions

In conclusion, the deep learning model using US videos can provide accurate and reproducible prediction of cervical lymph node metastasis in papillary thyroid carcinoma patients preoperatively, and it can be used as an effective assisting tool to improve the diagnostic performance of US radiologists.

Availability of data and materials

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Abbreviations

AI:

Artificial intelligence

AUC:

Area under the ROC curve

CI:

Confidence interval

DL:

Deep learning

LNM:

Lymph node metastasis

MMD-DL:

Multi-scale, multi-frame, and dual-direction deep learning

PTC:

Papillary thyroid carcinoma

ROC:

Receiver operating characteristic

SD:

Standard deviation

TL:

Transfer learning

References

  1. Lim H, Devesa SS, Sosa JA, Check D, Kitahara CM. Trends in thyroid cancer incidence and mortality in the United States, 1974–2013. JAMA. 2017;317:1338–48. https://doi.org/10.1001/jama.2017.2719.

  2. Eskander A, Merdad M, Freeman JL, Witterick IJ. Pattern of spread to the lateral neck in metastatic well-differentiated thyroid cancer: a systematic review and meta-analysis. Thyroid. 2013;23:583–92. https://doi.org/10.1089/thy.2012.0493.

  3. Randolph GW, Duh QY, Heller KS, LiVolsi VA, Mandel SJ, Steward DL, et al. The prognostic significance of nodal metastases from papillary thyroid carcinoma can be stratified based on the size and number of metastatic lymph nodes, as well as the presence of extranodal extension. Thyroid. 2012;22:1144–52. https://doi.org/10.1089/thy.2012.0043.

  4. Smith VA, Sessions RB, Lentsch EJ. Cervical lymph node metastasis and papillary thyroid carcinoma: does the compartment involved affect survival? Experience from the SEER database. J Surg Oncol. 2012;106:357–62. https://doi.org/10.1002/jso.23090.

  5. Haugen BR, Alexander EK, Bible KC, Doherty GM, Mandel SJ, Nikiforov YE, et al. 2015 American thyroid association management guidelines for adult patients with thyroid nodules and differentiated thyroid cancer: the American thyroid association guidelines task force on thyroid nodules and differentiated thyroid cancer. Thyroid. 2016;26:1–133. https://doi.org/10.1089/thy.2015.0020.

  6. Hwang HS, Orloff LA. Efficacy of preoperative neck ultrasound in the detection of cervical lymph node metastasis from thyroid cancer. Laryngoscope. 2011;121:487–91. https://doi.org/10.1002/lary.21227.

  7. O’Connell K, Yen TW, Quiroz F, Evans DB, Wang TS. The utility of routine preoperative cervical ultrasonography in patients undergoing thyroidectomy for differentiated thyroid cancer. Surgery. 2013;154:697–701. https://doi.org/10.1016/j.surg.2013.06.040.

  8. Liu W, Cheng R, Ma Y, Wang D, Su Y, Diao C, et al. Establishment and validation of the scoring system for preoperative prediction of central lymph node metastasis in papillary thyroid carcinoma. Sci Rep. 2018;8:6962. https://doi.org/10.1038/s41598-018-24668-6.

  9. Zou M, Wang YH, Dong YF, Lai XJ, Li JC. Clinical and sonographic features for the preoperative prediction of lymph nodes posterior to the right recurrent laryngeal nerve metastasis in patients with papillary thyroid carcinoma. J Endocrinol Invest. 2020;43:1511–7. https://doi.org/10.1007/s40618-020-01238-0.

  10. Hughes DT, Rosen JE, Evans DB, Grubbs E, Wang TS, Solórzano CC. Prophylactic central compartment neck dissection in papillary thyroid cancer and effect on locoregional recurrence. Ann Surg Oncol. 2018;25:2526–34. https://doi.org/10.1245/s10434-018-6528-0.

  11. Hartl DM, Leboulleux S, Al Ghuzlan A, Baudin E, Chami L, Schlumberger M, et al. Optimization of staging of the neck with prophylactic central and lateral neck dissection for papillary thyroid carcinoma. Ann Surg. 2012;255:777–83. https://doi.org/10.1097/SLA.0b013e31824b7b68.

  12. Hughes DT, White ML, Miller BS, Gauger PG, Burney RE, Doherty GM. Influence of prophylactic central lymph node dissection on postoperative thyroglobulin levels and radioiodine treatment in papillary thyroid cancer. Surgery. 2010;148:1100–6. https://doi.org/10.1016/j.surg.2010.09.019. discussion 1106–7.

  13. Mauri G, Hegedüs L, Bandula S, Cazzato RL, Czarniecka A, Dudeck O, et al. European thyroid association and cardiovascular and interventional radiological society of Europe 2021 clinical practice guideline for the use of minimally invasive treatments in malignant thyroid lesions. Eur Thyroid J. 2021;10:185–97. https://doi.org/10.1159/000516469.

  14. Liu H, Li Y, Mao Y. Local lymph node recurrence after central neck dissection in papillary thyroid cancers: a meta analysis. Eur Ann Otorhinolaryngol Head Neck Dis. 2019;136:481–7. https://doi.org/10.1016/j.anorl.2018.07.010.

  15. Liu T, Ge X, Yu J, Guo Y, Wang Y, Wang W, et al. Comparison of the application of B-mode and strain elastography ultrasound in the estimation of lymph node metastasis of papillary thyroid carcinoma based on a radiomics approach. Int J Comput Assist Radiol Surg. 2018;13:1617–27. https://doi.org/10.1007/s11548-018-1796-5.

  16. Liu T, Zhou S, Yu J, Guo Y, Wang Y, Zhou J, et al. Prediction of lymph node metastasis in patients with papillary thyroid carcinoma: a radiomics method based on preoperative ultrasound images. Technol Cancer Res Treat. 2019;18:1533033819831713. https://doi.org/10.1177/1533033819831713.

  17. Jiang M, Li C, Tang S, Lv W, Yi A, Wang B, et al. Nomogram based on shear-wave elastography radiomics can improve preoperative cervical lymph node staging for papillary thyroid carcinoma. Thyroid. 2020;30:885–97. https://doi.org/10.1089/thy.2019.0780.

  18. Li F, Pan D, He Y, Wu Y, Peng J, Li J, et al. Using ultrasound features and radiomics analysis to predict lymph node metastasis in patients with thyroid cancer. BMC Surg. 2020;20:315. https://doi.org/10.1186/s12893-020-00974-7.

  19. Zhou SC, Liu TT, Zhou J, Huang YX, Guo Y, Yu JH, et al. An ultrasound radiomics nomogram for preoperative prediction of central neck lymph node metastasis in papillary thyroid carcinoma. Front Oncol. 2020;10:1591. https://doi.org/10.3389/fonc.2020.01591.

  20. Tong Y, Li J, Huang Y, Zhou J, Liu T, Guo Y, et al. Ultrasound-based radiomic nomogram for predicting lateral cervical lymph node metastasis in papillary thyroid carcinoma. Acad Radiol. 2021;28:1675–84. https://doi.org/10.1016/j.acra.2020.07.017.

  21. Park VY, Han K, Kim HJ, Lee E, Youk JH, Kim EK, et al. Radiomics signature for prediction of lateral lymph node metastasis in conventional papillary thyroid carcinoma. PLoS One. 2020;15:e0227315. https://doi.org/10.1371/journal.pone.0227315.

  22. Yu J, Deng Y, Liu T, Zhou J, Jia X, Xiao T, et al. Lymph node metastasis prediction of papillary thyroid carcinoma based on transfer learning radiomics. Nat Commun. 2020;11:4807. https://doi.org/10.1038/s41467-020-18497-3.

  23. Wang B, Wan Z, Li C, Zhang M, Shi Y, Miao X, et al. Identification of benign and malignant thyroid nodules based on dynamic AI ultrasound intelligent auxiliary diagnosis system. Front Endocrinol (Lausanne). 2022;13:1018321. https://doi.org/10.3389/fendo.2022.1018321.

  24. Peng S, Liu Y, Lv W, Liu L, Zhou Q, Yang H, et al. Deep learning-based artificial intelligence model to assist thyroid nodule diagnosis and management: a multicentre diagnostic study. Lancet Digit Health. 2021;3:e250–9. https://doi.org/10.1016/S2589-7500(21)00041-8.

  25. Medical Administration of the National Health Care Commission of the People’s Republic of China. Guidelines for the diagnosis and treatment of thyroid cancer (2022 edition). Chin J Pract Surg. 2022;42:1343-1357,1363.

  26. Amin MB, Edge S, Greene F, et al. AJCC Cancer Staging Manual. 8th ed. New York: Springer International Publishing: American Joint Committee on Cancer; 2017.

  27. Tessler FN, Middleton WD, Grant EG, Hoang JK, Berland LL, Teefey SA, et al. ACR Thyroid Imaging, Reporting and Data System (TI-RADS): White Paper of the ACR TI-RADS Committee. J Am Coll Radiol. 2017;14:587–95. https://doi.org/10.1016/j.jacr.2017.01.046.

  28. Sengul I, Sengul D. Hermeneutics for evaluation of the diagnostic value of ultrasound elastography in TIRADS 4 categories of thyroid nodules. Am J Med Case Rep. 2021;9(11):538–9.

  29. Zhu L, Chen Y, Ai H, Gong W, Zhou B, Xu Y, et al. Combining real-time elastography with fine-needle aspiration biopsy to identify malignant thyroid nodules. J Int Med Res. 2020;48(12):300060520976027.

  30. Sengul I, Sengul D. Focusing on thyroid nodules in suspense: 10-15 mm with repeat cytology, Category III, the Bethesda System for Reporting Thyroid Cytopathology, TBSRTC. Rev Assoc Med Bras (1992). 2021;67(2):166–7.

  31. Wang R, Tang Z, Wu Z, Xiao Y, Li J, Zhu J, et al. Construction and validation of nomograms to reduce completion thyroidectomy by predicting lymph node metastasis in low-risk papillary thyroid carcinoma. Eur J Surg Oncol. 2023;S0748-7983(23):00436–5. https://doi.org/10.1016/j.ejso.2023.03.236.

  32. Chang L, Zhang Y, Zhu J, Hu L, Wang X, Zhang H, et al. An integrated nomogram combining deep learning, clinical characteristics and ultrasound features for predicting central lymph node metastasis in papillary thyroid cancer: a multicenter study. Front Endocrinol (Lausanne). 2023;14:964074. https://doi.org/10.3389/fendo.2023.964074.

  33. Wang Z, Qu L, Chen Q, Zhou Y, Duan H, Li B, et al. Deep learning-based multifeature integration robustly predicts central lymph node metastasis in papillary thyroid cancer. BMC Cancer. 2023;23:128. https://doi.org/10.1186/s12885-023-10598-8.

  34. Shin HC, Roth HR, Gao M, Lu L, Xu Z, Nogues I, et al. Deep convolutional neural networks for computer-aided detection: CNN architectures, dataset characteristics and transfer learning. IEEE Trans Med Imaging. 2016;35(5):1285–98. https://doi.org/10.1109/TMI.2016.2528162. (Epub 2016 Feb 11).

  35. Zhou H, Wang K, Tian J. Online transfer learning for differential diagnosis of benign and malignant thyroid nodules with ultrasound images. IEEE Trans Biomed Eng. 2020;67(10):2773–80. https://doi.org/10.1109/TBME.2020.2971065.

Acknowledgements

The authors thank Wei Zheng, MD, for manually annotating the ultrasound images of thyroid cancer.

Funding

This work was supported by the National Key Research and Development Program of China (No. 2022YFC2407405), the National Natural Science Foundation of China (Nos. 92159305, 92259303, 62027901, 81227901, 81930053, and 82272029), the Excellent Member Project of Youth Innovation Promotion Association CAS (No. 2016124), the Outstanding Youth Fund Incubation Program of PLA General Hospital (No. 2018-YQPY-002), and the Beijing Science Fund for Distinguished Young Scholars (No. JQ22013).

Author information

Contributions

Guarantors of integrity of the entire study, Y.K.L. and K.W.; study concepts/study design or data acquisition or data analysis/interpretation, all authors; manuscript drafting, all authors; approval of the final version of the submitted manuscript, all authors; agreement to ensure that any questions related to the work are appropriately resolved, all authors; literature research, M.B.Z., Z.L.M., and K.W.; clinical studies, Y.M., X.J., N.X., and Q.H.X.; experimental study, J.T.; statistical analysis, M.B.Z., Z.L.M., and K.W.; and manuscript editing, all authors. All authors read and approved the final manuscript.

Corresponding authors

Correspondence to Yu-Kun Luo or Kun Wang.

Ethics declarations

Ethics approval and consent to participate

This study was approved by the institutional ethics committees of all involved hospitals (ethics committee approval number S2019-212–06).

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1: Method S1.

Multicenter standardized US video acquisition.

Additional file 2: Method S2.

US video preprocessing.

Additional file 3: Method S3.

Details of our model and the strategy for training it.

Additional file 4: Method S4.

Measuring the performance of our model.

Additional file 5: Method S5.

Visualization of our model.

Additional file 6: Table S1.

Statistical test results for clinical features.

Additional file 7: Table S2.

LNM predictions in different cohorts using the MMD-DL model with and without clinical features.

Additional file 8: Figure S1.

Heat maps of a thyroid cancer with lymph node metastases.

Additional file 9: Figure S2.

Heat maps of a thyroid cancer without lymph node metastases.

Additional file 10: Figure S3.

LNM diagnostic performances of all six radiologists in different hospitals in the validation cohort.

Additional file 11: Table S3.

The ablation experiment results on the test cohort.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Zhang, MB., Meng, ZL., Mao, Y. et al. Cervical lymph node metastasis prediction from papillary thyroid carcinoma US videos: a prospective multicenter study. BMC Med 22, 153 (2024). https://doi.org/10.1186/s12916-024-03367-2

  • DOI: https://doi.org/10.1186/s12916-024-03367-2

Keywords