This study assessed the quality and consistency of the recommendations of international CPGs on the diagnosis and management of thyroid nodules and cancer to help physicians identify the most appropriate recommendations. We identified ten guidelines on the management of thyroid nodules and cancer, three of which had been published between 2000 and 2007 [16, 19, 22]. As a general rule, CPGs should be reassessed for validity every 3 years [24, 25]. Therefore, some of the CPGs reviewed here are likely to be outdated because they have not been updated in over 10 years. The guidelines varied markedly in applicability and in the transparency of their funding sources. After applying the AGREE-II instrument to the ten guidelines, we found that the guidelines developed by the BTA, the IKNL, and the NCCN scored above 50% in all six domains. However, consideration of the views of the target population (item 5) during guideline development was inadequate in all ten guidelines. We found differences among guidelines with respect to the indication for FNA in low-suspicion nodules, the routine measurement of serum calcitonin, and the role of cervical lymph node dissection in node-negative patients.
The application of AGREE-II allows for an evaluation of the various aspects of guidelines. We assessed the development methods of the guidelines by using the AGREE-II instrument, based on the rationale that high methodological quality is fundamental for the integrity, reproducibility, and transparency of guidelines. Our study showed that the methodological quality of the guidelines was highest for ‘scope and purpose’ and ‘clarity and presentation’, and lowest for ‘applicability’. Most guidelines lacked explicit statements on whether the patients’ views and preferences had been sought (item 5), whether the various options for management of the condition were clearly presented (item 16), whether the potential cost implications of applying the recommendations were considered (item 20), and on key review criteria for monitoring and/or auditing purposes (item 21). Although the AGREE-II instrument provides six independent scores for six corresponding aspects of the guidelines, we believe that clinicians would be most concerned about the ‘rigor of development’. However, the quality of the ‘applicability’ domain also plays a critical role in the implementation of a guideline. For a guideline to be effective, it should provide advice on how the recommendations can be implemented, discuss the potential impact of the recommendations on resources, and define clear review criteria derived from the key recommendations. Therefore, we recommend that clinicians rely preferentially on the guidelines that performed better regarding the ‘applicability’ domain [16, 19, 21].
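For readers unfamiliar with how the six domain percentages are obtained, the following is a minimal sketch of the scaled domain score as described in the AGREE-II user manual: the summed item ratings of all appraisers are rescaled between the minimum and maximum possible totals for that domain. The function name, the number of appraisers, and the ratings in the example are hypothetical and are not taken from this study.

```python
def scaled_domain_score(ratings):
    """Return the AGREE-II scaled domain score as a percentage.

    ratings: one list per appraiser, each holding that appraiser's
    item scores (1 to 7) for a single domain. Hypothetical helper,
    written for illustration only.
    """
    if not ratings or not ratings[0]:
        raise ValueError("at least one appraiser and one item are required")
    n_items = len(ratings[0])
    for appraiser in ratings:
        if len(appraiser) != n_items:
            raise ValueError("every appraiser must rate the same items")
        if any(score < 1 or score > 7 for score in appraiser):
            raise ValueError("AGREE-II items are rated from 1 to 7")

    n_appraisers = len(ratings)
    obtained = sum(sum(appraiser) for appraiser in ratings)
    minimum = 1 * n_items * n_appraisers  # every item rated 1
    maximum = 7 * n_items * n_appraisers  # every item rated 7
    return 100.0 * (obtained - minimum) / (maximum - minimum)


# Hypothetical ratings by two appraisers for the four
# 'applicability' items (items 18 to 21).
applicability = [[3, 2, 4, 3], [4, 2, 3, 3]]
print(round(scaled_domain_score(applicability), 1))
```

Because each domain is scaled separately, a guideline can score well on one domain and poorly on another, which is why the six scores are reported independently rather than combined into a single total.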
The AGREE-II instrument is used to establish a universal standard for the rigor and transparency of guideline development, and to suggest how existing guidelines can be improved. However, the guidelines fell short of this standard in some respects. One serious shortcoming concerns conflicts of interest. The AGREE-II instrument advocates that guidelines always state clearly whether or not conflicts of interest exist, but several of the guidelines lacked any such statement. The use of a guideline-adaptation framework such as ADAPTE should be considered to develop high-quality CPGs in the future.
In general, the recommendations of the CPGs on the diagnosis, management, and postoperative care of thyroid nodules and cancer were consistent, despite the discrepancies in their scores for the ‘rigor of development’. All guidelines advocated thyroid sonography, as well as measurement of TSH and free thyroxine levels, in all patients. However, no firm recommendations were made for the routine assessment of serum calcitonin or the indications for thyroid scintigraphy. Similarly, major differences across the CPGs were related to the indication for radioiodine ablation and the optimal level of TSH suppression therapy in patients with DTC (Table 4). These discrepancies show that, even when CPG developers claimed to have paired their grade of recommendation with the level of evidence, recommendations were either not graded or were graded inconsistently. This variation may be related to the developers’ search strategy, the process of selecting scientific evidence, and the way the recommendations had been formulated [27, 28].
In the absence of prospective trials, conclusions regarding the optimal selection of treatments must be based on retrospective analyses and the consensus of expert opinion [15–17, 29]. Several CPGs recommend total thyroidectomy if the primary tumor is at least 1 cm in diameter or if extrathyroidal extension or metastases are present [15–18, 22], but some guidelines advise that total thyroidectomy may be performed in patients with large tumors (>4 cm) in the absence of clinical suspicion [21, 23]. Whereas some CPGs recommend considering routine central-neck dissection for most patients with papillary thyroid cancer [15, 16, 20], the guidelines from the NCCN recommend central-neck dissection only in the presence of grossly positive metastasis [16, 19, 21].
The strengths of our review included a comprehensive search for eligible guidelines, the systematic and explicit application of eligibility criteria, the careful consideration of guideline quality by using the AGREE-II instrument, and a rigorous analytical approach. Therefore, this study can add value to existing guideline compendia and libraries such as the National Guideline Clearinghouse and the National Institute for Health and Clinical Excellence, because these libraries depend on submissions from guideline organizations. However, several limitations could have biased our study. First, only CPGs written in English were included, and guidelines written entirely in other languages might have been overlooked. Second, CPGs that focus on non-DTC such as medullary thyroid cancer, or on specific techniques such as procedural guidelines for radioiodine therapy, were excluded from our study. Third, the AGREE-II instrument is used to evaluate the guideline as a whole, and is not intended for the appraisal of specific, individual recommendations. However, a global appraisal of a guideline’s construction process may reflect the strength of the individual recommendations to some extent. Finally, we used only the AGREE-II instrument in evaluating the quality of the guidelines. Other instruments such as the four-item Global Rating Scale (GRS) may also play a role in guideline assessment. Although the GRS is less sensitive than AGREE-II in detecting differences in guideline quality, its items did predict outcome measures related to guideline adoption.