The Biomarker Toolkit — an evidence-based guideline to predict cancer biomarker success and guide development

Background
An increasing number of resources are allocated to cancer biomarker discovery, but very few of these biomarkers are clinically adopted. To bridge the gap between biomarker discovery and clinical use, we aim to generate the Biomarker Toolkit, a tool designed to identify clinically promising biomarkers and promote successful biomarker translation.

Methods
All features associated with a clinically useful biomarker were identified using a mixed methodology, including a systematic literature search, semi-structured interviews, and an online two-stage Delphi survey. The checklist was validated by independent systematic literature searches using keywords/subheadings related to clinically and non-clinically utilised breast and colorectal cancer biomarkers. Composite aggregated scores were generated for each selected publication based on the presence/absence of each attribute listed in the Biomarker Toolkit checklist.

Results
The systematic literature search identified 129 attributes associated with a clinically useful biomarker. These were grouped into four main categories: rationale, clinical utility, analytical validity, and clinical validity. The checklist was subsequently developed using semi-structured interviews with biomarker experts (n=34), and 88.23% agreement on the identified attributes was achieved via the Delphi survey (consensus level: 75%, n=51). Quantitative validation was completed using clinically and non-clinically implemented breast and colorectal cancer biomarkers. Cox-regression analysis suggested that total score is a significant driver of biomarker success in both cancer types (BC: p>0.0001, 95.0% CI: 0.869–0.935, CRC: p>0.0001, 95.0% CI: 0.918–0.954).

Conclusions
This novel study generated a validated checklist of literature-reported attributes linked with successful biomarker implementation. Ultimately, this toolkit can be used to detect the biomarkers with the highest clinical potential and to shape how biomarker studies are designed and performed.

Supplementary Information
The online version contains supplementary material available at 10.1186/s12916-023-03075-3.


Assay Validation
Does the study include methods to understand biomarker variability, e.g. does it include the effects of time as variable?

Assay Validation/ TRIPOD
Is the variability of biomarker measurement addressed, e.g. does the study evaluate coefficient of variation?

Mechanism of stabilization/ BRISQ
Is the constitution and concentration of fixative stated?

Mechanism of stabilization/ BRISQ
Is the biospecimen processing timing described, e.g., is the time in fixative/preservation solution stated?

Mechanism of stabilization/ BRISQ
Is the biospecimen method of enrichment stated, e.g., do the authors state that laser-capture microdissection of tissue/block selection for region of lesion/ centrifugation of blood etc. were used to enrich the specimen prior to analysis?

Sample Preprocessing/ BRISQ
Were biospecimen quality-assurance measures applied, e.g., was the RNA of the specimen assessed prior/after long-term storage and immediately before experimental analysis?

*Detailed Attributes are found in additional file: Table S7. Attributes were grouped according to theme to simplify the questions and allow the participants to answer the question more efficiently.

To rank attributes falling under each of these categories, for different biomarker types.

Additional File
The study methodology was based on grounded theory, which was characterised in 1967 by two sociologists, Glaser and Strauss, as the 'theory that was derived from data, systematically gathered and analysed through the research process'. Grounded theory has been described in different ways since its first characterisation, but certain core features remain crucial across all versions: i) simultaneous generation and collection of data via surveys, interviews, focus groups and literature, among other sources; ii) initial coding and category identification; iii) intermediate coding and subgrouping of codes into core categories; and iv) advanced coding, a process in which the researcher interconnects coding between categories in an attempt to build a storyline grounded in the data.

The current study design was developed based on grounded theory with the support of qualitative expert SM.

During the interview the following semi-structured questions were asked:
• What do you think makes a good biomarker? Please list five characteristics linked with a successful biomarker.
• Please have a look at the table overleaf (Additional file: Figure S7)

Data were anonymised, and participants were unidentifiable, as each one was given a unique ID. Interview responses were audio recorded and transcribed verbatim, after which the data were immediately encrypted and stored. These data will only be accessible by members of the research team. Electronically transcribed data were subsequently thematically analysed using NVivo Pro

Information provided by the responders was kept anonymised and participant information (e.g., name, DOB) remained confidential. Study participation was voluntary, and all potential participants had the right to refuse or withdraw from the study at any given point.
In both the semi-structured interviews and the Delphi survey, participants were provided with a patient information leaflet and were allowed enough time to make an informed decision about their participation in the study (at least two weeks).

We now illustrate how the formulae are used in practice. Below is a simplified version of the toolkit, with a few of the attributes, as a worked example of score calculation. In this example, there are 5 studies in total, and N1 is 4, N2 is 6 and N3 is 3. As shown in Worked Example Part 1, study 1 is scored based on the reporting of specific attributes. Taking "Experimental design" as an example: if the experimental design is clearly reported in the journal article, the study scores "1"; otherwise "0" is assigned.

At the first step, the average of the scores from all attributes, per study, per category, is calculated. This is repeated for all clinical studies regarding each biomarker being assessed.

2003). "2/3" represents the Clinical Utility score for study 1 (=2) (worked example 1), divided by the total number of attributes (=3). 2/3 is then multiplied by 100 to become a percentage. "100" is assigned for the presence of i) a cost-effectiveness study and ii) a decisional analysis study. "0" is assigned for utility, feasibility/implementation, and human factor studies, as none were conducted prior to the study 1 publication date (2008).
If an optimisation publication was identified for biomarker X, then "1" would be assigned to the attribute "Did the study report study optimisation?" in every publication that used this specific optimised

Worked Example Part 4: Equation 4 was used to calculate the overall score from the worked example. The three category % scores are summed and divided by three to obtain their average, which corresponds to the overall score.
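
The scoring steps described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names and the individual 1/0 attribute values are hypothetical, and the mapping of N1 and N2 to Analytical Validity and Clinical Validity is an assumption. Only the attribute counts (4, 6 and 3) and the Clinical Utility score of 2 out of 3 come from the worked example.

```python
from statistics import mean


def category_percent(attribute_scores):
    """Share of attributes reported in one category, as a percentage.

    Each attribute is scored 1 if reported and 0 otherwise.
    """
    return 100 * sum(attribute_scores) / len(attribute_scores)


def overall_score(analytical_validity, clinical_validity, clinical_utility):
    """Average of the three category percentages (sum divided by three)."""
    return mean([analytical_validity, clinical_validity, clinical_utility])


# Hypothetical attribute scores for study 1. The list lengths follow
# N1 = 4, N2 = 6, N3 = 3; the individual values are invented, except that
# Clinical Utility has 2 of 3 attributes reported, as in the text.
study_1 = {
    "analytical_validity": [1, 1, 0, 1],
    "clinical_validity": [1, 0, 1, 1, 0, 1],
    "clinical_utility": [1, 1, 0],
}

percentages = {name: category_percent(s) for name, s in study_1.items()}
overall = overall_score(**percentages)

for name, value in percentages.items():
    print(f"{name}: {value:.1f}%")
print(f"overall: {overall:.1f}%")
```

For Clinical Utility this reproduces the calculation in the text: 2/3 multiplied by 100, or roughly 66.7%.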

Statistical Analysis Justification
Cox regression (or proportional hazards regression) is used to formulate a predictive model for time-to-event data. For the purpose of this analysis, the "event" was considered to be biomarker stalling. Cox regression was used in this paper because it enables the effects of several variables to be evaluated while taking the effect of time into consideration. In this case, study publication date was considered in the model, in addition to other variables, including the Clinical Validity, Clinical Utility and Analytical Validity scores and biomarker type. The influence of these variables on time-to-event occurrence could therefore be investigated.

A logistic regression was performed to assess the relation of each biomarker's i) sub-category score, ii) Analytical Validity score, iii) Clinical Validity score, iv) Clinical Utility score and v) Total % score with biomarker implementation status. Since implementation status is a binary measure, logistic regression was used, which also allows an assessment of how well the set of variables predicts the categorical dependent variable (biomarker success) and provides a summary accuracy % for the classification of cases. This can be used to determine the % of correct predictions generated by the model.
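
To make the "% of correct predictions" concrete, the following is a minimal pure-Python sketch of a one-variable logistic regression fitted by gradient descent. It is an illustration under stated assumptions only: the total scores and implementation statuses are invented, the helper `fit_logistic` is hypothetical, and the study itself would have used a statistics package rather than hand-written code.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, learning_rate=0.1, steps=2000):
    """Fit weight w and intercept b by gradient descent on the log-loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        w -= learning_rate * sum(e * x for e, x in zip(errors, xs)) / n
        b -= learning_rate * sum(errors) / n
    return w, b


# Hypothetical data: total % score per biomarker and whether the biomarker
# was clinically implemented (1) or not (0). Not taken from the study.
total_scores = [20, 30, 35, 40, 60, 65, 70, 85]
implemented = [0, 0, 0, 0, 1, 1, 1, 1]

# Standardise the predictor so gradient descent behaves well.
mu = sum(total_scores) / len(total_scores)
sd = math.sqrt(sum((s - mu) ** 2 for s in total_scores) / len(total_scores))
xs = [(s - mu) / sd for s in total_scores]

w, b = fit_logistic(xs, implemented)

# Classify each case at a 0.5 probability threshold and report accuracy %.
predicted = [1 if sigmoid(w * x + b) >= 0.5 else 0 for x in xs]
correct = sum(p == y for p, y in zip(predicted, implemented))
accuracy = 100 * correct / len(implemented)
print(f"weight={w:.2f}, intercept={b:.2f}, accuracy={accuracy:.1f}%")
```

The accuracy figure is simply the share of cases whose predicted class matches the observed implementation status, which is the classification summary described above.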