Criteria for the use of omics-based predictors in clinical trials: explanation and elaboration
© McShane et al.; licensee BioMed Central Ltd. 2013
Received: 28 May 2013
Accepted: 6 August 2013
Published: 17 October 2013
High-throughput ‘omics’ technologies that generate molecular profiles for biospecimens have been extensively used in preclinical studies to reveal molecular subtypes and elucidate the biological mechanisms of disease, and in retrospective studies on clinical specimens to develop mathematical models to predict clinical endpoints. Nevertheless, the translation of these technologies into clinical tests that are useful for guiding management decisions for patients has been relatively slow. It can be difficult to determine when the body of evidence for an omics-based test is sufficiently comprehensive and reliable to support claims that it is ready for clinical use, or even that it is ready for definitive evaluation in a clinical trial in which it may be used to direct patient therapy. Reasons for this difficulty include the exploratory and retrospective nature of many of these studies, the complexity of these assays and their application to clinical specimens, and the many potential pitfalls inherent in the development of mathematical predictor models from the very high-dimensional data generated by these omics technologies. Here we present a checklist of criteria to consider when evaluating the body of evidence supporting the clinical use of a predictor to guide patient therapy. Included are issues pertaining to specimen and assay requirements, the soundness of the process for developing predictor models, expectations regarding clinical study design and conduct, and attention to regulatory, ethical, and legal issues. The proposed checklist should serve as a useful guide to investigators preparing proposals for studies involving the use of omics-based tests. The US National Cancer Institute plans to refer to these guidelines for review of proposals for studies involving omics tests, and it is hoped that other sponsors will adopt the checklist as well.
Keywords: Analytical validation; Biomarker; Diagnostic test; Genomic classifier; Model validation; Molecular profile; Omics; Personalized medicine; Precision medicine; Treatment selection
The promise of omics profiling for therapeutic decision making
High-throughput ‘omics’ technologies may allow more informative characterization of disease to better predict both an individual patient’s clinical course and the degree of benefit he or she may derive from new and existing therapies. The potential for detailed characterization of disease has been met with particularly great enthusiasm in oncology, where the heterogeneous character of malignant diseases has long presented challenges. However, despite the widespread use of these technologies in both preclinical and retrospective studies, it has proven more difficult than expected to translate their promise into clinically useful tests that can be used to guide management decisions.
This paper focuses on molecular tests derived from high-throughput omics assays (‘omics-based tests’ or simply ‘omics tests’) as defined in the Institute of Medicine (IOM) report Evolution of Translational Omics. An IOM committee that was convened to review omics-based tests for predicting patient outcomes in clinical trials defined ‘omics’ as the study of related sets of biological molecules in a comprehensive fashion. Examples of omics disciplines include genomics, transcriptomics, proteomics, metabolomics, and epigenomics. Further, the IOM committee defined an omics-based test as ‘an assay composed of or derived from multiple molecular measurements and interpreted by a fully specified computational model to produce a clinically actionable result’.
A distinguishing characteristic of the omics tests discussed here is that computational methods are applied to the high-dimensional data to build mathematical models, often from a subset of the measured variables that have been identified through data-driven selection. This is in contrast to molecular tests based on pre-specified, biologically driven variables, such as mutations in genes targeted by a new therapeutic agent, which might be used to screen patients for eligibility for a clinical trial. Although these biologically driven tests must be based on assays with appropriate analytical performance, they are not subject to all of the same pitfalls that are inherent in omics tests involving complex computational models, so they are not the main focus of this paper.
The development path from high-throughput omics technology to a clinical-grade omics test requires rigorous attention to criteria including the following:
● Availability and quality of appropriate clinical specimens
● Requirements for the analytical performance of the omics assay
● Methods for omics data pre-processing
● Development of the mathematical predictor model and assessment of its performance
● Clinical interpretation of the test result
● Design of the clinical trial
● Ethical, legal, and regulatory issues
Given the rich data emerging from cancer genomics research, it might seem surprising that relatively few omics tests have successfully navigated this path to clinical use. In some cases, the use of omics tests in clinical trials, or their promotion for routine clinical use, has been premature.
There are many reasons for the paucity of omics tests that are able to provide patients and clinicians with information that is useful in the assessment and treatment of disease. They include difficulty in obtaining a sufficient number of acceptable-quality biospecimens with the desired clinical and pathological characteristics, as well as technical challenges in the development and implementation of assays that can be successfully applied to the available types of clinical specimens. Optimal analytical performance and reproducibility of an omics assay may be difficult to achieve, or the assay may lack robustness to ancillary pre-analytical influences on specimens. Translation has also been hampered by the difficulties in properly evaluating the accumulated body of evidence for an omics test by the time there is interest in its definitive evaluation in a clinical trial. Omics studies are often not reported in sufficient detail to allow assessment of the rigor with which the test was developed or evaluated. Adequate data and computer code are not always made available to allow understanding of the methods used in the development of the test or to facilitate independent replication of the results. Subtle flaws in statistical approaches for developing or assessing the performance of mathematical models may go undetected. It is essential to consider all of these issues before launching into a clinical study using an omics test in a way that might influence the clinical management of patients.
Criteria for the use of omics-based predictors in National Cancer Institute-supported clinical trials
1. Establish methods for specimen collection and processing and appropriate storage conditions to ensure the suitability of specimens for use with the omics test.
2. Establish criteria for screening out inadequate or poor-quality specimens or analytes isolated from those specimens before performing assays.
3. Specify the minimum amount of specimen required.
4. Determine the feasibility of obtaining specimens that will yield the quantity and quality of isolated cells or analytes needed for successful assay performance in clinical settings.
5. Review all available information about the standard operating procedures (SOPs) used by the laboratories that performed the omics assays in the developmental studies, including information on technical protocol, reagents, analytical platform, assay scoring, and reporting method, to evaluate the comparability of the current assay to earlier versions and to establish the point at which all aspects of the omics test were definitively locked down for final validation.
6. Establish a detailed SOP to conduct the assay, including technical protocol, instrumentation, reagents, scoring and reporting methods, calibrators and analytical standards, and controls.
7. Establish acceptability criteria for the quality of assay batches and for results from individual specimens.
8. Validate assay performance by using established analytical metrics such as accuracy, precision, coefficient of variation, sensitivity, specificity, linear range, limit of detection, and limit of quantification, as applicable.
9. Establish acceptable reproducibility among technicians and participating laboratories and develop a quality assurance plan to ensure adherence to a detailed SOP and maintain reproducibility of test results during the clinical trial.
10. Establish a turnaround time for test results that is within acceptable limits for use in real-time clinical settings.
Model development, specification, and preliminary performance evaluation
11. Evaluate data used in developing and validating the predictor model to check for accuracy, completeness, and outliers. Perform retrospective verification of the data quality if necessary.
12. Assess the developmental data sets for technical artifacts (for example, effects of assay batch, specimen handling, assay instrument or platform, reagent, or operator), focusing particular attention on whether any artifacts could potentially influence the observed association between the omics profiles and clinical outcomes.
13. Evaluate the appropriateness of the statistical methods used to build the predictor model and to assess its performance.
14. Establish that the predictor algorithm, including all data pre-processing steps, cutpoints applied to continuous variables (if any), and methods for assigning confidence measures for predictions, are completely locked down (that is, fully specified) and identical to prior versions for which performance claims were made.
15. Document sources of variation that affect the reproducibility of the final predictions, and provide an estimate of the overall variability along with verification that the prediction algorithm can be applied to one case at a time.
16. Summarize the expected distribution of predictions in the patient population to which the predictor will be applied, including the distribution of any confidence metrics associated with the predictions.
17. Review any studies reporting evaluations of the predictor’s performance to determine their relevance for the setting in which the predictor is being proposed for clinical use.
18. Evaluate whether clinical validations of the predictor were analytically and statistically rigorous and unequivocally blinded.
19. Search public sources, including literature and citation databases, journal correspondence, and retraction notices, to determine whether any questions have been raised about the data or methods used to develop the predictor or assess its performance, and ensure that all questions have been adequately addressed.
Clinical trial design
20. Provide a clear statement of the target patient population and intended clinical use of the predictor and ensure that the expected clinical benefit is sufficiently large to support its clinical utility.
21. Determine whether the clinical utility of the omics test can be evaluated by using stored specimens from a completed clinical trial (that is, a prospective–retrospective study).
22. If a new prospective clinical trial will be required, evaluate which aspects of the proposed predictor have undergone sufficiently rigorous validation to allow treatment decisions to be influenced by predictor results; where treatment assignments are randomized, provide justification for equipoise.
23. Develop a clinical trial protocol that contains clearly stated objectives and methods and an analysis plan that includes justification of sample size; lock down and fully document all aspects of the omics test and establish analytical validation of the predictor.
24. Establish a secure clinical database so that links among clinical data, omics data, and predictor results remain appropriately blinded, under the control of the study statistician.
25. Include in the protocol the names of the primary individuals who are responsible for each aspect of the study.
Ethical, legal, and regulatory issues
26. Establish communication with the individuals, offices, and agencies that will oversee the ethical, legal, and regulatory issues that are relevant to the conduct of the trial.
27. Ensure that the informed consent documents to be signed by study participants accurately describe the risks and potential benefits associated with use of the omics test and include provisions for banking of specimens, particularly to allow for ‘bridging studies’ to validate new or improved assays.
28. Address any intellectual property issues regarding the use of the specimens, biomarkers, assays, and computer software used for calculation of the predictor.
29. Ensure that the omics test is performed in a Clinical Laboratory Improvement Amendments-certified laboratory if the results will be used to determine treatment or will be reported to the patient or the patient’s physician at any time, even after the trial has ended or the patient is no longer participating in the study.
30. Ensure that appropriate regulatory approvals have been obtained for investigational use of the omics test. If a prospective trial is planned in which the test will guide treatment, consider a pre-submission consultation with the US Food and Drug Administration.
This checklist applies to any clinical trial involving investigational use of an omics test that will influence the clinical management of patients in the trial, for example, the selection of therapy. In situations where an omics test will be evaluated retrospectively on valuable non-renewable specimens collected from patients who were prospectively enrolled in clinical studies, many of the checklist criteria are still applicable and the checklist can serve as a useful guide in judging the quality of the predictor development process and the strength and reliability of the evidence.
This paper is intended as an annotated companion to the short version of these guidelines published elsewhere . Whereas that brief article provides a quick overview of the checklist, background on its development, and discussion of the context in which it is intended to be used, this longer paper elucidates the rationale underlying the development of the criteria in greater detail.
These are general guidelines, and it is recognized that there may be nuances in how they are applied to a particular omics test and clinical setting. The development of omics tests typically proceeds through a series of studies, and it may not be possible to address all of these criteria in early developmental studies. Ideally, investigators should consult this checklist during the research planning and test development phases so that critical evidence is systematically acquired and reported and, by the time the test is ready for definitive evaluation in a clinical study, the necessary evidence to comprehensively address the criteria has been obtained. It is hoped that researchers will find this checklist useful as they prepare background material for research proposals and clinical trial protocols.
Establish methods for specimen collection and processing and appropriate storage conditions to ensure the suitability of specimens for use with the omics test.
Establish criteria for screening out inadequate or poor-quality specimens or analytes isolated from those specimens before performing assays.
Specify the minimum amount of specimen required.
Determine the feasibility of obtaining specimens that will yield the quantity and quality of isolated cells or analytes needed for successful assay performance in clinical settings.
Many omics tests are developed with retrospective collections of specimens that might have already been pre-selected to be of sufficient quality and quantity and thus may not be representative of the specimens that are likely to be obtained in the intended-use clinical setting. It might not be known whether this pre-selection has occurred. There may be no record of the number of patients for whom specimen collection was initially attempted or of the number of attempts that were made per patient until those attempts were either successful or aborted. Some specimens might have been previously collected in specialized research settings in which there were adequate expertise and resources to successfully execute the collections. Patients treated in the context of research studies may be more accepting of specimen collection and more tolerant of potentially invasive specimen collection procedures, leading to greater success in specimen collection in these settings than might be expected in routine clinical settings. To evaluate the feasibility of collecting the needed quantity and quality of specimens in a multicenter clinical trial or a routine clinical setting, it may be necessary to conduct a preliminary feasibility study in more realistic clinical settings. It is important to establish that the omics test is sufficiently robust and will perform acceptably for specimens likely to be encountered in clinical practice.
Review all available information about the standard operating procedures (SOPs) used by the laboratories that performed the omics assays in the developmental studies, including information on technical protocol, reagents, analytical platform, assay scoring, and reporting method, to evaluate the comparability of the current assay to earlier versions and to establish the point at which all aspects of the omics test were definitively locked down for final validation.
The value of critically examining the history of an assay’s development before proceeding to a clinical trial is often underappreciated. Research laboratories in particular may modify their assay methods over time to improve assay performance or adapt to changing costs or availability of reagents, specimens, or instrumentation. Before an omics test is considered for use in a clinical trial, the assay methods used and the data gathered in the developmental stages should be carefully reviewed to determine whether the version of the assay underlying the omics test being proposed for the trial can be expected to generate data comparable to those generated by the former version(s) of the assay. This can be a particular challenge when using omics tests utilizing commercially available microarrays or other rapidly evolving technologies. This assessment should include not only the primary (‘raw’) data generated by the assay but also any scoring method or interpretation rule (for example, positive versus negative) that is to be applied. If data generated in multiple laboratories were used in the development process, it should also be determined whether the different laboratories generated comparable data.
Establish a detailed SOP to conduct the assay, including technical protocol, instrumentation, reagents, scoring and reporting methods, calibrators and analytical standards, and controls.
The assay protocol should be sufficiently detailed to ensure its reproducibility. Elements in the SOP should be specific to minimize variation in the result when the assay is performed at different times, in different laboratories (if more than one laboratory will run the test), and by different technicians. The SOP should include not only the technical steps for conducting the assay, but also the instrumentation, reagents, scoring and reporting method, types of calibrators and analytical standards, controls, and quality control procedures for monitoring assay performance to ensure intra- and inter-laboratory reproducibility (see Criterion 7). To avoid ambiguity in usage of the terms ‘calibrators’, ‘standard’, and ‘controls’, they are defined here as follows:
● A calibrator is a sample engineered to produce a specific value for a particular analyte and is used in the development of a calibration curve to standardize assay values from run to run.
● An analytical standard is a sample that has been extensively characterized and is expected to produce a consistent assay result in repeated assays over time.
● A control is a biological specimen that is available in sufficient quantity to include in multiple assay batches to monitor assay performance for potential drift; or a biological specimen that is expected to produce an unequivocally negative (negative control) or positive (positive control) result.
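The role of a calibrator can be made concrete with a minimal sketch. All concentrations and signal values below are hypothetical, and a simple linear signal-to-concentration response is assumed; real assays may require nonlinear calibration models:

```python
import numpy as np

# Hypothetical calibrators: known analyte concentrations and the raw
# signal each produced in this assay run (values are illustrative only).
known_conc = np.array([0.0, 5.0, 10.0, 20.0, 40.0])
raw_signal = np.array([0.1, 2.1, 4.0, 8.2, 16.1])

# Fit a linear calibration curve: signal = slope * concentration + intercept.
slope, intercept = np.polyfit(known_conc, raw_signal, 1)

def calibrate(signal):
    """Convert a raw signal from a patient specimen to a concentration."""
    return (signal - intercept) / slope

# A raw reading of 8.2 should map back to roughly 20 concentration units.
estimated = calibrate(8.2)
```

Refitting this curve in each assay run is what standardizes values from run to run; an analytical standard assayed alongside the calibrators can then be tracked over time to detect drift.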
Establish acceptability criteria for the quality of assay batches and for results from individual specimens.
Validate assay performance by using established analytical metrics such as accuracy, precision, coefficient of variation, sensitivity, specificity, linear range, limit of detection, and limit of quantification, as applicable.
Before an omics test is used in a clinical trial, the analytical performance of the assay should be evaluated to establish the assay’s analytical validity. This evaluation should examine performance metrics, such as accuracy, precision, coefficient of variation, sensitivity, specificity, linear range, limit of detection, and limit of quantification, as applicable for the particular test under study. A study reported by Tabb et al. provides examples of the types of repeatability and reproducibility assessments that could be made in proteomic identifications by liquid chromatography–tandem mass spectrometry. A number of helpful guidance documents and templates that outline best practices for characterizing assay analytical performance are available [18–27].
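Two of these metrics can be illustrated with a short sketch: a percent coefficient of variation computed from hypothetical replicate assays of a single standard, and a limit of detection estimated from blank replicates using the common mean-plus-three-SD convention (the data and the particular LoD rule are illustrative assumptions, not prescriptions):

```python
import numpy as np

# Illustrative replicate measurements: ten repeat assays of the same
# analytical standard (fabricated values).
replicates = np.array([98.2, 101.5, 99.8, 100.4, 97.9,
                       102.1, 100.0, 98.8, 101.2, 99.6])

# Precision expressed as percent coefficient of variation (CV%).
cv_percent = 100.0 * replicates.std(ddof=1) / replicates.mean()

# Limit of detection estimated from blank replicates using the
# mean-blank + 3*SD rule (one of several accepted conventions).
blanks = np.array([0.8, 1.1, 0.9, 1.3, 0.7, 1.0, 1.2, 0.9])
lod = blanks.mean() + 3.0 * blanks.std(ddof=1)
```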
If individual biomarker measurements are combined (for example, as a weighted average), it may be useful to understand the analytical performance characteristics of the individual biomarker measurements that enter into the omics prediction model, particularly in the model development stage. However, assessment of the analytical characteristics of the final result produced by the omics test is of primary importance as the test is moved into use in a clinical trial. The bias or imprecision in the final result will have a direct impact on the patient care that is guided by the test. For example, a genomics test might produce a continuous risk score, calculated as a linear combination of multiple biomarker measurements. The reproducibility of the final risk score generated by such a linear predictor will depend on the bias and precision of the measurements of the individual biomarkers and the weight given to each biomarker in the risk score. A cutpoint may be applied to the risk score for translation into a clinical classification. The reproducibility of the final clinical classification will then depend on the reliability of the risk score and the proportion of the risk scores that cluster near the cutpoint. Higher variability can be tolerated in a risk score when it is far away from a cutpoint, because the final clinical classification is unlikely to change due to inaccuracy in the risk score. Special considerations apply to prediction models that require complex iterative or stochastic calculations for evaluation, in contrast to the simple setting of the linear risk score just discussed (see Criteria 15 and 16).
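The dependence of classification stability on the distance between a risk score and its cutpoint can be sketched as follows. The weights, measurement SDs, and cutpoint are hypothetical, and measurement errors are assumed independent across biomarkers, so the score SD follows from simple variance propagation:

```python
import numpy as np

# Hypothetical weights for a 4-biomarker linear risk score and the
# analytical (measurement) standard deviation of each biomarker.
weights = np.array([0.8, -0.5, 1.2, 0.3])
meas_sd = np.array([0.10, 0.15, 0.08, 0.20])

# Under independent measurement errors, the SD of the weighted score is
# sqrt( sum_i (w_i * sd_i)^2 ).
score_sd = np.sqrt(np.sum((weights * meas_sd) ** 2))

cutpoint = 2.0

def classification_stable(risk_score, k=3.0):
    """A score more than k analytical SDs from the cutpoint is unlikely
    to change clinical class due to measurement error alone."""
    return abs(risk_score - cutpoint) > k * score_sd

# A score far from the cutpoint tolerates measurement variability...
far = classification_stable(3.5)
# ...while a score just above the cutpoint may flip class between runs.
near = classification_stable(2.05)
```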
Establish acceptable reproducibility among technicians and participating laboratories and develop a quality assurance plan to ensure adherence to a detailed SOP and maintain reproducibility of test results during the clinical trial.
Establish a turnaround time for test results that is within acceptable limits for use in real-time clinical settings.
For an omics-based test to be useful in clinical practice, it must be feasible to collect and process the required specimen, complete the assay, generate and confirm the validity of the primary data, compute the predictions, and have the result available within an acceptable time frame without substantially delaying the usual timing of clinical decisions regarding treatment or other follow-up care. Because many omics-based tests are developed and preliminarily validated on retrospective specimen collections, in some cases there is no prior opportunity to assess the feasibility of using the test in real time. Feasibility studies should be conducted prior to the initiation of a trial to ensure that the necessary infrastructure and resources will be in place to collect and process the required specimens according to the specified methods, that there will be sufficient capacity in the laboratories performing the assays, and that it will be possible to achieve timely data submission and processing to generate the predictions for individual patients in the trial.
Model development, specification, and preliminary performance evaluation
Evaluate data used in developing and validating the predictor model to check for accuracy, completeness, and outliers. Perform retrospective verification of the data quality if necessary.
It is strongly advised that a critical and independent evaluation be conducted of the quality of the data used to develop and preliminarily validate an omics predictor. Unlike data for clinical trials, which are typically collected under standardized and carefully quality-controlled conditions, in many cases data used to develop predictors are based on assays conducted on banked clinical specimens for which clinical and pathologic data may have been assembled from retrospective record reviews. Both the omics assay data and the clinical and pathologic data used in the studies should be carefully reviewed for any evidence of errors, inconsistencies, or biases resulting from careless or incomplete data collection and clinical annotation.
In some cases the omics assays might have been conducted by others and only the omics data, and not the specimens, are available. In these situations there may be little information available about the quality of the specimens or assay procedures used. Quality metrics have been proposed for data from some types of omics assays [32–35], and these can be helpful to identify potential problems. Quality assessments should be performed on the primary omics data from original sources, if available, as well as on any pre-processed data to confirm that no errors were introduced during data handling and processing.
Some omics data problems can be identified by use of simple descriptive statistics. Unusually high correlation between molecular profiles of two different specimens may indicate an unintended duplication of specimen labels. When analyzing data merged from several different studies, one should always assess for high correlations between specimens that could occur if there is overlap in the patients whose specimens were examined in the different studies. For nucleic acid-based assays, cross-contamination of specimens can occur and distort genetic variant profiles. It is worthwhile to conduct these preliminary data analyses to allow for identification and removal of problematic data and increase confidence in the data’s reliability.
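A minimal sketch of such a duplicate check, using simulated profiles in which one specimen is an accidental near-copy of another (the data, correlation threshold, and scenario are fabricated for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated expression profiles: 6 specimens x 500 features; specimen 5
# is an accidental near-duplicate of specimen 0 (a label mix-up).
profiles = rng.normal(size=(6, 500))
profiles[5] = profiles[0] + rng.normal(scale=0.01, size=500)

# Pairwise Pearson correlations between specimen profiles.
corr = np.corrcoef(profiles)

# Flag specimen pairs with suspiciously high correlation.
threshold = 0.95
suspect_pairs = [(i, j)
                 for i in range(corr.shape[0])
                 for j in range(i + 1, corr.shape[0])
                 if corr[i, j] > threshold]
```

Unrelated specimens measured on hundreds of features rarely correlate this strongly, so any flagged pair warrants a review of specimen labels and provenance.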
Assess the developmental data sets for technical artifacts (for example, effects of assay batch, specimen handling, assay instrument or platform, reagent, or operator), focusing particular attention on whether any artifacts could potentially influence the observed association between the omics profiles and clinical outcomes.
Some features of omics profiles can arise from artifacts due to variations in specimen handling, assay reagents, or instrumentation. It is important to check the omics data for evidence of these artifacts. Methods of specimen handling or processing can change over time or differ across clinical sites. Over time, laboratory instruments can drift, reagent lots can change, and assay results can exhibit distinctive ‘batch effects’ due to changes in technique, environmental conditions, or operators. Technology platforms (instrumentation, software, and reagents) can become obsolete, requiring replacement with new versions. A laboratory information management system can be useful for tracking some of these factors to allow for detection of possible problems and troubleshooting. Although attempts can be made to correct for these artifacts through data adjustment and/or use of replicated assays of analytical standards or calibrators, such adjustments often do not completely remove them. The residual effects of these artifacts introduce ‘noise’ into the data and may degrade the performance of the omics predictor.
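One simple screen for such artifacts is to check whether a known technical factor, such as assay batch, aligns with the dominant directions of variation in the data. The sketch below simulates a batch shift and measures its imprint on the first principal component; the data and the size of the injected shift are artificial, and real analyses would examine several components and several technical factors:

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated data: 40 specimens x 200 features, run in two assay batches.
# A hypothetical additive batch shift is injected into the second batch.
batch = np.repeat([0, 1], 20)
data = rng.normal(size=(40, 200))
data[batch == 1] += 0.8  # injected batch effect

# Project specimens onto the first principal component via SVD.
centered = data - data.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
pc1 = centered @ vt[0]

# A large standardized mean difference in PC1 between batches signals a
# technical artifact worth investigating before any modeling.
diff = abs(pc1[batch == 0].mean() - pc1[batch == 1].mean())
effect_size = diff / pc1.std(ddof=1)
```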
The best line of defense against technical artifacts in the development stage of an omics predictor is quality monitoring and good experimental design to avoid confounding technical factors with important biological effects or clinical outcomes. For example, if specimens from patient responders and non-responders were assayed in separate batches, an omics predictor developed from those data might predict only assay batch, not clinical outcome. This can be avoided by randomly assigning specimens to assay batches. Other forms of confounding can be more subtle. If patients accrued at one clinical site tend to have worse prognoses than those accrued at a second clinical site, and the two sites process specimens differently in ways that affect the omics profile, an omics predictor developed from such data could end up predicting the specimen processing method and have little true value for predicting clinical outcome. Whenever possible, SOPs for specimen handling should be put in place across all clinical sites to minimize these nuisance effects and avoid confounding specimen handling with patient prognostic characteristics that vary by clinical site. If it is not possible to standardize procedures, or if existing collections of specimens are being used (potentially accrued from multiple clinical sites), it is important to demonstrate, perhaps through multivariable statistical analyses, that the omics predictor has the ability to predict outcome within each clinical site and after adjustment for other standard clinical or pathological variables.
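The batch-randomization advice above can be made concrete with a small sketch. The specimen IDs and class labels are fabricated, and stratified (rather than simple) randomization is used so that each assay batch is guaranteed to contain both responders and non-responders:

```python
import random

random.seed(42)

# Fabricated specimen IDs with known response status: 12 responders and
# 12 non-responders to be run in three assay batches of eight.
specimens = [(f"S{i:03d}", "responder" if i % 2 == 0 else "non-responder")
             for i in range(24)]

responders = [s for s in specimens if s[1] == "responder"]
non_responders = [s for s in specimens if s[1] == "non-responder"]
random.shuffle(responders)
random.shuffle(non_responders)

# Stratified randomization: each batch receives four specimens of each
# class, so outcome cannot be confounded with assay batch.
batches = [responders[i:i + 4] + non_responders[i:i + 4]
           for i in range(0, 12, 4)]
```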
Evaluate the appropriateness of the statistical methods used to build the predictor model and to assess its performance.
The high dimensionality of omics data and the complexity of many algorithms used to develop omics predictors present many potential pitfalls if proper statistical modeling and evaluation approaches are not used. Various statistical methods and machine learning algorithms are available to develop models, and each has its strengths and weaknesses [38, 39]. There is no uniformly best algorithm for developing a predictor model. One of the earliest studies to compare several methods for development of predictors from gene expression microarray data showed that simple linear diagonal discriminant analysis and nearest-neighbor methods performed as well as or better than a variety of more complex approaches on multiple data sets. A subsequent study conducted by the MicroArray Quality Control Consortium compared models developed by 36 independent teams that analyzed six microarray data sets to build predictors for 13 cancer and toxicology endpoints. That study concluded that the performance of the predictors that were developed ‘depended largely on the endpoint and team proficiency and that different approaches generated models of similar performance.’ More complex modeling approaches, especially those involving regularization (approaches that constrain model complexity) and optimized feature selection, theoretically have the potential to produce better-performing predictors. However, they are sophisticated tools that need to be applied by skilled hands. In situations where the number of omics variables far exceeds the number of independent subjects or specimens, current evidence from comparative studies has not convincingly demonstrated the advantage of highly complex modeling approaches.
A pervasive problem in the omics literature is that algorithms to develop predictors are often applied naively, and flawed approaches are used to assess the predictor’s performance [40, 44, 45]. The more complex the algorithm, the greater the chance that it will be misunderstood or applied incorrectly in inexperienced hands. The two most common pitfalls in developing predictors and assessing their performance are model overfitting and failure to maintain strict separation between the data used to build the predictor model and the data used to assess the predictor performance. These two pitfalls occur in a large number of published papers claiming to have developed omics predictors with good performance [44, 45]. The result is failure of many omics predictors when they are tested on a truly independent data set [46, 47].
Overfitting occurs when a statistical model describes random error or noise instead of the underlying relationship. Modeling strategies that allow for extremely complicated relationships between one or more independent variables and the dependent (outcome) variable and models that are built from very large numbers of variables are particularly prone to overfitting because they can exaggerate minor fluctuations in the data. For omics data, the number of variables that are available to build a predictor typically greatly exceeds the number of independent subjects for whom omics profiling data have been obtained; therefore, the potential to overfit models to high-dimensional omics data can be enormous. For studies in which an omics predictor is developed for a time-to-event endpoint such as survival, the number of observed events (for example, deaths) is a major determinant of the reliability of the fitted model. For diagnostic studies aiming to build a predictor for disease state or class (for example, cancer or no cancer), the number of patients in the least prevalent class most strongly affects the ability to develop a reliable diagnostic model. Modeling approaches that include regularization to constrain the complexity of the model and limit the influence of individual variables or observations are particularly helpful in reducing the potential for overfitting. When selecting among candidate models, it is important to use appropriate procedures to assess model performance to guard against incorrectly selecting an overfitted model. Overfitted models will generally have poor predictive performance on new data sets.
Validation of model performance on a completely independent external data set is optimal, but there can be ambiguity in the required degree of independence. For the strongest independent external validation, the specimens should be collected under the intended conditions at a different site, and the assays should be conducted according to the final SOP in a different laboratory and by different laboratory personnel. If sufficient numbers of patients or specimen sets are available, a series of validations might be performed in which an additional factor is varied in each successive validation attempt to systematically establish robustness of the omics predictor to these conditions. It is important to clearly describe the conditions under which each validation is attempted so that the strength of each validation can be evaluated in the proper context.
It is not always possible to obtain external data collected under the exact target clinical setting and meeting all of the quality standards for the most rigorous type of validation. In some cases, there might be an external data set, but the specimens or assay protocols might not exactly match those intended for the omics test; in other instances, the clinical setting might be outdated, for example, because standard treatment practices have changed. In these circumstances, one must rely on use of model development techniques designed to avoid overfitting, and one can only make assessments of performance internal to the data set used to build the model.
Performance assessments based on re-use of data used in model building are informative only when appropriate internal validation strategies are used. Despite standardization of assays and attempts to avoid model overfitting, a predictor model will nearly always fit better to the data used to develop it than it does to completely independent data. For models built from high-dimensional omics data, simply ‘plugging in’ to a predictor exactly the same data that were used to build it in order to estimate the predictor’s performance — a so-called resubstitution estimate — results in a highly optimistically biased estimate of performance [40, 45]. Resubstitution estimates of performance for predictors built from high-dimensional omics data are uninterpretable and should not be reported. Unfortunately, resubstitution estimates of model performance can still be found in some published articles. Appropriate alternatives to the naïve resubstitution method are available and should be used for internal validation.
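The resubstitution bias described above is easy to reproduce on pure noise. The sketch below (illustrative only; a one-nearest-neighbor classifier on simulated data with randomly assigned labels) shows that the resubstitution estimate is perfect by construction, while leave-one-out cross-validation correctly reports roughly chance-level accuracy.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 40, 1000
X = rng.standard_normal((n, p))        # pure noise "omics" matrix
y = np.array([0] * 20 + [1] * 20)      # arbitrary labels, no real signal

def one_nn_predict(X_train, y_train, x):
    d = np.sum((X_train - x) ** 2, axis=1)
    return y_train[np.argmin(d)]

# Resubstitution: each specimen is its own nearest neighbor, so the
# "estimated" accuracy is 1.0 regardless of any real predictive signal.
resub = np.mean([one_nn_predict(X, y, X[i]) == y[i] for i in range(n)])

# Leave-one-out cross-validation: the held-out case never enters training.
mask = np.ones(n, dtype=bool)
correct = []
for i in range(n):
    mask[i] = False
    correct.append(one_nn_predict(X[mask], y[mask], X[i]) == y[i])
    mask[i] = True
loo = np.mean(correct)
print(resub, loo)   # resubstitution is perfect; LOO hovers near chance
```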
The guiding principle for how to avoid optimistically biased estimates of predictor performance is to never estimate a predictor’s performance by using data that were used to derive the prediction model. The easiest way to ensure separation of model building and assessment is to have completely independent training and validation data sets so that an external validation can be performed. For the most rigorous external validation:
● The predictor to be tested must be completely locked down and there must be a pre-specified performance metric. The lockdown includes all steps in the data pre-processing and prediction algorithm.
● The independent validation data should be generated from specimens collected at a different time, or in a different place, and according to the pre-specified collection protocol.
● Assays for the validation specimen set should be run at a different time or in a different laboratory but according to the identical assay protocol as was used for the training set.
● The individuals developing the predictor must remain completely blinded to the validation data.
● The validation data should not be changed based on the performance of the predictor.
● The predictor should not be adjusted after its performance has been observed on any part of the validation data. Otherwise, the validation is compromised and a new validation may be required.
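One practical way to document lockdown is to serialize every component of the predictor (variables, weights, cutpoints, reporting rules) and record a cryptographic fingerprint in the protocol before validation begins. The sketch below is a hypothetical illustration; the gene names, weights, and cutpoint are invented.

```python
import hashlib
import json

# Hypothetical frozen predictor: fixed gene weights, cutpoint, and metric,
# all pre-specified before any validation specimen is assayed.
locked_predictor = {
    "genes": ["G1", "G2", "G3"],
    "weights": [0.8, -1.2, 0.5],
    "risk_cutpoint": 0.0,
    "performance_metric": "sensitivity/specificity at cutpoint",
}

# Serialize deterministically and fingerprint; the digest is recorded in the
# protocol so any later change to the predictor is detectable.
blob = json.dumps(locked_predictor, sort_keys=True).encode()
digest = hashlib.sha256(blob).hexdigest()

def predict(expr):
    """Apply the locked-down linear predictor to one specimen's profile."""
    score = sum(w * expr[g] for g, w in zip(locked_predictor["genes"],
                                            locked_predictor["weights"]))
    return "high risk" if score > locked_predictor["risk_cutpoint"] else "low risk"

print(digest[:12], predict({"G1": 1.0, "G2": 0.1, "G3": 0.2}))
```

Recomputing the digest at validation time and comparing it with the recorded value provides evidence that no element of the predictor was adjusted after performance was observed.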
Internal validation can be used when no external validation set is available and is also helpful to use during the model building stage to generate preliminary estimates of model performance to monitor and guard against overfitting. Either the original data set can be split into ‘training’ and ‘testing’ subsets or a variety of data resampling techniques can be used to iteratively build the predictor model on a portion of the data and test it on the remaining portion.
Split-sample validation, in which a single data set is split into two parts, has the advantage of being computationally simple. It also provides the flexibility needed to make subjective decisions in the model building process on the training portion of the data. As a further check on model performance in split-sample validation, one can interchange the training and testing subsets to evaluate whether the models built using each subset have similar performance on the other subset. Sometimes what is viewed as a single data set may actually be an amalgamation of several smaller sets. Different portions of the data might represent omics data that were generated in different laboratories or from specimens collected at different clinical sites. Even if all laboratories and clinical sites followed a common protocol for specimen collection and omics assays, subtle differences can arise. In this situation, the most challenging type of split-sample validation, and the most representative of the ‘real world’, is to partition the data into training and testing sets so as to minimize the overlap in the laboratories or clinical sites represented in the two sets. A disadvantage of split-sample validation is its inefficiency, because only a fraction of the data is ever used to build the predictor model. Split-sample validation applied to small or moderate size data sets tends to yield biased results because it underestimates the performance of a model that could be built using the entire data set.
Resampling methods have an advantage over split-sample validation in that they use the full data set but iteratively select which portion of the data serves as a training set, so that at the end of the iterations, each case has been in at least one training subset and in at least one validation subset. Examples of resampling methods include various forms of cross-validation (for example, leave-one-out, k-fold where typically k = 5 or 10) and bootstrapping. The best choice of resampling method for a particular problem depends on the size of the data set and the desired trade-off between bias in the performance estimate and variance of the performance estimate.
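The mechanics of k-fold cross-validation can be sketched as follows. This is an illustrative example on simulated data with a deliberately planted signal; the nearest-centroid classifier stands in for whatever predictor-building algorithm is actually used, and the whole building procedure must be repeated within each fold.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 60, 200
y = np.repeat([0, 1], n // 2)
X = rng.standard_normal((n, p))
X[y == 1, :10] += 1.5          # modest real signal in the first 10 features

def nearest_centroid_fit(X, y):
    return {c: X[y == c].mean(axis=0) for c in np.unique(y)}

def nearest_centroid_predict(centroids, X):
    classes = sorted(centroids)
    d = np.stack([np.sum((X - centroids[c]) ** 2, axis=1) for c in classes])
    return np.array(classes)[np.argmin(d, axis=0)]

def kfold_accuracy(X, y, k=5, seed=0):
    """Each case serves in k-1 training folds and exactly one test fold."""
    idx = np.random.default_rng(seed).permutation(len(y))
    folds = np.array_split(idx, k)
    correct = 0
    for f in folds:
        train = np.setdiff1d(idx, f)           # rebuild the model per fold
        model = nearest_centroid_fit(X[train], y[train])
        correct += np.sum(nearest_centroid_predict(model, X[f]) == y[f])
    return correct / len(y)

print(round(kfold_accuracy(X, y), 2))
```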
Internal validation has two major limitations. First, if the cases selected for study are not representative of the intended-use population or there are technical artifacts affecting the entire data set, any subsets of the data will inherit those same problems. This is especially of concern when the bias is from technical artifacts that confound the association between the omics profiles and the clinical endpoint of interest (for example, specimens from patients with favorable clinical outcomes run in a different assay batch than those from patients with unfavorable clinical outcomes). This can lead to biased estimates of test performance that cannot be detected unless the test is evaluated on a completely independent data set not subject to these same problems. Second, iterative resampling methods are applicable only if the process used to build the predictor model is entirely algorithmic and requires no subjective judgment. Subjective judgment can potentially and subtly enter into several aspects of building predictor models. These aspects may include decisions about constraints on the number of variables in the model, constraints on the weight given to any single variable, how to handle unusual measurements, how to summarize redundant variables, and where to set cutpoints on risk scores for clinical decision making. Although many of these aspects can be decided in a fully algorithmic fashion, many investigators are reluctant to rely on completely automated model building methods.
The principle of separation of training and validation sets can be violated in several more subtle ways. Sometimes investigators use a split-sample approach and develop a predictor model using only a portion of the data, but then present the model performance estimates on the combined training and validation sets rather than on the validation set only, as would be appropriate. This still leads to an optimistic bias in the performance assessment because the resulting performance estimate is a hybrid of an optimistically biased resubstitution estimate on the training data and an independent estimate on the validation data. Another common error made when applying iterative resampling validation approaches is to perform an initial screen for omics variables using the entire data set to identify those variables that are univariately most informative for predicting the clinical outcome of interest, and then to perform iterative resampling to fit the predictor model using only that subset of selected variables. The leak of information resulting from that initial screening on the full data set to reduce the number of variables can be substantial and can lead to optimistic bias in performance estimates nearly as large as for resubstitution estimates [40, 51].
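The feature-screening leak is easy to demonstrate on simulated data with no signal at all. In the sketch below (illustrative only; a crude mean-difference screen feeding a nearest-centroid classifier), screening on the full data set before cross-validation yields an impressively inflated accuracy, whereas repeating the screen within each leave-one-out fold restores an honest, roughly chance-level estimate.

```python
import numpy as np

rng = np.random.default_rng(7)
n, p, k_top = 40, 2000, 20
X = rng.standard_normal((n, p))          # pure noise: no real signal
y = np.repeat([0, 1], n // 2)

def top_features(X, y, k):
    """Rank features by absolute difference in class means (a crude screen)."""
    diff = np.abs(X[y == 0].mean(axis=0) - X[y == 1].mean(axis=0))
    return np.argsort(diff)[-k:]

def centroid_loo(X, y, screen_inside):
    correct = 0
    if not screen_inside:                 # WRONG: screen once on all data
        feats = top_features(X, y, k_top)
    for i in range(len(y)):
        train = np.arange(len(y)) != i
        if screen_inside:                 # RIGHT: screen within each fold
            feats = top_features(X[train], y[train], k_top)
        c0 = X[train & (y == 0)][:, feats].mean(axis=0)
        c1 = X[train & (y == 1)][:, feats].mean(axis=0)
        xi = X[i, feats]
        pred = int(np.sum((xi - c1) ** 2) < np.sum((xi - c0) ** 2))
        correct += pred == y[i]
    return correct / len(y)

leaky = centroid_loo(X, y, screen_inside=False)
honest = centroid_loo(X, y, screen_inside=True)
print(leaky, honest)   # leaky estimate looks impressive; honest one is near chance
```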
Other practices that can introduce bias into the reported performance of a predictor model include selective inclusion or exclusion of certain cases in either the training or the validation sets to obtain improved estimates of predictor performance, testing performance in multiple patient subgroups within the full cohort, and trying multiple model building strategies on the training set, but then reporting only the one that yields the best performance estimate on the validation set. All of these practices are examples of multiple testing, and they can lead to spurious findings and optimistically biased estimates of reported predictor performance.
Establish that the predictor algorithm, including all data pre-processing steps, cutpoints applied to continuous variables (if any), and methods for assigning confidence measures for predictions, are completely locked down (that is, fully specified) and identical to prior versions for which performance claims were made.
A multitude of steps occur from the point at which primary omics data are generated from a specimen until a final result is produced from the omics predictor. These include data pre-processing steps such as overall data quality assessments, exclusion of unreliable measurements, data normalization, calculation of intermediate summary statistics (for example, calculation of gene-level summary expression levels from probe intensity values in microarray data), calculation of a score (possibly subjected to a cutpoint for clinical decision making), or prediction of an outcome. A standardized format for reporting the test result should be developed to ensure proper clinical interpretation. Elements of the report might include a continuous score (for example, probability of disease recurrence) or discrete classification (for example, disease subtype) or both, perhaps accompanied by some measure of confidence or statistical uncertainty interval for the result (for example, strong positive, equivocal, ‘uncertainty in risk score is ±10%’) and a recommended clinical action (for example, consider adjuvant chemotherapy, contraindication for a drug class).
Document sources of variation that affect the reproducibility of the final predictions, and provide an estimate of the overall variability along with verification that the prediction algorithm can be applied to one case at a time.
The association between an omics test result and the clinical outcome the test is intended to predict will be attenuated if the testing process lacks reproducibility. Test results obtained for a given individual can vary for numerous reasons, including biological heterogeneity of the specimen (for example, distinct clonal subpopulations of cells or necrotic regions in a tumor), variation in specimen handling, technical variation in the assay, and numerical variation in the prediction algorithm. Some of these sources of variation can be controlled and others cannot. Biological heterogeneity within a specimen cannot be controlled, but it must be understood. If there is substantial biological heterogeneity and it cannot be determined that one portion or region of the specimen provides the omics information of most relevance to the clinical outcome (for example, the leading edge of the tumor, the area of the tumor with highest grade, or the stem cell compartment), then it is unlikely that the omics test will produce clinically reproducible and informative results. Variation due to specimen handling and assay technical variation is best controlled through careful specification of SOPs and quality monitoring. Numerical variation in the prediction algorithm can be controlled through choice of the algorithm or algorithm settings. Multiple reproducibility assessments may be required to fully understand the relative contributions of all of these sources of variation to the overall variation in the predictions that could be obtained for a given individual.
Reproducibility assessments should be reported in sufficient detail to allow others to understand the sources of variation that are being evaluated. For example, two separate portions of a tumor that are independently subjected to the analyte extraction process (for example, mRNA extraction from a tumor specimen), assay procedure (for example, gene expression microarray analysis), and prediction algorithm would be expected to exhibit more variability than replicate assays of a single sample of extracted analyte that is split and run through the omics assay process and prediction algorithm. Studies should be conducted to evaluate the robustness of omics test results to variations in specimen collection, processing, and storage if tight controls on these factors are not specified as part of the SOPs for the testing process. Specimens used in reproducibility studies should be comparable to the clinical samples for which use of the omics test is being proposed. Because more highly reproducible results are likely to be obtained on cell lines or artificially derived specimens that have been carefully prepared in a laboratory setting than on actual clinical samples that were collected under less predictable conditions, variability assessments made on cell lines or other derived specimens are likely to substantially underestimate the variability that could be experienced in clinical practice.
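The nested reproducibility designs described above can be analyzed by partitioning the observed variation into components. The sketch below is purely illustrative: it simulates a hypothetical study with assumed variance components (tumor, portion/extraction, assay replicate) and recovers the assay and portion-level components by a simple method-of-moments calculation.

```python
import numpy as np

rng = np.random.default_rng(42)
n_tumors, n_portions, n_reps = 8, 3, 2

# Simulated nested reproducibility study: each tumor is sampled in several
# portions (capturing extraction + heterogeneity variability), and each
# extract is assayed in replicate (capturing assay technical variability).
sd_tumor, sd_portion, sd_assay = 2.0, 0.8, 0.3   # assumed true components
tumor = rng.normal(0, sd_tumor, n_tumors)[:, None, None]
portion = rng.normal(0, sd_portion, (n_tumors, n_portions))[:, :, None]
scores = tumor + portion + rng.normal(0, sd_assay, (n_tumors, n_portions, n_reps))

# Method-of-moments estimates from the nested replicates:
# within-extract spread estimates assay variance; the spread of portion-level
# means, corrected for assay noise, estimates the portion/extraction variance.
assay_var = scores.var(axis=2, ddof=1).mean()
portion_means = scores.mean(axis=2)
portion_var = portion_means.var(axis=1, ddof=1).mean() - assay_var / n_reps
print(round(assay_var, 3), round(portion_var, 3))
```

As the text notes, replicates of separately extracted portions are expected to show more variability than replicate assays of a single split extract, which is exactly what the two estimated components quantify.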
Variation in predictions due to the numerical algorithm that mathematically evaluates the model is the most straightforward to assess. This variation can be assessed independently of the biological and assay technical variation, but it will still contribute to the overall variation in the predictions for any given individual. This numerical variation arises only for prediction models that cannot be evaluated as a simple formula and require evaluation by an iterative or stochastic computerized algorithm. A stable computer algorithm should produce highly similar results when exactly the same primary omics data are used on independent occasions as the model input. Often computerized algorithms are needed because evaluation of the prediction model involves complex mathematical equations that can be solved only by the use of iterative numerical approximations. For example, Markov Chain Monte Carlo methods are iterative mathematical algorithms often used in Bayesian statistical modeling approaches. Other prediction modeling approaches require a computer algorithm because they involve combining many models or decision algorithms where each component model or algorithm is built using only a randomly selected subset of the patient data (for example, random forests [59, 60]) or using only a small subset of randomly chosen variables among a much larger number of variables available (for example, shotgun stochastic search). In these situations, any variability associated with the iterative calculation or random-component model selection is incorporated into the variability of the final predictions. Depending on the particular algorithm used and the data set to which the algorithm is applied, it is possible that the randomness introduced by the computerized algorithm could be substantial. This variability becomes part of the variability in the test result and must be assessed along with the assay analytical performance (see Criterion 8).
Sometimes such instability can be addressed through measures such as locking down random number seeds used by iterative numerical algorithms or by saving information about the exact component models that are combined into a final model. If the numerical variability in the final results cannot be adequately controlled, then these complex models will not be suitable for making clinical predictions where it is expected that the same set of observed omics variables should lead to a consistent final test result.
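The effect of locking down a random number seed can be illustrated with a toy stand-in for a stochastic predictor (here, an average over sub-scores computed on randomly chosen variable subsets; the function and data are hypothetical). Without a fixed seed, repeated evaluations of the same input differ; with the seed locked, they are identical.

```python
import numpy as np

def stochastic_ensemble_score(x, weights, n_components=50, seed=None):
    """Toy stand-in for a stochastic predictor: the average of many
    sub-scores, each computed on a randomly chosen subset of variables."""
    rng = np.random.default_rng(seed)
    p = len(x)
    scores = []
    for _ in range(n_components):
        subset = rng.choice(p, size=5, replace=False)
        scores.append(float(x[subset] @ weights[subset]))
    return np.mean(scores)

x = np.linspace(-1, 1, 30)       # one patient's (simulated) omics profile
w = np.full(30, 0.1)

unlocked = [stochastic_ensemble_score(x, w) for _ in range(2)]
locked = [stochastic_ensemble_score(x, w, seed=123) for _ in range(2)]
print(unlocked, locked)   # unlocked runs differ; locked runs are identical
```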
Many studies in which omics predictors have been developed have used data pre-processing methods that induce interdependencies of pre-processed data or predictions made on a collection of specimens. One circumstance in which this occurs is when the pre-processing or prediction algorithm that is applied to an individual specimen depends on the other specimens that happen to be processed with it. A simple example of when such a dependency could occur is when each measured variable is centered by subtracting that variable’s mean value calculated across a collection of specimens. A more complex example is the widely used Robust Multi-array Average method for calculating gene-level summaries from probe sets in the Affymetrix GeneChip system [62, 63]. The Robust Multi-array Average method incorporates data from a collection of microarrays to fit a model that includes terms to estimate probe-specific sensitivities, and it is used to produce the gene-level summaries for each probe set on an Affymetrix GeneChip. For implementation in a clinical trial in which individual patients will be accrued to the trial over time, methods that require group processing of omics profiles must be modified to allow for processing one omics profile at a time. Some investigators have addressed this issue by using a fixed reference set of omics profiles to which each new omics profile is added and then removed for purposes of pre-processing data [64–66].
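The fixed-reference-set idea can be sketched with single-sample quantile normalization: the reference distribution is locked down with the predictor, and each new profile is mapped onto it by rank, so the result does not depend on which other specimens happen to be processed at the same time. This is an illustrative simplification (simulated intensities, not the actual Robust Multi-array Average procedure).

```python
import numpy as np

rng = np.random.default_rng(5)

# Frozen reference distribution: the mean sorted profile of a fixed
# training collection, locked down together with the predictor.
training = rng.gamma(2.0, 2.0, size=(50, 1000))
reference_quantiles = np.sort(training, axis=1).mean(axis=0)

def normalize_one(sample, reference_quantiles):
    """Single-sample quantile normalization: map the sample's ranks onto
    the frozen reference distribution, independent of co-processed samples."""
    ranks = np.argsort(np.argsort(sample))
    return reference_quantiles[ranks]

new_sample = rng.gamma(2.0, 3.0, size=1000)   # one patient, assayed alone
normalized = normalize_one(new_sample, reference_quantiles)
print(normalized.min(), normalized.max())
```

Because each incoming profile is normalized against the same frozen reference, the predictor can be applied to one case at a time as patients accrue.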
Summarize the expected distribution of predictions in the patient population to which the predictor will be applied, including the distribution of any confidence metrics associated with the predictions.
Review any studies reporting evaluations of the predictor’s performance to determine their relevance for the setting in which the predictor is being proposed for clinical use.
Many studies claim to have performed validations of omics predictors, but the term ‘validation’ is used in many different ways. A variety of questions should be asked to assess the strength and relevance of any study that claims to have validated an omics predictor. Sometimes a technical validation has been performed to show that an alternative assay methodology produces measurements that correlate well with the originally measured omics variables. Although technical validations provide some assurance that the study results are not wholly artifacts of the assay process, they do not provide any clinically relevant validation. Other types of validations in preclinical systems, for example, drug sensitivity experiments in cell lines, may support biological plausibility, but they do not provide clinical validation. For a study to provide a clinical validation, there must be a predefined and clinically meaningful performance metric for the predictor, and the clinical setting (for example, disease type and stage, specimen format) must be similar to the intended-use setting.
Evaluate whether clinical validations of the predictor were analytically and statistically rigorous and unequivocally blinded.
Search public sources, including literature and citation databases, journal correspondence, and retraction notices, to determine whether any questions have been raised about the data or methods used to develop the predictor or assess its performance, and ensure that all questions have been adequately addressed.
Omics research has been at the forefront of efforts to promote the public availability of data, and there has been unprecedented sharing of computational algorithms and computer code. Without this sharing of data and algorithms, the sheer volume of data and the complexity of many analyses conducted with omics data would have made it virtually impossible to reproduce many omics study results. On occasion, questions arise about data or analytic approaches when others try to reproduce results using publicly available data, methods, or computer code. Whenever an omics predictor is to be used in a trial or other clinical setting where it will influence patient care, there must be transparency so that any concerns about accuracy of data or appropriateness of methods can be promptly and fully addressed before resources are expended to pursue further development of the predictor, and certainly before the predictor is used clinically.
Clinical trial design
Provide a clear statement of the target patient population and intended clinical use of the predictor and ensure that the expected clinical benefit is sufficiently large to support its clinical utility.
Many published omics studies report statistically significant associations between omics predictor results and clinical endpoints. Although the presence of such an association may establish the clinical validity of the test, statistical significance (for example, P <0.05) does not always translate into a clinically meaningful association or provide clinically useful information. Unless the omics predictor provides new information that is readily interpretable and useful to the physician and patient in making treatment decisions, the investment of resources in developing a clinical test may be wasted. To establish clinical utility, as opposed to clinical validity, there must be evidence suggesting that use of the test is likely to lead to a clinically meaningful benefit to the patient beyond that provided by current standards of care [83, 84].
Determine whether the clinical utility of the omics test can be evaluated by using stored specimens from a completed clinical trial (that is, a prospective–retrospective study).
In some instances, a candidate prognostic or predictive omics test for an existing therapy can be evaluated efficiently by using a prospective-retrospective design, in which the omics test is applied to archived specimens from a completed trial and the results are compared with outcome data that have already been collected. The retrospective aspect of this design requires that the assay can in fact be performed reliably on stored specimens. The ‘prospective’ aspect of the design refers to the care taken at the outset of the trial to ensure the following:
● The patients in the trial are representative of the target patient population expected to benefit from the omics test.
● There is a pre-specified statistical analysis plan.
● Sufficient specimens are available from cases that are representative of the trial cohort and intended-use population to fulfill the sample size requirements of the pre-specified statistical plan, and those specimens have been collected and processed under conditions consistent with the intended-use setting.
If a new prospective clinical trial will be required, evaluate which aspects of the proposed predictor have undergone sufficiently rigorous validation to allow treatment decisions to be influenced by predictor results; where treatment assignments are randomized, provide justification for equipoise.
A variety of designs have been proposed for phase III clinical trials incorporating biomarkers [86–88]. There are three basic phase III design options that are frequently considered for assessing the ability of a biomarker to identify a subgroup of patients who will benefit from a new therapy. These are the enrichment design, the stratified design, and the strategy design. In the enrichment design, only patients who are positive for the biomarker are randomized to the standard or new therapy. This approach can answer the question of whether biomarker-positive patients benefit from the new therapy, but it cannot be used to empirically assess whether biomarker-negative patients might benefit as well. The stratified design randomizes all patients but conducts the randomization separately with each of the biomarker-positive and -negative groups to ensure balance of the treatment arms within each group. This approach provides maximum information about the ability of the biomarker to identify patients who will benefit from the new therapy. A stratified design does not allow the biomarker to influence what treatment a patient receives. This can be an advantage in a situation where there is some uncertainty about the strength of a biomarker’s performance because there were limited specimens available on which to perform preliminary validations during the biomarker development process. The strategy design randomizes patients between no use of the biomarker (all patients receive standard therapy on that arm) and a biomarker-based strategy where biomarker-negative patients are directed to standard therapy and biomarker-positive patients are directed to the new therapy. A strategy design in the context of a single biomarker is particularly inefficient because patients who are negative for the biomarker will receive standard therapy regardless of whether they are randomized to use the biomarker. 
Because of this inefficiency, the strategy design is generally not recommended in a simple single-biomarker setting. Each of these designs has its advantages and disadvantages; the optimal choice depends on feasibility and what properties have already been established for the biomarker.
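The dilution that makes the strategy design inefficient can be shown with a simple calculation. The response probabilities and biomarker prevalence below are assumed illustrative numbers, not figures from the article.

```python
# Assumed illustrative inputs: response probabilities by biomarker status.
prev_pos = 0.4            # prevalence of biomarker-positive patients
p_std = 0.20              # response on standard therapy, either status
p_new_pos = 0.40          # biomarker-positive patients benefit from new therapy
p_new_neg = 0.20          # biomarker-negative patients do not

# Enrichment or stratified comparison within biomarker-positives:
delta_positives = p_new_pos - p_std

# Strategy design: the biomarker-based arm mixes positives (new therapy)
# with negatives (standard therapy); the control arm is all standard, so
# biomarker-negative patients are treated identically in both arms.
p_strategy_arm = prev_pos * p_new_pos + (1 - prev_pos) * p_new_neg
delta_strategy = p_strategy_arm - p_std

print(delta_positives, round(delta_strategy, 3))
# The strategy-design treatment effect is diluted by the biomarker-negative
# patients, so a much larger trial is needed to detect it.
```

Under these assumptions, the effect to be detected shrinks from 0.20 to 0.08, and since required sample size scales roughly with the inverse square of the effect size, the strategy design would need several times as many patients.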
Many of the same principles discussed for phase III trials also apply to phase II trials. Some ways in which phase II designs may differ from phase III designs include alternative or ‘earlier’ endpoints (for example, disease progression or tumor response) and the possibility of non-randomized (for example, single-arm) trials. Just as for drug trials, phase II designs incorporating biomarkers are generally not definitive and serve mostly as a screen to determine whether there is sufficient promise to proceed to a phase III trial. A recently proposed randomized biomarker-based phase II trial design has as its primary aim the generation of sufficient data to inform the decision about the best design for a subsequent phase III trial.
Lastly, the same basic design considerations for trials incorporating single biomarkers apply to omics tests, even though it can be more difficult to properly evaluate the body of evidence for an omics test to determine its readiness for use in a clinical trial. The difficulties lie in the complexity of some predictors and the generally incomplete reporting of methods and results for such studies. By considering the criteria presented here, it is hoped that the body of evidence will be more systematically and thoroughly reviewed before omics tests are applied in clinical trials where they will be used to guide treatment decisions.
To prepare for a prospective phase II or phase III trial that will use an omics test, a thorough review should be conducted of all retrospective validation studies of the test to assess the evidence for both its prognostic value and its predictive ability. This review should be undertaken before it is proposed that a prospective clinical trial be conducted to definitively evaluate the clinical benefit of the test (clinical utility). ‘Prognostic’ refers to the ability of a test to predict clinical outcome in the absence of therapy (that is, natural history) or in the presence of a standard therapy that all patients are likely to receive. ‘Predictive’ (also called treatment effect modifier, treatment selection, or treatment guiding) refers to the ability of a test to predict benefit or lack of benefit (potentially even harm) from a particular therapy relative to other available therapies. Most developmental studies provide evidence of only some prognostic value of a predictor but do not provide convincing evidence of its predictive value, which is best assessed in the context of a randomized trial. Even a prospective-retrospective study might not be an option to establish predictive utility if there are no specimens available from a trial with the relevant and well-controlled treatment randomization.
In some situations, when the prognostic value of a test has been established as sufficiently robust in retrospective studies, the test can be used in a prospective clinical trial to limit the group of patients who should be randomized. An example of such a situation is provided by the TAILORx (Trial Assigning IndividuaLized Options for Treatment; NCT00310180) trial in patients with node-negative, hormone receptor-positive, HER2-negative breast cancer [91, 92]. That adjuvant trial tested more than 10,000 tumors for the 21-gene recurrence score, assigning patients with low-risk scores to adjuvant endocrine treatment and those with high-risk scores to standard-of-care adjuvant chemotherapy or adjuvant chemotherapy treatment trials, in addition to adjuvant endocrine treatment. Patients with intermediate-risk scores were randomized to receive endocrine therapy with or without chemotherapy as adjuvant treatment. It was thought to be firmly established that the risk of recurrence for patients with a value of the 21-gene recurrence score in the low-risk range was so small that those patients had very little potential to benefit from the addition of chemotherapy to hormonal therapy. This conclusion was supported by high-quality evidence from a prospective-retrospective study conducted with banked specimens from the tamoxifen arm of a large clinical trial, and additional confirmation was provided in a subsequent case-control study. Information about the benefit of chemotherapy for patients with risk scores in the intermediate range was considered to be inconclusive, and the absolute risk of recurrence for those intermediate-risk patients was still fairly favorable; thus it was believed that there was sufficient equipoise about the benefit of chemotherapy in the intermediate-risk group to randomize those patients.
Develop a clinical trial protocol that contains clearly stated objectives and methods and an analysis plan that includes justification of sample size; lock down and fully document all aspects of the omics test and establish analytical validation of the predictor.
Establish a secure clinical database so that links among clinical data, omics data, and predictor results remain appropriately blinded, under the control of the study statistician.
Include in the protocol the names of the primary individuals who are responsible for each aspect of the study.
Successful omics-based clinical research requires an interdisciplinary team of experts, generally including laboratory scientists, clinicians, pathologists, statisticians, bioinformaticians, database developers, computational scientists, data managers, and regulatory experts. The team assembled for a standard therapy trial may not cover all of these areas of expertise. It is important that the need for these varied types of expertise is fully recognized and that involvement of the essential individuals is documented by naming the specific responsible individuals in the protocol document.
Ethical, legal, and regulatory issues
Establish communication with the individuals, offices, and agencies that will oversee the ethical, legal, and regulatory issues that are relevant to the conduct of the trial.
Ensure that the informed consent documents to be signed by study participants accurately describe the risks and potential benefits associated with use of the omics test and include provisions for banking of specimens, particularly to allow for ‘bridging studies’ to validate new or improved assays.
Address any intellectual property issues regarding the use of the specimens, biomarkers, assays, and computer software used for calculation of the predictor.
Ensure that the omics test is performed in a Clinical Laboratory Improvement Amendments-certified laboratory if the results will be used to determine treatment or will be reported to the patient or the patient’s physician at any time, even after the trial has ended or the patient is no longer participating in the study.
Ensure that appropriate regulatory approvals have been obtained for investigational use of the omics test. If a prospective trial is planned in which the test will guide treatment, consider a pre-submission consultation with the US Food and Drug Administration.
Federal regulations for investigational and clinical use of in vitro diagnostics also apply to omics tests. The investigator should contact the FDA early in the planning stages of a trial that will use an omics test to ascertain whether an Investigational Device Exemption (IDE) must be filed. IDE regulations apply to any device, as defined by the FDA, that poses significant risk. An IDE is designed to allow the collection of safety and effectiveness data to support further development toward marketing the device and may be required before an omics predictor can be used in a clinical trial. For predictive tests, an IDE for the predictor may be requested as part of the Investigational New Drug application for the companion drug. A device review considers the omics assay as well as other aspects of use of the test, including the procedures required to obtain the specimen, the mathematical predictor model, and the format of the results report. It is advisable to obtain a Pre-Sub (formerly known as a pre-IDE) consultation with the Office of In Vitro Diagnostic Device Evaluation and Safety in the Center for Devices and Radiological Health of the FDA. A Pre-Sub is a free, non-binding consultation with FDA personnel that can help determine the regulatory mechanism, if any, that best suits the development plan [103, 104].
Regulatory classifications are determined by intended use and potential risk. Pertinent risks associated with use of a device could relate to the specimen collection procedure (for example, the risk of a biopsy), the performance of the test, or the effects of therapy indicated by test results, but they could also be social or psychological, depending on the intended use of the test. The FDA uses several classifications that may apply to an omics predictor. An In Vitro Diagnostic Multivariate Index Assay refers to any assay that uses multiple variables to yield a patient-specific result whose derivation is not easily verifiable by the end user. The FDA recently introduced a Biomarker Qualification mechanism to streamline the scientific development of biomarkers and their use in the drug development process. The qualification of a biomarker is independent of the specific assay used to measure it, although at least one reliable assay must be available. Once qualified, a biomarker can be used to develop other drugs or assays without the need to reestablish the validity of using that biomarker in the same context of use. However, qualification of a biomarker does not eliminate the need to satisfy other regulatory requirements for use of the biomarker test in patient care, such as an IDE for investigational use or clearance or approval for marketing. Early consultation with the FDA about which mechanism is most appropriate can help streamline the regulatory approval process.
Evaluation of the readiness of an omics test to be used in clinical care or in a trial where it will guide patient therapy requires careful consideration of the body of evidence supporting the test’s potential clinical utility and safety, as well as an understanding of ethical, legal, and regulatory issues. Considerations include those relating to specimens, assays, the appropriateness of the statistical methods used to develop and validate the omics test, the principles of clinical study design, and regulatory, ethical, and legal issues. It is hoped that the 30-point checklist presented here will help investigators to evaluate more reliably the quality of evidence in support of omics tests, to understand what information is important to document about data provenance and the test development process, and to plan appropriately for the use of omics predictors in clinical trials or clinical care. It should also guide them toward the use of best practices in omics test development. The ultimate goal is to develop a more efficient and reliable process to move omics assays from promising research results to clinically useful tests that improve patient care and outcomes.
CLIA: Clinical Laboratory Improvement Amendments
CTEP: Cancer Therapy Evaluation Program
FDA: Food and Drug Administration
IDE: Investigational Device Exemption
IOM: Institute of Medicine
RIN: RNA Integrity Number
SOP: standard operating procedure
TTO: Technology Transfer Office.
- Committee on the Review of Omics-Based Tests for Predicting Patient Outcomes in Clinical Trials; Board on Health Care Services; Board on Health Sciences Policy; Institute of Medicine: Evolution of Translational Omics: Lessons Learned and the Path Forward. Edited by: Micheel CM, Nass S, Omenn GS. 2012, Washington, DC: The National Academies Press, http://www.iom.edu/Reports/2012/Evolution-of-Translational-Omics.aspx.
- Poste G, Carbone DP, Parkinson DR, Verweij J, Hewitt SM, Jessup JM: Leveling the playing field: bringing development of biomarkers and molecular diagnostics up to the standards for drug development. Clin Cancer Res. 2012, 18: 1515-1523. 10.1158/1078-0432.CCR-11-2206.
- McShane LM, Cavenagh MM, Lively T, Eberhard DA, Bigbee WL, Williams MP, Mesirov JP, Polley MY, Kim KY, Tricoli JV, et al: Criteria for the use of omics-based predictors in clinical trials. Nature. 2013, 502: 317-320. 10.1038/nature12564.
- Apweiler R, Aslanidis C, Deufel T, Gerstner A, Hansen J, Hochstrasser D, Kellner R, Kubicek M, Lottspeich F, Maser E, Mewes HW, Meyer HE, Müllner S, Mutter W, Neumaier M, Nollau P, Nothwang HG, Ponten F, Radbruch A, Reinert K, Rothe G, Stockinger H, Tarnok A, Taussig MJ, Thiel A, Thiery J, Ueffing M, Valet G, Vandekerckhove J, Verhuven W, et al: Approaching clinical proteomics: current state and future fields of application in fluid proteomics. Clin Chem Lab Med. 2009, 47: 724-744.
- Espina V, Mueller C, Edmiston K, Sciro M, Petricoin EF, Liotta LA: Tissue is alive: new technologies are needed to address the problems of protein biomarker pre-analytical variability. Proteom Clin Appl. 2009, 3: 874-882. 10.1002/prca.200800001.
- Moore HM, Kelly AB, Jewell SD, McShane LM, Clark DP, Greenspan R, Hayes DF, Hainaut P, Kim P, Mansfield EA, Potapova O, Riegman P, Rubinstein Y, Seijo E, Somiari S, Watson P, Weier HU, Zhu C, Vaught J: Biospecimen reporting for improved study quality (BRISQ). Cancer Cytopathol. 2011, 119: 92-101. 10.1002/cncy.20147.
- Office of Biorepositories and Biospecimen Research: Revised NCI Best Practices. 2011, http://biospecimens.cancer.gov/practices/2011bp.asp.
- Office of Biorepositories and Biospecimen Research: Biospecimen Research Database. https://brd.nci.nih.gov/BRN/brnHome.seam.
- Srinivasan M, Sedmak D, Jewell S: Effect of fixatives and tissue processing on the content and integrity of nucleic acids. Am J Pathol. 2002, 161: 1961-1971. 10.1016/S0002-9440(10)64472-0.
- Thorpe JD, Duan XB, Forrest R, Lowe K, Brown L, Segal E, Nelson B, Anderson GL, McIntosh M, Urban N: Effects of blood collection conditions on ovarian cancer serum markers. PLoS One. 2007, 2: e1281. 10.1371/journal.pone.0001281.
- Strand C, Enell J, Hedenfalk I, Ferno M: RNA quality in frozen breast cancer samples and the influence on gene expression analysis–a comparison of three evaluation methods using microcapillary electrophoresis traces. BMC Mol Biol. 2007, 8: 38. 10.1186/1471-2199-8-38.
- Kurban G, Gallie BL, Leveridge M, Evans A, Rushlow D, Matevski D, Gupta R, Finelli A, Jewett MA: Needle core biopsies provide ample material for genomic and proteomic studies of kidney cancer: observations on DNA, RNA, protein extractions and VHL mutation detection. Pathol Res Pract. 2012, 208: 22-31. 10.1016/j.prp.2011.11.001.
- Rudnick PA, Clauser KR, Kilpatrick LE, Tchekhovskoi DV, Neta P, Blonder N, Billheimer DD, Blackman RK, Bunk DM, Cardasis HL, Ham AJ, Jaffe JD, Kinsinger CR, Mesri M, Neubert TA, Schilling B, Tabb DL, Tegeler TJ, Vega-Montoto L, Variyath AM, Wang M, Wang P, Whiteaker JR, Zimmerman LJ, Carr SA, Fisher SJ, Gibson BW, Paulovich AG, Regnier FE, Rodriguez H, et al: Performance metrics for liquid chromatography-tandem mass spectrometry systems in proteomics analyses. Mol Cell Proteomics. 2010, 9: 225-241. 10.1074/mcp.M900223-MCP200.
- Beasley-Green A, Bunk D, Rudnick P, Kilpatrick L, Phinney K: A proteomics performance standard to support measurement quality in proteomics. Proteomics. 2012, 12: 923-931. 10.1002/pmic.201100522.
- External RNA Controls Consortium: Proposed methods for testing and selecting the ERCC external RNA controls. BMC Genomics. 2005, 6: 150.
- Jiang LC, Schlesinger F, Davis CA, Zhang Y, Li RH, Salit M, Gingeras TR, Oliver B: Synthetic spike-in standards for RNA-seq experiments. Genome Res. 2011, 21: 1543-1551. 10.1101/gr.121095.111.
- Tabb DL, Vega-Montoto L, Rudnick PA, Variyath AM, Ham AJL, Bunk DM, Kilpatrick LE, Billheimer DD, Blackman RK, Cardasis HL, Carr SA, Clauser KR, Jaffe JD, Kowalski KA, Neubert TA, Regnier FE, Schilling B, Tegeler TJ, Wang M, Wang P, Whiteaker JR, Zimmerman LJ, Fisher SJ, Gibson BW, Kinsinger CR, Mesri M, Rodriguez H, Stein SE, Tempst P, Paulovich AG, Liebler DC, Spiegelman C, et al: Repeatability and reproducibility in proteomic identifications by liquid chromatography-tandem mass spectrometry. J Proteome Res. 2010, 9: 761-776. 10.1021/pr9006365.
- Zweig MH: Assessment of the Clinical Accuracy of Laboratory Tests Using Receiver Operating Characteristics (ROC) Plots. 1995, Wayne, PA: Clinical and Laboratory Standards Institute.
- Dimeski G: Interference Testing in Clinical Chemistry. 2005, Wayne, PA: Clinical and Laboratory Standards Institute, 2.
- Hackett JL, Archer KJ, Gaigalas AK, Garrett CT, Joseph LJ, Koch WH, Kricka LJ, McGlennen RC, van Deerlin V, Vasquez GB: Diagnostic Nucleic Acid Microarrays; Approved Guideline. 2006, Wayne, PA: Clinical and Laboratory Standards Institute.
- Krouwer JS, Cembrowski GS, Tholen DW: Preliminary Evaluation of Quantitative Clinical Laboratory Measurement Procedures. 2006, Wayne, PA: Clinical and Laboratory Standards Institute, 3.
- Wilson JA, Zoccoli MA, Jacobson JW, Kalman L, Krunic N, Matthijs G, Pratt VM, Schoonmaker MM, Tezak Z: Verification and Validation of Multiplex Nucleic Acid Assays. 2008, Wayne, PA: Clinical and Laboratory Standards Institute.
- Clark LW: User Protocol for Evaluation of Qualitative Test Performance. 2008, Wayne, PA: Clinical and Laboratory Standards Institute, 2.
- Pierson-Perry JF, Vaks JE, Durham AP, Fischer C, Gutenbrunner C, Hillyard D, Kondratovich MV, Ladwig P, Middleberg RA: Evaluation of Detection Capability for Clinical Laboratory Measurement Procedures. 2012, Wayne, PA: Clinical and Laboratory Standards Institute, 2.
- National Cancer Institute: Performance standards reporting requirements for essential assays in clinical trials. http://cdp.cancer.gov/scientificPrograms/pacct/assay_standards.htm.
- National Cancer Institute: Templates for clinical assay development. http://www.cancerdiagnosis.nci.nih.gov/diagnostics/templates.htm.
- Sun F, Bruening W, Uhl S, Ballard R, Tipton K, Schoelles K: Quality, Regulation and Clinical Utility of Laboratory-Developed Molecular Tests. 2010, Rockville, MD: ECRI Institute, Evidence-Based Practice Center.
- Dobbin KK, Beer DG, Meyerson M, Yeatman TJ, Gerald WL, Jacobson JW, Conley B, Buetow KH, Heiskanen M, Simon RM, Minna JD, Girard L, Misek DE, Taylor JM, Hanash S, Naoki K, Hayes DN, Ladd-Acosta C, Enkemann SA, Viale A, Giordano TJ: Interlaboratory comparability study of cancer gene expression analysis using oligonucleotide microarrays. Clin Cancer Res. 2005, 11: 565-572.
- Perkel JM: Six things you won’t find in the MAQC. Scientist. 2006, 20: 68-69.
- Shi L, Reid LH, Jones WD, Shippy R, Warrington JA, Baker SC, Collins PJ, de Longueville F, Kawasaki ES, Lee KY, Luo Y, Sun YA, Willey JC, Setterquist RA, Fischer GM, Tong W, Dragan YP, Dix DJ, Frueh FW, Goodsaid FM, Herman D, Jensen RV, Johnson CD, Lobenhofer EK, Puri RK, Schrf U, Thierry-Mieg J, Wang C, Wilson M, MAQC Consortium, et al: The MicroArray Quality Control (MAQC) project shows inter- and intraplatform reproducibility of gene expression measurements. Nat Biotechnol. 2006, 24: 1151-1161. 10.1038/nbt1239.
- Marioni JC, Mason CE, Mane SM, Stephens M, Gilad Y: RNA-seq: an assessment of technical reproducibility and comparison with gene expression arrays. Genome Res. 2008, 18: 1509-1517. 10.1101/gr.079558.108.
- Bantscheff M, Schirle M, Sweetman G, Rick J, Kuster B: Quantitative mass spectrometry in proteomics: a critical review. Anal Bioanal Chem. 2007, 389: 1017-1031. 10.1007/s00216-007-1486-6.
- Brettschneider J, Collin F, Bolstad BM, Speed TP: Quality assessment for short oligonucleotide microarray data: rejoinder. Technometrics. 2008, 50: 279-283. 10.1198/004017008000000389.
- Brettschneider J, Collin F, Bolstad BM, Speed TP: Quality assessment for short oligonucleotide microarray data. Technometrics. 2008, 50: 241-264. 10.1198/004017008000000334.
- Kinsinger CR, Apffel J, Baker M, Bian X, Borchers CH, Bradshaw R, Brusniak MY, Chan DW, Deutsch EW, Domon B, Gorman J, Grimm R, Hancock W, Hermjakob H, Horn D, Hunter C, Kolar P, Kraus HJ, Langen H, Linding R, Moritz RL, Omenn GS, Orlando R, Pandey A, Ping P, Rahbar A, Rivers R, Seymour SL, Simpson RJ, Slotta D, et al: Recommendations for mass spectrometry data quality metrics for open access data (corollary to the Amsterdam Principles). J Proteome Res. 2011, 11: 1412-1419.
- Leek JT, Scharpf RB, Bravo HC, Simcha D, Langmead B, Johnson WE, Geman D, Baggerly K, Irizarry RA: Tackling the widespread and critical impact of batch effects in high-throughput data. Nat Rev Genet. 2010, 11: 733-739. 10.1038/nrg2825.
- Cairns DA: Statistical issues in quality control of proteomic analyses: good experimental design and planning. Proteomics. 2011, 11: 1037-1048. 10.1002/pmic.201000579.
- Harrell FE: Regression Modeling Strategies: With Applications to Linear Models, Logistic Regression, and Survival Analysis. 2001, New York: Springer.
- Hastie T, Tibshirani R, Friedman J: The Elements of Statistical Learning: Data-mining, Inference, and Prediction. 2009, New York: Springer, 2.
- Simon R, Radmacher MD, Dobbin K, McShane LM: Pitfalls in the use of DNA microarray data for diagnostic and prognostic classification. J Natl Cancer Inst. 2003, 95: 14-18. 10.1093/jnci/95.1.14.
- Dudoit S, Fridlyand J, Speed TP: Comparison of discrimination methods for the classification of tumors using gene expression data. J Am Stat Assoc. 2002, 97: 77-87. 10.1198/016214502753479248.
- Shi L, Campbell G, Jones WD, Campagne F, Wen Z, Walker SJ, Su Z, Chu TM, Goodsaid FM, Pusztai L, Shaughnessy JD, Oberthuer A, Thomas RS, Paules RS, Fielden M, Barlogie B, Chen W, Du P, Fischer M, Furlanello C, Gallas BD, Ge X, Megherbi DB, Symmans WF, Wang MD, Zhang J, Bitter H, Brors B, Bushel PR, Bylesjo M, et al: The MicroArray Quality Control (MAQC)-II study of common practices for the development and validation of microarray-based predictive models. Nat Biotechnol. 2010, 28: 827-838. 10.1038/nbt.1665.
- Fan J, Fan Y: High-dimensional classification using features annealed independence rules. Ann Stat. 2008, 36: 2605-2637. 10.1214/07-AOS504.
- Dupuy A, Simon RM: Critical review of published microarray studies for cancer outcome and guidelines on statistical analysis and reporting. J Natl Cancer Inst. 2007, 99: 147-157. 10.1093/jnci/djk018.
- Subramanian J, Simon R: Gene expression-based prognostic signatures in lung cancer: ready for clinical use?. J Natl Cancer Inst. 2010, 102: 464-474. 10.1093/jnci/djq025.
- Buchen L: Missing the mark. Nature. 2011, 471: 428-432. 10.1038/471428a.
- Ioannidis JPA, Khoury MJ: Improving validation practices in ‘omics’ research. Science. 2011, 334: 1230-1232. 10.1126/science.1211811.
- Dobbin KK, Simon RM: Optimally splitting cases for training and testing high dimensional classifiers. BMC Med Genomics. 2011, 4: 31. 10.1186/1755-8794-4-31.
- Molinaro AM, Simon R, Pfeiffer RM: Prediction error estimation: a comparison of resampling methods. Bioinformatics. 2005, 21: 3301-3307. 10.1093/bioinformatics/bti499.
- McIntosh M, Anderson G, Drescher C, Hanash S, Urban N, Brown P, Gambhir SS, Coukos G, Laird PW, Nelson B, Palmer C: Ovarian cancer early detection claims are biased. Clin Cancer Res. 2008, 14: 7574; author reply 7577-7579.
- Ambroise C, McLachlan GJ: Selection bias in gene extraction on the basis of microarray gene-expression data. Proc Natl Acad Sci U S A. 2002, 99: 6562-6566. 10.1073/pnas.102102699.
- Mesirov JP: Accessible reproducible research. Science. 2010, 327: 415-416. 10.1126/science.1179653.
- Broad Institute of MIT and Harvard: GenePattern. http://www.broadinstitute.org/cancer/software/genepattern.
- Ludwig-Maximilians-Universität München: What Is Sweave? http://www.statistik.lmu.de/~leisch/Sweave.
- Xie Y: knitr: elegant, flexible and fast dynamic report generation with R. http://yihui.name/knitr.
- Using R Markdown with RStudio. http://www.rstudio.org/docs/authoring/using_markdown.
- Git: fast version control. http://git-scm.com.
- Gelfand AE, Smith AFM: Sampling-based approaches to calculating marginal densities. J Am Stat Assoc. 1990, 85: 398-409. 10.1080/01621459.1990.10476213.
- Breiman L: Random forests. Mach Learn. 2001, 45: 5-32. 10.1023/A:1010933404324.
- Ishwaran H, Kogalur UB, Blackstone EH, Lauer MS: Random survival forests. Ann Appl Stat. 2008, 2: 841-860. 10.1214/08-AOAS169.
- Hans C, Dobra A, West M: Shotgun stochastic search for ‘large p’ regression. J Am Stat Assoc. 2007, 102: 507-516. 10.1198/016214507000000121.
- Irizarry RA, Bolstad BM, Collin F, Cope LM, Hobbs B, Speed TP: Summaries of Affymetrix GeneChip probe level data. Nucleic Acids Res. 2003, 31: e15. 10.1093/nar/gng015.
- Irizarry RA, Hobbs B, Collin F, Beazer-Barclay YD, Antonellis KJ, Scherf U, Speed TP: Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostatistics. 2003, 4: 249-264. 10.1093/biostatistics/4.2.249.
- Katz S, Irizarry RA, Lin X, Tripputi M, Porter MW: A summarization approach for Affymetrix GeneChip data using a reference training set from a large, biologically diverse database. BMC Bioinformatics. 2006, 7: 464. 10.1186/1471-2105-7-464.
- McCall MN, Bolstad BM, Irizarry RA: Frozen robust multiarray analysis (fRMA). Biostatistics. 2010, 11: 242-253. 10.1093/biostatistics/kxp059.
- Owzar K, Barry WT, Jung SH, Sohn I, George SL: Statistical challenges in pre-processing in microarray experiments in cancer. Clin Cancer Res. 2008, 14: 5959-5966. 10.1158/1078-0432.CCR-07-4532.
- Perou CM, Sørlie T, Eisen MB, van de Rijn M, Jeffrey SS, Rees CA, Pollack JR, Ross DT, Johnsen H, Akslen LA, Fluge O, Pergamenschikov A, Williams C, Zhu SX, Lønning PE, Børresen-Dale AL, Brown PO, Botstein D: Molecular portraits of human breast tumours. Nature. 2000, 406: 747-752. 10.1038/35021093.
- Sørlie T, Eisen MB, van de Rijn M, Jeffrey SS, Thorsen T, Quist H, Matese JC, Brown PO, Botstein D, Lønning PE, Børresen-Dale AL: Gene expression patterns of breast carcinomas distinguish tumor subclasses with clinical implications. Proc Natl Acad Sci U S A. 2001, 98: 10869-10874. 10.1073/pnas.191367098.
- Alizadeh AA, Eisen MB, Davis RE, Ma C, Lossos IS, Rosenwald A, Boldrick JC, Sabet H, Tran T, Yu X, Powell JI, Yang L, Marti GE, Moore T, Hudson J, Lu L, Lewis DB, Tibshirani R, Sherlock G, Chan WC, Greiner TC, Weisenburger DD, Armitage JO, Warnke R, Levy R, Wilson W, Grever MR, Byrd JC, Botstein D, Brown PO, et al: Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature. 2000, 403: 503-511. 10.1038/35000501.
- Lusa L, McShane LM, Reid JF, De Cecco L, Ambrogi F, Biganzoli E, Gariboldi M, Pierotti MA: Challenges in projecting clustering results across gene expression-profiling datasets. J Natl Cancer Inst. 2007, 99: 1715-1723. 10.1093/jnci/djm216.
- Parker JS, Mullins M, Cheang MCU, Leung S, Voduc D, Vickery T, Davies S, Fauron C, He X, Hu Z, Quackenbush JF, Stijleman IJ, Palazzo J, Marron JS, Nobel AB, Mardis E, Nielsen TO, Ellis MJ, Perou CM, Bernard PS: Supervised risk predictor of breast cancer based on intrinsic subtypes. J Clin Oncol. 2009, 27: 1160-1167. 10.1200/JCO.2008.18.1370.
- Moons KGM, Kengne AP, Grobbee DE, Royston P, Vergouwe Y, Altman DG, Woodward M: Risk prediction models: II. External validation, model updating, and impact assessment. Heart. 2012, 98: 691-698. 10.1136/heartjnl-2011-301247.
- Moons KGM, Kengne AP, Woodward M, Royston P, Vergouwe Y, Altman DG, Grobbee DE: Risk prediction models: I. Development, internal validation, and assessing the incremental value of a new (bio)marker. Heart. 2012, 98: 683-690. 10.1136/heartjnl-2011-301246.
- Taylor JMG, Ankerst DP, Andridge RR: Validation of biomarker-based risk prediction models. Clin Cancer Res. 2008, 14: 5977-5983. 10.1158/1078-0432.CCR-07-4534.
- Janssens AC, Ioannidis JP, van Duijn CM, Little J, Khoury MJ: Strengthening the reporting of genetic risk prediction studies: The GRIPS Statement. PLoS Med. 2011, 8: e1000420. 10.1371/journal.pmed.1000420.
- Altman DG, McShane LM, Sauerbrei W, Taube SE: Reporting recommendations for tumor marker prognostic studies (REMARK): explanation and elaboration. BMC Med. 2012, 10: 51. 10.1186/1741-7015-10-51.
- McShane LM, Altman DG, Sauerbrei W, Taube SE, Gion M, Clark GM: Statistics subcommittee of the NCI-EORTC working group on cancer diagnostics. Reporting recommendations for tumor marker prognostic studies (REMARK). J Natl Cancer Inst. 2005, 97: 1180-1184. 10.1093/jnci/dji237.
- Bossuyt PM, Reitsma JB, Bruns DE, Gatsonis CA, Glasziou PP, Irwig LM, Lijmer JG, Moher D, Rennie D, de Vet HC: Standards for reporting of diagnostic accuracy. Towards complete and accurate reporting of studies of diagnostic accuracy: the STARD initiative. Ann Intern Med. 2003, 138: 40-44. 10.7326/0003-4819-138-1-200301070-00010.
- Bossuyt PM, Reitsma JB, Bruns DE, Gatsonis CA, Glasziou PP, Irwig LM, Moher D, Rennie D, de Vet HC, Lijmer JG: Standards for reporting of diagnostic accuracy. The STARD statement for reporting studies of diagnostic accuracy: explanation and elaboration. Clin Chem. 2003, 49: 7-18. 10.1373/49.1.7.
- Greene MH, Feng ZD, Gail MH: The importance of test positive predictive value in ovarian cancer screening. Clin Cancer Res. 2008, 14: 7574-7575.
- Pepe MS, Janes H, Longton G, Leisenring W, Newcomb P: Limitations of the odds ratio in gauging the performance of a diagnostic, prognostic, or screening marker. Am J Epidemiol. 2004, 159: 882-890. 10.1093/aje/kwh101.
- US Food and Drug Administration: Guidance for Industry: Computerized Systems Used in Clinical Investigations. 2007, Rockville, MD: US Department of Health and Human Services.
- Schilsky RL, Doroshow JH, LeBlanc M, Conley BA: Development and use of integral assays in clinical trials. Clin Cancer Res. 2012, 18: 1540-1546. 10.1158/1078-0432.CCR-11-2202.
- McShane LM, Hayes DF: Publication of tumor marker research results: The necessity for complete and transparent reporting. J Clin Oncol. 2012, 30: 4223-4232. 10.1200/JCO.2012.42.6858.
- Simon RM, Paik S, Hayes DF: Use of archived specimens in evaluation of prognostic and predictive biomarkers. J Natl Cancer Inst. 2009, 101: 1446-1452. 10.1093/jnci/djp335.
- Sargent D, Conley BA, Allegra C, Collette L: Clinical trial designs for predictive marker validation in cancer treatment trials. J Clin Oncol. 2005, 23: 2020-2027. 10.1200/JCO.2005.01.112.
- Freidlin B, McShane LM, Korn EL: Randomized clinical trials with biomarkers: design issues. J Natl Cancer Inst. 2010, 102: 152-160. 10.1093/jnci/djp477.
- Clark GM, McShane LM: Biostatistical considerations in development of biomarker-based tests to guide treatment decisions. Stat Biopharm Res. 2011, 3: 549-560. 10.1198/sbr.2011.09038.
- McShane LM, Hunsberger S, Adjei AA: Effective incorporation of biomarkers into phase II trials. Clin Cancer Res. 2009, 15: 1898-1905. 10.1158/1078-0432.CCR-08-2033.
- Freidlin B, McShane LM, Polley M-YC, Korn EL: Randomized phase II designs with biomarkers. J Clin Oncol. 2012, 30: 3304-3309. 10.1200/JCO.2012.43.3946.
- Sparano JA: TAILORx: Trial Assigning Individualized Options for Treatment (Rx). Clin Breast Cancer. 2006, 7: 347-350. 10.3816/CBC.2006.n.051.
- Zujewski JA, Kamin L: Trial assessing individualized options for treatment for breast cancer: the TAILORx trial. Future Oncol. 2008, 4: 603-610. 10.2217/147966188.8.131.523.
- Paik S, Shak S, Tang G, Kim C, Baker J, Cronin M, Baehner FL, Walker MG, Watson D, Park T, Hiller W, Fisher ER, Wickerham DL, Bryant J, Wolmark N: A multigene assay to predict recurrence of tamoxifen-treated, node-negative breast cancer. N Engl J Med. 2004, 351: 2817-2826. 10.1056/NEJMoa041588.
- Habel LA, Shak S, Jacobs MK, Capra A, Alexander C, Pho M, Baker J, Walker M, Watson D, Hackett J, Blick NT, Greenberg D, Fehrenbacher L, Langholz B, Quesenberry CP: A population-based study of tumor gene expression and risk of breast cancer death among lymph node-negative patients. Breast Cancer Res. 2006, 8: R25. 10.1186/bcr1412.
- International Conference on Harmonisation of Technical Requirements for Registration of Pharmaceuticals for Human Use: ICH homepage. http://www.ich.org.
- US Department of Health and Human Services: SACHRP Letter to the Secretary: FAQs, terms and recommendations on informed consent and research use of biospecimens. http://www.hhs.gov/ohrp/sachrp/20110124attachmentatosecletter.html.
- Cancer Therapy Evaluation Program: Investigator resources. http://ctep.cancer.gov/investigatorResources/biomarker_resources.htm.
- Centers for Medicare & Medicaid Service: Clinical Laboratory Improvement Amendments (CLIA). http://www.cms.gov/Regulations-and-Guidance/Legislation/CLIA/index.html?redirect=/CLIA.
- US Food and Drug Administration: Device advice: comprehensive regulatory assistance. http://www.fda.gov/MedicalDevices/DeviceRegulationandGuidance/default.htm.
- US Food and Drug Administration: Investigational New Drug (IND) Application. http://www.fda.gov/Drugs/DevelopmentApprovalProcess/HowDrugsareDevelopedandApproved/ApprovalApplications/InvestigationalNewDrugINDApplication/default.htm.
- Center for Devices and Radiological Health: In vitro diagnostic multivariate index assays. 2007, Rockville, MD: US Food and Drug Administration.
- US Food and Drug Administration: CFR 21. Chapter I, Subchapter H, Part 812: Investigational Device Exemptions. 2012, Rockville, MD: US Department of Health and Human Services.
- US Food and Drug Administration: Is the product a medical device?. http://www.fda.gov/medicaldevices/deviceregulationandguidance/overview/classifyyourdevice/ucm051512.htm.
- US Food and Drug Administration: Draft guidance for industry and FDA staff: medical devices: the Pre-Submission Program and meetings with FDA staff. 2012, Rockville, MD: US Department of Health and Human Services, http://www.fda.gov/MedicalDevices/DeviceRegulationandGuidance/GuidanceDocuments/ucm310375.htm.
- US Food and Drug Administration: Biomarker Qualification Program. http://www.fda.gov/Drugs/DevelopmentApprovalProcess/DrugDevelopmentToolsQualificationProgram/ucm284076.htm.
- The pre-publication history for this paper can be accessed here: http://www.biomedcentral.com/1741-7015/11/220/prepub