Prediction of falls using a risk assessment tool in the acute care setting

Background The British STRATIFY tool was previously developed to predict falls in hospital. Although the tool has several strengths, certain limitations may restrict its generalizability to a Canadian setting. Thus, we tested the STRATIFY tool with some modification and re-weighting of items in Canadian hospitals. Methods This was a prospective validation cohort study in four acute care medical units of two teaching hospitals in Hamilton, Ontario. In total, 620 patients over the age of 65 years admitted during a 6-month period were screened. Five patient characteristics found to be risk factors for falls in the British STRATIFY study were tested for predictive validity. The characteristics included history of falls, mental impairment, visual impairment, toileting, and dependency in transfers and mobility. Multivariate logistic regression was used to obtain optimal weights for the construction of a risk score. A receiver-operating characteristic curve was generated to show sensitivities and specificities for predicting falls based on different threshold scores for considering patients at high risk. Results Inter-rater reliability for the weighted risk score indicated very good agreement (intraclass correlation coefficient = 0.78). History of falls, mental impairment, toileting difficulties, and dependency in transfer / mobility significantly predicted fallers. In the multivariate model, mental status was a significant predictor (P < 0.001) while history of falls and transfer / mobility difficulties approached significance (P = 0.089 and P = 0.077 respectively). The logistic regression model led to weights for a risk score on a 30-point scale. A risk score of 9 or more gave a sensitivity of 91% and specificity of 60% for predicting who would fall. Conclusion Good predictive validity for identifying fallers was achieved in a Canadian setting using a simple-to-obtain risk score that can easily be incorporated into practice.


Background
Falls account for at least 40% of all accidents in hospital [1]. Risk of hip fracture was found to be 11 times greater in hospital patients compared to those in the community [2]. Patient characteristics implicated in falls that occur in hospitals include history of falls, difficulty in transfers or ambulating, dizziness and balance, delirium, visual impairment, medications, incontinence and toileting frequency [3][4][5].
Clinical prediction rules are tools designed to predict health outcomes and to help health care professionals plan patient care. These tools typically include three or more risk factors from a patient's history or physical exam that predict an outcome such as falls [6]. The STRATIFY tool was developed and validated in the United Kingdom to predict falls occurring in hospital. The tool contains five clinical factors associated with falling (e.g. previous falls, mental impairment) with a simple scoring system. The tool incorporates many of the features that instrument developers desire, including 1) predictive validity: high sensitivity and specificity for predicting falls; 2) feasibility: the items are easily and rapidly assessed by nursing staff with minimal staff training required; and 3) reproducibility: the predictive variables and the decision rule were tested in different geriatric settings [7].
However, the STRATIFY tool has limitations. Altman [8,9] pointed out that falls, rather than patients, were outcomes in the STRATIFY study, which could inflate predictive validity. Specifically, of 217 patients, 71 falls occurred (approximately a 30% rate of falls) [6]. Prediction might also have been weakened by the absence of item weighting, as certain patient characteristics may have greater value in predicting falls. Price [10] emphasized that the use of data on falls in hospital to update risk scores in the STRATIFY study would inflate the calculated predictive validity and decrease useful prediction. In fact, the majority of falls occur in the first week of hospitalization [11,12]. Finally, the use of items that are open to varying interpretations (e.g. agitation) may compromise reproducibility. Ideally, each question in a clinical tool should be interpreted in the same way.
We tested the predictive validity of the STRATIFY variables in a Canadian setting, with the objective of determining if it predicts fallers rather than patient falls. We constructed a weighted risk score based on the components of the STRATIFY tool and examined its predictive validity.

Setting and participants
Data were collected over a 6-month period in 2000 at Hamilton Health Sciences, Hamilton, Ontario, Canada, a multi-site teaching hospital. Patients over 65 years of age admitted consecutively to four general medical units having a total of 114 beds were assessed. Palliative and critical care patients were excluded.

Definition of clinical outcome
The clinical outcome we studied was "fallers" versus "nonfallers". "Fallers" were those having one or more falls during hospitalization. A fall was defined as an individual involuntarily coming to rest on the ground or surface lower than their original station [13]. The attending nurse documented each fall on an incident report and in the patient chart. Nurses completing the risk assessment tool were blinded to the rationale for assessing predictor variables. Incident reports noted time of fall, location, injury sustained, type of fall, and potential causative factors. Only falls occurring after screening were included as an outcome.

Predictive variables: identification and definition
Five patient characteristics found to be risk factors for falls in the British STRATIFY study [7] were assessed. Items were modified for the Canadian health care system to include a definition of each risk factor, with the aim of increasing reproducibility (Table 1 [see Additional file 1]). Mental status was divided into three concepts, "disorientation", "confusion", and "agitation", and definitions were added. Similarly, "vision impairment" was assessed based on four questions and "history of falls" was divided into two questions. Toileting assessment remained one item as in STRATIFY, but the wording was altered and a definition was added. As in the original tool, the "transfer and mobility" measure was taken from the Barthel Index [14] and was the sum of the transfer (0 to 3) and mobility (0 to 3) scores, for a total ranging from 0 (dependent) to 6 (independent).

Screening assessment scoring
The nurses completing the screening tool had a 10-min orientation session run by the clinical nurse specialists who were part of the investigative team. The assigned nurse collected screening variables 24 to 48 hours after the patient was admitted to hospital in a 5-min bedside session.
To calculate predictive validity, each of the five items was scored dichotomously as 1 = present and 0 = absent. For items 1 to 4, the risk factor was deemed to be "present" if one or more of the statements within its domain were scored "yes" (Table 1 [see Additional file 1]). For example, mental impairment was present if one or more of confusion, disorientation or agitation were scored "yes". If none of these variables were present, then the risk factor was scored as "absent". For the fifth item, scores of 0 to 3 on the transfer and mobility sub-score corresponded to the presence of the risk factor and scores of 4 to 6 were considered to represent absence of the risk factor.
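The dichotomization rule described above can be sketched in a few lines of Python. This is an illustration only: the dictionary keys (`history_of_falls`, `transfer`, and so on) are hypothetical labels, not the wording of the study's screening form, and only the two cut-off rules (any "yes" within a domain; a combined transfer and mobility score of 0 to 3) are taken from the text.

```python
def dichotomize(assessment):
    """Convert one bedside assessment into five 0/1 risk-factor flags.

    `assessment` maps each of the first four domains to a list of yes/no
    answers (True = "yes") and holds the Barthel transfer and mobility
    sub-scores (each 0 to 3). Key names are illustrative assumptions.
    """
    flags = {}
    for domain in ("history_of_falls", "mental_impairment",
                   "visual_impairment", "toileting"):
        # A domain is "present" if one or more of its statements is "yes".
        flags[domain] = int(any(assessment[domain]))

    # Transfer + mobility runs from 0 (dependent) to 6 (independent);
    # a combined score of 0 to 3 counts as the risk factor being present.
    combined = assessment["transfer"] + assessment["mobility"]
    if not 0 <= combined <= 6:
        raise ValueError("transfer + mobility must lie between 0 and 6")
    flags["transfer_mobility"] = int(combined <= 3)
    return flags


# Invented example patient, not study data.
example = {
    "history_of_falls": [False, True],                  # two questions
    "mental_impairment": [False, False, False],         # confusion, disorientation, agitation
    "visual_impairment": [False, False, False, False],  # four questions
    "toileting": [False],                               # single item
    "transfer": 2,
    "mobility": 1,
}
flags = dichotomize(example)
```

For this invented patient, history of falls and transfer / mobility are flagged as present and the other three domains as absent.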

Statistical analyses
Each of the five items was assessed individually and collectively for their predictive power based on logistic regression models. The variables were also used to produce an overall risk score using two approaches. First, an unweighted risk score (r) was computed by simply counting the number of risk factors present. This gave a risk score ranging from 0 to 5, as in the original STRATIFY study.
Second, a weighted risk score was obtained based on the regression coefficients from the multivariate logistic regression model in which the outcome was the fall status of the patient. Specifically, the relative magnitude of the beta coefficients from the multivariate logistic model reflects the relative prognostic strength of the risk factors when they are jointly considered. Therefore, the relevant information for the construction of the weights is the relative size of the beta coefficients, and any weights that preserve these ratios will preserve good predictive validity (see Appendix [Additional File 1] for further details).
To consider predictive validity in terms of sensitivity and specificity, one must consider two populations, "fallers" and "non-fallers". Two receiver operating characteristic (ROC) curves were constructed to display the varying specificity and sensitivity values applicable to the range of possible thresholds of the unweighted (0-5) and weighted (0-30) risk scores that could be used to classify patients as high risk for falling or not. All analyses were performed using the Statistical Analysis System (SAS release 8.1).
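To make the ratio-preserving idea concrete, the sketch below rescales a set of logistic regression coefficients so that the weights sum to a chosen maximum score. The coefficient values are hypothetical placeholders, not the study's estimates, and the proportional rescaling is only one way to preserve the ratios; the formula actually used is the one given in the Appendix. The sketch also assumes all coefficients are non-negative.

```python
def scale_weights(betas, max_score=30.0):
    """Rescale beta coefficients so the weights sum to `max_score`.

    Multiplying every coefficient by the same constant keeps the ratios
    between them unchanged, which is the property the text relies on.
    Assumes non-negative coefficients (an assumption of this sketch).
    """
    total = sum(betas.values())
    if total <= 0:
        raise ValueError("sum of coefficients must be positive")
    return {name: max_score * beta / total for name, beta in betas.items()}


def weighted_score(flags, weights):
    """Sum the weights of the risk factors that are present (flag = 1)."""
    return sum(weights[name] * flags[name] for name in weights)


def unweighted_score(flags):
    """Count the risk factors present, giving a score from 0 to 5."""
    return sum(flags.values())


# HYPOTHETICAL coefficients for illustration only; not the study's values.
hypothetical_betas = {
    "history_of_falls": 0.7,
    "mental_impairment": 1.4,
    "visual_impairment": 0.1,
    "toileting": 0.2,
    "transfer_mobility": 0.6,
}
weights = scale_weights(hypothetical_betas)

# Invented patient with mental impairment and transfer / mobility difficulty.
patient = {
    "history_of_falls": 0,
    "mental_impairment": 1,
    "visual_impairment": 0,
    "toileting": 0,
    "transfer_mobility": 1,
}
R = weighted_score(patient, weights)   # weighted risk score
r = unweighted_score(patient)          # unweighted risk score
```

With these placeholder coefficients the two patients "mental impairment only" and "visual impairment only" receive the same unweighted score but very different weighted scores, which is the motivation for weighting.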
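The construction of ROC points from a risk score can be sketched as follows. This is a minimal illustration, assuming that a patient is classified as high risk when the score is at or above the threshold; the scores and outcomes are invented toy data, not study data, and the original analysis was run in SAS rather than Python.

```python
def roc_points(scores, fell, thresholds):
    """Return (threshold, sensitivity, specificity) for each threshold.

    scores: risk score per patient.
    fell:   True if the patient was a faller, False otherwise.
    A patient is called "high risk" when score >= threshold.
    """
    n_fallers = sum(1 for f in fell if f)
    n_nonfallers = len(fell) - n_fallers
    if n_fallers == 0 or n_nonfallers == 0:
        raise ValueError("need at least one faller and one non-faller")

    points = []
    for t in thresholds:
        # True positives: fallers flagged as high risk.
        tp = sum(1 for s, f in zip(scores, fell) if f and s >= t)
        # True negatives: non-fallers not flagged as high risk.
        tn = sum(1 for s, f in zip(scores, fell) if not f and s < t)
        points.append((t, tp / n_fallers, tn / n_nonfallers))
    return points


# Invented toy data: three fallers and five non-fallers.
toy_scores = [12, 20, 25, 2, 5, 9, 3, 10]
toy_fell = [True, True, True, False, False, False, False, False]
points = roc_points(toy_scores, toy_fell, thresholds=[0, 9, 20])
```

Plotting sensitivity against 1 - specificity over all thresholds gives the ROC curve; raising the threshold trades sensitivity for specificity.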

Inter-rater reliability
Two nurses independently assessed a sub-sample of 35 patients. The order of assessments was random and assessors were blind to the findings of the other nurse. The required sample size was estimated at 33 patients, with power = 0.80 and alpha = 0.05, using published power tables [15]. The intraclass correlation coefficient (ICC) was computed for the weighted and unweighted risk scores. The kappa statistic was computed to assess the reliability of classification into "high" versus "normal" risk based on the optimal threshold score.
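For readers unfamiliar with the kappa statistic, the sketch below computes Cohen's kappa for two raters classifying the same patients as "high" (1) or "normal" (0) risk. The ratings are invented for illustration and are not the study's reliability data; we assume the unweighted two-rater form of kappa, which the text does not specify.

```python
def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa: agreement between two raters corrected for chance.

    kappa = (observed agreement - expected agreement) / (1 - expected agreement)
    where expected agreement comes from each rater's marginal proportions.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)

    observed = sum(1 for a, b in zip(rater_a, rater_b) if a == b) / n

    categories = set(rater_a) | set(rater_b)
    expected = sum(
        (rater_a.count(c) / n) * (rater_b.count(c) / n) for c in categories
    )
    if expected == 1.0:
        raise ValueError("kappa is undefined when expected agreement is 1")
    return (observed - expected) / (1.0 - expected)


# Invented ratings for ten patients (1 = high risk, 0 = normal risk).
nurse_1 = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
nurse_2 = [1, 1, 0, 0, 0, 0, 0, 0, 1, 1]
kappa = cohen_kappa(nurse_1, nurse_2)
```

In this toy example the nurses agree on 8 of 10 patients, but because some agreement is expected by chance alone, kappa is noticeably lower than the raw 80% agreement.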

Results
Over 6 months, 620 patients were screened for falls. The mean age was 78 years (SD 7.7) and 338 (54.5%) patients were female. The diagnoses most responsible for hospitalization in the sample were circulatory disorders (45.2%), respiratory disorders (20.8%), digestive disorders (4.0%) and mental disorders (2.9%). Diagnosis was not predictive of falls. Thirty-four patients (5.5%) fell at least once during their hospitalization and there were a total of 77 falls. In total, 171 patients (27.6%) had a history of falls at admission.

Predictive variables
Based on univariate logistic regression (Table 1), history of falls prior to admission (P = 0.011), mental status (confused, disoriented or agitated) (P < 0.001), toileting difficulties (P = 0.005) and transfer / mobility difficulties (P < 0.001) predicted falls. When the inter-correlations between these independent variables were controlled for using multiple logistic regression (Table 1), only mental status (P < 0.001) remained a significant predictor. In terms of the magnitude of the associations from this multivariate model, the odds of falling increased over four-fold in individuals with mental impairment (odds ratio (OR) = 4.06; 95% CI: 1.81, 9.16). History of falls and transfer / mobility difficulties were each associated with an approximate two-fold increase in risk and approached significance (P = 0.089 and P = 0.077 respectively).
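The link between a logistic regression coefficient and the reported odds ratio can be illustrated as follows. The sketch assumes the usual Wald interval on the log-odds scale, exp(beta ± 1.96 × SE); the text does not state which method was used. The standard error is not reported, so it is back-calculated here from the published interval purely for illustration.

```python
import math


def odds_ratio_ci(beta, se, z=1.96):
    """Odds ratio and Wald confidence limits from a logistic coefficient."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def se_from_ci(lower, upper, z=1.96):
    """Back-calculate the standard error of beta from OR confidence limits.

    Assumes the interval is symmetric on the log scale (Wald interval).
    """
    return (math.log(upper) - math.log(lower)) / (2 * z)


# Reported for mental impairment: OR = 4.06, 95% CI 1.81 to 9.16.
beta = math.log(4.06)          # implied coefficient, about 1.40
se = se_from_ci(1.81, 9.16)    # implied standard error (back-calculated)
odds_ratio, ci_low, ci_high = odds_ratio_ci(beta, se)
```

The reconstructed limits land close to the published ones, which is consistent with a Wald interval having been used, although rounding in the published figures prevents an exact match.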
When length of stay was added to the multivariate model, the odds ratios associated with each risk factor were slightly attenuated. However, the pattern of results remained the same and mental status was most predictive, followed by history of falls, and then transfer/mobility difficulties. A weighted risk formula (Appendix I) was derived which shows the relative magnitude of the regression coefficients from the multivariate logistic regression model. This formula can easily be used to calculate the weighted risk score.

ROC curves
The ROC curves based on the unweighted and weighted risk scores are given in Figure 1, and sensitivities and specificities are displayed for a subset of possible threshold values in Table 2. As the threshold on the weighted risk score (R) is increased from 0, the sensitivity falls very little until one reaches a false positive rate of about 0.40, which corresponds to R = 9. Balancing sensitivity and specificity, we selected this as the ideal cut-off score, which results in a sensitivity of 91.2% (95% CI: 81.6, 100.7) and specificity of 60.2% (95% CI: 56.3, 64.2). With a threshold of 9, the tool would correctly classify over 90% of the patients who went on to fall as high risk and over 60% of those who did not fall as normal risk. When the cut-off score increases to 20, specificity increases (78.8%), but sensitivity falls substantially (55.9%).
The ROC curve for the unweighted risk score (r) (Figure 1) had poorer predictive validity. For example, when two or more risk factors are present (r ≥ 2), sensitivity was 91.2%, but specificity was only 49.3%. Increasing the threshold to three or more risk factors (r ≥ 3) resulted in an increase in specificity to 71.3%, but sensitivity dropped to an unacceptable 61.8%.
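The reported confidence limits can be examined with a short calculation. The counts used below (31 of 34 fallers and 353 of 586 non-fallers correctly classified) are not stated in the text; they are inferred from the reported percentages and sample sizes and should be read as assumptions. Under that assumption, a normal-approximation (Wald) interval appears to reproduce the published limits, including the upper limit above 100%. A Wilson score interval is shown alongside as one standard alternative that stays within 0 to 100%.

```python
import math


def wald_interval(successes, n, z=1.96):
    """Normal-approximation (Wald) interval for a proportion.

    Can extend below 0 or above 1 when the proportion is near a boundary.
    """
    p = successes / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion; always lies within [0, 1]."""
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half_width, centre + half_width


# INFERRED counts (assumptions): 31/34 fallers, 353/586 non-fallers.
sens_wald = wald_interval(31, 34)
spec_wald = wald_interval(353, 586)
sens_wilson = wilson_interval(31, 34)
```

With only 34 fallers, the interval around sensitivity is wide under either method, which is worth keeping in mind when comparing threshold values.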

Discussion
Falls in the elderly are often a symptom of acute medical problems in combination with underlying risks such as medications, postural hypotension and lower extremity weakness. Identifying those at risk allows targeted assessment and intervention such as a review of medications and environmental modifications [16]. This study demonstrated good predictive validity for the modified STRATIFY tool to identify individuals at risk of falling in acute care. With a risk score threshold of 9, sensitivity was 91% and specificity was 60%. The falls risk assessment tool can be easily incorporated into practice without added burden to the patient. The findings were achieved with a conservative methodology in which the outcome measure was the patient (i.e. fallers), rather than falls, and the risk score was generated before any falls. Despite minimal nurse training and short completion time, we were able to obtain very good inter-rater reliability (ICC = 0.78). A recent analytic review of falls risk assessment tools found that only two of five tools used in acute care with a sensitivity over 80% described how long the tool took to complete and only one had findings reproduced by other investigators. Many did not report inter-rater reliability [17].
Risk factors included in screening for falls in hospitalized patients have largely been consistent across studies with varying methods. Findings have repeatedly emphasized falls history, mental impairment, toileting frequency, and general mobility as predictive variables for falls [7,18-23]. Nevertheless, it is not yet clear how to maximize prediction. The variables included in different studies do not overlap entirely and some studies incorporate variables with poor or inconsistent predictive validity. For example, visual impairment had poor predictive value in our study and no significant predictive value in Morse's study [24]. Oliver et al. included visual impairment as a variable in STRATIFY based on an initial study phase in which it was moderately predictive (OR = 3.55) and appeared to rank fifth strongest among 10 clinical variables described [7]. However, their design did not include a control for intercorrelations among risk factors. Also relevant to optimizing prediction is the fact that Morse's study [24] and ours are the only ones to include weightings derived from quantitative analysis. Item weighting was clearly important to optimize prediction. Studies have also differed in the suggested ideal risk score cut-offs for placing patients in the "at risk" group. Whether our suggested cut-off of 9 is ideal for different hospital settings is not known. One approach is to use the cut-off that maximizes predictive power mathematically. Practitioners in different settings may adjust the trade-off between sensitivity and specificity, based on differing falls rates, values, laws, funding and other factors.
Our finding of poor predictive validity with the unweighted items does not clearly amount to poor generalization across settings because the items and protocol were changed. Studies involving tests of prediction tools in new locations have found results that are weaker than the original findings [25]. The difficulty obtaining generalized (i.e. reproducible) effects is concerning. One explanation may be that the base rate for occurrence of a clinical outcome is known to affect positive predictive value [24,25]. Our base rate for falls was 5.5%, which is lower than that found in the British study and may have contributed to lower predictability. Another possibility is that prediction may only be consistent among patients with similar characteristics, resulting in generalization across some settings and not others.
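The dependence of positive predictive value on base rate follows directly from Bayes' theorem and can be sketched as below. The sensitivity, specificity and 5.5% base rate are the values reported in this study; the 25% comparison base rate is a hypothetical figure chosen only to show the direction of the effect, not a value taken from any study.

```python
def positive_predictive_value(sensitivity, specificity, base_rate):
    """Probability that a patient flagged as high risk actually falls.

    PPV = (sens * prev) / (sens * prev + (1 - spec) * (1 - prev))
    """
    true_pos = sensitivity * base_rate
    false_pos = (1 - specificity) * (1 - base_rate)
    if true_pos + false_pos == 0:
        raise ValueError("no patients would be flagged as high risk")
    return true_pos / (true_pos + false_pos)


# Reported values at the threshold of 9, with the observed 5.5% base rate.
ppv_observed = positive_predictive_value(0.912, 0.602, 0.055)

# Same test characteristics under a HYPOTHETICAL 25% base rate.
ppv_hypothetical = positive_predictive_value(0.912, 0.602, 0.25)
```

Under the study's own figures, roughly one in eight or nine patients flagged as high risk would go on to fall, whereas the same sensitivity and specificity would yield a much higher positive predictive value in a setting where falls are more common.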
A potential methodological limitation is uncertainty about patient incident reports, which may not capture all falls; however, our documented rate of falls was similar to previous years in our setting. There is also the possibility that completing risk assessments influences how nurses respond to patients in terms of falls prevention strategies (i.e. Hawthorne effect). It is unknown if this factor affected the true rate of falls in our setting. However, this effect is likely to have been minimal given that strong, consistent falls prevention strategies were not in place at the time of the study. Another potential limitation is that, despite some changes to scale items taken from STRATIFY to improve reproducibility, there is still room for error. For example, patient recall of falls may not be accurate and consistent. This may have accounted for an inter-rater reliability that was lower than ideal. The problem highlights a potential need to improve operational definitions of risk variables to ensure reproducibility in measuring items such as mental status. It may help to have consensus among investigators on key issues including: which variables to include, operational definitions of risk constructs, the duration over which risk is assessed (i.e. within 24 or 48 hours), the way in which users should be trained, and what the appropriate outcomes are (e.g. falls versus fallers). Further replication of our study in other settings will help to address these limitations and improve generalizability.