Sample and setting
From the original study sample of 276 consecutive patients (i.e., included in the cohort study in the order that they were admitted), 21 patients had only one HRSD-17 measurement due to a short hospitalization period or refusal to participate, yielding 255 (91.6%) patients included in the current analysis. Thus, we included 255 adult patients consecutively admitted to a tertiary psychiatric hospital in Duffel, Belgium, and fulfilling the MINI-Plus diagnosis, based on the DSM-IV criteria, of a depressive episode as part of an MDD or bipolar disorder (BD). In order to obtain a representative sample of depressed participants, exclusion criteria were minimal. We did not include patients with (comorbid MINI-Plus) psychotic disorders (including schizoaffective disorder) or with a dependency on alcohol or drugs within 12 months prior to hospitalization. Moreover, patients with insufficient mastery of the Dutch language were not included.
Treatment
Inpatients received treatment as usual which was based on evidence-based guidelines and consisted of pharmacotherapy, (group) psychotherapy, or a combination of both. These guidelines for diagnosis and treatment were formulated by the Dutch Association of Psychiatry, often in association with the associations of psychology and general practitioners (www.trimbos.nl, www.nvvp.net). Psychotropic medication at baseline was coded into five dichotomous variables: antidepressants, mood stabilizers, antipsychotics, benzodiazepines, and stimulants. Response was defined as a ≥ 50% reduction of the HRSD-17 compared to the baseline assessment. Remission was defined as scoring ≤ 7 on the HRSD-17.
Measurements
The present study was part of a larger follow-up study investigating the feasibility of ROM in the University Psychiatric Centre in Duffel [38]. ROM were done at baseline and every 2 weeks thereafter during the clinical admission which lasted from 2 weeks to 16 months. ROM consisted of a test battery assessing overall mental well-being, quality of life, and mood (including the Hamilton Depression Rating Scale-17). The data presented in this article represent the collected data from the period April 2015 through February 2018.
The HRSD-17 consists of 17 items on a Likert scale, ranging from either 0 to 4 (for 9 items) or 0 to 2 (for 8 items). The internal reliability of the HRSD-17 is adequate with most studies reporting a Cronbach’s alpha of ≥ 0.70. It has a good retest and interrater reliability (above 0.80) when assessed over an interval ranging from 1 to 30 days [21]. The Omega and Cronbach’s alpha in our sample of 255 patients at baseline were only 0.49 and 0.52, respectively. The Cronbach’s alpha improved over time with a score of 0.74 after 2, 0.77 after 4, and 0.79 after 6 weeks. The total score ranges from 0 to 52, and higher scores indicate greater severity, but in the present study, we focus on the trajectories of the 17 individual items only. In order to improve interrater reliability, Hamilton Depression Rating Scale training sessions were organized every 3 months among the in total 6 assessors, during which video-recorded interviews with patients were rated and discussed to reach consensus. In total, they conducted 1480 HRSD-17 assessments in 255 patients, with an average of 5.8 assessments per patient.
Statistical analysis
DTW is an approximate pattern detection algorithm that can measure the similarity between two time-series. It uses a dynamic (i.e., stretching and compressing) programming approach to minimize a predefined distance measure (e.g., Euclidean distance), in order for the two time-series to become optimally aligned through a warping path. The “optimal” alignment minimizes the sum of distances between the aligned elements. The “dtw” (version 1.20.1), “pheatmap” (version 1.0.12), “parallelDist” (version 0.2.4), and “qgraph” (version 1.6.2) packages for the R statistical software were used (R version 3.6.0; R Foundation for Statistical Computing, Vienna, Austria, 2016. URL: https://www.R-project.org/).
The idiographic approach per patient was followed by a nomothetic approach to study the depression symptom patterns both within individual patients and in the whole sample of 255 patients. The subsequent methodological steps and statistical methods are described below.
Intra-individual approach
We first aimed to cluster individual symptoms based on the temporal features that they share within each individual patient. The clustering of symptom trajectories based on DTW consisted of two steps. First, the DTW distance between each pair of symptom trajectories was calculated. This is illustrated in Fig. 1 with the example of two HRSD-17 item time-series (item 1 “depressed mood” and item 7 “work and interest”) of a single patient. The temporal scoring (per 2 weeks) on the given items is seen in Fig. 1a, with the two items for which the DTW distance is calculated shown in red. This patient had 14 assessments over a period of 26 weeks. The trajectories of items 1 “depressed mood” and 7 “work and interest” over time are plotted in Fig. 1b. The deformations of the time axes between both items are added, which brings the two time-series as close as possible to each other, in which all elements must be matched. Next, the calculation of the shortest path between the two time-series is shown in Fig. 1c. The two time-series were aligned in time with compressions and expansions. The “symmetricP0” step pattern was used as the dynamic time warping algorithm to match the two sequences, resulting in the red “warping path.” A Sakoe-Chiba Band of 2 was used in order for the severity scores to be matched to a maximum of plus or minus two time points (plus or minus a maximum of 4 weeks). Resulting from the DTW method, a distance measure (d) is produced: items with the best alignment, having a more similar slope and other dynamics (i.e., changes that co-vary over time), resulted in the smallest distance. The distance measures of each of the 17 time-series of individual HRSD-17 items are grouped in a distance matrix, comprising (172 − 17)/2 = 136 distances for each individual patient.
Second, this matrix of 136 distances was presented in a heatmap and used in a hierarchical cluster analysis and a symptom network per patient. For the hierarchical cluster analysis, each item is initially assigned to its own cluster, and then the algorithm proceeds iteratively, at each stage joining the two most similar clusters, continuing until there is just a single cluster. We assumed 3 clusters for each patient, for illustrative purposes only, to enable easier recognition of the symptom with the more similar trajectories. We excluded all symptoms with a score of 0 throughout follow-up, as these tended to cluster together most strongly as these symptom pairs will have a distance of 0. At each stage, distances between clusters are recomputed by the Lance-Williams dissimilarity update formula according to the “Ward.D2” clustering methods. With “Ward.D2,” the total within-cluster variance is minimized, and the dissimilarities are squared before cluster updating.
Using the “qgraph” package, the structure of the network based on the distance matrix was visualized per patient, providing another way of graphical presentation of the clusters. We followed the recommendations on network analysis written by the developers of the R package [39]. A network with up to 17 nodes (representing the individual HRSD-17 depression symptoms) is obtained and, connecting them, the edges representing the distances between symptom trajectories. The thickness of the edges indicates the strength of the longitudinal elastic covariation (thicker edges represent a shorter distance between the two symptom trajectories).
Inter-individual approach
Next, we aimed to study the aggregated dynamics of individual symptoms to yield systematic patterns over time across patients. In this second part, we aimed to build a generalizable hierarchy of symptom clusters based on their shared temporal features. First, the 136 distances were averaged over the 255 patients, weighted for the number of assessments that were done for each of the patients (ranging from 2 through 17). Second, this matrix of 255 mean distances was used for the generalizable hierarchical cluster analysis. A scree plot was constructed displaying the heights in a downward curve and the elbow rule (i.e., the point where the graph leveled off) was used to determine the most appropriate number of clusters.
The “Distatis” algorithm from the “DistatisR” package was used to check whether using the actual 255 distance matrices instead of one mean distance matrix yielded similar clusters. Distatis is a generalization of classical multidimensional scaling (MDS), based on a three-way principal component analysis, to analyze a set of distance matrices. In order to compare these distance matrices, it combines them into a common structure called a compromise and then projects the original distance matrices onto this compromise. Compromise factors are calculated and plotted in the compromise space, with each component been given the length corresponding to its eigenvalues. We plotted each of the 17 HRSD symptoms on an X-Y plane according to their first and second compromise factor values.
In the following step, we investigated two centrality metrics, being closeness centrality and degree centrality [40] for the average distance matrix. Degree centrality is based on the number and strengths of connections each symptom has. Closeness centrality also takes the global network structure into account because it measures the average distance of a certain symptom to all other symptoms. Applied on the DTW data, closeness is inversely proportional to the mean DTW to all other symptoms and, in this way, indicates which symptom trajectory is the most similar to that of other symptoms.
Finally, we computed the average DTW distance among all symptom trajectories for each patient. Symptoms that scored consistently zero were deleted from these analyses for that particular patient, as all such symptoms would result in distances of zero. Shorter average DTW distances reflected denser interconnections between symptoms, and longer average DTW distances reflected looser longitudinal connectivity between symptoms. In order to investigate the relationship between network density and reaching response and remission, we calculated the residuals of the regression with the number of assessments and the HDRS sum score. These residuals were plotted using box plots according to whether response and remission were reached, and we performed Wilcoxon signed-rank tests to compare the two samples.
The analyses used the packages “dtw” (version 1.20.1), “pheatmap” (version 1.0.12), “parallelDist” (version 0.2.4), “qgraph” (version 1.6.2), and “DistatisR” (version) for the R statistical software (R version 3.6.0; R Foundation for Statistical Computing, Vienna, Austria, 2016). A sample code (with data from the of 2 exemplar patients of Fig. 2) can be found in Additional file 1.