Tissue sampling
For the retrospective analysis of human muscle tissue, we obtained images from processed biopsies stored in tissue banks at the Virgen del Rocío University Hospital (Seville), the Santa Creu i Sant Pau Hospital (Barcelona) and the Doce de Octubre Hospital (Madrid). Our database consisted of 102 images from 70 individuals. Fifty images were considered controls and came from patients who were referred to the Neuromuscular Unit of Virgen del Rocío University Hospital because of minor symptoms, such as unspecific fatigue or pain, in whom the physical examination and complete routine biopsy analysis were completely normal. Seventeen images came from adults with diverse NA and 20 images were from the quadriceps of children with either Duchenne or Becker MD (Additional file 2: Table S1). All the methodology used in this study follows the guidelines of the Declaration of Helsinki developed by the World Medical Association. The biopsies used in this study were treated following the requirements established by Spanish law in relation to the Investigation in Biomedicine (Ley 14/2007), personal data protection (Ley Orgánica de 15/1999) and Bioethics. The Hospital Virgen del Rocío ethics commission gave approval for this work (File 2/11). All biopsies were performed under informed consent using a standardized protocol [10].
Immunohistochemistry
Muscle biopsies were processed by the standard methods of freezing and cryostat sectioning. Fluorescence microscopy was used to detect muscle fiber types and the amount of collagen in preparations. The following antibodies were used in accordance with standard immunostaining protocols [10]: mouse anti-myosin heavy chain (slow) (Leica, Newcastle, United Kingdom, clone WB-MHCs; 1:200), mouse anti-myosin heavy chain (fast) (Leica, Newcastle, United Kingdom, clone WB-MHCf; 1:200), and rabbit anti-collagen type VI (Millipore, Temecula, CA, USA, lot number: NG18332|0; 1:300). After staining, the preparations were mounted using Fluoromount-G (Southern Biotech, Birmingham, AL, USA All images were obtained using the 10× objective of a DP70 Olympus microscope (Olympus Iberia, Barcelona, Spain), with a resolution of 4080 × 3072 pixels. Regions of interest were chosen that did not show artifacts caused by the sample processing protocol.
Biopsy evaluation
The pathologist from the Virgen del Rocío University Hospital estimated the degree of severity (from 1 to 4) of the MD samples by following routine and specific protocols (see Additional file 1: Figure S1).
Image processing: muscle fiber segmentation
Muscle fiber segmentation for image processing was carried out in two steps. First, muscle fibers in an image were identified by applying morphological operators. This was followed by application of a watershed transform to provide accurate detection of the fiber contours.
The G component of the Red-Green-Blue (RGB) image was used because of the high contrast that exists between muscle fibers and collagen. Considering that fibers are darker than the surrounding collagen, we searched the image for intensity valleys. Accordingly, the h-minima transform [25] was applied to the G component image to obtain homogeneous minima valleys. The h-minima or h-maxima transform is a powerful mathematical tool used to suppress undesired minima or maxima [25]. By using the h-minima transform, all minima whose depth was lower than or equal to a given h-value were suppressed. The h-value has a direct influence on the number of segmented regions: the larger the h-value is, the smaller the number of segmented regions. In our case, the h-value was experimentally chosen as being half the average intensity of G, such that:
ℎ = ed transform may be applied to an original image as given. However, if the gradient of the original image is used, the minima in the gradient image will correspond to sites within homogeneous regions in the original image. Thus, the watershed transform is usually applied to the gradient image. The watershed algorithm yields results with substantial over-segmentation; that is, the number of segmented regions could be much larger than desired, with clearly identifiable objects or regions being broken into multiple smaller regions. This undesirable result is because the gradient image used in the process is sensitive to noise. The problem of over-segmentation can be overcome with the use of markers that identify objects. The objects’ contours in the gradient image can be seen as the highest crest-lines around the object marker. In our case, the image gradient was calculated in the G component and the internal and external markers were derived from the first step. The block diagram of Additional file 3: Figure S2 shows the steps followed in the segmentation process.
Image processing: generation of the muscle network
For a structural analysis of the biopsy images, we formed a muscle network, where each fiber is represented as a node and two nodes are connected if two fibers are adjacent. This network was formed from the external markers, that is, from the watershed transformation directly to the binary image considered as internal markers, resulting in the first step of the segmentation process. The resulting image was a mosaic where each fiber contour reached the adjacent fiber contour.
Geometric and network feature extraction
The next step concerned feature extraction. Geometric features such as the fiber area or the length of the major and minor axes of the fiber were extracted from the detected contours. Other parameters that took into account the neighboring vicinity of each fiber, such as the ratio between the fiber area and adjacent fiber areas, or the ratio between the fiber area and the area resulting from the expansion of its contour (computed in the previous step), were calculated from the muscle network. Finally, graph theory was applied to the muscle network to compute additional features.
A total of 82 characteristics were computed, including 16 geometric features, 22 features derived from the muscle network, and 44 from graph theory (Additional file 4: Table S2).
Discriminant feature selection
A feature selection step was performed to analyze the discriminatory power of the 82 characteristics mentioned above. To achieve this, the sequential forward selection method (SFS) [26] and the sequential backward selection method (SBS) [26] used via the Fuzzy-ARTMAP neural network [27] were applied.
SFS is a bottom-up search procedure where one feature at a time is added to the current feature set. At each stage, the feature to be included in the feature set is selected from among the remaining available features, which have not been added to the feature set. The new enlarged feature set yields a minimum classification error compared to adding any other single feature. The algorithm stops when the addition of a new feature yields an increase of the classification error. The SBS is the top-down counterpart of the SFS method. It starts from the complete set of features and, at each stage, the feature that shows the least discriminatory power is discarded. The algorithm stops when removing another feature results in an increase of the classification error.
The selection performance was evaluated by five-fold cross-validation [26]. In this way, the disadvantage of sensitivity to the order of presentation of the training set that the SBS and SFS methods present was diminished. To perform the cross-validation method, four disjoint subsets of each class (control, dystrophy) were used. Three of these subsets served as a training set for the neural network, and the remaining one was used as a validation set. The procedure was then repeated, interchanging the validation subset with one of the training subsets and so on until each of the four subsets had been used as validation sets. The final classification error was calculated as the mean of the errors for each cross-validation run.
The classifier used was the Fuzzy-ARTMAP neural network architecture developed by Carpenter et al.[27], which is based on adaptive resonance theory (ART). Fuzzy-ARTMAP is a supervised learning classification architecture for analog-value input pairs of patterns where each individual input is mapped to a class label.
Results analysis
Once the feature vector with the most discriminatory power to distinguish between the classes analyzed had been selected, it was possible to classify new biopsy images and to estimate the degree of pathology with respect to control images.
The Fuzzy-ARTMAP neural network classifies new biopsies based on the selected features. Also, principal component analysis (PCA) [26] allows the degree of pathology of each biopsy image to be visualized graphically based on its position when two or three principal components are represented (Figure 2). These principal components are correlated data points of the feature vector that PCA transforms into a small number of uncorrelated variables. The projection maximizes the dispersion of the individual data points without prior knowledge of whether these are expected to form groups. This allows the unbiased identification of naturally separated sets of data points that may form groups because they are similar.
PCA was applied to the selected feature vector. To check the vector’s performance as an indicator of the degree of pathology, the Pearson correlation coefficient was computed. This parameter provides a measure of the strength of linear dependence between two variables X and Y, giving a value between +1 and -1 inclusive. In this case, the variable X is the degree of pathology evident in the biopsies used in the training stage, which have previously been analyzed by the pathologist. The variable Y is the Euclidean distance between the pathological images and the centroid of the controls in the PCA graph.