pyKNEEr: An image analysis workflow for open and reproducible research on femoral knee cartilage

Transparent research in musculoskeletal imaging is fundamental to reliably investigate diseases such as knee osteoarthritis (OA), a chronic disease impairing femoral knee cartilage. To study cartilage degeneration, researchers have developed algorithms to segment femoral knee cartilage from magnetic resonance (MR) images and to measure cartilage morphology and relaxometry. The majority of these algorithms are not publicly available or require advanced programming skills to be compiled and run. However, to accelerate discoveries and findings, it is crucial to have open and reproducible workflows. We present pyKNEEr, a framework for open and reproducible research on femoral knee cartilage from MR images. pyKNEEr is written in python, uses Jupyter notebook as a user interface, and is available on GitHub with a GNU GPLv3 license. It is composed of three modules: 1) image preprocessing to standardize spatial and intensity characteristics; 2) femoral knee cartilage segmentation for intersubject, multimodal, and longitudinal acquisitions; and 3) analysis of cartilage morphology and relaxometry. Each module contains one or more Jupyter notebooks with narrative, code, visualizations, and dependencies to reproduce computational environments. pyKNEEr facilitates transparent image-based research of femoral knee cartilage because of its ease of installation and use, and its versatility for publication and sharing among researchers. Finally, due to its modular structure, pyKNEEr favors code extension and algorithm comparison. We tested our reproducible workflows with experiments that also constitute an example of transparent research with pyKNEEr, and we compared pyKNEEr performances to existing algorithms in literature review visualizations. We provide links to executed notebooks and executable environments for immediate reproducibility of our findings.


Introduction
Open science and computational reproducibility are recent movements in the scientific community that aim to promote and encourage transparent research (PLOS ONE | https://doi.org/10.1371/journal.pone.0226501, January 24, 2020). Among the many algorithms developed to segment femoral knee cartilage, only a few implementations are publicly available (Wang's repository and Shan's repository). These two implementations, however, have some limitations: in the first case, the documentation of code and usage is not extensive, while in the second case the code is written in C++ and requires advanced programming skills to be compiled and run. Other communities, such as neuroimaging, largely benefit from robust, open-source, and easy-to-use software to segment and analyze images (e.g. ANTs [33], FreeSurfer [34], Nipype [35]). Because of these open-access tools, researchers do not need to re-implement algorithms for basic processing and can focus on further statistical analyses [36-38]. To accelerate discoveries and findings, it is fundamentally important to have not only open-source tools, but also workflows that are computationally reproducible and thus enhance scientific rigor and transparency [8].
In this paper, we present pyKNEEr, an automatic workflow to preprocess, segment, and analyze femoral knee cartilage from MR images, specifically designed for open and reproducible research. The main characteristics of pyKNEEr are embedded in its name: py is for python, to indicate openness; KNEE is for femoral knee cartilage; and r is for reproducibility. pyKNEEr is written in python with Jupyter notebooks as a graphical user interface, is shared on GitHub, and has a documentation website. In addition, we provide an example of transparent research with pyKNEEr through our validation study, implemented using images from the Osteoarthritis Initiative (OAI) [68] as well as in-house images. Finally, to be compliant with recommendations for interactive publications, throughout the paper we provide links to data files and repositories, software repositories, specific code and Jupyter notebooks, executable environments, metafiles and web documentation, and websites [69].

Reproducibility: Jupyter notebooks with computational narratives and dependencies
We designed pyKNEEr as a tool to perform and support computationally reproducible research, using principles recommended in the literature [5,6]. For each module of the framework, we used one or more Jupyter notebooks as a user interface, because of their versatility in combining code, text, and visualization, and because they can be easily shared among researchers, regardless of operating system. Across pyKNEEr modules, we used the same notebook structure for consistent computational narratives (Fig 2). Each notebook contains:
1. Link to the GitHub repository: The repository contains code and additional material, such as source files of the documentation and publication;
2. Link to documentation: Each notebook is associated with a webpage containing instructions on how to create input text files, run notebooks, and evaluate outputs. Explanations include step-by-step instructions for a demo dataset, provided so that users can become familiar with the software. The single webpages are part of a documentation website that also comprises installation instructions and frequently asked questions. We created the website using sphinx, the python documentation generator;
3. Introduction: Brief explanation of the algorithms in the notebook;
4. User inputs: The input of each notebook is a text file (.txt) with folder paths and file names of the images to process or analyze. Additional inputs are variables to customize the process, such as the number of cores and algorithm options;
5. Commands with narrative: Titles and subtitles define narrative units of computation, and additional text provides information about command usage and outcome interpretation. Commands in the notebook call functions in python files associated with that notebook (e.g. in the preprocessing module, the notebook preprocessing.ipynb calls the python file preprocessing_for_nb.py). In turn, associated python files call functions in core files (e.g. the python file preprocessing_for_nb.py calls sitk_functions.py, which contains image handling functions);
6. Visualization of outputs: Qualitative visualizations include sagittal slices with superimposed cartilage mask or relaxometry map, 2D thickness maps, and 3D relaxometry maps, allowing users a preliminary evaluation of outputs. Quantitative visualizations include scatter plots and tables with numerical values and descriptive statistics (Fig 2b), which are also saved to .csv files for subsequent analysis by researchers;
7. References: List of the main references used in the notebook algorithms;
8. Dependencies: Code dependencies (i.e. version of python, python packages, and computer operating system and hardware) that allow researchers to recreate the computational environment and thus reproduce findings. To print dependencies, we used the python package watermark.
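The "Dependencies" item above can be illustrated with a minimal sketch. pyKNEEr itself uses the watermark package; the stdlib-only function below is a hypothetical helper, not pyKNEEr code, showing the kind of information such a cell records:

```python
import platform
import sys
from importlib.metadata import PackageNotFoundError, version


def environment_report(packages=()):
    """Collect the Python version, OS/hardware info, and package versions,
    i.e. the facts needed to recreate a computational environment."""
    report = {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "machine": platform.machine(),
    }
    for pkg in packages:
        try:
            report[pkg] = version(pkg)
        except PackageNotFoundError:
            report[pkg] = "not installed"
    return report


if __name__ == "__main__":
    for key, value in environment_report(packages=("numpy",)).items():
        print(f"{key}: {value}")
```

Printed at the bottom of an executed notebook, a report like this lets a reader reconstruct the environment before rerunning the analysis.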

Algorithms in pyKNEEr
pyKNEEr contains specific algorithms to preprocess, segment, and analyze femoral knee cartilage from MR images.
Image preprocessing. Spatial and intensity preprocessing provide standardized, high-quality images to the segmentation algorithm [77]. In spatial preprocessing, we transform images to right-anterior-inferior (RAI) orientation, we flip right knees (when present) to the left laterality, and we set the image origin to the origin of the cartesian system (0,0,0). In intensity preprocessing, we correct image intensities for the inhomogeneities of the static magnetic field (B0) [78], we rescale intensities to a common range [0, 100], and we enhance cartilage edges with edge-preserving smoothing using curvature flow [79] (Fig 3I). The implementation of intensity preprocessing is a translation of the open-access code by Shan et al. [32] from C++ to python.
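As an illustration of the intensity-rescaling step, here is a minimal numpy sketch. pyKNEEr performs preprocessing with SimpleITK filters; this standalone function only reproduces the arithmetic of mapping intensities to the common [0, 100] range:

```python
import numpy as np


def rescale_intensity(image, new_min=0.0, new_max=100.0):
    """Linearly map image intensities to [new_min, new_max]."""
    image = np.asarray(image, dtype=np.float64)
    old_min, old_max = image.min(), image.max()
    if old_max == old_min:  # constant image: avoid division by zero
        return np.full_like(image, new_min)
    return (image - old_min) / (old_max - old_min) * (new_max - new_min) + new_min
```

The same linear map, applied slice by slice or to the whole volume, standardizes intensity ranges across acquisitions before segmentation.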
Femoral knee cartilage segmentation. Three steps comprise femoral cartilage segmentation: 1) finding a reference image; 2) segmenting femoral cartilage; and 3) evaluating segmentation quality (Fig 3II). Finding reference image and evaluating segmentation quality are possible only when ground truth segmentations are available.
Finding a reference image. We propose a convergence study to find the reference image, i.e. a segmented image used as a template, or atlas, for the segmentation. First, we randomly select an image as a reference image. Then, we register all images of the dataset to the reference using rigid, similarity, and spline transformations, as explained in the following paragraph. Next, we average the vector fields that result from the registrations. Finally, we choose the image whose vector field is the closest to the average vector field as the new reference for the following iteration. We repeat until two consecutive iterations converge to the same reference image or after a fixed number of iterations. It is possible to execute the search for the reference several times using different images as the initial reference to confirm the selection of the reference image. This algorithm requires femur masks because the comparison among vector fields and their average is calculated in the femur volume, as cartilage volume is limited.
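The iterative reference selection described above can be sketched as follows. This is a toy illustration, not pyKNEEr code: register is a stand-in for the elastix registration and simply returns a mock vector field restricted to the femur mask, and all names are illustrative.

```python
import numpy as np


def register(moving, reference, femur_mask):
    """Mock registration: the 'vector field' is the voxel-wise difference
    between reference and moving image, restricted to the femur mask."""
    return (reference - moving) * femur_mask


def find_reference(images, femur_mask, initial=0, max_iters=10):
    """Iteratively pick the image whose registration vector field is
    closest to the average field, until the choice stabilizes."""
    reference = initial
    for _ in range(max_iters):
        fields = [register(img, images[reference], femur_mask) for img in images]
        average = np.mean(fields, axis=0)
        # image whose vector field is closest to the average vector field
        new_reference = int(np.argmin(
            [np.linalg.norm(f - average) for f in fields]))
        if new_reference == reference:  # two consecutive iterations agree
            break
        reference = new_reference
    return reference
```

With a real registration in place of the mock, the loop converges to the dataset's most "central" anatomy, which is then used as the atlas.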
Atlas-based segmentation. Initially, we register a moving image (i.e. any image of the dataset) to a reference image, by transforming first the femur and then the cartilage. Then, we invert the transformation describing the registration. Finally, we apply the inverted transformation to the cartilage mask of the reference image to obtain the cartilage mask of the moving image. The images to segment can be new subjects (intersubject), images of the same subject acquired at different time points (longitudinal), or images of the same subject acquired with different protocols (multimodal). To segment intersubject images, we use rigid, similarity, and spline registration, to segment longitudinal images only rigid and spline registration, and to segment multimodal images only rigid registration. We perform image registration and mask warping with elastix and transformix, respectively [44, 76], using a multiresolution approach with smoothing image pyramid, random coordinate sampler, adaptive stochastic gradient descent optimizer, and B-spline interpolators [44]. Detailed parameters are in the code repository (GitHub).
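The registration settings named above map onto an elastix parameter file. The fragment below is an illustrative sketch of what such a file might contain; parameter names follow the elastix conventions, values are placeholders, and the actual files are in the pyKNEEr GitHub repository:

```text
// Abridged elastix parameter sketch (illustrative only; see the pyKNEEr
// repository for the parameters actually used)
(Registration "MultiResolutionRegistration")
(NumberOfResolutions 3)
(FixedImagePyramid "FixedSmoothingImagePyramid")
(MovingImagePyramid "MovingSmoothingImagePyramid")
(ImageSampler "RandomCoordinate")
(Optimizer "AdaptiveStochasticGradientDescent")
(Interpolator "BSplineInterpolator")
(ResampleInterpolator "FinalBSplineInterpolator")
(Transform "BSplineTransform")
```

One such file per transformation stage (rigid, similarity, spline) keeps the registration pipeline fully declarative and versionable alongside the code.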
Evaluating segmentation quality. We quantitatively evaluate the quality of segmentation using the Dice Similarity Coefficient (DSC) and the average surface distance (ASD) for the whole mask region of interest. DSC is a measure of the overlap between a newly segmented mask and the corresponding ground truth segmentation [80], calculated as:

DSC = 2 |NM ∩ GT| / (|NM| + |GT|)

where NM is the newly segmented mask and GT is the ground truth. ASD is the average of the Euclidean distances between the newly segmented mask and the ground truth mask, calculated as:

ASD = (1/2) [ (1/n_nm) Σ_{nm ∈ edge_NM} d(nm, edge_GT) + (1/n_gt) Σ_{gt ∈ edge_GT} d(gt, edge_NM) ]

where edge_NM is the edge of the newly segmented mask, nm is any of its voxels, n_nm is the number of its voxels, and d(nm, edge_GT) is the Euclidean distance from nm to the closest voxel of edge_GT; similarly, edge_GT is the edge of the ground truth mask, gt is any of its voxels, and n_gt is the number of its voxels [55].
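Both metrics translate directly into numpy. In this sketch (an illustration, not pyKNEEr's implementation), masks are boolean arrays, the edges are given as point sets, and edge extraction is omitted for brevity:

```python
import numpy as np


def dice_similarity(nm, gt):
    """DSC = 2|NM ∩ GT| / (|NM| + |GT|) for two binary masks."""
    nm, gt = np.asarray(nm, bool), np.asarray(gt, bool)
    overlap = np.logical_and(nm, gt).sum()
    return 2.0 * overlap / (nm.sum() + gt.sum())


def average_surface_distance(edge_nm, edge_gt):
    """Symmetric ASD between two edge point sets (n x d coordinate arrays)."""
    edge_nm = np.asarray(edge_nm, float)
    edge_gt = np.asarray(edge_gt, float)
    # pairwise distances; for each point, distance to the closest point
    # on the other edge
    d = np.linalg.norm(edge_nm[:, None, :] - edge_gt[None, :, :], axis=2)
    return 0.5 * (d.min(axis=1).mean() + d.min(axis=0).mean())
```

The brute-force pairwise distance is fine for small edges; real-size masks would use a distance transform or a k-d tree instead.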
Cartilage morphology. Morphology quantifications are cartilage thickness and cartilage volume. To calculate cartilage thickness, first we extract contours from each slice of the cartilage mask as a point cloud. Then, we separate the subchondral side of the cartilage from the articular side, we interpolate each cartilage side to a cylinder that we unroll to flatten cartilage [26], and we calculate thickness between the two cartilage sides using a nearest neighbor algorithm in the 3D space [40,81]. Finally, we associate thicknesses to the subchondral point cloud to visualize them as a 2D map. We compute cartilage volume as the number of voxels of the mask multiplied by the voxel volume.
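The two quantifications above can be sketched as follows. This is a simplified illustration: the cylinder fitting and unrolling steps are omitted, the nearest-neighbor search is a brute-force numpy version, and all names are illustrative.

```python
import numpy as np


def thickness_map(subchondral_pts, articular_pts):
    """For each subchondral point, the distance to the nearest articular
    point in 3D space (one thickness value per subchondral point)."""
    sub = np.asarray(subchondral_pts, float)
    art = np.asarray(articular_pts, float)
    d = np.linalg.norm(sub[:, None, :] - art[None, :, :], axis=2)
    return d.min(axis=1)


def cartilage_volume(mask, spacing):
    """Number of mask voxels times the volume of one voxel (spacing in mm)."""
    voxel_volume = float(np.prod(spacing))  # mm^3 per voxel
    return np.count_nonzero(mask) * voxel_volume
```

Associating each thickness value with its subchondral point is what allows the flattened 2D thickness map to be drawn afterwards.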
Cartilage relaxometry. We implemented two algorithms to calculate relaxometry maps: exponential or linear fitting, and Extended Phase Graph (EPG) modeling. We use exponential or linear fitting to compute T1ρ maps from T1ρ-weighted images and T2 maps from T2-weighted images. We calculate exponential fitting by solving a mono-exponential equation voxel-wise using a Levenberg-Marquardt fitting algorithm [25]:

S(T_a) = K exp(-T_a / T_b)

where: for T1ρ-weighted images, T_a is the time of spin-lock (TSL) and T_b is T1ρ; for T2-weighted images, T_a is the echo time (TE) and T_b is T2; S is the signal intensity; and K is a constant. We compute linear fitting by transforming the images to their logarithm and then linearly interpolating voxel-by-voxel. Linear fitting is not recommended when the signal-to-noise ratio is high because the logarithm transformation alters the normality of the noise distribution, but it is fast and computationally inexpensive [82]. Before calculating exponential or linear fitting, the user has the option to register the images with the lowest TE or TSL to the image with the highest TE or TSL to correct for image motion during acquisition [83]. We use EPG to calculate T2 maps from DESS acquisitions. The implementation in pyKNEEr is the one proposed by Sveinsson et al. [84], which is based on a linear approximation of the relationship between the two DESS signals.

Computational costs. Computation time for one image through the whole pipeline is about 45 minutes on one core of a MacOS laptop with an Intel Core i5 processor at 2.3 GHz and 16 GB of memory at 2133 MHz. On average, the computation times for the workflow steps are: preprocessing, 20 minutes; segmentation, 15 minutes; morphology analysis, 5 minutes; and relaxometry analysis, 5 minutes. To optimize computational effort, we used the multiprocessing python package to process images on separate cores. Therefore, computation time for a whole dataset scales linearly with the number of images and inversely with the number of cores.
As an example, computation time for 2 images is about 90 minutes (45 x 2) on one core and about 45 minutes on two cores.
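The linear-fitting variant described above can be sketched for a single voxel. Taking the logarithm of the mono-exponential model gives log S = log K - T_a/T_b, so an ordinary least-squares line through (T_a, log S) yields T_b = -1/slope. The code below is an illustration with made-up echo times, not pyKNEEr's implementation:

```python
import numpy as np


def linear_fit_relaxation(times, signals):
    """Estimate the relaxation time T_b of one voxel by fitting a line to
    (T_a, log S): slope = -1/T_b for S = K * exp(-T_a / T_b).
    `times` are echo times (TE) or spin-lock times (TSL)."""
    slope, _intercept = np.polyfit(np.asarray(times, float),
                                   np.log(np.asarray(signals, float)), 1)
    return -1.0 / slope


# Example: noiseless signal with T_b = 40 ms and K = 100 (illustrative values)
te = np.array([10.0, 20.0, 30.0, 40.0])
s = 100.0 * np.exp(-te / 40.0)
```

Running the same fit voxel-by-voxel over the image stack produces the relaxometry map; the exponential (Levenberg-Marquardt) variant replaces the log-linear fit with a nonlinear least-squares solve.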

Open and reproducible research with pyKNEEr: Our validation study
We validated pyKNEEr with experiments that also constitute an example of open and reproducible research with pyKNEEr.

Image data
We used three datasets that we named OAI1, OAI2, and inHouse (Table 2). OAI1 contained 19 Double-Echo in Steady-State (DESS) images and T2-weighted (T2-w) spin-echo images acquired at year 4 of the OAI study. Ground truth segmentations were created using an atlas-based method (Qmetrics Technologies, Rochester, NY, USA) [45] for a previous study [85]. OAI2 consisted of 88 DESS images acquired at baseline and at 1-year follow-up. Ground truth segmentations were computed using an active appearance model (imorphics, Manchester, UK) [42]. Finally, inHouse contained 4 images acquired at Stanford University using DESS and CubeQuant protocols. For clarity, in the following OAI1 will be split into OAI1-DESS and OAI1-T2, OAI2 into OAI2-BL (baseline) and OAI2-FU (follow-up), and inHouse into inHouse-DESS and inHouse-CQ (CubeQuant). Details of the acquisition parameters are in Table 2I.

Results
We preprocessed, segmented, and analyzed all the datasets using different options in pyKNEEr, according to dataset characteristics and the availability of ground truth segmentations (link to dataset here) (Table 2III).
Preprocessing. We executed spatial preprocessing for all images of the datasets and intensity preprocessing only for the images directly involved in segmentation.
Finding reference. We selected the reference mask from the dataset OAI1-DESS because of the availability of ground truth segmentations of the femur, which is the volume where we compare velocity fields. We picked 5 images as initial references for our parallel investigations.

Table 2. Datasets used to evaluate pyKNEEr. I. Acquisition parameters: Parameters of the protocols used to acquire the images. Images of OAI1-DESS, OAI2-BL, and OAI2-FU were acquired with the same DESS protocol, consisting of 2 echos, although only their average was available (*). Images of one subject of the dataset OAI1 had different slice spacing and thickness (†). Data queries to obtain the acquisition parameters are in a Jupyter notebook (here). The original identification numbers (IDs) of the OAI images are in a Jupyter notebook used as a metafile (here). II. Ground truth segmentation: The datasets OAI1 and OAI2 have ground truth segmentations, which differ in computational method, segmented anatomy, and label type. III. Experimental results: Details of the steps in pyKNEEr for each dataset. A full circle (•) indicates processing of the dataset, while an empty circle (◦) indicates processing of ground truth segmentations. The numbers in "Find reference" indicate the IDs of the seed images used in the convergence study. Links are to the executed notebooks on GitHub.
Morphology. We calculated cartilage thickness and volume for all datasets, including ground truth segmentations. We computed correlations of cartilage thickness calculated from pyKNEEr's segmentation and ground truth segmentation, and we found that Pearson coefficients were 0.958 for OAI1-DESS, 0.668 for OAI1-T2, 0.654 for OAI2-BL, and 0.667 for OAI2-FU (Fig 4b). Similarly, we computed correlations for cartilage volume, and we found that Pearson coefficients were 0.983 for OAI1-DESS, 0.847 for OAI1-T2, 0.891 for OAI2-BL, and 0.885 for OAI2-FU (Fig 4c) (see code).
Relaxometry. Before calculating relaxometry maps for OAI1-T2, we rigidly registered the images with the shortest TE to the image with the longest TE. Similarly, before calculating T1ρ maps for inHouse-CQ, we rigidly registered the images with the shortest TSL to the image with the longest TSL. Then, we calculated T2 maps for the OAI1-T2 images by extracting values in pyKNEEr's masks (see code) and ground truth masks (see code), and we compared them, obtaining a Pearson's coefficient of 0.513 (Fig 4d) (see code). Finally, we computed relaxometry maps using exponential fitting for inHouse-CQ (see code) and EPG modeling for inHouse-DESS (see code) to show the feasibility of the methods.

Discussion
To test possible reproducible workflows with pyKNEEr, we ran experiments with three different datasets. Image preprocessing was successful in all cases, while image segmentation failed in 4 cases. Average DSCs were 0.81 for dataset OAI1 and 0.73 for dataset OAI2. These values are in the range of published values, as depicted in the literature review visualization of DSC (Fig 5). Discrepancies in DSC between OAI1 and OAI2 can be due to the different characteristics of the ground truth segmentations. OAI1 ground truth segmentations were created using an atlas-based method with DSC = 0.88 [45] (see "Tamez-Pena (2012)" in Fig 5), whereas OAI2 ground truth segmentations were created using an active appearance model with DSC = 0.78 [42] (see "Vincent (2011)" in Fig 5). In addition, to calculate DSC we transformed OAI2 ground truth segmentations from contours to volumetric masks, potentially adding discretization error.

Quality of segmentation had a direct impact on the morphology and relaxometry analyses. Pearson's coefficient was higher for cartilage volume than for cartilage thickness, suggesting higher preservation of volume, and it was low for T2 relaxation times, suggesting a higher dependency on segmentation quality for intensity-based measurements. Finally, regression lines show that measurements from pyKNEEr's segmentations overestimated small thicknesses and underestimated large volumes and T2 values (Fig 4).

We implemented atlas-based segmentation because it has the advantage of providing byproducts for further analysis. Image correspondences established during the registration step can be used for intersubject and longitudinal comparison of cartilage thickness and relaxation times, and for voxel-based morphometry and relaxometry [44].

We designed pyKNEEr to facilitate transparent research on femoral knee cartilage analysis from MR images. Traditionally, medical image analysis workflows are written with ITK, VTK, and Qt, requiring advanced computational skills in C++ to build, run, and extend code.
We wrote pyKNEEr in python because of its ease of use, compatibility with various operating systems, and extensive computing support through packages and open code. As a consequence, pyKNEEr can be easily installed as a package in the python environment and does not require advanced programming skills. In addition, we used Jupyter notebooks as a user interface because of their ease of use, literate computing approach [86], versatility for publications, and shareability among researchers. In pyKNEEr, Jupyter notebooks can simply be downloaded from our GitHub repository to a local folder. Researchers have to input an image list and optionally set a few variables, and after automatic execution of the notebook, they directly obtain visualizations, graphs, and tables for further analysis. In addition, researchers can link executed notebooks directly to papers (similarly to Table 2) and thus create an interactive publication with reproducible analysis. In the medical image analysis community, other examples of the combined use of python and Jupyter notebooks are mainly for educational and general research purposes (e.g. SimpleITK notebooks [87]), while usage of python as a programming language is rapidly gaining popularity in neuroimaging (e.g. Nipype [35]).
Several extensions of pyKNEEr could be imagined, due to the modularity of its structure. In the segmentation module, the current notebook implementing atlas-based segmentation (segmentation.ipynb) could be substituted by notebooks with hybrid machine- or deep-learning algorithms, which can provide higher DSCs [55] (Fig 5). In the morphology module (morphology.ipynb), the code structure already includes a flag (thickness_algo) to integrate additional algorithms for cartilage thickness, such as surface normal vectors, local thickness, and potential field lines [81]. Finally, new notebooks could be added to the workflow to segment and analyze more knee tissues, such as tibial cartilage, patellar cartilage, and the menisci. Extensions will require a limited amount of effort because of the popularity and ease of python, the free availability of a large number of programming packages, and the flexibility of Jupyter notebooks [87]. In addition, standardized file formats and computational environments will facilitate comparison of the findings and performances of new algorithms.
In conclusion, we have presented pyKNEEr, an image analysis workflow for open and reproducible research on femoral knee cartilage. We validated pyKNEEr with three experiments, where we tested preprocessing, segmentation, and analysis. Through our validation test, we presented a possible modality of conducting open and reproducible research with pyKNEEr. Finally, in our paper we provide links to executed notebooks and executable environments for computational reproducibility of our results and analysis.