Table 1.
Overview of the 21 datasets used in the pipeline.
Fig 1.
Overview of the complete pipeline.
The pipeline is divided into two parts, Data Standardization and Data Serving, connected by a common datastore. Lazy Loading refers to loading the preprocessed data from disk only when it is needed during training.
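As an illustration of the lazy-loading idea described in this caption, the following is a minimal sketch, not the pipeline's actual implementation. The class name `LazyRecordStore`, the use of pickle files, and the toy records are all assumptions made for the example. The point is that only file paths are held in memory, and a record is read from disk when it is indexed.

```python
import os
import pickle
import tempfile


class LazyRecordStore:
    """Hypothetical store that keeps only file paths and reads a record on access."""

    def __init__(self, paths):
        self.paths = list(paths)
        self.loads = 0  # counts disk reads, for illustration only

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, index):
        # The record is read from disk at the moment it is requested.
        with open(self.paths[index], "rb") as handle:
            self.loads += 1
            return pickle.load(handle)


with tempfile.TemporaryDirectory() as tmp:
    paths = []
    for i in range(3):
        path = os.path.join(tmp, f"record_{i}.pkl")
        with open(path, "wb") as handle:
            pickle.dump({"id": i, "signal": [float(i)] * 4}, handle)
        paths.append(path)

    store = LazyRecordStore(paths)
    loads_before_access = store.loads  # nothing has been read yet
    second_record = store[1]           # triggers exactly one disk read
    loads_after_access = store.loads
```

Constructing the store is cheap regardless of how many records exist, because the cost of reading is deferred to each access.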
Table 2.
Analysis of Parquet, HDF5 and Pickle file formats, showing file size together with averages of peak memory and load time.
File sizes were measured once as the space occupied on the system disk. Memory and load-time experiments were repeated 100 times for each file format, and the table reports the means. Peak memory showed negligible variance. All files contained the same data from CFS record ‘cfs-visit5-800002’.
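The measurement procedure in this caption could be sketched roughly as follows. This is an assumption-laden illustration rather than the authors' benchmark code: the helper `benchmark_load` is hypothetical, `tracemalloc` only tracks allocations made through Python's allocator (the original measurement tool is not stated), and only Pickle is shown so that the example depends on the standard library alone. A Parquet or HDF5 loader could be passed as `load_fn` in the same way.

```python
import os
import pickle
import statistics
import tempfile
import time
import tracemalloc


def benchmark_load(load_fn, path, repeats=100):
    """Return (mean load time in seconds, mean peak traced memory in bytes)."""
    times, peaks = [], []
    for _ in range(repeats):
        tracemalloc.start()
        start = time.perf_counter()
        data = load_fn(path)
        elapsed = time.perf_counter() - start
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
        del data  # release the record before the next repetition
        times.append(elapsed)
        peaks.append(peak)
    return statistics.mean(times), statistics.mean(peaks)


def load_pickle(path):
    with open(path, "rb") as handle:
        return pickle.load(handle)


# Toy payload standing in for one recording; not the CFS record itself.
payload = {"eeg": [0.0] * 10_000, "labels": [0] * 100}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "record.pkl")
    with open(path, "wb") as handle:
        pickle.dump(payload, handle)
    file_size = os.path.getsize(path)  # size on disk, measured once
    mean_time, mean_peak = benchmark_load(load_pickle, path, repeats=5)
```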
Table 3.
Description of actions in the template method pattern adaptation for the data standardization procedure, shown in order of operation.
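For readers unfamiliar with the template method pattern, a minimal sketch is given below. The class and method names (`BaseDataset`, `list_records`, `read_signals`, `read_labels`, `package`) are hypothetical and do not reproduce the actions listed in the table. The sketch only shows the general shape: a base class fixes the order of steps, and each dataset subclass supplies the dataset-specific ones.

```python
from abc import ABC, abstractmethod


class BaseDataset(ABC):
    """Hypothetical base class; the step names are illustrative only."""

    def standardize(self):
        # Template method: the order of operations is fixed here,
        # while subclasses provide the dataset-specific steps.
        output = {}
        for record_id in self.list_records():
            signals = self.read_signals(record_id)
            labels = self.read_labels(record_id)
            output[record_id] = self.package(signals, labels)
        return output

    @abstractmethod
    def list_records(self):
        """Return the identifiers of all records in the dataset."""

    @abstractmethod
    def read_signals(self, record_id):
        """Return the signal data for one record."""

    @abstractmethod
    def read_labels(self, record_id):
        """Return the labels for one record."""

    def package(self, signals, labels):
        # Shared default step that subclasses may override.
        return {"signals": signals, "labels": labels}


class ToyDataset(BaseDataset):
    """Toy subclass with hard-coded data, for demonstration only."""

    def list_records(self):
        return ["rec-a", "rec-b"]

    def read_signals(self, record_id):
        return [0.0, 1.0]

    def read_labels(self, record_id):
        return ["W", "N2"]


result = ToyDataset().standardize()
```

Adding a new dataset under this design means writing one subclass that fills in the abstract steps; the shared procedure in `standardize` stays untouched.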
Fig 2.
Class diagram depicting the inheritance hierarchy in CSDP.
The arrows point towards the parent class, which the child class inherits its functionality from. Abstract classes, abstract functions and abstract properties are shown with *.
Fig 3.
Class diagram depicting the data-serving part of the pipeline.
A generic Pipe class is used. Abstract classes, abstract functions and abstract properties are shown with *.
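One way a generic pipe abstraction can work is sketched below, under the assumption that each pipe transforms its input and hands the result to the next one. The names `Pipe`, `process`, `Scale`, `Clip` and `run_pipes` are hypothetical and are not taken from the class diagram.

```python
from abc import ABC, abstractmethod


class Pipe(ABC):
    """Hypothetical pipe interface: one transformation step."""

    @abstractmethod
    def process(self, item):
        """Transform the item and return the result."""


class Scale(Pipe):
    def __init__(self, factor):
        self.factor = factor

    def process(self, item):
        return [value * self.factor for value in item]


class Clip(Pipe):
    def __init__(self, low, high):
        self.low = low
        self.high = high

    def process(self, item):
        return [min(max(value, self.low), self.high) for value in item]


def run_pipes(pipes, item):
    # Apply each pipe in order, feeding the output of one into the next.
    for pipe in pipes:
        item = pipe.process(item)
    return item


served = run_pipes([Scale(2.0), Clip(-1.0, 1.0)], [0.2, 0.8, -3.0])
```

Because every step shares one interface, steps can be reordered or swapped without changing the code that drives them.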
Fig 4.
Class diagram depicting the design of the data augmentation.
Abstract classes, abstract functions and abstract properties are shown with *.
Fig 5.
Distribution of per-recording kappa scores achieved on all datasets.
The upper plot shows scores for the training set, while the lower plot shows scores for the hold-out sets.
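To make "per-recording kappa" concrete, the sketch below computes Cohen's kappa separately for each recording from its true and predicted labels. It assumes the scores in the figure are unweighted Cohen's kappa, which the caption does not state explicitly. The function names and the toy labels are illustrative.

```python
import math
from collections import Counter


def cohen_kappa(true_labels, predicted_labels):
    """Unweighted Cohen's kappa for two equally long label sequences."""
    if not true_labels or len(true_labels) != len(predicted_labels):
        raise ValueError("label sequences must be non-empty and equally long")
    n = len(true_labels)
    observed = sum(t == p for t, p in zip(true_labels, predicted_labels)) / n
    true_counts = Counter(true_labels)
    pred_counts = Counter(predicted_labels)
    # Chance agreement from the marginal label frequencies.
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if expected == 1.0:
        # Both sequences contain a single identical class: kappa is 0/0.
        # Returning NaN is a choice made for this sketch.
        return math.nan
    return (observed - expected) / (1.0 - expected)


def per_recording_kappa(recordings):
    """Map each recording id to the kappa of its (true, predicted) labels."""
    return {
        record_id: cohen_kappa(true_labels, predicted_labels)
        for record_id, (true_labels, predicted_labels) in recordings.items()
    }


# Toy labels for two hypothetical recordings.
scores = per_recording_kappa({
    "rec-1": (["W", "N1", "N2", "N2"], ["W", "N1", "N2", "N2"]),
    "rec-2": (["W", "W", "N2", "N2"], ["W", "N2", "N2", "N2"]),
})
```

In the second toy recording, observed agreement is 0.75 and chance agreement is 0.5, giving a kappa of 0.5. The distribution of such values over all recordings is what a plot of this kind summarizes.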