EntropyHub: An open-source toolkit for entropic time series analysis

An increasing number of studies across many research fields, from biomedical engineering to finance, are employing measures of entropy to quantify the regularity, variability or randomness of time series and image data. Entropy, as it relates to information theory and dynamical systems theory, can be estimated in many ways, with newly developed methods being continuously introduced in the scientific literature. Despite the growing interest in entropic time series and image analysis, there is a shortage of validated, open-source software tools that enable researchers to apply these methods. To date, packages for performing entropy analysis are often run using graphical user interfaces, lack the necessary supporting documentation, or do not include functions for more advanced entropy methods, such as cross-entropy, multiscale cross-entropy or bidimensional entropy. In light of this, this paper introduces EntropyHub, an open-source toolkit for performing entropic time series analysis in MATLAB, Python and Julia. EntropyHub (version 0.1) provides an extensive range of more than forty functions for estimating cross-, multiscale, multiscale cross-, and bidimensional entropy, each including a number of keyword arguments that allow the user to specify multiple parameters in the entropy calculation. Instructions for installation, descriptions of function syntax, and examples of use are fully detailed in the supporting documentation, available on the EntropyHub website, www.EntropyHub.xyz. Compatible with Windows, Mac and Linux operating systems, EntropyHub is hosted on GitHub, as well as on the native package repositories for MATLAB, Python and Julia. The goal of EntropyHub is to integrate the many established entropy methods into one complete resource, providing tools that make advanced entropic time series analysis straightforward and reproducible.


Introduction
Through the lens of probability, information and uncertainty can be viewed as inversely related: the more uncertainty there is, the more information we gain by removing that uncertainty. This is the principle behind Shannon's formulation of entropy, which quantifies uncertainty as it pertains to random processes [1]. Incorporating entropy estimators from information theory, probability theory and dynamical systems theory, EntropyHub features a wide range of functions to calculate the entropy of, and the cross-entropy between, univariate time series data. In contrast to other entropy-focused toolboxes, EntropyHub runs from the command line without the use of a GUI and provides many new benefits, including:
■ Functions to perform refined, composite, refined-composite and hierarchical multiscale entropy analysis using more than twenty-five different entropy and cross-entropy estimators (approximate entropy, cross-sample entropy, etc.).
■ Functions to calculate bidimensional entropies from two-dimensional (image) data.
■ An extensive range of function arguments to specify additional parameter values in the entropy calculation, including options for time-delayed state-space reconstruction and entropy value normalisation where possible.
■ Availability in multiple programming languages-MATLAB, Python, Julia-to enable open-source access and provide cross-platform translation of methods through consistent function syntax. To the best of the authors' knowledge, this is the first entropy-specific toolkit for the Julia language.
■ Compatibility with Windows, Mac and Linux operating systems.
■ Comprehensive documentation describing installation, function syntax, examples of use, and references to source literature. Documentation is available online at www.EntropyHub.xyz (or at MattWillFlood.github.io/EntropyHub), where it can also be downloaded as a booklet (EntropyHub Guide.pdf). Documentation specific to the MATLAB edition can also be found in the 'supplemental software' section of the MATLAB help browser after installation. Documentation specific to the Julia edition can also be found at MattWillFlood.github.io/EntropyHub.jl/stable.
As new measures enter the ever-growing entropy universe, EntropyHub aims to incorporate these measures accordingly. EntropyHub is licensed under the Apache license (version 2.0) and is available for use by all on condition that the present paper be cited in any scientific outputs realised using the EntropyHub toolkit.
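Shannon's measure, referenced at the start of this section, reduces to a few lines of code. The following Python sketch (an illustrative, toolkit-independent implementation) estimates entropy from the empirical symbol frequencies of a discrete sequence:

```python
import math
from collections import Counter

def shannon_entropy(sequence, base=2):
    """Shannon entropy (bits by default) of a discrete sequence,
    estimated from empirical symbol probabilities."""
    counts = Counter(sequence)
    n = len(sequence)
    # p * log(1/p) summed over observed symbols; log(n/c) = -log(c/n)
    return sum((c / n) * math.log(n / c, base) for c in counts.values())

# A fair coin (two equiprobable symbols) carries 1 bit per toss,
# while a fully predictable sequence carries none.
print(shannon_entropy("HTHTHTHT"))  # -> 1.0
print(shannon_entropy("HHHHHHHH"))  # -> 0.0
```

Removing the uncertainty about the next toss of the fair coin yields exactly one bit of information; the constant sequence has no uncertainty to remove.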
The following sections of the paper outline the toolkit contents, the steps for installation, and how to access the supporting documentation.

Toolkit contents and functionality
Functions in the EntropyHub toolkit fall into five categories. The first three categories-Base, Cross and Bidimensional-refer to standalone entropy estimators distinguished according to the type of input data they analyse.
The remaining two categories-Multiscale and Multiscale Cross-relate to multiscale entropy methods using the entropy estimators from the Base and Cross categories, respectively.
■ Multiscale functions return the multiscale entropy of a single univariate time series, calculated using any of the Base entropy estimators, e.g. multiscale entropy (MSEn), composite multiscale entropy (cMSEn), etc.
■ Multiscale Cross functions return the multiscale cross-entropy between two univariate time series, calculated using any of the Cross entropy estimators, e.g. cross-multiscale entropy (XMSEn), refined multiscale cross-entropy (rXMSEn), etc.
A list of all functions available in version 0.1 of the EntropyHub toolkit is provided in Table 2. As more entropy methods are identified, these will be added to newer versions of the toolkit.
One of the main advantages of EntropyHub is the ability to specify numerous parameters used in the entropy calculation by entering optional keyword function arguments. The default value of each keyword argument is based on the value proposed in the original source literature for that method. However, blindly analysing time series data using these arguments is strongly discouraged. Drawing conclusions about data based on entropy values is only valid when the parameters used to calculate those values accurately capture the underlying dynamics of the data.
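To illustrate why parameter choice matters, the following Python sketch implements the standard Richman-Moorman sample entropy from its published definition (an independent reference sketch, not EntropyHub's own code) and shows how the tolerance threshold r alters the estimate obtained from the very same signal:

```python
import numpy as np

def sample_entropy(x, m=2, r=None):
    """Sample entropy per the standard definition: -ln(A/B), where B and A
    count template matches of length m and m+1 under a Chebyshev distance
    tolerance r. Illustrative sketch, not EntropyHub's implementation."""
    x = np.asarray(x, dtype=float)
    if r is None:
        r = 0.2 * x.std()  # common default: 20% of the signal SD
    N = len(x)

    def count_matches(mm):
        templates = np.array([x[i:i + mm] for i in range(N - m)])
        count = 0
        for i in range(len(templates)):
            # Chebyshev distance from template i to every later template
            d = np.max(np.abs(templates[i + 1:] - templates[i]), axis=1)
            count += np.sum(d <= r)
        return count

    B = count_matches(m)      # matches of length m
    A = count_matches(m + 1)  # matches of length m + 1
    return -np.log(A / B) if A > 0 and B > 0 else np.inf

rng = np.random.default_rng(0)
noise = rng.standard_normal(500)
# The same data, two tolerance choices: a looser r admits more template
# matches and generally lowers the estimate, so conclusions drawn from the
# values depend directly on the parameters used.
print(sample_entropy(noise, r=0.2 * noise.std()))
print(sample_entropy(noise, r=0.5 * noise.std()))
```

A perfectly regular signal yields an estimate of zero regardless of r, which is a useful sanity check when exploring parameter settings.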
With certain Base and Cross functions, it is possible to calculate entropy using variant methods of the main estimator. For example, with the function for permutation entropy (PermEn) one can calculate the edge [65], weighted [70], amplitude-aware [11], modified [68], fine-grained [67], and uniform-quantization [71] permutation entropy variants, in addition to the original method introduced by Bandt and Pompe [66]. It is important to note that while the primary variable returned by each function is the estimated entropy value, most functions provide secondary and tertiary variables that may be of additional interest to the user. Examples include the dispersion entropy function (DispEn) [8], which also returns the reverse dispersion entropy [50]; the spectral entropy function (SpecEn) [74], which also returns the band-spectral entropy [102]; and the Kolmogorov entropy function (K2En) [63], which also returns the correlation sum estimate. Furthermore, every Multiscale and Multiscale Cross function has the option to plot the multiscale (cross-) entropy curve (Fig 1), and some Base functions allow one to plot spatial representations of the original time series (Figs 2 and 3).
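The original Bandt-Pompe method underlying PermEn can be sketched in a few lines of Python. This is an illustrative implementation written from the published definition, not the toolkit's code; the variants listed above modify how patterns are extracted or weighted, but the core procedure is the same:

```python
import math

import numpy as np

def permutation_entropy(x, m=3, tau=1, normalise=True):
    """Original Bandt-Pompe permutation entropy: Shannon entropy of the
    distribution of ordinal patterns of embedding dimension m and time
    delay tau. Illustrative sketch only."""
    x = np.asarray(x, dtype=float)
    n = len(x) - (m - 1) * tau  # number of embedded vectors
    counts = {}
    for i in range(n):
        window = x[i:i + (m - 1) * tau + 1:tau]
        pattern = tuple(np.argsort(window, kind="stable"))  # ordinal pattern
        counts[pattern] = counts.get(pattern, 0) + 1
    # p * log(1/p) summed over observed patterns
    h = sum((c / n) * math.log(n / c) for c in counts.values())
    return h / math.log(math.factorial(m)) if normalise else h

# A strictly increasing ramp contains a single ordinal pattern (entropy 0),
# whereas white noise visits all m! patterns almost uniformly (entropy ~1).
print(permutation_entropy(np.arange(100)))  # -> 0.0
rng = np.random.default_rng(1)
print(permutation_entropy(rng.standard_normal(5000)))
```

Exposing m and tau as keyword arguments with defaults from the source literature mirrors the calling convention used throughout the toolkit.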

Installation and dependencies
Major version releases of the EntropyHub toolkit can be directly installed through the native package repository for the MATLAB, Python and Julia programming environments. Beta development versions can be downloaded and installed from the directories of each programming language hosted on the EntropyHub GitHub repository (github.com/MattWillFlood/EntropyHub). EntropyHub is compatible with Windows, Mac and Linux operating systems.

MATLAB
There are two additional toolboxes from the MATLAB product family that are required to experience the full functionality of the EntropyHub toolkit-the Signal Processing Toolbox and the Statistics and Machine Learning Toolbox. However, most functions will work without these toolboxes. Option 1, installation through the MATLAB Add-On Explorer, is recommended:
1. Open the MATLAB Add-Ons menu (S1a Fig).
2. In the search bar, search for "EntropyHub" (S1b Fig).
3. Open the resulting link and click 'add' in the top-right corner (S1c Fig).
4. Follow the instructions to install the toolbox (S1d Fig).
Option 2: download the toolbox from the EntropyHub GitHub repository and install it manually.

Python
There are several modules required to use EntropyHub in Python-NumPy [103], SciPy [104], Matplotlib [105], PyEMD [106], and Requests. These modules will be automatically installed alongside EntropyHub if not already installed. EntropyHub was designed for Python 3 and thus is not intended for use with Python 2 or Python versions < 3.6. EntropyHub Python functions are primarily built on top of the NumPy module for mathematical computation [103], so vector or matrix variables are returned as NumPy array objects.
There are two ways to install EntropyHub in Python. Option 1, installation via the pip package manager (pip install EntropyHub), is strongly recommended. The installation can be verified by importing the module and calling EntropyHub.greet()

Supporting documentation and help
To help users get the most out of EntropyHub, extensive documentation has been developed covering all aspects of the toolkit (www.EntropyHub.xyz/#documentation-help). Included in the documentation are:
■ Instructions for installation.
■ Thorough descriptions of the application programming interface (API) syntax-function names, keyword arguments, output values, etc.
■ References to the original source literature for each method.
■ Licensing and terms of use.
■ Examples of use.
Supporting documentation is available in various formats from the following sources.

www.EntropyHub.xyz
The EntropyHub website, www.EntropyHub.xyz (also available at MattWillFlood.github.io/EntropyHub), is the primary source of information on the toolkit, with dedicated sections for MATLAB, Python and Julia, as well as release updates and links to helpful internet resources.

EntropyHub guide
The EntropyHub Guide.pdf is the toolkit user manual and can be downloaded from the documentation section of the EntropyHub website or from the EntropyHub GitHub repository. In addition to the information given on the website, the EntropyHub Guide.pdf document provides some extra material, such as plots of fuzzy functions used for fuzzy entropy (FuzzEn) calculation, or plots of symbolic mapping procedures used in dispersion (DispEn) or symbolic-dynamic entropy (SyDyEn).

EntropyHub.jl
Custom documentation for the Julia edition of the toolkit can also be found at MattWillFlood.github.io/EntropyHub.jl (linked to the EntropyHub website). Following Julia package convention, the Julia edition is given the suffix '.jl' and is hosted in a standalone GitHub repository linked to the main EntropyHub repository.

Seeking further help
Within each programming environment, information about a specific function can be displayed in the command prompt by accessing the function docstrings. For example, to display information about the approximate entropy function (ApEn) at the Julia help prompt (in MATLAB, use help ApEn; in Python, help(EntropyHub.ApEn)), type:

help?> ApEn
Contact. For help with topics not addressed in the documentation, users can seek help by contacting the toolkit developers at help@entropyhub.xyz. Every effort will be made to promptly respond to all queries received.
To ensure that EntropyHub works as intended, with accurate and robust algorithms at its core, users are encouraged to report any potential bugs or errors discovered. The recommended way to report issues is to post under the 'Issues' tab on the EntropyHub GitHub repository. Doing so allows other users to find answers to common issues and contribute their own solutions. Alternatively, one can notify the package developers of any issues via email to fix@entropyhub.xyz.
Continuous integration of new and improved entropy methods into the toolkit is a core principle of the EntropyHub project. Thus, requests and suggestions for new features are welcomed, as are contributions and offers for collaboration. EntropyHub developers will work with collaborators to ensure that contributions are valid, translated into MATLAB, Python and Julia, and follow the formatting adopted throughout the toolkit. Please contact info@entropyhub.xyz with any such proposals.

Validation
Included in EntropyHub are a number of sample time series and image datasets which can be used to test the validity of the toolkit functions (Fig 4). Included in these datasets are random number sequences (Gaussian, uniform, random integers), chaotic attractors (Lorenz, Hénon), and matrix representations of images (Mandelbrot fractal, random numbers, etc.). Importing these datasets into the programming environment is done using the ExampleData function (Table 2), which requires an internet connection. Every example presented in the supporting documentation on the EntropyHub website, in the MATLAB help browser, or in the EntropyHub Guide.pdf employs the same sample datasets provided by the ExampleData function. Therefore, users can replicate these examples verbatim to verify that the toolkit functions properly on their computer system. The following subsections demonstrate the implementation of several Base, Cross-, Bidimensional, Multiscale and Multiscale Cross-entropy methods as a proof-of-principle validation. Note: the examples in the following subsections use MATLAB syntax, but the implementation of these functions and the values they return are the same when using Python and Julia.
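For readers working offline (ExampleData requires an internet connection), comparable test signals can be synthesised directly. The following NumPy sketch generates stand-ins for several of the dataset types listed above; the variable names and exact parameter choices are our own and are illustrative rather than identical to the hosted datasets:

```python
import numpy as np

rng = np.random.default_rng(42)

# Random number sequences comparable to the sample datasets:
gaussian = rng.normal(0.0, 1.0, 5000)            # normally distributed noise
uniform = rng.uniform(0.0, 1.0, 4096)            # uniformly distributed noise
gaussian_mat = rng.normal(0.0, 1.0, (60, 120))   # Gaussian random matrix

# Henon map (a = 1.4, b = 0.3), one of the chaotic attractors mentioned:
x, y = np.zeros(1000), np.zeros(1000)
for i in range(999):
    x[i + 1] = 1 - 1.4 * x[i] ** 2 + y[i]
    y[i + 1] = 0.3 * x[i]

# Linear swept-frequency cosine comparable to the chirp used later in this
# section (instantaneous frequency rising from f0 = 0.01 to f1 = 0.025):
t = np.arange(5000)
f0, f1, t1 = 0.01, 0.025, 4000
sig = np.cos(2 * np.pi * (f0 * t + (f1 - f0) / (2 * t1) * t ** 2))
```

Any of the toolkit's Base estimators can then be applied to these arrays, although the resulting values will differ from the documented examples because the underlying random samples differ.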

Base entropy
A sequence of normally distributed random numbers (Fig 4A) is imported and its entropy estimated with one of the Base entropy functions.

PLOS ONE | Entropy analysis in MatLab, Python and Julia

Cross-entropy

The resulting cross-entropy estimate reflects the high degree of deterministic structure shared between the x and y components of the Lorenz system.

Bidimensional entropy
A matrix of normally distributed (Gaussian) random numbers is imported (Fig 4C; N = 60 × 120, mean = 0, SD = 1) and bidimensional dispersion entropy is estimated with a template submatrix size of 5 and all other parameters set to default values (time delay = 1, number of symbols = 3, symbolic mapping transform = normal cumulative distribution function).

>> X = ExampleData('gaussian_Mat');
>> DispEn2D(X, 'm', 5)

8.77894

The high value of the bidimensional dispersion entropy estimate corresponds to those previously reported for Gaussian white noise [83].
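The idea behind the bidimensional dispersion estimate can be sketched in Python. This is a deliberately simplified reduction of the published method (normal-CDF symbol mapping, m-by-m pattern counting, Shannon entropy of the pattern distribution), not EntropyHub's DispEn2D implementation, so its values are not expected to match the toolkit's output:

```python
import math
from collections import Counter

import numpy as np

def dispen2d_sketch(X, m=2, c=3):
    """Simplified bidimensional dispersion entropy: map the matrix to c
    symbols via the normal CDF, slide an m-by-m template over it, count
    the symbol patterns, and take the Shannon entropy of their distribution.
    Illustrative only."""
    X = np.asarray(X, dtype=float)
    mu, sigma = X.mean(), X.std()
    # Normal cumulative distribution function, mapped onto symbols 1..c
    ncdf = 0.5 * (1 + np.vectorize(math.erf)((X - mu) / (sigma * math.sqrt(2))))
    symbols = np.clip(np.ceil(ncdf * c), 1, c).astype(int)
    H, W = symbols.shape
    patterns = Counter(
        tuple(symbols[i:i + m, j:j + m].ravel())
        for i in range(H - m + 1)
        for j in range(W - m + 1)
    )
    n = sum(patterns.values())
    # p * log(1/p) summed over observed dispersion patterns
    return sum((k / n) * math.log(n / k) for k in patterns.values())

rng = np.random.default_rng(7)
img = rng.standard_normal((60, 120))
print(dispen2d_sketch(img, m=2, c=3))
```

As in the MATLAB example above, an unstructured Gaussian random matrix yields a value near the maximum possible for the chosen template size and symbol count, whereas a strongly patterned image (e.g. a checkerboard) yields a much lower value.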

Multiscale entropy
A chirp signal (N = 5000, t0 = 1, t1 = 4000, normalised instantaneous frequency at t0 = 0.01 Hz, instantaneous frequency at t1 = 0.025 Hz) is imported and multiscale sample entropy is estimated over 5 coarse-grained temporal scales using the default parameters (embedding dimension = 2, time delay = 1, threshold = 0.2 × SD[X]). Note: a multiscale entropy object (Mobj) must be used with multiscale entropy functions. The chirp signal imported in this example represents a swept-frequency cosine with a linearly decreasing period length. The coarse-graining procedure of multiscale entropy [15] functions as a low-pass filter of the original time series, with a lower cut-off frequency at each increasing time scale. Therefore, the coarse-graining procedure increasingly diminishes the localised auto-correlation of the chirp signal at each temporal scale, increasing the entropy. This is reflected in the increasing sample entropy values, from low (0.2738) to moderate (0.6759), returned by the MSEn function.
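The coarse-graining step described above is simply a non-overlapping moving average, which is what gives it its low-pass character. A minimal NumPy sketch of the standard procedure (written from the published definition, not taken from the toolkit):

```python
import numpy as np

def coarse_grain(x, scale):
    """Coarse-graining step of multiscale entropy: average consecutive,
    non-overlapping windows of length `scale`, discarding any remainder."""
    x = np.asarray(x, dtype=float)
    n = (len(x) // scale) * scale  # trim so the length divides evenly
    return x[:n].reshape(-1, scale).mean(axis=1)

print(coarse_grain([1, 2, 3, 4, 5, 6], 2))  # -> [1.5 3.5 5.5]
```

A multiscale entropy function then applies a Base estimator (sample entropy in the example above) to the coarse-grained series at each scale, producing the multiscale entropy curve.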

Multiscale cross-entropy
Two sequences of uniformly distributed random numbers (N = 4096, range = [0, 1]) are imported and multiscale cross-distribution entropy is estimated over 7 coarse-grained temporal scales with the default parameters (embedding dimension = 2, time delay = 1, histogram binning method = 'sturges', normalisation with respect to number of histogram bins = true).

Discussion
The growing number of entropy methods reported in the scientific literature for time series and image analysis warrants new software tools that enable researchers to apply such methods [2,3,38]. Currently, there is a dearth of validated, open-source tools that implement a comprehensive array of entropy methods at the command line with options to modify multiple parameter values. EntropyHub is the first toolkit to provide this functionality in a package that is available in three programming languages (MATLAB, Python, and Julia) with consistent syntax, and is supported by extensive documentation (Table 3). To the best of the authors' knowledge, EntropyHub is also the first toolkit to provide multiple functions for bidimensional entropy [82-86], multiscale entropy [14,15,35,90,96] and multiscale cross-entropy analyses [40,97,98] all in one package. Specific programming language editions of the EntropyHub toolkit are hosted on the native package repositories for MATLAB, Python and Julia (Table 3), facilitating straightforward installation and version updates. EntropyHub is compatible with Windows, Mac and Linux operating systems, and is open for use under the Apache License (Version 2.0) on condition that the present manuscript be cited in any outputs achieved through the use of the toolkit.
The application of entropy in the study of time series data is becoming more common in all manner of research fields such as engineering [17,18], medicine [19-23] and finance [24-27]. The broad range of entropy functions provided by EntropyHub in multiple programming languages can serve to support researchers in these fields by characterising the uncertainty and complexity of time series data with various stochastic, time-frequency and chaotic properties. Additionally, this is the first toolkit to provide several functions for performing bidimensional (2D) entropy analysis, which can enable users to estimate the entropy of images and matrix data.
The goal of EntropyHub is to continually integrate newly developed entropy methods and serve as a cohesive computing resource for all entropy-based analysis, independent of the application or research field. To achieve this goal, suggestions for new features and contributions from other researchers are welcomed.