Analyzing the fine structure of distributions

One aim of data mining is the identification of interesting structures in data. For better analytical results, the basic properties of an empirical distribution, such as skewness and possible clipping, i.e., hard limits on the value range, need to be assessed. Of particular interest is the question of whether the data originate from one process or contain subsets related to different states of the data-generating process. Data visualization tools should deliver a clear picture of the univariate probability density function (PDF) for each feature. Visualization tools for PDFs typically use kernel density estimates and include the classical histogram as well as modern tools such as ridgeline plots, bean plots and violin plots. If the density estimation parameters remain at their default settings, conventional methods pose several problems when visualizing the PDF of uniform, multimodal and skewed distributions as well as distributions with clipped data. For that reason, a new visualization tool called the mirrored density plot (MD plot), which is specifically designed to discover interesting structures in continuous features, is proposed. The MD plot does not require adjusting any parameters of density estimation, which may make it particularly compelling to non-experts. The visualization tools in question are evaluated against statistical tests with regard to typical challenges of exploratory distribution analysis. The results of the evaluation are presented using bimodal Gaussian mixtures, skewed distributions and several features with already published PDFs. In an exploratory data analysis of 12 features describing quarterly financial statements, where statistical testing poses great difficulty, only the MD plot can identify the structure of their PDFs. In sum, the MD plot outperforms the methods mentioned above.


Introduction
In exploratory distribution analysis, it is essential to investigate the structures of continuous features while making sure that such an investigation does not mislead the researcher into false assumptions.
Given a feature in the data space, there are several approaches to evaluate univariate structures using indications of the quantity and range of values, e.g., the quantile-quantile plot (1), the histogram, or the cumulative distribution function and probability density function (pdf). If the goal is to evaluate many features simultaneously, three approaches are of particular interest: the Box-Whisker diagram (box plot) (2), the violin plot (3) and the bean plot (4). The box plot is almost unable to visualize multimodality (2) and is therefore disregarded in this work. As suggested by the name, the violin plot was particularly intended to identify multimodality by exposing a waist between two modes of a distribution.
In exploratory statistics, univariate density estimation is a trying task, especially for non-experts of the field. Changing default parameters of available software, such as the bandwidth or the kernel density estimator, can lead to better or worse results for the methods mentioned above.
However, in a strictly exploratory setting, as well as for the evaluation of quality measures for supervised or unsupervised machine learning methods, it is difficult to set such parameters without having a prior model of the data or results of the evaluation. Hence, non-experts typically use the default choices. Moreover, it is challenging to consider the intrinsic assumptions of common density estimation approaches, which leads to a preference for using the most common methods in their default settings.
With default parameter settings, the schematic plots of violin plots, bean plots or histograms provide misleading visualizations, which will be illustrated on several bodies of data. This motivates a new graphical tool enabling a better understanding of the data at hand. This work proposes a strictly data-driven schematic plot called the Mirrored-Density plot (MD plot), based on Pareto density estimation (PDE). The PDE approach is particularly suitable for detecting structures in continuous data, and its kernel density estimation does not require any parameters to be set. The MD plot is compared to the conventional methods of the violin plot, the so-called bean plot and the histogram. This work will show that the MD plot investigates distributions of data with more sensitivity than the conventional methods in the case of multimodal or skewed distributions. Statistical testing will be used as an indicator of the sensitivity of all methods regarding skewness and multimodality. For exploratory data analysis in a high-dimensional case, descriptive statistics will be used to show that the bean plot results in a misleading visualization, contrary to the MD plot.

Methods
The methods section is divided into three parts. First, we outline how the performance of visualization tools is investigated. The focus of interest in this work lies in the visualization of the basic properties of the empirical distribution of each feature separately, which means that our interest is restricted to univariate density estimation and visualizations that can present more than one feature in one plot. Such approaches are usually called schematic plots. The best-known representative is the Box-Whisker diagram (box plot) (2). However, box plots are unable to visualize multimodality (e.g., (5)) and are therefore not investigated further here. In the next section, we introduce the visualization tools that will be compared. In the last section, we introduce the MD plot.

Performance Comparison
In this work, three steps of comparison are applied. First, artificial features are generated by specifically defined sampling approaches. Thus, the basic properties of the investigated distributions are well defined as long as the sample size does not become too small. To account for variance in sampling, we perform 100 iterations of sampling and test the artificial data sets for multimodality and skewness prior to visualizing them with schematic plots.
The sensitivity for multimodality is compared to Hartigans' dip statistic (6) because it has the highest sensitivity in distinguishing unimodality from non-unimodality compared to other approaches (7). For skewness, the D'Agostino test of skewness (8) is used to distinguish skewed distributions from normal distributions. In the next step, natural features are selected for which the basic properties of the empirical distributions are already known. The first and second steps outline the problems of conventional methods. In the last step, we exploratively investigate a new data set with several features of unknown basic properties in order to summarize the challenges of visualizing the estimated probability density function. In such a typical data mining setting, it would be very challenging to adjust the parameters of the conventional visualization tools investigated here. We compare the visualizations to basic descriptive statistics and show which visualization tools do not visualize the shapes of the pdf accurately. Table 1 summarizes the interesting basic properties from the perspective of data mining and the methods used to compare performance.
Comparing visualizations is challenging because they have the same issues as the estimation of quantiles or clustering algorithms like k-means or Ward: they depend on the specific implementation (cf. (16), (17), (18,19)). Therefore, this work restricts the comparison to several conventional methods and specifies the programming language, package and pdf estimation approach used in order to outline several relevant problems for visualizing the basic properties of the pdf. To make sure that the MD plot introduced here does not depend on the specific implementation, we provide the package in two different programming languages (R and Python), reproducing the R results presented below in the Python tutorial attached to this work.

Visualization Tools
Usually, univariate density estimation is based on either finite mixture models, variable kernel estimates or uniform kernel estimates (20). Finite mixture models attempt to find a superposition of parameterized functions, typically Gaussians, which best account for the data (21). In the case of kernel-based approaches, the actual probability density function is estimated using local approximations (21): the local approximations are parameterized such that only data points within a certain distance of a selected point influence the shape of the kernel function, which is called the (band-)width or radius of the kernel (21). Variable kernel methods adjust the radius of the kernel, and uniform kernel algorithms use a fixed global radius (21). Histograms use a fixed global radius defining the width of a bin (binwidth). The binwidth parameter is critical for the visualized basic properties of the pdf, and in this work only default parameters will be used because laymen would probably not adjust them. However, approaches exist for a more elaborate choice depending on the intrinsic assumptions about the data (e.g., (22)). As an example, we use the histograms of plotly (23) because it can be used in R, Matlab or Python. This work concentrates on visualizing the basic properties of the estimated probability density function (pdf), which will be called, in short, the distribution of the variable.
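The difference between these estimator families can be made concrete with a minimal sketch of a uniform (fixed-radius) kernel estimate in plain Python; the function name and normalization are illustrative and not taken from any of the cited packages:

```python
def uniform_kde(data, x, radius):
    """Uniform (fixed-radius) kernel estimate of the density at x:
    the fraction of points within `radius` of x, normalised by the
    window length 2*radius so the estimate integrates to roughly 1."""
    inside = sum(1 for v in data if abs(v - x) <= radius)
    return inside / (len(data) * 2 * radius)

# Ten points concentrated at 0 give density 0.5 over the window [-1, 1].
print(uniform_kde([0.0] * 10, 0.0, 1.0))  # 0.5
```

A histogram bin is the same idea with the evaluation points fixed at the bin centers; a variable kernel method would instead let `radius` change from point to point.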
The first variation to visualize the pdf was the vase plot (24), where the box of a box plot is replaced with a symmetrical display of the estimated density (5). The box plot itself visualizes only the statistical summary of a feature. A further improvement was the violin plot, which mirrors an estimated pdf in a way that the visualization looks similar to a box plot. "The bean plot (4) is a further enhancement that adds a rug that is showing every value and a line that shows the mean. The name is inspired by the appearance of the plot: the shape of the density looks like the outside of a bean pod, and the rug plot looks like the seeds within" (5). The violin plot uses a nonparametric density estimation based on a smooth kernel function with a fixed global radius (25). The R package 'vioplot' on CRAN (26) serves as a representative for this work, using the density estimation of the R package 'sm' on CRAN (27) with the bandwidth defined by a Gaussian variance. An alternative, commonly applied approach is to use the density estimation of the R package 'stats' (28), where the bandwidth is usually computed by estimating the mean integrated square error (29), but several approaches can be chosen. In contrast to the violin plot, the bean plot in the R package 'beanplot' on CRAN (4) redefines the bandwidth following (30).
As remarked by Bowman and Azzalini, the density estimation critically depends on the choice of the width of the kernel function (25).
One of the most common ways to create a violin plot in Python is to use the visualization package 'seaborn' (31), which extends the Python package 'matplotlib' with statistical plots like the violin plot. Seaborn uses Gaussian kernels for kernel density estimation from the Python package 'scipy' (32), where the bandwidth is by default set by Scott's rule (21) (see https://github.com/scipy/scipy/blob/v1.3.0/scipy/stats/kde.py#L43-L637).
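For one-dimensional data, Scott's rule can be reproduced in a few lines, which makes explicit how the default bandwidth is driven purely by sample size and spread. This is a sketch of the rule itself, not of scipy's code (scipy scales the data covariance by n**(-1/(d+4)), which reduces to the same expression for d = 1):

```python
import statistics

def scott_bandwidth(data):
    """Scott's rule in one dimension: h = sigma * n**(-1/5),
    i.e. the bandwidth shrinks slowly as the sample grows."""
    return statistics.pstdev(data) * len(data) ** (-1 / 5)

# 100 points with unit standard deviation -> h = 100**(-0.2), about 0.398.
h = scott_bandwidth([0.0, 2.0] * 50)
```

Because the rule assumes roughly Gaussian data, it tends to oversmooth multimodal or clipped features, which is exactly the failure mode the experiments below exercise.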

Mirrored Density Plot (MD plot)
A special case of uniform kernel estimates is density estimation using the number of points within a hypersphere of a fixed radius around each given data point. There, the number of points within a hypersphere around each data point is used for the density estimation at the center of the hypersphere. In Pareto Density Estimation (PDE), the radius for hypersphere density estimation is chosen optimally in an information-theoretic sense (20). Information optimization calls for a radius with which the hyperspheres contain a maximum of information using minimal volume (20). If a hypersphere contains on average about 20% of the data, it yields more than 80% of the possible information any subset of the data can have (20). PDE is particularly suitable for the discovery of structures in continuous data and allows the discovery of mixtures of Gaussians (9).
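The core idea, a radius whose neighbourhoods hold about 20% of the data on average, can be sketched in plain Python by searching over candidate radii. This is a simplified illustration only; the published PDE algorithm derives the Pareto radius differently, via distance percentiles and information optimization, and the function names here are hypothetical:

```python
def avg_fraction_within(data, r):
    """Average fraction of points falling inside a radius-r
    neighbourhood centred on each data point (1-D 'hyperspheres')."""
    n = len(data)
    return sum(sum(1 for v in data if abs(v - x) <= r) for x in data) / (n * n)

def pareto_like_radius(data, target=0.2, steps=100):
    """Scan candidate radii spanning the data range and return the one
    whose average neighbourhood occupancy is closest to `target`."""
    spread = max(data) - min(data)
    candidates = [spread * k / steps for k in range(1, steps + 1)]
    return min(candidates, key=lambda r: abs(avg_fraction_within(data, r) - target))

# For 100 evenly spaced points the chosen radius covers roughly a fifth
# of the data range, as the 20% heuristic suggests.
radius = pareto_like_radius([float(i) for i in range(100)])
```

The point of the sketch is that the radius is derived from the data alone, which is why the MD plot needs no user-set bandwidth.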
For this work, the general idea of mirroring the pdf in a visualization is combined with the PDE approach of density estimation, resulting in the Mirrored-Density plot (MD plot). Using the theoretical insights of (20) for the Pareto radius and (22) for the number of kernels, the algorithm of PDE is implemented in the package 'DataVisualizations' on CRAN (33) and independently implemented in Python (34). To provide an easy-to-use method for non-experts, the MD plot allows investigating the distributions of many variables after common transformations (symmetric log, robust normalization (35), percentage), with automatic sampling in the case of large data sets and several statistical tests for normal distributions. If all tests agree that a variable is Gaussian distributed, then the plot of the variable is automatically overlaid with a normal distribution whose robustly estimated mean and variance equal those of the data. This step allows marking possibly non-Gaussian distributions for single-variable investigation with a quantile-quantile plot in cases where statistical testing may be insensitive. In the default mode, the features are ordered by convex, concave, unimodal and non-unimodal "distribution shapes". The MD plot can be applied using the R package 'DataVisualizations' on CRAN (33). The Python implementation of the MD plot is provided in the Python package 'md_plot' on PyPI (34). The vignettes describing the usage and providing the data are attached to this work for the two most common data science programming languages, Python and R. In the next section, the visual performance in indicating the correct distribution of features is investigated for the histogram, violin plot and bean plot in comparison to the MD plot.

Results
Initially, a random sample of 1000 points of a uniform distribution was drawn and visualized by a commonly used histogram method, violin plot, bean plot and MD plot (Fig. 1). In a pdf visualization of a uniform distribution, a straight line is expected, which can have minor fluctuations depending on the random number generator used (range [-2,2], generated with R 3.5.1, runif function). Contrary to this expectation, the histogram and bean plot indicate multimodality, and the bean plot and violin plot bend the pdf line in the direction of the end points. The visualization of this sample in Python with the package 'seaborn' (31) shows a tendency towards multimodality (SI.C, Fig. 13). Statistical testing with Hartigans' dip test (6) and the D'Agostino test of skewness (8) yields p(N=1000, D = 0.01215) = 0.44 and p(N=1000, z = 0.59) = 0.55, respectively, indicating that this sample is unimodal and not skewed. This insight leads to several experiments and one exploratory investigation of a high-dimensional data set. The first two experiments investigate multimodality and skewness of data. The third experiment investigates the clipping of data because it is often applied in data science. The fourth experiment uses a well-investigated clipped variable which is log-normal distributed and possesses several modes (36). In the exploratory investigation, descriptive statistics in a high-dimensional case are used to outline major differences between the bean plot and the MD plot. In the last experiment, the effect of the range of values on the schematic plots is outlined.
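The expectation of a flat line can be checked directly on such a sample: with 1000 uniform draws over [-2, 2], every equal-width bin should hold roughly the same count, with fluctuations on the order of the square root of the expected count. The following is a plain-Python illustration with a fixed seed, independent of any plotting package and of the R sample used in the paper:

```python
import random

random.seed(0)
sample = [random.uniform(-2.0, 2.0) for _ in range(1000)]

# Eight equal-width bins over [-2, 2]; the expected count per bin is 125.
counts = [0] * 8
for v in sample:
    counts[min(int((v + 2.0) / 0.5), 7)] += 1

# Counts stay near 125 up to sampling noise (sd about 10.5 per bin), so
# apparent "modes" in a default histogram of such data are artefacts of
# the binwidth, in line with the non-significant dip test.
```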

Experiment I: Multimodality versus Unimodality
Two Gaussians, one of them with a changing mean, were used in order to investigate the sensitivity for bimodality of the histogram, violin plot, bean plot and MD plot. Randomized samples of 15,500 points were drawn from each Gaussian, i.e., each point stems with 50% probability from each. The sample of the first Gaussian N(m=0, sd=1) remained unchanged, and the second Gaussian N(m=i, sd=1) changed its mean through a range of values. In effect, the distance between the two modes of a Gaussian mixture is varied with each change of the mean of the second Gaussian. For statistical testing with Hartigans' dip test, 100 iterations were performed in order to take the variance of the random number generators and statistical method into account. In Fig. 2 it is visible that, starting with a mean of 2.4, a significant p-value around 0.05 is probable, and starting with a mean of 2.5, every p-value will be below 0.01. This result is visualized in Fig. 3. The bimodality is visible in the bean plot starting with a mean equal to 2.4 and in the MD plot starting with a mean equal to 2.4, but a robustly estimated Gaussian in magenta is overlaid, making bimodality visible starting from a mean of 2.2. Hartigans' dip statistic (6) agrees with these two schematic plots. In contrast, the R violin plot does not show a bimodal distribution (Fig. 3), while the Python violin plot shows the bimodality starting with a mean equal to 2.4 (SI.C, Fig. 14). Histograms are less sensitive, showing a bimodal distribution beginning with a mean of 2.5.
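The sampling scheme of this experiment can be sketched as follows (the paper's experiments were run in R; the function name and seed here are illustrative):

```python
import random

def gaussian_mixture_sample(n, mean2, seed=0):
    """Draw n points, each with probability 0.5 from N(0, 1) and
    otherwise from N(mean2, 1), mimicking the two-Gaussian design."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) if rng.random() < 0.5 else rng.gauss(mean2, 1.0)
            for _ in range(n)]

# Around mean2 = 2.4 the two modes separate enough for the dip test,
# the bean plot and the MD plot to flag bimodality.
sample = gaussian_mixture_sample(31000, 2.4)
```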

Experiment II: Skewness versus Normality
Next, an artificial feature with a skewed normal distribution is generated by the sampling method of the R package 'fGarch' available on CRAN (37). For the skewed Gaussian, large randomized samples of 15,000 points were drawn for each value of the skewness parameter. One hundred iterations were performed, and the D'Agostino test of skewness (8) was applied; it showed no significant results for skewness in the range ]0.95, 1.05[ (Fig. 4). Skewness is visible in the bean plot and MD plot (Fig. 5) but not in the violin plot. Unlike in the R version, the skewness is visible in the Python version of the violin plot (SI.C, Fig. 15), although slightly less sensitively than in the bean plot and MD plot. In the histogram, the skewness of the distribution is difficult to recognize (Fig. 5).
The bean plot and MD plot are slightly less sensitive regarding skewed distributions compared to statistical testing (Fig. 4).
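In Python, a comparably skewed sample can be generated without fGarch via Azzalini's skew-normal construction. Note that this uses the shape parameter alpha, not the xi parametrization of fGarch's Fernandez-Steel skew normal, so the two skewness parameters are not on the same scale; the function name and seed are illustrative:

```python
import math
import random

def skew_normal_sample(n, alpha, seed=0):
    """Azzalini's construction: with independent Z1, Z2 ~ N(0, 1) and
    delta = alpha / sqrt(1 + alpha**2), the value
    delta*|Z1| + sqrt(1 - delta**2)*Z2 follows a skew normal with
    shape alpha (alpha = 0 recovers the standard normal)."""
    rng = random.Random(seed)
    delta = alpha / math.sqrt(1.0 + alpha * alpha)
    tail = math.sqrt(1.0 - delta * delta)
    return [delta * abs(rng.gauss(0.0, 1.0)) + tail * rng.gauss(0.0, 1.0)
            for _ in range(n)]

# A strongly right-skewed sample of the size used in the experiment.
sample = skew_normal_sample(15000, 5.0)
```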

Experiment III: Data Clipping versus Heavy-Tailedness
The municipality income tax yield (MTY) of German municipalities of 2015 (33,38) serves as an example of data clipping, for which the comparison will be restricted to the bean plot and MD plot. MTY is unimodal; Hartigans' dip statistic agrees with this assessment (p(n = 11194, D = 0.0020678) = 0.99) (6). The bean plot has a major limitation for clipped data (Fig. 6): it estimates non-existent distribution tails and visualizes a density above and below the range of the clipping [1800, 6000]. This issue can also be observed with the Python violin plot (SI.C, Fig. 16).

Experiment IV: Combining Multimodality and Skewness with Data Clipping
Here, one feature is used to compare the histogram and the schematic plots against each other.
The feature is the income of German people from the year 2003 (36). The whole feature was modelled with a Gaussian mixture model on the log scale and verified with a chi-square test (p<.001) and QQ plot (36). A sample of 500 cases was taken; the pdf of the sample is skewed on the log scale (skewness = −1.73, p-value (n = 500, z = −22.4) < 2.2e−16, D'Agostino skewness test (8)).
In Fig. 8, it is visible that the violin plot underestimates the skewness of the distribution, contrary to the MD plot. The violin and bean plot show a mode in the skewed distribution between 4 and 4.5, contrary to the MD plot (Fig. 8). In Fig. 7, the histogram agrees with the MD plot and disagrees with the bean plot that there are no values above 4.35, meaning that the bean plot visualizes a pdf above the maximum value (marked with red lines).

Experiment V: Visual Exploration of Distributions
The high-dimensional dataset (d=45) of quarterly statements of companies noted on the German stock market is investigated by exemplarily selecting 12 features. The prime standard of "Deutsche Börse" (39) requires these companies to report their balance and cash flow regularly every three months in a standardized way, which is accessible in (40). Using web scraping, the information of n=269 cases was extracted. In such a high-dimensional case, statistical testing and histograms become very troublesome and are thus omitted in this work. In S1 Table, SI B, the ordering of the descriptive statistics from top to bottom is the same as in the MD plot and bean plot from left to right. The MD plot enables ordering by concavity, which is used here. The MD plot (Fig. 9) and the bean plot (Fig. 10) visualize all variables in one picture. S1 Table shows that six variables from right to left do not possess more than 1% negative values. 50% of the data for "net tangible assets" and "total cash flow from operating activities" lie in a small positive range. "Interest expense" and "capital expenditures" do not possess more than 1% positive values. "Net income" can only have 25% of data below zero, and "treasury stock" has the second largest kurtosis of the selected features.
The MD plot shows that "net income", "treasury stock" and "total cash flow from operating activities" have a high kurtosis in a small range of data centered around zero (Fig. 9). "Interest expenses" and "capital expenditures" are highly skewed to the negative. The last six variables from right to left do not possess visible negative values.
The bean plot changes skewed distributions into distributions with one mode or uniform distributions (Fig. 10). There is no hard cut around the value zero (red line). Instead, around one third or more of the visualized distributions lie below zero, contrary to the descriptive statistics, by which six variables cannot have more than 1% of values below zero. In sum, the visualization of the MD plot is in agreement with the descriptive statistics (SI B, S1 Table) and in disagreement with the bean plot. The Python violin plot shows values above and below the limits of [-250000, 1000000] and less detailed, incorrectly unimodal distributions (SI.C, Fig. 18). The variables are highly skewed, besides net tangible assets, total assets and total stockholder equity.
The latter two are multimodal.
Fig. 10: Bean plots of selected features from 269 companies on the German stock market reporting quarterly financial statements by the Prime standard. The ordering of the features is by concavity and the same as in Fig. 9. There is no hard cut around the value zero (red line), and the variables appear unimodal or uniform with a large variance and a small skewness. This visualization disagrees with the descriptive statistics in SI B, S1 Table. Note that for a better comparison we disabled the additional overlaying plots.

Experiment VI: Range of Values Depending on Feature
In a data set, the ranges of features often differ. For example, the ranges of MTY and of ITS (Income Tax Share, [Thrun/Ultsch, 2018; Ultsch/Behnisch, 2017]) vary largely, and usual schematic plots would be unable to show the distributions of both features simultaneously, which is visualized by the MD plot in Fig. 11. With the robust normalization (35) selected in the MD plot, the distributions can be investigated at once without changing their basic properties (Fig. 11). Then, the bimodality of the ITS feature becomes visible in the MD plot and in the bean plot (SI.B, Fig. 12). However, the violin plot is unable to visualize the bimodal distribution, and the stacked histogram underestimates it significantly (SI.B, Fig. 12). The Python violin plot draws data above and below the limits of the data but shows the bimodality of the ITS feature (SI.C, Fig. 19).
Statistical testing confirms that the distribution of ITS is not unimodal: p(n = 11194, D = 0.01196) < 2.2e−16.
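One common form of robust normalization, centring by the median and scaling by a robust spread estimate, can be sketched as follows. The exact quantiles used by 'DataVisualizations' may differ; the function name and the 1%/99% quantiles here are illustrative assumptions:

```python
import statistics

def robust_normalize(values, low=0.01, high=0.99):
    """Centre by the median and scale by the distance between two
    extreme quantiles, so features on very different ranges become
    comparable without distorting modality or skewness."""
    s = sorted(values)
    def quantile(p):
        return s[min(int(p * (len(s) - 1)), len(s) - 1)]
    spread = quantile(high) - quantile(low)
    med = statistics.median(values)
    return [(v - med) / (spread if spread else 1.0) for v in values]

# Values spanning 0..100 are mapped to a small interval around 0.
scaled = robust_normalize([float(i) for i in range(101)])
```

Because the transformation is affine per feature, the shape of the pdf (and hence any bimodality) is preserved, only the scale changes.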

Discussion
If an explorative distribution analysis of several features at once is required, the interesting basic properties of empirical distributions are those depicted in Table 1: skewness, multimodality, normality, uniformity and data clipping; additionally of interest is the visualization of varying ranges between features. The results show that the MD plot is the only schematic plot that is appropriate for every case and does not require adjusting its process of density estimation by various parameters.
Three artificial and four natural datasets show the limitations of the schematic plots, namely the bean plot and the violin plot (R and Python versions). A comparison of the results to conventional statistical testing and histograms is included. The results illustrate that the usefulness of the violin or so-called bean plot depends on the density estimation approach used in the algorithm, and the density approach critically depends on the bandwidth of the kernel function.
For an artificial distribution of two equal-sized Gaussians and a skewed Gaussian, statistical testing was performed with the dip statistic by changing the mean of the second Gaussian and with the D'Agostino test of skewness (8) by changing the skewness parameter (sample size n=15,000). Statistical testing, the bean plot and the MD plot have a similar sensitivity regarding bimodality and skewness as long as the sample is large enough. The sensitivity of the Python violin plot in these cases is comparable to the bean plot in R. However, overlaying the MD plot with a robustly estimated Gaussian allows for an even higher sensitivity than statistical testing. Contrary to the bean plot and the Python violin plot, the MD plot does not indicate multimodality in uniform distributions.
Automatically ordering the features makes skewness more clearly visible in the MD plot in comparison to the bean plot and the Python violin plot. The natural example of the log of German people's income showed that for smaller samples (n=500) the bean plot visualizes unimodal distributions instead of skewed distributions, disagreeing with the histogram and MD plot.
Additionally, the bean plot visualizes a mode which partly lies above the maximum value of 4.35.
The same behavior with regard to leaving the valid value range and a stronger smoothing of the representation could also be observed with the Python violin plot. For clipped data, the density estimate of the MD plot does not change, contrary to the bean plot.
Violin plots in R were not able to visualize the bimodality, which was surprising. As suggested by the name, the violin plot was particularly intended to identify multimodality by exposing a waist between two modes of a distribution, because the box plot is unable to visualize it. Additionally, the R version of the violin plot underestimates the skewness of distributions. It was illustrated that histograms were less sensitive in the case of bimodality because the default binwidth was not small enough. The effects found in the bean plot and Python violin plot for skewed distributions and clipped data were outlined further in the high-dimensional case of financial statements of companies noted on the German stock market (39). As an example, 12 features were selected.
Here, the visualization of the bean plot leads to a completely misleading interpretation of the data, contrary to the MD plot (cf. SI B, S1 Table). The parameter settings of all plots remained at their defaults until the last experiment because a non-expert user would not change them, and an expert user would have difficulties setting density estimation parameters in a solely explorative approach.
In sum, the results illustrate that the MD plot can outperform histograms and all other schematic plots investigated, and that the alternative, descriptive statistics, is a hard task contrary to plotting all variables in one picture. Typically, skewness and multimodality for each feature in SI B, S1 Table would have been statistically tested, leading to an even bigger table.

Conclusion
This work serves as an illustration that current density estimation approaches can lead to major misinterpretations if the default setting is not adjusted. Adjusting the parameters of conventional plots would require prior knowledge or statistical assumptions about the data, which are often difficult to acquire. For the case of strictly exploratory data mining, we propose a parameter-free schematic plot called the mirrored density plot. The MD plot represents the relative likelihood of a given variable taking on specific values, using the PDE approach to estimate the pdf. In PDE, the density is estimated with kernels of a specific width; this width, and therefore the number of kernels, depends on the data. The MD plot enables estimating the pdf of many features in one visualization. It was shown on artificial data and natural examples of multimodal and skewed distributions that the MD plot serves as a good indicator in the case of bimodal as well as skewed distributions for small and large samples. All other approaches had intrinsic assumptions about the data, which in some cases led to misleading interpretations of the basic properties. The MD plot possesses an explicit model of density estimation based on information theory and is parameter-free through the definition of a data-driven kernel radius, contrary to the commonly used density estimation approaches (e.g., bean and violin plot). Furthermore, the MD plot has the advantage of visualizing the distribution of a feature correctly in the case of data clipping and in the case of varying ranges of features. In sum, the MD plot enables a non-specialist, for whom setting several parameters would be a difficult task, to easily apply explorative data mining by estimating the basic properties of the pdf (distribution) of many features in one visualization.
Combining the MD plot with a(n) (un-)supervised index is an excellent approach to evaluate the stability of stochastic clustering algorithms (e.g., (41)) or classifiers. Furthermore, it can be used with quality measures for dimensionality reduction methods to compare projection methods (e.g., (41)). The MD plot is available in the R package 'DataVisualizations' on CRAN (33) and in the Python package 'md_plot' on PyPI (34).

Fig. 1 :
Fig. 1: Uniform distribution in the interval [−2, 2] of a 1000-point sample visualized by a histogram (a) of plotly (23) with a default binwidth (top) and, bottom: violin plot (b, left), bean plot (c, middle) and MD plot (d, right). In the violin plot and bean plot, the borders of the uniform distribution are skewed contrary to the real amount of values around the borders −2, 2. The bean plot and histogram indicate multimodality, but Hartigans' dip statistic (6) disagrees: p(n=1000, D = 0.01215) = 0.44.

Fig. 2 :
Fig. 2: Scatterplots of a Monte-Carlo simulation in which samples were drawn and testing was performed in a given range of parameters in 100 iterations. The visualization is restricted to the median and the 99th percentile of the p-values for each x value. The significance test of Hartigans' dip statistic is highly significant for a mean higher than 2.4 in a sample of size n=31,000.

Fig. 3 :
Fig. 3: Plots of the bimodal distribution with changing mean of the second Gaussian: stacked histogram (a) of plotly (23) with a default binwidth, violin plot (b), bean plot (c), and MD plot (d). Bimodality is visible beginning from mean 2.4 in the bean plot and MD plot, but the MD plot draws a robustly estimated Gaussian (magenta) if statistical testing is not significant, which already indicates for a mean of 2.2 that the distribution is not unimodal. The bimodality of the distribution is not visible in the violin plot. Histograms are less sensitive than statistical testing, bean or MD plots.

Fig. 4 :
Fig. 4: Scatterplots of a Monte-Carlo simulation in which samples were drawn and testing was performed in a given range of parameters in 100 iterations. The visualization is restricted to the median and the 99th percentile of the p-values for each x value. The D'Agostino significance test of skewness (8) is highly significant if the skewness parameter does not lie in ]0.95, 1.05[ in a sample of size n=15,000. Scatter plots were generated with plotly (23).

Fig. 5 :
Fig. 5: Plots of the skewed normal distribution with changing skewness, using the R package 'fGarch' (37) on CRAN: stacked histogram (a) of plotly (23) with a default binwidth, violin plot (b), bean plot (c) and MD plot (d). The sample is large (n=15,000). The histogram and violin plot are less sensitive to the skewness of the distribution. The MD plot allows for an easier detection of skewness by ordering the columns automatically.

Fig. 6 :
Fig. 6: MTY feature clipped in the range marked in red, with a robustly estimated average of the whole data in magenta (left), and not clipped (right). The bean plot (a) underestimates the density in the direction of the clipped range [1800, 6000] and draws a density outside of the range of values. Additionally, this leads to the misleading interpretation that the average lies at 4000 instead of 4300. The MD plot (b) visualizes the density independently of the clipping. Note that for a better comparison we disabled the additional overlaying plots.
Thus, similar to experiment III, the bean plot visualizes a density above the maximum value of 4.35 and underestimates the density in the direction of the maximum value, contrary to the MD plot, which estimates the density correctly (cf. visualizations in (33)). The Python violin plot shows, like the bean plot, values above 4.35, but smoothes the distribution more (SI.C, Fig. 17) and hence does not indicate multimodality.

Fig. 7 :
Fig. 7: Distribution analysis performed on the log of German people's income in 2003 with a histogram of plotly (23) with a default binwidth.

Fig. 8 :
Fig. 8: Distribution analysis performed on the log of German people's income in 2003 with violin plot (b), bean plot (a) and MD plot (c). The bean plot and violin plot visualize an additional mode in the range of 4-4.5, with the bean plot visualizing a pdf above the maximum value (red line). The multimodality is not visible with the default binwidth. Only the MD plot visualizes a clearly clipped and skewed multimodal distribution.

Fig. 9 :
Fig. 9: MD plots of selected features from 269 companies on the German stock market reporting quarterly financial statements by the Prime standard. The ordering of the features is by concavity and the same as in Fig. 10 and SI B, S1 Table. For 8 out of 12 distributions there is a hard cut at the value zero, which agrees with SI B, S1 Table.

Fig. 11 :
Fig. 11: Visualization of the distributions of even two features at once is inappropriate if the ranges vary widely (a). This is shown on the example of the MD plot (a). However, the MD plot enables the user to apply simple transformations, enabling the visualization of several distributions at once even if the ranges vary (b).

Fig. 16 :
Fig. 16: The data for the left visualization were limited to the range [1800, 6000]. Nevertheless, in contrast to the MD plot, the violin plot goes beyond this range.

Fig. 17 :
Fig. 17: Visualization of the log of German income. The violin plot shows values above 4.35 and a less detailed, more smoothed distribution than the MD plot.

Fig. 18 :
Fig. 18: Visualization of selected features from 269 companies on the German stock market reporting quarterly financial statements by the Prime standard. The violin plot shows data above and below the limits [-250000, 1000000] and less detailed, more smoothed distributions than the MD plot.
