Understanding the Historic Emergence of Diversity in Painting via Color Contrast

Painting is an art form that has long functioned as a major channel for communication and creative expression. Understanding how painting has evolved over the centuries is therefore an essential component of understanding cultural history, intricately linked with developments in aesthetics, science, and technology. The explosive growth in stylistic diversity in painting starting in the nineteenth century, for example, is understood to be the hallmark of a stark departure from traditional norms on multidisciplinary fronts. Yet there exist few quantitative frameworks that allow us to characterize such developments on an extensive scale, which would require both robust statistical methods for quantifying the complexities of artistic styles and data of sufficient quality and quantity to which we can fruitfully apply them. Here we propose an analytical framework that captures the stylistic evolution of paintings based on color contrast relationships that also incorporate the geometric separation between pixels, and we apply it to a large-scale archive of 179,853 images. We first measure how paintings have evolved over time, and then characterize the remarkable explosive growth in diversity and individuality in the modern era. Our analysis demonstrates how robust scientific methods married with large-scale, high-quality data can reveal interesting patterns that lie behind the complexities of art and culture.


INTRODUCTION
Humans have used paintings as a way to communicate, record events, and express ideas and emotions since long before the invention of writing. Painting has been at the center of the artistic and cultural evolution of humanity, reflecting its lifestyles and beliefs. For example, Ernst Gombrich, an art historian in Britain, remarked on Egyptian art in his book that "To us these reliefs and wall-paintings provide an extraordinarily vivid picture of life as it was lived in Egypt thousands of years ago" [1]. A deep investigation into how painting has evolved and the motivations behind it, therefore, can be expected to yield valuable insights into the history of creative developments in culture.
Given the ubiquity of art and culture, and the value our society puts on them as symbols of the quality of life, we believe that approaching art and culture as subjects of serious science should be a worthy endeavor. In order to proceed, we take the viewpoint that a piece of art can be considered as a complex system composed of diverse elements whose collective effect, when presented to an audience, is to stimulate their senses, be they cerebral, emotional, or physiological. To understand a painting, for instance, one may analyze its colors, geometry, brushing techniques, subjects, or impact on the audience, each of which would allow us to grasp the multifaceted, correlated aspects of the art form. The same could be said of many other art forms, with some unique variations depending on the form. A positive development that could have far-reaching benefits for such work is the recent proliferation of high-quality, large-scale digital data of cultural artifacts, which provides an unprecedented opportunity to devise science-inspired analytical methods for identifying interesting and complex patterns residing in art and culture en masse [2][3][4][5][6][7].
* Corresponding author: juyongp@kaist.ac.kr
The quantitative study of style in a cultural artifact is also called stylometry, a term coined by the Polish linguist Wincenty Lutosławski, who attempted to extract statistical features of word usage from Plato's Dialogues [8]. Stylometric analyses have been performed on various subjects since then, including literature [9][10][11][12], music [13][14][15][16] and art [17][18][19][20][21][22][23][24]. A landmark scientific study of paintings is Taylor et al.'s characterization of the fractal patterns in Jackson Pollock's drip paintings [17]. It was subsequently found that the characteristics of drip paintings of unknown origin deviated significantly from those of Pollock's, showing that such measurements can reflect an artist's unique style [18]. Other notable studies of paintings include Lyu et al.'s, which decomposed images using wavelets [19]; Hughes et al.'s, which used sparse-coding models for authenticating artworks [20]; and Kim et al.'s, which studied the "roughness exponent" to characterize brightness contrasts [21]. Venturing beyond the quantification of artistic styles, recent studies have investigated perceived similarities between different paintings [22], the influence relationships between artworks for quantifying creativity in an artwork [23], and the changes in the perception of beauty using face-recognition techniques on images from different eras [24].
Despite these attempts, the individual stylistic characteristics of painters have not yet been sufficiently and collectively explored; such an exploration would also reveal the remarkable diversity of modern times. This stems from a number of shortcomings of previous works: they lack robust statistical frameworks for understanding the underlying principles based on the image data; they make only limited use of the full color information, even though it is readily available in the data; or they concern themselves with specific artworks or painters. In this work, we propose a scientific framework for characterizing artistic styles that makes use of the complete color profile information in a painting and simultaneously takes into account the geometrical relationships between the colored pixels in the image, two essential building blocks of an image. Applying this framework to a large number of historical paintings, we characterize the artistic styles of various paintings over time.
Color boasts a long history as a subject of intensive scientific investigation from many points of view, including the physical, physiological, and sensory. Starting with two classical groundbreaking investigations by Newton [25] and Goethe [26,27], modern research on color continues in full force in art, biology, medicine, and vision science [28,29] as well as physics. Here we introduce the concept of 'color contrast' as the signature property of color use in a painting. As its name suggests, color contrast refers to the contrast effect originating from the color differences between clusters of colors in a painting. Examples of paintings in which color contrast is highly pronounced include Vincent van Gogh's (1853-1890) Starry Night (1889), where a bright yellow moon floats against the dark blue sky, and Piet Mondrian's (1872-1944) Composition A (1923), where well-defined geometric shapes of distinct colors are juxtaposed to form a 'hard edge' painting, a popular style of the twentieth century.
We propose a quantity we call seamlessness for color contrast that incorporates the color profile and geometric composition of a painting. We show that this quantity is a useful indicator for characterizing distinct painting styles, which also allows us to track the stylistic evolution of western painting both on the aggregate and the individual levels (particularly for modern painters) by applying it to a total of 179 853 digital scans of paintings (the largest yet in our type of study) collected from multiple major online archives.

DATA DESCRIPTION
We collected digital scans of paintings (mostly western) from the following three major online art databases: Web Gallery of Art (WGA) [3], Wiki Art (WA) [4], and BBC-Your Paintings (BYP) [5]. The WGA dataset contains paintings dated before 1900, while the WA and BYP datasets contain those dated up to 2014 (all datasets are up-to-date as of Oct 2015). WGA provides two useful pieces of metadata on the paintings that allow a deeper analysis: painting technique (e.g., tempera, fresco, or oil) and painting genre (e.g., portrait, still life, or genre painting, itself a specific genre depicting scenes from ordinary life). BYP is mainly a collection of oil paintings preserved in the United Kingdom, where most of the paintings originate. (Later we show that the BYP data still exhibit a trend in color contrast comparable with the other datasets.) We excluded paintings dated before the 1300s, as they were too few. We manually removed those deemed improper for, or outside the scope of, our analysis; they include partial images of a larger original, non-rectangular frames, seriously damaged images, photographs, etc. The final numbers of paintings used for analysis are 18 321 (WGA), 70 235 (WA), and 91 297 (BYP) images, for a total of 179 853.

METHODOLOGY
Color contrast and seamlessness S as its measure
As its name suggests, color contrast represents the effect brought on by color differences between pixels in a painting. Therefore, characterizing how an artist places various colors on a canvas is a key element in determining the color contrast in a painting. The human sense of color contrast between two points in a painting (pixels in a digital image) is affected most strongly by two factors: the relative color difference between the pixels and their geometric separation. The closer the pixels are in real space, the more pronounced their color difference will be. In order to quantify this phenomenon, we must first define a color difference between two pixels that agrees with the human-perceived difference. Then we must incorporate the spatial separation information to produce a combined measure. A color is represented by three values that define the three coordinates in a 'color space'. A color space is named according to what the coordinates mean. Familiar examples include the RGB space for Red, Green, and Blue; the HSV space for Hue (the color wheel), Saturation, and Value (brightness); and the CIELab space (the full nomenclature is 1976 CIE L*a*b*) for L* (lightness ranging from 0 for black to 100 for white), a* (running the gamut between green and red), and b* (between blue and yellow). The a* and b* axes have no specified numerical limits. For our work we use the CIELab space, as it was designed specifically to emulate the human perception of the difference between two colors, which is proportional to the Euclidean distance $d = \sqrt{(L_1^* - L_2^*)^2 + (a_1^* - a_2^*)^2 + (b_1^* - b_2^*)^2}$ between the (L*, a*, b*) coordinates of the two colors [30].
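As an illustration of the color distance d defined above, the following is a minimal sketch that converts two 8-bit sRGB pixels to CIELab and takes the Euclidean distance between them. The sRGB encoding, the D65 reference white, and the function names `srgb_to_lab` and `color_distance` are assumptions made for this example; the text does not specify the conversion pipeline actually used.

```python
import math

# Assumed reference white (D65); the text does not state the illuminant.
WHITE_D65 = (0.95047, 1.0, 1.08883)


def srgb_to_lab(rgb):
    """Convert an 8-bit sRGB triple to CIE L*a*b* coordinates."""
    # Undo the sRGB transfer curve to obtain linear-light values in [0, 1].
    linear = []
    for channel in rgb:
        c = channel / 255.0
        linear.append(c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4)
    r, g, b = linear

    # Linear sRGB -> CIE XYZ.
    x = 0.4124564 * r + 0.3575761 * g + 0.1804375 * b
    y = 0.2126729 * r + 0.7151522 * g + 0.0721750 * b
    z = 0.0193339 * r + 0.1191920 * g + 0.9503041 * b

    def f(t):
        # Cube root with the linear segment CIELab uses near black.
        delta = 6.0 / 29.0
        if t > delta ** 3:
            return t ** (1.0 / 3.0)
        return t / (3.0 * delta ** 2) + 4.0 / 29.0

    fx = f(x / WHITE_D65[0])
    fy = f(y / WHITE_D65[1])
    fz = f(z / WHITE_D65[2])
    return (116.0 * fy - 16.0, 500.0 * (fx - fy), 200.0 * (fy - fz))


def color_distance(rgb1, rgb2):
    """Euclidean distance d between two colors in CIELab space."""
    return math.dist(srgb_to_lab(rgb1), srgb_to_lab(rgb2))
```

Under these assumptions, black and white are separated by a distance close to 100 (the full range of L*), while identical colors give d = 0.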
Because a color difference d between pixels is more pronounced the closer the pixels are, we restrict our attention, for simplicity, to the differences between adjacent pixel pairs. This results in a total of approximately 2N data points to consider in an image of N pixels. To illustrate what d can teach us about the use of color in a painting, we compare the distribution π(d) for Piet Mondrian's (1872-1944) Composition A (1923) (Fig. 1(a)) and Claude Monet's (1840-1926) Water Lilies and Japanese Bridge (1899) (Fig. 1(c)), which show significant differences that indeed reflect their visible differences well. In order to provide a baseline for proper comparison, we also measure the color distance distribution of two different types of null models obtained by randomizing these paintings. The first null model is produced by randomly relocating the pixels of the original image while preserving the number of pixels of each color, and the second is produced by replacing all pixels of the original image with randomly selected colors in the RGB color space. Therefore, the first randomization retains the intrinsic colors of a painting with only its geometric structure destroyed, whereas the second randomization produces a completely random image. The fact that the second null model shows a significantly broader tail in d than the other distributions indicates that artists are likely to use similar colors, avoiding extremely distant colors in the color space (Fig. 1(b), (d)). Furthermore, when geometry is considered, similar colors in the same painting also tend to stay close in real space. Therefore, the final distribution of d of a painting can be considered as the signature of a painting that represents both an artist's own color selection and geometric style. An image characterized by a high color contrast shows regions with large inter-pixel color distance relative to the overall image, i.e. inhomogeneity in d.
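The adjacent-pair measurement and the first null model can be sketched as follows. This is only an illustration under stated assumptions: the image is taken to be a list of rows of (L*, a*, b*) triples, "adjacent" is taken to mean horizontal and vertical neighbours (which gives roughly 2N pairs for N pixels), and the function names `adjacent_distances` and `shuffled_null` are hypothetical.

```python
import math
import random


def adjacent_distances(image):
    """Color distances for all horizontally and vertically adjacent pixel pairs.

    `image` is a list of rows, each row a list of (L*, a*, b*) triples.
    """
    height, width = len(image), len(image[0])
    distances = []
    for y in range(height):
        for x in range(width):
            if x + 1 < width:   # right-hand neighbour
                distances.append(math.dist(image[y][x], image[y][x + 1]))
            if y + 1 < height:  # neighbour below
                distances.append(math.dist(image[y][x], image[y + 1][x]))
    return distances


def shuffled_null(image, seed=0):
    """First null model: relocate pixels at random, keeping the count of each color."""
    height, width = len(image), len(image[0])
    pixels = [pixel for row in image for pixel in row]
    random.Random(seed).shuffle(pixels)
    return [pixels[r * width:(r + 1) * width] for r in range(height)]


# Toy 4x4 "painting": a dark left half and a light right half.
dark, light = (0.0, 0.0, 0.0), (100.0, 0.0, 0.0)
toy = [[dark, dark, light, light] for _ in range(4)]
original_d = adjacent_distances(toy)
null_d = adjacent_distances(shuffled_null(toy))
```

In the toy image only the four pairs straddling the vertical seam have a nonzero distance; shuffling keeps the same sixteen pixels but destroys that spatial arrangement.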
In the Mondrian, d is small on average but a significant number of large d exist; that is, the tail of π(d) decays more slowly than an exponential (Fig. 1(b)). This is a consequence of large patches of nearly uniform colors being separated by well-defined borders. The Monet, on the other hand, shows a high average d owing to the many intertwining brush strokes of different colors, but few extraordinarily large d, similar to an exponential distribution (Fig. 1(d)). These observations suggest using the relative magnitude of the mean $\bar{d}$ and the standard deviation $\sigma_d$ to characterize a painting's overall color contrast. Specifically, we use $S \equiv (\sigma_d - \bar{d})/(\sigma_d + \bar{d})$. By construction $S \in [-1, 1]$, with $S > 0$ when $\sigma_d > \bar{d}$ (S = 0.46 for the Mondrian) and $S < 0$ when $\sigma_d < \bar{d}$ (S = −0.12 for the Monet). We call it "seamlessness" because a high (low) S means fewer (more) boundaries or 'seams' between clusters or 'patches' of like colors. This quantity is also used for quantifying the heterogeneity of inter-event time distributions in statistical physics, although the problems are unrelated to each other [31].
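The definition of S translates directly into a few lines of code. The sketch below assumes the population standard deviation (the text does not say whether the population or sample form is used) and a hypothetical function name `seamlessness`; it takes the list of adjacent-pair distances d as input.

```python
from statistics import fmean, pstdev


def seamlessness(distances):
    """S = (sigma_d - mean_d) / (sigma_d + mean_d) for a list of color distances."""
    mean_d = fmean(distances)
    sigma_d = pstdev(distances)  # population standard deviation (an assumption)
    if mean_d + sigma_d == 0:
        raise ValueError("S is undefined when every distance is zero")
    return (sigma_d - mean_d) / (sigma_d + mean_d)


# Mostly flat patches with a few sharp seams: sigma_d exceeds the mean, so S > 0.
patchy = [0.0] * 9 + [50.0]
# Uniformly textured surface with no outliers: sigma_d is zero, so S = -1.
textured = [10.0] * 10
```

The two toy lists mirror the contrast drawn in the text: a few large distances among many small ones push S toward positive values, while evenly spread distances push it negative.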

Robustness of S
Unlike digitized or OCR'ed (Optical Character Recognition) text, a digitized image of a painting can exist in many versions of differing sizes or colors depending on scanning environments and settings. We therefore need to test for the robustness (insensitivity) of S against such variations, if we are to be able to rely on it as a characteristic of a painting, and not only of a specific scan of it. While we expect slight differences in color or resolution not to result in significant changes in S in principle since it is defined in terms of the color differences between pixels, it would still be reassuring to confirm the robustness of S against certain realizable variations.
When one digitally scans a painting, the lighting condition is a key element that affects the final color of the image. Since the original lighting conditions are not given in the datasets, we simulate different lighting conditions by varying the color temperatures of the light sources, i.e. the color profile of a black body of the same temperature [32]. Assuming that the original scan represents a well white-balanced image, multiplying each pixel by the RGB values of the color of the black body normalized by 255 (the maximum value of each axis) gives the simulated pixel. For instance, at 1500 K, the RGB value of the color that a black body radiates is (255, 109, 0). Then the pixels of an image are multiplied by the factor (255/255, 109/255, 0/255). Analysis of the six test images in different conditions of light ranging from 1500 K (similar to a common candle) to 10 000 K (similar to a very clear blue sky) indicates that S is fairly consistent as expected (see Fig. 2 and Fig. 3(a) for the simulated images of paintings in different color temperature and their S values).
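The lighting simulation described above amounts to a channel-wise scaling of each pixel. A minimal sketch follows, using the 1500 K black-body color quoted in the text; the function name `apply_color_temperature` and the rounding to integers are assumptions of this example.

```python
def apply_color_temperature(pixel, blackbody_rgb):
    """Scale an (R, G, B) pixel channel-wise by a black-body color normalized by 255."""
    return tuple(round(c * k / 255) for c, k in zip(pixel, blackbody_rgb))


# Black-body color at 1500 K as quoted in the text.
CANDLE_1500K = (255, 109, 0)

simulated = apply_color_temperature((200, 200, 200), CANDLE_1500K)
```

For a neutral gray pixel (200, 200, 200), the red channel is unchanged, the green channel is scaled by 109/255, and the blue channel is removed entirely.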
We also test for the effect of image size on S by rescaling the six test images to between 100 and 1500 pixels in width (the longer side of an image) using the bicubic interpolation method (Fig. 3(b)). After showing some fluctuation when the size of the image is very small (< 300 pixels in width), S becomes fairly stable for larger widths (> 500 pixels). Since 99.8% (WGA), 76.0% (WA) and 100.0% (BYP) of painting images in this research are wider than 500 pixels, this is unlikely to be an issue in practice (Fig. 4).

Historical evolution of color contrast
The measurement of S on all images allows us to map the historical trend of color contrast, shown in Fig. 5(b). Most notably, the average S consistently increases until it shows a temporary dip in the nineteenth century (Fig. 5(c)). The increase in S around the fifteenth century is often attributed to the wide adoption of oil as a binder medium for pigments [1,33]. The availability of new pigments, media, and colors has historically been linked to the emergence of new techniques and styles in painting. Prior to 1500 CE, in Medieval times, most paintings were tempera or fresco. Around the fifteenth century, oil gained popularity, superseding the previous two as the dominant medium of choice and allowing for new high-contrast techniques (Fig. 6(a)). Fig. 6(b) shows that oil paintings have a significantly higher average S than other techniques. The Kolmogorov-Smirnov test tells us that the distributions of S for the different techniques are significantly different (P < 10^-11 for all pairs). Additionally, two well-known historical developments in painting contributed to the increase in S: chiaroscuro (the treatment of light and dark to express gradations of light that create the effect of modeling [33]) during the Renaissance period, and tenebrism (painting in the shadowy manner with dramatic contrasts of light and dark [33]), made popular by Caravaggio (1571-1610) during the Baroque period.
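For readers unfamiliar with the Kolmogorov-Smirnov comparison mentioned above, the following is a minimal sketch of the two-sample KS statistic, the largest gap between the empirical cumulative distributions of two samples. It computes only the statistic, not the P value, and the function name `ks_statistic` and the toy S values are hypothetical, not data from the paper.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: max |F_a(x) - F_b(x)|."""
    a, b = sorted(sample_a), sorted(sample_b)
    largest_gap = 0.0
    # Both empirical CDFs are step functions, so the maximum gap
    # is attained at one of the observed values.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        largest_gap = max(largest_gap, abs(cdf_a - cdf_b))
    return largest_gap


# Made-up S values for two techniques, for illustration only.
toy_oil = [0.30, 0.35, 0.40, 0.45]
toy_fresco = [0.05, 0.10, 0.15, 0.20]
gap = ks_statistic(toy_oil, toy_fresco)
```

A statistic of 1 means the two samples do not overlap at all, and a statistic of 0 means their empirical distributions coincide; turning the statistic into a P value additionally requires the sample sizes.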
The development of these new painting techniques is also closely related to the rise of novel painting genres. The rise in popularity of portraits after the fifteenth century led to significant developments in painting technique such as the chiaroscuro and tenebrism mentioned above (Fig. 6(c)). Still life shows notable changes during the sixteenth century, reaching its peak in the seventeenth century (Fig. 6(d)). The increase of S in still life in the sixteenth century is attributed to a change of themes and subjects. In the first half of the century, Dutch painters like Pieter Aertsen (1508-1575) and Joachim Beuckelaer (1533-1573) intentionally combined still life with depictions of biblical scenes in the background, while in the latter half artists began to highlight still objects by incorporating the chiaroscuro previously found in portraits, resulting in high S [33].
In the nineteenth century, artists began to perceive paintings as a means of expressing their individuality and originality more strongly than ever before [34]. Challenging tradition led to a flourishing of different interpretations of the world, and various new techniques for expressing them emerged [1]. In the beginning of the nineteenth century, for instance, artists started to pursue various impressions of light shining on nature and landscapes, rather than the dramatic and artificial lighting effects of the previous era, leading to the decrease in S. The invention of the railroad and the paint tube enabled impressionists to travel to distant areas, leading to the surge in popularity of landscape paintings in the nineteenth century [35] (Fig. 6(c)). Furthermore, towards the end of the nineteenth century modern abstract art began to emerge, noted for a drastic departure from realism [1]. The simple and geometric abstraction of the movement led to an unprecedented growth in S (Fig. 5(c)). It is important to note, however, that the variance in S also grows rapidly, indicating a remarkable growth in the diversity of color contrast. The most notable growth occurs between the nineteenth and the twentieth centuries (Fig. 5(d)). Fig. 7 shows that in earlier periods the distribution of S is concentrated around the mean, reflecting a narrow scope of color usage and therefore of color contrast. In modern periods, however, the distribution becomes much broader. This indicates that painting styles become more diverse in later times, especially in the modern era. The concentration around the mean is weak, making it difficult to think of a "typical" style.
This stronger diversity in color contrast is observed not only on such aggregate level, but also in individual painters' profiles: Regardless of the numbers of paintings produced, the individual painter exhibits a wider range of S in this period than their predecessors (Fig. 5(e), (f)), signifying a culture of experimentation and willing adoption of diverse styles [1]. We explore this in more detail in the following section.

Characterizing individual painters in the modern era
Prompted by the aforementioned extraordinary historical developments in color contrast in the modern era, we find it essential to explore the patterns of individuality in finer detail in order to understand stylistic developments. For the modern painters who belong to this period (defined as those whose career midpoint falls in the nineteenth century or later), we introduce two novel quantities to characterize their individuality: metamorphosality and singularity. Metamorphosality measures a painter's transformation in color contrast over their career, while singularity measures how distinct their style is from the norm of the day. We used the WA dataset to measure these quantities.

Metamorphosality
Mondrian, founder of the De Stijl movement and renowned for abstractionist paintings (Fig. 8(d)), actually produced works in a wide range of styles over his career. His progression from traditional style to abstractionism can in fact be summarized using S, which increases consistently until the mid-1920s, when his abstractionism fully matures (Fig. 8(a), (d)). Pierre Auguste Renoir (1841-1919), an early leader of impressionism, is the opposite: his S decreases over time, as he progressively employs free-flowing brush strokes to generate boundaries that fuse softly with the background (Fig. 8(b), (e)). Claude Monet (1840-1926) and Edgar Degas, also prominent impressionists, show similar trends. These observations teach us that the changes in S can indeed represent painters' stylistic evolutions. We now define the metamorphosality of a painter based on the slope a of the linear fit to the S values, with career lengths normalized to 1. For instance, a = 0.62 for Mondrian and a = −0.10 for Renoir (Fig. 8(c)). Given the near-Gaussian distribution π(a) over the 1,326 modern artists who produced paintings in at least five distinct years, we define a painter's metamorphosality as their z-score µ ≡ (a − ā)/σ_a, where ā is the average and σ_a is the standard deviation of the slopes. This allows us, for example, to rank the artists and find the most notable, prominent ones. Fig. 9(a) shows 100 modern artists whose µ is ranked in the top 50, both positive and negative. It is the American painter Howard Mehring (1931-1978) who has the largest µ = 4.07. Mehring's early works are reminiscent of Pollock, Mark Rothko (1903-1970) and Helen Frankenthaler (1928-2011), employing uniformly scattered colors with vague boundaries [36]. His later works, on the other hand, become more structured, with geometric compositions of vivid colors with abrupt transitions, very similar to Mondrian's hard-edge paintings.
At the other extreme, with the smallest (most negative) µ, is the Swiss-French painter Félix Edouard Vallotton (1865-1925), a member of the post-impressionist avant-garde group Les Nabis. Initially famous for woodcuts featuring extremely reductive flat patterns with strong outlines (high S), he produced classical-style paintings such as landscapes and still lifes in later life (low S), giving µ = −5.59.
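The two steps behind metamorphosality, a linear fit per painter followed by a z-score across painters, can be sketched as below. The example assumes an ordinary least-squares fit on one S value per dated painting and the population standard deviation for σ_a; neither detail is stated in the text, and the function names `career_slope` and `metamorphosality` are hypothetical.

```python
from statistics import fmean, pstdev


def career_slope(years, s_values):
    """Least-squares slope a of S against career time rescaled to [0, 1]."""
    first, last = min(years), max(years)
    t = [(year - first) / (last - first) for year in years]
    t_mean, s_mean = fmean(t), fmean(s_values)
    numerator = sum((ti - t_mean) * (si - s_mean) for ti, si in zip(t, s_values))
    denominator = sum((ti - t_mean) ** 2 for ti in t)
    return numerator / denominator


def metamorphosality(slopes):
    """z-scores mu = (a - mean_a) / sigma_a over the slopes of all painters."""
    mean_a, sigma_a = fmean(slopes), pstdev(slopes)
    return [(a - mean_a) / sigma_a for a in slopes]


# Made-up career: S rises steadily from 0.0 to 0.5 over twenty years.
toy_slope = career_slope([1900, 1910, 1920], [0.0, 0.25, 0.5])
toy_mu = metamorphosality([-0.1, 0.0, 0.1, toy_slope])
```

Because career length is normalized to 1, the slope a reads as the total change in S between the start and the end of a career under the linear fit, which makes painters with careers of different lengths comparable.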

Singularity
Another indication of strong individuality is how one's works differ from those of one's contemporaries. We quantify this using singularity, defined as follows. For each painting we compare its S with those of paintings produced at roughly the same time (defined as a span of eleven years, five years before and five after its date) and measure its z-score. We call a painting singular (i.e. statistically unusual) if its z-score exceeds some threshold in magnitude, which we take to be |z| > 1. We then measure each painter's production rate of singular artworks over their career. Fig. 8(f), for example, shows seven artists and their paintings' z-scores. The artworks in the lightly shaded areas are the singular ones according to our definition. We now define the singularity ν of an artist as the difference between the fractions of their paintings that have z > 1 and z < −1. This definition allows us to determine those who often produced singular paintings and who show a specific trend in S. For example, 45% of Mondrian's paintings are in z > 1 (singularly high in S) and only 6% in z < −1, resulting in ν = 0.39, showing that his high-S paintings are indeed unique and singular when compared with his contemporaries. Fig. 8 shows the histogram for the 330 artists who painted more than 40 works. In accordance with our definition, we indeed identify those known for a high level of singularity and originality (see Fig. 9(b) for 100 modern artists whose ν is ranked in the top 50, both positive and negative). Qi Baishi (1864-1957), Chinese-born but widely known in the West for his witty watercolor works of vivid colors [37], shows the highest positive singularity (ν = 0.92). Max Bill (1908-1994), a Swiss designer known for geometric paintings that became a signature of his style, is also highly singular (ν = 0.79).
Koloman Moser (1868-1918), founding member of the Vienna Secession movement and known for repetitive complex motifs inspired by classical Greek and Roman art, has the largest negative singularity (ν = −0.91). Eugène Leroy is ranked second in negative singularity, known for numerous works featuring obsessively thick brush strokes in different colors, resulting in obscure objects not readily identifiable [38]. These findings show that our understanding of color contrast can indeed characterize the individual painters, and identify those prominently noted for their creativity and uniqueness.
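The singularity measure defined in this section can be sketched as follows. The sketch assumes that the corpus is a list of (year, S) pairs that includes the artist's own paintings, that the z-score uses the population standard deviation of the eleven-year window, and that a window with no spread yields z = 0; the function names `painting_z` and `singularity` and the toy data are hypothetical.

```python
from statistics import fmean, pstdev


def painting_z(year, s, corpus, half_window=5):
    """z-score of a painting's S against corpus paintings dated within
    half_window years on either side (eleven years for half_window=5)."""
    peers = [peer_s for peer_year, peer_s in corpus if abs(peer_year - year) <= half_window]
    sigma = pstdev(peers)
    return (s - fmean(peers)) / sigma if sigma > 0 else 0.0


def singularity(artist_paintings, corpus, threshold=1.0):
    """nu = fraction of paintings with z > threshold minus fraction with z < -threshold."""
    z_scores = [painting_z(year, s, corpus) for year, s in artist_paintings]
    high = sum(z > threshold for z in z_scores) / len(z_scores)
    low = sum(z < -threshold for z in z_scores) / len(z_scores)
    return high - low


# Made-up corpus: four low-S paintings and one high-S outlier within one window.
toy_corpus = [(1900, 0.0), (1901, 0.0), (1902, 0.0), (1903, 0.0), (1904, 1.0)]
nu_outlier = singularity([(1904, 1.0)], toy_corpus)
nu_mixed = singularity([(1904, 1.0), (1900, 0.0)], toy_corpus)
```

With this definition, an artist whose paintings sit 45% above z = 1 and 6% below z = −1, as quoted for Mondrian, obtains ν = 0.45 − 0.06 = 0.39.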

CONCLUSIONS
Art and culture are the manifestations of human creativity. For that reason, in addition to being objects of appreciation for purely aesthetic purposes, they may contain valuable information we could utilize to understand the creative process. To this end, we have focused on perhaps the most essential ingredients of a painting, color and geometry, via color contrast and inhomogeneity, which allowed us to quantitatively characterize and trace the artistic styles of various periods and identify those artists who exhibited variability and originality. We inspected whether our measure was sensible by cross-validating our findings with accepted understandings of their artworks.
Lessons from our investigation suggest many interesting directions for understanding art and culture via the use of massive data sets. For instance, Asian, Hindu, and Islamic painting have been largely untouched in our work; large-scale analyses of these subjects would be of immediate, universal interest. Also, integrating an analytical study using stylometric measures such as ours with object-recognition and classification techniques from machine learning could lead to a deeper understanding of art that incorporates both the styles and contents of paintings [24,39]. For example, how the same objects or motifs have been portrayed differently over time would shed light on changes in tastes as well as style. We expect such work to find use in understanding various art forms such as sculpture, architecture, visual design, film, animation, typography, etc.