Synthetic images aid the recognition of human-made art forgeries

Previous research has shown that Artificial Intelligence is capable of distinguishing between authentic paintings by a given artist and human-made forgeries with remarkable accuracy, provided sufficient training. However, with the limited amount of existing known forgeries, augmentation methods for forgery detection are highly desirable. In this work, we examine the potential of incorporating synthetic artworks into training datasets to enhance the performance of forgery detection. Our investigation focuses on paintings by Vincent van Gogh, for which we release the first dataset specialized for forgery detection. To reinforce our results, we conduct the same analyses on the artists Amedeo Modigliani and Raphael. We train a classifier to distinguish original artworks from forgeries. For this, we use human-made forgeries and imitations in the style of well-known artists and augment our training sets with images in a similar style generated by Stable Diffusion and StyleGAN. We find that the additional synthetic forgeries consistently improve the detection of human-made forgeries. In addition, we find that, in line with previous research, the inclusion of synthetic forgeries in the training also enables the detection of AI-generated forgeries, especially if created using a similar generator.


Introduction
Forgeries are a serious threat to the artwork market, as illustrated for instance by the infamous Max Ernst forgery "La Horde". In 2006, the auction house Christie's announced the sale of the artwork, with an estimated value of about £3,000,000. However, it turned out that "La Horde" was a forgery created by the art forger Wolfgang Beltracchi [1]. Similarly, at the beginning of the 20th century, the Wacker case made the headlines globally. The German art dealer Otto Wacker, possibly with the help of his brother Leonhard, managed to sell over 30 fake Van Gogh paintings to public and private collectors, and many of the paintings were even included in the Catalogue Raisonné by Van Gogh expert Jacob de la Faille [2]. Despite experts' disagreement, the art dealer was charged with fraud in April 1932.
16th February 2024 1/25 Recent developments in computer vision and machine learning techniques may contribute to the issue in several ways [3].
While most studies concentrate on attributing an artwork to one of several pre-defined authors, similar machine-learning methods can also be used to distinguish between authentic artworks by a given author and forgeries. Due to the very close resemblance between original images and human-made forgeries (such as the Wacker forgeries), art authentication is generally a more challenging task than artwork attribution. In particular, authentication algorithms often have to learn very fine details such as brushstroke structure [7,9,17]. An additional challenge is that, for a given artist, forgeries are typically much less numerous than original artworks and often lack systematic documentation and high-resolution scans or photos. Despite such limitations, in recent years neural networks (NNs) such as Convolutional Neural Networks (CNNs) or transformer-based architectures have shown promising results both in art attribution, when trained on datasets of authentic paintings and other stylistically similar artworks [17], and in artwork authentication, when trained against forgeries [18].
In this context, the new trend of Generative Artificial Intelligence (GenAI) appears to present both threats and opportunities. On the one hand, GenAI might be adopted to create refined synthetic digital forgeries [19], which might populate the internet and spread misinformation. The possibility of creating 'fake' synthetic artworks using AI-based methods gained popularity with the publication of Neural Style Transfer (NST) [20], which learns to decouple the style of an artwork from its content. This method can render images in the manner of a given artist, with varying degrees of accuracy and applicability. The subsequent publication of various enhanced Generative Adversarial Network architectures (e.g. StyleGANs [21,22]) and powerful large-scale diffusion models (e.g. Stable Diffusion [23] and DALL-E 2 [24]) paved the way to the generation of realistic synthetic forgeries. In particular, the introduction of text conditioning using contrastive text-image models such as CLIP [25] created an accessible and quick interface for the creation of artworks 'in the style of'. Unlike NST, text conditioning is not tied to an input natural image and therefore allows greater freedom of generation.
At the same time, the ability of GenAI to create synthetic forgeries may mitigate a key limitation of AI-based art authentication, namely the limited availability of known forgeries and imitations. The goal of this work is to explore to what extent recent GenAI methods such as StyleGANs and Stable Diffusion are able to augment the training datasets of known forgeries and enhance the performance of AI-based art authentication. While most recent proposals to detect fake images mainly address photorealistic images [26], the use of synthetic forgeries in artwork authentication is a largely unexplored area. Specifically, we focus on paintings by Vincent van Gogh, which are frequently used as a benchmark dataset for machine-based art attribution methods [6,7,9,10]. Van Gogh painted a large number of artworks, now in the public domain, and was widely forged due to the enormous market value of his work. Van Gogh datasets, therefore, serve as valuable case studies for art authentication.
We build on the publicly available VGDB-2016 dataset on Van Gogh [27], containing a set of 126 original RGB images by the artist and a set of 212 non-authentic RGB images by other Impressionist and Expressionist artists. The VGDB-2016 dataset does not contain forgeries, making it unsuitable for forgery detection. To address this, we enrich it for the purposes of art authentication and add 11 RGB images of well-known forgeries created by the forger Otto Wacker to our dataset. We also include 8 forgeries by John Myatt, a former art forger who now works as a legitimate artist creating "genuine fakes". The latter images are not in the public domain; therefore, we only provide a pointer to those images. Furthermore, we release the AI-based forgeries generated specifically for this paper. Finally, to reinforce our findings on van Gogh, we carry out the same analysis on datasets of Amedeo Modigliani and Raffaello Sanzio (Raphael), which are detailed in the supporting information S1 Appendix.
The outline of this paper is as follows. The next section, Methodology, details how we generate the synthetic images used to augment the training dataset of known forgeries. We also briefly discuss the classifier model that we use for forgery detection (Classification methodology). We then present our main findings on improved classification, addressing the goals listed above. As a consistency check, we also discuss the authentication of synthetic forgeries created by Stable Diffusion and StyleGANs (Detection of synthetic images). A brief summary is provided in the Discussion and Conclusions.

Methodology
In this section, we provide an overview of the methods we employed to generate synthetic images for art authentication. We first outline the process of creating synthetic images for the training datasets and provide details about the composition of the dataset. We also elaborate on our classification methodology for art authentication. Finally, we explain how we evaluate the authentication efficiency.

Methods for synthetic image generation
We use two fundamentally different GenAI methods to generate synthetic artwork: an image-to-image generative adversarial network (GAN) and a text-to-image diffusion model. The images generated by both the diffusion model and the GAN are collectively referred to as synthetic data.
We used the NVlabs implementation of StyleGAN3 [22], which is one of the most recent and successful GANs. StyleGAN3 was trained from scratch on 10380 portraits in various genres and by many different authors, including 126 portraits by van Gogh, 280 by Modigliani, and 157 by Raphael. The portraits by the three artists are sourced from Wikiart, and while there is considerable overlap, they do not entirely coincide with the sets of original artworks detailed in the subsequent datasets. The latter are not limited to portraits but, in turn, are filtered to include only artworks appearing in museum collections or Catalogues Raisonnés, ensuring a high level of certainty regarding their authenticity.
The training took 5M epochs on 4 GPUs. More details on the training procedure and the quality of the resulting images can be found in the supporting information S1 Appendix. With such training, StyleGAN3 produces images in a mixture of styles by random authors. We used the trained StyleGAN3 to produce a "raw" dataset of 2000 random portraits. In what follows, images picked at random from this "raw" dataset are referred to as the "raw GANs" image set. Furthermore, subsets of synthetic portrait images in the style of a specific artist (van Gogh, Modigliani, or Raphael) were created through further training for 50k epochs exclusively on original paintings by the respective artist. This yielded sets of synthetic images that look stylistically close to the works of van Gogh, Modigliani, and Raphael. We refer to these datasets as "tuned GANs" image sets. We remark that, due to the limited number of paintings available for each artist, prolonged training on the exclusive datasets often results in a decline in the quality of the StyleGAN3 images. Rather than achieving the desired outcome of generating a large variety of images in a given style, long specialized training tends to produce an almost exact reproduction of the training set. Specialized training for between 20k and 100k epochs proved to be a good compromise, striking a balance between a wide variety of images and effective adaptation to the desired style.
To create the text-to-image synthetic artworks, we use the Stable Diffusion [23] generative model. It relies on CLIP guidance [25] to semantically align the latent text and image representations, and on a U-Net architecture [28] as the denoising diffusion model. The quality of images generated using Stable Diffusion strongly depends on the text prompt. We generate images in the style of each artist using a simple prompt indicating the style, the content, and the artist; for example: 'Post-impressionist painting of a young boy, by Vincent van Gogh'. We adopt Stable Diffusion version 2.1 (v2-1_768-ema-pruned.ckpt), with 60 inference steps, a guidance scale of 8, and a resolution of 512 × 512 pixels. The resulting synthetic dataset is referred to as "diffusion". Note that Stable Diffusion has been trained on subsets of the very broad open-source dataset LAION-2B(en), collected in the wild, and uses the large contrastive model OpenCLIP, while the GAN was trained on the controlled WikiArt dataset. We used the second version of Stable Diffusion because it is trained on fully open data and models.
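The prompt template and sampling settings quoted above can be collected in a small sketch. The helper names (`GenerationSettings`, `build_prompt`, `generate`) are hypothetical, and the Hugging Face model id is an assumption standing in for the checkpoint file named in the text; the `diffusers` call is shown only as one plausible way to apply the settings, not as the pipeline actually used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GenerationSettings:
    """Sampling settings quoted in the text for Stable Diffusion 2.1."""
    num_inference_steps: int = 60
    guidance_scale: float = 8.0
    height: int = 512
    width: int = 512


def build_prompt(style: str, content: str, artist: str) -> str:
    """Compose a prompt that names the style, the content, and the artist."""
    return f"{style} painting of {content}, by {artist}"


def generate(prompt: str, settings: GenerationSettings = GenerationSettings()):
    """Generate one image; needs the optional `diffusers` and `torch` packages.

    ASSUMPTION: the model id below is a stand-in for the checkpoint
    v2-1_768-ema-pruned.ckpt mentioned in the text.
    """
    from diffusers import StableDiffusionPipeline  # imported lazily on purpose

    pipe = StableDiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-2-1")
    result = pipe(
        prompt,
        num_inference_steps=settings.num_inference_steps,
        guidance_scale=settings.guidance_scale,
        height=settings.height,
        width=settings.width,
    )
    return result.images[0]


# Reproduces the example prompt from the text (no model is loaded here).
prompt = build_prompt("Post-impressionist", "a young boy", "Vincent van Gogh")
```

Keeping the prompt builder separate from the generation call makes it easy to vary only the content phrase while holding style, artist, and sampling settings fixed.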

Composition of training datasets
AI-based art authentication is a binary classification task where the model learns to differentiate between authentic and non-authentic artworks (including known forgeries). This requires training on two sets of artworks for each artist, an authentic set and a contrast set. Our experiments are centered around the van Gogh dataset, which contains 126 original artworks from the VGDB-2016 dataset [27]. The dataset was gathered from Wikimedia Commons and contains artworks with a similar chronology or artistic movement to van Gogh and with a density of at least 196.3 PPI (pixels per inch). It also contains two artworks with debated attribution for testing. We note that the number of images does not exactly match the numbers mentioned in the original paper [27]; we report the number of images that were actually downloaded through the dataset link provided in [27].
In addition, in the supporting information S1 Appendix we provide two further tests of our approach on artworks by Modigliani (100 original artworks) and Raphael (206 original artworks) and imitations/forgeries thereof. The latter datasets were collected from museum collections or sourced from Catalogues Raisonnés, which are expert-curated lists documenting all verified authentic artworks by the respective artists.
The contrast set includes artworks that were not made by the artist but that closely resemble the artist's work and are helpful in detecting forgeries of it. Normally, this includes artworks by similar artists, referred to as 'proxies', and forgeries or explicit imitations of the artist, referred to as 'imitations'. Proxies are paintings by different human authors who painted in a similar style to the artist (i.e. artists pertaining to the same artistic movement) and/or were collaborators, pupils, or teachers. The word imitation is used here as an umbrella term to encompass human-made non-autograph copies of authentic works, artworks explicitly made in the style of the artist, and known forgeries of the artist. To these elements, we add synthetic fakes generated by Stable Diffusion 2.1 and StyleGAN3, and we test whether their addition increases the performance of the models.
The contrast set of Vincent van Gogh contains 212 artworks by similar artists, 19 forgeries (11 by Otto Wacker and 8 by John Myatt), 30 images generated by Stable Diffusion, 30 images from the GAN fine-tuned on the artist ('tuned GANs'), and 30 random GAN images (the 'raw GANs').
The set of 'raw GANs' contains exactly the same images across all three datasets used in this work. All images are pre-processed according to the procedure detailed in [18]. Specifically, we generate sub-images of paintings, i.e., RGB images normalized to a fixed size of 256 × 256 pixels, with channel values normalized to the unit interval. These sub-images are created by dividing the entire image into 2^p × 2^p equally sized units, where p is determined by the resolution of the original image. If the smaller side of an image is larger than 1024 pixels, then p = 2; if the smaller side is larger than 512 pixels and smaller than 1024, then p = 1. For all images, irrespective of resolution, we also include the sub-image obtained by center-cropping a square from the full image. Therefore, depending on the resolution of the original image, the images are patched into 21, 5, or 1 adjacent non-overlapping patches. Using bi-cubic resampling, we reshape all the images to either 224 × 224 pixels or 256 × 256 pixels, depending on the input supported by the model.
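The patching rule can be sketched as follows. The quoted counts of 21, 5, and 1 patches are reproduced only if every grid level from 1 up to p is kept in addition to the center crop (16 + 4 + 1, 4 + 1, and 1); that reading, the handling of a side of exactly 1024 pixels, and all function names are assumptions of this sketch rather than details confirmed by the text.

```python
def pyramid_level(width: int, height: int) -> int:
    """Return p from the length of the smaller image side."""
    short = min(width, height)
    if short > 1024:
        return 2
    if short > 512:
        return 1
    return 0


def grid_boxes(width: int, height: int, n: int):
    """n x n equally sized, non-overlapping boxes as (left, upper, right, lower)."""
    xs = [round(i * width / n) for i in range(n + 1)]
    ys = [round(j * height / n) for j in range(n + 1)]
    return [(xs[i], ys[j], xs[i + 1], ys[j + 1]) for j in range(n) for i in range(n)]


def center_square(width: int, height: int):
    """Largest centered square crop of the full image."""
    side = min(width, height)
    left = (width - side) // 2
    top = (height - side) // 2
    return (left, top, left + side, top + side)


def patch_boxes(width: int, height: int):
    """Center crop plus all grid levels 1..p (ASSUMPTION: every level is kept)."""
    boxes = [center_square(width, height)]
    for level in range(1, pyramid_level(width, height) + 1):
        boxes.extend(grid_boxes(width, height, 2 ** level))
    return boxes
```

Each box uses the (left, upper, right, lower) convention, so it could be passed to an image library's crop routine before bi-cubic resizing to the model's input size.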
The patches are split randomly into training (72%), validation (11%), and test (17%) sets, ensuring that patches belonging to the same original image feature in the same set. We randomly sample the split 10 times, obtaining 10 bootstrapped splits for cross-validation.
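A minimal sketch of such a leakage-free split is given below: whole images, rather than individual patches, are assigned to a set, so all patches of one painting end up together. Applying the 72/11/17 fractions at the image level and the function name `split_by_image` are assumptions of this sketch.

```python
import random


def split_by_image(image_ids, seed, fractions=(0.72, 0.11, 0.17)):
    """Assign whole images (and hence all their patches) to train/val/test."""
    ids = sorted(set(image_ids))
    random.Random(seed).shuffle(ids)  # seeded for reproducibility
    n_train = round(fractions[0] * len(ids))
    n_val = round(fractions[1] * len(ids))
    return {
        "train": set(ids[:n_train]),
        "val": set(ids[n_train:n_train + n_val]),
        "test": set(ids[n_train + n_val:]),
    }


# Ten independently seeded splits for cross-validation (toy ids 0..99).
splits = [split_by_image(range(100), seed) for seed in range(10)]
```

A patch is then placed in whichever set contains the id of its source image.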
For the sake of clarity, we will present the results for the van Gogh dataset [29] in the remainder of this paper. The outcomes for Modigliani and Raphael are available in the supporting information S1 Appendix.
Table 1 provides a detailed overview of the van Gogh dataset. The rows represent the six image sets, while the columns show the number of images and the corresponding number of patches. Representative images of each class (authentic, imitation, GAN, and diffusion) are shown in Fig. 1.

Classification methodology
After preparing the training and testing datasets of human-made and synthetic artwork forgeries as described in the previous Section, we now explain the classification methodology applied to the gathered dataset. In line with the approach outlined in [18], we employ transformer-based classification methods (specifically, Swin Base [31]) and state-of-the-art Convolutional Neural Networks (EfficientNet B0 [32]) to distinguish between authentic artworks and forgeries. These models were adopted in [18] and proved to outperform the canonical ResNet101 model, the typical baseline model for art authentication. Swin Base is an image transformer model that uses a hierarchical structure with shifting windows to reduce the computational complexity of transformer models; it accepts inputs of size 224 × 224 and has 88M parameters. EfficientNet B0, on the other hand, is a CNN-based model belonging to the class of EfficientNets, which adopts an optimal width, depth, and resolution scaling for the architecture. It accepts slightly larger inputs of shape 256 × 256 and has only 5.3M parameters. We note that Swin Base is a much larger model than EfficientNet B0. We present the classification outcomes for both the Swin Base and EfficientNet models; however, we do not directly compare the performance of these models. Rather, the purpose is to demonstrate that incorporating synthetic data into training datasets enhances classification reliability, regardless of the classifier architecture. We use the Swin Base and EfficientNet models pre-trained on ImageNet data [33] and fine-tune them for the art authentication task. To do so, we substitute the final activation layer with one dense layer ending in a single node with sigmoid activation and train using the binary cross-entropy loss without freezing the weights. For both models, we use a learning rate of 10^-5 and a batch size of 32. We train the models on binary classification, where class 1 contains the authentic artworks by the artist (authentic set) and class 0 refers to the non-authentic artworks (proxies, imitations, and synthetic images).
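For reference, the single-node sigmoid output and the binary cross-entropy loss mentioned above take the standard form below. The symbols are notation introduced here for illustration: z is the output of the final dense node for one patch, \hat{y} the predicted probability of class 1 (authentic), and y the true label.

```latex
\hat{y} = \sigma(z) = \frac{1}{1 + e^{-z}}, \qquad
\mathcal{L}(y, \hat{y}) = -\bigl[\, y \log \hat{y} + (1 - y) \log\bigl(1 - \hat{y}\bigr) \,\bigr],
\qquad y \in \{0, 1\}
```

During training this loss is averaged over the patches of a batch.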
To investigate how the addition of synthetic images to the training set improves classification accuracy, we run the following experiments. First, we test whether the addition of each of the synthetic sets separately, as well as the combination of Stable Diffusion and fine-tuned GANs, improves the classification accuracy of the human-made forgeries against a baseline trained using 'proxies' and 'imitations'. This baseline also agrees with the previous work [18]. Secondly, we investigate the extent to which synthetic images can increase the detection of human-made forgeries while never training the models on any human forgeries, thus relying solely on 'proxies' and excluding 'imitations'. The setup of the experiments is schematized in Fig. 2. We note that the second task is inherently much harder than the first one, as it tests whether synthetic images alone can replace training on human-made forgeries for the detection of such forgeries. This scenario addresses situations in which there are no known forgeries of an artist but it is still desirable to be able to flag possible forgeries. This case is common in art connoisseurship, as less well-known artists are rarely forged. To evaluate the performance of our classifiers, we compute the confusion matrices and classification accuracy on the test datasets aggregated at the image level, separately for authentic artworks, imitations, and synthetic images.
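Image-level aggregation can be sketched as below. The text does not state how patch predictions are combined, so averaging the patch probabilities of each image and thresholding at 0.5 is an assumption of this sketch, as are the function names.

```python
from collections import defaultdict


def aggregate_to_image_level(patch_probs, threshold=0.5):
    """Average patch probabilities per image and threshold the mean.

    patch_probs: iterable of (image_id, probability of class 1 'authentic').
    ASSUMPTION: mean pooling with a 0.5 threshold; other rules are possible.
    """
    by_image = defaultdict(list)
    for image_id, prob in patch_probs:
        by_image[image_id].append(prob)
    return {img: int(sum(p) / len(p) >= threshold) for img, p in by_image.items()}


def accuracy(predictions, labels):
    """Fraction of images whose predicted class matches the true class."""
    correct = sum(predictions[img] == labels[img] for img in labels)
    return correct / len(labels)


# Toy example with two images of two patches each.
predictions = aggregate_to_image_level(
    [("a", 0.9), ("a", 0.7), ("b", 0.2), ("b", 0.6)]
)
```

Computing `accuracy` separately on the authentic, imitation, and synthetic test images yields the per-class accuracies discussed in the Results.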
The statistical significance of the results is assessed through cross-validation with 10 different splits and subsequent uncertainty estimation. All quoted results are the median of the joint distributions, and the uncertainties are the (symmetrized) 68% quantiles of the median (equivalent to the 1σ standard error for normally distributed data). We use the concise parenthesis notation in Tables 2-4. Thus, an entry like 0.710(46) means that the central value of the distribution we obtained during the cross-validation process is 0.710 and that it is unlikely (a probability of at most 32%) that the true value deviates from this central value by more than 0.046. Similarly, in Figs. 3 and 4 this would correspond to a main bar at 0.710 and error bars of length 0.046 both up and down.
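A possible way to obtain such a central value and uncertainty, and to print it in parenthesis notation, is sketched below. The text does not specify the estimator, so the bootstrap of the median used here is an assumption, and all function names and the toy accuracies are illustrative.

```python
import random
import statistics


def quantile(sorted_values, q):
    """Linearly interpolated quantile of an already sorted list, 0 <= q <= 1."""
    pos = q * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (pos - lo) * (sorted_values[hi] - sorted_values[lo])


def bootstrap_median(values, n_resamples=2000, seed=0):
    """Median and symmetrized 68% interval of the bootstrapped median.

    ASSUMPTION: the uncertainty of the median is estimated by resampling
    the cross-validation results with replacement.
    """
    rng = random.Random(seed)
    values = list(values)
    medians = sorted(
        statistics.median(rng.choices(values, k=len(values)))
        for _ in range(n_resamples)
    )
    half_width = (quantile(medians, 0.84) - quantile(medians, 0.16)) / 2
    return statistics.median(values), half_width


def parenthesis_notation(value, uncertainty, digits=3):
    """Format e.g. (0.710, 0.046) as '0.710(46)'."""
    return f"{value:.{digits}f}({round(uncertainty * 10 ** digits)})"


# Toy accuracies from ten hypothetical splits (not results from the paper).
central, half_width = bootstrap_median(
    [0.70, 0.72, 0.68, 0.75, 0.71, 0.69, 0.73, 0.66, 0.74, 0.71]
)
```

The symmetrization simply takes half the distance between the 16% and 84% quantiles, which coincides with one standard error when the bootstrapped medians are normally distributed.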

Results
The outcomes of our classification experiments for van Gogh are shown in Tabs. 2 and 3. The results are also visually depicted in Fig. 3. The evaluation of the classification performance is based on two main criteria: the accuracy in classifying human forgeries (accuracy 'forgeries') and the accuracy in classifying authentic paintings (accuracy 'originals'). Note that what we here refer to as 'forgeries' is synonymous with the set of 'imitations', as those imitations are, in this case, indeed forgeries.

Detection of human forgeries
The classification accuracy for authentic paintings is consistently high, approximately 90% or higher with the 'Swin Base' classifier and at least 80% with 'EfficientNet B0', across all training sets, as indicated by the green bars in Fig. 3. Moreover, we observe a reproducible improvement in the classification accuracy of human-made forgeries when synthetic forgeries are added to the training datasets. To the best of our knowledge, this finding has not previously been reported in the literature. On the mixed synthetic training datasets, we were able to achieve accuracies approaching 80% (see Table 2). Images generated by Stable Diffusion appear to be particularly beneficial, leading to accuracy improvements of 10% to 20%. These improvements are evident in Fig. 3, where the purple bars (forgeries) associated with the "no synthetic" values (this baseline is also extended as a dotted horizontal line) are consistently lower than or equal to the values associated with the synthetic ("diffusion" and "tuned GANs") counterparts. This result is particularly impressive considering that the forgeries were often painted by professionals with the goal of avoiding detection.
In addition to augmenting the human-made forgeries with synthetic ones, we also investigated the case where no human-made forgeries were included in the contrast set at all (experiment 2). All classification accuracies are bound to be lower in this case, which is what we observe, and we certainly cannot recommend using this approach in practice if any human-made forgeries are available. However, it allows us to resolve the benefit of synthetic forgeries with higher statistical significance. As can be seen in Table 3 and the lower two panels of Fig. 3, the addition of synthetic data improved the forgery detection accuracy by almost 40% and 30% for 'Swin Base' and 'EfficientNet B0', respectively, both corresponding to a significance of about 4σ.
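The text does not spell out how the significance is computed. One common convention, assuming independent and approximately Gaussian uncertainties, divides the accuracy gain by the combined uncertainty; the symbols below are notation introduced here, with a and σ denoting the central value and uncertainty of the forgery accuracy with synthetic data (syn) and for the baseline without it (base).

```latex
z = \frac{a_{\mathrm{syn}} - a_{\mathrm{base}}}
         {\sqrt{\sigma_{\mathrm{syn}}^{2} + \sigma_{\mathrm{base}}^{2}}}
```

Under this convention, a value of z near 4 corresponds to the "about 4σ" quoted above.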
Finally, the quality of synthetic data plays a crucial role in training success, as expected. As seen in Tables 2 and 3, training solely on "raw GAN" images resulted in minor or no improvement in authentication accuracy. Nevertheless, it is interesting to note that in some cases, the addition of "raw GAN" datasets without any author-specific features led to slight enhancements in authentication capabilities. Consistent with previous findings by Schaerf et al. [18] and the inherent differences in model sizes, the transformer-based Swin Base classifier demonstrated slightly superior overall performance.
All findings are consistent across the two models 'Swin Base' and 'EfficientNet B0'. This observation is also supported by a similar analysis of the Modigliani and Raphael datasets described in the supporting information S1 Appendix. While the numerical values of the classification results may vary between models and artists, the qualitative conclusions remain consistent across all six combined cases.

Detection of synthetic images
Machine learning methods for the detection of synthetic artwork and synthetic images are currently a very active research topic (see e.g. [34][35][36][37]). While the primary focus of this paper is the detection of art forgeries created by humans, in this Section we demonstrate that, in agreement with previous studies, our classifier neural networks (Swin Base and EfficientNet) are also capable of detecting synthetic artwork forgeries created by GenAI. A novel aspect of our approach is that, unlike in most previous studies, the classifier is trained on both human-made and synthetic forgeries.
We assess the efficiency of the detection of synthetic forgeries using the synthetic sets listed in Table 1. Tuned GANs and Stable Diffusion images are tested independently, with central values (medians) and uncertainties of the cross-validation results computed in the same way as in the previous Section, Detection of human forgeries. The results are summarized in Table 4 and in Fig. 4, which show a considerable improvement in the detection of synthetic images when synthetic images are integrated into the training set.
Our findings are consistent with the typical conclusion in the literature (e.g. [35][36][37][38][39]), in that training on synthetic artwork (tuned GANs, diffusion, and raw GANs) is crucial for the classifier to detect forgeries created by GenAI.
Here we have to distinguish two cases: training the classifier on images from a GenAI similar to the one that has to be detected, and training on images from a different one. In agreement with the literature, the highest authentication accuracy is obtained if the classifier has already seen synthetic images from the same generator architecture during training [36,37]. As one can infer from Table 4, the best results (all above 80%) for tuned GANs detection are achieved when tuned GANs are also included in the training, and, equivalently, training on Stable Diffusion images yields the highest accuracy for diffusion detection.
We also observe that including tuned GANs in the training helps to some extent with the detection of images generated by Stable Diffusion, and vice versa. This is an interesting observation given that most previous studies on synthetic forgery detection concentrated on generator-specific image features and visual inconsistencies [39][40][41]. In the van Gogh based studies presented in this main manuscript, it turned out that our classifier could detect GAN images relatively well even without training on synthetic data. The trends described above are visible for both tuned GANs and diffusion, but they are significantly more pronounced for the latter. When performing the same analysis with the artists Modigliani and Raphael (see S1 Appendix), we found that in some cases tuned GANs also eluded detection very effectively (some accuracies below 10%) as long as no StyleGAN3 images had been included in the training. In all of these cases, training on the given architecture readily improved the accuracy.
[Fig. 3, caption partially recovered] Based on the results presented in Tables 2 and 3 with the composition of the underlying van Gogh data set as detailed in Table 1 and visualised in Fig. 2. The horizontal dotted line shows the baseline without synthetic images in the training data. Similar results for the artists Modigliani and Raphael can be found in S1 Appendix.
[Fig. 4, caption partially recovered; legend: tuned GANs, diffusion] Based on the results shown in Table 4 with the composition of the underlying van Gogh data set as detailed in Table 1 and visualised in Fig. 2.

Discussion and Conclusions
[...] to be able to teach something to the classifier.
In agreement with previous research, we also confirmed that training on synthetic images improves, as expected, the authentication efficiency on synthetic forgeries, especially when the same GenAI architecture was used to produce the training dataset.
Further exploration should be dedicated to quantifying the optimal ratio of human-made forgeries to synthetic data. Moreover, it might be of interest to investigate the influence of image resolutions on the performance of both generators and classifiers. However, significantly larger computing resources than currently available to us would be needed for this type of analysis. With more computational resources, an additional improvement to this work might be the dedicated post-training of generators like Stable Diffusion on authentic artworks of given artists in order to further enhance the quality of synthetic data.
A notable limitation of our study is that the synthetic GAN-based forgeries are limited to portrait paintings due to the poor convergence of other types of images. While the artistic styles of van Gogh, Modigliani and Raphael are without doubt very different, further tests should be carried out to generalize the findings to a variety of genres and even more different artists.
1. [...] of images used for training, along with the corresponding quality of the generated results, sorted by categories. Furthermore, we include visual representations of sample images generated after the training.
2. Classification results for Modigliani and Raphael. The entire workflow presented for the artist Vincent van Gogh in the main manuscript has also been performed for Amedeo Modigliani and Raphael (Raffaello Sanzio da Urbino). The results can be found in this section.

A Evaluation of the quality of synthetic images generated by StyleGAN
In this section of the supplementary information to the manuscript "Synthetic images aid the recognition of human-made art forgeries" we detail the procedure employed for the training of StyleGAN3 [22], the generation of images as well as their quality.
The training was performed independently on genre-based subsets of Wikiart (www.wikiart.org), with the number of images of each subset and the corresponding genre listed in Tab. 6. We trained each genre starting from white noise (i.e. no pretraining) with a resolution of 256 × 256 pixels. The portraits analysed in the main manuscript were trained in an independent additional run using a higher resolution of 512 × 512. The corresponding hyperparameters we used for the training with StyleGAN3 are listed in Tab. 5.
Tab. 5: Hyperparameters used for the training with StyleGAN3.
resolution   --batch=   --gamma=   --cbase=       --glr=   --dlr=   --mbstd-group=
256 × 256    16         1          16384          0.001    0.001    4
512 × 512    12         5          (not listed)   0.001    0.001    3

It is interesting to note that the latest alias-free version included in StyleGAN3 turned out to perform worse on artworks than the older version StyleGAN2 [21], in our case realized by the corresponding flag natively provided in StyleGAN3. This is likely a consequence of local hard transitions, often featured by brush strokes, which tend to be smeared out by the translationally invariant StyleGAN3.
Tab. 6 provides an overview of the number of images used for training and the resulting image qualities achieved by the GAN for different image types with a resolution of 256 × 256. The Fréchet Inception Distance (FID) is the state-of-the-art estimator of image quality and, as a rule, the one closest to human perception. A high FID (ca. 20 or more) usually signifies bad results, while a low FID (here less than 20) indicates that the images are reasonably realistic. However, this rule has notable exceptions, in our case 'history and genre paintings' as well as 'still and flower paintings'. Most humans would readily agree that the former category produced unsatisfactory results (see Fig. 5, top left) while the latter succeeded with the flowers at least (see Fig. 5, bottom right), both contrary to the FID predictions.
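For orientation, the FID compares the statistics of Inception-network features of real and generated images. In its standard definition, with (\mu_r, \Sigma_r) and (\mu_g, \Sigma_g) denoting the mean and covariance of the features of the real and generated image sets (notation introduced here), it reads:

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^{2}
  + \operatorname{Tr}\!\Bigl( \Sigma_r + \Sigma_g - 2 \bigl( \Sigma_r \Sigma_g \bigr)^{1/2} \Bigr)
```

Because it depends only on these two moments of the feature distribution, a set of images can score well while individual samples still look unconvincing, which is consistent with the exceptions noted above.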
There is an overall trend that a larger training set results in higher-quality images, as could have been expected. However, even with similar sample sizes, some categories fare much better than others. Some examples are shown in Fig. 5, where the images in the top row ('history and genre paintings' and 'landscapes') have training data sets of similar size. The data sets 'figurative and allegorical' and 'still and flower paintings' are also similar in size, yet their representatives in the bottom row differ greatly in quality as well. We speculate on several causes which can, at least partially, be responsible for such differences in generative quality. One possibility, and a known issue of GANs, is poor convergence of the optimizer during the training process. A high number of minuscule details is certainly prohibitive when learning at such low resolutions, for instance many people in the historical paintings, each with facial features. Both the image resolution and the network capacity are insufficient to resolve this kind of detail. In addition, a large diversity of images poses a difficulty because the GAN might not be able to identify recurring features and is then incapable of generalizing. This is most likely the pivotal problem with the figurative and allegorical paintings.
A separate training run was performed on images at the higher resolution of 512 × 512 for the 'portraits and self-portraits' category, which serves as the basis for all analyses in the main manuscript. These GAN images are used for the benchmarks below. The corresponding GAN was trained on a smaller training set (only 7983 of the 10 380 images had a sufficient resolution) and with a higher number of network parameters (59 259 432 instead of 48 768 547 in total).

B Classification results for Modigliani and Raphael
This section of the supplementary information to the manuscript "Synthetic images aid the recognition of human-made art forgeries" complements the results presented in the manuscript for the artist Vincent van Gogh with analogous studies using portraits by Amedeo Modigliani and Raphael (Raffaello Sanzio da Urbino). The entire procedure (from data generation to classification and analysis) is identical to that employed for van Gogh, and we refer to the main manuscript for the details.

Data sets
The compositions of the training datasets are listed in Tabs. 7 and 8, with representative images of each category displayed in Figs. 6 and 7 for Modigliani and Raphael, respectively.

Detection of human forgeries
In the following, we report the results of the classification experiments: Modigliani with human-made forgeries in the training set (Tab. 9, top panels of Fig. 8) and without forgeries (Tab. 10, third row of Fig. 8); Raphael with forgeries (Tab. 11, second row of Fig. 8) and without forgeries (Tab. 12, bottom panels of Fig. 8).
The accuracy of the classification of original paintings is consistently high and fully compatible with the results obtained for van Gogh paintings in the main manuscript. The accuracy of forgery classification, on the other hand, is lower than that observed for van Gogh in most cases, especially when no human-made forgeries are included in the training set. It is important to note that this does not contradict the conclusions drawn in the main manuscript since, regardless of absolute numbers, these accuracies improve significantly in all considered cases when synthetic forgeries are added to the training data.
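The per-class accuracies discussed here can be computed along the following lines. This is a minimal sketch, assuming binary labels (0 for originals, 1 for forgeries, a hypothetical encoding) and several independent training runs whose accuracies are summarized by their median, as in the tables; the function names are illustrative and not taken from the authors' code.

```python
from statistics import median

ORIGINAL, FORGERY = 0, 1  # hypothetical label encoding


def per_class_accuracy(y_true, y_pred):
    """Return the accuracy separately for originals and for forgeries."""
    accuracy = {}
    for label in (ORIGINAL, FORGERY):
        # Predictions for all test images whose true class is `label`.
        preds = [p for t, p in zip(y_true, y_pred) if t == label]
        if preds:
            accuracy[label] = sum(p == label for p in preds) / len(preds)
        else:
            accuracy[label] = float("nan")  # class absent from the test set
    return accuracy


def median_accuracy(runs):
    """Median per-class accuracy over runs.

    `runs` is a list of (y_true, y_pred) pairs, one per training run
    (assumption: each run is evaluated on its own test predictions).
    """
    per_run = [per_class_accuracy(t, p) for t, p in runs]
    return {
        label: median(r[label] for r in per_run)
        for label in (ORIGINAL, FORGERY)
    }
```

Reporting the two classes separately matters here because a classifier can score highly on originals while still missing many forgeries, which an overall accuracy would hide.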

Figure 1. Illustration of real (top row) and synthetic (bottom row) van Gogh images. "Self-Portrait with a Straw Hat", Vincent van Gogh (1887) [2] (square-cropped, top left), "Self-portrait with a Bandaged Ear and Pipe", sold by Otto Wacker, previously attributed to van Gogh [30] (top right), fine-tuned GAN generated image in the style of van Gogh (bottom left), and Stable Diffusion generated image in the style of van Gogh (bottom right).

Figure 2. Composition of the training and testing sets for the different experiments. Each box in the training represents a training configuration. The configuration names on the bottom row are used throughout the following sections. Green sub-boxes indicate the original set, red indicates the contrast set.

Figure 3. Accuracies of different models for originals and forgeries. Based on the results presented in Tables 2 and 3, with the composition of the underlying van Gogh data set as detailed in Table 1 and visualised in Fig. 2. The horizontal dotted line shows the baseline without synthetic images in the training data. Similar results for the artists Modigliani and Raphael can be found in S1 Appendix.
[Figure 3 panel: training without forgeries, EfficientNet B0. Vertical axis: accuracy. Training configurations on the horizontal axis include raw GANs, tuned GANs, diffusion, and diff. + GANs. Legend: forgeries, original.]

Figure 4. Accuracies of different models for synthetic data. Based on the results shown in Table 4, with the composition of the underlying van Gogh data set as detailed in Table 1 and visualised in Fig. 2.

Figure 7. Illustration of real (top row) and synthetic (bottom row) Raphael images."Madonna with Child" by Raffaello Sanzio [44] (square-cropped, top left), "Portrait of a Young Man in Red" by the Circle of Raphael [45] (top right), fine-tuned GAN generated image in the style of Raffaello (bottom left), and Stable Diffusion generated image in the style of Raffaello (bottom right).

Figure 8. Accuracies of different models. Classification results for Modigliani and Raphael based on Tables 9 to 12. The horizontal dotted line shows the baseline without synthetic images in the training data.

Table 1. Composition of the van Gogh dataset.

Table 2. Performance on different tests after training with forgeries. The composition of the underlying van Gogh data set is detailed in Table 1 and visualised in Fig. 2. The best result for each test is highlighted in bold. Values are medians with respective uncertainties in parentheses.

Table 3. Performance on different tests after training without forgeries. The composition of the underlying van Gogh data set is detailed in Table 1 and visualised in Fig. 2. The best result for each test is highlighted in bold.

Table 4. Accuracy of synthetic forgery detection. The composition of the underlying van Gogh data set is detailed in Table 1 and visualised in Fig. 2. The best result for each test is highlighted in bold. Values are medians with respective uncertainties in parentheses.

Table 6. Training data sets. Number of images with resolution at least 256 × 256, and quality of the training results using StyleGAN3. We provide the Fréchet Inception Distance (FID) as a metric for the quality of the generated images (lower is better).

Table 7. Composition of the Modigliani dataset.

Table 8. Composition of the Raphael dataset.

Table 9. Performance for Modigliani on different tests after training with forgeries.

Table 10. Performance for Modigliani on different tests after training without forgeries.

Table 11. Performance for Raphael on different tests after training with forgeries.

Table 12. Performance for Raphael on different tests after training without forgeries.