Can we spot fake public comments generated by ChatGPT (-3.5 and -4)? Japanese stylometric analysis exposes emulations created by one-shot learning

Public comments are an important channel for civic opinion when the government establishes rules. However, recent AI can easily generate large quantities of disinformation, including fake public comments. We attempted to distinguish between human public comments and ChatGPT-generated public comments (including ones in which ChatGPT emulated human comments) using Japanese stylometric analysis. Study 1 used multidimensional scaling (MDS) to compare 500 texts across five classes: human public comments; public comments generated by GPT-3.5 and GPT-4 from only the titles of the human comments (zero-shot learning; GPTzero); and public comments generated by GPT-3.5 and GPT-4 instructed to emulate the full text of the human comments (one-shot learning; GPTone). The MDS results showed that the Japanese stylometric features of the human public comments were completely different from those of the GPTzero-generated texts. Moreover, the GPTone-generated public comments were closer to the human comments than those generated by GPTzero. Study 2 evaluated the performance of a random forest (RF) classifier in distinguishing three classes (human, GPTzero, and GPTone texts). Focusing on integrated writing-style features (phrase patterns, parts-of-speech (POS) bigrams and trigrams, and function words), the RF classifier achieved a best precision of approximately 90% for the human public comments and 99.5% for the GPT-generated fake public comments (GPTzero and GPTone combined). Therefore, the current study concluded that, at the present time, we can discriminate between GPT-generated fake public comments and those written by humans.


Introduction
Currently, we are facing an unprecedented crisis caused by artificial intelligence (AI). The proliferation of disinformation, such as fake news and images, may begin to surround us without our recognition. ChatGPT [1] has played a major role in sparking this crisis. This large language model (LLM), trained and released by OpenAI on November 30, 2022, naturally generates human-like text. Recent chatbots are built on generative pretrained transformers (GPTs), which dramatically improve generative performance. These chatbots are convenient and provide various benefits, but it is easy to imagine many kinds of problems, such as manipulating public opinion, writing fake customer reviews, and submitting fabricated academic papers. It has already become possible for anyone to easily generate large numbers of fake public comments in order to push the government to create laws and regulations in line with one's own opinions. To make matters worse, previous studies [2,3] have verified that almost no one can distinguish between AI-generated and human-written sentences at first glance. Such social problems have already arisen worldwide. Therefore, controlling and understanding generative AI is an urgent issue. The purpose of this study is to classify human public comments and ChatGPT-generated fake public comments.
Several researchers have reported the possibility of distinguishing between ChatGPT-generated and human-written texts [4,5]. Desaire et al. [4] had ChatGPT-3.5 learn human-written academic papers as training data and compared ChatGPT-generated and human-written texts. Zaitsu & Jin [5] instructed ChatGPT to generate texts by presenting the titles of Japanese scientific papers. Both studies achieved nearly 100% discrimination accuracy. However, only a few studies on distinguishing AI-generated from human-written sentences exist. Therefore, it is necessary to conduct research targeting various genres. Brown et al. [6] proposed learning methods that, unlike fine-tuning, do not change the parameters of GPT-3: zero-shot, one-shot, and few-shot learning. One-shot or few-shot learning attempts to obtain an answer by providing prompts with additional information, whereas zero-shot learning provides only instructions without other information. A question arises here: when we present human-written text as a sample to the AI and instruct it to emulate the content and writing style of the example, can we still distinguish between AI-emulated and human-written texts? In this study, we compared ChatGPT-generated fake public comments with and without emulation (i.e., one-shot or zero-shot learning, respectively) to human-written true public comments. This study contributes to addressing problems and risks facing modern society, especially the manipulation of public opinion using fake public comments generated by ChatGPT.
Public comments (or public consultations) are important civic opinions in establishing rules and orders, such as laws and regulations, and differ from academic papers in two ways: (1) public comments have a higher degree of freedom in writing style because they have fewer constraints; (2) public comments (a few hundred characters) have lower word counts than academic papers (over a thousand characters). We expect that the higher the degree of freedom in writing, the easier it is to discriminate between AI and human texts, because writing-style features are expressed more readily. On the other hand, the lower the word count, the more difficult the discrimination, because the amount of information available for distinguishing decreases.
In Study 1, we prepared samples through the following methods: (1) "HM" texts (100 samples): human public comments published by Japanese national administrative agencies; (2) "GPT3.5zero" and "GPT4zero" texts (100 samples each): texts generated by ChatGPT (GPT-3.5 and -4) from only the title of a public comment (zero-shot learning); (3) "GPT3.5one" and "GPT4one" texts (100 samples each): texts generated by instructing ChatGPT (GPT-3.5 and -4) to emulate the content and writing style of a human public comment while presenting its entire body (one-shot learning). Each fake public comment generated by ChatGPT was paired with a human-written public comment of similar content. Next, we compared these texts from the perspective of their Japanese stylometric features. In particular, we analyzed meaning-independent stylometric features, such as function words and sentence structures, rather than content words such as the noun 'cat', the verb 'run', and the adjective 'beautiful', because the former features do not depend on the topic or genre of the texts.
Thus, this study proposes the following hypotheses. Hypothesis 1: As shown in a previous study [5], the Japanese stylometric features of both GPTzero text types (GPT3.5zero and GPT4zero) are completely different from those of HM texts, even in public comments. Hypothesis 2: Both GPTone text types (GPT3.5one and GPT4one) are closer to the HM texts than the GPTzero texts because of the effect of one-shot learning. Hypothesis 3: We can discriminate both types of GPT-generated fake public comments (GPTzero and GPTone) from human public comments using Japanese stylometric analysis, even if Hypothesis 2 holds.

Sample
As stated previously, we collected 100 Japanese public comments published by Japanese national administrative agencies from the e-Gov website (https://www.e-gov.go.jp). There is no copyright problem because this website states that the published information is not subject to copyright and can be freely used. The public comments covered various topics: telework security guidelines, eel aquaculture, support for the independence of the homeless, the personal information protection law, etc. The HM texts had a mean length of 661.3 characters (SD 132.0) and a median of 627.
Next, we had ChatGPT generate 100 texts with GPT-3.5 and 100 texts with GPT-4 in Japanese (i.e., the GPT3.5zero and GPT4zero texts) using the following prompt: "You are 'general citizen.' Write a public comment (criticism, request, and opinion) about 'title of the public comment.'" If the attribute of the person who wrote the original public comment was known to us, we changed 'general citizen' to that attribute, such as business person, lawyer, or doctor. The texts had a mean length of 604.3 characters (SD 61.3) and a median of 601.5 for GPT3.5zero, and a mean of 620.4 (SD 61.8) and a median of 621 for GPT4zero.
Lastly, for the GPT3.5one and GPT4one texts, we had ChatGPT (-3.5 and -4) each generate 100 Japanese texts by emulating a presented human public comment, using the following prompt: "The following statement is a public comment (criticism, request, and opinion) submitted from a general citizen. Write a public comment similar in content and in writing style to this statement." The GPT3.5one texts had a mean length of 603.3 characters (SD 71.5) and a median of 594, and the GPT4one texts had a mean of 604.6 (SD 54.8) and a median of 621.
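For concreteness, the two prompting conditions above can be expressed as prompt-builder functions. This is an illustrative sketch only: the actual prompts were given to ChatGPT in Japanese, and the function names and English wording here are ours, taken from the translations in the text.

```python
# Sketch of the zero-shot and one-shot prompt construction described above.
# Function names and English wording are illustrative, not the study's code.

def build_zero_shot_prompt(title: str, attribute: str = "general citizen") -> str:
    """Zero-shot condition: only the title of the public comment is presented."""
    return (
        f"You are '{attribute}.' "
        f"Write a public comment (criticism, request, and opinion) about '{title}.'"
    )

def build_one_shot_prompt(human_comment: str) -> str:
    """One-shot condition: the entire human-written comment is presented
    and ChatGPT is instructed to emulate it."""
    return (
        "The following statement is a public comment (criticism, request, and "
        "opinion) submitted from a general citizen. Write a public comment "
        "similar in content and in writing style to this statement.\n\n"
        + human_comment
    )

# When the author's attribute was known, 'general citizen' was replaced.
zero = build_zero_shot_prompt("Telework security guidelines",
                              attribute="business person")
one = build_one_shot_prompt("I request stricter rules on ...")
```
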

Japanese stylometric features
We counted the frequency of occurrence of the following stylometric features and calculated the rate of occurrence within each text (relative frequency), so that the features do not depend on the length of the texts.
Phrase patterns. Phrase patterns are regarded as effective features for authorship attribution in the Japanese language [7]. To analyze these features, we attached POS tags to each word using morphological analysis and divided the sentences into phrases using syntactic analysis. After the analysis, we focused on the combinations of function words and the POS of content words within each phrase: "noun + が (postpositional particle)", "noun + noun + へ (postpositional particle) + の (postpositional particle)", "実際 (adverb) + に (postpositional particle)", "noun + noun + noun + の (postpositional particle)", etc.
Parts-of-speech (POS) bi- and trigrams. The concept of the N-gram is used in quantitative linguistics to determine the frequency of a contiguous sequence of symbols (characters, words, phrases, etc.) in a sentence. A bigram is the case N = 2 ("preposition + noun", etc.), and a trigram is the case N = 3 ("preposition + noun + adjective", etc.). Both POS bigrams and trigrams are effective stylometric features for authorship attribution [8].
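As a concrete sketch of how such features can be turned into the relative frequencies described above, the following computes POS n-gram rates from an already-tagged word sequence. In the actual pipeline the tags would come from Japanese morphological analysis (and the tag set would differ); the toy tags below are illustrative only.

```python
from collections import Counter

def pos_ngram_rates(pos_tags, n):
    """Relative frequency of each POS n-gram in one text, normalized by the
    number of n-grams so that the feature does not depend on text length."""
    grams = [tuple(pos_tags[i:i + n]) for i in range(len(pos_tags) - n + 1)]
    counts = Counter(grams)
    total = len(grams)
    return {g: c / total for g, c in counts.items()}

# Toy POS sequence (a real pipeline would obtain tags from a
# morphological analyzer such as MeCab).
tags = ["noun", "particle", "verb", "noun", "particle", "verb"]
bigrams = pos_ngram_rates(tags, 2)   # N = 2
trigrams = pos_ngram_rates(tags, 3)  # N = 3
```
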
Bigrams of postpositional particle words. This feature is the frequency of contiguous sequences of postpositional particles, such as "を (case particle) + の (case particle)" and "は (binding particle) + が (case particle)". A previous study on Japanese authorship attribution [9] reported these bigrams as effective distinguishing features, but they showed lower performance in an AI-detection task [5].
Function words. Previous studies of authorship attribution [10] and AI detection [5] reported function words as highly distinguishable features: "だ (auxiliary verb)", "また (conjunction)", and "は (postpositional particle)". In Study 1, we confirmed which stylometric features were effective; in Study 2, we consolidated the effective features into integrated features to examine their incremental validity in terms of classification performance.

Analysis procedure
The current study essentially adopted the analysis procedure and statistical methods of Zaitsu & Jin [5] to compare the current results with prior results.
Study 1. To examine Hypotheses 1 and 2, we used classical multidimensional scaling (MDS). This statistical method displays the similarity between texts as distance: the more similar two texts are, the closer they are placed in the resulting dimensions. Various definitions of distance can be used in MDS; we used the symmetric Jensen-Shannon divergence distance (dSJSD) to compare the 500 texts of the five classes (HM, GPT3.5zero, GPT4zero, GPT3.5one, and GPT4one) for each Japanese stylometric feature, because it is effective for authorship attribution [13] and AI detection [5]. Eq (1) defines the distance between texts x and y. We conducted MDS using the cmdscale function of the stats package in the R language.
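The following is a dependency-light sketch of this step (the study itself used R's cmdscale). The divergence is written in the standard symmetric Jensen-Shannon form, which we assume corresponds to Eq (1); the toy frequency vectors are invented for illustration.

```python
import numpy as np

def sym_js_divergence(p, q, eps=1e-12):
    """Symmetric Jensen-Shannon divergence between two relative-frequency
    vectors: JSD(p, q) = (KL(p||m) + KL(q||m)) / 2 with m = (p + q) / 2.
    (Standard form; assumed here to match the paper's Eq (1).)"""
    p = np.asarray(p, float) + eps
    q = np.asarray(q, float) + eps
    p, q = p / p.sum(), q / q.sum()
    m = (p + q) / 2
    kl = lambda a, b: np.sum(a * np.log(a / b))
    return (kl(p, m) + kl(q, m)) / 2

def classical_mds(dist, k=2):
    """Classical (Torgerson) MDS, as in R's cmdscale: double-center the
    squared distance matrix and take the top-k eigenvectors."""
    n = dist.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    b = -0.5 * j @ (dist ** 2) @ j
    vals, vecs = np.linalg.eigh(b)               # ascending eigenvalues
    order = np.argsort(vals)[::-1][:k]           # top-k
    return vecs[:, order] * np.sqrt(np.maximum(vals[order], 0))

# Toy example: three texts as relative-frequency vectors over 4 features;
# the first two are stylometrically similar, the third is different.
texts = np.array([[0.40, 0.30, 0.20, 0.10],
                  [0.38, 0.32, 0.20, 0.10],
                  [0.10, 0.20, 0.30, 0.40]])
n = len(texts)
d = np.zeros((n, n))
for i in range(n):
    for j2 in range(n):
        d[i, j2] = sym_js_divergence(texts[i], texts[j2])
coords = classical_mds(d)  # 2-D configuration, similar texts land close
```
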
Study 2. To verify the performance level for distinguishing among the three classes (GPTzero, GPTone, and HM), we used a random forest (RF) classifier and executed leave-one-out cross-validation (LOOCV). The RF classifier is a classical machine learning method similar to bagging. We selected this classifier for two reasons: (1) among several classifiers, RF is effective for authorship attribution [14] and AI detection [5] in Japanese; (2) it allows us to investigate which stylometric features are effective for distinguishing AI-generated texts from human-written ones. LOOCV is a type of cross-validation used to evaluate the generalization performance of a model. In this study, one text was excluded from the 500 texts as the test set, and the RF classifier was trained on the remaining 499 texts to classify the held-out text into one of the three classes. This procedure was repeated 500 times with a different test text each time. We used the randomForest function of the randomForest package, set the number of decision trees to 1,000, and left the other hyperparameters at their defaults.

Table 1 shows the means and standard deviations of the distances between the GPT texts (GPTzero and GPTone) and the HM texts, corresponding to Figs 1-6. The distances between GPTone and HM were shorter than those between GPTzero and HM, which implies that GPTone texts are more similar to human texts than GPTzero texts. These results support Hypothesis 2.
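The LOOCV procedure described above can be sketched as follows. A simple nearest-centroid classifier stands in for the random forest (which the study ran via R's randomForest with 1,000 trees), since the point of the sketch is the hold-one-out resampling loop, not the classifier; the toy data are invented.

```python
# Sketch of leave-one-out cross-validation (LOOCV): each text is held out
# once, the classifier is trained on all remaining texts, and the held-out
# text is classified. Nearest-centroid is a dependency-free stand-in here.

def nearest_centroid_predict(train_x, train_y, test_x):
    """Assign test_x to the class whose feature-vector centroid is closest."""
    best, best_dist = None, float("inf")
    for c in set(train_y):
        rows = [x for x, y in zip(train_x, train_y) if y == c]
        centroid = [sum(col) / len(rows) for col in zip(*rows)]
        dist = sum((a - b) ** 2 for a, b in zip(centroid, test_x))
        if dist < best_dist:
            best, best_dist = c, dist
    return best

def loocv(features, labels, predict=nearest_centroid_predict):
    """Hold out each text once, train on the rest, collect predictions."""
    preds = []
    for i in range(len(features)):
        train_x = features[:i] + features[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        preds.append(predict(train_x, train_y, features[i]))
    return preds

# Toy data: two well-separated classes of stylometric feature vectors.
X = [[0.0, 0.1], [0.1, 0.0], [1.0, 0.9], [0.9, 1.0]]
y = ["HM", "HM", "GPTzero", "GPTzero"]
predictions = loocv(X, y)
```
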

Study 2: Evaluation of the performance of the RF classifier with LOOCV
First, we merged the GPT-3.5 and GPT-4 texts within each generation type, yielding three classes (GPTzero, GPTone, and HM). To evaluate the classification performance for these three classes using RF, we executed LOOCV and created confusion matrices for multiclass classification. Table 2 presents an example confusion matrix for the three classes. For instance, cell a in Table 2 means that the RF classifier correctly judged a text generated by ChatGPT with zero-shot learning as "GPTzero", whereas cell c indicates that such a text was mistakenly judged as written by a human. Next, based on the confusion matrix, the classification performance was assessed using the following metrics: "accuracy" in Eq (2), "recall" in Eqs (3A) to (3C), and "precision" in Eqs (4A) to (4C). The metric values were calculated for each class, together with the macro-average values (Eqs (5A) and (5B)). Additionally, we combined the GPTzero and GPTone classes into "GPTzero and one" and calculated "recall for GPTzero and one" and "precision for GPTzero and one"; refer to Eqs (6A) and (6B) for the details of these calculations. Among these performance metrics, we regard both "precision for HM" (Eq (4C)) and "precision for GPTzero and one" (Eq (6B)) as the most important, because we want to accurately predict whether a text by an unknown author was written by ChatGPT or by a human.
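As a sketch of how these metrics follow from a confusion matrix, the code below computes accuracy, per-class recall and precision, the macro averages, and the combined "GPTzero and one" metrics. The matrix entries are invented for illustration and are not the study's results.

```python
# Metrics from a 3-class confusion matrix (rows = true class,
# columns = predicted class), ordered GPTzero, GPTone, HM.

def metrics(cm, labels):
    n = sum(sum(row) for row in cm)
    accuracy = sum(cm[i][i] for i in range(len(cm))) / n
    recall = {labels[i]: cm[i][i] / sum(cm[i]) for i in range(len(cm))}
    precision = {
        labels[j]: cm[j][j] / sum(cm[i][j] for i in range(len(cm)))
        for j in range(len(cm))
    }
    macro_recall = sum(recall.values()) / len(recall)
    macro_precision = sum(precision.values()) / len(precision)
    return accuracy, recall, precision, macro_recall, macro_precision

def gpt_combined(cm):
    """Collapse GPTzero and GPTone (rows/columns 0 and 1) into one 'GPT'
    class and compute its recall and precision against HM (index 2)."""
    tp = cm[0][0] + cm[0][1] + cm[1][0] + cm[1][1]  # GPT texts judged as GPT
    fn = cm[0][2] + cm[1][2]                        # GPT texts judged as HM
    fp = cm[2][0] + cm[2][1]                        # HM texts judged as GPT
    return tp / (tp + fn), tp / (tp + fp)           # recall, precision

labels = ["GPTzero", "GPTone", "HM"]
cm = [[90,  5,  5],   # true GPTzero (invented counts)
      [10, 80, 10],   # true GPTone
      [ 2,  3, 95]]   # true HM
acc, rec, prec, mrec, mprec = metrics(cm, labels)
gpt_recall, gpt_precision = gpt_combined(cm)
```
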
Finally, we integrated the four effective features (phrase patterns, POS bigrams, POS trigrams, and function words) and repeated the analysis using the integrated features. Table 9 presents the confusion matrix for the integrated features. The performance was slightly improved compared with the individual features: accuracy (91.6%), recall for GPTzero (97.0%), recall for GPTone (83.0%), recall for HM (98.0%), precision for GPTzero (89.8%), precision for GPTone (95.4%), precision for HM (89.1%), recall for GPTzero and one (97.0%), precision for GPTzero and one (99.5%), macro-average recall (92.7%), and macro-average precision (91.4%). This demonstrates incremental validity, because the integrated features achieved the best classification performance.
In addition to the above analyses, we calculated the classification performance metrics using only the integrated features to compare each GPT type with HM: (1) GPTzero (GPT3.5zero and GPT4zero) vs. HM and (2) GPTone (GPT3.5one and GPT4one) vs. HM. For GPTzero vs. HM, we could completely distinguish the GPTzero texts from the HM texts (Table 10); therefore, all performance metrics (accuracy, recall, and precision) were 100%. For GPTone vs. HM (Table 11), the classification performance decreased slightly compared with the GPTzero vs. HM case but remained high: accuracy (95.3%), recall for GPTone (94.5%), recall for HM (97.0%), precision for GPTone (98.4%), and precision for HM (89.8%).

Discussion
This study examined whether we can distinguish between human public comments and ChatGPT-generated fake public comments (including those in which ChatGPT emulated human comments) using Japanese stylometric analysis.
According to Study 1, the MDS results indicated that the GPTzero texts, generated by presenting only the titles of public comments (zero-shot learning), were completely different from the human-written texts. However, most of the GPTone texts, which emulated human public comments (one-shot learning), were positioned between the distributions of the GPTzero and HM texts in the MDS dimensions. Furthermore, some GPTone texts overlapped slightly with the human texts. These results support Hypotheses 1 and 2: the Japanese stylometric features of GPTzero texts are completely different from those of human public comments, and GPTone texts are more similar to human public comments than GPTzero texts. We consider that this intermediate positioning of the GPTone texts reflects not movement from GPTzero toward human writing but movement from human writing toward GPTzero, because GPTone starts its emulation from the human public comment. That is, GPTone texts may drift toward GPTzero texts as ChatGPT emulates and modifies the HM texts. Furthermore, according to the figures (especially Figs 1-3), the GPT4one texts are farther from the distribution of HM texts than the GPT3.5one texts. These results suggest that the higher the performance of ChatGPT (i.e., GPT-4 at present), the easier it may be to distinguish emulated texts from human-written texts, because a higher-performance ChatGPT can rewrite human-written texts more thoroughly, making them closer to GPTzero texts. Despite the lower word counts in the current study (approximately 600 characters vs. 1,000 characters in a previous study [5]), the differences between the zero-shot GPT texts and the human texts were larger in the current study than in the previous one. It is unclear why these results occurred, because several factors, such as text length (600 vs. 1,000 characters) and genre (public comments vs. academic papers), were confounded. Fig 5 indicates that the positioning of commas had little discriminative effect, because the texts of all classes largely overlapped, whereas a previous study [5] demonstrated a certain level of effectiveness for comma positioning. We consider that the difference in genre (academic papers vs. public comments) may have influenced these results; therefore, other genres of text need further examination. Study 2 showed that the best precision achieved for HM was approximately 90% and that for GPTzero and one reached 99.5%. Considering these results, Hypothesis 3 was supported: we can discriminate fake public comments generated by ChatGPT from human public comments. Among the six Japanese stylometric features, phrase patterns showed the best discriminative performance, and POS bigrams and trigrams also showed high classification accuracy. ChatGPT is not good at rewriting texts with these features in mind, because these stylometric features (phrase patterns, POS bigrams, and POS trigrams) reflect a deeper structural aspect of sentences. In contrast, the present study revealed low performance for the positioning of commas, particularly under GPT emulation; ChatGPT can easily rewrite this feature because it is a linguistically low-level one. While presenting human public comments and having ChatGPT emulate them, we confirmed that ChatGPT often merely paraphrased words (e.g., from "ignorant" to "fool"). Therefore, at present, even when analyzing other languages, we may be able to distinguish between generative-AI and human sentences by focusing on deeper structures.
The above results are limited to the Japanese language. Zaitsu & Jin [5] also pointed out that Japanese has three notation formats (Kanji, Hiragana, and Katakana) and, unlike English, no spaces between words. Therefore, we need to conduct similar verification for other languages as well. In addition, we need to collect and analyze larger samples of human-written and AI-generated public comments to generalize the findings of this study.
Recently, disinformation generated by AI, such as fake news, has become a worldwide problem because such fakes can be generated instantly and at scale. Disinformation has certainly caused chaos in the human world; therefore, we need techniques to control generative AI, including sophisticated classifiers.

Figs 1-6. Figs 1-6 show the degrees of similarity and difference between the texts of the five classes, separately for the six types of stylometric features. First, except for the positioning of commas in Fig 5, the stylometric features (Figs 1-4 and 6) show the HM texts completely separated from both GPTzero text types. These results support Hypothesis 1. Second, all figures except Fig 5 indicate that GPT3.5zero and GPT4zero have different distributions. Finally, in all figures except Fig 5, the distributions of GPT3.5one and GPT4one are slightly closer to the HM texts and are positioned between the distributions of the GPTzero texts and the HM texts. Moreover, some GPTone texts overlap with HM texts.

Fig 5. Fig 5 displays a mixture of all classes, which means that the positioning of commas is not an effective feature for classifying ChatGPT-generated and human-written public comments. Based on the above results, we judged phrase patterns (Fig 1), POS bigrams (Fig 2), POS trigrams (Fig 3), and function words (Fig 6) to be effective stylometric features for discriminating between ChatGPT and human texts. Therefore, we integrated these four stylometric features and used them as the "integrated features" in the subsequent analysis. Fig 7 shows the MDS configuration of the texts based on the integrated features.

Table 1. Means and standard deviations of the distances between the GPT texts (GPTzero and GPTone) and the HM texts for each stylometric feature.
https://doi.org/10.1371/journal.pone.0299031.t001