Abstract
This study investigates the syntactic complexity of argumentative essays generated by ChatGPT in comparison with those written by native speakers. By examining cross-rhetorical-stage variation in syntactic complexity, we explore how ChatGPT’s writing aligns with or diverges from human argumentative writing. The results reveal that ChatGPT and native speakers exhibit similar patterns in mean length of sentence in the thesis stage, and in mean length of T-unit and complex nominals per T-unit in the conclusion stage. However, ChatGPT showed a preference for coordination structures across all stages, relying more on parallel constructions, while native speakers used subordination structures and verb phrases more frequently across all stages. Additionally, ChatGPT’s syntactic complexity was characterized by lower variability across multiple measures, indicating a more uniform and formulaic output. These findings underscore the differences between ChatGPT and native speakers in syntactic complexity and rhetorical functions in argumentative essays, thereby contributing to our understanding of ChatGPT’s argumentative writing performance and providing valuable insights for the integration of ChatGPT into writing instruction.
Citation: Liu W, Liu X (2025) A comparative analysis of syntactic complexity in argumentative essays from rhetorical perspective: ChatGPT vs. English native speakers. PLoS One 20(8): e0329410. https://doi.org/10.1371/journal.pone.0329410
Editor: Natalia Grabar, STL UMR8163 CNRS, FRANCE
Received: March 24, 2025; Accepted: July 16, 2025; Published: August 1, 2025
Copyright: © 2025 Liu, Liu. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All relevant data are within the manuscript and its Supporting Information files.
Funding: The author(s) received no specific funding for this work.
Competing interests: The authors have declared that no competing interests exist.
Introduction
In recent decades, scholars have increasingly explored how syntactic complexity interacts with rhetorical units [1–3]. Previous studies in this field have primarily focused on syntactic complexity in research articles [2,3], investigating how syntactic complexity varies across rhetorical moves and how it is influenced by factors such as academic disciplines [4] and writer groups [5]. While these studies have enhanced our comprehension of syntactic complexity in relation to rhetorical moves, the focus on genre-specific research articles limits their broader applicability, leaving other forms of writing relatively understudied [6].
Argumentative writing is universally acknowledged as a crucial component of literacy, and is particularly essential for undergraduates to fulfill their academic requirements [7,8]. Through composing argumentative essays, students are expected to analyze complex issues, evaluate evidence, and display critical thinking abilities [9]. Despite the significance of argumentative essays, student writing often reveals a lack of “audience awareness” [10], which requires the use of appropriate rhetorical techniques to engage targeted readers. In addition to rhetorical competence, syntactic complexity, another key aspect of student writing [11], also warrants attention, particularly in terms of how it interacts with rhetorical structure in argumentative essays. Yet this area remains underexplored, calling for further investigation.
To support the integration of syntactic complexity and rhetorical structures in argumentative writing, an increasing number of AI tools are now available to help. One such tool is ChatGPT. Created by OpenAI, ChatGPT stands out as a sophisticated chatbot capable of generating human-like text based on input prompts. Its functions in argumentative writing have been validated in terms of its capability to produce natural language that addresses dialogical, structural, and linguistic challenges [12]. For instance, ChatGPT can generate accurate argumentative summaries on a provided topic based on prompts specifying the context and target audience [13]. However, introducing ChatGPT into writing instruction is premised on a more accurate grasp of the writing performance and characteristics of ChatGPT and human writers. Given that native speaker texts are often regarded as authentic models of language use and serve as a model for language learners [14], such a comparison enables us to assess whether, and in what aspects, ChatGPT can function as an effective model for student writing. It is therefore important to examine the extent to which ChatGPT-generated texts resemble or diverge from native speaker writing.
Our study aims to examine the syntactic complexity of specific rhetorical stages and how it varies across these stages in ChatGPT-generated argumentative texts through a comparative analysis with texts written by native speakers. By analyzing texts of similar length and on the same topic, this study not only provides empirical evidence for identifying ChatGPT-generated texts, but also offers pedagogical implications for using ChatGPT as a writing assistant. Specifically, the study addresses the following research questions:
- How does the syntactic complexity differ between argumentative essays generated by ChatGPT and those written by native speakers across rhetorical stages?
- If syntactic complexity varies within each group across rhetorical stages, how do ChatGPT and native speakers exhibit these different patterns of variation?
Based on previous findings on human and AI-generated texts [15,16], the study hypothesizes that:
Literature review
This section reviews literature on ChatGPT and writing, rhetorical moves in argumentative writing, and the relationship between syntactic complexity and rhetorical moves, outlining previous findings and research gaps relevant to the current study.
ChatGPT and writing
ChatGPT has garnered significant attention from scholars across various domains since its release, sparking numerous studies and inquiries into its capabilities and implications. Built on advanced machine learning algorithms and extensive datasets, ChatGPT is capable of dealing with a wide range of tasks across different prompts, such as text generation, modification and translation [17]. But among its various applications, writing is considered the most useful function [18]. Recent studies have shown that ChatGPT can produce human-like texts [19], assisting students in improving grammar, coherence, and stylistic refinement in their writing. However, whether it can produce texts with coherent rhetorical structure remains less explored [20].
Beyond these contributions, attempts to investigate differences between ChatGPT-generated and human-written texts also lay a foundation for exploring ChatGPT’s potential in writing. Tudino and Qin noted that, compared with human-authored texts, ChatGPT-generated research articles tended to overuse low-frequency “academic” vocabulary, underutilize subordination, and display less syntactic and semantic variation [21]. Jiang and Hyland [15,16] analyzed argumentative essays and found that ChatGPT texts included fewer lexical bundles and engagement markers, reflecting a more rigid and formulaic style with limited capacity for constructing interactional arguments. Conversely, native-speaker essays exhibited a stronger authorial presence and made greater use of engagement markers, indicating more interactive and persuasive discourse. These findings collectively underscore the distinct linguistic strategies employed in ChatGPT-generated and human-authored texts while also revealing ChatGPT’s limitations in argumentative writing.
In addition to the aforementioned features relevant to argumentative writing, rhetorical moves, functioning as discoursal units that help achieve coherent communicative goals, are also essential in structuring argumentative essays [22]. To date, however, only a few scholars have focused on the rhetorical moves of ChatGPT-generated texts. Kong and Liu [20], for instance, found that both human-written and ChatGPT-generated abstracts frequently included the moves of presenting the research and describing the methodology. Nevertheless, unlike ChatGPT, human-written abstracts often integrated the methodology move with the results move and exhibited a different overall sequence of rhetorical moves. While this research has provided new insights, it remains largely limited to abstracts, leaving other important genres like argumentative essays underexplored. Given the importance and fundamental role of argumentative writing in academic contexts, this lack of attention to rhetorical structure in ChatGPT-generated argumentative essays reveals an important research gap. A comprehensive understanding of the rhetorical structures in ChatGPT-generated and human-written argumentative essays is essential for improving ChatGPT’s ability to meet the specific demands of this genre and refining its role as a writing assistant in academic contexts.
Syntactic complexity and rhetorical moves in argumentative writing
Syntactic complexity, referring to the diversity, sophistication, and elaboration of syntactic structures in language production [11,23], is widely recognized as a key indicator of linguistic proficiency [24]. Research has indicated that higher language proficiency is often related to more complex syntactic structures [25–27]. In addition, employing appropriate syntactic complexity to address different registers, genres, and tasks is also a sign of mature writers [28–31]. For instance, argumentative essays typically demonstrate greater syntactic complexity than narrative essays, particularly in terms of the length of production units, coordination, and phrasal sophistication [31], reflecting the specific linguistic demands of different genres.
Therefore, to complete genre-based writing, rhetorical moves, which guide readers to the theme and substantive content of a text [32,33], also deserve our attention. Rhetorical moves refer to functional units, serving specific communicative purposes that collectively support the text’s overall rhetorical goal within a particular social context [3]. These moves vary across genres and sections of academic texts, reflecting their unique rhetorical goals. For example, Hyland’s [34] model for argumentative essays includes three stages, namely Thesis, Argument, and Conclusion, which correspond to rhetorical moves as defined, each fulfilling specific communicative functions to persuade readers of the central statement’s correctness [6].
In recent years, increasing scholarly attention has been directed toward the relationship between syntactic complexity and rhetorical moves in argumentative writing [3,4]. Much of this research has focused on research articles, particularly in sections such as abstracts and introductions [4,35]. However, limited studies have examined this relationship in argumentative essays. To our knowledge, only Zhang and Cui [6] revealed significant syntactic differences across rhetorical stages in native and L2 learners’ argumentative writing.
While all of the above studies focus on comparisons of language produced by humans, there is insufficient research comparing AI-generated language in terms of its syntactic complexity and rhetorical moves. Zindela [36] compared a single syntactic complexity measure between ChatGPT-generated and L2 human-written argumentative essays, finding that L2 learners’ essays exhibited a lower mean length of sentence than those written by ChatGPT. However, studies comparing ChatGPT-generated texts with those of human writers such as native speakers, particularly with a focus on syntactic complexity across rhetorical moves, are notably absent. By examining these differences, educators and researchers can develop deeper insight into how to incorporate ChatGPT into writing instruction in line with academic standards. Therefore, considering the literature and research gaps above, this study examines the syntactic complexity of ChatGPT-generated essays compared with those written by native speakers, focusing on how these features vary across rhetorical stages and their implications for argumentative effectiveness.
Method
This section outlines the methodology employed to compare syntactic complexity in ChatGPT-generated and native speaker argumentative essays. It is organized into four subsections. The first describes the data sources and selection criteria for both corpora, ensuring consistency in topic and length. The second details the procedures for annotating rhetorical stages based on Hyland’s framework. The third introduces the syntactic complexity measures adopted from Lu’s L2 Syntactic Complexity Analyzer (L2SCA). The last explains the statistical methods used to analyze group differences in syntactic complexity across rhetorical stages.
Data
The native speaker corpus, sourced from the International Corpus Network of Asian Learners of English (ICNALE), consists of 73 essays by American students across Social Sciences, Sciences and Technology, Humanities, and Life Sciences [37]. Among the native speaker section of ICNALE, 57% of the participants are identified as American. However, only 73 essays could be confirmed to have been written by American students, as the remaining participants were marked as “N/A” or categorized as English teachers or adults with varied job backgrounds. To ensure demographic consistency and maintain the validity of the comparison, this subset of 73 essays was selected for analysis.
To ensure the reliability of this analysis, several key factors were carefully controlled. First, all essays were produced under standardized conditions, including a time limit and a word count range of 200–300 words. Second, to minimize potential linguistic variation caused by differences in essay topics, this study selected one of ICNALE’s two common argumentative topics, “Smoking should be completely banned at all restaurants in the country,” ensuring topic consistency across the dataset. This controlled design enhances the comparability of essays generated by ChatGPT and native speakers, enabling a targeted analysis of syntactic complexity and rhetorical patterns.
The ChatGPT corpus was created by prompting the GPT-4o mini model to simulate argumentative writing typical of competent university students, chosen for its feasibility in this study and its current availability as a free tool for public use. The data collection process utilized a standardized prompt, adapted from Jiang and Hyland’s [15] design, instructing the model as follows: “You are a competent university-student writer of English tests for academic purposes. Write an argumentative essay on the topic ‘Smoking should be completely banned at all restaurants in the country.’ You can choose your stance and fulfill the argumentation without giving the title. The essay is at least 200 words but no more than 300 words.” Due to inherent word count constraints in ChatGPT, essays could only be generated one at a time, requiring the prompt to be repeated 73 times to produce the full dataset. This process ensured consistency in task requirements across both corpora. The resulting essays varied in stance (pro and con) and rhetorical strategies, providing a robust basis for comparing the syntactic complexity and rhetorical stages of ChatGPT-generated texts with those written by native speakers.
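The repeated prompting procedure described above can be sketched as follows. This is an illustrative Python sketch, not the authors’ actual collection script; the function name and request structure are assumptions based on the description, with the prompt text taken verbatim from the study.

```python
# Illustrative sketch of the data collection setup (hypothetical code;
# the study describes submitting the same prompt 73 times, one essay per call).

PROMPT = (
    "You are a competent university-student writer of English tests for academic "
    "purposes. Write an argumentative essay on the topic 'Smoking should be "
    "completely banned at all restaurants in the country.' You can choose your "
    "stance and fulfill the argumentation without giving the title. The essay is "
    "at least 200 words but no more than 300 words."
)

def build_requests(n_essays: int = 73, model: str = "gpt-4o-mini"):
    """Build one identical chat request per essay. Each request would be sent
    separately, since essays were generated one at a time."""
    return [
        {"model": model, "messages": [{"role": "user", "content": PROMPT}]}
        for _ in range(n_essays)
    ]

requests = build_requests()
```

Because every request is identical, variation across the 73 generated essays reflects the model’s sampling behavior rather than differences in task instructions, mirroring the standardized conditions of the native speaker corpus.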
After a total of 146 argumentative essays had been collected, all texts were manually converted into Word files with consistent formatting; no typos or grammatical errors were corrected. The essays from the two groups were organized into separate datasets to facilitate systematic comparison. Two files were ultimately generated, comprising 19,868 words of ChatGPT argumentative essays and 15,574 words of native speaker argumentative essays. The average word count is 272.164 words per ChatGPT essay and 213.342 words per native speaker essay, as detailed in Table 1.
Rhetorical stage annotation
The texts were manually annotated for rhetorical stages based on Hyland’s [34] three-stage framework for argumentative essays, which is shown in Table 2. First, two experienced Chinese university professors, both possessing more than 10 years of expertise in teaching English writing, were invited to independently annotate a sample of 20 essays (10 written by native speakers and 10 generated by ChatGPT). To minimize potential bias, the annotators were not informed of the source of each essay during the annotation process. After they completed their initial annotations, inter-rater reliability was calculated using Cohen’s kappa coefficient, yielding a value of 0.773, which indicates substantial agreement. The two teachers then compared results and discussed all instances where they disagreed on the boundaries or labels of rhetorical stages, a common challenge that other researchers have also encountered in manual annotation [38]. Discrepancies typically involved ambiguous transitions between moves, such as between argument development and conclusion. These were resolved through collaborative discussion and reference to Hyland’s framework until consensus was reached. After that, the remaining 126 essays were independently annotated by the two teachers. Finally, a thorough review of all annotations was conducted to ensure consistency. Inter-rater reliability was again calculated using Cohen’s kappa coefficient, yielding a value of 0.898, which indicates almost perfect agreement. During this review, the annotations were cross-checked again for adherence to Hyland’s framework, focusing on the accuracy of rhetorical stage labels, completeness of rhetorical structure, and consistency in boundary demarcation. Any remaining ambiguities were resolved through discussion.
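For readers unfamiliar with the agreement statistic used above, Cohen’s kappa corrects observed agreement for the agreement expected by chance. A minimal stdlib sketch (the stage labels in the test are hypothetical; this is not the authors’ computation script):

```python
from collections import Counter

def cohens_kappa(rater1, rater2):
    """Cohen's kappa for two annotators' categorical labels:
    (observed agreement - chance agreement) / (1 - chance agreement)."""
    assert len(rater1) == len(rater2) and rater1
    n = len(rater1)
    # Observed proportion of agreement.
    p_o = sum(a == b for a, b in zip(rater1, rater2)) / n
    # Chance agreement from each rater's marginal label distribution.
    c1, c2 = Counter(rater1), Counter(rater2)
    p_e = sum(c1[k] * c2[k] for k in set(c1) | set(c2)) / (n * n)
    return (p_o - p_e) / (1 - p_e)
```

On this scale, the study’s values of 0.773 and 0.898 fall into the commonly cited “substantial” (0.61–0.80) and “almost perfect” (0.81–1.00) bands, respectively.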
Syntactic complexity measurement
Syntactic complexity refers to the degree of variation, elaboration, and sophistication in language production, and is commonly conceptualized as a multidimensional construct involving global complexity, subordination, coordination, clausal elaboration, and phrasal complexity [39]. To evaluate the syntactic complexity of each argumentative essay, we utilized the L2 Syntactic Complexity Analyzer (L2SCA), a software tool created by Xiaofei Lu [40]. This tool was chosen because it closely aligns with the definition of syntactic complexity, operationalizing the construct across five dimensions through 14 measures, which were specially selected from over 100 measures reviewed by Wolfe-Quintero et al. [41] and Ortega [42]. Moreover, L2SCA has demonstrated reliable accuracy in computing complexity scores [40] and has been widely applied in studies on syntactic complexity and rhetorical moves. Following prior studies that employed L2SCA to investigate syntactic complexity from a rhetorical perspective [4,23], all 14 measures were included in our analysis. Table 3 presents a detailed review of all measures, including their corresponding codes and definitions.
To assess syntactic complexity by stage, we created individual plain text files for each ChatGPT and native speaker argumentative essay. These files were named S1GPT1, S1NS1, S2GPT1, S2NS1, etc. The L2SCA tool was then employed to calculate the values of the 14 syntactic complexity measures for these files.
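To give intuition for what the length-based measures capture, one of them, MLS (mean length of sentence, i.e., words per sentence), can be sketched naively as follows. L2SCA itself derives its counts from full syntactic parses, so this regex-based split is only an illustration, not the tool’s algorithm:

```python
import re

def mean_length_of_sentence(text: str) -> float:
    """Naive MLS: total word count divided by sentence count.
    Illustrative only; L2SCA computes this from parsed structures."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    return len(words) / len(sentences)
```

The other measures follow the same ratio logic over parsed units (e.g., MLT divides words by T-units, CN/C divides complex nominals by clauses), which is why each stage file can be scored independently.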
The statistical analysis
The statistical analysis was conducted using SPSS 27.0. First, tests for normality and homogeneity of variance were carried out to evaluate the data distribution, indicating that both ChatGPT’s and native speakers’ essays were normally distributed across all syntactic complexity measures. However, some indices failed to meet the assumption of homogeneity of variance. Thus, the Mann-Whitney U test, a non-parametric counterpart to the t-test, was used to assess differences across the 14 syntactic complexity measures between the two groups.
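For readers unfamiliar with the test, the U statistic underlying the Mann-Whitney U test counts, over all cross-group pairs, how often one group’s value exceeds the other’s. A minimal stdlib sketch (SPSS additionally derives the p-value from U, which is omitted here):

```python
def mann_whitney_u(x, y):
    """U statistic for group x: number of (x_i, y_j) pairs with x_i > y_j,
    with ties counted as 0.5. P-value computation (as in SPSS) is omitted."""
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u
```

Because U depends only on the ordering of values, not their magnitudes, the test remains valid when the homogeneity-of-variance assumption of the t-test fails, which is the situation reported above.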
Results
This section presents the results on syntactic complexity and rhetorical stages in argumentative essays produced by ChatGPT and by native English speakers. It is organized into two subsections. The first compares syntactic complexity between ChatGPT-generated and native-speaker essays across rhetorical stages, identifying significant differences in various syntactic measures. The second explores how syntactic complexity varies across rhetorical stages within each group, with patterns illustrated through statistical analysis and figures to show distinct trends in the writing of ChatGPT and native speakers.
The difference between ChatGPT and native speakers by rhetorical stages
Table 4 shows the differences in syntactic complexity measures between ChatGPT and native speakers across rhetorical stages. Although no significant differences were observed between the two groups in MLS in the thesis stage or in MLT and CN/T in the conclusion stage, most other measures showed significant group differences. ChatGPT demonstrated significantly higher values than native speakers in several syntactic complexity measures across rhetorical stages. MLC was significantly higher for ChatGPT in the thesis (15.259 vs. 8.596, p < .001), argument (12.349 vs. 8.726, p < .001), and conclusion stages (19.608 vs. 10.087, p < .001). Similarly, ChatGPT demonstrated significantly higher values in CN/C across all stages, including the thesis (2.154 vs. 1.043, p < .001), argument (1.787 vs. 0.910, p < .001), and conclusion stages (2.605 vs. 1.206, p < .001). CP/C was also significantly more frequent in ChatGPT’s writing at every stage, in the thesis (0.478 vs. 0.069, p < .001), argument (0.439 vs. 0.145, p < .001) and conclusion stages (0.780 vs. 0.154, p < .001). CP/T followed a similar pattern, with ChatGPT showing significantly higher values in the thesis (0.602 vs. 0.141, p < .001), argument (0.622 vs. 0.328, p < .001) and conclusion stages (0.824 vs. 0.355, p < .001). MLT was significantly higher for ChatGPT in the thesis stage (19.361 vs. 17.777, p = .021), while CN/T was significantly higher for ChatGPT in the thesis (2.726 vs. 2.218, p < .001) and argument stages (2.538 vs. 2.136, p < .001).
By comparison, native speakers demonstrated significantly higher values than ChatGPT in many other syntactic complexity measures. In all the measures related to amount of subordination, native speakers showed significantly higher values. For C/T, native speakers exhibited higher values in the thesis (2.186 vs. 1.363, p < .001), argument (2.368 vs. 1.431, p < .001), and conclusion stages (2.310 vs. 1.100, p < .001). Similarly, native speakers’ texts showed significantly higher CT/T in the thesis (0.747 vs. 0.330, p < .001), argument (0.780 vs. 0.364, p < .001), and conclusion stages (0.715 vs. 0.135, p < .001). DC/C followed the same pattern, with native speakers showing significantly higher values in the thesis (0.448 vs. 0.201, p < .001), argument (0.509 vs. 0.292, p < .001), and conclusion stages (0.422 vs. 0.087, p < .001). DC/T was also significantly higher for native speakers across all stages, including the thesis (1.161 vs. 0.368, p < .001), argument (1.246 vs. 0.433, p < .001), and conclusion stages (1.173 vs. 0.126, p < .001). Native speakers also demonstrated significantly higher values than ChatGPT in T/S, VP/T and C/S across all stages. For T/S, native speakers showed significantly higher values in the thesis (1.261 vs. 1.100, p = .013), argument (1.432 vs. 1.043, p < .001), and conclusion stages (1.348 vs. 1.062, p < .001). Similarly, VP/T was significantly higher for native speakers in the thesis (2.773 vs. 1.680, p < .001), argument (3.277 vs. 2.143, p < .001), and conclusion stages (3.505 vs. 2.000, p < .001). For C/S, native speakers demonstrated significantly higher values in the thesis (2.703 vs. 1.479, p < .001), argument (3.347 vs. 1.490, p < .001), and conclusion stages (2.963 vs. 1.174, p < .001). Beyond these measures, native speakers produced significantly longer sentences and T-units in certain stages. MLS was significantly higher for native speakers in the argument (28.805 vs. 18.218, p < .001) and conclusion stages (26.805 vs. 21.747, p < .001), while MLT was significantly higher in the argument stage (20.528 vs. 17.512, p < .001). In addition to mean differences, the two groups also differed in terms of variation. Standard deviations for native speakers were higher than those for ChatGPT in most syntactic complexity measures, the exceptions being MLC, CP/C, and CN/C across all stages, and CP/T in the thesis stage.
Patterns of syntactic complexity variation across rhetorical stages for ChatGPT and native speakers
Table 5 presents the stage-specific differences in syntactic complexity for ChatGPT and native speakers. Combined with the findings from Table 4, for ChatGPT, significant differences were found across rhetorical stages in most syntactic complexity measures, except CN/T. For example, T/S was higher in the thesis stage than in the other two stages. For native speakers, differences were observed across rhetorical stages in nine syntactic complexity measures, the exceptions being CT/T, DC/C, DC/T, CN/C and CN/T. For instance, MLS, C/T, T/S and C/S showed their highest values in the argument stage.
Based on these stage-specific findings, the variation patterns across rhetorical stages were compared between ChatGPT-generated and native-speaker-authored texts. Figures 1–4 illustrate the changes in the syntactic complexity measures across rhetorical stages for ChatGPT and native speakers.
Note: S1, S2 and S3 represent Thesis, Argument and Conclusion. GPT and NS denote ChatGPT and native speakers.
Note: S1, S2 and S3 refer to Thesis, Argument and Conclusion. GPT and NS denote ChatGPT and native speakers.
Note: S1, S2 and S3 represent Thesis, Argument and Conclusion. GPT and NS denote ChatGPT and native speakers.
Note: S1, S2 and S3 represent Thesis, Argument and Conclusion. GPT and NS denote ChatGPT and native speakers.
Fig 1 illustrates the means of the three length of production unit measures at each stage for ChatGPT and native speakers. For MLC, MLS, and MLT, ChatGPT and native speakers demonstrate different variation patterns across rhetorical stages. For ChatGPT, all three measures peaked in the conclusion stage and dropped to their minimum in the argument stage. By comparison, for native speakers, MLC differed significantly between the thesis and conclusion stages, with the conclusion stage being significantly higher. For MLS and MLT, both the argument and conclusion stages were significantly higher than the thesis stage.
Fig 2 illustrates the means of the four amount of subordination measures at each stage for ChatGPT and native speakers. Both groups displayed similar overall variation patterns across rhetorical stages, but differences still emerged within each group. In ChatGPT, all four measures followed a consistent trend across stages, with the argument stage showing significantly higher values than the thesis stage. In native speakers, however, C/T was highest in the argument stage.
Fig 3 illustrates the means of the three amount of coordination measures at each stage for ChatGPT and native speakers. For CP/C and CP/T, the conclusion stage was highest in both groups. In ChatGPT, the conclusion stage had significantly higher values than both the thesis and argument stages. Similarly, in native speakers, the conclusion stage exhibited significantly higher values than the argument stage, which in turn was higher than the thesis stage. For T/S, the two groups displayed contrasting trends: in ChatGPT, the conclusion stage was significantly higher than the argument stage, while for native speakers the argument stage was highest.
Fig 4 illustrates the means of the three phrasal sophistication measures and one overall sentence complexity measure at each stage for ChatGPT and native speakers. In general, ChatGPT exhibited more significant between-stage differences than native speakers on these measures. Neither group showed significant differences across rhetorical stages for CN/T. For CN/C, in contrast to ChatGPT, native speakers exhibited no significant difference. For VP/T, the two groups exhibited different patterns: in ChatGPT, the argument stage showed the highest value and the thesis stage the lowest, whereas in native speakers the conclusion stage was highest and the thesis stage lowest. For C/S, both groups exhibited the highest values in the argument stage; however, the lowest value for ChatGPT was in the conclusion stage, whereas for native speakers it was in the thesis stage.
Discussion
This section discusses the key findings regarding the syntactic complexity and rhetorical stages in argumentative essays produced by ChatGPT and native speakers. It is organized into two subsections. The first examines syntactic complexity differences between ChatGPT and native writers within each rhetorical stage. The second compares the cross-stage variation patterns of syntactic complexity between the two groups.
Differences between ChatGPT and native speakers by rhetorical stages
Syntactic complexity across different rhetorical stages in texts produced by ChatGPT and native speakers was one of the focuses of the current study. The thesis and conclusion stages showed no significant differences between ChatGPT and native speakers in some of the length of production unit and subordination measures. These findings suggest that ChatGPT’s writing performance in these specific measures and stages closely aligns with that of native speakers in argumentative essays, further highlighting ChatGPT’s capability of producing texts that approximate native-like quality in certain aspects of argumentative writing [19].
First, ChatGPT produced significantly longer T-units than native speakers in the thesis stage. This may be attributed to ChatGPT’s higher information density in response to argumentative writing prompts, embedding more information within a single T-unit in the thesis stage. In contrast, native speakers convey their core message directly, emphasizing the clarity and conciseness expected in thesis statements [30]. This difference is illustrated in Examples 1 and 2, both opening sentences of their respective essays.
- Ex. 1. In my opinion, smoking should be completely banned at all restaurants in the country due to health concerns, the impact on non-smokers, and the need for creating cleaner environments. (ChatGPT-thesis)
- Ex. 2. I don’t think that smoking should be banned at all the restaurants in Japan. (native speakers-thesis)
In the argument stage, by contrast, native speakers elaborate on their arguments with more words within a single T-unit, expressing their intended meaning more clearly and making their arguments more persuasive. This difference is illustrated in Examples 3 and 4: Example 3 demonstrates how native speakers provide detailed evidence to support their claims, whereas Example 4 shows ChatGPT presenting a less detailed explanation.
- Ex. 3. Next, I think smoking is disrespectful to the people who work at the restaurant. Even though the other customers may only have to endure the second hand smoke for one or 2 hours, the restaurants staff has to work for as many as 8 hours or more at a time, and so they will have to be around smoke much longer. (native speakers-argument)
- Ex. 4. Finally, the presence of smoking areas within restaurants can also create discomfort for customers who do not smoke. The lingering smell of smoke can taint the dining experience and cause unease. (ChatGPT-argument)
Second, ChatGPT produced significantly longer clauses than native speakers across all rhetorical stages, while its MLS was notably shorter than native speakers’ in the argument and conclusion stages. This partially aligns with Zindela’s [36] study, which reported that ChatGPT-generated argumentative essays featured lower MLS compared to human writers. One reason is that human writers, as Zhou et al. [43] noted, often rely on longer sentences to highlight key information and convey complex ideas. Our finding indicates that ChatGPT tends to generate longer clauses within shorter sentences in the argument and conclusion stages. Although clauses help link ideas and convey complex expressions [44], shorter sentences are less effective than longer ones in expressing complicated meanings [45], thereby limiting the depth and complexity of argumentation essential for persuasive argumentative writing.
Third, ChatGPT used more CN/C and CN/T in all or some stages, indicating a tendency to produce denser nominal structures. This partially aligns with Wang’s [46] finding that student essays revised by ChatGPT showed a notable increase in CN/C compared to the original versions. High-level writers tend to employ more complex phrases, and the frequent use of nominal phrases stands out as a distinctive feature of academic English writing [39]; this suggests that ChatGPT mimics advanced writing patterns, much like proficient writers, by increasing complex nominals in its generated text. In contrast, native speakers produced more VP/T than ChatGPT across all stages, indicating their more frequent use of verb phrases. For instance, Example 3 illustrates this with nine verb phrases (is disrespectful to the people who work at the restaurant, work at the restaurant, may only have to endure the second hand, to endure the second hand, smoke for one or 2 hours, has to work for as many as 8 hours or more at a time, to work for as many as 8 hours or more at a time, will have to be around smoke much longer and to be around smoke much longer). By using verb phrases, native speakers include richer semantic information, constructing layered and nuanced arguments with greater logical depth. This frequent use of verb phrases enhances the clarity and persuasiveness of their writing, making their arguments more comprehensive and compelling.
Finally, ChatGPT used more CP/C and CP/T, alongside significantly lower values on the subordination measures and C/S across all stages. This is in line with Tudino and Qin’s [21] findings, suggesting that ChatGPT relies more on simpler sentence structures with coordination phrases, rather than the hierarchically layered subordinate structures typically used by native speakers. Subordinate clauses, as a form of structural expansion, reduce the cognitive effort required from readers by explicitly specifying the semantic relationships between ideas, thereby enhancing the readability of the text [28,47]. ChatGPT’s coordination-heavy approach may therefore place a greater cognitive burden on readers. Example 5, a ChatGPT conclusion text, includes one coordination phrase (ensuring comfort for all patrons, and promoting cleaner, healthier environments) but lacks complex T-units or dependent clauses. In contrast, Example 6, a typical native speaker’s text, includes one complex T-unit (I think that banning smoking in public places like restaurants is a great step in discouraging people from hurting themselves) and one dependent clause (that banning smoking in public places like restaurants is a great step in discouraging people from hurting themselves) but no coordination phrase. These examples highlight contrasting syntactic strategies in the conclusion stage: ChatGPT prioritizes clarity and simplicity through parallel structures, whereas native speakers employ subordinate structures to create more nuanced and reflective conclusions.
- Ex. 5. In conclusion, the complete ban of smoking in all restaurants is essential for protecting public health, ensuring comfort for all patrons, and promoting cleaner, healthier environments. (ChatGPT-conclusion)
- Ex. 6. I think that banning smoking in public places like restaurants is a great step in discouraging people from hurting themselves. (native speakers-conclusion)
However, native speakers produced more T/S than ChatGPT across all stages. A T-unit, defined as a production unit within a sentence that includes one main clause and any subordinate clauses directly or indirectly connected to it, serves as a key indicator of syntactic complexity [27]. As mentioned earlier, all measures related to subordination were consistently higher for native speakers across all stages. This suggests that native speakers incorporate both coordination and subordination structures in argumentative writing, often employing multiple T-units within a single sentence. In Example 7, a native speaker’s argument text, the two sentences constitute two T-units, both complex, with four dependent clauses (that it is much more healthy for the other people in the restaurants if smoking is banned at the restaurants, if smoking is banned at the restaurants, as the secondhand smoke is much more dangerous than the smoke they are inhaling and they are inhaling). This strategy enables the progressive development of arguments and a step-by-step articulation of viewpoints, thereby increasing the overall persuasiveness and coherence of argumentation.
- Ex. 7. Next, I agree that it is much more healthy for the other people in the restaurants if smoking is banned at the restaurants. It is not right to have to breathe in the secondhand smoke of many smokers around you as the secondhand smoke is much more dangerous than the smoke they are inhaling. (native speakers-argument)
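The ratio measures discussed throughout (MLS, MLT, T/S, C/S, C/T, DC/C, and so on) are simple quotients of annotated unit counts; in practice they are typically computed automatically with tools such as Lu’s [40] analyzer. As a minimal illustrative sketch only, the toy Python function below derives several of these indices from hand-supplied counts. The counts used for a passage like Example 7 (54 words, 2 sentences, 2 T-units, 6 clauses of which 4 are dependent) are our own illustrative annotation, not output from the study’s tooling.

```python
def complexity_indices(words, sentences, t_units, clauses, dep_clauses):
    """Compute common ratio-based syntactic complexity indices
    from manually annotated unit counts."""
    return {
        "MLS": words / sentences,       # mean length of sentence
        "MLT": words / t_units,         # mean length of T-unit
        "T/S": t_units / sentences,     # T-units per sentence
        "C/S": clauses / sentences,     # clauses per sentence
        "C/T": clauses / t_units,       # clauses per T-unit
        "DC/C": dep_clauses / clauses,  # dependent clauses per clause
    }

# Illustrative annotation for a two-sentence passage like Example 7:
# 54 words, 2 sentences, 2 T-units, 6 clauses (2 main + 4 dependent).
idx = complexity_indices(words=54, sentences=2, t_units=2,
                         clauses=6, dep_clauses=4)
print(idx["MLS"], idx["T/S"])  # 27.0 words/sentence, 1.0 T-units/sentence
```

The high DC/C value (4/6 ≈ 0.67) for such a passage reflects the subordination-heavy strategy attributed to native speakers above.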
It is noteworthy that the two groups differed in variation. Higher standard deviations in native speaker texts reflect greater individual differences in syntactic complexity, likely due to diverse writing styles, rhetorical choices, and language proficiency. In contrast, the lower variation in ChatGPT outputs points to more uniform syntactic patterns. As Jiang and Hyland [15] observe, ChatGPT’s language tends to be more rigid and formulaic, lacking the flexibility and stylistic diversity characteristic of human writing. While such consistency ensures structural stability, it may constrain the range of syntactic choices essential for nuanced academic expression.
Differences between ChatGPT and native speakers in cross-rhetorical-stage variation patterns
Our study also investigated how the syntactic complexity of ChatGPT’s and native speakers’ texts varies across rhetorical stages. We found no significant variation in CN/T across stages for either ChatGPT or native speakers. This indicates that ChatGPT can achieve a level of writing comparable to native speakers in the use of complex nominal structures regardless of rhetorical stage.
In the thesis stage, ChatGPT used the fewest VP/T, while producing more CN/C than in the argument stage, though fewer than in the conclusion stage. As the opening stage, the thesis stage primarily focuses on presenting the writer’s stance or proposition, requiring clarity and conciseness rather than detailed elaboration [30,34]. This could explain the lowest VP/T in the thesis stage, as fewer verb phrases are needed for argument expansion at this point. Furthermore, argumentative writing, as a frequent genre of academic writing, often features a condensed structure, incorporating phrasal modifiers within noun phrases while remaining less explicit [28]. The intermediate level of CN/C indicates that complex nominal structures were employed to convey the core message precisely while avoiding overly intricate expressions. This balance between informativeness and readability is crucial for effectively establishing the argumentative foundation in the thesis stage.
In the argument stage, ChatGPT produced the shortest mean length of production units and used the fewest CN/C, while employing the most subordination and the highest VP/T. This is possibly because, in the argument stage, writers are expected to express a stance clearly and substantiate this position with compelling evidence to effectively persuade the reader [48]. The greater use of subordination measures and VP/T suggests that ChatGPT can employ subordinate structures and verb phrases to connect and support multiple arguments, enabling a layered presentation of evidence that enhances persuasiveness. Meanwhile, the shortest mean length of production units and the fewest CN/C indicate ChatGPT’s preference for concise and straightforward expressions. While this approach enhances clarity, it may limit information density and depth, particularly in the argument stage, where more detailed evidence is often required to build a robust and convincing argument [34].
In the conclusion stage, ChatGPT produced the highest CN/C and T/S but the lowest C/S. This suggests that ChatGPT employs more T-units and denser nominal structures to list the core arguments effectively, aligning with the summarizing function of the conclusion stage. Meanwhile, the lowest C/S reflects a simplification of syntactic complexity, likely aimed at enhancing readability and keeping the summary concise and direct.
The overall syntactic complexity variation pattern across rhetorical stages for ChatGPT reflects its ability to adapt syntactic strategies to the communicative functions of different rhetorical stages in argumentative essays. It suggests that ChatGPT’s syntactic complexity varies systematically across stages, aligning with the functional demands of each rhetorical section.
Regarding native speakers, the pattern of syntactic complexity variation across rhetorical stages differed from that of ChatGPT in some measures. In addition to CN/T, as previously mentioned, no significant differences were observed across rhetorical stages in CT/T, DC/C, DC/T, or CN/C. This suggests that native speakers consistently employ complex nominals and subordinate structures across rhetorical stages to achieve their overall writing objectives. Such stability highlights their adaptability to varying rhetorical contexts without depending on a single syntactic strategy, reflecting their advanced proficiency in argumentative writing [23].
In the thesis stage, native speakers produced the shortest MLS and MLT, as well as the fewest VP/T and C/S, suggesting a tendency to use shorter sentences and T-units, with reduced reliance on verb phrases and clauses. This reflects a deliberate strategy in the thesis stage, which is to introduce the central argument clearly and concisely, ensuring that the reader quickly grasps the core focus of the essay [30,34]. Moreover, the use of shorter sentences and fewer clauses usually minimizes the cognitive load on readers [49], allowing them to grasp the proposition more efficiently and focus on the central message.
In the argument stage, native speakers used more C/T and T/S, indicating a tendency to construct sentences with more clauses and longer T-units to develop their arguments. This reflects the functional demands of the argument stage, where claims must be elaborated and supported with detailed reasoning and evidence [44]. The use of more C/T enables sentences to embed multiple logical relationships, such as causality, contrast, or condition, enriching the argument’s depth and precision. Similarly, longer T-units allow the integration of layered reasoning and supporting evidence within a single sentence, reducing fragmentation and ensuring logical coherence. By employing these syntactic strategies, native speakers enhance both the persuasiveness and cohesiveness of their arguments, effectively meeting the rhetorical demands of the argument stage.
In the conclusion stage, native speakers produced the longest clauses and used the most VP/T. This is consistent with the functional demands of the conclusion stage, which requires both a comprehensive synthesis of key arguments and a clear reinforcement of the central claim [34]. The longest clauses enable the integration of multi-layered information, while verb phrases facilitate logical connections between summarized ideas. This approach enhances the depth and clarity of the summary, reflecting native speakers’ ability to balance syntactic complexity with readability in argumentative writing.
The overall syntactic complexity variation pattern across rhetorical stages for native speakers is less variable than that for ChatGPT, indicating that native speakers show greater consistency in choosing syntactic structures to fulfill the communicative functions across various stages. In the thesis stage, native speakers employ shorter sentences and T-units and fewer verb phrases, emphasizing clarity and conciseness to directly present their central argument. In the argument stage, they effectively utilize subordinate structures and achieve high T/S values, enabling them to progressively elaborate on their claims and integrate multi-layered information within single sentences. In the conclusion stage, native speakers rely on longer clauses and a higher density of verb phrases to summarize key arguments while maintaining clarity and organization. Their ability to combine denser information units with various syntactic strategies highlights their advanced proficiency in constructing nuanced, purpose-driven academic texts.
Conclusion
This study compared syntactic complexity in ChatGPT’s and native speakers’ argumentative essays, particularly from the perspective of rhetorical stages. The results indicated that while the two groups were similar on some syntactic complexity measures, notable differences were also observed. ChatGPT showed a clear preference for coordination and shorter sentence structures, particularly in the argument stage, while native speakers employed a greater variety of syntactic strategies, including more extensive subordination, which allowed for more nuanced argumentation and deeper logical relationships. Moreover, ChatGPT exhibited lower variation than native speakers on many syntactic complexity measures, suggesting a more rigid and formulaic language style [15].
This study holds significant implications. ChatGPT’s ability to produce well-structured argumentative texts demonstrates its potential to assist writing. For L2 learners, ChatGPT can serve as a valuable writing assistance tool, especially for enhancing syntactic complexity and adapting to rhetorical stages in argumentative essays. Its ability to produce coherent outputs across rhetorical stages contributes to the development of learners’ genre awareness. For educators, this study highlights the potential of using ChatGPT as a supplementary resource that can be integrated into language classrooms. Educators should also guide students in strengthening their syntactic strategies and ensuring the depth of their rhetorical structures, moving beyond ChatGPT’s strengths and addressing its limitations in more nuanced writing tasks.
This study also has several limitations. First, it concentrated on syntactic complexity without considering other aspects such as coherence, lexical choice and writing style. Second, the study was based on a limited sample size of 146 essays. Third, the study focused solely on ChatGPT and argumentative essays, ignoring other large language models such as DeepSeek or Claude, as well as other genres like narrative or descriptive essays. Future studies should encompass more linguistic features, large language models and writing genres, as well as a larger sample size to provide a more comprehensive view of AI’s and human writers’ performance in English writing.
Acknowledgments
We would like to express our sincere appreciation to the creators and contributors of the ICNALE (International Corpus Network of Asian Learners of English) for their efforts in compiling and making the corpus publicly available for free. Their generous contribution has significantly supported academic research.
References
- 1. Gokturk N, Saricaoglu A. Examining teacher-written conference abstracts: Rhetorical functions and syntactic complexity features. Journal of English for Academic Purposes. 2024;72:101454.
- 2. Li Y, Yang R. Assessing the writing quality of English research articles based on absolute and relative measures of syntactic complexity. Assessing Writing. 2023;55:100692.
- 3. Saricaoglu A, Bilki Z, Plakans L. Syntactic complexity in learner-generated research paper introductions: Rhetorical functions and level of move/step realization. Journal of English for Academic Purposes. 2021;53:101037.
- 4. Zhou W, Li Z, Lu X. Syntactic complexity features of science research article introductions: Rhetorical-functional and disciplinary variation perspectives. Journal of English for Academic Purposes. 2023;61:101212.
- 5. Yin S, Gao Y, Lu X. Diachronic changes in the syntactic complexity of emerging Chinese international publication writers’ research article introductions: A rhetorical strategic perspective. Journal of English for Academic Purposes. 2023;61:101205.
- 6. Zhang Y, Cui J. The relationship between syntactic complexity and rhetorical stages in L2 learners’ texts: A comparative analysis. English for Specific Purposes. 2023;72:51–64.
- 7. Hewings M. Materials for university essay writing. In: Harwood N, editor. English language teaching materials: Theory and practice. Cambridge: Cambridge University Press; 2010.
- 8. Sandra RP, Hwang WY, Zafirah A, Hariyanti U, Engkizar E, Hadi A. Crafting Compelling Argumentative Writing for Undergraduates: Exploring the Nexus of Digital Annotations, Conversational Agents, and Collaborative Concept Maps. Journal of Educational Computing Research. 2024;62(5):1327–57.
- 9. Nesi H, Gardner S. Variation in disciplinary culture: University tutors’ views on assessed writing tasks. British studies in applied linguistics. 2006;21:99–117.
- 10. McAlexander PJ. Ideas in practice: Audience awareness and developmental composition. Journal of Developmental Education. 1996;20(1):28–33.
- 11. Lu X. A corpus-based evaluation of syntactic complexity measures as indices of college-level ESL writers’ language development. TESOL Quarterly. 2011;45(1):36–62.
- 12. Wu T, He S, Liu J, Sun S, Liu K, Han QL, et al. A brief overview of ChatGPT: The history, status quo and potential future development. IEEE/CAA Journal of Automatica Sinica. 2023;10(5):1122–36.
- 13. Esmaeil AAA, Kiflee@Dzulkifli DNA, Maakip I, Mantaluk OO, Marshall S. Understanding student perception regarding the use of ChatGPT in their argumentative writing: A qualitative inquiry. Jurnal Komunikasi: Malaysian Journal of Communication. 2023;39(4):150–65.
- 14. Creese A, Blackledge A, Takhi JK. The ideal ‘native speaker’ teacher: negotiating authenticity and legitimacy in the language classroom. The Modern Language Journal. 2014;98(4):937–51.
- 15. Jiang F, Hyland K. Does ChatGPT argue like students? Bundles in argumentative essays. Applied Linguistics. 2024.
- 16. Jiang F, Hyland K. Does ChatGPT write like a student? Engagement markers in argumentative essays. Written Communication. 2024.
- 17. Bašić Ž, Banovac A, Kružić I, Jerković I. ChatGPT-3.5 as writing assistance in students’ essays. Humanities and Social Sciences Communications. 2023;10:750.
- 18. Van Noorden R, Perkel JM. AI and science: what 1,600 researchers think. Nature. 2023;621(7980):672–5.
- 19. Suchman K, Garg S, Trindade AJ. Chat Generative Pretrained Transformer Fails the Multiple-Choice American College of Gastroenterology Self-Assessment Test. Am J Gastroenterol. 2023;118(12):2280–2. pmid:37212584
- 20. Kong X, Liu C. A comparative genre analysis of AI-generated and scholar-written abstracts for English review articles in international journals. Journal of English for Academic Purposes. 2024;71:1014322.
- 21. Tudino G, Qin Y. A corpus-driven comparative analysis of AI in academic discourse: Investigating Chatgpt-generated academic texts in social sciences. J Acad Discourse. 2024;312:103838.
- 22. Swales J. Genre analysis: English in academic and research settings. Cambridge: Cambridge University Press; 1990.
- 23. Youn SJ. Measuring syntactic complexity in L2 pragmatic production: Investigating relationships among pragmatics, grammar, and proficiency. System. 2014;42:270–87.
- 24. Maamuujav U, Olson CB, Chung H. Syntactic and lexical features of adolescent L2 students’ academic writing. Journal of Second Language Writing. 2021;53:100822.
- 25. Bulté B, Roothooft H. Investigating the interrelationship between rated L2 proficiency and linguistic complexity in L2 speech. System. 2020;91:102246.
- 26. Jagaiah T, Olinghouse NG, Kearns DM. Syntactic complexity measures: variation by genre, grade-level, students’ writing abilities, and writing quality. Reading and Writing. 2020;33(10):2577–638.
- 27. Mylläri T. Words, clauses, sentences, and T-units in learner language: precise and objective units of measure?. Journal of the European Second Language Association. 2020;4(1):13–23.
- 28. Biber D, Gray B. Challenging stereotypes about academic writing: Complexity, elaboration, explicitness. Journal of English for Academic Purposes. 2010;9(1):2–20.
- 29. Inoue C. A comparative study of the variables used to measure syntactic complexity and accuracy in task-based research. The Language Learning Journal. 2016;44(4):487–505.
- 30. Lam AT, Thai CD, Thach CD, Phu TH, Chau TN, Mai BT, Phan HD. A study about EFL English-major students’ challenges in writing argumentative essays at Soc Trang Teachers’ Training College, Vietnam. In: Proceedings of the RSU International Research Conference 2020; Rangsit University; 2020. p. 1544–58.
- 31. Zheng Y, Barrot JS. Syntactic complexity in second language (L2) writing: Comparing students’ narrative and argumentative essays. System. 2024;123:103342.
- 32. Brown GTL, Marshall JC. The impact of training students how to write introductions for academic essays: an exploratory, longitudinal study. Assessment & Evaluation in Higher Education. 2012;37(6):653–70.
- 33. Kusel PA. Rhetorical approaches to the study and composition of academic essays. System. 1992;20(4):457–69.
- 34. Hyland K. A genre description of the argumentative essay. RELC Journal. 1990;21(1):66–78.
- 35. Tankó G. Literary research article abstracts: An analysis of rhetorical moves and their linguistic realizations. Journal of English for Academic Purposes. 2017;27:42–55.
- 36. Zindela N. Comparing measures of syntactic and lexical complexity in artificial intelligence and L2 human-generated argumentative essays. International Journal of Education and Development using Information and Communication Technology. 2023;19(3):50–68.
- 37. Ishikawa S. The ICNALE and sophisticated contrastive interlanguage analysis of Asian learners of English. ICNALE Journal. 2013;1(1):91–118.
- 38. Cardon R, Pham TTH, Doueihi JZ, François T. Contribution of move structure to automatic genre identification: an annotated corpus of French tourism websites. In: Calzolari N, Kan MY, Hoste V, Lenci A, Sakti S, Xue N, editors. LREC-COLING 2024: Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation; 2024 May; Torino, Italia. ELRA and ICCL; 2024. p. 3916–26.
- 39. Norris JM, Ortega L. Towards an organic approach to investigating CAF in instructed SLA: The case of complexity. Applied Linguistics. 2009;30(4):555–78.
- 40. Lu X. Automatic analysis of syntactic complexity in second language writing. International Journal of Corpus Linguistics. 2010;15(4):474–96.
- 41. Wolfe-Quintero K, Inagaki S, Kim HY. Second language development in writing: Measures of fluency, accuracy, and complexity. Honolulu, HI: University of Hawaii Press; 1998.
- 42. Ortega L. Syntactic complexity measures and their relationship to L2 proficiency: A research synthesis of college-level L2 writing. Applied Linguistics. 2003;24(4):492–518.
- 43. Zhou T, Cao S, Zhou S, Zhang Y, He A. Chinese intermediate English learners outdid ChatGPT in deep cohesion: Evidence from English narrative writing. 2023;118:103141.
- 44. Pu L, Heng R, Cao C. The effects of genre on the syntactic complexity of argumentative and expository writing by Chinese EFL learners. Frontiers in Psychology. 2022;13:1047117.
- 45. Davis EA. Mean sentence length compared with long and short sentences as a reliable measure of language development. Child Development. 1937;8(1):69.
- 46. Wang C. A syntactic complexity analysis of revised composition through artificial intelligence-based question-answering systems. In: Proceedings of the 2023 2nd International Conference on Artificial Intelligence and Computer Information Technology (AICIT); Yichang, China; 2023. p. 1–3.
- 47. Pitler E, Nenkova A. Revisiting readability: a unified framework for predicting text quality. In: Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing (EMNLP); 2008. p. 186–95.
- 48. Wood NV. Perspectives on argument. 1995.
- 49. Luna K, Albuquerque PB, Martín-Luengo B. Cognitive load eliminates the effect of perceptual information on judgments of learning with sentences. Memory & Cognition. 2018;47(1).