
Comparative analysis of text readability and writing styles in AI-generated vs. Human-written academic abstracts

Abstract

Research article abstracts are vital in scientific publications, allowing readers to assess a study’s significance. The increasing use of AI tools such as Kimi, ChatGPT, and DeepSeek to generate abstracts raises concerns about their readability and writing styles compared to human-written ones. This study compares text readability and writing styles between human-written and AI-generated abstracts. A total of 150 abstracts of high-impact journal articles in linguistics and computer science (75 from each discipline), together with 150 AI-generated abstracts produced from the same corpus of articles, served as the source texts for analysis. The Readability Scoring System, a computational tool, yielded readability and writing style metrics, while expert evaluation assessed the quality of the AI-generated academic abstracts. The quantitative data were analysed in SPSS 27 using non-parametric statistical methods. Key findings were as follows: (1) AI-generated abstracts exhibited significantly lower readability across eight metrics, indicating greater textual complexity; (2) discipline-specific analysis showed five differing metrics in linguistics and eight in computer science; (3) interdisciplinary comparisons revealed non-significant differences across nine readability metrics, highlighting AI’s potential to mimic natural academic writing, although it still struggles to generate lexically diverse content. These results underscore the current limitations of AI in generating readable and human-like abstracts, especially in technical fields.

1. Introduction

The integration of Artificial Intelligence (AI) in academic writing has spurred increasing research evaluating AI tools’ performance [1–3]. Such evaluations are crucial for informing researchers on the use of large language models (LLMs) in academic contexts [4–6]. Previous studies indicate AI’s transformative impact on academic writing and research across disciplines [1–3]. Tools like ChatGPT and Kimi can generate coherent, contextually relevant texts in diverse fields [7], enhancing productivity [1,8] and, at times, mimicking human writing convincingly [9,10].

However, key questions remain regarding the readability and writing styles of AI-generated text. While AI may produce structured and grammatically sound content, its ability to capture the nuanced, discipline-specific knowledge of high-quality academic writing is unclear. Furthermore, comprehensive comparative analyses of AI-generated and human-written abstracts across different academic disciplines are lacking, hindering our understanding of AI’s broader impact on scholarly communication.

To address these gaps, this study evaluated the effectiveness of AI tools in generating research article abstracts by comparing AI-generated and human-written abstracts in linguistics and computer science. Specifically, it examines and compares the readability and writing styles of these abstracts. This research addressed the critical need to distinguish between human-written and AI-generated abstracts, with significant implications for ethical standards, information security, and accountability in an increasingly AI-driven academic landscape.

Although prior research has compared AI-generated and human-written texts in fields like journalism, business, and technical communication [1], studies focusing specifically on academic abstracts remain limited. Moreover, few have examined AI-generated abstracts across disciplines with contrasting writing conventions, such as linguistics and computer science. This study addressed this gap by systematically comparing text readability and writing style differences between AI-generated and human-written abstracts, contributing to the broader discussion on AI’s role in academic writing. Specifically, this study aims to address the following research question: To what extent do human and AI-generated research article abstracts show similar and different readability and writing styles across disciplines?

2. Literature review

2.1 Research article abstract writing

Academic abstracts are vital for research dissemination, offering concise summaries that enable researchers to quickly assess study relevance [11,12]. Mastering abstract writing is a key academic skill, as abstracts serve as condensed representations of entire studies [13]. While general abstract patterns exist, disciplinary variations in structure may occur [14]. Typically encompassing the research problem, methodology, and key findings, abstract quality and structure can differ significantly across journals [15]. Enhancing abstract writing involves focusing on accuracy, appropriate length, and effective language use [12,16]. Suggestions for improvement include greater consistency and standardisation across journals, improved author and reviewer training, and increased emphasis on abstract quality [15]. Despite these challenges, a strong conceptual understanding of writing and structural framing may aid authors in effectively presenting their research [13].

2.2 Text readability and writing styles variations in AI-generated abstracts

Readability, the ease with which a text is understood, is often quantified using formulas such as the Flesch Reading Ease, Gunning Fog Index, and Coleman-Liau Index (Flesch, 1948). These metrics analyse sentence length, word complexity, and syllable count, making them applicable to academic texts [17]. Clear and readable abstracts can improve citation rates and knowledge transfer [18]. The increasing use of AI in text generation has driven interest in comparing the readability and writing style of AI-generated and human-written academic content [18–20]. While traditional readability formulas have limitations with academic writing’s specialized vocabulary and complex structures [21], they remain valuable tools for comparative analyses between human and AI-generated texts [22].

Writing style, encompassing lexical diversity and syntactic complexity, significantly impacts text quality and readability [23,24]. High lexical diversity can enrich a text, while appropriate syntactic complexity enhances clarity and precision. In academic writing, a well-crafted style is crucial for conveying complex ideas accessibly and authoritatively. Research on writing styles in academic contexts has examined linguistic complexity, finding that syntactic complexity, measured by sentence length and structure, and lexical complexity, referring to lexical diversity and sophistication, are key indicators of writing quality [25,26]. High-quality academic writing often exhibits greater syntactic complexity, lexical diversity, and the use of less frequent words [23,27].

2.3 Variations in human-written and AI-generated abstracts

Recent advancements in AI, particularly in large language models like GPT-4 and Kimi, have significantly improved automated academic text generation [28]. While AI excels at grammatical accuracy and structural coherence [8], its output often follows a formulaic approach that lacks the nuances of human writing [29].

A key distinction lies in linguistic usage, whereby AI-generated texts tend to overuse nouns, pronouns, and adjectives, whereas human authors favour verbs and adverbs [7]. Furthermore, AI often employs complex vocabulary, while humans typically use simpler, more precise language [30]. These word choice differences contribute to variations in readability and stylistic fluidity.

AI-generated abstracts also tend to be less discipline-specific, struggling to incorporate field-specific nuances, which raises concerns for specialised academic writing [31]. Additionally, AI’s limited access to real-time data restricts the currency and relevance of the information it presents [29].

Comparative studies across disciplines reveal qualitative differences. Human-written abstracts generally offer more comprehensive discussions, emphasising context, methodology, and detailed results [31,32]. In contrast, AI-generated abstracts often prioritise clarity and purpose but lack depth. Interestingly, distinguishing between AI-generated and human-written abstracts can be challenging in some fields, like orthopaedic surgery [33,34]. Despite AI’s ability to produce scientifically accurate content, gaps in depth and overall quality persist, with AI-generated texts being more prone to redundancy and factual inaccuracies [35].

Disciplinary variations in writing styles further influence these comparisons. Fields such as Biology, Chemistry, Computer Science, and Psychology exhibit distinct trends in complexity and informativeness, with AI-generated texts often failing to match the depth and specificity of human work [36]. While AI continues to evolve as an academic writing tool, current limitations underscore the necessity of human oversight to ensure clarity, depth, and discipline-specific accuracy in academic abstracts.

2.4 Disciplinary variations in AI-generated abstracts

Disciplinary conventions significantly influence the effectiveness and readability of AI-generated abstracts. Different academic fields impose distinct expectations for writing style, rhetorical structure, and language use, shaping abstract composition. While AI can mimic surface-level linguistic patterns, consistently adhering to discipline-specific norms remains challenging. Hard sciences, such as physics and engineering, often emphasise methodology and results, whereas soft sciences, including linguistics and sociology, prioritise theoretical discussions and argumentation [32]. These differences pose unique challenges for AI in generating discipline-appropriate abstracts.

In the social sciences, AI-generated abstracts exhibit linguistic limitations, including the overuse of uncommon academic vocabulary, limited subordinate structures, and a lack of syntactic diversity, resulting in formulaic and less natural text [37]. Humanities and social sciences tend to favour complex sentence structures and argument-driven writing, while science and engineering prioritise clarity and conciseness [38]. Interestingly, applied linguistics research has shown that experienced journal reviewers struggle to distinguish between AI-generated and human-written abstracts [10].

The increasing use of AI in academic writing also raises ethical concerns regarding authenticity, transparency, and potential misuse. While AI can aid in drafting and editing, its role in scientific communication requires careful management to maintain academic integrity [39]. Consequently, some publishers have implemented policies regulating AI use, from disclosure requirements to outright bans [40], reflecting growing concerns about responsible AI application in research dissemination.

3. Methods

3.1 Abstract data sources

This study analysed a corpus of 150 human-written and 150 Kimi-generated English academic abstracts. The principle of maximal variation guided text selection, drawing only on articles published within the most recent three years in the top five journals of each field. The human abstracts were sourced from ten Tier 1 journals indexed by WOS/SCI across two disciplines, linguistics and computer science, with five journals from each discipline. The Kimi-generated abstracts were produced from the corresponding articles after the original abstracts had been manually removed. Table 1 provides information on the journals.

Table 1. Human-written and AI-generated corpus. 15 papers were selected from each of the 10 journals.

https://doi.org/10.1371/journal.pone.0343163.t001

Linguistics and computer science were selected due to their distinct academic writing and knowledge representation styles. Linguistics, a text-based discipline, emphasises qualitative analysis, narrative explanations, and theoretical discussions, typically employing a descriptive and discursive style focused on argumentation and detailed textual explanations [41]. Conversely, computer science, a symbol-based field, often presents information through mathematical notation, algorithms, and formal structures, characterised by structured, concise, and formulaic writing incorporating technical terminology and symbolic representations [42,43].

3.2 Abstract generation

In this study, the 150 human-written abstracts drawn from 10 high-impact journals (linguistics and computer science) served as a control corpus of well-written abstracts.

The full-text article (PDF) was uploaded after manually removing the original abstract, so the model received only the main body of the paper and had no access to the human-written abstract. The AI tool Kimi (version 2.0) then generated 150 abstracts based on the uploaded file, using the prompt: “Please write an abstract of no more than 300 words based on the attached article.” All AI-generated abstracts, along with the corresponding human-written abstracts, were analysed using readability scoring systems and writing style evaluation tools. This allowed for a direct comparison of text readability and writing style scores between the two sources across the two disciplines.

3.3 Abstract evaluation

3.3.1 Automated machine assessment.

AI-generated and human-written abstracts were compared in terms of text readability and writing styles. Each abstract was evaluated using an online computational tool, the Readability Scoring System (https://readabilityformulas.com/readability-scoring-system.php) developed by Brian Scott (2024).

In this scoring system, text readability was assessed through nine readability metrics: the Automated Readability Index (ARI), Flesch Reading Ease, Gunning Fog Index, Flesch-Kincaid Grade Level, Coleman-Liau Index, SMOG Index, Linsear Write Readability Formula, FORCAST Readability Formula, and the Average Reading Level Consensus Calculation.

Writing style was analysed through lexical diversity and syntactic complexity. The details are presented in Table 2.

Table 2. Measures of text readability and writing styles.

https://doi.org/10.1371/journal.pone.0343163.t002

Text readability assessment

  • Automated Readability Index (ARI): Determines the grade level required to comprehend a text by analysing sentence length and character count.
  • Flesch Reading Ease: Assigns a readability score between 0 and 100, where higher values indicate easier text and lower values suggest greater difficulty.
  • Gunning Fog Index: Estimates the grade level based on sentence length and the percentage of complex words (words with three or more syllables).
  • Flesch-Kincaid Grade Level: Similar to the Flesch Reading Ease formula but provides a U.S. school grade level for readability.
  • Coleman-Liau Index: Evaluates readability based on the number of characters per word and words per sentence, without relying on syllable counting.
  • SMOG Index (Simple Measure of Gobbledygook): Estimates the reading level required to understand a text by calculating the number of polysyllabic words.
  • Linsear Write Formula: Assesses readability based on sentence structure and word complexity, often used for technical and instructional writing.
  • FORCAST Readability Formula: Designed for specialised or technical texts, it predicts comprehension difficulty based on the use of simple and complex words.
  • Average Readability Consensus Calculation: Aggregates multiple readability scores to provide an overall reading level estimate.
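The study used the online Readability Scoring System to compute these metrics; as an illustrative sketch of how three of the published formulas operate on raw text (not the tool's actual implementation), the counts can be assembled as follows. The syllable counter is a naive vowel-group heuristic; real tools use pronunciation dictionaries.

```python
import re

def count_syllables(word: str) -> int:
    """Naive vowel-group heuristic; real tools use pronunciation dictionaries."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    n = len(groups)
    if word.lower().endswith("e") and n > 1:  # drop most silent final e's
        n -= 1
    return max(n, 1)

def readability(text: str) -> dict:
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    chars = sum(len(w) for w in words)
    syllables = sum(count_syllables(w) for w in words)
    W, S = len(words), max(len(sentences), 1)
    return {
        # ARI: grade level from characters per word and words per sentence.
        "ARI": 4.71 * chars / W + 0.5 * W / S - 21.43,
        # Flesch Reading Ease: higher = easier (roughly a 0-100 scale).
        "FleschReadingEase": 206.835 - 1.015 * W / S - 84.6 * syllables / W,
        # Coleman-Liau: letters and sentences per 100 words, no syllable counts.
        "ColemanLiau": 0.0588 * (100 * chars / W) - 0.296 * (100 * S / W) - 15.8,
    }
```

A short, monosyllabic passage scores higher on Flesch Reading Ease (easier) and lower on the grade-level indices than a dense string of polysyllabic nominals, which is the pattern the comparisons in Section 4 rely on.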

Writing style analysis

To assess writing styles, this study examines lexical density and diversity, which provide insights into vocabulary richness, complexity, and formality:

  • Lexical Density: Measures the proportion of content words (nouns, adjectives, verbs, and adverbs) relative to the total number of words. Higher lexical density indicates more information-rich, formal writing, while lower density suggests a conversational style.
  • Lexical Diversity: Evaluates the variety of words used in a text by calculating the ratio of unique words to total words. A higher lexical diversity score signifies a broader vocabulary range, whereas a lower score may indicate repetitive language or domain-specific jargon.
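Both measures reduce to simple token ratios. A minimal sketch follows, using a small hypothetical stoplist of function words as a stand-in for the part-of-speech tagging a full writing-style tool would perform:

```python
import re

# Hypothetical mini stoplist of function words; a real analysis would use a
# POS tagger to identify content words (nouns, verbs, adjectives, adverbs).
FUNCTION_WORDS = {
    "the", "a", "an", "and", "or", "but", "of", "in", "on", "to", "is",
    "are", "was", "were", "it", "this", "that", "with", "for", "as", "by",
}

def lexical_metrics(text: str) -> dict:
    words = [w.lower() for w in re.findall(r"[A-Za-z]+", text)]
    content = [w for w in words if w not in FUNCTION_WORDS]
    return {
        # Density: share of content words among all tokens (percent).
        "lexical_density": 100 * len(content) / len(words),
        # Diversity: type-token ratio, unique words over total words (percent).
        "lexical_diversity": 100 * len(set(words)) / len(words),
    }
```

Note that the type-token ratio is sensitive to text length, so comparisons are most meaningful between texts of similar length, such as abstracts capped at a common word limit.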

3.3.2 Expert evaluation.

In addition to quantitative analyses, a qualitative evaluation was conducted using expert evaluation. Three experts experienced in scientific writing and publishing reviewed AI-generated and human-written abstracts, focusing on structural readability (sentence length, clause embedding, word length, and information packing) and lexicon-stylistic features (lexical variation, balance between repetition and variation, and functional use of vocabulary) (see Table 3). The evaluation aimed to identify how abstracts realized complex constructs, multi-variable relationships, and contextual information within each discipline. Experts’ judgments highlighted discipline-specific patterns in abstract writing and provided insights that complemented quantitative measures of syntactic complexity and lexical diversity.

Table 3. Pre-defined themes and sub-themes used in expert evaluation.

https://doi.org/10.1371/journal.pone.0343163.t003

3.4 Statistics and visualisation

Data generated were analysed using SPSS 27. Given the non-normal distribution of data (Shapiro-Wilk, p < 0.05), non-parametric statistical methods were employed. Since each AI-generated abstract was paired with its human counterpart from the same article (forming dependent observations), the Wilcoxon signed-rank test was used to compare the two related groups. Statistical significance was reported using median values (consistent with non-parametric testing conventions) and p-values to highlight key differences.

To effectively present the disparities in the data, violin plots were generated using Jamovi 2.6.23, providing a visual representation of the distribution, central tendency, and variability of the data across different sources.

3.5 Ethical statement

The study does not involve animal or human participants.

4. Results

The results were organised into three sections, aligned with the study’s objectives. The first section presented a comparison of readability and writing styles between human-written and AI-generated abstracts. The second section focused on intra-disciplinary comparisons within linguistics and computer science. The third section extended the analysis to an interdisciplinary perspective, comparing AI-generated abstracts across the two fields.

4.1 Variations between human-written abstracts and AI-generated abstracts

To compare human-written abstracts and AI-generated abstracts, 300 texts were used as corpus data. The results (median values and p-values) are presented in Table 4 and visualised in Fig 1.

Fig 1. Violin plot of Human-written and AI-generated Abstract Comparison.

(1) Automated Readability Index; (2) Flesch Reading Ease; (3) Flesch-Kincaid Grade Level; (4) Coleman-Liau Index; (5) FORCAST Readability Formula; (6) Average Reading Level Consensus Calc; (7) Lexical Density; (8) Diversity Analysis.

https://doi.org/10.1371/journal.pone.0343163.g001

Table 4 presents the 50th percentile (median) values of the readability and writing style measures for human-written and AI-generated abstracts, along with p-values indicating statistically significant differences in eight indices and similarities in three. AI-generated abstracts scored significantly higher on the Automated Readability Index (ARI), Flesch-Kincaid Grade Level, Coleman-Liau Index, and FORCAST Readability Formula (p < 0.05), indicating greater complexity in sentence structure and vocabulary. For instance, the Coleman-Liau Index (Human 17.01 vs. AI 18.41, p < 0.001) and FORCAST Readability Formula (Human 12.93 vs. AI 13.36, p < 0.001) suggest AI uses longer words and sentences, reducing readability. Additionally, the lower Flesch Reading Ease score for AI (Human 17.00 vs. AI 11.50, p < 0.001) further confirms this difficulty, likely due to longer sentence structures and multi-syllabic words.

Table 4. Readability and writing style measures across the human and AI-generated abstracts.

https://doi.org/10.1371/journal.pone.0343163.t004

In terms of writing style, AI-generated abstracts had significantly higher lexical density (Human 63.70% vs. AI 66.30%, p < 0.001), suggesting a greater proportion of content words and a more academic tone. However, their lexical diversity was significantly lower (Human 63.70% vs. AI 60.95%, p < 0.001), indicating more word repetition, possibly due to AI’s reliance on structured templates. While AI exhibited higher lexical density, its lower lexical diversity suggests a more rigid and repetitive writing style. These differences suggest that AI-generated abstracts align more with formal academic writing in structure and vocabulary but may lack the natural variation and readability of human-written abstracts. The violin plot in Fig 1 further reveals the disparities in text readability and writing styles across the eight indices.

Expert evaluations aligned with the quantitative analyses, confirming clear differences between human-written and AI-generated abstracts. Experts noted that human-written abstracts employ complex sentence structures with embedded clauses and discipline-specific terminology, enabling nuanced presentation of background, methodology, and findings. This corresponds with the higher syntactic complexity and lexical diversity observed in the quantitative metrics. In contrast, AI-generated abstracts use shorter, simpler sentences and basic, repetitive vocabulary, enhancing readability but reducing informational depth and precision.

It involved the participation of 121 Chinese students, who are English language learners, at a university in the United States. Haws et al.’s (2010) questionnaire was used to examine the participants’ regulatory dispositions, and a judgment task was adapted from Bardovi-Harlig and Dörnyei (1998) to assess participants’ awareness of grammatical and pragmatic errors, as well as the severity of each type of error. (Paper 2-human-written)

The study involved 121 ESL speakers at a U.S. university, using a regulatory focus questionnaire, an error judgment task, and a demographic questionnaire. (Paper 2-AI-generated)

Experts highlighted that the functional lexical variation in human-written abstracts allows distinct research elements to be clearly conveyed, whereas AI-generated summaries, though concise, provide less differentiation between research components. Overall, expert judgment supports the quantitative findings: human-written abstracts prioritize scholarly depth and precision, while AI-generated abstracts favor accessibility and rapid comprehension.

Given the lack of research into native-speakerism among teachers of languages other than English (LOTEs), this qualitative study aims to bridge the gap by investigating the discriminatory and inclusive language employed in online recruitment for post-secondary institution instructors of LOTEs. (Paper 6-human-written)

This study examines the prevalence of native-speakerism in job advertisements for language teachers in the United States, focusing on languages other than English (LOTEs). (Paper 6-AI-generated)

Consequently, it can be concluded that AI-generated abstracts scored higher on multiple readability indices, indicating they are generally more complex and harder to read than human-written abstracts.

4.2 Discipline-specific variations in text readability and writing styles

This study compared AI and human performance within linguistics and computer science via the Wilcoxon signed-rank test, reporting descriptive statistics (50th percentile median values) and p-values. The results are presented in Tables 5 and 6 and visualised in Figs 2 and 3.

Table 5. Variations of text readability and writing styles in linguistic journals.

https://doi.org/10.1371/journal.pone.0343163.t005

Fig 2. Violin-plot of human-written and AI-generated in Linguistics.

(1) Flesch Reading Ease; (2) Coleman-Liau Index; (3) FORCAST Readability Formula; (4) Lexical Density; (5) Diversity Analysis.

https://doi.org/10.1371/journal.pone.0343163.g002

Fig 3. Violin-plot of human-written and AI-generated in Computer Science.

(1) Automated Readability Index; (2) Flesch Reading Ease; (3) Flesch-Kincaid Grade Level; (4) Coleman-Liau; (5) FORCAST Readability Formula; (6) Average Reading Level Consensus Calc; (7) Lexical Density.

https://doi.org/10.1371/journal.pone.0343163.g003

Table 5 revealed significant differences in five indices and similar readability and writing styles in six indices between human-written and AI-generated abstracts in linguistics. These five differing indices mirrored the overall findings in Section 4.1, while the other three indices that were significant in that section did not differ significantly within linguistics alone.

Consistent with Section 4.1, AI-generated linguistics abstracts had significantly lower Flesch Reading Ease scores (p < 0.05), indicating lower readability. Additionally, the Coleman-Liau Index was significantly higher for AI-generated abstracts (p < 0.001), suggesting longer words and more complex sentences. The FORCAST Readability Formula also showed significantly higher complexity for AI (p < 0.001). Regarding writing style, AI-generated abstracts had significantly higher lexical density (p < 0.05), indicating more content words. However, lexical diversity was significantly lower for AI-generated abstracts (p < 0.001), suggesting more repetitive word use. Fig 2 visually highlights these five differences. These findings suggest that while AI-generated linguistics abstracts can match human complexity in some ways, they tend to be less readable and less lexically diverse [44].

The results indicated that AI-generated abstracts in linguistics can closely mimic the human complexity in certain aspects but are generally less readable and employ less varied language. The higher complexity and lower readability, as indicated by the Coleman-Liau Index and FORCAST Readability Formula, may stem from AI’s tendency to use more complex sentence structures and vocabulary, potentially increasing information density but reducing accessibility [45]. The higher lexical density suggests a greater concentration of content words, which could be beneficial for conveying detailed information but challenging for readers with varying expertise. The significantly lower lexical diversity in AI abstracts suggests that human-written abstracts employ a broader range of vocabulary and sentence structures, potentially improving reader engagement and comprehension, a critical aspect of academic writing that AI currently struggles to fully replicate.

In comparison with the linguistics results in Table 5, Table 6 shows the variations of text readability and writing style in computer science journals.

Table 6 shows that only three metrics (the Gunning Fog Index, Linsear Write Readability Formula, and Diversity analysis) did not significantly differ between human-written and AI-generated abstracts in computer science. Conversely, eight metrics showed significant differences.

Table 6. Variations of text readability and writing styles in computer science journals.

https://doi.org/10.1371/journal.pone.0343163.t006

Table 6 and Fig 3 indicate that AI-generated abstracts generally had higher readability scores on metrics such as the Automated Readability Index (AI: 19.67 vs. Human: 18.61, p < 0.01) and Flesch-Kincaid Grade Level (AI: 18.13 vs. Human: 16.99, p < 0.05). Additionally, AI showed higher lexical density (AI: 67.80% vs. Human: 64.20%, p < 0.01), suggesting a greater use of content words. However, the Flesch Reading Ease score was lower for AI (10.00 vs. Human: 16.00, p < 0.005), indicating lower readability.

In terms of writing style, AI-generated computer science abstracts tend to prioritise clarity and purpose but often lack depth. They scored higher on the Coleman-Liau Index (AI: 18.46 vs. Human: 17.02) and SMOG Index (AI: 15.45 vs. Human: 14.87, p < 0.05), further suggesting complexity. Despite these higher readability scores, AI abstracts often miss the depth and specificity of human-written ones, which tend to incorporate more verbs and adverbs, enhancing the narrative flow and coherence of the text.

Experts observed that human-written abstracts typically integrate multiple analytical dimensions within single, syntactically complex sentences, allowing abstract constructs, methodological design, and variable distinctions to be foregrounded simultaneously (e.g., embedding participant scope, data volume, and measurement levels in one sentence). This pattern aligns with the higher syntactic complexity and lexical diversity found in the quantitative results.

By contrast, AI-generated linguistics abstracts tend to distribute the same information across shorter, sequential sentences. Although this restructuring improves readability, experts noted that it reduces information density and weakens the functional differentiation of research components, resulting in a flatter representation of theoretical and methodological relationships. This indicates that human-written linguistics abstracts emphasize disciplinary depth and analytical precision, whereas AI-generated abstracts prioritize accessibility.

Across two weeks, 16 Hong Kong EFL students completed pre-and post-trait-level surveys and generated 1,120 state-level responses via the experience sampling method (ESM). (Paper 8-human-written)

Using an experience sampling method (ESM), the research collected real-time data from 16 Hong Kong EFL students over 14 days to examine both trait (in-class, out-of-class, and digital) and state (momentary) levels of L2 WTC. (Paper 8-AI-generated)

Expert evaluations also revealed systematic differences within computer science journals. Human-written abstracts use long, information-dense sentences that integrate system types, control strategies, theoretical properties, and experimental results, reflecting the field’s emphasis on precise modeling and algorithmic specification. AI-generated abstracts split the same information into shorter sentences with more general technical vocabulary, enhancing readability but weakening differentiation among methods, system characteristics, and outcomes. Overall, human-written CS abstracts prioritize technical precision and integrated presentation, whereas AI-generated abstracts favor clarity and linear readability.

In order to guarantee the stability of the closed-loop system, a generalised nonlinear proportional differential controller is designed to configure the system into a desired linear constant system. Then, an explicit reference governor for high-order system is introduced to modify the reference signal such that the system state and the state derivatives of certain orders always remain within a prescribed constraint set. (Paper 134-human-written)

The control strategy involves two steps: first, designing a generalized nonlinear proportional differential (PD) controller to stabilize the system; second, constructing an ERG to modify the reference signal to ensure state constraints are not violated. (Paper 134-AI-generated)

In conclusion, while AI can replicate certain aspects of human writing in computer science, significant differences exist across eight metrics, highlighting AI’s limitations in mimicking human writing styles in this field. AI-generated abstracts are generally more complex and less readable, suggesting that AI struggles to capture the nuances of human writing in computer science. This aligns with previous research that found AI often prioritized clarity over depth. For instance, a comparative genre analysis of AI-generated and scholar-written abstracts for English review articles in international journals found that AI-generated texts often prioritise clarity and purpose statements but may lack the depth and critical nuance found in human-written texts [32]. This is particularly relevant in specialised fields such as computer science, where the ability to convey complex ideas with precision and clarity is crucial. The higher readability scores and lexical density in AI-generated abstracts suggest an aptitude for formal, structured content. However, the lower Flesch Reading Ease scores indicate potential readability issues due to complex sentences and vocabulary, mirroring findings in other fields like orthopaedic surgery [33].

4.3 Text readability and writing styles variations between two disciplines

Interdisciplinary analysis was conducted across the two disciplines to identify similarities and differences in the text readability and writing styles of AI-generated abstracts. The results are illustrated in Table 7 and Fig 4.

thumbnail
Table 7. AI-generated abstract comparison in linguistics and computer science.

https://doi.org/10.1371/journal.pone.0343163.t007

thumbnail
Fig 4. Violin-plot of AI-generated abstracts across two disciplines.

(1) Lexical Density; (2) Diversity Analysis.

https://doi.org/10.1371/journal.pone.0343163.g004

Table 7 compares AI-generated abstracts in linguistics and computer science across various readability and lexical measures. The results showed significant differences in only 2 of the 11 indices (p < .001) when comparing AI-generated abstracts across the two disciplines. AI-generated abstracts in computer science and linguistics exhibit overall similarity across most readability metrics, as indicated by non-significant differences in measures such as the Automated Readability Index (19.42 vs. 19.67, p = .714), Flesch Reading Ease (14.00 vs. 10.00, p = .131), and Flesch-Kincaid Grade Level (17.13 vs. 18.13, p = .121), among others. However, notable distinctions emerge in lexical characteristics: computer science abstracts demonstrate significantly higher lexical density (67.80% vs. 64.60%, p < .001) and lexical diversity (62.70% vs. 59.40%, p < .001) than those in linguistics. Thus, while the two fields’ AI-generated abstracts are comparable in readability, they differ markedly in lexical richness.
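
The two indices that did differ, lexical density and lexical diversity, have simple textbook definitions that can be illustrated in a few lines of Python. This is only a sketch: the study used the Readability Scoring System, whose exact word lists are not reproduced here, so the small function-word set below is a stand-in.

```python
PUNCT = ".,;:!?()\"'"

def tokenize(text):
    """Lowercase word tokens with surrounding punctuation stripped."""
    return [t for t in (w.strip(PUNCT) for w in text.lower().split()) if t]

# A toy stop-word set standing in for a full function-word inventory.
FUNCTION_WORDS = {"the", "a", "an", "of", "to", "in", "and", "is", "are",
                  "this", "that", "for", "on", "with", "by", "as", "it"}

def lexical_density(tokens):
    """Share of content (non-function) words among all tokens."""
    return len([t for t in tokens if t not in FUNCTION_WORDS]) / len(tokens)

def lexical_diversity(tokens):
    """Type-token ratio: distinct word forms over total words."""
    return len(set(tokens)) / len(tokens)

tokens = tokenize("This study aims to optimize and validate a wearable "
                  "platform for stress level assessment using physiological signals.")
print(f"density   = {lexical_density(tokens):.2f}")
print(f"diversity = {lexical_diversity(tokens):.2f}")
```

On this one-sentence example (adapted from Paper 93 above) every word is distinct, so diversity is 1.0; over a full abstract, repeated terms pull the ratio down, which is exactly the deficit reported for AI-generated texts.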

Expert evaluation of AI-generated abstracts across disciplines revealed differences in readability and writing style between linguistics and computer science journals, which align with the findings from the quantitative analyses. In linguistics journals, AI abstracts use short, sequential sentences and general academic vocabulary, which improves readability but reduces differentiation of theoretical constructs, methodological details, and contextual factors. In computer science, AI abstracts also employ concise sentences but focus on stepwise procedural description and technical operations, enhancing procedural clarity while limiting conceptual nuance. AI-generated abstracts prioritize readability across disciplines, but linguistics emphasizes construct and context clarity, whereas computer science emphasizes methods and system specification, reflecting discipline-specific writing conventions.

This study investigates the impact of implicit corrective feedback, specifically recasts, on the production of lexical stress in L2 English among Arabic-speaking learners. (Paper 9-AI-generated)

This study aims to optimize and validate a multi-parametric wearable platform for stress level assessment using physiological signals. (Paper 93-AI-generated)

In conclusion, AI-generated computer science abstracts were significantly more complex and less readable than those in linguistics, as reflected in their higher Gunning Fog Index and Flesch-Kincaid Grade Level scores. They also exhibited higher lexical density and greater word repetition than linguistics abstracts. This finding aligned with previous research highlighting ChatGPT’s limitations in academic writing: Tudino and Qin (2024) identified issues such as the overuse of infrequent “academic” vocabulary, limited use of subordination, and a lack of syntactic and semantic diversity [37]. While higher lexical density may suggest richer vocabulary, it can also reduce readability. In contrast, AI-generated linguistics abstracts, though still complex, tended to be more readable and less dense, possibly owing to the nature of the field.
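
The Gunning Fog Index cited above has a standard published formula, sketched below. The coefficients are the conventional ones; the counts in the example are invented for illustration and do not come from the study’s corpus.

```python
def gunning_fog(total_words, total_sentences, complex_words):
    """Gunning Fog Index: 0.4 * (average sentence length + percentage of
    complex words), where complex words have three or more syllables.
    The result approximates the school grade needed to read the text."""
    return 0.4 * (total_words / total_sentences
                  + 100 * complex_words / total_words)

# Invented counts: a 150-word abstract in 6 sentences with 30 complex words
print(f"Fog index: {gunning_fog(150, 6, 30):.1f}")
```

With these counts the index lands around 18, i.e., postgraduate-level prose; shortening sentences or swapping polysyllabic jargon for plainer terms lowers it, which is the mechanism behind the disciplinary gap described above.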

5. Discussion

Despite the rapid adoption of AI tools in academic writing, existing research has yet to systematically compare the readability and stylistic features of AI-generated versus human-written academic abstracts—particularly across distinct disciplines. This study addresses this gap by analysing AI-generated and human-written abstracts in linguistics and computer science, with core findings showing that AI-generated abstracts exhibit significantly lower readability across eight key metrics and display discipline-specific variations in readability patterns, while struggling with lexical diversity relative to human-written counterparts.

5.1 Human-written and AI-generated abstracts variations

The present study revealed that AI-generated research abstracts differed significantly from human-written ones in six readability metrics (including the Automated Readability Index and Flesch Reading Ease) and two writing style indices (lexical density and lexical diversity). AI tended to use longer sentences and more complex structures, reducing readability and making the abstracts harder to read and comprehend. The narrower use of these features in GenAI writing may result from the models’ training data and algorithms [6]. Additionally, AI-generated abstracts displayed a higher level of lexical density and lower lexical diversity, suggesting a more rigid and repetitive writing style.
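
The two Flesch formulas underlying these comparisons can be sketched concisely. The coefficients below are the standard published ones; the syllable counter is a rough vowel-group heuristic (not the study’s tool), and the counts in the example are invented.

```python
import re

def count_syllables(word):
    """Crude heuristic: one syllable per vowel group. Good enough for
    illustration, not for production scoring."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_metrics(total_words, total_sentences, total_syllables):
    """Standard Flesch Reading Ease and Flesch-Kincaid Grade Level."""
    asl = total_words / total_sentences      # average sentence length
    asw = total_syllables / total_words      # average syllables per word
    ease = 206.835 - 1.015 * asl - 84.6 * asw
    grade = 0.39 * asl + 11.8 * asw - 15.59
    return ease, grade

# A hypothetical 150-word abstract in 5 long sentences with 270 syllables:
ease, grade = flesch_metrics(150, 5, 270)
print(f"Flesch Reading Ease: {ease:.1f}, FK Grade Level: {grade:.1f}")
```

With these invented counts, the ease score comes out near 24 and the grade level near 17, comparable to the grade-level range the study reports; both formulas penalize exactly the long sentences and polysyllabic vocabulary attributed to the AI-generated abstracts.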

Our results with KIMI 2.0 are consistent with Huang’s (2025) findings from ChatGPT-4o and OpenAI-o1: both models tend to prioritize syntactic complexity over communicative clarity, as they learn to mimic formal linguistic registers from training data but lack an inherent grasp of reader needs [46]. This tendency is particularly pronounced in academic contexts, where AI may over-utilize technical jargon or convoluted sentence structures to signal “scholarly tone,” as observed in comparative studies of AI and human research papers.

The reduced lexical diversity in KIMI 2.0-generated abstracts further resonates with findings by Mo (2025), who demonstrated that ChatGPT, a representative large language model, tends to generate text with narrower vocabulary ranges than human writers, relying on high-frequency terms to maintain fluency. In academic abstracts, where conciseness and precision are paramount, this limitation risks obscuring nuanced contributions or alienating readers unfamiliar with field-specific terminology. These results reinforce calls for “readability-aware” AI training, such as integrating metrics like the Flesch-Kincaid Grade Level into loss functions to balance technical accuracy with accessibility.

5.2 Intra-disciplinary variations

In line with the findings in 5.1, the comparisons between AI and human performance within linguistics and computer science indicated that AI-generated linguistics abstracts had significantly lower Flesch Reading Ease scores (p < 0.05), indicating lower readability. Lexical diversity was also significantly lower for AI-generated abstracts (p < 0.001), suggesting more repetitive word use.

Linguistics, a field centered on language structure and meaning, showed discrepancies in metrics like Lexical Density, which aligns with findings by Mindner et al. (2023) that ChatGPT struggles to replicate the nuanced lexical choices critical to linguistic analysis, often over-simplifying theoretical distinctions or overusing jargon [19]. In contrast, computer science, characterized by algorithmic descriptions and technical rigor, exhibited broader divergences, including in the SMOG Index, which measures syllable complexity. This echoes observations by Chen et al. (2024) that AI-generated technical prose in computer science tends to over-embed complex terms (e.g., “neural network architectures,” “computational complexity”) without contextual scaffolding, inflating grade-level scores beyond typical human-written norms [47].
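
The SMOG Index mentioned here is driven entirely by polysyllabic word load, which makes it sensitive to exactly the jargon clusters described above. The formula below is the standard published one; the counts in the example are invented.

```python
import math

def smog_index(polysyllable_count, sentence_count):
    """Published SMOG formula: estimated grade level from the number of
    words with three or more syllables, normalized to 30 sentences."""
    return 1.0430 * math.sqrt(polysyllable_count * 30 / sentence_count) + 3.1291

# Invented counts for two 6-sentence abstracts:
plain = smog_index(12, 6)      # few polysyllabic words
technical = smog_index(30, 6)  # jargon-heavy technical prose
print(f"plain: {plain:.1f}, technical: {technical:.1f}")
```

Because the polysyllable count sits under a square root, each additional cluster of terms like “computational complexity” raises the grade level, but with diminishing increments; the gap between the two invented abstracts above is still several grade levels.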

These variations underscore the need for domain-adapted AI training, as Xie et al. (2023) argue that models fine-tuned on discipline-specific corpora better align with field-specific readability norms. The lower overall readability of AI-generated abstracts, evidenced by higher Flesch-Kincaid Grade Levels and lower Flesch Reading Ease scores, also supports Berber Sardinha’s (2024) observation that LLMs prioritize syntactic complexity over communicative clarity [44], a tendency that is particularly problematic in academic contexts where accessibility is key to knowledge dissemination.

Notably, both fields showed significant differences in the FORCAST Readability Formula, a metric designed for technical texts. This suggests that while AI may grasp discipline-specific terminology, it fails to calibrate the “density” of such terms to match human conventions—likely because training data includes uneven distributions of disciplinary texts, leaving gaps in AI’s ability to emulate field-specific stylistic norms.
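
FORCAST is unusual among these metrics in that it ignores sentence length entirely and depends only on the share of single-syllable words, which is why it isolates vocabulary density so cleanly. The sketch below uses the standard published form of the formula with invented counts.

```python
def forcast_grade(monosyllable_count, total_words):
    """FORCAST grade level = 20 - N/10, where N is the number of
    single-syllable words normalized to a 150-word sample.
    No sentence-length term: only vocabulary difficulty matters."""
    n = monosyllable_count * 150 / total_words
    return 20 - n / 10

# Invented counts: everyday prose vs. jargon-dense technical prose
print(forcast_grade(90, 150))   # many monosyllables -> lower grade level
print(forcast_grade(60, 150))   # few monosyllables -> higher grade level
```

Under this formula, every ten monosyllabic words removed from a 150-word sample raises the grade level by one, so an AI model that miscalibrates the density of technical terms shifts the FORCAST score directly, regardless of how its sentences are structured.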

5.3 Interdisciplinary variations

Interdisciplinary analysis indicated that the non-significant differences across nine readability metrics support the notion that AI has internalized general features of academic discourse, such as structural conventions and syntactic patterns. This finding aligns with research by Chen et al. (2024), who argued that ChatGPT performs as well as human participants in four of the five tested pragmalinguistic features and five of the six sociopragmatic features, and that the conversations generated by ChatGPT exhibit higher syntactic diversity and a greater sense of formality than those written by humans [47].

However, the persistent deficit in lexical diversity across disciplines echoes critiques by Du et al. (2025), who highlighted that AI’s reliance on statistical co-occurrence patterns leads to repetitive phrasing, undermining the rhetorical function of abstracts: to engage readers through varied and precise language. The increased rates of repetition seen in AI-produced texts point to a potential constraint in the models’ capacity to build a diverse and contextually relevant lexicon [48]. This limitation is particularly striking when compared to human abstracts, which consistently demonstrate higher lexical variation to emphasize novelty.

Addressing this gap may require hybrid approaches: combining AI’s efficiency in drafting with human oversight to refine lexical choices, as proposed by Hemmer et al. (2023), who found that such collaboration improved both readability and diversity in academic writing [49,50]. Additionally, fine-tuning models on discipline-specific corpora annotated for lexical variation could help AI better emulate the nuanced word choice that distinguishes effective academic abstracts.

6. Conclusion

This study compared the readability and writing styles of human-written and AI-generated research abstracts in linguistics and computer science. AI-generated abstracts were found to be more complex and less readable across multiple metrics. Intra-disciplinary analysis showed AI struggled to replicate human writing in computer science more than in linguistics, while inter-disciplinary comparison revealed broader challenges in achieving coherence and natural variation. These findings underscored current limitations of AI in academic writing, particularly in technical fields requiring precise and accessible communication. Although AI tools offer potential benefits in assisting with abstract generation, they require further refinement to better align with human writing conventions.

7. Limitations and recommendations for future directions

This study has several limitations. First, it only used Kimi for AI-generated abstracts, without comparing it to other models such as ChatGPT or DeepSeek. Future research should explore differences across multiple AI systems for a more comprehensive understanding of readability and writing style variations. Second, the study focused on only two disciplines, limiting the generalisability of the findings.

Given these limitations, several key recommendations emerge. First, future researchers could explore strategies to improve AI-generated academic texts by fine-tuning language models for enhanced readability and stylistic adaptability across disciplines. Second, researchers could broaden the analysis to include more technical fields such as physics and engineering, comparing them with social sciences to examine discipline-specific trends in AI-generated and human-written abstracts. Third, academic institutions and educators may integrate AI into research workflows by providing targeted training that teaches researchers to critically evaluate AI outputs, establishing mentorship programs for ethical AI use, and investing in discipline-specific AI tools. Such integration should be deep as well as broad, extending across the full range of academic writing tasks rather than remaining a surface-level tool adoption. Lastly, journal editors should set clear rules for authors to disclose AI use, train reviewers to assess AI-generated content, and collaborate with developers to align AI tools with journal standards.

References

  1. Khalifa M, Albadawy M. Using artificial intelligence in academic writing and research: an essential productivity tool. Comput Methods Prog Biomed Updat. 2024;5(March):100145.
  2. Radtke A, Rummel N. Generative AI in academic writing: does information on authorship impact learners’ revision behavior? Comput Educ Artif Intell. 2025;8(June 2024):100350.
  3. Garg S, Ahmad A, Madsen DØ. Academic writing in the age of AI: comparing the reliability of ChatGPT and Bard with Scopus and Web of Science. J Innov Knowl. 2024;9(4).
  4. Mishra T, Sutanto E, Rossanti R, Pant N, Ashraf A, Raut A, et al. Use of large language models as artificial intelligence tools in academic research and publishing among global clinical researchers. Sci Rep. 2024;14(1):31672. pmid:39738210
  5. Porsdam Mann S, Vazirani AA, Aboy M, Earp BD, Minssen T, Cohen IG, et al. Guidelines for ethical use and acknowledgement of large language models in academic writing. Nat Mach Intell. 2024;6(11):1272–4.
  6. Mo Z, Crosthwaite P. Exploring the affordances of generative AI large language models for stance and engagement in academic writing. J English Acad Purp. 2025;75:1–18.
  7. Georgiou GP. Differentiating between human-written and AI-generated texts using linguistic features automatically extracted from an online computational tool. 2023. 1–11.
  8. Parteka A, Kordalska A. Artificial intelligence and productivity: global evidence from AI patent and bibliometric data. Technovation. 2023;125:102764.
  9. Fleckenstein J, Meyer J, Jansen T, Keller SD, Köller O, Möller J. Do teachers spot AI? Evaluating the detectability of AI-generated texts among student essays. Comp Educ Artif Intellig. 2024;6:100209.
  10. Casal JE, Kessler M. Can linguists distinguish between ChatGPT/AI and human writing?: a study of research ethics and academic publishing. Res Methods in Appl Ling. 2023;2(3):100068.
  11. Swales JM. Genre analysis: English in academic and research settings. In: The discourse studies reader: main currents in theory and analysis. Cambridge: John Benjamins Publishing Company; 2014. 306–16.
  12. Peh WCG, Ng KH. Abstract and keywords. Singapore Med J. 2008;49(9):664–5; quiz 666. pmid:18830537
  13. Tarvadi PV. Research writing: review. Forensic Res Criminol Int J. 2016;2(6):221–2.
  14. Musa NF, Khamis N. Research article writing: a review of a complete rhetorical organisation. 2015.
  15. Budiyono S, Fadhly FZ. A qualitative evidence synthesis of article abstract writing in ELT and literature journals. Eng Rev J Eng Educ. 2023;11(1):253–62.
  16. Adinkrah-Appiah PK, A AM, Adaobi CC, Owusu-Addo A. Innovative research: writing an effective abstract to improve your article quality and readability. Int J Res Sci Innov. 2021.
  17. Kincaid JP. Derivation of new readability formulas (automated readability index, fog count and flesch reading ease formula) for navy enlisted personnel. Chief Nav Tech Train. 1975.
  18. Jin T, Duan H, Lu X, Ni J, Guo K. Do research articles with more readable abstracts receive higher online attention? Evidence from Science. Scientometrics. 2021;126(10):8471–90.
  19. Mindner L, Schlippe T, Schaaff K. Classification of human- and AI-generated texts: investigating features for ChatGPT. Lect Notes Data Eng Commun Technol. 2023;190:152–70.
  20. Zaleski AL, Berkowsky R, Craig KJT, Pescatello LS. Comprehensiveness, accuracy, and readability of exercise recommendations provided by an AI-based chatbot: mixed methods study. JMIR Med Educ. 2024;10(1):1–15.
  21. McNamara DS, Graesser AC, McCarthy PM, Cai Z. Automated evaluation of text and discourse with Coh-Metrix. Cambridge University Press; 2014.
  22. Kim M, Crossley SA. Corrigendum to “Modeling second language writing quality: a structural equation investigation of lexical, syntactic, and cohesive features in source-based and independent writing” [Assess. Writ. 37C (2018) 39–56]. Assess Writ. 2018;38:56–60.
  23. Crossley S. Linguistic features in writing quality and development: an overview. J Writ Res. 2020;11(3):415–43.
  24. Yang Y, Yap NT, Ali AM. A review of syntactic complexity studies in context of EFL/ESL writing. Int J Acad Res Bus Soc Sci. 2022;12(10):441–54.
  25. McNamara DS, Crossley SA, McCarthy PM. Linguistic features of writing quality. Writ Commun. 2010;27(1):57–86.
  26. Zou Y, Sathiamoorthy K, Kaur GS. Influence of task complexity on text features and writing scores: evidence from college students in southern China. SAGE Open. 2024;1101:1–12.
  27. Crossley SA, McNamara DS. Understanding expert ratings of essay quality: Coh-Metrix analyses of first and second language writing. Int J Contin Eng Educ Life-long Learn. 2011;21(2/3):170.
  28. Zahid IA, Joudar SS, Albahri AS, Albahri OS, Alamoodi AH, Santamaría J. Unmasking large language models by means of OpenAI GPT-4 and Google AI: a deep instruction-based analysis. Intell Syst with Appl. 2024;23.
  29. Hohenstein J, Kizilcec RF, DiFranzo D, Aghajari Z, Mieczkowski H, Levy K, et al. Artificial intelligence in communication impacts language and social relationships. Sci Rep. 2023;13(1):5487. pmid:37015964
  30. Gao CA, Howard FM, Markov NS, Dyer EC, Ramesh S, Luo Y, et al. Comparing scientific abstracts generated by ChatGPT to real abstracts with detectors and blinded human reviewers. NPJ Digit Med. 2023;6(1):75. pmid:37100871
  31. Hsu T-W, Tseng P-T, Tsai S-J, Ko C-H, Thompson T, Hsu C-W, et al. Quality and correctness of AI-generated versus human-written abstracts in psychiatric research papers. Psychiatry Res. 2024;341:116145. pmid:39213714
  32. Kong X, Liu C. A comparative genre analysis of AI-generated and scholar-written abstracts for English review articles in international journals. J Eng Acad Purp. 2024;71:101432.
  33. Nabata KJ, AlShehri Y, Mashat A, Wiseman SM. Evaluating human ability to distinguish between ChatGPT-generated and original scientific abstracts. Updates Surg. 2025;77(3):615–21. pmid:39853655
  34. Stadler RD, Sudah SY, Moverman MA, Denard PJ, Duralde XA, Garrigues GE. Identification of ChatGPT-generated abstracts within shoulder and elbow surgery poses a challenge for reviewers. Arthrosc J Arthrosc Relat Surg. 2024.
  35. Ma Y, Liu J, Yi F, Cheng Q, Huang Y, Lu W. AI vs. Human -- Differentiation Analysis of Scientific Content Generation. 2023. http://arxiv.org/abs/2301.10416
  36. Ma Y, Liu J, Yi F, Cheng Q, Huang Y, Lu W. Is this abstract generated by AI? A research for the gap between AI-generated scientific text and human-written scientific text. 2023;1(1):1–17.
  37. Tudino G, Qin Y. A corpus-driven comparative analysis of AI in academic discourse: investigating ChatGPT-generated academic texts in social sciences. Lingua. 2024;312:103838.
  38. Yoong Wei H, Razali AB, Abd Samad A. Writing abstracts for research articles: towards a framework for move structure of abstracts. World J Eng Lang. 2022;12(6):492.
  39. Thorp HH. ChatGPT is fun, but not an author. Science. 2023;379(6630):313. pmid:36701446
  40. Flanagin A, Bibbins-Domingo K, Berkwits M, Christiansen SL. Nonhuman “Authors” and implications for the integrity of scientific publication and medical knowledge. JAMA. 2023;329(8):637–9. pmid:36719674
  41. Biber D. Grammatical complexity in academic English: linguistic change in writing. Cambridge University Press; 2016.
  42. Wallwork A. English for writing research papers. Springer; 2016.
  43. Zobel J. Writing for computer science. 2014.
  44. Berber Sardinha T. AI-generated vs human-authored texts: a multidimensional comparison. Appl Corpus Ling. 2024;4(1):100083.
  45. Reviriego P, Conde J, Merino-Gómez E, Martínez G, Hernández JA. Playing with words: comparing the vocabulary and lexical diversity of ChatGPT and humans. Mach Learn with Appl. 2024;18(November):100602.
  46. Huang Y, Li D, Cheung AKF. Evaluating the linguistic complexity of machine translation and LLMs for EFL/ESL applications: an entropy weight method. Res Methods Appl Ling. 2025;4(3):100229.
  47. Chen X, Li J, Ye Y. A feasibility study for the application of AI-generated conversations in pragmatic analysis. J Pragm. 2024;223:14–30.
  48. Du M, Lu M, Dai Y, Wang F. A corpus-based analysis of verb collocations in human and AI-generated IELTS writing. 2025.
  49. Hemmer P, Westphal M, Schemmer M, Vetter S, Satzger G. Human-AI collaboration: the effect of AI delegation on human task performance and task satisfaction. In: 28th International Conference on Intelligent User Interfaces (IUI ’23), Sydney, NSW, Australia; 2023.
  50. Malik AR, Pratiwi Y, Andajani K, Numertayasa IW, Suharti S, Darwis A. Exploring artificial intelligence in academic essay: higher education student’s perspective. Int J Educ Res Open. 2023;5(August):100296.