
Lexical feature analysis of Chinese informed consent forms based on the information entropy methods: A paired study of minor and their guardian’ version

  • Qiansu Yang ,

    Contributed equally to this work with: Qiansu Yang, Yining Wang

    Roles Conceptualization, Methodology, Resources, Writing – original draft

    Affiliations School of Information and Electronics, Beijing Institute of Technology, Beijing, China, Department of Pharmacy, Medical Supplies Center of Chinese PLA General Hospital, Beijing, China

  • Yining Wang ,

    Contributed equally to this work with: Qiansu Yang, Yining Wang

    Roles Methodology, Writing – original draft

    Affiliation School of Information and Electronics, Beijing Institute of Technology, Beijing, China

  • Wenbin Shi,

    Roles Validation

    Affiliations School of Information and Electronics, Beijing Institute of Technology, Beijing, China, The Key Laboratory of Brain Health Intelligent Evaluation and Intervention, Ministry of Education, Beijing, China, The JiaXing Key Laboratory of Intelligent Management for CPCR and Severe Infections (A), Jiaxing, China

  • Zhenzhen Li,

    Roles Resources

    Affiliation Department of Medical Engineering, Medical Supplies Center of Chinese PLA General Hospital, Beijing, China

  • Hong Liang,

    Roles Resources

    Affiliation Department of Pharmacy, Medical Supplies Center of Chinese PLA General Hospital, Beijing, China

  • Jiang Cao,

    Roles Resources

    Affiliation Department of Pharmacy, Medical Supplies Center of Chinese PLA General Hospital, Beijing, China

  • Nan Bai ,

    Roles Conceptualization, Writing – review & editing

    13810535576@126.com (NB); chien-hung.yeh@bit.edu.cn (C-HY)

    Affiliation Department of Pharmacy, Medical Supplies Center of Chinese PLA General Hospital, Beijing, China

  • Chien-Hung Yeh

    Roles Methodology, Writing – review & editing

    13810535576@126.com (NB); chien-hung.yeh@bit.edu.cn (C-HY)

    Affiliations School of Information and Electronics, Beijing Institute of Technology, Beijing, China, The Key Laboratory of Brain Health Intelligent Evaluation and Intervention, Ministry of Education, Beijing, China, The JiaXing Key Laboratory of Intelligent Management for CPCR and Severe Infections (A), Jiaxing, China

Abstract

High-quality informed consent forms (ICFs) are crucial to facilitate effective communication between researchers and patients. However, the complex and specialized terminology in ICFs often results in biased or delayed interpretations, hindering participants’ ability to make decisions in line with their own will. Therefore, evaluating the readability of ICFs is an important task for Institutional Review Boards (IRBs) and regulatory agencies. This study proposes the use of information theory methods, including Shannon entropy and its extension, Rényi entropy (α values = 0, 0.5, and 1.5), as a set of comparisons, to quantify the lexical characteristics of Chinese ICFs written for either minors or their guardians. The Shannon entropy and Rényi entropy values of minor-version ICFs were significantly lower than those of guardian-version ICFs. The Shannon entropy and Rényi entropy with α = 1.5 of the ICFs for minors show no significant differences compared to those of the sixth-grade textbooks, while the Rényi entropy with both α = 0 and 0.5 shows no significant differences compared to the ninth-grade textbooks. This study used information entropies to assess the lexical features of ICFs, as a pilot study to validate the feasibility of implementing Shannon and Rényi entropies to evaluate readability in Chinese ICFs.

1 Introduction

Informed consent is a critical component of clinical research [1]. It is a communication process that provides either patients or their guardians (e.g., parents) with the required information on treatment and diagnosis, enabling them to make informed decisions [2]. As a fundamental prerequisite for clinical research, informed consent has been gradually incorporated into the laws of most countries over the past six decades [3], and is subject to oversight by administrations and the general public.

Informed consent comprises two main processes, i.e., “informed” and “consent”. “Informed” serves as the prerequisite for “consent,” meaning that researchers must fully disclose the nature of the research and what participation entails to potential participants before they take part [4]. Participation must be voluntary, and participants must decide whether or not to take part in the research based on a thorough understanding of the information provided [5].

Informed consent forms (ICFs), as the carriers of informed consent information, are generally regarded as legally binding documents. ICFs contain all the information that needs to be disclosed to participants, including the research background, procedures, risks, potential benefits, and compensation, among other details. High-quality ICFs are critical for ensuring effective communication of information between researchers or doctors and patients [6]. However, it is often challenging to fully and clearly explain research information to participants without medical expertise. ICFs that are filled with complex and obscure professional terminology can be difficult to understand, even for individuals with higher education or certain medical expertise. In such cases, the participants’ full understanding cannot be guaranteed, making it even harder to ensure that they make choices that are consistent with their own wishes. This also creates a dilemma for the function of ICFs: they have typically emphasized the provision of information over support for people making a difficult decision [7]. Paasche-Orlow et al. found that Institutional Review Boards (IRBs) commonly provide text for informed-consent forms that falls short of their own readability standards [8]. The mean Flesch–Kincaid scores for the readability of sample text provided by IRBs exceeded the stated standard by 2.8 grade levels. Another study found that mind wandering increases linearly with text difficulty [9]. As reading difficulty increases and interest decreases, people are more prone to mind-wandering, which explains why more difficult texts lead to a decline in reading comprehension. Therefore, evaluating the readability of ICFs is an important task for IRBs and regulatory administrations.

In the Civil Code of the People’s Republic of China [10], the civil capacity of minors is divided into three stages: (1) Minors (children) under the age of 8 are considered to have no capacity for performing civil juristic acts, and may perform a civil juristic act only through their legal representatives; (2) Minors aged 8 or above are considered to have limited civil capacity, and their civil legal acts must either be performed by their legal guardians or approved and ratified by them; (3) Natural persons aged 18 or above, as well as minors aged 16 or above whose main source of support is the income from their own labor, are deemed to have full capacity for performing civil juristic acts. From the perspective of Chinese law, parents or guardians can decide on behalf of most minors whether to participate in clinical research. However, ethical principles require that these minors, as participants in clinical research, understand the research information and agree to participate as far as possible [11,12].

As the right of Chinese minors to be informed gradually gains more attention, an increasing number of clinical studies involving minors now provide ICFs designed specifically for minors to read and sign. Informing minor participants not only safeguards their right to be informed but also helps them understand the research procedures, precautions, and other details, which can benefit the progress of the research. However, ICFs that are difficult for even adults to read pose an even greater challenge for minors, who may lack sufficient reading ability and knowledge. This makes it impractical to include all research information in the informed consent process for minors. Therefore, IRBs and researchers need to pay more attention to whether minors can comprehend the content of the ICFs. Thus, there is an urgent need to explore a readability evaluation method for Chinese ICFs that can be broadly applied across all age groups.

Currently, multiple methods are available internationally to evaluate the readability of medical information provided to patients, such as the Flesch-Kincaid scale [13–15], the Gunning Fog Index [16], and the Simple Measure of Gobbledygook (SMOG) test [17]. However, these readability evaluation methods are designed for English-based texts, particularly using the number of syllables as the standard for distinguishing complex words. Thus, these methods cannot be directly applied to Chinese texts. In a study on the readability of Chinese text, Yong et al. found that character frequency and lexical richness were key factors in distinguishing readability levels [18]. The higher the value of lexical richness in a text, the greater the uncertainty of the words used (indicating a higher variation in vocabulary), and consequently, the greater the reading difficulty of the text [19]. Shannon entropy was confirmed to be an effective measure for evaluating these indices that influence the readability of Chinese text [20–22]. In addition, aside from the inapplicability of these methods due to language differences, ICFs themselves often contain a large number of incomplete sentence structures, such as lists, phrases, and fragmentary descriptions (inclusion and exclusion criteria, etc.). This makes it difficult for sentence-based methods to accurately reflect the readability level of ICFs.

The information entropy method has been widely applied in biomedical signal processing [23,24] and financial time series analysis [25]. Therefore, this research proposes the use of information theory methods, including Shannon entropy and its extension—Rényi entropy, to analyze the lexical features of Chinese ICFs for minors and their guardians. The reliability of the evaluation method will be validated by referencing Chinese language textbooks from different grade levels. This approach aims to explore the feasibility of establishing a readability evaluation system for ICFs.

2 Materials and methods

2.1 Materials

We selected clinical research involving minors conducted in China between January 2018 and December 2024, including investigator-initiated clinical research and drug clinical trials. We collected a total of 17 pairs of minor-version ICFs and corresponding guardian-version ICFs from 17 approved ethical review applications. All the ICF texts are written in Simplified Chinese. Since this research focuses on the textual characteristics of ICFs, we retained only the ICFs for teenagers (ages 12−18) and their guardians, and excluded those designed for children under 12 years old, as those ICFs often convey research information through visual elements such as illustrations. This study only collects and analyzes the text content of ICFs and does not contain any personal information of the human participants. The Institutional Review Board of the Chinese PLA General Hospital has granted an exemption from ethical review for this study (S2025-378–01).

As a comparison, we selected Chinese language textbooks (excluding poetry and prose of classical Chinese) from the second-grade (n = 45), sixth-grade (n = 36), and ninth-grade (n = 23) editions published by the People’s Education Press (organized and prepared by the Ministry of Education in China). The second-grade texts correspond to the reading ability of minors around 8 years old, marking the transition from no civil capacity to limited civil capacity. The sixth-grade texts correspond to the reading ability of minors around 12 years old, the age boundary for entering adolescence. The ninth-grade texts correspond to the reading ability of minors aged 15–16, marking the completion of compulsory education or the full civil capacity under certain conditions.

2.2 Information entropy

Shannon entropy is a core concept in information theory, put forward by Claude Shannon in 1948 to quantify the uncertainty of information [26]. It essentially measures the uncertainty or amount of information associated with the possible values of a random variable. The formula for Shannon entropy with base 2 is as follows:

$$H(P) = -\sum_{i=1}^{n} p_i \log_2 p_i$$

where p_i denotes the probability of the i-th possible event occurring in the given probability distribution P, while n represents the total number of possible outcomes in the system. The higher the Shannon entropy value, the more evenly distributed the probability density of information, indicating greater randomness in the distribution [27]. In this research, Shannon entropy is used as a standard method to evaluate text complexity and readability. A higher entropy value indicates a larger vocabulary and a more uniform word frequency distribution within the text, meaning that the text is more complex and less readable.
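As an illustration of the definition above, the following is a minimal Python sketch that estimates p_i from word counts and evaluates the base-2 Shannon entropy. The function name `shannon_entropy` and the two toy token lists are assumptions made for this example; they are not taken from the study's code or corpus.

```python
import math
from collections import Counter


def shannon_entropy(tokens):
    """Base-2 Shannon entropy of the empirical word distribution of a token list."""
    counts = Counter(tokens)
    total = sum(counts.values())
    # p_i is estimated as the relative frequency of the i-th distinct word.
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical pre-segmented toy texts (not from the study corpus).
repetitive = ["研究", "研究", "研究", "药物"]  # one word dominates
varied = ["研究", "药物", "风险", "受益"]      # four words, equally frequent

print(shannon_entropy(repetitive))
print(shannon_entropy(varied))
```

The second list has the same length but a larger and more evenly used vocabulary, so its entropy is higher, which is the behavior the paragraph above relies on.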

Rényi entropy, as an extension of Shannon entropy, introduces a parameter α to represent different weighting sensitivities. For a discrete probability distribution P = (p_1, p_2, ..., p_n), the Rényi entropy of order α (α ≥ 0, α ≠ 1) is defined as:

$$H_\alpha(P) = \frac{1}{1-\alpha} \log_2 \left( \sum_{i=1}^{n} p_i^{\alpha} \right)$$

When α → 1, Rényi entropy converges to Shannon entropy. When α < 1, Rényi entropy places more emphasis on the size of the sample set while reducing the influence of the probability distribution [28]. Conversely, when α > 1, Rényi entropy focuses more on events with higher probabilities.

When α = 0, Rényi entropy is equivalent to Hartley entropy, and the formula is as follows:

$$H_0(P) = \log_2 n$$

At α = 0, Rényi entropy measures the information content of a set of equally probable events. When analyzing text, it no longer considers the probability distribution of the words but only which words occur. In this research, Rényi entropy at α = 0 therefore reflects the size of the word set (it equals the base-2 logarithm of the number of distinct words), i.e., the total vocabulary of the text.

When α→∞, Rényi entropy represents the negative logarithm of the probability of the most probable event and is commonly referred to as min-entropy. The formula is as follows:

$$H_\infty(P) = -\log_2 \max_i p_i = -\log_2 p_{\max}$$

When α→∞, Rényi entropy becomes dominated by the maximum probability p_max, while the contributions of the smaller probabilities rapidly approach 0. At this point, Rényi entropy no longer considers the entire probability distribution but focuses solely on the maximum probability value. When analyzing text, Rényi entropy as α→∞ depends only on the proportion of the most frequently occurring word, reflecting the most dominant informational feature in the text.

Therefore, in this research, to simultaneously focus on the size of the text word set and its probability distribution, we chose Rényi entropy across α ∈ [0,2] with 0.1 steps (replaced by Shannon entropy when α = 1) to analyze the ICFs.
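The α sweep described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function name `renyi_entropy`, the handling of α = 1 by switching to the Shannon formula, and the toy token list are choices made for this example, not the authors' implementation.

```python
import math
from collections import Counter


def renyi_entropy(tokens, alpha):
    """Base-2 Rényi entropy of order alpha for the empirical word distribution."""
    counts = Counter(tokens)
    total = sum(counts.values())
    probs = [c / total for c in counts.values()]
    if math.isclose(alpha, 1.0):
        # The alpha -> 1 limit is Shannon entropy.
        return -sum(p * math.log2(p) for p in probs)
    if math.isinf(alpha):
        # The alpha -> infinity limit depends only on the most frequent word.
        return -math.log2(max(probs))
    # General case; alpha = 0 gives log2(number of distinct words).
    return math.log2(sum(p ** alpha for p in probs)) / (1.0 - alpha)


# Grid alpha = 0.0, 0.1, ..., 2.0, as described in the text.
alphas = [round(0.1 * k, 1) for k in range(21)]

toy_tokens = ["a", "a", "b", "c"]  # hypothetical tokens, not study data
curve = [renyi_entropy(toy_tokens, a) for a in alphas]
```

For any fixed distribution the resulting curve is non-increasing in α: low orders track vocabulary size, while higher orders are increasingly driven by the most frequent words.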

2.3 Data preprocessing

In this research, information such as the cover page, signature page, headers, and footers of the ICFs was removed, while the text within tables and text boxes was retained, and the text format was standardized. For the comparison of Chinese language textbooks, images, introductory sections, and footnotes within the textbooks were removed, and the text format was similarly standardized.

We used the open-source toolkit (jieba: https://github.com/randoruf/jieba-chinese-tokenization) to perform word segmentation on the ICFs. After segmentation, punctuation marks, spaces, and other characters were removed from the segmentation results, retaining only the lexical information.
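The segmentation and filtering step can be sketched as below. The source does not specify exactly which "other characters" were removed, so this example assumes that tokens consisting only of Unicode punctuation, symbols, or separators are dropped; the helper names `keep_lexical` and `segment` are hypothetical. `jieba.lcut` is the toolkit's list-returning segmentation call, and a crude character-level fallback is used only if jieba is not installed.

```python
import unicodedata


def keep_lexical(tokens):
    """Drop tokens made up only of punctuation, symbols, or whitespace."""
    kept = []
    for tok in tokens:
        tok = tok.strip()
        # Unicode general categories: P* = punctuation, S* = symbol, Z* = separator.
        if tok and not all(unicodedata.category(ch)[0] in "PSZ" for ch in tok):
            kept.append(tok)
    return kept


def segment(text):
    """Segment Chinese text into words, then keep only lexical tokens."""
    try:
        import jieba  # third-party toolkit named in the text
        raw = jieba.lcut(text)
    except ImportError:
        # Assumption for this sketch only: fall back to single characters.
        raw = list(text)
    return keep_lexical(raw)
```

Under this assumption digits and Latin letters are retained; a stricter filter would be needed if the original preprocessing removed them as well.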

2.4 Statistics

Continuous variables are represented by means ± standard deviation (SD) or median (lower quartile, upper quartile). The Wilcoxon signed-rank test was used to assess the median difference of Shannon entropy and Rényi entropy between ICFs for minors and their guardians. The area under the curve (AUC), based on logistic regression, was calculated to discriminate between minor-version and guardian-version ICFs using Shannon entropy or Rényi entropy. Furthermore, the Shannon entropy and Rényi entropy of ICFs designed for minors were compared with those of textbooks for second, sixth, and ninth graders. A P-value of less than 0.05 is considered significant.
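For a single predictor, a fitted logistic regression is a monotone transform of the entropy value, so (when the fitted coefficient is positive) its ROC AUC equals the rank-based AUC of the raw entropy values. The following Python sketch computes that rank-based AUC directly; the function name `auc` and the entropy values are hypothetical and are not the study's data. The paired comparison itself could be run with a standard routine such as `scipy.stats.wilcoxon`, which is an assumption about tooling rather than something the source names.

```python
def auc(pos_scores, neg_scores):
    """Probability that a random positive scores above a random negative (ties count 0.5)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical entropy values: guardian-version treated as the positive class.
guardian = [8.3, 8.5, 8.2]
minor = [7.7, 8.25, 7.4]
print(auc(guardian, minor))
```

An AUC of 0.5 means the entropy measure does not separate the two versions at all, while a value near 1 means guardian-version ICFs almost always have the higher entropy.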

3 Results

3.1 Method verification

Rényi entropy was calculated for the minor-version and guardian-version ICFs across α ∈ [0, 2] with steps of 0.1 (replaced by Shannon entropy when α = 1). The paired differences ∆(Guardian – Minor) were then computed, and the corresponding curves were plotted (Tables 1 and 2). The results showed that ∆ decreased as α increased, with no distinct separation peaks observed in the overall curve (Fig 1). A random experiment (Section 3.6) indicated that above α = 1.6, the minor-version ICFs and guardian-version ICFs no longer exhibited significant differences (P > 0.05). Therefore, in the subsequent analyses, we used a coarser set of α values, selecting 0, 0.5, 1 (replaced by Shannon entropy), and 1.5 for lexical feature analysis. The complete results for α ∈ [0, 2] are provided in the supplementary materials.

Table 1. Descriptive results of minor-version ICFs, guardian-version ICFs, and Chinese language textbooks of Grades 2, 6, and 9.

https://doi.org/10.1371/journal.pone.0338611.t001

Table 2. The Wilcoxon signed-rank test results of Shannon entropy and Rényi entropy (α values: 0, 0.5, and 1.5) between ICFs for minors and their guardians.

https://doi.org/10.1371/journal.pone.0338611.t002

Fig 1. Parameter sensitivity analysis of entropy differences between Guardian and Minor ICFs across α values.

https://doi.org/10.1371/journal.pone.0338611.g001

Shannon entropy and Rényi entropy (α values of 0, 0.5, and 1.5) were chosen to analyze the Chinese language textbooks for the second, sixth, and ninth grades (Fig 2). The results showed that as the difficulty of the texts increased, their entropy values also increased. This provides preliminary evidence that Shannon and Rényi entropy can be used for text readability assessment.

Fig 2. The entropy values of the Chinese language textbooks for the second, sixth, and ninth grades.

https://doi.org/10.1371/journal.pone.0338611.g002

We chose one paired set of minor-version and guardian-version ICFs to analyze the distribution of the word frequency and the Shannon entropy contribution of each single word. The results shown in Fig 3 (a) and (b) indicate that, compared with the guardian-version ICF, the word frequency distribution of the minor-version ICF is more uneven. Fig 3 (c) and (d) display the contributions of single words to the Shannon entropy of each ICF. Similarly, the distribution of Shannon entropy contributions of the ICF for minors is more uneven than that of the ICF for guardians. Since Rényi entropy is based on the overall distribution of word frequencies in the entire text, the contribution of a single word to the Rényi entropy value cannot be calculated.

Fig 3. Word frequency and Shannon entropy distribution.

This figure shows the distribution of word frequency (a. Minor-version; b. Guardian-version) and Shannon entropy (c. Minor-version; d. Guardian-version) for one paired set of minor-version and guardian-version ICFs.

https://doi.org/10.1371/journal.pone.0338611.g003
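Because Shannon entropy is a sum over words, each word's contribution can be read off as the single term −p log₂ p. The Python sketch below shows one way to compute these per-word contributions; the function name `entropy_contributions` and the toy tokens are assumptions for illustration and do not reproduce the data behind Fig 3.

```python
import math
from collections import Counter


def entropy_contributions(tokens):
    """Map each distinct word to its term -p * log2(p) in the Shannon entropy sum."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: -(c / total) * math.log2(c / total) for w, c in counts.items()}


toy_tokens = ["a", "a", "b", "c"]  # hypothetical tokens, not study data
contrib = entropy_contributions(toy_tokens)
total_entropy = sum(contrib.values())  # equals the Shannon entropy of the text
```

The Rényi entropy applies the logarithm after summing over all words, which is why no analogous per-word decomposition is available for it.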

3.2 Descriptive statistics

The descriptive results, including word count, Shannon entropy, and Rényi entropy (α values: 0, 0.5, and 1.5), of minor-version ICFs, guardian-version ICFs, and Chinese language textbooks from the second grade, sixth grade, and ninth grade are shown in Table 1.

3.3 Comparisons of minor-version with guardian-version ICFs

The Wilcoxon signed-rank test results of Shannon entropy and Rényi entropy (α values: 0, 0.5, and 1.5) between ICFs for minors and their guardians are shown in Table 2. The Shannon entropy of minor-version ICFs was significantly lower than that of guardian-version ICFs (7.71 (7.38, 8.13) vs. 8.34 (8.23, 8.57), P = 0.001). Significant differences were also found in the Rényi entropy (α = 0) (9.03 (8.67, 9.72) vs. 10.27 (9.90, 10.43), P < 0.001), Rényi entropy (α = 0.5) (8.38 (8.15, 9.10) vs. 9.45 (9.30, 9.63), P < 0.001) and Rényi entropy (α = 1.5) (7.04 (6.47, 7.27) vs. 7.17 (7.02, 7.49), P = 0.025) of minors’ and their guardians’ ICFs. In distinguishing the minor-version ICFs from guardian-version ICFs, the Rényi entropy (α = 0) had the best performance (AUC = 0.952), followed by Rényi entropy (α = 0.5) (AUC = 0.941), Shannon entropy (AUC = 0.879), and Rényi entropy (α = 1.5) (AUC = 0.713) (Fig 4). The AUC values of four kinds of entropy and the corresponding cutoff points are shown in Table 3.

Table 3. The AUC values of four different entropy measures (Shannon entropy and Rényi entropy with α-values of 0, 0.5, and 1.5) and their corresponding cutoff points.

https://doi.org/10.1371/journal.pone.0338611.t003

Fig 4. The receiver operating characteristic (ROC) curves for the discrimination between minor-version and guardian-version ICFs.

The performance of four different entropy measures (Shannon entropy and Rényi entropy with α-values of 0, 0.5, and 1.5) is shown.

https://doi.org/10.1371/journal.pone.0338611.g004

3.4 Comparisons of minor-version ICFs with Chinese language textbooks

The Shannon entropy and Rényi entropy (α-values: 0, 0.5, and 1.5) of minor-version ICFs were significantly higher than those of Chinese language textbooks from the second grade (all P < 0.001). Compared with sixth-grade textbooks, minors’ ICFs showed significantly higher Rényi entropy at α = 0 (P = 0.001) and α = 0.5 (P = 0.016), but comparable Shannon entropy and Rényi entropy (α = 1.5). In contrast, no significant differences were found in the Rényi entropy (α = 0, 0.5) between ICFs for minors and ninth-grade textbooks. The Shannon entropy (P = 0.022) and Rényi entropy (α = 1.5) (P = 0.001) of minor-version ICFs were significantly lower than those of ninth-grade textbooks. These comparisons of textual complexity are shown in Fig 5.

Fig 5. Entropy comparison across minor-version ICFs and textbooks.

This figure presents the comparison of (a) Rényi entropy (α = 0), (b) Rényi entropy (α = 0.5), (c) Shannon entropy, and (d) Rényi entropy (α = 1.5), between minor-version ICFs and textbooks of different grades.

https://doi.org/10.1371/journal.pone.0338611.g005

3.5 The effects of textual content upon entropy

To investigate the reasons underlying the changes in information entropy between Minor and Guardian version ICFs, we divided the informed consent forms into four sections: introduction, procedures, risks and benefits, and participants’ rights and other information. We then recalculated the entropy for each section, computed the difference ∆(Guardian – Minor), and visualized the results using a heatmap (Fig 2 and Table 3). The findings indicate that the section “participants’ rights and other information” exhibits the largest variation with respect to α. Specifically, the results show that the sample size of Minor ICFs differs substantially from that of Guardian ICFs when α = 0, whereas the word frequency distribution becomes more uniform when α = 1.5 (Table 4).

Table 4. The entropy value of the difference ∆(Guardian – Minor) for each section.

https://doi.org/10.1371/journal.pone.0338611.t004

3.6 Random experiment

Entropy is sensitive to vocabulary growth. Therefore, we designed a random experiment to test whether the smaller entropy values in minors’ ICFs were simply due to their shorter length compared with the guardian version. We performed additional analyses by randomly sampling 1,000 tokens from each ICF (with replacement for shorter texts) and computed the entropies over 100 iterations, then averaged the results. The results show that minor-version ICFs still exhibited significantly lower entropy values (P < 0.05) than guardian-version ICFs when α ≤ 1.5 (Table 5). Therefore, it is reasonable that this study chose Shannon entropy and Rényi entropy with α = 0, 0.5, and 1.5.

Table 5. The Wilcoxon signed-rank test results of a Random experiment.

https://doi.org/10.1371/journal.pone.0338611.t005
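The length-controlled resampling used in the random experiment can be sketched in Python as follows. This is one reading of the description, stated as an assumption: texts with at least 1,000 tokens are sampled without replacement and shorter texts with replacement. The function names and the fixed seed are hypothetical choices for this example.

```python
import math
import random
from collections import Counter


def shannon_entropy(tokens):
    """Base-2 Shannon entropy of the empirical word distribution."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def length_controlled_entropy(tokens, entropy_fn, sample_size=1000, iterations=100, seed=0):
    """Average an entropy measure over repeated fixed-size random samples of a text."""
    rng = random.Random(seed)  # fixed seed only to make the sketch reproducible
    values = []
    for _ in range(iterations):
        if len(tokens) >= sample_size:
            sample = rng.sample(tokens, sample_size)     # without replacement
        else:
            sample = rng.choices(tokens, k=sample_size)  # with replacement (short text)
        values.append(entropy_fn(sample))
    return sum(values) / len(values)


# Hypothetical token list with four equally frequent words (not study data).
toy_tokens = list("abcd") * 500
print(length_controlled_entropy(toy_tokens, shannon_entropy))
```

Any entropy function that takes a token list can be passed as `entropy_fn`, so the same routine would cover the Rényi orders as well; the averaged values for the two ICF versions would then be compared with the paired test.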

4 Discussion

This study compared Shannon entropy and Rényi entropy between minor-version and guardian-version ICFs, as well as between minor-version ICFs and Chinese textbooks from various grade levels. Significant differences were observed between the ICFs for minors and their guardians across all four entropy values. The results also showed that the readability level of the minor-version ICFs corresponds to grades six to nine, which matches the reading ability of minors aged 12–16. However, in terms of length, the word count of the minors’ ICFs far exceeds that of ninth-grade textbooks. These results are of significant importance in guiding the drafting of ICFs and evaluating their readability.

4.1 Differences in word set size and word frequency between minor-version and guardian-version ICFs

In this study, significant differences were observed between the minor-version and guardian-version ICFs across all four entropy values, indicating differences in word set size and word frequency between the two versions. The Rényi entropy (α = 1.5) shows that the probability distribution in the minor-version ICFs is less uniform than that in the guardian-version ICFs, meaning that high-frequency words (such as function words) dominate, and the information is more concentrated. On the other hand, this could also indicate simpler or repetitive sentence structures and lower lexical diversity. However, a notable number of minors’ ICFs (7/17) had Rényi entropy values (α = 1.5) higher than the guardians’ ICFs. We believe that the minor-version ICFs have reduced length in sections such as Risk, Benefit, and Participants’ Rights, compared to the guardian-version ICFs. This reduction led to a higher occurrence of low-frequency terms, such as technical or medical terminology, in the minor ICFs, resulting in a higher Rényi entropy (α = 1.5). The results in section 3.5 also show that the gap in entropy values between the Risk and Benefit and Participants’ Rights and Other Information sections rapidly reduces as the α value increases, thus supporting this hypothesis.

To further explore this hypothesis, we initially built a shortlist of 563 technical terms. The technical-term density (TT) was calculated as follows:

Subsequent analysis was conducted on the TT results to examine whether the Rényi entropy parameter (α = 1.5) increases with higher technical-term density, using Spearman’s rank correlation coefficient (ρ) across ICFs. The results reveal a significant correlation between the Rényi entropy value (α = 1.5) and the TT results (ρ = 0.570, P < 0.001).

4.2 The writing of informed consent forms should aim to lower the threshold for understanding the information

Although, according to Cognitive Development Theory [29,30], minors aged 12 and above are beginning to develop abstract thinking abilities, these abilities are still developing. Complex sentence structures and medical terminology remain significant challenges for minors’ comprehension of abstract concepts. This highlights the need for researchers and IRBs to focus on the use of complex vocabulary and sentence structures when drafting and reviewing ICFs, rather than simply reducing content by cutting text from the ICFs for guardians. In addition, the guardian-version ICFs significantly surpass the reading ability typically acquired by individuals upon completing compulsory education. Therefore, in most cases, participants and their guardians are required to make decisions that exceed the limits of their available information and cognitive abilities. This makes the drafting of ICFs extremely important work: it should lower the threshold for understanding the information as much as possible, so that participants can fully understand the research information.

The Shannon entropy and Rényi entropy (α = 1.5) of the minor-version ICFs show no significant differences compared to sixth-grade textbooks, while the Rényi entropy with α = 0 and 0.5 shows no significant differences compared to ninth-grade textbooks. The entropy values of the minors’ ICFs generally align with the grades six to nine, but the length far exceeds that of textbooks. According to Cognitive Load Theory [31,32], human working memory has a limited capacity during learning and information processing, and longer texts may increase cognitive load during reading [33]. Extended text length can significantly impact the effectiveness of reading and understanding informed consent forms by participants [34,35]. Therefore, researchers and IRBs need to consider the length of ICFs during their drafting and review. They could extract key information from the ICFs to create summaries or organize the information in layers based on its importance, so that the primary informed content is concentrated within specific sections. When reading ICFs, participants may not receive sufficient explanations. Although this research shows that the readability of the minor-version ICFs is equivalent to the reading level of grades six to nine, the reading environment of ICFs is significantly different from the learning environment of Chinese language classes. We should not overly expect researchers to read and explain ICFs line by line, as teachers would do with textbooks, especially since ICFs themselves do not come with well-established instructional methods like those for classroom texts. Similarly, designing ICFs faces a comparable challenge. Expecting researchers to craft a text that is both easy to understand and linguistically engaging, like a professional writer, is clearly unrealistic.

4.3 The construction of a readability evaluation tool for Chinese ICFs based on information entropy is feasible

The study demonstrates that Shannon and Rényi entropy values increase as the difficulty of the text rises, as seen in the comparison of ICFs across different versions (minor and guardian) and Chinese language textbooks of various grade levels (second, sixth, and ninth grades). The entropy values for minor-version ICFs and guardian-version ICFs, as well as textbooks, reflect the overall text complexity, which can be correlated with how easily the target audience—minors and their guardians—can comprehend the document. Shannon entropy and Rényi entropy can successfully capture the complexity and variability in the text, providing objective measures of its readability.

The random experiment, designed to test the effect of text length on entropy values, showed that even after adjusting for text length by random sampling, the minor-version ICFs still exhibited significantly lower entropy values than the guardian-version ICFs (α ≤ 1.5). This strengthens the argument that the differences in entropy are not solely due to text length, but rather reflect inherent differences in text complexity between the two versions.

Furthermore, entropy can also be analyzed at different granular levels (e.g., Introduction, Procedures, Risk, and Benefit), enabling more targeted interventions. This could lead to a more efficient and systematic approach to revising ICFs, ensuring that they are both legally sound and comprehensible to a wide range of participants.

4.4 The advantages and disadvantages of Shannon and Rényi entropy

Entropy measures are critical for evaluating complex, non-linear systems, as in their application to electrophysiological data [36,37]. In the study of text readability and linguistic feature analysis, Shannon entropy, with its clear probabilistic interpretation and robust theoretical foundation, offers strong interpretability and empirical reliability in quantifying the average uncertainty of a text [38]. It effectively captures the “average information content” at both lexical and syntactic levels and has been validated across multiple languages and genres in readability research [39]. Because it originates directly from fundamental information theory, the computation and interpretation of Shannon entropy are relatively straightforward, making it particularly suitable for constructing general frameworks of linguistic complexity measurement. However, its main limitation lies in its fixed sensitivity: Shannon entropy assigns equal weight to high-frequency and low-frequency events, which makes it less flexible when distinguishing between the contributions of frequent and rare words to readability [40]. In contrast, the parameterized nature of Rényi entropy enables a tunable analytical framework, offering notable advantages in handling heterogeneous text structures and mitigating the effects of noisy linguistic data. However, this flexibility comes at the cost of increased interpretive and computational complexity. The selection of the parameter α remains largely heuristic, lacking standardized guidelines, and its interpretation is highly dependent on the specific linguistic or analytical context. Consequently, a combined application of Shannon and Rényi entropy can yield a hierarchical and conceptually coherent framework for the comprehensive analysis of Chinese ICF lexical features.

4.5 Limitations of the study

One limitation of the current study is that the choice of word segmentation (tokenization) tool can affect the results. To limit this effect, we used the most popular open-source word segmentation toolkit (jieba) for all texts and carried out the lexical feature analysis under the same conditions. Another limitation is that this work is based solely on information entropy and lacks an analysis of the relationships between words; thus, the lexical relations of Chinese text require further exploration. For example, terminologies and their explanations are not analyzed together in terms of readability. Finally, the information conveyed by ICFs is multidimensional, including elements such as charts and images. The readability of ICFs is influenced not only by lexical information but also by layout and formatting. However, given that the main purpose of the present study is to discover lexical features of ICFs, exploring such issues is beyond our scope.
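The sensitivity to segmentation can be illustrated with a minimal sketch that computes Shannon entropy for the same sentence under two different tokenizations. The sentence and the word-level segmentation below are hypothetical and written by hand for illustration; they are not output from jieba or from the study's data.

```python
import math
from collections import Counter


def shannon_entropy(tokens):
    """Shannon entropy (in bits) of the empirical token distribution."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical example sentence from an ICF.
sentence = "研究者将向您说明研究的目的和风险"

# Tokenization A: every character is its own token.
char_tokens = list(sentence)

# Tokenization B: a hand-written word-level segmentation (assumption, not
# produced by any segmentation toolkit).
word_tokens = ["研究者", "将", "向", "您", "说明", "研究", "的", "目的", "和", "风险"]

h_char = shannon_entropy(char_tokens)
h_word = shannon_entropy(word_tokens)

print(f"character-level: {h_char:.3f} bits")
print(f"word-level:      {h_word:.3f} bits")
```

The two values differ even though the underlying text is identical, which is why the analysis keeps a single segmentation tool and the same conditions across all forms being compared.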

5 Conclusions

The design of ICFs is a complex engineering task that involves knowledge from multiple fields, such as medicine, ethics, law, linguistics, psychology, and education. Unlike literary works, the purpose of an ICF is to ensure that participants fully understand the research procedures, risks, and benefits. Therefore, when designing ICFs, it is necessary to consider factors such as participants’ cultural background, educational level, and cognitive abilities, so that the content, structure, and language effectively convey the core information. For IRBs and researchers faced with increasingly stringent review requirements, using digital tools to evaluate ICFs quantitatively will become necessary.

This study used information entropy to analyze the lexical features of ICFs and provides preliminary evidence that Shannon entropy and Rényi entropy are feasible tools for evaluating the readability of Chinese ICFs. With the advancement of digital technologies, ICF design can also explore personalized approaches, such as auxiliary reading functions matched to different levels of reading ability, to support a thorough understanding of the research information.

References

  1. Agozzino E, Borrelli S, Cancellieri M, Carfora FM, Di Lorenzo T, Attena F. Does written informed consent adequately inform surgical patients? A cross sectional study. BMC Med Ethics. 2019;20(1):1. pmid:30616673
  2. Kaushik JS, Narang M, Agarwal N. Informed consent in pediatric practice. Indian Pediatr. 2010;47(12):1039–46. pmid:21220800
  3. Cocanour CS. Informed consent-It’s more than a signature on a piece of paper. Am J Surg. 2017;214(6):993–7. pmid:28974311
  4. Dankar FK, Gergely M, Dankar SK. Informed Consent in Biomedical Research. Computational and Structural Biotechnology Journal. 2019;17:463–74.
  5. Ren Y-P, Jin X-R, Jiang S, Jiang B-S. Legal protection of the rights of clinical trial subjects in China. J Biomed Res. 2018;32(2):77–80. pmid:29921746
  6. Vučemilo L, Borovečki A. Readability and content assessment of informed consent forms for medical procedures in Croatia. PLoS One. 2015;10(9):e0138017. pmid:26376183
  7. Tugwell P, Knottnerus A, Idzerda L. Informed consent forms fail to reflect best practice. J Clin Epidemiol. 2012;65(7):703–4. pmid:22651972
  8. Paasche-Orlow MK, Taylor HA, Brancati FL. Readability standards for informed-consent forms as compared with actual readability. N Engl J Med. 2003;348(8):721–6. pmid:12594317
  9. Kahmann R, Ozuer Y, Zedelius CM, Bijleveld E. Mind wandering increases linearly with text difficulty. Psychol Res. 2022;86(1):284–93. pmid:33576850
  10. People’s Republic of China. Civil Code of the People’s Republic of China. 2020. https://english.www.gov.cn/archive/lawsregulations/202012/31/content_WS5fedad98c6d0f72576943005.html
  11. Parmigiani G, Benevento M, Solarino B, Margari A, Ferorelli D, Buongiorno L, et al. Decisional capacity to consent to treatment in children and adolescents: A systematic review. Psychiatry Res. 2025;344:116343. pmid:39798483
  12. Cohen LT, Millock PJ, Asheld BA, Lane B. Minor patients: consent to treatment and access to medical records. J Am Coll Radiol. 2015;12(8):788–90. pmid:26250974
  13. Flesch R. A new readability yardstick. J Appl Psychol. 1948;32(3):221–33. pmid:18867058
  14. Williamson JML, Martin AG. Analysis of patient information leaflets provided by a district general hospital by the Flesch and Flesch-Kincaid method. Int J Clin Pract. 2010;64(13):1824–31. pmid:21070533
  15. Emanuel EJ, Boyle CW. Assessment of Length and Readability of Informed Consent Documents for COVID-19 Vaccine Trials. JAMA Netw Open. 2021;4(4):e2110843. pmid:33909052
  16. Świeczkowski D, Kułacz S. The use of the Gunning Fog Index to evaluate the readability of Polish and English drug leaflets in the context of Health Literacy challenges in Medical Linguistics: An exploratory study. Cardiol J. 2021;28(4):627–31. pmid:33140389
  17. Grabeel KL, Russomanno J, Oelschlegel S, Tester E, Heidel RE. Computerized versus hand-scored health literacy tools: a comparison of Simple Measure of Gobbledygook (SMOG) and Flesch-Kincaid in printed patient education materials. J Med Libr Assoc. 2018;106(1):38–45. pmid:29339932
  18. Yong C, Dekuan X, Jun D. On key factors of text reading difficulty grading and readability formula based on Chinese textbook corpus. Applied Linguistics. 2020;1:132–43.
  19. Lei L, Yaoyu W, Kanglong L. AlphaReadabilityChinese: a tool for the measurement of readability in Chinese texts and its applications. 2024;01:83–93.
  20. Mohseni M, Redies C, Gast V. Comparative analysis of preference in contemporary and earlier texts using entropy measures. Entropy (Basel). 2023;25(3):486. pmid:36981375
  21. Shi Y, Lei L. Lexical richness and text length: an entropy-based perspective. Journal of Quantitative Linguistics. 2020;29(1):62–79.
  22. Koponen I, Södervik I. Lexicons of key terms in scholarly texts and their disciplinary differences: from quantum semantics construction to relative-entropy-based comparisons. Entropy (Basel). 2022;24(8):1058. pmid:36010722
  23. Yeh C-H, Xu Y, Shi W, Fitzgerald JJ, Green AL, Fischer P, et al. Auditory cues modulate the short timescale dynamics of STN activity during stepping in Parkinson’s disease. Brain Stimul. 2024;17(3):501–9. pmid:38636820
  24. Oswal A, Cao C, Yeh C-H, Neumann W-J, Gratwicke J, Akram H, et al. Neural signatures of hyperdirect pathway activity in Parkinson’s disease. Nat Commun. 2021;12(1):5185. pmid:34465771
  25. Olbrys J, Majewska E. Approximate entropy and sample entropy algorithms in financial time series analyses. Procedia Computer Science. 2022;207:255–64.
  26. Shannon CE. A Mathematical Theory of Communication. Bell Telephone System Technical Publications. 1948.
  27. Estevez-Rams E, Mesa-Rodriguez A, Estevez-Moya D. Complexity-entropy analysis at different levels of organisation in written language. PLoS One. 2019;14(5):e0214863. pmid:31067221
  28. Mastroeni L, Mazzoccoli A, Vellucci P. Studying the impact of fluctuations, spikes and rare events in time series through a wavelet entropy predictability measure. Physica A: Statistical Mechanics and its Applications. 2024;641:129720.
  29. Piaget J. Science of education and the psychology of the child. New York: Orion Press. 1970.
  30. Stoltz T, Weger U, da Veiga M. Consciousness and education: contributions by Piaget, Vygotsky and Steiner. Front Psychol. 2024;15:1411415. pmid:39161692
  31. Sweller J. Cognitive load during problem solving: effects on learning. Cognitive Science. 1988;12(2):257–85.
  32. van Gog T, Paas F, Sweller J. Cognitive load theory: advances in research on worked examples, animations, and cognitive load measurement. Educ Psychol Rev. 2010;22:375–8.
  33. Baker JR. The effects of text length on the readability of model essays. gema. 2023;23(1):60–73.
  34. Larson E, Foe G, Lally R. Reading level and length of written research consent forms. Clin Transl Sci. 2015;8(4):355–6.
  35. Corneli A, Namey E, Mueller MP, Tharaldson J, Sortijas S, Grey T, et al. Evidence-based strategies for shortening informed consent forms in Clinical Research. J Empir Res Hum Res Ethics. 2017;12(1):14–25. pmid:28078953
  36. Yeh C-H, Zhang C, Shi W, Zhang B, An J. Quantifying sharpness and nonlinearity in neonatal seizure dynamics. Cyborg Bionic Syst. 2024;5:0076. pmid:38274711
  37. Yeh CH, Zhang C, Shi W, Lo MT, Tinkhauser G, Oswal A. Cross-frequency coupling and intelligent neuromodulation. Cyborg Bionic Syst. 2023;4:0034.
  38. Papadimitriou C, Karamanos K, Diakonos FK, Constantoudis V, Papageorgiou H. Entropy analysis of natural language written texts. Physica A: Statistical Mechanics and its Applications. 2010;389(16):3260–6.
  39. Liu K, Liu Z, Lei L. Simplification in translated Chinese: An entropy-based approach. Lingua. 2022;275:103364.
  40. Bentz C, Alikaniotis D, Cysouw M, Ferrer-i-Cancho R. The entropy of words—learnability and expressivity across more than 1000 languages. Entropy. 2017;19(6):275.