
Measuring reading time: Comparing logged and self-reported data in relation to reading skills

  • Brice Brossette ,

    Roles Conceptualization, Data curation, Formal analysis, Funding acquisition, Investigation, Methodology, Project administration, Supervision, Validation, Visualization, Writing – original draft, Writing – review & editing

    brice.brossette@gmail.com

    Affiliations Aix-Marseille Univ, CNRS, LPL, Aix-en-Provence, France, Aix-Marseille Univ, Pôle Pilote AMPIRIC, Marseille, France

  • Laurie Persia-Leibnitz,

    Roles Data curation, Project administration

    Affiliations Aix-Marseille Univ, CNRS, LPL, Aix-en-Provence, France, Academy of Martinique, Les Hauts de Terreville, Schoelcher, France

  • Mee-Jin Chalbos,

    Roles Data curation, Project administration

    Affiliation Aix-Marseille Univ, CNRS, LPL, Aix-en-Provence, France

  • Chloé Prugnières,

    Roles Data curation, Project administration

    Affiliations Aix-Marseille Univ, CNRS, LPL, Aix-en-Provence, France, Aix-Marseille Univ, Pôle Pilote AMPIRIC, Marseille, France

  • Stéphanie Ducrot

    Roles Conceptualization, Data curation, Funding acquisition, Methodology, Project administration, Supervision, Visualization, Writing – review & editing

    Affiliations Aix-Marseille Univ, CNRS, LPL, Aix-en-Provence, France, Aix-Marseille Univ, Pôle Pilote AMPIRIC, Marseille, France, Institute for Language, Communication, and the Brain, Aix-Marseille Université, Aix-en-Provence, France

Abstract

Background

Children’s reading time at home plays a critical role in their reading development. However, existing measures of reading time, based on self-reports, are often biased. Logged data from mobile apps may offer a more reliable alternative, as shown in studies examining screen time in digital media use.

Objectives

This study compared logged and self-reported measures of reading time and examined their associations with reading skills in French primary school children.

Method

One hundred and nine children from Grade 1 to Grade 5 and their parents participated. Parents completed a retrospective questionnaire estimating weekly reading time (self-reported measure). They then used a mobile application to record their child’s reading activities in real time over a 14-day period (logged measure). All children were assessed on their reading fluency.

Results and conclusions

The self-reported measure yielded significantly higher reading time estimates (M = 6.26 hours/week) than the logged measure (M = 2.11 hours/week), with a moderate correlation between the two (r = .45). Crucially, the logged measure showed stronger predictive validity for reading fluency (r = .39) than the self-reported measure (r = .25). Regression analyses confirmed that when both measures were included simultaneously, only the logged reading time remained a significant predictor of reading performance. These findings suggest that logged measures obtained via ambulatory assessment (here, using a mobile app) provide more accurate estimates of reading time and superior predictive validity compared to traditional self-reports. This methodology offers promising avenues for future research on reading habits and literacy development.

1. Introduction

The development of reading competencies is a central priority in educational public policies, which promote a broadened conception of literacy that goes beyond basic decoding skills and emphasizes a functional understanding of written language [1,2]. Developing strong reading competencies requires well-established reading comprehension skills [3]. These skills are regularly assessed in large-scale international studies [4], which consistently show that children’s learning environments play a central role in their ability to comprehend written materials [5]. In particular, print exposure appears to be an important explanatory factor [6].

The mechanisms underlying this effect have been well described within the Home Literacy Model [7,8]. This model posits that both formal and informal home literacy activities support early literacy development, as well as oral language and vocabulary growth. Moreover, early reading routines have been shown to predict later engagement in leisure reading [9], which in turn contributes to the development of reading skills [10], while the reciprocal relationship also holds [11]. This reciprocal relationship underlies the “snowball effect,” a positive feedback loop in which reading becomes increasingly enjoyable, thereby fostering skill development, confidence, and motivation to continue reading, all of which are crucial for the development of reading self-efficacy over time [12–14].

Importantly, this reciprocal relationship is developmentally dynamic rather than linear. Longitudinal and genetically informed studies indicate that during the early school years, reading ability primarily predicts later engagement with print [11,15]. From adolescence onward, the direction of this relationship tends to reverse, such that greater exposure to written language plays a stronger role in reading development [16,17]. However, despite this body of evidence, the directionality of effects between print exposure and reading skills remains an open and debated question, with no clear consensus in the literature. Importantly, even when reading skills are considered primary, promoting print exposure may act as a protective factor, particularly for children with reading difficulties or those growing up in disadvantaged environments [18]. This appears particularly important because schools do not always succeed in compensating for a deprived family environment that does not provide sufficient reading time [19].

Over the past decades, research has employed diverse measures to estimate reading time [20]. Nonetheless, reliance on self-reported data may partly explain the heterogeneity and contradictory findings observed in the literature regarding the causal direction between reading time and reading skills. In the following section, we discuss the limitations associated with such self-reported measures. We then explore an innovative approach to assessing reading time through ambulatory assessment, inspired by digital log-based methods commonly used to assess screen time [21]. We believe that these innovative measures will allow for a more accurate estimation of reading time and its effects on the development of reading skills.

1.1 Traditional self-reported measures of reading time

Traditionally, reading time has been assessed using retrospective self-report questionnaires [20]. These measures show small to moderate associations with reading outcomes [12] and capture both quantitative aspects (e.g., time and frequency) and qualitative ones (e.g., text type, parent-child interaction), which differentially impact reading motivation and skill development [22–25]. However, retrospective questionnaires require complex cognitive judgments and are therefore vulnerable to multiple sources of bias, including subjective interpretations of frequency and duration, memory and reference biases, and social desirability effects, often leading to overestimation of actual reading behavior [26–33].

As retrospective self-reported measures appear too coarse-grained, some researchers have turned to more fine-grained approaches, such as reading diaries. These involve daily reporting of reading activities, capturing not only the time spent reading but also qualitative aspects, such as the type of material read [34–36]. Such diaries have been found to be reliable predictors of reading outcomes and motivation [35,37–39]. Although they reduce some limitations of retrospective questionnaires, their data quality depends on compliance and timing of completion, as delayed entries may reintroduce retrospective bias and increase missing data, particularly over longer collection periods [40–42]. In addition, traditional paper-and-pencil formats impose a substantial burden on participants, which may negatively affect adherence and data quality [43].

1.2 Ambulatory assessment of reading time

To address these limitations, some researchers have proposed using ambulatory assessment [44,45], defined as the use of digital tools to track behaviors in real time and in natural settings. Thanks to technological advances, ambulatory assessments can now be easily carried out using mobile apps on participants’ phones, as already demonstrated in the field of health [46,47]. More specifically, these methods have recently been successfully applied to assess screen time, demonstrating stronger psychometric properties than self-reported measures (for a review, see [48]).

However, to date, only one study has used ambulatory assessment to measure reading time, despite several advantages highlighted by Locher and collaborators [49]. First, smartphone ownership is common, even in disadvantaged families, enabling continuous data collection through a dedicated mobile app. This provides a convenient way to complete a reading diary at any time. Moreover, mobile apps offer high flexibility by allowing questions to be tailored, either by removing unnecessary items or by adding new ones based on participants’ responses or profiles. Another advantage is the ability to detect when entries are completed retrospectively, allowing for a more precise evaluation of data quality. Finally, mobile apps may be easier to use than paper-based methods, promoting broader participation and increasing the generalizability of results. Locher and collaborators [49] demonstrated that using a mobile app to track the reading behavior of university students is a reliable approach that reduces the burden of diary completion. Importantly, they found that app-based data and retrospective questionnaires were closely related, although reading time was overestimated in the questionnaire. This suggests that mobile apps may help reduce social desirability bias, which contributes to such overestimations.

1.3 The present study

The present study aimed to extend this approach in several ways. First, we developed a mobile application that allowed participants to time the duration of reading activities using a stopwatch function, rather than reporting it retrospectively. We believed that this method would encourage greater attention to accurately reporting reading activity, while reducing retrospective bias and providing a more objective measure of reading time. To distinguish this approach from traditional self-reports, and to draw a parallel with studies using logged screen time in digital media research, we refer to this method as a logged measure of reading time. Second, the study focused on primary school children. Since children do not necessarily own smartphones, we targeted reading time at home, where parents could use the mobile app to record their child’s reading activities. Finally, we assessed children’s reading fluency to examine its relationship with recorded reading times. A total of 109 children from Grade 1 to Grade 5 and their parents first completed a paper-and-pencil questionnaire about reading behavior, then used a mobile app for 14 days to log the duration of reading activities in real time. Children’s reading fluency was assessed at school. The study aimed, first, to evaluate the degree of convergence between self-reported and app-recorded reading time, and second, to assess the extent to which each measure contributes to the prediction of reading fluency.

2. Method

2.1. Participants

A total of 109 children (49 female, 60 male) from a public elementary school (Grade 1: n = 20; Grade 2: n = 29; Grade 3: n = 29; Grade 4: n = 13; Grade 5: n = 18) were tested in the middle of their school year (from January 1 to April 30, 2024). All participants were native French speakers with normal or corrected-to-normal vision. Caregivers provided written informed consent prior to the experiment. In addition, children provided verbal assent before participation. The experimenter explained the study in age-appropriate language, emphasized that participation was voluntary, and informed children that they could withdraw at any time without consequences. This procedure, including the use of verbal child assent, was reviewed and approved by the French Ethics Committee Review Board (2023-01-05-04). The experiment was conducted in accordance with relevant guidelines and regulations, as well as the Declaration of Helsinki.

2.2. Measures of reading time

2.2.1. Self-report measure: The paper-and-pencil questionnaire.

A self-report measure of children’s reading time was obtained using a paper-and-pencil questionnaire completed by parents (similar to that used in [50]). The questionnaire asked parents to retrospectively estimate the average number of hours per week their child spent engaged in three distinct types of reading activities at home: (1) school-related reading, defined as the time parents supervised their child while completing reading assignments for school; (2) shared reading, referring to time spent reading to or with the child outside of homework; and (3) independent reading for pleasure, referring to time the child spent reading alone by choice. These distinctions are important, as the nature of children’s reading activities is known to evolve with reading development: shared reading tends to decrease as decoding becomes more automated, while independent reading increases [51,52]. By evaluating all three components, the questionnaire provides a comprehensive picture of the overall volume of exposure to written material across the elementary school years. For the purposes of the present study, weekly time estimates for the three activity types were summed to create a single composite index of reading time (see Results section). This questionnaire-based approach constitutes a self-report measure of reading time, as it relies on parental estimates rather than direct behavioral tracking.

2.2.2. Logged measure: The reading diary app data.

A logged measure of children’s reading time was obtained using a custom-built mobile application developed with FlutterFlow [53]. Parents were asked to use the app to time their child’s reading activities over a 14-day period. This period was selected based on previous work ([49] informed by [40]). At the end of each recorded session, parents were prompted to indicate the type of reading activity that had taken place: (1) school-related reading, (2) shared reading, or (3) independent reading for pleasure—categories that were strictly identical to those used in the paper-and-pencil questionnaire. The total duration of each type of activity was automatically logged by the app. To produce a measure of average weekly reading time comparable to the questionnaire data, the total duration across the 14-day period was divided by two (see Results section). This approach provides a time-based, behaviorally grounded measure of children’s reading time in the home environment. Unlike retrospective estimates, it offers a chronometric index of real-world reading behavior.

2.3. Measure of reading fluency

To assess reading fluency, we used the Alouette Test [54], a standardized reading assessment widely used in France for children aged 5–14. In this task, participants are asked to read aloud a 265-word text within a time limit of three minutes (180 seconds), aiming for both speed and accuracy. The text is designed to minimize reliance on semantic cues by presenting syntactically correct but nonsensical sentences (e.g., “Le printemps a mis ses nids,” [Spring has put on its nests]), thereby targeting decoding skills. It contains rare vocabulary, irregular spellings, and orthographic challenges such as silent letters and phonologically confusable items. Three main variables were recorded: the number of correctly read words (M), total reading time in seconds (TL), and the number of reading errors (E). These variables were used to compute the reading fluency index: CTL = ((M − E) / TL) × 180.
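As a worked illustration, the fluency index can be computed directly from the three recorded variables. The function below is a minimal sketch; the function and parameter names are ours, not part of the test materials:

```python
def fluency_index(m_words, errors, time_s, limit_s=180):
    """Alouette fluency index (CTL): correctly read words net of errors,
    rescaled to the 3-minute (180 s) time limit."""
    return (m_words - errors) / time_s * limit_s

# e.g., 240 words read in 150 s with 6 errors:
# (240 - 6) / 150 * 180 = 280.8
```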

3. Results

3.1. Preprocessing of reading time data

3.1.1. Self-reported reading time.

Following data collection, a small number of missing values were identified in the parent-reported duration estimates (n = 5). Specifically, two values were missing for school-related reading activities, one for shared reading, and two for independent reading for pleasure. To address these missing entries, we imputed the corresponding values using the mean duration reported by other children in the same grade level for the same type of activity. In addition, to minimize the influence of outliers, a winsorization procedure was applied to the upper 5% of values for each activity type (for more details, see [55]). Extreme values were replaced with the highest non-extreme value within the same distribution, thereby reducing the weight of atypical responses while preserving rank order. For each participant, weekly durations across the three activity types were then summed to generate a single composite index of total reading time.
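The two preprocessing steps described above can be sketched in a few lines. This is an illustrative pandas implementation under our own assumptions about the data layout (one row per child × activity, columns `grade`, `activity`, `hours`), not the authors' actual code:

```python
import pandas as pd

def preprocess_self_reports(df, upper_q=0.95):
    """Illustrative sketch of the questionnaire preprocessing:
    1) Impute missing durations with the same-grade, same-activity mean.
    2) Winsorize the upper 5% per activity type: replace extreme values
       with the highest non-extreme value in the same distribution."""
    df = df.copy()
    df["hours"] = df.groupby(["grade", "activity"])["hours"].transform(
        lambda s: s.fillna(s.mean()))

    def cap_upper(s):
        highest_ok = s[s <= s.quantile(upper_q)].max()
        return s.clip(upper=highest_ok)

    df["hours"] = df.groupby("activity")["hours"].transform(cap_upper)
    return df
```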

3.1.2. Logged reading time.

A total of 1,324 reading activities were recorded via the mobile application across 109 participants. Reading activities shorter than one minute were excluded as unlikely to reflect meaningful reading behavior (e.g., premature stops or test entries). These accounted for 1.36% of the total data. This decision was based on feedback from parents indicating occasional handling errors. Because parents were not allowed to delete recorded activities in order to preserve data quality, the corresponding corrections were performed manually after data collection. Following this exclusion, two participants had no remaining valid reading entries and were therefore removed from further analyses. Extremely long reading sessions, due to a failure to stop the timer, were replaced with the participant’s mean reading time (3.68% of the data).
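The cleaning rules for the logged sessions can be sketched similarly. Note that the 240-minute cutoff for "extremely long" sessions is our own placeholder, as the article does not state the threshold used:

```python
import pandas as pd

def clean_logged_sessions(df, long_cutoff_min=240):
    """Illustrative sketch: `df` has columns participant, minutes.
    Drop sub-minute sessions (premature stops or test entries), then
    replace implausibly long sessions (timer left running) with the
    participant's mean over their plausible sessions."""
    df = df[df["minutes"] >= 1.0].copy()
    plausible = df["minutes"] < long_cutoff_min
    means = df.loc[plausible].groupby("participant")["minutes"].mean()
    df.loc[~plausible, "minutes"] = df.loc[~plausible, "participant"].map(means)
    return df
```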

Measurement quality was assessed using two complementary approaches. First, internal consistency was estimated following the procedure outlined by Locher and collaborators [49], treating daily reading time as repeated measurements of the same latent construct—namely, time spent reading. The resulting Cronbach’s alpha was .67, approaching the conventional threshold for acceptable reliability. Second, test–retest stability was evaluated by correlating total reading time between week 1 and week 2. The correlation was moderate and statistically significant (r = .48, p < .001), providing further support for the temporal stability of the app-based measure.
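Treating each day's total as an item, Cronbach's alpha reduces to the standard item-variance formula. A minimal implementation (not the authors' code) is:

```python
import numpy as np

def cronbach_alpha(X):
    """Cronbach's alpha for an items-in-columns matrix
    (here: one column per day, one row per child)."""
    X = np.asarray(X, dtype=float)
    k = X.shape[1]
    sum_item_vars = X.var(axis=0, ddof=1).sum()
    total_var = X.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - sum_item_vars / total_var)
```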

3.2. Comparisons between self-reported and logged measures of reading time

A comparison of reading time estimates revealed substantially higher weekly durations for the self-reported measure (M = 6.26 hours, SD = 2.78) than for the logged measure (M = 2.11 hours, SD = 1.80). This difference was highly significant (t = 12.97, p < .001, d = 1.64). Reading time did not significantly vary across grade levels (see Table 1; all pairwise comparisons with p > .05).

Table 1. Mean weekly reading time (in hours) by grade level, measured using a mobile application (logged measure) and a paper-pencil questionnaire (self-reported measure). For each grade, the table presents the mean and standard deviation (in parentheses), as well as the observed range [minimum; maximum].

https://doi.org/10.1371/journal.pone.0344853.t001
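The group comparison above can be reproduced with a paired t-test. Below is a sketch using SciPy; Cohen's d is computed here on the paired differences, which is one common convention, as the article does not specify which variant was used:

```python
import numpy as np
from scipy import stats

def compare_measures(self_report, logged):
    """Paired comparison of the two weekly estimates (hours/week)."""
    diff = np.asarray(self_report, float) - np.asarray(logged, float)
    t, p = stats.ttest_rel(self_report, logged)
    d = diff.mean() / diff.std(ddof=1)   # Cohen's d on paired differences
    r, _ = stats.pearsonr(self_report, logged)
    return {"t": t, "p": p, "d": d, "r": r}
```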

Despite large differences in absolute levels, the two measures were moderately correlated (r = .45, p < .001), suggesting that they capture partly overlapping constructs. To further evaluate the consistency of individual rankings across the two measurement methods, we computed an intraclass correlation coefficient (ICC) using a mixed-effects model implemented via the R package psych [56]. The ICC indicated a significant but relatively low level of agreement between the two measures (F(106, 106) = 2.42, p < .01).

To further assess the agreement between the two measurement methods, we conducted a Bland–Altman analysis [57]. The 95% limits of agreement (grey dotted lines) ranged from approximately 0 to −8 hours, indicating wide individual differences and poor agreement, consistent with the ICC results. Fig 1 shows that self-reported reading time was overestimated by approximately 4 hours per week (grey dashed line). A systematic bias was observed for most participants (grey points), with the self-reported measure consistently reporting higher values. This bias appeared proportional, as the variability in differences increased with higher mean reading durations. A linear regression confirmed this trend, showing a significant proportional bias (grey solid line).

Fig 1. Bland-Altman plot illustrating the agreement between logged and self-reported reading time (in hours).

The x-axis represents the mean of the two measures for each participant, and the y-axis shows the difference between logged and self-reported reading times. The solid black line represents the linear regression of the differences on the means, indicating a systematic bias. The dashed line indicates the average difference (bias), and the dotted lines indicate the limits of agreement.

https://doi.org/10.1371/journal.pone.0344853.g001
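The quantities behind Fig 1 can be computed in a few lines. A minimal sketch, where the proportional-bias check is an ordinary least-squares regression of differences on means:

```python
import numpy as np

def bland_altman(logged, self_report):
    """Bland-Altman summary: per-participant means vs. differences,
    overall bias, and 95% limits of agreement (bias +/- 1.96 SD)."""
    a = np.asarray(logged, float)
    b = np.asarray(self_report, float)
    mean = (a + b) / 2
    diff = a - b                           # logged minus self-reported
    bias = diff.mean()
    sd = diff.std(ddof=1)
    loa = (bias - 1.96 * sd, bias + 1.96 * sd)
    slope = np.polyfit(mean, diff, 1)[0]   # proportional-bias check
    return bias, loa, slope
```

A nonzero slope indicates that the disagreement grows with the amount of reading, which is the proportional bias reported above.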

3.3. Relation between reading time and reading fluency

To examine the relationship between reading time and reading fluency, we first conducted Pearson correlation analyses. Weekly logged reading time recorded via the mobile application showed a moderate and significant correlation with fluency scores (r = .39, p < .001). Self-reported reading time from the paper-based questionnaire was also positively correlated with fluency, although to a lesser extent (r = .25, p < .05).

We then ran a series of linear regression models to compare the predictive power of the two measurement methods, controlling for age, gender, and mother’s professional occupation. In the self-reported model, weekly reading time significantly predicted fluency performance (b = 0.09, SE = 0.04, β = 0.23, partial R2 = .06, p < .05). Mother’s occupation also emerged as a significant predictor (b = −0.21, SE = 0.08, β = −0.23, partial R2 = .06, p < .05). This model accounted for 9.2% of the variance in fluency scores (adjusted R2 = .092).

In contrast, the logged model showed that objectively measured weekly reading time was a stronger and more reliable predictor of fluency (b = 0.23, SE = 0.06, β = 0.37, partial R2 = .13, p < .001), while mother’s occupation was not significant (b = –0.15, SE = 0.08, β = −0.17, partial R2 = .03, p = .070). The model explained 16.3% of the variance in reading fluency (adjusted R2 = .163).

Finally, in a combined model including both reading time measures, only the logged reading time remained a significant predictor of fluency (b = 0.21, SE = 0.07, β = 0.33, partial R2 = .08, p < .01), whereas the self-reported estimate no longer reached significance (b = 0.03, SE = 0.04, β = 0.08, partial R2 = .01, p = .46). Mother’s occupation again failed to reach significance (b = –0.16, SE = 0.08, β = −0.17, partial R2 = .03, p = .066). The model explained 15.9% of the variance in reading fluency (adjusted R2 = .159).

4. Discussion

The present study evaluated the relevance of ambulatory assessment of reading time in primary school children using a smartphone-based app. The first objective was to examine the convergence between the logged measure and a traditional self-reported paper-and-pencil retrospective questionnaire. Our results showed that reading time was overestimated using the self-reported measure, although both measures were significantly correlated. However, their agreement remained low, indicating that participants who reported reading more on the questionnaire did not necessarily log more reading time in the mobile app.

The second objective was to assess the extent to which each measure predicts reading fluency. Results showed stronger associations between reading fluency and the logged measure. Regression analyses confirmed this pattern: although both measures were significant predictors when tested separately, only the logged measure remained significant when both were included in the same model. This suggests that the logged measure is more reliable and informative.

In the following discussion, we examine the data quality of both reading time measures and argue for broader use of log-based approaches in research. We then discuss how the type of measure affects the prediction of reading fluency scores. Finally, we discuss the implications and challenges of using log-based data in research on print exposure.

4.1. Data quality of the log-based measure

The log-based measure showed satisfactory data quality, with acceptable internal consistency and reporting stability over the two-week period. However, data quality appeared lower than that reported by Locher and collaborators [49] with university students. Several explanations can be considered.

First, logging relies heavily on parental compliance. Parents must consistently initiate and terminate the stopwatch for each reading episode, which may result in omissions or inaccuracies, particularly during busy routines or over extended monitoring periods. Second, independent reading episodes occurring outside parental supervision may go unrecorded. This limitation is likely to increase with children’s age and autonomy, potentially leading to a systematic underestimation of reading time in older children. Third, the use of a stopwatch, as opposed to manual time entry, may introduce additional noise—for example, when the timer is started unintentionally or not stopped immediately after a reading episode. Despite these limitations, the logged reading times remained very similar to those reported by Locher and collaborators [49] with university students. Importantly, these limitations indicate that app-based logged measures should not be interpreted as exhaustive records of all reading activity. Rather, they should be considered conservative, behaviorally anchored estimates of supervised reading time.

4.2. Data quality of self-reported measure

In contrast to the log-based measure, the self-reported measure showed clear signs of overestimation. This overestimation may be due to social desirability bias, which could have been stronger in this study because total reading time included self-reports for three types of reading activities (shared reading, independent reading, and reading for homework), potentially amplifying the effect. In this regard, research on reading time rarely includes controls for social desirability bias [58], even though several methods have been developed to address it [59]. In contrast, the logged measure seems to be less sensitive to social desirability bias, as it is more difficult or impractical to inflate reading time when using a stopwatch.

Another, non-exclusive explanation for the overestimation in self-reports is that parents may have difficulty clearly distinguishing between different types of reading activities. As a result, a single reading event may be reported under multiple categories (e.g., both shared reading and reading for homework), leading to double counting and an inflated total reading time. The rationale for separating reading activities was to capture the diversity of reading practices across the primary school years. For example, Grade 1 children are more likely to engage in shared reading and less in independent reading, while the opposite tends to be true in Grade 5 [51,52]. Using a single global question that explicitly includes all types of reading activities might have reduced overestimation. In the same vein, it is possible that parents overestimate reading time because children appear to be reading (e.g., holding a book) without sustained engagement in decoding or comprehension. This limitation should be more pronounced for self-reports than for log-based measures. Stopwatch-based logging requires an explicit initiation of reading episodes, thereby reducing reliance on inference or routine-based assumptions. However, the present design did not allow for direct verification of children’s cognitive engagement during reading episodes. Neither the questionnaire nor the app-based logging can distinguish between active reading and more superficial engagement.

4.3. What do these two measures really capture?

At first glance, the moderate correlation between the two measures suggests that logged and self-reported data capture only partially overlapping constructs, in line with findings from digital media use research, where similar discrepancies between logged and self-reported measures have been observed [48]. This is further supported by the low level of agreement between them, indicating that the measures are not interchangeable for consistently ranking participants. The Bland–Altman analysis revealed a systematic bias, with the self-reported measure overestimating reading time for nearly all participants by an average of four hours. This finding supports the idea that self-reports are sensitive to social desirability bias, as previously discussed. However, this explanation is not entirely sufficient, as the bias was also proportional to the total reading time reported by participants. One possible explanation is that frequent readers tend to overreport their reading time, as it may be more difficult to estimate the duration of habitual activities embedded in daily routines [26]. It is also possible that less skilled readers, due to the cognitive effort required and reduced enjoyment, perceive time as passing more slowly, which may lead to overestimation in self-reported reading time [60].

The proportional nature of this systematic bias remains an open question, as other research has identified different patterns. In the field of digital media use, a regression-to-the-mean bias—where heavy users tend to underreport their activity, while light users tend to overreport it—has been described in several studies [61–63]. As acknowledged by Scharkow [63], several factors may influence the direction and nature of systematic bias, including gender, cognitive abilities, social desirability, behavior frequency, the specificity of the construct measured, and methodological limitations of the instruments. Systematic bias can distort not only correlations with other variables but also estimates of means, variances, and proportions. Future studies using logged data to measure reading time should aim to disentangle the contribution of these factors to better understand the biases affecting self-reported data, which appear less suitable for accurately capturing actual reading time.

4.4. How are these two measures associated with reading fluency?

Both measures were significantly correlated with reading fluency, with the logged measure showing the stronger association. Regarding this difference, it is possible that mean imputation may have slightly attenuated correlations involving the self-reported measure. However, given the very small proportion of missing data and the use of grade-level imputation, this effect is likely negligible and unlikely to explain the observed superiority of the logged measure. Importantly, available evidence indicates that the present sample size was adequate to investigate associations between different measures of reading time and reading fluency. Based on prior literature, Torppa et al. (2020) reported correlations between leisure reading and reading fluency ranging from r = .28 to .41 (M ≈ .35) in children from Grade 1 to Grade 4 [16]. These findings converge with the meta-analysis by Mol and Bus (2011) [12], which showed that associations with reading fluency are generally stronger than those observed for more constrained basic decoding skills. Adopting a conservative expected correlation of r = .28 and a conventional target power of 80% [64], an a priori power analysis using the pwr.r.test function from the R package pwr indicated that approximately 97 participants would be required (two-sided α = .05). With the present sample size (N = 109), the corresponding post hoc power based on the observed correlation (r = .25) is approximately 75%, and the sample size required to detect this effect with 80% power would be approximately 123 participants.
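For transparency, the sample-size calculation can be reproduced outside R. The sketch below uses the Fisher z approximation, which is the same approach `pwr.r.test` is based on, up to rounding:

```python
import math
from scipy.stats import norm

def n_for_correlation(r, power=0.80, alpha=0.05):
    """Approximate N needed to detect correlation r (two-sided test),
    via the Fisher z transformation:
    N = ((z_alpha + z_beta) / atanh(r))^2 + 3."""
    z_r = math.atanh(r)
    z_a = norm.ppf(1 - alpha / 2)
    z_b = norm.ppf(power)
    return math.ceil(((z_a + z_b) / z_r) ** 2 + 3)

# For r = .28 at 80% power this gives 98, matching the ~97 reported
# in the text up to rounding differences between approximations.
```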

This predictive advantage was further confirmed in the regression analyses: when both measures were included in the model, only the logged measure remained a significant predictor. This suggests that the self-reported measure provided no unique information beyond the logged measure, likely because the variance it captures is largely shared with it. This finding is consistent with the literature, in which daily measures of reading time are often considered the gold standard for assessing print exposure: they limit retrospective bias and offer superior content, convergent, and criterion validity [35,39]. Their predictive power is comparable to that of print exposure checklists, which have themselves been shown to outperform self-reported reading time [12].
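The shared-variance interpretation can also be illustrated with a small simulation (all numbers are illustrative): when two predictors are noisy proxies of the same underlying quantity, adding the noisier one to a model that already contains the more precise one barely increases the explained variance.

```python
import random
from statistics import mean

def r2(y, x1, x2=None):
    """R^2 of an OLS fit of y on one or two predictors, via centered normal equations."""
    cy = [v - mean(y) for v in y]
    c1 = [v - mean(x1) for v in x1]
    syy = sum(v * v for v in cy)
    s11 = sum(v * v for v in c1)
    s1y = sum(a * b for a, b in zip(c1, cy))
    if x2 is None:
        return (s1y * s1y / s11) / syy
    c2 = [v - mean(x2) for v in x2]
    s22 = sum(v * v for v in c2)
    s12 = sum(a * b for a, b in zip(c1, c2))
    s2y = sum(a * b for a, b in zip(c2, cy))
    det = s11 * s22 - s12 * s12
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return (b1 * s1y + b2 * s2y) / syy  # explained SS over total SS

random.seed(7)
n = 1000
true_min = [max(0.0, random.gauss(30, 12)) for _ in range(n)]
fluency = [0.8 * t + random.gauss(0, 10) for t in true_min]
logged = [t + random.gauss(0, 2) for t in true_min]     # precise proxy of true time
reported = [t + random.gauss(0, 12) for t in true_min]  # noisy proxy of the same time

gain = r2(fluency, logged, reported) - r2(fluency, logged)
# The incremental R^2 from the noisy, largely redundant measure is near zero.
```

Both predictors "work" on their own here, but almost all of the self-report's predictive information is already carried by the logged measure, mirroring the regression pattern reported above.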

Moreover, the fact that mother’s occupation was a significant predictor only in the self-reported model, but not in the logged model, supports the idea that more precise behavioral tracking can reduce reliance on proxy variables. Although this issue has not, to our knowledge, been directly examined in the field of literacy, it has been addressed in other domains. For example, in the insurance industry, behavioral tracking—such as installing devices in cars to record real-time driving behaviors (e.g., duration, distance, speed)—allows for a more accurate assessment of risk than relying on proxies like income or occupation [65]. It is well established that proxy variables can introduce measurement bias and may fail to capture the underlying construct. Taken together, these findings help explain the added value of the logged measure over the traditional self-reported measure.

4.5. Implications and challenges for future research

Despite these promising results, several avenues for improving the reliability of app-based logged measures can be identified. First, data quality could be improved. Allowing retrospective entries might reduce missing data, but it could also encourage participants to abandon real-time logging, which would reintroduce recall bias and create heterogeneous data types [58,59]. A better solution would be a post-study compliance questionnaire, allowing researchers to assess how consistently families used the stopwatch function. An annotation feature could also be added, so that users can report recording errors such as forgetting to stop the timer.

Second, as discussed earlier, parental involvement may have influenced data completeness. In the present study, parents were asked to record their child’s reading activities, and the degree of parental supervision may have varied with the child’s age and level of autonomy, as well as with socioeconomic background. Such variability could have affected the proportion of missing data and the overall consistency of logging. Future studies should therefore include specific measures assessing children’s autonomy in both reading practices and recording behaviors. An open question is whether child self-recording would yield more complete or more reliable data; the answer likely depends on developmental factors and the child’s level of autonomy.

Third, engagement and scalability must be considered. In the present study, research assistants supported families, which improved compliance but limits scalability. Families more familiar with digital tools may also have been more likely to participate, raising a risk of selection bias. These issues could be mitigated by improving the interface and simplifying the user experience. Gamification features may further enhance engagement, although researchers should monitor possible reactivity effects induced by such features.

Improved engagement could make longer data collection periods more feasible. In the present study, we selected a 14-day period based on prior diary research [40,49]. Because app-based logging reduces participant burden, extended recording periods may be achievable without compromising data quality. This would make the method suitable for longitudinal designs and allow researchers to examine learning dynamics in greater detail.

App-based logging also opens broader perspectives for cross-cultural research. Applications can be developed in multiple languages, which may facilitate participation from families who do not speak the language of schooling [19]. Such an approach is particularly relevant because literacy practices in the home language can transfer to school-based literacy skills [66,67].

Finally, this approach is not limited to home literacy research. App-based logging could also be implemented in school contexts, for example to complement existing measures of Academic Learning Time [68]. Current methods rely either on teacher self-reports, which are efficient but prone to bias, or on direct classroom observations, which are resource intensive. Real-time logging by teachers may offer a useful compromise.

4.6. Conclusion

While our findings may appear to argue for abandoning questionnaires, we believe they instead open new avenues for improving these measures. Rather than opposing the two approaches, their combined use in future studies could help identify the various factors that contribute to systematic biases, as widely documented in the literature. This, in turn, would support the development of statistical techniques to adjust for these influences [48,69]. Strengthening the predictive validity of questionnaires through such corrections would preserve their value as practical and scalable tools. Ultimately, integrating real-time and self-reported measures holds strong potential for advancing research on reading practices—both in ecological and methodological terms.

Acknowledgments

We would like to thank Pierre Blache, Inspector of National Education, as well as the teachers who made it possible to implement the project in their schools and classrooms. We are also deeply grateful to the children and their parents for their participation in the study. Finally, we wish to acknowledge Florian Amerigo and Marylou Garabedian for their valuable assistance in data collection as part of their Master’s thesis work.

Declaration of generative AI and AI-assisted technologies in the writing process: During the preparation of this work, the authors used ChatGPT to proofread the manuscript. After using this tool/service, the authors reviewed and edited the content as needed and take full responsibility for the content of the publication.

References

1. Council of European Union. Council recommendation on key competences for lifelong learning. European Union; 2018.
2. Organización de Estados Iberoamericanos. Metas educativas 2021: La educación que queremos para la generación de los bicentenarios. Madrid: Organización de Estados Iberoamericanos; 2010.
3. Jiménez Pérez E del P. Pensamiento crítico VS competencia crítica: lectura crítica. Revista científica y de divulgación ISL. 2023;18(1):1–26.
4. Mullis I, von Davier M, Foy P, Fishbein B, Reynolds K, Wry E. PIRLS 2021 international results in reading. 2023. https://doi.org/10.6017/lse.tpisc.tr2103.kb5342
5. Stancel‐Piątak A, Mirazchiyski P, Desa D. Promotion of reading and early literacy skills in schools: a comparison of three European countries. Eur J Educ. 2013;48:498–510.
6. Araújo L, Costa P. Home book reading and reading achievement in EU countries: the Progress in International Reading Literacy Study 2011 (PIRLS). Educ Res Evaluat. 2015;21(5–6):422–38.
7. Sénéchal M, LeFevre J-A. Parental involvement in the development of children’s reading skill: a five-year longitudinal study. Child Dev. 2002;73(2):445–60. pmid:11949902
8. Sénéchal M, Whissell J, Bildfell A. Starting from home: home literacy practices that make a difference. In: Theories of reading development. John Benjamins Publishing Company; 2017. 383–408. https://doi.org/10.1075/swll.15.22sen
9. Sénéchal M. Testing the home literacy model: parent involvement in kindergarten is differentially related to grade 4 reading comprehension, fluency, spelling, and reading for pleasure. Scient Stud Read. 2006;10:59–87.
10. Cunningham AE, Stanovich KE. Early reading acquisition and its relation to reading experience and ability 10 years later. Dev Psychol. 1997;33(6):934–45. pmid:9383616
11. van Bergen E, Snowling MJ, de Zeeuw EL, van Beijsterveldt CEM, Dolan CV, Boomsma DI. Why do children read more? The influence of reading ability on voluntary reading practices. J Child Psychol Psychiatry. 2018;59(11):1205–14. pmid:29635740
12. Mol SE, Bus AG. To read or not to read: a meta-analysis of print exposure from infancy to early adulthood. Psychol Bull. 2011;137(2):267–96. pmid:21219054
13. Petscher Y. A meta-analysis of the relationship between student attitudes towards reading and achievement in reading. J Res Read. 2010;33:335–55.
14. Stanovich KE. Matthew effects in reading: some consequences of individual differences in the acquisition of literacy. Read Res Q. 1986;21(4):360–407.
15. Erbeli F, van Bergen E, Hart SA. Unraveling the relation between reading comprehension and print exposure. Child Dev. 2020;91(5):1548–62. pmid:31732976
16. Torppa M, Niemi P, Vasalampi K, Lerkkanen M-K, Tolvanen A, Poikkeus A-M. Leisure reading (but not any kind) and reading comprehension support each other: a longitudinal study across grades 1 and 9. Child Dev. 2020;91(3):876–900. pmid:30927457
17. van Bergen E, Vasalampi K, Torppa M. How are practice and performance related? Development of reading from age 5 to 15. Read Res Q. 2021;56:415–34.
18. Barone C, Fougère D, Pin C. Social origins, shared book reading, and language skills in early childhood: evidence from an information experiment. Eur Sociol Rev. 2021;37:18–31.
19. Salas N, Pascual M. Impact of school SES on literacy development. PLoS One. 2023;18(12):e0295606. pmid:38127961
20. Locher FM, Philipp M. Measuring reading behavior in large-scale assessments and surveys. Front Psychol. 2023;13:1044290. pmid:36817384
21. Kristensen PL, Olesen LG, Egebæk HK, Pedersen J, Rasmussen MG, Grøntved A. Criterion validity of a research-based application for tracking screen time on Android and iOS smartphones and tablets. Comp Human Behav Rep. 2022;5:100164.
22. McGeown SP, Duncan LG, Griffiths YM, Stothard SE. Exploring the relationship between adolescent’s reading skills, reading motivation and reading habits. Read Writ. 2015;28:545–69.
23. Pillinger C, Vardy EJ. The story so far: a systematic review of the dialogic reading literature. J Res Read. 2022;45:533–48.
24. Locher FM, Becker S, Pfost M. The relation between students’ intrinsic reading motivation and book reading in recreational and school contexts. AERA Open. 2019;5(2).
25. Locher F, Pfost M. The relation between time spent reading and reading comprehension throughout the life course. J Res Read. 2020;43:57–77.
26. Schwarz N, Oyserman D. Asking questions about behavior: cognition, communication, and questionnaire construction. Am J Eval. 2001;22:127–60.
27. Bus AG, van Ijzendoorn MH, Pellegrini AD. Joint book reading makes for success in learning to read: a meta-analysis on intergenerational transmission of literacy. Rev Educ Res. 1995;65:1.
28. Sénéchal M, LeFevre JA, Hudson E, Lawson EP. Knowledge of storybooks as a predictor of young children’s vocabulary. J Educ Psychol. 1996;88(3):520–36.
29. Kihlstrom JF, Eich E, Sandbrand D, Tobias BA. Emotion and memory: implications for self-report. https://doi.org/10.4324/9781410601261-12
30. Juster FT, Ono H, Stafford FP. An assessment of alternative measures of time use. Sociol Methodol. 2003;33:19–54.
31. Lira B, O’Brien JM, Peña PA, Galla BM, D’Mello S, Yeager DS, et al. Large studies reveal how reference bias limits policy applications of self-report measures. Sci Rep. 2022;12(1):19189. pmid:36357481
32. Neuberger L. Self-reports of information seeking: is social desirability in play? Atl J Commun. 2016;24:242–9.
33. van de Mortel TF. Faking it: social desirability response bias in self-report research. Austral J Adv Nurs. 2008;25:40–8.
34. Foasberg NM. Student reading practices in print and electronic media. Coll Res Libr. 2014;75(5):705–23.
35. Allen L, Cipielewski J, Stanovich KE. Multiple indicators of children’s reading habits and attitudes: construct validity and cognitive correlates. J Educ Psychol. 1992;84(4):489–503.
36. St Clair-Thompson H, Graham A, Marsham S. Exploring the reading practices of undergraduate students. Educ Inq. 2017;9(3):284–98.
37. Wigfield A, Guthrie JT. Relations of children’s motivation for reading to the amount and breadth of their reading. J Educ Psychol. 1997;89(3):420–32.
38. Sonnenschein S, Baker L, Serpell R, Scher D, Truitt VG, Munsterman K. Parental beliefs about ways to help children learn to read: the impact of an entertainment or a skills perspective. Early Child Develop Care. 1997;128(1):111–8.
39. Anderson RC, Wilson PT, Fielding LG. Growth in reading and how children spend their time outside of school. Read Res Q. 1988;23:285–303.
40. Conner T, Lehman B. Getting started: launching a study in daily life. In: Handbook of research methods for studying daily life. New York: The Guilford Press; 2012. 89–107.
41. Juster FT. Response errors in the measurement of time use. J Am Stat Assoc. 1986;81:390–402.
42. Braak PT, van Tienoven TP, Minnen J, Glorieux I. Data quality and recall bias in time-diary research: the effects of prolonged recall periods in self-administered online time-use surveys. Sociol Methodol. 2023;53:115–38.
43. Bolger N, Davis A, Rafaeli E. Diary methods: capturing life as it is lived. Annu Rev Psychol. 2003;54:579–616. pmid:12499517
44. Trull TJ, Ebner-Priemer U. The role of ambulatory assessment in psychological science. Curr Dir Psychol Sci. 2014;23(6):466–70. pmid:25530686
45. Fahrenberg J, Myrtek M, Pawlik K, Perrez M. Ambulatory assessment: monitoring behavior in daily life settings. Euro J Psychol Assess. 2007;23:206–13.
46. Bardram JE, Westermann M, Makulec JG, Ballegaard M. The Neuropathy Tracker: a mobile health application for ambulatory and self-administered assessment of neuropathy. PLOS Digit Health. 2025;4(2):e0000725. pmid:39937736
47. Sliwinski MJ, Mogle JA, Hyun J, Munoz E, Smyth JM, Lipton RB. Reliability and validity of ambulatory cognitive assessments. Assessment. 2018;25(1):14–30. pmid:27084835
48. Parry DA, Davidson BI, Sewall CJR, Fisher JT, Mieczkowski H, Quintana DS. A systematic review and meta-analysis of discrepancies between logged and self-reported digital media use. Nat Hum Behav. 2021;5(11):1535–47. pmid:34002052
49. Locher F, Schnabel VA, Unger V, Pfost M. Measuring students’ reading behavior with an ambulatory assessment: a field report on a smartphone-based reading diary study. Methods Data Anal. 2023;17.
50. Ducrot S, Persia-Leibnitz L, Vernet M, Brossette B, Prugnières C, Grainger J. Children in French overseas departments are at a 3-fold increased risk of developing reading problems. Int J Educ Dev. 2025;115:103277.
51. Worthy J, Broaddus K. Fluency beyond the primary grades: from group performance to silent, independent reading. Reading Teacher. 2002;55:334–43.
52. Silinskas G, Sénéchal M, Torppa M, Lerkkanen MK. Home literacy activities and children’s reading skills, independent reading, and interest in literacy activities from kindergarten to grade 2. Front Psychol. 2020;11.
53. FlutterFlow. 2024.
54. Lefavrais P. L’alouette-R. Centre de psychologie appliquée; 2005.
55. Wilcox RR, Keselman HJ. Modern robust data analysis methods: measures of central tendency. Psychol Methods. 2003;8(3):254–74.
56. Revelle W. psych: procedures for psychological, psychometric, and personality research. 2025.
57. Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet. 1986;1(8476):307–10. pmid:2868172
58. Brossette B, Vernet M, Prugnières C, Chalbos M-J, Ziegler JC, Ducrot S. Print exposure and reading development in the French educational context: a systematic review. Int J Educ Res. 2025;134:102838.
59. Larson RB. Controlling social desirability bias. Int J Market Res. 2018;61(5):534–47.
60. Deng T, Kanthawala S, Meng J, Peng W, Kononova A, Hao Q, et al. Measuring smartphone usage and task switching with log tracking and self-reports. Mobile Media Commun. 2018;7(1):3–23.
61. Vanden Abeele M, Beullens K, Roe K. Measuring mobile phone use: gender, age and real usage level in relation to the accuracy and validity of self-reported mobile phone use. Mobile Media Commun. 2013;1(2):213–36.
62. Collopy F. Biases in retrospective self-reports of time use. Manage Sci. 1996.
63. Scharkow M. The accuracy of self-reported internet use: a validation study using client log data. Commun Methods Measures. 2016;10(1):13–27.
64. Cohen J. A power primer. Psychol Bull. 1992;112(1):155–9. pmid:19565683
65. Steinberg E. Run for your life: the ethics of behavioral tracking in insurance. J Bus Ethics. 2021;179(3):665–82.
66. Kim W-J, Yim D. Exploring the influence of the home literacy environment on early literacy and vocabulary skills in Korean-English bilingual children. Front Psychol. 2024;15:1336292. pmid:38524291
67. Relyea JE, Zhang J, Liu Y, Lopez Wui MaG. Contribution of home language and literacy environment to English reading comprehension for emergent bilinguals: sequential mediation model analyses. Read Res Q. 2020;55:473–92.
68. Berliner DC. Academic learning time and reading achievement. In: Guthrie JT, ed. Comprehension and teaching: research reviews. International Reading Association; 1981.
69. van Smeden M, Lash TL, Groenwold RHH. Reflection on modern methods: five myths about measurement error in epidemiological research. Int J Epidemiol. 2020;49(1):338–47. pmid:31821469