Peer Review History

Original Submission
September 24, 2021
Decision Letter - Frantisek Sudzina, Editor

PONE-D-21-30871
Test-retest reliability of the HEXACO-100
PLOS ONE

Dear Dr. Henry,

Thank you for submitting your manuscript to PLOS ONE. We invite you to look at the reviewers' suggestions and consider whether they could be used to improve the article. Please submit your revised manuscript by Dec 16 2021 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.
  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.
  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Frantisek Sudzina

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. In your Data Availability statement, you have not specified where the minimal data set underlying the results described in your manuscript can be found. PLOS defines a study's minimal data set as the underlying data used to reach the conclusions drawn in the manuscript and any additional data required to replicate the reported study findings in their entirety. All PLOS journals require that the minimal data set be made fully available. For more information about our data policy, please see http://journals.plos.org/plosone/s/data-availability.

Upon re-submitting your revised manuscript, please upload your study’s minimal underlying data set as either Supporting Information files or to a stable, public repository and include the relevant URLs, DOIs, or accession numbers within your revised cover letter. For a list of acceptable repositories, please see http://journals.plos.org/plosone/s/data-availability#loc-recommended-repositories. Any potentially identifying patient information must be fully anonymized.

Important: If there are ethical or legal restrictions to sharing your data publicly, please explain these restrictions in detail. Please see our guidelines for more information on what we consider unacceptable restrictions to publicly sharing data: http://journals.plos.org/plosone/s/data-availability#loc-unacceptable-data-access-restrictions. Note that it is not acceptable for the authors to be the sole named individuals responsible for ensuring data access.

We will update your Data Availability statement to reflect the information you provide in your cover letter.

3. We note that you have stated that you will provide repository information for your data at acceptance. Should your manuscript be accepted for publication, we will hold it until you provide the relevant accession numbers or DOIs necessary to access your data. If you wish to make changes to your Data Availability statement, please describe these changes in your cover letter and we will update your Data Availability statement to reflect the information you provide.

4. Please include your full ethics statement in the ‘Methods’ section of your manuscript file. In your statement, please include the full name of the IRB or ethics committee who approved or waived your study, as well as whether or not you obtained informed written or verbal consent. If consent was waived for your study, please include this information in your statement as well.

5. Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: Thank you for the opportunity to review this paper. I see it as being both interesting and very important to researchers using the HEXACO 60 and 100 personality inventories. I also really enjoyed the analysis of and discourse about the item-level data. I believe that this paper is likely to become the ‘go-to’ paper for people wishing to cite evidence of the reliability of these measures. Overall, I have only two relatively minor suggestions/thoughts.

1. If the authors have more information they can share about the sample (e.g., country of origin, education levels), I strongly encourage them to add this information to the Participants sections, if there is space. My thinking here is that these details may be important for future researchers who, for example, conduct a similar study but receive different results and wish to understand why.

2. After reading the third paragraph on page 13, which pondered the effects of items’ contextualisation levels on reliability, I wondered whether contextualisation levels would be predictably positively associated with rTT but negatively associated with α. For example, whether a person likes poetry _today_ is probably very strongly associated with whether they like it tomorrow, next week, next year, and so on (high rTT). But it’s not hard to imagine there would exist plenty of people who love poetry but are indifferent to, say, classical music or ancient ruins, and thus the contextualised items would contribute negatively to alpha. I concur with the authors’ speculation that more generic/less contextualised items (e.g., a hypothetical item, “I like artistic things”) may undermine rTT, for all the reasons the authors mentioned (e.g., what artistic things are they thinking about in that moment? Have they enjoyed/not enjoyed a recent artistic experience?). And I could see how such an item would positively influence alpha, as the generic item would represent, to some extent, any of the specific/contextualised items in the same scale. Anyway, these are just thoughts; I do _not_ insist the authors should include them in their revision.

Reviewer #2: Review PONE-D-21-30871: “Test-retest reliability of the HEXACO-100”

Many thanks for the opportunity to review this manuscript, which provides - once again - evidence that alpha reliability is a less optimal parameter than test-retest reliability. Based on my reading of the manuscript, I have a few suggestions and comments:

1. One of the open questions for me is what the most optimal time period is to establish test-retest reliability. The authors chose 12 days (please provide mean and SD of number of days or even hours between the two ratings; and please check whether the individual number of days has an effect on r(tt)!), but I’m not sure whether this is the optimal time period and what is actually the most optimal time period for personality questionnaires. That is, in the introduction and the discussion, I would like the authors to explain a bit more, based maybe on memory research (which, of course, also shows large individual differences) and based on the traitedness of a construct and the possible time frame for changes to occur, what kind of time frame would be most optimal to establish r(tt).

2. With respect to the above time frame, the findings can also be used to comment on McCrae’s (2015) approach to distinguish trait, method, specific, and error variance components. McCrae notes that specific variance is obtained by subtracting alpha from r(tt), but in most cases this would yield a negative specific variance in the current study. As McCrae notes: “By definition, [...] specific variance in an item is not shared by other items in the scale, so it detracts from alpha. However, in retest designs, the same items, with the same specific variance, are readministered, and they may elicit the same response. Item-specific variance could thus account for the fact that retest reliability is greater than alpha, especially if we also assume that method variance is stable over short intervals.” (McCrae, 2015, p. 2)

That is, McCrae’s formula implies that the time period between two measures of the same construct should depend on the specific variance (i.e., if there is more specific variance, the time period should be longer, because otherwise r(tt) is bound to be greater than alpha). I’d love the authors to comment on this. Note: I must admit there are notable problems with McCrae’s approach, something on which comment is long overdue.

3. I wondered about the criteria to establish whether an item is a ‘good’ item. One could argue that both r(ca) and r(tt) are important, and not just r(ca). But how to weigh these is - to me - an open question. Logically, r(tt) is a necessary, but not sufficient, condition for r(ca) (i.e., a highly temporally stable item may not be observable, and thus have a low r(ca)), whereas r(ca) may be a sufficient condition for r(tt) (if items are really observable and there is high r(ca), by necessity there is a high r(tt)). But the question is whether you only want to have observability properties (or other properties aligned with r(ca), e.g., ‘item domain’; see De Vries et al., 2016) in a personality questionnaire. I would love to see the authors make a statement about this in the discussion and maybe even suggest which (24? 48?) items would provide the most suitable short measure of the HEXACO-100 (with coverage of each facet) according to their criteria.

4. Last but not least, I would love the authors to make the title a bit more informative about the implications of the manuscript, especially with respect to the importance of test-retest reliability and the fact that alpha reliability should be less often used as a measure of reliability. As a final note, please refrain from using the term ‘internal consistency’ and/or explain that it is a misnomer, because alpha does not measure internal consistency (with thousands of items, any scale has a high alpha, but can have practically zero internal consistency). See Sijtsma (2009); just call it ‘alpha reliability’ or ‘internal reliability’.

McCrae, R. R. (2015). A more nuanced view of reliability: Specificity in the trait hierarchy. Personality and Social Psychology Review, 19(2), 97-112.

**********

6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: Yes: Reinout E. de Vries

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

Revision 1

Response to Reviewers | PONE-D-21-30871R1 | Test-Retest Reliability of the HEXACO-100

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

We have carefully reviewed PLOS ONE’s style requirements and believe that all files meet these requirements.

2. In your Data Availability statement, you have not specified where the minimal data set underlying the results described in your manuscript can be found. PLOS defines a study's minimal data set as the underlying data used to reach the conclusions drawn in the manuscript and any additional data required to replicate the reported study findings in their entirety. All PLOS journals require that the minimal data set be made fully available. For more information about our data policy, please see http://journals.plos.org/plosone/s/data-availability.

Upon re-submitting your revised manuscript, please upload your study’s minimal underlying data set as either Supporting Information files or to a stable, public repository and include the relevant URLs, DOIs, or accession numbers within your revised cover letter. For a list of acceptable repositories, please see http://journals.plos.org/plosone/s/data-availability#loc-recommended-repositories. Any potentially identifying patient information must be fully anonymized.

Important: If there are ethical or legal restrictions to sharing your data publicly, please explain these restrictions in detail. Please see our guidelines for more information on what we consider unacceptable restrictions to publicly sharing data: http://journals.plos.org/plosone/s/data-availability#loc-unacceptable-data-access-restrictions. Note that it is not acceptable for the authors to be the sole named individuals responsible for ensuring data access.

We will update your Data Availability statement to reflect the information you provide in your cover letter.

The minimal underlying dataset for test-retest reliability has now been submitted as Supporting Information (S2_File.xlsx) and can also be found on the project page on the Open Science Framework website, listed in the manuscript (https://osf.io/wz3du/). We also include the Lee & Ashton (2018) article, which provides facet alphas and cross-rater agreement estimates in its Table 4, as is now noted in the Data Availability statement.

We now also note in the Data Availability statement how to access the single-item cross-rater agreement data: “Though the authors of Lee & Ashton (2018) do not specifically indicate in their article how to access these data, we simply sent them an email requesting the raw self-/observer data and explaining how we intended to use them. We thus confirm that, to our knowledge, others would be able to access these data in the same manner as the authors of the present manuscript, by contacting the authors of Lee & Ashton (2018) with a similar request.”

3. We note that you have stated that you will provide repository information for your data at acceptance. Should your manuscript be accepted for publication, we will hold it until you provide the relevant accession numbers or DOIs necessary to access your data. If you wish to make changes to your Data Availability statement, please describe these changes in your cover letter and we will update your Data Availability statement to reflect the information you provide.

See previous response; our data are available at the following repository: https://osf.io/wz3du/.

4. Please include your full ethics statement in the ‘Methods’ section of your manuscript file. In your statement, please include the full name of the IRB or ethics committee who approved or waived your study, as well as whether or not you obtained informed written or verbal consent. If consent was waived for your study, please include this information in your statement as well.

We now include a full ethics statement in the “Methods” section that includes the full name of the ethics committee at the University of Edinburgh who approved our study. We also note that consent was in written form, given online at the time of completing the survey.

5. Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.

We have revised the reference section and can confirm that 1) all references are up to date and correct, and 2) none of our references have been retracted.

-----------

Reviewer #1: Thank you for the opportunity to review this paper. I see it as being both interesting and very important to researchers using the HEXACO 60 and 100 personality inventories. I also really enjoyed the analysis of and discourse about the item-level data. I believe that this paper is likely to become the ‘go-to’ paper for people wishing to cite evidence of the reliability of these measures. Overall, I have only two relatively minor suggestions/thoughts.

Thank you for this very positive evaluation of the manuscript.

1. If the authors have more information they can share about the sample (e.g., country of origin, education levels), I strongly encourage them to add this information to the Participants sections, if there is space. My thinking here is that these details may be important for future researchers who, for example, conduct a similar study but receive different results and wish to understand why.

Though we did not explicitly ask participants about these variables, we were able to access information via Prolific Academic on many of our participants’ first language, country of birth and residence, student status, and occupational status. All of these are now included in the “Participants” subsection of our “Materials and Methods” section (lines 177-189). We did not have full data for every demographic variable because, at the time of extracting it, some participants had removed their information. Where we have a substantial amount of missing data (e.g., for student and occupational status), we mention this specifically.

Reporting these demographic variables also allowed us to examine the effect of having English as a first language on rTT, as nearly three quarters of our sample were non-native speakers. There did indeed appear to be a slight difference between native and non-native speakers. We now discuss this and its implications in the Methods, Results, and Limitations sections (lines 177-183, 246-252, 486-493), noting that our reported rTTs may thus be lower-bound estimates.

2. After reading the third paragraph on page 13, which pondered the effects of items’ contextualisation levels on reliability, I wondered whether contextualisation levels would be predictably positively associated with rTT but negatively associated with α. For example, whether a person likes poetry _today_ is probably very strongly associated with whether they like it tomorrow, next week, next year, and so on (high rTT). But it’s not hard to imagine there would exist plenty of people who love poetry but are indifferent to, say, classical music or ancient ruins, and thus the contextualised items would contribute negatively to alpha. I concur with the authors’ speculation that more generic/less contextualised items (e.g., a hypothetical item, “I like artistic things”) may undermine rTT, for all the reasons the authors mentioned (e.g., what artistic things are they thinking about in that moment? Have they enjoyed/not enjoyed a recent artistic experience?). And I could see how such an item would positively influence alpha, as the generic item would represent, to some extent, any of the specific/contextualised items in the same scale. Anyway, these are just thoughts; I do _not_ insist the authors should include them in their revision.

We think that this is an interesting proposition and spent some time trying to fit it into the Discussion; we even went so far as to attempt to test this hypothesis (see below). Ultimately, we chose not to include it in the final document as we felt it distracted somewhat from the main purpose, which was simply to speculate on the wide variety of causes of variation in item properties. However, we agree that this would be worth exploring further in any work that specifically looks at contextuality.

Excised:

“Specifically, we examined the correlations of item rTT with both the αs and rTTs of the facets and domains the items measure. For facets, we found a strong relation (ρ = .58) between single-item rTT and facet rTT on the one hand (which is reasonable given that each item effectively contributes 25% to its facet’s rTT). On the other hand, single-item rTT and facet αs had a much smaller association, in the opposite direction from what the previous paragraph might have suggested (ρ = .11). When expanding these analyses to domains, these relations effectively went to zero: the Spearman correlation between item and domain rTT was ρ = .10, and between item rTT and domain α, ρ = -.01. Based on this, there is not strong evidence to suggest that, even if more specific, less contextually variable items are more reliable than more general items, this has to come at the “cost” of lower α.”
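
A minimal sketch of the mechanics of such item-to-facet correlations, using random stand-in values rather than the study's data (all variable names are illustrative):

    # Minimal sketch with random stand-in values (not the study's data): Spearman
    # correlations between each item's rTT and its parent facet's rTT and alpha.
    import numpy as np
    from scipy.stats import spearmanr

    rng = np.random.default_rng(1)
    item_rtt = rng.uniform(0.4, 0.8, 100)              # one rTT per item
    facet_rtt_of_item = rng.uniform(0.7, 0.9, 100)     # rTT of each item's facet
    facet_alpha_of_item = rng.uniform(0.6, 0.85, 100)  # alpha of each item's facet

    rho_rtt, _ = spearmanr(item_rtt, facet_rtt_of_item)
    rho_alpha, _ = spearmanr(item_rtt, facet_alpha_of_item)
    print(f"item rTT vs facet rTT:   rho = {rho_rtt:.2f}")
    print(f"item rTT vs facet alpha: rho = {rho_alpha:.2f}")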

-----------

Reviewer #2: Review PONE-D-21-30871: “Test-retest reliability of the HEXACO-100”

Many thanks for the opportunity to review this manuscript, which provides - once again - evidence that alpha reliability is a less optimal parameter than test-retest reliability. Based on my reading of the manuscript, I have a few suggestions and comments:

1. One of the open questions for me is what the most optimal time period is to establish test-retest reliability. The authors chose 12 days (please provide mean and SD of number of days or even hours between the two ratings; and please check whether the individual number of days has an effect on r(tt)!), but I’m not sure whether this is the optimal time period and what is actually the most optimal time period for personality questionnaires. That is, in the introduction and the discussion, I would like the authors to explain a bit more, based maybe on memory research (which, of course, also shows large individual differences) and based on the traitedness of a construct and the possible time frame for changes to occur, what kind of time frame would be most optimal to establish r(tt).

We are happy to elaborate on these issues. We have added a paragraph in the Methods section with a comprehensive description of times: both the days, hours, and minutes between the two survey administrations and the time it took participants to complete the survey at each time point. We then examined whether either of these had an impact on retest reliability by correlating them with participants’ T1–T2 overall profile consistency. In summary, we found no evidence that interval length or overall duration had an effect on reliability, but please see lines 228-245 for more details.
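
A minimal sketch of the interval check described above, assuming hypothetical file and column names rather than the authors' actual code or data:

    # Hypothetical sketch (not the authors' code): correlate each participant's
    # T1-T2 profile consistency with their individual retest interval.
    import numpy as np
    import pandas as pd
    from scipy.stats import pearsonr

    # t1/t2: one row per participant, one column per HEXACO item (hypothetical files)
    t1 = pd.read_csv("hexaco_t1.csv")
    t2 = pd.read_csv("hexaco_t2.csv")
    interval_days = pd.read_csv("intervals.csv")["days"]  # per-participant gap

    # Profile consistency: within-person correlation of the two item profiles
    consistency = np.array([pearsonr(t1.iloc[i], t2.iloc[i])[0] for i in range(len(t1))])

    # Does interval length predict consistency? (The authors report no effect.)
    r, p = pearsonr(interval_days, consistency)
    print(f"r = {r:.2f}, p = {p:.3f}")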

Regarding the second point, we were not able to find much information on ideal intervals aside from previous empirical work comparing different intervals, which we now include in the manuscript – both in the Introduction and the Discussion. We now more explicitly note that we chose our interval (actually closer to 13 than 12 days) largely to be consistent with and comparable to previous research. We also dedicate more space to previous research which has discussed the issue of appropriate interval length and go on to note that there is little theoretical rationale even in the typically-cited papers that refers to memory work in particular.

We thus agree that these are interesting and important questions that need to be explored with respect to retest reliability, especially in an era where single-item properties are receiving more and more attention (see lines 119-129 and 495-501 where we comment on this).

2. With respect to the above time frame, the findings can also be used to comment on McCrae’s (2015) approach to distinguish trait, method, specific, and error variance components. McCrae notes that specific variance is obtained by subtracting alpha from r(tt), but in most cases this would yield a negative specific variance in the current study. As McCrae notes: “By definition, [...] specific variance in an item is not shared by other items in the scale, so it detracts from alpha. However, in retest designs, the same items, with the same specific variance, are readministered, and they may elicit the same response. Item-specific variance could thus account for the fact that retest reliability is greater than alpha, especially if we also assume that method variance is stable over short intervals.” (McCrae, 2015, p. 2)

That is, McCrae’s formula implies that the time period between two measures of the same construct should depend on the specific variance (i.e., if there is more specific variance, the time period should be longer, because otherwise r(tt) is bound to be greater than alpha). I’d love the authors to comment on this. Note: I must admit there are notable problems with McCrae’s approach, something on which comment is long overdue.

Another excellent point. Several of the authors on this manuscript are actually also interested in re-visiting this technique and have plans to do so in upcoming projects, utilizing some of the ideas discussed in this paper. That is to say, we wholeheartedly agree that this needs to be commented on in more detail, although we recognize that a full discussion of McCrae’s method is probably beyond the scope of the present manuscript.

To the first point, we have now included a few paragraphs commenting on this in the Discussion (lines 373-382). Just as a note, we wonder whether the reviewer may have mis-read Table 1, because α exceeds rTT for only 3 facets; for the remaining facets rTT > α, which is what McCrae’s approach would predict. For example, we now include the following: “Most items have unique valid variance [21,32], and this unique variance is by definition not captured by α but is assessed by rTT (because α removes anything not common to all items); therefore, a trait scale that aggregates multiple items should have rTT > α. Our results support this model, with only three facet αs exceeding their rTTs. In other words, most facet measures contain both information that is common to items written to measure the trait (e.g., Sincerity) and unique valid content specific to each item, ostensibly indexing a further personality nuance [32]” (lines 378-384).
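
A small worked illustration of McCrae's subtraction, using invented facet values rather than the figures in Table 1:

    # Illustrative arithmetic only: McCrae (2015) estimates item-specific variance
    # as rTT minus alpha, so rTT > alpha implies positive specific variance.
    # The facet values below are invented, not taken from the manuscript.
    facets = {
        "Sincerity":   (0.75, 0.84),  # (alpha, rTT)
        "Fearfulness": (0.78, 0.86),
        "Diligence":   (0.80, 0.77),  # a rare alpha > rTT case: negative estimate
    }

    for name, (alpha, r_tt) in facets.items():
        specific = r_tt - alpha  # McCrae's specific-variance estimate
        print(f"{name:12s} alpha={alpha:.2f} rTT={r_tt:.2f} specific={specific:+.2f}")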

3. I wondered about the criteria to establish whether an item is a ‘good’ item. One could argue that both r(ca) and r(tt) are important, and not just r(ca). But how to weigh these is - to me - an open question. Logically, r(tt) is a necessary, but not sufficient, condition for r(ca) (i.e., a highly temporally stable item may not be observable, and thus have a low r(ca)), whereas r(ca) may be a sufficient condition for r(tt) (if items are really observable and there is high r(ca), by necessity there is a high r(tt)). But the question is whether you only want to have observability properties (or other properties aligned with r(ca), e.g., ‘item domain’; see De Vries et al., 2016) in a personality questionnaire. I would love to see the authors make a statement about this in the discussion and maybe even suggest which (24? 48?) items would provide the most suitable short measure of the HEXACO-100 (with coverage of each facet) according to their criteria.

We have also lately been pondering how best to operationalize the “goodness” of an item, and we have yet to come to any clear conclusion. We agree with the speculations here, but (as we now mention in the Discussion) are relatively hesitant to suggest a specific subset of items given how much there is yet to be understood about the interplay of item properties (content, empirical quality, and otherwise).

We offer some tentative starting points for prioritising one item for a given trait over another (e.g., selecting items with high variance, rTT, and rCA) but do not feel we have enough evidence to propose a specific subset of HEXACO items for future research. See the section “Implications for scale development” from line 454 for our full commentary on the topic.

4. Last but not least, I would love the authors to make the title a bit more informative about the implications of the manuscript, especially with respect to the importance of test-retest reliability and the fact that alpha reliability should be less often used as a measure of reliability. As a final note, please refrain from using the term ‘internal consistency’ and/or explain that it is a misnomer, because alpha does not measure internal consistency (with thousands of items, any scale has a high alpha, but can have practically zero internal consistency). See Sijtsma (2009); just call it ‘alpha reliability’ or ‘internal reliability’.

Thank you for this suggestion. We agree and have now changed the title to “Test-Retest Reliability of the HEXACO-100 – and the Value of Multiple Measurements for Assessing Reliability”.

Regarding the point about alpha, we went back and forth between the two recommendations and have ultimately chosen to continue referring to it as “internal consistency” for ease of differentiating it from rTT throughout the paper and to be consistent with the literature (we, e.g., use a direct quote from McCrae, who also refers to alpha as internal consistency). That said, we now include a clarification on the first page that addresses the Reviewer’s point and explains that using “internal consistency” to describe alpha is a misnomer: “We note that alpha does not measure internal consistency of items per se: with hundreds or thousands of items, any scale has high alpha but can have practically zero consistency among individual items. Instead, a scale’s alpha indexes the expected consistency among hypothetical item aggregates containing the same number of items as the scale. Nonetheless, in line with common usage and to clearly distinguish it from rTT, we will also refer to alpha as a measure of internal consistency” (lines 79-84).
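
The reviewer's point can be verified with the standardized-alpha formula, alpha = k·rbar / (1 + (k − 1)·rbar): even with a near-zero average inter-item correlation, alpha approaches 1 as the number of items grows. A minimal sketch with an arbitrary rbar of .05:

    # Standardized alpha as a function of scale length k and average
    # inter-item correlation rbar: alpha = k*rbar / (1 + (k - 1)*rbar).
    def standardized_alpha(k: int, rbar: float) -> float:
        return k * rbar / (1 + (k - 1) * rbar)

    # Even with near-zero item consistency (rbar = .05), alpha approaches 1:
    for k in (10, 100, 1000):
        print(k, round(standardized_alpha(k, 0.05), 3))  # 0.345, 0.84, 0.981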

Attachments

Submitted filename: Response to Reviewers.docx
Decision Letter - Frantisek Sudzina, Editor

Test-retest reliability of the HEXACO-100 - and the value of multiple measurements for assessing reliability

PONE-D-21-30871R1

Dear Dr. Henry,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Frantisek Sudzina

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Formally Accepted
Acceptance Letter - Frantisek Sudzina, Editor

PONE-D-21-30871R1

Test-Retest reliability of the HEXACO-100 – and the value of multiple measurements for assessing reliability

Dear Dr. Henry:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Frantisek Sudzina

Academic Editor

PLOS ONE

Open letter on the publication of peer review reports

PLOS recognizes the benefits of transparency in the peer review process. Therefore, we enable the publication of all of the content of peer review and author responses alongside final, published articles. Reviewers remain anonymous, unless they choose to reveal their names.

We encourage other journals to join us in this initiative. We hope that our action inspires the community, including researchers, research funders, and research institutions, to recognize the benefits of published peer review reports for all parts of the research system.

Learn more at ASAPbio.