Fig 1.
Design features that may affect tool performance.
This figure highlights eight design features that can affect tool performance and provides an explanation and guiding questions for each.
Table 1.
Overview of individual comparisons.
We summarize each tool comparison that we performed. A dash (“-”) indicates that the tools differed in one of the following ways; a cross mark (“X”) denotes that the difference had a substantial impact on final performance. The differences are: how the definitions of terms are operationalized in a tool (e.g., does the tool recognize power calculations or other means of checking group size); document input format and preprocessing requirements (e.g., extracting text or images from PDF documents can be challenging and introduce systematic errors); the section of the paper examined (relevant to sensitivity vs. specificity, since a tool may miss information if it does not run on the section of text where the mention appears); the selection of training and validation data, such as the field (e.g., clinical studies vs. psychology); algorithm choice (e.g., regular expressions vs. large language models); the openness and accessibility of the tool (e.g., does the tool have a version with a user interface, is the tool maintained, is the tool free with open code); and the desired performance trade-off (e.g., a tool may be tuned for sensitivity, capturing more cases at the cost of more false positives). See Fig 1 for more details on the main differences between tools.