Fig 1.
General workflow of the analysis.
The workflow included selecting and refining texts (features X), an online survey with rating estimation (targets Y), and finding a regression model f that maps X to Y.
Fig 2.
Detailed workflow of the modeling.
Data processing steps included feature extraction, train/test splitting, and prediction model training. The prediction model consisted of feature extractor and rating predictor modules, which were trained separately using (non-trainable) hyperparameters. Training and testing were repeated independently for 10 folds (9:1 data ratio for training and testing).
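The 10-fold scheme in this caption can be illustrated with a minimal sketch. It assumes the split is made over the 364 items reported in Fig 4 and that items are shuffled before splitting; the function name `k_fold_indices` and the seed are hypothetical, not taken from the study.

```python
import random

def k_fold_indices(n_items, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation.

    Assumption: items are shuffled once, then dealt into k folds,
    giving roughly a (k-1):1 train/test ratio in every repetition.
    """
    idx = list(range(n_items))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k nearly equal folds
    for i in range(k):
        test = folds[i]
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        # In each repetition, both modules of the prediction model would be
        # fitted on `train` only and evaluated on `test`.
        yield train, test

splits = list(k_fold_indices(364))
```

Each item appears in exactly one test fold, so every text is predicted once by a model that never saw it during training.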
Fig 3.
Cumulative user bias histograms for individual text properties.
The biases reflect the tendencies of individual raters to over- or under-rate the texts compared to the population average.
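One simple way to compute such a bias is shown below as a sketch. It assumes the bias is the mean deviation of a rater's scores from the per-text average; the estimator actually used in the study may differ, and `user_biases` and the toy ratings are hypothetical.

```python
from collections import defaultdict
from statistics import fmean

def user_biases(ratings):
    """Return {user: mean deviation from the per-item average rating}.

    `ratings` is a list of (user, item, rating) tuples. Positive values
    mean the user tends to over-rate, negative values to under-rate.
    """
    by_item = defaultdict(list)
    for _, item, rating in ratings:
        by_item[item].append(rating)
    item_mean = {item: fmean(vals) for item, vals in by_item.items()}

    deviations = defaultdict(list)
    for user, item, rating in ratings:
        deviations[user].append(rating - item_mean[item])
    return {user: fmean(vals) for user, vals in deviations.items()}

# Toy data: u1 rates every text one point above the average, u2 one below.
ratings = [("u1", "t1", 5.0), ("u2", "t1", 3.0),
           ("u1", "t2", 4.0), ("u2", "t2", 2.0)]
biases = user_biases(ratings)
```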
Fig 4.
User and item estimates are correlated between text properties.
Between-item comparisons (N = 364) are shown in the lower triangle, while the upper triangle shows between-user comparisons (N = 416). All correlations were significant at p<10⁻⁶, FDR adjusted separately for both triangular parts.
Fig 5.
Variances of rating biases differed between text properties.
Variances are shown in parentheses, while matrix elements depict their ratios. Between-user ratios (N = 416) are shown in the lower triangle, while the upper triangle shows between-item ratios (N = 364). When computing ratios, the larger variance was always set as the numerator for easier visual inspection. Statistically significant ratios are marked with * (p<0.05) and ** (p<0.001), FDR adjusted separately for both triangular parts.
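The ratio convention in this caption can be sketched as follows. The example assumes sample variances (denominator n − 1) and does not reproduce the significance test, which the caption does not specify; `variance_ratio` and the toy values are hypothetical.

```python
from statistics import variance

def variance_ratio(a, b):
    """Ratio of two sample variances with the larger one as numerator,
    so the result is always >= 1 regardless of argument order."""
    va, vb = variance(a), variance(b)
    return max(va, vb) / min(va, vb)

# Toy bias estimates for two text properties.
biases_a = [-1.0, 0.0, 1.0]   # sample variance 1.0
biases_b = [-2.0, 0.0, 2.0]   # sample variance 4.0
ratio = variance_ratio(biases_a, biases_b)
```

Because the larger variance is always on top, the matrix is read the same way in both triangles: a value near 1 means similar spread, and larger values mean more dissimilar spread.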
Table 1.
Summary of the most and least trustworthy texts.
Fig 6.
There were notable correlations between behavioral parameters.
Spearman rank correlations between behavioral parameters (N = 407). Statistically significant correlations are marked with * (p<0.05) and ** (p<0.001), FDR adjusted.
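The captions do not name the FDR procedure. As an illustration only, the sketch below assumes the widely used Benjamini-Hochberg adjustment; `bh_adjust` and the example p-values are hypothetical.

```python
def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values (assumed FDR method).

    Each p-value is scaled by m / rank, then a running minimum taken
    from the largest p-value downwards enforces monotonicity.
    """
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices, ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

p_raw = [0.01, 0.04, 0.03, 0.005]
p_adj = bh_adjust(p_raw)
# Entries with p_adj < 0.05 would be marked *, and those with p_adj < 0.001 **.
```

Adjusting "separately for both triangular parts" or "for each column" would then mean calling such a function once per group of p-values.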
Fig 7.
Textual ratings were biased by behavioral parameters.
Spearman rank partial correlations between behavioral parameters and user bias estimates (N = 407). Computation was done independently for each text property while controlling for the influence of behavioral variables. Statistically significant correlations are marked with * (p<0.05) and ** (p<0.001), FDR adjusted for each column.
Fig 8.
Ratings were predicted well by linear models and their ensembles.
Best model performances, measured by the MSE ratio (i.e., the MSE of the model divided by that of a constant-only model) on the test set with 10-fold cross-validation. Smaller scores are better. sLEM stands for sequential Linear Ensemble Model.
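The MSE ratio defined in this caption can be written out as a short sketch. It assumes the constant-only model predicts the training-set mean for every test item; `mse_ratio` and the toy numbers are hypothetical.

```python
from statistics import fmean

def mse(y_true, y_pred):
    """Mean squared error between two equally long sequences."""
    return fmean([(t - p) ** 2 for t, p in zip(y_true, y_pred)])

def mse_ratio(y_train, y_test, y_pred):
    """MSE of the model divided by the MSE of a constant-only model.

    Assumption: the constant is the mean of the training targets.
    Values below 1 mean the model beats the constant baseline.
    """
    baseline = fmean(y_train)
    return mse(y_test, y_pred) / mse(y_test, [baseline] * len(y_test))

y_train = [1.0, 2.0, 3.0, 4.0]   # baseline prediction: 2.5
y_test = [2.0, 4.0]
y_pred = [2.5, 3.5]              # model predictions for the test items
ratio = mse_ratio(y_train, y_test, y_pred)
```

In this toy case the model MSE is 0.25 and the baseline MSE is 1.25, giving a ratio of 0.2.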
Table 2.
Top features depended on text property.