Fig 1.
Log-histograms of sampled user submission and comment counts.
Table 1.
Totals for missing data in the Baumgartner dataset.
Fig 2.
Burstiness of missing submissions and comments per month, 2005 to June 2017.
Fig 3.
Varied measures of missing submissions per month.
Medium blue circles denote the percent of submissions missing for each month of data, bright blue squares denote the average percent of missing submissions to date, and dark blue stars denote the cumulative total percent of missing submissions to date.
Fig 4.
Varied measures of missing comments per month.
Medium blue circles denote the percent of comments missing for each month of data, bright blue squares denote the average percent of missing comments to date, and dark blue stars denote the cumulative total percent of missing comments to date.
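The three measures plotted in Figs 3 and 4 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code. It assumes that the percent missing for a month is the missing count divided by the total number of items (missing plus known) for that month, that the "average to date" is the running mean of the monthly percentages, and that the "cumulative total" pools counts across all months so far. The function name `missing_measures` and the sample counts are hypothetical and do not come from the dataset.

```python
from itertools import accumulate


def missing_measures(missing, total):
    """Return (monthly_pct, running_avg_pct, cumulative_pct) as lists.

    missing[i] is the number of missing items in month i.
    total[i] is the assumed total (missing + known) for month i, and must be > 0.
    """
    if len(missing) != len(total):
        raise ValueError("missing and total must have the same length")

    # Percent of items missing within each individual month.
    monthly = [100.0 * m / t for m, t in zip(missing, total)]

    # Running mean of the monthly percentages (each month weighted equally).
    running_avg = [s / (i + 1) for i, s in enumerate(accumulate(monthly))]

    # Pooled percent: all missing items so far over all items so far.
    cumulative = [
        100.0 * cm / ct
        for cm, ct in zip(accumulate(missing), accumulate(total))
    ]
    return monthly, running_avg, cumulative


# Hypothetical counts for three consecutive months.
monthly, running_avg, cumulative = missing_measures([1, 0, 3], [10, 10, 20])
print(monthly, running_avg, cumulative)
```

The running mean and the pooled percentage differ whenever monthly volumes differ: a small early month with many gaps raises the running mean more than it raises the pooled figure.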
Fig 5.
Gaps are not evenly distributed across communities.
The total historical count of comments per community is mildly correlated with the number of dangling references, while the total count of submissions is only weakly correlated with it.
Table 2.
Regression exploring the relationship between the amount of missing content per subreddit and two predictors: the total amount of known content per subreddit and the month in which the subreddit was created.
We expected these two variables to have meaningful explanatory power for where content is missing. This appears to be the case for missing comments but not for missing submissions, as evidenced by the relative R² values.
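To make the R² comparison concrete, the sketch below fits an ordinary least squares model with an intercept and two predictors and reports its coefficient of determination. It is an illustration under assumptions: the source does not state the exact model specification (for example, whether variables were log-transformed), so plain OLS is used here as a stand-in. The function name `ols_r_squared` and all sample values are hypothetical.

```python
def ols_r_squared(rows, y):
    """Fit y ~ intercept + predictors by least squares; return (coefs, r2).

    rows is a list of predictor tuples, e.g. (known_content, creation_month).
    """
    X = [[1.0, *r] for r in rows]  # prepend the intercept column
    k = len(X[0])

    # Normal equations (X^T X) b = X^T y, stored as an augmented matrix.
    A = [
        [sum(x[i] * x[j] for x in X) for j in range(k)]
        + [sum(x[i] * yi for x, yi in zip(X, y))]
        for i in range(k)
    ]

    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        if abs(A[p][c]) < 1e-12:
            raise ValueError("predictors are collinear")
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]

    coefs = [A[i][k] / A[i][i] for i in range(k)]

    # R^2 = 1 - (residual sum of squares) / (total sum of squares).
    fitted = [sum(b * v for b, v in zip(coefs, x)) for x in X]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return coefs, 1.0 - ss_res / ss_tot


# Hypothetical data: five subreddits, two predictors each.
rows = [(1, 0), (0, 1), (1, 1), (2, 1), (0, 0)]
coefs, r2 = ols_r_squared(rows, [5, 1, 4, 7, 3])
print(coefs, r2)
```

Running the same fit once with missing comments as the response and once with missing submissions would yield the two R² values that the caption compares.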