Reader Comments


A similar study published in PLOS ONE (not cited)

Posted by geysenba on 24 Jun 2011 at 17:47 GMT

A very similar study analyzing tweets during the H1N1 pandemic (though with a larger timeframe and a more thorough qualitative analysis and validation of keyword searches), entitled "Pandemics in the Age of Twitter", was published earlier in PLOS ONE and was not cited:

I always find it disconcerting when authors don't acknowledge, discuss, or cite earlier studies (and when neither the editor nor the reviewers spot the omission), particularly when those studies appeared in the same journal!

I should also point out that the Ginsberg study / Google Flu Trends (cited in the introduction) was not the first study to show a correlation between Google searches and seasonal influenza; we had already demonstrated this in 2006, and that work should have been cited as well (

No competing interests declared.

RE: A similar study published in PLOS ONE (not cited)

segre replied to geysenba on 13 Feb 2012 at 20:31 GMT

Establishing precedence of publication is often not as cut and dried as participants close to the process may believe or claim. In fact, your study appeared on November 29, 2010, more than two months after our paper was submitted to PLOS ONE on September 20, 2010. It simply isn't humanly possible for us (or the reviewers) to have advance knowledge of papers that have not yet appeared in print.

Put another way, although our own work was first presented in October 2009 at the 47th Annual Meeting of the Infectious Diseases Society of America, we did not expect others to cite it until after our paper appeared in journal form.

A second and perhaps more important issue, of course, is precedence of ideas. Mining Twitter for public health information during the H1N1 pandemic was a fairly obvious idea, and neither of us can rightfully take credit for it. Mining Twitter data for purposes other than public health was already common practice in April 2009, when we started scraping the Twitter stream for references to influenza.

Moreover, the two papers differ significantly once one looks beyond the fact that both are based on independently generated data sets obtained from the same root source (something neither of us can rightfully claim as our own innovation) over nearly identical periods of time.

First, the main result presented in our paper (and, we believe, the reason our paper was chosen for an award) is our method for real-time tracking of geolocated disease activity. Our method is a machine learning approach; specifically, it applies support vector machines to produce real-time estimates of disease activity in a given geographic region.

Second, our results are carefully validated against real, geographically situated disease data; indeed, we see "closing the loop" and grounding the results in real-world disease activity as one of our primary contributions and, in fact, as the main goal of surveillance work.

Third, to produce geographically situated estimates of disease activity, our method relies on geolocating each record. Our work demonstrates that the Twitter stream does contain more fine-grained, geographically nuanced information, and that one can provide significantly better surveillance information by looking beyond aggregate values. You explicitly mention the absence of geolocation as a methodological limitation of your own work.

Fourth, because we do not rely on manual coding, our method is fully automated and can operate unsupervised and in real time, as our validation demonstrates.

There are other differences as well, mostly because the papers bear only a superficial similarity and do not, in fact, share the same goals or purpose. We were not interested in "qualitative analysis" of tweets, but rather in geographically situated and numerically valid surveillance estimates.

In closing, we regret that you feel we have in some way shortchanged your work. If we were writing this paper in 2012 (rather than in 2010), we would of course cite your work, as well as any number of other Twitter-related public health papers, in the related work section (as I am sure you would cite our paper under similar circumstances). But in the final analysis, the primary reason to cite a paper is to acknowledge the influence of that work on your own. Here, as the chronology demonstrates, there was no such influence: our work was already complete and submitted for publication well before your paper appeared.

Alberto Maria Segre
Professor and Chair
Gerard P. Weeg Faculty Scholar in Informatics
Department of Computer Science
14D MacLean Hall
The University of Iowa
Iowa City, IA 52242-1419

Competing interests declared: I am one of the coauthors of this paper.