Posted by drewendy on 26 Apr 2012 at 16:57 GMT

This is a very attractive study that, by itself, seems incredibly dangerous to use as a practical guide for what and why things are happening within local, national, and global synthetic biology research communities.

For example, some obvious papers seem absent, although they are listed in the citation database the authors use and include the relevant stated keywords. As a quick self-example, my research group has two papers in this article's top 10 but others seem missing (doi:10.1038/nature04342 and doi:10.1038/nbt1413 are there, but doi: 10.1038/msb4100025 is not?).

More seriously, I also worry that key relations and actors are absent or misrepresented. Again, as quick local examples: Where is SynBERC? Where is iGEM? Where is the BBF?

In summary, I wish that the article included a critical discussion of the limits of its methods. I do very much appreciate the authors' pioneering use of new tools to visualize what might be happening. I would like to see this work repeated with other databases (e.g., Google Scholar), and would love to see systematic critical analysis.

-Drew Endy

Competing interests declared: Obvious (i.e., professional life's work is subject of published study).

RE: here be dragons... careful reading is required

poldham replied to drewendy on 26 Apr 2012 at 19:47 GMT

We appreciate these comments.

The article and the accompanying dataset require careful reading, and I think a Tableau workbook may well take some getting used to. For example, doi: 10.1038/msb4100025 by Chan et al., "Refactoring bacteriophage T7", mentioned in the comment as missing, is actually in the workbook. It can be located by scrolling down the trends panel containing title and first author data to 2005, i.e. by downloading the workbook for Tableau Reader or in the Tableau Public mirror workbook here. This is of course where it should be.

As mentioned in the article, there is an important limitation with respect to back cited literature, that is, articles that are cited by the synthetic biology corpus but do not use the key terms in the title, abstract or author keywords sections. These articles appeared in the cited literature for the core data but purely as abbreviated references. As we explained, we lacked the means to readily retrieve the full records for thousands of back citations from Web of Science. This was frustrating. If a practical computational means could be found to retrieve the full data for these references, a wider picture of the historical emergence of synthetic biology and of the important articles shaping the field would emerge. We would like to see that happen.
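As a minimal sketch of the distinction described above, and not the method actually used for the article, the following Python separates a "core" set of records (key terms present in title, abstract or author keywords) from back cited references that fall outside that set. The record fields (`id`, `title`, `abstract`, `keywords`, `cited_refs`), the single search term and the sample records are all hypothetical.

```python
# Hypothetical search term; the article's actual key terms are not reproduced here.
KEY_TERMS = ("synthetic biology",)

def matches_key_terms(record, terms=KEY_TERMS):
    """True if any key term appears in the title, abstract or author keywords."""
    text = " ".join([
        record.get("title", ""),
        record.get("abstract", ""),
        " ".join(record.get("keywords", [])),
    ]).lower()
    return any(term in text for term in terms)

def split_corpus(records):
    """Return (core records, sorted ids of back cited references outside the core)."""
    core = [r for r in records if matches_key_terms(r)]
    core_ids = {r["id"] for r in core}
    back_cited = set()
    for r in core:
        for ref in r.get("cited_refs", []):
            if ref not in core_ids:
                back_cited.add(ref)
    return core, sorted(back_cited)

# Illustrative records only (not taken from the dataset).
records = [
    {"id": "A", "title": "A synthetic biology toolkit", "abstract": "",
     "keywords": [], "cited_refs": ["B", "C"]},
    {"id": "B", "title": "Engineering a genetic circuit",
     "abstract": "A contribution to synthetic biology.",
     "keywords": [], "cited_refs": ["C"]},
    {"id": "C", "title": "Refactoring a genome", "abstract": "",
     "keywords": ["genome design"], "cited_refs": []},
]

core, back_cited = split_corpus(records)
```

In this toy data, record C is cited by the core but never matches the key terms, so it surfaces only as a back cited reference. That is the kind of record for which full data would need to be retrieved separately.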

With regard to organisations and networks, institutional data requires significant cleaning and decision making on allocations. For example, SynBERC is a multi-institutional collaboration, but Berkeley is the lead partner and the people listed as working there have Berkeley email addresses. SynBERC was therefore aggregated onto Berkeley. There is, however, a broader methodological issue in how to appropriately allocate inter-institutional research centres in a way that fully reflects their characteristics. Using animated network visualizations and scalable aggregation layers would be one way forward.
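To illustrate the allocation decision described above, here is a minimal sketch that assumes a hand-built lookup table. The table, the function names and the sample affiliations are hypothetical; the actual cleaning rules used for the article were presumably more involved.

```python
from collections import Counter

# Hypothetical allocation table: research centre -> institution it is aggregated onto.
ALLOCATION = {"SynBERC": "Berkeley"}

def allocate(affiliation, table=ALLOCATION):
    """Map an affiliation to its allocated institution; unknown names pass through."""
    return table.get(affiliation, affiliation)

def count_by_institution(affiliations):
    """Count records per institution after applying the allocation table."""
    return Counter(allocate(a) for a in affiliations)

# Illustrative affiliations only.
sample = ["SynBERC", "Berkeley", "Other University", "SynBERC"]
counts = count_by_institution(sample)
```

The design trade-off is visible even in this toy version: once SynBERC is folded into Berkeley, the centre no longer appears as a separate node. Keeping the lookup table explicit would at least make such allocations inspectable and reversible.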

iGEM and BBF are important in synthetic biology. However, as far as I am aware, they are not organisations that appear as author affiliations in the literature we focused on mapping. They are still important. I think mapping the participants in iGEM and the users of the BBF would be an interesting and informative project (and efforts of which I am unaware may well be underway in that area).

The suggestion that the work be repeated with other databases such as Google Scholar or Scopus makes very good sense. Web-based network mapping would also be valuable. In general, we regard the article as a start in mapping synthetic biology in a way that is transparent and promotes engagement, and we appreciate the comments received.

Paul Oldham

Competing interests declared: Lead author of the article

RE: RE: here be dragons... careful reading is required

drewendy replied to poldham on 27 Apr 2012 at 00:18 GMT

Thanks for your response and explanations, which I really appreciate. It might always be difficult to understand what collections of humans are doing. I hope that the article and these comments will be read carefully.

As another question, given that you had (have) the goal of informing the CBD policy making process it seems strange to use a raw academic literature database as your primary source data. For example, iGEM is likely the largest-by-far distributor of engineered genetic material across national borders (both as sequence information and material) and seems likely to define the future leadership culture within practicing genetic engineers globally. As a second example, a search of the academic AND industry literature for acknowledgements of gene synthesis companies would help to establish a map for how genetic information & material are flowing within the commercial sector.
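A minimal sketch of the second suggestion, assuming documents are available with an extracted acknowledgements field. The company names, document ids and field names below are hypothetical placeholders, not real data.

```python
from collections import defaultdict

# Hypothetical company names used purely for illustration.
COMPANIES = ["ExampleGene", "SynthCo"]

def company_mentions(documents, companies=COMPANIES):
    """Map each company to the sorted ids of documents whose acknowledgements mention it."""
    mentions = defaultdict(list)
    for doc in documents:
        ack = doc.get("acknowledgements", "").lower()
        for company in companies:
            if company.lower() in ack:
                mentions[company].append(doc["id"])
    return {company: sorted(ids) for company, ids in mentions.items()}

# Illustrative documents mixing academic and industry sources.
documents = [
    {"id": "paper-1", "acknowledgements": "Genes were synthesised by ExampleGene."},
    {"id": "report-2", "acknowledgements": "We thank SynthCo and ExampleGene for constructs."},
    {"id": "paper-3", "acknowledgements": "We thank our colleagues."},
]

flows = company_mentions(documents)
```

Simple substring matching like this would miss name variants and subsidiaries, so a real map would need a curated list of company names and aliases.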

With thanks again and best wishes, -Drew

Competing interests declared: Unchanged.

RE: RE: RE: here be dragons... careful reading is required

poldham replied to drewendy on 30 Apr 2012 at 14:35 GMT

Thanks for these very useful comments, Drew.

We focused on the academic literature for a couple of different reasons. The first is that we think that delegations to the CBD will not possess much information about synthetic biology and what is happening in their countries. The scientific data provided a good way of providing a basic insight into the who, what and where of the field in a way that can be repeated. We were particularly pleased that the citing landscape revealed such an interesting picture of the ways that synthetic biology is being picked up around the world.

The second reason is that for some time (indeed a long time) I have been working towards completing the patent landscape. In the absence of a controlled vocabulary, the research has focused on following researchers from the literature into the patent system. This is methodologically difficult and time consuming but provides a sound route to identifying patent activity. Stephen and I are looking at ways to speed this up because the demands of annually updating the data are a significant obstacle to the wider use of the method.

I also agree with both of the points that you raise. For example, I can see that iGEM provides an actual example of practices in distributing material and information across national boundaries. Case studies on iGEM (and BBF) could provide very useful data on these practices if the CBD picks up this issue for further work. Just as important is the issue you raise of future leadership culture. Here one practical suggestion would be for the iGEM organizers to extend an invitation to members of the Secretariat of the CBD (i.e. via a letter to the Executive Secretary) to attend an iGEM conference. That approach could provide iGEM participants with a fuller understanding of the objectives of the CBD and at the same time provide members of the Secretariat with information on iGEM.

I also think that the point you raise about searching the academic and industry literature is quite right. I had seen the patent landscape as part of a way forward on this. However, I think that you are raising a wider issue involving mapping outside the formal literature. I think the challenge here would be working up stable methods (and tools) for identifying that data, then extracting, analysing and updating it over time. Stephen and I developed a tool some time ago for complex web searching that you can visit here http://www.researchdeskto... . It is inspired by iGoogle and works with standards-compliant browsers. Try entering a gene synthesis company name into the search (or synthetic biology). We have not done much with this recently and some of the gadgets may need an update. However, this kind of intermediate analytics tool could perhaps help with gathering data on these issues.

Once again many thanks for these very useful comments. Paul

Competing interests declared: Lead author