Reader Comments

Reliability of sequence data in GenBank, EMBL and DDBJ?

Posted by JanS on 15 Jan 2007 at 00:58 GMT

that the reference database features a satisfactory taxonomic sampling of sequences

that the sequences in the reference database are correctly identified and annotated

that the process of translating the comparison into species names is standardized, universally adopted, and not easily misunderstood

http://plosone.org/article/info:doi/10.1371/journal.pone.0000059#article1.body1.sec1.p1

Dear all,

I have enjoyed reading the article and found it a very good treatment of the problems associated with entries in GenBank, EMBL, DDBJ.

In a traditional, old-fashioned museum (with all its jars of strange-looking depigmented organisms, pinned butterflies, skeletons, etc.), real but dead organisms are reliably stored, categorized and annotated by a curator. If there is any doubt about the taxonomic affiliation, one can always examine the specimen. Unlike this traditional museum, GenBank, EMBL and DDBJ store only DNA sequences plus fragmentary annotation of some organism. Unfortunately, in the majority of cases this organism is not linked to a repository (= traditional museum) holding the dead organism or, in the case of cultures, to a culture repository (e.g. ATCC). GenBank, EMBL and DDBJ probably have no means of regulating this problem, nor the depth of annotation of the sequences discussed by Nilsson et al. in their article, because it all depends on the individual researcher who submits the data.

The point that I would like to highlight is an addition to Nilsson et al.’s “assumptions” in their introduction. One has to also assume that the sequence has been generated correctly and has no errors caused by technical issues (not to be confused by polymorphism).

But really, how reliable are the sequence data in GenBank, EMBL and DDBJ? Here I would like to direct your attention to this excellent article, which is worth reading:

Harris DJ (2003) Can you bank on GenBank? Trends in Ecology & Evolution 18(7): 317-319. http://dx.doi.org/10.1016/S0169-5347(03)00150-2

Anyway, I can only agree that more control over the annotation as well as the sequence is needed from the publishers, the repositories and, last but not least, the authors.

Cheers, JanS

RE: Reliability of sequence data in GenBank, EMBL and DDBJ?

RHNi replied to JanS on 07 Feb 2007 at 15:45 GMT

>The point that I would like to highlight is an addition to Nilsson
>et al.’s “assumptions” in their introduction. One has to also assume
>that the sequence has been generated correctly and has no errors caused
>by technical issues (not to be confused by polymorphism).

You bring up a good point here. I'd say the “technical quality” of the sequences is just taken for granted most of the time, whereas in reality we have all seen Sequencher or Staden trying to make sense of noisy input data and making equivocal basecalls. (On a side note, the sequence with the highest number of IUPAC ambiguities, if I remember correctly, sported a full 85% of them.) I agree that we should probably have been more explicit about the “technical quality” in the Introduction.

Thanks for sharing your thoughts,

Sincerely,

Henrik N

RE: RE: Reliability of sequence data in GenBank, EMBL and DDBJ?

MarkvP replied to RHNi on 30 Jun 2007 at 23:22 GMT

Dear all,

with respect to the 'power of participation' of the (scientific) community, I would like to suggest the paper by S.L. Salzberg in Genome Biology

S.L. Salzberg
Genome re-annotation: a wiki solution?
Genome Biol. 2007;8(1):102

I don't think it's open access: isn't that weird, an open access journal discussing an open access wiki approach, yet with restricted access to the article!?!!

Anyway, I am sure you can get the paper if you try.

Good luck

Mark

RE: RE: RE: Reliability of sequence data in GenBank, EMBL and DDBJ?

RHNi replied to MarkvP on 11 Jul 2007 at 20:05 GMT

Dear Mark,

On multiple occasions I have requested the possibility to leave comments on particular INSD entries, but I never received a satisfactory reply. Such a feature would surely alleviate the concerns with, e.g., misidentified entries: it would be highly useful to read other people's warnings and reservations about particular entries. I am thinking particularly of the ITS sequences that are submitted as belonging to reindeer but that really are ascomycetes - a word of warning would certainly be beneficial here.

I fully agree with you on the absurdity of publishing a plea for open access / contribution-style approaches in a non-open-access paper. Maybe the author was less into open access and more into high impact factors after all?

(Genome Biology is partly open access, right?)

Best,

Henrik N