Reader Comments

Great review - one minor inaccuracy

Posted by victorhenning on 31 Oct 2008 at 11:55 GMT

Thank you - a great review. May I just point out a minor inaccuracy in your description: by writing that Mendeley can only extract metadata from PDFs where it “is available in an amenable format” and by citing Howison & Goodrum, you seem to imply that Mendeley reads the PDF files’ embedded metadata fields for its automatic document recognition.

This is not the case: Mendeley does not rely on the embedded metadata fields, since (as Howison & Goodrum point out) they are usually empty. Instead, Mendeley extracts the full text of the document and, using regular expressions and Hidden Markov Models, tries to “guess” the correct metadata based on the layout, formatting, and text.
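
To illustrate the general idea of guessing metadata from extracted text rather than from embedded PDF fields, here is a minimal sketch in Python. It is not Mendeley's code: the function name `guess_metadata`, the patterns, and the title heuristic are all assumptions made for this example, and it covers only the regular-expression side (no Hidden Markov Model, no layout or font information).

```python
import re

# Illustrative patterns only (assumptions for this sketch, not Mendeley's rules).
DOI_RE = re.compile(r"\b10\.\d{4,9}/[^\s\"<>]+")
YEAR_RE = re.compile(r"\b(?:19|20)\d{2}\b")


def guess_metadata(page_text):
    """Guess DOI, year, and title from the plain text of a first page."""
    lines = [ln.strip() for ln in page_text.splitlines() if ln.strip()]

    doi_match = DOI_RE.search(page_text)
    year_match = YEAR_RE.search(page_text)

    # Crude title heuristic: the first line with at least four words that
    # contains neither a DOI nor a year (which tend to mark running headers).
    title = next(
        (
            ln
            for ln in lines
            if len(ln.split()) >= 4
            and not DOI_RE.search(ln)
            and not YEAR_RE.search(ln)
        ),
        None,
    )

    return {
        # Strip punctuation that often trails a DOI at the end of a sentence.
        "doi": doi_match.group(0).rstrip(".,;") if doi_match else None,
        "year": int(year_match.group(0)) if year_match else None,
        "title": title,
    }


# Made-up first-page text, standing in for text extracted from a PDF.
sample = """Journal of Examples, Vol. 12 (2008)
doi:10.1234/example.2008.001
A Hypothetical Study of Metadata Extraction
Jane Doe, John Smith
"""

print(guess_metadata(sample))
```

A heuristic like this depends heavily on how a publisher lays out its first page, which is one way to see why recognition quality can differ so much between journals.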

It is true, though, that the recognition quality is much better for journal articles formatted in a certain way (e.g. Elsevier, Kluwer, or Wiley journals) than for others - so you could say that some journal formats are more amenable to our metadata extraction. Improving this is one of our main development priorities at the moment.

All the best,

Victor Henning