Extraction of Pharmacokinetic Evidence of Drug-drug Interactions from the Literature

Drug-drug interactions (DDIs) are major causes of morbidity and mortality and a subject of intense scientific interest. Biomedical literature mining can aid DDI research by extracting evidence for large numbers of potential interactions from published literature and clinical databases. While evidence for DDIs ranges in scale from intracellular biochemistry to human populations, literature mining methods have not been used to extract specific types of experimental evidence, which are reported differently for distinct experimental goals. We focus on pharmacokinetic evidence for DDIs ... We used a manually curated corpus of PubMed abstracts and annotated sentences to evaluate the efficacy of literature mining in classifying PubMed abstracts containing pharmacokinetic evidence for DDIs, as well as in extracting sentences containing such evidence. We implemented a text mining pipeline using several linear classifiers and a variety of feature transformation methods. The most important textual features in the abstract and sentence classification tasks were analyzed. We also investigated the performance benefits of using features derived from PubMed metadata fields, from various publicly available named entity recognizers, and from pharmacokinetic dictionaries. Several classifiers performed very well in distinguishing relevant and irrelevant abstracts (reaching F1 ~= 0.93, MCC ~= 0.74, iAUC ~= 0.99) and sentences (F1 ~= 0.76, MCC ~= 0.65, iAUC ~= 0.83). We found that word-bigram textual features were important for achieving optimal classifier performance, that features derived from Medical Subject Headings (MeSH) terms significantly improved abstract classification, and that some drug-related named entity recognition tools and dictionaries led to slight but significant improvements, especially in classification of evidence sentences. ...
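As a rough illustration of the kind of pipeline described above (a minimal sketch, not the authors' actual implementation), word-bigram features plus a linear classifier can be assembled with scikit-learn; the toy texts and labels below are invented for demonstration:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, matthews_corrcoef

# Toy labeled texts (hypothetical; the real corpus is annotated PubMed abstracts).
# Label 1 = contains pharmacokinetic DDI evidence, 0 = irrelevant.
texts = [
    "ketoconazole increased the AUC of midazolam threefold",
    "coadministration raised the peak plasma concentration of the substrate",
    "the patient presented with a mild headache",
    "no pharmacokinetic data were collected in this survey",
] * 5
labels = [1, 1, 0, 0] * 5

# Word unigram + bigram features with a TFIDF transform, as discussed in the text.
vec = TfidfVectorizer(ngram_range=(1, 2))
X = vec.fit_transform(texts)

clf = LogisticRegression().fit(X, labels)
pred = clf.predict(X)
print("F1:", f1_score(labels, pred), "MCC:", matthews_corrcoef(labels, pred))
```

A real evaluation would of course score held-out folds rather than the training data, as the paper's cross-validated F1/MCC/iAUC figures do.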


Abstract Performance: Most relevant and irrelevant features
A linear classifier separates classes using a hyperplane defined by a set of feature coefficients. The impact of a given feature on classification is naturally quantified by the sign and magnitude of its hyperplane coefficient: increasing the value of a feature with a large positive (negative) coefficient produces a large increase in a document's propensity to be classified as relevant (irrelevant). Coefficients for different features are made comparable by an appropriate normalization: we multiply each feature's coefficient by the standard deviation of the feature's values in the training data, producing what is referred to as a 'standardized coefficient' in the linear regression literature. For the abstract bigram runs, the following figure shows the rank of the most relevant features and the reverse rank of the most irrelevant features. RELEVANT FEATURES includes any feature whose standardized coefficient was among the top 5 most positive standardized coefficients for any transform/classifier combination, while IRRELEVANT FEATURES includes any feature whose standardized coefficient was among the top 5 most negative standardized coefficients for any transform/classifier combination. Transforms are organized in vertical columns, while classifiers are distinguished by color and marker style. For relevant (irrelevant) features, markers are positioned according to their rank (reverse rank) among the most positive (negative) features for a given classifier and transform combination.

[Figure: feature ranks (log scale, 10^0 to 10^4), one column of panels per transform (IDF, IDF+Norm, Norm, TFIDF, TFIDF+Norm), with separate panels for RELEVANT and IRRELEVANT FEATURES.]

To further compare the importance that different classifier/transform combinations gave to different features, we performed principal component analysis on a matrix composed of the hyperplane coefficients of all such combinations.
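The standardized-coefficient computation described above can be sketched in a few lines; the feature matrix and coefficients here are hypothetical, chosen so that one feature has a much larger spread than the others:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))            # hypothetical training feature matrix
X[:, 2] *= 10.0                          # one feature with much larger spread
coefs = np.array([0.5, -0.5, 0.1, 0.0])  # hypothetical hyperplane coefficients

# Standardized coefficient: raw coefficient times the feature's training std.
std_coefs = coefs * X.std(axis=0)

# Rank features from most 'relevant' (positive) to most 'irrelevant' (negative).
ranking = np.argsort(-std_coefs)
print(std_coefs, ranking)
```

Note how feature 2, despite its small raw coefficient, outranks feature 0 once the difference in feature scale is accounted for, which is exactly why the raw coefficients are not comparable across features.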
The following figure shows each transform-classifier hyperplane in terms of its loading on the first two principal components (PCs). This projection separates classifiers that use feature covariance information (LDA, SVM, and Logistic Regression) from those that do not (Naive Bayes, dLDA, VTT). It also groups configurations according to feature transforms, with configurations that included IDF-like transforms clustering separately from those that used no transform or a simple L2-normalization. In general, SVM and Logistic Regression produce very similar feature loadings, likely because they optimize similar cost functions during training.
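A minimal sketch of this comparison, using synthetic coefficient vectors in place of the real hyperplanes (the two 'families' below simply stand in for covariance-aware vs. naive classifiers; all numbers are invented):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
n_features = 50

# Two synthetic families of hyperplane coefficient vectors, each with three
# noisy members, standing in for two groups of classifier configurations.
family_a = rng.normal(size=n_features)
family_b = rng.normal(size=n_features)
hyperplanes = np.vstack(
    [family_a + rng.normal(0, 0.1, n_features) for _ in range(3)]
    + [family_b + rng.normal(0, 0.1, n_features) for _ in range(3)]
)

# Project each hyperplane onto the first two principal components.
loadings = PCA(n_components=2).fit_transform(hyperplanes)
print(loadings[:, 0])  # PC1 separates the two families
```

Plotting `loadings[:, 0]` against `loadings[:, 1]` would reproduce the kind of two-cluster structure the figure describes.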

Abstract Performance: NER and Metadata Features
The following figures plot the relative changes in F1 and MCC performance when including vs. excluding metadata and NER-derived features on abstract bigram runs with feature transforms. Significant changes (p<0.05, two-tailed test) are indicated by asterisks. For metadata, changes in performance are measured while still including features from the other four metadata fields.
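The relative-change comparison can be illustrated with a paired sign-flip permutation test; the per-fold scores below are invented, and the paper's actual significance test may differ:

```python
import numpy as np

# Hypothetical per-fold F1 scores without and with the extra feature set.
f1_without = np.array([0.90, 0.91, 0.89, 0.92, 0.90, 0.91, 0.90, 0.93, 0.89, 0.91])
f1_with    = np.array([0.93, 0.94, 0.91, 0.95, 0.92, 0.94, 0.93, 0.95, 0.92, 0.94])

rel_change = (f1_with.mean() - f1_without.mean()) / f1_without.mean()

# Two-tailed paired permutation test: randomly flip the sign of each
# per-fold difference and compare permuted means to the observed mean.
diffs = f1_with - f1_without
rng = np.random.default_rng(0)
signs = rng.choice([-1, 1], size=(10000, diffs.size))
perm_means = (signs * diffs).mean(axis=1)
p_value = (np.abs(perm_means) >= abs(diffs.mean())).mean()
print(round(rel_change, 3), p_value)
```

Here all ten fold-wise differences are positive, so almost no sign-flipped permutation matches the observed mean and the change comes out significant.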

Sentence Performance: Feature transforms and dimensionality reduction
The following charts show the F1 and MCC performance on bigram runs when feature transforms and dimensionality reductions are applied. Significant improvements (p<0.05, one-tailed test) compared to the same classifier applied to non-transformed data are indicated by asterisks.
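The transforms named in these runs (IDF, Norm, TFIDF, and their combinations) can be sketched on a toy count matrix; the IDF formula below is one common definition and may not match the paper's exact weighting:

```python
import numpy as np

# Toy document-term count matrix (3 documents, 3 terms).
counts = np.array([[2.0, 0.0, 1.0],
                   [0.0, 3.0, 1.0],
                   [1.0, 1.0, 0.0]])

df = (counts > 0).sum(axis=0)            # document frequency of each term
idf = np.log(counts.shape[0] / df)       # 'IDF' weighting (assumed form)

tfidf = counts * idf                     # 'TFIDF'
norm = counts / np.linalg.norm(counts, axis=1, keepdims=True)       # 'Norm' (L2)
tfidf_norm = tfidf / np.linalg.norm(tfidf, axis=1, keepdims=True)   # 'TFIDF+Norm'

print(np.linalg.norm(norm, axis=1))      # each document vector has unit length
```

The '+Norm' variants rescale each document vector to unit L2 length, so that classifiers compare term-weight directions rather than raw document lengths.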

Sentence Performance: Most relevant and irrelevant features
We analyzed which features were most relevant and irrelevant for identifying evidence sentences using the same methodology as for abstracts (described in section 1.2). In the following figure, RELEVANT FEATURES includes any feature whose standardized coefficient was among the top 5 most positive standardized coefficients for any transform/classifier combination, while IRRELEVANT FEATURES includes any whose standardized coefficient was among the top 5 most negative standardized coefficients for any transform/classifier combination. Transforms are organized in vertical columns, while classifiers are distinguished by color and marker style. For relevant (irrelevant) features, markers are positioned according to their rank (reverse rank) among the most positive (negative) features for a given classifier and transform combination.

As in section 1.2, we performed principal component analysis of the separating hyperplanes produced by different transforms and classifiers trained on the sentence corpus. Hyperplanes are generally grouped by classifier in this projection, with those corresponding to 'naive' classifiers that do not use feature covariances (VTT, dLDA, Naive Bayes) clustering separately from those that do (LDA, SVM, Logistic Regression).

Sentence Performance: Impact of NER Features
The following figures plot the relative changes in F1 and MCC performance when including vs. excluding NER-derived features on non-transformed sentence bigram runs. Significant changes (p<0.05, two-tailed test) are indicated by asterisks.
[Figure: relative performance changes for each NER resource: BICEPP, DrugBank, i-CYPS, i-Drugs, i-PkParams, i-Transporters.]