Fig 1.

Illustration of the presented novel approach to the search for a patent’s prior art.

First, a dataset of patent applications is obtained from a patent database by starting from a few manually selected seed patents and recursively including the patent applications they cite. Then, the patent texts are transformed into feature vectors, and the similarity between two documents is computed from these feature vectors. Finally, patents that are considered very similar to a new target patent application are returned as possible prior art. An appropriate similarity measure for this process should assign high similarity scores to related patents (e.g. where one patent was cited in the search report of the other) and low scores to unrelated (randomly paired) patents. We compare different similarity measures by quantifying the overlap between the similarity score distributions of related document pairs and randomly paired patents using the AUC score.
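The evaluation idea in this caption can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the authors' implementation: the function names (`bow_vector`, `cosine_similarity`, `auc_score`), the simple regex tokenizer, the use of raw term counts without any weighting, and the toy texts are all hypothetical. The AUC is computed here as the probability that a related pair receives a higher score than a random pair, with ties counted as one half.

```python
import math
import re
from collections import Counter


def bow_vector(text):
    """Map a text to a sparse bag-of-words vector (token -> raw count).

    Assumption: lowercase alphanumeric tokens, no stop-word removal or weighting.
    """
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * v[token] for token, count in u.items() if token in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_u * norm_v)


def auc_score(related_scores, random_scores):
    """Fraction of (related, random) score pairs in which the related pair
    scores higher; ties count as one half."""
    wins = 0.0
    for r in related_scores:
        for n in random_scores:
            if r > n:
                wins += 1.0
            elif r == n:
                wins += 0.5
    return wins / (len(related_scores) * len(random_scores))


# Hypothetical toy texts standing in for patent documents.
target = bow_vector("bone screw with a threaded shaft for spinal fixation")
cited = bow_vector("spinal fixation device using a threaded bone screw")
unrelated = bow_vector("method for encoding video frames in a data stream")

related_scores = [cosine_similarity(target, cited)]
random_scores = [cosine_similarity(target, unrelated)]
print(auc_score(related_scores, random_scores))
```

An AUC close to 1 means the two score distributions barely overlap, so the similarity measure separates related from random pairs well; a value near 0.5 means the measure does no better than chance.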

Table 1.

Evaluation results on the cited/random dataset.

Fig 2.

Distributions of cosine similarity scores.

Similarity scores for the patent pairs are computed using BOW feature vectors generated either from the full texts (left) or from the claims sections only (right). The scale of the y-axis is irrelevant and was therefore omitted.

Table 2.

Confusion matrix for the dataset subsample.

Table 3.

Correlations between labels and similarity scores on the dataset subsample.

Fig 3.

Score correlation for the patent with ID US20150018885.

A false negative (ID US20110087291) caught by the cosine similarity is circled in gray.

Table 4.

Summary of evaluation results.
