A full-document analysis of the semantic relation between European Public Assessment Reports and EMA guidelines using a BERT language model

Erik Bergman; Anna Maria Gerdina Pasmooij; Peter G. M. Mol; Gabriel Westman

doi:10.1371/journal.pone.0294560

Abstract

In the European Union, the Committee for Medicinal Products for Human Use of the European Medicines Agency (EMA) develops guidelines to guide drug development, supporting development of efficacious and safe medicines. A European Public Assessment Report (EPAR) is published for every medicine application that has been granted or refused marketing authorisation within the EU. In this work, we study the use of text embeddings and similarity metrics to investigate the semantic similarity between EPARs and EMA guidelines. All 1024 EPARs for initial marketing authorisations from 2008 to 2022 were compared to the 669 current EMA scientific guidelines. Documents were converted to plain text and split into overlapping chunks, generating 265,757 EPAR and 27,649 guideline text chunks. Using a Sentence BERT language model, the chunks were transformed into embeddings and fed into an in-house piecewise matching algorithm to estimate the full-document semantic distance. In an analysis of the document distance scores and product characteristics using a linear regression model, EPARs of antivirals for systemic use (ATC code J05) and antihemorrhagic medicines (B02) present with a statistically significantly lower overall semantic distance to guidelines compared to other therapeutic areas, also when adjusting for product age and EPAR length. In conclusion, we believe our approach provides meaningful insight into the interplay between EMA scientific guidelines and the assessment made during regulatory review, and could potentially be used to answer more specific questions such as which therapeutic areas could benefit from additional regulatory guidance.

Citation: Bergman E, Pasmooij AMG, Mol PGM, Westman G (2023) A full-document analysis of the semantic relation between European Public Assessment Reports and EMA guidelines using a BERT language model. PLoS ONE 18(12): e0294560. https://doi.org/10.1371/journal.pone.0294560

Editor: Dzintars Gotham, Kings College Hospital, UNITED KINGDOM

Received: August 2, 2023; Accepted: November 2, 2023; Published: December 15, 2023

Copyright: © 2023 Bergman et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: All data is publicly available through the Swedish National Data Service (DOI https://doi.org/10.57804/wa37-j878).

Funding: The author(s) received no specific funding for this work.

Competing interests: The authors have declared that no competing interests exist.

Introduction

In Europe, the Committee for Medicinal Products for Human Use (CHMP) of the European Medicines Agency (EMA) develops guidelines to guide drug development [1]. These EMA guidelines outline expectations with respect to the studies needed to support applicants when preparing marketing authorisation application dossiers for new medicinal products. The ultimate goal is to support development of efficacious and safe medicines, balancing the need for robust high-quality evidence with the need to expedite availability of medicinal products for patients in the EU.

Availability and uptake of EMA guidelines vary with factors such as the therapeutic area and type of active substance in the medicinal product. As the body of available guidelines has expanded over time, older products may have been less informed by formal guidelines at time of approval than more recent products, although this may to some extent have been balanced by other forms of regulatory support during development.

A European Public Assessment Report (EPAR) is published for every medicine application that has been granted or refused marketing authorisation within the EU [2]. The EPAR details the characteristics of the medicinal product and the pharmaceutical, pre-clinical and clinical development programme, and includes a comprehensive regulatory assessment of the benefits and risks associated with its use. To our knowledge, the extent to which EMA scientific guidelines are reflected in the EPARs that conclude regulatory assessment at time of marketing authorisation in the EU has not previously been systematically studied.

As manual analysis of thousands of relatively large documents is very labour-intensive, natural language processing techniques can be used for such tasks. Previously used approaches for semantic full-document matching have often relied on generating one single mathematical representation of the document, including approaches such as TF-IDF and Doc2Vec. However, these approaches each come with their respective disadvantages and cannot capture the full granularity of the content, which may prevent detailed comparison of partial semantic overlap between documents [3]. Large language models such as BERT provide additional performance and robustness in information retrieval tasks [4]. Since the publication of the original BERT model, several domain-specific models such as PubMedBERT [5] and PharmBERT [6] have been developed for the medical and pharmaceutical domains. However, for information retrieval tasks, a further development of the principles of model training has led to a branch of models called sentence transformers, which provide greatly improved performance in tasks such as similarity search [7, 8].

Here, we describe an algorithm for piecewise matching used together with a sentence BERT language model, allowing full-document semantic comparison to investigate the textual semantic distance between EPARs and EMA scientific guidelines. Further, we investigate which factors are related to this semantic distance with the aim to guide and improve guideline development, implementation, and uptake, both for the European Medicines Regulatory Network and for developers of medicinal products for the European market.

Materials and methods

Data collection, curation, and transformation

All currently authorised human medicinal products authorised in a central procedure in the EU between 2008 and 2022 were included. In total, 1024 EPARs for initial marketing authorisations connected to these products were collected from the EMA website on November 18, 2022. Similarly, the latest revisions of a total of 669 available EMA scientific guidelines were collected via the EMA recommended search page [9]. Drafts and reflection papers were included, while concept papers and documents not directly providing guidance for medicines development, such as presentations, overviews of comments and lists of participants, were excluded.

All documents were converted to plain text and further split into chunks of 200 words with a 50-word overlap between chunks, except at page breaks, where chunk size could be lower than 200 words and chunk overlap could be up to 100 words. A total of 265,757 EPAR and 27,649 guideline text chunks were each transformed into a semantic text embedding, a mathematical representation in 768-dimensional space, using the Sentence BERT model all-mpnet-base-v2 [10].
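The basic chunking step can be illustrated with a minimal Python sketch. It assumes simple whitespace tokenisation and does not reproduce the special handling at page breaks described above; the function name `chunk_words` and its parameters are hypothetical and not taken from the study code.

```python
from typing import List


def chunk_words(text: str, size: int = 200, overlap: int = 50) -> List[str]:
    """Split text into chunks of `size` words; consecutive chunks share `overlap` words.

    Assumption: words are separated by whitespace. Page breaks are ignored here.
    """
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    words = text.split()
    step = size - overlap  # distance between the starts of consecutive chunks
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # this chunk already reaches the end of the text
    return chunks
```

With the default settings, a 500-word text yields three chunks starting at words 0, 150 and 300, each sharing 50 words with its neighbour.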

Full-document semantic matching

Piecewise semantic matching.

To allow maximum granularity in the analysis of semantic overlap between EPARs and guidelines, an all-vs-all method for piecewise comparison of document chunks was used, generating a set of semantic distances corresponding to the number of chunks in the smaller of the two documents being compared (Fig 1). Hence, a text chunk could contribute more than once if well-matched against multiple chunks in the comparison document. The global semantic distance between two documents was then defined as the mean of the N lowest chunk distances, where N is the number of chunks in the smaller document.
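The matching step described above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the in-house implementation: cosine distance is assumed as the chunk-level metric, the arithmetic mean is used for aggregation (the Fig 1 caption mentions a geometric mean), and the names `cosine_distance` and `piecewise_distance` are hypothetical. The study lists GPU libraries among its tools, whereas this sketch favours readability over speed.

```python
import heapq
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine_distance(u: Vector, v: Vector) -> float:
    """Cosine distance (1 - cosine similarity) between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def piecewise_distance(doc_a: List[Vector], doc_b: List[Vector]) -> float:
    """Mean of the N lowest all-vs-all chunk distances, N = chunks in the smaller document."""
    # All-vs-all comparison: every chunk of A against every chunk of B.
    distances = [cosine_distance(u, v) for u in doc_a for v in doc_b]
    n = min(len(doc_a), len(doc_b))
    # Keep the N best matches; one chunk may appear in several of them.
    lowest = heapq.nsmallest(n, distances)
    return sum(lowest) / n
```

Because the N lowest distances are selected from the full distance matrix rather than one per chunk, a single chunk that matches several chunks in the other document can contribute more than once, as noted in the text.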

Fig 1. Flow-chart describing the piecewise document matching algorithm.

Text chunks from each document are transformed into high-dimensional semantic embeddings using a BERT language model. The embedding vectors are then compared all vs. all between Document A and Document B, whereafter a global document distance measure is calculated based on the geometric mean of the N lowest distances, where N is the number of chunks in the smaller of the two documents.

https://doi.org/10.1371/journal.pone.0294560.g001

For performance comparison, a mean-pooling algorithm was used, using the mean embedding for each document to calculate the global cosine distance between documents.
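For contrast, the mean-pooling baseline can be sketched in a few lines of Python. The function names are hypothetical, and each document is assumed to be a non-empty list of equal-length embedding vectors.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def mean_embedding(chunks: List[Vector]) -> List[float]:
    """Element-wise mean of all chunk embeddings of one document."""
    dim = len(chunks[0])
    return [sum(vec[i] for vec in chunks) / len(chunks) for i in range(dim)]


def mean_pooling_distance(doc_a: List[Vector], doc_b: List[Vector]) -> float:
    """Cosine distance between the mean embeddings of two documents."""
    u = mean_embedding(doc_a)
    v = mean_embedding(doc_b)
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)
```

Each document is reduced to a single vector before comparison, so chunk-level detail is averaged away; this is the loss of granularity that the piecewise method is designed to avoid.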

EPAR vs. guideline semantic analysis.

A total guideline semantic distance was calculated for each EPAR, based on the best-matching EMA guidelines. As it was assumed that several, but far from all, guidelines would be applicable to a single regulatory assessment report covering all aspects of drug development for one medicinal product in one therapeutic area, the optimal number of guidelines contributing to the total EPAR semantic distance was investigated. After confirming by visual inspection of the expected temporal trend illustrated in Fig 2 that using either 1% or 10% of the best guideline matches generated an inferior signal-to-noise ratio, this parameter was set to 5% for both the piecewise matching and mean pooling algorithms.
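A minimal Python sketch of this aggregation is given below. The text does not state how the best-matching guideline distances are combined, so taking their arithmetic mean is an assumption, as is rounding the 5% cut-off to the nearest whole number of guidelines; the function name `total_guideline_distance` is hypothetical.

```python
import heapq
from typing import Sequence


def total_guideline_distance(distances: Sequence[float], fraction: float = 0.05) -> float:
    """Aggregate one EPAR's distances to all guidelines using only the best matches.

    Assumptions: the best `fraction` of guidelines (at least one) is kept and
    their distances are averaged with an arithmetic mean.
    """
    if not distances:
        raise ValueError("at least one guideline distance is required")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    k = max(1, round(fraction * len(distances)))
    best = heapq.nsmallest(k, distances)  # lowest distance = best match
    return sum(best) / k
```

With 669 guidelines and a fraction of 0.05, this sketch would keep the 33 best-matching guidelines for each EPAR.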

Python 3.8 was used for all text and data handling, with the libraries scikit-learn 1.1.3, cupy-cuda11x 11.4.0, statsmodels 0.13.5 and PyMuPDF 1.21.0. Factors relating to the guideline semantic distance scores were analysed using linear regression, assuming a statistical significance level of 0.05.

Results

EPAR vs. guideline semantic similarity

To verify that the scoring algorithms generate meaningful results, the total guideline distance scores of the EPARs were compared by date of approval, with a lower score indicating a higher level of semantic similarity (Fig 2). As expected, given that we compare the initial assessment reports with currently available guidelines, the average distance score steadily decreases with time. This trend is clearest when using the piecewise matching algorithm (top panel), while mean pooling (bottom panel) appears to perform less well, with a higher variance and a less clear temporal trend.

Fig 2.

Semantic distance between European Public Assessment Reports for initial marketing authorisation of medicinal products and EMA scientific guidelines, per date of initial marketing authorisation, using piecewise matching (top panel) and mean pooling (bottom panel) algorithms. Lower score indicates higher semantic similarity.

https://doi.org/10.1371/journal.pone.0294560.g002

The piecewise matching method was used to further explore which individual EPARs displayed the lowest semantic distance to EMA guidelines (Table 1). The top-20 list of EPARs with the lowest overall semantic distance to the guidelines includes a high proportion of products indicated for the treatment or prevention of COVID-19 (7 products) and haemophilia A (4 products).

Table 1. List of the 20 EPARs with the lowest semantic distance to EMA scientific guidelines.

Lower score indicates higher semantic similarity.

https://doi.org/10.1371/journal.pone.0294560.t001

Inversely, all scientific guidelines were scored to identify those most semantically close to the EPAR database, showing a set of therapeutic indication-independent guidelines with applicability to many medicinal products (Table 2).

Table 2. List of the top 20 EMA scientific guidelines contributing to the distance scores, as estimated by mean semantic distance to European Public Assessment Reports for initial marketing authorisation of medicinal products.

Lower score indicates higher semantic similarity.

https://doi.org/10.1371/journal.pone.0294560.t002

Factors predicting crude semantic distance to guidelines were investigated, including ATC code levels 1 and 2 (Figs 3 and 4) and CHMP Rapporteur country (Fig 5). The most common ATC level 1 codes in the dataset included antineoplastic agents and immunomodulating agents (L), anti-infectives for systemic use (J), and treatments targeting diseases in the nervous system (N) and related to the alimentary tract and metabolism (A).

Fig 3. Semantic distance between European Public Assessment Reports (EPARs) for initial marketing authorisation of medicinal products and EMA scientific guidelines per ATC level 1.

Number of EPARs in grey. Lower score indicates higher semantic similarity. Some products were yet to be assigned an ATC code at time of analysis.

https://doi.org/10.1371/journal.pone.0294560.g003

Fig 4. Semantic distance between European Public Assessment Reports (EPARs) for initial marketing authorisation of medicinal products and EMA scientific guidelines per ATC level 2.

Number of EPARs in grey. Lower score indicates higher semantic similarity.

https://doi.org/10.1371/journal.pone.0294560.g004

Fig 5. Semantic distance between European Public Assessment Reports (EPARs) for initial marketing authorisation of medicinal products and EMA scientific guidelines per CHMP Rapporteur country.

Number of EPARs in grey. Lower score indicates higher semantic similarity.

https://doi.org/10.1371/journal.pone.0294560.g005

At ATC level 2, the differences between product groups were clearer, with anti-obesity agents (A08) and antihemorrhagic agents (B02) being the groups with the lowest crude semantic distance to EMA guidelines (excluding group V01, which contained only one product).

A linear regression model (Table 3 and S1 Table) including product age and document length as covariates was used to further investigate predictors of EPAR-guideline distance in relation to ATC level 2 code, CHMP Rapporteur country and a selected set of metadata variables including whether products were subject to additional monitoring, had orphan designation, were biosimilars, or were subject to a conditional approval.
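One way to write out such a model is shown below as a sketch. The notation is ours, and the dummy coding of the categorical factors and the fitting of all factors in one joint model are assumptions rather than details confirmed by the text: $d_i$ is the guideline distance score of EPAR $i$, $\mathbb{1}[\cdot]$ is an indicator function, and $x_{im}$ are the binary metadata flags (additional monitoring, orphan designation, biosimilar, conditional approval).

```latex
d_i = \beta_0
    + \beta_{\mathrm{age}}\,\mathrm{age}_i
    + \beta_{\mathrm{len}}\,\mathrm{length}_i
    + \sum_{k} \gamma_k\,\mathbb{1}[\mathrm{ATC2}_i = k]
    + \sum_{c} \delta_c\,\mathbb{1}[\mathrm{Rapporteur}_i = c]
    + \sum_{m} \theta_m\,x_{im}
    + \varepsilon_i
```

Under this formulation, a negative coefficient $\gamma_k$ would indicate that EPARs in ATC group $k$ are semantically closer to the guidelines than the reference group, after adjusting for product age and document length.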

Table 3. Linear regression analyses of EPAR guideline distance by ATC level 2 code, CHMP Rapporteur country and selected metadata parameters.

https://doi.org/10.1371/journal.pone.0294560.t003

EPARs from medicinal products with ATC codes B02 and J05 were associated with a statistically significantly lower semantic distance, indicating a higher textual similarity with guidelines. Similarly, EPARs where the CHMP Rapporteur country was Croatia, Malta, Latvia, Estonia, and Iceland were also significantly more semantically close to guidelines, while reports for products that were flagged for additional monitoring had a higher semantic distance.

The groups with statistically significant differences in semantic distance to guidelines in the linear regression model were explored further, by aggregating distance scores for each group and adjusting for the global geometrical mean distance in the dataset (Table 4 and S2 Table). The results show that the differences observed in relation to ATC code can be attributed to guidelines specific to the therapeutic areas in question. Differences related to the additional monitoring flag appear related to a higher variety of guidelines, and the semantic distance differences are generally lower, but several safety-related guidelines rank high within this group. Further, examples of the best text chunk matches on an individual document level are presented in S1 File.

Table 4. List of the top ten EMA scientific guidelines contributing to group-level distance scores for ATC B02, J05 and products under additional monitoring.

Scores are adjusted for the respective global geometrical mean in the dataset (see Table 2). Higher score indicates a semantic specificity in relation to the respective group.

https://doi.org/10.1371/journal.pone.0294560.t004

Discussion

To our knowledge, this is the first study that applies a pre-trained language model for semantic analysis of European Public Assessment Reports in relation to EMA scientific guidelines. By using a full piece-by-piece semantic matching method it is possible to capture relevant details across a variety of regulatory document types (S2 File) and directly map sections in the text with high semantic similarity between different documents. This approach can also be used to perform semantic searches that can be applied in regulatory science, using larger sections of text and reconstructing semantic distance with higher complexity than in our previous work on medicinal product information [11].

The piecewise matching method appears to outperform mean pooling of text embeddings in all applications within this study. This is expected due to the preservation of semantic granularity in the process, with the all-vs-all comparison being similar to comparing two puzzles piece-by-piece rather than comparing a single average representation from each puzzle. However, this comes at the price of a significantly higher computational cost. Within the scale of this project, which involved building the full embedding vector database and matching thousands of documents, all calculations were nevertheless performed in approximately 35 minutes on a single workstation with a consumer-level graphics processing unit (GPU).

Looking into specific results from the linear regression model, it appears that EPARs for antiviral drugs (ATC code J05) and antihemorrhagic medicines (B02) have a lower overall guideline semantic distance, also when adjusting for product age and EPAR length. The low semantic distance to guidelines for EPARs in the field of antivirals appears to be related to active regulatory support for therapeutic areas such as HIV and viral hepatitis. Also, the high number of products in this ATC group provides an increased chance of statistical significance. Similarly, antihemorrhagic medicines are highly standardised, which could be a reason behind the low semantic distance between EPARs in this therapeutic area and current guidelines.

In contrast, EPARs for products that are subject to additional monitoring have an overall higher semantic distance to guidelines when adjusting for product age, which is likely related to the fact that such products more often contain completely new chemical entities or are biological drugs, where guideline support could perhaps be more limited at time of approval due to their novelty [12].

The results related to CHMP Rapporteur country are more difficult to interpret, as national agencies differ in their preferences for therapeutic areas and types of procedures (e.g., new chemical entities vs. generics applications), and their activity in the European regulatory network has varied over time. Given that a large number of variables were included in the regression model, close to the maximum of 1 variable per 10 data points given as a rule of thumb, interactions between variables could not be investigated with statistical rigour.

Importantly, the semantic distance scores presented in this study should not be interpreted as a direct measure of guideline adherence, quality of the dossier provided by the applicant at time of marketing authorisation or quality of the assessment made by the European Regulatory Network. Rather, it is a result of several factors such as the availability of guidelines and level of standardisation in the respective fields.

Nevertheless, we believe that our methodology provides meaningful insight into the interplay between EMA scientific guidelines and the assessment made during regulatory review and could potentially be used to answer more specific questions such as which therapeutic areas could benefit from additional guideline support.

In conclusion, semantic analysis at the whole-document level shows promise in the field of pharmaceutical regulatory science and could potentially assist in identifying therapeutic areas in need of further support by regulatory guidance.

Supporting information

S1 Table. Full linear regression model.

https://doi.org/10.1371/journal.pone.0294560.s001

(PDF)

S2 Table. List of the top ten EMA scientific guidelines contributing to group-level distance scores for the ten EPAR ATC level 2 groups with the most specific guideline matches.

Scores adjusted for the respective global geometrical mean in the dataset (see Table 2). Higher score indicates a semantic specificity in relation to the respective group.

https://doi.org/10.1371/journal.pone.0294560.s002

(PDF)

S1 File. Best product unique text chunk matches between EPARs for medicinal products with ATC B02 and ATC J05 and EMA guidelines.

https://doi.org/10.1371/journal.pone.0294560.s003

(PDF)

S2 File. Multi-document type clustering.

https://doi.org/10.1371/journal.pone.0294560.s004

(PDF)

References

1. EMA. Scientific guidelines. In: European Medicines Agency [Internet]. 17 Sep 2018 [cited 18 Jul 2023]. Available: https://www.ema.europa.eu/en/human-regulatory/research-development/scientific-guidelines
2. EMA. European public assessment reports: background and context. In: European Medicines Agency [Internet]. 27 Sep 2018 [cited 18 Jul 2023]. Available: https://www.ema.europa.eu/en/medicines/what-we-publish-when/european-public-assessment-reports-background-context
3. Kim D, Seo D, Cho S, Kang P. Multi-co-training for document classification using various document representations: TF–IDF, LDA, and Doc2Vec. Inf Sci. 2019;477: 15–29.
4. Devlin J, Chang M-W, Lee K, Toutanova K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). Minneapolis, Minnesota: Association for Computational Linguistics; 2019. pp. 4171–4186. https://doi.org/10.18653/v1/N19-1423
5. Gu Y, Tinn R, Cheng H, Lucas M, Usuyama N, Liu X, et al. Domain-Specific Language Model Pretraining for Biomedical Natural Language Processing. ACM Trans Comput Healthc. 2022;3: 1–23.
6. ValizadehAslani T, Shi Y, Ren P, Wang J, Zhang Y, Hu M, et al. PharmBERT: a domain-specific BERT model for drug labels. Brief Bioinform. 2023;24: bbad226. pmid:37317617
7. Reimers N, Gurevych I. Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks. 2019 [cited 4 Oct 2023].
8. Thakur N, Reimers N, Rücklé A, Srivastava A, Gurevych I. BEIR: A Heterogenous Benchmark for Zero-shot Evaluation of Information Retrieval Models. 2021 [cited 4 Oct 2023].
9. EMA. EMA—Search the website. In: European Medicines Agency [Internet]. [cited 18 Jul 2023]. Available: https://www.ema.europa.eu/en/search/search/field_ema_web_topics%253Aname_field/Scientific%20guidelines/field_ema_web_categories%253Aname_field/Human
10. sentence-transformers/all-mpnet-base-v2 · Hugging Face. 9 Jun 2023 [cited 18 Jul 2023]. Available: https://huggingface.co/sentence-transformers/all-mpnet-base-v2
11. Bergman E, Sherwood K, Forslund M, Arlett P, Westman G. A natural language processing approach towards harmonisation of European medicinal product information. Grabar N, editor. PLOS ONE. 2022;17: e0275386. pmid:36264941
12. EMA. List of medicines under additional monitoring. In: European Medicines Agency [Internet]. 17 Sep 2018 [cited 18 Jul 2023]. Available: https://www.ema.europa.eu/en/human-regulatory/post-authorisation/pharmacovigilance/medicines-under-additional-monitoring/list-medicines-under-additional-monitoring
