
Assessing document section heterogeneity across multiple electronic health record systems for computational phenotyping: A case study of heart-failure phenotyping algorithm

  • Sungrim Moon,

    Roles Data curation, Formal analysis, Methodology, Software, Writing – original draft, Writing – review & editing

    Affiliation Department of Artificial Intelligence & Informatics, Mayo Clinic, Rochester, MN, United States of America

  • Sijia Liu,

    Roles Software

    Affiliation Department of Artificial Intelligence & Informatics, Mayo Clinic, Rochester, MN, United States of America

  • Bhavani Singh Agnikula Kshatriya,

    Roles Software, Writing – review & editing

    Affiliation Department of Center for Digital Health, Mayo Clinic, Rochester, MN, United States of America

  • Sunyang Fu,

    Roles Software

    Affiliation Department of Artificial Intelligence & Informatics, Mayo Clinic, Rochester, MN, United States of America

  • Ethan D. Moser,

    Roles Data curation

    Affiliation Department of Quantitative Health Sciences, Division of Epidemiology, Mayo Clinic, Rochester, MN, United States of America

  • Suzette J. Bielinski,

    Roles Investigation, Writing – review & editing

    Affiliation Department of Quantitative Health Sciences, Division of Epidemiology, Mayo Clinic, Rochester, MN, United States of America

  • Jungwei Fan,

    Roles Writing – review & editing

    Affiliation Department of Artificial Intelligence & Informatics, Mayo Clinic, Rochester, MN, United States of America

  • Hongfang Liu

    Roles Investigation, Supervision, Writing – review & editing

    Affiliation Department of Artificial Intelligence & Informatics, Mayo Clinic, Rochester, MN, United States of America



The incorporation of information from clinical narratives is critical for computational phenotyping. The accurate interpretation of clinical terms highly depends on their associated context, especially the corresponding clinical section information. However, the heterogeneity across different Electronic Health Record (EHR) systems poses challenges in utilizing the section information.


Leveraging the eMERGE heart failure (HF) phenotyping algorithm, we assessed the heterogeneity quantitatively through the performance comparison of machine learning (ML) classifiers which map clinical sections containing HF-relevant terms across different EHR systems to standard sections in Health Level 7 (HL7) Clinical Document Architecture (CDA).


We experimented with both random forest models with sentence-embedding features and bidirectional encoder representations from transformers models. We trained the ML models using an automatically labeled corpus from an EHR system that adopted the HL7 CDA standard. We assessed the performance using a blind test set (n = 300) from the same EHR system and a gold standard (n = 900) manually annotated from three other EHR systems.


The F-measure of those ML models varied widely (0.00–0.91), indicating that ML models with a single set of tuning parameters were insufficient to capture sections across different EHR systems. The error analysis indicates that sections in the other EHR systems do not always comply with the corresponding standardized sections, leading to low performance.


We presented the potential use of ML techniques to map the sections containing HF-relevant terms in multiple EHR systems to standard sections. However, the findings suggested that the quality and heterogeneity of section structure across different EHRs affect applications due to the poor adoption of documentation standards.


The wide adoption of electronic health records (EHRs) creates a rich and integrated data source for phenotypic information. Computational phenotyping, which automatically extracts phenotypes from EHR data, can accelerate the adoption and utilization of phenotype-driven efforts to advance scientific discovery and improve healthcare delivery. Given that much clinical information is embedded in clinical narratives, natural language processing (NLP) techniques have been extensively utilized to extract such information for accurate computational phenotyping. However, the interpretation of a term or phrase mentioned in a document depends on its associated section context. For instance, an occurrence of the term “heart failure” in the “Past Medical History” section most likely means the patient had a history of heart failure (HF). In contrast, “Brother had heart failure” in the “Family History” section would imply the patient has a family member with HF. Deploying computational phenotyping algorithms built upon one EHR system which incorporates section information to different EHR systems requires the mapping of relevant clinical sections due to the lack of standardization in clinical document practice [1, 2].

The eMERGE HF phenotyping algorithm was developed using data from the General Electronic Centricity (GEC) EHR system at Mayo Clinic, which adopted the Health Level 7 (HL7) Clinical Document Architecture (CDA) 1.0 standard [3]. The algorithm requires HF-relevant terms to be from clinical sections capturing patient information about current or past medical problems. Implementing the HF phenotyping algorithm in other EHR systems requires the accurate identification of those corresponding clinical sections. However, distinct EHR systems often use heterogeneous documentation (e.g., lexical variants describing the same clinical section, or one major section that matches multiple granular subsections in other EHR systems), which makes it challenging to identify corresponding sections between separate EHRs for phenotype algorithms. In this study, we explored the use of embedding-based machine learning (ML) classifiers to detect corresponding sections among different EHRs. The classifiers were trained using a labeled corpus automatically extracted from the GEC system and evaluated using a blind test set sampled across four EHR systems, with the HF phenotyping algorithm as a case study. We experimented with two types of embedding-based ML classifiers, the random forest (RF) model and the bidirectional encoder representations from transformers (BERT) model [4]. The performance evaluation of those classifiers also allowed us to assess the heterogeneity associated with the section information among the EHR systems.


Standardization of clinical section

Clinical documentation is quite complex as it can fall into different types (e.g., consultation notes, progress reports) depending on the purpose. For a given document type, sections or subsections generally follow a logical sequence that has not changed much [5]. However, there are inconsistencies among different EHR vendors regarding document types and sections [6]. One effort to standardize clinical documentation is the HL7 Clinical Document Architecture (CDA), initiated in 1996, one of the widely adopted HL7 version 3 standards. It standardizes document metadata and organizes clinical contexts into various sections. The latest Fast Healthcare Interoperability Resources (FHIR) specification has adopted HL7 CDA as its documentation standard. However, those standards have not been consistently adopted by EHR vendors [7, 8].

Approaches for section detection

The detection of sections in a single EHR system and the alignment of corresponding sections and subsections across different EHR systems have been explored with diverse approaches, including rule-based, ML-based, and hybrid approaches [6]. For example, Melton et al. detected sections in operative notes using regular expressions with the controlled Logical Observation Identifiers Names and Codes (LOINC) terminology [9]. Haug et al. trained a Bayesian network with N-gram features extracted from sections in pathology and radiology notes, and used the HL7 CDA standard to ensure the CDA compliance of their results [10].

Recently, state-of-the-art NLP approaches such as embedding-based or deep learning techniques have been applied. Beyond overcoming the heterogeneity of diverse data types in EHRs using a hierarchical embedding-based model, these approaches offer the potential to detect sections [6, 11–15]. For example, Sadoughi et al. used unidirectional long short-term memory (LSTM) units, and Salloum et al. proposed using bidirectional LSTM to detect sections while converting medical dictations into clinical reports [16, 17]. In a recent study, Rosenthal et al. applied a recurrent neural network (RNN) using gated recurrent units or a fine-tuned BERT model, trained with medical literature. They evaluated the models using sentences from the Cleveland Clinic dataset and the i2b2 dataset to demonstrate the feasibility of detecting and classifying eleven common sections and sentences [18]. Their study showed that an RNN or BERT could leverage the medical literature to predict clinical sections.

While section identification may enhance the performance of clinical NLP tasks, there is a lack of community efforts in standardizing clinical documentation structure [6]. As most clinical NLP studies are based on a single site, very few studies have investigated section detection across multiple EHR systems for computational phenotyping [6, 19, 20].

Materials and methods

The overview of our methods for developing section identification classifiers and assessing the heterogeneity associated with the section information among multiple EHRs is shown in Fig 1.

Fig 1. The overview of assessing heterogeneity of clinical sections across three electronic health records using embedding-based machine learning approaches.

EHR = Electronic health records; HL7-CDA = Health Level 7—Clinical Document Architecture; ML = Machine Learning; RF = Random Forest; BERT = Bidirectional Encoder Representations from Transformers; GEC = General Electronic Centricity EHR.


We used a training corpus consisting of clinical documents of 5,000 patients randomly selected from a primary care cohort. All clinical documents (n = 1.6 million) from 2009 to 2013 were retrieved from the Mayo Clinic GE Centricity (GEC) EHR. From these clinical documents, we randomly extracted 100,000 clinical sections containing HF-relevant terms (i.e., “heart failure,” “cardiac failure,” “multi-organ failure,” “ventricular failure,” “CVF,” and “LVF”) [3]. As the Mayo Clinic GEC EHR adopted the HL7 CDA 1.0 standard, we had a weakly labeled silver standard for section identification [21].
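The selection of sections containing HF-relevant terms can be illustrated with a minimal sketch. The term list comes from the text above; the matching rules (case-insensitive phrases, case-sensitive whole-word abbreviations), the `(section_label, text)` data layout, and the function names are assumptions made for illustration, not the procedure used in the study.

```python
import random
import re

# HF-relevant terms as listed in the text; the matching rules are assumptions.
HF_TERMS = ["heart failure", "cardiac failure", "multi-organ failure",
            "ventricular failure", "CVF", "LVF"]

# Multi-word phrases are matched case-insensitively.
_PHRASES = re.compile(
    "|".join(re.escape(t) for t in HF_TERMS if not t.isupper()),
    re.IGNORECASE,
)
# Abbreviations are matched case-sensitively and only as whole words.
_ABBREVS = re.compile(
    r"\b(?:" + "|".join(re.escape(t) for t in HF_TERMS if t.isupper()) + r")\b"
)


def contains_hf_term(text):
    """Return True if the text mentions any HF-relevant term."""
    return bool(_PHRASES.search(text) or _ABBREVS.search(text))


def sample_hf_sections(sections, n, seed=0):
    """Randomly draw up to n (section_label, text) pairs that mention an HF term.

    The section label attached to each pair serves as the weak (silver) label.
    """
    hits = [pair for pair in sections if contains_hf_term(pair[1])]
    rng = random.Random(seed)
    return rng.sample(hits, min(n, len(hits)))
```

Because the label travels with each sampled section, no manual annotation is needed for the training corpus, which is what makes it a silver rather than a gold standard.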

We used a cardiovascular epidemiology cohort retrieved from the Rochester Epidemiology Project (REP) as the test set. REP is a record-linkage system capturing longitudinal patient records of a cohort of 750,000 patients from various health care institutions, drawn from the population who resided in 27 counties in southern Minnesota and western Wisconsin between 1998 and 2019 [22–24].

The study has appropriate approvals by the Mayo Clinic and Olmsted Medical Center Institutional Review Boards (IRBs Mayo 13–009317, Mayo 17–008818, and OMC 053-OMC-17).

Gold standard creation

To create the gold standard annotation, we retrieved 900 text segments from three EHR systems (300 each from IC Chart, Cerner, and Epic EHR systems) at two medical centers, Mayo Clinic and Olmsted Medical Center. The goal was to assess the performance of section identification classifiers and the heterogeneity across different EHR vendors. Those 900 text segments were randomly drawn from all text segments containing HF-relevant terms [3]. S1 Table presents examples of sentences with potential sections containing the HF-relevant terms from individual EHR systems, where the potential sections were retrieved through heuristic rules. As the eMERGE HF algorithm was developed based on the GEC EHR incorporating HF-relevant terms from six specific sections (i.e., “Assessment and Plan,” “History of Present Illness,” “Past Medical History,” “Chief Complaint and Reason for Visit,” “Problem,” and “Review of systems” sections), it is crucial to identify those relevant sections in those other EHR systems in order to implement HF phenotyping algorithm [3].

Two trained abstractors (Ethan D. Moser and Donna Ihrke) annotated the 900 text segments by assigning a section or subsection to each occurrence of the HF-relevant terms and mapping them to standard sections of the HL7 CDA 1.0 (i.e., sections used in the GEC EHR). The section mapping can be full or partial. If a given text segment can be mapped to an HF phenotyping-relevant section in the GEC EHR, the mapping is considered full. Otherwise, it is considered a partial mapping, with all possible sections listed in order of their relevance to the context. For example, one text segment from the “subjective” sections in another EHR system is mapped partially to the “History of Present Illness” and “Chief Complaint and Reason for Visit” sections in the GEC EHR. A third, independent nurse abstractor (Ellen E. Koepsell) adjudicated the discrepant cases between the other two abstractors. If there were multiple mappings, full or partial, we chose the most relevant section for the given text segment. The inter-annotator reliability was assessed using percent agreement and Cohen’s kappa.
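As a brief illustration of the two reliability measures named above, the following sketch computes percent agreement and Cohen’s kappa for two annotators’ label lists. The function names and the toy labels are hypothetical; returning 1.0 when chance agreement equals 1 is a convention chosen here, not something stated in the study.

```python
from collections import Counter


def percent_agreement(labels_a, labels_b):
    """Fraction of items on which the two annotators chose the same label."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(labels_a)
    p_observed = percent_agreement(labels_a, labels_b)
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement from each annotator's marginal label frequencies.
    p_expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if p_expected == 1.0:
        # Both annotators used a single identical label (assumed convention).
        return 1.0
    return (p_observed - p_expected) / (1.0 - p_expected)


# Toy example with hypothetical section assignments for four text segments.
annotator_1 = ["Assessment and Plan", "Assessment and Plan", "Problem", "Problem"]
annotator_2 = ["Assessment and Plan", "Problem", "Problem", "Problem"]
agreement = percent_agreement(annotator_1, annotator_2)
kappa = cohens_kappa(annotator_1, annotator_2)
```

Kappa can be much lower than percent agreement when a few labels dominate, since most of the raw agreement is then attributable to chance.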

Algorithm for the section classification

A total of seven classifiers were developed (as shown in Fig 2): six binary section classifiers and one HF phenotyping-relevant section classifier that combines their outputs. Each text segment, Seg, is split into sentences. For example, in the following text segment from the problem section, “DIAGNOSIS: 1. Congestive Heart Failure. 2. Chronic Systolic (Congestive) Heart. 3. Anemia.”, Seg consists of three sentences, {“1. Congestive Heart Failure,” “2. Chronic Systolic (Congestive) Heart,” “3. Anemia”}. Each of the six binary classifiers outputs true or false for whether the text segment belongs to its section C (“Assessment and Plan,” “History of Present Illness,” “Past Medical History,” “Chief Complaint and Reason for Visit,” “Problem,” or “Review of systems”). In the above example, the “Problem” classifier predicts whether each of the three sentences belongs to the “Problem” section. If the classifier labels any of the three sentences as belonging to the “Problem” section, its output is true and the text segment is labeled as the “Problem” section, which is a true positive for the “Problem” classifier. Meanwhile, if the “History of Present Illness” classifier labels any of the three sentences as true, then the text segment is labeled as the “History of Present Illness” section, which is a false positive for the “History of Present Illness” classifier because the original sentences in the silver standard belong to the problem section. The HF phenotyping-relevant section classifier combines the outputs of the six ML classifiers to make a collective decision. If any of the six classifiers outputs true, i.e., Seg belongs to one of the HF phenotyping-relevant sections, the classifier output is true. In the above example, the output of the HF phenotyping-relevant section classifier is true because the “Problem” classifier output is true.
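The decision logic described above (a segment is positive for a section if any of its sentences is, and HF phenotyping-relevant if any of the six section classifiers is positive) can be sketched as follows. The function names are hypothetical, and the toy classifiers are stand-ins for the trained RF or BERT models, used only to make the example runnable.

```python
HF_SECTIONS = [
    "Assessment and Plan",
    "History of Present Illness",
    "Past Medical History",
    "Chief Complaint and Reason for Visit",
    "Problem",
    "Review of systems",
]


def segment_label(sentences, sentence_classifier):
    """True if the binary classifier fires on at least one sentence of Seg."""
    return any(sentence_classifier(s) for s in sentences)


def classify_segment(sentences, classifiers):
    """Return the six per-section outputs and the combined HF-relevant output."""
    per_section = {
        name: segment_label(sentences, classifiers[name]) for name in HF_SECTIONS
    }
    return per_section, any(per_section.values())


# Hypothetical stand-in for a trained "Problem" classifier: it fires on
# enumerated items. The other five stand-ins never fire.
def toy_problem_classifier(sentence):
    return sentence[:1].isdigit()


toy_classifiers = {name: (lambda s: False) for name in HF_SECTIONS}
toy_classifiers["Problem"] = toy_problem_classifier

seg = [
    "1. Congestive Heart Failure.",
    "2. Chronic Systolic (Congestive) Heart.",
    "3. Anemia.",
]
per_section, hf_relevant = classify_segment(seg, toy_classifiers)
```

Because both levels use a logical OR, a single false-positive sentence is enough to label a whole segment, which favors recall over precision for the combined classifier.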

Two types of embedding-based ML algorithms were compared: a random forest (RF) model with sentence embeddings encoded using Bert-as-service, and a pre-trained clinical BERT model. Bert-as-service used the BERT-base-uncased pre-trained model as a sentence encoder with the “REDUCE_MEAN” strategy, converting each input sequence (a sentence obtained with the NLTK sentence tokenizer) into a 768-dimensional vector [25]. For the random forest model, we used the default threshold of 0.5. Our BERT model started from the clinical BERT-base-uncased pre-trained model and was fine-tuned for four epochs, with hyper-parameter settings of a batch size of 32, a learning rate of 3e-5, and a maximum sequence length of 512. All training sentences for both models were from the silver standard, i.e., the clinical notes from the Mayo Clinic GEC EHR, which adopted the HL7 CDA 1.0 standard. Note that we chose binary classifiers due to the low performance of a multi-class or multi-label classifier.
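To make the “REDUCE_MEAN” strategy and the 0.5 threshold concrete, here is a small pure-Python sketch of mean pooling over token vectors followed by thresholding a positive-class probability. It is not the Bert-as-service implementation: the function names are hypothetical, excluding padded positions via a mask is an assumption, and the strict `>` comparison at exactly 0.5 is an illustrative choice.

```python
def reduce_mean(token_vectors, mask):
    """Average the token vectors whose mask entry is 1 (padding is skipped).

    With a BERT-base encoder each token vector would have 768 dimensions,
    so the pooled sentence vector also has 768 dimensions.
    """
    kept = [vec for vec, keep in zip(token_vectors, mask) if keep]
    if not kept:
        raise ValueError("at least one unmasked token is required")
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


def predict_positive(probability, threshold=0.5):
    """Binary decision from a positive-class probability (threshold assumed strict)."""
    return probability > threshold


# Toy 2-dimensional example; the third token is padding and is ignored.
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
sentence_vector = reduce_mean(tokens, mask=[1, 1, 0])
```

The pooled vector is what a downstream classifier such as a random forest would consume as a fixed-length feature vector, regardless of sentence length.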


We evaluated the performance of the section identification classifiers using a test set of 300 text segments from the GEC and a manually annotated gold standard as described above. The precision, recall, and F-measure were computed. We focused on assessing the individual sections of interest rather than macro or weighted average evaluation because we used random phrases corresponding to HF-relevant sections in the test EHRs. If we denote true positives, true negatives, false positives, and false negatives as TP, TN, FP, and FN, respectively, those metrics are defined as follows:

$$\text{Precision} = \frac{TP}{TP + FP}, \qquad \text{Recall} = \frac{TP}{TP + FN}, \qquad \text{F-measure} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{1}$$
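The three metrics can be computed directly from the confusion counts, as in the following minimal sketch. The function name is hypothetical, and returning 0.0 when a denominator is zero is an assumed convention (it yields an F-measure of 0.00 when a classifier produces no true positives), not something specified in the study.

```python
def precision_recall_f(tp, fp, fn):
    """Precision, recall, and F-measure from TP, FP, and FN counts.

    Undefined ratios (zero denominators) are reported as 0.0 by convention.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    return precision, recall, f_measure


# Toy counts: 8 true positives, 2 false positives, 2 false negatives.
p, r, f = precision_recall_f(tp=8, fp=2, fn=2)
```

TN does not enter any of the three formulas, so these metrics are unaffected by the large number of segments that correctly receive a negative label from each binary classifier.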


The training set (from the GEC EHR) consists of 47 unique sections from 100,000 randomly selected sections containing HF terms. The clinical section distribution in the training set is 39% for “Assessment and Plan,” 23% for “History of Present Illness,” 11% for “Problem,” 7% for “Past Medical History,” 3% for “Chief Complaint and Reason for Visit,” less than 1% (0.89%) for “Review of systems,” and 16% for the other 41 sections. The test set consists of a total of 110 unique sections and subsections from 900 text segments from three different EHRs (IC Chart, Cerner, and Epic). The percentage agreement and Cohen’s kappa between the two annotators were 78% and 0.56 (agreed 79 corresponding sections/subsections among 110 clinical sections) for those fully mapped, versus 80% and 0.35 (agreed 100 corresponding sections/subsections among 110 clinical sections) for those partially mapped.

Based on the manual annotations, the section headers of the three other EHR systems show a high degree of variety in representing patient-specific HF-relevant sections. For example, we identified that the GEC section “Assessment and Plan” had 23 different expressions in the other three EHR systems (22 fully corresponding sections and one partially corresponding section), as shown in Table 1. The most frequent partially corresponding section is the “History of Present Illness” (e.g., the “Disease Summary” subsections partially corresponded to the “History of Present Illness”). During the generation of the gold standard, a given text segment of the “Problem” section in the other three EHRs was often partially mappable to the following sections in the GEC EHR: “Chief Complaint and Reason for Visit,” “Past Medical History,” and “Others (Consults),” depending on the context as well as the viewpoint of the annotator.

Table 1. Corresponding sections and subsections among test corpora.

The distribution of clinical sections in the test sets is shown in Fig 3. Table 2 shows the general statistics of the test sets. We observed HF-relevant terms appear more frequently in the “Assessment and Plan,” “History of Present Illness,” and “Problem.” The Cerner corpus has nearly 46% (gray bar in Fig 3) samples containing HF-relevant terms in the “Problem” section. In contrast, the GEC and Epic (blue and yellow bars in Fig 3) have the majority in “Assessment and Plan” and “History of Present Illness.” In the case of the IC Chart (orange in Fig 3), “Assessment and Plan” is the majority, followed by “Past Medical History” and “History of Present Illness.” Overall, the GEC corpus contained most HF-relevant terms in those six specific sections (96% in Table 2), while other sets had relatively high HF-relevant terms in the “Other” sections (ranging from 7% to 11%).

Fig 3. The distribution of the sections and subsections in the test.

GEC = General Electronic Centricity EHR.

Table 2. Case^a and non-case^b for train set in GEC and the heart failure relevant clinical section in test sets.

The section detection performance of the embedding-based ML models is shown in Table 3. It varies widely, from 0.00 to 0.91 F-measure. The performance within the homogeneous EHR (i.e., the test set from the GEC EHR) is higher than that of the heterogeneous EHRs (i.e., the test sets from the other three EHRs) except in the “Past Medical History” section for BERT models. Both RF and BERT models show higher performance in identifying “HF phenotyping-relevant sections” (i.e., the six specific sections) than individual sections. In particular, the BERT models can capture “HF phenotyping-relevant sections” with a best F-measure of 0.88 for the GEC corpus and 0.81–0.86 for the three other EHRs. Also, the BERT models tend to achieve better performance than the RF models with embedding-based features. However, both RF and BERT failed to capture the “Problem” section. The “Chief Complaint and Reason for Visit” and “Review of systems” sections were also not captured by RF. Note that the precision of the RF and BERT models for the “Review of systems” section in the Cerner EHR is 1.00 (Table 3); however, this is less meaningful because there is only one such case in the sample (Table 2).

Table 3. Evaluation of binary embedding-based ML models regarding the section.


The generalizability of NLP-empowered computational phenotyping algorithms leveraging section information depends on how well those sections are aligned across different EHR systems. We developed and evaluated embedding-based approaches (e.g., deriving embedded features or utilizing available BERT models) for mapping clinical sections containing HF-relevant terms across four EHR systems. We also investigated the transferability of section classifiers trained with a labeled corpus to different EHR systems, with the following key findings: 1) Models for section classification trained using a single EHR system have limited generalizability to other EHR systems because of different section structures. For computational phenotyping algorithms leveraging section information (e.g., the eMERGE HF phenotyping algorithm), dedicated efforts may be required to account for the heterogeneity of section structure when deploying NLP-based phenotyping algorithms across different sites. 2) Significant variation in clinical documentation across different EHR systems limits the full potential of using EHRs for clinical research, as each site may need to develop site-specific computational phenotyping algorithms. To our knowledge, our study is the first to quantitatively assess the impact of documentation heterogeneity across multiple EHR systems for computational phenotyping.

In general, we observed a certain level of transferability across different EHR systems, especially for the “Assessment and Plan” section and the “History of Present Illness” section. It is known that the most informative resources for medical experts to obtain the comprehensive medical history of patients are the “Assessment and Plan” section and the “History of Present Illness” section [26]. The better performance of the section classifiers for those two sections may therefore reflect the need for higher consistency in documenting critical information. Meanwhile, the “Review of systems” section does not sufficiently record medical history [26]. The distribution of HF-relevant concept mentions reflects similar observations as those of the previous study [26].

Our study demonstrates the mapping of clinical sections across different EHRs is a challenging task, even though the mapping of specific clinical sections to standards does not impact much on the eMERGE HF algorithm. Specifically, the classifier for detecting HF phenotyping-relevant sections achieved high performance as HF-relevant terms tend to occur only in certain sections in those EHR systems. In our training and test sets, about 16% and 7% of HF-relevant terms appeared in clinical sections that were not HF phenotyping-relevant sections, respectively. In this study, we achieved an above 0.8 F1-score for detecting HF phenotyping-relevant sections that present high transferability. However, these may not be true for other phenotypes (e.g., pain).

Both BERT and RF models failed for the “Problem” section. GEC’s “Problem” section consists of noun phrases, incomplete sentences, or short expressions rather than a complete sentence structure. The average number of words in a sentence in the training set (i.e., the GEC EHR) for the “Problem” section is less than 25. Those embedding-based techniques might fail as little contextual information was available to learn. This inferior result was analogous to preliminary results of prior studies, subsequently resolved through the wide-ranging parameter tuning and class optimization [27]. Meanwhile, the “Problem” section in other EHR systems contains full sentence structures. According to our error analysis, the majority of sentences in the “Problem” section were incorrectly categorized into the “Chief Complaint and Reason for Visit” or “Past Medical History” sections, which often contained repeated information about the disease information of patients in the GEC EHR. Additionally, our training data for the “Problem” section is very skewed, with only 4.76% of sentences labeled as positive (i.e., from the “Problem” section).

We observed that the lack of section standardization in clinical documentation might contribute to the low inter-annotator agreement (i.e., fair and moderate agreement). For example, family history information can be documented in the “Chief Complaint and Reason for Visit” section. Additionally, we noticed many subsections were defined but not mappable to standards, similar to prior studies [10]. This implies significant challenges in detecting section context for clinical NLP. As a prior study presented, the HL7 CDA standards provide diverse representations to capture certain granular information within the various sentences [28]. If clinical sections conformed more consistently to standards like the HL7 CDA, it would significantly raise the value of EHRs in secondary use for research.

Our model demonstrated the wide range of transferability of section information regarding patients’ HF information associated with the medical history across four different EHR systems. The distribution of HF phenotyping-relevant sections varies across EHRs. For example, the most frequent section for the IC Chart and Epic is the “Assessment and Plan” section, with 36% and 35% prevalence, respectively. In comparison, the most frequent section for the Cerner EHR is the “Problem” section (46%). Additionally, the “History of Present Illness” section of the Epic EHR contains more HF-relevant terms (24%) compared to the other two EHR systems (18% for the IC Chart and 16% for Cerner). HF-relevant terms were prevalent in the “Assessment and Plan” and “History of Present Illness” sections in the GEC EHR but with low occurrences in other EHRs, as shown in Table 2 and Fig 3. The section distribution variation across different EHR systems for HF-relevant terms may also be due to the characteristics of the practice settings rather than the EHR systems alone.

Note that our study focused on the assessment of heterogeneity of section information across different EHRs through the performance evaluation of embedding-based section classifiers. We trained ML models for section classification using a large corpus consisting of CDA-compliant clinical documents from one EHR system and tested them on text fragments containing HF-relevant terms from multiple other EHR systems. We did not thoroughly compare various ML algorithms and other BERT-based approaches for section classification, nor did we perform exhaustive parameter optimization for the RF and BERT models. Furthermore, the sample size of the test set (n = 1,200) was limited, which may lead to high variation. For a fair comparison, we used the same chunk of the sentence set for training both BERT and RF models; however, this setting may disadvantage BERT, which is a contextualized language model. The study confirmed poor adoption of standardization in clinical documentation, which can cause significant challenges for computational phenotyping leveraging section information. Additionally, we only conducted the assessment on one computational phenotyping algorithm, but the evidence of the heterogeneity is clear. Our future direction is to explore context-aware computational phenotyping algorithms that infer the section information from the local discourse context rather than leveraging section headers.

Supporting information

S1 Table. Differentiation of corresponding sections and subsections among corpora.

^a One clinical document standard of the IC Chart Electronic Health Record (EHR) consists of sections ‘S,’ ‘O,’ ‘A,’ and ‘P’; the “Impression and Plan” section of the Cerner EHR corresponds to the “Assessment and Plan” section of the General Electronic Centricity (GEC) EHR, which contains information on the “Problem,” “Medications,” and other sections; “Chief Complaint and Reason for Visit” sections of the Epic EHR are similar to “Problem” sections of the GEC EHR.



We would like to thank Donna Ihrke and Ellen E. Koepsell for conducting annotations, as well as Dr. Walter Rocca for providing the dataset.


  1. Cambria E, White B. Jumping NLP curves: A review of natural language processing research. IEEE Computational intelligence magazine. 2014;9(2):48–57.
  2. Sohn S, Wang Y, Wi C-I, Krusemark EA, Ryu E, Ali MH, et al. Clinical documentation variations and NLP system portability: a case study in asthma birth cohorts across institutions. Journal of the American Medical Informatics Association. 2018;25(3):353–9. pmid:29202185
  3. Bielinski SJ, Pathak J, Carrell DS, Takahashi PY, Olson JE, Larson NB, et al. A robust e-epidemiology tool in phenotyping heart failure with differentiation for preserved and reduced ejection fraction: the electronic medical records and genomics (eMERGE) network. Journal of cardiovascular translational research. 2015;8(8):475–83. pmid:26195183
  4. Devlin J, Chang M-W, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805. 2018.
  5. Denny JC, Spickard A III, Johnson KB, Peterson NB, Peterson JF, Miller RA. Evaluation of a method to identify and categorize section headers in clinical documents. Journal of the American Medical Informatics Association. 2009;16(6):806–15. pmid:19717800
  6. Pomares-Quimbaya A, Kreuzthaler M, Schulz S. Current approaches to identify sections within clinical narratives from electronic health records: a systematic review. BMC medical research methodology. 2019;19(1):1–20.
  7. Boone KW. The HL7 clinical document architecture. The CDA TM book: Springer; 2011. p. 17–21.
  8. Amato F, Casola V, Mazzocca N, Romano S. A semantic approach for fine-grain access control of e-health documents. Logic Journal of the IGPL. 2013;21(4):692–701.
  9. Melton GB, Wang Y, Arsoniadis E, Pakhomov SV, Adam TJ, Kwaan MR, et al. Analyzing operative note structure in development of a section header resource. Studies in health technology and informatics. 2015;216:821. pmid:26262166
  10. Haug PJ, Wu X, Ferraro JP, Savova GK, Huff SM, Chute CG, editors. Developing a section labeler for clinical documents. AMIA Annual Symposium Proceedings; 2014: American Medical Informatics Association.
  11. Apostolova E, Channin DS, Demner-Fushman D, Furst J, Lytinen S, Raicu D, editors. Automatic segmentation of clinical texts. 2009 Annual International Conference of the IEEE Engineering in Medicine and Biology Society; 2009: IEEE.
  12. Khattak FK, Jeblee S, Pou-Prom C, Abdalla M, Meaney C, Rudzicz F. A survey of word embeddings for clinical text. Journal of Biomedical Informatics: X. 2019;4:100057. pmid:34384583
  13. Wu S, Roberts K, Datta S, Du J, Ji Z, Si Y, et al. Deep learning in clinical natural language processing: a methodical review. Journal of the American Medical Informatics Association. 2020;27(3):457–70. pmid:31794016
  14. Dai H-J, Syed-Abdul S, Chen C-W, Wu C-C. Recognition and evaluation of clinical section headings in clinical documents using token-based formulation with conditional random fields. BioMed research international. 2015;2015. pmid:26380302
  15. Meng Y, Speier W, Ong M, Arnold CW. HCET: Hierarchical Clinical Embedding With Topic Modeling on Electronic Health Records for Predicting Future Depression. IEEE Journal of Biomedical and Health Informatics. 2020;25(4):1265–72.
  16. Salloum W, Finley G, Edwards E, Miller M, Suendermann-Oeft D, editors. Automated preamble detection in dictated medical reports. BioNLP 2017; 2017.
  17. Sadoughi N, Finley GP, Edwards E, Robinson A, Korenevsky M, Brenndoerfer M, et al., editors. Detecting section boundaries in medical dictations: toward real-time conversion of medical dictations to clinical reports. International Conference on Speech and Computer; 2018: Springer.
  18. Rosenthal S, Barker K, Liang Z, editors. Leveraging Medical Literature for Section Prediction in Electronic Health Records. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP); 2019.
  19. Mehrabi S, Krishnan A, Roch AM, Schmidt H, Li D, Kesterson J, et al. Identification of patients with family history of pancreatic cancer-Investigation of an NLP System Portability. Studies in health technology and informatics. 2015;216:604. pmid:26262122
  20. Tepper M, Capurro D, Xia F, Vanderwende L, Yetisgen-Yildiz M, editors. Statistical Section Segmentation in Free-Text Clinical Records. Lrec; 2012.
  21. Wu S, Liu S, Wang Y, Timmons T, Uppili H, Bedrick S, et al. Intrainstitutional EHR collections for patient‐level information retrieval. Journal of the Association for Information Science and Technology. 2017;68(11):2636–48.
  22. St Sauver JL, Grossardt BR, Yawn BP, Melton LJ III, Pankratz JJ, Brue SM, et al. Data resource profile: the Rochester Epidemiology Project (REP) medical records-linkage system. International journal of epidemiology. 2012;41(6):1614–24. pmid:23159830
  23. Rocca WA, Grossardt BR, Brue SM, Bock-Goodner CM, Chamberlain AM, Wilson PM, et al. Data resource profile: expansion of the Rochester Epidemiology Project medical records-linkage system (E-REP). International journal of epidemiology. 2018;47(2):368-j. pmid:29346555
  24. Manemann SM, St Sauver JL, Liu H, Larson NB, Moon S, Takahashi PY, et al. Longitudinal cohorts for harnessing the electronic health record for disease prediction in a US population. BMJ Open. 2021;11(6):e044353. pmid:34103314
  25. Liu S, Fu S, Moon S, Wen A, Liu H, editors. Predicting Section Location of Clinical Sentences using BERT Encoder-A Pilot Study. AMIA; 2020.
  26. Koopman RJ, Steege LMB, Moore JL, Clarke MA, Canfield SM, Kim MS, et al. Physician information needs and electronic health records (EHRs): time to reengineer the clinic note. The Journal of the American Board of Family Medicine. 2015;28(3):316–23. pmid:25957364
  27. Peng Y, Yan S, Lu Z. Transfer learning in biomedical natural language processing: An evaluation of bert and elmo on ten benchmarking datasets. arXiv preprint arXiv:1906.05474. 2019.
  28. Chen ES, Manaktala S, Sarkar IN, Melton GB, editors. A multi-site content analysis of social history information in clinical notes. AMIA Annual Symposium Proceedings; 2011: American Medical Informatics Association.