Abstract
Narratives posted on the internet by patients contain a vast amount of information about various concerns. This study aimed to extract multiple concerns from interviews with breast cancer patients using the natural language processing (NLP) model bidirectional encoder representations from transformers (BERT). A total of 508 interview transcriptions of breast cancer patients written in Japanese were labeled with five types of concern labels: "treatment," "physical," "psychological," "work/financial," and "family/friends." The labeled texts were used to create a multi-label classifier by fine-tuning a pre-trained BERT model. Prior to fine-tuning, we also created several classifiers with domain adaptation using (1) breast cancer patients' blog articles and (2) breast cancer patients' interview transcriptions. The performance of the classifiers was evaluated in terms of precision through 5-fold cross-validation. The multi-label classifiers with only fine-tuning had precision values of over 0.80 for "physical" and "work/financial" out of the five concerns. In contrast, precision for "treatment" was low, at approximately 0.25. For the classifiers using domain adaptation, however, the precision of this label ranged from 0.40 to 0.51, an improvement of more than 0.2 in some cases. This study showed that combining domain adaptation with a multi-label classifier on target data makes it possible to efficiently extract multiple concerns from interviews.
Citation: Watabe S, Watanabe T, Yada S, Aramaki E, Yajima H, Kizaki H, et al. (2024) Exploring a method for extracting concerns of multiple breast cancer patients in the domain of patient narratives using BERT and its optimization by domain adaptation using masked language modeling. PLoS ONE 19(9): e0305496. https://doi.org/10.1371/journal.pone.0305496
Editor: Asif Ekbal, Indian Institute of Technology Patna, INDIA
Received: April 25, 2023; Accepted: May 30, 2024; Published: September 6, 2024
Copyright: © 2024 Watabe et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: DIPEx interview data cannot be made publicly available due to the consent and explanation document for the interviews to which the research participants consented. The data can be viewed on the DIPEx website (https://www.dipex-j.org/), but researchers who wish to analyze the data for research purposes must submit an application through the data-sharing page (https://www.dipex-j.org/outline/data-sharing) and obtain approval from DIPEx. The LifePalette data also cannot be shared publicly, according to the terms of service agreed to by blog users. These data will be made available to interested researchers once a contract for research purposes is signed with Mediaid Corporation (please visit and contact https://mediaid.co.jp/contact/), although individual blog posts can be browsed on the LifePalette site after user registration (please visit https://lifepalette.jp/login).
Funding: This work was supported by JSPS KAKENHI Grant Number 21H03170 and JST CREST Grant Number JPMJCR22N1, Japan. For more information on JSPS and JST, please visit https://www.jsps.go.jp and https://www.jst.go.jp, respectively. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: HY is the CEO of Mediaid Corporation, which operates LifePalette. The other authors declare no competing interests. This does not alter our adherence to PLOS ONE policies on sharing data and materials.
Introduction
Breast cancer is the most commonly diagnosed cancer among women, and its incidence is rising worldwide [1–3]. Breast cancer treatment tends to be prolonged, and the increasing incidence of the disease among women in their 30s and 40s, a time of many potentially major life events, may cause various problems for patients, including employment issues, financial pressures, and psychosocial stress [4]. The onset of breast cancer has a significant impact on patients' work and employment prospects, but few patients consult medical professionals about such issues, and the actual situation is often not well understood [5]. In addition, patients have physical concerns that are unique to breast cancer, such as breast reconstruction after surgery or fertility problems caused by drugs and radiation therapy, but these concerns are poorly shared with medical professionals [6–8].
With the development of internet services in recent years, many patients are using social media, such as Twitter, to spread information [9, 10]. Such patients’ narratives may provide information about their needs that they do not tell medical professionals, and they are also a valuable resource for other patients to find information they want, which may improve their quality of life. In order to explore ways to utilize such narratives, qualitative research methods have been used [11–13]. However, the sheer volume of information on the internet makes it difficult to analyze, and therefore methods using NLP have been proposed as a way to process these large amounts of data [14].
In the NLP approach, both unsupervised and supervised machine learning methods are used worldwide to analyze patients' stories. Many unsupervised machine learning approaches have been studied for content analysis, using topic models such as latent Dirichlet allocation (LDA) [15]. In supervised learning approaches, on the other hand, classical algorithms such as support vector machines (SVM) and naive Bayes (NB) classifiers, and more recently, complex deep learning frameworks such as long short-term memory (LSTM) [16] and BERT [17], are often used [18–21]. In supervised learning, tasks such as classification and named entity recognition (NER) are used to extract useful knowledge from records of patients helping each other in the community [22, 23]. However, few studies have dealt primarily with patients' concerns.
Against this background, we recently reported developing a BERT-based classifier to automatically classify multiple concerns in blog posts by patients with breast cancer [24]. Although some improvements are needed in terms of the performance of the model, our previous study showed that an NLP model can extract multiple concerns. Since breast cancer patients have multiple concerns and their concerns may change over time, the creation of such a multi-label classifier is a stepping stone to creating a higher-quality information delivery system for individual patients. In recent years, however, the main platforms for patients to share and obtain information on the internet have become Facebook, YouTube, podcasts, and other platforms, rather than blogs [25, 26]. The major difference from blogs is that patients' thoughts are often expressed in the form of interviews, and the spoken rather than the written word is becoming the main form of communication. In Japanese conversation, there is a great variety of slurred speech, mispronunciations, hesitations, and onomatopoeia, and the gap between the written and spoken language is greater than in English. Applying NLP models to text from speech transcripts therefore has the potential to yield findings that differ from those of previous studies, and such attempts could be helpful for implementing better information-providing systems in the future, especially in non-English-speaking countries.
Therefore, this study aimed to examine the applicability of the BERT-based classifier proposed in the previous study to breast cancer patients' interviews and also to explore methods to improve the performance of the classifier for patients' narratives. Our results indicate that the "physical" and "work/financial" concerns of breast cancer patients can be efficiently extracted using BERT-based methods not only from written sources, as previously reported, but also from spoken materials such as interview transcriptions. Furthermore, we show here that domain adaptation using masked language modeling (MLM) is effective for improving performance in the patient narrative domain, which opens up potential future applications utilizing a variety of unstructured patient-generated data such as narrative text. These findings suggest that applying natural language processing to a broader range of source materials than has been employed to date, including speech transcripts, will help improve the provision of personalized advice to patients.
Materials and methods
Overview
This section describes the dataset used in the study, the methodology for developing the classifier, and the metrics for evaluating it. Fig 1 shows the flow of data extraction and model development. First, text excerpts from interviews with breast cancer patients were collected, and the texts were annotated according to predetermined guidelines. In addition, the annotated text was segmented for model development using 5-fold cross-validation (Fig 1A).
A: Five types of concern labels (multiple labels allowed) were assigned to each text. The labeled texts were split in a 4:1 ratio into training data (406) and test data (102). B: The classifier was built by fine-tuning the pre-trained BERT model on the training data described in A. C: The fine-tuning method is the same as in B, except that the BERT model was additionally trained using masked language modeling on domain-specific data before fine-tuning. We prepared multiple datasets for this step; a detailed description of each dataset is given in the "Data collection" section and Table 1.
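As a minimal sketch of the split described above, the following illustrates 5-fold cross-validation over 508 placeholder texts, with each fold yielding roughly 406 training and 102 test texts; the variable names and shuffling seed are illustrative assumptions, not the study's code.

```python
# A minimal sketch of the 4:1 split used for 5-fold cross-validation;
# 508 placeholder strings stand in for the annotated corpus.
from sklearn.model_selection import KFold

texts = [f"text_{i}" for i in range(508)]  # placeholder corpus
kf = KFold(n_splits=5, shuffle=True, random_state=0)
for fold, (train_idx, test_idx) in enumerate(kf.split(texts)):
    # Each fold holds out ~102 texts for testing and trains on ~406.
    print(f"fold {fold}: train={len(train_idx)}, test={len(test_idx)}")
```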
Then, we built the classifier from processed texts dealing with patient concerns. In this study, two main approaches to model development were tried; Fig 1B and 1C show the methodology. In the first approach, a pre-trained NLP model was fine-tuned directly with the target data. In the second, in an attempt to improve the performance of the model, we tested additional model training to adapt the model to the patients' narratives domain prior to fine-tuning. Several combinations of data were used for this domain adaptation, as presented in more detail in Table 1 below.
Data collection
The patients' narratives data were collected from interview transcripts of patients with breast cancer registered in the Database of Individual Patient Experiences-Japan (hereinafter, DIPEx) [27]. DIPEx divides the interview transcriptions into sections for each theme, such as "onset of illness" and "treatment," and posts the extracted processed texts on its website. Each processing step is conducted by accredited researchers based on qualitative research methods established by the University of Oxford [28]. In this study, interview text data generated from interviews with 52 breast cancer patients conducted from January 2008 to October 2018 were provided with the approval of DIPEx, of which 508 texts were included after excluding duplicates (hereafter, these data are referred to as processed DIPEx, DIPEx (P)). In addition, to explore ways to improve classifier performance, we used Japanese blog articles from the patient web community LifePalette [29], one of the most active internet patient communities in Japan, along with texts from DIPEx in a domain adaptation test.
Ethical approval
This study was approved by the ethics committee of the Keio University Faculty of Pharmacy (approval No 220311–1, 221205–1). All procedures were performed in accordance with the Ethical Guidelines for Medical and Health Research Involving Human Subjects (set by the Ministry of Education, Culture, Sports, Science and Technology and the Ministry of Health, Labour and Welfare in Japan) and the Declaration of Helsinki and its later amendments. Informed consent for this study was waived due to its retrospective observational design. DIPEx data were provided through data sharing, a system for utilizing narrative data for research and education, and consent for this was obtained from the research participants at the time the interviews were recorded. Consent to use the data from LifePalette for research purposes was obtained from users at the time of registration. Although non-textual information such as age at diagnosis and gender is present in both data sources, we had consent to utilize only textual data for research purposes in this study.
Data pre-processing
In this study, we defined concern labels as expressions that indicate what concerns are described in the interview transcriptions. The annotation guidelines describing the criteria for assigning labels were used in accordance with previous studies [24] (S1 Table). Labels are assigned if there are expressions of cancer-related concerns in the text. Multiple labels were allowed if more than one description of distress corresponded to a single text. In these guidelines, five labels, “treatment,” “physical,” “psychological,” “work/financial,” and “family/friends,” were established based on the Shizuoka Classification, a method of classifying cancer patients’ problems established from a previous survey [30]. Based on the annotation guidelines, two researchers (SW and TW) assigned labels to 100 randomly selected texts out of a total of 508 texts, 50 each in two sessions. Although the guidelines were thoroughly discussed prior to annotation, the first session was used to establish the annotation method and how to handle the guidelines, and the second session was used to confirm the reliability of the guidelines. The agreement of the assignment results was confirmed using the kappa coefficient, and finally SW assigned labels to all texts.
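As a minimal sketch of the agreement check described above, the following computes Cohen's kappa between the two annotators for one concern label; the binary assignments (1 = label present) are illustrative toy values, not the study data.

```python
# A minimal sketch of inter-annotator agreement via Cohen's kappa;
# the label vectors are illustrative, not the study data.
from sklearn.metrics import cohen_kappa_score

labels_sw = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]  # annotator SW, one label
labels_tw = [1, 0, 1, 0, 0, 0, 1, 0, 1, 1]  # annotator TW, same texts
print(f"kappa = {cohen_kappa_score(labels_sw, labels_tw):.2f}")
```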
Model development
Table 1 shows a list of all classification model development patterns that were tried in this study. To develop the classifier, BERT, a context-aware, large-scale NLP model, was used. BERT is trained through a two-step learning process. The first step is pretraining from scratch on a large text dataset, which allows the model to learn generic features common to various tasks; the second step is to fine-tune the pre-trained model for the desired task using a small amount of new data (Fig 1B and 1C).
The model was developed by fine-tuning the pre-trained Japanese BERT model from the Inui/Suzuki Lab at Tohoku University [31]. Hereafter, models based on this pre-trained model are referred to as J-based BERT. In addition, domain adaptation was tried before fine-tuning in order to explore ways to improve the performance of the model. The characteristics of the data used in fine-tuning are often different from those of the data used in pre-training, and in such cases, additional training with data from the target domain before fine-tuning is known to improve the performance of the model [32, 33].
In this case, the J-based BERT was pre-trained on a large volume of text from the Japanese version of Wikipedia. However, the data used for fine-tuning was interview transcriptions, and the nature of the two types of texts is very different. Furthermore, the small volume of data made it difficult to tune the model for the task. The reason for this is that transcriptions of interviews are costly, because they are created through many processes, including the appropriate treatment for faltering or misspoken words. This makes it difficult to acquire large amounts of data compared to text data available from Twitter, blogs, etc. Domain adaptation is effective when it is difficult to prepare data for such a specific field. In this case, masked language modeling (MLM), one of BERT’s pre-training methods, was used as the domain adaptation method.
MLM is a self-supervised learning method, which means that unlabelled text can also be used for training. Therefore, pre-processed patient interview transcripts and texts from related domains, labelled or unlabelled, can also be used as resources. The amount of such related data at hand is large compared to the labeled data covered in this study, which led us to explore how to make use of unprocessed interview transcripts. Prior research suggests that choosing an MLM-based approach for domain adaptation improves performance in downstream classification tasks [34, 35]. By adding such domain-specific data, domain adaptation allows the weights to be adjusted before fine-tuning the model, thus mitigating the effects of differences in the nature of the data between fine-tuning and pre-training. In this study, MLM was applied based on the pre-trained model tuning method supported by the Huggingface package [36]. The training objective of MLM is the cross-entropy loss for predicting masked tokens. As in the original BERT paper, we selected 15% of the input tokens for possible substitution [17]. An NVIDIA Tesla M10 was used for training, the batch size was set to 8, and the number of epochs was set to 30, since larger numbers of epochs made no difference to the results. After the MLM was applied, fine-tuning was done. The parameters for fine-tuning were the same as in the previous study [24]: the model consisted of the 12 layers of BERT itself and a fully connected layer, the loss function was set to cross-entropy, the batch size to 16, and the learning rate to 10^−5. However, unlike previous studies, the number of epochs was set to 30 and early stopping with a patience criterion was implemented. These methods were applied to all five labels for fine-tuning.
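The following is a minimal sketch of this two-stage procedure using the Hugging Face Transformers and Datasets APIs, not the authors' exact pipeline. The masking rate, MLM batch size, epoch counts, fine-tuning batch size, and learning rate follow the values above; the corpus file name is a hypothetical placeholder, and the single multi-label classification head shown here is an illustrative simplification, since the study applies the fine-tuning procedure to each of the five labels.

```python
# A minimal sketch, not the authors' exact pipeline. Assumes the
# transformers and datasets packages (plus fugashi/ipadic for the
# Japanese tokenizer) are installed; "domain_corpus.txt" is hypothetical.
from datasets import load_dataset
from transformers import (
    AutoModelForMaskedLM,
    AutoModelForSequenceClassification,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

MODEL_NAME = "cl-tohoku/bert-base-japanese"  # the J-based BERT

# --- Stage 1: domain adaptation with masked language modeling ---
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
mlm_model = AutoModelForMaskedLM.from_pretrained(MODEL_NAME)

# Unlabelled domain text, one passage per line.
raw = load_dataset("text", data_files={"train": "domain_corpus.txt"})
tokenized = raw.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True,
    remove_columns=["text"],
)

# Randomly select 15% of input tokens for masking, as in the BERT paper.
collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer, mlm=True, mlm_probability=0.15
)
mlm_args = TrainingArguments(
    output_dir="mlm_adapted",
    per_device_train_batch_size=8,
    num_train_epochs=30,
)
Trainer(
    model=mlm_model,
    args=mlm_args,
    train_dataset=tokenized["train"],
    data_collator=collator,
).train()
mlm_model.save_pretrained("mlm_adapted")
tokenizer.save_pretrained("mlm_adapted")

# --- Stage 2: fine-tune the adapted encoder on the annotated texts ---
clf = AutoModelForSequenceClassification.from_pretrained(
    "mlm_adapted",
    num_labels=5,  # the five concern labels
    problem_type="multi_label_classification",
)
clf_args = TrainingArguments(
    output_dir="concern_classifier",
    per_device_train_batch_size=16,
    learning_rate=1e-5,
    num_train_epochs=30,
)
# Fine-tuning then follows the same Trainer pattern on the labeled data,
# with early stopping monitored on a validation split.
```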
Table 1 summarizes all the dataset patterns tested: unprocessed DIPEx interview transcriptions (DIPEx raw), processed DIPEx (DIPEx (P)), LifePalette, and DIPEx (P)+LifePalette.
Task and metrics
Using the test data, a multi-label classification task was performed to predict whether a text contained descriptions corresponding to each concern label. Since the ultimate goal is to apply the constructed model as an information support system for patients with concerns, it is preferable to build a model that provides as little out-of-place information as possible. Therefore, we assessed the performance of the model by focusing on precision for each label. The other evaluation metrics were the F1 score and exact match accuracy, which indicates the percentage of texts for which all concern labels were predicted correctly. For the evaluation, we used the average of the 5-fold cross-validation results. For the evaluation of significant differences in precision scores for each label between models, we employed a two-tailed test with the Bonferroni correction for multiple comparisons. The criterion of significance was p < 0.05.
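As a minimal sketch of these metrics, the following computes per-label precision, per-label F1 score, and exact match accuracy with scikit-learn, assuming gold and predicted labels as binary indicator arrays of shape (n_texts, 5); the arrays shown are toy values, not the study data.

```python
# A minimal sketch of the evaluation metrics; y_true and y_pred are
# toy multi-label indicator arrays over the five concern labels.
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, precision_score

y_true = np.array([[1, 0, 1, 0, 0],
                   [0, 1, 0, 0, 1],
                   [1, 1, 0, 0, 0]])
y_pred = np.array([[1, 0, 0, 0, 0],
                   [0, 1, 0, 0, 1],
                   [1, 0, 0, 0, 0]])

precision_per_label = precision_score(y_true, y_pred, average=None, zero_division=0)
f1_per_label = f1_score(y_true, y_pred, average=None, zero_division=0)
# On multi-label indicators, accuracy_score is subset accuracy, i.e. the
# exact match rate over all five labels at once.
exact_match = accuracy_score(y_true, y_pred)
print(precision_per_label, f1_per_label, exact_match)
```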
Results
Annotated dataset
The average number of words per text in the interview transcriptions was 428.9, the median was 416.0, and the minimum and maximum were 103 and 1054. Texts of 512 words or fewer, which fit within BERT's input limit, numbered 367/508 (72.2% of all texts). Of the remaining 141 texts, 109 (over 75%) were 700 words or fewer. The number of labels was highest for "physical" (212) and lowest for "work/financial" (42). About 80% of the texts carried 0 or 1 labels, so multi-labeled texts were relatively rare; fewer than 3% of all texts were assigned three or more labels (Table 2). The label combinations that frequently occurred when more than one label was assigned were "physical" and "psychological" (35), "treatment" and "physical" (27), and "physical" and "family/friends" (20); that is, combinations involving "physical" were particularly frequent. These were followed by "psychological" and "family/friends" (20) and "treatment" and "psychological" (10), while all other combinations occurred fewer than 10 times.
The kappa coefficients were 0.64 for "treatment," 0.73 for "physical," 0.65 for "psychological," 0.83 for "work/financial," and 0.77 for "family/friends"; by the criteria of Landis and Koch, all labels fell in the substantial or almost perfect category.
Table 3 shows excerpts of the statements that supported the assignment of concern labels at the time of annotation, for each label. The original texts in Japanese are also included in S2 Table. Due to the nature of the interview data, most of the descriptions were chronologically in the past.
Performance of models
Table 4 shows the performance parameters of the constructed models. In all modeling procedures, precision tended to be high for "physical" and "work/financial" and low for "treatment". Although inferior to "physical", the precision of the other labels, "psychological" and "family/friends", was in the neighborhood of 0.60–0.70. Fine-tuning alone showed little impact on model performance, due to differences in data sources. Among the models with MLM domain adaptation added, the one using the LifePalette data had the highest precision. When both DIPEx and LifePalette data were used, domain adaptation tended to have little effect on performance. Overall, however, the models with MLM domain adaptation performed better in terms of precision, F1 score, and exact match accuracy. In particular, the performance of extracting the "treatment" label improved after domain adaptation. Accuracy and recall were almost unchanged before and after domain adaptation. In terms of statistical significance, the precision of the model applying MLM using LifePalette was higher than that of the baseline model for the "treatment" and "physical" labels (p < 0.05). In addition, for the "work/financial" label, the precision of all models except the DIPEx raw model and the DIPEx (P)+LifePalette model was higher than that of the baseline model (p < 0.05).
Discussion
Principal findings
Our results show that the developed multi-label classifier can deal with multiple concerns. Among the five targeted concerns, the extraction performance was particularly high with regard to work, household finances, and physical concerns. In addition, the proposed method of domain adaptation using MLM provides better performance for multi-label classification in the patients' narratives domain compared to the approach using only fine-tuning. In particular, the model applying MLM with LifePalette had significantly higher precision for the "treatment" and "physical" labels. Importantly, these concerns can be extracted from text based on Japanese conversation, which includes a wide variety of complex mispronunciations and onomatopoeia. This suggests that our approach could be a step forward in the development of future systems for providing appropriate information to cancer patients.
Analysis of model performances
An error analysis was conducted to validate the performance of each label assignment. Concerning the labels "physical" and "work/financial," we considered that the text terms assigned to these labels are easily comprehensible and discriminable for the model. Specifically, if descriptions such as "pain," "nausea," or "feeling uncomfortable" were present, the label "physical" was assigned with a high probability, while if descriptions such as "money," "insurance," or "return to work" were present, the label "work/financial" was assigned with a high probability. For the "physical" label, many descriptions focused on specific body parts and included expressions such as "terrible" to describe the cancer status, "shock-like sensations," and "mastectomy." These descriptions were often accompanied by psychological shock and distrust of the hospital attended. Referring to the number of times the labels co-occurred, the top multi-labeled combinations were "physical" and "psychological" and "physical" and "treatment", with these two combinations accounting for approximately 60% of the total.
Conversely, for "work/financial," there were descriptions such as "interpersonal relationships at work" and "discussions about the cost of treatments." These expressions are specific to this concern and are less related to other concerns than the descriptions about the body, which are more common in "physical". As for the labeling results, "work/financial" alone accounted for around 70% of the results. For these reasons, it is considered that, although the number of labels for "work/financial" was far fewer than for "physical", it may have shown the same or better extraction performance than "physical". Nevertheless, the classification models may not fully distinguish whether or not these concerns are cancer-related.
As for the other three labels, each accounts for nearly 15% of all assigned labels, but there are marked differences in extraction performance; that of "treatment" was particularly low. There was no significant difference among these labels in the percentage of texts assigned each label on its own, and when multiple labels were assigned, the majority of cases involved a combination with "physical".
For the "treatment" label, we believe that while specific patterns of concern existed, they varied widely and were difficult for the model to learn due to the small data population. Furthermore, since medical treatment-related terms are present in most texts, it is difficult to classify "treatment" labels by phrase, and the presence or absence of concerns must be determined from the context. With respect to the " psychological " label, labels were correctly assigned to texts with negative emotion words such as "shock" and "anxiety," but labels were misassigned to expressions that counteracted negative emotions. These expressions represent overcoming the emotional damage caused by cancer, but the model did not seem to understand this point. For the “family/friends” labels, labels were assigned to texts in which words such as “parents” and "siblings” were used to describe family members. However, labels were also given to positive statements such as “my family helped me to continue my cancer treatment”, so it is presumed that the learning is still not sufficient.
The application of domain adaptation resulted in an overall improvement in model performance, particularly in the precision of "treatment", which was notably low in the original model, although recall was not improved (Table 4). We speculate that this was due to successful learning of commonalities between the DIPEx and LifePalette datasets, leading to improved precision. Indeed, upon examination of the texts in which the model correctly classified “treatment”, it was observed that in many cases, the model correctly classified concerns also present in the LifePalette dataset, but was unable to correctly classify a diverse range of concerns unique to the DIPEx dataset. These results provide valuable insights into the reasons for the lack of improvement in recall.
Comparison with prior studies
In a previous study [24], on which the current study is based, a classifier created from blogs had the highest prediction precision for "physical" at over 0.80, while the prediction precision for the other concern labels was around 0.60. Although the model created in this study was based on interview data, the extraction performance for "physical" was as good as in the previous study. The performance for "work/financial" was superior to that of the previous study, whereas the performance for "treatment" was inferior. However, domain adaptation improved precision for this label by an average of more than 0.15. So far, Watanabe et al.'s study is the only one that has utilized leading-edge transformer-based NLP methods, such as BERT, for dealing with patients' multiple concerns. Most studies have used classical algorithms, such as SVM and NB, and primarily focused on sentiment analysis [14, 37]. Although such models are lightweight, studies using these methods show limitations in dealing with the domain of patients' words [38, 39]. In this study, we also built a classical algorithmic model, but it failed to reach the classification performance of the deep learning model (S3 Table). Therefore, it is appropriate to utilize models such as BERT, which can take into account the context of the patient's narrative. There have been studies using BERT for text related to breast cancer, with each study exploring ways to improve the performance of the model, but there are no previous studies on domain adaptation using MLM in the domain of patient narratives [40–42]. Although there is no previous research on this method in the area of patient narratives, there are examples of improved model performance through the use of specific domain data resources for both English and Japanese texts [43, 44]. In this study, as in those studies, MLM improved the performance of the models.
Limitations
This study was subject to several limitations.
First, the dataset used in this experiment is small and imbalanced. In general, approximately 500 texts may not be sufficient for building machine learning models for text classification. However, in studies such as this one, dealing with medical data containing personal information, datasets are often small, and problems such as bias in patient background factors are difficult to avoid. We attempted to mitigate this problem by having the model learn data from the relevant domain through domain adaptation. Regarding imbalance, the dataset contained approximately 200 "physical" labels, the highest number assigned, and approximately 40 "work/financial" labels, the lowest, and this may have affected the model's learning results.
Second, this model has the limitation that only 512 words can be input. Texts exceeding 512 words amounted to less than 30% of the dataset, and the omitted text was only a few dozen words in most cases, so there was little information loss. We tried methods that varied the 512-word input limit and methods that used only texts of fewer than 512 words, but the model performance showed little change.
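A minimal sketch of how this input limit applies in practice is shown below, assuming a Hugging Face tokenizer for the same Japanese BERT model; the sample string is illustrative, and note that the limit is counted in tokens after tokenization.

```python
# A minimal sketch of the 512-token input limit: text beyond the limit
# is simply truncated by the tokenizer (sample string is illustrative).
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("cl-tohoku/bert-base-japanese")
encoded = tokenizer("インタビューの書き起こしテキスト…", truncation=True, max_length=512)
print(len(encoded["input_ids"]))  # never exceeds 512
```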
Third, the data set used in this study includes only text data transmitted by patients. Thus, although there are many different types of breast cancer, and the progression of breast cancer varies from person to person, which may lead to significant differences in mental status and living environment, it is not possible to take account of patients’ precise treatment status in this study.
Fourth, although this study focuses on the usefulness of using unprocessed interview transcripts for training, there is a risk that the model will learn excessive disfluency if too much unstructured data is used for training. Minimal processing would therefore likely be needed to avoid this effect, but there is no clear definition of which disfluencies should be removed or of what state "unprocessed" refers to for interview transcripts, so it is difficult to identify an appropriate processing method. This study does not address this issue.
Future work
The concern classifier developed here is superior in extracting "physical" and "work/financial" concerns and could be applied to patient support systems to facilitate the identification of patients suffering from side effects or in need of work/financial support. From the viewpoint of patient support, it would be desirable in the future to provide more personalized information by combining this text-based system with other patient data, such as the patient’s stage of treatment. In addition, given the effectiveness of the NLP model for spoken text in this study, it could be applied to systems that extract patients’ needs and concerns from their voice input. Patient communications, such as blogs and interviews, as handled in this study, have high barriers to entry for patients, and it takes time for the information to become available to the public. Thus, it would also be desirable to develop systems that allow patients to communicate their needs and concerns directly to medical professionals from their homes, and to build chatbots that can automatically respond to such communications and provide information. We hope that the challenge of adapting the NLP model to spoken language, as in this study, will be the first step in the development of such a system.
Conclusions
In this study, we tested whether the method for extracting concerns from patient-generated text proposed in a previous study is also effective for interview transcriptions. Our results suggest that it is possible to effectively extract the "physical" and "work/financial" concerns of breast cancer patients using BERT-based methods, even for spoken text such as interview transcriptions. We also found that domain adaptation using MLM is effective for improving performance.
Supporting information
S2 Table. Table 3, which also includes the original texts (Japanese).
https://doi.org/10.1371/journal.pone.0305496.s002
(DOCX)
S3 Table. The performance of the classifiers based on classical algorithms.
https://doi.org/10.1371/journal.pone.0305496.s003
(DOCX)
References
- 1. Sung H, Ferlay J, Siegel RL, Laversanne M, Soerjomataram I, Jemal A, et al. Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin 2021 May;71(3):209–249. [Medline: 33538338]
- 2. Cancer Statistics in Japan-2019. Tokyo, Japan: Foundation for Promotion of Cancer Research; Mar 2020.
- 3. Allemani C, Weir HK, Carreira H, Harewood R, Spika D, Wang XS, et al. Global surveillance of cancer survival 1995–2009: analysis of individual data for 25,676,887 patients from 279 population-based registries in 67 countries (CONCORD-2). Lancet. 2015 Mar 14;385(9972):977–1010. Epub 2014 Nov 26. Erratum in: Lancet. 2015 Mar 14;385(9972):946. pmid:25467588; PMCID: PMC4588097.
- 4. Syrowatka A, Motulsky A, Kurteva S, Hanley JA, Dixon WG, Meguerditchian AN, et al. Predictors of distress in female breast cancer survivors: a systematic review. Breast Cancer Res Treat. 2017 Sep;165(2):229–245. Epub 2017 May 28. pmid:28553684; PMCID: PMC5543195.
- 5. Fassier JB, Lamort-Bouché M, Broc G, Guittard L, Péron J, Rouat S, et al. Developing a Return to Work Intervention for Breast Cancer Survivors with the Intervention Mapping Protocol: Challenges and Opportunities of the Needs Assessment. Front Public Health. 2018 Feb 23;6:35. pmid:29527521; PMCID: PMC5829033.
- 6. Wang H, Liu J, Bordes MC, Chopra D, Reece GP, Markey MK, et al. The role of psychosocial factors in patients’ recollections of breast reconstruction options discussed with their surgeons. Sci Rep. 2022 May 6;12(1):7485. pmid:35523931; PMCID: PMC9076612.
- 7. Oktay K, Harvey BE, Partridge AH, Quinn GP, Reinecke J, Taylor HS, et al. Fertility Preservation in Patients With Cancer: ASCO Clinical Practice Guideline Update. J Clin Oncol. 2018 Jul 1;36(19):1994–2001. Epub 2018 Apr 5. pmid:29620997.
- 8. Ishak A, Yahya MM, Halim AS. Breast Reconstruction After Mastectomy: A Survey of Surgeons’ and Patients’ Perceptions. Clin Breast Cancer. 2018 Oct;18(5):e1011–e1021. Epub 2018 Apr 28. pmid:29784600.
- 9. Falisi AL, Wiseman KP, Gaysynsky A, Scheideler JK, Ramin DA, Chou WS. Social media for breast cancer survivors: a literature review. J Cancer Surviv. 2017 Dec;11(6):808–821. Epub 2017 Jun 10. pmid:28601981.
- 10. McHugh SM, Corrigan M, Morney N, Sheikh A, Lehane E, Hill AD. A quantitative assessment of changing trends in internet usage for cancer information. World J Surg. 2011 Feb;35(2):253–7. pmid:20972679.
- 11. Choi E, Becker H, Kim S. A Blog Text Analysis to Explore Psychosocial Support in Adolescents and Young Adults With Cancer. Cancer Nurs. 2022 Dec 1. Epub ahead of print. pmid:35349497.
- 12. Syntosi A, Felizzi F, Bouchet C. A Social Media Listening Study to Understand the Unmet Needs and Quality of Life in Adult and Pediatric Amblyopia Patients. Ophthalmol Ther. 2022 Dec;11(6):2183–2196. Epub 2022 Sep 29. pmid:36175822; PMCID: PMC9587203.
- 13. Gooden R, Winefield HR. Breast and prostate cancer online discussion boards: a thematic analysis of gender differences and similarities. J Health Psychol 2007 Jan;12(1):103–114. pmid:17158844
- 14. Dreisbach C, Koleck TA, Bourne PE, Bakken S. A systematic review of natural language processing and text mining of symptoms from electronic patient-authored text data. Int J Med Inform. 2019;125:37–46. pmid:30914179
- 15. Tapi Nzali MD, Bringay S, Lavergne C, Mollevi C, Opitz T. What patients can tell us: topic analysis for social media on breast cancer. JMIR Med Inform 2017 Jul 31;5(3):e23 pmid:28760725
- 16. Hochreiter S, Schmidhuber J. Long short-term memory. Neural Comput. 1997;9(8):1735–80.
- 17. Devlin J, Chang MW, Lee K, Toutanova K. BERT: pre-training of deep bidirectional transformers for language understanding. Presented at: NAACL-HLT 2019; June 2019; Minneapolis, MN.
- 18. Wagland R, et al. Development and testing of a text-mining approach to analyse patients' comments on their experiences of colorectal cancer care. BMJ Qual Saf. 2016;25(8):604–614. pmid:26512131
- 19. Doing-Harris K, Mowery DL, Daniels C, Chapman WW, Conway M. Understanding patient satisfaction with received healthcare services: A natural language processing approach. AMIA Annu Symp Proc. 2017 Feb 10;2016:524–533. pmid:28269848; PMCID: PMC5333198.
- 20. Nishioka S, Watanabe T, Asano M, Yamamoto T, Kawakami K, Yada S, et al. Identification of hand-foot syndrome from cancer patients’ blog posts: BERT-based deep-learning approach to detect potential adverse drug reaction symptoms. PLoS One. 2022 May 4;17(5):e0267901. pmid:35507636; PMCID: PMC9067685.
- 21. Wu EL, Wu CY, Lee MB, Chu KC, Huang MS. Development of Internet suicide message identification and the Monitoring-Tracking-Rescuing model in Taiwan. J Affect Disord. 2023 Jan 1;320:37–41. Epub 2022 Sep 23. pmid:36162682.
- 22. Dirkson A, Verberne S, van Oortmerssen G, Gelderblom H, Kraaij W. How do others cope? Extracting coping strategies for adverse drug events from social media. J Biomed Inform. 2023;139:104228. pmid:36309197
- 23. Verberne S, Batenburg A, Sanders R, van Eenbergen M, Das E, Lambooij MS. Analyzing empowerment processes among cancer patients in an online community: a text mining approach. JMIR Cancer. 2019;5(1):e9887.
- 24. Watanabe T, Yada S, Aramaki E, Yajima H, Kizaki H, Hori S. Extracting Multiple Worries From Breast Cancer Patient Blogs Using Multilabel Classification With the Natural Language Processing Model Bidirectional Encoder Representations From Transformers: Infodemiology Study of Blogs. JMIR Cancer. 2022 Jun 3;8(2):e37840. pmid:35657664; PMCID: PMC9206207.
- 25. Petek E. Pilot Study: Exploration of How Women Use Social Media After a Breast Cancer Diagnosis. Master's thesis, Ohio State University; 2021. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=osu1617891913525326.
- 26. Güloğlu S, Özdemir Y, Basim P, Tolu S. YouTube English videos as a source of information on arm and shoulder exercise after breast cancer surgery. Eur J Cancer Care (Engl). 2022 Nov;31(6):e13685. Epub 2022 Aug 15. pmid:35970600.
- 27. DIPEx-Japan. Narratives of Breast Cancer Patients. URL: https://www.dipex-j.org/breast-cancer/ [accessed 2022-10-17].
- 28. Herxheimer A, et al. Database of patients' experiences (DIPEx): a multi-media approach to sharing experiences and information. Lancet. 2000;355(9214):1540–1543. pmid:10801187
- 29. Mediaid Corporation. Life Palette. URL: https://lifepalette.jp [accessed 2022-10-16]
- 30. The voices of 1,275 people who have faced breast cancer. Shizuoka Cancer Center; 2016. URL: https://www.scchr.jp/book/houkokusho/2013nyugan.html [accessed 2022-10-28].
- 31. Inui Laboratory, Tohoku University. cl-tohoku/bert-japanese. GitHub. URL: https://github.com/cl-tohoku/bert-japanese [accessed 2022-10-17].
- 32. Gururangan S, Marasović A, Swayamdipta S, Lo K, Beltagy I, Downey D, et al. Don't Stop Pretraining: Adapt Language Models to Domains and Tasks. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics; 2020. p. 8342–8360.
- 33. Aharoni R, Goldberg Y. Unsupervised domain clusters in pretrained language models. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics; 2020.
- 34. Whang T, et al. An effective domain adaptive post-training method for BERT in response selection. arXiv preprint arXiv:1908.04812; 2019.
- 35. Shnarch E, et al. Cluster & tune: Boost cold start performance in text classification. arXiv preprint arXiv:2203.10581; 2022.
- 36. Hugging Face. Course, Main NLP tasks: Fine-tuning a masked language model. URL: https://huggingface.co/course/chapter7/3?fw=pt [accessed 2023-03-10].
- 37. Sulieman L, Gilmore D, French C, Cronin RM, Jackson GP, Russell M, et al. Classifying patient portal messages using Convolutional Neural Networks. J Biomed Inform. 2017 Oct;74:59–70. Epub 2017 Aug 30. pmid:28864104.
- 38. Miyabe M, Shimamoto Y, Aramaki E. Extracting patients' distress of their medical care from web texts: the automatic classification of cancer patients' distress. Presented at: Forum on Information Technology; September 2014; Tsukuba, Japan.
- 39. Khanbhai M, Anyadi P, Symons J, Flott K, Darzi A, Mayer E. Applying natural language processing and machine learning techniques to patient experience feedback: a systematic review. BMJ Health Care Inform. 2021 Mar;28(1):e100262. pmid:33653690; PMCID: PMC7929894.
- 40. Zhou S, Wang L, Wang N, Liu H, Zhang R. CancerBERT: A BERT Model for Extracting Breast Cancer Phenotypes from Electronic Health Records. arXiv preprint arXiv:2108.11303; 2021.
- 41. Zhang X, Zhang Y, Zhang Q, Ren Y, Qiu T, Ma J, et al. Extracting comprehensive clinical information for breast cancer using deep learning methods. Int J Med Inform. 2019 Dec;132:103985. Epub 2019 Oct 2. pmid:31627032.
- 42. Solarte-Pabón O, Torrente M, Garcia-Barragán A, Provencio M, Menasalvas E, Robles V. Deep learning to extract Breast Cancer diagnosis concepts. In: 2022 IEEE 35th International Symposium on Computer-Based Medical Systems (CBMS); Shenzhen, China; 2022. p. 13–18. https://doi.org/10.1109/CBMS55023.2022.00010
- 43. Arefyev N, Kharchev D, Shelmanov A. NB-MLM: Efficient domain adaptation of masked language models for sentiment analysis. In: Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing; 2021.
- 44. Katsumata S, Komachi M, Manabe A, Tanimoto H. Using Data Selection for Failure Report Classification Problems: Improving the Performance of BERT Model. Presented at: 26th Annual Conference of The Association for Natural Language Processing; March 2020; Ibaraki, Japan.