
Tongue feature dataset construction and real-time detection

  • Wen-Hsien Chang,

    Roles Conceptualization, Formal analysis, Investigation, Methodology, Software, Validation, Visualization, Writing – original draft, Writing – review & editing

    Affiliation Graduate Institute of Chinese Medicine, School of Chinese Medicine, College of Chinese Medicine, China Medical University, Taichung, Taiwan, Republic of China

  • Chih-Chieh Chen,

    Roles Investigation, Methodology

    Affiliation Center for Artificial Intelligence in Medicine, Chang Gung Memorial Hospital, Taoyuan, Taiwan, Republic of China

  • Han-Kuei Wu,

    Roles Data curation, Resources

    Affiliations School of Post-Baccalaureate Chinese Medicine, College of Chinese Medicine, China Medical University, Taichung, Taiwan, Republic of China, Department of Traditional Chinese Medicine, Kuang Tien General Hospital, Taichung, Taiwan, Republic of China

  • Po-Chi Hsu,

    Roles Data curation, Resources

    Affiliations Department of Traditional Chinese Medicine, Kuang Tien General Hospital, Taichung, Taiwan, Republic of China, School of Chinese Medicine, College of Chinese Medicine, China Medical University, Taichung, Taiwan, Republic of China

  • Lun-Chien Lo,

    Roles Data curation, Supervision

    Affiliations School of Chinese Medicine, College of Chinese Medicine, China Medical University, Taichung, Taiwan, Republic of China, Department of Chinese Medicine, China Medical University Hospital, Taichung, Taiwan, Republic of China

  • Hsueh-Ting Chu,

    Roles Conceptualization, Resources, Software, Supervision

    htchu@asia.edu.tw (HTC); tcmchh55@mail.cmu.edu.tw (HHC)

    Affiliation Department of Computer Science and Information Engineering, College of Computer Science, Asia University, Taichung, Taiwan, Republic of China

  • Hen-Hong Chang

    Roles Conceptualization, Data curation, Funding acquisition, Investigation, Project administration, Resources, Supervision, Writing – original draft, Writing – review & editing

    htchu@asia.edu.tw (HTC); tcmchh55@mail.cmu.edu.tw (HHC)

    Affiliations Department of Chinese Medicine, China Medical University Hospital, Taichung, Taiwan, Republic of China, Graduate Institute of Integrated Medicine, College of Chinese Medicine, and Chinese Medicine Research Center, China Medical University, Taichung, Taiwan, Republic of China

Abstract

Background

Tongue diagnosis in traditional Chinese medicine (TCM) provides clinically important, objective evidence from direct observation of specific features that assist with diagnosis. However, interpreting tongue features currently requires substantial manpower and time, and TCM physicians may interpret the features of the same tongue differently. An automated system for interpreting tongue features would expedite the interpretation process and yield more consistent results.

Materials and methods

This study applied deep learning visualization to tongue diagnosis. After collecting tongue images and corresponding interpretation reports by TCM physicians in a single teaching hospital, various tongue features such as fissures, tooth marks, and different types of coatings were annotated manually with rectangles. These annotated data and images were used to train a deep learning object detection model. Upon completion of training, the position of each tongue feature was dynamically marked.

Results

A large, high-quality, manually annotated tongue feature dataset was constructed and analyzed. A detection model was trained, achieving average precision (AP) values of 47.67%, 58.94%, 71.25%, and 59.78% for fissures, tooth marks, thick coatings, and yellow coatings, respectively. At over 40 frames per second on an NVIDIA GeForce GTX 1060, the model was capable of detecting tongue features from any viewpoint in real time.

Conclusions/Significance

This study constructed a tongue feature dataset and trained a deep learning object detection model to locate tongue features in real time. The model provides the interpretability and intuitiveness that are often lacking in general neural network models, suggesting good feasibility for clinical application.

Introduction

Traditional Chinese medicine (TCM) physicians learn about the status of internal and external organs, meridians, and blood-Qi circulation in the human body, infer physiological and pathological changes, and select appropriate treatments through the application of four methods of diagnosis: inspection, listening and smelling examinations, inquiry, and palpation. Tongue examination is part of the inspection diagnosis, since the condition of the tongue is often highly correlated with a patient’s health status and disease course. Inspecting specific tongue features provides TCM physicians with clinically important, objective evidence that assists with diagnosis, whereas patients’ narratives can contribute to diagnostic errors. Tongue diagnosis is therefore widely used by TCM physicians.

Importantly, the determination of tongue features can be subjectively affected by observation, with different TCM physicians disagreeing about the interpretation of features on the same tongue, leading to different study conclusions [1–3]. Moreover, experienced TCM physicians typically identify tongue features that are overlooked by nonclinical personnel [1–3]. In addition, in the current medical process, recording and interpreting a patient’s tongue features requires the patient to be guided to an examination room first, where an assistant captures only a single tongue image in a well-prepared photographic environment. A TCM physician completes the interpretation report a few days later. The reports are in plain text format and mainly specify which tongue features exist, without precise positioning information. The development of an automated tongue feature detection system would expedite the interpretation process, yield more consistent results, and reduce human error. Moreover, junior doctors and medical students could learn tongue diagnosis more efficiently.

In the past few decades, automated interpretation of the tongue has been performed through conventional feature extraction algorithms and statistical methods. For example, conventional image processing techniques have been used to detect tongue features and their corresponding areas [4–6]. However, those studies lack detailed assessment methods and results [4, 5]. In recent years, artificial intelligence (AI) has been actively applied to medical technology, and significant progress in deep learning image processing has eliminated the need for manual extraction of image features [7, 8]. Furthermore, transfer learning allows a deep learning model pretrained on one large dataset to be easily adapted to interpret image categories in a different dataset. For example, Iqbal et al. applied transfer learning to detect the synovial fluid of the human knee joint [9]. Several studies have also applied Gradient-weighted Class Activation Mapping or other visualization techniques to roughly locate tongue features [10–12]. Only two studies have applied deep learning object detection techniques to mark tongue features. Weng et al. detected fissures and tooth marks with rectangles, but the marking was coarse and details of the dataset construction were not described [13]. Zhang et al. detected several features, but the performance was not clear [53]. This study built a manually annotated dataset for several tongue features in TCM and applied the well-known deep learning object detection model "You Only Look Once v4-tiny" (YOLOv4-tiny) [14]. The model marks tongue features with rectangles, so it is not limited to simply determining the existence of a tongue feature; users can clearly see the locations of tongue features.

Many tongue features are clinically examined in TCM, including tongue fissures, tooth marks, and thin and thick coatings, as shown in Fig 1. The clinical significance of these features can be interpreted from the perspectives of TCM and modern medicine.

Fig 1. Examples of tongue features.

Both tongues contain tooth marks on the edges and fissures in the middle. The tongue on the left has a thin coating, while the one on the right has a thick coating.

https://doi.org/10.1371/journal.pone.0296070.g001

Tongue fissures

Some researchers believe that fissured tongues are hereditary [15] and positively correlated with age and male sex [16]; fissures are very rare in children aged less than 10 years [17]. A burning sensation in the mouth is associated with tongue fissures [18, 19], consistent with the TCM concept that fissures arise from excessive heat or inadequate body fluid [20]. Generally, a fissured tongue does not directly indicate a specific disease. For instance, the reported incidence of tongue fissures ranges widely (from 20% to 95%) in patients with Down syndrome across three different studies [21–23], and also in patients with psoriasis (from 4% to 66%) [24–26]. Not only do the incidence rates of fissured tongue vary widely in specific diseases, but it also remains unclear whether tongue fissures are simply a normal phenomenon of aging from adolescence onwards, and when they acquire pathological significance. Tongue fissures lack a consistent, rigorous standard of interpretation.

Tooth marks

Tooth marks can be caused by an excessive size of the tongue [27]. Some studies have found that tooth marks are associated with three obesity-related disorders: obstructive sleep apnea [28], nocturnal intermittent hypoxia, and snoring [29], each of which disturbs sleep and leads to mental fatigue. In TCM diagnostics, obesity manifests the symptom of “dampness” (resulting from fat in the body), while a deficiency in qi (vital energy) is marked by fatigue [20]. Tooth marks are therefore important indicators of these contributing factors to illness in TCM theory.

Thick coating

Many recent studies have explored the phenomenon of thick coatings on the tongue. A thick coating is positively correlated with bad breath [30] and Helicobacter pylori infection [31]. In another study, of 459 patients with dysphagia who sought medical attention, the thickness of the tongue coating was negatively correlated with food intake [32]. The findings of these three studies are similar to the phenomenon of dampness in the middle burner or food masses (indigestion) in TCM [20].

Yellow coating

Smoking, poor oral hygiene, food, and medications can result in tongue coating discoloration [33, 34]. Yellow coatings on tongues may be related to specific diseases. In a Japanese study involving 969 individuals aged 30–79 years, a yellow coating on the tongue was associated with diabetes [34], while a Chinese study reported that the extent of yellow coating (light yellow, or yellow) was associated with different types of eczema (subacute, acute, or chronic) [35]. In another study, around 75% of patients with chronic gastritis had a yellow coating on the tongue, and almost all of those infected with H. pylori had a yellow coating [31]. A yellow coating is also associated with the presence of Bacillus on the tongue [36]. Yellow coatings on tongues in individuals with diabetes, eczema, and bacterial infections are similar to the phenomena associated with dampness-heat (such as chronic inflammation or infection) in TCM theory [20].

Materials and methods

Ethical statement

This research was reviewed and approved by the institutional review board of China Medical University Hospital (registration number CMUH107-REC2-146). Data were obtained from China Medical University Hospital in April 2020 and analyzed anonymously. Informed consent was obtained from all participants when they first provided their data.

Overview of the dataset and training process

The overview of this study is illustrated in Fig 2.

The construction of the tongue feature dataset and training process of the deep learning model is illustrated in Fig 3 and described in detail below.

Fig 3. Development of the tongue feature dataset and training process of the deep learning model.

https://doi.org/10.1371/journal.pone.0296070.g003

Tongue feature annotation process and analysis

Currently, no publicly available manually annotated dataset exists for TCM tongue features. Deep learning object detection techniques require a huge number of annotated images with specific features. In this study, tongue images recorded over several years in the Department of Chinese Medicine of China Medical University Hospital were used to construct a dataset of tongue features. The tongue diagnosis data cover the period from January 2008 through March 2020 and include a total of 2,010 images and corresponding interpretation reports issued by TCM physicians. The interpretation reports are in plain text format and mainly specify which tongue features exist, without precise positioning information. In the environment for capturing tongue images, the digital camera, ring light, and chin rest were stably set up and covered with a focusing cloth. Participants were guided to stabilize their chins on the chin rest and protrude their tongues to the appropriate position for image capture. For 764 of the images in this dataset, tongues and tongue features such as tongue fissures, tooth marks, thick coatings, and yellow coatings have been manually annotated with minimal rectangles using the assisted annotation tool LabelMe [37]. These 764 images belong to 652 different tongues and were obtained in a darkroom on different dates. More images will be annotated in the future.
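LabelMe stores each rectangle as two corner points in a JSON file. As a minimal sketch of how such annotations can be converted into the normalized text format expected by darknet-style trainers, consider the Python snippet below; the class names and indices are assumptions for illustration, since the exact label strings used in this dataset are not published.

    import json
    from pathlib import Path

    # Hypothetical class names/indices; the paper does not list the label strings.
    CLASSES = {"tongue": 0, "fissure": 1, "tooth_mark": 2,
               "thick_coating": 3, "yellow_coating": 4, "fissure_total_area": 5}

    def labelme_to_yolo(json_path):
        """Convert one LabelMe rectangle-annotation file to YOLO txt lines."""
        data = json.loads(Path(json_path).read_text())
        img_w, img_h = data["imageWidth"], data["imageHeight"]
        lines = []
        for shape in data["shapes"]:
            if shape.get("shape_type") != "rectangle":
                continue  # this dataset uses minimal rectangles only
            (x1, y1), (x2, y2) = shape["points"]
            # YOLO expects (class, x_center, y_center, width, height), normalized.
            xc = (x1 + x2) / 2 / img_w
            yc = (y1 + y2) / 2 / img_h
            bw = abs(x2 - x1) / img_w
            bh = abs(y2 - y1) / img_h
            lines.append(f"{CLASSES[shape['label']]} {xc:.6f} {yc:.6f} {bw:.6f} {bh:.6f}")
        return lines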

The tongue feature annotation process is illustrated in Fig 4. TCM Expert A referenced formal interpretation reports (issued by TCM physicians in the past) and confirmed the interpretation criteria with a senior TCM physician with more than 30 years’ experience before commencing the first round of annotation. Tongues, thick coatings, and yellow coatings were annotated by Expert A. With regard to tongue fissures and tooth marks, because of the higher interobserver agreement for these features [1], TCM Experts B and C were included to assist with annotation. Expert A annotated 264 images, while Experts B and C each annotated 250 images. To ensure consistency among all three experts, 15 images were first annotated by Experts B and C, then modified one by one by Expert A. Subsequently, the same process was followed for another 15 images, then finally for another 20 images. After a higher consistency was achieved, Experts B and C each separately annotated an additional 200 images.

After this first round of annotation, Expert A re-examined and modified all annotations in all images over two rounds. Due to the large variation in fissure sizes and the occurrence of tens of fissures on a single tongue, a new annotation class ("total area of fissures") was created, whereby a single minimal rectangle encloses all fissures on each tongue. This new annotation allows the total area of fissures to be cropped, yielding a higher-resolution image that includes all fissures and enabling more accurate fissure detection. Finally, Expert A re-examined and modified all annotations once more. In summary, one round of initial annotation and three rounds of detailed modification were performed. Fig 2A illustrates an example (from the set of 764 annotations) of how all tongue features were annotated to prepare the dataset.
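In principle, this enclosing rectangle follows mechanically from the individual fissure rectangles. A minimal sketch, assuming boxes are given as (x1, y1, x2, y2) tuples:

    def total_fissure_area(fissure_boxes):
        """Minimal rectangle enclosing all fissure boxes, each (x1, y1, x2, y2)."""
        return (min(b[0] for b in fissure_boxes),
                min(b[1] for b in fissure_boxes),
                max(b[2] for b in fissure_boxes),
                max(b[3] for b in fissure_boxes))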

AI training

The above annotated tongue feature dataset was used to train the deep learning object detection model YOLOv4-tiny, the “tiny” version of the widely used object detection model YOLOv4.

The YOLOv4-tiny source code used in this study was obtained from https://github.com/AlexeyAB/darknet; the same website also provides pretrained weights for the publicly available MS COCO (Microsoft COCO: Common Objects in Context) dataset [38]. Following conventional training policy, 70% of the dataset was used for training and 30% for testing. Average precision at an intersection over union of at least 50% (AP50) was used as the evaluation metric. AP50 is commonly used to evaluate the performance of an object detection model and has been defined in detail in previous research [39]. To maximize the AP50 value, many of the model’s default training settings were modified, as described in the following text.
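For reference, AP50 treats a predicted box as a true positive only when its intersection over union (IoU) with an unmatched ground-truth box of the same class is at least 0.5. A minimal sketch of the IoU computation, assuming boxes are given as (x1, y1, x2, y2) tuples:

    def iou(a, b):
        """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
        area_a = (a[2] - a[0]) * (a[3] - a[1])
        area_b = (b[2] - b[0]) * (b[3] - b[1])
        return inter / (area_a + area_b - inter)

    # A detection counts toward AP50 when iou(pred, gt) >= 0.5 and the
    # ground-truth box has not already been matched to another prediction.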

Using pretrained weights to perform transfer learning on new, small datasets usually yields good results quickly. Due to the large number of objects in this study’s dataset, the effects of training with and without pretrained weights were compared.

  1. Color augmentation: color augmentation prevents the model from overfitting the training set by adjusting image saturation, exposure, and hue. The model is thus better adapted to the variability of real-world tongue image coloring during tongue feature detection, although the interpretation of some features, such as thick and yellow coatings, can be affected by changes in color. Therefore, the effects of training with and without color augmentation were compared for each tongue feature.
  2. Aspect ratios and image resizing: the model’s ability to differentiate was increased by changing the aspect ratios and scales of images, exposing the model to tongue features at varying aspect ratios and scales. Three related parameters were involved in the model’s training settings:
    1. Jitter: this parameter denotes the degree of change in the aspect ratio of an image. Jitter changes the aspect ratio by a factor between (1 − 2*jitter) and (1 + 2*jitter) in the last layer of the model. In addition to the model’s default value of 0.3, a value of 0.1 (decreasing the degree of change in aspect ratio) was tested for comparison.
    2. Random: this parameter increases the degree of change in the network scale. Enabling it (random = 1) randomly resizes the network input by a factor between 1/1.4 and 1.4 in the last layer of the model every 10 epochs. As the default setting (random = 0) disables this technique, outcomes from the default and enabled settings were compared.
    3. Resize: this parameter denotes the degree of change in resized images. The scale of images is changed by a factor between 1/resize and resize before training, with a default value of 1.5. In this study, a value of 2 (increasing the degree of change in image scales) was also tested.
  3. Learning rate selection: the best possible value for the learning rate is usually inferred through multiple experiments, or by using adaptive learning rate policies such as SGDR (Stochastic Gradient Descent with warm Restarts) [40–42]. The default initial learning rate is 0.00261; it drops to one-tenth of the initial value (0.000261) when training reaches 80% completion, and to 1% of the initial value (0.0000261) at 90%. SGDR and an alternative initial learning rate of 0.005 were tested to compare differences in training results (both schedules are sketched after this list).
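The two learning-rate policies compared above can be summarized as follows. This is a sketch rather than the darknet implementation; the SGDR cycle length and multiplier are illustrative assumptions, since the restart schedule actually used is not reported.

    import math

    def step_lr(it, max_it=50_000, lr0=0.00261):
        """Default step policy described above:
        1/10 of the initial rate at 80% of training, 1/100 at 90%."""
        if it >= 0.9 * max_it:
            return lr0 * 0.01
        if it >= 0.8 * max_it:
            return lr0 * 0.1
        return lr0

    def sgdr_lr(it, lr_max=0.00261, lr_min=0.0, cycle=10_000, t_mult=2):
        """SGDR [41]: cosine annealing with warm restarts.
        cycle and t_mult are assumed values for illustration."""
        t, length = it, cycle
        while t >= length:       # locate the position within the current cycle
            t -= length
            length *= t_mult     # each restart lengthens the next cycle
        return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * t / length))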

Each setting was used for 10 training rounds (each of 50,000 epochs), because several experiments demonstrated that the best result usually occurs before 50,000 epochs. Because different training rounds sharing the same setting may yield slightly different AP50 values, both the best and the average AP50 values over the 10 rounds were recorded, giving a more objective view than single-round training. The combined values of the improved settings were then used to train the model again, to obtain the model with the highest AP50 value.

Results and discussion

The characteristics of the tongue feature dataset constructed in this study and the training results of the object detection model for the dataset are described separately in the following sections.

Tongue feature dataset characteristics

The numbers of images containing each tongue feature and the numbers of feature objects are presented in Table 1. Each feature object is annotated with a minimal rectangle, so each tongue feature may comprise several feature objects. The distributions of object counts per tongue for the features in Table 1 are illustrated in Fig 5.

Fig 5. Distributions of tongue feature counts.

(A) Distribution of fissure counts, (B) Distribution of tooth mark counts, (C) Distribution of thick coating counts, (D) Distribution of yellow coating counts.

https://doi.org/10.1371/journal.pone.0296070.g005

Table 1. Numbers of images and numbers of feature objects in the dataset.

https://doi.org/10.1371/journal.pone.0296070.t001

Fissures

A total of 73% of the images contain fissures; as illustrated in Fig 5A, around two-thirds (61%) of these images contain fewer than 5 fissures. Ambiguities exist in manual annotation. For example, Fig 6 depicts manually annotated tongues with more than 20 fissures, where crisscrossing fissures complicate the annotation, although their presence adds diversity to the dataset and they are therefore valuable contributors. Importantly, 80% of all images contain fewer than 8 fissures, so the manual annotations still provide good consistency during AI model training; the small number of ambiguous fissure patterns has little effect on training. When combined with the "total area of tongue fissures" annotations shown in Fig 7, the information is more than sufficient for clinical applications. From the TCM clinical point of view, the focus is on the presence or absence of fissures and the size of the fissure area, so the manual annotations should be sufficient for clinical use.

Tooth marks

Almost all images contain tooth marks, and 83% of the images contain between 7 and 13 tooth marks, as illustrated in Fig 5B. The depressions of some tooth marks are not obvious in a 2-dimensional image taken from a single viewpoint, appearing only as darker regions, as shown in Fig 8. Manual annotation of indistinct tooth marks is difficult, because repeated inspections are required to minimize omissions.

Thick coatings

Most (91%) of the tongue images contain a thick coating. One tongue may contain multiple thick coating areas due to an uneven distribution of the tongue coating, as shown in Fig 9. Of the 696 images of tongues with thick coatings, most (85%) contain only one area of thick coating, as illustrated in Fig 5C.

Yellow coatings

Over two-thirds (69%) of tongue images have a yellow coating, and the uneven distribution of tongue coating can form multiple yellow coating areas, as shown in Fig 10. Almost all (98%) tongues with a yellow coating contain only one or two areas of yellow coating, as illustrated in Fig 5D.

Distributions of area percentages of feature objects

Fig 11 illustrates the distribution of area percentages of feature objects (relative to the rectangle containing the whole tongue, shown as a dashed green rectangle in Fig 2A). Marked differences exist in the numbers and areas of each tongue feature; the areas of yellow coatings, thick coatings, and total areas of fissures are markedly larger than the areas covered by individual fissures and tooth marks. Notably, although the analysis identified a large total number of fissures and tooth marks, these objects are small and are therefore more difficult for AI model training to cope with than larger objects such as yellow coatings, thick coatings, and total areas of fissures (see Table 1).

Fig 11. The distribution of area percentages of feature objects (with respect to the rectangle containing the whole tongue).

Avg: average, Stdev: standard deviation.

https://doi.org/10.1371/journal.pone.0296070.g011
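The area percentages plotted in Fig 11 follow directly from the annotation rectangles. A minimal sketch, assuming boxes are given as (x1, y1, x2, y2) tuples:

    def area_percentage(feature_box, tongue_box):
        """Area of a feature rectangle as a percentage of the whole-tongue rectangle."""
        fa = (feature_box[2] - feature_box[0]) * (feature_box[3] - feature_box[1])
        ta = (tongue_box[2] - tongue_box[0]) * (tongue_box[3] - tongue_box[1])
        return 100.0 * fa / ta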

Training results of the object detection model

The training results for each tongue feature are presented in Table 2 (refer to S1 Table for details), which lists the best AP50 values, as well as the AP50 values obtained with default settings and pretrained weights. At over 40 frames per second on an NVIDIA GeForce GTX 1060, the model was capable of detecting tongue features from any viewpoint in real time. The model size is only 22.4 MB. Several models released after the completion of this study were also trained (partially fine-tuned) for comparison (S2 Table). Among them, PP-YOLOE+ [43] and DINO [44] show a more significant improvement in accuracy, although the difference may not be noticeable to the naked eye. Nonetheless, their model sizes (773 MB for PP-YOLOE+ and 839 MB for DINO) are more than 34 times larger than that of the YOLOv4-tiny model used in this study. The benefits of running these larger models on mobile devices require further evaluation.

With regard to the manual marking of tongue features, fissures and tooth marks are less dependent on gradation and easier to annotate than thick and yellow coatings, but the fissure and tooth mark areas are smaller and their objects more numerous. The AP50 values for the total area of fissures and for thick coatings both exceed 70%, the AP50 values for tooth marks and yellow coatings are approximately 60%, and the AP50 value for fissures is only 47.67%.

The much higher AP50 value for the total area of fissures compared with that for individual fissures may be due to the low input resolution (416×416) of the YOLOv4-tiny model used in this study, and because tongue fissures are mainly small objects that are not easily detected. Twenty images were randomly selected from the test set to compare expert manually annotated fissures with model-predicted fissures. Notably, ambiguities exist during manual interpretation; for instance, small tongue fissures may appear to be part of a larger fissure, which yields lower AP50 values for fissures and therefore lower accuracy, as illustrated in Fig 12 and S8–S10 Figs. From the current results, the differences between the model’s annotations and the manual annotations mostly stem from acceptably ambiguous zones of interpretation, and the predicted results provided by the model are satisfactory.

Fig 12. Multiple fissures are marked as a large fissure.

The left is manually annotated, while the right is marked by the model.

https://doi.org/10.1371/journal.pone.0296070.g012

Training results for each setting (as presented in Table 3, and also in S1 and S2 Figs) are described as follows:

  1. The accuracy of tongue fissure detection is not greatly affected by the inclusion or exclusion of pretrained weights. When pretrained weights are not used, the detection accuracies are higher for tooth marks, thick coatings and yellow coatings, but lower for fissures. Fissures are smaller than other features and are crisscrossed (as in Fig 6), making fissure detection difficult for AI models.
  2. Color augmentation lowers the detection effect for thick coatings and yellow coatings, features more related to gradation than features such as tooth marks and fissures.
  3. Reducing the variation in image aspect ratios (jitter) decreases detection accuracy for all features, probably because the model loses the ability to identify features with varying aspect ratios. In general, increasing the variation in network size (random) benefits every feature, whereas increasing the variation in image size (resize) yields obvious positive effects only for thick coating and yellow coating detection; the reason for this phenomenon is yet to be determined.
  4. Compared with the model’s default learning rate of 0.00261, using another initial learning rate (0.005) or an adaptive learning rate policy (SGDR) yielded better results.

A single model for simultaneously detecting all features was also trained; the training results are presented in S3 Fig. However, the mean AP50 (mAP50) value was only 50.98%; the individual AP50 values for fissures, tooth marks, thick coatings, yellow coatings, and total areas of fissures were 34.06%, 50.64%, 58.93%, 52.83%, and 58.46%, respectively. This may be because more than 80% of the objects were small objects such as fissures (21.76%) and tooth marks (61.82%), and because the detection of each feature required different settings. Compared with the mean AP50 of YOLOv4-tiny on MS COCO 2017, which is about 42% [45], the models trained in this study exhibit quite good performance. Some detection results are presented in S4–S7 Figs.

YOLOv4-tiny has a very fast computing speed and is therefore particularly appropriate for mobile devices with limited computing power, such as smartphones or tablets. The models trained in this study can be used to record tongue features easily and dynamically in real time during outpatient clinic visits. Users can take screenshots of the clearest viewing angle of tongue features as evidence, without needing to set up a shooting environment or wait for the results to be interpreted by TCM physicians.
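As an illustration of the kind of real-time loop this enables, the sketch below runs a YOLOv4-tiny network on a live camera stream with OpenCV’s DNN module. The configuration and weight file names are assumptions; any YOLOv4-tiny model trained on the tongue dataset would be loaded the same way.

    import cv2
    import numpy as np

    # Hypothetical file names for a YOLOv4-tiny model trained on the tongue dataset.
    net = cv2.dnn.readNetFromDarknet("yolov4-tiny-tongue.cfg", "yolov4-tiny-tongue.weights")
    model = cv2.dnn_DetectionModel(net)
    model.setInputParams(size=(416, 416), scale=1 / 255.0, swapRB=True)

    cap = cv2.VideoCapture(0)                  # default camera
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        classes, scores, boxes = model.detect(frame, confThreshold=0.25, nmsThreshold=0.45)
        for cls, score, box in zip(np.array(classes).flatten(),
                                   np.array(scores).flatten(), boxes):
            x, y, w, h = box
            cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
            cv2.putText(frame, f"{int(cls)}: {float(score):.2f}", (x, y - 5),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 1)
        cv2.imshow("Tongue features", frame)
        if cv2.waitKey(1) == 27:               # Esc exits the loop
            break
    cap.release()
    cv2.destroyAllWindows()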

Many studies build on the development of YOLO, and various papers offer detailed reviews and comparisons [46, 47]. YOLO is famous for its speed, but many other object detection models exist, each with its own strengths and weaknesses [48, 49]. Object detection models, including YOLO, are based on convolutional neural networks. Transformer-based models have recently entered the field of object detection and achieve similar accuracy [50]. However, due to their high computational demands, they remain challenging to deploy on mobile devices.

While deep learning has been applied to the analysis of tongue images, this AI modality has yet to be applied to the clinical interpretation of TCM tongue diagnosis. For the analysis of tongue images, the CHDNet model combines deep learning and support-vector machine classifiers to extract and classify tongue features [51]. However, the digital features extracted by the CHDNet model are not visual features and are therefore not related to the tongue features quantified in TCM. Consequently, the digital features in the CHDNet model cannot be applied to clinical tongue inspection diagnosis. In addition, the classification results of this model show either “gastritis” or “no gastritis”, which does not correspond to a disease term or diagnosis in TCM language. In another deep learning model, the analysis of tongue color outperformed conventional imaging processing methods that lacked deep learning techniques [52]. Thus, these deep learning models do not relate to the positioning of tongue features and are therefore unsuitable for application in TCM clinical and teaching environments.

To date, only two studies in the literature are similar to this one. In the first, deep learning object detection technology was used to mark tongue fissures and tooth marks with rectangles [13]. Notably, the marked rectangles in that study are much larger than the areas of the tongue features, the mAP50 is only 34.42%, and the researchers do not describe details of the manually annotated dataset. The second study used deep learning object detection to mark tongue fissures, tooth marks, thick coatings, peeling coatings, and red dots with rectangles [53]. However, these markings missed many obvious feature objects, and the definition of accuracy was very rough: among the many features on a single tongue, only the one object with the best detection result was included in the final accuracy evaluation, rather than all detected objects.

This study has some limitations. First, the tongue feature dataset is smaller than other datasets used for object detection, and it includes records from a single hospital, which may limit its diversity. Second, because the goal of this study was to apply tongue feature detection on mobile devices, the model is fast but compromises on accuracy; a more accurate model could be used on high-end devices in the future. Third, the detection results of the trained model require clinical verification. Clinical TCM physicians can provide feedback based on their domain knowledge and interactive analysis, which will improve the accuracy and practicality of the deep learning object detection model.

Conclusions

This study has constructed a tongue feature dataset and trained a deep learning object detection model that locates tongue features in real time. A step-by-step setting-tuning process was proposed to push the model as close to its limits as possible. Although there are differences between the features marked manually and those marked by the model, these variances lie within the clinically acceptable gray area; the model’s predictions are therefore deemed satisfactory. A real advantage of the training methods and materials used in this study is that they can be easily and quickly adopted in the training of any new model. In conclusion, we endorse the development of more AI applications relating to tongue diagnosis in TCM, a traditional diagnostic technique informed by a huge body of expert experience accumulated over many centuries. More such AI applications will promote the wider use of this tool in the analysis of a patient’s overall health condition and ensure that the technique continues to be valued in the future.

Supporting information

S1 Fig. Setting tuning results for fissures and total areas of fissure detection.

https://doi.org/10.1371/journal.pone.0296070.s001

(TIF)

S2 Fig. Setting tuning results for tooth mark, thick coating and yellow coating detection.

https://doi.org/10.1371/journal.pone.0296070.s002

(TIF)

S3 Fig. Setting tuning results for all tongue features detection.

https://doi.org/10.1371/journal.pone.0296070.s003

(TIF)

S5 Fig. Tooth marks detected by our model.

https://doi.org/10.1371/journal.pone.0296070.s005

(TIF)

S6 Fig. Thick coatings detected by our model.

https://doi.org/10.1371/journal.pone.0296070.s006

(TIF)

S7 Fig. Yellow coatings detected by our model.

https://doi.org/10.1371/journal.pone.0296070.s007

(TIF)

S1 Table. Evaluation metrics for tongue features.

https://doi.org/10.1371/journal.pone.0296070.s011

(DOCX)

S2 Table. Comparison of models for tongue features.

https://doi.org/10.1371/journal.pone.0296070.s012

(DOCX)

Acknowledgments

Special thanks to Iona J. MacDonald from China Medical University (Taichung, Taiwan) for her editing of this manuscript.

References

  1. Lo LC, Chen YF, Chen WJ, Cheng TL, Chiang JY. The Study on the Agreement between Automatic Tongue Diagnosis System and Traditional Chinese Medicine Practitioners. Evid Based Complement Alternat Med. 2012;2012:505063. Epub 2012/08/28. pmid:22924055.
  2. Kim M, Cobbin D, Zaslawski C. Traditional Chinese medicine tongue inspection: an examination of the inter- and intrapractitioner reliability for specific tongue characteristics. J Altern Complement Med. 2008;14(5):527–36. pmid:18564955.
  3. Ko MM, Lee JA, Kang BK, Park TY, Lee J, Lee MS. Interobserver reliability of tongue diagnosis using traditional Korean medicine for stroke patients. Evid Based Complement Alternat Med. 2012;2012:209345. Epub 20120228. pmid:22474492.
  4. Lo LC, Hou MC, Chen YL, Chiang JY, Hsu CC, editors. Automatic Tongue Diagnosis System. 2009 2nd International Conference on Biomedical Engineering and Informatics; 2009 17–19 Oct.
  5. Hsu Y, Chen Y, Lo L, Chiang JY, editors. Automatic tongue feature extraction. 2010 International Computer Symposium (ICS2010); 2010 16–18 Dec.
  6. Naveed S, Geetha G. Intelligent Diabetes Detection System based on Tongue Datasets. Curr Med Imaging Rev. 2019;15(7):672–8. pmid:32008515.
  7. Schmidhuber J. Deep learning in neural networks: an overview. Neural Netw. 2015;61:85–117. Epub 2014/12/03. pmid:25462637.
  8. Medenica S, Zivanovic D, Batkoska L, Marinelli S, Basile G, Perino A, et al. The Future Is Coming: Artificial Intelligence in the Treatment of Infertility Could Improve Assisted Reproduction Outcomes-The Value of Regulatory Frameworks. Diagnostics (Basel). 2022;12(12). Epub 20221128. pmid:36552986.
  9. Iqbal I, Shahzad G, Rafiq N, Mustafa G, Ma J. Deep learning-based automated detection of human knee joint’s synovial fluid from magnetic resonance images with transfer learning. IET Image Processing. 2020;14(10):1990–8.
  10. Yang Z, Zhao Y, Yu J, Mao X, Xu H, Huang L. An Intelligent Tongue Diagnosis System via Deep Learning on the Android Platform. Diagnostics (Basel). 2022;12(10). Epub 20221010. pmid:36292140.
  11. Wang X, Liu J, Wu C, Liu J, Li Q, Chen Y, et al. Artificial intelligence in tongue diagnosis: Using deep convolutional neural network for recognizing unhealthy tongue with tooth-mark. Comput Struct Biotechnol J. 2020;18:973–80. Epub 2020/05/06. pmid:32368332.
  12. Zhou J, Li S, Wang X, Yang Z, Hou X, Lai W, et al. Weakly Supervised Deep Learning for Tooth-Marked Tongue Recognition. Front Physiol. 2022;13:847267. Epub 20220412. pmid:35492602.
  13. Weng H, Li L, Lei HW, Luo ZM, Li CD, Li SZ. A weakly supervised tooth-mark and crack detection method in tongue image. Concurrency and Computation: Practice and Experience. 2021;33(16).
  14. Wang C-Y, Bochkovskiy A, Liao H-YM. Scaled-YOLOv4: Scaling Cross Stage Partial Network. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR): IEEE Computer Society; 2021. p. 13024–33.
  15. Assimakopoulos D, Patrikakos G, Fotika C, Elisaf M. Benign migratory glossitis or geographic tongue: an enigmatic oral lesion. Am J Med. 2002;113(9):751–5. pmid:12517366.
  16. Hsu PC, Wu HK, Huang YC, Chang HH, Chen YP, Chiang JY, et al. Gender- and age-dependent tongue features in a community-based population. Medicine (Baltimore). 2019;98(51):e18350. Epub 2019/12/22. pmid:31860990.
  17. Jarvinen J, Mikkonen JJ, Kullaa AM. Fissured tongue: a sign of tongue edema? Med Hypotheses. 2014;82(6):709–12. Epub 2014/04/05. pmid:24698850.
  18. Ching V, Grushka M, Darling M, Su N. Increased prevalence of geographic tongue in burning mouth complaints: a retrospective study. Oral Surg Oral Med Oral Pathol Oral Radiol. 2012;114(4):444–8. Epub 2012/08/21. pmid:22901641.
  19. Soto-Rojas AE, Villa AR, Sifuentes-Osornio J, Alarcón-Segovia D, Kraus A. Oral manifestations in patients with Sjögren’s syndrome. J Rheumatol (ISSN 0315-162X).
  20. Schnorrenberger CC, Schnorrenberger B. Pocket Atlas of Tongue Diagnosis: With Chinese Therapy Guidelines for Acupuncture, Herbal Prescriptions, and Nutrition. Thieme; 2011.
  21. Zeligman I, Scalia SP. Dermatologic manifestations of mongolism. AMA Arch Derm Syphilol. 1954;69(3):342–4. Epub 1954/03/01. pmid:13137604.
  22. Kullaa-Mikkonen A, Mikkonen M, Kotilainen R. Prevalence of different morphologic forms of the human tongue in young Finns. Oral Surg Oral Med Oral Pathol. 1982;53(2):152–6. Epub 1982/02/01. pmid:6949120.
  23. Ercis M, Balci S, Atakan N. Dermatological manifestations of 71 Down syndrome children admitted to a clinical genetics unit. Clin Genet. 1996;50(5):317–20. pmid:9007317.
  24. Daneshpazhooh M, Moslehi H, Akhyani M, Etesami M. Tongue lesions in psoriasis: a controlled study. BMC Dermatol. 2004;4(1):16. Epub 2004/11/06. pmid:15527508.
  25. Zargari O. The prevalence and significance of fissured tongue and geographical tongue in psoriatic patients. Clin Exp Dermatol. 2006;31(2):192–5. Epub 2006/02/21. pmid:16487088.
  26. Al Qahtani NA, Deepthi A, Alhussain NM, Al Shahrani BAM, Alshehri H, Alhefzi A, et al. Association of geographic tongue and fissured tongue with ABO blood group among adult psoriasis patients: a novel study from a tertiary care hospital in Saudi Arabia. Oral Surg Oral Med Oral Pathol Oral Radiol. 2019;127(6):490–7. Epub 2019/03/25. pmid:30902460.
  27. Yanagisawa K, Takagi I, Sakurai K. Influence of tongue pressure and width on tongue indentation formation. J Oral Rehabil. 2007;34(11):827–34. pmid:17919249.
  28. Weiss TM, Atanasov S, Calhoun KH. The association of tongue scalloping with obstructive sleep apnea and related sleep pathology. Otolaryngol Head Neck Surg. 2005;133(6):966–71. pmid:16360522.
  29. Tomooka K, Tanigawa T, Sakurai S, Maruyama K, Eguchi E, Nishioka S, et al. Scalloped tongue is associated with nocturnal intermittent hypoxia among community-dwelling Japanese: the Toon Health Study. J Oral Rehabil. 2017;44(8):602–9. pmid:28548303.
  30. van den Broek A, Feenstra L, de Baat C. A review of the current literature on aetiology and measurement methods of halitosis. J Dent. 2007;35(8):627–35. pmid:17555859.
  31. Liu X, Sun ZM, Liu YN, Ji Q, Sui H, Zhou LH, et al. The Metabonomic Studies of Tongue Coating in H. pylori Positive Chronic Gastritis Patients. Evid Based Complement Alternat Med. 2015;2015. pmid:26557866.
  32. Furuya J, Suzuki H, Tamada Y, Onodera S, Nomura T, Hidaka R, et al. Food intake and oral health status of inpatients with dysphagia in acute care settings. J Oral Rehabil. 2020;47(6):736–42. pmid:32196723.
  33. Schlager E, St Claire C, Ashack K, Khachemoune A. Black Hairy Tongue: Predisposing Factors, Diagnosis, and Treatment. Am J Clin Dermatol. 2017;18(4):563–9. pmid:28247090.
  34. Tomooka K, Saito I, Furukawa S, Maruyama K, Eguchi E, Iso H, et al. Yellow Tongue Coating is Associated With Diabetes Mellitus Among Japanese Non-smoking Men and Women: The Toon Health Study. J Epidemiol. 2018;28(6):287–91. pmid:29311441.
  35. Yu ZF, Zhang HF, Fu LJ, Lu XZ. Objective research on tongue manifestation of patients with eczema. Technol Health Care. 2017;25:S143–S9. pmid:28582901.
  36. Ye J, Cai X, Yang J, Sun X, Hu C, Xia J, et al. Bacillus as a potential diagnostic marker for yellow tongue coating. Sci Rep. 2016;6:32496. Epub 2016/09/01. pmid:27578261.
  37. Wada K. Labelme: Image Polygonal Annotation with Python. Available from: https://github.com/wkentaro/labelme.
  38. Lin T-Y, Maire M, Belongie S, Hays J, Perona P, Ramanan D, et al. Microsoft COCO: Common Objects in Context. Computer Vision–ECCV 2014; Cham: Springer International Publishing; 2014.
  39. Everingham M, Van Gool L, Williams CKI, Winn J, Zisserman A. The Pascal Visual Object Classes (VOC) Challenge. International Journal of Computer Vision. 2010;88(2):303–38.
  40. Robbins H, Monro S. A Stochastic Approximation Method. The Annals of Mathematical Statistics. 1951;22(3):400–7.
  41. Loshchilov I, Hutter F. SGDR: Stochastic Gradient Descent with Warm Restarts. International Conference on Learning Representations; 2017.
  42. Iqbal I, Odesanmi GA, Wang J, Liu L. Comparative Investigation of Learning Algorithms for Image Classification with Small Dataset. Applied Artificial Intelligence. 2021;35(10):697–716.
  43. Xu S, Wang X, Lv W, Chang Q, Cui C, Deng K, et al. PP-YOLOE: An evolved version of YOLO. arXiv:2203.16250 [Preprint]. 2022. Available from: https://ui.adsabs.harvard.edu/abs/2022arXiv220316250X.
  44. Zhang H, Li F, Liu S, Zhang L, Su H, Zhu J, et al. DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection. The Eleventh International Conference on Learning Representations; 2023.
  45. Wang C-Y, Bochkovskiy A, Liao H-YM. Scaled-YOLOv4: Scaling Cross Stage Partial Network. 2021:13029–38.
  46. Sirisha U, Praveen SP, Srinivasu PN, Barsocchi P, Bhoi AK. Statistical Analysis of Design Aspects of Various YOLO-Based Deep Learning Models for Object Detection. International Journal of Computational Intelligence Systems. 2023;16(1):126.
  47. Hussain M. YOLO-v1 to YOLO-v8, the Rise of YOLO and Its Complementary Nature toward Digital Manufacturing and Industrial Defect Detection. Machines. 2023;11(7):677.
  48. Kaur R, Singh S. A comprehensive review of object detection with deep learning. Digital Signal Processing. 2023;132.
  49. Zou Z, Chen K, Shi Z, Guo Y, Ye J. Object Detection in 20 Years: A Survey. Proceedings of the IEEE. 2023;111(3):257–76.
  50. Arkin E, Yadikar N, Xu X, Aysa A, Ubul K. A survey: object detection methods from CNN to transformer. Multimedia Tools and Applications. 2023;82(14):21353–83.
  51. Meng D, Cao G, Duan Y, Zhu M, Tu L, Xu D, et al. Tongue Images Classification Based on Constrained High Dispersal Network. Evid Based Complement Alternat Med. 2017;2017:7452427. Epub 2017/05/04. pmid:28465706.
  52. Hou J, Su H, Yan B, Zheng H, Sun Z, Cai X, editors. Classification of tongue color based on CNN. 2017 IEEE 2nd International Conference on Big Data Analysis (ICBDA); 2017 10–12 March.
  53. Zhang X, Chen ZK, Gao J, Huang W, Li P, Zhang JN. A two-stage deep transfer learning model and its application for medical image processing in Traditional Chinese Medicine. Knowledge-Based Systems. 2022;239.