
Exploring the power of data mining for uncovering traditional medicinal plant knowledge: A case study in Shahrbabak, Iran

  • Hossein Bibak,

    Roles Methodology, Writing – original draft

    Affiliation Faculty of Science, Department of Biology, University of Jiroft, Jiroft, Iran

  • Farzad Heydari,

    Roles Formal analysis, Software, Writing – review & editing

    Affiliation Faculty of Mathematics and Computer, Department of Computer Science, Shahid Bahonar University of Kerman, Kerman, Iran

  • Mohammad Sadat-Hosseini

    Roles Conceptualization, Investigation, Writing – original draft, Writing – review & editing

    m.hosseini@ujiroft.ac.ir

    Affiliation Faculty of Agriculture, Department of Horticultural Science, University of Jiroft, Jiroft, Iran

Abstract

The present study recorded indigenous knowledge of medicinal plants in Shahrbabak, Iran. We also describe a method that uses data mining algorithms to predict the mode of application of medicinal plants. Twenty-one individuals aged 28 to 81 were interviewed. First, the data were collected and analyzed using quantitative indices: the informant consensus factor (ICF), the cultural importance index (CI), and the relative frequency of citation (RFC). Second, the data were classified using support vector machines, J48 decision trees, neural networks, and logistic regression. In total, 141 medicinal plants from 43 botanical families were documented. Lamiaceae, with 18 species, was the dominant family, and leaves were the plant part most frequently used for medicinal purposes. Decoction was the most commonly used preparation method (56%), and therophytes were the dominant life form (48.93%). According to the RFC index, the most important species are Adiantum capillus-veneris L. and Plantago ovata Forssk., while Artemisia aucheri Boiss. ranked first according to the CI index. The ICF index showed that, among the ailment categories treated with plants in the Shahrbabak region, metabolic disorders had the highest informant consensus. Finally, the J48 decision tree algorithm consistently outperformed the other methods, achieving 95% accuracy in both the 10-fold cross-validation and the 70–30 data split scenarios. Among the tested algorithms, the developed model predicted the mode of application of medicinal plants with the highest accuracy.

Introduction

The use of plants in traditional medicine and for health purposes dates back to ancient times and is becoming increasingly popular in many parts of the world [1,2]. Medicinal plants are rich in active substances that can treat various diseases [3]. In novel drug development, the first and most critical stage is collecting and analyzing information on the medicinal plants used by various indigenous cultures [4]. Ethnopharmacological studies are needed to document past and present cultural practices involving plants around the world, and it is essential to record indigenous people’s knowledge of medicinal plants [1,5,6]. Such documentation also helps preserve traditional knowledge for future generations and other communities [7,8]. Throughout Iran, ethnopharmacological studies of plants have been conducted for many years [9–16]. With 900,000 ha of natural resources, the Shahrbabak region has many medicinal plants. However, the existing literature shows a significant gap in our understanding of how local populations use these species for medicinal purposes and disease treatment.

Data mining emerged in the mid-1990s as a method for uncovering hidden knowledge. It can identify complex patterns, discover potential causal relationships, and reveal hidden relationships and correlations between variables [17].

Data mining is the science of extracting patterns, information, and analyses from raw datasets produced by an organization, a society, or any other source. It turns raw data into useful information. At a more detailed level, data mining is one step in the Knowledge Discovery in Databases (KDD) process. In general, data mining can be divided into four main stages: determining goals, collecting and preparing data, extracting patterns, and evaluating the results [18]. Data mining algorithms fall into four categories according to how they learn:

  • Classification (supervised learning): a set of samples is provided to the model together with their labels, and the model must learn the relationship between the samples and their labels. The trained model can then assign labels to new samples. Classification algorithms include decision trees, support vector machines, neural networks, and logistic regression.
  • Clustering (unsupervised learning): the algorithm divides unlabeled samples into groups based on their similarity. A cost function and a distance measure are defined, and the algorithm tries to minimize the cost function under that distance measure.
  • Semi-supervised learning: both labeled and unlabeled data are used. Semi-supervised methods lie between unsupervised and supervised learning methods.
  • Reinforcement learning: the algorithm continuously learns by exchanging information and actions with its environment. When the machine receives a reward for performing specific actions, it learns how to act so as to receive more rewards in the future [19].
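The clustering description above (a distance measure plus a cost function that the algorithm reduces) can be illustrated with k-means, one common clustering algorithm. The sketch below is a minimal Python illustration under stated assumptions. It works on hypothetical one-dimensional values with k = 2 and hypothetical starting centroids. It uses absolute distance for assignment and the within-cluster sum of squares as the cost. It is not part of the study's pipeline, which uses only classification algorithms.

```python
def assign(points, centroids):
    """Index of the nearest centroid (absolute distance) for every point."""
    return [min(range(len(centroids)), key=lambda c: abs(p - centroids[c]))
            for p in points]

def cost(points, centroids, labels):
    """Within-cluster sum of squared distances, the cost that k-means reduces."""
    return sum((p - centroids[lab]) ** 2 for p, lab in zip(points, labels))

def kmeans(points, centroids, max_iter=10):
    """Alternate assignment and centroid update until the centroids stop moving."""
    centroids = list(centroids)
    for _ in range(max_iter):
        labels = assign(points, centroids)
        new = []
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # Each centroid moves to the mean of its members; an empty cluster keeps its centroid.
            new.append(sum(members) / len(members) if members else centroids[c])
        if new == centroids:
            break
        centroids = new
    return centroids, assign(points, centroids)

points = [1.0, 1.2, 0.8, 5.0, 5.3, 4.7]   # hypothetical 1-D values
start = [0.0, 1.0]                        # hypothetical initial centroids
centroids, labels = kmeans(points, start)
```

Each pass lowers (or keeps) the cost. Here the two groups around 1.0 and 5.0 are recovered after a few iterations.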

We used several data mining algorithms for prediction. The main classification algorithms used in this article are described below.

A decision tree is a supervised learning (classification) method. In a decision tree, each internal node tests an attribute, each branch represents a decision rule, and each leaf node represents an outcome (class). The topmost node is called the root node. Decision trees are flexible because they can easily model non-linear or unconventional relationships between features and classes, and they can capture interactions between predictors. They are also easy to interpret, because each path from the root to a leaf can be read as an if-then rule [19,20].
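J48 is Weka's implementation of the C4.5 algorithm. C4.5 chooses the attribute to test at each node by its gain ratio, which is information gain divided by the split information of the attribute. The sketch below is a minimal Python illustration of that criterion for one categorical attribute and is not the Weka code. The attribute values ("leaf", "seed", "resin") and their labels are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def split_info_gain(values, labels):
    """Information gain and gain ratio of splitting `labels` by categorical `values`."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    # Weighted entropy of the child nodes after the split.
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    # Split information penalizes attributes with many distinct values.
    split_info = entropy(values)
    ratio = gain / split_info if split_info > 0 else 0.0
    return gain, ratio

# Hypothetical records: plant part used -> mode of application.
part = ["leaf", "leaf", "seed", "seed", "resin", "resin"]
mode = ["oral", "oral", "oral", "oral", "topical", "topical"]
gain, ratio = split_info_gain(part, mode)
```

Every child node here is pure, so the gain equals the entropy of the labels. The gain ratio then divides that gain by log2(3), because the attribute has three equally frequent values.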

A support vector machine (SVM) is a supervised learning algorithm that is widely applied to classification problems in many fields. It can perform both linear and non-linear classification. SVM classifiers are based on linear separation of the data. Among the lines (hyperplanes) that separate the classes, the SVM selects the one with the largest margin, that is, the greatest distance from the nearest training samples of each class [19,20]. Non-linear classification is obtained by applying the same principle after mapping the data through a kernel function.
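The sketch below shows what a trained linear SVM computes. It assumes a hypothetical hyperplane (weights w and bias b) and two toy points. It does not train the model, since training solves a quadratic optimization problem for w and b. It only evaluates the decision function, the margin width 2/||w||, and the soft-margin objective that training minimizes.

```python
import math

def decision(w, b, x):
    """Signed score w·x + b; its sign gives the predicted class (+1 or -1)."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def predict(w, b, x):
    return 1 if decision(w, b, x) >= 0 else -1

def margin_width(w):
    """Distance between the two supporting hyperplanes: 2 / ||w||."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))

def soft_margin_objective(w, b, X, y, C=1.0):
    """0.5 * ||w||^2 + C * sum of hinge losses max(0, 1 - y_i * f(x_i))."""
    reg = 0.5 * sum(wi * wi for wi in w)
    return reg + C * sum(max(0.0, 1 - yi * decision(w, b, xi)) for xi, yi in zip(X, y))

# Hypothetical hyperplane x1 + x2 - 3 = 0 and two toy training points.
w, b = [1.0, 1.0], -3.0
X = [[1.0, 1.0], [2.5, 2.5]]
y = [-1, 1]
```

Both toy points lie on or beyond their supporting hyperplane, so their hinge losses are zero. The objective then reduces to the regularization term 0.5 * ||w||^2 = 1.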

Logistic regression is a classification algorithm that assigns observed samples to a discrete set of classes. Linear regression produces continuous numerical values. Logistic regression instead transforms its output with the logistic sigmoid function to return a probability, which can be mapped to two or more discrete classes. Logistic regression works well when the relationship in the data is approximately linear but performs poorly when there are non-linear relationships between the variables [19,20].
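With three target classes (oral, topical, both), logistic regression is commonly extended to the multinomial form. That form computes one linear score per class and replaces the sigmoid with the softmax function. The sketch below shows both functions and a prediction step. The weights, biases, and two-feature sample encoding are hypothetical and untrained.

```python
import math

def sigmoid(z):
    """Logistic function: maps any real score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def softmax(scores):
    """Turns one score per class into probabilities that sum to 1."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict_class(weights, biases, x, classes):
    """Linear score per class, softmax, then the most probable class."""
    scores = [sum(w * xi for w, xi in zip(wc, x)) + bc for wc, bc in zip(weights, biases)]
    probs = softmax(scores)
    best = max(range(len(classes)), key=lambda k: probs[k])
    return classes[best], probs

# Hypothetical, untrained parameters for three classes and two features.
classes = ["oral", "topical", "both"]
weights = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
biases = [0.0, 0.0, -0.5]
label, probs = predict_class(weights, biases, [1.0, 0.0], classes)
```

In the two-class case, softmax over the scores (z, 0) reduces to sigmoid(z).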

An artificial neural network (ANN) is an information processing paradigm inspired by biological neural systems, such as the brain. An ANN consists of several layers of simple processing elements called neurons. Each neuron performs two functions: it collects its inputs and produces an output. An overview of the theory, learning rules, and applications of the most important neural network models, along with their definitions and computational styles, is given in [21].
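The two functions of a neuron can be sketched as a weighted sum of the inputs followed by an activation function. Stacking neurons gives layers. This is a minimal forward pass under stated assumptions: a sigmoid activation, one hidden layer, and hypothetical untrained weights. Training, for example by backpropagation, would adjust these weights and is not shown.

```python
import math

def neuron(inputs, weights, bias):
    """Collect inputs as a weighted sum, then produce an output via a sigmoid."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def layer(inputs, weight_rows, biases):
    """A layer is a list of neurons that all receive the same inputs."""
    return [neuron(inputs, w, b) for w, b in zip(weight_rows, biases)]

def forward(x, hidden_w, hidden_b, out_w, out_b):
    """Feed-forward pass: input -> hidden layer -> output layer."""
    return layer(layer(x, hidden_w, hidden_b), out_w, out_b)

# Hypothetical, untrained weights: 2 inputs, 2 hidden neurons, 1 output neuron.
out = forward([1.0, 0.0],
              hidden_w=[[0.5, -0.5], [-0.5, 0.5]], hidden_b=[0.0, 0.0],
              out_w=[[1.0, -1.0]], out_b=[0.0])
```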

Data quality is vital in data analysis because incorrect data leads to wrong results. Fast detection of data quality issues reduces the effort and time needed to find and analyze them. Therefore, it is necessary to use data mining methods to find defects and fix wrong data [22].

Data mining starts with raw data and continues until new knowledge is formed. Data cleansing refers to identifying, removing, and correcting wrong data in tables, records, or databases. It also includes identifying incomplete or incorrect parts of the data and correcting or replacing them. Data integration combines data from multiple source systems into a single set of information for operational and analytical applications. During data selection, the relevant dataset is selected and retrieved. To increase the accuracy of the analysis, it is sometimes necessary to modify the raw data. One such modification is data transformation [23]. In addition, the right features must be identified. Selecting the most important features improves the efficiency of data mining algorithms and data understanding, reduces execution time and storage volume, and simplifies the model. Feature selection methods are divided into filter, wrapper, and embedded methods [24].

Very few studies have applied data mining methods to uncover hidden knowledge in ethnopharmacology. Data mining in ethnopharmacology has been found to offer two important advantages. First, it combines qualitative and quantitative data (such as observations and sensor information) to study phenomena that are practically inaccessible through either data type alone. Second, it provides a means of interpreting these data that can produce novel insights by exposing the biases inherent in each data type [25]. Axiotis et al. used an intelligent search system that combines active learning and reinforcement learning to support ethnopharmacological research. They reported that machine learning-powered research improved the effectiveness and efficiency of the domain expert by 3.1 and 5.14 times, respectively. The system retrieved 420 relevant ethnopharmacological documents in only seven hours, compared with an estimated 36 hours of human effort [26]. The current study documented ethnomedicinal knowledge of medicinal plants in the Shahrbabak region of Kerman Province, in southeastern Iran, and analyzed the medicinal plants used to treat various diseases. In addition, for the first time, we model and compare data mining algorithms to predict the mode of application of medicinal plants. Our study combines qualitative ethnobotanical fieldwork with data mining approaches to create a systematic framework for collecting and analyzing traditional medicinal plant data. It also demonstrates the potential of data mining as a tool for unlocking valuable insights from traditional knowledge systems.

Material and methods

Study area

The city of Shahrbabak (30° 11ʹ63ʺ N, 55° 11ʹ86ʺ E) is situated in the north-west of Kerman Province. It has an area of 13,500 square kilometers and an altitude of 1845 m above sea level. According to the 2006 Iranian census, the area had 43,916 residents. Shahrbabak is an ancient Iranian city. Meymand, one of Iran’s four ancient villages, is 36 kilometers from Shahrbabak. The town is near Sarcheshmeh and Miedook, Iran’s largest copper mines. Historians say the town was built by the Sassanid king Ardeshir Babakan 1800 years ago. Shahrbabak has a semi-arid climate with hot, dry summers and cold, dry winters. The region’s mean annual temperature, average annual rainfall, and mean relative humidity are 16.2°C, 162 mm, and 34%, respectively.

Collecting data and identifying plants

Data were collected from different parts of the Shahrbabak district in the north-west of Kerman Province. The interviewees were indigenous practitioners, sellers, shepherds, and medicinal herb vendors, who helped identify the plants they regarded as medicinal (the questionnaire is available in an online Supplementary file). Twenty-one individuals (11 females and 10 males) aged 28 to 81 were interviewed. Plants were collected from the Robat, Meymand, Khatoun Abad, Estabragh, Mehrabad, Dehej-Jowzam, Abdar, Barfe, and Khabr regions, all part of the Shahrbabak district. Information on vernacular names, plant part(s) used as pharmacological agents, medicinal uses, and methods of treatment and preparation was recorded and is shown in Table 1. The plants were dried, labeled, and preserved in the Herbarium of the Biology Department at the University of Jiroft for identification and future work. Medicinal plants were identified using Iranica flora [27], Palestine flora [28], Iraq flora [29], Turkey flora [30], and Iran flora (in color) [31]. Plant life forms were classified according to Raunkiaer’s system [32].

Table 1. Indigenous medicinal knowledge of plants from the study area.

https://doi.org/10.1371/journal.pone.0303229.t001

Data analysis

Ethnomedicinal information was evaluated using the medicinal use reports of plants. Each use report is defined by three variables, i, u, and s: informant i mentions the use of species s in use category u. The number of medicinal plants and the number of informants reporting the use of each species were counted, and quantitative value indices were calculated.

Informant consensus factor (ICF)

The informant consensus factor (ICF) was used to determine the homogeneity of the information provided by informants. Claims of medicinal use are termed ‘citations’. Each citation was classified into the ailment category for which the plant was considered effective. The ICF index was estimated as follows:

$$ICF = \frac{N_{ur} - N_t}{N_{ur} - 1}$$

Here, $N_{ur}$ is the number of use citations in each category and $N_t$ is the number of species used in that category [33].
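A minimal sketch of the ICF calculation follows. It assumes the standard formula ICF = (Nur − Nt) / (Nur − 1), where Nur is the number of use citations and Nt is the number of species in an ailment category. The input format is a hypothetical choice: a list of (category, species) pairs, one per citation. The category and species names are also hypothetical.

```python
def icf(n_ur, n_t):
    """Informant consensus factor for one ailment category.

    n_ur: number of use citations in the category
    n_t:  number of species used in the category
    """
    if n_ur <= 1:
        raise ValueError("ICF is undefined for a category with fewer than two citations")
    return (n_ur - n_t) / (n_ur - 1)

def icf_by_category(citations):
    """citations: iterable of (category, species) pairs, one pair per use citation."""
    counts, species = {}, {}
    for category, sp in citations:
        counts[category] = counts.get(category, 0) + 1
        species.setdefault(category, set()).add(sp)
    return {c: icf(counts[c], len(species[c])) for c in counts}

# Hypothetical citations.
citations = [("digestive", "species_A"), ("digestive", "species_A"),
             ("digestive", "species_B"), ("metabolic", "species_C"),
             ("metabolic", "species_C")]
result = icf_by_category(citations)
```

ICF approaches 1 when informants cite few species for many uses in a category (high consensus). It approaches 0 when almost every citation names a different species.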

The relative frequency of citation (RFC) index was computed with the following formula:

$$RFC = \frac{FC}{N}$$

Here, FC (frequency of citation) is the number of interviewees who mentioned a species as beneficial and N is the total number of participants in the survey. The RFC index ranges from 0 (no informant mentioned the species as beneficial) to 1 (all informants mentioned it as beneficial).

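The cultural importance index (CI) was estimated using the following equation:

$$CI_s = \sum_{u=u_1}^{u_{NC}} \sum_{i=i_1}^{i_N} \frac{UR_{ui}}{N}$$

Here, $UR_{ui}$ is the use report of informant i for species s in use category u, NC is the number of use categories, and N is the total number of informants. The CI index therefore considers both how often a species is mentioned (the number of informants) and the number of use categories in which it is used.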
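Both RFC and CI can be computed from the same list of use reports (i, u, s). The sketch below is a minimal illustration. It assumes that a repeated identical (informant, category, species) triple counts as a single use report. The informant, category, and species names are hypothetical, and so is the survey size N = 4.

```python
def rfc_and_ci(use_reports, n_informants):
    """RFC and CI per species from (informant, use_category, species) triples.

    RFC_s = FC_s / N, where FC_s is the number of informants citing species s.
    CI_s  = (number of distinct use reports of s over all informants and categories) / N.
    """
    reports = set(use_reports)  # one UR per distinct (i, u, s) triple
    informants, ur_counts = {}, {}
    for i, u, s in reports:
        informants.setdefault(s, set()).add(i)
        ur_counts[s] = ur_counts.get(s, 0) + 1
    rfc = {s: len(informants[s]) / n_informants for s in informants}
    ci = {s: ur_counts[s] / n_informants for s in ur_counts}
    return rfc, ci

# Hypothetical survey with N = 4 informants.
reports = [
    ("inf1", "digestive", "species_A"),
    ("inf1", "metabolic", "species_A"),
    ("inf2", "digestive", "species_A"),
    ("inf3", "skin", "species_B"),
]
rfc, ci = rfc_and_ci(reports, n_informants=4)
```

RFC cannot exceed 1. CI can, because one informant may cite a species in several use categories. Here species_A gets RFC = 2/4 and CI = 3/4.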

The relationship between an informant’s age and the number of applications of a given medicinal plant reported by that informant was assessed with the coefficient of determination (R²) of a linear regression. For this purpose, each informant’s age was scored according to the following ranges: 1 (28–40), 2 (40–50), 3 (50–60), 4 (60–70), and 5 (70–81).
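The age scoring and the regression can be sketched as follows. One assumption is needed: the printed ranges share their boundaries (for example, 40 appears in both 28–40 and 40–50), so this sketch assigns a boundary age to the lower class. The informant data are hypothetical. The code fits ordinary least squares and reports R², but it does not compute the P-value.

```python
def age_score(age):
    """Map an age to the 1-5 age classes (assumption: boundary ages go to the lower class)."""
    for upper, score in ((40, 1), (50, 2), (60, 3), (70, 4)):
        if age <= upper:
            return score
    return 5

def linear_fit_r2(x, y):
    """Ordinary least squares y = a + b*x; returns (a, b, R^2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return a, b, 1 - ss_res / ss_tot

# Hypothetical informants: (age, number of reported uses).
data = [(30, 4), (45, 6), (55, 9), (65, 10), (78, 14)]
scores = [age_score(a) for a, _ in data]
uses = [u for _, u in data]
intercept, slope, r2 = linear_fit_r2(scores, uses)
```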

Methodology proposed

The proposed method involves three steps. The first step is data preprocessing, which removes unnecessary features and simplifies the remaining ones. Removing inconsistent features can improve predictions and reduce execution time. Data mining algorithms treat differences in uppercase and lowercase letters, extra spaces, and similar variations as distinct values, so the data values were first integrated into a consistent format. One sample had an empty value in the "mode of application" column, which is the class to be predicted, so this sample was removed from the dataset. Columns that play no role in the prediction process, such as “Scientific name”, “Vernacular name (Persian)”, and “Voucher no”, were also removed from the dataset.
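The preprocessing step can be sketched in plain Python. The sketch assumes that the records are held as a list of dictionaries, one per plant. The dropped column names and the target column name are taken from the text. The "Part used" column and the record values are hypothetical.

```python
def preprocess(rows, target="mode of application",
               drop=("Scientific name", "Vernacular name (Persian)", "Voucher no")):
    """Harmonize text values, drop records with an empty target, remove ID-like columns."""
    cleaned = []
    for row in rows:
        # Collapse repeated whitespace and lowercase every value ("  Oral " -> "oral").
        norm = {key: " ".join(str(value).split()).lower() if value is not None else ""
                for key, value in row.items()}
        if not norm.get(target):
            continue  # empty class label: the record cannot be used for training
        cleaned.append({key: value for key, value in norm.items() if key not in drop})
    return cleaned

rows = [  # hypothetical records; "Part used" is an illustrative column name
    {"Scientific name": "Species A", "Voucher no": "101", "Part used": " Leaf ",
     "mode of application": "Oral"},
    {"Scientific name": "Species B", "Voucher no": "102", "Part used": "SEED",
     "mode of application": "  "},
]
data = preprocess(rows)
```

The second record is discarded because its class label is blank. In the first record, " Leaf " and "Oral" become "leaf" and "oral".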

In the second step, the preprocessed samples are divided into training and test sets. In the current study, 70% of the data is used for training and 30% for testing. In addition, 10-fold cross-validation was used to train and evaluate the proposed model.
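A 70–30 split can be sketched as a seeded random shuffle followed by a cut at 70% of the records. This is a plain random split under the assumption that no stratification by class is applied. The seed and the 100-record example are hypothetical.

```python
import random

def train_test_split(samples, train_fraction=0.7, seed=42):
    """Shuffle a copy of the samples and split them into training and test sets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split reproducible
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Hypothetical dataset of 100 record indices.
train, test = train_test_split(range(100))
```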

Once the data has been preprocessed and converted into an optimal dataset, it can be fed into different classification algorithms (supervised learning). To classify the dataset, support vector machines, J48 decision trees, neural networks, and logistic regression were employed. All data mining algorithms are tested on the dataset with different parameters.

In the third step, the proposed model is evaluated using several criteria: accuracy, F-measure, Recall, Precision, Receiver Operating Characteristic (ROC), and Cohen’s Kappa. Fig 1 illustrates the flowchart of the proposed method.

Results and discussion

The Shahrbabak region was chosen for this ethnopharmacological study for several reasons. First, Shahrbabak is an ancient Iranian city believed to have been established by the Sassanid king Ardeshir Babakan around 1800 years ago. This history offers a rich cultural and historical context for ethnopharmacological research [34]. In addition, the town’s semi-arid climate and its proximity to major copper mines may influence local ethnopharmacological practices. This study may benefit researchers, scientists, herbal enthusiasts, and pharmaceutical professionals. Researchers can use data mining techniques to analyze and interpret traditional medicinal plant knowledge. Pharmaceutical researchers can use the documented plants as a resource for exploring and developing novel drugs, which contributes to pharmaceutical science and to improved healthcare solutions.

Plant diversity

All 141 plant species in this study were considered medicinal by indigenous people. These species belong to 43 families. The most frequently represented families were Lamiaceae (18 species), Fabaceae (17 species), and Apiaceae (16 species), followed by Asteraceae (13 species) (Fig 2). A previous report from Iran found that the Lamiaceae and Apiaceae families had the highest numbers of medicinal plants in its study area [16]. Other studies found the Lamiaceae to be the most abundant family of medicinal plants in Kerman Province [15]. Furthermore, the Lamiaceae family includes plants whose medicinal properties make them sources of traditional drugs for digestive disorders, menstrual disorders, hepatitis, and liver diseases [35,36].

Plant parts used as medicinal agents

Local people reported using different plant parts. The most commonly used parts were leaves (17.7%), seeds (17.1%), aerial parts (16.6%), fruits (15.1%), and flowers (11.9%) (Fig 3). In contrast, indigenous people were least likely to use whole plants, corms, gums, skin, capsules, and rhizomes. The popularity of leaves may have a botanical explanation. Leaves are the main site of photosynthesis, and they accumulate compounds such as chlorophyll, flavonoids, alkaloids, and other bioactive molecules [37,38]. These results support previous reports that leaves, fruits, and aerial parts are the main medicinal parts [12,16].

Preparation and modes of application

Decoction was the most frequently used method (56%) for preparing plant materials before application. Plants were also prepared freshly cooked (17%), by infusion (14%), and as liniment (12%) (Fig 4). Because it is easy to carry out, decoction is usually the most widely used method for preparing medicinal plants before consumption [39]. The modes of application were oral, topical, or both. Most plants were consumed orally, and topical application was less common than oral consumption (Table 1). For instance, Centaurium pulchellum subsp. grandiflorum (Batt.) Maire, Geranium rotundifolium L., and Berberis jamesiana Forrest & W.W.Sm. are used only orally. Ducrosia assadii Alava. and Cymbopogon schoenanthus (L.) Spreng. are applied topically. Some species, such as Sanguisorba minor Scop. and Papaver dubium L., are used both orally and topically. This variety of preparation and application methods reflects the diverse and complex nature of traditional medicinal practices. Understanding the various modes of application deepens our knowledge of traditional medicinal plant use. It also highlights the potential and adaptability of these natural resources for addressing healthcare needs [40].

Fig 4. Modes of preparation of medicinal plants used for treatment.

https://doi.org/10.1371/journal.pone.0303229.g004

The life cycle of plants

An analysis of life forms shows the dominance of Therophytes (48.93%) and Geophytes (21.98%) among the species in the current study (Fig 5). The proportions of Geophytes and Therophytes in a region can provide valuable information about the availability and seasonal variation of the medicinal plants used by local communities [41]. Therophytes can grow in adverse conditions and germinate quickly after rain, so medicinal plant resources may be abundant during certain seasons [42]. Additionally, Geophytes store vital nutrients underground, which helps ensure a continuous supply of medicinal plants, especially in arid climates [43].

Fig 5. Life forms of wild medicinal species from the Shahrbābak area.

https://doi.org/10.1371/journal.pone.0303229.g005

Records and categories

Based on the data collected from informants, a total of 222 medicinal applications were reported in this work. They fall into 14 ailment categories. Disorders of the digestive system were the category most commonly treated with plants (27.92%), followed by metabolic disorders (14.41%), cold/flu and fever (10.81%), and problems of the nervous system (7.2%) (Fig 6). These results are similar to those of other studies in which many medicinal plants were used to alleviate digestive disorders [9,10,12,15,16,44]. Within these categories, ailments such as constipation, diarrhea, and influenza are commonly treated in Shahrbabak.

Fig 6. Percentage of species and citations in each medicinal use category.

https://doi.org/10.1371/journal.pone.0303229.g006

Comparison of different indices

Table 2 presents the ICF values for the categorized ailments. Metabolic disorders (0.64) and musculoskeletal disorders (0.62) had the highest ICF values and included ailments such as kidney stones, urinary infections, diuretics, rheumatism, headaches, and skeletal fractures. The respiratory system (0.5) also had a high ICF value, followed by skin and hair (0.43), digestive system (0.4), nervous system (0.4), cold/flu/fever (0.4), and cuts/wounds (0.4). The ICF values for liver problems and flavor/appetizing were 0.37 and 0.36, respectively. A very low ICF value indicates that informants share little information about the use of species to treat a given disease [11]. Digestive ailments were treated with the most species (20 plants), followed by cold/flu/fever (15 plants), sedative ailments (11 plants), the nervous system (ten plants), cuts/wounds (ten plants), and skin and hair (nine plants). According to the current study, metabolic and musculoskeletal disorders are the ailment categories with the highest informant consensus in the Shahrbabak region. These findings are consistent with other studies that found metabolic disorders had the highest ICF in Sirjan, a city in Kerman Province [15], and in Rasuwa District in Central Nepal [45]. Nonetheless, they differ from some other published reports from Iran, such as those from the south of Kerman [16] and from Kohgiluyeh and Boyer Ahmad Province [46].

Adiantum capillus-veneris L. and Plantago ovata are the most highly valued plants in this region, because the largest numbers of informants confirmed them as useful (Table 3). The number of informants reporting a specific use for a plant species is called a ‘Use Report’ (UR). Artemisia aucheri had the highest number of reports confirming its medicinal use (26 UR). It was followed by Centaurium pulchellum (22 UR), Salix mucronata Thunb. (21 UR), and then Diarthron lessertii (Wikstr.) Kit Tan. and Plantago ovata (20 UR each) (Fig 7). Sadat-Hosseini et al. reported that, in the south of Kerman, Chrysanthemum parthenium (L.) Pers. and Cerasus mahaleb (L.) Mill. had the highest number of uses (23), confirming their medicinal importance [16]. Nasab and Khosravi studied the Sirjan region in Kerman and found that Malva sylvestris L. had the highest number of medicinal use reports [15].

Table 3. Comparison of important medicinal plants by using indices and species ranking based on each index.

https://doi.org/10.1371/journal.pone.0303229.t003

Table 3 presents the results of the RFC and CI indices. According to the RFC index, the most important species are A. capillus-veneris, P. ovata, Malva parviflora var. parviflora, and Genista tinctoria L. This suggests that these species are commonly recognized by many informants in Shahrbabak. According to the CI index, however, A. aucheri ranked first (Table 3). These results differ from some published studies. For instance, Sadat-Hosseini et al. reported that C. mahaleb and C. parthenium ranked first in the south of Kerman [16]. Mosaddegh et al. found that Teucrium polium L. ranked first in Kohgiluyeh and Boyer Ahmad Province [12]. The linear regression between informants’ age and the number of reported uses of medicinal plants is significant (P-value = 0.004; Fig 8), suggesting that older informants know more about the use of medicinal plants. The species ranked highest by the RFC and CI indices contain major chemical compounds such as 1,8-cineole and α-pinene (Table 4).

Fig 8. The linear regression model between informants’ age and the number of reported plant uses.

https://doi.org/10.1371/journal.pone.0303229.g008

Table 4. Main compounds of important medicinal species.

https://doi.org/10.1371/journal.pone.0303229.t004

Medicinal plants are used in combinations

In some cases, indigenous people treated diseases with a combination of medicinal plants. For example, a combination of Foeniculum vulgare Mill., Elwendia persica (Boiss.) Pimenov & Kljuykov, and Cuminum cyminum L. was used as a carminative and to relieve gastric discomfort. Likewise, a combination of Tanacetum parthenium (L.), Ocimum basilicum L., and Nepeta glomerulosa Boiss. was reported to be effective as a nerve tonic. Traditional medicine combines medicinal plants to enhance therapeutic effects and minimize side effects [47]. This approach can help develop new treatment strategies and identify local plants for drug development [48].

Side effects of medicinal plants

Informants believe that combining F. vulgare, E. persica (syn. Bunium persicum), and C. cyminum can improve digestion. However, this combination can cause abortion in some women. Ferula species may also cause diarrhea in children and adults. The side effects of medicinal plants are influenced by individual reactions, dosage, preparation method, and interactions with other medications or health conditions [49,50]. Esmaeilzadeh reported that herbal combinations can benefit certain ailments but also present risks, such as abortion in women [49].

Comparison of plants identified in the current study with previous studies

This study was compared with 14 similar studies (in Iran and other countries) to identify the plants whose medicinal use is reported here for the first time in the available literature. The previous studies were carried out in various regions of Iran, including Sirjan [15], south of Kerman [16], Kohgiluyeh and Boyer Ahmad [12], Saravan [11], Turkmen Sahra [9], and West Azarbaijan [10]. Studies from other countries include Pakistan [51,52], Sri Lanka [53], Brazil [54], China [55], Morocco [56], Italy [57], and India [58]. Table 5 presents the results of comparing the medicinal plants with those in other reports. Based on this literature review, 57 of the 141 species are reported here for the first time as having medicinal uses.

Table 5. Comparative presence-absence matrix for the recorded plant species.

https://doi.org/10.1371/journal.pone.0303229.t005

The comparison also showed that some plants have a wider distribution range but different uses. For example, various studies report F. vulgare for treating ailments such as abdominal pain and bloating [15], gastric discomfort and bone and joint pain [16], and diuretic problems and kidney malfunctions [12]. It has also been used to treat menstrual disorders, as a lactiferous agent, to alleviate coughs, asthma, and digestive disorders, and as a nerve tonic [11]. It is a carminative and hypnotic agent [9] and has been used to treat hypertension [51], diabetes [56], and stomachache [57].

Criteria for evaluation

F-measure, Recall, Precision, Receiver Operating Characteristic (ROC), and Cohen’s Kappa are used in this study to evaluate the performance of the proposed method [59]. Accuracy is an important evaluation criterion in data mining, and several studies have used a range of assessment metrics in predictive modeling of medicinal plant uses. Although accuracy is a vital criterion, it is important to also consider metrics such as precision, recall, F-measure, Cohen’s Kappa coefficient, and ROC analysis [60–62]. These metrics give a more comprehensive picture of model performance, particularly in multi-class classification problems. However, the choice of evaluation metric should be tailored to the specific objectives of the model, and accuracy is less suitable for some applications [62]. Accuracy shows whether the proposed method predicted the output correctly overall. Other evaluation criteria are needed to obtain more detailed information about the model. The goal is to predict the mode of application of a medicinal plant, which has three classes: oral, topical, or both. Fig 9 shows the distribution of samples across the classes of the target feature (mode of application).

Fig 9. Distribution of samples in each class of "mode of application".

https://doi.org/10.1371/journal.pone.0303229.g009

Table 6 presents the confusion matrix. Each prediction falls into one of four categories: TN (True Negative), TP (True Positive), FN (False Negative), and FP (False Positive).

  • TP: The algorithm classified the sample in the positive category, and the sample is positive.
  • FP: The algorithm classified the sample in the positive category, but the sample is negative.
  • TN: The algorithm classified the sample in the negative category, and the sample is negative.
  • FN: The algorithm classified the sample in the negative category, but the sample is positive.

In other words, when the algorithm mispredicts the sample class, the result will be FN or FP. When the algorithm correctly predicts the sample class, the result will be TN or TP. By using the following ratio, we can determine the model’s accuracy.

Accuracy measures the model’s ability to detect the mode of application of a medicinal plant correctly. It is the number of correctly classified samples divided by the total number of samples. A model with higher accuracy is more reliable. Eq (1) gives the accuracy criterion.

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \qquad (1)$$
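For the three-class problem studied here, the confusion matrix has one row per true class and one column per predicted class. Accuracy generalizes Eq (1) as the sum of the diagonal divided by the total count. The sketch below assumes that row/column convention, and its labels are hypothetical.

```python
def confusion_matrix(y_true, y_pred, classes):
    """Rows = true class, columns = predicted class."""
    index = {c: k for k, c in enumerate(classes)}
    m = [[0] * len(classes) for _ in classes]
    for t, p in zip(y_true, y_pred):
        m[index[t]][index[p]] += 1
    return m

def accuracy(m):
    """Correct predictions (the diagonal) divided by all predictions."""
    correct = sum(m[k][k] for k in range(len(m)))
    total = sum(sum(row) for row in m)
    return correct / total

classes = ["oral", "topical", "both"]
y_true = ["oral", "oral", "oral", "topical", "topical", "both"]   # hypothetical
y_pred = ["oral", "oral", "topical", "topical", "topical", "oral"]
m = confusion_matrix(y_true, y_pred, classes)
```

Here four of the six hypothetical samples lie on the diagonal, giving an accuracy of 4/6.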

Precision

Precision is the proportion of samples predicted as positive that are actually positive. It is the appropriate criterion when false positives (FP) are frequent or costly. Precision is given in Eq (2).

$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{2}$$
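The sketch below computes the per-class precision of Eq (2) from the same kind of hypothetical confusion matrix (illustrative counts, not Table 6) and shows two common ways of summarizing the three per-class values: the unweighted (macro) mean and the mean weighted by the number of actual samples in each class. Which average a particular tool reports is an assumption to check against that tool's documentation.

```python
# Hypothetical confusion matrix (rows = actual, columns = predicted); not Table 6.
CLASSES = ["oral", "topical", "both"]
CM = [
    [10, 1, 0],
    [1, 6, 1],
    [0, 0, 2],
]


def precision(cm, k):
    """TP / (TP + FP): share of samples predicted as class k that truly belong to k."""
    tp = cm[k][k]
    predicted_k = sum(row[k] for row in cm)  # column total = TP + FP
    return tp / predicted_k if predicted_k else 0.0


per_class = [precision(CM, k) for k in range(len(CLASSES))]
support = [sum(row) for row in CM]  # number of actual samples per class
macro_precision = sum(per_class) / len(per_class)
weighted_precision = sum(p * s for p, s in zip(per_class, support)) / sum(support)

for name, p in zip(CLASSES, per_class):
    print(f"{name}: {p:.3f}")
print(f"macro = {macro_precision:.3f}, weighted = {weighted_precision:.3f}")
```

The macro mean treats the rare "both" class as equal to "oral", whereas the weighted mean is dominated by the larger classes; with imbalanced classes the two can differ noticeably.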

Recall

Recall measures the proportion of actual positive samples that the model correctly identifies. It is the appropriate criterion when false negatives (FN) are costly, because every false negative lowers it. The recall criterion is shown in Eq (3).

$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{3}$$
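A matching sketch for the recall of Eq (3), again on a hypothetical confusion matrix (not Table 6): recall for class k divides the diagonal entry by the row total, that is, by the number of samples that actually belong to k. A side effect worth knowing is that the support-weighted average of per-class recall always equals overall accuracy, since each weighted term reduces to that class's TP count.

```python
# Hypothetical confusion matrix (rows = actual, columns = predicted); not Table 6.
CLASSES = ["oral", "topical", "both"]
CM = [
    [10, 1, 0],
    [1, 6, 1],
    [0, 0, 2],
]


def recall(cm, k):
    """TP / (TP + FN): share of actual class-k samples that the model recovers."""
    tp = cm[k][k]
    actual_k = sum(cm[k])  # row total = TP + FN
    return tp / actual_k if actual_k else 0.0


per_class = [recall(CM, k) for k in range(len(CLASSES))]
support = [sum(row) for row in CM]  # number of actual samples per class
weighted_recall = sum(r * s for r, s in zip(per_class, support)) / sum(support)
overall_accuracy = sum(CM[i][i] for i in range(len(CM))) / sum(support)

# Support-weighted recall equals overall accuracy: each term r * s is just TP.
print(per_class, weighted_recall, overall_accuracy)
```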

F-measure

The F-measure combines precision and recall into a single criterion by taking their harmonic mean, so it is high only when both are high. Eq (4) shows the F-measure criterion.

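$$F = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{4}$$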
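The harmonic-mean form of Eq (4) can be sketched as follows, using hypothetical one-vs-rest (TP, FP, FN) counts per class that are illustrative only. The second function shows an algebraically equivalent form, 2TP / (2TP + FP + FN), which makes explicit that true negatives play no role in the F-measure.

```python
# Hypothetical per-class (TP, FP, FN) counts, one-vs-rest; illustrative only.
COUNTS = {
    "oral": (10, 1, 1),
    "topical": (6, 1, 2),
    "both": (2, 1, 0),
}


def f_measure(tp, fp, fn):
    """Harmonic mean of precision and recall, i.e. 2PR / (P + R)."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    return 2 * p * r / (p + r) if p + r else 0.0


def f_measure_from_counts(tp, fp, fn):
    """Algebraically equivalent form: 2TP / (2TP + FP + FN)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


for name, (tp, fp, fn) in COUNTS.items():
    print(f"{name}: F = {f_measure(tp, fp, fn):.3f}")
```

For "topical", precision 6/7 and recall 3/4 give F = 0.8, which sits below their arithmetic mean; the harmonic mean penalizes the weaker of the two.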

Cohen’s kappa coefficient

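Cohen’s kappa coefficient measures the agreement between the predicted and actual classes after correcting for the agreement expected by chance. It ranges from -1 to +1: values closer to +1 indicate strong agreement and therefore good performance, a value of 0 indicates agreement no better than chance, and values closer to -1 indicate disagreement. Cohen’s kappa coefficient is given in Eq (5).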

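$$\kappa = \frac{p_o - p_e}{1 - p_e} \tag{5}$$

where $p_o$ is the observed agreement between predicted and actual classes (the accuracy) and $p_e$ is the agreement expected by chance, computed from the row and column totals of the confusion matrix.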
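A minimal sketch of Cohen's kappa computed directly from a confusion matrix, using hypothetical counts (not Table 6). The chance agreement multiplies, for each class, the proportion of samples that actually belong to it by the proportion predicted as it, then sums over classes.

```python
# Hypothetical confusion matrix (rows = actual, columns = predicted); not Table 6.
CM = [
    [10, 1, 0],
    [1, 6, 1],
    [0, 0, 2],
]


def cohens_kappa(cm):
    """kappa = (p_o - p_e) / (1 - p_e) computed from a square confusion matrix."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    p_o = sum(cm[i][i] for i in range(n)) / total  # observed agreement (= accuracy)
    row_tot = [sum(cm[i]) for i in range(n)]                       # actual class totals
    col_tot = [sum(cm[i][j] for i in range(n)) for j in range(n)]  # predicted class totals
    # Chance agreement: probability that actual and predicted labels coincide if
    # predictions were independent of the true class but kept the same totals.
    p_e = sum(r * c for r, c in zip(row_tot, col_tot)) / total ** 2
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1 - p_e)


print(f"kappa = {cohens_kappa(CM):.4f}")
```

With these hypothetical counts kappa is 65/86 (about 0.756), lower than the accuracy of 18/21 (about 0.857), because part of the observed agreement would be expected by chance given the class totals.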

Receiver Operating Characteristic (ROC)

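The receiver operating characteristic (ROC) curve is one of the most important evaluation criteria for supervised learning models. It is obtained by plotting the true positive rate, TP/(TP + FN), against the false positive rate, FP/(FP + TN), as the classification threshold is varied; sweeping the threshold over all its values traces out the curve. The area under the curve (AUC) summarizes the ROC curve in a single number: 1 indicates perfect separation of the classes, and 0.5 corresponds to random guessing.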
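The threshold sweep can be sketched in a few lines. The example assumes a one-vs-rest view for the class "oral" and uses invented scores standing in for a classifier's estimated probability of that class; the data and function names are illustrative, not taken from the study. Samples tied at the same score are processed together, and the AUC is the trapezoidal area under the resulting points.

```python
# Hypothetical one-vs-rest data for the class "oral": 1 = oral, 0 = topical or both.
# The scores stand in for a classifier's estimated probability of "oral"; illustrative only.
y_true = [1, 1, 0, 1, 0, 0]
y_score = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]


def roc_points(labels, scores):
    """Return (FPR, TPR) points obtained by lowering the threshold past each distinct score."""
    pos = sum(labels)
    neg = len(labels) - pos  # both pos and neg must be non-zero
    ranked = sorted(zip(scores, labels), reverse=True)  # highest score first
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Samples tied at this score cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc_trapezoid(points):
    """Area under the piecewise-linear curve through the given (x, y) points."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


pts = roc_points(y_true, y_score)
print(pts)
print(f"AUC = {auc_trapezoid(pts):.4f}")
```

For these six samples the AUC is 8/9: of the nine (oral, non-oral) pairs, eight have the oral sample scored higher, which is the pairwise-ranking reading of AUC. For a three-class target, computing one such curve per class in a one-vs-rest fashion is a common convention.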

10-fold cross-validation

K-fold cross-validation is used to estimate the model’s performance. The 10-fold cross-validation method divides the original sample into ten approximately equal parts. In each iteration, nine parts serve as training data and the remaining part as test data, until every part has been used once for testing. The model was therefore trained and tested ten times, and the reported accuracy is the average of the ten runs. The benefit of this approach is that it reduces the dependence of the result on a single random split and the associated risk of overfitting [59].

Fig 10 shows the test accuracy of the different classification algorithms under the 70–30 split and 10-fold cross-validation. In the 70–30 split, 70% of the data was used to train the proposed model and 30% to test it. With the 70–30 split, the J48 decision tree algorithm correctly predicted the test samples with an accuracy of 95.24%; with 10-fold cross-validation, it correctly assigned new samples to their classes with 95% accuracy. Because the 10-fold cross-validation result is an average over ten runs and the dataset is small, we use 10-fold cross-validation for prediction. As Fig 10 shows, there is little difference between 10-fold cross-validation and the 70–30 split, and since the J48 decision tree algorithm achieved 95% accuracy under cross-validation, this model was adopted. The evaluation metrics were computed from the confusion matrix. Table 7 shows that the J48 decision tree algorithm scored highly on every evaluation metric and performed better than the other algorithms.
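The two evaluation protocols described above can be sketched in plain Python. This is a minimal illustration, not the authors' pipeline: `fit_and_score` is a hypothetical callback standing in for "train a classifier on these records and return its accuracy on those records", and the record count of 141 is an assumption (one record per documented plant species) rather than a figure stated in this section. The sketch does not stratify folds by class; stratified folds, which keep the oral/topical/both proportions roughly equal in every fold, are a common refinement for small, imbalanced datasets.

```python
import random


def k_fold_indices(n, k=10, seed=0):
    """Shuffle record indices 0..n-1 and deal them into k folds whose sizes differ by at most one."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(n, k, fit_and_score, seed=0):
    """Average of k test scores; each fold is held out once while the other k-1 folds train."""
    folds = k_fold_indices(n, k, seed)
    scores = []
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        # fit_and_score is a hypothetical callback: train on train_idx, return accuracy on test_idx.
        scores.append(fit_and_score(train_idx, test_idx))
    return sum(scores) / k


def holdout_split(n, train_fraction=0.7, seed=0):
    """Single random split, e.g. 70% of records for training and 30% for testing."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = round(train_fraction * n)
    return idx[:cut], idx[cut:]


N_RECORDS = 141  # assumption: one record per documented plant species
train_idx, test_idx = holdout_split(N_RECORDS)
print(len(train_idx), len(test_idx))  # 99 42
```

Under this assumption a 70–30 split leaves only 42 test records, so each misclassified test record moves the holdout accuracy by about 2.4 percentage points; averaging over ten folds, in which every record is tested exactly once, gives a less split-dependent estimate on such a small dataset.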

Fig 10. Comparison of accuracy under 10-fold cross-validation and a 70–30 split.

https://doi.org/10.1371/journal.pone.0303229.g010

Table 7. Comparison of evaluation criteria of the proposed model.

https://doi.org/10.1371/journal.pone.0303229.t007

Conclusion

Using data mining analysis, we gained valuable knowledge about medicinal plant uses. We observed clear preferences for specific plant families, including Lamiaceae, Fabaceae, and Apiaceae, and a strong tendency to use their leaves in medicinal preparations. Based on the results of the current study, the following conclusions and suggestions are presented:

  • Documenting, standardizing, and preserving traditional knowledge, and ensuring its quality, is crucial. Herbal combinations must be carefully evaluated for potential side effects, with attention to dosage regulation and individual responses.
  • For better reproducibility and understanding of ethnopharmacological studies, it should be noted that differences in language dialects and cultural interpretations could have influenced data collection and complicated data interpretation. In addition, the quality of data obtained from local informants raises concerns about reliability.
  • Predictive modeling based on machine learning algorithms showed promise for predicting plant applications. However, future work will be challenged by limited data availability, by model generalization across diverse regions, and by the conservation and utilization of indigenous knowledge.
  • Future studies could use data mining algorithms to investigate the compounds and biochemical properties of these medicinal plants, which could help identify their antibacterial, antifungal, antitoxic, or neutral properties.

While data mining provides valuable insight into medicinal plant usage, future studies should focus on standardization, ethical considerations, and robust model development.

Acknowledgments

We would like to thank the University of Jiroft for supporting the current study. We also thank Mohsen Hamedpour-Darabi for language editing of the paper.

References

  1. Silambarasan R, Ayyanar M. An ethnobotanical study of medicinal plants in Palamalai region of Eastern Ghats, India. Journal of Ethnopharmacology. 2015;172:162–78. pmid:26068426
  2. Houghton PJ. The role of plants in traditional medicine and current therapy. The Journal of Alternative and Complementary Medicine. 1995;1(2):131–43. pmid:9395610
  3. Tuttolomondo T, Licata M, Leto C, Savo V, Bonsangue G, Gargano ML, et al. Ethnobotanical investigation on wild medicinal plants in the Monti Sicani Regional Park (Sicily, Italy). Journal of Ethnopharmacology. 2014;153(3):568–86. pmid:24632020
  4. Farnsworth NR. The role of ethnopharmacology in drug development. In: Ciba Foundation Symposium 154: Bioactive Compounds from Plants. Chichester, UK: John Wiley & Sons, Ltd; 2007. pp. 2–21.
  5. Ahmad M, Zafar M, Shahzadi N, Yaseen G, Murphey TM, Sultana S. Ethnobotanical importance of medicinal plants traded in Herbal markets of Rawalpindi-Pakistan. Journal of Herbal Medicine. 2018;11:78–89.
  6. Heinrich M, Lardos A, Leonti M, Weckerle C, Willcox M, Applequist W, et al. Best practice in research: consensus statement on ethnopharmacological field studies–ConSEFS. Journal of Ethnopharmacology. 2018;211:329–39.
  7. Idolo M, Motti R, Mazzoleni S. Ethnobotanical and phytomedicinal knowledge in a long-history protected area, the Abruzzo, Lazio and Molise National Park (Italian Apennines). Journal of Ethnopharmacology. 2010;127(2):379–95. pmid:19874882
  8. Mahmood A, Malik RN, Shinwari ZK, Mahmood AQ. Ethnobotanical survey of plants from Neelum, Azad Jammu and Kashmir, Pakistan. Pak J Bot. 2011;43(1):105–10.
  9. Ghorbani A. Studies on pharmaceutical ethnobotany in the region of Turkmen Sahra, north of Iran (Part 1): General results. Journal of Ethnopharmacology. 2005;102(1):58–68. pmid:16024194
  10. Miraldi E, Ferri S, Mostaghimi V. Botanical drugs and preparations in the traditional medicine of West Azerbaijan (Iran). Journal of Ethnopharmacology. 2001;75(2–3):77–87. pmid:11297838
  11. Sadeghi Z, Kuhestani K, Abdollahi V, Mahmood A. Ethnopharmacological studies of indigenous medicinal plants of Saravan region, Baluchistan, Iran. Journal of Ethnopharmacology. 2014;153(1):111–8. pmid:24509152
  12. Mosaddegh M, Naghibi F, Moazzeni H, Pirani A, Esmaeili S. Ethnobotanical survey of herbal remedies traditionally used in Kohghiluyeh va Boyer Ahmad province of Iran. Journal of Ethnopharmacology. 2012;141(1):80–95. pmid:22366675
  13. Dolatkhahi M, Yousefi M, Bagher Nejad J, Dolatkhahi A. Introductory study of the medicinal plant species of Kazeroon, Fars province. Journal of Medicinal Herbs. 2010;1(3):47–56.
  14. Safa O, Soltanipoor MA, Rastegar S, Kazemi M, Dehkordi KN, Ghannadi A. An ethnobotanical survey on Hormozgan province, Iran. Avicenna Journal of Phytomedicine. 2013;3(1):64. pmid:25050260
  15. Nasab FK, Khosravi AR. Ethnobotanical study of medicinal plants of Sirjan in Kerman Province, Iran. Journal of Ethnopharmacology. 2014;154(1):190–7. pmid:24746480
  16. Sadat-Hosseini M, Farajpour M, Boroomand N, Solaimani-Sardou F. Ethnopharmacological studies of indigenous medicinal plants in the south of Kerman, Iran. Journal of Ethnopharmacology. 2017;199:194–204. pmid:28167292
  17. Hand D, Mannila H, Smyth P. Principles of Data Mining. The MIT Press; 2001.
  18. Heydari F, Rafsanjani MK. A review on lung cancer diagnosis using data mining algorithms. Current Medical Imaging. 2021;17(1):16–26.
  19. Elavarasan D, Vincent DR, Sharma V, Zomaya AY, Srinivasan K. Forecasting yield by integrating agrarian factors and machine learning models: A survey. Computers and Electronics in Agriculture. 2018;155:257–82.
  20. Yoo I, Alafaireet P, Marinov M, Pena-Hernandez K, Gopidi R, Chang JF, et al. Data mining in healthcare and biomedicine: a survey of the literature. Journal of Medical Systems. 2012;36:2431–48. pmid:21537851
  21. Dongare AD, Kharde RR, Kachare AD. Introduction to artificial neural network. International Journal of Engineering and Innovative Technology (IJEIT). 2012;2(1):189–94.
  22. Tayi GK, Ballou DP. Examining data quality. Communications of the ACM. 1998;41(2):54–7.
  23. Zhen C, Jiang C. Overview of data mining in the era of big data. International Core Journal of Engineering. 2019;5(10):136–9.
  24. Liu H, Motoda H, editors. Computational methods of feature selection. CRC Press; 2007.
  25. Aipperspach R, Rattenbury TL, Woodruff A, Anderson K, Canny JF, Aoki P. Ethno-mining: integrating numbers and words from the ground up. Electrical Engineering and Computer Sciences, University of California at Berkeley, Tech Report. 2006;6.
  26. Axiotis E, Kontogiannis A, Kalpoutzakis E, Giannakopoulos G. A Personalized Machine-Learning-Enabled Method for Efficient Research in Ethnopharmacology. The Case of the Southern Balkans and the Coastal Zone of Asia Minor. Applied Sciences. 2021;11(13):5826.
  27. Rechinger KH. Flora Iranica, vols. 1–178. Graz: Akad Druck-U Verlagsanstalt; 1963.
  28. Zohary M, Feinbrun-Dothan N. Flora Palaestina, vols. 1–4. Jerusalem Academic Press, Israel; 1966–1986.
  29. Townsend CC, Guest E, Al-Rawi A. Flora of Iraq, vols. 1–9. Baghdad: Ministry of Agriculture and Agrarian Reform; 1966.
  30. Davis PH. Flora of Turkey. 1965.
  31. Ghahreman A. Flora of Iran, vols. 1–25. Tehran: Research Institute of Forests and Rangelands; 1975 (in Persian).
  32. Raunkiaer C. The life-forms of plants and their bearing on geography. In: The life forms of plants and statistical plant geography. 1934. pp. 2–104.
  33. Trotter RT, Logan MH. Informant consensus: a new approach for identifying potentially effective medicinal plants. In: Plants and Indigenous Medicine and Diet. Routledge; 2019. pp. 91–112.
  34. Mostafaeipour A, Sedaghat A, Dehghan-Niri AA, Kalantar V. Wind energy feasibility study for city of Shahrbabak in Iran. Renewable and Sustainable Energy Reviews. 2011;15(6):2545–56.
  35. Rokaya MB, Münzbergová Z, Timsina B. Ethnobotanical study of medicinal plants from the Humla district of western Nepal. Journal of Ethnopharmacology. 2010;130(3):485–504. pmid:20553834
  36. Stankovic MS, Topuzovic M, Solujic S, Mihailovic V. Antioxidant activity and concentration of phenols and flavonoids in the whole plant and plant parts of Teucrium chamaedrys L. var. glanduliferum Haussk. Journal of Medicinal Plants Research. 2010;4(20):2092–8.
  37. Tattini M, Gravano E, Pinelli P, Mulinacci N, Romani A. Flavonoids accumulate in leaves and glandular trichomes of Phillyrea latifolia exposed to excess solar radiation. The New Phytologist. 2000;148(1):69–77. pmid:33863030
  38. Aye MM, Aung HT, Sein MM, Armijos C. A review on the phytochemistry, medicinal properties and pharmacological activities of 15 selected Myanmar medicinal plants. Molecules. 2019;24(2):293. pmid:30650546
  39. Nadembega P, Boussim JI, Nikiema JB, Poli F, Antognoni F. Medicinal plants in Baskoure, Kourittenga province, Burkina Faso: an ethnobotanical study. Journal of Ethnopharmacology. 2011;133(2):378–95. pmid:20950680
  40. JU SK MJ KC, Semotiuk AJ, Krishna V. Indigenous knowledge on medicinal plants used by ethnic communities of South India. Ethnobotany Research and Applications. 2019;18:1–12.
  41. Hachemi N, Hasnaoui O, Bouazza M, Benmehdi I, Medjati N. The therophytes aromatic and medicinal plants of the southern slopes of the mountains of Tlemcen (western Algeria) between utility and degradation. Research Journal of Pharmaceutical, Biological and Chemical Sciences. 2013;4(1):1194–203.
  42. Heidari Rikan M, Malekmoohamadi L. Medicinal plants in Ghasemloo valley of Uromieh. Iranian Journal of Medicinal and Aromatic Plants Research. 2007;23(2):234–50.
  43. Qasim M, Gulzar S, Khan MA. Halophytes as medicinal plants. In: Urbanisation, land use, land degradation and environment. 2011. pp. 330–43.
  44. Heinrich M, Ankli A, Frei B, Weimann C, Sticher O. Medicinal plants in Mexico: Healers’ consensus and cultural importance. Social Science & Medicine. 1998;47(11):1859–71. pmid:9877354
  45. Uprety Y, Asselin H, Boon EK, Yadav S, Shrestha KK. Indigenous use and bio-efficacy of medicinal plants in the Rasuwa District, Central Nepal. Journal of Ethnobiology and Ethnomedicine. 2010;6.
  46. Jahantab E, Hatami E, Sayadian M, Salahi Ardakani A. Ethnobotanical study of medicinal plants of Boyer Ahmad and Dena regions in Kohgiluyeh and Boyer Ahmad province, Iran. Adv Herb Med. 2018;4(4):12–22.
  47. Che CT, Wang ZJ, Chow MS, Lam CW. Herb-herb combination for therapeutic enhancement and advancement: theory, practice and future perspectives. Molecules. 2013;18(5):5125–41. pmid:23644978
  48. Farnsworth NR, Akerele O, Bingel AS, Soejarto DD, Guo Z. Medicinal plants in therapy. Bulletin of the World Health Organization. 1985;63(6):965. pmid:3879679
  49. Esmaeilzadeh M, Moradi B. Medicinal herbs with side effects during pregnancy-An evidence-based review article. The Iranian Journal of Obstetrics, Gynecology and Infertility. 2017;20:9–25.
  50. Nasri H, Shirzad H. Toxicity and safety of medicinal plants. J HerbMed Pharmacol. 2013;2(2):21–2.
  51. Ahmad L, Semotiuk A, Zafar M, Ahmad M, Sultana S, Liu QR, et al. Ethnopharmacological documentation of medicinal plants used for hypertension among the local communities of DIR Lower, Pakistan. Journal of Ethnopharmacology. 2015;175:138–46. pmid:26392329
  52. Ishtiaq M, Mahmood A, Maqbool M. Indigenous knowledge of medicinal plants from Sudhanoti district (AJK), Pakistan. Journal of Ethnopharmacology. 2015;168:201–7. pmid:25666425
  53. Dharmadasa RM, Akalanka GC, Muthukumarana PR, Wijesekara RG. Ethnopharmacological survey on medicinal plants used in snakebite treatments in Western and Sabaragamuwa provinces in Sri Lanka. Journal of Ethnopharmacology. 2016;179:110–27. pmid:26724891
  54. Ribeiro RV, Bieski IG, Balogun SO, de Oliveira Martins DT. Ethnobotanical study of medicinal plants used by Ribeirinhos in the North Araguaia microregion, Mato Grosso, Brazil. Journal of Ethnopharmacology. 2017;205:69–102. pmid:28476677
  55. Li DL, Zheng XL, Duan L, Deng SW, Ye W, Wang AH, et al. Ethnobotanical survey of herbal tea plants from the traditional markets in Chaoshan, China. Journal of Ethnopharmacology. 2017;205:195–206. pmid:28249822
  56. Barkaoui M, Katiri A, Boubaker H, Msanda F. Ethnobotanical survey of medicinal plants used in the traditional treatment of diabetes in Chtouka Ait Baha and Tiznit (Western Anti-Atlas), Morocco. Journal of Ethnopharmacology. 2017;198:338–50. pmid:28109915
  57. Fortini P, Di Marzio P, Guarrera PM, Iorizzi M. Ethnobotanical study on the medicinal plants in the Mainarde Mountains (central-southern Apennine, Italy). Journal of Ethnopharmacology. 2016;184:208–18. pmid:26969402
  58. Adhikari PP, Talukdar S, Borah A. Ethnomedicobotanical study of indigenous knowledge on medicinal plants used for the treatment of reproductive problems in Nalbari district, Assam, India. Journal of Ethnopharmacology. 2018;210:386–407. pmid:28733191
  59. Wu H, Yang S, Huang Z, He J, Wang X. Type 2 diabetes mellitus prediction model based on data mining. Informatics in Medicine Unlocked. 2018;10:100–7.
  60. Amancio DR, Comin CH, Casanova D, Travieso G, Bruno OM, Rodrigues FA, et al. A systematic comparison of supervised classifiers. PLoS ONE. 2014;9(4):e94137. pmid:24763312
  61. Sokolova M. Learning from communication data: Language in electronic business negotiations [doctoral dissertation]. University of Ottawa (Canada).
  62. Dinga R, Penninx BW, Veltman DJ, Schmaal L, Marquand AF. Beyond accuracy: measures for assessing machine learning models, pitfalls and guidelines. bioRxiv. 2019:743138.