Abstract
To address the challenges of feature sparsity, semantic ambiguity, and insufficient feature extraction in sheep disease question classification, this paper proposes a novel model named Dual-Channel Feature Fusion Network for Sheep Diseases Question Classification (DFF-SDQC). The model leverages the CINO pre-trained model to generate dynamic word embeddings, thereby enriching semantic representations. Subsequently, global textual features are captured through BiLSTM, while deeper local contextual features are extracted using an attention mechanism. To further enhance the robustness and generalization of the model, a question-word attention mechanism is introduced, enabling the attention matrix to better capture the intentions expressed by interrogative words, thus strengthening the overall feature representation of the question. Finally, dual-channel feature information is fused to obtain the final textual representation. Experimental results on the D-SDQC and D-TQC datasets show that DFF-SDQC achieves an F1-score of 93.18% on D-SDQC, improving by 2.22 percentage points over the strongest baseline, demonstrating the effectiveness of the dual-channel fusion and attention design.
Citation: Haisa G, Kezierbieke G (2026) Dual-channel feature fusion network for sheep diseases question classification. PLoS One 21(3): e0343990. https://doi.org/10.1371/journal.pone.0343990
Editor: Yasin ALTAY, Eskisehir Osmangazi University, TÜRKIYE
Received: October 7, 2025; Accepted: February 14, 2026; Published: March 30, 2026
Copyright: © 2026 Haisa, Kezierbieke. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: Data cannot be shared publicly because the data are part of an ongoing veterinary question-answering research project and the complete dataset is still being curated. Data are available from the research team at Xinjiang Agricultural University (contact via glzd@xjau.edu.cn) for researchers who meet the criteria for access to confidential data.
Funding: This study was financially supported by the Xinjiang Tianchi Elite Project in the form of a grant awarded to GH (6661045/2225ZZQRCXM). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. No additional external funding was received for this study.
Competing interests: Not applicable.
1. Introduction
Question classification plays a pivotal role in question answering (QA) systems, as it enables mapping user queries to predefined categories for efficient answer retrieval [1]. Existing research on question classification has primarily focused on real-time queries such as temporal, entity-based, and descriptive questions. For example, the TREC conference [2] has long emphasized fact-based question classification. In contrast, domain-specific QA tasks involve not only factual queries but also large amounts of professional, knowledge-intensive questions. In high-resource languages such as English [3] and Chinese [4], numerous studies have been proposed to improve the effectiveness of question classification models. While existing research has achieved considerable progress in high-resource languages such as English and Chinese, low-resource languages like Kazakh remain underexplored, facing challenges such as data sparsity, morphological complexity, and insufficient corpus resources.
To address these challenges, we propose the Dual-Channel Feature Fusion Network for Sheep Diseases Question Classification (DFF-SDQC), tailored for the veterinary domain. Unlike traditional approaches that struggle with sparse and ambiguous short-text representations, DFF-SDQC integrates dynamic word embeddings from a pre-trained CINO model with bidirectional sequence features from BiLSTM. In addition, a novel question-word attention mechanism is designed to capture the intent expressed by interrogatives, while an enhanced convolutional layer strengthens global and contextual feature representations. Furthermore, we construct a domain-specific dataset for sheep disease question classification in Kazakh, covering 32 distinct question types. Experimental results show that DFF-SDQC consistently outperforms competitive baselines, highlighting its robustness and generalization ability in both low-resource and specialized domains.
The remainder of this paper is structured as follows. The subsequent section provides an overview of related work. Section 3 presents our proposed model. Section 4 outlines our experimental setup, including datasets, baselines, implementation details, experimental results, and analysis. Finally, Section 5 concludes the paper and discusses future directions.
2. Related work
Many question classification tasks and approaches have been proposed; in this section, we briefly review the most widely used methods.
Question classification is a fundamental text classification task with significant applications in natural language processing (NLP) fields such as question answering (QA) and dialogue systems. The earliest approaches were rule-based, where a predefined set of rules guided the extraction of semantic information from text. While these methods could achieve satisfactory classification results, they required extensive handcrafted rules and exhibited poor generalization. For example, Hovy et al. [5] employed handcrafted rule-based strategies to represent text for classification, and Brill et al. [6] applied regular expressions for text classification. However, such approaches were inherently limited by subjective human judgments.
Traditional machine learning methods alleviated the dependence on handcrafted rules by leveraging larger corpora or iterative optimization, though they still required large-scale annotated datasets. Metzler et al. [7] applied radial basis kernel functions combined with multiple feature fusion techniques for English question classification, while Zhang et al. [8] introduced tree kernels to allow support vector machines to exploit syntactic structures of questions. Nevertheless, these methods still relied heavily on manual feature engineering, which constrained their scalability and efficiency.
With the rapid advancement of NLP, many scholars have turned to deep learning techniques to enhance the performance of question classification [9]. Kim et al. [10] proposed a convolutional neural network (CNN)-based sentence classification model using word embeddings. Dachapally et al. [11] extended CNN architectures by first classifying questions into broad categories and then refining them into more specific classes using prior knowledge. In the Chinese question classification domain, Liu et al. [4] employed attention mechanisms over BiGRU outputs to assign weights and used CNNs to learn local feature representations, effectively combining the strengths of BiGRU and CNN. Similarly, Shi et al. [12] adopted BiLSTM and CNN models for community QA question classification tasks.
More recently, studies have explored fine-tuning pre-trained models [13,14], multi-task learning [15,16], and knowledge distillation [17] to further improve classification accuracy. In addition, research in specialized domains has increasingly incorporated multimodal information [18,19] to overcome the limitations of unimodal approaches, thereby offering new directions and solutions for practical applications. For low-resource languages such as Kazakh, research on question classification remains limited. Wang et al. [20] pioneered the use of SVMs for Kazakh text classification, while Shaldan et al. [21] employed stem-based text representation combined with CNNs to classify Kazakh short texts. Haisa et al. [22,23] applied multi-feature deep learning approaches for Kazakh question classification, incorporating syntactic features. Despite these advances, substantial challenges remain in effectively modeling Kazakh question semantics due to resource scarcity and linguistic complexity. Overall, while numerous optimization algorithms and classification models have been proposed across different scenarios, question classification in domain-specific and low-resource settings remains an open challenge. In particular, for languages with limited data resources, such as Kazakh, question classification continues to be a difficult yet important research problem that requires further exploration.
3. Methodology
3.1. Problem definition
Natural language processing (NLP) for Kazakh remains a challenging task due to its agglutinative characteristics and rich morphological structures. In this study, we aim to effectively address the problem of Kazakh question classification by combining modern deep learning techniques with the linguistic characteristics of Kazakh interrogative structures.
On the one hand, Kazakh is recognized as a low-resource language [24,25], with highly complex linguistic features spanning five grammatical levels: syllables, stems, words, phrases, and sentences. Kazakh questions typically contain relatively few words, resulting in insufficient feature information for classification. Moreover, the addition of prefixes and suffixes to word stems produces numerous derived word forms. Stems serve as the core lexical units with semantic meaning, whereas affixes provide both semantic and grammatical functions [26]. Through morphological analysis and processing, meaningful and effective textual features can be preserved while reducing complexity and dimensionality. For example, the words “قويلار” (many sheep), “قويدىڭ” (of the sheep), “قويشى” (shepherd), “قويعا” (to the sheep), and “قويىڭ” (your sheep) all share the stem “قوي” (sheep), while the affixes “لار”, “دىڭ”, “شى”, “عا”, and “ىڭ” modify the semantics or grammatical roles. On the other hand, interrogative pronouns in Kazakh play a crucial role in indicating the type of information being sought. For instance, the interrogative pronouns “قانداي” (what kind) and “قانشالىقتا” (how long/how much) directly determine the type of the question.
In Table 1, the type of the question can be inferred primarily from the interrogative pronoun, indicating that the relationship between interrogative pronouns and other words has a decisive impact on question classification. To address these issues, this study first designs a classification schema for sheep disease-related questions and constructs a dataset through data augmentation. Then, for the Kazakh question classification task, we employ a pre-trained model to obtain dynamic word embeddings and introduce an interrogative pronoun attention mechanism to strengthen semantic feature representations of questions.
3.2. Dataset construction and augmentation
3.2.1. Sheep disease question classification schema.
To construct a well-defined classification schema for sheep disease-related questions, it is essential to analyze existing research on sheep diseases [27–30] and conduct an in-depth investigation of sheep disease question corpora. Existing question classification frameworks generally fall into two categories: hierarchical (tree-structured) and flat (parallel) structures. Commonly used schemas often follow the UIUC classification system proposed for the TREC evaluation tasks [31]. However, studies focusing on veterinary medicine and sheep disease question classification are very limited. Therefore, in this work, we analyze the themes of user-generated questions from online sources and combine insights from relevant sheep disease literature to define a Kazakh-language question classification schema tailored to sheep diseases. The resulting dataset comprises 7 main categories and 32 subcategories, with detailed information presented in Table 2.
The Kazakh-language question classification schema for the sheep disease domain adopts a two-level hierarchical framework. The main categories primarily follow the UIUC English question classification standards, aligning with the principle that future research should build upon existing studies whenever possible. The subcategories are derived by analyzing users’ question patterns on various websites, enabling finer-grained categorization to facilitate subsequent answer extraction.
3.2.2. Dataset augmentation.
Data augmentation is a widely used technique for expanding the size of training datasets, thereby enhancing the performance of deep learning models and mitigating overfitting. By increasing the diversity of training examples, data augmentation facilitates the development of more robust models, especially when training data is limited. In artificial intelligence, data augmentation methods have been extensively applied across various domains, including computer vision [32], speech processing [33], and natural language processing (NLP) tasks [34].
Algorithm 1: Synonym Replacement for Data Augmentation
1: Input: Seed question set Ds
2: Output: Augmented data Da
3: Initialize: Da ← ∅
4: for each question q in Ds do
5:   Select the top-k keywords appearing in q
6:   for i = 1 to m do
7:     Replace each selected keyword with its i-th synonym
8:     Add the resulting question to Da
9:   end for
10: end for
11: return Da
The synonym replacement method can generate additional training data, thereby improving model performance and robustness. The procedure is as follows:
- Preprocessing: Clean the original dataset, compute the term frequency (TF) for each word in each category, and rank the words in descending order of TF to select k keywords.
- Synonym Dictionary Construction: Create a dictionary containing m synonyms for each of the k keywords.
- Seed Selection: Randomly select 30% of the questions in each category from the original dataset as seed data.
- Sentence Generation: For each question in the seed set, extract all keywords and replace them with their synonyms from the dictionary to generate new sentences.
Following this process, the question dataset can be effectively augmented, improving model accuracy and generalization. Newly generated questions are labeled as valid if they are semantically and grammatically correct; otherwise, they are discarded. The selection of k keywords and m synonyms is illustrated in Algorithm 1.
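As a concrete illustration, the keyword-replacement loop of Algorithm 1 can be sketched in Python. The toy English synonym dictionary and the `augment` helper below are hypothetical stand-ins for the Kazakh resources described in this section, not the actual pipeline.

```python
# Sketch of Algorithm 1: replace up to k keywords in each seed question
# with dictionary synonyms, generating up to m variants per question.
# SYNONYMS is a hypothetical toy dictionary; the paper's real resource is
# a 1,000-entry Kazakh synonym dictionary with 1-10 synonyms per word.
SYNONYMS = {
    "sheep": ["ewe", "lamb"],
    "treat": ["cure", "heal"],
}

def augment(seed_questions, k=2, m=2):
    augmented = []
    for question in seed_questions:
        words = question.split()
        # keywords of this question found in the dictionary (at most k)
        keywords = [w for w in words if w in SYNONYMS][:k]
        if not keywords:
            continue
        for i in range(m):
            new_words = [SYNONYMS[w][i % len(SYNONYMS[w])] if w in keywords else w
                         for w in words]
            candidate = " ".join(new_words)
            if candidate != question:  # discard no-op replacements
                augmented.append(candidate)
    return augmented

print(augment(["how to treat sheep tetanus"]))
# → ['how to cure ewe tetanus', 'how to heal lamb tetanus']
```

In the paper's pipeline, every candidate produced this way would still pass through the manual semantic and grammatical verification described below before being kept.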
Due to the scarcity of Kazakh synonym resources, we manually constructed a domain-specific synonym dictionary for the sheep disease domain. Two reference dictionaries were used during construction: (1) the Kazakh Synonym Dictionary [35], containing 16,000 entries, and (2) the Chinese-Kazakh Synonym Explanation Dictionary [36], containing 3,300 entries. After processing, a final dictionary comprising 1,000 words was created, with each word associated with 1–10 synonyms. This dictionary provides a valuable resource for Kazakh NLP tasks, particularly in domain-specific applications.
Human verification and error rate control in synonym-based data augmentation: To address the potential semantic drift and grammatical errors introduced by synonym replacement in low-resource language scenarios, this study incorporates a human verification mechanism into the data augmentation process to ensure the quality and reliability of the augmented data. In addition, the error rate induced by synonym replacement is systematically analyzed and quantitatively evaluated. After completing the synonym replacement procedure based on Algorithm 1, all automatically generated questions are first subjected to a manual verification stage. The verification is conducted by researchers proficient in the Kazakh language and familiar with the target domain, following a dual-criteria evaluation framework that simultaneously assesses semantic consistency and grammatical correctness. Specifically, (1) semantic consistency requires that the core meaning of an augmented question remains unchanged after keyword substitution and that the question still corresponds to its original category label; and (2) grammatical correctness requires that the generated question conforms to the fundamental morphological, word order, and syntactic rules of the Kazakh language. Only questions that satisfy both criteria are labeled as valid positive samples; otherwise, they are regarded as invalid and discarded.
To quantitatively evaluate the errors introduced by synonym replacement, we define a data augmentation accuracy metric t, which measures the proportion of valid samples among the augmented data, as

t = N_valid / N_total,

where N_valid denotes the number of augmented questions that pass manual verification in terms of both semantic and grammatical correctness, and N_total denotes the total number of questions generated during the data augmentation stage. Comparative analyses under different parameter configurations, namely the number of selected keywords k and the number of synonyms m, indicate that when k is fixed, increasing m leads to a gradual decrease in the proportion of valid samples. Conversely, when m is fixed, increasing k also results in a reduction in the proportion of valid samples. Relatively higher-quality augmented datasets are obtained when both k and m take smaller values.

Considering the trade-off between dataset expansion and the cost of manual verification, we ultimately select small values of k and m as the optimal parameter configuration. Under this setting, the data augmentation accuracy remains at a high level while effectively increasing the number of training samples, thereby achieving a balance between data quality and data scale. These results demonstrate that the introduction of a human verification mechanism, together with a quantitative analysis of synonym replacement error rates, is a crucial factor in ensuring the feasibility and effectiveness of synonym-based data augmentation methods for low-resource languages.
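The accuracy metric t is a simple ratio of verified samples to generated samples; a minimal sketch, using hypothetical verification counts for two (k, m) settings, is:

```python
def augmentation_accuracy(num_valid, num_total):
    """t = N_valid / N_total: proportion of generated questions that pass
    manual semantic and grammatical verification."""
    if num_total <= 0:
        raise ValueError("no augmented samples to evaluate")
    return num_valid / num_total

# Hypothetical counts, mirroring the reported trend that smaller k and m
# keep a larger share of generated samples valid.
t_small_km = augmentation_accuracy(95, 100)   # smaller k, m
t_large_km = augmentation_accuracy(70, 100)   # larger k, m
assert t_small_km > t_large_km
```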
3.2.3. Sheep disease question classification dataset.
The sheep disease question classification dataset constructed in this study comprises two main components: the creation of the original dataset and its subsequent augmentation.
- (1). Original Dataset Construction
First, raw question data were collected and organized from major websites and relevant literature, and categorized according to the proposed question classification schema. Subsequently, the collected questions were manually annotated by native Kazakh-speaking experts to ensure labeling accuracy and reliability. The original dataset consists of 3,000 questions distributed across 7 main categories and 32 subcategories, including: Disease Information (309 questions), Symptoms (314 questions), Diagnosis (421 questions), Treatment (571 questions), Prevention (593 questions), Husbandry Management (470 questions), and Emergency Care (322 questions). All annotation work was conducted manually to ensure high-quality labels.
- (2). Dataset After Augmentation
Using the data augmentation techniques described in Section 3.2.2, the original dataset was expanded to 7,000 questions. The distribution of question categories in the augmented dataset is shown in Fig 1. After augmentation, the distribution is as follows: Disease Information (623, 8.90%), Symptoms (843, 12.04%), Diagnosis (931, 13.30%), Treatment (1,275, 18.21%), Prevention (1,411, 20.16%), Husbandry Management (1,157, 16.53%), and Emergency Care (758, 10.83%). Data augmentation effectively increased the scale of training data, improving model generalization and the accuracy of question classification. The most frequent question categories in the dataset are Diagnosis, Treatment, Prevention, Symptoms, and Husbandry Management, whereas Disease Information and Emergency Care categories have the fewest questions. This distribution may reflect the data collection methods and the typical purposes for which users submit questions.
- (3). Question Length Distribution
In natural language processing, text length is an important consideration because it affects both the complexity of semantic expression and the difficulty of comprehension. Specifically, in question classification tasks, longer questions generally contain more information, better reflecting the user’s intended meaning. Theoretically, as the number of words in a question increases, the provided contextual information becomes richer, enabling classification models to more accurately capture semantic features and assign correct categories.
Statistical analysis of the Kazakh-language question dataset shows that most questions contain between 2 and 29 words, with questions of 7–11 words being particularly prevalent. To more intuitively illustrate this trend, the word-count distribution is presented in Fig 2. The figure clearly depicts the relationship between the number of words in a question and the frequency of questions, providing valuable insight for model design and evaluation.
3.3. DFF-SDQC
The proposed Dual-Channel Feature Fusion Network for Sheep Diseases Question Classification (DFF-SDQC) is illustrated in Fig 3. Initially, the input question text undergoes tokenization and word embedding mapping. The processed text is then fed into the CINO model, which employs a multi-layer Transformer architecture to encode the text and extract semantic information. Leveraging large-scale pretraining, CINO effectively captures Kazakh semantic representations, producing word embeddings that encode lexical meaning, contextual information, syntactic and grammatical structures, as well as domain-specific knowledge. These embeddings are subsequently input into a BiLSTM module, which combines forward and backward LSTM layers to capture bidirectional contextual information. Each LSTM direction processes the input sequence independently, thereby learning the temporal and sequential features of sheep disease questions. In the left channel, interrogative pronouns within the question are identified to construct an interrogative pronoun attention matrix. The resulting features are then fed into the left CNN channel, where convolutional kernels of various sizes extract multi-scale features. The attention mechanism emphasizes interrogative pronoun features, enabling the model to capture long-range dependencies in medical question texts.
The strength of the DFF-SDQC model lies in its dual-channel architecture, which integrates embeddings of different granularities with the interrogative pronoun attention mechanism. This design effectively leverages the structural and semantic information present in questions, thereby enhancing classification accuracy. This study provides a novel perspective and methodology for question classification and contributes to advancing natural language processing research in the Kazakh language.
3.3.1. CINO pretrained model.
Pretrained language models are trained on large-scale unlabeled datasets to learn general linguistic knowledge or patterns, which can subsequently be fine-tuned for specific downstream tasks. CINO [37] (Chinese mINOrity pretrained language model) is a minority language pretrained model released by the Harbin Institute of Technology and iFLYTEK Joint Laboratory (HFL). CINO is further trained based on XLM-R and employs a method called Cross-Lingual Context Alignment, which leverages both monolingual and bilingual data while considering cross-lingual similarities and differences. CINO has been pretrained on seven languages, including Tibetan, Mongolian, Uyghur, Kazakh, Korean, Zhuang, and Cantonese. For Kazakh, which exhibits complex morphological features, CINO effectively addresses the issue of data sparsity. For a given question containing n words, denoted as Q = {w_1, w_2, …, w_n}, each word w_i is input into the CINO model. The input embeddings consist of two components: token embeddings representing the semantic vector of each word and position embeddings representing the word's position in the sequence:

E_i = E_token(w_i) + E_pos(i).

After passing through N Transformer layers, the model produces contextualized word embeddings:

H = {h_1, h_2, …, h_n}.
To further adapt these embeddings to downstream cross-lingual question understanding tasks, the CINO model is fine-tuned on the task-specific training dataset. Special tokens, such as </s>, which do not carry intrinsic semantic meaning, are processed by CINO to produce output representations that better capture the overall semantics of the input sequence. The resulting embeddings are then fed into a BiLSTM module for subsequent bidirectional contextual feature extraction.
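The input representation described above (a token embedding plus a position embedding per word) can be sketched with toy numpy lookup tables. The vocabulary size, dimensions, and randomly initialized tables below are illustrative assumptions, not CINO's actual parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes and randomly initialized tables, not CINO's real configuration.
vocab_size, max_len, dim = 100, 16, 8
token_table = rng.normal(size=(vocab_size, dim))
position_table = rng.normal(size=(max_len, dim))

def input_embeddings(token_ids):
    """E_i = token embedding of w_i + position embedding of slot i."""
    ids = np.asarray(token_ids)
    positions = np.arange(len(ids))
    return token_table[ids] + position_table[positions]

E = input_embeddings([5, 17, 42])   # a toy 3-token question
assert E.shape == (3, 8)
```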
3.3.2. BiLSTM.
Long Short-Term Memory (LSTM) networks exhibit strong capabilities in sequential modeling, enabling the capture of contextual information across long distances. To effectively model word order and context-dependent features, a bidirectional LSTM (BiLSTM) layer is employed. In this layer, a weight-sharing mechanism is used to share parameters between question classification and named entity recognition tasks, thereby enhancing feature representations.
Specifically, the output embeddings from the CINO model, H = {h_1, h_2, …, h_n}, are fed into the BiLSTM to generate a sequence of hidden vectors O = {o_1, o_2, …, o_n}, encoding the contextual information of the entire question into O. The forward and backward hidden states are computed as follows:

h_fwd(t) = LSTM(h_t, h_fwd(t−1); θ),
h_bwd(t) = LSTM(h_t, h_bwd(t+1); θ),
o_t = [h_fwd(t); h_bwd(t)].

Here, h_fwd(t) and h_bwd(t) represent the forward and backward hidden states, respectively; θ denotes the LSTM model parameters; and o_t is the output vector of the BiLSTM layer, which captures the bidirectional contextual features for each word in the question sequence.
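The forward/backward computation above can be sketched with a minimal numpy LSTM. The cell below stacks the four gate weight matrices into one for brevity and uses random toy weights, not the trained model's parameters.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_params(input_dim, hidden_dim, rng):
    # single stacked weight matrix for the four gates (input, forget, cell, output)
    return (rng.normal(scale=0.1, size=(4 * hidden_dim, input_dim + hidden_dim)),
            np.zeros(4 * hidden_dim))

def lstm_run(xs, params, hidden_dim):
    """Run one LSTM direction over the sequence; return all hidden states."""
    W, b = params
    h, c = np.zeros(hidden_dim), np.zeros(hidden_dim)
    outs = []
    for x in xs:
        z = W @ np.concatenate([x, h]) + b
        i, f, g, o = np.split(z, 4)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
        outs.append(h)
    return np.stack(outs)

def bilstm(xs, fwd_params, bwd_params, hidden_dim):
    """o_t = [h_fwd(t); h_bwd(t)]: backward states come from the reversed sequence."""
    h_fwd = lstm_run(xs, fwd_params, hidden_dim)
    h_bwd = lstm_run(xs[::-1], bwd_params, hidden_dim)[::-1]
    return np.concatenate([h_fwd, h_bwd], axis=1)

rng = np.random.default_rng(0)
seq = rng.normal(size=(5, 8))    # a toy 5-word question with 8-dim embeddings
H = bilstm(seq, lstm_params(8, 16, rng), lstm_params(8, 16, rng), 16)
assert H.shape == (5, 32)        # 2 * hidden_dim features per word
```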
3.3.3. Construction of the Interrogative Pronoun Attention Matrix.
In question classification, interrogative pronouns exert a significant influence on task performance, aiding in the identification of question type, intent, and grammatical structure. In this study, an interrogative pronoun attention matrix is employed to enhance semantic information in questions, thereby improving the model’s ability to assess the impact of interrogative words. By integrating word embeddings with a diagonal attention matrix focused on interrogative pronouns, the model can identify which portions of a question are most relevant to the interrogative terms.
Construction of an Interrogative Pronoun Lexicon. Based on collected questions and Kazakh grammar references, a lexicon of 160 Kazakh interrogative pronouns was compiled. These pronouns are frequently used in sheep disease questions and exhibit distinct characteristics across different question types.
Identification of Interrogative Pronouns in Questions. Using the lexicon, interrogative pronouns are located within each question. For example, in the question “قويدىڭ سىرەسپەسى قانشالىق ۋاقىتتا جازىلادى؟” (“What is the treatment duration for sheep tetanus?”), the interrogative pronoun is “قانشالىق” (“how long”). First, word embeddings e_1, e_2, …, e_n are trained at the word level, as detailed in the following section. Then, a diagonal attention matrix A is computed to capture the semantic relevance between each word w_i and the interrogative pronoun q, defined as:

s_i = w_a · <e_i, e_q>,

where <·, ·> denotes the inner product function and w_a is a learnable parameter. For each element in the diagonal attention matrix, the relative importance α_i of the word w_i with respect to the interrogative pronoun q is calculated as:

α_i = exp(s_i) / Σ_j exp(s_j);

the attention-weighted word embeddings incorporating the interrogative pronoun are computed as:

r_i = α_i e_i;

the resulting interrogative pronoun attention matrix is constructed as:

A = diag(α_1, α_2, …, α_n).
Integrating the interrogative pronoun attention mechanism enables the intent classification model to more accurately identify the segments of a question that exert the greatest influence on classification, thereby enhancing overall model performance.
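A small numpy sketch of this diagonal attention construction, using a scalar in place of the learnable parameter and random toy embeddings, might look as follows.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def interrogative_attention(word_embs, q_emb, w_a=1.0):
    """s_i = w_a * <e_i, e_q>; alpha = softmax(s); A = diag(alpha);
    r_i = alpha_i * e_i (attention-weighted word embeddings).
    w_a stands in for the learnable parameter."""
    scores = w_a * (word_embs @ q_emb)
    alpha = softmax(scores)
    return np.diag(alpha), alpha[:, None] * word_embs

rng = np.random.default_rng(0)
E = rng.normal(size=(6, 8))   # toy embeddings for a 6-word question
q = E[4]                      # suppose word 5 is the interrogative pronoun
A, weighted = interrogative_attention(E, q)
assert A.shape == (6, 6) and np.isclose(np.trace(A), 1.0)
assert weighted.shape == E.shape
```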
3.3.4. CNN network with attention-based pooling.
Convolutional Neural Networks (CNNs) have been widely demonstrated to extract high-dimensional local features from raw data through convolution operations, which play a critical role in recognition and classification tasks. In this study, the output of the interrogative pronoun attention matrix, R = {r_1, r_2, …, r_n}, is fed into a CNN for convolutional feature extraction. The CNN employs convolutional kernels of sizes h_1, h_2, and h_3 to capture local information of varying granularities within the question. The convolution operation produces a new feature c_i, defined as:

c_i = f(W · x_{i:i+h−1} + b),

where f denotes the nonlinear activation function (ReLU is used in this study), W represents the convolution kernel with size h and word embedding dimension d, “·” denotes the dot product, and c_i is the resulting feature for the i-th convolution operation. For a question of length n, the convolutional feature vector is expressed as:

c = [c_1, c_2, …, c_{n−h+1}].
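The sliding-window convolution producing n − h + 1 features can be sketched as follows (toy dimensions and a random kernel, not trained weights):

```python
import numpy as np

def conv_features(X, kernel, bias=0.0):
    """c_i = ReLU(<W, X[i:i+h]> + b) over a sliding window of h words,
    yielding n - h + 1 features for an n-word question."""
    h = kernel.shape[0]
    return np.array([max(0.0, float(np.sum(kernel * X[i:i + h]) + bias))
                     for i in range(X.shape[0] - h + 1)])

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 8))   # 10-word question, 8-dim embeddings
W3 = rng.normal(size=(3, 8))   # one kernel of width 3 (several widths are used)
c = conv_features(X, W3)
assert c.shape == (8,) and (c >= 0).all()
```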
Typically, pooling operations are applied following convolution to reduce feature dimensionality and extract salient information. Conventional pooling methods, such as max pooling and average pooling, have inherent limitations: max pooling may discard non-maximal activations, resulting in information loss, while average pooling may dilute strong activation signals due to averaging. To address these issues, this study introduces attention-based pooling (Att-pooling) and Top_k pooling. These approaches effectively mitigate the loss of critical feature information. The detailed procedure is as follows:
Attention-based Pooling. Compute hidden units u_i using the nonlinear activation function tanh:

u_i = tanh(W_a c_i + b_a),

where c_i is the i-th convolutional feature, W_a and the context vector u_w are learnable parameters, and b_a is the bias vector. Then calculate attention probabilities α_i using the Softmax function:

α_i = exp(u_i^T u_w) / Σ_j exp(u_j^T u_w);

the attention-pooled features are then obtained by reweighting the convolutional features with these probabilities:

v_i = α_i c_i;

the resulting feature vector is expressed as:

v = [v_1, v_2, …, v_{n−h+1}].

Top_k Pooling. On the basis of Att-pooling, Top_k pooling selects the k largest values as the final output features. Compared to traditional max pooling or average pooling, Top_k pooling preserves the most critical local convolutional features relevant for question classification. The computation is defined as:

p = top_k(v),

where k is set to 2, and the final feature representation obtained via Top_k pooling is denoted as p = [p_1, p_2].
The combination of Att-pooling and Top_k pooling offers several advantages: first, Att-pooling reweights convolutional features according to their importance, enabling a more precise understanding of the question's intent. Compared to average pooling, Att-pooling better preserves high-intensity local features. Second, Top_k pooling retains more critical convolutional features than conventional max pooling, which is particularly beneficial for question classification. Finally, the combined pooling approach effectively minimizes the risk of information loss.
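Both pooling steps can be sketched together in numpy. The weight shapes below are illustrative simplifications of the learnable parameters, initialized randomly rather than trained.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def att_pool(c, W_a, b_a, u_ctx):
    """Att-pooling: u_i = tanh(W_a * c_i + b_a), alpha = softmax(u . u_ctx),
    v_i = alpha_i * c_i (convolutional features reweighted by importance)."""
    u = np.tanh(np.outer(c, W_a) + b_a)   # one hidden-unit vector per feature
    alpha = softmax(u @ u_ctx)
    return alpha * c

def top_k_pool(v, k=2):
    """Keep the k largest reweighted features (k = 2 in this study)."""
    return np.sort(v)[-k:][::-1]

rng = np.random.default_rng(0)
c = rng.normal(size=6)                 # toy convolutional feature vector
W_a, b_a = rng.normal(size=4), np.zeros(4)
u_ctx = rng.normal(size=4)             # context vector (random stand-in)
v = att_pool(c, W_a, b_a, u_ctx)
p = top_k_pool(v, k=2)
assert v.shape == (6,) and p.shape == (2,) and p[0] >= p[1]
```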
3.3.5. Dual-channel feature fusion.
The final classification layer of the question classification model employs a concatenation operation followed by a Softmax function to predict the question type. Specifically, the model integrates the outputs of a dual-channel hybrid neural network consisting of an attention-pooled CNN channel and a BiLSTM channel. The CNN channel produces a fixed-length feature vector F_cnn through Top_k pooling. In parallel, the BiLSTM processes the input sequence and generates a sequence of hidden states; to ensure dimensional consistency for feature fusion, a time-wise pooling operation is applied to the BiLSTM hidden state sequence, yielding a fixed-length semantic feature vector F_lstm. The fused representation is then obtained by concatenating the two vectors:

Y = [F_cnn; F_lstm],

where the concatenation is performed along the feature dimension. Finally, the Softmax function is applied to Y to compute the probability distribution over different question types. The Softmax operation is defined as:

p = Softmax(W_f Y + b_f),

where p denotes the predicted probability vector of question categories in the DFF-SDQC model, W_f is the weight matrix of the fully connected layer, and b_f is the bias vector. The Softmax function allows for the computation of a normalized score distribution across categories, thereby enhancing classification discriminability. Furthermore, the cross-entropy loss function is employed to optimize the model parameters, defined as:

L = −(1/N) Σ_{i=1}^{N} Σ_{j=1}^{C} y_{ij} log(p_{ij}) + λ‖θ‖²,

where N represents the total number of training samples, C denotes the number of question categories, y_{ij} is the ground-truth label of the i-th sample for category j, λ is the regularization coefficient, and θ denotes the trainable parameters of the model. This loss formulation ensures both accurate classification and model regularization, promoting generalization performance.
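The fusion, Softmax, and regularized cross-entropy steps can be sketched as follows; the dimensions are toy values, and `classify` and `cross_entropy` are hypothetical helper names rather than the paper's implementation.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def classify(cnn_vec, bilstm_vec, W_f, b_f):
    """Y = [F_cnn ; F_lstm]; p = Softmax(W_f @ Y + b_f)."""
    Y = np.concatenate([cnn_vec, bilstm_vec])
    return softmax(W_f @ Y + b_f)

def cross_entropy(p, label, params, lam=1e-4):
    """Cross-entropy for one sample plus L2 regularization over parameters."""
    return -np.log(p[label]) + lam * sum(np.sum(w ** 2) for w in params)

rng = np.random.default_rng(0)
cnn_vec, bilstm_vec = rng.normal(size=4), rng.normal(size=6)  # toy channel outputs
W_f, b_f = rng.normal(size=(7, 10)), np.zeros(7)              # 7 main categories
p = classify(cnn_vec, bilstm_vec, W_f, b_f)
assert p.shape == (7,) and np.isclose(p.sum(), 1.0)
loss = cross_entropy(p, label=2, params=[W_f])
assert loss > 0
```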
4. Experiments and analysis
4.1. Datasets
4.1.1. D-SDQC.
The D-SDQC dataset was constructed in this study specifically for sheep disease question classification. It comprises seven primary categories: Disease Information, Symptoms, Diagnosis, Treatment, Prevention, Husbandry Management, and Emergency Care. Additionally, it includes 32 subcategories, such as Disease Definition, Etiology, Type, Disease Stage, Disease Symptoms, Symptom Trends, Symptom-Related Factors, Clinical Diagnosis, Required Diagnosis, Diagnostic Basis, Complications, Treatment Methods, Recommended Medication, Dosage, Medication Duration, Treatment Effectiveness, Preventive Measures, Vaccine Types, Vaccination Timing, Vaccination Methods, Vaccine Effectiveness, Vaccine Precautions, Environmental Sanitation, Feed and Water, Quarantine and Inspection, Seasonal Prevention, Preventive Guidelines, Husbandry Experience, Emergency Measures, Emergency Medication, Quarantine Measures, and Transmission Routes. To ensure a fair and reliable evaluation, the dataset was processed following a strict split-before-augmentation strategy. Specifically, the original question corpus was first divided into training, validation, and test sets at a ratio of 8:1:1. Subsequently, data augmentation techniques based on synonym substitution and paraphrase generation were applied only to the training set to enhance data diversity and improve model generalization ability. The validation and test sets consisted exclusively of original, non-augmented questions. After augmentation, the final dataset contained a total of 7,000 question instances, which were used for model training and evaluation.
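The split-before-augmentation protocol can be sketched as follows. The hypothetical `augment_fn` stands in for the synonym-substitution and paraphrase-generation step; the key point is that augmentation touches only the training split, so no augmented variant of a validation or test question can leak into training.

```python
import random

def split_then_augment(questions, augment_fn, seed=42):
    """Split 8:1:1 FIRST, then augment only the training split."""
    rng = random.Random(seed)
    data = list(questions)
    rng.shuffle(data)
    n = len(data)
    n_train, n_val = int(n * 0.8), int(n * 0.1)
    train = data[:n_train]
    val = data[n_train:n_train + n_val]
    test = data[n_train + n_val:]
    # Augmentation (e.g. synonym substitution) applied to training data only;
    # validation and test sets keep exclusively original questions.
    train_aug = train + [augment_fn(q) for q in train]
    return train_aug, val, test
```

Augmenting before splitting would instead place near-duplicates of test questions in the training set and inflate the reported scores.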
4.1.2. D-TQC.
Given that Kazakh is a low-resource language, publicly available datasets are generally limited. To enhance experimental diversity and reliability, the tourism question classification dataset D-TQC [22] was employed to further evaluate the performance of the DFF-SDQC model and to conduct comparative experiments.
The D-TQC dataset consists of six primary categories: Entity, Time, Description, Numeric, Location, and Person. It includes 26 subcategories, such as Route, Specialty, Cuisine, Cultural Customs, Transportation, Climate Type, Accommodation, Time Period, Time Point, Reason, Abbreviation, Alias, Review, Distance, Price, Contact Information, Altitude, Area, Scenic Spot Level, Temperature, Administrative Region, Scenic Spot, Address, Specific Tasks, Personal Description, and Group/Organization. The dataset contains a total of 9,211 question instances. For the experiments, the D-TQC dataset was also split into training, validation, and test sets at an 8:1:1 ratio.
4.2. Baselines
To evaluate the performance of the proposed question classification model on both the self-constructed sheep disease question dataset (D-SDQC) and the tourism question classification dataset (D-TQC), several baseline models were selected for comparison.
- ●. SVM [20]: As a traditional machine learning algorithm, Support Vector Machines (SVM) have been widely applied in various text classification tasks. Wang Hua et al. were among the first to apply the SVM model for Kazakh text classification, achieving promising results.
- ●. CNN [21]: Parhat et al. employed Convolutional Neural Networks (CNN) to address Kazakh short-text classification. This approach first represents words or stems as vector embeddings using Word2Vec, followed by classification using a CNN architecture.
- ●. BiLSTM+CNN [12]: This model first extracts local features of the question using a CNN, followed by a BiLSTM network to capture global contextual information across the entire question.
- ●. BiGRU + CNN [4]: Utilizing a bidirectional Gated Recurrent Unit (BiGRU) network, this model captures semantic relationships within the question while a CNN is applied to enhance the understanding of local semantic features.
- ●. BiGRU + CNN + Att+Features [23]: By integrating various lexical and syntactic features, such as stems, affixes, part-of-speech tags, and phrase-level representations, this model effectively combines them with a BiGRU + CNN+Attention deep learning network.
- ●. BiGRU + CNN+INatt [22]: This model represents textual features using both word stem information and an interrogative-word attention mechanism, followed by BiGRU and CNN networks to extract deeper semantic features.
- ●. Transformer [37]: Based on the Multi-Head Attention mechanism, the Transformer model employs its encoder component to encode the semantic information of questions for classification purposes.
4.3. Implementation details
The DFF-SDQC model was implemented on the following configuration: an Intel Core i7 CPU, an NVIDIA GeForce GTX 1660 Ti GPU, and the CUDA 10.1 GPU acceleration library. The main model hyperparameters are summarized in Table 3.
4.4. Evaluation measures
The performance of the question classification models was evaluated using Precision, Recall, and F1-score metrics. Precision measures the proportion of correctly predicted instances among all instances predicted as positive for a given class, defined as:

$$\mathrm{Precision} = \frac{TP}{TP + FP}$$

Recall measures the proportion of correctly predicted instances among all actual positive instances, defined as:

$$\mathrm{Recall} = \frac{TP}{TP + FN}$$

F1-score provides a harmonic mean between Precision and Recall, balancing the trade-off between these two metrics, and is defined as:

$$F1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$

where $TP$, $FP$, and $FN$ denote the numbers of true positives, false positives, and false negatives, respectively.
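These per-class metrics follow directly from their definitions, as the short sketch below shows (illustrative only; macro-averaging over all classes would repeat the same computation per class and average the results):

```python
def precision_recall_f1(y_true, y_pred, positive):
    """Per-class Precision, Recall, and F1 from parallel label lists."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```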
4.5. Results and analysis
4.5.1. Experiment results on D-SDQC.
The performance of the proposed DFF-SDQC model was evaluated on the D-SDQC dataset and compared with several baseline models. The experimental results are presented in Table 4.
From the results, it can be observed that traditional machine learning methods, such as Support Vector Machine (SVM), perform significantly worse than deep learning models on the D-SDQC question classification dataset. This performance gap is primarily due to the independence assumption of words in statistical methods, which may fail to effectively capture contextual information in questions.
The CNN-based stem embedding representation slightly outperforms the word embedding method, indicating that effective morphological analysis and stem extraction can enhance text classification performance, particularly in agglutinative languages like Kazakh. BiGRU outperforms CNN, likely because BiGRU can better capture long-term dependencies and contextual information in the text. Moreover, the combined CNN-BiGRU model surpasses either single BiGRU or CNN models, demonstrating the advantage of integrating local feature extraction with sequential temporal features.
The Transformer baseline is a vanilla Transformer model trained from scratch on the D-SDQC dataset. While it leverages positional encoding and multi-head attention to capture contextual relations, its performance is limited by the relatively small dataset size and absence of pre-training.
In contrast, the proposed DFF-SDQC model employs CINO, a pre-trained transformer, as one channel of its dual-channel architecture. Combined with the CNN-based channel and the interrogative pronoun attention mechanism, DFF-SDQC effectively learns both word-level and sentence-level contextual semantic features. This results in a substantial performance improvement over all baselines. These results highlight the complementary benefits of pre-training and dual-channel design for domain-specific question classification in the sheep disease context.
Computational Cost and Efficiency. The proposed DFF-SDQC model employs a dual-channel feature fusion architecture, integrating a CINO encoder, BiLSTM, interrogative-pronoun attention, and a multi-scale CNN. While the CINO encoder captures rich Kazakh semantic representations, its parameters are only fine-tuned rather than trained from scratch, significantly reducing training overhead compared with Transformer models trained from scratch. The BiLSTM module captures bidirectional contextual dependencies with complexity $O(n \cdot d^2)$ for sequence length $n$ and hidden dimension $d$, while the CNN extracts multi-scale features efficiently, and the attention mechanism selectively emphasizes interrogative pronouns. Given the relatively short length of Kazakh questions, the overall computational cost remains manageable.
In terms of parameter size and inference efficiency, the dual channels allow parallel feature extraction, improving GPU utilization and inference speed. Compared with traditional models such as SVM, DFF-SDQC incurs a higher computational cost at inference but offers superior semantic modeling, while maintaining significantly lower cost and latency than Transformer-based classifiers. Overall, DFF-SDQC achieves a favorable balance between computational efficiency and classification performance, making it suitable for domain-specific Kazakh question classification tasks.
4.5.2. Experiment results on D-TQC.
The experimental results on the public dataset are presented in Table 5, where the DFF-SDQC model demonstrates excellent performance on the D-TQC dataset. Moreover, it can be observed that the proposed dual-channel feature fusion network contributes significantly to improving performance on the D-TQC dataset. These findings validate the effectiveness of the DFF-SDQC model, indicating that it can achieve satisfactory results even under conditions of relatively limited data resources.
4.5.3. Ablation study.
To verify the contribution and collaborative effectiveness of each core module, ablation experiments were conducted on the D-SDQC and D-TQC datasets. The results are presented in Table 6 and Table 7 (where √ indicates that the module is enabled and × indicates that it is disabled). Among them, INatt denotes the interrogative word feature fusion module, CINO represents the CINO-based text feature representation module, and BiLSTM refers to the BiLSTM encoding module.
When all three modules (INatt, CINO, and BiLSTM) were enabled, the DFF-SDQC model achieved the best evaluation metrics and overall performance, indicating that the three modules do not conflict and cooperate effectively to enhance the model's capability. When any one of the modules was disabled, the performance of the DFF-SDQC model decreased to varying degrees: disabling the INatt module led to a relatively small performance drop, while removing the BiLSTM module caused a more significant decline. These results demonstrate that the INatt, CINO, and BiLSTM modules complement each other and collaboratively improve performance by enabling effective learning across different feature representations.
4.5.4. Experimental results on varying sentence lengths.
The majority of questions in the sheep disease domain dataset are relatively short. To evaluate whether incorporating a question-word attention mechanism can enhance classification performance for short texts, comparative experiments were conducted using question sets of different lengths. Specifically, the dataset was divided into two subsets based on the number of words per question: questions containing nine words or fewer were considered short questions, while questions containing ten words or more were classified as long questions.
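The length-based partition can be expressed as a short sketch (illustrative only; whitespace tokenization here stands in for the paper's actual Kazakh word segmentation):

```python
def split_by_length(questions, threshold=9):
    """Short: `threshold` words or fewer; long: more than `threshold`.
    Tokenization by whitespace is an assumption of this sketch."""
    short = [q for q in questions if len(q.split()) <= threshold]
    long_ = [q for q in questions if len(q.split()) > threshold]
    return short, long_
```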
This division allows for a systematic assessment of the model's effectiveness across varying text lengths, and the results in Table 8 provide insights into the impact of the question-word attention mechanism on short versus long questions.
5. Conclusion
In this study, we proposed a Dual-Channel Feature Fusion Network for Sheep Diseases Question Classification (DFF-SDQC). The model leverages the CINO pre-trained language model and interrogative pronoun attention to extract rich semantic representations of sheep disease-related questions. Subsequently, CNN and BiLSTM networks are employed to capture both local and global textual features of the questions. The extracted features are then integrated using the proposed feature fusion mechanism and fed into a fully connected layer for classification. Experimental evaluations were conducted on two datasets: a self-constructed sheep disease question classification dataset (D-SDQC) and a tourism question classification dataset (D-TQC). The results demonstrate that DFF-SDQC achieves high classification accuracy, outperforming several baseline models.

Despite its effectiveness, the proposed model has certain limitations. Future work will focus on further refining the feature fusion mechanism to make it applicable to other models. Additionally, integrating domain-specific veterinary knowledge graphs or leveraging multimodal fusion techniques will be explored to further enhance the performance of DFF-SDQC on sheep disease question classification tasks.

In terms of veterinary and livestock management applications, the proposed DFF-SDQC model can serve as a core component of intelligent question-answering systems by accurately classifying sheep disease-related questions. This capability assists veterinarians and farm managers in handling inquiries from farmers. Moreover, the model is capable of understanding domain-specific terminology and contextual information in low-resource languages, which is particularly valuable in rural areas with limited access to professional veterinary resources.
References
- 1. Diefenbach D, Lopez V, Singh K, Maret P. Core techniques of question answering systems over knowledge bases: a survey. Knowl Inf Syst. 2018;55(3):529–69.
- 2. Voorhees EM, Tice DM. The TREC-8 question answering track evaluation. NIST. 2000.
- 3. Liu Y, Yi X, Chen R, Zhai Z, Gu J. Feature extraction based on information gain and sequential pattern for English question classification. IET Software. 2018;12(6):520–6. https://doi.org/10.1049/iet-sen.2018.0006
- 4. Liu J, Yang Y, Lv S, Wang J, Chen H. Attention-based BiGRU-CNN for Chinese question classification. J Ambient Intell Human Comput. 2019.
- 5. Hovy E, Gerber L, Hermjakob U, Lin CY, Ravichandran D. Toward Semantics-Based Answer Pinpointing. In: Proceedings of the First International Conference on Human Language Technology Research (HLT 2001), 2001.
- 6. Brill E, Dumais S, Banko M. An analysis of the AskMSR question-answering system. In: Proceedings of the ACL-02 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2002. 257–64.
- 7. Metzler D, Croft WB. Analysis of Statistical Question Classification for Fact-Based Questions. Inf Retrieval. 2005;8(3):481–504.
- 8. Zhang D, Lee WS. Question classification using support vector machines. In: Proceedings of the 26th annual international ACM SIGIR conference on Research and development in informaion retrieval, 2003. 26–32.
- 9. Liu S, Liu S, Sha L, Zeng Z, Gašević D, Liu Z. Annotation Guideline-Based Knowledge Augmentation: Toward Enhancing Large Language Models for Educational Text Classification. IEEE Trans Learning Technol. 2025;18:619–34.
- 10. Kim Y. Convolutional Neural Networks for Sentence Classification. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2014. 1746–51.
- 11. Dachapally PR, Ramanam S. In-depth question classification using convolutional neural networks. arXiv. 2018.
- 12. Shi M, Yang Y, He L, Chen C. A community question answering sentence classification method based on Bi-LSTM and CNN with attention mechanism. Computer Systems Applications. 2018;27(09):157–62.
- 13. Liu Y, Ott M, Goyal N, Du J, Joshi M, Chen D, et al. RoBERTa: A robustly optimized BERT pretraining approach. arXiv. 2019.
- 14. Pires T, Schlinger E, Garrette D. How multilingual is multilingual BERT? In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy, 2019. 4996–5001.
- 15. Mrini K, Dernoncourt F, Yoon S, Bui T, Chang W, Farcas E, et al. A Gradually Soft Multi-Task and Data-Augmented Approach to Medical Question Understanding. In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (ACL-IJCNLP), 2021:1505–15.
- 16. Tohti T, Abdurxit M, Hamdulla A. Medical QA Oriented Multi-Task Learning Model for Question Intent Classification and Named Entity Recognition. Information. 2022;13(12):581.
- 17. Maulana AY, Alfina I, Azizah K. Building Indonesian dependency parser using cross-lingual transfer learning. In: 2022 International Conference on Asian Language Processing (IALP), 2022. 488–93.
- 18. Gong L, Yang J, Han S, Ji Y. MedBLIP: A multimodal method of medical question-answering based on fine-tuning large language model. Comput Med Imaging Graph. 2025;124:102581. pmid:40483830
- 19. Farahani AM, Adibi P, Ehsani MS, Hutter H-P, Darvishy A. Chart question answering with multimodal graph representation learning and zero-shot classification. Expert Systems with Applications. 2025;270:126508.
- 20. Wang H, Gulila A, Wu S. Kazakh text classification based on SVM. Computer Applications. 2010;30(6):1676–8.
- 21. Parhat S, Ablimit M, Hamdulla A. A Robust Morpheme Sequence and Convolutional Neural Network-Based Uyghur and Kazakh Short Text Classification. Information. 2019;10(12):387.
- 22. Haisa G, Altenbek G. Question classification based on weak supervision and interrogative pronouns attention mechanism. In: 2022 26th International Conference on Pattern Recognition (ICPR), 2022. 2273–8.
- 23. Haisa G, Altenbek G, Aierzhati H, Kenzhekhan K. Research on classification of Kazakh questions integrated with multi-feature embedding. In: 2021. 943–7.
- 24. Kuwanto G, Akyürek AF, Tourni IC, Li S, Jones A, Wijaya D. Low-resource machine translation training curriculum fit for low-resource languages. In: Pacific Rim International Conference on Artificial Intelligence. Singapore: Springer Nature Singapore, 2023. 453–8.
- 25. Li X, Li Z, Sheng J, Slamu W. Low-resource text classification via cross-lingual language model fine-tuning. Cham: Springer, 2020. 231–46.
- 26. Ding J. Practical grammar of modern Kazakh language (vol. I & II). Beijing: Minzu University of China Press. 2018.
- 27. Mandal SK. Diseases of sheep: causes, symptoms and treatment options. Farming Plan. 2023.
- 28. Taylor MA. Emerging parasitic diseases of sheep. Vet Parasitol. 2012;189(1):2–7. pmid:22525586
- 29. Alekish M, Ismail ZB. Common diseases of sheep (Ovis aries Linnaeus) and goats (Capra aegagrus hircus) in Jordan: A retrospective study (2015-2021). Open Vet J. 2022;12(6):806–14. pmid:36650874
- 30. Benavides J, González L, Dagleish M, Pérez V. Diagnostic pathology in microbial diseases of sheep or goats. Vet Microbiol. 2015;181(1–2):15–26. pmid:26275854
- 31. Silva J, Coheur L, Mendes AC, Wichert A. From symbolic to sub-symbolic information in question classification. Artif Intell Rev. 2010;35(2):137–54.
- 32. Ma G, Wang Z, Yuan Z, Wang X, Yuan B, Tao D. A Comprehensive Survey of Data Augmentation in Visual Reinforcement Learning. Int J Comput Vis. 2025;133(10):7368–405.
- 33. Ding K, Li R, Xu Y, Du X, Deng B. Adaptive data augmentation for mandarin automatic speech recognition. Appl Intell. 2024;54(7):5674–87.
- 34. Mohanad M, Al-Azani S. Enhancing Arabic NLP Tasks through Character-Level Models and Data Augmentation. In: Proceedings of the 31st International Conference on Computational Linguistics. Abu Dhabi, UAE. 2025:2744–57.
- 35. Бизақов С, Болғанбаев Ә, Дәулетқұлов Ш. Dictionary of synonyms of the Kazakh language [Қазақ тілінің синонимдер сөздігі]. Arys. 2005.
- 36. Abdukirim B. Chinese-Kazakh synonyms dictionary. Xinjiang Juvenile Press. 2012.
- 37. Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, et al. Attention is all you need. Advances in neural information processing systems. 2017;30.