Abstract
A software requirement document (RD) typically contains tens of thousands of individual requirements, and ensuring consistency among these requirements is a critical prerequisite for the success of software engineering projects. Automated detection methods can significantly enhance efficiency and reduce costs; however, existing approaches still face several challenges, including low detection accuracy on imbalanced data, limited semantic extraction due to the use of a single encoder, and poor performance in cross-domain transfer learning. To address these issues, this paper proposes a Transferable Software Requirement Conflicts Detection Framework based on SBERT and SimCSE, termed TSRCDF-SS. First, the framework employs two independent encoders, Sentence-BERT (SBERT) and Simple Contrastive Sentence Embedding (SimCSE), to generate sentence embeddings for requirement pairs, followed by a six-element concatenation strategy. Furthermore, the classifier is enhanced by incorporating a two-layer fully connected network, alongside a hybrid loss function optimization strategy for the feedforward neural network (FFNN) that integrates a variant of Focal Loss, domain-specific constraints, and a confidence-based penalty term. Finally, the framework synergistically integrates sequential and cross-domain transfer learning. Experimental results demonstrate that, compared with other advanced classical methods, our framework achieves an improvement ranging from 4.9% to 12.1% in macro-F1 and weighted-F1 under non-cross-domain conditions, and an average enhancement of 6% in macro-F1 under optimal cross-domain scenarios.
Citation: Wang Y, Jiang T, Bai J, Zou Z, Xue T, Zhang N, et al. (2026) A transfer learning approach for automatic conflicts detection in software requirement sentence pairs based on dual encoders. PLoS One 21(3): e0344174. https://doi.org/10.1371/journal.pone.0344174
Editor: Sergio Consoli, European Commission, ITALY
Received: May 16, 2025; Accepted: February 14, 2026; Published: March 12, 2026
Copyright: © 2026 Wang et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All data and code files are available from the figshare database (https://doi.org/10.6084/m9.figshare.29589251).
Funding: This research was funded solely by the National Natural Science Foundation of China (Grant No. 61363022). The funders had no role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
1 Introduction
Requirements Engineering (RE) is a core activity in software development, serving as the foundation for communication between developers, clients, and organizations [1,2]. Its primary responsibilities include documenting, identifying, analyzing, and managing requirements [3], which are commonly presented in the form of Requirement Specification (RS) documents. Due to the widespread applicability of natural language (NL) in RE [4–6], RS documents are usually written in NL and generally adopt standardized specification expression templates. This practice has spread to development projects across many industries [6–8]. However, RS documents often suffer from issues such as requirement conflicts and redundancy, which pose significant challenges to the development and deployment of software systems. Specifically, a requirement conflict refers to a negative constraint relationship between two requirements, while requirement redundancy indicates that different formulations essentially refer to the same objective [9,10]. For example:
- R1: The software is compiled from source code using a Java compiler. R2: The program is executed by a Python interpreter on the server.
- R1 and R2 both concern the execution of code, but the implementation of R1 may hinder the realization of R2, and vice versa. Hence, a conflict exists between R1 and R2.
- R3: The software is tested for quality assurance using automated tools. R4: Quality analysts use automated testing tools for software quality assurance.
- R3 and R4 both require the use of automated tools for software testing. Implementing either requirement inherently satisfies the other, indicating a redundancy between R3 and R4.
Such defects increase project complexity and introduce obstacles to team collaboration. Therefore, effective management of requirements within the documentation and the definition of a complete, unambiguous, and conflict-free target system are of paramount importance [11,12]. The industry currently faces a dual dilemma: traditional manual inspection methods suffer from low efficiency and high error rates [9], while existing automated techniques are often constrained by rigid formatting requirements or dependencies on additional components, limiting their practical applicability. This situation underscores the urgent need for novel requirement management solutions.
Key challenges in software requirement conflict detection include:
- Imbalanced data distribution: The inherent sparsity of requirement conflicts and redundancies in real-world scenarios leads to a highly skewed class distribution, making it difficult for models to effectively learn the features of minority classes and thus compromising their generalization capability [10].
- Complex semantics and associative parsing barriers: Requirement texts often involve domain-specific terminology and complex semantic structures. Accurate conflict detection requires not only understanding explicit relationships between requirements but also reasoning about implicit connections based on contextual and domain knowledge [9].
- Heterogeneity in expression: Stakeholders articulate requirements in diverse ways, for example using non-technical vocabulary or making linguistic errors, which further complicates conflict detection [13].
To address these challenges, this paper proposes an automated detection framework based on a dual-encoder architecture combined with transfer learning strategies. First, the solution builds on the Transformer-based pre-trained language models SBERT and SimCSE, using two independent encoders to vectorize and concatenate requirement sentence pairs. Additionally, the classifier is enhanced using a two-layer fully connected network and optimized via a hybrid loss function that integrates a variant of Focal Loss, domain-specific constraints, and a confidence penalty term. Finally, sequential and cross-domain transfer learning are synergistically integrated into the framework.
The main contributions of this paper are as follows:
- This paper proposes a dual-encoder framework based on SBERT [14] and SimCSE [15], which utilizes two independent encoders to generate vector representations of requirement sentence pairs. A six-element concatenation strategy is employed to construct a feature representation system with multi-level semantic understanding capabilities, thereby enhancing the model’s sensitivity to semantic associations and its expressiveness in capturing conflicting relations.
- We enhance the FFNN classifier by adopting a two-layer fully connected architecture. This design allows for progressive feature abstraction while maintaining moderate model complexity, achieving a balance between performance and efficiency in mid-scale tasks and facilitating richer semantic feature extraction.
- We introduce a hybrid loss optimization strategy tailored for FFNN. By integrating a variant of Focal Loss, domain-specific constraints, and a confidence-based penalty term, the hybrid loss function dynamically adjusts the focusing parameters. This approach effectively addresses the limitations of traditional cross-entropy loss in handling hard samples, classification stability, and overfitting.
- We construct a training mechanism that combines sequential and cross-domain transfer, taking into account the transfer efficiency of pre-training knowledge and the adaptability of target tasks, and improving the generalization performance of the model in heterogeneous domains and task change scenarios.
The structure of this paper is organized as follows: Section 2 reviews the related work. Section 3 presents the methodology adopted in this paper. Section 4 describes both the in-domain and cross-domain transfer experiments along with the corresponding results. Section 5 discusses potential threats to validity. Finally, Section 6 concludes the paper.
2 Related works
Since the 1980s, numerous researchers have explored innovative methods and technologies to tackle the complexity and variability of NL in RE. For instance, Abbott [16] extracted features from textual requirements based on syntactic patterns, while Aguilera and Berry [17] and Rolland and Proix [18] processed textual relationships in RS documents by identifying words, phrases, and semantic structures within sentences. Since the late 20th century, NLP technologies have been increasingly adopted in the RE domain [4], with researchers continually introducing novel approaches for a variety of RE tasks.
2.1 Conflict detection
Zhao et al. [4] conducted a survey of over 400 related studies and found that the majority of emerging NLP technologies and tools lack practical applicability. The research revealed a significant gap between recent advancements in NLP for RE (NLP4RE) and their actual deployment. Fischbach et al. [19], through the analysis of various embedded causal relationships, argued that automatically extracting causality from requirements facilitates semantic comparison. Linzen and Baroni [20] employed deep learning techniques for NLP tasks, building a causal-relationship extraction method on a tree-recursive neural network architecture and using it as a basis for requirement detection.
Guo et al. [9] introduced FSARC, a fine-grained semantic analysis-based conflict detector. They categorized contradictions into three main types and seven subcategories, and used heuristic rules and algorithms to identify semantic elements extracted via NLP techniques. However, their work did not address how to effectively identify conflicts based on these elements. Zhao et al. [4] pointed out that most research integrating NLP and RE relies heavily on laboratory validation or example-based evaluation, with a notable absence of industrial-scale empirical studies. Tian et al. generated hard-to-detect adversarial samples through the multi-label perturbation of LESSON and the saliency guidance of EVADE, providing a reference for constructing hard examples and focusing strategies in requirement conflict detection [21,22]. Importantly, existing research has yet to establish a machine learning classification framework specifically for requirement contradictions. Current achievements are largely concentrated on foundational classification tasks such as requirement type identification (e.g., distinguishing functional and non-functional requirements, security requirement detection) [23,24]. Yenugula et al. [25] demonstrate that privacy-preserving decision tree models can efficiently handle large-scale data, illustrating methods applicable to complex classification tasks.
Gärtner et al. [26] investigated contradictions in RE environments, offering definitions, taxonomies, and identification methods, and developed a standardized semi-automated solution. Their method enables reviewers to identify contradictions related to the Law of Non-Contradiction without requiring deep familiarity with the RS document context. Subsequently, Gärtner et al. [27] further explored the detection of conditional sentences in RD, analyzing two NLP techniques for identifying conditional expressions and their role in RE. Their findings underscore the importance of conditional constructs in both requirement analysis and conflict detection. Building upon this, Gärtner et al. [28] proposed ALICE (Automatic Logic for Identifying Contradictions in Engineering), which combines formal logic with large language models (LLMs) to detect engineering contradictions in RD. Despite the system’s high theoretical and practical value, it exhibits certain limitations: LLMs may generate false positives due to misuse of antonyms, and formal logic-based methods are vulnerable to disruption from filler words or punctuation errors. Additionally, the use of passive voice may further reduce contradiction detection accuracy.
Malik et al. [29,30] introduced SR-BERT, a deep learning framework based on the Transformer architecture, designed for the automated detection of requirement conflicts and redundancies. SR-BERT efficiently models the semantic relationship between requirement sentence pairs to identify conflicts and duplicates, and improves the adaptability and performance of the model through multi-stage refinement training. However, it fails to effectively address the uneven distribution of dataset labels: when conflicting and duplicate requirement pairs are relatively scarce, the model may be affected by the imbalanced data, resulting in overfitting and reduced accuracy.
2.2 Transfer learning
Transfer learning is a pivotal branch of machine learning that has attracted considerable attention in recent years. Its core principle is to transfer knowledge from one domain to a related but different domain in order to enhance model performance on new tasks. Pan and Yang [31] provided a systematic taxonomy of transfer learning methods, classifying them into instance-based, feature-based, parameter-based, and relation-based frameworks, thereby laying a theoretical foundation for subsequent work. In the NLP domain, transfer learning has markedly enhanced generalization and performance through large-scale pre-training and cross-domain knowledge transfer, overcoming challenges posed by data scarcity, domain discrepancies, and high computational costs. Numerous methods leveraging pre-trained language models have surpassed previous state-of-the-art benchmarks [32,33]. The success of AlexNet in the 2012 ImageNet competition not only revolutionized computer vision but also catalyzed the widespread adoption of transfer learning. Subsequent pre-trained architectures, such as VGG, ResNet, and Inception, trained on massive datasets, have been fine-tuned for downstream tasks, yielding substantial gains even in low-resource settings.
Yosinski et al. [34] empirically demonstrated that lower-layer representations exhibit higher transferability across tasks, whereas higher-layer features tend to be more task-specific. Pan and Yang's review [31] further clarified the definitions of domain and task in transfer learning, and outlined core concepts and classification schemes, though it did not deeply explore the integration of deep learning with transfer learning nor provide extensive empirical validation on NLP tasks. In 2018, Howard and Ruder [35] introduced the ULMFiT framework, achieving efficient transfer learning in NLP by employing an LSTM-based pre-trained language model. ULMFiT follows a three-stage strategy: general-domain pre-training, task-specific fine-tuning, and classifier fine-tuning. Its core contributions lie in the discriminative fine-tuning and gradual unfreezing techniques, which balance feature updating and alleviate catastrophic forgetting. However, its fine-tuning hyperparameters lacked systematic optimization and thorough generalization analysis. Building on these advances, Malik et al. [29] designed the SR-BERT framework using a sequential transfer learning approach. Starting from a general pre-trained model, SR-BERT was first fine-tuned on a large-scale sentence-pair dataset to obtain an intermediate checkpoint, and then further fine-tuned on domain-specific software requirement pairs, enabling the model to capture specific patterns of conflicts and duplicates within the software requirements. Although this transfer learning approach demonstrates excellent performance on non-cross-domain and balanced datasets, it still requires further improvement in scenarios characterized by data scarcity or substantial domain divergence.
3 Methodology
3.1 Overall framework
In the task of requirement sentence pair detection, significant challenges persist, such as severely imbalanced label distributions within datasets and the limitations of single encoders in accurately capturing subtle semantic differences or complex logical relationships between sentences. To improve detection accuracy, this paper extends the SR-BERT model by proposing an enhanced framework named TSRCDF-SS. The overall architecture is shown in Fig 1. First, the framework constructs a dual-encoder architecture by integrating two independent encoders and fine-tunes the encoders using a sequential transfer learning approach with a hierarchical K-fold strategy on the source domain dataset. Subsequently, the fine-tuned encoders are used to encode the requirement sentence pairs, and a six-element concatenation strategy across model layers is employed to fuse the two semantic representations; this design enables a complementary integration of semantic knowledge by fully leveraging the distinct encoding outcomes. Second, we obtain a classifier by training and optimizing an FFNN that adopts a nonlinear transformation architecture composed of two fully connected layers and incorporates a hybrid loss function consisting of a Focal Loss variant, domain-specific constraint terms, and a confidence penalty term. The two fully connected layers facilitate deeper semantic feature extraction, while the hybrid loss enables more comprehensive optimization of the class probability distribution. Finally, during the model transfer phase, we employ a cross-domain transfer learning strategy, transferring the dual encoders and classifier trained in the source domain to the target domain for conflict prediction on the target data. This combined strategy equips the model with enhanced adaptability and fault tolerance in complex scenarios involving both task evolution and domain shifts.
The TSRCDF-SS structure diagram shows the dual encoders and the improved classifier, which combine sequential transfer and cross-domain transfer.
3.2 Building dual encoders
Sentence embedding techniques encode semantic information into fixed-dimensional vectors to enhance the efficiency of NLP tasks. In the context of conflict detection, they offer advantages such as reduced computational cost, support for cross-domain transfer, and facilitation of model collaboration. However, mainstream encoders exhibit limited capacity to represent complex logical phenomena—such as semantic inversion—and the compression inherent in fixed-size vector spaces can result in the loss of fine-grained conflict features in long or syntactically complex sentences. Moreover, pre-training paradigms based on general corpora (e.g., BERT, SBERT) encounter domain-adaptation bottlenecks. Consequently, this approach still risks accuracy degradation when attempting to balance semantic representation strength against computational efficiency.
Compared to GPT-2 [36], FastText [37], and the Universal Sentence Encoder (USE) [38], SBERT and SimCSE offer significant benefits for sentence embedding tasks, including more precise semantic representations, higher computational efficiency, and broader applicability. GPT-2 is primarily optimized for text generation, whereas traditional embedding models such as FastText and USE neglect contextual semantics and thus struggle to capture logical contradictions in requirement pairs. Therefore, SBERT and SimCSE are better suited to scenarios with stringent demands on embedding quality.
As shown in Fig 2, the t-SNE dimensionality reduction reveals that the embeddings generated by SBERT form relatively tight local clusters in the projected space, indicating that its dual-tower siamese architecture effectively captures fine-grained semantic features; however, the boundaries between categories remain blurred, leading to some degree of overlap. FastText, which relies on statistical information at the subword level, tends to cluster morphologically similar but semantically unrelated words together, resulting in disordered clustering structures. The embeddings generated by GPT-2 exhibit strong domain-specific characteristics, which constrain their cross-domain generalization capabilities. SimCSE demonstrates an initial tendency toward cluster formation, with certain categories showing promising intra-cluster cohesion, but the boundaries between data points are not clear. The embeddings from USE display a mild layered structure, but the inter-class boundaries remain indistinct.
Comparison of sentence embedding performance of different encoders and combined encoders using t-SNE dimensionality reduction projection. This visualization highlights the differences in encoding capabilities of different encoders. The dataset used is TRAINNLI.
In contrast, the SBERT + SimCSE fusion model demonstrates the most favorable visualization performance. The embeddings generated by this model form distinct multi-cluster structures in the projected space, with reduced noise and more concentrated data coverage. Clear boundaries are observed between different categories, with most intra-class samples successfully clustered and inter-class samples well separated. These results indicate that the fusion strategy effectively integrates SBERT‘s robust semantic modeling capabilities with the contrastive learning strengths of SimCSE, resulting in more discriminative and structurally coherent semantic representations. Overall, the fusion approach exhibits superior quality in textual embeddings, highlighting its strong potential for application in requirement semantics modeling.
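A comparison of the kind shown in Fig 2 can be sketched in a few lines with scikit-learn. In this illustrative snippet, synthetic Gaussian blobs stand in for the encoder outputs, since the actual SBERT/SimCSE embeddings would require loading the pre-trained models; the cluster centers, dimensions, and labels are assumptions made for the sketch.

```python
import numpy as np
from sklearn.manifold import TSNE

# Three synthetic clusters stand in for embeddings of Conflict, Duplicate,
# and Neutral requirement pairs (real inputs would be SBERT/SimCSE vectors).
rng = np.random.default_rng(0)
emb = np.vstack([rng.normal(loc=c, scale=1.0, size=(30, 64)) for c in (-4, 0, 4)])
labels = np.repeat(["Conflict", "Duplicate", "Neutral"], 30)

# Project to 2-D for visual inspection of cluster cohesion and separation.
proj = TSNE(n_components=2, perplexity=10, random_state=0).fit_transform(emb)
```

The resulting 2-D points can then be scattered per label to inspect intra-class cohesion and inter-class separation, as discussed above.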
Therefore, this paper proposes a fusion sentence embedding framework based on a dual-encoder architecture, as illustrated in Fig 3. Specifically, during the encoding phase, the dual-channel structure enables parallel processing with isolated parameters to independently extract heterogeneous knowledge representations. SBERT, a discriminative model optimized from the BERT architecture, leverages pre-trained sentence-level relational knowledge to generate high-resolution semantic vectors. Its static encoding capabilities are well-suited for capturing fine-grained semantic features in software requirement texts, and it achieves efficient alignment of the semantic space through the contrastive learning strategy of the siamese network. Simultaneously, SimCSE, representing the contrastive learning paradigm, employs an unsupervised dropout-based noise perturbation strategy to align positive samples and separate negative ones in latent space. This allows the generated embeddings to possess stronger resistance to contextual noise and improved inter-class discrimination. In the feature fusion stage, a cross-model hierarchical six-element concatenation strategy is employed to achieve collaborative optimization of the heterogeneous representations. Improving upon the three-element concatenation approach of the single-encoder SR-BERT framework proposed in reference [29], we construct a six-element concatenated vector by jointly exploiting SBERT and SimCSE embeddings. For each requirement sentence pair (R1, R2), the two sentences are independently encoded by each encoder, ensuring that the generated embeddings have consistent scaling. For SBERT, three interaction features are formed by concatenating the embedding of the first sentence, the embedding of the second sentence, and their element-wise difference.
The same operation is applied to SimCSE embeddings, and the two sets of three-element vectors are further concatenated to produce a six-element concatenated vector, as defined in Eq (1). This construction creates a feature space that is orthogonally complementary. The concatenated vectors are fed into a multilayer perceptron classifier, where nonlinear transformations capture higher-order semantic associations and produce a conflict probability distribution. Throughout this process, the six-element concatenation not only enhances sensitivity to subtle semantic shifts but also facilitates discrimination between neutral and conflicting classes. The SBERT vector exhibits a Spearman correlation of 82% in the semantic similarity calculation task (STS-B benchmark) [14], while the unsupervised SimCSE improves the previous best average Spearman correlation by 4.2%, and the supervised SimCSE improves the best average Spearman correlation by 1.24% [15]. This shows that SimCSE can effectively improve the quality of clustering results when dealing with a small amount of annotated data.
Use the SBERT model and SimCSE model to encode individual requirements to obtain their respective embeddings. Then these embeddings are fused to finally obtain the six-element concatenated embedding result.
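The six-element concatenation described above can be sketched as follows. The function name is our own; the layout (per-encoder triple of first embedding, second embedding, and element-wise difference, then the two triples joined) follows the description in the text, and the embeddings are represented by plain arrays.

```python
import numpy as np

def six_element_concat(sbert_r1, sbert_r2, simcse_r1, simcse_r2):
    """Build the six-element concatenated feature vector of Eq (1):
    for each encoder, [embedding of R1, embedding of R2, element-wise
    difference], with the two three-element blocks joined end to end."""
    sbert_block = np.concatenate([sbert_r1, sbert_r2, sbert_r1 - sbert_r2])
    simcse_block = np.concatenate([simcse_r1, simcse_r2, simcse_r1 - simcse_r2])
    return np.concatenate([sbert_block, simcse_block])
```

For two d-dimensional encoders this yields a 6d-dimensional feature vector, which is then fed into the classifier.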
3.3 Improving the classifier and optimizing the loss function
This paper improves on the SR-BERT framework classifier in reference [29], and introduces a sentence pair classification model based on an FFNN with two fully connected layers. While a single fully connected layer offers structural simplicity and fewer parameters, its nonlinear expressive capacity is relatively limited, making it inadequate to fully capture complex semantic relationships in sentence pairs. Increasing the number of layers theoretically enhances the model’s fitting ability, but excessive depth often leads to gradient vanishing or explosion, parameter explosion, and higher computational complexity, thereby amplifying training difficulty and overfitting risks. To balance these trade-offs, we adopt a classification model with two fully connected layers. The model uses encoder-generated embedding vectors as initial representations and performs nonlinear transformations through the two layers: the first layer consists of 1,500 hidden units with ReLU activation, effectively expanding the feature space and enhancing representational richness. The second layer contains 1,000 hidden units, also with ReLU activation, to progressively extract higher-order semantic features. Notably, we introduce asymmetric dropout rates of 0.2 and 0.3 between the two layers. This design, validated through grid search experiments, demonstrates that compared to symmetric dropout strategies, asymmetric dropout improves the F1 score on the validation set and alleviates covariate shift within deeper layers. The final output layer uses a Softmax activation function.
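The forward pass of this classifier can be sketched in plain NumPy. The input dimension, weight initialization, and the inverted-dropout formulation are illustrative assumptions; a real implementation would use a deep learning framework and train the weights by backpropagation.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

class PairClassifier:
    """Forward pass of the two-layer FFNN classifier of Section 3.3:
    1,500 ReLU units (dropout 0.2), then 1,000 ReLU units (dropout 0.3),
    then a Softmax output over Conflict / Duplicate / Neutral."""

    def __init__(self, in_dim, n_classes=3, seed=0):
        self.rng = np.random.default_rng(seed)
        self.W1 = self.rng.normal(0.0, 0.02, (in_dim, 1500))
        self.b1 = np.zeros(1500)
        self.W2 = self.rng.normal(0.0, 0.02, (1500, 1000))
        self.b2 = np.zeros(1000)
        self.W3 = self.rng.normal(0.0, 0.02, (1000, n_classes))
        self.b3 = np.zeros(n_classes)

    def forward(self, x, train=False):
        h1 = relu(x @ self.W1 + self.b1)
        if train:  # inverted dropout, rate 0.2 after the first layer
            h1 *= self.rng.binomial(1, 0.8, h1.shape) / 0.8
        h2 = relu(h1 @ self.W2 + self.b2)
        if train:  # asymmetric dropout, rate 0.3 after the second layer
            h2 *= self.rng.binomial(1, 0.7, h2.shape) / 0.7
        return softmax(h2 @ self.W3 + self.b3)
```

At inference time dropout is disabled, and the Softmax output is read as a class probability distribution over the three labels.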
Regarding the loss function, although Binary Cross-Entropy (BCE) and Categorical Cross-Entropy (CCE) are commonly employed in classification tasks, both exhibit shortcomings under challenging conditions. BCE has obvious limitations in complex scenarios such as class imbalance, label noise, gradient saturation, and multi-label classification [39]. Meanwhile, CCE yields weak gradient signals for easily classified samples, limiting the model's ability to learn from difficult instances. To address these issues, this paper proposes a hybrid loss function optimization strategy for feedforward neural networks, which integrates a variant of Focal Loss, domain-specific constraints, and a confidence penalty term, to enhance the model's ability to handle hard samples and improve the smoothness of the output predictions.
Focal Loss [40] introduces a modulating factor (1 − p_t)^γ, where p_t denotes the predicted probability of the true class and γ is the focusing parameter. This mechanism reduces the contribution of easily classified examples and increases the focus on hard-to-classify instances, thereby improving the model's learning effectiveness for minority classes. The variant of Focal Loss is defined as shown in Eq (2), where C denotes the number of classes, y_c represents the one-hot encoded ground-truth label, p_c is the predicted probability obtained via softmax, α_c indicates the class weight, and γ is the focusing parameter controlling the weight of easy examples. The dynamic adjustment of γ is calculated using Eq (3), where γ_0 is the initial value, η is a modulation factor, and acc_val denotes the validation accuracy, which can be treated as an externally provided scalar.
During training, the model may become overly confident in certain predictions, resulting in low-entropy distributions that typically lead to overfitting. To mitigate this, we introduce a confidence penalty term [41], computed as shown in Eq (4). This term penalizes overconfident predictions by incorporating the negative entropy of the output distribution, thereby enhancing the model's generalization on unseen data.
In specific application scenarios, prior domain knowledge may suggest that the matching degree of sentence pairs should follow certain distributional characteristics. To this end, we incorporate a domain-specific constraint term [42], as expressed in Eq (5). For instance, the Kullback-Leibler (KL) divergence is used to measure the discrepancy between the predicted class distribution and a predefined target distribution, guiding the model to produce outputs aligned with domain-specific priors and enhancing its performance on task-specific objectives. Here, the predicted class distribution represents the average predicted probability of each category in the current batch.
By combining the three components above, we define the Adaptive Focal Confidence Loss (AFC Loss) as shown in Eq (6), where α, β, λ are weighting coefficients. The AFC Loss comprehensively optimizes the sentence pair classification task and contributes to improved overall model performance.
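The three terms above can be combined in a short sketch of the AFC Loss. Since Eqs (2)–(6) are not reproduced here, the exact forms below are inferred from the surrounding descriptions and the cited Focal Loss and confidence-penalty formulations; the coefficient values are illustrative only, and the dynamic γ schedule of Eq (3) is omitted (γ is passed in as a scalar).

```python
import numpy as np

def afc_loss(p, y_onehot, class_w, target_dist,
             gamma=2.0, alpha=1.0, beta=0.1, lam=0.1):
    """Sketch of the Adaptive Focal Confidence (AFC) Loss: a Focal Loss
    variant plus a negative-entropy confidence penalty and a KL-based
    domain constraint, weighted by alpha, beta, and lam."""
    eps = 1e-12
    p = np.clip(p, eps, 1.0)
    # Focal Loss variant (Eq 2): class-weighted, down-weights easy samples.
    focal = -(class_w * y_onehot * (1 - p) ** gamma * np.log(p)).sum(1).mean()
    # Confidence penalty (Eq 4): negative entropy discourages overconfidence.
    conf = (p * np.log(p)).sum(1).mean()
    # Domain constraint (Eq 5): KL between the batch-average prediction
    # and a predefined target distribution.
    p_bar = p.mean(0)
    kl = (p_bar * np.log(p_bar / np.clip(target_dist, eps, 1.0))).sum()
    return alpha * focal + beta * conf + lam * kl
```

With a small β, confidently wrong predictions incur a much larger loss than confident correct ones, which is the behavior the focal term is designed to produce.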
3.4 Fusing sequential transfer and cross-domain transfer
Sequential transfer learning primarily focuses on the gradual accumulation and transfer of knowledge across a sequence of tasks, while cross-domain transfer learning aims to bridge the gap in data distribution and feature representation between different domains. In practical problems where tasks exhibit both temporal dependencies (i.e., requiring sequential learning) and domain heterogeneity (e.g., varying data acquisition conditions, task contexts, or data modalities), it is essential to design models that simultaneously address both temporal progression and domain disparity.
In this paper, we integrate sequential transfer learning with cross-domain transfer learning. On one hand, sequential transfer learning enables the model to progressively accumulate and transfer knowledge from previous tasks, allowing for rapid adaptation and efficient updating when facing a series of new tasks. This approach also mitigates the issue of catastrophic forgetting, thereby maintaining continuity and stability in the model’s learned knowledge. On the other hand, cross-domain transfer learning allows the model to overcome discrepancies in data distributions or feature spaces between the source and target domains, facilitating the effective extraction and transfer of rich knowledge from the source domain to the target domain. By combining these two strategies, the proposed approach leverages the advantages of continuous knowledge updating offered by sequential transfer learning, while simultaneously addressing domain heterogeneity through cross-domain transfer learning, ultimately enhancing the model’s generalization capability and overall stability.
This paper extends the sequential transfer learning approach presented in reference [29] by proposing a unified process that integrates both sequential transfer learning and cross-domain transfer learning. The algorithm is shown in Fig 4. The input consists of a source domain requirement pair set and a target domain requirement pair set. First, both sets are preprocessed. Next, a pretrained encoder checkpoint is loaded. In Step 3, the training and testing sets are then divided using n-fold cross-validation. In Step 4, a portion (k/n) of the target domain requirement pairs is combined with the source domain requirement pairs to form a domain-adaptive training set. In Step 5, the encoder is fine-tuned based on the checkpoint and the constructed training set. Step 6 saves the updated encoder checkpoint. Step 7 encodes the testing set using the updated encoder. In Step 8, a classifier is trained on the encoded data, and in Step 9, the classifier checkpoint is saved. Finally, Step 10 outputs the classification results of the testing set. Steps 4–10 are repeated for n iterations to complete the cross-validation process.
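The ten-step procedure above can be condensed into a skeleton loop. The four callables stand in for encoder fine-tuning, sentence encoding, classifier training, and prediction, and their signatures are assumptions made for this sketch; here the held-in (n−1)/n portion of the target pairs joins the training set each round, corresponding to the k/n portion described in Step 4.

```python
def fused_transfer(source_pairs, target_pairs, n_folds,
                   fine_tune, encode, train_clf, predict):
    """Skeleton of the fused sequential + cross-domain transfer procedure
    (Fig 4): n-fold cross-validation over the target domain, with the
    source pairs mixed into every training fold."""
    checkpoint = "pretrained_encoder"                            # Step 2
    folds = [target_pairs[i::n_folds] for i in range(n_folds)]   # Step 3
    results = []
    for i in range(n_folds):                 # Steps 4-10, repeated n times
        test_set = folds[i]
        held_in = [p for j, f in enumerate(folds) if j != i for p in f]
        train_set = source_pairs + held_in   # Step 4: domain-adaptive set
        checkpoint = fine_tune(checkpoint, train_set)   # Steps 5-6
        enc_train = [encode(checkpoint, p) for p in train_set]
        enc_test = [encode(checkpoint, p) for p in test_set]     # Step 7
        clf = train_clf(enc_train)                               # Steps 8-9
        results.extend(predict(clf, enc_test))                   # Step 10
    return results
```

In a real run, `fine_tune` would update and save the encoder checkpoint, and `train_clf` would save the classifier checkpoint, as in Steps 6 and 9.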
4 Experiment
4.1 Datasets
This paper integrates multiple publicly available and domain-specific requirement sentence pair datasets in the field of software engineering, encompassing tasks such as general requirement inference, domain-specific requirement analysis, and requirement conflict detection. The datasets used include TRAINNLI, CDN/CN, UAV, WorldVista, PURE, and OPENCOSS, where TRAINNLI and CDN/CN are balanced datasets, and UAV, WorldVista, PURE, and OPENCOSS are imbalanced. The characteristics of each dataset are detailed below:
1. TRAINNLI.
This repository provides a specialized Natural Language Inference (NLI) dataset [43], designed to optimize language model performance on NLP problems in software development. The researchers manually examined various texts from the software development domain and established inference relationships between different propositions. The texts encompass diverse sources, including software descriptions (e.g., the Promise and Pure datasets), user manuals for various software, operating-system articles (e.g., official Windows and Mac documentation), databases (e.g., official MongoDB and Oracle documentation), cybersecurity (e.g., MITRE documentation), and AWS documentation. The dataset contains 500,000 sentence pairs, manually annotated and balanced across three labels: Entailment, Contradiction, and Neutral.
To align with our research requirements, we preprocessed the dataset by retaining only the “gold_label”, “sentence1”, and “sentence2” fields. For label consistency, the original “Entailment” and “Contradiction” labels were renamed to “Duplicate” and “Conflict”, respectively. Given the large scale of the dataset, we partitioned TRAINNLI into smaller subsets for experimental convenience. For instance, TRAINNLI(30000) denotes a randomly sampled subset of 30,000 entries; similar subsets include TRAINNLI(20000), TRAINNLI(10000)1, TRAINNLI(10000)2, and TRAINNLI(10000)3.
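A minimal sketch of this preprocessing using the standard library; the entry contents and sampling seed are hypothetical, while the field names and label mapping follow the description above:

```python
import random

# Hypothetical raw entries mirroring the retained TRAINNLI fields
raw = [
    {"gold_label": "Entailment", "sentence1": "s1", "sentence2": "s2"},
    {"gold_label": "Contradiction", "sentence1": "s3", "sentence2": "s4"},
    {"gold_label": "Neutral", "sentence1": "s5", "sentence2": "s6"},
] * 4

RENAME = {"Entailment": "Duplicate", "Contradiction": "Conflict"}

def preprocess(entries):
    """Keep the three retained fields and map labels to the paper's scheme."""
    return [
        {"label": RENAME.get(e["gold_label"], e["gold_label"]),
         "sentence1": e["sentence1"], "sentence2": e["sentence2"]}
        for e in entries
    ]

def subset(entries, size, seed=0):
    """Random subset, e.g. TRAINNLI(30000) drawn from the full 500k pairs."""
    return random.Random(seed).sample(entries, size)

data = preprocess(raw)
small = subset(data, 6)
assert {d["label"] for d in data} == {"Duplicate", "Conflict", "Neutral"}
assert len(small) == 6
```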
2. CDN/CN.
The CDN/CN dataset was curated by Malik et al. [29], sourced from IBM-DOORS. The CDN dataset includes three categories: Conflict, Duplicate, and Neutral, with Conflict and Duplicate pairs dominating and Neutral pairs limited to half of the total. The CN dataset is a simplified version that excludes Duplicate pairs.
Additional domain-specific datasets used in this paper include: the UAV Corpus [44], developed by the University of Notre Dame, which covers requirements related to flight control, sensor integration, and safety protocols for unmanned aerial vehicles; the WorldVista EHR Corpus, derived from RDs of medical systems and featuring requirements on patient data management and clinical workflow standardization; the PURE Benchmark Corpus [45], which aggregates requirement sentences from 79 public RDs; and the OPENCOSS Corpus, sourced from the European Open Platform for Safety Certification project, which focuses on embedded-system safety certification and is characterized by significant class imbalance. Across these four cross-domain requirement datasets, the overall ratio of Conflict to Neutral samples is approximately 1:367; OPENCOSS exhibits the most severe imbalance, with a Conflict-to-Neutral ratio of 1:678. All four datasets are used in the versions compiled by the Malik research group [29].
For balanced datasets, we adopt 5-fold cross-validation to ensure reliable performance estimation. For imbalanced datasets, we use 3-fold cross-validation to mitigate the impact of skewed label distribution during training and evaluation.
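With so few Conflict pairs, a stratified split (an assumption here; the paper states only the fold counts) keeps the rare label represented in every fold, which is what makes 3-fold cross-validation workable on the imbalanced corpora:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Toy 1:9 imbalance: 3 Conflict (1) pairs among 30 requirement pairs
y = np.array([1] * 3 + [0] * 27)
X = np.zeros((len(y), 1))  # features are irrelevant to the split itself

# Stratification keeps the rare label in every fold
skf = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
for _, test_idx in skf.split(X, y):
    assert int(y[test_idx].sum()) == 1  # exactly one Conflict pair per fold
```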
4.2 Experimental implementation
The experimental computing platform was built on an Intel(R) Core(TM) i7-14700KF CPU and an NVIDIA GeForce RTX 4070 Ti SUPER GPU with 16 GB of VRAM. The development environment was centered on Ubuntu 20.04.1 and Python 3.9.21, integrating Pandas for data loading and preprocessing, NumPy for numerical computation, and Scikit-learn 1.6.0 for model evaluation and calibration. Deep learning models were constructed and optimized with PyTorch 1.13.0 + cu117, TensorFlow 2.10.0, and Keras 2.10.0. The pre-trained language models Sentence-BERT and SimCSE were loaded and fine-tuned via the Hugging Face Transformers library. The core parameter settings are detailed in Table 1.
4.3 Experimental results and analysis
4.3.1 Dual encoders comparison experiment.
To assess the effectiveness of adopting a dual-encoder strategy for sentence embedding, we compared multiple encoder combinations against the SBERT baseline on the TRAINNLI(30000) dataset. As shown in Fig 5, the SBERT + SimCSE configuration achieves the highest performance across all evaluation metrics. Specifically, both the macro-F1 and weighted-F1 scores reach 89.6%, an absolute improvement of 7.0% over the SR-BERT baseline. This improvement is consistently reflected in precision and recall, indicating a stable performance gain rather than a metric-specific trade-off. Other encoder combinations show either marginal improvements or inferior performance, demonstrating clear differences among alternative dual-encoder designs.
Precision, recall and F1 of different encoder combination experiments on TRAINNLI (30000) dataset.
In addition, the close agreement between macro-averaged and weighted results across all models indicates that the observed performance trends are stable across different evaluation perspectives, rather than being dependent on a specific averaging strategy. These results empirically demonstrate that introducing a dual-encoder architecture, when using appropriately matched encoders, can consistently enhance sentence embedding performance compared to a single-encoder baseline in the considered setting.
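The six-element concatenation that feeds the classifier can be sketched as follows. The exact composition shown (each encoder's pair embeddings plus their element-wise absolute difference) is an illustrative assumption, not the paper's verified layout:

```python
import numpy as np

def concat_features(u_sb, v_sb, u_sc, v_sc):
    """Six-element feature vector for a requirement pair (u, v):
    SBERT embeddings, their |difference|, SimCSE embeddings, and
    their |difference|. Composition is assumed for illustration."""
    return np.concatenate(
        [u_sb, v_sb, np.abs(u_sb - v_sb),
         u_sc, v_sc, np.abs(u_sc - v_sc)]
    )

dim = 768  # typical SBERT/SimCSE embedding size
rng = np.random.default_rng(0)
u_sb, v_sb, u_sc, v_sc = (rng.standard_normal(dim) for _ in range(4))
features = concat_features(u_sb, v_sb, u_sc, v_sc)
assert features.shape == (6 * dim,)
```

In practice the four vectors would come from `SentenceTransformer.encode` calls on the two fine-tuned checkpoints rather than random draws.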
4.3.2 Classifier comparison experiment.
In the original SR-BERT model, the FFNN classifier had only a single fully connected layer. While a single layer has a simple structure and few parameters, its capacity to express non-linearity is limited, making it difficult to fully capture the complex semantic relationships in sentence pairs. To investigate the effect of classifier depth, we evaluated FFNNs with one to four fully connected layers while keeping all other settings unchanged. As shown in Fig 6, increasing the depth from one to two layers leads to a consistent but modest improvement in both macro-averaged and weighted-averaged F1 scores (approximately +0.4%). However, further increasing the number of layers yields no additional gains and even results in slight performance degradation.
Results of comparative experiments on different numbers of FFNN layers on the TRAINNLI (30000) dataset.
This behavior can be attributed to the role of the FFNN in our framework. Given high-quality sentence embeddings from pretrained dual encoders, a single-layer FFNN mainly performs linear reweighting and is insufficient to capture nonlinear interactions in implicit requirement conflicts. Adding a second layer introduces a minimal yet effective nonlinear transformation that improves feature abstraction with limited additional complexity. In contrast, deeper FFNNs bring more parameters without meaningful representational gains under the given data scale, resulting in overfitting and higher computational cost. Therefore, the two-layer FFNN represents a balanced trade-off between performance, efficiency, and model stability.
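A framework-agnostic sketch of the two-layer classifier's forward pass (the toy widths are assumptions; the paper's implementation uses PyTorch):

```python
import numpy as np

def two_layer_ffnn(x, w1, b1, w2, b2):
    """Minimal forward pass: one hidden layer with ReLU, then a softmax
    over the three relation classes (Conflict / Duplicate / Neutral)."""
    h = np.maximum(0.0, x @ w1 + b1)   # the second layer's nonlinear abstraction
    logits = h @ w2 + b2
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(1)
x = rng.standard_normal((4, 64))       # 4 pair embeddings, toy width 64
w1, b1 = rng.standard_normal((64, 32)) * 0.1, np.zeros(32)
w2, b2 = rng.standard_normal((32, 3)) * 0.1, np.zeros(3)
probs = two_layer_ffnn(x, w1, b1, w2, b2)
assert probs.shape == (4, 3)
assert np.allclose(probs.sum(axis=1), 1.0)
```

A single-layer variant would map the embedding directly to logits, which is exactly the linear reweighting the paragraph above describes.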
To further substantiate this observation from a representation perspective, Fig 7 presents t-SNE visualizations of feature embeddings at different stages of the FFNN classifier on the CDN dataset. The original input embeddings show substantial inter-class overlap, indicating limited class-discriminative structure. After one fully connected layer, the embeddings become more organized, but noticeable overlap between classes still exists. In contrast, the two-layer FFNN produces embeddings with improved intra-class compactness and clearer inter-class separation. This progressive refinement of the embedding space provides intuitive evidence that the second hidden layer effectively enhances feature abstraction, which explains the consistent performance improvement observed when increasing the classifier depth from one to two layers.
(a) input embeddings before the fully connected layers, (b) embeddings after one fully connected layer, and (c) embeddings after two fully connected layers.
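The kind of projection shown in Fig 7 can be reproduced on synthetic stand-in embeddings with scikit-learn's t-SNE; the cluster centers and perplexity are illustrative choices for a tiny sample:

```python
import numpy as np
from sklearn.manifold import TSNE

# Synthetic stand-in for classifier-stage embeddings: three loose clusters
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(c, 1.0, size=(20, 16)) for c in (0.0, 3.0, 6.0)])

# Project to 2-D for visual inspection (perplexity kept small for n = 60)
emb = TSNE(n_components=2, perplexity=10, init="pca",
           random_state=0).fit_transform(X)
assert emb.shape == (60, 2)
```

In the actual analysis, `X` would be the activations captured before, after one, and after two fully connected layers, plotted as in panels (a)–(c).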
Furthermore, we conducted a comparative experiment on the loss function using the two-layer fully connected model, comparing Binary Cross-Entropy and AFC Loss under identical conditions on the TRAINNLI(30000) dataset. The experimental results are shown in Fig 8. As the bar chart illustrates, AFC Loss achieves higher values across all three major metrics (Precision, Recall, and F1-score), with a particularly notable improvement in the F1-score. This validates its applicability and advantage in multi-class mutually exclusive classification tasks and aligns well with the theoretical analysis.
The results of the experiment using a two-layer FFNN and different loss functions on the TRAINNLI (30000) dataset.
The experimental results confirm that AFC Loss outperforms Binary Cross-Entropy.
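The focal component of AFC Loss can be sketched as below; the domain-constraint and confidence-penalty terms of the full hybrid loss are omitted, so this shows only the class-imbalance mechanism:

```python
import numpy as np

def focal_loss(probs, targets, gamma=2.0, alpha=None):
    """Multi-class focal loss: down-weights well-classified examples by
    (1 - p_t)^gamma so hard or rare (e.g., Conflict) pairs dominate the
    gradient. gamma = 0 recovers plain cross-entropy."""
    p_t = probs[np.arange(len(targets)), targets]
    w = (1.0 - p_t) ** gamma
    if alpha is not None:  # optional per-class weighting
        w = w * np.asarray(alpha)[targets]
    return float(np.mean(-w * np.log(p_t + 1e-12)))

probs = np.array([[0.9, 0.05, 0.05],   # confident, correct
                  [0.4, 0.35, 0.25]])  # uncertain, correct
targets = np.array([0, 0])
# The uncertain example contributes far more loss than the confident one
assert focal_loss(probs[:1], targets[:1]) < focal_loss(probs[1:], targets[1:])
```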
4.3.3 Dual encoders and classifier experiment.
In the previous experiments, we separately optimized and compared the encoder and classifier. The goal of this section is to combine both optimized components and evaluate the overall performance improvement on the requirement conflict detection task. Table 2 presents the evaluation results from experiments conducted on the TRAINNLI(30000) and CDN/CN datasets. We ran the jointly optimized model on these datasets and compared its performance with the independently optimized encoder or classifier models. Overall, the combination of the improved encoder and classifier (Improved Encoder + Improved Classifier) achieved the best results across all datasets. Specifically, on the TRAINNLI dataset, both macro-F1 and weighted-F1 reached 91.2%, an improvement of approximately 8.6% over the baseline SR-BERT (82.6%). On the CDN dataset, the F1 scores improved to 94.3% and 96.1%, respectively. On the CN dataset, the improvement was more limited, as the baseline model was already nearing saturation. Furthermore, optimizing the encoder alone (F1 improved from 82.6% to 89.6%) was significantly more effective than optimizing the classifier alone (F1 increased to 83.2%), indicating that high-quality sentence embeddings are crucial for requirement conflict detection. Based on the above analysis, this experiment confirms that encoder optimization contributes the most to performance improvement, while classifier optimization further enhances the model’s discriminative capability.
Table 3 compares our model with other models on the TRAINNLI(30000) dataset in a non-cross-domain scenario. The combination of the improved encoder and classifier performed best, with an F1 score of 91.2%, about a 5% improvement over BERT-MNLI’s 86.3%, demonstrating strong overall discriminative power and balanced performance across classes. Compared to traditional shallow models such as TF-IDF + SVM and Okapi BM25 + MLP, our framework exhibits clear advantages in semantic modeling and feature abstraction, indicating that sentence-level pre-trained models are better suited to the requirement conflict detection task. These experimental results validate the effectiveness of the proposed model design for requirement conflict detection.
4.3.4 Cross-domain transfer experiment.
This section of the experiment utilizes TSRCDF-SS for cross-domain transfer learning on requirement sentence pairs. By leveraging the correlation between the source and target domains, the model’s ability to generalize across data from different domains is enhanced, thereby further optimizing the overall performance of requirement conflict detection. Table 4 presents the evaluation results for cross-domain training on different dataset combinations.
As shown in Table 4, for binary classification tasks, the weighted precision, recall, and F1 scores consistently approach 99.8%, indicating that the model achieves highly stable performance on the majority class across different cross-domain configurations. In contrast, the macro-averaged recall remains substantially lower, revealing a pronounced discrepancy between macro and weighted evaluation metrics. This phenomenon suggests that overall performance alone is insufficient to fully characterize model behavior in cross-domain conflict detection. Compared to the cross-domain experiments conducted by Malik et al., all evaluation metrics show varying degrees of improvement; when the target-domain dataset is OPENCOSS, the macro-F1 score increases by 6% on average. For the three-class classification tasks on TRAINNLI and CDN, the F1 scores demonstrate strong performance, with a balanced distribution across both weighted and macro averages.
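This macro/weighted divergence can be reproduced on a toy imbalanced example: a single missed Conflict pair barely moves the weighted score but sharply lowers the macro average:

```python
from sklearn.metrics import f1_score

# Imbalanced toy setting: 98 Neutral (0) pairs and 2 Conflict (1) pairs;
# the model recovers one conflict and misses the other.
y_true = [0] * 98 + [1, 1]
y_pred = [0] * 98 + [1, 0]

macro = f1_score(y_true, y_pred, average="macro")        # ≈ 0.83
weighted = f1_score(y_true, y_pred, average="weighted")  # ≈ 0.99
assert weighted - macro > 0.1  # one missed minority pair opens a large gap
```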
A closer inspection indicates that the degradation in macro averaged recall is mainly driven by the Conflict class rather than by a uniform decline across all classes. In cross-domain transfer settings, the model tends to adopt a conservative strategy for conflict prediction, effectively suppressing false positives while producing a relatively high number of false negatives. As a result, the Conflict class exhibits high precision but noticeably lower recall. This behavior reflects the intrinsic difficulty of generalizing conflict semantics across domains, where the linguistic realizations and contextual dependencies of requirement conflicts vary substantially, and the limited number of conflict samples in the source domain further restricts the effectiveness of transfer learning. In this context, the hybrid loss function in TSRCDF-SS primarily plays a stabilizing role by encouraging clearer decision boundaries and mitigating overfitting to domain-specific patterns, which helps maintain strong weighted performance and prevents excessive false-positive predictions. However, it is not explicitly designed to counter severe cross-domain semantic shifts or to rebalance minority-class recall, and therefore cannot fully recover Conflict-class recall in strict cross-domain transfer scenarios.
To further substantiate this observation, Fig 9 reports representative confusion matrices for two target domains. Fig 9(a) shows that for WorldVista, which exhibits moderate class imbalance, the model correctly identifies a subset of Conflict instances but still produces a non-negligible number of false negatives, indicating that conflict recognition remains challenging even under relatively mild domain shift. Fig 9(b) shows that for the OPENCOSS dataset, due to extreme class imbalance and significant semantic differences, the number of false-negative conflict predictions increases substantially, making the identification task even more difficult. In both cases, false-positive Conflict predictions are largely suppressed, which explains the consistently high weighted metrics observed in Table 4, while the macro-averaged recall remains low. These results indicate that, although the proposed framework maintains stable overall performance, conflict identification remains the primary bottleneck in cross-domain transfer learning.
These two examples correspond to (a) the WorldVista case, where the target domain data distribution is moderately imbalanced, and (b) the OPENCOSS case, where it is extremely imbalanced. These confusion matrices are reported to illustrate typical patterns under cross-domain transfer rather than to exhaustively characterize all source–target configurations.
5 Discussion
5.1 Threats to validity
This paper adopts a dual-encoder architecture and transfer learning strategies to enhance the accuracy and generalization capability of requirement conflict detection tasks. However, there are still potential threats to the effectiveness of the proposed approach:
- Imbalanced Dataset Impact: Some datasets (e.g., UAV, WorldVista, PURE, OPENCOSS) suffer from class imbalance, resulting in lower recall for minority classes, which in turn affects macro metrics. Although weighted loss functions and transfer learning strategies have been employed to mitigate this issue, detection of minority-class requirement conflicts may still be insufficient.
- Cross-domain Transfer Generalization: This paper combines sequential transfer learning with cross-domain transfer learning to adapt to requirement conflict detection tasks in different domains. However, semantic differences between domains may limit the effectiveness of feature transfer. For instance, requirement expressions in the medical domain (e.g., WorldVista) and in embedded systems (e.g., OPENCOSS) differ significantly, which could affect the model’s adaptability to new domains.
- Limitations in Experimental Setup: The experiments were primarily conducted on several well-known publicly available datasets, which may introduce dataset-specific biases and affect the generalizability of the results. Furthermore, due to limitations in experimental environment and computational resources, the paper did not test on large-scale datasets, which may affect the comprehensive evaluation of requirement conflict detection capabilities in real-world industrial environments.
5.2 Limitations and future work
In addition to the aforementioned threats to validity, the current TSRCDF-SS framework primarily relies on the semantic representation capabilities of pre-trained language models, without incorporating structured knowledge such as domain ontologies, standardized terminologies, or logical rules. As a result, it struggles to accurately identify complex requirements involving conditional statements, reasoning chains, or logical constraints. Moreover, the model’s predictions lack interpretability, making it difficult to provide clear explanations or locate the underlying causes of conflicts, which limits its practicality and reduces user trust. Future research will focus on enhancing the model’s robustness to imbalanced data, introducing large language models to support hybrid reasoning for improved interpretability, integrating domain knowledge and knowledge graphs to deepen semantic modeling, and extending the task to multi-label conflict-type recognition for more fine-grained detection. Furthermore, efforts will be made to develop explainable and interactive tool systems to facilitate the practical deployment of this method in real-world software engineering projects.
6 Conclusion
This paper proposes an automatic requirement conflict detection framework combining SBERT and SimCSE dual encoders and introduces a transfer learning strategy to explore its applicability in cross-domain scenarios. Experimental results show that this framework has clear advantages in overall performance and training stability. The dual-encoder structure helps enhance the ability to identify semantic relationships between requirement sentence pairs under same-domain or similar-domain conditions, while the two-layer fully connected structure and hybrid loss function mitigate the imbalance during the training phase and reduce the framework’s over-reliance on the majority class. In the cross-domain transfer setting, the joint sequential and cross-domain strategy improves the model’s adaptability to data from unknown domains to some extent. However, the results also indicate that, limited by the differences in conflict semantic distribution and the scarcity of conflict samples in the source domain, the conflict class remains the main challenge in cross-domain detection. Overall, this work provides an extensible modeling approach for requirement conflict detection and reveals issues that require further research in cross-domain conflict semantic modeling.
References
- 1.
Loucopoulos P. In: Clarkson J, Eckert C, editors. Requirements engineering. London: Springer London; 2005. p. 116–39. http://dx.doi.org/10.1007/978-1-84628-061-0_5
- 2.
Gericke K, Blessing L. An analysis of design process models across disciplines. In: Marjanovic D, Storga M, Pavkovic N, Bojcetic N, editors. DESIGN 2012: Proceedings of the 12th International Design Conference. Zagreb: Fac. of Mechanical Engineering and Naval Architecture. 2012. p. 171–80.
- 3.
Ingenieure VD. Design of technical products and systems-model of product design. Düsseldorf, Germany: VDI Verein Deutscher Ingenieure eV. 2019.
- 4. Zhao L, Alhoshan W, Ferrari A, Letsholo KJ, Ajagbe MA, Chioasca E-V, et al. Natural Language Processing for Requirements Engineering. ACM Comput Surv. 2021;54(3):1–41.
- 5. Umar MA, Lano K. Advances in automated support for requirements engineering: a systematic literature review. Requirements Eng. 2024;29(2):177–207.
- 6. Pasham SD. Managing Requirements Volatility in Software Quality Standards: Challenges and Best Practices. International Journal of Modern Computing. 2024;7(1):123–40.
- 7.
Dick J, Hull E, Jackson K. Requirements Engineering. Springer International Publishing. 2017. https://doi.org/10.1007/978-3-319-61073-3
- 8.
Wiegers K, Hokanson C. Software requirements essentials: Core practices for successful business analysis. Addison-Wesley Professional. 2023.
- 9.
Guo W, Zhang L, Lian X. Automatically detecting the conflicts between software requirements based on finer semantic analysis. 2021. https://arxiv.org/abs/2103.02255
- 10.
Malik G, Cevik M, Basar A, Parikh D. Supervised Semantic Similarity-based Conflict Detection Algorithm: S3CDA. 2025. https://arxiv.org/abs/2206.13690
- 11.
Bender B, Gericke K. Pahl/Beitz Konstruktionslehre. Springer Berlin Heidelberg. 2021. https://doi.org/10.1007/978-3-662-57303-7
- 12.
Göhlich D, Fay TA. Arbeiten mit anforderungen: requirements management. Pahl/Beitz Konstruktionslehre: Methoden und Anwendung erfolgreicher Produktentwicklung. Springer. 2021. p. 211–29. https://doi.org/10.1007/978-3-662-57303-7
- 13.
Pohl K. Requirements engineering fundamentals: a study guide for the certified professional for requirements engineering exam-foundation level-IREB compliant. Rocky Nook, Inc. 2016.
- 14.
Reimers N, Gurevych I. Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks. In: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 2019. 3980–90. https://doi.org/10.18653/v1/d19-1410
- 15.
Gao T, Yao X, Chen D. SimCSE: Simple Contrastive Learning of Sentence Embeddings. In: Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021. 6894–910. https://doi.org/10.18653/v1/2021.emnlp-main.552
- 16. Abbott RJ. Program design by informal English descriptions. Commun ACM. 1983;26(11):882–94.
- 17. Aguilera C, Berry DM. The use of a repeated phrase finder in requirements extraction. Journal of Systems and Software. 1990;13(3):209–30.
- 18.
Rolland C, Proix C. A natural language approach for Requirements Engineering. Notes on Numerical Fluid Mechanics and Multidisciplinary Design. Springer International Publishing. 1992. p. 257–77. https://doi.org/10.1007/bfb0035136
- 19.
Fischbach J, Hauptmann B, Konwitschny L, Spies D, Vogelsang A. Towards Causality Extraction from Requirements. In: 2020 IEEE 28th International Requirements Engineering Conference (RE), 2020. 388–93. https://doi.org/10.1109/re48521.2020.00053
- 20. Linzen T, Baroni M. Syntactic Structure from Deep Learning. Annu Rev Linguist. 2021;7(1):195–212.
- 21. Tian J, Shen C, Wang B, Xia X, Zhang M, Lin C, et al. LESSON: Multi-Label Adversarial False Data Injection Attack for Deep Learning Locational Detection. IEEE Trans Dependable and Secure Comput. 2024;21(5):4418–32.
- 22. Tian J, Shen C, Wang B, Ren C, Xia X, Dong R, et al. EVADE: Targeted Adversarial False Data Injection Attacks for State Estimation in Smart Grid. IEEE Trans Sustain Comput. 2025;10(3):534–46.
- 23.
Kurtanovic Z, Maalej W. Automatically Classifying Functional and Non-functional Requirements Using Supervised Machine Learning. In: 2017 IEEE 25th International Requirements Engineering Conference (RE), 2017. 490–5. https://doi.org/10.1109/re.2017.82
- 24.
Jindal R, Malhotra R, Jain A. Automated classification of security requirements. In: 2016 International Conference on Advances in Computing, Communications and Informatics (ICACCI), 2016. 2027–33. https://doi.org/10.1109/icacci.2016.7732349
- 25.
Yenugula M, Kasula VK, Yadulla AR, Konda B, Addula SR, Kotteti CMM. Privacy-Preserving Decision Tree Classification Using Homomorphic Encryption in IoT Big Data Scenarios. In: 2025 IEEE 4th International Conference on Computing and Machine Intelligence (ICMI), 2025. 1–6. https://doi.org/10.1109/icmi65310.2025.11141083
- 26. Gärtner AE, Fay T-A, Göhlich D. Fundamental Research on Detecting Contradictions in Requirements: Taxonomy and Semi-Automated Approach. Applied Sciences. 2022;12(15):7628.
- 27. Gärtner AE, Göhlich D, Fay T-A. AUTOMATED CONDITION DETECTION IN REQUIREMENTS ENGINEERING. Proc Des Soc. 2023;3:707–16.
- 28. Gärtner AE, Göhlich D. Automated requirement contradiction detection through formal logic and LLMs. Autom Softw Eng. 2024;31(2).
- 29.
Malik G, Yildirim S, Cevik M, Bener A, Parikh D. Transfer learning for conflict and duplicate detection in software requirement pairs. 2024. https://arxiv.org/abs/2301.03709
- 30.
Malik G, Cevik M, Başar A. Data Augmentation for Conflict and Duplicate Detection in Software Engineering Sentence Pairs. In: Proceedings of the 33rd Annual International Conference on Computer Science and Software Engineering, 2023. 34–43. https://doi.org/10.5555/3615924.3615928
- 31. Pan SJ, Yang Q. A Survey on Transfer Learning. IEEE Trans Knowl Data Eng. 2010;22(10):1345–59.
- 32.
Devlin J, Chang M-W, Lee K, Toutanova K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2019. 4171–86. https://doi.org/10.18653/v1/n19-1423
- 33.
He P, Liu X, Gao J, Chen W. DeBERTa: Decoding-enhanced BERT with Disentangled Attention; 2021. Available from: https://arxiv.org/abs/2006.03654
- 34.
Yosinski J, Clune J, Bengio Y, Lipson H. How transferable are features in deep neural networks?. In: Ghahramani Z, Welling M, Cortes C, Lawrence N, Weinberger KQ, editors. Advances in Neural Information Processing Systems. Curran Associates, Inc. 2014.
- 35.
Howard J, Ruder S. Universal Language Model Fine-tuning for Text Classification. In: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2018. 328–39. https://doi.org/10.18653/v1/p18-1031
- 36.
Radford A, Wu J, Child R, Luan D, Amodei D, Sutskever I. Language models are unsupervised multitask learners. OpenAI blog. 2019.
- 37.
Joulin A, Grave E, Bojanowski P, Mikolov T. Bag of Tricks for Efficient Text Classification. In: Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers, 2017. 427–31. https://doi.org/10.18653/v1/e17-2068
- 38.
Cer D, Yang Y, Kong S, Hua N, Limtiaco N, John RS. Universal Sentence Encoder. 2018. https://arxiv.org/abs/1803.11175
- 39.
Murphy KP. Machine learning: a probabilistic perspective. MIT Press. 2012.
- 40.
Lin T-Y, Goyal P, Girshick R, He K, Dollar P. Focal Loss for Dense Object Detection. In: 2017 IEEE International Conference on Computer Vision (ICCV), 2017. 2999–3007. https://doi.org/10.1109/iccv.2017.324
- 41.
Lu Y, Bo Y, He W. Confidence Adaptive Regularization for Deep Learning with Noisy Labels. 2021. https://arxiv.org/abs/2108.08212
- 42. Ghimire S, Masoomi A, Dy J. Reliable estimation of kl divergence using a discriminator in reproducing kernel hilbert space. Advances in Neural Information Processing Systems. 2021;34:10221–33.
- 43.
Natural Language Inference Dataset for Software Engineering [Dataset]. Zenodo; 2023. https://doi.org/10.5281/zenodo.8025041
- 44.
Cleland-Huang J, Vierhauser M, Bayley S. Dronology: An incubator for cyber-physical systems research. In: Proceedings of the 40th International Conference on Software Engineering: New Ideas and Emerging Results (ICSE-NIER), 2018. 109–12.
- 45.
Ferrari A, Spagnolo GO, Gnesi S. PURE: A Dataset of Public Requirements Documents. In: 2017 IEEE 25th International Requirements Engineering Conference (RE), 2017. 502–5. https://doi.org/10.1109/re.2017.29