Abstract
Enhancing software quality remains a primary objective for software developers and engineers, with particular emphasis on improving software stability to increase user satisfaction. Developers must balance rigorous software testing with tight schedules and budgets, which often forces them to choose between quality and cost. Traditional approaches rely on software reliability growth models that are often too mathematically complex to be practical in intricate testing environments. Addressing this issue, our study introduces a system dynamics approach to develop a more adaptable software reliability growth model, specifically designed to handle the complexities of modern software testing scenarios. By utilizing a system dynamics model and a set of defined rules, we can effectively simulate and illustrate the impacts of testing and debugging processes on the growth of software reliability. This method simplifies the complex mathematical derivations commonly associated with traditional models, making it more accessible for real-world applications. The key innovation of our approach lies in its ability to create a dynamic and interactive model that captures the various elements influencing software reliability, including resource allocation, testing efficiency, error detection rates, and the feedback loops among these elements. By simulating different scenarios, software developers and project managers can gain deeper insights into how their decisions affect software quality and testing efficiency, providing valuable support for decision-making and strategy formulation in software development and quality assurance.
Citation: Li W, Fang C-C (2025) Applying a system dynamics approach for decision-making in software testing projects. PLoS One 20(5): e0323765. https://doi.org/10.1371/journal.pone.0323765
Editor: Iftikhar Ahmed Khan, University of Lahore - Raiwind Road Campus: The University of Lahore, PAKISTAN
Received: October 18, 2024; Accepted: April 15, 2025; Published: May 16, 2025
Copyright: © 2025 Li, Fang. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All relevant data are within the manuscript and its Supporting Information files.
Funding: This work was sponsored by the Guangdong Basic and Applied Research Foundation, China [grant number 2024A0505050040]. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
1. Introduction
The stability and reliability of software play a crucial role in its success after development. Thorough testing of both ongoing and completed software is a crucial step in the software development process because it ensures high software quality. The responsibility of creating highly dependable software systems or more stable applications falls on the shoulders of the software development team. During the testing phase of the software development cycle, it is essential to allocate sufficient time and resources to uncover any hidden bugs. For customers purchasing the software, comprehensive, rigorous, and accurate testing of the software system is vital to enhance their confidence in using the product. From the perspective of the software development team, testing serves multiple purposes. It allows them to verify the correct execution of the programs they have written, assess whether the software’s performance meets the required standards, and ensure that the system’s security aligns with user requirements. Extending this, the testing phase also involves assessing the software’s compatibility with various hardware and operating systems, evaluating its user interface for usability, and ensuring that it complies with legal and regulatory standards. Regular updates and maintenance testing are crucial to address any newly discovered vulnerabilities or bugs and to keep the software aligned with evolving user needs and technological advancements. Engaging with end-users for feedback during the testing phase can provide valuable insights into user experience and expectations, thereby guiding the development team in refining the software. Ultimately, the rigorous testing process not only enhances the reliability and stability of the software but also contributes to the reputation of the development team and the company. This promotes trust and satisfaction among users and stakeholders [1–3].
Historically, research in software testing and reliability has extensively focused on the development of software reliability growth models (SRGMs). These models are crucial for tracking the improvement of software reliability during extended testing periods. They serve a critical function in forecasting the evolution of software reliability over time, thereby aiding in the estimation of the total cost involved in software testing. This predictive capability empowers software developers to make informed decisions regarding the allocation of time and resources in software testing. It helps strike a balance between the costs of testing and debugging and the overall quality of the software. Over the past two decades, there has been substantial theoretical evolution in SRGMs. A prominent approach within these models is the Non-Homogeneous Poisson Process (NHPP), a widely adopted statistical method for modeling the software failure process. The core principle of the NHPP is to treat the software failure intensity, denoted as λ(t), as a function that depends on time. The counting process N(t), which represents the cumulative number of errors detected by time t, is a crucial metric, and its mean value function m(t) = E[N(t)] is used to determine the expected number of program errors identified at any given time t [4–11].
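To make the NHPP notation concrete, the short Python sketch below evaluates a mean value function and its intensity for the classic Goel-Okumoto exponential form m(t) = a(1 − e^(−bt)). This form and the parameter values are illustrative assumptions for exposition, not the model proposed in this paper.

```python
import numpy as np

def mean_value_go(t, a, b):
    """Expected cumulative number of detected errors m(t) under the
    classic Goel-Okumoto NHPP form m(t) = a * (1 - exp(-b * t))."""
    return a * (1.0 - np.exp(-b * t))

def intensity_go(t, a, b):
    """Failure intensity lambda(t) = dm/dt for the same form."""
    return a * b * np.exp(-b * t)

# Example: 100 latent errors, detection-rate parameter b = 0.3 per week
t = np.linspace(0, 20, 5)
print(mean_value_go(t, a=100, b=0.3))   # rises toward 100
print(intensity_go(t, a=100, b=0.3))    # decreases over time
```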
However, it is important to note that most conventional SRGMs operate under the assumption of “Perfect Debugging.” This assumption simplifies the complexity of mathematical analysis by presuming that every identified bug is flawlessly fixed, which may not always reflect real-world scenarios. To further enhance this understanding, recent advancements in SRGMs are exploring more realistic scenarios. These considerations include the possibility of “Imperfect Debugging”, where some errors may only be partially resolved or new issues may be introduced during the debugging process. Additionally, modern models are increasingly incorporating factors such as user feedback, varying testing environments, and the influence of software updates. There is also a growing emphasis on utilizing machine learning techniques to enhance the predictive accuracy of these models. This involves considering complex datasets and real-world usage patterns [12–16].
Furthermore, some studies have been focusing on how SRGMs can be adapted to agile practices, where software development and testing are conducted in a more iterative and continuous manner. This involves developing models that can dynamically adjust to rapid development cycles and frequent code releases, which is a stark contrast to traditional models designed for longer, more static testing periods. Therefore, the issue of multiversion or multiphase software testing arises. Overall, the continuous evolution of SRGMs reflects an ongoing effort to align these models more closely with the evolving landscapes of software development and testing. The goal is to achieve a more accurate, dynamic, and practical approach to understanding and enhancing software reliability [17–19].
Recent advancements in software reliability and testing have highlighted the significance of robust models and strategies for optimizing software quality and performance. Pradhan et al. [20] emphasize the role of software reliability models and multi-attribute utility functions in determining optimal software release times, thereby providing a strategic framework for balancing reliability with development constraints. Building on this, Pradhan et al. [21] explore emerging trends in software reliability growth modeling, identifying future directions to enhance predictive accuracy and adaptability in complex software environments. In the context of debugging and fault management, Zada et al. [22] propose the use of Support Vector Machines (SVM) for classifying software failure incidents, demonstrating the potential of machine learning in improving debugging efficiency. Furthermore, Zada et al. [23] extend this approach by evaluating supervised machine learning classifiers for malware detection, highlighting the critical role of automated tools in identifying and mitigating software defects. Meanwhile, Ilyas et al. [24] address the challenges of software reliability in critical systems by presenting a fog computing-based architecture for pervasive health monitoring, which integrates robust testing and debugging mechanisms to ensure system resilience. Collectively, these studies underscore the evolving landscape of software testing, reliability, and debugging. They highlight the integration of advanced modeling techniques, machine learning, and innovative architectures to address the complexities of contemporary software systems.
“System Dynamics” is a methodological and analytical approach developed by Jay W. Forrester [25] at the Sloan School of Management at MIT. It emphasizes the construction of cause-and-effect relationships between variables through systematic thinking and feedback control theories. This approach is used to create simulation models, which are tools designed to simulate and project objectives based on interconnected relationships. Historically, system dynamics has primarily been employed to solve industrial and business management issues. This includes analyzing changes in production and employment, national economic trends, and corporate market strategies. Computer simulations play a crucial role in these applications by enabling the visualization of how various elements—such as system structure, policies, and time delays—interact and influence one another. In recent years, the scope of system dynamics has significantly expanded. It is now increasingly being applied to address environmental and social science challenges. Models based on system dynamics are constructed to understand and analyze the complex feedback relationships among factors such as industrialization, pollution, healthcare, and resource allocation. These models aim to identify the key factors that influence complex systems and suggest strategies for improvement [25–28].
Traditional SRGMs often rely on simplified assumptions or conditions in order to keep their mathematical derivations tractable. While this makes the models more manageable, it can lead to an incomplete portrayal of software reliability growth, potentially omitting crucial factors. This simplification may lead to inaccuracies in predicting software reliability and estimating testing costs. Addressing this issue, the current study adopts a system dynamics approach. This method involves mapping the impact of test and debug time on software reliability growth through a causal loop diagram, which enables system dynamics simulations. This approach differs significantly from traditional studies, which usually rely on differential equations to infer the mean value function of software reliability growth before calculating software reliability. The use of system dynamics in modeling allows for the consideration of more complex systems while avoiding the need for complicated mathematical derivations. The study’s method provides a more comprehensive and dynamic view of the software development process, considering various interconnected factors that traditional models might overlook.
To further elaborate on this, the study aims to integrate real-world scenarios into the system dynamics model, thereby providing a more realistic and practical perspective. This includes considering factors such as team capabilities, resource limitations, and market pressures, all of which can significantly impact the reliability of software and the testing processes. In summary, the study introduces an innovative framework for decision-making in software testing projects by applying a system dynamics approach, offering three key contributions to the field. First, the proposed methodology presents a comprehensive scientific management framework that integrates dynamic interactions among software testing, debugging processes, and resource allocation. Second, the model enhances practical relevance by incorporating a broader array of real-world factors, such as evolving team dynamics, imperfect debugging scenarios, and changing project constraints. Third, the study also accounts for the learning curve of testing and debugging personnel, allowing for a more accurate representation of improvements in software reliability throughout the testing process.
To summarize, these features bridge the gap between theoretical models and practical decision-making, empowering software testing managers to implement data-driven strategies that optimize software testing results.
2. Software reliability modeling and system dynamics
2.1 Basic model development
In the field of reliability engineering, the NHPP has emerged as a crucial tool for addressing reliability issues in various hardware and software scenarios. Unlike the Homogeneous Poisson Process (HPP), the NHPP is distinct in its ability to account for variable failure rates over time. This characteristic is particularly advantageous for SRGMs because the frequency of software defects typically decreases with continued testing and debugging. Prior to market release, software systems undergo extensive testing and debugging to ensure their quality. During this critical phase, software departments or companies evaluate multiple strategies, and decision-makers choose the most appropriate testing methods. The management’s challenge is to find a balance between ensuring system stability and managing associated expenses. They also need to evaluate the impact of various resource allocations on the enhancement of system reliability. This includes assessing the effectiveness of testing and debugging efforts, as well as planning budgets for different resource scenarios.
Generally, the development of software reliability over time is modeled using mathematical functions in a counting process format, represented as {N(t), t ≥ 0}. Within this framework, N(t) is governed by an NHPP, which is defined by its mean value function, denoted as m(t). The dynamics of this process can be described by the NHPP probability law, Pr{N(t) = n} = [m(t)]^n · e^(−m(t)) / n!, for n = 0, 1, 2, ….
Viewed from a different perspective, the mean value function essentially estimates the expected total number of errors detected and identified within a specific time period, starting from 0 and ending at a specific point in time t; that is, m(t) = E[N(t)].
As the software undergoes testing, the number of remaining and undetected errors gradually decreases over time due to the debugging efforts. This reduction in undetected errors directly contributes to enhancing the reliability of the software. On this basis, the software reliability R(t) is defined as a decreasing function of the remaining errors, where a denotes the number of potential errors in the software system at the beginning, n(t) = a − m(t) represents the number of remaining and undetected errors in the system at time t, and ω is an adjustment factor for software reliability. Additionally, as testing time approaches infinity, the software reliability is expected to progressively approach a value of one (lim_{t→∞} R(t) = 1).
Given the points discussed above, we assume that the occurrence of software errors in this study follows an NHPP, with any identified errors being resolved immediately. Consequently, the time required to fix errors is considered negligible. To develop a robust model for predicting software reliability, we incorporate the concept of the learning effect within the SRGM. Figure 1 presents a causal loop diagram that visualizes the software testing process within the context of the system dynamics model. In this causal loop diagram, the variable α represents the inherent ability of the testing staff to independently detect software errors. This ability is based on their skills and training, rather than on insights gained from previous error patterns. This factor is distinct from the learning factor, β, which reflects the staff’s ability to learn from past errors and improve with increased testing time. Therefore, the learning factor is related to the accumulated number of detected errors. Furthermore, the effect of the two factors depends on the scale of the testing staff (S). Besides, the number of undetected errors at the beginning (n(0)) equals the number of initial errors (a). However, some new errors may be introduced due to the staff’s negligence and an inappropriate debugging process. Therefore, in order to obtain a more accurate estimation, it is assumed that the software system may experience an increase in the proportion of new errors during the debugging process. It should be noted that there are two flows in Figure 1: the right one represents the error detection per unit time (dm(t)/dt), and the left one represents the increase of new errors due to imperfect debugging.
Drawing from the principles of the causal loop diagram, the error detection per unit time can be formulated as a differential equation in m(t) (Equation 4). In order to apply the integration of the natural logarithm, Equation (4) can be rearranged by separating its variables (Equation 5). After integrating both sides of Equation (5), an implicit relationship between m(t) and t is obtained (Equation 6). To derive the explicit mathematical form of m(t), Equation (6) must first be solved with respect to m(t); this yields a representation of m(t) that still contains an undetermined constant (Equation 7). Considering the initial condition m(0) = 0, which implies that no errors are identified at the beginning of the testing period, Equation (7) can be resolved by using this condition to eliminate the unknown constant (Equation 8). Solving Equation (8) for the unknown constant gives its value (Equation 9), and substituting this constant back into Equation (7) yields the complete expression for m(t) (Equation 10). This expression is the initial form of the mean value function, which is utilized to estimate the average cumulative number of detected errors; after substituting the relevant terms, the expression for m(t) can be further simplified (Equation 11). For software managers aiming to assess the number of errors detected at a specific time t during the testing phase, it is crucial to understand the intensity function of the mean value, λ(t). This is obtained by computing the first derivative of m(t) (Equation 12). To effectively monitor the evolving rate of error detection per remaining error at time t, the error detection rate can also be derived (Equation 13). Notably, the error detection rate is a strictly increasing function, indicating that the efficiency of debugging is expected to improve as testing time advances. Moreover, when considering the scenario of imperfect software debugging, the number of initial errors may increase over testing time. Consequently, the initial errors can be expressed as a function of testing time, denoted as a(t). The mathematical form of a(t) can be defined based on the specific characteristics and dynamics of the software testing process.
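Because the paper’s exact equations are not reproduced here, the following Python sketch illustrates the idea numerically under an assumed functional form, dm/dt = S·(α + β·m/a)·(a − m), in which the detection rate per remaining error increases with the accumulated detections. The functional form and all parameter values are hypothetical, chosen only to show the qualitative S-shaped growth described above.

```python
import numpy as np

def simulate_mean_value(a=100, alpha=0.02, beta=0.15, S=1.0,
                        t_max=30.0, dt=0.01):
    """Numerically integrate an assumed learning-effect SRGM:
        dm/dt = S * (alpha + beta * m / a) * (a - m),
    where alpha is the autonomous detection factor, beta the learning
    factor, S the staff scale, and a the initial number of errors.
    Returns the time grid and the cumulative detected errors m(t)."""
    steps = int(t_max / dt)
    t = np.linspace(0.0, t_max, steps + 1)
    m = np.zeros(steps + 1)
    for k in range(steps):
        detection_rate = S * (alpha + beta * m[k] / a) * (a - m[k])
        m[k + 1] = m[k] + detection_rate * dt    # forward-Euler step
    return t, m

t, m = simulate_mean_value()
print(f"errors detected after 10 time units: {np.interp(10, t, m):.1f}")
```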
Furthermore, the mathematical formulation of the mean value function is utilized to estimate the values of crucial parameters for a specific testing team, assisting in the evaluation of their software testing effectiveness. Multifaceted testing environments present challenges, including collaboration among multiple testing groups, fluctuating learning factors, imperfect debugging, and various stochastic elements. These complexities create scenarios where traditional mathematical models struggle to apply. As a result, achieving accurate estimations becomes difficult due to the dynamic and interdependent nature of these factors. The fundamental mathematical model retains its usefulness in simpler testing situations. However, for more complex scenarios, a system dynamics approach is more appropriate. This method offers a comprehensive framework for understanding and analyzing the interactions and feedback loops among various elements in complex testing environments. It allows for the integration of various factors, such as team dynamics, different skill levels, changing software requirements, and resource limitations. By adopting this approach, it becomes possible to simulate and predict the behavior of the testing process under various conditions.
The next section will introduce how to estimate the parameter values of the fundamental mathematical model.
2.2. Parameter estimation
This study utilizes two main methods for parameter estimation in the proposed model: Maximum Likelihood Estimation (MLE) and Least Squares Estimation (LSE). These techniques are crucial in evaluating the effectiveness and precision of the model. Utilizing MLE and LSE allows for a comprehensive analysis of the collected software failure data, providing a solid foundation for benchmarking the proposed model against existing ones. The focus here is on the quality of fit, which is critical in determining how closely the new model reflects the actual patterns of software failures. This is a key aspect in validating the model’s predictive accuracy.
- (1) The MLE method is commonly used to estimate parameters of a presumed probability distribution, which is particularly relevant in an NHPP context. In this scenario, the likelihood function is tailored to reflect the unique characteristic of a time-varying event rate in the NHPP, and MLE seeks the most probable parameter values within the distribution that best explain the observed data. By applying the natural logarithm to the likelihood function, calculating the first-order derivatives with respect to each parameter, and setting them to zero, the resulting log-likelihood equations can be solved using numerical methods. In particular, if an error-seeding method is employed to obtain an initial estimate of the potential errors (a), estimating the other parameters becomes more feasible.
- (2) The LSE method minimizes the sum of squared differences between observed and predicted data in order to estimate model parameters. To implement LSE, a dataset comprising n pairs of observed values is utilized, represented as (t₁, m₁), (t₂, m₂), …, (tₙ, mₙ). In this context, each mᵢ signifies the accumulated number of errors detected within the time frame [0, tᵢ]. The estimation process involves solving the first-order derivatives of the error function with respect to each parameter and setting them equal to zero. Numerical methods are used for this purpose, and error seeding can simplify the estimation of the remaining parameters, such as α and β, by providing an initial estimate of potential errors (a). A brief illustrative sketch of both estimators is given below.
Both methods are essential for understanding the rate and pattern of error detection in software testing, enabling managers to effectively navigate the testing phase and enhance software reliability.
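As a concrete illustration of both estimators, the following Python sketch fits a two-parameter mean value function to a small, hypothetical set of cumulative error counts. The Goel-Okumoto form m(t) = a(1 − e^(−bt)), the data, and the starting values are assumptions made for the example, not the model or datasets analyzed in this paper.

```python
import numpy as np
from scipy.optimize import minimize, curve_fit

# Hypothetical observations: cumulative detected errors n_i at times t_i
t_obs = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
n_obs = np.array([12, 25, 38, 47, 55, 60, 64, 67], dtype=float)

def mean_value(t, a, b):
    """Illustrative two-parameter mean value function (Goel-Okumoto form)."""
    return a * (1.0 - np.exp(-b * t))

# ---------- MLE: maximize the NHPP grouped-data log-likelihood ----------
def neg_log_likelihood(params):
    """Negative log-likelihood for interval counts; constant factorial
    terms are dropped because they do not affect the maximizer."""
    a, b = params
    if a <= 0 or b <= 0:
        return np.inf
    m = mean_value(t_obs, a, b)
    dm = np.diff(np.concatenate(([0.0], m)))       # expected errors per interval
    dn = np.diff(np.concatenate(([0.0], n_obs)))   # observed errors per interval
    return -np.sum(dn * np.log(dm) - dm)

mle = minimize(neg_log_likelihood, x0=[100.0, 0.2], method="Nelder-Mead")
a_mle, b_mle = mle.x

# ---------- LSE: minimize the sum of squared residuals ----------
(a_lse, b_lse), _ = curve_fit(mean_value, t_obs, n_obs, p0=[100.0, 0.2])

print(f"MLE: a = {a_mle:.1f}, b = {b_mle:.3f}")
print(f"LSE: a = {a_lse:.1f}, b = {b_lse:.3f}")
```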
2.3. Model verification and comparison
In this section, we assess the effectiveness of various SRGMs based on their fitness for different datasets obtained from related references. The evaluation begins by applying the LSE method to estimate the parameters of the proposed model using four specific datasets which are presented in Table 1. A comparative analysis is then conducted between the proposed model and three other classic SRGMs, as shown in Table 2.
For a comprehensive evaluation, two widely recognized indicators are employed:
- (1) Mean Square Error (MSE): This measures the discrepancy between the estimated and actual values. The MSE formula is MSE = Σᵢ (m̂(tᵢ) − mᵢ)² / (n − k), where mᵢ is the actual cumulative number of detected errors up to tᵢ; m̂(tᵢ) is the estimated cumulative number of errors up to tᵢ; n is the number of observations; and k is the number of parameters in the model.
- (2) R-squared (Rsq): This indicates how well the model accounts for the variability in the data. A higher Rsq value suggests a better fit. It is calculated as Rsq = 1 − Σᵢ (mᵢ − m̂(tᵢ))² / Σᵢ (mᵢ − m̄)², where m̄ denotes the mean of the observed cumulative error counts.
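Both indicators can be computed directly from the observed and fitted cumulative error counts, as in the short sketch below; the example numbers are hypothetical.

```python
import numpy as np

def mse(m_obs, m_hat, k):
    """Mean square error with a degrees-of-freedom correction: squared
    residuals divided by (n - k), where n is the number of observations
    and k the number of model parameters."""
    m_obs, m_hat = np.asarray(m_obs, float), np.asarray(m_hat, float)
    return np.sum((m_obs - m_hat) ** 2) / (len(m_obs) - k)

def r_squared(m_obs, m_hat):
    """Coefficient of determination: 1 - SSE / SST."""
    m_obs, m_hat = np.asarray(m_obs, float), np.asarray(m_hat, float)
    sse = np.sum((m_obs - m_hat) ** 2)
    sst = np.sum((m_obs - m_obs.mean()) ** 2)
    return 1.0 - sse / sst

# Hypothetical observed and fitted cumulative error counts
m_obs = [12, 25, 38, 47, 55, 60, 64, 67]
m_hat = [13.1, 24.2, 36.0, 46.5, 54.8, 60.9, 64.7, 67.2]
print(mse(m_obs, m_hat, k=2), r_squared(m_obs, m_hat))
```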
Table 3 demonstrates the superior performance of the proposed model in terms of MSE and R-squared. It should be noted that incorporating more parameters can enhance a model’s flexibility and fitting capability, contingent on careful model design. The proposed model includes three parameters, allowing it to effectively adapt to different testing scenarios and accurately fit the test data. Figure 2 illustrates the excellent fit of the proposed model, together with the traditional confidence intervals (CIs), to these artificially generated datasets, highlighting its adaptability and accuracy in various testing scenarios.
3. System dynamics model for software testing process
In this section, we utilize a system dynamics model to demonstrate a software testing process that involves multiple teams of testing staff. Figure 3 presents a conceptual framework that illustrates the interrelationships and feedback loops between various components involved in software testing and debugging.
In traditional models, all testing teams are treated as a single entity, and the parameters of the software reliability growth model are estimated based on this collective unit. However, in practice, testing teams often undergo changes and reorganization. In such cases, the model parameters previously estimated for the unified team are no longer applicable and must be re-estimated, as the values of these parameters have shifted. Furthermore, when considering multiple testing teams and the possibility of imperfect testing introducing new software defects, it becomes challenging to estimate testing efficiency using past mathematical inference methods. This is because deriving a closed-form mathematical model under these conditions is highly complex. Please refer to Figure 3, which illustrates that each testing team’s testing efficiency can be described using a mathematical model. However, when considering the scenario of imperfect debugging across all teams, the combined mathematical models do not yield a closed-form expression to fully characterize the entire situation. To address this limitation, we utilize a system dynamics model, which allows for a comprehensive and dynamic representation of the entire testing process.
Here is a detailed description of the elements typically found in such a model:
- (1) Testing Staff Teams: There are three teams (i = 1, 2, 3) involved in the testing process, each with a different scale of staffing (Sᵢ) due to varying human resource allocations. Since the individual testing staff teams have different education and work experience, their inherent testing ability factors (αᵢ) and learning factors (βᵢ) will also differ. Therefore, the mean value function for testing staff team i is denoted mᵢ(t). Furthermore, the sum of all the mean value functions for the different teams is considered the stock “Detected Errors” in this system dynamics model. However, the learning factor may change over time due to testing, and can be represented as a linear function of testing time, βᵢ(t) = βᵢ₀ + βᵢ₁·t, where the parameters βᵢ₀ and βᵢ₁ represent the intercept and linear coefficient of the learning factor function.
- (2) Error Detection per Time Unit: This is a flow in the causal loop diagram. The node aggregates the error detection rates from the three testing staff teams and demonstrates how they impact the number of detected errors; it can be represented as the sum of the teams’ detection rates, Σᵢ dmᵢ(t)/dt. Since the velocity of error detection fluctuates in practice, a random disturbance term is attached to this flow.
- (3) Initial Errors: These represent the errors present at the beginning of the testing process. While the testing process is designed to gradually identify and reduce the number of potential hidden errors, it is important to note that the error count may not always follow a strictly decreasing trend. This is primarily due to the phenomenon of imperfect debugging, where the process of fixing existing errors can inadvertently introduce new defects or fail to fully resolve the original issues. As a result, the overall error count may experience temporary increases during the testing phase, highlighting the complex and dynamic nature of software debugging. Therefore, the initial errors can be given as a function of testing time, a(t), which is affected by the factor “Proportion of New Errors Introduced” (p) and the flow “Increasing Rate of New Errors” (da(t)/dt). Since the flow “Increasing Rate of New Errors” may fluctuate in practice, we introduce a randomness mechanism to this flow as well.
- (4) Increasing Rate of New Errors (da(t)/dt) due to Imperfect Debugging: This flow in the system dynamics model indicates that as the testing progresses, there is a possibility of new errors being introduced due to imperfect debugging efforts.
- (5) Proportion of New Errors Introduced (p): Connected to the imperfect debugging process node, this represents the proportion of new errors that emerge during the debugging process.
- (6) Undetected Errors: This is a stock in the system dynamics model; the number of undiscovered errors decreases as the number of detected errors increases.
- (7) Total Cost: This is influenced by “Setup Cost”, “Routine Cost”, “Debugging Cost”, “Risk Cost”, and “Opportunity Cost”. Each of these costs is a function of the resources and time invested in the testing and debugging process. The five distinct cost components are detailed as follows:
- (i) Setup Cost: This refers to the initial investment required to organize the testing process for a given alternative, which includes expenses for planning, acquiring equipment, and other necessary preparatory tasks.
- (ii) Routine Cost: This cost accounts for ongoing operational expenses during the testing phase for a given alternative, such as utilities, office space rental, and insurance premiums.
- (iii) Debugging Cost: This cost refers to the expenses associated with identifying and fixing software errors for each testing staff team during the testing phase. The total debugging cost is measured by accumulating each team’s unit debugging cost over the errors it detects and corrects.
- (iv) Risk Cost: This cost quantifies the potential financial impact associated with errors that might arise after the software is released or deployed, which could result in operational disruptions or damage to the company’s reputation. The cost of risk in this analysis is directly proportional to the number of errors that remain undiscovered, and it is quantified as a unit risk cost multiplied by the number of undetected errors.
- (v) Opportunity Cost: Delaying the release of software can result in both measurable and immeasurable losses. Opportunity cost, therefore, relates to the economic impact of delays in launching the software. It is quantified here as a power function of the release delay, with a scaling factor, an initial amount, and an exponent representing how quickly the loss escalates as time progresses. The model assumes a power-law relationship for opportunity cost, which can be customized to reflect the individual perspectives or circumstances of decision-makers. This provides a versatile approach to evaluating opportunity costs in different scenarios or according to specific strategic factors.
- (8) Parameter of Measuring Reliability: This parameter, denoted by the symbol ω, is used in quantifying software reliability.
- (9) Software Reliability: This outcome is influenced by the Undetected Errors, representing the overall quality and dependability of the software after testing; it is quantified as a decreasing function of the number of undetected errors through the parameter ω. Besides, in practice, most software developers adhere to a standard of minimal software reliability (R_min) to ensure the quality of the software (i.e., the achieved reliability must satisfy R(t) ≥ R_min).
Arrows represent the direction of influence between different factors, with each link indicating a positive influence. The loops, denoted by circular arrows, represent feedback processes. For example, the loop connecting Detected Errors, Undetected Errors, and Error Detection per Time Unit suggests a reinforcing loop, where an increase in detected errors could lead to improved error detection efficiency, thus resulting in a higher number of detected errors in a positive feedback loop. Overall, the diagram illustrates the complex interactions among the different factors in the process of software testing and debugging, the resources used, and the ultimate objective of improving software reliability. A simplified simulation sketch of these stock-and-flow relationships is given below.
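As a rough illustration of how such a stock-and-flow structure can be simulated, the Python sketch below steps the two stocks (Detected Errors and the remaining error pool) forward in time for three teams with linear learning factors, imperfect debugging, and a random disturbance on the detection flow. The functional forms, the exponential reliability expression R = exp(−ω·undetected), and every parameter value are illustrative assumptions rather than the exact equations or data of this study.

```python
import numpy as np

def simulate_testing(teams, a0=200.0, p=0.05, omega=0.03,
                     t_max=12.0, dt=0.01, seed=None):
    """Discrete-time stock-and-flow simulation of the testing process.

    teams : list of dicts with keys S (staff scale), alpha (inherent
            ability), beta0/beta1 (intercept/slope of the linear
            learning factor beta_i(t) = beta0 + beta1 * t).
    a0    : initial number of latent errors.
    p     : proportion of new errors introduced by imperfect debugging.
    omega : reliability adjustment factor (assumed exponential form
            R = exp(-omega * undetected))."""
    rng = np.random.default_rng(seed)
    steps = int(t_max / dt)
    detected, potential = 0.0, a0            # the two stocks
    history = []
    for k in range(steps + 1):
        t = k * dt
        undetected = max(potential - detected, 0.0)
        # Flow 1: aggregate error detection per unit time over the teams
        rate = sum(tm["S"] * (tm["alpha"] + (tm["beta0"] + tm["beta1"] * t)
                              * detected / potential) * undetected
                   for tm in teams)
        rate *= 1.0 + rng.normal(0.0, 0.05)  # random disturbance on the flow
        new_errors = p * rate                # Flow 2: imperfect debugging
        reliability = np.exp(-omega * undetected)
        history.append((t, detected, undetected, reliability))
        detected += max(rate, 0.0) * dt
        potential += max(new_errors, 0.0) * dt
    return history

teams = [dict(S=0.4, alpha=0.010, beta0=0.02, beta1=0.000),   # Team 1
         dict(S=0.7, alpha=0.012, beta0=0.03, beta1=0.004),   # Team 2
         dict(S=1.0, alpha=0.015, beta0=0.04, beta1=0.006)]   # Team 3
for t, det, undet, rel in simulate_testing(teams, seed=1)[::300]:
    print(f"t={t:5.2f}  detected={det:6.1f}  undetected={undet:6.1f}  R={rel:.3f}")
```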
Moreover, if the system dynamics model for software testing is large-scale, or if the manager needs to evaluate multiple distinct models for decision-making, cloud-based or distributed computing systems will be necessary to address these challenges. In scenarios involving large-scale simulations of the system dynamics, the use of cloud-based testing tools and distributed computing frameworks is essential for efficiently managing extensive datasets and enabling real-time analytics. Cloud platforms such as AWS, Azure, and Google Cloud offer scalable, on-demand computational resources that can adapt to the increasing complexity of simulations. Distributed computing technologies, such as Apache Spark and Hadoop, facilitate parallel processing across multiple nodes, significantly reducing computation time and enhancing overall efficiency. By adopting these technologies, the simulation model can not only manage larger datasets but also deliver timely and actionable results, making it a robust solution for software testing.
4. Application and numerical analysis
Suppose that a software development company is working on an Enterprise Resource Planning Information System (ERPIS) project. After nine months of dedicated software development, the team is now ready to enter the crucial final phase of testing and debugging. The client has set a high standard for the software’s reliability, expecting it to operate with at least 90% reliability during its initial phase. This translates to a maximum of a 10% chance of encountering software errors during any given hour of operation. An analysis of the company’s testing history reveals a significant impact of different testing team compositions on the efficiency of the debugging process. In this scenario, the company has three different testing teams, and each testing team has a different testing efficiency due to its members’ individual work experience and ability. The three testing teams, designated as Teams 1, 2, and 3, represent varying levels of workforce intensity: low, medium, and high, respectively. Team 3, while exhibiting greater efficiency in testing compared to Teams 1 and 2, also incurs higher costs in debugging activities. Moreover, the learning factors of Teams 2 and 3 would increase with testing time to accelerate the debugging process. Here, it is assumed that the learning factors of Teams 2 and 3 increase linearly, and they can be reasonably estimated. Unfortunately, the software testing project manager cannot recruit all the members of the three teams because the software development company has another software testing project that needs to be undertaken. Therefore, the project manager for software testing can only recruit some members from each of the three teams. Based on this situation, the project manager devises four alternatives based on the limitations of available human resources. It should be noted that the staff’s debugging work is not perfect, and as a result, new errors are always introduced into the software system when they attempt to correct software bugs. Furthermore, the randomness mechanism needs to incorporate the new software errors introduced in the dynamic environment. Additionally, other associated expenses also affect the overall testing cost. In this case, the setup costs and routine costs of the four alternatives are not significantly different from each other. In order to ensure software quality, the project manager sets a minimum requirement for software reliability under a restricted timeline for software release. Although extending software testing can effectively improve software reliability, delaying the software release may result in missed opportunities and sales losses. Suppose that the missed opportunities and sales losses can be reasonably estimated as a function of time. With this information, the project manager can evaluate and balance the advantages and drawbacks associated with software reliability and the costs of missed opportunities. Figure 4 and Table 4 illustrate detailed information for the four testing alternatives.
Implementing the suggested system dynamics model on the four software testing alternatives, the simulation outcomes are shown in Figures 5 and 6, and detailed in Table 5. These figures demonstrate a distinct convex correlation between the expected testing costs and the testing duration for all four alternatives. As illustrated in Figure 5, each alternative meets the minimum software reliability threshold of 90% if the testing period exceeds 5.25 months. A deeper comparative analysis indicates that alternative A4 emerges as the most economically efficient option for the organization. This is due to its lowest projected testing costs compared to the other choices. Consequently, it is prudent for the project manager to choose alternative A4, which aims to release the software after 5.6 months. At this point, the software is expected to achieve a reliability rate of 96.05%, with an estimated testing cost of $133,490. Additionally, Figure 6 reveals a noteworthy observation: the reliability trajectories of alternatives A1 and A4 are nearly identical. The progression pattern of software reliability for A1 closely mirrors that of A4, yet the costs associated with A1 are about 13–15% higher than those of A4. This finding further emphasizes the cost-effectiveness of alternative A4 in achieving comparable levels of software reliability at a reduced financial investment. Hence, this analysis offers valuable insights for decision-making in selecting the most suitable testing strategy, efficiently balancing cost and reliability. From a software quality standpoint, alternative A3 is superior to the other alternatives, even though it is not the most expensive. Should the project manager prioritize software reliability as the key factor in upholding the company’s reputation, they may choose alternative A3 over A4. This choice could stem from the potential of alternative A3 to deliver superior performance, resulting in a more reliable and dependable software product. While alternative A4 may present cost advantages, alternative A3 might offer enhanced reliability measures that align more closely with the company’s strategic emphasis on quality and customer trust. By choosing alternative A3, the manager demonstrates a commitment to delivering a product that not only meets, but potentially exceeds, industry standards for reliability. This decision could be crucial in strengthening the company’s market position and enhancing its reputation for providing high-quality and reliable software solutions. It also reflects a strategic decision to invest in long-term brand credibility, which could potentially result in increased customer loyalty and a stronger competitive advantage. Thus, the choice of alternative A3, despite any additional costs or extended development time, could be seen as an investment in the company’s reputation for excellence and reliability in software development.
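To show how such a cost-versus-reliability trade-off can be screened numerically, the sketch below evaluates an illustrative total-cost function over a grid of release times and selects the cheapest time that satisfies a 90% reliability requirement. The stand-in undetected-error curve and all cost coefficients are hypothetical; they are not the values behind Figures 5 and 6 or Table 5.

```python
import numpy as np

A0, B, OMEGA = 200.0, 0.55, 0.03          # hypothetical model parameters

def undetected(T):
    """Stand-in for the simulated number of undetected errors at time T."""
    return A0 * np.exp(-B * T)

def reliability(T):
    return np.exp(-OMEGA * undetected(T))

def total_cost(T, setup=20000.0, routine_rate=8000.0, debug_unit=150.0,
               risk_unit=900.0, opp_scale=50.0, opp_power=3.0):
    """Illustrative total cost: setup + routine + debugging + risk +
    power-law opportunity cost of delaying the release."""
    detected = A0 - undetected(T)
    return (setup + routine_rate * T + debug_unit * detected
            + risk_unit * undetected(T) + opp_scale * T ** opp_power)

# Pick the cheapest release time among those meeting R(T) >= 0.90.
grid = np.linspace(1.0, 12.0, 221)
feasible = grid[reliability(grid) >= 0.90]
best_T = feasible[np.argmin([total_cost(T) for T in feasible])]
print(f"earliest feasible release: {feasible[0]:.2f} months")
print(f"cost-minimizing release:   {best_T:.2f} months, "
      f"cost = ${total_cost(best_T):,.0f}, R = {reliability(best_T):.3f}")
```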
In their comprehensive analysis, the project manager included additional vital factors by conducting a sensitivity analysis to explore their effects on the overall cost and on the decision regarding the timing of the software’s release. Figure 7 and Table 6 detail the influence on the total cost of the model parameters (such as the teams’ inherent testing ability factors, learning factors, the initial number of errors, and the proportion of newly introduced errors) and of the cost parameters. It is evident from Figure 7 that, among the model parameters, the cost shows a higher sensitivity to the testing ability factor α and the learning factor β. This implies that inaccurate estimations of α and β can significantly disrupt the budget planning for the selected testing alternative. If the manager underestimates these parameters, there could be an overestimation of the testing costs. Additionally, incorrect predictions of α and β might also affect the decision regarding the software’s release timing. Overestimation of these values could lead to a shortened testing phase and a rushed release, potentially resulting in customer dissatisfaction and damage to the company’s reputation due to the release of unreliable software.
From another perspective, improving testing efficiency may necessitate the manager’s investment in advanced training for the testing staff, consequently increasing the cost of staff education. While this investment can improve the values of α and β, the manager must weigh the benefits of this investment against the costs associated with enhancing staff skills and the resulting decrease in expenses for reliability improvement.
Furthermore, the time-dependent routine cost and the debugging cost also play significant roles in determining the testing cost. For instance, reducing debugging costs by 10% could result in a decrease in total testing costs of approximately 6.5%. In contrast, the impact of routine costs is less significant: a 10% reduction in administrative costs may only lead to a 1% reduction in total testing costs. This finding suggests that the manager should focus on optimizing expenditures, particularly in debugging activities, to improve cost efficiency.
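A one-at-a-time perturbation of this kind can be prototyped in a few lines. In the sketch below, each parameter of a compact, purely illustrative cost function is increased by 10% in turn and the resulting percentage change in total cost is reported; the coefficients are hypothetical and are not those of Table 6.

```python
import numpy as np

def total_cost(params, T=6.0):
    """Compact illustrative cost model used only for the sensitivity demo."""
    a, b, routine, debug_unit, risk_unit = (params[k] for k in
        ("a", "b", "routine", "debug_unit", "risk_unit"))
    undetected = a * np.exp(-b * T)
    return (20000 + routine * T + debug_unit * (a - undetected)
            + risk_unit * undetected + 50 * T ** 3)

base = dict(a=200.0, b=0.55, routine=8000.0, debug_unit=150.0, risk_unit=900.0)
c0 = total_cost(base)
for name in base:                      # one-at-a-time +10% perturbation
    perturbed = dict(base, **{name: base[name] * 1.10})
    change = (total_cost(perturbed) - c0) / c0 * 100
    print(f"+10% in {name:10s} -> total cost changes by {change:+.1f}%")
```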
The study also examines the impact of random variations in error identification. Figure 8 showcases a simulation of the system dynamics, illustrating its inherent randomness. However, this randomness appears to have a limited impact on the effectiveness of testing in this particular scenario. This aspect of the analysis highlights the significance of taking into account variability and uncertainty in project management, especially in software testing, to guarantee sound decision-making and efficient resource allocation.
5. Conclusions and future directions
The rapid growth of the software industry highlights the importance of a high-quality software system in enhancing a company’s competitive advantage. In this context, software reliability and stability are crucial factors in the development process. Traditional software reliability growth models, although comprehensive, often involve complex mathematical formulations. Particularly in complex testing environments, these models require frequent recalibration to maintain accurate predictions, which limits their practicality in real-world applications. To address these challenges, this paper proposes the use of system dynamics to develop a mathematical model that is suitable for complex software testing scenarios. This innovative approach utilizes a system dynamics methodology to develop a software reliability growth model that effectively addresses the complexities of intricate testing environments. It involves creating a cause-and-feedback diagram and utilizing system simulation techniques to analyze the impact of testing and debugging on the growth of software reliability. This study accomplishes several objectives, as outlined below:
- (1) Implementation of a system dynamics approach for modeling the growth of software reliability, integrating autonomous and experiential learning factors that reflect real-world software testing dynamics. These factors, which were often overlooked in previous research, have been included in the system dynamics model to better reflect the changes in software reliability during the testing phase.
- (2) Development of a system dynamics prediction model that encompasses a wide range of parameters associated with software testing and debugging costs, resulting in a comprehensive cost estimation model. This model takes into account various cost factors, enhancing the decision of budget planning and resource allocation in software development projects.
- (3) An examination was conducted to assess the efficiency and cost-effectiveness of various software testing solutions. This process led to the development of individual system dynamics models for each solution. These models enable a comprehensive evaluation of the financial and reliability aspects of each testing solution, providing software developers with valuable insights to determine the most appropriate approach for their specific requirements.
Furthermore, the findings of this study provide actionable insights for both short-term and long-term project management strategies.
- (1) Short-term strategies: Project managers can utilize the model to optimize the allocation of testing resources, prioritize high-impact debugging tasks, and adjust timelines based on real-time reliability predictions. For example, the cost estimation framework (Objective 2) facilitates rapid “what-if” analyses for budget adjustments during sprint cycles.
- (2) Long-term strategies: Organizations can adopt the system dynamics approach to institutionalize data-driven decision-making. This may include aligning team training programs with the learning curves identified in the model (Objective 1) and establishing iterative feedback loops between testing outcomes and process improvements.
Besides, the system dynamics model introduced in this paper serves as a versatile tool for conducting sensitivity analyses on various parameters, thereby aiding in diverse decision-making processes. However, a common challenge faced in software testing is the lack of adequate historical data. This lack of data hampers the ability to accurately extract the necessary parameter values for the dynamic model, which is crucial for predicting software reliability and the associated costs. Consequently, this limitation restricts the possibility of simulating a variety of software testing scenarios, which hinders the ability of developers and decision-makers to evaluate and assess different software testing scenarios. To address these challenges, the study proposes two potential future directions.
- (1) Small-sample statistics: This approach involves using a limited set of software test data to construct a smaller, yet informative sample. By doing so, it addresses the issue of insufficient historical data, enabling more accurate predictions even with limited information. This method involves utilizing advanced statistical techniques to extrapolate meaningful insights from smaller datasets, thereby enhancing the adaptability of the model in situations where extensive data is not available.
- (2) Bayesian statistics and Monte Carlo simulations: This method can integrate the expertise of domain specialists with advanced computational techniques to estimate relevant parameters. Domain experts provide initial estimates, which are then refined through Bayesian statistical methods. These estimates are further enhanced using Monte Carlo simulations, which generate random samples from the parameter distributions to explore their uncertainty and variability. By iteratively sampling from these distributions, Monte Carlo methods allow for a comprehensive exploration of the parameter space, even in complex or high-dimensional scenarios. This combination of expert knowledge, Bayesian updating, and Monte Carlo sampling can be seamlessly integrated into the system dynamics model. The result is a more robust and adaptive estimation process that incorporates both subjective expertise and objective computational analysis. As more data becomes available, Bayesian statistics can further refine the parameter estimates, while Monte Carlo simulations ensure that uncertainty is explicitly accounted for. This makes the approach particularly well-suited for dynamic and uncertain environments, where flexibility and precision are critical.
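As a minimal sketch of this idea, the Python snippet below combines an expert prior on the number of latent errors with an NHPP likelihood over a small hypothetical dataset, approximates the posterior on a coarse grid, and then draws Monte Carlo samples from it. The Goel-Okumoto mean value form, the prior, and the data are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical cumulative error counts at inspection times
t_obs = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
n_obs = np.array([12, 25, 38, 47, 55, 60, 64, 67], dtype=float)

def log_likelihood(a, b):
    """NHPP grouped-data log-likelihood under a Goel-Okumoto mean value
    function (constant terms dropped)."""
    m = a * (1.0 - np.exp(-b * t_obs))
    dm = np.diff(np.concatenate(([0.0], m)))
    dn = np.diff(np.concatenate(([0.0], n_obs)))
    return np.sum(dn * np.log(dm) - dm)

# Expert prior: latent errors a ~ Normal(90, 20); flat prior on b over the grid
a_grid = np.linspace(70, 160, 181)
b_grid = np.linspace(0.05, 1.0, 191)
A, B = np.meshgrid(a_grid, b_grid, indexing="ij")
log_post = (np.vectorize(log_likelihood)(A, B)
            - 0.5 * ((A - 90.0) / 20.0) ** 2)          # log prior on a
post = np.exp(log_post - log_post.max())
post /= post.sum()

# Monte Carlo: draw parameter pairs from the (discretized) posterior
idx = rng.choice(post.size, size=2000, p=post.ravel())
a_samples = A.ravel()[idx]
print(f"posterior mean a = {a_samples.mean():.1f} "
      f"(95% interval {np.percentile(a_samples, 2.5):.1f}-"
      f"{np.percentile(a_samples, 97.5):.1f})")
```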
In addition to these two approaches, the study can also explore other possibilities.
- (1) Integration of Machine Learning: Integrating machine learning algorithms into the system dynamics model could significantly enhance its predictive accuracy and adaptability. Machine learning models, trained on both historical and ongoing software testing data, can continuously learn and adjust their predictions, thereby providing more accurate and up-to-date insights.
- (2) Real-time Data Analysis: Integrating real-time data analysis into the system dynamics model can enable more agile and responsive decision-making. This approach involves continuously feeding real-time testing data into the model, enabling it to adjust its predictions based on the most recent information. As a result, it provides a more accurate and up-to-date assessment of software reliability and testing costs.
By exploring these directions, the study aims to refine and expand the applicability of the system dynamics model, making it a more effective tool for predicting software reliability and costs in diverse and data-constrained environments.
Supporting information
S4 Data. Failure data of Telecommunication system.
https://doi.org/10.1371/journal.pone.0323765.s004
(CSV)
Acknowledgments
This work was sponsored by the Guangdong Basic and Applied Basic Research Foundation and the Guangdong Soft Science Foundation, China [grant number 2024A0505050040].
References
- 1. Wang J, Zhang C, Yang J. Software reliability model of open source software based on the decreasing trend of fault introduction. PLoS One. 2022;17(5):e0267171. pmid:35500002
- 2. Yeh C-W, Fang C-C. Software testing and release decision at different statistical confidence levels with consideration of debuggers’ learning and negligent factors. Int J Ind Eng Comp. 2024;15(1):105–26.
- 3. Pradhan V, Patra A, Jain A, Jain G, Kumar A, Dhar J, et al. PERMMA: Enhancing parameter estimation of software reliability growth models: A comparative analysis of metaheuristic optimization algorithms. PLoS One. 2024;19(9):e0304055. pmid:39231125
- 4. Zhang X, Pham H. A software cost model with warranty cost, error removal times and risk costs. IIE Transactions. 1998;30(12):1135–42.
- 5. Pham H, Zhang X. NHPP software reliability and cost models with testing coverage. Eur J Oper Res. 2003;145(2):443–54.
- 6. Huang C-Y. Performance analysis of software reliability growth models with testing-effort and change-point. J Syst Softw. 2005;76(2):181–94.
- 7. Zhang X, Pham H. Software field failure rate prediction before software deployment. J Syst Softw. 2006;79(3):291–300.
- 8. Li Q, Pham H. A generalized software reliability growth model with consideration of the uncertainty of operating environments. IEEE Access. 2019;7:84253–67.
- 9. Li Q, Pham H. Modeling software fault-detection and fault-correction processes by considering the dependencies between fault amounts. Appl Sci. 2021;11(15):6998.
- 10. Li Q, Pham H. Software reliability modeling incorporating fault detection and fault correction processes with testing coverage and fault amount dependency. Mathematics. 2021;10(1):60.
- 11. Pradhan V, Dhar J, Kumar A. Testing-effort based NHPP software reliability growth model with change-point approach. J Inf Sci Eng. 2022;38:343–55.
- 12. Yamada S, Tokuno K, Osaki S. Imperfect debugging models with fault introduction rate for software reliability assessment. Int J Syst Sci. 1992;23(12):2241–52.
- 13. Tian Q, Fang C-C, Yeh C-W. Software release assessment under multiple alternatives with consideration of debuggers’ learning rate and imperfect debugging environment. Mathematics. 2022;10(10):1744.
- 14. Shyur H-J. A stochastic software reliability model with imperfect-debugging and change-point. J Syst Softw. 2003;66(2):135–41.
- 15. Li T, Si X, Yang Z, Pei H, Ma Y. NHPP testability growth model considering testability growth effort, rectifying delay, and imperfect correction. IEEE Access. 2020;8:9072–83.
- 16. Chiu KC, Huang YS, Huang IC. A study of software reliability growth with imperfect debugging for time-dependent potential errors. Int J Ind Eng Theory Appl Pract. 2019;26:376–93.
- 17. Chatterjee S, Saha D, Sharma A. Multi‐upgradation software reliability growth model with dependency of faults under change point and imperfect debugging. J Softw Evol Process. 2021;33(6).
- 18. Saraf I, Iqbal J. Generalized multi‐release modelling of software reliability growth models from the perspective of two types of imperfect debugging and change point. Qual Reliab Eng Int. 2019;35(7):2358–70.
- 19. Huang Y, Fang C, Chou C, Tseng T. A study on optimal release schedule for multiversion software. INFORMS J Comput. 2023.
- 20. Pradhan V, Dhar J, Kumar A. Software reliability models and multi-attribute utility function based strategic decision for release time optimization. In: Predictive analytics in system reliability. Cham: Springer International Publishing; 2022. p. 175–90.
- 21. Pradhan V, Kumar A, Dhar J. Emerging trends and future directions in software reliability growth modeling. In: Engineering reliability and risk assessment. Elsevier; 2023. p. 131–44. https://doi.org/10.1016/b978-0-323-91943-2.00011-3
- 22. Zada I, Rahman T, Khan I, Jameel A. Classification of software failure incidents using SVM. Sciencetech. 2021;2(3):01–13.
- 23. Zada I, Alatawi MN, Saqlain SM, Alshahrani A, Alshamran A, Imran K, Alfraihi H. Fine-tuning cyber security defenses: evaluating supervised machine learning classifiers for Windows malware detection. Comput Mater Contin. 2024;80(2).
- 24. Ilyas A, Alatawi MN, Hamid Y, Mahfooz S, Zada I, Gohar N, et al. Software architecture for pervasive critical health monitoring system using fog computing. J Cloud Comput (Heidelb). 2022;11(1):84. pmid:36465318
- 25. Forrester JW, Mass NJ, Ryan CJ. The system dynamics national model: Understanding socio-economic behavior and policy alternatives. Technol Forecast Soc Change. 1976;9(1–2):51–68.
- 26. Song J, He Z, Jiang L, Liu Z, Leng X. Synergy management of a complex industrial production system from the perspective of flow structure. Systems. 2023;11(9):453.
- 27. Abdelbari H, Shafi K. A system dynamics modeling support system based on computational intelligence. Systems. 2019;7(4):47.
- 28. Taha H, Smith C, Durham J, Reid S. Identification of a one health intervention for brucellosis in Jordan using system dynamics modelling. Systems. 2023;11(11):542.