Abstract
An entrepreneurial team (ET) plays an essential role in the business process by driving innovation and refining ideas through adaptability, collaboration, and resourcefulness. Team performance is continuously degraded by resource imbalance, poor communication, and inefficient task allocation. The importance of the ET to organizational growth motivates this analysis. This work therefore uses Multi-Agent Reinforcement Learning (MARL) to support efficient dynamic decision-making and coordination and thereby improve ET efficiency in dynamic and complex environments. The main aim is to improve resource utilization and communication efficiency and to optimize task allocation. Proximal Policy Optimization (PPO) is used to direct agents toward collaborative goals. In every state, each agent receives rewards and penalties for its actions, which helps meet the organization’s goal in minimum time and improves the overall task completion rate. The approach is evaluated using case studies in software development, optimized manufacturing, and logistics coordination, which validate the system’s adaptability across scenarios. In addition, several hypotheses are validated through the case studies using metrics such as defect resolution, collaboration quality, operational efficiency, resource optimization, and task completion rate. The work thus highlights the impact of MARL on ET performance in a dynamic environment.
Citation: Wang J, Jiang L (2026) Using reinforcement algorithms to improve the collaboration efficiency of entrepreneurial teams. PLoS One 21(3): e0343247. https://doi.org/10.1371/journal.pone.0343247
Editor: Ioana Gutu, Grigore T Popa University of Medicine and Pharmacy Iasi: Universitatea de Medicina si Farmacie Grigore T Popa lasi, ROMANIA
Received: January 23, 2025; Accepted: February 3, 2026; Published: March 11, 2026
Copyright: © 2026 Wang, Jiang. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All relevant data are within the manuscript.
Funding: This work was supported by The Annual Project of Philosophy and Social Sciences Planning of Henan Province in 2023 under Grant No. 2023CJJ192, awarded to JW. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
1. Introduction
Understanding the characteristics, significance, and dynamics of the ET helps an enterprise improve its performance and meet its organizational goals. An ET consists of individuals who share knowledge, responsibilities, rewards, and risks to develop and maintain a new venture [1–2]. Entrepreneurial team members are interdependent; they work toward a mutual goal, which influences the business’s performance and decision-making efficiency [3]. The success of entrepreneurial ventures is often attributed to their founding teams. Several studies [4] reveal that ventures led by teams are more viable and profitable and have more growth potential than those started by a single entrepreneur. Because diverse entrepreneurial skills, experiences, and views are needed to handle the challenges and risks of establishing a new firm, entrepreneurial teams are better equipped to provide them than individuals [5]. However, the ET faces several challenges, such as uncertainty, high autonomy, resource constraints, and a dynamic environment. ET processes and functions operate under conditions of high autonomy, which creates ambiguity [6] in distinguishing each member’s role and responsibilities. This ambiguity causes team conflicts when tasks are analyzed and allocated to team members. Most ETs are formed with limited resources, and the resulting strained relationships between members affect team performance [7,8]. Finally, the ET environment is dynamic: requirements and business objectives change frequently, so the team needs to adapt quickly. Several strategies address these challenges [9], such as selecting the right team members, creating a clear and shared vision, and defining exact roles and responsibilities. These strategies help clarify the team members’ skills and values, which in turn helps members learn the team objectives.
Based on this understanding, roles and responsibilities are assigned so that the team can make effective decisions in a dynamic environment [10]. Even so, these strategies suffer from conflicts, disagreement, and resource scarcity caused by poor communication, which affects overall task allocation and the completion rate. In addition, entrepreneurial ventures use rule-based [11] and static frameworks [12–13] to improve ET performance. These frameworks still have difficulty distinguishing between environmental conditions and team priorities.
The Research Questions in this study are:
- How might a Multi-Agent Reinforcement Learning (MARL) framework improve goal alignment, communication, and task allocation in entrepreneurial teams?
- How well can MARL-based solutions balance workload distribution, maximise resource productivity, and minimise operational downtime in entrepreneurial teams?
- How can MARL incorporate adaptive learning processes to help entrepreneurial teams adjust to dynamic priority shifts and make better decisions?
Therefore, this work uses MARL to improve adaptability and collaboration between members and to ensure effective interaction in the dynamic ET environment. The framework observes the environment and lets agents decide based on experience and observation so as to maximize the ET’s collaborative efficiency. MARL can both represent and enhance the collaboration efficiency among several agents. A paradigm is built that integrates Proximal Policy Optimization (PPO) with customized reward systems to optimize entrepreneurial teams’ performance with respect to workloads, conflicts, and productivity. The effective use of reward and penalty schemes in the ET process helps improve overall collaboration and reduce delays in understanding tasks in a dynamic environment. The process is developed in different agent environments, and the efficiency of MARL is validated using a set of hypotheses and case studies. The case studies cover the product lifecycle, manufacturing, and logistics coordination, and they validate the ET’s adaptability, task allocation rate, scalability, downtime duration, and operational efficiency. The contributions of this work are listed below.
- To analyze the ET dynamic environment using the MARL approach to enhance team collaboration by optimizing goal alignment, communication and task allocation.
- To maximize resource utilization by balancing the workload, resource productivity and minimizing the downtime according to the ET environment.
- To enable the ET to adapt to the priority change by incorporating adaptive learning to improve decision-making efficiency.
This manuscript is organized as follows: Section 2 reviews comparable studies of ET from different researchers’ perspectives. Section 3 describes the working process of the MARL-based ET collaboration efficiency analysis. Section 4 explains the hypotheses and case studies used to validate the MARL system, and Section 5 presents the conclusion.
2. Comparable research analysis
Krawczyk-Bryłka B. et al. [14] analyze the importance of collaboration principles in entrepreneurial teams for improving member satisfaction. The ETICP tool is used to identify the differences between established and nascent teams according to their relationships, which are directly linked with venture performance. The authors use a set of questionnaires with 6-item scale values, and nine collaborative principles are used to evaluate entrepreneurial team members’ satisfaction. The study does well on entrepreneurial team building; however, it does not analyze Poland’s leading research and education programs for entrepreneurs. Han et al. [15] proposed selecting entrepreneurial team members using a novel approach that combines a statistical LSTM chain network with social network analysis (SLSTM-CAN). The work aims to choose high-potential members for the team to improve the start-up success rate. The selection procedure explores the social network structure, collaborative capacities, and connections. This information is processed by the introduced classifier, which lets a forecasting system identify entrepreneurial members according to dynamics and trends. The classifier-based member selection improves business growth by up to 25% and adaptability by up to 30%.
Chughtai M. S. et al. [16] introduced adaptive learning to improve organizational innovation by understanding leadership roles. The work is implemented using SPSS, Smart-PLS, and AMOS software, and simple random sampling is used for data collection. The gathered data are explored with an interaction-effect analysis that validates the correlation and reliability of the team. In addition, leadership ability, role, and self-efficacy are evaluated to improve innovation in the organization. The study therefore focuses on leaders’ self-efficacy as a way to enhance organizational innovation. Bouncken R. et al. [17] analyzed how entrepreneurship in coworking spaces can be fostered to strengthen the sharing and digital economy. The study provides guidelines for understanding the dynamic conditions of entrepreneurial coworking spaces so that collaborative knowledge and shared resources can support business growth. It enables interaction between team members to manage flexibility and adaptability while driving innovation. The system thus provides a platform for improving the economic landscape and growth and offers effective collaboration opportunities.
Covin, J. G. et al. [18] investigate the constructs of Team Entrepreneurial Orientation (TEO) and Individual Entrepreneurial Orientation (IEO), presenting a theory of how both orientations are likely to affect performance at the organizational level. The authors constructed and justified the IEO measurement scale and emphasized its essential components: risk, proactiveness, and innovativeness. They suggest that TEO is formed when team members have different goals but create together, and their goals are the same, facilitating effective team processes and performance. The study uses qualitative comparative analysis to identify successful configurations in teams, emphasizing both the individual and the team entrepreneurs. Donbesuur F. et al. [19] examine the relationship between entrepreneurial orientation and new venture performance by identifying the relevant actions and contingent roles. The orientation observes the dynamic environment based on actions, business networking, opportunity discovery, and support systems. The observed environmental details help to provide robust collaboration, improving the overall performance of new ventures. Sutrisno, S., et al. [20] considered the contribution of information technologies to introducing new ideas and growing business structures in newly founded firms. The research shows that entrepreneurs need to make good use of such techniques to improve product creation, better organize processes, and reach more markets. Using qualitative analysis and secondary data collection, the authors show that information technology accelerates business innovation and improves customer experience, making it essential for corporate growth. The results indicate that entrepreneurs need to embrace technology if they are to stay competitive in the business environment. On the other hand, Du et al. [21] propose a MARL framework that is both scalable and safety-constrained for complex multi-agent systems.
Enhanced coordination in the face of uncertainty can be achieved through the integration of robust policy learning and adaptive risk management. When it comes to dynamic situations, such as entrepreneurial teams, where agent collaboration and safety are essential for decision-making, the study is extremely significant. In their study [22], Jeloka and colleagues create a MARL technique that makes use of mean-field interactions to replicate the behaviour of large-scale, competitive teams. Providing significant parallels to the optimisation of collaborative tactics within big entrepreneurial teams that are navigating dynamic and competitive business situations, the research clearly demonstrates how PPO can scale to a large number of agents while retaining learning stability. A unique form of PPO called PPO-ACT is presented by Yang et al. [23]. This variant makes use of adversarial curriculum transfer in order to increase cooperation in spatial public goods games. Learning is dynamically adapted through this approach, which makes it suited for contexts that involve complicated and ever-changing teams. The lessons it teaches are valuable to entrepreneurial teams who are looking to improve their agility and their capacity to efficiently coordinate tasks. From the various researchers’ opinions, the information technology framework, data analysis, and innovation exploration process improve ET efficiency. However, the ET faces difficulties because of improper communication and resource allocation, which affects the entire task allocation and completion rates. These difficulties completely affect the efficiency of ET collaboration. Therefore, the research difficulties are addressed by including reinforcement learning to improve the ET collaboration efficiency.
Table 1 summarizes recent research on AI-based collaborative systems and reinforcement learning in entrepreneurial or multi-agent contexts.
3. Context of Reinforcement Learning (RL) to team collaboration in entrepreneurial team (ET)
3.1 Impact of RL framework
The main objective of this framework is to enhance collaboration and team function in entrepreneurial teams. The work uses Reinforcement Learning (RL) to optimize decision-making, communication, conflict resolution, and task allocation in a dynamic environment. Manual task allocation in an ET overburdens some members and is inefficient, which reduces overall ET efficiency. The RL algorithm addresses task allocation by treating every member as an agent that takes actions. Each agent’s workload, task priorities, skills, and balancing criteria are used to choose the team member to whom a specific task is assigned. Improper communication among team members creates misunderstandings that reduce decision-making efficiency. The RL approach observes the interaction patterns, which helps identify the essential exchanges and eliminate irrelevant interactions, improving ET clarity and decision-making efficiency. The ET also faces conflicts because of resource constraints; RL learns from previous conflict patterns and proposes resolutions that reduce delay and improve satisfaction. Because the ET operates in a dynamic and uncertain environment, collaboration can fail. The incorporated RL techniques provide effective, scalable, and reliable strategies for improving overall collaboration during decision-making. During the analysis, RL observes the agents’ actions that enhance ET cohesion and productivity. The objective of this work is defined in Equation (5), which expresses the collaborative efficiency. The objective is subject to certain constraints, which are described as follows.
Unified constraints.
As discussed, RL is applied to several parts of the ET process, such as task allocation and decision-making. For a particular task, a team member is assigned depending on the task priority, the member’s skill set, and the member’s workload. A reward is then allocated to the member depending on performance. The task allocation is defined with the specific constraints given in Equation (1).
Once the task is assigned, efficiency is improved through proper communication, which is characterized by the relevant communication, the time spent on it, the task urgency, and the quality feedback. The communication efficiency is then measured under the specific constraint defined in Equation (2).
During this process, task conflicts occur; they are minimized with the help of the resolution priority, the resource allocation, and the dissatisfaction score. The conflicts are therefore optimized under the constraint defined in Equation (3).
After the conflicts are reduced, the environment should be observed and the ET strategies adjusted to minimize the loss function. The learning process for a task is then defined with the constraint in Equation (4).
Together, these constraints and descriptions are used to improve the productivity, the ET goal score, and the cohesion score, from which the ET productivity is defined. Finally, the overall objective of this work is defined in Equation (5) based on the above constraints.
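The allocation idea described above (assign each task by priority, skill set, and workload) can be illustrated with a minimal sketch. The scoring rule, the weights `w_skill` and `w_load`, the capacity limit, and all class and function names are assumptions made for illustration; they are not the constraints of Equation (1).

```python
from dataclasses import dataclass


@dataclass
class Member:
    name: str
    skills: set[str]
    workload: float  # fraction of capacity in use, 0.0 to 1.0


@dataclass
class Task:
    name: str
    required_skills: set[str]
    priority: float  # 0.0 (low) to 1.0 (high)
    effort: float    # capacity the task consumes


def allocation_score(member, task, w_skill=0.6, w_load=0.4):
    """Hypothetical score: skill coverage is rewarded, current workload is penalized."""
    coverage = len(member.skills & task.required_skills) / max(len(task.required_skills), 1)
    return w_skill * coverage - w_load * member.workload


def allocate(tasks, members, capacity=1.0):
    """Assign tasks in descending priority to the best-scoring member with spare capacity."""
    assignment = {}
    for task in sorted(tasks, key=lambda t: t.priority, reverse=True):
        feasible = [m for m in members if m.workload + task.effort <= capacity]
        if not feasible:
            continue  # the task stays unassigned when nobody has capacity
        best = max(feasible, key=lambda m: allocation_score(m, task))
        best.workload += task.effort
        assignment[task.name] = best.name
    return assignment
```

In this sketch a task that no member can absorb is simply left unassigned; a learned policy would instead trade such cases off through the reward.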
3.2 Research methodology
The entrepreneurial team uses a Multi-Agent Reinforcement Learning (MARL) framework to support ET collaboration and improve team efficiency in a dynamic environment. Initially, the environment is designed around several components, such as conflict resolution, communication dynamics, and task allocation. These components cover task priorities, cohesion, productivity, and workload, and the corresponding variables change depending on external events and agent actions. In the ET design, each team member is considered an agent with specific skills and decision-making capabilities, which helps improve the team’s performance. Every agent takes part in the ET components to meet the team’s goals and objectives. Rewards are allocated according to the agents’ actions, and penalties are applied for undesirable outcomes such as imbalanced workload distribution and communication delay. This feedback improves the agents’ actions so that the team goal is reached effectively. The process runs as a continuous loop: in every iteration, the agents’ actions are selected according to their policies, and the ET environment observes these actions and provides feedback in the form of penalties or rewards. The policies are updated frequently using the proximal policy optimization (PPO) algorithm, through which the agents continuously revise their decisions. PPO is an efficient and stable approach that optimizes the agents’ performance. The process is repeated, and its continuous output enhances team performance, which is measured in terms of communication relevance and conflicts, and improves the team’s productivity. The structure of MARL is illustrated in Fig 1.
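The observe, act, and feedback loop described above can be sketched in a few lines. This is a toy illustration under stated assumptions: `ToyTeamEnv`, `Agent`, the three action names, and the +1/-1 feedback are hypothetical, and a greedy running-average update stands in for the PPO policy update used in the paper.

```python
ACTIONS = ("allocate_task", "communicate", "resolve_conflict")


class ToyTeamEnv:
    """Hypothetical environment: each agent has one action the team currently needs."""

    def __init__(self, needed):
        self.needed = needed  # needed[i] is the index into ACTIONS for agent i

    def step(self, actions):
        # Reward of +1 for the needed action, penalty of -1 otherwise.
        return [1.0 if a == self.needed[i] else -1.0 for i, a in enumerate(actions)]


class Agent:
    def __init__(self, lr=0.5):
        self.values = [0.0] * len(ACTIONS)  # estimated value of each action
        self.lr = lr

    def act(self):
        # Greedy choice; ties resolve to the lowest action index.
        return max(range(len(ACTIONS)), key=lambda a: self.values[a])

    def update(self, action, reward):
        # Running-average update, a simple stand-in for the PPO policy update.
        self.values[action] += self.lr * (reward - self.values[action])


def run(env, agents, episodes=10):
    """Repeat the loop: agents act, the environment scores them, agents update."""
    history = []
    for _ in range(episodes):
        actions = [agent.act() for agent in agents]
        rewards = env.step(actions)
        for agent, action, reward in zip(agents, actions, rewards):
            agent.update(action, reward)
        history.append(sum(rewards))
    return history
```

With three agents that need actions 0, 1, and 2 respectively, the team reward in this sketch rises as each agent abandons penalized actions, which mirrors the qualitative behaviour the text describes.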
Fig 1 illustrates the framework of MARL-based collaboration efficiency in the ET environment. The ET environment has a few design components, such as communication flow, task allocation, and goal setting, which initiate the interactions needed for a particular task to meet the business objectives. The ET state is then defined by observing the tasks with respect to their skill requirements and priorities, the workload of each agent, the communication between the agents, the resource allocation conflicts, the team goal score, the productivity score, the cohesion score, and the dissatisfaction score. From these parameters, the state on which each agent acts is defined using Equation (6).
After the state is defined, the agent that performs a particular task has to be defined. Each agent is characterized by its skill set and its decision policies. The agent aims to enhance the collaboration efficiency through task allocation, communication, and conflict resolution. Rewards are allocated to the agent based on effective task completion, communication, conflict resolution, and team workload balance. The reward is then estimated using Equation (7).
In Equation (7), the reward is estimated from the successful task assignments, the relevant communication, the unresolved conflict penalty, and the team productivity-cohesion term. After the reward is provided to the agents, the policies need to be updated so that the agents learn policies that improve the rewards while attaining efficiency and stability. The policies are updated with proximal policy optimization (PPO), which helps manage ET stability, efficiency, and scalability. The PPO-based policy update improves the environment interactions and the ability to handle a complex environment. PPO updates the agents’ policies by balancing exploitation and exploration while keeping the updated policy close to the previous policy. The main objective of the PPO-based policy update is defined in Equation (8).
The PPO objective is built from several key components: the surrogate objective, the value function, and the entropy bonus. The entropy bonus is used to improve exploration when the policy is evaluated in a specific state. In addition, PPO trains the value function to predict the return from the cumulative reward. As discussed, the policies are updated depending on the previous policy; the probability ratio between the updated and the previous policy is therefore estimated. During the computation, a clipping factor is used to maintain stability, and weight values are applied to the individual terms when the PPO objective is computed. The overall structure of the PPO-based policy update process is shown in Fig 2.
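The components named above (clipped surrogate, value-function term, entropy bonus, probability ratio, clipping factor, and weights) match the standard PPO objective, which can be sketched as follows. This is the textbook form written as a loss to be minimized, not a transcription of Equation (8); the default values `clip_eps=0.2`, `value_coef=0.5`, and `entropy_coef=0.01` are common choices assumed here, since the paper's values are not stated in this section.

```python
import math


def ppo_loss(new_logp, old_logp, advantages, values, returns, entropies,
             clip_eps=0.2, value_coef=0.5, entropy_coef=0.01):
    """Standard PPO loss over a batch of samples (lists of floats of equal length)."""
    n = len(advantages)
    policy_term = 0.0
    value_term = 0.0
    for lp_new, lp_old, adv, value, ret in zip(new_logp, old_logp, advantages, values, returns):
        # Probability ratio between the updated and the previous policy.
        ratio = math.exp(lp_new - lp_old)
        # The clipping factor keeps the update close to the previous policy.
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        policy_term += min(ratio * adv, clipped * adv)
        # Squared error between predicted value and observed return.
        value_term += (value - ret) ** 2
    surrogate = policy_term / n
    value_loss = value_term / n
    entropy = sum(entropies) / n
    # Maximizing surrogate and entropy equals minimizing their negatives.
    return -surrogate + value_coef * value_loss - entropy_coef * entropy
```

For a positive advantage, a ratio of 2 is clipped to 1.2, so the policy gains nothing from moving further; for a negative advantage the unclipped, more pessimistic term is kept.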
Fig 2 illustrates the PPO structure used in the ET to improve collaboration between team members during task completion. The input layer receives the state, and the shared neural network extracts the features used by the value and policy functions. The main purpose of the network is to obtain features such as task progress, communication dynamics, and workload distribution for particular tasks. During the analysis, convolution layers are used to derive high-level features from the input. The extracted features reduce overhead and redundancy when the ET environment is analyzed. The derived features and patterns are then used to obtain the policy and value outputs from the policy and value function networks. The input layer receives the state information, which contains continuous and discrete values that need to be normalized for further computation. The normalized input is fed into the hidden layer, which applies its weights and biases to compute the output value. The hidden layer computation is defined in Equation (9).
In Equation (9), the first hidden layer output is computed by applying an activation function. Similarly, the k hidden layers compute the outputs from which the information for the policy and value functions is derived. The hidden layers are selected depending on the state space and are used to identify the relationships between the features. The resulting output is a compressed representation of the input. This representation is passed to the policy and value functions to improve the overall ET performance. The policy network estimates the action probabilities with the SoftMax activation function, which is calculated using Equation (10), and the value network estimates the state value with a linear layer.
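The shared-network layout described above, with a SoftMax policy head and a linear value head on top of common features, can be sketched as a forward pass. For brevity the sketch assumes a single dense shared layer with a ReLU activation in place of the convolution layers and the unnamed activation function of Equation (9); the parameter dictionary keys are hypothetical.

```python
import math


def linear(x, weights, bias):
    """Dense layer: one output per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x):
    # Assumed activation; the source does not name the one it uses.
    return [max(0.0, v) for v in x]


def softmax(logits):
    """Numerically stable SoftMax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def forward(state, params):
    """Return (action probabilities, state value) from one shared feature layer."""
    features = relu(linear(state, params["w_shared"], params["b_shared"]))
    probs = softmax(linear(features, params["w_policy"], params["b_policy"]))
    value = linear(features, params["w_value"], params["b_value"])[0]
    return probs, value
```

Because both heads read the same `features`, the feature extractor is computed once per state, which is the weight-sharing benefit the text attributes to the combined network.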
After the policy and value outputs are computed, the network gradient is estimated from the loss derived from Equation (8). The policy loss is computed from an advantage estimate and the probability ratio. In addition, a value loss is computed for the value function; it measures the deviation between the actual and the predicted return. During the loss computation, exploration is encouraged through the entropy term, and the total loss combines the policy loss, the value loss, and the entropy term. Based on the computed loss, the environment interactions are improved. The policies are then updated frequently to improve the overall ET environment. The combined neural network is a single model that embeds both the policy and the value functions. It provides efficient computation, a uniform representation, and quick convergence. By sharing weights, the network avoids unnecessary computation while maintaining specificity and generality when learning different tasks. The overall working process of MARL is described in Pseudocode 1 below:
Pseudocode for MARL
Pseudocode 1 illustrates the MARL-based procedure for improving ET performance. The algorithm starts from the initial settings. The selected policies are updated using PPO, which balances exploitation and exploration while improving ET performance. Using agents, rewards, and action-based task analysis effectively improves task allocation quality, communication relevance, conflict resolution, productivity, and team cohesion. The overall interaction of every member in the ET is illustrated in Fig 3.
Fig 3 shows that the reward mechanism plays an essential role in improving the efficiency of ET collaboration. The reward process is used to steer agent behaviour toward good task allocation and workload balance. The reward mechanism covers immediate rewards and task delay rewards, which helps analyze the ET performance. The reward is computed at each time step t from the unresolved conflicts, the weights, the task completion score, and the workload. The reward scenario is analyzed using a 5-agent ET setting with dynamic tasks, communication delays, and skill sets. Because of the task complexities, 100 simulation cycles are used in the analysis. In this scenario, the reward efficiency is explored over an initial phase (0–20 episodes), a learning phase (20–60 episodes), and a stabilization phase (above 60 episodes). In the initial phase, the agents prioritize rewards according to the team dynamics. In the learning phase, the agents adapt to the reward process, and tasks are allocated so as to minimize conflicts. Finally, in the stabilization phase, the optimal tasks are allocated to the members to manage the collaboration. The reward values obtained in the analysis are shown in Fig 4.
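A minimal sketch of such a reward follows, assuming a weighted sum in which the task completion score is rewarded while workload imbalance and unresolved conflicts are penalized. The functional form, the use of the population standard deviation as the imbalance measure, and the weights `w_task`, `w_balance`, and `w_conflict` are assumptions; the paper's exact reward formula is not reproduced here. The `phase` helper encodes the three phases from the text, with the shared boundaries at episodes 20 and 60 assigned to the learning phase by assumption.

```python
from statistics import pstdev


def team_reward(task_completion, workloads, unresolved_conflicts,
                w_task=1.0, w_balance=0.5, w_conflict=0.5):
    """Hypothetical reward at one time step for the whole team."""
    # Workload imbalance measured as the spread of per-agent workloads.
    imbalance = pstdev(workloads)
    return (w_task * task_completion
            - w_balance * imbalance
            - w_conflict * unresolved_conflicts)


def phase(episode):
    """Map an episode index to the phase names used in the text."""
    if episode < 20:
        return "initial"
    if episode <= 60:  # boundary episodes 20 and 60 counted as learning (assumption)
        return "learning"
    return "stabilization"
```

A perfectly balanced team with no open conflicts receives the full task completion score in this sketch, so reward growth over episodes reflects both faster completion and better balance.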
Fig 4 shows that the reward approach is used effectively to improve ET performance and collaboration efficiency. The gradual increase of the reward indicates that the reinforcement learning approach helps meet the team goal with minimal computational difficulty. In addition, the rewards are provided depending on task completion and workload balance, which directly indicates team improvement and stabilization in a dynamic environment. The efficiency of the approach is further evaluated in terms of resource utilization, workload balance, collaboration quality, adaptability, conflict resolution, and overall productivity. The efficiency of the MARL approach on ET performance is evaluated for these metrics, and the respective results are shown in Fig 5.
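Fig 5 panels: (a) resource utilization analysis, (b) workload balance analysis, (c) collaboration quality analysis, (d) adaptability analysis, (e) conflict resolution analysis, (f) overall productivity analysis.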
Fig 5 illustrates that the MARL-based optimized reward mechanism improves ET collaboration efficiency in a dynamic environment. The quality of the system is evaluated using the metrics listed above. In MARL, the agents adjust their policies to reach the goal while balancing exploitation and exploration. During this process, PPO learning minimizes conflicts and improves overall decision-making in task allocation. The iterative procedure reduces the computational difficulty and improves overall productivity in a dynamic ET environment. The reward mechanism shows MARL’s efficiency in allocating tasks to members according to their skills and priorities. The interaction between the team members and agents improves overall collaboration in the ET environment. Frequently updating the policies and rules helps maximize overall team efficiency. The collaboration efficiency is then explored in terms of task completion time, communication efficiency, and team cohesion score. The obtained collaboration efficiency is shown in Fig 6.
Fig 6. Collaboration efficiency analysis, panels (a)–(d).
Fig 6 shows the collaboration efficiency analysis of the MARL framework in the ET environment. The collaboration efficiency is first assessed using the task completion time shown in Fig 6a, which is evaluated in three different phases. The task completion time is higher in the initialization phase because of suboptimal agent coordination. Once the agents have observed and learned the task features, allocation is performed effectively, and stability is maintained throughout task completion. Effective interaction and communication reduce the completion time. The communication efficiency is likewise lowest at the beginning of the interaction because of unclear coordination and redundant interactions. Better understanding improves the overall communication efficiency, which indicates that the system ensures effective collaboration in the ET process. Similarly, the values shown in Fig 6c and 6d are lowest in the initial stage; once the agents understand the overall complexity of task allocation, collaboration efficiency improves. These findings illustrate how MARL increases cooperation by enhancing the division of work among agents, communication, conflict resolution, and team unity, which improves performance and productivity over time. The steady growth observed in all metrics reflects the system’s flexibility and demonstrates the opportunities MARL provides to enhance team dynamics. The efficiency of this MARL study is then evaluated using the respective case study analysis.
4. Research case analysis
This section examines how well the Multi-Agent Reinforcement Learning (MARL) framework analyses entrepreneurial team (ET) collaboration. Before assigning jobs, the analysis evaluates each agent’s behaviours, skills, and priorities. State, action, and reward reinforcement learning improves cooperation efficiency and business results iteratively. Python-based reinforcement learning programmes generated simulated datasets for the case studies, ensuring controlled and reproducible framework performance evaluation. No real-world or private data was used, hence no external data access or written agreements were needed. This method meets the journal’s ethical criteria because it doesn’t involve humans or confidential data. The hypotheses are explained in Table 2(a) and Table 2(b).
The data are taken from the Academic and Entrepreneurial Development Dataset [30]. This dataset records undergraduates’ academic, behavioral, and entrepreneurial traits in real time. It includes 214,354 records with more than 40 variables drawn from a variety of sources, including demographics, academic achievement, extracurricular activities, personality traits, and psychological characteristics. Created with educational and entrepreneurial research in mind, it yields useful information about talent prediction, skill development, and career outcomes. The analysis was conducted using Python 3.11, with the TensorFlow and OpenAI Gym frameworks used to implement the Multi-Agent Reinforcement Learning (MARL) model. Each agent represented an entrepreneurial team member, and the learning environment was designed to optimize communication, coordination, and decision-making outcomes. The MARL method was chosen for its ability to model the interdependent, adaptive behaviors that characterize real-world entrepreneurial collaboration. Key hyperparameters were tuned through sensitivity analysis, including a learning rate of 0.001, a discount factor of 0.95, and an exploration rate decaying from 0.9 to 0.1 across 500 episodes. A reward convergence threshold of 10⁻³ was applied to ensure training stability. The experiments were repeated five times, and the averaged results were used to validate model reliability and consistency.
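The reported hyperparameters can be collected in a small configuration object, as in the sketch below. The numeric values are those stated above; the linear shape of the exploration decay and the window-based convergence check are assumptions, since the text does not specify how the decay is scheduled or how the 10⁻³ threshold is applied. All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingConfig:
    learning_rate: float = 0.001
    discount_factor: float = 0.95
    epsilon_start: float = 0.9
    epsilon_end: float = 0.1
    episodes: int = 500
    convergence_threshold: float = 1e-3
    repeats: int = 5


def epsilon_at(episode, cfg):
    """Exploration rate at a given episode, assuming a linear decay schedule."""
    frac = min(max(episode / (cfg.episodes - 1), 0.0), 1.0)
    return cfg.epsilon_start + frac * (cfg.epsilon_end - cfg.epsilon_start)


def has_converged(rewards, cfg, window=10):
    """Assumed check: mean reward of the last two windows differs by less than the threshold."""
    if len(rewards) < 2 * window:
        return False
    recent = sum(rewards[-window:]) / window
    previous = sum(rewards[-2 * window:-window]) / window
    return abs(recent - previous) < cfg.convergence_threshold
```

Averaging over the five repeated runs would then amount to calling the training routine `cfg.repeats` times with different seeds and taking the mean of the resulting metrics.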
Table 2(a) presents a structured mapping between each research hypothesis (H1–H10) and the supporting scholarly references cited in the literature review. The studies of Han [15], Du [21], Lv [25], and Zhou [27] support H1 on task completion efficiency, while Krawczyk-Bryłka [14], Covin [18], Donbesuur [19], and Yang [23] support H2 on communication efficiency. References on adaptive leadership, entrepreneurial team dynamics, DRL, MARL, incentive mechanisms, scalability, and long-term collaboration support hypotheses H3–H10. The table highlights how each hypothesis is grounded in previous research findings, ensuring theoretical relevance and academic rigor in the development of the MARL-based entrepreneurial team collaboration framework.
4.1 Case study discussions
Hypotheses H1 to H10 examine how effectively the MARL framework improves entrepreneurial team (ET) collaboration efficiency. Three case studies are used to investigate the efficiency of the MARL system, as described below.
Research case 1: Collaborative product development.
A complete product lifecycle depends on the work of the designer, the engineer, and the marketer: the product must be designed around specific prototypes, and the marketing must be created to improve overall efficiency. The entire project team works in a dynamic environment to meet customer requirements. This scenario addresses task completion efficiency (H1), workload balance (H4), and team cohesion (H5). The case study is evaluated using the task completion time at every stage, the workload distribution among the team members, and the cohesion achieved according to goal alignment. For this case, the agents are created for specific tasks, with the constraints defined in Table 3.
According to the above tasks, constraints and actions, the reward is computed within the MARL framework to increase task completion, workload balance and team cohesion. First, a task-completion reward is allocated to every team member, because it encourages agents to complete their tasks quickly. Each agent follows its strategies to complete the assigned task within a given timeline, and the system applies a penalty for every delay, weighted by a scaling factor. The computed reward is given to each agent depending on its own performance and the team’s cumulative performance, so the agent with the fastest completion receives the highest value. The agents act independently at the initial stage, which leads to a lower reward because of poor communication; once the framework is stable, the agents receive a high reward. Second, a balance reward is given to the agents depending on the workload distribution, because members must balance their work in a dynamic environment. If any team member has an imbalanced workload, the performance of the entire team is affected. For example, when one agent handles more than 50% of the tasks at the initial stage, the balance reward drops to 0.7, whereas in the stabilization phase it increases to 0.97. Third, a cohesion reward motivates cooperation among the agents so that they achieve their goal in minimum time. The cohesion reward is obtained from communication quality, task synchronization and mutual support, so higher cohesion improves team performance. As with the other components, the agents obtain a minimum cohesion value at the initial stage, and in the stabilization phase the value increases up to 0.95. The combined reward is then defined in Equation (11). In Equation (11), the weight on the task-completion reward is higher at the initial stage to prioritize task completion, and the weights on the remaining terms are increased gradually to boost cohesion and balance. Based on this discussion, the reward analysis for Case 1 is illustrated in Table 4.
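The stage-dependent weighting described for Equation (11) can be illustrated with a short sketch. It assumes the combined reward is a weighted sum of the task-completion, balance and cohesion rewards; the exact form of Equation (11) is not reproduced here. The weight values (0.6 falling to 0.4), the linear schedule, and the names `stage_weights` and `combined_reward` are hypothetical choices made only to show the idea.

```python
def stage_weights(episode: int, total_episodes: int = 500) -> tuple[float, float, float]:
    """Return (task, balance, cohesion) weights that sum to 1.

    Assumed schedule: the task weight falls linearly from 0.6 to 0.4 over
    training, and the balance and cohesion weights share the remainder equally.
    """
    progress = min(max(episode / max(total_episodes - 1, 1), 0.0), 1.0)
    w_task = 0.6 - 0.2 * progress
    w_other = (1.0 - w_task) / 2.0
    return w_task, w_other, w_other


def combined_reward(
    r_task: float,
    r_balance: float,
    r_cohesion: float,
    episode: int,
    total_episodes: int = 500,
) -> float:
    """Weighted sum of the three reward components (assumed form)."""
    w_task, w_balance, w_cohesion = stage_weights(episode, total_episodes)
    return w_task * r_task + w_balance * r_balance + w_cohesion * r_cohesion
```

Early in training the task term dominates, so an agent that finishes quickly is rewarded even when balance and cohesion are still low. Later, the same component values are scored with more emphasis on balance and cohesion, for example when evaluating the stabilization-phase values of 0.97 and 0.95 reported above.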
Table 4 shows that, for case 1, the product development process improves its performance over training, which means that MARL effectively identifies suitable rewards for every agent, ensuring stable policies and continuous collaboration. Case 1 is then evaluated using the task completion, workload balance and cohesion metrics over three stages at different episodes, and the results are shown in Table 5.
Table 5 presents the efficiency analysis of case 1 using three metrics: task completion, workload balance and cohesion. The analysis shows that task completion increased gradually across the three stages because of effective coordination, which validates hypothesis H1. The workload was evenly distributed across the three phases, which supports hypothesis H4, and the cohesion reward helps the agents learn strategies that meet the goal and enhance cohesion, which satisfies hypothesis H5. This case study therefore shows that the MARL approach effectively balances the agents’ roles, optimizes collaboration and improves ET performance so that the objectives are fulfilled successfully.
Research case 2: Start-up crisis management.
Every start-up team faces crises caused by rapid task reallocation, resource shortages and conflicts between members with high responsibilities. This case study addresses the conflict resolution (H3), communication optimization (H2) and adaptability (H6) hypotheses. These hypotheses are assessed with metrics such as conflict resolution time, communication efficiency and an adaptability score. The case study uses three agents: a developer, a tester and a product manager. Their objective is to improve start-up productivity by reducing conflict-related delays and aligning with the ET goal. During every task allocation and process step, a reward is used to steer effective software development. First, a bonus is given for every task completed by a team member, to motivate completion within the given time. This time bonus is defined from the expected and actual completion times of the task, together with a time weight. Second, a reward is allocated to the agent for successfully resolving bugs. The bug-resolving characteristics help to capture the agent’s skill for a particular software development task. In addition, a collaboration score is awarded to agents that successfully prioritize tasks and communication. Finally, a penalty is given to agents for negative actions or for remaining idle. The total reward is then estimated using Equation (12).
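The four components that make up the case 2 reward (time bonus, bug-resolution reward, collaboration score and penalty) can be sketched as a simple additive function. This is an illustration under stated assumptions rather than Equation (12) itself: the time bonus is assumed to be a weight times the difference between expected and actual completion time, and the coefficient values and the names `StepOutcome` and `case2_reward` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class StepOutcome:
    expected_time: float        # planned completion time for the task
    actual_time: float          # observed completion time for the task
    bugs_resolved: int          # number of bugs the agent resolved
    collaboration_score: float  # assumed to lie in [0, 1]
    idle: bool                  # True if the agent stayed idle this step


def case2_reward(
    outcome: StepOutcome,
    time_weight: float = 1.0,   # assumption: weight on the time bonus
    bug_reward: float = 1.0,    # assumption: reward per resolved bug
    idle_penalty: float = 0.5,  # assumption: penalty for an idle step
) -> float:
    """Additive reward: time bonus + bug reward + collaboration - penalty."""
    # Positive when the task finishes early, negative when it is late.
    time_bonus = time_weight * (outcome.expected_time - outcome.actual_time)
    bug_term = bug_reward * outcome.bugs_resolved
    penalty = idle_penalty if outcome.idle else 0.0
    return time_bonus + bug_term + outcome.collaboration_score - penalty
```

For instance, an agent that finishes two time units early, resolves two bugs and has a collaboration score of 0.5 would receive `case2_reward(StepOutcome(10, 8, 2, 0.5, False))`, which evaluates to 4.5 under the default coefficients.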
These rewards are obtained from the MARL process, in which the system takes the backlog size, task progress, communication frequency, and bug severity as inputs. The inputs are processed to derive features that are fed into the policy and value networks, which produce the optimized actions and their respective values. The available actions include goal reprioritization, bug escalation, and task allocation. According to these actions, the PPO algorithm updates the policies to improve the overall collaboration goal. Based on this discussion, the values obtained for case 2 are shown in Table 6.
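The policy update performed by PPO can be illustrated with its standard clipped surrogate objective. The sketch below shows the general technique in plain Python, not the authors' TensorFlow code. The clipping range of 0.2 is a commonly used default and an assumption here, since the text does not report it, and the function name `ppo_clipped_objective` is hypothetical.

```python
import math


def ppo_clipped_objective(
    new_log_probs: list[float],
    old_log_probs: list[float],
    advantages: list[float],
    clip_eps: float = 0.2,  # assumption: not reported in the text
) -> float:
    """Mean clipped surrogate objective over a batch of samples.

    For each sample, the probability ratio between the new and old policy
    is clipped to [1 - clip_eps, 1 + clip_eps], and the smaller of the
    clipped and unclipped terms is kept. PPO maximizes this quantity.
    """
    terms = []
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        ratio = math.exp(new_lp - old_lp)
        clipped_ratio = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        terms.append(min(ratio * adv, clipped_ratio * adv))
    return sum(terms) / len(terms)
```

The clipping keeps each update close to the previous policy, which is one reason PPO is often preferred when several agents are learning at the same time and each agent's environment is changing.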
Table 6 presents the efficiency analysis of case 2 in terms of task completion time, bug resolution, idle time percentage and collaboration quality. The task completion time is reduced over training, which indicates that the agents manage task allocation and prioritization more quickly once they properly understand the task. The bug resolution rate increases by 45% to reach 95%, indicating that the agents understand the bug types and prioritize them by importance, which improves debugging time. The collaboration quality increases from 0.43 to 0.97, showing that MARL improves teamwork, communication and understanding of the ET goals. Finally, the idle time is reduced from 34% to 3%, which shows that idle agents are engaged in auxiliary roles, improving resource utilization. The mapping of the hypotheses to these metrics is shown in Table 7.
Case Study 2 applies the MARL framework within an agile software development team to address task reallocation, resource shortages and conflict resolution. The results suggest significant improvements across all metrics: agents learned to coordinate effectively, set shared objectives, and reduce idle time. The findings support hypotheses H2 (communication optimization), H3 (conflict resolution) and H6 (adaptability).
Research case 3: Strengthening team for crowdfunding success.
The entrepreneurial side of the team continues to expand as it runs a crowdfunding activity, growing from three to six agents with different roles: content creators, outreach managers and financial analysts. This case study addresses the scalability (H7), reward optimization (H8) and long-term stabilization efficiency (H10) hypotheses. These hypotheses are evaluated using metrics such as collaboration efficiency, total reward efficiency and long-term stabilization. The case study uses three agents: a product manager, a quality analyst and a logistics coordinator. For every function, the reward is tailored to improve overall manufacturing. The total reward estimation for case study 3 is shown in Equation (13).
First, the rewards related to production throughput are computed. Next, the defects are examined, and when an agent correctly identifies them, a defect reduction reward is given. Logistics efficiency is also evaluated, and the respective rewards are provided. Collaboration incentives are given according to the agents’ performance, and a penalty is applied for downtime. Based on this discussion, the values obtained for case 3 are shown in Table 8.
Table 8 presents the MARL efficiency analysis for case 3. The production throughput increases from 73% to 98% over training, which reflects better task scheduling and coordination between the agents. The defect rate is reduced significantly, from 17% to 3%, reflecting the effect of quality-check prioritization. The logistics efficiency increases from 65% to 97%, which indicates supportive supply chain management. The collaboration score improves from 0.55 to 0.96, validating the team collaboration goal. Finally, the downtime is reduced from 23% to 2.7%, showing that the project uses resources efficiently with minimal idle time. The mapping of the hypotheses to these metrics is shown in Table 9.
Table 9 shows that the MARL approach successfully addresses scalability (H7), reward optimization (H8) and long-term stabilization efficiency (H10). The results indicate that the computed metrics affect the collaborative and adaptive capabilities of the system. Overall, the research applies a multi-agent reinforcement learning (MARL) framework to improve the collaboration of entrepreneurial teams. The framework addresses team dynamics issues across varied case studies by improving task distribution, inter-team communication, and resource usage. The results indicate that the framework works in each case, registering increased productivity, better utilization of resources, and higher quality of output against the chosen metrics. These insights suggest that MARL could improve the structure and efficiency of team-based operations.
5. Conclusion
This study details a Multi-Agent Reinforcement Learning (MARL) framework to increase entrepreneurial team collaboration. The framework improves communication and resource utilisation and reduces downtime by continuously assessing team members’ competencies and dynamically assigning tasks. Individual performance rewards ease job allocation in dynamic circumstances and foster team cooperation. Agents’ role adaptation and alignment with shared goals promote productivity, lessen interpersonal disputes, and improve adaptability to change. The framework was shown to be effective in various team-based situations through case studies in agile software development, manufacturing optimisation, and logistics coordination. Future research should integrate advanced AI models such as deep neural networks to improve decision-making precision and scale the system to handle bigger, more varied teams. MARL’s effects on team performance, flexibility, and long-term company success must also be assessed in real-time entrepreneurial ecosystems. The study has several limitations. It uses simulated data, which may not accurately depict entrepreneurial teams. The MARL framework may be outdated, and its scalability to bigger, heterogeneous teams is unknown. Agent modelling simplifies complex human behaviours, and real-time deployment in practice is still unexplored.
References
- 1. Bender-Salazar R. Design thinking as an effective method for problem-setting and needfinding for entrepreneurial teams addressing wicked problems. J Innov Entrep. 2023;12(1).
- 2. Tinh NH, Trai DV, Trang NTT, Tien NH. Knowledge transfer and succession process in small family businesses. Int J Entrep Small Business. 2025;1(1).
- 3. Hsieh C, Lee WJ. How would autonomist and autocratic teammates affect individual satisfaction on prefounding entrepreneurship teams? J Small Business Manag. 2020;61(2):659–703.
- 4. Ceriotti LF, Gatica Soria LM, Guzman S, Sato HA, Tovar Luque E, Gonzalez MA, et al. The evolution of the plastid genomes in the holoparasitic Balanophoraceae. Proc Biol Sci. 2025;292(2043):20242011. pmid:40132625
- 5. Joel OT, Oguanobi VU. Entrepreneurial leadership in startups and SMEs: critical lessons from building and sustaining growth. Int J Manag Entrep Res. 2024;6(5):1441–56.
- 6. Hong S, Zheng X, Chen J, Cheng Y, Wang J, Zhang C, et al. MetaGPT: meta programming for multi-agent collaborative framework. arXiv preprint. 2023.
- 7. Schlaegel C, Gunkel M, Taras V. COVID-19 and individual performance in global virtual teams: the role of self-regulation and individual cultural value orientations. J Organ Behav. 2023;44(1):102–31. pmid:36712194
- 8. Tsai JC-A, Jiang JJ, Klein G, Hung S-Y. Task conflict resolution in designing legacy replacement systems. J Manag Inform Syst. 2023;40(3):1009–34.
- 9. Ahmed HB, alzuoubi M. Designing Accessible Virtual Reality Interfaces Using Reinforcement Learning for Users with Motor and Sensory Impairments. PIQM. 2025.
- 10. Karneli O. The role of adhocratic leadership in facing the changing business environment. JAdman. 2023;1(2):77–83.
- 11. Yaiprasert C, Hidayanto AN. AI-powered ensemble machine learning to optimize cost strategies in logistics business. Inter J Inform Manag Data Insights. 2024;4(1):100209.
- 12. Wang K, Jing P, Qu H, Huang L, Wang Z, Liu C. Study on wetting mechanism of nonionic silicone surfactant on coal dust. Heliyon. 2023;9(6):e16184. pmid:37265615
- 13. Zhang G, Li X, Hu G, Li Y, Wang X, Zhang Z. MARL-based multi-satellite intelligent task planning method. IEEE Access. 2023;11:135517–28.
- 14. Krawczyk-Bryłka B, Stankiewicz K, Ziemiański P, Tomczak MT. Effective collaboration of entrepreneurial teams—implications for entrepreneurial education. Educ Sci. 2020;10(12):364.
- 15. Han P. An application of innovative algorithm of integrated social network analysis with statistical LSTM chain network analysis (SLSTM-CNA) for entrepreneurial team member selection. J Electri Syst. 2024;20(3s):1592–602.
- 16. Chughtai MS, Syed F, Naseer S, Chinchilla N. Role of adaptive leadership in learning organizations to boost organizational innovations with change self-efficacy. Curr Psychol. 2023:1–20. pmid:37359696
- 17. Bouncken R, Ratzmann M, Barwinski R, Kraus S. Coworking spaces: empowerment for entrepreneurship and innovation in the digital and sharing economy. J Business Res. 2020;114:102–10.
- 18. Mrad M, Cui CC. Comorbidity of compulsive buying and brand addiction: an examination of two types of addictive consumption. J Business Res. 2020;113:399–408.
- 19. Donbesuur F, Boso N, Hultman M. The effect of entrepreneurial orientation on new venture performance: contingency roles of entrepreneurial actions. J Business Res. 2020;118:150–61.
- 20. Sutrisno S, Kuraesin AD, Siminto S, Irawansyah I, Almaududi Ausat AM. The role of information technology in driving innovation and entrepreneurial business growth. J Minfo Polgan. 2023;12(1):586–97.
- 21. Du H, Gou F, Cai Y. Scalable safe multi-agent reinforcement learning for multi-agent systems. IEEE Transac Neural Netw Learn Syst. 2025.
- 22. Jeloka B, Guan Y, Tsiotras P. Learning large-scale competitive team behaviors with mean-field interactions. IEEE Transac Games. 2025.
- 23. Yang Z, Li C, Wang X, Tian Y. PPO-ACT: proximal policy optimization with adversarial curriculum transfer for spatial public goods games. IEEE Transac Artificial Intellig. 2025.
- 24. Zheng X. Construction of an innovative entrepreneurship project learning platform introducing a group recommendation algorithm for college students. Entertain Comput. 2024;51:100666.
- 25. Lv B, Jiang J, Wu L, Zhao H. Team formation in large organizations: a deep reinforcement learning approach. Decision Support Syst. 2024;187:114343.
- 26. Duraimutharasan N, Deepan A, Swadhi R, Velmurugan PR, Varshney KR. Enhancing control engineering through human-machine collaboration. In: Advances in computational intelligence and robotics. IGI Global; 2025. 155–76. https://doi.org/10.4018/979-8-3693-7812-0.ch008
- 27. Zhou J, Zheng L, Fan W. Multirobot collaborative task dynamic scheduling based on multiagent reinforcement learning with heuristic graph convolution considering robot service performance. J Manufact Syst. 2024;72:122–41.
- 28. Tu C, Yu Z, Huang J, Huang F, Wu Y, Han L, et al. Adaptive role learning with evolutionary multiagent reinforcement learning for UAV-vehicle collaboration in sparse mobile crowdsensing. IEEE Internet Things J. 2025;12(18):38755–71.
- 29. Duraimutharasan N, Deepan A, Swadhi R, Velmurugan PR, Varshney KR. Enhancing control engineering through human-machine collaboration. In: Advances in computational intelligence and robotics. IGI Global; 2025. 155–76. https://doi.org/10.4018/979-8-3693-7812-0.ch008
- 30. Academic and entrepreneurial development dataset. https://www.kaggle.com/datasets/datasetengineer/academic-and-entrepreneurial-development-dataset