The improved business valuation model for RFID company based on the community mining method

Nowadays, the appetite for the investment and mergers and acquisitions (M&A) activity in RFID companies is growing rapidly. Although the huge number of papers have addressed the topic of business valuation models based on statistical methods or neural network methods, only a few are dedicated to constructing a general framework for business valuation that improves the performance with network graph (NG) and the corresponding community mining (CM) method. In this study, an NG based business valuation model is proposed, where real options approach (ROA) integrating CM method is designed to predict the company’s net profit as well as estimate the company value. Three improvements are made in the proposed valuation model: Firstly, our model figures out the credibility of the node belonging to each community and clusters the network according to the evolutionary Bayesian method. Secondly, the improved bacterial foraging optimization algorithm (IBFOA) is adopted to calculate the optimized Bayesian posterior probability function. Finally, in IBFOA, bi-objective method is used to assess the accuracy of prediction, and these two objectives are combined into one objective function using a new Pareto boundary method. The proposed method returns lower forecasting error than 10 well-known forecasting models on 3 different time interval valuing tasks for the real-life simulation of RFID companies.


Introduction
In recent years, driven by the rise of wireless sensor network [1] [2], Internet of things [3], Radio Vehicular Networks [4], Cloud Systems [5] and other Cyber-Physical Systems [6], the development of radio frequency identification (RFID) industries has reached a new milestone. It makes the breakthrough progress in all areas of production and life, especially in the financial payment, retail supply chain, clothing supply chain, logistics, and product traceability. Those followed by the gradually expanding of RFID market. IDTechEx found that in 2015, the total RFID market was worth $10.1 billion, up from $9.5 billion in 2014 and $8.8 billion in 2013, and it will rise to $13.2 billion in 2020 [7]. As a result, the appetite for funding for RFID technology is growing rapidly. According to PrivCo statistics, there were 23 well-known RFID-a1111111111 a1111111111 a1111111111 a1111111111 a1111111111 related companies' mergers and acquisitions (M&A) in the three years from 2013 to 2015, and in 2014 alone, 10 cases of M&A deals in related industries occurred [8].
As business valuation tasks appear throughout time in M&A operation, they are often solved with dozens of different models, for instance payback period method, non-discounted approach, discount algorithm, net present value method, present value index method, real options approach (ROA) [9] [10],etc. In business valuation system, profit forecasting is an important input for calculating value of the acquired company's worth. In M&A operation, buyers use profit forecasting for managing business evaluation and pricing strategies. Keeping a high profit forecasting accuracy is important due to low profit margins in the industry. It would be beneficial for all but especially for the buyers when they could have a solution that can play a role of expert for them and give them lowest profit forecasting error for all of business valuation tasks.
In general, profit forecasting is a difficult task as it is affected by a group of indicators which descript the conditions of the external and internal environment of the company. Holistic overview of the current literature indicates that profit forecasting methods are mainly divided into two categories. One is statistical methods, such as regression model (RM), time series analysis, and gray system theory, another is neural network (NN) method [11] [12]. However, little attention has been directed to network graph (NG) method. To address these problems, we define NG to describe the non-linear relationship between the company's net profitability and its external and internal environment factors [13], and develop the enhanced community mining method (ECMM) based on Bayesian method (BM) for company profit predicting. ECMM successfully models the complex nonlinearity and gives better results than the traditional RMs and NN methods because ECMM can set flexible coefficients for different values of the independent variable and adopt complex nonlinear function to map the relationship between the independent variables and dependent variable. In ECMM, BM is adopted to express the clustering rules of NG's nodes. Furthermore, the improved bacterial foraging optimization algorithm (IBFOA) is developed to calculate Bayesian probability function (BPF) as well as to optimize their parameters. IBFOA is distinctive because it generates new solution according to the statistical analysis of the existing solutions and adaptively adjusts its parameters, those can increase the diversity of the population and avoid the prematurity. Last but not least, we use bi-objective method to assess the accuracy of prediction and Pareto boundary method (PBM) [14][15] is created to integrate the two indicators into one objective function, as a result the reliability of the valuation model is improved.
Based on the predict result of the company's future net profit by ECMM, the company value is assessed using ROA since it is preferred in valuing projects where uncertainty is high [16]. Simulation results of RFID companies in China show that the proposed method can significantly improve the accuracy and reliability of ROA.
The rest of paper is organized as follows: Section 2 presents the company valuation system. Section 3 provides ECMM in NG. In section 4, IBFOA is proposed. In section 5, experiments with real word data are used to demonstrate the workability of our proposed method. Section 6 offers the conclusions.
RFID companies according to the forecast results of ECMM, which gives a full consideration on the investment opportunity value of the RFID companies.

Evaluation factors
Development of efficient methods to accurately estimate the company value requires assessing the conditions of the external and internal environment of the company. We consider the specific application environment for the products of each RFID company, for example, in the financial system, real estate, logistics, retail, product traceability, vehicle, railway, software system and other industries. Since downstream company usually purchases the large majority of its needs from the upstream company, downstream industry boom index is adopted to descript the demand scale of the downstream companies. The other external environment factors include domestic economic growth, inflation, and interest rate. And the factors depicting the internal conditions of the company consist of financial indicators which describe company's health survival and development capacity, company operation indicators that characterize business operating capacity, profitability indicators that represent the corporation profitability, and development capacity indicators that depict company potential ability for expanding scale and growing strength. The factor hierarchy structure shown in Fig 2 illustrates key factors which are said to impact on the company future profit growth. A total number of 29 indicators are considered, and all these 29 indicators can be further grouped into 8 dimensions. Once the key factors affected the profit are identified, we develop the value evaluation model which integrates these factors into a cause-effect framework, as shown in the following sections.

Evaluate the value of company's worth based on ROA
There is uncertainty over the future rewards from the RFID investment, and these investments are partially or completely irreversible. ROA are most valuable when uncertainty is high, therefore we use ROA to calculate the company value, which is given by the following formula. s.t.
where Va is the discounted value of real assets obtained by discounted cash flow method. Vc is the discounted value of future growth opportunities, i.e. the discounted value of options. F i is the net profit at year i and obtained with ECMM, which is provided in next section. r is riskfree interest rate, V S is the current value of the company, which is determined by tangible assets, such as currencies, buildings, real estate, vehicles, inventories, and equipment. Y is the final price of the company, T is the option expiration period, σ is the volatility of asset returns, N(x) is the cumulative probability distribution function of standard normal distribution.

Profit forecast based on ECMM
In this section, we present how to achieve low forecasting error by customizing graph construction and flexibilizing the net profit forecasting model based on ECMM.

Definition of NG
NG is formally defined as: G = (W; E), where W is the set of network nodes, E is the set of edges. Nodes denote independent or dependent variables. If two nodes appear in the same sample, an edge is drawn between them. We divide NG nodes into three categories: (a) evaluation factor node representing each evaluation factor, (b) classification node which denotes a grade of one evaluation factor, and (c) output node or clustering center node indicating one grade of output variable. Notice that each output node indicating the clustering center of one community. Each edge is assigned a weight by counting the number of its occurrences in dataset. An example of constructing a NG from the dataset shown in Table 1 is provided as follows.
In Table 1, X i means sample i, I i represents evaluation factor node i, Z ij represents the jth classification node of evaluation factor i, similarly, C i is the clustering center node and presents the ith grade node of output variable. In a sample, evaluation factor nodes, the classification nodes which indicates the value of each evaluation factor and the output node which denotes the sample output are indicated by 1, and other nodes are indicated by 0. Fig 3 shows NG generated from dataset in Table 1.

Clustering credibility and BPF
Basic measures of NG include: the node degree distribution, average path length, clustering coefficient, degree of correlation coefficient, betweenness, etc [17]. However, these measures usually indicate the structure features of NG, rather than expresses the clustering rules implied  in network. In this study, we propose the index of clustering credibility, namely, the posterior probability of the node belonging to a community. The credibility index helps to discover the rules hidden in the network. The basis unit of clustering rules of NG in Fig 3 can be represented as Fig 4, which indicates that: if the co-occurrence number of I i and C k is U ik , and that of I i and Z ij is d ij , the number of Z ij belonging to community C k is α ijk . In this paper, the main idea of CM is to determine whether node Z ij can be allocated to community C k , but the clustering rule in Fig 4 still fails to intuitively judge the credibility of node Z ij belongings to community C k . As a result, BPF is adopted to figure out the credibility of a node belonging to each community.
In Fig 4, the credibility of node Z ij belongs to community C k is mainly decided by parameter a ihk . Therefore, the clustering credibility of node Z ij is only determined by α ijk and d ij . However, constrained by the number of RFID companies in real life, adequate samples cannot be collected. So α ijk and d ij obtained by the finite samples are not representative and only according to these two parameters, it is difficult to get the accurate clustering credibility of a node. Thus, we add offset ε ijk and β ij to α ijk and d ij , respectively. Accordingly, the posterior probability of node Z ij belongs to community C k is P(C k |Z ij ;d ij ; α ijk ;β ij ;ε ijk ), and the estimated parameters are ε ijk and β ij . As BM is suitable for the situation where the variables for classification are parametric variables, it is used to obtain this posterior probability [18]. To sample X, in order to judge its output, we need to calculate the posterior probability P(C k |Z ij ;d ij ;α ijk ;β ij ;ε ijk ), then decide the community which Z ij belongs to. According to Bayesian formula, we have The derived equation above shows that by observing the value of P(C k |d ij ;α ijk ;β ij ;ε ijk ) and P (Z ij |C k ;d ij ;α ijk ;β ij ;ε ijk ), the posterior probability of node Z ij belongs to community C k can be converted to the posterior probability P(C k |Z ij ;d ij ;α ijk ;β ij ;ε ijk ).
The maximum likelihood estimation of P(C k |d ij ;α ijk ;β ij ;ε ijk ) is shown as follows: , where X l is training samples, ψ is the set of training samples, and |ψ| is the set size of ψ. P(Z ij |C k ;d ij ;α ijk ;β ij ;ε ijk ) is the class-conditional prior probability, and using the traditional method to obtain its expression is very difficult. Generally, if a node belongs to a community, the link density between it and the cluster center node of the community is high, otherwise the link density between these nodes is low. That is similar to the interaction force between particles in the electric field, namely the weak force among nodes result in low density and the high force among nodes leads to high density. Accordingly, P(Z ij |C k ;d ij ;α ijk ;β ij ;ε ijk ) can be approximately defined by Gaussian potential function (GPF) which is mainly used to describe the interaction force of electric particles [19], as is shown in Eq 8.
where the impact factor θ is used to adjust the influence boundary.

CM and profit forecast
After all the posterior probabilities of each classification nodes in a particular sample X being annotated to each category have been calculated, the overall probability for input sample X to be annotated to a particular category, C k is calculated (i.e. D(X,C k )), which is shown in Eq 9.
The Bayesian classifier is able to determine the right category of input sample X by the category which has the highest posterior probability value, as shown in Eq 10.

IBFOA
In this section, the major components of IBFOA are provided, and IBFOA is developed to optimize the parameters of BPF, say, ε, β, and θ.

Bacterium encoding
As shown in Fig 5, each bacterium solution mainly consists of three parts: the offset of edge weight between the classification nodes and cluster center nodes, the offset of edge weight among the classification nodes, and impact factor θ in Eq 8.

Assess multiple objectives based on PBM
In order to improve the prediction reliability of the algorithm, we optimize two objectives, i.e. MSE and ME, simultaneously. We utilize versatile computational intelligent models such as PBM for handling this type of problems. MSE and ME is shown in Eq 11 and Eq 12, respectively.
where O represents the set of all prediction samples, F i is the actual output of sample i,F i is the predicted output of sample i, n is the number of prediction samples. Next, we show how to integrate MSE and ME into one objective function according to PBM for the sake of converging into optimal solutions and maintaining the nature of uniform distribution among solutions.
Define that as long as MSE or ME of an solution is not dominated by others, the solution is not dominated by others, that is, given population P, for p, q2P, if MSE(p) < MSE(q) or ME(p) < ME(q), then p is not dominated by q. Then the main steps of ranking solutions using single non-dominated sorting method (SNDSM) are shown as follows: 1. All solutions are sorted in ascending order based on MSE and ME, respectively. Then the rank of solution q (8q) for each indicator is obtained, namely S MSE (q) and S ME (q).
2. Select the lower rank of the solution in two objectives, that is, min{S MSE (q),S ME (q)}. In the same way, the entire population is graded.
In fact, relying solely on the above SNDSM still cannot sort solutions in the same grade set, we call these solutions undetermined solutions (USs), and we further rank them according to the single congestion degree (SCD). Fig 6 shows the example of SCD. Define the ultra-optimized solution for each solution in US set, whose rank of each objective is immediately lower than the solution, for instance q' is the ultra-optimized solution of q. The boundary consisting of those ultra-optimized solutions are called ultra-optimized boundary, namely A'. SCD is proposed to measure the closeness between a solution and its ultra-optimized solution. Intuitively, SCD is the perimeter of the rectangle in Fig 6, which is composed of the solution point (i.e. q) and its ultra-optimized solution point (i.e. q') on the ultra-optimized boundary. For a solution in US set, the less SCD means more optimal, and result in lower ranking.

Chemotaxis operation
Unlike the fixed-parameter bacterial foraging optimization algorithm (BFOA), in the chemotaxis operation of IBFOA, the swim step is adjusted according to the solutions' objective values adaptively. The detailed chemotaxis operation is described like this: 1. Tumble operation. A unit length random direction, say vector Δ(i) 2 [−1,1], is generated for each bacterium and this defines the direction of movement after a move. The update of bacteria is mainly according to Eq 13.
Qði; t þ 1; kÞ ¼ Qði; t; kÞ þ Hðt; kÞ ΔðiÞ ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi ffi where Q(i,t,k) is the code of the kth position of bacterial i at iteration t, H(t, k) is the step size of direction for position k, which is generated randomly.

Swim operation. If the better solution is found in tumble operation, H(t, k)
is updated in the generated direction, as shown in Eq 14, and continue to create bacteria until no better ones are found. where MSE best is the best MSE value found so far and ME best is the best ME value.

Reproduction operation
In reproduction operation, half of the optimal bacteria reproduce their offspring. Specially, we rank each bacterium according to PBM and half of bacteria with poor ranking reproduce. Assuming that every component in bacteria is independent with each other, and follow the Gaussian distribution, the reproduction operation presented as follows: where η 1 and η 2 are the uniformly distributed random number in interval [0, 1], the formula generating η norm is called BOX-Muller formula, μ t , σ t are the mean and standard deviation of the element in each position of the bacteria generated in iteration t and before.

Elimination-dispersal operation
In traditional BFOA, the elimination-dispersal probability for each bacterium solution is equal [20]. However, that will lose potential Pareto optimal solutions, consequently IBFOA sets elimination-dispersal probability mainly according to the PBM ranks of bacterium, shown as follows: where R max is the maximum rank according to PBM, R i is the rank of bacterium i, P ed is the basic elimination-dispersal probability.

IBFOA procedure
A complete and detailed IBFOA pseudo codes are provided in the following.

Experiment design
The application of the proposed model in solving a real world business valuation problem by determining the community structure in NG is presented in this section. Performance assessment is made in terms of forecast accuracy, running time for optimal solutions and domain knowledge extraction.

Experimental setup and baseline methods
We collected data on 13 typical Chinese RFID listed companies, and their main business areas covered are: tags and packaging, reading and writing equipment, system integration and software development. This data set contains samples of these companies from 1997 to 2014. Removing invalid samples, and 155 short-term (one-year-ahead forecast) samples, 142 medium-term (two-year-ahead forecast) samples and 129 long-term (three-year-ahead forecast) samples were obtained. We scaled the collecting data into interval [1,5].
We compared the performance of our method with several existing methods: RMs, support vector machine (SVM), NNs, NSGAII and SPEA2, which are two highly competitive algorithms for bi-objective optimization. We adopted 4 most commonly used RMs: multiple curvilinear regression with the kernel function of y = ax b (RM1), y ¼ 1 aþbe À x (RM2), y = a+bx+cx 2 (RM3), and y = a+bx+cx 2 +dx 3 (RM4). Three kinds of NNs were adopted: back propagation NN (BPNN), general regression NN (GRNN), and radial basis function (RBF) NN. The reasons for selecting these NNs to present a contrast is that: BPNN is the most widely used NN, GRNN has the ability to converge to the underlying functions of the data with only few training samples available and needs relatively small knowledge to turn the parameter, and RBFNN has the characteristics of best approximation and global optimal performance. IBFOA, NSGAII and SPEA2 were adopted to find the community and further forecast the net profit. The proposed IBFOA was implemented in VB. SVM, NNs, and RMs aimed at optimizing MSE and were achieved in Matlab with the default setting. NSGAII and SPEA2 were also designed in Matlab environment according to the codes provided in [21] and [22], respectively. The population size of NSGAII and SPEA2 were 20 and both ran 100 iterations.
In IBFOA, initiated β and ε between -8.25~8.25, initiated θ between 0~1, set eliminationdispersal probability (Ped) to be 0.9, and the population size to be 20, IBFOA was run 100 iterations. In ROA models, risk-free interest rate r was replaced by the bank's benchmark interest rate, which was selected as 3.5%, σ was set to be 40%.

Performance of ECMM
The valuation deviation at year t is calculated as follows: is the actual value of sample i,V i ðtÞ is the predicted value, and n is the total number of test samples. Three types of experiments were carried out: short-term, middle-term, and long-term test. In each test, randomly selected 100 samples as training samples and others as testing samples. The ME, MSE of profit forecast and ΔV are shown in Tables 2-4, respectively, where AV1 and UB1 represent average value (AV) and upper bound (UB) of ME and MSE in short-term test, AV2 and UB2 indicate that in medium-term test, and AV3 and UB3 mean that value of longterm test. Fig 7 shows the radar chart of average value of ME and MSE in different terms for various bi-objective algorithms.
As can be seen from Tables 2-4, whether in the test of net profit forecast or company value evaluation, our IBFOA demonstrates superior performances, namely MSE, ME and ΔV are the smallest. Those show that using IBFOA to optimize BPF and clustering NG with BPF are good at predicting the profit of company and ECMM is expert in evaluating the business value. In Fig 7, the solutions of IBFOA densely distribute in the internal and optimal area of the radar chart, while the solutions of NSGAII and SPEA2 uniformly disperse in the outer edge of the chart. Therefore, the solutions of IBFOA are more Pareto optimal than those of NSGAII and SPEA2, in other words, NSGAII and SPEA2 are dominated by IBFOA, and those mean that the proposed IBFOA contributes to promoting the convergence of the algorithm as well as maintaining the diversity of the population.

Running time for obtaining optimal solutions
By the mechanism of IBFOA, we can obtain its time complexity O(n 2 ), where n is the length of bacterium solution, as shown in Fig 5. Table 5 shows the running time of various models for each experiment. In Table 5, compared with other models, although our IBFOA takes a little more time, its high precision offsets this deficiency.

Domain knowledge extraction
We run IBFOA 100 iterations on the collected dataset to acquire the optimal offsets of the edge weight in NG, and then calculate the weights of nodes. Specifically, in Fig 3, for any evaluation factor node I i , its weight is the sum of the absolute value of weight d ij and its offset β ij , that is, X j jb ij þ d ij j, similarly, for any classification node Z ij , its weight is X k ja ijk þ ε ijk j. Then nodes are sorted in descending order of their weights, and top 10 nodes of each type are displayed in Table 6.
As can be seen from Table 6, factors significantly affecting the profitability of companies are net profit growth rate, revenue growth rate, producer price index, interest rate, etc. Although some factors, such as general index of total wages, total retail sales of social consumer goods, current asset turnover, and so on, are not listed as the key evaluation factors, some of their classification nodes have a significant impact on profitability. This shows that the NG based method can not only depict which factors have an effect on the profitability of companies, but also can describe the specific impact of each value of factors, for example, "GDP, grade 5" in Table 6 indicates if the GDP factor value is 5, it has an important impact on profitability.

Conclusions and future works
The valuation of the company is the basis for determining the relevant negotiating parameters when venture capitalist and entrepreneur negotiate the deal. A reliable valuation model helps to provide the basic standard for the measurement of the company's fair market value as well as determining its price. In this study, combing CM method and ROA, a novel business valuation model for RFID companies is proposed. Compared with the existing methods, the proposed technique is distinctive in the following aspects: Firstly, unlike the current RM or NN based forecast methods, our method is based on NG. Moreover, we calculate the credibility of a node belonging to each community of NG, and then cluster the network based on the credibility. That is different from the traditional CM methods, which cluster the network directly using the network structure.
Secondly, different from current BM which adopts rigid and fixed function expressions, in this study, the flexible BM is developed to calculate the clustering credibility, that is, adopt GPF to approximate BPF and use IBFOA to optimize the parameters of BPF.
Thirdly, dissimilar to general single objective BFOA, IBFOA is bi-objective algorithm, which uses PBM to evaluate the reliability of forecast model. In the proposed PBM, SNDSM and SCD are adopted to rank the solutions, which can keep the diversity of the solutions without the cost of delaying the algorithm convergence. Finally, in IBFOA, new solutions are generated mainly through integrating the merits of existing solutions, and that spurs further search along the optimal direction so as to promote the convergence of the algorithm. Simultaneously, the parameters of the algorithm are adaptively adjusted according to the performance of IBFOA to further improve its efficiency.
Simulation results of RFID companies show that the proposed model demonstrates higher accuracy and reliability compared with other models. Future research will focus on the further improvement of the proposed model of NG. Although the experiment results show NG structure in this paper is appropriate, we cannot prove that it is optimal. So an optimal NG structure for business valuation is hoped to be given in future studies.