Marketable value estimation of patents using ensemble learning methodology: Focusing on U.S. patents for the electricity sector

Patent valuation is needed to revitalize patent transactions, but it is difficult to calculate a reasonable value that satisfies both consumers and suppliers. Machine learning enables a quantitative evaluation based on a large volume of data, and the evaluation can be conducted quickly and inexpensively, which contributes to the activation of patent transactions. However, securing the necessary training data is challenging because most patents are traded privately to prevent technical information leaks. In this study, the marketable value of a patent derived through an event study is used for patent valuation and is matched with the semantic information of the patent computed using latent Dirichlet allocation (LDA)-based topic modeling. In addition, an ensemble learning methodology that combines the predicted values of multiple predictive models was used to secure prediction stability. The base learner with the highest predictive power differed from fold to fold, but the ensemble model trained on the base learners' predicted values exceeded the predictive power of the individual models. The Wilcoxon rank-sum test indicated that the superiority of the ensemble model's accuracy was statistically significant at the 5% significance level (95% confidence level).


Introduction
With the advancement of industrial infrastructure, such as Industry 4.0, the importance of intangible assets continues to increase, and patents, a representative intangible asset, are considered a key resource for enhancing a corporation's technological competitiveness [1]. Despite the importance of patents, patent transactions in the market are not active because it is difficult to determine a reasonable value that satisfies both consumers and suppliers [2].
Traditionally, patent valuation methodologies have been largely divided into revenue, cost, and market approaches [3]. The revenue approach converts future revenue from patent rights to the present value. Because the revenue approach is based on predictions of an uncertain future, it has been difficult to calculate prices that both consumers and suppliers can accept. In contrast, the cost approach has the advantages that it is easy to measure and requires fewer subjective factors. However, the cost approach has the inherent limitation of not considering the profitability or market value of the patent. The market approach determines the value based on the market-traded price, and the proper value can be determined because overvalued or undervalued patents converge to an appropriate price through repeated transactions. However, unlike stocks and raw materials, patents have been difficult to value through a market approach because they are traded less frequently [4]. This study addresses the limitations of the market approach using machine learning and an event study methodology. Although research using machine learning for patent analysis has been actively conducted [5][6][7][8], most of this research is limited to technology classification, technology prediction, and patent quality analysis. Few studies have used machine learning for patent valuation because securing patent transaction data to train a machine learning model is difficult, as most patents are privately traded to prevent technical information leaks [9].
In this study, the event study methodology is used to overcome these limitations. An event study can be applied to extract the judgment of market participants on the patent by controlling for common market factors in the stock return on the patent enrollment date. To employ an event study, daily stock and index returns are collected from a certain point in time up to the patent enrollment date, and a regression analysis is conducted on these returns. Based on the regression analysis results, the expected normal return is calculated. The abnormal return is then calculated by subtracting the expected normal return from the stock return on the patent enrollment date. In this study, the marketable value of patents is represented by the abnormal return.
Topic modeling was performed using latent Dirichlet allocation (LDA) to analyze the semantic information of the patent text. The computed abnormal return was then combined with the topic information of the patent produced by topic modeling and used as training data for the machine learning-based ensemble learning model. In this way, the market value of a patent can be estimated using the machine learning methodology.
In this study, the ensemble method was used to secure a stable predictive power. A single model has an inevitable model bias problem. An ensemble model that calculates the final predicted value based on the predicted values of several models can reduce the model bias.

Related work
Related research can be divided into two parts: patent valuation and patent topic modeling.

Patent valuation
Patent valuation studies have been conducted based on cost [10][11][12], revenue [13][14][15], and market approaches [16], with market approaches centered on small-scale cases where patent transaction information exists. This study is a complementary approach that bypasses market-approach limitations because it uses the event study methodology to extract the marketable value of patents. An event study is a methodology to study the influence of a particular event on stock prices [17][18][19].
Three typical methods of estimating the normal return are the mean adjusted return method, market adjusted return method, and market model [20]. The mean adjusted return method estimates the mean return during the period before the event as a normal return, and the market adjusted return method estimates the normal return using the market rate of return during the event period [21]. In contrast, the method using the market model calculates the normal return during the event period using the market return sensitivity estimates $(\hat{\alpha}, \hat{\beta})$ of the stock return obtained from an ordinary least squares (OLS) regression analysis [22].
Research using event studies for technology valuation has been actively conducted. Studies have evaluated the value of innovation through event research [23][24][25]. Research using event studies for patent valuation has also been actively conducted [19,26,27]. In this study, the marketable value of the patent is represented by the abnormal return, and the abnormal return of the patent is matched with the topic information produced through topic modeling.

Patent topic modeling
Topic modeling has been studied actively. Topic models are divided into non-probabilistic and probabilistic models [28]. The probabilistic approach yields a probability distribution over a set of classes for each input sample. In contrast, non-probabilistic models partition the feature space without modeling the class distribution and return the class associated with the region in which the sample falls.
A representative non-probabilistic approach is to cluster keywords belonging to patent documents using the K-means method [29]. Research using a patent ontology network for extracting the K-nearest neighbors [30] or using the fuzzy set methodology for subject evaluation [31] has also been conducted. However, non-probabilistic models are limited in capturing the semantics of patent content.
The probabilistic model is effective in discovering the hidden subject structure of documents. Various LDA-based models have been widely adopted depending on various aspects of patent data. Some research has combined patent information with an LDA-based subject model to discover potential semantic subjects [32] and has explored patent-related segment topics using the document structure [33].
In this study, we conduct topic modeling of patent text information using the LDA, a probabilistic methodology. We use machine learning-based ensemble models to evaluate the marketable value of patents based on their textual information. Although related studies that directly estimate the market value of a patent using machine learning techniques have been insufficient, some studies have predicted patent quality using patent citations as a dependent variable [34][35][36][37]. Various machine learning algorithms predict dependent variables, such as patent quality, using patent text information, such as neural network-based models [38], support vector machine (SVM)-based models [39], and random forest (RF)-based models [40].
A single model may suffer from a bias problem, whereas the ensemble technique can reduce bias. We could not find any studies where machine learning and ensemble techniques were used in patent valuation because patent trade price information is not disclosed. However, AdaBoost, in which a decision tree is set as the base learner, was used as an ensemble in technology transfer prediction, which is a different field [41]. Furthermore, in the entity recognition of patents, a voting-based ensemble in which CRF, CNN, and BERT were set as base learners was used [42]. More recently, a methodology for computing an internal quality measure through a graph-based unsupervised ensemble has been attempted [43]. The present study combines the event study methodology and topic modeling and calculates the final prediction value through an ensemble. Specifically, various base learners are used, while support vector regression (SVR) is used as the meta learner to account for the bias of the base learners when calculating the final prediction value. Table 1 presents the differences between related work and the model proposed in this study.

Proposed methodology and framework
This work combines the marketable values of patents produced through an event study with the topic information produced by topic modeling. The corresponding data are used for training machine learning-based ensemble models. This training allows the ensemble model to estimate the marketable value of an arbitrary patent. The patent data used for patent valuation are U.S. registered patents for the electricity (IPC H) sector from April 1999 to June 2020. The analysis was conducted only on patents held by listed companies with stock price information so that they could be linked to the event study. The abstract information of the patents was converted to a corpus, and an LDA analysis was performed. Afterward, the optimal number of topics was calculated using perplexity and coherence analyses, and topic modeling was performed based on the optimal number of topics. To determine the marketable value of a patent, the past 30 daily stock price returns and index returns, including the date of patent enrollment, were collected. A regression equation was constructed using the calculated stock price and index returns to calculate the abnormal return. In this study, the abnormal return was used as the market value of the patent. The base learners are trained on the calculated market value of the patents and the patent topic information. The RF, multilayer perceptron (MLP), and convolutional neural network (CNN) were used as base learners. The meta learner is trained on the predicted values calculated by each base learner and estimates the market value of the patent. In this study, SVR was used as the meta learner.

Patent text analysis
Topic modeling is one of the text mining techniques for extracting meaningful topics from unstructured text data. It operates on the principle of deriving and classifying the topics embedded in a document based on the words that comprise the document in a large corpus [44]. Representative topic models are probabilistic latent semantic indexing (pLSI) [45] and the LDA model, which improves on pLSI [46,47].
The LDA model is an unsupervised learning method that finds hidden topics within a document through learning that consists only of word patterns, the only observed data, without knowing what each document contains. This method is also called a probabilistic generative model for a set of documents.
The LDA assumes that documents have multiple topics, and each subject follows a Dirichlet distribution. When $M$ documents are given and all documents belong to one of $K$ topics, each word can be represented as an index of the vocabulary. If the size of the vocabulary is $V$, each word corresponds to an index $v \in \{1, \ldots, V\}$. The word vector $w$ is a $V$-dimensional vector that satisfies $w^{v} = 1$ and $w^{u} = 0$ for $u \neq v$. In other words, the component corresponding to word $v$ is marked as 1, and all other components are marked as 0. A document $W$ is written as $W = (w_1, w_2, \ldots, w_N)$ in $N$ consecutive words, where $N$ is the total number of words in the document. The corpus is a set of documents, expressed as $D = \{W_1, W_2, \ldots, W_M\}$. In addition, $\alpha$ is a hyperparameter with the same value in the set of documents. Moreover, $\alpha$ is a parameter of the $K$-dimensional Dirichlet distribution and determines $\theta$, which represents the proportion of subjects in each document and is a $K$-dimensional vector. Further, $\theta_i$ is the probability that the document belongs to the $i$th subject. That is, $\theta$ indicates the distribution of the subjects of the document and satisfies $\sum_{i=1}^{K} \theta_i = 1$. Additionally, $Z$ is an $N$-dimensional vector, and $z_n$ is the subject assigned to the word $w_n$. In the figure, $\beta$ is a matrix of size $K \times V$, and $\beta_{ij}$ is the probability that the $i$th subject generates the $j$th word of the vocabulary. Here, $w_n$ is given through the actual documents, and the other variables are latent variables that cannot be observed. Moreover, $\beta$ is a parameter of a Dirichlet distribution and determines $\phi$. Here, $\phi$ indicates the proportion of the subject each document comprises. In addition, $\phi$ is a $K$-dimensional vector, and $\phi_i$ is the probability that the document belongs to the $i$th subject. In other words, it is the distribution of the subjects of the document and satisfies $\sum_{i=1}^{K} \phi_i = 1$. To use the LDA, given a document, a posterior distribution must be derived for the latent variable $Z$.
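The generative view of the LDA described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy sizes (`K = 3`, `V = 6`), the Dirichlet parameter values, and the randomly drawn topic-word matrix are all assumptions made for illustration. A document is generated by drawing its topic proportions from a Dirichlet distribution, then drawing a topic and a word for each position.

```python
import random

def sample_dirichlet(params, rng):
    """Draw one Dirichlet sample by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in params]
    total = sum(draws)
    return [d / total for d in draws]

def sample_categorical(probs, rng):
    """Return an index drawn according to the given probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

def generate_document(n_words, alpha, topic_word, rng):
    """theta ~ Dir(alpha); for each word: z_n ~ Cat(theta), w_n ~ Cat(topic_word[z_n])."""
    theta = sample_dirichlet(alpha, rng)
    topics, words = [], []
    for _ in range(n_words):
        z = sample_categorical(theta, rng)
        topics.append(z)
        words.append(sample_categorical(topic_word[z], rng))
    return theta, topics, words

rng = random.Random(0)
K, V = 3, 6                      # assumed toy sizes: 3 topics, 6 vocabulary words
alpha = [0.5] * K                # assumed symmetric Dirichlet parameter
# Hypothetical K x V topic-word matrix; each row is a distribution over words.
topic_word = [sample_dirichlet([0.5] * V, rng) for _ in range(K)]
theta, topics, words = generate_document(20, alpha, topic_word, rng)
```

Inference runs in the opposite direction: only `words` is observed, and the topic assignments and proportions must be estimated.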
The topic model combines words belonging to the same semantic class into one subject based on the simultaneous occurrence of words in a document set. This process does not require manual labor; thus, large documents can be processed without a problem. However, as with unsupervised learning, it is not known how much the automatically processed results match the desired results, which is the most important limitation of the unsupervised topic modeling method.
Therefore, evaluating the quality of topic modeling results has become an important task. Two categories of evaluation methods exist. Intrinsic methods do not use sources external to the dataset, whereas extrinsic methods evaluate topics using external tasks, such as information retrieval, or external statistics.
In this study, perplexity [48], a classical technique for intrinsically evaluating a topic model, and the coherence technique [49], which complements the limitations of the perplexity analysis, were used together. In information theory, perplexity is an index that measures how well a probability model predicts a sample, and a lower perplexity value indicates better predictive power. Perplexity is primarily used to evaluate how much a probability model improves compared to other models or to evaluate performance according to parameters within the same model. In the perplexity expression, $p(\omega_d)$ denotes the probability that a specific word in the $d$th document is assigned to the subject, and $N_d$ represents the total number of words in the $d$th document. A low perplexity value indicates that the model has been trained well, not that the result is good for human interpretation.
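For reference, the perplexity of a corpus $D$ of $M$ documents is commonly written in the following form. This is the standard textbook expression, given here on the assumption that it matches the measure used in this study; $\log p(\omega_d)$ is read as the log-likelihood of the words of the $d$th document under the trained model.

```latex
\mathrm{Perplexity}(D) = \exp\left\{ - \frac{\sum_{d=1}^{M} \log p(\omega_d)}{\sum_{d=1}^{M} N_d} \right\}
```

Because the exponent is the negative average log-likelihood per word, a model that assigns higher probability to the documents obtains a lower perplexity.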
Low perplexity values do not always yield results that are adequate for interpretation [50]. The coherence analysis complements this aspect. Coherence is an index that measures the semantic consistency of a topic. The more consistent the words of a topic are, the higher the coherence. Coherence is primarily used to determine the meaningfulness of the information produced by the model. The coherence index used in this study is defined in terms of $p(\omega_i, \omega_j)$, the probability that the $i$th and $j$th words appear simultaneously in a document, and $p(\omega_i)$, the probability that the $i$th word appears in the entire document set.
The Gibbs algorithm is an inference method in which only one variable is changed at a time while the remaining variables are fixed, and unnecessary variables are excluded. Numerous calculations are required to determine the posterior distribution considering all latent variables; thus, only the posterior distribution for the topic $Z$ of interest was considered. For sampling in the Gibbs algorithm, a conditional distribution must be defined, and the posterior distribution for $Z$ can be defined from the relationship between the Dirichlet distribution and the multinomial distribution in the LDA probability model. In this conditional distribution, $\alpha$ and $\beta$ are parameters of the Dirichlet distribution. In addition, $n^{(v)}_{k,-n}$ refers to the frequency at which the $v$th word is observed as the $k$th topic among the remaining words excluding the $n$th word $w_n$ in each document, and $n^{(k)}_{d,-n}$ refers to the frequency of words with subject $k$ in document $d$, excluding the $n$th word. A topic is randomly assigned to all words in the document set, and the topic distribution for each document to which the word belongs and the word distribution for each topic are calculated except for the $n$th word. Sampling is performed using this conditional distribution, and the process is repeated until convergence.
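The sampling procedure described above can be sketched in Python as follows. The sketch assumes the widely used collapsed Gibbs conditional, in which the probability of assigning topic $k$ to the $n$th word is proportional to $(n^{(v)}_{k,-n} + \beta) / (\sum_{v'} n^{(v')}_{k,-n} + V\beta) \cdot (n^{(k)}_{d,-n} + \alpha)$. Whether this matches the authors' exact expression cannot be confirmed from the text. The function name, the hyperparameter values, the toy corpus, and the fixed number of sweeps (used in place of a convergence check) are illustrative assumptions.

```python
import random

def collapsed_gibbs_lda(docs, n_topics, vocab_size, alpha=0.1, beta=0.01,
                        n_sweeps=50, seed=0):
    """Collapsed Gibbs sampling sketch; docs are lists of word indices."""
    rng = random.Random(seed)
    n_kv = [[0] * vocab_size for _ in range(n_topics)]  # topic-word counts
    n_dk = [[0] * n_topics for _ in docs]               # document-topic counts
    n_k = [0] * n_topics                                # total words per topic

    def update(d, v, k, delta):
        n_kv[k][v] += delta
        n_dk[d][k] += delta
        n_k[k] += delta

    # Step 1: assign a random topic to every word and build the count tables.
    z = []
    for d, doc in enumerate(docs):
        z.append([rng.randrange(n_topics) for _ in doc])
        for v, k in zip(doc, z[d]):
            update(d, v, k, +1)

    # Step 2: resample each word's topic with that word removed from the counts.
    for _ in range(n_sweeps):
        for d, doc in enumerate(docs):
            for n, v in enumerate(doc):
                update(d, v, z[d][n], -1)
                # The document-side denominator is constant in k and is omitted.
                weights = [
                    (n_kv[k][v] + beta) / (n_k[k] + vocab_size * beta)
                    * (n_dk[d][k] + alpha)
                    for k in range(n_topics)
                ]
                z[d][n] = rng.choices(range(n_topics), weights=weights, k=1)[0]
                update(d, v, z[d][n], +1)
    return z, n_dk, n_kv

# Toy corpus (assumption): four short documents over a six-word vocabulary.
docs = [[0, 1, 0, 1, 2], [3, 4, 3, 4, 5], [0, 1, 2, 0], [4, 5, 3, 3]]
z, n_dk, n_kv = collapsed_gibbs_lda(docs, n_topics=2, vocab_size=6)
```

After sampling, normalizing each row of `n_dk` (plus `alpha`) gives an estimate of the per-document topic proportions, and normalizing each row of `n_kv` (plus `beta`) gives the per-topic word distribution.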

Marketable value extraction
An event study is a methodology that measures the effect of unexpected events or the disclosure of new corporation information on the expected profitability of a corporation. The event study is used to extract market value in this study. A decision or situation that is considered to have influenced the stock price is called an event. The day when the stock price is considered to have been affected by the event is called an event day. A normal return is the rate of return calculated based on the normal stock price that would have been formed if the event had not occurred. Three typical methods of estimating a normal return are the mean adjusted return method, market adjusted return method, and market model. The mean adjusted return method estimates the mean return during the period before the event as a normal return, and the market adjusted return method estimates the normal return using the market rate of return during the event period. In contrast, the method using the market model calculates the normal return during the event period using the market return sensitivity estimates $(\hat{\alpha}, \hat{\beta})$ of the stock return obtained from the OLS regression analysis.
In the market model, the normal return of the event period is calculated by substituting the explanatory variable data during the event period into the regression model that estimates the normal return. The target period for extracting the necessary data to measure the parameters of the regression model is called the estimation period. In this study, the normal return is calculated through OLS based on the stock price and index returns over a certain past period from the patent registration date. At time $t$ when patent $i$ is registered, the stock price return $R_{i,t}$ of the corporation holding the patent can be expressed as follows according to the market model:
$$R_{i,t} = \alpha + \beta \, RM_t + e_{i,t} \qquad (5)$$
where $RM_t$ denotes the total market return at time $t$, and $e_{i,t}$ represents the intrinsic rate of return of the corporation holding patent $i$ at time $t$, which is not correlated with the entire market and is assumed to have an expected value of 0. In addition, $\beta$ is the sensitivity of the analysis target stock return to the market return, and $\alpha$ is the expected return of the stock when the market return is zero. Therefore, Eq (5) can be explained as dividing the stock price return ($R_{i,t}$) of the analyzed stock into the rate of return by market factors and the rate of return by factors specific to the stock. The explanatory variable used in the market model is the rate of return of the index representing the entire market. In this study, the Standard and Poor's 500 (S&P 500) is used as the representative index. The abnormal return is calculated by subtracting the normal daily return estimated by the model from the actual rate of return during the event period. The abnormal rate of return calculated in this way is also referred to as an excess return in the sense that the rate of return exceeds the normal return given by the market model.
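The abnormal rate of return $AR_t$, which arises from the occurrence of a patent event, can be expressed as follows using the above equation:
$$AR_t = R_{i,t} - E(R_{i,t} \mid RM_t) \qquad (6)$$
where $E(R_{i,t} \mid RM_t)$ is the expected return of the corporation holding the patent given the market return $RM_t$; under the market model, it equals $\hat{\alpha} + \hat{\beta} \, RM_t$.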
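The market-model calculation can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' code: the function names and the toy return series are assumptions, and the sketch estimates the regression on the days before the event day only, which is one possible reading of the estimation window.

```python
def ols_fit(x, y):
    """Simple OLS of y on x with an intercept; returns (alpha_hat, beta_hat)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    beta_hat = sxy / sxx
    alpha_hat = mean_y - beta_hat * mean_x
    return alpha_hat, beta_hat

def abnormal_return(stock_returns, market_returns):
    """Last element is the event day; earlier elements form the estimation window."""
    alpha_hat, beta_hat = ols_fit(market_returns[:-1], stock_returns[:-1])
    expected = alpha_hat + beta_hat * market_returns[-1]   # normal return
    return stock_returns[-1] - expected                    # abnormal return

# Toy data (assumption): the stock follows 0.001 + 1.5 * market exactly,
# except for an extra 3% return on the event day.
market = [0.01, -0.02, 0.005, 0.015, -0.01, 0.02]
stock = [0.001 + 1.5 * m for m in market[:-1]] + [0.001 + 1.5 * market[-1] + 0.03]
ar = abnormal_return(stock, market)
```

In this toy series the regression recovers the intercept and slope exactly, so the abnormal return isolates the extra event-day return of about 0.03.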

Base learner composition
Random forest model. The RF model is a learning method that uses multiple decision trees. A forest comprises uncorrelated trees built with the classification and regression tree (CART) procedure, combined with random node optimization and bagging. The CART method determines the effect on the response variable (dependent variable) by exploiting the nonlinearity and interactions of the explanatory variables (predictors). The CART creates branches on the explanatory variables according to an importance criterion and makes a judgment on the response variable at the terminal node. The CART can be used whether the response variable is binomial, multinomial, or continuous. Predictors can also be selected without distinction between continuous and categorical variables. Like CART, RF can be used for both categorical and continuous response variables, and the same algorithm as the CART is used to build each tree.
However, whereas the CART method derives a single decision tree, the RF method forms a forest comprising numerous decision trees. The random sampling of predictors and observations is repeated to build multiple decision trees. After obtaining a categorical prediction from each of the decision trees, the final categorical prediction is decided by majority voting; for a continuous response variable, the tree predictions are averaged. By providing randomness to decision tree formation, independent decision trees can be repeatedly created, and prediction errors can be reduced. The bootstrapping technique is used for the random selection of predictors and observations. For the CART method, as the number of lower nodes increases, the bias of the prediction error decreases, but the variance increases. However, in the RF method, the variance of prediction errors can be reduced by repeatedly generating identically distributed decision trees.
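The bootstrap-and-aggregate idea can be sketched in Python as follows. The sketch is a deliberately reduced stand-in for a random forest: each tree is a depth-one regression tree (a stump), predictions are averaged because the response is continuous, and the random selection of predictors at each node is omitted. The function names and the toy data are assumptions.

```python
import random

def fit_stump(X, y):
    """Depth-1 regression tree: choose the split that minimizes squared error."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [t for row, t in zip(X, y) if row[j] <= thr]
            right = [t for row, t in zip(X, y) if row[j] > thr]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((t - lm) ** 2 for t in left) + sum((t - rm) ** 2 for t in right)
            if best is None or sse < best[0]:
                best = (sse, j, thr, lm, rm)
    if best is None:                     # all rows identical: predict the mean
        mean_y = sum(y) / len(y)
        return lambda row: mean_y
    _, j, thr, lm, rm = best
    return lambda row: lm if row[j] <= thr else rm

def fit_bagged_stumps(X, y, n_trees=25, seed=0):
    """Train each stump on a bootstrap sample and average the predictions."""
    rng = random.Random(seed)
    n = len(X)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]   # sample rows with replacement
        trees.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return lambda row: sum(tree(row) for tree in trees) / len(trees)

# Toy data (assumption): one predictor with a step in the response.
X = [[float(i)] for i in range(10)]
y = [0.0] * 5 + [1.0] * 5
model = fit_bagged_stumps(X, y)
low, high = model([0.0]), model([9.0])
```

Each bootstrap sample yields a slightly different tree, and averaging over the trees is what reduces the variance of the prediction.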
Multilayer perceptron model. The multilayer perceptron is a neural network model consisting of an input layer, a hidden layer, and an output layer. In artificial neural networks, a methodology for configuring and analyzing multiple hidden layers is called the MLP. The MLP model is an analysis methodology that is widely used in pattern classification, recognition, and prediction and is extended to more advanced artificial neural network analyses according to the shape and activation function of the hidden layer. It is more advanced than the single-layer perceptron, which has only an input layer and an output layer. It can be expressed in the form of Eq (7), where $v_i$ is a signal of the input layer or the previous hidden layer, and $b_j$ and $b_k$ indicate the biases of the hidden and output layers, respectively. In addition, $w_{ij}$ and $w_{jk}$ denote the weights of the hidden and output layers, respectively. Further, $f$ represents the activation function, for which the sigmoid and ReLU functions are commonly used. The resulting value ($Y_k$) of the output layer can be obtained through Eq (7).
Convolutional neural network model. The CNN exhibits remarkable performance in the field of image identification. In this methodology, a filter converts the values of a local part of the image into a new value. The CNN connects these values successively so that the image characteristics can be better captured. The CNN uses the convolution of an input layer and a filter and comprises one input and one output layer, one or more convolutional layers, and a pooling layer. The data are input through the input layer and filtered through the convolutional layer to extract the appropriate features. The number of feature maps is determined according to the number of filters.
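The MLP forward computation described above can be sketched in Python. The sketch assumes that Eq (7) has the usual one-hidden-layer form $Y_k = g(\sum_j w_{jk} \, f(\sum_i w_{ij} v_i + b_j) + b_k)$, with an identity output function $g$ as is typical for regression. The exact form of Eq (7) is not recoverable from the text, so this form, the function names, and the toy weights are all assumptions.

```python
import math

def relu(x):
    return max(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dense(inputs, weights, biases, activation):
    """One fully connected layer; weights[i][j] connects input i to unit j."""
    return [
        activation(sum(inputs[i] * weights[i][j] for i in range(len(inputs))) + biases[j])
        for j in range(len(biases))
    ]

def mlp_forward(v, w_ij, b_j, w_jk, b_k, f=relu, g=lambda x: x):
    """Hidden layer with activation f, then output layer with output function g."""
    hidden = dense(v, w_ij, b_j, f)
    return dense(hidden, w_jk, b_k, g)

# Toy network (assumption): 2 inputs, 2 hidden units, 1 output.
v = [1.0, 2.0]
w_ij = [[1.0, -1.0], [0.5, 0.5]]
b_j = [0.0, 0.0]
w_jk = [[2.0], [3.0]]
b_k = [0.5]
y_k = mlp_forward(v, w_ij, b_j, w_jk, b_k)
```

With these weights the hidden activations are 2.0 and 0.0, so the single output is 2.0 * 2.0 + 0.0 * 3.0 + 0.5 = 4.5. Passing `f=sigmoid` switches the hidden activation without changing the structure.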

Ensemble learning
No single model can perform well in all situations. The stacking ensemble model used in this study creates a model with the best performance by combining different models. The stacking ensemble model can be configured by combining various algorithms. Through these combinations, weaknesses can be compensated for while taking advantage of each algorithm. Thus, the performance can be improved over a single model that is generally trained. In this study, a stacking ensemble model is constructed using the RF, MLP, and CNN models.
To calculate the predicted values of each submodel, which serve as the input data for the stacking ensemble model, the training set was further divided into seven folds, and submodel training and prediction were performed for each fold. As the algorithm for ensemble learning, we used the SVR, which has better interpretability than black-box models. The SVR is a generalization of the SVM, a classification algorithm [51], to regression and predicts data by determining an optimal hyperplane that includes as much data as possible within the margin defined by the support vectors. Table 2 lists the main logic and data structures of the RF, MLP, CNN, and stacking ensemble models used to predict the marketable value of a patent. Models 1 to 3 are the comparison group for evaluating the predictive power of the stacking ensemble model. The base learners used in Model 4 follow the same logic as Models 1 to 3 but were trained and tested on the seven folds of the training data (seven-fold). The meta learner was trained on the base learners' predictions calculated from the testing set of each fold. The SVR was used as the algorithm that combines the predicted values of the base learners.
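The seven-fold stacking procedure can be sketched in Python as follows. To keep the example self-contained, two deliberately simple placeholder base learners (a mean predictor and a one-feature OLS line) stand in for the RF, MLP, and CNN, and a linear least-squares combiner without intercept stands in for the SVR meta learner. These substitutions, the function names, and the toy data are assumptions and not the configuration used in this study.

```python
def fit_mean(X, y):
    """Placeholder base learner 1: always predicts the training mean."""
    mean_y = sum(y) / len(y)
    return lambda x: mean_y

def fit_line(X, y):
    """Placeholder base learner 2: one-feature OLS line."""
    n = len(X)
    mx, my = sum(X) / n, sum(y) / n
    sxx = sum((x - mx) ** 2 for x in X)
    slope = sum((x - mx) * (t - my) for x, t in zip(X, y)) / sxx
    return lambda x: my + slope * (x - mx)

def out_of_fold_predictions(X, y, fitters, k=7):
    """Train each base learner on k-1 folds and predict the held-out fold."""
    n = len(X)
    oof = [[0.0] * len(fitters) for _ in range(n)]
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        held_out = [i for i in range(n) if i % k == fold]
        X_tr, y_tr = [X[i] for i in train], [y[i] for i in train]
        for m, fit in enumerate(fitters):
            model = fit(X_tr, y_tr)
            for i in held_out:
                oof[i][m] = model(X[i])
    return oof

def fit_linear_meta(P, y):
    """Least-squares combiner for exactly two base learners (2x2 normal equations)."""
    a = sum(p[0] * p[0] for p in P)
    b = sum(p[0] * p[1] for p in P)
    c = sum(p[1] * p[1] for p in P)
    d = sum(p[0] * t for p, t in zip(P, y))
    e = sum(p[1] * t for p, t in zip(P, y))
    det = a * c - b * b
    w0, w1 = (d * c - b * e) / det, (a * e - b * d) / det
    return lambda p: w0 * p[0] + w1 * p[1]

# Toy data (assumption): an exactly linear target.
X = [float(i) for i in range(21)]
y = [2.0 * x + 1.0 for x in X]
fitters = [fit_mean, fit_line]
oof = out_of_fold_predictions(X, y, fitters, k=7)    # meta learner inputs
meta = fit_linear_meta(oof, y)
base_models = [fit(X, y) for fit in fitters]         # refit on all training data

def stacked_predict(x):
    return meta([model(x) for model in base_models])
```

Training the meta learner on out-of-fold predictions, rather than on predictions for data the base learners have already seen, keeps the combiner from rewarding base learners that merely memorize the training set.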

Model comparison
The mean absolute error (MAE), mean squared error (MSE), and root mean squared error (RMSE) are used as indicators to evaluate the model's performance and are calculated as follows:
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} \lvert y_i - \hat{y}_i \rvert, \qquad \mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2, \qquad \mathrm{RMSE} = \sqrt{\mathrm{MSE}}$$
where $y_i$ is the observed value, $\hat{y}_i$ is the predicted value, and $n$ is the number of observations.
Next, to determine whether a statistically significant difference exists between the predicted values of the stacking ensemble model and those of each individual model, pairs consisting of the stacking ensemble model and each individual model are constructed. The Anderson-Darling normality test is performed to determine whether each pair is normally distributed. The Anderson-Darling test statistic is defined as follows:
$$A^2 = -n - \frac{1}{n}\sum_{i=1}^{n} (2i - 1)\left[\ln F(Y_i) + \ln\left(1 - F(Y_{n+1-i})\right)\right]$$
where $Y_1 \le \ldots \le Y_n$ are the ordered observations and $F$ is the cumulative distribution function of the normal distribution being tested.
The Wilcoxon rank-sum test is a nonparametric alternative to the two-sample t-test that is based solely on the order in which the observations from the two samples fall. The test statistic was computed as $W = \min(W_1, W_2)$, where $W_1$ and $W_2$ are the sums of the ranks in the combined sample for Groups 1 and 2, respectively. For $n_1$ and $n_2 > 10$, the normal approximation can be used (i.e., $W$ is approximately normally distributed with mean $\mu_W = n_1(n_1 + n_2 + 1)/2$ and standard deviation $\sigma_W = \sqrt{n_1 n_2 (n_1 + n_2 + 1)/12}$). The Z-statistic can be computed as follows:
$$Z_0 = \frac{W - \mu_W}{\sigma_W}$$
where $Z_0$ is approximately normal with a mean of 0 and a variance of 1.
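The error measures and the normal approximation of the rank-sum statistic can be sketched in Python. The function names and the toy samples are assumptions. For simplicity, the sketch computes the Z-statistic from the rank sum of Group 1, which pairs naturally with the mean $n_1(n_1 + n_2 + 1)/2$, and it applies no tie correction to the standard deviation.

```python
import math

def mae(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mse(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    return math.sqrt(mse(y_true, y_pred))

def average_ranks(values):
    """Ranks start at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for t in range(i, j + 1):
            ranks[order[t]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def rank_sum_z(group1, group2):
    """Return (W1, Z) using the normal approximation of the rank-sum statistic."""
    n1, n2 = len(group1), len(group2)
    ranks = average_ranks(list(group1) + list(group2))
    w1 = sum(ranks[:n1])
    mu = n1 * (n1 + n2 + 1) / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return w1, (w1 - mu) / sigma

# Toy values (assumption); real use needs n1 and n2 above 10 for the approximation.
errors = (mae([1, 2, 3], [2, 2, 5]), mse([1, 2, 3], [2, 2, 5]), rmse([1, 2, 3], [2, 2, 5]))
w1, z0 = rank_sum_z([1, 2, 3], [4, 5, 6])
```

The samples here are far too small for the normal approximation to be reliable; they only make the arithmetic easy to follow by hand.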

Data description
In this study, among the electricity sector (IPC H) patents registered with the U.S. Patent Office from April 1999 to June 2020, the abstracts of the patents held by listed companies with stock price information were used as the text analysis data. The data were preprocessed to analyze the patent text information, which is unstructured data. Specifically, words were converted to lowercase, sentences were tokenized by word, stop words such as articles that cause noise in the analysis were removed, and stemming was conducted. The stock returns and S&P 500 returns for the 30 days up to the date of patent enrollment were used as basic data for determining the marketability of patents.
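The preprocessing steps listed above can be sketched in Python. The stop-word list and the suffix-stripping rule are small illustrative stand-ins (assumptions) for a full stop-word list and a proper stemming algorithm; the function names and the sample sentence are likewise hypothetical.

```python
import re

# Tiny illustrative stop-word list (assumption); real lists are much longer.
STOP_WORDS = {"a", "an", "the", "of", "and", "is", "to", "in", "for"}

def naive_stem(token):
    """Crude suffix stripping, a stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def preprocess(abstract):
    """Lowercase, tokenize by word, drop stop words, then stem."""
    tokens = re.findall(r"[a-z]+", abstract.lower())
    return [naive_stem(t) for t in tokens if t not in STOP_WORDS]

tokens = preprocess("The battery charging circuits of an electric vehicle")
```

The resulting token lists, one per abstract, form the corpus on which the topic model is trained.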

Patent text analysis
A total of 27,464 patent documents were used for analysis, and the abstract information from each patent was used for text analysis. The average number of words in each document was 67, and the maximum and minimum word counts were 3,576 and 5, respectively. The standard deviation of the number of words by document was 49.
The words and their weights for each topic are listed in Table 3. The topic relevance of each document is presented in Table 4. The rows of Table 4 indicate the patent document number, and the columns indicate the dominant topic for each document and the probability of being included in the individual topics. The probability values in the topic were matched with the marketable value of the patent and used for ensemble learner training. The mean and standard deviation were 0.012285 and 0.037705, respectively.
The number of topics was fixed to calculate the optimal passes, and the perplexity per pass was calculated. As displayed in Fig 3, the perplexity score continuously decreases as the number of topics increases. However, Fig 4 confirms that the coherence value increased when the number of topics reached 80. The semantic consistency increases when there are 80 topics, and accordingly, 80 was selected as the number of topics. Table 5 presents the calculation results of variable importance using the random forest technique. The importance was the highest in Topic 54, followed by Topic 1 and Topic 15.
Table 3. Words and their weights for each topic.

Patent market value extraction
To calculate the marketable value of a patent using the event study methodology, the corporation stock price return and S&P 500 return data for the past 30 days were collected based on the registration date of each patent document. A regression analysis was performed by setting the S&P 500 return as an independent variable and the corporation stock price return as a dependent variable. Based on the coefficient information derived from the regression analysis, the abnormal return value of the patent registration date was calculated. Table 6 is a sample of abnormal return calculation results. Fig 5 is the regression analysis result for calculating the abnormal return of the first eight patents among all patents. Each point is the stock return and S&P 500 return value by date, and the solid line is the regression line for each point. Such a regression analysis was performed for each patent document and was performed a total of 27,464 times. Table 7 presents the basic statistics on the calculation results of the abnormal returns for each subtechnical field for all patent documents.

Base learner and ensemble model setting
In this study, tuning focused on the base learners, and hyperparameter tuning was conducted by grid search to limit overfitting of the ensemble model to the validation data. The variables subject to base learner tuning were limited to the number of trees (random forest), epochs (MLP and CNN), and nodes (MLP and CNN). The training set was divided into seven folds to calculate the prediction values of the base learners for meta learner training. In addition, each fold was subdivided into sub-training and sub-validation sets, and the optimal hyperparameters of the base learners were calculated for each fold. Table 8 presents the hyperparameter tuning results of the base learner based on the random forest technique. For Fold 1, as the error rate decreased with an increasing number of trees but increased when the number of trees exceeded 1400, the optimal number of trees was selected as 1400. Tables 9 and 10 present the hyperparameter tuning results of the base learners based on MLP and CNN, respectively. For MLP and CNN, the numbers of nodes and epochs needed to be determined, so hyperparameter tuning was conducted by building a grid over these two variables. First, the optimal epoch was calculated for each node count, and then the node count with the lowest error rate was selected as optimal. For Fold 1 of MLP, the lowest error rate was achieved when the number of nodes was 300. The CNN was tuned using the same process, and for Fold 1, the optimal number of nodes was set to 100. Table 11 summarizes the hyperparameter tuning results of the meta learner (SVR), which takes the prediction values of the base learners (random forest, MLP, and CNN) as input. A dataset was built by gathering the prediction values of each fold, which was then divided into a sub-training set and a sub-validation set to perform a grid search. For SVR, the regularization parameter was the tuning target, and the optimal value was calculated as 1.
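The two-stage grid search over nodes and epochs can be sketched in Python. The function `toy_error` is a hypothetical stand-in for training a model with the given setting and measuring its error on the sub-validation set. Its shape, the grid values, and the function names are assumptions chosen only to make the example runnable.

```python
def two_stage_grid_search(node_grid, epoch_grid, validation_error):
    """Pick the best epoch for each node count, then the best node count."""
    best_per_node = {}
    for nodes in node_grid:
        errors = {epochs: validation_error(nodes, epochs) for epochs in epoch_grid}
        best_epochs = min(errors, key=errors.get)
        best_per_node[nodes] = (best_epochs, errors[best_epochs])
    best_nodes = min(best_per_node, key=lambda n: best_per_node[n][1])
    best_epochs, best_error = best_per_node[best_nodes]
    return best_nodes, best_epochs, best_error

def toy_error(nodes, epochs):
    """Hypothetical validation error surface with its minimum at (300, 20)."""
    return (nodes - 300) ** 2 / 1e4 + (epochs - 20) ** 2 / 1e2 + 0.04

best_nodes, best_epochs, best_error = two_stage_grid_search(
    [50, 100, 300, 500], [10, 20, 30], toy_error
)
```

In the setting described above, this search would be repeated for each of the seven folds, with `validation_error` evaluated on that fold's sub-validation set.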
Table 12 summarizes the hyperparameters of the base learner and meta learner used in the ensemble model. MSE was used as a loss function in the random forest, MLP, and CNN models. Gini was employed as a criterion of the random forest model, and Adam was used as the optimizer of MLP and CNN. For the CNN, the filter, kernel, and pool sizes were set as 32, 2, and 2, respectively. For SVR, which was used as the meta learner, the kernel used was linear and the cache size was set as 200.

A control group is needed to assess the predictive power of the ensemble model. Linear regression, random forest, MLP, and CNN were used as the control group, and hyperparameter tuning was performed for the random forest, MLP, and CNN models, which require it. Table 13 presents the hyperparameter tuning results of the control group models. Because no folds needed to be built for the control group, the training set was used for learning. The optimal number of trees in the control group random forest was 400. The optimal numbers of nodes in MLP and CNN were 300 and 50, respectively. Fig 6 compares the optimal epoch calculation process of the MLP and CNN control group models using the training and validation sets.
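The two-step selection used for MLP and CNN (first the best epoch for each number of nodes, then the number of nodes with the lowest error) can be sketched as below. The function name and the validation errors are hypothetical and purely illustrative; they are not values from Table 13.

```python
def select_nodes_and_epochs(val_errors):
    """Pick the best epoch per node count, then the best node count.

    val_errors maps a node count to a list of validation errors,
    one entry per epoch (index 0 is epoch 1).
    """
    best_per_nodes = {}
    for nodes, errors in val_errors.items():
        best_idx = min(range(len(errors)), key=errors.__getitem__)
        best_per_nodes[nodes] = (best_idx + 1, errors[best_idx])
    best_nodes = min(best_per_nodes, key=lambda n: best_per_nodes[n][1])
    best_epoch, best_error = best_per_nodes[best_nodes]
    return best_nodes, best_epoch, best_error


# Made-up validation errors for illustration only.
illustrative_errors = {
    50: [0.90, 0.70, 0.65, 0.68],
    100: [0.85, 0.66, 0.60, 0.63],
    300: [0.80, 0.58, 0.61, 0.66],
}
choice = select_nodes_and_epochs(illustrative_errors)
```

With these made-up numbers, `choice` is `(300, 2, 0.58)`: 300 nodes reach their lowest validation error at epoch 2, and that error is lower than the best error of the other node counts.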

Experiment results
The RF, MLP, and CNN models were separately trained with equivalent training data to provide a comparison for the predictive power of the SVR-based stacking ensemble model. Then, the predictive power of each model was compared. Table 14 illustrates the error results for each model. As a result of the error analysis, the stacking ensemble model exhibited the lowest error in all error measures. Based on the RMSE, the predictive power of the stacking ensemble model was 1.013 times higher than that of the linear regression model, 1.033 times higher than that of the random forest model, 1.194 times higher than that of the MLP model, and 1.062 times higher than that of the CNN model. Table 15 lists the Anderson-Darling normality test results for each pair. As a result of testing, all pairs exhibited non-normality. Table 16 presents the Wilcoxon rank-sum test results of each pair. The predicted values of the stacking ensemble and linear regression models (Pair 1) exhibited a statistically significant difference, and Pairs 2, 3, and 4 also demonstrated statistically significant differences. Therefore, the predictive power of the ensemble model was superior to that of the other single models, and the difference in predictive power from the single models was statistically significant.
Fig 7. https://doi.org/10.1371/journal.pone.0257086.g007
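For readers who wish to reproduce this kind of comparison, the error measures and a rank-sum test can be sketched as follows. This is an illustrative sketch under stated assumptions, not the code used in this study: the function names are hypothetical, and the rank-sum test uses the large-sample normal approximation without a tie correction in the variance.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(mse(y_true, y_pred))


def average_ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_test(a, b):
    """Wilcoxon rank-sum z statistic and two-sided p-value (normal approx.)."""
    n1, n2 = len(a), len(b)
    ranks = average_ranks(list(a) + list(b))
    w = sum(ranks[:n1])  # rank sum of the first sample
    expected = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - expected) / sd
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p


# Tiny made-up samples for illustration only.
z_stat, p_value = rank_sum_test([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
```

The rank-based test is a reasonable choice here because it does not assume normality, which the Anderson-Darling results reject for all pairs.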

Conclusion
In this study, the patent text analysis methodology for patent valuation was combined with an event study. Among the event study methodologies, an analysis was performed based on a market model, and the calculated abnormal return was used as the market value of each patent. In the patent text analysis, LDA-based topic modeling was used to capture the semantic characteristics of the patent. Perplexity and coherence analyses were used to calculate the optimal number of topics. The coherence analysis found the optimal number of topics to be 80, which indicates that semantic consistency was highest when all patent documents were divided into 80 categories.
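As a reminder of how a market-model abnormal return is obtained, a minimal sketch is given below. It assumes the standard market model, in which the normal return is a linear function of the benchmark index return estimated by ordinary least squares over an estimation window; the function names, the window, and the return values are hypothetical and do not reproduce the data or window lengths of this study.

```python
from statistics import mean


def fit_market_model(stock_returns, market_returns):
    """OLS estimates (alpha, beta) of R_stock = alpha + beta * R_market."""
    mr, ms = mean(market_returns), mean(stock_returns)
    sxx = sum((m - mr) ** 2 for m in market_returns)
    sxy = sum((m - mr) * (s - ms)
              for m, s in zip(market_returns, stock_returns))
    beta = sxy / sxx
    alpha = ms - beta * mr
    return alpha, beta


def abnormal_return(stock_return, market_return, alpha, beta):
    """Actual return minus the normal return implied by the market model."""
    return stock_return - (alpha + beta * market_return)


# Made-up estimation-window returns for illustration only.
market_window = [0.010, -0.020, 0.015, 0.000, -0.005]
stock_window = [0.001 + 1.2 * m for m in market_window]
alpha, beta = fit_market_model(stock_window, market_window)

# Hypothetical event-day returns (e.g., the patent registration date).
ar_event = abnormal_return(0.030, 0.010, alpha, beta)
```

In this toy example the fitted parameters recover the values used to generate the window (alpha near 0.001, beta near 1.2), and the event-day abnormal return is the part of the 3% stock return that the index movement does not explain.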
In addition, the marketable value of the patent was estimated using the SVR-based ensemble model. The RF, MLP, and CNN models were used as the base learners of the ensemble model. Considering the ease of interpretation, the SVR was used as the ensemble learning algorithm. The RF, MLP, and CNN were trained separately using equivalent training data to determine the predictive power of the ensemble model. After comparing the predictive power of each model using the MAE, MSE, and RMSE values, the predictive power of the ensemble model was the highest. The base learners with high predictive power for each fold were different. However, the ensemble model trained on the base learners' predicted values exceeded the predictive power of each of the single models, which indicates that the predictive power of the ensemble model is more stable than that of the single model.
To check whether the predicted values of the stacking ensemble model and those of the individual models exhibited statistically significant differences, the Anderson-Darling normality and Wilcoxon rank-sum tests were performed. The analysis revealed that the predicted values of the stacking ensemble model differed significantly from those of the single models based on RF, MLP, and CNN. Therefore, the predictive power of the ensemble model was superior to that of the other single models, and the difference in predictive power from the single models was statistically significant.
These models employ the stock price change at the patent registration date to calculate the target variable. However, this approach has the following limitations. First, although the approach assumes that the stock return is divided into normal and abnormal returns and that the abnormal return at the patent registration date is driven by the patent event, information unrelated to the patent that is not controlled for by the benchmark index may enter the abnormal return term. Second, information on patent development may reach the market before the patent registration date. In such a case, the increase or decrease in market value associated with the patent registration may already be reflected in advance, producing an overestimated or underestimated value. Thus, we believe that follow-up research on improving the method for extracting patent value is needed. In addition, if this methodology becomes popular, the keywords of patent specifications may be manipulated to secure a high patent valuation. In such a case, evaluation through multiple methodologies may be required.
In addition, the present study used three base learners (random forest, MLP, and CNN) to combine the calculated patent value and text information, with SVR employed as the meta learner, but various other algorithms may be considered in future studies.