Assessing future technological impacts of patents based on the classification algorithms in machine learning: The case of electric vehicle domain

Fang Han; Shengtai Zhang; Junpeng Yuan; Li Wang

doi:10.1371/journal.pone.0278523

Abstract

Introduction

Identifying the technologies that will drive technological changes over the coming years is important for the optimal allocation of firms’ R&D resources and the deployment of innovation strategies. The citation frequency of a patent is widely recognized as representative of the patent’s value. Thus, identifying potential highly cited patents is an important goal. A number of studies have attempted to distinguish highly cited patents from others based on statistical models, but a more effective and applicable method needs to be further developed.

Methods

This paper treats the prediction of later patent citations as a classification problem and proposes a novel framework based on machine learning methods. First, an index system for identifying highly cited patents is constructed from multiple factors that are believed to influence citation frequency. Second, several machine learning models are used to identify highly cited patents. The optimized model with the best generalization capability is selected to predict the future impact of newly applied patents, which may represent emerging significant technologies. Finally, we select the electric vehicle (EV) domain as a case study to empirically test the validity of this framework.

Results

The optimized support vector machine (SVM) model performs well in identifying highly cited EV patents. Technological frontiers in the EV domain are identified, which are related to the topics of information systems, batteries, stability control, wireless charging, and vehicle operation.

Discussion

The good performance in prediction accuracy and generalization capability of the method proposed in this paper verifies its effectiveness and feasibility.

Citation: Han F, Zhang S, Yuan J, Wang L (2022) Assessing future technological impacts of patents based on the classification algorithms in machine learning: The case of electric vehicle domain. PLoS ONE 17(12): e0278523. https://doi.org/10.1371/journal.pone.0278523

Editor: Zhihong (Arry) Yao, Southwest Jiaotong University, CHINA

Received: March 15, 2022; Accepted: November 17, 2022; Published: December 6, 2022

Copyright: © 2022 Han et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: All relevant data are within the manuscript and its Supporting Information files.

Funding: FH received funding for this work from the National Science Library, Chinese Academy of Sciences (http://www.las.cas.cn/), grant number E0291303. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

1 Introduction

The anticipation and forecasting of technological changes are important because technological advances have become increasingly fast and complex. The core issue is the identification of current technologies that will drive technological changes over the coming years [1]. Because a patent’s citation frequency is widely accepted as an important measure of the patent’s importance in a particular technological area [2–5], predicting the citation frequency of newly applied patents helps to anticipate upcoming technologies that will be critical to firms and governments in the current competitive environment. However, a patent generally takes a long time after publication to reach its citation peak [6], so previous studies that explored key innovations using observed citation frequency could not predict forthcoming critical technologies. In other words, predicting the citation frequency of patents or grasping the features that affect this frequency is helpful for exploring potentially important technologies. Newer techniques also emphasize the importance of citation centrality in assessing potential progress [7, 8].

Compared with citation prediction research on academic papers [9, 10], relatively few studies have explored the features of highly cited patents, and most researchers focus on citation analysis rather than citation prediction [11]. Nevertheless, some statistical models have been established to identify the features of highly cited patents and predict future citation behavior. Lee et al. proposed a stochastic patent citation analysis to assess future impacts over a particular period [1]. Yoshikane et al. [12] confirmed the significant positive correlation between the number of citations a patent receives and the number of the cited patent’s classifications. Yoo and Chung [13] applied a multiple regression model to distinguish important factors based on 17 explanatory variables in five subject fields. Their research showed that the number of pages, number of IPC codes of backward citations (i.e., cited patents; see Fig 1), number of claims, and four other indicators have impacts on the citation frequency of patents. The variables in different fields showed significant differences. With a focus on Korean medical device patents, Yoon et al. [14] used 13 technical indicators to develop a generalized linear model of the number of patent citations and found 7 influential indicators, including the number of inventors, countries for priority claims, and the number of IPC codes. Lin et al. [6] predicted citations of biotechnology patents based on regression analysis.

Fig 1. Illustration of “backward citation” and “forward citation”.

A, B, and C are three patents. If A was cited by B and B was cited by C, then A was the backward citation of B and C was the forward citation of B.

https://doi.org/10.1371/journal.pone.0278523.g001
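
The two citation directions in Fig 1 can be made concrete with a small sketch. The edge list below is a toy example (each pair means "citing patent, cited patent"), and the function name `citation_counts` is an assumption introduced here for illustration; it is not part of the original study.

```python
from collections import defaultdict


def citation_counts(edges):
    """Return (backward, forward) citation sets per patent.

    edges: iterable of (citing, cited) pairs.
    backward[p] holds the patents that p cites.
    forward[p] holds the patents that cite p.
    """
    backward = defaultdict(set)
    forward = defaultdict(set)
    for citing, cited in edges:
        backward[citing].add(cited)
        forward[cited].add(citing)
    return backward, forward


# Toy data mirroring Fig 1: B cites A, and C cites B.
edges = [("B", "A"), ("C", "B")]
backward, forward = citation_counts(edges)
print(sorted(backward["B"]), sorted(forward["B"]))
```

Counting the size of `forward[p]` restricted to citing patents published within 3 years of `p` would give a quantity analogous to the response variable used later in the paper.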

These studies have shed light on the factors that are correlated with citation frequency. However, most studies have not comprehensively considered multiple factors. For instance, even though many previous studies have demonstrated the close relationship between science and technology [15, 16], studies have not included science-related indices, such as the number of scientific citations, the diversity of the cited scientific categories, and other variables linking science and technology, among the influencing factors. Furthermore, most previous studies have explained and predicted patent citation frequency based on regression analysis. However, patent citations often have obvious long-tail effects, which make such predictions necessarily skewed. Thus, a regression model is not well suited to this task [17]. Given this background, this study considers a broader scope of factors that may affect citation frequency. We regard the prediction of citation frequency as a classification problem and calculate the contribution of each variable based on different machine learning methods. The optimized model with the best performance is selected to predict potential highly cited patents. Compared with the regression method, the classification algorithm used in this paper performs much better.

This paper is organized as follows. First, we describe the source data used in the analyses and explicate the methodology, including the variables representing the quantitative characteristics of patents and the machine learning models. Second, we present the results of the analyses, in which the optimized model is selected and the contributions of the factors to citation frequency are calculated based on the model and are used to predict potential highly cited patents. We use the EV domain as a case to illustrate this method and verify its effectiveness. The last section concludes.

2 Materials and methods

2.1 Data

The patent data were downloaded from the patent search tool PatSnap, and keywords related to the research domain were searched in US patents. The obtained data included the patent number, associated classifications, numbers of inventors, number of claims, and other useful information that may contribute to the citation frequency. The citation frequency within 3 years could also be obtained directly from PatSnap.

2.2 Index system of highly cited patents

The response variable in this paper is the citation frequency of the patent within 3 years (Fcited), which has been shown to be representative of core technology [18]. Our discussion in Section 3.2 also supports using this indicator as a measure of patent impact in the EV domain.

Drawing on previous studies and on the principles of comprehensiveness, measurability, and availability at the time of publication, we adopted 17 explanatory variables covering the characteristics of patents, knowledge sources, and inventors, all of which have been recognized as commonly influencing citation frequency. They are described below.

2.2.1 Characteristics of patents.

Number of associated classifications (VC): the number of International Patent Classification (IPC) codes attributed to the patent, which can indicate the breadth of technological knowledge of the patent. It has been proven to be positively related to citation frequency [11]. A previous study noted that examining the lower-level classifications could produce a higher value of the correlation coefficient with the citation frequency [1]. Therefore, this study utilizes the number of IPC codes at the bottom level in the IPC.

Number of new associated classifications (VCnew): the number of IPC codes that first appeared in the domain. Previous studies have shown a correlation between innovation and new technological knowledge [19, 20]. Thus, in this paper, we use the new IPC codes (at the bottom level) to represent the new knowledge of the patent.

Number of new combinations of associated classifications (CCnew): the number of new combinations of IPC codes (at the bottom level) in the patent. Based on technological convergence theory, the fusion of different technological categories can lead to disruptive technology [21, 22]. Thus, this index is expected to be correlated with citation frequency.

Number of patents in patent family (FM): a commonly used index in the assessment of patent value. A previous study showed that patents representing large international patent families are particularly valuable [23].

In addition, referring to previous studies, this paper adopts the number of figures (FG), number of pages (PG) and number of claims (CL), which can represent quantities of descriptions or can be specific to patent applications, as the explanatory variables [11].
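
The three classification-based indices above (VC, VCnew, CCnew) can be sketched as follows. This is a minimal illustration under stated assumptions: patents are processed in chronological order, a "new combination" is taken to be an unordered pair of IPC codes that has not co-occurred in any earlier patent of the dataset, and the IPC codes and the function name `classification_indices` are hypothetical. The derivation actually used in the study is the one given in Table 1.

```python
from itertools import combinations


def classification_indices(patents):
    """Compute VC, VCnew and CCnew for patents given in chronological order.

    patents: list of (patent_id, [bottom-level IPC codes]) tuples.
    Returns {patent_id: {"VC": int, "VCnew": int, "CCnew": int}}.
    """
    seen_codes = set()   # IPC codes already observed in the domain
    seen_pairs = set()   # code pairs that already co-occurred (assumed definition)
    result = {}
    for patent_id, codes in patents:
        unique = sorted(set(codes))
        pairs = set(combinations(unique, 2))
        result[patent_id] = {
            "VC": len(unique),
            "VCnew": len(set(unique) - seen_codes),
            "CCnew": len(pairs - seen_pairs),
        }
        seen_codes.update(unique)
        seen_pairs.update(pairs)
    return result


# Hypothetical IPC codes, for illustration only.
patents = [
    ("P1", ["B60L 53/12", "H02J 7/00"]),
    ("P2", ["B60L 53/12", "H01M 10/44", "H02J 7/00"]),
]
indices = classification_indices(patents)
print(indices["P2"])
```

In this toy input the second patent introduces one new code and two new code pairs, while the pair it shares with the first patent is not counted again.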

2.2.2 Characteristics of the knowledge source.

The knowledge source can be divided into two aspects: technological knowledge from backward citations and scientific knowledge from cited papers. The related indices we selected are as follows.

Number of backward patent citations (VTciting): previous studies have found that highly cited patents have more citations than others [24]. This could be because inventing is a recombinant process, and the value of invention is associated with how many prior inventions are considered [25]. Thus, this study takes the number of backward citations as the explanatory variable.

Number of associated classifications with backward patent citations (TCciting): this indicator reflects the diversity of technological knowledge sources. Many researchers have found that technological diversity is correlated with the novelty of patents, which is related to citation frequency. Examples include Rosenkopf and Nerkar [19] and Yoshikane et al. [12]. Their studies showed that breakthrough inventions often cite more classifications than others. Based on these studies, this study calculates the number of citing classifications at the subclass level (the fourth layer in IPC).

Novelty of backward patent citations (NTciting): A previous study showed that the novelty of backward citations is related to patent value [26]. Thus, a patent that cites more newly published patents is expected to be cited more frequently. We adopt the proportion of backward citations published within the preceding 5 years as the measure of the novelty of the technological knowledge source.
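
A minimal sketch of the NTciting calculation follows. It assumes that novelty is measured on publication years, that the 5-year window is inclusive, and that a patent without backward citations receives a value of 0; the function name and the example years are hypothetical.

```python
def backward_citation_novelty(publication_year, cited_years, window=5):
    """Share of backward citations published at most `window` years before the citing patent."""
    if not cited_years:
        return 0.0  # assumed convention for patents with no backward citations
    recent = sum(1 for year in cited_years if 0 <= publication_year - year <= window)
    return recent / len(cited_years)


# A patent published in 2015 citing patents from 2014, 2012, 2008 and 2001.
novelty = backward_citation_novelty(2015, [2014, 2012, 2008, 2001])
print(novelty)
```

Two of the four cited patents fall inside the window, so the novelty of this toy patent is one half.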

The relationship between scientific knowledge development and technological development is widely recognized as one of the most important and complex aspects of technological evolution. Gittelman and Kogut [27] established that the number of scientific citations is correlated with technological innovation. Han and Magee [28] confirmed that the structure of scientific citations, measured by scientific categories, is related to the technological structure. Scientific references in patents can be added by both the applicants and the examiners [29]. In this paper, the scientific citations from both sources are considered to have the same function of linking science with technology. Scientific citations represent a bridge between science and technology, but they do not necessarily indicate the direct scientific basis of the patent [30]. We adopt the number of nonpatent literature references (NPL), the number of scientific citations (VSciting), the number of categories associated with the scientific citations (SCciting), and the novelty of scientific citations (NSciting) as the quantities specific to the scientific knowledge source. In this paper, the scientific citations are papers indexed by Web of Science, which are identified using a MATLAB tool. The journal classifications of the Journal Citation Report (JCR) are used to define the scientific categories.

2.2.3 Characteristics of inventors.

Number of inventors (IV): A previous study found that the number of inventors has a positive relation with citation frequency in most IPC sections [9]. This indicator can reflect the R&D input and the breadth of knowledge of the inventors to some degree.

In addition, many studies have shown the important role of personal capabilities in patent value [31, 32]. Therefore, this paper uses the average number of patents in the domain across all inventors (VPaverage) and the number of patents of the first inventor (VPfirst) as explanatory variables to measure personal capability.

Table 1 details the procedures employed to derive these indices from the data.

Table 1. Derivation procedure for each index.

https://doi.org/10.1371/journal.pone.0278523.t001

2.3 Machine learning model building

In this paper, patent citation prediction is regarded as a classification problem. Many machine learning methods are available for classification and prediction. This paper selects three classic machine learning classification methods that have been successfully applied in various fields: support vector machine (SVM), random forest (RF), and extreme gradient boosting (XGBoost).

2.3.1 SVM.

SVM is a supervised data classification technique developed by Cortes et al. [33]. In its original form, it solves only binary classification problems. However, most practical applications are multiclass problems, so two extension strategies have been designed: "one-against-one" and "one-against-the-rest". The SVM method has a strong theoretical basis, and the extremum solution obtained is the global optimal solution, so the SVM model generalizes well to unknown samples and has been widely used in various fields.
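
The difference between the two multiclass strategies can be illustrated by listing the binary subproblems each one creates. The sketch below only enumerates the subproblems and does not train any SVM; the function names are hypothetical, and the class labels C0, C1, and C2 are those defined later in Section 3.2.

```python
from itertools import combinations


def one_against_one(classes):
    """Each binary task separates one pair of classes: k * (k - 1) / 2 tasks."""
    return list(combinations(classes, 2))


def one_against_rest(classes):
    """Each binary task separates one class from all remaining classes: k tasks."""
    return [(c, tuple(other for other in classes if other != c)) for c in classes]


classes = ["C0", "C1", "C2"]
ovo_tasks = one_against_one(classes)
ovr_tasks = one_against_rest(classes)
print(len(ovo_tasks), len(ovr_tasks))
```

With three classes both strategies happen to need three binary classifiers; with five classes, one-against-one needs ten while one-against-the-rest needs five.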

2.3.2 RF.

RF, proposed by Breiman in 2001 [34], is a classifier containing multiple decision trees. RF builds several groups of mutually independent classifiers to form the final classifier structure and then combines the outputs of the individual decision trees to obtain the final classification result. RF has the advantages of handling unbalanced datasets, effectively reducing the probability of overfitting, and training quickly.

2.3.3 XGBoost.

XGBoost, proposed by Chen and Guestrin [35], is based on the boosting idea. It is a supervised, scalable tree boosting algorithm derived from the gradient boosting algorithm. During iterative optimization, XGBoost performs a second-order Taylor expansion of its loss function, so it can approximate the loss function more accurately. In addition, XGBoost has the advantages of handling missing values effectively, preventing overfitting, and improving computing speed.
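
For reference, the second-order approximation of the objective at boosting round t is usually written as follows, in the notation of Chen and Guestrin [35]. The symbols are not defined elsewhere in this paper and are introduced here only for illustration: l is the loss function, f_t is the tree added at round t, Ω is the regularization term, and g_i and h_i are the first and second derivatives of the loss with respect to the previous prediction.

```latex
\mathcal{L}^{(t)} \simeq \sum_{i=1}^{n} \left[ l\!\left(y_i, \hat{y}_i^{(t-1)}\right)
  + g_i f_t(x_i) + \tfrac{1}{2} h_i f_t^{2}(x_i) \right] + \Omega(f_t),
\qquad
g_i = \partial_{\hat{y}_i^{(t-1)}}\, l\!\left(y_i, \hat{y}_i^{(t-1)}\right), \quad
h_i = \partial^{2}_{\hat{y}_i^{(t-1)}}\, l\!\left(y_i, \hat{y}_i^{(t-1)}\right)
```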

The accuracy, precision, recall, and F1 score [36] are used to evaluate the performance of the models in this paper. The influence of each explanatory variable on citation frequency is calculated through the optimized machine learning model with the best generalization ability, which can then be used to predict potential highly cited patents among newly published patents. The overall framework is shown in Fig 2.

Fig 2. Potential highly cited patent identification model based on machine learning.

https://doi.org/10.1371/journal.pone.0278523.g002
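
The four evaluation measures named above can be computed as in the following sketch. The paper does not state how precision, recall, and F1 are averaged across classes, so macro averaging is an assumption here, as are the function name and the toy labels.

```python
def classification_report(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall and F1 for multiclass labels."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precisions, recalls, f1_scores = [], [], []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1_scores.append(f1)
    n = len(labels)
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / n,  # macro average (assumed)
        "recall": sum(recalls) / n,
        "f1": sum(f1_scores) / n,
    }


# Toy labels, for illustration only.
y_true = ["C0", "C0", "C1", "C1", "C2", "C2"]
y_pred = ["C0", "C1", "C1", "C1", "C2", "C0"]
report = classification_report(y_true, y_pred)
print(report)
```

Macro averaging weights every class equally, which matters when the classes differ in size; a weighted average would instead weight each class by its number of true instances.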

3 Empirical analysis

3.1 Data

This study takes the electric vehicle domain as an example to verify the validity of the proposed identification model. We searched the patent data with the keywords “electric vehicle” and “electric car” using the patent search tool PatSnap, which returned 13732 US patents from 1976 to 2021 (the EV patent data can be downloaded from S1 File). Since the response variable is the citation frequency within 3 years, we selected the patents published between 2000 and 2016, comprising 5665 patents whose citation numbers are already known, to study the influences of the explanatory variables. We took the top 20% of patents ranked by citation frequency within 3 years as the highly cited patents. In total, 1149 patents with a citation frequency ≥10 were selected as highly cited patents and, after classification, used to test the performance of the machine learning models.
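
The top-20% selection can be sketched as a threshold on the ranked citation counts. The function name and the ten toy counts are hypothetical; the last line only reproduces the share implied by the figures reported above (1149 of 5665 patents).

```python
def highly_cited_threshold(citation_counts, top_share=0.20):
    """Citation count of the last patent inside the top `top_share` of the ranking."""
    ranked = sorted(citation_counts, reverse=True)
    cutoff_rank = max(1, int(len(ranked) * top_share))
    return ranked[cutoff_rank - 1]


# Hypothetical 3-year citation counts for ten patents.
counts = [0, 1, 1, 2, 3, 4, 6, 9, 12, 30]
threshold = highly_cited_threshold(counts)
highly_cited = [c for c in counts if c >= threshold]

# Share implied by the reported numbers: 1149 of 5665 patents have Fcited >= 10.
reported_share = 1149 / 5665
print(threshold, highly_cited, round(reported_share, 4))
```

Because every patent tied at the threshold is kept, the selected share can slightly exceed 20%, which may be why the reported set corresponds to about 20.3% of the 5665 patents.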

3.2 Construction of the highly cited patent identification indicator system

As mentioned above, this study uses the citation frequency of a patent within 3 years as the response variable. To show the citation relationships in the EV domain and verify the feasibility of the selected indicator, we applied Gephi [37] and VosViewer [38] to generate a citation-based map of the EV patents from their citation relations with all other patents from 2000 to 2021. For better visualization, we display only the 5665 EV patents published between 2000 and 2016 (Fig 3). The 1149 highly cited patents are marked in red, and the non-highly cited patents are marked in green. Fig 4 shows the core part of the map. The maps show that the patents with high citation frequency within 3 years (the red nodes) hold the central position over the whole period, which means that they acquire more citations than the others. Fig 5 further supports this view: after the first 3 years, the patents that were highly cited within 3 years still acquire more citations than other patents published in the same year. The Pearson correlation coefficient between the citation frequency within 3 years and the citation frequency after 3 years is 0.61 for all EV patents published between 2000 and 2016, which indicates a relatively strong relationship. This result also supports using the citation frequency within 3 years as a proxy for the future impact of a patent in the EV domain.

Fig 3. The citation-based map of the EV patents.

Patents highly cited within 3 years are marked in red, and non-highly cited patents are marked in green.

https://doi.org/10.1371/journal.pone.0278523.g003

Fig 4. The central part of the citation-based map of the EV patents.

Patents highly cited within 3 years are marked in red, and non-highly cited patents are marked in green.

https://doi.org/10.1371/journal.pone.0278523.g004

Fig 5. The average citation frequency of highly cited patents and non-highly cited patents in 2000–2016.

https://doi.org/10.1371/journal.pone.0278523.g005

We defined three patent classes based on the number of forward citations within 3 years, which represent high impact at different levels (C2 class patents are the most valuable, C1 class patents have “mediumly high” value, and C0 class patents have “normally high” value). Considering the balance and feasibility of the data requirements, the median of the citation frequency in each class is used to decide the threshold values: the difference between the medians of C0 and C1 should be approximately equal to the difference between the medians of C1 and C2. Table 2 shows the detailed data classification criteria.

Table 2. Data classification criteria.

https://doi.org/10.1371/journal.pone.0278523.t002
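
The median-based criterion can be checked with a short sketch. Only the lower bound of 10 citations comes from the text; the cut-offs `middle=15` and `upper=25`, the toy citation counts, and the function names are hypothetical and are not the values in Table 2.

```python
from statistics import median


def assign_class(fcited, lower=10, middle=15, upper=25):
    """Map a 3-year citation count to C0/C1/C2, or None if not highly cited.

    `middle` and `upper` are hypothetical cut-offs used for illustration only.
    """
    if fcited < lower:
        return None
    if fcited < middle:
        return "C0"
    if fcited < upper:
        return "C1"
    return "C2"


def median_gaps(counts, **cutoffs):
    """Return (median(C1) - median(C0), median(C2) - median(C1))."""
    groups = {"C0": [], "C1": [], "C2": []}
    for count in counts:
        label = assign_class(count, **cutoffs)
        if label is not None:
            groups[label].append(count)
    medians = {label: median(values) for label, values in groups.items()}
    return medians["C1"] - medians["C0"], medians["C2"] - medians["C1"]


# Hypothetical citation counts; the value 3 falls below the highly cited bound.
counts = [3, 10, 11, 12, 13, 14, 15, 18, 20, 22, 24, 25, 27, 28, 29, 40]
gaps = median_gaps(counts)
print(gaps)
```

Candidate cut-offs could be compared by how close the two gaps are to each other; the sketch raises an error if a class ends up empty, which would itself indicate an unsuitable choice.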

After classification, we examined the relationships among the explanatory variables, and between each of them and the response variable, using the Pearson correlation test. Table 3 shows part of the results. Strong correlations are observed for 4 pairs of variables: VC and CCnew, FG and PG, VTciting and TCciting, and VPaverage and VPfirst. Given the multicollinearity indicated by these correlation coefficients, we retained only one variable from each pair in the following analyses. VC, PG, TCciting, and VPaverage were selected because they show slightly stronger correlations with the response variable. We thus obtained 13 explanatory variables.

Table 3. Correlations between variables.

https://doi.org/10.1371/journal.pone.0278523.t003
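
The screening step described above can be sketched as follows: for each strongly correlated pair, the variable with the stronger correlation with the response variable is kept. The data values and the function names are hypothetical, and the list of correlated pairs is supplied by hand rather than derived from a correlation threshold, since the paper does not state one.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def drop_collinear(features, target, pairs):
    """For each correlated pair, keep the variable more correlated with the target."""
    dropped = set()
    for first, second in pairs:
        r_first = abs(pearson(features[first], target))
        r_second = abs(pearson(features[second], target))
        dropped.add(second if r_first >= r_second else first)
    return [name for name in features if name not in dropped]


# Toy values, for illustration only.
target = [1, 2, 3, 4, 5]
features = {
    "FG": [1, 3, 2, 5, 4],
    "PG": [2, 4, 6, 8, 10],
    "CL": [5, 3, 4, 1, 2],
}
kept = drop_collinear(features, target, pairs=[("FG", "PG")])
print(kept)
```

In this toy input PG is perfectly correlated with the target and FG less so, so FG is dropped and PG is retained, mirroring the choice reported for the FG and PG pair.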

In general, all explanatory variables except VC and FM have positive Pearson correlation coefficients with Fcited. Compared with the other variables, PG and FG show relatively strong correlations with Fcited, followed by TCciting and CL. However, correlations between the explanatory variables and Fcited are generally weak: most of the absolute values of the correlation coefficients with Fcited are less than 0.20. Fig 6 visualizes the relationships between each of the 13 explanatory variables and Fcited.

Fig 6. Relationships between each of the explanatory variables and Fcited.

https://doi.org/10.1371/journal.pone.0278523.g006

Finally, we obtained the highly cited patent identification indicator system of the EV domain (Table 4), which is a 1149×15 matrix. The first column is the publication number (PN) of the patent, columns 2 to 14 are the 13 explanatory variables used as input data for the machine learning models, and the last column gives the class of the patent, which is the output data for the machine learning models.

Table 4. An example of the highly cited patent identification indicator system.

https://doi.org/10.1371/journal.pone.0278523.t004

3.3 Comparison of the performance of machine learning models

For the SVM, RF, and XGBoost models, we randomly selected 80% of the data for training and used the remaining 20% for performance testing. The parameters of each model are optimized based on performance, and the accuracy, precision, recall, and F1 score are used to measure the performance of the models. Table 5 shows that all the performance measures of the optimized models are greater than 0.75, which indicates the validity and feasibility of our method.

Table 5. Classification performance of each machine learning model.

https://doi.org/10.1371/journal.pone.0278523.t005
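
The random 80/20 split can be sketched with the standard library alone. The fixed seed, the rounding of the test-set size, and the function name are assumptions made for reproducibility of the illustration; the rows are integer stand-ins for the 1149 highly cited patents.

```python
import random


def train_test_split(rows, test_share=0.20, seed=0):
    """Shuffle row indices reproducibly and split rows into training and test parts."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_test = round(len(rows) * test_share)
    test_indices = set(indices[:n_test])
    train = [row for i, row in enumerate(rows) if i not in test_indices]
    test = [row for i, row in enumerate(rows) if i in test_indices]
    return train, test


rows = list(range(1149))  # stand-ins for the rows of the indicator matrix
train, test = train_test_split(rows)
print(len(train), len(test))
```

A stratified split, which preserves the share of C0, C1, and C2 patents in both parts, would be a reasonable alternative when the classes are unbalanced; the paper does not say which variant was used.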

For comparison with previous methods, this study also fits regression models, namely multiple linear regression and multiple logistic regression, which have been widely used in previous studies [6, 12–14]. The results show that the linear regression is not statistically significant, indicating that it does not fit the data well. The performance of the logistic regression model (with parameters tuned by Bayesian optimization) is relatively poor compared with the models used in this study. Fig 7 shows its performance under 10-fold cross-validation. The results further suggest that regression models are not suitable for citation prediction [17].

Fig 7. The performance of logistic regression model.

https://doi.org/10.1371/journal.pone.0278523.g007
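
The 10-fold cross-validation scheme used for Fig 7 partitions the data as in the sketch below. It assumes the same 1149-row indicator matrix as input and uses contiguous folds, so the rows should be shuffled beforehand; the function name is hypothetical.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train_indices, validation_indices) for k contiguous folds."""
    # The first (n_samples % k) folds receive one extra sample.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, validation
        start += size


folds = list(k_fold_indices(1149, k=10))
print([len(validation) for _, validation in folds])
```

Each row is used for validation exactly once and for training in the other nine rounds, and the reported performance is typically the average over the ten rounds.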

3.4 Identification of potential highly cited patents

3.4.1 The generalization ability of the machine learning models.

This paper uses patents published in 2017 and 2018 to verify the generalization ability of the machine learning models. Following the methods of Section 3.2, the index system for these 1840 patents is constructed and fed to the machine learning models to predict their citation frequency after 3 years. Compared with the actual data, the accuracies of the SVM, RF, and XGBoost models are 0.998, 0.997, and 0.986, respectively, which further supports the feasibility of our method. The SVM model, which has the best generalization ability, is used to construct the index system of highly cited patents (Table 6) and to predict potential highly cited patents published from 2019 to 2022, which are representative of the technological frontier of this domain.

Table 6. The index system of highly cited patents in the electric vehicle domain based on the SVM model.

https://doi.org/10.1371/journal.pone.0278523.t006

Table 6 shows that the indices that represent quantities of description in a patent document, such as the number of pages and the number of claims, have the most significant influence on citation frequency. Compared with the characteristics of the patent (such as the number of new classifications), the indicators that represent the characteristics of technological knowledge sources (e.g., the number of associated classifications with backward patent citations and the novelty of backward patent citations) contribute more to citation frequency. The results are basically consistent with those of Yoshikane [11]. They also agree with the features of highly cited scientific papers, for which the number of backward citations has been found to be strongly correlated with citation frequency. For example, Kostoff found that highly cited papers in the journal Lancet have a 33% to 300% greater average number of references per article than those that are less frequently cited [39]. Snizek et al. found a strong relationship between the number of backward citations and forward citations for papers in the DNA domain [40].

It is also interesting to note that, in contrast to the important effect of an author’s reputation on the citation frequency of a paper [41], the indices related to the “inventor” show only moderate effects on the citation frequency of patents. There are 2339 inventors who have highly cited patents from 2000 to 2016 in this study, but only 71 of them have earlier patents published before 2000 in the EV domain. Thus, most of the inventors of highly cited patents between 2000 and 2016 are not “famous” people. However, after comparing the citation frequency (within 3 years) of the 71 inventors’ 133 earlier patents with that of all earlier patents, we found that the 71 inventors’ patents tended to be cited more on average: 36.84% of the earlier patents by the 71 inventors were highly cited (citation frequency ≥10), compared with 19.30% of all earlier patents. These results show that there is a relationship between the reputation of the inventors and citation frequency, but the relationship is not as close as for scientific papers.

Finally, the scientific source indices, which include the number of scientific citations and the number of scientific categories, show weak relationships with the citation frequency. This might be because patents rarely cite scientific papers. For example, in this study, only 392 patents cite 1732 scientific papers. In addition, a previous study showed that references to nonpatent literature (NPL) are only informative about the value of pharmaceutical and chemical patents but not other technical fields [23].

3.4.2 Potential highly cited patents and frontier technologies.

Based on the optimized SVM model, we predict the potential highly cited patents by applying it to a total of 4522 newly published patents from 2019 to 2021. A total of 245 potential highly cited patents are identified, and the related topics are extracted from the titles and abstracts of the patents using the LDA model and term extraction (with Cortext). Jointly considering the coherence score [42] and the volume of data, the optimized number of topics is set to 5. The detailed keywords and topics are summarized in Table 7.

Table 7. Technological frontier topics in the EV domain.

https://doi.org/10.1371/journal.pone.0278523.t007

The battery topic covers battery framework design, battery materials, battery charging, etc. The battery pack is the most challenging item in terms of the reliability and safety of EVs, so the protection of batteries is of great importance [43]. The use of magnesium in EV batteries is also noteworthy. Compared with lithium, magnesium has some advantages, including low cost and high energy density. Thus, magnesium-lithium hybrid batteries are regarded as the most promising technologies in the EV battery domain [44]. The charging technologies include fast charging and charging connectors. A modified charging connector can cut off the power when the battery overheats and thus help avoid fire, so it is vital to the EV [45]. Fast charging is important for the current implementation of electric mobility systems and is directly related to the user experience. Improving the quality of the energy transferred by charging stations to the vehicle requires further innovation.

Intelligent information management involves the technologies of cloud systems, configuration information storage, data production, third-party applications, etc. An intelligent and connected vehicle (ICV) cloud system is regarded as a potentially synthetic solution for high-level automated driving to improve safety and optimize traffic flow in intelligent transportation [46]. Among the various technologies, a third-party application is noticeable; it can be used for the exchange of information with cloud-based processing systems and for managing user profiles for vehicles, and it is helpful to protect data privacy and secure transactions [47].

Stability control is related to insulation and fire-retardant materials, tire state detection, and braking technologies. The operation of an electric vehicle produces a significant amount of heat, which requires coolant to help control temperature and guarantee safe operation [45]. In addition, silica-based aerogel has attracted a great deal of attention due to its light weight and stable insulation performance [48]. The tire state of EVs is crucial to vehicle stability and driving safety. Related technologies include tire pressure detection and other technologies.

Wireless charging is a challenging issue in the domain of EVs. It includes the magnetic coupling design, charge monitoring, and power transmission technologies. It has the advantages of convenience and high speed and is expected to have bright development prospects. Magnetic field coupling has high reliability, flexibility, and security and has been successfully used in wireless charging in several domains [49]. However, the requirement of high transmission efficiency in the EV domain requires more innovation in the future. The charge level monitoring is directly related to the driver’s safety. Foreign object detection (FOD) is significant in wireless charging to prevent the system from overheating [50].

Vehicle operation topics include the technologies of power management, parking space detection, brake operation, and reaction operation. Because it can accept a large current, the graphene cell can be charged quickly and has excellent future prospects [51]. With the growth in EV sales, the demand for parking at charging points has vastly increased. Thus, intelligent parking management has attracted considerable attention from manufacturers [52]. In addition, the sensing system and on-board diagnostic (OBD) system are vital to the driving safety and performance maintenance of EVs. For example, the user can be quickly identified through the OBD data [53]. However, the OBD technology applied in EVs is imperfect and should be improved in the future [54].

4 Conclusion

Early identification of the technological frontier is important for the optimal allocation of enterprises’ R&D resources and the formulation of government innovation strategies. Many scholars have used bibliometric methods to identify the technological frontier, and citation analysis has been widely used. However, patents take time to accumulate citations. Existing citation analysis methods therefore cannot include newly published patents that are potentially highly cited in the set of important patents used to identify the technological frontier. This paper proposes a novel framework for identifying the technological frontier by utilizing classification algorithms based on machine learning models. The electric vehicle domain is used to verify the validity of our method.

Compared with the multiple regression model used in previous studies, the classification algorithm based on machine learning models used in our study has several advantages. First, the citation frequency of patents often shows a long-tail effect, which is not suitable for a regression model. Second, coarse-grained models perform better in fitting accuracy and have better generalization abilities. In addition, this study considers a broader scope of factors that may affect citation frequency than previous studies, which could increase the accuracy of the results. Although the additional indices we introduce, which relate to the characteristics of scientific citations, show only weak correlations with citation frequency, these correlations are positive and thus support the relationship between science and technology.
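The long-tail argument can be made concrete with a minimal sketch. It is not the pipeline used in this study: the citation counts are synthetic (drawn from a Pareto distribution), the 10% cut-off is an assumed threshold, and the function name `label_highly_cited` is hypothetical. The sketch only shows why a few heavily cited patents pull the mean well above the median, and how the counts can instead be turned into coarse class labels for a classifier.

```python
import random
import statistics


def label_highly_cited(citations, top_share=0.10):
    """Label patents in the top `top_share` of citation counts as 1, others as 0.

    Hypothetical helper: the cut-off rule is an assumption for illustration,
    not the definition of "highly cited" used in the paper.
    """
    if not citations:
        raise ValueError("citations must be non-empty")
    ranked = sorted(citations, reverse=True)
    k = max(1, int(len(ranked) * top_share))  # size of the top group
    threshold = ranked[k - 1]                 # count of the k-th most cited patent
    labels = [1 if c >= threshold else 0 for c in citations]
    return threshold, labels


# Synthetic long-tailed citation counts (assumption: Pareto-distributed).
random.seed(0)
citations = [int(random.paretovariate(1.5)) - 1 for _ in range(1000)]

mean_citations = statistics.mean(citations)
median_citations = statistics.median(citations)
threshold, labels = label_highly_cited(citations, top_share=0.10)

print("mean:", round(mean_citations, 2), "median:", median_citations)
print("threshold:", threshold, "patents labelled highly cited:", sum(labels))
```

Because patents tied at the threshold all receive label 1, the positive class can be slightly larger than the requested share. The resulting binary labels, rather than the raw counts, would then serve as the target for classifiers such as SVM, RF, or XGBoost.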

The framework we propose for studying the technological frontier is open: additional machine learning models could be used to analyze the highly cited patents in a domain and to examine the relations between different variables and citation frequency. Thus, one can predict the citation frequency of new patents once they are published and identify the technological frontier based on potential highly cited patents. This study helps to further clarify the relationships between influencing factors and citation frequency in a domain and contributes to the intelligent identification of the technological frontier in a timely and accurate manner.

This study also has some limitations. First, we derived the indices from literature reviews; future studies should introduce more factors that may affect the citation frequency of patents. Second, more machine learning models could be utilized to analyze the technological frontier, and the model with the highest classification accuracy could then be selected. In addition, to clarify the different factors that affect the citation frequency of patents in different domains, we would like to conduct further analysis of various technological domains, especially domains with significantly different technological knowledge.

Supporting information

S1 File. The patent data of EV domain.

https://doi.org/10.1371/journal.pone.0278523.s001

(XLSX)

Acknowledgments

The authors wish to acknowledge Professor Christopher L. Magee, from Massachusetts Institute of Technology, for his useful comments on the paper. We also thank the anonymous referees and the editor for their valuable suggestions in the completion of this study.

References

1. Lee CY, Cho Y, Seol H, Park Y. A stochastic patent citation analysis approach to assessing future technological impacts. Technol Forecast Soc Chang. 2012; 79:16–29.
2. Narin F, Carpenter MP, Woolf P. Technological performance assessments based on patents and patent citations. IEEE T Eng Manage. 1984; 36(2):172–183.
3. Trajtenberg M. A penny for your quotes: Patent citations and the value of innovations. The Rand Journal of Economics. 1990; 21(1):172–187.
4. Harhoff D, Narin F, Scherer FM, Vopel K. Citation frequency and the value of patented inventions. The Review of Economics and Statistics. 1999; 81(3):511–515.
5. Sato Y, Iwayama M. A study of patent document score based on citation analysis. Information Processing Society of Japan SIG Technical Report. 2006; 59:9–16.
6. Lin BW, Chung JC, Wu HL. Predicting citations to biotechnology patents based on the information from the patent documents. Int J Technol Manage. 2007; 40(1–3):87–100.
7. Singh A, Triulzi G, Magee CL. Technological improvement rate predictions for all technologies: Use of patent data and an extended domain description. Res Policy. 2021; 50:104294.
8. Triulzi G, Alstott J, Magee CL. Estimating technology performance improvement rates by mining patent data. Technol Forecast Soc Chang. 2020; 158:120100.
9. Peters HPF, van Raan AFJ. On determinants of citation scores: A case study in chemical engineering. J Am Soc Inf Sci. 1994; 45(1):39–49.
10. Bornmann L, Daniel HD. Multiple publication on a single research study: Does it pay? The influence of number of research articles on total citation counts in biomedicine. J Am Soc Inf Sci Technol. 2007; 58(8):1100–1107.
11. Yoshikane F. Multiple regression analysis of a patent’s citation frequency and quantitative characteristics: the case of Japanese patents. Scientometrics. 2013; 96:365–379.
12. Yoshikane F, Suzuki Y, Tsuji K. Analysis of the relationship between citation frequency of patents and diversity of their backward citations for Japanese patents. Scientometrics. 2012; 92(3):721–733.
13. Yoo JB, Chung YM. Analysis of factors influencing patent citations. Journal of the Korean Society for Information Management. 2010; 27(1):103–118. Korean.
14. Yoon JW, Lee CS, Lee SJ. Analysis of factors influencing patent citations: focused on Korea medical device patents. Journal of the Korean Society for Information Management. 2016; 33(2):103–133. Korean.
15. Price DJS. Is technology historically independent of science? A study in statistical historiography. Technol Cult. 1965; 6(4):553–568.
16. Nelson RR, Rosenberg N. Technical innovation and national systems. In: Nelson RR, editor. National innovation systems: A comparative study. Oxford: Oxford University Press; 1993.
17. Carterette B, Diaz F, Castillo C, Metzler D, editors. Will this paper increase your H-index? Scientific impact prediction. Proceedings of the 7th International Conference on Web Search and Data Mining; 2014 Feb 24–28; New York, USA. New York: ACM; 2015.
18. Benson CL. Cross-domain comparison of quantitative technology improvement using patent derived characteristics. Massachusetts Institute of Technology, 2014.
19. Rosenkopf L, Nerkar A. Beyond local search: Boundary-spanning, exploration, and impact in the optical disk industry. Strateg Manage J. 2001; 22(4):287–306.
20. Ahuja G, Lampert C. Entrepreneurship in the large corporation: A longitudinal study of how established firms create breakthrough inventions. Strateg Manage J. 2001; 22 (6/7):521–543.
21. Fleming L. Recombinant uncertainty in technological search. Manage Sci. 2001; 47(1):117–132.
22. Negroponte N. Creating a culture of ideas: what sparks the ideas that beget new technologies? The co-founder of MIT’s Media Lab says celebrating wrong answers and listening to youth make a good start. (Emerging Technologies). Technol Rev. 2003; 106(1):34–35.
23. Harhoff D, Scherer FM, Vopel K. Citations, family size, opposition and the value of patent rights. Res Policy. 2003; 32(8):1343–1363.
24. Schoenmakers W, Duysters G. The technological origins of radical inventions. Res Policy. 2010; 39(8):1051–1059.
25. Hur W, Oh J. A man is known by the company he keeps?: A structural relationship between backward citation and forward citation of patents. Res Policy 2020, 50(1): 104–117.
26. Dahlin KB, Behrens DM. When is an invention really radical? Defining and measuring technological radicalness. Res Policy. 2005; 34(5):717–737.
27. Gittelman M, Kogut B. Does good science lead to valuable knowledge? Biotechnology firms and the evolutionary logic of citation patterns. Manage Sci. 2003; 49(4):366–382.
28. Han F, Magee CL. Testing the science/technology relationship by analysis of patent citations of scientific papers after decomposition of both science and technology. Scientometrics. 2018; 116:767–796.
29. Alcácer J, Gittelman M, Sampat B. Applicant and examiner citations in U.S. patents: An overview and analysis. Res Policy. 2009; 38:425–427.
30. van Raan AFJ, Winnink JJ. Do younger sleeping beauties prefer a technological prince? Scientometrics. 2018; 114:701–717. pmid:29449753
31. Choi S, Park H. Investigation of strategic changes using patent co-inventor network analysis: The case of Samsung electronics. Sustainability. 2016; 8,1315
32. Acedo FJ, Barroso C, Casanueva C, Galán JL. Co-authorship in management and organizational studies: An empirical and network analysis. J. Manag. Stud. 2006; 43:957–983.
33. Cortes C, Vapnik VN. Support vector networks. Machine Learning. 1995; 20(3):273–297.
34. Breiman L. Random Forests. Mach Learn. 2001; 45(1):5–32.
35. Shah M, Aggarwal C, Shen D, Rastogi R, editors. XGBoost: A scalable tree boosting system. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; 2016 Aug 13–17; New York, USA. New York: ACM; 2016.
36. Forman G. An extensive empirical study of feature selection metrics for text classification. J Mach Learn Res. 2003; 3(2):1289–1305.
37. Bastian M, Heymann S, Jacomy M. Gephi: an open source software for exploring and manipulating networks. Proceedings of the Third International AAAI Conference on Weblogs and Social Media; 2009 May 17–20; San Jose, California.
38. van Eck NJ, Waltman L. Software survey: VOSviewer, a computer program for bibliometric mapping. Scientometrics. 84(2):523–538. pmid:20585380
39. Kostoff RN. The difference between highly and poorly cited medical articles in the journal Lancet. Scientometrics. 2007; 72(3): 513–520.
40. Snizek WE, Oehler K, Mullins NC. Textual and nontextual characteristics of scientific papers: Neglected science indicators. Scientometrics. 1991; 20(1): 25–35.
41. Bazerman C. Physicists reading physics. Writ Commun. 1985; 2: 3–23.
42. Tsujii J, Henderson J, Pasca M, editors. Exploring topic coherence over many models and many topics. Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning; 2012 Jul 12–14; Jeju Island, Korea; Pennsylvania: Association for Computational Linguistics; 2012.
43. Gandoman FH, Ahmade A, Bossche PV, Mierlo JV, Omar N, Nezhad AE, et al. Status and future perspective of reliability assessment for electric vehicles. Reliab Eng Syst Saf. 2019; 183:1–16.
44. Li YJ, Zheng YJ, Guo K, Zhao JT, Li CL. Mg-Li hybrid batteries: the combination of fast kinetics and reduced overpotential. Energy Material Advances. 2022; Article ID:9840837.
45. Feng SD, An HZ, Li HJ, Qi YB, Wang Z, Guan Q, et al. The technology convergence of electric vehicles: Exploring promising and potential technology convergence relationships and topics. Journal of Cleaner Production. 2020; 260:120992.
46. Chu WB, Wuniri Q, Du XQ, Xiong QC, Huang T, Li KQ. Cloud control system architectures, technologies and applications on intelligent and connected vehicles: a review. Chinese Journal of Mechanical Engineering 2021; 34:139.
47. Ding YY. Strategic collaboration between land owners and charging station operators: lease or outsource? SSRN, 2021.
48. Almeida C, Ghica ME, Ramalho AL, Dures L. Silica-based aerogel composites reinforced with different aramid fibres for thermal insulation in Space environments. J Mater Sci. 2021; 56:2–3.
49. Liao ZJ, Sun Y, Ye ZH, Tang C, Wang PY. Resonant analysis of magnetic coupling wireless power transfer systems. IEEE Trans Power Electron. 2019; 34(6): 5513–5523.
50. Chun TR, Mi C. Wireless power transfer for electric vehicles and mobile devices. 1st ed. Piscataway: Wiley-IEEE Press; 1963.
51. Mooney J. Graphene-based supercapacitors: the future of electric cars: a new discovery could boost the power and range of electric vehicles. ECN, 2015; 59(2):21–22.
52. Chen TD, Kockelman KM, Khan M. Locating electric vehicle charging stations: parking-based assignment method for Seattle. Transport Res Rec. 2013; 2385(1): 28–36.
53. Wang LY, Wang LF, Liu WL, Zhang YW. Research on fault diagnosis system of electric vehicle power battery based on OBD technology. Proceedings of the 2017 International Conference on Circuits, Devices and Systems; 2017 Sep. 5–8; Chengdu, China. Piscataway: IEEE; 2017.
54. Yang Y, Chen B, Su L, Qin D. Research and development of hybrid electric vehicles can-bus data monitor and diagnostic system through OBD-II and android-based smartphones. ADV MECH. 2013; 741240.
