Recognition of industrial machine parts based on transfer learning with convolutional neural network

As industry gradually moves toward unmanned and intelligent production, future factories will need intelligent monitoring, diagnosis and maintenance of parts and components. Achieving this goal first requires accurate identification and classification of the parts in the factory. However, the existing literature rarely studies the classification and identification of parts across an entire factory. Because existing data samples are scarce, this paper studies the identification and classification of industrial machine parts from small samples. To solve this problem, this paper builds a convolutional neural network model on the pretrained InceptionNet-V3 model through transfer learning. Through designed experiments, the influence of data augmentation, learning rate and optimizer algorithm on model performance is studied and the optimal model is determined; its test accuracy reaches 99.74%. A comparison with the accuracy of other classifiers shows that the convolutional neural network model based on transfer learning can effectively recognize and classify industrial machine parts from small samples, and that the idea of transfer learning can be extended further.


Introduction
With the advancement of industrial technology and the transformation of modern factories, the manufacturing industry has gradually entered an unmanned and intelligent stage. For an entire production workshop to complete tasks systematically, orderly and autonomously, intelligent monitoring and maintenance are essential. To improve the accuracy of monitoring and maintenance, the primary goal is accurate recognition of the parts and machine components throughout the industrial workshop.
Toward unmanned and intelligent factories, literature [1] proposes a method for recognizing the position and orientation of irregular machine parts with complex external contours, and literature [2] presents a machine-vision system for measuring the geometry of shaft parts. These studies have potential applications in manufacturing. However, no research has addressed the identification of parts across a whole factory from small samples. Accurate recognition of the various machine parts therefore requires research on image recognition of industrial machine parts. Image recognition refers to the use of computers to process, analyze and understand large numbers of images in order to recognize images with the same characteristics across different modes and environments. The traditional image recognition pipeline consists of image acquisition, image preprocessing, feature extraction and image recognition. Literature [3] develops MLP- and SOM-based neural network classifiers for detecting four conditions of a three-phase induction motor and evaluates their performance. The cross-validation method used in that paper is instructive, but it is computationally more complicated, covers fewer classes, has not been shown to work for small-sample, multi-class problems, and leaves room for improving accuracy. Moreover, that study addresses only the induction motor and does not extend to the parts of an entire factory. In recent years, deep learning based on convolutional neural networks (CNN) [4] has been applied to many kinds of image recognition with very good results. In 2012, a CNN model reduced the error rate in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) from 25.77% to 15.319% for the first time, and by the 2017 competition the lowest error rate had dropped to 2.251%.
Although CNNs are effective for image feature extraction and recognition, they require a large amount of sample data for iterative training. In the specific field of industrial machine part recognition, however, sample data are scarce. If a CNN is trained directly on such a small sample, the resulting model will have large errors and will generalize poorly. To address image recognition with small samples, Gene Kitamura detected and recognized ankle fractures from small samples using a CNN with a new training scheme and multi-view merging [5], reaching a final recognition accuracy of 81%. Jufeng Yang proposed a self-paced learning algorithm for small-sample recognition of rare clinical skin diseases [6]. Qian Huang proposed a blood cell classification framework based on medical hyperspectral imaging that combines modulated Gabor wavelets with CNN kernels to classify white blood cells under small-sample training [7]. In the field of industrial machine part recognition, however, few algorithms have been developed specifically. To recognize and classify industrial machine parts under small-sample conditions, this paper proposes a recognition method based on transfer learning [8][9][10][11][12][13][14][15] with a CNN. The method not only improves recognition accuracy but also shortens model training time. Its parameters are flexible and can be adjusted to different target images and recognition tasks, so it can also be carried over to related research and industrial practice.
The first part of this article explains the necessity and current state of research on this subject, which aims to enable future factories to accurately classify and recognize the industrial machine parts in the plant from a small-sample data set. The second part introduces the basic theory of CNN-based transfer learning and constructs its framework. The third part describes the source and classes of the experimental data set, designs the experimental groups and studies how different variables affect model training. The fourth part summarizes the experiments and outlines the direction of the next stage of research.

Based on the feed-forward neural network [16], the CNN updates its parameter weights by back-propagating the error of the loss function to each network layer during iterative training. As training proceeds, the weights are continuously updated until the desired training effect is reached.
The convolution layer maps the input through a convolution kernel to extract image features [17][18][19]. The convolution operation is

$$S(i,j) = (I * K)(i,j) = \sum_{m}\sum_{n} I(i+m,\, j+n)\, K(m,n)$$

where S(i, j) is the output tensor of the convolution layer, I(i+m, j+n) is the input tensor, K(m, n) is the convolution kernel, i, j are coordinates in the tensor and m, n are coordinates in the convolution kernel.
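To make the formula concrete, here is a minimal NumPy sketch that evaluates S(i, j) directly for a single-channel input. No padding and a stride of 1 are assumptions made for illustration; the paper does not state these settings here. As in the formula, the kernel is not flipped, which is the cross-correlation form used by most deep learning frameworks. The function name `conv2d_valid` is hypothetical.

```python
import numpy as np

def conv2d_valid(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Compute S(i, j) = sum_m sum_n I(i+m, j+n) K(m, n) with no padding, stride 1."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w), dtype=float)
    for i in range(out_h):
        for j in range(out_w):
            # Element-wise product of the kernel with the window whose corner is (i, j).
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Hypothetical 4 x 4 input with I(i, j) = 4i + j and a 2 x 2 diagonal-difference kernel.
I = np.arange(16, dtype=float).reshape(4, 4)
K = np.array([[1.0, 0.0], [0.0, -1.0]])
print(conv2d_valid(I, K))  # 3 x 3 output; every entry is I(i, j) - I(i+1, j+1) = -5.0
```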
The pooling layer further processes the feature maps produced by the convolution. A pooling function summarizes the feature values at a position and its neighbouring positions in the plane and uses the summary as the value at that position. Common pooling functions include average pooling, max pooling and stochastic pooling. Taking 2 × 2 max pooling as an example, the calculation is

$$f_{pool} = \max\left(s_{i,j},\, s_{i+1,j},\, s_{i,j+1},\, s_{i+1,j+1}\right)$$

where f_pool is the pooled result and s_{i,j} is the element at position (i, j) of the feature map tensor.
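A minimal NumPy sketch of the 2 × 2 max pooling formula follows. It assumes non-overlapping windows (stride 2), the usual setting for 2 × 2 pooling, and drops a trailing odd row or column; both are assumptions rather than details given in the paper. The function name is hypothetical.

```python
import numpy as np

def max_pool_2x2(feature_map: np.ndarray) -> np.ndarray:
    """2 x 2 max pooling with stride 2; a trailing odd row/column is dropped."""
    h2, w2 = feature_map.shape[0] // 2, feature_map.shape[1] // 2
    trimmed = feature_map[:h2 * 2, :w2 * 2]
    # windows[a, b, c, d] == trimmed[2a + b, 2c + d], so axes 1 and 3 span one 2 x 2 window.
    windows = trimmed.reshape(h2, 2, w2, 2)
    return windows.max(axis=(1, 3))

fmap = np.array([[1, 3, 2, 0],
                 [4, 2, 1, 5],
                 [0, 1, 7, 2],
                 [6, 2, 3, 1]], dtype=float)
print(max_pool_2x2(fmap))  # rows [4, 5] and [6, 7]
```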
The fully connected layer flattens the outputs of the convolutional and pooling layers, applies a weighted linear transformation followed by a non-linear activation function, and finally passes the result to the classifier.

Transfer learning.
Transfer learning is a machine learning method that uses existing knowledge to solve problems in a different but related domain. For a CNN, the convolutional and pooling layers are retained: convolutional layers trained on a large sample can extract features from new image data, the extracted feature vectors are processed by the pooling layers, and a new fully connected layer is then added to form a new network model. In short, the model keeps its feature extraction and recognition capabilities and gains a new output target that lets it perform new image recognition and classification tasks.
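The idea of keeping the feature extractor fixed and training only a new fully connected layer can be sketched in plain NumPy. This is an illustration under loudly stated assumptions: a fixed random projection followed by ReLU stands in for the pretrained convolutional and pooling layers (it is not InceptionNet-V3), the three-class data are synthetic, and all names (`W_frozen`, `extract_features`, `W_head`) are hypothetical. Only the new layer's weights and bias are updated, by gradient descent on the softmax cross-entropy loss.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the pretrained, frozen feature extractor.
W_frozen = rng.normal(size=(8, 32)) / np.sqrt(8)

def extract_features(x: np.ndarray) -> np.ndarray:
    """Frozen layers: W_frozen is never updated below."""
    return np.maximum(x @ W_frozen, 0.0)

def softmax(z: np.ndarray) -> np.ndarray:
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(p: np.ndarray, labels: np.ndarray) -> float:
    return float(-np.mean(np.log(p[np.arange(len(labels)), labels] + 1e-12)))

# Synthetic 3-class data: Gaussian blobs around different centres.
n_classes, n_per_class = 3, 50
centres = 3.0 * np.eye(n_classes, 8)
x = np.vstack([c + 0.3 * rng.normal(size=(n_per_class, 8)) for c in centres])
labels = np.repeat(np.arange(n_classes), n_per_class)
onehot = np.eye(n_classes)[labels]

features = extract_features(x)  # computed once, since the extractor stays fixed

# New trainable fully connected layer: the only parameters that change.
W_head = np.zeros((features.shape[1], n_classes))
b_head = np.zeros(n_classes)
W_frozen_before = W_frozen.copy()

loss_before = cross_entropy(softmax(features @ W_head + b_head), labels)
lr = 0.02
for _ in range(500):
    p = softmax(features @ W_head + b_head)
    grad_logits = (p - onehot) / len(labels)  # gradient of mean cross-entropy w.r.t. logits
    W_head -= lr * features.T @ grad_logits
    b_head -= lr * grad_logits.sum(axis=0)
loss_after = cross_entropy(softmax(features @ W_head + b_head), labels)

accuracy = np.mean(np.argmax(features @ W_head + b_head, axis=1) == labels)
print(f"loss {loss_before:.3f} -> {loss_after:.3f}, training accuracy {accuracy:.2f}")
```

Precomputing the features once mirrors the design choice in the text: because the retained layers are not trained, their output for each image never changes.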

InceptionNet-V3 convolutional network model.
Commonly used pretrained models include ResNet [20], VGG [21], AlexNet [22,23] and InceptionNet-V3 [24]. Compared with the other models, InceptionNet-V3's classifier requires fewer operations, which reduces training time, and its convolution design reduces structural redundancy. Literature [25] also shows that InceptionNet-V3 achieves good results on classification problems based on transfer learning. Therefore, this paper uses the InceptionNet-V3 convolutional network model for transfer learning. InceptionNet-V3 was proposed in the paper "Rethinking the Inception Architecture for Computer Vision" in December 2015. It has two main improvements over InceptionNet-V2. The first is an optimized Inception Module structure. The second is splitting a larger two-dimensional convolution into two smaller one-dimensional convolutions, an idea called "factorization into small convolutions". This asymmetric split handles more and richer spatial features and increases feature diversity more effectively than symmetric structures. The architecture of the InceptionNet-V3 model is shown in Fig 1. The model has a total of 46 layers and consists of 11 Inception modules, including 96 convolutional layers, which are implemented with TensorFlow's Slim tool.
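The parameter saving behind "factorization into small convolutions" can be checked numerically. The NumPy sketch below (single channel, no padding, stride 1, all assumptions) applies a 1 × n convolution followed by an n × 1 convolution and compares the result with a single n × n convolution whose kernel is their outer product. The two agree exactly for such rank-1 (separable) kernels, while the pair needs 2n weights instead of n². In the real network each factorized convolution is learned separately and is typically followed by its own activation, so this illustrates the parameter arithmetic rather than an exact equivalence.

```python
import numpy as np

def conv2d_valid(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Plain 'valid' 2-D cross-correlation with stride 1."""
    kh, kw = kernel.shape
    out = np.zeros((image.shape[0] - kh + 1, image.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

n = 7
rng = np.random.default_rng(1)
image = rng.normal(size=(20, 20))
row = rng.normal(size=(1, n))   # 1 x n kernel
col = rng.normal(size=(n, 1))   # n x 1 kernel

two_step = conv2d_valid(conv2d_valid(image, row), col)
one_step = conv2d_valid(image, col @ row)  # n x n kernel with K(m, k) = col[m] * row[k]

print(np.allclose(two_step, one_step))
print("n x n weights:", n * n, "| 1 x n + n x 1 weights:", 2 * n)  # 49 vs 14
```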

Construction of transfer learning model.
InceptionNet-V3 was trained on the ImageNet data set, with about 1.2 million training samples [26][27][28]. However, the number of available images of industrial machine parts is far smaller. Therefore, transfer learning based on the InceptionNet-V3 model is used to recognize and classify industrial machine parts. For the trained InceptionNet-V3 model, the parameters of all convolutional layers are retained and the last fully connected layer is replaced. The layer preceding this fully connected layer is called the bottleneck layer; it is the last Dropout layer in InceptionNet-V3.

PLOS ONE

The outputs of the new fully connected layer are passed to a Softmax layer, so that the network can handle the new recognition task. The modified industrial machine part recognition process is shown in Fig 2. The gradient descent optimizers available for training mainly include stochastic gradient descent, AdaGrad, RMSProp and Adam. The Adam algorithm is taken as an example to introduce the principle of these optimization methods.
First, set the global learning rate σ and the exponential decay rates ρ1 and ρ2 of the moment estimates, which lie in [0, 1) and default to 0.9 and 0.999. Initialize the parameters ω, choose a small constant δ for numerical stability (default δ = 10^−8), initialize the first and second moment variables s and r to 0, and initialize the time step t = 0. Then repeat the following steps until the stopping criterion is met.
(1) Sample a mini-batch of m examples {x_1, x_2, …, x_m} from the training set; the corresponding targets are y_i.
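(2) Compute the gradient, where L is the loss function and f(x_i; ω) is the network output for x_i:

$$g \leftarrow \frac{1}{m}\nabla_{\omega}\sum_{i} L\big(f(x_i;\omega),\, y_i\big)$$

(3) Update the time step:

$$t \leftarrow t + 1$$

(4) Update the biased first moment estimate:

$$s \leftarrow \rho_1 s + (1-\rho_1)\, g$$

(5) Update the biased second moment estimate:

$$r \leftarrow \rho_2 r + (1-\rho_2)\, g \odot g$$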
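The steps above can be sketched in NumPy using the same symbols (σ, ρ1, ρ2, δ, s, r, t). The bias-correction and parameter-update steps that complete each iteration are taken from the standard Adam algorithm (Kingma and Ba), not from the text above: ŝ = s / (1 − ρ1^t), r̂ = r / (1 − ρ2^t) and ω ← ω − σ ŝ / (√r̂ + δ). The function name and the quadratic loss are hypothetical, chosen only so the example runs on its own; in real training, g would come from a mini-batch as in step (2).

```python
import numpy as np

def adam_minimize(grad_fn, omega0, sigma=0.001, rho1=0.9, rho2=0.999,
                  delta=1e-8, num_steps=1000):
    """Minimal Adam loop in the notation of the text; grad_fn is a hypothetical gradient callback."""
    omega = np.array(omega0, dtype=float)
    s = np.zeros_like(omega)  # first moment variable
    r = np.zeros_like(omega)  # second moment variable
    t = 0
    for _ in range(num_steps):
        g = grad_fn(omega)                 # step (2): gradient
        t += 1                             # step (3): time step
        s = rho1 * s + (1 - rho1) * g      # step (4): biased first moment
        r = rho2 * r + (1 - rho2) * g * g  # step (5): biased second moment
        s_hat = s / (1 - rho1 ** t)        # standard Adam: bias correction
        r_hat = r / (1 - rho2 ** t)
        omega = omega - sigma * s_hat / (np.sqrt(r_hat) + delta)  # standard Adam: update
    return omega

# Hypothetical quadratic loss 0.5 * ||omega||^2, whose gradient is omega itself.
result = adam_minimize(lambda w: w, omega0=[1.0, -2.0], sigma=0.01, num_steps=2000)
print(result)  # both entries end up close to 0
```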
Assume that the outputs of the original neural network are y_1, y_2, …, y_n. After softmax [29] regression, the outputs are

$$y_i' = \frac{e^{y_i}}{\sum_{j=1}^{n} e^{y_j}}, \quad i = 1, \dots, n$$
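A direct implementation of the softmax formula can overflow when the y_i are large, because e^{y_i} exceeds the floating-point range. Subtracting max_j y_j from every output first leaves the result unchanged (the factor e^{−max} cancels between numerator and denominator) and keeps every exponent at or below 0. A minimal NumPy sketch; the example logits are hypothetical.

```python
import numpy as np

def softmax(y: np.ndarray) -> np.ndarray:
    """Softmax over the last axis; subtracting the max avoids overflow in exp."""
    shifted = y - np.max(y, axis=-1, keepdims=True)
    e = np.exp(shifted)
    return e / np.sum(e, axis=-1, keepdims=True)

logits = np.array([2.0, 1.0, 0.1])  # hypothetical outputs y_1, y_2, y_3
probs = softmax(logits)
print(probs, probs.sum())  # probabilities that sum to 1
```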

Data set
The data set used in the experiment was photographed on site in a factory and contains 1002 images of 11 types of industrial machine parts, such as control panels, plates, robotic arms and assemblies. Figs 3A-3K show sample images of each class, and Table 1 summarizes the number of images per class. In the experiment, the images are augmented to 4008 through rotation, flipping and similar operations; 80% of the data set is used for training, 10% for validation and 10% for testing. In Fig 3, from top to bottom and left to right, the classes are control panels, robotic arms, interactive module, assembly, big machines, engine, hangar, old machinery, plates, tech parts and others.
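The text names rotation and flipping but not the exact transforms behind the fourfold expansion from 1002 to 4008 images. The sketch below assumes, hypothetically, that each photograph yields four versions (original, horizontal flip, vertical flip, 180° rotation), and uses small NumPy arrays as stand-in images. It also makes one design choice that the paper does not state: the 80/10/10 split is drawn over the original photographs before augmentation, so all versions of a photograph land in the same subset and near-duplicates cannot appear in both training and test data. Function names are hypothetical.

```python
import numpy as np

def augment(image: np.ndarray) -> list:
    """Hypothetical 4x augmentation: original, horizontal flip, vertical flip, 180-degree rotation."""
    return [image, np.fliplr(image), np.flipud(image), np.rot90(image, 2)]

def split_then_augment(images: list, train: float = 0.8, val: float = 0.1, seed: int = 0) -> dict:
    """Split the original images 80/10/10, then augment each subset separately."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(images))
    n_train = int(train * len(images))
    n_val = int(val * len(images))
    parts = {
        "train": order[:n_train],
        "val": order[n_train:n_train + n_val],
        "test": order[n_train + n_val:],
    }
    return {name: [aug for k in idx for aug in augment(images[k])]
            for name, idx in parts.items()}

# 1002 small random arrays stand in for the factory photographs.
rng = np.random.default_rng(42)
photos = [rng.random((8, 8)) for _ in range(1002)]
subsets = split_then_augment(photos)
print({name: len(v) for name, v in subsets.items()})  # 3204 train, 400 val, 404 test
```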
The classification and recognition process for industrial machine parts based on transfer learning is shown in Fig 4. Compared with other models, the model in this paper is novel in the following respects:
1. The training process is simplified, the amount of computation is reduced and training time is saved.
2. Better training results can be achieved with limited samples.
3. Structural redundancy is reduced, which makes the model easier to extend, improve and supplement.

Experimental design
The experiments were run with Python 3.7.0 and TensorFlow 1.15.0. The hardware consisted of an Intel Core i5-6200U CPU with a 2.3 GHz base frequency and an NVIDIA GeForce 950M GPU with 2 GB of video memory. The training hyperparameters are as follows: the initial learning rate is 0.01, the batch size is 32 and the total number of training iterations is 40,000.
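As a rough worked figure, assuming the training set is 80% of the 4008 augmented images (about 3206 images) and that each iteration processes one batch of 32, the 40,000 iterations correspond to roughly 400 passes over the training data:

$$\frac{40{,}000 \times 32}{3206} = \frac{1{,}280{,}000}{3206} \approx 399 \text{ epochs}$$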
To obtain better training results, the experiment sets up the following comparison groups: 1. Models trained on the original data set (1002 images) versus the data set augmented by simple rotation, flipping and similar operations (4008 images).
2. Comparison of models obtained under different learning rates.
3. Comparison of models obtained using different gradient descent optimizers.

Impact of image data augmentation on models.
For these experimental samples, with the learning rate set to 0.01 and a stochastic gradient descent optimizer, the two trained models are compared. The training- and validation-set accuracy as a function of the number of training iterations is shown in Figs 5 and 6.
During training, the training-set accuracy of both the original and the augmented data reached 100% after 10,000 iterations. After 25,000 iterations, the validation-set recognition accuracy of the augmented data is significantly higher than that of the original data. The loss values during training are compared in Fig 7: the loss of the augmented data is always slightly higher than that of the original data, which suggests that a larger data set needs more training iterations to reach a comparably good result. After training, the models are evaluated on the held-out test set, and the training, validation and test results are summarized in Table 2.
Table 2 shows that after 40,000 training iterations, the recognition accuracy on the training set reaches 100%. Augmenting the images through rotation, flipping and similar operations raises the validation-set recognition accuracy by 6.26 percentage points and the test-set accuracy by 1.52 percentage points. The test-set improvement is modest because rotation and flipping do not change the features or quality of the images. In addition, the transfer learning model has already been trained on a large data set and has strong feature extraction ability, which further weakens the effect of simply expanding a small sample.

Impact of different learning rates on models.
Fig 8 shows that if the learning rate is too small (for example, 0.0001), the loss fluctuates continuously but does not decrease: with such a small learning rate, convergence is slow, and no clear convergence is obtained within a limited number of training iterations. It was also found that, for these experimental samples and settings, a learning rate of 0.1 causes a significant gradient explosion at the beginning of training. To rule out a chance occurrence and further test whether an overly large learning rate causes gradient explosion, training is repeated under the same conditions with learning rates of 1 and 3. The loss values of the first 200 iterations are shown in Fig 9. As the learning rate increases, the peak loss at the beginning of training also increases and the gradient explosion becomes more pronounced.
In the subsequent iterations, the loss oscillates continuously, showing that the model parameters are updated too quickly and by too much. This destroys the previously trained weight information, so the model fails and transfer learning becomes meaningless.

Fig 12 shows that, for these experimental samples, neither of the two learning rates compared there (0.01 and 0.001) leads to non-convergence or gradient explosion. The model with a learning rate of 0.01 has a lower loss than the model with a learning rate of 0.001, converges faster and fluctuates less.
After 40,000 training iterations, the models trained with the two learning rates are evaluated on the test set, and the training, validation and test results are summarized in Table 3.
According to Table 3, after 40,000 training iterations the validation-set recognition accuracy of the model with a learning rate of 0.01 equals that of the model with a learning rate of 0.001, but its test-set recognition accuracy is 6.77 percentage points higher.
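The learning-rate behaviour described in this subsection can be reproduced qualitatively on a toy problem. The sketch below is an illustration, not the paper's network: plain gradient descent on the hypothetical one-dimensional loss f(w) = ½ a w², whose update w ← w − η a w = (1 − η a) w shrinks w only when |1 − η a| < 1, that is, 0 < η < 2 / a. With a = 1, a learning rate of 0.0001 barely moves w, 0.5 converges quickly, and 3 doubles |w| at every step, which is loosely analogous to the gradient explosion seen with large learning rates.

```python
def gradient_descent_1d(lr: float, a: float = 1.0, w0: float = 1.0, steps: int = 10) -> float:
    """Gradient descent on the toy loss f(w) = 0.5 * a * w**2, whose gradient is a * w."""
    w = w0
    for _ in range(steps):
        w = w - lr * a * w
    return w

for lr in (0.0001, 0.5, 3.0):
    print(f"learning rate {lr}: w after 10 steps = {gradient_descent_1d(lr):.6g}")
```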

Impact of different gradient descent optimizers on models.
To further optimize the model and improve accuracy, models are trained on the augmented data with a learning rate of 0.01, using a stochastic gradient descent optimizer and adaptive learning rate optimizers [33][34][35] based on the AdaGrad, RMSProp and Adam algorithms. For the AdaGrad-based optimizer, initial_accumulator_value is set to 0.1. For the RMSProp-based optimizer, decay is set to 0.9, momentum to 0.0 and epsilon to 1e-10 by default. For Adaptive learning rate optimizer has almost no difference and the accuracy with the Adam adaptive learning rate optimizer is higher than the other three optimizers. In addition, the model using the RMSProp-based optimizer has lower recognition accuracy on the training and validation sets than the other three optimizer models, and its accuracy fluctuates widely. The loss values under the different optimizers (Fig 15) show that with the RMSProp optimizer the loss oscillates continuously and does not converge. The RMSProp-based adaptive learning rate optimizer is therefore not suitable for these experimental samples.
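For reference, the textbook update rules of the two adaptive optimizers configured above can be sketched in NumPy with the stated hyperparameter values (initial accumulator 0.1; decay 0.9, momentum 0.0, epsilon 1e-10). This is a simplified illustration, not TensorFlow's implementation, whose internal details (for example how the RMSProp accumulator is initialized) may differ; the function names and the quadratic loss are hypothetical. AdaGrad divides by the square root of the running sum of squared gradients, so its step keeps shrinking, while RMSProp uses a moving average, so its step size stays roughly constant.

```python
import numpy as np

def adagrad(grad_fn, w0, lr=0.01, initial_accumulator_value=0.1, steps=1000):
    w = np.array(w0, dtype=float)
    accum = np.full_like(w, initial_accumulator_value)
    for _ in range(steps):
        g = grad_fn(w)
        accum += g * g                # running sum of squared gradients
        w -= lr * g / np.sqrt(accum)  # per-parameter step shrinks as accum grows
    return w

def rmsprop(grad_fn, w0, lr=0.01, decay=0.9, momentum=0.0, epsilon=1e-10, steps=1000):
    w = np.array(w0, dtype=float)
    ms = np.zeros_like(w)   # exponential moving average of squared gradients
    mom = np.zeros_like(w)  # momentum buffer (inactive when momentum = 0.0)
    for _ in range(steps):
        g = grad_fn(w)
        ms = decay * ms + (1 - decay) * g * g
        mom = momentum * mom + lr * g / np.sqrt(ms + epsilon)
        w -= mom
    return w

grad = lambda w: w  # gradient of the hypothetical loss 0.5 * ||w||^2
print("AdaGrad:", adagrad(grad, [1.0]))
print("RMSProp:", rmsprop(grad, [1.0]))
```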
Fig 16 compares the loss values of the remaining three optimizers after the RMSProp optimizer is removed from Fig 15. The Adam optimizer has a small loss and converges quickly: after 7000 iterations, its loss approaches 0 and changes by less than 0.005. It fluctuated briefly around iteration 6000, but the fluctuation was smaller than 0.4 and did not affect the parameter updates.
After 40,000 training iterations, the models trained with the four optimizers were evaluated on the test set, and the training, validation and test results are summarized in Table 4.
Table 4 shows that, for these experimental samples, the model trained with the Adam adaptive learning rate optimizer achieves higher recognition accuracy on the training, validation and test sets than the other three optimizers, so its training effect is the best.

Further optimization of results.
Because the above experiments use 80% of the data set for training, the model may overfit. Therefore, the samples are divided with different ratios for model training, and the final accuracy on the training, validation and test sets is shown in Table 5.
The results show that no overfitting occurs at a ratio of 8:1:1, and that the test-set accuracy decreases as the proportion of the training set is reduced. The main reasons for this are as follows.
1. Compared with big data, the number of the data set of this article is relatively small. If the proportion of the training set is further reduced, the training sample will be too small and the model fitting effect will be poor.
2. The InceptionNet-V3 model used in this article has been trained on ImageNet and has good feature extraction capabilities. Unlike ordinary deep learning, transfer learning is suited to research problems with small samples.
3. It can be seen from the above training results that the accuracy of the test set has reached 99.04%, so it is judged that there is no overfitting.
Since the model will be applied to larger data sets in the future, it needs further optimization to achieve higher accuracy. Drawing on the ideas and algorithms of literature [36], this paper uses k-fold cross-validation to improve the model. 10% of the original data set is held out as the final test set, and 10-fold cross-validation is performed on the remaining 90%. The accuracy on the 10 validation sets and on the test set is shown in Table 6. Table 6 shows that 10-fold cross-validation further improves the model: except for one validation set with an accuracy of 96.88%, the remaining 9 validation sets all reach 100%, and the final test-set accuracy reaches 99.74%, 0.70 percentage points higher than the best result in Table 5. This demonstrates the effectiveness and superiority of the 10-fold cross-validation method.
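The data partitioning described above (a 10% final test set plus 10-fold cross-validation on the remaining 90%) can be sketched with index arrays in NumPy. The sketch assumes a random permutation and takes only a sample count; 4008 is used purely as an example, and the function name is hypothetical. Training and evaluation are left as a placeholder comment.

```python
import numpy as np

def holdout_and_kfold(n_samples: int, k: int = 10, test_fraction: float = 0.1, seed: int = 0):
    """Hold out a final test set, then split the rest into k folds for cross-validation."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_test = int(test_fraction * n_samples)
    test_idx, rest = order[:n_test], order[n_test:]
    folds = np.array_split(rest, k)  # fold sizes differ by at most one
    splits = []
    for i in range(k):
        val_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        splits.append((train_idx, val_idx))
    return test_idx, splits

test_idx, splits = holdout_and_kfold(4008)
for fold, (train_idx, val_idx) in enumerate(splits):
    # A model would be trained on train_idx and evaluated on val_idx here.
    print(f"fold {fold}: {len(train_idx)} train, {len(val_idx)} validation")
print(f"final test set: {len(test_idx)} samples")
```

Because the test indices are removed before the folds are formed, the final test set never takes part in any of the 10 training or validation rounds.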

Comparison of classification results of different classifiers
To demonstrate the advantage of the method adopted in this paper, different classifiers are trained on the extracted image features, and their accuracy is shown in Table 7. The classifier parameters are set to common or default values, and the ratio of training set to test set is 4:1. k-fold cross-validation is not used here: the accuracies obtained are much lower than that of the proposed method, so even with k-fold cross-validation the other classifiers are unlikely to reach comparably high accuracy. For the recognition of industrial machine parts in factories with small samples, CNN-based transfer learning therefore achieves very good results and can be applied in the intelligent construction of future factories.

Table 7. Accuracy of classification results of different classifiers.
Type of classifier | Accuracy / %
Transfer learning based on AlexNet | 82.14

Conclusion
Based on transfer learning with the InceptionNet-V3 convolutional neural network model, this paper identifies and classifies 11 types of industrial machine components. Using data augmentation, different learning rates and different gradient descent optimizers, the recognition accuracy of the trained models on the training, validation and test sets is compared after 40,000 training iterations. The optimal model is obtained with data augmentation, an initial learning rate of 0.01 and the Adam adaptive learning rate optimizer. After analysing the data set division ratio and applying 10-fold cross-validation, the final test-set accuracy is 99.74%. The comparison with other classifiers shows that the proposed method performs better. This provides a foundation for factories to carry out intelligent monitoring of their own parts and components in future industrial settings. Because the model's computations are complex, we will continue to study how to simplify them so that the model can be applied quickly in industry.