Solving the initial value problem of ordinary differential equations by Lie group based neural network method

Combining a feedforward neural network (FNN) with the Lie group (symmetry) theory of differential equations (DEs), an alternative artificial NN approach is proposed for solving initial value problems (IVPs) of ordinary DEs (ODEs). By introducing the Lie group expression of the solution, the trial solution of the ODEs is split into two parts. The first part is the solution of an auxiliary IVP sharing the initial values of the original problem; it is easily solved using the Lie group method and known symbolic or numerical methods, and contains no network parameters (weights and biases). The second part is an FNN with adjustable parameters, trained by the error back-propagation method by minimizing a loss function and updating the parameters. Compared with existing similar methods, the proposed method significantly reduces the number of trainable parameters and learns the real solution more quickly and accurately. The method is applied to several cases, including physical oscillation problems; the results are represented graphically, and conclusions are drawn.


Introduction
Many problems in science, finance, and engineering can be transformed, through mathematical modeling, into systems of ordinary differential equations (ODEs) or partial DEs (PDEs). Analytical solutions of these equations are usually unavailable, so computing numerical solutions of ODEs is essential for exploring real-world problems. Several off-the-shelf methods (e.g., Euler [1], Runge-Kutta [2], finite difference [3], Adomian decomposition [4], and Lie group [5] methods) are available for solving DEs numerically [6]. New studies continue to seek more efficient algorithms; one class comprises methods based on artificial neural networks (ANNs) or deep learning models [7][8][9][10][11][12][13][14][15][16][17][18][19][20][21][22]. The main idea in these methods is to exploit the highly accurate approximation capability of an ANN for continuous functions [8], which has been used in various ways. In [9,10], a representative ANN method for solving ODEs and PDEs is presented, in which the trial solution is expressed as the sum of two terms. The first term satisfies the initial or boundary values and contains no network parameters; the second term is a feedforward NN (FNN) trained to satisfy the given ODEs or PDEs. However, the first term is constructed synthetically by inspecting the given initial or boundary values, and such a free selection may fail to capture the nonlinearity of the real solution near the initial point. In [11], He et al. used an extended back-propagation algorithm to train the derivative of an FNN to solve a class of first-order PDEs. Li-ying et al. proposed an ANN algorithm based on cosine basis functions to solve ODEs [12]. Ioannis et al. created a trial solution in the form of a neural network based on a syntactic evolution scheme to solve ODEs and PDEs [13]. Shekari et al. determined approximate solutions of time-dependent PDEs based on ANNs and a hybrid of minimization and collocation methods [14]. Chakraver et al. proposed a regression-based NN model to solve initial or boundary value problems (IBVPs) for ODEs [15]. Mall et al. proposed a new method based on a single-layer Legendre NN model to solve IBVPs for nonlinear ODEs [16]. Dockhorn showed that small NNs (fewer than 500 parameters) can accurately learn the solutions of practical problems (Poisson and Navier-Stokes equations) [17]. Shangjie [18] obtained approximate solutions of DEs through an ANN. In [19], a deep learning algorithm is provided to solve the IBVP of a class of high-dimensional stochastic PDEs. Multi-layer physics-informed NN deep learning was used to investigate data-driven peakon and periodic peakon solutions of the Camassa-Holm (CH) equation, the Degasperis-Procesi equation, etc., with initial conditions [20].
The goal of using ANNs to solve ODEs or PDEs is to reduce computational cost. Generally, applying ANNs to ODEs or PDEs involves many parameters trained on large amounts of data. For many DEs, however, there is prior knowledge that is currently not used in ANN algorithms or machine learning practice, including mathematical principles, solution expressions, and physical information. Such prior information can act as a regularization agent that reduces the number of trainable parameters or the demand for large training sets. Using this information in a network algorithm amplifies the information content of the solution, so that the algorithm guides itself toward an accurate approximation even when only small network models or few training samples are available. Physics-inspired NNs for solving DEs have been proposed, in which physical conservation laws and prior physical knowledge are encoded into the NNs (see [21] and references therein). For example, the logarithmic nonlinear Schrödinger (LNLS) equation, an important physical model in fields such as quantum optics and nuclear physics, was solved in [22] by a physics-informed NN deep learning method. On the other hand, because the solution of a DE is a continuous function, it is also important to find a trial-function expression closer to the properties of the true solution; this makes the ANN effective and allows it to quickly capture the implied properties of the real solution.
The Lie group (symmetry) theory of DEs, universally known as the one-parameter transformation group method for DEs, has had a profound impact on all areas of mathematics (both pure and applied), physics, engineering, and other mathematically based sciences [5,23,24]. The groups can be found using symbolic computational methods, and they are used to construct explicit solutions of the corresponding DEs [25]. They also reduce the order of the equations or the number of involved variables, making the method efficient for solving nonlinear DEs within a certain range. In particular, for an IVP of a first-order system of ODEs, this method provides an explicit formal formula for the solution; this point is used in the present study. The advantages of Lie groups in numerical analysis are extensively discussed in [26][27][28].
This study presents a significantly cost-effective method for solving the IVP of ODEs by combining the Lie group theory of ODEs with an FNN. The method leads to a differentiable, closed analytic form of the solution on the whole interval of the independent variable. Using Lie group theory, the trial solution to the IVP is expressed as the sum of two parts. The first part is the solution to an IVP of other ODEs with the initial values of the original problem. The ODEs of this auxiliary IVP are formed from part of the terms of the original ODEs, selected according to solvability principles, so that the first part of the solution can easily be solved in advance by the Lie group method or by known symbolic or numerical methods. Owing to the specific features of this IVP, the first part itself captures the essential properties of the real solution in an interval about the initial point of the independent variable. Moreover, the first part contains no training parameters because it does not take part in training the network. The second part of the solution is constructed by an FNN and is not required to satisfy any specific initial value.
The study makes the following contributions. This is the first time that Lie theory and the ANN method have been combined to solve the IVP of ODEs. Because the first part of the solution bears part of the workload and detects the nonlinearity of the solution, small-scale networks and a small amount of training data (samples) can solve the IVP more accurately. The algorithm significantly increases the efficiency of the network method for solving the IVP of ODEs. On the other hand, the method provides new ideas and insight into how to add "complementary elements" and design an ANN so as to increase the efficiency of ANN algorithms for solving DEs by incorporating mathematical theory. Furthermore, the method can be applied to IVPs or BVPs of ODEs of more general form. Moreover, after modifying or combining some of the existing methods [29], the approach can be applied to IVPs and BVPs of PDEs.
The rest of the paper is structured as follows. In Section 2, we introduce the Lie group method for solving the IVP of ODEs, and an expression of the solution to the IVP is proved. In Section 3, the general process of the proposed method is given. In Section 4, numerical experiments on the application of our method to some problems are given, including an oscillation model from physics. Finally, in Section 5, some discussion and future research directions are presented.

A basic theorem
Here, the functions $f = (f_1, f_2, \dots, f_n)$ are continuous and differentiable with respect to (w.r.t.) the independent variables $x_i$ and a parameter $a$ varying in a neighborhood $O$ of $0 \in \mathbb{R}^1$. Consider the set of transformations
$$G_1 = \{T_a : x^* = f(x; a),\ a \in O\}, \qquad (1)$$
and let
$$D = \sum_{i=1}^{n} \xi_i(x)\, \frac{\partial}{\partial x_i} \qquad (2)$$
be the linear partial differential operator generated from the auxiliary functions defined by
$$\xi_i(x) = \frac{\partial f_i(x; a)}{\partial a}\bigg|_{a=0}. \qquad (3)$$
From [24], we know that the set $G_1$ forms a one-parameter transformation group with generator $D$ from $\mathbb{R}^n$ to itself if the $f_i$ satisfy the additional conditions: i) $f_i(x; 0) = x_i$; ii) $f(f(x; a); b) = f(x; a + b)$; iii) $f_i$ is differentiable w.r.t. the parameter $a$. In the group case, by Taylor expansion at $a = 0$, we have the expression of $x_i^*$ as
$$x_i^* = e^{aD} x_i, \qquad (4)$$
where the formal notation $e^{aD} = \sum_{k=0}^{\infty} \frac{a^k}{k!} D^k$ is used. The correspondence between the set $G_1$ and a one-parameter transformation group is given in the following theorem (for more details, see [5,23,24]).

Theorem 1: The function $x^* = f(x; a)$ in (1) defining a one-parameter transformation group satisfies the IVP of ODEs
$$\frac{dx^*}{da} = \xi(x^*), \qquad x^*\big|_{a=0} = x. \qquad (5)$$
Conversely, for any given continuously differentiable functions $\xi_i(x)$, i.e., for the operator $D$ determined by (2), the function $x^* = f(x; a)$ obtained by solving problem (5) determines a one-parameter transformation group with $D$ as generator. Moreover, the transformations $T_a : x^* = f(x; a)$, i.e., the solution of (5), can be expressed by formula (4), i.e., $x^* = e^{aD} x$.
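To make Theorem 1 concrete, here is a small numerical sketch (our own illustration, not code from the paper): for the scaling generator $D = x\, d/dx$, the group action is $x^* = e^{aD}x = e^a x$, and integrating the Lie equation (5) recovers it.

```python
import math

def flow(xi, x0, a, steps=1000):
    """Integrate dx*/da = xi(x*), x*(0) = x0 with classical RK4.
    By Theorem 1, x*(a) equals the group action e^{aD} x0."""
    h = a / steps
    x = x0
    for _ in range(steps):
        k1 = xi(x)
        k2 = xi(x + 0.5 * h * k1)
        k3 = xi(x + 0.5 * h * k2)
        k4 = xi(x + h * k3)
        x += (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
    return x

# scaling generator D = x d/dx, i.e., xi(x) = x: group action x* = e^a x
x0, a = 2.0, 0.5
numeric = flow(lambda x: x, x0, a)
exact = x0 * math.exp(a)
print(abs(numeric - exact))   # agreement to roughly machine precision
```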

An expression of the solution to an IVP of ODEs
Effectively finding the solution of IVP (5) depends on the complexity of the operator $D$. To reduce this difficulty, let us assume that $D$ in (5) can be decomposed into a sum of two parts,
$$D = D_1 + D_2. \qquad (6)$$

Theorem 2: For the decomposition (6) and initial values $x \in \mathbb{R}^n$, the solution to (5) has the expression
$$x^* = \bar{x} + \tilde{N}(x; a), \qquad (7)$$
where $\bar{x} = \bar{x}(x; a) = e^{aD_1} x$ and $\tilde{N}(x; a) = \int_0^a D_2\big(e^{(a-t)D} x\big)\big|_{x \to \bar{x}(t, x)}\, dt$.

The proof of the theorem is obtained from Gröbner's formula in [26]. Obviously, the first part $\bar{x} = e^{aD_1} x$ of formula (7) is, by Theorem 1, the solution to the new IVP
$$\frac{d\bar{x}}{da} = \xi^{(1)}(\bar{x}), \qquad \bar{x}\big|_{a=0} = x, \qquad (8)$$
where $\xi^{(1)} = (\xi^{(1)}_1, \dots, \xi^{(1)}_n)$ denotes the coefficient functions of $D_1$. Evidently, by suitably choosing the part $D_1$ of $D$ in (6), IVP (8) is more easily solved than the original IVP (5) by the Lie group method as well as by various symbolic or numerical methods [6]. Moreover, in [24,26] it is proved that, under a proper selection of $D_1$, the solution of (8) approximates the real solution of (5) with high accuracy in some interval about the initial point $a = 0$.
Although the second part $\tilde{N}$ in (7) can be computed symbolically in some simple cases, in general it is extremely complicated to obtain directly. In this article, we use formula (7) to design an FNN to approximate this term.

Method
In this section, we describe the scheme of our Lie-group-based FNN method, which relies on the expression (7) of the solution to an IVP of ODEs given in Theorem 2.

Scheme
Our method is first designed for the IVP of an autonomous system of ODEs
$$\frac{dy_i}{dx} = f_i(y_1, y_2, \dots, y_n), \qquad y_i(0) = \alpha_i, \quad i = 1, 2, \dots, n, \qquad (9)$$
where $x \in O \subset \mathbb{R}^1$ is the independent variable, $y_i = y_i(x)$ are dependent variables, and the $f_i$ are differentiable functions of their arguments. We then extend the method to ODE or PDE problems of different forms. Introducing the notation $y = (y_1, y_2, \dots, y_n)$, $\alpha = (\alpha_1, \alpha_2, \dots, \alpha_n)$ and the operator $D = \sum_{i=1}^{n} f_i(y)\, \frac{\partial}{\partial y_i}$, we rewrite IVP (9) in the operator form
$$\frac{dy}{dx} = f(y), \qquad y(0) = \alpha, \qquad (10)$$
which has the same form as (5). Consequently, by Theorem 1, the solution of IVP (9) can be written as
$$y(x) = e^{xD}\alpha, \qquad (11)$$
with $x$ as group parameter. Here we understand that $e^{xD}\alpha = e^{xD} y\big|_{y \to \alpha}$.
We use this expression of the solution $y$ to design an FNN method that approximates $y(x)$ by a neural network $\hat{y}(x; \theta)$, where $\theta$ denotes the network parameters. To this end, we require that the network (trial solution) have the expression
$$\hat{y}(x; \theta) = \bar{y}(x) + \hat{N}(x; \theta), \qquad (12)$$
where $\hat{N}(x; \theta)$ is a differentiable function to be determined and $\bar{y}(x) = e^{xD_1}\alpha$, which, by Theorem 1, is the solution of the new IVP
$$\frac{d\bar{y}_i}{dx} = g_i(\bar{y}), \qquad \bar{y}_i(0) = \alpha_i, \quad i = 1, 2, \dots, n, \qquad (13)$$
for $D_1 = \sum_{i=1}^{n} g_i(\bar{y})\, \partial_{\bar{y}_i}$. Therefore, the deviation between the trial solution $\hat{y}$ in (12) and the exact solution $y$ in (11) is characterized by
$$\hat{y}(x; \theta) - y(x) = \hat{N}(x; \theta) - \tilde{N}(\alpha; x) \qquad (14)$$
for $x$ in the considered interval. Remark 2: We call IVP (13) the associated IVP of the original IVP (9) (or (10)). Its solution $\bar{y}(x)$ is determined in advance by solving IVP (13), applying the Lie group method or other acknowledged methods after appropriately selecting the operator $D_1$. The solvability of the associated IVP and the approximation capability of its solution $\bar{y}$ are the main considerations in choosing the operator $D_1$.
Consequently, compared with the synthetic ways of constructing the first part of the trial solution in the literature, here the first part is the solution of the associated IVP, which more accurately approximates the real solution of IVP (9) in an interval about the initial point x = 0. Owing to this efficient approximation of the exact solution by the first term in (12), constructing the second term in (12) by the network algorithm carries a much smaller computational burden than solving for the overall solution with a large number of training parameters. Hence, in the automatic learning (training) procedure, the network quickly tunes itself to approach the natural properties of the solution. This is our main motivation for combining the Lie group method with an ANN. Our attention therefore focuses on determining the second part of solution (12).
The unknown function $\hat{N}(x; \theta)$ in the second part of the trial solution (12) is constructed by an FNN through optimizing the loss (error) function
$$L(\theta) = \sum_{k=1}^{K} \sum_{i=1}^{n} \left[ \frac{d\hat{y}_i}{dx}(x_k) - f_i\big(\hat{y}(x_k; \theta)\big) \right]^2, \qquad (15)$$
where $\{x_k\}_{k=1}^{K}$ is a set of training points obtained from the training interval $O$ in some distributive sense.
Assume that the operator $D = D_1 + D_2$ in (10) has the expressions
$$D_1 = \sum_{i=1}^{n} g_i(y)\, \frac{\partial}{\partial y_i}, \qquad D_2 = \sum_{i=1}^{n} \big[ f_i(y) - g_i(y) \big]\, \frac{\partial}{\partial y_i}. \qquad (16)$$
Then the loss function (15) is rewritten as
$$L(\theta) = \sum_{k=1}^{K} \sum_{i=1}^{n} \left[ g_i\big(\bar{y}(x_k)\big) + \frac{d\hat{N}_i}{dx}(x_k; \theta) - f_i\big(\bar{y}(x_k) + \hat{N}(x_k; \theta)\big) \right]^2, \qquad (17)$$
obtained by substituting (12), (13), and (16) into (15). In the loss function (17), the terms $g_i(\bar{y}_k)$ are independent of the training parameters $\theta$, which saves computational effort. This is another benefit of combining the Lie expression of the DE solution with the ANN algorithm.
In our case (9), the network model has a one-node input layer ($x$) and an $n$-node output layer $\hat{N}(x; \theta) = \{\hat{N}_i(x; \theta)\}_{i=1}^{n}$. We use a network with $m$ hidden neurons as the model to present the scheme of our proposed method.
For each input $x$, the outputs $\hat{N}_j(x; \theta)$ are
$$\hat{N}_j(x; \theta) = \delta\left( \sum_{i=1}^{m} v_{ji}\, \sigma(w_i x + b_i) + c_j \right), \quad j = 1, \dots, n, \qquad (18)$$
where the trainable parameters are $\theta = \{w_i, v_{ji}, b_i, c_j\}$: $w_i$ is the weight from the input $x$ to the $i$th neuron in the hidden layer; $v_j = \{v_{j1}, v_{j2}, \dots, v_{jm}\}$, where $v_{ji}$ is the weight from the $i$th hidden neuron to the $j$th output neuron; $b_i$ is the bias of the $i$th hidden neuron; and $c_j$ is the bias of the $j$th output neuron. Both $\sigma(\cdot)$ and $\delta(\cdot)$ are activation functions, for the hidden and output layers respectively. In the examples of the next section, we use a sigmoid activation in the hidden layer and a linear activation in the output layer.
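A minimal sketch of this forward pass in Python, with shapes chosen as assumptions consistent with the description above (one input, m = 3 sigmoid hidden neurons, n linear outputs):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fnn_forward(x, w, b, v, c):
    """One-input, n-output FNN with one hidden layer of m neurons.

    w : (m,)   weights from the input x to the hidden neurons
    b : (m,)   hidden-layer biases
    v : (n, m) weights from hidden neurons to output neurons
    c : (n,)   output-layer biases
    The hidden activation sigma is a sigmoid; the output activation
    delta is linear, as in the paper's examples.
    """
    hidden = sigmoid(w * x + b)     # hidden-layer outputs, shape (m,)
    return v @ hidden + c           # N_j(x; theta), shape (n,)

rng = np.random.default_rng(0)
m, n = 3, 2                         # three hidden neurons, two outputs
w, b = rng.normal(size=m), rng.normal(size=m)
v, c = rng.normal(size=(n, m)), rng.normal(size=n)
out = fnn_forward(0.5, w, b, v, c)  # array of the n network outputs
```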
In training the network, the values of the loss function are obtained by forward propagation with formula (18), and the gradients of the loss function (17) are obtained by backpropagation. Minimizing the loss function involves not only the network output $\hat{N}(x; \theta)$ but also the derivatives of the network output w.r.t. the input and the network parameters; these derivatives are given recursively in terms of the $\hat{N}_i$ and the functions $\delta$ and $\sigma$. The network parameters are updated by the unified formula
$$\theta^{(k+1)} = \theta^{(k)} - \eta\, \frac{\partial L}{\partial \theta}\bigg|_{\theta = \theta^{(k)}},$$
where $\eta$ is the learning rate and $k$ is the iteration step (refer to [9]).
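The training procedure can be sketched end to end on a toy problem. This is our own illustrative code, not the paper's: it solves y' = −y, y(0) = 1, with the associated part deliberately taken trivial (ȳ(x) ≡ 1, i.e., D₁ = 0) so the network must learn the whole remainder; finite differences stand in for the paper's backpropagation, and a soft penalty pushes N̂(0) toward 0.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def net(x, th):
    # one input, 3 sigmoid hidden neurons, linear scalar output
    w, b, v, c = th[:3], th[3:6], th[6:9], th[9]
    return v @ sigmoid(w * x + b) + c

def dnet_dx(x, th, h=1e-5):
    # derivative of the network output w.r.t. the input x
    return (net(x + h, th) - net(x - h, th)) / (2.0 * h)

# Toy IVP y' = -y, y(0) = 1, with associated solution ybar(x) = 1
# (operator D1 = 0); the trial solution is yhat = ybar + N(x; theta),
# so the ODE residual is N'(x) + (1 + N(x)).
xs = np.linspace(0.0, 1.0, 11)

def loss(th):
    res = sum((dnet_dx(x, th) + 1.0 + net(x, th)) ** 2 for x in xs)
    return res + 5.0 * net(0.0, th) ** 2    # soft penalty: N(0) ~ 0

def grad(th, h=1e-6):
    # finite-difference gradient, standing in for backpropagation
    g = np.zeros_like(th)
    for i in range(th.size):
        e = np.zeros_like(th)
        e[i] = h
        g[i] = (loss(th + e) - loss(th - e)) / (2.0 * h)
    return g

th = np.random.default_rng(1).normal(scale=0.5, size=10)
l0 = loss(th)
for _ in range(300):                        # theta <- theta - eta * grad
    g, lcur, eta = grad(th), loss(th), 0.1
    while eta > 1e-10 and loss(th - eta * g) >= lcur:
        eta *= 0.5                          # backtrack for stability
    th = th - eta * g
print(loss(th) < l0)                        # the loss has decreased
```

The backtracking line search is a robustness convenience for this sketch; the paper uses plain gradient descent with a fixed learning rate.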

Principle to select operator D 1
The efficiency of the proposed method depends on the selection of the operator D 1 in the decomposition D = D 1 + D 2 . The main principles for choosing D 1 are the following.
1. The associated IVP (13) should be easily solved, explicitly or numerically. In particular, we choose the g i in (16) as part of the terms of the f i in (9) so that the associated IVP (13) yields explicit or numerical solutions.
2. The computations involved in the loss function (17), such as evaluating derivatives and function values, should be as inexpensive as possible.

Extension
The method proposed above can be applied to any IVP or BVP of DEs that can be transformed into the form of (9).

Non-autonomous case.
For the non-autonomous case, we introduce a new variable $y_0 = x$ and add the equation $\frac{dy_0}{dx} = 1$ with initial value $y_0(0) = 0$ to (19). The current IVP then turns into the standard IVP (9) with $n + 1$ unknown functions $y_0, y_1, \dots, y_n$ and operator $D = \sum_{i=1}^{n} f_i\, \partial_{y_i} + \partial_{y_0}$.
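As a small illustration (our own example, not one from the paper), the scalar non-autonomous problem $dy/dx = x + y$, $y(0) = 1$ is autonomized as

```latex
\frac{dy}{dx} = x + y,\quad y(0) = 1
\quad\Longrightarrow\quad
\begin{cases}
\dfrac{dy_0}{dx} = 1, & y_0(0) = 0,\\[4pt]
\dfrac{dy_1}{dx} = y_0 + y_1, & y_1(0) = 1,
\end{cases}
\qquad
D = \partial_{y_0} + (y_0 + y_1)\,\partial_{y_1},
```

which is an autonomous system of the form (9).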

Nonzero start points.
For a nonzero start point x = x 0 in (5), we apply the translation x → x − x 0 to bring the problem into standard form.

PDE case.
The most intuitive idea is to convert the PDE into an ODE and then solve the original problem indirectly; for example, by using symmetries of the PDE, traveling wave transformations, etc., to reduce the order of the PDE and the number of independent variables involved. Another idea is to combine existing semi-discrete methods for solving PDEs [29]: discretize the underlying PDE in the spatial variables so that the resulting semi-discrete equations can be cast as an ODE system of the form (9), and then apply the proposed method to the resulting ODEs.
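The semi-discrete route can be sketched with the method of lines. This is an illustrative assumption of ours, not the paper's code: the heat equation u_t = u_xx on [0, 1] with zero Dirichlet boundary values becomes an autonomous ODE system of the form (9).

```python
import numpy as np

# Semi-discretize u_t = u_xx on [0, 1] with zero boundary values,
# turning the PDE into the autonomous ODE system
# du_i/dt = (u_{i+1} - 2 u_i + u_{i-1}) / h^2.
M = 20                               # number of interior grid points
h = 1.0 / (M + 1)
x = np.linspace(h, 1.0 - h, M)

def rhs(u):
    up = np.concatenate(([0.0], u, [0.0]))       # Dirichlet boundaries
    return (up[2:] - 2.0 * up[1:-1] + up[:-2]) / h**2

u = np.sin(np.pi * x)                # initial condition u(x, 0)

# Forward Euler in t, only to demonstrate the resulting ODE system;
# the proposed Lie-group/FNN method would instead be applied to `rhs`.
dt = 0.4 * h**2                      # within the explicit stability limit
t, T = 0.0, 0.1
while t < T:
    u = u + dt * rhs(u)
    t += dt

# the heat equation has exact solution e^{-pi^2 t} sin(pi x)
exact = np.exp(-np.pi**2 * t) * np.sin(np.pi * x)
err = float(np.max(np.abs(u - exact)))
```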

Two point problem case.
For a two-point problem of ODEs, we first regard the problem as an IVP by using one point value as the initial value, and then put the other point value into the loss function so that the network automatically learns the boundary value.
In the next section, by applying the given method to some physical problems, we illustrate its advantages.

Numerical experiments and applications
In this section, we provide examples to show the effectiveness of our proposed method. For each example, we evaluate the method by calculating the error between our results and the exact solutions (if available) or numerical solutions obtained from known numerical methods. In all the examples, a small-architecture FNN with one hidden layer of only three neurons, a linear output, and a small training set is used. To illustrate the features of the solutions obtained by our method, we provide figures displaying the graphs of the computed and exact solutions on the training interval. In addition, we draw comparison figures of the first terms of the trial solutions given in the present article and in the literature, to show the ability of these terms to capture the nonlinear properties of the real solutions. Moreover, we consider points outside the training interval (extensions of the training interval) to show the generalization and stability of our method. Example 1. The first example shows the procedure and efficiency of our proposed method by comparing it with the methods provided in previous studies. We solve the IVP of two coupled first-order nonlinear ODEs considered in [9].
Considering Fig 3, the accuracies, computed as the deviations $\Delta_i = y_i - \hat{y}_i$ between the trial solutions $\hat{y}_i$ obtained by our method and the exact solutions $y_i$ on the training set, are controlled within the $10^{-5}$ order of magnitude. This shows that our method has strong robustness on the training interval with fewer training samples. On the predicted (untrained) intervals [−1.5, −1] and [1, 1.5], although the errors increase, the accuracies are maintained within the $10^{-3}$ order of magnitude without any training data.
Fig 4 shows the dependence of the network error on the number of iteration steps in our method. As seen in the figure, at approximately 130 iteration steps the error is close to 0, and this state persists. This shows that our method converges quickly, demonstrating the stability of our algorithm.
Compared with the method of a previous study [9], in which the network architecture consisted of one hidden layer with ten neurons, our method achieves more accurate solutions in a shorter computational time using fewer network parameters and a smaller training set. Example 2. We consider the linearly forced oscillation problem
$$y'' + 2\epsilon y' + y = f(t), \qquad y(0) = a, \quad y'(0) = b, \qquad (25)$$
where $f(t)$ is a known function, $a$ and $b$ are constants, and $\epsilon$ is a damping parameter with $0 < \epsilon < 1$. The solution of the problem is sensitive to the parameter $\epsilon \to 0$; it is well known that such problems are solved by the multiple scales and averaging methods [30]. To apply our method, we turn the problem into the form of (9), with initial values $y_0(0) = 0$, $y_1(0) = a$, $y_2(0) = b$, by letting $y_0 = t$, $y_1 = y$, $y_2 = y'$, which yields the operator $D = y_2 \partial_{y_1} + \big(f(y_0) - y_1 - 2\epsilon y_2\big)\partial_{y_2} + \partial_{y_0}$. So that the associated IVP both approximates the real solution of (25) accurately and is solved easily, we take $D_1 = y_2 \partial_{y_1} - (y_1 + 2\epsilon y_2)\partial_{y_2} + \partial_{y_0}$ and $D_2 = f(y_0)\partial_{y_2}$, as in (16). By symbolic computation, one obtains the exact solution of the associated IVP for the selected $D_1$ as $\bar{y} = (\bar{y}_0, \bar{y}_1, \bar{y}_2) = e^{tD_1}(y_0, y_1, y_2)\big|_{y_0 \to 0,\, y_1 \to a,\, y_2 \to b}$, i.e., $\bar{y}_0 = t$, $\bar{y}_2 = \bar{y}_1'$, and
$$\bar{y}_1 = s^{-1} e^{-\epsilon t}\big( (a\epsilon + b)\sin(st) + a s \cos(st) \big), \qquad s = \sqrt{1 - \epsilon^2}.$$
Therefore, we have the trial solution $\hat{y} = \bar{y} + \hat{N}(t; \theta)$. For concrete computation, we take $\epsilon = 0.2$, $a = 0$, $b = 1$, and $f(t) = -0.4\, e^{-0.4 t}\cos t$. In this situation, IVP (25) has the exact solution $y(t) = e^{-0.4 t}\sin t$. In previous studies, the first part of the trial solution would simply be taken as $\bar{y}_1 = 0$, $x$, or $x(1 + x)$. The degrees to which these first terms approximate the exact solution in the neighborhood of the initial point are shown in Fig 5.
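The stated exact solution can be checked numerically. The following sketch (our own check, not from the paper) confirms by central differences that y(t) = e^{−0.4t} sin t satisfies the forced oscillation equation for these parameter values.

```python
import math

# Verify that y(t) = e^{-0.4 t} sin t satisfies
# y'' + 2*eps*y' + y = f(t) with eps = 0.2 and
# f(t) = -0.4 e^{-0.4 t} cos t.
eps = 0.2

def f(t):
    return -0.4 * math.exp(-0.4 * t) * math.cos(t)

def y(t):
    return math.exp(-0.4 * t) * math.sin(t)

def residual(t, h=1e-4):
    # central differences for y'(t) and y''(t)
    d1 = (y(t + h) - y(t - h)) / (2.0 * h)
    d2 = (y(t + h) - 2.0 * y(t) + y(t - h)) / h**2
    return d2 + 2.0 * eps * d1 + y(t) - f(t)

checks = [abs(residual(t)) for t in (0.3, 1.0, 1.7)]
print(max(checks))   # small (~1e-7): the ODE is satisfied
```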
This shows that the first term of our trial solution $\hat{y}_1$ captures the nonlinearity of the exact solution $y$ more accurately in an interval about the initial point.
We use a grid of 21 equidistant points in [0, 2] as the training and testing data for our network $\hat{N}(t; \theta)$. Even with such a small-scale network and little training data, our method obtains accurate results.
The comparisons between the exact solutions $y_{1,2}$ and the approximations $\hat{y}_{1,2}$ provided by our method, on the training interval [0, 2] and its extension [0, 2.5], are shown in Fig 6. Fig 7 shows the trend of the network error $L(\theta)$ as the number of iteration steps increases. The graph reflects the rapid decline of the loss; the error is close to zero at approximately 220 iterations, reaching $10^{-7}$. This indicates that the algorithm converges quickly and is stable.
Considering Fig 8, the deviations Δy i of the solutions y i on the training and test points are shown. It can be observed that the algorithm has good generalization performance and high accuracy even with few training samples and a small network structure.
To show the robustness of our method, we investigate its performance for different values of the parameter ε in (25) with the same nonhomogeneous term f(t). The results are shown in Table 1. For more information, see S1 File.
From the table, we can see the strong stability of our algorithm. Example 3. Consider the following nonlinear initial value problem for the Duffing equation [30],
$$u'' + u + 2\epsilon u^3 = 0, \qquad (27)$$
with initial conditions $u(0) = 1$, $u'(0) = 0$, and $0 < \epsilon < 1$. In mechanics, it is solved by

perturbation techniques, such as straightforward expansion, multiple scales, and averaging methods, to analyze the different resonance phenomena.
Here, as an application of our proposed method, we consider the case with parameter value ε = 0.5.
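A numerical reference solution for this case can be generated by a standard integrator. The following sketch (our own illustration, not the paper's code) uses classical RK4 on the equivalent first-order system and checks accuracy via the conserved energy of the undamped Duffing oscillator.

```python
# RK4 reference solution for the Duffing IVP
# u'' + u + 2*eps*u^3 = 0, u(0) = 1, u'(0) = 0, with eps = 0.5,
# written as the first-order system (u, v)' = (v, -u - 2*eps*u^3).
eps = 0.5

def rhs(u, v):
    return v, -u - 2.0 * eps * u**3

def rk4(u, v, t_end, steps=4000):
    h = t_end / steps
    for _ in range(steps):
        k1u, k1v = rhs(u, v)
        k2u, k2v = rhs(u + 0.5 * h * k1u, v + 0.5 * h * k1v)
        k3u, k3v = rhs(u + 0.5 * h * k2u, v + 0.5 * h * k2v)
        k4u, k4v = rhs(u + h * k3u, v + h * k3v)
        u += (h / 6.0) * (k1u + 2 * k2u + 2 * k3u + k4u)
        v += (h / 6.0) * (k1v + 2 * k2v + 2 * k3v + k4v)
    return u, v

u_T, v_T = rk4(1.0, 0.0, 5.0)
# The energy E = v^2/2 + u^2/2 + eps*u^4/2 is conserved by (27);
# its initial value is 0.5 + eps/2 = 0.75, a useful accuracy check.
E = 0.5 * v_T**2 + 0.5 * u_T**2 + 0.5 * eps * u_T**4
print(abs(E - 0.75))   # energy drift of the integrator, near zero
```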
Considering Fig 11, the decreasing trend of the training error as the iteration steps increase is displayed, showing the quick convergence and stability of our algorithm.
It shows that the proposed method can also capture the oscillatory wave properties of the solution to a strongly nonlinear dynamical system.

Discussion and conclusions
In this study, we propose a new NN method based on the Lie group theory of ODEs to solve the IVP of ODEs. Examples are used to verify the effectiveness of the method, and its superiority is demonstrated by comparison with exact solutions and other numerical methods. The examples also show that even with a small network structure, this method can achieve more accurate solutions than the methods in [9]. Moreover, the combination of the Lie group and NN methods is easy to implement.
1. The success of the method can be attributed to two aspects. The first is the employment of the Lie group expression of the real solution to the IVP of ODEs. This expression provides a two-part summation form of the solution. The first part satisfies the initial values, can easily be determined before the network calculation stage, and is independent of the trainable parameters. Moreover, it approximates the real solution at least in an interval about the initial point of the independent variable. These features of the first part of the trial solution save much workload in machine learning, resulting in higher efficiency of the method. The second is the use of NNs, which are excellent function approximators, to determine the second part of the solution. The training (learning) procedure is implemented by optimizing loss functions derived from the original ODEs and the additional initial or boundary values satisfied by the trial solutions.
2. In contrast to most previous methods, the proposed method is more general and can be extended to solve various problems of ODEs and PDEs through the appropriate selection of Lie group expressions of the trial solutions and the loss functions.
3. As indicated by the applications in the numerical experiments, the method exhibits excellent generalization and stability performance. This is because the deviations on the training and testing data are small and uniformly stable.
4. The NN architecture employed was one hidden layer with three neurons, so there are few trainable parameters. This indicates that the effect of the NN architecture on the quality of the solution depends on the structure of the trial solutions. Moreover, it demonstrates the importance of using additional information about the solution in the network method instead of applying the network approach directly.
5. The applications also show that the method can be applied to strongly nonlinear cases and can accurately detect severe nonlinearities of physical phenomena.
6. The randomness of the selection of D 1 in the decomposition D = D 1 + D 2 may be a drawback. However, Theorems 1 and 2 prove that the solution expressions used always exist for any selection of D 1 . Hence, the method always works at least within the accuracy range of previous similar methods. If the operator D 1 is selected well, the efficiency of the method is ensured. Because the possible selections of D 1 are finite for a concrete IVP of ODEs, an efficient one can always be selected after several computations. This motivates future studies on using more sophisticated theories to design ANN algorithms for solving problems of DEs. These benefits and the ideas in this study will stimulate future research on solving DE problems by combining the Lie theory of DEs with ANNs.