Abstract
When the solution domain, internal parameters, or initial and boundary conditions of a partial differential equation (PDE) change, many latent characteristics of the equation's solutions remain similar. This makes it possible to reduce the cost of PDE operator learning through transfer learning. Based on the Fourier neural operator (FNO), we propose a novel sparse neural operator network named λ-FNO. By introducing the λ parameter matrix and using a new pruning method to make the network sparse, the operator learning ability of λ-FNO is greatly improved. λ-FNO can efficiently learn the operator from the discrete initial function space on a uniform grid to the discrete solution space on an unstructured grid, which FNO cannot do. Finally, we apply λ-FNO to several specific transfer tasks for partial differential equations under conditional shift to demonstrate its excellent transferability. The experimental results show that when the shape of the solution domain or the internal parameters of the equation change, our framework can capture the latent invariant information of its solution and complete the related transfer learning tasks with less cost, higher accuracy, and faster speed. In addition, the sparse framework extends easily to other network architectures to enhance their performance. Our model and data-generation code are available at https://github.com/Xumouren12/TL-FNO.
Citation: Xu J, Zhou Y, Liu Q, Li K, Yang H (2025) Transfer learning of neural operators for partial differential equations based on sparse network λ-FNO. PLoS One 20(5): e0321154. https://doi.org/10.1371/journal.pone.0321154
Editor: Mohamed Kamel Riahi, Khalifa University of Science and Technology, UNITED ARAB EMIRATES
Received: September 18, 2024; Accepted: March 3, 2025; Published: May 22, 2025
Copyright: © 2025 Xu et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All data files are available from the public repository Figshare (DOI: 10.6084/m9.figshare.27618834).
Funding: The author(s) received no specific funding for this work.
Competing interests: The authors have declared that no competing interests exist.
1. Introduction
Many practical problems involve solving complex partial differential equations (PDEs). Examples include spacecraft design, turbulence simulation, climate prediction, and oil field development. Most complex differential equations do not admit analytical solutions, so we can only compute numerical solutions. Traditional numerical solvers commonly include the finite element method (FEM) and the finite difference method (FDM), which discretize the solution domain into a finite number of grid cells. In general, the accuracy of the solution is positively related to the resolution: the higher the resolution, the higher the accuracy, but the lower the efficiency. As the dimension of the equation increases, the total number of grid points required for discretization grows exponentially, leading to the “curse of dimensionality”, which often overwhelms traditional numerical solvers. Moreover, these traditional numerical methods are “data agnostic” [1] and are not designed to learn from the large datasets available from simulation or observation. This limits the practical application range of traditional numerical solvers. Therefore, with the maturation of open-source deep learning (DL) frameworks (such as PyTorch [2] and TensorFlow [3]) in recent years, and the great achievements of deep learning in fields such as image processing [4–6], object detection [7–8], and natural language processing [9–11], there has been great interest in solving partial differential equations through data-driven deep learning methods.
M. Raissi proposed physics-informed neural networks (PINNs) [12] in 2019. PINN methods [13–16] use neural networks composed of linear transformations and nonlinear activation functions, and impose physical constraints, to approximate a specific solution of a PDE. However, when a parameter or initial condition of the PDE changes slightly, the solution changes accordingly, and the network must be modified and retrained. This limits their application to practical problems, since many problems in science involve solving the same PDE system with different parameters or initial values. With the proposal of DeepONet [17] and neural operators, such as the Fourier neural operator (FNO) [18], multiwavelet neural operators [19–20], graph neural operators [21], multipole neural operators [22], DNO [23] and other neural operator methods [24–26], this problem has been solved to some extent. Li [27] noted that neural operator methods are the only known models that possess both discretization invariance and universal approximation. Among neural operator methods, FNO has been widely used for its speed and precision, for example in soliton mapping [28], heterogeneous materials [29], and CO2 geological storage prediction [30]. FNO uses neural networks to accurately approximate nonlinear continuous operators, learning the mapping between two infinite-dimensional Banach spaces. When solving a PDE, FNO learns the mapping from the parameter or initial function space to the solution function space, so changes in parameters or initial values do not require retraining the network, which gives it higher practical value. However, after we successfully learn a solution operator for a PDE model through FNO, if the initial or boundary conditions of the PDE model change, then, since FNO is data-driven, we need to rebuild or collect a large amount of labeled data and train a new FNO operator from scratch. In some practical situations, collecting sufficiently large and accurate datasets is costly, and retraining the model is also very time-consuming.
This problem may be solved at minimal cost by transfer learning (TL). In fact, Karniadakis [31] proposed a novel idea in 2022: by combining transfer learning with DeepONet, they applied transfer learning to the field of operator learning and achieved exciting results. For a PDE model, even if the model undergoes some changes, its global features may be interlinked and shareable. Once a network has successfully learned a solution operator for a PDE model, we can obtain the solution operator of the changed PDE model at the least cost through TL. TL aims to use the existing data, models, and knowledge in a specific domain (the source domain) to transfer information, via domain similarity, to different but related domains with less data (the target domain). At present, one of the most commonly used approaches is the pre-training transfer learning method: if a model has already been trained on the source domain and the target domain has few labeled data, the pre-trained model can be applied directly on the target domain for fine-tuning. This pre-training and fine-tuning mode does not require training the network from scratch for new tasks, which saves time and cost, and makes the model more robust and better at generalizing. In the pre-training fine-tuning mode, the lower bound of the downstream transfer task depends tightly on the performance of pre-training [32], and the pre-trained model serves as the benchmark for subsequent tasks. Therefore, further improving the accuracy of pre-trained models is an important challenge. Our contributions can be summarized as follows:
- We propose a novel sparse neural operator network called λ-FNO. Compared with FNO, λ-FNO can learn the operator from the discrete initial function space on a uniform grid to the discrete solution space on an unstructured grid, and its prediction accuracy is greatly improved while the inference time remains almost the same. This new architecture effectively raises the lower bound of accuracy for downstream transfer tasks.
- We combine λ-FNO, the new pruning method, and transfer learning to obtain the transfer learning framework TL-λFNO, which can quickly and effectively complete specific transfer learning tasks.
- We performed the main experiments of reference [31] using TL-λFNO. The results show that TL-λFNO can solve various transfer learning problems (including changes in the geometry of the solution domain, material properties, etc.) faster and more efficiently.
2. The λ-FNO architecture
Let $D \subset \mathbb{R}^d$ be a bounded spatial domain. The spaces $\mathcal{A} = \mathcal{A}(D;\mathbb{R}^{d_a})$ and $\mathcal{U} = \mathcal{U}(D;\mathbb{R}^{d_u})$ denote the input function space (initial function or parametric function) and the solution function space, respectively. We focus on learning a nonlinear mapping $G^{\dagger}: \mathcal{A} \to \mathcal{U}$, where $a \in \mathcal{A}$ is an input function and $u \in \mathcal{U}$ is a solution function. Next, we construct a parametric map $G_{\theta}: \mathcal{A} \to \mathcal{U}$, $\theta \in \Theta$, to approximate $G^{\dagger}$, where $\Theta$ is a finite-dimensional parameter space. Then, by minimizing the loss function $L$, we find the optimal set of parameters $\theta^{*}$ such that $G_{\theta^{*}}$ approximates $G^{\dagger}$ best. The loss $L$ is defined as in reference [18]:

$$L(\theta) = \mathbb{E}_{a}\, \big\| G_{\theta}(a) - G^{\dagger}(a) \big\|_{\mathcal{U}},$$

and we choose the Euclidean norm as $\| \cdot \|_{\mathcal{U}}$.
Then we use the neural operator as the parametric map to approximate $G^{\dagger}$. The neural operator was defined in reference [33]; neural operators are iterative architectures:

$$v_{t+1}(x) = \sigma\big( W v_t(x) + (\mathcal{K}(a;\phi)\, v_t)(x) \big),$$

where the local transformation $P$ is a lifting operator parameterized by a fully connected layer with an activation function. It lifts the input $a(x)$ to a higher-dimensional function $v_0(x)$. The transformation $Q$ maps the higher-dimensional function back to the lower dimension. After the lifting $P$, with the help of the kernel operator $\mathcal{K}$, the representation $v_t$ enters an iterative process. The non-linear operator layer follows article [18]. In our paper, we modify the neural operator iteration formula to facilitate the construction of our new framework λ-FNO; in particular, we introduce the λ parameter. We define the non-linear layer which maps $v_t$ to $v_{t+1}$, denoted equation (1), as the standard update above augmented with λ-weighted connections from earlier layers, each scaled element-wise by its own λ matrix,
where $W$ and $b$ in (1) are the weight and bias parameters of this non-linear layer, respectively, and $\sigma$ represents the nonlinear activation function, which improves the nonlinear fitting ability and the expressive power of the features. Next, we discuss how to determine the data type and size of the λ parameter in (1). Assume that the resolution of the two-dimensional input function discretization is $S_1 \times S_2$, and consider a discretized input function with $c$ initial channels, so that the input lies in $\mathbb{R}^{S_1 \times S_2 \times c}$, where $\mathbb{R}^{S_1 \times S_2}$ represents the set of real matrices of size $S_1 \times S_2$; the size of the input data is therefore $S_1 \times S_2 \times c$. We define λ as a parameter matrix of matching spatial size:

$$\lambda \in \mathbb{R}^{S_1 \times S_2}. \tag{2}$$

The multiplication operation between λ and a feature map in (1) is defined element-wise over the $S_1 \times S_2$ grid.
$\mathcal{K}$ is the kernel integral operator parameterized by a neural network, defined in reference [18] as follows:

$$(\mathcal{K}(a;\phi)\, v_t)(x) = \int_{D} \kappa_{\phi}\big(x, y, a(x), a(y)\big)\, v_t(y)\, \mathrm{d}y.$$

Removing the dependence on $a$ and supposing $\kappa_{\phi}(x, y) = \kappa_{\phi}(x - y)$, the operator $\mathcal{K}$ becomes a convolution operator. We then directly parameterize $\kappa_{\phi}$ in Fourier space, and $\mathcal{K}$ becomes the following Fourier integral operator, as shown in Fig 1:

$$(\mathcal{K}(\phi)\, v_t)(x) = \mathcal{F}^{-1}\big( R_{\phi} \cdot (\mathcal{F} v_t) \big)(x).$$

Here, $R_{\phi}$ is related to $\kappa_{\phi}$ via the Fourier transform, that is, $R_{\phi} = \mathcal{F}(\kappa_{\phi})$, where $\mathcal{F}$ is the Fourier transform and $\mathcal{F}^{-1}$ is the inverse Fourier transform:

$$(\mathcal{F} v)(k) = \int_{D} v(x)\, e^{-2\pi i \langle x, k \rangle}\, \mathrm{d}x, \qquad (\mathcal{F}^{-1} v)(x) = \int_{D} v(k)\, e^{2\pi i \langle x, k \rangle}\, \mathrm{d}k.$$

In actual computation, we can use the Fast Fourier Transform (FFT) and the Inverse Fast Fourier Transform (IFFT) to compute these efficiently.
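As an illustration, the discrete Fourier integral operator on a 1-d grid can be sketched with NumPy's FFT (a minimal sketch with our own variable names, not the authors' implementation; `R` plays the role of $R_{\phi}$ restricted to the lowest `modes` frequencies):

```python
import numpy as np

def fourier_layer_1d(v, R, modes):
    """Discrete Fourier integral operator on a uniform 1-d grid:
    FFT -> keep the lowest `modes` frequencies -> multiply by the
    learned complex weights R -> inverse FFT back to physical space."""
    v_hat = np.fft.rfft(v)                 # forward real FFT
    out_hat = np.zeros_like(v_hat)
    out_hat[:modes] = R * v_hat[:modes]    # truncate and weight low modes
    return np.fft.irfft(out_hat, n=v.shape[0])

# usage: 64-point grid, keep 12 Fourier modes (illustrative sizes)
rng = np.random.default_rng(0)
v = rng.standard_normal(64)
R = rng.standard_normal(12) + 1j * rng.standard_normal(12)
w = fourier_layer_1d(v, R, modes=12)
```

Note that the operator is linear in `v`, as a convolution should be; in λ-FNO the weights `R` are trained jointly with the rest of the network.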
In this part, we explain how to obtain the value of λ and how to filter λ to complete the pruning of the network. In our network, λ is randomly initialized. After the network framework is constructed, λ is pre-trained together with the network, as shown in Fig 2. It should be noted that we set a larger learning rate for the λ parameters separately, so that their optimal values can be searched over a larger domain. After pre-training is completed, we obtain the optimal value of λ. From then on, we fix λ to this optimal value, so that λ can be treated as a constant matrix. At this point the network is a special dense net [34]. Next, to transform the dense net into an adaptive residual structure (residual structures are used in almost all neural network models) and improve its performance and efficiency, we need a reasonable mathematical method to prune the network architecture, i.e., to filter the constant matrix λ.
Using the source-domain data, the network is pre-trained to obtain the optimal λ values. Then the pruning method we propose is used to prune the network: λ parameters that contribute little are cut off, and the λ parameters that need to be retained are selected. This sparse structure makes our network faster and more compact without losing accuracy. The network at the bottom of the figure is the pruned network, on which subsequent transfer tasks are performed.
Assume there are $n$ λ matrices in our pre-trained network. According to (2), each λ is a matrix of size $S_1 \times S_2$. We flatten each λ into a row vector and stack them into a matrix $X$:

$$X = \big[\operatorname{vec}(\lambda_1),\ \operatorname{vec}(\lambda_2),\ \ldots,\ \operatorname{vec}(\lambda_n)\big]^{\mathsf T} \in \mathbb{R}^{n \times S_1 S_2}.$$

Next, we compute the covariance matrix $D(X)$ of $X$. Since $D(X)$ is a real symmetric matrix, it can be orthogonally diagonalized:

$$D(X) = P^{\mathsf T} \Lambda P, \qquad \Lambda = \operatorname{diag}(\mu_1, \ldots, \mu_n),$$

where the $\mu_i$ are the eigenvalues of $D(X)$, sorted so that $\mu_1 \ge \mu_2 \ge \cdots \ge \mu_n$, and $P$ is a matrix composed of mutually orthogonal unit eigenvectors. We then use the eigenvalues to determine how many λ to keep: there is a smallest integer $k$ such that

$$\frac{\sum_{i=1}^{k} \mu_i}{\sum_{i=1}^{n} \mu_i} \ge TH,$$

where $TH$ is a hyperparameter that must be set before training. Its value depends on the degree of pruning desired; it is set to 0.85 in subsequent experiments.

We take the eigenvectors corresponding to the eigenvalues $\mu_1, \ldots, \mu_k$ from $P$ and sum them into a vector $s$. In the end, we keep the λ whose subscripts correspond one-to-one with the largest elements of $s$, and set the remaining λ to zero. Using the above algorithm, we complete the pruning operation. The pruned network is shown in Fig 2. Subsequent transfer tasks are performed on the pruned network framework.
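The screening procedure above can be sketched in NumPy (our own illustrative implementation of the described steps — flatten, covariance, eigen-decomposition, cumulative-eigenvalue threshold `TH`, and selection by the summed eigenvector — not the authors' code):

```python
import numpy as np

def select_lambdas(lams, TH=0.85):
    """Eigenvalue-based screening of the lambda matrices.
    lams: list of n lambda matrices of shape (S1, S2) from pre-training.
    Returns the set of indices of the lambdas to keep."""
    X = np.stack([l.ravel() for l in lams])   # one flattened lambda per row
    C = np.cov(X)                             # n x n covariance matrix D(X)
    mu, P = np.linalg.eigh(C)                 # symmetric -> orthogonal eigvecs
    order = np.argsort(mu)[::-1]              # sort eigenvalues descending
    mu, P = mu[order], P[:, order]
    ratio = np.cumsum(mu) / np.sum(mu)
    k = int(np.searchsorted(ratio, TH) + 1)   # smallest k reaching TH
    s = P[:, :k].sum(axis=1)                  # sum of the kept eigenvectors
    keep = np.argsort(np.abs(s))[::-1][:k]    # lambdas matching largest entries
    return set(keep.tolist())
```

The lambdas outside the returned index set would then be fixed to zero matrices, yielding the sparse (pruned) network.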
3. Transfer learning method
In this part, we mainly consider that there is enough labeled data in the source domain $\mathcal{D}_s = \{(x_i^s, y_i^s)\}_{i=1}^{N_s}$, where $x_i^s$ is the vector of input random variables and $y_i^s$ is the corresponding output vector. Furthermore, there is a target domain with many unlabeled data and only a few labeled data $\mathcal{D}_t = \{(x_i^t, y_i^t)\}_{i=1}^{N_t}$, with $N_t \ll N_s$. The two domains have identical marginal distributions ($P_s(x) = P_t(x)$) but different conditional distributions ($P_s(y \mid x) \neq P_t(y \mid x)$). Given the target domain and the source domain, transfer learning methods can be uniformly characterized as empirical risk minimization on the target data plus a transfer regularizer:

$$\min_{\theta}\ \frac{1}{N_t} \sum_{i=1}^{N_t} \ell\big( f_{\theta}(x_i^t),\, y_i^t \big) + \Omega(\theta),$$

where $\Omega$ is the transfer regularizer. Since we use the pre-training and fine-tuning transfer method, in which fine-tuning is a standard empirical risk minimization process, we add a transfer regularizer to improve performance. In our experiments, we use the L2-SP regularization constraint during fine-tuning:

$$\Omega(\theta) = \alpha\, \big\| \theta - \theta^{0} \big\|_2^2,$$

where $\theta^{0}$ represents the starting point of the weights (the values transferred from pre-training). Note that the source and target network frameworks may differ slightly. In the common part (with the same structure) we directly apply the above formula; in the remaining part, ordinary L2 regularization is used as the constraint.
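The L2-SP penalty is straightforward to compute; here is a minimal sketch (with hypothetical flat parameter vectors, not the paper's actual model parameters):

```python
import numpy as np

def l2_sp_penalty(theta, theta_sp, alpha=1e-4):
    """L2-SP regularizer: penalize the distance of the current weights
    `theta` from the pre-trained starting point `theta_sp` (flat arrays).
    Parameters with no source-domain counterpart would instead receive
    plain L2 decay toward zero."""
    return alpha * np.sum((theta - theta_sp) ** 2)
```

During fine-tuning this penalty is simply added to the data-fitting loss before each gradient step, pulling the target model toward the transferred initialization rather than toward zero.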
The construction of the transfer learning framework is based on λ-FNO. Since the subsequent experimental data are generated on unstructured grids, we place a fully connected layer at the end of the network to map the discrete input function on the structured grid to the discrete solution function on the unstructured grid.
3.1. Steps of the proposed method
Pre-training λ-FNO on source domain.
Dense λ-FNO (Fig 2) is pre-trained on the source domain $\mathcal{D}_s = \{(x_i^s, y_i^s)\}_{i=1}^{N_s}$. We pre-train the source model using the relative L2 error:

$$L = \frac{\| y_{\text{pred}} - y \|_2}{\| y \|_2},$$

where $\| \cdot \|_2$ is the Euclidean norm and $y_{\text{pred}}$ and $y$ are the prediction data and the label data, respectively. In the optimization part, we adopt the Adam optimizer to minimize the loss function. After training is complete, we save the λ parameters.
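The relative L2 error used here can be written as a small helper (illustrative only; in practice it is averaged over a batch):

```python
import numpy as np

def relative_l2(pred, true):
    """Relative L2 error ||pred - true||_2 / ||true||_2, the loss used
    for pre-training the source model."""
    return np.linalg.norm(pred - true) / np.linalg.norm(true)
```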
Pruning the network.
Filter the λ parameters with the algorithm in Section 2 to complete the pruning of the network. We fix the λ values that pass the screening, making them constant matrices, and set those that fail the screening to zero matrices. The pruned network is shown in Fig 2.
Parameters transfer.
Transfer the trained parameters of λ-FNO to the target model TL-λFNO. This technique is very common in transfer learning; it effectively reduces the dependence on labeled target-domain data and avoids the cost of training from scratch. The architecture of TL-λFNO remains roughly the same; only a slight change to the output neurons of the last fully connected layer is required.
Fine-tuning TL-λFNO on target domain.
We fine-tune specific layers of TL-λFNO on the target domain. In the field of computer vision, it is generally believed that the parameters of the convolutional layers are general, while the parameters of the fully connected layers serve the specific task [35–36]. We therefore freeze the parameters of all layers except the last three fully connected layers, and fine-tune those three layers with the hybrid loss function (the red shaded part of Fig 3). This preserves excellent results on the specific task while reducing training cost and the risk of overfitting. The transfer regularizer in the hybrid loss function pulls the parameter distribution of the target-domain model toward that of the source-domain model, which effectively improves transfer performance and further reduces overfitting. The hybrid loss function is defined as follows:
Except for the dimensionality reduction operator (which contains two fully connected layers) and the last fully connected layer (red shaded part), all other layer parameters are frozen. The layers represented by the red shaded part are fine-tuned using the labeled data in the target domain and the hybrid loss function.
$$L_{\text{hybrid}} = \frac{\| y_{\text{pred}} - y \|_2}{\| y \|_2} + \alpha\, \big\| \theta - \theta^{0} \big\|_2^2,$$

where $\alpha$ is an adjustable hyper-parameter, set to 0.0001 in subsequent experiments, and $\theta$ and $\theta^{0}$ denote the current value and the initial value (i.e., the transferred starting point) of the target-model parameters, respectively. In the optimization process, we still use the Adam optimizer to minimize the loss and finally obtain the optimal parameters.
4. Numerical experiments
In this section, we perform specific transfer tasks for Darcy Flow, the elasticity model, and Burgers' equation. All tasks consider transfer learning under conditional shift, i.e., the distribution of the task input data is the same but the output distributions differ. All experiments uniformly use four Fourier operator layers, the ReLU activation function, and the Adam optimizer. The error entries in the tables are the average test-set error over the last 10 epochs. All computations are performed on a single NVIDIA GeForce RTX 3060 Mobile GPU.
4.1. Darcy Flow
The 2-d Darcy Flow equation, a linear second-order PDE on the unit square, is described by:

$$-\nabla \cdot \big( a(x)\, \nabla u(x) \big) = f(x), \quad x \in (0,1)^2, \qquad u(x) = 0, \quad x \in \partial (0,1)^2,$$

where $a(x)$ is the diffusion coefficient and $f(x)$ is the forcing function. For simplicity, we directly consider a constant forcing term. We uniformly impose the Dirichlet boundary condition on the boundary. This PDE arises in industrial applications such as the control of groundwater pollution and the exploitation of fracture-cavity reservoirs. We want to learn the operator which nonlinearly maps the diffusion coefficient $a$ to the solution $u$, i.e., $G^{\dagger}: a \mapsto u$.
For this PDE, we consider the following four transfer tasks which are presented in Fig 4:
- TL1: Transfer from the square domain to the equilateral triangle domain.
- TL2: Transfer from a square domain to a right-angled triangle.
- TL3: Transfer from a square domain to an equilateral triangle with a vertical notch.
- TL4: Transfer from a square domain with one vertical notch to a square domain with two horizontal notches.
To train the operator network, we randomly sample diffusion coefficients $a$ and generate the corresponding model responses $u$. For the source domain, we uniformly generate training and test data; for the target domain, we likewise generate training and test data. When training on the target domain, sequential training is performed for $N_t$ = {5, 20, 50, 100, 150, 200, 250, 2,000} samples, with a fixed test set to evaluate the effect of training-set size on the network. We train on the source domain using DeepONet, FNO, and our proposed λ-FNO, respectively. The relative L2 norm errors (%) and training costs (s) are shown in Tables 1, 5 and 7, respectively. In terms of source-domain accuracy, λ-FNO has an absolute advantage over DeepONet and FNO; its training speed is only slightly slower than FNO and faster than DeepONet. The training cost of DeepONet can be obtained from reference [31]. For the target domain, we construct TL-λFNO for TL1–4. On the one hand, we train the target-domain data on a new λ-FNO from scratch; on the other hand, we fine-tune TL-λFNO, which is constructed by transferring the source-domain network parameters. The target-domain relative L2 norm errors (%) and training costs (s) of TL1–4 are given in Tables 2–4, 6 and 8, respectively. With extremely little data, TL-DeepONet performs better. When $N_t$ = {50, 100, 150, 200, 250, 2,000}, TL-λFNO has a huge advantage over TL-DeepONet. The reason is that TL-λFNO contains a lot of source-domain information, guaranteeing very good accuracy even when little data is available; fine-tuning TL-λFNO is therefore preferable to retraining λ-FNO. In general, the lower bound of the downstream transfer task depends tightly on the performance of pre-training, so it is natural that TL-λFNO outperforms TL-DeepONet in most cases. With a large amount of data, training λ-FNO from scratch can achieve the highest accuracy, but at a high time cost: as shown in Table 8, the times to train λ-FNO from scratch for TL1–4 are 4,102, 4,128, 4,073 and 3,983 seconds, respectively. If the accuracy requirements are not strict, fine-tuning TL-λFNO is still the best choice: the relative L2 norm errors of fine-tuning TL-λFNO are 0.51%, 0.73%, 0.98% and 0.33% for TL1–4, with training times of only 654, 640, 666 and 637 seconds, less than a quarter of the time of training λ-FNO from scratch. This is expected, since fine-tuning TL-λFNO trains only the last three fully connected layers, which greatly reduces the trainable parameters and time, while the parameters transferred from the source model contain enough information to ensure accuracy. From this result we can also conclude that applying transfer learning to Darcy Flow reduces both the time cost and the data cost.
4.2. Elasticity model
We consider modeling a thin rectangular plate subjected to in-plane loading as a two-dimensional plane-stress elasticity problem:

$$\nabla \cdot \boldsymbol{\sigma} + \boldsymbol{f} = 0,$$

where $\boldsymbol{\sigma}$ is the Cauchy stress tensor, $\boldsymbol{f}$ is the body force, $u$ represents the x-displacement and $v$ represents the y-displacement. The relation between displacement and stress under plane stress is defined as:

$$\begin{bmatrix} \sigma_{xx} \\ \sigma_{yy} \\ \sigma_{xy} \end{bmatrix} = \frac{E}{1-\nu^2} \begin{bmatrix} 1 & \nu & 0 \\ \nu & 1 & 0 \\ 0 & 0 & (1-\nu)/2 \end{bmatrix} \begin{bmatrix} \partial u / \partial x \\ \partial v / \partial y \\ \partial u / \partial y + \partial v / \partial x \end{bmatrix},$$

where $E$ represents Young's modulus and $\nu$ represents Poisson's ratio.
For the elasticity model, we want to learn the operator mapping from random boundary loads to the displacement fields ($u$: x-displacement and $v$: y-displacement). We again randomly sample the boundary loads from a Gaussian random field. For the source domain, we consider a thin plate with a centered circular internal boundary and fixed material properties ($E$, $\nu$). Furthermore, beyond different geometric shapes, in this application the source domain and the target domain also have different material properties. This not only increases the difficulty for our network framework, but also demonstrates the wide applicability and practical value of transfer learning.
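For concreteness, the plane-stress constitutive relation above can be evaluated as a small NumPy helper (an illustrative sketch; the inputs below are generic, not the specific material constants of the experiments):

```python
import numpy as np

def plane_stress(strain, E, nu):
    """Plane-stress constitutive relation: map engineering strains
    [eps_xx, eps_yy, gamma_xy] to stresses [sig_xx, sig_yy, sig_xy]."""
    C = (E / (1.0 - nu**2)) * np.array([
        [1.0, nu, 0.0],
        [nu, 1.0, 0.0],
        [0.0, 0.0, (1.0 - nu) / 2.0],
    ])
    return C @ np.asarray(strain)

# usage: uniaxial strain with generic steel-like constants
sig = plane_stress([1e-3, 0.0, 0.0], E=200e9, nu=0.3)
```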
For this experiment, we consider the following four transfer tasks which are presented in Fig 5:
- TL5: Transfer from the source domain with a centered circular inner boundary and given material properties to a target domain with two smaller circular inner boundaries (in the upper right and lower left corners) and different material properties.
- TL6: Transfer from the source domain with a centered circular inner boundary and given material properties to a target domain with a small square inner boundary and different material properties.
To train the source-domain model, we randomly sample boundary-load functions and obtain the corresponding displacements $u$ and $v$. We randomly divide the 2,000 data into training and test sets. We again compare DeepONet, FNO and our proposed λ-FNO framework; the source-domain results are shown in Tables 9 and 10. From the analysis of the results, the proposed λ-FNO is still superior in accuracy and only slightly inferior to FNO in speed, which shows that the proposed framework is effective. For the target domain, we again sample $N_t$ = {5, 20, 50, 100, 150, 200, 250, 1,900} training data and a test set for sequential training and evaluation, so that we can observe whether fine-tuning the transferred model has an advantage over training from scratch when the target domain contains only a small amount of data. All relative L2 error (%) results for TL5 and TL6 are shown in Tables 11 and 12, respectively. With extremely little data, fine-tuning TL-DeepONet still performs better. When the number of training samples is $N_t$ = {20, 50, 100, 150, 200, 250, 1,900}, fine-tuning our transfer model TL-λFNO achieves smaller errors than training λ-FNO from scratch and greatly improves prediction accuracy. Moreover, the error of our proposed TL-λFNO is an order of magnitude lower than that of the transfer model TL-DeepONet: for the TL5 target domain, the test errors of TL-DeepONet are 3.56% and 4.33%, while the test errors of TL-λFNO are only 0.38% and 0.84%. When the target domain itself has a large amount of data, training the network from scratch yields smaller errors, as expected. We observe that for large $N_t$ the resulting TL-λFNO accuracy is similar to training λ-FNO on the target domain from scratch, but the time cost of training from scratch is high, as shown in Table 13. Therefore, when the accuracy requirements are not strict or the target domain has little data, transfer learning is very practical for reducing time costs and improving accuracy.
4.3. Burgers' equation
We consider the 1-d Burgers' equation, a nonlinear parabolic equation. It takes the form

$$\partial_t u(x,t) + \partial_x \big( u^2(x,t)/2 \big) = \nu\, \partial_{xx} u(x,t), \qquad x \in (0,1),\ t \in (0,1],$$

where $u$ represents the velocity field of the fluid with periodic boundary conditions and $\nu$ is the diffusion coefficient. The Burgers' equation is a fundamental partial differential equation in various fields of applied mathematics, such as fluid mechanics, nonlinear acoustics, and gas dynamics. We aim to learn the operator mapping between the initial condition $u_0(x)$ and the equation's solution at time one, $u(\cdot, 1)$, i.e., $G^{\dagger}: u_0 \mapsto u(\cdot, 1)$. The initial condition $u_0$ is randomly sampled from a Gaussian random field. For the source model, we choose a fixed diffusion coefficient $\nu$ for the equation.
For Burgers' equation, we consider the following transfer scenario:
- TL7: Transfer learning from the source diffusion coefficient to a different diffusion coefficient.
For the source model, we generate training and test data. In addition, we generate target-domain training and test data. When training the target model, we sample $N_t$ = {5, 10, 15, 20, 25, 50, 100, 500} training data to train the model in sequence and use a fixed test set to evaluate the target model. It is worth noting that, because the last three layers contain a small proportion of the network parameters in this experiment, the last Fourier layer is also included in the fine-tuning of the transfer model. The relative L2 error (%) results of the test set on the source model are shown in Table 14. It can be seen from Table 14 that, even when the error is already very small, our proposed λ-FNO can still reduce it by more than 15%. For the target domain, we again sample $N_t$ = {5, 10, 15, 20, 25, 50, 100, 500} training data and a test set for sequential training and evaluation. We use two training modes: fine-tuning the TL-λFNO transferred from the source model, and training a new λ-FNO network from scratch; the results are shown in Table 15. When there is little data in the target domain, fine-tuning TL-λFNO achieves higher accuracy. When the amount of data grows, the accuracy of training a new λ-FNO from scratch gradually approaches that of fine-tuning TL-λFNO, but fine-tuning TL-λFNO still takes less time, as shown in Table 16. Overall, when the target-domain data is scarce, fine-tuning the transfer model is better in both accuracy and time cost. When there is more data in the target domain, training a new λ-FNO from scratch gives higher accuracy, while fine-tuning the transfer model remains the faster choice. Finally, plots of three representative realizations of the initial condition with the reference response are shown in Fig 6.
5. Discussion and conclusion
In conclusion, we successfully construct a new framework, λ-FNO, based on FNO by introducing the λ parameter and a pruning method. λ-FNO is then used for transfer scenarios of PDEs discretized on unstructured grids. We conduct numerical experiments on Darcy Flow, the elasticity model, and Burgers' equation. First, FNO and λ-FNO are trained and compared using source-domain data; on the source model, the error obtained with λ-FNO is about 20%−50% lower than that of FNO. We then use λ-FNO for transfer learning, constructing a total of 7 transfer scenarios and using two training methods: training λ-FNO from scratch on the target-domain data, and fine-tuning the TL-λFNO transferred from the source model. Experimental results show that a good transfer model can effectively reduce both data cost and time cost. Finally, we compared the error results of our proposed TL-λFNO with those of the TL-DeepONet proposed in reference [31]; the results show that TL-λFNO performs better in terms of both accuracy and speed. It is worth noting that the λ-FNO construction method proposed in this paper, i.e., the introduction and pruning of λ parameters, can be applied not only to FNO but also extended to other networks. We expect this method to achieve higher accuracy than ResNet and faster speed than DenseNet, and we will study it in more depth in future work.
Appendix A
Data generation
In this section, we provide the parameters of the data generators for the three equations used in Section 4.
1.1. Burgers’ equation.
The 1-d Burgers' equation takes the form:

$$\partial_t u(x,t) + \partial_x \big( u^2(x,t)/2 \big) = \nu\, \partial_{xx} u(x,t), \qquad x \in (0,1),\ t \in (0,1].$$

The initial conditions are randomly sampled from a Gaussian random field with periodic boundary conditions. We uniformly take 128 points for $x$ in the definition domain $(0,1)$ and represent the initial condition $u_0$ in discretized form. We then solve the equation using a split-step method, in which the heat-equation part is solved exactly in Fourier space and the nonlinear part is then advanced in Fourier space using the forward Euler method. We solve on a uniform spatial grid.
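The split-step scheme described above can be sketched in NumPy as follows (our own illustrative solver with assumed step counts, not the authors' data-generation code):

```python
import numpy as np

def burgers_split_step(u0, nu=0.1, T=1.0, nt=1000):
    """Split-step Fourier solver for the periodic 1-d Burgers' equation
    u_t + u u_x = nu * u_xx on x in [0, 1): the diffusion part is solved
    exactly in Fourier space, the nonlinear advection part with a
    forward-Euler step."""
    n = u0.shape[0]
    k = 2j * np.pi * np.fft.fftfreq(n, d=1.0 / n)   # spectral wavenumbers
    dt = T / nt
    u_hat = np.fft.fft(u0)
    diffuse = np.exp(nu * k**2 * dt)                # exact heat-equation factor
    for _ in range(nt):
        u = np.real(np.fft.ifft(u_hat))
        u_x = np.real(np.fft.ifft(k * u_hat))       # spectral derivative
        u_hat = (u_hat - dt * np.fft.fft(u * u_x)) * diffuse
    return np.real(np.fft.ifft(u_hat))

# usage: evolve a sine-wave initial condition on a 128-point grid
u0 = np.sin(2 * np.pi * np.arange(128) / 128)
u1 = burgers_split_step(u0, nu=0.1, T=0.5, nt=500)
```

With a sizable diffusion coefficient the solution amplitude decays over time, which gives a quick sanity check on the scheme.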
1.2. Darcy Flow.
Recall the 2-d Darcy Flow on the unit square box:

$$-\nabla \cdot \big( a(x)\, \nabla u(x) \big) = f(x), \quad x \in (0,1)^2, \qquad u(x) = 0, \quad x \in \partial (0,1)^2.$$

The diffusion coefficient $a$ is described as a stochastic process generated from a Gaussian random field, and we discretize $a$ on a uniform grid. The realizations are generated with a truncated Karhunen-Loève expansion (KLE). Dirichlet boundary conditions are imposed on all boundaries. For TL1–3, we employ 1,541, 2,295, 1,200 and 2,295 unstructured mesh nodes in our simulations for the square, the equilateral triangle, the right-angled triangle, and the triangular domain with a notch, respectively. For TL4, we employ 1,538 and 1,552 unstructured mesh nodes for the square domain with a vertical notch and the square domain with two horizontal notches, respectively.
1.3. Elasticity model.
We consider a rectangular plate subjected to in-plane loading as a two-dimensional plane-stress elasticity problem, governed by the static equilibrium equations of linear elasticity. The random boundary loads are described as a stochastic process and generated from a Gaussian random field, discretized on a uniform grid. The realizations are generated with a truncated Karhunen–Loève expansion (KLE). For TL5–6, we employ 1,020, 1,183, and 1,816 unstructured meshes in our simulations for the source domain, the TL5 target domain, and the TL6 target domain, respectively.
Appendix B
Network and training parameters
We uniformly use 0.001 and 0.01 as the initial learning rates for the network parameters and the λ parameters, respectively. This section gives the necessary hyperparameters for each training task (Table 17).
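The two learning rates above correspond to two parameter groups updated with different step sizes; a minimal sketch of such per-group updates is below (plain SGD and the toy values are assumptions, the actual optimizer may differ):

```python
def sgd_step(params, grads, lr):
    # One gradient-descent step for a single parameter group.
    return [p - lr * g for p, g in zip(params, grads)]

weights = [0.5, -0.2]   # network parameters, lr = 1e-3
lams = [1.0, 1.0]       # lambda parameters, lr = 1e-2
w_grads = [0.1, 0.1]
l_grads = [0.1, 0.1]
weights = sgd_step(weights, w_grads, lr=1e-3)
lams = sgd_step(lams, l_grads, lr=1e-2)
```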
References
- 1. Mishra S. A machine learning framework for data driven acceleration of computations of differential equations. Mathematics in Engineering. 2018;1(1):118–46.
- 2. Paszke A, Gross S, Massa F, et al. PyTorch: an imperative style, high-performance deep learning library. Advances in Neural Information Processing Systems. 2019;32.
- 3. Abadi M, Barham P, Chen J, et al. Tensorflow: a system for large-scale machine learning. OSDI. 2016;16:265–83.
- 4. Chen X, Wang X, Zhang K, Fung K-M, Thai TC, Moore K, et al. Recent advances and clinical applications of deep learning in medical image analysis. Med Image Anal. 2022;79:102444. pmid:35472844
- 5. van der Velden BHM, Kuijf HJ, Gilhuijs KGA, Viergever MA. Explainable artificial intelligence (XAI) in deep learning-based medical image analysis. Med Image Anal. 2022;79:102470. pmid:35576821
- 6. Schapiro D, Sokolov A, Yapp C, Chen Y-A, Muhlich JL, Hess J, et al. MCMICRO: a scalable, modular image-processing pipeline for multiplexed tissue imaging. Nat Methods. 2022;19(3):311–5. pmid:34824477
- 7. Li B, Xiao C, Wang L. Dense nested attention network for infrared small target detection. IEEE Transactions on Image Processing. 2022.
- 8. Jha SS, Nidamanuri RR, Ientilucci EJ. Influence of atmospheric modeling on spectral target detection through forward modeling approach in multi-platform remote sensing data. ISPRS Journal of Photogrammetry and Remote Sensing. 2022;183:286–306.
- 9. Zhang T, Schoene AM, Ji S, Ananiadou S. Natural language processing applied to mental illness detection: a narrative review. NPJ Digit Med. 2022;5(1):46. pmid:35396451
- 10. Liu P, Yuan W, Fu J. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys. 2023;55(9):1–35.
- 11. Goldstein A, Zada Z, Buchnik E, Schain M, Price A, Aubrey B, et al. Shared computational principles for language processing in humans and deep language models. Nat Neurosci. 2022;25(3):369–80. pmid:35260860
- 12. Raissi M, Perdikaris P, Karniadakis GE. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational Physics. 2019;378:686–707.
- 13. Wu G-Z, Fang Y, Kudryashov NA, Wang Y-Y, Dai C-Q. Prediction of optical solitons using an improved physics-informed neural network method with the conservation law constraint. Chaos, Solitons & Fractals. 2022;159:112143.
- 14. Jiang X, Wang D, Fan Q. Physics-informed neural network for nonlinear dynamics in fiber optics. Laser & Photonics Reviews. 2022;16(9):2100483.
- 15. Cuomo S, Di Cola VS, Giampaolo F. Scientific machine learning through physics–informed neural networks: where we are and what’s next. Journal of Scientific Computing. 2022;92(3):88.
- 16. Gokhale G, Claessens B, Develder C. Physics informed neural networks for control oriented thermal modeling of buildings. Applied Energy. 2022;314:118852.
- 17. Lu L, Jin PZ, Karniadakis GE. Deeponet: Learning nonlinear operators for identifying differential equations based on the universal approximation theorem of operators. Nat Mach Intell. 2021;3:218–29.
- 18. Li Z, Kovachki NB, Azizzadenesheli K, et al. Fourier neural operator for parametric partial differential equations. arXiv preprint arXiv:2010.08895. 2020.
- 19. Xiao X, Cao D, Yang R, et al. Coupled multiwavelet operator learning for coupled differential equations. The Eleventh International Conference on Learning Representations. 2022.
- 20. Gupta G, Xiao X, Bogdan P. Multiwavelet-based operator learning for differential equations. Advances in neural information processing systems. 2021;34:24048–62.
- 21. Anandkumar A, Azizzadenesheli K, Bhattacharya K, et al. Neural operator: Graph kernel network for partial differential equations. ICLR 2020 Workshop on Integration of Deep Neural Models and Differential Equations. 2020.
- 22. Li Z, Kovachki NB, Azizzadenesheli K, Liu B, Stuart AM, Bhattacharya K, et al. Multipole graph neural operator for parametric partial differential equations. Advances in Neural Information Processing Systems. 2020;33.
- 23. Lu M, Mohammadi A, Meng Z, Meng X, Li G, Li Z. Deep neural operator for learning transient response of interpenetrating phase composites subject to dynamic loading. Computational Mechanics. 2023;5(1):1–4.
- 24. Patel RG, Trask NA, Wood MA, et al. A physics-informed operator regression framework for extracting data-driven continuum models. Computer Methods in Applied Mechanics and Engineering. 2021;373:113500.
- 25. Raonic B, Molinaro R, Rohner T, et al. Convolutional Neural Operators. ICLR 2023 Workshop on Physics for Machine Learning. 2023.
- 26. Cao Q, Goswami S, Karniadakis GE. LNO: Laplace neural operator for solving differential equations. arXiv preprint. 2023.
- 27. Kovachki N, Li Z, Liu B. Neural operator: Learning maps between function spaces with applications to PDEs. Journal of Machine Learning Research. 2023;24(89):1–97.
- 28. Zhong M, Yan Z. Data-driven soliton mappings for integrable fractional nonlinear wave equations via deep learning with Fourier neural operator. Chaos, Solitons & Fractals. 2022;165:112787.
- 29. You H, Zhang Q, Ross CJ. Learning deep implicit fourier neural operators (IFNOs) with applications to heterogeneous material modeling. Computer Methods in Applied Mechanics and Engineering. 2022;398:115296.
- 30. Wen G, Li Z, Long Q. Real-time high-resolution CO2 geological storage prediction using nested Fourier neural operators. Energy & Environmental Science. 2023;16(4):1732–41.
- 31. Goswami S, Kontolati K, Shields MD. Deep transfer operator learning for partial differential equations under conditional shift. Nature Machine Intelligence. 2022;1(1):1–10.
- 32. Kornblith S, Shlens J, Le QV. Do better ImageNet models transfer better? Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019:2661–71.
- 33. Anandkumar A, Azizzadenesheli K, Bhattacharya K. Neural operator: Graph kernel network for partial differential equations. ICLR 2020 Workshop on Integration of Deep Neural Models and Differential Equations. 2020.
- 34. Huang G, Liu Z, Van Der Maaten L, et al. Densely connected convolutional networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017:4700–8.
- 35. Zhang X, Garikipati K. Machine learning materials physics: Multi-resolution neural networks learn the free energy and nonlinear elastic response of evolving microstructures. Computer Methods in Applied Mechanics and Engineering. 2020;372:113362.
- 36. Yosinski J, Clune J, Bengio Y, et al. How transferable are features in deep neural networks? Advances in Neural Information Processing Systems. 2014;27.