A machine learning approach to model the impact of line edge roughness on gate-all-around nanowire FETs while reducing the carbon footprint

  • Antonio García-Loureiro ,

    Contributed equally to this work with: Antonio García-Loureiro, Natalia Seoane, Julián G. Fernández, Enrique Comesaña, Juan C. Pichel

    Roles Funding acquisition, Investigation, Software, Supervision

    Affiliation CITIUS, Universidade de Santiago de Compostela, Santiago de Compostela, Spain

  • Natalia Seoane ,

    Roles Software, Writing – original draft, Writing – review & editing

    Affiliation CITIUS, Universidade de Santiago de Compostela, Santiago de Compostela, Spain

  • Julián G. Fernández ,

    Roles Software

    Affiliation CITIUS, Universidade de Santiago de Compostela, Santiago de Compostela, Spain

  • Enrique Comesaña ,

    Roles Software

    Affiliation Departamento de Electrónica e Computación, Universidade de Santiago de Compostela, Lugo, Spain

  • Juan C. Pichel

    Roles Conceptualization, Software, Supervision, Writing – original draft, Writing – review & editing

    Affiliation CITIUS, Universidade de Santiago de Compostela, Santiago de Compostela, Spain


Abstract

The performance and reliability of semiconductor devices scaled down to the sub-nanometer regime are being seriously affected by process-induced variability. To properly assess the impact of the different sources of fluctuations, such as line edge roughness (LER), statistical analyses involving large samples of device configurations are needed. The computational cost of such studies can be very high if 3D advanced simulation tools (TCAD) that include quantum effects are used. In this work, we present a machine learning approach to model the impact of LER on two gate-all-around nanowire FETs that dramatically decreases the computational effort, thus reducing the carbon footprint of the study, while retaining high accuracy. Finally, we demonstrate that transfer learning techniques can decrease the computing cost even further, with the carbon footprint of the study being just 0.18 g of CO2 (whereas a single-device TCAD study can produce up to 2.6 kg of CO2), while obtaining coefficient of determination values larger than 0.985 when using only 10% of the input samples.


Introduction

In nanoelectronics, an unsolved issue is the ever-closer limit of transistor scaling that threatens to put a halt to the digital revolution observed over the last 50 years [1]. Therefore, it is essential and urgent to investigate new alternatives and solutions to be used in future transistor technology nodes. Currently, gate-all-around (GAA) device architectures, like nanosheet (NS) or nanowire (NW) FETs, are suggested as strong contenders by the International Roadmap for Devices and Systems [2], because of their excellent electrostatic control [3].

Considering that the fabrication of nanoelectronic devices is a long, complex and very expensive process [4], the use of Technology Computer-Aided Design (TCAD) to predict device performance is mandatory in order to reduce costs and to optimize development times [5]. At the nanoscale, the random deficiencies introduced during the manufacturing process lead to variability issues, heavily impacting the performance and reliability of the final product. Metal-gate granularity (MGG), line edge roughness (LER), random discrete dopants (RDD), oxide thickness variation (OTV) and interface trap charges (ITC) are the main sources of variability affecting current multigate transistors [6]. To properly analyze the effect of these sources of fluctuations, statistical analyses of large ensembles of devices are needed [7]. On top of that, three-dimensional simulations that account for quantum effects are required to realistically model device behavior [8], heavily increasing the computational cost of the studies. For that reason, it is relevant to apply complementary techniques, such as machine learning (ML) [9, 10], either to shorten the computational times or to open the path to the investigation of other effects that would be infeasible using only TCAD. Recently, different aspects of machine learning have attracted interest in the field of nanoelectronics. At circuit level, ML techniques have been applied to predict the current-voltage curves needed for NW FET compact models [11]. At device level, several works have analyzed the impact of MGG- and/or RDD-induced variability in GAA NW FETs [12, 13] and NS FETs [14, 15]. However, other sources of variability, such as LER, have not been investigated so far.

In this work, we demonstrate that multi-layer perceptron networks can efficiently predict the effect of LER in state-of-the-art GAA NW FETs, greatly reducing the number of device simulations required to fully capture this effect and, thus, the associated computational cost. In addition, we show that the use of transfer learning techniques can further decrease the computing effort, obtaining coefficient of determination values (R2) above 0.985 when using only 10% of the input samples.


Methodology

Fig 1 shows 2D cross-sectional schematics of the two Si-based GAA NW FETs used in this work, a 22 nm gate length device (top figures) and a 10 nm gate length one (bottom figures). Their main device dimensions are included in Table 1 for easy comparison. These devices have a uniform p-type doping in the semiconductor channel and an n-type Gaussian doping in the source/drain (s/d) regions, which is fixed to Ns/d from the s/d contacts up to a point (Xm) near the gate region, where the doping decays exponentially (with a slope δ), as shown in Fig 2. The specific doping values for each device and region are also included in Table 1.

Fig 1. 2D schematics of the 22 nm (top) and 10 nm (bottom) gate length GAA NW FETs.

Fig 2. Cross-section of Gaussian-like doping profile along the transport direction in the 22 nm and 10 nm gate length GAA NW FETs.

Table 1. Dimensions, dopings and configuration parameters for the two ideal, LER-free GAA NW FETs.

The 22 nm GAA NW FET structure is designed after an experimental device [16]. Fig 3 shows a comparison of experimental versus simulated ID-VG characteristics, on both linear and logarithmic scales, at a supply bias of 1.0 V. Two three-dimensional finite-element (FE) device simulation approaches, implemented in VENDES [17], have been considered in this work. The first is a quantum-corrected drift-diffusion (DD) method, which efficiently characterizes the device behavior in the sub-threshold region. The second is a quantum-corrected ensemble Monte Carlo (MC) approach, which produces noisy results in the sub-threshold region (see Fig 3) but correctly captures non-equilibrium effects, and is therefore valid for calculating the device on-current (Ion). The DD approach, thanks to a careful fitting of the mobility models [18], also shows very good agreement with the experimental data in the device on-region; however, this method has previously been shown to produce inaccurate results in on-region variability studies that involve fluctuations in the device channel cross-section [19], as is the case with LER. The simulation times of one gate bias point at VD = 1.0 V using either 3D quantum-corrected DD or MC simulations are on average 1.4 hours and 80.3 hours, respectively, on a single core of an Intel i7-9700K CPU @ 3.60GHz for a device mesh of 190 K nodes. Therefore, to save computational time and resources, we combine fast DD simulations to obtain the values of the sub-threshold region figures of merit, i.e. off-current (Ioff), sub-threshold slope (SS) and threshold voltage (Vth), with slower MC results to extract the Ion. As indicated in Fig 3, Ioff and Ion are calculated as the drain currents at the specific gate biases of 0.0 V and 1.0 V, respectively. Vth is extracted via the linear extrapolation (LE) method, which defines the threshold voltage as the x-intercept of the linear extrapolation of the curve at its maximum first-derivative point [20]. Note that, unlike the off- and on-currents, an accurate Vth value requires several gate bias points to be simulated, with average execution times of 8.4 hours. The SS is the slope of the linear part of the ID-VG curve observed for VG values lower than Vth (see Fig 3). Therefore, once Vth has been obtained, SS can also be estimated without any further computation. Table 2 shows the main figures of merit for the 22 and 10 nm gate length devices in ideal conditions, i.e. not affected by LER.
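The linear extrapolation method described above can be illustrated with a short sketch. The function name, the synthetic softplus-shaped ID-VG curve and all numerical values are assumptions chosen for illustration; they are not taken from VENDES or from the simulated devices.

```python
import numpy as np

def vth_linear_extrapolation(vg, i_d):
    """Threshold voltage as the x-intercept of the tangent at maximum dID/dVG."""
    gm = np.gradient(i_d, vg)      # numerical first derivative of the curve
    i_max = int(np.argmax(gm))     # index of the maximum-slope point
    # Tangent: I(VG) = gm_max * (VG - vg[i_max]) + i_d[i_max]; solve I(VG) = 0
    return vg[i_max] - i_d[i_max] / gm[i_max]

# Synthetic, smooth ID-VG curve (illustrative only) that turns on near 0.3 V
vg = np.linspace(0.0, 1.0, 101)
vt_true, smooth, k = 0.3, 0.03, 1e-5
i_d = k * smooth * np.log1p(np.exp((vg - vt_true) / smooth))

vth = vth_linear_extrapolation(vg, i_d)
print(f"Extracted Vth = {vth:.3f} V")
```

Because the tangent is taken at a single point, the extracted value is sensitive to noise in the derivative, which is one reason why several well-resolved gate bias points are needed.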

Fig 3. Experimental (EXP) vs. simulated ID-VG characteristics at VD=1.0 V, on both logarithmic and linear scales, for the 22 nm gate length GAA NW FET.

Quantum-corrected drift-diffusion (DD) and Monte Carlo (MC) simulations are included for comparison. The main figures of merit (FOM) that characterize device performance, off-current (Ioff), threshold voltage (Vth) and on-current (Ion) are included.

Table 2. Main figures of merit that characterize the performance of the two ideal, LER-free GAA NW FETs.

LER is a source of variability that arises from the fabrication processes, since the device edges are not perfectly smooth and deviate from the ideal shape. At the current scaling level, with dimensions below 10 nm, LER can be as large as the size of the device’s critical features, thus heavily impacting the transistor’s performance and reliability [2]. To model LER, the edges of the nanowire in the y-direction (see an example in Fig 4) are deformed according to the shape of a roughness profile created via the Fourier synthesis method [21]. These deformations are typically characterized by two parameters: i) the correlation length (CL), which describes the spatial correlation between deformations at different points of the device in the x-direction, and ii) the root mean square (RMS) height, which establishes the amplitude of the roughness in the y-direction. First, a Gaussian spectrum (SG) is used to generate the roughness, Eq (1), where k denotes the frequency values defined by the discretization in real space. Then, the spectrum SG is multiplied by an array of complex random numbers and transformed back to real space via an inverse fast Fourier transform. The applied LER is uncorrelated, i.e. the deformations are not equal at both edges of the device, mimicking the real fabrication process. Fig 5 shows an example of a roughness profile with CL = 10 nm and RMS = 1 nm, which is then used to modify the device structure. Note that the FE-based tetrahedral discretization allows the LER-induced deformation to be properly captured.
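A minimal sketch of the Fourier synthesis procedure is given below. It assumes the Gaussian power spectrum commonly used in the LER literature, SG(k) = sqrt(pi) · RMS^2 · CL · exp(−k^2 CL^2 / 4); this normalization, the use of the square root of the spectrum as amplitude, the uniformly distributed random phases, the final rescaling to the requested RMS and the 0.25 nm grid spacing are all assumptions made for illustration and may differ from the implementation in [21].

```python
import numpy as np

def gaussian_spectrum(k, rms, cl):
    # Assumed Gaussian power spectrum (normalization is an assumption)
    return np.sqrt(np.pi) * rms**2 * cl * np.exp(-(k**2) * cl**2 / 4.0)

def ler_profile(n_points, dx, rms, cl, rng):
    """One roughness profile (nm) sampled at n_points along the x-direction."""
    k = 2.0 * np.pi * np.fft.rfftfreq(n_points, d=dx)   # angular frequencies
    amplitude = np.sqrt(gaussian_spectrum(k, rms, cl))
    phases = rng.uniform(0.0, 2.0 * np.pi, size=k.size)
    spectrum = amplitude * np.exp(1j * phases)          # complex random numbers
    spectrum[0] = 0.0                                   # enforce zero mean
    profile = np.fft.irfft(spectrum, n=n_points)        # back to real space
    # Rescale so the sampled profile has exactly the requested RMS height
    return profile * rms / np.sqrt(np.mean(profile**2))

rng = np.random.default_rng(42)
# Independent draws for each edge, so the two edges are uncorrelated
top_edge = ler_profile(400, 0.25, rms=1.0, cl=10.0, rng=rng)
bottom_edge = ler_profile(400, 0.25, rms=1.0, cl=10.0, rng=rng)
```

Drawing the two edges independently reproduces the uncorrelated LER described in the text; a correlated variant would reuse the same profile for both edges.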

Fig 4. Example of a 22 nm gate length GAA NW FET affected by LER.

The correlation length (CL) is 10 nm and the root mean square (RMS) height is 1 nm.

Fig 5. Line-edge roughness deformation (CL = 10 nm, RMS = 1 nm) illustrating the effect on the 22 nm gate length device geometry.

The outline of the ideal undeformed device is included as reference.

Machine learning modeling

Machine learning and deep learning models have been successfully applied to many research areas [22]. However, such methods have only recently begun to be applied to the most relevant transistor design challenges. As previously noted, the characterization of the behavior of Si-based GAA NW FETs requires very time-consuming simulations, especially when MC methods are used. For this reason, we propose a machine learning approach to predict the impact of LER on these devices, with the aim of noticeably decreasing the total simulation time. In particular, to obtain the device on-current (Ion), off-current (Ioff), sub-threshold slope (SS) and threshold voltage (Vth), we use multi-layer perceptron (MLP) networks, which are simpler than other types of neural networks but powerful enough to deliver very good results [23]. We also compare their performance against other well-established ML methods.

MLPs are fully connected feed-forward neural networks, which consist of three or more layers (an input and an output layer with one or more hidden layers). An example is shown in Fig 6. The input layer consists of a set of neurons (from x1 to xn in the figure) representing the input features. Each neuron in the hidden layer transforms the values from the previous layer with a weighted linear summation, followed by a non-linear activation function. The output layer receives the values from the last hidden layer and transforms them into the output values. The neurons in the MLP are trained with the backpropagation learning algorithm. MLPs can approximate any continuous function and can solve problems that are not linearly separable, for either classification or regression. In our case, we focus on regression, since the goal is to obtain the values that characterize a particular device (Ion, Ioff, SS and Vth) using as input some features describing its LER deformations.
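The layer-by-layer computation described above can be written in a few lines. The sketch below is only a forward pass with randomly initialized weights; the layer sizes are hypothetical and are not the values listed in Table 3.

```python
import numpy as np

def mlp_forward(x, weights, biases):
    """Forward pass: tanh hidden layers followed by a linear output layer."""
    a = x
    for w, b in zip(weights[:-1], biases[:-1]):
        a = np.tanh(a @ w + b)              # weighted sum, then non-linearity
    return a @ weights[-1] + biases[-1]     # linear output for regression

rng = np.random.default_rng(0)
sizes = [800, 16, 8, 1]                     # hypothetical layer sizes
weights = [rng.normal(scale=0.1, size=(m, n))
           for m, n in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]

x = rng.normal(size=(5, 800))               # five stand-in input examples
y = mlp_forward(x, weights, biases)
```

Training consists of adjusting `weights` and `biases` so that the outputs match the simulated figures of merit.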

Fig 6. Example of a multi-layer perceptron network (MLP) containing two hidden layers.

Specifically, to generate the input features for training the neural network, the total length of the device (x-direction) is discretized into 400 points (see a simplified example in Fig 5), a value large enough to capture the effect of the LER deformation. At each of these points, the downward vertical distance from the middle of the device (y = 0.0) to its edge (y = hwdown) is measured and stored as a negative value. Next, the same procedure is carried out in the upward direction (y = hwup), with these values stored as positive. Consequently, for each LER-affected device, there are a total of 800 input values that characterize its deformation. From now on, we refer to these points as the LER profile of the device.
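As an illustration of this encoding, the sketch below assembles the 800-value LER profile from two sampled edges. The 5 nm ideal half-height, the random stand-in deviations and the ordering of the two halves of the vector (downward values first) are assumptions made for the example.

```python
import numpy as np

N_POINTS = 400
HALF_HEIGHT_NM = 5.0   # hypothetical ideal distance from y = 0 to each edge

def ler_features(dev_down, dev_up, half_height=HALF_HEIGHT_NM):
    """Concatenate downward (negative) and upward (positive) edge positions."""
    hw_down = -(half_height + dev_down)   # lower edge, stored as negative
    hw_up = half_height + dev_up          # upper edge, stored as positive
    return np.concatenate([hw_down, hw_up])

rng = np.random.default_rng(1)
# Stand-in edge deviations (nm); a real profile would come from the LER model
dev_down = rng.uniform(-1.0, 1.0, size=N_POINTS)
dev_up = rng.uniform(-1.0, 1.0, size=N_POINTS)

features = ler_features(dev_down, dev_up)
```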

Results and discussion

Datasets and experimental setup

As mentioned before, in this work we use two GAA NW FETs that differ both in their physical dimensions and in other configuration parameters, as shown in Table 1. LER deformations are then applied to these ideal non-deformed devices considering three RMS heights (0.4, 0.6 and 1.0 nm) and four CL values (10, 15, 20 and 30 nm), generating 1,000 different device configurations (LER profiles) for each combination of RMS height and CL. For each LER-affected device, as mentioned in the Methodology section, we run a DD simulation to extract the sub-threshold region figures of merit and a Monte Carlo simulation to obtain the Ion. Note that, although the material properties of the devices under study are exactly the same, their physical dimensions differ due to LER. Therefore, for a particular device configuration, one simulation methodology may reach convergence while the other fails to do so, providing a Null output in the corresponding figure of merit, which is then disregarded in the study. It is worth mentioning that the larger the RMS height, the larger the influence of LER on the device performance [24]. The impact of LER also tends to grow with increasing CL, although it reaches a plateau at CL values similar to the device gate length [25]. For these reasons, for both gate length devices, we have combined the extreme values of CL (10 and 30 nm) and RMS (0.4 and 1.0 nm) to generate the training datasets, and we have used two other combinations of CL and RMS, i.e. CL = 15 nm, RMS = 0.6 nm and CL = 20 nm, RMS = 0.4 nm, to generate the test datasets. As a consequence, the training and test datasets for each gate length device contain 4,000 and 2,000 examples, respectively.
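The split can be checked with a few lines of bookkeeping. The variable names below are illustrative, and the counts ignore any configurations discarded because a simulation did not converge.

```python
from itertools import product

N_PROFILES = 1000   # LER profiles generated per (CL, RMS) combination

# Training: every combination of the extreme CL and RMS values (nm)
train_combos = list(product((10, 30), (0.4, 1.0)))
# Test: the two remaining combinations stated in the text (nm)
test_combos = [(15, 0.6), (20, 0.4)]

n_train = len(train_combos) * N_PROFILES
n_test = len(test_combos) * N_PROFILES
print(n_train, n_test)
```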

All the code used for this work has been implemented using Python 3.9 and the Scikit-learn library (v1.2.1). The main features of the MLP network used in the experiments are detailed in Table 3. Note that the number of hidden layers and their sizes (hyperparameters) were previously obtained using a grid search. To train the MLP model, LBFGS (Limited-memory Broyden-Fletcher-Goldfarb-Shanno algorithm) was adopted because of the relatively small size of the training dataset (4,000 examples) [26]. The squared error is used as the loss function. The response variables Ioff and Ion are scaled using the logarithmic function as a pre-processing stage, so the hyperbolic tangent (tanh) is the most suitable activation function because it works for positive as well as negative input values [12]. All the experiments were conducted on a server with one Intel Core i7-9700K CPU @ 3.60GHz and 128 GB of RAM.

Table 3. Main characteristics of the MLP network considered in this work.

Note that the solver refers to the algorithm or method used to solve the optimization problem involved in training the regressor. L2 regularization adds a penalty term to the loss function during training to prevent overfitting.
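A configuration along these lines can be expressed with Scikit-learn's `MLPRegressor`. The sketch below is not the configuration of Table 3: the hidden layer sizes, the L2 penalty `alpha`, the iteration limit, the base-10 logarithm and the synthetic data standing in for LER profiles and currents are all assumptions for illustration.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 800))                    # stand-in LER profiles
ion = np.exp(-12.0 + 0.01 * X[:, :10].sum(axis=1)) # synthetic positive currents
y = np.log10(ion)                                  # log-scaled response variable

mlp = MLPRegressor(
    hidden_layer_sizes=(32, 16),   # hypothetical; the paper uses a grid search
    activation="tanh",
    solver="lbfgs",
    alpha=1e-3,                    # L2 regularization strength (assumed value)
    max_iter=200,
    random_state=0,
)
mlp.fit(X, y)

ion_pred = 10.0 ** mlp.predict(X)  # undo the log scaling to recover currents
```

Training on the logarithm keeps the response variable in a range that the network can fit easily, and inverting the transform guarantees positive predicted currents.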

Performance results

To evaluate and compare the different machine learning approaches, we have considered two performance metrics:

  1. Coefficient of determination (R2): It measures the goodness of fit of a model. In the context of regression, it is a statistical measure of how well the regression line approximates the actual data and, therefore, of how well unseen examples are likely to be predicted by the model. If $\hat{y}_i$ is the predicted value of the i-th example and $y_i$ is the corresponding true value for a total of n examples, the estimated R2 is defined as: $$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} \tag{2}$$ where $\bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i$. The best possible score is 1, and the score can be negative.
  2. Root Mean Squared Error (RMSE): It is one of the most common evaluation metrics for regression models. It is the square root of the mean squared error (MSE): $$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2} \tag{3}$$
    The calculated value is in the same unit as the response variable. Lower values are better.
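Both metrics follow directly from their standard definitions, as the short sketch below shows. The sample values are made up for illustration, and the second call shows how R2 becomes negative when predictions are worse than simply predicting the mean.

```python
import numpy as np

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def rmse(y_true, y_pred):
    """Root mean squared error, in the same unit as the response variable."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

y_true = [1.0, 2.0, 3.0, 4.0]
good = [1.1, 1.9, 3.2, 3.8]
bad = [10.0, 10.0, 10.0, 10.0]

print(r2_score(y_true, good), rmse(y_true, good))   # close to 1, small error
print(r2_score(y_true, bad))                        # negative score
```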

Figs 7 and 8 show the performance of our trained MLP model by comparing predicted and actual values of the four figures of merit (Ioff, Ion, SS and Vth) for the 22 nm and 10 nm LER-affected devices, respectively. The predictions are accurate in all cases, including the extreme values. This demonstrates that it is possible to predict the behavior of devices affected by intermediate values of CL and RMS using as input of the MLP only information obtained from their extreme values. Table 4 summarizes the R2 and RMSE values achieved by our models. The performance metrics confirm that the predictions are excellent, reaching R2 values up to 0.9994. Note that the worst case, Ion for 22 nm devices, is still very good, with a coefficient of determination of 0.9868. RMSE is always very low, being at least two orders of magnitude lower than the actual values of the considered response variable. Table 2 lists the reference values of the figures of merit for the ideal GAA NW FETs.

Fig 7. Predicted and actual values for the considered figures of merit using our test dataset (LER-affected 22 nm gate length GAA NW FETs).

Fig 8. Predicted and actual values for the considered figures of merit using our test dataset (LER-affected 10 nm gate length GAA NW FETs).

Table 4. Performance metrics (R2 and RMSE) of our MLP-based regression models.

The above results were obtained using the complete training datasets to feed the MLP networks. Next we will evaluate the impact of the input data size on the training process. With this goal in mind, R2 and RMSE performance metrics were computed for MLP networks trained using only a fraction of the input dataset ranging from 0.1 (10% of the dataset) to 1 (complete dataset). Results when considering the 22 nm devices are displayed in Fig 9. It can be observed that even for small percentages of the input dataset, metrics for all the response variables are quite good. For instance, R2 values range from 0.9642 (Vth) to 0.9919 (Ioff) using only 10% of the training dataset. The response variable that benefits the most from the increase in the number of input examples is Vth (bottom right figure). On the other hand, as expected, RMSE tends to decrease when adding more examples to the training dataset. It is worth noting that the behavior for the 10 nm devices is very similar. Therefore, our approach is capable of successfully predicting the values of the figures of merit even using a reduced training dataset.

Fig 9. Impact of the training data size on the performance of the MLP models (LER-affected 22 nm gate length GAA NW FETs).

(a) Ioff. (b) Ion. (c) SS. (d) Vth.

Training our MLP models is extremely fast. In particular, it takes on average from 50.9 to 162.6 seconds, depending on the considered response variable and gate length. This means that, for example, computing Ion from our trained model for a particular 22 nm device is about 2,370× faster than using an MC simulation (80.3 hours, as explained in the Methodology section). Note that the times to generate the training data are not included in these experiments.

Comparison with other machine learning methods.

Many regression methods other than MLPs can be found in the literature. To demonstrate the benefits of our approach, a comparison with some of the most successful regression techniques was carried out. In particular:

  • Decision Tree (DT) regression [27]: It creates a tree-based structure that predicts the value of a target variable by learning simple decision rules inferred from the data features. A regression tree is built using a binary recursive partitioning process. Initially, all the training examples are grouped into the same partition. The algorithm then begins allocating the data into the first two partitions or branches, using every possible binary split on every feature. The algorithm selects the split that minimizes the sum of the squared deviations from the mean in the two separate partitions. This splitting rule is then applied to each of the new branches. To predict a response, the decisions in the tree should be followed from the root (beginning) node down to a leaf node. The leaf node contains the response.
  • Random Forest (RF) regression [28]: It is a supervised learning algorithm that uses ensemble learning methods for regression. In particular, it uses the combination of multiple random decision trees, each trained on a subset of data. The use of multiple trees gives stability to the algorithm and reduces variance.
  • Support Vector Machine (SVM) regression [29]: It is one of the most successful and well studied methods for regression. SVM regression is considered a nonparametric technique because it relies on kernel functions. One of the main advantages of SVM regression is that its computational complexity does not depend on the dimensionality of the input space.

Table 5 shows the performance metrics using different machine learning regression techniques for the 22 nm and 10 nm LER-affected devices. For all the response variables, the best performer is always our proposal based on MLP networks. Differences with respect to the other methods are noticeable. Regardless, RF obtains decent results for all the studied cases. We must highlight that RF training times are always higher than those observed for MLP. DT and SVM are faster to train, but their results are poor and very irregular.

Table 5. Performance metrics obtained by different machine learning techniques for regression.
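A comparison of this kind can be set up with Scikit-learn as sketched below. The synthetic data, the feature count and all hyperparameters are assumptions for illustration, so the resulting scores say nothing about the values reported in Table 5.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.neural_network import MLPRegressor
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

def target(X):
    # Synthetic smooth response (stand-in for a figure of merit)
    return np.tanh(X[:, :5].sum(axis=1)) + 0.1 * X[:, 5]

rng = np.random.default_rng(0)
X_train, X_test = rng.normal(size=(300, 40)), rng.normal(size=(100, 40))
y_train, y_test = target(X_train), target(X_test)

models = {
    "DT": DecisionTreeRegressor(random_state=0),
    "RF": RandomForestRegressor(n_estimators=50, random_state=0),
    "SVM": SVR(kernel="rbf"),
    "MLP": MLPRegressor(hidden_layer_sizes=(32, 16), activation="tanh",
                        solver="lbfgs", max_iter=500, random_state=0),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    results[name] = {
        "R2": r2_score(y_test, y_pred),
        "RMSE": float(np.sqrt(mean_squared_error(y_test, y_pred))),
    }
```

Evaluating every model on the same held-out test set, with the same metrics, is what makes the rows of such a table comparable.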

Transfer learning.

Machine learning methods, especially those related to (deep) neural networks, require big datasets to successfully train the models. There are scenarios where training data is expensive or difficult to collect. In our case, for example, a single MC simulation takes tens of hours on a standard server. This is where transfer learning comes in.

Transfer Learning (TL) refers to a technique in which a predictive model trained on a different but related problem is reused, partially or completely, to speed up training and/or improve a model’s performance on the problem of interest. In the context of neural networks, this means reusing the weights of one or more layers of a pre-trained network model in a new model, and then keeping those weights fixed, fine-tuning them or retraining them completely when training the new model.

Next, we apply a transfer learning approach to predict the figures of merit of the 10 nm devices using as a starting point the models trained for the 22 nm devices. This means that we retain the values of the trainable parameters from the previous model (22 nm devices) and use them as initial values instead of starting the training process from scratch. First, we want to demonstrate that the training process is faster this way, that is, that the number of iterations required to successfully train the networks is significantly lower. A comparison between training the networks from scratch and using the transfer learning approach is shown in Fig 10. The graphs display the evolution of R2 and RMSE for different numbers of training iterations. It can be observed that the transfer learning method works, since R2 and RMSE quickly reach values very close to the maximum and minimum, respectively. For instance, R2 and RMSE are 0.9986 and 4.812 × 10^−8 A when training the network from scratch to predict Ion for the 10 nm LER-affected devices (see Table 4). In that case, the number of iterations was 2,000. Using transfer learning, the corresponding values after only 100 iterations are 0.9981 and 5.635 × 10^−8 A, which are almost identical to our best ones (top right of Fig 10). Note that, if we train the MLP model without transfer learning instead, after 100 iterations R2 and RMSE would be 0.7098 and 6.934 × 10^−7 A, very far from our best results. Reducing the number of iterations needed to converge also has a big impact on the training times. Considering 100 iterations, the time required to train our MLP-based models to predict the response variables is reduced to less than 5 seconds. In other words, computing the figures of merit for the 10 nm devices when using transfer learning is about 57,800× and 1,000× faster than MC and DD simulations, respectively (see Methodology section).
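One way to reuse trained parameters in Scikit-learn is the `warm_start` option of `MLPRegressor`, which makes a new call to `fit` start from the current weights. Whether the original experiments used this mechanism is not stated, so the sketch below is an assumption, as are the synthetic source and target tasks, the layer sizes and the iteration limits.

```python
import copy
import numpy as np
from sklearn.metrics import r2_score
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X_src, X_tgt = rng.normal(size=(300, 50)), rng.normal(size=(300, 50))
y_src = np.tanh(X_src[:, :5].sum(axis=1))              # stand-in "22 nm" task
y_tgt = 0.8 * np.tanh(X_tgt[:, :5].sum(axis=1)) + 0.1  # related "10 nm" task

params = dict(hidden_layer_sizes=(16, 8), activation="tanh",
              solver="lbfgs", random_state=0)

# Source model trained with a generous iteration budget
source = MLPRegressor(max_iter=500, warm_start=True, **params)
source.fit(X_src, y_src)

# Transfer: copy the source model and continue training on the target data
transfer = copy.deepcopy(source)
transfer.set_params(max_iter=20)
transfer.fit(X_tgt, y_tgt)        # warm_start=True keeps the copied weights

# Baseline: same iteration budget, random initial weights
scratch = MLPRegressor(max_iter=20, **params)
scratch.fit(X_tgt, y_tgt)

r2_transfer = r2_score(y_tgt, transfer.predict(X_tgt))
r2_scratch = r2_score(y_tgt, scratch.predict(X_tgt))
```

Copying the source model before refitting leaves the original network untouched, so it can be reused as the starting point for other target devices.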

Fig 10. Performance metrics using a transfer learning approach and training the networks from scratch (i.e., without transfer learning) to predict the figures of merit of the 10 nm gate length devices.

(a) Ioff. (b) Ion. (c) SS. (d) Vth.

As noted above, another important advantage of transfer learning is the reduction of the input data required to train the models, which is especially relevant in cases where training data is costly or difficult to collect. Table 6 shows the regression performance metrics obtained by our transfer learning approach when using a small fraction of the training dataset to predict the response variables of the 10 nm devices. The results confirm the benefits of our methodology: good predictions are achieved even when using a small percentage of the training dataset. For example, R2 is always above 0.985 using 10% of the input examples (i.e. only 400 LER profiles).

Table 6. Performance metrics obtained by our transfer learning approach when using a small fraction of the training dataset to predict the figures of merit of the 10 nm gate length devices.

Therefore, we can conclude that transfer learning is a good solution to speed up the training process and also to reduce noticeably the required training dataset size. This technique could aid in the design of variability-resistant device architectures since it could allow quick and simple testing of the impact of different device features (e.g. gate length, cross-section dimensions) on an LER-affected transistor’s performance.

Impact on the environment: Carbon emissions.

As we pointed out, our approach noticeably reduces the computing time needed to calculate the figures of merit for a particular device. Next, we demonstrate that it also has a strong impact on the environment, reducing the carbon footprint. To estimate the carbon emissions we follow the methodology presented in Lacoste et al. [30]. In particular, the estimated carbon emissions in grams are derived using the following expression: $$\mathrm{CO_2eq} = \frac{t \cdot W_{cpu} \cdot C_e}{1000} \tag{4}$$ where t is the equivalent CPU-hours of computation, Ce = 341 is the carbon efficiency coefficient of the grid (measured in grams of CO2eq/kWh) and Wcpu is the Thermal Design Power of the CPU in watts (95 W in our case); the factor of 1000 converts watt-hours into kilowatt-hours. Note that the carbon efficiency data for our region was taken from Moro and Lonza [31]. We use the corresponding CPU-hours required to compute the figures of merit by means of simulations (DD and MC) and by training the MLP models (our approach).

Carbon emissions are shown in Table 7. We compare the calculation of the figures of merit for an LER-affected 10 nm device using simulations (DD and MC) and our proposal based on training MLP networks. It is important to highlight that, unlike in the simulation procedure, the carbon cost of the training process is paid only once, because the same trained network can be reused for different configurations (LER profiles) of devices with equal gate length. As a result, for instance, our proposal reduces the emissions from 2.6 kg to just 0.84 g of CO2 for the calculation of Ion when training the MLP network from scratch. Moreover, if 1,000 configurations are considered, the carbon emissions caused by the MC simulations increase up to 2.6 tons of CO2, while our method does not require additional training. If the transfer learning method is used instead, the carbon footprint is dramatically reduced to only 0.05 g of CO2.

Table 7. Carbon emissions in grams of CO2 to compute the figures of merit of only one LER-affected 10 nm gate length GAA NW FET using simulations and our MLP-based approach with and without transfer learning.
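The arithmetic behind these figures can be reproduced from the quantities given in the text (95 W, 341 g CO2eq/kWh, 80.3 hours per MC bias point). The function name is illustrative, and the 5 second training time used for the transfer learning case is the upper bound quoted earlier rather than a measured value.

```python
def carbon_emissions_g(cpu_hours, tdp_watts=95.0, grid_g_per_kwh=341.0):
    """Estimated emissions in grams of CO2eq for a given number of CPU-hours."""
    energy_kwh = cpu_hours * tdp_watts / 1000.0   # W·h converted to kWh
    return energy_kwh * grid_g_per_kwh

mc_single = carbon_emissions_g(80.3)              # one MC on-current simulation
mc_thousand = 1000 * mc_single                    # 1,000 LER configurations
tl_training = carbon_emissions_g(5.0 / 3600.0)    # about 5 s of training

print(f"MC, 1 device:      {mc_single / 1e3:.1f} kg")
print(f"MC, 1,000 devices: {mc_thousand / 1e6:.1f} t")
print(f"Transfer learning: {tl_training:.3f} g")
```

The same function applied to the training times of the MLP models gives the sub-gram values discussed above, since a few minutes of CPU time correspond to only a few watt-hours of energy.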


Conclusions

The digital world we live in would not have been possible without the continuous advance of the semiconductor industry. In this context, the use of advanced simulation tools (TCAD) to evaluate new semiconductor device architectures and assess their robustness is crucial for both the semiconductor industry and academic research. However, with current device critical dimensions deep into the nanometer regime, the computational cost of some TCAD studies can be prohibitive. Therefore, less computationally demanding methods are needed to deal with this problem. Here, we have demonstrated the advantages of using machine learning techniques to assess the effect of line edge roughness-induced variability on gate-all-around nanowire (GAA NW) FETs. The impact of LER on four different figures of merit (off-current, threshold voltage, sub-threshold slope and on-current) has been predicted for two different GAA NW FETs, a 22 nm gate length device and a scaled-down version with a 10 nm gate length. The MLP networks have achieved the best performance metrics (R2 and RMSE values) when compared to well-known regression methods (DT, RF and SVM), with R2 ∼ 0.99 for the two devices and the four analyzed figures of merit. We have also demonstrated that MLP networks can dramatically decrease the computational effort of variability studies, which can be reduced even further by using transfer learning techniques, achieving R2 > 0.985 when using only 10% of the input samples and producing as little as 0.18 g of CO2 emissions (when computing the four studied figures of merit), a value several orders of magnitude lower than that of TCAD studies. Finally, it is worth mentioning that the MLP architecture could also be applied (with an adequate calibration of the network hyperparameters and weights) to other relevant sources of variability affecting semiconductor devices, such as metal grain granularity, gate-edge roughness or random discrete dopants.


  1. 1. Iwai H. Impact of Micro-/Nano-Electronics, Miniaturization Limit, and Technology Development for the Next 10 Years and After. ECS Transactions. 2021;102(4):81–88.
  2. 2. IEEE. International Roadmap for Devices and Systems.; 2022.
  3. 3. Nagy D, Espiñeira G, Indalecio G, García-Loureiro AJ, Kalna K, Seoane N. Benchmarking of FinFET, Nanosheet, and Nanowire FET Architectures for Future Technology Nodes. IEEE Access. 2020;8:53196–53202.
  4. 4. Liddle JA, Gallatin GM. Nanomanufacturing: A Perspective. ACS Nano. 2016;10(3):2995–3014. pmid:26862780
  5. 5. Maiti CK. Introducing Technology Computer-Aided Design (TCAD). Fundamentals, Simulations, and Applications. Jenny Stanford Publishing; 2017.
  6. 6. IEEE. In: More than Moore. International Roadmap for Devices and Systems. White paper; 2022.
  7. 7. Reid D, Millar C, Roy G, Roy S, Asenov A. Analysis of Threshold Voltage Distribution Due to Random Dopants: A 100000-sample 3D Simulation Study. IEEE Transactions on Electron Devices. 2009;56(10):2255–2263.
  8. 8. Vasileska D, Goodnick SM. In: Computational Electronics: Semiclassical and Quantum Device Modeling and Simulation. CRC Press. Taylor & Francis; 2010.
  9. 9. Kim I, Park S, Jeong C, Shim M, Sin Kim D, Kim G-T, Seok J. Simulator acceleration and inverse design of fin field-effect transistors using machine learning. Scientific Reports. 2022;12(1):1140. pmid:35064166
  10. 10. Woo S, Jeong H, Choi J, Cho H, Kong J, Kim S. Machine-Learning-Based Compact Modeling for Sub-3-nm-Node Emerging Transistors. Electronics. 2022;11(17):2761.
  11. 11. Kao MY, Kam H, Hu C. Deep-Learning-Assisted Physics-Driven MOSFET Current-Voltage Modeling. IEEE Electron Device Letters. 2022;43(6):974–977.
  12. 12. Akbar C, Li Y, Sung WL. Transfer learning approach to analyzing the work function fluctuation of gate-all-around silicon nanofin field-effect transistors. Computers and Electrical Engineering. 2022;103:108392.
  13. 13. Carrillo-Nuñez H, Dimitrova N, Asenov A, Georgiev V. Machine Learning Approach for Predicting the Effect of Statistical Variability in Si Junctionless Nanowire Transistors. IEEE Electron Device Letters. 2019;40(9):1366–1369.
  14. 14. Akbar C, Li Y, Sung WL. Deep Learning Approach to Inverse Grain Pattern of Nanosized Metal Gate for Multichannel Gate-All-Around Silicon Nanosheet MOSFETs. IEEE Transactions on Semiconductor Manufacturing. 2021;34(4):513–520.
  15. 15. Butola R, Li Y, Kola SR. A Machine Learning Approach to Modeling Intrinsic Parameter Fluctuation of Gate-All-Around Si Nanosheet MOSFETs. IEEE Access. 2022;10:71356–71369.
  16. 16. Bangsaruntip S, Balakrishnan K, Cheng SL, Chang J, Brink M, Lauer I, et al. Density scaling with gate-all-around silicon nanowire MOSFETs for the 10 nm node and beyond. In: Proc. IEEE Electron Devices Meeting (IEDM); 2013. p. 20.2.1–20.2.4.
  17. 17. Seoane N, Nagy D, Indalecio G, Espiñeira G, Kalna K, García-Loureiro AJ. A Multi-Method Simulation Toolbox to Study Performance and Variability of Nanowire FETs. Materials. 2019;12(15):2391–2406. pmid:31357496
  18. 18. Seoane N, Kalna K, Cartoixà X, García-Loureiro A. Multilevel 3-D Device Simulation Approach Applied to Deeply Scaled Nanowire Field Effect Transistors. IEEE Transactions on Electron Devices. 2022;69(9):5276–5282.
  19. 19. Nagy D, Indalecio G, García-Loureiro AJ, Espiñeira G, Elmessary MA, Kalna K, et al. Drift-Diffusion versus Monte Carlo simulated ON-current variability in Nanowire FETs. IEEE Access. 2019;7:12790–12797.
  20. 20. Ortiz-Conde A, García-Sánchez FJ, Muci J, Terán Barrios A, Liou JJ, Ho CS. Revisiting MOSFET threshold voltage extraction methods. Microelectronics Reliability. 2013;53(1):90–104.
  21. 21. Indalecio G, Aldegunde M, Seoane N, Kalna K, García-Loureiro AJ. Statistical study of the influence of LER and MGG in SOI MOSFET. Semiconductor Science and Technology. 2014;29:045005.
  22. 22. Pouyanfar S, Sadiq S, Yan Y, Tian H, Tao Y, Reyes MP, et al. A Survey on Deep Learning: Algorithms, Techniques, and Applications. ACM Compututing Surveys. 2018;51(5).
  23. 23. Haykin S. Neural Networks and Learning Machines. Prentice Hall; 2009.
  24. 24. Seoane N, Fernandez JG, Kalna K, Comesaña E, García-Loureiro A. Simulations of Statistical Variability in n-Type FinFET, Nanowire, and Nanosheet FETs. IEEE Electron Device Letters. 2021;42(10):1416–1419.
  25. 25. Yu T, Wang R, Huang R, Chen J, Zhuge J, Wang Y. Investigation of Nanowire Line-Edge Roughness in Gate-All-Around Silicon Nanowire MOSFETs. IEEE Transactions on Electron Devices. 2010;57(11):2864–2871.
  26. 26. Lee J, Asenov P, Aldegunde M, Amoroso SM, Brown AR, Moroz V. A Worst-Case Analysis of Trap-Assisted Tunneling Leakage in DRAM Using a Machine Learning Approach. IEEE Electron Device Letters. 2021;42(2):156–159.
  27. 27. Loh WY. Fifty Years of Classification and Regression Trees. International Statistical Review. 2014;82(3):329–348.
  28. 28. Breiman L. Random forests. Machine learning. 2001;45:5–32.
  29. 29. Awad M, Khanna R. Efficient learning machines: theories, concepts, and applications for engineers and system designers. Springer Nature; 2015.
  30. 30. Lacoste A, Luccioni A, Schmidt V, Dandres T. Quantifying the carbon emissions of machine learning. arXiv preprint arXiv:191009700. 2019;.
  31. 31. Moro A, Lonza L. Electricity carbon intensity in European Member States: Impacts on GHG emissions of electric vehicles. Transportation Research Part D: Transport and Environment. 2018;64:5–14. pmid:30740029