Abstract
The widespread adoption of cloud computing necessitates privacy-preserving techniques that allow information to be processed without disclosure. This paper proposes a method to increase the accuracy and performance of privacy-preserving Convolutional Neural Networks with Homomorphic Encryption (CNN-HE) by Self-Learning Activation Functions (SLAF). SLAFs are polynomials with trainable coefficients updated during training, together with synaptic weights, for each polynomial independently to learn task-specific and CNN-specific features. We theoretically prove its feasibility to approximate any continuous activation function to the desired error as a function of the SLAF degree. Two CNN-HE models are proposed: CNN-HE-SLAF and CNN-HE-SLAF-R. In the first model, all activation functions are replaced by SLAFs, and CNN is trained to find weights and coefficients. In the second one, CNN is trained with the original activation, then weights are fixed, activation is substituted by SLAF, and CNN is shortly re-trained to adapt SLAF coefficients. We show that such self-learning can achieve the same accuracy 99.38% as a non-polynomial ReLU over non-homomorphic CNNs and lead to an increase in accuracy (99.21%) and higher performance (6.26 times faster) than the state-of-the-art CNN-HE CryptoNets on the MNIST optical character recognition benchmark dataset.
Citation: Pulido-Gaytan B, Tchernykh A (2024) Self-learning activation functions to increase accuracy of privacy-preserving Convolutional Neural Networks with homomorphic encryption. PLoS ONE 19(7): e0306420. https://doi.org/10.1371/journal.pone.0306420
Editor: Andrew J, Manipal Institute of Technology, Manipal Academy of Higher Education, INDIA
Received: August 15, 2023; Accepted: June 13, 2024; Published: July 22, 2024
Copyright: © 2024 Pulido-Gaytan, Tchernykh. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All relevant data are within the manuscript and its Supporting Information files. Source code is available in the GitHub repository (www.github.com/bernardopulido/CNN-HE-SLAF).
Funding: This work was supported by the Ministry of Science and Higher Education of the Russian Federation (Project 075-15-2022-294).
Competing interests: The authors have declared that no competing interests exist.
1. Introduction
Machine Learning (ML) solutions based on Neural Networks (NN) are extensively used in multiple domains due to the remarkable results achieved in image rendering [1], business decision-making [2], pharmaceutical [3], bioinformatics [4], cancer research [5], and drug discovery, among many others. However, NN modeling demands considerable infrastructure and computing power for calculations and training with big data in a reasonable amount of time [6].
Cloud computing provides the necessary infrastructure and services to implement, train, and deploy NN models [7, 8]. However, using NNs on third-party infrastructures reveals several data privacy issues during the pre-processing, processing, and post-processing [9]. Third-party infrastructure reduces resource costs and makes it easier to solve tasks but introduces privacy issues of sensitive information processing [10, 11]. The access of the NN to the raw data can create potential privacy risks because data are located in the cloud-shared infrastructure. In general, cloud environments provide cost-savings, flexibility, scalability, and ubiquitous computing resources, but they potentially sacrifice the privacy of information [12]. Their main constraints are insufficient protection of sensitive data processing and the inherently insecure nature of shared resources.
Modern encryption techniques ensure security and are considered the best option to protect stored data and data in transit from an unauthorized third party. However, a decryption process is necessary when the data must be processed or analyzed, which brings back the initial problem of data vulnerability. Efficient processing of sensitive data implies more than accurate prediction, analysis, or classification; it also requires adequate privacy handling.
Homomorphic Encryption (HE) schemes are solutions to address privacy problems by providing encrypted data computations to the client [13, 14]. Applications with HE provide security and confidentiality to users, even when they are executed on untrusted shared infrastructures. However, HE schemes have limitations concerning the arithmetic operations that can be executed efficiently. They are bound to solve real problems only using additions and multiplications.
Recent advances in HE focus on incorporating HE cryptosystems into NN models. However, redesigning neuron functions and developing HE-friendly non-linear operations like comparison, max/min, sign detection, etc. [15] are open problems.
Activation functions like Rectified Linear Unit (ReLU), Leaky ReLU, Parametric ReLU, Gaussian Error Linear Unit (GELU), Sigmoid, Tangent Hyperbolic (Tanh), Softmax, Swish, Binary Step, etc. are non-polynomial and use operations not supported by HE schemes. Therefore, the efficient implementation of cryptographically computable activation functions becomes imperative for developing a privacy-preserving NN with HE (NN-HE) [16].
Several polynomial approximation methods are used to support homomorphic computation in NN-HE, such as Taylor series, Least-squares, Chebyshev series, Newton, Composite polynomials, etc.
Another important aspect is to allow homomorphic computations in the domain of continuous values. The Cheon-Kim-Kim-Song (CKKS) scheme [17] is designed to perform approximate arithmetic over encrypted real numbers. While the CKKS scheme uses state-of-the-art solutions to support approximations, they are still inefficient in practice, and thus, efficient HE-compliant activation functions remain an open challenge [18].
Several limitations arise in reproducing the behavior of neurons using polynomial approximations in the homomorphic space [19]. The main goal is to control the trade-off between having a non-linear transformation, which is needed by the learning algorithm, and keeping the degree of the polynomials small to make HE feasible [20–22].
In this paper, we design an adaptive NN-HE with improved accuracy and performance. Optimization is supported by polynomial Self-Learning Activation Functions (SLAF). Trainable coefficients of polynomial approximations of NN activation functions are updated during the training together with synaptic weights at each neuron independently to learn problem-specific, dataset-specific, and NN-specific features.
Two Convolutional Neural Network (CNN) models are proposed: CNN-HE-SLAF and CNN-HE-SLAF-R. In the first model, all the activation functions are replaced with SLAFs, and CNN is trained to find weights and coefficients. In the second one, CNN is trained with the original activation function, then the found weights are fixed, activation functions are substituted by polynomials, and CNN is shortly re-trained to adapt SLAF coefficients at each polynomial.
We present two SLAF initialization approaches: SLAF(0) and SLAF(P). The SLAF(0) considers trainable polynomial coefficients initialized to zero. In SLAF(P), coefficients are initialized by values of known polynomial approximations.
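The idea of trainable polynomial coefficients and the two initializations can be illustrated with a minimal sketch. It is a toy analogue, not the training procedure of the paper: the coefficients of a single degree-3 polynomial are adapted by gradient descent to mimic ReLU on sampled pre-activations, whereas the paper trains them through the CNN loss. The sample grid, learning rate, step count, and the SLAF(P) starting values are illustrative assumptions.

```python
def relu(z):
    return z if z > 0 else 0.0

def slaf(z, coeffs):
    """Polynomial activation a0 + a1*z + ... + ad*z^d, evaluated by Horner's rule."""
    acc = 0.0
    for a in reversed(coeffs):
        acc = acc * z + a
    return acc

def mse(coeffs, samples):
    """Mean squared error between the polynomial and ReLU on the samples."""
    return sum((slaf(z, coeffs) - relu(z)) ** 2 for z in samples) / len(samples)

def train_slaf(coeffs, samples, lr=0.5, steps=1000):
    """Plain gradient descent on the MSE with respect to the coefficients."""
    coeffs = list(coeffs)
    for _ in range(steps):
        grads = [0.0] * len(coeffs)
        for z in samples:
            err = slaf(z, coeffs) - relu(z)
            for k in range(len(coeffs)):
                grads[k] += 2.0 * err * z**k / len(samples)
        coeffs = [a - lr * g for a, g in zip(coeffs, grads)]
    return coeffs

# Pre-activations assumed to lie in [-1, 1] (illustrative grid).
samples = [i / 50.0 for i in range(-50, 51)]

init_0 = [0.0, 0.0, 0.0, 0.0]   # SLAF(0): coefficients start at zero
init_p = [0.0, 0.5, 0.5, 0.0]   # SLAF(P): hypothetical known approximation

slaf_0 = train_slaf(init_0, samples)
slaf_p = train_slaf(init_p, samples)
```

In this toy setting both initializations reach the same least-squares fit, but SLAF(P) starts from a much lower error, which is the intuition behind initializing with a known approximation.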
CNN-HE-SLAF and CNN-HE-SLAF-R are non-interactive privacy-preserving models to classify encrypted data. For a detailed assessment of the capabilities of the SLAF approach, we conduct a comprehensive analysis over 1-convolutional (CNN1) and 2-convolutional (CNN2) NNs using both plaintext and ciphertext as inputs from the MNIST dataset.
These results show the remarkable capability of HE-SLAF to yield comparable accuracy to traditional activation functions, indicating that models with trainable polynomial activations have the same representative ability as their non-polynomial counterparts. HE-SLAF low-degree polynomials achieve the same accuracy as a non-polynomial ReLU activation function over non-homomorphic models in the plaintext space.
On the other hand, we show that NN-HE-SLAF models with trainable three-degree polynomial activations improve accuracy over known NN-HE solutions in the literature. The training of polynomial coefficients of each neuron independently allows the exploration of the space of all the activations to construct better homomorphically evaluated approximations.
The content of the paper is structured as follows. Section 2 introduces privacy-preserving techniques, delving into homomorphic encryption schemes. Section 3 presents the primitives of the CKKS scheme. Section 4 describes privacy-preserving neural networks with homomorphic encryption. Section 5 discusses state-of-the-art NN-HE models that polynomially approximate cryptographically non-computable activation functions. Section 6 presents related work. Section 7 introduces homomorphic self-learning activation functions. Section 8 provides a brief overview of the case study. Section 9 defines the experimental setup. The experimental results are presented in Section 10. Finally, we discuss our solutions and conclude in Section 11.
To facilitate understanding of the described ideas, we summarize the main HE, NN, and SLAF terminology in the Appendix. S1 Table shows the terms used in the paper, and S2 Table presents the main acronyms.
2. Privacy-preserving techniques
Privacy-preserving methods are emerging as a crucial consideration in cloud environments. Secure Multi-Party Computation (MPC) [23, 24], Federated Learning (FL) [25, 26], Differential Privacy (DP) [27], Functional Encryption (FE) [28], and Homomorphic Encryption (HE) are representative approaches in this domain.
2.1. Secure multi-party computation
Multi-Party Computation (MPC) encompasses several cryptographic algorithms and protocols that allow multiple mutually distrustful parties to calculate jointly a function over their inputs while keeping those inputs private: garbled circuits, secret sharing schemes, and oblivious transfer.
In MPC, regardless of dishonest behavior and information sharing among some colluded parties, they should not be able to learn more about the other participants’ inputs beyond what can be inferred from the computation result. MPC guarantees that even with such a collusion, the privacy and correctness of the collaborative computation are preserved.
Formally, n parties with their respective private data d1, d2,…,dn can jointly compute the public function f(d1, d2,…,dn) while keeping their inputs secret. The parties compute f on d1, d2,…,dn without compromising confidentiality, and no information about the parties’ secrets may be leaked during the process.
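A minimal sketch of this setting, assuming additive secret sharing and the simple function f(d1,…,dn) = d1+…+dn, is given below. The modulus PRIME and the helper names share and reconstruct are illustrative choices, not part of any cited protocol, and the sketch ignores malicious behavior and network communication.

```python
import secrets

PRIME = 2**61 - 1  # illustrative field modulus (a Mersenne prime)

def share(value, n_parties):
    """Split a value into n additive shares that sum to it modulo PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares

def reconstruct(shares):
    """Recombine additive shares."""
    return sum(shares) % PRIME

# Three parties hold private inputs d1, d2, d3.
inputs = [12, 30, 7]
n = len(inputs)

# Party i splits its input and sends the j-th share to party j.
distributed = [share(d, n) for d in inputs]

# Party j locally adds the shares it received; a single share is uniformly random.
partial_sums = [sum(distributed[i][j] for i in range(n)) % PRIME for j in range(n)]

# Publishing only the partial sums reveals f(d1, d2, d3) = d1 + d2 + d3.
result = reconstruct(partial_sums)
```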
Yao’s work [24] proposed the concept of secure two-party computation by inventing the garbled circuit. The subsequent extension to MPC is provided by Goldreich et al. [23]. Several methods were proposed to evaluate privacy-preserving Neural Network (NN) models using MPC protocols. However, these protocols imply a high communication overhead, making the latency of models a problem. Communication costs can be minimized by performing resource-intensive pre-computations, such as the NN non-linear activations, during an offline phase or using trusted execution environments [20].
2.2. Federated learning
Federated Learning (FL) is an approach to train NN models across multiple decentralized parties holding local data samples. It allows the model training without explicitly exchanging the data between parties, guaranteeing privacy while enabling collaborative learning.
The FL process involves training local models on each party, which are then aggregated into a comprehensive global model. The gradient updates, rather than the data itself, are shared across the network. Subsequently, a copy of the global model is disseminated to each party to repeat the process. This procedure iterates until the model performance meets the desired criteria.
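The iteration described above can be sketched as follows, assuming a one-parameter least-squares model y ≈ w·x and plain averaging of the local updates. The data, learning rate, and function names are hypothetical, and the aggregation is shown in the clear, without the MPC or HE protection a real deployment might add.

```python
def local_update(global_w, data, lr=0.1):
    """One gradient step on a party's private data; only the update leaves the party."""
    grad = sum(2.0 * (global_w * x - y) * x for x, y in data) / len(data)
    return -lr * grad

def federated_round(global_w, parties):
    """Aggregate the local updates into a new global model."""
    updates = [local_update(global_w, data) for data in parties]
    return global_w + sum(updates) / len(updates)

# Private datasets of three parties, all following y = 2x (illustrative values).
parties = [
    [(1.0, 2.0), (2.0, 4.0)],
    [(3.0, 6.0)],
    [(0.5, 1.0), (1.5, 3.0)],
]

w = 0.0
for _ in range(200):  # repeat until the model meets the desired criteria
    w = federated_round(w, parties)
```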
Although FL does not inherently provide cryptographic privacy guarantees [25], gradient aggregation can be secured using privacy-preserving techniques such as MPC and HE [26]. FL is a prominent technique in scenarios that require strict respect for data confidentiality, e.g., where data dissemination is restricted due to regulatory constraints or to protect individual privacy. The main FL drawbacks are the need for a trusted gradient aggregation manager and communication overhead.
2.3. Differential privacy
Differential Privacy (DP) is a mathematical framework that adds random noise to data to provide privacy guarantees for individual-level information while enabling accurate aggregate analysis. It quantifies the anonymization of sensitive data based on the amount of noise added.
A randomized function δ provides ϵ-differential privacy if, for all datasets d1 and d2 differing on at most one element, and all S⊆Range(δ) [27], it holds that
$$\Pr[\delta(d_1) \in S] \le \exp(\epsilon) \cdot \Pr[\delta(d_2) \in S] \tag{1}$$
where ϵ denotes the privacy level. Smaller ϵ values indicate higher privacy levels. Eq (1) implies that the probability of δ producing an output in the set S for d1 should be no more than exp(ϵ) times the probability of δ producing an output in S for d2.
Therefore, the computation output remains statistically indistinguishable, within parameter ϵ, when performed on any two datasets that differ by a single individual’s data. This statistical property makes it challenging for an adversary to extract specific information about any individual from the function output, thereby preserving individual-level privacy [29].
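One standard way to obtain such a guarantee for numeric queries is the Laplace mechanism, sketched below for a counting query. A count changes by at most 1 when a single record changes, so Laplace noise of scale 1/ϵ is added. The dataset, predicate, and function names are hypothetical.

```python
import random

def laplace_noise(scale, rng):
    """Laplace(0, scale) sample as the difference of two exponentials of mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def private_count(dataset, predicate, epsilon, rng):
    """Counting query released with Laplace noise calibrated to sensitivity 1."""
    true_count = sum(1 for row in dataset if predicate(row))
    sensitivity = 1.0  # one individual changes the count by at most 1
    return true_count + laplace_noise(sensitivity / epsilon, rng)

rng = random.Random(0)
ages = [23, 35, 41, 29, 52, 61, 38]
noisy = private_count(ages, lambda a: a >= 40, epsilon=0.5, rng=rng)
```

A smaller ϵ increases the noise scale, which is the accuracy cost of a higher privacy level.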
Within the scope of NN models, DP prevents the disclosure of individual information in training. The aim is to discourage attempts of data extraction from the trained model by controlling the impact of each sample on the parameters during training. FL can also employ DP to collaboratively train a shared model on decentralized data without revealing inherent information about data to other participants. However, the injected noise leads to a decrease in model accuracy. Furthermore, DP only offers a probabilistic bound on potential information leakage rather than a cryptographic privacy guarantee [30].
2.4. Functional encryption
Functional Encryption (FE) is a cryptographic scheme that supports a controlled functional evaluation of encrypted data without plaintext access. In FE, the result of function evaluation is directly revealed in the decryption process [28].
More precisely, in an FE system for a function f, a user generates a ciphertext c from a plaintext m using a public key pk. A third party, known as the decryptor, who possesses a decryption key dkf for f (issued by a trusted manager holding secret key sk) can compute f(m) by decrypting c. In other words, FE performs a specific f on c without revealing the underlying information m to the decryptor.
Incorporating FE schemes into privacy-preserving NN models implies that only the initial model layer is computed on encrypted data. In contrast, the subsequent hidden layers undergo evaluation utilizing plaintext data, potentially leaking the intermediate computation results and the model output to the decryptor entity.
2.5. Homomorphic encryption
In cryptography, the term Homomorphic Encryption (HE) describes a form of encryption capable of performing certain operations over ciphertexts without access to the secret key. The encrypted information can be made public without representing a risk of a data breach, and the output of the calculated functions remains encrypted. Its correctness relies on the homomorphism concept, a structure-preserving transformation where two groups in different spaces can be mapped. In this case, a homomorphic function applied to ciphertexts provides the same result (after decryption) as applying the function to the original unencrypted data.
Unlike MPC and DP privacy-preserving techniques, third-party systems with HE protect the entire data lifecycle (data storage, data transmission, and data processing) without needing trusted data managers and multiple rounds of client-server communication. HE enables blind, two-party non-interactive processing of sensitive data [13].
Let ma be the message a in plaintext, sk a secret key for decryption, and pk a public key for encryption. The corresponding ciphertext ca of ma is generated by the encryption operation ca = Encrypt(pk, ma).
In an additively homomorphic scheme, the information is recovered from a ciphertext c+ by the decryption operation with the secret key, m+ = Decrypt(sk, c+), where c+ = Add(ca, cb) contains the result of the homomorphic addition between ca and cb. Multiplication is analogous in a multiplicatively homomorphic scheme, with c× = Mult(ca, cb). The HE schemes obtain ciphertexts c+ and c× without knowing ma and mb. With conventional encryption, c+ and c× cannot be computed without decrypting ca and cb.
Each HE scheme defines a conventional public-key scheme with basic operations to generate secret and public keys, encrypt and decrypt ciphertexts, and perform homomorphic additions and multiplications [31].
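To make the KeyGen, Encrypt, Decrypt, and Add interface concrete, the following sketch uses the textbook Paillier cryptosystem, which is additively homomorphic. Paillier is not one of the schemes used in this paper and is chosen only because it fits in a few lines. The two Mersenne primes are toy parameters that offer no security.

```python
import math
import secrets

def keygen(p=2**31 - 1, q=2**61 - 1):
    """Toy Paillier key generation (insecure parameter sizes, illustration only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)                               # valid for generator g = n + 1
    return n, (n, lam, mu)                             # pk, sk

def encrypt(pk, m):
    """c = (1 + n)^m * r^n mod n^2 with fresh randomness r."""
    n = pk
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return ((1 + m * n) * pow(r, n, n2)) % n2

def decrypt(sk, c):
    """m = L(c^lam mod n^2) * mu mod n, with L(u) = (u - 1) / n."""
    n, lam, mu = sk
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n

def add(pk, ca, cb):
    """Homomorphic addition: multiplying ciphertexts adds the plaintexts."""
    return (ca * cb) % (pk * pk)

pk, sk = keygen()
c_a, c_b = encrypt(pk, 17), encrypt(pk, 25)
m_sum = decrypt(sk, add(pk, c_a, c_b))  # computed without decrypting c_a or c_b
```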
Several HE schemes are proposed. They include Brakerski-Gentry-Vaikuntanathan (BGV) [32], Brakerski/Fan-Vercauteren (BFV) [33, 34], Gentry-Sahai-Waters (GSW) [14], YASHE [35], Hoffstein-Pipher-Silverman (HPS) [36], López-Tromer-Vaikuntanathan (LTV) [37], and Cheon-Kim-Kim-Song (CKKS) [17] schemes. All of them pursue the implementation of more efficient HE schemes.
A distinctive feature of CKKS is the implementation of approximate arithmetic over encrypted real (complex) numbers, thereby allowing for privacy-preserving computations on sensitive data in the domain of continuous values. This capability of CKKS is crucial for complex and demanding applications, such as privacy-preserving NN [16, 38].
The following section describes the CKKS scheme primitives used throughout this manuscript as the underlying HE cryptosystem.
3. CKKS homomorphic encryption scheme
CKKS is a lattice-based HE scheme whose security is based on the hardness of the Ring Learning with Errors (RLWE) problem. The CKKS scheme performs approximate arithmetic over encrypted real (complex) numbers. Given messages m1 and m2, it securely computes encryptions of approximate values of m1+m2 and m1m2 with a predetermined precision. The main characteristic of CKKS is that it treats the inserted noise of the RLWE problem as part of an error occurring during approximate computation.
In CKKS, after the selection of parameters such as the degree N of the polynomial ring and a modulus q, real (complex) numbers are encoded into plaintext polynomials. The secret key is a randomly generated polynomial. The public key is created using the secret key, the chosen parameters, and some randomness. The plaintext polynomial is encrypted using the public key and noise to increase security. After operations, rescaling is performed to reduce noise and preserve the ciphertext. The ciphertext is decrypted using the secret key to obtain an approximated version of the original plaintext polynomial, which is decoded into the resulting real or complex number.
Let the polynomial ring for a power-of-two N be $R = \mathbb{Z}[X]/(X^N+1)$, and let $R_q = \mathbb{Z}_q[X]/(X^N+1)$ be the modulo-q residue ring of R. The polynomial coefficients of $R_q$ are bounded by the modulus q, and the degree of the polynomials is bounded through reduction modulo $X^N+1$. Let L be a level parameter that indicates the maximum multiplicative depth. The modulus $q = q_0 \cdot q_1 \cdots q_L$ is defined as the product of co-prime moduli $q_0, \ldots, q_L$, where $q_\ell = 2^{\ell} \cdot q_0$ for 1≤ℓ≤L.
The distribution χkey = HW(h) outputs a polynomial from $R_q$ with coefficients in {0, ±1} having h non-zero coefficients, where HW(h) denotes the set of signed binary vectors in $\{0, \pm 1\}^N$ whose Hamming weight is h∈ℤ+. χenc and χerr denote discrete Gaussian distributions with some predefined standard deviation. U($R_q$) refers to a uniform distribution over the ring $R_q$. Moreover, for a real number r, ⌊r⌉ returns the nearest integer, rounding upwards in case of a tie [17].
CKKS encodes real values using what is known as the canonical embedding σ, where a plaintext vector $z \in \mathbb{C}^{N/2}$ is transformed into $\sigma^{-1}(z)$ and then rounded to an integer-coefficient polynomial using a scaling factor Δ, i.e., $m = \lfloor \Delta \cdot \sigma^{-1}(z) \rceil \in R$. The parameter Δ affects the accuracy of the computation in CKKS.
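The encoding step can be sketched in plain Python for a toy ring degree. A vector of N/2 complex slots is mapped to a polynomial that takes those values at half of the complex roots of X^N+1 (the other half carries the conjugates), then scaled by Δ and rounded. The values N = 8 and Δ = 2^20, the ordering of the roots, and the function names are illustrative assumptions. Production libraries use a different slot ordering to support rotations.

```python
import cmath
import math

N = 8            # toy ring degree; N/2 = 4 slots
DELTA = 2**20    # scaling factor

# Half of the roots of X^N + 1: zeta_j = exp(i*pi*(2j+1)/N), j = 0..N/2-1.
ROOTS = [cmath.exp(1j * math.pi * (2 * j + 1) / N) for j in range(N // 2)]

def encode(z):
    """Map N/2 complex slots to an integer-coefficient polynomial of degree < N."""
    coeffs = []
    for k in range(N):
        # Inverse embedding; the conjugate roots contribute the complex
        # conjugate of this sum, hence twice its real part.
        s = sum(zj * rj.conjugate() ** k for zj, rj in zip(z, ROOTS))
        coeffs.append(math.floor(DELTA * 2.0 * s.real / N + 0.5))
    return coeffs

def decode(coeffs):
    """Evaluate the polynomial at the roots and remove the scaling factor."""
    return [sum(c * r**k for k, c in enumerate(coeffs)) / DELTA for r in ROOTS]

z = [1.5, -2.25 + 0.5j, 3.0, 0.125j]
m = encode(z)          # integer coefficients of the plaintext polynomial
z_back = decode(m)     # approximately equal to z
```

The round trip is only approximate: the rounding error shrinks as Δ grows, which is the sense in which Δ controls the accuracy of the computation.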
Let us discuss the main primitives of the CKKS scheme: KeyGen, Encrypt, Decrypt, Add, Mult, Rot, and Resc.
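- KeyGen(N, q, L)→sk, pk, ek: Sets the secret key as sk = (1, s), where s←χkey. Sets the public key as $pk = (b, a) \in R_{q_L}^2$, where $b = -as + e \pmod{q_L}$, $a \leftarrow U(R_{q_L})$, and e←χerr. The evaluation key is set as $ek = (b', a') \in R_{P \cdot q_L}^2$ for an auxiliary integer P, where $b' = -a's + e' + P s^2 \pmod{P \cdot q_L}$, $a' \leftarrow U(R_{P \cdot q_L})$, and e′←χerr.
- Encrypt(pk, z)→c: For a plaintext vector of real (complex) numbers $z \in \mathbb{C}^{N/2}$, it encodes $m = \lfloor \Delta \cdot \sigma^{-1}(z) \rceil$, and provides the ciphertext $c = v \cdot pk + (m + e_0, e_1) \pmod{q_L}$, where v←χenc and e0, e1←χerr.
- Decrypt(sk, c)→z′: For a ciphertext $c = (c_0, c_1) \in R_{q_\ell}^2$, decodes the message as $m' = c_0 + c_1 \cdot s \pmod{q_\ell}$, and outputs a plaintext vector $z' = \sigma(\Delta^{-1} \cdot m')$.
- Add(c1, c2)→c+: Adds two ciphertexts $c_1, c_2 \in R_{q_\ell}^2$. It returns ciphertext $c_+ = c_1 + c_2 \pmod{q_\ell}$.
- Mult(c1, c2)→c×: For two ciphertexts $c_1 = (b_1, a_1), c_2 = (b_2, a_2) \in R_{q_\ell}^2$, let $(d_0, d_1, d_2) = (b_1 b_2, a_1 b_2 + a_2 b_1, a_1 a_2) \pmod{q_\ell}$. It returns ciphertext $c_\times = (d_0, d_1) + \lfloor P^{-1} \cdot d_2 \cdot ek \rceil \pmod{q_\ell}$.
- Rot(c, r)→c′: Given an encryption c of $z = (z_0, \ldots, z_{N/2-1})$, outputs c′ that encrypts the vector $(z_r, \ldots, z_{N/2-1}, z_0, \ldots, z_{r-1})$, i.e., z left-rotated by r positions.
Since every encoded plaintext is scaled by Δ, the plaintext of c←Mult(c1, c2) is Δ∙m1m2, which results in an exponential growth of plaintexts. To deal with such a problem, CKKS introduces the so-called rescaling procedure:
- Resc(c)→c′: For a ciphertext $c \in R_{q_\ell}^2$, outputs $c' = \lfloor (q_{\ell-1}/q_\ell) \cdot c \rceil \pmod{q_{\ell-1}}$.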
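The role of rescaling can be seen on plaintext integers alone. In the sketch below, real numbers are encoded as integers with scaling factor Δ, so a product carries scale Δ² and one division by Δ with rounding brings it back to scale Δ. No encryption or modulus is involved, and Δ = 2^20 and the function names are illustrative assumptions.

```python
import math

DELTA = 2**20  # illustrative scaling factor

def encode(x):
    """Fixed-point encoding: nearest integer to x * DELTA, ties rounded upwards."""
    return math.floor(x * DELTA + 0.5)

def decode(m, scale=DELTA):
    return m / scale

def rescale(m):
    """Divide by DELTA and round to the nearest integer, ties rounded upwards."""
    return (m + DELTA // 2) // DELTA

m1, m2 = encode(3.14159), encode(2.71828)
product = m1 * m2             # plaintext product, now at scale DELTA**2
rescaled = rescale(product)   # back at scale DELTA, at the cost of a small rounding error
```

Without rescaling, each further multiplication would add roughly log2(Δ) bits to the plaintext, which is the growth the Resc primitive prevents.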
For more detailed information and additional considerations on the CKKS scheme, including the correctness and security analysis, refer to [17].
4. Privacy-preserving NN-HE
This section focuses on understanding the construction and evaluation of the privacy-preserving NN model with HE (NN-HE).
NN-HE is a natural extension of NN models. It consists of applying HE to the network inputs and homomorphically propagating the data across the network. NN-HE implementation has several restrictions because not all the functions for processing NN have homomorphic counterparts. So, the internal NN structure must be modified to perform secure processing. One way is to substitute non-homomorphic operations with appropriate approximations.
The main challenge in constructing NN-HE models is designing the efficient homomorphic processing of inner network functions. In the following sections, we discuss this complex problem, considering homomorphic neurons and homomorphic layers.
4.1. Homomorphic neuron
A homomorphic neuron involves several operations. Each neuron performs a weighted sum followed by a non-linear activation function f, defined by
$$y = f\left(\sum_{i=1}^{n} w_i x_i + b\right) \tag{2}$$
where n denotes the number of neuron inputs xi with weights wi, for i = 1…n, and b is the bias. The weighted sum consists of additions and multiplications between the inputs and their corresponding synaptic weights, enabling its computation over encrypted data.
The most important and still open problem in the NN-HE is the definition of adequate non-linear components. Standard activation functions, such as Rectified Linear Unit (ReLU), are not polynomials and use operations not supported by HE schemes. So, finding cryptographically compatible replacement functions is imperative to developing privacy-preserving NN-HE models [39].
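The sign function can be used to implement ReLU over fixed-precision numbers [40]. The ReLU function is denoted as
$$\mathrm{ReLU}(x) = \frac{x + x \cdot \mathrm{sign}(x)}{2} \tag{3}$$
where the sign(x) function is defined by
$$\mathrm{sign}(x) = \begin{cases} 1 & \text{if } x > 0 \\ 0 & \text{if } x = 0 \\ -1 & \text{if } x < 0 \end{cases} \tag{4}$$
An accurate polynomial approximation of sign(x) is enough to perform an HE-compliant ReLU operation [41]. The approximation is considered over the two intervals [−1, −ϵ]∪[ϵ, 1] because sign(x) is discontinuous at x = 0.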
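As a minimal sketch of this idea, sign(x) can be approximated on [−1, −ϵ]∪[ϵ, 1] by repeatedly composing the odd polynomial f(x) = (3x − x³)/2, one known building block of composite approximations, and ReLU then follows from the identity ReLU(x) = (x + x·sign(x))/2. The iteration count and the value ϵ = 0.05 used in the comments are illustrative assumptions. The code runs on plaintext floats and uses only additions and multiplications.

```python
def sign_approx(x, iterations=12):
    """Composite polynomial approximation of sign(x) for x in [-1, 1], x != 0."""
    for _ in range(iterations):
        x = (3.0 * x - x * x * x) / 2.0  # fixed points at -1, 0, and 1
    return x

def relu_approx(x, iterations=12):
    """ReLU(x) = (x + x * sign(x)) / 2 with the polynomial sign."""
    return (x + x * sign_approx(x, iterations)) / 2.0

# Inputs with |x| >= 0.05 are pushed towards -1 or +1 within 12 iterations.
approx_signs = [sign_approx(x) for x in (-1.0, -0.3, -0.05, 0.05, 0.3, 1.0)]
approx_relus = [relu_approx(x) for x in (-0.5, 0.5)]
```

Each iteration consumes multiplicative depth, so a smaller ϵ requires more iterations and therefore a larger level budget L.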
Therefore, the homomorphic neuron is defined by
$$c_y = \hat{f}\left(\sum_{i=1}^{n} w_i \cdot c_i + c_b\right) \tag{5}$$
where n denotes the number of encrypted inputs $c_i$ with weights $w_i$. Bias $c_b$ is a ciphertext, the activation function $\hat{f}$ is a polynomial approximation that only consists of homomorphic additions and multiplications, and $c_y$ is the encrypted output of the neuron. It guarantees the privacy of the result even if $c_y$ is disclosed.
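The structure of such a neuron can be mimicked without any encryption library. In the sketch below, the class Ct is a hypothetical stand-in for a ciphertext: it holds a plain value, supports only addition and multiplication, and tracks how many ciphertext-by-ciphertext multiplications lie on the longest path to the result. Treating multiplications by plaintext weights as depth-free is a simplifying assumption.

```python
class Ct:
    """Stand-in for a ciphertext: only + and * are available."""

    def __init__(self, value, depth=0):
        self.value = value
        self.depth = depth  # ciphertext-ciphertext multiplications used so far

    def __add__(self, other):
        if isinstance(other, Ct):
            return Ct(self.value + other.value, max(self.depth, other.depth))
        return Ct(self.value + other, self.depth)

    __radd__ = __add__

    def __mul__(self, other):
        if isinstance(other, Ct):
            return Ct(self.value * other.value, max(self.depth, other.depth) + 1)
        return Ct(self.value * other, self.depth)

    __rmul__ = __mul__

def neuron(inputs, weights, bias, coeffs):
    """Weighted sum plus bias, followed by a polynomial activation (Horner's rule)."""
    z = bias
    for c, w in zip(inputs, weights):
        z = z + w * c
    acc = coeffs[-1]
    for a in reversed(coeffs[:-1]):
        acc = acc * z + a
    return acc

# Square activation x^2, written as coefficients [a0, a1, a2].
out = neuron([Ct(1.0), Ct(2.0)], [0.5, -1.0], Ct(0.5), [0.0, 0.0, 1.0])
# A degree-3 activation needs one more level.
out3 = neuron([Ct(1.0), Ct(2.0)], [0.5, -1.0], Ct(0.5), [0.0, 0.5, 0.0, 0.25])
```

The depth counter shows why the polynomial degree matters: every additional level must be covered by the level parameter L of the scheme.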
Several studies have been conducted to implement the sign function efficiently in the CKKS scheme. These approaches approximate the sign function through Taylor series, least-squares, Newton-Raphson, Fourier series, Chebyshev polynomials, and composite polynomials, among others [39, 40, 42–51]. However, state-of-the-art solutions are still inefficient in practice, and thus, implementing an efficient homomorphically computable activation remains an open challenge [52].
The main goal is to control the trade-off between having a non-linear transformation needed by the learning algorithm and keeping the degree of the polynomials small to make HE parameters feasible. The polynomial degree is directly related to the desired error in a given interval. As the degree grows, a lower approximation error, and thus better precision, is guaranteed. However, a higher computational cost is required.
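The relation between degree and error can be observed numerically. The sketch below interpolates ReLU on [−1, 1] at Chebyshev nodes for several degrees and measures the maximum error on a grid. Chebyshev interpolation is one of several possible approximation methods, and the interval, degrees, and grid size are illustrative assumptions.

```python
import math

def relu(x):
    return x if x > 0 else 0.0

def chebyshev_coeffs(f, degree):
    """Coefficients of the Chebyshev interpolant of f on [-1, 1]."""
    n = degree + 1
    thetas = [math.pi * (j + 0.5) / n for j in range(n)]
    return [2.0 / n * sum(f(math.cos(t)) * math.cos(k * t) for t in thetas)
            for k in range(n)]

def chebyshev_eval(coeffs, x):
    """Evaluate c0/2 + sum_k c_k T_k(x) with T_k(x) = cos(k * arccos(x))."""
    t = math.acos(max(-1.0, min(1.0, x)))
    return coeffs[0] / 2.0 + sum(c * math.cos(k * t)
                                 for k, c in enumerate(coeffs) if k > 0)

def max_error(f, degree, grid_size=1001):
    """Largest absolute deviation from f on a uniform grid over [-1, 1]."""
    coeffs = chebyshev_coeffs(f, degree)
    xs = [-1.0 + 2.0 * i / (grid_size - 1) for i in range(grid_size)]
    return max(abs(chebyshev_eval(coeffs, x) - f(x)) for x in xs)

errors = {d: max_error(relu, d) for d in (2, 4, 8, 16)}
```

Under HE, the higher degrees would require more multiplications and a larger multiplicative depth, which is the cost side of the trade-off.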
4.2. Homomorphic NN layers
The architecture of the NN defines a set of neurons organized in layers, with connections between them. This section details the homomorphic processing of the standard layers in NN-HEs and their applicability to ciphertexts: fully-connected, convolutional, batch normalization, and pooling layers. The encrypted data adds complexity to the design of NN-HE architectures, which must meet the HE constraints.
A fully connected layer, also known as a dense layer, connects each neuron in the current layer to every neuron in the previous layer. It is the most basic and commonly used type of layer in NN architectures. The connection between two consecutive dense layers is represented as a complete graph where each edge has an associated weight. The operation in a neuron is a dot product between its inputs (outputs from the neurons of the previous layer) and their weights. This dot product consists of multiplication and addition operations, enabling its computation over HE-encrypted data. Dense layers enable complex interactions, providing ample opportunity to identify complex patterns and relationships among the data. However, without an effective regularization method, they lead to increased computational complexity and potential overfitting when NN tries to learn too many details in the training data, resulting in poor evaluation.
A convolutional layer is the fundamental building block of Convolutional Neural Network (CNN) models, designed to adaptively learn spatial hierarchies of features from input data [53]. It applies a set of trainable filters, also known as kernels, to the input data. Each filter is responsible for extracting a specific feature from data. The filter is moved across the input data with a specific stride, computing the dot product between the filter weights and input values for each spatial position. This process generates a feature map, also known as an activation map, representing the locations and strengths of the learned features within the input data.
Multiple filters can be applied, creating a stack of feature maps as output from the layer. This operation is especially effective for image processing, as it can capture spatial features like edges, corners, and textures while reducing complexity by sharing weights across space. As a result, a convolutional layer implies matrix multiplications and dot products, which are HE-compliant since they only consist of additions and multiplications. Hence, this layer does not need to be modified to process data homomorphically. The substitution of operations by their homomorphic versions is enough to use it.
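A minimal sketch of a single-filter convolution written only with additions and multiplications is shown below. It operates on plain Python numbers. In an NN-HE, each `+` and `*` would be replaced by its homomorphic counterpart, and the function name and the absence of padding are illustrative choices.

```python
def conv2d(image, kernel, stride=1):
    """Valid (no padding) 2D convolution using only additions and multiplications."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = (len(image) - kh) // stride + 1
    out_w = (len(image[0]) - kw) // stride + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            acc = 0
            for u in range(kh):
                for v in range(kw):
                    # Dot product between the filter weights and the input window.
                    acc = acc + kernel[u][v] * image[i * stride + u][j * stride + v]
            row.append(acc)
        feature_map.append(row)
    return feature_map

image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[1, 0],
          [0, -1]]
feature_map = conv2d(image, kernel)
```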
Several studies have been conducted to speed up convolutional NN-HEs by reducing the overall number of arithmetic operations through techniques such as ciphertext packing and Single Instruction Multiple Data (SIMD) [19] operations.
The pooling layer is a sampling layer that reduces the spatial dimensions of the input while retaining important features. It allows the subsequent layers to focus on higher-level representations and makes the model more efficient and invariant to specific spatial transformations.
The pooling operates on each feature map independently by moving a non-overlapping window across the data. There are two main pooling layers: max and average. Max pooling operation searches for the maximum value (strongest response) in the window, making it inefficient when dealing with encrypted data. In contrast, average pooling computes the average value within each region. As a result, average pooling is the solution to be implemented in NN-HE models [19, 47], where the division operation is substituted with multiplication by a constant.
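The substitution of the division can be sketched as follows: the window sum needs only additions, and the average is obtained by multiplying with the precomputed plaintext constant 1/k². The function names are illustrative, and the code runs on plain numbers rather than ciphertexts.

```python
def sum_pool(feature_map, k):
    """Sum over non-overlapping k x k windows (additions only)."""
    h, w = len(feature_map), len(feature_map[0])
    return [[sum(feature_map[i + u][j + v] for u in range(k) for v in range(k))
             for j in range(0, w - k + 1, k)]
            for i in range(0, h - k + 1, k)]

def average_pool(feature_map, k):
    """Average pooling with the division replaced by a constant multiplication."""
    inv = 1.0 / (k * k)  # plaintext constant, computed once
    return [[inv * s for s in row] for row in sum_pool(feature_map, k)]

fm = [[1, 2, 3, 4],
      [5, 6, 7, 8],
      [9, 10, 11, 12],
      [13, 14, 15, 16]]
sums = sum_pool(fm, 2)
averages = average_pool(fm, 2)
```

Skipping the final multiplication yields a scaled-up average that avoids one multiplication per output.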
The batch normalization (BN) layer stabilizes and accelerates the training process by addressing the internal covariate shift phenomenon caused by continuous changes in the distribution of each layer’s inputs during training. BN makes normalization a part of the model architecture and performs it for each training mini-batch. The BN layer forces the inputs to follow a normal distribution with zero mean and unit variance. In addition to accelerating training by allowing higher learning rates without the risk of divergence, it offers additional indirect benefits for NN-HE models. For instance, a BN layer incorporated before a polynomial activation encourages the pre-activation values to fit in the approximated interval, which reduces the overall approximation error, prevents the generalization of higher feature values, and indirectly provides smaller weights for homomorphic processing. It helps mitigate the potential overflow of values beyond the bounds of the plaintext space.
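At inference time, when the batch statistics are fixed, a BN layer reduces to an affine map a·x + b, so under HE it costs one multiplication by a constant and one addition. The sketch below shows this folding. It is a general observation about BN rather than a procedure described in this paper, and the parameter values are hypothetical.

```python
import math

def batch_norm(x, gamma, beta, mean, var, eps=1e-5):
    """Reference BN transform with fixed (inference-time) statistics."""
    return gamma * (x - mean) / math.sqrt(var + eps) + beta

def fold_batch_norm(gamma, beta, mean, var, eps=1e-5):
    """Return constants (a, b) such that BN(x) = a * x + b."""
    a = gamma / math.sqrt(var + eps)
    b = beta - a * mean
    return a, b

# Hypothetical learned parameters and running statistics.
a, b = fold_batch_norm(gamma=1.5, beta=0.2, mean=3.0, var=4.0)
x = 5.0
folded = a * x + b  # one constant multiplication and one addition
```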
5. State-of-the-art NN-HE models
This section presents the state-of-the-art privacy-preserving NN-HE models, highlighting their implementation details, latency (Lat), and accuracy (Acc).
Table 1 summarizes the essential characteristics of the NN-HEs. The “Layers” column corresponds to the number of convolutional, dense, activation, and pooling layers incorporated within each NN-HE architecture. GPU (Graphic Processing Unit) and 2-arch (2-architecture) columns denote specific aspects of the model’s operation. GPU indicates that the model takes advantage of hardware acceleration methodologies through GPUs. 2-architecture indicates the use of a dual-architecture strategy by collapsing adjacent linear layers during the evaluation process.
Dowlin et al. [54] propose CryptoNets to address the challenge of achieving a blind non-interactive classification. The approach uses the leveled YASHE [35] scheme for NN inputs and propagates signals across the network homomorphically.
CryptoNets contains two convolutional layers and two dense layers. It replaces the activation function with the square function, the lowest-degree non-linear polynomial function. Its performance is limited by the use of the square function as an activation function, by computational overhead, and by the insecure YASHE scheme [61]. Several subsequent works in the literature focus on overcoming these limitations.
Chabanne et al. [47] present the premises of a privacy-preserving Deep Neural Network (DNN) evaluated homomorphically. To improve CryptoNets’ performance, the authors polynomially approximate the ReLU activation function and use a BN layer before each activation layer; the normalization layer limits the need for an accurate approximation to a small part of ℝ around the neighborhood of zero point. However, although the approach indicates the use of the BGV HE scheme, it does not provide any results on the encrypted data.
Chou et al. [44] introduce pruning and quantization methods that leverage sparse representations in the underlying BFV cryptosystem to accelerate CryptoNets inference. Faster-CryptoNets iteratively removes and clusters the model parameters without affecting accuracy. They derive a quantized minimax approximation for standard activation functions that achieves maximally-sparse encodings and optimizes approximation error [62]. Experiments over MNIST show that the NN-HE model maintains competitive accuracy and achieves a significant speedup over previous methods. On CIFAR-10, the proposed model takes more than 6 hours and achieves an accuracy of 76.7%. Also, the authors propose a transfer learning approach with DP to address real-world tasks like medical imaging applications.
Bourse et al. [46] implement a Discretized Neural Network (DiNN) for inference over encrypted data, whose complexity is linear in the network size. Weights and inputs are discretized into elements in [−1,1] with a threshold value equal to 128: any pixel whose value is smaller than the threshold is mapped to −1, otherwise to +1. Although this method can homomorphically evaluate NNs of any depth, each neuron output is refreshed through the bootstrapping procedure featured by the Fully Homomorphic Encryption scheme over the Torus (TFHE), resulting in high overhead and low accuracy. They construct two simple FHE-DiNN models with one hidden (dense) layer containing 30 and 100 neurons with a security level of 80 bits.
Sanyal et al. [55] propose TAPAS, a privacy-preserving Binary Neural Network (BNN) with a TFHE scheme built upon the FHE-DiNN approach. The BNN evaluates every arithmetic operation as a composition of binary gates; it performs homomorphic multiplication by applying the logical operator XNOR and homomorphic addition by summing the number of 1s. TAPAS implements sparsification techniques and algorithmic tools to speed up and parallelize ciphertext computation. Because every operation involves many bootstrapping procedures, the model is inherently immune to noise growth and has no architectural restrictions. Nonetheless, the model is slower than any previous work: one prediction on MNIST takes 37 hours on a single machine and 2.41 hours using a cluster of 16 machines with 16 cores each.
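The XNOR-and-count arithmetic described above can be illustrated on plaintext bits. The following is a minimal sketch, not the TAPAS implementation: the helper names `encode_bits` and `xnor_popcount_dot` are hypothetical, and the encoding (+1 as bit 1, −1 as bit 0) is an assumption made for illustration.

```python
def encode_bits(values):
    """Map a +/-1 vector to bits: +1 -> 1, -1 -> 0 (assumed encoding)."""
    return [1 if v == 1 else 0 for v in values]


def xnor_popcount_dot(a_bits, b_bits):
    """Dot product of two +/-1 vectors computed from their bit encodings."""
    n = len(a_bits)
    # XNOR is 1 exactly where the two signs agree, i.e. where the product is +1.
    agreements = sum(1 - (a ^ b) for a, b in zip(a_bits, b_bits))
    # Each agreement contributes +1 and each disagreement contributes -1.
    return 2 * agreements - n


x = [1, -1, -1, 1, 1]
w = [1, 1, -1, -1, 1]
dot_bits = xnor_popcount_dot(encode_bits(x), encode_bits(w))
dot_ref = sum(xi * wi for xi, wi in zip(x, w))  # ordinary arithmetic, for comparison
```

In an encrypted setting each XNOR and each addition in the count would be a homomorphic gate, which is where the bootstrapping cost discussed above arises.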
Van Elsloo et al. [56] propose SEALion, an extensible framework built upon the CryptoNets approach for implementing privacy-preserving NN-HE models. SEALion performs an automatic encryption parameter selection for BFV, which side-steps complex implementation details for ad-hoc homomorphic solutions. The proposed approach improves both latency and encrypted inference by sparsifying the activation functions with the L0 norm.
Hesamifard et al. [42] develop an NN-HE based on the model architecture of [47, 54]. CryptoDL replaces standard non-linear activation functions with HE-friendly low-degree polynomial approximations within a specific error range. ReLU, Sigmoid, and Tangent Hyperbolic (Tanh) functions are approximated using Chebyshev and Taylor series. Additionally, the authors implement a scaled-up version of average pooling, which calculates the summation of values without dividing by the number of values. Building on this approach, Liao et al. [63] implement a CryptoDL-based NN-HE model over encrypted sensor data; the authors approximate Tanh, ReLU, and Swish functions to generate cryptographically computable activations.
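The scaled-up average pooling mentioned above amounts to summing each window and omitting the division. The following plaintext sketch assumes non-overlapping square windows; the function name `sum_pool2d` is hypothetical, and how the constant factor is handled afterwards is left open here.

```python
def sum_pool2d(image, size=2):
    """Scaled average pooling: sum each size x size window, with no division."""
    rows, cols = len(image), len(image[0])
    return [
        [
            sum(image[r + i][c + j] for i in range(size) for j in range(size))
            for c in range(0, cols - size + 1, size)
        ]
        for r in range(0, rows - size + 1, size)
    ]


image = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
pooled = sum_pool2d(image)  # every entry is 4 times the usual 2x2 average
```

Only additions are used, so the layer maps directly onto homomorphic addition.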
Brutzkus et al. [57] propose Low-Latency (Lo-La) CryptoNets to improve latency and memory usage over its predecessors. While CryptoNets encodes each image’s feature as a separate message, Lo-La encrypts entire layers as a single message with the BFV scheme and uses different matrix-vector multiplication implementations throughout the inference. These technical modifications allow the evaluation of the same model as CryptoNets in a couple of seconds and private inference over larger datasets such as CIFAR-10. However, the scheme still depends on the message dimension and is therefore impractical for applications with high-dimensional data. Additionally, they present the premises of using transfer learning to solve HE limitations, such as message size and noise growth.
Boemer et al. [58] introduce nGraph-HE, an extension of the Intel nGraph compiler to deploy NN models on homomorphically encrypted data. nGraph-HE incorporates a privacy-preserving abstraction layer, enabling HE-aware optimizations at compile- and run-time. The framework supports BFV and CKKS cryptosystems without bootstrapping. For experimental analysis, they use CKKS on fixed-precision numbers as the underlying encryption scheme over an Xeon Platinum 8180 platform with 112 CPUs and 376GB of RAM.
Jiang et al. [59] propose a method to perform arithmetic operations over homomorphically encrypted matrices. They introduce the E2DM framework to demonstrate the applicability of the proposed matrix computation approach for the secure evaluation of NN-HE models. They also considered packing multiple matrices (images) into a single ciphertext and computing on them in a SIMD manner, yielding better-amortized performance; their CryptoNets implementation on MNIST takes 1.69 seconds to compute ten likelihoods of 64 input images simultaneously. E2DM reports the same accuracy for plaintext and ciphertext inference, considering CKKS as the underlying cryptosystem with a security level of 80 bits; however, it remains unclear how a square activation function in an approximate homomorphic computation achieves a zero-precision loss.
Badawi et al. [19] present an efficient NN-HE for image classification on GPUs. The authors use a GPU-based BFV implementation as an underlying engine to perform encrypted computations. The heterogeneous GPU cluster consists of three Nvidia Tesla P100 cards with 3584 CUDA cores and one Nvidia V100 with 5120 cores. The MNIST-HCNN model is five layers deep: two sequential convolutional layers with their respective square activation function followed by one dense layer. For CIFAR-10, they provide an 11-layer network: three convolutional layers, each one followed by a square activation and HE-friendly pooling layer, and finally, two dense layers. The HCNN model significantly accelerates the classification process while maintaining security and accuracy; it classifies MNIST in 2% of the time CryptoNets takes.
Falcetta and Roveri [60] develop a privacy-preserving LeNet-1 model variant by incorporating some of the best practices in the NN-HE field, such as replacing non-HE-compliant activation functions with low-degree polynomials, max pooling with average pooling, etc. They highlight the security parameter selection and the approximation of non-linear layers as the primary considerations in constructing secure and accurate NN-HEs.
The main challenge toward deploying HE schemes in real-world applications lies in overcoming the high computational costs associated with these cryptosystems. Computing over ciphertexts with state-of-the-art schemes such as CKKS represents a slowdown of 4-6 orders of magnitude compared to performing the same computations on unencrypted data [64]. In recent years, several researchers have supported the evaluation of complex DNN models over encrypted data. We remark that the highest state-of-the-art DNN accuracy on MNIST and CIFAR-10 in the unencrypted domain is 99.79% and 96.53%, respectively.
Lee et al. [50] implement a privacy-preserving ResNet-20 model using the Residue Number System CKKS (RNS-CKKS) scheme with bootstrapping. It is based on the theoretical contribution presented in [65], where authors propose a precise polynomial approximation technique for the ReLU and max-pooling functions using a composition of minimax approximate polynomials of a small degree. Lee et al. [50] homomorphically evaluate the NN-HE with 383 CIFAR-10 images and plaintext model parameters. Nonetheless, the proposed model takes about 3 hours to infer one image due to the more than a thousand bootstrapping functions implemented and the high-degree minimax composite polynomials used for approximating ReLU and Softmax functions. Although such latency is high for practical use, it represents a significant step toward a DNN with FHE using bootstrapping.
Pure MPC and hybrid MPC-HE-based solutions, such as presented in [43, 66–73], are outside the scope of the current study and, therefore, are omitted. For instance, in an MPC-HE approach like [43], the client carries out the bootstrapping process directly.
Privacy-preserving NN models have come a long way from the first introduction to actual solutions. Several companies, such as IBM [74] and Microsoft [75], offer HE services and incorporate HE into their commercial privacy-aware solutions. However, there is still a long way to go and daunting challenges. We are observing a developing research area in constant growth, with many potential applications and significant privacy benefits.
Table 2 summarizes the polynomial approximation methods used in the state-of-the-art NN-HEs to address cryptographically non-computable activation functions: Square function, Taylor series, Least-squares, Chebyshev polynomials, Newton, and Composite polynomials, implemented in CryptoNets, Chabanne-NN, Faster-CryptoNets, SEALion, Liao-CNN, CryptoDL, Lo-La, nGraph-HE, E2DM, HCNN, LeNet-HE, RNS-CKKS-NN, and our CNN-HE-SLAF model.
6. Related work
In recent years, there has been growing interest in the research community in non-homomorphic activation functions that are trained during the learning process. In this section, we review the latest advances in this emerging field. First, we examine the taxonomy of trainable activations. Then, we highlight related works in the field of adaptive polynomial activation functions.
The idea behind a trainable activation is to search for an adequate function shape using knowledge from the training data [76]. There are three main adaptive activation families [77]: parametrized activations, ensemble-based activations, and trainable polynomial activations [78].
The first family refers to those parametrized activations such as PReLU [79], PELU [80], RePU [81], Swish [82], and Syncular [83], which are based on conventional activation functions and fine-tuned through the incorporation of one or more trainable parameters.
The second family includes activations based on ensemble methods [84–86]. These approaches comprise a collection of basic functions, which can consist of standard activations, trainable functions, or a combination of both, and a model that defines the linear combination of such a basis.
The third family refers to those polynomial activation functions with trainable coefficients, where the polynomial coefficients are learned together with the network parameters using the backpropagation learning algorithm and gradient descent optimization method.
The first and second families involve non-polynomial functions and use operations not supported by HE cryptosystems. Therefore, we focus on the third family of trainable activation functions, which consists of HE-friendly polynomials with coefficients that can be effectively learned and adjusted.
Hou et al. [78] propose a piecewise polynomial Smooth Adaptive Activation Function (SAAF) on non-homomorphic CNN models for regression tasks. The parameters of SAAF control the shape of activation functions, where these parameters are trained along with other NN model parameters. The authors argue that applying SAAFs in the regression (second-to-last) layer of a NN can significantly decrease the bias of the regression NN and avoid overfitting by simply regularizing the model. They report results over eight real-world datasets while incurring a minimal increase of less than 1% in the total number of model parameters.
Goyal et al. [87] introduce Self-Learning Activation Functions (SLAF) over non-homomorphic NN models. SLAFs are learned during training and can approximate most existing activation functions. SLAFs are a weighted sum of Taylor polynomials that can accurately approximate the optimal activation. According to the authors, non-homomorphic NN models with SLAF improve accuracy relative to equivalent architectures with conventional activation functions on standard benchmarking datasets.
In the third family, special attention is given to trainable rational activation functions. These functions are defined as the ratio of two polynomials P(x) and Q(x), both with trainable coefficients and respective degrees m and n. Rational functions are better suited than polynomials to approximate ReLU [88]. They demonstrate competitive accuracy on non-homomorphic NN models for image classification [89–91]. Additionally, trainable Padé activations can approximate standard hand-designed activations and learn new activations with compact representations [89]. Nevertheless, their inherent inverse operation makes them infeasible for encrypted computation.
Several studies indicate that NNs with polynomial activations have the same representational power as their non-polynomial counterparts [92, 93]. Additionally, these polynomial activations enhance latency without compromising accuracy [62].
While multiple works in the NN-HE field have explored using polynomial activation functions to enable an efficient homomorphic classification (see Table 2), trainable polynomial activation functions remain unexplored. We propose the privacy-preserving SLAF-based NN-HE (NN-HE-SLAF) model, introducing a novel approach in the encrypted classification domain by considering self-learning polynomial approximations to generate cryptographically computable activation functions. We use a customized polynomial with trainable coefficients as an activation function. In contrast to the traditional use of a single fixed-interval approximation of a specific activation function for all neurons, NN-HE-SLAF models generate customized polynomials for each homomorphic neuron independently, yielding better accuracy and performance.
7. Self-learning activation functions
In this section, we present SLAF models for designing a privacy-preserving NN-HE. Given the definition of a homomorphic neuron of the NN-HE model by Eq (5), we approximate the non-linear activation function by a polynomial with trainable coefficients at each neuron independently as
$$p^{k}(x) = \sum_{i=0}^{n} a_{i}^{k} x^{i}, \qquad (6)$$
where $a_{0}^{k}, a_{1}^{k}, \ldots, a_{n}^{k}$ denote the trainable coefficients of the degree-$n$ polynomial $p^{k}$ at neuron $k$.
The central concept of the SLAF training process is to find an adequate mapping from the input to the output space by modifying the NN weights and the polynomial coefficients in each neuron separately. In this way, NNs can learn and generalize more information from the training examples. The process is analogous to how the human brain changes with a person’s experience.
In addition to synaptic weights, the training process aims to find the SLAF polynomial coefficients. Therefore, NN-HE-SLAF activations are adapted to the specific problem, structure of the network, its parameters, and the dataset.
Fig 1 shows the general structure of a homomorphic neuron with SLAF.
We study two approaches for polynomial initialization: SLAF(0) and SLAF(P). SLAF(0) considers n+1 trainable polynomial coefficients a0, a1,…,an initialized to zero. In the SLAF(P) approach, the coefficients are initialized with values obtained from known approximation methods. While SLAF(0) starts searching from the “zero point” of the solution space, SLAF(P) explores a search space close to the best approximation solution.
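The two initialization strategies and the per-neuron evaluation can be sketched in plain Python as follows. This is an illustrative sketch rather than the implementation used in this study: the helpers `init_slaf` and `slaf_forward` are hypothetical names, and the preset coefficients are arbitrary placeholders standing in for a known approximation.

```python
def init_slaf(num_neurons, degree, preset=None):
    """Return one coefficient list [a_0, ..., a_n] per neuron.

    preset=None gives SLAF(0) (all zeros); otherwise every neuron starts
    from its own copy of the preset coefficients, as in SLAF(P).
    """
    if preset is None:
        preset = [0.0] * (degree + 1)
    if len(preset) != degree + 1:
        raise ValueError("preset must hold degree + 1 coefficients")
    return [list(preset) for _ in range(num_neurons)]


def slaf_forward(pre_activations, coeffs):
    """Apply neuron k's own polynomial to its pre-activation z_k (Horner form)."""
    outputs = []
    for z, a in zip(pre_activations, coeffs):
        y = 0.0
        for a_i in reversed(a):
            y = y * z + a_i
        outputs.append(y)
    return outputs


z = [0.5, -1.0, 2.0]
slaf0 = init_slaf(num_neurons=3, degree=3)
# Placeholder preset (an assumption, not a value taken from this study).
slafp = init_slaf(num_neurons=3, degree=3, preset=[0.0, 0.5, 0.25, 0.0])
slafp[1][0] = 1.0  # training may move each neuron's coefficients independently
out0 = slaf_forward(z, slaf0)
outp = slaf_forward(z, slafp)
```

Because each neuron holds its own copy of the coefficients, changing the coefficients of one neuron leaves the others untouched, which is the property that distinguishes SLAF from a single shared polynomial.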
Let us consider an example of SLAF(P) whose initial polynomial is the degree-nine approximation $f_c$ of the sign(x) function obtained by the composition of minimax approximate polynomials [21].
Fig 2 shows the approximation of the sign(x) function given by
$$f_c(x) = \sum_{i=0}^{n} \gamma_{i} x^{i}, \quad n = 9. \qquad (7)$$
The SLAF(P) polynomial is initialized with the ten coefficients γ0 = 0, γ1 = 3.114, γ2 = 0, γ3 = −6.645, γ4 = 0, γ5 = 8.851, γ6 = 0, γ7 = −5.485, γ8 = 0, and γ9 = 1.169. Training yields new polynomial coefficients, that is
$$p^{k}(x) = \sum_{i=0}^{n} \gamma_{i}^{k} x^{i}, \qquad (8)$$
where $\gamma_{0}^{k}, \gamma_{1}^{k}, \ldots, \gamma_{n}^{k}$ denote the trained SLAF(P) coefficients at neuron $k$. All neurons initially share identical γ0, γ1,…,γn, but after training each neuron $k$ has its own coefficients $\gamma_{i}^{k}$.
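As a quick numerical check of the initial coefficients listed above, the polynomial can be evaluated at a few points. The sketch below uses only the ten published values; the helper name `poly_eval` and the sample points are illustrative choices.

```python
# Initial SLAF(P) coefficients gamma_0 ... gamma_9 as listed in the text.
GAMMA = [0.0, 3.114, 0.0, -6.645, 0.0, 8.851, 0.0, -5.485, 0.0, 1.169]


def poly_eval(coeffs, x):
    """Evaluate sum_i coeffs[i] * x**i with Horner's rule."""
    y = 0.0
    for c in reversed(coeffs):
        y = y * x + c
    return y


samples = [-1.0, -0.5, 0.5, 1.0]
values = [poly_eval(GAMMA, x) for x in samples]
```

Only odd powers carry non-zero coefficients, so the polynomial is odd, as expected for an approximation of sign(x); at x = 1 it evaluates to about 1.004.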
Considering that an NN mimics the behavior of the biological brain, SLAF refers to the physiological capacity of the human brain called plasticity. This ability allows the brain to adapt to new learning methods by conditioning and remodeling how neurons connect. Our NN-HE-SLAF models not only enable the identification of underlying relationships in the information but also facilitate the determination of specific activation functions for a given dataset that guide the learning.
To better understand the capabilities of SLAF, we prove its feasibility in approximating a continuous activation function.
Theorem 1. An NN-HE-SLAF can approximate any continuous activation function f on a bounded input domain to any desired error, where the achievable error is a function of the SLAF degree.
Proof
On the one hand, the Universal approximation theorem, also known as the Hornik theorem, establishes that every continuous function can be approximated by an NN with at least a single hidden layer with arbitrary precision with respect to an Lp-norm. Since a polynomial is a continuous function, it can be approximated by an NN.
On the other hand, the Stone-Weierstrass theorem establishes that every continuous function can be approximated by a polynomial in a bounded interval. Specifically, for any continuous and real-valued function f(x) defined on the interval [a, b], for every δ>0, there exists a polynomial p(x) such that for all x∈[a, b], we have
$$|f(x) - p(x)| < \delta. \qquad (9)$$
Since the input domain is restricted, the bounded property holds even if the activation function is unbounded, e.g., ReLU, PELU, SeLU, ELU, etc. Moreover, if f(x) is not a polynomial, the degree of p(x) tends to infinity as δ approaches zero.
From the Weierstrass theorem, it follows that continuous activation functions, such as ReLU, Swish, etc., can be approximated by a polynomial. Then, such activations can be approximated by an NN-HE-SLAF with at least a single dense layer. The theorem is proved.
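The dependence of the error on the polynomial degree can be observed numerically. The sketch below is an illustration and not part of the proof: it uses Bernstein polynomials, a classical constructive route to the Weierstrass theorem that is swapped in here for convenience and is not the approximation method used in this study. The function names and the evaluation grid are assumptions.

```python
from math import comb


def relu(x):
    return x if x > 0.0 else 0.0


def bernstein(f, n, x, a=-1.0, b=1.0):
    """Degree-n Bernstein polynomial of f on [a, b], evaluated at x."""
    t = (x - a) / (b - a)  # map [a, b] onto [0, 1]
    return sum(
        f(a + (b - a) * k / n) * comb(n, k) * t**k * (1.0 - t) ** (n - k)
        for k in range(n + 1)
    )


def max_error(f, n, points=201):
    """Largest |f(x) - B_n f(x)| over an evenly spaced grid on [-1, 1]."""
    grid = [-1.0 + 2.0 * i / (points - 1) for i in range(points)]
    return max(abs(f(x) - bernstein(f, n, x)) for x in grid)


# Maximum error of the ReLU approximation for increasing degree.
errors = {n: max_error(relu, n) for n in (4, 16, 64)}
```

The maximum error shrinks as the degree grows, in line with the statement that the degree of p(x) must increase as δ approaches zero. Bernstein polynomials converge slowly, so the degrees shown here are far higher than those needed by minimax or trained approximations.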
8. Case study
This section presents details of the proposed models. To provide a fair comparison, we implement NNs with a similar architecture to the ones used in state-of-the-art models, such as CryptoNets [54], Faster-CryptoNets [44], and HCNN [19] (see Table 1).
We study two SLAF models based on CNN: CNN-HE-SLAF with a single training phase and CNN-HE-SLAF-R with training and re-training.
In the first model, we replace all activation functions with three-degree SLAF(0) or SLAF(P) and train CNN to find weights and coefficients. We denote these models as CNN-SLAF(0) and CNN-SLAF(P).
In the second model, we train the CNN with the original non-homomorphic activation functions, lock the weights, substitute the activation functions with the polynomials SLAF(0) and SLAF(P), and (R)e-train the CNN to adapt the coefficients. We denote these models as CNN-SLAF(0)-R and CNN-SLAF(P)-R, respectively.
Fig 3 illustrates the training and testing procedures for the proposed privacy-preserving CNN-HE-SLAF, CNN-HE-SLAF(0)-R, and CNN-HE-SLAF(P)-R models.
We study two architectures of CNNs: 1-convolutional (CNN1) and 2-convolutional (CNN2). Figs 4 and 5 illustrate their structures.
In these figures, yellow elements denote convolutional layers, circular elements after Conv1 and FC1 denote activation functions, and blue elements represent fully connected (dense) layers. Red elements, where present, denote average pooling layers.
CNN1, with a single convolutional layer and two fully connected layers, is a variant of Lo-La small NN [57]. In contrast to the Lo-La approach, CNN1 incorporates approximated activations after the convolutional and the first fully connected layers. CNN1 details are presented in Table 3.
CNN2 is a CryptoNets-based network architecture with two convolutional layers. To provide a direct comparison with state-of-the-art solutions, we adopt the architecture proposed in [44], which in turn is a variant of CryptoNets, CryptoDL, Lo-La, HCNN, SEALion, and others. It incorporates a BN layer before each activation to encourage the activation inputs to fit in the approximated interval.
The BN layer normalizes the activation inputs to zero mean and unit variance, which reduces the overall approximation error, limits the occurrence of large feature values, and indirectly provides smaller weights for homomorphic processing. The details of such an architecture are presented in Table 4.
For CNN1 and CNN2 with SLAF(0)-R and SLAF(P)-R, we use the original ReLU function in the first training. Then, we lock trained weights, replace ReLU activation functions with SLAF(0) or SLAF(P) approximations, and re-train CNN to learn customized polynomial approximation coefficients. For re-training, we choose considerably fewer epochs than for training (see Section 9.3: Developing tools).
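The re-training phase can be illustrated on a single neuron. The following toy sketch makes several simplifying assumptions that are not taken from this study: the locked weights `w` and `b` are arbitrary scalars, the loss is the mean squared error against the ReLU output rather than the task loss, and plain gradient descent replaces the optimizers used in the experiments.

```python
def relu(x):
    return x if x > 0.0 else 0.0


def poly(coeffs, z):
    """Evaluate the SLAF polynomial with Horner's rule."""
    y = 0.0
    for c in reversed(coeffs):
        y = y * z + c
    return y


def mse(coeffs, zs, targets):
    return sum((poly(coeffs, z) - t) ** 2 for z, t in zip(zs, targets)) / len(zs)


# Phase 1 result (assumed given): weights learned with ReLU, now locked.
w, b = 0.8, 0.1
frozen = (w, b)

xs = [-1.0 + 2.0 * i / 100 for i in range(101)]
zs = [w * x + b for x in xs]      # pre-activations of one neuron
targets = [relu(z) for z in zs]   # outputs the locked network was trained with

# Phase 2: replace ReLU by a degree-3 SLAF(0) and update only its coefficients.
coeffs = [0.0, 0.0, 0.0, 0.0]
initial_loss = mse(coeffs, zs, targets)
lr = 0.1
for _ in range(2000):
    grads = [0.0] * len(coeffs)
    for z, t in zip(zs, targets):
        err = poly(coeffs, z) - t
        for i in range(len(coeffs)):
            grads[i] += 2.0 * err * z**i / len(zs)  # d(MSE)/d(a_i)
    coeffs = [c - lr * g for c, g in zip(coeffs, grads)]
final_loss = mse(coeffs, zs, targets)
```

Only `coeffs` changes during the loop; `w` and `b` are never updated, which mirrors the locking of the trained weights. Since far fewer parameters are trained in this phase, a short re-training is plausible, although the toy loop above says nothing about the number of epochs needed on a real network.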
Several state-of-the-art models, as presented in [40, 50, 65], have proposed high-degree approximations to obtain an accuracy similar to ReLU. They state that a precise approximation of the ReLU function is necessary to evaluate models applied to plaintext or encrypted inputs.
To construct a better ReLU approximation for each neuron independently, we collect data during the first training to find the intervals of activation inputs. Then, we generate a polynomial approximation of the sign(x) function on the symmetric intervals [−B, B] found, where B denotes the supremum norm of the activation inputs of each neuron. Since ReLU(x) is equivalent to $(x + |x|)/2$ and |x| = x∙sign(x), we approximate
$$\mathrm{ReLU}(x) \approx \frac{x\,(1 + p(x))}{2},$$
where $p(x)$ is the polynomial approximation of sign(x) (see Fig 6). A large B has the advantage of enlarging the input range of the polynomial activation function at the cost of a higher approximation error.
For approximation methods such as Chebyshev polynomials, which are orthogonal in [−1, 1], the interval is extended from [−1, 1] to [−B, B] by evaluating p(x/B) for B>1. The same scaling applies to Newton-Schulz, which approximates the ±1 roots of $1 - 1/x^{2}$ by the iterative computation of
$$x_{i+1} = \frac{x_{i}\,(3 - x_{i}^{2})}{2},$$
converging to sign(x) in [−1, 1].
Fig 6 shows the polynomial approximation of the sign function by a Fourier sequence (Fig 6A) and its respective ReLU approximation based on the equivalence (Fig 6B).
We can see that this simple transformation using sign(x) approximation allows better ReLU approximation.
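The sign(x)→ReLU(x) transformation and the rescaling to [−B, B] can be sketched with the Newton-Schulz iteration, which uses only additions and multiplications. The sketch is illustrative: the function names, the bound B = 4, and the number of iterations are assumptions, and the iterated polynomial has a much higher degree than the three-degree SLAFs used in our models.

```python
def sign_newton_schulz(x, bound, iterations=6):
    """Polynomial approximation of sign(x) for |x| <= bound."""
    y = x / bound  # rescale [-B, B] to [-1, 1]
    for _ in range(iterations):
        y = y * (3.0 - y * y) / 2.0  # additions and multiplications only
    return y


def relu_from_sign(x, bound, iterations=6):
    """ReLU(x) = (x + |x|) / 2 with |x| = x * sign(x)."""
    return x * (1.0 + sign_newton_schulz(x, bound, iterations)) / 2.0


B = 4.0  # assumed bound on the activation inputs of one neuron
approx_pos = relu_from_sign(2.0, B)
approx_neg = relu_from_sign(-2.0, B)
approx_small = relu_from_sign(0.1, B)  # inputs near zero converge slowly
```

Each iteration triples the polynomial degree, so the number of iterations trades accuracy against multiplicative depth. A larger B pushes the rescaled inputs x/B toward zero, where the iteration converges slowly, which is one way to see why a large B increases the approximation error.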
9. Experimental setup
This section describes the evaluation method and dataset characteristics and defines the experimental setup.
9.1. Security settings
The ciphertext size, scheme performance, multiplicative depth, and security level depend on the security parameter settings. We adopt the security settings specified in the HE standard [94]. Table 5 shows such security settings for the CKKS scheme. The security level λ = 128 bits guarantees that an adversary needs to perform $2^{128}$ elementary operations to break the scheme with probability one.
9.2. Dataset
The Modified National Institute of Standards and Technology (MNIST) database is a standard dataset widely used in the literature [95]. It consists of 60,000 grayscale images of handwritten digits. Each image is a 28×28 pixel array, where the value of each pixel is an integer in the range [0, 255]. The MNIST training set includes 50,000 examples. The remaining 10,000 images represent the testing set. While MNIST is arguably a simple dataset, it has remained the standard benchmark for homomorphic inference tasks [42, 65].
9.3. Developing tools
The HE schemes, homomorphic operations, and NN-HE models are implemented using PyTorch [96] and the open-source Simple Encrypted Arithmetic Library (SEAL) v3.5.6 [97] through the Python TenSEAL library [98].
The experimental evaluation is performed on a server Express x3650 M4 with Intel(R) Xeon(R) CPU E5-2650v2 95W at 2.6GHz, 64 GB. The 64-bit server OS is Ubuntu 18.04.6.
To understand the generalization capacity of the proposed solutions and avoid overfitting, we vary the number of training epochs in the range of 2-50 and use a batch size of 64 and a cross-entropy loss function. The networks are trained with Stochastic Gradient Descent (SGD) with a momentum of 0.9. Training for 30 epochs yields the best testing accuracy for both CNN1 and CNN2.
For re-training, we consider from 2 to 20 epochs. For CNN-SLAF(0)-R with the Adam optimizer and a learning rate of 0.001, 5 epochs provide the best testing accuracy. For CNN-SLAF(P)-R with SGD, 12 epochs provide the best testing accuracy.
The learning rate is the most important hyper-parameter to tune for training NN models [99]. We apply the 1-cycle policy [100], which is associated with the super-convergence phenomenon. It uses one round of an increasing and then decreasing learning rate, in which the maximum learning rate serves as a regularizer. An overly simple activation may not provide a good approximation, whereas a highly flexible activation can lead to overfitting. Since SLAF includes a weighted sum of monomials with trainable weights, employing an effective coefficient regularization strategy is necessary to mitigate the risk of overfitting [78, 87]. The 1-cycle policy enables learning a flexible and accurate activation without increasing the risk of overfitting.
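The shape of a single learning-rate cycle can be sketched as follows. This is a simplified, hand-written schedule with linear warm-up and linear decay; the function name `one_cycle_lr` and the values of `max_lr`, `div_factor`, and `pct_up` are assumptions and not the settings used in our experiments. PyTorch provides a scheduler for this policy (`torch.optim.lr_scheduler.OneCycleLR`), whose exact annealing differs from this sketch.

```python
def one_cycle_lr(step, total_steps, max_lr, div_factor=25.0, pct_up=0.3):
    """Learning rate at `step` for one linear warm-up / linear decay cycle."""
    base_lr = max_lr / div_factor
    peak = int(total_steps * pct_up)  # step at which max_lr is reached
    if step <= peak:
        # Warm-up: base_lr -> max_lr.
        return base_lr + (max_lr - base_lr) * step / peak
    # Decay: max_lr -> base_lr at the final step.
    remaining = (total_steps - 1 - step) / (total_steps - 1 - peak)
    return base_lr + (max_lr - base_lr) * remaining


total_steps = 100
schedule = [one_cycle_lr(s, total_steps, max_lr=0.01) for s in range(total_steps)]
```

The rate rises to its maximum once and then falls back, so large steps are taken only in the middle of training.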
We adopt the weight initialization proposed by He et al. [79] for convolutional layers without dropout to obtain the initial synaptic weights. The source code used in this study has been made openly available to the scientific community as a public repository.
10. Experimental analysis
To compare traditional CNNs and solutions with SLAF, we analyze their differences and similarities in the training and testing phases. We provide the performance evaluation of non-homomorphic CNN1 and CNN2 on plaintext inputs and privacy-preserving CNN1-HE and CNN2-HE on ciphertext inputs. For CNN-HE models, both inputs and weights are encrypted before testing. We perform the following experiments:
First, we train and test non-homomorphic models with the ReLU activation function. The training accuracy of CNN1 and CNN2 with ReLU over plaintext is 99.562% and 99.748%, respectively. The testing accuracy of CNN1 and CNN2 with ReLU over plaintext is 98.56% and 99.38%, respectively.
Second, we evaluate the performance of CNNs and CNN-HEs using a single training to determine both weights and SLAF coefficients.
Third, we analyze CNNs and CNN-HEs by training with ReLU activations. Then weights are fixed, ReLU is replaced by SLAF, and the model is re-trained to adapt SLAF coefficients at each polynomial independently.
Fourth, we evaluate CNNs and CNN-HEs using ReLU in training. We determine the bounded intervals of each neuron’s activation inputs and approximate sign(x) on these intervals using known polynomial approximations. Then, we apply sign(x)→ReLU(x) transformations and replace ReLU with such approximations. Finally, we perform a re-training process to optimize SLAF coefficients.
10.1 CNN1
We provide the performance evaluation of non-homomorphic CNN1 and CNN1-HE without and with SLAF.
Table 6 presents the latency (Lat) and accuracy (Acc) of the CNN1 model using SLAF(0), SLAF(0)-R, different polynomials for SLAF(P)-R, and Lo-La. Results for plaintext and ciphertext inputs are reported in the CNN1 and CNN1-HE columns, respectively. Latency corresponds to the time required to process a single classification request. ρ denotes the generalization gap, which measures the difference between the training and testing accuracies (%). Lo-La corresponds to the original implementation using specifications, parameters, and characteristics proposed in the corresponding paper [57].
We can see that CNN1-SLAF(P)-R with three-degree Chebyshev polynomials achieves the same accuracy of 98.56% as the non-homomorphic CNN1 with the original ReLU activation function. Among the CNN-HE models, CNN1-SLAF(0)-R has the best accuracy of 98.22%, which is 1.3 percentage points higher than the 96.92% of the homomorphic Lo-La.
CNN1 and CNN1-HE latencies illustrate the significant overhead of NN-HE models compared with their unencrypted analogues. For example, computing CNN1-SLAF(0)-R over ciphertexts results in a slowdown of approximately 240 times compared to performing the same computations on unencrypted data.
To evaluate the effectiveness of the SLAF(P) approach over traditional polynomial approximation methods, we test CNN1-HE using known approximations to replace the ReLU activation: Least-squares, Chebyshev polynomials, Newton-Raphson, and Minimax composition. Polynomial approximations to an activation function usually consider fixed approximation intervals regardless of the particular dataset and problem. CNN1-SLAF(P)-R with adaptive intervals for each neuron increases the model accuracy.
With a Chebyshev series, CNN1-HE-SLAF(P)-R achieves an accuracy of 97.96%, showing an improvement of 11.85% compared to the 86.11% of the Chebyshev series that approximates ReLU without SLAF(P). With composite minimax polynomials, it achieves an accuracy of 97.86%, improving by 18.55% over 79.31% of Composite minimax polynomials without SLAF(P). With Least-squares, it provides an accuracy of 97.31%, improving by 6.01% over 91.30% of Least-squares without SLAF(P). With Newton-Raphson, it has 97.56%, improving by 33.27% over 64.29% of Newton-Raphson without SLAF(P).
10.2. CNN2
Since the combination of linear operations is linear and there is no intervening non-linearity between the first pooling layer and the first fully-connected layer of CNN2, the hidden layers between them can be collapsed into a single linear layer for the testing process, as depicted in Fig 7.
This linear transformation
(10)
can be accurately reproduced (up to some numerical rounding error) by the dense layer FCnew. Collapsed models exhibit the same accuracy as the original models.
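The collapsing of consecutive linear layers can be checked on a small example. In the sketch below, two dense layers stand in for the pooling, convolutional, and fully connected layers of CNN2 (each of which is a linear map that can be written as a matrix); the matrices, biases, and function names are arbitrary illustrative choices.

```python
def matvec(W, x):
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]


def matmul(A, B):
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def affine(W, b, x):
    return [y + b_i for y, b_i in zip(matvec(W, x), b)]


def collapse(W1, b1, W2, b2):
    """Fold y = W2 (W1 x + b1) + b2 into y = W_new x + b_new."""
    W_new = matmul(W2, W1)
    b_new = [y + b_i for y, b_i in zip(matvec(W2, b1), b2)]
    return W_new, b_new


W1, b1 = [[1.0, 2.0], [0.0, -1.0], [3.0, 1.0]], [0.5, 0.0, -1.0]
W2, b2 = [[1.0, 0.0, 2.0], [-1.0, 1.0, 0.0]], [0.0, 1.0]
x = [2.0, -3.0]

two_step = affine(W2, b2, affine(W1, b1, x))  # original pair of layers
W_new, b_new = collapse(W1, b1, W2, b2)
one_step = affine(W_new, b_new, x)            # single collapsed layer
```

Both evaluations give the same output, while the collapsed layer needs only one matrix-vector product at inference time, which reduces the multiplicative depth consumed under encryption.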
Tables 7 and 8 present the performance of the non-homomorphic and homomorphic CNN2, respectively.
CNN2-SLAF(0) and CNN2-SLAF(P) models using low-degree polynomials achieve the same accuracy of 99.38% as the standard ReLU activation function over non-homomorphic models in the plaintext space.
These results show the remarkable capability of SLAF variants to yield comparable accuracy to traditional non-linear functions. It confirms that NN models with SLAF have the same representative ability as their non-polynomial counterparts.
Now, let us evaluate the performance of CNN2-HE-SLAF models in comparison to the state-of-the-art models in the ciphertext space, such as CryptoNets, Faster-CryptoNets, CryptoDL, Lo-La, nGraph-HE, E2DM, and HCNN. These models adopt the 2-convolutional CryptoNets architecture and implement diverse polynomial approximation methods of ReLU.
As shown in Table 8, the CNN2-HE-SLAF(0)-R (CNN2-HE-SLAF(P)-R) model with re-training achieves an accuracy of 99.21% (99.05%), showing improvements between 0.21% (0.05%) and 1.11% (0.95%) over CryptoNets-based NN-HE solutions, 0.21% (0.05%) over 99% of HCNN, 0.26% (0.10%) over 98.95% of CryptoNets, 0.51% (0.35%) over 98.70% of Faster-CryptoNets, 0.69% (0.53%) over 98.52% of CryptoDL, and 1.11% (0.95%) over 98.10% of E2DM.
CNN2-HE-SLAF(0)-R (CNN2-HE-SLAF(P)-R) also outperforms models with a similar 2-convolutional architecture, such as LeNet-HE and SEALion (see Table 1), showing accuracy improvements of 1.03% (0.87%) over 98.18% of LeNet-HE and 0.30% (0.14%) over 98.91% of SEALion, respectively. Furthermore, the CNN2-HE-SLAF(0)-R (CNN2-HE-SLAF(P)-R) improves accuracy by 0.61% (0.45%) compared to 98.60% of the TAPAS model, which utilizes a 19-layer architecture with six convolutional layers.
Comparing performance, we demonstrate that using trainable three-degree polynomial activations, CNN2-HE-SLAF(0)-R classifies MNIST 6.26 times faster than CryptoNets with better accuracy (99.21% vs 98.95%).
11. Discussion and conclusions
Common encryption algorithms such as Advanced Encryption Standard (AES) successfully protect stored and transmitted data, preventing third parties and the public from reading them. However, processing such data requires decryption, which exposes them to attack. Data have to be searched, computed, and analyzed securely. HE addresses this problem by allowing confidential data to be processed while they remain encrypted. However, HE efficiently supports only addition and multiplication operations on ciphertexts, and the resulting computational overhead limits its practicality. This overhead is the biggest challenge for widespread adoption. Optimization techniques are required to improve performance and usability in various domains such as cloud computing, machine learning, healthcare, finance, voting, etc.
NN-HE is a rapidly growing research area with many potential applications and significant benefits. The main direction of the current NN-HE development is to increase accuracy and efficiency. While protecting the privacy of sensitive data is essential, NN-HE models must also offer acceptable accuracy and time complexity.
The CNN-HE-SLAF solutions contribute to these two goals and existing knowledge by designing an adaptive NN-HE.
This paper proposes methods of CNN-HE optimization by Self-Learning Activation Functions (SLAF). We theoretically prove that NNs with SLAF can approximate any continuous activation function to any desired error as a function of the degree of SLAF. In contrast to the traditional NN-HE approaches, where the activation functions of all neurons are replaced by a single (best) polynomial approximation, NN-HE-SLAF generates customized polynomials for each homomorphic neuron independently. This per-neuron customization yields better accuracy and performance.
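To illustrate the difference between one shared polynomial and per-neuron polynomials, the sketch below applies a separate degree-three polynomial to each neuron using only additions and multiplications (Horner's rule), which are the operations HE supports efficiently. It runs on plaintext floats; the function names and the coefficient values are hypothetical and are not taken from the trained models.

```python
def poly_eval(coeffs, x):
    """Evaluate c0 + c1*x + ... + cn*x^n with Horner's rule.

    Only additions and multiplications are used, so the same sequence of
    operations could in principle be carried out on ciphertexts.
    """
    acc = 0.0
    for c in reversed(coeffs):
        acc = acc * x + c
    return acc


def layer_activation(per_neuron_coeffs, pre_activations):
    """Apply a separate polynomial to each neuron's pre-activation value."""
    return [poly_eval(c, z) for c, z in zip(per_neuron_coeffs, pre_activations)]


# Hypothetical coefficients for three neurons (lowest order first).
coeffs = [
    [0.0, 1.0, 0.0, 0.0],     # neuron 1: identity
    [0.25, 0.5, 0.25, 0.0],   # neuron 2: a quadratic
    [0.0, 0.5, 0.0, -0.125],  # neuron 3: an odd cubic
]
z = [2.0, 1.0, -2.0]
out = layer_activation(coeffs, z)
print(out)
```

In a traditional NN-HE model all three rows of `coeffs` would be identical; with SLAF each row is learned independently.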
Two convolutional NN-HE models are proposed: CNN-HE-SLAF and CNN-HE-SLAF-R.
- CNN-HE-SLAF replaces all activation functions with SLAF, and it is trained to find synaptic weights and coefficients together. Then, CNN is encrypted homomorphically. We denote these models as CNN-HE-SLAF(0) if polynomial coefficients are initialized to zero and CNN-HE-SLAF(P) if coefficients are initialized to a known polynomial approximation.
- CNN-HE-SLAF-R is trained with the original activation function, weights are locked, activation functions are substituted by SLAF(0) or SLAF(P), and it is re-trained to adjust coefficients. We denote these models as CNN-HE-SLAF(0)-R and CNN-HE-SLAF(P)-R, respectively. For re-training, we choose considerably fewer epochs than for training.
The main results can be summarized as follows:
- CNN-HE-SLAF leverages the approximation capabilities of neural networks by optimizing polynomial approximations of individual homomorphic neurons, customizing them for a given problem, dataset, network structure, parameters, and approximation intervals.
- CNN-HE-SLAF(0)-R and CNN-HE-SLAF(P)-R achieve the same accuracy of 99.38% as a non-polynomial, non-homomorphic ReLU by applying a better approximation of sign(x) on the found intervals and the sign(x)→ReLU(x) transformation.
- CNN-HE-SLAF achieves an accuracy of 99.21%, improving on the state-of-the-art CNN-HE solutions on the MNIST optical character recognition benchmark dataset.
- CNN2-HE-SLAF collapses the hidden layers between the first pooling layer and the first fully connected layer into a single linear layer with batch normalization to reduce latency while keeping the same accuracy.
- CNN2-HE-SLAF(0)-R is 6.26 times faster than the state-of-the-art CNN-HE CryptoNets on MNIST.
- CNN-HE-SLAF demonstrates the feasibility of applying trained polynomials to generate HE-compliant and problem-specific activation functions, enabling efficient, accurate, and privacy-preserving CNN-HE solutions.
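The idea of adjusting only the polynomial coefficients, as in the re-training step of the -R models, can be sketched for a single neuron in plain Python. This is a simplified stand-in and not the training procedure used in the experiments: instead of back-propagating a classification loss through a CNN, it regresses a degree-three polynomial onto ReLU with gradient descent. The interval [-1, 1], the learning rate, the number of epochs, and all function names are assumptions made for illustration.

```python
def relu(x):
    return x if x > 0.0 else 0.0


def poly(coeffs, x):
    """Horner evaluation of c0 + c1*x + c2*x^2 + ... (lowest order first)."""
    acc = 0.0
    for c in reversed(coeffs):
        acc = acc * x + c
    return acc


def mse(coeffs, xs, target):
    """Mean squared error between the polynomial and the target on xs."""
    return sum((poly(coeffs, x) - target(x)) ** 2 for x in xs) / len(xs)


def train_slaf(coeffs, xs, target, lr=0.3, epochs=2000):
    """Gradient descent on the coefficients only; nothing else is trainable."""
    coeffs = list(coeffs)
    n = len(xs)
    for _ in range(epochs):
        grads = [0.0] * len(coeffs)
        for x in xs:
            err = poly(coeffs, x) - target(x)
            power = 1.0  # x**k, built up by repeated multiplication
            for k in range(len(coeffs)):
                grads[k] += 2.0 * err * power / n
                power *= x
        coeffs = [c - lr * g for c, g in zip(coeffs, grads)]
    return coeffs


xs = [i / 50.0 - 1.0 for i in range(101)]  # assumed approximation interval [-1, 1]
init = [0.0, 0.0, 0.0, 0.0]                # zero initialization, as in SLAF(0)
learned = train_slaf(init, xs, relu)

print("learned coefficients:", [round(c, 4) for c in learned])
print("MSE before:", mse(init, xs, relu), "after:", mse(learned, xs, relu))
```

Starting from a known polynomial approximation instead of zeros corresponds to the SLAF(P) initialization; in this sketch that only changes `init`.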
These benefits come at the expense of increased CNN-HE model complexity. SLAF adds trainable parameters, namely the coefficients of the polynomial approximations of the activation functions, which lengthens training. It is therefore important to find an adequate initialization point and to avoid potential overfitting. In future work, it is essential to apply state-of-the-art practical methods of NN hardware acceleration to SLAF techniques: highly parallel CPUs, GPUs, Tensor Processing Units (TPUs), Field Programmable Gate Arrays (FPGAs), Application-Specific Integrated Circuits (ASICs), etc.
Moreover, it is important to validate the applicability and efficiency of the self-learning approach for other ML models, such as logistic regression, and for other CNN-HE layers, e.g., pooling, batch normalization, etc. Furthermore, SLAF performance can be optimized by introducing a dropout mechanism, whereby coefficients below a specific ε-threshold are set to zero.
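A minimal sketch of such a coefficient dropout is given below, assuming the threshold is applied to the absolute value of each coefficient. The function names, the example coefficients, and the threshold value are hypothetical. If the highest-order coefficient is dropped, the effective degree of the polynomial decreases, which would save multiplications during homomorphic evaluation.

```python
def prune_coefficients(coeffs, eps):
    """Set every coefficient with |c| < eps to zero (lowest order first)."""
    return [0.0 if abs(c) < eps else c for c in coeffs]


def effective_degree(coeffs):
    """Index of the highest-order nonzero coefficient (0 if all are zero)."""
    for k in range(len(coeffs) - 1, -1, -1):
        if coeffs[k] != 0.0:
            return k
    return 0


# Hypothetical learned coefficients of one degree-three SLAF.
learned = [0.094, 0.5, 0.469, 1e-4]
pruned = prune_coefficients(learned, eps=1e-3)

print(pruned, "degree:", effective_degree(learned), "->", effective_degree(pruned))
```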
Additionally, while CNN-HE-SLAF models can generate homomorphically computable activation functions, these privacy-preserving models inherit both the strengths and weaknesses of current NN models, where even advanced models trained with state-of-the-art optimization techniques do not provide a practical solution for every problem. The SLAF approach does not guarantee the best approximation for a given function and norm. In future work, it is imperative to enhance the approximation and model accuracy by considering SLAF as a weighted sum of orthogonal polynomials instead of monomials.
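As one possible instance of an orthogonal basis, the sketch below evaluates a SLAF written as a weighted sum of Chebyshev polynomials of the first kind, using the recurrence T_{k+1}(x) = 2x·T_k(x) − T_{k−1}(x), which needs only additions and multiplications. The choice of Chebyshev polynomials, the function names, and the weights are assumptions for illustration; the text above does not commit to a specific orthogonal family.

```python
def chebyshev_eval(weights, x):
    """Evaluate sum_k weights[k] * T_k(x) with the three-term recurrence."""
    t_prev, t_curr = 1.0, x  # T_0(x) and T_1(x)
    total = 0.0
    for k, w in enumerate(weights):
        if k == 0:
            term = t_prev
        elif k == 1:
            term = t_curr
        else:
            # Advance the recurrence: T_k = 2x * T_{k-1} - T_{k-2}
            t_prev, t_curr = t_curr, 2.0 * x * t_curr - t_prev
            term = t_curr
        total += w * term
    return total


def chebyshev_to_monomial_deg3(a):
    """Monomial coefficients of a0*T0 + a1*T1 + a2*T2 + a3*T3.

    Uses T2 = 2x^2 - 1 and T3 = 4x^3 - 3x.
    """
    a0, a1, a2, a3 = a
    return [a0 - a2, a1 - 3.0 * a3, 2.0 * a2, 4.0 * a3]


weights = [0.5, 0.5, 0.25, 0.1]  # hypothetical trainable weights
x = 0.3
mono = chebyshev_to_monomial_deg3(weights)
value_cheb = chebyshev_eval(weights, x)
value_mono = sum(c * x ** k for k, c in enumerate(mono))
print(value_cheb, value_mono)
```

Both forms describe the same polynomial; the difference lies in which parameters are trained.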
Another relevant line of future work involves improving the internal structure of the CNN-HE to increase the performance of homomorphic self-adaptive learning using different HE schemes and other state-of-the-art strategies, e.g., incorporating RNS HE variants to decompose the encrypted inputs into several parts and propagate them homomorphically and independently across the model. In addition to being highly parallelizable, an RNS-based HE scheme allows for smaller encryption parameters without compromising the security level of the scheme, i.e., it reduces latency while preserving security and accuracy.
Finally, the list of potential applications is broad, given the strongly applied nature of CNN-HE-SLAF solutions and their significant privacy benefits for real-world problems. In future work, it is essential to explore the applicability of the proposed models to sensitive domains such as medical image classification.
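The decomposition principle behind RNS can be shown on plain integers: a value is split into residues modulo pairwise coprime moduli, each residue channel is processed independently, and the result is recovered with the Chinese Remainder Theorem. The sketch below is a toy illustration only. It is not an RNS HE scheme, and the moduli and helper names are arbitrary choices.

```python
from math import prod

MODULI = (7, 11, 13)  # pairwise coprime; dynamic range is 7 * 11 * 13 = 1001


def to_rns(x):
    """Decompose an integer into its residues, one per modulus."""
    return tuple(x % m for m in MODULI)


def rns_add(a, b):
    """Channel-wise addition; each channel is independent of the others."""
    return tuple((ai + bi) % m for ai, bi, m in zip(a, b, MODULI))


def rns_mul(a, b):
    """Channel-wise multiplication; each channel is independent of the others."""
    return tuple((ai * bi) % m for ai, bi, m in zip(a, b, MODULI))


def from_rns(residues):
    """Reconstruct the integer with the Chinese Remainder Theorem."""
    big_m = prod(MODULI)
    total = 0
    for r, m in zip(residues, MODULI):
        partial = big_m // m
        total += r * partial * pow(partial, -1, m)  # modular inverse (Python 3.8+)
    return total % big_m


a, b = 12, 34
result = from_rns(rns_add(rns_mul(to_rns(a), to_rns(b)), to_rns(a)))
print(result)  # equals a * b + a as long as the value stays below 1001
```

Because each channel works with small numbers and never interacts with the others, the channels can be processed in parallel.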
References
- 1. NVIDIA. NVIDIA DLSS 2.0: A Big Leap in AI Rendering. 2020. Available from: https://www.nvidia.com/engb/geforce/news/nvidia-dlss-2-0-a-big-leap-in-ai-rendering/
- 2. Shahid N, Rappon T, Berta W. Applications of artificial neural networks in health care organizational decision-making: A scoping review, PLoS One. 2019; 14: e0212356. pmid:30779785
- 3. Pastur-Romay LA, Cedrón F, Pazos A, Porto-Pazos AB. Deep Artificial Neural Networks and Neuromorphic Chips for Big Data Analysis: Pharmaceutical and Bioinformatics Applications, Int J Mol Sci. 2016; 17: 1313. pmid:27529225
- 4. Chen Z, Pang M, Zhao Z, Li S, Miao R, Zhang Y, et al. Feature selection may improve deep neural networks for the bioinformatics problems, Bioinformatics. 2020; 36: 1542–1552. pmid:31591638
- 5. Bertolaccini L, Solli P, Pardolesi A, Pasini A. An overview of the use of artificial neural networks in lung cancer research, J Thorac Dis. 2017; 9: 924. pmid:28523139
- 6. Selvarajan S, Manoharan H, Iwendi C, Alsowail RA, Pandiaraj S. A comparative recognition research on excretory organism in medical applications using artificial neural networks, Front. Bioeng. Biotechnol. 2023; 11: 1211143. pmid:37397968
- 7. Kaushik S, Gandhi C. Capability Based Outsourced Data Access Control with Assured File Deletion and Efficient Revocation with Trust Factor in Cloud Computing, International Journal of Cloud Applications and Computing. 2020; 10: 64–84.
- 8. Hai T, Zhou J, Lu Y, Jawawi DNA, Wang D, Selvarajan S, et al. An archetypal determination of mobile cloud computing for emergency applications using decision tree algorithm, J Cloud Comp. 2023; 12: 73.
- 9. Babenko M, Tchernykh A, Pulido-Gaytan B, Cortés-Mendoza JM, Shiryaev E, Golimblevskaia E, et al. RRNS Base Extension Error-Correcting Code for Performance Optimization of Scalable Reliable Distributed Cloud Data Storage. In: 2021 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW); 2021. p. 548–553.
- 10. Tchernykh A, Babenko M, Kuchukov V, Miranda-López V, Avetisyan A, Rivera-Rodriguez R, et al. Data Reliability and Redundancy Optimization of a Secure Multi-cloud Storage Under Uncertainty of Errors and Falsifications. In: 2019 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW); 2019. p. 565–572.
- 11. Zheng Q, Wang X, Khurram Khan M, Zhang W, Gupta BB, Guo W. A Lightweight Authenticated Encryption Scheme Based on Chaotic SCML for Railway Cloud Service, IEEE Access. 2018; 6: 711–722.
- 12. Shitharth S, Prasad KM, Sangeetha K, Kshirsagar PR, Babu TS, Alhelou HH. An Enriched RPCO-BCNN Mechanisms for Attack Detection and Classification in SCADA Systems, IEEE Access. 2021; 9: 156297–156312.
- 13. Gentry C. A Fully Homomorphic Encryption Scheme. Ph.D thesis, Stanford University. 2009. Available from: https://crypto.stanford.edu/craig/craig-thesis.pdf
- 14. Gentry C, Sahai A, Waters B. Homomorphic Encryption from Learning with Errors: Conceptually-Simpler, Asymptotically-Faster, Attribute-Based. In: Advances in Cryptology – CRYPTO 2013; 2013. p. 75–92.
- 15. Cortés-Mendoza JM, Radchenko G, Tchernykh A, Pulido-Gaytan B, Babenko M, Avetisyan A, et al. LR-GD-RNS: Enhanced Privacy-Preserving Logistic Regression Algorithms for Secure Deployment in Untrusted Environments. In: 2021 IEEE/ACM 21st International Symposium on Cluster, Cloud and Internet Computing (CCGrid); 2021. p. 770–775.
- 16. Pulido-Gaytan B, Tchernykh A, Cortés-Mendoza JM, Babenko M, Radchenko G, Avetisyan A, et al. Privacy-preserving neural networks with Homomorphic encryption: Challenges and opportunities, Peer-to-Peer Netw. Appl. 2021; 14: 1666–1691.
- 17. Cheon JH, Kim A, Kim M, Song Y. Homomorphic Encryption for Arithmetic of Approximate Numbers. In: Advances in Cryptology – ASIACRYPT 2017; 2017. p. 409–437.
- 18. Babenko M, Tchernykh A, Golimblevskaia E, Pulido-Gaytan B, Avetisyan A. Homomorphic Comparison Methods: Technologies, Challenges, and Opportunities. In: 2020 International Conference Engineering and Telecommunication (En&T); 2020. p. 1–5.
- 19. Al Badawi A, Jin C, Lin J, Mun CF, Jie SJ, Tan BHM, et al. Towards the AlexNet Moment for Homomorphic Encryption: HCNN, the First Homomorphic CNN on Encrypted Data With GPUs, IEEE Trans Emerg Top Comput. 2021; 9: 1330–1343.
- 20. Cheon JH, Kim D, Kim D, Lee HH, Lee K. Numerical Method for Comparison on Homomorphically Encrypted Numbers. In: Advances in Cryptology – ASIACRYPT 2019; 2019. p. 415–445.
- 21. Lee E, Lee J-W, No J-S, Kim Y-S. Minimax Approximation of Sign Function by Composite Polynomial for Homomorphic Comparison, IEEE Trans Dependable Secure Comput. 2022; 19: 3711–3727.
- 22. Cheon JH, Kim D, Kim D. Efficient Homomorphic Comparison Methods with Optimal Complexity. In: Advances in Cryptology – ASIACRYPT 2020; 2020. p. 221–256.
- 23. Goldreich O, Micali S, Wigderson A. How to Play Any Mental Game. In: Proceedings of the Nineteenth Annual ACM Symposium on Theory of Computing; 1987. p. 218–229.
- 24. Yao AC-C. How to Generate and Exchange Secrets. In: 27th Annual Symposium on Foundations of Computer Science; 1986. p. 162–167.
- 25. Zhang L, Xu J, Vijayakumar P, Sharma PK, Ghosh U. Homomorphic Encryption-based Privacy-preserving Federated Learning in IoT-enabled Healthcare System, IEEE Trans Netw Sci Eng. 2022; 1–17.
- 26. Podschwadt R, Takabi D, Hu P, Rafiei MH, Cai Z. A Survey of Deep Learning Architectures for Privacy-Preserving Machine Learning With Fully Homomorphic Encryption, IEEE Access. 2022; 10: 117477–117500.
- 27. Dwork C, McSherry F, Nissim K, Smith A. Calibrating Noise to Sensitivity in Private Data Analysis. In: Theory of Cryptography – TCC 2006; 2006. p. 265–284.
- 28. Boneh D, Sahai A, Waters B. Functional encryption: Definitions and challenges. In: Theory of Cryptography. TCC 2011; 2011. p. 253–273.
- 29. Dwork C. Differential Privacy. In: Automata, Languages and Programming. ICALP 2006; 2006. p. 1–12.
- 30. Chaudhuri K, Monteleoni C, Sarwate AD. Differentially Private Empirical Risk Minimization, J. Mach. Learn. Res. 2011; 12: 1069–1109.
- 31. Yoosuf MS, Muralidharan C, Shitharth S, Alghamdi M, Maray M, Rabie OBJ. FogDedupe: A Fog-Centric Deduplication Approach Using Multi-Key Homomorphic Encryption Technique, Journal of Sensors. 2022; 2022: 1–16.
- 32. Brakerski Z, Gentry C, Vaikuntanathan V. (Leveled) Fully Homomorphic Encryption without Bootstrapping. In: Proceedings of the 3rd Innovations in Theoretical Computer Science Conference - ITCS ’12; 2012. p. 309–325.
- 33. Brakerski Z. Fully Homomorphic Encryption without Modulus Switching from Classical GapSVP. In: Advances in Cryptology – CRYPTO 2012; 2012. p. 868–886.
- 34.
Fan J, Vercauteren F Somewhat Practical Fully Homomorphic Encryption. Cryptology ePrint Archive. 2012. Available from: https://eprint.iacr.org/2012/144
- 35. Bos JW, Lauter K, Loftus J, Naehrig M. Improved Security for a Ring-Based Fully Homomorphic Encryption Scheme. In: Cryptography and Coding. IMACC 2013; 2013. p. 45–64.
- 36. Hoffstein J, Pipher J, Silverman JH. NTRU: A ring-based public key cryptosystem. In: Algorithmic Number Theory. ANTS 1998; 1998. p. 267–288.
- 37. López-Alt A, Tromer E, Vaikuntanathan V. On-the-fly multiparty computation on the cloud via multikey fully homomorphic encryption. In: Proceedings of the Annual ACM Symposium on Theory of Computing; 2012. p. 1219–1234.
- 38. Babenko M, Tchernykh A, Pulido-Gaytan B, Golimblevskaia E, Cortes-Mendoza JM, Avetisyan A. Experimental Evaluation of Homomorphic Comparison Methods. In: 2020 Ivannikov Ispras Open Conference (ISPRAS); 2020. p. 69–74.
- 39. Obla S, Gong X, Aloufi A, Hu P, Takabi D. Effective Activation Functions for Homomorphic Evaluation of Deep Neural Networks, IEEE Access. 2020; 8: 153098–153112.
- 40. Lee E, Lee J-W, Kim Y-S, No J-S. Optimization of Homomorphic Comparison Algorithm on RNS-CKKS Scheme, IEEE Access. 2022; 10: 26163–26176.
- 41. Pulido-Gaytan B, Tchernykh A, Cortés-Mendoza JM, Babenko M, Radchenko G. A Survey on Privacy-Preserving Machine Learning with Fully Homomorphic Encryption. In: Latin America High Performance Computing Conference (CARLA); 2021. p. 115–129.
- 42. Hesamifard E, Takabi H, Ghasemi M. CryptoDL: Deep Neural Networks over Encrypted Data. arXiv:1711.05189. 2017. Available from: https://arxiv.org/abs/1711.05189
- 43. Takabi H, Hesamifard E, Ghasemi M. Privacy Preserving Multi-party Machine Learning with Homomorphic Encryption. In: 29th Annual Conference on Neural Information Processing Systems (NIPS); 2016.
- 44. Chou E, Beal J, Levy D, Yeung S, Haque A, Fei-Fei L. Faster CryptoNets: Leveraging Sparsity for Real-World Encrypted Inference. arXiv:1811.09953. 2018. Available from: https://arxiv.org/abs/1811.09953
- 45. Shokri R, Shmatikov V. Privacy-Preserving Deep Learning. In: Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security; 2015. p. 1310–1321.
- 46. Bourse F, Minelli M, Minihold M, Paillier P. Fast Homomorphic Evaluation of Deep Discretized Neural Networks. In: Advances in Cryptology – CRYPTO 2018; 2018. p. 483–512.
- 47. Chabanne H, De Wargny A, Milgram J, Morel C, Prouff E. Privacy-Preserving Classification on Deep Neural Network. Cryptol. ePrint Arch. 2017. Available from: https://eprint.iacr.org/2017/035
- 48. Kwabena O-A, Qin Z, Zhuang T, Qin Z. MSCryptoNet: Multi-Scheme Privacy-Preserving Deep Learning in Cloud Computing, IEEE Access. 2019; 7: 29344–29354.
- 49. Babenko M, Tchernykh A, Pulido-Gaytan B, Avetisyan A, Nesmachnow S, Wang X, et al. Towards the Sign Function Best Approximation for Secure Outsourced Computations and Control, Mathematics. 2022; 10: 2006.
- 50. Lee J-W, Kang H, Lee Y, Choi W, Eom J, Deryabin M, et al. Privacy-Preserving Machine Learning With Fully Homomorphic Encryption for Deep Neural Network, IEEE Access. 2022; 10: 30039–30054.
- 51. Bakshi M, Last M. CryptoRNN-Privacy-Preserving Recurrent Neural Networks using Homomorphic Encryption. In: Cyber Security Cryptography and Machine Learning (CSCML); 2020. p. 245–253.
- 52. Pulido-Gaytan B, Tchernykh A, Leprévost F, Bouvry P, Goldman A. Toward Understanding Efficient Privacy-Preserving Homomorphic Comparison, IEEE Access. 2023; 11: 102189–102206.
- 53. Althubiti SA, Alenezi F, Shitharth S, Sangeetha K, Reddy CVS. Circuit Manufacturing Defect Detection Using VGG16 Convolutional Neural Networks, Wireless Communications and Mobile Computing. 2022; 2022: 1–10.
- 54. Dowlin N, Gilad-Bachrach R, Laine K, Lauter K, Naehrig M, Wernsing J. CryptoNets: Applying Neural Networks to Encrypted Data with High Throughput and Accuracy. In: 33rd International Conference on Machine Learning; 2016. p. 201–210.
- 55. Sanyal A, Kusner M, Gascon A, Kanade V. TAPAS: Tricks to Accelerate (encrypted) Prediction As a Service. In: 35th International Conference on Machine Learning; 2018. p. 4490–4499.
- 56.
Elsloo TV, Patrini G, Ivey-Law H. SEALion: a Framework for Neural Network Inference on Encrypted Data. arXiv.1904.12840. 2019. Available from: https://arxiv.org/abs/1904.12840
- 57. Brutzkus A, Elisha O, Gilad-Bachrach R. Low latency privacy preserving inference. In: 36th International Conference on Machine Learning; 2019. p. 1295–1304.
- 58. Boemer F, Lao Y, Cammarota R, Wierzynski C. NGraph-HE: A Graph Compiler for Deep Learning on Homomorphically Encrypted Data. In: Proceedings of the 16th ACM International Conference on Computing Frontiers; 2019. p. 3–13.
- 59. Jiang X, Kim M, Lauter K, Song Y. Secure Outsourced Matrix Computation and Application to Neural Networks. In: Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security; 2018. p. 1209–1222. pmid:31404438
- 60. Falcetta A, Roveri M. Privacy-Preserving Deep Learning With Homomorphic Encryption: An Introduction, IEEE Comput Intell Mag. 2022; 17: 14–25.
- 61. Albrecht M, Bai S, Ducas L. A Subfield Lattice Attack on Overstretched NTRU Assumptions. In: Advances in Cryptology – CRYPTO 2016; 2016. p. 153–178.
- 62.
Timmons NG, Rice A. Approximating Activation Functions. arXiv:2001.06370. 2020. Available from: https://arxiv.org/abs/2001.06370
- 63. Liao Z, Luo J, Gao W, Zhang Y, Zhang W. Homomorphic CNN for Privacy Preserving Learning on Encrypted Sensor Data. In: 2019 Chinese Automation Congress (CAC); 2019. p. 5593–5598.
- 64. Jung W, Lee E, Kim S, Kim J, Kim N, Lee K, et al. Accelerating fully homomorphic encryption through architecture-centric analysis and optimization, IEEE Access. 2021; 9: 98772–98789.
- 65. Lee J, Lee E, Lee J-W, Kim Y, Kim Y-S, No J-S. Precise Approximation of Convolutional Neural Networks for Homomorphically Encrypted Data, IEEE Access. 2023; 11: 62062–62076.
- 66. Riazi MS, Samragh M, Chen H, Laine K, Lauter K, Koushanfar F. XONN: Xnor-based oblivious deep neural network inference. In: 28th USENIX Security Symposium; 2019. p. 1501–1518.
- 67.
Liu J, Juuti M, Lu Y, Asokan N. Oblivious Neural Network Predictions via MiniONN Transformations. In: Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security; 2017. p. 619–631. https://doi.org/10.1145/3133956.3134056
- 68. Juvekar C, Vaikuntanathan V, Chandrakasan A. Gazelle: A Low Latency Framework for Secure Neural Network Inference. In: 27th USENIX Security Symposium; 2018. p. 1651–1668.
- 69. Wagh S, Gupta D, Chandran N. SecureNN: 3-Party Secure Computation for Neural Network Training. In: Proceedings on Privacy Enhancing Technologies; 2019. p. 26–49.
- 70. Mohassel P, Zhang Y. SecureML: A System for Scalable Privacy-Preserving Machine Learning. In: 2017 IEEE Symposium on Security and Privacy; 2017. p. 19–38.
- 71. Phong LT, Aono Y, Hayashi T, Wang L, Moriai S. Privacy-Preserving Deep Learning via Additively Homomorphic Encryption, IEEE Transactions on Information Forensics and Security. 2018; 13: 1333–1345.
- 72. Li M, Chow SSM, Hu S, Yan Y, Shen C, Wang Q. Optimizing Privacy-Preserving Outsourced Convolutional Neural Network Predictions, IEEE Trans Dependable Secure Comput. 2022; 19: 1592–1604.
- 73. Zhang Q, Xin C, Wu H. SecureTrain: An Approximation-Free and Computationally Efficient Framework for Privacy-Preserved Neural Network Training, IEEE Trans Netw Sci Eng. 2022; 9: 187–202.
- 74. IBM. IBM security homomorphic encryption services. 2020. Available from: https://www.ibm.com/downloads/cas/KQ27PWBO
- 75. Microsoft. Password Monitor: Safeguarding passwords in Microsoft Edge. 2021. Available from: https://microsoft.com/research/blog/password-monitor-safeguarding-passwords-in-microsoft-edge/
- 76.
Piazza F, Uncini A, Zenobi M. Artificial neural networks with adaptive polynomial activation function. In: Proceedings of the International Joint Conference on Neural Networks (IJCNN); 1992. p. 343–349.
- 77. Apicella A, Donnarumma F, Isgrò F, Prevete R. A survey on modern trainable activation functions, Neural Networks. 2021; 138: 14–32. pmid:33611065
- 78.
Hou L, Samaras D, Kurc TM, Gao Y, Saltz JH. ConvNets with Smooth Adaptive Activation Functions for Regression. In: Proceedings of the 20th International Conference on Artificial Intelligence and Statistics; 2017. p. 430–439.
- 79.
He K, Zhang X, Ren S, Sun J. Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV); 2015. p. 1026–1034.
- 80.
Trottier L, Giguere P, Chaib-draa B. Parametric Exponential Linear Unit for Deep Convolutional Neural Networks. In: 2017 16th IEEE International Conference on Machine Learning and Applications (ICMLA); 2017. p. 207–214. https://doi.org/10.1109/ICMLA.2017.00038
- 81. Li B, Tang S, Yu H. PowerNet: Efficient Representations of Polynomials and Smooth Functions by Deep Neural Networks with Rectified Power Units, Journal of Mathematical Study. 2020; 53: 159–191.
- 82. Ramachandran P, Zoph B, Le QV. Searching for activation functions. arXiv:1710.05941. 2017. Available from: https://arxiv.org/abs/1710.05941
- 83. Vershkov N, Babenko M, Tchernykh A, Pulido-Gaytan B, Cortés-Mendoza JM, Kuchukov V, et al. Optimization of Neural Network Training for Image Recognition Based on Trigonometric Polynomial Approximation, Programming and Computer Software. 2021; 47: 830–838.
- 84. Apicella A, Isgrò F, Prevete R. A simple and efficient architecture for trainable activation functions, Neurocomputing. 2019; 370: 1–15.
- 85. Qian S, Liu H, Liu C, Wu S, Wong HS. Adaptive activation functions in convolutional neural networks, Neurocomputing. 2018; 272: 204–212.
- 86. Bingham G, Miikkulainen R. Discovering Parametric Activation Functions, Neural Networks. 2022; 148: 48–65. pmid:35066417
- 87. Goyal M, Goyal R, Lall B. Learning Activation Functions: A new paradigm for understanding Neural Networks. arXiv:1906.09529. 2019. Available from: https://arxiv.org/abs/1906.09529
- 88.
Telgarsky M. Neural networks and rational functions. In: 34th International Conference on Machine Learning; 2017. p. 3387–3393.
- 89.
Molina A, Schramowski P, Kersting K. Padé activation units: End-to-end learning of flexible activation functions in deep networks. In: International Conference on Learning Representations (ICLR); 2020.
- 90.
Chen Z, Chen F, Lai R, Zhang X, Lu C-T. Rational Neural Networks for Approximating Graph Convolution Operator on Jump Discontinuities. In: 2018 IEEE International Conference on Data Mining (ICDM); 2018. p. 59–68. https://doi.org/10.1109/ICDM.2018.00021
- 91.
Boullé N, Nakatsukasa Y, Townsend A. Rational neural networks. arXiv:2004.01902. 2020. Available from: https://arxiv.org/abs/2004.01902
- 92. Livni R, Shalev-Shwartz S, Shamir O. On the computational efficiency of training neural networks. In: Advances in Neural Information Processing Systems (NIPS); 2014.
- 93. Gautier A, Nguyen QN, Hein M. Globally optimal training of generalized polynomial neural networks with nonlinear spectral methods. In: Advances in Neural Information Processing Systems (NIPS); 2016.
- 94. Albrecht M, Chase M, Chen H, Ding J, Goldwasser S, Gorbunov S, et al. Homomorphic Encryption Security Standard; 2018. Available from: https://homomorphicencryption.org/standard/
- 95. LeCun Y, Corte C. MNIST handwritten digit database; 2010. Database: MNIST [Internet]. Available from: http://yann.lecun.com/exdb/mnist/
- 96. Paszke A, Gross S, Massa F, Lerer A, Bradbury J, Chanan G, et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library In: Advances in Neural Information Processing Systems (NeurIPS); 2019. p. 8026–8037.
- 97. Chen H, Laine K, Player R. Simple Encrypted Arithmetic Library; 2019.
- 98. Benaissa A, Retiat B, Cebere B, Belfedhal AE. TenSEAL: A Library for Encrypted Tensor Operations Using Homomorphic Encryption. arXiv:2104.03152. 2021. Available from: https://arxiv.org/abs/2104.03152
- 99. Smith LN. A disciplined approach to neural network hyper-parameters: Part 1 -- learning rate, batch size, momentum, and weight decay. arXiv:1803.09820. 2018. Available from: https://arxiv.org/abs/1803.09820
- 100. Smith LN, Topin N. Super-Convergence: Very Fast Training of Neural Networks Using Large Learning Rates. arXiv:1708.07120. 2017. Available from: https://arxiv.org/abs/1708.07120