Abstract
With the rise of machine learning and cloud computing, meaningful information can be mined from massive electronic medical data to help physicians make proper disease diagnoses for patients. However, using patients’ medical data and disease information frequently raises privacy concerns. In this paper, based on the single-layer perceptron, we propose a scheme for privacy-preserving clinical decision with cloud support (PPCD), which securely conducts disease model training and prediction for the patient. Each party learns nothing about the other’s private information. In PPCD, a lightweight secure multiplication is presented and introduced to improve the model training. Security analysis and experimental results on real data confirm the high accuracy of disease prediction achieved by the proposed PPCD without the risk of privacy disclosure.
Citation: Ma H, Guo X, Ping Y, Wang B, Yang Y, Zhang Z, et al. (2019) PPCD: Privacy-preserving clinical decision with cloud support. PLoS ONE 14(5): e0217349. https://doi.org/10.1371/journal.pone.0217349
Editor: Lixiang Li, Beijing University of Posts and Telecommunications, CHINA
Received: December 9, 2018; Accepted: May 9, 2019; Published: May 29, 2019
Copyright: © 2019 Ma et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All the data sets are available from the UCI machine learning repository. URL:http://archive.ics.uci.edu/ml.
Funding: This work is supported by the National Key R&D Program of China under Grant no. 2017YFB0802000 to BW, the National Natural Science Foundation of China under Grant no. U1736111 to BW, the Plan For Scientific Innovation Talent of Henan Province under Grant no. 184100510012 to BW, the Program for Science & Technology Innovation Talents in Universities of He’nan Province under Grant No. 18HASTIT022 to YP, Key Technologies R&D Program of He’nan Province under Grant No. 182102210123 to YP, the Foundation of He’nan Educational Committee under Grant No. 18A520047 to YP, the Foundation for University Key Teacher of He’nan Province under Grant No. 2016GGJS-141 to YP, Key Technologies R&D Program of He’nan Province (192102210295 to HM), and Innovation Scientists and Technicians Troop Construction Projects of Henan Province. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Introduction
With the sharp growth of electronic data, machine learning has influenced people’s lifestyles by predicting human behavior and future trends [1], [2], [3]. To overcome the limitations of storage and computing resources, outsourcing costly machine learning tasks to the Cloud has attracted increasing attention. For instance, the client’s data can be transmitted to the Cloud for both model training and prediction [4], [5], [6]. As a popular machine learning algorithm, the single-layer perceptron (SLP) is simple yet efficient and has been widely used in disease prediction [7], [8], [9]. It is more appropriate for real-time disease prediction than more complex techniques such as naïve Bayesian classifiers [10], decision trees [2] and support vector machines (SVMs) [11], [12]. Clinical decision support systems (CDSS), which use various data mining techniques to help physicians make proper disease diagnoses and provide health services for patients, have received considerable attention [7], [13], [14], [15]. However, because of privacy concerns, users do not want to submit their medical data to an unauthorized institution [16], [17], [18]. At the same time, because the classifier is considered an asset of the medical service provider, there is a risk of exposing the prediction model to a third party, which could then use the model to make disease predictions for patients and thereby harm the medical service provider. Therefore, the confidentiality of both the medical data and the disease model is crucial for the CDSS. Achieving secure disease prediction without compromising the accuracy of the result is a challenging issue.
To protect the privacy of patients’ medical data and the security of the prediction model, in this study we propose a privacy-preserving clinical decision scheme based on SLP with cloud support (PPCD). As shown in Fig 1, it includes two phases: SLP model training and disease prediction. In model training, diagnosed patients encrypt their symptom data and outsource them, together with the corresponding diagnosed disease, to the cloud. Meanwhile, the hospital generates random weights, which are then encrypted and sent to the cloud. After receiving both the encrypted medical data and the weights, the cloud trains the model through a few interactions with the hospital. The cloud selects an encrypted sample and executes the sign(·) function. If the returned value of sign(·) does not match the sample’s label, the cloud updates the weights, and this repeats until the convergence criterion is satisfied or all the disease cases are matched. When a patient wants to check his disease, he encrypts his symptom data and submits them to the hospital, which completes the analysis based on the disease model and sends back the encrypted diagnosis result and some medical advice.
To tackle the privacy concerns in clinical decision support systems, PPCD provides disease model training and disease risk prediction for the patient in a privacy-preserving way, so that the Cloud learns nothing about the patient’s medical information or the actual model. Specifically, the main contributions are:
- The proposal of PPCD, which provides privacy-preserving clinical decisions based on SLP with cloud support. It helps the doctor to predict disease while the medical data and the diagnosis result remain in encrypted form. Furthermore, the built disease diagnosis model is also protected as an asset of the hospital.
- For privacy preservation in the model training phase, a specific lightweight secure multiplication (LSM) is presented. By employing LSM, PPCD securely computes the inner product in the encrypted domain (ED) in one round.
- We implement PPCD in Java to evaluate its performance in ED. Experimental results on several medical data sets confirm that PPCD achieves accuracies comparable to those of SLP in the plain domain (PD).
The remainder of this paper is organized as follows. The following section briefly introduces the preliminaries. Then, PPCD is proposed along with LSM. The correctness and security analysis is detailed next, followed by the performance evaluation. Related work and conclusions are given in the last two sections.
Preliminaries
In this section, a brief overview of the Paillier cryptosystem, SLP and secure multiplication (SM) is given. Table 1 summarizes the key notations.
Single-layer perceptron
Following [19], SLP learns a weight vector w, which is multiplied with the input features to determine whether a sample belongs to one class or the other. We define an activation function sign(z) that takes the linear combination z of the input values x and the weights w as input. If z is greater than a defined threshold θ, we predict 1, and −1 otherwise. To simplify the notation, we define w0 = −θ and x0 = 1, so that
$$\operatorname{sign}(z) = \begin{cases} 1, & z \ge 0 \\ -1, & \text{otherwise} \end{cases} \tag{1}$$
where
$$z = w_0 x_0 + w_1 x_1 + \cdots + w_n x_n = \mathbf{w}^{T}\mathbf{x}.$$
For each training sample xi, we calculate the output value and update w if the output differs from the target Oi. The value for updating the weights at each increment is calculated by the learning rule,
$$w_j \leftarrow w_j + \eta\, O_i\, x_{i,j} \tag{2}$$
where η is the learning rate (0 < η ≤ 1).
It is important to note that the convergence of the perceptron is only guaranteed if the two classes are linearly separable. If a linear decision boundary can’t separate the two classes, a maximum number of passes should be set over the training dataset and/or a threshold for the number of tolerated misclassifications.
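As a minimal plain-domain sketch of the learning rule and the stopping condition described above (not the authors’ Java implementation), the following Python snippet trains an SLP on a toy linearly separable data set. The function names `train_slp` and `predict`, the zero initial weights and the AND-style data are assumptions made for illustration only.

```python
def sign(z):
    """Activation: 1 when z >= 0, otherwise -1."""
    return 1 if z >= 0 else -1


def train_slp(samples, labels, eta=1.0, max_passes=100):
    """Perceptron training; x0 = 1 is prepended so that w[0] acts as -theta."""
    w = [0.0] * (len(samples[0]) + 1)
    for _ in range(max_passes):          # cap on passes for non-separable data
        errors = 0
        for x, target in zip(samples, labels):
            xb = [1.0] + list(x)
            z = sum(wj * xj for wj, xj in zip(w, xb))
            if sign(z) != target:        # update only on a misclassification
                w = [wj + eta * target * xj for wj, xj in zip(w, xb)]
                errors += 1
        if errors == 0:                  # every sample matched: converged
            break
    return w


def predict(w, x):
    xb = [1.0] + list(x)
    return sign(sum(wj * xj for wj, xj in zip(w, xb)))


# Toy data (hypothetical): label 1 only when both features are present.
samples = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [-1, -1, -1, 1]
weights = train_slp(samples, labels)
predictions = [predict(weights, x) for x in samples]
```

Because the toy classes are linearly separable, the loop stops as soon as a full pass produces no update; `max_passes` only matters when no separating boundary exists.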
Paillier cryptosystem
The Paillier cryptosystem is an additively homomorphic cryptosystem [20]. It works as follows:
- Key generation: Two large prime numbers p and q are randomly and independently chosen such that gcd(pq, (p − 1)(q − 1)) = 1, where |p| = |q|. Then, we compute n = pq and λ = lcm(p − 1, q − 1), and select a random integer g in $\mathbb{Z}^{*}_{n^2}$. By setting $\mu = (L(g^{\lambda} \bmod n^2))^{-1} \bmod n$ and $L(x) = (x - 1)/n$, the public key (n, g) and the private key (λ, μ) are obtained.
- Encryption: Let m be a message to be encrypted, where 0 ≤ m < n. With a randomly selected r, where 0 < r < n, the ciphertext is calculated by $c = E(m) = g^{m} \cdot r^{n} \bmod n^2$.
- Decryption: Let c be the ciphertext to decrypt, where $c \in \mathbb{Z}^{*}_{n^2}$. The plaintext message is obtained by $m = D(c) = L(c^{\lambda} \bmod n^2) \cdot \mu \bmod n$.
Because the scheme is additively homomorphic, it satisfies the identity $D(E(m_1, r_1) \cdot E(m_2, r_2) \bmod n^2) = (m_1 + m_2) \bmod n$ (homomorphic addition of plaintexts) and the identity $D(E(m_1, r_1)^{k} \bmod n^2) = k m_1 \bmod n$ (homomorphic multiplication of a plaintext by a constant k).
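The key generation, encryption and decryption steps above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions: the primes 1009 and 1013 are far too small for real use, g = n + 1 is one common valid choice of g rather than something the text prescribes, and the names `keygen`, `encrypt` and `decrypt` are hypothetical.

```python
import math
import random


def keygen(p, q):
    """Derive a Paillier key pair from two primes (toy sizes only)."""
    n = p * q
    assert math.gcd(n, (p - 1) * (q - 1)) == 1
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    g = n + 1                                  # assumed choice of g
    l_value = (pow(g, lam, n * n) - 1) // n    # L(g^lambda mod n^2)
    mu = pow(l_value, -1, n)                   # modular inverse (Python 3.8+)
    return (n, g), (lam, mu)


def encrypt(pk, m):
    """c = g^m * r^n mod n^2 with a fresh random r coprime to n."""
    n, g = pk
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return pow(g, m, n * n) * pow(r, n, n * n) % (n * n)


def decrypt(pk, sk, c):
    """m = L(c^lambda mod n^2) * mu mod n."""
    n, _ = pk
    lam, mu = sk
    return (pow(c, lam, n * n) - 1) // n * mu % n


pk, sk = keygen(1009, 1013)
n2 = pk[0] ** 2
c1, c2 = encrypt(pk, 20), encrypt(pk, 22)
sum_plain = decrypt(pk, sk, c1 * c2 % n2)        # homomorphic addition
scaled_plain = decrypt(pk, sk, pow(c1, 3, n2))   # multiplication by constant 3
```

Multiplying two ciphertexts adds the plaintexts, and raising a ciphertext to a constant multiplies its plaintext by that constant, which are the two identities the rest of the scheme relies on.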
Secure multiplication.
Secure multiplication (SM) [21] supports multiplication in ED. Suppose Alice has two encrypted values Epk(x) and Epk(y), and Bob has the private key sk corresponding to the public key pk. The goal of SM is to compute Epk(x · y) without leaking x and y to Alice. The SM protocol is described as follows:
- Alice holds the ciphertexts Epk(x) and Epk(y), generates two random numbers $r_x, r_y \in \mathbb{Z}_N$, and then calculates $x_1 = E_{pk}(x) \cdot E_{pk}(r_x)$ and $y_1 = E_{pk}(y) \cdot E_{pk}(r_y)$. She sends x1 and y1 to Bob.
- After receiving x1 and y1, Bob decrypts them with the private key sk to get $H_x = D_{sk}(x_1)$ and $H_y = D_{sk}(y_1)$, and then computes $H_1 = H_x \cdot H_y \bmod N$. Finally, Bob encrypts H1 with pk as $H = E_{pk}(H_1)$ and sends H to Alice.
- Alice first computes $s_1 = E_{pk}(x)^{N - r_y}$, $s_2 = E_{pk}(y)^{N - r_x}$, and $s_3 = E_{pk}(r_x \cdot r_y)^{N-1}$, then multiplies them as $E_{pk}(x \cdot y) = H \cdot s_1 \cdot s_2 \cdot s_3$.
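The three SM steps can be simulated in a single Python process to see why the masks cancel. This is a sketch, not a networked implementation: both roles run in one function, the toy Paillier helpers use tiny primes and g = n + 1 as assumptions, and the name `secure_multiply` is hypothetical.

```python
import math
import random


def keygen(p, q):
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)
    mu = pow((pow(n + 1, lam, n * n) - 1) // n, -1, n)
    return (n, n + 1), (lam, mu)


def encrypt(pk, m):
    n, g = pk
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return pow(g, m, n * n) * pow(r, n, n * n) % (n * n)


def decrypt(pk, sk, c):
    n, _ = pk
    lam, mu = sk
    return (pow(c, lam, n * n) - 1) // n * mu % n


def secure_multiply(pk, sk, cx, cy):
    """Return E(x*y) from E(x) and E(y); sk is used only in Bob's step."""
    n, _ = pk
    n2 = n * n
    # Alice: blind both ciphertexts with random masks rx, ry.
    rx, ry = random.randrange(n), random.randrange(n)
    x1 = cx * encrypt(pk, rx) % n2
    y1 = cy * encrypt(pk, ry) % n2
    # Bob: decrypt the blinded values, multiply them, re-encrypt.
    hx, hy = decrypt(pk, sk, x1), decrypt(pk, sk, y1)
    big_h = encrypt(pk, hx * hy % n)
    # Alice: remove x*ry, y*rx and rx*ry homomorphically.
    s1 = pow(cx, n - ry, n2)
    s2 = pow(cy, n - rx, n2)
    s3 = pow(encrypt(pk, rx * ry % n), n - 1, n2)
    return big_h * s1 * s2 * s3 % n2


pk, sk = keygen(1009, 1013)
product = decrypt(pk, sk, secure_multiply(pk, sk, encrypt(pk, 12), encrypt(pk, 34)))
```

Bob only ever sees x + rx and y + ry, while Alice only handles ciphertexts, which is the property the protocol is built around.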
The proposed PPCD model
Model overview and requirements
Model overview.
To employ SLP for model training and disease prediction while protecting privacy, the proposed PPCD model contains four parties, which are listed in Table 2: the diagnosed patients (DP), the undiagnosed patient (UP), the cloud server (CS) and the hospital. They collaboratively conduct SLP model training and disease prediction. The CS trains a disease prediction model based on the DP’s disease data. To check a disease, UP submits his symptom data to the hospital, which predicts the corresponding disease based on the trained model. Fig 1 depicts the detailed procedure.
Privacy requirements.
In PPCD, DPs are trustworthy and provide correct medical data to the cloud server. Meanwhile, CS and UP are honest-but-curious [22]. CS strictly follows the privacy-preserving SLP learning protocol of the system, but it tries to learn DP’s sensitive medical data and UP’s medical information whenever it can. UP is interested in the trained disease model. The hospital is honest. At the same time, an outside adversary is curious about all data transferred in the system and may eavesdrop on them. Privacy preservation is therefore critical for successfully diagnosing the patient’s disease, and the security requirements of PPCD are listed as follows.
- UP’s Privacy: In disease diagnosis, the sensitive symptom data of UP should not be leaked to untrusted parties during transmission. Furthermore, the diagnosis result is confidential to the patient and must not be exposed to any other entity. In other words, UP’s privacy should be preserved.
- DP’s Privacy: Generally, DP holds historical medical information, e.g., the diagnosed disease and the confirmed symptom data. This information is highly sensitive and must not be obtained by unauthorized entities. Otherwise, DP would be unwilling to provide the historical disease data for model training due to privacy concerns.
- Hospital’s Privacy: In PPCD, the hospital trains the disease model using the historical medical data with the help of the Cloud. As an asset of the hospital, the disease model must not be leaked to UP or other parties during disease diagnosis.
Design goal.
Based on the above scenarios and security requirements, the system should realize model training and disease diagnosis in a privacy-preserving and efficient way. The particular goals are as follows.
- Privacy-preserving requirements: the success of clinical decision support hinges upon information security and privacy preservation. If the privacy requirements of the model are not considered, the patient’s sensitive data and the disease model will be exposed to unauthorized parties. Historical patients will then be unwilling to share their medical data with PPCD, the accuracy of the trained model cannot be ensured, and the diagnosis service will suffer. Therefore, the system should preserve the privacy of both historical patients and undiagnosed patients.
- Confidentiality and accuracy of the disease model should be achieved: the disease model is a valuable asset of the hospital, which may be reluctant to reveal any information about it. At the same time, it is crucial that applying privacy preservation does not compromise the accuracy of the prediction model.
The Proposed PPCD Model
Privacy-preserving training.
This section shows how to construct PPCD, train the disease model and predict disease based on the model in a privacy-preserving way.
- (1) System setting
Key generation: The hospital runs the Paillier key generation algorithm to generate keys for both UP and the hospital. Given the security parameter k, the hospital randomly chooses two large prime numbers p and q that satisfy |q| = |p| = k, and generates the public key (n, g) and the corresponding private key (λ, μ), where n = pq and λ = lcm(p − 1, q − 1).
Data encryption: Raw medical data are encrypted and submitted to the Cloud for storage and model training. The Cloud stores the disease patterns, each of which represents a disease sample <xi, Oi>, where xi is an n-dimensional vector whose elements each represent a confirmed symptom, and Oi ∈ {−1, 1} is the associated desired output, where 1 represents suffering from the disease and −1 represents not suffering from it. We suppose the medical data have been preprocessed so that their format is suitable for PPCD. In the system, the disease output is stored on the cloud server in plaintext because leaking the disease output does not damage patients’ privacy. The encrypted medical data of patients are stored in the Cloud as shown in Table 3.
Meanwhile, the disease prediction model is sensitive data that should be encrypted. At the beginning of model training, the hospital generates a random weight w = (w1, w2, ⋯, wn), encrypts it, and then sends the ciphertext of the weight to the cloud server.
- (2) Lightweight secure multiplication protocol
SM can be used to calculate the inner product of two encrypted vectors. Given $Cx_i = (Cx_{i1}, \ldots, Cx_{in})$ and $Cw = (Cw_1, \ldots, Cw_n)$, $E\big(\sum_{j=1}^{n} x_{ij} w_j\big)$ is calculated by running SM n times. To compute the inner product of two encrypted vectors efficiently, we propose, based on SM, a lightweight secure multiplication (LSM) protocol that obtains the inner product over ciphertexts in a single run. Considering two parties C1 and C2, LSM is detailed in Algorithm 1.
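Algorithm 1: Lightweight secure multiplication (LSM)
Require: C1 has $Cx_i = (Cx_{i1}, \ldots, Cx_{in})$ and $Cw = (Cw_1, \ldots, Cw_n)$; C2 has sk
Step1: C1:
- (1) Chooses 2n random numbers $r_{x_{ij}}, r_{w_j} \in \mathbb{Z}_N$
- (2) $Cr_{x_{ij}} \leftarrow E(r_{x_{ij}})$
- (3) $Cr_{w_j} \leftarrow E(r_{w_j})$
For each $Cx_{ij}$ and $Cw_j$
- (4) $X_{ij} = Cx_{ij} \cdot Cr_{x_{ij}}$
- (5) $W_j = Cw_j \cdot Cr_{w_j}$; sends $X_{ij}$, $W_j$ to C2
Step2: C2
- (1) Receives $X_{ij}$, $W_j$ from C1
- (2) $h_{x_{ij}} = D_{sk}(X_{ij})$
- (3) $h_{w_j} = D_{sk}(W_j)$
- (4) $h = \sum_{j=1}^{n} h_{x_{ij}} \cdot h_{w_j} \bmod N$
- (5) $H = E_{pk}(h)$; sends H to C1
Step3: C1
- (1) Receives H
- (2) $s_1 = \prod_{j=1}^{n} Cx_{ij}^{\,N - r_{w_j}}$
- (3) $s_2 = \prod_{j=1}^{n} Cw_j^{\,N - r_{x_{ij}}}$
- (4) $s_3 = E\big(\sum_{j=1}^{n} r_{x_{ij}} \cdot r_{w_j}\big)^{N-1}$
- (5) $E\big(\sum_{j=1}^{n} x_{ij} w_j\big) = H \cdot s_1 \cdot s_2 \cdot s_3$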
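One way to read Algorithm 1 is as the SM protocol with the blinded products summed by C2 before re-encryption, so that only one ciphertext travels back to C1. The Python sketch below follows that reading. The unblinding terms in Step 3 are an assumption derived by analogy with SM, the toy Paillier helpers use tiny primes and g = n + 1, and `lsm_inner_product` is a hypothetical name.

```python
import math
import random


def keygen(p, q):
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)
    mu = pow((pow(n + 1, lam, n * n) - 1) // n, -1, n)
    return (n, n + 1), (lam, mu)


def encrypt(pk, m):
    n, g = pk
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return pow(g, m, n * n) * pow(r, n, n * n) % (n * n)


def decrypt(pk, sk, c):
    n, _ = pk
    lam, mu = sk
    return (pow(c, lam, n * n) - 1) // n * mu % n


def lsm_inner_product(pk, sk, cxs, cws):
    """Return E(sum_j x_j * w_j) from two ciphertext vectors in one round."""
    n, _ = pk
    n2 = n * n
    # Step 1 (C1): blind every ciphertext with a fresh random mask.
    rxs = [random.randrange(n) for _ in cxs]
    rws = [random.randrange(n) for _ in cws]
    blinded_x = [cx * encrypt(pk, rx) % n2 for cx, rx in zip(cxs, rxs)]
    blinded_w = [cw * encrypt(pk, rw) % n2 for cw, rw in zip(cws, rws)]
    # Step 2 (C2): decrypt the blinded values, multiply, sum, re-encrypt.
    h = sum(decrypt(pk, sk, bx) * decrypt(pk, sk, bw)
            for bx, bw in zip(blinded_x, blinded_w)) % n
    result = encrypt(pk, h)
    # Step 3 (C1): strip the mask terms homomorphically.
    for cx, cw, rx, rw in zip(cxs, cws, rxs, rws):
        result = result * pow(cx, n - rw, n2) % n2   # removes x_j * rw_j
        result = result * pow(cw, n - rx, n2) % n2   # removes w_j * rx_j
    mask_sum = sum(rx * rw for rx, rw in zip(rxs, rws)) % n
    return result * pow(encrypt(pk, mask_sum), n - 1, n2) % n2


pk, sk = keygen(1009, 1013)
cxs = [encrypt(pk, v) for v in (1, 2, 3)]
cws = [encrypt(pk, v) for v in (4, 5, 6)]
inner = decrypt(pk, sk, lsm_inner_product(pk, sk, cxs, cws))
```

Compared with running SM once per coordinate, C2 returns a single ciphertext instead of n, which is where the saving in communication comes from in this reading.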
- (3) Model training
In the system setting phase, DP encrypts its medical information <xi, Oi> and outsources <Cxi, Oi> to the Cloud. The Cloud collects the medical data <Cxi, Oi, Ik>, where Ik identifies the k-th disease. To train the prediction model wk of the k-th disease, the Cloud selects the disease samples labeled with Ik.
Privacy-preserving disease model training is described by Algorithm 2.
Algorithm 2: Privacy-Preserving Model Training Based on SLP
1: Input: n input samples <Cxi, Oi, Ik>, 1 ≤ k ≤ m, iterationmax, learning rate η, sign function sign(·)
2: Output: prediction model wk, 1 ≤ k ≤ m
3: DP: for 1 ≤ k ≤ m do
4: for 1 ≤ i ≤ n do
5: DP encrypts symptom data as <Cxi, Oi, Ik> and submits it to the cloud
6: endfor
7: endfor
8: for 1 ≤ k ≤ m do
9: Hospital: chooses the initial weight wk randomly.
10: for iteration = 1, 2, …, iterationmax
11: for 1 ≤ i ≤ n do
12: Hospital: encrypts wk as Cw and uploads it to the cloud
13: Cloud: chooses a medical sample <Cxi, Oi> and executes LSM to get $R = E\big(\sum_{j} x_{ij} w_j\big)$
14: and sends R to the hospital
15: Hospital: decrypts R, calculates the sign function Si = sign(DEC(R)) and sends Si to the cloud.
16: Cloud: If Si ≠ Oi and Oi = 1, exp = η
17: If Si ≠ Oi and Oi = −1, exp = n − η
18: for j = 1, …, d
19: $u_j = Cx_{ij}^{\,exp}$
20: Cwj = Cwj ⋅ uj
21: endfor
22: endfor
23: endfor
24: return wk, 1 ≤ k ≤ m
Lines 3–7: DP encrypts the symptom data and submits <Cxi, Oi, Ik> to the cloud.
Lines 8–12: The hospital randomly generates the weight wk, in which not all elements are equal to 0, encrypts it with its own public key pk, and then sends the weight ciphertexts Cwj (1 ≤ j ≤ n) to the Cloud.
Lines 13–14: The Cloud chooses a disease sample {Cxi, Ik} and 2n random numbers rxij, rwj ∈ ZN, then executes LSM to compute $R = E\big(\sum_{j} x_{ij} w_j\big)$, where the cloud server acts as C1 and the hospital acts as C2. Lastly, it sends R to the hospital.
Line 15: After receiving R, the hospital decrypts R with the private key sk, executes the sign(·) function as S = sign(D(R)), and then sends S to the cloud.
Lines 16–20: The Cloud compares S with Oi. If S ≠ Oi and Oi = 1, let exp = η; if S ≠ Oi and Oi = −1, let exp = n − η. Next, the Cloud computes $u_j = Cx_{ij}^{\,exp}$ and then updates Cwj as Cwj = Cwj · uj.
Line 24: If all disease samples are matched or the training count exceeds the convergence criterion, the hospital terminates the training and <wk, Ik> is taken as the prediction model for Dk; otherwise, the procedure returns to and repeats lines 13–14.
After getting the k-th disease model, the Cloud selects the samples of the next disease and repeats lines 8–24. After all medical samples are trained, the hospital can get the prediction models wk (1 ≤ k ≤ m) for all diseases.
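The ciphertext weight update of lines 16–20 can be illustrated with a short Python sketch. It assumes an integer learning rate (η = 1 here, since exponents in Paillier must be integers), represents a negative plaintext v as n − |v|, and uses toy Paillier helpers with tiny primes and g = n + 1. The names `update_weights` and `to_signed` are hypothetical.

```python
import math
import random


def keygen(p, q):
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)
    mu = pow((pow(n + 1, lam, n * n) - 1) // n, -1, n)
    return (n, n + 1), (lam, mu)


def encrypt(pk, m):
    n, g = pk
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return pow(g, m, n * n) * pow(r, n, n * n) % (n * n)


def decrypt(pk, sk, c):
    n, _ = pk
    lam, mu = sk
    return (pow(c, lam, n * n) - 1) // n * mu % n


def update_weights(pk, cws, cxs, label, eta=1):
    """Cloud-side update after a mismatch: Cw_j <- Cw_j * Cx_j^exp mod n^2."""
    n, _ = pk
    n2 = n * n
    exp = eta if label == 1 else n - eta   # n - eta encodes -eta modulo n
    return [cw * pow(cx, exp, n2) % n2 for cw, cx in zip(cws, cxs)]


def to_signed(m, n):
    """Map a residue back to a signed integer (upper half means negative)."""
    return m if m <= n // 2 else m - n


pk, sk = keygen(1009, 1013)
n = pk[0]
cws = [encrypt(pk, w) for w in (3, 5)]
cxs = [encrypt(pk, x) for x in (2, 7)]
# Misclassified sample with O_i = -1: expected plaintext w - eta * x.
down = [to_signed(decrypt(pk, sk, c), n) for c in update_weights(pk, cws, cxs, -1)]
# Misclassified sample with O_i = 1: expected plaintext w + eta * x.
up = [to_signed(decrypt(pk, sk, c), n) for c in update_weights(pk, cws, cxs, 1)]
```

The Cloud never decrypts anything here; only the hospital, which holds the private key, could recover the updated weights.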
Disease prediction.
In this phase, we assume that the prediction models have been trained and stored in the hospital. The hospital can predict whether a patient suffers from the k-th disease using the k-th disease model. When an undiagnosed patient submits his encrypted symptom information to the hospital, the prediction is executed as follows.
- Step 1: When the ciphertext of the symptom information arrives, the hospital decrypts the ciphertext and gets the plaintext symptom data x = (x1, x2, …, xn).
- Step 2: Let s = 0. For each xj and wj, the hospital calculates sj = xj · wj, then gets $s = \sum_{j=1}^{n} s_j$.
- Step 3: Compute S = sign(s). If S ≥ 0 (that is, S = 1), the patient suffers from the disease; otherwise, the patient does not.
- Step 4: The hospital encrypts the prediction result with UP’s public key and returns it to the patient.
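The four prediction steps can be sketched as follows. The sketch assumes two separate toy Paillier key pairs (one for the hospital, one for UP), encodes the result −1 as n − 1 under UP’s modulus, and uses tiny primes and g = n + 1. The names `predict_disease` and `to_signed` are hypothetical.

```python
import math
import random


def keygen(p, q):
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)
    mu = pow((pow(n + 1, lam, n * n) - 1) // n, -1, n)
    return (n, n + 1), (lam, mu)


def encrypt(pk, m):
    n, g = pk
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return pow(g, m, n * n) * pow(r, n, n * n) % (n * n)


def decrypt(pk, sk, c):
    n, _ = pk
    lam, mu = sk
    return (pow(c, lam, n * n) - 1) // n * mu % n


def to_signed(m, n):
    return m if m <= n // 2 else m - n


def predict_disease(hosp_pk, hosp_sk, up_pk, c_symptoms, weights):
    """Hospital side: decrypt symptoms, evaluate the model, encrypt the result."""
    n_h = hosp_pk[0]
    # Step 1: recover the plaintext symptom vector.
    xs = [to_signed(decrypt(hosp_pk, hosp_sk, c), n_h) for c in c_symptoms]
    # Step 2: inner product with the plaintext model held by the hospital.
    s = sum(x * w for x, w in zip(xs, weights))
    # Step 3: sign function, 1 means the patient suffers from the disease.
    result = 1 if s >= 0 else -1
    # Step 4: encrypt under UP's public key (-1 is stored as n_up - 1).
    return encrypt(up_pk, result % up_pk[0])


hosp_pk, hosp_sk = keygen(1009, 1013)
up_pk, up_sk = keygen(1019, 1021)
c_symptoms = [encrypt(hosp_pk, v) for v in (3, 1, 4)]
positive = to_signed(decrypt(up_pk, up_sk, predict_disease(
    hosp_pk, hosp_sk, up_pk, c_symptoms, [2, -5, 1])), up_pk[0])
negative = to_signed(decrypt(up_pk, up_sk, predict_disease(
    hosp_pk, hosp_sk, up_pk, c_symptoms, [-2, -5, 1])), up_pk[0])
```

Only UP can read the returned diagnosis, because it is encrypted under UP’s public key rather than the hospital’s.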
Correctness & security analysis
In this section, we analyze the correctness and security of the proposed PPCD scheme. Notably, we focus on how PPCD preserves the privacy of the patient’s medical information and of the disease model.
(1) Correctness analysis of LSM
The correctness of LSM can be illustrated as follows:
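In Step 3, with $H = E\big(\sum_{j}(x_{ij} + r_{x_{ij}})(w_j + r_{w_j})\big)$ received from C2:
$$s_1 = \prod_{j} Cx_{ij}^{\,N - r_{w_j}} = E\Big(-\sum_{j} x_{ij}\, r_{w_j}\Big) \tag{8}$$
$$s_2 = \prod_{j} Cw_j^{\,N - r_{x_{ij}}} = E\Big(-\sum_{j} w_j\, r_{x_{ij}}\Big) \tag{9}$$
$$s_3 = E\Big(\sum_{j} r_{x_{ij}}\, r_{w_j}\Big)^{N-1} = E\Big(-\sum_{j} r_{x_{ij}}\, r_{w_j}\Big) \tag{10}$$
$$H \cdot s_1 \cdot s_2 \cdot s_3 = E\Big(\sum_{j}(x_{ij} + r_{x_{ij}})(w_j + r_{w_j}) - \sum_{j} x_{ij} r_{w_j} - \sum_{j} w_j r_{x_{ij}} - \sum_{j} r_{x_{ij}} r_{w_j}\Big) = E\Big(\sum_{j} x_{ij} w_j\Big) \tag{11}$$
From the above derivation, LSM can calculate the encrypted inner product $E\big(\sum_{j} x_{ij} w_j\big)$ in one round.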
(2) Correctness analysis of training model
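The correctness of PPCD can be illustrated as follows. In Step 3, the hospital decrypts R with the private key sk and computes
$$s_i = \operatorname{sign}(D(R)) = \operatorname{sign}\Big(\sum_{j} x_{ij} w_j\Big) \tag{12}$$
So si is consistent with that in Eq (1).
In Step 4, the Cloud updates Cwj as Cwj = Cwj · uj,
where $u_j = Cx_{ij}^{\,exp}$.
If S ≠ Oi and Oi = 1, exp = η, and
$$Cw_j \cdot u_j = E(w_j) \cdot E(x_{ij})^{\eta} = E(w_j + \eta\, x_{ij}) = E(w_j + \eta\, O_i\, x_{ij}) \tag{13}$$
If S ≠ Oi and Oi = −1, exp = n − η, and
$$Cw_j \cdot u_j = E(w_j) \cdot E(x_{ij})^{n - \eta} = E(w_j - \eta\, x_{ij}) = E(w_j + \eta\, O_i\, x_{ij}) \tag{15}$$
Thus Cwj is also consistent with that in Eq (2).
From the above calculation, PPCD trains a correct disease model in the cloud; namely, the accuracy of the prediction model is preserved.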
(3) Security of patient’s medical data
To predict disease for patients, DP and UP encrypt the medical information xi = {xi1, xi2,…,xij} with the hospital’s public key PKh and upload the ciphertext Cxi = {Cxi1, Cxi2,…,Cxij} to the Cloud. During transmission, all the medical information is encrypted to prevent outside attackers from eavesdropping. An adversary cannot decrypt the ciphertext without the hospital’s private key SKh. The symptom data are encrypted with Paillier, which is semantically secure against chosen-plaintext attacks. So the medical information stored in the Cloud is secure, since the Cloud can neither identify the corresponding contents nor obtain the plaintext of the symptom data.
(4) Security of training disease model
During the training of the prediction model, all computations are done over ciphertexts. R is calculated by using LSM, in which each party learns nothing from the protocol. The initial model is generated randomly by the hospital and updated over ciphertexts during training, and the hospital’s SKh is well protected.
$u_j = Cx_{ij}^{\,exp}$ and Cwj = Cwj · uj = E(wj + ηOixi,j) can be computed easily over ciphertexts because of the additive homomorphism of Paillier. Even if the encrypted disease model is leaked to UP or the Cloud, they are not able to recover wk without the private key SKh.
(5) Security of predicting result
When a patient wants to identify his disease, he submits the ciphertext of his symptom data to the hospital. After the disease prediction is finished, the diagnosis result is encrypted with UP’s public key PKup and returned to UP. If an attacker captures the prediction result, he cannot recover the corresponding contents without UP’s private key SKup.
Performance evaluation
Complexity analysis
Computational complexity.
To analyze the complexity of the proposed PPCD, Table 4 lists the computational cost of each step. For simplicity, we use EXP to denote the time complexity of one exponentiation on a ciphertext in the Paillier cryptosystem. Similarly, the time complexities of one multiplication on ciphertexts and one modular inverse in the decryption algorithm are denoted by MUL and DIV, respectively. In Step 1 of the disease learning phase, n exponentiations and multiplications are required by the hospital to encrypt the initial weight. In Step 2, the Cloud uses (2n+3) exponentiations and (4n+7) multiplications, and the hospital executes 2n exponentiations and 4n multiplications to obtain R. In Step 3, one exponentiation and one modular inverse are consumed before getting S. In Step 4, to update the weight, the Cloud performs n exponentiations and n multiplications. Finally, (n−1) multiplications, one exponentiation and one modular inverse are executed to predict the disease risk, and the encrypted diagnosis result is then sent to UP.
Communication complexity.
Assume there are N samples with n dimensions, and the length of a ciphertext is p. In the proposed PPCD system, the encrypted symptom data are outsourced to the Cloud to train the classifier, which costs O(N(np+L)). In model training, the hospital transmits the encrypted initial weight, which requires O(np+LIK). To compute R, the cost of transferring data is O(3np+2p+LIK). In disease prediction, the hospital sends the encrypted prediction result to UP, which costs O(np+LIK). The communication complexities of the proposed PPCD are detailed in Table 5.
Experimental results
To fairly evaluate the performance, the proposed PPCD is implemented by Java on Windows 7-X64. The Cloud is a computer with Intel Quad core 3.4GHz and 16GB available RAM, the hospital runs a machine with Intel Quad core 3.4GHz and 8GB available RAM, and the patient uses a laptop with Intel Dual core 2.0GHz and 8GB available RAM.
Data sets.
In the experiment, we use the Wisconsin breast cancer dataset (WBCD), the heart disease dataset (HDD) and the acute inflammations dataset (AID) from the UCI machine learning repository [23] to test the performance of SLP based on our PPCD scheme. Table 6 shows the statistical information of the employed three datasets.
WBCD contains 683 instances, and each instance includes 9 attributes ranging from 1 to 10. In WBCD, each instance can be grouped into one of two possible classes: benign or malignant. HDD has 297 instances, and each instance consists of 13 attributes with two classes. Except for sex, trestbps, chol and thalach, the other 9 attributes range from 1 to 10. AID contains 120 instances, and each instance includes 6 attributes with two decisions, i.e., inflammation of urinary bladder (IUB) and nephritis of renal pelvis origin (NRPO). Except for the temperature, each of the other attributes is either 1 (Yes) or 0 (No).
In reality, the raw medical data may be decimal. However, Paillier can only encrypt integers. To resolve this problem, the approximation and expansion (A&E) method is adopted. Following the suggestion of [12], we expand each piece of medical data by multiplying it by 10^4 and rounding off all the values after the decimal point. For instance, xi,j is an integer lying in (Zn ∼ −Zn), each item of the weight w = (w1, w2, …, wn) is in (Zn ∼ −Zn), and xi,j and wj are then encrypted using Paillier as follows.
$$Cx_{i,j} = E(x_{i,j}) \tag{17}$$
$$Cw_j = E(w_j) \tag{18}$$
where Cxi,j and Cwj are the ciphertexts of xi,j and wj, respectively.
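The A&E step can be illustrated with a small fixed-point encoder. This sketch makes several assumptions that the text does not fix: values are rounded to the nearest integer after scaling by 10^4, a negative value v is represented as n − |v| so that it fits the Paillier plaintext space, and the modulus below is only a stand-in for a real Paillier n. The names `encode` and `decode` are hypothetical.

```python
SCALE = 10 ** 4          # expansion factor used in the text
N_STAND_IN = 2 ** 61 - 1  # stand-in modulus; a real Paillier n is much larger


def encode(value, n, scale=SCALE):
    """Scale a decimal to an integer and map it into [0, n)."""
    return round(value * scale) % n


def decode(m, n, scale=SCALE):
    """Interpret residues above n/2 as negative and undo the scaling."""
    signed = m if m <= n // 2 else m - n
    return signed / scale


temperature = encode(36.6, N_STAND_IN)   # scaled symptom value
weight = encode(-0.25, N_STAND_IN)       # negative weight wraps to n - 2500

# A product of two encoded values carries the scale twice (10^8), so any
# comparison made after an inner product must account for the squared scale.
product = encode(1.5, N_STAND_IN) * encode(-2.0, N_STAND_IN) % N_STAND_IN
product_value = decode(product, N_STAND_IN, SCALE ** 2)
```

Because the sign function only depends on whether the inner product is negative, the extra scale factor does not change the prediction, as long as the scaled values stay well below n/2.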
Results and analysis.
We run PPCD with a predefined iteration threshold of 100, and then use the classifier and three real data sets to evaluate the classifier’s performance in terms of accuracy. For each data set, the ratio of training samples to testing samples is 7:3. Experimental results are detailed in Tables 7–10. For breast cancer, the overall accuracy achieved by SLP is 96.2% while PPCD reaches 95.6%. For heart disease, SLP obtains an overall accuracy of 94.6%, and PPCD obtains 93.9%. On AID, SLP gets an accuracy of 93.3% for IUB while PPCD achieves a comparable 92.5%. For NRPO in AID, the accuracy of SLP is 93.3% while PPCD gets 91.7%. Overall, PPCD reaches disease analysis results comparable to those of SLP.
In terms of efficiency, Table 11 gives the runtime comparisons of PPCD on the three data sets. For breast cancer, it takes 6.125s for historical patients to encrypt all the symptoms. In the training phase, it takes 2993.1s for the Cloud to train the classifier. In the prediction phase, it takes 0.098s for the hospital to compute the undiagnosed patient’s disease risk (including 0.013s for UP to encrypt all the symptoms). For heart disease and AID, the time costs of data encryption, model training and disease prediction decrease as the number of sample cases decreases. For the sake of simplicity, multicore programming has not been adopted in the evaluation.
Related work
Without sufficient storage, computation or knowledge of clinical decision making, clients frequently prefer outsourcing their data to the Cloud for model training and disease prediction. Ledley and Lusted [24] first proposed a clinical decision support system that can help physicians solve diagnostic problems. Later, a large number of disease prediction systems based on various data mining techniques were presented. For example, a fast disease prediction system based on SVM was proposed in [25] to predict the risk of progression of adolescent idiopathic scoliosis. Wang et al. [26] gave a risk assessment for individuals with a family history of pancreatic cancer using Bayesian classification. By introducing SVM, Huang et al. [27] designed a prediction model for breast cancer diagnosis, while Barakat et al. [28] focused on the diagnosis of diabetes mellitus. For heart disease analysis, Anooj et al. [29] used specific fuzzy rules. Though various prediction models have been developed, they fail to take the privacy protection of patients’ medical information into account, which will impede the further progress of CDSS.
To address this challenge, secure disease prediction schemes [1], [7], [8], [9], [11], [12], [14], which diagnose patients’ diseases without leaking the medical data or the prediction model, have been widely studied. Wang et al. [14] proposed the HEALER framework based on somewhat homomorphic encryption. It uses a small sample size to facilitate secure rare variants analysis and obtains the final results by decrypting ciphertexts at the trusted party. A privacy-preserving CDSS based on naïve Bayesian classification was proposed by Liu et al. [15], which can help a clinician to diagnose the risk of patients’ disease in a privacy-preserving way. Wang et al. [9] proposed a secure SLP learning model for e-Healthcare, but it only protects the privacy of patients’ medical information, and the disease model is not protected. In [11], Zhu et al. proposed an efficient and privacy-preserving medical pre-diagnosis framework using SVM, which protects sensitive personal health information from disclosure by means of lightweight multi-party random masking and polynomial aggregation techniques.
Recently, Tsung et al. [30] proposed a decentralized privacy-preserving healthcare predictive modeling framework on private Blockchain networks. It integrates privacy-preserving online machine learning with a private Blockchain network, applies transaction metadata to disseminate partial models, and designs a new proof-of-information algorithm to determine the order of the online learning process. Each participating site contributes to model parameter estimation without revealing any patient health information. Zhang et al. [1] proposed a secure disease prediction scheme based on matrices and SLP, which builds on new medical data encryption, disease learning, and disease prediction algorithms that utilize random matrices. Liu et al. [7] proposed a hybrid privacy-preserving clinical decision support system in fog–cloud computing, in which a fog server uses SLP to securely monitor patients’ health condition in real time. The newly detected abnormal symptoms can be further sent to the cloud server for high-accuracy prediction in a privacy-preserving way. Compared with some sophisticated machine learning algorithms such as naïve Bayesian classification, SVM, and deep learning, SLP is efficient and straightforward.
Conclusions
In this paper, we proposed a privacy-preserving disease prediction system based on SLP, which can help physicians make a proper diagnosis of disease and provide health services for patients anytime and anywhere in a privacy-preserving way. In PPCD, DP’s historical medical data are used to train SLP in ED, and the hospital uses the trained model to predict diseases for a UP. To ease the privacy concerns of DP, we adopt an additively homomorphic encryption scheme, which also offers simplicity and generality. The unavoidable multiplications of SLP motivate us to introduce LSM into PPCD. As a result, the users’ medical information and the trained model remain secret from the cloud. The results reached by PPCD, which are comparable to those of SLP, suggest that sacrificing data precision to improve efficiency is feasible in practical use.
Although PPCD benefits privacy-preserving diagnosis, the balance between security and efficiency should be considered first. Therefore, optimizing the model training with mini-batches to improve efficiency and finding an effective way of introducing other advanced machine learning methods to build the privacy-preserving disease prediction system are worthy of investigation.
Acknowledgments
The authors would like to thank the Editor and the anonymous reviewers for their constructive comments that greatly improved the quality of this manuscript.
References
- 1. Zhang C, Zhu L, Xu C, and Lu R. PPDP: An efficient and privacy-preserving disease prediction scheme in cloud-based e-Healthcare system. Future Generation Computer Systems. 2018;79: 16–25.
- 2. Taigel F, Tueno AK, and Pibernik P. Privacy-preserving condition-based forecasting using machine learning. 2018. https://doi.org/10.1007/s11573-017-0889-x.
- 3. Phan N, Wang Y, Wu X, Dou D. Differential Privacy Preservation for Deep Auto-Encoders: An Application of Human Behavior Prediction. In: Proc. Thirtieth Int. Conf. Artificial Intelligence. 2016; pp. 1309–1316.
- 4. Liu J, Juuti M, Lu Y, Asokan N. Oblivious neural network predictions via MiniONN transformations. In: Proc. Twenty-fourth ACM Conf. Computer and Communications Security. 2017; pp. 619–631.
- 5. Li P, Li J, Huang Z, Li T, Gao CZ, Yiu SM, et al. Multi-key privacy-preserving deep learning in cloud computing. Future Generation Computer Systems.2017; 74:76–85.
- 6. Gao CZ, Cheng Q, He P, Susilo W, Li J. Privacy-preserving naïve bayes classifiers secure against the substitution-then-comparison attack. Information Sciences. 2018;444:72–88.
- 7. Liu XM, Deng RH, Yang Y, tran NH, and Zhong SP. Hybrid privacy-preserving clinical decision support system in fog–cloud computing. Future Generation Computer Systems. 2018;78(2): 825–837.
- 8.
Zhang X, Chen X, Wang J, Zhan Z, and Li J. Verifiable privacy-preserving single-layer perceptron training scheme in cloud computing. 2018. Soft Computing [online]. https://doi.org/10.1007/s00500-018-32-33-7.
- 9.
Wang GM, Lu RX, and Huang C. PSLP: privacy-preserving Single-Layer Perceptron Learning for e-Healthcare. Proc ICICS 10th Int. Conf. information, communication and Signal processing. 2015; pp. 1–5.
- 10. Schurink C, Lucas P, Hoepelman I, and Bonten M. computer-assisted decision support for the diagnosis and treatment of infectious diseases in intensive care units. The Lancet infectious diseases. 2005; 5(5):305–312. pmid:15854886
- 11. Zhu H, Liu X, Lu R, and Li H. Efficient and Privacy-Preserving Online Medical Pre-Diagnosis Framework Using Nonlinear SVM. IEEE Journal of Biomedical and Health Informatics. 2017;21(3): 838–850. pmid:28113828
- 12. Rahulamathavan Y, Veluru S, phan RC, Chambers JA, Rajarajan M. Privacy-Preserving Clinical Decision Support System Using Gaussian Kernel-Based Classification. IEEE Journal of Biomedical and Health Informatics. 2014;18(1): 56–66. pmid:24403404
- 13.
Musen MA, Shahar Y, Shortliffe EH, Clinical decision-support systems. Springer. Journal of Biomedical Informatics. pp. 698–736, 2014.
- 14. Wang S, zhang Y, Dai W, Lauter K, Kim M, Tang Y, et al. HEALER:Homomorphic computation of ExAct Logistic rEgRession for secure rare disease variants analysis in GWAS. Bilinformatics. 2016;32(2): 211–218.
- 15. Liu X, Lu R, Ma J, Chen L, and Qin B. Privacy-preserving Patient-Centric Clinical Decision Support System on Naïve Bayesian Classification. IEEE Journal of Biomedical and Health Informatics. 2016;20(2): 655–668. pmid:26960216
- 16. Jiang X, zhao Y, Wang X, Malin B, Wang S, Ohno-Machado L, et al. A community assessment of privacy preserving techniques for human genomes. BMC medical informatics and decision making. 2014;14(S1):S1.
- 17. Zhao Y, Wang X, Jiang X, Ohno-Machado L, and Tang H. Choosing blindly but wisely: differentially private solicitation of dna datasets for disease marker discovery. Journal of the American Medical Informatics Association. 2015;22(1):100–8. pmid:25352565
- 18. Wang S, Mohammed N, and Chen R. Differentially private genome data dissemination through top-down specialization. BMC medical informatics and decision making. 2014;14(S1):S2.
- 19. Freund Y, and Schapire RE. Large margin classification using the perceptron algorithm. Mach. Learn. 1999;37(3) 277–296.
- 20.
Paillier P. public-key cryptosystems based on composite degree residuosity classes. Proc advances in Cryptology–EUROCRYPT ‘99, Theory and Application of Cryptographic Techniques, Prague, Czech Republic, may 2–6, 1999; pp.223-238.
- 21.
Samanthula BK, Elmehdwi Y, and Jiang W. K-nearest neighbor classification over semantically secure encrypted relational data. arXiv preprint arXiv:1403.5001, 2014.
- 22.
Vimercati SDCdi, Foresti S, Jajodia S, Paraboschi S and Samarati P. Over-encryption: management of access control evolution on outsourced data. In Proc. 33th Int. Conf. Very Large Data Bases. VLDB endowment, 2007, pp. 123–134.
- 23.
Lichman M. UCI machine learning repository. [cited 2018 Dec 8]. http://archive.ics.uci.edu/ml.
- 24. Ledley RS and Lusted LB. Reasoning foundations of medical diagnosis. Science. 1959;130(3366): 9–21. pmid:13668531
- 25. Ajemba P, Ramirez L, Durdle N, Hill D, and Raso V. A support vectors classifier approach to predicting the risk of progression of adolescent idiopathic scoliosis. IEEE Trans. Inform. Technol. Biomed. 2005;9(2):276–282.
- 26. Wang W, Chen S, Brune KA, Hruban RH, Parmigiani G, and Klein AP. PancPRO: risk assessment for individuals with a family history of pancreatic cancer. J. Clin. Oncol. 2007;25(11):1417–1422. pmid:17416862
- 27. Huang CL, Chen HC, Chen MC. Prediction model building and feature selection with support vector machines in breast cancer diagnosis. Expert Syst. Appl. 2008;34(1): 578–587.
- 28. Barakat MNH, and Bradley AP. Intelligble support vector machine for diagnosis of diabetes mellitus. IEEE Trans. Inform, Technol. Biomed. 2010;14(4): 1114–1120.
- 29. Anooj PK. Clinical decision support system: Risk level prediction of heart disease using weighted fuzzy rules. J.King Saud Univ.–Comput. Inf.Sci. 2012;24(1): 27–40.
- 30.
Kuo TT, and Ohno-Machado L. ModelChain: Decentralized privacy-preserving healthcare predictive modeling framework on private blockchain networks. 2018. https://arxiv.org/abs/1802.01746.