A data-sharing scheme that supports multi-keyword search for electronic medical records

As cloud storage technology develops, the sharing of cloud-based electronic medical records (EMRs) has become a hot topic in academia and in the healthcare sector. To solve the problem of securely searching and sharing EMRs on cloud platforms, an EMR data-sharing scheme supporting multi-keyword search is proposed. The proposed scheme combines searchable encryption and proxy re-encryption to perform keyword search over, and achieve secure sharing of, encrypted EMRs. At the same time, the scheme uses a traceable pseudo identity to protect the patient's private information. Our scheme is proven secure under the modified Bilinear Diffie-Hellman assumption and the Quotient Decisional Bilinear Diffie-Hellman assumption in the random oracle model. The performance of our scheme is evaluated through theoretical analysis and numerical simulation.


Introduction
An electronic medical record (EMR) is a digital document that contains medical information about a patient and is stored, managed, transmitted, and reproduced with electronic devices (computers, health cards, and others) [1]. Compared with traditional paper medical records, EMRs offer large storage capacity, resource savings, convenient querying, and more efficient diagnosis and treatment. With the continuous development of cloud computing, EMRs have been rapidly developed, widely used, and gradually improved. A growing number of institutions and individuals use EMRs and upload them to the cloud for storage. Cloud-based systems have several advantages over traditional systems: users can store and maintain massive amounts of data quickly and enjoy high-quality storage services provided by cloud computing [2].
Because the cloud is a pervasive storage platform, cloud service providers are willing to deploy EMR storage and application services on cloud servers [3]. Since EMRs contain a large amount of the patient's private information, an important task is to prevent EMRs from being exposed to unauthorized users and cloud servers [4,5]. To ensure data security and user privacy, the data are usually stored as ciphertext on the cloud server, which raises the problem of how users can search over the ciphertext. Searchable encryption is a cryptographic primitive developed in recent years that allows users to perform keyword search on ciphertext. This type of encryption fully utilizes the abundant computing resources of the cloud. The main contributions of this paper are as follows:
1. We propose a framework for cloud-based EMR sharing with security and privacy preservation for improved diagnosis in e-Health systems. The doctor generates the EMR for the patient and encrypts it using the patient's public key. The cloud server is responsible for storing the patient's EMR ciphertext and performs the search operation on it.
2. Our scheme achieves conditional privacy preservation: each EMR encrypted for a patient is mapped to a distinct pseudo identity, while a legitimate hospital can retrieve the patient's real identity from any pseudo identity. When the true identity of the patient needs to be obtained, the user can send an identity-tracking request to the hospital. After verifying that the request is legitimate, the hospital returns the patient's true identity to the user.
3. We apply searchable encryption to implement secure search on the patient's EMRs. The keyword index is stored on the cloud server. When the patient or a data user needs to access the patient's EMR, the patient uses his/her private key and a set of query keywords to generate a trapdoor and uploads it to the cloud server, which then performs the search operation.
4. In this scheme, EMRs can be obtained not only by the patient but also by data users, such as medical institutions and insurance companies. We apply proxy re-encryption to ensure secure sharing of the patient's EMR. When the patient wants to access his/her EMR, the patient sends the trapdoor to the cloud server, and the cloud server returns the EMR ciphertext to the patient. When a data user wants to obtain the patient's EMR, he/she sends an authorization request to the patient. After the patient grants authorization, the cloud server generates a re-encryption key and uses it to re-encrypt the EMR ciphertext. After obtaining the re-encrypted ciphertext, the data user decrypts it with his/her private key to obtain the patient's EMR.

Paper organization
The rest of this paper is organized as follows. Section 2 presents preliminaries. Section 3 introduces the system architecture, the threat model and design goals, and the algorithm model of our scheme. Section 4 provides an overview of our scheme and describes it in detail. Section 5 provides the security analysis, covering how the design goals are achieved and the security proof of our scheme. Section 6 compares the proposed scheme with related schemes through theoretical analysis and numerical simulation. Finally, Section 7 concludes the paper.

Bilinear map
Let $G_1$ and $G_2$ be two cyclic groups of large prime order $q$, and let $g$ be a generator of $G_1$. A bilinear pairing is a map $e: G_1 \times G_1 \to G_2$ that satisfies the following properties:
1. Bilinearity: for any $a, b \in \mathbb{Z}_q^*$, $e(g^a, g^b) = e(g, g)^{ab}$.
2. Non-degeneracy: $e(g, g) \neq 1_{G_2}$.
3. Computability: $e(u, v)$ can be computed efficiently for all $u, v \in G_1$.
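In practice the pairing comes from a library such as PBC [25]. To check the algebra used later in the scheme, the sketch below adopts an insecure toy model of our own (not part of the scheme): every group element is stored by its discrete logarithm modulo a prime, so group multiplication becomes addition of exponents and the pairing becomes multiplication of exponents. The names `g1_exp`, `pair`, and `g2_exp` are hypothetical helpers; the snippet checks bilinearity and non-degeneracy on random exponents.

```python
import secrets

# Toy, INSECURE model of a symmetric pairing e: G1 x G1 -> G2 for checking algebra only.
# A G1 element g^x is stored as its exponent x mod Q; a G2 element e(g, g)^y as y mod Q.
# Discrete logarithms are visible, so nothing in this model is secret.
Q = 2**61 - 1  # a Mersenne prime standing in for the large prime order q


def g1_exp(x: int, k: int) -> int:
    """(g^x)^k = g^(x*k): exponentiation multiplies exponents."""
    return (x * k) % Q


def pair(u: int, v: int) -> int:
    """e(g^u, g^v) = e(g, g)^(u*v): the pairing multiplies exponents."""
    return (u * v) % Q


def g2_exp(y: int, k: int) -> int:
    """(e(g, g)^y)^k = e(g, g)^(y*k)."""
    return (y * k) % Q


G = 1            # the generator g, i.e. g^1
Z = pair(G, G)   # Z = e(g, g), stored as exponent 1
IDENTITY_G2 = 0  # e(g, g)^0 is the identity of G2

a = secrets.randbelow(Q - 1) + 1
b = secrets.randbelow(Q - 1) + 1
bilinear = pair(g1_exp(G, a), g1_exp(G, b)) == g2_exp(Z, a * b)
non_degenerate = Z != IDENTITY_G2
print(bilinear, non_degenerate)  # True True
```

Because discrete logarithms are free in this representation, the model is only useful for confirming that an equation balances, never for measuring security or performance.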

Hardness assumptions
Let $G_1$ be a cyclic group of large prime order $q$ with a generator $g$. The following assumptions hold in our scheme.
Definition 1. (Modified Bilinear Diffie-Hellman (mBDH) Problem) [20]. Given $(g, g^a, g^b, g^c) \in G_1^4$ for $a, b, c \in \mathbb{Z}_q^*$, the mBDH problem is to compute $e(g, g)^{ab/c}$.
mBDH assumption. We say the mBDH assumption holds if no probabilistic polynomial-time algorithm can solve the mBDH problem with a non-negligible advantage.
QDBDH assumption. We say the QDBDH assumption holds if no probabilistic polynomial-time algorithm can solve the QDBDH problem with a non-negligible advantage.

System model
In this section, we present an architecture for the EMR system. Moreover, we consider several threats and propose several design goals.

System architecture
As shown in Fig 1, five entities are involved in this system: patients, doctors, hospitals, the cloud server, and data users.
Patient. A patient is an entity who needs medical assistance. The patient first registers at the hospital to obtain his/her visiting token. When a patient visits a doctor for treatment, the doctor generates his/her health information. When the patient's EMR is needed, he/she can access it by sending a trapdoor to the cloud server. In addition, the patient computes a pseudo identity for himself/herself and sends it to the hospital.

PLOS ONE

Doctor. The doctor is an entity responsible for generating the EMR for the patient and uploading it to the hospital. The doctor is also responsible for encrypting the EMR with the patient's public key and sending the ciphertext to the cloud server. When a doctor wants to obtain the patient's historical EMRs, the doctor sends a request to the patient. After receiving the EMR ciphertext from the cloud server and decrypting it, the patient shows the EMR to the doctor.
Hospital. A hospital is an entity that is responsible for generating a visiting token τ for the patient and sending the token to the cloud server. The hospital is also responsible for recovering the true identity of a patient when a data user requests it.
Cloud server. The cloud server is an entity that is responsible for storing the patient's EMR ciphertext and providing an EMR search function. After receiving the trapdoor from the patient, the cloud server performs the search operation on the encrypted EMRs. The cloud server generates the re-encryption key by interacting with data users and patients. Then, the cloud server re-encrypts the EMR ciphertext using the re-encryption key and sends the re-encrypted ciphertext to the data user.
Data user. In our scheme, a data user is a user who wants to use the patient's EMR and has been authorized by the patient. For example, a patient with a complicated condition may need a consultation with multiple experts from different hospitals. After interacting with the patient and the cloud server, the data user receives the re-encrypted ciphertext from the cloud server and decrypts it using his/her private key.

Threat model and design goals
In this study, we consider a semi-trusted (honest-but-curious) server, a model widely adopted in existing work. Specifically, the server honestly performs searches on behalf of patients but is curious about the contents of the EMRs it stores. In addition, malicious outside attackers may intercept and analyze the information transferred over the public channel. Based on the preceding system architecture and threat model, the design goals of our scheme are as follows:
1. Data confidentiality and integrity. Whether the EMR is stored on the hospital server or transmitted through the public channel, no unauthorized entity can retrieve or modify the EMR data.
2. Access control. EMR data belong to the patients, who control access to them. In other words, only authorized users have the right to access the data. Moreover, data access should always be carried out with the participation and monitoring of patients and hospitals.
3. Secure search. When a doctor wants to access the patient's historical EMRs to improve diagnosis, the patient generates a trapdoor to search the EMRs. Only the patient can generate the trapdoor. Moreover, the patient's pseudo identity is used in the search process, so an eavesdropper cannot deduce the patient's real identity.
4. Privacy preservation. As the EMR data contains privacy-sensitive information of the patient, the patient's identity must be kept secret.

Algorithm description
The proposed scheme is composed of nine polynomial-time algorithms:
$\mathrm{Setup}(1^\lambda) \to PP$: The algorithm takes a security parameter $1^\lambda$ as input and outputs the public parameters PP.
$\mathrm{KeyGen}(PP) \to (pk, sk)$: Given the public parameters PP, the algorithm outputs a public/private key pair $(pk, sk)$.
$\mathrm{Enc}(pk_P, W, M) \to C$: The algorithm takes as input the public key $pk_P$ of user P, an electronic medical record M, and a keyword set $W = (w_1, \dots, w_n)$, and outputs an original ciphertext C.
$\mathrm{Trapdoor}(sk_P, Q) \to T_{w'}$: The algorithm takes the private key $sk_P$ of user P and a query keyword set $Q = (w'_1, \dots, w'_n)$ as input and outputs a keyword trapdoor set $T_{w'}$.
$\mathrm{Dec}_1(sk_P, C) \to M$: The algorithm takes the private key of user P and an original ciphertext C, and outputs a record M if each input parameter is correct.
$\mathrm{ReKeyGen}(sk_P, sk_R) \to rk_{P \to R}$: Given user P's private key $sk_P$ and user R's private key $sk_R$, the algorithm outputs a re-encryption key $rk_{P \to R}$. This process is jointly performed by user P, user R, and the cloud server.
$\mathrm{ReEnc}(rk_{P \to R}, C) \to C'$: Given a re-encryption key $rk_{P \to R}$ from user P to user R and an original ciphertext C for user P, the algorithm converts C into a ciphertext $C'$ for user R.
$\mathrm{Dec}(sk_R, C') \to M$: The algorithm takes the private key of user R and a re-encryption ciphertext $C'$, and outputs a record M if each input parameter is correct.
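To keep the nine algorithms and their inputs in view, the following Python `Protocol` sketches them as a single interface. It is a typing skeleton only: the method names, the placeholder types, and the boolean result of `search` are assumptions made for illustration, while `search` and `dec1` follow the names $\mathrm{Search}(T_{w'}, C)$ and $\mathrm{Dec}_1(sk_P, C)$ used in the search phase of the scheme.

```python
from typing import Any, List, Protocol, Sequence, Tuple

# Hypothetical placeholder types; the concrete encodings are not fixed here.
PublicParams = Any
PublicKey = Any
PrivateKey = Any
Ciphertext = Any      # original ciphertext C = (C_M, C_W)
ReCiphertext = Any    # re-encryption ciphertext C'
TrapdoorSet = List[Any]  # keyword trapdoor set T_{w'}
Record = Any          # EMR M (an element of G_2 in the scheme)


class EMRSharingScheme(Protocol):
    """Interface sketch of the nine algorithms (names are illustrative)."""

    def setup(self, security_parameter: int) -> PublicParams: ...
    def keygen(self, pp: PublicParams) -> Tuple[PublicKey, PrivateKey]: ...
    def enc(self, pk_p: PublicKey, keywords: Sequence[str], record: Record) -> Ciphertext: ...
    def trapdoor(self, sk_p: PrivateKey, query: Sequence[str]) -> TrapdoorSet: ...
    def search(self, trapdoors: TrapdoorSet, c: Ciphertext) -> bool: ...
    def dec1(self, sk_p: PrivateKey, c: Ciphertext) -> Record: ...
    def rekeygen(self, sk_p: PrivateKey, sk_r: PrivateKey) -> Any: ...
    def reenc(self, rk: Any, c: Ciphertext) -> ReCiphertext: ...
    def dec(self, sk_r: PrivateKey, c_prime: ReCiphertext) -> Record: ...


# Public method names, in definition order.
ALGORITHMS = [name for name in vars(EMRSharingScheme) if not name.startswith("_")]
print(ALGORITHMS)
```

`rekeygen` appears as one call for readability, although in the scheme it is an interactive protocol among P, R, and the cloud server.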

Overview of scheme
Without loss of generality, we assume that a patient P registers at a hospital for medical assistance, and the hospital generates a visiting token τ and sends it to the patient. Here, τ serves as the authorization for the doctor to generate EMRs for patient P. Meanwhile, patient P computes a pseudo identity $ID_P$ for himself/herself and returns it to the hospital. The hospital packs the tuple $(ID_P, \tau)$ and sends it to the cloud server. When patient P physically visits the doctor, he/she provides τ to the doctor as the authorization for generating his/her EMR. We assume that the doctor generates a health record M for patient P during this consultation. To store the data securely while preserving interoperability, the doctor extracts a keyword set $W = (w_1, \dots, w_n)$ from the EMR. Then, the doctor encrypts M and W with the patient's public key $pk_P$. The ciphertext $C = (C_M, C_W)$ is stored on the cloud server, where $C_M$ is the ciphertext of the EMR M and $C_W$ is the ciphertext of the keyword set W.
When patient P visits another doctor in a different hospital, the doctor may consider it necessary to know the patient's historical health records. Patient P can send an access request that includes a keyword trapdoor to the cloud server. If the access request is valid, the cloud server sends the ciphertext $C_M$ to patient P, who decrypts it with his/her private key to obtain the health record M and then shows it to the doctor.
If a data user R wants to access the EMR of patient P, he/she sends an interactive request to the patient and the cloud server. After the interaction, the cloud server generates a re-encryption key, uses it to re-encrypt the EMR ciphertext, and sends the resulting re-encryption ciphertext to data user R. Data user R uses his/her own private key to decrypt the re-encryption ciphertext. If the data user wants to obtain the true identity of patient P, he/she can send a request to the hospital.

Our scheme
In this section, we introduce the details of our proposed scheme. Each entity in our scheme is involved in at least one of the algorithms given in "Algorithm description". Roughly, our proposed scheme is composed of four main phases: initialization, data processing, search, and record retrieval.
Phase 1: Initialization. In this phase, the system generates the public parameters PP by running the algorithm $\mathrm{Setup}(1^\lambda)$, where $1^\lambda$ is the security parameter. All patients P, doctors D, and data users R generate their private and public keys by running the algorithm $\mathrm{KeyGen}(PP)$.
$\mathrm{Setup}(1^\lambda)$: Select two bilinear groups $(G_1, G_2)$ of prime order $q$ and a bilinear map $e$. Pick $g$ as a generator of $G_1$ and set $Z = e(g, g)$. Select four hash functions $H_1, H_2, H_3, H_4$ and publish the resulting public parameters PP.
$\mathrm{KeyGen}(PP)$: Each patient P randomly selects a secret value $p \in \mathbb{Z}_q^*$ as its private key $sk_P$ and computes the public key $pk_P = g^p$. Each doctor D randomly chooses a secret value $d \in \mathbb{Z}_q^*$ as its private key $sk_D$ and computes the public key $pk_D = g^d$. Each data user R randomly selects a secret value $r \in \mathbb{Z}_q^*$ as its private key $sk_R$ and computes the public key $pk_R = g^r$.
When patient P registers at the hospital, the hospital randomly selects $\beta \in \{0, 1\}^*$ and computes $\tau = g^{1/\beta}$. Then, the hospital securely sends the token τ to patient P. Meanwhile, the patient randomly selects $s \in \mathbb{Z}_q^*$ and computes $S = g^s$. The patient then calculates his/her pseudo identity $ID_P = RID_P \oplus H_4(\tau^s)$, where $RID_P$ is the real identity of patient P. Patient P returns the tuple $(\tau, S, ID_P)$ to the hospital. The hospital chooses a doctor D for the patient and sends the tuple, together with the assigned doctor D, to the cloud server.
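The pseudo identity is traceable because the patient's mask $\tau^s$ and the value $S^{1/\beta}$ that the hospital can compute are the same group element. The short derivation below assumes that $1/\beta$ is taken modulo $q$ and that one hash function, written generically as $H$, is used both when the identity is created and when it is traced:

```latex
\begin{aligned}
\tau^{s} &= \left(g^{1/\beta}\right)^{s} = g^{s/\beta} = \left(g^{s}\right)^{1/\beta} = S^{1/\beta},\\
ID_P \oplus H\!\left(S^{1/\beta}\right) &= RID_P \oplus H\!\left(\tau^{s}\right) \oplus H\!\left(S^{1/\beta}\right) = RID_P .
\end{aligned}
```

Computing $S^{1/\beta}$ from the public value $S$ requires $\beta$, which is why tracing is limited to the hospital.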
Phase 2: Data encryption and storage. When patient P sees doctor D for medical assistance, he/she shows the doctor the token τ, which serves as proof of the patient's authorization for the doctor to generate his/her EMR. After interacting with patient P, doctor D generates a health record $M \in G_2$ and extracts a keyword set $W = (w_1, \dots, w_n)$ from the record. Then, the doctor stores M in the hospital and encrypts M and W with the patient's public key $pk_P$ by running the algorithm $\mathrm{Enc}(pk_P, W, M)$.
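The record component $C_1 = M \cdot Z^k$ computed by Enc hides M behind the mask $Z^k$. The text does not reproduce $C_2$ or the $\mathrm{Dec}_1$ equation, so the sketch below assumes a common pairing-based form, $C_2 = pk_P^k$ with decryption $M = C_1 / e(C_2, g)^{1/sk_P}$; this is an illustrative assumption, not necessarily the authors' construction. It reuses the insecure exponent representation of a toy pairing, in which multiplying in $G_2$ adds exponents and dividing subtracts them.

```python
import secrets
from typing import Tuple

Q = 2**61 - 1  # toy prime order; elements stored by exponent (INSECURE)
G = 1          # generator g of G1 (exponent 1)


def rand_zq_star() -> int:
    return secrets.randbelow(Q - 1) + 1


def pair(u: int, v: int) -> int:
    """Toy pairing e(g^u, g^v) = e(g, g)^(u*v)."""
    return (u * v) % Q


def keygen() -> Tuple[int, int]:
    sk = rand_zq_star()
    return (G * sk) % Q, sk  # (pk = g^sk, sk)


def enc_record(pk: int, m: int) -> Tuple[int, int]:
    """C1 = M * Z^k as in Enc; C2 = pk^k is an ASSUMED form."""
    k = rand_zq_star()
    c1 = (m + pair(G, G) * k) % Q  # multiplying by Z^k adds k to the exponent
    c2 = (pk * k) % Q              # pk^k = g^(sk*k)
    return c1, c2


def dec_record(sk: int, c1: int, c2: int) -> int:
    """M = C1 / e(C2, g)^(1/sk) under the assumed C2 = pk^k."""
    mask = (pair(c2, G) * pow(sk, -1, Q)) % Q  # e(g^(sk*k), g)^(1/sk) = Z^k
    return (c1 - mask) % Q                     # dividing by Z^k subtracts k


pk_p, sk_p = keygen()
m = rand_zq_star()  # a record M in G2, encoded as an exponent (hypothetical)
c1, c2 = enc_record(pk_p, m)
print(dec_record(sk_p, c1, c2) == m)  # True
```

A wrong private key leaves a mask other than $Z^k$, so the recovered value differs from M.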
$\mathrm{Enc}(pk_P, W, M)$: The doctor randomly selects a value $k \in \mathbb{Z}_q^*$ and computes $C_1 = M \cdot Z^k$, together with the remaining components $C_2$ and $t$. The output of the encryption algorithm is $C = (C_M, C_W)$, where $C_M = (C_1, C_2)$ and $C_W = (t, H_3(t))$. Here, $C_M$ is the record ciphertext and $C_W$ is the keyword index. The doctor sends the ciphertext C and the patient's pseudo identity $ID_P$ to the cloud server. To match the patient's token on the cloud server, the doctor performs the following operations:
• Randomly chooses a value $a \in \mathbb{Z}_q^*$ and computes α and τ'.
The doctor sends $(\alpha, \tau')$ to the cloud server. The cloud server then checks whether the matching equation holds. If the equality holds, the EMR ciphertext C successfully matches the token τ of patient P, and the cloud server stores the ciphertext C together with $ID_P$.
Correctness:

Phase 3: Search. This phase is divided into two steps: trapdoor generation and test. Later, the patient may visit another doctor in a different hospital. During the consultation, the doctor may find it necessary to access the patient's historical records for a more accurate diagnosis. To search over the encrypted record C, patient P computes the trapdoor set for a query keyword set $Q = (w'_1, \dots, w'_n)$ by invoking the algorithm $\mathrm{Trapdoor}(sk_P, Q)$.
Meanwhile, patient P sets an effective access time tr for this request [22] and then sends the tuple $(tr, T_{w'})$ to the cloud server.
After receiving the tuple, the cloud server checks the validity of tr. If tr is not valid, the message is ignored. Otherwise, the cloud server runs $\mathrm{Search}(T_{w'}, C)$ to check whether the encrypted record C contains the keyword set Q. Precisely, for each $w'_i$ in Q, the cloud server checks whether the keyword test equation holds; if it holds for every keyword in Q, the cloud server returns $C_M$ to patient P.
Upon receiving the EMR ciphertext $C_M$ from the cloud server, patient P decrypts it to retrieve the record M by invoking the algorithm $\mathrm{Dec}_1(sk_P, C)$.
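The cloud server's handling of the tuple $(tr, T_{w'})$ can be sketched as a time check followed by a conjunctive test over all query keywords. The keyword test equation itself is not reproduced in the text, so the sketch substitutes the well-known per-keyword test of Boneh et al.'s public-key encryption with keyword search (PEKS), run in the insecure toy pairing model; treating tr as an expiry timestamp and the helpers `h1`, `h2`, `build_index`, `trapdoor`, and `search` are likewise assumptions for illustration.

```python
import hashlib
import secrets
import time
from typing import List, Optional, Tuple

Q = 2**61 - 1  # toy prime order; elements stored by exponent (INSECURE)
G = 1          # generator g of G1


def pair(u: int, v: int) -> int:
    """Toy pairing e(g^u, g^v) = e(g, g)^(u*v)."""
    return (u * v) % Q


def h1(word: str) -> int:
    """Hypothetical hash-to-G1, modelled as hashing a keyword to an exponent."""
    return int.from_bytes(hashlib.sha256(b"H1|" + word.encode()).digest(), "big") % Q


def h2(gt: int) -> bytes:
    """Hypothetical hash from G2 to bit strings."""
    return hashlib.sha256(b"H2|" + gt.to_bytes(8, "big")).digest()


def build_index(pk: int, keywords: List[str]) -> List[Tuple[int, bytes]]:
    """One PEKS-style entry per keyword: (g^x, H2(e(H1(w), pk^x)))."""
    index = []
    for w in keywords:
        x = secrets.randbelow(Q - 1) + 1
        index.append((x, h2(pair(h1(w), (pk * x) % Q))))
    return index


def trapdoor(sk: int, query: List[str]) -> List[int]:
    """Trapdoor set: H1(w')^sk for each query keyword w'."""
    return [(h1(w) * sk) % Q for w in query]


def search(trapdoors: List[int], index: List[Tuple[int, bytes]],
           tr: float, now: float) -> Optional[bool]:
    """None for an expired request; else True iff every trapdoor matches some entry."""
    if now > tr:
        return None  # the cloud server ignores the message
    return all(any(h2(pair(td, a)) == b for a, b in index) for td in trapdoors)


sk_p = secrets.randbelow(Q - 1) + 1
pk_p = (G * sk_p) % Q  # pk_P = g^sk_P
idx = build_index(pk_p, ["diabetes", "insulin", "hba1c"])
now = time.time()
hit = search(trapdoor(sk_p, ["diabetes", "insulin"]), idx, now + 300, now)
miss = search(trapdoor(sk_p, ["diabetes", "asthma"]), idx, now + 300, now)
expired = search(trapdoor(sk_p, ["diabetes"]), idx, now - 1, now)
print(hit, miss, expired)  # True False None
```

The `all(any(...))` structure makes the search conjunctive: one unmatched query keyword is enough for the record to be withheld.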

$\mathrm{Dec}_1(sk_P, C)$:
After obtaining the EMR M, the patient P shows it to the doctor. Correctness:

To obtain patient P's EMR, data user R first asks patient P and the cloud server to interact with him/her. The cloud server generates the re-encryption key by running the algorithm $\mathrm{ReKeyGen}(sk_P, sk_R)$. More precisely, the re-encryption key is generated by the following steps:
• Patient P randomly chooses a value $j \in \mathbb{Z}_q^*$. Then, patient P sends $j$ to the cloud server and $sk_P \cdot j$ to data user R.
• After receiving $sk_P \cdot j$ from patient P, data user R sends $sk_R/(sk_P \cdot j)$ to the cloud server.
• Finally, the cloud server multiplies the received value by $j$ and obtains the re-encryption key $rk_{P \to R} = j \cdot sk_R/(sk_P \cdot j) = r/p$.
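The three ReKeyGen steps above amount to blinding $sk_P$ with $j$ and removing the blinding at the cloud server. A minimal sketch of that arithmetic in $\mathbb{Z}_q^*$, assuming a toy prime in place of the real group order $q$ and hypothetical variable names for the three messages:

```python
import secrets

Q = 2**61 - 1  # toy prime standing in for the group order q (not a secure size)


def rand_zq_star() -> int:
    """Uniform element of Z_q^*."""
    return secrets.randbelow(Q - 1) + 1


def inv(x: int) -> int:
    """Multiplicative inverse modulo Q (Python 3.8+)."""
    return pow(x, -1, Q)


p = rand_zq_star()  # patient's private key sk_P
r = rand_zq_star()  # data user's private key sk_R

# Step 1: the patient blinds sk_P with a fresh j.
j = rand_zq_star()
msg_patient_to_cloud = j
msg_patient_to_user = (p * j) % Q            # sk_P * j

# Step 2: the data user answers with sk_R / (sk_P * j).
msg_user_to_cloud = (r * inv(msg_patient_to_user)) % Q

# Step 3: the cloud server removes the blinding factor j.
rk = (msg_user_to_cloud * msg_patient_to_cloud) % Q

print(rk == (r * inv(p)) % Q)  # True: rk_{P->R} = r/p
```

Neither value received by the cloud server equals $sk_P$ or $sk_R$ on its own; the blinding does, however, assume that the cloud server and R do not pool $j$ and $sk_P \cdot j$.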
Then, the cloud server re-encrypts the EMR ciphertext $C_M$ with $rk_{P \to R}$ to generate the re-encryption ciphertext $C'_M$ for data user R by running the algorithm $\mathrm{ReEnc}(rk_{P \to R}, C)$.
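A scalar re-encryption key $rk_{P \to R} = r/p$ suggests a BBS-style transformation in which only the key-dependent component changes. The ReEnc and Dec equations are not reproduced in the text, so the sketch below assumes $C_2 = pk_P^k$, $C'_2 = C_2^{rk_{P \to R}}$, and decryption $M = C_1 / e(C'_2, g)^{1/sk_R}$; these forms are our illustration, checked only in the insecure toy pairing model, and are not necessarily the authors' equations.

```python
import secrets

Q = 2**61 - 1  # toy prime order; elements stored by exponent (INSECURE)
G = 1          # generator g of G1


def rand_zq_star() -> int:
    return secrets.randbelow(Q - 1) + 1


def pair(u: int, v: int) -> int:
    """Toy pairing e(g^u, g^v) = e(g, g)^(u*v)."""
    return (u * v) % Q


p, r = rand_zq_star(), rand_zq_star()  # sk_P, sk_R
pk_p = (G * p) % Q                     # pk_P = g^p
rk = (r * pow(p, -1, Q)) % Q           # rk_{P->R} = r/p from ReKeyGen

# Doctor: C1 = M * Z^k; C2 = pk_P^k is an ASSUMED form.
m, k = rand_zq_star(), rand_zq_star()
c1 = (m + pair(G, G) * k) % Q
c2 = (pk_p * k) % Q                    # g^(p*k)

# Cloud server: raise C2 to rk and leave C1 untouched.
c2_prime = (c2 * rk) % Q               # g^(p*k*r/p) = g^(r*k)

# Data user R: strip the mask with sk_R.
mask = (pair(c2_prime, G) * pow(r, -1, Q)) % Q  # e(C2', g)^(1/r) = Z^k
recovered = (c1 - mask) % Q
print(recovered == m)  # True
```

In this form the cloud server never handles M or $Z^k$ directly; it only converts the component bound to $pk_P$ into one bound to $pk_R$.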

$\mathrm{ReEnc}(rk_{P \to R}, C)$:
The cloud server computes the re-encrypted components with $rk_{P \to R}$ and sets them as the re-encryption ciphertext $C'_M$.
When the real identity of patient P needs to be obtained for treatment or medical insurance purposes, data user R sends a request to the hospital. The hospital recovers the true identity of patient P by calculating $RID_P = ID_P \oplus H_4(S^{1/\beta})$ and returns it to data user R. In our scheme, only the hospital knows β, so only the hospital can extract the patient's real identity.
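The creation and tracing of a pseudo identity can be checked end to end. The sketch below assumes a toy Schnorr group ($P = 2Q + 1$ with small primes, insecure), treats β as an element of $\mathbb{Z}_q^*$ so that $1/\beta$ exists, and uses a hypothetical SHA-256-based `H` for the hash on both sides; `rid_p` is a made-up identity string.

```python
import hashlib
import secrets

# Toy Schnorr group (INSECURE sizes): P = 2Q + 1 and g = 4 generates the subgroup of order Q.
P, Q, g = 2039, 1019, 4


def H(elem: int, length: int) -> bytes:
    """Hypothetical hash G1 -> {0,1}^(8*length), built from SHA-256 (length <= 32)."""
    return hashlib.sha256(elem.to_bytes(2, "big")).digest()[:length]


def xor(x: bytes, y: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(x, y))


# Hospital: secret beta, taken from Z_q^* here so that 1/beta mod Q exists.
beta = secrets.randbelow(Q - 1) + 1
inv_beta = pow(beta, -1, Q)
tau = pow(g, inv_beta, P)                          # tau = g^(1/beta)

# Patient: blinding value s and pseudo identity.
rid_p = b"patient-0042"                            # hypothetical real identity
s = secrets.randbelow(Q - 1) + 1
S = pow(g, s, P)                                   # S = g^s
id_p = xor(rid_p, H(pow(tau, s, P), len(rid_p)))   # ID_P = RID_P xor H(tau^s)

# Hospital: tracing with beta.
traced = xor(id_p, H(pow(S, inv_beta, P), len(rid_p)))  # RID_P = ID_P xor H(S^(1/beta))
print(traced == rid_p)  # True
```

A fresh $s$ per EMR yields a fresh $ID_P$ for the same patient, which is what lets each record carry a distinct pseudo identity.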

Achieving goals
In this section, we illustrate how the proposed scheme achieves the design goals presented in "System model".
The proposed scheme achieves data confidentiality and integrity. The EMR data are encrypted before being outsourced to the cloud server: the doctor uses the patient's public key to encrypt the EMR. The patient uses his/her private key to decrypt the EMR ciphertext, and a data user authorized by the patient uses his/her private key to decrypt the EMR re-encryption ciphertext.
The proposed scheme achieves access control. As described in Phase 4, if a data user wants to access the patient's EMR, he/she first sends an authorization request to the patient. After the patient agrees, the cloud server generates a re-encryption key and uses it to re-encrypt the EMR ciphertext, producing a re-encryption ciphertext that the data user can decrypt with his/her private key.
The proposed scheme achieves secure search. In Phase 2 of our scheme, the EMR is encrypted together with a searchable keyword index. In Phase 3, the patient generates the trapdoor set to search his/her historical health records and thereby improve the doctor's diagnosis. In this scenario, the keyword trapdoor

Security proof
Because the data used by the patient are similar to those used by data users, we only prove security for the data used by data users.
Theorem 1. Our scheme is IND-CKA secure in the random oracle model if the mBDH assumption holds in $(G_1, G_2)$.
Proof. Assume that there exists a polynomial-time adversary $A_1$ that attacks the keyword privacy of our scheme with a non-negligible advantage $\epsilon(k)$, where k is the security parameter. We construct a simulator B that solves the mBDH problem.
Let $(g, g^\alpha, g^\beta, g^\gamma) \in G_1^4$ be an instance of the mBDH problem, where g is a generator of $G_1$ and $\alpha, \beta, \gamma \in \mathbb{Z}_q^*$ are chosen uniformly at random. The goal of B is to output $e(g, g)^{\alpha\beta/\gamma} \in G_2$ by interacting with $A_1$ as follows:
$H_1$ query: B maintains an initially empty list.
Uncorrupted key query: On input an index i, B randomly selects $x_i \in \mathbb{Z}_q^*$ and outputs the public key $pk_i = (g^\gamma)^{x_i}$. Thus, the private key is implicitly defined as $sk_i = \gamma x_i$. B adds $\langle i, pk_i, x_i \rangle$ to $L_U$.
Corrupted key query: On input an index i, B randomly selects $x_i \in \mathbb{Z}_q^*$ and outputs the public key $pk_i = g^{x_i}$. Thus, the private key is defined as $sk_i = x_i$. B adds $\langle i, pk_i, x_i \rangle$ to $L_C$.
Trapdoor query: When $A_1$ makes a trapdoor query on the keyword $w_i$, B computes the trapdoor $T_i$ for keyword $w_i$ and returns it to $A_1$.
Re-encryption key query: When $A_1$ asks B for the re-encryption key $rk_{i \to j}$ for two public keys $pk_i, pk_j$, B responds as follows:
• If neither $pk_i$ nor $pk_j$ belongs to $L_C$, B aborts.
• Otherwise, B returns $rk_{i \to j} = x_j/x_i$ to $A_1$.
Challenge: Eventually, $A_1$ issues a challenge on two keywords $w_0, w_1$, a message m, and a public key $pk_i$. If $pk_i$ belongs to $L_C$, then B aborts. Otherwise, B proceeds as follows:
• B makes two $H_1$ queries to obtain $h_0, h_1 \in G_1$ such that $H_1(w_0) = h_0$ and $H_1(w_1) = h_1$. If both $c_0 = 1$ and $c_1 = 1$ hold, then B aborts.
• Otherwise, at least one of $c_0$ and $c_1$ equals 0. Then B randomly picks $b \in \{0, 1\}$ such that $c_b = 0$.
• B implicitly defines pk b=g as its guess to e(g, g) βα/γ .

Theorem 2. Our scheme is IND-CPA secure in the random oracle model if the QDBDH assumption holds in $(G_1, G_2)$.
Proof. Assume that there exists a polynomial-time adversary $A_2$ that attacks our scheme with a non-negligible advantage $\epsilon(k)$, where k is the security parameter. We construct a simulator B that solves the QDBDH problem.
Let $(g, g^a, g^b) \in G_1^3$ be an instance of the QDBDH problem, where g is a generator of $G_1$ and $a, b \in \mathbb{Z}_q^*$ are chosen uniformly at random. The goal of B is to output $e(g, g)^{a/b} \in G_2$ by interacting with $A_2$ as follows:
$H_1$ query: B maintains an initially empty list.
Public key query: B flips a random coin $c_i \in \{0, 1\}$ and selects a random value $x_i \in \mathbb{Z}_q^*$. If $c_i = 1$, B outputs the public key $pk_i = g^{a x_i}$. Otherwise, B outputs the public key $pk_i = g^{x_i}$, whose private key is implicitly defined as $sk_i = x_i$. B adds $\langle c_i, pk_i, x_i \rangle$ to table $L_C$.
Private key query: B recovers $\langle c_i, pk_i, x_i \rangle$ from $L_C$. If $c_i = 0$, the private key $sk_i = x_i$ is returned to $A_2$. Otherwise, B aborts.
Re-encryption key query: The adversary $A_2$ can adaptively ask B for the re-encryption key $rk_{i \to j}$ for any two public keys $pk_i, pk_j$, and B generates it as follows:
• If $c_i = 1$ and $c_j = 1$, B aborts.
• Otherwise, B responds with $rk_{i \to j} = x_j/x_i$ to $A_2$.

Re-encryption query: Using the answer to the re-encryption key query, B computes the re-encryption ciphertext with the re-encryption algorithm and returns it to $A_2$.
Decryption query: After obtaining the re-encryption ciphertext $C'_m$.
Challenge: Eventually, $A_2$ issues a challenge on two messages $m_0, m_1$ and a public key $pk_i$. B recovers the tuple $\langle c_i, pk_i, x_i \rangle$ from $L_C$. If $c_i = 1$, then B reports failure and aborts. Otherwise, B randomly selects $\delta \in \{0, 1\}$ and sets the challenge ciphertext $C'_\delta$ as follows:

Performance analysis
In this section, we present a theoretical analysis of the performance of the proposed scheme and then evaluate its efficiency through numerical simulation. To show the performance more intuitively, we implemented our scheme, as well as the schemes proposed by Wu [23] and Wang [24], on Linux using the Pairing-Based Cryptography (PBC) Library [25], programmed in C and run in a virtual machine on a PC (HP PC, 3.1 GHz CPU, and 4 GB RAM). In the experiment, we used elliptic curves with a base field size of 512 bits and an embedding degree of 2. The security level is set to $|p| = 512$ bits.

Theoretical analysis
In this section, we compare the computation overhead of the proposed scheme and other schemes from a theoretical perspective. We denote by $T_e$, $T_p$, $T_h$, $T_H$, and $T_{mul}$ the computation cost of an exponentiation, a bilinear pairing, a general hash function, a hash-to-point operation, and a multiplication, respectively. The running times of these basic operations are presented in Table 1. As shown in Table 1, $T_H$ and $T_{mul}$ are much smaller than the others, so the hash-to-point and multiplication times are negligible. In descending order of cost, the basic operations rank as $T_e$, $T_p$, $T_h$, $T_H$, $T_{mul}$, and the computational cost of the bilinear pairing operation is much higher than that of the other cryptographic operations. The computation cost of the compared schemes in the index generation and search phases is presented in Table 2, where n denotes the number of keywords.
As shown in Table 2, in the index generation phase, the descending order of computation cost is Wang's scheme [24], our scheme, and Wu's scheme [23]. In the search phase, the descending order of computation cost is also Wang's scheme [24], our scheme, and Wu's scheme [23]. Since Wu's scheme [23] only implements single-keyword encryption whereas our scheme implements multi-keyword encryption, the computation cost of our scheme in the index generation and search phases is higher than that of Wu's scheme [23].

Numerical simulation
We compared our scheme with the schemes proposed by Wu [23] and Wang [24] through numerical simulation. Both our scheme and Wang's scheme [24] support multi-keyword search over ciphertext, whereas Wu's scheme [23] supports only single-keyword search. In the numerical simulation, we use the same number of keywords in the index generation and search phases and compare the computational overhead for different numbers of keywords in each phase. The number of keywords is set to n = 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000. Each reported result is the average running time over 10 runs. For more information, see S1 Appendix and S1 File. As illustrated in Fig 2, the index generation time increases with the number of keywords. The index generation time of our scheme is lower than that of Wang's scheme [24] but higher than that of Wu's scheme [23], because our scheme uses bilinear pairing operations in the keyword encryption process whereas Wu's scheme [23] does not. Fig 3 presents the time cost of the search phase for all schemes. The search time increases linearly with the number of keywords. Wu's scheme [23] and our scheme differ only slightly in the search phase, and both are higher than Wang's scheme [24].
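The measurement procedure above (each keyword count from 100 to 1000, averaged over 10 runs) can be expressed as a small timing harness. The paper's measurements were made in C with PBC; the Python sketch below only illustrates the averaging loop, and `toy_index_generation` is a hypothetical stand-in workload that does not reproduce any of the compared schemes or their timings.

```python
import hashlib
import statistics
import time
from typing import Callable, Dict, List


def average_runtime(workload: Callable[[int], object], sizes: List[int],
                    runs: int = 10) -> Dict[int, float]:
    """Mean wall-clock time in seconds of workload(n) over `runs` repetitions, per n."""
    results: Dict[int, float] = {}
    for n in sizes:
        samples = []
        for _ in range(runs):
            start = time.perf_counter()
            workload(n)
            samples.append(time.perf_counter() - start)
        results[n] = statistics.mean(samples)
    return results


def toy_index_generation(n: int) -> List[bytes]:
    """Hypothetical stand-in workload: hash n keywords (not the paper's index generation)."""
    return [hashlib.sha256(f"keyword-{i}".encode()).digest() for i in range(n)]


sizes = list(range(100, 1001, 100))  # n = 100, 200, ..., 1000
timings = average_runtime(toy_index_generation, sizes)
for n, seconds in timings.items():
    print(f"n = {n:4d}: {seconds * 1e3:.3f} ms")
```

Averaging with `time.perf_counter` over repeated runs smooths out scheduler noise, which matters when the harness runs inside a virtual machine as in the reported setup.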

Conclusion
We presented an EMR data-sharing scheme with privacy protection, secure storage, and secure sharing based on searchable encryption and proxy re-encryption, which addresses data security and personal privacy in EMR sharing based on cloud storage. While protecting the patient's privacy, the scheme enables patients to access their own EMRs. After the patient grants authorization, data users can also access the EMRs, which is a practical approach. The EMR ciphertext and keyword index are stored on the cloud server so that the patient can search EMRs by keyword. The cloud server generates a re-encryption key for a data user after the patient authorizes the data user to access his/her EMR. The cloud server then re-encrypts the EMR ciphertext with the re-encryption key and sends it to the data user, who can decrypt it using his/her private key.
Supporting information
S1 Appendix. Data used to build graphs. The experimental data used for plotting in