A secure and efficient certificateless content extraction signature with privacy protection

Efficiency and privacy are key aspects of content extraction signatures. In this study, we propose a Secure and Efficient Certificateless Content Extraction Signature with Privacy Protection (SECCESPP) scheme, in which scalar multiplication on elliptic curves replaces the inefficient bilinear pairing of the certificateless public key cryptosystem, and the signcryption idea is borrowed to protect the privacy of signed messages. The correctness of the SECCESPP scheme is demonstrated by the consistency of the message and the accuracy of the verification equation. The security of the SECCESPP scheme is proved under the elliptic curve discrete logarithm problem in the random oracle model, and its privacy is formally analyzed with the formal analysis tool ProVerif. Theoretical and experimental analyses show that the SECCESPP scheme is more efficient than other schemes.


Introduction
Content extraction signatures have been widely used to secure electronic medical records and electronic commerce, because they allow a verifier to check the authenticity of an extracted message without knowing the entire signed message [1]. Efficiency and privacy protection are the key aspects of a content extraction signature. For efficiency, most content extraction signature schemes were developed using the traditional public key cryptosystem [2][3][4][5] or the identity-based public key cryptosystem [6][7][8][9][10]. Owing to the certificate management problem of traditional public key cryptosystems and the user key management problem of identity-based public key cryptosystems, researchers have turned to certificateless public key cryptosystems to implement content extraction signatures [11][12][13], because such cryptosystems require no certificates and largely avoid key management problems. However, certificateless public key cryptosystems are based on costly bilinear pairings on elliptic curves, so their efficiency is low. The operations commonly used on elliptic curves include exponentiation, scalar multiplication, bilinear pairing and hashing. According to references [8,9], scalar multiplication is the most efficient of these operations and is therefore a good candidate to replace the inefficient bilinear pairing in certificateless public key cryptosystems. As for privacy protection, the signed message of a content extraction signature may contain private information, yet existing content extraction signatures do not consider privacy protection. The available privacy protection methods mainly include data distortion and data encryption [11]. Data distortion lacks universality because it depends on the data types used, so data encryption is a better choice for implementing privacy in content extraction signatures.
In general, the sign-then-encrypt approach can provide both privacy and authentication, but it is inefficient because signing and encryption are performed as separate operations. Signcryption [2], in contrast, provides privacy and authentication with high efficiency because signing and encryption are completed in a single logical step.
Therefore, to improve the efficiency of content extraction signatures and provide privacy protection for them, we propose a Secure and Efficient Certificateless Content Extraction Signature with Privacy Protection (SECCESPP) scheme. The main contributions are as follows:
1. Scalar multiplication on elliptic curves is used to replace the inefficient bilinear pairing in a certificateless public key cryptosystem, and the signcryption idea from the data encryption approach is borrowed to protect the privacy of signed messages.
2. The correctness of the scheme is demonstrated by the consistency of the message and the accuracy of the verification equation, the scheme's security is proved under the elliptic curve discrete logarithm problem in the random oracle model, and its privacy is analyzed with the formal analysis tool ProVerif.
The rest of the paper is organized as follows. Section 1 discusses related work on privacy protection, content extraction signatures and signcryption. Section 2 briefly reviews the preliminaries. Section 3 presents the SECCESPP scheme. Section 4 demonstrates the correctness of the SECCESPP scheme. Section 5 gives proofs of the security requirements. Section 6 compares performance. Finally, Section 7 presents the conclusion and future work.

Related work
Since the SECCESPP scheme involves privacy protection, content extraction signatures and signcryption, this section briefly introduces the related works regarding these three components.

Privacy protection
Much information is vulnerable to attackers. In 2017, Adat et al. discussed the history, background and statistics of the IoT, presented a security-based analysis of the IoT architecture, and provided a taxonomy of various defense mechanisms [20]. Protecting private information is therefore urgent. Existing privacy protection technologies can be divided into two categories: data distortion and data encryption approaches [11].
Technologies based on data distortion distort sensitive data while keeping some data or data attributes unchanged; the most common technique of this type is differential privacy. In 2018, Ye [21] proposed a new localized differential privacy protection model. The model adopts the randomized response technique, which first privatizes the data and then sends it, to provide comprehensive protection for sensitive information; this approach can not only resist attackers with arbitrary background knowledge but also prevent privacy attacks from untrusted third parties. However, it depends heavily on the underlying data, and different algorithms must be designed for different data, so it lacks universality. In 2021, Stergiou et al. [22] proposed a secure caching system that operates in a wireless mobile 6G network to manage Big Data (BD) on Smart Buildings (SB), and built a novel, secure Cache Decision System (CDS) for a wireless network operating over an SB, which offers users a safer and more efficient environment for browsing the internet and for sharing and managing large-scale data in the fog. It could be a starting point for better and more efficient wireless networking scenarios for managing and sharing Big Data in Smart Buildings.
Technologies based on data encryption hide sensitive data during data mining. They are mainly used in distributed application environments and can solve communication security problems. At present, Homomorphic Encryption (HE) is widely used. HE refers to specific classes of encryption schemes that allow computation directly on encrypted data without decrypting it. In 2020, Zhao [23] presented a circular-secure public key homomorphic encryption scheme using the noise flooding technique and provided security proofs and parameter settings. Furthermore, by introducing the rejection sampling technique, an optimized circular-secure public key homomorphic encryption scheme was given in which the system parameters were reduced from the super-polynomial level to the polynomial level, greatly reducing the size of the public key and the ciphertext. As a result, the computational cost of ciphertext evaluation was reduced and the performance of the homomorphic encryption scheme was improved. However, this approach still suffers from high computing costs, high communication costs, complex deployment and practical difficulty. In 2021, Zhang et al. [24] proposed a secure decentralized spatial crowdsourcing scheme for 6G-enabled Network in Box that uses the CBC-MAC authenticated encryption mechanism to provide confidentiality and integrity. It prevents the leakage of sensing node locations but does not solve the problem of data leakage in transit. In the same year, to secure the information shared in VANET systems, Vijayakumar et al. [25] proposed efficient batch authentication and key exchange schemes, which are to be applied to blockchain users in the future. Azees et al. [26] then applied blockchain technology to secure VANET systems, achieving rapid reauthentication of vehicles and contributing to information security in the coming blockchain era.

Content extraction signature
The Content Extraction Signature (CES) was first proposed by Steinfeld [27]. According to the underlying technique, CES schemes are mainly divided into CommitVector (CV)-based, RSA-based and hash tree (HT)-based content extraction signatures.
A CV-based CES scheme [3] provides unforgeability and exclusivity. Its unforgeability is jointly guaranteed by the EUF-CMA security of the underlying standard digital signature and the binding property of the message commitment scheme, while its exclusivity is guaranteed by the hiding property of the commitment scheme. Scheme [27] formalized and proved these two security properties. Scheme [3] has a lower computational cost than the CES scheme [27]; its original signature generation requires one signature operation and one commitment operation. However, because both the original signature and the extracted signature contain the commitment random numbers of all retained sub-data and the commitment values of all deleted sub-data, the signature length grows and the communication overhead increases.
To solve the problems of CV-based schemes, RSA-based CES schemes were proposed. An RSA-based CES builds on a CV-based CES using the RSA signature, and its signature length is only the length of the RSA modulus, which greatly reduces the signature length. Combining this with the idea of batch signatures, Li [4] proposed an improved RSA-based content extraction signature scheme in 2014. The scheme can judge whether a content extraction access structure (CEAS) meets the given extraction conditions through the correspondence between $M'$ and $\mathrm{CI}(M')$. In 2015, Lan [28] proposed an identity-based CES scheme. This scheme does not need to sign every sub-message, thereby improving efficiency, and it prevents the private key generator (PKG) from forging signatures, which increases its application value. In 2017, Wang [29], building on [28], shortened the signature by reducing the commitment values and random numbers, and performed unified signature and verification operations on sub-messages, which improved the efficiency of signing and verification.
However, with quantum computing, standard RSA keys of approximately 850 bits might be broken, which creates the need to strengthen current public key cryptosystems. Thus, HT-based CES schemes were proposed. Drawing on the idea of a binary tree, every two message blocks are hashed into a commitment hash value, recursively and layer by layer, until a single root hash value is obtained, which greatly reduces the chance of breaking the CES. In 2016, Thirumalai [30] proposed a commitment tree-based batch signature scheme. Compared with CV-based and RSA-based CES schemes, it has a shorter signature, requires fewer operations and signs more efficiently. In 2019, Szalachowski [31] proposed TLS-N, a method based on a TLS extension. In this method, a Merkle tree is used to generate evidence: the server generates evidence about the TLS session content, a non-interactive proof about the session content is generated for the client, and the session content and the proof are then sent to a third party for verification. In 2020, Cheng [32] proposed a blockchain-based secure storage and sharing scheme for electronic health record data. In this scheme, a certificateless content extraction signature algorithm provides privacy protection, and secure data sharing is realized in combination with smart contracts. This combination of blockchain and content extraction signatures is well suited to electronic medical records.

Signcryption
Signcryption is a cryptographic primitive that captures a common practical scenario where one simultaneously requires confidentiality and nonrepudiation for transmitted data. Signcryption schemes achieve confidentiality and authentication simultaneously by combining public key encryption and digital signatures, offering better overall performance and security than other schemes [11]. There are three types of signcryption: public key infrastructure (PKI)-based signcryption, identity (ID)-based signcryption, and certificateless signcryption.
A PKI is required to manage and distribute public keys. In such systems, each public key is bound to a unique user ID. Trusted third parties bind users to unique public keys through an appropriate registration process. Based on the PKI concept, Zheng [2] proposed signcryption in 1997, and it has since been widely discussed and studied. The original scheme used interactive zero-knowledge proof technology, which is not efficient. In 2019, Yan [5] proposed a signcryption scheme that directly uses the sender's public key to verify the validity of the signature. Compared with the sign-then-encrypt mechanism, both the public key size and the computational cost of the signcryption operation are significantly reduced. However, relying on an additional trusted third party makes public key cryptography expensive and inefficient.
To overcome the problems of PKI management, many ID-based signcryption schemes have been proposed. The idea of ID-based signcryption was first proposed by Malone-Lee [6] in 2002 along with a security model. Boyen [7] extended this model by adding three new security notions: ciphertext unlinkability, ciphertext authentication and ciphertext anonymity. In 2019, Wang [8] proposed a basic model for ID-based signcryption schemes that can be used to design signcryption schemes from bilinear pairings. In the same year, Shankar [9] pointed out that scheme [8] was not secure and proposed three new secure schemes. However, they do not satisfy public verifiability and forward security at the same time. In response to this problem, Pan [10] proposed a scheme in 2019 that uses two private keys for signcryption and unsigncryption. In 2019, Deng [33] proposed a new ID-based signcryption model that solved the problem that [9] does not simultaneously satisfy public verifiability and forward security. However, such systems need a third-party private key generator to generate private keys secretly and distribute them to users.
To address the issues of PKI-based and ID-based signcryption, certificateless signcryption was proposed by Riyamin [11] in 2008. It presents stronger security properties than one might expect from its internal building blocks; sharing randomness between the encryption and signature modules not only provides extra savings in computational and bandwidth loads but also yields strong insider security guarantees. In 2019, Gao [12] proposed an improved certificateless signcryption scheme, which guarantees the security of the signcryption phase by defining the length of the message space during system setup. In the same year, Wang [13] defined blind signcryption under certificate-based and certificateless public key cryptosystems and proposed a blind signcryption scheme based on bilinear pairing. The scheme adds blindness without significantly increasing the computational cost. In 2020, Fang [34] proposed a certificateless multi-receiver multi-message simultaneous broadcast signcryption scheme. The encryption key is generated from random elements of an elliptic curve cyclic group, which addresses the problems of ciphertext decryption by receivers and protection of identity anonymity. However, the scheme lacks a security mechanism for verifying the signcryption.

Preliminaries
This section introduces the commitment scheme, salt tree, binary commitment tree and content extraction access structure used in the SECCESPP scheme.

Commitment
A commitment scheme, which has the two properties of hiding and binding, is a fundamental primitive in cryptography. Hiding means that the commitment conceals the committed information: no entity other than the committer can obtain information from the commitment. Binding means that the committer cannot change the committed information afterwards, so the receiver can verify that the information it receives is indeed the information originally committed.
A commitment scheme is composed of three algorithms: Gen(), Com() and Ver().
Initialization phase: Gen() accepts the string $1^k$ (a string of k ones) as input and outputs a common reference string crs:
$$crs \leftarrow \mathrm{Gen}(1^k) \quad (1)$$
Commitment phase: Com() accepts the common reference string crs and a message m to be committed as input, and outputs the commitment value com of m and the decommitment information dec:
$$(com, dec) \leftarrow \mathrm{Com}(crs, m) \quad (2)$$
Decommitment phase: The sender sends dec and m to the receiver.
Verification phase: Ver() accepts the common reference string crs, the commitment value com, the decommitment information dec and the message m as input, and outputs the verification result (Yes/No):
$$\mathrm{Yes}/\mathrm{No} \leftarrow \mathrm{Ver}(crs, com, dec, m) \quad (3)$$
In the SECCESPP scheme, the entire message M is divided into n blocks, which, together with a random string called salt, are used as input to the message commitment algorithm C(). Because the entropy of a message block is low, pseudorandom salt is used to protect the privacy of the commitment and prevent brute-force attacks. The message block $M[i]$ and the pseudorandom salt $S_{i_r,i_c}$ are combined to generate the commitment …
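To make the Gen/Com/Ver interface concrete, here is a minimal Python sketch. It assumes a hash-based instantiation (SHA-256 over crs, a random opening value and the message) that the text above does not prescribe; the function names gen, com and ver are illustrative. Hiding comes from the random opening value and binding from the collision resistance of SHA-256.

```python
from __future__ import annotations

import hashlib
import secrets


def gen(k: int = 128) -> bytes:
    """Gen(1^k): output a common reference string crs (here, k random bits)."""
    return secrets.token_bytes(k // 8)


def com(crs: bytes, m: bytes) -> tuple[bytes, bytes]:
    """Com(crs, m): return (com, dec), where dec is a fresh random opening value."""
    dec = secrets.token_bytes(32)
    # crs and dec have fixed lengths, so the concatenation is unambiguous.
    return hashlib.sha256(crs + dec + m).digest(), dec


def ver(crs: bytes, c: bytes, dec: bytes, m: bytes) -> bool:
    """Ver(crs, com, dec, m): Yes (True) iff (dec, m) opens the commitment c."""
    return secrets.compare_digest(hashlib.sha256(crs + dec + m).digest(), c)


crs = gen()
c, d = com(crs, b"message block M[1]")
print(ver(crs, c, d, b"message block M[1]"))  # True
```

In the scheme, the salt $S_{i_r,i_c}$ would play a role similar to the random opening value here.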

Salt tree
The leaf nodes of a salt tree serve as the salts used to build the binary commitment tree, which greatly increases the computation required by an attack.
To protect privacy, $S_{i_r,i_c}$ and $S_{i'_r,i'_c}$ must remain independent, which means that the salt tree generation process cannot be sent directly to the verifier and that $S_{i_r,i_c}$ must be retrieved. The salt tree is constructed by the function E(), and the resulting salt values are used as input to the binary commitment tree. The inputs of E() are a session-based secret value and a random nonce, and its output is a pseudorandom salt. During verification, the corresponding salt is required to complete the check.
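The salt-tree expansion can be sketched as follows. Both the instantiation of E() as HMAC-SHA256 and the rule for deriving child salts (an HMAC of the parent value with a left/right label) are our assumptions rather than details given by the scheme; salt_tree_leaves is a hypothetical helper.

```python
from __future__ import annotations

import hashlib
import hmac


def E(secret: bytes, nonce: bytes) -> bytes:
    """Hypothetical E(): HMAC-SHA256 keyed with the session secret over the nonce."""
    return hmac.new(secret, nonce, hashlib.sha256).digest()


def salt_tree_leaves(secret: bytes, nonce: bytes, n: int) -> list[bytes]:
    """Expand the root E(secret, nonce) level by level and return n leaf salts.

    Each child is HMAC(parent, b"L") or HMAC(parent, b"R"), so one leaf says
    nothing useful about another leaf unless their common parent is revealed.
    """
    level = [E(secret, nonce)]
    while len(level) < n:
        level = [hmac.new(node, side, hashlib.sha256).digest()
                 for node in level for side in (b"L", b"R")]
    return level[:n]


salts = salt_tree_leaves(b"session-secret", b"nonce-1", 5)
print(len(salts))  # 5
```

Deriving all salts from one root keeps the signer's state small, while the verifier only ever receives individual leaves.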

Binary commitment tree
The purpose of building the binary commitment tree is to include the commitments in the proof. The commitment values generated by the commitment algorithm serve as the leaf nodes of the binary commitment tree. The hash value $h_i$ of a node is computed with a collision-resistant hash function over the hash values of its child nodes, the session message length $l_r$ and the signer information $O_i$, where $O_i$ is the i-th member of the order vector. The commitment values generated by the commitment scheme described above are used as leaf nodes to generate the hash chain. All leaf nodes can be verified by verifying the signature on the root node of the binary commitment tree.
The binary commitment tree is generated as follows. A session message $Record_i$ consists of n sub-message blocks. Each message block and its corresponding salt (a leaf of the salt tree) serve as input, and the resulting commitment value c becomes a leaf node of the binary commitment tree. Every two nodes are hashed together recursively, layer by layer, until the root node is obtained, which completes the construction of the binary commitment tree.
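A sketch of the layer-by-layer construction follows. It assumes SHA-256 as the collision-resistant hash and a simple byte encoding of $l_r$ and the signer information $O_i$; both the encoding and the rule of carrying an unpaired node up unchanged are assumptions, and node_hash and commitment_tree_root are hypothetical names.

```python
from __future__ import annotations

import hashlib


def node_hash(left: bytes, right: bytes, l_r: int, signer_info: bytes) -> bytes:
    """Hash two child values together with the session length l_r and signer info."""
    h = hashlib.sha256()
    h.update(left)
    h.update(right)
    h.update(l_r.to_bytes(8, "big"))
    h.update(signer_info)
    return h.digest()


def commitment_tree_root(commitments: list[bytes], l_r: int, signer_info: bytes) -> bytes:
    """Hash the leaf commitments pairwise, layer by layer, until one root remains."""
    if not commitments:
        raise ValueError("need at least one commitment")
    level = list(commitments)
    while len(level) > 1:
        nxt = [node_hash(level[i], level[i + 1], l_r, signer_info)
               for i in range(0, len(level) - 1, 2)]
        if len(level) % 2 == 1:  # an unpaired last node is carried up unchanged
            nxt.append(level[-1])
        level = nxt
    return level[0]


leaves = [hashlib.sha256(bytes([i])).digest() for i in range(5)]
root = commitment_tree_root(leaves, l_r=5, signer_info=b"signer-A")
```

Changing any single leaf commitment changes the root, which is why signing the root covers every block.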

Content extraction access structure
To extract sub-message blocks from the original message, the concept of a content extraction access structure (CEAS) is introduced. Sub-message block numbers in a CEAS must be

The SECCESPP scheme
The SECCESPP scheme implements a content extraction signature with privacy protection for a signed message in a single security mechanism that uses scalar multiplication on elliptic curves and the signcryption idea. Fig 1 shows the main research framework of the SECCESPP scheme. The SECCESPP scheme is composed of the Key-Generation, Signature-Generation, Signcryption-Extraction and Signcryption-Verification algorithms. Three roles are defined: signer, signcryptor and verifier. The signer first divides the entire message into n blocks and generates a commitment for each block. Next, the signer generates a signature and sends it to the signcryptor. After receiving the signature, the signcryptor extracts the selected message blocks and the corresponding commitments, encrypts the extracted message, and sends the resulting content extraction signcryption to the verifier. The verifier receives and verifies the content extraction signcryption.

Key-Generation algorithm
The Key-Generation algorithm generates a private key $SK_A$ and a public key $PK_A$ for signer A in a certificateless cryptosystem.
1. Set up the system parameters: First, the KGC (key generation center) selects a k-bit prime p, where k is the security parameter, and obtains $\{F_p, E/F_p, G, P\}$. Then, $x \in Z_n^*$ is selected as the system master key msk, and the master public key $P_{pub} = xP$ is calculated. Next, the hash functions $H_1, H_2, H_3, H_4$ are selected.
2. Signer A, whose identity is $ID_A$, randomly selects $x_A \in Z_n^*$ as the secret value, computes $P_A = x_A \cdot P$ and sends it to the KGC. The KGC computes $R_A = r_A \cdot P$, $h_A = H_1(ID_A, R_A, P_A)$ and $s_A = r_A + h_A x \bmod n$ to generate the partial key $D_A = \{s_A, R_A\}$. Finally, the private key $SK_A = (x_A, s_A)$ and the public key $PK_A = (P_A, R_A)$ are produced for signer A in the certificateless public key cryptosystem.
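The key-generation steps can be tried out on a concrete curve. This sketch assumes secp256k1 as a stand-in for $\{F_p, E/F_p, G, P\}$ and SHA-256 reduced mod n as a stand-in for $H_1$; neither choice comes from the scheme, and the KGC and signer roles are simulated in one process. The final check $s_A P = R_A + h_A P_{pub}$ is not stated in the text, but it follows directly from $s_A = r_A + h_A x$ and $P_{pub} = xP$, and lets signer A validate its partial key.

```python
import hashlib
import secrets
from typing import Optional, Tuple

# Assumption: secp256k1 (y^2 = x^3 + 7 over F_p, generator P of prime order n).
p = 2**256 - 2**32 - 977
n = int("FFFFFFFF" "FFFFFFFF" "FFFFFFFF" "FFFFFFFE"
        "BAAEDCE6" "AF48A03B" "BFD25E8C" "D0364141", 16)
P = (int("79BE667E" "F9DCBBAC" "55A06295" "CE870B07"
         "029BFCDB" "2DCE28D9" "59F2815B" "16F81798", 16),
     int("483ADA77" "26A3C465" "5DA4FBFC" "0E1108A8"
         "FD17B448" "A6855419" "9C47D08F" "FB10D4B8", 16))
Point = Optional[Tuple[int, int]]  # None is the point at infinity


def add(A: Point, B: Point) -> Point:
    """Affine point addition on y^2 = x^3 + 7."""
    if A is None:
        return B
    if B is None:
        return A
    (x1, y1), (x2, y2) = A, B
    if x1 == x2 and (y1 + y2) % p == 0:
        return None  # A = -B
    if A == B:
        lam = 3 * x1 * x1 * pow(2 * y1, -1, p) % p  # tangent slope (a = 0)
    else:
        lam = (y2 - y1) * pow((x2 - x1) % p, -1, p) % p
    x3 = (lam * lam - x1 - x2) % p
    return x3, (lam * (x1 - x3) - y1) % p


def mul(k: int, A: Point) -> Point:
    """Scalar multiplication k*A by double-and-add."""
    R = None
    while k > 0:
        if k & 1:
            R = add(R, A)
        A = add(A, A)
        k >>= 1
    return R


def H(*parts) -> int:
    """Stand-in for H_1: SHA-256 of a textual encoding, reduced mod n."""
    data = "|".join(repr(x) for x in parts).encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % n


def rand_scalar() -> int:
    return secrets.randbelow(n - 1) + 1  # uniform in Z_n^*


def kgc_setup():
    """KGC: master key msk = x and master public key P_pub = xP."""
    x = rand_scalar()
    return x, mul(x, P)


def partial_key(x: int, ID_A: str, P_A: Point):
    """KGC: R_A = r_A P, h_A = H_1(ID_A, R_A, P_A), s_A = r_A + h_A x mod n."""
    r_A = rand_scalar()
    R_A = mul(r_A, P)
    h_A = H(ID_A, R_A, P_A)
    return (r_A + h_A * x) % n, R_A  # D_A = {s_A, R_A}


def user_keys(x: int, ID_A: str):
    """Signer A picks x_A, sends P_A = x_A P to the (simulated) KGC, builds SK_A, PK_A."""
    x_A = rand_scalar()
    P_A = mul(x_A, P)
    s_A, R_A = partial_key(x, ID_A, P_A)
    return (x_A, s_A), (P_A, R_A)  # SK_A, PK_A


msk, P_pub = kgc_setup()
(x_A, s_A), (P_A, R_A) = user_keys(msk, "ID_A")
# Signer A validates its partial key: s_A P == R_A + h_A P_pub.
print(mul(s_A, P) == add(R_A, mul(H("ID_A", R_A, P_A), P_pub)))  # True
```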

Signcryption-Extraction algorithm
After receiving the signature $\sigma_F$, the signcryptor obtains $v'$ as in the Signature-Generation algorithm, calculates $h_A = H_1(ID_A, R_A, P_A)$ and $h = H_3(v', R, PK_A)$, and checks the equation $s(R + hP) = P_A + R_A + h_A P_{pub}$. If the equation does not hold, the signcryptor aborts. Otherwise, it performs the following steps to obtain the signcryption $\sigma_E$:
1. Generate ext(i) according to the CEAS.

Calculate $E_A$ and $E$, and extract the signcryption $\sigma_E$.
The complete pseudocode for signcryption extraction is given in Algorithm 2.
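Step 1 of the extraction and the split into extracted blocks and hidden commitments can be sketched as below. This assumes the CEAS is represented as an explicit list of permitted index sets, that ext(i) is a 0/1 indicator, and that the commitments kept are those of the removed blocks (as in hash-tree CES, where they are needed to rebuild the root); the encryption with $E_A$ and $E$ is not modeled, and ext_vector and extract are hypothetical helpers.

```python
from __future__ import annotations


def ext_vector(selected: set[int], ceas: list[set[int]], n: int) -> list[int]:
    """ext(i) = 1 if block i is extracted and 0 otherwise.

    The selection must be one of the index sets permitted by the CEAS.
    """
    if selected not in ceas:
        raise ValueError("selection is not permitted by the CEAS")
    return [1 if i in selected else 0 for i in range(n)]


def extract(blocks: list[bytes], commitments: list[bytes], ext: list[int]):
    """Split the signed message into the extracted blocks M' and the
    commitments that stand in for the removed blocks."""
    kept = {i: b for i, b in enumerate(blocks) if ext[i]}
    hidden = {i: c for i, c in enumerate(commitments) if not ext[i]}
    return kept, hidden


ceas = [{0, 2}, {0, 1, 2, 3}]       # hypothetical CEAS over 4 blocks
ext = ext_vector({0, 2}, ceas, 4)   # [1, 0, 1, 0]
kept, hidden = extract([b"m0", b"m1", b"m2", b"m3"],
                       [b"c0", b"c1", b"c2", b"c3"], ext)
```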

Signcryption-Verification algorithm
After receiving the signcryption $\sigma_E$, the verifier decrypts $M'$ and $v'$ and verifies the signcryption with the equation $s(R + hP) = P_A + R_A + h_A P_{pub}$. The complete pseudocode for signcryption verification is given in Algorithm 3.
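The verification equation used by both the signcryptor and the verifier can be exercised end to end. This sketch again assumes secp256k1 and SHA-256-based stand-ins for $H_1$ and $H_3$ (with string domain tags), and it assumes the signer computes $s = (x_A + s_A)(l + h)^{-1} \bmod n$ with $R = lP$, which matches the relation $s(l + h) = x_i + r_i + h_i x$ used in the security proof. The string passed as v is only a placeholder for $v'$.

```python
import hashlib
import math
import secrets
from typing import Optional, Tuple

# Assumption: secp256k1 stands in for the scheme's curve.
p = 2**256 - 2**32 - 977
n = int("FFFFFFFF" "FFFFFFFF" "FFFFFFFF" "FFFFFFFE"
        "BAAEDCE6" "AF48A03B" "BFD25E8C" "D0364141", 16)
P = (int("79BE667E" "F9DCBBAC" "55A06295" "CE870B07"
         "029BFCDB" "2DCE28D9" "59F2815B" "16F81798", 16),
     int("483ADA77" "26A3C465" "5DA4FBFC" "0E1108A8"
         "FD17B448" "A6855419" "9C47D08F" "FB10D4B8", 16))
Point = Optional[Tuple[int, int]]  # None is the point at infinity


def add(A: Point, B: Point) -> Point:
    if A is None:
        return B
    if B is None:
        return A
    (x1, y1), (x2, y2) = A, B
    if x1 == x2 and (y1 + y2) % p == 0:
        return None
    if A == B:
        lam = 3 * x1 * x1 * pow(2 * y1, -1, p) % p
    else:
        lam = (y2 - y1) * pow((x2 - x1) % p, -1, p) % p
    x3 = (lam * lam - x1 - x2) % p
    return x3, (lam * (x1 - x3) - y1) % p


def mul(k: int, A: Point) -> Point:
    R = None
    while k > 0:
        if k & 1:
            R = add(R, A)
        A = add(A, A)
        k >>= 1
    return R


def H(tag: str, *parts) -> int:
    """Stand-in for H_1 / H_3: SHA-256 over a domain tag and a textual encoding, mod n."""
    data = "|".join([tag] + [repr(x) for x in parts]).encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % n


def rand_scalar() -> int:
    return secrets.randbelow(n - 1) + 1


def setup_and_keys(ID_A: str):
    """Key-Generation: returns P_pub, SK_A = (x_A, s_A) and PK_A = (P_A, R_A)."""
    x, x_A, r_A = rand_scalar(), rand_scalar(), rand_scalar()
    P_A, R_A = mul(x_A, P), mul(r_A, P)
    s_A = (r_A + H("H1", ID_A, R_A, P_A) * x) % n
    return mul(x, P), (x_A, s_A), (P_A, R_A)


def sign(v: str, SK_A, PK_A):
    """Assumed rule: R = lP, h = H_3(v', R, PK_A), s = (x_A + s_A)(l + h)^-1 mod n."""
    x_A, s_A = SK_A
    while True:
        l = rand_scalar()
        R = mul(l, P)
        h = H("H3", v, R, PK_A)
        if math.gcd(l + h, n) == 1:  # the gcd(.) = 1 check; otherwise retry
            return R, (x_A + s_A) * pow(l + h, -1, n) % n


def verify(v: str, ID_A: str, PK_A, P_pub: Point, sig) -> bool:
    """Check s(R + hP) == P_A + R_A + h_A P_pub."""
    R, s = sig
    P_A, R_A = PK_A
    h_A = H("H1", ID_A, R_A, P_A)
    h = H("H3", v, R, PK_A)
    return mul(s, add(R, mul(h, P))) == add(add(P_A, R_A), mul(h_A, P_pub))


P_pub, SK_A, PK_A = setup_and_keys("ID_A")
sig = sign("v-prime", SK_A, PK_A)
print(verify("v-prime", "ID_A", PK_A, P_pub, sig))  # True
```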

Correctness analysis
In this section, we prove the correctness by the consistency of the message and the accuracy of the equation.

Message consistency
Message consistency means that the message $M'$ extracted in the Signcryption-Extraction algorithm is consistent with the message $M'$ decrypted in the Signcryption-Verification algorithm.
We analyze the consistency between the submessage $M''$ extracted in the Signcryption-Extraction algorithm and the submessage $M'''$ decrypted in the Signcryption-Verification algorithm.
Submessage $M''$ is extracted in the Signcryption-Extraction algorithm using the following equation:
Submessage $M'''$ is decrypted in the Signcryption-Verification algorithm using the following equation:
Since $M''$ and $M'''$ are the same, the SECCESPP scheme satisfies message consistency.
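Message consistency reduces to a round-trip property: whatever the Signcryption-Extraction algorithm encrypts, the Signcryption-Verification algorithm must decrypt to the same bytes. Because the encryption with $E_A$ and $E$ is not reproduced here, this sketch substitutes a toy SHA-256 counter-mode XOR stream under an assumed shared key; it illustrates only the $M'' = M'''$ property and is not a secure replacement for the scheme's encryption (it has no authentication, for example).

```python
import hashlib
import secrets


def keystream(key: bytes, length: int) -> bytes:
    """Toy keystream: SHA-256(key || counter) blocks, truncated to length bytes."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def encrypt(key: bytes, nonce: bytes, m: bytes) -> bytes:
    """XOR the message with the keystream derived from key and nonce."""
    ks = keystream(key + nonce, len(m))
    return bytes(a ^ b for a, b in zip(m, ks))


# For an XOR stream cipher, decryption is the same operation as encryption.
decrypt = encrypt

key, nonce = secrets.token_bytes(32), secrets.token_bytes(16)  # assumed shared key
M2 = b"extracted sub-message"                       # plays the role of M''
M3 = decrypt(key, nonce, encrypt(key, nonce, M2))   # plays the role of M'''
print(M3 == M2)  # True
```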

Equation accuracy
If the equation $s(R + hP) = P_A + R_A + h_A P_{pub}$ in the Signcryption-Verification algorithm holds, the SECCESPP scheme is correct. In this section, we check this equation with the following process.
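A sketch of this check, assuming $R = lP$ and that the signer's $s$ satisfies $s(l + h) \equiv x_A + s_A \pmod n$ (the relation $s(l + h) = x_i + r_i + h_i x$ used in the security proof), with $s_A = r_A + h_A x$, $P_A = x_A P$, $R_A = r_A P$ and $P_{pub} = xP$ from the Key-Generation algorithm. Scalars may be reduced mod n because P has order n.

$$
\begin{aligned}
s(R + hP) &= s(lP + hP) = s(l + h)P \\
&= (x_A + s_A)P \\
&= (x_A + r_A + h_A x)P \\
&= x_A P + r_A P + h_A (xP) \\
&= P_A + R_A + h_A P_{pub}.
\end{aligned}
$$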
Therefore, $s(R + hP) = P_A + R_A + h_A P_{pub}$ holds, and the SECCESPP scheme is correct.

Security analysis
In this section, we first demonstrate the security of the SECCESPP scheme in the random oracle model. Then, we use the formal analysis tool ProVerif to formally analyze its privacy. Finally, we prove the unforgeability of the SECCESPP scheme.

Security under the random oracle model
The SECCESPP scheme is provably secure in the random oracle model [35] and can resist adaptive chosen message attacks. The possible attackers are divided into two types. TYPE 1: The attacker does not have access to the master key but can request or replace user public keys. We impose the following natural restrictions on TYPE 1: (1) the attacker cannot extract the private key for $ID_i$ at any point; (2) the attacker cannot request the private key for an identity whose public key has already been replaced. TYPE 2: The attacker has access to the master key but cannot request or replace user public keys. The restrictions on this type of attacker are as follows: (1) the attacker cannot replace public keys at any point; (2) the attacker cannot extract the private key for $ID_i$ at any point.
The proofs for the two types of attackers are similar. Hence, we only present the proof for the TYPE 1 attacker, who does not have access to the master key but can request or replace user public keys.
Definition (ECDLP): For a random number $x \in Z_n^*$, given two points P and Q such that $Q = x \cdot P$, the goal of the ECDLP is to calculate x.
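To illustrate the definition, the sketch below solves the ECDLP by exhaustive search on the textbook curve $y^2 = x^3 + 2x + 2$ over $F_{17}$, whose point P = (5, 1) generates a group of prime order 19. The toy curve is our choice for illustration only. The search takes time linear in the group order, which is why it is hopeless for group orders around $2^{256}$.

```python
# Toy curve y^2 = x^3 + 2x + 2 over F_17; P = (5, 1) has prime order 19.
p, a = 17, 2
P = (5, 1)


def add(A, B):
    """Affine point addition; None is the point at infinity."""
    if A is None:
        return B
    if B is None:
        return A
    (x1, y1), (x2, y2) = A, B
    if x1 == x2 and (y1 + y2) % p == 0:
        return None
    if A == B:
        lam = (3 * x1 * x1 + a) * pow(2 * y1, -1, p) % p
    else:
        lam = (y2 - y1) * pow((x2 - x1) % p, -1, p) % p
    x3 = (lam * lam - x1 - x2) % p
    return x3, (lam * (x1 - x3) - y1) % p


def mul(k, A):
    """Scalar multiplication k*A by double-and-add."""
    R = None
    while k > 0:
        if k & 1:
            R = add(R, A)
        A = add(A, A)
        k >>= 1
    return R


def ecdlp_brute_force(Q, P):
    """Return the smallest x >= 1 with x*P == Q by trying every multiple of P."""
    R, x = P, 1
    while R != Q:
        R = add(R, P)
        x += 1
        if R is None:
            raise ValueError("Q is not a nonzero multiple of P")
    return x


Q = mul(7, P)
print(ecdlp_brute_force(Q, P))  # 7
```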
Theorem: In the random oracle model, the SECCESPP scheme is secure if the ECDLP is intractable.
Proof: Assume that an attacker B can break the SECCESPP scheme. We construct an algorithm F that uses B to solve the ECDLP.
Initialization Phase: F initializes P and Q and transmits the public parameters params = $\{F_p, E/F_p, G, P, P_{pub} = Q, H_1, H_2, H_3, H_4\}$ to attacker B.
Queries Phase: Attacker B executes the following queries, and F adaptively responds to these queries.
1. User query: When attacker B performs a user query on $ID_i$, challenger F selects a random number $t \in \{1, 2, \ldots, q_c\}$ and then executes the following pseudocode.
Returns $(ID_i, R_i, P_i, s_i, x_i, h_i)$ and adds it to $L_{H_1}$; }
5: else {
6: F selects $a, b, c \in Z_n^*$;
7: Computes …
6. Signcryption Extraction query: When B extracts the signcryption of $(v', \ldots$ List, that means the user's public key has been replaced. Thus, F lets $s = a$, $R = a^{-1} h_i P_{pub}$ and outputs $(R, s)$ as the signcryption. When $ID_i \neq ID_t$, if ξ = 1, F returns 'failure'; otherwise (ξ = 0), F lets $s = a$, $R = a^{-1} h_i P_{pub}$ and outputs $(R, s)$ as the signcryption.
Forgery: B stops the queries and outputs a valid signcryption $(R, s^{(1)})$. If $ID_i \neq ID^*$, 'failure' is declared. Otherwise, the attack is successful. F then applies the generalized Forking lemma [36] for certificateless signatures: it inputs two different $H_2$ values, repeats the above process, and obtains two different signatures $(R, s^{(2)})$ and $(R, s^{(3)})$. Then $s^{(k)}(R + h^{(k)}P) = P_i + R_i + h_i P_{pub}$ for k = 1, 2, 3. Additionally, $R = lP$, $P_i = x_i P$, $R_i = r_i P$ and $P_{pub} = xP$; thus, $s^{(k)}(l + h^{(k)}) = x_i + r_i + h_i x$ for k = 1, 2, 3.
In the above query process, $x_i$, $r_i$ and $x$ are unknown, but F can solve for the three unknowns and output x; therefore, the elliptic curve discrete logarithm problem can be solved.
To solve a given instance of the ECDLP, F must successfully complete the following events:
$T_1$: F does not abort during the whole process.
$T_2$: $(R, s)$ is a valid signcryption forgery of $(ID^*, PK_{ID^*}, m^*)$.
$T_3$: $q_s$ is finite and ξ = 1.
The probability that F succeeds is
$$Adv_F^{ECDLP} = \Pr[T_1 \wedge T_2 \wedge T_3] = \Pr[T_1]\Pr[T_2 \mid T_1]\Pr[T_3 \mid T_1 \wedge T_2].$$
Claim 1: If $T_1$ occurs, the probability that the Partial Key Extraction query succeeds during the attack of F is $(1 - 1/q_c)^{q_{PKE_x}}$, where $q_{PKE_x}$ is the number of Partial Key Extraction queries.
Claim 2: The probability that the Secret Value query succeeds is $(1 - 1/q_c)^{q_{VE_x}}$, where $q_{VE_x}$ is the number of Secret Value queries.
Claim 3: The probability that the Signcryption Extraction query succeeds is $(1 - x/q_c)(1 - x)^{q_s}$, where $q_s$ is the number of Signcryption Extraction queries.
As a result, $\Pr[T_1] \geq (1 - 1/q_c)^{q_{PKE_x} + q_{VE_x}}(1 - x)^{q_s}$. We state that $\Pr[T_2 \mid T_1] = \varepsilon$, so $\Pr[T_3 \mid$ … is a constant; since ε is non-negligible, $Adv_F^{ECDLP}$ is also non-negligible, which contradicts the hypothesis. Therefore, the SECCESPP scheme is secure in the random oracle model [29].

Privacy
We analyze the privacy of a signed message in the SECCESPP scheme using the formal analysis tool ProVerif [37,38], modeling privacy as confidentiality. First, the applied pi calculus is used to formalize the SECCESPP scheme. Then, ProVerif is used to analyze privacy.
Function and equational theory. This part describes the functions and equations used in the modeling process, in which the applied pi calculus is used to formalize the SECCESPP scheme. Fig 2 depicts the function and equational theory.
The function and equational theory of the SECCESPP scheme mainly includes the public key encryption algorithm En(x, Pu), which encrypts the message x with the public key Pu, and the decryption algorithm De(y, Pr), which decrypts a ciphertext y produced by En(x, Pu) with the corresponding private key Pr. The function Pu(y) takes a private value y as input and outputs a public key, and the function Pr(y) takes a private value y as input and outputs a private key.
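Read as a rewrite rule, the decryption behavior described above amounts to a single equation: decrypting, with the private key derived from a private value y, a ciphertext produced under the public key derived from the same y returns the plaintext x. This is a sketch of our reading of the description; Fig 2 remains the reference for the exact equational theory used in the model.

$$
\mathrm{De}\big(\mathrm{En}(x, \mathrm{Pu}(y)), \mathrm{Pr}(y)\big) = x
$$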

Process
The whole process in Fig 3 consists of three processes: the signer process processSig (Fig 4), the signcryptor process processSc (Fig 5) and the verifier process processVer (Fig 6). Together, they constitute the main process shown in Fig 3.
The signer first divides the entire message into n blocks and generates a commitment for each block. The signer then calculates R and h and checks whether the value of gcd() equals 1. If it does, the signer forwards the signature $\sigma_F$ to the signcryptor process processSc as message $m_1$ over the public channel c.
The signcryptor receives message $m_1$ from the signer process over the public channel c. It then extracts the selected message blocks and the corresponding commitments, calculates h and $h_A$, and compares $s \cdot (R + hP)$ with $P_A + R_A + h_A \cdot P_{pub}$. If they are equal, it encrypts the extracted message $M'$ and forwards the content extraction signcryption $\sigma_E$ to the verifier process processVer as message $m_2$ over the public channel c.
The verifier receives message $m_2$ from the signcryptor process over the public channel c. It then decrypts $M'$ and v[i] and verifies the signcryption with the equation $s(R + hP) = P_A + R_A + h_A P_{pub}$. Finally, the verifier outputs message $m_3$ as the verification result.
ProVerif for automatic privacy verification. The privacy of the SECCESPP scheme is modeled as the confidentiality of the signed message M. The query query attacker(M′) is used to model the confidentiality of the signed message and is added to the formal model in Fig 7. Result analysis. The result in Fig 8 is true, which means that the attacker cannot obtain $M'$. The SECCESPP scheme therefore preserves privacy, because $M'$ is encrypted before it is sent to the verifier and attackers can only obtain encrypted message blocks.

Unforgeability
In a certificateless public key cryptosystem, the KGC generates a partial private key, and the user generates a secret value and then derives the full private key and the public key from the secret value and the partial private key. This method solves the key management issue of identity-based cryptosystems and the certificate problem of PKI cryptosystems. However, the KGC might forge a user's signature, because it holds part of the user's private key. Therefore, we divide attacks on the SECCESPP scheme into two types:
Adversary $A_I$: ordinary user attack. The attacker cannot obtain the master key but can replace public keys.
Adversary $A_{II}$: malicious KGC attack. The attacker has the master key and can generate any user's partial private key, but it cannot replace user public keys.
Because Adversary $A_I$ is similar to Adversary $A_{II}$ but Adversary $A_{II}$ is more representative, we provide the proof of unforgeability for Adversary $A_{II}$.
Theorem: If Adversary $A_{II}$ outputs a valid content extraction signcryption $\sigma_E$ that it has not obtained from a Signcryption Extraction query, then the attacker succeeds; that is, the SECCESPP scheme is broken.
Lemma: The SECCESPP scheme is said to be unforgeable under an adaptive chosen message attack if no Adversary $A_{II}$ that makes at most $q_k$ User queries, $q_{PK}$ Key Extract queries and $q_s$ Signcryption Extraction queries within bounded time wins the game with non-negligible probability $\varepsilon$.
Game: The adaptive chosen message attack game for the SECCESPP scheme in the Adversary $A_{II}$ case is played between challenger $C$ and Adversary $A_{II}$.
Proof: The security model of unforgeability consists of three phases: the Setup Phase, the Queries Phase and the Forgery Phase. In the Queries Phase, Adversary $A_{II}$ performs multiple queries, including User queries, Key Extraction queries and Signcryption queries, and challenger $C$ gives the corresponding responses.
Setup Phase: Because Adversary $A_{II}$ makes multiple queries, challenger $C$ maintains lists $l_1$, $l_2$ and $l_3$, which are initially empty.
Initialization: Challenger $C$ runs the Initialization algorithm. On input of the security parameter $k$, challenger $C$ generates the master key $x \in \mathbb{Z}_n^*$, the system public key $P_{pub}$ and the system parameters params, and sends them to Adversary $A_{II}$, where params $= \{\mathbb{F}_p, E/\mathbb{F}_p, G, P, P_{pub} = Q, H_1, H_2, H_3, H_4\}$. Set $a \in \mathbb{Z}_n^*$.
Queries Phase: Adversary $A_{II}$ executes the following queries, and challenger $C$ adaptively responds to them.
1. User query: Challenger $C$ maintains a hash list $H_1$-list of two-tuples $(ID_t, Q_t)$, which is initially empty. When Adversary $A_{II}$ issues a query on $ID_t$, challenger $C$ checks whether a record for $ID_t$ exists in $H_1$-list. If so, challenger $C$ returns the corresponding record; otherwise, challenger $C$ makes the following responses: $q_{PK}$, where $q_{PK}$ is the number of public key queries.
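The User query above follows the usual lazy-sampling simulation of a random oracle: the challenger answers each new identity with a fresh random value and records it, so a repeated query receives the same answer. The Python sketch below shows only this generic bookkeeping. It treats $Q_t$ as a scalar in $\mathbb{Z}_n^*$ (an assumption), the class name `RandomOracleH1` is hypothetical, and the challenger's actual response for a new $ID_t$, which is elided in the text above, is not reproduced.

```python
import secrets


class RandomOracleH1:
    """Challenger-side simulation of H_1 via a lazily filled list of (ID_t, Q_t) tuples."""

    def __init__(self, n):
        self.n = n              # group order; answers are drawn from Z_n^*
        self.h1_list = []       # H1-list of two-tuples (ID_t, Q_t), initially empty

    def query(self, id_t):
        # Consistency: return the stored answer if ID_t was already queried.
        for stored_id, q_t in self.h1_list:
            if stored_id == id_t:
                return q_t
        # Otherwise sample a fresh uniform value, record it, and return it.
        q_t = secrets.randbelow(self.n - 1) + 1
        self.h1_list.append((id_t, q_t))
        return q_t
```

A linear scan mirrors the list in the proof; a dictionary keyed by $ID_t$ would give the same behaviour with constant-time lookups.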

Comparison and discussion
In the SECCESPP scheme, we use scalar multiplication on elliptic curves instead of bilinear pairing, thereby reducing the computational cost of the signing and verification processes. For efficiency, we compare the proposed scheme with the related schemes in [14][15][16][17][18][19] in two aspects: theoretical calculation and practical running time. For theoretical calculation, Table 1 compares the schemes in terms of four operations: exponential operation (exp), scalar multiplication (sca), bilinear pairing operation (par) and hash function (has), where $n$ is the number of submessages, $m$ is the number of extracted messages, and $m_{CEAS}$ is the number of submessages in the content extraction access structure CEAS. The SECCESPP scheme requires $(2n+2)\,\mathrm{has} + 4\,\mathrm{sca}$ and $(m+2)\,\mathrm{has} + 2\,\mathrm{sca}$, which are the lowest operation counts among the compared schemes. Hence, the SECCESPP scheme is highly efficient in terms of theoretical calculation. For practical running time, the hardware platform consists of an Intel Core m3-6Y30 @0.90 GHz processor and 8 GB of memory. The software environment includes the 64-bit Windows 10 operating system and the Miracl library [39], with the following parameters: the supersingular elliptic curve $E/\mathbb{F}_p: y^2 = x^3 + x$ is selected, with embedding degree 2, where the prime $p$ satisfies $2^{510} < p < 2^{511}$, $p + 1 = 12qr$ and $q = 2^{159} + 2^{17} + 1$. The Tate pairing is defined on $E/\mathbb{F}_p: y^2 = x^3 + x$. The elliptic curve used for scalar multiplication is defined over a prime field with $p = 2^{160} - 2^{31} - 1$.
To reduce the effect of random variation, each simulation experiment is performed five times and the results are averaged. The computing times of the four operations are shown in Table 2. From Tables 1 and 2, we calculate the running times in Table 3.
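The step from Tables 1 and 2 to Table 3 amounts to multiplying each operation count by its measured unit time, where each unit time is an average over five runs. A minimal Python sketch of both steps follows. The function names are hypothetical, and we read $(2n+2)\,\mathrm{has} + 4\,\mathrm{sca}$ as the signcryption-and-extraction cost and $(m+2)\,\mathrm{has} + 2\,\mathrm{sca}$ as the verification cost; the unit times in the usage note are arbitrary placeholders, not the measured values of Table 2.

```python
import time
from statistics import mean


def average_runtime(fn, runs=5):
    """Mean wall-clock time (seconds) of fn() over `runs` executions."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return mean(samples)


def seccespp_cost(n, m, t_has, t_sca):
    """Operation-count cost model for SECCESPP from Table 1.

    n: number of submessages; m: number of extracted messages;
    t_has, t_sca: unit times of a hash and a scalar multiplication
    (in practice taken from Table 2).
    Returns (signcryption-and-extraction time, verification time).
    """
    sign_extract = (2 * n + 2) * t_has + 4 * t_sca
    verify = (m + 2) * t_has + 2 * t_sca
    return sign_extract, verify
```

For example, with $n = 10$, $m = 4$ and placeholder unit costs $t_{has} = 1$ and $t_{sca} = 10$ (arbitrary units), `seccespp_cost(10, 4, 1, 10)` returns `(62, 26)`.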
The running time of a scheme consists of the signcryption and extraction time and the verification time. Table 3 and Fig 9 show that the scalar multiplication operation on elliptic curves takes much less time than the bilinear pairing and exponential operations. Hence, the theoretical calculation analysis is consistent with the practical running time analysis, and the SECCESPP scheme is more efficient than the compared schemes.

Conclusion
To improve the efficiency of content extraction signatures and to provide privacy protection for them, we proposed the SECCESPP scheme, in which scalar multiplication on elliptic curves replaces inefficient bilinear pairing in a certificateless public key cryptosystem, and the signcryption idea is borrowed to provide privacy protection. The SECCESPP scheme is provably secure based on the elliptic curve discrete logarithm problem in the random oracle model. It is correct and privacy-preserving, and it is more efficient than related schemes [14][15][16][17][18][19].
In the future, we will apply the SECCESPP scheme to off-chain data access in blockchains to provide security and privacy.