Abstract
Background
A computed tomography image (CI) sequence can be regarded as time-series data composed of a large number of adjacent, visually similar CIs. The computational and I/O costs of similarity measurement, encryption, and decryption during similarity retrieval over large CI sequences (CISs) are extremely high. Deploying all retrieval tasks in the cloud therefore places an excessive computing load on it, which severely degrades retrieval performance.
Methodologies
To tackle the above challenges, the paper proposes a progressive privacy-preserving Batch Retrieval scheme for lung CISs based on edge-cloud collaborative computation, called the BRS method. Four supporting techniques enable the BRS method: 1) batch similarity measure for CISs, 2) CIB-based privacy-preserving scheme, 3) uniform edge-cloud index framework, and 4) edge buffering.
Citation: Zhuang Y, Jiang N (2022) Progressive privacy-preserving batch retrieval of lung CT image sequences based on edge-cloud collaborative computation. PLoS ONE 17(9): e0274507. https://doi.org/10.1371/journal.pone.0274507
Editor: Omar A. Alzubi, Al-Balqa Applied University Prince Abdullah bin Ghazi Faculty of Information Technology, JORDAN
Received: February 17, 2022; Accepted: August 28, 2022; Published: September 15, 2022
Copyright: © 2022 Zhuang, Jiang. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All relevant data are available from https://luna16.grand-challenge.org/.
Funding: This work is supported in part by Zhejiang Provincial Natural Science Foundation of China under Grant No. LY22F020010; the Zhejiang Public Welfare Technology Application Research Project under grant No. LGF22H180039; the Zhejiang Medical and Health Research Project under grant No. 2019RC070.
Competing interests: The authors have declared that no competing interests exist.
Introduction
With the rapid growth in the number of medical images and the increasing demand for remote diagnosis, content-based mobile retrieval of Computed tomography Image Sequences (CISs) in telemedicine systems (TSs) [1] has played an increasingly important role in disease diagnosis in recent years. Fig 1 illustrates an example of a lung CIS consisting of a large number of adjacent and visually similar Computed tomography Images (CIs). As one of the main tasks of the TS, mobile-terminal-based high-resolution CIS retrieval enables medical professionals to identify lesion tissues in aberrant CIs and carry out computer-assisted diagnosis and treatment.
The motivations of the CIS retrieval in edge-cloud collaborative computing mode are based on the following key observations:
- Traditional CI retrieval uses a single CI, rather than the whole CIS, as the retrieval input for similarity comparison. A single CI cannot adequately model the whole retrieval CIS, which leads to poor retrieval precision;
- As CIS data is patients’ private information [2], it should be encrypted during retrieval processing; otherwise, personal information may be leaked;
- To better understand the condition of their patients during remote consultations, doctors frequently retrieve and examine CISs in real time, which involves high computational costs as well as intensive transmission of the CISs. Deploying and executing all expensive retrieval and computing tasks in the cloud therefore results in significant computational overhead and degrades retrieval performance. Edge computing [3] emerged to reduce the load of cloud computing. As a new distributed computing mode, edge computing makes up for some shortcomings of cloud computing and diverts most computing tasks to edge device nodes (i.e., edge servers (ESs)) around the mobile terminal. This not only significantly reduces the computing load on the cloud, but also reduces the transmission cost [4, 5] to support retrieval in real time [6];
- Mobile terminals have constrained resources, such as battery reserves, screen resolution, and computational power. In addition, data transmission is negatively affected by unstable network bandwidth, which delays data retrieval and transmission, especially in rural areas with inadequate mobile communication infrastructure [1].
Based on the above analysis, the paper presents a privacy-preserving Batch Retrieval method for large lung CISs in the edge-cloud computing network, called the BRS, which exploits the similarity of nearby CIs in a sequence. Few studies have addressed how to speed up batch similarity retrieval of large CISs in an edge-cloud collaborative computing environment. Specifically, when a user submits a retrieval CIS (XR), an index mechanism at the edge layer, called eIndex, is first used to quickly judge whether the edge buffer holds any answer CISs similar to XR. If so, a high-dimensional similarity retrieval of the remaining answer CISs, supported by the cIndex, is carried out by accessing the cloud; otherwise, the similarity retrieval of all CISs supported by the cIndex is performed directly in the cloud. Finally, the answer CISs are returned to the user node. Extensive experiments demonstrate the effectiveness, efficiency, and scalability of the BRS method.
Background
Over the past fifty years, content-based image retrieval(CBIR) has been a persistent and difficult research problem [6–10]. Due to the ‘semantic gap’, however, the retrieval accuracies are still not satisfactory.
As one of the key subfields of the CBIR, content-based medical image retrieval (CBMIR) research has become increasingly popular in recent years. The first CBMIR system built for high-resolution lung CIs is ASSERT [11]. After that, many prototype systems were developed, including IRMA [12], FIRE [13], and others. A noisy image bag-based technique for retrieving medical images was presented by Huang et al. [14]. To further reduce the ‘semantic gap’, Huang et al. [15] developed a relevance feedback technique for the CBMIR based on a noisy-smoothing model. Kitanovski et al. [16] designed a multi-modality-based CBMIR system. Lan et al. [17] proposed a simple texture feature extraction algorithm for the CBMIR. A multi-panel medical image segmentation framework for the CBMIR system was supplied by Ali et al. [18]. Based on the fusion of the wavelet optimization and adaptive block truncation coding, Kasban et al. [19] built a reliable CBMIR system. Tuyet et al. [20] used the deep learning techniques to support the salient region-based CBMIR.
Since the aforementioned CBMIR systems are based on single-PC mode, their retrieval performances are not satisfactory when dealing with a great deal of medical images [21]. Anbarasi et al. [22] developed a distributed CBMIR system using distributed database techniques. Charisi et al. [23] designed a parallel CBMIR scheme in a peer-to-peer(P2P) network. Based on the hybrid features, Depeursinge et al. [24] proposed a mobile access approach to peer-reviewed medical information. Although Zhuang et al. [25] put forward an efficient and robust CBMIR technique in a mobile wireless network, the retrieval efficiency is poor since the effectiveness of the load balance strategy needs to be further improved. Based on the previous work [25], Zhuang et al. [26] introduced a high-performance batch retrieval technique for medical images in wireless network from a standpoint of multi-retrieval optimization to further improve the retrieval efficiency. A mobile teleradiology system [27] is appropriate for streamlining the CBMIR procedure. For telemedicine applications, Chitra et al. [28] suggested an enhanced retrieval approach for brain images utilizing carrier frequency offset adjusted OFDM technique. To solve the ‘semantic gap’, Jiang et al. [29] introduced a novel framework of mobile similarity retrieval of medical images based on a crowdsourcing model.
On the basis of the CI analysis, Lei et al. [30] developed a sparse CNN model-based high-resolution CI retrieval technique. Yu et al. [31] presented a liver CI retrieval algorithm based on a non-tensor product wavelet. Based on an adder combining two local bit plane-based dissimilarities, Hatibaruah et al. [32] introduced a novel CI retrieval approach. Hwang et al. [33] applied CBIR and CNN techniques to enable diffuse interstitial lung disease retrieval. To facilitate the effective diagnosis of lung cancer, Alzubi et al. [34] designed a boosted neural network ensemble classification approach.
Despite extensive study of CI retrieval, the majority of approaches still rely solely on single-CI retrieval without taking CIS retrieval into account. Meanwhile, very little research has addressed CIS retrieval in the collaborative edge-cloud environment.
Preliminaries and preprocessing
Firstly, the main symbol notations are listed in Table 1.
Fig 2 depicts the three-layer network architecture of the BRS system, which is formally stated in Definition 1.
Definition 1 (MECN). A mobile edge-cloud network (MECN) is represented by a graph which can be modeled by a triplet:
MECN = < N, E, T > (1)
where
• N means a set of nodes, formally represented as N = NU ∪ NE ∪ NC, in which
i) NU represents a user node that is used for: 1) submitting the retrieval; 2) decryption of the RIBs; 3) acquisition, reconstruction and display of the CISs;
ii) NE represents the edge nodes that are used for: 1) temporarily storing the CISs buffered at NE; and 2) sending back the partial answer CISs to NU;
iii) NC represents the cloud nodes which are used for: 1) partition processing of the IBs; 2) encryption and storage of the CIBs; 3) storing the NIB replicas; and 4) sending back the answer CISs to NU;
• E denotes a collection of edges representing the different network bandwidths for data communication at time T, formally denoted as: E = < e1, e2, …, e|E| >, where ek = (Ni, Nj) refers to the k-th edge in E, which connects Ni and Nj.
Definition 2 (POA). A pathological object area (POA) in a CI can be modeled by a two-tuple:
POA = < i, PO > (2)
where i is the ID number of the POA, and PO is the coordinate of the POA in the CI.
According to Definition 2, a non-POA part of a CI is denoted as NPOA.
Definition 3 (IB). An image block (IB) can be modeled as a triplet:
IB = < bid, PO, TP > (3)
where bid refers to the block ID, PO is the coordinate of the IB in the CI, and TP is the transmission priority of the block.
Definition 4 (CIB). Given a POA(i.e., POAk) in a CI, a correlated image block (CIB) of POAk is an IB which is contained in or intersects with it, formally denoted as: CIB = {IBbid|IBbid ∩ POAk ≠ ∅}, where k ∈ [1, |POA|] and |POA| means the number of the POAs in the CI.
Definition 5 (NIB). A NIB is an IB that is contained by a NPOA in a CI, formally represented by: NIB = {IBbid|IBbid ∩ NPOA = IBbid}.
As indicated in the Introduction, the CISs usually contain some lesion tissues on which doctors focus. The region of such a lesion in the CIS is called the pathological object area (POA). In the preprocessing step, the POAs are first marked by medical specialists; then each CI in the sequence is equally divided into IB (i.e., NIB and CIB) replicas, with the CIBs being encrypted and saved at their original pixel resolutions while the NIBs are stored at a lower resolution. As illustrated in Fig 3, there are two POAs (A and B) and one NPOA (i.e., C) in an example CI, which is segmented into 6 × 8 IBs marked by red dashed lines.
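The block partition and the CIB/NIB split of Definitions 4 and 5 can be illustrated with a short Python sketch. It assumes, for illustration only, that each POA is an axis-aligned pixel rectangle and that the image size is divisible by the grid size; the function names and the row-major block numbering are hypothetical and not taken from the paper.

```python
from typing import List, Tuple

# (x0, y0, x1, y1): half-open pixel rectangle; an assumed POA/IB representation
Rect = Tuple[int, int, int, int]


def split_into_blocks(width: int, height: int, rows: int, cols: int) -> List[Tuple[int, Rect]]:
    """Divide a CI into rows x cols equal IBs, numbered row by row."""
    bw, bh = width // cols, height // rows
    blocks = []
    for r in range(rows):
        for c in range(cols):
            bid = r * cols + c
            blocks.append((bid, (c * bw, r * bh, (c + 1) * bw, (r + 1) * bh)))
    return blocks


def intersects(a: Rect, b: Rect) -> bool:
    """True if the two rectangles share at least one pixel."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def classify_blocks(blocks: List[Tuple[int, Rect]], poas: List[Rect]) -> Tuple[List[int], List[int]]:
    """Return (CIB ids, NIB ids): an IB touching any POA is a CIB, otherwise a NIB."""
    cibs, nibs = [], []
    for bid, rect in blocks:
        if any(intersects(rect, poa) for poa in poas):
            cibs.append(bid)
        else:
            nibs.append(bid)
    return cibs, nibs
```

In such a scheme the CIB list would be the set of blocks to encrypt and keep at full resolution, while the NIB list could be down-sampled.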
Methodologies
In this section, we first introduce four supporting techniques based on which a BRS algorithm is proposed next.
Supporting techniques
To better facilitate the batch retrieval of the lung CISs in the MECN, in this subsection, we introduce four supporting techniques: 1) batch similarity measure for CISs, 2) CIB-based privacy preserving scheme, 3) uniform edge-cloud index framework, and 4) edge buffering.
Batch similarity measure for CISs.
As mentioned before, a CIS Xi is a time-series data which can be modeled by a vector: Xi = {CI1, CI2, …, CI|Xi|}, where |Xi| means the number of CIs in Xi. Due to the large amount of the CIs in a CIS, to effectively reduce the high computation cost in the CIS similarity matching, we propose a representative CI(RCI)-based batch similarity measurement of the CISs.
Before introducing the batch similarity measure, how to extract the RCIs is a challenging issue. As summarized in Algorithm 1, given a CIS Xi, a RCI extraction processing of Xi is first performed to obtain ||Xi|| RCIs from a CIS, where ||Xi|| means the number of RCIs in Xi, d(x, y) is stated in Table 1, and ε is a small positive threshold.
Algorithm 1 RCI extraction (Xi)
input: Xi
output: the ||Xi|| RCIs of Xi
1: j←1, ||Xi||←1; add CI1 as the 1st RCI;
2: while (j < |Xi|) do
3: if d(CIj, CIj+1) > ε then
4: ||Xi||++;
5: add CIj+1 as the ||Xi||-th RCI;
6: end if
7: j++;
8: end while
9: return the ||Xi|| RCIs
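The extraction idea of Algorithm 1 can be sketched in Python as follows. This is a minimal illustration under two assumptions that are not stated explicitly in the text: the first CI always serves as an RCI, and the scan advances by one CI per iteration. The distance function `dist` stands in for d(x, y) of Table 1 and is supplied by the caller.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def extract_rcis(cis: Sequence[T], dist: Callable[[T, T], float], eps: float) -> List[T]:
    """Keep the first CI, then every CI that differs from its predecessor by more than eps."""
    if not cis:
        return []
    rcis = [cis[0]]  # assumption: CI_1 is always a representative
    for prev, cur in zip(cis, cis[1:]):
        if dist(prev, cur) > eps:
            rcis.append(cur)
    return rcis
```

A larger eps yields fewer RCIs and therefore cheaper matching, at the cost of a coarser description of the sequence.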
Once the ||Xi|| RCIs are extracted from Xi, Xi can be re-represented as: Xi = {RCI1, RCI2, …, RCI||Xi||}. So given two CISs (Xm and Xn), their batch similarity can be defined as follows:
(4)
As can be seen from Eq (4), the similarity of two CISs can be measured by the percentage of similar RCIs in the two corresponding CISs.
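One possible reading of "the percentage of similar RCIs" is sketched below in Python. The exact form of Eq (4) is not reproduced here; the sketch assumes that an RCI counts as similar when some RCI of the other sequence lies within a threshold `eps`, and that the score is the matched fraction over both sequences. The function name and this symmetric definition are illustrative assumptions.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def batch_similarity(rcis_m: Sequence[T], rcis_n: Sequence[T],
                     dist: Callable[[T, T], float], eps: float) -> float:
    """Fraction of RCIs (over both sequences) that have a match within eps in the other one."""
    if not rcis_m or not rcis_n:
        return 0.0
    matched_m = sum(1 for a in rcis_m if any(dist(a, b) <= eps for b in rcis_n))
    matched_n = sum(1 for b in rcis_n if any(dist(a, b) <= eps for a in rcis_m))
    return (matched_m + matched_n) / (len(rcis_m) + len(rcis_n))
```

Under this reading the score lies in [0, 1], and 1 minus the score could serve as the distance that is compared against a retrieval radius.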
CIB-based privacy-preserving scheme.
Before introducing the CIB-based privacy-preserving scheme, we first give a definition.
Definition 6 (POAR). Given a POA (i.e., POAj), its corresponding POA-related region (POAR) consists of all CIBs in POAj, subject to the following criterion:
(5)
where POAR(POAj) means the corresponding POAR of POAj, |•| denotes the number of CIBs in •.
In Fig 3, there are two POAs (i.e., A and B) in the CI that is equally segmented into 6 × 8 IBs. Based on Definition 6, the corresponding POARs of the two POAs are represented by the green shadow areas which consist of 20 CIBs. Since the nearby CIB numbers have the characteristics of continuous distribution, it is easier to use these CIBs to reconstruct the original CI. As a result, the objective of the encryption strategy is to disrupt the continuity of the ID numbers of the nearby CIBs in the CI by encoding the ID numbers of the CIBs such that the CI reconstruction is hard to perform.
So for each CIB in a CI, we first introduce an encoding scheme (IBID) for the ID numbers of the above CIBs, which is represented in Eq (6):
IBID = SID ⋅ c1 + IID ⋅ c2 + rID ⋅ c3 + cID (6)
where SID means the ID of the CIS that the CIB belongs to, IID refers to the ID of the CI in which the CIB is contained, rID is the row ID, cID is the column ID, and c1, c2, c3 are stretch constants with c1 >> c2 >> c3.
Based on the ID numbers of the CIBs in Eq (6), their encryption and decryption strategies are described as follows:
1) Encryption strategy:
Algorithm 2 details the steps of the CIB-based encryption processing in which the ID numbers of the CIBs are encrypted, where δ and ω are two key values and δ < SID, δ < IID and ω < rID.
Algorithm 2 Encryption()
input: SID, IID, rID, cID of a CIB
output: IBID: the encrypted ID number of the CIB
1: if rID is an odd number and cID is an odd number then
2: IBID = (SID + δ) ⋅ c1 + (IID − δ) ⋅ c2 + (rID + ω) ⋅ c3 + cID
3: else if rID is an odd number and cID is an even number then
4: IBID = (SID + δ) ⋅ c1 + (IID − δ) ⋅ c2 + (rID − ω) ⋅ c3 + cID
5: else if rID is an even number and cID is an odd number then
6: IBID = (SID − δ) ⋅ c1 + (IID + δ) ⋅ c2 + (rID + ω) ⋅ c3 + cID
7: else
8: IBID = (SID − δ) ⋅ c1 + (IID + δ) ⋅ c2 + (rID − ω) ⋅ c3 + cID
9: end if
10: return the encrypted IBID
2) Decryption strategy:
Similarly, for the encrypted CIBs, their corresponding decryption processing is discussed in Algorithm 3.
Algorithm 3 Decryption()
input: SID, IID, rID, cID of an encrypted CIB
output: IBID: the decrypted ID number of the CIB
1: if rID is an odd number and cID is an odd number then
2: IBID = (SID − δ) ⋅ c1 + (IID + δ) ⋅ c2 + (rID − ω) ⋅ c3 + cID
3: else if rID is an odd number and cID is an even number then
4: IBID = (SID − δ) ⋅ c1 + (IID + δ) ⋅ c2 + (rID + ω) ⋅ c3 + cID
5: else if rID is an even number and cID is an odd number then
6: IBID = (SID + δ) ⋅ c1 + (IID − δ) ⋅ c2 + (rID − ω) ⋅ c3 + cID
7: else
8: IBID = (SID + δ) ⋅ c1 + (IID − δ) ⋅ c2 + (rID + ω) ⋅ c3 + cID
9: end if
10: return the decrypted IBID
For instance, assume that SID is 7, IID is 4, and c1, c2, c3 are 1000, 100 and 10, respectively; the original ID numbers of the CIBs before encryption are then depicted in Fig 4(a). Fig 4(b) shows the ID numbers of the CIBs after encryption when δ = 3 and ω = 0.6.
Fig 4(a) shows that the ID numbers of nearby original CIBs are continuously distributed before encryption. After encryption, as illustrated in Fig 4(b), the ID numbers of nearby CIBs are discretely distributed. Therefore, encrypting the CIBs makes it more difficult to find the corresponding nearby CIBs during CI reconstruction.
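The parity-dependent shifts of Algorithms 2 and 3 can be tried out with the small Python sketch below, using the constants of the worked example (c1 = 1000, c2 = 100, c3 = 10). The helper names are hypothetical. Two choices are assumptions of this sketch rather than statements of the paper: ω is held as an exact fraction to avoid floating-point error, and decryption first undoes the row shift (whose sign depends only on the unchanged cID) so that the parity of the original rID is known before the SID/IID shifts are undone.

```python
from fractions import Fraction

C1, C2, C3 = 1000, 100, 10  # stretch constants from the worked example


def pack(sid, iid, rid, cid):
    """Combine the four fields into one ID number: SID*c1 + IID*c2 + rID*c3 + cID."""
    return sid * C1 + iid * C2 + rid * C3 + cid


def encrypt_fields(sid, iid, rid, cid, delta, omega):
    """Shift the fields as in Algorithm 2: delta sign follows rID parity, omega sign follows cID parity."""
    s = delta if rid % 2 == 1 else -delta
    w = omega if cid % 2 == 1 else -omega
    return sid + s, iid - s, rid + w, cid


def decrypt_fields(sid_e, iid_e, rid_e, cid, delta, omega):
    """Undo the shifts; recovers rID first (assumed order), then SID and IID."""
    w = omega if cid % 2 == 1 else -omega
    rid = rid_e - w
    s = delta if rid % 2 == 1 else -delta
    return sid_e - s, iid_e + s, int(rid), cid


DELTA, OMEGA = 3, Fraction(3, 5)  # key values of the example (omega = 0.6)
plain = pack(7, 4, 1, 1)                                        # ID before encryption
cipher = pack(*encrypt_fields(7, 4, 1, 1, DELTA, OMEGA))        # ID after encryption
```

Neighbouring blocks such as (rID, cID) = (1, 1) and (2, 1) have adjacent plain IDs but widely separated encrypted IDs, which is the discontinuity the scheme aims for.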
Next, we proceed to analyze the probability of the successful decryption (i.e., the probability of accurate image reconstruction). Given a CI with m POAs, for each POA(i.e., POAi), the rows and columns of its corresponding POAR can be denoted as RSi and CSi, respectively. Then, the probability that the decryption processing is successful can be derived in Eq (7):
(7)
Based on Eq (7), as the number of CIBs in a CI increases, the probability of successful decryption becomes smaller, which supports the hardness of decryption at a theoretical level. The encrypted CIBs are stored in NC or NE, which ensures that, to a certain extent, the IDs of the CIBs in a CI follow a discrete rather than a continuous distribution. The reconstruction and display of the CIs are conducted at NU by reversely decrypting the ID numbers of the CIBs based on the key values (i.e., δ and ω).
Uniform edge-cloud indexing framework.
To support faster CIS filtering processing, we propose a uniform edge-cloud index framework (UECIF) based on iDistance [35], in which the UECIF is composed of two types of indexes: the eIndex in NE and the cIndex in NC.
• Index Construction
For the eIndex, initially, suppose that the CISs in Ω are virtually stored in NE, which means that every CIS in Ω has an entry in the eIndex at NE, but none of them is yet buffered in NE. Then, the CISs are first grouped into K clusters by the AP-cluster [36] based on visual similarity (i.e., Eq (4)). Given a CIS Xi, its index key can be represented below:
(8)
where
is the cluster centre of the j-th cluster that Xi belongs to, sim(⋅, ⋅) represents the visual similarity distance function (i.e., Eq (4)), j ∈ [1, K], and the constant c1 is used to stretch the value range.
The index key is inserted into an improved B+-Tree in which a leaf node (LNode) can be modeled by a triplet: LNode(Xi) = < key, value, EType >, where EType = ‘F’ means Xi is not buffered in NE; otherwise, Xi has been buffered in NE. Algorithm 4 summarizes the initial construction process of the eIndex in which LNode(Xi).EType ← ‘F’(line 6) means all of the CISs are virtually stored in NE.
Algorithm 4 eIndex construction(Ω)
input: Ω: the CIS set
output: eIdx
1: eIdx←NULL; ▹ initialize
2: the CISs in Ω are grouped into K clusters; ▹ at edge node
3: for each CIS(Xi) in Ω do
4: compute key(Xi) according to Eq (8);
5: insert key(Xi) into an improved B+-tree(i.e., eIdx);
6: LNode(Xi).EType ← ‘F’;
7: end for
8: return the eIdx;
Similar to the above, for the cIndex, first of all, the clustering processing of the CISs(Ω) in NC is performed to obtain T clusters based on the above visual similarity. For a CIS Xi, its index key can be derived as:
(9)
where j ∈ [1, T], and the other parameters and symbols are the same as those in Eq (8). In Algorithm 5, the index key is inserted into an improved B+-Tree in which a leaf node (LNode) can be represented by a triplet: LNode(Xi) = < key, value, CType >, where LNode(Xi).CType ← ‘T’ means Xi is stored in NC.
Algorithm 5 cIndex construction(Ω)
input: Ω: the CIS set
output: cIdx
1: cIdx←NULL; ▹ initialize
2: the CISs in Ω are grouped into T clusters; ▹ at cloud node
3: for each CIS(Xi) in Ω do
4: compute key(Xi) according to Eq (9);
5: insert key(Xi) into an improved B+-tree(i.e., cIdx);
6: LNode(Xi).CType ← ‘T’;
7: end for
8: return the cIdx;
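The key construction shared by Algorithms 4 and 5 follows the iDistance idea of mapping each object to a one-dimensional key made of a stretched cluster number plus the distance to that cluster's centre. The Python sketch below illustrates it under several assumptions: CISs are represented by plain feature vectors, Euclidean distance stands in for the similarity distance of Eq (4), clusters are numbered from 0, the cluster centres are given, and a sorted list stands in for the improved B+-tree. All names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def build_index(items: Dict[str, Vector], centres: List[Vector], c: float) -> List[Tuple[float, str]]:
    """Return (key, name) pairs sorted by key, where key = c * cluster_id + distance_to_centre."""
    index = []
    for name, vec in items.items():
        # assign the item to its nearest cluster centre
        j = min(range(len(centres)), key=lambda k: math.dist(vec, centres[k]))
        index.append((c * j + math.dist(vec, centres[j]), name))
    index.sort()  # stand-in for inserting the keys into a B+-tree
    return index
```

The stretch constant c should exceed the largest cluster radius so that the key ranges of different clusters do not overlap.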
• Index-Support Retrieval Processing
Based on Eqs (8) and (9), suppose that there are n CISs in Ω; their index keys are inserted into the improved B+-Trees of the eIndex and the cIndex, respectively. For a range retrieval Θ(XR, rR) and each cluster Cj, as illustrated in Fig 5, there are five cases in terms of the relative positions of the two spheres.
Case 1: in Fig 5(a), the inequalities and
are met, which means Θ(XR, rR) intersects with
by which XR is contained. So the search range is represented as:
;
Case 2: in Fig 5(b), the inequalities and
are met, which means Θ(XR, rR) intersects with
and
does not contain XR. So the search range is represented as:
;
Case 3: in Fig 5(c), the inequality is met, which means Θ(XR, rR) contains
. So the search range is represented as:
;
Case 4: in Fig 5(d), the inequality is met, which means
contains Θ(XR, rR). So the search range is represented as:
;
Case 5: in Fig 5(e), the inequality is met, which means Θ(XR, rR) does not intersect with
. No candidate sequences are retrieved.
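How a query sphere is turned into one key interval per cluster can be sketched in Python as follows. The sketch uses the generic iDistance range mapping, which merges the intersecting and containing cases into a single clipped interval and skips non-intersecting clusters; it is not claimed to reproduce the exact bounds of Cases 1 to 5. Euclidean feature vectors, 0-based cluster numbers, a sorted list in place of the B+-tree, and all function names are assumptions of this illustration.

```python
import bisect
import math


def build(vectors, centres, c):
    """Return (sorted (key, name) entries, per-cluster radius)."""
    entries, radii = [], [0.0] * len(centres)
    for name, v in vectors.items():
        j = min(range(len(centres)), key=lambda k: math.dist(v, centres[k]))
        d = math.dist(v, centres[j])
        radii[j] = max(radii[j], d)  # cluster radius = farthest member
        entries.append((c * j + d, name))
    entries.sort()
    return entries, radii


def range_search(entries, radii, vectors, centres, c, query, r):
    """Return the names of all vectors within distance r of query."""
    keys = [k for k, _ in entries]
    answers = set()
    for j, centre in enumerate(centres):
        d = math.dist(query, centre)
        if d - r > radii[j]:
            continue  # query sphere misses this cluster entirely
        lo = c * j + max(0.0, d - r)       # nearest ring the query can reach
        hi = c * j + min(radii[j], d + r)  # farthest ring, clipped to the cluster
        left = bisect.bisect_left(keys, lo)
        right = bisect.bisect_right(keys, hi)
        for _, name in entries[left:right]:
            if math.dist(query, vectors[name]) <= r:  # refinement step
                answers.add(name)
    return answers
```

The interval is safe because, by the triangle inequality, any member within r of the query has a centre distance between d - r and d + r; candidates inside the interval still need the final distance check.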
For the similarity retrieval in the MECN, there are two cases in terms of whether there exists a partial answer in NE: 1) the complete answer sequences(Ψ) are directly obtained from NC based on cIndex (see Algorithm 8); 2) the complete answer sequences (Ψ) are composed of the partial answer ones (Ψ′) obtained from NE (see Algorithm 6) and the partial answer ones(Ψ″) from NC (see Algorithm 7).
Algorithm 6 details the similarity range retrieval of the CISs based on the eIndex in NE. The routine Search() implements the range similarity search in the improved B+-Tree and is described in Algorithm 9.
Algorithm 6 ESearch(XR, rR, Ω′)
input: Θ(XR, rR): the retrieval CIS,
Ω′: the CIS whose ETypes are ‘T’ in NE
output: Ψ′: the partial answer CISs from NE
1: Ψ′ ← Φ; ▹ initialization
2: Ψ′ ← Search(XR, rR, Ω′)
3: for each candidate CIS(Xi) ∈ Ψ′ do
4: if sim(XR, Xi) > rR then
5: Ψ′ ← Ψ′ − Xi;
6: else
7: if LNode(Xi).EType = ‘F’ then
8: LNode(Xi).EType ← ‘T’; ▹ for eIndex
9: LNode(Xi).CType ← ‘F’; ▹ for cIndex
10: update the information of Xi (e.g., access frequencies and access time) in the log file;
11: end if
12: end if
13: end for
14: return Ψ′
Similarly, Algorithm 7 summarizes the index support partial similarity range retrieval of the CISs at the cloud node level. It is worth mentioning that LNode(Xi).CType = ‘T’ means Xi is not buffered in NE.
Algorithm 7 CSearch(XR, rR, Ω′)
input: Θ(XR, rR): the retrieval CIS,
Ω′: the CISs whose CTypes are ‘T’ in NC
output: Ψ″: the partial answer CISs from the cloud node
1: Ψ″ ← Φ; ▹ initialization
2: Ψ″ ← Search(XR, rR, Ω′)
3: for each candidate CIS(Xi) ∈ Ψ″ do
4: if sim(XR, Xi) > rR then
5: Ψ″ ← Ψ″ − Xi;
6: else
7: if LNode(Xi).EType = ‘F’ then
8: LNode(Xi).EType ← ‘T’; ▹ for eIndex
9: LNode(Xi).CType ← ‘F’; ▹ for cIndex
10: end if
11: end if
12: end for
13: return Ψ″
Finally, obtaining the complete answer CISs from the cloud node is detailed in Algorithm 8.
Algorithm 8 DSearch(XR, rR, Ω)
input: Θ(XR, rR): the retrieval CIS,
Ω: the CISs in NC
output: Ψ: the complete answer CISs from NC
1: Ψ ← Φ ▹ initialization
2: Ψ ← Search(XR, rR, Ω)
3: for each CIS(Xi) ∈ Ψ do
4: if sim(XR, Xi) > rR then
5: Ψ ← Ψ − Xi;
6: end if
7: end for
8: return Ψ
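Algorithm 9 Search(XR, rR, Ω)
input: Θ(XR, rR): the retrieval CIS,
Ω: the CISs
output: Φ′: the candidate CISs
1: Φ′ ← NULL;
2: for the CISs in each cluster Cj do
3: if the two inequalities of Case 1 are met then
4: set [left, right] to the search range of Case 1;
5: else if the two inequalities of Case 2 are met then
6: set [left, right] to the search range of Case 2;
7: else if the inequality of Case 3 is met then
8: set [left, right] to the search range of Case 3;
9: else if the inequality of Case 4 is met then
10: set [left, right] to the search range of Case 4;
11: else
12: continue; ▹ Case 5: no candidate CISs in Cj
13: end if
14: Φ′ ← Φ′ ∪ BRSearch[left, right];
15: end for
16: return Φ′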
• Index Update
When a user submits a retrieval request, the eIndex needs to be updated by adding the CISs that have been accessed in this retrieval. Since the number of CISs that can be buffered in NE is limited, choosing which CISs to keep in the buffer is challenging.
For example, assume that there are six CISs in NE, Tables 2 and 3 illustrate the ranking of access time (AT) and access frequencies (AF) for the six CISs, respectively. In Table 2, the ATs of the six CISs are sorted in an ascending order, which are quantitatively represented by the AT_IDs. Then a weighted AT (WAT) can be derived as follows:
(10)
Similarly, for the ranking of the access frequencies(AF), a weighted AF(WAF) is represented in Eq (11):
(11)
Based on Eqs (10) and (11), given a CISj, its uniform ranking score (URS) is shown below:
(12)
Table 4 illustrates the uniform ranking scores of the six CISs. Based on Eq (12), the smaller the URS, the more important the CIS. If MaxN is 4, then CIS3 and CIS1 can be removed from the edge buffer.
Algorithm 10 Update(Ω′)
input: Ω′: the CISs buffered at NE
output: the updated Ω′
1: for i = 1 to |Ω′| − MaxN do
2: select the CISj whose URS is the largest in Ω′;
3: Ω′ ← Ω′ − CISj;
4: LNode(CISj).CType ← ‘T’;
5: LNode(CISj).EType ← ‘F’;
6: end for
7: return the updated Ω′
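The eviction step can be illustrated with the Python sketch below. The weighting formulas of Eqs (10) to (12) are not reproduced; instead the sketch assumes a simple stand-in in which the most recently and most frequently accessed CISs receive rank 1, and the URS is a weighted sum of the two ranks with a hypothetical weight `w`. Only the rule "evict the largest URS until at most MaxN remain" is taken from the text.

```python
def evict(buffer, max_n, w=0.5):
    """buffer maps a CIS name to (last_access_time, access_count).

    Returns (kept buffer, evicted names in eviction order).
    """
    names = list(buffer)
    by_time = sorted(names, key=lambda n: buffer[n][0], reverse=True)  # most recent first
    by_freq = sorted(names, key=lambda n: buffer[n][1], reverse=True)  # most frequent first
    at_rank = {n: i + 1 for i, n in enumerate(by_time)}
    af_rank = {n: i + 1 for i, n in enumerate(by_freq)}
    # assumed scoring: smaller URS means more important
    urs = {n: w * at_rank[n] + (1 - w) * af_rank[n] for n in names}
    n_evict = max(0, len(names) - max_n)
    evicted = sorted(names, key=lambda n: urs[n], reverse=True)[:n_evict]
    kept = {n: buffer[n] for n in names if n not in evicted}
    return kept, evicted
```

With six buffered CISs and max_n = 4, the two entries with the oldest and rarest accesses would be dropped from the edge buffer and flagged as cloud-only again.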
Edge buffering.
Unlike the traditional image retrieval methods, which directly obtain data from the remote cloud, if the answer CISs can be directly obtained from the edge without accessing the cloud, it will greatly shorten the long-distance transmission delay and improve the retrieval efficiency. Based on the above motivation, we propose an edge buffering scheme by analyzing the user historical retrieval (HR) log file. The refinement cost of the candidate CISs can be significantly decreased with the help of the buffering scheme since a portion of answer CISs can be retrieved directly without any refinement processing.
Specifically, assume that n HRs have been successfully completed with accurate results. Due to the fact that the answer CISs provided by each HR have been verified, when a user submits a new retrieval CIS (i.e., XR), it is highly possible that XR may be similar or even the same as the HR one (i.e., ). As a result, the retrieval efficiency and accuracy can be greatly improved if the HR results in NE can be carefully reused as a part of the current results.
Definition 7(CRS). Given a retrieval CIS XR and a retrieval radius rR, their corresponding CIS retrieval sphere (CRS) is a high-dimensional sphere with a centre XR and a radius rR, denoted as Θ(XR, rR).
Definition 8(HCRS). Given a HR CIS and a retrieval radius
, their corresponding historical CIS retrieval sphere (HCRS) is a high-dimensional sphere with a centre
and a radius
, denoted as
.
Definition 9(AA). Given a CRS Θ(XR, rR) and a HCRS , their corresponding affected area (AA) is the intersection part of the two spheres, formally denoted as:
.
For example, as shown in Fig 6, there are three HCRSs, i.e., ,
and
. The current CRS is represented as: Θ(XR, rR). For XR, its corresponding 1st, 2nd and 3rd nearest neighbor CISs are
,
and
, respectively. Therefore, the HR of
can be safely discarded since its corresponding HCRS does not intersect with Θ(XR, rR). The CISs falling in the AA (i.e.,
and
can be a part of the answer CISs of Θ(XR, rR).
Next, given two CIS retrieval spheres: Θ(XR, rR) and , there exists two cases on the basis of the two retrieval CISs (i.e., XR and
), which are shown in Figs 7 and 8, respectively.
In Fig 7, if , then there exists two cases in terms of the retrieval radii (i.e., rR and
).
• For case (a) which is formally represented as: , since the CISs falling in the HCRS
have already undergone verification, they can be part of the answer CISs for Θ(XR, rR);
• For case (b) which is formally represented as: , the answer CISs for Θ(XR, rR) can be derived from the CISs in
.
In Fig 8, if , there are four cases according to the placement of the two spheres (i.e., Θ(XR, rR) and
).
• In case (a), as the AA of the above two spheres does not exist, formally represented as: , the answer CISs need to be calculated sequentially in Θ(XR, rR);
• In case (b), as the AA of the above two spheres exists, formally represented as: . Since the CISs falling in
have been verified previously, they can be regarded as a part of a candidate CIS set of Θ(XR, rR);
• In case (c), as the AA of the above two spheres is , formally represented as:
. As the CISs that fall in
have been verified previously; they can be regarded as a part of an answer CIS set of Θ(XR, rR);
• In case (d), as the AA of the above two spheres is Θ(XR, rR), formally represented as: , the answer CISs are contained by the CISs falling in
.
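The geometric relations between the current retrieval sphere and a historical one can be decided from the centre distance and the two radii alone. The Python sketch below classifies the relation using standard sphere geometry; the labels, the function name, and the use of Euclidean feature vectors are assumptions of this illustration. The case of identical retrieval CISs is covered by a centre distance of zero.

```python
import math


def sphere_relation(q, r, h, r_h):
    """Classify how the CRS (centre q, radius r) relates to an HCRS (centre h, radius r_h)."""
    d = math.dist(q, h)
    if d + r_h <= r:
        return "hcrs_inside_crs"   # every historical answer is also a current answer
    if d + r <= r_h:
        return "crs_inside_hcrs"   # current answers are a subset of the historical ones
    if d > r + r_h:
        return "disjoint"          # the historical result cannot be reused
    return "overlap"               # historical answers in the shared area remain candidates
```

Only in the disjoint case does the buffered result give no help; in the other cases at least part of the verified historical result can be reused or narrows the candidate set.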
The BRS algorithm
Before introducing the algorithm, a pre-processing step is first conducted. Algorithm 11 summarizes the detailed steps of our proposed BRS method, in which ESearch(XR, rR), CSearch(XR, rR) and DSearch(XR, rR) correspond to Algorithms 6-8, respectively. As illustrated in Fig 9, when a retrieval lung CIS (XR) is submitted from the user node NU to the edge node level NE, the eIndex scheme in NE is first adopted to quickly judge whether there are some answer CISs similar to XR. If so, the high-dimensional similarity retrieval is carried out with the support of the cIndex scheme at the cloud to obtain the remaining partial answer CISs; otherwise, the similarity retrieval of all CISs supported by the cIndex is performed directly in the cloud. Finally, the answer CISs are returned to the receiver node. Note that, in line 9, before transmitting the answer CISs to the receiver, the decryption processing of the CIBs in the CIs needs to be performed to ensure the accurate reconstruction and display of the answer CISs. Compared to NIBs, the CIBs have higher transmission priorities. The IBs are transmitted in descending order of priority, which not only assures the security of data transmission but also ensures that the critical information is transmitted first.
Algorithm 11 BRS(XR, rR)
input: XR: a retrieval CIS, rR: a retrieval radius
output: Ψ: the answer CISs
1: a retrieval CIS (XR) is submitted from NU;
2: Ψ′ ← ESearch(XR, rR); ▹ obtain answer CISs(Ψ′) based on the eIndex at the edge node NE;
3: if Ψ′ ≠ NULL then
4: Ψ″ ← CSearch(XR, rR); ▹ obtain the partial answer CISs(Ψ″) based on cIndex at the cloud NC;
5: Ψ ← Ψ′ ∪ Ψ″;
6: else
7: Ψ ← DSearch(XR, rR); ▹ obtain the complete answer CISs based on cIndex at the cloud NC;
8: end if
9: transmit the CISs in Ψ to the receiver node level with different transmission priorities
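The control flow of Algorithm 11 can be sketched in Python as below. The three search routines are passed in as callables because their implementations depend on the indexes; here they are placeholders, and the block-ordering helper with its (block id, priority) pairs is likewise a hypothetical illustration of priority-based transmission.

```python
from typing import Callable, Hashable, Iterable, List, Set, Tuple

SearchFn = Callable[[object, float], Iterable[Hashable]]


def brs(query, radius: float, esearch: SearchFn, csearch: SearchFn, dsearch: SearchFn) -> Set[Hashable]:
    """Edge first; merge with the cloud's partial answers on a hit, else fall back to a full cloud search."""
    edge_answers = set(esearch(query, radius))
    if edge_answers:
        return edge_answers | set(csearch(query, radius))
    return set(dsearch(query, radius))


def transmission_order(blocks: Iterable[Tuple[int, int]]) -> List[int]:
    """blocks holds (block id, priority); higher priority (e.g. CIBs) is sent first."""
    return [bid for bid, _ in sorted(blocks, key=lambda b: b[1], reverse=True)]
```

Keeping the search routines as parameters makes the edge-hit and edge-miss branches easy to exercise separately.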
Experiments
To verify the efficiency of the proposed BRS method, extensive simulation experiments are conducted to demonstrate the retrieval performance.
Experimental setup.
In the experiments, the image receiver client is equipped with a 5.9-inch, full HD 1080p screen and a Qualcomm® Snapdragon™ 650 processor running at 1.7GHz. The client system is developed in Java and operates on the Android operating system [37]. The edge node and the cloud one are connected via 1Gbps network links. In the cloud node, the IBs (i.e., CIB and NIB) with various transmission priorities are kept in a file system and some structured data is recorded by the MySQL [38]. Each node contains a 2.7 GHz quad-core Xeon processor, 2.0 Gigabyte memory, and 1.0 Terabyte hard disk. The maximum data communication rate is 150 Mbps in the wireless network.
We selected the LUNA16 dataset [39], which contains 239,232 lung CIs, as our experimental dataset. There are 888 lung CISs in this database, with an average of 336 lung CIs per sequence. The number of CIs in each sequence ranges from 200 to 600.
A prototype retrieval system.
Fig 10 depicts a demonstration of the prototype system. An example of the CIS pre-processing backend interface is shown in Fig 10(a), in which a POA has been marked by a blue rectangle. In Fig 10(b), a CIS with the category ‘lung’ has been input as a retrieval sequence. Four result CISs were quickly retrieved, and their matching IBs are restored and shown.
Effectiveness of the BRS method.
The first experiment evaluates the effectiveness of our BRS method using randomly selected lung CISs as experimental data. The recall and precision achieved by this retrieval method are defined as:
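$$\mathrm{recall} = \frac{|rel \cap ret|}{|rel|}, \qquad \mathrm{precision} = \frac{|rel \cap ret|}{|ret|} \tag{13}$$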
where rel denotes the ground-truth set and ret denotes the set of results returned by a similarity range search.
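As a small worked example, the two ratios can be computed from the sets rel and ret as follows. This sketch assumes the standard set-based definitions of recall and precision; the CIS identifiers are made up, and returning 0.0 for an empty set is a convention chosen here, not something stated in the paper.

```python
from typing import Set, Tuple


def precision_recall(rel: Set[str], ret: Set[str]) -> Tuple[float, float]:
    """Return (precision, recall) for ground-truth set rel and result set ret."""
    hits = len(rel & ret)                        # correctly retrieved CISs
    precision = hits / len(ret) if ret else 0.0  # fraction of results that are correct
    recall = hits / len(rel) if rel else 0.0     # fraction of ground truth that is found
    return precision, recall


# Hypothetical identifiers: 4 relevant CISs, 3 returned, 2 of them correct.
rel = {"cis_1", "cis_2", "cis_3", "cis_4"}
ret = {"cis_1", "cis_2", "cis_9"}
precision, recall = precision_recall(rel, ret)
print(f"precision={precision:.3f}, recall={recall:.3f}")  # precision=0.667, recall=0.500
```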
Fig 11 compares the retrieval effectiveness for 10 CISs of the same organ (i.e., lung) that are randomly selected from the database. As the figure shows, precision steadily declines as recall rises. The reason is that when recall is low, the result CISs are very likely to be correct, whereas a high recall cannot guarantee that all returned CISs are correct.
Effect of data size.
In this experiment, we investigate the effect of data size (i.e., the number of lung CISs) on retrieval efficiency using two methods: 1) our proposed BRS method; 2) the MIRC method in [25]. The network bandwidth is 100 Mbps, the number of edge nodes is 15, and the UECI framework is used. As Fig 12 shows, the BRS method outperforms MIRC as the number of CISs increases, because the edge buffer significantly reduces the retrieval computation cost and transmission delay. It is also interesting to observe that as the data size increases, the overall response time first grows rapidly and then more gradually. This is because the index performs better when there is more data.
Effect of ε.
This experiment evaluates the effect of ε on retrieval performance. As in the previous experiment, the network bandwidth and the number of cloud (edge) nodes are fixed, and the edge buffering scheme and the indexing mechanism are adopted. As Fig 13(a) shows, the CPU cost of the similarity computation decreases as ε increases, because the number of RCIs in each CIS decreases. Meanwhile, Fig 13(b) shows that the precision ratio first increases rapidly and then decreases gradually. The reason is that too many or too few RCIs make it difficult to measure the similarity of the CISs accurately and completely. Therefore, ε is set to its optimal value of 0.65.
Effect of edge buffering scheme.
In this experiment, we study the effect of the edge buffering scheme on retrieval performance. Method 1 adopts the edge buffering scheme and method 2 does not. Fig 14 shows that the overall response time of method 1 is shorter than that of method 2 when the bandwidth is stable and the retrieval radius (rR) is fixed. Meanwhile, the performance gap widens as rR grows while the bandwidth remains constant. This is because the probability of finding the result CISs in the edge buffer increases with rR.
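One way to read this result is through a simple expected-latency model. The sketch below is an illustrative assumption, not the paper's cost model: it treats a retrieval as either an edge-buffer hit costing `t_edge` or a miss costing `t_cloud`, and all numeric values are hypothetical.

```python
def expected_response_time(hit_prob: float, t_edge: float, t_cloud: float) -> float:
    """Expected latency with buffering: hits are served at the edge, misses by the cloud."""
    if not 0.0 <= hit_prob <= 1.0:
        raise ValueError("hit_prob must lie in [0, 1]")
    return hit_prob * t_edge + (1.0 - hit_prob) * t_cloud


def buffering_gain(hit_prob: float, t_edge: float, t_cloud: float) -> float:
    """Latency saved relative to always querying the cloud (no edge buffer)."""
    return t_cloud - expected_response_time(hit_prob, t_edge, t_cloud)


# Hypothetical latencies in seconds; a larger rR is modelled as a higher hit probability.
T_EDGE, T_CLOUD = 0.2, 1.0
for p in (0.25, 0.50, 0.75):
    print(f"hit_prob={p:.2f}  gain={buffering_gain(p, T_EDGE, T_CLOUD):.2f} s")
```

Under these assumptions the gain equals `hit_prob * (t_cloud - t_edge)`, so it grows linearly with the hit probability, which is consistent with the widening gap reported for larger rR.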
Effect of indexing scheme.
The final experiment examines how the index framework (i.e., eIndex and cIndex) affects retrieval efficiency. Here, method 1 uses the two indexes, whereas method 2 does not (i.e., it sequentially searches each cloud node to find the answer CISs). In Fig 15, the data size and the network bandwidth are fixed and the number of cloud nodes varies from 10 to 50. The response time of method 1 (i.e., index-based retrieval) grows as the number of cloud nodes increases. Meanwhile, the performance gap between the two approaches narrows, since the response time of method 2 is relatively stable and locating the candidate CISs with the index is faster than without it. It is interesting to note that the retrieval response time is smallest when the number of cloud nodes is 10. The more cloud nodes are involved in retrieval, the more data exchange and transmission occur, which delays retrieval.
Conclusion
In this paper, we introduced the BRS method, a privacy-preserving batch retrieval scheme for lung CISs in an edge-cloud collaborative computing environment. The goal of BRS is to provide safe and efficient retrieval of lung CISs in resource-constrained networks with low and unstable bandwidth. To enable efficient BRS processing, four supporting techniques are proposed, namely, 1) batch similarity measure for CISs, 2) CIB-based privacy preserving scheme, 3) uniform edge-cloud index framework, and 4) edge buffering scheme. The experimental results show that, with the aid of the supporting techniques, the efficiency of the BRS method is more than 200% higher than that of sequential retrieval, especially when the number of cloud nodes is smaller.
Acknowledgments
The authors would like to thank the editors and anonymous reviewers for their helpful comments. Special thanks to Dr. Yujia Ge for her editing.
References
- 1. Welter P, Fischer B, Günther RW, Deserno TM. Generic integration of content-based image retrieval in computer-aided diagnosis. Computer Methods & Programs in Biomedicine, 108(2):589–599. 2012. pmid:21975083
- 2. Alzubi J, Alzubi O, Singh A, et al. Cloud-IIoT Based Electronic Health Record Privacy-Preserving by CNN and Blockchain-Enabled Federated Learning. IEEE Transactions on Industrial Informatics. pp. 1–8. July 2022.
- 3. Razaque A, Aloqaily M, Almiani M, et al. Efficient and reliable forensics using intelligent edge computing. Future Generation Computer Systems. pp. 230–239. 2021.
- 4. Alzubi O, Alzubi J, Shankar K, Gupta D. Blockchain and artificial intelligence enabled privacy-preserving medical data transmission in Internet of Things. Transactions on Emerging Telecommunications Technologies. Vol.32, Issue12. December 2021. e4360
- 5. Nazir S, Alzubi O, Mohammad K, et al. Image subset communication for resource-constrained applications in wireless sensor networks. Turkish Journal of Electrical Engineering and Computer Sciences. Vol. 28, No. 5, Article 21. 2020.
- 6. Philbin J, Chum O, Isard M, et al. Object retrieval with large vocabularies and fast spatial matching. International Conference on Computer Vision and Pattern Recognition. 2007.
- 7. Flickner M, Sawhney H, Niblack W, Ashley J. Query by image and video content: The QBIC system. Computers. 28(9). 23–31. 1995.
- 8. Rui Y, Huang T.S, Chang S.F. Image Retrieval: Current Techniques, Promising Directions and Open Issues, Journal of Visual Communication and Image Representation, 10. pp.39–62. 1999.
- 9. Smith J, Chang S.F. VisualSEEK: a fully automated content-based image query system. ACM Multimedia. pp.89–98, 1996.
- 10. Smith J, Chang S.F. Visually Searching the Web for Content, IEEE Multimedia. 4(3): 12–20, 1997.
- 11. Shyu C.R, Brodley C.E, Kak A.C, et al. ASSERT: a physician-in-the-loop content-based retrieval system for HRCT image databases. Computer Vision and Image Understanding. 75. pp.111–132. 1999.
- 12. Deselaers T. Features for Image Retrieval [dissertation]. Aachen, Germany: Rheinisch-Westfalische Technische Hochschule Aachen. 2003.
- 13. Deselaers T. Fire [Internet]. Thomas Deselaers; Available from: http://thomas.deselaers.de/fire. 2009.
- 14. Huang YG, Zhang J, Huang HY, Wang DF. Medical image retrieval based on unclean image bags. Multimedia Tools and Applications. 72(3), pp 2977–2999. 2014.
- 15. Huang YG, Huang HY, Zhang J. A noisy-smoothing relevance feedback method for content-based medical image retrieval. Multimedia Tools and Applications. 73(3), pp. 1963–1981. 2014.
- 16. Kitanovski I, Strezoski G, Dimitrovski I. Multimodal medical image retrieval system. Multimedia Tools and Applications. 76(2), pp. 2955–2978. 2017.
- 17. Lan RS, Zhong S, Liu ZB, Shi Z. A simple texture feature for retrieval of medical images. Multimedia Tools and Applications. 77(9), pp. 10853–10866. 2018.
- 18. Ali M, Dong L, Akhtar R. Multi-panel medical image segmentation framework for image retrieval system. Multimedia Tools and Applications. 77(16), pp. 20271–20295. 2018.
- 19. Kasban H, Salama D.H. A robust medical image retrieval system based on wavelet optimization and adaptive block truncation coding. Multimedia Tools and Applications. 2019.
- 20. Tuyet V.T.H, Binh N.T, Quoc N.K, et al. Content Based Medical Image Retrieval Based on Salient Regions Combined with Deep Learning. Mobile Network and Application, 26, pp.1300–1310. 2021.
- 21. Elkariem A.F.A, Bashir M.B, Ahmed T.H, et al. Distributed medical image retrieval techniques: A review. 2017 Sudan Conference on Computer Science and Information Technology. 2017.
- 22. Anbarasi M.S, Mehata K.M, Sandhya S, Suganya V. Medical image retrieval from distributed environment. International Conference on Intelligent Agent & Multi-Agent Systems. 2009.
- 23. Charisi A, Megalooikonomou V. Content-based medical image retrieval in peer-to-peer systems. ACM International Health Informatics Symposium (IHI’10). pp. 724–733. 2010.
- 24. Depeursinge A, Duc S, Eggel I, et al. Mobile medical visual information retrieval. IEEE Trans. on Information Technology in Biomedicine. 16(1), pp. 53–61, Jan. 2012. pmid:22157061
- 25. Zhuang Y, Jiang N, Wu Z.A., Li Q, et al. Efficient and robust large medical image retrieval in mobile cloud computing environment. Information Sciences. Vol.263. pp.60–86. 2014.
- 26. Zhuang Y, Jiang N, Li Q, Chen L, Ju C.H. Progressive batch medical image retrieval processing in mobile wireless network. ACM Trans on Internet Technology. 15(3), Article 9, 2015.
- 27. Cruz A.L, Medina R, Vega F, et al. Mobile teleradiology system suitable for m-health services supporting content and semantic based image retrieval on a grid infrastructure. 38th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC). 2016.
- 28. Chitra S, Kumaratharan N, Ramesh S. Enhanced brain image retrieval using carrier frequency offset compensated orthogonal frequency division multiplexing for telemedicine applications. International Journal of Imaging Systems and Technology. 28(3), pp. 186–195. 2018.
- 29. Jiang N, Zhuang Y, Chiu D. K.W. Towards Effective and Efficient Framework for Crowd-Assisted Mobile Similarity Retrieval of Medical Images in Mobile Telemedicine Systems. Multimedia Tools and Applications. 79, pp.19893–19923. 2020.
- 30. Lei Y, Xu D, Zhou Z-Y, Higgins K, et al. High-resolution CT image retrieval using sparse convolutional neural network. SPIE 10573, Medical Imaging 2018: Physics of Medical Imaging, 105733F. 2018.
- 31. Yu M, Lu Z-T, Feng Q-J, Chen W-F. Liver CT image retrieval based on non-tensor product wavelet. International Conference of Medical Image Analysis and Clinical Application. 2010.
- 32. Hatibaruah R, Nath V.K, Hazarika D. Computed tomography image retrieval via combination of two local bit plane-based dissimilarities using an adder. International Journal of Wavelets, Multi-resolution and Information Processing. 19(1). 2021.
- 33. Hwang H-J, Seo J-B, Lee S-M, et al. Content-Based Image Retrieval of Chest CT with Convolutional Neural Network for Diffuse Interstitial Lung Disease: Performance Assessment in Three Major Idiopathic Interstitial Pneumonias. Korean Journal of Radiol. 22(2): pp.281–290. 2021. pmid:33169547
- 34. Alzubi J.A, Bharathikannan B, Tanwar S, Manikandand R, Khanna A, Thaventhiran C, Boosted neural network ensemble classification for lung cancer disease diagnosis, Applied Soft Computing, Vol. 80, pp. 579–591, July 2019.
- 35. Jagadish H.V, Ooi B.C, Tan K.L, et al. iDistance: An Adaptive B+-tree Based Indexing Method for Nearest Neighbor Search. ACM Trans on Database Systems, 30(2), 364–397. 2005.
- 36. Frey B.J, Dueck D. Clustering by passing messages between data points, Science. 315 (5814), 972–976. 2007. pmid:17218491
- 37. Android. The Android platform. http://code.google.com/intl/zh-CN/android/. 2010.
- 38. MySQL. http://www.mysql.com/. 2011.
- 39. The Luna16 Dataset. https://luna16.grand-challenge.org/. 2016.