Zum Hauptinhalt springen

Efficient backward private searchable encryption

Shravan Kumar Parshuram Puria ; Shah, Akash ; et al.
In: Journal of Computer Security, Jg. 28 (2020-03-17), S. 229-267
Online unknown

Efficient backward private searchable encryption 

Dynamic Searchable Symmetric Encryption (DSSE), apart from providing support for search operation, allows a client to perform update operations on outsourced database efficiently. Two security properties, viz., forward privacy and backward privacy are desirable from a DSSE scheme. The former captures that the newly updated entries cannot be related to previous search queries and the latter ensures that search queries should not leak matching entries after they have been deleted. These security properties are formalized in terms of the information leakage that can be incurred by the respective constructions. Existing backward private constructions either have a non-optimal communication overhead or they make use of heavy cryptographic primitives. Our main contribution consists of two efficient backward private schemes Π BP and Π WBP that aim to achieve practical efficiency by using light weight symmetric cryptographic components only. In the process, we also revisit the existing definitions of information leakage for backward privacy [Bost et al. (In ACM CCS (2017) 1465–1482 ACM Press)] and propose a relaxed formulation. Π BP is the first construction to achieve backward privacy in the general setting with optimal communication complexity. Our second construction, Π WBP , is the first single round-trip scheme achieving backward privacy in a restricted setting with optimal communication complexity using light weight symmetric cryptographic primitives. The prototype implementations of our schemes depict the practicability of the proposed constructions and indicate that the cost of achieving backward privacy over forward privacy is substantially small. The performance results also show that the proposed constructions outperform the currently most efficient scheme achieving backward privacy.

Keywords: Dynamic Searchable Symmetric Encryption; backward privacy; forward privacy

1. Introduction

Due to a variety of crucial benefits, enterprises outsource their data to cloud resident storage. If the outsourced data is stored as plaintext on remote servers then it may be intercepted by adversaries. Hence, data is stored in encrypted form on remote servers. However, if the client has to decrypt the whole data in order to get results for a search query, it defeats the purpose of outsourcing data. Generic tools such as fully homomorphic encryption [[20]] or oblivious RAM [[23], [40]] can be considered to construct protocols that leak almost no information to the server. But as of now, these tools are costly for large databases and hence, are impractical.

A practical solution to this problem is Searchable Symmetric Encryption ( SSE ) [[10], [13], [37]] that trades efficiency for security. Dynamic Searchable Symmetric Encryption ( DSSE ) [[9], [28], [34]] adds a vital feature to static SSE schemes, i.e., the ability for the client to efficiently perform update operations remotely on the outsourced database with the guarantee that minimal information is leaked to the server in the process. These constructions of (D)SSE that aim to achieve an acceptable balance between security and performance, explicitly describe the leakage profile and formally prove that the information leaked from the scheme is bounded by the leakage profile.

Simultaneously with the works on constructing SSE schemes with improved efficiency, security and expressiveness of queries [[10], [17], [27]], there is another line of work that shows the real-world consequences of these leakages [[1], [8], [44]]. Zhang et al. [[44]] through file injection attack showed that it is possible to reveal the contents of past search queries of DSSE schemes with a few injection of documents and Abdelraheem et al. [[1]] showed that the consequences of this attack is even more devastating in the case of relational databases.

Because of the file injection attack, forward privacy has garnered significant interest in the research community. The notion of forward privacy was introduced in [[39]], while it was first formalized in [[4]]. In hindsight, the first DSSE scheme that satisfied the notion of forward privacy was proposed in 2005 [[12]]. Along with forward privacy, Stefanov et al. [[39]] asserted that backward privacy should also be satisfied by a DSSE scheme. Informally, backward privacy states that search queries should not leak matching entries after they have been deleted. The notion of backward privacy was first formalized by Bost at el. [[6]].

1.1. Related work

Kamara et al. [[28]] proposed the first sublinear (in the size of the database) DSSE scheme. Forward private scheme Σoϕoζ [[4]] achieves optimal communication complexity (linear in the size of result set) but it makes use of asymmetric cryptographic primitives and does not support parallel processing. A GGM-PRF [[22]] based forward private DSSE scheme Diana was proposed in [[6]]. Diana makes use of symmetric cryptographic primitives only and supports parallel processing but doesn't have optimal computational and communication complexity. Asymptotically optimal forward private DSSE schemes that make use of symmetric cryptographic primitives only and support parallelism were proposed in [[16], [30], [38]]. Further, FASTIO scheme [[38]] ensures a reasonable locality [[9]], a measure of I/O efficiency.

The notion of backward privacy was first formally described in [[6]], through three different definitions, in the ascending order in terms of information leakage, called respectively BPIP , BPUP and WBP . Bost et al. [[6]] proposed a generic way to achieve backward privacy from any forward private DSSE scheme. However, the communication complexity of the derived backward private scheme isn't optimal. In [[6]], a backward private scheme Dianadel based on constrained pseudo-random function ( CPRF ) [[3], [7], [29]] and a backward private Janus framework based on a puncturable encryption scheme with a particular incremental update property [[26]], were also proposed. The communication and computational complexity of search and update protocols of Dianadel are not optimal. The search protocol in Janus is single-roundtrip and has an optimal communication complexity. However, the computational complexity of search protocol is O(nw·dw) , where nw and dw respectively denote the number of documents matching keyword w and delete operations performed on keyword w. As acknowledged in [[6]], with just a few hundred deletions per keyword, Janus will not be practical because of both computational and storage overhead reasons. Further, [[6]] has imposed the following restriction on Dianadel and Janus : reinsertion of document-keyword pairs that were previously deleted is not allowed. We refer to this constraint as reinsertion restriction in the rest of our paper.

Very recently, there appeared two works on backward private DSSE [[21], [41]]. Chamani et al. [[21]] proposed a practically efficient BPUP -secure scheme called Mitra . Further, by handling delete queries efficiently in this scheme resulted in the most efficient backward private scheme until now, i.e., Mitra . They also proposed two other backward private schemes Orion and Horus that achieve quasi-optimal (linear in nw upto a logarithmic factor) search computation complexity but make use of Path ORAM [[40]] and as a result are impractical for large databases [[33]]. Sun et al. [[41]] proposed a symmetric puncturable encryption ( SPE ) scheme and instantiated the Janus framework with the proposed SPE scheme. We provide a comprehensive comparative analysis of the performance and security of these schemes vis à vis our proposed schemes in Section 4.4 and 5.

1.2. Our contributions

We start with revisiting the notion of information leakage in the context of backward privacy. Our investigation suggests that Weak Backward Privacy ( WBP ) notion proposed by Bost et al. [[6]] should only be used to argue backward privacy in reinsertion restriction setting. The other two existing notions of information leakage BPIP and BPUP are strong in the sense that it seems difficult to obtain a DSSE scheme satisfying these notions with optimal communication complexity. Therefore, like WBP one may need to relax these stronger notions a bit but unlike WBP one needs to ensure that the notion of backward privacy is not violated in the general setting. To this end we formalize a relaxed notion of information leakage, BPLP .

Our main contribution consists of two backward private schemes ΠBP and ΠWBP that are BPLP and WBP secure respectively. We start with a simple forward private scheme ΠFP , that serves as a building block for our backward private schemes. Currently, Mitra [[21]] is the most suitable backward private candidate for adoption in practice as it makes use of only light weight symmetric components and has low leakage level. However, one limitation of Mitra is that the communication complexity isn't optimal. We address this issue in our main construction ΠBP . ΠBP makes use of tags generated using a pseudo random permutation ( PRP ) which ensures that only the tags corresponding to the set of documents currently matching the keyword w are returned to the client. Thus, ΠBP avoids unnecessary communication overhead, while at the same time ensuring that any information violating the notion of backward privacy is not leaked to the server. To the best of our knowledge, ΠBP is the first practical backward private scheme that has optimal update and search communication complexity and uses symmetric cryptographic primitives only. Further, ΠBP is easily parallelizable, provides reasonable locality and allows reinsertion of document-keyword pair. With a simple modification in ΠBP , we construct a single roundtrip weak backward private scheme ΠWBP that improves upon the concrete communication overhead in Search protocol by more than 50%. Both the proposed constructions are forward private as well. A comparison of our schemes with some prior works [[6], [21], [41]] is provided in Table 1.

Table 1 Comparison of backward private schemes ΠBP and ΠWBP with some prior works

SchemesComputationCommunicationBackward privacy
SearchUpdateSearchUpdate# Rounds

Fides

[6]

O(ow)

O(1)

O(ow)

O(1)

2

BPUP

Dianadel

[6]

O(aw)

O(log(aw))

O(nw+dwlog(aw))

O(1)

2

WBP

Janus

[6]

O(nw·dw)

O(1)

O(nw)

O(1)

1

WBP

Mitra

[21]

O(ow)

O(1)

O(ow)

O(1)

2

BPUP

Orion

[21]

O(nwlog2(N))

O(log2(N))

O(nwlog2(N))

O(log2(N))

O(log(N))

BPIP

Horus

[21]

O(nwlog(dw)log(N))

O(log2(N))

O(nwlog(dw)log(N))

O(log2(N))

O(log(dw))

WBP

Janus++

[41]

O(nw·d)

O(d)

O(nw)

O(1)

1

WBP

ΠBP

(this work)

O(ow)

O(1)

O(nw)

O(1)

2

BPLP

ΠWBP

(this work)

O(ow)

O(1)

O(nw)

O(1)

1

WBP

All the constructions are also forward private. The client storage for all the constructions is

O(mlog(n))

except

Orion

, where the corresponding complexity is

O(1)

. See Table 2 for the notations used. Generally,

ow<nw·dw

, as the former has an additive factor whereas the latter has a multiplicative factor.

ΠFP is the most efficient forward private scheme in literature. Our implementation results show that the performance of ΠWBP and ΠBP is comparable to ΠFP . For example, the time taken by the search protocol of ΠWBP and ΠBP for a search that returned 150,000 results is around 0.60 and 0.73 seconds respectively as compared to 0.54 seconds in ΠFP . The performance results indicate that the proposed constructions are 2× faster and improve upon the communication cost by a factor of 1.5–11 compared to the most efficient backward private construction in literature.

2. Notations and definitions

The security parameter is denoted by λ. All procedures in our construction implicitly take λ as input. By efficient, we mean probabilistic polynomial-time in λ. All the algorithms (including adversaries and simulators) are assumed to be efficient unless otherwise specified. A function f: NR is said to be a negligible function iff for all c> 0, n0N such that nn0 , f(n)<nc . The function neg(λ) denotes a negligible function in λ. For a finite set X, x$X means that x is uniformly sampled from X and |X| denotes the cardinality of set X. xy denotes that variable x is assigned the value of variable y and operator ‖ denotes concatenation. For a data structure DS , |DS| denotes the memory space occupied by the data structure in bits. addr(D) denotes the address in memory at which data structure instance D is stored. ⊥ denotes null value. For sets X1,...,Xn and Y, Func(X1××Xn,Y) denotes the set of all functions from X1××Xn to Y. For set X, Perm(X) denotes the set of all permutations on X. For sets X1 and X2 , PermX1(X2) denotes the set of all functions from X1×X2 to X2 , where for every xX1 , we have a permutation on X2 .

We use pseudo random functions ( PRF ), pseudo random permutations ( PRP ) and pseudorandom ciphertexts under chosen plaintext attack (RCPA) secure symmetric key encryption schemes in our constructions.

2.1. Pseudorandom function

Pseudorandom Function ( PRF ) F is polynomial-time computable in λ and is indistinguishable from a truly random function by any adversary A .

Definition 2.1.

Let FFunc(K×I,O) be an efficient, keyed function. For algorithm A , we define the experiments RealAPRF(λ) and IdealAPRF(λ) as shown in Fig. 1.

F is a pseudorandom function if for all probabilistic polynomial-time adversaries A , AdvF,APRF(λ)=|Pr[RealAPRF(λ)=1]Pr[IdealAPRF(λ)=1]|neg(λ) .

2.2. Pseudorandom permutation

Pseudorandom permutation ( PRP ) F is polynomial-time computable and invertible in λ and is indistinguishable from a truly random permutation by any adversary A .

Definition 2.2.

Let FPermK(X) be an efficient, keyed function. For algorithm A , we define the experiments RealAPRP(λ) and IdealAPRP(λ) as shown in Fig. 2.

Graph: Fig. 1. PRF security definition.

Graph: Fig. 2. PRP security definition.

F is a pseudorandom permutation if for all probabilistic polynomial-time adversaries A , AdvF,APRP(λ)=|Pr[RealAPRP(λ)=1]Pr[IdealAPRP(λ)=1]|neg(λ) .

2.3. Symmetric key encryption scheme

A symmetric key encryption scheme E consists of three algorithms: Gen,Enc and Dec .

  • Gen() : It outputs a key k .
  • Enc(k,m) : It takes as input the key k , and a message mM , where M is the message space, and outputs a ciphertext e .
  • Dec(k,e) : It takes as input the key k and a ciphertext e and outputs a message mM or ⊥.

Correctness: For all kGen() and for all messages mM , it is required that Dec(k,Enc(k,m))=m .

RCPA security notion of symmetric key encryption scheme. The pseudorandom ciphertexts under chosen plaintext attack (RCPA) security notion [[9]] for symmetric key encryption scheme E=(Gen,Enc,Dec) is captured in Fig. 3. Initialize () algorithm generates the key k using Gen algorithm of E and picks a random challenge bit b. The adversary can then adaptively ask queries to Encrypt () oracle. The game returns true if the adversary's output b equals the challenge bit b. In Fig. 3, C denotes the ciphertext space.

Definition 2.3.

A symmetric key encryption scheme E=(Gen,Enc,Dec) is said to have pseudorandom ciphertexts under chosen plaintext attack (RCPA) if no adversary A can win the game shown in Fig. 3, except with probability at most 12+neg(λ) . This probability is denoted by Pr[RCPAE=1] .

The advantage of A in RCPAE game is defined as follows: AdvE,ARCPA(λ)=2·Pr[RCPAE=1]1 .

Graph: Fig. 3.Security game: RCPAE.

2.4. Dynamic searchable symmetric encryption (DSSE)

The system consists of two parties: the client C (data owner) and the server S . C , who owns the database DB , encrypts the database and outsources it to S . The encrypted copy of the database created ensures that S responds to C 's queries (search and update) in an efficient manner with the guarantee that minimal information apart from the intended output of the query operation is leaked to the server. Usually, the encrypted database comprises of a secure index and a collection of encrypted documents. S  responds to C 's queries with the help of this secure index.

We primarily follow the formalization of Bost et al. [[6]] with certain additions. A keyword is denoted by w and a document is addressed by its identifier ind . The database DB can be represented as: DB={(indi,Wi):1in} , where n denotes the number of documents in the database, indi{0,1} are distinct document identifiers and Wi{0,1} is a set of keywords matching document indi , represented by binary strings of arbitrary length. Additionally, we consider the notations described in Table 2.

Table 2 Notations used in DSSE description

NotationDescription

W

i=1nWi

, the set of keywords
m

|W|

, # keywords
N

i=1n|Wi|

, # document-keyword pair

DB(w)

{indi:wWi}

, the set of documents containing w

nw

|DB(w)|

, # documents containing w

aw

#

add

operations performed on w

dw

#

del

operations performed on w

ow

# updates performed on w

nw

# documents containing w in previous search* operation

aw

#

add

operations performed on w after the previous search

dw

#

del

operations performed on w after the previous search

ow

nw+aw+dw

dupper bound on the number of deletes corresponding to a keyword w that can happen between two successive search queries on w

1 *By previous search on w, we mean the last search operation on w before the current search on w.

A DSSE scheme Π comprises of the following [[6]]:

  • Setup(DB) is a probabilistic algorithm that takes as input the initial database DB . It outputs ( stC,EDB ), where the client's state stC is given to C and the encrypted database EDB is given to  S . Setup algorithm is executed by C .
  • Search(q,stC;EDB)=(SearchC(q,stC),SearchS(EDB)) is a protocol (possibly probabilistic) between C and S . The input of C is the search query q and client's state stC . The input of S is the encrypted database EDB . The output to the client is the updated client's state stC and the set Res comprising of document identifiers matching the search query q . The output to the server is the updated encrypted database EDB .
  • Update(q,stC;EDB)=(UpdateC(q,stC),UpdateS(EDB)) is a protocol (possibly probabilistic) between C and S . The input of C is the query q=(op,in) comprising of update operation op{add,del} and the document-keyword pairs ( ind ,w) denoted by in , and client's state stC . The input of S is the encrypted database EDB . The output to the client is the updated client's state stC . The output to the server is the updated encrypted database EDB .

In this work, we consider the case of single keyword search, so, q=w and Res=DB(w) in Search protocol. For simplicity, we consider in=(ind,w) , i.e., a single document-keyword pair in Update protocol. There are two use-cases of Update protocol in a DSSE scheme. In the first use-case updates to the DB are done at document level, such as in storing a collection of text files. In such scenarios, bulk-updates are needed, which can be supported by calling the Update protocol repeatedly. In the second use-case updates to the DB are done at keyword-document pair level, such as in record databases which requires a single call to Update protocol.

Similar to [[9], [28], [30], [39]], for a given query q , we consider the output of Search protocol to be the set of document identifiers satisfying q . This allows us to decouple the storage of documents from the storage of data structures used to realize the search operation, which is the focus of this work. The leakage profile L in the security definitions for an SSE scheme can then be given with respect to only the information that gets leaked to adversarial server till the set of document identifiers satisfying q are determined. The storage of actual documents can be done in a variety of ways [[10]], with varying types of leakages. SSE schemes are of two types: response-revealing and response-hiding [[27]]. The former reveals the query response in plaintext whereas the latter does not. We use this categorization in our paper.

Security. We consider S to be honest-but-curious. The security definition follows the real/ideal simulation paradigm [[10], [14]]. The definition is parameterized by a leakage profile L which captures all the information that the adversary may learn about the encrypted database and C 's queries through its participation in the protocols. Hence, the view of the adversary in the real world can be simulated by L . L={LSetup,LSearch,LUpdate} , where LSetup , LSearch and LUpdate correspond to the information leaked in the Setup , Search and Update protocols respectively to the server.

Definition 2.4.

Let Π=(Setup,Search,Update) be a DSSE scheme and let L={LSetup,LSearch,LUpdate} be a stateful algorithm. For probabilistic polynomial time ( PPT ) algorithms A and Sim , we define the experiments RealAΠ(λ) and IdealA,SimΠ(λ) as follows:

RealAΠ(λ) . A(1λ) chooses DB . The experiment then runs (stC,EDB)Setup(DB) and gives EDB to A . Then A makes polynomial number of adaptive queries. For each query q , if q is a search query (resp. update query), the game runs (Res,stC,EDB)Search(q,stC;EDB) (resp. (stC,EDB)Update(q,stC;EDB) ) and gives the generated transcript to A . Eventually A returns a bit that the game uses as its own output.

IdealA,SimΠ(λ) . A(1λ) chooses DB . The experiment then runs EDBSim(LSetup(DB)) and gives EDB to A . Then A makes polynomial number of adaptive queries. For each query q , if q is a search query (resp. update query), the game gives transcript generated by Sim(LSearch(q)) (resp. Sim(LUpdate(q)) ). Eventually A returns a bit that the game uses as its own output.

We say that Π is L -semantically secure against adaptive attacks if for all adversaries A , there exists an algorithm Sim such that |Pr[RealAΠ(λ)=1]Pr[IdealA,SimΠ(λ)=1]|neg(λ) .

Common leakage. We follow some of the notations of common leakages from [[4], [6]]. The leakage profile L keeps as state the query list Q , i.e., the list of all queries issued so far along with their timestamp. Basically, the timestamps are a sequence of integers. The entries in Q are (u,w) for a search query on w, or (u,op,(ind,w)) for an update query (op,(ind,w)) , where u denotes the timestamp of the query. Corresponding to the search queries, the search pattern sp(w) can be defined as sp(w)={u:(u,w)Q} .

We also use the notation Hist(w) that denotes the list of all the modifications made to DB(w) over the time. It consists of DB0(w) , the set of document indices matching w at setup, and a list UpHist(w) , comprising of the updates of documents matching w, called the update history. For example, consider two documents 1 and 2 matching w. Suppose, the update queries are (add,(1,w)) , (add,(2,w)) and (del,(1,w)) at timestamp 3, 12 and 20 respectively, then UpHist(w)=[(3,add,1),(12,add,2),(20,del,1)] .

Updates(w) denotes the set of timestamps of updates on w. Formally,

Updates(w)={u|(u,add,(ind,w))Q or (u,del,(ind,w))Q}.

Updatesop(w) is exactly like Updates(w) except along with the timestamp it also stores op corresponding to the update query.

Correctness. We say that a DSSE scheme is correct if the Search protocol returns the correct results for the keyword being searched (i.e., DB(w) ), except with negligible probability. We follow a similar formalization as [[9]].

In Fig. 4, the adversary A makes p(λ) many queries, for some polynomial p. Transcript[Protocol] means the view of server in Protocol . T0= and Ti={τj:1ji} , 1 ip(λ) . Here, τj denotes the transcript of the jth query.

Graph: Fig. 4. DSSE correctness.

Definition 2.5.

Let Π=(Setup,Search,Update) be a DSSE scheme. For algorithm A we define the experiment DSSECorAΠ(λ) as shown in Fig. 4.

We say that Π is correct if for all adversaries A , Pr[DSSECorAΠ(λ)=1]neg(λ) .

3. Security notions in DSSE

As discussed earlier, two security properties, viz., forward privacy and backward privacy are desirable from a DSSE scheme. Informally speaking, the former captures that the newly updated entries cannot be related to previous search queries and the latter ensures that the search queries should not leak any information about indexes of deleted files. In the context of searchable encryption, the usual practice [[9], [13], [27]] is to argue security of SSE schemes by formulating a leakage profile L and proving that the leakage incurred in the proposed scheme is bounded by L . Recent works in the context of DSSE [[4], [6]] abstract out the information leakage incurred by a DSSE scheme in the form of definitions and attempt to justify that the formalized leakage profile adheres to the notion of forward/backward privacy. However, there could be several leakage profiles that capture the security notion in hand. One can choose among these formulations of leakage to achieve some sort of optimal balance between security and efficiency for the given context. Hence, the exercise of abstracting leakage-profiles of proposed constructions as alternative candidate definitions (as was the case for SSE [[8]]) may smoothen future evaluation of these leakages.

The question of what is the right definition of security is a vexed one [[31]]. Even for a widely used cryptographic primitive like digital signature, it has been argued that the accepted standard definition of security [[24]] does not take into account various issues that may crop up depending upon the application scenarios [[32], [35], [43]]. In the context of DSSE , security is argued by formally describing a leakage profile L and showing that the leakage incurred in construction is bounded by L . Hence, it is but natural that for a relatively new crypto/security protocol like DSSE , one needs to look at the formulations of leakage profiles from various perspectives. In this section, we perform a critical analysis of the existing notions of information leakage in backward private DSSE followed by an alternative formulation.

Bost at el. [[6]] made a seminal contribution in the area of DSSE by formalizing the notion of backward privacy through three different formulations viz., BPIP , BPUP and WBP , arranged in the ascending order in terms of information leakage. Naturally, like any other formalization of security these notions require further investigation. In that vein, we first reason why weak backward privacy ( WBP ) cannot be used to argue backward privacy in general and instead be reserved for backward privacy in a restricted setting only. Second, we introduce a relaxed leakage profile (Definition 3.5) that captures the notion of backward privacy. Finally, we introduce a desirable property for DSSE scheme called inverse backward privacy.

For simplicity we will assume that, DB is initially empty. Thus, the Setup algorithm leaks no information. If DB is not initially empty, typically, LSetup=N , where N denotes the number of document-keyword pairs. Let L1 and L2 denote two leakage profiles. If L1 leaks less than L2 , we denote it as L1L2 . By the proposition, " L1 leaks less than L2 " we mean that L1 gives less information about the database and the queries to the simulator than L2 , or, said otherwise, that every information given by L1 can be inferred from L2 . If L1 leaks strictly less than L2 , we denote it as L1L2 .

3.1. Forward privacy

We recall the strongest notion of forward privacy discussed in [[6]]. Informally, a DSSE scheme is forward private if the Update protocol leaks no information about the updated keywords. Definition 3.1 captures that an update operation doesn't leak more than the operation op={add,del} of the update query q .

Definition 3.1.

( FP-I ). An L={LSetup,LSearch,LUpdate} -semantically secure against adaptive attacks DSSE scheme Π is FP-I iff LSetup,LSearch,LUpdate can be written as:

LSetup()=.LSearch(w){sp(w),Hist(w)}.LUpdate(op,(ind,w)){op}.

3.2. Backward privacy

The notion of backward privacy was informally introduced in [[39]]. Intuitively, a backward private search for a keyword w should not reveal which documents in the past (which are now removed from the database) contained w. Like any other security property in the context of SSE , there could be several formulations of leakage profiles capturing this notion. Backward privacy was formalized in [[6]], through three different definitions, in the ascending order of information leakage, called respectively BPIP , BPUP and WBP . Informally, these formulations of leakages are described below:

  • Backward Privacy with insertion pattern ( BPIP ): During a search on some keyword w, BPIP schemes leak the documents currently matching w, when they were inserted, and the total number of updates on w.
  • Backward Privacy with update pattern ( BPUP ): During a search on w, BPUP schemes leak the documents currently matching w, when they were inserted, and when all the updates on w happened (but not their contents).
  • Weak backward privacy ( WBP ): During a search on w, WBP schemes leak the documents currently matching w, when they were inserted, when all the updates on w happened, and which deletion update canceled which insertion update.

Let us demonstrate the differences between these notions with an example. Consider the following entries in query list Q1 corresponding to keyword w: (1,add,(ind1,w)) , (4,add,(ind2,w)) , (5,del,(ind1,w)) , (12,add,(ind3,w)) . Let us consider the leakage for each definition after a search query on w at timestamp 15. The first notion reveals that ind2 and ind3 match keyword w and that this entries were added at time 4 and 12 respectively. It also reveals that there were a total of 4 updates for w. The second notion, additionally reveals that updates on w happened at time 1, 4, 5 and 12. Finally, the third definition also reveals that the index that was added for w at time 1 was deleted at time 5.

We now recall the additional leakage functions from [[5]] apart from the ones described in Section 2.4, that are required to formally capture the notions of backward privacy mentioned above.

For a keyword w, TimeDB(w) is the list of all documents matching w, excluding the deleted ones together with the timestamp of when they were inserted in the database. Formally, TimeDB(w) can be constructed from the query list Q as follows:

TimeDB(w)={(u,ind)|(u,add,(w,ind))and (1)u>u,(u,del,(w,ind))Q}.

The deletion history DelHist(w) of w is the list of timestamps for all deletion operations, together with the timestamp of the inserted entry it removes. Formally, DelHist(w) is constructed as:

DelHist(w)={(uadd,udel)|ind s.t. (uadd,add,(w,ind))and (2)(udel,del,(w,ind))Q}.

With these tools, we can now formally define these three notions of backward privacy formally.

Definition 3.2 (BPIP).

An L={LSetup,LSearch,LUpdate} -semantically secure against adaptive attacks DSSE scheme Π is BPIP iff LSetup,LSearch,LUpdate can be written as:

LSetup()=.LUpdate(op,(ind,w)){op}.LSearch(w){TimeDB(w),ow}.

Definition 3.3 (BPUP).

An L={LSetup,LSearch,LUpdate} -semantically secure against adaptive attacks DSSE scheme Π is BPUP iff LSetup,LSearch,LUpdate can be written as:

LSetup()=.LUpdate(op,(ind,w)){op,w}.LSearch(w){TimeDB(w),Updates(w)}.

Definition 3.4.

( WBP ). An L={LSetup,LSearch,LUpdate} -semantically secure against adaptive attacks DSSE scheme Π is WBP iff LSetup,LSearch,LUpdate can be written as:

LSetup()=.LUpdate(op,(ind,w)){op,w}.LSearch(w){TimeDB(w),DelHist(w)}.

As is evident from the above definitions, the notion of backward privacy is more involved than that of forward privacy. Forward privacy can be formalized by ensuring that the update query doesn't leak the keyword corresponding to which the update has been made. Whereas it is more subtle in the case of backward privacy. One approach is to formulate a strong leakage profile for the security notion in hand that allows very limited information to be leaked by the constructions satisfying it. This approach was followed in formulating leakage profiles in definitions BPIP and BPUP . A possible shortcoming of this approach is that there could be candidate constructions that don't satisfy these strong requirements but may still satisfy the intuitive notion of backward privacy. The other extreme could be to let "as much information as one can think of" to be leaked that can be allowed by the security notion in hand. This seems to be the approach followed in WBP security notion. However, in this approach one must be careful that the additional leakage in the leakage profile does not end up violating the basic security notion of the corresponding task. We revisit the notion of WBP from this perspective.

Remark.

LSearch in BPIP , BPUP and WBP should be augmented with leakage function sp(w) . Moreover, LSearch in WBP should also be augmented with leakage function Updates(w) . The rationale behind the same is described in Section 3.4.

3.3. Revisiting weak backward privacy

In this section, we scrutinize the notion of weak backward privacy. Let us consider the following entries in query list Q1 corresponding to w: (1,add,(ind,w)) , (3,w) , (5,del,(ind,w)) , (6,w) , (12,add,(ind,w)) , (14,del,(ind,w)) , (18,w) . Let us denote the search queries on w at timestamps 3, 6 and 18 by q1 , q2 and q3 respectively.

Leakage of the search query q1 :

TimeDB(w)={(1,ind)}.DelHist(w)=.

Leakage of the search query q2 :

TimeDB(w)=.DelHist(w)={(1,5)}.

Note that, DelHist(w) leaks that the add operation at timestamp 1 is canceled by the del operation at timestamp 5. Through the content of DB(w) after queries q1 and q2 the adversary can infer that the del operation at timestamp 5 corresponds to document ind . Therefore, it adheres to the notion of backward privacy described at the beginning of this section.

Now, let us consider the search query q3 . After the search query q2 , (ind,w) was added at timestamp 12 and later deleted at timestamp 14. Through the same intuitive notion of backward privacy, based on the state of DB(w) after queries q2 and q3 , the adversary should not infer which document does the update queries at timestamp 12 and 14 correspond to. However, the leakage of the search query q3 is:

TimeDB(w)=.DelHist(w)={(1,5),(1,14),(12,5),(12,14)}.

Hence, through the leakage profile the adversary can infer that updates at timestamp 12 and 14 correspond to document ind as it has already inferred which document the update queries at timestamp 1 and 5 correspond to. Clearly, this goes against the intuitive notion of backward privacy.

The following restriction is imposed on the constructions Dianadel and Janus that are proven to be weak backward private in [[6]]: "reinsertion of a document-keyword pair is not allowed after the deletion of the corresponding document-keyword pair". The reinsertion restriction allows one to avoid scenarios such as above that violate the intuitive notion of backward privacy. Hence, WBP can be considered to argue backward privacy in reinsertion restriction setting only. However, WBP -constructions proposed in subsequent works [[21], [41]] do not explicitly mention that the reinsertion of document-keyword pair is not allowed. Therefore, in order to avoid any ambiguity, we feel it needs to be explicitly mentioned that WBP is applicable in such restricted scenarios only.

Remark.

The case of reinsertion of document-keyword pair, may not be a concern in certain use-cases of SSE schemes where a new document identifier can be assigned to the updated document, thereby, ensuring that the newly inserted document-keyword pairs cannot be related to older ones. But this trick may not always be applicable especially when the contents of file can change dynamically over time. Here, one needs to handle reinsertion of keyword in existing documents, i.e., the document identifier can't be changed. Therefore, in such scenarios one needs to support reinsertion of document-keyword pairs. As an example, consider the case of a hospital database where the patients' records are documents and the disease they are suspected to suffer from are keywords. One cannot rule out a scenario in which based on newer symptoms a patient is re-suspected to suffer from a disease, say malignant brain tumor, which was ruled out earlier.

3.4. Suggested modifications

Here, we point out some subtle issues in Definitions 3.2, 3.3 and 3.4 and suggest modifications to address them. We first argue that sp(w) should be a part of LSearch(w) in definitions of BPIP , BPUP and WBP [[6]]. BPUP -secure Fides [[6]] as well as WBP -secure Janus [[6]] and Janus++ [[41]] leak sp(w) in Search protocol. Since, sp(w) is not mentioned explicitly to be part of the leakage profile in the definitions, one may conclude that sp(w) can always be derived from the other leakage functions in the respective definitions. However, consider the border-line scenario where no updates corresponding to w has occurred so far and two search queries on w are executed at timestamps 5 and 8 respectively. For search query at timestamp 8, sp(w)={5} . However, the state of other leakage functions at timestamp 8 are: TimeDB(w)= , Updates(w)= and DelHist(w)= . As can be observed, sp(w) cannot be derived from other leakage functions. Hence, sp(w) should be included in LSearch(w) of BPIP , BPUP and WBP .

Next, we argue that LSearch in WBP should also be augmented with leakage function Updates(w) . Recall that the informal notion of WBP in [[6]] states that a search query on w should at most leak the documents currently matching w, when they were inserted, when all the updates on w happened and the total number of updates on w. While formalizing the leakage profile of WBP [[6], Definition 4.2], the leakage in Search protocol is described as: LSearch(w)(TimeDB(w),DelHist(w)) . Note that, Updates(w) isn't explicitly a part of LSearch(w) . This gives the impression that, given TimeDB(w) and DelHist(w) one can construct Updates(w) . Now, consider the following entries in query list Q1 corresponding to w: (1,add,(1,w)) , (2,del,(2,w)) , (4,add,(3,w)) , (6,del,(1,w)) , (8,w) . Let us denote the search query on w at timestamps 8 by q1 .

Leakage functions at search query q1 :

TimeDB(w)={(4,3)}.DelHist(w)={(1,6)}.Updates(w)={1,2,4,6}.

Note that, TimeDB(w) and DelHist(w) collectively do not capture any information about the update operation at timestamp 2. As a result, Updates(w) cannot be constructed given the leakage functions TimeDB(w) and DelHist(w) . Hence, Updates(w) should be a part of LSearch(w) of WBP . Further, the construction Janus also leaks Updates(w) in Search protocol. Hence, Updates(w) needs to be included in the leakage profile of Janus construction.

Thus, we suggest that the respective information leakage definitions in the context of search should be:

LBPIP,Search:{sp(w),TimeDB(w),ow}.LBPUP,Search:{sp(w),TimeDB(w),Updates(w)}.LWBP,Search:{sp(w),TimeDB(w),Updates(w),DelHist(w)}.

3.5. A relaxed formulation of backward privacy

Constructing backward private DSSE schemes with optimal communication complexity is an interesting question to explore. To achieve this goal it seems necessary for S to be able to identify which insertion entry is canceled by a particular delete entry. However, the stronger leakage profile definitions of BPIP and BPUP do not allow such information to be leaked to S . Therefore, like WBP , one may need to allow some non-trivial relation among the update queries to be leaked. But unlike WBP it should be able to capture the notion of backward privacy in the general setting. We observe that there can be several alternative formulations of such information leakage. Here we describe one such candidate definition, i.e., Backward Privacy with Link Pattern BPLP and later in Section 4.2 propose a natural construction whose leakage profile is tightly captured by BPLP .

In order to describe the leakage profile of BPLP , we introduce some new leakage functions. Let DBx(w) and DBx+1(w) denote the set of documents matching w at two successive searches x and x+1 respectively and let LDB(w) denote the list of documents matching w in x+1 th search, in order of their insertion. An element in LDB(w) is of the form (i,ind) , where i denotes the index and ind denotes the document identifier. Let the timestamp of the xth and x+1 th search be denoted as ux and ux+1 respectively. We define leakage functions Link Pattern I ( LP-I ), Link Pattern II ( LP-II ) and Link Pattern III ( LP-III ):

LP-I(w)={(ind,u)|indDBx(w) and (u,op,(ind,w))Q and ux<u<ux+1}.LP-II(w)={(ind,u)|indDBx+1(w) and (u,op,(ind,w))Q and ux<u<ux+1}.LP-III(w)={(u1,u2)|(u1,op,(ind,w))Q and (u2,op,(ind,w))Q and ux<u1<u2<ux+1}.

LP-I(w) captures the relation between the identifiers obtained as a result in the xth search on w and the updates that happen between searches x and x+1 on w. LP-II(w) captures the relation between the identifiers obtained as a result in the x+1 th search on w and the updates that happen between searches x and x+1 on w. LP-III(w) captures the relation among the updates corresponding to same ind , that happen between x and x+1 search on w.

For example, let ( x+1 th) search query on w be at timestamp 16. Let the timestamp of the previous search query (x) on w be 4. Let DBx(w) and DBx+1(w) be {1,2} and {1,3} respectively. Let (5,del,(1,w)) , (7,del,(2,w)) , (12,add,(1,w)) and (15,add,(3,w)) be the update queries on w that occur between these two searches. The respective leakage functions will be:

LP-I(w)={(1,5),(2,7),(1,12)}.LP-III(w)={(5,12)}.LP-II(w)={(1,5),(1,12),(3,15)}.

Though the leakage functions defined above may appear a bit complex, they are useful abstractions through which BPLP captures the notion of backward privacy as shown below. In Search protocol, along with sp(w) and Updatesop(w) , leakage functions LP-I(w) , LP-II(w) , LP-III(w) and LDB(w) can be leaked to S . Let IBP denote the set of document identifiers corresponding to which the entries have been already deleted. Note that, indIBP , as indDBx(w) and as the last update operation on (ind,w) is del operation, it follows that indDBx+1(w) . Hence, no information about the ind s in IBP can be revealed through leakage functions LP-I(w) , LP-II(w) and LDB(w) . As the other leakage functions viz., sp(w) and LP-III(w) , do not leak ind corresponding to an update, no information about identifiers in IBP gets revealed.

Definition 3.5 (BPLP).

An L={LSetup,LSearch,LUpdate} -semantically secure against adaptive attacks scheme Π is BPLP iff LSetup,LSearch,LUpdate can be written as:

LSetup()=.LUpdate(op,(ind,w))=.LSearch(w){sp(w),Updatesop(w),LP-I(w),LP-II(w),LSearch(w)LP-III(w),LDB(w)}

Note that Definition 3.5 ( BPLP ) captures the notion of forward privacy as well. Therefore, constructions that are BPLP secure are naturally forward private.

Next, we give a comparison among the leakage incurred in Search protocol of BPUP and BPIP with BPLP (Definition 3.5). As remarked earlier, the notion of WBP is only suitable to argue backward privacy in a restricted setting. Hence, it would not be meaningful to compare the definitions of backward privacy which apply to the general setting with WBP .

From the search leakage in BPUP (Definition 3.3) and BPLP (Definition 3.5), one can conclude that all the leakage functions in LBPUP,Search can be derived from the leakage functions in LBPLP,Search , as TimeDB(w) can be derived from LP-II(w) , Updatesop(w) and DB(w) (in particular LDB(w) ) of the successive searches on w. Note that, TimeDB(w) comprises of information regarding the timestamp of the add updates corresponding to document identifiers in DB(w) that are not followed by their respective del update. This can be easily determined by keeping track of all the add updates corresponding to document identifiers in DB(w) that are not followed by the respective del update through LP-II(w) over all searches on w and Updatesop(w) . Further, some leakage functions in LBPLP,Search such as LP-III(w) , cannot be derived from LBPUP,Search (see below example). Hence, LBPUP,SearchLBPLP,Search . Figure 5 summarizes the relation among definitions of backward privacy.

Graph: Fig. 5. Relations among definitions of backward privacy.

Example.

Consider the entries corresponding to w in query lists Q1 and Q2 as described below. Corresponding to w, (1,add,(1,w)) , (2,add,(2,w)) , (3,del,(1,w)) , (4,del,(2,w)) , (5,add,(3,w)) and (6,w) are present in Q1 . Corresponding to w, (1,add,(1,w)) , (2,add,(2,w)) , (3,del,(2,w)) , (4,del,(1,w)) , (5,add,(3,w)) and (6,w) are present in Q2 .

Following is the description of various leakage functions at timestamp 6 in Q1 :

sp(w)={6}.Updates(w)={1,2,3,4,5}.TimeDB(w)={(5,3)}.LP-III(w)={(1,3),(2,4)}.

Similarly, the leakage functions at timestamp 6 in Q2 are described as:

sp(w)={6}.Updates(w)={1,2,3,4,5}.TimeDB(w)={(5,3)}.LP-III(w)={(1,4),(2,3)}.

As can be observed the leakage functions in LBPUP,Search , i.e., sp(w) , Updates(w) and TimeDB(w) are the same for query lists Q1 and Q2 . However, LP-III(w) is different for query lists Q1 and Q2 . Clearly, LP-III(w) cannot be uniquely determined from the leakage functions in LBPUP,Search .

With the newly introduced BPLP , we now have four different formulations of information leakage in the context of backward privacy. An interesting question would be to analyze the real-world consequences of the leakages incurred by constructions satisfying these different notions, viz., BPIP , BPUP , BPLP or even WBP (in restricted setting).

Inverse backward privacy. We propose a new desirable property for a DSSE scheme called inverse backward privacy which captures the complementary situation of backward privacy. Analogous to backward privacy, a DSSE scheme is inverse backward private if whenever a document-keyword pair (ind,w) is deleted and later added, subsequent search queries on w won't reveal the fact that (ind,w) was deleted unless it can be inferred by the search and access pattern of the search query. For example, let DB(w)={1,2} for search query q at timestamp 5. Let the update operations on w after query q and before the next search query be (6,del,(1,w)) , (12,add,(1,w)) , then no information about the identifier 1 in update queries at timestamp 6 and 12 should be revealed to S in the next search query on w.

Inverse Backward Privacy property could be of relevance in various use-cases. For instance, consider the employee database where the employee records correspond to document and project teams she works in correspond to keywords. An employee E1 maybe dropped and reincluded in a project team. We would like to hide the fact that employee E1 was dropped briefly, if no search on that project team had been performed during that period.

4. Backward private DSSE constructions

In this section we propose two backward private schemes ΠBP and ΠWBP that are respectively BPLP and WBP secure. Our starting point is a forward private DSSE scheme ΠFP which is a modified version of the scheme proposed in [[16]].

4.1. ΠFP: A warm-up solution

Graph: Fig. 6. Scheme ΠFP.

The central idea in ΠFP (Fig. 6) is to make updates using fresh keys. Hence, the keys disclosed in previous searches do not reveal anything about these new updates. ΠFP is described in Fig. 6. The construction makes use of PRF s Ft , Fd : {0,1}λ×{0,1}{0,1}λ and hash functions H1 : {0,1}λ×{0,1}{0,1}2λ and H2 : {0,1}λ×{0,1}{0,1}μ , where μ=λ+1 and λ is the security parameter.

The Setup algorithm generates secret keys kt and kd . C initiates three maps: T, D and W. The maps T and D are stored at S 's end. Corresponding to w, D stores the pointer to PSetw , the set of document identifiers in plaintext that were obtained as a result of the latest search operation on w and T stores encrypted entries inserted after the latest search operation on w. The map W is stored at C 's end. In W, corresponding to w, C stores the version verw (initialized to 0) and counter cw (initialized to −1). verw ensures that the key kw used in the Update protocol is unknown to S , cw stores information about the number of entries added to T corresponding to w after the latest search operation.

Update_ : When C wants to add/del a document-keyword pair ( ind,w ), it computes key kw using keyword w and verw (see line 5) and increments cw . Based on kw and cw , C computes the hash digests label and pad and sends (label,e=pad(b||ind)) to S . S then adds (label,e) to T. For add (resp. del ) operation, b=0 (resp. b=1 ).

Note that, kw used in processing update queries is computed using updated verw . Since, kw is output of PRF Ft at wverw , it is indistinguishable from random for S . As label (resp. pad ) is computed as H1(kwcw) (resp. H2(kwcw) ) for an update query, both are indistinguishable from random for S as H1 (resp. H2 ) is modeled as a random oracle. Hence, in the security proof, update queries can be simulated by generating random (label,e) pair.

Search_ : When C wants to perform a search query on w, it computes labelw (see line 5) and the key kw is computed (see line 7) only if new entries are inserted in map T. C sends labelw , kw and cw to S . kw gets revealed to S only if a new entry was inserted to T after the previous search on w. Hence, C updates the version verw (see line 8). verw is not updated in a search query on w for which corresponding to w, no updates on map T were made after the previous search on w. Based on the information received from C , S computes the result set and updates D with the newly computed result set.

In ΠFP , verw ensures that S cannot relate later updates with previous search queries and cw ensures that S cannot correlate the update queries on w done after the previous search operation on w.

Remark.

Essentially, ΠFP is same as the construction in [[16]] except the following: 1) The search counter corresponding to w (denoted as verw ) is updated differently than in [[16]] to avoid unnecessary increments to the search counter. 2) After a search operation on w, the revealed document identifiers are stored together ( DB(w) ) in plaintext (as suggested in [[6]]) to provide reasonable locality without incurring any additional leakage.

Example 1.

Consider the following list of update queries: (add,(1,w1)) , (add,(1,w2)) , (add,(2,w1)) , (add,(3,w1)) , (add,(3,w2)) , (del,(1,w1)) . Figure 7(a) shows the state of indexes at C and S after these updates are processed. Figure 7(b) shows the state of indexes at C and S after search on w1 .

Graph: Fig. 7.Example 1: indexes at C and S before and after search on w1 in construction ΠFP. W = client index, D = index that stores search results of previous search query at S and T = index that stores updates after the last search on w at S.

Correctness. The scheme is correct as long as there are no repeated labels in maps T and D. Since, Fd is a PRF , only with negligible probability labelw in D is same for two distinct keywords. The input to H1 is repeated only with negligible probability as Ft is a PRF . If we consider H1 to be a collision resistant hash function, only with negligible probability label in T is repeated.

In Appendix, we provide complete proofs of correctness and FP-I -security (see Definition 3.1) of ΠFP .

4.2. ΠBP: Realizing optimal communication complexity

Bost et al. [[6]], proposed a generic way to obtain a two-roundtrip backward-private scheme from a forward private DSSE scheme. Applying this generic transformation on ΠFP one gets a backward private scheme which is essentially the same as Mitra [[21]]. This backward private scheme is very efficient as it makes use of light-weight symmetric primitives only. However, the communication complexity of Search protocol in such a backward private scheme is O(ow) which is not optimal i.e., O(nw) . Further, as C has to process each ciphertext it receives from S , the computation complexity at C 's end also becomes O(ow) due to the above communication overhead. In order to obtain optimal communication complexity, S should send entries corresponding to only the set of documents currently matching w ( DB(w) ) to C . One approach to satisfy the above requirement is to associate a tag corresponding to each update entry. Using these tags, S , while performing a search on w will be able to correlate the update queries on w corresponding to the same ind . As in Dianadel and Janus , if the tags are generated deterministically using just ind and w, it leaks DelHist(w) and hence, will not satisfy the notion of backward privacy in the general setting. Here, we leverage the version verw to generate the tags in a simple yet non-trivial manner to ensure that the leakage is bounded by BPLP (see Definition 3.5). This results in the first backward private scheme in the general setting achieving optimal communication complexity using light-weight symmetric primitives only.

Graph: Fig. 8. Scheme ΠBP.

Graph: Fig. 9. Example 2: indexes at C and S before and after search on w1 in ΠBP. W = client index, D = index that stores tags of search results of previous search query at S and T = index that stores updates after the last search on w at S.

Scheme ΠBP is described in Fig. 8. For a keyword w, D stores the pointer to PSetw , the set of tags corresponding to document identifiers that were obtained as a result of the latest search operation on w. The construction makes use of PRF s Ft , Fd : {0,1}λ×{0,1}{0,1}λ , PRP Gtag : {0,1}λ×{0,1}2λ{0,1}2λ and hash functions H1 : {0,1}λ×{0,1}{0,1}2λ and H2 : {0,1}λ×{0,1}{0,1}μ , where μ=λ+1 .

For an update query (op,(ind,w)) , a tag corresponding to (ind,w) is generated using the current version verw . Then, btag is stored in T at S in Update protocol (see line 10). For add (resp. del ) operation, b=0 (resp. b=1 ).

Round 1 of Search protocol is similar to the Search protocol of ΠFP . At the end of round 1 of Search , S sends TS , the set of tags corresponding to the document identifiers currently matching w in order of their insertion. For every tag in TS , C computes Gtag1 to get the document identifier ind which it adds to AuxSet and re-computes tag using the updated version which it stores in PSet . AuxSet consists of all the document identifiers currently matching keyword w and PSet consists of the updated tags, in order of their insertion. C sends PSet and AuxSet to S , who stores PSet at D[labelw] and can use AuxSet to fetch the matching documents. Note that, recomputed tags enable S to consistently handle future searches and updates as the tags corresponding to subsequent updates on w are made using the same value of verw .

Let U be the set of update queries corresponding to w after the previous search and before the current search on w. The tags are generated using the same verw exclusively quU and indDB(w) . Since, the tags are computed using PRP Gtag taking w, ind and verw as input, S can only learn the relation between the tags corresponding to the same ind among the update queries in U and DB(w) . As the ind s in PSet are stored in order of insertion, S can link the update queries in U with these ind s. The above leakage is precisely captured in BPLP via leakage functions LP-I , LP-II and LP-III . Moreover, S cannot relate update queries in U with the update queries that are made before the previous search on w and after the current search on w as the tags are generated using different verw .

Example 2.

Consider the following list of update queries: (add,(1,w1)) , (add,(1,w2)) , (add,(2,w1)) , (add,(3,w1)) , (add,(3,w2)) , (del,(1,w1)) . Figure 9(a) shows the state of indexes at C and S after these updates are processed. Figure 9(b) shows the state of indexes at C and S after search on w1 .

To summarise, ΠBP achieves optimal communication complexity and uses symmetric primitives only. D provides reasonable locality as it stores the tags corresponding to previous search results together. ΠBP is easily parallelizable and doesn't impose reinsertion restriction.

Further, in order to obtain a response-hiding scheme, C sends only PSet to S in line 39 in second round of search protocol. We can eliminate the second round of communication using standard piggybacking technique [[16], [19]] and upload the updated tag set PSet with the next search query, thus, achieving a single roundtrip response-hiding backward private DSSE protocol.

Correctness. As mentioned in correctness of ΠFP , the scheme is correct as long as there are no repeated labels in maps T and D. Since, these labels are generated in the same manner as they were generated in ΠFP , correctness of ΠBP immediately follows from that of ΠFP .

Asymptotic complexity. The communication complexity of the Search protocol is O(nw) . The computational complexity of the Search protocol is O(ow) . The communication and computational cost of the Update protocol is O(1) . Space complexity at the server's end is O(N+D) , where D=wdw and at the client's end is O(mlog(n)) .

Security. We prove ΠBP is BPLP in the random oracle model. The proof relies on pseudo randomness of Fd , Ft and Gtag .

Theorem 4.1.

If Fd , Ft are secure PRF s, Gtag is a secure PRP and H1,H2 are hash functions modeled as random oracles outputting 2λ and μ bits respectively then ΠBP is BPLP secure (Definition3.5).

Proof.

We structure our proof using a sequence of games G0 to G5 . G0 will compute a distribution identical to RealAΠBP(λ) and G5 will compute a distribution that can be simulated perfectly given the leakage profile L , i.e., its distribution is identical to IdealA,SimΠBP(λ) .

Game G0 : G0 is exactly identical to RealAΠBP(λ) .

(3)Pr[RealAΠBP(λ)=1]=Pr[G0=1].

Game G1 : In G1 , every call to PRF s Ft and Fd are answered using tables Keyt and Keyd respectively. The entries in table Keyt are referred by (w,ver) and entries in table Keyd are referred by w. Conventionally, when an entry is being accessed for the first time, it is chosen at random and then used thereafter, which is followed in the rest of the paper unless mentioned explicitly. If there exists an adversary A that is able to distinguish between games G0 and G1 , we can construct an adversary B1 that can distinguish Ft from a truly random function and/or an adversary B2 that can distinguish Fd from a truly random function. Formally, there exist adversaries B1 and B2 , such that

|Pr[G0=1]Pr[G1=1]|AdvFt,B1PRF(λ)+AdvFd,B2PRF(λ)(4)neg(λ).

Game G2 : In G2 , every call to PRP Gtag is answered using table Keytag . The entries in table Keytag are referred by (w,ind,ver) . If the randomly generated tag has been selected earlier in Keytag , G2 is aborted. Since, the number of queries to PRP Gtag , say q, is a polynomial in security parameter, by the birthday bound we can conclude that the probability that the two tags are equal is at most q222λ , i.e., neg(λ) . Therefore, G2 aborts with negligible probability.

Now, if there exists an adversary A that is able to distinguish between games G1 and G2 , we can construct an adversary B that can distinguish Gtag from a truly random permutation. Formally, there exists an adversary B , such that

|Pr[G1=1]Pr[G2=1]|AdvGtag,BPRP(λ)+q222λ(5)neg(λ).

Game G3 : In G3 , instead of calling H1 to generate label in the Update protocol, we pick random strings. Then, during the Search protocol, the random oracle H1 is programmed accordingly, to ensure consistency.

Tables Hash1 and H1 are used to simulate the random oracle H1 , the entries in the table Hash1 are referred by (w,ver,c) and in the table H1 by ( k,c ).

Graph: Fig. 10.Games G3 and G3′ (Theorem 4.1). G3′ includes the box code and G3 does not.

Figure 10 formally describes G3 , and an intermediate game G3 . In G3 , H1 is never programmed to two different values for the same input, thus, ensuring consistency. Instead of storing the randomly picked value in table Hash1 at position (w,verw,cw) , one first checks whether H1 is already programmed at value (kw,cw) which can happen if there was a query to the random oracle H1 with input (kwcw) . If the check is true, the value H1(kwcw) is stored in Hash1[w,verw,cw] else the randomly picked value is stored in Hash1[w,verw,cw] . The random oracle when needed in the Search protocol in line 9 or by an adversary's query to random oracle H1 in line 5 is lazily programmed in G3 , so that the outputs are consistent throughout.

The only difference between game G2 and G3 is how we model the random oracle H1 . The outputs of H1 is perfectly indistinguishable in both these games, therefore,

(6)Pr[G3=1]=Pr[G2=1].

Let us denote the event "*"the flag bad is set to true in G3 by E1 . The games G3 and G3 are also perfectly identical unless the event E1 occurs, and we can apply identical-until-bad technique [[2]] to bound the distinguishing advantage between G3 and G3 .

(7)|Pr[G3=1]Pr[G3=1]|Pr[E1].

The event E1 occurs in line 8 of Update protocol and in line 5 of H1 algorithm (see Fig. 10). The former captures the fact that adversary has already queried random oracle H1 at input (kwcw) and the latter captures the fact that the adversary queries random oracle H1 on a valid input (kc) . Since, the value kw is picked uniformly at random and the adversary cannot do anything better than guessing the value of k , the probability with which event E1 occurs is negligible. Using (6) and (7), we can conclude:

(8)|Pr[G2=1]Pr[G3=1]|neg(λ).

Game G4 : In G4 , what we did for H1 in game G3 , we do for H2 . Using the same arguments, we can conclude:

(9)|Pr[G3=1]Pr[G4=1]|neg(λ).

Graph: Fig. 11.Game G5 (Theorem 4.1).

Game G5 : In G5 (see Fig. 11), we abstract out the information that needs to be simulated by the simulator in order to output transcripts identical to G4 . Using GetDatar1 and GetDatar2 algorithms in G5 , one keeps track of the randomly generated tag , label and pad differently than in G4 . In Search protocol, the random oracles are programmed identically to that in G4 . Queries to random oracles H1 and H2 can be simulated by outputting random values.

As we output fresh random strings in Update protocol, the transcripts of Update protocol is identical to that of Update protocol in G4 .

Next, we describe the Search protocol in G5 . Based on tables Update , STs (search timestamps) and Tag map, the value of the following components: emptyw , verw , cw and {tagc,H1,w,c,H2,w,c}0ccw are determined using GetDatar1 algorithm. Flag empty is set to 1 if Update[w] is empty. verw is the count of searches for which there was an update on map T corresponding to keyword w after the last search. The loop in line 6 of GetDatar1 algorithm determines the number of times version number is updated, i.e., value of verw . Here, cw denotes the count of updates on map T corresponding to keyword w after the last search. tag is picked from the map Tag , therefore, if the indices are same, the tags are equal, thus ensuring the consistency. The values {H1,w,c,H2,w,c}0ccw are used to simulate the random oracles consistently with the response given at the time of update queries. The loop in line 10 of GetDatar1 algorithm computes the values of cw and {tagc,H1,w,c,H2,w,c}0ccw . GetDatar2 algorithm outputs LDB(w) and new tags corresponding to the document identifiers currently present in DB(w) . These tags are ordered based on the order of insertion of documents they correspond to. As the values of components are computed correctly and consistently w.r.t. all the previous queries, the transcripts of Search protocol is identical to that of Search protocol in G4 . Therefore, we conclude that:

(10)Pr[G5=1]=Pr[G4=1].

Simulator Sim : Finally, we construct a simulator that given the leakage profile L simulates game G5 correctly. Sim can simulate Update protocol correctly as in G5 . Instead of using keyword w in Fig. 11, Sim uses the counter w=min sp(w) uniquely mapped from w using LSearch in simulating the Search protocol (line 6 and line 8). In line 2 of Search protocol, Sim uses sp(w) , Updatesop(w) , LP-I(w) and LP-III(w) instead of STs , Update and Tag as input to the GetDatar1 algorithm. In GetDatar1 , we use the timestamps of search and update queries and make use of indices to generate tag . However, the indices are used just to identify when same tags need to be generated. This can be ensured using, LP-I(w) and LP-III(w) .

Also, the output of GetDatar2 is LDB(w) , i.e., AuxSet and an ordered list of freshly generated random tags PSet which can be simulated easily using LP-II(w) and LDB(w) . In GetDatar2 , the indices are used to just associate an order to the generated tags, which can be ensured by the components LP-II(w) and LDB(w) of the leakage profile. Thus, Sim is able to produce transcripts of output of Search and Update protocols identical to G5 . Hence, we conclude that:

(11)Pr[IdealA,SimΠBP(λ)=1]=Pr[G5].

By connecting all the games, we conclude

(12)|Pr[RealAΠBP(λ)=1]Pr[IdealA,SimΠBP(λ)=1]|neg(λ).

4.3. ΠWBP: A weak backward private variant

Graph: Fig. 12. Scheme ΠWBP.

As mentioned in Section 3.3, there are various use-cases such as storing a collection of text files in which reinsertion restriction may not be a serious concern. Here, we propose a simple one roundtrip response-hiding backward private scheme ΠWBP in the reinsertion restriction setting. ΠWBP is essentially a simple modification to ΠBP and inherits all its salient features. ΠWBP (see Fig. 12) improves upon the concrete efficiency of Search protocol in ΠBP , by eliminating the computation and transmission of newly generated tags in the second round of communication. This shows that one can construct an efficient one-round trip WBP scheme with optimal communication complexity using simple primitives only.

In ΠBP , we use verw in computation of tags to securely handle reinsertions. The tags corresponding to fresh updates after a search are computed using the updated verw . Hence, these tags cannot be related to the entries that are deleted before this search. This ensures backward privacy in cases where reinsertion is allowed. However, when we consider the reinsertion restriction setting, an add query is not allowed after a delete query corresponding to the same document-keyword pair. Hence, verw is not required in computation of tags in this setting. Therefore, for an update query, tag in line 8 of Update protocol in ΠBP (see Fig. 8) is computed as tagGtag(kg,wind) in line 8 of Update protocol in ΠWBP (see Fig. 12).

Search protocol in ΠWBP is identical to Round 1 of Search protocol in ΠBP . In line 31 of Search protocol in ΠBP , S along with sending TS to C , stores TS at D[labelw] . From TS , C retrieves the search results in similar fashion as in ΠBP . Recomputation of tags in Search protocol in ΠBP is needed because verw gets updated. While in ΠWBP , since the tags are independent of verw , recomputation of tags is not required. Hence, the second round of communication is not needed.

The changes we make in ΠBP in constructing ΠWBP induces additional leakage which can be shown to be bounded by the leakage profile of WBP (Definition 3.4). The proof of Theorem 4.1 can be easily adapted to argue security of ΠWBP .

The crucial observation from our constructions is that efficient backward private schemes with optimal communication complexity can be realized without involving complex cryptographic primitives, as was the case in [[6], [41]]. The simplicity of design in our backward private constructions is an appealing feature, particularly from the implementation perspective.

4.4. Comparative analysis

In Table 1, we provided a comparison of our schemes with some prior and concurrent works [[6], [21], [41]]. On that line, we conclude this section with a comparative analysis of the currently available BP-secure DSSE schemes. The goal is to figure out the scenarios in which each of these constructions would fit best. Let us first consider the scenario where the requirement is to achieve minimal information leakage. The candidate constructions are Orion [[21]] and Moneta [[6]] as they satisfy strong notion of backward privacy ( BPIP ). As the search time of Orion is quasi-optimal in nw (linear in nw upto a logarithmic factor), it may appear to be more suitable than Moneta in such scenarios. However, Orion may not be practical for very large databases as Path-ORAM [[40]] is used as a building block in its construction, which limits the applicability of Orion in such scenarios [[33]]. Constructions Mitra [[21]] and Fides [[6]] satisfy the next level of backward privacy. Mitra makes use of symmetric primitives only and thus is very efficient in practice. But, it still suffers from significant communication overhead.

In order to overcome the above limitations, one may trade security a bit for performance, while at the same time ensure that the notion of forward and backward privacy is preserved. Constructions Dianadel [[6]], Janus [[6]], Janus++ [[41]], Horus [[21]] and ΠWBP (Section 4.3) satisfy the notion of weak backward privacy (WBP) . The communication and computation complexity of search and update protocols of Dianadel isn't optimal (See Table 1). To improve upon the communication overhead of Search protocol, a single round-trip Janus framework [[6]] was proposed. It was instantiated using asymmetric and symmetric puncturable encryption scheme in Janus [[6]] and Janus++ [[41]] respectively. However, the computational complexity of search protocol in Janus framework is O(nw·dw) , which is unreasonably high ( nw·dwow ). Horus , a modified version of Orion , was proposed in order to improve the number of round trips in the Search protocol ( O(log(N)) to O(log(dw)) ). But Horus suffers from the same scalability issue as Orion . In contrast, ΠWBP is a single-round trip DSSE scheme that achieves optimal communication complexity, makes use of symmetric primitives only and is very efficient in practice. However, the computation complexity of the Search protocol is not quasi-optimal in nw . Moreover, the notion of weak backward privacy can only be used in scenarios where reinsertion of keyword-document is not allowed.

BPLP (Definition 3.5) along with satisfying the intuitive notion of backward privacy, allows the reinsertion of keyword-document pairs. The corresponding construction, ΠBP (Section 4.2) is the first DSSE scheme that satisfies the notion of backward privacy in the general setting that achieves optimal communication complexity, makes use of symmetric primitives only and is very efficient in practice. The only limitation is that the search time in ΠBP is not quasi-optimal in nw which seems to be the cost that one has to pay to achieve optimal communication complexity in Update and Search protocol.

Remark.

As the size of EDB grows in Mitra [[21]] with every update, including deletions, the authors employed a periodic "clean-up" operation [[4], [6], [16], [39]]. In this operation, C removes the deleted entries after a search, re-encrypts the remaining ones, and sends them back to S . The resultant scheme is called Mitra [[21]]. If there are no deletions then for the first search on a keyword w, the actual computation cost of Mitra will be less compared to ΠBP as Mitra involves only PRF evaluations which can be realized using the blazingly fast AES block cipher whereas ΠBP involves hash function evaluations as well. However, for subsequent searches on the same keyword w, the hash function evaluations reduces drastically in ΠBP with the introduction of map D. Moreover, the communication cost of Mitra will be more than ΠBP even when there are no deletions. This is because in Mitra , C sends labels to S to identify the respective entries in the dictionary in addition to sending the re-encrypted (label, value) pairs.

5. Implementation results

In this section, we discuss the performance of schemes ΠFP , ΠBP and ΠWBP . ΠFP being currently the most efficient forward private scheme in literature, serves as a benchmark to evaluate the performance of other DSSE schemes. We also compare the performance of the proposed schemes with the most practically promising backward private schemes in literature, viz., Mitra [[21]] and Dianadel [[6]]. For Mitra and Dianadel , we used the codes available in [[11]]. The implementation results for ΠBP gives a fair indication about the performance of ΠWBP as their asymptotic computation complexity are the same and both make use of same light-weight symmetric primitives. Hence, the performance of ΠWBP is reported only on some parameters which gives a fair idea of how the performance of ΠWBP stacks against that of ΠBP and ΠFP .

We have implemented the schemes in C++. For pseudo random functions Fd , Ft and pseudo random permutation Gtag , we use AES, and for hash functions H1 and H2 , we use SHA-256. We use the AES and SHA-256 function available in OpenSSL library [[42]] in our code. Maps T and W are stored using RocksDB [[18]].

All our experiments were performed on a desktop computer with an Intel Core i5 4460 3.20 GHz CPU and 8 GB RAM running Ubuntu 16.04 LTS. Our code is designed to run as a single program as we are interested in determining the performance of Search and Update protocols of our constructions.

We used Enron email dataset [[15]] to create EDB on which we perform our search and update operations. We wrote a python code to extract keywords from the mails in Enron email dataset using NLTK library [[36]]. The number of documents, number of keywords and number of document-keyword pairs in our dataset are 517,401, 212,020 and 36,688,028 respectively.

Table 3 EDB creation

ImplementationTime (s)Time per pair* (μs)Strg. at

S

(GB)
Strg. at

C

(MB)

ΠFP

164.664.492.82.4

ΠWBP

179.044.882.82.4

ΠBP

182.904.982.82.4

Mitra

223.596.091.32.4

Dianadel

327.038.912.14.1

* – document-keyword pair, Strg. = Storage.

EDB creation. EDB was created to store all the document-keyword pairs extracted from the Enron email dataset. The computational works and I/O latency required for EDB creation are parallelized using thread pool. Table 3 shows the time taken to create an EDB , the time taken to process each document-keyword pair and the size of EDB and W just after EDB creation for schemes ΠFP , ΠWBP , ΠBP , Mitra and Dianadel .

For each entry in Table 3, we ran our experiment 10 times and computed the average. The time taken to create EDB for schemes ΠWBP and ΠBP is just 8.6% and 10.9% more than the time taken to create EDB for scheme ΠFP . ΠWBP and ΠBP improve upon the performance of currently the most efficient backward private scheme Mitra by 19.9% and 18.2% respectively. The per document-keyword processing time is very less for all the schemes as only symmetric primitives are used in these constructions. For ΠWBP and ΠBP , the size of EDB is same as in ΠFP . However, the size of EDB in Mitra and Dianadel is comparatively lesser. This is because in our implementation the labels were computed using SHA-256 (256 bits long), whereas the implementation of Mitra and Dianadel in [[11]] utilized AES (128 bits long) for the same. Our choice is guided by [[4], [30]], who suggest for security parameter λ, the size of the labels, μ, must be atleast λ+2log(N) to ensure correctness of the scheme. The storage requirement at C 's end for all the schemes is very less, as C 's index is only needed to book keep the respective counter values corresponding to the keywords.

Since, EDB is created by calling the Update protocol of the respective constructions, it also gives a fair indication of the performance of the Update protocol. Hence, we do not separately compare the performance of Update protocol.

Graph: Fig. 13. Average per entry search time (single-threaded).

EDB search. To evaluate the search performance, we searched all the keywords extracted from the Enron email dataset just after EDB creation and measured the overall time taken for the Search protocol. The main purpose of this experiment was to evaluate the search performance independent of the improvements achieved through locality and efficient handling of delete queries in our proposed schemes. We first begin with evaluating the performance of our proposed scheme with Mitra and Dianadel . Figure 13 describes the search time per matched entry ( stpme ) based on the number of documents returned in the search results for single-threaded instances of schemes ΠFP , ΠBP , ΠWBP and Mitra . In Figs 13 and 14,

RS(i)=DB(w)=1if i=02i1<DB(w)2iif 1i18

denotes discretization of the result set size.

Graph: Fig. 14. Average per entry search time (four-threaded).

Graph: Fig. 15. Average per entry search time as a function of number of threads.

The performance of Mitra is better than ΠBP for RS(0) and RS(1). This is mainly because of one time cost of accessing map D which is amortized for searches matching large number of documents. From then on our proposed schemes, ΠWBP and ΠBP , keep performing better than Mitra . ΠWBP and ΠBP achieve 2× improvement over Mitra in most of the cases. Further, as can be observed from the figure ΠWBP and ΠBP do not incur a significant overhead over ΠFP . For instance, the stpme in ΠFP , ΠBP , ΠWBP and Mitra is 6.81, 9.39, 8.16 and 19.20 μs for queries whose result set size was in the interval (217,218] . The performance of Dianadel is orders of magnitude slower than the rest of the schemes, hence not included in Fig. 13. For example, the stpme in Dianadel is 599 and 294 μs for queries whose result set size was in the interval RS(0) and RS(18) respectively. As can be observed from the performance results, the improvement in computation cost is quiet notable. However, the main performance benefit is achieved by remarkably reducing the communication cost which is described later in this section.

Next, we describe the benefit of parallelism in the performance of Search protocol in our constructions. The structure of ΠBP construction is similar to Mitra and ΠWBP . Therefore, ΠBP will continue to have the same advantage (resp. disadvantage) in multi-threaded environment over Mitra (resp. ΠWBP ), as in single-threaded implementation. Figure 14 describes stpme based on the number of documents returned in the search results for multi-threaded (four-threaded) instances of schemes ΠFP and ΠBP . For searches matching less number of documents, the cost is high because of one time computations such as storage access in computation of token at C 's end, creation of threads, access to map D, etc., which is amortized for searches matching large number of documents. Further, on looking closely at Fig. 13 and 14, one can observe that the time taken by multi-threaded instance is more than in single threaded instance for RS(0) and RS(1). This is because of unnecessary overhead of creation of multiple threads. In multi-threaded environment, the performance of ΠBP is very close to the performance of ΠFP . The stpme in ΠBP was 3.6 (resp. 4.89) μs for queries whose result set size was in the interval (217,218] . Figure 15 illustrates that the stpme for both the schemes is affected by the number of threads used to perform the search operation. The schemes performed the best when the number of threads were around the number of cores in the processor, i.e., 4.

EDB dynamic environment. In this experiment, we study the performance of search queries in dynamic environment, i.e., where search queries are interspersed with update queries. For this purpose, we identified a set of keywords, denoted by S80k , in which each keyword matches more than 80k many documents in the Enron Dataset. An initial EDB was constructed using extracted document-keyword pairs apart from those corresponding to keywords in S80k . The document-keyword pairs corresponding to keywords in S80k are then utilized to perform update queries dynamically. Figure 16 describes the stpme with regard to the probability (p) of search queries on keywords in S80k . This implies that the update queries occur with probability 1p , of which, with probability 0.1 it is a del query. The performance evaluation includes the time required to process del operation in search queries. Figure 16 illustrates that the average stpme decreases in scenarios where search queries are frequent, as all these schemes exploit the locality introduced by D. The average stpme in ΠBP (resp. ΠWBP ) turns out to be 1.53 μs (resp 0.58 μs), when the probability of search query is 0.0005. Schemes ΠWBP and ΠFP perform neck-to-neck on this metric. Constructions ΠFP , ΠWBP and ΠBP leverage the locality improvement obtained through the introduction of map D to achieve this significant boost in performance.

The prototype implementations of ΠFP , ΠBP and ΠWBP indicate that the cost of achieving backward privacy over and above forward privacy is substantially small. Moreover, it also indicates that ΠWBP and ΠBP have an appreciable edge in terms of computation cost over the existing backward private constructions. This makes ΠBP and ΠWBP suitable candidates for practical applications.

Graph: Fig. 16. Average per entry search time (dynamic environment).

Communication cost. As all our constructions make use of symmetric primitives only, the communication cost becomes the main performance bottleneck. We now compare the communication cost of Mitra , ΠBP and ΠWBP as a function of the nature of update queries. Figure 17 depicts the communication cost of the trio based on the number of documents returned in the search results and the probability (d) with which an update query is a del query. As can be observed, for ΠBP and ΠWBP the communication cost depends only upon the result set size and is independent of the nature of update queries, i.e., metric d. Therefore, the communication complexity is the same for all types of update query distribution for ΠBP and ΠWBP . However, that is not the case for Mitra , as can be observed from Fig. 17, where the communication complexity increases with increase in the percentage of delete queries. Concretely, ΠWBP has the least communication overhead among these three backward private constructions in practice. For instance, for a query result size of 262144 documents, the communication cost in ΠWBP , ΠBP and Mitra respectively is 8.4 GB, 18.9 GB and 29.4 GB which rises to 92.3 GB when the probability of delete query is 0.4. Hence, the communication cost of our constructions fair reasonably well against the most efficient construction until this work.

Graph: Fig. 17. Communication cost: number of matching documents vs. probability of delete queries.

6. Conclusion

The main contribution of this paper is to propose two efficient backward private DSSE schemes, viz., ΠBP and ΠWBP . We start with revisiting the existing definitions of backward privacy and propose an alternative formulation of leakage for backward privacy, BPLP . The proposed constructions achieve practical efficiency by using light weight symmetric cryptographic components only. In particular, our construction ΠBP is the first backward private scheme in the general setting that achieves optimal communication complexity using symmetric cryptographic primitives only. The main takeaway from this work is that efficient backward private schemes can be realized without involving complex cryptographic primitives. The simplicity of their design make our backward private constructions even more appealing, particularly from the implementation perspective. On the definition front, an interesting question arising out of this study is to analyze the real-world consequences of the leakages incurred by constructions satisfying the notion of backward privacy, viz., BPIP , BPUP BPLP or even WBP (in restricted setting). On the construction front, an interesting problem to pursue is to design an efficient, single roundtrip, response revealing backward private scheme in the general setting ideally with optimal communication complexity.

Appendix Correctness and security of ΠFP

A.1. Correctness

Theorem A.1.

If Fd , Ft are secure PRF s and H1 is a collision-resistant hash function then ΠFP is correct.

Proof.

We use the games G0 and G1 . In the modification snippets, G1 includes the box code and G0 does not. The games are identical to ΠFP (see Fig. 6) except for the following changes:

  • (1) In the Setup algorithm, we initialize the sets LabSet1 , LabSet2 to null set and set boolean variable bad to false .
  • (2) We remove line 5 and 6 of Update protocol and after line 4 in the Update protocol, we add the following code:

Graph

  • (3) We replace line 5 in the Search protocol with the following code:

Graph

  • (4) We replace line 7 in the Search protocol with the following code:

Graph

The first game G0 will output 1, only if bad is set, as repeated labels in maps T and D are the only source of incorrectness. G0 produces an identical distribution to real game when bad is not set. If the value assigned to label is repeated, G0 replaces it with new value which hasn't been assigned to any label up till now.

Let us denote the event "the flag bad is set to true " in G0 by E0 . This gives,

(13)Pr[DSSECorAΠFP(λ)=1]Pr[E0].

In G1 , every call to PRF s Ft and Fd are answered using tables Keyt and Keyd respectively. The entries in table Keyt are referred by (w,ver) and entries in table Keyd are referred by w.

Let us denote the event "the flag bad is set to true " in G1 by E1 .

If there exists an adversary A that is able to distinguish between games G0 and G1 , we can construct an adversary B1 that can distinguish Ft from a truly random function and/or an adversary B2 that can distinguish Fd from a truly random function. Formally, there exist adversaries B1 and B2 , such that

|Pr[E1]Pr[E0]|AdvFt,B1PRF(λ)+AdvFd,B2PRF(λ)(14)neg(λ).

The event E1 occurs only when the newly picked label value was already picked earlier in G1 . E1 occurs in line 7 of modification (3) above, if the same label is generated for more than one keyword. Since, the labels are picked uniformly at random in line 5 and the number of keywords, m, is a polynomial in security parameter, by the birthday bound we can conclude that the probability that the two labels for map D are equal is at most m22λ , i.e., neg(λ) .

Further, the key kw is picked uniformly at random (see line 5 of modification (2)) and the number of updates on T, say q, is a polynomial in security parameter. By the birthday bound we can conclude that the probability that the two keys are equal is at most q22λ , i.e., neg(λ) . Therefore, only with negligible probability the input to H1 is repeated. So, if E1 occurs in line 8 of modification (2), one can find collision in the hash function H1 . Since, H1 is collision resistant this happens with negligible probability. Therefore,

(15)Pr[E1]neg(λ).

From (13), (14) and (15), we get

(16)Pr[DSSECorAΠFP(λ)=1]neg(λ).

A.2. Security

Theorem A.2.

If Fd , Ft are secure PRF s and H1 , H2 are hash functions modeled as random oracles outputting 2λ and μ bits respectively then ΠFP is FP-I secure (Definition 3.1 ).

Proof.

We structure our proof using a sequence of games G0 to G4 . G0 will compute a distribution identical to RealAΠFP(λ) and G4 will compute a distribution that can be simulated perfectly given L , i.e., its distribution is identical to IdealA,SimΠFP(λ) and the intermediate games are hybrids.

Game G0 : G0 is exactly identical to RealAΠFP(λ) .

(17)Pr[RealAΠFP(λ)=1]=Pr[G0=1].

Game G1 : In G1 , every call to PRF s Ft and Fd are answered using tables Keyt and Keyd respectively. The entries in table Keyt are referred by (w,ver) and entries in table Keyd are referred by w. If there exists an adversary A that is able to distinguish between games G0 and G1 , we can construct an adversary B1 that can distinguish Ft from a truly random function and/or an adversary B2 that can distinguish Fd from a truly random function. Formally, there exist adversaries B1 and B2 , such that

|Pr[G0=1]Pr[G1=1]|AdvFt,B1PRF(λ)+AdvFd,B2PRF(λ)(18)neg(λ).

Graph: Fig. 18. Games G2 and G2′ (Theorem A.2). G2′ includes the box code and G2 does not.

Game G2 : In G2 , instead of calling H1 to generate label in the Update protocol, we pick random strings. Then, during the Search protocol, the random oracle H1 is programmed accordingly to ensure consistency.

Table Hash1 and H1 are used to simulate the random oracle H1 , the entries in the table Hash1 are referred by (w,ver,c) and in the table H1 by ( k,c ).

Figure 18 formally describes G2 , and an intermediate game G2 . In G2 , H1 is never programmed to two different values for the same input, thus, ensuring consistency. Instead of storing the randomly picked value in table Hash1 at position (w,verw,cw) , it first checks whether H1 is already programmed at value (kw,cw) which can happen if there was a query to the random oracle H1 with input (kwcw) . If the check is true, the value H1(kwcw) is stored in Hash1[w,verw,cw] else the randomly picked value is stored in Hash1[w,verw,cw] . The random oracle when needed in the Search protocol in line 9 or by an adversary's query to the random oracle H1 in line 5 is lazily programmed in G2 , so that the outputs are consistent throughout.

The only difference between game G1 and G2 is how we model the random oracle H1 . The outputs of H1 is perfectly indistinguishable in both these games, therefore,

(19)Pr[G2=1]=Pr[G1=1].

Let us denote the event "the flag bad is set to true " in G2 by E1 . The games G2 and G2 are also perfectly identical unless the event E1 occurs, and we can apply identical-until-bad technique to bound the distinguishing advantage between G2 and G2 .

(20)|Pr[G2=1]Pr[G2=1]|Pr[E1].

The event E1 occurs in line 7 of Update protocol and in line 5 of H1 algorithm of Fig. 18. The former captures the fact that adversary has already queried random oracle H1 at input (kwcw) and the latter captures the fact that the adversary queries random oracle H1 on a valid input (kc) . Since, the value kw is picked uniformly at random and the adversary cannot do anything better than guessing the value of k , the probability with which event E1 occurs is negligible. Using (19) and (20), we can conclude:

(21)|Pr[G1=1]Pr[G2=1]|neg(λ).

Game G3 : In G3 , what we did for H1 in game G2 , we do for H2 . Using the same arguments, we can conclude:

(22)|Pr[G2=1]Pr[G3=1]|neg(λ).

Graph: Fig. 19. Game G4 (Theorem A.2).

Game G4 : In G4 (see Fig. 19), we abstract out the information that needs to be simulated by the simulator in order to output transcripts identical to G3 . Using GetData algorithm in G4 , one keeps track of the randomly generated label and pad differently than in G3 . In Search protocol, the random oracles are programmed identically to that in G3 . Queries to random oracles H1 and H2 can be simulated by outputting random values.

The transcripts outputted by Update protocol is identical to that of Update protocol in G3 , as we output fresh random strings in the Update protocol.

Next, we describe the Search protocol in G4 . Based on tables Update and STs , the value of the following components: empty , verw , cw and {indw,c,H1,w,c,H2,w,c}0ccw is determined using GetData algorithm. Flag empty is set to 1 if Update is empty. verw is the count of searches for which there was an update on map T corresponding to keyword w after the last search. The loop in line 6 of GetData determines the number of times version number is updated, i.e., value of verw . Here, cw denotes the count of updates on map T corresponding to keyword w after the last search. {indw,c}0ccw are the document identifier values along with operation bit that have been added to T after the last search operation and the values {H1,w,c,H2,w,c}0ccw are used to simulate the random oracles consistently with the response given at the time of update queries. The loop in line 12 of GetData computes the values of cw and {indw,c,H1,w,c,H2,w,c}0ccw . As the value of components are computed correctly and consistently w.r.t. previous queries, the transcripts of Search protocol is identical to that of Search protocol in G3 . Therefore, we conclude that:

(23)Pr[G4=1]=Pr[G3=1].

Simulator Sim : Finally, we construct a simulator that given the leakage profile L simulates game G4 correctly. It is easy to see that, Sim can simulate Update protocol correctly. Instead of using keyword w, Sim uses the counter w=minsp(w) uniquely mapped from w using LSearch in simulating the Search protocol (line 6 and line 8 of Fig. 19). In line 2 of Search protocol of Fig. 19, Sim uses sp(w) and Hist(w) instead of STs and Update as input to the GetData algorithm. Thus, Sim is able to produce transcripts of output of Search and Update protocols identical to G4 . Hence, we conclude that:

(24)Pr[IdealA,SimΠFP(λ)=1]=Pr[G4].

By connecting all the games, we conclude

|Pr[RealAΠFP(λ)=1]Pr[IdealA,SimΠFP(λ)=1]|neg(λ).

Acknowledgments

We thank the anonymous reviewers for their elaborate and insightful comments which helped us in improving the overall presentation of our work.

References 1 M.A. Abdelraheem, T. Andersson and C. Gehrmann, Inference and record-injection attacks on searchable encrypted relational databases, IACR Cryptology ePrint Archive. 2017 (2017), 24. 2 M. Bellare and P. Rogaway, The security of triple encryption and a framework for code-based game-playing proofs, in. EUROCRYPT, LNCS, Vol. 4004, Springer, 2006, pp. 409 – 426. 3 D. Boneh and B. Waters, Constrained pseudorandom functions and their applications, in. ASIACRYPT, LNCS, Vol. 8270, Springer, 2013, pp. 280 – 300. 4 R. Bost, Σo φ o ς. Forward secure searchable encryption, in. ACM CCS, ACM Press, 2016, pp. 1143 – 1154. 5 R. Bost, Searchable encryption. New constructions of encrypted databases, PhD thesis, 2018. 6 R. Bost, B. Minaud and O. Ohrimenko, Forward and backward private searchable encryption from constrained cryptographic primitives, in. ACM CCS, ACM Press, 2017, pp. 1465 – 1482. 7 E. Boyle, S. Goldwasser and I. Ivan, Functional signatures and pseudorandom functions, in. PKC, LNCS, Vol. 8383, Springer, 2014, pp. 501 – 519. 8 D. Cash, P. Grubbs, J. Perry and T. Ristenpart, Leakage-abuse attacks against searchable encryption, in. ACM CCS, ACM Press, 2015, pp. 668 – 679. 9 D. Cash, J. Jaeger, S. Jarecki, C.S. Jutla, H. Krawczyk, M. Rosu and M. Steiner, Dynamic searchable encryption in very-large databases. Data structures and implementation, in. NDSS, The Internet Society, 2014. D. Cash, S. Jarecki, C. Jutla, H. Krawczyk, M.-C. Roşu and M. Steiner, Highly-scalable searchable symmetric encryption with support for Boolean queries, in. CRYPTO, LNCS, Vol. 8042, Springer, 2013, pp. 353 – 373. G. Chamani, SSE, https://github.com/jgharehchamani/SSE (Accessed. 2019-10-08). Y. Chang and M. Mitzenmacher, Privacy preserving keyword searches on remote encrypted data, in. ACNS, Lecture Notes in Computer Science, Vol. 3531, 2005, pp. 442 – 455. M. Chase and S. Kamara, Structured encryption and controlled disclosure, in. ASIACRYPT, LNCS, Vol. 6477, Springer, 2010, pp. 577 – 594. R. Curtmola, J.A. Garay, S. Kamara and R. Ostrovsky, Searchable symmetric encryption. Improved definitions and efficient constructions, in. ACM CCS, ACM Press, 2006, pp. 79 – 88. Enron Email dataset, https://www.cs.cmu.edu/~enron/ (Accessed. 2018-05-14). M. Etemad, A. Küpçü, C. Papamanthou and D. Evans, Efficient dynamic searchable encryption with forward privacy, PoPETs. 2018 (1) (2018), 5 – 20. S. Faber, S. Jarecki, H. Krawczyk, Q. Nguyen, M. Rosu and M. Steiner, Rich queries on encrypted data. Beyond exact matches, in. ESORICS, LNCS, Vol. 9327, Springer, 2015, pp. 123 – 145. Facebook, RocksDB. A persistent key-value store for fast storage environment, https://rocksdb.org/ (Accessed. 2018-05-14). S. Garg, P. Mohassel and C. Papamanthou, TWORAM. efficient oblivious RAM in two rounds with applications to searchable encryption, in. CRYPTO, LNCS, Vol. 9816, Springer, 2016, pp. 563 – 592. C. Gentry, A fully homomorphic encryption scheme, PhD thesis, Stanford, CA, USA, 2009. ISBN 978-1-109-44450-6. J. Ghareh Chamani, D. Papadopoulos, C. Papamanthou and R. Jalili, New constructions for forward and backward private symmetric searchable encryption, in. Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, CCS '18, ACM, 2018, pp. 1038 – 1055. O. Goldreich, S. Goldwasser and S. Micali, How to construct random functions (extended abstract), in. FOCS, IEEE Computer Society Press, 1984, pp. 464 – 479. O. Goldreich and R. Ostrovsky, Software Protection and Simulation on Oblivious RAMs, J. ACM (1996), 431–473. doi. 10.1145/233551.233553. S. Goldwasser, S. Micali and R.L. Rivest, A "paradoxical" solution to the signature problem (extended abstract), in. FOCS, IEEE Computer Society, 1984, pp. 441 – 448. S. Goldwasser, S. Micali and R.L. Rivest, A digital signature scheme secure against adaptive chosen-message attacks, SIAM J. Comput.. 17 (2) (1988), 281 – 308. doi. 10.1137/0217017. M.D. Green and I. Miers, Forward secure asynchronous messaging from puncturable encryption, in. IEEE Symposium on Security and Privacy, IEEE Computer Society Press, 2015, pp. 305 – 320. S. Kamara and T. Moataz, Boolean searchable symmetric encryption with worst-case sub-linear complexity, in. EUROCRYPT, LNCS, Vol. 10212, Springer, 2017, pp. 94 – 124. S. Kamara, C. Papamanthou and T. Roeder, Dynamic searchable symmetric encryption, in. ACM CCS, ACM Press, 2012, pp. 965 – 976. A. Kiayias, S. Papadopoulos, N. Triandopoulos and T. Zacharias, Delegatable pseudorandom functions and applications, in. ACM CCS, ACM Press, 2013, pp. 669 – 684. K.S. Kim, M. Kim, D. Lee, J.H. Park and W. Kim, Forward secure dynamic searchable symmetric encryption with efficient updates, in. ACM CCS, ACM Press, 2017, pp. 1449 – 1463. N. Koblitz and A. Menezes, Another look at security definitions, Adv. in Math. of Comm.. 7 (1) (2013), 1 – 38. doi. 10.3934/amc.2013.7.1. A. Menezes and N.P. Smart, Security of signature schemes in a multi-user setting, Des. Codes Cryptography. 33 (3) (2004), 261 – 274. doi. 10.1023/B:DESI.0000036250.18062.3f. M. Naveed, The fallacy of composition of oblivious RAM and searchable encryption, IACR Cryptology ePrint Archive. 2015 (2015), 668. M. Naveed, M. Prabhakaran and C.A. Gunter, Dynamic searchable encryption via blind storage, in. IEEE Symposium on Security and Privacy, IEEE Computer Society Press, 2014, pp. 639 – 654. T. Pornin and J.P. Stern, Digital signatures do not guarantee exclusive ownership, in. ACNS, LNCS, Vol. 3531, 2005, pp. 138 – 150. NLTK Project, Natural Language Toolkit, https://www.nltk.org/ (Accessed. 2018-05-14). D.X. Song, D. Wagner and A. Perrig, Practical techniques for searches on encrypted data, in. IEEE Symposium on Security and Privacy, IEEE Computer Society Press, 2000, pp. 44 – 55. X. Song, C. Dong, D. Yuan, Q. Xu and M. Zhao, Forward private searchable symmetric encryption with optimized I/O efficiency, in. IEEE Transactions on Dependable and Secure Computing, 2018. E. Stefanov, C. Papamanthou and E. Shi, Practical dynamic searchable encryption with small leakage, in. NDSS, The Internet Society, 2014. E. Stefanov, M. van Dijk, E. Shi, C.W. Fletcher, L. Ren, X. Yu and S. Devadas, Path ORAM. An extremely simple oblivious RAM protocol, in. ACM CCS, ACM Press, 2013, pp. 299 – 310. S.-F. Sun, X. Yuan, J.K. Liu, R. Steinfeld, A. Sakzad, V. Vo and S. Nepal, Practical backward-secure searchable encryption from symmetric puncturable encryption, in. Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, CCS '18, ACM, 2018, pp. 763 – 780. The OpenSSL Project, OpenSSL Cryptography and SSL/TLS Toolkit, https://www.openssl.org/ (Accessed. 2018-05-14). S. Vaudenay, Digital signature schemes with domain parameters. Yet another parameter issue in ECDSA, in. ACISP, LNCS, Vol. 3108, Springer, 2004, pp. 188 – 199. Y. Zhang, J. Katz and C. Papamanthou, All your queries are belong to us. The power of file-injection attacks on searchable encryption, in. USENIX Security Symposium, USENIX Association, 2016, pp. 707 – 720.

By Sanjit Chatterjee; Shravan Kumar Parshuram Puria and Akash Shah

Reported by Author; Author; Author

Titel:
Efficient backward private searchable encryption
Autor/in / Beteiligte Person: Shravan Kumar Parshuram Puria ; Shah, Akash ; Chatterjee, Sanjit
Link:
Zeitschrift: Journal of Computer Security, Jg. 28 (2020-03-17), S. 229-267
Veröffentlichung: IOS Press, 2020
Medientyp: unknown
ISSN: 1875-8924 (print) ; 0926-227X (print)
DOI: 10.3233/jcs-191322
Schlagwort:
  • Scheme (programming language)
  • Cryptographic primitive
  • Theoretical computer science
  • Computer Networks and Communications
  • business.industry
  • Computer science
  • 05 social sciences
  • 020206 networking & telecommunications
  • Cryptography
  • 02 engineering and technology
  • Encryption
  • Symmetric-key algorithm
  • Hardware and Architecture
  • Information leakage
  • 0202 electrical engineering, electronic engineering, information engineering
  • Overhead (computing)
  • 0501 psychology and cognitive sciences
  • Safety, Risk, Reliability and Quality
  • business
  • Communication complexity
  • computer
  • Software
  • 050104 developmental & child psychology
  • computer.programming_language
Sonstiges:
  • Nachgewiesen in: OpenAIRE

Klicken Sie ein Format an und speichern Sie dann die Daten oder geben Sie eine Empfänger-Adresse ein und lassen Sie sich per Email zusenden.

oder
oder

Wählen Sie das für Sie passende Zitationsformat und kopieren Sie es dann in die Zwischenablage, lassen es sich per Mail zusenden oder speichern es als PDF-Datei.

oder
oder

Bitte prüfen Sie, ob die Zitation formal korrekt ist, bevor Sie sie in einer Arbeit verwenden. Benutzen Sie gegebenenfalls den "Exportieren"-Dialog, wenn Sie ein Literaturverwaltungsprogramm verwenden und die Zitat-Angaben selbst formatieren wollen.

xs 0 - 576
sm 576 - 768
md 768 - 992
lg 992 - 1200
xl 1200 - 1366
xxl 1366 -