Efficient backward private searchable encryption

Shravan Kumar Parshuram Puria; Shah, Akash; et al.

In: Journal of Computer Security, Vol. 28 (2020-03-17), pp. 229-267

© 2020 IOS Press and the authors. All rights reserved.

Dynamic Searchable Symmetric Encryption (DSSE), apart from supporting search operations, allows a client to efficiently perform update operations on an outsourced database. Two security properties, viz., forward privacy and backward privacy, are desirable in a DSSE scheme. The former captures that newly updated entries cannot be related to previous search queries, and the latter ensures that search queries should not leak matching entries after they have been deleted. These security properties are formalized in terms of the information leakage that the respective constructions may incur. Existing backward private constructions either have non-optimal communication overhead or rely on heavy cryptographic primitives. Our main contribution consists of two efficient backward private schemes, $Π_{BP}$ and $Π_{WBP}$, that aim to achieve practical efficiency by using only lightweight symmetric cryptographic components. In the process, we also revisit the existing definitions of information leakage for backward privacy [Bost et al. (In ACM CCS (2017) 1465–1482 ACM Press)] and propose a relaxed formulation. $Π_{BP}$ is the first construction to achieve backward privacy in the general setting with optimal communication complexity. Our second construction, $Π_{WBP}$, is the first single-round-trip scheme achieving backward privacy in a restricted setting with optimal communication complexity using only lightweight symmetric cryptographic primitives. Prototype implementations of our schemes demonstrate the practicality of the proposed constructions and indicate that the additional cost of achieving backward privacy over forward privacy is small. The performance results also show that the proposed constructions outperform the currently most efficient scheme achieving backward privacy.

Keywords: Dynamic Searchable Symmetric Encryption; backward privacy; forward privacy

1. Introduction

Due to a variety of crucial benefits, enterprises outsource their data to cloud-resident storage. If the outsourced data is stored as plaintext on remote servers, it may be intercepted by adversaries. Hence, data is stored in encrypted form on remote servers. However, if the client has to decrypt the whole data set in order to get the results of a search query, this defeats the purpose of outsourcing. Generic tools such as fully homomorphic encryption [[20]] or oblivious RAM [[23], [40]] can be used to construct protocols that leak almost no information to the server. However, these tools are currently too costly for large databases and hence impractical.

A practical solution to this problem is Searchable Symmetric Encryption ($SSE$) [[10], [13], [37]], which trades some security, in the form of controlled leakage, for efficiency. Dynamic Searchable Symmetric Encryption ($DSSE$) [[9], [28], [34]] adds a vital feature to static $SSE$ schemes: the ability for the client to efficiently perform update operations remotely on the outsourced database with the guarantee that minimal information is leaked to the server in the process. Constructions of $(D)SSE$ that aim to strike an acceptable balance between security and performance explicitly describe a leakage profile and formally prove that the information leaked by the scheme is bounded by that profile.

In parallel with work on constructing $SSE$ schemes with improved efficiency, security and query expressiveness [[10], [17], [27]], another line of work shows the real-world consequences of these leakages [[1], [8], [44]]. Zhang et al. [[44]] showed through file-injection attacks that it is possible to reveal the contents of past search queries in $DSSE$ schemes by injecting only a few documents, and Abdelraheem et al. [[1]] showed that the consequences of this attack are even more devastating in the case of relational databases.

Because of file-injection attacks, forward privacy has garnered significant interest in the research community. The notion of forward privacy was introduced in [[39]] and first formalized in [[4]]. In hindsight, the first $DSSE$ scheme satisfying the notion of forward privacy was proposed in 2005 [[12]]. Along with forward privacy, Stefanov et al. [[39]] asserted that a $DSSE$ scheme should also satisfy backward privacy. Informally, backward privacy states that search queries should not leak matching entries after they have been deleted. The notion of backward privacy was first formalized by Bost et al. [[6]].

1.1. Related work

Kamara et al. [[28]] proposed the first sublinear (in the size of the database) $DSSE$ scheme. The forward private scheme $Σoφoς$ [[4]] achieves optimal communication complexity (linear in the size of the result set), but it makes use of asymmetric cryptographic primitives and does not support parallel processing. A $GGM$-$PRF$ [[22]] based forward private $DSSE$ scheme, $Diana$, was proposed in [[6]]. $Diana$ uses only symmetric cryptographic primitives and supports parallel processing, but its computational and communication complexity are not optimal. Asymptotically optimal forward private $DSSE$ schemes that use only symmetric cryptographic primitives and support parallelism were proposed in [[16], [30], [38]]. Further, the $FASTIO$ scheme [[38]] ensures reasonable locality [[9]], a measure of I/O efficiency.

The notion of backward privacy was first formally described in [[6]] through three different definitions, in ascending order of information leakage, called $BPIP$, $BPUP$ and $WBP$ respectively. Bost et al. [[6]] proposed a generic way to achieve backward privacy from any forward private $DSSE$ scheme. However, the communication complexity of the derived backward private scheme is not optimal. In [[6]], the authors also proposed a backward private scheme, ${Diana}_{del}$, based on constrained pseudorandom functions ($CPRF$) [[3], [7], [29]], and a backward private framework, $Janus$, based on a puncturable encryption scheme with a particular incremental update property [[26]]. The communication and computational complexity of the search and update protocols of ${Diana}_{del}$ are not optimal. The search protocol in $Janus$ is single-round-trip and has optimal communication complexity. However, its computational complexity is $O(n_{w} \cdot d_{w})$, where $n_{w}$ and $d_{w}$ respectively denote the number of documents matching keyword w and the number of delete operations performed on keyword w. As acknowledged in [[6]], with just a few hundred deletions per keyword, $Janus$ is not practical because of both computational and storage overhead. Further, [[6]] imposes the following restriction on ${Diana}_{del}$ and $Janus$: reinsertion of previously deleted document-keyword pairs is not allowed. We refer to this constraint as the reinsertion restriction in the rest of the paper.

Very recently, two works on backward private $DSSE$ appeared [[21], [41]]. Chamani et al. [[21]] proposed a practically efficient $BPUP$-secure scheme called $Mitra$. Handling delete queries more efficiently in this scheme resulted in ${Mitra}^{*}$, the most efficient backward private scheme to date. They also proposed two other backward private schemes, $Orion$ and $Horus$, that achieve quasi-optimal (linear in $n_{w}$ up to a logarithmic factor) search computation complexity, but they make use of Path ORAM [[40]] and as a result are impractical for large databases [[33]]. Sun et al. [[41]] proposed a symmetric puncturable encryption ($SPE$) scheme and instantiated the $Janus$ framework with it. We provide a comprehensive comparative analysis of the performance and security of these schemes vis-à-vis our proposed schemes in Sections 4.4 and 5.

1.2. Our contributions

We start by revisiting the notion of information leakage in the context of backward privacy. Our investigation suggests that the Weak Backward Privacy ($WBP$) notion proposed by Bost et al. [[6]] should only be used to argue backward privacy in the reinsertion-restricted setting. The other two existing notions of information leakage, $BPIP$ and $BPUP$, are strong in the sense that it seems difficult to obtain a $DSSE$ scheme satisfying them with optimal communication complexity. Therefore, as with $WBP$, one may need to relax these stronger notions somewhat, but, unlike $WBP$, one must ensure that the notion of backward privacy is not violated in the general setting. To this end, we formalize a relaxed notion of information leakage, $BPLP$.

Our main contribution consists of two backward private schemes, $Π_{BP}$ and $Π_{WBP}$, that are $BPLP$- and $WBP$-secure respectively. We start with a simple forward private scheme, $Π_{FP}$, that serves as a building block for our backward private schemes. Currently, ${Mitra}^{*}$ [[21]] is the most suitable backward private candidate for adoption in practice, as it uses only lightweight symmetric components and has a low leakage level. However, one limitation of ${Mitra}^{*}$ is that its communication complexity is not optimal. We address this issue in our main construction, $Π_{BP}$. $Π_{BP}$ uses tags generated with a pseudorandom permutation ($PRP$), which ensures that only the tags corresponding to the documents currently matching the keyword w are returned to the client. Thus, $Π_{BP}$ avoids unnecessary communication overhead while ensuring that no information violating the notion of backward privacy is leaked to the server. To the best of our knowledge, $Π_{BP}$ is the first practical backward private scheme that has optimal update and search communication complexity and uses only symmetric cryptographic primitives. Further, $Π_{BP}$ is easily parallelizable, provides reasonable locality and allows reinsertion of document-keyword pairs. With a simple modification of $Π_{BP}$, we construct a single-round-trip weak backward private scheme, $Π_{WBP}$, that reduces the concrete communication overhead of the $Search$ protocol by more than 50%. Both proposed constructions are also forward private. A comparison of our schemes with some prior works [[6], [21], [41]] is provided in Table 1.

Table 1 Comparison of backward private schemes $Π_{BP}$ and $Π_{WBP}$ with some prior works

Scheme	Search computation	Update computation	Search communication	Update communication	# Rounds	Backward privacy
$Fides$ [6]	$O(o_{w}^{'})$	$O(1)$	$O(o_{w}^{'})$	$O(1)$	2	$BPUP$
${Diana}_{del}$ [6]	$O(a_{w})$	$O(\log(a_{w}))$	$O(n_{w} + d_{w} \log(a_{w}))$	$O(1)$	2	$WBP$
$Janus$ [6]	$O(n_{w} \cdot d_{w})$	$O(1)$	$O(n_{w})$	$O(1)$	1	$WBP$
$Mitra$ [21]	$O(o_{w}^{'})$	$O(1)$	$O(o_{w}^{'})$	$O(1)$	2	$BPUP$
$Orion$ [21]	$O(n_{w} \log^{2}(N))$	$O(\log^{2}(N))$	$O(n_{w} \log^{2}(N))$	$O(\log^{2}(N))$	$O(\log(N))$	$BPIP$
$Horus$ [21]	$O(n_{w} \log(d_{w}) \log(N))$	$O(\log^{2}(N))$	$O(n_{w} \log(d_{w}) \log(N))$	$O(\log^{2}(N))$	$O(\log(d_{w}))$	$WBP$
$Janus{+}{+}$ [41]	$O(n_{w} \cdot d)$	$O(d)$	$O(n_{w})$	$O(1)$	1	$WBP$
$Π_{BP}$ (this work)	$O(o_{w}^{'})$	$O(1)$	$O(n_{w})$	$O(1)$	2	$BPLP$
$Π_{WBP}$ (this work)	$O(o_{w}^{'})$	$O(1)$	$O(n_{w})$	$O(1)$	1	$WBP$

All the constructions are also forward private. The client storage for all the constructions is $O(m \log(n))$, except for $Orion$, where the corresponding complexity is $O(1)$. See Table 2 for the notations used. Generally, $o_{w}^{'} < n_{w} \cdot d_{w}$, as the former has an additive factor whereas the latter has a multiplicative factor.
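As a rough illustration of this comparison, the Python sketch below computes both quantities for a toy update history of a single keyword. The history and the function names `apply_updates` and `search_costs` are hypothetical; $d_{w}$ is counted over the whole history, following its definition in Table 2. The inequality is a typical-case statement: with no deletions ($d_{w} = 0$), the product $n_{w} \cdot d_{w}$ is zero.

```python
def apply_updates(updates):
    """Replay (op, ind) updates on one keyword and return the matching document set."""
    docs = set()
    for op, ind in updates:
        if op == "add":
            docs.add(ind)
        else:  # "del"
            docs.discard(ind)
    return docs


def search_costs(history, last_search):
    """Return (o'_w, n_w * d_w) for one keyword.

    history: chronological list of (op, ind) updates on w.
    last_search: number of updates issued before the previous search on w.
    """
    n_prev = len(apply_updates(history[:last_search]))   # n'_w
    recent = history[last_search:]
    a_prev = sum(1 for op, _ in recent if op == "add")     # a'_w
    d_prev = sum(1 for op, _ in recent if op == "del")     # d'_w
    n_w = len(apply_updates(history))                      # current result size
    d_w = sum(1 for op, _ in history if op == "del")       # all deletions on w
    return n_prev + a_prev + d_prev, n_w * d_w


# 100 insertions, then a search, then 20 deletions and 10 new insertions.
history = [("add", i) for i in range(100)]
history += [("del", i) for i in range(20)] + [("add", i) for i in range(100, 110)]
print(search_costs(history, last_search=100))  # (130, 1800)
```

Here the search after the deletions touches 130 entries in the additive model, against a product of 1800 for a cost that grows as $n_{w} \cdot d_{w}$.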

$Π_{FP}$ is the most efficient forward private scheme in the literature. Our implementation results show that the performance of $Π_{WBP}$ and $Π_{BP}$ is comparable to that of $Π_{FP}$. For example, for a search that returned 150,000 results, the search protocols of $Π_{WBP}$ and $Π_{BP}$ took around 0.60 and 0.73 seconds respectively, compared to 0.54 seconds for $Π_{FP}$. The performance results indicate that the proposed constructions are 2× faster than, and reduce the communication cost by a factor of 1.5–11 relative to, the most efficient backward private construction in the literature.
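Writing $t_{Π}$ for the reported search time of scheme Π, the timings above translate into a search-time overhead over $Π_{FP}$ of roughly 11% for $Π_{WBP}$ and 35% for $Π_{BP}$ on this particular query. This is back-of-the-envelope arithmetic on the quoted figures only:

```latex
\frac{t_{Π_{WBP}}}{t_{Π_{FP}}} = \frac{0.60}{0.54} \approx 1.11,
\qquad
\frac{t_{Π_{BP}}}{t_{Π_{FP}}} = \frac{0.73}{0.54} \approx 1.35
```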

2. Notations and definitions

The security parameter is denoted by λ. All procedures in our construction implicitly take λ as input. By efficient, we mean probabilistic polynomial-time in λ. All algorithms (including adversaries and simulators) are assumed to be efficient unless otherwise specified. A function $f: \mathbb{N} \to \mathbb{R}$ is said to be negligible iff for all $c > 0$, $\exists n_{0} \in \mathbb{N}$ such that $\forall n ⩾ n_{0}$, $f(n) < n^{-c}$. The function $neg(λ)$ denotes a negligible function in λ. For a finite set X, $x \overset{\$}{\leftarrow} X$ means that x is sampled uniformly from X, and $|X|$ denotes the cardinality of X. $x \leftarrow y$ denotes that variable x is assigned the value of variable y, and the operator ‖ denotes concatenation. For a data structure $DS$, $|DS|$ denotes the memory space occupied by the data structure in bits. $addr(D)$ denotes the memory address at which data structure instance $D$ is stored. ⊥ denotes the null value. For sets $X_{1}, \dots, X_{n}$ and Y, $Func(X_{1} \times \dots \times X_{n}, Y)$ denotes the set of all functions from $X_{1} \times \dots \times X_{n}$ to Y. For a set X, $Perm(X)$ denotes the set of all permutations on X. For sets $X_{1}$ and $X_{2}$, ${Perm}_{X_{1}}(X_{2})$ denotes the set of all functions from $X_{1} \times X_{2}$ to $X_{2}$ such that, for every fixed $x \in X_{1}$, the induced function on $X_{2}$ is a permutation.
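Two standard examples make the definition of a negligible function concrete: $2^{-n}$ is negligible, whereas $n^{-10}$ is not, because it fails the condition for $c = 11$.

```latex
\begin{aligned}
f(n) = 2^{-n}:&\quad \forall c > 0,\ \exists n_{0}\ \forall n \geq n_{0}:\ c \log_{2} n < n
  \;\Longrightarrow\; n^{c} < 2^{n} \;\Longrightarrow\; f(n) < n^{-c},\\
g(n) = n^{-10}:&\quad \text{for } c = 11,\ g(n) = n^{-10} \geq n^{-11} \text{ for all } n \geq 1,
  \text{ so } g \text{ is not negligible.}
\end{aligned}
```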

We use pseudorandom functions ($PRF$), pseudorandom permutations ($PRP$) and symmetric key encryption schemes with pseudorandom ciphertexts under chosen plaintext attack (RCPA-secure) in our constructions.

2.1. Pseudorandom function

A pseudorandom function ($PRF$) $F$ is computable in time polynomial in λ and is indistinguishable from a truly random function by any efficient adversary $A$.

Definition 2.1.

Let $F \in Func (K \times I, O)$ be an efficient, keyed function. For algorithm $A$ , we define the experiments ${Real}_{A}^{PRF} (λ)$ and ${Ideal}_{A}^{PRF} (λ)$ as shown in Fig. 1.

$F$ is a pseudorandom function if for all probabilistic polynomial-time adversaries $A$ , ${Adv}_{F, A}^{PRF} (λ) = | Pr [{Real}_{A}^{PRF} (λ) = 1] - Pr [{Ideal}_{A}^{PRF} (λ) = 1] | ⩽ neg (λ)$ .
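Fig. 1 is not reproduced here. As a rough illustration of the two experiments in Definition 2.1, the Python sketch below instantiates the keyed function $F$ with HMAC-SHA256 (an assumption; this section does not fix a concrete PRF) and the ideal oracle with a lazily sampled random function. The helper names are hypothetical, and estimating the advantage by sampling only illustrates the definition; it proves nothing about security.

```python
import hashlib
import hmac
import os

OUT_LEN = 32  # bytes; matches the HMAC-SHA256 output length


def real_oracle():
    """Real^PRF: F(k, .) under a fresh random key (HMAC-SHA256 assumed as F)."""
    k = os.urandom(32)
    return lambda x: hmac.new(k, x, hashlib.sha256).digest()


def ideal_oracle():
    """Ideal^PRF: a truly random function, sampled lazily on first query."""
    table = {}

    def f(x):
        if x not in table:
            table[x] = os.urandom(OUT_LEN)
        return table[x]

    return f


def estimate_advantage(adversary, trials=200):
    """Empirical |Pr[Real = 1] - Pr[Ideal = 1]| for an adversary returning 0 or 1."""
    real = sum(adversary(real_oracle()) for _ in range(trials)) / trials
    ideal = sum(adversary(ideal_oracle()) for _ in range(trials)) / trials
    return abs(real - ideal)


def low_bit_adversary(oracle):
    """Outputs the lowest bit of F(0); against a good PRF its advantage is tiny."""
    return oracle(b"\x00")[0] & 1


print(estimate_advantage(low_bit_adversary))
```

An adversary that only checks consistency (querying the same input twice) sees identical behaviour in both experiments, because the ideal oracle is a function, not a fresh random string per query.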

2.2. Pseudorandom permutation

A pseudorandom permutation ($PRP$) $F$ is computable and invertible in time polynomial in λ and is indistinguishable from a truly random permutation by any efficient adversary $A$.

Definition 2.2.

Let $F \in {Perm}_{K} (X)$ be an efficient, keyed function. For algorithm $A$ , we define the experiments ${Real}_{A}^{PRP} (λ)$ and ${Ideal}_{A}^{PRP} (λ)$ as shown in Fig. 2.

Fig. 1. PRF security definition.

Fig. 2. PRP security definition.

$F$ is a pseudorandom permutation if for all probabilistic polynomial-time adversaries $A$ , ${Adv}_{F, A}^{PRP} (λ) = | Pr [{Real}_{A}^{PRP} (λ) = 1] - Pr [{Ideal}_{A}^{PRP} (λ) = 1] | ⩽ neg (λ)$ .
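A standard way to build an efficiently invertible keyed permutation from a PRF is a Feistel network (Luby–Rackoff). The Python sketch below instantiates $F \in {Perm}_{K}(X)$ for $X = \{0, 1\}^{64}$ with four Feistel rounds whose round functions are derived from HMAC-SHA256. The round function, block size and names are assumptions for illustration; nothing here is claimed to be the PRP used in the paper.

```python
import hashlib
import hmac
import os

HALF_BITS = 32                  # the domain X is {0,1}^64, split into two 32-bit halves
MASK = (1 << HALF_BITS) - 1
ROUNDS = 4                      # Luby-Rackoff: 4 rounds of PRF-based Feistel give a strong PRP


def round_function(key: bytes, i: int, x: int) -> int:
    """i-th Feistel round function, derived from HMAC-SHA256 (an assumed PRF)."""
    msg = bytes([i]) + x.to_bytes(HALF_BITS // 8, "big")
    digest = hmac.new(key, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[: HALF_BITS // 8], "big")


def prp(key: bytes, x: int) -> int:
    """F(key, x): one keyed permutation on 64-bit integers."""
    left, right = x >> HALF_BITS, x & MASK
    for i in range(ROUNDS):
        left, right = right, left ^ round_function(key, i, right)
    return (left << HALF_BITS) | right


def prp_inverse(key: bytes, y: int) -> int:
    """F^{-1}(key, y): undo the rounds in reverse order."""
    left, right = y >> HALF_BITS, y & MASK
    for i in reversed(range(ROUNDS)):
        left, right = right ^ round_function(key, i, left), left
    return (left << HALF_BITS) | right


key = os.urandom(32)
x = 0x0123456789ABCDEF
y = prp(key, x)
assert prp_inverse(key, y) == x
```

Invertibility comes from the Feistel structure itself: each round can be undone by recomputing the same round function, so the round functions never need to be inverted.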

2.3. Symmetric key encryption scheme

A symmetric key encryption scheme $E$ consists of three algorithms: $Gen$, $Enc$ and $Dec$.

$Gen ()$ : It outputs a key $k$ .

$Enc (k, m)$ : It takes as input the key $k$ , and a message $m \in M$ , where $M$ is the message space, and outputs a ciphertext $e$ .

$Dec (k, e)$ : It takes as input the key $k$ and a ciphertext $e$ and outputs a message $m \in M$ or ⊥.

Correctness: For all $k \leftarrow Gen ()$ and for all messages $m \in M$ , it is required that $Dec (k, Enc (k, m)) = m$ .
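One common way to meet this syntax using only symmetric primitives is randomized counter mode over a PRF. The Python sketch below assumes HMAC-SHA256 as the PRF and byte strings as the message space $M$; the function names are hypothetical, the scheme provides no integrity protection, and it is an illustration rather than the instantiation used in the paper. `None` plays the role of ⊥.

```python
import hashlib
import hmac
import os

NONCE_LEN = 16  # bytes of fresh randomness r per ciphertext


def gen() -> bytes:
    """Gen(): sample a uniformly random 256-bit key."""
    return os.urandom(32)


def _keystream(k: bytes, r: bytes, length: int) -> bytes:
    """Concatenate PRF(k, r || counter) blocks until `length` bytes are available."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hmac.new(k, r + counter.to_bytes(8, "big"), hashlib.sha256).digest()
        counter += 1
    return out[:length]


def enc(k: bytes, m: bytes) -> bytes:
    """Enc(k, m): randomized counter mode; the ciphertext is r || (m XOR keystream)."""
    r = os.urandom(NONCE_LEN)
    return r + bytes(a ^ b for a, b in zip(m, _keystream(k, r, len(m))))


def dec(k: bytes, e: bytes):
    """Dec(k, e): recompute the keystream from r; returns None (⊥) on malformed input."""
    if len(e) < NONCE_LEN:
        return None
    r, body = e[:NONCE_LEN], e[NONCE_LEN:]
    return bytes(a ^ b for a, b in zip(body, _keystream(k, r, len(body))))


k = gen()
assert dec(k, enc(k, b"document identifier 42")) == b"document identifier 42"
```

Correctness holds because `dec` regenerates exactly the keystream that `enc` used, from the same key and the random prefix r carried in the ciphertext.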

RCPA security notion of symmetric key encryption schemes. The pseudorandom ciphertexts under chosen plaintext attack (RCPA) security notion [[9]] for a symmetric key encryption scheme $E = (Gen, Enc, Dec)$ is captured in Fig. 3. The $Initialize()$ algorithm generates the key $k$ using the $Gen$ algorithm of $E$ and picks a random challenge bit b. The adversary can then adaptively query the $Encrypt()$ oracle. The game returns true if the adversary's output $b^{'}$ equals the challenge bit b. In Fig. 3, $C$ denotes the ciphertext space.

Definition 2.3.

A symmetric key encryption scheme $E = (Gen, Enc, Dec)$ is said to have pseudorandom ciphertexts under chosen plaintext attack (RCPA) if no efficient adversary $A$ can win the game shown in Fig. 3 with probability greater than $\frac{1}{2} + neg(λ)$. The probability that $A$ wins is denoted by $Pr[{RCPA}_{E} = 1]$.

The advantage of $A$ in ${RCPA}_{E}$ game is defined as follows: ${Adv}_{E, A}^{RCPA} (λ) = 2 \cdot Pr [{RCPA}_{E} = 1] - 1$ .
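Fig. 3 is not reproduced here. The Python sketch below mirrors the game as described in the text, with one assumption made explicit: the oracle returns real ciphertexts when b = 1 and uniformly random strings of the same length when b = 0. It contrasts a randomized single-block toy scheme with a deliberately deterministic one. An adversary that encrypts the same message twice wins essentially always against the deterministic scheme (advantage close to 1) and gains nothing against the randomized one (advantage close to 0). Both schemes and all names are hypothetical.

```python
import hashlib
import hmac
import os
import secrets


def _pad(k: bytes, r: bytes, n: int) -> bytes:
    """n bytes of PRF output (HMAC-SHA256); toy restriction n <= 32."""
    return hmac.new(k, r, hashlib.sha256).digest()[:n]


def enc_randomized(k: bytes, m: bytes) -> bytes:
    assert len(m) <= 32, "toy scheme: single-block messages only"
    r = os.urandom(16)
    return r + bytes(a ^ b for a, b in zip(m, _pad(k, r, len(m))))


def enc_deterministic(k: bytes, m: bytes) -> bytes:
    """Insecure on purpose: equal messages give equal ciphertexts."""
    assert len(m) <= 32, "toy scheme: single-block messages only"
    return bytes(a ^ b for a, b in zip(m, _pad(k, b"fixed", len(m))))


def rcpa_game(enc, adversary) -> int:
    """One run of the game; returns 1 iff the adversary guesses the challenge bit."""
    k = os.urandom(32)            # Initialize(): k <- Gen()
    b = secrets.randbits(1)       # random challenge bit

    def encrypt_oracle(m: bytes) -> bytes:
        c = enc(k, m)
        # Assumption: b = 1 returns real ciphertexts, b = 0 uniform strings of equal length.
        return c if b == 1 else os.urandom(len(c))

    return int(adversary(encrypt_oracle) == b)


def repeat_adversary(oracle) -> int:
    """Guess b' = 1 ("real") iff encrypting the same message twice repeats."""
    m = b"same message"
    return int(oracle(m) == oracle(m))


def advantage(enc, adversary, trials=500) -> float:
    """Empirical Adv^RCPA = 2 * Pr[RCPA_E = 1] - 1."""
    wins = sum(rcpa_game(enc, adversary) for _ in range(trials))
    return 2 * wins / trials - 1


print(advantage(enc_deterministic, repeat_adversary))
print(advantage(enc_randomized, repeat_adversary))
```

The contrast shows why RCPA security forces encryption to be randomized: any deterministic scheme is distinguished from random strings by repeating a query.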

Fig. 3. Security game ${RCPA}_{E}$.

2.4. Dynamic searchable symmetric encryption (DSSE)

The system consists of two parties: the client $C$ (data owner) and the server $S$. $C$, who owns the database $DB$, encrypts the database and outsources it to $S$. The encrypted database is constructed so that $S$ can respond to $C$'s queries (search and update) efficiently, with the guarantee that minimal information apart from the intended output of the query operation is leaked to the server. Usually, the encrypted database consists of a secure index and a collection of encrypted documents. $S$ responds to $C$'s queries with the help of this secure index.

We primarily follow the formalization of Bost et al. [[6]] with certain additions. A keyword is denoted by w and a document is addressed by its identifier $ind$. The database $DB$ can be represented as $DB = \{({ind}_{i}, W_{i}) : 1 ⩽ i ⩽ n\}$, where n denotes the number of documents in the database, ${ind}_{i} \in \{0, 1\}^{ℓ}$ are distinct document identifiers and $W_{i} \subseteq \{0, 1\}^{*}$ is the set of keywords matching document ${ind}_{i}$, represented by binary strings of arbitrary length. Additionally, we use the notations described in Table 2.

Table 2 Notations used in $DSSE$ description

Notation	Description
$W$	$\bigcup_{i = 1}^{n} W_{i}$, the set of keywords
m	$|W|$, # keywords
N	$\sum_{i = 1}^{n} |W_{i}|$, # document-keyword pairs
$DB(w)$	$\{{ind}_{i} : w \in W_{i}\}$, the set of documents containing w
$n_{w}$	$|DB(w)|$, # documents containing w
$a_{w}$	# $add$ operations performed on w
$d_{w}$	# $del$ operations performed on w
$o_{w}$	# updates performed on w
$n_{w}^{'}$	# documents containing w at the time of the previous search*
$a_{w}^{'}$	# $add$ operations performed on w after the previous search
$d_{w}^{'}$	# $del$ operations performed on w after the previous search
$o_{w}^{'}$	$n_{w}^{'} + a_{w}^{'} + d_{w}^{'}$
d	upper bound on the number of deletes corresponding to a keyword w that can happen between two successive search queries on w

* By the previous search on w, we mean the last search operation on w before the current search on w.
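For concreteness, the Python snippet below computes several of the Table 2 quantities for a toy three-document database. The documents, keywords and the list of later updates are made up for illustration; the updates are only counted, not applied to `DB`.

```python
# Toy database DB = {(ind_i, W_i)}: document identifier -> set of keywords.
DB = {
    1: {"cloud", "privacy"},
    2: {"privacy"},
    3: {"cloud", "search", "privacy"},
}

W = set().union(*DB.values())             # W: the set of keywords
m = len(W)                                # m: number of keywords
N = sum(len(ws) for ws in DB.values())    # N: number of document-keyword pairs


def db_of(w):
    """DB(w): identifiers of the documents containing keyword w."""
    return {ind for ind, ws in DB.items() if w in ws}


n_w = {w: len(db_of(w)) for w in W}       # n_w for every keyword

# Hypothetical later updates (op, ind, w), used for a_w, d_w and o_w.
UPDATES = [("add", 4, "cloud"), ("del", 1, "cloud"), ("add", 5, "search")]


def update_counts(w):
    """Return (a_w, d_w, o_w) for keyword w over UPDATES."""
    a = sum(1 for op, _, kw in UPDATES if kw == w and op == "add")
    d = sum(1 for op, _, kw in UPDATES if kw == w and op == "del")
    return a, d, a + d


print(m, N, sorted(db_of("cloud")), n_w["privacy"], update_counts("cloud"))
# 3 6 [1, 3] 3 (1, 1, 2)
```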

A $DSSE$ scheme Π consists of the following [[6]]:

$Setup(DB)$ is a probabilistic algorithm that takes as input the initial database $DB$. It outputs $(st_{C}, EDB)$, where the client's state $st_{C}$ is given to $C$ and the encrypted database $EDB$ is given to $S$. The $Setup$ algorithm is executed by $C$.

$Search(q, st_{C}; EDB) = ({Search}_{C}(q, st_{C}), {Search}_{S}(EDB))$ is a (possibly probabilistic) protocol between $C$ and $S$. The input of $C$ is the search query $q$ and the client's state $st_{C}$. The input of $S$ is the encrypted database $EDB$. The output to the client is the updated client state $st_{C}^{'}$ and the set $Res$ of document identifiers matching the search query $q$. The output to the server is the updated encrypted database ${EDB}^{'}$.

$Update(q, st_{C}; EDB) = ({Update}_{C}(q, st_{C}), {Update}_{S}(EDB))$ is a (possibly probabilistic) protocol between $C$ and $S$. The input of $C$ is the query $q = (op, in)$, comprising the update operation $op \in \{add, del\}$ and the document-keyword pairs $(ind, w)$, denoted by $in$, together with the client's state $st_{C}$. The input of $S$ is the encrypted database $EDB$. The output to the client is the updated client state $st_{C}^{'}$. The output to the server is the updated encrypted database ${EDB}^{'}$.

In this work, we consider single-keyword search, so $q = w$ and $Res = DB(w)$ in the $Search$ protocol. For simplicity, we consider $in = (ind, w)$, i.e., a single document-keyword pair, in the $Update$ protocol. There are two use-cases of the $Update$ protocol in a $DSSE$ scheme. In the first, updates to $DB$ are done at the document level, as when storing a collection of text files. Such scenarios need bulk updates, which can be supported by calling the $Update$ protocol repeatedly. In the second, updates to $DB$ are done at the keyword-document pair level, as in record databases, where each update requires a single call to the $Update$ protocol.

As in [[9], [28], [30], [39]], for a given query $q$, we consider the output of the $Search$ protocol to be the set of document identifiers satisfying $q$. This allows us to decouple the storage of documents from the storage of the data structures used to realize the search operation, which is the focus of this work. The leakage profile $L$ in the security definitions for an $SSE$ scheme can then be given only with respect to the information that is leaked to the adversarial server until the set of document identifiers satisfying $q$ is determined. The actual documents can be stored in a variety of ways [[10]], with varying types of leakage. $SSE$ schemes are of two types: response-revealing and response-hiding [[27]]. The former reveals the query response in plaintext, whereas the latter does not. We use this categorization in our paper.

Security. We consider $S$ to be honest-but-curious. The security definition follows the real/ideal simulation paradigm [[10], [14]]. The definition is parameterized by a leakage profile $L$, which captures all the information that the adversary may learn about the encrypted database and $C$'s queries through its participation in the protocols. Hence, the view of the adversary in the real world can be simulated given only the output of $L$. Here $L = \{L_{Setup}, L_{Search}, L_{Update}\}$, where $L_{Setup}$, $L_{Search}$ and $L_{Update}$ correspond to the information leaked to the server in the $Setup$, $Search$ and $Update$ protocols respectively.

Definition 2.4.

Let $Π = (Setup, Search, Update)$ be a $DSSE$ scheme and let $L = \{L_{Setup}, L_{Search}, L_{Update}\}$ be a stateful algorithm. For probabilistic polynomial-time ($PPT$) algorithms $A$ and $Sim$, we define the experiments ${Real}_{A}^{Π}(λ)$ and ${Ideal}_{A, Sim}^{Π}(λ)$ as follows:

${Real}_{A}^{Π}(λ)$. $A(1^{λ})$ chooses $DB$. The experiment then runs $(st_{C}, EDB) \leftarrow Setup(DB)$ and gives $EDB$ to $A$. Then $A$ makes a polynomial number of adaptive queries. For each query $q$, if $q$ is a search query (resp. update query), the game runs $(Res, st_{C}, EDB) \leftarrow Search(q, st_{C}; EDB)$ (resp. $(st_{C}, EDB) \leftarrow Update(q, st_{C}; EDB)$) and gives the generated transcript to $A$. Eventually, $A$ returns a bit that the game uses as its own output.

${Ideal}_{A, Sim}^{Π}(λ)$. $A(1^{λ})$ chooses $DB$. The experiment then runs $EDB \leftarrow Sim(L_{Setup}(DB))$ and gives $EDB$ to $A$. Then $A$ makes a polynomial number of adaptive queries. For each query $q$, if $q$ is a search query (resp. update query), the game gives $A$ the transcript generated by $Sim(L_{Search}(q))$ (resp. $Sim(L_{Update}(q))$). Eventually, $A$ returns a bit that the game uses as its own output.

We say that Π is $L$ -semantically secure against adaptive attacks if for all adversaries $A$ , there exists an algorithm $Sim$ such that $| Pr [{Real}_{A}^{Π} (λ) = 1] - Pr [{Ideal}_{A, Sim}^{Π} (λ) = 1] | ⩽ neg (λ)$ .

Common leakage. We follow some of the notations for common leakage from [[4], [6]]. The leakage profile $L$ keeps as state the query list $Q$, i.e., the list of all queries issued so far along with their timestamps; the timestamps form a sequence of integers. The entries in $Q$ are $(u, w)$ for a search query on w, or $(u, op, (ind, w))$ for an update query $(op, (ind, w))$, where u denotes the timestamp of the query. For search queries, the search pattern $sp(w)$ is defined as $sp(w) = \{u : (u, w) \in Q\}$.

We also use the notation $Hist(w)$ for the list of all modifications made to $DB(w)$ over time. It consists of ${DB}_{0}(w)$, the set of document indices matching w at setup, and a list $UpHist(w)$, called the update history, comprising the updates of documents matching w. For example, consider two documents 1 and 2 matching w. Suppose the update queries $(add, (1, w))$, $(add, (2, w))$ and $(del, (1, w))$ are issued at timestamps 3, 12 and 20 respectively; then $UpHist(w) = [(3, add, 1), (12, add, 2), (20, del, 1)]$.

$Updates (w)$ denotes the set of timestamps of updates on w. Formally,

$Updates(w) = \{u \mid (u, add, (ind, w)) \in Q \text{ or } (u, del, (ind, w)) \in Q\}$.

${Updates}^{op}(w)$ is exactly like $Updates(w)$, except that along with each timestamp it also stores the $op$ of the corresponding update query.
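These notations are simple functions of the query list $Q$. The Python sketch below encodes search entries as `(u, w)` and update entries as `(u, op, (ind, w))`, as in the text, and reproduces the $UpHist(w)$ example above. ${DB}_{0}(w)$ is omitted because the database is assumed empty at setup here; the extra search at timestamp 15 and the update on keyword `v` are hypothetical additions, and the function names are illustrative.

```python
def sp(Q, w):
    """Search pattern sp(w): timestamps of the searches on w."""
    return {q[0] for q in Q if len(q) == 2 and q[1] == w}


def up_hist(Q, w):
    """UpHist(w): chronological list of (u, op, ind) for the updates on w."""
    hist = [(q[0], q[1], q[2][0]) for q in Q if len(q) == 3 and q[2][1] == w]
    return sorted(hist)


def updates(Q, w):
    """Updates(w): timestamps of the updates on w."""
    return {u for (u, _, _) in up_hist(Q, w)}


def updates_op(Q, w):
    """Updates^op(w): timestamps of the updates on w, together with their op."""
    return {(u, op) for (u, op, _) in up_hist(Q, w)}


# The example from the text, plus a hypothetical search on w at timestamp 15
# and an unrelated update on another keyword v at timestamp 7.
Q = [(3, "add", (1, "w")), (7, "add", (9, "v")), (12, "add", (2, "w")),
     (15, "w"), (20, "del", (1, "w"))]

print(up_hist(Q, "w"))  # [(3, 'add', 1), (12, 'add', 2), (20, 'del', 1)]
```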

Correctness. We say that a $DSSE$ scheme is correct if the $Search$ protocol returns the correct results for the keyword being searched (i.e., $DB(w)$), except with negligible probability. We follow a formalization similar to that of [[9]].

In Fig. 4, the adversary $A$ makes $p(λ)$ queries, for some polynomial p. $Transcript[Protocol]$ denotes the view of the server in $Protocol$. $T_{0} = \emptyset$ and $T_{i} = \{τ_{j} : 1 ⩽ j ⩽ i\}$ for $1 ⩽ i ⩽ p(λ)$, where $τ_{j}$ denotes the transcript of the j-th query.

Fig. 4. DSSE correctness.

Definition 2.5.

Let $Π = (Setup, Search, Update)$ be a $DSSE$ scheme. For algorithm $A$ we define the experiment ${DSSECor}_{A}^{Π} (λ)$ as shown in Fig. 4.

We say that Π is correct if for all adversaries $A$ , $Pr [{DSSECor}_{A}^{Π} (λ) = 1] ⩽ neg (λ)$ .

3. Security notions in DSSE

As discussed earlier, two security properties, viz., forward privacy and backward privacy, are desirable in a $DSSE$ scheme. Informally speaking, the former captures that newly updated entries cannot be related to previous search queries, and the latter ensures that search queries do not leak any information about the indexes of deleted files. In the context of searchable encryption, the usual practice [[9], [13], [27]] is to argue the security of $SSE$ schemes by formulating a leakage profile $L$ and proving that the leakage incurred by the proposed scheme is bounded by $L$. Recent works on $DSSE$ [[4], [6]] abstract the information leakage incurred by a $DSSE$ scheme into definitions and attempt to justify that the formalized leakage profiles adhere to the notion of forward/backward privacy. However, several leakage profiles could capture the security notion at hand, and one can choose among these formulations to achieve an appropriate balance between security and efficiency in a given context. Hence, abstracting the leakage profiles of proposed constructions as alternative candidate definitions (as was done for $SSE$ [[8]]) may ease the future evaluation of these leakages.

What constitutes the right definition of security is a vexed question [[31]]. Even for a widely used cryptographic primitive like the digital signature, it has been argued that the accepted standard definition of security [[24]] does not take into account various issues that may crop up depending on the application scenario [[32], [35], [43]]. In the context of $DSSE$, security is argued by formally describing a leakage profile $L$ and showing that the leakage incurred by a construction is bounded by $L$. Hence, for a relatively new cryptographic protocol like $DSSE$, it is natural to examine the formulations of leakage profiles from various perspectives. In this section, we perform a critical analysis of the existing notions of information leakage in backward private $DSSE$, followed by an alternative formulation.

Bost et al. [[6]] made a seminal contribution to the area of $DSSE$ by formalizing the notion of backward privacy through three different formulations, viz., $BPIP$, $BPUP$ and $WBP$, arranged in ascending order of information leakage. Naturally, like any other formalization of security, these notions require further investigation. In that vein, we first explain why weak backward privacy ($WBP$) cannot be used to establish backward privacy in general and should instead be reserved for backward privacy in a restricted setting only. Second, we introduce a relaxed leakage profile (Definition 3.5) that captures the notion of backward privacy. Finally, we introduce a desirable property for $DSSE$ schemes called inverse backward privacy.

For simplicity, we assume that $DB$ is initially empty, so the $Setup$ algorithm leaks no information. If $DB$ is not initially empty, typically $L_{Setup} = N$, where N denotes the number of document-keyword pairs. Let $L_{1}$ and $L_{2}$ denote two leakage profiles. If $L_{1}$ leaks less than $L_{2}$, we write $L_{1} ⪯ L_{2}$. By the proposition "$L_{1}$ leaks less than $L_{2}$", we mean that $L_{1}$ gives the simulator less information about the database and the queries than $L_{2}$ does or, said otherwise, that every piece of information given by $L_{1}$ can be inferred from $L_{2}$. If $L_{1}$ leaks strictly less than $L_{2}$, we write $L_{1} ≺ L_{2}$.

3.1. Forward privacy

We recall the strongest notion of forward privacy discussed in [[6]]. Informally, a $DSSE$ scheme is forward private if the $Update$ protocol leaks no information about the updated keywords. Definition 3.1 captures that an update operation leaks no more than the operation $op \in \{add, del\}$ of the update query $q$.

Definition 3.1.

($FP$-$I$). A $DSSE$ scheme Π that is $L$-semantically secure against adaptive attacks, with $L = \{L_{Setup}, L_{Search}, L_{Update}\}$, is $FP$-$I$ iff $L_{Setup}$, $L_{Search}$ and $L_{Update}$ can be written as:

$\begin{array}{l} L_{Setup}() = \emptyset, \quad L_{Search}(w) \preceq \{sp(w), Hist(w)\}, \\ L_{Update}(op, (ind, w)) \preceq \{op\}. \end{array}$

3.2. Backward privacy

The notion of backward privacy was informally introduced in [[39]]. Intuitively, a backward private search for a keyword w should not reveal which documents contained w in the past but have since been removed from the database. Like any other security property in the context of $SSE$, this notion can be captured by several formulations of leakage profiles. Backward privacy was formalized in [[6]] through three different definitions, in ascending order of information leakage, called $BPIP$, $BPUP$ and $WBP$ respectively. These formulations are informally described below:

Backward Privacy with insertion pattern ( $BPIP$ ): During a search on some keyword w, $BPIP$ schemes leak the documents currently matching w, when they were inserted, and the total number of updates on w.

Backward Privacy with update pattern ( $BPUP$ ): During a search on w, $BPUP$ schemes leak the documents currently matching w, when they were inserted, and when all the updates on w happened (but not their contents).

Weak backward privacy ( $WBP$ ): During a search on w, $WBP$ schemes leak the documents currently matching w, when they were inserted, when all the updates on w happened, and which deletion update canceled which insertion update.

Let us demonstrate the differences between these notions with an example. Consider the following entries in query list $Q_{1}$ corresponding to keyword w: $(1, add, ({ind}_{1}, w))$, $(4, add, ({ind}_{2}, w))$, $(5, del, ({ind}_{1}, w))$, $(12, add, ({ind}_{3}, w))$, and consider the leakage under each definition after a search query on w at timestamp 15. The first notion reveals that ${ind}_{2}$ and ${ind}_{3}$ match keyword w and that these entries were added at timestamps 4 and 12 respectively. It also reveals that there were a total of 4 updates on w. The second notion additionally reveals that the updates on w happened at timestamps 1, 4, 5 and 12. Finally, the third notion also reveals that the index added for w at timestamp 1 was deleted at timestamp 5.

In addition to the leakage functions described in Section 2.4, we now recall from [[5]] the leakage functions required to formally capture the notions of backward privacy described above.

For a keyword w, $TimeDB (w)$ is the list of all documents currently matching w (i.e., excluding the deleted ones), together with the timestamps at which they were inserted into the database. Formally, $TimeDB (w)$ can be constructed from the query list Q as follows:

$\begin{array}{l} TimeDB(w) = \{(u, ind) \mid (u, add, (ind, w)) \in Q \text{ and } \\ \qquad \forall u' > u,\ (u', del, (ind, w)) \notin Q\}. \qquad (1) \end{array}$

The deletion history $DelHist (w)$ of w is the list of timestamps of all deletion operations on w, each paired with the timestamp of the inserted entry it removes. Formally, $DelHist (w)$ is constructed as:

$\begin{array}{l} DelHist(w) = \{(u^{add}, u^{del}) \mid \exists\, ind \text{ s.t. } (u^{add}, add, (ind, w)) \in Q \text{ and } \\ \qquad (u^{del}, del, (ind, w)) \in Q\}. \qquad (2) \end{array}$
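The following Python sketch transcribes Equations (1) and (2) literally, together with $Updates (w)$ and $o_{w}$ as described informally above, and evaluates them on the example query list of this section. The tuple encoding of Q (an update as `(u, op, (ind, w))`, a search as `(u, w)`) and the function names are our own illustrative assumptions, not notation from [[6]].

```python
# Assumed encoding of the query list Q (illustrative, not from [6]):
#   update entry: (u, op, (ind, w)) with op in {"add", "del"}
#   search entry: (u, w)

def updates_on(Q, w):
    """All update entries on keyword w."""
    return [e for e in Q if len(e) == 3 and e[2][1] == w]


def time_db(Q, w):
    """Eq. (1): (u, ind) for each add on (ind, w) with no later del of (ind, w)."""
    ups = updates_on(Q, w)
    return {
        (u, ind)
        for (u, op, (ind, _)) in ups
        if op == "add"
        and not any(u2 > u and op2 == "del" and ind2 == ind
                    for (u2, op2, (ind2, _)) in ups)
    }


def del_hist(Q, w):
    """Eq. (2): every (u_add, u_del) with an add and a del on the same (ind, w)."""
    ups = updates_on(Q, w)
    return {
        (u_add, u_del)
        for (u_add, op_a, (ind_a, _)) in ups if op_a == "add"
        for (u_del, op_d, (ind_d, _)) in ups if op_d == "del" and ind_d == ind_a
    }


def updates(Q, w):
    """Updates(w): timestamps of all updates on w; o_w is its size."""
    return {u for (u, _, _) in updates_on(Q, w)}


# Example from the text: search on w at timestamp 15.
Q = [(1, "add", ("ind1", "w")), (4, "add", ("ind2", "w")),
     (5, "del", ("ind1", "w")), (12, "add", ("ind3", "w")), (15, "w")]
print(sorted(time_db(Q, "w")))   # [(4, 'ind2'), (12, 'ind3')]   (BPIP)
print(len(updates(Q, "w")))      # 4, i.e. o_w                   (BPIP)
print(sorted(updates(Q, "w")))   # [1, 4, 5, 12]                 (BPUP)
print(del_hist(Q, "w"))          # {(1, 5)}                      (WBP)
```

The printed values match the leakage stated in the example: each successive notion reveals everything the previous one does plus one more piece of information.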

With these tools, we can now formally define the three notions of backward privacy.

Definition 3.2 (BPIP).

An $L = \{L_{Setup}, L_{Search}, L_{Update}\}$-semantically secure against adaptive attacks $DSSE$ scheme Π is $BPIP$ iff $L_{Setup}, L_{Search}, L_{Update}$ can be written as:

$\begin{array}{l} L_{Setup}() = \emptyset, \quad L_{Update}(op, (ind, w)) \preceq \{op\}, \\ L_{Search}(w) \preceq \{TimeDB(w), o_{w}\}. \end{array}$

Definition 3.3 (BPUP).

An $L = \{L_{Setup}, L_{Search}, L_{Update}\}$-semantically secure against adaptive attacks $DSSE$ scheme Π is $BPUP$ iff $L_{Setup}, L_{Search}, L_{Update}$ can be written as:

$\begin{array}{l} L_{Setup}() = \emptyset, \quad L_{Update}(op, (ind, w)) \preceq \{op, w\}, \\ L_{Search}(w) \preceq \{TimeDB(w), Updates(w)\}. \end{array}$

Definition 3.4 (WBP).

An $L = \{L_{Setup}, L_{Search}, L_{Update}\}$-semantically secure against adaptive attacks $DSSE$ scheme Π is $WBP$ iff $L_{Setup}, L_{Search}, L_{Update}$ can be written as:

$\begin{array}{l} L_{Setup}() = \emptyset, \quad L_{Update}(op, (ind, w)) \preceq \{op, w\}, \\ L_{Search}(w) \preceq \{TimeDB(w), DelHist(w)\}. \end{array}$

As is evident from the above definitions, the notion of backward privacy is more involved than that of forward privacy. Forward privacy can be formalized by requiring that an update query not leak the keyword on which the update is made, whereas backward privacy is more subtle. One approach is to formulate a strong leakage profile for the security notion at hand, so that constructions satisfying it may leak only very limited information. This approach was followed in formulating the leakage profiles of $BPIP$ and $BPUP$. A possible shortcoming of this approach is that some candidate constructions may not satisfy these strong requirements and yet satisfy the intuitive notion of backward privacy. The other extreme is to let the leakage profile include "as much information as one can think of" that the security notion at hand can tolerate. This seems to be the approach followed in the $WBP$ security notion. However, with this approach one must be careful that the additional leakage in the profile does not end up violating the basic security notion of the corresponding task. We revisit the notion of $WBP$ from this perspective.

Remark.

$L_{Search}$ in $BPIP$, $BPUP$ and $WBP$ should be augmented with the leakage function $sp (w)$. Moreover, $L_{Search}$ in $WBP$ should also be augmented with the leakage function $Updates (w)$. The rationale is given in Section 3.4.

3.3. Revisiting weak backward privacy

In this section, we scrutinize the notion of weak backward privacy. Let us consider the following entries in query list $Q_{1}$ corresponding to w: $(1, add, (ind, w))$ , $(3, w)$ , $(5, del, (ind, w))$ , $(6, w)$ , $(12, add, (ind, w))$ , $(14, del, (ind, w))$ , $(18, w)$ . Let us denote the search queries on w at timestamps 3, 6 and 18 by $q_{1}$ , $q_{2}$ and $q_{3}$ respectively.

Leakage of the search query $q_{1}$ :

$TimeDB(w) = \{(1, ind)\}, \quad DelHist(w) = \emptyset.$

Leakage of the search query $q_{2}$ :

$TimeDB(w) = \emptyset, \quad DelHist(w) = \{(1, 5)\}.$

Note that $DelHist (w)$ leaks that the $add$ operation at timestamp 1 is canceled by the $del$ operation at timestamp 5. From the contents of $DB (w)$ after queries $q_{1}$ and $q_{2}$ ($ind$ matches w at $q_{1}$ but not at $q_{2}$, and the only update in between is at timestamp 5), the adversary can already infer that the $del$ operation at timestamp 5 corresponds to document $ind$. Therefore, this leakage is consistent with the intuitive notion of backward privacy described earlier in this section.

Now consider the search query $q_{3}$. After the search query $q_{2}$, $(ind, w)$ was added at timestamp 12 and later deleted at timestamp 14. By the same intuitive notion of backward privacy, since $ind$ matches w neither after $q_{2}$ nor after $q_{3}$, the adversary should not be able to infer which document the update queries at timestamps 12 and 14 correspond to. However, the leakage of the search query $q_{3}$ is:

$TimeDB(w) = \emptyset, \quad DelHist(w) = \{(1, 5), (1, 14), (12, 5), (12, 14)\}.$

Hence, from this leakage profile the adversary can infer that the updates at timestamps 12 and 14 correspond to document $ind$: the pairs $(1, 14)$ and $(12, 5)$ tie them to the updates at timestamps 1 and 5, which the adversary already knows to concern $ind$. Clearly, this goes against the intuitive notion of backward privacy.
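To double-check this leakage, the sketch below evaluates Equations (1) and (2) on the prefix of $Q_{1}$ visible at each of the three searches and then carries out the adversary's inference. The encoding of Q and the helper names are illustrative assumptions. The `del_hist` helper follows Equation (2) literally; note that Equation (2) does not require $u^{add} < u^{del}$, which is why the pair (12, 5) appears.

```python
def time_db(Q, w):
    """Eq. (1): (u, ind) for every add on (ind, w) not followed by a del of (ind, w)."""
    ups = [e for e in Q if len(e) == 3 and e[2][1] == w]
    return {(u, i) for (u, op, (i, _)) in ups if op == "add"
            and not any(u2 > u and op2 == "del" and i2 == i for (u2, op2, (i2, _)) in ups)}


def del_hist(Q, w):
    """Eq. (2): every (u_add, u_del) with an add and a del on the same (ind, w)."""
    ups = [e for e in Q if len(e) == 3 and e[2][1] == w]
    return {(ua, ud) for (ua, oa, (ia, _)) in ups if oa == "add"
            for (ud, od, (id_, _)) in ups if od == "del" and id_ == ia}


# Query list Q1 from the text; updates are (u, op, (ind, w)), searches are (u, w).
Q1 = [(1, "add", ("ind", "w")), (3, "w"), (5, "del", ("ind", "w")), (6, "w"),
      (12, "add", ("ind", "w")), (14, "del", ("ind", "w")), (18, "w")]

leak = {}
for t in (3, 6, 18):  # the searches q1, q2, q3
    seen = [e for e in Q1 if e[0] <= t]  # entries issued up to the search at time t
    leak[t] = (time_db(seen, "w"), del_hist(seen, "w"))

# After q2 the adversary knows the updates at timestamps 1 and 5 concern ind.
# Any DelHist pair sharing one of these timestamps must concern the same ind.
dh = leak[18][1]
linked = {ud for (ua, ud) in dh if ua == 1} | {ua for (ua, ud) in dh if ud == 5}
print(sorted(linked))  # [1, 5, 12, 14]
```

All four update timestamps end up linked to the single identifier $ind$, even though $ind$ never matched w at $q_{2}$ or $q_{3}$.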

The constructions ${Diana}_{del}$ and $Janus$, proven weak backward private in [[6]], are subject to the following restriction: "reinsertion of a document-keyword pair is not allowed after the deletion of the corresponding document-keyword pair". This reinsertion restriction rules out scenarios such as the one above that violate the intuitive notion of backward privacy. Hence, $WBP$ can be used to argue backward privacy only in the setting with the reinsertion restriction. However, the $WBP$ constructions proposed in subsequent works [[21], [41]] do not explicitly mention that reinsertion of document-keyword pairs is disallowed. Therefore, to avoid any ambiguity, we believe it should be stated explicitly that $WBP$ is applicable only in such restricted scenarios.

Remark.

Reinsertion of document-keyword pairs may not be a concern in certain use-cases of $SSE$ schemes, where a new document identifier can be assigned to the updated document, thereby ensuring that the newly inserted document-keyword pairs cannot be related to older ones. But this trick may not always be applicable, especially when the contents of a file change dynamically over time. In that case, one needs to handle the reinsertion of keywords into existing documents whose identifiers cannot be changed, and hence one needs to support reinsertion of document-keyword pairs. As an example, consider a hospital database in which the patients' records are the documents and the diseases they are suspected to suffer from are the keywords. One cannot rule out a scenario in which, based on newer symptoms, a patient is again suspected to suffer from a disease, say a malignant brain tumor, that was ruled out earlier.

3.4. Suggested modifications

Here, we point out some subtle issues in Definitions 3.2, 3.3 and 3.4 and suggest modifications to address them. We first argue that $sp (w)$ should be part of $L_{Search} (w)$ in the definitions of $BPIP$, $BPUP$ and $WBP$ [[6]]. The $BPUP$-secure $Fides$ [[6]] as well as the $WBP$-secure $Janus$ [[6]] and $Janus + +$ [[41]] leak $sp (w)$ in the $Search$ protocol. Since $sp (w)$ is not explicitly listed in the leakage profiles of these definitions, one may conclude that $sp (w)$ can always be derived from the other leakage functions in the respective definitions. However, consider the borderline scenario in which no update on w has occurred so far and two search queries on w are executed at timestamps 5 and 8 respectively. For the search query at timestamp 8, $sp (w) = \{5, 8\}$, whereas the other leakage functions at timestamp 8 are $TimeDB (w) = \emptyset$, $Updates (w) = \emptyset$ and $DelHist (w) = \emptyset$. Thus $sp (w)$ cannot be derived from the other leakage functions, and it should be included in $L_{Search} (w)$ of $BPIP$, $BPUP$ and $WBP$.

Next, we argue that $L_{Search}$ in $WBP$ should also be augmented with the leakage function $Updates (w)$. Recall that the informal notion of $WBP$ in [[6]] states that a search query on w should leak at most the documents currently matching w, when they were inserted, when all the updates on w happened and the total number of updates on w. When formalizing the leakage profile of $WBP$ [[6], Definition 4.2], however, the leakage in the $Search$ protocol is described as $L_{Search} (w) \preceq (TimeDB (w), DelHist (w))$. Note that $Updates (w)$ is not explicitly part of $L_{Search} (w)$, which gives the impression that $Updates (w)$ can be constructed from $TimeDB (w)$ and $DelHist (w)$. Now consider the following entries in query list $Q_{1}$ corresponding to w: $(1, add, (1, w))$, $(2, del, (2, w))$, $(4, add, (3, w))$, $(6, del, (1, w))$, $(8, w)$. Let $q_{1}$ denote the search query on w at timestamp 8.

Leakage functions at search query $q_{1}$ :

$\begin{array}{l} TimeDB(w) = \{(4, 3)\}, \quad DelHist(w) = \{(1, 6)\}, \\ Updates(w) = \{1, 2, 4, 6\}. \end{array}$

Note that $TimeDB (w)$ and $DelHist (w)$ collectively capture no information about the update operation at timestamp 2, since the deletion of $(2, w)$ is not matched by any insertion. As a result, $Updates (w)$ cannot be constructed from $TimeDB (w)$ and $DelHist (w)$, and it should be part of $L_{Search} (w)$ of $WBP$. Further, the construction $Janus$ also leaks $Updates (w)$ in its $Search$ protocol, so $Updates (w)$ needs to be included in the leakage profile of $Janus$ as well.
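The counterexample can be checked mechanically. The sketch below (same illustrative tuple encoding of Q as an assumption; names are hypothetical) computes $TimeDB (w)$, $DelHist (w)$ and $Updates (w)$ for $Q_{1}$, together with the set of update timestamps that can be read off $TimeDB (w)$ and $DelHist (w)$ alone.

```python
def analyze(Q, w):
    """Return TimeDB(w), DelHist(w), Updates(w) and the timestamps recoverable
    from TimeDB(w) and DelHist(w) alone. Updates are (u, op, (ind, w))."""
    ups = [e for e in Q if len(e) == 3 and e[2][1] == w]
    time_db = {(u, i) for (u, op, (i, _)) in ups if op == "add"
               and not any(u2 > u and o2 == "del" and i2 == i for (u2, o2, (i2, _)) in ups)}
    del_hist = {(ua, ud) for (ua, oa, (ia, _)) in ups if oa == "add"
                for (ud, od, (id_, _)) in ups if od == "del" and id_ == ia}
    updates = {u for (u, _, _) in ups}
    recoverable = {u for (u, _) in time_db} | {u for pair in del_hist for u in pair}
    return time_db, del_hist, updates, recoverable


Q1 = [(1, "add", (1, "w")), (2, "del", (2, "w")), (4, "add", (3, "w")),
      (6, "del", (1, "w")), (8, "w")]
time_db, del_hist, updates, recoverable = analyze(Q1, "w")
print(time_db, del_hist, sorted(updates), sorted(recoverable))
# {(4, 3)} {(1, 6)} [1, 2, 4, 6] [1, 4, 6]
```

The unmatched deletion at timestamp 2 is the only update missing from what $TimeDB (w)$ and $DelHist (w)$ expose.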

Thus, we suggest that the respective information leakage definitions in the context of search should be:

$\begin{array}{l} L_{BPIP, Search} : \{sp(w), TimeDB(w), o_{w}\}, \\ L_{BPUP, Search} : \{sp(w), TimeDB(w), Updates(w)\}, \\ L_{WBP, Search} : \{sp(w), TimeDB(w), Updates(w), DelHist(w)\}. \end{array}$

3.5. A relaxed formulation of backward privacy

Constructing backward private $DSSE$ schemes with optimal communication complexity is an interesting question to explore. To achieve this goal, it seems necessary for $S$ to be able to identify which insertion entry is canceled by a particular deletion entry. However, the stronger leakage profiles of $BPIP$ and $BPUP$ do not allow such information to be leaked to $S$. Therefore, as in $WBP$, one may need to allow some non-trivial relation among the update queries to be leaked; unlike $WBP$, however, the resulting notion should capture backward privacy in the general setting. We observe that there can be several alternative formulations of such information leakage. Here we describe one candidate definition, Backward Privacy with Link Pattern ($BPLP$), and in Section 4.2 we propose a natural construction whose leakage profile is tightly captured by $BPLP$.

In order to describe the leakage profile of $BPLP$, we introduce some new leakage functions. Let ${DB}_{x} (w)$ and ${DB}_{x + 1} (w)$ denote the sets of documents matching w at two successive searches x and $x + 1$ on w respectively, and let $LDB (w)$ denote the list of documents matching w at the $(x + 1)$th search, in order of their insertion. An element of $LDB (w)$ is of the form $(i, ind)$, where i denotes the index and $ind$ denotes the document identifier. Let the timestamps of the xth and $(x + 1)$th searches be denoted by $u_{x}$ and $u_{x + 1}$ respectively. We define the leakage functions Link Pattern I ($LP - I$), Link Pattern II ($LP - II$) and Link Pattern III ($LP - III$):

$\begin{array}{l} LP\text{-}I(w) = \{(ind, u) \mid ind \in DB_{x}(w) \text{ and } (u, op, (ind, w)) \in Q \text{ and } u_{x} < u < u_{x+1}\}, \\ LP\text{-}II(w) = \{(ind, u) \mid ind \in DB_{x+1}(w) \text{ and } (u, op, (ind, w)) \in Q \text{ and } u_{x} < u < u_{x+1}\}, \\ LP\text{-}III(w) = \{(u_{1}, u_{2}) \mid \exists\, ind : (u_{1}, op, (ind, w)) \in Q \text{ and } (u_{2}, op', (ind, w)) \in Q \text{ and } u_{x} < u_{1} < u_{2} < u_{x+1}\}. \end{array}$

$LP - I (w)$ captures the relation between the identifiers returned by the xth search on w and the updates on w that happen between searches x and $x + 1$. $LP - II (w)$ captures the relation between the identifiers returned by the $(x + 1)$th search on w and the updates on w that happen between searches x and $x + 1$. $LP - III (w)$ captures the relation among the updates on the same $ind$ that happen between the xth and $(x + 1)$th searches on w.

For example, let the $(x + 1)$th search query on w be at timestamp 16 and the previous (xth) search query on w be at timestamp 4. Let ${DB}_{x} (w)$ and ${DB}_{x + 1} (w)$ be $\{1, 2\}$ and $\{1, 3\}$ respectively, and let $(5, del, (1, w))$, $(7, del, (2, w))$, $(12, add, (1, w))$ and $(15, add, (3, w))$ be the update queries on w that occur between these two searches. The respective leakage functions are:

$\begin{array}{l} LP\text{-}I(w) = \{(1, 5), (2, 7), (1, 12)\}, \quad LP\text{-}III(w) = \{(5, 12)\}, \\ LP\text{-}II(w) = \{(1, 5), (1, 12), (3, 15)\}. \end{array}$
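A direct transcription of $LP - I$, $LP - II$ and $LP - III$ is sketched below and reproduces the example values. Here ${DB}_{x} (w)$ and ${DB}_{x + 1} (w)$ are passed in as given, and the query-list encoding and function names are illustrative assumptions. As in the example, $LP - III$ pairs two updates on the same identifier regardless of their operation types (a $del$ at 5 and an $add$ at 12).

```python
def window(Q, w, u_x, u_x1):
    """Updates (u, op, ind) on w strictly between the xth and (x+1)th searches."""
    return [(e[0], e[1], e[2][0]) for e in Q
            if len(e) == 3 and e[2][1] == w and u_x < e[0] < u_x1]


def link_patterns(db_x, db_x1, win):
    lp1 = {(ind, u) for (u, _, ind) in win if ind in db_x}    # LP-I
    lp2 = {(ind, u) for (u, _, ind) in win if ind in db_x1}   # LP-II
    lp3 = {(u1, u2) for (u1, _, i1) in win for (u2, _, i2) in win
           if i1 == i2 and u1 < u2}                           # LP-III
    return lp1, lp2, lp3


# Example from the text: searches at timestamps 4 and 16.
Q = [(4, "w"), (5, "del", (1, "w")), (7, "del", (2, "w")),
     (12, "add", (1, "w")), (15, "add", (3, "w")), (16, "w")]
lp1, lp2, lp3 = link_patterns({1, 2}, {1, 3}, window(Q, "w", 4, 16))
print(sorted(lp1), sorted(lp2), sorted(lp3))
# [(1, 5), (1, 12), (2, 7)] [(1, 5), (1, 12), (3, 15)] [(5, 12)]
```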

Though the leakage functions defined above may appear somewhat complex, they are useful abstractions through which $BPLP$ captures the notion of backward privacy, as argued next. In the $Search$ protocol, along with $sp (w)$ and ${Updates}^{op} (w)$, the leakage functions $LP - I (w)$, $LP - II (w)$, $LP - III (w)$ and $LDB (w)$ can be leaked to $S$. Let $I_{BP}$ denote the set of document identifiers whose entries for w have already been deleted. Note that for every $ind \in I_{BP}$, since $ind \notin {DB}_{x} (w)$ and the last update operation on $(ind, w)$ is a $del$ operation, it follows that $ind \notin {DB}_{x + 1} (w)$. Hence, no information about the identifiers in $I_{BP}$ can be revealed through the leakage functions $LP - I (w)$, $LP - II (w)$ and $LDB (w)$. As the remaining leakage functions, viz., $sp (w)$, ${Updates}^{op} (w)$ and $LP - III (w)$, do not leak the $ind$ corresponding to an update, no information about the identifiers in $I_{BP}$ is revealed.

Definition 3.5 (BPLP).

An $L = \{L_{Setup}, L_{Search}, L_{Update}\}$-semantically secure against adaptive attacks scheme Π is $BPLP$ iff $L_{Setup}, L_{Search}, L_{Update}$ can be written as:

$\begin{array}{l} L_{Setup}() = \emptyset, \quad L_{Update}(op, (ind, w)) = \emptyset, \\ L_{Search}(w) \preceq \{sp(w), Updates^{op}(w), LP\text{-}I(w), LP\text{-}II(w), \\ \qquad LP\text{-}III(w), LDB(w)\}. \end{array}$

Note that Definition 3.5 ( $BPLP$ ) captures the notion of forward privacy as well. Therefore, constructions that are $BPLP$ secure are naturally forward private.

Next, we compare the leakage incurred in the $Search$ protocol of $BPUP$ and $BPIP$ with that of $BPLP$ (Definition 3.5). As remarked earlier, $WBP$ is only suitable for arguing backward privacy in a restricted setting, so it would not be meaningful to compare it with definitions of backward privacy that apply to the general setting.

From the search leakage in $BPUP$ (Definition 3.3) and $BPLP$ (Definition 3.5), one can conclude that all the leakage functions in $L_{BPUP, Search}$ can be derived from the leakage functions in $L_{BPLP, Search}$: $sp (w)$ appears in both, $Updates (w)$ follows from ${Updates}^{op} (w)$, and $TimeDB (w)$ can be derived from $LP - II (w)$, ${Updates}^{op} (w)$ and $DB (w)$ (in particular $LDB (w)$) of the successive searches on w. Indeed, $TimeDB (w)$ consists of the timestamps of the $add$ updates for the document identifiers in $DB (w)$ that are not followed by a $del$ update on the same identifier. These can be determined by tracking, through $LP - II (w)$ and ${Updates}^{op} (w)$ over all searches on w, the $add$ updates for the identifiers in $DB (w)$ that are not followed by the respective $del$ update. Conversely, some leakage functions in $L_{BPLP, Search}$, such as $LP - III (w)$, cannot be derived from $L_{BPUP, Search}$ (see the example below). Hence, $L_{BPUP, Search} \prec L_{BPLP, Search}$. Figure 5 summarizes the relations among the definitions of backward privacy.

Graph: Fig. 5. Relations among definitions of backward privacy.

Example.

Consider two query lists $Q_{1}$ and $Q_{2}$ whose entries for w are as follows. $Q_{1}$ contains $(1, add, (1, w))$, $(2, add, (2, w))$, $(3, del, (1, w))$, $(4, del, (2, w))$, $(5, add, (3, w))$ and $(6, w)$. $Q_{2}$ contains $(1, add, (1, w))$, $(2, add, (2, w))$, $(3, del, (2, w))$, $(4, del, (1, w))$, $(5, add, (3, w))$ and $(6, w)$; that is, the two lists differ only in the order in which documents 1 and 2 are deleted.

The leakage functions at timestamp 6 in $Q_{1}$ are:

$\begin{array}{l} sp(w) = \{6\}, \quad Updates(w) = \{1, 2, 3, 4, 5\}, \\ TimeDB(w) = \{(5, 3)\}, \quad LP\text{-}III(w) = \{(1, 3), (2, 4)\}. \end{array}$

Similarly, the leakage functions at timestamp 6 in $Q_{2}$ are described as:

$\begin{array}{l} sp(w) = \{6\}, \quad Updates(w) = \{1, 2, 3, 4, 5\}, \\ TimeDB(w) = \{(5, 3)\}, \quad LP\text{-}III(w) = \{(1, 4), (2, 3)\}. \end{array}$

As can be observed, the leakage functions in $L_{BPUP, Search}$, i.e., $sp (w)$, $Updates (w)$ and $TimeDB (w)$, are the same for query lists $Q_{1}$ and $Q_{2}$. However, $LP - III (w)$ differs between $Q_{1}$ and $Q_{2}$. Hence, $LP - III (w)$ cannot be uniquely determined from the leakage functions in $L_{BPUP, Search}$.
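The same comparison can be reproduced with a short sketch (illustrative tuple encoding and hypothetical names, as before). It assumes there was no earlier search on w, so the window for $LP - III (w)$ covers all updates before timestamp 6.

```python
def bpup_leakage(Q, w, t):
    """(sp(w), Updates(w), TimeDB(w)) for a search on w at timestamp t."""
    ups = [e for e in Q if len(e) == 3 and e[2][1] == w and e[0] < t]
    sp = frozenset(e[0] for e in Q if len(e) == 2 and e[1] == w and e[0] <= t)
    updates = frozenset(u for (u, _, _) in ups)
    time_db = frozenset(
        (u, i) for (u, op, (i, _)) in ups if op == "add"
        and not any(u2 > u and o2 == "del" and i2 == i for (u2, o2, (i2, _)) in ups))
    return sp, updates, time_db


def lp3(Q, w, u_x, u_x1):
    """LP-III(w): pairs of update timestamps on the same ind between two searches."""
    win = [(e[0], e[2][0]) for e in Q
           if len(e) == 3 and e[2][1] == w and u_x < e[0] < u_x1]
    return {(u1, u2) for (u1, i1) in win for (u2, i2) in win if i1 == i2 and u1 < u2}


common = [(1, "add", (1, "w")), (2, "add", (2, "w"))]
tail = [(5, "add", (3, "w")), (6, "w")]
Q1 = common + [(3, "del", (1, "w")), (4, "del", (2, "w"))] + tail
Q2 = common + [(3, "del", (2, "w")), (4, "del", (1, "w"))] + tail

same_bpup = bpup_leakage(Q1, "w", 6) == bpup_leakage(Q2, "w", 6)
print(same_bpup, sorted(lp3(Q1, "w", 0, 6)), sorted(lp3(Q2, "w", 0, 6)))
# True [(1, 3), (2, 4)] [(1, 4), (2, 3)]
```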

With the newly introduced $BPLP$ , we now have four different formulations of information leakage in the context of backward privacy. An interesting question would be to analyze the real-world consequences of the leakages incurred by constructions satisfying these different notions, viz., $BPIP$ , $BPUP$ , $BPLP$ or even $WBP$ (in restricted setting).

Inverse backward privacy. We propose a new desirable property for $DSSE$ schemes, called inverse backward privacy, which captures the situation complementary to backward privacy. Analogously to backward privacy, a $DSSE$ scheme is inverse backward private if, whenever a document-keyword pair $(ind, w)$ is deleted and later re-added, subsequent search queries on w do not reveal that $(ind, w)$ was deleted, unless this can be inferred from the search and access patterns of the search queries. For example, let $DB (w) = \{1, 2\}$ for a search query $q$ at timestamp 5, and let the update operations on w after $q$ and before the next search query be $(6, del, (1, w))$ and $(12, add, (1, w))$. Then no information about the identifier 1 in the update queries at timestamps 6 and 12 should be revealed to $S$ in the next search query on w.

The inverse backward privacy property could be relevant in various use-cases. For instance, consider an employee database in which the employee records are the documents and the project teams an employee works in are the keywords. An employee $E_{1}$ may be dropped from a project team and later reincluded. We would like to hide the fact that $E_{1}$ was briefly dropped if no search on that project team was performed during that period.

4. Backward private DSSE constructions

In this section we propose two backward private schemes $Π_{BP}$ and $Π_{WBP}$ that are respectively $BPLP$ and $WBP$ secure. Our starting point is a forward private $DSSE$ scheme $Π_{FP}$ which is a modified version of the scheme proposed in [[16]].

4.1. ΠFP: A warm-up solution

Graph: Fig. 6. Scheme ΠFP.

The central idea in $Π_{FP}$ (Fig. 6) is to perform updates using fresh keys, so that the keys disclosed in previous searches do not reveal anything about new updates. The construction makes use of $PRF$s $F_{t}, F_{d} : \{0, 1\}^{λ} \times \{0, 1\}^{*} \to \{0, 1\}^{λ}$ and hash functions $H_{1} : \{0, 1\}^{λ} \times \{0, 1\}^{*} \to \{0, 1\}^{2 λ}$ and $H_{2} : \{0, 1\}^{λ} \times \{0, 1\}^{*} \to \{0, 1\}^{μ}$, where $μ = λ + 1$ and λ is the security parameter.

The $Setup$ algorithm generates the secret keys $k_{t}$ and $k_{d}$, and $C$ initializes three maps: T, D and W. The maps T and D are stored at $S$'s end. For each keyword w, D stores a pointer to ${PSet}_{w}$, the set of plaintext document identifiers obtained as the result of the latest search on w, while T stores the encrypted entries inserted after the latest search on w. The map W is stored at $C$'s end: for each w, it holds the version ${ver}_{w}$ (initialized to 0) and the counter $c_{w}$ (initialized to −1). ${ver}_{w}$ ensures that the key $k_{w}$ used in the $Update$ protocol is unknown to $S$, and $c_{w}$ tracks the number of entries added to T for w after the latest search on w.

$\underline{Update}$: When $C$ wants to $add / del$ a document-keyword pair $(ind, w)$, it computes the key $k_{w}$ from the keyword w and ${ver}_{w}$ (see line 5) and increments $c_{w}$. From $k_{w}$ and $c_{w}$, $C$ computes the hash digests $label$ and $pad$ and sends $(label, e = pad \oplus (b ‖ ind))$ to $S$, which adds $(label, e)$ to T. For an $add$ (resp. $del$) operation, $b = 0$ (resp. $b = 1$).

Note that the key $k_{w}$ used in processing update queries is computed from the current ${ver}_{w}$, which is incremented after every search that reveals the previous key. Since $k_{w}$ is the output of the $PRF$ $F_{t}$ on $w ‖ {ver}_{w}$, it is indistinguishable from random for $S$. As $label$ (resp. $pad$) is computed as $H_{1} (k_{w} ‖ c_{w})$ (resp. $H_{2} (k_{w} ‖ c_{w})$) for an update query, both are indistinguishable from random for $S$ because $H_{1}$ (resp. $H_{2}$) is modeled as a random oracle. Hence, in the security proof, update queries can be simulated by generating a random $(label, e)$ pair.

$\underline{Search}$: When $C$ wants to perform a search query on w, it computes ${label}_{w}$ (see line 5); the key $k_{w}$ is computed (see line 7) only if new entries for w have been inserted into T since the previous search on w. $C$ sends ${label}_{w}$, $k_{w}$ and $c_{w}$ to $S$. Because $k_{w}$ is revealed to $S$ in this case, $C$ then updates the version ${ver}_{w}$ (see line 8) so that later updates use a fresh key. If no entry for w was added to T after the previous search on w, $k_{w}$ is not revealed and ${ver}_{w}$ is left unchanged. Based on the information received from $C$, $S$ computes the result set and updates D with the newly computed result set.

In $Π_{FP}$ , ${ver}_{w}$ ensures that $S$ cannot relate later updates with previous search queries and $c_{w}$ ensures that $S$ cannot correlate the update queries on w done after the previous search operation on w.
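For concreteness, here is a minimal single-process Python sketch of the $Π_{FP}$ logic as described in the text; Fig. 6 is not reproduced here, so line-level details may differ. It assumes HMAC-SHA256 for the $PRF$s, domain-separated SHA-256 as a stand-in for the random oracles $H_{1}$ and $H_{2}$ (λ = 128), a flag byte in place of the bit b, fixed 16-byte document identifiers, and that $c_{w}$ is reset to −1 after a search that reveals $k_{w}$. All class and function names are hypothetical, and the code is a toy for illustration, not a secure implementation.

```python
import hashlib
import hmac
import os

IND_LEN = 16  # assumed fixed-length encoding of document identifiers (bytes)


def prf(key: bytes, msg: bytes) -> bytes:
    """Stand-in for F_t / F_d: HMAC-SHA256 (an illustrative choice)."""
    return hmac.new(key, msg, hashlib.sha256).digest()


def h1(kw: bytes, c: int) -> bytes:
    """Stand-in for the random oracle H1: a 256-bit (2*lambda) label."""
    return hashlib.sha256(b"H1" + kw + c.to_bytes(8, "big")).digest()


def h2(kw: bytes, c: int) -> bytes:
    """Stand-in for the random oracle H2: a flag byte plus IND_LEN pad bytes."""
    return hashlib.sha256(b"H2" + kw + c.to_bytes(8, "big")).digest()[: 1 + IND_LEN]


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


class Server:
    def __init__(self):
        self.T = {}  # label -> e, entries added since the last search on their keyword
        self.D = {}  # label_w -> PSet_w, plaintext result of the last search on w

    def update(self, label: bytes, e: bytes) -> None:
        self.T[label] = e

    def search(self, label_w: bytes, kw, cw: int) -> set:
        pset = set(self.D.get(label_w, set()))
        for i in range(cw + 1):  # replay new entries in the order they were added
            entry = xor(self.T.pop(h1(kw, i)), h2(kw, i))
            b, ind = entry[0], int.from_bytes(entry[1:], "big")
            if b == 0:
                pset.add(ind)
            else:
                pset.discard(ind)
        self.D[label_w] = pset
        return set(pset)


class Client:
    def __init__(self, server: Server):
        self.kt, self.kd = os.urandom(32), os.urandom(32)
        self.W = {}  # w -> [ver_w, c_w]
        self.server = server

    def _kw(self, w: str, ver: int) -> bytes:
        return prf(self.kt, f"{w}|{ver}".encode())  # k_w = F_t(k_t, w || ver_w)

    def update(self, op: str, ind: int, w: str) -> None:
        state = self.W.setdefault(w, [0, -1])
        kw = self._kw(w, state[0])
        state[1] += 1
        b = 0 if op == "add" else 1
        e = xor(h2(kw, state[1]), bytes([b]) + ind.to_bytes(IND_LEN, "big"))
        self.server.update(h1(kw, state[1]), e)

    def search(self, w: str) -> set:
        state = self.W.setdefault(w, [0, -1])
        label_w = prf(self.kd, w.encode())  # label_w = F_d(k_d, w)
        if state[1] == -1:  # no new entries in T: reveal no key, keep ver_w
            return self.server.search(label_w, None, -1)
        result = self.server.search(label_w, self._kw(w, state[0]), state[1])
        state[0] += 1  # k_w is now known to S, so future updates use a fresh key
        state[1] = -1  # assumption: the counter restarts after the search
        return result
```

Replaying the updates of Example 1 and searching on $w_{1}$ returns {2, 3} and advances ${ver}_{w_{1}}$ to 1, while a repeated search with no intervening updates leaves the version unchanged.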

Remark.

Essentially, $Π_{FP}$ is the same as the construction in [[16]] except for the following: 1) the search counter corresponding to w (denoted ${ver}_{w}$) is updated differently than in [[16]] to avoid unnecessary increments; 2) after a search operation on w, the revealed document identifiers ($DB (w)$) are stored together in plaintext (as suggested in [[6]]) to provide reasonable locality without incurring any additional leakage.

Example 1.

Consider the following list of update queries: $(add, (1, w_{1}))$ , $(add, (1, w_{2}))$ , $(add, (2, w_{1}))$ , $(add, (3, w_{1}))$ , $(add, (3, w_{2}))$ , $(del, (1, w_{1}))$ . Figure 7(a) shows the state of indexes at $C$ and $S$ after these updates are processed. Figure 7(b) shows the state of indexes at $C$ and $S$ after search on $w_{1}$ .

Graph: Fig. 7. Example 1: indexes at C and S before and after search on w1 in construction ΠFP. W = client index, D = index that stores search results of previous search query at S and T = index that stores updates after the last search on w at S.

Correctness. The scheme is correct as long as there are no repeated labels in the maps T and D. Since $F_{d}$ is a $PRF$, ${label}_{w}$ in D coincides for two distinct keywords only with negligible probability. Since $F_{t}$ is a $PRF$, the input to $H_{1}$ repeats only with negligible probability, and if $H_{1}$ is a collision resistant hash function, a $label$ in T repeats only with negligible probability.
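To give a sense of scale for "negligible probability", the standard birthday bound can be applied to the labels in T under the random-oracle view of $H_{1}$ used in the security proof: assuming the inputs $k_{w} ‖ c_{w}$ are distinct and at most q labels are generated, each label is an independent uniform $2 λ$-bit string, and a union bound over all pairs gives the inequality below. The parameters λ = 128 and $q = 2^{40}$ are purely illustrative choices of ours, not values from the paper.

```latex
\Pr\bigl[\exists\, i \neq j : label_i = label_j\bigr]
  \le \binom{q}{2}\, 2^{-2\lambda}
  \le \frac{q^{2}}{2^{2\lambda + 1}},
\qquad
\lambda = 128,\; q = 2^{40}
  \;\Rightarrow\;
  \frac{2^{80}}{2^{257}} = 2^{-177}.
```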

In the Appendix, we provide complete proofs of correctness and $FP - I$-security (see Definition 3.1) of $Π_{FP}$.

4.2. ΠBP: Realizing optimal communication complexity

Bost et al. [[6]] proposed a generic way to obtain a two-roundtrip backward private scheme from a forward private DSSE scheme. Applying this generic transformation to $Π_{FP}$ yields a backward private scheme that is essentially the same as ${Mitra}^{*}$ [[21]]. This backward private scheme is very efficient as it uses only light-weight symmetric primitives. However, the communication complexity of the $Search$ protocol in such a scheme is $O (o_{w}^{'})$, which is not optimal (the optimal being $O (n_{w})$). Further, as $C$ has to process each ciphertext it receives from $S$, this communication overhead also makes the computational complexity at $C$'s end $O (o_{w}^{'})$. To obtain optimal communication complexity, $S$ should send to $C$ only the entries corresponding to the set of documents currently matching w ($DB (w)$). One approach to satisfy this requirement is to associate a tag with each update entry. Using these tags, $S$ can, while performing a search on w, correlate the update queries on w that correspond to the same $ind$. If, as in ${Diana}_{del}$ and $Janus$, the tags are generated deterministically from just $ind$ and w, the scheme leaks $DelHist (w)$ and hence does not satisfy backward privacy in the general setting. Here, we leverage the version ${ver}_{w}$ to generate the tags in a simple yet non-trivial manner that ensures the leakage is bounded by $BPLP$ (see Definition 3.5). This results in the first backward private scheme in the general setting that achieves optimal communication complexity using only light-weight symmetric primitives.

Graph: Fig. 8. Scheme ΠBP.

Graph: Fig. 9. Example 2: indexes at C and S before and after search on w1 in ΠBP. W = client index, D = index that stores tags of search results of previous search query at S and T = index that stores updates after the last search on w at S.

Scheme $Π_{BP}$ is described in Fig. 8. For a keyword w, D stores a pointer to ${PSet}_{w}$, the set of tags corresponding to the document identifiers obtained as the result of the latest search on w. The construction makes use of $PRF$s $F_{t}, F_{d} : \{0, 1\}^{λ} \times \{0, 1\}^{*} \to \{0, 1\}^{λ}$, a $PRP$ $G_{tag} : \{0, 1\}^{λ} \times \{0, 1\}^{2 λ} \to \{0, 1\}^{2 λ}$ and hash functions $H_{1} : \{0, 1\}^{λ} \times \{0, 1\}^{*} \to \{0, 1\}^{2 λ}$ and $H_{2} : \{0, 1\}^{λ} \times \{0, 1\}^{*} \to \{0, 1\}^{μ}$, where $μ = λ + 1$.

For an update query $(op, (ind, w))$, a $tag$ corresponding to $(ind, w)$ is generated using the current version ${ver}_{w}$. Then, $b ‖ tag$ is stored in T at $S$ in the $Update$ protocol (see line 10). For an $add$ (resp. $del$) operation, $b = 0$ (resp. $b = 1$).

Round 1 of the $Search$ protocol is similar to the $Search$ protocol of $Π_{FP}$. At the end of round 1, $S$ sends $TS$, the set of tags corresponding to the document identifiers currently matching w, in order of their insertion. For every $tag$ in $TS$, $C$ applies $G_{tag}^{- 1}$ to recover the document identifier $ind$, adds $ind$ to $AuxSet$, and recomputes the $tag$ using the updated version, storing it in $PSet$. Thus $AuxSet$ consists of all the document identifiers currently matching w, and $PSet$ consists of the updated tags in order of their insertion. $C$ sends $PSet$ and $AuxSet$ to $S$, which stores $PSet$ at $D [{label}_{w}]$ and can use $AuxSet$ to fetch the matching documents. Note that the recomputed tags enable $S$ to consistently handle future searches and updates, as the tags for subsequent updates on w are computed using the same value of ${ver}_{w}$.

Let $U$ be the set of update queries on w issued after the previous search and before the current search on w. The same ${ver}_{w}$ is used, exclusively, to generate the tags for all $q_{u} \in U$ and all $ind \in DB {(w)}^{'}$. Since the tags are computed with the $PRP$ $G_{tag}$ from w, $ind$ and ${ver}_{w}$, $S$ can only learn which tags correspond to the same $ind$ among the update queries in $U$ and $DB {(w)}^{'}$. As the tags in $PSet$ are stored in the insertion order of their identifiers, $S$ can link the update queries in $U$ with these identifiers. This leakage is precisely captured in $BPLP$ via the leakage functions $LP - I$, $LP - II$ and $LP - III$. Moreover, $S$ cannot relate the update queries in $U$ with update queries made before the previous search on w or after the current search on w, as those tags are generated using different values of ${ver}_{w}$.
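The tag mechanism of $Π_{BP}$ can be sketched as follows. Since Fig. 8 is not reproduced here, several details are our assumptions: $G_{tag}$ is instantiated as a 4-round Luby-Rackoff Feistel network over $2 λ$ = 256-bit blocks with HMAC-SHA256 round functions, its key is derived from a hypothetical master key `k_tag` together with w and ${ver}_{w}$, and the server is assumed to have already recovered the (b, tag) pairs from T as in $Π_{FP}$. The sketch covers only the two essential steps: S merging the stored tags with the new updates to obtain $TS$ in insertion order, and C inverting the tags ($AuxSet$) and re-tagging under the next version ($PSet$).

```python
import hashlib
import hmac

HALF = 16  # 2*lambda = 256-bit blocks (lambda = 128), split into two halves


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def _f(key: bytes, rnd: int, half: bytes) -> bytes:
    """Feistel round function built from HMAC-SHA256 (illustrative choice)."""
    return hmac.new(key, bytes([rnd]) + half, hashlib.sha256).digest()[:HALF]


def g_tag(key: bytes, block: bytes) -> bytes:
    """Stand-in PRP on 32-byte blocks: a 4-round Luby-Rackoff Feistel network."""
    left, right = block[:HALF], block[HALF:]
    for rnd in range(4):
        left, right = right, _xor(left, _f(key, rnd, right))
    return left + right


def g_tag_inv(key: bytes, block: bytes) -> bytes:
    """Inverse permutation: undo the rounds in reverse order."""
    left, right = block[:HALF], block[HALF:]
    for rnd in reversed(range(4)):
        left, right = _xor(right, _f(key, rnd, left)), left
    return left + right


def tag_key(k_tag: bytes, w: str, ver: int) -> bytes:
    """Hypothetical per-(w, ver_w) PRP key; Fig. 8 may derive it differently."""
    return hmac.new(k_tag, f"{w}|{ver}".encode(), hashlib.sha256).digest()


def make_tag(k_tag: bytes, w: str, ver: int, ind: int) -> bytes:
    return g_tag(tag_key(k_tag, w, ver), ind.to_bytes(2 * HALF, "big"))


def open_tag(k_tag: bytes, w: str, ver: int, tag: bytes) -> int:
    return int.from_bytes(g_tag_inv(tag_key(k_tag, w, ver), tag), "big")


def server_resolve(pset, new_updates):
    """S, round 1: merge stored tags with (b, tag) pairs in update order; return TS."""
    live = dict.fromkeys(pset)          # ordered set of live tags
    for b, tag in new_updates:
        if b == 0:
            live.setdefault(tag, None)  # add (a re-insertion goes to the end)
        else:
            live.pop(tag, None)         # del cancels the matching tag
    return list(live)


def client_retag(k_tag: bytes, w: str, ver: int, ts):
    """C, round 2: open each tag (AuxSet) and re-tag under ver + 1 (PSet)."""
    aux_set = [open_tag(k_tag, w, ver, t) for t in ts]
    pset = [make_tag(k_tag, w, ver + 1, ind) for ind in aux_set]
    return aux_set, pset
```

Within one version, an add and a later del of the same identifier carry the same tag, which is what lets S cancel entries and return only $n_{w}$ tags; across versions, the tags of the same identifier are unrelated, in line with the linkage discussed above.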

Example 2.

Consider the following list of update queries: $(add, (1, w_{1}))$ , $(add, (1, w_{2}))$ , $(add, (2, w_{1}))$ , $(add, (3, w_{1}))$ , $(add, (3, w_{2}))$ , $(del, (1, w_{1}))$ . Figure 9(a) shows the state of indexes at $C$ and $S$ after these updates are processed. Figure 9(b) shows the state of indexes at $C$ and $S$ after search on $w_{1}$ .

To summarise, $Π_{BP}$ achieves optimal communication complexity and uses symmetric primitives only. D provides reasonable locality as it stores the tags corresponding to previous search results together. $Π_{BP}$ is easily parallelizable and doesn't impose reinsertion restriction.

Further, to obtain a response-hiding scheme, $C$ sends only $PSet$ to $S$ in line 39, in the second round of the $Search$ protocol. The second round of communication can be eliminated using the standard piggybacking technique [[16], [19]], by uploading the updated tag set $PSet$ along with the next search query, thus yielding a single-roundtrip response-hiding backward private $DSSE$ protocol.

Correctness. As noted in the correctness argument for $Π_{FP}$, the scheme is correct as long as there are no repeated labels in maps T and D. Since these labels are generated in the same manner as in $Π_{FP}$, the correctness of $Π_{BP}$ follows immediately from that of $Π_{FP}$.

Asymptotic complexity. The communication complexity of the $Search$ protocol is $O (n_{w})$ and its computational complexity is $O (o_{w}^{'})$. The communication and computational cost of the $Update$ protocol is $O (1)$. The space complexity is $O (N + D^{'})$ at the server, where $D^{'} = \sum_{w} d_{w}^{'}$, and $O (m \log n)$ at the client.

Security. We prove that $Π_{BP}$ is $BPLP$ secure in the random oracle model. The proof relies on the pseudorandomness of $F_{d}$, $F_{t}$ and $G_{tag}$.

Theorem 4.1.

If $F_{d}$, $F_{t}$ are secure $PRF$s, $G_{tag}$ is a secure $PRP$, and $H_{1}, H_{2}$ are hash functions modeled as random oracles outputting $2 λ$ and μ bits, respectively, then $Π_{BP}$ is $BPLP$ secure (Definition 3.5).

Proof.

We structure our proof using a sequence of games $G_{0}$ to $G_{5}$ . $G_{0}$ will compute a distribution identical to ${Real}_{A}^{Π_{BP}} (λ)$ and $G_{5}$ will compute a distribution that can be simulated perfectly given the leakage profile $L$ , i.e., its distribution is identical to ${Ideal}_{A, Sim}^{Π_{BP}} (λ)$ .

Game $G_{0}$ : $G_{0}$ is exactly identical to ${Real}_{A}^{Π_{BP}} (λ)$ .

$$\Pr[\mathrm{Real}_{A}^{\Pi_{BP}}(\lambda) = 1] = \Pr[G_{0} = 1]. \tag{3}$$

Game $G_{1}$: In $G_{1}$, every call to the $PRF$s $F_{t}$ and $F_{d}$ is answered using the tables ${Key}_{t}$ and ${Key}_{d}$, respectively. The entries in ${Key}_{t}$ are indexed by $(w, ver)$ and those in ${Key}_{d}$ by w. By convention, which we follow in the rest of the paper unless stated otherwise, an entry is chosen uniformly at random when it is accessed for the first time and reused thereafter. If there exists an adversary $A$ that is able to distinguish between $G_{0}$ and $G_{1}$, we can construct an adversary $B_{1}$ that distinguishes $F_{t}$ from a truly random function and/or an adversary $B_{2}$ that distinguishes $F_{d}$ from a truly random function. Formally, there exist adversaries $B_{1}$ and $B_{2}$ such that

$$\begin{aligned} |\Pr[G_{0} = 1] - \Pr[G_{1} = 1]| &\le \mathrm{Adv}_{F_{t}, B_{1}}^{PRF}(\lambda) + \mathrm{Adv}_{F_{d}, B_{2}}^{PRF}(\lambda) \\ &\le \mathrm{neg}(\lambda). \end{aligned} \tag{4}$$

Game $G_{2}$: In $G_{2}$, every call to the $PRP$ $G_{tag}$ is answered using the table ${Key}_{tag}$, whose entries are indexed by $(w, ind, ver)$. If a randomly generated $tag$ has already been selected earlier in ${Key}_{tag}$, $G_{2}$ aborts. Since the number q of queries to $G_{tag}$ is polynomial in the security parameter, the birthday bound implies that the probability that any two tags are equal is at most $\frac{q^{2}}{2^{2 λ}}$, i.e., $neg (λ)$. Therefore, $G_{2}$ aborts with negligible probability.

Now, if there exists an adversary $A$ that is able to distinguish between games $G_{1}$ and $G_{2}$ , we can construct an adversary $B$ that can distinguish $G_{tag}$ from a truly random permutation. Formally, there exists an adversary $B$ , such that

$$\begin{aligned} |\Pr[G_{1} = 1] - \Pr[G_{2} = 1]| &\le \mathrm{Adv}_{G_{tag}, B}^{PRP}(\lambda) + \frac{q^{2}}{2^{2\lambda}} \\ &\le \mathrm{neg}(\lambda). \end{aligned} \tag{5}$$
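To give a sense of scale for the term $\frac{q^{2}}{2^{2 λ}}$ in (5), the snippet below evaluates the exact pairwise union bound $\binom{q}{2} / 2^{2 λ}$ with rational arithmetic. The values $λ = 128$ and $q = 2^{40}$ are illustrative assumptions, not parameters taken from the paper.

```python
from fractions import Fraction


def collision_bound(q: int, out_bits: int) -> Fraction:
    """Union bound on the probability that some pair among q uniformly random
    out_bits-bit strings collides: C(q, 2) / 2^out_bits."""
    return Fraction(q * (q - 1) // 2, 2 ** out_bits)


LAMBDA = 128  # illustrative security parameter (assumption)
q = 2 ** 40   # illustrative number of G_tag queries (assumption)
exact = collision_bound(q, 2 * LAMBDA)
loose = Fraction(q * q, 2 ** (2 * LAMBDA))  # the q^2 / 2^(2*lambda) term in (5)
print(exact <= loose)                       # True
print(loose == Fraction(1, 2 ** 176))       # True, since 2^80 / 2^256 = 2^-176
```

With these numbers the bound is below $2^{-176}$; more generally it remains negligible for any q polynomial in λ.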

Game $G_{3}$ : In $G_{3}$ , instead of calling $H_{1}$ to generate $label$ in the $Update$ protocol, we pick random strings. Then, during the $Search$ protocol, the random oracle $H_{1}$ is programmed accordingly, to ensure consistency.

Tables ${Hash}_{1}$ and $H_{1}$ are used to simulate the random oracle $H_{1}$; the entries in ${Hash}_{1}$ are indexed by $(w, ver, c)$ and those in $H_{1}$ by $(k, c)$.

Fig. 10. Games $G_{3}$ and $G_{3}^{'}$ (Theorem 4.1). $G_{3}^{'}$ includes the boxed code and $G_{3}$ does not.

Figure 10 formally describes $G_{3}$ and an intermediate game $G_{3}^{'}$. In $G_{3}^{'}$, $H_{1}$ is never programmed to two different values for the same input, which ensures consistency. Before storing the randomly picked value in ${Hash}_{1}$ at position $(w, {ver}_{w}, c_{w})$, one first checks whether $H_{1}$ is already programmed at input $(k_{w}, c_{w})$, which happens if the random oracle $H_{1}$ was already queried on $(k_{w} ‖ c_{w})$. If so, the value $H_{1} (k_{w} ‖ c_{w})$ is stored in ${Hash}_{1} [w, {ver}_{w}, c_{w}]$; otherwise, the randomly picked value is stored there. Whenever the random oracle is needed, either in line 9 of the $Search$ protocol or for an adversary's query to $H_{1}$ in line 5, it is programmed lazily in $G_{3}$, so that its outputs are consistent throughout.

The only difference between $G_{2}$ and $G_{3}^{'}$ is how the random oracle $H_{1}$ is modeled. The outputs of $H_{1}$ are perfectly indistinguishable in the two games; therefore,

$$\Pr[G_{3}^{'} = 1] = \Pr[G_{2} = 1]. \tag{6}$$

Let us denote by $E_{1}$ the event "the flag $bad$ is set to $true$ in $G_{3}^{'}$". The games $G_{3}^{'}$ and $G_{3}$ are identical unless $E_{1}$ occurs, so we can apply the identical-until-bad technique [[2]] to bound the distinguishing advantage between them:

$$|\Pr[G_{3}^{'} = 1] - \Pr[G_{3} = 1]| \le \Pr[E_{1}]. \tag{7}$$

The event $E_{1}$ can occur in line 8 of the $Update$ protocol and in line 5 of the $H_{1}$ algorithm (see Fig. 10). The former corresponds to the adversary having already queried the random oracle $H_{1}$ at input $(k_{w} ‖ c_{w})$, and the latter to the adversary querying $H_{1}$ on a valid input $(k ‖ c)$. Since $k_{w}$ is picked uniformly at random and the adversary can do no better than guess the value of $k$, $E_{1}$ occurs with negligible probability. Using (6) and (7), we conclude:

$$|\Pr[G_{2} = 1] - \Pr[G_{3} = 1]| \le \mathrm{neg}(\lambda). \tag{8}$$
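The lazy sampling and the identical-until-bad argument for $G_{3}$ and $G_{3}^{'}$ can be mimicked with a programmable random-oracle table. In this sketch, class and method names are hypothetical and oracle inputs are simplified to pairs $(k, c)$; `boxed=True` plays the role of $G_{3}^{'}$, which stays consistent, and `boxed=False` plays $G_{3}$. The two variants take different code paths only after `bad` has been set.

```python
import os


class LazyH1:
    """Lazily sampled, programmable random oracle H1 on inputs (k, c).
    boxed=True mimics G3' (boxed code included); boxed=False mimics G3."""

    def __init__(self, boxed: bool, out_len: int = 32):
        self.boxed = boxed
        self.out_len = out_len
        self.table = {}    # (k, c) -> value H1 is already committed to
        self.pending = {}  # (k, c) -> label handed out by Update, not yet programmed
        self.bad = False

    def query(self, k: bytes, c: int) -> bytes:
        """A call to H1, e.g. from the adversary."""
        if (k, c) not in self.table:
            if (k, c) in self.pending:  # hit an input whose label is still hidden
                self.bad = True
                if self.boxed:          # G3': answer consistently with Update
                    self.table[(k, c)] = self.pending.pop((k, c))
                    return self.table[(k, c)]
            self.table[(k, c)] = os.urandom(self.out_len)
        return self.table[(k, c)]

    def update_label(self, k: bytes, c: int) -> bytes:
        """Update picks a random label instead of evaluating H1(k || c)."""
        label = os.urandom(self.out_len)
        if (k, c) in self.table:        # H1 was already queried at (k, c)
            self.bad = True
            if self.boxed:
                return self.table[(k, c)]
        self.pending[(k, c)] = label
        return label

    def program(self, k: bytes, c: int) -> None:
        """Search reveals k: commit H1(k || c) to the label handed out earlier."""
        if (k, c) in self.pending:
            self.table[(k, c)] = self.pending.pop((k, c))
```

In the normal flow (label handed out in $Update$, oracle programmed in $Search$, then queried) `bad` stays false and both variants answer identically; a query that precedes the programming sets `bad`, and only then does $G_{3}$ risk assigning two values to the same input.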

Game $G_{4}$: In $G_{4}$, we do for $H_{2}$ what we did for $H_{1}$ in $G_{3}$. By the same arguments, we conclude:

$$|\Pr[G_{3} = 1] - \Pr[G_{4} = 1]| \le \mathrm{neg}(\lambda). \tag{9}$$

Fig. 11. Game $G_{5}$ (Theorem 4.1).

Game $G_{5}$: In $G_{5}$ (see Fig. 11), we abstract out the information that the simulator needs in order to output transcripts identical to those of $G_{4}$. Using the ${GetData}_{r 1}$ and ${GetData}_{r 2}$ algorithms, $G_{5}$ keeps track of the randomly generated $tag$, $label$ and $pad$ differently from $G_{4}$. In the $Search$ protocol, the random oracles are programmed exactly as in $G_{4}$, and queries to the random oracles $H_{1}$ and $H_{2}$ can be simulated by outputting random values.

Since the $Update$ protocol outputs fresh random strings, its transcripts are identical to those of the $Update$ protocol in $G_{4}$.

Next, we describe the $Search$ protocol in $G_{5}$. Based on the tables $Update$ and $STs$ (search timestamps) and the map $Tag$, the ${GetData}_{r 1}$ algorithm determines the values of the following components: ${empty}_{w}$, ${ver}_{w}$, $c_{w}$ and $\{{tag}_{c}, H_{1, w, c}, H_{2, w, c}\}_{0 ⩽ c ⩽ c_{w}}$. The flag $empty$ is set to 1 if $Update [w]$ is empty. ${ver}_{w}$ counts the searches on w for which there was an update on map T for keyword w after the preceding search; the loop in line 6 of ${GetData}_{r 1}$ determines the number of times the version number is updated, i.e., the value of ${ver}_{w}$. Here, $c_{w}$ denotes the number of updates on map T for keyword w since the last search. Since $tag$ is picked from the map $Tag$, equal indices yield equal tags, which ensures consistency. The values $\{H_{1, w, c}, H_{2, w, c}\}_{0 ⩽ c ⩽ c_{w}}$ are used to simulate the random oracles consistently with the responses given at the time of the update queries. The loop in line 10 of ${GetData}_{r 1}$ computes $c_{w}$ and $\{{tag}_{c}, H_{1, w, c}, H_{2, w, c}\}_{0 ⩽ c ⩽ c_{w}}$.

The ${GetData}_{r 2}$ algorithm outputs $LDB (w)$ and new tags for the document identifiers currently present in $DB (w)$, ordered by the insertion order of the documents they correspond to. As all components are computed correctly and consistently with respect to all previous queries, the transcripts of the $Search$ protocol are identical to those of the $Search$ protocol in $G_{4}$. Therefore, we conclude that:

$$\Pr[G_{5} = 1] = \Pr[G_{4} = 1]. \tag{10}$$

Simulator $Sim$: Finally, we construct a simulator that, given the leakage profile $L$, simulates $G_{5}$ correctly. $Sim$ simulates the $Update$ protocol exactly as in $G_{5}$. In simulating the $Search$ protocol (lines 6 and 8 of Fig. 11), $Sim$ uses, instead of the keyword w, the counter $\overline{w} = \min sp (w)$, which is uniquely determined by w through $L_{Search}$. In line 2 of the $Search$ protocol, $Sim$ passes $sp (w)$, ${Updates}^{op} (w)$, $LP - I (w)$ and $LP - III (w)$ to the ${GetData}_{r 1}$ algorithm instead of $STs$, $Update$ and $Tag$. ${GetData}_{r 1}$ uses the timestamps of search and update queries and the indices to generate $tag$; however, the indices serve only to identify when the same tag must be generated, and this can be ensured using $LP - I (w)$ and $LP - III (w)$.

Likewise, the output of ${GetData}_{r 2}$ consists of $LDB (w)$, i.e., $AuxSet$, and an ordered list $PSet$ of freshly generated random tags, which can easily be simulated using $LP - II (w)$ and $LDB (w)$. In ${GetData}_{r 2}$, the indices are used only to order the generated tags, and this order is available from the components $LP - II (w)$ and $LDB (w)$ of the leakage profile. Thus, $Sim$ produces transcripts of the $Search$ and $Update$ protocols identical to those of $G_{5}$. Hence, we conclude that:

$$\Pr[\mathrm{Ideal}_{A, Sim}^{\Pi_{BP}}(\lambda) = 1] = \Pr[G_{5} = 1]. \tag{11}$$

Chaining (3) through (11), we conclude

$$|\Pr[\mathrm{Real}_{A}^{\Pi_{BP}}(\lambda) = 1] - \Pr[\mathrm{Ideal}_{A, Sim}^{\Pi_{BP}}(\lambda) = 1]| \le \mathrm{neg}(\lambda). \tag{12}$$ □
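The counter $\overline{w} = \min sp (w)$ that $Sim$ uses in place of a keyword can be illustrated with a small query log. In this hypothetical sketch the keyword is consulted only to compute the search pattern, playing the role of the leakage function $L_{Search}$; the simulator itself works only with the returned timestamps. Because timestamps are unique and the first search on a keyword never changes afterwards, $\overline{w}$ is a stable pseudonym that distinct keywords never share.

```python
def search_pattern(log, now):
    """sp(w) for every keyword searched up to time `now`, from (timestamp, op, w) triples."""
    sp = {}
    for t, op, w in log:
        if op == "search" and t <= now:
            sp.setdefault(w, []).append(t)
    return sp


def simulated_keyword(log, now, w):
    # Counter used by Sim in place of w: the timestamp of the first search on w.
    return min(search_pattern(log, now)[w])


log = [(1, "search", "a"), (2, "update", "a"), (3, "search", "b"), (4, "search", "a")]
print(simulated_keyword(log, 1, "a"), simulated_keyword(log, 4, "a"))  # 1 1
print(simulated_keyword(log, 3, "b"))                                   # 3
```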

4.3. $Π_{WBP}$: A weak backward private variant

Fig. 12. Scheme $Π_{WBP}$.

As mentioned in Section 3.3, there are various use cases, such as storing a collection of text files, in which the reinsertion restriction may not be a serious concern. Here, we propose $Π_{WBP}$, a simple one-round-trip, response-hiding backward private scheme for the reinsertion-restricted setting. $Π_{WBP}$ is essentially a simple modification of $Π_{BP}$ and inherits all its salient features. $Π_{WBP}$ (see Fig. 12) improves the concrete efficiency of the $Search$ protocol of $Π_{BP}$ by eliminating the computation and transmission of newly generated tags in the second round of communication. This shows that an efficient one-round-trip $WBP$ scheme with optimal communication complexity can be constructed from simple primitives only.

In $Π_{BP}$, we use ${ver}_{w}$ in the computation of tags to handle reinsertions securely. The tags of fresh updates after a search are computed under the updated ${ver}_{w}$, so they cannot be related to the entries deleted before this search; this ensures backward privacy when reinsertion is allowed. In the reinsertion-restricted setting, however, an add query is not allowed after a delete query for the same document-keyword pair, so ${ver}_{w}$ is not required in the computation of tags. Accordingly, where line 8 of the $Update$ protocol of $Π_{BP}$ (see Fig. 8) computes $tag$ with ${ver}_{w}$ as an additional input, line 8 of the $Update$ protocol of $Π_{WBP}$ (see Fig. 12) computes $tag \leftarrow G_{tag} (k_{g}, w ‖ ind)$.

The $Search$ protocol of $Π_{WBP}$ is identical to Round 1 of the $Search$ protocol of $Π_{BP}$, except that at line 31, where $S$ sends $TS$ to $C$, $S$ also stores $TS$ at $D [{label}_{w}]$. From $TS$, $C$ retrieves the search results in the same way as in $Π_{BP}$. In $Π_{BP}$, tags must be recomputed during $Search$ because ${ver}_{w}$ is updated; in $Π_{WBP}$, the tags are independent of ${ver}_{w}$, so no recomputation, and hence no second round of communication, is needed.
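Why $Π_{WBP}$ needs no second round can be seen by comparing the two tag functions. In the sketch below, HMAC-SHA256 stands in for $G_{tag}$ (only tag equality matters), the version is assumed to advance from 0 to 1 at the search, and the function names are hypothetical. A later $del$ in $Π_{WBP}$ produces exactly the tag that $S$ already stored in $D [{label}_{w}]$, whereas in $Π_{BP}$ it matches only the re-tagged entries that $C$ uploads in $PSet$.

```python
import hashlib
import hmac


def tag_bp(key: bytes, w: str, ind: int, ver: int) -> bytes:
    # Pi_BP-style tag: the keyword version ver_w is part of the input.
    return hmac.new(key, f"{w}|{ind}|{ver}".encode(), hashlib.sha256).digest()


def tag_wbp(key: bytes, w: str, ind: int) -> bytes:
    # Pi_WBP-style tag: no version, so the tag of (w, ind) never changes.
    return hmac.new(key, f"{w}|{ind}".encode(), hashlib.sha256).digest()


key = bytes(32)  # hypothetical tag key
ids = [2, 3]     # identifiers matching w1 at search time

# Tags in TS at search time (version 0 before the search).
stored_bp = {tag_bp(key, "w1", i, 0) for i in ids}
stored_wbp = {tag_wbp(key, "w1", i) for i in ids}
retagged_bp = {tag_bp(key, "w1", i, 1) for i in ids}  # what PSet uploads in Pi_BP

# A later del on ind 2; in Pi_BP it is tagged under the new version 1.
print(tag_bp(key, "w1", 2, 1) in stored_bp)    # False: the old TS is no longer usable
print(tag_bp(key, "w1", 2, 1) in retagged_bp)  # True: hence the second round in Pi_BP
print(tag_wbp(key, "w1", 2) in stored_wbp)     # True: the stored TS stays valid
```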

The changes made to $Π_{BP}$ to obtain $Π_{WBP}$ induce additional leakage, which can be shown to be bounded by the leakage profile of $WBP$ (Definition 3.4). The proof of Theorem 4.1 can easily be adapted to argue the security of $Π_{WBP}$.

The crucial observation from our constructions is that efficient backward private schemes with optimal communication complexity can be realized without involving complex cryptographic primitives, as was the case in [[6], [41]]. The simplicity of design in our backward private constructions is an appealing feature, particularly from the implementation perspective.

4.4. Comparative analysis

Table 1 compares our schemes with some prior and concurrent works [[6], [21], [41]]. Along the same lines, we conclude this section with a comparative analysis of the currently available BP-secure $DSSE$ schemes, with the goal of identifying the scenarios in which each construction fits best. Consider first the scenario where the requirement is minimal information leakage. The candidate constructions are $Orion$ [[21]] and $Moneta$ [[6]], as they satisfy the strong notion of backward privacy ($BPIP$). As the search time of $Orion$ is quasi-optimal in $n_{w}$ (linear in $n_{w}$ up to a logarithmic factor), it may appear more suitable than $Moneta$ in such scenarios. However, $Orion$ uses Path-ORAM [[40]] as a building block, which may make it impractical for very large databases and limits its applicability in such scenarios [[33]]. The constructions $Mitra$ [[21]] and $Fides$ [[6]] satisfy the next level of backward privacy ($BPUP$). $Mitra$ uses symmetric primitives only and is thus very efficient in practice, but it still incurs a significant communication overhead.

To overcome these limitations, one may trade a little security for performance while still preserving forward and backward privacy. The constructions ${Diana}_{del}$ [[6]], $Janus$ [[6]], $Janus{+}{+}$ [[41]], $Horus$ [[21]] and $Π_{WBP}$ (Section 4.3) satisfy the notion of weak backward privacy ($WBP$). The communication and computation complexity of the search and update protocols of ${Diana}_{del}$ is not optimal (see Table 1). To reduce the communication overhead of the $Search$ protocol, the single-round-trip $Janus$ framework [[6]] was proposed; it was instantiated using asymmetric and symmetric puncturable encryption in $Janus$ [[6]] and $Janus{+}{+}$ [[41]], respectively. However, the computational complexity of the search protocol in the $Janus$ framework is $O (n_{w} \cdot d_{w})$, which is unreasonably high ($n_{w} \cdot d_{w} ≫ o_{w}^{'}$). $Horus$, a modified version of $Orion$, was proposed to reduce the number of round trips in the $Search$ protocol from $O (\log N)$ to $O (\log d_{w})$, but it suffers from the same scalability issue as $Orion$. In contrast, $Π_{WBP}$ is a single-round-trip $DSSE$ scheme that achieves optimal communication complexity, uses symmetric primitives only and is very efficient in practice. However, the computational complexity of its $Search$ protocol is not quasi-optimal in $n_{w}$. Moreover, the notion of weak backward privacy is applicable only in scenarios where the reinsertion of keyword-document pairs is not allowed.

$BPLP$ (Definition 3.5) captures the intuitive notion of backward privacy while allowing the reinsertion of keyword-document pairs. The corresponding construction, $Π_{BP}$ (Section 4.2), is the first $DSSE$ scheme satisfying backward privacy in the general setting that achieves optimal communication complexity, uses symmetric primitives only and is very efficient in practice. Its only limitation is that the search time is not quasi-optimal in $n_{w}$, which seems to be the price of achieving optimal communication complexity in the $Update$ and $Search$ protocols.

Remark.

As the size of $EDB$ in $Mitra$ [[21]] grows with every update, including deletions, the authors employed a periodic "clean-up" operation [[4], [6], [16], [39]], in which $C$ removes the deleted entries after a search, re-encrypts the remaining ones and sends them back to $S$. The resulting scheme is called ${Mitra}^{*}$ [[21]]. If there are no deletions, then for the first search on a keyword w the actual computation cost of ${Mitra}^{*}$ is lower than that of $Π_{BP}$, since ${Mitra}^{*}$ involves only $PRF$ evaluations, which can be realized with the very fast AES block cipher, whereas $Π_{BP}$ also involves hash function evaluations. However, for subsequent searches on the same keyword w, the number of hash function evaluations in $Π_{BP}$ drops drastically owing to map D. Moreover, the communication cost of ${Mitra}^{*}$ is higher than that of $Π_{BP}$ even when there are no deletions, because in ${Mitra}^{*}$, $C$ sends labels to $S$ to identify the respective dictionary entries in addition to the re-encrypted (label, value) pairs.

5. Implementation results

In this section, we discuss the performance of $Π_{FP}$, $Π_{BP}$ and $Π_{WBP}$. As $Π_{FP}$ is currently the most efficient forward private scheme in the literature, it serves as a benchmark for evaluating the performance of other $DSSE$ schemes. We also compare the proposed schemes with the most practically promising backward private schemes in the literature, viz., ${Mitra}^{*}$ [[21]] and ${Diana}_{del}$ [[6]]. For ${Mitra}^{*}$ and ${Diana}_{del}$, we used the code available in [[11]]. The results for $Π_{BP}$ give a fair indication of the performance of $Π_{WBP}$, as both have the same asymptotic computational complexity and use the same lightweight symmetric primitives. Hence, we report the performance of $Π_{WBP}$ only for selected parameters, which give a fair idea of how it compares with $Π_{BP}$ and $Π_{FP}$.

We implemented the schemes in C++. We use AES for the pseudorandom functions $F_{d}$, $F_{t}$ and the pseudorandom permutation $G_{tag}$, and SHA-256 for the hash functions $H_{1}$ and $H_{2}$, both as provided by the OpenSSL library [[42]]. Maps T and W are stored using RocksDB [[18]].

All experiments were performed on a desktop computer with an Intel Core i5-4460 3.20 GHz CPU and 8 GB RAM running Ubuntu 16.04 LTS. Our code runs as a single program, as we are interested in measuring the performance of the $Search$ and $Update$ protocols of our constructions.

We used the Enron email dataset [[15]] to create the EDB on which we perform search and update operations. We wrote a Python script that extracts keywords from the emails in the dataset using the NLTK library [[36]]. Our dataset contains 517,401 documents, 212,020 keywords and 36,688,028 document-keyword pairs.

Table 3 $EDB$ creation

Implementation	Time (s)	Time per pair* (μs)	Strg. at $S$ (GB)	Strg. at $C$ (MB)
$Π_{FP}$	164.66	4.49	2.8	2.4
$Π_{WBP}$	179.04	4.88	2.8	2.4
$Π_{BP}$	182.90	4.98	2.8	2.4
${Mitra}^{*}$	223.59	6.09	1.3	2.4
${Diana}_{del}$	327.03	8.91	2.1	4.1

* – document-keyword pair, Strg. = Storage.

EDB creation. $EDB$ was created to store all the document-keyword pairs extracted from the Enron email dataset. The computation and I/O required for $EDB$ creation are parallelized using a thread pool. Table 3 shows, for $Π_{FP}$, $Π_{WBP}$, $Π_{BP}$, ${Mitra}^{*}$ and ${Diana}_{del}$, the time taken to create $EDB$, the time taken to process each document-keyword pair, and the sizes of $EDB$ and W immediately after $EDB$ creation.

For each entry in Table 3, we ran the experiment 10 times and report the average. The time taken to create $EDB$ for $Π_{WBP}$ and $Π_{BP}$ is only 8.6% and 10.9% more, respectively, than for $Π_{FP}$. Compared with ${Mitra}^{*}$, currently the most efficient backward private scheme, $Π_{WBP}$ and $Π_{BP}$ reduce the $EDB$ creation time by 19.9% and 18.2%, respectively. The processing time per document-keyword pair is very low for all the schemes, as only symmetric primitives are used in these constructions. For $Π_{WBP}$ and $Π_{BP}$, the size of $EDB$ is the same as in $Π_{FP}$, whereas the $EDB$ of ${Mitra}^{*}$ and ${Diana}_{del}$ is smaller. This is because our implementation computes the labels with SHA-256 (256 bits), whereas the implementations of ${Mitra}^{*}$ and ${Diana}_{del}$ in [[11]] use AES (128 bits) for the same purpose. Our choice follows [[4], [30]], which suggest that, for security parameter λ, the label length μ must be at least $λ + 2 \log (N)$ to ensure correctness of the scheme. The storage required at $C$ is very small for all the schemes, as $C$'s index only keeps track of the counter values for each keyword.
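Two of the quantities in the paragraph above can be recomputed directly. The snippet evaluates the label-length requirement $μ ⩾ λ + 2 \log N$ for the Enron index and the relative reduction in $EDB$ creation time with respect to ${Mitra}^{*}$ from Table 3. The value $λ = 128$ is an assumption (the security parameter used in the experiments is not stated here), and N is assumed to be the number of document-keyword pairs; the function names are hypothetical.

```python
import math


def min_label_bits(security_param: int, n: int) -> float:
    # Label-length requirement mu >= lambda + 2 * log2(N) from the cited works.
    return security_param + 2 * math.log2(n)


def time_reduction_pct(ours: float, baseline: float) -> float:
    # Relative reduction of EDB creation time with respect to a baseline scheme.
    return 100 * (baseline - ours) / baseline


N = 36_688_028  # document-keyword pairs in the dataset (assumed to be N)
LAMBDA = 128    # assumed security parameter
need = min_label_bits(LAMBDA, N)
print(f"mu must be at least {need:.1f} bits")
print("128-bit labels suffice:", 128 >= need)        # False
print("256-bit labels suffice:", 256 >= need)        # True
print(f"{time_reduction_pct(179.04, 223.59):.1f}%")  # Pi_WBP vs Mitra*: 19.9%
print(f"{time_reduction_pct(182.90, 223.59):.1f}%")  # Pi_BP vs Mitra*: 18.2%
```

Under these assumptions the bound is roughly 178 bits, so a 128-bit AES output would fall short while a 256-bit SHA-256 output satisfies it, which is consistent with the label choice described above.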

Since $EDB$ is created by invoking the $Update$ protocol of the respective construction, the creation time also gives a fair indication of the performance of the $Update$ protocol. Hence, we do not compare the $Update$ protocol separately.

Fig. 13. Average per-entry search time (single-threaded).

EDB search. To evaluate search performance, we searched for every keyword extracted from the Enron email dataset immediately after $EDB$ creation and measured the total time taken by the $Search$ protocol. The purpose of this experiment was to evaluate search performance independently of the improvements gained through locality and through the efficient handling of delete queries in our schemes. We first compare our proposed schemes with ${Mitra}^{*}$ and ${Diana}_{del}$. Figure 13 shows the search time per matched entry ($stpme$) as a function of the number of documents returned, for single-threaded instances of $Π_{FP}$, $Π_{BP}$, $Π_{WBP}$ and ${Mitra}^{*}$. In Figs 13 and 14,

$$RS(i) = \begin{cases} |DB(w)| = 1 & \text{if } i = 0, \\ 2^{i-1} < |DB(w)| \le 2^{i} & \text{if } 1 \le i \le 18, \end{cases}$$

denotes the discretization of the result set size.
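The bucket index $i$ of $RS(i)$ can be computed from a result-set size with integer bit operations, which avoids floating-point logarithms at the power-of-two boundaries. The function name below is hypothetical.

```python
def rs_bucket(size: int) -> int:
    """Index i such that a result set of the given size falls into RS(i)."""
    if size < 1:
        raise ValueError("RS(i) is defined for non-empty result sets")
    # i = 0 when |DB(w)| = 1; otherwise the unique i with 2^(i-1) < |DB(w)| <= 2^i,
    # which equals the bit length of |DB(w)| - 1.
    return 0 if size == 1 else (size - 1).bit_length()


print([rs_bucket(n) for n in (1, 2, 3, 4, 5, 131072, 131073, 262144)])
# [0, 1, 2, 2, 3, 17, 18, 18]
```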

Fig. 14. Average per-entry search time (four-threaded).

Fig. 15. Average per-entry search time as a function of the number of threads.

The performance of ${Mitra}^{*}$ is better than that of $Π_{BP}$ for RS(0) and RS(1), mainly because of the one-time cost of accessing map D, which is amortized over searches matching a large number of documents. From RS(2) onward, our schemes $Π_{WBP}$ and $Π_{BP}$ consistently outperform ${Mitra}^{*}$, achieving about a 2× improvement in most cases. Moreover, as the figure shows, $Π_{WBP}$ and $Π_{BP}$ do not incur a significant overhead over $Π_{FP}$. For instance, the $stpme$ of $Π_{FP}$, $Π_{BP}$, $Π_{WBP}$ and ${Mitra}^{*}$ is 6.81, 9.39, 8.16 and 19.20 μs, respectively, for queries whose result set size lies in the interval $(2^{17}, 2^{18}]$. ${Diana}_{del}$ is orders of magnitude slower than the other schemes and is therefore not included in Fig. 13; for example, its $stpme$ is 599 μs and 294 μs for queries whose result set size lies in RS(0) and RS(18), respectively. The improvement in computation cost is thus quite notable. However, the main performance benefit comes from the substantial reduction in communication cost, which is discussed later in this section.

Next, we examine the benefit of parallelism for the $Search$ protocol of our constructions. The structure of $Π_{BP}$ is similar to that of ${Mitra}^{*}$ and $Π_{WBP}$. Therefore, in a multi-threaded environment $Π_{BP}$ retains the same advantage over ${Mitra}^{*}$ (resp. disadvantage relative to $Π_{WBP}$) as in the single-threaded implementation. Figure 14 shows the $stpme$ as a function of the number of documents returned, for multi-threaded (four-threaded) instances of $Π_{FP}$ and $Π_{BP}$. For searches matching few documents, the cost is high because of one-time computations, such as storage access when computing the token at $C$, thread creation and access to map D, which are amortized over searches matching a large number of documents. Comparing Figs 13 and 14 shows that the multi-threaded instance is slower than the single-threaded one for RS(0) and RS(1), owing to the overhead of creating multiple threads. In a multi-threaded environment, the performance of $Π_{BP}$ is very close to that of $Π_{FP}$: the $stpme$ of $Π_{FP}$ (resp. $Π_{BP}$) was 3.6 (resp. 4.89) μs for queries whose result set size lies in the interval $(2^{17}, 2^{18}]$. Figure 15 shows that the $stpme$ of both schemes depends on the number of threads used for the search; both performed best when the number of threads was close to the number of processor cores, i.e., 4.

EDB dynamic environment. In this experiment, we study the performance of search queries in a dynamic environment, i.e., one in which search queries are interspersed with update queries. For this purpose, we identified the set $S_{80k}$ of keywords that each match more than 80k documents in the Enron dataset. An initial $EDB$ was constructed from the extracted document-keyword pairs other than those of keywords in $S_{80k}$; the pairs corresponding to keywords in $S_{80k}$ are then used to issue update queries dynamically. Figure 16 shows the $stpme$ as a function of the probability $p$ of a search query on a keyword in $S_{80k}$. Update queries thus occur with probability $1 - p$, and each update is a $del$ query with probability 0.1. The reported times include the cost of processing $del$ operations during search. Figure 16 illustrates that the average $stpme$ decreases when search queries are frequent, as all these schemes exploit the locality introduced by D. The average $stpme$ of $Π_{BP}$ (resp. $Π_{WBP}$) turns out to be 1.53 μs (resp. 0.58 μs) when the probability of a search query is 0.0005, and $Π_{WBP}$ and $Π_{FP}$ perform neck and neck on this metric. $Π_{FP}$, $Π_{WBP}$ and $Π_{BP}$ all owe this significant performance boost to the locality improvement obtained through map D.
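The workload in this experiment can be sketched as a seeded random sequence of operations. The sketch below (function and parameter names are hypothetical) only decides the type of each operation and its keyword from $S_{80k}$; choosing which withheld document-keyword pair an $add$ or $del$ refers to is omitted.

```python
import random


def dynamic_workload(keywords, num_ops, p_search, p_del=0.1, seed=0):
    """Sequence of (op, keyword) pairs: each operation is a search with probability
    p_search; otherwise it is an update, which is a del with probability p_del."""
    rng = random.Random(seed)
    ops = []
    for _ in range(num_ops):
        w = rng.choice(keywords)
        if rng.random() < p_search:
            ops.append(("search", w))
        else:
            ops.append(("del" if rng.random() < p_del else "add", w))
    return ops


ops = dynamic_workload(["w_a", "w_b"], 10_000, p_search=0.0005)  # hypothetical keywords
print(sum(op == "search" for op, _ in ops), "searches among", len(ops), "operations")
```

Fixing the seed makes a run reproducible, so different schemes can be measured on exactly the same interleaving of searches and updates.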

The prototype implementations of $Π_{FP}$, $Π_{BP}$ and $Π_{WBP}$ indicate that the cost of achieving backward privacy over and above forward privacy is small. They also indicate that $Π_{WBP}$ and $Π_{BP}$ have an appreciable edge in computation cost over existing backward private constructions, which makes them suitable candidates for practical applications.

Fig. 16. Average per-entry search time (dynamic environment).

Communication cost. As all our constructions use symmetric primitives only, communication becomes the main performance bottleneck. We now compare the communication cost of ${Mitra}^{*}$, $Π_{BP}$ and $Π_{WBP}$ as a function of the nature of the update queries. Figure 17 depicts the communication cost of the three schemes as a function of the number of documents returned in the search results and the probability $d$ that an update query is a $del$ query. For $Π_{BP}$ and $Π_{WBP}$, the communication cost depends only on the result set size and is independent of the nature of the update queries, i.e., of $d$; it is therefore the same for every distribution of update queries. This is not the case for ${Mitra}^{*}$, whose communication cost grows with the fraction of delete queries, as Fig. 17 shows. Concretely, $Π_{WBP}$ has the lowest communication overhead of the three backward private constructions in practice. For instance, for a query whose result set contains 262,144 documents, the communication cost of $Π_{WBP}$, $Π_{BP}$ and ${Mitra}^{*}$ is 8.4 MB, 18.9 MB and 29.4 MB, respectively, and that of ${Mitra}^{*}$ rises to 92.3 MB when the probability of a delete query is 0.4. Hence, the communication cost of our constructions compares favourably with that of the most efficient construction prior to this work.
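A back-of-the-envelope estimate shows why the cost of $Π_{BP}$ and $Π_{WBP}$ depends only on $n_{w}$. The sketch assumes 32-byte tags (2λ bits with λ = 128) and 8-byte document identifiers in $AuxSet$, ignores constant-size tokens and document contents, and uses a hypothetical function name. Under these assumptions, a result set of 262,144 documents costs about 8.4 MB in $Π_{WBP}$ ($TS$ only) and 18.9 MB in $Π_{BP}$ ($TS$, then $PSet$ and $AuxSet$), and neither figure involves the fraction d of deletions.

```python
TAG_BYTES = 32  # a 2*lambda-bit tag with lambda = 128 (assumption)
IND_BYTES = 8   # size of a document identifier in AuxSet (assumption)


def search_payload_bytes(scheme: str, n_w: int) -> int:
    """Bytes exchanged per search that scale with the n_w matching entries
    (constant-size tokens and document contents ignored)."""
    if scheme == "WBP":  # S -> C: TS
        return n_w * TAG_BYTES
    if scheme == "BP":   # S -> C: TS, then C -> S: PSet and AuxSet
        return n_w * (2 * TAG_BYTES + IND_BYTES)
    raise ValueError(f"unknown scheme {scheme!r}")


for scheme in ("WBP", "BP"):
    print(scheme, round(search_payload_bytes(scheme, 262_144) / 1e6, 1), "MB")
# WBP 8.4 MB
# BP 18.9 MB
```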

Fig. 17. Communication cost: number of matching documents vs. probability of delete queries.

6. Conclusion

The main contribution of this paper is two efficient backward private $DSSE$ schemes, viz., $Π_{BP}$ and $Π_{WBP}$. We began by revisiting the existing definitions of backward privacy and proposed an alternative formulation of leakage for backward privacy, $BPLP$. The proposed constructions achieve practical efficiency by using lightweight symmetric cryptographic components only. In particular, $Π_{BP}$ is the first backward private scheme in the general setting that achieves optimal communication complexity using symmetric cryptographic primitives only. The main takeaway from this work is that efficient backward private schemes can be realized without complex cryptographic primitives, and the simplicity of their design makes our constructions particularly appealing from an implementation perspective. On the definitional front, an interesting question arising from this study is to analyze the real-world consequences of the leakage incurred by constructions satisfying the notions of backward privacy, viz., $BPIP$, $BPUP$, $BPLP$ or even $WBP$ (in the restricted setting). On the construction front, an interesting problem is to design an efficient, single-round-trip, response-revealing backward private scheme in the general setting, ideally with optimal communication complexity.

Appendix A. Correctness and security of $Π_{FP}$

A.1. Correctness

Theorem A.1.

If $F_{d}$, $F_{t}$ are secure $PRF$s and $H_{1}$ is a collision-resistant hash function, then $Π_{FP}$ is correct.

Proof.

We use the games $G_{0}$ and $G_{1}$. In the modification snippets, $G_{1}$ includes the boxed code and $G_{0}$ does not. The games are identical to $Π_{FP}$ (see Fig. 6) except for the following changes:

(1) In the $Setup$ algorithm, we initialize the sets ${LabSet}_{1}$ and ${LabSet}_{2}$ to the empty set and set the Boolean variable $bad$ to $false$.

(2) We remove lines 5 and 6 of the $Update$ protocol and, after line 4, add the following code:


(3) We replace line 5 in the $Search$ protocol with the following code:


(4) We replace line 7 in the $Search$ protocol with the following code:


The first game $G_{0}$ outputs 1 only if $bad$ is set, since repeated labels in maps T and D are the only source of incorrectness. When $bad$ is not set, $G_{0}$ produces a distribution identical to that of the real game. If the value assigned to $label$ is repeated, $G_{0}$ replaces it with a new value that has not been assigned to any $label$ so far.

Let us denote the event "the flag $bad$ is set to $true$ " in $G_{0}$ by $E_{0}$ . This gives,

$$\Pr[\mathrm{DSSECor}_{A}^{\Pi_{FP}}(\lambda) = 1] \le \Pr[E_{0}]. \tag{13}$$

In $G_{1}$, every call to the $PRF$s $F_{t}$ and $F_{d}$ is answered using the tables ${Key}_{t}$ and ${Key}_{d}$, respectively. The entries in ${Key}_{t}$ are indexed by $(w, ver)$ and those in ${Key}_{d}$ by w.

Let us denote the event "the flag $bad$ is set to $true$" in $G_{1}$ by $E_{1}$.

If there exists an adversary $A$ that is able to distinguish between games $G_{0}$ and $G_{1}$ , we can construct an adversary $B_{1}$ that can distinguish $F_{t}$ from a truly random function and/or an adversary $B_{2}$ that can distinguish $F_{d}$ from a truly random function. Formally, there exist adversaries $B_{1}$ and $B_{2}$ , such that

$$|\Pr[E_{1}] - \Pr[E_{0}]| ⩽ {Adv}_{F_{t}, B_{1}}^{PRF}(λ) + {Adv}_{F_{d}, B_{2}}^{PRF}(λ) ⩽ neg(λ). \tag{14}$$

The event $E_{1}$ occurs only when a newly picked label value was already picked earlier in $G_{1}$. $E_{1}$ occurs in line 7 of modification (3) above if the same label is generated for more than one keyword. Since the labels are picked uniformly at random in line 5 and the number of keywords, m, is polynomial in the security parameter, the birthday bound implies that the probability that any two labels for map D are equal is at most $\frac{m^{2}}{2^{λ}}$, i.e., $neg(λ)$.

Further, the key $k_{w}$ is picked uniformly at random (see line 5 of modification (2)), and the number of updates on T, say q, is polynomial in the security parameter. By the birthday bound, the probability that any two keys are equal is at most $\frac{q^{2}}{2^{λ}}$, i.e., $neg(λ)$. Therefore, the input to $H_{1}$ repeats only with negligible probability. Hence, if $E_{1}$ occurs in line 8 of modification (2) on distinct inputs, one obtains a collision in the hash function $H_{1}$. Since $H_{1}$ is collision resistant, this happens only with negligible probability. Therefore,

$$\Pr[E_{1}] ⩽ neg(λ). \tag{15}$$
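The birthday-bound steps above can be sanity-checked numerically. The sketch below computes, with exact rational arithmetic, the probability that m uniform `bits`-bit strings are not pairwise distinct and compares it with the bound $\frac{m^{2}}{2^{λ}}$ used in the proof (with λ = `bits`). The toy sizes are assumptions chosen so that the probabilities are not vanishingly small; the function names are hypothetical.

```python
from fractions import Fraction

def collision_probability(m: int, bits: int) -> Fraction:
    """Exact probability that m independent uniform `bits`-bit strings
    are not pairwise distinct: 1 - prod_{i=0}^{m-1} (2^bits - i) / 2^bits."""
    n = 2 ** bits
    all_distinct = Fraction(1)
    for i in range(m):
        all_distinct *= Fraction(max(n - i, 0), n)
    return 1 - all_distinct

def birthday_bound(m: int, bits: int) -> Fraction:
    """The bound m^2 / 2^bits used in the proof."""
    return Fraction(m * m, 2 ** bits)

# Toy parameters (assumption): far smaller than any realistic λ.
for m, bits in [(10, 16), (100, 32), (1000, 40)]:
    exact, bound = collision_probability(m, bits), birthday_bound(m, bits)
    assert exact <= bound
    print(f"m={m:5d} bits={bits}: exact={float(exact):.3e} bound={float(bound):.3e}")
```

The inequality holds in general because $1 - \prod_{i<m}(1 - i/2^{λ}) ⩽ \sum_{i<m} i/2^{λ} = \frac{m(m-1)}{2^{λ+1}} ⩽ \frac{m^{2}}{2^{λ}}$.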

From (13), (14) and (15), we get

$$\Pr[{DSSECor}_{A}^{Π_{FP}}(λ) = 1] ⩽ neg(λ). \tag{16}$$ □

A.2. Security

Theorem A.2.

If $F_{d}$ and $F_{t}$ are secure $PRF$s and $H_{1}$, $H_{2}$ are hash functions modeled as random oracles outputting 2λ and μ bits, respectively, then $Π_{FP}$ is $FP$-$I$ secure (Definition 3.1).

Proof.

We structure our proof as a sequence of games $G_{0}$ to $G_{4}$. $G_{0}$ computes a distribution identical to ${Real}_{A}^{Π_{FP}}(λ)$, and $G_{4}$ computes a distribution that can be simulated perfectly given $L$, i.e., one identical to ${Ideal}_{A, Sim}^{Π_{FP}}(λ)$; the intermediate games are hybrids.

Game $G_{0}$: $G_{0}$ is identical to ${Real}_{A}^{Π_{FP}}(λ)$.

$$\Pr[{Real}_{A}^{Π_{FP}}(λ) = 1] = \Pr[G_{0} = 1]. \tag{17}$$

Game $G_{1}$: In $G_{1}$, every call to the $PRF$s $F_{t}$ and $F_{d}$ is answered using the tables ${Key}_{t}$ and ${Key}_{d}$, respectively. Entries of ${Key}_{t}$ are indexed by $(w, ver)$ and entries of ${Key}_{d}$ by w. If there exists an adversary $A$ that can distinguish between games $G_{0}$ and $G_{1}$, we can construct an adversary $B_{1}$ that distinguishes $F_{t}$ from a truly random function and/or an adversary $B_{2}$ that distinguishes $F_{d}$ from a truly random function. Formally, there exist adversaries $B_{1}$ and $B_{2}$ such that

$$|\Pr[G_{0} = 1] - \Pr[G_{1} = 1]| ⩽ {Adv}_{F_{t}, B_{1}}^{PRF}(λ) + {Adv}_{F_{d}, B_{2}}^{PRF}(λ) ⩽ neg(λ). \tag{18}$$

Fig. 18. Games $G_{2}$ and $G_{2}^{'}$ (Theorem A.2). $G_{2}^{'}$ includes the box code and $G_{2}$ does not.

Game $G_{2}$: In $G_{2}$, instead of calling $H_{1}$ to generate $label$ in the $Update$ protocol, we pick random strings. Then, during the $Search$ protocol, the random oracle $H_{1}$ is programmed accordingly to ensure consistency.

Tables ${Hash}_{1}$ and $H_{1}$ are used to simulate the random oracle $H_{1}$: entries of ${Hash}_{1}$ are indexed by $(w, ver, c)$ and entries of the table $H_{1}$ by $(k, c)$.

Figure 18 formally describes $G_{2}$ and an intermediate game $G_{2}^{'}$. In $G_{2}^{'}$, $H_{1}$ is never programmed to two different values for the same input, which ensures consistency. Instead of directly storing the randomly picked value in table ${Hash}_{1}$ at position $(w, {ver}_{w}, c_{w})$, $G_{2}^{'}$ first checks whether $H_{1}$ is already programmed at input $(k_{w}, c_{w})$, which happens if the random oracle $H_{1}$ was previously queried on $(k_{w} ‖ c_{w})$. If so, the value $H_{1}(k_{w} ‖ c_{w})$ is stored in ${Hash}_{1}[w, {ver}_{w}, c_{w}]$; otherwise, the randomly picked value is stored there. Whenever the random oracle is needed, either in line 9 of the $Search$ protocol or to answer an adversary's query to $H_{1}$ (line 5), it is lazily programmed in $G_{2}$ so that its outputs remain consistent throughout.
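The bookkeeping of $G_{2}$ and $G_{2}^{'}$ can be sketched in Python as follows. This is a simplified model under stated assumptions, not a transcription of Fig. 18: keys and labels are byte strings, `box=True` stands for the box code of $G_{2}^{'}$, and the class and method names (`LabelOracle`, `update`, `h1_query`, `search_program`) are hypothetical. Update labels are fresh random strings recorded in ${Hash}_{1}$, and $H_{1}$ is programmed with them only once Search reveals $k_{w}$; the flag `bad` is raised in the two places the text describes.

```python
import secrets

class LabelOracle:
    """Toy model of G_2 (box=False) and G_2' (box=True) for H_1."""

    def __init__(self, out_bytes: int, box: bool):
        self.out_bytes, self.box = out_bytes, box
        self.h1 = {}       # table H_1, keyed by (k, c)
        self.hash1 = {}    # table Hash_1, keyed by (w, ver, c)
        self.pending = {}  # (k, c) -> (w, ver, c): labels not yet programmed
        self.bad = False

    def update(self, w, ver, c, k):
        """Update: the label is a fresh random string, not H_1(k || c)."""
        label = secrets.token_bytes(self.out_bytes)
        if (k, c) in self.h1:            # H_1 was already queried on k || c
            self.bad = True
            if self.box:
                label = self.h1[(k, c)]  # G_2' reuses it to stay consistent
        self.hash1[(w, ver, c)] = label
        self.pending[(k, c)] = (w, ver, c)
        return label

    def h1_query(self, k, c):
        """Adversary query to H_1, answered by lazy sampling."""
        if (k, c) not in self.h1:
            if (k, c) in self.pending:   # valid input hit before Search
                self.bad = True
                if self.box:
                    self.h1[(k, c)] = self.hash1[self.pending[(k, c)]]
                    return self.h1[(k, c)]
            self.h1[(k, c)] = secrets.token_bytes(self.out_bytes)
        return self.h1[(k, c)]

    def search_program(self, w, ver, c, k):
        """Search reveals k: program H_1(k || c) with the stored label."""
        self.h1[(k, c)] = self.hash1[(w, ver, c)]
        self.pending.pop((k, c), None)

# Honest run: update, then search, then a query on the revealed input.
ro = LabelOracle(out_bytes=32, box=True)  # 2λ bits with λ = 128 (assumption)
k_w = secrets.token_bytes(16)
label = ro.update("alice", 0, 0, k_w)
ro.search_program("alice", 0, 0, k_w)
assert ro.h1_query(k_w, 0) == label and not ro.bad
```

Without the box code, a query that hits a pending input receives a fresh value that the later `search_program` call overwrites, which is exactly the inconsistency the identical-until-bad argument charges to $E_{1}$.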

The only difference between games $G_{1}$ and $G_{2}^{'}$ is how the random oracle $H_{1}$ is modeled. The outputs of $H_{1}$ are perfectly indistinguishable in the two games; therefore,

$$\Pr[G_{2}^{'} = 1] = \Pr[G_{1} = 1]. \tag{19}$$

Let us denote the event "the flag $bad$ is set to $true$" in $G_{2}^{'}$ by $E_{1}$. The games $G_{2}^{'}$ and $G_{2}$ are identical unless $E_{1}$ occurs, so we can apply the identical-until-bad technique to bound the distinguishing advantage between $G_{2}^{'}$ and $G_{2}$:

$$|\Pr[G_{2}^{'} = 1] - \Pr[G_{2} = 1]| ⩽ \Pr[E_{1}]. \tag{20}$$

The event $E_{1}$ can occur in line 7 of the $Update$ protocol and in line 5 of the $H_{1}$ algorithm of Fig. 18. The former captures the case where the adversary has already queried the random oracle $H_{1}$ at input $(k_{w} ‖ c_{w})$, and the latter the case where the adversary queries $H_{1}$ on a valid input $(k ‖ c)$. Since $k_{w}$ is picked uniformly at random and the adversary can do no better than guess the value of $k$, the event $E_{1}$ occurs only with negligible probability. Using (19) and (20), we conclude:

$$|\Pr[G_{1} = 1] - \Pr[G_{2} = 1]| ⩽ neg(λ). \tag{21}$$
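The claim that $E_{1}$ is negligible can be made quantitative with a union bound. Assume, for illustration, that each $k_{w}$ is a uniform λ-bit string that stays hidden until the corresponding search, that the adversary makes at most $q_{H}$ queries to $H_{1}$, and that at most $q_{U}$ update inputs $(k_{w} ‖ c_{w})$ are pending; $q_{H}$ and $q_{U}$ are notation introduced here and are polynomial in λ. Up to the first occurrence of $E_{1}$ the adversary's view is independent of the hidden keys, so each query equals a given pending input with probability at most $2^{-λ}$, giving

$$\Pr[E_{1}] ⩽ \sum_{i=1}^{q_{H}} \sum_{j=1}^{q_{U}} \Pr[\text{query } i \text{ equals pending input } j] ⩽ \frac{q_{H} \cdot q_{U}}{2^{λ}} = neg(λ).$$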

Game $G_{3}$: In $G_{3}$, we apply to $H_{2}$ the same change that $G_{2}$ applied to $H_{1}$. By the same arguments, we conclude:

$$|\Pr[G_{2} = 1] - \Pr[G_{3} = 1]| ⩽ neg(λ). \tag{22}$$

Fig. 19. Game $G_{4}$ (Theorem A.2).

Game $G_{4}$: In $G_{4}$ (see Fig. 19), we abstract out the information that the simulator needs in order to output transcripts identical to those of $G_{3}$. Using the $GetData$ algorithm, $G_{4}$ keeps track of the randomly generated $label$ and $pad$ values differently than $G_{3}$ does. In the $Search$ protocol, the random oracles are programmed exactly as in $G_{3}$. Queries to the random oracles $H_{1}$ and $H_{2}$ can be simulated by outputting random values.

The transcript output by the $Update$ protocol is identical to that of the $Update$ protocol in $G_{3}$, since in $G_{4}$ the $Update$ protocol outputs fresh random strings.

Next, we describe the $Search$ protocol in $G_{4}$. Based on the tables $Update$ and $STs$, the $GetData$ algorithm determines the following components: $empty$, ${ver}_{w}$, $c_{w}$ and $\{{ind}_{w, c}, H_{1, w, c}, H_{2, w, c}\}_{0 ⩽ c ⩽ c_{w}}$. The flag $empty$ is set to 1 if $Update$ is empty. ${ver}_{w}$ is the number of searches for which there was an update on map T for keyword w after the preceding search; the loop in line 6 of $GetData$ determines the number of times the version number has been updated, i.e., the value of ${ver}_{w}$. $c_{w}$ denotes the number of updates on map T for keyword w since the last search. The values $\{{ind}_{w, c}\}_{0 ⩽ c ⩽ c_{w}}$ are the document identifiers, together with the operation bit, that have been added to T since the last search operation, and the values $\{H_{1, w, c}, H_{2, w, c}\}_{0 ⩽ c ⩽ c_{w}}$ are used to simulate the random oracles consistently with the responses given at update time. The loop in line 12 of $GetData$ computes $c_{w}$ and $\{{ind}_{w, c}, H_{1, w, c}, H_{2, w, c}\}_{0 ⩽ c ⩽ c_{w}}$. As these components are computed correctly and consistently with previous queries, the transcript of the $Search$ protocol is identical to that of the $Search$ protocol in $G_{3}$. Therefore, we conclude that:

$$\Pr[G_{4} = 1] = \Pr[G_{3} = 1]. \tag{23}$$
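The role of $GetData$ can be illustrated with a hypothetical reconstruction that works directly on leakage-style inputs: a history of timestamped updates $(t, ind, op)$ for keyword w, the timestamps $sp(w)$ of earlier searches on w, and the time of the current search. The input representation and the names `get_data` and `SearchData` are assumptions; the oracle values $H_{1, w, c}, H_{2, w, c}$ are omitted, and ${ver}_{w}$ and $c_{w}$ follow the prose definitions above rather than the exact pseudocode of Fig. 19.

```python
from dataclasses import dataclass

@dataclass
class SearchData:
    empty: bool    # no update on w at all
    ver_w: int     # searches preceded by an update since the search before
    c_w: int       # updates since the last search
    entries: list  # (ind, op) pairs added to T since the last search

def get_data(updates, search_times, now):
    """updates: list of (t, ind, op) for keyword w, i.e. Hist(w);
    search_times: sp(w); now: timestamp of the current search."""
    earlier = sorted(t for t in search_times if t < now)
    ups = sorted(u for u in updates if u[0] < now)
    ver_w, prev = 0, float("-inf")
    for s in earlier:
        # count search s if some update happened between prev and s
        if any(prev < t < s for t, _, _ in ups):
            ver_w += 1
        prev = s
    last = earlier[-1] if earlier else float("-inf")
    fresh = [(ind, op) for t, ind, op in ups if t > last]
    return SearchData(not ups, ver_w, len(fresh), fresh)

hist = [(1, "doc1", "add"), (2, "doc2", "add"), (4, "doc1", "del"), (6, "doc3", "add")]
print(get_data(hist, [3, 5], now=7))
# SearchData(empty=False, ver_w=2, c_w=1, entries=[('doc3', 'add')])
```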

Simulator $Sim$: Finally, we construct a simulator that, given the leakage profile $L$, correctly simulates game $G_{4}$. Clearly, $Sim$ can simulate the $Update$ protocol correctly. Instead of the keyword w, $Sim$ uses the counter $\overline{w} = \min sp(w)$, which is uniquely determined by w via $L_{Search}$, when simulating the $Search$ protocol (lines 6 and 8 of Fig. 19). In line 2 of the $Search$ protocol of Fig. 19, $Sim$ passes $sp(w)$ and $Hist(w)$ instead of $STs$ and $Update$ as input to the $GetData$ algorithm. Thus, $Sim$ produces transcripts of the $Search$ and $Update$ protocols identical to those of $G_{4}$. Hence, we conclude that:

$$\Pr[{Ideal}_{A, Sim}^{Π_{FP}}(λ) = 1] = \Pr[G_{4} = 1]. \tag{24}$$
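The pseudonym $\overline{w} = \min sp(w)$ can be computed from the search pattern alone. In the toy Python model below (an assumption for illustration), a search log is a list of (timestamp, keyword) pairs; the keyword is used only to generate the leakage $sp(w)$, and the simulator-side function `pseudonym` sees nothing but timestamps. Both function names are hypothetical. Repeated searches for the same keyword receive the same counter, while distinct keywords receive distinct ones because their first search times differ.

```python
def leak_sp(queries, t):
    """L_Search at time t: sp(w) for the keyword searched at t, i.e. all
    search timestamps (up to t) of that same keyword. The keyword itself
    is used only to generate the leakage; it is never passed to Sim."""
    w = dict(queries)[t]
    return [s for s, v in queries if v == w and s <= t]

def pseudonym(sp_w):
    """What Sim uses in place of w: w_bar = min sp(w)."""
    return min(sp_w)

queries = [(2, "alice"), (5, "bob"), (7, "alice")]  # hypothetical search log
print([pseudonym(leak_sp(queries, t)) for t, _ in queries])  # [2, 5, 2]
```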

Chaining all the games via (17) to (24), we conclude

$$|\Pr[{Real}_{A}^{Π_{FP}}(λ) = 1] - \Pr[{Ideal}_{A, Sim}^{Π_{FP}}(λ) = 1]| ⩽ neg(λ).$$
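Spelled out, this last step is the triangle inequality over the hybrids: by (17) and (24) the two ends of the chain are $G_{0}$ and $G_{4}$, and (18), (21), (22) and (23) bound the consecutive gaps, so

$$\begin{aligned} |\Pr[{Real}_{A}^{Π_{FP}}(λ) = 1] - \Pr[{Ideal}_{A, Sim}^{Π_{FP}}(λ) = 1]| &= |\Pr[G_{0} = 1] - \Pr[G_{4} = 1]| \\ &⩽ \sum_{i=0}^{3} |\Pr[G_{i} = 1] - \Pr[G_{i+1} = 1]| \\ &⩽ {Adv}_{F_{t}, B_{1}}^{PRF}(λ) + {Adv}_{F_{d}, B_{2}}^{PRF}(λ) + neg(λ) + neg(λ) + 0, \end{aligned}$$

which is negligible as a sum of a constant number of negligible functions.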

□

Acknowledgments

We thank the anonymous reviewers for their elaborate and insightful comments which helped us in improving the overall presentation of our work.


By Sanjit Chatterjee; Shravan Kumar Parshuram Puria and Akash Shah


Title:	Efficient backward private searchable encryption
Author(s):	Shravan Kumar Parshuram Puria ; Shah, Akash ; Chatterjee, Sanjit
Link:	https://doi.org/10.3233/jcs-191322
Journal:	Journal of Computer Security, Vol. 28 (2020-03-17), pp. 229-267
Publisher:	IOS Press, 2020
Media type:	unknown
ISSN:	1875-8924 (print) ; 0926-227X (print)
DOI:	10.3233/jcs-191322
