Authenticated Keyword Search in Scalable Hybrid-Storage Blockchains Note

  • Authenticated Keyword Search in Scalable Hybrid-Storage Blockchains Note


    • We first propose the Suppressed Merkle inverted (MerkleinvMerkle^{inv} ) index, which maintains only a partial ADS structure on-chain that can be securely updated with a logarithmsized(对数时间复杂度) cryptographic proof.
    • Moreover, we propose a Chameleon inverted (ChameleoninvChameleon^{inv} ) index that leverages the chameleon vector commitment to achieve a constant maintenance cost.
      • It is further optimized with Bloom filters to enhance the query and verification performance.


    • The basic idea is to let both the on-chain smart contract and the off-chain SP maintain an authenticated data structure (ADS).
      • For each query, the SP computes not only the query results but also a cryptographic proof known as verification object (VO) to attest to those results.
    • A key challenge here is that the ADS should be designed to be update efficient.
      • ADS should be able to be efficiently maintained by the smart contract, employs a unique gas model to measure the execution costs.
    • In a pioneering study [11], a gas-efficient ADS named GEM 2-tree has been proposed.
      • First, it is developed for range queries and cannot be directly used to support other more complex queries such as keyword search and similarity search.
      • Second, as we will elaborate later on, it still incurs a relatively high gas cost due to storing many intermediate data in the blockchain state.
    • The MerkleinvMerkle^{inv}index is an inverted index in which each keyword corresponds to a Merkle B-tree (MBtree) [5] that indexes the corresponding object IDs.
      • Each conjunctive component can then be processed as authenticated join queries over the query keywords’ MB-trees.
      • However, maintaining the complete Merkle inv index on-chain by the smart contract is prohibitively expensive due to the excessive storage operations.

    Suppressed MerkleinvMerkle^{inv}index

    The main idea is that the smart contract maintains only the root digest of each MB-tree in the MerkleinvMerkle ^{inv}index.

    • Only the root digests are used during the authenticated keyword search.
    • The maintenance of the Suppressed Merkle inv index requires constant (costly) storage operations and logarithmic (cheap) memory operations, achieving an overall lower gas cost.

    Chameleon inverted (ChameleoninvChameleon^{inv}) index

    Using the CVC, one can publicly verify the data stored in a vector. Moreover, the data owner can update the vector without changing its commitment by using a secret trapdoor.

    • The smart contract maintains the root commitments with some auxiliary data of the ChameleoninvChameleon^{inv} index

    An optimized $Chameleon^{inv∗}$ index:

    • The smart contract maintains an additional Bloom filter for every group of objects in each Chameleon tree.
      • The Bloom filters help to facilitate the testing of non-existing objects, thus achieving better query and verification performance.
    • On the other hand, due to the additional maintenance of the Bloom filters, the $Chameleon^{inv∗}$ index has a slightly higher maintenance cost.

    Paper’s contributions

    • For the first time in the literature, we study the problem of authenticated keyword search in scalable hybrid-storage blockchains.
    • We propose the Suppressed MerkleinvMerkle^{inv} index that can significantly reduce the ADS maintenance cost in terms of the smart contract’s gas consumption.
    • We develop the ChameleoninvChameleon^{inv} index to further reduce the ADS maintenance cost to a constant level while still supporting efficient queries. To enhance the query and verification performance, an optimized Chameleon inv∗ index is also proposed.


    • Since the blockchain network needs to spend computation and storage resources for the smart contract deployment and execution, a transaction fee, denominated in gas, is charged for each deployment and execution.
      • In general, the storage operations are a multitude more expensive than the in-memory data access and computation operations.
      • In order to prevent computation-intensive and non-stop smart contracts, a gasLimit (e.g., 8,000,000 in [10]) is set. The deployment or execution of a smart contract will be terminated if the total gas consumption exceeds the gasLimit.

    Merkle Hash Tree (MHT)

    An MHT is a data structure for authentication and verification [12]. Each leaf node stores the digest of the indexed object. Each internal node stores a hash that is derived from its two child nodes, e.g., h5=h(h1h2)h_5 = h(h_1 || h_2 ), where ‘ || ’ denotes the concatenation operation.

    The root hash is signed by the data owner and made public. The MHT can be used to authenticate any subset of the data objects stored in the leaf nodes in logarithmic complexity.

    The MHT can be used to authenticate any subset of the data objects stored in the leaf nodes in logarithmic complexity.

    The MHT is a binary tree. To support authenticated queries in relational databases, it has been extended to multi-way Merkle B-tree (MB-tree), which follows theB+tree B^+ -tree structure and augments each index entry with a corresponding hash [5].

    Vector Commitment (VC)

    A VC maps a vector of messages ( 〈 m 1 , m 2 , · · · , m q 〉 ) to a fixed-sized commitment, which can be opened at a specific position.

    A VC scheme consists of several polynomial-time algorithms:

    • Gen(1λ1^λ, q): The key generation function returns the public parameters pppp with the input of the security parameter λλ and the vector size qq.
    • $Com_{pp}(< m_1, m_2, · · · , m_q> , r)$: The commitment function takes a vector of qq messages and a random number rr as input and outputs a commitment cc with some auxiliary information auxaux
    • Openpp(i,m,aux)Open_{pp}(i, m, aux): The opening function returns a proof π iffπ\ iff m is the i-th message in a commitment w.r.t. cc.
    • Verpp(c,i,m,π)Ver_{pp}(c, i, m, π): The verification function returns 1 iffπiff π passes the verification that c is computed based on a sequence of messages with m at position ii.

    Chameleon Vector Commitment (CVC)

    A CVC is a trapdoor commitment scheme [14]. A user who owns the private key can update a message in a vector without changing the vector’s commitment. The key generation function CGen(1λ,q)CGen(1^λ , q) is slightly different from the Gen(1λ,q)Gen(1^λ , q) with an additional output of trapdoor tdtd.

    • $CCol_{pp}(c, i, m, m^{\'}, td, aux)$: The collision finding algorithm computes a new $aux ′$ such that $aux ′ $corresponds to a vector of qq messages with $m ′$ instead of mm at position ii and the commitment remains to be cc.


    System Model

    Involve Parties:

    • data owner (DO)
    • hybridstorage blockchain
      • blockchain with smart contract functionality
      • an off-chain storage service provider (SP)
    • query clients

    Data object model

    data object is modeled as a tuple:

    $o_i = <id, {w_j}, v>, (1 < j < m)$

    idid = object's ID

    ${w_j} (1<j<m)= $ a set of keywords associated with the object

    v=v = content of the object

    Arrival of a new object oio_i

    • DO sends oio_i to the SP but only the meta-data $〈 id, { w_j} , h(o_i) 〉$ to the blockchain
      • h(i)h(i) is the hash value of meta-data

    Authenticated query processing


    • authenticated data structure (ADS) is maintained by both the smart contract and the SP.
      • DO invokes the smart contract to update the on-chain ADS.
      • ADS in the SP is updated in synchronous with the one stored on-chain.
    • The digests of the ADS become the authenticated information that is shared by both the smart contract and the SP.


    • The client sends a keyword search request to the SP, which uses the ADS to compute the results as well as a verification object (VOspVO_{sp} ) that can be used by the client to verify the results.
    • Both the results and VOspVO_{sp} are returned to the client.
    • Result verification
      • The client first retrieves the authenticated digests (VOchainVO_{chain}) from the blockchain.
      • the client can verify the correctness of the results by combining VOchainV O_{chain} and VOspV O_{sp}.

    Threat Model

    • off-chain SP is not fully trusted and can behave arbitrarily.
      • It may add, delete, or modify the query results intentionally or unintentionally.
    • The client needs to check the following two security criteria:
      • Soundness: All of the returned results satisfy the query condition and are originated from the DO;
      • Completeness: No valid result is missing.

    Problem Statement

    • How to design the ADS that can be efficiently maintained by the on-chain smart contract
    • In terms of the gas cost, while effectively supporting authenticated keyword search in hybrid-storage blockchains.


    Baseline Solution --- MerkleinvMerkle^{inv} index

    inverted index

    • A dictionary of keywords, each pointing to a list of objects that contain the keyword.
    • Build an MB-tree for each keyword’s object list.

    Overview of the Suppressed MerkleinvMerkle^{inv}Index

    The MerkleinvMerkle^{inv} index suffers from a high maintenance cost since the complete MB-trees are maintained by the smart contract.

    The general idea of the Suppressed MerkleinvMerkle^{inv} index is to fully suppress the on-chain MB-trees.

    • Only the root hashes of the MB-trees have been stored on chain to minimize the maintenance cost.
    • SP still maintains the complete structures of the MB-trees to support efficient query processing.
    • Ask the off-chain SP to construct an update proof, denoted as UpdVOUpdVO, during a new object’s insertion , to maintain the root hash of each MB-tree without knowing the complete structure

    Details of the Suppressed MerkleinvMerkle^{inv} Index

    UpdVOUpdV O

    • Consists of the right-most branch of the MB-tree
    • Algorithm 1:UpdVO

    截屏2020-10-28 15.10.35

    After the verification of the UpdVOUpdVO, the smart contract uses it to update the root hash of the MB-tree using Algorithm 2.


    Overview of the ChameleoninvChameleon^{inv} Index

    A Chameleon tree is a q-ary tree in which each node (except the root) corresponds to a data object.


    截屏2020-11-02 10.13.43

    Every node in the tree is associated with a node position, which is assigned by counting tree nodes using the order of left to right and then top to bottom.

    For example, the node ns3n_{s3} in Fig. 8 stores the object hash h(4)h(4) (the object ID is used to denote the object s for simplicity), the node commitment cs3c_{s3} , the proof π3,1π_{3,1} for verifying h(4)h(4) is stored in s3s_3 , and the proof ρ1,2sρ_{1,2}^s for verifying s3s_3 is the first child of its parent s1s_1 .

    • It is worth noting that the commitment of each node is predetermined and independent of the stored data, although the corresponding vector can be updated using a trapdoor during object insertions.
      • On the other hand, a new node with the inserted object is always linked to its parent node nparn_{par} from left to right.
      • Thus, given the total count of objects cntcnt, the structure of a Chameleon tree is decided.
    • The value of cntcnt needs to be maintained, which incurs a constant cost, in order to prevent an adversary from returning the query results based on an outdated index.
    • ChameleoninvChameleon^{inv} index each keyword corresponds to a Chameleon tree.
      • For each Chameleon tree, the smart contract only needs to maintain the (invariant) root commitment and the total object count cntcnt.

    Maintenance of the ChameleoninvChameleon^{inv} Index


    • DO to create the initial Chameleon tree
    • Generating the necessary keys, including the PRF key sksk, the CVC trapdoor td td, and its public parameter pppp (lines 2-3).
      • trapdoor needs to be kept secret by the DO
    • After the key generation, the commitment of the root node, c0c_0, is computed (lines 4-5).
    • Keyword ww and the root commitment c0c_0 are sent to both the blockchain and the SP (line 6).

    截屏2020-11-02 13.51.09


    • Creating a new node for oio_i
      • New node at pospos is allocated in the Chameleon tree and its predetermined commitment cposc_{pos} is computed (lines 2-3).
      • DO finds the collision of cposc_{pos} using the trapdoor tdtd and updates the first element of the CVC to the object’s hash h(oi)h(o_i ) (line 4).
      • Proof πpos,1π_{pos,1} is generated (line 5).
    • Linking the new node to its corresponding parent node.
      • DO first locates the parent node nparn_{par} and the corresponding index jj in the CVC (line 6).
      • nparn_{par} ’s pre-determined commitment cparc_{par} is computed and a collision is found to update its j-th element to cposc_{pos} (lines 7-8).
      • corresponding proof ρpar,jρ_{par,j} is computed to attest to that nposn_{pos} is the $(j − 1)^ {th}$ child of nparn_{par} (line 9).
      • Denote $〈 c_{pos} , π_{pos,1} , ρ_{par,j} 〉$ as the insertion proof of the new object oio_i.
      • DO sends the insertion proof to the SP and the updated cnt value to the blockchain (lines 10-11).

    截屏2020-11-02 14.41.20


    截屏2020-11-02 14.51.08

    Authenticated Keyword Search with ChameleoninvChameleon^{inv} Index

    To prove the existence of an object at position pospos in the Chameleon tree, the SP can generate a membership proof, Π, by including the insertion proofs of the object at pospos and all its ancestor nodes except the root.

    • The membership verification is processed recursively in a bottom-up fashion.

    Algorithm 5: The case of joining two Chameleon trees

    • Similar to the MB-trees’ join explained in Section II-C.

      • The major differbience is that the Chameleon tree is indexed by each object’s node position, rather than the object ID.
    • To facilitate the join processing, the SP builds a hash map locally to keep track of the mapping between the positions and the object IDs for each Chameleon tree.

    • The join query is processed in rounds between the two trees, the membership proofs

      • the target
      • the matching and boundary objects are added to the VOspVO_{sp}.
    • The two trees switch their roles to find the matching objects until one tree’s end is reached.

    截屏2020-11-02 17.05.13

    Algorithm 6: Procedure of the client’s verification

    • The client retrieves the root commitment c0c_0 and count cntcnt (denoted as VOchainV O_{chain} ) for each relevant Chameleon tree.
    • the client checks the V O spfor each round of join processing. For each round:
      • checks the position of this round’s target is 1 or equals the previous round’s boundary position
      • verifies the integrity of the two boundary objects using their membership proofs
      • verifies the integrity of the target object with its membership proof and adds the object as a result if its ID matches the left boundary’s ID
      • verifies that the target object’s ID is between the boundary objects’ IDs
      • For the last round, the client additionally checks the termination position is no smaller than the object count cntcnt.

    截屏2020-11-02 21.03.07

Log in to reply

Copyright © 2018 All rights reserved.

Looks like your connection to Dian was lost, please wait while we try to reconnect.