Authenticated Keyword Search in Scalable Hybrid-Storage Blockchains Note
Abstract
- We first propose the Suppressed Merkle inverted (
) index, which maintains only a partial ADS structure on-chain that can be securely updated with a logarithmsized(对数时间复杂度) cryptographic proof. - Moreover, we propose a Chameleon inverted (
) index that leverages the chameleon vector commitment to achieve a constant maintenance cost. - It is further optimized with Bloom filters to enhance the query and verification performance.
Introduction
- The basic idea is to let both the on-chain smart contract and the off-chain SP maintain an authenticated data structure (ADS).
- For each query, the SP computes not only the query results but also a cryptographic proof known as verification object (VO) to attest to those results.
- A key challenge here is that the ADS should be designed to be update efficient.
- ADS should be able to be efficiently maintained by the smart contract, employs a unique gas model to measure the execution costs.
- In a pioneering study [11], a gas-efficient ADS named GEM 2-tree has been proposed.
- First, it is developed for range queries and cannot be directly used to support other more complex queries such as keyword search and similarity search.
- Second, as we will elaborate later on, it still incurs a relatively high gas cost due to storing many intermediate data in the blockchain state.
- The
index is an inverted index in which each keyword corresponds to a Merkle B-tree (MBtree) [5] that indexes the corresponding object IDs. - Each conjunctive component can then be processed as authenticated join queries over the query keywords’ MB-trees.
- However, maintaining the complete Merkle inv index on-chain by the smart contract is prohibitively expensive due to the excessive storage operations.
Suppressed index
The main idea is that the smart contract maintains only the root digest of each MB-tree in the
- Only the root digests are used during the authenticated keyword search.
- The maintenance of the Suppressed Merkle inv index requires constant (costly) storage operations and logarithmic (cheap) memory operations, achieving an overall lower gas cost.
Chameleon inverted ( ) index
Using the CVC, one can publicly verify the data stored in a vector. Moreover, the data owner can update the vector without changing its commitment by using a secret trapdoor.
- The smart contract maintains the root commitments with some auxiliary data of the
index
An optimized $Chameleon^{inv∗}$ index:
- The smart contract maintains an additional Bloom filter for every group of objects in each Chameleon tree.
- The Bloom filters help to facilitate the testing of non-existing objects, thus achieving better query and verification performance.
- On the other hand, due to the additional maintenance of the Bloom filters, the
$Chameleon^{inv∗}$ index has a slightly higher maintenance cost.
Paper’s contributions
- For the first time in the literature, we study the problem of authenticated keyword search in scalable hybrid-storage blockchains.
- We propose the Suppressed
index that can significantly reduce the ADS maintenance cost in terms of the smart contract’s gas consumption. - We develop the
index to further reduce the ADS maintenance cost to a constant level while still supporting efficient queries. To enhance the query and verification performance, an optimized Chameleon inv∗ index is also proposed.
BACKGROUND AND PRELIMINARIES
- Since the blockchain network needs to spend computation and storage resources for the smart contract deployment and execution, a transaction fee, denominated in gas, is charged for each deployment and execution.
- In general, the storage operations are a multitude more expensive than the in-memory data access and computation operations.
- In order to prevent computation-intensive and non-stop smart contracts, a gasLimit (e.g., 8,000,000 in [10]) is set. The deployment or execution of a smart contract will be terminated if the total gas consumption exceeds the gasLimit.
Merkle Hash Tree (MHT)
An MHT is a data structure for authentication and verification [12]. Each leaf node stores the digest of the indexed object. Each internal node stores a hash that is derived from its two child nodes, e.g.,
The root hash is signed by the data owner and made public. The MHT can be used to authenticate any subset of the data objects stored in the leaf nodes in logarithmic complexity.
The MHT can be used to authenticate any subset of the data objects stored in the leaf nodes in logarithmic complexity.
The MHT is a binary tree. To support authenticated queries in relational databases, it has been extended to multi-way Merkle B-tree (MB-tree), which follows the
Vector Commitment (VC)
A VC maps a vector of messages ( 〈 m 1 , m 2 , · · · , m q 〉 ) to a fixed-sized commitment, which can be opened at a specific position.
A VC scheme consists of several polynomial-time algorithms:
- Gen(
, q): The key generation function returns the public parameters with the input of the security parameter and the vector size . $Com_{pp}(< m_1, m_2, · · · , m_q> , r)$: The commitment function takes a vector of messages and a random number as input and outputs a commitment with some auxiliary information : The opening function returns a proof m is the i-th message in a commitment w.r.t. . : The verification function returns 1 passes the verification that c is computed based on a sequence of messages with m at position .
Chameleon Vector Commitment (CVC)
A CVC is a trapdoor commitment scheme [14]. A user who owns the private key can update a message in a vector without changing the vector’s commitment. The key generation function
$CCol_{pp}(c, i, m, m^{\'}, td, aux)$: The collision finding algorithm computes a new $aux ′$ such that $aux ′ $corresponds to a vector of messages with $m ′$ instead of at position and the commitment remains to be .
PROBLEM FORMULATION
System Model
Involve Parties:
- data owner (DO)
- hybridstorage blockchain
- blockchain with smart contract functionality
- an off-chain storage service provider (SP)
- query clients
Data object model
data object is modeled as a tuple:
Arrival of a new object
- DO sends
to the SP but only the meta-data $〈 id, { w_j} , h(o_i) 〉$ to the blockchain is the hash value of meta-data
Authenticated query processing
Backgroud:
- authenticated data structure (ADS) is maintained by both the smart contract and the SP.
- DO invokes the smart contract to update the on-chain ADS.
- ADS in the SP is updated in synchronous with the one stored on-chain.
- The digests of the ADS become the authenticated information that is shared by both the smart contract and the SP.
Works:
- The client sends a keyword search request to the SP, which uses the ADS to compute the results as well as a verification object (
) that can be used by the client to verify the results. - Both the results and
are returned to the client. - Result verification
- The client first retrieves the authenticated digests (
) from the blockchain. - the client can verify the correctness of the results by combining
and .
- The client first retrieves the authenticated digests (
Threat Model
- off-chain SP is not fully trusted and can behave arbitrarily.
- It may add, delete, or modify the query results intentionally or unintentionally.
- The client needs to check the following two security criteria:
- Soundness: All of the returned results satisfy the query condition and are originated from the DO;
- Completeness: No valid result is missing.
Problem Statement
- How to design the ADS that can be efficiently maintained by the on-chain smart contract
- In terms of the gas cost, while effectively supporting authenticated keyword search in hybrid-storage blockchains.
SUPPRESSED INDEX
Baseline Solution --- index
inverted index
- A dictionary of keywords, each pointing to a list of objects that contain the keyword.
- Build an MB-tree for each keyword’s object list.
Overview of the Suppressed Index
The
The general idea of the Suppressed
- Only the root hashes of the MB-trees have been stored on chain to minimize the maintenance cost.
- SP still maintains the complete structures of the MB-trees to support efficient query processing.
- Ask the off-chain SP to construct an update proof, denoted as
, during a new object’s insertion , to maintain the root hash of each MB-tree without knowing the complete structure
Details of the Suppressed Index
- Consists of the right-most branch of the MB-tree
- Algorithm 1:UpdVO
After the verification of the
INDEX
Overview of the Index
A Chameleon tree is a q-ary tree in which each node (except the root) corresponds to a data object.
Example:
Every node in the tree is associated with a node position, which is assigned by counting tree nodes using the order of left to right and then top to bottom.
For example, the node
- It is worth noting that the commitment of each node is predetermined and independent of the stored data, although the corresponding vector can be updated using a trapdoor during object insertions.
- On the other hand, a new node with the inserted object is always linked to its parent node
from left to right. - Thus, given the total count of objects
, the structure of a Chameleon tree is decided.
- On the other hand, a new node with the inserted object is always linked to its parent node
- The value of
needs to be maintained, which incurs a constant cost, in order to prevent an adversary from returning the query results based on an outdated index. index each keyword corresponds to a Chameleon tree. - For each Chameleon tree, the smart contract only needs to maintain the (invariant) root commitment and the total object count
.
- For each Chameleon tree, the smart contract only needs to maintain the (invariant) root commitment and the total object count
Maintenance of the Index
Setup
- DO to create the initial Chameleon tree
- Generating the necessary keys, including the PRF key
, the CVC trapdoor , and its public parameter (lines 2-3). - trapdoor needs to be kept secret by the DO
- After the key generation, the commitment of the root node,
, is computed (lines 4-5). - Keyword
and the root commitment are sent to both the blockchain and the SP (line 6).
Insert
- Creating a new node for
- New node at
is allocated in the Chameleon tree and its predetermined commitment is computed (lines 2-3). - DO finds the collision of
using the trapdoor and updates the first element of the CVC to the object’s hash (line 4). - Proof
is generated (line 5).
- New node at
- Linking the new node to its corresponding parent node.
- DO first locates the parent node
and the corresponding index in the CVC (line 6). ’s pre-determined commitment is computed and a collision is found to update its j-th element to (lines 7-8). - corresponding proof
is computed to attest to that is the $(j − 1)^ {th}$ child of (line 9). - Denote
$〈 c_{pos} , π_{pos,1} , ρ_{par,j} 〉$ as the insertion proof of the new object . - DO sends the insertion proof to the SP and the updated cnt value to the blockchain (lines 10-11).
- DO first locates the parent node
Example:
Authenticated Keyword Search with Index
To prove the existence of an object at position
- The membership verification is processed recursively in a bottom-up fashion.
Algorithm 5: The case of joining two Chameleon trees
-
Similar to the MB-trees’ join explained in Section II-C.
- The major differbience is that the Chameleon tree is indexed by each object’s node position, rather than the object ID.
-
To facilitate the join processing, the SP builds a hash map locally to keep track of the mapping between the positions and the object IDs for each Chameleon tree.
-
The join query is processed in rounds between the two trees, the membership proofs
- the target
- the matching and boundary objects are added to the
.
-
The two trees switch their roles to find the matching objects until one tree’s end is reached.
Algorithm 6: Procedure of the client’s verification
- The client retrieves the root commitment
and count (denoted as ) for each relevant Chameleon tree. - the client checks the V O spfor each round of join processing. For each round:
- checks the position of this round’s target is 1 or equals the previous round’s boundary position
- verifies the integrity of the two boundary objects using their membership proofs
- verifies the integrity of the target object with its membership proof and adds the object as a result if its ID matches the left boundary’s ID
- verifies that the target object’s ID is between the boundary objects’ IDs
- For the last round, the client additionally checks the termination position is no smaller than the object count
.