Graduation Project English-to-Chinese Translation Source Text: Reliable Storage and … for Collaborative Data Sharing Systems (5)

Date: 2025-07-06

before any node failures, and failed few enough nodes that data was never lost. For completeness, we plan to implement the Bloom-filter-based background replication approach of the Pastry-based PAST storage system [14], which can be directly applied to our context.
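As background for the planned approach, a Bloom filter is a compact, probabilistic set summary that lets nodes compare their holdings without exchanging full key lists (membership tests may yield false positives, never false negatives). A minimal sketch in Python; the class, sizes, and hash scheme here are illustrative assumptions, not PAST's actual implementation:

```python
import hashlib

class BloomFilter:
    """Compact set summary: nodes can exchange these to find
    under-replicated data without shipping full key lists."""
    def __init__(self, size=1024, hashes=3):
        self.size, self.hashes = size, hashes
        self.bits = [False] * size

    def _positions(self, item):
        # Derive `hashes` independent bit positions from SHA-1 digests.
        for i in range(self.hashes):
            h = hashlib.sha1(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:4], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = True

    def might_contain(self, item):
        # True if every position is set; may be a false positive.
        return all(self.bits[pos] for pos in self._positions(item))
```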

IV. VERSIONED DATA STORAGE

Recall from our earlier discussion that ORCHESTRA supports a batched publish/import cycle, where each participant stores its own updates in the CDSS, disjoint from all others. There is no need for traditional concurrency control mechanisms, as conflicts among concurrent updates are resolved during the import stage (via reconciliation) by the participant. However, there is indeed a notion of global consistency. We assign a logical timestamp (epoch) that advances after each batch of updates is published by a peer. When a participant performs an import or poses a distributed query, it is with respect to the data available at the specific epoch in which the import starts. The participant should receive the effects of all state published up to that epoch, and no state published thereafter (until its next import). The current epoch can be determined through a simple "gossip" protocol and does not require a single point of failure.
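To illustrate how a current epoch can spread without any central coordinator, the following sketch implements a simple max-epoch gossip. The `Peer` class and the deterministic ring schedule are illustrative assumptions for the sketch, not ORCHESTRA's actual protocol:

```python
class Peer:
    """A participant that remembers the highest epoch it has heard of."""
    def __init__(self, epoch=0):
        self.epoch = epoch

    def gossip_with(self, other):
        # Both sides adopt the larger of the two epochs they know.
        self.epoch = other.epoch = max(self.epoch, other.epoch)

def gossip_round(peers):
    """One deterministic round: each peer exchanges with its ring successor."""
    n = len(peers)
    for i, peer in enumerate(peers):
        peer.gossip_with(peers[(i + 1) % n])

def publish(publisher, peers, rounds=2):
    """Publishing a batch advances the epoch; gossip disseminates it."""
    publisher.epoch += 1
    for _ in range(rounds):
        gossip_round(peers)
```

With a ring schedule, two full rounds suffice for every peer to learn the new maximum; randomized gossip converges with high probability in logarithmically many rounds.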

Of course, in order to support queries over versioned data, we must develop a storage and access layer capable of managing such data. There are several key challenges here:

• Between database versions, we want to efficiently reuse storage for data values that have not changed.

• We must track which tuples belong to the desired version of a database. Such metadata should be co-located with the data in a way that minimizes the need for communication during query operation.

• Each tuple must be uniquely identifiable using a tuple identifier that includes its version. Yet, for efficiency of computation, we must partition data along a set of key attributes (as with a clustered index). It must be possible to convert from the tuple ID to the tuple key, so that a tuple can be retrieved by its ID; therefore a tuple's hash key must be derived from (possibly a subset of) the attributes in its ID.
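The last requirement can be sketched as follows: derive the placement hash from only the key attributes inside the tuple ID, never from the version, so that every version of the same logical tuple maps to the same partition and the ID alone suffices to locate the tuple. The `TupleID` layout here is an assumed illustration:

```python
import hashlib
from typing import NamedTuple

class TupleID(NamedTuple):
    """Illustrative tuple identifier: key attributes plus the writing epoch."""
    key_attrs: tuple
    epoch: int

def hash_key(tid: TupleID) -> int:
    """Hash only the key attributes, not the version, so any version of a
    logical tuple hashes to the same location in the key space."""
    digest = hashlib.sha1(repr(tid.key_attrs).encode()).digest()
    return int.from_bytes(digest[:8], "big")
```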

We maintain all versions of the database in a log-like structure across the participants: instead of replacing a tuple, we simply update our records to include the new version rather than the old version, which remains in storage. Disk space is rarely a constraint today, and the benefits of full versioning, such as support for historical queries, typically outweigh the drawbacks.
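The log-like structure can be sketched as an append-only map keyed by (tuple key, epoch); a historical query then selects the latest version at or before the requested epoch. Names and layout here are illustrative, not the system's actual on-disk format:

```python
class VersionedRelation:
    """Append-only store: updates add a new version; old ones remain."""
    def __init__(self):
        self.versions = {}   # (key, epoch) -> value

    def put(self, key, epoch, value):
        # Never overwrite: each epoch's write is a distinct entry.
        self.versions[(key, epoch)] = value

    def get(self, key, epoch):
        """Latest version of `key` visible at `epoch` (historical query)."""
        candidates = [e for (k, e) in self.versions if k == key and e <= epoch]
        if not candidates:
            return None
        return self.versions[(key, max(candidates))]
```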

Each node, therefore, may contain many versions of each tuple. If the set of nodes is in flux, nodes may come and go between when a tuple is inserted or updated and when it is used in a query; therefore, a node may not have the correct version of a particular tuple. We assume that background replication is sufficient to ensure that each tuple exists somewhere in the system, but that it may not exist where the standard content-addressable networking scheme can find it. The key to our approach is a hierarchical structure that maps from a point in time to the collection of tuple IDs present in a relation at that time. This collection is used during processing to detect which tuples are missing or stale, and must therefore be retrieved from another node in the system.

Fig. 3. Storage scheme to ensure version consistency and efficient retrieval. Rounded rectangles indicate the key used to contact each node (whose state is indicated with squared rectangles).
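Under this design, deciding what must be fetched reduces to a set difference between the tuple-ID collection the hierarchy lists for the epoch and what the node holds locally. A minimal sketch, with an assumed dictionary layout for the hierarchical map:

```python
# Hierarchical map: epoch -> relation name -> set of tuple IDs at that epoch.
index = {
    5: {"R": {("a", 2), ("b", 5)}},
}

def tuples_to_fetch(index, epoch, relation, local_ids):
    """Tuple IDs the index lists for this epoch that the node lacks (or
    holds only stale versions of); these must be retrieved from other
    nodes in the system."""
    return index[epoch][relation] - set(local_ids)
```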

Figure 3 shows the main data structures used to ensure consistency. All data structures are replicated using the underlying network substrate, so failure of any node will cause all of its functionality to be assumed transparently by one or more neighboring nodes. We distribute all tuples according to a hashing scheme. Relations are divided into versioned pages, each of which represents a partition over the space of possible tuple keys' hash values. Tuples assigned to the same page will likely be co-located on a single node, or span two nodes in the worst case. As an optimization, we place the index node entry at the same node as the tuples it references, by storing the index page at the middle of the range of tuple keys it encompasses. This is why the network substrate, as discussed in Section III-A, assigns a large, contiguous region in the key space to each node; it means that the vast majority of tuple keys are never sent over the network. If each node is responsible for many smaller ranges, this is no longer the case, and performance suffers.
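The page layout and index-placement optimization can be sketched as follows, using a toy 16-bit hash space (the sizes and names are illustrative):

```python
HASH_SPACE = 2 ** 16   # toy key space; a real substrate uses far larger hashes

def page_ranges(num_pages):
    """Partition the hash space into contiguous, equal-width page ranges."""
    step = HASH_SPACE // num_pages
    return [(i * step, (i + 1) * step - 1) for i in range(num_pages)]

def index_placement_key(lo, hi):
    """Store a page's index entry at the midpoint of its key range, so it
    lands on the node already holding most of the tuples it references."""
    return (lo + hi) // 2
```

Because each node owns one large contiguous region, the midpoint key almost always falls on the same node as the bulk of the page's tuples, so index lookups stay local.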

When requesting a given relation at a given epoch, the storage system hashes these values to get the address of a relation coordinator, who has a list of the pages in the relation at that epoch. The system uses this list, which contains the hash ID associated with each page, to find the index nodes that contain these pages. From the index nodes, the system retrieves the tuple IDs belonging to the relation at the epoch, which are used to retrieve the full versions of all the tuples in the relation from the data storage node. Recall that as the pages are co-located with most of the tuples they reference, typically a single node serves as both the index node and the data storage node for an entire page, reducing network traffic and improving performance.
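The read path above can be sketched as three dictionary lookups standing in for the three network hops (coordinator, index node, data node); all names and the flat-dictionary stand-ins are illustrative assumptions:

```python
def lookup_relation(relation, epoch, coordinators, index_nodes, data_nodes):
    """Coordinator -> page list -> tuple IDs -> full tuples."""
    page_ids = coordinators[(relation, epoch)]      # hop 1: relation coordinator
    tuple_ids = []
    for page_id in page_ids:
        tuple_ids.extend(index_nodes[page_id])      # hop 2: fetch index page
    return [data_nodes[tid] for tid in tuple_ids]   # hop 3: fetch full tuples
```

When a page is co-located with its tuples, hops 2 and 3 are served by the same node, which is the common case the design optimizes for.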

Our scheme is designed to efficiently support small changes to tables. Modifying a tuple in a relation requires us to look up the page holding the old version of the tuple using an inverse node, modify that page to include the ID of the new tuple, and write out that modified page as the new index page for the region of the table surrounding the updated tuple. The entire contents of the new tuple must also be written out to the network. The system then creates a new version record linking to the updated index page, and all of the unaffected pages from the previous version.
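This update path is copy-on-write: only the affected index page is cloned, while the new version record shares every unaffected page with its predecessor. A minimal sketch, with assumed in-memory dictionaries standing in for the distributed structures:

```python
def update_tuple(version_records, index_pages, tuple_store,
                 old_epoch, new_epoch, region, new_tid, new_tuple):
    """Copy-on-write update: write the new tuple, clone only the affected
    index page, and link a new version record that reuses every
    unaffected page from the previous version."""
    tuple_store[new_tid] = new_tuple                 # write the full tuple
    old_page_id = version_records[old_epoch][region]
    new_page_id = (region, new_epoch)
    index_pages[new_page_id] = index_pages[old_page_id] | {new_tid}
    record = dict(version_records[old_epoch])        # share unaffected pages
    record[region] = new_page_id
    version_records[new_epoch] = record
```

The cost of a small change is thus proportional to one page plus one version record, not to the size of the relation.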

We were initially inspired by filesystem i-nodes, th…
