Fig. 4. Data is partitioned across nodes by the key (the first attribute), which is a subset of the Tuple ID. Redundant copies of replicated data are not shown. The left brackets indicate which nodes a tuple is stored on, while the right brackets indicate which index page a tuple's ID is on.
…append operations and small changes, the page-level data in a large file mostly remains unchanged. Such schemes all make use of a versioned system for tracking the contents of a file, which greatly resembles our index nodes. A direct translation of the CFS approach would create a small number of index node entries with tuple IDs arranged in the order they appear in the table; retrieval of the referenced tuples would require communication with many data storage nodes. We instead use a slightly higher number of entries representing partitions of the tuple space; each such page can be retrieved from one or at most a few data storage nodes.
A key property we adopt from CFS is that, once there is enough information to begin a request (i.e., the current epoch has been determined), it is always clear what data should be present in the distributed storage layer. Therefore stale data will never be retrieved. If expected data is not found at the node that should own it, this is likely due to network churn. The request can either be retried after background replication has moved state around, or the system can proactively try to retrieve the missing state from other nearby nodes.

Example 4.1: Suppose we have three participants, each storing a partition of a simple, one-table database, R(x,y), where x is the key and y is a non-key attribute. Node n1 is responsible for the range [0x00..., 0x55...], n2 for [0x55..., 0xAA...], and n3 for [0xAA..., 0x00...]. In the first epoch (epoch 0), a participant inserts the tuples R(a,b) and R(f,z). In epoch 1, someone inserts R(b,c), R(e,e), and R(c,f) while also changing R(f,z) to R(f,a). In epoch 2, someone inserts R(d,d). The final state of the system is shown in Figure 4. The Tuple ID is the key attribute of a tuple and the epoch in which it was last modified, e.g., ⟨f,1⟩ for R(f,a). The index page ID consists of the relation name, the epoch in which it was last modified, and a unique identifier for that relation and epoch, which is 0 for both example pages here. It also includes the hash ID where the index page is stored.
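To make the identifiers in Example 4.1 concrete, the following Python sketch models Tuple IDs, index page IDs, and hash-based placement on the three-node ring. The SHA-1 hash, the TupleID/IndexPageID names, and the owner_node routine are illustrative assumptions, not the system's actual interfaces.

```python
import hashlib
from typing import NamedTuple

class TupleID(NamedTuple):
    key: str      # key attribute of the tuple
    epoch: int    # epoch in which the tuple was last modified

class IndexPageID(NamedTuple):
    relation: str # relation name
    epoch: int    # epoch in which the page was last modified
    seq: int      # unique identifier within (relation, epoch); 0 in Fig. 4

def hash_id(*parts) -> int:
    """Map an identifier onto the hash ring (160-bit space, assumed SHA-1)."""
    data = "|".join(str(p) for p in parts).encode()
    return int.from_bytes(hashlib.sha1(data).digest(), "big")

# Key ranges from Example 4.1, expressed as the low end of each node's arc.
RING = [(0x00, "n1"), (0x55, "n2"), (0xAA, "n3")]

def owner_node(h: int) -> str:
    """Return the node whose arc covers the leading byte of hash value h."""
    first_byte = h >> 152            # top 8 bits of a 160-bit SHA-1 value
    owner = RING[-1][1]              # n3's arc wraps past 0xFF back to 0x00
    for low, node in RING:
        if first_byte >= low:
            owner = node
    return owner

# R(f,a), last modified in epoch 1, is identified by <f,1> and is stored
# at the data storage node that owns hash_id("f").
tid = TupleID("f", 1)
print(tid, owner_node(hash_id(tid.key)))
```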
Pseudocode for performing a lookup appears as Algorithm 1. Retrieval starts at the relation coordinator for the requested epoch, from which a list of index nodes can be obtained. It sends a scan request to each index node, along with the sargable predicate. The index nodes apply the sargable predicates to the list of tuples for each index page and request that the matching tuples be retrieved. This operation is highly parallelizable; the only operation done at a single node is the sending of the scan requests, which is very fast.
Example 4.2: Figure 5 shows how the lookup procedure works for our example instance. First, the lookup request from n2 for relation R at epoch 2 is hashed to find the node (in this case, n1) that is the relation coordinator for the relation at the desired epoch. The data stored there contains the list of index pages that contain the tuple IDs for that version of the relation. The request to scan those pages is sent to the index nodes that contain the contents of the pages, in this case n1 and n3. Those index nodes then send requests on to the data storage nodes that contain the full tuples (stored as a mapping from Tuple ID to full tuple) to scan the desired tuples given their IDs. The data storage nodes then retrieve the desired tuples and return them to the requester (not shown). Note that only two of the six Tuple IDs were actually sent over the network, due to the colocation of index pages and tuple data.
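To make the colocation point concrete, here is a small sketch with hypothetical key-to-node assignments and page placements (the real mapping is determined by the hash function and is shown in Figure 4): a scan request crosses the network only when the data storage node owning a tuple's key differs from the node holding the index page.

```python
# Hypothetical key -> data storage node assignments (illustrative only).
data_node = {"a": "n1", "b": "n1", "c": "n2", "d": "n2", "e": "n3", "f": "n3"}

# Hypothetical index pages for R at epoch 2 and the nodes storing them.
index_pages = {
    "n1": ["a", "b", "c"],   # page held on n1
    "n3": ["d", "e", "f"],   # page held on n3
}

# A tuple-scan request leaves the index node only when the tuple's
# data storage node is a different machine.
remote = [(k, data_node[k])
          for idx_node, keys in index_pages.items()
          for k in keys
          if data_node[k] != idx_node]

# With these assumed placements, only two of the six Tuple IDs cross the
# network, mirroring the count reported in Example 4.2.
print(remote)
```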
As mentioned before, this approach avoids any possibility of seeing stale data due to replication lag. Suppose that, for some reason, n1 had not yet received a copy of the record for R at epoch 2. It would search other nodes nearby in the system until it found a copy before proceeding. Similarly, if n1 had not yet received the data ⟨f,1⟩, it would never simply return the data for ⟨f,0⟩; it knows that data is stale because it does not appear in the index page. It would instead try to retrieve the full tuple for ⟨f,1⟩ from the network before proceeding.
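A minimal sketch of this freshness guarantee, assuming hypothetical local_store and neighbors structures: because the index page lists exactly which Tuple IDs belong to the requested epoch, a missing tuple triggers a retry against nearby replicas rather than a fallback to an older version.

```python
import time

def fetch_fresh(tuple_id, local_store, neighbors, retries=3, backoff=0.5):
    """Return the tuple for tuple_id = (key, epoch) or raise; never an older epoch.

    local_store: dict mapping TupleID -> full tuple, the node's own data.
    neighbors:   iterable of dicts standing in for nearby replicas.
    """
    for attempt in range(retries):
        # 1. Check the local data storage node.
        if tuple_id in local_store:
            return local_store[tuple_id]
        # 2. Proactively ask nearby nodes that may hold a replica.
        for replica in neighbors:
            if tuple_id in replica:
                return replica[tuple_id]
        # 3. Otherwise wait for background replication to move state, then retry.
        time.sleep(backoff * (attempt + 1))
    raise LookupError(f"{tuple_id} not yet replicated; will not serve stale data")
```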
Algorithm 1 Retrieve(R, e, f(k̄)). Input: R relation, e epoch, f(k̄) filter function over key k̄. Output: Matching tuples t ∈ R satisfying f(k̄).
1: relCoord ← h(⟨R, e⟩)
2: Contact Relation Coordinator at relCoord, retrieve pageIDList
3: for page ∈ pageIDList do
4:   Ask Index node at h(⟨e, (page.max + page.min)/2⟩) to scan page page
5:   Index node retrieves page contents Tuples
6:   Index node filters Tuples with f(k̄) → fTuples
7:   for t ∈ fTuples do
8:     Index node requests that Data Storage node at h(t.key) scan the tuple t
9:     Data Storage node sends t to node that requested scan, bypassing the Index node and Relation Coordinator
10:   end for
11: end for
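The following Python sketch mirrors Algorithm 1 under simplifying assumptions: nodes are plain dictionaries rather than networked peers, h stands in for the DHT hash-and-route step, and the names (Node, retrieve, owner) are illustrative rather than the system's actual API.

```python
import hashlib

def h(*parts):
    """Stand-in for hashing an identifier onto the DHT ring."""
    return hashlib.sha1("|".join(map(str, parts)).encode()).hexdigest()

class Node:
    """A toy peer: holds relation-coordinator records, index pages, and tuples."""
    def __init__(self):
        self.coordinator = {}   # h(<R, e>)  -> list of index page descriptors
        self.index_pages = {}   # page id    -> list of TupleIDs (key, epoch)
        self.tuples = {}        # TupleID    -> full tuple

def owner(hash_id, ring):
    """Toy successor lookup: first ring id >= hash_id, wrapping around."""
    ids = sorted(ring)
    return next((i for i in ids if i >= hash_id), ids[0])

def retrieve(R, e, key_filter, ring):
    """Algorithm 1: Retrieve(R, e, f(k)) over a dict `ring` of hash id -> Node."""
    # Lines 1-2: locate the Relation Coordinator and fetch its page list.
    rel_coord = ring[owner(h(R, e), ring)]
    results = []
    for page in rel_coord.coordinator[h(R, e)]:
        # Line 4: the index node is found by hashing the page's key-range midpoint.
        idx_node = ring[owner(h(e, (page["max"] + page["min"]) // 2), ring)]
        tuple_ids = idx_node.index_pages[page["id"]]
        # Line 6: apply the sargable predicate to the Tuple IDs on the page.
        for tid in (t for t in tuple_ids if key_filter(t[0])):
            # Lines 8-9: the owning data storage node ships the tuple directly
            # back to the requester, bypassing coordinator and index node.
            data_node = ring[owner(h(tid[0]), ring)]
            results.append(data_node.tuples[tid])
    return results
```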
V. RELIABLE QUERY EXECUTION
As in prior peer-to-peer query engines [5], [6], we adopt a dataflow ("push") style of distributed query processing. The operators at each node either receive data directly from a local scan of persistent storage, or receive tuples as they arrive from other nodes in the system. All data is ultimately collected at the query initiator node, which may do final processing, such as the last stage of aggregation, or a final sort.
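As an illustration of this push style, here is a minimal sketch of an operator interface, assuming hypothetical Operator/push/emit names: each operator consumes tuples as they are pushed in (from a local scan or from the network) and pushes its results toward the query initiator.

```python
from typing import Callable, Optional

class Operator:
    """Push-style operator: tuples are pushed in; results are pushed downstream."""
    def __init__(self, downstream: Optional["Operator"] = None):
        self.downstream = downstream

    def push(self, tuple_):
        raise NotImplementedError

    def emit(self, tuple_):
        if self.downstream is not None:
            self.downstream.push(tuple_)

class Select(Operator):
    """Forward only tuples that satisfy a predicate."""
    def __init__(self, predicate: Callable, downstream=None):
        super().__init__(downstream)
        self.predicate = predicate

    def push(self, tuple_):
        if self.predicate(tuple_):
            self.emit(tuple_)

class Collect(Operator):
    """Runs at the query initiator; gathers final results (e.g., before a sort)."""
    def __init__(self):
        super().__init__(None)
        self.results = []

    def push(self, tuple_):
        self.results.append(tuple_)

# Tuples arriving from a local scan or from remote nodes are simply pushed in.
collect = Collect()
plan = Select(lambda t: t[1] > 0, downstream=collect)
for t in [("a", 1), ("b", -2), ("c", 3)]:   # stand-in for arriving tuples
    plan.push(t)
print(sorted(collect.results))              # final sort at the initiator
```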