Thesis translation source text: Reliable Storage and Querying for Collaborative Data Sharing Systems (7)
Date: 2025-07-06
The main difference in ORCHESTRA is that we are concerned with computing the exact (i.e., correct and complete) answer set over finite relations, rather than doing best-effort computations over unbounded streams. Moreover, we support scientific data sharing confederations with hundreds of peers, not hundreds of thousands or millions of peers.
A. Architecture for Performance and Failure Detection

Several aspects of our query processing architecture are enabled by our custom hash-based substrate, versus an existing DHT like those used in PIER (Chord [8] or a hybrid between Gnutella and Chord [16]) or Seaweed (Pastry). However, we additionally develop several techniques at the query execution level that are vital for performance and correctness.
First, for failure detection and efficiency, the query processor benefits from the fact that our substrate uses TCP to manage connections between machines. A “downstream” node almost immediately detects when an “upstream” node has failed, because the TCP connection drops. It also automatically provides flow control in the event of a congested network. In contrast, the DHTs in prior work tend to do little or no flow control, and they rely on occasional pings to eventually detect failures. (Of course, we could have alternatively used UDP, and implemented fast failure detection and flow control via periodic handshaking.)
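As an illustration of this behavior, the sketch below (not ORCHESTRA's actual code; the function and parameter names are our own) shows how a receiver over a plain TCP socket learns of an upstream failure as soon as the connection drops, while TCP's windowing throttles a fast sender for free.

import socket

def receive_tuple_blocks(sock, block_size=4096):
    """Yield raw tuple blocks from an upstream node; stop when it fails or finishes."""
    while True:
        try:
            # Blocks until data arrives; TCP's window provides flow control automatically.
            data = sock.recv(block_size)
        except OSError:
            return  # connection dropped: upstream failure detected immediately
        if not data:
            return  # orderly end of stream from the upstream node
        yield data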
Second, for failure recovery, the query processor is given direct information about the state of the routing tables. A snapshot of the routing tables is taken by the query initiator as it invokes the query. This snapshot is disseminated along with the query plan to all nodes, in order to ensure absolute consistency of the routing tables. If one or more nodes fail in the middle of execution, the difference in the routing tables is reported back to the query initiator, such that it can incrementally recompute only the lost portion of the query state (Section V-D).
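A minimal sketch of this idea follows; the snapshot representation (a map from hash ranges to owning nodes) and the diff function are illustrative assumptions, not the paper's actual data structures.

def routing_table_diff(snapshot, current):
    """Return the hash ranges whose owner changed since the query started."""
    return {rng: owner for rng, owner in snapshot.items() if current.get(rng) != owner}

# Example: node n2 failed mid-query and node n3 took over its range.
snapshot = {(0x0000, 0x7FFF): "n1", (0x8000, 0xFFFF): "n2"}
current = {(0x0000, 0x7FFF): "n1", (0x8000, 0xFFFF): "n3"}
lost = routing_table_diff(snapshot, current)  # {(0x8000, 0xFFFF): "n2"}
# The initiator would recompute only the query state that lived on the ranges in `lost`.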
Third, for performance, the query processor batches tuples into blocks by destination, compressing them (using lightweight Zip-based compression) and marshalling them in a format that exploits their commonalities. This makes query processing much more efficient than if it were built over a DHT with many smaller messages, and reduces CPU and bandwidth use.
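The sketch below illustrates the batching-and-compression step; zlib and pickle stand in for the custom marshalling format that exploits tuple commonalities, and the block size and function name are assumptions made for illustration.

import pickle
import zlib
from collections import defaultdict

def batch_by_destination(tuples_with_dest, block_size=100):
    """Group outgoing tuples by destination node and compress each full block."""
    pending = defaultdict(list)
    blocks = []  # (destination, compressed payload) pairs ready to send
    for dest, tup in tuples_with_dest:
        pending[dest].append(tup)
        if len(pending[dest]) >= block_size:
            blocks.append((dest, zlib.compress(pickle.dumps(pending.pop(dest)))))
    for dest, tups in pending.items():  # flush the remaining partial blocks
        blocks.append((dest, zlib.compress(pickle.dumps(tups))))
    return blocks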
Finally, for correctness, each tuple is annotated with information about which source nodes supplied the data from which the tuple was derived. This is used to prevent duplicate answers when recovering from a failed node.
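One plausible reading of this annotation, shown below purely as a sketch (the representation and the filtering rule are our assumptions), is to carry a set of source-node IDs with each derived tuple and, during recovery, drop recomputed tuples whose lineage is already covered by data received before the failure.

def annotate(tup, source_nodes):
    """Attach the set of source nodes whose data produced this derived tuple."""
    return (tup, frozenset(source_nodes))

def filter_duplicates(recovered, already_received_sources):
    """Keep only recovered tuples whose lineage is not already covered."""
    return [tup for tup, sources in recovered if not sources <= already_received_sources]

# Example: a tuple derived solely from n1's data was already received, so it is dropped.
recovered = [annotate(("a", 1), {"n1"}), annotate(("b", 2), {"n2"})]
survivors = filter_duplicates(recovered, {"n1"})  # [(("b", 2), frozenset({"n2"}))]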
B. Query Execution
A query plan consists of the operators listed in Table I. The query is “driven” by some combination of the leaf-level scan operators described in the table; each is novel to our system, as it exploits the specific versioned indexing scheme used in our storage system. Such operators typically run concurrently across all of the nodes in the system, each operating on a data partition stored at those nodes.
From there, the retrieved data may be passed locally through a series of pipelined operators, such as joins or function evaluation. This continues until the next operator is either a ship operator or a rehash operator.
TABLE I: OPERATORS IN THE ORCHESTRA QUERY ENGINE

Covering index scan: retrieves data directly from the index nodes, if only key attributes are required, bypassing the data storage nodes.
Distributed scan: executes at both the index nodes and the data storage nodes, similar to Algorithm 1, with the data storage nodes receiving filtered collections of tuple IDs that pass a sargable predicate. The tuples from each index page are stored nearby on disk, and are retrieved in a single pass through the hash ID range for that page. Instead of being sent back to the query initiator, the resulting tuples are pushed through the query plan.
Select: implements selection on intermediate results.
Project: the standard projection operator.
Join: a pipelined hash join [17].
Aggregate: a blocking, hash-based grouping operator, which supports re-aggregation of partially aggregated intermediate results.
Ship: sends the tuples it receives to the query initiator.
Rehash: partitions its input among the system nodes by hashing on some subset of the tuples' attributes.
Compute-function: performs scalar function evaluation, such as arithmetic or string concatenation.
The ship operator sends the data it receives to the query initiator. The rehash operator partitions its input tuples by their hash IDs in the networking substrate and sends them to other nodes in the system. Rehashing is commonly used to enable joins or aggregation, when a relation needs to be re-partitioned on a join or grouping key. The rehash operator routes tuples to a destination node by first hashing the key using the SHA-1 hash function, then consulting the snapshot of the query routing table described previously.
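A sketch of that routing step is below; the snapshot layout (a sorted list of range-start hash IDs plus the owning node per range) is an assumption for illustration, not ORCHESTRA's actual structure.

import bisect
import hashlib

def rehash_destination(key, range_starts, range_owners):
    """Map a join/grouping key to the node owning its SHA-1 hash ID in the routing snapshot."""
    hash_id = int.from_bytes(hashlib.sha1(key.encode("utf-8")).digest(), "big")
    idx = bisect.bisect_right(range_starts, hash_id) - 1
    return range_owners[idx]

# Example snapshot with two ranges: [0, 2**159) owned by n1, [2**159, 2**160) owned by n2.
range_starts = [0, 2 ** 159]
range_owners = ["n1", "n2"]
dest = rehash_destination("y-value-42", range_starts, range_owners)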
Each operator sends an end-of-stream notification to its parent operator when it finishes executing. Scans can easily detect when they are done, and most other operators simply propagate an end-of-stream notification downstream after they receive it (perhaps first performing some final computation to produce results, as in an aggregate operator). However, detecting end-of-stream with the rehash operator is slightly tricky: it cannot complete until it has acknowledgment from all downstream nodes that they have received all of the data it sent. Once all operators have encountered the end of the stream, the query is complete.
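The push-based sketch below illustrates this end-of-stream protocol; the classes and the per-destination acknowledgment counters are illustrative assumptions rather than ORCHESTRA's implementation.

class Operator:
    """A push-based operator that forwards tuples and end-of-stream to its parent."""
    def __init__(self, parent=None):
        self.parent = parent

    def push(self, tup):
        if self.parent:
            self.parent.push(tup)

    def end_of_stream(self):
        if self.parent:
            self.parent.end_of_stream()

class RehashLike(Operator):
    """Cannot complete until every destination has acknowledged all blocks sent to it."""
    def __init__(self, parent, destinations):
        super().__init__(parent)
        self.unacked_blocks = dict.fromkeys(destinations, 0)
        self.input_done = False

    def record_send(self, dest):
        self.unacked_blocks[dest] += 1

    def record_ack(self, dest):
        self.unacked_blocks[dest] -= 1
        self.maybe_complete()

    def end_of_stream(self):
        self.input_done = True
        self.maybe_complete()

    def maybe_complete(self):
        # Propagate end-of-stream only after input ends AND every sent block is acknowledged.
        if self.input_done and all(n == 0 for n in self.unacked_blocks.values()):
            super().end_of_stream()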
Example 5.1: Continuing our example, a given node may initiate the following query:

SELECT x, MIN(z) FROM R, S
WHERE R.y = S.y GROUP BY x
ORCHESTRA can use the execution plan in Figure 6 (annotated with the tuples at each stage in the plan, in italics).
Each node joins tuples from relations R and S on attribute y, and then groups the results by x. Before this computation can begin, tuples must be redistributed ...
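As a sketch of how the Aggregate operator's support for re-aggregation applies to this query (the plan shape assumed here, local pre-aggregation before the rehash on x, is our illustration and may differ from Figure 6), each node can compute a partial MIN per group and the receiving nodes can merge the partials:

def partial_min(local_join_results):
    """Compute a partial MIN(z) per x over this node's local join of R and S."""
    partial = {}
    for x, z in local_join_results:
        partial[x] = min(z, partial.get(x, z))
    return partial

def merge_partials(partials):
    """Re-aggregate partial results received after the rehash on x."""
    merged = {}
    for p in partials:
        for x, z in p.items():
            merged[x] = min(z, merged.get(x, z))
    return merged

# Example: two nodes' partial results combine into the final groups.
final = merge_partials([partial_min([("a", 3), ("b", 7)]), partial_min([("a", 5)])])  # {"a": 3, "b": 7}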