Thesis translation source text: Reliable Storage and Querying for Collaborative Data Sharing Systems (7)
Date: 2025-07-06
The main difference in ORCHESTRA is that we are concerned with computing the exact (i.e., correct and complete) answer set over finite relations, rather than doing best-effort computations over unbounded streams. Moreover, we support scientific data sharing confederations with hundreds of peers, not hundreds of thousands or millions of peers.
A. Architecture for Performance and Failure Detection

Several aspects of our query processing architecture are enabled by our custom hash-based substrate, versus an existing DHT like those used in PIER (Chord [8] or a hybrid between Gnutella and Chord [16]) or Seaweed (Pastry). However, we additionally develop several techniques at the query execution level that are vital for performance and correctness.
First, for failure detection and efficiency, the query processor benefits from the fact that our substrate uses TCP to manage connections between machines. A “downstream” node almost immediately detects when an “upstream” node has failed, because the TCP connection drops. It also automatically provides flow control in the event of a congested network. In contrast, the DHTs in prior work tend to do little or no flow control, and they rely on occasional pings to eventually detect failures. (Of course, we could have alternatively used UDP, and implemented fast failure detection and flow control via periodic handshaking.)
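As an illustration of this behavior, the sketch below (not ORCHESTRA's actual code; the function and parameter names are our own) shows how a receiver over a plain TCP socket learns of an upstream failure as soon as the connection drops, while TCP's windowing throttles a fast sender for free.

import socket

def receive_tuple_blocks(sock, block_size=4096):
    """Yield raw tuple blocks from an upstream node; stop when it fails or finishes."""
    while True:
        try:
            # Blocks until data arrives; TCP's window provides flow control automatically.
            data = sock.recv(block_size)
        except OSError:
            return  # connection dropped: upstream failure detected immediately
        if not data:
            return  # orderly end of stream from the upstream node
        yield data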
Second, for failure recovery, the query processor is given direct information about the state of the routing tables. A snapshot of the routing tables is taken by the query initiator as it invokes the query. This snapshot is disseminated along with the query plan to all nodes, in order to ensure absolute consistency of the routing tables. If one or more nodes fail in the middle of execution, the difference in the routing tables is reported back to the query initiator, such that it can incrementally recompute only the lost portion of the query state (Section V-D).
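A minimal sketch of this idea follows; the snapshot representation (a map from hash ranges to owning nodes) and the diff function are illustrative assumptions, not the paper's actual data structures.

def routing_table_diff(snapshot, current):
    """Return the hash ranges whose owner changed since the query started."""
    return {rng: owner for rng, owner in snapshot.items() if current.get(rng) != owner}

# Example: node n2 failed mid-query and node n3 took over its range.
snapshot = {(0x0000, 0x7FFF): "n1", (0x8000, 0xFFFF): "n2"}
current = {(0x0000, 0x7FFF): "n1", (0x8000, 0xFFFF): "n3"}
lost = routing_table_diff(snapshot, current)  # {(0x8000, 0xFFFF): "n2"}
# The initiator would recompute only the query state that lived on the ranges in `lost`.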
Third, for performance, the query processor batches tuples into blocks by destination, compressing them (using lightweight Zip-based compression) and marshalling them in a format that exploits their commonalities. This makes query processing much more efficient than if it were built over a DHT with many smaller messages, and reduces CPU and bandwidth use.
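The sketch below illustrates the batching-and-compression step; zlib and pickle stand in for the custom marshalling format that exploits tuple commonalities, and the block size and function name are assumptions made for illustration.

import pickle
import zlib
from collections import defaultdict

def batch_by_destination(tuples_with_dest, block_size=100):
    """Group outgoing tuples by destination node and compress each full block."""
    pending = defaultdict(list)
    blocks = []  # (destination, compressed payload) pairs ready to send
    for dest, tup in tuples_with_dest:
        pending[dest].append(tup)
        if len(pending[dest]) >= block_size:
            blocks.append((dest, zlib.compress(pickle.dumps(pending.pop(dest)))))
    for dest, tups in pending.items():  # flush the remaining partial blocks
        blocks.append((dest, zlib.compress(pickle.dumps(tups))))
    return blocks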
Finally, for correctness, each tuple is annotated with information about which source nodes supplied the data from which the tuple was derived. This is used to prevent duplicate answers when recovering from a failed node.
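One plausible reading of this annotation, shown below purely as a sketch (the representation and the filtering rule are our assumptions), is to carry a set of source-node IDs with each derived tuple and, during recovery, drop recomputed tuples whose lineage is already covered by data received before the failure.

def annotate(tup, source_nodes):
    """Attach the set of source nodes whose data produced this derived tuple."""
    return (tup, frozenset(source_nodes))

def filter_duplicates(recovered, already_received_sources):
    """Keep only recovered tuples whose lineage is not already covered."""
    return [tup for tup, sources in recovered if not sources <= already_received_sources]

# Example: a tuple derived solely from n1's data was already received, so it is dropped.
recovered = [annotate(("a", 1), {"n1"}), annotate(("b", 2), {"n2"})]
survivors = filter_duplicates(recovered, {"n1"})  # [(("b", 2), frozenset({"n2"}))]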
B. Query Execution
A query plan consists of the operators listed in Table I. The query is “driven” by some combination of the leaf-level scan operators described in the table; each is novel to our system, as it exploits the specific versioned indexing scheme used in our storage system. Such operators typically run concurrently across all of the nodes in the system, each operating on a data partition stored at those nodes.
From there, the retrieved data may be passed locally through a series of pipelined operators, such as joins or function evaluation. This continues until the next operator is either a ship operator or a rehash operator.
TABLE I: OPERATORS IN THE ORCHESTRA QUERY ENGINE

Covering index scan: retrieves data directly from the index nodes, if only key attributes are required, bypassing the data storage nodes.
Distributed scan: executes at both the index nodes and the data storage nodes, similar to Algorithm 1, with the data storage nodes receiving filtered collections of tuple IDs that pass a sargable predicate. The tuples from each index page are stored nearby on disk, and are retrieved in a single pass through the hash ID range for that page. Instead of being sent back to the query initiator, the resulting tuples are pushed through the query plan.
Select: implements selection on intermediate results.
Project: the standard projection operator.
Join: a pipelined hash join [17].
Aggregate: a blocking, hash-based grouping operator, which supports re-aggregation of partially aggregated intermediate results.
Ship: sends the tuples it receives to the query initiator.
Rehash: partitions its input among the system nodes by hashing on some subset of the tuples' attributes.
Compute-function: performs scalar function evaluation, such as arithmetic or string concatenation.
The ship operator sends the data it receives to the query initiator. The rehash operator partitions its input tuples by their hash IDs in the networking substrate and sends them to other nodes in the system. Rehashing is commonly used to enable joins or aggregation, when a relation needs to be re-partitioned on a join or grouping key. The rehash operator routes tuples to a destination node by first hashing the key using the SHA-1 hash function, then consulting the snapshot of the query routing table described previously.
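A sketch of that routing step is below; the snapshot layout (a sorted list of range-start hash IDs plus the owning node per range) is an assumption for illustration, not ORCHESTRA's actual structure.

import bisect
import hashlib

def rehash_destination(key, range_starts, range_owners):
    """Map a join/grouping key to the node owning its SHA-1 hash ID in the routing snapshot."""
    hash_id = int.from_bytes(hashlib.sha1(key.encode("utf-8")).digest(), "big")
    idx = bisect.bisect_right(range_starts, hash_id) - 1
    return range_owners[idx]

# Example snapshot with two ranges: [0, 2**159) owned by n1, [2**159, 2**160) owned by n2.
range_starts = [0, 2 ** 159]
range_owners = ["n1", "n2"]
dest = rehash_destination("y-value-42", range_starts, range_owners)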
Each operator sends an end-of-stream notification to its parent operator when it finishes executing. Scans can easily detect when they are done, and most other operators simply propagate an end-of-stream notification downstream after they receive it (perhaps first performing some final computation to produce results, as in an aggregate operator). However, detecting end-of-stream with the rehash operator is slightly tricky: it cannot complete until it has acknowledgment from all downstream nodes that they have received all of the data it sent. Once all operators have encountered the end of the stream, the query is complete.
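The push-based sketch below illustrates this end-of-stream protocol; the classes and the per-destination acknowledgment counters are illustrative assumptions rather than ORCHESTRA's implementation.

class Operator:
    """A push-based operator that forwards tuples and end-of-stream to its parent."""
    def __init__(self, parent=None):
        self.parent = parent

    def push(self, tup):
        if self.parent:
            self.parent.push(tup)

    def end_of_stream(self):
        if self.parent:
            self.parent.end_of_stream()

class RehashLike(Operator):
    """Cannot complete until every destination has acknowledged all blocks sent to it."""
    def __init__(self, parent, destinations):
        super().__init__(parent)
        self.unacked_blocks = dict.fromkeys(destinations, 0)
        self.input_done = False

    def record_send(self, dest):
        self.unacked_blocks[dest] += 1

    def record_ack(self, dest):
        self.unacked_blocks[dest] -= 1
        self.maybe_complete()

    def end_of_stream(self):
        self.input_done = True
        self.maybe_complete()

    def maybe_complete(self):
        # Propagate end-of-stream only after input ends AND every sent block is acknowledged.
        if self.input_done and all(n == 0 for n in self.unacked_blocks.values()):
            super().end_of_stream()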
Example 5.1: Continuing our example, a given node may initiate the following query:

SELECT x, MIN(z) FROM R, S
WHERE R.y = S.y GROUP BY x
ORCHESTRA can use the execution plan in Figure 6 (annotated with the tuples at each stage in the plan, in italics).
Each node joins tuples from relations R and S on attribute y, and then groups the results by x. Before this computation can begin, tuples must be redistributed ...
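As a sketch of how the Aggregate operator's support for re-aggregation applies to this query (the plan shape assumed here, local pre-aggregation before the rehash on x, is our illustration and may differ from Figure 6), each node can compute a partial MIN per group and the receiving nodes can merge the partials:

def partial_min(local_join_results):
    """Compute a partial MIN(z) per x over this node's local join of R and S."""
    partial = {}
    for x, z in local_join_results:
        partial[x] = min(z, partial.get(x, z))
    return partial

def merge_partials(partials):
    """Re-aggregate partial results received after the rehash on x."""
    merged = {}
    for p in partials:
        for x, z in p.items():
            merged[x] = min(z, merged.get(x, z))
    return merged

# Example: two nodes' partial results combine into the final groups.
final = merge_partials([partial_min([("a", 3), ("b", 7)]), partial_min([("a", 5)])])  # {"a": 3, "b": 7}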