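Graduation project English-to-Chinese translation, original text: Reliable Storage and [...] for Collaborative Data Sharing Systems (9)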

Date: 2025-07-06

generating all join and grouping results dependent on them. Re-create data that was sent to the failed nodes’ hash key space ranges. Additionally, any data that was sent to a failed node was either lost when the node failed or has become tainted by passing through the node and will therefore be discarded. Now all data that was to have been sent to the failed nodes must be retransmitted. If an operator maintains an in-memory snapshot of all data necessary to reproduce its answers (as with a pipelined hash join), this is relatively efficient. For more costly operations such as tablescans, we add a cache of their output data at the downstream rehash or ship operator. It is easy to detect which of the reproduced tuples would have been sent to a failed node by consulting the query’s original routing table.
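The routing-table check described above can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration, not the system's actual code: the 16-bit key space, the SHA-1 based `hash_key`, the node names, and the layout of `ROUTING_TABLE` are all hypothetical.

```python
import hashlib
from bisect import bisect_right

# Hypothetical routing table: sorted (range_start, node) pairs over a 16-bit
# hash key space. A node owns the keys from its start up to the next start.
ROUTING_TABLE = [(0, "n0"), (16384, "n1"), (32768, "n2"), (49152, "n3")]
STARTS = [start for start, _ in ROUTING_TABLE]


def hash_key(value):
    """Map a rehash key to a position in the (assumed) 16-bit key space."""
    digest = hashlib.sha1(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:2], "big")


def owner(value):
    """Return the node the original routing table assigned this key to."""
    idx = bisect_right(STARTS, hash_key(value)) - 1
    return ROUTING_TABLE[idx][1]


def tuples_to_resend(cached_output, key_index, failed_nodes):
    """Pick the cached tuples whose original destination was a failed node."""
    return [t for t in cached_output if owner(t[key_index]) in failed_nodes]
```

For example, `tuples_to_resend(cache, 0, {"n1"})` returns only the cached tuples whose first attribute hashes into the range that `n1` owned, which is the set that would have to be retransmitted under these assumptions.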

Perhaps the most difficult task in recovery is avoiding race conditions that lead to subtly incorrect query results. We have chosen to divide computation into phases corresponding to the initial execution, followed by successive incremental recovery invocations. Each tuple gets tagged with a phase. As each stateful operator processes a recovery message, it purges tainted data and increments its phase counter. All tuples it (re)produces are in this new phase. This allows the system to differentiate between old, in-flight data from a failed node and new, recomputed results from recovery.
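A simplified sketch of the phase-tagging idea follows. It is not the paper's protocol: the `via` set (which nodes a tuple passed through), the class names, and the acceptance policy (reject a tuple from an older phase if it touched a known-failed node) are assumptions chosen to keep the example small.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class TaggedTuple:
    values: tuple
    phase: int       # phase in which the tuple was (re)produced
    via: frozenset   # nodes the tuple passed through (assumed bookkeeping)


@dataclass
class StatefulOperator:
    phase: int = 0
    state: list = field(default_factory=list)
    failed: set = field(default_factory=set)

    def emit(self, values, via=()):
        """Tag an output tuple with the operator's current phase."""
        return TaggedTuple(tuple(values), self.phase, frozenset(via))

    def accept(self, t):
        """Drop old in-flight tuples that touched a failed node; keep the rest."""
        if t.phase < self.phase and (t.via & self.failed):
            return False
        self.state.append(t)
        return True

    def on_recovery(self, failed_nodes):
        """Purge tainted state, then advance to a new phase."""
        self.failed |= set(failed_nodes)
        self.state = [t for t in self.state if not (t.via & self.failed)]
        self.phase += 1
```

After `on_recovery(["n1"])`, a late-arriving phase-0 tuple that passed through `n1` is rejected, while a tuple recomputed in phase 1 is accepted, which mirrors the distinction between old in-flight data and recomputed results.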

VI. EXPERIMENTAL EVALUATION

We briefly describe our implementation, which has been under development for more than two years.

Query Engine. Our execution engine is implemented in approximately 50,000 lines of Java. It uses BerkeleyDB Java Edition 3.3.69 for persistent storage of data. We conducted most experiments on a 16-node cluster of dual-core 2.4 GHz Xeon machines with 4 GB RAM running Fedora 10, connected by Gigabit Ethernet. To study performance at scale, we used up to 100 2 GHz dual-core nodes from Amazon’s EC2 cloud computing service.

Query Optimizer. The focus of this paper is on the distributed execution engine of ORCHESTRA, but we briefly describe its optimizer. It currently handles single-block SQL queries, including function evaluation and grouping. It adopts the Volcano [18] transformational model, using top-down enumeration of plans with memoization, and employing branch-and-bound pruning to discard alternative query plans when their cost exceeds the cost of a known query plan. Our optimizer considers bushy as well as linear query plans. It relies on information (previously computed and stored) about machine CPU and disk performance, as well as pairwise bandwidth. The optimizer estimates costs by assuming that each horizontally partitioned relation will be evenly distributed by the storage layer across all nodes. It then estimates the cost of a subplan by considering the cost at the slowest node or link that must be used at each stage, in a sense estimating the worst-case expected completion time of each operation.
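The "slowest node or link per stage" estimate and the branch-and-bound cutoff can be sketched as follows. This is only an illustration: the function names, the plan representation (a list of per-stage dictionaries of estimated seconds), and the choice to sum stage costs are assumptions, since the text does not say how stage costs are combined.

```python
def stage_cost(per_resource_seconds):
    """A stage is assumed to finish when its slowest node or link finishes."""
    return max(per_resource_seconds.values())


def plan_cost(stages, bound=float("inf")):
    """Sum stage costs; return None once the running total exceeds the bound."""
    total = 0.0
    for stage in stages:
        total += stage_cost(stage)
        if total > bound:
            return None  # pruned: already worse than a known plan
    return total


def cheapest(plans):
    """Scan candidate plans, pruning each against the best cost seen so far."""
    best_name, best_cost = None, float("inf")
    for name, stages in plans.items():
        cost = plan_cost(stages, bound=best_cost)
        if cost is not None and cost < best_cost:
            best_name, best_cost = name, cost
    return best_name, best_cost
```

With a hypothetical input such as `{"linear": [{"n0": 4.0, "n1": 6.0}, {"n0->n1": 3.0}], "bushy": [{"n0": 5.0, "n1": 2.0}, {"n0->n1": 1.0}]}`, the linear plan costs 6.0 + 3.0 and the bushy plan costs 5.0 + 1.0, so the bushy plan is selected.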

A. Workload

Queries that are generated from schema mappings, as in data exchange and collaborative data sharing systems, are primarily select-project-join queries that vary from domain to domain, and are seldom publicly available. A recent benchmark suite, STBenchmark [19], has been proposed to create synthetic data exchange schema mappings along a variety of dimensions.

We ran the STBenchmark instance and mapping generator with the default parameters, but with the nesting depth set to zero to produce relational data. We varied the size of each generated relation from 100K to 1.6M tuples (the maximum the ToXGene generator would produce due to memory constraints). Except for one field, all STBenchmark tables are wide relations containing many 25-character variable length strings (which are not necessarily representative of typical data exchange settings). Nonetheless, we selected a representative subset of the STBenchmark mapping scenarios to study: (1) Copy, which retrieves an entire 7-attribute relation, (2) Select, which retrieves the tuples from a 6-attribute relation that satisfy a simple integer inequality predicate, (3) Join, which combines a 7-, a 5-, and a 9-attribute relation by joining them on two attributes, (4) Concatenate, which retrieves a 6-attribute relation, concatenates three of those attributes together, and returns the result along with the remaining three attributes, and (5) Correspondence, which retrieves a 7-attribute relation and uses a correspondence table to add an integer-valued ID based on two of the input attributes to the result. The last query used a Skolem function (ID generator) in the output, which we replaced with a value correspondence table, as would likely be used in practice.
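To make three of these scenarios concrete, here is a small sketch over in-memory tuples. The attribute positions, the direction of the inequality (`>`), and the shape of the correspondence table are hypothetical; STBenchmark's actual schemas and predicates are not given in the text.

```python
def select(rows, attr, threshold):
    """Select: keep rows whose integer attribute exceeds a threshold (assumed '>')."""
    return [r for r in rows if r[attr] > threshold]


def concatenate(rows, parts, rest):
    """Concatenate: join the 'parts' attributes into one string, keep the 'rest'."""
    return [
        ("".join(str(r[a]) for a in parts),) + tuple(r[a] for a in rest)
        for r in rows
    ]


def correspondence(rows, key_attrs, table):
    """Correspondence: append the integer ID that the table assigns to two attributes."""
    return [r + (table[tuple(r[a] for a in key_attrs)],) for r in rows]
```

For instance, `correspondence(rows, (0, 1), {("x", "y"): 17})` appends the ID 17 to every row whose first two attributes are `"x"` and `"y"`, which is the lookup that stands in for the Skolem function.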

To add diversity and scale to our data and queries, we also experimented with the standard TPC-H OLAP benchmark: (1) it scales to a variety of sizes, enabling us to consider dataset scalability, (2) it contains a diverse set of queries, enabling us to identify different performance factors, (3) it is a well-understood and standard benchmark for comparison. We used the standard TPC-H data generator to create source data at several scale factors, and we selected the TPC-H queries meeting the single-SQL-block requirement of our optimizer.

We distributed the 8 TPC-H tables by partitioning on their key attribute (first key attribute, if more than one attribute was present). Two of the tables, Nation and Region, were small enough that we replicated them at each node; together they take up less than 3 KB on disk. We use TPC-H queries 1, 3, 5, 6, and 10, and measure running time to completion of the full query. Queries 1 and 6 are aggregation queries over the Lineitem table; Q1 performs a distributed aggregation followed by re-aggregation at the query coordinator, while Q6 only performs an aggregation at the coordinator. Queries 3, 5, and 10 are 3-way, 6-way, and 4-way joins, respectively, followed by aggregation.
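The Q1-style pattern of distributed aggregation followed by re-aggregation at the coordinator can be sketched as below. The function names and the `(group, quantity)` row shape are assumptions for illustration; the real Q1 computes several aggregates over more columns.

```python
from collections import defaultdict


def partial_aggregate(rows):
    """Per-node step: group local rows and keep a (sum, count) pair per group."""
    acc = defaultdict(lambda: [0, 0])
    for group, qty in rows:
        acc[group][0] += qty
        acc[group][1] += 1
    return {g: tuple(v) for g, v in acc.items()}


def reaggregate(partials):
    """Coordinator step: merge per-node partials, then derive the average."""
    merged = defaultdict(lambda: [0, 0])
    for partial in partials:
        for g, (s, c) in partial.items():
            merged[g][0] += s
            merged[g][1] += c
    return {g: {"sum": s, "count": c, "avg": s / c} for g, (s, c) in merged.items()}
```

Carrying sum and count separately is what makes the average mergeable: averaging per-node averages would give a wrong answer whenever nodes hold different numbers of rows per group. In the Q6-style alternative, the nodes would ship qualifying rows and only the coordinator would aggregate.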

All measurements were taken after results converged to a stable range of values; this is done t ……
