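Graduation project, English-to-Chinese translation source text: Collaborative data sharing system, reliable storage and (12)

Date: 2026-05-08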

most 2% (for Q10). In our view, this overhead is low enough to make it worthwhile if there is a reasonable expectation of node failure, particularly for long-running queries where the cost of restart may be high. Such an expectation goes up as more nodes join (and query running times go down, reducing the overall amount of overhead). Also, if query performance is limited by available network bandwidth, incremental recovery becomes almost free due to the low network overhead, and restarting becomes more expensive.
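The trade-off described above can be made concrete with a small back-of-the-envelope model. The sketch below is illustrative only and is not the cost model used in the experiments: it assumes at most one failure per query, striking with probability `p` at a uniformly random point in the run, and it introduces a hypothetical parameter `recovery_fraction` for the share of the query time that incremental recomputation takes. All function and parameter names are assumptions made for this example.

```python
def expected_restart_time(t, p):
    """Expected running time when a failure forces a full restart.

    Assumes at most one failure, occurring with probability p at a
    uniformly random point, so on average t / 2 of work is thrown away.
    """
    return t + p * (t / 2)


def expected_incremental_time(t, p, overhead, recovery_fraction):
    """Expected running time with recovery support always enabled.

    The overhead is paid on every run; after a failure only a fraction
    of the query time (recovery_fraction, hypothetical) is recomputed.
    """
    return t * (1 + overhead) + p * recovery_fraction * t


def break_even_failure_probability(overhead, recovery_fraction):
    """Failure probability above which incremental recovery is cheaper."""
    if recovery_fraction >= 0.5:
        # In this simple model recovery costs as much as the expected
        # wasted work of a restart, so it never pays off.
        raise ValueError("recovery never beats restart in this model")
    return overhead / (0.5 - recovery_fraction)
```

Under these assumptions, a 2% overhead with `recovery_fraction = 0.1` breaks even at a failure probability of about 0.05 per query; above that, paying the overhead on every run is the cheaper choice in expectation.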

VII. RELATED WORK

Distributed hash table-based query processors have largely targeted the domain of Internet-scale network monitoring, where nodes located throughout the Internet each process large amounts of typically streaming data. The PIER system [5] developed implementations of the pipelined hash join and Bloomjoin over a DHT, as well as schemes for computing aggregation over a tree-like structure of nodes. Seaweed [6] focuses on distributed aggregation, including proactive computation of aggregates, and latency-based cost estimation. In both of these systems, the focus is on throughput and best-effort query processing using many peers operating on large amounts of data; completeness and consistency are not essential. Our target domain is more controlled and smaller, with certain parameters closer to distributed DBMSs, but also has storage, consistency, and completeness requirements.

Reliable query processing is a topic of study dating back at least to IBM’s R* [20] and perhaps best known commercially as Tandem NonStop SQL [21]. However, their consistency model and definition of reliability differ from ours. In existing work, the problem is detecting a failed machine in a local cluster and possibly aborting and restarting a query. Our goal is to incrementally recompute “missing” answers where possible, in order to complete query computation. Also, our consistency model is somewhat simpler because we do not consider transactions, and relations are only updated by their “owners.”

Recent work on cloud data services, such as [10], [11], seeks to develop reliable, batch-oriented, DBMS-like capabilities over Hadoop and immutable files stored in HDFS. Sinfonia [22] seeks to develop failure-tolerant “mini-transactions” to support distributed state management in a cluster.

VIII. CONCLUSIONS AND FUTURE WORK

This paper has shown how to provide a reliable peer-to-peer storage and query execution engine for a CDSS. This involves a richer networking substrate, novel differential indexing schemes to guarantee the correct versions of all tuples are used during processing, and a query evaluator that is carefully matched to this substrate. We developed techniques for handling failures through incremental or full recomputation, and showed the trade-offs between these approaches.

There are a number of directions in which we would like to extend this work. One is to make use of materialized views, perhaps arising from the cached results of previous queries, to improve execution performance, though as in a centralized database the cost of freshening and using a view may outweigh its benefit. Another promising avenue is to implement automatic load-balancing by adjusting the routing table, to compensate for unequal network bandwidth or available machine resources. Finally and most importantly, now that the system is stable and fully functional, we plan to integrate it as a component of the ORCHESTRA system, realizing the truly peer-to-peer nature of a CDSS.

ACKNOWLEDGMENTS

This work was funded in part by NSF grants IIS-0477972, IIS-0713267, IIS-0513778, and CNS-0721541. We thank the Penn Database Group, especially TJ Green, Greg Karvounarakis, and Svilen Mihaylov, and our anonymous reviewers for their feedback. We are also grateful to the authors of STBenchmark for their technical assistance.

REFERENCES

[1] Z. G. Ives, T. J. Green, G. Karvounarakis, N. E. Taylor, V. Tannen, P. P. Talukdar, M. Jacob, and F. Pereira, “The ORCHESTRA collaborative data sharing system,” SIGMOD Rec., 2008.

[2] N. E. Taylor and Z. G. Ives, “Reconciling while tolerating disagreement in collaborative data sharing,” in SIGMOD, 2006.

[3] T. J. Green, G. Karvounarakis, Z. G. Ives, and V. Tannen, “Update exchange with mappings and provenance,” in VLDB, 2007, amended version available as Univ. of Pennsylvania report MS-CIS-07-26.

[4] A. Rowstron and P. Druschel, “Pastry: Scalable, distributed object location and routing for large-scale peer-to-peer systems,” in Middleware, 2001.