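Graduation project English-to-Chinese translation, source text: Reliable Storage and ... for Collaborative Data Sharing Systems (12)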

Date: 2025-07-06

most 2% (for Q10). In our view, this overhead is low enough to make it worthwhile if there is a reasonable expectation of node failure, particularly for long-running queries where the cost of restart may be high. Such an expectation goes up as more nodes join (and query running times go down, reducing the overall amount of overhead). Also, if query performance is limited by available network bandwidth, incremental recovery becomes almost free due to the low network overhead, and restarting becomes more expensive.
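The trade-off described above can be illustrated with a back-of-the-envelope expected-cost model. This is only a sketch under simplifying assumptions that are not taken from the paper: at most one failure per query, a failure that on average strikes halfway through execution, and a hypothetical recomputation cost of 10% of the base running time. Only the 2% overhead figure comes from the text; all function and parameter names are illustrative.

```python
def expected_time_with_restart(base_time, p_fail, progress_at_failure=0.5):
    """Expected running time when a node failure forces a full restart.

    Assumption: at most one failure per query. On failure, the work done so
    far (progress_at_failure * base_time) is lost and the query reruns in full.
    """
    return base_time + p_fail * progress_at_failure * base_time


def expected_time_with_recovery(base_time, p_fail, overhead=0.02,
                                recompute_fraction=0.1):
    """Expected running time with incremental recovery enabled.

    The query always pays the recovery bookkeeping overhead. On failure it
    additionally recomputes only the missing answers, modeled here as a
    hypothetical fraction (recompute_fraction) of the base running time.
    """
    return base_time * (1.0 + overhead) + p_fail * recompute_fraction * base_time


def break_even_failure_probability(overhead=0.02, progress_at_failure=0.5,
                                   recompute_fraction=0.1):
    """Failure probability above which recovery is cheaper in expectation."""
    saved_per_failure = progress_at_failure - recompute_fraction
    if saved_per_failure <= 0:
        # Recovery costs as much as a restart, so the overhead never pays off.
        return float("inf")
    return overhead / saved_per_failure


if __name__ == "__main__":
    for p in (0.01, 0.05, 0.2):
        restart = expected_time_with_restart(100.0, p)
        recovery = expected_time_with_recovery(100.0, p)
        print(f"p_fail={p}: restart={restart:.1f}, recovery={recovery:.1f}")
    print("break-even p_fail:", break_even_failure_probability())
```

Under these assumed parameters, the overhead pays for itself once the per-query failure probability exceeds roughly the overhead divided by the work saved per failure, which matches the qualitative argument that more nodes (and hence more likely failures) favor incremental recovery.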

VII. RELATED WORK

Distributed hash table-based query processors have largely targeted the domain of Internet-scale network monitoring, where nodes located throughout the Internet each process large amounts of typically streaming data. The PIER system [5] developed implementations of the pipelined hash join and Bloomjoin over a DHT, as well as schemes for computing aggregation over a tree-like structure of nodes. Seaweed [6] focuses on distributed aggregation, including proactive computation of aggregates, and latency-based cost estimation. In both of these systems, the focus is on throughput and best-effort query processing using many peers operating on large amounts of data; completeness and consistency are not essential. Our target domain is more controlled and smaller (with certain parameters closer to distributed DBMSs) but also has storage, consistency, and completeness requirements.

Reliable query processing is a topic of study dating back at least to IBM’s R* [20] and perhaps best known commercially as Tandem NonStop SQL [21]. However, their consistency model and definition of reliability differ from ours. In existing work, the problem is detecting a failed machine in a local cluster and possibly aborting and restarting a query. Our goal is to incrementally recompute “missing” answers where possible, in order to complete query computation. Also, our consistency model is somewhat simpler because we do not consider transactions, and relations are only updated by their “owners.”

Recent work on cloud data services, such as [10], [11], seeks to develop reliable, batch-oriented, DBMS-like capabilities over Hadoop and immutable files stored in HDFS. Sinfonia [22] seeks to develop failure-tolerant “mini-transactions” to support distributed state management in a cluster.

VIII. CONCLUSIONS AND FUTURE WORK

This paper has shown how to provide a reliable peer-to-peer storage and query execution engine for a CDSS. This involves a richer networking substrate, novel differential indexing schemes to guarantee the correct versions of all tuples are used during processing, and a query evaluator that is carefully matched to this substrate. We developed techniques for handling failures through incremental or full recomputation, and showed the trade-offs between these approaches.

There are a number of directions in which we would like to extend this work. One is to make use of materialized views, perhaps arising from the cached results of previous queries, to improve execution performance, though as in a centralized database the cost of freshening and using a view may outweigh its benefit. Another promising avenue is to implement automatic load-balancing by adjusting the routing table, to compensate for unequal network bandwidth or available machine resources. Finally and most importantly, now that the system is stable and fully functional, we plan to integrate it as a component of the ORCHESTRA system, realizing the truly peer-to-peer nature of a CDSS.

ACKNOWLEDGMENTS

This work was funded in part by NSF grants IIS-0477972, IIS-0713267, IIS-0513778, and CNS-0721541. We thank the Penn Database Group, especially TJ Green, Greg Karvounarakis, and Svilen Mihaylov, and our anonymous reviewers for their feedback. We are also grateful to the authors of STBenchmark for their technical assistance.

REFERENCES

[1] Z. G. Ives, T. J. Green, G. Karvounarakis, N. E. Taylor, V. Tannen, P. P. Talukdar, M. Jacob, and F. Pereira, “The ORCHESTRA collaborative data sharing system,” SIGMOD Rec., 2008.
[2] N. E. Taylor and Z. G. Ives, “Reconciling while tolerating disagreement in collaborative data sharing,” in SIGMOD, 2006.
[3] T. J. Green, G. Karvounarakis, Z. G. Ives, and V. Tannen, “Update exchange with mappings and provenance,” in VLDB, 2007, amended version available as Univ. of Pennsylvania report MS-CIS-07-26.
[4] A. Rowstron and P. Druschel, “Pastry: Scalable, distributed object location and routing for large-scale peer-to-peer systems,” in Middleware, 2001.
[5] R. Huebsch, B. N. Chun, J. M. Hellerstein, B. T. Loo, P. Maniatis, T. Roscoe, S. Shenker, I. Stoica, and A. R. Yumerefendi, “The architecture of PIER: an Internet-scale query processor,” in CIDR, 2005.
[6] D. Narayanan, A. Donnelly, R. Mortier, and A. Rowstron, “Delay aware querying with Seaweed,” in VLDB, 2006.
[7] S. Ratnasamy, P. Francis, M. Handley, R. Karp, and S. Shenker, “A scalable content-addressable network,” in SIGCOMM, 2001.
[8] I. Stoica, R. Morris, D. Karger, M. F. Kaashoek, and H. Balakrishnan, “Chord: A scalable peer-to-peer lookup service for Internet applications,” in SIGCOMM, 2001.
[9] S. Ghemawat, H. Gobioff, and S.-T. Leung, “The Google file system,” in SOSP, 2003.
[10] C. Olston, B. Reed, U. Srivastava, R. Kumar, and A. Tomkins, “Pig Latin: a not-so-foreign language for data processing,” in SIGMOD, 2008.
[11] B. F. Cooper, R. Ramakrishnan, U. Srivastava, A. Silberstein, P. Bohannon, H.-A. Jacobsen, N. Puz, D. Weaver, and R. Yerneni, “PNUTS: Yahoo!’s hosted data serving platform,” PVLDB, vol. 1, no. 2, 2008.
[12] “Amazon Simple Storage Service (Amazon S3),” 2008, aws.amazon.com/s3.
[13] A. Gupta, B. Liskov, and R. Rodrigues, “Efficient routing for peer-to-peer overlays,” in NSDI, 2004.
[14] P. Druschel and A. I. T. Rowstron, “PAST: A large-scale, persistent peer-to-peer storage utility,” in HotOS, 2001, pp. 75– ……
