Graduation project English-to-Chinese translation source text: Reliable Storage and … of Collaborative Data Sharing Systems (3)
Date: 2025-07-06
scheme to combine and process the data; and distributing data tuple-by-tuple according to a key, using a distributed hash scheme to route messages to nodes in a network. Google's MapReduce and GFS, as well as Hadoop and HDFS, use the former model. Distributed hash tables (DHTs) [4], [7], [8] and directory-based schemes use the latter.
Distributed filesystems suffer from several drawbacks as the basis of a query engine. First, they require a single administrative domain and (at least in current implementations like HDFS) a single coordinator node (the NameNode), which introduces a single point of failure. Moreover, they actually use two different distribution models: base data is partitioned on a per-page basis, then all multi-pass query operations (joins, aggregation, nesting) must be executed through a MapReduce scheme that partitions the data on keys (via sorting or hashing). We instead adopt a tuple-by-tuple hash-based distribution scheme for routing messages: this is commonly referred to as a content-addressable overlay network and is exemplified by the DHT. Our goal is to provide good performance and to tolerate nodes joining or failing, but we do not require scalability to millions of nodes as with the DHT. In Section III we adapt some of the key ideas of the DHT in order to accomplish this.
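As a rough illustration of the contrast (a sketch of our own, not the system's actual routing code; the node names are hypothetical), tuple-by-tuple hash-based placement decides each tuple's location from its key alone, rather than from the page or file that contains it:

```python
import hashlib

# Hypothetical four-node cluster; the node names are illustrative only.
NODES = ["node-A", "node-B", "node-C", "node-D"]

def route(tuple_key: str) -> str:
    """Place a single tuple on a node chosen by a hash of its key."""
    digest = hashlib.sha1(tuple_key.encode("utf-8")).digest()
    return NODES[int.from_bytes(digest, "big") % len(NODES)]

# Each tuple is routed independently of the page or file it came from:
for key in ("orders:17", "orders:18", "customers:42"):
    print(key, "->", route(key))
```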
B. Versioned Storage
Each time a participant in ORCHESTRA publishes its updates, we create a new version of that participant's update log (stored as tables). This also results in a new version of the global state published to the CDSS. Now, when a participant in ORCHESTRA imports data via update exchange and reconciliation, it expects to receive a consistent, complete set of answers according to some version of that global state. We support this with a storage scheme (described in Section IV) that tracks state across versions, and manages replication and failover when node membership changes, such that queries receive a "snapshot" of the data according to a version. We optimize for the fact that most published updates will be new data rather than revisions to current data.

When data is stored in a traditional content-addressable network, background replication methods ensure that all data is eventually replicated and placed where it belongs when a node fails, but if the set of participants is changing, then data may temporarily be missed during query processing. Furthermore, such systems also require the data assigned to each key to be immutable. Similarly, existing distributed filesystems like GFS and HDFS assume data is within immutable files, and they are additionally restricted to a single administrative domain.
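A minimal sketch of the versioning idea, assuming a simple in-memory store of our own invention rather than ORCHESTRA's actual storage layer: each publish installs a batch of updates as a new global version, and readers can still query a consistent snapshot of any earlier version.

```python
from collections import defaultdict

class VersionedStore:
    """Toy multi-version store (our own illustration, not ORCHESTRA's code):
    publish() installs a batch of updates as a new global version, and
    get() answers against a snapshot, i.e. the latest value at or before
    the requested version."""

    def __init__(self):
        self.current_version = 0
        # key -> list of (version, value), oldest first; old versions are kept.
        self.history = defaultdict(list)

    def publish(self, updates: dict) -> int:
        """Install a batch of updates as a single new version."""
        self.current_version += 1
        for key, value in updates.items():
            self.history[key].append((self.current_version, value))
        return self.current_version

    def get(self, key, version=None):
        """Read a key as of the given version (default: the latest snapshot)."""
        if version is None:
            version = self.current_version
        result = None
        for v, value in self.history.get(key, []):
            if v > version:
                break
            result = value
        return result

store = VersionedStore()
v1 = store.publish({"p1:update-log": ["+t1", "+t2"]})
v2 = store.publish({"p1:update-log": ["+t1", "+t2", "-t1"]})
print(store.get("p1:update-log", version=v1))  # the v1 snapshot is still readable
print(store.get("p1:update-log"))              # the latest (v2) snapshot
```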
Hence our versioned storage scheme must provide more bookkeeping than a traditional distributed hash table, but offers more autonomy and flexibility than a distributed filesystem. In Section III we describe our customized data storage, partitioning, and distributed lookup layer.
C. Query Processing Layer
Fig. 2. Range allocation schemes: (a) Pastry-style range allocation; (b) balanced range allocation.

As is further discussed in Section VII, a number of existing query processing systems, including PIER [5] and Seaweed [6], have employed DHTs to perform large-scale, "best-effort" query processing of streaming data. In essence, the DHT is treated like a very large parallel DBMS, where hashing is used as the basis of intra-operator parallelism. Immutable data can be stored at every peer, accessed by hashing its index key. Operations like joins can be performed by hashing both relations according to their join key, co-locating the relations to be joined at the same node. Such work has two shortcomings for our context: multiple data versions are not supported, and their "best-effort" consistency model in the presence of failures or node membership changes is insufficient.
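For instance, a hash-partitioned join can be sketched as follows (a toy example of our own; the relations, node count, and helper names are illustrative): both relations are repartitioned by the same hash of the join key, so every pair of joinable tuples lands on the same node and the join proceeds purely locally.

```python
import hashlib

NODES = 4  # hypothetical cluster size, for illustration only

def partition(join_key: str) -> int:
    """Both relations are repartitioned by the same hash of the join key,
    so tuples that can join end up at the same node."""
    return int.from_bytes(hashlib.sha1(join_key.encode()).digest(), "big") % NODES

# R(a, b) and S(a, c) joined on attribute a.
R = [("k1", "r1"), ("k2", "r2"), ("k3", "r3")]
S = [("k1", "s1"), ("k3", "s2")]

buckets = {n: {"R": [], "S": []} for n in range(NODES)}
for a, b in R:
    buckets[partition(a)]["R"].append((a, b))
for a, c in S:
    buckets[partition(a)]["S"].append((a, c))

# Each node can now join its local fragments; no cross-node matches exist.
for n, rel in buckets.items():
    local = [(a, b, c) for a, b in rel["R"] for a2, c in rel["S"] if a == a2]
    if local:
        print(f"node {n}: {local}")
```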
Our goal is not only to support efficient distributed computation of query answers, but also to detect and recover from node failure. We emphasize that this is different from recovery in a transactional sense: here our goal is to compensate for missing answers in a query, ideally without redoing the entire query from scratch (whereas transactional recovery typically does involve aborting and recomputing from the beginning). Failure recovery in query answering requires us (in Section V) to develop techniques to track the processing of query state, all the way from the initial versioned index and storage layer, through the various query operators, to the final output.

Furthermore, we develop techniques for incrementally recomputing only those results that a failed node was responsible for producing. Given that every operator in the query plan may be executed in parallel across all nodes, the failure of a single node affects intermediate state at all levels of the plan. Our goal is to restart the query only over the affected portions of the data, and yet to ensure that the query does not produce duplicate or incorrect answers.
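The following toy sketch (our own illustration, not the paper's recovery protocol; the node names and functions are hypothetical) conveys the basic idea of tagging intermediate results with the node that produced them, so that only a failed node's share of the work is redone.

```python
# Toy illustration (ours, not the paper's protocol): intermediate results are
# tagged with the node that produced them, so when a node fails only its share
# of the work is recomputed instead of restarting the whole query.

def run_partition(node, partition):
    """Stand-in for executing one operator partition on one node."""
    return [f"answer({x})" for x in partition]

partitions = {"node-A": [1, 2], "node-B": [3, 4], "node-C": [5, 6]}

results = {}  # producing node -> its partial results
for node, part in partitions.items():
    results[node] = run_partition(node, part)

failed = {"node-B"}  # suppose node-B fails before the query completes

# Drop exactly the state owned by the failed node, then recompute just that
# partition on a surviving node; everything else is reused as-is.
for node in failed:
    results[node] = run_partition("node-C", partitions[node])

final = [t for part in results.values() for t in part]
print(sorted(final))  # no duplicate and no missing answers
```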
III. HASHING-BASED SUBSTRATE
Any scalable substrate for data storage in a peer-to-peer setting needs to adopt techniques for (1) data partitioning, (2) data retrieval, and (3) handling node membership changes, including failures. We describe how our custom hashing-based storage layer addresses these issues, in a way that is fully decentralized and supports multiple administrative domains.
A. Data Partitioning
Like most content-addressable overlay networks, we adopt a hash-based system for data placement. Similar to previous well-known distributed hash tables (DHTs) such as Pastry [4], we use as our key space 160-bit unsigned integers, matching the output of the SHA-1 cryptographic hash function.
It is convenient to visualize the key space as a ring of values, starting at 0 and increasing clockwise until they reach (2^160 − 1), then overflowing back to 0. Figure 2 shows two examples of this ring that we will …
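As a concrete illustration of such a key space (a minimal sketch of our own, assuming a Chord/Pastry-style successor rule rather than either of the allocation schemes in Figure 2), the following code maps node names and data keys onto the 160-bit SHA-1 ring and assigns each key to the first node identifier clockwise from it:

```python
import bisect
import hashlib

RING_SIZE = 2 ** 160  # 160-bit key space, matching SHA-1 output

def sha1_id(name: str) -> int:
    """Map a node name or data key onto the ring of 160-bit integers."""
    return int.from_bytes(hashlib.sha1(name.encode("utf-8")).digest(), "big")

class Ring:
    """Each key is owned by the first node identifier clockwise from it,
    wrapping past 2^160 - 1 back to 0."""

    def __init__(self, node_names):
        self.node_ids = sorted(sha1_id(n) for n in node_names)
        self.names = {sha1_id(n): n for n in node_names}

    def owner(self, key: str) -> str:
        k = sha1_id(key) % RING_SIZE
        i = bisect.bisect_left(self.node_ids, k)
        node_id = self.node_ids[i % len(self.node_ids)]  # wrap around at the top
        return self.names[node_id]

ring = Ring(["node-A", "node-B", "node-C"])
print(ring.owner("relation:orders:tuple:17"))
```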