Reliable Storage and Querying for Collaborative Data Sharing Systems
[Figure: panels labeled "Participant with local DB servers + storage" and "All participants"]
Fig. 1. Basic architectural components in the ORCHESTRA system, as a participant (peer) publishes its update logs and imports data from elsewhere. Components on the left were the focus of [2], [3], and this paper focuses on the components shown on the right.
In a CDSS, users first make updates only to their local storage, and they occasionally publish a log of these updates (which are primarily insertions of new data items) to the CDSS. Then they perform an import (transforming and importing others' newly published data to their local replica). Only in this step is information actually shared across users, and it is then that conflict resolution is performed. Hence, we do not need special support for global consistency, such as distributed locking or version vectors, at the distributed storage level. (A minimal sketch of this publish/import cycle appears after the contribution list below.)

We address these needs through a custom data partitioning and storage layer, as well as a new distributed query processor. We develop novel techniques for ensuring versioning, consistency, and failure recovery in order to guarantee complete answers. Our specific contributions are as follows:

• Modifications to the standard data partitioning techniques used in distributed hash tables [4], customizing them to a more stable environment, and providing greater transparency of operation to the layers above.
• A distributed, replicated, versioned relational storage scheme that ensures that queries see a consistent, complete snapshot of the data.
• Mechanisms for detecting node failures and either completely restarting or incrementally recomputing the query, while ensuring the correct answer set is returned.
• Experiments, using standard benchmarks for OLAP and schema mapping tasks, across local and cloud computing nodes, validating our methods under different network settings and in the presence of failures.
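The following is a minimal, self-contained Python sketch of the publish/import cycle described above. The class names and the storage API (SharedStorage, append_log, read_since) are illustrative assumptions, not ORCHESTRA's actual interfaces.

```python
# Toy model of a CDSS publish/import cycle. All names are illustrative
# assumptions; this is not ORCHESTRA's actual API.

class SharedStorage:
    """Stand-in for the distributed, versioned storage layer."""
    def __init__(self):
        self.log = []                       # [(version, pid, op, tuple)]

    def append_log(self, pid, entries):
        for op, tup in entries:
            self.log.append((len(self.log) + 1, pid, op, tup))

    def read_since(self, version, exclude):
        entries = [(op, tup) for v, pid, op, tup in self.log
                   if v > version and pid != exclude]
        return entries, len(self.log)

class Participant:
    def __init__(self, pid, shared_storage):
        self.pid = pid
        self.storage = shared_storage
        self.local_db = []                  # local replica (list of tuples)
        self.pending_log = []               # updates not yet published
        self.last_imported_version = 0

    def update_locally(self, tup):
        """Updates first go only to local storage; nothing is shared yet."""
        self.local_db.append(tup)
        self.pending_log.append(("insert", tup))

    def publish(self):
        """Occasionally push the local update log to shared storage."""
        if self.pending_log:
            self.storage.append_log(self.pid, list(self.pending_log))
            self.pending_log.clear()

    def import_updates(self, transform, resolve_conflict):
        """Pull others' newly published updates. Only here is data shared,
        and only here is conflict resolution performed."""
        log, ver = self.storage.read_since(self.last_imported_version,
                                           exclude=self.pid)
        for op, tup in log:
            tup = transform(tup)            # map into the local schema
            if tup is not None and resolve_conflict(self.local_db, tup):
                self.local_db.append(tup)
        self.last_imported_version = ver

store = SharedStorage()
alice, bob = Participant("alice", store), Participant("bob", store)
alice.update_locally(("book", 30))
alice.publish()
bob.import_updates(transform=lambda t: t,
                   resolve_conflict=lambda db, t: t not in db)
print(bob.local_db)   # [('book', 30)]
```

Note that conflict resolution happens only inside import_updates, which is why no distributed locking or version-vector machinery is needed at the storage level.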
We implement and evaluate our techniques within the ORCHESTRA collaborative data sharing system. However, the techniques are broadly applicable across a variety of emerging data management applications, such as distributed version control, data exchange, and data warehousing.
Section II presents the ORCHESTRA architecture, and Section III details our modified data distribution substrate. Section IV describes our storage and indexing layer, upon which we build the fault-tolerant distributed query engine presented in Section V. Section VI validates our techniques through experimental analysis. We describe related work in Section VII, and conclude and discuss future work in Section VIII.
II. SYSTEM ARCHITECTURE AND REQUIREMENTS

Figure 1 shows ORCHESTRA's architecture, and sketches the dataflow involved in its main operations. Each participant (illustrated on the left) operates a local DBMS with a possibly unique schema, and uses this DBMS to pose queries and make updates. ORCHESTRA is invoked when the participant has a stable data instance it wishes to "synchronize" with the world: this involves publishing updates from the local DBMS log to versioned storage, and importing updates from elsewhere.
The import operation consists of update exchange [3] and reconciliation [2]. Update exchange finds updates satisfying a local participant's filtering criteria and, based on the schema mappings, executes SQL queries that convert data into the participant's local schema. Reconciliation finds sets of conflicts, among both updates and the transactions they comprise, by executing SQL queries over the versioned storage system.
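To illustrate the flavor of such queries, the sketch below (using SQLite for self-containment) filters newly published updates by a hypothetical local criterion and rewrites them into the local schema. The table names, columns, and filter predicate are invented for illustration; ORCHESTRA's actual mapping queries are derived from its schema mappings.

```python
# Hypothetical update-exchange step: select newly published updates that
# pass the local peer's filter and convert them to the local schema.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- published updates from another peer, in that peer's schema
    CREATE TABLE pub_book (version INTEGER, title TEXT, price REAL);
    -- the importing participant's local relation
    CREATE TABLE local_item (name TEXT, cost REAL);
    INSERT INTO pub_book VALUES (7, 'TPC-H Primer', 30.0),
                                (8, 'P2P Systems', 80.0);
""")

LAST_IMPORTED = 6     # last version already imported (assumed)
MAX_PRICE = 50.0      # the local participant's filtering criterion (assumed)

# The mapping query: filter new updates, rename into the local schema.
conn.execute("""
    INSERT INTO local_item (name, cost)
    SELECT title, price FROM pub_book
    WHERE version > ? AND price <= ?
""", (LAST_IMPORTED, MAX_PRICE))

print(conn.execute("SELECT * FROM local_item").fetchall())
# [('TPC-H Primer', 30.0)]
```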
To this point, our work has focused on the left half of the figure: the logic needed to create and use the SQL queries supporting update exchange and reconciliation, and the modules to "hook" into the DBMS to obtain update logs.
In this paper, we focus on the right half of the diagram: how to implement distributed, versioned storage and distributed query execution. We are particularly concerned with performance in support of update exchange (data transformation) queries, which are more complex than the conflict detection queries, and by far the main bottleneck in performance [2], [3]. We also develop capabilities in the query execution layer to support mapping and OLAP-style queries directly over the distributed, versioned data. Data is primarily stored and replicated among the various participants' nodes. However, as greater resources, particularly in terms of CPU, are required, participants may purchase cycles on a cloud computing service capable of running arbitrary code, such as Amazon's EC2 (considered in this paper) or Microsoft's Azure.
In the remainder of this section, we explain the unique requirements of ORCHESTRA and why they require new solutions beyond the existing state of the art. In subsequent sections, we describe our actual solutions.
A. Data Storage, Partitioning, and Distributed Lookup
As discussed previously, we assume that the participants number in the dozens to hundreds, are usually connected, and have enough storage capacity to maintain a log of all data versions. Our target domain differs from conventional P2P systems, where connectivity is highly unstable. We only expect low "churn" (nodes joining and leaving the system) rates, perhaps as participants go down for maintenance or are replaced with new machines. We expect failures to be infrequent enough that keeping a few replicas of every data item is sufficient. We avoid single points of failure, as we want the service to remain available at all times, even if some nodes go down for maintenance.
In a distributed implementation of a CDSS, we need a means of (1) partitioning the stored data (such that it is distributed reasonably uniformly across the nodes), (2) ensuring efficient re-partitioning when nodes join and leave, (3) supporting distributed query computation, and (4) supporting background replication. There are two main schemes for doing this in
a distributed …
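As background for requirements (1) and (2), here is a minimal consistent-hashing sketch in the spirit of the DHT partitioning techniques cited above [4]: keys spread roughly uniformly over nodes, and a node's arrival or departure moves only the keys adjacent to it on the ring. The ring class, hash function, and replica rule are illustrative assumptions, not the modified scheme this paper develops.

```python
# Minimal consistent-hashing ring; illustrative only.
import bisect
import hashlib

def h(key: str) -> int:
    """Hash a string key onto the ring's identifier space."""
    return int(hashlib.sha1(key.encode()).hexdigest(), 16)

class Ring:
    def __init__(self, nodes, replicas=2):
        self.replicas = replicas
        self.points = sorted((h(n), n) for n in nodes)

    def owners(self, key):
        """The first `replicas` distinct nodes clockwise from hash(key):
        the primary plus its replica set."""
        hashes = [p[0] for p in self.points]
        i = bisect.bisect(hashes, h(key)) % len(self.points)
        result = []
        while len(result) < min(self.replicas, len(self.points)):
            node = self.points[i][1]
            if node not in result:
                result.append(node)
            i = (i + 1) % len(self.points)
        return result

ring = Ring(["peer-a", "peer-b", "peer-c"])
print(ring.owners("tuple:42"))   # e.g. ['peer-c', 'peer-a']
# Adding or removing a node relocates only the keys adjacent to it on the
# ring, which keeps re-partitioning cheap when churn is low.
```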