Graduation project English-to-Chinese translation source text: Reliable Storage and Querying for Collaborative Data Sharing Systems (11)
Fig. 16. Network traffic vs. data size, TPC-H, 8 nodes. [Plot omitted: x-axis database scale factor, y-axis network traffic (MB).]
Fig. 17. Running time vs. per-node bandwidth, 8 nodes, TPC-H scale factor 4. [Plot omitted: x-axis per-node bandwidth (kB/sec), y-axis execution time (sec), one curve per query Q1, Q3, Q5, Q6, Q10.]
Fig. 18. Larger-scale performance on EC2, TPC-H scale factor 10. [Plot omitted: x-axis number of nodes, y-axis execution time (sec), curves for Q1, Q3, Q5, Q6, Q10.]
Fig. 19. Total traffic on EC2, TPC-H scale factor 10. [Plot omitted: x-axis number of nodes, y-axis network traffic (MB), curves for Q1, Q3, Q5, Q6, Q10.]
Fig. 20. Per-node traffic on EC2, TPC-H scale factor 10. [Plot omitted: x-axis number of nodes, y-axis per-node network traffic (MB), curves for Q1, Q3, Q5, Q6, Q10.]
Fig. 21. Running times for Q1 and Q10 with a failure, with and without incremental recovery, 8 nodes, TPC-H scale factor 2. [Plots omitted: one panel each for Q1 and Q10; x-axis failure time (sec), y-axis time (sec), series for Restart and Recovery.]
are degraded but reasonable for the bandwidths likely to be available between academic, institutional, or corporate users (>400 kB/sec). Queries 1 and 6, which perform no rehash operations and therefore send much less data over the network, are less impacted than queries 3, 5, and 10, which join multiple relations and rehash data while doing so.
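To make the rehash step concrete, the sketch below (our own illustration in Python; the tuple layout, key choice, and node count are hypothetical, not ORCHESTRA's actual code) shows how a node hash-partitions its local tuples by join key before a distributed join. Every bucket addressed to another node becomes network traffic, which is exactly the cost the join queries pay and Q1 and Q6 avoid.

import zlib

def rehash_partition(local_tuples, key_index, num_nodes):
    # Destination node = stable hash of the join key, mod the node count,
    # so both relations route tuples with matching keys to the same node.
    buckets = {n: [] for n in range(num_nodes)}
    for t in local_tuples:
        key_bytes = str(t[key_index]).encode()
        buckets[zlib.crc32(key_bytes) % num_nodes].append(t)
    return buckets

# Hypothetical local fragment of ORDERS (orderkey, custkey, total), 8 nodes.
orders = [(1, 17, 100.0), (2, 42, 250.0), (3, 17, 75.5)]
outgoing = rehash_partition(orders, key_index=1, num_nodes=8)
# Every non-local bucket must be shipped over the network; selection- and
# aggregation-only queries such as Q1 and Q6 skip this step entirely.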
Higher Latency Settings. We omit a full presentation of our latency experiments due to space constraints. Realistic latencies (up to 200 ms) had little impact on query performance.

D. Scalability to Larger Numbers of Nodes
Since we have a limited number of local machines in our cluster, we next tried several alternatives to scale to higher numbers. Our initial efforts were with the PlanetLab network testbed, but disappointingly, we found that most nodes there were severely underpowered and overloaded, and disk- and memory-intensive tasks like ours were constantly thrashing, resulting in inconsistent and uninformative results.
Instead, we leased virtual nodes from Amazon's EC2 service, something we envision ORCHESTRA's user base doing as needed. Amazon has data centers geographically distributed across the world, so round-trip times are short and bandwidth is high. We used EC2's "large" instances with 7.5 GB RAM and a virtualized dual-core 2 GHz Opteron CPU. We show settings with only EC2 nodes to make the execution time results simpler to understand, although we performed additional experiments showing similar results using a mixture of local and EC2 nodes. We experimented with the TPC-H scenario, as performance on STBenchmark at the data sizes we could generate was either too fast to be measured reliably or dominated by the cost of collecting the results.
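For readers who wish to set up a comparable environment, the following is a hedged sketch of provisioning similar nodes with today's boto3 API; the paper does not describe its provisioning scripts, and the AMI id, region, and instance count here are placeholders (m1.large is the legacy "large" type with 7.5 GB RAM and two virtual cores).

import boto3

# Hypothetical provisioning sketch; not part of the original experiments.
ec2 = boto3.client("ec2", region_name="us-east-1")
resp = ec2.run_instances(
    ImageId="ami-xxxxxxxx",   # placeholder image with the query engine preinstalled
    InstanceType="m1.large",  # legacy "large" instance: 7.5 GB RAM, 2 virtual cores
    MinCount=100,
    MaxCount=100,             # up to 100 participants, as in the experiments below
)
node_ids = [inst["InstanceId"] for inst in resp["Instances"]]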
We varied the total number of participants in the setting from 10 to 100, using TPC-H scale factor 10 (10 GB of data). Network traffic results, shown in Figures 19 and 20, are similar to the results shown in Figures 11 and 12 for smaller numbers of nodes. Execution times are shown in Figure 18. As before, increasing the number of nodes leads to a dramatic decrease in execution time. This experiment validates the scalability of our system to large numbers of nodes.

E. Failure and Recomputation
Finally, we study recovery when a node fails or becomes unreachable. One option is to abort the query and restart it over the remaining nodes. The other is to use the remaining nodes to recompute the "lost" results. Our experiments used 8 nodes and TPC-H scale factor 2.
Incremental Recomputation vs. Total Restart. To explore the trade-offs between incremental recomputation and full restart, we first ran a series of experiments using Q1 (a selection and aggregation query) and Q10 (which performs three joins followed by an aggregation), chosen to represent the two classes of TPC queries we studied. We started each query, and at varying points after the start of the query (before it finished) we caused one of the nodes to fail. To avoid giving incremental recomputation an unfair advantage, we recompute using the same routing tables (which spreads the range of the failed node evenly over the nodes holding its replicated data). Figure 21 shows performance results for Q1 and Q10. In both cases, incremental recovery outperforms aborting and restarting by approximately 20%, validating the approach. Execution is slow for both techniques (compared to no failure) due to the cache misses inherent when a new node takes over a portion of the substrate key space.
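The sketch below (our own illustration; the interval arithmetic and node names are hypothetical, and the paper's routing tables are not shown) captures the reassignment rule described above: the failed node's slice of the substrate key space is split evenly across the nodes holding replicas of its data, and each of those nodes recomputes only its assigned sub-range.

def reassign_failed_range(failed_range, replica_holders):
    # failed_range: (lo, hi) interval of the substrate key space owned by
    # the failed node; replica_holders: nodes storing replicas of that data.
    lo, hi = failed_range
    width = (hi - lo) // len(replica_holders)
    assignments = {}
    for i, node in enumerate(replica_holders):
        sub_lo = lo + i * width
        sub_hi = hi if i == len(replica_holders) - 1 else lo + (i + 1) * width
        assignments[node] = (sub_lo, sub_hi)
    return assignments

# Example: a failed node owned hash range [0, 3000) and three peers hold
# replicas; each takes over a third and re-derives just that sub-range.
print(reassign_failed_range((0, 3000), ["n2", "n5", "n7"]))
# -> {'n2': (0, 1000), 'n5': (1000, 2000), 'n7': (2000, 3000)}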
Overhead of Incremental Recomputation. Incremental recomputation requires more data to be stored and sent over the network (to track the provenance of intermediate results), and requires that all intermediate results be kept around until the end of the query. Clearly, if this adds significant overhead to an average query, it may actually be preferable to restart after nodes fail. We measured the overhead of incremental recovery support on the TPC-H queries, which we briefly summarize due to space constraints. As expected, recovery support slightly increased execution time: queries ran from 2%-7% slower. Network traffic increased by negligible amounts, at
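As a rough illustration of why this bookkeeping costs space and bandwidth, the sketch below tags every intermediate tuple with the set of nodes whose base data produced it (a simplification chosen for exposition; the paper's actual provenance encoding may differ). After a failure, only tuples whose provenance mentions the failed node are discarded and recomputed; the rest are reused, at the price of carrying the extra provenance field.

def join_with_provenance(left, right, key_l, key_r):
    # Each input is a list of (tuple, provenance) pairs, where provenance is
    # the set of nodes whose base data contributed to the tuple.
    out = []
    for lt, l_prov in left:
        for rt, r_prov in right:
            if lt[key_l] == rt[key_r]:
                out.append((lt + rt, l_prov | r_prov))  # union of source nodes
    return out

def survives_failure(intermediate, failed_node):
    # Keep only results that do not depend on the failed node's data.
    return [(t, prov) for t, prov in intermediate if failed_node not in prov]

# Example: tuples tagged with the node that contributed them.
customers = [((17, "ACME"), {"n1"}), ((42, "Initech"), {"n3"})]
orders    = [((100, 17),    {"n2"}), ((101, 42),      {"n3"})]
joined = join_with_provenance(customers, orders, 0, 1)
kept   = survives_failure(joined, "n3")  # only the n3-derived results are redone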