Graduation project English-to-Chinese translation source text: Reliable Storage and Querying for Collaborative Data Sharing Systems (11)
Fig. 16. Network traffic vs. data size, TPC-H, 8 nodes. [Plot omitted: x-axis database scale factor, y-axis network traffic (MB).]
Fig. 17. Running time vs. per-node bandwidth, 8 nodes, TPC-H scale factor 4. [Plot omitted: x-axis per-node bandwidth (kB/sec), y-axis execution time (sec), one curve per query Q1, Q3, Q5, Q6, Q10.]
Fig. 18. Larger-scale performance on EC2, TPC-H scale factor 10. [Plot omitted: x-axis number of nodes, y-axis execution time (sec), curves for Q1, Q3, Q5, Q6, Q10.]
Fig. 19. Total traffic on EC2, TPC-H scale factor 10. [Plot omitted: x-axis number of nodes, y-axis network traffic (MB), curves for Q1, Q3, Q5, Q6, Q10.]
Fig. 20. Per-node traffic on EC2, TPC-H scale factor 10. [Plot omitted: x-axis number of nodes, y-axis per-node network traffic (MB), curves for Q1, Q3, Q5, Q6, Q10.]
Fig. 21. Running times for Q1 and Q10 with a failure, with and without incremental recovery, 8 nodes, TPC-H scale factor 2. [Plots omitted: one panel each for Q1 and Q10; x-axis failure time (sec), y-axis time (sec), series for Restart and Recovery.]
are degraded but reasonable for the bandwidths likely to be available between academic, institutional, or corporate users (>400 kB/sec). Queries 1 and 6, which perform no rehash operations and therefore send much less data over the network, are less impacted than queries 3, 5, and 10, which join multiple relations and rehash data while doing so.
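To make the rehash step concrete, the sketch below (our own illustration in Python; the tuple layout, key choice, and node count are hypothetical, not ORCHESTRA's actual code) shows how a node hash-partitions its local tuples by join key before a distributed join. Every bucket addressed to another node becomes network traffic, which is exactly the cost the join queries pay and Q1 and Q6 avoid.

import zlib

def rehash_partition(local_tuples, key_index, num_nodes):
    # Destination node = stable hash of the join key, mod the node count,
    # so both relations route tuples with matching keys to the same node.
    buckets = {n: [] for n in range(num_nodes)}
    for t in local_tuples:
        key_bytes = str(t[key_index]).encode()
        buckets[zlib.crc32(key_bytes) % num_nodes].append(t)
    return buckets

# Hypothetical local fragment of ORDERS (orderkey, custkey, total), 8 nodes.
orders = [(1, 17, 100.0), (2, 42, 250.0), (3, 17, 75.5)]
outgoing = rehash_partition(orders, key_index=1, num_nodes=8)
# Every non-local bucket must be shipped over the network; selection- and
# aggregation-only queries such as Q1 and Q6 skip this step entirely.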
Higher Latency Settings. We omit a full presentation of our latency experiments due to space constraints. Realistic latencies (up to 200 ms) had little impact on query performance.

D. Scalability to Larger Numbers of Nodes
Since we have a limited number of local machines in our cluster, we next tried several alternatives to scale to higher numbers. Our initial efforts were with the PlanetLab network testbed, but disappointingly, we found that most nodes there were severely underpowered and overloaded, and disk- and memory-intensive tasks like ours were constantly thrashing, resulting in inconsistent and uninformative results.
Instead, we leased virtual nodes from Amazon's EC2 service, something we envision ORCHESTRA's user base doing as needed. Amazon has data centers geographically distributed across the world, so round-trip times are short and bandwidth is high. We used EC2's "large" instances with 7.5 GB RAM and a virtualized dual-core 2 GHz Opteron CPU. We show settings with only EC2 nodes to make the execution time results simpler to understand, although we performed additional experiments showing similar results using a mixture of local and EC2 nodes. We experimented with the TPC-H scenario, as performance on STBenchmark at the data sizes we could generate was either too fast to be measured reliably or dominated by the cost of collecting the results.
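For readers who wish to set up a comparable environment, the following is a hedged sketch of provisioning similar nodes with today's boto3 API; the paper does not describe its provisioning scripts, and the AMI id, region, and instance count here are placeholders (m1.large is the legacy "large" type with 7.5 GB RAM and two virtual cores).

import boto3

# Hypothetical provisioning sketch; not part of the original experiments.
ec2 = boto3.client("ec2", region_name="us-east-1")
resp = ec2.run_instances(
    ImageId="ami-xxxxxxxx",   # placeholder image with the query engine preinstalled
    InstanceType="m1.large",  # legacy "large" instance: 7.5 GB RAM, 2 virtual cores
    MinCount=100,
    MaxCount=100,             # up to 100 participants, as in the experiments below
)
node_ids = [inst["InstanceId"] for inst in resp["Instances"]]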
We varied the total number of participants in the setting from 10 to 100, using TPC-H scale factor 10 (10 GB of data). Network traffic results, shown in Figures 19 and 20, are similar to the results shown in Figures 11 and 12 for smaller numbers of nodes. Execution times are shown in Figure 18. As before, increasing the number of nodes leads to a dramatic decrease in execution time. This experiment validates the scalability of our system to large numbers of nodes.

E. Failure and Recomputation
Finally, we study recovery when a node fails or becomes unreachable. One option is to abort the query and restart it over the remaining nodes. The other is to use the remaining nodes to recompute the "lost" results. Our experiments used 8 nodes and TPC-H scale factor 2.
Incremental Recomputation vs. Total Restart. To explore the trade-offs between incremental recomputation and full restart, we first ran a series of experiments using Q1 (a selection and aggregation query) and Q10 (which performs three joins followed by an aggregation), chosen to represent the two classes of TPC queries we studied. We started each query, and at varying points after the start of the query (before it finished) we caused one of the nodes to fail. To avoid giving incremental recomputation an unfair advantage, we recompute using the same routing tables (which spreads the range of the failed node evenly over the nodes holding its replicated data). Figure 21 shows performance results for Q1 and Q10. In both cases, incremental recovery outperforms aborting and restarting by approximately 20%, validating the approach. Execution is slow for both techniques (compared to no failure) due to the cache misses inherent when a new node takes over a portion of the substrate key space.
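The sketch below (our own illustration; the interval arithmetic and node names are hypothetical, and the paper's routing tables are not shown) captures the reassignment rule described above: the failed node's slice of the substrate key space is split evenly across the nodes holding replicas of its data, and each of those nodes recomputes only its assigned sub-range.

def reassign_failed_range(failed_range, replica_holders):
    # failed_range: (lo, hi) interval of the substrate key space owned by
    # the failed node; replica_holders: nodes storing replicas of that data.
    lo, hi = failed_range
    width = (hi - lo) // len(replica_holders)
    assignments = {}
    for i, node in enumerate(replica_holders):
        sub_lo = lo + i * width
        sub_hi = hi if i == len(replica_holders) - 1 else lo + (i + 1) * width
        assignments[node] = (sub_lo, sub_hi)
    return assignments

# Example: a failed node owned hash range [0, 3000) and three peers hold
# replicas; each takes over a third and re-derives just that sub-range.
print(reassign_failed_range((0, 3000), ["n2", "n5", "n7"]))
# -> {'n2': (0, 1000), 'n5': (1000, 2000), 'n7': (2000, 3000)}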
Overhead of Incremental Recomputation. Incremental recomputation requires more data to be stored and sent over the network (to track the provenance of intermediate results), and requires that all intermediate results be kept around until the end of the query. Clearly, if this adds significant overhead to an average query, it may actually be preferable to restart after nodes fail. We measured the overhead of incremental recovery support on the TPC-H queries, which we briefly summarize due to space constraints. As expected, recovery support slightly increased execution time: queries ran from 2%-7% slower. Network traffic increased by negligible amounts, at
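As a rough illustration of why this bookkeeping costs space and bandwidth, the sketch below tags every intermediate tuple with the set of nodes whose base data produced it (a simplification chosen for exposition; the paper's actual provenance encoding may differ). After a failure, only tuples whose provenance mentions the failed node are discarded and recomputed; the rest are reused, at the price of carrying the extra provenance field.

def join_with_provenance(left, right, key_l, key_r):
    # Each input is a list of (tuple, provenance) pairs, where provenance is
    # the set of nodes whose base data contributed to the tuple.
    out = []
    for lt, l_prov in left:
        for rt, r_prov in right:
            if lt[key_l] == rt[key_r]:
                out.append((lt + rt, l_prov | r_prov))  # union of source nodes
    return out

def survives_failure(intermediate, failed_node):
    # Keep only results that do not depend on the failed node's data.
    return [(t, prov) for t, prov in intermediate if failed_node not in prov]

# Example: tuples tagged with the node that contributed them.
customers = [((17, "ACME"), {"n1"}), ((42, "Initech"), {"n3"})]
orders    = [((100, 17),    {"n2"}), ((101, 42),      {"n3"})]
joined = join_with_provenance(customers, orders, 0, 1)
kept   = survives_failure(joined, "n3")  # only the n3-derived results are redone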