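Graduation Project English-to-Chinese Translation Source Text: Reliable Storage and ... for Collaborative Data Sharing Systems (10)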

Date: 2026-01-12

Fig. 7. Running time: STBenchmark, 800K tuples/relation, 1-16 nodes. [Plot: execution time (sec) vs. number of nodes; series: Join, Corresp., Concatenate, Copy, Select.]

Fig. 8. Network traffic: STBenchmark, 800K tuples/relation, 1-16 nodes. [Plot: network traffic (MB) vs. number of nodes; series: Join, Corresp., Copy, Concatenate, Select.]

Fig. 9. Per-node network traffic: STBenchmark, 800K tuples/relation, 1-16 nodes. [Plot: network traffic per node (MB) vs. number of nodes; series: Join, Corresp., Copy, Concatenate, Select.]

Fig. 10. Running time: TPC-H scale factor 0.5, 1-16 nodes. [Plot: execution time (sec) vs. number of nodes; series: Q10, Q5, Q3, Q1, Q6.]

Fig. 11. Network traffic: TPC-H scale factor 0.5, 1-16 nodes. [Plot: network traffic (MB) vs. number of nodes; series: Q10, Q3, Q5, Q6, Q1.]

Fig. 12. Per-node network traffic: TPC-H scale factor 0.5, 1-16 nodes. [Plot: per-node network traffic (MB) vs. number of nodes; series: Q10, Q3, Q5, Q6, Q1.]

Fig. 13. Running time vs. data size, STBenchmark, 8 nodes. [Plot: execution time (sec) vs. # tuples/relation; series: Join, Corresp., Copy, Concatenate, Select.]

Fig. 14. Running time vs. data size, TPC-H, 8 nodes. [Plot: execution time (sec) vs. database scale factor; series: Q10, Q3, Q5, Q1, Q6.]

Fig. 15. Network traffic vs. data size, STBenchmark, 8 nodes. [Plot: network traffic (MB) vs. # tuples/relation; series: Join, Corresp., Copy, Concatenate, Select.]

Scaling Nodes. Figure 7 shows execution times for STBenchmark (at 800,000 tuples/relation) for 1 to 16 physical nodes, while Figure 10 shows times for TPC-H queries over the 500 MB data set (scale factor 0.5). Note that results for STBenchmark are directly above the corresponding results for TPC-H to emphasize that the trends are very similar. Ideally, the running times would be halved each time we double the number of nodes. Our results come very close to matching this expectation for all of the TPC-H queries and about half of the STBenchmark queries. In the other STBenchmark queries (in particular Copy), so much data is returned (because the tuples consist of many long strings) that collecting the results at the query initiator becomes a bottleneck. With 16 nodes, all but 0.1 sec of the Copy query is spent transmitting and receiving the results. We conducted separate experiments to verify that performance is mostly limited by network bandwidth, with some additional performance degradation due to the unmarshaling and storage at the query initiator. All queries continue to show some performance improvement as the number of processing nodes increases.
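The gap between ideal halving and the Copy query's behavior can be illustrated with a toy model in which only part of the work shrinks as nodes are added. This is a minimal sketch under stated assumptions: the function names and the split into a parallelizable part and a fixed result-collection part are illustrative, and the numbers are hypothetical, not measurements from the experiments.

```python
def ideal_time(t1: float, nodes: int) -> float:
    """Running time under perfect scaling: doubling the nodes halves the time."""
    return t1 / nodes


def time_with_collection(t_parallel: float, t_collect: float, nodes: int) -> float:
    """Toy model (assumption): parallel work shrinks with the node count,
    while collecting results at the query initiator takes a fixed time."""
    return t_parallel / nodes + t_collect


def efficiency(t1: float, tn: float, nodes: int) -> float:
    """Parallel efficiency: measured speedup divided by the ideal speedup."""
    return (t1 / tn) / nodes


if __name__ == "__main__":
    # Hypothetical values chosen only to show the shape of the curve.
    t_parallel, t_collect = 16.0, 2.0
    t1 = time_with_collection(t_parallel, t_collect, 1)
    for n in (1, 2, 4, 8, 16):
        tn = time_with_collection(t_parallel, t_collect, n)
        print(n, round(tn, 2), round(efficiency(t1, tn, n), 2))
```

In this model the running time keeps improving as nodes are added, but it flattens toward the fixed collection cost, which is the qualitative pattern described for queries that return a large result.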

Figures 8 and 11 show the total network traffic while executing these queries, and Figures 9 and 12 show the per-node traffic. As expected, the network traffic increases as we scale up the number of nodes, but not dramatically so, and the per-node traffic (after rising significantly when we move from single-node computation to distributed operation) continues to decrease as nodes are added to the system.
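One way to see why total traffic grows slowly while per-node traffic falls is a simple repartitioning model. The sketch below assumes that a fixed volume of data is redistributed uniformly across the nodes, so that each tuple stays on its current node with probability 1/n. This model and its function names are assumptions for illustration; it ignores result collection at the initiator and protocol overhead, and it is not the system's actual cost model.

```python
def total_traffic_mb(data_mb: float, nodes: int) -> float:
    """Data crossing the network when tuples are spread uniformly over `nodes`:
    a fraction (nodes - 1) / nodes of the tuples must move to another node."""
    return data_mb * (nodes - 1) / nodes


def per_node_traffic_mb(data_mb: float, nodes: int) -> float:
    """Average share of the total traffic handled by each node."""
    return total_traffic_mb(data_mb, nodes) / nodes


if __name__ == "__main__":
    data_mb = 100.0  # hypothetical data volume, not a measured value
    for n in (1, 2, 4, 8, 16):
        print(n, total_traffic_mb(data_mb, n), per_node_traffic_mb(data_mb, n))
```

Under these assumptions the total is zero on a single node, jumps when a second node is added, and then approaches the data volume as an upper bound, while the per-node share shrinks beyond two nodes.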

Scaling Data Set Size. We next consider the effects of scaling the data. Figure 13 shows execution times for STBenchmark on the 16-node cluster for 100K to 1.6M tuples/relation, and Figure 14 shows the same for the TPC-H queries over the 8-node cluster while varying the data size from 250 MB to 4 GB (scale factors 0.25 to 4). Figures 15 and 16 show total network traffic for the same scenarios. Execution times and network traffic for all queries scale approximately linearly in the size of the data, as one would expect since there are only foreign-key joins and the data is fairly evenly distributed. We conclude that our system scales well on a LAN, and move on to consider other network settings.
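A claim of approximately linear scaling can be checked by fitting a straight line to (data size, running time) pairs and inspecting the goodness of fit. The following is a minimal sketch using an ordinary least-squares fit; the helper name `linear_fit` and the sample timings are hypothetical and are not taken from the figures.

```python
from typing import List, Tuple


def linear_fit(xs: List[float], ys: List[float]) -> Tuple[float, float, float]:
    """Least-squares line y = slope * x + intercept, plus the R^2 of the fit."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    scale_factors = [0.25, 0.5, 1.0, 2.0, 4.0]  # the range used for TPC-H
    times_sec = [0.4, 0.8, 1.5, 3.1, 6.0]       # hypothetical timings
    slope, intercept, r2 = linear_fit(scale_factors, times_sec)
    print(f"slope={slope:.3f} intercept={intercept:.3f} R^2={r2:.4f}")
```

An R^2 close to 1 together with a small intercept would be consistent with running time growing in proportion to data size.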

C. Performance over a Simulated Wide Area Network

We next consider possible variations on Internet connectivity among compute nodes. We made use of the traffic shaping and network emulation features built into recent versions of Linux to simulate various parameter changes. Specifically, we used NetEm to delay outgoing packets, simulating a higher latency network, and we used the HTB queue discipline to simulate a lower bandwidth network. Here we focus on the TPC-H benchmark, since STBenchmark, due to its large strings, becomes increasingly bandwidth-constrained at the query initiator, and since we feel its data is actually less representative than TPC-H’s.
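A setup of this kind is typically expressed with the Linux `tc` utility. The sketch below builds the command lines for an HTB rate limit with a NetEm delay attached beneath it, and prints them instead of running them unless asked. The interface name, rate, and delay are placeholder assumptions, the helper names are hypothetical, and the exact commands used in the experiments are not given in the text.

```python
import shlex
import subprocess
from typing import List


def shaping_commands(dev: str, rate: str, delay: str) -> List[List[str]]:
    """Build tc invocations: an HTB root qdisc, one rate-limited class,
    and a NetEm qdisc under that class that delays outgoing packets."""
    return [
        ["tc", "qdisc", "add", "dev", dev, "root", "handle", "1:",
         "htb", "default", "10"],
        ["tc", "class", "add", "dev", dev, "parent", "1:", "classid", "1:10",
         "htb", "rate", rate],
        ["tc", "qdisc", "add", "dev", dev, "parent", "1:10", "handle", "10:",
         "netem", "delay", delay],
    ]


def apply(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print the commands, or execute them (requires root) when dry_run is False."""
    for argv in commands:
        if dry_run:
            print(shlex.join(argv))
        else:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Placeholder values: eth0, 1 Mbit/s, and 100 ms are assumptions.
    apply(shaping_commands("eth0", "1mbit", "100ms"))
```

Keeping the dry run as the default makes it possible to review the generated commands before changing the network configuration of a node.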

Limited Bandwidth Settings. Our experimental results, shown in Figure 17, demonstrate that while performance suffers in very low-bandwidth connections, execution times
