2008 FAST: Avoiding the Disk Bottleneck in the Data Domain Deduplication File System
Date: 2026-01-16
Figure 8: Write Throughput of Single Backup Client and 4 Backup Clients.
Figure 9: Read Throughput of Single Backup Client and 4 Backup Clients.
backup stream using one client computer and 4 backup streams using two client computers, for write and read, for 10 generations of the backup datasets. The results are shown in Figures 8 and 9.
The deduplication system delivers high write throughput results for both cases. In the single stream case, the system achieves write throughput of 110 MB/sec for generation 0 and over 113 MB/sec for generations 1 through 9. In the 4 stream case, the system achieves write throughput of 139 MB/sec for generation 0 and a sustained 217 MB/sec for generations 1 through 9. Write throughput for generation 0 is lower because all segments are new and require Ziv-Lempel style compression by the CPU of the deduplication system.

The system delivers high read throughput results for the single stream case. Throughout all generations, the system achieves over 100 MB/sec read throughput. For the 4 stream case, the read throughput is 211 MB/sec for generation 0, 192 MB/sec for generation 1, 165 MB/sec for generation 2, and stays at around 140 MB/sec for future generations. The main reason for the decrease of read throughput in the later generations is that future generations have more duplicate data segments than the first few. However, the read throughput stays at about 140 MB/sec for later generations because of Stream-Informed Segment Layout and Locality Preserved Caching.
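The interaction of these two techniques can be sketched with a toy model. The class and parameter names below (`LocalityPreservedCache`, `max_cached_segments`), the container sizes, and the LRU eviction policy are illustrative assumptions rather than the paper's production design; the point is only that a single container-sized disk read makes all neighbouring segments of the same stream available from memory, which is why sequential restores sustain high throughput even when segments are duplicates.

```python
from collections import OrderedDict

class LocalityPreservedCache:
    """Toy model: on a fingerprint miss, one disk read fetches the whole
    container and caches ALL of its segments, so the following segments
    of the same backup stream are served from memory."""

    def __init__(self, containers, max_cached_segments=8):
        self.containers = containers        # container_id -> {fingerprint: segment}
        self.max_cached = max_cached_segments
        self.cache = OrderedDict()          # fingerprint -> segment, in LRU order
        self.disk_reads = 0

    def read(self, fingerprint):
        if fingerprint in self.cache:       # locality hit: no disk I/O
            self.cache.move_to_end(fingerprint)
            return self.cache[fingerprint]
        self.disk_reads += 1                # miss: one I/O for the whole container
        for segments in self.containers.values():
            if fingerprint in segments:
                for fp, seg in segments.items():
                    self.cache[fp] = seg    # insert every neighbour as well
                    self.cache.move_to_end(fp)
                break
        while len(self.cache) > self.max_cached:
            self.cache.popitem(last=False)  # evict least recently used
        return self.cache[fingerprint]

# Stream-informed layout: segments land in containers in stream order.
containers = {
    0: {f"fp{i}": f"seg{i}" for i in range(4)},
    1: {f"fp{i}": f"seg{i}" for i in range(4, 8)},
}
cache = LocalityPreservedCache(containers)
restored = [cache.read(f"fp{i}") for i in range(8)]   # sequential restore
print(cache.disk_reads)  # 2: two container fetches serve eight segment reads
```

Without the container-granularity fetch, the same restore would cost one disk read per segment; with it, the miss rate is bounded by the number of containers the stream touches.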
Note that write throughput has historically been valued more than read throughput for the backup use case, since a backup has to complete within a specified backup window time period and is a much more frequent event than a restore. Read throughput is still very important, especially in the case of whole system restores.
5.4 Discussion
The techniques presented in this paper are general methods to improve the throughput performance of deduplication storage systems. Although our system divides a data stream into content-based segments, these methods can also apply to systems using fixed aligned segments, such as Venti.
As a side note, we have compared the compression ratios of a system segmenting data streams by content (about 8 Kbytes on average) with another system using fixed aligned 8 Kbyte segments on the engineering and exchange backup datasets. We found that the fixed alignment approach gets basically no global compression (global compression: 1.01) for the engineering data, whereas the system with content-based segmentation gets a lot of global compression (6.39:1). The main reason for the difference is that the backup software creates the backup dataset without realigning data at file boundaries. For the exchange backup dataset, where the backup software aligns data at individual mailboxes, the global compression difference is less (6.61:1 vs. 10.28:1), though there is still a significant gap.
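The effect of losing alignment can be illustrated with a toy chunker. The breakpoint test below (cut after any byte `b` with `b % 16 == 0`) is a crude stand-in for a Rabin rolling-hash cut condition, and the helper names (`fixed_chunks`, `content_chunks`, `duplicates`) and parameters are hypothetical; the behaviour it demonstrates, however, is the one described above: a single inserted byte shifts every fixed-aligned chunk, while content-defined boundaries resynchronize immediately.

```python
import hashlib

def fixed_chunks(data, size=8):
    """Fixed aligned segmentation: cut every `size` bytes."""
    return [data[i:i + size] for i in range(0, len(data), size)]

def content_chunks(data, modulus=16):
    """Toy content-defined segmentation: cut after any byte b with
    b % modulus == 0 (a crude stand-in for a rolling-hash test)."""
    chunks, start = [], 0
    for i, b in enumerate(data, 1):
        if b % modulus == 0:
            chunks.append(data[start:i])
            start = i
    if start < len(data):
        chunks.append(data[start:])
    return chunks

def duplicates(old, new, chunker):
    """Return (new-generation chunks already stored, total chunks)."""
    seen = {hashlib.sha1(c).digest() for c in chunker(old)}
    new_chunks = chunker(new)
    dup = sum(1 for c in new_chunks if hashlib.sha1(c).digest() in seen)
    return dup, len(new_chunks)

gen0 = bytes(range(64)) * 8   # first backup generation
gen1 = b"X" + gen0            # next generation: one byte inserted up front

print(duplicates(gen0, gen1, fixed_chunks))    # (0, 65): all chunks shifted, no dedup
print(duplicates(gen0, gen1, content_chunks))  # (32, 33): only the first chunk is new
```

Because the cut points depend on the bytes themselves, every chunk after the insertion is byte-identical to a stored one, which is why content-based segmentation retains global compression on unaligned backup streams.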
Fragmentation will become more severe for long term retention, and can reduce the effectiveness of Locality Preserved Caching. We have investigated mechanisms to …