2008-FAST-Avoiding the Disk Bottleneck in the Data Domain De(10)
Date: 2026-01-16
FAST-related paper.
Figure 6: Logical/Physical Capacities at Data Center B.
Figure 7: Compression Ratios at Data Center B.
                     Daily global compression   Daily local compression
Min                  5.09                       1.40
Max                  45.16                      4.13
Average              13.92                      2.33
Standard deviation   9.08                       0.57
effective. Independent of seeding, Ziv-Lempel style compression is relatively stable, giving a reduction factor of about 2 over time. The real world observations on the applicability of duplicate segment elimination during seeding and after seeding are particularly relevant in evaluating our techniques to reduce disk accesses below.
5.2 I/O Savings with Summary Vector and Locality Preserved Caching
To determine the effectiveness of the Summary Vector and Locality Preserved Caching, we examine how many disk reads they save when finding duplicate segments. We use two internal datasets for our experiment. One is a daily full backup of a company-wide Exchange information store over a 135-day period. The other is the weekly full and daily incremental backup of an Engineering department over a 100-day period. Table 3 summarizes key attributes of these two datasets.
These internal datasets are generated from production usage (albeit internal). We also observe that various compression ratios produced by the internal datasets are relatively similar to those of real world examples examined in section 5.1. We believe these internal datasets are reasonable proxies of real world deployments.
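
To make the disk-read accounting concrete, the following is a minimal sketch of how index lookups and locality prefetches could be counted with each technique switched on or off. It is not the system's implementation. It assumes the Summary Vector behaves like a Bloom filter over stored fingerprints and that Locality Preserved Caching prefetches all fingerprints of a container on an index hit. The names `BloomFilter`, `count_disk_reads`, `container_size`, and `cache_containers`, and the synthetic integer fingerprints, are hypothetical.

```python
import hashlib
from collections import OrderedDict


class BloomFilter:
    """Tiny Bloom filter standing in for the Summary Vector (an assumption)."""

    def __init__(self, num_bits=1 << 16, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Derive num_hashes bit positions from salted SHA-256 digests.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for p in self._positions(key):
            self.bits[p >> 3] |= 1 << (p & 7)

    def might_contain(self, key):
        # False means "definitely new"; True may be a false positive.
        return all(self.bits[p >> 3] & (1 << (p & 7))
                   for p in self._positions(key))


def count_disk_reads(stream, use_summary_vector, use_lpc,
                     container_size=100, cache_containers=4):
    """Count simulated disk reads (index lookups plus container prefetches)."""
    index = {}             # fingerprint -> container id (the "on-disk" index)
    containers = []        # fingerprints in the order they were stored
    summary = BloomFilter()
    cache = OrderedDict()  # container id -> fingerprints, kept in LRU order
    reads = 0
    for fp in stream:
        if use_lpc:
            hit = next((cid for cid, fps in cache.items() if fp in fps), None)
            if hit is not None:
                cache.move_to_end(hit)
                continue   # duplicate found in memory, no disk read
        if use_summary_vector and not summary.might_contain(fp):
            is_duplicate = False   # new segment, index lookup skipped
        else:
            reads += 1             # one disk read for the index lookup
            is_duplicate = fp in index
        if is_duplicate:
            if use_lpc:
                cid = index[fp]
                reads += 1         # one disk read to prefetch the container
                cache[cid] = set(containers[cid])
                cache.move_to_end(cid)
                while len(cache) > cache_containers:
                    cache.popitem(last=False)
        else:
            if not containers or len(containers[-1]) >= container_size:
                containers.append([])
            containers[-1].append(fp)
            index[fp] = len(containers) - 1
            summary.add(fp)
    return reads


# Synthetic workload: a seeding backup followed by an identical full backup.
stream = list(range(1000)) + list(range(1000))
results = {(sv, lpc): count_disk_reads(stream, sv, lpc)
           for sv in (False, True) for lpc in (False, True)}
for (sv, lpc), reads in results.items():
    print(f"summary_vector={sv} lpc={lpc} disk_reads={reads}")
```

On this synthetic stream the sketch counts 2000 reads with neither technique and 1020 with the cache alone; the Summary Vector cases depend on how many Bloom filter false positives occur. These numbers only illustrate the accounting and say nothing about the measured results on the Exchange and Engineering datasets.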
Each of the backup datasets is sent to the deduplicating storage system with a single backup stream. With respect to the deduplication storage system, we measure the number of disk reads for segment index lookups and locality prefetches needed to find duplicates during write for four cases:
(1) with neither Summary Vector nor Locality Preserved Caching;
(2) with Summary Vector only;
(3) with Locality Preserved Caching only; and
Table 2: Statistics on Daily Global and Daily Local Compression Ratios at Data Center B
                                                      Exchange data   Engineering data
Logical capacity (TB)                                 2.76            2.54
Physical capacity after deduplicating segments (TB)   0.49            0.50
Global compression                                    5.69            5.04
Physical capacity after local compression (TB)        0.22            0.261
Local compression                                     2.17            1.93
Total compression                                     12.36           9.75
Table 3: Capacities and Compression Ratios on Exchange and Engineering Datasets
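
The ratios in Table 3 can be approximately re-derived from the capacity rows. The sketch below assumes that global compression is logical capacity divided by physical capacity after deduplication, that local compression is the capacity after deduplication divided by the capacity after local compression, and that total compression is their product. The function name `compression_ratios` is hypothetical. Because the published capacities are rounded, the recomputed ratios differ slightly from the table.

```python
def compression_ratios(logical_tb, after_dedup_tb, after_local_tb):
    """Return (global, local, total) compression ratios from capacities in TB."""
    global_ratio = logical_tb / after_dedup_tb      # duplicate segment elimination
    local_ratio = after_dedup_tb / after_local_tb   # Ziv-Lempel style compression
    total_ratio = logical_tb / after_local_tb       # equals global * local
    return global_ratio, local_ratio, total_ratio


# Capacities taken from Table 3 (rounded values as published).
exchange = compression_ratios(2.76, 0.49, 0.22)
engineering = compression_ratios(2.54, 0.50, 0.261)

for name, (g, l, t) in (("Exchange", exchange), ("Engineering", engineering)):
    print(f"{name}: global={g:.2f} local={l:.2f} total={t:.2f}")
```

The recomputed values land within a few percent of the tabulated 5.69, 2.17, 12.36 and 5.04, 1.93, 9.75, which is consistent with the table's ratios having been computed from unrounded capacities.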
Table 2 summarizes the minimum, maximum, average, and standard deviation of both daily global and daily local compression ratios, excluding seeding and days without backup.
The two sets of results show that the deduplication storage system works well with the real world datasets. As expected, both cumulative global and cumulative total compression ratios increase as the system holds more backup data.
During seeding, duplicate segment elimination tends to be ineffective, because most segments are new. After seeding, despite the large variation in the actual number, duplicate segment elimination becomes extremely
FAST ’08: 6th USENIX Conference on File and Storage Technologies, USENIX Association