2008-FAST-Avoiding the Disk Bottleneck in the Data Domain De(10)

Date: 2026-01-16

FAST-related paper.

Figure 6: Logical/Physical Capacities at Data Center B.

Figure 7: Compression Ratios at Data Center B.

| | Daily global compression | Daily local compression |
|---|---|---|
| Min | 5.09 | 1.40 |
| Max | 45.16 | 4.13 |
| Average | 13.92 | 2.33 |
| Standard deviation | 9.08 | 0.57 |

effective. Independent of seeding, Ziv-Lempel style

compression is relatively stable, giving a reduction by a factor of about 2 over time. These real-world observations on how well duplicate segment elimination works during and after seeding are particularly relevant to evaluating the disk-access reduction techniques below.

5.2 I/O Savings with Summary Vector and Locality Preserved Caching

To determine the effectiveness of the Summary Vector and Locality Preserved Caching, we measure how many disk reads they save when finding duplicate segments. We use two internal datasets for our experiment. One is a daily full backup of a company-wide Exchange information store over a 135-day period. The other is the weekly full and daily incremental backup of an Engineering department over a 100-day period. Table 3 summarizes key attributes of these two datasets.
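In the full paper, the Summary Vector is an in-memory Bloom filter that summarizes which segment fingerprints are present in the on-disk segment index. A negative answer proves a segment is new, so the index lookup (a disk read) can be skipped. A positive answer may be a false positive, so the index must still be consulted. The Python sketch below illustrates that idea under stated assumptions. The class name, the bytearray layout, and deriving the k bit positions from salted SHA-1 digests are hypothetical choices for illustration, not Data Domain's implementation.

```python
import hashlib


class SummaryVector:
    """Bloom-filter sketch of an in-memory summary of the segment index.

    Hypothetical layout: m bits in a bytearray; k bit positions per
    fingerprint derived from SHA-1 over (hash index byte + fingerprint).
    """

    def __init__(self, m_bits: int, k_hashes: int) -> None:
        self.m = m_bits
        self.k = k_hashes
        self.bits = bytearray((m_bits + 7) // 8)

    def _positions(self, fingerprint: bytes):
        # Salting SHA-1 with the hash index is an assumed way to get k positions.
        for i in range(self.k):
            digest = hashlib.sha1(bytes([i]) + fingerprint).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def insert(self, fingerprint: bytes) -> None:
        # Set every bit position for a segment that is now stored.
        for pos in self._positions(fingerprint):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def may_contain(self, fingerprint: bytes) -> bool:
        # False: definitely new, so no segment index lookup is needed.
        # True: possibly a duplicate, so the on-disk index must be checked.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(fingerprint))


sv = SummaryVector(m_bits=1 << 16, k_hashes=4)
fp = hashlib.sha1(b"segment A").digest()
sv.insert(fp)
print(sv.may_contain(fp))
```

A Bloom filter never yields false negatives, which is why a "no" answer can safely skip the disk read. The filter size and the number of hash functions trade memory for the false-positive rate.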

These internal datasets are generated from production usage (albeit internal). We also observe that the compression ratios produced by the internal datasets are similar to those of the real-world examples examined in Section 5.1. We believe these internal datasets are reasonable proxies for real-world deployments.

Each of the backup datasets is sent to the deduplicating storage system as a single backup stream. On the deduplication storage system, we measure the number of disk reads for segment index lookups and locality prefetches needed to find duplicates during writes, for four cases:

(1) with neither Summary Vector nor Locality Preserved Caching;
(2) with Summary Vector only;
(3) with Locality Preserved Caching only; and

Table 2: Statistics on Daily Global and Daily Local Compression Ratios at Data Center B
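The measurement cases listed above can be illustrated with a toy model that counts disk reads per configuration. This is a sketch under simplifying assumptions, not the paper's measurement harness. The function `count_disk_reads`, the fixed container size, the LRU cache of whole containers, and the use of an exact Python set in place of the Summary Vector's Bloom filter (so there are no false positives) are all hypothetical. In this model a segment index lookup costs one read. On a hit with Locality Preserved Caching enabled, prefetching that container's fingerprints costs one more.

```python
from collections import OrderedDict


def count_disk_reads(stream, use_sv, use_lpc, container_size=4, cache_containers=8):
    """Toy count of disk reads needed to find duplicates while writing `stream`.

    Hypothetical model: `stream` is a sequence of hashable segment fingerprints;
    new segments are appended to fixed-size containers in arrival order.
    """
    index = {}              # on-disk segment index: fingerprint -> container id
    containers = [[]]       # container id -> fingerprints stored in that container
    summary = set()         # exact stand-in for the Summary Vector (no false positives)
    cache = OrderedDict()   # Locality Preserved Cache: container id -> fingerprints, LRU order
    reads = 0

    for fp in stream:
        if use_sv and fp not in summary:
            is_dup = False  # definitely new: the index lookup is skipped
        else:
            hit = None
            if use_lpc:
                hit = next((cid for cid, fps in cache.items() if fp in fps), None)
            if hit is not None:
                cache.move_to_end(hit)  # cache hit: duplicate found without a disk read
                is_dup = True
            else:
                reads += 1              # one disk read for the segment index lookup
                cid = index.get(fp)
                is_dup = cid is not None
                if is_dup and use_lpc:
                    reads += 1          # one disk read to prefetch the container's fingerprints
                    cache[cid] = set(containers[cid])
                    cache.move_to_end(cid)
                    if len(cache) > cache_containers:
                        cache.popitem(last=False)  # evict the least recently used container
        if not is_dup:
            if len(containers[-1]) == container_size:
                containers.append([])   # current container is full: open a new one
            containers[-1].append(fp)
            index[fp] = len(containers) - 1
            summary.add(fp)
    return reads


stream = list(range(8)) * 2  # eight new segments, then the same eight again
for sv, lpc in [(False, False), (True, False), (False, True), (True, True)]:
    print(sv, lpc, count_disk_reads(stream, use_sv=sv, use_lpc=lpc))
```

On this stream the model gives 16, 8, 12 and 4 reads for no technique, Summary Vector only, Locality Preserved Caching only, and both together. The Summary Vector removes lookups for new segments. The cache removes lookups for duplicates that arrive in the same order in which they were stored.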

| | Exchange data | Engineering data |
|---|---|---|
| Logical capacity (TB) | 2.76 | 2.54 |
| Physical capacity after deduplicating segments (TB) | 0.49 | 0.50 |
| Global compression | 5.69 | 5.04 |
| Physical capacity after local compression (TB) | 0.22 | 0.261 |
| Local compression | 2.17 | 1.93 |
| Total compression | 12.36 | 9.75 |

Table 3: Capacities and Compression Ratios on Exchange and Engineering Datasets
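Table 3 lists each compression ratio after the capacity it reduces to. This suggests that each ratio is the quotient of adjacent capacities, in which case total compression should equal the product of global and local compression. The following worked check assumes those definitions, which the table's layout implies but this excerpt does not state explicitly:

```latex
\begin{aligned}
\text{global} &= \frac{\text{logical capacity}}{\text{capacity after deduplication}}, \qquad
\text{local} = \frac{\text{capacity after deduplication}}{\text{capacity after local compression}},\\[4pt]
\text{total} &= \frac{\text{logical capacity}}{\text{capacity after local compression}} = \text{global} \times \text{local},\\[4pt]
\text{Exchange: } & 5.69 \times 2.17 \approx 12.35 \quad (\text{reported } 12.36),\\
\text{Engineering: } & 5.04 \times 1.93 \approx 9.73 \quad (\text{reported } 9.75).
\end{aligned}
```

The small gaps come from rounding in the reported ratios.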

Table 2 summarizes the minimum, maximum, average, and standard deviation of both the daily global and daily local compression ratios, excluding seeding and days without backup.
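The summary statistics in Table 2 can be reproduced from per-day capacities once seeding days and days without a backup are filtered out. The following is a minimal Python sketch. The `Day` record, its field names, and reading the daily global ratio as (data backed up that day) / (new data stored after deduplication) are assumptions for illustration. The use of the population standard deviation is also an assumption, since the excerpt does not say which form was used.

```python
import statistics
from dataclasses import dataclass


@dataclass
class Day:
    logical: float         # data backed up that day (e.g. TB); 0 means no backup
    after_dedup: float     # new physical data after duplicate segment elimination
    after_local: float     # physical data after local compression
    seeding: bool = False  # True for days in the initial seeding period


def summarize(ratios):
    # Population standard deviation is an assumption; the paper may use another form.
    return {
        "min": min(ratios),
        "max": max(ratios),
        "avg": statistics.mean(ratios),
        "std": statistics.pstdev(ratios),
    }


def daily_compression_stats(days):
    """Return (global, local) summaries, excluding seeding days and days without backup."""
    kept = [d for d in days if not d.seeding and d.logical > 0]
    global_ratios = [d.logical / d.after_dedup for d in kept]     # daily global compression
    local_ratios = [d.after_dedup / d.after_local for d in kept]  # daily local compression
    return summarize(global_ratios), summarize(local_ratios)
```

Filtering before dividing also avoids a division by zero on days without a backup.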

The two sets of results show that the deduplication storage system works well with the real-world datasets. As expected, both cumulative global and cumulative total compression ratios increase as the system holds more backup data.
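A cumulative ratio divides all logical data written so far by all physical data stored so far. The seeding period keeps it near 1, and each later day that is mostly duplicates pulls it upward. The following minimal Python sketch uses made-up daily figures. The function name and the numbers are hypothetical, not data from Data Center B.

```python
from itertools import accumulate


def cumulative_compression(logical_per_day, physical_per_day):
    """Cumulative ratio after each day: total logical so far / total physical so far."""
    return [
        logical / physical
        for logical, physical in zip(accumulate(logical_per_day), accumulate(physical_per_day))
    ]


# Hypothetical numbers: a seeding day that stores mostly new data, then three
# days of backups that are largely duplicates of what is already stored.
ratios = cumulative_compression([10, 10, 10, 10], [9, 0.5, 0.5, 0.5])
print([round(r, 2) for r in ratios])
```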

During seeding, duplicate segment elimination tends to be ineffective because most segments are new. After seeding, despite the large variation in the actual number, duplicate segment elimination becomes extremely


FAST ’08: 6th USENIX Conference on File and Storage Technologies, USENIX Association
