Storage device performance prediction with CART models
Date: 2025-07-10
Storage device performance prediction is a key element of self-managed storage systems and application planning tasks, such as data assignment. This work explores the application of a machine learning tool, CART models, to storage device modeling. Our appr
1 Introduction
The costs and complexity of system administration in storage systems [17, 35, 11] and database systems [12, 1, 15, 21] make automation of administration tasks a critical research challenge. One important aspect of administering self-managed storage systems, particularly large storage infrastructures, is deciding which data sets to store on which devices. Finding an optimal or near-optimal solution requires the ability to predict how well each device will serve each workload, so that loads can be balanced and particularly good matches can be exploited.
Researchers have long used performance models for such prediction, to compare alternative storage device designs. Given sufficient effort and expertise, accurate simulations (e.g., [5, 28]) or analytic models (e.g., [22, 30, 31]) can be built to explore design questions for a particular device. In practice, however, deployed infrastructures are often composed of numerous, distinct device types, and their administrators have neither the time nor the expertise needed to configure such device models.
This paper attacks this obstacle by providing a black-box model generation algorithm. By “black box,” we mean that the model (and the model generation system) has no information about the internal components or algorithms of the storage device. Given access to a device for some “training period,” the model generation system learns the device’s behavior as a function of input workloads. The resulting device model approximates this function using existing machine learning tools. Our approach employs the Classification And Regression Trees (CART) tool because of its efficiency and accuracy. In a nutshell, CART models approximate functions on a multi-dimensional Cartesian space using piece-wise constant functions.
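To make “piece-wise constant approximation” concrete, here is a minimal sketch of a CART-style regression tree in Python. It is not the CART tool used in this work: it grows a tree greedily by choosing, at each node, the single-feature threshold that minimizes the summed squared error of the two children, and it omits the cost-complexity pruning step of full CART. The names `fit`, `predict`, `min_leaf`, and `max_depth` are our own, chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Node:
    """Internal nodes split on one feature; leaves hold a constant prediction."""
    value: float                      # mean target of the training points at this node
    feature: Optional[int] = None     # split dimension (None marks a leaf)
    threshold: float = 0.0            # go left when x[feature] <= threshold
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def _sse(ys: Sequence[float]) -> float:
    """Sum of squared errors around the mean (the regression impurity)."""
    if not ys:
        return 0.0
    m = sum(ys) / len(ys)
    return sum((y - m) ** 2 for y in ys)


def fit(X: List[List[float]], y: List[float],
        min_leaf: int = 1, max_depth: int = 8) -> Node:
    """Grow a regression tree by greedy binary splits that minimize total SSE."""
    node = Node(value=sum(y) / len(y))
    if max_depth == 0 or len(y) < 2 * min_leaf:
        return node
    best = None  # (cost, feature, threshold)
    for f in range(len(X[0])):
        # Candidate thresholds: midpoints between consecutive distinct values.
        vals = sorted(set(row[f] for row in X))
        for a, b in zip(vals, vals[1:]):
            t = (a + b) / 2
            left = [yi for row, yi in zip(X, y) if row[f] <= t]
            right = [yi for row, yi in zip(X, y) if row[f] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            cost = _sse(left) + _sse(right)
            if best is None or cost < best[0]:
                best = (cost, f, t)
    if best is None or best[0] >= _sse(y):
        return node  # no split reduces the error: keep a constant leaf
    _, f, t = best
    li = [i for i, row in enumerate(X) if row[f] <= t]
    ri = [i for i, row in enumerate(X) if row[f] > t]
    node.feature, node.threshold = f, t
    node.left = fit([X[i] for i in li], [y[i] for i in li], min_leaf, max_depth - 1)
    node.right = fit([X[i] for i in ri], [y[i] for i in ri], min_leaf, max_depth - 1)
    return node


def predict(node: Node, x: Sequence[float]) -> float:
    """Walk from the root to a leaf; the prediction is that leaf's constant."""
    while node.feature is not None:
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.value
```

For example, `fit([[1.0], [2.0], [3.0], [10.0], [11.0], [12.0]], [5.0, 5.0, 5.0, 20.0, 20.0, 20.0])` produces a single split at 6.5 with leaves of 5.0 and 20.0, i.e. a step function. In a device model, `X` would hold workload or request descriptions and `y` the measured response times.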
Such learning-based black-box modeling is difficult for two reasons. First, all the machine learning tools we have examined take vectors of scalars as input. Existing workload characterization models, however, describe workloads in terms of distributions, and compressing these distributions into a set of scalars is not straightforward. Second, the quality of the generated models depends heavily on the quality of the training workloads, which must be diverse enough to provide high coverage of the input space.
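One naive way to meet the vector-of-scalars requirement, shown here only to illustrate the compression problem, is to summarize each distribution by a handful of percentiles. The trace format, the choice of distributions (inter-arrival times and request sizes), and the percentiles kept are all our assumptions, not the workload description developed in this paper. The sketch also shows what such a summary loses: any correlation between consecutive requests, or between request size and timing, disappears.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical trace record: (arrival_time_seconds, request_size_bytes).
Request = Tuple[float, int]

# Percentiles kept for each distribution; an arbitrary illustrative choice.
QS = (10, 50, 90)


def percentiles(values: Sequence[float], qs: Sequence[float] = QS) -> List[float]:
    """Nearest-rank percentiles: smallest value with >= q% of the data at or below it."""
    s = sorted(values)
    return [s[max(1, math.ceil(q / 100 * len(s))) - 1] for q in qs]


def workload_vector(trace: Sequence[Request], qs: Sequence[float] = QS) -> List[float]:
    """Flatten a trace's marginal distributions into a fixed-length vector of scalars."""
    times = sorted(t for t, _ in trace)
    if len(times) < 2 or times[-1] <= times[0]:
        raise ValueError("need at least two requests spanning a positive time interval")
    duration = times[-1] - times[0]
    gaps = [b - a for a, b in zip(times, times[1:])]  # inter-arrival times
    sizes = [float(s) for _, s in trace]
    vec = [len(gaps) / duration]      # mean arrival rate (requests per second)
    vec += percentiles(gaps, qs)      # inter-arrival distribution -> len(qs) scalars
    vec += percentiles(sizes, qs)     # request-size distribution -> len(qs) scalars
    return vec
```

Whatever the trace length, the result always has `1 + 2 * len(QS)` entries, which is what a vector-input learner such as CART needs; the price is that everything outside these few scalars is discarded.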
This work develops two ways of encoding workloads as vectors: a vector per request or a vector per workload. The two encoding schemes lead to two types of device models, operating at the per-request and per-workload granularities, respectively. The request-level device models predict each request’s response time based on its per-request vector, or “request description.” The workload-level device models, on the other hand, predict aggregate performance directly from per-workload vectors, or “workload descriptions.” Our experiments on a variety of real-world workloads show that these descriptions are reasonably good at capturing workload performance on both single disks and disk arrays. The two CART-based models have median relative errors of 17% and 38%, respectively, for average response time prediction, and 18% and 43%, respectively, for the 90th percentile, when the training and testing traces come from the same workload. The CART-based models also interpolate well across workloads.
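The accuracy figures above combine two steps that a short sketch makes explicit: the request-level model’s per-request predictions are first aggregated into an average and a 90th-percentile response time, and each aggregate is then compared with the measured value. The sketch assumes relative error means |predicted − actual| / actual and uses nearest-rank percentiles; both definitions, and all function names, are our assumptions rather than the paper’s.

```python
import math
import statistics
from typing import Sequence, Tuple


def p90(values: Sequence[float]) -> float:
    """90th percentile by nearest rank (one of several common definitions)."""
    s = sorted(values)
    return s[max(1, math.ceil(0.9 * len(s))) - 1]


def relative_error(predicted: float, actual: float) -> float:
    """|predicted - actual| / actual, as a fraction (0.17 means 17%)."""
    return abs(predicted - actual) / actual


def request_level_errors(pred_rt: Sequence[float],
                         actual_rt: Sequence[float]) -> Tuple[float, float]:
    """Aggregate one trace's per-request predictions, then compare with measurements.

    Returns (relative error of the mean, relative error of the 90th percentile).
    """
    return (relative_error(statistics.mean(pred_rt), statistics.mean(actual_rt)),
            relative_error(p90(pred_rt), p90(actual_rt)))


def median_relative_error(per_trace_errors: Sequence[float]) -> float:
    """Summarize accuracy over many traces by the median, which resists outliers."""
    return statistics.median(per_trace_errors)
```

A workload-level model predicts the aggregates directly, so only `relative_error` applies to it; the median over traces is then taken in the same way for both model types.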
The remainder of this paper is organized as follows. Section 2 discusses previous work in the area of storage device modeling and workload characterization. Section 3 describes CART and its properties. Section 4 describes two CART-based device models. Section 5 evaluates the models using several real-world workload traces. Section 6 concludes the paper.
2 Related Work
Performance modeling has a long and successful history. Almost always, however, thorough knowledge of the system being modeled is assumed. Disk simulators, such as Pantheon [33] and DiskSim [5], use software to simulate storage device behavior and produce accurate per-request response times. Developing such simulators is challenging, especially when disk parameters are not publicly available. Predicting performance