© 2001 Kluwer Academic Publishers. Manufactured in The Ne
Published: 2021-06-07
Abstract. A data cube is a popular organization for summary data. A cube is simply a multidimensional structure that contains in each cell an aggregate value, i.e., the result of applying an aggregate function to an underlying relation. In practical situat
LOGLINEAR-BASED QUASI CUBES
one (the latter used to demonstrate scalability). Finally, Section 4 presents the conclusions and future work.
2. Our method
In this section we describe in detail how we compress the data cube by constructing loglinear models for dense regions of the cube. We model regions of the core cuboid² and employ these models to estimate the values of the individual cells. The reason to focus on the core cuboid is simple: the error guarantees for queries to the core cuboid hold for any other cuboid in the lattice. (In reality, as we shall see, the errors decrease dramatically when we aggregate cells of the core cuboid.)
To avoid incurring large estimation errors, as we stated in the introduction, we retain every cell value whose estimate differs from the real value by more than a pre-established threshold. This threshold becomes the guarantee of the approximate answer. (As we will show later, many answers are, in reality, closer to the real answer than the threshold predicts.) We store the model parameters (for each modeled region of the cuboid) along with the retained cells in order to process the queries.
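The retention rule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the threshold is assumed to be a bound on relative error, the cell values are assumed non-negative, and the names `compress_cells` and `answer_cell` are hypothetical.

```python
def compress_cells(actual, estimated, threshold):
    """Return the cells whose estimate misses the real value by more than
    `threshold` (assumed here to be a relative error bound)."""
    retained = {}
    for cell, real in actual.items():
        if real == 0:
            continue  # this sketch only checks non-zero cells
        rel_error = abs(estimated[cell] - real) / abs(real)
        if rel_error > threshold:
            retained[cell] = real  # keep the exact value for this cell
    return retained


def answer_cell(cell, estimated, retained):
    """Use the exact retained value if present, else the model estimate."""
    return retained.get(cell, estimated[cell])


# Hypothetical cell values and model estimates, for illustration only.
actual = {("a", "x"): 100.0, ("a", "y"): 40.0, ("b", "x"): 10.0}
estimated = {("a", "x"): 104.0, ("a", "y"): 30.0, ("b", "x"): 10.5}

retained = compress_cells(actual, estimated, threshold=0.10)

# Aggregating the answered cells: with non-negative values, the relative
# error of a sum cannot exceed the largest per-cell relative error.
approx_sum = sum(answer_cell(c, estimated, retained) for c in actual)
exact_sum = sum(actual.values())
agg_rel_error = abs(approx_sum - exact_sum) / exact_sum
```

In this example only the cell `("a", "y")` is retained, since its estimate is off by 25%. The aggregated answer (154.5 against an exact 150.0) is off by 3%, well inside the 10% per-cell bound, which is in line with the remark that aggregated errors tend to be smaller than the threshold.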
The issues involved in compressing the cube, giving approximate answers to the queries, and evaluating our technique can be summarized as follows:
Selecting chunks of the core cuboid that will be described by models (regions of the core cuboid that are sufficiently dense).
For each chunk to be modeled, computing the model parameters based on the data contained in the chunk and then, based on the estimated values, computing the estimation error for each non-zero cell in the chunk and determining whether the cell needs to be retained.
Organizing the model parameters and retained cells so that they can be accessed efficiently when processing queries.
Evaluating the tradeoffs between the space taken by the compressed cube, the response time for the queries, and the quality of the approximate answers obtained (errors).
In the rest of this section, we describe in detail how we solved each of these issues.
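The second step in the list above (fit a model to a chunk, then decide which cells to retain) can be sketched for a two-dimensional chunk. The text here does not say which loglinear model is fitted, so this sketch assumes the simplest one, the independence model log m_ij = u + u_i + u_j, whose maximum-likelihood estimates are (row total × column total) / grand total. The function names, the relative-error threshold, and the sample chunk are all hypothetical.

```python
def fit_independence_model(chunk):
    """Fit the independence loglinear model to a dense 2-D chunk.

    The stored parameters are the marginal totals; the chunk is assumed
    to be non-empty with a positive grand total.
    """
    row_totals = [sum(row) for row in chunk]
    col_totals = [sum(col) for col in zip(*chunk)]
    grand_total = sum(row_totals)
    return row_totals, col_totals, grand_total


def estimate(params, i, j):
    """Model estimate for cell (i, j): row_total * col_total / grand_total."""
    row_totals, col_totals, grand_total = params
    return row_totals[i] * col_totals[j] / grand_total


def model_chunk(chunk, threshold):
    """Return the model parameters plus the cells that must be retained."""
    params = fit_independence_model(chunk)
    retained = {}
    for i, row in enumerate(chunk):
        for j, real in enumerate(row):
            if real == 0:
                continue  # only non-zero cells are checked
            rel_error = abs(estimate(params, i, j) - real) / abs(real)
            if rel_error > threshold:
                retained[(i, j)] = real
    return params, retained


# Hypothetical 3 x 2 chunk of counts, for illustration only.
chunk = [[20, 30], [40, 60], [10, 40]]
params, retained = model_chunk(chunk, threshold=0.15)
```

Here the compressed chunk consists of the marginal totals plus the two cells in the last row, which the independence model estimates poorly. How much space is saved depends on how many cells fall outside the threshold.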
2.1. Dividing the core cuboid
In order to select the chunks into which we divide the core cuboid, we use a density-based approach called hierarchical clustering (Jain and Dubes, 1988) to obtain high-density portions of the cube. This approach has previously been used to identify regions of high density in multidimensional data (Agrawal et al., 1996). (The aim in Agrawal et al. (1996) is to have a density approach to clustering, where a cluster is defined as a region with a higher density of points than its surrounding regions.) We assume that the core cuboid has been computed (an algorithm such as the one presented in Ross and Srivastava (1997) is well suited for the task). Given a d-dimensional data cube with dimensions $A = \{D_1, D_2, \ldots, D_d\}$, where d is the number of dimensions, we assume $A$ to be a set of bounded, totally ordered domains.³ Accordingly, $S = |D_1| \times |D_2| \times \cdots \times |D_d|$ is the possible number of cells in