Estimating the quality of data in relational databases(3)
Published: 2021-06-07
2 Overall Approach
Our overall approach for achieving the goals that were stated in the introduction can be described as a sequence of problems.
We begin, in Section 3, by describing the dual measures that will be used to gauge the quality of database information. We claim that these measures capture, in a most natural way, the relationship of the stored information to truth, and are therefore excellent indicators of quality.
Our measures require the authentication of database information, which is a process that needs to be done by humans. However, we advocate the use of statistical methods (essentially, sampling) to keep the manual work within acceptable limits. This subject is discussed in Section 4.1.
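The sampling idea can be illustrated with a minimal Python sketch. It is not the procedure of Section 4.1; it assumes a simple random sample, a hypothetical `is_correct` callback standing in for the human authentication step, and a normal-approximation confidence interval. The name `estimate_quality` and all of its parameters are illustrative only.

```python
import math
import random


def estimate_quality(tuples, is_correct, sample_size, seed=0, z=1.96):
    """Estimate the fraction of correct tuples from a simple random sample.

    `is_correct` is a hypothetical stand-in for manual authentication.
    Returns (estimate, lower, upper), where the bounds come from a
    normal-approximation interval clipped to [0, 1].
    """
    if not tuples or sample_size <= 0:
        raise ValueError("need a non-empty relation and a positive sample size")
    rng = random.Random(seed)
    n = min(sample_size, len(tuples))
    # Only the sampled tuples are sent for (costly) human verification.
    sample = rng.sample(tuples, n)
    correct = sum(1 for t in sample if is_correct(t))
    p = correct / n
    half_width = z * math.sqrt(p * (1.0 - p) / n)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)
```

A larger `sample_size` narrows the interval at the price of more manual work, which is the trade-off the paragraph above refers to.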
Having obtained accurate information about the quality of the samples, we proceed to partition the given database into a set of components that are homogeneous with respect to our quality measures. We then estimate the quality of these components, using the samples. This implies that when information is extracted from a single component, its quality ratings are inherited from the containing component. These methods, described in Sections 4.2 and 4.3, provide us with quality specifications for the given databases.
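The inheritance of quality ratings can be sketched as follows. This is an illustration under stated assumptions, not the method of Sections 4.2 and 4.3: the partition is taken as given (finding a homogeneous partition is not shown), each component is described by a hypothetical membership predicate, and a single scalar rating stands in for the paper's measures. The names `Component`, `rate_components`, and `inherited_quality` are invented for this example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Row = Dict[str, object]


@dataclass
class Component:
    name: str
    contains: Callable[[Row], bool]  # hypothetical view predicate
    quality: Optional[float] = None  # unknown until rated


def rate_components(components: List[Component],
                    audited_sample: List[Tuple[Row, bool]]) -> List[Component]:
    """Rate each component by the share of audited sample rows that were
    found correct among the sample rows falling inside that component."""
    for c in components:
        verdicts = [ok for row, ok in audited_sample if c.contains(row)]
        if verdicts:
            c.quality = sum(verdicts) / len(verdicts)
    return components


def inherited_quality(row: Row, components: List[Component]) -> Optional[float]:
    """A row extracted from a component inherits that component's rating."""
    for c in components:
        if c.contains(row):
            return c.quality
    return None
```

For instance, if half of the audited rows in a component turn out to be correct, every row later drawn from that component is assigned a rating of 0.5.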
Finally, in Section 5, we describe the process of inferring the quality of answers to arbitrary queries from the quality specifications that have been assigned to the database. Our treatment of the problem is in the context of relational databases, and we assume the standard definitions of the relational model [17]. In particular, the database components mentioned earlier are defined using the mechanism of views. We also make the following assumptions.
1. Queries and views use only the projection, selection, and Cartesian product operations; selections use only range conditions; and projections always retain the key attribute(s).
2. The stored information (the database instances) is relatively static, and hence the quality of data does not change frequently.
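The restricted query class of assumption 1 can be made concrete with a small Python sketch. Relations are modeled here as lists of dictionaries, which is an assumption of this example rather than anything prescribed by the paper; the function names and the attribute-prefixing scheme in the product are likewise illustrative.

```python
from itertools import product as pairs


def select_range(rows, attr, low, high):
    """Selection restricted to a range condition: low <= attr <= high."""
    return [r for r in rows if low <= r[attr] <= high]


def project(rows, attrs, key):
    """Projection that refuses to drop any key attribute."""
    missing = [k for k in key if k not in attrs]
    if missing:
        raise ValueError(f"projection must retain key attribute(s): {missing}")
    return [{a: r[a] for a in attrs} for r in rows]


def cartesian_product(left, right, left_prefix, right_prefix):
    """Cartesian product; attributes are prefixed to avoid name clashes."""
    return [
        {
            **{f"{left_prefix}.{k}": v for k, v in l.items()},
            **{f"{right_prefix}.{k}": v for k, v in r.items()},
        }
        for l, r in pairs(left, right)
    ]
```

Because projections keep the key, every tuple in a result can still be traced back to the stored tuple(s) it came from.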
Because of space limitations, several key issues and solutions are only sketched in this paper, and fuller discussions are provided in [11].