Estimating the quality of data in relational databases (2)
Date: 2025-04-03
1 Introduction
The importance of data quality in the information age cannot be overestimated. People, businesses, and governments rely more and more on information in their everyday operations, and databases of different kinds are the primary source of this information. Our dependence on databases grows simultaneously with their size, yet most large databases contain errors and inconsistencies. There is a growing awareness in the database research community [13,19] and among database practitioners [1] of the problem of data quality. By now, the need for data quality metrics and for methods for incorporating them in database systems is well understood. Data quality can be metricized in a number of different ways, depending on which aspects of information are considered important [18,5]. The addition of data quality capabilities to database systems will enhance decision-making processes, improve the quality of information services, and, in general, provide more accurate pictures of reality. On the other hand, these new capabilities of databases should not be demanding in terms of resources; e.g., they must not add too much complexity to query processing or require much more memory than existing databases.
Recent advances in the field of data quality concern data at an attribute value level [18] and at a relation level [14]. A comprehensive survey of the state of the art in the field is given in [19]. A relational algebra extended with data accuracy estimates, based on the assumption of uniform distributions of incorrect values across tuples and attributes, was first described in [14].
With more and more electronic information sources becoming widely available, the issue of the quality of these, often-competing, sources has become germane. We propose a standard for rating information sources with respect to their quality. An important consideration is that the quality of information sources often varies considerably when specific areas within these sources are considered. This implies that the assignment of a single rating of quality to an information source is usually unsatisfactory. Of course, to the user of an information source the overall quality of the source may not be as important as the quality of the specific information that this user is extracting from the source. Therefore, methods must be developed that will derive reliable estimates of the quality of the information provided to users, from the quality specifications that have been assigned to the sources.
Our work here bears on all these concerns. We describe an approach that uses dual quality measures that gauge the distance of the information in a database from the truth. We then propose to combine manual verification with statistical methods to arrive at useful estimates of the quality of databases. We consider the variance in quality by isolating areas of databases that are homogeneous with respect to quality, and then estimating the quality of each separate area. These composite estimates may be regarded as a quality specification that will be affixed to the database. Finally, we show how to derive quality estimates for individual queries from such quality specifications.
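The idea of combining manual verification with sampling over quality-homogeneous areas can be illustrated with a minimal sketch. It is not the procedure of this paper: it assumes a single, simplified measure (the fraction of sampled tuples that a verifier confirms as correct) rather than the dual measures mentioned above, which this section does not define. The function names, the `verify` callback, and the area labels are all hypothetical.

```python
import random


def estimate_area_quality(rows, verify, sample_size, rng):
    """Estimate the fraction of correct rows in one area from a random sample.

    `verify` stands in for manual verification (hypothetical callback):
    it returns True when a row is confirmed to be correct.
    """
    sample = rng.sample(rows, min(sample_size, len(rows)))
    if not sample:
        return 0.0
    correct = sum(1 for row in sample if verify(row))
    return correct / len(sample)


def estimate_database_quality(areas, verify, sample_size, seed=0):
    """Estimate quality per area, then combine the estimates weighted by area size.

    `areas` maps a hypothetical area label to the list of rows in that area.
    Returns (per-area estimates, size-weighted overall estimate).
    """
    rng = random.Random(seed)
    per_area = {
        name: estimate_area_quality(rows, verify, sample_size, rng)
        for name, rows in areas.items()
    }
    total = sum(len(rows) for rows in areas.values())
    if total == 0:
        return per_area, 0.0
    overall = sum(per_area[name] * len(rows) for name, rows in areas.items()) / total
    return per_area, overall


if __name__ == "__main__":
    # Toy data: the second field records whether the row is actually correct,
    # so the "manual verification" step can be simulated.
    areas = {
        "clean": [("c%d" % i, True) for i in range(10)],
        "noisy": [("n%d" % i, i % 2 == 0) for i in range(10)],
    }
    per_area, overall = estimate_database_quality(areas, lambda r: r[1], sample_size=10)
    print(per_area, overall)
```

Keeping the per-area estimates alongside the overall figure reflects the point made above that a single rating for a whole source is usually unsatisfactory: a query that touches only one area can be judged by that area's estimate.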