Meta-classifier approach to reliable text classification
Date: 2026-01-21
A problem with automatic classifiers is that there is no way to know whether a particular classification is just a guess or a certain answer. Reliable classification is the task of predicting whether a certain instance is correctly classified or not, i.e., a cl…
1.2. RELATED WORK
…for randomness, computable approximations of algorithmic tests of randomness are used [Proedrou et al., 2001]. In this context we distinguish two frameworks: the typicalness framework and the transduction framework.
The typicalness framework provides reliability estimates for data that are independently and identically distributed (iid). Consider a sequence of instances together with a new instance of unknown class. The typicalness framework uses a typicalness function to obtain a reliability measure for every possible class of this new instance. For each possible class, it examines how likely it is that all instances of the extended sequence, i.e., the sequence with the new instance added, were drawn independently from the same distribution. The more typical the sequence, the higher the reliability measure.
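A common way to turn "the more typical the sequence, the higher the reliability" into a number, used in the conformal-prediction line of work, is a p-value over strangeness scores. The notation is ours, not the source's: $\alpha_1, \dots, \alpha_n$ are the strangeness scores of the known instances and $\alpha_{n+1}$ that of the new instance, all computed on the extended sequence in which the new instance carries the tentative class $y$.

$$
p(y) = \frac{\left|\left\{\, i \in \{1, \dots, n+1\} : \alpha_i \ge \alpha_{n+1} \right\}\right|}{n+1}
$$

Because the new instance always counts itself, $p(y) \ge 1/(n+1)$, and $p(y) = 1$ means the new instance is no stranger than any other member of the extended sequence. Under the iid assumption, the p-value of the true class satisfies $\Pr[p(y) \le \varepsilon] \le \varepsilon$, which is what allows it to serve as a reliability measure.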
The typicalness function can be constructed by measuring the “strangeness” of individual instances, using individual strangeness functions [Kukar, 2004]. A drawback of this approach is that the strangeness function depends on the classification algorithm used. So far, the only successful applications use Support Vector Machines [Vovk et al., 1999] and the nearest-neighbour algorithm [Proedrou et al., 2001]. Another disadvantage of this approach is its computational complexity [Kukar and Kononenko, 2002; Melluish et al., 2001].
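The sketch below instantiates the typicalness framework with a nearest-neighbour strangeness function, modelled on (not copied from) the measure associated with Proedrou et al. [2001]: the distance to the nearest neighbour of the same class divided by the distance to the nearest neighbour of another class (k = 1 here). The function names, the toy data, and the confidence/credibility summary (a usual convention in the conformal-prediction literature) are our assumptions. Every strangeness score is recomputed for each tentative class, which illustrates where the computational cost comes from.

```python
import math
from typing import Dict, Hashable, Sequence, Tuple

Point = Sequence[float]


def strangeness(i: int, points: Sequence[Point], labels: Sequence[Hashable]) -> float:
    """k = 1 nearest-neighbour strangeness of points[i] (higher means stranger).

    Distance to the nearest neighbour with the same label divided by the
    distance to the nearest neighbour with a different label.
    """
    same = other = math.inf
    for j, (p, lab) in enumerate(zip(points, labels)):
        if j == i:
            continue
        d = math.dist(points[i], p)
        if lab == labels[i]:
            same = min(same, d)
        else:
            other = min(other, d)
    if other == 0.0:
        return math.inf  # an identical point carries another label: maximally strange
    return same / other


def p_values(train_x: Sequence[Point], train_y: Sequence[Hashable],
             x_new: Point, classes: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Typicalness p-value of the extended sequence for every tentative class of x_new."""
    result = {}
    for y in classes:
        xs = list(train_x) + [x_new]  # extended sequence, new instance labelled y
        ys = list(train_y) + [y]
        # Every score is recomputed per candidate class: one source of the high cost.
        alphas = [strangeness(i, xs, ys) for i in range(len(xs))]
        result[y] = sum(a >= alphas[-1] for a in alphas) / len(alphas)
    return result


def predict_with_reliability(pvals: Dict[Hashable, float]) -> Tuple[Hashable, float, float]:
    """Return (label, confidence, credibility) in the usual conformal-prediction style."""
    ranked = sorted(pvals.values(), reverse=True)
    label = max(pvals, key=pvals.get)
    confidence = 1.0 - (ranked[1] if len(ranked) > 1 else 0.0)
    return label, confidence, ranked[0]


pv = p_values([(0.0,), (0.2,), (1.0,), (1.2,)], ["a", "a", "b", "b"], (0.1,), ["a", "b"])
print(pv)  # {'a': 0.8, 'b': 0.2}
print(predict_with_reliability(pv))
```

Here the new point at 0.1 sits between the two "a" instances, so labelling it "a" keeps the sequence typical, while labelling it "b" makes it the strangest member of the sequence.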
Another statistical framework for reliable classification is the transduction framework. The framework is classifier-independent and is based on a transductive step during the classification process. First, an instance is classified by a base classifier; then the instance is added to the training set together with the classification it received from the base classifier. The classifier is retrained and the instance is classified again. The reliability of the instance classification is measured as the difference between the posterior class probabilities that the instance receives before and after it was added to the training set [Kukar and Kononenko, 2002]. Smirnov et al. [2003b] show that this approach is not only computationally inefficient but also requires high precision of the real-number representation when a large amount of data is used with many classes. Adding a single instance to the training set can change the level of randomness significantly only when the training set is small.
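A minimal sketch of the transductive step, assuming a small multinomial naive Bayes model as the base classifier (the framework is classifier-independent, so any model that outputs posterior probabilities would do). The helper names, the toy corpus, and the use of the L1 distance between posterior vectors as "the difference" are our assumptions; Kukar and Kononenko [2002] may use a different measure, and how the difference is mapped to a reliability score is left open here.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels):
    """Multinomial naive Bayes counts (a hypothetical base classifier for this sketch)."""
    doc_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        word_counts[label].update(doc.split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return doc_counts, word_counts, vocab


def posterior(model, doc):
    """Posterior class probabilities with Laplace smoothing, computed in log space."""
    doc_counts, word_counts, vocab = model
    n_docs = sum(doc_counts.values())
    log_scores = {}
    for c, n_c in doc_counts.items():
        total = sum(word_counts[c].values())
        score = math.log(n_c / n_docs)
        for w in doc.split():
            score += math.log((word_counts[c][w] + 1) / (total + len(vocab)))
        log_scores[c] = score
    top = max(log_scores.values())  # subtract the max before exp() for stability
    weights = {c: math.exp(s - top) for c, s in log_scores.items()}
    z = sum(weights.values())
    return {c: wt / z for c, wt in weights.items()}


def transductive_reliability(train_docs, train_labels, doc):
    """Return (predicted label, L1 change of the posterior after the transductive step)."""
    before = posterior(train_nb(train_docs, train_labels), doc)
    predicted = max(before, key=before.get)
    # Transductive step: add the instance with the label it just received, retrain, reclassify.
    after = posterior(train_nb(list(train_docs) + [doc], list(train_labels) + [predicted]), doc)
    # Assumed difference measure: L1 distance between the two posterior vectors.
    change = sum(abs(after[c] - before[c]) for c in before)
    return predicted, change


docs = ["cheap pills offer", "cheap offer now", "meeting agenda today", "project meeting notes"]
labels = ["spam", "spam", "ham", "ham"]
print(transductive_reliability(docs, labels, "cheap offer"))
print(transductive_reliability(docs * 10, labels * 10, "cheap offer"))
```

Both calls predict "spam", but replicating the training set ten times shrinks the posterior change from about 0.1 to well under 0.001. This is consistent with the observation above: on large training sets the before/after difference becomes tiny and has to be represented with high precision to be informative.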
Kukar [2004] presents an algorithm that combines the transductive framework and the typicalness framework, thereby making the transductive step statistically sound. Experimental results show that reliability can be estimated quite accurately, but again the high computational cost poses a problem.
1.2.3 Version-Space Support-Vector Machines
A different approach to reliable instance classification is based on version spaces [Smirnov et al., 2005]. We refer to Mitchell [1997] for an introduction to version spaces. The main idea is to construct version spaces that contain the hypotheses of the target concepts to be learned, or close approximations of them. The unanimous-voting rule, which assigns a class to an instance only when all hypotheses in the version space agree on it, is implemented by testing version spaces for collapse. When the data contain no noise, applying this rule makes it impossible to misclassify instances. Although experimental results are promising, this approach is also computationally expensive.
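To make "testing version spaces for collapse" concrete, the sketch below uses a small, finite hypothesis space of one-dimensional threshold rules. This is a hypothetical stand-in: in version-space support vector machines the hypotheses are SVMs, and collapse is, as we understand it, detected by checking whether the data remain separable, which is far more expensive than filtering a list. A version space collapses when no hypothesis is consistent with the data; an instance x is labelled positive exactly when adding (x, negative) collapses the version space, i.e. when every consistent hypothesis votes positive.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Hypothesis = Callable[[float], bool]
Example = Tuple[float, bool]


def version_space(hypotheses: Sequence[Hypothesis], data: Sequence[Example]) -> List[Hypothesis]:
    """All hypotheses consistent with every labelled example."""
    return [h for h in hypotheses if all(h(x) == y for x, y in data)]


def unanimous_vote(hypotheses: Sequence[Hypothesis], data: Sequence[Example],
                   x: float) -> Optional[bool]:
    """Label x only if every consistent hypothesis agrees; otherwise abstain (None)."""
    if not version_space(hypotheses, data):
        raise ValueError("version space already collapsed: data inconsistent (e.g. noisy)")
    # Adding (x, False) empties the version space => every consistent hypothesis says True.
    if not version_space(hypotheses, list(data) + [(x, False)]):
        return True
    # Adding (x, True) empties the version space => every consistent hypothesis says False.
    if not version_space(hypotheses, list(data) + [(x, True)]):
        return False
    return None  # the version space is split on x: abstain


# Hypothetical hypothesis space: threshold rules "x >= t" for t = 0.0, 0.1, ..., 1.0.
thresholds = [lambda x, t=i / 10: x >= t for i in range(11)]
train = [(0.1, False), (0.2, False), (0.8, True), (0.9, True)]
for query in (0.95, 0.05, 0.5):
    print(query, unanimous_vote(thresholds, train, query))
# 0.95 True
# 0.05 False
# 0.5 None
```

The abstention on 0.5 is the point of the rule: the classifier refuses to guess when the remaining hypotheses disagree, and each decision costs up to two extra consistency tests over the whole hypothesis space.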