Meta-classifier approach to reliable text classification (8)
Date: 2026-01-21
A problem with automatic classifiers is that there is no way to know if a particular classification is just a guess or a certain answer. Reliable classification is the task of predicting whether a certain instance is correctly classified or not,
i.e., the classification of the instance is classified as either reliable or unreliable. When the classification is classified as reliable, this is comparable to saying “I know”, and the instance is correctly classified. When the classification is classified as unreliable, this is comparable to saying “I do not know”, and the instance is not classified. Solving the task means that every classification marked as reliable is correct, so the classifier achieves an accuracy of 100% on the “I know” instances.
The reliability of a classification can be defined in several ways. We adopt the definition of Kukar and Kononenko [2002]. They define the reliability of a classification as the probability that the classification is correct.
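Because reliability is defined as a probability, a set of reliability estimates can be sanity-checked against data: among classifications given an estimate near p, roughly a fraction p should turn out to be correct. The sketch below is an illustrative check under that reading, not part of Kukar and Kononenko's method; the function name `reliability_table`, the equal-width binning and the default of five bins are assumptions made for this example.

```python
from typing import List, Sequence, Tuple


def reliability_table(
    estimates: Sequence[float],
    correct: Sequence[bool],
    n_bins: int = 5,
) -> List[Tuple[float, float, int]]:
    """Compare reliability estimates with observed accuracy (illustrative).

    Estimates in [0, 1] are grouped into n_bins equal-width bins. For each
    non-empty bin the result holds (mean estimate, observed accuracy, count).
    If the estimates behave like "the probability that the classification
    is correct", the first two values should be close in every bin.
    """
    if len(estimates) != len(correct):
        raise ValueError("estimates and correct must have the same length")
    bins: List[List[int]] = [[] for _ in range(n_bins)]
    for i, p in enumerate(estimates):
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"estimate {p} is not a probability")
        # An estimate of exactly 1.0 is placed in the last bin.
        bins[min(int(p * n_bins), n_bins - 1)].append(i)
    table: List[Tuple[float, float, int]] = []
    for members in bins:
        if not members:
            continue
        mean_est = sum(estimates[i] for i in members) / len(members)
        accuracy = sum(1 for i in members if correct[i]) / len(members)
        table.append((mean_est, accuracy, len(members)))
    return table


# Hypothetical reliability estimates for five classifications.
for row in reliability_table([0.9, 0.95, 0.85, 0.2, 0.1],
                             [True, True, False, False, False]):
    print(row)
```

In this made-up data the top bin has a mean estimate of about 0.9 but only two of its three classifications are correct, which is the kind of gap the definition lets one detect.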
Reliable classification has two important application areas: real-world applications and ensembles of classifiers. Reliable classification can be used in real-world, safety-critical domains, because it reduces the risk of misclassification. When unreliable classifications are discarded, the coverage (percentage of records that can be classified) of the classifier diminishes, but at the same time its accuracy (percentage of the classified records that are correctly classified) increases.
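The coverage/accuracy trade-off can be made concrete with a small sketch. It assumes a hypothetical representation in which a discarded (unreliable) classification is stored as `None`; the function name `coverage_and_accuracy` and the spam/ham data are illustrative only.

```python
import math
from typing import Hashable, Optional, Sequence, Tuple


def coverage_and_accuracy(
    predictions: Sequence[Optional[Hashable]],
    truths: Sequence[Hashable],
) -> Tuple[float, float]:
    """Return (coverage, accuracy) for a classifier that may abstain.

    None in `predictions` means "I do not know": the classification was
    judged unreliable and the record is left unclassified. Coverage is the
    fraction of records that received a label; accuracy is measured only on
    those labelled records (NaN if no record was labelled).
    """
    if len(predictions) != len(truths):
        raise ValueError("predictions and truths must have the same length")
    if not truths:
        raise ValueError("need at least one record")
    answered = [(p, t) for p, t in zip(predictions, truths) if p is not None]
    coverage = len(answered) / len(truths)
    if not answered:
        return coverage, math.nan
    accuracy = sum(1 for p, t in answered if p == t) / len(answered)
    return coverage, accuracy


truths = ["spam", "ham", "spam", "ham"]
# Every record classified; the second classification is wrong.
print(coverage_and_accuracy(["spam", "spam", "spam", "ham"], truths))  # (1.0, 0.75)
# The wrong classification was judged unreliable and discarded.
print(coverage_and_accuracy(["spam", None, "spam", "ham"], truths))    # (0.75, 1.0)
```

Discarding the one unreliable record lowers coverage from 1.0 to 0.75 while raising accuracy from 0.75 to 1.0.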
Reliable classification is already being used in another context as well, namely in ensembles of classifiers. Ensembles classify an instance by combining the classifications of multiple classifiers. The final classification can be obtained in several ways, e.g., by (unanimous) voting of the classifiers. Reliable classification can be used to sift the classifications of the classifiers: only the classifiers that make a reliable classification are used to make the final classification. Combining ensembles of classifiers with reliable classification can substantially increase accuracy [Seewald, 2003, Ting, 1996].
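A minimal sketch of sifting an ensemble's votes by reliability, assuming each member classifier reports its label together with a boolean reliability flag. The function `sifted_vote` is hypothetical, and abstaining on ties is a design choice of this example rather than something taken from Seewald [2003] or Ting [1996].

```python
from collections import Counter
from typing import Hashable, Optional, Sequence, Tuple


def sifted_vote(
    votes: Sequence[Tuple[Hashable, bool]],
    unanimous: bool = False,
) -> Optional[Hashable]:
    """Combine member classifications, keeping only the reliable ones.

    Each vote is (label, is_reliable). Unreliable votes are discarded before
    voting. With unanimous=True a label is returned only if all remaining
    votes agree; otherwise the plurality label is returned. None means the
    ensemble abstains ("I do not know").
    """
    reliable = [label for label, ok in votes if ok]
    if not reliable:
        return None  # no member is confident, so the ensemble abstains
    counts = Counter(reliable)
    if unanimous:
        return reliable[0] if len(counts) == 1 else None
    (top, top_n), *rest = counts.most_common()
    # Abstain on ties instead of picking one of the tied labels arbitrarily.
    if rest and rest[0][1] == top_n:
        return None
    return top


votes = [("spam", True), ("ham", False), ("spam", True), ("ham", True)]
print(sifted_vote(votes))                  # spam
print(sifted_vote(votes, unanimous=True))  # None
```

Returning `None` when no reliable vote remains, or when reliable votes tie, keeps the ensemble's output in the same “I know”/“I do not know” form as a single reliable classifier.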
1.2 Related Work
Four different approaches to reliable classification that have been developed in the last ten years are discussed in this section: the Bayesian framework, the statistical frameworks², version-space support-vector machines, and the meta-classifier approach.
1.2.1 Bayesian Framework
Classifiers in the Bayesian framework usually produce classifications in the form of posterior probability distributions over all possible classes. The simplest approach to reliable classification is to use the posterior probability of a single classification as a measure for reliability [Ting, 1996]. However, it has been shown that the posterior probabilities of classifiers like naïve Bayes are poor measures for reliability, since they are based on incorrect prior assumptions [Melluish et al., 2001, Delany et al., 2004]. Thus, the Bayesian framework can be misleading for reliable classification, and is therefore considered inappropriate for our purposes.
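The simplest Bayesian approach described above, taking the posterior probability of the predicted class as its reliability, needs very little code. The sketch assumes the posterior is given as a dictionary from class to probability; `max_posterior_reliability` is a hypothetical name. As the text notes, the returned number is only as meaningful as the model's posteriors.

```python
from typing import Dict, Hashable, Tuple


def max_posterior_reliability(
    posterior: Dict[Hashable, float],
) -> Tuple[Hashable, float]:
    """Return the most probable class and its posterior probability.

    The posterior of the chosen class is used directly as the reliability
    measure. For classifiers such as naive Bayes this value can be a poor
    reliability measure, as discussed in the text.
    """
    if not posterior:
        raise ValueError("posterior must contain at least one class")
    total = sum(posterior.values())
    if abs(total - 1.0) > 1e-6:
        raise ValueError(f"posterior sums to {total}, expected 1")
    label = max(posterior, key=posterior.get)  # first class wins on ties
    return label, posterior[label]


print(max_posterior_reliability({"spam": 0.7, "ham": 0.3}))  # ('spam', 0.7)
```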
1.2.2 Statistical Frameworks
Several approaches to reliable classification are based on the algorithmic theory of randomness [Vovk et al., 1999]. Since there is no computable, universal test …

² The output of the classifiers in the Bayesian framework and the statistical frameworks is a reliability estimation, rather than a reliable classification. By using thresholds, reliability estimations can easily be converted to reliable classifications: when the reliability estimation is above a certain threshold, the classification is considered reliable.
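The thresholding described in the footnote above can be sketched as follows, assuming reliability estimations in [0, 1] and a user-chosen threshold; `to_reliable_classifications` is a hypothetical helper, and `None` stands for “I do not know”.

```python
from typing import Hashable, List, Optional, Sequence


def to_reliable_classifications(
    labels: Sequence[Hashable],
    reliability: Sequence[float],
    threshold: float,
) -> List[Optional[Hashable]]:
    """Turn reliability estimations into reliable classifications.

    A classification is kept (considered reliable) when its reliability
    estimation is strictly above the threshold; otherwise it is replaced
    by None, i.e. "I do not know".
    """
    if len(labels) != len(reliability):
        raise ValueError("labels and reliability must have the same length")
    return [lab if r > threshold else None for lab, r in zip(labels, reliability)]


labels = ["spam", "ham", "spam"]
reliability = [0.92, 0.55, 0.71]
print(to_reliable_classifications(labels, reliability, threshold=0.7))
# ['spam', None, 'spam']
```

Raising the threshold discards more classifications and so lowers coverage. The comparison is strict, following “above” in the footnote, so an estimation exactly equal to the threshold is treated as unreliable.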