Meta-classifier approach to reliable text classification(11)
Date: 2026-01-21
A problem with automatic classifiers is that there is no way to know if a particular classification is just a guess or a certain answer. Reliable classification is the task of predicting whether a certain instance is correctly classified or not, i.e., a cl
1.3. PROBLEM STATEMENT AND RESEARCH QUESTIONS
... statistically sound and computationally efficient. The four different approaches to
reliable classification that we discussed in Section 1.2 all have shortcomings. The
Bayesian framework is too dependent on assumptions about the prior distribution
when applied in practice for reliable classification. The typicalness framework,
the transduction framework, and Version-Space Support-Vector Machines are
all theoretically sound, but they are computationally expensive. The meta-classifier
approach is the only candidate for our study. By construction,
1. the approach is sound as long as the meta-classifier is accurate. The
accuracy of a classifier is the proportion of correctly classified instances.
2. the approach is efficient as long as the base classifier and the meta-classifier
are efficient.
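The two notions used above, the accuracy of a classifier and a meta-classifier that judges whether the base classifier is correct on a given instance, can be illustrated with a minimal sketch. It is not the implementation used in this study. The function names, the toy base classifier, and the toy meta-classifier rule are all hypothetical and chosen only for illustration.

```python
from typing import Callable, List, Sequence, Tuple


def accuracy(predicted: Sequence[str], actual: Sequence[str]) -> float:
    """Proportion of correctly classified instances."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


def meta_labels(predicted: Sequence[str], actual: Sequence[str]) -> List[int]:
    """Targets for a meta-classifier: 1 if the base classifier was right, else 0."""
    return [int(p == a) for p, a in zip(predicted, actual)]


def reliable_predict(
    base: Callable[[str], str],
    meta: Callable[[str, str], bool],
    text: str,
) -> Tuple[str, bool]:
    """Return the base classifier's label and the meta-classifier's verdict on it."""
    label = base(text)
    return label, meta(text, label)


# Hypothetical toy models, for illustration only.
def toy_base(text: str) -> str:
    # Assumption: a keyword rule stands in for a trained text classifier.
    return "sports" if "match" in text else "economy"


def toy_meta(text: str, label: str) -> bool:
    # Assumption: trust the base classifier only on texts of three or more words.
    return len(text.split()) >= 3


predicted = ["sports", "economy", "sports", "economy"]
actual = ["sports", "politics", "sports", "economy"]

print(accuracy(predicted, actual))     # 0.75
print(meta_labels(predicted, actual))  # [1, 0, 1, 1]
print(reliable_predict(toy_base, toy_meta, "the match ended late"))
```

In this sketch the output of meta_labels is what a meta-classifier would be trained to predict, and the accuracy function applies equally to the base classifier and to the meta-classifier itself.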
Our study is partly motivated by the fact that the meta-classifier approach
was not exhaustively studied. Existing studies on meta-classifiers focused on
the global approach or on application in the context of ensembles only [Seewald,
2003, Smirnov et al., 2003a, Delany et al., 2004]. The theoretical framework of
the meta-classifier approach and the local meta-classifier approach have never
been analysed in detail for a real-world application. Our aim is to investigate
the meta-classifier approach in the context of text classification. Hence, our
problem statement reads as follows:
    Does the meta-classifier approach provide a practical solution to
    reliable text classification?
To answer our problem statement we apply the meta-classifier approach to
real text classification tasks and datasets provided by the Centraal Bureau voor
de Statistiek (CBS). The datasets match the scale of the problem we would like
to tackle: the datasets are large, of high dimensionality, and have a large
number of classes.
In order to address the problem statement, three research questions have
been formulated. To solve the text classification tasks of the CBS we first have
to train text classifiers, and then apply the meta-classifier approach. Therefore
our research consists of two parts.
The first part of the research studies the applicability of different text
classifiers for the CBS data, leading to the first research question:
1. Which text classifiers achieve high accuracy and at the same time have
small space and time complexity?
The second and main part of the research investigates different meta-classifier
approaches applied to the text classifiers trained on the CBS data. The second
and third research questions read as follows.
2. What type of metadata representation is best suited for reliable text
classification?
3. Should the meta-classifier be local or global?
These questions aim at studying two main features of the meta-classifier, viz. the
type of metadata representation and its nature (i.e., local or global). The
meta-classifiers are evaluated on the text classification tasks provided by the CBS.
Different text classifiers and meta-classifiers are combined in order to see which
combination performs best.