Meta-classifier approach to reliable text classification
Date: 2026-01-21
Abstract
A problem with automatic classifiers is that there is no way to know if a
particular classification is just a guess or a certain answer. Reliable
classification is the task of predicting whether a certain instance is
correctly classified or not, i.e., a classification is classified as either
reliable or unreliable. When the classification is classified as unreliable,
it is like saying "I do not know", and the instance does not receive a
classification.
Given a base classifier, the meta-classifier approach is to train a
meta-classifier that predicts the correctness of each classification of the
base classifier. The classification rule of the meta-classifier approach is to
assign a class predicted by the base classifier to an instance if the
meta-classifier decides that the base classification is reliable.
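The classification rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the keyword-based `base_predict`, the majority-vote meta-classifier in `train_meta`, and the toy training texts are all hypothetical stand-ins, not the classifiers or data used in this research. The sketch shows the two essential steps: deriving the meta-level target (was the base classification correct?) and returning the base class only when the meta-classifier accepts it.

```python
from collections import defaultdict

def base_predict(text):
    # Hypothetical toy base classifier: a keyword rule standing in for a trained model.
    return "sport" if "match" in text.lower() else "economy"

def meta_labels(texts, labels):
    # Meta-level training target: True where the base classification was correct.
    return [base_predict(x) == y for x, y in zip(texts, labels)]

def train_meta(texts, labels):
    # Toy meta-classifier (an assumption, not the method of the thesis):
    # a predicted class counts as reliable if the base classifier was right
    # more often than wrong when it predicted that class on the training data.
    counts = defaultdict(lambda: [0, 0])  # predicted class -> [wrong, right]
    for x, ok in zip(texts, meta_labels(texts, labels)):
        counts[base_predict(x)][int(ok)] += 1
    accepted = {c for c, (wrong, right) in counts.items() if right > wrong}
    return lambda text: base_predict(text) in accepted

def reliable_classify(text, meta_predict):
    # Classification rule: keep the base class only if the meta-classifier
    # judges it reliable; None plays the role of "I do not know".
    return base_predict(text) if meta_predict(text) else None

train_texts = [
    "The match ended in a draw",
    "Match report: home win",
    "Inflation rose again",
    "The striker scored twice",
    "The team lost at home",
]
train_labels = ["sport", "sport", "economy", "sport", "sport"]

meta_predict = train_meta(train_texts, train_labels)
print(reliable_classify("A thrilling match", meta_predict))
print(reliable_classify("Prices are stable", meta_predict))
```

In this toy setting the base classifier is often wrong when it predicts "economy", so the meta-classifier rejects those predictions and the second call yields no classification.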
The meta-classifier approach is applied on text classification tasks provided
by the CBS to answer the following problem statement:
Does the meta-classifier approach provide a practical solution to reliable
text classification?
The first part of the research studies text classifiers, and provides an answer
to the research question:
1. Which text classifiers achieve high accuracy and at the same time have
small space and time complexity?
Experiments on the CBS data sets show that the nearest neighbour and the
naïve Bayes algorithm in combination with the tfidf text representation are
acceptable text classifiers.
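To make the combination of a tfidf representation with a nearest neighbour classifier concrete, here is a minimal sketch using only the Python standard library. Several details are assumptions, since the abstract does not specify them: the weighting is raw term frequency times log(N / document frequency), similarity is the cosine between sparse vectors, tokenization is whitespace splitting, and the three example documents are invented. Naïve Bayes is not shown.

```python
import math
from collections import Counter

def tokenize(text):
    # Assumed tokenization: lowercase and split on whitespace.
    return text.lower().split()

def fit_idf(docs):
    # Inverse document frequency: log(N / number of documents containing the term).
    n = len(docs)
    df = Counter(term for doc in docs for term in set(tokenize(doc)))
    return {term: math.log(n / count) for term, count in df.items()}

def tfidf(text, idf):
    # Sparse tfidf vector; terms unseen during fitting are dropped.
    tf = Counter(tokenize(text))
    return {t: f * idf[t] for t, f in tf.items() if t in idf}

def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def nearest_neighbour(text, train_docs, train_labels, idf):
    # 1-nearest-neighbour: return the label of the most similar training document.
    query = tfidf(text, idf)
    sims = [cosine(query, tfidf(doc, idf)) for doc in train_docs]
    best = max(range(len(sims)), key=sims.__getitem__)
    return train_labels[best]

docs = [
    "the team won the match",
    "the bank raised interest rates",
    "the coach praised the team",
]
labels = ["sport", "economy", "sport"]
idf = fit_idf(docs)

print(nearest_neighbour("interest rates fell", docs, labels, idf))
print(nearest_neighbour("the team played", docs, labels, idf))
```

A term such as "the" that occurs in every document gets weight zero, so only the more discriminative terms influence which neighbour is nearest.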
The second part of the research studies the meta-classifier approach to
provide an answer to the second and third research question. Our second
research question is:
2. What type of meta data representation is best suited for reliable text
classification?
The meta-classifier is trained on several types of meta data representations.
The used meta data representations include the original instances, the
probability distribution of the base classifier and a set of basic statistics
about the classification of the base classifier. For tasks with many classes,
the original instances representation is best. For tasks with a small number
of classes, the original instances representation and the probability
distribution are both good candidates.
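The three kinds of meta data representation can be illustrated as three functions that map one base-level instance to the input vector of the meta-classifier. This is a sketch under assumptions: the function names are hypothetical, and the abstract does not say which basic statistics are used, so the three chosen here (highest class probability, margin between the two most probable classes, and entropy of the distribution) are illustrative guesses only.

```python
import math

def original_instance(features, probs):
    # Representation 1: the meta-classifier sees the same features as the base classifier.
    return list(features)

def probability_distribution(features, probs):
    # Representation 2: the class probability distribution output by the base classifier.
    return list(probs)

def basic_statistics(features, probs):
    # Representation 3 (statistics chosen for illustration, not taken from the source):
    # highest probability, margin over the runner-up, and entropy of the distribution.
    ranked = sorted(probs, reverse=True)
    top = ranked[0]
    margin = top - ranked[1] if len(ranked) > 1 else top
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return [top, margin, entropy]

# Hypothetical instance: a tfidf-like feature vector and a base classifier
# distribution over three classes.
features = [0.0, 1.2, 0.4]
probs = [0.7, 0.2, 0.1]

for represent in (original_instance, probability_distribution, basic_statistics):
    print(represent.__name__, represent(features, probs))
```

One property worth noting is that the size of the probability distribution grows with the number of classes, whereas the other two representations have a size that does not depend on the number of classes.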