Meta-classifier approach to reliable text classification
Date: 2026-01-21
A problem with automatic classifiers is that there is no way to know whether a particular classification is just a guess or a certain answer. Reliable classification is the task of predicting whether a given instance is correctly classified or not, i.e., a cl…
Digit      1      2         3-4    5-6       7           8      9
Subclass   Level  Sublevel  Field  Subfield  Serial nr.  Track  Phase
           |---------- SOI code ----------|
Example    6      0         31     50        1           NVT    NVT
Table 3.1: Subclasses in the education SOI+code.
The class attribute can be divided into either four or seven subclasses. Together, the first four of the seven subclasses are equal to the first of the four subclasses. Table 3.1 illustrates the division of the SOI+code into subclasses. The first row of the table lists the digit positions in the SOI+code, the second row shows the subclasses, and the third row shows the division of the code 603150-1-NVT-NVT into seven subclasses. The subclass values can be derived directly from the original class value: each subclass value consists of one or more digits of the original class value.
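As a sketch of how the subclass values can be read off the class value, the Python snippet below splits an SOI+code string into the four and the seven subclasses and merges the seven back into the original code. It assumes, as our reading of Table 3.1, that the code is written as a six-digit SOI code followed by serial number, track and phase separated by hyphens (as in 603150-1-NVT-NVT), with Level = digit 1, Sublevel = digit 2, Field = digits 3-4 and Subfield = digits 5-6. The function names and dictionary keys are hypothetical.

```python
from typing import Dict

# Assumed digit layout of the six-digit SOI code (0-based slice bounds):
# Level = digit 1, Sublevel = digit 2, Field = digits 3-4, Subfield = digits 5-6.
SOI_SLICES = {
    "level": (0, 1),
    "sublevel": (1, 2),
    "field": (2, 4),
    "subfield": (4, 6),
}


def split_four(code: str) -> Dict[str, str]:
    """Split a code such as '603150-1-NVT-NVT' into the four subclasses."""
    soi_code, serial_nr, track, phase = code.split("-")
    return {"soi_code": soi_code, "serial_nr": serial_nr, "track": track, "phase": phase}


def split_seven(code: str) -> Dict[str, str]:
    """Split the same code into the seven subclasses."""
    four = split_four(code)
    seven = {name: four["soi_code"][a:b] for name, (a, b) in SOI_SLICES.items()}
    # The last three subclasses are shared by both divisions.
    seven.update(serial_nr=four["serial_nr"], track=four["track"], phase=four["phase"])
    return seven


def merge_seven(parts: Dict[str, str]) -> str:
    """Merge seven subclass values back into the original class value."""
    soi_code = "".join(parts[name] for name in SOI_SLICES)
    return "-".join([soi_code, parts["serial_nr"], parts["track"], parts["phase"]])


print(split_seven("603150-1-NVT-NVT"))
# {'level': '6', 'sublevel': '0', 'field': '31', 'subfield': '50', 'serial_nr': '1', 'track': 'NVT', 'phase': 'NVT'}
```

Concatenating the first four of the seven values reproduces the SOI code, which is exactly the first of the four subclasses.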
The hierarchical classifier classifies an instance in multiple steps. In each step, one subclass is classified, and its predicted value is then added to the list of attributes used to classify the next subclass. This process continues until all subclass values have been classified. Finally, all subclass values are merged to create the final class value.
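The step-by-step procedure can be sketched as a loop in which each predicted subclass value is added to the attributes before the next subclass is classified. This is a minimal sketch, not the thesis's implementation: the subclass classifiers are abstracted as plain callables, and the function name `classify_hierarchically`, the `merge` callback and the rule-based toy classifiers in the usage lines are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Attributes = Dict[str, str]
SubclassClassifier = Callable[[Attributes], str]


def classify_hierarchically(
    instance: Attributes,
    steps: List[Tuple[str, SubclassClassifier]],
    merge: Callable[[Dict[str, str]], str],
) -> str:
    """Classify one subclass per step, feeding each prediction into later steps."""
    attributes = dict(instance)  # copy, so the caller's instance is left unchanged
    predicted: Dict[str, str] = {}
    for subclass, classifier in steps:
        value = classifier(attributes)
        predicted[subclass] = value
        attributes[subclass] = value  # extra attribute for the next subclass
    return merge(predicted)  # combine all subclass values into the final class value


# Toy usage with hypothetical rule-based "classifiers":
steps = [
    ("level", lambda a: "6" if "university" in a["text"] else "2"),
    ("sublevel", lambda a: "0" if a["level"] == "6" else "1"),
]
print(classify_hierarchically({"text": "university degree"}, steps,
                              lambda p: p["level"] + p["sublevel"]))  # prints 60
```

In a real setting each callable would wrap a trained NN or NB model; the loop itself only fixes the order of the steps and the way earlier predictions become inputs.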
The subclass that is easiest to predict is predicted first. In most hierarchical classifiers this is the subclass at the top of the hierarchy, which represents the most general concept. In our case, however, there is no strict hierarchy. To determine which subclass should be predicted first, we train a classifier for each possible subclass, and the subclass with the highest classification accuracy is classified first. The classification order of the remaining subclasses is determined in the same way.
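The ordering rule amounts to a greedy selection: at each step, pick the not-yet-ordered subclass whose classifier achieves the highest accuracy. In the sketch below, `accuracy(subclass, already_ordered)` is a hypothetical callback standing in for training and evaluating a classifier for that subclass; the text does not say whether the previously chosen subclasses are used as attributes during this evaluation, so the callback merely receives them and may ignore them.

```python
from typing import Callable, List, Sequence


def greedy_classification_order(
    subclasses: Sequence[str],
    accuracy: Callable[[str, List[str]], float],
) -> List[str]:
    """Order subclasses by repeatedly picking the one with the highest accuracy."""
    remaining = list(subclasses)
    order: List[str] = []
    while remaining:
        # Score every remaining subclass given the subclasses chosen so far.
        best = max(remaining, key=lambda s: accuracy(s, order))
        order.append(best)
        remaining.remove(best)
    return order


# Toy usage with made-up accuracies (illustrative numbers, not results):
toy = {"level": 0.9, "sublevel": 0.7, "field": 0.8}
print(greedy_classification_order(list(toy), lambda s, prior: toy[s]))
# ['level', 'field', 'sublevel']
```

Passing the already-ordered subclasses to the callback leaves room for the accuracy of a later subclass to depend on which subclasses have been predicted before it.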
3.3 Experiments with the Text Classifiers
This section describes the experiments carried out to assess the performance of the different classifiers. Besides the experiments that measure the classification accuracy of the different classifiers, some additional experiments have been carried out to determine the optimal parameter settings for the hierarchical classifier.
3.3.1 Experimental Setting and Methodology
The text classifiers are tested on the Profession1, Profession2, Education1, and Education2 datasets. A nearest neighbour classifier (NN) and a naïve Bayes classifier (NB) are trained on each dataset, each in combination with the tf and the tfidf text representation. To reduce the computational requirements, the datasets have been sampled without replacement: three samples of roughly 20,000 instances are taken from each dataset. Each classifier is trained on each sample, and the results are averaged over the three samples. We measure the classification accuracy, i.e., the percentage of correctly classified records.
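A minimal sketch of this evaluation protocol: draw three samples without replacement, evaluate a classifier on each, and average the accuracies. The `evaluate` callback is hypothetical and stands for training and testing one classifier/representation combination (for example NN with tf) on a sample. The text does not say how each sample is split into training and test data, or whether the three samples are disjoint, so this sketch leaves the split to `evaluate` and draws the samples independently.

```python
import random
from statistics import mean
from typing import Callable, Hashable, List, Sequence


def accuracy(predicted: Sequence[Hashable], actual: Sequence[Hashable]) -> float:
    """Percentage of correctly classified records."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * correct / len(actual)


def averaged_accuracy(
    dataset: Sequence,
    evaluate: Callable[[List], float],
    n_samples: int = 3,
    sample_size: int = 20_000,
    seed: int = 0,
) -> float:
    """Average the accuracy returned by `evaluate` over several random samples."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_samples):
        # random.sample draws without replacement: no instance appears twice in a sample.
        sample = rng.sample(list(dataset), min(sample_size, len(dataset)))
        scores.append(evaluate(sample))
    return mean(scores)


print(accuracy(["a", "b", "c", "a"], ["a", "b", "a", "a"]))  # 75.0
```

The fixed `seed` only makes the sketch reproducible; the number of samples and the sample size default to the values given in the text.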
To test the hierarchical classifier, a sample from the Education1 dataset of …