Meta-classifier approach to reliable text classification(21)

Date: 2026-01-21

A problem with automatic classifiers is that there is no way to know if a particular classification is just a guess or a certain answer. Reliable classification is the task of predicting whether a certain instance is correctly classified or not, i.e., a cl

Digit      1      2         3-4    5-6       7           8      9
Subclass   Level  Sublevel  Field  Subfield  Serial nr.  Track  Phase
Example    6      0         31     50        1           NVT    NVT

Digits 1-6 (Level, Sublevel, Field, and Subfield) together form the SOI code.

Table 3.1: Subclasses in the education SOI+ code.

The class attribute can be divided into four or seven subclasses. The first four subclasses of the seven-subclass division together equal the first subclass of the four-subclass division. Table 3.1 illustrates the division of the SOI+ code into subclasses. The first row of the table lists the digits of the SOI+ code. The second row shows the subclasses, and the third row shows the division of the code 603150-1-NVT-NVT into seven subclasses. The values of the subclasses can be derived directly from the original class value, i.e., each subclass value consists of one or more digits from the original class value.
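The derivation of subclass values from a class value can be sketched in a few lines. This is a minimal illustration, assuming every code has the hyphen-separated shape of the example 603150-1-NVT-NVT and the digit positions of Table 3.1; the function names and dictionary keys are hypothetical and not taken from the original work.

```python
from typing import Dict


def split_seven(code: str) -> Dict[str, str]:
    """Split a code such as '603150-1-NVT-NVT' into seven subclass values.

    Assumption: the code always consists of a six-digit part followed by
    three hyphen-separated parts, as in the single example in the text.
    """
    soi, serial, track, phase = code.split("-")
    if len(soi) != 6:
        raise ValueError(f"expected a six-digit first part, got {soi!r}")
    return {
        "level": soi[0],       # digit 1
        "sublevel": soi[1],    # digit 2
        "field": soi[2:4],     # digits 3-4
        "subfield": soi[4:6],  # digits 5-6
        "serial_nr": serial,   # digit 7
        "track": track,        # digit 8
        "phase": phase,        # digit 9
    }


def split_four(code: str) -> Dict[str, str]:
    """Split the same code into four subclasses.

    The first subclass is the concatenation of the first four subclasses
    of the seven-subclass division.
    """
    seven = split_seven(code)
    soi_code = seven["level"] + seven["sublevel"] + seven["field"] + seven["subfield"]
    return {
        "soi_code": soi_code,
        "serial_nr": seven["serial_nr"],
        "track": seven["track"],
        "phase": seven["phase"],
    }


if __name__ == "__main__":
    print(split_seven("603150-1-NVT-NVT"))
    print(split_four("603150-1-NVT-NVT"))
```

Because each subclass value is only a slice of the original class value, both divisions can be computed from the labelled data without any extra annotation.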

The hierarchical classifier classifies an instance in multiple steps. In each step one subclass is classified. The value of the subclass is subsequently added to the list of attributes that is used to classify the next subclass. This process continues until all subclass values have been classified. All the subclass values are then merged to create the final class value.
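The stepwise procedure can be sketched as a chain of classifiers. The code below is an illustration only: `LookupClassifier` is a toy stand-in rather than the NN or NB classifier used in the experiments, all names are hypothetical, and training each step on the true values of the earlier subclasses is an assumption, since the text does not say whether true or predicted values are used during training.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Sequence


class LookupClassifier:
    """Toy stand-in for a real classifier (hypothetical, not NN or NB).

    It memorises the most frequent label for each exact attribute tuple and
    falls back to the overall majority label for unseen tuples.
    """

    def fit(self, rows: Sequence[Sequence[str]], labels: Sequence[str]) -> "LookupClassifier":
        self.votes = defaultdict(Counter)
        self.default = Counter(labels).most_common(1)[0][0]
        for row, label in zip(rows, labels):
            self.votes[tuple(row)][label] += 1
        return self

    def predict_one(self, row: Sequence[str]) -> str:
        counts = self.votes.get(tuple(row))
        return counts.most_common(1)[0][0] if counts else self.default


def train_chain(rows: Sequence[Sequence[str]],
                targets: Sequence[Dict[str, str]],
                order: Sequence[str]) -> List[LookupClassifier]:
    """Train one classifier per subclass, in the given order.

    Assumption: the classifier for step k sees the original attributes plus
    the TRUE values of the subclasses handled in steps 0..k-1.
    """
    models = []
    for step, name in enumerate(order):
        extended = [list(row) + [target[prev] for prev in order[:step]]
                    for row, target in zip(rows, targets)]
        labels = [target[name] for target in targets]
        models.append(LookupClassifier().fit(extended, labels))
    return models


def predict_chain(models: Sequence[LookupClassifier],
                  order: Sequence[str],
                  row: Sequence[str]) -> Dict[str, str]:
    """Classify one instance step by step, feeding each prediction forward."""
    attributes = list(row)
    predicted = {}
    for name, model in zip(order, models):
        value = model.predict_one(attributes)
        predicted[name] = value
        attributes.append(value)  # the predicted subclass becomes an attribute
    return predicted


def merge(predicted: Dict[str, str], layout: Sequence[str], sep: str = "-") -> str:
    """Merge subclass values into one class value (separator is an assumption)."""
    return sep.join(predicted[name] for name in layout)
```

Note that `order` (the order of classification) and `layout` (the order in which values appear in the final code) are kept separate, because the subclass predicted first need not be the leftmost part of the code.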

The subclass that is easiest to predict is predicted first. In most hierarchical classifiers this is the subclass at the top of the hierarchy, which represents the most general concept. In our case we do not have a strict hierarchy. To determine which subclass should be predicted first, we train a classifier for each possible subclass, and the subclass with the highest classification accuracy is classified first. In the same way we determine the classification order of the remaining subclasses.
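One way to read this ordering procedure is as a greedy selection loop. The sketch below assumes that, in each round, every remaining subclass is re-evaluated with the already chosen subclasses available as attributes; the text only says "in the same way", so this reading is an assumption. The `accuracy` callback is hypothetical and stands for training and evaluating a classifier for one subclass.

```python
from typing import Callable, List, Sequence


def greedy_order(subclasses: Sequence[str],
                 accuracy: Callable[[str, List[str]], float]) -> List[str]:
    """Return the subclasses in the order in which they would be classified.

    accuracy(name, chosen) is a hypothetical callback: it should train a
    classifier for subclass `name`, given the subclasses in `chosen` as
    extra attributes, and return its classification accuracy.
    """
    remaining = list(subclasses)
    order: List[str] = []
    while remaining:
        # Pick the remaining subclass that is currently easiest to predict.
        best = max(remaining, key=lambda name: accuracy(name, order))
        order.append(best)
        remaining.remove(best)
    return order


if __name__ == "__main__":
    # Made-up accuracies, purely to show the call; not results from the text.
    made_up = {"soi_code": 0.60, "serial_nr": 0.90, "track": 0.95, "phase": 0.97}
    print(greedy_order(list(made_up), lambda name, chosen: made_up[name]))
```

For k subclasses this trains k + (k-1) + ... + 1 classifiers, which stays small for the four or seven subclasses considered here.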

3.3 Experiments with the Text Classifiers

This section describes the experiments that have been carried out in order to assess the performance of the different classifiers. Besides the experiments that measure the classification accuracy of the different classifiers, some additional experiments have been carried out to derive the optimal setting of parameters for the hierarchical classifier.

3.3.1 Experimental Setting and Methodology

The text classifiers are tested on the Profession 1, the Profession 2, the Education 1, and the Education 2 datasets. A nearest neighbour classifier (NN) and a naïve Bayes classifier (NB) are trained on each dataset in combination with the tf and tfidf text representations. The datasets have been sampled without replacement to reduce the computational requirements. Three samples of roughly 20,000 instances are taken for each dataset. Each classifier is trained for each sample and then the results are averaged. We measure the classification accuracy, i.e., the percentage of correctly classified records.
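The sampling and averaging protocol can be sketched as follows. This is a minimal illustration under stated assumptions: each sample is drawn without replacement, but the three samples are drawn independently and may overlap each other (the text does not say whether they are disjoint), and `run_experiment` is a hypothetical callback that trains and evaluates one classifier on one sample.

```python
import random
from statistics import mean
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def accuracy(predicted: Sequence[str], actual: Sequence[str]) -> float:
    """Classification accuracy: percentage of correctly classified records."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("cannot compute accuracy on zero records")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * correct / len(actual)


def draw_samples(records: Sequence[T],
                 n_samples: int = 3,
                 size: int = 20_000,
                 seed: int = 0) -> List[List[T]]:
    """Draw n_samples samples, each without replacement.

    Assumption: samples are independent of each other, so a record may
    appear in more than one sample but never twice within one sample.
    """
    rng = random.Random(seed)
    size = min(size, len(records))
    return [rng.sample(list(records), size) for _ in range(n_samples)]


def averaged_accuracy(samples: Sequence[Sequence[T]],
                      run_experiment: Callable[[Sequence[T]], float]) -> float:
    """Run one experiment per sample and average the resulting accuracies."""
    return mean(run_experiment(sample) for sample in samples)
```

Fixing the seed makes the samples reproducible, so that different classifier and representation combinations can be compared on identical data.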

To test the hierarchical classifier a sample from the Education 1 dataset of ...