Meta-classifier approach to reliable text classification(17)

Date: 2026-01-21

A problem with automatic classifiers is that there is no way to know if a particular classification is just a guess or a certain answer. Reliable classification is the task of predicting whether a certain instance is correctly classified or not, i.e., a cl

Remove suffixes (-e, -en)
If (String.length > 1)
    Remove suffixes (-ig, -ing, -baar, -bar, -heid, -etj, -tj)
If (String.length > 1)
    Replace (-opleiding → -opl, administratief → adm)
If (String.length > 1)
    Replace suffixes (-v → -f, -z → -s)
    Replace suffixes (-dd → -d, -ff → -f, -gg → -g, -kk → -k,
                      -ll → -l, -mm → -m, -nn → -n, -pp → -p, -rr → -r,
                      -ss → -s, -tt → -t)

Figure 2.1: Pseudocode of the stemming algorithm.

occurring terms (administratief and opleiding) to their abbreviations (adm. and opl.), which also frequently occur.
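The steps of Figure 2.1 can be sketched in Python as follows. This is an illustrative reading, not the original implementation: the names `stem` and `strip_suffix` are hypothetical, input is lowercased as an assumption, and each length check is assumed to guard only the step or steps that directly follow it.

```python
# Sketch of the stemming steps in Figure 2.1. Helper names, lowercasing, and
# the reading of each length check as a guard on the following step(s) are
# assumptions, not part of the original figure.

REMOVE_FIRST = ("en", "e")
REMOVE_SECOND = ("baar", "heid", "ing", "bar", "etj", "ig", "tj")
FINAL_LETTER = {"v": "f", "z": "s"}
DOUBLED = "dfgklmnprst"  # -dd, -ff, ..., -tt collapse to a single letter


def strip_suffix(word, suffixes):
    """Remove the longest suffix in `suffixes` that `word` ends with."""
    for suffix in sorted(suffixes, key=len, reverse=True):
        if word.endswith(suffix):
            return word[: -len(suffix)]
    return word


def stem(word):
    # Step 1: remove -e / -en (unguarded, as in the figure).
    word = strip_suffix(word.lower(), REMOVE_FIRST)
    # Step 2: remove one derivational or diminutive suffix.
    if len(word) > 1:
        word = strip_suffix(word, REMOVE_SECOND)
    # Step 3: abbreviate the two frequent terms.
    if len(word) > 1:
        if word.endswith("opleiding"):
            word = word[: -len("opleiding")] + "opl"
        word = word.replace("administratief", "adm")
    # Steps 4 and 5: devoice a final v/z, then collapse a doubled final consonant.
    if len(word) > 1:
        if word[-1] in FINAL_LETTER:
            word = word[:-1] + FINAL_LETTER[word[-1]]
        if word[-1] == word[-2] and word[-1] in DOUBLED:
            word = word[:-1]
    return word
```

For example, `stem("werken")` gives `"werk"`, `stem("huizen")` gives `"huis"`, and `stem("bakken")` gives `"bak"`. Under this literal ordering, -ing is already stripped before the abbreviation step runs, so `stem("opleiding")` gives `"opleid"` rather than `"opl"`. The figure does not settle whether the original implementation orders or nests the steps differently.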

The reduction in the number of attributes as a result of stemming is given in Table 2.2. The Education1 and Education2 datasets contain the same attributes except for the class attribute. The results of the feature reduction are therefore the same, and the table shows only one Education row that represents both datasets. For both profession files the reduction is around 10%; for the education files it is 6.4%. In addition to this reduction, an increase in classification performance has been observed. The increase depends on the algorithm and the dataset used and lies between 0.15% and 1.60%.

Dataset       Sample 1   Sample 2   Sample 3   Average
Education       6.40%      6.40%      6.40%     6.40%
Profession1    10.00%     10.14%      9.97%    10.04%
Profession2    10.57%     10.14%     10.99%    10.57%

Table 2.2: Reduction in the number of features as a result of stemming.
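The Average column appears to be the arithmetic mean of the three samples. A short check, with the percentages copied from Table 2.2 and the variable names chosen only for illustration:

```python
# Per-sample feature reduction (in percent) as listed in Table 2.2.
reductions = {
    "Education": [6.40, 6.40, 6.40],
    "Profession1": [10.00, 10.14, 9.97],
    "Profession2": [10.57, 10.14, 10.99],
}

# Mean over the three samples, rounded to two decimals like the table.
averages = {
    name: round(sum(values) / len(values), 2)
    for name, values in reductions.items()
}
```

The resulting values are 6.4, 10.04, and 10.57, which match the Average column.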

2.4 Chapter Summary

The four CBS datasets provide us with challenging text classification tasks, due to the large numbers of records, attributes, and classes. Random samples of 10,000 and 20,000 records are taken from the datasets to reduce the computational complexity. Furthermore, only the informative fields have been selected and feature-reduction techniques have been applied. The feature-reduction techniques include document-frequency thresholding, removal of stop words, and stemming. Together they lead to a feature reduction of approximately 60%. The result of the data preparation is four datasets that are reduced in size and can be used for classification.
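As a minimal sketch of two of the feature-reduction steps named above, the following Python code removes stop words and then applies document-frequency thresholding to tokenized records. The function name `reduce_features`, the threshold `min_df`, and the stop-word set are hypothetical. The source states neither the threshold used nor the order in which the techniques were applied.

```python
from collections import Counter


def reduce_features(documents, stop_words, min_df=2):
    """Drop stop words, then keep only terms found in at least `min_df` documents.

    `documents` is a list of token lists. `min_df` and `stop_words` are
    illustrative parameters, not values taken from the source.
    """
    # Stop-word removal.
    filtered = [[t for t in doc if t not in stop_words] for doc in documents]

    # Document frequency: the number of documents in which each term occurs.
    doc_freq = Counter(term for doc in filtered for term in set(doc))

    # Document-frequency thresholding.
    vocabulary = {term for term, n in doc_freq.items() if n >= min_df}
    reduced = [[t for t in doc if t in vocabulary] for doc in filtered]
    return reduced, vocabulary
```

With three toy records `[["de", "leraar", "basisschool"], ["leraar", "de", "school"], ["timmerman"]]`, the stop-word set `{"de"}`, and `min_df=2`, only `leraar` survives as a feature.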
