Generalizing Subcategorization Frames Acquired from Corpora(5)

Published: 2021-06-08

This paper presents a method of improving the quality of subcategorization frames (SCFs) acquired from corpora in order to augment a lexicon of a lexicalized grammar. We first estimate a confidence value that a word can have each SCF, and create an SCF con

Table 1: Tree families of the XTAG English grammar mapped from 23 out of 163 SCF types

Tree family       Description
Tnx0Vnx1          Transitive
Tnx0Vs1           Sentential complement
Tnx0Vnx2nx1       Ditransitive
Tnx0Vnx1Pnx2      Multiple anchor ditransitive with PP
Tnx0Vnx1pnx2      Ditransitive with PP
Tnx0Vplnx1        Transitive verb particle
Tnx0Vpl           Intransitive verb particle
Tnx0Vnx1s2        Sentential complement with NP
Tnx0Vpnx1         Intransitive with PP
Ts0Vnx1           Transitive sentential subject
Tnx0Vax1          Intransitive with adjective
Tnx0Vplnx2nx1     Ditransitive verb particle

[Figure 4 plot: recall (vertical axis) against precision (horizontal axis), both from 0 to 1; legend entries: confidence cut-off 0.01, confidence cut-off 0.03, confidence cut-off 0.05]

In order to evaluate our method, we split the SCF lexicon of the XTAG English grammar into the training portion and the test portion. The training portion includes 9,427 SCFs for 8,399 words, while the test portion includes 433 SCFs for 280 words. The test portion is selected from the SCF lexicon for words that are observed in the acquired SCF lexicon. We extract SCF confidence-value vectors from the training portion and combine them with the SCF confidence-value vectors obtained from the acquired SCFs. The number of the resulting data objects is 8,679.⁵ We also make use of the SCF confidence-value vectors obtained from the training SCF lexicon as an initial centroid by regarding ε as 0. The total number of them was 35.⁶ We then performed clustering of these 8,679 data objects into 35 clusters.
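The way the two sources of vectors are combined can be sketched as follows. This is only an illustration under stated assumptions: the function name `merge_vectors`, the word-to-vector dictionary layout, and the toy values are hypothetical and do not come from the paper. The two rules it encodes are the ones given in footnote 5: acquired vectors are used only for words included in the XTAG English grammar, and when a word occurs in both lexicons the acquired vector is the one kept.

```python
from typing import Dict, List, Set

Vector = List[float]


def merge_vectors(training: Dict[str, Vector],
                  acquired: Dict[str, Vector],
                  xtag_words: Set[str]) -> Dict[str, Vector]:
    """Combine training-lexicon vectors with acquired vectors (hypothetical helper)."""
    # Start from the vectors derived from the training portion.
    merged = dict(training)
    for word, vector in acquired.items():
        # Acquired vectors are kept only for words the grammar's lexicon contains.
        if word in xtag_words:
            # This overwrites the training vector when the word occurs in both.
            merged[word] = vector
    return merged


# Toy data, invented for illustration only.
training = {"give": [1.0, 0.0, 1.0], "sleep": [0.0, 1.0, 0.0]}
acquired = {"give": [0.9, 0.1, 0.7], "put": [0.2, 0.0, 0.8], "blorp": [0.5, 0.5, 0.5]}
xtag_words = {"give", "sleep", "put"}

data_objects = merge_vectors(training, acquired, xtag_words)
print(sorted(data_objects))
```

Each word contributes exactly one data object, which is consistent with the reported total of 8,679 (8,399 training words plus 280 test words).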

We finally evaluate precision and recall of the resulting SCFs by comparing them with the test SCF lexicon of the XTAG English grammar.
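A minimal sketch of this comparison is given below. It assumes that precision and recall are computed over (word, SCF) pairs pooled across all test words; the text does not say whether the scores are pooled or averaged per word, so this choice, the function name `precision_recall`, and the toy lexicons are all assumptions made for illustration.

```python
from typing import Dict, Set, Tuple

Lexicon = Dict[str, Set[str]]  # word -> set of SCF (tree family) names


def precision_recall(predicted: Lexicon, gold: Lexicon) -> Tuple[float, float]:
    """Pooled precision and recall over the words of the gold (test) lexicon."""
    correct = n_pred = n_gold = 0
    for word, gold_scfs in gold.items():
        pred_scfs = predicted.get(word, set())
        correct += len(pred_scfs & gold_scfs)  # SCFs found in both lexicons
        n_pred += len(pred_scfs)
        n_gold += len(gold_scfs)
    precision = correct / n_pred if n_pred else 0.0
    recall = correct / n_gold if n_gold else 0.0
    return precision, recall


# Toy lexicons, invented for illustration only.
predicted = {"give": {"Tnx0Vnx1", "Tnx0Vnx2nx1", "Tnx0Vs1"}, "sleep": {"Tnx0Vnx1"}}
gold = {"give": {"Tnx0Vnx1", "Tnx0Vnx2nx1"}, "sleep": {"Tnx0Vpl"}}

precision, recall = precision_recall(predicted, gold)
print(precision, recall)
```

In the toy data, two of the four predicted pairs are in the gold lexicon, and two of the three gold pairs are recovered.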

We first compare confidence cut-off with frequency cut-off to investigate effects of Bayesian estimation. Figure 4 shows precision and recall of the resulting SCF sets using confidence cut-off and frequency cut-off. We measured precision and recall of the SCF sets obtained using confidence cut-off whose recognition threshold t = 0.01 (confidence cut-off 0.01), 0.03 (confidence cut-off 0.03), and 0.05 (confidence cut-off 0.05) by varying threshold for the confidence value from 0 to 1. We also measured those for the SCF sets obtained using frequency cut-off by varying threshold for the relative frequency from 0 to 1. The graph apparently indicates that the confidence cut-offs outperformed the frequency cut-off. When we

⁵ We used the SCF confidence-value vectors for words which

Figure 4: Precision and recall of the resulting SCFs using confidence cut-off and frequency cut-off

[Figure 5 plot: recall (vertical axis) against precision (horizontal axis), both from 0 to 1; legend entries: centroid cut-off 0.03, centroid cut-off 0.03*]

Figure 5: Precision and recall of the resulting SCFs using centroid cut-off and confidence cut-off

compare confidence cut-offs with different recognition thresholds, we can improve precision using higher recognition threshold while we can improve recall using lower recognition threshold. This result is quite consistent with our expectations.
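The threshold sweep behind such precision-recall curves can be sketched as follows. The sketch takes the per-SCF scores as given (confidence values for the confidence cut-off, relative frequencies for the frequency cut-off) and does not reproduce the Bayesian estimation or the recognition threshold t. The names `cut_off` and `pr_curve`, the use of `>=` at the threshold, the pooled scoring, and the toy numbers are assumptions for illustration.

```python
from typing import Dict, List, Set, Tuple

Scores = Dict[str, Dict[str, float]]  # word -> SCF name -> score in [0, 1]


def cut_off(scores: Scores, threshold: float) -> Dict[str, Set[str]]:
    """Keep, for each word, the SCFs whose score reaches the threshold."""
    return {word: {scf for scf, value in by_scf.items() if value >= threshold}
            for word, by_scf in scores.items()}


def pr_curve(scores: Scores, gold: Dict[str, Set[str]],
             thresholds: List[float]) -> List[Tuple[float, float, float]]:
    """Return (threshold, precision, recall) for each threshold."""
    n_gold = sum(len(scfs) for scfs in gold.values())
    curve = []
    for t in thresholds:
        selected = cut_off(scores, t)
        correct = sum(len(selected.get(w, set()) & g) for w, g in gold.items())
        n_pred = sum(len(selected.get(w, set())) for w in gold)
        precision = correct / n_pred if n_pred else 0.0
        recall = correct / n_gold if n_gold else 0.0
        curve.append((t, precision, recall))
    return curve


# Toy scores and gold lexicon, invented for illustration only.
scores = {"give": {"Tnx0Vnx1": 0.9, "Tnx0Vnx2nx1": 0.6, "Tnx0Vs1": 0.1}}
gold = {"give": {"Tnx0Vnx1", "Tnx0Vnx2nx1"}}

curve = pr_curve(scores, gold, [0.0, 0.5, 0.8])
for t, p, r in curve:
    print(f"threshold={t:.1f} precision={p:.2f} recall={r:.2f}")
```

Running the same sweep once on confidence values and once on relative frequencies yields the two kinds of curves compared in Figure 4.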

We then compare centroid cut-off with confidence cut-off to observe effects of clustering using information in the lexicon of the XTAG English grammar. Figure 5 shows precision and recall of the resulting SCF sets using centroid cut-off and confidence cut-off with the recognition threshold t = 0.03 by varying the threshold for the confidence value. In order to show the effects of information of the training SCF lexicon, centroid cut-off 0.03* is SCFs obtained by clustering of SCF confidence-value vectors in the acquired SCFs only with random initialization. The graph apparently shows that clustering is meaningful only when we make use of the reliable SCF confidence-value vectors obtained from the manually tai-

are included in the XTAG English grammar. When both the training SCF lexicon and the acquired SCF lexicon have the same words, we simply used an SCF confidence-value vector obtained from the acquired SCF lexicon.

⁶ We used the SCF confidence-value vectors that appear with more than two words.
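The selection rule in footnote 6 can be illustrated with a short sketch. It assumes that each training-lexicon vector is a tuple of 0/1 entries, one per SCF type (one reading of "regarding ε as 0"), and that a vector qualifies as an initial centroid when more than two words share it. The function name `initial_centroids` and the toy lexicon are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

BinaryVector = Tuple[int, ...]  # assumed 0/1 entry per SCF type


def initial_centroids(training: Dict[str, BinaryVector],
                      min_words: int = 3) -> List[BinaryVector]:
    """Return the vectors shared by at least `min_words` words (more than two)."""
    counts = Counter(training.values())
    return sorted(vector for vector, n in counts.items() if n >= min_words)


# Toy training lexicon, invented for illustration only.
training = {
    "eat": (1, 0, 0),
    "see": (1, 0, 0),
    "hit": (1, 0, 0),
    "give": (1, 1, 0),
    "send": (1, 1, 0),
}

centroids = initial_centroids(training)
print(centroids)
```

Here only the vector shared by three words is retained; the one shared by two words is dropped.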
