Generalizing Subcategorization Frames Acquired from Corpora

Published: 2021-06-08

This paper presents a method of improving the quality of subcategorization frames (SCFs) acquired from corpora in order to augment a lexicon of a lexicalized grammar. We first estimate a confidence value that a word can have each SCF, and create an SCF con

                          Frequency cut-off                 Confidence cut-off 0.03           Centroid cut-off 0.03
SCF            # of SCFs  Precision        Recall           Precision        Recall           Precision        Recall
Tnx0Vnx1       267 (222)  0.959 (212/221)  0.794 (212/267)  0.958 (253/264)  0.948 (253/267)  0.956 (260/272)  0.974 (260/267)
Tnx0Vs1        38 (29)    0.357 (10/28)    0.263 (10/38)    0.381 (8/21)     0.211 (8/38)     0.323 (10/31)    0.263 (10/38)
Tnx0Vnx2nx1    21 (16)    0.105 (6/57)     0.286 (6/21)     0.185 (10/54)    0.476 (10/21)    0.122 (9/74)     0.429 (9/21)
Tnx0Vnx1Pnx2   8 (4)      0.200 (3/15)     0.375 (3/8)      0.200 (2/10)     0.250 (2/8)      0.250 (2/8)      0.250 (2/8)
Tnx0Vnx1pnx2   5 (1)      0.024 (1/41)     0.200 (1/5)      0.029 (1/34)     0.200 (1/5)      na (0/0)         0.000 (0/5)
Tnx0Vplnx1     40 (23)    0.538 (7/13)     0.175 (7/40)     0.667 (6/9)      0.150 (6/40)     0.778 (7/9)      0.175 (7/40)
Tnx0Vpl        20 (0)     na (0/0)         0.000 (0/20)     na (0/0)         0.000 (0/20)     na (0/0)         0.000 (0/20)
Tnx0Vnx1s2     11 (6)     0.083 (1/12)     0.091 (1/11)     0.200 (1/5)      0.091 (1/11)     0.200 (1/5)      0.091 (1/11)
Ts0Vnx1        8 (1)      0.000 (0/2)      0.000 (0/8)      na (0/0)         0.000 (0/8)      na (0/0)         0.000 (0/8)
Tnx0Vax1       2 (1)      0.000 (0/9)      0.000 (0/2)      0.000 (0/3)      0.000 (0/2)      0.000 (0/1)      0.000 (0/2)
Tnx0Vplnx2nx1  1 (0)      0.000 (0/2)      0.000 (0/1)      na (0/0)         0.000 (0/1)      na (0/0)         0.000 (0/1)

Table 2: Precision and recall for 400 SCFs obtained from frequency cut-off, confidence cut-off 0.03, and centroid cut-off 0.03
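
As a reading aid for Table 2, the following minimal sketch recomputes precision and recall from the raw counts in the table, taking precision as correct/acquired and recall as correct/test-lexicon size. The names `GOLD`, `COUNTS`, `ratio`, and `pooled_scores` are illustrative assumptions, not from the paper, and the pooled figures the script prints are derived here from the table rows rather than quoted from the authors.

```python
# Counts copied from Table 2, one entry per SCF type in table order.
# GOLD holds the test-lexicon sizes; COUNTS holds (correct, acquired) pairs.
GOLD = [267, 38, 21, 8, 5, 40, 20, 11, 8, 2, 1]

COUNTS = {
    "frequency": [(212, 221), (10, 28), (6, 57), (3, 15), (1, 41), (7, 13),
                  (0, 0), (1, 12), (0, 2), (0, 9), (0, 2)],
    "confidence": [(253, 264), (8, 21), (10, 54), (2, 10), (1, 34), (6, 9),
                   (0, 0), (1, 5), (0, 0), (0, 3), (0, 0)],
    "centroid": [(260, 272), (10, 31), (9, 74), (2, 8), (0, 0), (7, 9),
                 (0, 0), (1, 5), (0, 0), (0, 1), (0, 0)],
}


def ratio(numerator, denominator):
    """Return numerator/denominator, or None when undefined ("na" in the table)."""
    return numerator / denominator if denominator else None


def pooled_scores(method):
    """Pool the counts over all SCF types and return (precision, recall)."""
    correct = sum(c for c, _ in COUNTS[method])
    acquired = sum(a for _, a in COUNTS[method])
    return ratio(correct, acquired), ratio(correct, sum(GOLD))


if __name__ == "__main__":
    for method in COUNTS:
        precision, recall = pooled_scores(method)
        print(f"{method:10s} precision={precision:.3f} recall={recall:.3f}")
```

For each of the three methods the acquired counts sum to 400, which matches the table caption and serves as a quick consistency check on the reconstructed rows.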

lored lexicon. The centroid cut-off using the lexicon boosted precision and recall compared to the confidence cut-off and the centroid cut-off without the lexicon.

Finally, we investigate the precision and recall of the resulting SCFs for every SCF type in order to evaluate the effect of our method on each SCF. Table 2 shows the precision and recall of the SCFs obtained with the frequency cut-off (threshold for the relative frequency: 0.092), the confidence cut-off 0.03 (threshold for the confidence value: 0.953), and the centroid cut-off 0.03 (threshold for the confidence value: 0.889) [7], where the thresholds for the relative frequency and the confidence value are chosen so that exactly 400 SCFs are preserved. The numbers in parentheses in the "# of SCFs" column show the number of SCFs in the test SCF lexicon that are acquired from the training corpus. The left and right numbers in parentheses in the precision columns show the number of correct SCFs against all SCFs in the resulting SCF lexicon, while those in the recall columns show the number of correct SCFs against all SCFs in the test SCF lexicon.

[7] Since no word takes SCF Tnx0Vpnx1 in the test SCF lexicon, we omit it here.

We can observe that the confidence cut-off and the centroid cut-off preserve more transitive (Tnx0Vnx1) SCFs. This is because some SCFs of Tnx0Vnx1 in the test SCF lexicon are not observed in the training corpus but are predicted by the a priori distribution for SCF Tnx0Vnx1. Also, the centroid cut-off tends to reduce implausible SCFs of Tnx0Vnx1Pnx2 and Tnx0Vax1. Since the threshold for the confidence value of the centroid cut-off 0.03 (0.889) is smaller than that of the confidence cut-off 0.03 (0.953), the clustering could eliminate implausible SCFs without reducing recall.

In short, one reason why the centroid cut-off outperforms the confidence cut-off (or the frequency cut-off) is the way it eliminates SCFs that do not exist in the lexicon. When we eliminate SCFs with low relative frequency under the assumption that they tend to be wrongly acquired, we must also eliminate correct SCFs with low relative frequencies. By using the co-occurrence tendency among SCFs as another criterion for judging the implausibility of SCFs, we can eliminate more wrongly acquired SCFs, because they tend to violate the co-occurrence tendency.

Another reason why the centroid cut-off and the confidence cut-off outperform the frequency cut-off is the way they add new unseen SCFs. We can add plausible SCFs from among those SCFs that are reliable according to their a priori distribution. Furthermore, since the centroid cut-off makes use of the co-occurrence tendency among SCFs, it adds only SCFs that are plausible in terms of corpus-based statistics (the confidence value) under the restriction provided by the co-occurrence tendency among SCFs in the lexicon of the target grammar.
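
The thresholds quoted in this section (0.092 for the relative frequency, 0.953 and 0.889 for the confidence value) are set so that exactly 400 SCFs survive each cut-off. A minimal sketch of that selection step follows, assuming each candidate is a (word, SCF, score) triple and that ties at the boundary are broken by input order. The function name `cutoff_keeping_n` and the toy scores are hypothetical, and the paper's own confidence estimation and clustering are not reproduced here.

```python
from typing import List, Tuple

# (word, SCF name, score); this record layout is an assumption for illustration.
Candidate = Tuple[str, str, float]


def cutoff_keeping_n(candidates: List[Candidate], n: int) -> Tuple[float, List[Candidate]]:
    """Keep the n highest-scoring candidates and return (threshold, kept).

    The threshold is the score of the lowest-ranked survivor. Python's sort
    is stable, so tied candidates keep their input order and exactly n
    entries are returned.
    """
    if not 0 < n <= len(candidates):
        raise ValueError("n must be between 1 and the number of candidates")
    ranked = sorted(candidates, key=lambda c: c[2], reverse=True)
    kept = ranked[:n]
    return kept[-1][2], kept


# Toy scores invented for illustration only (not data from the paper).
toy = [
    ("give", "Tnx0Vnx2nx1", 0.97),
    ("give", "Tnx0Vnx1", 0.99),
    ("give", "Tnx0Vax1", 0.12),
    ("say", "Tnx0Vs1", 0.95),
    ("say", "Tnx0Vnx1Pnx2", 0.30),
]
threshold, kept = cutoff_keeping_n(toy, 3)
print(threshold, [(word, scf) for word, scf, _ in kept])
```

The same routine would serve the frequency cut-off if the score is a relative frequency instead of a confidence value; only the scoring changes, not the selection.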

5 Concluding Remarks and Future Work

In this paper, we presented a novel way to improve the quality of SCFs acquired from corpora in order to augment a lexicalized grammar with them. By applying our method to the acquired SCF lexicon using the XTAG English grammar, we showed that our method improved both precision and recall of the resulting SCFs compared to the naive frequency-based cut-off.

In future work, we are going to investigate the parsing performance of the XTAG English grammar augmented with SCFs obtained by our method. We will also apply our method to lexicalized grammars with relatively small lexicons, e.g., the LinGO English Resource Grammar (Flickinger, 2000).

Acknowledgment

The authors wish to thank Yoshimasa Tsuruoka and Takuya Matsuzaki for their advice on probabilistic modeling of the set of SCFs, and thank Alex Fang for his help in using SCFs acquired from the corpus. The authors are also indebted to Yusuke Miyao, John Carroll and the three anonymous reviewers for their valuable comments on this paper. The first author was supported in part by JSPS Research Fellowships for Young Scientists.
