Generalizing Subcategorization Frames Acquired from Corpora(2)

Published: 2021-06-08

This paper presents a method of improving the quality of subcategorization frames (SCFs) acquired from corpora in order to augment a lexicon of a lexicalized grammar. We first estimate a confidence value that a word can have each SCF, and create an SCF con

(#S(EPATTERN :TARGET |ftp|
 :SUBCAT (VSUBCAT NONE) :CLASSES (22 2985) :RELIABILITY 0
 :FREQSCORE 0.01640195 :FREQCNT 2
 :TLTL (VVD VV0)
 :SLTL (((|ssh| NN1))) :OLT1L NIL :OLT2L NIL
 :OLT3L NIL :LRL 0))

Figure 1: An acquired SCF for a verb “ftp”

... the lexicon of the XTAG English grammar, and then compared the results with those obtained by naive frequency cut-off.

Figure 2: Probability distributions of SCFs for apply

2 Background

2.1 Acquisition of SCFs for Lexicalized Grammars

We start by acquiring SCFs for a lexicalized grammar from corpora by the method described in (Carroll and Fang, 2004).

In their study, they first acquire fine-grained SCFs by the method proposed by (Briscoe and Carroll, 1997; Korhonen, 2002). Figure 1 shows an example of one acquired SCF entry for a verb “ftp.” Each acquired SCF entry has several fields about the observed SCF. We explain here only the portion related to this study. The TARGET field is a word stem (|ftp| in Figure 1), the first number in the CLASSES field indicates an SCF ID (22 in Figure 1), and FREQCNT shows how often words derivable from the word stem had the SCF identified by the SCF ID in the training corpus (2 times in Figure 1). The obtained SCFs comprise a total of 163 types of relatively fine-grained SCFs, which are originally based on the SCFs in the ANLT (Boguraev and Briscoe, 1987) and COMLEX (Grishman et al., 1994) dictionaries. In this example, the SCF ID 22 corresponds to the SCF of an intransitive verb.

They then obtain SCFs for the target lexicalized grammar (the LINGO English Resource Grammar (Flickinger, 2000) in their study) by using a handcrafted translation map that maps each of these 163 types to one of the SCF types in the target grammar. They report a coverage improvement of 4.5 percentage points (52.7% to 57.2%), with parsing time more than doubling (9.78 sec. to 21.78 sec.).
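The fields described above can be read off an entry mechanically. The following is a minimal sketch, not the authors' tooling: it assumes the whitespace of the Figure 1 entry has been restored as shown in `ENTRY`, and the names `parse_entry`, `translate`, and `TRANSLATION_MAP` are hypothetical. The real translation map is handcrafted and is not given in the text, so the single mapping below is only a placeholder.

```python
import re

# Figure 1 entry with whitespace restored (an assumption about the original layout).
ENTRY = """#S(EPATTERN :TARGET |ftp|
 :SUBCAT (VSUBCAT NONE) :CLASSES (22 2985) :RELIABILITY 0
 :FREQSCORE 0.01640195 :FREQCNT 2
 :TLTL (VVD VV0)
 :SLTL (((|ssh| NN1))) :OLT1L NIL :OLT2L NIL
 :OLT3L NIL :LRL 0)"""

# Hypothetical translation map from fine-grained SCF IDs to target-grammar
# SCF names; the label "intransitive" is a placeholder, not a grammar type.
TRANSLATION_MAP = {22: "intransitive"}


def parse_entry(text):
    """Extract the word stem, SCF ID, and frequency count from one entry."""
    target = re.search(r":TARGET\s*\|([^|]+)\|", text).group(1)
    # The first number in the CLASSES field is the SCF ID.
    scf_id = int(re.search(r":CLASSES\s*\(\s*(\d+)", text).group(1))
    freq_cnt = int(re.search(r":FREQCNT\s*(\d+)", text).group(1))
    return {"target": target, "scf_id": scf_id, "freq_cnt": freq_cnt}


def translate(entry, translation_map):
    """Map a parsed entry to a target-grammar SCF, or None if unmapped."""
    return translation_map.get(entry["scf_id"])


entry = parse_entry(ENTRY)
print(entry, translate(entry, TRANSLATION_MAP))
```

Returning `None` for unmapped SCF IDs keeps the sketch honest about the fact that a handcrafted map may not cover every fine-grained type.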

This approach is easily extensible to any lexicalized grammar, provided that the grammar has an organized lexicon architecture that derives the possible lexical entries from each SCF the grammar defines. Existing lexicalized grammars are usually equipped with this kind of organization, e.g., lexical types in the LINGO ERG and tree families in the XTAG English grammar.

2.2 Clustering of Verb SCF Distributions

There is some related work on clustering of SCF probability distributions (Schulte im Walde and Brew, 2002; Korhonen et al., 2003). These studies aim at obtaining verb semantic classes, which are closely related to the syntactic behavior of argument selection.

Schulte im Walde and Brew (2002) employed clustering of verb SCF distributions to induce verb semantic classes. They first represent a verb SCF distribution by an n-dimensional vector for each verb. Each element in the SCF distribution represents a probability that a verb appears with the corresponding SCF. They then perform k-Means clustering (Forgy, 1965) of these vectors in order to obtain verb semantic classes.
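As an illustration of this representation, the sketch below builds a probability vector per verb from raw SCF counts and clusters the vectors with a plain k-means loop. It is a simplified stand-in, not the cited authors' implementation: the verbs, the SCF labels (`scf_a`, `scf_b`, `scf_c`), and all counts are invented, and the centroids are seeded with the first k vectors purely to keep the run reproducible.

```python
def scf_distribution(counts, scf_ids):
    """Turn raw SCF counts for one verb into a probability vector."""
    total = sum(counts.get(s, 0) for s in scf_ids)
    return [counts.get(s, 0) / total for s in scf_ids]


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=20):
    """Plain k-means; the first k vectors seed the centroids."""
    centroids = [list(v) for v in vectors[:k]]
    assignment = []
    for _ in range(iterations):
        # Assignment step: each vector joins its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids


# Invented toy data: SCF labels and counts are hypothetical.
SCF_IDS = ["scf_a", "scf_b", "scf_c"]
COUNTS = {
    "sleep": {"scf_a": 9, "scf_b": 1},
    "eat": {"scf_a": 1, "scf_b": 8, "scf_c": 1},
    "arrive": {"scf_a": 8, "scf_b": 2},
    "devour": {"scf_b": 9, "scf_c": 1},
}

verbs = list(COUNTS)
vectors = [scf_distribution(COUNTS[v], SCF_IDS) for v in verbs]
assignment, centroids = kmeans(vectors, k=2)
print(dict(zip(verbs, assignment)))
```

With this toy data, verbs whose probability mass sits on the same SCF end up in the same cluster, which is the intuition behind using the clusters as candidate verb classes.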

Korhonen et al. (2003) also conducted clustering of verb SCF distributions, using different clustering methods including nearest-neighbor clustering and Information Bottleneck clustering (Tishby et al., 1999). They investigated the effect of polysemic verbs on clustering.

Although these studies demonstrated that clustering of verb SCF distributions yields a certain classification of verbs, they do not focus on improving the quality of the SCF lexicon. In this paper, we focus on the problem of identifying whether a word can have each SCF, and try to obtain word classes whose element words have the same set of SCFs.

3 Method

The basic idea of our method is first to obtain word classes whose element words have the same set of SCFs, using not only acquired SCFs but also existing SCFs in the target grammar. We then eliminate implausible acquired SCFs and add plausible unseen SCFs according to the set of SCFs represented by the centroids of the resulting clusters.
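The elimination and addition step can be pictured with a small sketch. This excerpt does not say how a centroid is turned into a set of SCFs, so the threshold rule in `centroid_scf_set` is an assumption made only for illustration, as are the function names, the SCF labels, and the example values.

```python
def centroid_scf_set(centroid, scf_ids, threshold=0.5):
    """SCFs whose centroid value reaches the threshold (an assumed criterion)."""
    return {s for s, value in zip(scf_ids, centroid) if value >= threshold}


def revise_scfs(acquired, cluster_scfs):
    """Drop acquired SCFs outside the cluster's set and add the unseen ones."""
    removed = acquired - cluster_scfs   # implausible acquired SCFs
    added = cluster_scfs - acquired     # plausible unseen SCFs
    revised = (acquired - removed) | added
    return revised, removed, added


# Hypothetical SCF labels, centroid values, and acquired set for one word.
SCF_IDS = ["scf_a", "scf_b", "scf_c"]
centroid = [0.9, 0.7, 0.1]
acquired = {"scf_a", "scf_c"}

cluster_scfs = centroid_scf_set(centroid, SCF_IDS)
revised, removed, added = revise_scfs(acquired, cluster_scfs)
print(sorted(revised), sorted(removed), sorted(added))
```

Under these assumptions, every word in a cluster ends up with the SCF set of its centroid, while `removed` and `added` record how its acquired set changed.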

3.1 Representation of Confidence Values for SCFs

We represent an SCF confidence-value vector of each word w_i with a vector v_i, an object for clustering. Each element v_ij in v_i represents the confidence value of SCF
