XLE/LFG parsing → Discriminant

Date: 2026-01-21


LFG Parsebanker

TREPIL Norwegian Treebank Pilot Project

Introduction

We present the LFG Parsebanker, a comprehensive toolkit for interactive incremental construction of a treebank as a parsed corpus. The tool which we have developed supports the process flow in semi-automatic treebank construction, as illustrated in the following scheme:

XLE/LFG parsing → Discriminant disambiguation → Database storage

The toolkit has the following components:

XLE-Web, an interface to the XLE parser on a web page; this interface includes a new display of packed structures and offers discriminants [1], designed and implemented for LFG grammars, to select an analysis;

a parsebanking page which offers views and disambiguation as in XLE-Web, but also additional parsebank management operations, such as subcorpus and grammar selection and a search window based on TigerSearch extended for f-structures;

an overview page providing navigation, information and sorting of utterances;

a discriminant statistics page displaying statistics on chosen discriminants.

Most of these components are implemented in Common Lisp and use XML, XSLT and JavaScript to serve the interface web pages. C-structure trees (and graphs) are drawn using Scalable Vector Graphics (SVG), and MySQL is used to store the parsebank.

Disambiguation with discriminants

In building a treebank, the annotator’s choice between different possible grammatical structures is complicated by several factors. A major challenge is the sheer number of possible structures, which may run into the hundreds or thousands for longer sentences. Another challenge is the high level of detail recorded in the structures, which is desirable in the treebank but can be daunting for the annotator. Consider the f-structures in (2) for the sentence in example (1), where hver dag can be an object or an adjunct.

(1) Barn-a leker hver dag.
    child-DEF.PL play every day
    “The children play every day.”

(2)

The difference indicated with green shading in the structures in (2) is presented to the annotator as the choice in (3). These simple, local differences are called discriminants [1]. By choosing whether hver dag is an object or an adjunct, the annotator decides on the intended analysis but avoids examining the whole, complicated structures.

(3)

NULL OBJ
NULL ADJUNCT

Parsebanking interface with discriminant disambiguation

The interface for identifying the intended analysis is shown in the following screenshot. Here we see the list of discriminants on the left, the packed constituent structure in the middle, and the packed functional structure on the right. The analyses shown are for example (4), in which til fjells has two possible attachments. The annotator mainly works by clicking discriminants to choose or reject them, but other, more advanced actions are also available [3].

Victoria Rosén, Paul Meurer and Koenraad de Smedt

University of Bergen and Unifob AKSIS

(4) Ta med barn-a til fjells.
    take along child-DEF.PL to mountain-LOC
    “Take the children along to the mountains” or “Take the children in the mountains along”
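The way choosing or rejecting a discriminant narrows the set of analyses can be sketched in a few lines of Python. This is a simplified illustration, not the Parsebanker's Common Lisp implementation: each analysis is assumed to be a flat set of property strings, and the property labels (such as "OBJ dag") and the function names `discriminants` and `apply_decision` are invented for the toy version of example (1).

```python
from typing import FrozenSet, List, Set

# Assumption: an analysis is modelled as a flat set of elementary properties.
Analysis = FrozenSet[str]


def discriminants(analyses: List[Analysis]) -> Set[str]:
    """Return the properties that hold in some, but not all, analyses."""
    if not analyses:
        return set()
    all_props = set().union(*analyses)
    shared = set(analyses[0]).intersection(*analyses[1:])
    return all_props - shared


def apply_decision(analyses: List[Analysis], prop: str, chosen: bool) -> List[Analysis]:
    """Keep the analyses that contain prop (chosen) or that lack it (rejected)."""
    return [a for a in analyses if (prop in a) == chosen]


# Toy version of example (1): 'hver dag' as object or as adjunct.
# The property labels are hypothetical, not actual Parsebanker output.
analyses = [
    frozenset({"PRED leke", "SUBJ barn", "OBJ dag"}),
    frozenset({"PRED leke", "SUBJ barn", "ADJUNCT dag"}),
]

print(sorted(discriminants(analyses)))  # ['ADJUNCT dag', 'OBJ dag']
remaining = apply_decision(analyses, "ADJUNCT dag", chosen=True)
print(len(remaining))  # 1
```

In this sketch, rejecting "OBJ dag" has the same effect as choosing "ADJUNCT dag", which mirrors the point that a single local decision can settle the intended analysis without inspecting the full structures.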

Discriminant types

1. Lexical discriminant (a word form and its part of speech)

2. Morphological discriminant (a base form with its tags from morphological preprocessing)

3. C-structure discriminant (a labeled or unlabeled bracketing of a substring)

4. F-structure discriminant (a minimal path through an f-structure)
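One possible way to represent the four discriminant types in code is shown below. The class names, field names and example descriptions are hypothetical and chosen only for illustration; they do not reflect the toolkit's internal data model.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List


class DiscriminantType(Enum):
    LEXICAL = "lexical"              # a word form and its part of speech
    MORPHOLOGICAL = "morphological"  # a base form with its morphological tags
    C_STRUCTURE = "c-structure"      # a bracketing of a substring
    F_STRUCTURE = "f-structure"      # a minimal path through an f-structure


@dataclass(frozen=True)
class Discriminant:
    kind: DiscriminantType
    description: str  # free-text rendering; the format here is an assumption


def group_by_type(items: List[Discriminant]) -> Dict[DiscriminantType, List[Discriminant]]:
    """Group discriminants by type, e.g. for display in separate lists."""
    groups: Dict[DiscriminantType, List[Discriminant]] = defaultdict(list)
    for item in items:
        groups[item.kind].append(item)
    return dict(groups)


# Invented examples loosely based on sentence (1).
examples = [
    Discriminant(DiscriminantType.LEXICAL, "leker: V"),
    Discriminant(DiscriminantType.C_STRUCTURE, "[hver dag]"),
    Discriminant(DiscriminantType.F_STRUCTURE, "OBJ dag"),
    Discriminant(DiscriminantType.F_STRUCTURE, "ADJUNCT dag"),
]
grouped = group_by_type(examples)
```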

Treebank overview page

The overview page, shown in the following screenshot, lists all sentences in the corpus together with information about number of parse solutions, whether the analysis is fragmented, number of discriminants, number of chosen analyses, sentence length, and whether the chosen analysis is the intended one. Any comments added by the annotator during the disambiguation process are also shown.

Discriminant statistics page

The discriminant statistics page presents a frequency list of chosen discriminants for a subcorpus. Each discriminant is listed with its type, the number of times it is chosen (i.e. marked as good) and the number of times its complement is chosen (i.e. marked as bad). (Note: The statistics shown were compiled before lexical discriminants were added to the system.)
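A frequency list of this kind could be computed roughly as follows. The sketch assumes that annotator decisions are available as (discriminant, type, chosen) triples, where a false value for chosen means that the complement was chosen; this input format and the function name `discriminant_statistics` are assumptions made for illustration, not the toolkit's actual database schema.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Decision = Tuple[str, str, bool]   # (discriminant, type, chosen)
Row = Tuple[str, str, int, int]    # (discriminant, type, good, bad)


def discriminant_statistics(decisions: Iterable[Decision]) -> List[Row]:
    """Count how often each discriminant was marked good or bad."""
    good: Counter = Counter()
    bad: Counter = Counter()
    for disc, kind, chosen in decisions:
        (good if chosen else bad)[(disc, kind)] += 1
    rows = [(d, k, good[(d, k)], bad[(d, k)]) for (d, k) in set(good) | set(bad)]
    # Most frequently chosen first; ties broken alphabetically.
    rows.sort(key=lambda r: (-r[2], r[0]))
    return rows


# Invented decisions for a toy subcorpus.
decisions = [
    ("ADJUNCT dag", "f-structure", True),
    ("ADJUNCT dag", "f-structure", True),
    ("OBJ dag", "f-structure", False),
    ("ADJUNCT dag", "f-structure", False),
]
rows = discriminant_statistics(decisions)
for row in rows:
    print(row)
```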

Results and prospects

Our work builds on previous parsebanking efforts such as the Treebanker [1], Alpino [4] and LinGO Redwoods [2]. Our toolkit, however, is specifically designed for LFG grammars. We have implemented TIGER-based search on f-structures as well as c-structures, and we can train parse ranking based on our LFG discriminants.

The tool which we have developed is functional and will be further developed in the remainder of the project. Although it was originally primarily intended for Norwegian, it has been implemented in a language-independent fashion. This means that it may be used for building a treebank for any language for which a suitable LFG grammar is available.

The TREPIL project runs from April 1, 2004 to December 31, 2008. Its website is: http://gandalf.aksis.uib.no/trepil/.

References

[1] David Carter. The TreeBanker: A tool for supervised training of parsed corpora. In Proceedings of the Fourteenth National Conference on Artificial Intelligence, pages 598–603, Providence, Rhode Island, 1997.

[2] Stephan Oepen, Dan Flickinger, Kristina Toutanova, and Christopher D. Manning. LinGO Redwoods, a rich and dynamic treebank for HPSG. Research on Language & Computation, 2(4):575–596, December 2004.

[3] Victoria Rosén, Koenraad De Smedt, and Paul Meurer. Towards a toolkit linking treebanking to grammar development. In Proceedings of the Fifth Workshop on Treebanks and Linguistic Theories, pages 55–66, 2006.

[4] Leonoor Van der Beek, Gosse Bouma, Robert Malouf, and Gertjan Van Noord. The Alpino dependency treebank. In Computational Linguistics in the Netherlands (CLIN) 2001, Twente University, 2002.
