XLE/LFG parsing → Discriminant
Date: 2026-01-21
LFG
Parsebanker
TREPIL Norwegian Treebank Pilot Project
Introduction
We present the LFG Parsebanker, a comprehensive toolkit for interactive incremental construction of a treebank as a parsed corpus. The tool which we have developed supports the process flow in semi-automatic treebank construction, as illustrated in the following scheme:
XLE/LFG parsing → Discriminant disambiguation → Database storage
The toolkit has the following components:
XLE-Web, an interface to the XLE parser on a web page; this interface includes a new display of packed structures and offers discriminants [1], designed and implemented for LFG grammars, to select an analysis;
a parsebanking page which offers views and disambiguation as in XLE-Web, but also additional parsebank management operations, such as subcorpus and grammar selection and a search window based on TigerSearch extended for f-structures;
an overview page providing navigation, information and sorting of utterances;
a discriminant statistics page displaying statistics on chosen discriminants.
Most of these components are implemented in Common Lisp and use XML, XSLT and Javascript to serve the interface web pages. C-structure trees (and graphs) are drawn using Scalable Vector Graphics (SVG) and MySQL is used to store the parsebank.
Disambiguation with discriminants
In building a treebank, the annotator’s choice between different possible grammatical structures is complicated by several factors. A major challenge is the sheer number of possible structures, which may run into the hundreds or thousands for longer sentences. Another challenge is the high level of detail recorded in the structures, which is desirable in the treebank but can be daunting for the annotator. Consider the f-structures in (2) for the sentence in example (1), where hver dag can be an object or an adjunct.
(1) Barn-a leker hver dag.
child-DEF.PL play every day
“The children play every day.”
(2)
The difference indicated with green shading in the structures in (2) is presented to the annotator as the choice in (3). These simple, local differences are called discriminants [1]. By choosing whether hver dag is an object or an adjunct, the annotator decides on the intended analysis but avoids examining the whole, complicated structures.
(3)
NULL OBJ
NULL ADJUNCT
Parsebanking interface with discriminant disambiguation
The interface for identifying the intended analysis is shown in the following screenshot. Here we see the list of discriminants on the left, the packed constituent structure in the middle, and the packed functional structure on the right. The analyses shown are for example (4), in which til fjells has two possible attachments. The annotator works mainly by clicking discriminants to choose or reject them, but other, more advanced actions are also available [3].
Victoria Rosén, Paul Meurer and Koenraad de Smedt
University of Bergen and Unifob AKSIS
(4) Ta med barn-a til fjells.
take along child-DEF.PL to mountain-LOC
“Take the children along to the mountains” or “Take the children in the mountains along”
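The choose/reject interaction described above can be illustrated with a minimal Python sketch. It is not the Parsebanker's implementation (which is written in Common Lisp); the class name `Disambiguator`, its methods, and the discriminant labels are hypothetical and chosen only for illustration. The sketch assumes that each analysis can be summarized as the set of discriminants that hold in it.

```python
from dataclasses import dataclass, field


@dataclass
class Disambiguator:
    """Narrow a set of analyses by choosing or rejecting discriminants.

    `analyses` maps an analysis id to the set of discriminants that hold
    in that analysis (an assumed, simplified representation).
    """
    analyses: dict
    chosen: set = field(default_factory=set)
    rejected: set = field(default_factory=set)

    def choose(self, discriminant):
        # Keep only analyses that contain this discriminant.
        self.chosen.add(discriminant)

    def reject(self, discriminant):
        # Discard every analysis that contains this discriminant.
        self.rejected.add(discriminant)

    def remaining(self):
        # An analysis survives if it has all chosen and no rejected discriminants.
        return sorted(
            aid for aid, ds in self.analyses.items()
            if self.chosen <= ds and not (self.rejected & ds)
        )

    def open_discriminants(self):
        # Discriminants that still distinguish the surviving analyses:
        # present in at least one of them, but not in all.
        live = [set(self.analyses[a]) for a in self.remaining()]
        if not live:
            return set()
        return set().union(*live) - set.intersection(*live)


# Hypothetical labels for the two readings of example (1).
analyses = {
    1: frozenset({"hver dag: OBJ"}),
    2: frozenset({"hver dag: ADJUNCT"}),
}
session = Disambiguator(analyses)
session.choose("hver dag: ADJUNCT")
print(session.remaining())
```

In this sketch, a single decision is enough to isolate one analysis, which mirrors the point that the annotator never has to inspect the full structures.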
Discriminant types
1. Lexical discriminant (a word form and its part of speech)
2. Morphological discriminant (a base form with its tags from morphological preprocessing)
3. C-structure discriminant (a labeled or unlabeled bracketing of a substring)
4. F-structure discriminant (a minimal path through an f-structure)
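To make the third type in the list above concrete, here is a small Python sketch of how labeled bracketings that differ between analyses could be extracted. It is an illustration only, not the toolkit's algorithm: the nested-tuple tree encoding, the function names, and the category labels assigned to example (4) are all assumptions.

```python
def brackets(tree, start=0):
    """Return (spans, end) for a nested-tuple tree.

    A tree is either a word (str) or a tuple (label, child, child, ...).
    Each span is a triple (label, start, end) over word positions.
    """
    if isinstance(tree, str):
        # A single word covers one position and contributes no bracket.
        return set(), start + 1
    label, *children = tree
    spans = set()
    pos = start
    for child in children:
        child_spans, pos = brackets(child, pos)
        spans |= child_spans
    spans.add((label, start, pos))
    return spans, pos


def cstructure_discriminants(trees):
    """For each tree, return the bracketings not shared by all trees."""
    span_sets = [brackets(t)[0] for t in trees]
    shared = set.intersection(*span_sets)
    return [spans - shared for spans in span_sets]


# Two hypothetical c-structures for example (4): the PP "til fjells"
# attaches either to the verb phrase or inside the noun phrase.
high = ("VP", "Ta", "med", ("NP", "barna"), ("PP", "til", "fjells"))
low = ("VP", "Ta", "med", ("NP", "barna", ("PP", "til", "fjells")))

for diff in cstructure_discriminants([high, low]):
    print(sorted(diff))
```

Under these assumed trees, the only bracketings that differ are the two NP spans, so the attachment ambiguity reduces to one local choice.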
Treebank overview page
The overview page, shown in the following screenshot, lists all sentences in the corpus together with information about number of parse solutions, whether the analysis is fragmented, number of discriminants, number of chosen analyses, sentence length, and whether the chosen analysis is the intended one. Any comments added by the annotator during the disambiguation process are also shown.
Discriminant statistics page
The discriminant statistics page presents a frequency list of chosen discriminants for a subcorpus. Each discriminant is listed with its type, the number of times it is chosen (i.e. marked as good) and the number of times its complement is chosen (i.e. marked as bad). (Note: The statistics shown were compiled before lexical discriminants were added to the system.)
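A frequency list of this kind can be sketched in a few lines of Python. The record layout (discriminant, type, verdict), the function name `discriminant_statistics`, and the sample decisions are assumptions for illustration; they do not reflect the toolkit's actual MySQL schema or data.

```python
from collections import Counter


def discriminant_statistics(decisions):
    """Build a frequency list from annotator decisions.

    `decisions` is an iterable of (discriminant, dtype, verdict) triples,
    where verdict is "good" (chosen) or "bad" (complement chosen).
    Returns rows (discriminant, dtype, n_good, n_bad), most chosen first.
    """
    good, bad = Counter(), Counter()
    for disc, dtype, verdict in decisions:
        key = (disc, dtype)
        if verdict == "good":
            good[key] += 1
        elif verdict == "bad":
            bad[key] += 1
        else:
            raise ValueError(f"unknown verdict: {verdict!r}")
    rows = [
        (disc, dtype, good[(disc, dtype)], bad[(disc, dtype)])
        for disc, dtype in set(good) | set(bad)
    ]
    # Sort by descending good count, then descending bad count, then name.
    rows.sort(key=lambda r: (-r[2], -r[3], r[0]))
    return rows


# Invented sample decisions, for illustration only.
decisions = [
    ("hver dag: ADJUNCT", "f-structure", "good"),
    ("hver dag: OBJ", "f-structure", "bad"),
    ("hver dag: ADJUNCT", "f-structure", "good"),
    ("NP[2,3]", "c-structure", "good"),
    ("NP[2,3]", "c-structure", "bad"),
]
for row in discriminant_statistics(decisions):
    print(row)
```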
Results and prospects
Our work builds on previous parsebanking efforts such as the TreeBanker [1], Alpino [4] and LinGO Redwoods [2]. Our toolkit, however, is specifically designed for LFG grammars. We have implemented TIGER-based search on f-structures as well as c-structures, and we can train parse ranking based on our LFG discriminants.
The tool which we have developed is functional and will be further developed in the remainder of the project. Although it was originally primarily intended for Norwegian, it has been implemented in a language-independent fashion. This means that it may be used for building a treebank for any language for which a suitable LFG grammar is available.
The TREPIL project runs from April 1, 2004 to December 31, 2008. Its website is: http://gandalf.aksis.uib.no/trepil/.
References
[1] David Carter. The TreeBanker: A tool for supervised training of parsed corpora. In Proceedings of the Fourteenth National Conference on Artificial Intelligence, pages 598–603, Providence, Rhode Island, 1997.
[2] Stephan Oepen, Dan Flickinger, Kristina Toutanova, and Christopher D. Manning. LinGO Redwoods, a rich and dynamic treebank for HPSG. Research on Language & Computation, 2(4):575–596, December 2004.
[3] Victoria Rosén, Koenraad De Smedt, and Paul Meurer. Towards a toolkit linking treebanking to grammar development. In Proceedings of the Fifth Workshop on Treebanks and Linguistic Theories, pages 55–66, 2006.
[4] Leonoor Van der Beek, Gosse Bouma, Robert Malouf, and Gertjan Van Noord. The Alpino dependency treebank. In Computational Linguistics in the Netherlands (CLIN) 2001, Twente University, 2002.