Topic segmentation with an aspect hidden Markov model(11)
时间:2025-04-27
时间:2025-04-27
We present a novel probabilistic method for partially unsupervised topic segmentation on unstructured text. Previous approaches to this problem utilize the hidden Markov model framework (HMM). The HMM treats a document as mutually independent sets of words
85EXPERIMENTALRESULTS
IABACDECFAGFHJ
1234567891011121314151617
Figure4:AsegmentationofAllThingsConsideredfromApril29,1999.Thetopdiagramisthehypothesissegmentation.Thebottomdiagramisthetruesegmentation.Figure4showsasegmentationfromarealtranscriptofATConApril29,1999.Thesegmentationisnotperfectbuthypothesizesthedetectedtopicbreaksatapproximatelythecorrectpointsintheprogram.At rst,thereseemtobemanymissedbreaks.Wearguehoweverthatthesemissedstorybreaksdonotalwaysconstitutetopicbreaksandthereforearenotindicativeoftheperformanceofourmodel.Toillustratethis,weexploreamethodoftopiclabelingbasedonthelanguagemodelparametersoftheaspectmodel.
Onewayofidentifyingthetopicswhichthesegmenter ndsisbythetop fteen
parameterforthevalueofwhichtheViterbialgorithmassignedwordsofthe
toaparticularsegment.Figure5liststhesewordsets(denotedbyaletter)astheycorrespondtothetopicsinthesegmentation(denotedbyanumber).Forexample,story14isabouttheIsraeli/Palestiniancon ict.Itscorrespondingsegmentinthehy-pothesissegmentationcanbedescribedbythewordsintopicFwhichincludepeace,israeli,andpalestinian.
Analysisofthiscorrespondenceoftenexplainsmissedtopicbreaks.Articles11and12arebothabouttheKosovarrefugees.Understandably,theyarebothassignedtotopicAandthebreakbetweenstoriesgoesundetected.
Notethatthesegmentercanworkevenifthetopwordsoffailtogiveagoodtopicdescription.ThestoryaboutdeformedfrogsisassignedtopicI,arathergenericlanguagemodelwithnorealdescriptivewords.However,thesubsequentstoryabouttheeconomy tstopicJsowellthattheAHMMisabletoproperlydetectthebreak.
上一篇:工程索道与柔性吊桥
下一篇:人教版六年级数学上册期末试卷2