A formal model for information selection in multi-sentence text extraction
Selecting important information while accounting for repetitions is a hard task for both summarization and question answering. We propose a formal model that represents a collection of documents in a two-dimensional space of textual and conceptual units, with an associated mapping between these two dimensions. This representation is then used to cast the selection of textual units for a summary or answer as a formal optimization task, for which we provide approximation algorithms.
The score of an atomic event depends on the frequency of the named entity pair in the input text and the frequency of the connector for that pair. Filatova and Hatzivassiloglou (2003) define the procedure for extracting atomic events in detail, and show that these triplets capture the most important relations connecting the major constituent parts of events, such as locations, dates, and participants. Our hypothesis is that using these events as conceptual units provides a reasonable basis for summarizing texts that are supposed to describe one or more events.
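To make the scoring concrete, the sketch below assigns each triplet the product of the relative frequency of its entity pair and the relative frequency of its connector within that pair. This is only one plausible reading of the description above; the exact normalization in Filatova and Hatzivassiloglou (2003) may differ, and all function and variable names here are ours.

    from collections import Counter
    from itertools import combinations

    def score_atomic_events(sentences):
        """Score (entity_i, connector, entity_j) triplets.

        `sentences` is a list of (entities, connectors) tuples; the
        heavy preprocessing (named entity tagging, extracting verbs
        and action nouns as connectors) is assumed done elsewhere.
        """
        pair_counts = Counter()   # frequency of each entity pair
        conn_counts = Counter()   # frequency of each connector per pair
        for entities, connectors in sentences:
            for pair in combinations(sorted(set(entities)), 2):
                pair_counts[pair] += 1
                for conn in connectors:
                    conn_counts[(pair, conn)] += 1

        total_pairs = sum(pair_counts.values())
        scores = {}
        for (pair, conn), n in conn_counts.items():
            pair_freq = pair_counts[pair] / total_pairs  # salience of the pair
            conn_freq = n / pair_counts[pair]            # typicality of the connector
            scores[(pair[0], conn, pair[1])] = pair_freq * conn_freq
        return scores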
Evaluation Metric Given the difficulties in coming up with a universally accepted evaluation measure for summarization, and the fact that judgments by humans are time-consuming and labor-intensive, we adopted an automated process for comparing system-produced summaries to the ideal summaries written by humans. The ROUGE method (Lin and Hovy, 2003) is based on n-gram overlap between the system-produced and ideal summaries. As such, it is a recall-based measure, and it requires that the length of the summaries be controlled in order to allow for meaningful comparisons. Although ROUGE is only a proxy measure of summary quality, it offers the advantage that it can be readily applied to compare the performance of different systems on the same set of documents, assuming that ideal summaries are available for those documents.

Baseline Our baseline method does not consider the overlap in information content between selected textual units. Instead, we fix the score of each sentence as the sum of its tf*idf values or atomic event scores. At every step we choose the remaining sentence with the largest score, until the stopping criterion for summary length is satisfied.
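A minimal sketch of this baseline, assuming precomputed per-sentence scores; the function name, tie-breaking, and exact stopping behavior are our choices.

    def baseline_select(sentences, scores, max_words):
        """Greedy baseline: sentence scores are fixed in advance
        (sum of tf*idf values or of atomic event scores) and never
        adjusted for overlap with already-selected sentences."""
        ranked = sorted(range(len(sentences)),
                        key=lambda i: scores[i], reverse=True)
        selected, used = [], 0
        for i in ranked:
            if used >= max_words:  # stopping criterion on summary length
                break
            selected.append(sentences[i])
            used += len(sentences[i].split())
        return selected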
Results For every version of our baseline and approximation algorithms, and separately for the tf*idf-weighted word and event features, we get a sorted list of sentences extracted according to a particular algorithm. Then, for each DUC document set we create four summaries of each suggested length (50, 100, 200, and 400 words) by extracting the first 50, 100, 200, and 400 words, respectively, from the top sentences.
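Concretely, this length control can be realized by truncating the concatenation of the ranked sentences, as in this small helper (the name and word-splitting convention are assumptions):

    def fixed_length_summary(ranked_sentences, n_words):
        """Keep exactly the first n_words words of the ranked
        sentences, DUC-style (n_words in {50, 100, 200, 400})."""
        words = " ".join(ranked_sentences).split()
        return " ".join(words[:n_words])

    # One summary per target length for each DUC document set:
    # summaries = {n: fixed_length_summary(ranked, n)
    #              for n in (50, 100, 200, 400)}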
To evaluate the performance of our summarizers we compare their outputs against the human models of the corresponding length provided by DUC, using the ROUGE-created scores for unigrams. Since scores are not comparable across different document sets, instead of average scores we report the number of document sets for which one algorithm outperforms another. We compare each of our approximation algorithms (adaptive and modified greedy) to the baseline.
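For reference, here is a sketch of the two pieces of this evaluation: unigram recall in the spirit of ROUGE-1 (the official ROUGE script additionally handles stemming, multiple references, and other details this ignores), and the per-document-set comparison reported in the tables below, which we read as net wins (wins minus losses); that reading and both function names are our assumptions.

    from collections import Counter

    def unigram_recall(system, reference):
        """ROUGE-1-style recall: clipped unigram overlap divided
        by the number of unigrams in the reference summary."""
        sys_c, ref_c = Counter(system.split()), Counter(reference.split())
        overlap = sum(min(c, sys_c[w]) for w, c in ref_c.items())
        return overlap / max(1, sum(ref_c.values()))

    def net_wins(scores_a, scores_b):
        """Document sets where system A beats system B, minus
        those where B beats A; ties contribute nothing."""
        return sum((a > b) - (a < b) for a, b in zip(scores_a, scores_b))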
Length    Events    tf*idf
50        +3         0
100       +4        −4
200       +2        −4
400       +5         0

Table 1: Adaptive greedy algorithm versus baseline.
Length    Events    tf*idf
50         0        +7
100       +4        +4
200       +8        +6
400       +2        +14

Table 2: Modified greedy algorithm versus baseline.
Table 1 shows the number of data sets for which the adaptive greedy algorithm outperforms our baseline. This implementation of our information packing model improves the ROUGE scores in most cases when events are used as features, while the opposite is true when tf*idf provides the conceptual units. This may be partly explained by the nature of the tf*idf-weighted word features: it is possible that important words cannot be considered independently, and that the repetition of important words in later sentences does not necessarily mean that those sentences offer no new information. Thus words may not provide independent enough features for our approach to work.
Table 2 compares our modified greedy algorithm to the baseline. In that case, the model offers gains in performance when both events and words are used as features, and in fact the gains are most pronounced with the word features. For both algorithms, the gains are generally minimal for 50-word summaries and most pronounced for the longest, 400-word summaries. This validates our approach, as the information packing model has limited opportunity to alter the set of selected sentences when those sentences are very few (often one or two for the shortest summaries).
It is worth noting that in direct comparisons between the adaptive and modified greedy algorithms we found the latter to outperform the former. We also found events to lead to better performance than tf*idf-weighted words, with statistically significant differences. Events tend to be a particularly good representation for document sets with well-defined constituent parts (such as specific participants) that cluster around a narrow event. Events not only give us a higher absolute performance when compared