Simple Text Processing
发布时间:2021-06-06
发布时间:2021-06-06
data retrieval information retrieval matching exact partial, best match inference deduction induction model deterministic probabilistic query language artificial natural query specification complete incomplete items wanted matching relevant error response
1
Document Management, Department of Linguistics, Uppsala University, VT992
Information Retrieval
Document Life Cycle
Document Management, Department of Linguistics, Uppsala University, VT993
Document Management, Department of Linguistics, Uppsala University, VT994
Data Retrieval versus Information Retrieval
Precision and Recall
matching inference model query language query speci cation items wanted error response
data retrieval exact deduction deterministic arti cial complete matching sensitive
information retrieval partial, best match induction probabilistic natural incomplete relevant insensitive
precision= j Dj\ D j Djr
\ recall= j D D Dj j jr r
deduction: p(a); 8x: p(x) ! q(x) ) q(a) induction: p(a); q(a) ) 8x: p(x) ! q(x)
D: documents returned by the system D: relevant documentsr
data retrieval information retrieval matching exact partial, best match inference deduction induction model deterministic probabilistic query language artificial natural query specification complete incomplete items wanted matching relevant error response
Document Management, Department of Linguistics, Uppsala University, VT995
Document Management, Department of Linguistics, Uppsala University, VT996
Simple Text Processing
Searching StringsSearch Pattern
searching strings string replacement sorting tokenization text segmentation (sentences, phrases, MWU's) frequency counts
simple strings expressionsString Searching Problems
string matching string distance longest common subsequence approximate string matching longest repeated substring
Document Management, Department of Linguistics, Uppsala University, VT997
Document Management, Department of Linguistics, Uppsala University, VT998
String Matching
String Matching - Algorithms (1)
Given pattern string x, with j x j= m, and text string y, with j y j= n, where m; n> 0 and m n, if x occurs as a substring of y then determine the position within y of the rst occurrence of x, i.e. return the least value of i such that y(i; i+ m? 1)= x(1; m).
at successive positions. worst case: O(mn) average case: O(m+ n) Knuth-Morris-Pratt: calculate a next-table for the pattern x which will be used for shifting the pattern when mismatching worst case: O(m+ n) Boyer-Moore: scan the string string from left to right, but match symbols from right to left (disregard portions which cannot possibly match the pattern) worst case: O(m+ n) average case: O(n=m) (large alphabet, small pattern)
Brute Force: compare pattern x with substrings of y
上一篇:报表、财务分析
下一篇:赖诺普利联合牛黄降压胶囊