Simple Text Processing

发布时间:2021-06-06

data retrieval information retrieval matching exact partial, best match inference deduction induction model deterministic probabilistic query language artificial natural query specification complete incomplete items wanted matching relevant error response

1

Document Management, Department of Linguistics, Uppsala University, VT992

Information Retrieval

Document Life Cycle

Document Management, Department of Linguistics, Uppsala University, VT993

Document Management, Department of Linguistics, Uppsala University, VT994

Data Retrieval versus Information Retrieval

Precision and Recall

matching inference model query language query speci cation items wanted error response

data retrieval exact deduction deterministic arti cial complete matching sensitive

information retrieval partial, best match induction probabilistic natural incomplete relevant insensitive

precision= j Dj\ D j Djr

\ recall= j D D Dj j jr r

deduction: p(a); 8x: p(x) ! q(x) ) q(a) induction: p(a); q(a) ) 8x: p(x) ! q(x)

D: documents returned by the system D: relevant documentsr

data retrieval information retrieval matching exact partial, best match inference deduction induction model deterministic probabilistic query language artificial natural query specification complete incomplete items wanted matching relevant error response

Document Management, Department of Linguistics, Uppsala University, VT995

Document Management, Department of Linguistics, Uppsala University, VT996

Simple Text Processing

Searching StringsSearch Pattern

searching strings string replacement sorting tokenization text segmentation (sentences, phrases, MWU's) frequency counts

simple strings expressionsString Searching Problems

string matching string distance longest common subsequence approximate string matching longest repeated substring

Document Management, Department of Linguistics, Uppsala University, VT997

Document Management, Department of Linguistics, Uppsala University, VT998

String Matching

String Matching - Algorithms (1)

Given pattern string x, with j x j= m, and text string y, with j y j= n, where m; n> 0 and m n, if x occurs as a substring of y then determine the position within y of the rst occurrence of x, i.e. return the least value of i such that y(i; i+ m? 1)= x(1; m).

at successive positions. worst case: O(mn) average case: O(m+ n) Knuth-Morris-Pratt: calculate a next-table for the pattern x which will be used for shifting the pattern when mismatching worst case: O(m+ n) Boyer-Moore: scan the string string from left to right, but match symbols from right to left (disregard portions which cannot possibly match the pattern) worst case: O(m+ n) average case: O(n=m) (large alphabet, small pattern)

Brute Force: compare pattern x with substrings of y

Simple Text Processing.doc 将本文的Word文档下载到电脑

精彩图片

热门精选

大家正在看

× 游客快捷下载通道(下载后可以自由复制和排版)

限时特价:7 元/份 原价:20元

支付方式:

开通VIP包月会员 特价:29元/月

注:下载文档有可能“只有目录或者内容不全”等情况,请下载之前注意辨别,如果您已付费且无法下载或内容有问题,请联系我们协助你处理。
微信:fanwen365 QQ:370150219