MIT CLRS NOTES 2 mit 算法导论课堂笔记2
发布时间:2024-11-10
发布时间:2024-11-10
MIT CLRS NOTES mit 算法导论课堂笔记
Lecture 7
Table T holding n records.
record is a hunk of data broken into what we called fields. dynamic set Operations Insert(T,x) Delete(T,x) Search(T,k)
Sup. the set of keys is K 0,1,...,m 1 and keys are distinct. Set up array T[0,...,m-1] or “direct-access” table x if x“dynamic set”
nil otherwise
x can be a pointer or just record itself P223 figure 11.1 ops take (1)time
practical application can be bit vector to sort keys, just like counting sort(count once or not)
Problem: usu. the range of keys is large 264 nearly = 16*1018 too huge
MIT CLRS NOTES mit 算法导论课堂笔记
…,m-1} P255 figure 11.2
We have Universe U of keys, a much smaller set we care about is K. what the hash function is going to do is map keys into a table T[0..m-1] occurs. Perfert hashing with no collision will be talked next lecture. Records in same slot are linked into a list. Ex. P225 figure 11.3
Worst case: into the same slot.
Ave. case: Each key in K is equally likely to be hashed to any slot in T indep. of where other keys are hashed.
n key, m slots
load factor =n/m=ave. # keys per slot
Exp. Search cost= (1 ) 1 represents hash & access slot ; search list. = (1) if = (1), i.e. if n= (m)
Should distribute keys uniformly into slots
Regualriy in the key distribution should not affect this uniformity
MIT CLRS NOTES mit 算法导论课堂笔记
integer not floating division H(k)=k mod m
Don’t want m to have a small divisor d Worse: m=2r - does not depend on all bits of k
Pick m to be prime ????? not too close to power of 2 or 10
faster than division
Faster because multiplicaion software implementation of our computer is the inner loop of division impl. m=2r , computer has w-bit words h(k)=(A*k mod 2w)rsh(w-r)
A is odd integer 2w-1 <A<2w A’s first(not sigh) and last bit is on; rsh is bit wise right shift op Don’t pick A too close to 2w P232 Ex
Modular wheel ????
Another method to collision resolution is
MIT CLRS NOTES mit 算法导论课堂笔记
no storage outside of hash table
probe systematically until an empty slot is found
h: U univers of keys {0,1,.., m-1} prob #-> {0,1,.., m-1} slot probe sequence for a given key should be a permutation table may fill up( chaining may cause long list); del. is diffcult Open address is used where with no deletions, Although deletion is possible in this method. eg: compiler Ex. Insert K=496
Search- same probe seq. -successful –find record -unsuccessful –find nil
MIT CLRS NOTES mit 算法导论课堂笔记
Probing strategies
Linear probing : look seq. through table
“primary clustering” –long runs of filled slots
Double hashing h(k,i)=(h1(k)+ih2(k)) mod m (excellent) usu. pick m=2r and h2 (k) odd
– asumme ea. key equally likely to have any one of the m! perms. as its prob seq. indep. of other keys P241 formal one Proof.
1 probe always necessary with prob n/m collision -> 2nd probe nec. 2 prob. n-1/m-1 collision -> 3nd probe nec.
n in
< = for i=1,2,…,n-1 m im
nn 1n 2
E[# probes]=1+(1+(1+(…
mm 1m 2
1
if 1 (i.e., n<m) 1
Note:
1 (1 (.... 1 2....
1 1
When this is constant time
MIT CLRS NOTES mit 算法导论课堂笔记
Lecture 8
For any choice of hash function, bad set of keys that all hash to same slot.
Let U be a universe of keys, and let be H a finite collection of hash function mapping U to {0,1, ... , m-1} U, where x y, h H:h(x) h(y)}
Hm
the chance of collision betw. x and y is 1/m if h is chosen randomly from H
{hH
n keys into m slots in table T. Then, for a given key x, E[#
Proof. Let Cx be random variable denote total # collisions of keys in T with x, and let
MIT CLRS NOTES mit 算法导论课堂笔记
1 if h(x)=h(y)
cxy = 0 otherwise
Note: E[cxy]=1/m (because H is universal) and Cx =therefore E[Cx ]=E[
y T {x}
c
xy
y T {x}
c
xy]=(n-1)/m <
Let m be prime. Decompose key k U into r+1 “digits” k=<k0 , k1 , ... , kr > where 0 ki m
the following random a will determine which hash function to use. Pick a=< a0 , a1 , ... , ar >, each ai randomly from{0,1, ... ,m-1} Define ha(k)=( aiki) mod m (Dot product modulo m)
i 0r
How big is H={ha}? mr+1
Let x=< x0 , x1 , ... , xr >
y=< y0 , y1 , ... , yr > be distinct keys
They differ in at least one digit, wlog(without loss of generality) pos. 0. For how many ha H do x and y collide? Must have ha(x)= ha(y)
MIT CLRS NOTES mit 算法导论课堂笔记
-> aixi aiyi(mod m) { means congruent}
i 0r
i 0
rr
-> ai(xi yi) 0 (mod m)
i 0
->a0(x0 y0) ai(xi yi) 0(mod m)
i 1
r
->a0(x0 y0) ai(xi yi)(mod m)
i 1
r
Number theory fact
Let m be prime, for any z m (integers mod m) $(such that) z 0, unique z-1 m $ z z-1 1(mod m) z z-1
Since x0 y0, ( x0-y0)-1
->a0 ( ai(xi yi))(x0 y0) 1(mod m)
i 1r
1 2 3 4 5 6 1 4 5 2 3 6
Thus, for any choices of a1, a2, ... ar , exactly one of m choices for a0 cause x and y to collide.
# ha’s choices that cause x, y collide = m (choices for a
1) m(choices for a2) ...m(choices for ar)
1(choices for a0, the above one)= mr =H
m
MIT CLRS NOTES mit 算法导论课堂笔记
Given n keys, construct a static hash table of size m=O(n) $ search tales
(1)time in the worst case.
Idea: 2 level scheme with universal hashing in both levels No collisions at level 2!
2 slots using random h in universal H E[#collisions]<1/2 verse of birthday paradox Proof Prob 2 given keys collide under h is 1/m=1/n2
n 2 pairs of keys -> E[#collisions]=
n 1 2 n2<1/2
1/2 Pf. Markov’s inequality for r.v. X 0
E[X]
t
E[#collisions]
Pr{ 1 collision} <1/2
1
Pr{X t}
MIT CLRS NOTES mit 算法导论课堂笔记
For level 1, choose m=n, and let ni be r.v. for # keys that hash to slot i in T. Use ni2 slots for each level 2 table Si.
E[total storage]=E[ (ni2)]= (n)
i 0n 1
Lecture 9
T for i 1 to n
do Insert(T,A[i])
Perform inorder tree walk of T
Tree-walk time= (n)
How long to build BST? n2 worst-case Observation: BST sort quicksort -Same comparisons, different order
MIT CLRS NOTES mit 算法导论课堂笔记
BST
Quicksort
comparison is the same, but have different comparison order. Depth=# comparisons during insert Ave. node depth=( (#comparisons/node))/n
nodes
=
1
O(nlgn) (QS analysis) n
=O(lgn)
Ave. BST height=O(lgn) but this argument doesn’t show it. Ex:
MIT CLRS NOTES mit 算法导论课堂笔记
Ave. node depth to choose
nlgn
n
n n=O(nlgn)/n=O(lgn)
n n
large (not dominate the nlgn) will make the BST 2
height increase: the thresh h will be nlgn
1. Prove Jensen’s inequality, which says f(E[X]) E[f(X)] for any convex function f and r.v. X
2. Analyze “exponential height” of randomly built BST on n nodes: Yn=2X where Xn is r.v. for height.
n
3. Prove that
2
E[Xn]
Xn3
E[Y]E[2]O[n] =n=
therefore E[Xn] O[lgn]
A function f: R->R is convex if , 0 $ 1, we have
f( x y) f(x) f(y) x,y R
f
MIT CLRS NOTES mit 算法导论课堂笔记
Let f be convex and let { 1, 2, ... , n} be nonnegtive constants $
nk 1
x
1. Then, for any set {x1,x2, ... ,xn}, we have
n
f( kxk) kf(xk)
k 1
here { 1, 2, ... , n}will be probability for us to use Proof: Induction on n.
For n=1, 1 1 then f( 1x1) 1f(x1) trivial
Let f be convex, and x be a r.v. Then f(E[X]) E[f(X)]. Proof. f(E[X])=f( kPr{X k})
k
f( kxk) f( nxn (1 n)
k 1
n
k
xk) 1 k 1n
n 1
nf(xn) (1 n)f( nf(xn) (1 n)
n 1
k
xk) convexity 1 k 1n
n 1
k
f(xk) I.H.
k 11 n
= kf(xk)
k 1
n
k
f(k)Pr{X k}=E[f(X)]
Xn=r.v. denoting height of BST( all perms equally likely)
Xn2Yn=
MIT CLRS NOTES mit 算法导论课堂笔记
Root has rank k Xn=1+max{Xk-1,Xn-k} Yn=2 max{Yk-1,Yn-k} Def. Z
nk
1 0
if root has rank k
otherwise
random v. indicator , like quick sort Pr{Znk=1}=E[Znk]=1/n Yn= Znk(2max(Yk-1,Yn-k))
k 1n
E[Yn] =E[ Znk(2max(Yk-1,Yn-k))]
k 1
n
= E[Znk(2max(Yk-1,Yn-k))]
k 1n
n
= E[Znk]E[(2max(Yk-1,Yn-k))] independence
k 1
2n
E[Yk-1 Yn-k] nk 1
4n 1
= E[Yi] ni 0
Solve Use subst. to show E[Yn] Cn3 for const C large enough for initial condition.
4n 13
E[Yn] Ck {substitution}
nk 0
MIT CLRS NOTES mit 算法导论课堂笔记
4Cn3
xdx n0
4Cn43
()=Cn
n4
2E[Xn] E[2Xn]
=E[Yn]
Cn
3
Jensen
therefore E[Xn] 3lgn+O(1)
max {a, b} a+b
when a b gets big, a+b approaches max {a, b} very slowly max{2a, 2b} 2a+2b
when a b gets big, 2a+2b approaches max {2a, 2b} very quickly, really tight bound.
sum of exponential like2a+2bis not easy to deal with, but product of exponential is easy, that is where Jensen comes in using convexity of exponentals.
MIT CLRS NOTES mit 算法导论课堂笔记
Lecture 10
Binary search tree data structure with guaranteed O(lg n) height supporting dynamic set operations
VL trees, B-treses/ 2-3 trees, red-black trees
these five loop invariance will maintain the balance 1. every node is either red or black 2. root is black
3. leaves (nil’s) are black
4. if a node is red, then its parent is black = no red adjacent we don’t want too many red, you can have lots of black nodes, black will be good. 5. for any node x
all simple paths from x to a descendent leaf have same # of black nodes. we do not count red nodes because there are not too many red nodes , so this will do us well.
nil’s)
上一篇:平新乔微观经济学第11讲