COMP5318 Knowledge Discovery and Data Mining_2011 Semester 1
Published: 2021-06-06
University of Sydney_COMP5318 Knowledge Discovery and Data Mining_2011
Data Mining Association Analysis: Basic Concepts and Algorithms. Lecture Notes for Chapter 6, Introduction to Data Mining, by Tan, Steinbach, Kumar
© Tan,Steinbach, Kumar
Introduction to Data Mining
4/18/2004
Association Rule Mining

Given a set of transactions, find rules that will predict the occurrence of an item based on the occurrences of other items in the transaction.

Market-Basket transactions:

TID | Items
1   | Bread, Milk
2   | Bread, Diaper, Beer, Eggs
3   | Milk, Diaper, Beer, Coke
4   | Bread, Milk, Diaper, Beer
5   | Bread, Milk, Diaper, Coke

Example of Association Rules:
{Diaper} → {Beer}
{Milk, Bread} → {Eggs, Coke}
{Beer, Bread} → {Milk}

Implication means co-occurrence, not causality!
Definition: Frequent Itemset

Itemset
– A collection of one or more items
  Example: {Milk, Bread, Diaper}
– k-itemset: an itemset that contains k items

Support count (σ)
– Frequency of occurrence of an itemset
– E.g. σ({Milk, Bread, Diaper}) = 2

Support (s)
– Fraction of transactions that contain an itemset
– E.g. s({Milk, Bread, Diaper}) = 2/5

Frequent Itemset
– An itemset whose support is greater than or equal to a minsup threshold
Definition: Association Rule

Association Rule
– An implication expression of the form X → Y, where X and Y are itemsets
– Example: {Milk, Diaper} → {Beer}

Rule Evaluation Metrics
– Support (s)
  Fraction of transactions that contain both X and Y
– Confidence (c)
  Measures how often items in Y appear in transactions that contain X

Example: {Milk, Diaper} → {Beer}

s = σ(Milk, Diaper, Beer) / |T| = 2/5 = 0.4
c = σ(Milk, Diaper, Beer) / σ(Milk, Diaper) = 2/3 ≈ 0.67
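The two metrics above can be checked directly on the five transactions. A minimal sketch in Python (variable and function names are mine, not from the slides):

```python
# Support and confidence for the rule {Milk, Diaper} -> {Beer}
# over the five market-basket transactions from the slide.
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

def sigma(itemset, db):
    """Support count: number of transactions containing the itemset."""
    return sum(1 for t in db if itemset <= t)

X, Y = {"Milk", "Diaper"}, {"Beer"}
s = sigma(X | Y, transactions) / len(transactions)       # 2/5 = 0.4
c = sigma(X | Y, transactions) / sigma(X, transactions)  # 2/3 ≈ 0.67
print(round(s, 2), round(c, 2))  # 0.4 0.67
```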
Association Rule Mining Task
Given a set of transactions T, the goal of association rule mining is to find all rules having
– support ≥ minsup threshold
– confidence ≥ minconf threshold

Brute-force approach:
– List all possible association rules
– Compute the support and confidence for each rule
– Prune rules that fail the minsup and minconf thresholds
=> Computationally prohibitive!
Mining Association Rules

TID | Items
1   | Bread, Milk
2   | Bread, Diaper, Beer, Eggs
3   | Milk, Diaper, Beer, Coke
4   | Bread, Milk, Diaper, Beer
5   | Bread, Milk, Diaper, Coke

Example of Rules:
{Milk, Diaper} → {Beer} (s=0.4, c=0.67)
{Milk, Beer} → {Diaper} (s=0.4, c=1.0)
{Diaper, Beer} → {Milk} (s=0.4, c=0.67)
{Beer} → {Milk, Diaper} (s=0.4, c=0.67)
{Diaper} → {Milk, Beer} (s=0.4, c=0.5)
{Milk} → {Diaper, Beer} (s=0.4, c=0.5)

Observations:
– All the above rules are binary partitions of the same itemset: {Milk, Diaper, Beer}
– Rules originating from the same itemset have identical support but can have different confidence
– Thus, we may decouple the support and confidence requirements
Mining Association Rules

Two-step approach:
1. Frequent Itemset Generation
   – Generate all itemsets whose support ≥ minsup
2. Rule Generation
   – Generate high-confidence rules from each frequent itemset, where each rule is a binary partitioning of a frequent itemset

Frequent itemset generation is still computationally expensive
Frequent Itemset Generation

[Itemset lattice figure: null at the top; 1-itemsets A–E; 2-itemsets AB–DE; 3-itemsets ABC–CDE; 4-itemsets ABCD–BCDE; ABCDE at the bottom]

Given d items, there are 2^d possible candidate itemsets.
Frequent Itemset Generation

Brute-force approach:
– Each itemset in the lattice is a candidate frequent itemset
– Count the support of each candidate by scanning the database of N transactions
– Match each transaction (of width w) against every one of the M candidates
– Complexity ~ O(NMw) => expensive, since M = 2^d !!!
Computational Complexity

Given d unique items:
– Total number of itemsets = 2^d
– Total number of possible association rules:

  R = Σ_{k=1}^{d−1} [ C(d, k) × Σ_{j=1}^{d−k} C(d−k, j) ] = 3^d − 2^{d+1} + 1
If d=6, R = 602 rules
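As a sanity check, the rule count can be computed directly from the double sum; a small sketch (the closed form 3^d − 2^{d+1} + 1 follows from expanding the binomial sums):

```python
from math import comb

# Total number of possible association rules over d items:
# choose a non-empty antecedent of size k, then a non-empty
# consequent of size j from the remaining d-k items.
def num_rules(d):
    return sum(comb(d, k) * sum(comb(d - k, j) for j in range(1, d - k + 1))
               for k in range(1, d))

print(num_rules(6))        # 602
print(3**6 - 2**7 + 1)     # 602, the closed form
```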
Frequent Itemset Generation Strategies

Reduce the number of candidates (M)
– Complete search: M = 2^d
– Use pruning techniques to reduce M

Reduce the number of transactions (N)
– Reduce the size of N as the size of the itemset increases
– Used by DHP and vertical-based mining algorithms

Reduce the number of comparisons (NM)
– Use efficient data structures to store the candidates or transactions
– No need to match every candidate against every transaction
Reducing Number of Candidates

Apriori principle:
– If an itemset is frequent, then all of its subsets must also be frequent

Apriori principle holds due to the following property of the support measure:

  ∀X, Y : (X ⊆ Y) ⇒ s(X) ≥ s(Y)

– Support of an itemset never exceeds the support of its subsets
– This is known as the anti-monotone property of support
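The anti-monotone property can be verified exhaustively on the slide's five transactions; a brute-force sketch (helper names are mine):

```python
from itertools import combinations

# Check s(X) >= s(Y) for every X that is a subset of Y (one item smaller),
# over all itemsets of up to 3 items from the slide's transactions.
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]
items = sorted(set().union(*transactions))

def support(itemset):
    return sum(1 for t in transactions if set(itemset) <= t) / len(transactions)

for size in range(1, 4):
    for Y in combinations(items, size):
        for X in combinations(Y, size - 1):
            assert support(X) >= support(Y)  # subsets are at least as frequent
print("anti-monotone property holds")
```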
Illustrating Apriori Principle

[Itemset lattice figure: AB is found to be infrequent, so all of its supersets (ABC, ABD, ABE, ABCD, ABCE, ABDE, ABCDE) are pruned from the search space]
Illustrating Apriori Principle

Items (1-itemsets), minimum support count = 3:

Item   | Count
Bread  | 4
Coke   | 2
Milk   | 4
Beer   | 3
Diaper | 4
Eggs   | 1

Pairs (2-itemsets):
(No need to generate candidates involving Coke or Eggs)

Itemset        | Count
{Bread,Milk}   | 3
{Bread,Beer}   | 2
{Bread,Diaper} | 3
{Milk,Beer}    | 2
{Milk,Diaper}  | 3
{Beer,Diaper}  | 3

Triplets (3-itemsets):

Itemset             | Count
{Bread,Milk,Diaper} | 2

If every subset is considered: C(6,1) + C(6,2) + C(6,3) = 6 + 15 + 20 = 41 candidates
With support-based pruning: 6 + 6 + 1 = 13 candidates
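The candidate counts can be verified with binomial coefficients:

```python
from math import comb

# Without pruning: all candidate itemsets up to size 3 over the 6 items.
print(comb(6, 1) + comb(6, 2) + comb(6, 3))  # 6 + 15 + 20 = 41
# With support-based pruning (per the slide): 6 items + 6 pairs + 1 triplet.
print(6 + 6 + 1)  # 13
```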
Apriori Algorithm

Method:
– Let k=1
– Generate frequent itemsets of length 1
– Repeat until no new frequent itemsets are identified:
  – Generate length-(k+1) candidate itemsets from length-k frequent itemsets
  – Prune candidate itemsets containing subsets of length k that are infrequent
  – Count the support of each candidate by scanning the DB
  – Eliminate candidates that are infrequent, leaving only those that are frequent
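The steps above can be sketched as a minimal Apriori implementation. This is an assumed illustration, not the textbook's code; function and variable names are mine:

```python
from itertools import combinations

def apriori(transactions, minsup_count):
    """Minimal Apriori sketch: candidate generation, subset pruning,
    support counting by database scan, per the method above."""
    items = sorted(set().union(*transactions))
    # Frequent 1-itemsets.
    Lk = [frozenset([i]) for i in items
          if sum(1 for t in transactions if i in t) >= minsup_count]
    frequent = list(Lk)
    k = 1
    while Lk:
        # Generate length-(k+1) candidates by joining frequent k-itemsets.
        candidates = {a | b for a in Lk for b in Lk if len(a | b) == k + 1}
        # Prune candidates that contain an infrequent k-subset.
        candidates = {c for c in candidates
                      if all(frozenset(s) in Lk for s in combinations(c, k))}
        # Count support by scanning the DB; eliminate infrequent candidates.
        Lk = [c for c in candidates
              if sum(1 for t in transactions if c <= t) >= minsup_count]
        frequent += Lk
        k += 1
    return frequent

db = [{"Bread", "Milk"},
      {"Bread", "Diaper", "Beer", "Eggs"},
      {"Milk", "Diaper", "Beer", "Coke"},
      {"Bread", "Milk", "Diaper", "Beer"},
      {"Bread", "Milk", "Diaper", "Coke"}]
print(sorted(map(sorted, apriori(db, 3))))
```

With a minimum support count of 3 on the slide's transactions, this returns the four frequent items and four frequent pairs; the single candidate triplet {Bread, Milk, Diaper} appears in only two transactions and is eliminated.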
Reducing Number of Comparisons

Candidate counting:
– Scan the database of N transactions to determine the support of each candidate itemset
– To reduce the number of comparisons, store the candidates in a hash structure with k buckets
– Instead of matching each transaction against every candidate, match it against the candidates contained in the hashed buckets
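A simplified stand-in for this idea (a dictionary of buckets rather than the textbook's hash tree; names and the bucket count are my assumptions):

```python
from itertools import combinations

# Bucket candidate 2-itemsets by a hash of their contents, so a transaction
# is only compared against candidates in the buckets its own pairs hash to.
def bucket_of(itemset, k=7):
    return hash(frozenset(itemset)) % k

candidates = [("Bread", "Milk"), ("Milk", "Diaper"), ("Beer", "Diaper")]
buckets = {}
for c in candidates:
    buckets.setdefault(bucket_of(c), []).append(frozenset(c))

def count_matches(transaction):
    """Number of candidate pairs contained in the transaction."""
    hits = 0
    for pair in combinations(sorted(transaction), 2):
        for cand in buckets.get(bucket_of(pair), []):  # only one bucket checked
            if cand == frozenset(pair):
                hits += 1
    return hits

print(count_matches({"Bread", "Milk", "Diaper", "Beer"}))  # 3
```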
Introduction to Hash Functions
A hash function h is a mapping from a set X to a range of integers [0..k-1]; each element of X is thus mapped into one of k buckets, and each bucket contains all the elements that h maps to it.
Example
The mod function is a good example of a hash function. For example, suppose we use h(x) = x mod 7. Then 0 to 6 are mapped to 0 to 6, but 7 is mapped to 0 and 8 to 1. Thus the range of mod 7 is [0..6]; these are its buckets.
Example

Suppose X is the set of integers 1..1000. Then h(x) = x mod 7 partitions X into 7 buckets:

Bucket 0: 7, 14, 21, 28, …
Bucket 1: 1, 8, 15, 22, …
…
Bucket 6: 6, 13, 20, 27, …
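The bucket construction can be sketched directly:

```python
# Partition the integers 1..1000 into 7 buckets using h(x) = x mod 7.
buckets = {b: [] for b in range(7)}
for x in range(1, 1001):
    buckets[x % 7].append(x)

print(buckets[0][:4])  # [7, 14, 21, 28]
print(buckets[1][:4])  # [1, 8, 15, 22]
```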
Factors Affecting Complexity

Choice of minimum support threshold
– Lowering the support threshold results in more frequent itemsets
– This may increase the number of candidates and the max length of frequent itemsets

Dimensionality (number of items) of the data set
– More space is needed to store the support count of each item
– If the number of frequent items also increases, both computation and I/O costs may also increase

Size of database
– Since Apriori makes multiple passes, the run time of the algorithm may increase with the number of transactions

Average transaction width
– Transaction width increases with denser data sets
– This may increase the max length of frequent itemsets and traversals of the hash tree (the number of subsets in a transaction increases with its width)
Compact Representation of Frequent Itemsets
Some itemsets are redundant because they have identical support to their supersets.

[Table: 15 transactions over 30 items A1–A10, B1–B10, C1–C10. Transactions 1–5 contain exactly {A1,…,A10}, transactions 6–10 contain exactly {B1,…,B10}, and transactions 11–15 contain exactly {C1,…,C10}.]

Number of frequent itemsets = 3 × Σ_{k=1}^{10} C(10, k) = 3 × (2^10 − 1)
Need a compact representation
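For the example above, the number of frequent itemsets can be computed directly:

```python
from math import comb

# Three disjoint groups of 10 always-co-occurring items, each contributing
# all 2^10 - 1 non-empty subsets as frequent itemsets.
n = 3 * sum(comb(10, k) for k in range(1, 11))
print(n)  # 3069
```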