STAT318 — Data Mining
Dr
University of Canterbury, Christchurch,
, University of Canterbury 2021
STAT318 — Data Mining ,1 / 28

Association Analysis
Association analysis is an unsupervised learning technique that finds interesting associations in large data sets.
It is often applied to large commercial databases, where it is useful for selective marketing.
If the data are binary-valued (asymmetric), the problem is called market basket analysis.
Let I = {i1, i2, …, ip} be a set of p items and let T = {t1, t2, …, tN} be a set of N transactions such that tj ⊂ I for j = 1, 2, …, N.

Terminology
Itemset: A collection of items from I. If an itemset X has k items, it is called a k-itemset.
Support Count: The number of transactions that have X as a subset, σ(X) = |{tj : X ⊂ tj and tj ∈ T}|.
Support: The fraction of transactions that have X as a subset, s(X) = σ(X) / N.
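
The two definitions above translate almost directly into code. The following is a minimal sketch, assuming each transaction is stored as a Python set; the toy transactions and the helper names `support_count` and `support` are hypothetical and chosen only for illustration.

```python
# Toy transactions (hypothetical, for illustration only).
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]


def support_count(itemset, transactions):
    """sigma(X): number of transactions that contain every item of X."""
    itemset = set(itemset)
    # `itemset <= t` is the subset test, so X equal to t also counts.
    return sum(1 for t in transactions if itemset <= t)


def support(itemset, transactions):
    """s(X) = sigma(X) / N, the fraction of transactions containing X."""
    return support_count(itemset, transactions) / len(transactions)


x = {"milk", "diapers"}
print(support_count(x, transactions))  # 3 of the 5 toy transactions
print(support(x, transactions))        # 3 / 5
```

Storing transactions as sets keeps the subset test to a single operator; for large databases a sparse binary matrix would probably be a better fit.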
[Figure: lattice of candidate association rules formed from the items a, b, c, d]
bcd → a, acd → b, abd → c, abc → d
cd → ab, bd → ac, ac → bd, ab → cd
d → abc, c → abd, b → acd, a → bcd
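
Every rule in the lattice above splits the items a, b, c, d into an antecedent and a consequent. As a rough sketch of how such candidates could be enumerated (the function name `candidate_rules` is hypothetical, and this is plain enumeration without any confidence-based pruning):

```python
from itertools import combinations


def candidate_rules(itemset):
    """List every rule A -> B where A and B are non-empty, disjoint,
    and together make up the whole itemset."""
    items = sorted(itemset)
    rules = []
    # Start with the largest antecedents (e.g. bcd -> a) and work down.
    for size in range(len(items) - 1, 0, -1):
        for antecedent in combinations(items, size):
            consequent = tuple(i for i in items if i not in antecedent)
            rules.append((antecedent, consequent))
    return rules


rules = candidate_rules({"a", "b", "c", "d"})
for antecedent, consequent in rules:
    print("".join(antecedent), "->", "".join(consequent))
print(len(rules))  # 2**4 - 2 = 14 candidate rules
```

A k-itemset yields 2^k - 2 candidate rules, which is why practical rule generation prunes the lattice rather than listing everything.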

Strong rules are not necessarily interesting
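Example: Consider the following association rule and contingency table:
{Tea} → {Coffee} (support = 15%, confidence = 75%).

             Tea   Not Tea   Total
Coffee       150       650     800
Not Coffee    50       150     200
Total        200       800    1000

Pr(Coffee) = 800/1000 = 0.80
Pr(Coffee | Tea) = 150/200 = 0.75
The rule is strong, yet Pr(Coffee | Tea) < Pr(Coffee): a transaction containing Tea is slightly less likely to contain Coffee than a transaction chosen at random.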
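
The figures in this example can be checked with a few lines of arithmetic. The sketch below assumes the four cell counts 150, 50, 650 and 150 are read from the contingency table as (Tea, Coffee), (Tea, Not Coffee), (Not Tea, Coffee) and (Not Tea, Not Coffee); the variable names are illustrative only.

```python
# Cell counts as read from the Tea/Coffee contingency table (assumed layout).
tea_coffee = 150
tea_no_coffee = 50
no_tea_coffee = 650
no_tea_no_coffee = 150

n = tea_coffee + tea_no_coffee + no_tea_coffee + no_tea_no_coffee
n_tea = tea_coffee + tea_no_coffee        # transactions containing Tea
n_coffee = tea_coffee + no_tea_coffee     # transactions containing Coffee

rule_support = tea_coffee / n             # s({Tea, Coffee})
rule_confidence = tea_coffee / n_tea      # c({Tea} -> {Coffee}) = Pr(Coffee | Tea)
p_coffee = n_coffee / n                   # Pr(Coffee), ignoring Tea entirely

print(f"support    = {rule_support:.0%}")
print(f"confidence = {rule_confidence:.0%}")
print(f"Pr(Coffee) = {p_coffee:.0%}")

# The rule looks strong, but its confidence is below the baseline Pr(Coffee).
confidence_below_baseline = rule_confidence < p_coffee
print(confidence_below_baseline)
```

Comparing the confidence with the baseline Pr(Coffee) is what exposes the problem: confidence alone ignores how common the consequent already is.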

Objective Interestingness Measures
Lift is an interestingness measure based on correlation.
Lift: Given two itemsets X and Y, the lift measure is defined as
lift(X, Y) = s(X ∪ Y) / (s(X) s(Y)) = c(X → Y) / s(Y).
Interpretation:
lift(X, Y) = 1: X and Y are statistically independent;
lift(X, Y) > 1: X and Y are positively correlated;
lift(X, Y) < 1: X and Y are negatively correlated.
Example: Find lift(Tea, Coffee).

Cosine: Given two itemsets X and Y, the cosine measure is defined by
cosine(X, Y) = s(X ∪ Y) / √(s(X) s(Y)) = √(c(X → Y) c(Y → X)).
A cosine value close to one indicates that most of the transactions containing X also contain Y, and vice versa.
A value close to zero means most transactions containing X do not contain Y, and vice versa.

Symmetric Measure: a measure M is symmetric iff M(X → Y) = M(Y → X).
Asymmetric Measure: a measure that is not symmetric.
Null Transaction: a transaction that does not include any of the itemsets being examined.
Null Invariant: a measure is null invariant if its value does not depend on null transactions.

Example: Is lift a symmetric measure? Is it null invariant?
Example: Is cosine a symmetric measure? Is it null invariant?

Summary
Association rule mining consists of finding frequent itemsets, from which strong association rules are formed.
The Apriori algorithm is a seminal method for association analysis.
Association rules are among data mining's biggest successes (Hastie, Tibshirani, Friedman).
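
As a numerical check on the lift and cosine measures defined earlier, the sketch below evaluates both on the Tea/Coffee counts (150 transactions with both items, 200 with Tea, 800 with Coffee, 1000 in total) and then repeats the calculation after adding null transactions. The function names and the choice of 1000 extra null transactions are assumptions made purely for illustration.

```python
from math import sqrt


def lift(n_xy, n_x, n_y, n):
    """lift(X, Y) = s(X u Y) / (s(X) s(Y)), computed from raw counts."""
    return (n_xy / n) / ((n_x / n) * (n_y / n))


def cosine(n_xy, n_x, n_y, n):
    """cosine(X, Y) = s(X u Y) / sqrt(s(X) s(Y)), computed from raw counts."""
    return (n_xy / n) / sqrt((n_x / n) * (n_y / n))


# Tea/Coffee counts from the contingency table example.
n_both, n_tea, n_coffee, n_total = 150, 200, 800, 1000

lift_tc = lift(n_both, n_tea, n_coffee, n_total)       # about 0.9375
cosine_tc = cosine(n_both, n_tea, n_coffee, n_total)   # about 0.375

# Swapping the roles of X and Y leaves both formulas unchanged.
lift_ct = lift(n_both, n_coffee, n_tea, n_total)
cosine_ct = cosine(n_both, n_coffee, n_tea, n_total)

# Add 1000 hypothetical null transactions (neither Tea nor Coffee):
# only the total N grows, the other counts stay the same.
n_with_nulls = n_total + 1000
lift_nulls = lift(n_both, n_tea, n_coffee, n_with_nulls)      # about 1.875
cosine_nulls = cosine(n_both, n_tea, n_coffee, n_with_nulls)  # still about 0.375

print(lift_tc, lift_nulls)
print(cosine_tc, cosine_nulls)
```

On these counts lift falls just below 1, and adding null transactions pushes it above 1 while cosine stays the same, because N cancels out of the cosine formula but not out of lift.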