II. Association Rule Mining: the Apriori Algorithm (Part 3)


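How to read the summary output for rules:
Part 1: the number of rules; there are 463 rules in total.
Part 2: the distribution of rule lengths: 150 rules with len=2, 297 with len=3, and 16 with len=4.
Part 3: the five-number summary and the mean of the rule lengths.
Part 4: the five-number summaries and the means of support, confidence, and lift.
Part 5: information about the mining run.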
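As a small consistency check, parts 1 to 3 can be rebuilt from the length counts quoted above. This is only a base-R sketch: the vector `rule_lengths` is a hypothetical stand-in built from those counts, not the actual rules object.

```r
# Rule-length counts quoted in the summary:
# 150 rules of length 2, 297 of length 3, 16 of length 4.
rule_lengths <- rep(c(2, 3, 4), times = c(150, 297, 16))

n_rules        <- length(rule_lengths)    # part 1: total number of rules
length_table   <- table(rule_lengths)     # part 2: distribution of rule lengths
length_summary <- summary(rule_lengths)   # part 3: min, quartiles, median, mean, max

print(n_rules)
print(length_table)
print(length_summary)
```

The three counts add up to the 463 rules reported in part 1, and most rules have three items.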
(3) Evaluating the rules
Rules can be divided into three broad categories:
So how do we find the useful rules?
> # Sort the rules by lift
> orderbylift_rules <- sort(rules, by = 'lift')
> inspect(orderbylift_rules[1:5])
    lhs                                             rhs                  support     confidence lift     count
[1] {herbs}                                      => {root vegetables}    0.007015760 0.4312500  3.956477 69
[2] {berries}                                    => {whipped/sour cream} 0.009049314 0.2721713  3.796886 89
[3] {tropical fruit,other vegetables,whole milk} => {root vegetables}    0.007015760 0.4107143  3.768074 69
[4] {beef,other vegetables}                      => {root vegetables}    0.007930859 0.4020619  3.688692 78
[5] {tropical fruit,other vegetables}            => {pip fruit}          0.009456024 0.2634561  3.482649 93
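To see how the columns relate, the figures of rule [1] can be taken apart with plain arithmetic. The sketch below is base R only; it assumes the standard definitions confidence = support / support(lhs) and lift = confidence / support(rhs), and all variable names are illustrative.

```r
# Values copied from rule [1] {herbs} => {root vegetables} in the output above.
support    <- 0.007015760
confidence <- 0.4312500
lift       <- 3.956477
count      <- 69

n_transactions <- round(count / support)      # total transactions implied by count and support
lhs_count      <- round(count / confidence)   # transactions that contain {herbs}
rhs_support    <- confidence / lift           # implied support of {root vegetables}

# Equivalent form: lift = support / (support(lhs) * support(rhs))
lhs_support <- lhs_count / n_transactions
lift_check  <- support / (lhs_support * rhs_support)

print(c(n_transactions = n_transactions, lhs_count = lhs_count))
print(c(rhs_support = rhs_support, lift_check = lift_check))
```

A lift of about 3.96 means that baskets containing herbs include root vegetables roughly four times as often as baskets in general.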
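items %in% c("A", "B") matches a rule when at least one item in the union of its lhs and rhs itemsets is in c("A", "B").
To search only the lhs or only the rhs, replace items with lhs or rhs, for example lhs %in% c("").
%in% is exact matching.
%pin% is partial matching: an item matches if its name contains the pattern, i.e. item like '%A%' or item like '%B%'.
%ain% is complete matching: the itemset must contain 'A' and must contain 'B'.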
Conditions can also be combined with the logical operators (&, |, !) to add filters on quality measures such as lift.
> # Filter the association rules by condition
> subsetrules <- subset(rules, subset = rhs %in% "whole milk" & lift >= 2)
> inspect(sort(subsetrules, by = "support")[1:5])
    lhs                             rhs          support    confidence lift     count
[1] {other vegetables,yogurt}    => {whole milk} 0.02226741 0.5128806  2.007235 219
[2] {tropical fruit,yogurt}      => {whole milk} 0.01514997 0.5173611  2.024770 149
[3] {root vegetables,yogurt}     => {whole milk} 0.01453991 0.5629921  2.203354 143
[4] {pip fruit,other vegetables} => {whole milk} 0.01352313 0.5175097  2.025351 133
[5] {root vegetables,rolls/buns} => {whole milk} 0.01270971 0.5230126  2.046888 125
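The difference between the three matching operators is easy to mix up, so here is a base-R imitation of their semantics as the text describes them. It is not the arules implementation: the helpers `match_in`, `match_pin`, and `match_ain` are hypothetical names, they work on a plain character vector of item names, and the partial match is done as a fixed substring search.

```r
# Items of one rule (lhs and rhs combined); the item names come from the output above.
rule_items <- c("root vegetables", "yogurt", "whole milk")

# Like %in%: at least one item equals one of the given names exactly.
match_in <- function(items, names) any(items %in% names)

# Like %pin%: at least one item contains one of the given patterns as a substring.
match_pin <- function(items, patterns) {
  any(vapply(patterns,
             function(p) any(grepl(p, items, fixed = TRUE)),
             logical(1)))
}

# Like %ain%: every given name is present among the items.
match_ain <- function(items, names) all(names %in% items)

print(match_in(rule_items, c("whole milk", "herbs")))    # exact hit on "whole milk"
print(match_in(rule_items, "milk"))                      # no item is exactly "milk"
print(match_pin(rule_items, "milk"))                     # "whole milk" contains "milk"
print(match_ain(rule_items, c("yogurt", "whole milk")))  # both names are present
print(match_ain(rule_items, c("yogurt", "herbs")))       # "herbs" is missing
```

The exact match rejects "milk" while the partial match accepts it, and the complete match only succeeds when every listed item is present.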
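Two other measures are commonly used here:
coverage: the support of the rule's left-hand side, which measures how often the rule can be applied.
chiSquared: the chi-squared statistic, used to test whether the lhs and rhs of a rule are independent. At α = 0.05, the critical value of the chi-squared distribution with 1 degree of freedom (a 2×2 contingency table) is 3.84. A higher chi-squared value indicates that lhs and rhs are not independent, so the larger the value, the more credible the association rule.
Many other measures are available; see the help for interestMeasure for details.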
> # Look at other quality measures
> qualityMeasures <- interestMeasure(subsetrules, measure = c("coverage", "chiSquared"), transactions = Groceries)
> summary(qualityMeasures)
    coverage          chiSquared
 Min.   :0.009964   Min.   : 43.35
 1st Qu.:0.011845   1st Qu.: 60.16
 Median :0.014642   Median : 73.63
 Mean   :0.016264   Mean   : 77.80
 3rd Qu.:0.017743   3rd Qu.: 91.90
 Max.   :0.043416   Max.   :155.43
> # Add the coverage and chiSquared measures to quality(subsetrules)
> quality(subsetrules) <- cbind(quality(subsetrules), qualityMeasures)
> inspect(head(sort(subsetrules, by = "chiSquared")))
    lhs                                 rhs          support     confidence lift     count coverage
[1] {other vegetables,yogurt}        => {whole milk} 0.022267412 0.5128806  2.007235 219   0.04341637
[2] {root vegetables,yogurt}         => {whole milk} 0.014539908 0.5629921  2.203354 143   0.02582613
[3] {butter,yogurt}                  => {whole milk} 0.009354347 0.6388889  2.500387  92   0.01464159
[4] {tropical fruit,root vegetables} => {whole milk} 0.011997966 0.5700483  2.230969 118   0.02104728
[5] {tropical fruit,yogurt}          => {whole milk} 0.015149975 0.5173611  2.024770 149   0.02928317
[6] {other vegetables,butter}        => {whole milk} 0.011489578 0.5736041  2.244885 113   0.02003050
    chiSquared
[1] 155.4279
[2] 129.5825
[3] 112.9113
[4] 109.9678
[5] 106.9339
[6] 106.9239
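The coverage and chiSquared values of rule [1] can be reproduced by hand from the other columns. The following is a base-R sketch: it assumes coverage = support / confidence and a Pearson chi-squared test on the 2×2 table without continuity correction (an assumption that agrees with the reported value). All variable names are illustrative.

```r
# Values copied from rule [1] {other vegetables,yogurt} => {whole milk} above.
support    <- 0.022267412
confidence <- 0.5128806
lift       <- 2.007235
count      <- 219

n         <- round(count / support)          # total transactions implied by the output
coverage  <- support / confidence            # support of the lhs
lhs_count <- round(coverage * n)             # transactions containing the lhs
rhs_count <- round(confidence / lift * n)    # transactions containing the rhs

# 2x2 contingency table: rows = lhs present/absent, columns = rhs present/absent.
tab <- matrix(
  c(count,             lhs_count - count,
    rhs_count - count, n - lhs_count - rhs_count + count),
  nrow = 2, byrow = TRUE,
  dimnames = list(lhs = c("yes", "no"), rhs = c("yes", "no"))
)

# Pearson chi-squared statistic, no continuity correction (assumed).
chi_sq <- unname(chisq.test(tab, correct = FALSE)$statistic)

print(tab)
print(c(coverage = coverage, chi_sq = chi_sq))
```

The statistic is far above the critical value of 3.84, so independence between the lhs and rhs of this rule is rejected at α = 0.05.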