Notice
Recent Posts
Recent Comments
Link
일 | 월 | 화 | 수 | 목 | 금 | 토 |
---|---|---|---|---|---|---|
1 | 2 | 3 | 4 | 5 | ||
6 | 7 | 8 | 9 | 10 | 11 | 12 |
13 | 14 | 15 | 16 | 17 | 18 | 19 |
20 | 21 | 22 | 23 | 24 | 25 | 26 |
27 | 28 | 29 | 30 | 31 |
Tags
- 습관
- 완결 웹툰
- 영화
- pandas
- geopandas
- 네이버 웹툰
- 사랑
- 네이버
- 아주 작은 습관의 힘
- 제주도
- 이범선
- 진심
- 제임스 클리어
- 넷플릭스
- 만화 영화
- 서귀포
- 가족
- 이기적 출판사
- 산책
- QGIS
- 빅데이터 분석기사 필기
- python
- 커피
- 웹툰
- 완결
- 액션
- 애니메이션
- 빅데이터 분석기사
- 로맨스
- 네이버 완결 웹툰
Archives
- Today
- Total
JuJuKwakKwak
2022.04.03 - 이러닝 - Code 본문
Ch30¶
In [1]:
import pandas as pd
from mlxtend.frequent_patterns import apriori
from mlxtend.frequent_patterns import association_rules
In [2]:
df = pd.read_csv('data/association_rules_mart.csv')
df.head(3)
Out[2]:
Date | ID | Item | |
---|---|---|---|
0 | 2014-01-01 | 1249in804 | citrus fruit |
1 | 2014-01-01 | 1249in804 | coffee |
2 | 2014-01-01 | 1381ht273 | curd |
In [3]:
df_cus = pd.read_csv('data/association_rules_customers.csv')
df_cus.head(2)
Out[3]:
ID | Gender | Age | |
---|---|---|---|
0 | 1000ol738 | F | 51 |
1 | 1001sf480 | M | 55 |
In [4]:
df_prod = pd.read_csv('data/association_rules_products.csv')
df_prod.head(2)
Out[4]:
product | price | |
---|---|---|
0 | citrus fruit | 3100 |
1 | coffee | 1500 |
1번¶
2014년에는 매출이 발생했으나 2015년에는 매출이 발생하지 않은 품목은 총 몇 개인가? (정답 예시 : 1)
In [5]:
df_q1 = df.copy()
df_q1.head(2)
Out[5]:
Date | ID | Item | |
---|---|---|---|
0 | 2014-01-01 | 1249in804 | citrus fruit |
1 | 2014-01-01 | 1249in804 | coffee |
In [6]:
df_q1['Date'] = pd.to_datetime(df_q1['Date'])
df_q1['year'] = df_q1['Date'].dt.year
df_q1['count'] = 1
df_q1.head(2)
Out[6]:
Date | ID | Item | year | count | |
---|---|---|---|---|---|
0 | 2014-01-01 | 1249in804 | citrus fruit | 2014 | 1 |
1 | 2014-01-01 | 1249in804 | coffee | 2014 | 1 |
In [7]:
df_q1_cnt = df_q1.groupby(['year', 'Item'])['count'].sum().reset_index()
df_q1_cnt.head()
Out[7]:
year | Item | count | |
---|---|---|---|
0 | 2014 | Instant food products | 37 |
1 | 2014 | UHT-milk | 160 |
2 | 2014 | abrasive cleaner | 12 |
3 | 2014 | artif. sweetener | 13 |
4 | 2014 | baby cosmetics | 1 |
In [9]:
df_q1_cnt_pivot = df_q1_cnt.pivot(index='Item', columns='year', values='count').reset_index()
df_q1_cnt_pivot.head()
Out[9]:
year | Item | 2014 | 2015 | 2016 |
---|---|---|---|---|
0 | Instant food products | 37.0 | 23.0 | 1.0 |
1 | UHT-milk | 160.0 | 169.0 | 5.0 |
2 | abrasive cleaner | 12.0 | 10.0 | 1.0 |
3 | artif. sweetener | 13.0 | 17.0 | 2.0 |
4 | baby cosmetics | 1.0 | 2.0 | NaN |
In [10]:
df_q1_cnt_pivot = df_q1_cnt_pivot.fillna(0)
df_q1_cnt_pivot.head()
Out[10]:
year | Item | 2014 | 2015 | 2016 |
---|---|---|---|---|
0 | Instant food products | 37.0 | 23.0 | 1.0 |
1 | UHT-milk | 160.0 | 169.0 | 5.0 |
2 | abrasive cleaner | 12.0 | 10.0 | 1.0 |
3 | artif. sweetener | 13.0 | 17.0 | 2.0 |
4 | baby cosmetics | 1.0 | 2.0 | 0.0 |
In [11]:
df_q1_cnt_pivot.loc[(df_q1_cnt_pivot[2014] > 0) & (df_q1_cnt_pivot[2015] == 0), ]
# 정답 : 3
Out[11]:
year | Item | 2014 | 2015 | 2016 |
---|---|---|---|---|
79 | kitchen utensil | 1.0 | 0.0 | 0.0 |
114 | preservation products | 1.0 | 0.0 | 0.0 |
155 | toilet cleaner | 5.0 | 0.0 | 0.0 |
2번¶
전해 12월의 매출은 차년도 매출과 꽤 연관이 깊다고 한다.
2014년도 12월 매출 상위 3개 품목을 확인하고 해당 품목의 2015년 매출 비중을
반올림하여 소수점 셋째 자리까지 기술하시오. (정답 예시 : 0.123)
In [12]:
df_q2 = df.copy()
df_q2.head(2)
Out[12]:
Date | ID | Item | |
---|---|---|---|
0 | 2014-01-01 | 1249in804 | citrus fruit |
1 | 2014-01-01 | 1249in804 | coffee |
In [13]:
df_q2['Date'] = pd.to_datetime(df_q2['Date'])
df_q2['year'] = df_q2['Date'].dt.year
df_q2['month'] = df_q2['Date'].dt.month
df_q2.head(2)
Out[13]:
Date | ID | Item | year | month | |
---|---|---|---|---|---|
0 | 2014-01-01 | 1249in804 | citrus fruit | 2014 | 1 |
1 | 2014-01-01 | 1249in804 | coffee | 2014 | 1 |
In [14]:
df_prod.head(2)
Out[14]:
product | price | |
---|---|---|
0 | citrus fruit | 3100 |
1 | coffee | 1500 |
In [15]:
df_q2_sub = df_q2.loc[(df_q2['year'] == 2014) & (df_q2['month'] == 12), ]
df_q2_sub = df_q2_sub.merge(df_prod, left_on='Item', right_on='product', how='left')
df_q2_sub.head(2)
Out[15]:
Date | ID | Item | year | month | product | price | |
---|---|---|---|---|---|---|---|
0 | 2014-12-01 | 1369fc211 | sugar | 2014 | 12 | sugar | 5000 |
1 | 2014-12-01 | 1369fc211 | flour | 2014 | 12 | flour | 3800 |
In [16]:
df_q2_2014_agg = df_q2_sub.groupby('Item')['price'].sum().reset_index()
df_q2_2014_agg.head()
Out[16]:
Item | price | |
---|---|---|
0 | Instant food products | 2700 |
1 | UHT-milk | 15600 |
2 | abrasive cleaner | 4600 |
3 | artif. sweetener | 1100 |
4 | baking powder | 5100 |
In [17]:
df_q2_2014_agg = df_q2_2014_agg.sort_values('price', ascending=False)
df_q2_2014_agg.head()
Out[17]:
Item | price | |
---|---|---|
100 | rolls/buns | 237800 |
136 | yogurt | 220000 |
82 | other vegetables | 120700 |
9 | bottled beer | 101500 |
12 | brown bread | 101400 |
In [19]:
df_q2_2015 = df_q2.loc[df_q2['year'] == 2015, ]
df_q2_2015 = df_q2_2015.merge(df_prod, left_on='Item', right_on='product', how='left')
df_q2_2015.head(2)
Out[19]:
Date | ID | Item | year | month | product | price | |
---|---|---|---|---|---|---|---|
0 | 2015-01-01 | 1220gy277 | canned beer | 2015 | 1 | canned beer | 2700 |
1 | 2015-01-01 | 1220gy277 | margarine | 2015 | 1 | margarine | 500 |
In [20]:
df_q2_2015_agg = df_q2_2015.groupby('Item')['price'].sum().reset_index()
df_q2_2015_agg.head()
Out[20]:
Item | price | |
---|---|---|
0 | Instant food products | 62100 |
1 | UHT-milk | 202800 |
2 | abrasive cleaner | 46000 |
3 | artif. sweetener | 18700 |
4 | baby cosmetics | 6400 |
In [21]:
df_q2_2014_agg['Item'][:3]
Out[21]:
100 rolls/buns 136 yogurt 82 other vegetables Name: Item, dtype: object
In [23]:
df_q2_2015_agg['top3'] = df_q2_2015_agg['Item'].isin(df_q2_2014_agg['Item'][:3]) + 0
df_q2_2015_agg.head()
Out[23]:
Item | price | top3 | |
---|---|---|---|
0 | Instant food products | 62100 | 0 |
1 | UHT-milk | 202800 | 0 |
2 | abrasive cleaner | 46000 | 0 |
3 | artif. sweetener | 18700 | 0 |
4 | baby cosmetics | 6400 | 0 |
In [24]:
df_q2_2015_agg2 = df_q2_2015_agg.groupby('top3')['price'].sum().reset_index()
df_q2_2015_agg2
Out[24]:
top3 | price | |
---|---|---|
0 | 0 | 36612500 |
1 | 1 | 9090600 |
In [25]:
round(df_q2_2015_agg2.iloc[1, 1] / df_q2_2015_agg2['price'].sum(), 3)
# 정답 : 0.199
Out[25]:
0.199
3번¶
남성과 여성의 상품 구매 성향이 다르다는 가정을 확인하기 위해서 2015년 데이터를 기반으로
연관규칙 분석을 실시하고 지지도가 0.05 이상인 규칙 중 향상도가 가장 높은 조건의 결과절(consequent) 품목을
남녀 차례대로 기술하시오. (정답 예시 : water, sugar)
In [26]:
df_q3 = df.copy()
df_q3.head(2)
Out[26]:
Date | ID | Item | |
---|---|---|---|
0 | 2014-01-01 | 1249in804 | citrus fruit |
1 | 2014-01-01 | 1249in804 | coffee |
In [27]:
df_q3_cnt = df_q3['ID'].value_counts().reset_index()
df_q3_cnt.head()
Out[27]:
index | ID | |
---|---|---|
0 | 3180so909 | 36 |
1 | 3915oj968 | 34 |
2 | 3050yl021 | 34 |
3 | 2051sz135 | 34 |
4 | 3737jm860 | 33 |
In [28]:
df_q3_cnt.loc[df_q3_cnt['ID'] == 1, ]
Out[28]:
index | ID |
---|
In [29]:
df_q3_cnt['ID'].min()
Out[29]:
2
In [31]:
df_q3['Date'] = pd.to_datetime(df_q3['Date'])
df_q3['year'] = df_q3['Date'].dt.year
df_q3.head(2)
Out[31]:
Date | ID | Item | year | |
---|---|---|---|---|
0 | 2014-01-01 | 1249in804 | citrus fruit | 2014 |
1 | 2014-01-01 | 1249in804 | coffee | 2014 |
In [32]:
df_q3_sub = df_q3.loc[df_q3['year'] == 2015, ]
df_q3_sub.head(2)
Out[32]:
Date | ID | Item | year | |
---|---|---|---|---|
18277 | 2015-01-01 | 1220gy277 | canned beer | 2015 |
18278 | 2015-01-01 | 1220gy277 | margarine | 2015 |
In [33]:
df_cus.head(2)
Out[33]:
ID | Gender | Age | |
---|---|---|---|
0 | 1000ol738 | F | 51 |
1 | 1001sf480 | M | 55 |
In [35]:
df_q3_join = df_q3_sub.merge(df_cus, left_on='ID', right_on='ID', how='left')
df_q3_join.head(2)
Out[35]:
Date | ID | Item | year | Gender | Age | |
---|---|---|---|---|---|---|
0 | 2015-01-01 | 1220gy277 | canned beer | 2015 | M | 47 |
1 | 2015-01-01 | 1220gy277 | margarine | 2015 | M | 47 |
In [36]:
df_m = df_q3_join.loc[df_q3_join['Gender'] == 'M']
df_m['purchase'] = True
C:\Users\KJW\AppData\Local\Temp/ipykernel_5040/1637540834.py:2: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy df_m['purchase'] = True
In [37]:
df_m_pivot = pd.pivot_table(data=df_m,
index='ID', columns='Item', values='purchase',
aggfunc=max,
fill_value=False)
df_m_pivot.head(2)
Out[37]:
Item | Instant food products | UHT-milk | abrasive cleaner | artif. sweetener | baby cosmetics | bags | baking powder | bathroom cleaner | beef | berries | beverages | bottled beer | bottled water | brandy | brown bread | butter | butter milk | cake bar | candles | candy | canned beer | canned fish | canned fruit | canned vegetables | cat food | cereals | chewing gum | chicken | chocolate | chocolate marshmallow | citrus fruit | cleaner | cling film/bags | cocoa drinks | coffee | condensed milk | cooking chocolate | cookware | cream | cream cheese | curd | curd cheese | dental care | dessert | detergent | dish cleaner | dishes | dog food | domestic eggs | female sanitary products | finished products | fish | flour | flower (seeds) | flower soil/fertilizer | frankfurter | frozen dessert | frozen fish | frozen fruits | frozen meals | ... | pasta | pastry | pet care | photo/film | pickled vegetables | pip fruit | popcorn | pork | pot plants | potato products | processed cheese | prosecco | pudding powder | ready soups | red/blush wine | rice | roll products | rolls/buns | root vegetables | rubbing alcohol | rum | salt | salty snack | sauces | sausage | seasonal products | semi-finished bread | shopping bags | skin care | sliced cheese | snack products | soap | soda | soft cheese | softener | soups | sparkling wine | specialty bar | specialty cheese | specialty chocolate | specialty fat | specialty vegetables | spices | spread cheese | sugar | sweet spreads | syrup | tea | tidbits | tropical fruit | turkey | vinegar | waffles | whipped/sour cream | whisky | white bread | white wine | whole milk | yogurt | zwieback |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
ID | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
1001sf480 | False | False | False | False | False | False | False | False | True | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | True | False | False | False | False | False | False | False | False | False | False | False | False | False | False | True | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | True | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | True | False | True | False | False | False | False |
1002nj599 | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | True | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | True | False | False | False | False | True | False | False | False | False | True | False | False | False | False | False | False | False | False | False | False |
2 rows × 160 columns
In [38]:
item_sets = apriori(df=df_m_pivot, min_support=0.005, use_colnames=True)
item_sets.head(2)
Out[38]:
support | itemsets | |
---|---|---|
0 | 0.005917 | (Instant food products) |
1 | 0.054734 | (UHT-milk) |
In [39]:
rules = association_rules(item_sets, metric='confidence', min_threshold=0.005)
rules.head(2)
Out[39]:
antecedents | consequents | antecedent support | consequent support | support | confidence | lift | leverage | conviction | |
---|---|---|---|---|---|---|---|---|---|
0 | (UHT-milk) | (bottled beer) | 0.054734 | 0.113905 | 0.006657 | 0.121622 | 1.067743 | 0.000422 | 1.008785 |
1 | (bottled beer) | (UHT-milk) | 0.113905 | 0.054734 | 0.006657 | 0.058442 | 1.067743 | 0.000422 | 1.003938 |
In [40]:
rules_sub = rules.loc[rules['support'] > 0.05, ]
rules_sub = rules_sub.sort_values('lift', ascending=False)
rules_sub.head(2)
Out[40]:
antecedents | consequents | antecedent support | consequent support | support | confidence | lift | leverage | conviction | |
---|---|---|---|---|---|---|---|---|---|
320 | (bottled beer) | (whole milk) | 0.113905 | 0.347633 | 0.055473 | 0.487013 | 1.400939 | 0.015876 | 1.271702 |
321 | (whole milk) | (bottled beer) | 0.347633 | 0.113905 | 0.055473 | 0.159574 | 1.400939 | 0.015876 | 1.054340 |
In [41]:
df_f = df_q3_join.loc[df_q3_join['Gender'] == 'F']
df_f['purchase'] = True
C:\Users\KJW\AppData\Local\Temp/ipykernel_5040/3691065010.py:2: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy df_f['purchase'] = True
In [42]:
df_f_pivot = pd.pivot_table(data=df_f,
index='ID', columns='Item', values='purchase',
aggfunc=max,
fill_value=False)
df_f_pivot.head(2)
Out[42]:
Item | Instant food products | UHT-milk | abrasive cleaner | artif. sweetener | baby cosmetics | bags | baking powder | bathroom cleaner | beef | berries | beverages | bottled beer | bottled water | brandy | brown bread | butter | butter milk | cake bar | candles | candy | canned beer | canned fish | canned fruit | canned vegetables | cat food | cereals | chewing gum | chicken | chocolate | chocolate marshmallow | citrus fruit | cleaner | cling film/bags | cocoa drinks | coffee | condensed milk | cooking chocolate | cookware | cream | cream cheese | curd | curd cheese | decalcifier | dental care | dessert | detergent | dish cleaner | dishes | dog food | domestic eggs | female sanitary products | finished products | fish | flour | flower (seeds) | flower soil/fertilizer | frankfurter | frozen chicken | frozen dessert | frozen fish | ... | pastry | pet care | photo/film | pickled vegetables | pip fruit | popcorn | pork | pot plants | potato products | processed cheese | prosecco | pudding powder | ready soups | red/blush wine | rice | roll products | rolls/buns | root vegetables | rubbing alcohol | rum | salad dressing | salt | salty snack | sauces | sausage | seasonal products | semi-finished bread | shopping bags | skin care | sliced cheese | snack products | soap | soda | soft cheese | softener | soups | sparkling wine | specialty bar | specialty cheese | specialty chocolate | specialty fat | specialty vegetables | spices | spread cheese | sugar | sweet spreads | syrup | tea | tidbits | tropical fruit | turkey | vinegar | waffles | whipped/sour cream | whisky | white bread | white wine | whole milk | yogurt | zwieback |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
ID | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
1000ol738 | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | True | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | True | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | True | False | True | False | False | False | False | False | True | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | True | True | False |
1004jh583 | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | True | False | False |
2 rows × 164 columns
In [43]:
item_sets = apriori(df=df_f_pivot, min_support=0.005, use_colnames=True)
rules = association_rules(item_sets, metric='confidence', min_threshold=0.005)
rules_sub = rules.loc[rules['support'] > 0.05, ]
rules_sub = rules_sub.sort_values('lift', ascending=False)
rules_sub.head(2)
# 정답 : whole milk, soda
Out[43]:
antecedents | consequents | antecedent support | consequent support | support | confidence | lift | leverage | conviction | |
---|---|---|---|---|---|---|---|---|---|
1709 | (sausage) | (soda) | 0.191283 | 0.197094 | 0.051332 | 0.268354 | 1.361553 | 0.013631 | 1.097397 |
1708 | (soda) | (sausage) | 0.197094 | 0.191283 | 0.051332 | 0.260442 | 1.361553 | 0.013631 | 1.093514 |
'Data Science > 국비지원' 카테고리의 다른 글
Day72 : 세미프로젝트 발표 (0) | 2022.04.04 |
---|---|
Day72 - 이러닝 - Code(1) (0) | 2022.04.04 |
Day72 - 이러닝 : 스택과 큐 (0) | 2022.04.04 |
2022.04.03 - 이러닝 (0) | 2022.04.04 |
Day71 : mecab 설치 (0) | 2022.04.01 |