Analyzing Ppomppu (뽐뿌) Deal Data

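To be a smart consumer, I have always been keenly interested in finding the best prices and the best value for money. That interest has improved my spending habits and given me some know-how of my own. As a data scientist, I want to put that know-how to work by analyzing deal data from Ppomppu (뽐뿌), a leading Korean deal-sharing community. About 25,000 deal posts are shared on Ppomppu each year, and many consumers use it to exchange information. Many deals are posted, but the useful ones are singled out by consumers through views, recommendations, and comments and become popular/hot posts (hereafter, popular posts). Popular status can therefore be read as a sign that most consumers found the deal attractive. This analysis focuses on the differences between general and popular posts, with the aim of drawing out insights that help consumers spend more wisely.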

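Terminology

Deal post: any deal-sharing post, whether general or popular
General post: a post that shares deal information
Popular post: a general post that drew enough views, recommendations, and comments to be promoted to popular status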

1. Purpose

This analysis aims to pin down the differences between general and popular posts on Ppomppu, and from those differences to learn which products and information consumers respond to. The goal is to help consumers make smarter purchase decisions.

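2. Analysis steps

Popular post characteristics
- Compare popular posts with general posts to identify what sets them apart.
Category analysis
- Identify which categories have the most popular posts.
- Count the posts registered in each category.
Keyword analysis
- Extract only the nouns from post text with the Kiwi morphological analyzer (the kiwipiepy package).
- Count the 20 most frequent nouns and compare them between general and popular posts.
Price analysis
- Compare product prices in general and popular posts using descriptive statistics and box plots.
- Compare the shares of free and paid shipping with a stacked bar chart and a chi-square test.
Sales channel analysis
- Identify the 10 sales channels mentioned most often in general and popular posts.
- Check the share and number of general and popular posts for those channels; channels outside the top 10 are grouped as ETC.
Time series analysis
- Analyze how general and popular posts are registered by year, month, day of month, day of week, and hour.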

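3. Conclusions

Consumers view, recommend, and comment on popular posts, the ones they find useful, far more than on general posts. The recommendation count in particular is an important indicator of how satisfied consumers are with a popular post.
On Ppomppu, the gift voucher and food/health categories have a large share of popular posts, whereas cosmetics, childcare, hiking/camping, and books have relatively few. Users looking for products in those categories will probably do better with other sources.
Ppomppu users are especially interested in COVID-related products, food, and gift vouchers, and relatively less interested in expensive home appliances.
Most Ppomppu users appear to prefer inexpensive products priced at 100,000 won or less, and most deal posts offer free shipping. Shipping cost, however, has little bearing on whether a post becomes popular.
Gmarket/Auction (지마켓/옥션) is the channel whose deals are shared most actively, and it draws a lot of consumer interest. Interpark (인터파크) and Lotte (롯데) have a relatively high share of popular posts. 11st (11번가) has many deal posts, but only a small share of them become popular. Coupang (쿠팡), Naver (네이버), Himart (하이마트), and Kakao (카카오) are likely to offer useful deals in their own specialized product or service categories.
In 2022 the total number of deal posts changed little, but the number of popular posts dropped noticeably. The economic downturn and the prolonged COVID pandemic probably contributed.
Deal posts cluster in May and November because of Gmarket/Auction's 스마일데이 (Smile Day) and 11번가's 그랜드십일절. Posts also surge on the 1st and 11th of the month because of TMON (티몬) promotions and 11번가's 십일절, and popular posts increase along with them. On a per-day basis, weekdays see about twice as many deal posts as weekend days. Deal posts are registered most actively in the 10 and 11 a.m. hours and least in the 4 and 5 a.m. hours.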

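4. Strategic suggestions for smarter spending

Watch the recommendation count

The recommendation count reflects consumer satisfaction and the value of the product. A post with many recommendations is likely to contain a worthwhile deal.

Search actively

Gift voucher and food/health deals have many popular posts, so make full use of Ppomppu for them. For cosmetics, childcare, hiking/camping, and books, look for deals on other deal platforms instead.

Choose a price range

Most users prefer inexpensive products priced at 100,000 won or less, so it is worth focusing on that range. Looking for free-shipping products first is also recommended.

Use event and promotion periods

Special events and discounts run in May and November and on the 1st and 11th of the month. Planning purchases around these dates gives the best chance of the biggest savings.

Choose sales channels

Deals from Gmarket/Auction (지마켓/옥션), Interpark (인터파크), and Lotte (롯데) are shared actively, so these channels are worth keeping an eye on.

Use the hours when deal posts are most active

The most deal posts are registered in the 10 and 11 a.m. hours. Visit often during this window so you do not miss new deals.

Using information wisely and making good use of the various platforms and events in this way should lead to much smarter spending.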

5. Detailed analysis and code

Package and Data load

import pandas as pd
import numpy as np
import re
import matplotlib.pyplot as plt
import seaborn as sns
import pingouin as pg
import ast
from PIL import Image

from scipy.stats import shapiro, levene, mannwhitneyu, chi2_contingency, ttest_ind
from datetime import datetime
from tqdm import tqdm
from collections import Counter
from wordcloud import WordCloud
from kiwipiepy import Kiwi

# custom function
from utils import autolabel, remove_outliers, top_text

# Plot parameters
sns.set(style="ticks")
plt.rcParams['font.family'] = 'NanumGothic'
custom_palette = ["skyblue", "lightgreen"]
%matplotlib inline
%config InlineBackend.figure_format='retina'
title_font_size = 14
sub_font_size = 12

# Load the preprocessed data
data = pd.read_csv("./datas/2023-08-08 23:47:16.905195_preprocessing.csv")

# info() prints its summary and returns None, so the result is not assigned
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 117980 entries, 0 to 117979
Data columns (total 20 columns):
 #   Column         Non-Null Count   Dtype  
---  ------         --------------   -----  
 0   item_no        117980 non-null  int64  
 1   writer         117980 non-null  object 
 2   title          117980 non-null  object 
 3   end            117980 non-null  bool   
 4   comment        117980 non-null  int64  
 5   date           117980 non-null  object 
 6   recommend      117980 non-null  int64  
 7   opposite       117980 non-null  int64  
 8   view           117980 non-null  int64  
 9   category       117980 non-null  object 
 10  URL            117980 non-null  object 
 11  pop            117980 non-null  bool   
 12  hot            117980 non-null  bool   
 13  sales_channel  117980 non-null  object 
 14  price          117980 non-null  object 
 15  product_price  96227 non-null   float64
 16  shipping_cost  103266 non-null  float64
 17  real_title     117950 non-null  object 
 18  keywords       117980 non-null  object 
 19  post_type      117980 non-null  object 
dtypes: bool(3), float64(2), int64(5), object(10)
memory usage: 15.6+ MB

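Column	Description
item_no	Post number
writer	Author
title	Post title
end	Whether the deal has ended
comment	Number of comments
date	Posting date
recommend	Number of recommendations
opposite	Number of opposing votes
view	Number of views
category	Category of the deal product
URL	URL
pop	Whether it is a popular post
hot	Whether it is a hot post
sales_channel	Sales channel
price	Product price / shipping cost
product_price	Product price
shipping_cost	Shipping cost
real_title	Post title (sales channel and price/shipping cost removed)
keywords	Noun keywords extracted from the post title
post_type	Post type: general or popular/hot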
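
The post_type column reduces the two flags pop and hot to a single label. The preprocessing code is not shown here, so the following is only a sketch of one plausible rule, assuming a post counts as popular/hot when either flag is set. The function name label_post_type and the sample rows are hypothetical.

```python
def label_post_type(pop: bool, hot: bool) -> str:
    """Assumed rule: a post is 'popular/hot' if either flag is True."""
    return "popular/hot" if (pop or hot) else "general"


# Made-up rows that only mimic the pop / hot columns
rows = [
    {"item_no": 1, "pop": False, "hot": False},
    {"item_no": 2, "pop": True, "hot": False},
    {"item_no": 3, "pop": True, "hot": True},
]

labels = [label_post_type(row["pop"], row["hot"]) for row in rows]
print(labels)
```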

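1. Characteristics of popular posts

What makes a deal post popular

Popular deal posts are viewed more, commented on more, and recommended more. Consumers evidently regard these posts as useful deals.
Of all posts, 73% are general and only 27% are popular. In other words, only 27% of posts are deals that truly win consumers over.
In numbers, popular posts have about twice the views and comments of general posts and, notably, about 11 times the recommendations, which makes the recommendation count a good indicator of consumer satisfaction. The differences are statistically significant (p < 0.05).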

# Share of general and popular posts
plt.figure(figsize=(8,4))
ax = sns.countplot(data=data, x="post_type", palette=custom_palette)

# Total number of posts
total = len(data)

# Show the count and share above each bar
for p in ax.patches:
    percentage = '{:.1f}%'.format(100 * p.get_height() / total)
    x = p.get_x() + p.get_width() / 2
    y = p.get_y() + p.get_height() * 1.02
    ax.annotate((f"{int(p.get_height())}개 ({percentage})"), (x, y), ha='center', va='center', size=sub_font_size)

ax.set_xlabel('')
ax.set_ylabel('')
ax.set_xticklabels(["일반", "인기"])

plt.title("유형 별 게시물 비율", size=title_font_size, fontweight='bold')
sns.despine()
plt.show()


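# Use t-tests to check whether mean views (view), recommendations (recommend), and comments (comment) differ between general and popular posts.
# With about 120,000 rows the normality check is skipped because the sample is large; independence and equal variance are also left unchecked here, although sample size alone does not justify that.

general_data = data[data['post_type'] == 'general']
popular_hot_data = data[data['post_type'] == 'popular/hot']
metrics = ['view', 'recommend', 'comment']
alpha = 0.05  # significance level (commonly 0.05)
for metric in metrics:
    _, p_value = ttest_ind(general_data[metric], popular_hot_data[metric])

    # Interpret the result against the significance level
    if p_value < alpha:
        print(f"p-value {p_value} is below the significance level {alpha}, so the difference in mean {metric} between general and popular posts is statistically significant.")
    else:
        print(f"p-value {p_value} is not below the significance level {alpha}, so the difference in mean {metric} between general and popular posts is not statistically significant.")

p-value 0.0 is below the significance level 0.05, so the difference in mean view between general and popular posts is statistically significant.
p-value 0.0 is below the significance level 0.05, so the difference in mean recommend between general and popular posts is statistically significant.
p-value 0.0 is below the significance level 0.05, so the difference in mean comment between general and popular posts is statistically significant.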
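
With roughly 120,000 rows, almost any difference yields a p-value near zero, so an effect size helps show how large a gap actually is. The sketch below computes Cohen's d with the standard library on made-up numbers. The function name cohens_d and the sample values are illustrative and do not come from the dataset.

```python
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Standardized mean difference (b minus a) using the pooled sample variance."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_b) - mean(group_a)) / pooled_var ** 0.5


# Made-up view counts, only to show the calculation
general_views = [1, 2, 3, 4, 5]
popular_views = [11, 12, 13, 14, 15]

d = cohens_d(general_views, popular_views)
print(round(d, 2))
```

By the usual rule of thumb, an absolute d of about 0.2 is small, 0.5 medium, and 0.8 large.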

avg_metrics = data.groupby('post_type')[['view', 'recommend', 'comment']].mean()

fig, axes = plt.subplots(nrows=3, ncols=1, figsize=(8, 12))
metrics = ['view', 'recommend', 'comment']
titles = ['평균 조회수', '평균 추천수', '평균 댓글수']

# Plots
for ax, metric, title in zip(axes, metrics, titles):
    avg_metrics[metric].plot(kind='bar', ax=ax, rot=0, color=custom_palette)
    ax.set_title(title, size=title_font_size, fontweight='bold')
    ax.set_xlabel("")
    ax.set_xticklabels(["일반", "인기"])

    # Add value labels
    for bar in ax.patches:
        yval = bar.get_height()
        ax.text(bar.get_x() + bar.get_width()/2, yval, round(yval,2), ha='center', va='bottom', fontsize=sub_font_size)
        
sns.despine()
plt.tight_layout()
plt.show()

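2. Category analysis

Gift vouchers (상품권)

Gift vouchers account for only 4% of all posts, yet 45.82% of them are popular posts. This reflects strong interest in gift voucher deals, so anyone looking for them should check Ppomppu often.

Food/health (식품/건강)

The share of popular posts is 32.23%, above the overall average of 27%, which points to strong interest in food and health deals.

Clothing/accessories (의류/잡화), digital (디지털), other (기타)

The share of popular posts is 25% to 26%, close to the overall average of 27%. These categories draw about average interest on Ppomppu, and decent deals can be found in them.

Home appliances/furniture (가전/가구), computers (컴퓨터)

There are many posts, but the share of popular posts is about 16% and 18% respectively, below average. Deals in these categories appear to have a hard time becoming popular on Ppomppu.

Cosmetics (화장품), childcare (육아), hiking/camping (등산/캠핑), books (서적)

Each of these categories has few posts (under 3% of the total) and a below-average share of popular posts (about 14% to 25%), so good deals are hard to find on Ppomppu. Other communities are probably a better place to look.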

# Number of posts per category, by post type
category_counts = data.groupby(['post_type', 'category']).size().unstack(level=0).reset_index()
category_counts["total"] = category_counts["general"] + category_counts["popular/hot"]
category_counts["general_ratio"] = round(category_counts["general"] / category_counts["total"] * 100, 2)
category_counts["popular/hot_ratio"] = round(category_counts["popular/hot"] / category_counts["total"] * 100, 2)
category_counts["category"] = category_counts["category"].str.strip("[]")
category_counts = category_counts.sort_values(by="general_ratio", ascending=True)
category_counts

Category	General posts	Popular posts	Total	General share (%)	Popular share (%)
상품권	2568	2172	4740	54.18	45.82
식품/건강	27209	12940	40149	67.77	32.23
의류/잡화	6834	2433	9267	73.75	26.25
디지털	11489	4048	15537	73.95	26.05
기타	17633	5880	23513	74.99	25.01
서적	356	116	472	75.42	24.58
화장품	2488	755	3243	76.72	23.28
등산/캠핑	803	179	982	81.77	18.23
컴퓨터	6989	1544	8533	81.91	18.09
가전/가구	7483	1413	8896	84.12	15.88
육아	2279	369	2648	86.06	13.94
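
The shares in the table above are simple ratios of popular posts to all posts in a category. As a quick check of the arithmetic, the sketch below recomputes a few of them, and the overall 27% average, from the counts in the table. The names counts and popular_share are illustrative.

```python
# (general, popular) post counts per category, copied from the table above
counts = {
    "상품권": (2568, 2172),
    "식품/건강": (27209, 12940),
    "의류/잡화": (6834, 2433),
    "디지털": (11489, 4048),
    "기타": (17633, 5880),
    "서적": (356, 116),
    "화장품": (2488, 755),
    "등산/캠핑": (803, 179),
    "컴퓨터": (6989, 1544),
    "가전/가구": (7483, 1413),
    "육아": (2279, 369),
}


def popular_share(general, popular):
    """Popular posts as a percentage of all posts, rounded to 2 decimals."""
    return round(popular / (general + popular) * 100, 2)


shares = {name: popular_share(g, p) for name, (g, p) in counts.items()}

# Overall share across every category
total_general = sum(g for g, _ in counts.values())
total_popular = sum(p for _, p in counts.values())
overall = popular_share(total_general, total_popular)

print(shares["상품권"], shares["육아"], overall)
```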

# Sort order for the chart
category_counts = category_counts.sort_values(by="general_ratio", ascending=False)

# Draw the chart
plt.figure(figsize=(12,4))
bar_general = plt.barh(category_counts["category"], category_counts["general_ratio"], color="skyblue", label="일반 게시물")
bar_popular_hot = plt.barh(category_counts["category"], category_counts["popular/hot_ratio"], left=category_counts["general_ratio"], color="lightgreen", label="인기 게시물")

# Add percentage labels to the bars
autolabel(bar_general)
autolabel(bar_popular_hot, previous_bars=[bar_general])

plt.xlabel("")
plt.ylabel("")
plt.legend(bbox_to_anchor=(1.05, 1))
plt.title("카테고리별 일반, 인기 게시물비율", size=title_font_size, fontweight="bold")
sns.despine()
plt.show()

category_counts = category_counts.sort_values(by="general", ascending=True)

plt.figure(figsize=(12,4))
bar_general = plt.barh(category_counts["category"], category_counts["general"], color="skyblue", label="일반 게시물")
bar_popular = plt.barh(category_counts["category"], category_counts["popular/hot"], left=category_counts["general"], color="lightgreen", label="인기 게시물")

# Add the count to each bar
for bar in bar_general:
    plt.text(bar.get_width() - 0.02 * bar.get_width(), bar.get_y() + bar.get_height()/2, 
             f'{int(bar.get_width())}', va='center', ha='right', color='black', fontsize=10)

for bar in bar_popular:
    plt.text(bar.get_x() + bar.get_width() + 0.02 * bar.get_width(), bar.get_y() + bar.get_height()/2, 
             f'{int(bar.get_width())}', va='center', ha='left', color='black', fontsize=10)

plt.xlabel("")
plt.ylabel("")
plt.legend(bbox_to_anchor=(1.05, 1))
plt.title("카테고리별 일반, 인기 게시물 갯수", size=title_font_size, fontweight="bold")
sns.despine()
plt.show()

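3. Keyword analysis

Common keywords (orange):

마스크 (mask), 우유 (milk), and 제로 (zero) appear often in both general and popular posts.
마스크 and 제로 presumably appear often because of COVID-19, and 우유 because it is an everyday staple.

General-only keywords (sky blue):

모니터 (monitor), 게이밍 (gaming), and 청소기 (vacuum cleaner) show up mainly in general posts, which seems related to the home appliances/furniture category discussed earlier.
These products are relatively expensive or somewhat outside the tastes of Ppomppu's core users, which may be why they seldom reach popular status.

Popular-only keywords (green):

비비고 (Bibigo), 컬쳐 (Culture), 오뚜기 (Ottogi), and 동원 (Dongwon) show up mainly in popular posts.
The food keywords 비비고 and 동원 are popular, and 컬쳐, a gift voucher keyword, also draws attention. This shows that Ppomppu users are highly interested in food and gift voucher deals.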
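
The three color groups correspond to plain set operations on the two top-N keyword lists: the intersection gives the common keywords, and each set difference gives the keywords unique to one post type. A minimal sketch on made-up keyword lists, with TOP_N shrunk to 3 for readability:

```python
from collections import Counter

# Made-up keyword occurrences, only to illustrate the grouping
general_keywords = ["마스크"] * 3 + ["모니터"] * 2 + ["우유"] * 2 + ["청소기"]
popular_keywords = ["마스크"] * 3 + ["비비고"] * 2 + ["우유"] * 2 + ["컬쳐"]

TOP_N = 3  # the analysis itself uses 20

top_general = {word for word, _ in Counter(general_keywords).most_common(TOP_N)}
top_popular = {word for word, _ in Counter(popular_keywords).most_common(TOP_N)}

common = top_general & top_popular        # in both top-N lists
general_only = top_general - top_popular  # only in the general top-N
popular_only = top_popular - top_general  # only in the popular top-N

print(sorted(common), sorted(general_only), sorted(popular_only))
```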

# The keywords column was read from CSV as text, so parse each value back into a Python list
data['keywords'] = data['keywords'].apply(ast.literal_eval)

# Flatten the keyword lists of popular and general posts
pop_hot_keywords = [keyword for keywords_list in data[data['post_type'] == 'popular/hot']['keywords'] for keyword in keywords_list]
general_keywords = [keyword for keywords_list in data[data['post_type'] == 'general']['keywords'] for keyword in keywords_list]

# Count how often each keyword appears
pop_hot_keyword_freq = Counter(pop_hot_keywords)
general_keyword_freq = Counter(general_keywords)

# Top 20 (keyword, count) pairs for each post type
top_pop_hot_keywords = pop_hot_keyword_freq.most_common(20)
top_general_keywords = general_keyword_freq.most_common(20)

# Keep only the keyword strings, as sets
top_20_pop_hot_keywords = {keyword for keyword, _ in top_pop_hot_keywords}
top_20_general_keywords = {keyword for keyword, _ in top_general_keywords}

# Keywords that appear in only one of the two top-20 lists
unique_pop_hot_keywords_top_20 = top_20_pop_hot_keywords - top_20_general_keywords
unique_general_keywords_top_20 = top_20_general_keywords - top_20_pop_hot_keywords

# Bar colors: salmon for keywords common to both lists, otherwise the post type's own color
pop_hot_colors = ['lightgreen' if keyword in unique_pop_hot_keywords_top_20 else 'salmon' for keyword, _ in top_pop_hot_keywords]
general_colors = ['skyblue' if keyword in unique_general_keywords_top_20 else 'salmon' for keyword, _ in top_general_keywords]

# Create the figure
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 6))

pop_hot_keywords_plot, pop_hot_frequencies_plot = zip(*top_pop_hot_keywords)
general_keywords_plot, general_frequencies_plot = zip(*top_general_keywords)

# General posts chart
bars1 = ax1.barh(general_keywords_plot, general_frequencies_plot, color=general_colors)
ax1.set_title('일반 게시물')
for bar, freq in zip(bars1, general_frequencies_plot):
    ax1.text(freq - 2, bar.get_y() + bar.get_height()/2, int(freq), 
             va='center', ha='right', color='black')
ax1.invert_yaxis()

# Popular posts chart
bars2 = ax2.barh(pop_hot_keywords_plot, pop_hot_frequencies_plot, color=pop_hot_colors)
ax2.set_title('인기 게시물')
for bar, freq in zip(bars2, pop_hot_frequencies_plot):
    ax2.text(freq - 2, bar.get_y() + bar.get_height()/2, int(freq), 
             va='center', ha='right', color='black')
ax2.invert_yaxis()

fig.suptitle("게시물 유형 별 상위 20개 키워드", size=title_font_size, fontweight="bold")
plt.tight_layout()
sns.despine()
plt.show()

# Flatten all keywords into a single list
all_keywords = [keyword for sublist in data['keywords'] for keyword in sublist]

# Convert the list to a Series
keywords_series = pd.Series(all_keywords)

# Mask image
uploaded_mask_image = Image.open("./images/mask_image.png").convert("L")  # Convert to grayscale
uploaded_mask = np.array(uploaded_mask_image)


# Create the word cloud
wordcloud_with_uploaded_mask = WordCloud(
    font_path='./font/NanumGothic.ttf',
    colormap="PuBu",
    background_color='white',
    mask=uploaded_mask,
    width=800, height=400,
    contour_width=0.5,
    contour_color='white',
    collocations=False
).generate_from_frequencies(keywords_series.value_counts())

# Show the word cloud
plt.figure(figsize=(12, 12))
plt.imshow(wordcloud_with_uploaded_mask, interpolation="bilinear")
plt.axis('off')
plt.savefig('./01_wordcloud.png', dpi=200)
plt.show()


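4. Price analysis

Product price

Product prices fall mostly below 100,000 won, and products in popular posts tend to be cheaper than those in general posts. This suggests that posts become popular when a special discount or promotion makes the product unusually cheap.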
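
One way to put a number on "mostly below 100,000 won" is to compute the share of known prices at or below that threshold. The sketch below does this on made-up prices. The helper name share_at_or_below is hypothetical, and missing prices are represented as None because product_price has missing values in the dataset.

```python
def share_at_or_below(prices, threshold):
    """Fraction of known prices that are <= threshold; None entries are skipped."""
    known = [p for p in prices if p is not None]
    if not known:
        return float("nan")
    return sum(p <= threshold for p in known) / len(known)


# Made-up prices in won, including two missing values
sample_prices = [9900, 15000, None, 59000, 120000, 1290000, 32000, None, 8900, 99000]

share = share_at_or_below(sample_prices, 100_000)
print(f"{share:.0%}")
```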

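Shipping cost

About 79% of all deal posts offer free shipping, roughly four times the share of paid shipping. Free shipping is about 0.7 percentage points more common in popular posts; the difference is statistically significant, but the association is very weak. Given that most deal posts ship free anyway, shipping cost does not appear to contribute much to a post becoming popular.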

# Price distribution by post type
plt.figure(figsize=(8, 4))
ax = sns.boxplot(x='post_type', y='product_price', data=data, palette=custom_palette, showfliers=False)

# Label each box with its median
medians = data.groupby(['post_type'])['product_price'].median().values
medians = [int(round(m, 2)) for m in medians]
median_labels = [str(np.round(s, 2)) for s in medians]

pos = range(len(medians))
for tick, label in zip(pos, ax.get_xticklabels()):
    ax.text(pos[tick], medians[tick] + 5000, f"{median_labels[tick]}원", 
            horizontalalignment='center', size='small', color="black", weight='semibold')

plt.title('게시물 유형별 가격 분포', size=14, fontweight="bold")
sns.despine()
plt.xlabel("")
plt.ylabel('가격')
plt.xticks([0, 1], ['일반', '인기'])
plt.show()

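# Use a t-test to check whether the mean product price (product_price) differs between general and popular posts.
# With roughly 96,000 non-missing prices the normality check is skipped because the sample is large; independence and equal variance are not checked here either.

general_prices = data[data['post_type'] == 'general']['product_price'].dropna()
popular_hot_prices = data[data['post_type'] == 'popular/hot']['product_price'].dropna()
_, p_value = ttest_ind(general_prices, popular_hot_prices)

# Interpret the result against the significance level
alpha = 0.05  # significance level (commonly 0.05)
if p_value < alpha:
    print(f"p-value ({round(p_value, 1)}) is below the significance level {alpha}, so the difference in mean price between general and popular posts is statistically significant.")
else:
    print("p-value is above the significance level, so the difference in mean price between general and popular posts is not statistically significant.")

p-value (0.0) is below the significance level 0.05, so the difference in mean price between general and popular posts is statistically significant.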

# Flag free shipping (rows with a missing shipping_cost compare as False, so they are counted as paid here)
data['free_shipping'] = data['shipping_cost'] == 0

# Share of free and paid shipping by post type
shipping_distribution = data.groupby('post_type')['free_shipping'].value_counts(normalize=True).unstack().fillna(0)

colors = ['salmon', 'salmon', 'skyblue', 'lightgreen']

# Stacked bar chart
ax = shipping_distribution.plot(kind='bar', stacked=True, figsize=(8, 6), width=0.8)
plt.title('게시물 유형별 무료/유료 배송 비율', size=14, fontweight="bold")
plt.ylabel('비율')
plt.xlabel('게시물 유형')
plt.xticks([0, 1], ['일반', '인기'], rotation=0)
plt.yticks([])

for p, color in zip(ax.patches, colors):
    p.set_facecolor(color)

for p in ax.patches:
    width, height = p.get_width(), p.get_height()
    x, y = p.get_xy() 
    ax.text(x + width/2, 
            y + height/2, 
            '{:.1%}'.format(height), 
            horizontalalignment='center', 
            verticalalignment='center',
            color='black')
sns.despine()
plt.legend(["유료 배송"], loc='upper right', bbox_to_anchor=(0.7, 0.48, 0.4, 0.5))
plt.show()

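# Cross-tabulate post type against shipping type (a missing shipping_cost falls into '유료배송')
data['shipping_type'] = data['shipping_cost'].apply(lambda x: '무료배송' if x == 0 else '유료배송')
contingency_table = pd.crosstab(data['post_type'], data['shipping_type'])

# Chi-square test of independence
chi2, p, _, _ = chi2_contingency(contingency_table)

# The test only shows whether the shares differ; the direction comes from the shares in the chart above
if p > 0.05:
    print("No significant difference in the free-shipping share between popular and general posts.")    
else:
    print("The free-shipping share differs significantly between popular and general posts.")

The free-shipping share differs significantly between popular and general posts.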

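# The sample is large, so use Cramér's V to gauge how strongly post type and shipping type are associated
n = contingency_table.sum().sum()
phi2 = chi2 / n
# Cramér's V divides by min(rows - 1, columns - 1)
cramers_v = np.sqrt(phi2 / (min(contingency_table.shape) - 1))

if cramers_v < 0.1:
    print("The association between post type and shipping type is very weak.")
elif cramers_v < 0.3:
    print("The association between post type and shipping type is moderate.")
else:
    print("The association between post type and shipping type is strong.")

The association between post type and shipping type is very weak.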
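
Cramér's V is usually defined as sqrt(chi2 / (n * min(r - 1, c - 1))) for a table with r rows and c columns. The sketch below works through that definition with the standard library on a made-up 2x2 table. It computes the plain Pearson chi-square without a continuity correction, whereas scipy's chi2_contingency applies Yates' correction to 2x2 tables by default, so the two can differ slightly. The function names and counts are illustrative.

```python
from math import sqrt


def chi_square(table):
    """Pearson chi-square statistic and total count for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2, n


def cramers_v(table):
    """Cramér's V: sqrt(chi2 / (n * min(rows - 1, cols - 1)))."""
    chi2, n = chi_square(table)
    k = min(len(table), len(table[0])) - 1
    return sqrt(chi2 / (n * k))


# Made-up counts: rows are post types, columns are shipping types
table = [[10, 20], [20, 10]]
v = cramers_v(table)
print(round(v, 3))
```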

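5. Sales channel analysis

Gmarket/Auction (지마켓/옥션)

Of its 35,548 posts, about 30.70% are popular. Deals from Gmarket/Auction are shared actively and draw considerable interest, which may be tied to the variety and quality of the products and services the two sites offer and to trust in their brands. Consumers looking for deals on these sites are presumably often satisfied with quality and price, which would raise the share of popular posts.

Interpark (인터파크), Lotte (롯데)

Interpark and Lotte have a high share of popular posts relative to their total posts (30.23% and 27.15%). Deals from these two channels attract a lot of consumer interest, so they are also worth serious consideration.

11st (11번가)

With 14,834 posts it has the second-largest number of deal posts, but its share of popular posts is relatively low at 22.40%. It offers many deals, but not all of them appeal to consumers.

Coupang (쿠팡), Naver (네이버), Himart (하이마트), Kakao (카카오)

These channels account for a relatively small share of posts, but they are likely to offer deals specialized in particular products or categories: Himart in home appliances, Coupang in discounts based on a user's purchase and search history, and Kakao and Naver in partner-brand discounts. For consumers interested in what each channel specializes in, these deals may be the more useful ones.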

# First extract the top 10 sales channels, excluding 'unknown'
top_10_channels_without_unknown = data[data['sales_channel'] != 'unknown'].groupby('sales_channel').size().nlargest(10).index.tolist()

# Relabel channels: anything outside the top 10, including 'unknown', becomes 'ETC'
data['sales_channel_aggregated'] = data['sales_channel'].apply(lambda x: x if x in top_10_channels_without_unknown else 'ETC')

# Number of posts per sales channel, by post type
channel_counts_aggregated = data.groupby(['post_type', 'sales_channel_aggregated']).size().unstack(level=0).reset_index()
channel_counts_aggregated["total"] = channel_counts_aggregated["general"] + channel_counts_aggregated["popular/hot"]
channel_counts_aggregated["general_ratio"] = round(channel_counts_aggregated["general"] / channel_counts_aggregated["total"] * 100, 2)
channel_counts_aggregated["popular/hot_ratio"] = round(channel_counts_aggregated["popular/hot"] / channel_counts_aggregated["total"] * 100, 2)
idx = channel_counts_aggregated[channel_counts_aggregated['sales_channel_aggregated'] == "ETC"].index
channel_counts_aggregated.drop(idx , inplace=True)
channel_counts = channel_counts_aggregated.sort_values(by="general_ratio", ascending=True)
channel_counts = channel_counts.rename_axis(None, axis=1).reset_index(drop=True)
channel_counts.columns = ["판매 채널", "일반 게시물", "인기 게시물", "총합", "일반 게시물 비율", "인기 게시물 비율"]
channel_counts

	판매 채널	일반 게시물	인기 게시물	총합	일반 게시물 비율	인기 게시물 비율
0	지마켓/옥션	24634	10914	35548	69.30	30.70
1	인터파크	1636	709	2345	69.77	30.23
2	롯데	3136	1169	4305	72.85	27.15
3	티몬	6686	2323	9009	74.21	25.79
4	위메프	6935	2312	9247	75.00	25.00
5	11번가	11511	3323	14834	77.60	22.40
6	쿠팡	4153	1090	5243	79.21	20.79
7	네이버	5482	1346	6828	80.29	19.71
8	하이마트	1313	267	1580	83.10	16.90
9	카카오	1572	300	1872	83.97	16.03

channel_counts = channel_counts_aggregated.sort_values(by="general_ratio", ascending=False)

plt.figure(figsize=(14,6))
bar_general = plt.barh(channel_counts["sales_channel_aggregated"], channel_counts["general_ratio"], color="skyblue", label="일반 게시물")
bar_popular_hot = plt.barh(channel_counts["sales_channel_aggregated"], channel_counts["popular/hot_ratio"], left=channel_counts["general_ratio"], color="lightgreen", label="인기 게시물")

autolabel(bar_general)
autolabel(bar_popular_hot, previous_bars=[bar_general])

plt.xlabel("비율")
plt.ylabel("판매채널별")
plt.legend(bbox_to_anchor=(1.0, 1.0))
plt.title("판매채널별 일반과 인기/핫 게시물의 비율", size=14, fontweight="bold")
sns.despine()
plt.show()

channel_counts = channel_counts.sort_values(by="general", ascending=True)

plt.figure(figsize=(14,6))
bar_general = plt.barh(channel_counts["sales_channel_aggregated"], channel_counts["general"], color="skyblue", label="일반 게시물")
bar_popular = plt.barh(channel_counts["sales_channel_aggregated"], channel_counts["popular/hot"], left=channel_counts["general"], color="lightgreen", label="인기 게시물")

# Add the count to each bar
for bar in bar_general:
    plt.text(bar.get_width() - 0.02 * bar.get_width(), bar.get_y() + bar.get_height()/2, 
             f'{int(bar.get_width())}', va='center', ha='right', color='black', fontsize=10)

for bar in bar_popular:
    plt.text(bar.get_x() + bar.get_width() + 0.02 * bar.get_width(), bar.get_y() + bar.get_height()/2, 
             f'{int(bar.get_width())}', va='center', ha='left', color='black', fontsize=10)

plt.xlabel("게시물 수")  # this chart shows post counts, not ratios
plt.ylabel("판매 채널")
plt.legend(bbox_to_anchor=(1, 1))
plt.title("채널별 일반, 인기, 핫 게시물의 갯수", size=14, fontweight="bold")
sns.despine()
plt.show()

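6. Time series analysis

By year

The total number of deal posts in 2022 is similar to 2019, but the number of general posts rose sharply (from 15,938 to 20,857) while popular posts fell (from 9,162 to 5,530). More posts failed to become popular, probably because of economic factors such as the rise in prices after COVID.

By month

Deal posts cluster in May and November, driven by Gmarket/Auction's 스마일데이 and 11번가's event respectively. Popular posts related to these deals also tend to increase in these months.

By day of month

Deal posts stand out on the 1st and 11th, mainly because of TMON (티몬) promotions and 11번가's 십일절. On the 11th in particular, Gmarket/Auction's share of posts drops to about 21%, presumably because 11번가 takes up about 40% of the posts that day.

By day of week

On a per-day basis, weekdays have about twice as many deal posts as weekend days and about 1.7 times as many popular posts. Weekdays are therefore the better time to look for deals.

By hour

Deal posts peak in the 10 and 11 a.m. hours and then gradually decline. The fewest posts are registered in the 4 and 5 a.m. hours, when the numbers of general and popular posts are also close to each other. This is probably because it takes time for a post to be promoted to popular status.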

# Convert the date column to datetime
data['date'] = pd.to_datetime(data['date'], format='%y.%m.%d %H:%M:%S')

# Yearly analysis (2018 and 2023 are excluded)
data['year'] = data['date'].dt.year
yearly_data = data[(data['year'] != 2018) & (data['year'] != 2023)]
yearly_analysis = yearly_data.groupby(['year', 'post_type']).size().unstack(level=1).fillna(0).reset_index()
yearly_analysis = yearly_analysis.rename_axis(None, axis=1).reset_index(drop=True)
yearly_analysis.columns = ["연도", "일반 게시물", "인기 게시물"]
yearly_analysis["종합"] = yearly_analysis["일반 게시물"] + yearly_analysis["인기 게시물"]
yearly_analysis

	연도	일반 게시물	인기 게시물	종합
0	2019	15938	9162	25100
1	2020	17459	8124	25583
2	2021	16593	6601	23194
3	2022	20857	5530	26387

# Monthly analysis (2019 through 2022 only)
data['month'] = data['date'].dt.month
monthly_data = data[(data['year'] >= 2019) & (data['year'] <= 2022)]
monthly_analysis = monthly_data.groupby(['month', 'post_type']).size().unstack(level=1).fillna(0).reset_index()
monthly_analysis = monthly_analysis.rename_axis(None, axis=1).reset_index(drop=True)
monthly_analysis.columns = ["월", "일반 게시물", "인기 게시물"]
monthly_analysis["종합"] = monthly_analysis["일반 게시물"] + monthly_analysis["인기 게시물"]
monthly_analysis

	월	일반 게시물	인기 게시물	종합
0	1	5531	2658	8189
1	2	5042	2587	7629
2	3	5566	2701	8267
3	4	5682	2388	8070
4	5	6741	2709	9450
5	6	5678	2368	8046
6	7	5922	2465	8387
7	8	5578	2361	7939
8	9	4966	2192	7158
9	10	5600	2164	7764
10	11	8551	2618	11169
11	12	5990	2206	8196

# Day-of-month analysis
data['day'] = data['date'].dt.day
day_analysis = data.groupby(['day', 'post_type']).size().unstack(level=1).fillna(0).reset_index()
day_analysis = day_analysis.rename_axis(None, axis=1).reset_index(drop=True)
day_analysis.columns = ["일", "일반 게시물", "인기 게시물"]
day_analysis["종합"] = day_analysis["일반 게시물"] + day_analysis["인기 게시물"]
day_analysis

	일	일반 게시물	인기 게시물	종합
0	1	3627	1350	4977
1	2	3062	1091	4153
2	3	2832	1066	3898
3	4	2653	1062	3715
4	5	2643	1091	3734
5	6	2623	1027	3650
6	7	2869	1105	3974
7	8	2806	1026	3832
8	9	2648	1048	3696
9	10	2904	1003	3907
10	11	3949	1338	5287
11	12	2686	1005	3691
12	13	2790	1002	3792
13	14	2923	1028	3951
14	15	2904	1069	3973
15	16	2894	943	3837
16	17	3046	1160	4206
17	18	2841	1078	3919
18	19	2943	1055	3998
19	20	2839	1063	3902
20	21	2896	1037	3933
21	22	3005	1071	4076
22	23	2896	1020	3916
23	24	2760	978	3738
24	25	2740	959	3699
25	26	2791	1095	3886
26	27	2476	906	3382
27	28	2481	981	3462
28	29	2225	870	3095
29	30	1914	760	2674
30	31	1465	562	2027

# Day-of-week analysis
data['weekday'] = data['date'].dt.day_name()
weekday_order = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]
weekday_analysis = data.groupby(['weekday', 'post_type']).size().unstack(level=1).fillna(0).reindex(weekday_order).reset_index()
weekday_analysis = weekday_analysis.rename_axis(None, axis=1).reset_index(drop=True)
weekday_analysis.columns = ["요일", "일반 게시물", "인기 게시물"]
weekday_analysis["종합"] = weekday_analysis["일반 게시물"] + weekday_analysis["인기 게시물"]
weekday_analysis

	요일	일반 게시물	인기 게시물	종합
0	월요일	15697	5547	21244
1	화요일	15230	5042	20272
2	수요일	14948	5279	20227
3	목요일	13654	4597	18251
4	금요일	13696	5181	18877
5	토요일	6698	3203	9901
6	일요일	6208	3000	9208
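
The "about twice as many on weekdays" comparison is a per-day average, since there are five weekdays and only two weekend days. As a check of the arithmetic, the sketch below recomputes the weekday-to-weekend ratio from the counts in the table above, for all posts and for popular posts. The names weekday_counts and mean_per_day are illustrative.

```python
# (all posts, popular posts) per day of week, copied from the table above
weekday_counts = {
    "월요일": (21244, 5547),
    "화요일": (20272, 5042),
    "수요일": (20227, 5279),
    "목요일": (18251, 4597),
    "금요일": (18877, 5181),
    "토요일": (9901, 3203),
    "일요일": (9208, 3000),
}
WEEKEND = {"토요일", "일요일"}


def mean_per_day(days, column):
    """Average count per day for the given days; column 0 = all posts, 1 = popular posts."""
    values = [weekday_counts[day][column] for day in days]
    return sum(values) / len(values)


weekdays = [day for day in weekday_counts if day not in WEEKEND]
weekends = [day for day in weekday_counts if day in WEEKEND]

total_ratio = mean_per_day(weekdays, 0) / mean_per_day(weekends, 0)
popular_ratio = mean_per_day(weekdays, 1) / mean_per_day(weekends, 1)
print(round(total_ratio, 2), round(popular_ratio, 2))
```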

# Hourly analysis
data['hour'] = data['date'].dt.hour
hour_analysis = data.groupby(['hour', 'post_type']).size().unstack(level=1).fillna(0).reset_index()
hour_analysis = hour_analysis.rename_axis(None, axis=1).reset_index(drop=True)
hour_analysis.columns = ["시간", "일반 게시물", "인기 게시물"]
hour_analysis["종합"] = hour_analysis["일반 게시물"] + hour_analysis["인기 게시물"]
hour_analysis

	시간	일반 게시물	인기 게시물	종합
0	0	6303	3606	9909
1	1	2720	1760	4480
2	2	1330	825	2155
3	3	679	483	1162
4	4	423	278	701
5	5	346	276	622
6	6	537	360	897
7	7	1079	653	1732
8	8	2324	989	3313
9	9	5255	1605	6860
10	10	8088	2281	10369
11	11	8009	2131	10140
12	12	4863	1608	6471
13	13	5197	1531	6728
14	14	5451	1553	7004
15	15	5189	1525	6714
16	16	4734	1456	6190
17	17	4819	1343	6162
18	18	3693	1344	5037
19	19	2997	1230	4227
20	20	2875	1268	4143
21	21	2889	1257	4146
22	22	2882	1275	4157
23	23	3449	1212	4661

# Helper that labels the largest and smallest value of a line plot
def annotate_max_min(line, ax):
    xdata = line.get_xdata()
    ydata = line.get_ydata()
    max_index = np.argmax(ydata)
    min_index = np.argmin(ydata)
    # Labels are drawn only when the line's minimum exceeds 300
    if ydata[min_index] > 300:
        # Label the maximum
        ax.text(xdata[max_index], ydata[max_index] + 100, int(ydata[max_index]), 
                ha='center', va='bottom', fontsize=9)

        # Label the minimum
        ax.text(xdata[min_index], ydata[min_index] + 100, int(ydata[min_index]), 
                ha='center', va='bottom', fontsize=9)

# Draw the charts
fig, axes = plt.subplots(nrows=5, ncols=1, figsize=(12, 20))

# Yearly chart ---------
axes[0].plot(yearly_analysis["연도"], yearly_analysis['일반 게시물'], label='일반', marker='o', color=custom_palette[0])
axes[0].plot(yearly_analysis["연도"], yearly_analysis['인기 게시물'], label='인기', linestyle='--', marker='o', color=custom_palette[1])


# Monthly chart ---------
axes[1].plot(monthly_analysis["월"], monthly_analysis['일반 게시물'], label='일반', marker='o', color=custom_palette[0])
axes[1].plot(monthly_analysis["월"], monthly_analysis['인기 게시물'], label='인기', linestyle='--', marker='o', color=custom_palette[1])

# Day-of-month chart ---------
axes[2].plot(day_analysis["일"], day_analysis['일반 게시물'], label='일반', marker='o', color=custom_palette[0])
axes[2].plot(day_analysis["일"], day_analysis['인기 게시물'], label='인기', linestyle='--', marker='o', color=custom_palette[1])

# Day-of-week chart ---------
axes[3].plot(weekday_analysis["요일"], weekday_analysis['일반 게시물'], label='일반', marker='o', color=custom_palette[0])
axes[3].plot(weekday_analysis["요일"], weekday_analysis['인기 게시물'], label='인기', linestyle='--', marker='o', color=custom_palette[1])

# Hourly chart ---------
axes[4].plot(hour_analysis["시간"], hour_analysis['일반 게시물'], label='일반', marker='o', color=custom_palette[0])
axes[4].plot(hour_analysis["시간"], hour_analysis['인기 게시물'], label='인기', linestyle='--', marker='o', color=custom_palette[1])

# Annotate the maximum and minimum on each line
for i in range(0,5):
    for line in axes[i].get_lines():
        annotate_max_min(line, axes[i])
        
# Chart settings
axes[0].legend(["일반", "인기"], loc='upper right', bbox_to_anchor=(0.65, 0.7, 0.4, 0.5))

axes[0].set_title("연도 별 분석")
axes[1].set_title("월별 분석")
axes[2].set_title("일 별 분석")
axes[3].set_title("요일 별 분석")
axes[4].set_title("시간 별 분석")

axes[0].set_xlabel("")
axes[1].set_xlabel("")
axes[2].set_xlabel("일")  # axes[2] is the day-of-month chart
axes[3].set_xlabel("요일")
axes[4].set_xlabel("시간")

axes[0].set_ylabel("")
axes[1].set_ylabel("")
axes[2].set_ylabel("")
axes[3].set_ylabel("")
axes[4].set_ylabel("")

axes[0].set_xticks(range(2019, 2023))
axes[1].set_xticks(range(1, 13))
axes[2].set_xticks(range(1, 32))
axes[3].set_xticks(range(0,7))
axes[3].set_xticklabels(["월요일", "화요일", "수요일", "목요일", "금요일", "토요일", "일요일"])
axes[4].set_xticks(range(0, 24))  # hours run from 0 to 23

fig.suptitle("게시물 유형 별 시계열 분석", size=title_font_size, fontweight="bold")
plt.tight_layout(pad=1.7)
sns.despine()
plt.show()        

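# Gmarket/Auction holds an annual sale event in May, and in November both Gmarket/Auction and 11번가 hold theirs (스마일데이 and 그랜드 십일절). Check whether these events explain the high number of deal posts in May and November.
# 11번가 runs its 십일절 sale on the 11th, so check whether that explains the large number of deals on that day.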
channel_counts = channel_counts_aggregated.sort_values(by="general_ratio", ascending=True)
order = list(channel_counts["sales_channel_aggregated"])
order.append("ETC")

# Post counts per sales channel by month
monthly_channel_counts = data.groupby([data['date'].dt.month, 'sales_channel_aggregated']).size().reset_index(name='counts')
monthly_channel_counts_pivot = monthly_channel_counts.pivot(index='date', columns='sales_channel_aggregated', values='counts').fillna(0)

# Convert to percentages
monthly_channel_percentage = monthly_channel_counts_pivot.divide(monthly_channel_counts_pivot.sum(axis=1), axis=0) * 100
monthly_channel_percentage = monthly_channel_percentage[order]
monthly_channel_percentage.rename_axis(None, axis=1).reset_index().round(2)

	월	지마켓/옥션	인터파크	롯데	티몬	위메프	11번가	쿠팡	네이버	하이마트	카카오	ETC
0	1	32.13	1.88	2.88	8.23	9.52	10.69	4.95	4.96	1.54	1.48	21.74
1	2	32.42	2.03	3.10	7.01	8.60	11.94	4.80	5.78	1.43	1.49	21.39
2	3	28.59	1.92	3.38	7.88	7.63	12.39	4.39	6.63	1.51	1.67	24.02
3	4	27.42	1.92	5.36	8.07	7.20	12.37	4.97	6.24	1.45	1.93	23.07
4	5	39.70	1.53	2.91	7.58	5.53	11.68	4.21	5.15	1.36	1.41	18.96
5	6	24.37	2.87	4.31	8.88	8.60	12.28	4.93	7.31	1.30	2.08	23.06
6	7	29.44	1.75	3.47	7.62	9.37	11.89	4.15	5.79	1.57	1.61	23.33
7	8	30.39	2.02	3.31	7.53	8.97	12.18	3.53	5.00	1.23	1.68	24.16
8	9	30.85	2.22	2.75	7.03	7.17	12.84	4.55	5.60	0.81	1.40	24.78
9	10	27.43	1.96	5.87	7.16	7.35	11.04	4.33	5.67	0.75	1.40	27.04
10	11	31.53	1.83	2.89	6.29	5.92	18.90	3.75	4.41	1.53	1.00	21.94
11	12	24.02	2.02	3.87	8.02	9.32	11.85	4.53	6.91	1.23	1.89	26.33

# Post counts per sales channel by day of month
daily_channel_counts = data.groupby([data['date'].dt.day, 'sales_channel_aggregated']).size().reset_index(name='counts')
daily_channel_counts_pivot = daily_channel_counts.pivot(index='date', columns='sales_channel_aggregated', values='counts').fillna(0)

# Convert to percentages
daily_channel_percentage = daily_channel_counts_pivot.divide(daily_channel_counts_pivot.sum(axis=1), axis=0) * 100
daily_channel_percentage = daily_channel_percentage[order]
daily_channel_percentage.rename_axis(None, axis=1).reset_index().round(2)

	Day	지마켓/옥션	인터파크	롯데	티몬	위메프	11번가	쿠팡	네이버	하이마트	카카오	ETC
0	1	30.24	1.23	3.01	11.35	10.05	13.22	3.03	4.06	1.97	0.84	21
1	2	29.23	1.52	3.54	7.87	10.47	12.98	3.92	4.94	2.05	1.11	22.37
2	3	28.22	1.51	2.92	7.95	10.83	14.26	3.75	5.41	1.49	1.46	22.19
3	4	29.61	1.45	4.06	8.45	6.97	13.89	4.5	4.98	1.35	1.29	23.45
4	5	31.92	1.34	4.58	7.66	8.49	11.97	4.18	4.74	1.15	1.5	22.47
5	6	31.81	1.12	3.67	8	8.58	12.71	4.58	5.18	1.26	1.32	21.78
6	7	30.12	1.61	4.78	7.57	8.28	13.11	3.93	6.39	1.26	1.56	21.39
7	8	34.86	1.83	3.31	7.07	7.57	10.59	4.2	6.13	1.44	1.51	21.48
8	9	31.03	1.95	3.44	7.71	7.93	11.09	4.65	5.82	1.76	1.68	22.94
9	10	32.76	1.77	2.89	6.94	7.35	12.44	4.66	5.58	1.13	1.87	22.63
10	11	21.17	1.44	1.78	4.29	5.9	39.97	2.86	3.5	0.93	1.31	16.87
11	12	31.51	2.33	2.55	7.29	8.75	11.35	4.77	5.09	1.08	1.25	24.03
12	13	28.88	2.43	4.25	7.3	7.09	12.53	5.27	6.54	1.21	1.69	22.81
13	14	31.51	2.33	3.24	8.25	7.14	11.01	4.38	6.02	1.47	2.2	22.45
14	15	32.37	2.32	3.4	7.4	8.16	10.22	4.98	6.8	1.33	1.61	21.42
15	16	33.2	1.54	3.7	7.35	7.97	8.91	4.72	5.94	1.51	1.82	23.33
16	17	30.12	11.67	3.26	5.75	6.89	7.2	4.59	5.68	1.07	1.31	22.44
17	18	31.77	1.68	4.75	7.55	7.45	8.85	4.21	6.05	1.79	1.66	24.24
18	19	33.42	1.53	3.83	7.2	7	9.48	5.2

# Count posts per sales channel for each hour of the day
hour_channel_counts = data.groupby([data['date'].dt.hour, 'sales_channel_aggregated']).size().reset_index(name='counts')
hour_channel_counts_pivot = hour_channel_counts.pivot(index='date', columns='sales_channel_aggregated', values='counts').fillna(0)

# Convert counts to row-wise percentages (each hour sums to 100)
hour_channel_percentage = hour_channel_counts_pivot.divide(hour_channel_counts_pivot.sum(axis=1), axis=0) * 100
hour_channel_percentage = hour_channel_percentage[order]
hour_channel_percentage.rename_axis(None, axis=1).reset_index().round(2)

Hour	지마켓/옥션	인터파크	롯데	티몬	위메프	11번가	쿠팡	네이버	하이마트	카카오	ETC
0	29.08	1.33	3.52	14.55	11.81	18.29	2.61	2.69	0.82	1.32	13.97
1	38.73	1.36	3.77	9.22	7.34	14.91	3.06	2.90	1.09	0.65	16.96
2	39.72	1.90	2.83	6.45	8.31	12.90	3.90	3.81	1.07	0.70	18.42
3	38.55	2.67	3.18	6.71	7.31	11.96	4.13	3.10	1.03	0.60	20.74
4	34.24	1.57	3.14	8.42	8.84	11.70	3.85	3.99	1.28	0.57	22.40
5	35.05	2.09	3.38	7.88	8.04	14.31	3.54	4.02	1.29	0.96	19.45
6	31.10	0.78	2.90	8.58	8.14	12.93	4.68	4.46	1.67	0.78	23.97
7	35.16	1.27	3.58	7.62	6.18	11.89	7.10	3.64	1.73	0.58	21.25
8	32.24	1.42	3.35	9.60	6.79	13.22	5.98	3.62	1.39	1.24	21.16
9	30.99	1.69	3.94	9.27	7.62	12.22	4.68	4.42	1.46	1.72	22.00
10	30.89	2.70	4.04	6.63	7.91	11.11	3.57	5.50	1.46	1.40	24.80
11	27.50	2.55	4.57	6.70	8.38	12.32	3.44	5.71	1.38	1.46	26.00
12	29.35	2.24	4.33	7.37	8.13	10.11	4.47	6.49	1.00	1.34	25.17
13	28.75	2.41	3.85	7.43	7.92	11.86	4.06	6.42	1.65	1.19	24.46
14	27.28	2.11	4.21	6.94	7.91	11.74	4.31	6.57	1.46	1.28	26.19
15	27.42	2.16	3.65	6.93	7.21	14.34	4.60	5.94	1.58	1.10	25.07
16	27.24	2.13	3.30	6.91	7.54	12.54	4.64	6.62	1.58	1.41	26.09
17	25.95	2.17	3.13	6.48	6.69	11.05	4.46	9.06	1.31	4.56	25.14
18	26.62	1.77	3.20	6.47	7.78	10.22	4.76	9.05	1.55	3.34	25.23
19	27.18	1.59	3.08	5.49	6.27	13.74	5.96	9.30	1.37	2.37	23.66
20	28.53	1.67	3.26	6.57	6.59	11.39	5.79	8.62	1.35	1.88	24.35
21	31.14	2.32	3.79	5.93	6.97	11.24	6.73	6.22	1.42	1.62	22.62
22	33.22	1.71	3.01	5.92	7.43	10.44	6.25	5.87	1.37	1.13	23.65
23	40.38	1.44	2.38	4.78	5.79	12.98	5.51	4.25	0.97	1.12	20.40

# Stack the three heatmaps in a 3x1 layout
fig, axes = plt.subplots(3, 1, figsize=(15, 20))

# Monthly share of posts by sales channel
sns.heatmap(monthly_channel_percentage, cmap='YlGnBu', annot=True, fmt='.2f', linewidths=.5, ax=axes[0], cbar=False)

# Day-of-month share of posts by sales channel
sns.heatmap(daily_channel_percentage, cmap='YlGnBu', annot=True, fmt='.2f', linewidths=.5, ax=axes[1], cbar=False)

# Hourly share of posts by sales channel
sns.heatmap(hour_channel_percentage, cmap='YlGnBu', annot=True, fmt='.2f', linewidths=.5, ax=axes[2], cbar=False)

# Plot settings
axes[0].set_title("By month")
axes[1].set_title("By day of month")
axes[2].set_title("By hour")

axes[0].set_xlabel("")
axes[1].set_xlabel("")
axes[2].set_xlabel("Sales channel")

axes[0].set_ylabel("Month")
axes[1].set_ylabel("Day")
axes[2].set_ylabel("Hour")

fig.suptitle("Share of posts by sales channel", size=14, fontweight="bold")
plt.tight_layout(pad=1.7)
sns.despine()
plt.show()
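
The monthly, daily, and hourly blocks above repeat the same count, pivot, and normalize steps. As a minimal sketch (not the code used for the tables above), the same row-wise percentages can be produced in one call with `pd.crosstab(..., normalize="index")`. The helper name `channel_share` and the toy data are hypothetical and only for illustration; the column names `date` and `sales_channel_aggregated` are taken from the analysis.

```python
import pandas as pd


def channel_share(df, time_key, order=None):
    """Percentage of posts per sales channel within each time bucket.

    time_key: a Series aligned with df, e.g. df["date"].dt.month.
    order:    optional column order; channels missing from the data become 0.
    """
    # normalize="index" divides every row by its own total.
    share = pd.crosstab(time_key, df["sales_channel_aggregated"], normalize="index") * 100
    if order is not None:
        share = share.reindex(columns=order, fill_value=0.0)
    return share.rename_axis(None, axis=1).round(2)


# Toy data for illustration only (not the Ppomppu dataset).
toy = pd.DataFrame({
    "date": pd.to_datetime([
        "2023-01-03 09:00", "2023-01-15 21:00", "2023-01-20 10:00",
        "2023-01-25 23:00", "2023-02-01 08:00", "2023-02-11 00:00",
    ]),
    "sales_channel_aggregated": ["A", "A", "B", "ETC", "B", "B"],
})

monthly = channel_share(toy, toy["date"].dt.month, order=["A", "B", "ETC"])
print(monthly)
```

Under the same assumptions, passing `data["date"].dt.day` or `data["date"].dt.hour` as `time_key` together with the `order` list would give the daily and hourly tables. Using `reindex` instead of plain column selection also avoids a `KeyError` if a channel in `order` has no posts in the data.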
