House Price Prediction Analysis...3

Kirok Kim 2022. 2. 8. 21:24
์ƒ๊ด€๊ด€๊ณ„ ์‹œ๊ฐํ™”
1์žฅ ๋ถ€๋ถ„
# ๋จผ์ € ์ƒ๊ด€๊ณ„์ˆ˜ ๊ณ„์‚ฐ์„ ์œ„ํ•ด ํ…์ŠคํŠธ ํ˜•์‹์˜ ๋ฐ์ดํ„ฐ๋ฅผ ์ˆซ์ž๋กœ ๋ณ€ํ™˜ํ•ด์คŒ.
from sklearn.preprocessing import LabelEncoder

# Deep copy, so the original data stays untouched
corr_df = data.copy()
object_cols = corr_df.columns[corr_df.dtypes == 'O']
corr_df[object_cols] = corr_df[object_cols].astype(str).apply(LabelEncoder().fit_transform)
corr_df['Exter Qual'].unique()
## In the resulting correlations, many features come out inversely related to the target.
# I think sklearn did not turn these columns into meaningful numbers:
# LabelEncoder numbers the classes in sorted (alphabetical) order, not in order of quality.
corr_df.info()
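The suspicion above can be checked without the dataset. The following is a minimal sketch, assuming the five grade labels used later in this post (Po, Fa, TA, Gd, Ex). It imitates the sorted-order numbering that LabelEncoder applies with plain `sorted()`, and compares it with the intended worst-to-best order. The grades and prices in the sample are made up for illustration, and `pearson` is a hypothetical helper, not part of any library.

```python
from math import sqrt

# Grade labels listed from worst to best (the scale used in this post).
GRADES_WORST_TO_BEST = ["Po", "Fa", "TA", "Gd", "Ex"]

# Codes assigned in sorted (alphabetical) order, as a label encoder would do.
alphabetical = {g: code for code, g in enumerate(sorted(GRADES_WORST_TO_BEST))}
# Codes assigned in order of quality.
ordinal = {g: code for code, g in enumerate(GRADES_WORST_TO_BEST)}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up sample: better grades go with higher prices.
grades = ["Fa", "TA", "Gd", "Ex"]
prices = [100, 150, 200, 300]

r_alphabetical = pearson([alphabetical[g] for g in grades], prices)
r_ordinal = pearson([ordinal[g] for g in grades], prices)

print(alphabetical)
print(r_alphabetical, r_ordinal)
```

With the alphabetical codes, the best grade "Ex" receives the smallest number, so a feature that actually rises with price can show up with a negative coefficient.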
Improved code
# So first check which text-valued columns there are and what values they hold,
data['Exter Qual'].unique()
data.info()
# then replace those values one by one. Luckily they all use the same grading scale, so the mapping was easy.
quality_map = {"Po": 0, "Fa": 1, "TA": 2, "Gd": 3, "Ex": 4}
data2 = data.copy()
for col in data.columns[data.dtypes == 'O']:
    print(col)
    data2[col] = data2[col].replace(quality_map)
data2.info()
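The loop above works because every text column in this data uses the same grade scale. As a hedged sketch of a slightly more defensive variant, the function below maps a column only when all of its values belong to that scale and leaves any other text column alone. `encode_quality_columns`, the "Garage Type" column, and all sample values are hypothetical, introduced only for illustration.

```python
import pandas as pd

# Grade scale from this post, worst (0) to best (4).
QUALITY_ORDER = {"Po": 0, "Fa": 1, "TA": 2, "Gd": 3, "Ex": 4}


def encode_quality_columns(df):
    """Return a copy of df in which grade-only text columns become 0..4."""
    encoded = df.copy()
    for column in encoded.columns:
        # Numeric columns need no encoding.
        if pd.api.types.is_numeric_dtype(encoded[column]):
            continue
        values = set(encoded[column].dropna().unique())
        # Map only columns whose values all belong to the grade scale.
        if values <= set(QUALITY_ORDER):
            encoded[column] = encoded[column].map(QUALITY_ORDER)
    return encoded


# Made-up sample; "Garage Type" is a hypothetical non-grade column.
sample = pd.DataFrame({
    "Exter Qual": ["TA", "Gd", "Ex", "Fa"],
    "Garage Type": ["Attchd", "Detchd", "Attchd", "BuiltIn"],
    "target": [120000, 180000, 310000, 90000],
})
encoded = encode_quality_columns(sample)
print(encoded)
```

Columns that are left as text still need their own treatment (for example one-hot encoding) before they can enter a correlation matrix.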
Heatmap before the fix
# Of all the options, I think the correlation heatmap may be the most useful visualization for data analysis.
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

plt.figure(figsize=(15,10))

heat_table = corr_df.corr()
# Boolean mask: True hides the upper triangle (including the diagonal)
mask = np.zeros_like(heat_table, dtype=bool)
mask[np.triu_indices_from(mask)] = True
heatmap_ax = sns.heatmap(heat_table, annot=True, mask = mask, cmap='coolwarm')
# Tilt the x labels and set the font size; the rotation angle is counterclockwise
heatmap_ax.set_xticklabels(heatmap_ax.get_xticklabels(), fontsize=15, rotation=45)
heatmap_ax.set_yticklabels(heatmap_ax.get_yticklabels(), fontsize=15)
plt.title('correlation between features', fontsize=40)
plt.show()

# Features with the highest correlation to target
heat_table.loc[:,'target'].sort_values().tail(7)

Heatmap after the fix
plt.figure(figsize=(15,10))
heat_table2 = data2.corr()  # pandas correlation matrix
# Build a triangular mask (True in the upper triangle, False in the lower triangle)
mask = np.zeros_like(heat_table2, dtype=bool)  # all-False matrix with the same shape as heat_table2
mask[np.triu_indices_from(mask)] = True
# heatmap
heatmap_ax = sns.heatmap(heat_table2, annot=True, mask = mask, cmap='coolwarm')
# x-axis settings: tilt the labels and set the font size; the rotation angle is counterclockwise
heatmap_ax.set_xticklabels(heatmap_ax.get_xticklabels(), fontsize=15, rotation=45)
# y-axis settings
heatmap_ax.set_yticklabels(heatmap_ax.get_yticklabels(), fontsize=15)
plt.title('correlation between features', fontsize=40)
plt.show()

# Features with the highest correlation to target
heat_table2.loc[:,'target'].sort_values().tail(7)
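To see what the triangular mask in both heatmaps does, here is a small sketch on a made-up 3x3 correlation matrix (the values are illustrative only, not taken from the data). It builds the same mask with `np.triu`, which should be equivalent to the `np.triu_indices_from` assignment used above, and lists the cells that stay visible.

```python
import numpy as np

# Made-up symmetric correlation matrix for three features.
corr = np.array([
    [1.0, 0.8, -0.2],
    [0.8, 1.0, 0.1],
    [-0.2, 0.1, 1.0],
])

# True marks a hidden cell: the diagonal and everything above it.
mask = np.triu(np.ones_like(corr, dtype=bool))

# Cells that remain visible: the strictly lower triangle, row by row.
visible = corr[~mask]

print(mask)
print(visible)
```

Because a correlation matrix is symmetric and its diagonal is always 1, hiding these cells removes only redundant information from the plot.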

The result differs somewhat from the one obtained with sklearn.
Once again, for now a person still beats the AI.
๋ฐ˜์‘ํ˜•
Comments