Posts in Data/Dacon (6)
Let's Become a Dinosaur!

Correlation visualization (part of Chapter 1)
# First, convert the text-format data to numbers so that correlation coefficients can be computed.
from sklearn.preprocessing import LabelEncoder
# Deep copy
corr_df = data.copy()
obj_cols = corr_df.columns[corr_df.dtypes == 'O']
corr_df[obj_cols] = corr_df[obj_cols].astype(str).apply(LabelEncoder().fit_transform)
corr_df['Exter Qual'].unique()
## The correlation analysis shows many inversely related features.
# I suspect that the sklearn encoding did not quantify these columns properly.
corr_df.info()
Improved code..
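One likely cause of the concern noted in this excerpt: `LabelEncoder` assigns integer codes in sorted (alphabetical) label order, so an ordered grade column can receive codes that do not follow its real ranking. The following is a minimal sketch on made-up toy data, assuming the ordering Po < Fa < TA < Gd < Ex for `Exter Qual`. The `price` column, the `quality_order` mapping, and every value are hypothetical, and pandas category codes stand in for `LabelEncoder` here because they are also assigned in sorted order.

```python
import pandas as pd

# Toy data (hypothetical): price rises with the exterior quality grade.
toy = pd.DataFrame({
    "Exter Qual": ["Po", "Fa", "TA", "Gd", "Ex"],
    "price": [100, 150, 200, 250, 300],
})

# Codes assigned in sorted label order: Ex=0, Fa=1, Gd=2, Po=3, TA=4.
# This mirrors how an alphabetical label encoding numbers the grades.
alphabetical = toy["Exter Qual"].astype("category").cat.codes

# Explicit ordinal mapping (an assumption about the grade ranking).
quality_order = {"Po": 0, "Fa": 1, "TA": 2, "Gd": 3, "Ex": 4}
ordinal = toy["Exter Qual"].map(quality_order)

# Pearson correlation of each encoding with the toy price column.
alphabetical_corr = alphabetical.corr(toy["price"])
ordinal_corr = ordinal.corr(toy["price"])
print(alphabetical_corr, ordinal_corr)
```

On this toy data the alphabetical codes give a negative correlation with price, while the ordinal mapping gives a perfectly positive one, which may explain some of the apparent inverse relationships.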

Visualizing numeric and nominal data
# Numeric data
numeric_feature = data.columns[(data.dtypes==int) | (data.dtypes== float)]
# Categorical data
categorical_feature = data.columns[data.dtypes=='O']
import matplotlib.pyplot as plt
%matplotlib inline
plt.style.use("ggplot")
feature = numeric_feature
# Use boxplots to inspect the distribution of the data.
plt.figure(figsize=(20,15))
plt.suptitle("Boxplots", fontsize=40)
for i in range(len(feature))..
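A comparison such as `data.dtypes == int` matches only the default integer dtype and may miss columns stored as, for example, `int32`. The sketch below shows one alternative, `select_dtypes`, and also counts the points a boxplot would draw as outliers under the common 1.5 × IQR whisker rule. The toy frame and the `count_outliers` helper are hypothetical and not taken from the original post.

```python
import pandas as pd

# Toy data (hypothetical): two numeric columns and one text column.
toy = pd.DataFrame({
    "area": [1, 2, 3, 4, 100],
    "price": [10.0, 11.0, 12.0, 13.0, 14.0],
    "kind": ["a", "b", "a", "b", "a"],
})

# Select columns by dtype family instead of comparing dtypes one by one.
numeric_feature = toy.select_dtypes(include="number").columns
categorical_feature = toy.select_dtypes(exclude="number").columns


def count_outliers(series, whis=1.5):
    """Count values outside [Q1 - whis*IQR, Q3 + whis*IQR]."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    outside = (series < q1 - whis * iqr) | (series > q3 + whis * iqr)
    return int(outside.sum())


outlier_counts = {col: count_outliers(toy[col]) for col in numeric_feature}
print(list(numeric_feature), list(categorical_feature), outlier_counts)
```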
Nominal data conversion and heatmaps will be covered in detail in Chapter 3.
Optional (lab)
!pip install pandas
!pip install numpy
!pip install matplotlib
!pip install seaborn
!pip install scikit-learn
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.preprocessing import LabelEncoder
# 1. train.csv : training data
# id : unique ID of each record
# OverallQual : overall material and finish quality
# YearBuilt : year construction was completed
# YearRemodAdd : ..

fig, axes = plt.subplots(4, 3, figsize=(25, 15))
fig.suptitle('feature distributions per quality', fontsize=40)
for ax, col in zip(axes.flat, train.columns[1:]):
    sns.violinplot(x='quality', y=col, ax=ax, data=train)
    ax.set_title(col, fontsize=20)
plt.tight_layout()
plt.show()
sns.color_palette("Set2")
fig, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize=(30, 9))
scatter_fix = sns.histplot(data..

Checking the class distribution
counted_values = train['quality'].value_counts()
plt.style.use('ggplot')
plt.figure(figsize=(12, 10))
plt.title('class counting', fontsize=30)
value_bar_ax = sns.barplot(x=counted_values.index, y=counted_values)
value_bar_ax.tick_params(labelsize=20)
Checking the characteristics of each wine quality level
qualities = {}
for i in range(4, 9):
    quality_description = train[train['quality'] == i].drop(['id', 'quality'], axis=1)..
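The per-quality loop in this excerpt is cut off, so the following is only a rough sketch of the same idea: count the classes, then summarize the features within each quality level. It uses `groupby` rather than the original `range(4, 9)` loop, and the toy `train` frame with its `alcohol` column is hypothetical.

```python
import pandas as pd

# Toy stand-in for train.csv (hypothetical values).
train = pd.DataFrame({
    "id": [0, 1, 2, 3],
    "quality": [5, 5, 6, 6],
    "alcohol": [9.0, 10.0, 11.0, 13.0],
})

# Class distribution, ordered by quality level rather than by frequency.
counted_values = train["quality"].value_counts().sort_index()

# Summary statistics of the remaining features for each quality level.
qualities = {
    int(q): group.drop(columns=["id", "quality"]).describe()
    for q, group in train.groupby("quality")
}

print(counted_values)
print(qualities[5])
```

Grouping covers only the quality levels that actually occur in the data, so no empty summaries are produced for missing levels.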

Goal: analyze the component content of each wine and classify its quality
Loading the data
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib import patches
%matplotlib inline
train = pd.read_csv('train.csv')
train.head()
Checking for missing values
def check_missing_col(dataframe):
    counted_missing_col = 0
    for i, col in enumerate(dataframe.columns):
        missing_values = sum(dataframe[col].isna())
        is_missing = True if missing_va..
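Because `check_missing_col` is truncated in this excerpt, the block below is not a completion of it. It is a separate, hypothetical helper (`summarize_missing`) that sketches one vectorized way to report missing values per column, shown on made-up data.

```python
import pandas as pd


def summarize_missing(dataframe):
    """Return {column: number of missing values} for columns with any NaN."""
    counts = dataframe.isna().sum()
    return {col: int(n) for col, n in counts.items() if n > 0}


# Toy data (hypothetical): column "a" has one missing value, "b" has none.
toy = pd.DataFrame({"a": [1.0, None, 3.0], "b": [1, 2, 3]})
missing = summarize_missing(toy)
print(missing if missing else "No missing values found.")
```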