!! Steps to apply machine learning to real-world data


WakaraNai 2020. 10. 8. 18:08


Step 1. Gather the Data

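- Prevent data leakage! -> Exclude any variable that is updated or created after the target value is determined.

+) Target leakage: occurs when the predictors include data that will not be available at the time you make a prediction.

Think about it in terms of timing, i.e. the chronological order in which each piece of data becomes available.

Target leakage example: a pneumonia dataset. Future patients will not yet have received antibiotics when a prediction is needed, so a model or table that depends on that information cannot be applied in the real world.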
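The chronological check above can be sketched in a few lines of plain Python. This is only an illustration: the column names and the recording dates are hypothetical, and real datasets rarely come with such metadata, so in practice you would have to work out the timing from the data description.

```python
from datetime import date

# Hypothetical metadata: the date on which each column's value is recorded
# for a patient whose pneumonia diagnosis (the target) is made on 2020-03-10.
TARGET_RECORDED = date(2020, 3, 10)
COLUMN_RECORDED = {
    "age": date(2020, 3, 1),
    "weight": date(2020, 3, 1),
    "took_antibiotic_medicine": date(2020, 3, 12),  # set after the diagnosis
}


def safe_columns(column_recorded, target_recorded):
    """Keep only columns whose values exist before the target is determined."""
    return [
        name
        for name, recorded in column_recorded.items()
        if recorded < target_recorded
    ]


features = safe_columns(COLUMN_RECORDED, TARGET_RECORDED)
print(features)  # ['age', 'weight']
```

The antibiotics column is dropped because its value only exists after the target is known.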

+) On Kaggle, go to the "Data" tab and read the 'Data Description' carefully.

Then go to the "Notebooks" tab and click "New Notebook".

Step 2. Prepare the Data

- Handle missing values and define the categorical data -> "Feature Engineering"
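As a rough sketch of both preparation tasks, the snippet below fills a missing numeric value with the column mean and one-hot encodes a categorical column. It uses plain Python to stay dependency-free; the rows and the column names `size` and `color` are made up, and in a Kaggle notebook you would more likely use pandas or scikit-learn for this.

```python
from statistics import mean

# Hypothetical rows: None marks a missing value, "color" is categorical.
rows = [
    {"size": 10.0, "color": "red"},
    {"size": None, "color": "blue"},
    {"size": 14.0, "color": "red"},
]


def impute_mean(rows, column):
    """Replace missing values in a numeric column with the column mean."""
    known = [r[column] for r in rows if r[column] is not None]
    fill = mean(known)
    return [{**r, column: fill if r[column] is None else r[column]} for r in rows]


def one_hot(rows, column):
    """Replace a categorical column with one 0/1 column per category."""
    categories = sorted({r[column] for r in rows})
    encoded = []
    for r in rows:
        new_row = {k: v for k, v in r.items() if k != column}
        for cat in categories:
            new_row[f"{column}_{cat}"] = 1 if r[column] == cat else 0
        encoded.append(new_row)
    return encoded


prepared = one_hot(impute_mean(rows, "size"), "color")
for row in prepared:
    print(row)
```

On real data the fill value should be computed from the training rows only, so that nothing from the validation data leaks into the features.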

Step 3. Select a Model

- Reference: "Data Science for tabular data: Advanced Techniques", www.kaggle.com/vbmokin/data-science-for-tabular-data-advanced-techniques


Step 4. Train the Model

- Fit 'decision trees' and 'random forests' to patterns in training data

Step 5. Evaluate the Model

- Use a 'validation set' to check that the model works well on data it has not seen.
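Steps 4 and 5 together might look like the sketch below, assuming scikit-learn and NumPy are available. The data is synthetic and only stands in for a real dataset, and mean absolute error is just one possible metric; none of these choices come from the course material itself.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for a real dataset (assumption, not from the course).
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 3))
y = 3.0 * X[:, 0] + np.sin(X[:, 1]) + rng.normal(0, 0.5, size=200)

# Hold out a validation set that the models never see during fitting.
X_train, X_valid, y_train, y_valid = train_test_split(X, y, random_state=0)

models = {
    "decision tree": DecisionTreeRegressor(random_state=0),
    "random forest": RandomForestRegressor(n_estimators=100, random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)           # Step 4: fit to the training data
    predictions = model.predict(X_valid)  # predict rows the model has not seen
    scores[name] = mean_absolute_error(y_valid, predictions)  # Step 5: evaluate
    print(f"{name}: validation MAE = {scores[name]:.3f}")
```

A lower validation MAE suggests the model generalizes better to unseen rows.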

Step 6. Tune Parameters

+) An XGBoost model is sometimes used as well. www.kaggle.com/alexisbcook/xgboost
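A minimal sketch of parameter tuning, assuming scikit-learn: try a few values of one parameter and keep the one with the lowest validation error. Note that this tunes `max_leaf_nodes` of a plain decision tree rather than XGBoost, and that the data and the candidate values are made up for illustration.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for a real dataset (assumption).
rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=(300, 2))
y = X[:, 0] ** 2 + 5.0 * X[:, 1] + rng.normal(0, 2.0, size=300)
X_train, X_valid, y_train, y_valid = train_test_split(X, y, random_state=1)


def validation_mae(max_leaf_nodes):
    """Fit a tree with the given size limit and score it on the validation set."""
    model = DecisionTreeRegressor(max_leaf_nodes=max_leaf_nodes, random_state=1)
    model.fit(X_train, y_train)
    return mean_absolute_error(y_valid, model.predict(X_valid))


candidates = [2, 5, 25, 100]  # hypothetical search grid
results = {n: validation_mae(n) for n in candidates}
best = min(results, key=results.get)
print(results)
print("best max_leaf_nodes:", best)
```

The same loop works for any single parameter of any model, as long as the score comes from data that was not used for fitting.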

Step 7. Get Predictions

- Use the model that was fitted on the training data and checked on the validation data to generate your predictions.

Step 8. Submit Your Results to the Competition

1. Click the "Save Version" button at the top right.

2. Select "Save & Run All (Commit)", then click "Save".

3. If the save succeeded, the number next to the "Save Version" button increases.

Click it, then choose your current notebook.

4. Click "Go to Viewer". After about a second a new tab opens; switch to it.

5. Go to "Output", then click "Submit".

-> Only the final result file, Submission.csv, is submitted! Kaggle then scores its accuracy.
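For reference, a submission file can be produced with a sketch like the one below, using only the standard library. The column names `Id` and `SalePrice` and the example values are assumptions; each competition defines its own required columns, so check its data description first.

```python
import csv
import io


def write_submission(out, ids, predictions, id_col="Id", target_col="SalePrice"):
    """Write one row per test example: its id and the predicted value."""
    if len(ids) != len(predictions):
        raise ValueError("ids and predictions must have the same length")
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([id_col, target_col])
    writer.writerows(zip(ids, predictions))


# Hypothetical test ids and model predictions, written to an in-memory buffer.
# To create the real file, pass open("Submission.csv", "w", newline="") instead.
buffer = io.StringIO()
write_submission(buffer, [1461, 1462], [120000.0, 155000.5])
print(buffer.getvalue())
```

Writing to a file-like object keeps the function easy to check before the notebook is committed.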
