Posts in the category 'Machine Learning/[Kaggle Course] ML (+ Deep Learning, Computer Vision)' (Page 3)


Machine Learning/[Kaggle Course] ML (+ Deep Learning, Computer Vision) 30

Examples of missing values: 1. A two-bedroom house has no value for the size of a third bedroom. 2. A survey respondent may not report their income. Three ways to handle missing values: 1. A Simple Option: Drop Columns with Missing Values. Delete every column that contains missing values. This is fine as long as most columns contain no missing values; if only about 1 in 100 is affected, accuracy does not suffer much. !! Be sure to drop the same columns from X_valid as well !! # Get names of columns with missing values cols_with_missing = [col for col in X_train.colu..
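The excerpt's code is cut off, so the following is only a minimal sketch of the drop-columns approach. The toy DataFrames and the names `reduced_X_train` and `reduced_X_valid` are assumptions for illustration, not the post's data.

```python
import pandas as pd

# Toy data; the values are made up for illustration.
X_train = pd.DataFrame({
    "Rooms": [2, 3, 4],
    "Landsize": [120.0, None, 300.0],  # contains a missing value
    "Bathroom": [1, 2, 2],
})
X_valid = pd.DataFrame({
    "Rooms": [3, 2],
    "Landsize": [150.0, 90.0],
    "Bathroom": [1, 1],
})

# Get names of columns with missing values in the training data
cols_with_missing = [col for col in X_train.columns
                     if X_train[col].isnull().any()]

# Drop the same columns from both sets so their features stay identical
reduced_X_train = X_train.drop(cols_with_missing, axis=1)
reduced_X_valid = X_valid.drop(cols_with_missing, axis=1)

print(cols_with_missing)
print(list(reduced_X_valid.columns))
```

Note that `Landsize` is dropped from `X_valid` even though it has no missing values there, because the model must see the same columns at training and validation time.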

Machine Learning/[Kaggle Course] ML (+ Deep Learning, Computer Vision) 2020.10.11

[Kaggle Course] Introduction

Learn to handle missing values, non-numeric values, data leakage, and more. Your models will be more accurate and useful. You will accelerate your machine learning expertise by learning how to: tackle data types often found in real-world datasets (missing values, categorical variables), design pipelines to improve the quality of your machine learning code, use advanced techniques for model validat..

Machine Learning/[Kaggle Course] ML (+ Deep Learning, Computer Vision) 2020.10.11

Intro to AutoML

Automated machine learning with Google Cloud AutoML Tables (drawback: paid): cloud.google.com/automl-tables/docs/beginners-guide

Machine Learning/[Kaggle Course] ML (+ Deep Learning, Computer Vision) 2020.10.08

!! Steps to apply machine learning to real-world data

Step 1. Gather the Data - Prevent data leakage! -> Exclude any variable that was updated or created after the target value was determined. +) Target leakage: occurs when the predictors include data that will not be available at prediction time. Think about it in terms of the timing, or chronological order, in which each piece of data becomes available. +) On Kaggle, go to the "Data" tab and read the 'Data Description' carefully. Then go to the "Notebooks" tab and click "New Notebook". Step 2. Prepare the Data - handle missing values, categor..
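As a rough illustration of the target-leakage rule in Step 1, the sketch below drops a column that was recorded after the target was determined. The patient data, the column names, and the `created_after_target` list are all hypothetical; in practice that list comes from reading the data description.

```python
import pandas as pd

# Hypothetical data: 'took_antibiotic' is recorded AFTER the diagnosis
# ('got_pneumonia'), so it would not be available at prediction time.
data = pd.DataFrame({
    "age": [34, 61, 47, 25],
    "weight": [70, 82, 65, 58],
    "took_antibiotic": [0, 1, 1, 0],
    "got_pneumonia": [0, 1, 1, 0],
})

target = "got_pneumonia"
# Assumption: the data description tells us which columns were created
# or updated after the target value was determined.
created_after_target = ["took_antibiotic"]

y = data[target]
X = data.drop(columns=[target] + created_after_target)

print(list(X.columns))
```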

Machine Learning/[Kaggle Course] ML (+ Deep Learning, Computer Vision) 2020.10.08

Random Forest

1. Random Forest characteristics - Uses many trees; predictive accuracy is better than with a single decision tree. - Works well with default parameters. - Not sensitive to the maximum tree size, so it tends to give good predictions without tuning. The model performs much better than a single decision tree because it makes a prediction with each of many decision trees and then averages them. +) See Intermediate Machine Learning: XGBoost - Introduction - wakaranaiyo.tistory.com/17 2. Example from sklearn.ensemble import RandomForestRegressor from sklearn.metrics im..
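The averaging described above can be checked directly. This is a minimal sketch on synthetic data (an assumption, since the post's example is cut off): it averages the individual trees' predictions by hand and compares the result with the forest's own prediction.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic regression data (not the post's dataset)
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 2))
y = 3 * X[:, 0] + np.sin(X[:, 1]) + rng.normal(0, 0.5, size=200)

forest = RandomForestRegressor(n_estimators=50, random_state=1)
forest.fit(X, y)

# Average the predictions of the individual trees by hand
per_tree = np.array([tree.predict(X) for tree in forest.estimators_])
manual_average = per_tree.mean(axis=0)

# Compare with the forest's own prediction
matches = np.allclose(forest.predict(X), manual_average)
print(matches)
```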

Machine Learning/[Kaggle Course] ML (+ Deep Learning, Computer Vision) 2020.09.28

[Kaggle Courses] UnderFitting vs OverFitting

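1. OverFitting -> deep tree: the splitting criteria are too fine-grained, so there are too many groups and accuracy on new data drops. 2. UnderFitting -> shallow tree: there are too few splitting criteria, so there are too few groups and accuracy drops. 3. Comparing the predictive accuracy of each model: compare MAE across values of max_leaf_nodes. The max_leaf_nodes argument is the maximum number of leaves. Too many leaves (more nodes) -> overfitting -> the model becomes too sensitive; too few leaves -> underfitting. What is the ideal tree size? The tree size that minimizes MAE. # Use a for-loop over the maximum number of leaves (nodes) to find the ideal tr..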
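The post's loop is truncated, so here is one possible sketch of the search it describes. The helper name `get_mae`, the candidate sizes, and the synthetic data are assumptions for illustration.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor


def get_mae(max_leaf_nodes, train_X, val_X, train_y, val_y):
    """Fit a tree with the given leaf limit and return its validation MAE."""
    model = DecisionTreeRegressor(max_leaf_nodes=max_leaf_nodes, random_state=0)
    model.fit(train_X, train_y)
    return mean_absolute_error(val_y, model.predict(val_X))


# Synthetic data (not the post's dataset)
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(300, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.3, size=300)
train_X, val_X, train_y, val_y = train_test_split(X, y, random_state=0)

# Try several tree sizes and keep the one with the lowest validation MAE
candidates = [2, 5, 25, 100, 200]
scores = {n: get_mae(n, train_X, val_X, train_y, val_y) for n in candidates}
best_size = min(scores, key=scores.get)

for n, mae in scores.items():
    print(f"max_leaf_nodes={n:>3}  MAE={mae:.3f}")
print("best:", best_size)
```

Very small limits tend to underfit and very large ones tend to overfit, so the lowest validation MAE usually falls somewhere in between.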

Machine Learning/[Kaggle Course] ML (+ Deep Learning, Computer Vision) 2020.09.27

[Kaggle Courses] What is Model Validation (Evaluating)

Evaluating: checking the predictive accuracy of the model I built, that is, summarizing the model's quality. 1. One way to evaluate: MAE (Mean Absolute Error) error = actual - predicted from sklearn.metrics import mean_absolute_error predicted_data_y = data_model.predict(X) mean_absolute_error(y, predicted_data_y) 2. The problem with the In-Sample Score -> do not use this method. In-Sample Score: predict on the training data and compare against the training data's target values..
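To make the MAE definition concrete, the following sketch computes it by hand in plain Python. The function name and the numbers are made up for illustration; `sklearn.metrics.mean_absolute_error` computes the same quantity.

```python
def mean_absolute_error_manual(actual, predicted):
    """Average of |actual - predicted| over all rows."""
    errors = [abs(a - p) for a, p in zip(actual, predicted)]
    return sum(errors) / len(errors)


actual = [100, 200, 300]
predicted = [110, 190, 340]

# Absolute errors are 10, 10 and 40, so the mean is 20.0
mae = mean_absolute_error_manual(actual, predicted)
print(mae)
```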

Machine Learning/[Kaggle Course] ML (+ Deep Learning, Computer Vision) 2020.09.26

[Kaggle Courses] From Fitting to Prediction

1. Selecting Data for Modeling data = pd.read_csv( filename ) data.columns data = data.dropna(axis=0) - Selecting The Prediction Target: use dot-notation to extract the needed column as the prediction target (y) y = data.Price 2. Choosing "Features" (X) data_features = ['Rooms', 'Bathroom', 'Landsize', 'Lattitude', 'Longtitude'] X = data[data_features] 3. Building My Model Define: what type of model is it? (a decision tree? something else?) Fit: the patterns in the data..
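The steps above can be sketched end to end on a tiny stand-in table. The values below are made up and only two of the post's feature columns are used; the `DecisionTreeRegressor` choice is an assumption, since the excerpt is cut off before naming the model.

```python
import pandas as pd
from sklearn.tree import DecisionTreeRegressor

# Tiny stand-in for the Melbourne data; values are made up.
data = pd.DataFrame({
    "Rooms": [2, 3, 4, 3, None],
    "Landsize": [150.0, 200.0, 450.0, 300.0, 120.0],
    "Price": [500000, 650000, 900000, 720000, 480000],
})
data = data.dropna(axis=0)  # drop rows that contain missing values

y = data.Price                          # prediction target (dot-notation)
data_features = ["Rooms", "Landsize"]   # chosen features
X = data[data_features]

# Define the model, then fit it to the data
data_model = DecisionTreeRegressor(random_state=1)
data_model.fit(X, y)

# Predict on the first few training rows
predictions = data_model.predict(X.head())
print(predictions)
```

Because the tree is unrestricted and every training row is distinct, these in-sample predictions reproduce the training prices exactly, which is why they say little about accuracy on new data.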

Machine Learning/[Kaggle Course] ML (+ Deep Learning, Computer Vision) 2020.09.26

[Kaggle Courses] Basic Data Exploration - Ex.MelbourneHomePrice

Prediction of New House Price in Melbourne: let's build a model of how a house's Price depends on ['Rooms', 'Bathroom', 'Landsize', 'Lattitude', 'Longtitude']. In [6]: import pandas as pd #It has DataFrame(SQL) melbourne_file_path = r"C:\Users\32mou\Desktop\melb_data.csv\melb_data.csv" melbourne_data = pd.read_csv(melbourne_file_path) melbourne_data.describe() # Checking for missing values is important Out[6]: Rooms P..
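One way to read the "checking for missing values" comment: the `count` row of `describe()` counts only non-missing values, so a column with a lower count has gaps. The sketch below uses a small made-up table rather than the real `melb_data.csv`.

```python
import pandas as pd

# Made-up stand-in for melb_data.csv
melbourne_data = pd.DataFrame({
    "Rooms": [2, 3, 4, 3],
    "Landsize": [150.0, None, 450.0, 300.0],
    "Price": [500000, 650000, 900000, 720000],
})

# 'count' reports non-missing values per column
summary = melbourne_data.describe()
print(summary.loc["count"])

# A direct count of missing values per column
missing_per_column = melbourne_data.isnull().sum()
print(missing_per_column)
```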

Machine Learning/[Kaggle Course] ML (+ Deep Learning, Computer Vision) 2020.09.22

[Kaggle Courses] How Models Work

1. Decision Tree (Yes or No) Fitting or training the model: use data to decide how to split the houses into two groups, capturing patterns from the data. Training data == the data used to fit the model. Predict == after the model has been fit, apply it to new data. 2. Improving the Decision Tree (Yes or No) +) leaf: a terminal node of the decision tree
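To illustrate fitting and predicting with a single yes/no question, here is a hand-written one-split tree in plain Python. The data, the function names, and the use of absolute error to pick the threshold are assumptions for illustration; library implementations use their own split criteria.

```python
def leaf_value(prices):
    """A leaf predicts the mean price of the rows that reach it."""
    return sum(prices) / len(prices)


def split_error(rows, threshold):
    """Total absolute error if rows are split at 'rooms <= threshold'."""
    small = [p for r, p in rows if r <= threshold]
    large = [p for r, p in rows if r > threshold]
    if not small or not large:
        return float("inf")  # not a real split
    return (sum(abs(p - leaf_value(small)) for p in small)
            + sum(abs(p - leaf_value(large)) for p in large))


def fit_one_split(rows):
    """Fitting: use the data to decide where to break it into two groups."""
    thresholds = sorted({r for r, _ in rows})
    best = min(thresholds, key=lambda t: split_error(rows, t))
    return {
        "threshold": best,
        "leaf_small": leaf_value([p for r, p in rows if r <= best]),
        "leaf_large": leaf_value([p for r, p in rows if r > best]),
    }


def predict(tree, rooms):
    """Predicting: answer the yes/no question and return the leaf value."""
    if rooms <= tree["threshold"]:
        return tree["leaf_small"]
    return tree["leaf_large"]


# (rooms, price) training pairs; values are made up
training_data = [(1, 300), (2, 400), (3, 700), (4, 900)]
tree = fit_one_split(training_data)
print(tree["threshold"], predict(tree, 2), predict(tree, 5))
```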

Machine Learning/[Kaggle Course] ML (+ Deep Learning, Computer Vision) 2020.09.22


WakaraNai
