Data Mining - prediction mechanism

WakaraNai 2021. 12. 10. 04:26

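Data mining is the process of semi-automatically analyzing large databases (such as NoSQL stores) to discover useful patterns, largely by means of machine learning techniques.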

KDD: Knowledge Discovery in Databases

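Prediction mechanisms (predict outcomes based on past history)

classification: predict which class a new instance belongs to
regression: predict the numeric value of a variable for a new instance

Descriptive patterns

association: items or events that tend to occur together (similar relations); used to detect possible causes (e.g., of a disease) and in recommender systems
clusters: groups of similar items; used, e.g., to detect epidemic outbreaks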

Classification Rules

Example:

Given a new applicant for car insurance, should he or she be classified as low risk, medium risk, or high risk?

Decision Tree

Each internal node splits the instances into groups based on a partitioning attribute and a partitioning condition for that node; each leaf assigns a class
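A minimal sketch of how a finished decision tree classifies an instance: starting at the root, follow each node's partitioning condition until a leaf is reached. The tree, its attributes (`age`, `accidents`), and its thresholds are hypothetical, chosen to echo the car-insurance example; how the partitioning attribute and condition are chosen during training (e.g., by information gain or the Gini index) is not shown.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    label: str  # class assigned to every instance that reaches this leaf


@dataclass
class Node:
    attribute: str     # partitioning attribute
    threshold: float   # partitioning condition: value <= threshold goes left
    left: "Tree"
    right: "Tree"


Tree = Union[Leaf, Node]


def classify(tree: Tree, instance: dict) -> str:
    """Follow the partitioning conditions from the root down to a leaf."""
    while isinstance(tree, Node):
        if instance[tree.attribute] <= tree.threshold:
            tree = tree.left
        else:
            tree = tree.right
    return tree.label


# Hypothetical tree for the car-insurance example
risk_tree = Node("accidents", 0,
                 Node("age", 25, Leaf("medium"), Leaf("low")),
                 Leaf("high"))

print(classify(risk_tree, {"age": 40, "accidents": 0}))  # low
print(classify(risk_tree, {"age": 22, "accidents": 0}))  # medium
print(classify(risk_tree, {"age": 50, "accidents": 3}))  # high
```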

Bayesian Classifier

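Bayes' theorem: p(cj | d) = p(d | cj) * p(cj) / p(d), and the classifier predicts the class cj with the highest p(cj | d)

cj: class j

d: an instance

p(cj | d): probability that instance d belongs to class cj

p(d | cj): probability of generating instance d given class cj; estimated from the training data (computed with the naive Bayesian formula)

p(cj): probability of class cj occurring (the prior); estimated from the training data

p(d): probability of instance d occurring; it is the same for every class, so it can be ignored when choosing the most likely class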
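The formula can be illustrated with a small worked computation. The class names and all probabilities below are made-up values for one hypothetical instance d; the point is that dividing by p(d) changes the numbers but not which class has the highest p(cj | d).

```python
def posterior(priors, likelihoods):
    """Bayes' theorem for one fixed instance d.

    priors[c]      = p(c)
    likelihoods[c] = p(d | c)
    Returns p(c | d) for every class c.
    """
    joint = {c: likelihoods[c] * priors[c] for c in priors}  # p(d | c) * p(c)
    p_d = sum(joint.values())  # p(d), summed over all classes
    return {c: joint[c] / p_d for c in joint}


# Hypothetical numbers for one applicant d
priors = {"low": 0.6, "medium": 0.3, "high": 0.1}
likelihoods = {"low": 0.05, "medium": 0.2, "high": 0.5}

post = posterior(priors, likelihoods)
print(max(post, key=post.get))  # medium

# Skipping the division by p(d) does not change which class wins
unnormalized = {c: likelihoods[c] * priors[c] for c in priors}
print(max(unnormalized, key=unnormalized.get))  # medium
```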

Naive Bayesian Classifier

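Assumes the attributes d1, d2, …, dn of an instance are independent given the class, so p(d | cj) = p(d1 | cj) * p(d2 | cj) * … * p(dn | cj)

Each p(di | cj) is estimated from a histogram of di values for class cj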
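A minimal sketch of a naive Bayesian classifier over categorical attributes, where each per-class histogram is simply a count of attribute values. The training data and attribute meanings (age group, prior accidents) are hypothetical. No smoothing is applied, so an attribute value never seen with a class gives that class probability 0; Laplace smoothing is the usual fix.

```python
from collections import Counter, defaultdict


def train_naive_bayes(instances, labels):
    """Count class frequencies and, per class, a histogram of each attribute's values."""
    class_counts = Counter(labels)
    # hist[c][i][v] = number of class-c training instances whose attribute i equals v
    hist = defaultdict(lambda: defaultdict(Counter))
    for d, c in zip(instances, labels):
        for i, v in enumerate(d):
            hist[c][i][v] += 1
    return class_counts, hist


def nb_classify(model, d):
    """Return the class maximizing p(c) * p(d1 | c) * ... * p(dn | c)."""
    class_counts, hist = model
    total = sum(class_counts.values())
    best, best_score = None, -1.0
    for c, n_c in class_counts.items():
        score = n_c / total                 # p(c)
        for i, v in enumerate(d):
            score *= hist[c][i][v] / n_c    # p(di | c) read off the histogram
        if score > best_score:
            best, best_score = c, score
    return best


# Hypothetical training data: (age group, prior accidents) -> risk class
instances = [("young", "yes"), ("young", "no"), ("old", "no"),
             ("old", "no"), ("young", "yes"), ("old", "yes")]
labels = ["high", "medium", "low", "low", "high", "medium"]
model = train_naive_bayes(instances, labels)
print(nb_classify(model, ("young", "yes")))  # high
print(nb_classify(model, ("old", "no")))     # low
```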

SVM: Support Vector Machine Classifier

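Separates the classes by drawing a line between them (in 2 dimensions)

In n dimensions, it draws a (hyper)plane instead

The line that maximizes the margin, i.e., the distance to the nearest training points of each class, is called the maximum margin line

Depending on the data, the boundary may also be a curve rather than a straight line or flat plane

It is not always possible to draw a boundary that separates the classes perfectly

In that case, SVM simply picks the best boundary available, tolerating some misclassified points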
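A minimal sketch of a linear soft-margin SVM in pure Python, assuming two classes labeled +1 and -1. It minimizes lam/2 * ||w||^2 plus the average hinge loss by batch subgradient descent; this is only one way to train an SVM (libraries typically use dedicated solvers), and `lam`, `lr`, `epochs`, and the toy data are illustrative choices. The hinge loss is what lets the method settle on the best boundary when a perfect separation does not exist.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def train_linear_svm(points, labels, lam=0.01, lr=0.1, epochs=500):
    """Batch subgradient descent on lam/2 * ||w||^2 + mean hinge loss.
    labels must be +1 or -1."""
    dim = len(points[0])
    n = len(points)
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        grad_w = [lam * wi for wi in w]       # gradient of the regularizer
        grad_b = 0.0
        for x, y in zip(points, labels):
            if y * (dot(w, x) + b) < 1:       # inside the margin or misclassified
                for i in range(dim):
                    grad_w[i] -= y * x[i] / n
                grad_b -= y / n
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Which side of the separating line (hyperplane) x falls on."""
    return 1 if dot(w, x) + b >= 0 else -1


# Hypothetical, linearly separable toy data
points = [(2, 2), (3, 3), (2, 3), (3, 2), (-2, -2), (-3, -3), (-2, -3), (-3, -2)]
labels = [1, 1, 1, 1, -1, -1, -1, -1]
w, b = train_linear_svm(points, labels)
print([predict(w, b, x) for x in points])
```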

N-ary classification can be done by N binary classifications

• In class i vs. not in class i; the final prediction is the class whose binary classifier is most confident
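The one-vs-rest scheme can be sketched as follows. To keep the example short, a simple perceptron stands in for the binary classifier (an assumption; a binary SVM could be plugged in instead), and the prediction is the class whose binary classifier gives the highest score. The three toy clusters are hypothetical and chosen so that each class is linearly separable from the rest.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def train_perceptron(points, targets, epochs=1000):
    """Binary perceptron; targets are +1 / -1. The last weight is the bias."""
    w = [0.0] * (len(points[0]) + 1)
    for _ in range(epochs):
        mistakes = 0
        for x, y in zip(points, targets):
            xb = list(x) + [1.0]            # constant input for the bias
            if y * dot(w, xb) <= 0:         # wrong side of (or on) the boundary
                w = [wi + y * xi for wi, xi in zip(w, xb)]
                mistakes += 1
        if mistakes == 0:                   # training set separated
            break
    return w


def train_one_vs_rest(points, labels):
    """One binary classifier per class i: 'in class i' (+1) vs. 'not in class i' (-1)."""
    return {c: train_perceptron(points, [1 if lab == c else -1 for lab in labels])
            for c in sorted(set(labels))}


def predict_one_vs_rest(models, x):
    """Pick the class whose binary classifier gives the highest score."""
    xb = list(x) + [1.0]
    return max(models, key=lambda c: dot(models[c], xb))


points = [(0, 5), (1, 6), (5, 0), (6, 1), (-5, -5), (-6, -4)]
labels = ["a", "a", "b", "b", "c", "c"]
models = train_one_vs_rest(points, labels)
print([predict_one_vs_rest(models, x) for x in points])
```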

Neural Network Classifier

For classification, each output value indicates the likelihood that the input instance belongs to the corresponding class

• Pick the class with the maximum likelihood

The value of a node may be a linear combination of its inputs,

or may be a nonlinear function of them

• E.g., the sigmoid function, 1 / (1 + e^(-x))

If every node is connected to every node in the next layer, the network is fully connected
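A minimal sketch of a forward pass through a small fully connected network with sigmoid nodes, followed by picking the class with the maximum likelihood. The layer sizes and weights are hypothetical values chosen by hand; here each output is an independent sigmoid score (a softmax output layer is a common alternative).

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dense_layer(inputs, weights, biases):
    """Fully connected layer: every input feeds every node of this layer.

    weights[j][i] is the weight from input i to node j; each node outputs
    sigmoid(linear combination of its inputs + bias).
    """
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


# Hypothetical network: 2 inputs -> 3 hidden nodes -> 2 output nodes (classes)
hidden_w = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
hidden_b = [0.0, 0.0, 0.0]
output_w = [[2.0, 0.0, 2.0], [-2.0, 0.0, -2.0]]
output_b = [0.0, 0.0]

hidden = dense_layer([1.0, 0.0], hidden_w, hidden_b)
scores = dense_layer(hidden, output_w, output_b)
predicted = max(range(len(scores)), key=lambda k: scores[k])  # class with max likelihood
print(predicted)  # 0
```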

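The backpropagation algorithm works as follows:
• Weights are initially set randomly
• Training instances are processed one at a time
▪ The output is computed using the current weights
▪ If the classification is wrong, the weights are adjusted so that the correct class gets a higher score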
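A minimal sketch of backpropagation for a network with one hidden layer and a single sigmoid output, trained on logical OR as a toy data set. Assumptions: cross-entropy loss, a fixed learning rate, and an update after every instance; as in standard gradient-based training, every instance adjusts the weights, with larger corrections when the output is far from the correct answer. The hidden-layer size, seed, and learning rate are illustrative.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(w1, w2, x):
    xb = list(x) + [1.0]                  # constant input for the bias weight
    h = [sigmoid(sum(w * xi for w, xi in zip(row, xb))) for row in w1]
    hb = h + [1.0]
    out = sigmoid(sum(w * hi for w, hi in zip(w2, hb)))
    return xb, h, hb, out


def total_loss(w1, w2, data):
    """Cross-entropy loss summed over the training set."""
    loss = 0.0
    for x, t in data:
        out = forward(w1, w2, x)[3]
        loss -= t * math.log(out) + (1 - t) * math.log(1 - out)
    return loss


def train(data, n_hidden=2, lr=0.5, epochs=2000, seed=0):
    rng = random.Random(seed)
    n_in = len(data[0][0]) + 1            # inputs plus bias
    # Weights are initially set randomly
    w1 = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
    w2 = [rng.uniform(-1, 1) for _ in range(n_hidden + 1)]
    for _ in range(epochs):
        for x, t in data:                 # one training instance at a time
            xb, h, hb, out = forward(w1, w2, x)   # output with the current weights
            # Error signals, propagated backwards from the output layer
            delta_out = out - t
            delta_h = [delta_out * w2[j] * h[j] * (1 - h[j]) for j in range(n_hidden)]
            # Adjust weights so the correct answer gets a higher score
            w2 = [w - lr * delta_out * hi for w, hi in zip(w2, hb)]
            for j in range(n_hidden):
                w1[j] = [w - lr * delta_h[j] * xi for w, xi in zip(w1[j], xb)]
    return w1, w2


# Logical OR as a tiny training set
data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
w1_init, w2_init = train(data, epochs=0)  # same seed: the untrained starting weights
w1, w2 = train(data)
print(total_loss(w1_init, w2_init, data), "->", total_loss(w1, w2, data))
print([round(forward(w1, w2, x)[3]) for x, _ in data])
```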

Deep learning

= training of deep neural networks (networks with many layers) on very large numbers of training instances

Regression

Used when the result must be a specific numeric value rather than a class

The goal is to find coefficients that fit the data well

Linear Regression

Given values for a set of variables X1, X2, …, Xn, we wish to predict the value of a variable Y.

One way is to infer coefficients a0, a1, a2, …, an such that Y = a0 + a1 * X1 + a2 * X2 + … + an * Xn

Finding such a linear polynomial is exactly what linear regression does
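The coefficients are usually inferred by least squares, i.e., by choosing a0, …, an to minimize the sum of squared differences between predicted and actual Y. A minimal pure-Python sketch that solves the normal equations (X^T X) a = X^T y with Gaussian elimination; the data points are hypothetical and generated from Y = 1 + 2 * X1 + 3 * X2, so the fit recovers exactly those coefficients.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [bi] for row, bi in zip(a, b)]   # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):                   # back substitution
        x[i] = (m[i][n] - sum(m[i][k] * x[k] for k in range(i + 1, n))) / m[i][i]
    return x


def fit_linear(rows, ys):
    """Least-squares [a0, a1, ..., an] for Y = a0 + a1*X1 + ... + an*Xn."""
    design = [[1.0] + list(r) for r in rows]         # leading 1 pairs with a0
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * y for d, y in zip(design, ys)) for i in range(p)]
    return solve(xtx, xty)                           # normal equations


def predict(coeffs, row):
    return coeffs[0] + sum(a * x for a, x in zip(coeffs[1:], row))


rows = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]     # (X1, X2)
ys = [1, 3, 4, 6, 8]                                # Y = 1 + 2*X1 + 3*X2
coeffs = fit_linear(rows, ys)
print([round(a, 6) for a in coeffs])                # [1.0, 2.0, 3.0]
print(round(predict(coeffs, (3, 2)), 6))            # 13.0
```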

Curve Fitting

The process of finding a curve that fits the data

The fit may only be approximate:

• because of noise in the data
• because the relationship is not exactly a polynomial
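Polynomial curve fitting is linear regression on the features 1, x, x², …. A minimal sketch assuming a degree-2 polynomial and least squares; the data are hypothetical points from y = 1 - 2x + 0.5x² with a small alternating ±0.1 perturbation standing in for noise, so the fitted coefficients come close to, but do not exactly match, the true ones. For high degrees the normal equations become ill-conditioned, and a QR-based solver is preferable.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [bi] for row, bi in zip(a, b)]   # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):                   # back substitution
        x[i] = (m[i][n] - sum(m[i][k] * x[k] for k in range(i + 1, n))) / m[i][i]
    return x


def fit_polynomial(xs, ys, degree):
    """Least-squares [c0, c1, ..., c_degree] for y ~ c0 + c1*x + ... + c_degree*x**degree."""
    design = [[x ** k for k in range(degree + 1)] for x in xs]
    p = degree + 1
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * y for d, y in zip(design, ys)) for i in range(p)]
    return solve(xtx, xty)


def evaluate(coeffs, x):
    return sum(c * x ** k for k, c in enumerate(coeffs))


xs = list(range(7))
# Hypothetical data: y = 1 - 2x + 0.5x^2 plus an alternating +/-0.1 "noise"
ys = [1 - 2 * x + 0.5 * x ** 2 + (0.1 if x % 2 == 0 else -0.1) for x in xs]
coeffs = fit_polynomial(xs, ys, degree=2)
sse = sum((y - evaluate(coeffs, x)) ** 2 for x, y in zip(xs, ys))
print([round(c, 3) for c in coeffs])   # close to [1, -2, 0.5], but not exact
print(sse > 0)                         # True: the fit is only approximate
```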

License: Attribution, NonCommercial, ShareAlike (CC BY-NC-SA)
