Big Data

WakaraNai 2022. 11. 15. 16:39

Three V's of Big Data

- volume : amount of data (scale)

- variety : range of data types and sources (diversity)

- velocity : speed of data in/out (speed)

4th V : veracity (accuracy)

5th V : value

Database

DBMS : Database Management System

- Relational Model : SQL

- Query Processing

- Transaction Management : a transaction is a collection of operations that performs a single logical function

- Concurrency-control manager : handles transactions that occur at the same time so that the database stays consistent

- Recovery : Redo & Undo
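
To make "a single logical function" concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `accounts` table, the names, and the `transfer` helper are made up for illustration. Both updates take effect together, or the whole transaction is rolled back (the "undo" side of recovery).

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one transaction; undo everything on failure."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes the block
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src)
            )
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE name = ?", (src,)
            ).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst)
            )
        return True
    except ValueError:
        return False

def balances(conn):
    """Return {name: balance} for every account."""
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))

# Hypothetical in-memory database used only for this example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()
```

Calling `transfer(conn, "alice", "bob", 500)` returns `False` and leaves both balances untouched, because the first update is rolled back.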

Big Data Technologies

NoSQL = Not Only SQL

Apache Hadoop : 4th big data

Allows for the distributed processing of large datasets across clusters of computers using simple programming models

Hadoop = HDFS(store) + MapReduce(process)

- Redundant, fault-tolerant data storage

- Parallel computation framework

- job coordination

Hadoop is Not ...

... relational database

... OLTP (online transaction processing) system

... structured data store of any kind

Hadoop used for ...

- recommendation system

- Natural Language Processing

- Data warehousing

- Market research / forecasting

- Financial analysis

- Correlation engines

- Image/video processing

- log analysis

- social networking

- health

- government

- telecommunication

Hadoop : HDFS

a distributed file system designed to run on commodity hardware

Goals

- hardware failure -> detect fault quickly, automatic recovery

- streaming data access -> batch processing (not interactive use by users)

- large data sets

- simple coherency model: write-once-read-many access

- "Moving Computation is Cheaper than Moving Data"
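
The hardware-failure goal can be illustrated with a toy model of block replication. This is only a sketch: the round-robin placement, the node names, and both function names are assumptions for illustration, and real HDFS uses a rack-aware placement policy instead.

```python
def place_blocks(file_size, block_size, nodes, replication=3):
    """Return {block_index: [node, ...]} using simple round-robin placement.

    Round-robin is an assumption for this sketch, not the HDFS policy.
    """
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    num_blocks = -(-file_size // block_size)  # ceiling division
    return {
        b: [nodes[(b + r) % len(nodes)] for r in range(replication)]
        for b in range(num_blocks)
    }

def readable_after_failure(placement, failed_nodes):
    """True if every block still has at least one replica on a live node."""
    failed = set(failed_nodes)
    return all(
        any(node not in failed for node in replicas)
        for replicas in placement.values()
    )
```

With four nodes and three replicas per block, the file stays readable when any two nodes fail. A real system would then re-replicate the blocks that lost replicas.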

Hadoop : Map Reduce

a programming model and an associated implementation for processing and generating large data sets

Programming model - 3 phases

- Map phase

- Sort phase

- Reduce phase
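
The three phases can be sketched on a single machine with a word count. This is plain Python, not the Hadoop API, and the function names are made up for illustration. In a real cluster, the map and reduce functions run in parallel on many nodes and the framework does the sorting.

```python
from itertools import groupby
from operator import itemgetter

def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.lower().split():
            yield (word, 1)

def sort_phase(pairs):
    """Sort: order the pairs by key so equal keys end up next to each other."""
    return sorted(pairs, key=itemgetter(0))

def reduce_phase(sorted_pairs):
    """Reduce: sum the counts of each run of identical keys."""
    for word, group in groupby(sorted_pairs, key=itemgetter(0)):
        yield (word, sum(count for _, count in group))

def word_count(lines):
    """Chain the three phases and collect the result in a dict."""
    return dict(reduce_phase(sort_phase(map_phase(lines))))
```

For example, `word_count(["big data", "Big Hadoop data data"])` counts "big" twice, "data" three times, and "hadoop" once.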

Advantages

- Distribute data and computation

- independent tasks

- linear scaling in the ideal case

- simple programming model : "end-user" only writes map-reduce tasks

Disadvantages

- still rough

- programming is very restrictive

- "Join" operations are tricky and slow

- cluster management is hard (debugging ...)

- limited scaling
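
One way to see why joins are awkward in this model is a reduce-side join, sketched below in plain Python (not the Hadoop API). The `users` and `orders` inputs and the function names are made up for illustration. Each record has to be tagged with its source, all records for a key have to be gathered in one place, and only then can the reducer pair them up.

```python
from collections import defaultdict

def map_tagged(users, orders):
    """Map: tag each record with the dataset it came from, keyed by user id."""
    for user_id, name in users:
        yield user_id, ("user", name)
    for user_id, item in orders:
        yield user_id, ("order", item)

def reduce_join(tagged_pairs):
    """Reduce: for each key, pair every user record with every order record."""
    groups = defaultdict(list)
    for key, tagged in tagged_pairs:  # stands in for the framework's sort/shuffle
        groups[key].append(tagged)
    for key in sorted(groups):
        names = [value for tag, value in groups[key] if tag == "user"]
        items = [value for tag, value in groups[key] if tag == "order"]
        for name in names:
            for item in items:
                yield (key, name, item)
```

Keys that appear in only one input produce no output, so this behaves like an inner join. The cost comes from moving every record of both datasets to the reducers.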

Five Emerging Big Data Technologies

1. Storm and Kafka : real-time stream processing

2. Drill and Dremel : ad-hoc querying

3. R : statistical programming language

4. Gremlin and Giraph : graph analysis

5. SAP HANA : in-memory analytics platform

6. Honorable mention : D3 : visualization (rendering data as charts)

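Big Data Analysis

- Text mining

- Web mining

- Opinion mining (news, etc.)

- Reality mining (mobile device usage)

- Social network analysis

- Classification

- Clustering

- Machine learning

- Sentiment analysis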

License: Attribution, NonCommercial, ShareAlike