[Spark ML] Implementing the LightGBM Algorithm with Spark
Key hyperparameters / Early stopping
Posted by
Wonyong Jang
on April 08, 2023 ·
4 mins read
[Spark] Installation and Spark Shell Practice
The script that launches the Scala Spark prompt
Posted by
Wonyong Jang
on January 27, 2023 ·
3 mins read
[Spark] Building a Spark Practice Environment with a Docker Ubuntu Container
Setting up a master/worker cluster environment with Docker / spark-submit / standalone cluster manager
Posted by
Wonyong Jang
on August 29, 2021 ·
14 mins read
[Spark] Broadcast and Accumulator Shared Variables
broadcast, accumulator, closure
Posted by
Wonyong Jang
on July 08, 2021 ·
4 mins read
[Spark] How to override a Spark dependency in cluster mode (AWS EMR)
Using shadowJar to relocate packages when library versions conflict
Posted by
Wonyong Jang
on July 08, 2021 ·
5 mins read
[Spark] Apache Spark Graceful Shutdown
How to gracefully shut down a Spark Streaming job
Posted by
Wonyong Jang
on June 29, 2021 ·
7 mins read
[Spark] Persistence and Data Locality
RDD persistence / memory and disk cache / locality levels (PROCESS_LOCAL, NODE_LOCAL, RACK_LOCAL)
Posted by
Wonyong Jang
on June 23, 2021 ·
10 mins read
[Spark] Apache Spark Partitioning
RDD on a cluster / Choosing the number and size of partitions / coalesce and repartition
Posted by
Wonyong Jang
on June 21, 2021 ·
6 mins read
[Spark] Apache Spark Serialization
Serialization challenges with Spark and Scala / Passing functions to Spark
Posted by
Wonyong Jang
on June 15, 2021 ·
16 mins read
[Spark] Pipeline and Stage
When a stage is skipped / Stage boundaries created by shuffles / Write and read when a shuffle occurs
Posted by
Wonyong Jang
on May 10, 2021 ·
6 mins read
[Spark] Apache Spark Dataset
How to use the main Dataset operations
Posted by
Wonyong Jang
on May 02, 2021 ·
12 mins read
[Spark] Apache Spark SQL and DataFrame
How to use the main DataFrame operations
Posted by
Wonyong Jang
on May 01, 2021 ·
9 mins read
[Spark] Streaming Fault Tolerance and Graph
Failure recovery
Posted by
Wonyong Jang
on April 14, 2021 ·
2 mins read
[Spark] Apache Spark RDD Operations
Lazily evaluated transformations, eagerly executed actions / narrow and wide transformations
Posted by
Wonyong Jang
on April 12, 2021 ·
20 mins read
[Spark] Getting Started with Apache Spark
Driver, Executor, Node, Job, Stage, Task, Cluster Manager / RDD, Fault tolerance / Hadoop
Posted by
Wonyong Jang
on April 11, 2021 ·
16 mins read