[Spark] Dynamic Partition Pruning / Speculative Execution
Filter pushdown / Optimizing query performance when joining dimension and fact tables
Posted by
Wonyong Jang
on May 15, 2024 ·
5 mins read
[Spark] Join Strategies and Shuffle
shuffle join, broadcast join / shuffle sort merge join, broadcast hash join
Posted by
Wonyong Jang
on April 20, 2024 ·
7 mins read
[Scala] Using Singleton in Guice
Building a Dependency Injection structure in Scala
Posted by
Wonyong Jang
on March 09, 2024 ·
2 mins read
[Spark] On Kubernetes
Comparison with Spark on an EMR cluster
Posted by
Wonyong Jang
on March 03, 2024 ·
2 mins read
[Spark] Memory Management and Tuning
Choosing an appropriate number of Drivers and Executors when running Spark
Posted by
Wonyong Jang
on February 13, 2024 ·
4 mins read
[BigData] File Format - Parquet, ORC
Parquet, ORC (Optimized Row Columnar) / Column-based (columnar) and row-based storage formats
Posted by
Wonyong Jang
on February 02, 2024 ·
5 mins read
[AWS] S3 Bucket Lifecycle Configuration
DeletingObjectsfromVersioningSuspendedBuckets, Versioning Suspended Bucket Lifecycle
Posted by
Wonyong Jang
on January 11, 2024 ·
4 mins read
[Scala] Issues with Boolean fields whose names start with "is"
Differences between Java, Kotlin, and Scala / Caveats when serializing with Jackson
Posted by
Wonyong Jang
on November 25, 2023 ·
10 mins read