[Airflow] Apache Airflow - Workflow
DAG (Directed Acyclic Graph) / data workflow management tool / meaning of execution_date / backfill and catchup
Posted by
Wonyong Jang
on July 25, 2024 ·
14 mins read
[Python] Collecting Data and Extracting Summaries with an LLM
Using LangChain and the OpenAI API / ChatOpenAI / StrOutputParser / ChatPromptTemplate / WebBaseLoader
Posted by
Wonyong Jang
on July 18, 2024 ·
11 mins read
[Python] Crawling with Python (Scrapy)
Crawling, Scraping / site crawling policies
Posted by
Wonyong Jang
on July 08, 2024 ·
5 mins read
[Spark] Dynamic Partition Pruning / Speculative Execution
filter push down / optimizing query performance when joining dimension and fact tables
Posted by
Wonyong Jang
on May 15, 2024 ·
5 mins read
[Spark] Join Strategies and Shuffle
shuffle join, broadcast join / shuffle sort merge join, broadcast hash join / join hint
Posted by
Wonyong Jang
on April 20, 2024 ·
9 mins read
[Scala] Using Singleton in Guice
Building a Dependency Injection structure in Scala
Posted by
Wonyong Jang
on March 09, 2024 ·
2 mins read
[Spark] On Kubernetes
Comparison with Spark on an EMR cluster
Posted by
Wonyong Jang
on March 03, 2024 ·
2 mins read
[Spark] Memory Management and Tuning
Appropriate number of Drivers and Executors when running Spark
Posted by
Wonyong Jang
on February 13, 2024 ·
4 mins read