Spark Streaming

Shubham Chaudhary
Jan 25, 2019


ETL is the process of fetching data from different types of source systems, structuring it, and saving it into a destination database.

Batch

In the case of a batch job, the query runs once over the data already saved at source-path, and the transformed data is written to the destination dest-path.
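
A minimal sketch of such a batch job in Scala (the paths, input format, and column names below are illustrative assumptions, not taken from the original gist):

```scala
import org.apache.spark.sql.SparkSession

object BatchEtl {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("batch-etl")
      .getOrCreate()

    // Read the snapshot of data currently sitting at source-path.
    val events = spark.read
      .json("/data/source-path")                    // hypothetical input location
      .selectExpr("userId", "eventType", "cast(ts as timestamp) as ts")

    // The write action triggers a single, finite job over that snapshot.
    events.write
      .mode("overwrite")
      .parquet("/data/dest-path")                   // hypothetical output location

    spark.stop()
  }
}
```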

Streaming

In the case of a streaming job, the query runs continuously over the data arriving at source-path, and the transformed data is appended to the destination dest-path as new data comes in.
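
The streaming version of the same query could look like the sketch below (again, paths and schema are assumptions). The main differences are readStream/writeStream, an explicit schema for the file source, and a checkpoint location:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types._

object StreamingEtl {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("streaming-etl")
      .getOrCreate()

    // Streaming file sources require the schema up front.
    val schema = new StructType()
      .add("userId", StringType)
      .add("eventType", StringType)
      .add("ts", TimestampType)

    // The same query, but reading continuously from source-path.
    val events = spark.readStream
      .schema(schema)
      .json("/data/source-path")                     // hypothetical input location
      .select("userId", "eventType", "ts")

    // Results are appended to dest-path as new data arrives.
    val query = events.writeStream
      .format("parquet")
      .outputMode("append")
      .option("path", "/data/dest-path")             // hypothetical output location
      .option("checkpointLocation", "/data/checkpoints")
      .start()

    query.awaitTermination()
  }
}
```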

Merging static data (DB) with streaming data

There might be use cases where you want to merge static data (e.g. MySQL) with the streaming data. You can do this as follows:
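
A sketch of one way to do it, a stream-static join over JDBC (the table name, schema, and connection details are assumptions):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types._

object StaticStreamJoin {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("static-stream-join")
      .getOrCreate()

    // Static side: a users table loaded from MySQL over JDBC.
    val users = spark.read
      .format("jdbc")
      .option("url", "jdbc:mysql://db-host:3306/app")  // hypothetical connection details
      .option("dbtable", "users")
      .option("user", "reader")
      .option("password", sys.env.getOrElse("DB_PASSWORD", ""))
      .load()

    // Streaming side: events arriving continuously at source-path.
    val schema = new StructType()
      .add("userId", StringType)
      .add("eventType", StringType)

    val events = spark.readStream
      .schema(schema)
      .json("/data/source-path")

    // Stream-static join: each micro-batch of events is enriched with user attributes.
    val enriched = events.join(users, Seq("userId"), "left_outer")

    val query = enriched.writeStream
      .format("parquet")
      .outputMode("append")
      .option("path", "/data/dest-path")
      .option("checkpointLocation", "/data/checkpoints-join")
      .start()

    query.awaitTermination()
  }
}
```

With the streaming DataFrame on the left, inner and left outer stream-static joins are supported, and no watermark is needed because the static side adds no join state.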

Executing the Job

Batch Execution
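
A sketch of the batch case, runnable in spark-shell (paths are illustrative): the transformations are lazy, and calling the write action launches one finite Spark job; explain() shows the plan that will run.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("batch-exec").getOrCreate()

// Nothing runs yet: read and filter only build up a logical plan.
val events = spark.read
  .parquet("/data/source-path")
  .filter("eventType = 'click'")

// Inspect the optimized logical and physical plans.
events.explain(true)

// The write action launches a single finite job; once it completes, the query is done.
events.write.mode("overwrite").parquet("/data/dest-path")
```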

Stream Execution

Starting from the planner's logical plan, an incremental execution plan is generated on top of it, so each trigger processes only the data that has arrived since the previous one.
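
A sketch of what that looks like from the API side (paths, trigger interval, and column names are illustrative): start() hands the query to the streaming planner, which re-plans and runs an incremental batch for each trigger, and explain() on the running query prints the incrementalized plan of the most recent micro-batch.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.Trigger
import org.apache.spark.sql.types._

val spark = SparkSession.builder().appName("stream-exec").getOrCreate()

val schema = new StructType()
  .add("userId", StringType)
  .add("eventType", StringType)

val events = spark.readStream
  .schema(schema)
  .json("/data/source-path")
  .filter("eventType = 'click'")

// start() registers the query with the streaming planner; from here on an
// incremental execution plan is generated and run for every trigger.
val query = events.writeStream
  .format("parquet")
  .outputMode("append")
  .option("path", "/data/dest-path")
  .option("checkpointLocation", "/data/checkpoints-exec")
  .trigger(Trigger.ProcessingTime("30 seconds"))
  .start()

// Prints the physical plan of the most recent micro-batch (once one has run).
query.explain()

query.awaitTermination()
```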

Resources

Originally published at shubham.chaudhary.xyz.

