Hi,
"I'd assume this is because Kafka Streams is positioned for
building streaming applications, rather than doing analytics, whereas Spark
is more often used for analytics purposes."
Well, that's not necessarily the full picture. Spark can do both analytics and
streaming, especially with Spark Structured Streaming. Spark Structured
Streaming is the Apache Spark API that lets you express computation on
streaming data *in the same way you express a batch computation on static
data.* That is the strength of Spark. Spark supports Java, Scala and Python,
among others. Python, or more specifically PySpark, is particularly popular
in data science as well as conventional analytics.
Structured Streaming Programming Guide - Spark 3.1.1 Documentation
(apache.org)
<https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html>
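To illustrate the "same code for batch and streaming" idea, here is a minimal plain-Python sketch (no Spark required, and not the actual Spark API): one transformation function is applied both to a static dataset in one go and incrementally across micro-batches, and the results agree. All names here are illustrative.

```python
# Plain-Python sketch of the idea behind Structured Streaming:
# the same logic serves both a batch computation on static data
# and an incremental computation over a stream of micro-batches.
from collections import Counter

def word_count(rows):
    """The shared 'business logic': count words in an iterable of lines."""
    counts = Counter()
    for line in rows:
        counts.update(line.split())
    return counts

# Batch: apply the logic to static data all at once.
static_data = ["spark streams", "kafka streams"]
batch_result = word_count(static_data)

# Streaming: the engine feeds the same logic micro-batch by micro-batch,
# maintaining the running state between batches.
running = Counter()
for micro_batch in (["spark streams"], ["kafka streams"]):
    running.update(word_count(micro_batch))

# The incremental (streaming) result matches the batch result.
assert running == batch_result
```

In real PySpark, this correspondence is what lets you reuse DataFrame transformations unchanged between `spark.read` and `spark.readStream` sources.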
There are two relevant operations in Spark Structured Streaming, called
*foreach* and *foreachBatch*, which allow you to apply arbitrary
operations and write logic on the output of a streaming query. They have
slightly different use cases: while *foreach* allows custom write logic
on every row, *foreachBatch* allows arbitrary operations and custom logic
on the output of each micro-batch.
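To make the distinction concrete, here is a plain-Python simulation of the two sink styles (the "engine" loops are mocked, so this runs without Spark; in actual PySpark you would pass the handlers to df.writeStream.foreach(...) and df.writeStream.foreachBatch(...) respectively):

```python
# Simulation of the two Structured Streaming sink styles; the driver
# loops below stand in for the streaming engine.
sink = []

def process_row(row):
    # foreach-style: invoked once per output row; suited to custom
    # per-row write logic (e.g. one REST call per record).
    sink.append(("row", row))

def process_batch(batch_rows, batch_id):
    # foreachBatch-style: invoked once per micro-batch with the whole
    # batch and its id, so bulk operations (e.g. a batch upsert) work.
    sink.append(("batch", batch_id, list(batch_rows)))

micro_batches = [["a", "b"], ["c"]]

# foreach: the engine calls the handler for every row of every batch.
for batch in micro_batches:
    for row in batch:
        process_row(row)

# foreachBatch: the engine calls the handler once per micro-batch.
for batch_id, batch in enumerate(micro_batches):
    process_batch(batch, batch_id)
```

With three rows across two micro-batches, the per-row handler fires three times and the per-batch handler twice, which is exactly the granularity difference described above.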
HTH
view my Linkedin profile
<https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.
On Wed, 28 Apr 2021 at 20:12, Andrew Otto <otto@wikimedia.org> wrote:
> I'd assume this is because Kafka Streams is positioned for building
> streaming applications, rather than doing analytics, whereas Spark is more
> often used for analytics purposes.
>