Matthias,
Once a Spark DataFrame is created by reading data from Kafka (https://sparkbyexamples.com/spark/spark-streaming-with-kafka/), you can use Spark SQL, and all the aggregations shown on the page I linked are valid. I feel that having these built into the Kafka Streams library would make this very easy.
Thanks
Mohan
On 4/28/21, 12:00 PM, "Matthias J. Sax" <mjsax@apache.org> wrote:
I am not familiar with all the details of Spark; however, the link
you shared is for Spark SQL. I thought Spark SQL was for batch processing
only?
Personally, I would be open to adding more built-in aggregations next to
count(). It has not come up in the community so far, so there has been no
investment yet.
-Matthias
On 4/28/21 10:30 AM, Parthasarathy, Mohan wrote:
> Hi,
>
> Whenever there is a discussion about which streaming framework to use for near-real-time analytics, it normally comes down to Spark vs. Kafka Streams. One of the points in favor of Spark Streaming is its simple built-in aggregations (see https://sparkbyexamples.com/spark/spark-sql-aggregate-functions/). With Kafka Streams, some of these require boilerplate code. Is there any reason why they are not provided as part of the library? I have been unable to find any discussion on this topic. Are there any plans to provide such features in the Kafka Streams library?
>
> Thanks
> Mohan
>
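To make the "boilerplate" in the original question concrete, here is a minimal sketch of what a per-key average tends to involve when only a generic aggregation hook is available. It is plain Java with no Kafka dependency, and every name in it (AverageSketch, SumCount, averageByKey) is hypothetical, chosen for illustration only. In Kafka Streams the same shape would typically be expressed with aggregate(), passing an initializer and an aggregator, and the accumulator type would additionally need its own serde. That wiring is omitted here.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class AverageSketch {

    /** Running state for one key: an average needs both a sum and a count. */
    static final class SumCount {
        final double sum;
        final long count;

        SumCount(double sum, long count) {
            this.sum = sum;
            this.count = count;
        }

        // Plays the role of an "initializer": the state before any record is seen.
        static SumCount empty() {
            return new SumCount(0.0, 0L);
        }

        // Plays the role of an "aggregator": folds one new value into the state.
        SumCount add(double value) {
            return new SumCount(sum + value, count + 1);
        }

        double average() {
            return count == 0 ? 0.0 : sum / count;
        }
    }

    /** Groups (key, value) records by key and returns the average per key. */
    static Map<String, Double> averageByKey(List<Map.Entry<String, Double>> records) {
        Map<String, SumCount> state = new LinkedHashMap<>();
        for (Map.Entry<String, Double> record : records) {
            SumCount current = state.getOrDefault(record.getKey(), SumCount.empty());
            state.put(record.getKey(), current.add(record.getValue()));
        }
        Map<String, Double> averages = new LinkedHashMap<>();
        for (Map.Entry<String, SumCount> entry : state.entrySet()) {
            averages.put(entry.getKey(), entry.getValue().average());
        }
        return averages;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Double>> records = List.of(
                Map.entry("a", 1.0),
                Map.entry("a", 3.0),
                Map.entry("b", 10.0));
        System.out.println(averageByKey(records)); // prints {a=2.0, b=10.0}
    }
}
```

By comparison, a built-in avg() in the style of Spark SQL would hide the accumulator class and the fold step entirely, which is the convenience being asked for in this thread.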