Apache Spark 3 – Real-time Stream Processing using Scala

Learn to create Real-time Stream Processing applications using Apache Spark


What you’ll learn

  • Real-time Stream Processing Concepts.
  • Spark Structured Streaming APIs and Architecture.
  • Working with File Streams.
  • Working with Kafka Sources and Integrating Spark with Kafka.
  • Stateless and Stateful Streaming Transformations.
  • Windowing Aggregates using Spark Structured Streaming.
  • Watermarking and State Cleanup.
  • Streaming Joins and Aggregation.
  • Handling Memory Problems with Streaming Joins.
  • Creating Arbitrary Streaming Sinks.

Course Content

  • Before You Start: 2 lectures • 4 min.
  • Set Up Your Environment: 5 lectures • 33 min.
  • Getting Started with Spark Structured Streaming: 7 lectures • 1 hr 11 min.
  • Spark Streaming with Kafka: 6 lectures • 45 min.
  • Windowing and Aggregates: 6 lectures • 1 hr 2 min.
  • Stream Processing and Joins: 4 lectures • 42 min.
  • Keep Learning: 2 lectures • 1 min.


Requirements

  • Spark fundamentals and exposure to the Spark DataFrame APIs.
  • Kafka fundamentals and a working knowledge of Apache Kafka.
  • Programming knowledge of the Scala language.
  • A recent 64-bit Windows/Mac/Linux machine with 8 GB of RAM.

About the Course

I created the Apache Spark 3 – Real-time Stream Processing using Scala course to help you understand real-time stream processing with Apache Spark and apply that knowledge to build real-time stream processing solutions. The course is example-driven and follows a working-session approach: we code live and explain all the necessary concepts along the way.

Who should take this Course?

I designed this course for software engineers who want to develop real-time stream processing pipelines and applications using Apache Spark. It is also intended for data architects and data engineers who are responsible for designing and building their organization's data-centric infrastructure. A third group is managers and architects who do not work directly on Spark implementation but who work with the people who implement Apache Spark at the ground level.

Spark Version used in the Course

This course uses Apache Spark 3.x. I have tested all the source code and examples used in this course on the Apache Spark 3.0.0 open-source distribution.