Azure Databricks – Build data engineering and AI/ML pipeline

Learn anomaly detection, Data Factory, Azure Functions, Spark, Delta Lake, Kafka, Event Hubs, and CI/CD using Azure DevOps

This course is designed to help you develop the skills necessary to perform ETL operations in Databricks, build unsupervised anomaly detection models, learn MLOps, run CI/CD pipelines in Databricks, and deploy machine learning models to production.

What you’ll learn

  • What is anomaly detection?
  • How to detect anomalies with unsupervised learning algorithms: Isolation Forest, KNN, and clustering-based approaches.
  • A step-by-step guide to performing ETL operations using Azure Databricks.
  • Understand the data lakehouse architecture.
  • Build a data pipeline using the Azure tech stack.
  • Interpret machine learning models using Shapley values.
  • Spark Structured Streaming with Kafka.
  • Spark Structured Streaming with Azure Event Hubs.
  • Use MLflow to manage the end-to-end machine learning lifecycle.
  • Anomaly detection on time series data.
  • Build a CI/CD pipeline using Azure DevOps.
  • Build a data pipeline using Azure Data Factory.
  • Productionize models using Azure Functions and Docker.

Course Content

  • Introduction –> 3 lectures • 6min.
  • Introduction to anomaly detection –> 4 lectures • 12min.
  • Anomaly detection – Lab –> 6 lectures • 20min.
  • Data lakehouse architecture –> 4 lectures • 23min.
  • Build a data pipeline using the Azure tech stack –> 7 lectures • 30min.
  • Explainable AI –> 4 lectures • 17min.
  • Spark Structured Streaming –> 5 lectures • 50min.
  • MLOps using MLflow in Databricks –> 8 lectures • 47min.
  • Anomaly detection on time series data –> 1 lecture • 1min.
  • Building a CI/CD pipeline using Azure DevOps –> 1 lecture • 1min.


Requirements

  • Basic knowledge of the Python programming language.
  • A basic understanding of big data ecosystems.
  • A basic understanding of PySpark.


 

Big data engineering:

Big data engineers interact with massive data processing systems and databases in large-scale computing environments. Big data engineers provide organizations with analyses that help them assess their performance, identify market demographics, and predict upcoming changes and market trends.

 

Azure Databricks:

Azure Databricks is a data analytics platform optimized for the Microsoft Azure cloud services platform. Azure Databricks offers three environments for developing data-intensive applications: Databricks SQL, Databricks Data Science & Engineering, and Databricks Machine Learning.

 

Anomaly detection:

Anomaly detection (aka outlier analysis) is a step in data mining that identifies data points, events, and/or observations that deviate from a dataset’s normal behavior. Anomalous data can indicate critical incidents, such as a technical glitch, or potential opportunities, for instance a change in consumer behavior. Machine learning is progressively being used to automate anomaly detection.
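As a minimal sketch of the unsupervised approach, here is Isolation Forest (one of the algorithms the course covers) applied to synthetic data; the dataset and parameters are illustrative, not from the course materials:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)
normal = rng.normal(loc=0.0, scale=1.0, size=(200, 2))   # typical behavior
outliers = rng.uniform(low=6.0, high=8.0, size=(5, 2))   # planted anomalies
X = np.vstack([normal, outliers])

# contamination = expected fraction of anomalies in the data
model = IsolationForest(contamination=0.05, random_state=42)
labels = model.fit_predict(X)  # -1 = anomaly, 1 = normal

print("anomalies flagged:", int((labels == -1).sum()))
```

Isolation Forest works by randomly partitioning the feature space: points that are isolated in few splits (short average path length) are scored as anomalous.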

 

Data lakehouse:

A data lakehouse is a data solution concept that combines elements of the data warehouse with those of the data lake. Data lakehouses implement data warehouses’ data structures and management features on top of data lakes, which are typically more cost-effective for data storage.
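A hedged sketch of the idea using Delta Lake (the storage layer used in the course): cheap lake storage underneath, warehouse-style table semantics (ACID writes, schema enforcement) on top. The path is a placeholder, and this requires a Spark session with the Delta Lake package configured:

```python
from pyspark.sql import SparkSession

# Spark session with the Delta Lake extensions enabled
spark = (
    SparkSession.builder
    .appName("lakehouse-sketch")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaSparkSessionCatalog")
    .getOrCreate()
)

df = spark.createDataFrame([(1, "sensor-a", 21.5)], ["id", "device", "temp"])

# Write as a Delta table: object storage underneath, transactional table on top.
df.write.format("delta").mode("overwrite").save("/tmp/lakehouse/readings")

# Read it back like a warehouse table.
spark.read.format("delta").load("/tmp/lakehouse/readings").show()
```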

 

Explainable AI:

Explainable AI is artificial intelligence in which the results of the solution can be understood by humans. It contrasts with the concept of the “black box” in machine learning where even its designers cannot explain why an AI arrived at a specific decision.
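One explainability technique the course uses is Shapley values: each feature's contribution to a single prediction, averaged over all orderings in which features could be "revealed" to the model. In practice this is done with the `shap` library; the sketch below computes exact Shapley values by brute-force coalition enumeration on a made-up toy model, just to show the underlying idea:

```python
from itertools import combinations
from math import factorial

def model(x):
    # toy "trained" model: a simple linear scorer (illustrative only)
    return 3.0 * x[0] + 2.0 * x[1]

def shapley_values(f, x, background):
    """Exact Shapley values: features absent from a coalition are
    replaced by their background (reference) value."""
    n = len(x)
    values = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):
            for subset in combinations(others, size):
                # Shapley weight for a coalition of this size
                weight = factorial(size) * factorial(n - size - 1) / factorial(n)
                with_i = [x[j] if (j in subset or j == i) else background[j]
                          for j in range(n)]
                without_i = [x[j] if j in subset else background[j]
                             for j in range(n)]
                phi += weight * (f(with_i) - f(without_i))
        values.append(phi)
    return values

vals = shapley_values(model, [1.0, 4.0], [0.0, 0.0])
print(vals)  # the values sum to f(x) - f(background)
```

For a linear model, each feature's Shapley value reduces to its weight times its deviation from the background, which makes the toy result easy to verify by hand.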

 

Spark structured streaming:

Structured Streaming is a scalable and fault-tolerant stream processing engine built on the Spark SQL engine. In short, Structured Streaming provides fast, scalable, fault-tolerant, end-to-end exactly-once stream processing without the user having to reason about streaming.
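A hedged sketch of reading a Kafka topic with Structured Streaming, in the spirit of the course's streaming sections. The broker address and topic name are placeholders, and running this requires a Spark cluster with the Kafka connector and a reachable broker:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("streaming-sketch").getOrCreate()

events = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")  # placeholder broker
    .option("subscribe", "events")                        # placeholder topic
    .load()
)

# Kafka delivers key/value as binary; cast the value to string for processing.
parsed = events.select(col("value").cast("string").alias("raw"))

query = (
    parsed.writeStream
    .format("console")      # console sink, for demonstration only
    .outputMode("append")
    .start()
)
query.awaitTermination()
```

The same `readStream` pattern applies to Azure Event Hubs via its Spark connector; only the format and connection options change.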

 

CI/CD operations:

CI and CD stand for continuous integration and continuous delivery/continuous deployment. In very simple terms, CI is a modern software development practice in which incremental code changes are made, tested, and merged frequently and reliably, while CD automatically delivers those validated changes to testing and production environments.
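As a minimal illustration of what such a pipeline looks like in Azure DevOps, here is a hedged `azure-pipelines.yml` sketch; the paths, variable names, and deployment target are placeholders, not taken from the course:

```yaml
# Minimal Azure DevOps pipeline sketch (illustrative only).
trigger:
  - main

pool:
  vmImage: ubuntu-latest

steps:
  - task: UsePythonVersion@0
    inputs:
      versionSpec: '3.10'

  # CI: install dependencies and run the test suite on every push
  - script: pip install -r requirements.txt && pytest tests/
    displayName: Run unit tests

  # CD: push notebooks to the Databricks workspace (placeholder paths)
  - script: |
      pip install databricks-cli
      databricks workspace import_dir notebooks /Shared/pipeline --overwrite
    displayName: Deploy notebooks to Databricks
    env:
      DATABRICKS_HOST: $(DATABRICKS_HOST)
      DATABRICKS_TOKEN: $(DATABRICKS_TOKEN)
```

The host and token would come from secret pipeline variables rather than being committed to the repository.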

 

 
