Data Engineering, Serverless ETL & BI on Amazon Cloud

Data warehousing & ETL on AWS Cloud

AWS can seem intimidating and overwhelming to many people because of its vast ecosystem, but this course makes it easier for anyone who wants hands-on expertise in setting up a data warehouse in Redshift or a BI infrastructure from scratch.

What you’ll learn

  • Setting up a Data Warehouse on Amazon Cloud using Redshift from scratch.
  • Learn and understand AWS Athena and when to make use of Athena.
  • Learn how to store data in S3 data lakes using the Parquet columnar file format, and reduce the amount of data Athena has to scan.
  • Learn and automate ETL processes using serverless components such as AWS Glue, Data Pipeline and Lambda functions.
  • Data Centralization using Redshift Spectrum.
  • Trigger and Automate Glue jobs using Lambda Functions.
  • Understand how to pull data into QuickSight, the AWS BI reporting and visualization service.
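The Lambda-triggered Glue job from the list above could look roughly like the sketch below: a Python Lambda handler that starts a Glue job run for each file landing in S3. This is a minimal sketch under assumptions, not the course's own code. The job name `my-etl-job`, the `GLUE_JOB_NAME` environment variable and the `--source_path` job argument are hypothetical; `start_job_run` is the boto3 Glue client call, and the event layout is the standard S3 put-notification format. The Glue call sits in its own helper so it can be exercised with a stand-in client.

```python
import os
from urllib.parse import unquote_plus

# Hypothetical job name; set GLUE_JOB_NAME on the Lambda function to override it.
GLUE_JOB_NAME = os.environ.get("GLUE_JOB_NAME", "my-etl-job")


def s3_paths_from_event(event):
    """Extract s3://bucket/key URIs from an S3 put-notification event."""
    paths = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded (a space becomes '+'), so decode them.
        key = unquote_plus(record["s3"]["object"]["key"])
        paths.append(f"s3://{bucket}/{key}")
    return paths


def start_glue_job(glue_client, job_name, source_path):
    """Start one Glue job run and return its run id.

    Glue job arguments are passed with a '--' prefix; '--source_path' is a
    hypothetical parameter the Glue script would read.
    """
    response = glue_client.start_job_run(
        JobName=job_name,
        Arguments={"--source_path": source_path},
    )
    return response["JobRunId"]


def lambda_handler(event, context):
    import boto3  # bundled with the AWS Lambda Python runtime

    glue = boto3.client("glue")
    run_ids = [start_glue_job(glue, GLUE_JOB_NAME, path) for path in s3_paths_from_event(event)]
    return {"started_runs": run_ids}
```

Inside the Glue script, the argument can then be read with `getResolvedOptions(sys.argv, ["source_path"])` from `awsglue.utils`. The Lambda execution role would also need permission to call `glue:StartJobRun`.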

Course Content

  • About the Course & Introduction: 3 lectures, 10 min
  • Getting Started with Redshift and MySQL RDS: 8 lectures, 49 min
  • ETL and Syncing Transactional Data with Redshift DWH: 15 lectures, 2 hr 23 min
  • Data Lakes & Handling External Data Sources: 8 lectures, 1 hr 12 min
  • Redshift Spectrum: 3 lectures, 20 min
  • QuickSight: BI / Reporting and Visualization: 3 lectures, 31 min
  • Redshift: Optimization Techniques and Fine-Tuning: 7 lectures, 48 min
  • Bonus: Do More with AWS Glue: 2 lectures, 20 min


Requirements

  • Hands-on expertise in Python and SQL is a must.
  • A technical background or prior experience with PySpark (at least beginner level).
  • A basic understanding of common cloud components (AWS, GCP or Azure).


Data scientists, data analysts and business analysts are increasingly expected to be all-rounders who can also handle the technical side of data ingestion, engineering and warehousing.

Anyone with a basic understanding of how the cloud works can benefit from this course because:

– It is designed around the end-to-end life cycle of a typical data engineering project.

– It provides practical solutions to real-world use cases.

This course covers:

  • Setting up a data warehouse in AWS Redshift from scratch
  • Basic data warehousing concepts
  • Writing serverless AWS Glue jobs (PySpark and Python shell) for ETL and batch processing
  • AWS Athena for ad-hoc analysis (and when to use Athena)
  • AWS Data Pipeline to sync incremental data
  • Lambda functions to trigger and automate ETL/data syncing processes
  • QuickSight setup, analyses and dashboards
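For the incremental sync step listed above, one common pattern (not necessarily the exact one used in the course) is to load new or changed rows into a staging table and then merge them into the target inside a single transaction, following the staging-table upsert approach described in the Redshift documentation. The helper below is a minimal sketch that only builds the SQL; the table and column names are hypothetical, and the statements would be executed through whichever Redshift client you already use.

```python
def build_staging_merge(target, staging, key_columns):
    """Return SQL statements that upsert rows from a staging table into a target table.

    Rows in the target that share a key with the staging table are deleted,
    then all staging rows are inserted, within one transaction; the staging
    table is dropped after the commit. Identifiers are interpolated directly,
    so they must come from trusted configuration, never from user input.
    """
    if not key_columns:
        raise ValueError("at least one key column is required")
    join = " AND ".join(f"{target}.{col} = {staging}.{col}" for col in key_columns)
    return [
        "BEGIN TRANSACTION;",
        f"DELETE FROM {target} USING {staging} WHERE {join};",
        # SELECT * assumes the staging table has the same column order as the target.
        f"INSERT INTO {target} SELECT * FROM {staging};",
        "END TRANSACTION;",
        f"DROP TABLE {staging};",
    ]


# Example with hypothetical table names:
for statement in build_staging_merge("sales", "sales_staging", ["order_id"]):
    print(statement)
```

Delete-then-insert keeps the logic simple and makes re-running the same batch harmless, since duplicate keys are replaced rather than appended.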

Prerequisites for this course:

  • Python / SQL (an absolute must)
  • PySpark (you should be able to write some basic PySpark scripts)
  • Willingness to explore, learn and put in the extra effort to succeed
  • An active AWS Account

Important note: this course uses the free tiers for Redshift and RDS, so you will not be billed for them unless you exceed the free-tier limits, which should be more than enough for the practice in this course.

This course also uses the AWS web console for creating clusters and setting up jobs; no bash scripting is involved, so you can use any operating system for the lab sessions.

This course is not code-heavy: only about 35% of it involves coding, and the rest is execution, understanding and chaining different components together. The goal is to make everyone aware of, and comfortable with, all the tools and features used in this course.

Some tips:

  • Try watching the videos at 1.2x speed.
  • Every time you work on a new component or feature, research other tools meant for the same purpose and see how and where they differ, e.g. Redshift/Athena vs. Snowflake or BigQuery, and QuickSight vs. Power BI vs. MicroStrategy.