TECHGENX

Leading to the Future Digital World

Spark and Python for Big Data with PySpark course

 

Please note: All course content below will be covered through practical scenarios, and regular assignments will be shared. All sessions will be recorded and shared with students for future reference, free of cost, along with the course below.

Course Details

  1. What is Big Data?
  2. What are Hadoop and HDFS?
  3. What is Apache Spark?

Setting up Python with PySpark

  1. Use Python and Spark together to analyze Big Data
  2. Work on Consulting Projects that mimic real world situations

Databricks setup

  1. Learn about the Databricks Platform

Local VirtualBox setup

AWS EC2 PySpark setup

  1. Get set up on Amazon Web Services EC2 for Big Data Analysis.
  2. Learn how to use the AWS Elastic MapReduce (EMR) service!

AWS EMR Cluster setup

Spark DataFrame Basics

  1. What a SparkSession is and how to use it
  2. Creating DataFrames using PySpark
  3. Reading and writing CSV files
  4. Reading JSON files using PySpark
  5. Data Fabrication using PySpark
  6. Joining two DataFrames in PySpark
  7. UDFs (user-defined functions) in PySpark (the counterpart of lambda functions in pandas)
  8. Using the spark-submit command in the terminal to run a PySpark application
  9. Customizing the PySpark logger library to suit real-world project needs
  10. Creating schemas and adopting user-defined schemas in PySpark
  11. Using the functions library in PySpark

Spark DataFrame Project exercise

Introduction to Machine Learning with MLlib

  1. Use Spark's MLlib to create Powerful Machine Learning Models

Linear Regression

Logistic Regression

  1. Classify Customer Churn with Logistic Regression

Decision Trees and Random Forests

  1. Learn how to use Spark's Gradient Boosted Trees
  2. Use Spark with Random Forests for Classification

K-means clustering

Collaborative Filtering for Recommender Systems

Natural Language Processing

  1. Create a Spam filter using Spark and Natural Language Processing!

Spark Streaming with Python

  1. Learn how to leverage the power of Linux with a Spark Environment.
  2. Use Spark Streaming to Analyze Tweets in Real Time!

Student Testimonials

  • Seeing the demand for Python in programming, I decided to enroll in weekend Python classes. After I joined, Ankit sir took our class, and we attended all the classes from the beginning..

    Gaurav Saini
  • Ankit sir is one of the best mentors I have ever had, and he is supportive too. He is a professional in every branch he knows and....

    Anshul
  • Ankit sir is the best trainer and has the best knowledge of coding. He is the best teacher, and he also supports ideas and helps to develop them.

    Rishabh Jain