DATA MANAGEMENT AND ORGANIZATION (2 SCU)
Learning Outcomes:
On successful completion of this course, students will be able to:
- LO1 – Describe the basic principles of data management and big data processing;
- LO2 – Demonstrate HDFS commands and Spark SQL;
- LO3 – Perform data preparation and machine learning using PySpark;
- LO4 – Perform Extract, Transform, Load (ETL) with Spark.
Topics:
- Working with Apache Spark;
- PySpark for Supervised Machine Learning;
- PySpark Variable Selection;
- Utility Functions in PySpark;
- ETL (Extract, Transform, and Load);
- SQL on Big Data Landscape;
- Optimizing Spark Applications;
- Model Evaluation using PySpark;
- Working with PySpark;
- Spark Streaming;
- Data Management & Data Governance;
- Spark SQL.