Course Overview
Course Content
Hadoop MapReduce Framework
- Traditional Way vs MapReduce Way
- Why MapReduce
- YARN Components
- YARN Architecture
- YARN MapReduce Application Execution Flow
- YARN Workflow
- Anatomy of MapReduce Program
- Input Splits, Relation between Input Splits and HDFS Blocks
- MapReduce: Combiner & Partitioner
- Demo of Healthcare Dataset
- Demo of Weather Dataset
Apache Hive
- Introduction to Apache Hive
- Hive vs Pig
- Hive Architecture and Components
- Hive Metastore
- Limitations of Hive
- Comparison with Traditional Database
- Hive Data Types and Data Models
- Hive Partition
- Hive Bucketing
- Hive Tables (Managed Tables and External Tables)
- Importing Data
- Querying Data & Managing Outputs
- Hive Script & Hive UDF
- Retail Use Case in Hive
- Hive Demo on Healthcare Dataset
Advanced Apache Hive and HBase
- HiveQL: Joining Tables, Dynamic Partitioning
- Custom MapReduce Scripts
- Hive Indexes and Views
- Hive Query Optimizers
- Hive Thrift Server
- Hive UDF
- Apache HBase: Introduction to NoSQL Databases and HBase
- HBase vs RDBMS
- HBase Components
- HBase Architecture
- HBase Run Modes
- HBase Configuration
- HBase Cluster Deployment