Big Data

Introduction

Enterprises need to manage and analyze very large volumes of data to gain competitive insights and make informed decisions. This course gives you a fundamental understanding of big data concepts and of the Hadoop framework, including HDFS, MapReduce, and YARN, as applied in enterprise settings. By the end of the course, you will have the foundational skills needed to implement big data technologies in an enterprise environment and improve your organization's data processing capabilities.

Educational goals

Discover how to manage the rapid growth of data within a company.
Explore the different components of a Big Data cluster and how they interact.
Understand Big Data paradigms.
Understand the advantages of Open Source solutions.
Develop a Big Data project from scratch.

Prerequisites

SQL and Python programming, a good understanding of the Linux shell and Git.

Recommended previous courses include DevOps and Git.

Modules

Module 1 (3h) - Big Data introduction

Information Systems
Distributed systems
Horizontal vs vertical scaling
Data structure
History of data
The 3 Vs
Who needs Big Data?
Big Data clusters
The Hadoop Ecosystem
Data skills and profiles

Module 2 (3h) - Hadoop core: HDFS and YARN

Hadoop ecosystem introduction
Hadoop ecosystem projects
Hadoop core components
HDFS: presentation
HDFS: Master / Slave architecture
HDFS: File storage
HDFS: Data replication example
HDFS: Client interactions
HDFS: Important properties
HDFS: Single Master mode vs High Availability
YARN: presentation
YARN: Architecture
YARN: Applications
YARN: Application lifecycle
YARN: Job scheduler and resource management

Module 3 (3h) - Distributed processing and the MapReduce framework

HDFS + YARN architecture
MapReduce: a framework
MapReduce: Application steps
MapReduce: Word count example
MapReduce: Distribution on a cluster
MapReduce: Important properties
MapReduce vs other frameworks
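
As a preview of the word count example listed in Module 3, here is a minimal sketch of the map, shuffle, and reduce steps in plain Python. It runs in a single process and does not use Hadoop. The function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are illustrative assumptions, not part of any Hadoop API. On a real cluster, the framework distributes these steps across nodes.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key (the word)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: sum the counts collected for a single word."""
    return key, sum(values)


def word_count(lines):
    """Run the three steps in sequence on a list of text lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical input; each string stands in for one line of a file.
    print(word_count(["big data big cluster", "data node"]))
```

Each `map_phase` call depends only on its own input line, and each `reduce_phase` call depends only on one key. This independence is what allows the work to be split across many machines.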
