Managing and analyzing large volumes of data is central to how enterprises gain insight and make informed decisions. This course introduces big data concepts and the Hadoop framework, including HDFS, MapReduce, and YARN, with a focus on enterprise applications. By the end of this course, you will have the foundational skills needed to implement and use big data technologies in an enterprise environment.
- Discover how to manage rapid data growth within a company.
- Explore the different components of a Big Data cluster and how they interact.
- Understand Big Data paradigms.
- Understand the advantages of Open Source solutions.
- Develop a Big Data project from scratch.
SQL and Python programming, a good understanding of the Linux shell and Git.
Recommended previous courses include DevOps and Git.
- Information Systems
- Distributed systems
- Horizontal vs vertical scaling
- Data structure
- History of data
- Distributed systems
- The 3 Vs
- Who needs Big Data?
- Big Data clusters
- The Hadoop Ecosystem
- Data skills and profiles
- Hadoop ecosystem introduction
- Hadoop ecosystem projects
- Hadoop core components
- HDFS: presentation
- HDFS: Master / Slave architecture
- HDFS: File storage
- HDFS: Data replication example
- HDFS: Client interactions
- HDFS: Important properties
- HDFS: Single Master mode vs High Availability
- YARN: presentation
- YARN: Architecture
- YARN: Applications
- YARN: Application lifecycle
- YARN: Job scheduler and resource management
- HDFS + YARN architecture
- MapReduce: a framework
- MapReduce: Application steps
- MapReduce: Word count example
- MapReduce: Distribution on a cluster
- MapReduce: Important properties
- MapReduce vs other frameworks