The Hadoop 1.0 Bottleneck
In Hadoop 1.0, a single JobTracker daemon managed both cluster resource allocation and job execution monitoring, which created two problems:
- Scaling limits: The JobTracker hit scalability walls at roughly 4,000 nodes, because one daemon had to schedule and track every task in the cluster.
- Compute lock-in: The cluster's compute resources were restricted to MapReduce, so engines like Spark or Storm could not share the same nodes that hosted the HDFS data.
Hadoop 2.0 introduced YARN (Yet Another Resource Negotiator), which resolves both problems by separating cluster resource management from per-application scheduling and monitoring.
YARN Principle: Decouple cluster resource allocation from computation monitoring to create a multi-tenant big data infrastructure.
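One consequence of this split is that MapReduce becomes just another application running on YARN rather than the cluster's built-in engine. As a minimal sketch, assuming a stock Hadoop 2.x layout, the switch is made in mapred-site.xml:

```xml
<!-- mapred-site.xml: run MapReduce jobs as YARN applications -->
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
```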
The YARN Architecture
YARN replaces the JobTracker with three cooperating components:
- ResourceManager (Global Master): Allocates compute resources (memory, CPU) across all applications in the cluster.
- NodeManager (Node Agent): Launches containers on its node, monitors their resource usage, and reports it to the ResourceManager.
- ApplicationMaster (App Master): A per-application manager that negotiates containers from the ResourceManager and works with NodeManagers to run and monitor tasks.
| Daemon | Scope | Core Responsibility |
|---|---|---|
| ResourceManager | Global Cluster | Allocates compute containers to applications. |
| NodeManager | Individual Node | Launches local containers and monitors their CPU and RAM usage. |
| ApplicationMaster | Individual Application | Requests resources and monitors application progress. |
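The division of labor in the table also shows up in configuration: each NodeManager advertises how much memory and CPU it can hand out, and the ResourceManager's scheduler bounds the size of any single container. The property names below are standard yarn-site.xml settings. The values are illustrative assumptions for a node with roughly 64 GB of RAM and 16 cores, not recommendations.

```xml
<!-- yarn-site.xml fragment; all values are illustrative assumptions -->
<configuration>
  <!-- Memory (MB) this NodeManager offers to containers -->
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>49152</value>
  </property>
  <!-- Virtual cores this NodeManager offers to containers -->
  <property>
    <name>yarn.nodemanager.resource.cpu-vcores</name>
    <value>12</value>
  </property>
  <!-- Smallest and largest container (MB) the scheduler will grant -->
  <property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>1024</value>
  </property>
  <property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>8192</value>
  </property>
</configuration>
```

Leaving part of the node's memory and cores unallocated, as assumed here, keeps headroom for the operating system and the Hadoop daemons themselves.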
Running Multi-Tenant Frameworks
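With YARN, a single Hadoop cluster can run diverse frameworks concurrently. Each framework submits its applications to the same ResourceManager, whose address is set in yarn-site.xml: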
<!-- Conceptual yarn-site.xml cluster resource configuration -->
<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>yarn-master.shivamitcs.in</value>
  </property>
</configuration>

This multi-tenant architecture increases hardware utilization and allows organizations to run real-time analytics alongside standard batch transformations.
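One way to let batch and real-time workloads share the cluster is to give each its own scheduler queue. The following capacity-scheduler.xml sketch assumes the Capacity Scheduler is in use. The queue names batch and realtime and the 70/30 split are hypothetical.

```xml
<!-- capacity-scheduler.xml; queue names and percentages are hypothetical -->
<configuration>
  <!-- Child queues directly under the root queue -->
  <property>
    <name>yarn.scheduler.capacity.root.queues</name>
    <value>batch,realtime</value>
  </property>
  <!-- Guaranteed share of cluster capacity per queue (percent; siblings sum to 100) -->
  <property>
    <name>yarn.scheduler.capacity.root.batch.capacity</name>
    <value>70</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.realtime.capacity</name>
    <value>30</value>
  </property>
</configuration>
```

Applications then target a queue at submission time, for example a MapReduce job through mapreduce.job.queuename or a Spark job through spark-submit --queue.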