hadoop - MapReduce job hangs, waiting for AM container to be allocated

I tried to run a simple word count as a MapReduce job. Everything works fine when I run it locally (all of the work is done on the NameNode), but when I try to run it on the cluster using YARN (adding mapreduce.framework.name=yarn to mapred-site.xml) the job hangs.
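For reference, I submit the job with a command along these lines (the jar name, main class and HDFS paths are placeholders for my setup):

hadoop jar wordcount.jar WordCount /user/hduser/input /user/hduser/output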

I came across a similar problem here: MapReduce jobs get stuck in Accepted state

Output from job:

*** START ***
15/12/25 17:52:50 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
15/12/25 17:52:51 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
15/12/25 17:52:51 INFO input.FileInputFormat: Total input paths to process : 5
15/12/25 17:52:52 INFO mapreduce.JobSubmitter: number of splits:5
15/12/25 17:52:52 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1451083949804_0001
15/12/25 17:52:53 INFO impl.YarnClientImpl: Submitted application application_1451083949804_0001
15/12/25 17:52:53 INFO mapreduce.Job: The url to track the job: http://hadoop-droplet:8088/proxy/application_1451083949804_0001/
15/12/25 17:52:53 INFO mapreduce.Job: Running job: job_1451083949804_0001

mapred-site.xml:

<configuration>

<property>
   <name>mapreduce.framework.name</name>
   <value>yarn</value>
</property>

<property>
   <name>mapreduce.job.tracker</name>
   <value>localhost:54311</value>
</property> 

<!--
<property>
   <name>mapreduce.job.tracker.reserved.physicalmemory.mb</name>
   <value></value>
</property>

<property>
   <name>mapreduce.map.memory.mb</name>
   <value>1024</value>
</property>

<property>
   <name>mapreduce.reduce.memory.mb</name>
   <value>2048</value>
</property>    

<property>
   <name>yarn.app.mapreduce.am.resource.mb</name>
   <value>3000</value>
   <source>mapred-site.xml</source>
</property> -->

</configuration>

yarn-site.xml

<configuration>
 <property>
   <name>yarn.nodemanager.aux-services</name>
   <value>mapreduce_shuffle</value>
 </property>
 <property>
   <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
   <value>org.apache.hadoop.mapred.ShuffleHandler</value>
 </property>

<!--
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>3000</value>
<source>yarn-site.xml</source>
</property>

<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>500</value>
</property>

<property>
<name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
<value>3000</value>
</property>
-->

</configuration>

// I left the commented-out options in place; they did not solve the problem.

YarnApplicationState: ACCEPTED: waiting for AM container to be allocated, launched and register with RM.

What could be the problem?

EDIT:

I tried this configuration (the commented-out properties) on these machines: NameNode (8 GB RAM) + 2x DataNode (4 GB RAM). I get the same effect: the job hangs in the ACCEPTED state.
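(For context, this is roughly how I understand the memory settings need to fit together on a 4 GB DataNode; the values below are my own rough assumptions, not a tested configuration: the AM container size has to fit inside the scheduler's min/max allocation, which in turn has to fit inside the NodeManager's total memory.)

yarn-site.xml (sketch):

<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>3072</value>  <!-- leave roughly 1 GB of the 4 GB node for the OS and the DataNode -->
</property>
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>512</value>
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>3072</value>
</property>

mapred-site.xml (sketch):

<property>
  <name>yarn.app.mapreduce.am.resource.mb</name>
  <value>1024</value>  <!-- AM container must fit within the 3072 MB the NodeManager offers -->
</property>
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>1024</value>
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>1024</value>
</property>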

EDIT 2: I changed the configuration (thanks @Manjunath Ballur) to:

yarn-site.xml:

<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>hadoop-droplet</value>
  </property>

  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>hadoop-droplet:8031</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>hadoop-droplet:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>hadoop-droplet:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>hadoop-droplet:8033</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>hadoop-droplet:8088</value>
  </property>
  <property>
    <description>Classpath for typical applications.</description>
    <name>yarn.application.classpath</name>
    <value>
        $HADOOP_CONF_DIR,
        $HADOOP_COMMON_HOME/*,$HADOOP_COMMON_HOME/lib/*,
        $HADOOP_HDFS_HOME/*,$HADOOP_HDFS_HOME/lib/*,
        $HADOOP_MAPRED_HOME/*,$HADOOP_MAPRED_HOME/lib/*,
        $YARN_HOME/*,$YARN_HOME/lib/*
    </value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce.shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.nodemanager.local-dirs</name>
    <value>/data/1/yarn/local,/data/2/yarn/local,/data/3/yarn/local</value>
  </property>
  <property>
    <name>yarn.nodemanager.log-dirs</name>
    <value>/data/1/yarn/logs,/data/2/yarn/logs,/data/3/yarn/logs</value>
  </property>
  <property>
    <description>Where to aggregate logs</description>
    <name>yarn.nodemanager.remote-app-log-dir</name>
    <value>/var/log/hadoop-yarn/apps</value>
  </property>
  <property> 
    <name>yarn.scheduler.minimum-allocation-mb</name> 
    <value>50</value>
  </property>
  <property> 
    <name>yarn.scheduler.maximum-allocation-mb</name> 
    <value>390</value>
  </property>
  <property> 
    <name>yarn.nodemanager.resource.memory-mb</name> 
    <value>390</value>
  </property>
</configuration>

mapred-site.xml:

<configuration>
<property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
</property>

<property>  
    <name>yarn.app.mapreduce.am.resource.mb</name>  
    <value>50</value>
</property>
<property> 
    <name>yarn.app.mapreduce.am.command-opts</name> 
    <value>-Xmx40m</value>
</property>
<property>
    <name>mapreduce.map.memory.mb</name>
    <value>50</value>
</property>
<property>
    <name>mapreduce.reduce.memory.mb</name>
    <value>50</value>
</property>
<property>
    <name>mapreduce.map.java.opts</name>
    <value>-Xmx40m</value>
</property>
<property>
    <name>mapreduce.reduce.java.opts</name>
    <value>-Xmx40m</value>
</property>
</configuration>

Still not working. Additional info: I can see no nodes in the cluster view of the ResourceManager web UI (similar problem here: Slave nodes not in Yarn ResourceManager).
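For what it's worth, a quick sanity check is to verify that the NodeManager process is actually running on each slave and to look at its log (jps comes with the JDK; the log path below is just where my installation puts it and may differ):

jps                                                  # on a slave this should list both DataNode and NodeManager
tail -n 100 $HADOOP_HOME/logs/yarn-*-nodemanager-*.log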

1 Answer


You should check the status of the NodeManagers in your cluster. If the NM nodes are short on disk space, the RM marks them "unhealthy", and unhealthy NMs cannot launch new containers (so the AM container never gets allocated).
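A quick way to check this from the command line (exact output columns differ between Hadoop versions; on recent 2.x releases the -all flag also shows nodes that are not in the RUNNING state):

yarn node -list -all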

1) Check the Unhealthy nodes: http://<active_RM>:8088/cluster/nodes/unhealthy

If the "health report" tab says "local-dirs are bad" then it means you need to cleanup some disk space from these nodes.

2) Check the dfs.data.dir property in hdfs-site.xml. It points to the location on the local file system where HDFS data is stored.
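A typical entry looks like this (the path is only a placeholder; dfs.datanode.data.dir is the newer name for the same setting):

<property>
  <name>dfs.data.dir</name>
  <value>/usr/local/hadoop/hdfs/data</value>
</property>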

3) Log in to those machines and use the df -h and hadoop fs -du -h commands to measure the space used.
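For example (paths are placeholders):

df -h /data                         # free space on the local disks backing the YARN/HDFS directories
hadoop fs -du -h /user/user_name    # space used per directory in HDFS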

4) Check the Hadoop trash and delete it if it is taking up space: hadoop fs -du -h /user/user_name/.Trash and hadoop fs -rm -r /user/user_name/.Trash/*
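As an alternative, hdfs dfs -expunge forces an immediate purge of checkpointed trash for the current user (how much it removes depends on the fs.trash.interval setting):

hdfs dfs -expunge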

