The following error can appear on the JobTracker page while running Hadoop jobs:
http://<masternode ip>:50030/jobtracker.jsp
Error:
“Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.”
Description:
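The box name <machine name> of the affected node has not been updated in all the cluster configuration files (the Hadoop configuration files and /etc/hosts). As a result, reducers fail to fetch map output from that node.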
Steps to resolve:
Step 1:
Stop all the Hadoop services on the master node
hduser: /usr/local/hadoop/bin/stop-all.sh
Step 2:
Edit the Hadoop configuration files and update the box name on all cluster nodes
Location: /usr/local/hadoop/conf/
List of files:
core-site.xml
mapred-site.xml
hdfs-site.xml
slaves
masters
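To confirm that no file in the list above still mentions the old box name, a small check like the following may help. This is a minimal sketch: find_stale_name is a hypothetical helper, not part of Hadoop, and the old box name you pass in is whatever name the node used before the change.

```shell
# Hypothetical helper (not part of Hadoop): print the configuration files
# that still contain the old box name.
# Usage: find_stale_name OLD_NAME [CONF_DIR]
find_stale_name() {
    old_name=$1
    conf_dir=${2:-/usr/local/hadoop/conf}
    for f in core-site.xml mapred-site.xml hdfs-site.xml slaves masters; do
        # -F: treat the name as a fixed string, -w: match whole words only,
        # -q: no output, only an exit status
        if [ -f "$conf_dir/$f" ] && grep -qFw -- "$old_name" "$conf_dir/$f"; then
            echo "$f"
        fi
    done
}
```

For example, running find_stale_name old-box on each node prints nothing once every file has been updated ("old-box" is a placeholder).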
Step 3:
Edit the following file on every cluster node and add one entry per node in this format
/etc/hosts
<ipaddress> <box name>
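Once /etc/hosts has been edited, it can be worth checking that every box name in the cluster has an entry. The sketch below assumes a hypothetical helper named has_host_entry. It only inspects the hosts file and does not test network connectivity.

```shell
# Hypothetical helper: succeed if NAME appears as a host name in a hosts file.
# Usage: has_host_entry NAME [HOSTS_FILE]
has_host_entry() {
    name=$1
    hosts_file=${2:-/etc/hosts}
    awk -v name="$name" '
        /^[[:space:]]*#/ { next }          # skip comment lines
        {
            # field 1 is the IP address; the remaining fields are names
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^#/) break        # stop at a trailing comment
                if ($i == name) found = 1
            }
        }
        END { exit found ? 0 : 1 }
    ' "$hosts_file"
}
```

A possible use on the master node is to loop over the slaves file and report any name without an entry:

    while read -r h; do has_host_entry "$h" || echo "missing: $h"; done < /usr/local/hadoop/conf/slaves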
Step 4:
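Perform the commands in this step on all the cluster nodes. Note that they delete everything under /app, and that formatting the NameNode erases the existing HDFS metadata.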
root: rm -rf /app/
root: mkdir -p /app/hadoop/tmp
root: chmod -R 0755 /app/
root: chown -R hduser:hadoop /app
hduser: /usr/local/hadoop/bin/hadoop namenode -format
Step 5:
Start all the Hadoop services on the master node
hduser: /usr/local/hadoop/bin/start-all.sh
Step 6:
Check whether all the services are running
hduser: jps
Master node: 6 entries (including Jps itself)
Jps
DataNode
TaskTracker
SecondaryNameNode
NameNode
JobTracker
Slave nodes: 3 entries (including Jps itself)
Jps
DataNode
TaskTracker
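The comparison against the lists above can also be scripted. The following is a minimal sketch: missing_daemons is a hypothetical helper that reads jps output on standard input and prints any expected daemon that is not listed. The expected names are taken from the lists above.

```shell
# Hypothetical helper: read `jps` output on stdin and print the expected
# daemons that are not running.
# Usage: jps | missing_daemons master|slave
missing_daemons() {
    role=$1
    case $role in
        master) expected="NameNode SecondaryNameNode JobTracker DataNode TaskTracker" ;;
        slave)  expected="DataNode TaskTracker" ;;
        *) echo "unknown role: $role" >&2; return 2 ;;
    esac
    # jps prints "<pid> <name>" on each line; keep only the names
    running=$(awk '{ print $2 }')
    for d in $expected; do
        # -x: match the whole line, so NameNode does not match SecondaryNameNode
        echo "$running" | grep -qx -- "$d" || echo "$d"
    done
}
```

Running "jps | missing_daemons master" as hduser on the master node, or "jps | missing_daemons slave" on a slave node, prints nothing when all the expected daemons are up.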