Hadoop 1 Installation (Single-Node Cluster) on Ubuntu Linux in VirtualBox

Step) Fix the IP address of the VirtualBox VM
Before you start: in my experience, it is better to give your Ubuntu VirtualBox VM a fixed IP address. This step is optional, so you can skip it.

Go to the Network settings, select Options, open the IPv4 Settings tab, change the method from Automatic (DHCP) to Manual, then click 'Add' and enter details like these:
address: 192.168.1.7
Netmask: 255.255.255.0
gateway: 192.168.*.*
DNS servers: 192.168.*.*

Save it, then restart networking from the command prompt: sudo /etc/init.d/networking restart
Finally, restart the system, confirm the IP address, and check that the internet is working. Done!
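As a quick sanity check after setting a static address, you can confirm which network the address belongs to (it should match your gateway's network). The `network` helper below is my own illustrative function, not a standard tool, and it uses the example address and netmask from above; substitute your own values.

```shell
#!/bin/sh
# Bitwise-AND an IPv4 address with a netmask to get the network address.
# Example values from this guide; substitute your own.
network() {
    addr=$1 mask=$2
    old_ifs=$IFS; IFS=.
    set -- $addr            # split address into octets $1..$4
    a1=$1 a2=$2 a3=$3 a4=$4
    set -- $mask            # split mask into octets $1..$4
    IFS=$old_ifs
    echo "$((a1 & $1)).$((a2 & $2)).$((a3 & $3)).$((a4 & $4))"
}

network 192.168.1.7 255.255.255.0   # -> 192.168.1.0
```

If the printed network does not match the network part of your gateway, the VM will not be able to reach it.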

Step) Set up SSH:
sudo apt-get update
sudo apt-get install openssh-server (check that this step succeeds before continuing)

–Install portmap (the RPC port mapper) as well:
sudo apt-get install portmap (check that this step succeeds)

ssh-keygen -t rsa -P ""
cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
ssh localhost (Check if successful to proceed further)
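What the key-appending command above does can be sketched in a throwaway directory, which also shows the permissions sshd expects on these files. The key string here is a mock placeholder, not real SSH material:

```shell
#!/bin/sh
# Demonstrate the append-to-authorized_keys step with a mock key
# in a temporary directory (no real SSH material involved).
demo="$(mktemp -d)"
mkdir -p "$demo/.ssh"
chmod 700 "$demo/.ssh"                      # sshd rejects group/world-writable dirs

echo "ssh-rsa MOCKKEYDATA user@host" > "$demo/.ssh/id_rsa.pub"

# Same pattern as: cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
cat "$demo/.ssh/id_rsa.pub" >> "$demo/.ssh/authorized_keys"
chmod 600 "$demo/.ssh/authorized_keys"      # sshd ignores the file if it is looser

grep -c "MOCKKEYDATA" "$demo/.ssh/authorized_keys"   # -> 1
rm -rf "$demo"
```

If `ssh localhost` still prompts for a password, wrong permissions on `~/.ssh` or `authorized_keys` are the usual culprit.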

Step) Install JDK
sudo mkdir -p /usr/local/java
sudo cp jdk-7u67-linux-x64.gz /usr/local/java/
cd /usr/local/java
sudo chmod a+x jdk-7u67-linux-x64.gz
sudo tar xvzf jdk-7u67-linux-x64.gz

–Add the Java home directory to your PATH in the profile, so that you don't have to navigate to your Java installation every time you log in. Open the file and append the following lines:

sudo gedit /etc/profile
JAVA_HOME=/usr/local/java/jdk1.7.0_67
PATH=$PATH:$HOME/bin:$JAVA_HOME/bin
export JAVA_HOME
export PATH

source /etc/profile

sudo update-alternatives --install "/usr/bin/java" "java" "/usr/local/java/jdk1.7.0_67/bin/java" 1
sudo update-alternatives --install "/usr/bin/javac" "javac" "/usr/local/java/jdk1.7.0_67/bin/javac" 1
sudo update-alternatives --install "/usr/bin/javaws" "javaws" "/usr/local/java/jdk1.7.0_67/bin/javaws" 1
sudo update-alternatives --set java /usr/local/java/jdk1.7.0_67/bin/java
sudo update-alternatives --set javac /usr/local/java/jdk1.7.0_67/bin/javac
sudo update-alternatives --set javaws /usr/local/java/jdk1.7.0_67/bin/javaws

–Confirm your Java version at the command prompt: java -version

Step) Download and extract the Hadoop tarball
Download the Apache Hadoop 1.0.4 binary release from the Apache archives (direct link:
https://archive.apache.org/dist/hadoop/core/hadoop-1.0.4/hadoop-1.0.4-bin.tar.gz )
and save it in a temp folder (in my case, 'SharedFolder'):

Now copy the hadoop-1.0.4-bin.tar.gz file to your preferred directory and extract it there; personally I prefer /usr/local/:

sudo cp /home/kirat/Documents/SharedFolder/hadoop-1.0.4-bin.tar.gz /usr/local/
cd /usr/local
sudo tar xvzf hadoop-1.0.4-bin.tar.gz
sudo rm hadoop-1.0.4-bin.tar.gz
sudo ln -s /usr/local/hadoop-1.0.4/ /opt/hadoop
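It is also worth verifying a download like this before extracting it; Apache publishes checksum files alongside each release artifact on archive.apache.org. A self-contained sketch of the pattern (the "expected" value here is computed on the spot for a demo file, standing in for the published checksum):

```shell
#!/bin/sh
# Verify a downloaded tarball against a known checksum before extracting.
# The expected value here is computed on a demo file for illustration;
# for a real release, take it from the checksum file on archive.apache.org.
f="$(mktemp)"
echo "pretend this is hadoop-1.0.4-bin.tar.gz" > "$f"

expected="$(sha256sum "$f" | awk '{print $1}')"   # stand-in for the published value
actual="$(sha256sum "$f" | awk '{print $1}')"

if [ "$expected" = "$actual" ]; then
    echo "checksum OK"
else
    echo "checksum MISMATCH - do not extract" >&2
fi
rm -f "$f"
```

A mismatch usually means a corrupted or tampered download; re-fetch the tarball rather than extracting it.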

Step) Set Hadoop Home path
Add the Hadoop home directory to your PATH in the profile, so that you don't have to navigate to your Hadoop installation every time you log in:

sudo gedit /etc/profile
export HADOOP_HOME="/usr/local/hadoop-1.0.4"
export PATH="$HADOOP_HOME/bin:$PATH"

source /etc/profile
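To check that the two export lines do what is intended without logging out, the snippet can be sourced inside a subshell; this sketch uses the install path from above:

```shell
#!/bin/sh
# Source the two profile lines in isolation and confirm PATH picks up
# the Hadoop bin directory. Paths match the install location above.
snippet="$(mktemp)"
cat > "$snippet" <<'EOF'
export HADOOP_HOME="/usr/local/hadoop-1.0.4"
export PATH="$HADOOP_HOME/bin:$PATH"
EOF

# Run inside a command substitution (a subshell), so the current
# environment is not modified by the test.
result="$(. "$snippet" && echo "$PATH")"

case "$result" in
    /usr/local/hadoop-1.0.4/bin:*) echo "PATH updated correctly" ;;
    *) echo "PATH not updated" >&2 ;;
esac
rm -f "$snippet"
```

Once /etc/profile itself carries these lines, any new login shell will find the hadoop command without a full path.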

Step) Set required permissions
sudo chmod -R 777 /usr/local/hadoop-1.0.4/
sudo chown -R kirat /usr/local/hadoop-1.0.4/

Step) Set Hadoop configuration attributes
Let’s first set the base directory that specifies the location on the local filesystem under which Hadoop will keep all its data. Carry out the following steps:

1. Create a directory into which Hadoop will store its data:
sudo mkdir /var/lib/hadoop

2. Ensure the directory is writable by any user:
sudo chmod 777 /var/lib/hadoop

3. In the hadoop-env.sh file, set JAVA_HOME:
sudo gedit /usr/local/hadoop-1.0.4/conf/hadoop-env.sh
export JAVA_HOME="/usr/local/java/jdk1.7.0_67"

4. Modify core-site.xml to add the following property:
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://192.168.*.*:9000</value>
</property>

<property>
<name>hadoop.tmp.dir</name>
<value>/var/lib/hadoop</value>
</property>
</configuration>

5. Modify hdfs-site.xml to add the following property:
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>

6. Modify mapred-site.xml to add the following properties:
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>192.168.*.*:9001</value>
</property>

<property>
<name>mapred.reduce.child.java.opts</name>
<value>-Xmx1024M</value>
</property>
</configuration>
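A quick way to double-check what a -site.xml file actually contains is to pull a property value back out. The sketch below works on a throwaway copy of the core-site.xml fragment from above; a real XML tool such as xmllint would be sturdier than this sed one-liner:

```shell
#!/bin/sh
# Write a throwaway copy of the core-site.xml fragment from above and
# read back one property value. A proper XML parser is more robust;
# this sed command just illustrates the check.
conf="$(mktemp)"
cat > "$conf" <<'EOF'
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/var/lib/hadoop</value>
</property>
</configuration>
EOF

# Print the <value> on the line after the hadoop.tmp.dir <name> line.
sed -n '/<name>hadoop.tmp.dir<\/name>/{n;s/.*<value>\(.*\)<\/value>.*/\1/p;}' "$conf"
# -> /var/lib/hadoop
rm -f "$conf"
```

If the printed value is not the directory you created earlier, Hadoop will silently fall back to a default under /tmp.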

Step) Format the HDFS filesystem using NameNode
This is an important first step before you start your Hadoop cluster: it formats the Hadoop file system, which is implemented on top of the local file system. It is executed only once, when you first set up the cluster.

–To format the filesystem (this initializes the directory specified by "dfs.name.dir"), execute the following command:
hadoop namenode -format

Step) Starting your single-node cluster
start-all.sh

** This starts a NameNode, a DataNode, a SecondaryNameNode, a JobTracker, and a TaskTracker on your Ubuntu machine.
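Whether the daemons actually came up is usually checked with jps. The sketch below runs the check against a hard-coded sample listing so it stays self-contained; on a live cluster you would replace `sample` with real `jps` output, as noted in the comment:

```shell
#!/bin/sh
# Check a jps-style process listing for the Hadoop 1 daemons.
# 'sample' is canned output for illustration; on a live cluster use:
#   sample="$(jps)"
sample="2101 NameNode
2290 DataNode
2475 SecondaryNameNode
2560 JobTracker
2745 TaskTracker
2900 Jps"

missing=0
for d in NameNode DataNode SecondaryNameNode JobTracker TaskTracker; do
    # -w avoids NameNode matching inside SecondaryNameNode
    echo "$sample" | grep -qw "$d" || { echo "missing: $d" >&2; missing=1; }
done
[ "$missing" -eq 0 ] && echo "all daemons running"   # -> all daemons running
```

If any daemon is missing, its log under $HADOOP_HOME/logs is the first place to look.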

Step) Stopping your single-node cluster
Run the command: stop-all.sh

To access details via Hadoop Web Interfaces:
Web UI of NameNode: http://192.168.*.*:50070/dfshealth.jsp
Web UI of JobTracker: http://192.168.*.*:50030/jobtracker.jsp
Web UI of TaskTracker: http://192.168.*.*:50060/tasktracker.jsp
Successfully done!
