Saturday, February 27, 2016

Install Hadoop on Ubuntu (Single-Node Cluster)



In this tutorial I will describe the required steps for setting up a single-node Hadoop cluster running on Ubuntu 15.04.


In simple terms, Hadoop is a framework written in Java for running applications that process large data sets, and it is widely used for Big Data analysis.




The main goal of this tutorial is to help you avoid common mistakes when installing Hadoop in an Ubuntu (Linux) environment.

In future posts I'll describe how to install and configure Hive and Sqoop, and how to use Hive and Sqoop with MySQL.

Prerequisites
  • Ubuntu 15.04 with Internet connectivity
  • Java (Open JDK or Oracle Java)

Steps in brief 

  • Install and configure Java
  • Install and configure SSH
  • Install and configure Hadoop
  • Configure System variables
  • Configure Hadoop variables

Install and Configure Java

In this tutorial I'll use OpenJDK, but you can use Oracle Java instead.

Oracle Java (via apt-get)
Open the terminal and run the commands below, one at a time:
  • sudo apt-get install python-software-properties
  • sudo add-apt-repository ppa:ferramroberto/java
  • sudo apt-get update
  • sudo apt-get install sun-java6-jdk
  • sudo update-java-alternatives -s java-6-sun
The full JDK will be placed in /usr/lib/jvm/java-6-sun.

OpenJDK (via apt-get)

  • sudo apt-get update
  • sudo apt-get install default-jdk
  • sudo update-alternatives --config java
The full JDK will be placed in /usr/lib/jvm/java-7-openjdk-i386 (on 64-bit systems the path is java-7-openjdk-amd64).
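
Whichever JDK you chose, you can verify the installation by printing the Java version (the exact version string will vary):
  • java -version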

Install and Configure SSH

  • sudo apt-get install ssh
  • sudo apt-get install rsync
  • ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
  • cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
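
Hadoop's control scripts use SSH to start and stop its daemons, even on a single node, which is why the key-based (passwordless) login above is needed. To verify it, connect to localhost; the first connection may ask you to confirm the host key, but it should not prompt for a password:
  • ssh localhost
  • exit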

Install and Configure Hadoop (2.7)


  • wget -c http://apache.mirrors.lucidnetworks.net/hadoop/common/hadoop-2.7.0/hadoop-2.7.0.tar.gz
  • sudo tar -zxvf hadoop-2.7.0.tar.gz
  • sudo mv hadoop-2.7.0 hadoop
  • sudo chmod 777 hadoop
  • sudo mv hadoop /usr/local/hadoop
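
Hadoop is now unpacked into /usr/local/hadoop but not yet on the PATH, so you can confirm the install by calling the binary with its full path:
  • /usr/local/hadoop/bin/hadoop version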

Configure System variables

Open ~/.bashrc in an editor (sudo is not needed, since the file belongs to your user):
  • nano ~/.bashrc
Paste the lines below at the end of the file, then save and exit (Ctrl+X).

#Hadoop Variables
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-i386
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"


New terminals pick up these variables automatically; to load them into your current shell, run one of the commands below:
  • bash
  • source ~/.bashrc
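
To confirm the variables took effect, echo one of them and check that the hadoop command now resolves from the PATH:
  • echo $HADOOP_HOME
  • hadoop version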

Configure Hadoop variables

  • cd /usr/local/hadoop/etc/hadoop
Edit the following configuration files and make the modifications shown below.
  • sudo nano hadoop-env.sh
          #The java implementation to use.
          export JAVA_HOME="/usr/lib/jvm/java-7-openjdk-i386"

  • sudo nano core-site.xml

          <configuration>
                  <property>
                      <name>fs.defaultFS</name>
                      <value>hdfs://localhost:9000</value>
                  </property>
          </configuration>
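
fs.defaultFS sets the default file system URI: with this value, Hadoop clients and daemons talk to an HDFS NameNode on localhost, port 9000.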

  • sudo nano yarn-site.xml
          <configuration>
                  <property>
                      <name>yarn.nodemanager.aux-services</name>
                      <value>mapreduce_shuffle</value>
                  </property>
                  <property>
                      <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
                      <value>org.apache.hadoop.mapred.ShuffleHandler</value>
                  </property>
          </configuration>
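
These properties enable the auxiliary shuffle service inside the NodeManager, which MapReduce jobs need when they run on YARN.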

  • sudo cp mapred-site.xml.template mapred-site.xml
  • sudo nano mapred-site.xml
          <configuration>
                  <property>
                      <name>mapreduce.framework.name</name>
                      <value>yarn</value>
                  </property>
          </configuration>
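
Setting mapreduce.framework.name to yarn makes MapReduce jobs run on YARN instead of the local (standalone) runtime.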

  • sudo nano hdfs-site.xml
          <configuration>
                  <property>
                      <name>dfs.replication</name>
                      <value>1</value>
                  </property>
                  <property>
                      <name>dfs.namenode.name.dir</name>
                      <value>file:/usr/local/hadoop/hadoop_data/hdfs/namenode</value>
                  </property>
                  <property>
                      <name>dfs.datanode.data.dir</name>
                      <value>file:/usr/local/hadoop/hadoop_data/hdfs/datanode</value>
                  </property>
          </configuration>
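
dfs.replication is 1 because a single-node cluster has only one DataNode to hold each block; the two directory properties point the NameNode and DataNode at local folders, which are created in the next step.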

  • cd
  • mkdir -p /usr/local/hadoop/hadoop_data/hdfs/namenode
  • mkdir -p /usr/local/hadoop/hadoop_data/hdfs/datanode
  • sudo chown -R kavinda:kavinda /usr/local/hadoop (replace kavinda with your own user name)
  • hdfs namenode -format
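
Run the format command only once; re-formatting erases existing HDFS metadata. On success the log output should contain a line similar to:
          INFO common.Storage: Storage directory /usr/local/hadoop/hadoop_data/hdfs/namenode has been successfully formatted.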

Start and stop the Hadoop cluster

  • start-all.sh (start Hadoop; in Hadoop 2.x this script is deprecated in favour of start-dfs.sh and start-yarn.sh, but it still works)
  • jps (check that the Hadoop daemons are running)
  • In your web browser, visit the addresses below
    • localhost:8088 (YARN ResourceManager web UI)
    • localhost:50070 (NameNode web UI)
  • stop-all.sh (stop Hadoop)
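
If everything is up, jps should list the five Hadoop daemons along with Jps itself (process IDs will differ):
          2547 NameNode
          2684 DataNode
          2875 SecondaryNameNode
          3021 ResourceManager
          3152 NodeManager
          3464 Jps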
