Apache Hadoop Tutorial – We shall learn to install Apache Hadoop on Ubuntu. Java is a prerequisite to run Hadoop.
Install Apache Hadoop on Ubuntu
Following is a step by step guide to Install Apache Hadoop on Ubuntu
Install Java
Hadoop is an open-source framework written in Java. So, for Hadoop to run on your computer, you should install Java in prior.
Open a terminal and run the following command :
$ sudo apt-get install default-jdk
To verify the installation of Java, run the following command in the terminal :
$ java -version
The output for the command would be as shown below.
hadoopuser@tutorialkart:~# java -version
openjdk version "1.8.0_131"
OpenJDK Runtime Environment (build 1.8.0_131-8u131-b11-0ubuntu1.16.04.2-b11)
OpenJDK 64-Bit Server VM (build 25.131-b11, mixed mode)
Install Hadoop
Download latest Hadoop binary package from http://hadoop.apache.org/releases.html.
Look for latest stable release (not in alpha channel) and click on binary link provided for the release.
Click on the first mirror link
Copy the downloaded tar file to /usr/lib/ and untar.
$ sudo cp hadoop-2.8.1.tar.gz /usr/lib/
$ sudo tar zxf hadoop-2.8.1.tar.gz
$ sudo rm hadoop-2.8.1.tar.gz
Provide the password if asked.
Set Java and Hadoop Path
Make sure you have the PATHs set up for Java and Hadoop in bashrc file.
Open a Terminal and run the following command to edit bashrc file.
$ sudo nano ~/.bashrc
Paste the following entries at the end of .bashrc file.
#HADOOP VARIABLES START
export JAVA_HOME=/usr/lib/jvm/default-java/jre
export HADOOP_INSTALL=/usr/lib/hadoop-2.8.1
export PATH=$PATH:$HADOOP_INSTALL/bin
export PATH=$PATH:$HADOOP_INSTALL/sbin
export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_HOME=$HADOOP_INSTALL
export HADOOP_HDFS_HOME=$HADOOP_INSTALL
export YARN_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_INSTALL/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_INSTALL/lib"
#HADOOP VARIABLES END
Run Hadoop
After setting up the path for Hadoop and Java, you may run the hadoop command, from anywhere, using the terminal.
$ hadoop
The output would be as shown below :
Usage: hadoop [--config confdir] [COMMAND | CLASSNAME]
CLASSNAME run the class named CLASSNAME
or
where COMMAND is one of:
fs run a generic filesystem user client
version print the version
jar run a jar file
note: please use "yarn jar" to launch
YARN applications, not this command.
checknative [-a|-h] check native hadoop and compression libraries availability
distcp copy file or directories recursively
archive -archiveName NAME -p * create a hadoop archive
classpath prints the class path needed to get the
Hadoop jar and the required libraries
credential interact with credential providers
daemonlog get/set the log level for each daemon
trace view and modify Hadoop tracing settings
Most commands print help when invoked w/o parameters.
Conclusion
In this Apache Hadoop Tutorial, we have successfully installed Hadoop on Ubuntu. In subsequent tutorials, we shall look into HDFS and MapReduce and start with Word Count Example in Hadoop.