Hadoop Installation

Hadoop Installation

Hadoop Installation

This basic tutorial is based on the Ubuntu system as an installation environment. To reduce unnecessary trouble, please install Ubuntu and boot into the system. Java must also be installed.

1/. Adding a Hadoop System User Group and User Use the following command to create a user group by executing the following command in the terminal:

sudo addgroup hadoop_

Use the following command to add a user:

sudo adduser --ingroup hadoop_ hduser_

2/.  configure SSH In order to manage nodes in the cluster, Hadoop needs SSH access. First, switch users and enter the following command

su hduser_

The following command will create a new key.

ssh-keygen -t rsa -P ""

Use this key to enable SSH access to the local machine.

cat /home/hduser_/.ssd/id_rsa.pub >> /home/hduser_/.ssh/authorized_keys

Now, test the SSH settings to connect to locahost via the “hduser” user.

ssh localhost

Note that if you see the following error response after executing the ‘ssh localhost’ command, SSH may not be available on this system.

To solve the above problem, re-install the SSH service

sudo apt-get purge openssh-server
sudo apt-get install openssh-server
3/. download Hadoop Open the URL in the browser: http://hadoop.apache.org/releases.html

After the download is complete, the file is placed by default in /home/username/Downloads

sudo tar xzf hadoop-3.1.2.tar.gz
sudo mv hadoop-3.1.2 /usr/local/hadoop
cd /usr/local
sudo chown -R hduser_:hadoop_ hadoop

At this point, the relevant work for the hadoop file is ready, and we have to do some key settings.

4/.  modify the ~/.bashrc file Add the following lines to the end of the ~/.bashrc file as follows:
nano ~/.bashrc

Now use the following command environment configuration

. ~/.bashrc

5/.  Configuring the associated HDFS settings JAVA_HOME In the file

/usr/local/hadoop/etc/hadoop/hadoop-env.sh, use the following line instead, that is, write the full Java installation path. As follows:

There are two more parameters to set in the $HADOOP_HOME/etc/hadoop/core-site.xml file:

1. ‘hadoop.tmp.dir’ – Used to specify a directory for Hadoop to store its data files.

2. ‘fs.default.name’ – specify the default file system In order to set two parameters, open the file core-site.xml

sudo nano /usr/local/hadoop/etc/hadoop/core-site.xml

Copy the contents of all the following lines into the middle of the tag <configuration></configuration>.

The result is shown below:
sudo nano /usr/local/hadoop/etc/hadoop/hadoop-env.sh
sudo nano /usr/local/hadoop/etc/hadoop/core-site.xml

Go to the directory: /usr/local/hadoop/etc/hadoop and use the following command:

cd /usr/local/hadoop/etc/hadoop 

Now create a directory, such as the directory used in core-site.xml above: /app/hadoop/tmp

sudo mkdir -p /app/hadoop/tmp

Grant the permissions directory /app/hadoop/tmp and execute the following command:

sudo chown -R hduser_:hadoop_  /app/hadoop/tmp 
sudo chmod 750 /app/hadoop/tmp

6/. Map Reduce configuration  

Before setting this configuration, we need to set the path to HADOOP_HOME and execute the following command:

sudo nano/etc/profile.d/hadoop.sh

Then enter the following line,

export HADOOP_HOME=/usr/local/hadoop

Then execute the following command

sudo chmod +x /etc/profile.d/hadoop.sh

Exit the command line terminal and enter again and enter the following command: echo $HADOOP_HOME to verify the path to hadoop :


Now copy the file and execute the following command:

sudo cp $HADOOP_HOME/etc/hadoop/mapred-site.xml.template $HADOOP_HOME/etc/hadoop/mapred-site.xml
sudo nano $HADOOP_HOME/etc/hadoop/mapred-site.xml

Add the following settings to the tags <configuration> and </configuration> as shown below:

sudo nano $HADOOP_HOME/etc/hadoop/hdfs-site.xml

Create the directory specified by the above configuration and authorize the directory to the user, using the following command:

sudo mkdir -p /home/hduser_/hdfs 
sudo chown -R hduser_:hadoop_ /home/hduser_/hdfs 
sudo chmod 750 /home/hduser_/hdfs

7/.  Format HDFS

Before using Hadoop first, you need to format HDFS first, using the following command.

$HADOOP_HOME/bin/hdfs namenode -format

The results of the execution are shown below:

Enter: y, continue …

Note: You may encounter some prompt errors: “java.io.IOException: Cannot create directory /app/hadoop/tmp/dfs/name/current…”, execute the following command to create a directory:

sudo mkdir -p /app/hadoop/tmp/dfs/name/current 
sudo chmod -R a+w /app/hadoop/tmp/dfs/name/current/

8/. Start a single-node cluster of Hadoop

Start a single-node cluster of cHadoop (using the hduser_ user to start) with the following command, as follows:

[email protected]:~$ $HADOOP_HOME/sbin/start-yarn.sh

Now use the ‘jps’ tool/command to verify that all Hadoop-related processes are running.


If Hadoop starts successfully, the jps output should show: NameNode, NodeManager, ResourceManager, SecondaryNameNode, DataNode.

9/. Stop/close Hadoop


Start typing and press Enter to search