Automation of Hadoop Cluster

Abhishek Prasad Kesare
Apr 16, 2021

Automating a Hadoop Distributed File System (HDFS) cluster with Ansible

Before we start, let's get familiar with some terminology.

What is Hadoop?

Hadoop is an open-source big data framework maintained by the Apache Software Foundation. It provides both distributed storage and distributed computing.

Distributed storage: Today we have huge amounts of data, and storing it needs hardware. Once you reach petabytes (PB) of data, getting a single machine with that much storage becomes impractical. This is where the Hadoop HDFS cluster comes into the picture. It is a master-slave cluster meant for storing large data. The data is spread across many nodes and read and written in parallel, so I/O throughput can also be higher than on a single machine.

Distributed computing: Processing the data stored in an HDFS cluster needs a lot of RAM and CPU, so Hadoop has a second component called the MapReduce cluster (MR cluster). It has the same master-slave architecture as HDFS, but it is used for computing instead of storage.

Today we are going to see how to set up an HDFS cluster using Ansible.

If you have previously set up Hadoop on these machines, you need to clean up the old setup first. This is done in two steps: one clears the master and the other clears the slaves.
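
A cleanup of this kind could be sketched as the playbook below. It is only a sketch under assumptions: the inventory group names `namenode` and `datanode`, the directories `/nn` and `/dn`, and `hadoop-daemon.sh` being on the PATH are all hypothetical and should be replaced with whatever your earlier setup used.

```yaml
# cleanup.yml: hypothetical cleanup playbook; group names and paths are assumptions
- name: Clear a previous Hadoop setup on the master
  hosts: namenode              # assumed inventory group name
  become: true
  tasks:
    - name: Stop the NameNode daemon if it is running
      ansible.builtin.command: hadoop-daemon.sh stop namenode
      ignore_errors: true      # nothing to stop on a fresh machine

    - name: Remove the old NameNode metadata directory
      ansible.builtin.file:
        path: /nn              # assumed directory
        state: absent

- name: Clear a previous Hadoop setup on the slaves
  hosts: datanode              # assumed inventory group name
  become: true
  tasks:
    - name: Stop the DataNode daemon if it is running
      ansible.builtin.command: hadoop-daemon.sh stop datanode
      ignore_errors: true

    - name: Remove the old DataNode storage directory
      ansible.builtin.file:
        path: /dn              # assumed directory
        state: absent
```

Removing these directories deletes any data stored in the old cluster, so run it only when you really want a fresh start.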

Now let's copy the required software and packages to the master and slaves.

Here we are copying the JDK and the Hadoop version 1 software for the cluster setup.
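
As a rough illustration, the copy and install step might look like the playbook below on an RPM-based system. The package file names, the `/root` destination and the paths used with `creates` are assumptions for this sketch, so adjust them to the files you actually downloaded.

```yaml
# install.yml: hypothetical sketch; file names and paths are assumptions
- name: Copy and install the JDK and Hadoop 1 on every node
  hosts: all
  become: true
  vars:
    jdk_rpm: jdk-8u171-linux-x64.rpm        # hypothetical file name
    hadoop_rpm: hadoop-1.2.1-1.x86_64.rpm   # hypothetical file name
  tasks:
    - name: Copy both packages from the control node
      ansible.builtin.copy:
        src: "{{ item }}"
        dest: "/root/{{ item }}"
      loop:
        - "{{ jdk_rpm }}"
        - "{{ hadoop_rpm }}"

    - name: Install the JDK
      ansible.builtin.command: "rpm -i /root/{{ jdk_rpm }}"
      args:
        creates: /usr/java     # assumed install location; skips the task on reruns

    - name: Install Hadoop
      ansible.builtin.command: "rpm -i --force /root/{{ hadoop_rpm }}"
      args:
        creates: /etc/hadoop   # assumed config directory created by the package
```

The `creates` argument keeps the install tasks from running again once the software is already in place.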

After installing Hadoop and the Java JDK, we need to configure the Hadoop DataNodes (slaves) and the NameNode (master) so that together they form the distributed cluster.

Here we configure the master node for the Hadoop cluster.
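
A minimal sketch of the master configuration, assuming a Hadoop 1 install that keeps its config files in `/etc/hadoop`: `dfs.name.dir` and `fs.default.name` are the Hadoop 1 property names, while the group name `namenode`, the directory `/nn` and the port `9001` are assumptions chosen for illustration.

```yaml
# namenode.yml: hypothetical sketch; group name, directory and port are assumptions
- name: Configure the NameNode (master)
  hosts: namenode              # assumed inventory group name
  become: true
  vars:
    nn_dir: /nn                # assumed metadata directory
    nn_port: 9001              # assumed port
  tasks:
    - name: Create the NameNode metadata directory
      ansible.builtin.file:
        path: "{{ nn_dir }}"
        state: directory

    - name: Write hdfs-site.xml
      ansible.builtin.copy:
        dest: /etc/hadoop/hdfs-site.xml
        content: |
          <?xml version="1.0"?>
          <configuration>
            <property>
              <name>dfs.name.dir</name>
              <value>{{ nn_dir }}</value>
            </property>
          </configuration>

    - name: Write core-site.xml
      ansible.builtin.copy:
        dest: /etc/hadoop/core-site.xml
        content: |
          <?xml version="1.0"?>
          <configuration>
            <property>
              <name>fs.default.name</name>
              <value>hdfs://0.0.0.0:{{ nn_port }}</value>
            </property>
          </configuration>

    - name: Format the NameNode (skipped once the directory is initialised)
      ansible.builtin.shell: echo Y | hadoop namenode -format
      args:
        creates: "{{ nn_dir }}/current"

    - name: Start the NameNode daemon
      ansible.builtin.command: hadoop-daemon.sh start namenode
```

The last task is not idempotent: starting a daemon that is already running returns an error, so stop it first if you rerun the playbook.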

Now let's configure the DataNodes so that they can share their storage with the master node.

When you run all the Ansible playbooks using the command below, the Hadoop cluster will be completely set up.
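
The DataNode side could be sketched in the same way. Here `dfs.data.dir` is the Hadoop 1 property for the storage directory, and `fs.default.name` must point at the master. The group name `datanode`, the directory `/dn`, the address `192.168.1.10` and the port `9001` are placeholders for this sketch, not values from the original setup.

```yaml
# datanode.yml: hypothetical sketch; group name, directory, IP and port are assumptions
- name: Configure the DataNodes (slaves)
  hosts: datanode              # assumed inventory group name
  become: true
  vars:
    dn_dir: /dn                # assumed storage directory
    namenode_ip: 192.168.1.10  # placeholder address of the master
    nn_port: 9001              # must match the port the master listens on
  tasks:
    - name: Create the DataNode storage directory
      ansible.builtin.file:
        path: "{{ dn_dir }}"
        state: directory

    - name: Write hdfs-site.xml
      ansible.builtin.copy:
        dest: /etc/hadoop/hdfs-site.xml
        content: |
          <?xml version="1.0"?>
          <configuration>
            <property>
              <name>dfs.data.dir</name>
              <value>{{ dn_dir }}</value>
            </property>
          </configuration>

    - name: Write core-site.xml pointing at the master
      ansible.builtin.copy:
        dest: /etc/hadoop/core-site.xml
        content: |
          <?xml version="1.0"?>
          <configuration>
            <property>
              <name>fs.default.name</name>
              <value>hdfs://{{ namenode_ip }}:{{ nn_port }}</value>
            </property>
          </configuration>

    - name: Start the DataNode daemon
      ansible.builtin.command: hadoop-daemon.sh start datanode
```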

ansible-playbook playbook_name.yml
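
Ansible also needs to know which machines are the master and which are the slaves. One way to express that is a YAML inventory like the sketch below. The group names and IP addresses are placeholders, not values from the original setup.

```yaml
# inventory.yml: hypothetical inventory; group names and addresses are placeholders
all:
  children:
    namenode:
      hosts:
        192.168.1.10:
    datanode:
      hosts:
        192.168.1.11:
        192.168.1.12:
```

With a file like this, the playbook can be run as `ansible-playbook -i inventory.yml playbook_name.yml`.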

To see the Hadoop cluster information, run this on a node of the cluster:

hadoop dfsadmin -report 
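
If you prefer to check the cluster from Ansible as well, the same command can be wrapped in a small playbook. This is a sketch that assumes an inventory group named `namenode`.

```yaml
# report.yml: hypothetical sketch; the group name is an assumption
- name: Show the HDFS cluster report
  hosts: namenode              # assumed inventory group name
  tasks:
    - name: Run the report command
      ansible.builtin.command: hadoop dfsadmin -report
      register: report
      changed_when: false      # read-only command, never changes the node

    - name: Print the report
      ansible.builtin.debug:
        var: report.stdout_lines
```

The report lists the configured capacity and the DataNodes that have joined the cluster, which is a quick way to confirm that every slave is connected.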

That’s all for today!!

See you in the next one 🙌

You will find the complete code on GitHub 💡

You can connect with me on LinkedIn 👬

Abhishek Prasad Kesare

Data science, cloud computing, Artificial Intelligence, Cybersecurity, tech-blogger