Automation of Hadoop Cluster
Automating a Hadoop distributed file storage (HDFS) cluster with Ansible
Before we start, let's become familiar with some terminology.
What is Hadoop?
Hadoop is a big data product from the Apache Software Foundation that provides tools for distributed computing and distributed storage.
Distributed storage: Today we generate huge amounts of data, and all of it needs hardware to live on. Once you reach petabytes (PB) of data, finding a single storage device big enough becomes impractical. This is where the Hadoop HDFS cluster comes into the picture. It is a master/slave cluster built for storing very large data sets, and because reads and writes happen in parallel across many nodes, its throughput (IOPS) is higher than that of a single physical disk.
Distributed computing: To process the data stored in an HDFS cluster we need a lot of RAM and CPU power, so Hadoop provides another component, the MapReduce (MR) cluster. It has the same master/slave architecture as HDFS, but it is used for computation instead of storage.
Today we are going to see how to set up an HDFS cluster using Ansible.
If you have set up Hadoop on these machines before, you first need to clean up the old setup.
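A minimal cleanup playbook could look like the sketch below. The hosts group name `hadoop` and the directories `/nn` and `/dn` are assumptions here; substitute whatever names and paths your earlier setup used.

```yaml
# clean_hadoop.yml - stop any running Hadoop daemons and remove old data
# (hosts group "hadoop" and the /nn and /dn paths are assumptions; adjust them)
- hosts: hadoop
  tasks:
    - name: Stop namenode daemon if running
      command: hadoop-daemon.sh stop namenode
      ignore_errors: yes

    - name: Stop datanode daemon if running
      command: hadoop-daemon.sh stop datanode
      ignore_errors: yes

    - name: Remove old namenode/datanode directories
      file:
        path: "{{ item }}"
        state: absent
      loop:
        - /nn
        - /dn
```

`ignore_errors` keeps the play from failing on nodes where a daemon was never started in the first place.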
Now let's copy the required software and packages to the master and slave nodes.
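One way to sketch this step is the playbook below, which copies the JDK and Hadoop RPMs from the Ansible controller and installs them. The exact RPM file names (`jdk-8u171-linux-x64.rpm`, `hadoop-1.2.1-1.x86_64.rpm`) and the hosts group are assumptions; use the versions you actually downloaded.

```yaml
# install_hadoop.yml - copy the JDK and Hadoop RPMs to every node and install them
# (RPM file names and the "hadoop" hosts group are assumptions; use your versions)
- hosts: hadoop
  tasks:
    - name: Copy the JDK rpm to the node
      copy:
        src: jdk-8u171-linux-x64.rpm
        dest: /root/

    - name: Copy the Hadoop rpm to the node
      copy:
        src: hadoop-1.2.1-1.x86_64.rpm
        dest: /root/

    - name: Install the JDK
      command: rpm -i /root/jdk-8u171-linux-x64.rpm
      ignore_errors: yes

    - name: Install Hadoop (the Hadoop 1.x rpm often needs --force on newer distros)
      command: rpm -i /root/hadoop-1.2.1-1.x86_64.rpm --force
      ignore_errors: yes
```

The plain `rpm -i` commands are used rather than a package-manager module because these RPMs are installed straight from local files, not a repository.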
After installing Hadoop and the Java JDK, we need to configure the Hadoop datanodes (slaves) and the namenode (master) so that they form the distributed cluster.
First, we configure the master node (namenode) of the Hadoop cluster.
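The namenode configuration can be sketched as follows: create the metadata directory, write `hdfs-site.xml` and `core-site.xml`, format the namenode, and start the daemon. The hosts group `namenode`, the `/nn` directory, and port `9001` are assumptions; adjust them to your environment.

```yaml
# namenode.yml - configure and start the namenode
# ("namenode" hosts group, the /nn directory, and port 9001 are assumptions)
- hosts: namenode
  tasks:
    - name: Create the namenode directory
      file:
        path: /nn
        state: directory

    - name: Configure hdfs-site.xml with the namenode directory
      copy:
        dest: /etc/hadoop/hdfs-site.xml
        content: |
          <configuration>
            <property>
              <name>dfs.name.dir</name>
              <value>/nn</value>
            </property>
          </configuration>

    - name: Configure core-site.xml with the HDFS service address
      copy:
        dest: /etc/hadoop/core-site.xml
        content: |
          <configuration>
            <property>
              <name>fs.default.name</name>
              <value>hdfs://0.0.0.0:9001</value>
            </property>
          </configuration>

    - name: Format the namenode (answers the confirmation prompt; first run only)
      shell: echo Y | hadoop namenode -format

    - name: Start the namenode daemon
      command: hadoop-daemon.sh start namenode
```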
Now let's configure the datanodes so that they can contribute their storage to the master node.
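The datanode play mirrors the namenode one, except that `core-site.xml` must point at the master's IP so the datanode knows where to report. The hosts group `datanode`, the `/dn` directory, and the `master_ip` variable are assumptions for illustration.

```yaml
# datanode.yml - configure and start the datanodes
# ("datanode" hosts group, /dn directory, and master_ip variable are assumptions)
- hosts: datanode
  vars:
    master_ip: 192.168.1.10   # replace with your namenode's IP
  tasks:
    - name: Create the datanode storage directory
      file:
        path: /dn
        state: directory

    - name: Configure hdfs-site.xml with the storage directory
      copy:
        dest: /etc/hadoop/hdfs-site.xml
        content: |
          <configuration>
            <property>
              <name>dfs.data.dir</name>
              <value>/dn</value>
            </property>
          </configuration>

    - name: Point core-site.xml at the namenode
      copy:
        dest: /etc/hadoop/core-site.xml
        content: |
          <configuration>
            <property>
              <name>fs.default.name</name>
              <value>hdfs://{{ master_ip }}:9001</value>
            </property>
          </configuration>

    - name: Start the datanode daemon
      command: hadoop-daemon.sh start datanode
```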
Once you run all the Ansible playbooks using the command below, you will see that the Hadoop cluster has been completely set up.
ansible-playbook playbook_name.yml
To see the Hadoop cluster's status report:
hadoop dfsadmin -report
That’s all for today!!
See you in the next one 🙌
You will find the complete code on GitHub 💡
You can connect with me on LinkedIn 👬