Automation of Hadoop Cluster
Automating a Hadoop distributed file storage (HDFS) cluster with Ansible
Before we start, let's become familiar with some terminology.
What is Hadoop?
Hadoop is a big data product from the Apache Software Foundation that provides tools for distributed computing and distributed storage.
Distributed storage: Today we generate huge amounts of data, and all of it needs hardware to live on. Once you reach petabytes (PB) of data, finding a single storage device big enough becomes impractical. This is where the Hadoop HDFS cluster comes into the picture. It is a master/slave cluster built for storing very large data sets, and because reads and writes happen in parallel across many nodes, its throughput (IOPS) is higher than that of a single physical disk.
Distributed computing: To process the data stored in an HDFS cluster we need a lot of RAM and CPU power, so Hadoop provides another component, the MapReduce (MR) cluster. It has the same master/slave architecture as HDFS, but it is used for computation instead of storage.
Today we are going to see how to set up an HDFS cluster using Ansible.
If you have set up Hadoop on these machines before, you first need to clean up the old setup.
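A minimal cleanup playbook could look like the sketch below. The hosts group name `hadoop` and the directories `/nn` and `/dn` are assumptions here; substitute whatever names and paths your earlier setup used.

```yaml
# clean_hadoop.yml - stop any running Hadoop daemons and remove old data
# (hosts group "hadoop" and the /nn and /dn paths are assumptions; adjust them)
- hosts: hadoop
  tasks:
    - name: Stop namenode daemon if running
      command: hadoop-daemon.sh stop namenode
      ignore_errors: yes

    - name: Stop datanode daemon if running
      command: hadoop-daemon.sh stop datanode
      ignore_errors: yes

    - name: Remove old namenode/datanode directories
      file:
        path: "{{ item }}"
        state: absent
      loop:
        - /nn
        - /dn
```

`ignore_errors` keeps the play from failing on nodes where a daemon was never started in the first place.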
Now let's copy the required software and packages to the master and slave nodes.
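One way to sketch this step is the playbook below, which copies the JDK and Hadoop RPMs from the Ansible controller and installs them. The exact RPM file names (`jdk-8u171-linux-x64.rpm`, `hadoop-1.2.1-1.x86_64.rpm`) and the hosts group are assumptions; use the versions you actually downloaded.

```yaml
# install_hadoop.yml - copy the JDK and Hadoop RPMs to every node and install them
# (RPM file names and the "hadoop" hosts group are assumptions; use your versions)
- hosts: hadoop
  tasks:
    - name: Copy the JDK rpm to the node
      copy:
        src: jdk-8u171-linux-x64.rpm
        dest: /root/

    - name: Copy the Hadoop rpm to the node
      copy:
        src: hadoop-1.2.1-1.x86_64.rpm
        dest: /root/

    - name: Install the JDK
      command: rpm -i /root/jdk-8u171-linux-x64.rpm
      ignore_errors: yes

    - name: Install Hadoop (the Hadoop 1.x rpm often needs --force on newer distros)
      command: rpm -i /root/hadoop-1.2.1-1.x86_64.rpm --force
      ignore_errors: yes
```

The plain `rpm -i` commands are used rather than a package-manager module because these RPMs are installed straight from local files, not a repository.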
After installing Hadoop and the Java JDK, we need to configure the Hadoop datanodes (slaves) and the namenode (master) so that they form the distributed cluster.
First, we configure the master node (namenode) of the Hadoop cluster.
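The namenode configuration can be sketched as follows: create the metadata directory, write `hdfs-site.xml` and `core-site.xml`, format the namenode, and start the daemon. The hosts group `namenode`, the `/nn` directory, and port `9001` are assumptions; adjust them to your environment.

```yaml
# namenode.yml - configure and start the namenode
# ("namenode" hosts group, the /nn directory, and port 9001 are assumptions)
- hosts: namenode
  tasks:
    - name: Create the namenode directory
      file:
        path: /nn
        state: directory

    - name: Configure hdfs-site.xml with the namenode directory
      copy:
        dest: /etc/hadoop/hdfs-site.xml
        content: |
          <configuration>
            <property>
              <name>dfs.name.dir</name>
              <value>/nn</value>
            </property>
          </configuration>

    - name: Configure core-site.xml with the HDFS service address
      copy:
        dest: /etc/hadoop/core-site.xml
        content: |
          <configuration>
            <property>
              <name>fs.default.name</name>
              <value>hdfs://0.0.0.0:9001</value>
            </property>
          </configuration>

    - name: Format the namenode (answers the confirmation prompt; first run only)
      shell: echo Y | hadoop namenode -format

    - name: Start the namenode daemon
      command: hadoop-daemon.sh start namenode
```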
Now let's configure the datanodes so that they can contribute their storage to the master node.
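The datanode play mirrors the namenode one, except that `core-site.xml` must point at the master's IP so the datanode knows where to report. The hosts group `datanode`, the `/dn` directory, and the `master_ip` variable are assumptions for illustration.

```yaml
# datanode.yml - configure and start the datanodes
# ("datanode" hosts group, /dn directory, and master_ip variable are assumptions)
- hosts: datanode
  vars:
    master_ip: 192.168.1.10   # replace with your namenode's IP
  tasks:
    - name: Create the datanode storage directory
      file:
        path: /dn
        state: directory

    - name: Configure hdfs-site.xml with the storage directory
      copy:
        dest: /etc/hadoop/hdfs-site.xml
        content: |
          <configuration>
            <property>
              <name>dfs.data.dir</name>
              <value>/dn</value>
            </property>
          </configuration>

    - name: Point core-site.xml at the namenode
      copy:
        dest: /etc/hadoop/core-site.xml
        content: |
          <configuration>
            <property>
              <name>fs.default.name</name>
              <value>hdfs://{{ master_ip }}:9001</value>
            </property>
          </configuration>

    - name: Start the datanode daemon
      command: hadoop-daemon.sh start datanode
```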
Once you run all the Ansible playbooks using the command below, you will see that the Hadoop cluster has been completely set up.
ansible-playbook playbook_name.yml
To see the Hadoop cluster's status report:
hadoop dfsadmin -report
That’s all for today!!
See you in the next one 🙌
You will find the complete code on GitHub 💡
You can connect with me on LinkedIn 👬