Configure Hadoop and start cluster services using Ansible Playbook

Prerequisite:
- Ansible installed
We are going to set up a Hadoop 1 cluster over EC2 instances with an Ansible playbook.
Hadoop 1 is installed on my controller node; the installation ships all the files needed to configure a Hadoop cluster.
I am going to configure one instance as the master node (NameNode) and three as slave nodes (DataNodes).
Three slaves are the usual minimum because HDFS replicates each block three times by default.

Here root@master is my controller node. On the other four EC2 instances the jps command is not found, which shows that Java and Hadoop are not installed yet.
Let's set up the Hadoop cluster with Ansible:
Inventory

There are two host groups here, name and data. The name group holds the master node and the data group holds the slave nodes.
The all host group covers every host that Ansible is run against. A minimal inventory matching this layout is sketched below.
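The group names come from the write-up, but the IP addresses, user, and key path are placeholders for whatever your EC2 instances use:

```ini
# inventory.txt -- hypothetical addresses; substitute your EC2 public IPs
[name]
master ansible_host=65.1.1.10

[data]
slave1 ansible_host=65.1.1.11
slave2 ansible_host=65.1.1.12
slave3 ansible_host=65.1.1.13

[all:vars]
ansible_user=root
ansible_ssh_private_key_file=~/.ssh/hadoopkey.pem
```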
Ansible playbook

This is the Ansible playbook that I will run against the host groups to set up the Hadoop cluster. It has three plays, one per host group, walked through below.
hosts: all
This play installs Hadoop and Java on every host Ansible runs against.
The get_url module downloads the Hadoop RPM package from a URL onto all the clients. The copy module uploads the JDK 8 RPM package from the controller node to the clients.
You can also copy the Hadoop RPM package from the controller node instead of downloading it.
Finally, the command module installs Hadoop and Java from the RPMs, as in the sketch below.
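A hedged sketch of this play; the download URL, file names, and versions are assumptions (Hadoop 1.2.1 and JDK 8u171 are a common pairing for this setup), not necessarily the exact ones used in the original run:

```yaml
- hosts: all
  tasks:
    - name: Download the Hadoop 1 rpm on every node
      get_url:
        url: https://archive.apache.org/dist/hadoop/core/hadoop-1.2.1/hadoop-1.2.1-1.x86_64.rpm
        dest: /root/hadoop-1.2.1-1.x86_64.rpm

    - name: Copy the JDK 8 rpm from the controller node
      copy:
        src: /root/jdk-8u171-linux-x64.rpm
        dest: /root/jdk-8u171-linux-x64.rpm

    - name: Install Java
      command: rpm -i /root/jdk-8u171-linux-x64.rpm

    # --force because the Hadoop 1 rpm's dependency check does not
    # recognise an rpm-installed Oracle JDK
    - name: Install Hadoop
      command: rpm -i /root/hadoop-1.2.1-1.x86_64.rpm --force
```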
hosts: name
This play creates the directory /nn, uploads the hdfs-site.xml and core-site.xml files, and finally starts the NameNode (master node) service, as sketched below.
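A hedged sketch of the play, assuming the rpm's config directory /etc/hadoop and local copies of the master's config files on the controller node (the src paths are placeholders):

```yaml
- hosts: name
  tasks:
    - name: Create the NameNode storage directory
      file:
        path: /nn
        state: directory

    - name: Upload the master's hdfs-site.xml
      copy:
        src: files/master/hdfs-site.xml
        dest: /etc/hadoop/hdfs-site.xml

    - name: Upload the master's core-site.xml
      copy:
        src: files/master/core-site.xml
        dest: /etc/hadoop/core-site.xml

    # A fresh NameNode directory has to be formatted once before the
    # daemon can start; echo Y answers the confirmation prompt
    - name: Format the NameNode storage
      shell: echo Y | hadoop namenode -format

    - name: Start the NameNode daemon
      command: hadoop-daemon.sh start namenode
```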
The hdfs-site.xml and core-site.xml files are as follows:
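A hedged sketch of both files; the directory /nn comes from the play above, while port 9001 is an assumption (any free port works as long as the slaves point at the same one):

```xml
<!-- hdfs-site.xml on the master: where the NameNode keeps its metadata -->
<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>/nn</value>
  </property>
</configuration>
```

```xml
<!-- core-site.xml on the master: listen on all interfaces, port 9001 -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://0.0.0.0:9001</value>
  </property>
</configuration>
```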

hosts: data
This play creates the directory /dn, uploads the hdfs-site.xml and core-site.xml files, and finally starts the DataNode (slave node) service, as sketched below.
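A hedged sketch mirroring the NameNode play; again the src paths are placeholders:

```yaml
- hosts: data
  tasks:
    - name: Create the DataNode storage directory
      file:
        path: /dn
        state: directory

    - name: Upload the slaves' hdfs-site.xml
      copy:
        src: files/slave/hdfs-site.xml
        dest: /etc/hadoop/hdfs-site.xml

    - name: Upload the slaves' core-site.xml
      copy:
        src: files/slave/core-site.xml
        dest: /etc/hadoop/core-site.xml

    - name: Start the DataNode daemon
      command: hadoop-daemon.sh start datanode
```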
The hdfs-site.xml and core-site.xml files are as follows:
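A hedged sketch; /dn comes from the play above, and MASTER_IP is a placeholder for the master node's address from the inventory (the port must match the master's core-site.xml):

```xml
<!-- hdfs-site.xml on the slaves: where each DataNode stores its blocks -->
<configuration>
  <property>
    <name>dfs.data.dir</name>
    <value>/dn</value>
  </property>
</configuration>
```

```xml
<!-- core-site.xml on the slaves: MASTER_IP is a placeholder for the
     master node's IP from the inventory -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://MASTER_IP:9001</value>
  </property>
</configuration>
```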

Running the Ansible playbook
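Assuming the playbook is saved as hadoop.yml and the inventory as inventory.txt (both names are placeholders), the run looks like:

```sh
ansible-playbook -i inventory.txt hadoop.yml
```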


Now jps shows the Hadoop daemons running on each node.
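Illustrative output (the PIDs will differ): the master should show a NameNode process and each slave a DataNode process:

```sh
# on the master
jps
# 1234 NameNode
# 1301 Jps

# on any slave
jps
# 1120 DataNode
# 1188 Jps
```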

The master node reports the connected DataNodes.
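From the master, the HDFS admin report is one way to confirm the cluster state; it should list the three connected DataNodes:

```sh
hadoop dfsadmin -report
```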

Uploading the file
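A hedged example of the upload; test.txt is a placeholder file name:

```sh
hadoop fs -put test.txt /
```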

File uploaded
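Listing the HDFS root confirms the upload (same placeholder name):

```sh
hadoop fs -ls /
```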

Reading the file
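Reading it back streams the contents from the DataNodes:

```sh
hadoop fs -cat /test.txt
```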
