Configure Hadoop and start cluster services using Ansible Playbook

Prerequisite:
- Ansible installed
We are going to set up a Hadoop 1 cluster over EC2 instances with an Ansible playbook.
Hadoop 1 is installed on my controller node; the installation ships all the files needed to configure a Hadoop cluster.
I am going to configure one instance as the master node (NameNode) and three as slave nodes (DataNodes).
Three slaves are the usual minimum because HDFS replicates each block three times by default.

Here root@master is my controller node. On the other four EC2 instances the jps command is not found, which shows that Java and Hadoop are not installed yet.
Let's set up the Hadoop cluster with Ansible:
Inventory

There are two host groups here, name and data. The name group holds the master node and the data group holds the slave nodes.
The all host group covers every host that Ansible is run against. A minimal inventory matching this layout is sketched below.
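The group names come from the write-up, but the IP addresses, user, and key path are placeholders for whatever your EC2 instances use:

```ini
# inventory.txt -- hypothetical addresses; substitute your EC2 public IPs
[name]
master ansible_host=65.1.1.10

[data]
slave1 ansible_host=65.1.1.11
slave2 ansible_host=65.1.1.12
slave3 ansible_host=65.1.1.13

[all:vars]
ansible_user=root
ansible_ssh_private_key_file=~/.ssh/hadoopkey.pem
```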
Ansible playbook

This is the Ansible playbook that I will run against the host groups to set up the Hadoop cluster. It has three plays, one per host group, walked through below.
hosts: all
This play installs Hadoop and Java on every host Ansible runs against.
The get_url module downloads the Hadoop RPM package from a URL onto all the clients. The copy module uploads the JDK 8 RPM package from the controller node to the clients.
You can also copy the Hadoop RPM package from the controller node instead of downloading it.
Finally, the command module installs Hadoop and Java from the RPMs, as in the sketch below.
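A hedged sketch of this play; the download URL, file names, and versions are assumptions (Hadoop 1.2.1 and JDK 8u171 are a common pairing for this setup), not necessarily the exact ones used in the original run:

```yaml
- hosts: all
  tasks:
    - name: Download the Hadoop 1 rpm on every node
      get_url:
        url: https://archive.apache.org/dist/hadoop/core/hadoop-1.2.1/hadoop-1.2.1-1.x86_64.rpm
        dest: /root/hadoop-1.2.1-1.x86_64.rpm

    - name: Copy the JDK 8 rpm from the controller node
      copy:
        src: /root/jdk-8u171-linux-x64.rpm
        dest: /root/jdk-8u171-linux-x64.rpm

    - name: Install Java
      command: rpm -i /root/jdk-8u171-linux-x64.rpm

    # --force because the Hadoop 1 rpm's dependency check does not
    # recognise an rpm-installed Oracle JDK
    - name: Install Hadoop
      command: rpm -i /root/hadoop-1.2.1-1.x86_64.rpm --force
```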
hosts: name
This play creates the directory /nn, uploads the hdfs-site.xml and core-site.xml files, and finally starts the NameNode (master node) service, as sketched below.
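A hedged sketch of the play, assuming the rpm's config directory /etc/hadoop and local copies of the master's config files on the controller node (the src paths are placeholders):

```yaml
- hosts: name
  tasks:
    - name: Create the NameNode storage directory
      file:
        path: /nn
        state: directory

    - name: Upload the master's hdfs-site.xml
      copy:
        src: files/master/hdfs-site.xml
        dest: /etc/hadoop/hdfs-site.xml

    - name: Upload the master's core-site.xml
      copy:
        src: files/master/core-site.xml
        dest: /etc/hadoop/core-site.xml

    # A fresh NameNode directory has to be formatted once before the
    # daemon can start; echo Y answers the confirmation prompt
    - name: Format the NameNode storage
      shell: echo Y | hadoop namenode -format

    - name: Start the NameNode daemon
      command: hadoop-daemon.sh start namenode
```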
The hdfs-site.xml and core-site.xml files are as follows:
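A hedged sketch of both files; the directory /nn comes from the play above, while port 9001 is an assumption (any free port works as long as the slaves point at the same one):

```xml
<!-- hdfs-site.xml on the master: where the NameNode keeps its metadata -->
<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>/nn</value>
  </property>
</configuration>
```

```xml
<!-- core-site.xml on the master: listen on all interfaces, port 9001 -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://0.0.0.0:9001</value>
  </property>
</configuration>
```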

hosts: data
This play creates the directory /dn, uploads the hdfs-site.xml and core-site.xml files, and finally starts the DataNode (slave node) service, as sketched below.
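A hedged sketch mirroring the NameNode play; again the src paths are placeholders:

```yaml
- hosts: data
  tasks:
    - name: Create the DataNode storage directory
      file:
        path: /dn
        state: directory

    - name: Upload the slaves' hdfs-site.xml
      copy:
        src: files/slave/hdfs-site.xml
        dest: /etc/hadoop/hdfs-site.xml

    - name: Upload the slaves' core-site.xml
      copy:
        src: files/slave/core-site.xml
        dest: /etc/hadoop/core-site.xml

    - name: Start the DataNode daemon
      command: hadoop-daemon.sh start datanode
```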
The hdfs-site.xml and core-site.xml files are as follows:
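A hedged sketch; /dn comes from the play above, and MASTER_IP is a placeholder for the master node's address from the inventory (the port must match the master's core-site.xml):

```xml
<!-- hdfs-site.xml on the slaves: where each DataNode stores its blocks -->
<configuration>
  <property>
    <name>dfs.data.dir</name>
    <value>/dn</value>
  </property>
</configuration>
```

```xml
<!-- core-site.xml on the slaves: MASTER_IP is a placeholder for the
     master node's IP from the inventory -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://MASTER_IP:9001</value>
  </property>
</configuration>
```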

Running the Ansible playbook
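Assuming the playbook is saved as hadoop.yml and the inventory as inventory.txt (both names are placeholders), the run looks like:

```sh
ansible-playbook -i inventory.txt hadoop.yml
```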


Now jps shows the Hadoop daemons running on each node.
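Illustrative output (the PIDs will differ): the master should show a NameNode process and each slave a DataNode process:

```sh
# on the master
jps
# 1234 NameNode
# 1301 Jps

# on any slave
jps
# 1120 DataNode
# 1188 Jps
```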

The master node reports the connected DataNodes.
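From the master, the HDFS admin report is one way to confirm the cluster state; it should list the three connected DataNodes:

```sh
hadoop dfsadmin -report
```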

Uploading the file
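A hedged example of the upload; test.txt is a placeholder file name:

```sh
hadoop fs -put test.txt /
```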

File uploaded
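Listing the HDFS root confirms the upload (same placeholder name):

```sh
hadoop fs -ls /
```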

Reading the file
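Reading it back streams the contents from the DataNodes:

```sh
hadoop fs -cat /test.txt
```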
