Hadoop Cluster (HDFS) Configuration Using Ansible

Dec 16, 2020

Whenever a task has to be performed over and over again, automation is the answer. When it comes to configuration management in particular, the first tool that usually comes to mind is Ansible.

What is Ansible?

Ansible is an IT automation tool. It can configure systems, deploy software, and orchestrate more advanced IT tasks such as continuous deployments or zero-downtime rolling updates.
Ansible's main goals are simplicity and ease of use. It also has a strong focus on security and reliability.

Ansible is mainly used for:

Provisioning: Set up the various servers you need in your infrastructure.
Configuration management: Change the configuration of an application, OS, or device; start and stop services; install or update applications; implement a security policy; or perform a wide variety of other configuration tasks.
Application deployment: Make DevOps easier by automating the deployment of internally developed applications to your production systems.

Ansible Architecture:

Ansible Playbooks: An ordered list of tasks, saved so that you can run those tasks in the same order repeatedly. Playbooks are written in YAML.
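To make the idea of an "ordered list of tasks" concrete, here is a minimal playbook sketch. The group name datanodes, the package name java-1.8.0-openjdk (an RHEL-style name) and the directory /dn are illustrative assumptions, not values taken from this article. ansible.builtin.package and ansible.builtin.file are standard Ansible modules.

```yaml
---
# Hypothetical playbook (e.g. site.yml): group name, package name and
# directory path below are illustrative assumptions.
- name: Prepare data nodes
  hosts: datanodes        # which managed nodes from the inventory to target
  become: true            # run the tasks with elevated (root) privileges
  tasks:
    # Tasks run top to bottom: this one runs first.
    - name: Install a Java runtime (package name assumed)
      ansible.builtin.package:
        name: java-1.8.0-openjdk
        state: present

    # This task starts only after the previous one has completed.
    - name: Create a directory for HDFS data (path assumed)
      ansible.builtin.file:
        path: /dn
        state: directory
        mode: "0755"
```

Assuming the file is saved as site.yml, it could be run from a control node with ansible-playbook -i inventory.yml site.yml. Running it a second time should report no changes, because both modules only act when the host is not already in the described state.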

Inventory: A list of managed nodes. An inventory file is sometimes also called a hosts file. The inventory can specify information such as the IP address of each managed node.
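As an illustration, an inventory can also be written in YAML. The sketch below groups hypothetical hosts into a namenode group and a datanodes group, which suits an HDFS cluster. Host aliases, group names and IP addresses are placeholders. ansible_host is the standard Ansible variable for the address to connect to.

```yaml
# Hypothetical inventory.yml: host aliases, group names and IPs are placeholders.
all:
  children:
    namenode:               # group for the HDFS master node
      hosts:
        nn1:
          ansible_host: 192.168.1.10
    datanodes:              # group for the HDFS worker nodes
      hosts:
        dn1:
          ansible_host: 192.168.1.11
        dn2:
          ansible_host: 192.168.1.12
```

A playbook can then target hosts: namenode or hosts: datanodes, and Ansible resolves those names to the addresses listed here.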

Control Node: Any machine with Ansible installed can act as a control node. From a control node you run ad-hoc commands with the ansible command and playbooks with the ansible-playbook command.

Managed Node: The machines you manage with Ansible are called managed nodes, or hosts.

What is Hadoop?

Apache Hadoop is an open source framework that is used to efficiently store and process large datasets ranging in size from gigabytes to petabytes of data.
Instead of using one large computer to store and process the data, Hadoop clusters multiple computers together so that massive datasets can be analyzed in parallel, and therefore more quickly.