Introduction
This article explains how to add a new node (server) to a Logsign cluster.
New Node Configuration
- Before adding the new node to the cluster, first configure the Date/Time settings on the newly installed server correctly.
- Enter the Logsign license key.
- Make sure the newly installed server is up to date.
- Add an internal interface for cluster communication.
From each cluster node, confirm cluster communication by pinging the other nodes' internal IP addresses, for example with the "ping 2.2.2.2" command.
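When several nodes need to be tested, the ping check can be scripted. This is a minimal Python sketch, not a Logsign tool: PEER_INTERNAL_IPS is a hypothetical placeholder (2.2.2.2 is the example address used in this article), and the sketch assumes the Linux iputils ping, where -c sets the number of echo requests.

```python
# Ping each peer's internal IP and report the ones that do not answer.
import subprocess
import sys
from typing import List

# Hypothetical list of the other nodes' internal IPs; replace with your own.
PEER_INTERNAL_IPS = ["2.2.2.2"]


def ping_command(ip: str, count: int = 3) -> List[str]:
    """Build a Linux (iputils) ping command sending `count` echo requests."""
    return ["ping", "-c", str(count), ip]


def unreachable_peers(ips: List[str], runner=subprocess.run) -> List[str]:
    """Return the IPs whose ping exited with a non-zero status."""
    failed = []
    for ip in ips:
        result = runner(ping_command(ip),
                        stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        if result.returncode != 0:
            failed.append(ip)
    return failed


if __name__ == "__main__":
    failed = unreachable_peers(PEER_INTERNAL_IPS)
    if failed:
        print("No reply over the internal network from:", ", ".join(failed))
        sys.exit(1)
    print("All peers answered ping.")
```

Run it on every cluster node, since the check has to succeed in each direction.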
- Add the new node to the cluster plan:
In the Logsign interface, go to Settings > General Settings > Cluster, click the "Add Server" button, and fill in the server information. Enter the new node's internal interface IP address.
- Assign roles to the new server on the main screen and save.
- Apply the cluster plan:
Run the apply plan command on the first machine and wait for it to finish.
Apply plan command: /opt/logsign-venv/bin/logsign-python /opt/logsign-maintenance/cluster.py apply_plan
- Reboot the newly added node. When it is back up, run the apply plan command again from the first machine.
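The apply, reboot, apply sequence can be sketched in Python as follows. The apply_plan command is copied from this article; NEW_NODE_IP and the 15-minute timeout are hypothetical. The script only confirms that the node answers ping again, which does not guarantee that all of its services have started.

```python
# Run apply_plan, wait for the rebooted node to answer ping, run apply_plan again.
import subprocess
import time

# Apply-plan command exactly as given in this article.
APPLY_PLAN = [
    "/opt/logsign-venv/bin/logsign-python",
    "/opt/logsign-maintenance/cluster.py",
    "apply_plan",
]
NEW_NODE_IP = "2.2.2.2"  # hypothetical internal IP of the newly added node


def apply_plan(runner=subprocess.run) -> None:
    """Run apply_plan in the foreground; raise CalledProcessError if it fails."""
    runner(APPLY_PLAN, check=True)


def wait_until_reachable(ip: str, timeout_s: float = 900.0, interval_s: float = 10.0,
                         runner=subprocess.run, sleep=time.sleep) -> bool:
    """Ping `ip` once per interval until it answers or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        result = runner(["ping", "-c", "1", ip],
                        stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        if result.returncode == 0:
            return True
        sleep(interval_s)
    return False


if __name__ == "__main__":
    apply_plan()  # first run, on the first machine
    input("Reboot the newly added node and press Enter once it has gone down: ")
    if not wait_until_reachable(NEW_NODE_IP):
        raise SystemExit(f"{NEW_NODE_IP} did not answer ping within the timeout")
    apply_plan()  # second run, after the node is back up
```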
- Check Elasticsearch from kopf. The new node should have joined the cluster and started receiving shards.
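kopf is a web UI on top of the Elasticsearch REST API, so the same information can also be read from the standard GET /_cluster/health endpoint. The following is a minimal sketch that assumes an Elasticsearch node is reachable at the hypothetical ES_URL and that EXPECTED_NODES is the node count after the new server joins. Relocating or initializing shards are normal while the new node takes over shards, so the sketch reports them as progress rather than as errors.

```python
# Read the cluster health that kopf displays, straight from the REST API.
import json
import urllib.request
from typing import List

ES_URL = "http://localhost:9200"  # hypothetical; any node of the cluster will do
EXPECTED_NODES = 3                # hypothetical node count after the new node joins


def fetch_health(base_url: str = ES_URL) -> dict:
    """Return the parsed JSON body of GET /_cluster/health."""
    with urllib.request.urlopen(base_url + "/_cluster/health", timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))


def health_report(health: dict, expected_nodes: int) -> List[str]:
    """Summarize a _cluster/health document as a list of findings."""
    findings = []
    nodes = health.get("number_of_nodes", 0)
    if nodes < expected_nodes:
        findings.append(f"only {nodes} of {expected_nodes} nodes have joined")
    if health.get("status") == "red":
        findings.append("cluster status is red")
    moving = health.get("relocating_shards", 0) + health.get("initializing_shards", 0)
    if moving:
        # Expected while shards move onto the new node; re-check until it is 0.
        findings.append(f"{moving} shards still relocating or initializing")
    return findings


if __name__ == "__main__":
    for line in health_report(fetch_health(), EXPECTED_NODES) or ["cluster looks settled"]:
        print(line)
```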
- Check the HDFS services:
HDFS log files are located under the /var/log/hadoop-hdfs/ directory.
Check the HDFS services on all machines and confirm that they are running properly.
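To speed up the log review, the following sketch scans the tail of each *.log file in /var/log/hadoop-hdfs/ for ERROR and FATAL entries. It assumes the default log4j layout, in which the level appears as a space-separated word. The file pattern and the 500-line tail are arbitrary choices made for this sketch.

```python
# Scan the tail of each HDFS log for ERROR/FATAL entries.
from collections import deque
from pathlib import Path
from typing import Dict, List

HDFS_LOG_DIR = Path("/var/log/hadoop-hdfs")  # log location given in this article
LEVELS = (" ERROR ", " FATAL ")              # assumes the default log4j layout


def recent_problems(log_dir: Path = HDFS_LOG_DIR, tail: int = 500) -> Dict[str, List[str]]:
    """Map each *.log file to the ERROR/FATAL lines among its last `tail` lines."""
    found: Dict[str, List[str]] = {}
    for log_file in sorted(log_dir.glob("*.log")):
        with log_file.open(errors="replace") as fh:
            last_lines = deque(fh, maxlen=tail)  # keeps only the newest lines
        hits = [line.rstrip("\n") for line in last_lines
                if any(level in line for level in LEVELS)]
        if hits:
            found[log_file.name] = hits
    return found


if __name__ == "__main__":
    problems = recent_problems()
    for name, lines in problems.items():
        print(f"== {name}: {len(lines)} recent ERROR/FATAL lines, latest:")
        print(lines[-1])
    if not problems:
        print("No recent ERROR/FATAL lines in", HDFS_LOG_DIR)
```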
- The HDFS JournalNode service on the newly added machine will report an error. To fix it:
On the active NameNode machine, copy the "current" directory under /var/lib/hadoop-hdfs/cache/hdfs/dfs/jn/logsigncluster/ to the /home/iadmin directory on the newly added machine. Do not copy the in_use.lock file.
Command (srv4 is the newly added machine in this example): scp -r /var/lib/hadoop-hdfs/cache/hdfs/dfs/jn/logsigncluster/current srv4:/home/iadmin
After copying, go to the /home/iadmin directory on the new machine and check the file ownership with the "ls -lah" command.
You will see that the current directory arrived with root:root ownership. Change its ownership to hdfs by running the "chown -R hdfs:hdfs current" command.
Then copy the current directory into the /var/lib/hadoop-hdfs/cache/hdfs/dfs/jn/logsigncluster/ directory and restart the JournalNode with the "service hadoop-hdfs-journalnode restart" command. Make sure the copied files are still owned by hdfs:hdfs: a plain cp run as root creates root-owned copies, whereas cp -a preserves ownership.
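The copy, chown, and restart steps on the new machine can be expressed as a short Python sketch, run as root after the scp step. The paths and the service name come from this article. The sketch applies hdfs:hdfs ownership after copying the data into place and skips any in_use.lock file. It also refuses to overwrite an existing current directory; that refusal is a safety choice of the sketch, not part of the documented procedure.

```python
# Install the JournalNode data copied from the active NameNode, then restart.
import os
import shutil
import subprocess
from pathlib import Path
from typing import Optional, Tuple

STAGED = Path("/home/iadmin/current")  # where scp placed the copy
JN_DIR = Path("/var/lib/hadoop-hdfs/cache/hdfs/dfs/jn/logsigncluster")


def chown_tree(root: Path, user: str, group: str) -> None:
    """Python equivalent of `chown -R user:group root`."""
    shutil.chown(root, user, group)
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            shutil.chown(os.path.join(dirpath, name), user, group)


def install_journal(staged: Path = STAGED, jn_dir: Path = JN_DIR,
                    owner: Optional[Tuple[str, str]] = ("hdfs", "hdfs")) -> Path:
    """Copy `staged` to jn_dir/current, skipping in_use.lock, and fix ownership."""
    target = jn_dir / "current"
    if target.exists():
        # Safety choice of this sketch: never merge into or overwrite existing data.
        raise FileExistsError(f"{target} already exists; move it aside first")
    shutil.copytree(staged, target, ignore=shutil.ignore_patterns("in_use.lock"))
    if owner is not None:
        chown_tree(target, *owner)  # copies created by root would be root-owned
    return target


if __name__ == "__main__":
    install_journal()
    subprocess.run(["service", "hadoop-hdfs-journalnode", "restart"], check=True)
```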
After this step, recheck the HDFS services. Do not move on to the next step until you see that they are working properly.
- HDFS replication and rebalance process:
Run the "hadoop dfs -setrep -w -R 3 /" command on the active NameNode machine and wait for it to finish. This can take a long time, depending on the amount of data.
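One way to confirm that the replication change has taken effect is to read the summary printed by "hdfs fsck /". The sketch assumes the Hadoop 2.x-style summary lines "Under-replicated blocks:" and "Average block replication:", and it may need to run as the hdfs user. After a successful setrep to 3, the under-replicated count would be expected to drop to 0.

```python
# Confirm replication progress from the `hdfs fsck /` summary.
import re
import subprocess
from typing import Tuple

UNDER_RE = re.compile(r"Under-replicated blocks:\s*(\d+)")
AVG_RE = re.compile(r"Average block replication:\s*([\d.]+)")


def replication_summary(fsck_output: str) -> Tuple[int, float]:
    """Return (under-replicated block count, average block replication)."""
    under = UNDER_RE.search(fsck_output)
    avg = AVG_RE.search(fsck_output)
    if under is None or avg is None:
        raise ValueError("fsck summary lines not found")
    return int(under.group(1)), float(avg.group(1))


if __name__ == "__main__":
    # The exit status is not checked; only the printed summary is parsed.
    out = subprocess.run(["hdfs", "fsck", "/"], capture_output=True, text=True).stdout
    under, avg = replication_summary(out)
    print(f"Under-replicated blocks: {under}; average block replication: {avg}")
```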
After replication completes, you may need to rebalance. Rebalancing is needed when disk usage differs significantly between servers.
Run the rebalance command on the active NameNode machine.
HDFS rebalance command: hdfs balancer -threshold 1
Like the replication process, the rebalance process can take a long time depending on the amount of data.
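The HDFS balancer treats a datanode as balanced when its utilization is within the threshold (here 1 percentage point) of the cluster-wide utilization. The following sketch applies that rule to "hdfs dfsadmin -report" output to judge whether a rebalance is worthwhile. It assumes the Hadoop 2.x report layout, where a cluster-wide "DFS Used%" summary is followed by one "Name:" block per live datanode. The overall DFS Used% serves as the cluster average, which approximates the balancer's own calculation.

```python
# Decide whether `hdfs balancer -threshold 1` has work to do.
import re
import subprocess
from typing import Dict, Optional, Tuple

PCT_RE = re.compile(r"DFS Used%:\s*([\d.]+)%")


def parse_report(report: str) -> Tuple[float, Dict[str, float]]:
    """Return (cluster DFS Used%, {datanode address: DFS Used%}) for live nodes."""
    cluster_pct: Optional[float] = None
    nodes: Dict[str, float] = {}
    current: Optional[str] = None
    for raw in report.splitlines():
        line = raw.strip()
        if line.startswith("Dead datanodes"):
            break  # dead nodes do not take part in balancing
        if line.startswith("Name:"):
            current = line.split()[1]  # e.g. "10.0.0.4:50010"
            continue
        match = PCT_RE.match(line)
        if match is None:
            continue
        if current is None:
            cluster_pct = float(match.group(1))  # cluster-wide summary section
        else:
            nodes[current] = float(match.group(1))
    if cluster_pct is None:
        raise ValueError("cluster-wide 'DFS Used%' line not found")
    return cluster_pct, nodes


def nodes_outside_threshold(report: str, threshold: float = 1.0) -> Dict[str, float]:
    """Datanodes whose usage deviates from the cluster's by more than `threshold` points."""
    cluster_pct, nodes = parse_report(report)
    return {name: pct for name, pct in nodes.items()
            if abs(pct - cluster_pct) > threshold}


if __name__ == "__main__":
    out = subprocess.run(["hdfs", "dfsadmin", "-report"],
                         capture_output=True, text=True, check=True).stdout
    unbalanced = nodes_outside_threshold(out)
    if unbalanced:
        for name, pct in sorted(unbalanced.items()):
            print(f"{name}: DFS Used {pct:.2f}%")
    else:
        print("All live datanodes are within the threshold.")
```

A newly added node usually starts near 0% usage, so right after it joins it is typically the one reported outside the threshold.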
When all steps are complete and HDFS and Elasticsearch are working properly, perform general cluster checks and confirm that all assigned roles are working properly. If these checks reveal no problems, the new node has been added successfully.