keepalived can be used to switch one or more IPs between one or more server systems. Typically, the underlying protocol for this is VRRP (Virtual Router Redundancy Protocol).
To determine whether localhost is currently in an active (ACTIVE, called MASTER by keepalived itself), passive (BACKUP), or dysfunctional (FAULT) state within the server cluster, there are multiple approaches, each with its own pros and cons.
The goal is to provide a brief overview and examine some methods in detail, making it easier for other system administrators facing the same task.
TLDR: See “Part 5 – DBus”
Part 1 – keepalived with state dumping using a unix signal
keepalived can write its current process statistics to a file when it receives a specific Unix signal. The signal for the JSON format is usually 36, but it is safer to query the number from keepalived itself, as follows:
kill -s $(keepalived --signum=JSON) $(cat /var/run/keepalived.pid)
Afterwards, the exported information is available in /tmp/keepalived.json. A script can process this file to extract only the cluster state.
#!/bin/bash
# -v: prefix each state with the instance name, -s: print one combined state
while getopts ":vs" opt; do
    case ${opt} in
        v ) verbose=1
            ;;
        s ) single=1
            ;;
        \? ) echo "Usage: cmd [-vs]"
            exit 1
            ;;
    esac
done
# trigger the JSON dump via signal, then read the dump file
json=$(kill -s "$(keepalived --signum=JSON)" "$(cat /var/run/keepalived.pid)" && cat /tmp/keepalived.json)
output=""
# loop over the array indices of all VRRP instances in the dump
for instance in $(echo "$json" | jq -r '. | to_entries | .[].key'); do
    release_master=$(echo "$json" | jq -r '.['$instance'].stats.release_master')
    become_master=$(echo "$json" | jq -r '.['$instance'].stats.become_master')
    if [[ $verbose == 1 ]]; then
        iname=$(echo "$json" | jq -r '.['$instance'].data.iname')
        output+="$iname "
    fi
    # more promotions than releases means this node currently holds the master role
    if [ "$become_master" -gt "$release_master" ]; then
        output+="ACTIVE\n"
    elif [ "$become_master" -eq "$release_master" ]; then
        output+="BACKUP\n"
    else
        output+="UNKNOWN\n"
    fi
done
if [[ $single == 1 && $verbose != 1 ]]; then
    # report ACTIVE only if every instance is ACTIVE
    if [[ $output =~ ^("ACTIVE\n")+$ ]]; then
        echo "ACTIVE"
        exit 0
    else
        echo "UNKNOWN"
        exit 1
    fi
else
    echo -e "$output"
fi
The script compares the counters “become_master” and “release_master”: if the instance has become master more often than it has released that role, it is currently ACTIVE; if both counters are equal, it is BACKUP.
Advantages:
- Cluster state is nearly real-time
- No need for a more complex SNMP setup
- Only requires the installation of jq
Disadvantages:
- Because Unix signals are asynchronous (“fire and forget”), reading /tmp/keepalived.json races with keepalived writing it
- This can be mitigated with additional checks or inotify, at the cost of added complexity
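One of the "additional checks" mentioned above could look like the following minimal sketch. It assumes that it is acceptable to delete the old dump file before sending the signal, so that the mere existence of a non-empty file proves it is fresh. The function name wait_for_dump and the polling interval are illustrative choices, not part of keepalived.

```bash
#!/bin/bash
# wait_for_dump FILE TIMEOUT_SECONDS TRIGGER [ARGS...]
# Hypothetical helper: runs TRIGGER, then waits until FILE has been rewritten.
wait_for_dump() {
    local file=$1 timeout=$2
    shift 2
    # Remove any stale dump so an existing file cannot be mistaken for a new one.
    rm -f -- "$file"
    # Run the trigger command (e.g. the kill call that sends the signal).
    "$@" || return 1
    local deadline=$((SECONDS + timeout))
    # Poll until the file exists and is non-empty, or give up at the deadline.
    while [ ! -s "$file" ]; do
        if [ "$SECONDS" -ge "$deadline" ]; then
            return 1
        fi
        sleep 0.1
    done
    cat -- "$file"
}
```

In the script above, the dump line could then be replaced with something like json=$(wait_for_dump /tmp/keepalived.json 2 kill -s "$(keepalived --signum=JSON)" "$(cat /var/run/keepalived.pid)"). A non-empty file is not necessarily a completely written one, so validating the result with jq remains advisable.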
Part 2 – SNMP
A slimmer script is possible if SNMP is available. If an SNMP setup already exists for collecting statistics, the following approach is worth considering.
apt install snmp snmpd snmp-mibs-downloader
# START /etc/snmp/snmpd.conf
sysLocation Sitting on the Dock of the Bay
sysContact Me <me@example.org>
sysServices 72
master agentx
agentaddress 127.0.0.1,[::1]
rocommunity public localhost
rocommunity6 public localhost
rouser authPrivUser authpriv -V systemonly
includeDir /etc/snmp/snmpd.conf.d
# END
# START /etc/snmp/snmp.conf
# Empty file, or alternatively everything commented out
# END
# START /etc/keepalived/keepalived.conf
global_defs {
    enable_snmp_vrrp
    enable_snmp_checker
    enable_snmp_rfc
    enable_snmp_rfcv2
    enable_snmp_rfcv3
}
[...]
# END
systemctl start snmpd
systemctl restart keepalived
# TEST
snmpwalk -v2c -c public localhost KEEPALIVED-MIB::version
If the current keepalived version is returned as output, everything has worked. Now, the status can be retrieved with a relatively slim bash script, for example, as follows:
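#!/bin/bash
# -Oe prints enums numerically, -OQ and -Ov reduce the output to the bare values.
# snmpwalk prints one line per VRRP instance; tr joins them so the regex can match.
res=$(snmpwalk -v2c -Oe -OQ -Ov -c public localhost KEEPALIVED-MIB::vrrpInstanceState | tr -d '[:space:]')
if [[ "$res" =~ ^2+$ ]]; then
    echo "ACTIVE"
    exit 0
elif [[ "$res" =~ ^1+$ ]]; then
    echo "BACKUP"
    exit 1
elif [[ "$res" =~ ^3+$ ]]; then
    echo "FAULT"
    exit 2
else
    echo "UNKNOWN"
    exit 3
fi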
Advantages:
- Current data with no race condition between Unix signal and data writing
- Slim Bash script possible, with the installed snmpd available for additional data
Disadvantages:
- Comparatively inflated setup if snmpd is not already in use
- Some familiarity with SNMP commands and their functionality may be required
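With several VRRP instances, the query returns one numeric state per line. As a rough sketch, the lines can be collapsed into a single verdict with a small helper like the one below. The function name classify_states is made up for this example; the state codes (1 = BACKUP, 2 = ACTIVE, 3 = FAULT) are the ones used in the script above, and mixed states are reported as UNKNOWN, just like there.

```bash
#!/bin/bash
# classify_states: read one numeric VRRP state per line on stdin and print
# a single summary (hypothetical helper, codes as used in the script above).
classify_states() {
    local states
    # Strip whitespace, drop empty lines, and reduce to the distinct values,
    # joined into one space-terminated string such as "2 " or "1 2 ".
    states=$(tr -d ' \t\r' | sed '/^$/d' | sort -u | tr '\n' ' ')
    case "$states" in
        "2 ") echo "ACTIVE" ;;
        "1 ") echo "BACKUP" ;;
        "3 ") echo "FAULT" ;;
        *)    echo "UNKNOWN" ;;   # empty input or mixed states
    esac
}
```

It would be used by piping the snmpwalk output into it, for example snmpwalk -v2c -Oe -OQ -Ov -c public localhost KEEPALIVED-MIB::vrrpInstanceState | classify_states.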
Part 3 – keepalived notify script
Another mechanism to obtain the state is through the Notify function provided by keepalived. The notify script is executed on specific or all state changes. The configuration setting “notify” can be used to configure the script for each VRRP instance:
vrrp_instance INSTANZ1 {
    state BACKUP
    interface ens192
    notify /usr/bin/kdstate
    virtual_router_id 32
    priority 100
    authentication {
        auth_type PASS
        auth_pass etst1234
    }
    virtual_ipaddress {
        10.0.0.1
    }
}
The script at /usr/bin/kdstate would then look like this:
#!/bin/bash
echo "$2 $3" > "/tmp/kdstate.$2.txt"
keepalived passes the following parameters to the script:
- $1 = “GROUP” or “INSTANCE”
- $2 = name of group or instance
- $3 = target state of transition (“MASTER”, “BACKUP”, “FAULT”)
Advantages:
- A file is written on state change, providing data that can be efficiently retrieved
- The solution can be combined with additional tasks based on state changes
Disadvantages:
- Limited usable data, only sufficient for the intended purpose
- Data is written only on state change, potentially resulting in outdated information
Addendum: keepalived also provides the option to redirect the notify output to a FIFO pipe. In combination with this feature and an appropriate listener on the pipe, other configurations beyond simple output to a file are conceivable.
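A listener for such a pipe could be sketched as follows. This is only an illustration: it assumes that each message on the FIFO has the form TYPE "NAME" STATE PRIORITY (for example INSTANCE "VI_1" MASTER 100), and the function name handle_notify_lines as well as the state directory are hypothetical.

```bash
#!/bin/bash
# handle_notify_lines STATE_DIR
# Hypothetical listener: reads notify messages on stdin and records the latest
# state per instance or group in STATE_DIR.
# Assumed line format: TYPE "NAME" STATE PRIORITY
handle_notify_lines() {
    local state_dir=$1 type name state priority
    while read -r type name state priority; do
        case "$type" in
            INSTANCE|GROUP) ;;
            *) continue ;;        # ignore any other message types
        esac
        name=${name//\"/}         # strip the quotes around the name
        printf '%s %s\n' "$name" "$state" > "$state_dir/kdstate.$name.txt"
    done
}
```

The function reads from standard input, so it could be attached to the pipe with a redirect such as handle_notify_lines /tmp < /path/to/fifo, where the FIFO path is whatever has been configured in keepalived.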
Part 4 – Checking keepalived IPs without keepalived
This approach involves checking if the IPs associated with an instance or interface are present on the current system.
#!/bin/bash
# match "inet <address>/" literally so that 10.0.0.1 does not also match 10.0.0.10
(ip a | grep -qF "inet 10.0.0.1/") || exit 1
(ip a | grep -qF "inet 10.0.0.2/") || exit 1
echo "ACTIVE"
Advantages:
- Relatively universally applicable
- Simple script with minimal overhead
Disadvantages:
- No differentiation between BACKUP and FAULT, only ACTIVE check
- The output state may not be correct in case of errors (e.g., IP conflict)
- IPs must be fully known in advance (e.g., saved in a file or in a config management system)
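If the IPs are kept in a list rather than hard-coded, the check can be generalized. The following is a minimal sketch for IPv4 addresses; the function name has_all_vips is made up for this example, and it assumes that ip prints every address as "inet ADDRESS/PREFIX". The output of ip is passed in as an argument so that the matching logic can be tried without touching real interfaces.

```bash
#!/bin/bash
# has_all_vips ADDR_DUMP VIP...
# Hypothetical helper: succeeds only if every VIP appears in ADDR_DUMP
# (the output of "ip -o -4 addr show") as an exact "inet VIP/" entry.
has_all_vips() {
    local addr_dump=$1 vip
    shift
    for vip in "$@"; do
        # -F treats the dots literally; the trailing "/" keeps 10.0.0.1
        # from matching 10.0.0.10.
        grep -qF -- "inet $vip/" <<< "$addr_dump" || return 1
    done
}
```

A call could look like has_all_vips "$(ip -o -4 addr show)" 10.0.0.1 10.0.0.2 && echo "ACTIVE", with the addresses read from a file via mapfile if desired.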
Part 5 – DBus
Since version 1.3.0, keepalived supports integration with DBus, allowing retrieval of the keepalived state with various DBus clients. DBus is enabled as follows:
global_defs {
    enable_dbus
}
The data can then be retrieved like this:
#!/bin/bash
res=""
# every object path ending in IPv4 or IPv6 represents one VRRP instance
for instance in $(busctl --list tree org.keepalived.Vrrp1 | grep -E 'IPv(4|6)$'); do
    # append the numeric state of this instance to the result string
    res+=$(busctl --json=short get-property org.keepalived.Vrrp1 \
        "$instance" \
        org.keepalived.Vrrp1.Instance State | jq '.data[0]')
done
if [[ "$res" =~ ^[2]+$ ]]; then
    echo "ACTIVE"
    exit 0
elif [[ "$res" =~ ^[1]+$ ]]; then
    echo "BACKUP"
    exit 1
elif [[ "$res" =~ ^[3]+$ ]]; then
    echo "FAULT"
    exit 2
else
    echo "UNKNOWN"
    exit 3
fi
The data can also be fetched from DBus in other languages; a Python example can be found here.
Advantages:
- DBus properties are loaded from the application runtime, ensuring real-time data
- No additional daemon installation required
- Relatively slim script for checking
Disadvantages:
- The keepalived DBus properties have to be queried per object, which requires iterating over all combinations of NIC, VRRP instance, and IP version
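To report the state per instance instead of one combined value, the object paths can be split into their parts. The sketch below assumes the path layout /org/keepalived/Vrrp1/Instance/<nic>/<vrid>/<IPv4|IPv6>; the function name describe_instance_path is made up for this example.

```bash
#!/bin/bash
# describe_instance_path PATH
# Hypothetical helper: splits a keepalived DBus object path into its parts.
# Assumed layout: /org/keepalived/Vrrp1/Instance/<nic>/<vrid>/<IPv4|IPv6>
describe_instance_path() {
    local path=$1 rest nic vrid family
    rest=${path#/org/keepalived/Vrrp1/Instance/}
    # If nothing was stripped, the path does not point to an instance object.
    [ "$rest" != "$path" ] || return 1
    IFS=/ read -r nic vrid family <<< "$rest"
    [ -n "$family" ] || return 1
    printf 'nic=%s vrid=%s family=%s\n' "$nic" "$vrid" "$family"
}
```

Calling it inside the loop over the busctl tree output, next to the State query, would give each numeric state a readable label.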
Conclusion
There are numerous, sometimes convoluted solutions for querying the status of keepalived instances. Five approaches were discussed here, with Part 5 being my preferred method due to its elegance and efficiency. Best of luck in implementing your solution.