Various approaches to extract the cluster state from keepalived VRRP instances

keepalived can be used to move one or more IP addresses between two or more servers. The underlying protocol is typically VRRP (Virtual Router Redundancy Protocol).

There are several ways to determine whether localhost is currently active (ACTIVE, i.e. VRRP state MASTER), passive (BACKUP), or dysfunctional (FAULT) within the cluster, each with its own pros and cons.

This post gives a brief overview of these approaches and examines several of them in detail, to help other system administrators facing the same task.

TL;DR: See “Part 5 – DBus”

Part 1 – keepalived with state dumping using a Unix signal

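keepalived can write its current state and statistics to a file when the process receives a specific Unix signal. The signal for the JSON dump is usually 36, but the number is not guaranteed to be the same on every build and platform, so it is best looked up with --signum instead of being hard-coded. JSON output also requires keepalived to be built with JSON support. The signal can be sent as follows: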

kill -s $(keepalived --signum=JSON) $(cat /var/run/keepalived.pid)

keepalived then writes the exported data to /tmp/keepalived.json. A script can parse this file and extract only the cluster state:

#!/bin/bash

while getopts ":vs" opt; do
  case ${opt} in
    v ) verbose=1
      ;;
    s ) single=1
      ;;
    \? ) echo "Usage: cmd [-vs]"
         exit 1;
      ;;
  esac
done

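# Ask keepalived for a JSON dump and read it back. The signal is asynchronous,
# so wait a moment before reading (this narrows, but does not close, the race
# condition described below).
json=$(kill -s "$(keepalived --signum=JSON)" "$(cat /var/run/keepalived.pid)" && sleep 1 && cat /tmp/keepalived.json)

output=""

# Loop over all VRRP instances (array indices) in the dump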
for instance in $(echo "$json" | jq -r '. | to_entries | .[].key'); do
    release_master=$(echo "$json" | jq -r '.['$instance'].stats.release_master')
    become_master=$(echo "$json" | jq -r '.['$instance'].stats.become_master')

    if [[ $verbose == 1 ]]; then
      iname=$(echo "$json" | jq -r '.['$instance'].data.iname')
      output+="$iname "
    fi

    if [ "$become_master" -gt "$release_master" ]; then
        output+="ACTIVE\n"
    elif [ "$become_master" -eq "$release_master" ]; then
        output+="BACKUP\n"
    else
        output+="UNKNOWN\n"
    fi
done

if [[ $single == 1 && $verbose != 1 ]]; then
  if [[ $output =~ ^("ACTIVE\n")+$ ]]; then
    echo "ACTIVE"
    exit 0
  else
    echo "UNKNOWN"
    exit 1
  fi
else
  echo -e "$output"
fi

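The script compares the per-instance counters “become_master” and “release_master”: if an instance has become master more often than it has released mastership, it is currently ACTIVE; if both counters are equal, it is BACKUP. This heuristic cannot distinguish FAULT from BACKUP. With -v, each line is prefixed with the instance name; with -s (and without -v), the script prints a single ACTIVE and exits with 0 only if all instances are active.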
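The loop above starts several jq processes per instance. As a sketch, the same become_master/release_master heuristic can be evaluated in a single jq pass. The field names are taken from the script above; the function name instance_states is hypothetical, and the file path defaults to /tmp/keepalived.json as in the original:

```shell
#!/bin/bash
# Sketch: same heuristic as the script above, evaluated in a single jq call.
# Prints "<instance name> <state>" for every VRRP instance found in the dump.
instance_states() {
  local file=${1:-/tmp/keepalived.json}
  jq -r '.[] | .data.iname + " " +
    (if .stats.become_master > .stats.release_master then "ACTIVE"
     elif .stats.become_master == .stats.release_master then "BACKUP"
     else "UNKNOWN" end)' "$file"
}
```

Called without arguments after a dump has been written, it prints one line per instance, similar to the -v mode of the script.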

Advantages:

  • Cluster state is nearly real-time
  • No need for a more complex SNMP setup
  • Only requires the installation of jq

Disadvantages:

  • Because Unix signals are asynchronous (“fire and forget”), there is a race condition between sending the signal and keepalived finishing the dump to /tmp/keepalived.json
  • This can be addressed with additional checks or inotify, but that adds complexity
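One possible additional check, sketched here without inotify: delete the old dump before sending the signal, then poll until the file exists and parses as complete JSON. The function name wait_for_json and the default timeout are illustrative choices, not part of keepalived; the fractional sleep assumes GNU coreutils:

```shell
#!/bin/bash
# Sketch: wait until FILE exists and holds complete, parseable JSON,
# or give up after roughly TIMEOUT seconds (default 5). Requires jq.
wait_for_json() {
  local file=$1 timeout=${2:-5}
  local tries=$(( timeout * 10 ))
  while (( tries-- > 0 )); do
    # jq -e fails for a missing, empty, or truncated file
    if jq -e . "$file" >/dev/null 2>&1; then
      return 0
    fi
    sleep 0.1                         # fractional sleep: GNU coreutils
  done
  return 1
}

# Usage: remove the old dump first so a stale file is never read
# rm -f /tmp/keepalived.json
# kill -s "$(keepalived --signum=JSON)" "$(cat /var/run/keepalived.pid)"
# wait_for_json /tmp/keepalived.json && json=$(cat /tmp/keepalived.json)
```

Polling keeps the script free of extra tools beyond jq; inotify would avoid the polling interval at the cost of another dependency.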

Part 2 – SNMP

If SNMP is already set up, for example for collecting statistics, a much slimmer script is possible. On Debian or Ubuntu, the setup looks like this:

apt install snmp snmpd snmp-mibs-downloader

# START /etc/snmp/snmpd.conf
sysLocation    Sitting on the Dock of the Bay
sysContact     Me <me@example.org>
sysServices    72

master  agentx
agentaddress  127.0.0.1,[::1]

rocommunity  public localhost
rocommunity6 public localhost

rouser authPrivUser authpriv -V systemonly

includeDir /etc/snmp/snmpd.conf.d
# END

# START /etc/snmp/snmp.conf
# Empty file, or everything commented out (in particular a "mibs :" line,
# which disables loading MIBs such as KEEPALIVED-MIB)
# END

# START /etc/keepalived/keepalived.conf
global_defs {
  enable_snmp_vrrp
  enable_snmp_checker
  enable_snmp_rfc
  enable_snmp_rfcv2
  enable_snmp_rfcv3
}

[...]
# END

systemctl start snmpd
systemctl restart keepalived

# TEST
snmpwalk -v2c -c public localhost KEEPALIVED-MIB::version

If the command prints the running keepalived version, the setup works. The state can then be retrieved with a fairly slim Bash script, for example:

#!/bin/bash

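# One state code per instance (1 = BACKUP, 2 = MASTER, 3 = FAULT). Strip the
# newlines between them, otherwise the anchored regexes below never match
# when more than one instance exists.
res=$(snmpwalk -v2c -Oe -OQ -Ov -c public localhost KEEPALIVED-MIB::vrrpInstanceState | tr -d '[:space:]')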

if [[ "$res" =~ ^2+$ ]]; then
    echo "ACTIVE"
    exit 0
elif [[ "$res" =~ ^1+$ ]]; then
    echo "BACKUP"
    exit 1
elif [[ "$res" =~ ^3+$ ]]; then
    echo "FAULT"
    exit 2
else
    echo "UNKNOWN"
    exit 3
fi
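For output per instance (similar to the -v option in Part 1), the vrrpInstanceName column of the same KEEPALIVED-MIB table can be walked alongside vrrpInstanceState; both walks return the table rows in the same order. A sketch, assuming that column name and the community and host from the setup above; the function name snmp_instance_states is hypothetical, and any double quotes around the names are stripped:

```shell
#!/bin/bash
# Sketch: print "<instance name> <state>" for every VRRP instance via SNMP.
# Assumes the snmpd/keepalived SNMP setup shown above (community "public").
snmp_instance_states() {
  local opts=(-v2c -Oe -OQ -Ov -c public localhost) names states
  names=$(snmpwalk "${opts[@]}" KEEPALIVED-MIB::vrrpInstanceName) || return 1
  states=$(snmpwalk "${opts[@]}" KEEPALIVED-MIB::vrrpInstanceState) || return 1
  # Both columns come from the same table, so line N of each belongs together.
  # Numeric states are mapped as in the script above; others are left as is.
  paste -d ' ' <(tr -d '"' <<< "$names") \
    <(sed -e 's/^1$/BACKUP/' -e 's/^2$/ACTIVE/' -e 's/^3$/FAULT/' <<< "$states")
}
```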

Advantages:

  • Up-to-date data, with no race condition between sending a signal and writing the dump
  • A slim Bash script is possible, and snmpd remains available for other data

Disadvantages:

  • Comparatively heavy setup if snmpd is not already in use
  • Some familiarity with SNMP commands and their functionality may be required

Part 3 – keepalived notify script

Another way to obtain the state is keepalived's notify mechanism, which runs a script on state changes: either on specific transitions (notify_master, notify_backup, notify_fault) or on all of them (notify). The “notify” setting configures the script per VRRP instance:

vrrp_instance INSTANZ1 {
  state BACKUP
  interface ens192
  notify /usr/bin/kdstate
  virtual_router_id 32
  priority 100
  authentication {
    auth_type PASS
    auth_pass etst1234
  }
  virtual_ipaddress {
    10.0.0.1
  }
}

The script at /usr/bin/kdstate would then look like this:

#!/bin/bash

# Store "<name> <state>" per instance or group, e.g. in /tmp/kdstate.INSTANZ1.txt
echo "$2 $3" > "/tmp/kdstate.$2.txt"

keepalived passes the following parameters to the script:

  • $1 = “GROUP” or “INSTANCE”
  • $2 = name of group or instance
  • $3 = target state of transition (“MASTER”, “BACKUP”, “FAULT”)
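The files written by the notify script can be combined again into one overall state, using the same labels as the other scripts. A sketch, assuming the /tmp/kdstate.<name>.txt naming from the script above; the function name notify_overall_state is hypothetical:

```shell
#!/bin/bash
# Sketch: combine the per-instance files written by the notify script
# (/tmp/kdstate.<name>.txt containing "<name> <state>") into one state.
notify_overall_state() {
  local dir=${1:-/tmp} f name state uniq
  local states=()
  for f in "$dir"/kdstate.*.txt; do
    [[ -e $f ]] || continue          # the glob matched nothing
    read -r name state < "$f"
    states+=("$state")
  done
  # Reduce to the distinct states; no files or a mix of states is UNKNOWN.
  uniq=$(printf '%s\n' "${states[@]}" | sort -u)
  case $uniq in
    MASTER) echo ACTIVE;  return 0 ;;
    BACKUP) echo BACKUP;  return 1 ;;
    FAULT)  echo FAULT;   return 2 ;;
    *)      echo UNKNOWN; return 3 ;;
  esac
}
```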

Advantages:

  • A file is written on every state change, so the state can be read cheaply at any time
  • The solution can be combined with additional tasks based on state changes

Disadvantages:

  • Only limited data is available, though it is sufficient for this purpose
  • Data is written only on state changes, so the files may be outdated

Addendum: keepalived can also write notify events to a FIFO (configured with notify_fifo). Combined with a suitable listener on the pipe, this allows setups that go beyond simply writing to a file.
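A minimal listener sketch, assuming notify_fifo is set in global_defs and that each line written to the FIFO has the form TYPE "NAME" STATE PRIORITY (for example INSTANCE "INSTANZ1" MASTER 100); please verify the exact format against the documentation of your keepalived version. STATE_DIR, the function names, and the FIFO path are hypothetical:

```shell
#!/bin/bash
# Sketch of a listener for keepalived's notify FIFO.
# Assumption: each line looks like  INSTANCE "INSTANZ1" MASTER 100
# STATE_DIR is a hypothetical location for the per-instance state files.
STATE_DIR=${STATE_DIR:-/tmp}

handle_fifo_line() {
  local type name state prio
  read -r type name state prio <<< "$1"
  name=${name//\"/}                   # strip the double quotes around the name
  case $type in
    INSTANCE|GROUP) echo "$name $state" > "$STATE_DIR/kdstate.$name.txt" ;;
    *) ;;                               # ignore other event types
  esac
}

listen_fifo() {
  local fifo=$1 line
  while true; do
    # Reading ends when the writer closes the FIFO, so reopen it in a loop
    while IFS= read -r line; do
      handle_fifo_line "$line"
    done < "$fifo"
  done
}

# Usage (hypothetical path, matching notify_fifo in keepalived.conf):
# listen_fifo /run/keepalived/notify.fifo
```

This produces the same files as the notify script, but from one long-running process instead of one script run per transition.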

Part 4 – Checking keepalived IPs without keepalived

This approach involves checking if the IPs associated with an instance or interface are present on the current system.

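#!/bin/bash
# Exit with 1 if any of the virtual IPs is missing on this host.
# Match " IP/" as a fixed string so that 10.0.0.1 does not also match 10.0.0.10.
ip addr show | grep -qF " 10.0.0.1/" || exit 1
ip addr show | grep -qF " 10.0.0.2/" || exit 1
echo "ACTIVE"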

Advantages:

  • Relatively universally applicable
  • Simple script with minimal overhead

Disadvantages:

  • Only checks for ACTIVE; BACKUP and FAULT cannot be told apart
  • The reported state may be wrong in error cases (e.g., an IP conflict)
  • The IPs must be known in advance (e.g., stored in a file or in configuration management)
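If the IPs are kept in a file, the check can read them from there instead of hard-coding them. A sketch, assuming a hypothetical list file with one IP per line (comments starting with #); the function name all_vips_present is also hypothetical:

```shell
#!/bin/bash
# Sketch: succeed only if every virtual IP listed in a file is configured here.
# The list file (one IP per line, '#' comments allowed) is a hypothetical
# example and must be kept in sync with keepalived.conf.
all_vips_present() {
  local list=$1 addrs vip
  addrs=$(ip -o addr show) || return 2
  # "|| [[ -n $vip ]]" also handles a last line without a trailing newline
  while read -r vip _ || [[ -n $vip ]]; do
    [[ -z $vip || $vip == \#* ]] && continue
    # Match " IP/" as a fixed string so 10.0.0.1 does not also match 10.0.0.10
    grep -qF " $vip/" <<< "$addrs" || return 1
  done < "$list"
  return 0
}
# Usage: all_vips_present /etc/keepalived/vips.txt && echo "ACTIVE"
```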

Part 5 – DBus

Since version 1.3.0, keepalived supports DBus (if it was built with DBus support), which allows the keepalived state to be retrieved with various DBus clients. DBus is enabled like this:

global_defs {
  enable_dbus
}

The data can then be retrieved like this:

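#!/bin/bash

res=""

# Each VRRP instance has one object path per IP version, ending in /IPv4 or /IPv6.
# Its State property is a (uint32, string) pair; collect the numeric part.
for instance in $(busctl --list tree org.keepalived.Vrrp1 | grep -E 'IPv(4|6)$'); do

  res+=$(busctl --json=short get-property org.keepalived.Vrrp1 \
    "$instance" \
    org.keepalived.Vrrp1.Instance State | jq '.data[0]')

done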

if [[ "$res" =~ ^[2]+$ ]]; then
    echo "ACTIVE"
    exit 0
elif [[ "$res" =~ ^[1]+$ ]]; then
    echo "BACKUP"
    exit 1
elif [[ "$res" =~ ^[3]+$ ]]; then
    echo "FAULT"
    exit 2
else
    echo "UNKNOWN"
    exit 3
fi

DBus bindings exist for many languages, so the same data can also be fetched from, for example, Python.
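Staying with Bash, jq can also be avoided by parsing busctl's default output. A sketch, assuming that busctl prints the (us) State property as the signature followed by the values, e.g. (us) 2 "Master"; the function names are hypothetical:

```shell
#!/bin/bash
# Sketch: collect VRRP state codes via DBus without jq.
# Assumes busctl's default output for the (us) State property looks like:
#   (us) 2 "Master"
state_code_from_busctl() {
  local sig code rest
  read -r sig code rest <<< "$1"
  [[ $sig == "(us)" && $code =~ ^[0-9]+$ ]] || return 1
  echo "$code"
}

# Prints one numeric state per instance object path (2 = MASTER, 1 = BACKUP, 3 = FAULT).
dbus_state_codes() {
  local path out
  for path in $(busctl --list tree org.keepalived.Vrrp1 | grep -E 'IPv(4|6)$'); do
    out=$(busctl get-property org.keepalived.Vrrp1 "$path" \
      org.keepalived.Vrrp1.Instance State) || return 1
    state_code_from_busctl "$out" || return 1
  done
}
```

The printed codes, one per line, can be joined and checked with the same regexes as in the script above.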

Advantages:

  • DBus properties are served by the running keepalived process, so the data is always current
  • No additional daemon installation required
  • Relatively slim script for checking

Disadvantages:

  • The state must be queried per DBus object, i.e., for every combination of interface, VRRP instance, and IP version

Conclusion

There are numerous, sometimes convoluted solutions for querying the status of keepalived instances. Five approaches were discussed here, with Part 5 being my preferred method due to its elegance and efficiency. Best of luck in implementing your solution.
