OpenVPN Client Runtime Issues Revisited

After running OpenVPN clients in production for quite some time, I realized I needed an easy way to resurrect dead tunnels. Being able to quickly switch egress IPs from a pool of available servers was more or less mandatory as well. So I needed to improve my hackish scripts and make them a bit more presentable. A great utility called shellcheck(1), available from ports, helped me do just that. In fact, I ended up rewriting most of the shell scripts I had concocted for one reason or another. A shoutout to the anonymous user who pointed that utility out to me!

Bring out your dead! (Part 2)

* * * * *       /etc/openvpn/scripts/resurrect.sh

The blunt instrument above, a cron job that runs every minute, has been quite useful and has improved the stability of my client tunnels a great deal. Whenever an openvpn client process disappears, it is back up and running within a minute or so. It is not elegant, but it works.

#!/bin/ksh

TMP1=$(mktemp)
TMP2=$(mktemp)
TMP3=$(mktemp)
PIDS=$(pgrep openvpn)

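# Expected tunnels: one directory per interface under /etc/openvpn/scripts.
find /etc/openvpn/scripts -type d -name 'tun*' | cut -d '/' -f 5 | sort > "$TMP1"

# Running tunnels: pull the device name (field 7 of the ps output)
# for each openvpn process.
for p in ${PIDS}; do
    ps -p "$p" | grep -E 'tun[0-4]' | awk '{print $7}' >> "$TMP2"
done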

sort "$TMP2" > "$TMP3"

# Anything expected but not running is considered dead.
DEAD=$(comm -23 "$TMP1" "$TMP3")

for D in ${DEAD}; do
    logger -i "Resurrecting a dead tunnel: $D"
    rm -f /var/run/"$D"
    sh /etc/netstart "$D"
done

rm "$TMP1" "$TMP2" "$TMP3"
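
The core of the script is a set difference: every tunnel that has a directory under /etc/openvpn/scripts but no matching openvpn process is considered dead. Here is a minimal sketch of just that step, using made-up interface lists. The function name dead_tunnels is hypothetical and not part of the script above.

```sh
#!/bin/sh
# dead_tunnels EXPECTED RUNNING
# Print the names listed in file EXPECTED that are missing from file RUNNING.
# (Hypothetical helper for illustration only.)
dead_tunnels() {
    exp_sorted=$(mktemp) || return 1
    run_sorted=$(mktemp) || { rm -f "$exp_sorted"; return 1; }
    # comm(1) requires both inputs to be sorted.
    sort -u "$1" > "$exp_sorted"
    sort -u "$2" > "$run_sorted"
    # -23 drops lines unique to RUNNING and lines common to both,
    # leaving only the lines unique to EXPECTED.
    comm -23 "$exp_sorted" "$run_sorted"
    rm -f "$exp_sorted" "$run_sorted"
}

# Example with made-up data: four configured tunnels, two of them running.
expected=$(mktemp)
running=$(mktemp)
printf '%s\n' tun0 tun1 tun2 tun3 > "$expected"
printf '%s\n' tun2 tun0 > "$running"
dead_tunnels "$expected" "$running"    # prints tun1 and tun3, one per line
rm -f "$expected" "$running"
```

Sorting inside the helper means the caller does not have to care about the order in which ps or find report the interfaces.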

Recycle Your “Zombie” Tunnels

There are, however, other failure modes where the openvpn client process stays alive but the tunnel itself is no longer functional. For this reason, I needed a second script that picks a new client config from a per-interface pool of available configs and brings the tunnel back up with that new config.

I wanted this to work with the netstart script, so that I could simply call netstart for a given interface and have the script do all the housekeeping for that tunnel.

# cat /etc/hostname.tun0
up
rdomain 10
!/etc/openvpn/scripts/tun0/recycle.sh

I was also able to refine the openvpn client configuration a bit, so that the config files supplied by the VPN provider work out of the box without any modification.

#!/bin/ksh

set -A configs FixMe1 FixMe2 FixMeN

dev=tun0
openvpn=/usr/local/sbin/openvpn
pid=$(pgrep -f "/var/run/$dev")

if [[ -n "${pid}" ]]; then
    currentconfig=$(ps -p "$pid" | grep -F tun | awk '{print $12}' | cut -d "/" -f 4)
else
    currentconfig=None
fi

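# Seed once, outside the loop: reseeding on every pass with the same
# second-granularity value would keep producing the same pick.
RANDOM=$$$(date +%s)

# The pool must hold at least one config other than the current one,
# or this loop never ends.
while true; do
    selectedconfig=${configs[$RANDOM % ${#configs[@]} ]}
    if [[ $selectedconfig != "$currentconfig" ]]; then
        break
    fi
done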

logger -i "Switching VPN config from $currentconfig to $selectedconfig."

if [[ -n "${pid}" ]]; then
    kill "$pid"
fi

rm -f "/var/run/$dev"

${openvpn} --dev "$dev" --writepid "/var/run/$dev" --daemon \
    --config "/etc/openvpn/$selectedconfig" --persist-local-ip \
    --user _openvpn --group _openvpn --auth-user-pass /etc/openvpn/auth.txt \
    --script-security 2 --ping 10 --route-noexec \
    --route-up "/etc/openvpn/scripts/$dev/up.sh" \
    --ifconfig-noexec --up "/etc/openvpn/scripts/$dev/up.sh"

exit 0

So the script checks which config is currently running, picks a new one at random from the pool, and makes sure it differs from the current one. It then starts a new session with the selected config. Since the pool of configs is different for each tunnel interface, I found it simplest to replicate the script for each interface with its own list of configs. This is the same approach I used for up.sh, since setting up the environment was cleaner and more explicit that way as well.
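
The selection step can also be written without a retry loop by filtering out the current config first and then picking from what is left. This is a sketch in portable sh, not what recycle.sh does. The function name pick_other is hypothetical, and it assumes config names contain no newlines and that /dev/urandom is available.

```sh
#!/bin/sh
# pick_other CURRENT CANDIDATE...
# Print one randomly chosen candidate that differs from CURRENT.
# Returns 1 when no other candidate exists instead of looping forever.
# (Hypothetical helper for illustration only.)
pick_other() {
    current=$1
    shift
    # Drop every candidate that is exactly equal to the current config.
    others=$(printf '%s\n' "$@" | grep -Fxv -- "$current")
    [ -n "$others" ] || return 1
    count=$(printf '%s\n' "$others" | wc -l | tr -d ' ')
    # Two random bytes read as an unsigned integer in the range 0-65535.
    rand=$(od -An -N2 -tu2 /dev/urandom | tr -d ' \n')
    index=$(( rand % count + 1 ))
    # Print the line at the chosen 1-based position.
    printf '%s\n' "$others" | sed -n "${index}p"
}

# Example with the placeholder names from the script above:
# prints either FixMe2 or FixMeN, never FixMe1.
pick_other FixMe1 FixMe1 FixMe2 FixMeN
```

Passing None as the current config, as recycle.sh does when no process is running, simply leaves the whole pool eligible.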

Originally, I thought about setting up a cron job to recycle the VPN tunnels periodically, but that would interfere with long-running TCP connections and anything else that relies on the egress IP staying the same for longer periods of time. In any case, this setup now has the benefit of picking a random IP from a pool of hundreds in different geolocations, whether at boot time or on demand.
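
On-demand recycling needs some trigger. One option, which is not part of my setup and is shown only as a sketch, is a reachability probe through the tunnel's routing domain that recycles the interface only when the probe fails, so healthy long-running connections are left alone. The function names and the probe target are hypothetical; -V is the routing table flag of OpenBSD ping(8). Keep in mind that a failed probe can also mean the target itself is down.

```sh
#!/bin/sh
# Hypothetical zombie check; none of these functions exist in the scripts above.

# probe RDOMAIN TARGET
# Succeed if TARGET answers at least one of three pings sent through
# routing table RDOMAIN (OpenBSD ping(8) flags).
probe() {
    ping -V "$1" -c 3 -w 2 "$2" > /dev/null 2>&1
}

# restart DEV
# Hand the interface back to netstart, which runs its recycle.sh.
restart() {
    sh /etc/netstart "$1"
}

# recycle_if_dead DEV RDOMAIN TARGET
# Leave a responsive tunnel alone; recycle one whose probe fails.
recycle_if_dead() {
    if probe "$2" "$3"; then
        return 0
    fi
    logger -i "Tunnel $1 looks dead, recycling"
    restart "$1"
}

# Example (TARGET is a placeholder for a host you expect to be reachable):
# recycle_if_dead tun0 10 "$TARGET"
```

Keeping probe and restart as separate functions makes it easy to swap in a different check, for example a DNS lookup, without touching the decision logic.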