A single ISP connection is a single point of failure. For engineers on customer calls, async video reviews, or live deployments, a dropped connection at the wrong moment costs trust and time. This guide covers a practical dual-ISP failover setup for home offices that achieves automatic failover in under 30 seconds.
Table of Contents
- The Core Setup
- Hardware Option 1: GL.iNet Flint 2 (Budget)
- Hardware Option 2: Peplink Balance One ($299)
- 4G/5G Backup Modem Recommendations
- Testing Failover Behavior
- Application-Level Failover Gaps
- Monitoring Connection Health
- Budget Breakdown
- Configuring DNS Resilience Alongside Internet Redundancy
- Automating Failover Notifications
- WireGuard VPN Across Dual-WAN
- Failover Testing Methodology
- Carrier Coverage Comparison for Backup Internet
- Monitoring Failover Health Over Time
- Mosh Configuration for Persistent Remote Sessions
- Application-Level Failover Configuration Template
- Practical Failover Checklist
- Related Reading
The Core Setup
The goal is two independent internet connections that switch automatically when the primary fails:
Primary ISP (fiber/cable)
↓
Router with
dual-WAN failover ──→ Your devices
↓
Secondary ISP (4G/5G cellular)
Hardware that supports this natively: Firewalla Gold Plus, GL.iNet Flint 2, Peplink Balance One (prosumer), or a MikroTik RouterOS setup.
Hardware Option 1: GL.iNet Flint 2 (Budget)
The GL-MT6000 runs OpenWrt and supports WAN failover out of the box for ~$100.
# SSH into the router after initial setup
ssh root@192.168.8.1
# Check current WAN status
uci show network.wan
uci show network.wan6
# Configure the secondary WAN (USB tethering from phone or USB modem)
# Flint 2 supports USB tethering natively through the UI
# Go to: Network → Internet → Add → USB Tethering
Failover configuration via OpenWrt UCI:
# Set up mwan3 (multi-WAN manager)
opkg update && opkg install mwan3 luci-app-mwan3
# Configure tracking targets for each WAN
cat > /etc/config/mwan3 << 'EOF'
config globals 'globals'
option mmx_mask '0x3F00'
config interface 'wan'
option enabled '1'
list track_ip '8.8.8.8'
list track_ip '1.1.1.1'
option reliability '1'
option count '2'
option timeout '2'
option interval '5'
option down '3'
option up '8'
config interface 'wanb'
option enabled '1'
list track_ip '8.8.8.8'
list track_ip '1.1.1.1'
option reliability '1'
option count '2'
option timeout '2'
option interval '5'
option down '3'
option up '8'
config rule 'default_rule'
option dest_ip '0.0.0.0/0'
option sticky '1'
option use_policy 'failover'
config policy 'failover'
list use_member 'wan_100'
list use_member 'wanb_200'
config member 'wan_100'
option interface 'wan'
option metric '1'
option weight '100'
config member 'wanb_200'
option interface 'wanb'
option metric '2'
option weight '100'
EOF
service mwan3 restart
Hardware Option 2: Peplink Balance One ($299)
Purpose-built for dual-WAN failover with a simpler UI. Plug in both connections, enable WAN health checks, done.
Key settings:
- Health Check: HTTP/HTTPS to www.gstatic.com every 5 seconds
- Failover to secondary when primary misses 3 consecutive checks
- Recovery: switch back to primary after 8 consecutive successes
The 3-miss / 8-success asymmetry prevents flapping on an unstable primary connection.
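The arithmetic behind those thresholds is worth making explicit. A quick sketch, using the 5-second interval and 3/8 thresholds from the settings above:

```shell
#!/bin/sh
# Back-of-envelope timing for the health-check settings above:
# 5-second check interval, 3 misses to fail over, 8 passes to recover.
INTERVAL=5
DOWN_AFTER=3
UP_AFTER=8
DETECT=$((INTERVAL * DOWN_AFTER))   # worst-case seconds before failover fires
RECOVER=$((INTERVAL * UP_AFTER))    # seconds of clean checks before failback
echo "Failover detection: ~${DETECT}s"
echo "Failback after: ${RECOVER}s of consecutive successes"
```

Detection in roughly 15 seconds keeps you inside the 30-second target, while the 40-second recovery window ensures a flapping primary doesn't yank traffic back prematurely.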
4G/5G Backup Modem Recommendations
| Device | Band Coverage | Speed | Monthly |
|---|---|---|---|
| Netgear M6 Pro | 5G mmWave + Sub-6 | 4Gbps theoretical | SIM-based |
| GL.iNet Mudi v2 | 4G LTE | 150Mbps | SIM-based |
| Solis Lite | 4G LTE, global | 50Mbps | 3GB free/day |
| Phone USB tethering | 4G/5G (carrier) | Depends | Existing plan |
For most remote engineers, phone USB tethering is the cheapest backup — most carrier plans include tethering at no extra cost. The latency is higher than fiber but sufficient for SSH, async video, and Slack.
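Whether a tethering plan's data cap can cover a backup day is easy to estimate. A rough sketch; the per-activity rates are illustrative assumptions, not measurements:

```shell
#!/bin/sh
# Rough daily data budget for a tethered backup day. Assumed rates:
# a 1.5 Mbps video call uses about 675 MB per hour
# (3600 s * 1.5 Mb/s / 8 bits per byte), and Slack plus SSH
# round to roughly 50 MB/day.
CALL_HOURS=3
CALL_MB=$((CALL_HOURS * 675))
TOTAL_MB=$((CALL_MB + 50))
echo "~${TOTAL_MB} MB for ${CALL_HOURS}h of calls plus chat/SSH"
```

At roughly 2 GB for a heavy call day, a 3 GB/day prepaid SIM comfortably covers a full workday on the backup link.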
Testing Failover Behavior
# Install mtr for continuous path monitoring
brew install mtr # macOS
# or: apt install mtr-tiny # Linux
# Watch the path to Google DNS in real time
sudo mtr --interval 0.5 8.8.8.8
# Simulate primary failure: unplug the cable or disable WAN1 in router UI
# Watch mtr — you should see packet loss for 15-30 seconds, then recovery via WAN2
Monitor the transition time. Acceptable: under 30 seconds. Unacceptable: over 90 seconds (indicates health check intervals are too long or recovery threshold is too high).
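mtr shows the loss window visually; to get a hard number, you can log one connectivity probe per second and measure the longest run of failures. A sketch with made-up sample data; in practice, feed it a log produced by a loop that probes once per second and prints OK or DOWN:

```shell
#!/bin/sh
# Find the longest outage (consecutive DOWN seconds) in a
# one-probe-per-second connectivity log. The here-doc sample
# below is illustrative.
LONGEST=$(awk '
    /DOWN/ { run++; if (run > max) max = run; next }
           { run = 0 }
    END    { print max + 0 }
' <<'EOF'
OK
OK
DOWN
DOWN
DOWN
OK
DOWN
OK
EOF
)
echo "Longest outage: ${LONGEST}s"
```

Run it against a real log captured during a pulled-cable test and the result is your measured failover time, which you can compare directly against the 30-second target.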
Application-Level Failover Gaps
Automatic failover at the router level doesn’t fix everything. Stateful connections break:
- SSH sessions: use mosh instead of ssh; it reconnects automatically
- Video calls: Zoom/Meet reconnect themselves within 30 seconds
- VPN tunnels: most reconnect automatically, but check your VPN client settings
- Database connections: poolers like PgBouncer reconnect; raw psql sessions drop
# Install mosh for SSH resilience
brew install mosh # macOS client
apt install mosh # server
# Connect with mosh instead of ssh
mosh user@server.example.com
# mosh keeps your session alive through IP changes and reconnects transparently
Monitoring Connection Health
# Simple cron-based uptime logger
cat > /usr/local/bin/check-internet.sh << 'EOF'
#!/bin/bash
TIMESTAMP=$(date '+%Y-%m-%d %H:%M:%S')
if curl -s --max-time 5 https://www.google.com > /dev/null; then
echo "$TIMESTAMP OK"
else
echo "$TIMESTAMP DOWN"
fi
EOF
chmod +x /usr/local/bin/check-internet.sh
# Add to crontab: check every minute (note: piping to "crontab -" replaces
# any existing crontab; use "crontab -e" to append instead)
echo "* * * * * /usr/local/bin/check-internet.sh >> /var/log/internet-uptime.log" | crontab -
# Calculate monthly uptime
awk '/DOWN/ {down++} /OK/ {up++} END {print "Uptime: " up/(up+down)*100 "%"}' /var/log/internet-uptime.log
Budget Breakdown
| Component | Cost |
|---|---|
| GL.iNet Flint 2 router | $100 |
| Netgear M6 Pro 5G modem | $249 |
| Prepaid SIM (3GB/day) | $25-40/month |
| Total monthly | $25-40 + existing ISP |
For a $40/month total add-on, you eliminate the most common cause of remote work disruption.
Configuring DNS Resilience Alongside Internet Redundancy
Dual-WAN failover handles the physical link layer, but DNS failures can make the internet appear down even when your connection is working. Remote workers operating from home offices with a single DNS resolver (typically provided by the ISP) experience DNS outages during partial connectivity issues even when raw packet routing is functional.
Configure your router or devices to use multiple DNS resolvers with fallback logic. On a GL.iNet router with OpenWrt, override the default ISP DNS:
# SSH into router
ssh root@192.168.8.1
# Configure DNS with multiple upstream resolvers
uci delete dhcp.@dnsmasq[0].server 2>/dev/null
uci add_list dhcp.@dnsmasq[0].server='1.1.1.1'
uci add_list dhcp.@dnsmasq[0].server='8.8.8.8'
uci add_list dhcp.@dnsmasq[0].server='9.9.9.9'
uci set dhcp.@dnsmasq[0].noresolv='1'
uci commit dhcp
service dnsmasq restart
# Verify DNS is resolving through both servers
nslookup google.com 1.1.1.1
nslookup google.com 8.8.8.8
For devices where you cannot control the router, configure DNS-over-HTTPS directly on macOS or Linux to bypass ISP DNS entirely:
# Install cloudflared for DNS-over-HTTPS on macOS
brew install cloudflared
# Create a LaunchDaemon so cloudflared starts at boot and runs as root
# (root is needed to bind port 53, the only port macOS system DNS will use)
sudo tee /Library/LaunchDaemons/com.cloudflare.cloudflared.plist > /dev/null << 'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>com.cloudflare.cloudflared</string>
    <key>ProgramArguments</key>
    <array>
        <string>/usr/local/bin/cloudflared</string>
        <string>proxy-dns</string>
        <string>--port</string>
        <string>53</string>
        <string>--upstream</string>
        <string>https://1.1.1.1/dns-query</string>
        <string>--upstream</string>
        <string>https://1.0.0.1/dns-query</string>
    </array>
    <key>RunAtLoad</key>
    <true/>
    <key>KeepAlive</key>
    <true/>
</dict>
</plist>
EOF
# On Apple Silicon, Homebrew installs to /opt/homebrew/bin/cloudflared;
# adjust the path above accordingly
sudo launchctl load /Library/LaunchDaemons/com.cloudflare.cloudflared.plist
# Point system DNS to the local resolver
networksetup -setdnsservers "Wi-Fi" 127.0.0.1
This setup means DNS queries succeed as long as either your primary or secondary internet connection works, and they are encrypted against ISP inspection regardless of which connection is active.
Automating Failover Notifications
Knowing when failover occurred is valuable for diagnosing patterns—if you fail over to cellular every day between 9 and 10 AM, that signals a recurring ISP issue worth reporting. Build a simple notification system that alerts you when the active WAN changes:
#!/bin/bash
# /usr/local/bin/wan-monitor.sh
# Run via cron every 2 minutes on the router or a local machine.
# The --interface name below assumes eth0 is the primary WAN;
# adjust for your hardware.
STATE_FILE="/tmp/wan-state"
# Check which WAN is currently active by testing connectivity
if curl -s --max-time 5 --interface eth0 https://www.google.com > /dev/null 2>&1; then
ACTIVE="primary"
else
ACTIVE="backup"
fi
PREV=$(cat "$STATE_FILE" 2>/dev/null || echo "primary")
if [ "$ACTIVE" != "$PREV" ]; then
TIMESTAMP=$(date '+%Y-%m-%d %H:%M:%S')
echo "$TIMESTAMP: WAN switched from $PREV to $ACTIVE" >> /var/log/wan-failover.log
# Send notification via ntfy.sh (free push notification service)
curl -s -d "WAN switched to $ACTIVE at $TIMESTAMP" \
https://ntfy.sh/your-unique-channel-name > /dev/null
echo "$ACTIVE" > "$STATE_FILE"
fi
# Add to crontab (piping to "crontab -" replaces the whole crontab;
# use "crontab -e" to append if you already have entries)
echo "*/2 * * * * /usr/local/bin/wan-monitor.sh" | crontab -
# Subscribe to notifications on your phone via the ntfy app
# Channel: your-unique-channel-name (use a hard-to-guess string)
This gives you a historical log of failover events and real-time mobile notifications. After a week, review /var/log/wan-failover.log to identify patterns in your ISP’s reliability.
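To surface the kind of 9-to-10 AM pattern described above, a one-line histogram over the log works. The sample entries here are fabricated but match wan-monitor.sh's log format:

```shell
#!/bin/sh
# Count failovers (switches away from primary) per hour of day.
# Log lines follow wan-monitor.sh:
# "YYYY-MM-DD HH:MM:SS: WAN switched from X to Y"
HIST=$(awk -F'[ :]' '/switched from primary/ { count[$2]++ }
    END { for (h in count) printf "%s:00 -> %d failover(s)\n", h, count[h] }' <<'EOF'
2024-05-01 09:12:00: WAN switched from primary to backup
2024-05-01 09:40:11: WAN switched from backup to primary
2024-05-02 09:05:43: WAN switched from primary to backup
2024-05-02 09:55:02: WAN switched from backup to primary
EOF
)
echo "$HIST"
```

Point the awk at /var/log/wan-failover.log instead of the here-doc and any hour that keeps showing up is worth raising with your ISP.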
WireGuard VPN Across Dual-WAN
Remote engineers often use VPNs for accessing office infrastructure. Standard VPN connections bind to a specific IP address and drop when that address changes during failover. WireGuard handles this better than OpenVPN or IPSec because it uses UDP and re-establishes connections quickly after an IP change.
Configure WireGuard to reconnect automatically after failover:
# /etc/wireguard/wg0.conf
[Interface]
PrivateKey = YOUR_PRIVATE_KEY
Address = 10.0.0.2/24
DNS = 10.0.0.1
# Re-resolve the server endpoint each time the interface comes up
PostUp = wg set wg0 peer PEER_PUBLIC_KEY endpoint your-vpn-server.com:51820
PostDown = echo "WireGuard down"
[Peer]
PublicKey = PEER_PUBLIC_KEY
Endpoint = your-vpn-server.com:51820
AllowedIPs = 10.0.0.0/24, 172.16.0.0/12
PersistentKeepalive = 25
The PersistentKeepalive = 25 setting sends a keepalive packet every 25 seconds, which maintains NAT table entries through connection switches. After failover, WireGuard reconnects within one keepalive interval—typically under 30 seconds—without requiring any user action.
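If you want to verify the tunnel actually recovered, compare the last handshake time against the keepalive interval. A sketch; the 135-second threshold (three missed 25-second keepalives plus margin) and the idea of feeding it `wg show wg0 latest-handshakes` output are my assumptions, not a built-in WireGuard feature:

```shell
#!/bin/sh
# Decide whether a WireGuard peer looks stale after failover.
# In real use, the second argument would come from:
#   wg show wg0 latest-handshakes | awk '{print $2}'
handshake_stale() {
    now=$1
    last=$2
    # Stale if no handshake for > 3 keepalive intervals (3 * 25s) plus margin
    [ $((now - last)) -gt 135 ]
}
# Illustrative epochs: a handshake 200s ago is stale, 30s ago is not
handshake_stale 1000 800 && echo "stale: consider re-setting the peer endpoint"
handshake_stale 1000 970 || echo "fresh: tunnel survived the switch"
```

A stale result after failover usually means the endpoint DNS resolved to a cached address; re-running the `wg set` line from PostUp forces a fresh resolution.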
Failover Testing Methodology
Systematic testing ensures your failover setup works when it matters most—during actual internet disruption.
# Step 1: Baseline connectivity test
ping -c 10 8.8.8.8 | grep -E "min|avg|max" # Record these numbers
# Step 2: Simulate primary WAN failure
# For GL.iNet Flint 2: SSH into the router
ssh root@192.168.8.1
# Disable the primary WAN (use the OpenWrt logical interface name)
ifdown wan
# Step 3: Monitor failover transition
mtr -r 8.8.8.8 --report-cycles 100 &
sleep 2
# Count packets lost during transition
# Watch for packet loss spike followed by recovery
# Step 4: Verify secondary WAN is handling traffic
# Check gateway routing
ip route
# Should show 4G/5G interface as active route
# Step 5: Re-enable primary WAN
ifup wan
# Verify automatic fallback within your configured threshold
Record your actual failover time. Most remote workers can tolerate up to 30 seconds: calls and SSH sessions drop briefly but reconnect. Note whether your setup hits this target.
Carrier Coverage Comparison for Backup Internet
Choose your backup ISP based on coverage in your specific location, not generic ratings:
| Carrier | Technology | Coverage | Typical Speed | Typical Latency |
|---|---|---|---|---|
| AT&T | 5G mmWave + LTE | 90%+ | 400-1000 Mbps | 20-30ms |
| Verizon | 5G UWB + LTE | 85%+ | 300-800 Mbps | 25-35ms |
| T-Mobile | 5G + LTE | 88%+ | 200-600 Mbps | 30-40ms |
| US Cellular | LTE (4G) | 60%+ | 30-100 Mbps | 40-50ms |
| Xfinity Mobile | Verizon backbone | 85%+ | Same as Verizon | Same as Verizon |
Test actual coverage and speed at your specific location before committing. Visit carrier stores with your phone model or rent a dedicated hotspot for a week to verify speed. Generic coverage maps are notoriously inaccurate—you may have “5G coverage” that’s actually fallback LTE.
Monitoring Failover Health Over Time
Beyond individual tests, track failover behavior continuously to catch degradation:
#!/bin/bash
# Weekly failover health check. Run it on the router itself, or replace
# the ifdown/ifup lines with "ssh root@192.168.8.1 ifdown wan" (and ifup)
# if you run it from a machine on the LAN.
LOG_FILE="/var/log/failover-health.log"
# Test primary WAN latency
PRIMARY_TIME=$(ping -c 5 8.8.8.8 | grep avg | awk '{print $4}' | cut -d'/' -f2)
echo "$(date): Primary WAN latency: ${PRIMARY_TIME}ms" >> "$LOG_FILE"
# Take the primary down, then measure how long until traffic flows again
ifdown wan
FAILOVER_START=$(date +%s%N)
until curl -s --max-time 2 https://www.google.com > /dev/null 2>&1; do
    sleep 1
done
FAILOVER_END=$(date +%s%N)
FAILOVER_TIME=$(( (FAILOVER_END - FAILOVER_START) / 1000000 ))
echo "$(date): Failover transition time: ${FAILOVER_TIME}ms" >> "$LOG_FILE"
# Test secondary WAN latency while the primary is still down
SECONDARY_TIME=$(ping -c 5 8.8.8.8 | grep avg | awk '{print $4}' | cut -d'/' -f2)
echo "$(date): Secondary WAN latency: ${SECONDARY_TIME}ms" >> "$LOG_FILE"
# Re-enable primary
ifup wan
# Alert if failover exceeded the 30-second target
if [ "$FAILOVER_TIME" -gt 30000 ]; then
    echo "$(date): WARNING: Failover time exceeds 30 seconds!" >> "$LOG_FILE"
fi
Run this script weekly and monitor trends. If failover times creep up from 15 seconds to 45 seconds, your configuration has drifted and needs adjustment.
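Eyeballing the log gets tedious; a summary of the logged transition times makes drift obvious. The sample lines below are fabricated in the script's output format:

```shell
#!/bin/sh
# Summarise "Failover transition time" entries from failover-health.log.
SUMMARY=$(awk -F': ' '/Failover transition time/ {
    gsub(/ms/, "", $NF); sum += $NF; n++
    if ($NF + 0 > max) max = $NF + 0
}
END { if (n) printf "runs=%d avg=%.0fms max=%dms", n, sum/n, max }' <<'EOF'
Mon May  6 03:00:01 UTC 2024: Failover transition time: 14200ms
Mon May 13 03:00:01 UTC 2024: Failover transition time: 16800ms
Mon May 20 03:00:01 UTC 2024: Failover transition time: 43100ms
EOF
)
echo "$SUMMARY"
```

Swap the here-doc for /var/log/failover-health.log; a max far above the average, as in this sample, is exactly the drift signal worth investigating.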
Mosh Configuration for Persistent Remote Sessions
Mosh improves on SSH by maintaining your session through network transitions:
# Install mosh
brew install mosh # macOS
apt install mosh # Linux
# Basic mosh connection
mosh user@server.example.com
# Mosh with custom port (if default ports blocked)
mosh --ssh="ssh -p 2222" user@server.example.com
# Mosh connection profile in ~/.ssh/config
Host production-server
HostName production.example.com
User deployuser
# Mosh automatically detects this and uses it
Mosh is invaluable when your primary ISP fails mid-SSH session. Traditional SSH drops the connection immediately, requiring you to reconnect and re-authenticate. Mosh keeps the session alive, automatically resumes once connectivity is restored, and handles the reconnection transparently.
Application-Level Failover Configuration Template
Document your application’s failover behavior in a configuration file:
# failover-config.yaml - Application failover settings
services:
api_server:
timeout: 5s
retry_count: 3
retry_delay: 2s
fallback_endpoint: https://backup-api.example.com
database:
timeout: 10s
connection_pool_size: 5
failover_mode: active-standby
replica_endpoint: standby.db.example.com
cache:
timeout: 3s
fallback_to_source: true
monitoring:
health_check_interval: 5s
failure_threshold: 3
alert_on_failover: true
Configure your actual services to use this. Many frameworks support configuration files that define reconnection behavior and timeouts automatically.
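Those settings translate into simple retry loops in code. A generic shell sketch of the api_server policy above (3 retries, 2-second delay, then fall back); with_retries is a hypothetical helper, not part of any framework:

```shell
#!/bin/sh
# Run a command up to 3 times with a delay between attempts, mirroring
# the retry_count/retry_delay settings above; the caller supplies the
# fallback as a second invocation.
RETRY_DELAY=${RETRY_DELAY:-2}
with_retries() {
    for attempt in 1 2 3; do
        "$@" && return 0
        sleep "$RETRY_DELAY"
    done
    return 1
}
# Usage: try the primary endpoint, fall back to the secondary.
# "true" and "false" stand in for real health-check commands such as
# curl -sf --max-time 5 https://api.example.com/health
with_retries true  && echo "primary ok"
with_retries false || echo "falling back to secondary"
```

Keeping the thresholds in one YAML file and mirroring them in code like this means a change to the policy only has to happen in one place per service.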
Practical Failover Checklist
Before deploying your failover setup, verify each component:
- Dual WAN router is configured with both ISPs connected
- Health checks point to reliable endpoints (Google DNS and Cloudflare DNS)
- Failover threshold is set appropriately (3 failures to trigger, 8 successes to recover)
- Secondary ISP has adequate monthly quota (unlimited or 500GB+)
- Mosh is installed on all remote servers you SSH into
- VPN client is configured to use both WAN interfaces
- Database connection pooler is set to reconnect automatically
- Monitoring is active and alerts on failover events
- You’ve tested failover manually and documented actual times
- Team knows that calls may briefly drop during failover
- Backup internet is active and paid (not test account)
Related Reading
- Best Backup Internet Solution for Remote Workers in Countries with Poor Fiber
- Best Portable WiFi Hotspot for Digital Nomads
- Best Ethernet Over Powerline Adapter for Home Office Far from Router
- Audio Setup for Hybrid Conference Rooms: A Technical Guide