Emerging Member
Posts: 42
Registered: ‎06-10-2013
Kudos: 9

Is there a watchdog?

I'm just wondering if there is a watchdog feature on the router?  It didn't jump out at me in the configuration but I could have missed it.

 

Asking because I was playing around with my EdgeRouter Lite and something happened and it just died.  Had to do a power reset to bring it back up.  Only issue was that I wasn't there and had to have my friend go to my house to power cycle it.

 

I was hoping that if it crashed like that it could detect it and force a reboot.


Previous Employee
Posts: 13,551
Registered: ‎06-10-2011
Kudos: 5471
Solutions: 1656
Contributions: 2

Re: Is there a watchdog?

The CPU does have a hardware watchdog that works with the kernel, so I guess the question is how exactly the router "died"?  Did it stop responding on the serial console?

Emerging Member
Posts: 42
Registered: ‎06-10-2013
Kudos: 9

Re: Is there a watchdog?


@UBNT-ancheng wrote:

The CPU does have a hardware watchdog that works with the kernel, so I guess the question is how exactly the router "died"?  Did it stop responding on the serial console?


Not sure about the serial console, but it no longer responded to pings. I was trying to set up L2TP/IPsec at the time, so I may have made some crazy config change that killed it. It didn't stop responding immediately after I committed; it seemed to be a minute or two later. I guess that will teach me to use the commit-confirm command next time.

Previous Employee
Posts: 13,551
Registered: ‎06-10-2011
Kudos: 5471
Solutions: 1656
Contributions: 2

Re: Is there a watchdog?

Yeah, the hardware watchdog is only triggered if the kernel dies and does not "poke" the watchdog for some time. If it's something like a connectivity issue then, like you said, "commit-confirm" or some other solution may work, e.g., a simple script that restarts the router when ping stops working (though that may be dangerous, of course).

New Member
Posts: 20
Registered: ‎09-29-2012
Kudos: 6

Re: Is there a watchdog?

[ Edited ]

I've written a network watchdog script for our routers since we had some issues in the past where a reboot would fix it (back in the old Vyatta days).

 

It's very basic but does what we need it to. We have a cron job that runs it every 5 minutes, and it's set up to prevent reboot loops.

 

#!/bin/bash
# Network watchdog: reboot only if all three ping targets are unreachable.
# /rebootchk is the reboot-loop guard: "0" allows a reboot, anything else blocks it.

log=/var/log/rchk.log
rchk=$(cat /rebootchk 2>/dev/null)

if [ "$rchk" = "0" ] ; then
        if sudo /bin/ping -c2 <EXTERNAL_IP> > /dev/null 2>&1 ; then
                echo "Good @ $(date)" >> "$log"
        elif sudo /bin/ping -c2 <IP_FROM_WAN> > /dev/null 2>&1 ; then
                echo "EXTERNAL didn't ping @ $(date)" >> "$log"
        elif sudo /bin/ping -c2 <IP_FROM_LAN> > /dev/null 2>&1 ; then
                echo "EXTERNAL and WAN didn't ping @ $(date)" >> "$log"
        else
                # Set the guard before rebooting so the next run cannot reboot again.
                echo "1" > /rebootchk
                echo "All pings failed @ $(date)" >> "$log"
                sudo /sbin/reboot
        fi
else
        # Guard is set (or the file is missing or unreadable): do nothing.
        echo "No action" >> "$log"
fi

We replaced <EXTERNAL_IP> with google.com; <IP_FROM_WAN> is our datacenter's gateway IP, and <IP_FROM_LAN> is our backup router's IP. As long as it can ping any of those three, it will not reboot. If it does reboot, the following line in /etc/rc.local runs on boot and sets the guard flag, so it can't reboot again until cron clears the flag:

 

echo 1 > /rebootchk

 Here are the cronjobs we run:

*/5 * * * * /root/rchk.sh
*/30 * * * * echo 0 > /rebootchk

 I forget if we needed to install anything to get this to work but if you get errors it'll probably tell you what's missing.

 

Gotta love Linux!

 

 

Member
Posts: 211
Registered: ‎05-21-2013
Kudos: 265
Solutions: 8
Contributions: 3

Re: Is there a watchdog?

[ Edited ]

I would use logger instead of redirection, like "logger NO PING". This way the messages end up in the same place as all the other logs, which is especially useful if you log to a remote location too.

 

Also, I usually increase the ping wait time to something like -W90 to prevent a reboot on temporary glitches. I might even ping several external addresses to make sure we didn't just lose connectivity to one specific subnet and that something really is wrong.
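
A minimal sketch of those two suggestions combined, assuming an iputils-style ping and the standard logger utility. The function name all_unreachable, the rchk tag, and the PING_CMD/LOG_CMD override variables are made up for illustration and are not part of the script above.

```shell
#!/bin/bash
# PING_CMD and LOG_CMD are hypothetical override hooks (handy for dry runs).
PING_CMD=${PING_CMD:-ping}
LOG_CMD=${LOG_CMD:-logger}

# all_unreachable TARGET...
# Succeeds (returns 0) only if every target fails to answer.
all_unreachable() {
    local target
    # With no targets there is nothing to judge, so never report "all down".
    [ "$#" -gt 0 ] || return 1
    for target in "$@"; do
        # -W90: wait up to 90 seconds for a reply, to ride out short glitches.
        if $PING_CMD -c2 -W90 "$target" > /dev/null 2>&1; then
            return 1
        fi
        # Goes to syslog, so it follows whatever (remote) logging is configured.
        $LOG_CMD -t rchk "no ping reply from $target"
    done
    return 0
}
```

A caller would only consider a reboot when all_unreachable returns success for the whole list of addresses; a single reply from any of them is enough to leave the router alone.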

Member
Posts: 211
Registered: ‎05-21-2013
Kudos: 265
Solutions: 8
Contributions: 3

Re: Is there a watchdog?

If there is a hardware watchdog, how does it work?

I don't see the octeon-wdt module, nor do I see the /dev/watchdog file. Is some different watchdog interaction mechanism used?

Previous Employee
Posts: 13,551
Registered: ‎06-10-2011
Kudos: 5471
Solutions: 1656
Contributions: 2

Re: Is there a watchdog?

Actually it's not built as a module (CONFIG_CAVIUM_OCTEON_WATCHDOG=y). By default the kernel pokes the watchdog using interrupts (see /proc/interrupts), and the device file (which we don't create by default) can be used to poke the watchdog from userspace among other things.
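
One way to look for that activity is to filter /proc/interrupts. This is only a sketch: the match pattern is a guess, since the thread does not say what label the watchdog interrupt carries, and show_wdt_irqs is a made-up helper name.

```shell
#!/bin/bash
# show_wdt_irqs [FILE]
# Prints interrupt-table lines that look watchdog-related.
# Assumption: the label contains "wdt" or "watchdog"; it may differ on a given kernel.
show_wdt_irqs() {
    grep -i -E 'wdt|watchdog' "${1:-/proc/interrupts}"
}
```

Running it twice a few seconds apart and comparing the counters should show whether the kernel is still servicing the watchdog.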

Member
Posts: 211
Registered: ‎05-21-2013
Kudos: 265
Solutions: 8
Contributions: 3

Re: Is there a watchdog?

Ah, got it! What is the correct way to create /dev/watchdog to be able to kick the dog from userspace?

 

I'd like to make a userspace kicker wrapper, e.g. for watchdogd.

Previous Employee
Posts: 13,551
Registered: ‎06-10-2011
Kudos: 5471
Solutions: 1656
Contributions: 2

Re: Is there a watchdog?

The watchdog device is generic (misc device with minor 130), not specific to octeon_wdt. You might also want to check out the "watchdog" package in Debian, which already provides similar functionality.
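
For reference, misc devices use character major 10, so the node can be created with mknod. A small sketch, with ensure_watchdog_node as a made-up helper name; it needs root whenever it actually has to create the node.

```shell
#!/bin/bash
# ensure_watchdog_node [PATH]
# Creates the character device node (major 10 = misc, minor 130 = watchdog)
# only if nothing exists at PATH yet.
ensure_watchdog_node() {
    local dev=${1:-/dev/watchdog}
    if [ -e "$dev" ]; then
        return 0
    fi
    mknod "$dev" c 10 130 && chmod 600 "$dev"
}
```

Creating the node is harmless by itself. Opening it is a different matter, so it is worth making sure something is ready to keep writing to it before anything opens the device.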

Member
Posts: 211
Registered: ‎05-21-2013
Kudos: 265
Solutions: 8
Contributions: 3

Re: Is there a watchdog?

Good to know. Just found out watchdogd even creates it if it doesn't yet exist, nice.

 

Does the kernel reset the timer unconditionally, though? I.e., will watchdogd be able to keep the timer from being reset by the kernel if userspace tests fail?

Previous Employee
Posts: 13,551
Registered: ‎06-10-2011
Kudos: 5471
Solutions: 1656
Contributions: 2

Re: Is there a watchdog?

The kernel should stop poking if the device is opened. You can look at the driver source files for more specific details of course.

Member
Posts: 211
Registered: ‎05-21-2013
Kudos: 265
Solutions: 8
Contributions: 3

Re: Is there a watchdog?

Indeed I should. (:

 

Can everything I need be found in any watchdog driver such as octeon_wdt, or is it split into a hardware-specific part and generic watchdog infrastructure?

Previous Employee
Posts: 13,551
Registered: ‎06-10-2011
Kudos: 5471
Solutions: 1656
Contributions: 2

Re: Is there a watchdog?


@dmbaturin wrote:

Can everything I need be found in any watchdog driver such as octeon_wdt, or is it split into a hardware-specific part and generic watchdog infrastructure?


That sort of depends on what "everything you need" is? :) The normal operation is just file open/write/close, but if you need ioctl then the support will vary, of course.
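
The open/write/close cycle can be sketched in plain bash. Everything here is illustrative: kick_while, WATCHDOG_DEV and the MAX_KICKS argument are made-up names, and the 'V' written on a clean stop assumes the driver honours the usual Linux "magic close" convention, which this thread does not confirm for octeon_wdt.

```shell
#!/bin/bash
# Sketch of a userspace watchdog kicker using only open/write/close.
WATCHDOG_DEV=${WATCHDOG_DEV:-/dev/watchdog}

# kick_while CHECK INTERVAL [MAX_KICKS]
# Holds the device open and writes one byte every INTERVAL seconds for as
# long as the CHECK command succeeds. MAX_KICKS (0 = unlimited) exists only
# so the loop can be exercised against a regular file.
kick_while() {
    local check=$1 interval=$2 max=${3:-0} count=0 healthy=1
    {
        while :; do
            if ! "$check"; then healthy=0; break; fi
            printf '.' >&3                  # any write counts as a kick
            count=$((count + 1))
            if [ "$max" -gt 0 ] && [ "$count" -ge "$max" ]; then break; fi
            sleep "$interval"
        done
        # Assumption: writing 'V' before close asks the driver to stand down.
        # It is sent only on a clean stop; after a failed check the device is
        # closed without it, so the hardware timer is left to run out.
        if [ "$healthy" -eq 1 ]; then printf 'V' >&3; fi
    } 3> "$WATCHDOG_DEV"
    [ "$healthy" -eq 1 ]
}
```

Pointing WATCHDOG_DEV at a scratch file and passing a small MAX_KICKS is a safe way to try the loop before letting it near the real device.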

Member
Posts: 211
Registered: ‎05-21-2013
Kudos: 265
Solutions: 8
Contributions: 3

Re: Is there a watchdog?

Oh, I just meant: is the watchdog driver self-contained, or does octeon_wdt contain only the platform-specific functions (e.g., resetting the hardware timer) while the /dev/watchdog operations live in some common file? I've read it and found out that it's self-contained, thanks.

 

I started working on a watchdog daemon CLI wrapper, by the way. Here's the first draft, not yet functional: https://github.com/SO3Group/vyatta-watchdog I hope it will work as expected soon.

Member
Posts: 211
Registered: ‎05-21-2013
Kudos: 265
Solutions: 8
Contributions: 3

Re: Is there a watchdog?

So I managed to make the watchdog package functional.

 

You can download the package from http://baturin.org/files/vyatta/packages/vyatta-watchdog_1.3_all.deb or review the source at the link in the post above. The commit and release history is a total mess, sorry about that; I should have gotten more sleep before doing it.

 

I added some documentation to the README.md so you can see it right at the github link.

 

I verified it basically works, but I didn't test it thoroughly, so back up your config first and beware of bugs.
