Emerging Member
Posts: 60
Registered: ‎01-03-2014
Kudos: 19
Solutions: 4

Re: EdgeMAX EdgeRouter software version v1.10.1 has been released!

"Upgraded" my ERLite from 1.10 to 1.10.1, it's a 1LAN 2WAN setup with load balancing. After the "upgrade" I had lost my internet connection and with no time to debug this I just reverted back to 1.10.

Member
Posts: 218
Registered: ‎02-12-2013
Kudos: 69
Solutions: 18

Re: EdgeMAX EdgeRouter software version v1.10.1 has been released!

@starckIT @PVKK
Can you have a look at /var/log/messages - but before you reboot, since the log file gets deleted on reboot.
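
Since the log does not survive a reboot, one option is to copy it to persistent storage first. Below is a minimal sketch, assuming Python is available on the router and that /config survives reboots (both worth verifying on your device). `save_log` and the `/config/saved-logs` directory are hypothetical names, not part of EdgeOS.

```python
import os
import shutil
import time


def save_log(src="/var/log/messages", dest_dir="/config/saved-logs"):
    """Copy the syslog file into a directory expected to survive a reboot.

    dest_dir is an assumption: /config is commonly persistent on EdgeOS,
    but confirm this on your own router before relying on it.
    """
    if not os.path.isdir(dest_dir):
        os.makedirs(dest_dir)
    # A timestamp in the file name keeps earlier copies from being overwritten.
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = os.path.join(dest_dir, "messages-" + stamp)
    shutil.copy2(src, dest)
    return dest
```

Calling `save_log()` as root right after the failure, and before rebooting, should leave a copy that can be attached to a post later.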

New Member
Posts: 8
Registered: ‎03-12-2017
Kudos: 1

Re: EdgeMAX EdgeRouter software version v1.10.1 has been released!

After updating my firmware from 1.10 to 1.10.1 on an ER-X running a 1LAN 2WAN cfg (although no 2nd WAN in use) the gateway routing stops daily requiring a router reboot--whereas I had run a month on 1.10 with no reboots needed.  This is a very simple config with no DHCP services in use.

 

I have just rolled back to 1.10 on the ER-X, but an ERL3 with a much more complex configuration, which I updated over the weekend, seems stable so far (similar cfg with 2WAN & no DHCP, but various active port forwards and an active VPN on that one).

Member
Posts: 218
Registered: ‎02-12-2013
Kudos: 69
Solutions: 18

Re: EdgeMAX EdgeRouter software version v1.10.1 has been released!

@LsiCorp I have 12 places on ER-X with a configuration that had eth0 as primary connection and eth1 as fail-over (most places don't even have anything connected to eth1). Upon upgrading to 1.10.1, I lost 3 places. Now I've removed the load balancing config, and haven't had issues since.
Seems like there's something fishy with the LB in this version and it isn't failing every time, which is probably why Ubnt didn't catch it in their lab.

New Member
Posts: 2
Registered: ‎06-27-2017

Re: EdgeMAX EdgeRouter software version v1.10.1 has been released!

I've been getting these entries in my System Log Monitor since upgrading from 1.9.7hotfix 1 to 1.10.1:

 

ubnt kernel: Process 26230 (ping) has crashed (parent 1771 (exe) signal 11, code 128, addr (nil)), coredumps disabled
ubnt kernel: Process 20411 (ping) has crashed (parent 1771 (exe) signal 11, code 128, addr (nil)), coredumps disabled
ubnt kernel: Process 20993 (ping) has crashed (parent 1771 (exe) signal 11, code 128, addr (nil)), coredumps disabled
ubnt kernel: Process 18422 (ping) has crashed (parent 1771 (exe) signal 11, code 128, addr (nil)), coredumps disabled

 

Should I be worried?

Ubiquiti Employee
Posts: 1,021
Registered: ‎07-20-2015
Kudos: 949
Solutions: 71

Re: EdgeMAX EdgeRouter software version v1.10.1 has been released!

@king_of_hearts:

> ubnt kernel: Process 26230 (ping) has crashed (parent 1771 (exe) signal 11, code 128, addr (nil)), coredumps disabled
> ubnt kernel: Process 20411 (ping) has crashed (parent 1771 (exe) signal 11, code 128, addr (nil)), coredumps disabled
> ubnt kernel: Process 20993 (ping) has crashed (parent 1771 (exe) signal 11, code 128, addr (nil)), coredumps disabled
> ubnt kernel: Process 18422 (ping) has crashed (parent 1771 (exe) signal 11, code 128, addr (nil)), coredumps disabled

I've never seen ping crash with a "Segmentation Fault" before. I wonder what this parent process 1771 is. Can you please show the output of the "ps faux" shell command?
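
For anyone collecting more of these entries, here is a minimal sketch that pulls the crashed PID, the parent PID and the signal number out of each kernel line so they can be matched against "ps faux" output. The regular expression is written against the lines quoted in this thread only; `parse_crash` is a hypothetical helper name.

```python
import re

# Matches e.g. "Process 26230 (ping) has crashed (parent 1771 (exe) signal 11, code 128, ..."
CRASH_RE = re.compile(
    r"Process (?P<pid>\d+) \((?P<name>[^)]+)\) has crashed "
    r"\(parent (?P<ppid>\d+) \((?P<parent>[^)]+)\) "
    r"signal (?P<signal>\d+), code (?P<code>\d+)"
)


def parse_crash(line):
    """Return a dict describing one crash line, or None if it does not match."""
    match = CRASH_RE.search(line)
    if match is None:
        return None
    info = match.groupdict()
    # Numeric fields are easier to compare once converted from strings.
    for key in ("pid", "ppid", "signal", "code"):
        info[key] = int(info[key])
    return info
```

Feeding it the first quoted line yields pid 26230, name "ping", ppid 1771, parent "exe" and signal 11.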

Ubiquiti Employee
Posts: 1,021
Registered: ‎07-20-2015
Kudos: 949
Solutions: 71

Re: EdgeMAX EdgeRouter software version v1.10.1 has been released!

I've summarized LoadBalancing failure symptoms in 1.10.1:

1) @tubstr
> I am experiencing problems after upgrading from 1.9.7-hf4 to 1.10.1 with address groups in my load balanced with failover setup.
> Did factory reset and cleaned out the .gz backup file from all junk and uploaded it. At the present all is working fine. Something must have been altered while upgrading by the migration process.

 

2) @starckIT
> I've upgraded several edgemax routers Er8pro,
> but when configured with the wizard and doing loadbalancing for 2 WAN interfaces. The Loadbalancing doesnt work anymore.
> I have to check into it but from my experience, it's only opening the secondary WAN interface to the Inet for the local users.
> I was just at the customers office, after a reboot it started working again. So initially, when the devices was upgraded and rebooted, it didn't work. The customer did unplug second, fail-over internet connection, but it didn't switch.

 

3) @PVKK
> load balancing across my 3 WAN configuration got lost and my total bandwidth is flowing through only one WAN at a time
> If I unplug, one WAN the other WAN starts working, which is not the case before the upgrade

 

4) @chjohans
> "Upgraded" my ERLite from 1.10 to 1.10.1, it's a 1LAN 2WAN setup with load balancing.
> After the "upgrade" I had lost my internet connection and with no time to debug this I just reverted back to 1.10.

 

5) @LsiCorp
> After updating my firmware from 1.10 to 1.10.1 on an ER-X running a 1LAN 2WAN cfg (although no 2nd WAN in use) the gateway routing stops daily requiring a router reboot--whereas I had run a month on 1.10 with no reboots needed.
> This is a very simple config with no DHCP services in use.

 

6) @flamber
> I have 12 places on ER-X with a configuration that had eth0 as primary connection and eth1 as fail-over
> (most places don't even have anything connected to eth1).
> Upon upgrading to 1.10.1, I lost 3 places.
> Now I've removed the load balancing config, and haven't had issues since.
> Seems like there's something fishy with the LB in this version and it isn't failing every time, which is probably why Ubnt didn't catch it in their lab.

 

Unfortunately, I was not able to reproduce any of the above-mentioned errors. I tested different LB combinations, but they all work fine on my test router.

 

I'm stuck and cannot work out how to reproduce the LB failure. Could you please trigger the erroneous situation once again and provide the following data:

  1. Briefly explain how you expect LoadBalancing to work
  2. Briefly explain what is wrong with LoadBalancing and symptoms of the failure
  3. show configuration commands |grep "load-balance"
  4. show load-balance status
  5. show load-balance watchdog
  6. show interfaces
  7. ip route
  8. ip route show table 201
  9. ip route show table 202
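
To make gathering items 3 to 9 less tedious, the commands can be run in one pass and pasted as a single report. This is only a sketch: `collect` and `shell_run` are hypothetical helpers, and whether the "show ..." commands resolve outside the interactive EdgeOS CLI is an assumption you would need to verify (pass your own runner if they do not).

```python
import subprocess

# Items 3-9 from the list above, in the same order.
DIAG_COMMANDS = [
    'show configuration commands | grep "load-balance"',
    "show load-balance status",
    "show load-balance watchdog",
    "show interfaces",
    "ip route",
    "ip route show table 201",
    "ip route show table 202",
]


def shell_run(command):
    """Run one command line through the shell and return its combined output."""
    try:
        out = subprocess.check_output(command, shell=True, stderr=subprocess.STDOUT)
    except subprocess.CalledProcessError as exc:
        # Keep whatever the failing command printed instead of raising.
        out = exc.output
    return out.decode("utf-8", "replace")


def collect(commands, run=shell_run):
    """Return one text report with each command followed by its output."""
    sections = []
    for command in commands:
        sections.append("$ " + command + "\n" + run(command).rstrip() + "\n")
    return "\n".join(sections)
```

Running `collect(DIAG_COMMANDS)` while the failure is happening gives a report that can be posted together with answers to items 1 and 2.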

 

New Member
Posts: 2
Registered: ‎06-27-2017

Re: EdgeMAX EdgeRouter software version v1.10.1 has been released!

@UBNT-afomins

 

USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         2  0.0  0.0      0     0 ?        S    Mar25   0:00 [kthreadd]
root         3  0.0  0.0      0     0 ?        S    Mar25   0:54  \_ [ksoftirqd/0]
root         5  0.0  0.0      0     0 ?        S<   Mar25   0:00  \_ [kworker/0:0H]
root         6  0.0  0.0      0     0 ?        S    Mar25   0:05  \_ [kworker/u4:0]
root         7  0.0  0.0      0     0 ?        S    Mar25   0:10  \_ [migration/0]
root         8  0.0  0.0      0     0 ?        S    Mar25   0:00  \_ [rcu_bh]
root         9  0.0  0.0      0     0 ?        S    Mar25   2:16  \_ [rcu_sched]
root        10  0.1  0.0      0     0 ?        S    Mar25   4:37  \_ [rcuc/0]
root        11  0.0  0.0      0     0 ?        S    Mar25   0:04  \_ [watchdog/0]
root        12  0.0  0.0      0     0 ?        S    Mar25   0:04  \_ [watchdog/1]
root        13  0.0  0.0      0     0 ?        S    Mar25   3:22  \_ [rcuc/1]
root        14  0.0  0.0      0     0 ?        S    Mar25   0:02  \_ [migration/1]
root        15  0.0  0.0      0     0 ?        S    Mar25   0:11  \_ [ksoftirqd/1]
root        16  0.0  0.0      0     0 ?        S    Mar25   0:00  \_ [kworker/1:0]
root        17  0.0  0.0      0     0 ?        S<   Mar25   0:00  \_ [kworker/1:0H]
root        18  0.0  0.0      0     0 ?        S<   Mar25   0:00  \_ [khelper]
root        19  0.0  0.0      0     0 ?        S<   Mar25   0:00  \_ [netns]
root       104  0.0  0.0      0     0 ?        S<   Mar25   0:00  \_ [writeback]
root       107  0.0  0.0      0     0 ?        S<   Mar25   0:00  \_ [bioset]
root       108  0.0  0.0      0     0 ?        S<   Mar25   0:00  \_ [crypto]
root       110  0.0  0.0      0     0 ?        S<   Mar25   0:00  \_ [kblockd]
root       118  0.0  0.0      0     0 ?        S    Mar25   0:00  \_ [khubd]
root       165  0.0  0.0      0     0 ?        S    Mar25   0:00  \_ [khungtaskd]
root       166  0.0  0.0      0     0 ?        S    Mar25   0:00  \_ [kswapd0]
root       167  0.0  0.0      0     0 ?        S    Mar25   0:00  \_ [fsnotify_mark]
root       168  0.0  0.0      0     0 ?        S<   Mar25   0:00  \_ [unionfs_siod]
root       238  0.0  0.0      0     0 ?        S<   Mar25   0:00  \_ [deferwq]
root       239  0.0  0.0      0     0 ?        S    Mar25   1:01  \_ [kworker/0:1]
root       242  0.0  0.0      0     0 ?        S    Mar25   0:00  \_ [scsi_eh_0]
root       243  0.0  0.0      0     0 ?        S    Mar25   0:00  \_ [usb-storage]
root       246  0.0  0.0      0     0 ?        S    Mar25   0:00  \_ [kworker/0:2]
root       256  0.0  0.0      0     0 ?        S<   Mar25   0:02  \_ [kworker/0:1H]
root       257  0.0  0.0      0     0 ?        S    Mar25   0:00  \_ [kjournald]
root       261  0.0  0.0      0     0 ?        S<   Mar25   0:00  \_ [loop8]
root       276  0.0  0.0      0     0 ?        S<   Mar25   0:02  \_ [kworker/1:1H]
root       336  0.0  0.0      0     0 ?        S<   Mar25   0:00  \_ [octeon-ethernet]
root       375  0.0  0.0      0     0 ?        S<   Mar25   0:00  \_ [ipv6_addrconf]
root       460  0.0  0.0      0     0 ?        S    Mar25   0:00  \_ [irq/117-octeon-]
root     27782  0.0  0.0      0     0 ?        S    06:13   0:01  \_ [kworker/u4:2]
root      4149  0.0  0.0      0     0 ?        S    10:22   0:01  \_ [kworker/1:2]
root         1  0.0  0.1   2572   764 ?        Ss   Mar25   0:08 init [2]                         
root       537  0.0  0.0   1952   280 ?        Ss   Mar25   2:08 /usr/sbin/rngd
daemon     547  0.0  0.0   2700   328 ?        Ss   Mar25   0:00 /usr/sbin/atd
root       554  0.0  0.1   2972   936 ?        Ss   Mar25   0:01 /usr/sbin/cron
root       561  0.0  0.2   4752  1332 ?        Ss   Mar25   0:21 /usr/sbin/ubnt-infctld
root       569  0.0  0.0   2760   312 ?        Ss   Mar25   0:12 /usr/sbin/ubnt-daemon
root       571  0.0  0.5  18384  2616 ?        Sl   Mar25   0:32  \_ /opt/vyatta/sbin/ubnt-cfgd
root       690  0.0  0.3  18384  1928 ?        S    Mar25   0:00  |   \_ /opt/vyatta/sbin/ubnt-cfgd
root      1310  0.0  0.3  18384  1972 ?        S    Mar25   0:00  |   \_ /opt/vyatta/sbin/ubnt-cfgd
root      2098  0.0  0.4  18384  2152 ?        S    Mar25   0:00  |   \_ /opt/vyatta/sbin/ubnt-cfgd
root      2114  0.0  0.3  18384  1656 ?        S    Mar25   0:00  |   \_ /opt/vyatta/sbin/ubnt-cfgd
root       655  4.9  3.0 162636 15264 ?        Sl   Mar25 219:59  \_ /usr/sbin/ubnt-util -f
root       727  0.4  0.3  14424  1568 ?        Sl   Mar25  20:09 /usr/bin/monit -c /etc/monit/monitrc
root       768  0.0  0.8   9240  4012 ?        Ss   Mar25   0:54 /usr/sbin/imi -d
root       774  0.0  0.6   9208  3236 ?        Ss   Mar25   0:53 /usr/sbin/nsm -d -P 0
root       789  0.0  0.5   7908  2592 ?        Ss   Mar25   1:15 /usr/sbin/ribd -d -P 0
root      1037  0.0  0.3  29560  1584 ?        Sl   Mar25   0:11 /usr/sbin/rsyslogd -c5
ntp       1720  0.0  0.4   6816  2124 ?        Ss   Mar25   1:09 /usr/sbin/ntpd -p /var/run/ntpd.pid -g -u 102:101
root      1765  0.0  0.2   5088  1232 ?        S    Mar25   4:16 /usr/bin/udapi-bridge --watchdog
root     29786  0.4  0.5   5352  2772 ?        S    07:04   1:19  \_ /usr/bin/udapi-bridge --watchdog
root      6482  0.0  0.1   2116   540 ?        S    11:35   0:00      \_ ping -n -c 1 -W 1 192.168.1.112
root      2140  0.0  0.0   1960   484 ?        Ss   Mar25   0:00 /sbin/netplugd -P -p /var/run/netplugd.pid
root      2176  0.0  0.1   2536   768 ttyS0    Ss+  Mar25   0:00 /sbin/getty -L ttyS0 115200 vt100
root      2222  0.0  0.1   2748   852 ?        Ss   Mar25   0:01 /sbin/dhclient -q -nw -cf /var/run/dhclient_eth0.conf -pf /var/run/dhclient_eth0.pid -lf /var/run/dhclient_eth0.leases eth0
root      2241  0.0  0.0   2256   468 ?        S    Mar25   0:00 /usr/sbin/telnetd -p 55523 -b 127.0.0.101 -F
www-data  2266  0.0  1.2  10608  6400 ?        S    Mar25   0:51 /usr/sbin/lighttpd -f /etc/lighttpd/lighttpd.conf
www-data  2267  0.0  2.8  62264 14076 ?        Sl   Mar25   2:34  \_ python /var/www/python/gui.py
root      2270  0.0  0.1   2380   716 ?        Ss   Mar25   1:28 /usr/sbin/miniupnpd -f /opt/vyatta/etc/miniupnpd.conf -P /var/run/miniupnpd.pid
root      2346  0.0  0.1   8032   984 ?        Ss   Mar25   0:00 /usr/sbin/sshd -p 22 -o Protocol=2
root      6439  1.6  0.6  11844  3328 ?        Ss   11:34   0:00  \_ sshd: king_of_hearts [priv]          
1000      6453  0.0  0.3  11844  1808 ?        S    11:34   0:00      \_ sshd: king_of_hearts@pts/0           
1000      6454  0.4  0.4   4192  2060 pts/0    Ss+  11:34   0:00          \_ -vbash
1000      6483  0.0  0.2   3244  1052 pts/0    R+   11:35   0:00              \_ ps faux
dnsmasq  30035  0.0  0.2   5304  1168 ?        S    07:06   0:00 /usr/sbin/dnsmasq -x /run/dnsmasq/dnsmasq.pid -u dnsmasq -7 /etc/dnsmasq.d,.dpkg-dist,.dpkg-old,.dpkg-new --local-service

 

New Member
Posts: 2
Registered: ‎11-03-2016

Re: EdgeMAX EdgeRouter software version v1.10.1 has been released!

I've updated my ER-4 router, but now I have no internet connection and I can't log in to the router with its IP address anymore. I have no cable for the console. A hard reset didn't help at all.

Ubiquiti Employee
Posts: 1,021
Registered: ‎07-20-2015
Kudos: 949
Solutions: 71

Re: EdgeMAX EdgeRouter software version v1.10.1 has been released!

@Boesh

> I've updated my ER-4 router, but now I have no internet connection and I can't log in to the router with its IP address anymore.

> I have no cable for the console.

You can try 2 options:

  1. Do a factory reset and connect to the router via eth0 at 192.168.1.1
  2. Connect to the router via ssh-recovery -> https://help.ubnt.com/hc/en-us/articles/360002231073-EdgeRouter-How-to-Use-SSH-Recovery-

 

Questions:

  1. What firmware was installed on the ER-4 before the upgrade?
  2. Are the LEDs blinking when you plug in an Ethernet cable?

Senior Member
Posts: 3,657
Registered: ‎05-15-2014
Kudos: 1281
Solutions: 256

Re: EdgeMAX EdgeRouter software version v1.10.1 has been released!

@UBNT-afomins For what it's worth, load-balancing fail-over in 1.10.1 is working fine on all our LB-configured routers. We run fail-over mode only, however; we have no active-active setups.

New Member
Posts: 2
Registered: ‎11-03-2016

Re: EdgeMAX EdgeRouter software version v1.10.1 has been released!

Well, after the 5th reset it worked.

The previous firmware was 1.9.8, and the LEDs were on for every port that had a cable plugged in.

Thanks for the help anyway.

New Member
Posts: 10
Registered: ‎09-06-2016

Re: EdgeMAX EdgeRouter software version v1.10.1 has been released!

After a few days, squid3 seems to crash too often.

Today it took only 7 hours before crashing:

Mar 29 16:51:57	ubnt squid[3433]: Squid Parent: (squid-1) process 6260 started
Mar 29 16:51:54	ubnt squid[3433]: Squid Parent: (squid-1) process 5509 exited due to signal 11 with status 0
Mar 29 16:51:54	ubnt kernel: Process 5509 (squid3) has crashed (parent 3433 (squid3) signal 11, code 196609, addr 00000000072f244d), coredumps disabled
Mar 29 16:33:30	ubnt squid[3433]: Squid Parent: (squid-1) process 5509 started
Mar 29 16:33:27	ubnt squid[3433]: Squid Parent: (squid-1) process 5444 exited due to signal 11 with status 0
Mar 29 16:33:27	ubnt kernel: Process 5444 (squid3) has crashed (parent 3433 (squid3) signal 11, code 196609, addr 0000000015e6544d), coredumps disabled
Mar 29 16:32:33	ubnt squid[3433]: Squid Parent: (squid-1) process 5444 started
Mar 29 16:32:30	ubnt squid[3433]: Squid Parent: (squid-1) process 5388 exited due to signal 10 with status 0
Mar 29 16:32:30	ubnt kernel: Process 5388 (squid3) has crashed (parent 3433 (squid3) signal 10, code 128, addr (nil)), coredumps disabled
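
The excerpt above is listed newest-first, which makes it hard to see how long each squid worker survived. Here is a rough sketch that pairs the "started" and "exited" lines per worker PID. The pattern is written against this excerpt only, `worker_lifetimes` is a hypothetical name, and the year is an assumed parameter because syslog lines omit it.

```python
import re
from datetime import datetime

EVENT_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+\S+ squid\[\d+\]: "
    r"Squid Parent: \(squid-1\) process (?P<pid>\d+) "
    r"(?P<event>started|exited due to signal (?P<signal>\d+))"
)

SAMPLE_LOG = """\
Mar 29 16:51:57 ubnt squid[3433]: Squid Parent: (squid-1) process 6260 started
Mar 29 16:51:54 ubnt squid[3433]: Squid Parent: (squid-1) process 5509 exited due to signal 11 with status 0
Mar 29 16:33:30 ubnt squid[3433]: Squid Parent: (squid-1) process 5509 started
Mar 29 16:33:27 ubnt squid[3433]: Squid Parent: (squid-1) process 5444 exited due to signal 11 with status 0
Mar 29 16:32:33 ubnt squid[3433]: Squid Parent: (squid-1) process 5444 started
Mar 29 16:32:30 ubnt squid[3433]: Squid Parent: (squid-1) process 5388 exited due to signal 10 with status 0
"""


def worker_lifetimes(log_text, year=2018):
    """Map worker PID -> (lifetime, exit signal) for workers with both events."""
    started, exited = {}, {}
    for line in log_text.splitlines():
        match = EVENT_RE.match(line.strip())
        if match is None:
            continue  # kernel lines and anything else are ignored
        stamp = "%d %s" % (year, " ".join(match.group("ts").split()))
        when = datetime.strptime(stamp, "%Y %b %d %H:%M:%S")
        pid = int(match.group("pid"))
        if match.group("event") == "started":
            started[pid] = when
        else:
            exited[pid] = (when, int(match.group("signal")))
    # Only workers whose start and exit both appear in the excerpt are reported.
    return {
        pid: (when - started[pid], sig)
        for pid, (when, sig) in exited.items()
        if pid in started
    }
```

On the sample, worker 5509 lived 0:18:24 and worker 5444 lived 0:00:54, both ending on signal 11. Workers 5388 and 6260 are skipped because only one of their two events is in the excerpt.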

Note that after that, I was unable to SSH in or log in to the web GUI.

New Member
Posts: 14
Registered: ‎02-21-2018

Re: EdgeMAX EdgeRouter software version v1.10.1 has been released!

Hi @UBNT-afomins

 

I am also having issues with LB in 1.10.1: my connection on eth1 fails several times per day (for only 30s at a time), and sometimes it does not come back. I can restore the connection with the following command:

sudo kill -kill `pidof ubnt-util`

I did a fresh installation after my router got bricked, so there is no previous version or restored backup.

 

Here is the state of the router during the crash. Please note that eth1 was considered DOWN for more than an hour:

admin@192.168.1.1's password:
Linux ubnt 3.10.107-UBNT #1 SMP Mon Mar 5 18:53:35 UTC 2018 mips
Welcome to EdgeOS
Last login: Thu Mar 29 10:27:45 2018 from xxxdell.local
admin@ubnt:~$ show load-balance watchdog
Group G
  eth0
  status: Running
  pings: 3961
  fails: 1
  run fails: 0/3
  route drops: 0
  ping gateway: 8.8.8.8 - REACHABLE

  eth1
  status: Waiting on recovery (0/3)
  pings: 1058
  fails: 3
  run fails: 3/3
  route drops: 2
  ping gateway: 200.160.6.220 - DOWN
  last route drop   : Thu Mar 29 16:22:27 2018
  last route recover: Thu Mar 29 14:53:20 2018

admin@ubnt:~$ sudo kill -kill `pidof ubnt-util`
  1. show configuration commands |grep "load-balance"
    admin@ubnt:~$ show configuration commands |grep "load-balance"
    set load-balance group G interface eth0 route-test initial-delay 15
    set load-balance group G interface eth0 route-test interval 5
    set load-balance group G interface eth0 route-test type ping target 8.8.8.8
    set load-balance group G interface eth1 route-test initial-delay 15
    set load-balance group G interface eth1 route-test interval 5
    set load-balance group G interface eth1 route-test type ping target 200.160.6.220
    set load-balance group G lb-local enable
    set load-balance group G lb-local-metric-change disable
  2. show load-balance status
    Group G
      interface   : eth0
      carrier     : up
      status      : active
      gateway     : 192.168.0.1
      route table : 201
      weight      : 50%
      flows
          WAN Out : 454
          WAN In  : 0
        Local Out : 1
    
      interface   : eth1
      carrier     : up
      status      : active
      gateway     : 192.168.15.1
      route table : 202
      weight      : 50%
      flows
          WAN Out : 450
          WAN In  : 0
        Local Out : 3
  3. show load-balance watchdog
    admin@ubnt:~$ show load-balance watchdog
    Group G
      eth0
      status: Running
      pings: 176
      fails: 0
      run fails: 0/3
      route drops: 0
      ping gateway: 8.8.8.8 - REACHABLE
    
      eth1
      status: Running
      pings: 176
      fails: 0
      run fails: 0/3
      route drops: 0
      ping gateway: 200.160.6.220 - REACHABLE
  4. show interfaces
    admin@ubnt:~$ show interfaces
    Codes: S - State, L - Link, u - Up, D - Down, A - Admin Down
    Interface    IP Address                        S/L  Description
    ---------    ----------                        ---  -----------
    eth0         192.168.0.142/24                  u/u  WAN
    eth1         192.168.15.2/24                   u/u  WAN 2
    eth2         -                                 u/u
    eth3         -                                 u/D
    eth4         -                                 u/D
    lo           127.0.0.1/8                       u/u
                 ::1/128
    switch0      192.168.1.1/24                    u/u  Local
  5. ip route
    admin@ubnt:~$ ip route
    default  proto zebra
            nexthop via 192.168.15.1  dev eth1 weight 1
            nexthop via 192.168.0.1  dev eth0 weight 1
    192.168.0.0/24 dev eth0  proto kernel  scope link  src 192.168.0.142
    192.168.1.0/24 dev switch0  proto kernel  scope link  src 192.168.1.1
    192.168.15.0/24 dev eth1  proto kernel  scope link  src 192.168.15.2
  6. ip route show table 201
    admin@ubnt:~$ ip route show table 201
    default via 192.168.0.1 dev eth0
    blackhole default  metric 256
    127.0.0.0/8 dev lo  scope link
    192.168.0.0/24 dev eth0  scope link
    192.168.1.0/24 dev switch0  scope link
    192.168.15.0/24 dev eth1  scope link
  7. ip route show table 202
    admin@ubnt:~$ ip route show table 202
    default via 192.168.15.1 dev eth1
    blackhole default  metric 256
    127.0.0.0/8 dev lo  scope link
    192.168.0.0/24 dev eth0  scope link
    192.168.1.0/24 dev switch0  scope link
    192.168.15.0/24 dev eth1  scope link
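
In the output above, tables 201 and 202 each list a "default via ..." route ahead of the "blackhole default metric 256" entry. Below is a small sketch for checking saved "ip route show table N" output for a usable default route. `default_gateway` is a hypothetical helper, and treating a missing "default via" line as a sign that the interface has been taken out of rotation is an interpretation, not something confirmed in this thread.

```python
def default_gateway(table_output):
    """Return the gateway of the first 'default via <gw>' route, or None.

    Only the single-gateway form is handled; multipath entries that use
    'nexthop via' lines (as in the main table above) are not parsed.
    """
    for line in table_output.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[0] == "default" and fields[1] == "via":
            return fields[2]
    return None
```

For table 201 above this returns "192.168.0.1". A table holding only the blackhole default would return None.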

Here is part of /var/log/messages just in case

Mar 29 16:22:14 ubnt ntpd[6535]: ntpd exiting on signal 15
Mar 29 16:22:16 ubnt ntpd[23665]: ntpd 4.2.6p2@1.2194-o Mon Mar  5 17:31:26 UTC 2018 (1)
Mar 29 16:22:16 ubnt ntpd[23666]: proto: precision = 40.443 usec
Mar 29 16:22:17 ubnt ntpd[23666]: ntpd exiting on signal 15
Mar 29 16:22:19 ubnt ntpd[23832]: ntpd 4.2.6p2@1.2194-o Mon Mar  5 17:31:26 UTC 2018 (1)
Mar 29 16:22:19 ubnt ntpd[23833]: proto: precision = 40.452 usec
Mar 29 16:22:27 ubnt wlb: wlb-G-eth1 wlb-G-eth1 reachability failed, failover
Mar 29 16:22:27 ubnt wlb: group G, interface eth1 going Inactive
Mar 29 17:15:39 ubnt wlb: wlb-G-eth0 Starting wlb watchdog on wlb-G-eth0 after 15s delay
Mar 29 17:15:39 ubnt wlb: wlb-G-eth1 Starting wlb watchdog on wlb-G-eth1 after 15s delay
Mar 29 17:15:39 ubnt wlb: group G, interface eth1 going Active
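
To measure from the log how long an interface stayed out of the group, the "going Inactive" and "going Active" lines can be paired up. This is a minimal sketch written against the wlb lines in this excerpt only; `inactive_periods` is a hypothetical name, and the year is an assumed parameter since syslog lines omit it.

```python
import re
from datetime import datetime

WLB_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+\S+ wlb: "
    r"group (?P<group>[^,]+), interface (?P<iface>\S+) "
    r"going (?P<state>Active|Inactive)$"
)

SAMPLE_LOG = """\
Mar 29 16:22:27 ubnt wlb: wlb-G-eth1 wlb-G-eth1 reachability failed, failover
Mar 29 16:22:27 ubnt wlb: group G, interface eth1 going Inactive
Mar 29 17:15:39 ubnt wlb: wlb-G-eth0 Starting wlb watchdog on wlb-G-eth0 after 15s delay
Mar 29 17:15:39 ubnt wlb: wlb-G-eth1 Starting wlb watchdog on wlb-G-eth1 after 15s delay
Mar 29 17:15:39 ubnt wlb: group G, interface eth1 going Active
"""


def inactive_periods(log_text, year=2018):
    """Return (group, interface, duration) for each Inactive -> Active pair."""
    down_since = {}
    periods = []
    for line in log_text.splitlines():
        match = WLB_RE.match(line.strip())
        if match is None:
            continue
        stamp = "%d %s" % (year, " ".join(match.group("ts").split()))
        when = datetime.strptime(stamp, "%Y %b %d %H:%M:%S")
        key = (match.group("group"), match.group("iface"))
        if match.group("state") == "Inactive":
            # Remember only the first drop until the interface recovers.
            down_since.setdefault(key, when)
        elif key in down_since:
            periods.append(key + (when - down_since.pop(key),))
    return periods
```

On the sample this reports one period for group G, interface eth1, lasting 0:53:12.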

 

Hope it helps

 

Member
Posts: 218
Registered: ‎02-12-2013
Kudos: 69
Solutions: 18

Re: EdgeMAX EdgeRouter software version v1.10.1 has been released!

Hi @UBNT-afomins

 

I think I've found the trigger for making the load balancer fail.

 

My test setup has a DHCP WAN on eth0 as primary and nothing plugged into eth1, which is still configured as fail-over (since I don't have a second connection here).

 

1. Power on the ISP router and the ER-X at the same time (this is just to simulate a long power outage)

2. Wait for the ER-X to boot completely (about a minute after "ubnt-service-gui: starting the GUI service.")

3. Unplug the eth0 cable, wait for 3 seconds, then plug it back in. The same thing can be triggered by the ISP router, since changing its configuration somehow restarts its interfaces (some of my ISPs use Technicolor cable modems, lucky me)

 

However, if in step 3 I leave the cable unplugged for longer (say, 1 minute), the connection comes back online.

Seems like it's a race condition that kills it.

I just ran a couple of tests and can reproduce it every time.

$ show configuration commands | grep "load-balance"
set load-balance group G interface eth0
set load-balance group G interface eth1 failover-only
set load-balance group G lb-local enable
set load-balance group G lb-local-metric-change disable
set load-balance group G transition-script /config/scripts/conntrack_flush.sh
$ show load-balance status
Group G
  interface   : eth0
  carrier     : up
  status      : inactive
  gateway     : 192.168.1.254
  route table : 201
  weight      : 0%
  flows
      WAN Out : 0
      WAN In  : 0
    Local Out : 21

  interface   : eth1
  carrier     : down
  status      : failover
  gateway     : unknown
  route table : 202
  weight      : 0%
  flows
      WAN Out : 57
      WAN In  : 0
    Local Out : 17
$ show load-balance watchdog
Group G
  eth0
  status: Waiting on recovery (0/3)
  pings: 29
  fails: 3
  run fails: 3/3
  route drops: 1
  ping gateway: ping.ubnt.com - DOWN
  last route drop   : Thu Mar 29 18:41:58 2018

  eth1
  status: Waiting on recovery (0/3)
  failover-only mode
  pings: 1
  fails: 1
  run fails: 3/3
  route drops: 1
  ping gateway: ping.ubnt.com - DOWN
  last route drop   : Thu Mar 29 18:37:14 2018
$ show interfaces
Codes: S - State, L - Link, u - Up, D - Down, A - Admin Down
Interface    IP Address                        S/L  Description                 
---------    ----------                        ---  -----------                 
eth0         192.168.1.62/24                   u/u  WAN                         
eth1         -                                 u/D  WAN 2                       
eth2         -                                 u/D                              
eth3         -                                 u/u                              
eth4         -                                 u/D                              
ifb0         -                                 u/u                              
ifb1         -                                 u/u                              
lo           127.0.0.1/8                       u/u                              
             ::1/128                          
switch0      192.168.5.1/24                    u/u  Local
$ ip route
default via 192.168.1.254 dev eth0  proto zebra 
192.168.1.0/24 dev eth0  proto kernel  scope link  src 192.168.1.62 
192.168.5.0/24 dev switch0  proto kernel  scope link  src 192.168.5.1 
$ ip route show table 201
blackhole default  metric 256 
127.0.0.0/8 dev lo  scope link 
192.168.1.0/24 dev eth0  scope link 
192.168.5.0/24 dev switch0  scope link 
$ ip route show table 202
blackhole default  metric 256 
127.0.0.0/8 dev lo  scope link 
192.168.1.0/24 dev eth0  scope link 
192.168.5.0/24 dev switch0  scope link 
$ cat /var/log/messages
Mar 29 17:16:12 erx-fw-cXX rsyslogd: set SCM_CREDENTIALS failed on '/dev/log': Protocol not available
Mar 29 17:16:12 erx-fw-cXX kernel: Linux version 3.10.107-UBNT (root@e7030944ec4e) (gcc version 4.6.3 (Buildroot 2012.11.1) ) #1 SMP Mon Mar 5 18:53:35 UTC 2018
Mar 29 17:16:12 erx-fw-cXX kernel: 
Mar 29 17:16:12 erx-fw-cXX kernel: The CPU feqenuce set to 880 MHz
Mar 29 17:16:12 erx-fw-cXX kernel: GCMP present
Mar 29 17:16:12 erx-fw-cXX kernel: Zone ranges:
Mar 29 17:16:12 erx-fw-cXX kernel:  Normal   [mem 0x00000000-0x0fffffff]
Mar 29 17:16:12 erx-fw-cXX kernel:  HighMem  empty
Mar 29 17:16:12 erx-fw-cXX kernel: Movable zone start for each node
Mar 29 17:16:12 erx-fw-cXX kernel: Early memory node ranges
Mar 29 17:16:12 erx-fw-cXX kernel:  node   0: [mem 0x00000000-0x0fffffff]
Mar 29 17:16:12 erx-fw-cXX kernel: Primary instruction cache 32kB, 4-way, VIPT, linesize 32 bytes.
Mar 29 17:16:12 erx-fw-cXX kernel: Primary data cache 32kB, 4-way, PIPT, no aliases, linesize 32 bytes
Mar 29 17:16:12 erx-fw-cXX kernel: MIPS secondary cache 256kB, 8-way, linesize 32 bytes.
Mar 29 17:16:12 erx-fw-cXX kernel: Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 65024
Mar 29 17:16:12 erx-fw-cXX kernel: Kernel command line: console=ttyS1,57600n8 ubi.mtd=7 root=ubi0_0 rootfstype=ubifs rootsqimg=squashfs.img rootsqwdir=w rw
Mar 29 17:16:12 erx-fw-cXX kernel: launch: starting cpu1
Mar 29 17:16:12 erx-fw-cXX kernel: launch: cpu1 gone!
Mar 29 17:16:12 erx-fw-cXX kernel: Primary instruction cache 32kB, 4-way, VIPT, linesize 32 bytes.
Mar 29 17:16:12 erx-fw-cXX kernel: Primary data cache 32kB, 4-way, PIPT, no aliases, linesize 32 bytes
Mar 29 17:16:12 erx-fw-cXX kernel: MIPS secondary cache 256kB, 8-way, linesize 32 bytes.
Mar 29 17:16:12 erx-fw-cXX kernel: launch: starting cpu2
Mar 29 17:16:12 erx-fw-cXX kernel: launch: cpu2 gone!
Mar 29 17:16:12 erx-fw-cXX kernel: Primary instruction cache 32kB, 4-way, VIPT, linesize 32 bytes.
Mar 29 17:16:12 erx-fw-cXX kernel: Primary data cache 32kB, 4-way, PIPT, no aliases, linesize 32 bytes
Mar 29 17:16:12 erx-fw-cXX kernel: MIPS secondary cache 256kB, 8-way, linesize 32 bytes.
Mar 29 17:16:12 erx-fw-cXX kernel: launch: starting cpu3
Mar 29 17:16:12 erx-fw-cXX kernel: launch: cpu3 gone!
Mar 29 17:16:12 erx-fw-cXX kernel: Primary instruction cache 32kB, 4-way, VIPT, linesize 32 bytes.
Mar 29 17:16:12 erx-fw-cXX kernel: Primary data cache 32kB, 4-way, PIPT, no aliases, linesize 32 bytes
Mar 29 17:16:12 erx-fw-cXX kernel: MIPS secondary cache 256kB, 8-way, linesize 32 bytes.
Mar 29 17:16:12 erx-fw-cXX kernel: 4 CPUs re-calibrate udelay(lpj = 1167360)
Mar 29 17:16:12 erx-fw-cXX kernel: Ralink gpio driver initialized
Mar 29 17:16:12 erx-fw-cXX kernel: i2cdrv_major = 218
Mar 29 17:16:12 erx-fw-cXX kernel: flash manufacture id: 1c, device id 70 15
Mar 29 17:16:12 erx-fw-cXX kernel: EN25QH16(1c 70151c70) (2048 Kbytes)
Mar 29 17:16:12 erx-fw-cXX kernel: mtd .name = raspi, .size = 0x00200000 (2M) .erasesize = 0x00010000 (64K) .numeraseregions = 0
Mar 29 17:16:12 erx-fw-cXX kernel: Creating 1 MTD partitions on "raspi":
Mar 29 17:16:12 erx-fw-cXX kernel: 0x000000000000-0x000000080000 : "SPI_FLASH"
Mar 29 17:16:12 erx-fw-cXX kernel: MediaTek Nand driver init, version v2.1 Fix AHB virt2phys error
Mar 29 17:16:12 erx-fw-cXX kernel: Enable NFI Clock
Mar 29 17:16:12 erx-fw-cXX kernel: # MTK NAND # : Use HW ECC
Mar 29 17:16:12 erx-fw-cXX kernel: NAND ID [01 DA 90 95 46, 00909546]
Mar 29 17:16:12 erx-fw-cXX kernel: Support this Device in MTK table! 1da 
Mar 29 17:16:12 erx-fw-cXX kernel: [NAND]select ecc bit:12, sparesize :112 spare_per_sector=28
Mar 29 17:16:12 erx-fw-cXX kernel: Signature matched and data read!
Mar 29 17:16:12 erx-fw-cXX kernel: load_fact_bbt success 2047
Mar 29 17:16:12 erx-fw-cXX kernel: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
Mar 29 17:16:12 erx-fw-cXX kernel: last message repeated 15 times
Mar 29 17:16:12 erx-fw-cXX kernel: Creating 7 MTD partitions on "MT7621-NAND":
Mar 29 17:16:12 erx-fw-cXX kernel: 0x000000000000-0x00000ff80000 : "ALL"
Mar 29 17:16:12 erx-fw-cXX kernel: 0x000000000000-0x000000080000 : "Bootloader"
Mar 29 17:16:12 erx-fw-cXX kernel: 0x000000080000-0x0000000e0000 : "Config"
Mar 29 17:16:12 erx-fw-cXX kernel: 0x0000000e0000-0x000000140000 : "eeprom"
Mar 29 17:16:12 erx-fw-cXX kernel: 0x000000140000-0x000000440000 : "Kernel"
Mar 29 17:16:12 erx-fw-cXX kernel: 0x000000440000-0x000000740000 : "Kernel2"
Mar 29 17:16:12 erx-fw-cXX kernel: 0x000000740000-0x00000ff00000 : "RootFS"
Mar 29 17:16:12 erx-fw-cXX kernel: [mtk_nand] probe successfully!
Mar 29 17:16:12 erx-fw-cXX kernel: UBNT BD mac 44d9e7f5acbd kidx 0 mrev 18 serial 44D9E7F5ACBD type e50
Mar 29 17:16:12 erx-fw-cXX kernel: rdm_major = 253
Mar 29 17:16:12 erx-fw-cXX kernel: GMAC1_MAC_ADRH -- : 0x000044d9
Mar 29 17:16:12 erx-fw-cXX kernel: GMAC1_MAC_ADRL -- : 0xe7f5acbd
Mar 29 17:16:12 erx-fw-cXX kernel: Ralink APSoC Ethernet Driver Initilization. v3.1  512 rx/tx descriptors allocated, mtu = 1500!
Mar 29 17:16:12 erx-fw-cXX kernel: GMAC1_MAC_ADRH -- : 0x000044d9
Mar 29 17:16:12 erx-fw-cXX kernel: GMAC1_MAC_ADRL -- : 0xe7f5acbd
Mar 29 17:16:12 erx-fw-cXX kernel: PROC INIT OK!
Mar 29 17:16:12 erx-fw-cXX kernel: Ralink I2C Init
Mar 29 17:16:12 erx-fw-cXX kernel: UBI: attaching mtd7 to ubi0
Mar 29 17:16:12 erx-fw-cXX kernel: UBI: scanning is finished
Mar 29 17:16:12 erx-fw-cXX kernel: UBI: attached mtd7 (name "RootFS", size 247 MiB) to ubi0
Mar 29 17:16:12 erx-fw-cXX kernel: UBI: PEB size: 131072 bytes (128 KiB), LEB size: 126976 bytes
Mar 29 17:16:12 erx-fw-cXX kernel: UBI: min./max. I/O unit sizes: 2048/2048, sub-page size 2048
Mar 29 17:16:12 erx-fw-cXX kernel: UBI: VID header offset: 2048 (aligned 2048), data offset: 4096
Mar 29 17:16:12 erx-fw-cXX kernel: UBI: good PEBs: 1982, bad PEBs: 0, corrupted PEBs: 0
Mar 29 17:16:12 erx-fw-cXX kernel: UBI: user volume: 1, internal volumes: 1, max. volumes count: 128
Mar 29 17:16:12 erx-fw-cXX kernel: UBI: max/mean erase counter: 29/7, WL threshold: 4096, image sequence number: 1298117041
Mar 29 17:16:12 erx-fw-cXX kernel: UBI: available PEBs: 0, total reserved PEBs: 1982, PEBs reserved for bad PEB handling: 40
Mar 29 17:16:12 erx-fw-cXX kernel: UBI: background thread "ubi_bgt0d" started, PID 54
Mar 29 17:16:12 erx-fw-cXX kernel: UBIFS: background thread "ubifs_bgt0_0" started, PID 55
Mar 29 17:16:12 erx-fw-cXX kernel: UBIFS: recovery needed
Mar 29 17:16:12 erx-fw-cXX kernel: UBIFS: recovery completed
Mar 29 17:16:12 erx-fw-cXX kernel: UBIFS: mounted UBI device 0, volume 0, name "troot"
Mar 29 17:16:12 erx-fw-cXX kernel: UBIFS: LEB size: 126976 bytes (124 KiB), min./max. I/O unit sizes: 2048 bytes/2048 bytes
Mar 29 17:16:12 erx-fw-cXX kernel: UBIFS: FS size: 244428800 bytes (233 MiB, 1925 LEBs), journal size 12189696 bytes (11 MiB, 96 LEBs)
Mar 29 17:16:12 erx-fw-cXX kernel: UBIFS: reserved for root: 4952683 bytes (4836 KiB)
Mar 29 17:16:12 erx-fw-cXX kernel: UBIFS: media format: w4/r0 (latest is w4/r0), UUID C15A1865-84C1-4964-8693-FCE7129760D1, small LPT model
Mar 29 17:16:12 erx-fw-cXX kernel: Algorithmics/MIPS FPU Emulator v1.5
Mar 29 17:16:12 erx-fw-cXX kernel: ubnt_platform: module license 'Proprietary' taints kernel.
Mar 29 17:16:12 erx-fw-cXX kernel: Disabling lock debugging due to kernel taint
Mar 29 17:16:12 erx-fw-cXX kernel: Registering char device flash0 (200) succeeds
Mar 29 17:16:12 erx-fw-cXX kernel: Raeth v3.1 (Tasklet)
Mar 29 17:16:12 erx-fw-cXX kernel: 
Mar 29 17:16:12 erx-fw-cXX kernel: phy_tx_ring = 0x0bf84000, tx_ring = 0xabf84000
Mar 29 17:16:12 erx-fw-cXX kernel: 
Mar 29 17:16:12 erx-fw-cXX kernel: phy_rx_ring0 = 0x0bf86000, rx_ring0 = 0xabf86000
Mar 29 17:16:12 erx-fw-cXX kernel: change HW-TRAP to 0x17c8f
Mar 29 17:16:12 erx-fw-cXX kernel: GMAC1_MAC_ADRH -- : 0x000044d9
Mar 29 17:16:12 erx-fw-cXX kernel: GMAC1_MAC_ADRL -- : 0xe7f5acbd
Mar 29 17:16:12 erx-fw-cXX kernel: eth0: ===> VirtualIF_open
Mar 29 17:16:12 erx-fw-cXX kernel: eth1: ===> VirtualIF_open
Mar 29 17:16:12 erx-fw-cXX kernel: eth2: ===> VirtualIF_open
Mar 29 17:16:12 erx-fw-cXX kernel: eth3: ===> VirtualIF_open
Mar 29 17:16:12 erx-fw-cXX kernel: eth4: ===> VirtualIF_open
Mar 29 17:16:12 erx-fw-cXX kernel: CDMA_CSG_CFG = 81000000
Mar 29 17:16:12 erx-fw-cXX kernel: GDMA1_FWD_CFG = 21710000
Mar 29 17:16:12 erx-fw-cXX kernel: ESW: Link Status Changed - Port0 Link UP
Mar 29 17:16:12 erx-fw-cXX ssh-recovery[476]: starting...
Mar 29 17:16:13 erx-fw-cXX ssh-recovery[476]: if=(all) port=(60257) terminate-timeout=(60)
Mar 29 17:16:13 erx-fw-cXX ssh-recovery[476]: enabling link on interfaces...
Mar 29 17:16:13 erx-fw-cXX kernel: eth0: ===> VirtualIF_open
Mar 29 17:16:13 erx-fw-cXX ssh-recovery[476]: eth0 :: mac=(44:d9:e7:f5:ac:bd)
Mar 29 17:16:13 erx-fw-cXX kernel: eth1: ===> VirtualIF_open
Mar 29 17:16:13 erx-fw-cXX ssh-recovery[476]: eth1 :: mac=(44:d9:e7:f5:ac:be)
Mar 29 17:16:13 erx-fw-cXX kernel: eth2: ===> VirtualIF_open
Mar 29 17:16:13 erx-fw-cXX ssh-recovery[476]: eth2 :: mac=(44:d9:e7:f5:ac:bf)
Mar 29 17:16:13 erx-fw-cXX kernel: eth3: ===> VirtualIF_open
Mar 29 17:16:13 erx-fw-cXX ssh-recovery[476]: eth3 :: mac=(44:d9:e7:f5:ac:c0)
Mar 29 17:16:13 erx-fw-cXX kernel: eth4: ===> VirtualIF_open
Mar 29 17:16:13 erx-fw-cXX ssh-recovery[476]: eth4 :: mac=(44:d9:e7:f5:ac:c1)
Mar 29 17:16:13 erx-fw-cXX ssh-recovery[476]: switch0 :: mac=(44:d9:e7:f5:ac:c2)
Mar 29 17:16:14 erx-fw-cXX kernel: ip_set: protocol 6
Mar 29 17:16:15 erx-fw-cXX ssh-recovery[476]: service started :: pid=(611)
Mar 29 17:16:15 erx-fw-cXX kernel: Type=Linux
Mar 29 17:16:15 erx-fw-cXX NSM[678]:  NSM-6: Initializing memdbg: ptr=0x693634 history-size=1024 memdbg-size=143552
Mar 29 17:16:16 erx-fw-cXX NSM[687]:  NSM-6: 10 MB
Mar 29 17:16:16 erx-fw-cXX NSM[687]:  NSM-6: 1000 MB 
Mar 29 17:16:16 erx-fw-cXX NSM[687]:  NSM-6: 10 MB
Mar 29 17:16:16  NSM[687]: last message repeated 3 times
Mar 29 17:16:16 erx-fw-cXX RIB[692]:  RIB-6: Initializing memdbg: ptr=0x5874c4 history-size=1024 memdbg-size=143552
Mar 29 17:16:16 erx-fw-cXX NSM[687]:  NSM-4: Could not create VRF table with identifier 1 in the MPLS Forwarder
Mar 29 17:16:16 erx-fw-cXX RIB[695]:  RIB-6: RIBd (1.2.0) starts
Mar 29 17:16:21 erx-fw-cXX IMI[675]:  IMI-6: imi_server_send_config called (PM 1)
Mar 29 17:16:21 erx-fw-cXX IMI[675]:  IMI-6: imi_server_send_config called (PM 42)
Mar 29 17:16:24 erx-fw-cXX rl-system.init: Checking/creating SSH host keys.
Mar 29 17:16:27 erx-fw-cXX rsyslogd: set SCM_CREDENTIALS failed on '/dev/log': Protocol not available
Mar 29 17:16:28 erx-fw-cXX kernel: ESW: Link Status Changed - Port0 Link Down
Mar 29 17:16:30 erx-fw-cXX ntpd[956]: ntpd 4.2.6p2@1.2194-o Mon Mar  5 17:31:26 UTC 2018 (1)
Mar 29 17:16:30 erx-fw-cXX ntpd[957]: proto: precision = 29.409 usec
Mar 29 17:16:31 erx-fw-cXX rsyslogd: set SCM_CREDENTIALS failed on '/dev/log': Protocol not available
Mar 29 17:16:31 erx-fw-cXX kernel: ESW: Link Status Changed - Port0 Link UP
Mar 29 17:16:32 erx-fw-cXX ntpd_intres[963]: host name not found: 0.ubnt.pool.ntp.org
Mar 29 17:16:32 erx-fw-cXX ntpd_intres[963]: host name not found: 1.ubnt.pool.ntp.org
Mar 29 17:16:32 erx-fw-cXX ntpd_intres[963]: host name not found: 2.ubnt.pool.ntp.org
Mar 29 17:16:32 erx-fw-cXX ntpd_intres[963]: host name not found: 3.ubnt.pool.ntp.org
Mar 29 17:16:39 erx-fw-cXX NSM[687]:  NSM-6: Operation not supported 
Mar 29 17:16:40 erx-fw-cXX NSM[687]:  NSM-6: Operation not supported 
Mar 29 17:16:45 erx-fw-cXX ntpd_intres[963]: host name not found: 0.ubnt.pool.ntp.org
Mar 29 17:16:45 erx-fw-cXX ntpd_intres[963]: host name not found: 1.ubnt.pool.ntp.org
Mar 29 17:16:45 erx-fw-cXX ntpd_intres[963]: host name not found: 2.ubnt.pool.ntp.org
Mar 29 17:16:45 erx-fw-cXX ntpd_intres[963]: host name not found: 3.ubnt.pool.ntp.org
Mar 29 17:16:51 erx-fw-cXX ntpd_intres[963]: host name not found: 0.ubnt.pool.ntp.org
Mar 29 17:16:51 erx-fw-cXX ntpd_intres[963]: host name not found: 1.ubnt.pool.ntp.org
Mar 29 17:16:51 erx-fw-cXX ntpd_intres[963]: host name not found: 2.ubnt.pool.ntp.org
Mar 29 17:16:51 erx-fw-cXX ntpd_intres[963]: host name not found: 3.ubnt.pool.ntp.org
Mar 29 17:16:53 erx-fw-cXX ntpd[957]: ntpd exiting on signal 15
Mar 29 17:16:55 erx-fw-cXX ntpd[1833]: ntpd 4.2.6p2@1.2194-o Mon Mar  5 17:31:26 UTC 2018 (1)
Mar 29 17:16:55 erx-fw-cXX ntpd[1834]: proto: precision = 42.064 usec
Mar 29 17:16:56 erx-fw-cXX ubnt-service-ssh: waiting for netplugd to be started...
Mar 29 17:16:57 erx-fw-cXX ntpd_intres[1836]: host name not found: 0.ubnt.pool.ntp.org
Mar 29 17:16:57 erx-fw-cXX ntpd_intres[1836]: host name not found: 1.ubnt.pool.ntp.org
Mar 29 17:16:57 erx-fw-cXX ntpd_intres[1836]: host name not found: 2.ubnt.pool.ntp.org
Mar 29 17:16:57 erx-fw-cXX ntpd_intres[1836]: host name not found: 3.ubnt.pool.ntp.org
Mar 29 17:17:05 erx-fw-cXX ubnt-service-gui: waiting for netplugd to be started...
Mar 29 17:17:15 erx-fw-cXX wlb: wlb-G-eth0 Starting wlb watchdog on wlb-G-eth0 after 60s delay
Mar 29 17:17:15 erx-fw-cXX wlb: wlb-G-eth1 Starting wlb watchdog on wlb-G-eth1 after 60s delay
Mar 29 17:17:15 erx-fw-cXX wlb: group G, interface eth0 going Active
Mar 29 17:17:15 erx-fw-cXX ssh-recovery[680]: terminating the SSH recovery service :: pid=(611)
Mar 29 17:17:16 erx-fw-cXX netplugd: Starting network plug daemon: netplugd.
Mar 29 17:17:20 erx-fw-cXX ubnt-service-ssh: starting the SSH service (see messages from sshd).
Mar 29 17:17:24 erx-fw-cXX ubnt-service-gui: starting the GUI service.
Mar 29 18:37:14 erx-fw-cXX wlb: wlb-G-eth1 wlb-G-eth1 reachability failed, failover
Mar 29 18:41:37 erx-fw-cXX kernel: ESW: Link Status Changed - Port0 Link Down
Mar 29 18:41:38 erx-fw-cXX dhclient: send_packet: Network is unreachable
Mar 29 18:41:38 erx-fw-cXX dhclient: send_packet: please consult README file regarding broadcast address.
Mar 29 18:41:38 erx-fw-cXX dhclient: dhclient.c:2257: Failed to send 300 byte long packet over fallback interface.
Mar 29 18:41:40 erx-fw-cXX kernel: ESW: Link Status Changed - Port0 Link UP
Mar 29 18:41:58 erx-fw-cXX wlb: wlb-G-eth0 wlb-G-eth0 reachability failed, failover
Mar 29 18:41:58 erx-fw-cXX wlb: group G, interface eth0 going Inactive
$ sudo kill -9 $(pidof ubnt-util)
$ cat /var/log/messages
...
Mar 29 18:41:58 erx-fw-cXX wlb: group G, interface eth0 going Inactive
Mar 29 18:56:43 erx-fw-cXX wlb: wlb-G-eth0 Starting wlb watchdog on wlb-G-eth0 after 60s delay
Mar 29 18:56:43 erx-fw-cXX wlb: wlb-G-eth1 Starting wlb watchdog on wlb-G-eth1 after 60s delay
Mar 29 18:56:43 erx-fw-cXX wlb: group G, interface eth0 going Active

This is with dnsmasq as forwarder with "set system name-server 127.0.0.1", though I don't think it has anything to do with DNS or lb-local.
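
To pull just the load-balance lines out of a log like the one above, something along these lines may help. It is only a sketch: the function names `wlb_events` and `wlb_transitions` are made up for illustration, and it assumes the messages keep the `wlb:` tag and the "going Active/Inactive" wording shown in the log.

```shell
# Hypothetical helper: print only the WAN load-balance (wlb) lines from a syslog file.
# Assumes the "wlb:" tag seen in the log above; defaults to /var/log/messages.
wlb_events() {
    grep ' wlb: ' "${1:-/var/log/messages}"
}

# Hypothetical helper: print only the Active/Inactive state changes.
wlb_transitions() {
    wlb_events "$1" | grep -E 'going (Active|Inactive)'
}
```

Running `wlb_transitions /var/log/messages` before a reboot should then show when each interface in the group changed state, without the kernel and ntpd noise.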

 

By the way, the command "show tech-support" doesn't output any load balance info, just says:

----------------
WAN LOAD BALANCING
----------------
Wan Load Balance is not configured

And thanks to @cewald for the "kill ubnt-util" tip, it helped me test this faster!

New Member
Posts: 4
Registered: ‎04-03-2017

Re: EdgeMAX EdgeRouter software version v1.10.1 has been released!

We are doing a POC of the EdgeRouter Infinity on v1.10.1 for use with 10G links. I'm not sure if this is a bug or expected behavior, but I have configured the offload feature and "show ubnt offload" still says it is disabled.

 

show version

Version:      v1.10.1
Build ID:     5067572
Build on:     03/05/18 17:51
Copyright:    2012-2018 Ubiquiti Networks, Inc.
HW model:     EdgeRouter Infinity

 

Relevant section of show configuration:

   offload {
        hwnat disable
        ipv4 {
            forwarding enable
            gre enable
            vlan enable

 

show ubnt offload

IP offload module   : loaded
IPv4
  forwarding: disabled
  vlan      : disabled
  pppoe     : disabled
  gre       : disabled
IPv6
  forwarding: disabled
  vlan      : disabled
  pppoe     : disabled

IPSec offload module: loaded

Traffic Analysis    :
  export    : disabled
  dpi       : disabled
    version       : 1.354

Member
Posts: 218
Registered: ‎02-12-2013
Kudos: 69
Solutions: 18

Re: EdgeMAX EdgeRouter software version v1.10.1 has been released!

@lesyshyn
Are you using anything like bridging, QoS, NetFlow or bonding? If yes, then that disables the offloading:
https://help.ubnt.com/hc/en-us/articles/115006567467-EdgeRouter-Hardware-Offloading-Explained
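
A rough way to check for those features is to scan the config commands for the matching sections. This is a sketch only: the function name `offload_blockers` is made up, and the patterns (`interfaces bridge`, `interfaces bonding`, `traffic-policy`, `traffic-control`, `system flow-accounting`) are my assumption about how these features show up in the config, not something taken from the article, so adjust as needed.

```shell
# Hypothetical helper: read "set ..." config commands on stdin and print the
# lines that configure bridging, bonding, QoS or NetFlow.
# The patterns below are assumptions about the config syntax.
offload_blockers() {
    grep -E '^set (interfaces (bridge|bonding) |traffic-(policy|control) |system flow-accounting )'
}
```

Feeding it the output of "show configuration commands" (for example saved to a file and passed with `offload_blockers < config.txt`) should list any candidate lines; no output means none of the assumed patterns matched.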
New Member
Posts: 1
Registered: ‎03-31-2018

Re: EdgeMAX EdgeRouter software version v1.10.1 has been released!

I've updated my EdgeRouter X SFP to 1.10 (and today to 1.10.1) from something as old as 1.9.0. Before the upgrade I had no issues with the router; it would run smoothly for months. Since upgrading to 1.10 it seems to be leaking occasionally: for instance, I can only log in to the GUI within the first 30 minutes after a reboot. After that it goes silent on port 443, and over longer time spans it may even shut down all Ethernet ports and just sit there until someone manually reboots it via the power plug.

This is very frustrating! I hoped 1.10.1 would be the fix, but the issue persists. How can I contribute to fixing it?

New Member
Posts: 4
Registered: ‎04-03-2017

Re: EdgeMAX EdgeRouter software version v1.10.1 has been released!

Thanks, it was the netflow.

New Member
Posts: 1
Registered: ‎04-02-2018

Re: EdgeMAX EdgeRouter software version v1.10.1 has been released!

Hmm, that sounds somewhat similar ...

 

My EdgeRouter ERPro-8 had been running smoothly for five years, with the only reboots being the result of firmware upgrades ... until v1.10. Ever since the upgrade it's developed connection issues which I haven't been able to track down: the WAN interface (eth1, IP from ISP DHCP) will stop transmitting, typically around 30 minutes after rebooting. I was getting hopeful on my last attempt, but it happened again, after 44 hours this time. The interface still receives data, judging by the dashboard and the occasional firewall DROP line in the log, and I can still access the GUI and CLI in order to reboot. At first I thought I was simply losing the IP, but as it's transmitting 0 bps I can't even renew.

Port dying? Not likely, because every now and then the same behaviour appears to happen on eth7 (connected to the LAN switch), locking me out of the device and forcing me to power cycle. Plus everything returns to normal when I revert to v1.9.7+Hotfix4 ... v1.10.1 didn't fix it here either. I compiled a support log after booting and another after eth1 stopped, but I don't see anything out of the ordinary when comparing the two.

 

It's an extremely basic setup that I haven't touched in years (LAN-WAN, DHCP Server, NAT, a couple of forwarded ports and basic firewall rules; no load-balancing, VPN or QoS).
