New Member
Posts: 3
Registered: ‎11-11-2016

Re: [ERL3 - Offloading] Hardware Offloading "breaking" after >30days uptime

Ah, thank god. I thought I was going crazy. For me, it only seemed to happen in firmwares 1.10.3+. Running 1.10.1 was solid for many months. Looking at the patch notes for 1.10.8, it doesn't seem like that firmware addresses this issue?

Member
Posts: 238
Registered: ‎04-09-2013
Kudos: 89
Solutions: 6

Re: [ERL3 - Offloading] Hardware Offloading "breaking" after >30days uptime

[ Edited ]

Thank you for your feedback @Boltsie, it's really valuable for tracing back the version where the issue first arose.

 

What I can think of, regarding EdgeOS 1.10 version series:

  • The issue is not firmware/bootloader related, as it occurs in v1.10.6 and earlier.
  • v1.10.5: the offload table sizes are no longer hardcoded
  • v1.10.3: a lot of offloading changes (disable-flow-flushing-upon-fib-changes, flow lifetime, statistics, etc.)
  • v1.10.0: fix regarding Hardware Offloading related to receiving packets in a multi-core CPU context

Feel free to tell me if I missed some changes related to the ERL3 Offload Engine(*) that occurred in the v1.10 EOS series.

And if someone has the time and means to test these several versions and confirm when it all started...

 

(*): if you are affected by this issue and have another Cavium-based ER, please let me know. I'll change the topic's title and scope.

 

--

edit: fixed misleading statements thanks to waterside's following post.

Senior Member
Posts: 3,234
Registered: ‎08-06-2015
Kudos: 1383
Solutions: 186

Re: [ERL3 - Offloading] Hardware Offloading "breaking" after >30days uptime


@elgo wrote:

What I can think of, regarding EdgeOS 1.10 version series:

  • ERL3 had no firmware/bootloader upgrade (only OS & packages upgrades) if I'm not wrong
  • v1.10.0: ERL3 got a fix regarding Hardware Offloading related to receiving packets in multi-cores CPU context

 

There are no fixes in offloading unique to any specific Cavium-based router:  All of the offloading changes apply to all of the Cavium-based routers (essentially anything not ER-X*)

 

All Cavium-based platforms (see above) have a bootloader update in 1.10.7, and another in 1.10.8.  To confirm if there is an available bootloader update on your ER you can use 'show system boot-image' via the CLI.  This KB article may be of some help:  EdgeRouter - How to Update Bootloader

Member
Posts: 116
Registered: ‎09-05-2016
Kudos: 42

Re: [ERL3 - Offloading] Hardware Offloading "breaking" after >30days uptime

[ Edited ]

Stumbled on this thread whilst wondering about my offload stats.  There seem to be a lot of zeros in the stats with offload enabled, unless I am reading them incorrectly:

 

 


Last login: Mon Dec  3 10:56:57 UTC 2018 on pts/0
Linux ubnt 3.10.107-UBNT #1 SMP Tue Nov 20 17:01:40 UTC 2018 mips64
Welcome to EdgeOS
admin@ubnt:~$ show ubnt offload statistics

 Statistics
========================

RX packets:                        0    bytes:                     0
TX packets:                        0    bytes:                     0
Bypass packets:                    0    bytes:                     0
Bad L4 checksum:                   0    bytes:                     0

Protocol        RX packets      RX bytes                TX packets      TX bytes

ipv4            0                 0                   0                 0
ipv6            0                 0                   0                 0
pppoe           0                 0                   0                 0
vlan            0                 0                   0                 0

 Forwarding cache size (IPv4)
=============================

table_size (buckets)                  32768
table size (bytes)                    4194304
flows_max (bytes)                     19660800

 Flow cache table size (IPv6)
=============================

table_size (buckets)                  8192
table size (bytes)                    1048576
flows_max (bytes)                     2883584

 Flow timers
=============================

cycles                                613638817879872
clock_rate                            800000000
HZ                                    100
timer_ticks                           76675149
new_flow_interval (timer_ticks)       1200
old_flow_interval (timer_ticks)       400

 Low-level IPv4 flow dynamics
=============================

ipv4_flow_found                       0
    ipv4_flow_found_expired           0
    ipv4_flow_found_old_random_bypass 0
    ipv4_flow_found_action_bypass     0

ipv4_flow_not_found                   0

 IPv4 flow creation dynamics
=============================

ipv4_create_flow_found                            0
ipv4_create_flow_found_replaced                   0
ipv4_create_flow_not_found                        0
ipv4_create_flow_not_found_replaced_expired       0
ipv4_create_flow_not_found_replaced_non_expired   0

 Low-level IPv6 flow dynamics
=============================

ipv6_flow_found                       0
    ipv6_flow_found_expired           0
    ipv6_flow_found_old_random_bypass 0
    ipv6_flow_found_action_bypass     0

ipv6_flow_not_found                   0

 IPv6 flow creation dynamics
=============================

ipv6_create_flow_found                            0
ipv6_create_flow_found_replaced                   0
ipv6_create_flow_not_found                        0
ipv6_create_flow_not_found_replaced_expired       0
ipv6_create_flow_not_found_replaced_non_expired   0

 Flow cache flushes
=============================

ipv4_flushes                          0
ipv6_flushes                          0

admin@ubnt:~$ show ubnt offload

IP offload module   : loaded
IPv4
  forwarding: enabled
  vlan      : enabled
  pppoe     : enabled
  gre       : enabled
IPv6
  forwarding: enabled
  vlan      : enabled
  pppoe     : disabled

IPSec offload module: loaded

Traffic Analysis    :
  export    : enabled
  dpi       : enabled
    version       : 1.422
admin@ubnt:~$

Last login: Mon Dec  3 13:57:00 UTC 2018 on pts/0
Linux ubnt 3.10.107-UBNT #1 SMP Tue Nov 20 17:01:40 UTC 2018 mips64
Welcome to EdgeOS
admin@ubnt:~$ show system boot-image
The system currently has the following boot image installed:
Current boot version: e201_001_1e4df
Current boot md5sum : 3571b4f2bb17d7fe204a4d249766ed35

admin@ubnt:~$

 The router has been up for 9 days (since the last firmware update).
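
As a sanity check, the "Flow timers" values in that output can be converted to an uptime. This is only a sketch, and it assumes (the thread does not confirm it) that `cycles` counts CPU cycles since boot at `clock_rate` Hz and that `timer_ticks` counts kernel ticks at `HZ` per second. The function names are made up for illustration.

```python
# Values copied from the "Flow timers" section of the posted output.
CYCLES = 613638817879872
CLOCK_RATE = 800_000_000   # reported clock_rate, assumed to be in Hz
TIMER_TICKS = 76675149
HZ = 100                   # reported HZ, assumed to be ticks per second

SECONDS_PER_DAY = 86_400


def days_from_cycles(cycles: int, clock_rate: int) -> float:
    """Days elapsed if `cycles` has been counting at `clock_rate` Hz since boot."""
    return cycles / clock_rate / SECONDS_PER_DAY


def days_from_ticks(ticks: int, hz: int) -> float:
    """Days elapsed if `ticks` has been counting at `hz` ticks per second since boot."""
    return ticks / hz / SECONDS_PER_DAY


if __name__ == "__main__":
    print(f"from cycles:      {days_from_cycles(CYCLES, CLOCK_RATE):.2f} days")
    print(f"from timer_ticks: {days_from_ticks(TIMER_TICKS, HZ):.2f} days")
```

Under those assumptions both counters come out just under 9 days, which agrees with the stated uptime, so the timers themselves look sane even though every traffic counter reads zero.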

Senior Member
Posts: 3,234
Registered: ‎08-06-2015
Kudos: 1383
Solutions: 186

Re: [ERL3 - Offloading] Hardware Offloading "breaking" after >30days uptime

Stats collection is not actually enabled until you run 'show ubnt offload statistics' the first time after each reboot.  As a result, the counters will indeed be all zeros the first time it is run.

 

There had been discussion about adding an option to enable this by default but that doesn't seem to be there yet.

 

 

Member
Posts: 116
Registered: ‎09-05-2016
Kudos: 42

Re: [ERL3 - Offloading] Hardware Offloading "breaking" after >30days uptime

Thanks, I will do that.

New Member
Posts: 7
Registered: ‎06-11-2014
Kudos: 3
Solutions: 1

Re: [ERL3 - Offloading] Hardware Offloading "breaking" after >30days uptime

@UBNT-sandisn

 

Are there any recommended workarounds or known good versions to move to?

This is becoming a bit more of a problem after a couple of our sites have upgraded from 150 Mbps to 600 Mbps service.

Emerging Member
Posts: 46
Registered: ‎08-27-2016
Kudos: 11
Solutions: 1

Re: [ERL3 - Offloading] Hardware Offloading "breaking" after >30days uptime


@roto wrote:

@UBNT-sandisn

 

Are there any recommended workarounds or known good versions to move to?

This is becoming a bit more of a problem after a couple of our sites have upgraded from 150 Mbps to 600 Mbps service.


I don't remember having this issue in 1.10.1 but I can't be entirely sure.

 

@UBNT-sandisn Any news to report regarding this issue? Have you guys been able to reproduce it yet?

 

I understand that many of you have probably had some well-earned time off during the holidays.

Member
Posts: 259
Registered: ‎08-15-2015
Kudos: 40
Solutions: 2

Re: [ERL3 - Offloading] Hardware Offloading "breaking" after >30days uptime

FWIW: I just tested this on my ERL3 running v1.10.8

 

$ uptime
11:54:04 up 31 days, 7 min, 1 user, load average: 0.10, 0.17, 0.17

$ show ubnt offload

IP offload module   : loaded
IPv4
  forwarding: enabled
  vlan      : disabled
  pppoe     : disabled
  gre       : disabled
IPv6
  forwarding: disabled
  vlan      : disabled
  pppoe     : disabled

IPSec offload module: loaded

Traffic Analysis    :
  export    : disabled
  dpi       : disabled
    version       : 1.422

I've a 50/5 Internet connection.  (Though speedtest.net today says I'm getting 60/10.)  I uploaded a 435MB file to one of my DigitalOcean virtual servers and then downloaded it.  Upload was 11.2 Mb/s.  Download was 56 Mb/s.

 

ERL3 CPU never hit more than .39.  Usually was under .30.

 

During this time, the top display updated normally and I was doing some light web browsing.

Member
Posts: 238
Registered: ‎04-09-2013
Kudos: 89
Solutions: 6

Re: [ERL3 - Offloading] Hardware Offloading "breaking" after >30days uptime

[ Edited ]

@SEMIJim: not sure about the relevance of your observations because your WAN link speed is way below the capacity of an ERL-3 without hardware offloading enabled. Try some LAN iperf testing next time and go above 700Mb/s to show "something".

 

@blunden: +1.

I'll add another question for the UBNT gentlemen: would it be useful (to you) for us to test v2.0.0, released Monday, or is the issue already known to be reproducible in that version too?

Emerging Member
Posts: 62
Registered: ‎04-28-2014
Kudos: 26
Solutions: 1

Re: [ERL3 - Offloading] Hardware Offloading "breaking" after >30days uptime

[ Edited ]

 

@UBNT-sandisn

 

Any progress on this? I currently have 2 ER3 in this state. Friday, 29 days uptime, all good. Today, 1 month 1 day, high CPU & VLANs showing traffic on the dashboard.

 

I have a tech support file for one of the routers, the 2nd is still generating but I assume it will be available shortly if needed.

 

edit: 1.10.8

Emerging Member
Posts: 52
Registered: ‎12-30-2009
Kudos: 1

Re: [ERL3 - Offloading] Hardware Offloading "breaking" after >30days uptime

I'm testing this issue on 2.0.0, but the uptime is only 3 days so far. No problem yet.

Emerging Member
Posts: 62
Registered: ‎04-28-2014
Kudos: 26
Solutions: 1

Re: [ERL3 - Offloading] Hardware Offloading "breaking" after >30days uptime

@UBNT-afomins
@UBNT-sandisn

Any info you need before I reset the offloading?
Member
Posts: 259
Registered: ‎08-15-2015
Kudos: 40
Solutions: 2

Re: [ERL3 - Offloading] Hardware Offloading "breaking" after >30days uptime


@elgo wrote:

@SEMIJim: not sure about the relevance of your observations because your WAN link speed is way below the capacity of an ERL-3 without hardware offloading enabled. Try some LAN iperf testing next time and go above 700Mb/s to show "something".

 


Hmmm... I don't get more than 300 Mb/s out of my ERL3 with iperf3.  Even with it freshly booted.  Hooked my laptop to the same switch and get 940 Mb/s from that, so I'm not sure what's going on.

 

I've got other fish to fry, right now, but that's probably something I'll want to pursue, eventually.

New Member
Posts: 39
Registered: ‎09-01-2015
Kudos: 2

Re: [ERL3 - Offloading] Hardware Offloading "breaking" after >30days uptime

[ Edited ]

I believe I ran into this issue on my ERL yesterday. Unfortunately I found this thread after rebooting. Before the reboot, I verified that hardware offload was reported as enabled.

 

Off the top of my head, I noticed that the `ksoftirqd` processes were taking up very large amounts of CPU, along with spikes of `ubnt-util`, `dnsmasq`(?), and `snmpd`. If there's something useful to collect moving forward, I'd be happy to share.

 

Running on a 1000/1000 fiber line. Before the restart, WAN speeds were dropping to ~200 Mbit/s; after the restart they saturate the connection again.

Ubiquiti Employee
Posts: 545
Registered: ‎01-06-2017
Kudos: 192
Solutions: 20

Re: [ERL3 - Offloading] Hardware Offloading "breaking" after >30days uptime


@gritech wrote:
@UBNT-afomins
@UBNT-sandisn

Any info you need before I reset the offloading?

Hi,

 

Thanks for the offer.

 

1. Output from top, htop.

2. Offloading stats. Stop the script with CTRL+C after a minute or so.

echo 1 > /proc/cavium/stats
chmod +x offload_stats.py
./offload_stats.py 2>&1 | tee stats.txt

3. Does flushing offloading cache help?

echo 0 > /proc/cavium/ipv4/cache
echo 0 > /proc/cavium/ipv6/cache

4. Does changing cache size to previous default help?

configure
set system offload ipv4 table-size 8192
set system offload ipv6 table-size 8192
commit
save

5. Does disabling/enabling offloading help (disable all offload options not just forwarding...)?

configure
set system offload ipv4 forwarding disable
commit
delete system offload ipv4 forwarding disable
commit
save

Regards,

Sandis

Attachment
Emerging Member
Posts: 62
Registered: ‎04-28-2014
Kudos: 26
Solutions: 1

Re: [ERL3 - Offloading] Hardware Offloading "breaking" after >30days uptime

Hi Sandis,

 

Here's the output from offload_stats.py & top.  I don't have htop installed.

 

I haven't flushed the cache yet; I'm almost positive this will reset the offloading.  *I'm more than happy to flush the cache if you're ready to reset & wait 30 days again.*

 

The last time I was in this state, changing the offload table size or the flow lifetime got offloading working perfectly again.

 

I have not run with the default size since it became configurable, so I'm not sure if it will break at 8192.  If this is important, I can change it & run 30 days.

 

Let me know if you want me to proceed with flushing, changing the table size, or disabling/enabling offloading.

 

I'll PM you my full config as well.

 

 

 

Attachment
Ubiquiti Employee
Posts: 545
Registered: ‎01-06-2017
Kudos: 192
Solutions: 20

Re: [ERL3 - Offloading] Hardware Offloading "breaking" after >30days uptime

@gritech

 

Thanks! You said you have two routers in this state. For one of them just do the flush. For the other - resize to 8192. Keep us updated.

Emerging Member
Posts: 62
Registered: ‎04-28-2014
Kudos: 26
Solutions: 1

Re: [ERL3 - Offloading] Hardware Offloading "breaking" after >30days uptime

I only had to do one of the routers to test both scenarios as flushing alone did not fix offloading. (MainRouter; the one I sent files from).

 

 

show ubnt offload statistics:  ipv4, ipv6 flushes both 0

echo 0 > /proc/cavium/ipv4/cache

echo 0 > /proc/cavium/ipv6/cache

show ubnt offload statistics:  ipv4, ipv6 flushes both 1

 

Offloading not fixed

 

 

 

set system offload ipv4 table-size 8192

set system offload ipv6 table-size 8192    *ipv6 table was already at default 8192; no ipv6 used on these routers

show ubnt offload statistics:  ipv4, ipv6 flushes both 2

 

Offload fixed.

 

 

 

I will leave this router at table size 8192 and see if it makes it past another 30 days. (currently 36 days).
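
For anyone repeating this test, the before/after comparison of the flush counters could be scripted. Below is a minimal sketch, assuming the output of 'show ubnt offload statistics' has been saved as text and that the counters appear as "name value" lines, as in the output posted earlier in this thread. `parse_counters` and `flush_delta` are hypothetical helper names, not part of EdgeOS or of offload_stats.py.

```python
import re

# Matches lines such as "ipv4_flushes                          1":
# one identifier, whitespace, then a single integer at the end of the line.
# Multi-column rows and names with parentheses are deliberately skipped.
COUNTER_RE = re.compile(r"^\s*([A-Za-z0-9_]+)\s+(\d+)\s*$")


def parse_counters(text: str) -> dict:
    """Return {counter_name: value} for every simple "name value" line."""
    counters = {}
    for line in text.splitlines():
        match = COUNTER_RE.match(line)
        if match:
            counters[match.group(1)] = int(match.group(2))
    return counters


def flush_delta(before: str, after: str) -> dict:
    """Change in the two flush counters between two saved outputs."""
    b, a = parse_counters(before), parse_counters(after)
    return {key: a[key] - b[key] for key in ("ipv4_flushes", "ipv6_flushes")}


# Sample text modelled on the "Flow cache flushes" section.
BEFORE = """
 Flow cache flushes
=============================

ipv4_flushes                          0
ipv6_flushes                          0
"""

AFTER = """
 Flow cache flushes
=============================

ipv4_flushes                          1
ipv6_flushes                          1
"""

if __name__ == "__main__":
    print(flush_delta(BEFORE, AFTER))
```

A delta of 1 on both counters would confirm that the flush was actually registered, which separates "the flush did not happen" from "the flush happened but did not fix offloading".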

 

New Member
Posts: 24
Registered: ‎03-26-2017
Kudos: 8
Solutions: 1

Re: [ERL3 - Offloading] Hardware Offloading "breaking" after >30days uptime


@waterside wrote:

Stats collection is not actually enabled until you run 'show ubnt offload statistics' the first time after each reboot.


Is there a performance penalty for enabling these statistics? Is there a way to disable the collection again after running that command?

 

Thanks!
