New Member
Posts: 24
Registered: ‎03-26-2017
Kudos: 8
Solutions: 1

Re: [ERL3 - Offloading] Hardware Offloading "breaking" after >30days uptime

Note: I may have hit this problem as well, but I started a separate thread about it here:

https://community.ubnt.com/t5/EdgeRouter/EdgeRouter-thruput-declines-over-3-months-restores-after-re...

New Member
Posts: 10
Registered: ‎07-20-2016


I am also experiencing this issue on 1.10.8. I was at 42 days of uptime when I stumbled upon this thread looking for answers. I rebooted and everything is working again for now.

Ubiquiti Employee
Posts: 545
Registered: ‎01-06-2017
Kudos: 192
Solutions: 20



@gritech wrote:

I only had to do one of the routers to test both scenarios as flushing alone did not fix offloading. (MainRouter; the one I sent files from).

 

 

show ubnt offload statistics: ipv4 and ipv6 flushes both 0

echo 0 > /proc/cavium/ipv4/cache
echo 0 > /proc/cavium/ipv6/cache

show ubnt offload statistics: ipv4 and ipv6 flushes both 1

Offloading not fixed.

set system offload ipv4 table-size 8192
set system offload ipv6 table-size 8192    (the ipv6 table was already at the default of 8192; no ipv6 is used on these routers)

show ubnt offload statistics: ipv4 and ipv6 flushes both 2

Offloading fixed.

 

 

 

I will leave this router with the table size at 8192 and see if it makes it past another 30 days (currently at 36 days).

 


Excellent, thank you. That narrows down the search somewhat. Let us know how it goes.

Ubiquiti Employee
Posts: 545
Registered: ‎01-06-2017
Kudos: 192
Solutions: 20



@dpfubiq wrote:

@waterside wrote:

Stats collection is not actually enabled until you run 'show ubnt offload statistics' the first time after each reboot.


Is there a performance penalty for enabling these statistics? Is there a way to disable the collection again after running that command?

 

Thanks!


 

There was a noticeable penalty in our load-testing environment; however, it is not representative of real-world loads. You can try disabling the stats collection and checking your router.

 

# Disable
echo 0 | sudo tee /proc/cavium/stats

# Enable
echo 1 | sudo tee /proc/cavium/stats
Ubiquiti Employee
Posts: 1,228
Registered: ‎07-20-2015
Kudos: 1444
Solutions: 81



> Hardware Offloading "breaking" after >30days uptime

After reviewing the source code, I was finally able to find the root cause of this issue. The bug was introduced in v1.10.2 on all Cavium-based routers, and it causes offloading to forward packets via the slow path after ~30 days of uptime (268,435,456 timer ticks of 10 msec each, to be precise).

 

We will provide a fix in the v2.0.1 and v1.10.9 firmware releases.

 

Update: this bug was actually present even in pre-1.10.2 firmware, but it would take 25 months of uptime to trigger on the ER-Lite (or 12 months on the ER8, or 5 months on the ER-Infinity).
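
As a rough way to check how close a router is to the point described above, the sketch below converts the contents of `/proc/uptime` into the number of days left before a 28-bit counter wraps. It assumes the counter starts at zero at boot and advances 100 times per second (the rate given later in this thread); the function name and constants are illustrative and not part of EdgeOS.

```python
COUNTER_BITS = 28          # width of the offload time counter, per this thread
TICKS_PER_SECOND = 100     # assumed tick rate on v1.10.2 and later
SECONDS_PER_DAY = 86_400


def days_until_wrap(proc_uptime_text: str) -> float:
    """Return days left before the counter wraps; negative means it already has.

    `proc_uptime_text` is the raw content of /proc/uptime,
    e.g. "3628800.00 7000000.00" (the first field is uptime in seconds).
    Assumes the counter started at zero when the router booted.
    """
    uptime_seconds = float(proc_uptime_text.split()[0])
    wrap_seconds = (1 << COUNTER_BITS) / TICKS_PER_SECOND
    return (wrap_seconds - uptime_seconds) / SECONDS_PER_DAY
```

A negative result means the uptime is already past the wrap point, which under these assumptions is when offloading would fall back to the slow path.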

Emerging Member
Posts: 62
Registered: ‎04-28-2014
Kudos: 26
Solutions: 1


Awesome! Thanks for investigating this time-consuming-to-reproduce bug!
Member
Posts: 238
Registered: ‎04-09-2013
Kudos: 89
Solutions: 6


Great!

To everyone who contributed to this thread, thank you for your efforts in gathering data and testing :)

Emerging Member
Posts: 46
Registered: ‎08-27-2016
Kudos: 11
Solutions: 1


Awesome! Looking forward to the updated firmware.

I'd be curious to hear more details about what caused this, if possible. Regardless, this is wonderful news. :D

Veteran Member
Posts: 5,434
Registered: ‎03-12-2011
Kudos: 2728
Solutions: 129



@blunden wrote:

I'd be curious to hear more details about what caused this, if possible. Regardless, this is wonderful news. :D


Same. The curiosity is killing me :P

Ubiquiti Employee
Posts: 1,228
Registered: ‎07-20-2015
Kudos: 1444
Solutions: 81



> I'd be curious to hear more details of what caused this if possible.

  1. Offloading has a 28-bit-wide time counter that wraps around after reaching the value 268435455, and it stops processing new flows after this event (bug!).
  2. Before v1.10.2, the time counter grew at a different rate on each CPU. For instance, on the ER-Lite it grew by 4 units/sec, so it would take 25 months to wrap around.
  3. In v1.10.2 we synchronized the time counter with Linux "jiffies" to get a constant growth rate on all CPUs. Since then the time counter grows by 100 units/sec, which means it wraps around in about 1 month on all ER models.
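
The arithmetic behind these figures can be checked with a short sketch. It assumes the counter starts at zero at boot and uses only the two tick rates quoted above; the function name is illustrative.

```python
COUNTER_BITS = 28
WRAP_TICKS = 1 << COUNTER_BITS   # 268435456; the largest counter value is 268435455
SECONDS_PER_DAY = 86_400


def days_to_wrap(ticks_per_second: float) -> float:
    """Days of uptime before a 28-bit counter wraps at the given tick rate."""
    return WRAP_TICKS / ticks_per_second / SECONDS_PER_DAY


# Tick rates quoted in the list above.
for label, rate in [("ER-Lite before v1.10.2", 4), ("all models since v1.10.2", 100)]:
    print(f"{label}: {days_to_wrap(rate):.1f} days")
```

This works out to roughly 777 days (about 25 months) at 4 units/sec and roughly 31 days at 100 units/sec, matching the uptimes reported in this thread.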
Member
Posts: 724
Registered: ‎09-13-2018
Kudos: 137
Solutions: 48


So are you now doing modular arithmetic (mod 2^28)?  Increasing the size of the counter would just make the problem happen less frequently.
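
For readers unfamiliar with the technique being asked about, here is a minimal sketch of wrap-safe elapsed-time arithmetic on a 28-bit counter. It is not the actual firmware fix, which this thread does not describe; the function names and the timeout value are hypothetical.

```python
COUNTER_BITS = 28
MASK = (1 << COUNTER_BITS) - 1   # 0x0FFFFFFF


def elapsed_ticks(now: int, then: int) -> int:
    """Ticks from `then` to `now`, correct across a single wraparound."""
    return (now - then) & MASK


def is_expired_naive(now: int, stamp: int, timeout: int) -> bool:
    # Breaks after a wrap: `now` is small, `stamp` is large, so the
    # difference is negative and the entry never looks expired.
    return now - stamp > timeout


def is_expired_modular(now: int, stamp: int, timeout: int) -> bool:
    # Reducing the difference mod 2**28 keeps the comparison valid across the wrap.
    return elapsed_ticks(now, stamp) > timeout


stamp = MASK - 5      # flow last seen 6 ticks before the counter wraps to 0
now = 1000            # the counter has since wrapped around
print(elapsed_ticks(now, stamp))             # 1006
print(is_expired_naive(now, stamp, 500))     # False
print(is_expired_modular(now, stamp, 500))   # True
```

The modular form only gives the right answer when the true interval is shorter than one full counter period, which is about 31 days at 100 ticks/sec.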

Emerging Member
Posts: 46
Registered: ‎08-27-2016
Kudos: 11
Solutions: 1



@UBNT-afomins wrote:

> I'd be curious to hear more details of what caused this if possible.

  1. Offloading has a 28-bit-wide time counter that wraps around after reaching the value 268435455, and it stops processing new flows after this event (bug!).
  2. Before v1.10.2, the time counter grew at a different rate on each CPU. For instance, on the ER-Lite it grew by 4 units/sec, so it would take 25 months to wrap around.
  3. In v1.10.2 we synchronized the time counter with Linux "jiffies" to get a constant growth rate on all CPUs. Since then the time counter grows by 100 units/sec, which means it wraps around in about 1 month on all ER models.

Thanks for the info.

 

Any estimate on when we'll get to enjoy the 1.10.9 release? :) This bug is really annoying.
