Regular Member
Posts: 529
Registered: ‎07-21-2010
Kudos: 95
Solutions: 6

Performance degradation when CPU is at 100% | IP Forwarding Engine


Hey Guys,

 

We have two ER-Pro8 routers carrying 2x full eBGP tables and 1x full iBGP table. IP offloading is activated and enabled on both devices.

 

 

root@at-sbg-itz-tz-k10-r10-bgp01:/home/rack# show version
Version: v1.9.7+hotfix.1
Build ID: 5005855
Build on: 08/03/17 03:38
Copyright: 2012-2017 Ubiquiti Networks, Inc.
HW model: EdgeRouter Pro 8-Port
HW S/N: 24A43CB3DB93
Uptime: 20:10:58 up 6 days, 22:31, 1 user, load average: 1.85, 1.80, 1.80

root@at-sbg-itz-tz-k10-r10-bgp01:/home/rack# show ubnt offload
IP offload module   : loaded
  IPv4
    forwarding : enabled
    vlan       : enabled
    pppoe      : disabled
    gre        : enabled
  IPv6
    forwarding : enabled
    vlan       : enabled
    pppoe      : disabled

IPSec offload module: loaded

Traffic Analysis    :
  export     : disabled
  dpi        : disabled
    version  : 1.302
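For reference, this offload state is set from configure mode roughly like this (sketch; exact options may vary between firmware versions):

configure
set system offload ipv4 forwarding enable
set system offload ipv4 vlan enable
set system offload ipv4 gre enable
set system offload ipv6 forwarding enable
set system offload ipv6 vlan enable
set system offload ipsec enable
commit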

 

 

Topology:

 

2x IP Upstreams ---- ER-Pro8 #1 --1 Gbit-- ER-Lite --1 Gbit-- WebServer #2
                    /     |
   WebServer #1 ---+    iBGP
                    \     |
2x IP Upstreams ---- ER-Pro8 #2      (all links are 1 Gbit)

Upstream #2 > #1
Downstream #1 > #2

 

When downloading a file from WebServer #1 to WebServer #2 I can get full gigabit, but while BGP updates are in progress the bandwidth drops to under 100 Mbit/s.
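The test itself is just an HTTP download measured with curl, roughly like this (hypothetical URL and file name, not our real server):

# download to /dev/null and look at the average transfer rate curl reports
curl -o /dev/null -w 'average: %{speed_download} bytes/s\n' http://webserver1.example.net/testfile.bin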

 

WhatsApp Image 2018-02-10 at 20.02.51.jpeg

 

 

First EdgeMax = On Site Webserver#2

Middle EdgeMax = On Site Webserver#1 / BGP1

Right EdgeMax = On Site Webserver#1 / BGP2

 

Upstream routing is Webserver#2 > BGP1 > Webserver#1

Downstream routing is Webserver#1 > BGP2 > BGP1 > Webserver#2

 

You can already see a performance impact when the CPU is above 50%.

If the CPU is mostly idle (up to about 10% usage), you can push the full 1 Gbit/s.

 

Why does IP forwarding offloading rely on the CPU at all?

Is the FDB not installed in the offloading engine, so that the engine can route without involving the CPU?

 

Regular Member
Posts: 529
Registered: ‎07-21-2010
Kudos: 95
Solutions: 6

Re: Performance degradation when CPU is at 100% | IP Forwarding Engine

This is the download speed when the CPU of ER#1 is fairly idle (< 50% usage):

curl2.JPG

 

This is the download speed when the CPU of ER#1 is at 50% usage or more:

curl.JPG

 

This is ps ax while downloading at the low speed:

psax.JPG 

 

I have changed the routing to only use ER-Pro8 #1, so upstream and downstream are symmetrical.

The interfaces are not heavily loaded, and I should be able to get up to 1 Gbit/s per port.

I monitor every interface in the path. When the CPU on the ER-Pro is heavily loaded, I can see that I cannot get the full bandwidth.

 

But IP forwarding / routing should not be affected when the traffic is offloaded!

 

I think that is a real issue.

 

Can anyone write a script that does something CPU-intensive to max out the CPU, so we can test IP forwarding under CPU stress?

 

 

Veteran Member
Posts: 7,991
Registered: ‎03-24-2016
Kudos: 2083
Solutions: 913

Re: Performance degradation when CPU is at 100% | IP Forwarding Engine

CPU stress:

sudo openssl speed md5 sha1 sha256 sha512 des des-ede3 aes-128-cbc aes-192-cbc aes-256-cbc rsa2048 dsa2048
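If a single instance doesn't fully load the dual-core ER-Pro8, you can also try running it across both cores (assuming the OpenSSL build on EdgeOS supports the -multi option):

sudo openssl speed -multi 2 sha256 aes-256-cbc rsa2048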

My main question: you have offload enabled... but is traffic really being offloaded?

Regular Member
Posts: 529
Registered: ‎07-21-2010
Kudos: 95
Solutions: 6

Re: Performance degradation when CPU is at 100% | IP Forwarding Engine

Offloading is enabled. The traffic traverses VLANs, but VLAN offloading is also enabled.

I do not know whether the traffic is actually being offloaded. I suspect it is not, but why?

Veteran Member
Posts: 6,096
Registered: ‎01-04-2017
Kudos: 886
Solutions: 314

Re: Performance degradation when CPU is at 100% | IP Forwarding Engine

I see you're using an older firmware version; 1.9.7 HF1 was riddled with bugs. I would suggest moving up to 1.10.0, and if you're still having issues, post your config.
Regular Member
Posts: 529
Registered: ‎07-21-2010
Kudos: 95
Solutions: 6

Re: Performance degradation when CPU is at 100% | IP Forwarding Engine

Is there any "monitor" application to see the workload of CPU / ASIC?

 

On MikroTik you can use "profiling" to see how much of the workload comes from the CPU, bridging, networking and so on. Is something similar available on EdgeOS?
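In the meantime I can at least watch the standard Linux counters, since EdgeOS is Debian-based (sketch; assuming these tools and proc files are present on the ER-Pro8):

top                    # per-process CPU usage; high system/softirq time suggests forwarding in software
cat /proc/interrupts   # interrupt counts per NIC
cat /proc/softirqs     # NET_RX / NET_TX softirq activity per core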

 

I will upgrade to 1.10.0, but that is not a trivial procedure on a production system with 1,500 customers connected to it.

Member
Posts: 200
Registered: ‎04-28-2015
Kudos: 107
Solutions: 1

Re: Performance degradation when CPU is at 100% | IP Forwarding Engine

It's worth noting that certain features also make offload unavailable for these platforms.

 

On the ER-Pro8, for example, I don't believe there's any accelerated bridging/switching, so any traffic flow that crosses a bridge interface likely hits the CPU.

 

The various QoS mechanisms surfaced in EdgeOS also disable offload for traffic matching those flows, as I understand it.

 

You have to look at the entire configuration scenario to know with certainty whether or not offload is going to apply to your traffic.

Regular Member
Posts: 529
Registered: ‎07-21-2010
Kudos: 95
Solutions: 6

Re: Performance degradation when CPU is at 100% | IP Forwarding Engine

We are just routing traffic from VLAN to VLAN.

 

So incoming traffic from the upstream carriers:

#1 is eth0.100

#2 is eth1.101

 

to our network

#1 is eth6.2200

#2 is eth7.2201

 

There is nothing else configured on that router.

Only BGP is running, to handle the eBGP sessions with our upstreams.
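So the relevant configuration boils down to something like this (sketch with placeholder addresses and AS numbers, not our real values):

configure
# upstream carrier VLANs (placeholder addresses)
set interfaces ethernet eth0 vif 100 address 198.51.100.2/30
set interfaces ethernet eth1 vif 101 address 203.0.113.2/30
# VLANs towards our network (placeholder addresses)
set interfaces ethernet eth6 vif 2200 address 192.0.2.1/25
set interfaces ethernet eth7 vif 2201 address 192.0.2.129/25
# eBGP sessions to the upstreams (placeholder AS numbers)
set protocols bgp 64500 neighbor 198.51.100.1 remote-as 64496
set protocols bgp 64500 neighbor 203.0.113.1 remote-as 64511
commit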

 

Is there a command that shows how many packets were processed by the CPU and how many by the ASIC?

Member
Posts: 200
Registered: ‎04-28-2015
Kudos: 107
Solutions: 1

Re: Performance degradation when CPU is at 100% | IP Forwarding Engine


Yes, at least some stats are available.

 

Try:

 

sudo -s

(become root)

 

echo '1' > /proc/cavium/stats

(enables cavium stat tracking)

then

 

cat /proc/cavium/stats

(gives the current raw offload processing stats; may be repeated as often as you like)

 

EDIT: I had typo'ed echo 'w' > /proc/cavium/stats

EDIT: also need root to do this, so added sudo.

Regular Member
Posts: 529
Registered: ‎07-21-2010
Kudos: 95
Solutions: 6

Re: Performance degradation when CPU is at 100% | IP Forwarding Engine


Thank you for that information. I will write a script that samples the counters every second, to verify how much bandwidth is processed by the ASIC and how much by the CPU.
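A first rough version, just timestamped snapshots of the Cavium counters every second (sketch; the exact counter names and format in /proc/cavium/stats may differ per firmware):

#!/bin/bash
# enable Cavium offload stats, then dump a timestamped snapshot every second
echo '1' | sudo tee /proc/cavium/stats > /dev/null
while true; do
    date '+%H:%M:%S'
    sudo cat /proc/cavium/stats
    echo '----'
    sleep 1
done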

 

After installing v1.10.0 I got much better results.

 

dl-fast.JPG

 

 

That is how it should be.

 

But this issue shows that previous versions of EdgeOS were buggy.

I remember the first EdgeMax promo video where people swapped out Cisco for EdgeMax and the overall performance improved. Now, six years later, the performance is finally as the video suggests. That is very sad.

 

https://www.youtube.com/watch?v=_Qyu8iaqcfk