Ubiquiti Employee
Posts: 1,244
Registered: ‎07-20-2015
Kudos: 1491
Solutions: 82

Re: Offloading-Flow randomly "jumps" from Offload-Engine to Linux / ER-Pro8 / v.1.10.0

@RcRaCk2k

> Is there a way, that the offload-engine checks for firewall / filter-rules / nat-rules on first packet and if there is no filter that will ever interact with src-ip-address and dst-ip-address to only insert a ip-forwarding-flow?

I don't think that is feasible, because it would mean that the offload engine has to make a routing decision, which is a job for the Linux routing stack.

 

As for the original problem (flows jumping from offload to Linux), we are now building a proof of concept that does not force a flow flush on FIB changes. This task is much more complicated than I initially thought. I hope that an experimental offloading engine without flow flushing will be ready next week.

 

Veteran Member
Posts: 7,781
Registered: ‎03-24-2016
Kudos: 2027
Solutions: 890

What routing decision?

All @RcRaCk2k wants is to remove some stuff from the offload hash table: protocol, source port and destination port.

 

Then all flows between a source-destination IP pair will use single entry in offload table.
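
To make the two keying schemes concrete, here is a minimal sketch. It is not the actual Cavium data structure, which is not shown in this thread; the `Packet` tuple and both key functions are hypothetical names used for illustration only. It just counts how many table entries the same traffic needs under each scheme.

```python
from collections import namedtuple

# Hypothetical packet header fields; not the real offload-engine layout.
Packet = namedtuple("Packet", "proto src_ip src_port dst_ip dst_port")


def five_tuple_key(p):
    """Key including protocol and ports (one entry per connection)."""
    return (p.proto, p.src_ip, p.src_port, p.dst_ip, p.dst_port)


def ip_pair_key(p):
    """Key with protocol and ports removed (one entry per IP pair)."""
    return (p.src_ip, p.dst_ip)


packets = [
    Packet("tcp", "10.0.0.1", 40000, "192.0.2.10", 443),
    Packet("tcp", "10.0.0.1", 40001, "192.0.2.10", 443),
    Packet("udp", "10.0.0.1", 5353, "192.0.2.10", 53),
    Packet("tcp", "10.0.0.2", 40000, "192.0.2.10", 443),
]

entries_l4 = {five_tuple_key(p) for p in packets}
entries_l3 = {ip_pair_key(p) for p in packets}
print(len(entries_l4), len(entries_l3))
```

With this sample traffic the port-aware key needs four entries and the IP-pair key needs two. Whether that helps or hurts in the real engine depends on how the table hashes and replaces entries, which this sketch does not model.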

 

Ubiquiti Employee
Posts: 1,244
Registered: ‎07-20-2015
Kudos: 1491
Solutions: 82

@16again

> What routing decision?

> All @RcRaCk2k wants is to remove some stuff from the offload hash table: protocol, source port and destination port.

I think @RcRaCk2k was talking about a totally different thing. He proposed that the offload engine would not send a packet to Linux if the packet is forwarded without modification. But in order to achieve this, the offload engine needs to:

  • Match the packet against netfilter rules to figure out whether the packet header is going to be changed
  • Do a lookup in the FIB table to find the nexthop and egress interface (that is what I called a "routing decision")

I believe that this approach is wrong: the offload engine should not mess with netfilter and the FIB, and should delegate this work to the Linux network stack.

 

> Then all flows between a source-destination IP pair will use single entry in offload table.

That's not going to work, because this approach would trigger a lot of collisions in the flow table and decrease performance.

Regular Member
Posts: 525
Registered: ‎07-21-2010
Kudos: 93
Solutions: 6

@UBNT-afomins @16again is completely correct.

 

1) Packet arrives at the network interface

2) The offloading engine has no flow entry and sends the packet to Linux

....... And now the magic happens:

3) When there is no filter rule that will ever target the src-ip / dst-ip (layer 3)

3.1) insert a layer-3 flow entry, without protocol, dst-port and src-port

 

That will not trigger any collisions.

 

Can you dump us the structure of a flow-entry?

Emerging Member
Posts: 62
Registered: ‎04-28-2014
Kudos: 26
Solutions: 1

A couple of quick questions:


Does the offload engine work with dedicated hardware in the CPU, or is it purely a software offload that is optimized for the Octeon CPU?

If it is indeed a hardware offload, is the 24K flows limit a hardware limitation, and does it change by CPU model?

Ubiquiti Employee
Posts: 1,244
Registered: ‎07-20-2015
Kudos: 1491
Solutions: 82

@gritech

> Does the offload engine work with dedicated hardware in the CPU, or is it purely a software offload that is optimized for the Octeon CPU?

The offload engine is software that is optimized for the Octeon CPU.

Veteran Member
Posts: 7,781
Registered: ‎03-24-2016
Kudos: 2027
Solutions: 890

This software vs. hardware boundary is blurred.

It wouldn't surprise me a bit if the offload engine looks way more like an FPGA / ASIC than a normal CPU.

This FPGA is programmed with microcode (= software), which configures the hardware.

Regular Member
Posts: 525
Registered: ‎07-21-2010
Kudos: 93
Solutions: 6

I also bet that there is an FPGA with shared memory instead of using the CPU.

I hope Ubiquiti will bring the OCTEON III processor (Edit: model CN7890) into their XG product line, so we can achieve full wire-speed routing with all 8x 10G ports. The new platform supports 500 Gbps and 240 billion instructions per second. I am afraid that this chip is not actually used in the XG router series.

 

https://www.cavium.com/octeon-III-CN7XXX.html

Senior Member
Posts: 3,316
Registered: ‎08-06-2015
Kudos: 1422
Solutions: 190


@RcRaCk2k wrote:

I hope Ubiquiti will bring the OCTEON III processor into their XG product line, so we can achieve full wire-speed routing with all 8x 10G ports. The new platform supports 500 Gbps and 240 billion instructions per second. I am afraid that this chip is not actually used in the XG router series.

 

https://www.cavium.com/octeon-III-CN7XXX.html


??

 

The ER-8-XG is indeed based on a 16-core Octeon III:

user@er-8-xg:~$ show hardware cpu summary
Processors 16
Cores      16
Model      Cavium Octeon III V0.2  FPU V0.0

More specifically it was noted in the story thread here that it appears to be a CN7360.

 

Regular Member
Posts: 525
Registered: ‎07-21-2010
Kudos: 93
Solutions: 6

Oh sorry, I was not specific enough: for 240 billion instructions per second you have to use the CN7890 model.

I believe that this model is not that much more expensive than the CN73XX model.

For my situation: I have bought two Juniper MX480s because of the offload issues. I will remove the ER-Pro8, because the new router models ER-4 and ER-6P have more power than the ER-Pro/ER-Pro8.

I will move all ERs to my internal network, because if your RIB is not changing, the performance is perfect.

For eBGP I will move to Juniper.

Emerging Member
Posts: 62
Registered: ‎04-28-2014
Kudos: 26
Solutions: 1

As it is software, allowing the number of flows tracked to be configured would be great for situations with clients creating a large number of sessions.

Regular Member
Posts: 525
Registered: ‎07-21-2010
Kudos: 93
Solutions: 6

How the rack looked when the Juniper was installed:


WhatsApp Image 2018-04-02 at 19.13.55.jpeg

 

How the rack looks now that the Juniper is rocking the network:

Two EdgeSwitches XG were left for switching reasons.


WhatsApp Image 2018-04-10 at 18.51.16.jpeg

 

The second Juniper was delivered yesterday:

This router will be installed in our second datacenter for location redundancy.

 

WhatsApp Image 2018-04-12 at 13.20.28.jpeg

 

If you get screwed by your router (ER-Pro8 and ER-XG in my case) and your customers get angry, an investment in such hardware is a must in order not to lose customers to the big companies. I thought that a Ubnt EdgeRouter could do that job with ease, but this was not the case. UBNT-afomins sent me a new cavium ip offload engine module that will fix the issue described in this thread, but for me the time for using EdgeRouters as border gateways for ISPs and/or datacenters is up. There are many bugs in the BGP implementation, so 99.9% uptime is a lie at this time.

 

@Ubiquiti-Devs: Change Routing-Daemon to BIRD and we will probably change our mind.

Member
Posts: 124
Registered: ‎08-24-2015
Kudos: 20

 

> @Ubiquiti-Devs: Change Routing-Daemon to BIRD and we will probably change our mind.

 

Indeed, I and others have been saying that for a while now. Time to move to BIRD, Ubiquiti, and fast!

Veteran Member
Posts: 5,441
Registered: ‎03-12-2011
Kudos: 2736
Solutions: 129

Discovered this thread from the beta release notes.

 

The overhead of enabling the cavium offload stats would be useful to know, as would having these stats exposed via SNMP so that they can be monitored more closely. The workaround in 1.10.2 is a bit of a sledge hammer approach, having to choose between poor performance and a non-trivial failover time. Perhaps in the future it could be fine-tuned a little more to only flush affected flows, or to only flush if the next hop changes (e.g., no flush is needed if a new prefix is learned or withdrawn but its next hop is the same as that of another, less specific route that already/still exists, like the default route). Until then, having better tools available to properly assess an individual router (e.g., via SNMP) would be good.
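
A rough sketch of the "only flush if the next hop changes" idea, under stated assumptions: the FIB is modelled as a plain dict of prefix to next hop, flows are `(src_ip, dst_ip)` pairs, and `lookup` and `flows_to_flush` are hypothetical helper names. None of this reflects how the real offload engine or the Linux FIB is implemented, and a real check would also have to compare the egress interface.

```python
import ipaddress


def lookup(fib, dst):
    """Longest-prefix match: return the next hop for dst, or None."""
    addr = ipaddress.ip_address(dst)
    best = None  # (prefix length, next hop) of the best match so far
    for prefix, nexthop in fib.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0]):
            best = (net.prefixlen, nexthop)
    return best[1] if best else None


def flows_to_flush(flows, old_fib, new_fib):
    """Keep only the flows whose resolved next hop differs after the update."""
    return [f for f in flows if lookup(old_fib, f[1]) != lookup(new_fib, f[1])]


flows = [("10.0.0.1", "203.0.113.5"), ("10.0.0.1", "192.0.2.9")]
old_fib = {"0.0.0.0/0": "198.51.100.1"}

# A more specific prefix is learned, but via the same next hop as the default.
same_hop = {"0.0.0.0/0": "198.51.100.1", "203.0.113.0/24": "198.51.100.1"}
# The same prefix is learned via a different next hop.
new_hop = {"0.0.0.0/0": "198.51.100.1", "203.0.113.0/24": "198.51.100.2"}

print(flows_to_flush(flows, old_fib, same_hop))
print(flows_to_flush(flows, old_fib, new_hop))
```

In the first case nothing needs flushing; in the second only the flow towards 203.0.113.5 does. The linear scan over all prefixes and all flows is only for readability and would be far too slow for a full BGP table.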

 

The other question is around the maximum number of flows. While not the root cause of this issue, it may impact other use cases. Is it still planned to make this configurable (along with larger defaults for the beefier EdgeRouter models)? I didn't see any mention of this in the 1.10.2 release notes. With the current default taking only 1.3MB of RAM, it seems like this number could be much larger, even on lower-end models (when it's needed, anyway). As with the flush issue, having visibility into the metrics for this would be really handy as well, to be able to make appropriate decisions.

Member
Posts: 230
Registered: ‎12-12-2010
Kudos: 100
Solutions: 5

I wasn't aware of this thread either until I found it in the release notes for 1.10.2. The problem has been known to us for a long time, though. We had it occur regularly with OSPF route changes, even if they were only withdrawing and re-announcing a /30 customer route. Actually, we came to the conclusion that offload handling was interrupted for some seconds just by looking at the throughput taking a dive to ~350 Mbps on ER-8s, which only became visible once we set up 1-second-interval SNMP graphs for some selected links.

 

I didn't dare to report this issue because we are mostly running 1.6.0 firmware still - and the expected answer would have been that we ought to upgrade to newest firmware. Which, as we know now, wouldn't have solved the problem ...

 

The side effect of the "fix" in 1.10.2 which might send packets on the wrong interface for up to 12 seconds is quite unfortunate - at least if the 0.0.0.0/0 route changes - because it will likely cause temporary routing loops where packets ping-pong at max speed (eating all bandwidth and CPU on a router) which in turn can cause OSPF to miss hellos and react with even more route flaps. So if feasible, please consider whether a change of the default route could cause an offload-flush immediately, while other route changes won't affect offload as per the fix.
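
The requested policy (flush immediately only when the default route changes) could be expressed as a simple check like the one below. This is only an illustration of the rule as described in this post; `should_flush_immediately` is a hypothetical name, and nothing here is taken from the firmware.

```python
import ipaddress


def should_flush_immediately(changed_prefix):
    """Return True only for a default route (0.0.0.0/0 or ::/0)."""
    return ipaddress.ip_network(changed_prefix).prefixlen == 0


for prefix in ("0.0.0.0/0", "::/0", "10.1.2.0/30"):
    print(prefix, should_flush_immediately(prefix))
```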

 

Thomas Giger
true global communications GmbH
www.tgnet.de
Regular Member
Posts: 525
Registered: ‎07-21-2010
Kudos: 93
Solutions: 6

I think that the offload engine needs some refactoring. The module has to do more complex things to get the best performance.

 

@UBNT-afomins please enable output from /proc/cavium/ipv4/flows so we can see the current flows.

 

And perhaps a /proc/cavium/ipv4/flow_changes pipe/FIFO where we can see adds / removes / changes, so another application can make a nice GUI where we can see the flows.

 

Performance tips:

  • If no connection tracking will ever apply to a SRC-IP and DST-IP, only install a flow entity for Layer3, not for Layer4
    That means that a rule in the NAT table never applies to 0.0.0.0

  • Create a RAW-table target to let the user choose whether to forward on Layer4 or Layer3
    iptables -t raw -A OUTPUT -o eth2 -j cavium_nf --layer3
    iptables -t raw -A OUTPUT -d 172.16.1.0/24 -j cavium_nf --layer3
    iptables -t raw -A OUTPUT -o eth3 -j cavium_nf --layer4

 

Layer3-Entity:

  • In Interface
  • Out Interface
  • SRC IP-Address
  • DST IP-Address
  • Nexthop MAC-Address

Layer4-Entity:

  • Proto
  • In Interface
  • Out Interface
  • SRC IP-Address
  • optional SRC-NAT-IP-Address
  • SRC Port-Address
  • optional SRC-NAT-Port-Address
  • DST IP-Address
  • optional DST-NAT-IP-Address
  • DST Port-Address
  • optional DST-NAT-Port-Address
  • Nexthop MAC-Address
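
The two proposed entities could be written down as records like the following. This is only a sketch of the field lists above: the class and field names are made up for illustration, and the real flow-entry structure has not been published in this thread.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Layer3Flow:
    """Proposed layer-3 entity: no protocol, no ports, no NAT fields."""
    in_if: str
    out_if: str
    src_ip: str
    dst_ip: str
    nexthop_mac: str


@dataclass(frozen=True)
class Layer4Flow:
    """Proposed layer-4 entity: adds protocol, ports and optional NAT rewrites."""
    proto: str
    in_if: str
    out_if: str
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    nexthop_mac: str
    src_nat_ip: Optional[str] = None
    src_nat_port: Optional[int] = None
    dst_nat_ip: Optional[str] = None
    dst_nat_port: Optional[int] = None


# Example values are invented (documentation address ranges, dummy MAC).
l3 = Layer3Flow("eth0", "eth2", "10.0.0.1", "192.0.2.10", "00:00:5e:00:53:01")
l4 = Layer4Flow("tcp", "eth0", "eth3", "10.0.0.1", 40000,
                "192.0.2.10", 443, "00:00:5e:00:53:01",
                src_nat_ip="198.51.100.7", src_nat_port=50000)
print(l3)
print(l4)
```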

 

Our BGP Border Routers do not need Connection Tracking at all.

We need routing performance without NAT.

Ubiquiti Employee
Posts: 1,244
Registered: ‎07-20-2015
Kudos: 1491
Solutions: 82

@NVX

> The overhead of enabling the cavium offload stats would be useful to know, as would having these stats exposed via SNMP so that they can be monitored more closely
My tests show that there's no overhead at all, which is why we will enable stats by default.

 

> The workaround in 1.10.2 is a bit of a sledge hammer approach, having to choose between poor performance and a non-trivial failover time
For the vast majority of use cases, a "non-trivial failover time" during failover is a much better option than bad performance caused by updates to the routing table.

 

> Perhaps in the future it could be fine tuned a little more to only flush affected flows
This was our first choice, but it turned out to be a non-trivial task to implement "smart flow flushing", which is why we chose the "sledge hammer approach".

 

> The other question is around the maximum number of flows... Is it still planned to make this configurable
Yes, we plan to make the size of the flow table configurable in a future version.

 

BTW, in `1.10.2` we added a cavium counter that is incremented when an active flow is squeezed out of the offloading table (ipv4_create_flow_not_found_replaced_non_expired).

 

You can check this counter like so:

  1. Enable cavium stats once:
    sudo bash -c "echo 1 > /proc/cavium/stats"
  2. Show IPV4 flow creation stats:
    sudo cat /proc/cavium/stats|grep "IPv4 flow creation dynamcis" -A 7
    IPv4 flow creation dynamcis
    =============================
    
    ipv4_create_flow_found                          256
    ipv4_create_flow_found_replaced                 256
    ipv4_create_flow_not_found                      132
    ipv4_create_flow_not_found_replaced_expired     132
    ipv4_create_flow_not_found_replaced_non_expired 0   <---- !!! If this counter is growing then size of offloading table is not enough !!!
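
If you want to watch this counter from a script rather than by eye, something like the following could work. It is a sketch that assumes the stats text has the "name value" layout shown above; `parse_counters` and `table_too_small` are hypothetical helper names, and the second reading is invented to show the comparison.

```python
def parse_counters(text):
    """Collect 'name value' lines into a dict, skipping headers and rulers."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def table_too_small(before, after,
                    name="ipv4_create_flow_not_found_replaced_non_expired"):
    """True if the counter grew between two readings."""
    return after.get(name, 0) > before.get(name, 0)


# Values copied from the output above; the header spelling is kept as printed.
SAMPLE = """IPv4 flow creation dynamcis
=============================

ipv4_create_flow_found                          256
ipv4_create_flow_found_replaced                 256
ipv4_create_flow_not_found                      132
ipv4_create_flow_not_found_replaced_expired     132
ipv4_create_flow_not_found_replaced_non_expired 0
"""

first = parse_counters(SAMPLE)
# Hypothetical later reading in which active flows were evicted.
second = dict(first, ipv4_create_flow_not_found_replaced_non_expired=5)
print(table_too_small(first, first), table_too_small(first, second))
```

On a router you would feed it two readings of /proc/cavium/stats taken some time apart instead of the sample text.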

 

> As with the flush issue, having visibility into the metrics for this would also be really handy as well to be able to make appropriate decisions.
You can check flush statistics like so:

 

ubnt@hank:~$ sudo cat /proc/cavium/stats|grep "Flow cache flushes" -A 7
Flow cache flushes
=============================

ipv4_flushes 0
ipv6_flushes 0

 


@tma
> The side effect of the "fix" in 1.10.2 which might send packets on the wrong interface for up to 12 seconds is quite unfortunate -
> at least if the 0.0.0.0/0 route changes - because it will likely cause temporary routing loops where packets ping-pong at max speed (eating all bandwidth and CPU on a router)
> which in turn can cause OSPF to miss hellos and react with even more route flaps
I don't think that it will produce routing loops, because OSPF signaling packets (or any other locally originated traffic) are not handled by the offloading engine, and OSPF hellos will always be sent from the correct interface.

 

@RcRaCk2k

> please enable output from /proc/cavium/ipv4/flows so we can see the current flows.
Ok, we shall consider enabling /proc/cavium/ipv4/flows in the next firmware.

Veteran Member
Posts: 5,441
Registered: ‎03-12-2011
Kudos: 2736
Solutions: 129


@UBNT-afomins wrote:

 

> The workaround in 1.10.2 is a bit of a sledge hammer approach, having to choose between poor performance and a non-trivial failover time
For the vast majority of use cases, a "non-trivial failover time" during failover is a much better option than bad performance caused by updates to the routing table.

 

> Perhaps in the future it could be fine tuned a little more to only flush affected flows
This was our first choice, but it turned out to be a non-trivial task to implement "smart flow flushing", which is why we chose the "sledge hammer approach".

 

 


Yeah that's fair, especially for a short-term fix, but are there plans to revisit this in a later version?

Veteran Member
Posts: 5,441
Registered: ‎03-12-2011
Kudos: 2736
Solutions: 129


@UBNT-afomins wrote:

 

BTW, in `1.10.2` we added a cavium counter that is incremented when an active flow is squeezed out of the offloading table (ipv4_create_flow_not_found_replaced_non_expired).

 

You can check this counter like so:

  1. Enable cavium stats once:
    sudo bash -c "echo 1 > /proc/cavium/stats"
  2. Show IPV4 flow creation stats:
    sudo cat /proc/cavium/stats|grep "IPv4 flow creation dynamcis" -A 7
    IPv4 flow creation dynamcis
    =============================
    
    ipv4_create_flow_found                          256
    ipv4_create_flow_found_replaced                 256
    ipv4_create_flow_not_found                      132
    ipv4_create_flow_not_found_replaced_expired     132
    ipv4_create_flow_not_found_replaced_non_expired 0   <---- !!! If this counter is growing then size of offloading table is not enough !!!

  


Neat stat. Just checked one of my routers and this is increasing. Will be good when the table size becomes configurable.

Ubiquiti Employee
Posts: 1,244
Registered: ‎07-20-2015
Kudos: 1491
Solutions: 82

@NVX
> Perhaps in the future it could be fine tuned a little more to only flush affected flows
> Yeah that's fair, especially for a short-term fix, but are there plans to revisit this in a later version?
Yes, we will consider implementing a "smarter" approach to flow flushing after we release 2.0.0.

 

> ipv4_create_flow_not_found_replaced_non_expired 0 <---- !!! If this counter is growing then size of offloading table is not enough !!!

> Just checked one of my routers and this is increasing. Will be good when the table size becomes configurable.

This might be an issue on the more powerful ER models: ER-4/ER-pro/ER-infinity. We will increase the size of the offloading flow table in 1.10.4.

 

  1. What's your ER model?
  2. How many LAN clients is it serving?
  3. What's the value of "ipv4_create_flow_not_found_replaced_expired" vs "ipv4_create_flow_not_found_replaced_non_expired" on that router?