Member
Posts: 189
Registered: ‎09-01-2017
Kudos: 50
Solutions: 9

USG-3P Smart Queue testing results

I ran a few tests to work out the optimal Smart Queue rate settings on my USG-3P. DPI is enabled. Speeds are in Mbps, ping times are in ms. Tests are quite variable, so don't read too much into the fine detail.

 

Upload

Limit  Speed (Mbps)  Min ping  10%ile  Median  Average  90%ile  Max
Off    17            10        40      65      61       74      79
20M    17            10        38      64      60       77      80
16M    14            11        11      12      12       13      17
15M    13            10        10      11      11       13      14

 

This shows that I have a 17Mbps unrestricted upload; setting the Smart Queue limit above that (20M) has no effect. Any setting below that has a small effect on throughput and gives a significant improvement in latency / ping consistency. I'll be using the 16M setting.

 

Download

Limit  Speed (Mbps)  Min ping  10%ile  Median  Average  90%ile  Max
Off    69            11        44      90      99       166     188
160M   35            10        11      14      16       23      36
72M    31            11        11      13      13       16      17
64M    32            11        11      13      14       16      22
48M    31            11        11      13      14       15      22

 

My download is 69Mbps unrestricted with fairly large, but not disastrous, variations in latency. Applying any level of Smart Queue, even far above line rate (160M here) restricts the download speed to a cap of around 35Mbps and improves latency. I believe this is a hardware limit of the USG. Any setting below line rate gives very good latency control. I'll be using the 72M setting going forward as I value response more than raw throughput, but it's a shame that the USG-3P cannot better 35Mbps.

 

Notes

I'm running beta firmware on the USG, but I doubt that makes a big difference: IPS/IDS is disabled. Connection is a PlusNet 80/20 FTTC; test is using betterspeedtest from here on a Banana Pi connected to the USG via a TP-Link unmanaged gigabit switch.
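
For anyone wondering what the ping columns in the tables mean, here is a minimal sketch that computes the same six figures from a list of raw ping times. The function name ping_stats and the nearest-rank percentile method are assumptions made for this example; the test script may calculate its percentiles slightly differently.

```sh
# ping_stats: read one ping time (ms) per line on stdin and print
# min, 10th percentile, median, average, 90th percentile and max.
# Percentiles use the nearest-rank method (an assumption for this sketch).
ping_stats() {
    sort -n | awk '
        { v[NR] = $1; sum += $1 }
        # nearest-rank index for percentile p over n samples: ceil(p*n/100)
        function rank(p, n,    r) {
            r = int((p * n + 99) / 100)
            return (r < 1) ? 1 : r
        }
        END {
            if (NR == 0) exit 1
            printf "min=%s p10=%s median=%s avg=%.1f p90=%s max=%s\n",
                v[1], v[rank(10, NR)], v[rank(50, NR)], sum / NR,
                v[rank(90, NR)], v[NR]
        }'
}
```

Feed it one number per line, for example the time= values extracted from a ping log. A large gap between the median and the 90th percentile is the "bufferbloat" signature that Smart Queue is meant to remove.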

 

me@USG:~$ show ubnt offload

IP offload module   : loaded
IPv4
  forwarding: enabled
  vlan      : enabled
  pppoe     : enabled
  gre       : enabled
IPv6
  forwarding: enabled
  vlan      : enabled
  pppoe     : disabled

IPSec offload module: loaded

Traffic Analysis    :
IPv4
  forwarding: enabled
  vlan      : enabled
  pppoe     : enabled
  gre       : enabled
  export    : disabled
  dpi       : enabled
IPSec offload module: loaded

 

Home: USG-3P | UAP-AC-Lite
Church: USG-Pro | US-24-250W | UAP-AC-Lite x 6 | UAP-AC-IW x 2
Controller: Linode 4GB VPN on Ubuntu 18.04
Member
Posts: 262
Registered: ‎09-08-2016
Kudos: 60
Solutions: 9

Re: USG-3P Smart Queue testing results

Yes, you are confirming the necessity of updated unifi firewalls.

 

When using VoIP on an ADSL connection, smartqueues should be turned on for the latency. However, the current firewalls can do no more than 200Mbps (USG-Pro), which is too little for a lot of people nowadays (in Western Europe).

 

Common internet connections for SMBs in the Netherlands already run at a minimum of 250Mbit/s (like ours). Turning on smartqueues cripples the download speed to approx. 20% of what we pay for. (Even the slowest connection we can get is 60Mbit/s, far more than the USG3 with smartqueues can handle.)

 

We would definitely buy a USG-8-XG to solve this, if these were only available!

Member
Posts: 189
Registered: ‎09-01-2017
Kudos: 50
Solutions: 9

Re: USG-3P Smart Queue testing results

I note that the USG seems to be limited by CPU load from ksoftirqd, which is only running on CPU0.

 

I have tried editing /proc/irq/24/smp_affinity which determines that, but I cannot write to that "file" even as superuser. I have also tried installing Debian MIPS irqbalance which runs apparently successfully but does not influence the situation.

 

@UBNT-cmb— would it give any benefits to allow both cores to handle the IRQ load? Happy to try things out.

Ubiquiti Employee
Posts: 5,012
Registered: ‎08-08-2016
Kudos: 5420
Solutions: 346

Re: USG-3P Smart Queue testing results

wrote:

I note that the USG seems to be limited by CPU load from ksoftirqd, which is only running on CPU0.

 

I have tried editing /proc/irq/24/smp_affinity which determines that, but I cannot write to that "file" even as superuser. I have also tried installing Debian MIPS irqbalance which runs apparently successfully but does not influence the situation.

 

@UBNT-cmb— would it give any benefits to allow both cores to handle the IRQ load? Happy to try things out.


We get what Linux offers there. Generally under really high load you have ksoftirqd consuming a good deal of both cores. For some specific things it probably can't do so. There isn't anything that prevents it from using both in general. AFAIK there isn't anything you can do to improve that, but feel free to poke at it.

Emerging Member
Posts: 62
Registered: ‎09-26-2017
Kudos: 10
Solutions: 1

Re: USG-3P Smart Queue testing results

Well my WAN is only 25Mbps so the 35Mbps cap is not a big deal --- BUT I can not get DPI and Smart Queue to function at the same time. It seems that it is still one or the other. Also, why does the problem persist where enabling Smart Queues completely breaks the traffic monitoring? Whenever I enable SQs, my per-device and total data transfer stats are completely wrong. As in, not counting 10% of what actually goes through the pipe. It's frustrating and ridiculous. Pretty basic to have an accurate and complete picture of total data use and per-device data use, no?

 

As for SQs in general, yes, the USG-3P seems to do well with them. It is able to make my lousy 25/2 DSL at least "usable", where it is completely unusable with them off (latency spikes above 500ms on a routine basis, etc.).

 

I'm using the latest firmware which purports to allow simultaneous DPI and SQs but for the life of me, I can't get them to run at the same time. I'd like to, so if I'm missing something obvious, please advise.

Member
Posts: 189
Registered: ‎09-01-2017
Kudos: 50
Solutions: 9

Re: USG-3P Smart Queue testing results

wrote:

We get what Linux offers there. Generally under really high load you have ksoftirqd consuming a good deal of both cores. For some specific things it probably can't do so. There isn't anything that prevents it from using both in general. AFAIK there isn't anything you can do to improve that, but feel free to poke at it.


I have done; herewith my results.

 

Firstly, it does not seem to be possible to set the eth0 IRQ handler to be both cores. I can only set one or the other. All other IRQs are handled by both (/proc/irq/*/smp_affinity is 3 for both cores); but IRQ24 (eth0) can only have smp_affinity set to 1 (for CPU0) or 2 (for CPU1).

 

Secondly, and most surprisingly, there seems to be a significant improvement in using CPU1 to handle the IRQ instead of CPU0. I ran a set of tests with the same method as above, this time with static Smart Queue settings, and varied which core handled the IRQ. I ran three on CPU0, three on CPU1 then repeated. Results are shown below.

 

I can't think why this would be, but it does seem to be real. I have no way of knowing, but it seems logical that if both cores could be brought into play for IRQ24, we could see some real performance improvement on Smart Queue throughput, but this may be a hardware restriction.

 

To try this for yourself, ssh into your USG, and issue "sudo su" then "echo 1 > /proc/irq/24/smp_affinity" for CPU0; same but with "2" not "1" for CPU1.
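
A minimal sketch of the same step wrapped in a function, so that a typo cannot write an unexpected mask. The function name set_irq_affinity and its optional file argument are assumptions made for this example; IRQ 24 is the eth0 interrupt on the USG-3P described in this thread and may differ on other hardware.

```sh
# set_irq_affinity CPU [FILE]
# Pin an IRQ to a single CPU by writing its bitmask (1 << CPU) to the
# smp_affinity file. FILE defaults to IRQ 24 (eth0 on the USG-3P in this
# thread); pass another path to experiment safely on an ordinary file.
set_irq_affinity() {
    cpu=$1
    file=${2:-/proc/irq/24/smp_affinity}
    case $cpu in
        0|1) ;;
        *) echo "cpu must be 0 or 1" >&2; return 1 ;;
    esac
    mask=$((1 << cpu))
    # The kernel expects a hexadecimal mask; 1 and 2 are the same in hex.
    printf '%x\n' "$mask" > "$file" || return 1
    echo "wrote $mask to $file"
}
```

Run as root, set_irq_affinity 1 moves the eth0 interrupt to CPU1 and set_irq_affinity 0 moves it back. Values written under /proc like this are not kept across a reboot.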

[Attachment: IRQ.png]
Emerging Member
Posts: 61
Registered: ‎05-11-2017
Kudos: 23
Solutions: 5

Re: USG-3P Smart Queue testing results

@Troon

 

Awesome work! Test results are identical for me as well. I get 38Mbps regardless of the CPU used, with or without DPI in my case. On a 150/10 connection 38Mbps is a nasty hit. Latency in my case did not improve. Avg to speedtest.net was 38ms with "Cox communications".

 

This is to a testing server that is less than 40 miles physically from me. In my case latency was the same with any firewall that I've used in the past few years. Ever since the ISP tripped on content inspection it's been a total crap show for a good connection.

Member
Posts: 189
Registered: ‎09-01-2017
Kudos: 50
Solutions: 9

Re: USG-3P Smart Queue testing results

More interesting results, relating to my discovery of a significant ~10% performance increase by switching which CPU handles the eth0 interrupts...

 

I ran the test again, this time logging how many eth0 and timer interrupts were handled by each core (cat /proc/interrupts before each test). My hypothesis was that CPU0 was being penalized by having to handle both network and timer interrupts.

 

But no.

 

The table below shows:

  1. Core 1 (the one that gives better throughput when handling eth0, except for the spurious run 8) handles roughly twice as many interrupts as Core 0 (blue cells) — for the same test!
  2. Roughly the same number of timer interrupts occur on the same core as the network interrupts (orange).
  3. More than twice the number of timer interrupts occur on Core 0 when core 1 is handling network versus the other way around (pink).

The obvious next step would be to restrict the timer interrupts to the "other" core to the network handler, but I cannot adjust that — the timer interrupt is unchangeably bound to both cores.
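
The per-core counting described above can be sketched roughly as follows: take a snapshot of the counters before and after a test run and subtract. The function names are made up for this example, and the parsing assumes the usual /proc/interrupts layout (IRQ label, one count column per CPU, description ending in the device name), which has not been checked against the USG's own output.

```sh
# irq_counts NAME [FILE]
# Print the per-CPU interrupt counts for the line of /proc/interrupts whose
# last field is NAME (e.g. eth0). Assumes the usual layout: an IRQ label,
# one count column per CPU, then a description ending in the device name.
irq_counts() {
    awk -v name="$1" '
        NR == 1 { ncpu = NF; next }          # header row: CPU0 CPU1 ...
        $NF == name {
            for (i = 1; i <= ncpu; i++)
                printf "%s%s", $(i + 1), (i < ncpu ? " " : "\n")
        }' "${2:-/proc/interrupts}"
}

# irq_delta "BEFORE" "AFTER"
# Subtract two outputs of irq_counts to get the interrupts handled
# by each CPU during a test.
irq_delta() {
    echo "$1 $2" | awk '{
        n = NF / 2
        for (i = 1; i <= n; i++)
            printf "%s%d", (i > 1 ? " " : ""), $(i + n) - $i
        print ""
    }'
}
```

Usage would be along the lines of before=$(irq_counts eth0), run the speed test, then irq_delta "$before" "$(irq_counts eth0)". The same works for the timer line if its last field is timer.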

[Attachment: ints.PNG]
Member
Posts: 189
Registered: ‎09-01-2017
Kudos: 50
Solutions: 9

Re: USG-3P Smart Queue testing results

More playing, managed to eke out another 6% speed increase, now comfortably running over 40Mbps with Smart Queues and DPI enabled; in fact, the USG's own speed test reports over 50Mbps.

 

I was looking at the ethernet driver module, octeon-ethernet. One of its tuneable parameters is called rx_cpu_factor, which is described thus:

 

"rx_cpu_factor:Control how many CPUs are used for packet reception. Larger numbers result in fewer CPUs used. (int)"

 

By default, it is set to 8. For a laugh, I ran some speed tests (same hardware and method as above) with it set to this default, and 2 which is a lower number plucked out of thin air. IRQ set to the faster Core 1 as above.

 

Results below. To try this yourself, ssh into the USG and issue:

 

 

sudo bash -c 'echo 2 > /sys/module/octeon_ethernet/parameters/rx_cpu_factor'

I haven't seen any obvious downsides to this, but no guarantees that you won't break something.
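
A slightly more cautious variant is sketched here: it reports the previous value before changing it, so the default can be put back by hand. The function name set_rx_cpu_factor and the optional file argument are assumptions made for this example; the path is the one quoted above.

```sh
# set_rx_cpu_factor VALUE [FILE]
# Write VALUE to the rx_cpu_factor module parameter, printing the previous
# value first so it can be restored later. FILE defaults to the path quoted
# above; pass another path to try the function on an ordinary file.
set_rx_cpu_factor() {
    file=${2:-/sys/module/octeon_ethernet/parameters/rx_cpu_factor}
    [ -w "$file" ] || { echo "cannot write $file (are you root?)" >&2; return 1; }
    old=$(cat "$file")
    echo "$1" > "$file" || return 1
    echo "rx_cpu_factor: $old -> $(cat "$file")"
}
```

As with other values written under /sys, the change does not persist across a reboot, so a bad setting can be undone by restarting the USG.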

 

 

[Attachment: rx_cpu_factor.PNG]
Regular Member
Posts: 489
Registered: ‎07-20-2013
Kudos: 242
Solutions: 22

Re: USG-3P Smart Queue testing results

Would really like to read @UBNT-cmb's input on these performance increases. Amazing. Good work Troon. 

Member
Posts: 189
Registered: ‎09-01-2017
Kudos: 50
Solutions: 9

Re: USG-3P Smart Queue testing results

 

A+ for bufferbloat at over 44Mbps. That's with Smart Queues + DPI, hardware offload off (or DPI doesn't work), IRQ24 handled by Core 1 and rx_cpu_factor set to 2. I'm in the UK hence the fairly high pings to DSLReports' US-based servers.

 

Member
Posts: 189
Registered: ‎09-01-2017
Kudos: 50
Solutions: 9

Re: USG-3P Smart Queue testing results

So what's the best setting for rx_cpu_factor? Looks like anything below the default 8 improves performance (note non-zero y-origin: the increase is 5–10%):

 

 

[Attachment: rxs.PNG]
Member
Posts: 189
Registered: ‎09-01-2017
Kudos: 50
Solutions: 9

Re: USG-3P Smart Queue testing results

Found even more speed, this time thanks to RFS. The commands to turn it on are, as root:

 

echo 3 > /sys/class/net/eth0/queues/rx-0/rps_cpus
echo 2048 > /proc/sys/net/core/rps_sock_flow_entries
echo 2048 > /sys/class/net/eth0/queues/rx-0/rps_flow_cnt

To turn off, replace the 3 and the 2048 with 0. The usual test (3 on, 3 off, repeat) gives about another 7% on top of the above (and the USG's own speed test reports 56/13!):

 

 

[Attachment: RFS.PNG]
New Member
Posts: 18
Registered: ‎03-21-2017
Kudos: 3

Re: USG-3P Smart Queue testing results

This is interesting. What does the CPU usage look like?
Member
Posts: 189
Registered: ‎09-01-2017
Kudos: 50
Solutions: 9

Re: USG-3P Smart Queue testing results

Both cores are close to 100% during the download tests — the little USG's speed is CPU-bound with smart queues turned on. With less traffic (my connection-limited upload, for example) the CPU usage drops right off. 

Emerging Member
Posts: 61
Registered: ‎05-11-2017
Kudos: 23
Solutions: 5

Re: USG-3P Smart Queue testing results

What's the temp like when doing the download? Let's say you run a download for 6 hours: does it cook the unit? At the end of the guide it says it would be bound to a single CPU if it's coming from one sender; what effect would that have with a torrent?

 

That's a really nice find; it makes me wonder how the connection was the rest of the time. Reading the guide, it's close to RSS on a Windows box, where you don't notice it until the machine is being hit hard. RSS has advantages that RFS seems to miss. When you do smaller things like general web surfing for one person, does it hesitate at all, or does it just act normally?

 

What does the memory usage look like under load? With 5.7.x+ IDS/IPS will be introduced, and this little guy has almost no memory to use.

 

@Troon Please keep tinkering! 

Member
Posts: 189
Registered: ‎09-01-2017
Kudos: 50
Solutions: 9

Re: USG-3P Smart Queue testing results

I have no way to measure the temperature and no desire to cook my USG.

 

There's plenty of memory in the 3P. I have tried IPS, and memory still isn't a problem. It's even more demanding of the CPU, however: I was getting under 30Mbps.

 

The problem with the current IPS implementation is that my wired devices connected via a dumb switch disappear from the controller, so I can't use this yet. Once the ubnt guys fix that, I'll have a tinker to see what improvements I can make. 

 

I have found about 30% download speed increase in this thread — I'll keep looking. There are no apparent problems with my changes: everything seems fast and smooth. 

Member
Posts: 189
Registered: ‎09-01-2017
Kudos: 50
Solutions: 9

Re: USG-3P Smart Queue testing results

Next, I wondered if there was any difference between RPS and RFS. RFS is an extension to RPS and would be expected to perform better on a typical multi-core system. The USG isn't a typical system though, and by default, neither is enabled. I've shown above that RFS is better than nothing, but is it better than plain RPS?

 

As above, to enable RPS we do (as root):

 

echo 3 > /sys/class/net/eth0/queues/rx-0/rps_cpus

to allow both CPU cores to share the workload (3 is 11 in binary: that's 01 for core 0 + 10 for core 1). Set it back to 0 to disable.
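
The bitmask arithmetic can be sketched as a small helper. The name cpus_to_mask is made up for this example; the output is hexadecimal, which is what rps_cpus expects.

```sh
# cpus_to_mask CPU...
# Print the hexadecimal bitmask with one bit set per listed CPU number,
# e.g. CPU0 -> bit 0 (value 1), CPU1 -> bit 1 (value 2).
cpus_to_mask() {
    mask=0
    for cpu in "$@"; do
        mask=$((mask | (1 << cpu)))
    done
    printf '%x\n' "$mask"
}
```

So cpus_to_mask 0 1 gives the 3 used above, and calling it with no arguments gives 0, the value that disables RPS.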

 

Then to enable RFS, we also do:

 

echo 2048 > /proc/sys/net/core/rps_sock_flow_entries
echo 2048 > /sys/class/net/eth0/queues/rx-0/rps_flow_cnt

...or set them back to 0 to disable. 2048 is a guess, and should be the maximum number of connections you'd expect divided by the number of queues (and must be a power of two) — and the USG is limited to one queue. 32768 is recommended as a reasonable setting for a "medium-sized server": my small domestic network is unlikely to need that much hence my choice.
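
The sizing rule just described (expected connections, rounded up to a power of two, divided by the number of receive queues) can be sketched like this. Both function names are assumptions made for this example.

```sh
# next_pow2 N: print the smallest power of two that is >= N
next_pow2() {
    p=1
    while [ "$p" -lt "$1" ]; do
        p=$((p * 2))
    done
    echo "$p"
}

# rfs_sizes CONNECTIONS QUEUES
# Print "entries per_queue": a value for rps_sock_flow_entries and the
# matching per-queue rps_flow_cnt derived from it.
rfs_sizes() {
    entries=$(next_pow2 "$1")
    echo "$entries $((entries / $2))"
}
```

With one queue, as on the USG, both numbers are the same: rfs_sizes 1500 1 prints 2048 2048.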

 

A quick test showed there was no clear distinction between the two options, so I ran a few more tests than before. Again, I'm alternating every three tests to avoid biasing one test due to slowly changing ISP contention, or something like that.

 

The graph below shows no clear winner. A Student's t-test on the data shows a 70% probability that RFS is better, which is far from statistically significant. I'm keeping it turned on, though. Memory usage hovered around 36% throughout.
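
For anyone wanting to repeat the comparison, a minimal sketch of the pooled two-sample (Student's) t statistic follows. It takes two files with one throughput figure per line; the function name t_stat is made up for this example, and turning t into a probability still needs a t-distribution table or a stats package, which is not shown here.

```sh
# t_stat FILE_A FILE_B
# Print the pooled two-sample Student's t statistic and its degrees of
# freedom for two files containing one measurement per line.
# Needs at least two values in each file.
t_stat() {
    awk '
        FNR == NR { na++; sa += $1; qa += $1 * $1; next }   # first file
                  { nb++; sb += $1; qb += $1 * $1 }         # second file
        END {
            if (na < 2 || nb < 2) exit 1
            ma = sa / na; mb = sb / nb
            va = (qa - na * ma * ma) / (na - 1)   # sample variances
            vb = (qb - nb * mb * mb) / (nb - 1)
            sp2 = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)   # pooled variance
            printf "t=%.3f df=%d\n", (ma - mb) / sqrt(sp2 * (1 / na + 1 / nb)), na + nb - 2
        }' "$1" "$2"
}
```

A t value close to zero relative to the degrees of freedom means the two settings cannot be told apart from the run-to-run noise.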

 

 

[Attachment: rfsrps.PNG]
New Member
Posts: 4
Registered: ‎12-03-2017
Kudos: 1

Re: USG-3P Smart Queue testing results

This is great work, thanks! I tried this on my USG which was struggling with smart queues and definitely saw a significant speed improvement.

 

Just to make it easier for others who find this page, I've copied all the commands you provided below:

 

echo 3 > /sys/class/net/eth0/queues/rx-0/rps_cpus
echo 2048 > /proc/sys/net/core/rps_sock_flow_entries
echo 2048 > /sys/class/net/eth0/queues/rx-0/rps_flow_cnt
echo 2 > /sys/module/octeon_ethernet/parameters/rx_cpu_factor
echo 2 > /proc/irq/24/smp_affinity
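
The same five settings as a function, sketched with a check that each file exists before writing to it. The name apply_tuning and the optional ROOT prefix are assumptions made for this example; the prefix only exists so the function can be tried against a scratch directory before running it for real as root.

```sh
# apply_tuning [ROOT]
# Write each value from the list above to its file, skipping (and reporting)
# any file that does not exist. ROOT is an optional path prefix, hypothetical
# and only meant for dry runs against a scratch directory.
apply_tuning() {
    root=${1:-}
    rc=0
    while read -r value path; do
        if [ -e "$root$path" ]; then
            echo "$value" > "$root$path" || rc=1
        else
            echo "skipping missing $root$path" >&2
            rc=1
        fi
    done <<'EOF'
3 /sys/class/net/eth0/queues/rx-0/rps_cpus
2048 /proc/sys/net/core/rps_sock_flow_entries
2048 /sys/class/net/eth0/queues/rx-0/rps_flow_cnt
2 /sys/module/octeon_ethernet/parameters/rx_cpu_factor
2 /proc/irq/24/smp_affinity
EOF
    return $rc
}
```

None of these settings survive a reboot, so they would need to be reapplied after every restart of the USG.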

Member
Posts: 189
Registered: ‎09-01-2017
Kudos: 50
Solutions: 9

Re: USG-3P Smart Queue testing results

Looking good.