Thursday - last edited Thursday
I bought 2 ER-X routers with the intent of chaining them together for port forwarding while using hardware offloading. However, I am having an issue where both routers crash when I use hardware offload NAT.
My particular setup is that I have 2 ER-X routers, one of which (let's call it A) is connected to the internet on its WAN port, and the other (let's call it B) is connected to A. I have A set to port forward port 80 to B, and B set to port forward port 80 to a web server host on its internal network. I have hairpin NAT and hardware offload NAT enabled on both routers.
When I attempt to download a file from the web server on any machine that is connected to router B, using router A's WAN IP address, both routers A and B stop responding. When the routers stop responding, they don't respond to ping, don't route any traffic, and each LED (on both routers) that corresponds to an ethernet port where a cable is connected begins to flash synchronously at about 2 Hz.
Specifically, the problem occurs when I am downloading a test file from the web server. The file starts to download until the download speed reaches about 50MB/s (400Mbps). At that point, all traffic stops.
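For anyone trying to reproduce this, a small script can log the transfer rate and show the point where traffic stops. This is only a minimal sketch, not something used in the original test: the URL is a placeholder, and the helper names (`mb_per_s_to_mbps`, `download_and_report`) and the 5-second stall timeout are assumptions made for illustration.

```python
import time
import urllib.request


def mb_per_s_to_mbps(mb_per_s):
    """Convert megabytes per second to megabits per second (1 byte = 8 bits)."""
    return mb_per_s * 8


def download_and_report(url, chunk_size=1 << 20, stall_timeout=5.0):
    """Download `url`, printing the rate roughly once per second.

    Returns the peak rate in MB/s seen before the transfer ended or stalled.
    A socket read that blocks longer than `stall_timeout` is reported as a stall.
    """
    peak = 0.0
    window_bytes = 0
    window_start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=stall_timeout) as resp:
            while True:
                chunk = resp.read(chunk_size)
                if not chunk:
                    break  # end of file
                window_bytes += len(chunk)
                elapsed = time.monotonic() - window_start
                if elapsed >= 1.0:
                    rate = window_bytes / elapsed / 1e6  # MB/s
                    peak = max(peak, rate)
                    print(f"{rate:.1f} MB/s ({mb_per_s_to_mbps(rate):.0f} Mbps)")
                    window_bytes = 0
                    window_start = time.monotonic()
    except OSError as exc:  # covers socket timeouts and urllib.error.URLError
        print(f"transfer stalled or failed: {exc}")
    return peak


if __name__ == "__main__":
    # Placeholder URL: substitute router A's WAN address and the test file path.
    download_and_report("http://203.0.113.10/testfile.bin")
```

Running it from a host behind router B and again from a host behind router A gives two logs that can be compared side by side.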
Disconnecting and then re-connecting the ethernet cable between the two routers resolves the issue, at least until I try to download the file again. Connecting the routers to each other through a 3rd party ethernet switch does not help the problem.
However, I have noticed that if I attempt to download the test file from a host on router A's internal network (still using router A's public IP address), the problem does not occur. This suggests that HW offload works when the traffic passes through each ER-X once, but not when the connection goes out through router B, hairpins at router A, and comes back in through router B.
I have tried firmware v1.8.5, v1.9.7+hotfix.3, and v1.9.7+hotfix.4, and the problem is identical on all three.
I have tried resetting the routers to default and enabling only HW offload NAT and 1 port forwarding rule on each router, and the problem persists.
Here is a diagram of my setup, in case it helps anyone reproduce/debug the problem.
Here is the situation in which the routers crash:
Here is a similar situation in which the routers surprisingly don't crash:
Just a guess... there might be some sort of packet storm happening due to the hairpin NAT. Try turning it off on unit B and keeping it enabled for unit A. If that doesn't help, try the other way around.
That said, is there any specific reason you are cascading the routers? It is usually not necessary to do this in most network configurations. If your intent is to have isolated (or partially isolated) networks, this can be done with VLANs on a single router, which will likely result in a cleaner, more efficient, and more controlled network environment. The ER-X has a VLAN-aware switch and it is relatively easy to configure the VLANs and firewall rules to meet most network requirements.
Also, worth noting that your description of a "crash" is not quite accurate since you mention that disconnecting the ethernet cable fixes the problem. A crash usually implies that the device must be restarted to resume normal functions. This would be more accurately described as temporarily unresponsive (until the ethernet cable is unplugged and reinserted).
Assuming that it is a packet storm of some sort, the system becomes unresponsive because either 100% of the resources are consumed by the packet storm, or it is isolating/disabling the ports as a function of the unusual packet volume (possibly STP or storm control type error detection techniques).
@ooferomen - Does that issue apply here given that it is an HTTP-based file transfer vs iPerf3? If it is related, would it suggest that turning off HW offload would solve the OP's problem?
I do not have access to the link associated with that issue within the KI's, so I can't read more about it, but now I'm genuinely curious (even though I'm not experiencing this issue).
Thank you for your responses.
@shermbug Yes, "temporarily unresponsive" is a more applicable term. I only recently figured out that it wasn't crashing when I tried unplugging the cable connecting the two.
Yes, I could be using VLANs. I wanted two separate routers so that I could disconnect or modify router B without affecting router A, but I'll keep that in mind as an alternative option.
I can confirm that disabling hardware offload NAT causes the problem to no longer occur. However, the performance decrease makes that not an option.
@ooferomen It looks like issue #5 is what I am dealing with. Unfortunately I don't have permission to read any more about it.
You can work around this issue by configuring dNAT and hairpin on the "lower" ER-X, so that when the 10.1.0.3 client accesses the web server, the "upper" ERX isn't used at all.
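To make the difference between the three cases concrete, here is a rough Python model of which routers a new connection passes through. It is only a sketch under stated assumptions: apart from the 10.1.0.3 client, every address is hypothetical (203.0.113.10 stands in for router A's WAN IP, 10.0.0.2 for router B's address on A's LAN, 10.1.0.2 for the web server), and the `Router`, `trace`, and `build` names are made up for illustration. It follows destination NAT only, ignoring ports, source NAT, and the return path, and it says nothing about why the hardware locks up.

```python
from dataclasses import dataclass, field
from ipaddress import ip_address, ip_network
from typing import Dict, List, Optional, Tuple


@dataclass
class Router:
    name: str
    lan: str                      # LAN subnet in CIDR form
    wan_ip: str                   # address on the upstream side
    dnat: Dict[str, str] = field(default_factory=dict)  # dst IP -> forward-to IP
    upstream: Optional["Router"] = None


def trace(client_ip: str, dst_ip: str,
          routers: List[Router]) -> Tuple[List[str], Optional[str]]:
    """Return (router names traversed in order, final destination on a LAN or None)."""
    by_wan = {r.wan_ip: r for r in routers}
    current = next(r for r in routers
                   if ip_address(client_ip) in ip_network(r.lan))
    hops: List[str] = []
    dst = dst_ip
    for _ in range(16):           # guard against forwarding loops
        hops.append(current.name)
        dst = current.dnat.get(dst, dst)          # apply port forward / hairpin
        if ip_address(dst) in ip_network(current.lan):
            downstream = by_wan.get(dst)
            if downstream is None:
                return hops, dst                  # delivered to a LAN host
            current = downstream                  # hand off to a cascaded router
        elif current.upstream is not None:
            current = current.upstream            # follow the default route
        else:
            return hops, None                     # leaves toward the internet
    raise RuntimeError("forwarding loop")


def build(workaround: bool = False) -> List[Router]:
    """Hypothetical addressing for routers A (upper) and B (lower)."""
    a = Router("A", lan="10.0.0.0/24", wan_ip="203.0.113.10",
               dnat={"203.0.113.10": "10.0.0.2"})
    b = Router("B", lan="10.1.0.0/24", wan_ip="10.0.0.2",
               dnat={"10.0.0.2": "10.1.0.2"}, upstream=a)
    if workaround:
        # Lower router also answers for A's WAN address, as suggested above.
        b.dnat["203.0.113.10"] = "10.1.0.2"
    return [a, b]


if __name__ == "__main__":
    print(trace("10.1.0.3", "203.0.113.10", build()))       # client behind B
    print(trace("10.0.0.50", "203.0.113.10", build()))      # client behind A
    print(trace("10.1.0.3", "203.0.113.10", build(True)))   # with the workaround
```

In this model the failing case passes through B, then A, then B again; the case that works passes through A and then B; and with the workaround the connection never leaves B.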
@clcain 2 or more ERXs with HWNAT enabled will lock up.
It is a known issue which (speculation on my part) probably lies in the chip vendor's SDK, which is likely why ERX hardware offload issues have been a challenge for Ubnt.
ERXs can achieve near wire speed for basic routing w/o HWNAT. Running other things such as NAT, a bunch of firewall rules, etc. will hit the limits of the ERX and you won't see wire speed. If you need multiple services and higher bandwidth, then you need a beefier router. The ER-8 and ER-Pro fit the bill, while the new ER-4 and ER-6 are even more impressive.