Highlighted
New Member
Posts: 10
Registered: ‎07-18-2017
Kudos: 21

EdgeRouter Infinity - conntrack tuning

Hi All,

 

We have rolled out a heap of these EdgeRouter Infinity devices. Our network serves around 1500 residential and business customers for internet access, so the connection limits need to be massive. We have noticed the conntrack defaults are not good enough given the hardware specs of the Infinity -- has anyone got any idea what we can set without causing performance issues?

 

We have a 10Gbps link and 2x 1Gbps links; we pretty much hover around 65% utilization across all links during peak hours.

 

Thank you in advance.

New Member
Posts: 17
Registered: ‎12-14-2016
Kudos: 1

Re: EdgeRouter Infinity - conntrack tuning

Was wondering if you ever found anything on this. We have nearly the same setup as you listed here with an EdgeRouter Infinity, and I think we may be hitting some limits on the table since we do a lot of NAT as well.

 

Below is the output of sysctl -a | grep nf_conntrack during our LOWEST utilization period:

 

net.netfilter.nf_conntrack_acct = 0
net.netfilter.nf_conntrack_buckets = 32768
net.netfilter.nf_conntrack_checksum = 1
net.netfilter.nf_conntrack_count = 51975
net.netfilter.nf_conntrack_events = 1
net.netfilter.nf_conntrack_events_retry_timeout = 15
net.netfilter.nf_conntrack_expect_max = 2048
net.netfilter.nf_conntrack_generic_timeout = 600
net.netfilter.nf_conntrack_helper = 1
net.netfilter.nf_conntrack_icmp_timeout = 30
net.netfilter.nf_conntrack_log_invalid = 0
net.netfilter.nf_conntrack_max = 262144
net.netfilter.nf_conntrack_tcp_be_liberal = 1
net.netfilter.nf_conntrack_tcp_loose = 1
net.netfilter.nf_conntrack_tcp_max_retrans = 3
net.netfilter.nf_conntrack_tcp_timeout_close = 10
net.netfilter.nf_conntrack_tcp_timeout_close_wait = 60
net.netfilter.nf_conntrack_tcp_timeout_established = 7440
net.netfilter.nf_conntrack_tcp_timeout_fin_wait = 120
net.netfilter.nf_conntrack_tcp_timeout_last_ack = 30
net.netfilter.nf_conntrack_tcp_timeout_max_retrans = 300
net.netfilter.nf_conntrack_tcp_timeout_syn_recv = 60
net.netfilter.nf_conntrack_tcp_timeout_syn_sent = 120
net.netfilter.nf_conntrack_tcp_timeout_time_wait = 120
net.netfilter.nf_conntrack_tcp_timeout_unacknowledged = 300
net.netfilter.nf_conntrack_timestamp = 0
net.netfilter.nf_conntrack_udp_timeout = 30
net.netfilter.nf_conntrack_udp_timeout_stream = 30
net.nf_conntrack_max = 262144
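
For what it's worth, here is the rough check we have been running to watch the count against the max (just a sketch -- the 80% warning threshold is an arbitrary number I picked):

#!/bin/bash
# compare the live conntrack count against the configured maximum
count=$(sysctl -n net.netfilter.nf_conntrack_count)
max=$(sysctl -n net.netfilter.nf_conntrack_max)
pct=$(( count * 100 / max ))
echo "conntrack: ${count} of ${max} entries in use (${pct}%)"
# warn when we get close to the ceiling (80% is an arbitrary choice)
[ "$pct" -ge 80 ] && echo "WARNING: conntrack table is ${pct}% full"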

New Member
Posts: 31
Registered: ‎12-06-2018
Kudos: 6
Solutions: 5

Re: EdgeRouter Infinity - conntrack tuning

Well, this topic has been cussed and discussed here and elsewhere, but I'll take another stab at it and try to be helpful.  Some advice has been to avoid connection tracking (such as this post) while others have indicated they just set the limit to "millions" and didn't worry about it.

 

As background, the key to connection tracking limits is the hash table that is initialized when the Linux kernel module is loaded (from the source code here):

int nf_conntrack_init_start(void)
{
	unsigned long nr_pages = totalram_pages();
	int max_factor = 8;
	int ret = -ENOMEM;
	int i;

	/* struct nf_ct_ext uses u8 to store offsets/size */
	BUILD_BUG_ON(total_extension_size() > 255u);

	seqcount_init(&nf_conntrack_generation);

	for (i = 0; i < CONNTRACK_LOCKS; i++)
		spin_lock_init(&nf_conntrack_locks[i]);

	if (!nf_conntrack_htable_size) {
		/* Idea from tcp.c: use 1/16384 of memory.
		 * On i386: 32MB machine has 512 buckets.
		 * >= 1GB machines have 16384 buckets.
		 * >= 4GB machines have 65536 buckets.
		 */
		nf_conntrack_htable_size
			= (((nr_pages << PAGE_SHIFT) / 16384)
			   / sizeof(struct hlist_head));
		if (nr_pages > (4 * (1024 * 1024 * 1024 / PAGE_SIZE)))
			nf_conntrack_htable_size = 65536;
		else if (nr_pages > (1024 * 1024 * 1024 / PAGE_SIZE))
			nf_conntrack_htable_size = 16384;
		if (nf_conntrack_htable_size < 32)
			nf_conntrack_htable_size = 32;

		/* Use a max. factor of four by default to get the same max as
		 * with the old struct list_heads. When a table size is given
		 * we use the old value of 8 to avoid reducing the max.
		 * entries. */
		max_factor = 4;
	}

	nf_conntrack_hash = nf_ct_alloc_hashtable(&nf_conntrack_htable_size, 1);
	if (!nf_conntrack_hash)
		return -ENOMEM;

	nf_conntrack_max = max_factor * nf_conntrack_htable_size;

Given that, I contend the MINIMUM conntrack hash table size for the ER Infinity with 16GB of RAM should be 262144.  Anyway, it can be manually set with:

set system conntrack hash-size 262144
commit ; save

You'll be prompted to reboot because the hash size is only applied when the conntrack module is loaded.
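
After the reboot you can sanity-check that the new size actually took (just reading the sysctl back; with the setting above you should see something like):

$ sudo sysctl net.netfilter.nf_conntrack_buckets
net.netfilter.nf_conntrack_buckets = 262144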

 

Now that we've provided the hash table size, the kernel should set a maximum of 2097152 for nf_conntrack_max using the max_factor of 8 (the max_factor of 4 is ONLY used when the system has to calculate the hash size itself).  However, it can also be set manually with:

 

set system conntrack table-size 2097152
commit ; save

That change will take effect immediately -- no reboot required.  You may also set a lower limit, so long as the value is a power of 2 that the kernel will accept.
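
You can read it back the same way to confirm, no reboot needed:

$ sudo sysctl net.nf_conntrack_max
net.nf_conntrack_max = 2097152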

 

So, IMHO, that is where to begin.  If that is not sufficient, you will know it by log messages such as the following:

 

kernel: nf_conntrack: table full, dropping packet.

If that is your case, then first check your available free memory.  BGP routing tables and other things need RAM as well.  If you have plenty of RAM to spare, increase the hash-size, reboot, then tweak the table-size (connection limit) if necessary.
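
Both checks are quick from the shell (the grep pattern is just what I would look for; adjust for wherever your logs end up):

$ free -m
$ dmesg | grep -i 'table full, dropping packet'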

 

A word about expect-table-size.  This table separately tracks connections that the kernel expects to open soon.  This is used by things like FTP that have one channel for control and another for data.  If the kernel "sees" the control traffic referring to the protocol/port for the data channel, it can use that to expect and track the eventual data traffic.  This requires the data channels to be unencrypted and (most likely) a helper module or rule to identify that control traffic.  IMHO the kernel conntrack helper modules are not very useful in most modern implementations and are sometimes even harmful.
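
For completeness, if you ever did need to grow that table, I believe the knob sits alongside the others -- check tab completion on your firmware before trusting me on the exact name:

set system conntrack expect-table-size 4096
commit ; save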

 

So, unless you KNOW you need a helper module for some specific traffic, I recommend disabling them:

 

set system conntrack modules ftp disable
set system conntrack modules gre disable
set system conntrack modules h323 disable
set system conntrack modules pptp disable
set system conntrack modules sip disable
set system conntrack modules tftp disable
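
After a commit (and reboot, if prompted), a plain lsmod will tell you whether any of the helper modules are still loaded -- whether they unload immediately may depend on the firmware version:

$ lsmod | grep -E 'nf_conntrack_(ftp|h323|pptp|sip|tftp)'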

Finally, the most difficult part to get "right" is tweaking the connection tracking timeouts for YOUR particular use case.  Therefore, that is largely left as an exercise for the reader.
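
That said, as a starting point, the established-TCP timeout is usually the one worth revisiting on a busy NAT box.  I believe the knob looks like the following (the 2-hour value is purely an example, not a recommendation -- check the config tree on your version):

set system conntrack timeout tcp established 7200
commit ; save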

 

Enjoy!

 

New Member
Posts: 17
Registered: ‎12-14-2016
Kudos: 1

Re: EdgeRouter Infinity - conntrack tuning

Thank you for your response, RokaKen. I will give this a try.

SuperUser
Posts: 14,658
Registered: ‎12-08-2008
Kudos: 11474
Solutions: 701
Contributions: 1

Re: EdgeRouter Infinity - conntrack tuning

We run Infinitys as our core routers for our ISP, and we just upped the table size to 64 Million and all the problems went away...

 

If you look at the actual conntrack table structure, the individual entries don't take up much space, so making the table huge is not really an issue.  The kernel makes pretty efficient use of RAM for this.  We started doing this years ago when we still used big Linux servers (dual Xeon HPs) as routers, and it still works fine.

 

As stated above, there are all kinds of settings you can tweak if you want to...

Jim

" How can anyone trust Scientists? If new evidence comes along, they change their minds! " Politician's joke (sort of...)
"Humans are allergic to change..They love to say, ‘We’ve always done it this way.’ I try to fight that. "Admiral Grace Hopper, USN, Computer Scientist
"It's not Rocket Science! - Oh wait, Actually it is... "NASA bumper sticker
"Just because you can do something doesn't mean you should."my mantra in the Programming classes I used to teach once upon a time...
New Member
Posts: 31
Registered: ‎12-06-2018
Kudos: 6
Solutions: 5

Re: EdgeRouter Infinity - conntrack tuning


@eejimm wrote:

We run Infinitys as our core routers for our ISP, and we just upped the table size to 64 Million and all the problems went away...

 


Jim,

 

I understand and agree with everything you said, but I (respectfully) disagree with your approach.  I've seen lots of info on this topic for x86_64, but couldn't find anything for Cavium or MIPS in general, so I'd like to explore it a little further here.  Fortunately, the ER Infinity should provide all the info we need.

 

First, I assume you increased your hash-size to accommodate such a connection table.  This provides the linked-list heads for efficient lookups into the much larger tracking table.  That would be a hash-size of 8388608 to accommodate a table-size of up to 67108864.  Last I looked, a hash entry was 8 bytes, so 8388608 x 8 bytes = 64MB -- no problem.  It's a hefty chunk to waste if you don't use it, but if you do, it's nothing on a router with 16GB.
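
Checking my own math from the router's shell (same 8-bytes-per-entry assumption, and hash-size = table-size / 8 at a max_factor of 8):

$ echo $(( 67108864 / 8 ))
8388608
$ echo $(( 8388608 * 8 / 1024 / 1024 ))
64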

 

Then, I agree the connection entries themselves are small, but the kernel doesn't allocate and reap memory per connection.  The kernel uses a SLUB allocator to improve efficiency and reduce fragmentation.  We can look at how much memory is being used by an ER Infinity (or any Linux box) by looking at slabinfo -- here's mine from a nearly idle Infinity:

 

$ sudo cat /proc/slabinfo
slabinfo - version: 2.1
# name            <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <limit> <batchcount> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>
nf_conntrack_8000000404aa8000      0      0    288   28    2 : tunables    0    0    0 : slabdata      0      0      0
nf_conntrack_ffffffff80fa9180   4060   4060    288   28    2 : tunables    0    0    0 : slabdata    145    145      0
nf_conntrack_expect      0      0    232   35    2 : tunables    0    0    0 : slabdata      0      0      0
<snip>

Ok, so the second line tells me I have 4060 active (and total) connection objects tracked, each taking 288 bytes of RAM -- they are grouped 28 objects per slab for a total of 145 slabs, and each slab takes 2 pages of memory.  So, on Cavium (MIPS), how big is a page?

 

$ sudo getconf PAGESIZE
4096

Great -- 4096 bytes (same as x86_64).  That means every 28 connections will take 8192 bytes of RAM (so, 288 bytes x 28 = 8064 bytes, meaning there are 8192 - 8064 = 128 bytes of wasted RAM).  I'll use 8192 / 28 ~ 293 bytes per connection to account for both the actual and the wasted RAM (approximated).
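
Or, the same arithmetic from the shell (288-byte objects, 28 per two-page slab, 4096-byte pages -- all straight from the slabinfo output above):

$ echo $(( 288 * 28 ))
8064
$ echo $(( 2 * 4096 - 288 * 28 ))
128
# 8192 / 28 works out to roughly 293 bytes per tracked connection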

 

Again, I agree the kernel is efficient -- unlike the static hash table, the connection table grows only AS NEEDED.  It can grow quickly, but the kernel will reap it slowly -- trying to reuse the already allocated space for future connections as old ones time out.  Still not a problem with plenty of RAM.

 

Here's the rub -- if I take you literally, then 64000000 connections x 293 bytes ~ over 18GB of RAM!  I'm sure your big-iron HP servers had 24GB or more, but that exceeds ALL of the RAM in an ER Infinity.  I contend that the ER would become unstable when available RAM is exhausted -- say, somewhere between 51 and 58 million connections.
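
Quick sanity check on that figure (same ~293 bytes per connection from above):

$ echo $(( 64000000 * 293 ))
18752000000
# roughly 18.75GB, against 16GB of physical RAM in the Infinity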

 

In my use case, the ER Infinity(s) are on the edge.  I have multiple upstream providers and downstream customers, so I'm taking multiple full BGP table feeds upstream and OSPF downstream, both IPv4 and IPv6.  I probably have less free RAM than you, but not by a significant amount.

 

My approach would be to never go over a hash-size of 4194304, giving a connection limit of 33554432 on the Infinity.  If I ever hit that limit, I would be using ~9GB of RAM, which is more than 50% of available RAM but still safe.  That many connections at once probably means a customer or my router is under a DDoS attack, and I need my BGP to stay up so I can advertise a blackhole/scrubber prefix.  If that many connections are legitimate, then I need to add a router, swing a circuit or two, tweak some attributes and share that load.  Otherwise, I need to pursue the tracking bypass referenced in my post above.

 

If anyone does have an output of 'sudo cat /proc/slabinfo' showing 64 million (or even 32 million) nf_conntrack objects, please post it.  I'd seriously love to see it and hear how well your router handled it (or didn't).