New Member
Posts: 13
Registered: ‎01-05-2017
Kudos: 6
Solutions: 2
Accepted Solution

Problems with DNS on USG.

[ Edited ]

Something seems really wrong with the DNS resolution on my USG.  

I think this problem is the core of the cascading failures I've been having with WAN1/WAN2 failover.

 

 

Observe the following interaction:

ubnt@usg:/etc$ info

Model:       UniFi-Gateway-3
Version:     4.4.12.5032482
MAC Address: f0:9f:c2:05:d2:b7
IP Address:  192.168.200.201
Hostname:    usg
Uptime:      13125 seconds

Status:      Connected (http://192.168.1.3:8080/inform)

ubnt@usg:/etc$ host ping.ubnt.com
ping.ubnt.com is an alias for dl.ubnt.com.
dl.ubnt.com is an alias for d2cnv2pop2xy4v.cloudfront.net.
d2cnv2pop2xy4v.cloudfront.net has address 13.33.248.211


ubnt@usg:/etc$ getent hosts ping.ubnt.com
^C


ubnt@usg:/etc$ ping ping.ubnt.com
^C


ubnt@usg:/etc$ ping 13.33.248.211
PING 13.33.248.211 (13.33.248.211) 56(84) bytes of data.
64 bytes from 13.33.248.211: icmp_req=1 ttl=237 time=231 ms
64 bytes from 13.33.248.211: icmp_req=2 ttl=237 time=104 ms
^C
--- 13.33.248.211 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 104.709/168.228/231.747/63.519 ms

 

My /etc/resolv.conf shows

#line generated by /opt/vyatta/sbin/vyatta_update_resolv.pl
domain		localdomain
nameserver      8.8.4.4
nameserver      9.9.9.9

And my /etc/nsswitch.conf shows

# /etc/nsswitch.conf
#
# Example configuration of GNU Name Service Switch functionality.
# If you have the `glibc-doc-reference' and `info' packages installed, try:
# `info libc "Name Service Switch"' for information about this file.

passwd:         compat
group:          compat
shadow:         compat

hosts:          files dns
networks:       files

protocols:      db files
services:       db files
ethers:         db files
rpc:            db files

netgroup:       nis

 

Moreover, from another machine, the DNS cache on the USG returns the correct entry:

~ : abe@albus 21:56:47 Thu Dec 07
$ host ping.ubnt.com 192.168.1.1
Using domain server:
Name: 192.168.1.1
Address: 192.168.1.1#53
Aliases:

ping.ubnt.com is an alias for dl.ubnt.com.
dl.ubnt.com is an alias for d2cnv2pop2xy4v.cloudfront.net.
d2cnv2pop2xy4v.cloudfront.net has address 13.33.248.211

 

Why is it that libc host lookups (via `getent` or `ping` for example) on the USG itself are failing?
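
A note on why the two can disagree: `host` and `dig` build and send DNS queries themselves, while `getent hosts` and `ping` call the C library resolver, which follows the `hosts:` line in /etc/nsswitch.conf. To time the libc path in isolation, something like the following Python sketch could be used. It is only a diagnostic aid: the function name `libc_lookup` and the 5-second default are made up for illustration, and it assumes a Python 3 interpreter on the machine being tested, which may not be the case on the USG itself.

```python
import concurrent.futures
import socket


def libc_lookup(name, timeout=5.0, resolver=socket.gethostbyname):
    """Resolve `name` through the C library resolver, giving up after `timeout` seconds.

    Returns the address string, or None if the lookup fails or does not
    finish in time. `resolver` is injectable so the helper can be tested
    without network access.
    """
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(resolver, name)
    try:
        return future.result(timeout=timeout)
    except (concurrent.futures.TimeoutError, OSError):
        # socket.gaierror and socket.herror are subclasses of OSError.
        return None
    finally:
        # Do not block on a lookup that is still hanging in the worker thread.
        pool.shutdown(wait=False)
```

Calling `libc_lookup("ping.ubnt.com")` while `host ping.ubnt.com` still answers would reproduce the mismatch shown above: a `None` result means the libc path failed or timed out.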


All Replies
New Member
Posts: 22
Registered: ‎11-22-2016
Kudos: 6
Solutions: 1

Re: Problems with DNS on USG.

FYI- My issues seem related to WAN failover as well.
New Member
Posts: 22
Registered: ‎11-22-2016
Kudos: 6
Solutions: 1

Re: Problems with DNS on USG.

It is possible that what you're seeing is CloudFront giving out multiple IPs as part of the way their CDN works:

 

You see the address changes between tries:

 

The same thing happens for me when I run ping multiple times in a row: I sometimes get different addresses.

 

I think this is normal.

 

rusty@xxx:/var/log$ getent hosts ping.ubnt.com
52.85.207.221   d2cnv2pop2xy4v.cloudfront.net ping.ubnt.com dl.ubnt.com
rusty@xxx:/var/log$ getent hosts ping.ubnt.com
54.192.140.36   d2cnv2pop2xy4v.cloudfront.net ping.ubnt.com dl.ubnt.com
rusty@xxx:/var/log$ getent hosts ping.ubnt.com
54.192.140.36   d2cnv2pop2xy4v.cloudfront.net ping.ubnt.com dl.ubnt.com
rusty@xxx:/var/log$ getent hosts ping.ubnt.com
52.85.207.221   d2cnv2pop2xy4v.cloudfront.net ping.ubnt.com dl.ubnt.com
rusty@xxx:/var/log$ getent hosts ping.ubnt.com
54.192.140.36   d2cnv2pop2xy4v.cloudfront.net ping.ubnt.com dl.ubnt.com
rusty@xxx:/var/log$ getent hosts ping.ubnt.com
52.85.207.221   d2cnv2pop2xy4v.cloudfront.net ping.ubnt.com dl.ubnt.com
rusty@xxx:/var/log$ getent hosts ping.ubnt.com
54.192.140.36   d2cnv2pop2xy4v.cloudfront.net ping.ubnt.com dl.ubnt.com
rusty@xxx:/var/log$ getent hosts ping.ubnt.com
52.85.207.221   d2cnv2pop2xy4v.cloudfront.net ping.ubnt.com dl.ubnt.com
rusty@xxx:/var/log$ getent hosts ping.ubnt.com
54.192.140.36   d2cnv2pop2xy4v.cloudfront.net ping.ubnt.com dl.ubnt.com
rusty@xxx:/var/log$ getent hosts ping.ubnt.com
52.85.207.221   d2cnv2pop2xy4v.cloudfront.net ping.ubnt.com dl.ubnt.com
rusty@xxx:/var/log$ host 52.85.207.221
221.207.85.52.in-addr.arpa domain name pointer server-52-85-207-221.dfw50.r.cloudfront.net.
rusty@xxx:/var/log$ host 54.192.140.36
36.140.192.54.in-addr.arpa domain name pointer server-54-192-140-36.sfo5.r.cloudfront.net.
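
The rotation above can also be counted rather than eyeballed. This is a small Python sketch, not anything Ubiquiti ships: `tally_addresses` is a hypothetical helper name, and a local DNS cache may cause every try to return the same address.

```python
import collections
import socket


def tally_addresses(name, tries=10, resolver=socket.gethostbyname):
    """Resolve `name` repeatedly and count how often each address comes back."""
    counts = collections.Counter()
    for _ in range(tries):
        counts[resolver(name)] += 1
    return counts
```

Several distinct keys in the returned counter would point to ordinary CDN rotation. A lookup that raises or hangs instead points to a resolver problem, which rotation alone would not explain.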
New Member
Posts: 13
Registered: ‎01-05-2017
Kudos: 6
Solutions: 2

Re: Problems with DNS on USG.

Sadly, that is not the problem. A direct DNS query via "host" or "dig" always resolves all day long, while "getent hosts" and "ping" never do. (Moreover, in my terminal above, the IP did not change.)
New Member
Posts: 13
Registered: ‎01-05-2017
Kudos: 6
Solutions: 2

Re: Problems with DNS on USG.

I think I may have identified the problem, which seems to arise from an interaction between load balancing and DNS resolution.

 

 

I think the routes to the DNS servers were not getting updated properly when one of the uplinks hiccupped.

Both eth0/WAN1 and eth2/WAN2 were using the same DNS servers, and when a link went down, the DNS requests routed through the down link just vanished.  And, the routing cache seemed to keep sending new requests to the same dead route.

 

I have now set the DNS on eth0/WAN1 to 8.8.8.8 and the DNS on eth2/WAN2 to 8.8.4.4

So /etc/resolv.conf shows

nameserver 8.8.8.8

nameserver 8.8.4.4

 

Then, I added static routes 

8.8.8.0/24 via nexthop 192.168.100.1 (the uplink on eth0/WAN1)

and 

8.8.4.0/24 via nexthop 192.168.200.1 (the uplink on eth2/WAN2)

 

 

This way, I think, if one link goes down, it won't try to route DNS to the dead link.
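
For readers who prefer to express these two routes as configuration rather than click them in, here is a rough Python sketch that prints a JSON fragment. It assumes the controller's config.gateway.json override mechanism and assumes that the JSON mirrors the EdgeOS `protocols static route <prefix> next-hop <ip>` tree. The key layout and the distance of 1 are assumptions to verify against your own firmware, not something confirmed in this thread. The function name `static_route_config` is made up for illustration.

```python
import json


def static_route_config(routes):
    """Build a config.gateway.json-style fragment from {prefix: next_hop} pairs.

    The nested key names are an assumption modelled on the EdgeOS config tree.
    """
    route_tree = {
        prefix: {"next-hop": {next_hop: {"distance": "1"}}}
        for prefix, next_hop in routes.items()
    }
    return {"protocols": {"static": {"route": route_tree}}}


# Prefixes and next hops exactly as given in this post.
routes = {
    "8.8.8.0/24": "192.168.100.1",  # DNS for WAN1, via the WAN1 uplink
    "8.8.4.0/24": "192.168.200.1",  # DNS for WAN2, via the WAN2 uplink
}
print(json.dumps(static_route_config(routes), indent=2))
```

The point of pinning each DNS prefix to one next hop is that a query to that server can only ever leave through its own uplink, whatever the load balancer decides for other traffic.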

So far this morning, it is working.  We'll see in a few days if it stays up.

 

 

Ideally, I'd also like to be able to set the load-balance ping gateway for each uplink to the corresponding DNS server.  While I can do this with a "configure" command on the USG, it is overwritten with each provision from the Cloud Key controller web interface, and that's annoying, so I'll leave it as is for now.
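
As a sketch of what a provisioning-safe version of that setting might look like, the following Python builds a JSON fragment for a per-interface ping target. Everything structural here is an assumption: the key names are modelled on the EdgeOS `load-balance group <name> interface <if> route-test type ping target <host>` setting, the group name `wan_failover` is taken from a later post in this thread and may differ in load-balancing mode, and `ping_target_config` is a hypothetical helper. Treat it as a starting point to check against your own `show configuration` output, not as a known-good config.

```python
def ping_target_config(targets, group="wan_failover"):
    """Build a load-balance fragment mapping each interface to a ping target.

    `targets` is {interface_name: ip_or_hostname}. Key names are assumed to
    mirror the EdgeOS config tree and should be verified before use.
    """
    interfaces = {
        iface: {"route-test": {"type": {"ping": {"target": target}}}}
        for iface, target in targets.items()
    }
    return {"load-balance": {"group": {group: {"interface": interfaces}}}}


# Each uplink pings the DNS server that is routed through it.
fragment = ping_target_config({"eth0": "8.8.8.8", "eth2": "8.8.4.4"})
```

Using IP addresses as targets means the watchdog's ping does not itself depend on DNS resolution working.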

 

In future versions of the Controller interface, I'd really like a richer collection of tools for setting up load-balancing.

Usually, load-balancing is used where the network admin knows that the uplinks are slow and/or unreliable, and it is meant to minimize that problem.  Unfortunately, in my experience, load-balancing actually makes the system less stable.

 

New Member
Posts: 17
Registered: ‎07-22-2016
Kudos: 6

Re: Problems with DNS on USG.

Please let us know how this goes... this sounds very likely in my scenario too.

New Member
Posts: 13
Registered: ‎01-05-2017
Kudos: 6
Solutions: 2

Re: Problems with DNS on USG.

[ Edited ]

So far, this has worked stably for several days.  (Maximum uptime previously was about 1 day.)
Most importantly, I have had the situation where one link goes down, but the load-balance watchdog on the other link stays up.  Previously, the failure would cascade and both links would be shown as down.

 

So, until/unless new evidence arises, setting separate fixed DNS servers and matching those in the static routing table seems to fix it for me.

(If this does bear out long-term, I suggest the UniFi Controller interface be updated to set these routes automatically, and either to show a warning that the same DNS server should not be used for both uplinks, or to patch the watchdog script so that it clears the route cache at the appropriate time during failover.)


New Member
Posts: 17
Registered: ‎07-22-2016
Kudos: 6

Re: Problems with DNS on USG.

I can confirm the same results.  I was able to recreate the problem readily, and since making the above listed changes, I've been stable again.

 

Thanks again - great catch!

New Member
Posts: 1
Registered: ‎12-18-2016

Re: Problems with DNS on USG.

[ Edited ]

I literally just discovered this thread as I found this resolution as well.  Thanks and Kudos for finding it out.

 

This fix also solves the missing-STUN problem.  If your controller is not on-prem and you have dual WAN, it will constantly report STUN errors because it cannot resolve the STUN server name.  I also had some issues with heartbeats being missed and other controller chicanery.  This seems to have resolved those too.  For everyone's edification, I did something similar to what the OP did.

 

Interface Configuration:

WAN1 - DNS I used OpenDNS1 and Google DNS1

WAN2 - DNS I used OpenDNS2 and GoogleDNS2

 

Static routes:

WAN1-DNS1 - 208.67.222.222/32 via interface WAN1

WAN1-DNS2 - 8.8.8.8/32 via interface WAN1

WAN2-DNS1 - 208.67.220.220/32 via interface WAN2

WAN2-DNS2 - 8.8.4.4/32 via interface WAN2

 

After this, I was able to use ping.ubnt.com as my up/down test.  If I continued to use 8.8.8.8 (as I was), failover wouldn't work properly.
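
A plan like this is easy to get subtly wrong (one server listed on both WANs, or a route pointing at the other uplink), so a small consistency check can help. The Python below is a sketch under assumptions: `check_dns_routes` is a hypothetical helper, the data is typed in by hand rather than read from the USG, and it assumes the WAN2 servers are meant to be routed via WAN2.

```python
def check_dns_routes(wan_dns, routes):
    """Return a list of problems found in a dual-WAN DNS/static-route plan.

    wan_dns: {wan_name: [dns_ip, ...]}
    routes:  {dns_ip: wan_name} for the /32 static routes
    """
    problems = []
    owner = {}
    for wan, servers in wan_dns.items():
        for ip in servers:
            if ip in owner and owner[ip] != wan:
                problems.append(f"{ip} is configured on both {owner[ip]} and {wan}")
            owner.setdefault(ip, wan)
    for ip, wan in owner.items():
        routed_via = routes.get(ip)
        if routed_via is None:
            problems.append(f"{ip} ({wan}) has no static route")
        elif routed_via != wan:
            problems.append(f"{ip} belongs to {wan} but is routed via {routed_via}")
    return problems


wan_dns = {
    "WAN1": ["208.67.222.222", "8.8.8.8"],
    "WAN2": ["208.67.220.220", "8.8.4.4"],
}
routes = {
    "208.67.222.222": "WAN1",
    "8.8.8.8": "WAN1",
    "208.67.220.220": "WAN2",
    "8.8.4.4": "WAN2",
}
print(check_dns_routes(wan_dns, routes))
```

An empty list means every DNS server belongs to exactly one WAN and has a route through that same WAN.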

New Member
Posts: 22
Registered: ‎10-05-2017
Kudos: 8
Solutions: 1

Re: Problems with DNS on USG.

This worked for me!

 

Once I added a static route for each interface to its DNS server, "show load-balance watchdog" went from both interfaces being DOWN to both being REACHABLE. And the failover went back to the primary interface like it should.

 

Thanks so much!

 

 

New Member
Posts: 4
Registered: ‎03-11-2018
Kudos: 1

Re: Problems with DNS on USG.

I'm a brand new user of the USG, and immediately ran into this issue.  Failover to WAN2 worked, but never came back to WAN1.

 

I did as suggested above, and it all seems to be working much better now.

New Member
Posts: 25
Registered: ‎02-21-2017
Kudos: 2

Re: Problems with DNS on USG.

[ Edited ]

I'm a bit confused here. I'm getting everything set up for when our second internet connection is installed in our office, so I wanted to have the static routes ready to enable when both WANs become available.

 

At the moment we're only connected through the WAN1 in the USG, but if I add a route to 8.8.8.8 through the WAN1 I can't reach it. I have to set the route through WAN2 (which is disabled and unplugged) to be able to ping the target.

 

Does anyone understand why if I set the interface to WAN1 it doesn't work? Why does it work when set to WAN2? Am I doing something wrong?

 

dnsunifi.png

 

 

Edit: I changed the route from interface to nexthop as 

New Member
Posts: 5
Registered: ‎09-11-2016

Re: Problems with DNS on USG.

I was in the same boat.  Kudos for creating this thread, identifying the problem, and posting a solution!  Thank you so much!

New Member
Posts: 1
Registered: ‎01-31-2017

Re: Problems with DNS on USG.

Thanks a lot! This was exactly what I was looking for!

 

failover is now working as expected.

 

kind regards,

 

Florian

New Member
Posts: 9
Registered: ‎12-19-2018
Kudos: 1

Re: Problems with DNS on USG.

[ Edited ]

I've set up the DNS on both WANs.

I also made those d!mn static routes.

Still:

 

trust@MainUSGPRO4:~$ show load-balance status
Group wan_failover
interface : eth2
carrier : up
status : inactive
gateway : xxxxxxxxxxxxxx
route table : 201
weight : 0%
flows
WAN Out : 10
WAN In : 0
Local Out : 3

interface : eth3
carrier : up
status : failover
gateway : 10.0.1.1
route table : 202
weight : 0%
flows
WAN Out : 6
WAN In : 0
Local Out : 1

trust@MainUSGPRO4:~$ show load-balance watchdog
Group wan_failover
eth2
status: Waiting on recovery (0/3)
pings: 3
fails: 3
run fails: 3/3
route drops: 1
ping gateway: ping.ubnt.com - DOWN
last route drop : Sat Dec 22 11:52:51 2018

eth3
status: Waiting on recovery (0/3)
failover-only mode
pings: 3
fails: 3
run fails: 3/3
route drops: 1
ping gateway: ping.ubnt.com - DOWN
last route drop : Sat Dec 22 11:52:52 2018
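
One detail in the output above is that both interfaces use a hostname (ping.ubnt.com) as the ping gateway, so the watchdog test can only pass if the USG itself can resolve names. The following Python sketch pulls that detail out of pasted `show load-balance watchdog` text. It is a text parser only: the line format is assumed from the output quoted in this thread, and `ping_gateways` is a hypothetical helper name.

```python
import ipaddress
import re


def ping_gateways(watchdog_output):
    """Map each interface to (ping target, state, needs_dns) from watchdog text."""
    result = {}
    iface = None
    for raw in watchdog_output.splitlines():
        line = raw.strip()
        if re.fullmatch(r"eth\d+", line):
            iface = line
            continue
        match = re.match(r"ping gateway:\s*(\S+)\s*-\s*(\S+)", line)
        if match and iface:
            target, state = match.groups()
            try:
                ipaddress.ip_address(target)
                needs_dns = False  # literal IP: no name lookup involved
            except ValueError:
                needs_dns = True   # hostname: the test depends on DNS
            result[iface] = (target, state, needs_dns)
    return result
```

If `needs_dns` is true for every interface and all of them are DOWN, local name resolution on the USG is worth checking before assuming both uplinks are really dead.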

Emerging Member
Posts: 52
Registered: ‎06-27-2017
Kudos: 9

Re: Problems with DNS on USG.

Does anyone have instructions regarding how to add static routes in the USG? The normal linux commands don't seem to be available. If anyone has step by step instructions, even in a format that isn't pretty, please post them and I will make a more formal post with credit given to the author.

New Member
Posts: 11
Registered: ‎02-22-2017

Re: Problems with DNS on USG.

I am on 5.10.17-11638-1 and am trying to get failover to work as I have an intermittent problem with my primary, which is a cable modem.

 

Right now failover sometimes works but never fails back, from what I see. I came to this thread and wanted to try the static routes option. However, as a newbie to static routes, I am missing something. All is clear except for the Next Hop setting.

 

Inside the USG under WAN settings it says my router IP is 73.159.148.1 and my IP address is 73.159.149.53. Which number do I put into Next Hop?

 

Also, what is the distance field and why do I put a 1 there? My failover device on WAN2 is about 5 meters away.

 

Any help is appreciated.