New Member
Posts: 7
Registered: ‎09-21-2018
Accepted Solution

unstable ipsec: one site sometimes not initiating

hello!

 

i set up ipsec successfully 3 weeks ago, but since then it is often down for hours or days and i don't know why.

what's strange is that one edgerouter seems not to try to initiate a connection: when the tunnel is down and i run "show vpn log" on both edgerouters, on one router (router A) i see lots of

 ...peer... initiating Main Mode IKE_SA .... to ....

but on the other router (router B) i only see one line: 

Starting IKE charon daemon (strongSwan 5.2.2, Linux 3.10.107-UBNT, mips64)

and nothing else

 

it worked properly 2 hours ago, and i did not log in to either router or change anything.

 

also strange is that on router A for "show vpn ipsec sa" i see:

peer-somepeer-tunnel-1: #1, CONNECTING, IKEv1, 80c8a2192fabc058:0000000000000000
  local  '%any' @ 192.168.0.199
  remote '%any' @ somepublicip
  queued:  QUICK_MODE
  active:  ISAKMP_VENDOR ISAKMP_CERT_PRE MAIN_MODE ISAKMP_CERT_POST ISAKMP_NATD

and on router B i see no output for this command.

 

note:
both routers are behind their ISP-router, but configured as DMZ (so all traffic is automatically routed to the edgerouters)

 

i configured both routers exactly the same (apart from the remote hostnames):

vpn {
    ipsec {
        auto-firewall-nat-exclude enable
        esp-group FOO0 {
            compression disable
            lifetime 3600
            mode tunnel
            pfs enable
            proposal 1 {
                encryption aes128
                hash sha1
            }
        }
        ike-group FOO0 {
            dead-peer-detection {
                action restart
                interval 30
                timeout 60
            }
            ikev2-reauth no
            key-exchange ikev1
            lifetime 28800
            proposal 1 {
                dh-group 14
                encryption aes128
                hash sha1
            }
        }
        site-to-site {
            peer routerA {
                authentication {
                    id fqdn:routerA
                    mode pre-shared-secret
                    pre-shared-secret ****************
                    remote-id fqdn:routerB
                }
                connection-type initiate
                description ""
                ike-group FOO0
                ikev2-reauth inherit
                local-address any
                tunnel 1 {
                    allow-nat-networks disable
                    allow-public-networks disable
                    esp-group FOO0
                    local {
                        prefix 10.17.0.0/16
                    }
                    remote {
                        prefix 10.15.0.0/16
                    }
                }
            }
        }
    }
}

can anyone please help me out? thx in advance


All Replies
New Member
Posts: 7
Registered: ‎09-21-2018

Re: unstable ipsec: one site sometimes not initiating

now, after 22 hours of router uptime, it suddenly works again. i have done nothing since yesterday.

"show vpn log" on router B says:

Sep 21 17:42:06 00[DMN] Starting IKE charon daemon (strongSwan 5.2.2, Linux 3.10.107-UBNT, mips64)
Sep 22 15:21:49 04[KNL] creating acquire job for policy 10.15.0.99/32[tcp/65040] === 10.17.0.7/32[tcp/8009] with reqid {1}
Sep 22 15:22:09 13[IKE] <peer-somepeer-tunnel-1|1> initiating Main Mode IKE_SA peer-somepeer-tunnel-1[1] to somepublicip
Sep 22 15:22:12 09[IKE] <peer-somepeer-tunnel-1|1> IKE_SA peer-somepeer-tunnel-1[1] established between 192.168.1.101[somepeer]...somepublicip[somepeer]
Sep 22 15:22:14 05[IKE] <peer-somepeer-tunnel-1|1> CHILD_SA peer-somepeer-tunnel-1{1} established with SPIs c123417c_i cb41cc74_o and TS 10.15.0.0/16 === 10.17.0.0/16

why is this "acquire job" started so late (after 22h of router uptime)? how can i debug this?
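
One way to compare what the two routers are doing is to save the "show vpn log" output from each one to a file and count a few key charon messages. The sketch below is only an illustration: the function names are made up, and the patterns are copied from the log lines quoted in this thread. In strongSwan, the "creating acquire job" line is normally logged when the kernel reports traffic matching the tunnel policy while no SA is installed, so a log with no "acquire job" and no "initiating Main Mode" lines suggests that router has not tried to bring the tunnel up at all.

```shell
# count_matches PATTERN FILE: print how many lines of FILE match PATTERN.
count_matches() {
    # grep -c prints 0 but exits non-zero when nothing matches; ignore that status.
    grep -c -e "$1" "$2" || true
}

# summarize_vpn_log FILE: count key charon events in a saved "show vpn log" dump.
# (Hypothetical helper; the patterns come from the messages quoted in this thread.)
summarize_vpn_log() {
    printf 'daemon_starts=%s\n'     "$(count_matches 'Starting IKE charon daemon' "$1")"
    printf 'acquire_jobs=%s\n'      "$(count_matches 'creating acquire job' "$1")"
    printf 'initiations=%s\n'       "$(count_matches 'initiating Main Mode IKE_SA' "$1")"
    printf 'ike_established=%s\n'   "$(count_matches 'IKE_SA .* established between' "$1")"
    printf 'child_established=%s\n' "$(count_matches 'CHILD_SA .* established with SPIs' "$1")"
    printf 'child_closed=%s\n'      "$(count_matches 'closing .*CHILD_SA' "$1")"
}
```

Running summarize_vpn_log on a saved log from each router and comparing the two outputs side by side shows quickly which side never initiates and which side keeps tearing down CHILD_SAs.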

New Member
Posts: 7
Registered: ‎09-21-2018

Re: unstable ipsec: one site sometimes not initiating

now it's down again. "show vpn log" for the last 3 hours shows (on router A):

Sep 23 06:40:26 09[KNL] creating rekey job for ESP CHILD_SA with SPI ccdc04fb and reqid {1}
Sep 23 06:51:24 10[IKE] <peer-somepeer-tunnel-1|8> closing CHILD_SA peer-somepeer-tunnel-1{1} with SPIs ccdc04fb_i (45840 bytes) cad09d2e_o (0 bytes) and TS 10.17.0.0/16 === 10.15.0.0/16
Sep 23 06:51:24 09[KNL] creating delete job for ESP CHILD_SA with SPI ccdc04fb and reqid {1}
Sep 23 06:51:24 14[KNL] creating delete job for ESP CHILD_SA with SPI cad09d2e and reqid {1}
Sep 23 07:23:49 06[KNL] creating rekey job for ESP CHILD_SA with SPI ca02c380 and reqid {1}
Sep 23 07:23:51 11[IKE] <peer-somepeer-tunnel-1|8> CHILD_SA peer-somepeer-tunnel-1{1} established with SPIs c967f58a_i c5fa6bc7_o and TS 10.17.0.0/16 === 10.15.0.0/16
Sep 23 07:28:42 05[KNL] creating rekey job for ESP CHILD_SA with SPI cb990ad4 and reqid {1}
Sep 23 07:39:14 10[KNL] creating delete job for ESP CHILD_SA with SPI ca02c380 and reqid {1}
Sep 23 07:39:14 08[KNL] creating delete job for ESP CHILD_SA with SPI cb990ad4 and reqid {1}
Sep 23 07:39:14 10[IKE] <peer-somepeer-tunnel-1|8> closing expired CHILD_SA peer-somepeer-tunnel-1{1} with SPIs ca02c380_i cb990ad4_o and TS 10.17.0.0/16 === 10.15.0.0/16
Sep 23 08:07:09 14[IKE] <peer-somepeer-tunnel-1|8> CHILD_SA peer-somepeer-tunnel-1{1} established with SPIs c3231e19_i c7a86297_o and TS 10.17.0.0/16 === 10.15.0.0/16
Sep 23 08:11:24 11[KNL] creating rekey job for ESP CHILD_SA with SPI c967f58a and reqid {1}
Sep 23 08:12:47 14[KNL] creating rekey job for ESP CHILD_SA with SPI c5fa6bc7 and reqid {1}
Sep 23 08:23:51 14[KNL] creating delete job for ESP CHILD_SA with SPI c967f58a and reqid {1}
Sep 23 08:23:51 14[IKE] <peer-somepeer-tunnel-1|8> closing expired CHILD_SA peer-somepeer-tunnel-1{1} with SPIs c967f58a_i c5fa6bc7_o and TS 10.17.0.0/16 === 10.15.0.0/16
Sep 23 08:23:51 08[KNL] creating delete job for ESP CHILD_SA with SPI c5fa6bc7 and reqid {1}
Sep 23 08:53:52 16[KNL] creating rekey job for ESP CHILD_SA with SPI c7a86297 and reqid {1}
Sep 23 08:53:54 13[IKE] <peer-somepeer-tunnel-1|8> CHILD_SA peer-somepeer-tunnel-1{1} established with SPIs c086e3c2_i c37bcb41_o and TS 10.17.0.0/16 === 10.15.0.0/16
Sep 23 08:56:13 11[KNL] creating rekey job for ESP CHILD_SA with SPI c3231e19 and reqid {1}
Sep 23 09:07:09 08[KNL] creating delete job for ESP CHILD_SA with SPI c3231e19 and reqid {1}
Sep 23 09:07:09 14[KNL] <peer-somepeer-tunnel-1|8> querying SAD entry with SPI c3231e19 failed: No such process (3)
Sep 23 09:07:09 14[KNL] <peer-somepeer-tunnel-1|8> querying SAD entry with SPI c7a86297 failed: No such process (3)
Sep 23 09:07:09 14[IKE] <peer-somepeer-tunnel-1|8> closing CHILD_SA peer-somepeer-tunnel-1{1} with SPIs c3231e19_i (48276 bytes) c7a86297_o (390473 bytes) and TS 10.17.0.0/16 === 10.15.0.0/16
Sep 23 09:07:09 14[KNL] creating delete job for ESP CHILD_SA with SPI c7a86297 and reqid {1}

additionally here is the output of "show vpn ipsec sa" from router A:

peer-somepeer-tunnel-1: #8, ESTABLISHED, IKEv1, 93d388d2d53e6e22:ab125e0f37d0e65e
  local  'somepeer' @ 192.168.0.199
  remote 'somepeer' @ some-public-ip
  AES_CBC-128/HMAC_SHA1_96/PRF_HMAC_SHA1/MODP_2048
  established 10172s ago, reauth in 17573s
  peer-somepeer-tunnel-1: #1, INSTALLED, TUNNEL-in-UDP, ESP:AES_CBC-128/HMAC_SHA1_96/MODP_2048
    installed 1868 ago, rekeying in 945s, expires in 1734s
    in  c086e3c2,  29700 bytes,   495 packets,     6s ago
    out c37bcb41, 1675610 bytes, 25673 packets,     0s ago
    local  10.17.0.0/16
    remote 10.15.0.0/16

output of "show ip route" on router A:

IP Route Table for VRF "default"
S    *> 0.0.0.0/0 [1/0] via 192.168.0.1, eth0
C    *> 10.17.0.0/24 is directly connected, br0
C    *> 127.0.0.0/8 is directly connected, lo
C    *> 192.168.0.0/24 is directly connected, eth0

shouldn't there be an entry in this routing table for the remote 10.15.0.0/16 network?
why does this delete job appear so often? is this the problem?

 

New Member
Posts: 7
Registered: ‎09-21-2018

Re: unstable ipsec: one site sometimes not initiating

any ideas how i could narrow down the problem? or why this job is started so late?

Veteran Member
Posts: 7,039
Registered: ‎03-24-2016
Kudos: 1822
Solutions: 802

Re: unstable ipsec: one site sometimes not initiating

The IPSEC route can be found in table 220

sudo ip route show table 220

 

I see you're behind NAT, which might cause extra issues. On the NAT device, forward UDP ports 500 and 4500 to the ER.

New Member
Posts: 7
Registered: ‎09-21-2018

Re: unstable ipsec: one site sometimes not initiating

sudo ip route show table 220

shows nothing (tunnel is currently down)

 

yes, i'm behind a NAT router on both sides, but i have configured the ERs as DMZ hosts on those ISP routers, so all traffic is forwarded directly to the ERs

New Member
Posts: 7
Registered: ‎09-21-2018

Re: unstable ipsec: one site sometimes not initiating

this problem is so annoying

any tips on how i can debug further?

Regular Member
Posts: 311
Registered: ‎11-11-2015
Kudos: 119
Solutions: 28

Re: unstable ipsec: one site sometimes not initiating

Pick one side to be the initiator and one to be the responder instead of setting both sides to initiate...

 

At my main site, I always have the router respond to any connection received and hold connections open using dead peer detection:

set vpn ipsec site-to-site peer B.B.B.B connection-type respond
set vpn ipsec ike-group ike_BBBB dead-peer-detection action hold
set vpn ipsec site-to-site peer C.C.C.C connection-type respond
set vpn ipsec ike-group ike_CCCC dead-peer-detection action hold

At the remote locations, they initiate and dead peer detection is set to restart:

set vpn ipsec site-to-site peer A.A.A.A connection-type initiate
set vpn ipsec ike-group ike_AAAA dead-peer-detection action restart

 

Give that a whirl and see if that improves your tunnel stability.
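
Applied to the configuration posted at the top of the thread, this advice might look roughly like the following on whichever router is chosen as the responder. This is a sketch, not a tested configuration: the peer name routerA and the group name FOO0 are taken from the posted config and will differ per router. The other router would keep connection-type initiate and dead-peer-detection action restart, which the posted config already has.

```
configure
# Responder side only; peer and group names are from the posted config (adjust per router)
set vpn ipsec site-to-site peer routerA connection-type respond
set vpn ipsec ike-group FOO0 dead-peer-detection action hold
commit
save
exit
```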

New Member
Posts: 7
Registered: ‎09-21-2018

Re: unstable ipsec: one site sometimes not initiating

thank you so much WURGY, it seems that did the trick: the tunnel has been up without any outages since i applied your suggestion 3 days ago 👍 thx

New Member
Posts: 35
Registered: ‎12-22-2015
Kudos: 1

Re: unstable ipsec: one site sometimes not initiating

Hi roadwarrier1,

 

I've had many problems with unstable site-to-site VPNs.

They worked for a while, but after a few hours or a few days they stopped and didn't restart at all. I tried many things: policy-based VPNs, route-based VPNs, ping and restart scripts, etc.


Finally I found the cause...

 

One site uses a smaller MTU size.

After changing the MSS clamp to 1200, everything now works fine.
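
On EdgeOS, MSS clamping is normally set under the firewall options. A minimal sketch, assuming the clamp should apply to all interface types and using the value 1200 from this post (both are choices to adjust, not recommendations):

```
configure
# Assumption: clamp on all interface types; 1200 is the value reported above
set firewall options mss-clamp interface-type all
set firewall options mss-clamp mss 1200
commit
save
exit
```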

 

I will try increasing it over the next few days to find the maximum value at which the VPN stays stable.
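
As a starting point for that kind of search, a rough upper bound for the MSS can be estimated from the path MTU. The sketch below assumes IPv4 without TCP options (40 bytes of IP and TCP headers) and takes the tunnel overhead as a parameter. The 80 bytes used in the examples is a round-number assumption for ESP in UDP with AES-CBC and SHA1, not a measured value, and the function name is made up for illustration.

```shell
# tcp_mss MTU OVERHEAD: estimated largest TCP payload per packet.
# Assumes IPv4 (20-byte header) and TCP without options (20-byte header).
tcp_mss() {
    echo $(( $1 - $2 - 40 ))
}

tcp_mss 1500 0    # plain Ethernet path, no tunnel      -> 1460
tcp_mss 1500 80   # assumed 80 bytes of IPsec overhead  -> 1380
tcp_mss 1492 80   # same overhead on a 1492-byte path   -> 1372
```

A clamp that only works at values well below such an estimate suggests that some hop on the path has a smaller MTU than assumed.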

 

Regards Mark
