Message ID | 20220701154413.868096-1-ssewook@gmail.com (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [v2] net-tcp: Find dst with sk's xfrm policy not ctl_sk | expand |
Hello, On Fri, 2022-07-01 at 15:44 +0000, Sewook Seo wrote: > From: sewookseo <sewookseo@google.com> > > If we set XFRM security policy by calling setsockopt with option > IPV6_XFRM_POLICY, the policy will be stored in 'sock_policy' in 'sock' > struct. However tcp_v6_send_response doesn't look up dst_entry with the > actual socket but looks up with tcp control socket. This may cause a > problem that a RST packet is sent without ESP encryption & peer's TCP > socket can't receive it. > This patch will make the function look up dest_entry with actual socket, > if the socket has XFRM policy(sock_policy), so that the TCP response > packet via this function can be encrypted, & aligned on the encrypted > TCP socket. > > Tested: We encountered this problem when a TCP socket which is encrypted > in ESP transport mode encryption, receives challenge ACK at SYN_SENT > state. After receiving challenge ACK, TCP needs to send RST to > establish the socket at next SYN try. But the RST was not encrypted & > peer TCP socket still remains on ESTABLISHED state. > So we verified this with test step as below. > [Test step] > 1. Making a TCP state mismatch between client(IDLE) & server(ESTABLISHED). > 2. Client tries a new connection on the same TCP ports(src & dst). > 3. Server will return challenge ACK instead of SYN,ACK. > 4. Client will send RST to server to clear the SOCKET. > 5. Client will retransmit SYN to server on the same TCP ports. > [Expected result] > The TCP connection should be established. > > Effort: net-tcp This looks like a stray "internal" tag? > Cc: Maciej Żenczykowski <maze@google.com> > Cc: Eric Dumazet <edumazet@google.com> > Cc: Steffen Klassert <steffen.klassert@secunet.com> > Cc: Sehee Lee <seheele@google.com> > Signed-off-by: Sewook Seo <sewookseo@google.com> Is this targeting -net -or -net-next? IMHO this could land in either trees. If you are targting net, please add a suitable Fixes: tag. > --- > Changelog since v1: > - Remove unnecessary null check of sk at ip_output.c > Narrow down patch scope: sending RST at SYN_SENT state > Remove unnecessay condition to call xfrm_sk_free_policy() > Verified at KASAN build > > net/ipv4/ip_output.c | 7 ++++++- > net/ipv4/tcp_ipv4.c | 5 +++++ > net/ipv6/tcp_ipv6.c | 7 ++++++- > 3 files changed, 17 insertions(+), 2 deletions(-) > > diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c > index 00b4bf26fd93..1da430c8fee2 100644 > --- a/net/ipv4/ip_output.c > +++ b/net/ipv4/ip_output.c > @@ -1704,7 +1704,12 @@ void ip_send_unicast_reply(struct sock *sk, struct sk_buff *skb, > tcp_hdr(skb)->source, tcp_hdr(skb)->dest, > arg->uid); > security_skb_classify_flow(skb, flowi4_to_flowi_common(&fl4)); > - rt = ip_route_output_key(net, &fl4); > +#ifdef CONFIG_XFRM > + if (sk->sk_policy[XFRM_POLICY_OUT]) > + rt = ip_route_output_flow(net, &fl4, sk); > + else > +#endif > + rt = ip_route_output_key(net, &fl4); > if (IS_ERR(rt)) > return; > > diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c > index fda811a5251f..459669f9e13f 100644 > --- a/net/ipv4/tcp_ipv4.c > +++ b/net/ipv4/tcp_ipv4.c > @@ -819,6 +819,10 @@ static void tcp_v4_send_reset(const struct sock *sk, struct sk_buff *skb) > ctl_sk->sk_priority = (sk->sk_state == TCP_TIME_WAIT) ? > inet_twsk(sk)->tw_priority : sk->sk_priority; > transmit_time = tcp_transmit_time(sk); > +#ifdef CONFIG_XFRM > + if (sk->sk_policy[XFRM_POLICY_OUT] && sk->sk_state == TCP_SYN_SENT) > + xfrm_sk_clone_policy(ctl_sk, sk); > +#endif It looks like the cloned policy will be overwrited by later resets and possibly leaked? nobody calls xfrm_sk_free_policy() on the old policy. Thanks! Paolo
On Fri, Jul 1, 2022 at 5:45 PM Sewook Seo <ssewook@gmail.com> wrote: > > From: sewookseo <sewookseo@google.com> > > If we set XFRM security policy by calling setsockopt with option > IPV6_XFRM_POLICY, the policy will be stored in 'sock_policy' in 'sock' > struct. However tcp_v6_send_response doesn't look up dst_entry with the > actual socket but looks up with tcp control socket. This may cause a > problem that a RST packet is sent without ESP encryption & peer's TCP > socket can't receive it. > This patch will make the function look up dest_entry with actual socket, > if the socket has XFRM policy(sock_policy), so that the TCP response > packet via this function can be encrypted, & aligned on the encrypted > TCP socket. > > Tested: We encountered this problem when a TCP socket which is encrypted > in ESP transport mode encryption, receives challenge ACK at SYN_SENT > state. After receiving challenge ACK, TCP needs to send RST to > establish the socket at next SYN try. But the RST was not encrypted & > peer TCP socket still remains on ESTABLISHED state. > So we verified this with test step as below. > [Test step] > 1. Making a TCP state mismatch between client(IDLE) & server(ESTABLISHED). > 2. Client tries a new connection on the same TCP ports(src & dst). > 3. Server will return challenge ACK instead of SYN,ACK. > 4. Client will send RST to server to clear the SOCKET. > 5. Client will retransmit SYN to server on the same TCP ports. > [Expected result] > The TCP connection should be established. > > Effort: net-tcp > Cc: Maciej Żenczykowski <maze@google.com> > Cc: Eric Dumazet <edumazet@google.com> > Cc: Steffen Klassert <steffen.klassert@secunet.com> > Cc: Sehee Lee <seheele@google.com> > Signed-off-by: Sewook Seo <sewookseo@google.com> > --- > Changelog since v1: > - Remove unnecessary null check of sk at ip_output.c > Narrow down patch scope: sending RST at SYN_SENT state > Remove unnecessay condition to call xfrm_sk_free_policy() > Verified at KASAN build > > net/ipv4/ip_output.c | 7 ++++++- > net/ipv4/tcp_ipv4.c | 5 +++++ > net/ipv6/tcp_ipv6.c | 7 ++++++- > 3 files changed, 17 insertions(+), 2 deletions(-) > > diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c > index 00b4bf26fd93..1da430c8fee2 100644 > --- a/net/ipv4/ip_output.c > +++ b/net/ipv4/ip_output.c > @@ -1704,7 +1704,12 @@ void ip_send_unicast_reply(struct sock *sk, struct sk_buff *skb, > tcp_hdr(skb)->source, tcp_hdr(skb)->dest, > arg->uid); > security_skb_classify_flow(skb, flowi4_to_flowi_common(&fl4)); > - rt = ip_route_output_key(net, &fl4); Please avoid these #ifdef ? You probably can write something like if (IS_ENABLED(CONFIG_XFRM) && sk->sk_policy[XFRM_POLICY_OUT]) rt = ip_route_output_flow(net, &fl4, sk); else rt = ip_route_output_key(net, &fl4); > +#ifdef CONFIG_XFRM > + if (sk->sk_policy[XFRM_POLICY_OUT]) > + rt = ip_route_output_flow(net, &fl4, sk); > + else > +#endif > + rt = ip_route_output_key(net, &fl4); > if (IS_ERR(rt)) > return; > > diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c > index fda811a5251f..459669f9e13f 100644 > --- a/net/ipv4/tcp_ipv4.c > +++ b/net/ipv4/tcp_ipv4.c > @@ -819,6 +819,10 @@ static void tcp_v4_send_reset(const struct sock *sk, struct sk_buff *skb) > ctl_sk->sk_priority = (sk->sk_state == TCP_TIME_WAIT) ? > inet_twsk(sk)->tw_priority : sk->sk_priority; > transmit_time = tcp_transmit_time(sk); > +#ifdef CONFIG_XFRM > + if (sk->sk_policy[XFRM_POLICY_OUT] && sk->sk_state == TCP_SYN_SENT) > + xfrm_sk_clone_policy(ctl_sk, sk); > +#endif > } > ip_send_unicast_reply(ctl_sk, > skb, &TCP_SKB_CB(skb)->header.h4.opt, > @@ -827,6 +831,7 @@ static void tcp_v4_send_reset(const struct sock *sk, struct sk_buff *skb) > transmit_time); > > ctl_sk->sk_mark = 0; > + xfrm_sk_free_policy(ctl_sk); > sock_net_set(ctl_sk, &init_net); > __TCP_INC_STATS(net, TCP_MIB_OUTSEGS); > __TCP_INC_STATS(net, TCP_MIB_OUTRSTS); > diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c > index c72448ba6dc9..453452f87a7c 100644 > --- a/net/ipv6/tcp_ipv6.c > +++ b/net/ipv6/tcp_ipv6.c > @@ -952,7 +952,12 @@ static void tcp_v6_send_response(const struct sock *sk, struct sk_buff *skb, u32 > * Underlying function will use this to retrieve the network > * namespace > */ > - dst = ip6_dst_lookup_flow(sock_net(ctl_sk), ctl_sk, &fl6, NULL); > +#ifdef CONFIG_XFRM > + if (sk && sk->sk_policy[XFRM_POLICY_OUT] && sk->sk_state == TCP_SYN_SENT && rst) Why not using sk_fullsock(sk) instead of 'sk->sk_state == TCP_SYN_SENT' ? sk_fullsock() is really telling us if we can use sk as a full socket, and this is all we need to know when reviewing this code. > + dst = ip6_dst_lookup_flow(net, sk, &fl6, NULL); /* Get dst with sk's XFRM policy */ > + else > +#endif > + dst = ip6_dst_lookup_flow(sock_net(ctl_sk), ctl_sk, &fl6, NULL); > if (!IS_ERR(dst)) { > skb_dst_set(buff, dst); > ip6_xmit(ctl_sk, buff, &fl6, fl6.flowi6_mark, NULL, > -- > 2.37.0.rc0.161.g10f37bed90-goog >
Hi, Eric. Thanks for your review. if (IS_ENABLED(CONFIG_XFRM) && sk->sk_policy[XFRM_POLICY_OUT]) --> This causes a compile error when CONFIG_XFRM is disabled. ldd: /usr/bin/ld: No such file or directory net/ipv4/ip_output.c:1739:37: error: no member named 'sk_policy' in 'struct sock' if (IS_ENABLED(CONFIG_XFRM) && sk->sk_policy[XFRM_POLICY_OUT]) I think we need to use preprocessor directives at here. Is there any reason to use #if than #ifdef? Then I will modify it to use #if. #if IS_ENABLED(CONFIG_XFRM) or #if defined(CONFIG_XFRM) The reason I added the condition only for the state 'TCP_SYN_SENT' is that I just intended to limit the scope of the patch to the issue scenario(RST packet following challenge ACK is not ESP encapsulated) so that we can have at least a difference as before. I also agree with you about using sk_fullsock() instead of SYN_SENT check. will update the patch soon. Thanks. Sewook. 2022년 7월 5일 (화) 오전 9:04, Eric Dumazet <edumazet@google.com>님이 작성: > > On Fri, Jul 1, 2022 at 5:45 PM Sewook Seo <ssewook@gmail.com> wrote: > > > > From: sewookseo <sewookseo@google.com> > > > > If we set XFRM security policy by calling setsockopt with option > > IPV6_XFRM_POLICY, the policy will be stored in 'sock_policy' in 'sock' > > struct. However tcp_v6_send_response doesn't look up dst_entry with the > > actual socket but looks up with tcp control socket. This may cause a > > problem that a RST packet is sent without ESP encryption & peer's TCP > > socket can't receive it. > > This patch will make the function look up dest_entry with actual socket, > > if the socket has XFRM policy(sock_policy), so that the TCP response > > packet via this function can be encrypted, & aligned on the encrypted > > TCP socket. > > > > Tested: We encountered this problem when a TCP socket which is encrypted > > in ESP transport mode encryption, receives challenge ACK at SYN_SENT > > state. After receiving challenge ACK, TCP needs to send RST to > > establish the socket at next SYN try. But the RST was not encrypted & > > peer TCP socket still remains on ESTABLISHED state. > > So we verified this with test step as below. > > [Test step] > > 1. Making a TCP state mismatch between client(IDLE) & server(ESTABLISHED). > > 2. Client tries a new connection on the same TCP ports(src & dst). > > 3. Server will return challenge ACK instead of SYN,ACK. > > 4. Client will send RST to server to clear the SOCKET. > > 5. Client will retransmit SYN to server on the same TCP ports. > > [Expected result] > > The TCP connection should be established. > > > > Effort: net-tcp > > Cc: Maciej Żenczykowski <maze@google.com> > > Cc: Eric Dumazet <edumazet@google.com> > > Cc: Steffen Klassert <steffen.klassert@secunet.com> > > Cc: Sehee Lee <seheele@google.com> > > Signed-off-by: Sewook Seo <sewookseo@google.com> > > --- > > Changelog since v1: > > - Remove unnecessary null check of sk at ip_output.c > > Narrow down patch scope: sending RST at SYN_SENT state > > Remove unnecessay condition to call xfrm_sk_free_policy() > > Verified at KASAN build > > > > net/ipv4/ip_output.c | 7 ++++++- > > net/ipv4/tcp_ipv4.c | 5 +++++ > > net/ipv6/tcp_ipv6.c | 7 ++++++- > > 3 files changed, 17 insertions(+), 2 deletions(-) > > > > diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c > > index 00b4bf26fd93..1da430c8fee2 100644 > > --- a/net/ipv4/ip_output.c > > +++ b/net/ipv4/ip_output.c > > @@ -1704,7 +1704,12 @@ void ip_send_unicast_reply(struct sock *sk, struct sk_buff *skb, > > tcp_hdr(skb)->source, tcp_hdr(skb)->dest, > > arg->uid); > > security_skb_classify_flow(skb, flowi4_to_flowi_common(&fl4)); > > - rt = ip_route_output_key(net, &fl4); > > Please avoid these #ifdef ? > > You probably can write something like > > if (IS_ENABLED(CONFIG_XFRM) && sk->sk_policy[XFRM_POLICY_OUT]) > rt = ip_route_output_flow(net, &fl4, sk); > else > rt = ip_route_output_key(net, &fl4); > > > +#ifdef CONFIG_XFRM > > + if (sk->sk_policy[XFRM_POLICY_OUT]) > > + rt = ip_route_output_flow(net, &fl4, sk); > > + else > > +#endif > > + rt = ip_route_output_key(net, &fl4); > > if (IS_ERR(rt)) > > return; > > > > diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c > > index fda811a5251f..459669f9e13f 100644 > > --- a/net/ipv4/tcp_ipv4.c > > +++ b/net/ipv4/tcp_ipv4.c > > @@ -819,6 +819,10 @@ static void tcp_v4_send_reset(const struct sock *sk, struct sk_buff *skb) > > ctl_sk->sk_priority = (sk->sk_state == TCP_TIME_WAIT) ? > > inet_twsk(sk)->tw_priority : sk->sk_priority; > > transmit_time = tcp_transmit_time(sk); > > +#ifdef CONFIG_XFRM > > + if (sk->sk_policy[XFRM_POLICY_OUT] && sk->sk_state == TCP_SYN_SENT) > > + xfrm_sk_clone_policy(ctl_sk, sk); > > +#endif > > } > > ip_send_unicast_reply(ctl_sk, > > skb, &TCP_SKB_CB(skb)->header.h4.opt, > > @@ -827,6 +831,7 @@ static void tcp_v4_send_reset(const struct sock *sk, struct sk_buff *skb) > > transmit_time); > > > > ctl_sk->sk_mark = 0; > > + xfrm_sk_free_policy(ctl_sk); > > sock_net_set(ctl_sk, &init_net); > > __TCP_INC_STATS(net, TCP_MIB_OUTSEGS); > > __TCP_INC_STATS(net, TCP_MIB_OUTRSTS); > > diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c > > index c72448ba6dc9..453452f87a7c 100644 > > --- a/net/ipv6/tcp_ipv6.c > > +++ b/net/ipv6/tcp_ipv6.c > > @@ -952,7 +952,12 @@ static void tcp_v6_send_response(const struct sock *sk, struct sk_buff *skb, u32 > > * Underlying function will use this to retrieve the network > > * namespace > > */ > > - dst = ip6_dst_lookup_flow(sock_net(ctl_sk), ctl_sk, &fl6, NULL); > > +#ifdef CONFIG_XFRM > > + if (sk && sk->sk_policy[XFRM_POLICY_OUT] && sk->sk_state == TCP_SYN_SENT && rst) > > > Why not using sk_fullsock(sk) instead of 'sk->sk_state == TCP_SYN_SENT' ? > > sk_fullsock() is really telling us if we can use sk as a full socket, > and this is all we need to know when reviewing this code. > > > + dst = ip6_dst_lookup_flow(net, sk, &fl6, NULL); /* Get dst with sk's XFRM policy */ > > + else > > +#endif > > + dst = ip6_dst_lookup_flow(sock_net(ctl_sk), ctl_sk, &fl6, NULL); > > if (!IS_ERR(dst)) { > > skb_dst_set(buff, dst); > > ip6_xmit(ctl_sk, buff, &fl6, fl6.flowi6_mark, NULL, > > -- > > 2.37.0.rc0.161.g10f37bed90-goog > >
Hi, Paolo. If you are targting net, please add a suitable Fixes: tag. > I'm targeting net-next, and will update the subject. It looks like the cloned policy will be overwrited by later resets and possibly leaked? nobody calls xfrm_sk_free_policy() on the old policy. > Is it possible that a later reset overwrites sk_ctl's sk_policy? I thought ctl_sk is a percpu variable and it's preempted. Maybe I might miss something, please let me know if my understanding is wrong. Thanks. 2022년 7월 5일 (화) 오전 8:25, Paolo Abeni <pabeni@redhat.com>님이 작성: > > Hello, > > On Fri, 2022-07-01 at 15:44 +0000, Sewook Seo wrote: > > From: sewookseo <sewookseo@google.com> > > > > If we set XFRM security policy by calling setsockopt with option > > IPV6_XFRM_POLICY, the policy will be stored in 'sock_policy' in 'sock' > > struct. However tcp_v6_send_response doesn't look up dst_entry with the > > actual socket but looks up with tcp control socket. This may cause a > > problem that a RST packet is sent without ESP encryption & peer's TCP > > socket can't receive it. > > This patch will make the function look up dest_entry with actual socket, > > if the socket has XFRM policy(sock_policy), so that the TCP response > > packet via this function can be encrypted, & aligned on the encrypted > > TCP socket. > > > > Tested: We encountered this problem when a TCP socket which is encrypted > > in ESP transport mode encryption, receives challenge ACK at SYN_SENT > > state. After receiving challenge ACK, TCP needs to send RST to > > establish the socket at next SYN try. But the RST was not encrypted & > > peer TCP socket still remains on ESTABLISHED state. > > So we verified this with test step as below. > > [Test step] > > 1. Making a TCP state mismatch between client(IDLE) & server(ESTABLISHED). > > 2. Client tries a new connection on the same TCP ports(src & dst). > > 3. Server will return challenge ACK instead of SYN,ACK. > > 4. Client will send RST to server to clear the SOCKET. > > 5. Client will retransmit SYN to server on the same TCP ports. > > [Expected result] > > The TCP connection should be established. > > > > Effort: net-tcp > > This looks like a stray "internal" tag? > > > Cc: Maciej Żenczykowski <maze@google.com> > > Cc: Eric Dumazet <edumazet@google.com> > > Cc: Steffen Klassert <steffen.klassert@secunet.com> > > Cc: Sehee Lee <seheele@google.com> > > Signed-off-by: Sewook Seo <sewookseo@google.com> > > Is this targeting -net -or -net-next? IMHO this could land in either > trees. If you are targting net, please add a suitable Fixes: tag. > > > > --- > > Changelog since v1: > > - Remove unnecessary null check of sk at ip_output.c > > Narrow down patch scope: sending RST at SYN_SENT state > > Remove unnecessay condition to call xfrm_sk_free_policy() > > Verified at KASAN build > > > > net/ipv4/ip_output.c | 7 ++++++- > > net/ipv4/tcp_ipv4.c | 5 +++++ > > net/ipv6/tcp_ipv6.c | 7 ++++++- > > 3 files changed, 17 insertions(+), 2 deletions(-) > > > > diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c > > index 00b4bf26fd93..1da430c8fee2 100644 > > --- a/net/ipv4/ip_output.c > > +++ b/net/ipv4/ip_output.c > > @@ -1704,7 +1704,12 @@ void ip_send_unicast_reply(struct sock *sk, struct sk_buff *skb, > > tcp_hdr(skb)->source, tcp_hdr(skb)->dest, > > arg->uid); > > security_skb_classify_flow(skb, flowi4_to_flowi_common(&fl4)); > > - rt = ip_route_output_key(net, &fl4); > > +#ifdef CONFIG_XFRM > > + if (sk->sk_policy[XFRM_POLICY_OUT]) > > + rt = ip_route_output_flow(net, &fl4, sk); > > + else > > +#endif > > + rt = ip_route_output_key(net, &fl4); > > if (IS_ERR(rt)) > > return; > > > > diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c > > index fda811a5251f..459669f9e13f 100644 > > --- a/net/ipv4/tcp_ipv4.c > > +++ b/net/ipv4/tcp_ipv4.c > > @@ -819,6 +819,10 @@ static void tcp_v4_send_reset(const struct sock *sk, struct sk_buff *skb) > > ctl_sk->sk_priority = (sk->sk_state == TCP_TIME_WAIT) ? > > inet_twsk(sk)->tw_priority : sk->sk_priority; > > transmit_time = tcp_transmit_time(sk); > > +#ifdef CONFIG_XFRM > > + if (sk->sk_policy[XFRM_POLICY_OUT] && sk->sk_state == TCP_SYN_SENT) > > + xfrm_sk_clone_policy(ctl_sk, sk); > > +#endif > > It looks like the cloned policy will be overwrited by later resets and > possibly leaked? nobody calls xfrm_sk_free_policy() on the old policy. > > Thanks! > > Paolo >
Hello, On Wed, 2022-07-06 at 03:10 +0000, 서세욱 wrote: > On Tue, Jul 5, 2022 at 5:25 PM Paolo Abeni <pabeni@redhat.com> wrote: > > If you are targting net, please add a suitable Fixes: tag. > I'm targeting net-next, and will update the subject. > > > It looks like the cloned policy will be overwrited by later resets and > > possibly leaked? nobody calls xfrm_sk_free_policy() on the old policy > Is it possible that a later reset overwrites sk_ctl's sk_policy? I > thought ctl_sk is a percpu variable and it's preempted. Maybe I might > miss something, please let me know if my understanding is wrong. I mean: what happesn when there are 2 tcp_v4_send_reset() on the same CPU (with different sk argument)? It looks like that after the first call to xfrm_sk_clone_policy(), sk_ctl->sk_policy will be set to the newly allocated (cloned) policy. The next call will first clear the sk_ctl->sk_policy - without freeing the old value - and later set it again. It looks like a memory leak. Am I missing something? Thanks! Paolo
On Wed, Jul 6, 2022 at 4:02 PM Paolo Abeni <pabeni@redhat.com> wrote: > > > Hello, > On Wed, 2022-07-06 at 03:10 +0000, 서세욱 wrote: > > On Tue, Jul 5, 2022 at 5:25 PM Paolo Abeni <pabeni@redhat.com> wrote: > > > If you are targting net, please add a suitable Fixes: tag. > > I'm targeting net-next, and will update the subject. > > > > > It looks like the cloned policy will be overwrited by later resets and > > > possibly leaked? nobody calls xfrm_sk_free_policy() on the old policy > > > Is it possible that a later reset overwrites sk_ctl's sk_policy? I > > thought ctl_sk is a percpu variable and it's preempted. Maybe I might > > miss something, please let me know if my understanding is wrong. > > I mean: what happesn when there are 2 tcp_v4_send_reset() on the same > CPU (with different sk argument)? This is not possible, because we block BH local_bh_disable(); ctl_sk = this_cpu_read(ipv4_tcp_sk); ... <write over tcl_sk> local_bh_enable(); > > It looks like that after the first call to xfrm_sk_clone_policy(), > sk_ctl->sk_policy will be set to the newly allocated (cloned) policy. > > The next call will first clear the sk_ctl->sk_policy - without freeing > the old value - and later set it again. > > It looks like a memory leak. Am I missing something? > > Thanks! > > Paolo >
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c index 00b4bf26fd93..1da430c8fee2 100644 --- a/net/ipv4/ip_output.c +++ b/net/ipv4/ip_output.c @@ -1704,7 +1704,12 @@ void ip_send_unicast_reply(struct sock *sk, struct sk_buff *skb, tcp_hdr(skb)->source, tcp_hdr(skb)->dest, arg->uid); security_skb_classify_flow(skb, flowi4_to_flowi_common(&fl4)); - rt = ip_route_output_key(net, &fl4); +#ifdef CONFIG_XFRM + if (sk->sk_policy[XFRM_POLICY_OUT]) + rt = ip_route_output_flow(net, &fl4, sk); + else +#endif + rt = ip_route_output_key(net, &fl4); if (IS_ERR(rt)) return; diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c index fda811a5251f..459669f9e13f 100644 --- a/net/ipv4/tcp_ipv4.c +++ b/net/ipv4/tcp_ipv4.c @@ -819,6 +819,10 @@ static void tcp_v4_send_reset(const struct sock *sk, struct sk_buff *skb) ctl_sk->sk_priority = (sk->sk_state == TCP_TIME_WAIT) ? inet_twsk(sk)->tw_priority : sk->sk_priority; transmit_time = tcp_transmit_time(sk); +#ifdef CONFIG_XFRM + if (sk->sk_policy[XFRM_POLICY_OUT] && sk->sk_state == TCP_SYN_SENT) + xfrm_sk_clone_policy(ctl_sk, sk); +#endif } ip_send_unicast_reply(ctl_sk, skb, &TCP_SKB_CB(skb)->header.h4.opt, @@ -827,6 +831,7 @@ static void tcp_v4_send_reset(const struct sock *sk, struct sk_buff *skb) transmit_time); ctl_sk->sk_mark = 0; + xfrm_sk_free_policy(ctl_sk); sock_net_set(ctl_sk, &init_net); __TCP_INC_STATS(net, TCP_MIB_OUTSEGS); __TCP_INC_STATS(net, TCP_MIB_OUTRSTS); diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c index c72448ba6dc9..453452f87a7c 100644 --- a/net/ipv6/tcp_ipv6.c +++ b/net/ipv6/tcp_ipv6.c @@ -952,7 +952,12 @@ static void tcp_v6_send_response(const struct sock *sk, struct sk_buff *skb, u32 * Underlying function will use this to retrieve the network * namespace */ - dst = ip6_dst_lookup_flow(sock_net(ctl_sk), ctl_sk, &fl6, NULL); +#ifdef CONFIG_XFRM + if (sk && sk->sk_policy[XFRM_POLICY_OUT] && sk->sk_state == TCP_SYN_SENT && rst) + dst = ip6_dst_lookup_flow(net, sk, &fl6, NULL); /* Get dst with sk's XFRM policy */ + else +#endif + dst = ip6_dst_lookup_flow(sock_net(ctl_sk), ctl_sk, &fl6, NULL); if (!IS_ERR(dst)) { skb_dst_set(buff, dst); ip6_xmit(ctl_sk, buff, &fl6, fl6.flowi6_mark, NULL,