diff mbox series

[v2] net-tcp: Find dst with sk's xfrm policy not ctl_sk

Message ID 20220701154413.868096-1-ssewook@gmail.com (mailing list archive)
State Changes Requested
Delegated to: Netdev Maintainers
Headers show
Series [v2] net-tcp: Find dst with sk's xfrm policy not ctl_sk | expand

Checks

Context Check Description
netdev/tree_selection success Guessed tree name to be net-next
netdev/fixes_present success Fixes tag not required for -next series
netdev/subject_prefix warning Target tree name not specified in the subject
netdev/cover_letter success Single patches do not need cover letters
netdev/patch_count success Link
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 22 this patch: 22
netdev/cc_maintainers warning 1 maintainers not CCed: pabeni@redhat.com
netdev/build_clang success Errors and warnings before: 6 this patch: 6
netdev/module_param success Was 0 now: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 22 this patch: 22
netdev/checkpatch warning WARNING: From:/Signed-off-by: email name mismatch: 'From: sewookseo <sewookseo@google.com>' != 'Signed-off-by: Sewook Seo <sewookseo@google.com>' WARNING: line length of 100 exceeds 80 columns WARNING: line length of 83 exceeds 80 columns WARNING: line length of 88 exceeds 80 columns
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Sewook Seo July 1, 2022, 3:44 p.m. UTC
From: sewookseo <sewookseo@google.com>

If we set XFRM security policy by calling setsockopt with option
IPV6_XFRM_POLICY, the policy will be stored in 'sock_policy' in 'sock'
struct. However tcp_v6_send_response doesn't look up dst_entry with the
actual socket but looks up with tcp control socket. This may cause a
problem that a RST packet is sent without ESP encryption & peer's TCP
socket can't receive it.
This patch will make the function look up dest_entry with actual socket,
if the socket has XFRM policy(sock_policy), so that the TCP response
packet via this function can be encrypted, & aligned on the encrypted
TCP socket.

Tested: We encountered this problem when a TCP socket which is encrypted
in ESP transport mode encryption, receives challenge ACK at SYN_SENT
state. After receiving challenge ACK, TCP needs to send RST to
establish the socket at next SYN try. But the RST was not encrypted &
peer TCP socket still remains on ESTABLISHED state.
So we verified this with test step as below.
[Test step]
1. Making a TCP state mismatch between client(IDLE) & server(ESTABLISHED).
2. Client tries a new connection on the same TCP ports(src & dst).
3. Server will return challenge ACK instead of SYN,ACK.
4. Client will send RST to server to clear the SOCKET.
5. Client will retransmit SYN to server on the same TCP ports.
[Expected result]
The TCP connection should be established.

Effort: net-tcp
Cc: Maciej Żenczykowski <maze@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: Sehee Lee <seheele@google.com>
Signed-off-by: Sewook Seo <sewookseo@google.com>
---
Changelog since v1:
- Remove unnecessary null check of sk at ip_output.c
  Narrow down patch scope: sending RST at SYN_SENT state
  Remove unnecessay condition to call xfrm_sk_free_policy()
  Verified at KASAN build

 net/ipv4/ip_output.c | 7 ++++++-
 net/ipv4/tcp_ipv4.c  | 5 +++++
 net/ipv6/tcp_ipv6.c  | 7 ++++++-
 3 files changed, 17 insertions(+), 2 deletions(-)

Comments

Paolo Abeni July 5, 2022, 8:25 a.m. UTC | #1
Hello,

On Fri, 2022-07-01 at 15:44 +0000, Sewook Seo wrote:
> From: sewookseo <sewookseo@google.com>
> 
> If we set XFRM security policy by calling setsockopt with option
> IPV6_XFRM_POLICY, the policy will be stored in 'sock_policy' in 'sock'
> struct. However tcp_v6_send_response doesn't look up dst_entry with the
> actual socket but looks up with tcp control socket. This may cause a
> problem that a RST packet is sent without ESP encryption & peer's TCP
> socket can't receive it.
> This patch will make the function look up dest_entry with actual socket,
> if the socket has XFRM policy(sock_policy), so that the TCP response
> packet via this function can be encrypted, & aligned on the encrypted
> TCP socket.
> 
> Tested: We encountered this problem when a TCP socket which is encrypted
> in ESP transport mode encryption, receives challenge ACK at SYN_SENT
> state. After receiving challenge ACK, TCP needs to send RST to
> establish the socket at next SYN try. But the RST was not encrypted &
> peer TCP socket still remains on ESTABLISHED state.
> So we verified this with test step as below.
> [Test step]
> 1. Making a TCP state mismatch between client(IDLE) & server(ESTABLISHED).
> 2. Client tries a new connection on the same TCP ports(src & dst).
> 3. Server will return challenge ACK instead of SYN,ACK.
> 4. Client will send RST to server to clear the SOCKET.
> 5. Client will retransmit SYN to server on the same TCP ports.
> [Expected result]
> The TCP connection should be established.
> 
> Effort: net-tcp

This looks like a stray "internal" tag?

> Cc: Maciej Żenczykowski <maze@google.com>
> Cc: Eric Dumazet <edumazet@google.com>
> Cc: Steffen Klassert <steffen.klassert@secunet.com>
> Cc: Sehee Lee <seheele@google.com>
> Signed-off-by: Sewook Seo <sewookseo@google.com>

Is this targeting -net -or -net-next? IMHO this could land in either
trees. If you are targting net, please add a suitable Fixes: tag.


> ---
> Changelog since v1:
> - Remove unnecessary null check of sk at ip_output.c
>   Narrow down patch scope: sending RST at SYN_SENT state
>   Remove unnecessay condition to call xfrm_sk_free_policy()
>   Verified at KASAN build
> 
>  net/ipv4/ip_output.c | 7 ++++++-
>  net/ipv4/tcp_ipv4.c  | 5 +++++
>  net/ipv6/tcp_ipv6.c  | 7 ++++++-
>  3 files changed, 17 insertions(+), 2 deletions(-)
> 
> diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
> index 00b4bf26fd93..1da430c8fee2 100644
> --- a/net/ipv4/ip_output.c
> +++ b/net/ipv4/ip_output.c
> @@ -1704,7 +1704,12 @@ void ip_send_unicast_reply(struct sock *sk, struct sk_buff *skb,
>  			   tcp_hdr(skb)->source, tcp_hdr(skb)->dest,
>  			   arg->uid);
>  	security_skb_classify_flow(skb, flowi4_to_flowi_common(&fl4));
> -	rt = ip_route_output_key(net, &fl4);
> +#ifdef CONFIG_XFRM
> +	if (sk->sk_policy[XFRM_POLICY_OUT])
> +		rt = ip_route_output_flow(net, &fl4, sk);
> +	else
> +#endif
> +		rt = ip_route_output_key(net, &fl4);
>  	if (IS_ERR(rt))
>  		return;
>  
> diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
> index fda811a5251f..459669f9e13f 100644
> --- a/net/ipv4/tcp_ipv4.c
> +++ b/net/ipv4/tcp_ipv4.c
> @@ -819,6 +819,10 @@ static void tcp_v4_send_reset(const struct sock *sk, struct sk_buff *skb)
>  		ctl_sk->sk_priority = (sk->sk_state == TCP_TIME_WAIT) ?
>  				   inet_twsk(sk)->tw_priority : sk->sk_priority;
>  		transmit_time = tcp_transmit_time(sk);
> +#ifdef CONFIG_XFRM
> +		if (sk->sk_policy[XFRM_POLICY_OUT] && sk->sk_state == TCP_SYN_SENT)
> +			xfrm_sk_clone_policy(ctl_sk, sk);
> +#endif

It looks like the cloned policy will be overwrited by later resets and
possibly leaked? nobody calls xfrm_sk_free_policy() on the old policy.

Thanks!

Paolo
Eric Dumazet July 5, 2022, 9:04 a.m. UTC | #2
On Fri, Jul 1, 2022 at 5:45 PM Sewook Seo <ssewook@gmail.com> wrote:
>
> From: sewookseo <sewookseo@google.com>
>
> If we set XFRM security policy by calling setsockopt with option
> IPV6_XFRM_POLICY, the policy will be stored in 'sock_policy' in 'sock'
> struct. However tcp_v6_send_response doesn't look up dst_entry with the
> actual socket but looks up with tcp control socket. This may cause a
> problem that a RST packet is sent without ESP encryption & peer's TCP
> socket can't receive it.
> This patch will make the function look up dest_entry with actual socket,
> if the socket has XFRM policy(sock_policy), so that the TCP response
> packet via this function can be encrypted, & aligned on the encrypted
> TCP socket.
>
> Tested: We encountered this problem when a TCP socket which is encrypted
> in ESP transport mode encryption, receives challenge ACK at SYN_SENT
> state. After receiving challenge ACK, TCP needs to send RST to
> establish the socket at next SYN try. But the RST was not encrypted &
> peer TCP socket still remains on ESTABLISHED state.
> So we verified this with test step as below.
> [Test step]
> 1. Making a TCP state mismatch between client(IDLE) & server(ESTABLISHED).
> 2. Client tries a new connection on the same TCP ports(src & dst).
> 3. Server will return challenge ACK instead of SYN,ACK.
> 4. Client will send RST to server to clear the SOCKET.
> 5. Client will retransmit SYN to server on the same TCP ports.
> [Expected result]
> The TCP connection should be established.
>
> Effort: net-tcp
> Cc: Maciej Żenczykowski <maze@google.com>
> Cc: Eric Dumazet <edumazet@google.com>
> Cc: Steffen Klassert <steffen.klassert@secunet.com>
> Cc: Sehee Lee <seheele@google.com>
> Signed-off-by: Sewook Seo <sewookseo@google.com>
> ---
> Changelog since v1:
> - Remove unnecessary null check of sk at ip_output.c
>   Narrow down patch scope: sending RST at SYN_SENT state
>   Remove unnecessay condition to call xfrm_sk_free_policy()
>   Verified at KASAN build
>
>  net/ipv4/ip_output.c | 7 ++++++-
>  net/ipv4/tcp_ipv4.c  | 5 +++++
>  net/ipv6/tcp_ipv6.c  | 7 ++++++-
>  3 files changed, 17 insertions(+), 2 deletions(-)
>
> diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
> index 00b4bf26fd93..1da430c8fee2 100644
> --- a/net/ipv4/ip_output.c
> +++ b/net/ipv4/ip_output.c
> @@ -1704,7 +1704,12 @@ void ip_send_unicast_reply(struct sock *sk, struct sk_buff *skb,
>                            tcp_hdr(skb)->source, tcp_hdr(skb)->dest,
>                            arg->uid);
>         security_skb_classify_flow(skb, flowi4_to_flowi_common(&fl4));
> -       rt = ip_route_output_key(net, &fl4);

Please avoid these #ifdef ?

You probably can write something like

     if (IS_ENABLED(CONFIG_XFRM) && sk->sk_policy[XFRM_POLICY_OUT])
         rt = ip_route_output_flow(net, &fl4, sk);
    else
          rt = ip_route_output_key(net, &fl4);

> +#ifdef CONFIG_XFRM
> +       if (sk->sk_policy[XFRM_POLICY_OUT])
> +               rt = ip_route_output_flow(net, &fl4, sk);
> +       else
> +#endif
> +               rt = ip_route_output_key(net, &fl4);
>         if (IS_ERR(rt))
>                 return;
>
> diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
> index fda811a5251f..459669f9e13f 100644
> --- a/net/ipv4/tcp_ipv4.c
> +++ b/net/ipv4/tcp_ipv4.c
> @@ -819,6 +819,10 @@ static void tcp_v4_send_reset(const struct sock *sk, struct sk_buff *skb)
>                 ctl_sk->sk_priority = (sk->sk_state == TCP_TIME_WAIT) ?
>                                    inet_twsk(sk)->tw_priority : sk->sk_priority;
>                 transmit_time = tcp_transmit_time(sk);
> +#ifdef CONFIG_XFRM
> +               if (sk->sk_policy[XFRM_POLICY_OUT] && sk->sk_state == TCP_SYN_SENT)
> +                       xfrm_sk_clone_policy(ctl_sk, sk);
> +#endif
>         }
>         ip_send_unicast_reply(ctl_sk,
>                               skb, &TCP_SKB_CB(skb)->header.h4.opt,
> @@ -827,6 +831,7 @@ static void tcp_v4_send_reset(const struct sock *sk, struct sk_buff *skb)
>                               transmit_time);
>
>         ctl_sk->sk_mark = 0;
> +       xfrm_sk_free_policy(ctl_sk);
>         sock_net_set(ctl_sk, &init_net);
>         __TCP_INC_STATS(net, TCP_MIB_OUTSEGS);
>         __TCP_INC_STATS(net, TCP_MIB_OUTRSTS);
> diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
> index c72448ba6dc9..453452f87a7c 100644
> --- a/net/ipv6/tcp_ipv6.c
> +++ b/net/ipv6/tcp_ipv6.c
> @@ -952,7 +952,12 @@ static void tcp_v6_send_response(const struct sock *sk, struct sk_buff *skb, u32
>          * Underlying function will use this to retrieve the network
>          * namespace
>          */
> -       dst = ip6_dst_lookup_flow(sock_net(ctl_sk), ctl_sk, &fl6, NULL);
> +#ifdef CONFIG_XFRM
> +       if (sk && sk->sk_policy[XFRM_POLICY_OUT] && sk->sk_state == TCP_SYN_SENT && rst)


Why not using sk_fullsock(sk)  instead of 'sk->sk_state == TCP_SYN_SENT' ?

sk_fullsock() is really telling us if we can use sk as a full socket,
and this is all we need to know when reviewing this code.

> +               dst = ip6_dst_lookup_flow(net, sk, &fl6, NULL);  /* Get dst with sk's XFRM policy */
> +       else
> +#endif
> +               dst = ip6_dst_lookup_flow(sock_net(ctl_sk), ctl_sk, &fl6, NULL);
>         if (!IS_ERR(dst)) {
>                 skb_dst_set(buff, dst);
>                 ip6_xmit(ctl_sk, buff, &fl6, fl6.flowi6_mark, NULL,
> --
> 2.37.0.rc0.161.g10f37bed90-goog
>
Sewook Seo July 6, 2022, 3:09 a.m. UTC | #3
Hi, Eric.

Thanks for your review.

 if (IS_ENABLED(CONFIG_XFRM) && sk->sk_policy[XFRM_POLICY_OUT])
 --> This causes a compile error when CONFIG_XFRM is disabled.
ldd: /usr/bin/ld: No such file or directory
net/ipv4/ip_output.c:1739:37: error: no member named 'sk_policy' in
'struct sock'
        if (IS_ENABLED(CONFIG_XFRM) && sk->sk_policy[XFRM_POLICY_OUT])
I think we need to use preprocessor directives at here.
Is there any reason to use #if than #ifdef?  Then I will modify it to use #if.
    #if IS_ENABLED(CONFIG_XFRM) or #if defined(CONFIG_XFRM)

The reason I added the condition only for the state 'TCP_SYN_SENT' is
that I just intended to limit
the scope of the patch to the issue scenario(RST packet following
challenge ACK is not ESP encapsulated)
so that we can have at least a difference as before.
I also agree with you about using sk_fullsock() instead of SYN_SENT
check. will update the patch soon.

Thanks.
Sewook.


2022년 7월 5일 (화) 오전 9:04, Eric Dumazet <edumazet@google.com>님이 작성:
>
> On Fri, Jul 1, 2022 at 5:45 PM Sewook Seo <ssewook@gmail.com> wrote:
> >
> > From: sewookseo <sewookseo@google.com>
> >
> > If we set XFRM security policy by calling setsockopt with option
> > IPV6_XFRM_POLICY, the policy will be stored in 'sock_policy' in 'sock'
> > struct. However tcp_v6_send_response doesn't look up dst_entry with the
> > actual socket but looks up with tcp control socket. This may cause a
> > problem that a RST packet is sent without ESP encryption & peer's TCP
> > socket can't receive it.
> > This patch will make the function look up dest_entry with actual socket,
> > if the socket has XFRM policy(sock_policy), so that the TCP response
> > packet via this function can be encrypted, & aligned on the encrypted
> > TCP socket.
> >
> > Tested: We encountered this problem when a TCP socket which is encrypted
> > in ESP transport mode encryption, receives challenge ACK at SYN_SENT
> > state. After receiving challenge ACK, TCP needs to send RST to
> > establish the socket at next SYN try. But the RST was not encrypted &
> > peer TCP socket still remains on ESTABLISHED state.
> > So we verified this with test step as below.
> > [Test step]
> > 1. Making a TCP state mismatch between client(IDLE) & server(ESTABLISHED).
> > 2. Client tries a new connection on the same TCP ports(src & dst).
> > 3. Server will return challenge ACK instead of SYN,ACK.
> > 4. Client will send RST to server to clear the SOCKET.
> > 5. Client will retransmit SYN to server on the same TCP ports.
> > [Expected result]
> > The TCP connection should be established.
> >
> > Effort: net-tcp
> > Cc: Maciej Żenczykowski <maze@google.com>
> > Cc: Eric Dumazet <edumazet@google.com>
> > Cc: Steffen Klassert <steffen.klassert@secunet.com>
> > Cc: Sehee Lee <seheele@google.com>
> > Signed-off-by: Sewook Seo <sewookseo@google.com>
> > ---
> > Changelog since v1:
> > - Remove unnecessary null check of sk at ip_output.c
> >   Narrow down patch scope: sending RST at SYN_SENT state
> >   Remove unnecessay condition to call xfrm_sk_free_policy()
> >   Verified at KASAN build
> >
> >  net/ipv4/ip_output.c | 7 ++++++-
> >  net/ipv4/tcp_ipv4.c  | 5 +++++
> >  net/ipv6/tcp_ipv6.c  | 7 ++++++-
> >  3 files changed, 17 insertions(+), 2 deletions(-)
> >
> > diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
> > index 00b4bf26fd93..1da430c8fee2 100644
> > --- a/net/ipv4/ip_output.c
> > +++ b/net/ipv4/ip_output.c
> > @@ -1704,7 +1704,12 @@ void ip_send_unicast_reply(struct sock *sk, struct sk_buff *skb,
> >                            tcp_hdr(skb)->source, tcp_hdr(skb)->dest,
> >                            arg->uid);
> >         security_skb_classify_flow(skb, flowi4_to_flowi_common(&fl4));
> > -       rt = ip_route_output_key(net, &fl4);
>
> Please avoid these #ifdef ?
>
> You probably can write something like
>
>      if (IS_ENABLED(CONFIG_XFRM) && sk->sk_policy[XFRM_POLICY_OUT])
>          rt = ip_route_output_flow(net, &fl4, sk);
>     else
>           rt = ip_route_output_key(net, &fl4);
>
> > +#ifdef CONFIG_XFRM
> > +       if (sk->sk_policy[XFRM_POLICY_OUT])
> > +               rt = ip_route_output_flow(net, &fl4, sk);
> > +       else
> > +#endif
> > +               rt = ip_route_output_key(net, &fl4);
> >         if (IS_ERR(rt))
> >                 return;
> >
> > diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
> > index fda811a5251f..459669f9e13f 100644
> > --- a/net/ipv4/tcp_ipv4.c
> > +++ b/net/ipv4/tcp_ipv4.c
> > @@ -819,6 +819,10 @@ static void tcp_v4_send_reset(const struct sock *sk, struct sk_buff *skb)
> >                 ctl_sk->sk_priority = (sk->sk_state == TCP_TIME_WAIT) ?
> >                                    inet_twsk(sk)->tw_priority : sk->sk_priority;
> >                 transmit_time = tcp_transmit_time(sk);
> > +#ifdef CONFIG_XFRM
> > +               if (sk->sk_policy[XFRM_POLICY_OUT] && sk->sk_state == TCP_SYN_SENT)
> > +                       xfrm_sk_clone_policy(ctl_sk, sk);
> > +#endif
> >         }
> >         ip_send_unicast_reply(ctl_sk,
> >                               skb, &TCP_SKB_CB(skb)->header.h4.opt,
> > @@ -827,6 +831,7 @@ static void tcp_v4_send_reset(const struct sock *sk, struct sk_buff *skb)
> >                               transmit_time);
> >
> >         ctl_sk->sk_mark = 0;
> > +       xfrm_sk_free_policy(ctl_sk);
> >         sock_net_set(ctl_sk, &init_net);
> >         __TCP_INC_STATS(net, TCP_MIB_OUTSEGS);
> >         __TCP_INC_STATS(net, TCP_MIB_OUTRSTS);
> > diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
> > index c72448ba6dc9..453452f87a7c 100644
> > --- a/net/ipv6/tcp_ipv6.c
> > +++ b/net/ipv6/tcp_ipv6.c
> > @@ -952,7 +952,12 @@ static void tcp_v6_send_response(const struct sock *sk, struct sk_buff *skb, u32
> >          * Underlying function will use this to retrieve the network
> >          * namespace
> >          */
> > -       dst = ip6_dst_lookup_flow(sock_net(ctl_sk), ctl_sk, &fl6, NULL);
> > +#ifdef CONFIG_XFRM
> > +       if (sk && sk->sk_policy[XFRM_POLICY_OUT] && sk->sk_state == TCP_SYN_SENT && rst)
>
>
> Why not using sk_fullsock(sk)  instead of 'sk->sk_state == TCP_SYN_SENT' ?
>
> sk_fullsock() is really telling us if we can use sk as a full socket,
> and this is all we need to know when reviewing this code.
>
> > +               dst = ip6_dst_lookup_flow(net, sk, &fl6, NULL);  /* Get dst with sk's XFRM policy */
> > +       else
> > +#endif
> > +               dst = ip6_dst_lookup_flow(sock_net(ctl_sk), ctl_sk, &fl6, NULL);
> >         if (!IS_ERR(dst)) {
> >                 skb_dst_set(buff, dst);
> >                 ip6_xmit(ctl_sk, buff, &fl6, fl6.flowi6_mark, NULL,
> > --
> > 2.37.0.rc0.161.g10f37bed90-goog
> >
Sewook Seo July 6, 2022, 3:10 a.m. UTC | #4
Hi, Paolo.

 If you are targting net, please add a suitable Fixes: tag.
 > I'm targeting net-next, and will update the subject.

It looks like the cloned policy will be overwrited by later resets and
possibly leaked? nobody calls xfrm_sk_free_policy() on the old policy.
> Is it possible that a later reset overwrites sk_ctl's sk_policy? I thought ctl_sk is a percpu variable and it's preempted. Maybe I might miss something, please let me know if my understanding is wrong.

Thanks.


2022년 7월 5일 (화) 오전 8:25, Paolo Abeni <pabeni@redhat.com>님이 작성:
>
> Hello,
>
> On Fri, 2022-07-01 at 15:44 +0000, Sewook Seo wrote:
> > From: sewookseo <sewookseo@google.com>
> >
> > If we set XFRM security policy by calling setsockopt with option
> > IPV6_XFRM_POLICY, the policy will be stored in 'sock_policy' in 'sock'
> > struct. However tcp_v6_send_response doesn't look up dst_entry with the
> > actual socket but looks up with tcp control socket. This may cause a
> > problem that a RST packet is sent without ESP encryption & peer's TCP
> > socket can't receive it.
> > This patch will make the function look up dest_entry with actual socket,
> > if the socket has XFRM policy(sock_policy), so that the TCP response
> > packet via this function can be encrypted, & aligned on the encrypted
> > TCP socket.
> >
> > Tested: We encountered this problem when a TCP socket which is encrypted
> > in ESP transport mode encryption, receives challenge ACK at SYN_SENT
> > state. After receiving challenge ACK, TCP needs to send RST to
> > establish the socket at next SYN try. But the RST was not encrypted &
> > peer TCP socket still remains on ESTABLISHED state.
> > So we verified this with test step as below.
> > [Test step]
> > 1. Making a TCP state mismatch between client(IDLE) & server(ESTABLISHED).
> > 2. Client tries a new connection on the same TCP ports(src & dst).
> > 3. Server will return challenge ACK instead of SYN,ACK.
> > 4. Client will send RST to server to clear the SOCKET.
> > 5. Client will retransmit SYN to server on the same TCP ports.
> > [Expected result]
> > The TCP connection should be established.
> >
> > Effort: net-tcp
>
> This looks like a stray "internal" tag?
>
> > Cc: Maciej Żenczykowski <maze@google.com>
> > Cc: Eric Dumazet <edumazet@google.com>
> > Cc: Steffen Klassert <steffen.klassert@secunet.com>
> > Cc: Sehee Lee <seheele@google.com>
> > Signed-off-by: Sewook Seo <sewookseo@google.com>
>
> Is this targeting -net -or -net-next? IMHO this could land in either
> trees. If you are targting net, please add a suitable Fixes: tag.
>
>
> > ---
> > Changelog since v1:
> > - Remove unnecessary null check of sk at ip_output.c
> >   Narrow down patch scope: sending RST at SYN_SENT state
> >   Remove unnecessay condition to call xfrm_sk_free_policy()
> >   Verified at KASAN build
> >
> >  net/ipv4/ip_output.c | 7 ++++++-
> >  net/ipv4/tcp_ipv4.c  | 5 +++++
> >  net/ipv6/tcp_ipv6.c  | 7 ++++++-
> >  3 files changed, 17 insertions(+), 2 deletions(-)
> >
> > diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
> > index 00b4bf26fd93..1da430c8fee2 100644
> > --- a/net/ipv4/ip_output.c
> > +++ b/net/ipv4/ip_output.c
> > @@ -1704,7 +1704,12 @@ void ip_send_unicast_reply(struct sock *sk, struct sk_buff *skb,
> >                          tcp_hdr(skb)->source, tcp_hdr(skb)->dest,
> >                          arg->uid);
> >       security_skb_classify_flow(skb, flowi4_to_flowi_common(&fl4));
> > -     rt = ip_route_output_key(net, &fl4);
> > +#ifdef CONFIG_XFRM
> > +     if (sk->sk_policy[XFRM_POLICY_OUT])
> > +             rt = ip_route_output_flow(net, &fl4, sk);
> > +     else
> > +#endif
> > +             rt = ip_route_output_key(net, &fl4);
> >       if (IS_ERR(rt))
> >               return;
> >
> > diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
> > index fda811a5251f..459669f9e13f 100644
> > --- a/net/ipv4/tcp_ipv4.c
> > +++ b/net/ipv4/tcp_ipv4.c
> > @@ -819,6 +819,10 @@ static void tcp_v4_send_reset(const struct sock *sk, struct sk_buff *skb)
> >               ctl_sk->sk_priority = (sk->sk_state == TCP_TIME_WAIT) ?
> >                                  inet_twsk(sk)->tw_priority : sk->sk_priority;
> >               transmit_time = tcp_transmit_time(sk);
> > +#ifdef CONFIG_XFRM
> > +             if (sk->sk_policy[XFRM_POLICY_OUT] && sk->sk_state == TCP_SYN_SENT)
> > +                     xfrm_sk_clone_policy(ctl_sk, sk);
> > +#endif
>
> It looks like the cloned policy will be overwrited by later resets and
> possibly leaked? nobody calls xfrm_sk_free_policy() on the old policy.
>
> Thanks!
>
> Paolo
>
Paolo Abeni July 6, 2022, 2:01 p.m. UTC | #5
Hello,
On Wed, 2022-07-06 at 03:10 +0000, 서세욱 wrote:
> On Tue, Jul 5, 2022 at 5:25 PM Paolo Abeni <pabeni@redhat.com> wrote:
> > If you are targting net, please add a suitable Fixes: tag.
> I'm targeting net-next, and will update the subject.
> 
> > It looks like the cloned policy will be overwrited by later resets and
> > possibly leaked? nobody calls xfrm_sk_free_policy() on the old policy

> Is it possible that a later reset overwrites sk_ctl's sk_policy? I
> thought ctl_sk is a percpu variable and it's preempted. Maybe I might
> miss something, please let me know if my understanding is wrong.

I mean: what happesn when there are 2 tcp_v4_send_reset() on the same
CPU (with different sk argument)?

It looks like that after the first call to xfrm_sk_clone_policy(),
sk_ctl->sk_policy will be set to the newly allocated (cloned) policy.

The next call will first clear the sk_ctl->sk_policy - without freeing
the old value - and later set it again. 

It looks like a memory leak. Am I missing something?

Thanks!

Paolo
Eric Dumazet July 6, 2022, 2:07 p.m. UTC | #6
On Wed, Jul 6, 2022 at 4:02 PM Paolo Abeni <pabeni@redhat.com> wrote:
>
>
> Hello,
> On Wed, 2022-07-06 at 03:10 +0000, 서세욱 wrote:
> > On Tue, Jul 5, 2022 at 5:25 PM Paolo Abeni <pabeni@redhat.com> wrote:
> > > If you are targting net, please add a suitable Fixes: tag.
> > I'm targeting net-next, and will update the subject.
> >
> > > It looks like the cloned policy will be overwrited by later resets and
> > > possibly leaked? nobody calls xfrm_sk_free_policy() on the old policy
>
> > Is it possible that a later reset overwrites sk_ctl's sk_policy? I
> > thought ctl_sk is a percpu variable and it's preempted. Maybe I might
> > miss something, please let me know if my understanding is wrong.
>
> I mean: what happesn when there are 2 tcp_v4_send_reset() on the same
> CPU (with different sk argument)?

This is not possible, because we block BH

local_bh_disable();
ctl_sk = this_cpu_read(ipv4_tcp_sk);
...
<write over tcl_sk>
local_bh_enable();


>
> It looks like that after the first call to xfrm_sk_clone_policy(),
> sk_ctl->sk_policy will be set to the newly allocated (cloned) policy.
>
> The next call will first clear the sk_ctl->sk_policy - without freeing
> the old value - and later set it again.
>
> It looks like a memory leak. Am I missing something?
>
> Thanks!
>
> Paolo
>
diff mbox series

Patch

diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 00b4bf26fd93..1da430c8fee2 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -1704,7 +1704,12 @@  void ip_send_unicast_reply(struct sock *sk, struct sk_buff *skb,
 			   tcp_hdr(skb)->source, tcp_hdr(skb)->dest,
 			   arg->uid);
 	security_skb_classify_flow(skb, flowi4_to_flowi_common(&fl4));
-	rt = ip_route_output_key(net, &fl4);
+#ifdef CONFIG_XFRM
+	if (sk->sk_policy[XFRM_POLICY_OUT])
+		rt = ip_route_output_flow(net, &fl4, sk);
+	else
+#endif
+		rt = ip_route_output_key(net, &fl4);
 	if (IS_ERR(rt))
 		return;
 
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index fda811a5251f..459669f9e13f 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -819,6 +819,10 @@  static void tcp_v4_send_reset(const struct sock *sk, struct sk_buff *skb)
 		ctl_sk->sk_priority = (sk->sk_state == TCP_TIME_WAIT) ?
 				   inet_twsk(sk)->tw_priority : sk->sk_priority;
 		transmit_time = tcp_transmit_time(sk);
+#ifdef CONFIG_XFRM
+		if (sk->sk_policy[XFRM_POLICY_OUT] && sk->sk_state == TCP_SYN_SENT)
+			xfrm_sk_clone_policy(ctl_sk, sk);
+#endif
 	}
 	ip_send_unicast_reply(ctl_sk,
 			      skb, &TCP_SKB_CB(skb)->header.h4.opt,
@@ -827,6 +831,7 @@  static void tcp_v4_send_reset(const struct sock *sk, struct sk_buff *skb)
 			      transmit_time);
 
 	ctl_sk->sk_mark = 0;
+	xfrm_sk_free_policy(ctl_sk);
 	sock_net_set(ctl_sk, &init_net);
 	__TCP_INC_STATS(net, TCP_MIB_OUTSEGS);
 	__TCP_INC_STATS(net, TCP_MIB_OUTRSTS);
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index c72448ba6dc9..453452f87a7c 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -952,7 +952,12 @@  static void tcp_v6_send_response(const struct sock *sk, struct sk_buff *skb, u32
 	 * Underlying function will use this to retrieve the network
 	 * namespace
 	 */
-	dst = ip6_dst_lookup_flow(sock_net(ctl_sk), ctl_sk, &fl6, NULL);
+#ifdef CONFIG_XFRM
+	if (sk && sk->sk_policy[XFRM_POLICY_OUT] && sk->sk_state == TCP_SYN_SENT && rst)
+		dst = ip6_dst_lookup_flow(net, sk, &fl6, NULL);  /* Get dst with sk's XFRM policy */
+	else
+#endif
+		dst = ip6_dst_lookup_flow(sock_net(ctl_sk), ctl_sk, &fl6, NULL);
 	if (!IS_ERR(dst)) {
 		skb_dst_set(buff, dst);
 		ip6_xmit(ctl_sk, buff, &fl6, fl6.flowi6_mark, NULL,