[v2] net-tcp: Find dst with sk's xfrm policy not ctl_sk

Message ID	20220701154413.868096-1-ssewook@gmail.com (mailing list archive)
State	Changes Requested
Delegated to:	Netdev Maintainers
Headers	show Return-Path: <netdev-owner@kernel.org> From: Sewook Seo <ssewook@gmail.com> To: Sewook Seo <sewookseo@google.com> Cc: Linux Network Development Mailing List <netdev@vger.kernel.org>, "David S . Miller " <davem@davemloft.net>, Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>, David Ahern <dsahern@kernel.org>, Jakub Kicinski <kuba@kernel.org>, =?utf-8?q?Maciej_=C5=BBenczykowski?= <maze@google.com>, Eric Dumazet <edumazet@google.com>, Steffen Klassert <steffen.klassert@secunet.com>, Sehee Lee <seheele@google.com> Subject: [PATCH v2] net-tcp: Find dst with sk's xfrm policy not ctl_sk Date: Fri, 1 Jul 2022 15:44:13 +0000 Message-Id: <20220701154413.868096-1-ssewook@gmail.com> In-Reply-To: <20220621202240.4182683-1-ssewook@gmail.com> References: <20220621202240.4182683-1-ssewook@gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Precedence: bulk
Series	[v2] net-tcp: Find dst with sk's xfrm policy not ctl_sk \| expand [v2] net-tcp: Find dst with sk's xfrm policy not ctl_sk

Context	Check	Description
netdev/tree_selection	success	Guessed tree name to be net-next
netdev/fixes_present	success	Fixes tag not required for -next series
netdev/subject_prefix	warning	Target tree name not specified in the subject
netdev/cover_letter	success	Single patches do not need cover letters
netdev/patch_count	success	Link
netdev/header_inline	success	No static functions without inline keyword in header files
netdev/build_32bit	success	Errors and warnings before: 22 this patch: 22
netdev/cc_maintainers	warning	1 maintainers not CCed: pabeni@redhat.com
netdev/build_clang	success	Errors and warnings before: 6 this patch: 6
netdev/module_param	success	Was 0 now: 0
netdev/verify_signedoff	success	Signed-off-by tag matches author and committer
netdev/check_selftest	success	No net selftest shell script
netdev/verify_fixes	success	No Fixes tag
netdev/build_allmodconfig_warn	success	Errors and warnings before: 22 this patch: 22
netdev/checkpatch	warning	WARNING: From:/Signed-off-by: email name mismatch: 'From: sewookseo <sewookseo@google.com>' != 'Signed-off-by: Sewook Seo <sewookseo@google.com>' WARNING: line length of 100 exceeds 80 columns WARNING: line length of 83 exceeds 80 columns WARNING: line length of 88 exceeds 80 columns
netdev/kdoc	success	Errors and warnings before: 0 this patch: 0
netdev/source_inline	success	Was 0 now: 0

Sewook Seo July 1, 2022, 3:44 p.m. UTC

From: sewookseo <sewookseo@google.com>

If we set XFRM security policy by calling setsockopt with option
IPV6_XFRM_POLICY, the policy will be stored in 'sock_policy' in 'sock'
struct. However tcp_v6_send_response doesn't look up dst_entry with the
actual socket but looks up with tcp control socket. This may cause a
problem that a RST packet is sent without ESP encryption & peer's TCP
socket can't receive it.
This patch will make the function look up dest_entry with actual socket,
if the socket has XFRM policy(sock_policy), so that the TCP response
packet via this function can be encrypted, & aligned on the encrypted
TCP socket.

Tested: We encountered this problem when a TCP socket which is encrypted
in ESP transport mode encryption, receives challenge ACK at SYN_SENT
state. After receiving challenge ACK, TCP needs to send RST to
establish the socket at next SYN try. But the RST was not encrypted &
peer TCP socket still remains on ESTABLISHED state.
So we verified this with test step as below.
[Test step]
1. Making a TCP state mismatch between client(IDLE) & server(ESTABLISHED).
2. Client tries a new connection on the same TCP ports(src & dst).
3. Server will return challenge ACK instead of SYN,ACK.
4. Client will send RST to server to clear the SOCKET.
5. Client will retransmit SYN to server on the same TCP ports.
[Expected result]
The TCP connection should be established.

Effort: net-tcp
Cc: Maciej Żenczykowski <maze@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: Sehee Lee <seheele@google.com>
Signed-off-by: Sewook Seo <sewookseo@google.com>
---
Changelog since v1:
- Remove unnecessary null check of sk at ip_output.c
  Narrow down patch scope: sending RST at SYN_SENT state
  Remove unnecessay condition to call xfrm_sk_free_policy()
  Verified at KASAN build

 net/ipv4/ip_output.c | 7 ++++++-
 net/ipv4/tcp_ipv4.c  | 5 +++++
 net/ipv6/tcp_ipv6.c  | 7 ++++++-
 3 files changed, 17 insertions(+), 2 deletions(-)

Paolo Abeni July 5, 2022, 8:25 a.m. UTC | #1

Hello,

On Fri, 2022-07-01 at 15:44 +0000, Sewook Seo wrote:
> From: sewookseo <sewookseo@google.com>
> 
> If we set XFRM security policy by calling setsockopt with option
> IPV6_XFRM_POLICY, the policy will be stored in 'sock_policy' in 'sock'
> struct. However tcp_v6_send_response doesn't look up dst_entry with the
> actual socket but looks up with tcp control socket. This may cause a
> problem that a RST packet is sent without ESP encryption & peer's TCP
> socket can't receive it.
> This patch will make the function look up dest_entry with actual socket,
> if the socket has XFRM policy(sock_policy), so that the TCP response
> packet via this function can be encrypted, & aligned on the encrypted
> TCP socket.
> 
> Tested: We encountered this problem when a TCP socket which is encrypted
> in ESP transport mode encryption, receives challenge ACK at SYN_SENT
> state. After receiving challenge ACK, TCP needs to send RST to
> establish the socket at next SYN try. But the RST was not encrypted &
> peer TCP socket still remains on ESTABLISHED state.
> So we verified this with test step as below.
> [Test step]
> 1. Making a TCP state mismatch between client(IDLE) & server(ESTABLISHED).
> 2. Client tries a new connection on the same TCP ports(src & dst).
> 3. Server will return challenge ACK instead of SYN,ACK.
> 4. Client will send RST to server to clear the SOCKET.
> 5. Client will retransmit SYN to server on the same TCP ports.
> [Expected result]
> The TCP connection should be established.
> 
> Effort: net-tcp

This looks like a stray "internal" tag?

> Cc: Maciej Żenczykowski <maze@google.com>
> Cc: Eric Dumazet <edumazet@google.com>
> Cc: Steffen Klassert <steffen.klassert@secunet.com>
> Cc: Sehee Lee <seheele@google.com>
> Signed-off-by: Sewook Seo <sewookseo@google.com>

Is this targeting -net -or -net-next? IMHO this could land in either
trees. If you are targting net, please add a suitable Fixes: tag.


> ---
> Changelog since v1:
> - Remove unnecessary null check of sk at ip_output.c
>   Narrow down patch scope: sending RST at SYN_SENT state
>   Remove unnecessay condition to call xfrm_sk_free_policy()
>   Verified at KASAN build
> 
>  net/ipv4/ip_output.c | 7 ++++++-
>  net/ipv4/tcp_ipv4.c  | 5 +++++
>  net/ipv6/tcp_ipv6.c  | 7 ++++++-
>  3 files changed, 17 insertions(+), 2 deletions(-)
> 
> diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
> index 00b4bf26fd93..1da430c8fee2 100644
> --- a/net/ipv4/ip_output.c
> +++ b/net/ipv4/ip_output.c
> @@ -1704,7 +1704,12 @@ void ip_send_unicast_reply(struct sock *sk, struct sk_buff *skb,
>  			   tcp_hdr(skb)->source, tcp_hdr(skb)->dest,
>  			   arg->uid);
>  	security_skb_classify_flow(skb, flowi4_to_flowi_common(&fl4));
> -	rt = ip_route_output_key(net, &fl4);
> +#ifdef CONFIG_XFRM
> +	if (sk->sk_policy[XFRM_POLICY_OUT])
> +		rt = ip_route_output_flow(net, &fl4, sk);
> +	else
> +#endif
> +		rt = ip_route_output_key(net, &fl4);
>  	if (IS_ERR(rt))
>  		return;
>  
> diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
> index fda811a5251f..459669f9e13f 100644
> --- a/net/ipv4/tcp_ipv4.c
> +++ b/net/ipv4/tcp_ipv4.c
> @@ -819,6 +819,10 @@ static void tcp_v4_send_reset(const struct sock *sk, struct sk_buff *skb)
>  		ctl_sk->sk_priority = (sk->sk_state == TCP_TIME_WAIT) ?
>  				   inet_twsk(sk)->tw_priority : sk->sk_priority;
>  		transmit_time = tcp_transmit_time(sk);
> +#ifdef CONFIG_XFRM
> +		if (sk->sk_policy[XFRM_POLICY_OUT] && sk->sk_state == TCP_SYN_SENT)
> +			xfrm_sk_clone_policy(ctl_sk, sk);
> +#endif

It looks like the cloned policy will be overwrited by later resets and
possibly leaked? nobody calls xfrm_sk_free_policy() on the old policy.

Thanks!

Paolo

Eric Dumazet July 5, 2022, 9:04 a.m. UTC | #2

On Fri, Jul 1, 2022 at 5:45 PM Sewook Seo <ssewook@gmail.com> wrote:
>
> From: sewookseo <sewookseo@google.com>
>
> If we set XFRM security policy by calling setsockopt with option
> IPV6_XFRM_POLICY, the policy will be stored in 'sock_policy' in 'sock'
> struct. However tcp_v6_send_response doesn't look up dst_entry with the
> actual socket but looks up with tcp control socket. This may cause a
> problem that a RST packet is sent without ESP encryption & peer's TCP
> socket can't receive it.
> This patch will make the function look up dest_entry with actual socket,
> if the socket has XFRM policy(sock_policy), so that the TCP response
> packet via this function can be encrypted, & aligned on the encrypted
> TCP socket.
>
> Tested: We encountered this problem when a TCP socket which is encrypted
> in ESP transport mode encryption, receives challenge ACK at SYN_SENT
> state. After receiving challenge ACK, TCP needs to send RST to
> establish the socket at next SYN try. But the RST was not encrypted &
> peer TCP socket still remains on ESTABLISHED state.
> So we verified this with test step as below.
> [Test step]
> 1. Making a TCP state mismatch between client(IDLE) & server(ESTABLISHED).
> 2. Client tries a new connection on the same TCP ports(src & dst).
> 3. Server will return challenge ACK instead of SYN,ACK.
> 4. Client will send RST to server to clear the SOCKET.
> 5. Client will retransmit SYN to server on the same TCP ports.
> [Expected result]
> The TCP connection should be established.
>
> Effort: net-tcp
> Cc: Maciej Żenczykowski <maze@google.com>
> Cc: Eric Dumazet <edumazet@google.com>
> Cc: Steffen Klassert <steffen.klassert@secunet.com>
> Cc: Sehee Lee <seheele@google.com>
> Signed-off-by: Sewook Seo <sewookseo@google.com>
> ---
> Changelog since v1:
> - Remove unnecessary null check of sk at ip_output.c
>   Narrow down patch scope: sending RST at SYN_SENT state
>   Remove unnecessay condition to call xfrm_sk_free_policy()
>   Verified at KASAN build
>
>  net/ipv4/ip_output.c | 7 ++++++-
>  net/ipv4/tcp_ipv4.c  | 5 +++++
>  net/ipv6/tcp_ipv6.c  | 7 ++++++-
>  3 files changed, 17 insertions(+), 2 deletions(-)
>
> diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
> index 00b4bf26fd93..1da430c8fee2 100644
> --- a/net/ipv4/ip_output.c
> +++ b/net/ipv4/ip_output.c
> @@ -1704,7 +1704,12 @@ void ip_send_unicast_reply(struct sock *sk, struct sk_buff *skb,
>                            tcp_hdr(skb)->source, tcp_hdr(skb)->dest,
>                            arg->uid);
>         security_skb_classify_flow(skb, flowi4_to_flowi_common(&fl4));
> -       rt = ip_route_output_key(net, &fl4);

Please avoid these #ifdef ?

You probably can write something like

     if (IS_ENABLED(CONFIG_XFRM) && sk->sk_policy[XFRM_POLICY_OUT])
         rt = ip_route_output_flow(net, &fl4, sk);
    else
          rt = ip_route_output_key(net, &fl4);

> +#ifdef CONFIG_XFRM
> +       if (sk->sk_policy[XFRM_POLICY_OUT])
> +               rt = ip_route_output_flow(net, &fl4, sk);
> +       else
> +#endif
> +               rt = ip_route_output_key(net, &fl4);
>         if (IS_ERR(rt))
>                 return;
>
> diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
> index fda811a5251f..459669f9e13f 100644
> --- a/net/ipv4/tcp_ipv4.c
> +++ b/net/ipv4/tcp_ipv4.c
> @@ -819,6 +819,10 @@ static void tcp_v4_send_reset(const struct sock *sk, struct sk_buff *skb)
>                 ctl_sk->sk_priority = (sk->sk_state == TCP_TIME_WAIT) ?
>                                    inet_twsk(sk)->tw_priority : sk->sk_priority;
>                 transmit_time = tcp_transmit_time(sk);
> +#ifdef CONFIG_XFRM
> +               if (sk->sk_policy[XFRM_POLICY_OUT] && sk->sk_state == TCP_SYN_SENT)
> +                       xfrm_sk_clone_policy(ctl_sk, sk);
> +#endif
>         }
>         ip_send_unicast_reply(ctl_sk,
>                               skb, &TCP_SKB_CB(skb)->header.h4.opt,
> @@ -827,6 +831,7 @@ static void tcp_v4_send_reset(const struct sock *sk, struct sk_buff *skb)
>                               transmit_time);
>
>         ctl_sk->sk_mark = 0;
> +       xfrm_sk_free_policy(ctl_sk);
>         sock_net_set(ctl_sk, &init_net);
>         __TCP_INC_STATS(net, TCP_MIB_OUTSEGS);
>         __TCP_INC_STATS(net, TCP_MIB_OUTRSTS);
> diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
> index c72448ba6dc9..453452f87a7c 100644
> --- a/net/ipv6/tcp_ipv6.c
> +++ b/net/ipv6/tcp_ipv6.c
> @@ -952,7 +952,12 @@ static void tcp_v6_send_response(const struct sock *sk, struct sk_buff *skb, u32
>          * Underlying function will use this to retrieve the network
>          * namespace
>          */
> -       dst = ip6_dst_lookup_flow(sock_net(ctl_sk), ctl_sk, &fl6, NULL);
> +#ifdef CONFIG_XFRM
> +       if (sk && sk->sk_policy[XFRM_POLICY_OUT] && sk->sk_state == TCP_SYN_SENT && rst)


Why not using sk_fullsock(sk)  instead of 'sk->sk_state == TCP_SYN_SENT' ?

sk_fullsock() is really telling us if we can use sk as a full socket,
and this is all we need to know when reviewing this code.

> +               dst = ip6_dst_lookup_flow(net, sk, &fl6, NULL);  /* Get dst with sk's XFRM policy */
> +       else
> +#endif
> +               dst = ip6_dst_lookup_flow(sock_net(ctl_sk), ctl_sk, &fl6, NULL);
>         if (!IS_ERR(dst)) {
>                 skb_dst_set(buff, dst);
>                 ip6_xmit(ctl_sk, buff, &fl6, fl6.flowi6_mark, NULL,
> --
> 2.37.0.rc0.161.g10f37bed90-goog
>

Sewook Seo July 6, 2022, 3:09 a.m. UTC | #3

Hi, Eric.

Thanks for your review.

 if (IS_ENABLED(CONFIG_XFRM) && sk->sk_policy[XFRM_POLICY_OUT])
 --> This causes a compile error when CONFIG_XFRM is disabled.
ldd: /usr/bin/ld: No such file or directory
net/ipv4/ip_output.c:1739:37: error: no member named 'sk_policy' in
'struct sock'
        if (IS_ENABLED(CONFIG_XFRM) && sk->sk_policy[XFRM_POLICY_OUT])
I think we need to use preprocessor directives at here.
Is there any reason to use #if than #ifdef?  Then I will modify it to use #if.
    #if IS_ENABLED(CONFIG_XFRM) or #if defined(CONFIG_XFRM)

The reason I added the condition only for the state 'TCP_SYN_SENT' is
that I just intended to limit
the scope of the patch to the issue scenario(RST packet following
challenge ACK is not ESP encapsulated)
so that we can have at least a difference as before.
I also agree with you about using sk_fullsock() instead of SYN_SENT
check. will update the patch soon.

Thanks.
Sewook.


2022년 7월 5일 (화) 오전 9:04, Eric Dumazet <edumazet@google.com>님이 작성:
>
> On Fri, Jul 1, 2022 at 5:45 PM Sewook Seo <ssewook@gmail.com> wrote:
> >
> > From: sewookseo <sewookseo@google.com>
> >
> > If we set XFRM security policy by calling setsockopt with option
> > IPV6_XFRM_POLICY, the policy will be stored in 'sock_policy' in 'sock'
> > struct. However tcp_v6_send_response doesn't look up dst_entry with the
> > actual socket but looks up with tcp control socket. This may cause a
> > problem that a RST packet is sent without ESP encryption & peer's TCP
> > socket can't receive it.
> > This patch will make the function look up dest_entry with actual socket,
> > if the socket has XFRM policy(sock_policy), so that the TCP response
> > packet via this function can be encrypted, & aligned on the encrypted
> > TCP socket.
> >
> > Tested: We encountered this problem when a TCP socket which is encrypted
> > in ESP transport mode encryption, receives challenge ACK at SYN_SENT
> > state. After receiving challenge ACK, TCP needs to send RST to
> > establish the socket at next SYN try. But the RST was not encrypted &
> > peer TCP socket still remains on ESTABLISHED state.
> > So we verified this with test step as below.
> > [Test step]
> > 1. Making a TCP state mismatch between client(IDLE) & server(ESTABLISHED).
> > 2. Client tries a new connection on the same TCP ports(src & dst).
> > 3. Server will return challenge ACK instead of SYN,ACK.
> > 4. Client will send RST to server to clear the SOCKET.
> > 5. Client will retransmit SYN to server on the same TCP ports.
> > [Expected result]
> > The TCP connection should be established.
> >
> > Effort: net-tcp
> > Cc: Maciej Żenczykowski <maze@google.com>
> > Cc: Eric Dumazet <edumazet@google.com>
> > Cc: Steffen Klassert <steffen.klassert@secunet.com>
> > Cc: Sehee Lee <seheele@google.com>
> > Signed-off-by: Sewook Seo <sewookseo@google.com>
> > ---
> > Changelog since v1:
> > - Remove unnecessary null check of sk at ip_output.c
> >   Narrow down patch scope: sending RST at SYN_SENT state
> >   Remove unnecessay condition to call xfrm_sk_free_policy()
> >   Verified at KASAN build
> >
> >  net/ipv4/ip_output.c | 7 ++++++-
> >  net/ipv4/tcp_ipv4.c  | 5 +++++
> >  net/ipv6/tcp_ipv6.c  | 7 ++++++-
> >  3 files changed, 17 insertions(+), 2 deletions(-)
> >
> > diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
> > index 00b4bf26fd93..1da430c8fee2 100644
> > --- a/net/ipv4/ip_output.c
> > +++ b/net/ipv4/ip_output.c
> > @@ -1704,7 +1704,12 @@ void ip_send_unicast_reply(struct sock *sk, struct sk_buff *skb,
> >                            tcp_hdr(skb)->source, tcp_hdr(skb)->dest,
> >                            arg->uid);
> >         security_skb_classify_flow(skb, flowi4_to_flowi_common(&fl4));
> > -       rt = ip_route_output_key(net, &fl4);
>
> Please avoid these #ifdef ?
>
> You probably can write something like
>
>      if (IS_ENABLED(CONFIG_XFRM) && sk->sk_policy[XFRM_POLICY_OUT])
>          rt = ip_route_output_flow(net, &fl4, sk);
>     else
>           rt = ip_route_output_key(net, &fl4);
>
> > +#ifdef CONFIG_XFRM
> > +       if (sk->sk_policy[XFRM_POLICY_OUT])
> > +               rt = ip_route_output_flow(net, &fl4, sk);
> > +       else
> > +#endif
> > +               rt = ip_route_output_key(net, &fl4);
> >         if (IS_ERR(rt))
> >                 return;
> >
> > diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
> > index fda811a5251f..459669f9e13f 100644
> > --- a/net/ipv4/tcp_ipv4.c
> > +++ b/net/ipv4/tcp_ipv4.c
> > @@ -819,6 +819,10 @@ static void tcp_v4_send_reset(const struct sock *sk, struct sk_buff *skb)
> >                 ctl_sk->sk_priority = (sk->sk_state == TCP_TIME_WAIT) ?
> >                                    inet_twsk(sk)->tw_priority : sk->sk_priority;
> >                 transmit_time = tcp_transmit_time(sk);
> > +#ifdef CONFIG_XFRM
> > +               if (sk->sk_policy[XFRM_POLICY_OUT] && sk->sk_state == TCP_SYN_SENT)
> > +                       xfrm_sk_clone_policy(ctl_sk, sk);
> > +#endif
> >         }
> >         ip_send_unicast_reply(ctl_sk,
> >                               skb, &TCP_SKB_CB(skb)->header.h4.opt,
> > @@ -827,6 +831,7 @@ static void tcp_v4_send_reset(const struct sock *sk, struct sk_buff *skb)
> >                               transmit_time);
> >
> >         ctl_sk->sk_mark = 0;
> > +       xfrm_sk_free_policy(ctl_sk);
> >         sock_net_set(ctl_sk, &init_net);
> >         __TCP_INC_STATS(net, TCP_MIB_OUTSEGS);
> >         __TCP_INC_STATS(net, TCP_MIB_OUTRSTS);
> > diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
> > index c72448ba6dc9..453452f87a7c 100644
> > --- a/net/ipv6/tcp_ipv6.c
> > +++ b/net/ipv6/tcp_ipv6.c
> > @@ -952,7 +952,12 @@ static void tcp_v6_send_response(const struct sock *sk, struct sk_buff *skb, u32
> >          * Underlying function will use this to retrieve the network
> >          * namespace
> >          */
> > -       dst = ip6_dst_lookup_flow(sock_net(ctl_sk), ctl_sk, &fl6, NULL);
> > +#ifdef CONFIG_XFRM
> > +       if (sk && sk->sk_policy[XFRM_POLICY_OUT] && sk->sk_state == TCP_SYN_SENT && rst)
>
>
> Why not using sk_fullsock(sk)  instead of 'sk->sk_state == TCP_SYN_SENT' ?
>
> sk_fullsock() is really telling us if we can use sk as a full socket,
> and this is all we need to know when reviewing this code.
>
> > +               dst = ip6_dst_lookup_flow(net, sk, &fl6, NULL);  /* Get dst with sk's XFRM policy */
> > +       else
> > +#endif
> > +               dst = ip6_dst_lookup_flow(sock_net(ctl_sk), ctl_sk, &fl6, NULL);
> >         if (!IS_ERR(dst)) {
> >                 skb_dst_set(buff, dst);
> >                 ip6_xmit(ctl_sk, buff, &fl6, fl6.flowi6_mark, NULL,
> > --
> > 2.37.0.rc0.161.g10f37bed90-goog
> >

Sewook Seo July 6, 2022, 3:10 a.m. UTC | #4

Hi, Paolo.

 If you are targting net, please add a suitable Fixes: tag.
 > I'm targeting net-next, and will update the subject.

It looks like the cloned policy will be overwrited by later resets and
possibly leaked? nobody calls xfrm_sk_free_policy() on the old policy.
> Is it possible that a later reset overwrites sk_ctl's sk_policy? I thought ctl_sk is a percpu variable and it's preempted. Maybe I might miss something, please let me know if my understanding is wrong.

Thanks.


2022년 7월 5일 (화) 오전 8:25, Paolo Abeni <pabeni@redhat.com>님이 작성:
>
> Hello,
>
> On Fri, 2022-07-01 at 15:44 +0000, Sewook Seo wrote:
> > From: sewookseo <sewookseo@google.com>
> >
> > If we set XFRM security policy by calling setsockopt with option
> > IPV6_XFRM_POLICY, the policy will be stored in 'sock_policy' in 'sock'
> > struct. However tcp_v6_send_response doesn't look up dst_entry with the
> > actual socket but looks up with tcp control socket. This may cause a
> > problem that a RST packet is sent without ESP encryption & peer's TCP
> > socket can't receive it.
> > This patch will make the function look up dest_entry with actual socket,
> > if the socket has XFRM policy(sock_policy), so that the TCP response
> > packet via this function can be encrypted, & aligned on the encrypted
> > TCP socket.
> >
> > Tested: We encountered this problem when a TCP socket which is encrypted
> > in ESP transport mode encryption, receives challenge ACK at SYN_SENT
> > state. After receiving challenge ACK, TCP needs to send RST to
> > establish the socket at next SYN try. But the RST was not encrypted &
> > peer TCP socket still remains on ESTABLISHED state.
> > So we verified this with test step as below.
> > [Test step]
> > 1. Making a TCP state mismatch between client(IDLE) & server(ESTABLISHED).
> > 2. Client tries a new connection on the same TCP ports(src & dst).
> > 3. Server will return challenge ACK instead of SYN,ACK.
> > 4. Client will send RST to server to clear the SOCKET.
> > 5. Client will retransmit SYN to server on the same TCP ports.
> > [Expected result]
> > The TCP connection should be established.
> >
> > Effort: net-tcp
>
> This looks like a stray "internal" tag?
>
> > Cc: Maciej Żenczykowski <maze@google.com>
> > Cc: Eric Dumazet <edumazet@google.com>
> > Cc: Steffen Klassert <steffen.klassert@secunet.com>
> > Cc: Sehee Lee <seheele@google.com>
> > Signed-off-by: Sewook Seo <sewookseo@google.com>
>
> Is this targeting -net -or -net-next? IMHO this could land in either
> trees. If you are targting net, please add a suitable Fixes: tag.
>
>
> > ---
> > Changelog since v1:
> > - Remove unnecessary null check of sk at ip_output.c
> >   Narrow down patch scope: sending RST at SYN_SENT state
> >   Remove unnecessay condition to call xfrm_sk_free_policy()
> >   Verified at KASAN build
> >
> >  net/ipv4/ip_output.c | 7 ++++++-
> >  net/ipv4/tcp_ipv4.c  | 5 +++++
> >  net/ipv6/tcp_ipv6.c  | 7 ++++++-
> >  3 files changed, 17 insertions(+), 2 deletions(-)
> >
> > diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
> > index 00b4bf26fd93..1da430c8fee2 100644
> > --- a/net/ipv4/ip_output.c
> > +++ b/net/ipv4/ip_output.c
> > @@ -1704,7 +1704,12 @@ void ip_send_unicast_reply(struct sock *sk, struct sk_buff *skb,
> >                          tcp_hdr(skb)->source, tcp_hdr(skb)->dest,
> >                          arg->uid);
> >       security_skb_classify_flow(skb, flowi4_to_flowi_common(&fl4));
> > -     rt = ip_route_output_key(net, &fl4);
> > +#ifdef CONFIG_XFRM
> > +     if (sk->sk_policy[XFRM_POLICY_OUT])
> > +             rt = ip_route_output_flow(net, &fl4, sk);
> > +     else
> > +#endif
> > +             rt = ip_route_output_key(net, &fl4);
> >       if (IS_ERR(rt))
> >               return;
> >
> > diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
> > index fda811a5251f..459669f9e13f 100644
> > --- a/net/ipv4/tcp_ipv4.c
> > +++ b/net/ipv4/tcp_ipv4.c
> > @@ -819,6 +819,10 @@ static void tcp_v4_send_reset(const struct sock *sk, struct sk_buff *skb)
> >               ctl_sk->sk_priority = (sk->sk_state == TCP_TIME_WAIT) ?
> >                                  inet_twsk(sk)->tw_priority : sk->sk_priority;
> >               transmit_time = tcp_transmit_time(sk);
> > +#ifdef CONFIG_XFRM
> > +             if (sk->sk_policy[XFRM_POLICY_OUT] && sk->sk_state == TCP_SYN_SENT)
> > +                     xfrm_sk_clone_policy(ctl_sk, sk);
> > +#endif
>
> It looks like the cloned policy will be overwrited by later resets and
> possibly leaked? nobody calls xfrm_sk_free_policy() on the old policy.
>
> Thanks!
>
> Paolo
>

Paolo Abeni July 6, 2022, 2:01 p.m. UTC | #5

Hello,
On Wed, 2022-07-06 at 03:10 +0000, 서세욱 wrote:
> On Tue, Jul 5, 2022 at 5:25 PM Paolo Abeni <pabeni@redhat.com> wrote:
> > If you are targting net, please add a suitable Fixes: tag.
> I'm targeting net-next, and will update the subject.
> 
> > It looks like the cloned policy will be overwrited by later resets and
> > possibly leaked? nobody calls xfrm_sk_free_policy() on the old policy

> Is it possible that a later reset overwrites sk_ctl's sk_policy? I
> thought ctl_sk is a percpu variable and it's preempted. Maybe I might
> miss something, please let me know if my understanding is wrong.

I mean: what happesn when there are 2 tcp_v4_send_reset() on the same
CPU (with different sk argument)?

It looks like that after the first call to xfrm_sk_clone_policy(),
sk_ctl->sk_policy will be set to the newly allocated (cloned) policy.

The next call will first clear the sk_ctl->sk_policy - without freeing
the old value - and later set it again. 

It looks like a memory leak. Am I missing something?

Thanks!

Paolo

Eric Dumazet July 6, 2022, 2:07 p.m. UTC | #6

On Wed, Jul 6, 2022 at 4:02 PM Paolo Abeni <pabeni@redhat.com> wrote:
>
>
> Hello,
> On Wed, 2022-07-06 at 03:10 +0000, 서세욱 wrote:
> > On Tue, Jul 5, 2022 at 5:25 PM Paolo Abeni <pabeni@redhat.com> wrote:
> > > If you are targting net, please add a suitable Fixes: tag.
> > I'm targeting net-next, and will update the subject.
> >
> > > It looks like the cloned policy will be overwrited by later resets and
> > > possibly leaked? nobody calls xfrm_sk_free_policy() on the old policy
>
> > Is it possible that a later reset overwrites sk_ctl's sk_policy? I
> > thought ctl_sk is a percpu variable and it's preempted. Maybe I might
> > miss something, please let me know if my understanding is wrong.
>
> I mean: what happesn when there are 2 tcp_v4_send_reset() on the same
> CPU (with different sk argument)?

This is not possible, because we block BH

local_bh_disable();
ctl_sk = this_cpu_read(ipv4_tcp_sk);
...
<write over tcl_sk>
local_bh_enable();


>
> It looks like that after the first call to xfrm_sk_clone_policy(),
> sk_ctl->sk_policy will be set to the newly allocated (cloned) policy.
>
> The next call will first clear the sk_ctl->sk_policy - without freeing
> the old value - and later set it again.
>
> It looks like a memory leak. Am I missing something?
>
> Thanks!
>
> Paolo
>

[v2] net-tcp: Find dst with sk's xfrm policy not ctl_sk

Checks

Commit Message

Comments

Patch