diff mbox series

tcp_bpf: Fix one concurrency problem in the tcp_bpf_send_verdict function

Message ID 20210929084529.96583-1-liujian56@huawei.com (mailing list archive)
State Superseded
Delegated to: BPF
Headers show
Series tcp_bpf: Fix one concurrency problem in the tcp_bpf_send_verdict function | expand

Checks

Context Check Description
netdev/cover_letter success Link
netdev/fixes_present success Link
netdev/patch_count success Link
netdev/tree_selection success Guessed tree name to be net-next
netdev/subject_prefix warning Target tree name not specified in the subject
netdev/cc_maintainers success CCed 17 of 17 maintainers
netdev/source_inline success Was 0 now: 0
netdev/verify_signedoff success Link
netdev/module_param success Was 0 now: 0
netdev/build_32bit success Errors and warnings before: 1 this patch: 1
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/verify_fixes success Link
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 31 lines checked
netdev/build_allmodconfig_warn success Errors and warnings before: 1 this patch: 1
netdev/header_inline success Link
bpf/vmtest-bpf fail VM_Test
bpf/vmtest-bpf-PR fail PR summary
bpf/vmtest-bpf-next-PR success PR summary
bpf/vmtest-bpf-next success VM_Test

Commit Message

Liu Jian Sept. 29, 2021, 8:45 a.m. UTC
In the following cases:
We need to redirect the first msg to sock1 and the second msg to sock2.
The sock lock needs to be released at __SK_REDIRECT and to get another
sock lock, this will cause the probability that psock->eval is not set to
__SK_NONE when the second msg comes.

If psock does not set apple bytes, fix this by do the cleanup before
releasing the sock lock. And keep the original logic in other cases.

Signed-off-by: Liu Jian <liujian56@huawei.com>
---
 net/ipv4/tcp_bpf.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

Comments

John Fastabend Sept. 30, 2021, 10:25 p.m. UTC | #1
Liu Jian wrote:
> In the following cases:
> We need to redirect the first msg to sock1 and the second msg to sock2.
> The sock lock needs to be released at __SK_REDIRECT and to get another
> sock lock, this will cause the probability that psock->eval is not set to
> __SK_NONE when the second msg comes.
> 
> If psock does not set apple bytes, fix this by do the cleanup before
> releasing the sock lock. And keep the original logic in other cases.

It took me sometime to figure out the above description. Please include
a bit more details here this needs to be backported so we want to be
very clear what the error  is and how to trigger it.

In this case we should list the flow to show how the interleaving of
msgs breaks.

"
With two Msgs, msgA and msgB and a user doing nonblocking sendmsg calls
(or multiple cores) on a single socket 'sk' we could get the following
flow.

 msgA, sk                               msgB, sk
 -----------                            ---------------
 tcp_bpf_sendmsg()
 lock(sk)
 psock = sk->psock
                                        tcp_bpf_sendmsg()
                                        lock(sk) ... blocking
 tcp_bpf_send_verdict
 if (psock->eval == NONE)
   psock->eval = sk_psock_msg_verdict
 ..
 < handle SK_REDIRECT case >
   release_sock(sk)                     < lock dropped so grab here >
   ret = tcp_bpf_sendmsg_redir
                                        psock = sk->psock
                                        tcp_bpf_send_verdict
 lock_sock(sk) ... blocking on B
                                        if (psock->eval == NONE) <- boom.
                                         psock->eval will have msgA state

The problem here is we dropped the lock on msgA and grabbed it with msgB.
Now we have old state in psock and importantly psock->eval has not
been cleared. So msgB will run whatever action was done on A and the
verdict program may never see it.
"

Showing the flow makes it painfully obvious why dropping that lock
with old state is broken.


> 
> Signed-off-by: Liu Jian <liujian56@huawei.com>

We need a fixes tag as well so we can backport this.

> ---
>  net/ipv4/tcp_bpf.c | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
> 
> diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c
> index d3e9386b493e..02442e43ac4d 100644
> --- a/net/ipv4/tcp_bpf.c
> +++ b/net/ipv4/tcp_bpf.c
> @@ -232,6 +232,7 @@ static int tcp_bpf_send_verdict(struct sock *sk, struct sk_psock *psock,
>  	bool cork = false, enospc = sk_msg_full(msg);
>  	struct sock *sk_redir;
>  	u32 tosend, delta = 0;
> +	u32 eval = __SK_NONE;
>  	int ret;
>  
>  more_data:
> @@ -274,6 +275,12 @@ static int tcp_bpf_send_verdict(struct sock *sk, struct sk_psock *psock,
>  		break;
>  	case __SK_REDIRECT:
>  		sk_redir = psock->sk_redir;
> +		if (!psock->apply_bytes) {
> +			/* Clean up before releasing the sock lock. */
> +			eval = psock->eval;
> +			psock->eval = __SK_NONE;
> +			psock->sk_redir = NULL;
> +		}

We need to move above chunk below sk_msg_apply_bytes() so we account for
the bytes and if we zero apply bytes with this send we clear the psock
state. Otherwise we could have the same issue with stale state on the
boundary where apply bytes is met.

>  		sk_msg_apply_bytes(psock, tosend);

<-- put above chunk here.

>  		if (psock->cork) {
>  			cork = true;

Interestingly, I caught the race with cork state, but missed it with
the eval case. Likely because our program redirected to a single sk.

> @@ -281,7 +288,12 @@ static int tcp_bpf_send_verdict(struct sock *sk, struct sk_psock *psock,
>  		}
>  		sk_msg_return(sk, msg, tosend);
>  		release_sock(sk);
> +
>  		ret = tcp_bpf_sendmsg_redir(sk_redir, msg, tosend, flags);
> +
> +		if (eval == __SK_REDIRECT)

Is the 'if' needed? we are in this case because eval is SK_REDIRECT.

> +			sock_put(sk_redir);
> +
>  		lock_sock(sk);
>  		if (unlikely(ret < 0)) {
>  			int free = sk_msg_free_nocharge(sk, msg);
> -- 
> 2.17.1
>
Liu Jian Oct. 4, 2021, 4:21 a.m. UTC | #2
> -----Original Message-----
> From: John Fastabend [mailto:john.fastabend@gmail.com]
> Sent: Friday, October 1, 2021 6:25 AM
> To: liujian (CE) <liujian56@huawei.com>; john.fastabend@gmail.com;
> daniel@iogearbox.net; jakub@cloudflare.com; lmb@cloudflare.com;
> edumazet@google.com; davem@davemloft.net; yoshfuji@linux-ipv6.org;
> dsahern@kernel.org; kuba@kernel.org; ast@kernel.org; andrii@kernel.org;
> kafai@fb.com; songliubraving@fb.com; yhs@fb.com; kpsingh@kernel.org;
> netdev@vger.kernel.org; bpf@vger.kernel.org
> Cc: liujian (CE) <liujian56@huawei.com>
> Subject: RE: [PATCH] tcp_bpf: Fix one concurrency problem in the
> tcp_bpf_send_verdict function
> 
> Liu Jian wrote:
> > In the following cases:
> > We need to redirect the first msg to sock1 and the second msg to sock2.
> > The sock lock needs to be released at __SK_REDIRECT and to get another
> > sock lock, this will cause the probability that psock->eval is not set
> > to __SK_NONE when the second msg comes.
> >
> > If psock does not set apple bytes, fix this by do the cleanup before
> > releasing the sock lock. And keep the original logic in other cases.
> 
> It took me sometime to figure out the above description. Please include a bit
> more details here this needs to be backported so we want to be very clear
> what the error  is and how to trigger it.
> 
> In this case we should list the flow to show how the interleaving of msgs
> breaks.
> 
> "
> With two Msgs, msgA and msgB and a user doing nonblocking sendmsg calls
> (or multiple cores) on a single socket 'sk' we could get the following flow.
> 
>  msgA, sk                               msgB, sk
>  -----------                            ---------------
>  tcp_bpf_sendmsg()
>  lock(sk)
>  psock = sk->psock
>                                         tcp_bpf_sendmsg()
>                                         lock(sk) ... blocking  tcp_bpf_send_verdict  if (psock-
> >eval == NONE)
>    psock->eval = sk_psock_msg_verdict
>  ..
>  < handle SK_REDIRECT case >
>    release_sock(sk)                     < lock dropped so grab here >
>    ret = tcp_bpf_sendmsg_redir
>                                         psock = sk->psock
>                                         tcp_bpf_send_verdict
>  lock_sock(sk) ... blocking on B
>                                         if (psock->eval == NONE) <- boom.
>                                          psock->eval will have msgA state
> 
> The problem here is we dropped the lock on msgA and grabbed it with msgB.
> Now we have old state in psock and importantly psock->eval has not been
> cleared. So msgB will run whatever action was done on A and the verdict
> program may never see it.
> "
> 
> Showing the flow makes it painfully obvious why dropping that lock with old
> state is broken.
> 
Thanks a lot for such a detailed example.
> 
> >
> > Signed-off-by: Liu Jian <liujian56@huawei.com>
> 
> We need a fixes tag as well so we can backport this.
 I will add it.
> 
> > ---
> >  net/ipv4/tcp_bpf.c | 12 ++++++++++++
> >  1 file changed, 12 insertions(+)
> >
> > diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c index
> > d3e9386b493e..02442e43ac4d 100644
> > --- a/net/ipv4/tcp_bpf.c
> > +++ b/net/ipv4/tcp_bpf.c
> > @@ -232,6 +232,7 @@ static int tcp_bpf_send_verdict(struct sock *sk,
> struct sk_psock *psock,
> >  	bool cork = false, enospc = sk_msg_full(msg);
> >  	struct sock *sk_redir;
> >  	u32 tosend, delta = 0;
> > +	u32 eval = __SK_NONE;
> >  	int ret;
> >
> >  more_data:
> > @@ -274,6 +275,12 @@ static int tcp_bpf_send_verdict(struct sock *sk,
> struct sk_psock *psock,
> >  		break;
> >  	case __SK_REDIRECT:
> >  		sk_redir = psock->sk_redir;
> > +		if (!psock->apply_bytes) {
> > +			/* Clean up before releasing the sock lock. */
> > +			eval = psock->eval;
> > +			psock->eval = __SK_NONE;
> > +			psock->sk_redir = NULL;
> > +		}
> 
> We need to move above chunk below sk_msg_apply_bytes() so we account
> for the bytes and if we zero apply bytes with this send we clear the psock
> state. Otherwise we could have the same issue with stale state on the
> boundary where apply bytes is met.
> 
> >  		sk_msg_apply_bytes(psock, tosend);
> 
> <-- put above chunk here.
yes, here looks better. 
> 
> >  		if (psock->cork) {
> >  			cork = true;
> 
> Interestingly, I caught the race with cork state, but missed it with the eval
> case. Likely because our program redirected to a single sk.
> 
Yes. 
> > @@ -281,7 +288,12 @@ static int tcp_bpf_send_verdict(struct sock *sk,
> struct sk_psock *psock,
> >  		}
> >  		sk_msg_return(sk, msg, tosend);
> >  		release_sock(sk);
> > +
> >  		ret = tcp_bpf_sendmsg_redir(sk_redir, msg, tosend, flags);
> > +
> > +		if (eval == __SK_REDIRECT)
> 
> Is the 'if' needed? we are in this case because eval is SK_REDIRECT.
> 
Need it, because If the "apply bytes" is not zero, i did not change the logic.
> > +			sock_put(sk_redir);
> > +
> >  		lock_sock(sk);
> >  		if (unlikely(ret < 0)) {
> >  			int free = sk_msg_free_nocharge(sk, msg);
> > --
> > 2.17.1
> >
diff mbox series

Patch

diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c
index d3e9386b493e..02442e43ac4d 100644
--- a/net/ipv4/tcp_bpf.c
+++ b/net/ipv4/tcp_bpf.c
@@ -232,6 +232,7 @@  static int tcp_bpf_send_verdict(struct sock *sk, struct sk_psock *psock,
 	bool cork = false, enospc = sk_msg_full(msg);
 	struct sock *sk_redir;
 	u32 tosend, delta = 0;
+	u32 eval = __SK_NONE;
 	int ret;
 
 more_data:
@@ -274,6 +275,12 @@  static int tcp_bpf_send_verdict(struct sock *sk, struct sk_psock *psock,
 		break;
 	case __SK_REDIRECT:
 		sk_redir = psock->sk_redir;
+		if (!psock->apply_bytes) {
+			/* Clean up before releasing the sock lock. */
+			eval = psock->eval;
+			psock->eval = __SK_NONE;
+			psock->sk_redir = NULL;
+		}
 		sk_msg_apply_bytes(psock, tosend);
 		if (psock->cork) {
 			cork = true;
@@ -281,7 +288,12 @@  static int tcp_bpf_send_verdict(struct sock *sk, struct sk_psock *psock,
 		}
 		sk_msg_return(sk, msg, tosend);
 		release_sock(sk);
+
 		ret = tcp_bpf_sendmsg_redir(sk_redir, msg, tosend, flags);
+
+		if (eval == __SK_REDIRECT)
+			sock_put(sk_redir);
+
 		lock_sock(sk);
 		if (unlikely(ret < 0)) {
 			int free = sk_msg_free_nocharge(sk, msg);