diff mbox series

[mptcp-net] mptcp: really cope with fastopen race.

Message ID 02fe7afaec6fac65fc2da5922efe25d6ebdebc88.1706132185.git.pabeni@redhat.com (mailing list archive)
State Accepted, archived
Commit d59403d0c91bd78663dad043b348f1f3ad2cc88d
Delegated to: Matthieu Baerts
Headers show
Series [mptcp-net] mptcp: really cope with fastopen race. | expand

Checks

Context Check Description
matttbe/build success Build and static analysis OK
matttbe/checkpatch warning total: 0 errors, 1 warnings, 0 checks, 9 lines checked
matttbe/KVM_Validation__normal warning Unstable: 2 failed test(s): packetdrill_mp_reset packetdrill_regressions
matttbe/KVM_Validation__debug__except_selftest_mptcp_join_ success Success! ✅
matttbe/KVM_Validation__debug__only_selftest_mptcp_join_ warning Unstable: 1 failed test(s): selftest_mptcp_join

Commit Message

Paolo Abeni Jan. 24, 2024, 9:36 p.m. UTC
Fastopen and PM-trigger subflow shutdown can race, as reported by
syzkaller.

In my first attempt to close such race, I missed the fact that
the subflow status can change again before the subflow_state_change
callback is invoked.

Address the issue additionally copying with all the states directly
reachable from TCP_FIN_WAIT1.

Fixes: 1e777f39b4d7 ("mptcp: add MSG_FASTOPEN sendmsg flag support")
Fixes: 4fd19a307016 ("mptcp: fix inconsistent state on fastopen race")
Reported-by: syzbot+c53d4d3ddb327e80bc51@syzkaller.appspotmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 net/mptcp/protocol.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Comments

MPTCP CI Jan. 24, 2024, 10:43 p.m. UTC | #1
Hi Paolo,

Thank you for your modifications, that's great!

Our CI (GitHub Action) did some validations and here is its report:

- KVM Validation: normal:
  - Unstable: 2 failed test(s): packetdrill_regressions selftest_mptcp_join 
MPTCP CI Jan. 24, 2024, 10:43 p.m. UTC | #2
Hi Paolo,

Thank you for your modifications, that's great!

Our CI (Cirrus) did some validations with a debug kernel and here is its report:

- KVM Validation: debug (except selftest_mptcp_join):
  - Critical: 1 Call Trace(s) ❌:
  - Task: https://cirrus-ci.com/task/5675809764016128
  - Summary: https://api.cirrus-ci.com/v1/artifact/task/5675809764016128/summary/summary.txt

- KVM Validation: debug (only selftest_mptcp_join):
  - Critical: 1 Call Trace(s) ❌:
  - Task: https://cirrus-ci.com/task/5112859810594816
  - Summary: https://api.cirrus-ci.com/v1/artifact/task/5112859810594816/summary/summary.txt

Initiator: Patchew Applier
Commits: https://github.com/multipath-tcp/mptcp_net-next/commits/f24f750c6f62


If there are some issues, you can reproduce them using the same environment as
the one used by the CI thanks to a docker image, e.g.:

    $ cd [kernel source code]
    $ docker run -v "${PWD}:${PWD}:rw" -w "${PWD}" --privileged --rm -it \
        --pull always mptcp/mptcp-upstream-virtme-docker:latest \
        auto-debug

For more details:

    https://github.com/multipath-tcp/mptcp-upstream-virtme-docker


Please note that despite all the efforts that have been already done to have a
stable tests suite when executed on a public CI like here, it is possible some
reported issues are not due to your modifications. Still, do not hesitate to
help us improve that ;-)

Cheers,
MPTCP GH Action bot
Bot operated by Matthieu Baerts (NGI0 Core)
Mat Martineau Jan. 27, 2024, 1:18 a.m. UTC | #3
On Wed, 24 Jan 2024, Paolo Abeni wrote:

> Fastopen and PM-trigger subflow shutdown can race, as reported by
> syzkaller.
>
> In my first attempt to close such race, I missed the fact that
> the subflow status can change again before the subflow_state_change
> callback is invoked.
>
> Address the issue additionally copying with all the states directly
> reachable from TCP_FIN_WAIT1.
>
> Fixes: 1e777f39b4d7 ("mptcp: add MSG_FASTOPEN sendmsg flag support")
> Fixes: 4fd19a307016 ("mptcp: fix inconsistent state on fastopen race")
> Reported-by: syzbot+c53d4d3ddb327e80bc51@syzkaller.appspotmail.com
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>

I wasn't able to reproduce the CI failure, and the change looks good to 
me. We'll see if CI is happy when this is applied to export-net.

Reviewed-by: Mat Martineau <martineau@kernel.org>

> ---
> net/mptcp/protocol.h | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
> index 4a32d3d11fb6..de04b97e8dd1 100644
> --- a/net/mptcp/protocol.h
> +++ b/net/mptcp/protocol.h
> @@ -1171,7 +1171,8 @@ static inline bool subflow_simultaneous_connect(struct sock *sk)
> {
> 	struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(sk);
>
> -	return (1 << sk->sk_state) & (TCPF_ESTABLISHED | TCPF_FIN_WAIT1) &&
> +	return (1 << sk->sk_state) &
> +	       (TCPF_ESTABLISHED | TCPF_FIN_WAIT1 | TCPF_FIN_WAIT2 | TCPF_CLOSING) &&
> 	       is_active_ssk(subflow) &&
> 	       !subflow->conn_finished;
> }
> -- 
> 2.43.0
>
>
>
MPTCP CI Jan. 27, 2024, 2:24 a.m. UTC | #4
Hi Paolo,

Thank you for your modifications, that's great!

Our CI (GitHub Action) did some validations and here is its report:

- KVM Validation: normal:
  - Unstable: 2 failed test(s): packetdrill_mp_reset packetdrill_regressions 
MPTCP CI Jan. 27, 2024, 3:23 a.m. UTC | #5
Hi Paolo,

Thank you for your modifications, that's great!

Our CI (Cirrus) did some validations with a debug kernel and here is its report:

- KVM Validation: debug (except selftest_mptcp_join):
  - Success! ✅:
  - Task: https://cirrus-ci.com/task/4896041841983488
  - Summary: https://api.cirrus-ci.com/v1/artifact/task/4896041841983488/summary/summary.txt

- KVM Validation: debug (only selftest_mptcp_join):
  - Unstable: 1 failed test(s): selftest_mptcp_join 
Paolo Abeni Jan. 29, 2024, 9:35 a.m. UTC | #6
On Fri, 2024-01-26 at 17:18 -0800, Mat Martineau wrote:
> On Wed, 24 Jan 2024, Paolo Abeni wrote:
> 
> > Fastopen and PM-trigger subflow shutdown can race, as reported by
> > syzkaller.
> > 
> > In my first attempt to close such race, I missed the fact that
> > the subflow status can change again before the subflow_state_change
> > callback is invoked.
> > 
> > Address the issue additionally copying with all the states directly
> > reachable from TCP_FIN_WAIT1.
> > 
> > Fixes: 1e777f39b4d7 ("mptcp: add MSG_FASTOPEN sendmsg flag support")
> > Fixes: 4fd19a307016 ("mptcp: fix inconsistent state on fastopen race")
> > Reported-by: syzbot+c53d4d3ddb327e80bc51@syzkaller.appspotmail.com
> > Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> 
> I wasn't able to reproduce the CI failure, and the change looks good to 
> me. We'll see if CI is happy when this is applied to export-net.

I used the repro provided by syzbot upstream:

https://lore.kernel.org/netdev/000000000000e8ecae060fae7a47@google.com/
https://syzkaller.appspot.com/x/repro.c?x=17abc23be80000

with my default debug config, and here it reproduced the issue reliably
on the unpatched kernel (and no splat with the patched one).

> 
> Reviewed-by: Mat Martineau <martineau@kernel.org>

Thanks,

Paolo
Mat Martineau Jan. 29, 2024, 7:16 p.m. UTC | #7
On Mon, 29 Jan 2024, Paolo Abeni wrote:

> On Fri, 2024-01-26 at 17:18 -0800, Mat Martineau wrote:
>> On Wed, 24 Jan 2024, Paolo Abeni wrote:
>>
>>> Fastopen and PM-trigger subflow shutdown can race, as reported by
>>> syzkaller.
>>>
>>> In my first attempt to close such race, I missed the fact that
>>> the subflow status can change again before the subflow_state_change
>>> callback is invoked.
>>>
>>> Address the issue additionally copying with all the states directly
>>> reachable from TCP_FIN_WAIT1.
>>>
>>> Fixes: 1e777f39b4d7 ("mptcp: add MSG_FASTOPEN sendmsg flag support")
>>> Fixes: 4fd19a307016 ("mptcp: fix inconsistent state on fastopen race")
>>> Reported-by: syzbot+c53d4d3ddb327e80bc51@syzkaller.appspotmail.com
>>> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
>>
>> I wasn't able to reproduce the CI failure, and the change looks good to
>> me. We'll see if CI is happy when this is applied to export-net.
>
> I used the repro provided by syzbot upstream:
>
> https://lore.kernel.org/netdev/000000000000e8ecae060fae7a47@google.com/
> https://syzkaller.appspot.com/x/repro.c?x=17abc23be80000
>
> with my default debug config, and here it reproduced the issue reliably
> on the unpatched kernel (and no splat with the patched one).
>

I was not very specific in my wording: I was referring to the CI failures 
when it applied this patch to export-net:

https://lore.kernel.org/mptcp/482549332bcc96a8b4dd37292cb289b271b49511.camel@redhat.com/T/#mfb81b40b8a42912362eede622b5a15b17313d2a2

...not the failure reported by syzbot that this patch is intended to fix.

I did run export-net plus this patch through the 
mptcp-upstream-virtme-docker environment and it worked fine, so I think 
it's ok to apply.

- Mat
Paolo Abeni Jan. 30, 2024, 8:38 a.m. UTC | #8
On Mon, 2024-01-29 at 11:16 -0800, Mat Martineau wrote:
> On Mon, 29 Jan 2024, Paolo Abeni wrote:
> 
> > On Fri, 2024-01-26 at 17:18 -0800, Mat Martineau wrote:
> > > On Wed, 24 Jan 2024, Paolo Abeni wrote:
> > > 
> > > > Fastopen and PM-trigger subflow shutdown can race, as reported by
> > > > syzkaller.
> > > > 
> > > > In my first attempt to close such race, I missed the fact that
> > > > the subflow status can change again before the subflow_state_change
> > > > callback is invoked.
> > > > 
> > > > Address the issue additionally copying with all the states directly
> > > > reachable from TCP_FIN_WAIT1.
> > > > 
> > > > Fixes: 1e777f39b4d7 ("mptcp: add MSG_FASTOPEN sendmsg flag support")
> > > > Fixes: 4fd19a307016 ("mptcp: fix inconsistent state on fastopen race")
> > > > Reported-by: syzbot+c53d4d3ddb327e80bc51@syzkaller.appspotmail.com
> > > > Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> > > 
> > > I wasn't able to reproduce the CI failure, and the change looks good to
> > > me. We'll see if CI is happy when this is applied to export-net.
> > 
> > I used the repro provided by syzbot upstream:
> > 
> > https://lore.kernel.org/netdev/000000000000e8ecae060fae7a47@google.com/
> > https://syzkaller.appspot.com/x/repro.c?x=17abc23be80000
> > 
> > with my default debug config, and here it reproduced the issue reliably
> > on the unpatched kernel (and no splat with the patched one).
> > 
> 
> I was not very specific in my wording: I was referring to the CI failures 
> when it applied this patch to export-net:
> 
> https://lore.kernel.org/mptcp/482549332bcc96a8b4dd37292cb289b271b49511.camel@redhat.com/T/#mfb81b40b8a42912362eede622b5a15b17313d2a2

Ah, that one!

I think it's caused by:

commit 198bc90e0e734e5f98c3d2833e8390cac3df61b2
Author: Zhengchao Shao <shaozhengchao@huawei.com>
Date:   Thu Jan 18 09:20:19 2024 +0800

    tcp: make sure init the accept_queue's spinlocks once

and fixed by:

commit 435e202d645c197dcfd39d7372eb2a56529b6640 (net68/tmp_png)
Author: Zhengchao Shao <shaozhengchao@huawei.com>
Date:   Mon Jan 22 18:20:01 2024 +0800

    ipv6: init the accept_queue's spinlocks in inet6_create

Cheers,

Paolo
Matthieu Baerts (NGI0) Jan. 30, 2024, 12:08 p.m. UTC | #9
Hi Paolo, Mat,

On 24/01/2024 22:36, Paolo Abeni wrote:
> Fastopen and PM-trigger subflow shutdown can race, as reported by
> syzkaller.
> 
> In my first attempt to close such race, I missed the fact that
> the subflow status can change again before the subflow_state_change
> callback is invoked.
> 
> Address the issue additionally copying with all the states directly
> reachable from TCP_FIN_WAIT1.

Thank you for the patch and the review!

Now in our tree (fix for -net) without the '.' at the end of the commit
title and a Closes tag ;)

New patches for t/upstream-net and t/upstream:
- d59403d0c91b: mptcp: really cope with fastopen race
- Results: ca82f8ccb3cc..6d56aa24e733 (export-net)
- Results: 4b8f1ec22243..cdd4fe00f4c5 (export)

Tests are now in progress:

https://cirrus-ci.com/github/multipath-tcp/mptcp_net-next/export-net/20240130T120521
https://cirrus-ci.com/github/multipath-tcp/mptcp_net-next/export/20240130T120521

Cheers,
Matt
Paolo Abeni Feb. 5, 2024, 4 p.m. UTC | #10
On Wed, 2024-01-24 at 22:36 +0100, Paolo Abeni wrote:
> Fastopen and PM-trigger subflow shutdown can race, as reported by
> syzkaller.
> 
> In my first attempt to close such race, I missed the fact that
> the subflow status can change again before the subflow_state_change
> callback is invoked.
> 
> Address the issue additionally copying with all the states directly
> reachable from TCP_FIN_WAIT1.
> 
> Fixes: 1e777f39b4d7 ("mptcp: add MSG_FASTOPEN sendmsg flag support")
> Fixes: 4fd19a307016 ("mptcp: fix inconsistent state on fastopen race")
> Reported-by: syzbot+c53d4d3ddb327e80bc51@syzkaller.appspotmail.com

Hopefully it's not too late...

The above should be:

Reported-by: syzbot+732ab7be796ec0d104ac@syzkaller.appspotmail.com

@Mat(s), could you please update the path directly in the export
branch?

Thanks!

Paolo
diff mbox series

Patch

diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index 4a32d3d11fb6..de04b97e8dd1 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -1171,7 +1171,8 @@  static inline bool subflow_simultaneous_connect(struct sock *sk)
 {
 	struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(sk);
 
-	return (1 << sk->sk_state) & (TCPF_ESTABLISHED | TCPF_FIN_WAIT1) &&
+	return (1 << sk->sk_state) &
+	       (TCPF_ESTABLISHED | TCPF_FIN_WAIT1 | TCPF_FIN_WAIT2 | TCPF_CLOSING) &&
 	       is_active_ssk(subflow) &&
 	       !subflow->conn_finished;
 }