[mptcp-net] mptcp: always handle address removal under msk socket lock

Message ID 61cded134e753976259b48a4269f203dcae6c8fa.1739969092.git.pabeni@redhat.com (mailing list archive)
State Accepted, archived
Commit c02a912d7eee88f0e435aa347d44b53126d77a55
Delegated to: Matthieu Baerts
Series [mptcp-net] mptcp: always handle address removal under msk socket lock

Checks

Context Check Description
matttbe/build success Build and static analysis OK
matttbe/checkpatch success total: 0 errors, 0 warnings, 0 checks, 11 lines checked
matttbe/shellcheck success MPTCP selftests files have not been modified
matttbe/KVM_Validation__normal success Success! ✅
matttbe/KVM_Validation__debug success Success! ✅
matttbe/KVM_Validation__btf-normal__only_bpftest_all_ success Success! ✅
matttbe/KVM_Validation__btf-debug__only_bpftest_all_ success Success! ✅

Commit Message

Paolo Abeni Feb. 19, 2025, 12:45 p.m. UTC
Syzkaller reported a lockdep splat in the PM control path:

WARNING: CPU: 0 PID: 6693 at ./include/net/sock.h:1711 sock_owned_by_me include/net/sock.h:1711 [inline]
WARNING: CPU: 0 PID: 6693 at ./include/net/sock.h:1711 msk_owned_by_me net/mptcp/protocol.h:363 [inline]
WARNING: CPU: 0 PID: 6693 at ./include/net/sock.h:1711 mptcp_pm_nl_addr_send_ack+0x57c/0x610 net/mptcp/pm_netlink.c:788
Modules linked in:
CPU: 0 UID: 0 PID: 6693 Comm: syz.0.205 Not tainted 6.14.0-rc2-syzkaller-00303-gad1b832bf1cf #0
Hardware name: Google Compute Engine/Google Compute Engine, BIOS Google 12/27/2024
RIP: 0010:sock_owned_by_me include/net/sock.h:1711 [inline]
RIP: 0010:msk_owned_by_me net/mptcp/protocol.h:363 [inline]
RIP: 0010:mptcp_pm_nl_addr_send_ack+0x57c/0x610 net/mptcp/pm_netlink.c:788
Code: 5b 41 5c 41 5d 41 5e 41 5f 5d c3 cc cc cc cc e8 ca 7b d3 f5 eb b9 e8 c3 7b d3 f5 90 0f 0b 90 e9 dd fb ff ff e8 b5 7b d3 f5 90 <0f> 0b 90 e9 3e fb ff ff 44 89 f1 80 e1 07 38 c1 0f 8c eb fb ff ff
RSP: 0000:ffffc900034f6f60 EFLAGS: 00010283
RAX: ffffffff8bee3c2b RBX: 0000000000000001 RCX: 0000000000080000
RDX: ffffc90004d42000 RSI: 000000000000a407 RDI: 000000000000a408
RBP: ffffc900034f7030 R08: ffffffff8bee37f6 R09: 0100000000000000
R10: dffffc0000000000 R11: ffffed100bcc62e4 R12: ffff88805e6316e0
R13: ffff88805e630c00 R14: dffffc0000000000 R15: ffff88805e630c00
FS:  00007f7e9a7e96c0(0000) GS:ffff8880b8600000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000001b2fd18ff8 CR3: 0000000032c24000 CR4: 00000000003526f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
 mptcp_pm_remove_addr+0x103/0x1d0 net/mptcp/pm.c:59
 mptcp_pm_remove_anno_addr+0x1f4/0x2f0 net/mptcp/pm_netlink.c:1486
 mptcp_nl_remove_subflow_and_signal_addr net/mptcp/pm_netlink.c:1518 [inline]
 mptcp_pm_nl_del_addr_doit+0x118d/0x1af0 net/mptcp/pm_netlink.c:1629
 genl_family_rcv_msg_doit net/netlink/genetlink.c:1115 [inline]
 genl_family_rcv_msg net/netlink/genetlink.c:1195 [inline]
 genl_rcv_msg+0xb1f/0xec0 net/netlink/genetlink.c:1210
 netlink_rcv_skb+0x206/0x480 net/netlink/af_netlink.c:2543
 genl_rcv+0x28/0x40 net/netlink/genetlink.c:1219
 netlink_unicast_kernel net/netlink/af_netlink.c:1322 [inline]
 netlink_unicast+0x7f6/0x990 net/netlink/af_netlink.c:1348
 netlink_sendmsg+0x8de/0xcb0 net/netlink/af_netlink.c:1892
 sock_sendmsg_nosec net/socket.c:718 [inline]
 __sock_sendmsg+0x221/0x270 net/socket.c:733
 ____sys_sendmsg+0x53a/0x860 net/socket.c:2573
 ___sys_sendmsg net/socket.c:2627 [inline]
 __sys_sendmsg+0x269/0x350 net/socket.c:2659
 do_syscall_x64 arch/x86/entry/common.c:52 [inline]
 do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f7e9998cde9
Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f7e9a7e9038 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f7e99ba5fa0 RCX: 00007f7e9998cde9
RDX: 000000002000c094 RSI: 0000400000000000 RDI: 0000000000000007
RBP: 00007f7e99a0e2a0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 0000000000000000 R14: 00007f7e99ba5fa0 R15: 00007fff49231088

Indeed the PM can try to send an RM_ADDR over an msk without first
acquiring the msk socket lock.

The buggy code path comes from an early optimization: when there
are no subflows, the PM should (usually) not send RM_ADDR
notifications.

That statement is incorrect: without the lock, another process
could concurrently create a new subflow and trigger the RM_ADDR
generation.

Additionally, the supposed optimization is not very effective even
performance-wise, as most MPTCP sockets have at least one subflow:
the MPC one.

Address the issue by removing the buggy code path; the existing
"slow path" correctly handles the edge case, too.

Reported-by: syzbot+cd3ce3d03a3393ae9700@syzkaller.appspotmail.com
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/546
Fixes: b6c08380860b ("mptcp: remove addr and subflow in PM netlink")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 net/mptcp/pm_netlink.c | 5 -----
 1 file changed, 5 deletions(-)
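
For illustration, here is a minimal sketch of the racy fast path this
patch removes, next to the retained locked path. It is simplified from
the diff at the bottom of this page and is not verbatim kernel code:

    /* Removed fast path: the list_empty() check ran without the msk
     * socket lock, so another process could concurrently add a subflow
     * and an RM_ADDR could be generated without lock protection.
     */
    if (list_empty(&msk->conn_list)) {      /* unlocked read: racy */
            mptcp_pm_remove_anno_addr(msk, addr, false);
            goto next;
    }

    /* Retained slow path: each msk is handled under its own socket
     * lock, so the RM_ADDR generation is always properly serialized.
     */
    lock_sock(sk);
    remove_subflow = mptcp_lookup_subflow_by_saddr(&msk->conn_list, addr);
    /* ... mptcp_pm_remove_anno_addr() and the rest now run under the lock */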

Comments

MPTCP CI Feb. 19, 2025, 1:49 p.m. UTC | #1
Hi Paolo,

Thank you for your modifications, that's great!

Our CI did some validations and here is its report:

- KVM Validation: normal: Success! ✅
- KVM Validation: debug: Success! ✅
- KVM Validation: btf-normal (only bpftest_all): Success! ✅
- KVM Validation: btf-debug (only bpftest_all): Success! ✅
- Task: https://github.com/multipath-tcp/mptcp_net-next/actions/runs/13413108659

Initiator: Patchew Applier
Commits: https://github.com/multipath-tcp/mptcp_net-next/commits/f8164af657a2
Patchwork: https://patchwork.kernel.org/project/mptcp/list/?series=935572


If there are some issues, you can reproduce them using the same environment as
the one used by the CI thanks to a docker image, e.g.:

    $ cd [kernel source code]
    $ docker run -v "${PWD}:${PWD}:rw" -w "${PWD}" --privileged --rm -it \
        --pull always mptcp/mptcp-upstream-virtme-docker:latest \
        auto-normal

For more details:

    https://github.com/multipath-tcp/mptcp-upstream-virtme-docker


Please note that despite all the efforts already made to have a stable
test suite when executed on a public CI like this one, it is possible that
some reported issues are not due to your modifications. Still, do not
hesitate to help us improve that ;-)

Cheers,
MPTCP GH Action bot
Bot operated by Matthieu Baerts (NGI0 Core)
Matthieu Baerts Feb. 19, 2025, 5:45 p.m. UTC | #2
Hi Paolo,

On 19/02/2025 13:45, Paolo Abeni wrote:
> Syzkaller reported a lockdep splat in the PM control path:

(...)

> Indeed the PM can try to send an RM_ADDR over an msk without first
> acquiring the msk socket lock.
>
> The buggy code path comes from an early optimization: when there
> are no subflows, the PM should (usually) not send RM_ADDR
> notifications.
>
> That statement is incorrect: without the lock, another process
> could concurrently create a new subflow and trigger the RM_ADDR
> generation.
>
> Additionally, the supposed optimization is not very effective even
> performance-wise, as most MPTCP sockets have at least one subflow:
> the MPC one.
Thank you for having looked at this!

By chance, do you know how syzkaller managed to interact with an MPTCP
socket not having any attached subflows? An msk being destroyed?

Just to know if there are specific cases we need to pay attention to in
the future :)

> Address the issue by removing the buggy code path; the existing
> "slow path" correctly handles the edge case, too.
Thank you, it looks good to me!

Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>

Cheers,
Matt
Matthieu Baerts Feb. 19, 2025, 6:45 p.m. UTC | #3
Hi Paolo,

On 19/02/2025 18:45, Matthieu Baerts wrote:
> Hi Paolo,
> 
> On 19/02/2025 13:45, Paolo Abeni wrote:
>> Syzkaller reported a lockdep splat in the PM control path:
> 
> (...)
> 
>> Indeed the PM can try to send an RM_ADDR over an msk without first
>> acquiring the msk socket lock.
>>
>> The buggy code path comes from an early optimization: when there
>> are no subflows, the PM should (usually) not send RM_ADDR
>> notifications.
>>
>> That statement is incorrect: without the lock, another process
>> could concurrently create a new subflow and trigger the RM_ADDR
>> generation.
>>
>> Additionally, the supposed optimization is not very effective even
>> performance-wise, as most MPTCP sockets have at least one subflow:
>> the MPC one.
> Thank you for having looked at this!
> 
> By chance, do you know how syzkaller managed to interact with an MPTCP
> socket not having any attached subflows? An msk being destroyed?
> 
> Just to know if there are specific cases we need to pay attention to in
> the future :)

(This question was obviously not blocking this patch, so I just applied it)

Now in our tree (fixes for -net):

New patches for t/upstream-net and t/upstream:
- c02a912d7eee: mptcp: always handle address removal under msk socket lock
- Results: aba580cd4fc8..eb57543e795e (export-net)
- Results: 3ee8e45b1642..98f1e7131283 (export)

Tests are now in progress:

- export-net:
https://github.com/multipath-tcp/mptcp_net-next/commit/0098419fb05cdbc73ffd9855dc2db613f79366ea/checks
- export:
https://github.com/multipath-tcp/mptcp_net-next/commit/d29ae41108d2ac0ae39cd5fe27da1898e826833f/checks

Cheers,
Matt
Paolo Abeni Feb. 20, 2025, 9:33 a.m. UTC | #4
On 2/19/25 6:45 PM, Matthieu Baerts wrote:
> Hi Paolo,
> 
> On 19/02/2025 13:45, Paolo Abeni wrote:
>> Syzkaller reported a lockdep splat in the PM control path:
> 
> (...)
> 
>> Indeed the PM can try to send an RM_ADDR over an msk without first
>> acquiring the msk socket lock.
>>
>> The buggy code path comes from an early optimization: when there
>> are no subflows, the PM should (usually) not send RM_ADDR
>> notifications.
>>
>> That statement is incorrect: without the lock, another process
>> could concurrently create a new subflow and trigger the RM_ADDR
>> generation.
>>
>> Additionally, the supposed optimization is not very effective even
>> performance-wise, as most MPTCP sockets have at least one subflow:
>> the MPC one.
> Thank you for having looked at this!
> 
> By chance, do you know how syzkaller managed to interact with an MPTCP
> socket not having any attached subflows? An msk being destroyed?

I'm not 110% sure and I don't have the syzkaller log for the relevant
splat handy, but e.g. a passive socket at clone time does:

	list_add(&subflow->node, &msk->conn_list); // [1]
	sock_hold(ssk);
	mptcp_token_accept(subflow_req, msk); // [2]

under the msk socket spin lock.

After [2], the PM ops can find and operate on the newly accepted msk
socket but, lacking any lock protection and memory barriers, could
still observe an empty conn_list, even after the prior [1].

The race window is tiny, thus it is hard for syzkaller to find a repro.

/P
Matthieu Baerts Feb. 20, 2025, 10:40 a.m. UTC | #5
Hi Paolo,

On 20/02/2025 10:33, Paolo Abeni wrote:
> On 2/19/25 6:45 PM, Matthieu Baerts wrote:
>> Hi Paolo,
>>
>> On 19/02/2025 13:45, Paolo Abeni wrote:
>>> Syzkaller reported a lockdep splat in the PM control path:
>>
>> (...)
>>
>>> Indeed the PM can try to send an RM_ADDR over an msk without first
>>> acquiring the msk socket lock.
>>>
>>> The buggy code path comes from an early optimization: when there
>>> are no subflows, the PM should (usually) not send RM_ADDR
>>> notifications.
>>>
>>> That statement is incorrect: without the lock, another process
>>> could concurrently create a new subflow and trigger the RM_ADDR
>>> generation.
>>>
>>> Additionally, the supposed optimization is not very effective even
>>> performance-wise, as most MPTCP sockets have at least one subflow:
>>> the MPC one.
>> Thank you for having looked at this!
>>
>> By chance, do you know how syzkaller managed to interact with an MPTCP
>> socket not having any attached subflows? An msk being destroyed?
> 
> I'm not 110% sure and I don't have the syzkaller log for the relevant
> splat handy, but e.g. a passive socket at clone time does:
> 
> 	list_add(&subflow->node, &msk->conn_list); // [1]
> 	sock_hold(ssk);
> 	mptcp_token_accept(subflow_req, msk); // [2]
> 
> under the msk socket spin lock.
> 
> After [2], the PM ops can find and operate on the newly accepted msk
> socket but, lacking any lock protection and memory barriers, could
> still observe an empty conn_list, even after the prior [1].
> 
> The race window is tiny, thus it is hard for syzkaller to find a repro.

I see, thank you for your reply and for having checked!

I will try to keep this in mind when looking at future potential issues.
(And not always assuming there is at least one subflow in the list.)

Cheers,
Matt
Paolo Abeni Feb. 20, 2025, 6:03 p.m. UTC | #6
On 2/20/25 11:40 AM, Matthieu Baerts wrote:
> On 20/02/2025 10:33, Paolo Abeni wrote:
>> I'm not 110% sure and I don't have the syzkaller log for the relevant
>> splat handy, but e.g. a passive socket at clone time does:
>>
>> 	list_add(&subflow->node, &msk->conn_list); // [1]
>> 	sock_hold(ssk);
>> 	mptcp_token_accept(subflow_req, msk); // [2]
>>
>> under the msk socket spin lock.
>>
>> After [2], the PM ops can find and operate on the newly accepted msk
>> socket but, lacking any lock protection and memory barriers, could
>> still observe an empty conn_list, even after the prior [1].
>>
>> The race window is tiny, thus it is hard for syzkaller to find a repro.
> 
> I see, thank you for your reply and for having checked!
> 
> I will try to keep this in mind when looking at future potential issues.
> (And not always assuming there is at least one subflow in the list.)

To clarify: I think such an assumption is true under the msk socket lock,
but it can't be asserted without the lock.

/P
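
To make the ordering concern above concrete, here is a hedged sketch
contrasting the writer and reader sides. It is simplified from Paolo's
snippet and from the patched code below; it is not verbatim kernel code:

    /* Writer: passive-socket clone path, run under the msk socket
     * spin lock (as quoted above).
     */
    list_add(&subflow->node, &msk->conn_list);  /* [1] link the subflow */
    sock_hold(ssk);
    mptcp_token_accept(subflow_req, msk);       /* [2] publish msk via token */

    /* Reader before the fix: the msk is reachable via the token once
     * [2] has run, but without the msk socket lock nothing orders this
     * read against [1], so conn_list may still appear empty here.
     */
    if (list_empty(&msk->conn_list))
            mptcp_pm_remove_anno_addr(msk, addr, false);  /* racy RM_ADDR */

    /* Reader after the fix: lock_sock() provides both mutual exclusion
     * and the implied memory barriers, so a subflow linked at [1] is
     * reliably observed.
     */
    lock_sock(sk);
    remove_subflow = mptcp_lookup_subflow_by_saddr(&msk->conn_list, addr);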

Patch

diff --git a/net/mptcp/pm_netlink.c b/net/mptcp/pm_netlink.c
index 572d160edca3..c0e47f4f7b1a 100644
--- a/net/mptcp/pm_netlink.c
+++ b/net/mptcp/pm_netlink.c
@@ -1514,11 +1514,6 @@ static int mptcp_nl_remove_subflow_and_signal_addr(struct net *net,
 		if (mptcp_pm_is_userspace(msk))
 			goto next;
 
-		if (list_empty(&msk->conn_list)) {
-			mptcp_pm_remove_anno_addr(msk, addr, false);
-			goto next;
-		}
-
 		lock_sock(sk);
 		remove_subflow = mptcp_lookup_subflow_by_saddr(&msk->conn_list, addr);
 		mptcp_pm_remove_anno_addr(msk, addr, remove_subflow &&