Message ID | 20250319022733.51444-1-yangang@kylinos.cn (mailing list archive) |
---|---|
State | Accepted, archived |
Commit | 40ad50026099f08be42da50061c6c5f5a6afa4e7 |
Delegated to: | Matthieu Baerts |
Headers | show |
Series | [mptcp-net,v2] mptcp: fix a NULL pointer result in panic | expand |
Context | Check | Description |
---|---|---|
matttbe/build | success | Build and static analysis OK |
matttbe/checkpatch | warning | total: 0 errors, 3 warnings, 0 checks, 34 lines checked |
matttbe/shellcheck | success | MPTCP selftests files have not been modified |
matttbe/KVM_Validation__normal | success | Success! ✅ |
matttbe/KVM_Validation__debug | success | Success! ✅ |
matttbe/KVM_Validation__btf-normal__only_bpftest_all_ | success | Success! ✅ |
matttbe/KVM_Validation__btf-debug__only_bpftest_all_ | success | Success! ✅ |
Hi Gang, Thank you for your modifications, that's great! Our CI did some validations and here is its report: - KVM Validation: normal: Success! ✅ - KVM Validation: debug: Success! ✅ - KVM Validation: btf-normal (only bpftest_all): Success! ✅ - KVM Validation: btf-debug (only bpftest_all): Success! ✅ - Task: https://github.com/multipath-tcp/mptcp_net-next/actions/runs/13937733185 Initiator: Patchew Applier Commits: https://github.com/multipath-tcp/mptcp_net-next/commits/904006e09ceb Patchwork: https://patchwork.kernel.org/project/mptcp/list/?series=945398 If there are some issues, you can reproduce them using the same environment as the one used by the CI thanks to a docker image, e.g.: $ cd [kernel source code] $ docker run -v "${PWD}:${PWD}:rw" -w "${PWD}" --privileged --rm -it \ --pull always mptcp/mptcp-upstream-virtme-docker:latest \ auto-normal For more details: https://github.com/multipath-tcp/mptcp-upstream-virtme-docker Please note that despite all the efforts that have been already done to have a stable tests suite when executed on a public CI like here, it is possible some reported issues are not due to your modifications. Still, do not hesitate to help us improve that ;-) Cheers, MPTCP GH Action bot Bot operated by Matthieu Baerts (NGI0 Core)
On 3/19/25 3:27 AM, Gang Yan wrote: > When test valkey in mptcp, the kernel will panic due to the function > 'mptcp_can_accept_new_subflow' when the subflow_req->msk is NULL. > > This problem shows in kernel 6.11.0-rc3 and 6.14.0-rc4. The message > attached is valkey_test in 6.14.0-rc4: > > Call trace: > mptcp_can_accept_new_subflow (./net/mptcp/subflow.c:63 (discriminator 4)) (P) > subflow_syn_recv_sock (./net/mptcp/subflow.c:854) > tcp_check_req (./net/ipv4/tcp_minisocks.c:863) > tcp_v4_rcv (./net/ipv4/tcp_ipv4.c:2268) > ip_protocol_deliver_rcu (./net/ipv4/ip_input.c:207) > ip_local_deliver_finish (./net/ipv4/ip_input.c:234) > ip_local_deliver (./net/ipv4/ip_input.c:254) > ip_rcv_finish (./net/ipv4/ip_input.c:449) > ... > > According to the debug log, the same req received two SYN-ACK in a very > short time. (When the client retransmits the syn ack dueto multiple reasons. > Even if the packets are transmitted with a relevant time interval, they can be > processed by the server on different CPUs concurrently). The 'subflow_req->msk' > ownership is transferred to subflow in the first, and there will be a risk of a > null pointer dereference here. > > This patch can fix this issue by moving the 'subflow_req->msk' under the > `own_req == true` conditional. > > Suggested-by: Paolo Abeni <pabeni@redhat.com> > Signed-off-by: Gang Yan <yangang@kylinos.cn> The code LGTM. Possibly the commit message could be expanded noting that the !msk check in subflow_hmac_valid() could be dropped, because there is already the same check under the own_req mpj branch. Also this needs a Fixes tag: Fixes: 9466a1ccebbe ("mptcp: enable JOIN requests even if cookies are in use") Matt: could you please adjust the above while applying the patch? Thanks! Paolo
Hi Paolo, On 19/03/2025 08:38, Paolo Abeni wrote: > On 3/19/25 3:27 AM, Gang Yan wrote: >> When test valkey in mptcp, the kernel will panic due to the function >> 'mptcp_can_accept_new_subflow' when the subflow_req->msk is NULL. >> >> This problem shows in kernel 6.11.0-rc3 and 6.14.0-rc4. The message >> attached is valkey_test in 6.14.0-rc4: >> >> Call trace: >> mptcp_can_accept_new_subflow (./net/mptcp/subflow.c:63 (discriminator 4)) (P) >> subflow_syn_recv_sock (./net/mptcp/subflow.c:854) >> tcp_check_req (./net/ipv4/tcp_minisocks.c:863) >> tcp_v4_rcv (./net/ipv4/tcp_ipv4.c:2268) >> ip_protocol_deliver_rcu (./net/ipv4/ip_input.c:207) >> ip_local_deliver_finish (./net/ipv4/ip_input.c:234) >> ip_local_deliver (./net/ipv4/ip_input.c:254) >> ip_rcv_finish (./net/ipv4/ip_input.c:449) >> ... >> >> According to the debug log, the same req received two SYN-ACK in a very >> short time. (When the client retransmits the syn ack dueto multiple reasons. >> Even if the packets are transmitted with a relevant time interval, they can be >> processed by the server on different CPUs concurrently). The 'subflow_req->msk' >> ownership is transferred to subflow in the first, and there will be a risk of a >> null pointer dereference here. >> >> This patch can fix this issue by moving the 'subflow_req->msk' under the >> `own_req == true` conditional. >> >> Suggested-by: Paolo Abeni <pabeni@redhat.com> >> Signed-off-by: Gang Yan <yangang@kylinos.cn> > > The code LGTM. Me too, thank you both for having looked at that! Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> > Possibly the commit message could be expanded noting that the !msk check > in subflow_hmac_valid() could be dropped, because there is already the > same check under the own_req mpj branch. > > Also this needs a Fixes tag: > > Fixes: 9466a1ccebbe ("mptcp: enable JOIN requests even if cookies are in > use") > > Matt: could you please adjust the above while applying the patch? Sure, I will do that! Cheers, Matt
Hi Paolo, Gang, On 19/03/2025 18:47, Matthieu Baerts wrote: > Hi Paolo, > > On 19/03/2025 08:38, Paolo Abeni wrote: >> On 3/19/25 3:27 AM, Gang Yan wrote: >>> When test valkey in mptcp, the kernel will panic due to the function >>> 'mptcp_can_accept_new_subflow' when the subflow_req->msk is NULL. >>> >>> This problem shows in kernel 6.11.0-rc3 and 6.14.0-rc4. The message >>> attached is valkey_test in 6.14.0-rc4: >>> >>> Call trace: >>> mptcp_can_accept_new_subflow (./net/mptcp/subflow.c:63 (discriminator 4)) (P) >>> subflow_syn_recv_sock (./net/mptcp/subflow.c:854) >>> tcp_check_req (./net/ipv4/tcp_minisocks.c:863) >>> tcp_v4_rcv (./net/ipv4/tcp_ipv4.c:2268) >>> ip_protocol_deliver_rcu (./net/ipv4/ip_input.c:207) >>> ip_local_deliver_finish (./net/ipv4/ip_input.c:234) >>> ip_local_deliver (./net/ipv4/ip_input.c:254) >>> ip_rcv_finish (./net/ipv4/ip_input.c:449) >>> ... >>> >>> According to the debug log, the same req received two SYN-ACK in a very >>> short time. (When the client retransmits the syn ack dueto multiple reasons. >>> Even if the packets are transmitted with a relevant time interval, they can be >>> processed by the server on different CPUs concurrently). The 'subflow_req->msk' >>> ownership is transferred to subflow in the first, and there will be a risk of a >>> null pointer dereference here. >>> >>> This patch can fix this issue by moving the 'subflow_req->msk' under the >>> `own_req == true` conditional. >>> >>> Suggested-by: Paolo Abeni <pabeni@redhat.com> >>> Signed-off-by: Gang Yan <yangang@kylinos.cn> >> >> The code LGTM. > > Me too, thank you both for having looked at that! Now in our tree (fix for net): New patches for t/upstream-net and t/upstream: - 40ad50026099: mptcp: fix NULL pointer in can_accept_new_subflow - Results: 15fd746f34c5..dba54e66e69e (export-net) - Results: 2bea3081157b..332a74c2a7d4 (export) Tests are now in progress: - export-net: https://github.com/multipath-tcp/mptcp_net-next/commit/06f515aab4fa9d84ba4dfe81b17aa7d77ea22b4d/checks - export: https://github.com/multipath-tcp/mptcp_net-next/commit/83da8f611c90d09ade3f1fc8b04302d272c8f7b9/checks Cheers, Matt
diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c index efe8d86496db..409bd415ef1d 100644 --- a/net/mptcp/subflow.c +++ b/net/mptcp/subflow.c @@ -754,8 +754,6 @@ static bool subflow_hmac_valid(const struct request_sock *req, subflow_req = mptcp_subflow_rsk(req); msk = subflow_req->msk; - if (!msk) - return false; subflow_generate_hmac(READ_ONCE(msk->remote_key), READ_ONCE(msk->local_key), @@ -850,12 +848,8 @@ static struct sock *subflow_syn_recv_sock(const struct sock *sk, } else if (subflow_req->mp_join) { mptcp_get_options(skb, &mp_opt); - if (!(mp_opt.suboptions & OPTION_MPTCP_MPJ_ACK) || - !subflow_hmac_valid(req, &mp_opt) || - !mptcp_can_accept_new_subflow(subflow_req->msk)) { - SUBFLOW_REQ_INC_STATS(req, MPTCP_MIB_JOINACKMAC); + if (!(mp_opt.suboptions & OPTION_MPTCP_MPJ_ACK)) fallback = true; - } } create_child: @@ -905,6 +899,13 @@ static struct sock *subflow_syn_recv_sock(const struct sock *sk, goto dispose_child; } + if (!subflow_hmac_valid(req, &mp_opt) || + !mptcp_can_accept_new_subflow(subflow_req->msk)) { + SUBFLOW_REQ_INC_STATS(req, MPTCP_MIB_JOINACKMAC); + subflow_add_reset_reason(skb, MPTCP_RST_EPROHIBIT); + goto dispose_child; + } + /* move the msk reference ownership to the subflow */ subflow_req->msk = NULL; ctx->conn = (struct sock *)owner;
When test valkey in mptcp, the kernel will panic due to the function 'mptcp_can_accept_new_subflow' when the subflow_req->msk is NULL. This problem shows in kernel 6.11.0-rc3 and 6.14.0-rc4. The message attached is valkey_test in 6.14.0-rc4: Call trace: mptcp_can_accept_new_subflow (./net/mptcp/subflow.c:63 (discriminator 4)) (P) subflow_syn_recv_sock (./net/mptcp/subflow.c:854) tcp_check_req (./net/ipv4/tcp_minisocks.c:863) tcp_v4_rcv (./net/ipv4/tcp_ipv4.c:2268) ip_protocol_deliver_rcu (./net/ipv4/ip_input.c:207) ip_local_deliver_finish (./net/ipv4/ip_input.c:234) ip_local_deliver (./net/ipv4/ip_input.c:254) ip_rcv_finish (./net/ipv4/ip_input.c:449) ... According to the debug log, the same req received two SYN-ACK in a very short time. (When the client retransmits the syn ack dueto multiple reasons. Even if the packets are transmitted with a relevant time interval, they can be processed by the server on different CPUs concurrently). The 'subflow_req->msk' ownership is transferred to subflow in the first, and there will be a risk of a null pointer dereference here. This patch can fix this issue by moving the 'subflow_req->msk' under the `own_req == true` conditional. Suggested-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Gang Yan <yangang@kylinos.cn> --- Notes: - v2: - The MPTCP and the TCP code protect vs such race using the `own_req` boolean. So moving the subflow_req->msk under the 'own_req'. --- net/mptcp/subflow.c | 15 ++++++++------- 1 file changed, 8 insertions(+), 7 deletions(-)