From patchwork Thu May 27 23:31:37 2021
X-Patchwork-Id: 12285661
From: Mat Martineau
To: netdev@vger.kernel.org
Cc: Paolo Abeni, davem@davemloft.net, kuba@kernel.org, matthieu.baerts@tessares.net, mptcp@lists.linux.dev, Mat Martineau
Subject: [PATCH net 1/4] mptcp: fix sk_forward_memory corruption on retransmission
Date: Thu, 27 May 2021 16:31:37 -0700
Message-Id: <20210527233140.182728-2-mathew.j.martineau@linux.intel.com>
In-Reply-To: <20210527233140.182728-1-mathew.j.martineau@linux.intel.com>
References: <20210527233140.182728-1-mathew.j.martineau@linux.intel.com>

From: Paolo Abeni

MPTCP sk_forward_memory handling is a bit special: the field is
protected by the msk socket spin_lock instead of the plain socket lock.

Currently we have a code path updating that field without holding the
relevant lock:

__mptcp_retrans() -> __mptcp_clean_una_wakeup()

Several helpers in __mptcp_clean_una_wakeup() will update
sk_forward_alloc, possibly corrupting that field, as reported by
Matthieu.

Address the issue by providing and using a new variant of the blamed
function which explicitly acquires the msk spin lock.
Fixes: 64b9cea7a0af ("mptcp: fix spurious retransmissions")
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/172
Reported-by: Matthieu Baerts
Tested-by: Matthieu Baerts
Signed-off-by: Paolo Abeni
Signed-off-by: Mat Martineau
---
 net/mptcp/protocol.c | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 2bc199549a88..5edc686faff1 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -947,6 +947,10 @@ static void __mptcp_update_wmem(struct sock *sk)
 {
 	struct mptcp_sock *msk = mptcp_sk(sk);
 
+#ifdef CONFIG_LOCKDEP
+	WARN_ON_ONCE(!lockdep_is_held(&sk->sk_lock.slock));
+#endif
+
 	if (!msk->wmem_reserved)
 		return;
 
@@ -1085,10 +1089,20 @@ static void __mptcp_clean_una(struct sock *sk)
 
 static void __mptcp_clean_una_wakeup(struct sock *sk)
 {
+#ifdef CONFIG_LOCKDEP
+	WARN_ON_ONCE(!lockdep_is_held(&sk->sk_lock.slock));
+#endif
 	__mptcp_clean_una(sk);
 	mptcp_write_space(sk);
 }
 
+static void mptcp_clean_una_wakeup(struct sock *sk)
+{
+	mptcp_data_lock(sk);
+	__mptcp_clean_una_wakeup(sk);
+	mptcp_data_unlock(sk);
+}
+
 static void mptcp_enter_memory_pressure(struct sock *sk)
 {
 	struct mptcp_subflow_context *subflow;
@@ -2299,7 +2313,7 @@ static void __mptcp_retrans(struct sock *sk)
 	struct sock *ssk;
 	int ret;
 
-	__mptcp_clean_una_wakeup(sk);
+	mptcp_clean_una_wakeup(sk);
 	dfrag = mptcp_rtx_head(sk);
 	if (!dfrag) {
 		if (mptcp_data_fin_enabled(msk)) {
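For reference, a condensed sketch of the shape of the fix above; this is not
part of the patch, just a restatement using the names from the hunks. At the
time of this series mptcp_data_lock() is a spin_lock_bh() on
sk->sk_lock.slock, so the new wrapper serializes the sk_forward_alloc updates
done on the retransmission path against every other updater of that field:

	/* sketch only -- mirrors the helper added by the patch above */
	static void mptcp_clean_una_wakeup(struct sock *sk)
	{
		mptcp_data_lock(sk);		/* spin_lock_bh(&sk->sk_lock.slock) */
		__mptcp_clean_una_wakeup(sk);	/* may touch sk->sk_forward_alloc */
		mptcp_data_unlock(sk);
	}

The lockdep-guarded WARN_ON_ONCE() checks added to the __-prefixed helpers make
any future unlocked caller show up as a one-off warning instead of silent
memory-accounting corruption.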
From patchwork Thu May 27 23:31:38 2021
X-Patchwork-Id: 12285663
From: Mat Martineau
To: netdev@vger.kernel.org
Cc: Paolo Abeni, davem@davemloft.net, kuba@kernel.org, matthieu.baerts@tessares.net, mptcp@lists.linux.dev, Mat Martineau
Subject: [PATCH net 2/4] mptcp: always parse mptcp options for MPC reqsk
Date: Thu, 27 May 2021 16:31:38 -0700
Message-Id: <20210527233140.182728-3-mathew.j.martineau@linux.intel.com>
In-Reply-To: <20210527233140.182728-1-mathew.j.martineau@linux.intel.com>
References: <20210527233140.182728-1-mathew.j.martineau@linux.intel.com>

From: Paolo Abeni

In subflow_syn_recv_sock() we currently skip options parsing for OoO
packets, given that such packets may not carry the relevant MPC option.

If the peer generates an MPC+data TSO packet and some of the early
segments are lost or reordered, the server will ignore the peer key,
causing a transient, unexpected fallback to TCP.

The solution is to always parse the incoming MPTCP options, and to do
the fallback only for in-order packets. This actually cleans up the
existing code a bit.

Fixes: d22f4988ffec ("mptcp: process MP_CAPABLE data option")
Reported-by: Matthieu Baerts
Signed-off-by: Paolo Abeni
Signed-off-by: Mat Martineau
---
 net/mptcp/subflow.c | 17 ++++++++---------
 1 file changed, 8 insertions(+), 9 deletions(-)

diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
index bde6be77ea73..c6ee81149829 100644
--- a/net/mptcp/subflow.c
+++ b/net/mptcp/subflow.c
@@ -630,21 +630,20 @@ static struct sock *subflow_syn_recv_sock(const struct sock *sk,
 
 	/* if the sk is MP_CAPABLE, we try to fetch the client key */
 	if (subflow_req->mp_capable) {
-		if (TCP_SKB_CB(skb)->seq != subflow_req->ssn_offset + 1) {
-			/* here we can receive and accept an in-window,
-			 * out-of-order pkt, which will not carry the MP_CAPABLE
-			 * opt even on mptcp enabled paths
-			 */
-			goto create_msk;
-		}
-
+		/* we can receive and accept an in-window, out-of-order pkt,
+		 * which may not carry the MP_CAPABLE opt even on mptcp enabled
+		 * paths: always try to extract the peer key, and fallback
+		 * for packets missing it.
+		 * Even OoO DSS packets coming legitly after dropped or
+		 * reordered MPC will cause fallback, but we don't have other
+		 * options.
+		 */
 		mptcp_get_options(skb, &mp_opt);
 		if (!mp_opt.mp_capable) {
 			fallback = true;
 			goto create_child;
 		}
 
-create_msk:
 		new_msk = mptcp_sk_clone(listener->conn, &mp_opt, req);
 		if (!new_msk)
 			fallback = true;
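For reference, the resulting control flow in subflow_syn_recv_sock() -- not
part of the patch, just the post-patch logic condensed from the hunk above.
The in-order early exit is gone; the options are parsed for every packet and
the fallback decision depends only on whether the MP_CAPABLE option was
actually present:

	/* sketch only -- condensed from the diff above */
	if (subflow_req->mp_capable) {
		mptcp_get_options(skb, &mp_opt);	/* parse even OoO packets */
		if (!mp_opt.mp_capable) {
			fallback = true;		/* no peer key: plain TCP */
			goto create_child;
		}

		new_msk = mptcp_sk_clone(listener->conn, &mp_opt, req);
		if (!new_msk)
			fallback = true;
	}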
From patchwork Thu May 27 23:31:39 2021
X-Patchwork-Id: 12285665
From: Mat Martineau
To: netdev@vger.kernel.org
Cc: Paolo Abeni, davem@davemloft.net, kuba@kernel.org, matthieu.baerts@tessares.net, mptcp@lists.linux.dev, Mat Martineau
Subject: [PATCH net 3/4] mptcp: do not reset MP_CAPABLE subflow on mapping errors
Date: Thu, 27 May 2021 16:31:39 -0700
Message-Id: <20210527233140.182728-4-mathew.j.martineau@linux.intel.com>
In-Reply-To: <20210527233140.182728-1-mathew.j.martineau@linux.intel.com>
References: <20210527233140.182728-1-mathew.j.martineau@linux.intel.com>

From: Paolo Abeni

When some mapping-related error occurs, we close the main MPC subflow
with a RST. We should instead fall back gracefully to TCP, and do the
reset only for MPJ subflows.

Fixes: d22f4988ffec ("mptcp: process MP_CAPABLE data option")
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/192
Reported-by: Matthieu Baerts
Signed-off-by: Paolo Abeni
Signed-off-by: Mat Martineau
---
 net/mptcp/subflow.c | 62 +++++++++++++++++++++++----------------------
 1 file changed, 32 insertions(+), 30 deletions(-)

diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
index c6ee81149829..ef3d037f984a 100644
--- a/net/mptcp/subflow.c
+++ b/net/mptcp/subflow.c
@@ -1011,21 +1011,11 @@ static bool subflow_check_data_avail(struct sock *ssk)
 
 		status = get_mapping_status(ssk, msk);
 		trace_subflow_check_data_avail(status, skb_peek(&ssk->sk_receive_queue));
-		if (status == MAPPING_INVALID) {
-			ssk->sk_err = EBADMSG;
-			goto fatal;
-		}
-		if (status == MAPPING_DUMMY) {
-			__mptcp_do_fallback(msk);
-			skb = skb_peek(&ssk->sk_receive_queue);
-			subflow->map_valid = 1;
-			subflow->map_seq = READ_ONCE(msk->ack_seq);
-			subflow->map_data_len = skb->len;
-			subflow->map_subflow_seq = tcp_sk(ssk)->copied_seq -
-						   subflow->ssn_offset;
-			subflow->data_avail = MPTCP_SUBFLOW_DATA_AVAIL;
-			return true;
-		}
+		if (unlikely(status == MAPPING_INVALID))
+			goto fallback;
+
+		if (unlikely(status == MAPPING_DUMMY))
+			goto fallback;
 
 		if (status != MAPPING_OK)
 			goto no_data;
@@ -1038,10 +1028,8 @@ static bool subflow_check_data_avail(struct sock *ssk)
 		 * MP_CAPABLE-based mapping
 		 */
 		if (unlikely(!READ_ONCE(msk->can_ack))) {
-			if (!subflow->mpc_map) {
-				ssk->sk_err = EBADMSG;
-				goto fatal;
-			}
+			if (!subflow->mpc_map)
+				goto fallback;
 			WRITE_ONCE(msk->remote_key, subflow->remote_key);
 			WRITE_ONCE(msk->ack_seq, subflow->map_seq);
 			WRITE_ONCE(msk->can_ack, true);
@@ -1069,17 +1057,31 @@ static bool subflow_check_data_avail(struct sock *ssk)
 no_data:
 	subflow_sched_work_if_closed(msk, ssk);
 	return false;
-fatal:
-	/* fatal protocol error, close the socket */
-	/* This barrier is coupled with smp_rmb() in tcp_poll() */
-	smp_wmb();
-	ssk->sk_error_report(ssk);
-	tcp_set_state(ssk, TCP_CLOSE);
-	subflow->reset_transient = 0;
-	subflow->reset_reason = MPTCP_RST_EMPTCP;
-	tcp_send_active_reset(ssk, GFP_ATOMIC);
-	subflow->data_avail = 0;
-	return false;
+
+fallback:
+	/* RFC 8684 section 3.7. */
+	if (subflow->mp_join || subflow->fully_established) {
+		/* fatal protocol error, close the socket.
+		 * subflow_error_report() will introduce the appropriate barriers
+		 */
+		ssk->sk_err = EBADMSG;
+		ssk->sk_error_report(ssk);
+		tcp_set_state(ssk, TCP_CLOSE);
+		subflow->reset_transient = 0;
+		subflow->reset_reason = MPTCP_RST_EMPTCP;
+		tcp_send_active_reset(ssk, GFP_ATOMIC);
+		subflow->data_avail = 0;
+		return false;
+	}
+
+	__mptcp_do_fallback(msk);
+	skb = skb_peek(&ssk->sk_receive_queue);
+	subflow->map_valid = 1;
+	subflow->map_seq = READ_ONCE(msk->ack_seq);
+	subflow->map_data_len = skb->len;
+	subflow->map_subflow_seq = tcp_sk(ssk)->copied_seq - subflow->ssn_offset;
+	subflow->data_avail = MPTCP_SUBFLOW_DATA_AVAIL;
+	return true;
 }
 
 bool mptcp_subflow_data_available(struct sock *sk)
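For reference -- not part of the patch -- the decision the new fallback label
implements, condensed from the hunks above. Per RFC 8684 section 3.7, a broken
mapping on an MP_JOIN subflow or on a fully established connection is still a
fatal protocol error and keeps the EBADMSG/reset treatment, while on the
initial subflow, before the connection is fully established, the socket now
falls back to plain TCP instead of being reset:

	/* sketch only -- reset and mapping bookkeeping as in the hunks above */
	fallback:
		if (subflow->mp_join || subflow->fully_established) {
			ssk->sk_err = EBADMSG;			/* fatal: reset the subflow */
			subflow->reset_reason = MPTCP_RST_EMPTCP;
			tcp_send_active_reset(ssk, GFP_ATOMIC);
			return false;
		}
		__mptcp_do_fallback(msk);			/* initial subflow: plain TCP */
		return true;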
From patchwork Thu May 27 23:31:40 2021
X-Patchwork-Id: 12285667
From: Mat Martineau
To: netdev@vger.kernel.org
Cc: Paolo Abeni, davem@davemloft.net, kuba@kernel.org, matthieu.baerts@tessares.net, mptcp@lists.linux.dev, Mat Martineau
Subject: [PATCH net 4/4] mptcp: update selftest for fallback due to OoO
Date: Thu, 27 May 2021 16:31:40 -0700
Message-Id: <20210527233140.182728-5-mathew.j.martineau@linux.intel.com>
In-Reply-To: <20210527233140.182728-1-mathew.j.martineau@linux.intel.com>
References: <20210527233140.182728-1-mathew.j.martineau@linux.intel.com>

From: Paolo Abeni

The previous commit noted that we can have a fallback scenario due to
OoO packets (or packet drops). Update the self-tests accordingly.

Signed-off-by: Paolo Abeni
Signed-off-by: Mat Martineau
---
 tools/testing/selftests/net/mptcp/mptcp_connect.sh | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/tools/testing/selftests/net/mptcp/mptcp_connect.sh b/tools/testing/selftests/net/mptcp/mptcp_connect.sh
index 3c4cb72ed8a4..9ca5f1ba461e 100755
--- a/tools/testing/selftests/net/mptcp/mptcp_connect.sh
+++ b/tools/testing/selftests/net/mptcp/mptcp_connect.sh
@@ -501,6 +501,7 @@ do_transfer()
 	local stat_ackrx_now_l=$(get_mib_counter "${listener_ns}" "MPTcpExtMPCapableACKRX")
 	local stat_cookietx_now=$(get_mib_counter "${listener_ns}" "TcpExtSyncookiesSent")
 	local stat_cookierx_now=$(get_mib_counter "${listener_ns}" "TcpExtSyncookiesRecv")
+	local stat_ooo_now=$(get_mib_counter "${listener_ns}" "TcpExtTCPOFOQueue")
 
 	expect_synrx=$((stat_synrx_last_l))
 	expect_ackrx=$((stat_ackrx_last_l))
@@ -518,10 +519,14 @@ do_transfer()
 			"${stat_synrx_now_l}" "${expect_synrx}" 1>&2
 		retc=1
 	fi
-	if [ ${stat_ackrx_now_l} -lt ${expect_ackrx} ]; then
-		printf "[ FAIL ] lower MPC ACK rx (%d) than expected (%d)\n" \
-			"${stat_ackrx_now_l}" "${expect_ackrx}" 1>&2
-		rets=1
+	if [ ${stat_ackrx_now_l} -lt ${expect_ackrx} -a ${stat_ooo_now} -eq 0 ]; then
+		if [ ${stat_ooo_now} -eq 0 ]; then
+			printf "[ FAIL ] lower MPC ACK rx (%d) than expected (%d)\n" \
+				"${stat_ackrx_now_l}" "${expect_ackrx}" 1>&2
+			rets=1
+		else
+			printf "[ Note ] fallback due to TCP OoO"
+		fi
 	fi
 
 	if [ $retc -eq 0 ] && [ $rets -eq 0 ]; then
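To exercise the updated script, the usual kselftest invocation applies (shown
here as a reminder, not as part of the series); the script needs root because
it builds its own network namespaces:

	# run the whole MPTCP selftest group through the kselftest harness
	make -C tools/testing/selftests TARGETS=net/mptcp run_tests

	# ...or run just this script from its own directory
	cd tools/testing/selftests/net/mptcp
	sudo ./mptcp_connect.sh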