From patchwork Wed May 4 21:54:04 2022
From: Mat Martineau
To: netdev@vger.kernel.org
Cc: Paolo Abeni, davem@davemloft.net, kuba@kernel.org, edumazet@google.com,
    matthieu.baerts@tessares.net, mptcp@lists.linux.dev, Mat Martineau
Subject: [PATCH net-next 1/5] mptcp: really share subflow snd_wnd
Date: Wed, 4 May 2022 14:54:04 -0700
Message-Id: <20220504215408.349318-2-mathew.j.martineau@linux.intel.com>
In-Reply-To: <20220504215408.349318-1-mathew.j.martineau@linux.intel.com>

From: Paolo Abeni

As per the RFC, MPTCP subflows use a "shared" snd_wnd: the effective
window is the maximum among the current values received on all the
subflows. Without this feature, a data transfer using multiple subflows
could block.

Window sharing is currently implemented on the RX side:
__tcp_select_window uses the MPTCP-level receive buffer to compute the
announced window.

That is not enough: the TCP stack will stick to the window size received
on the given subflow; we need to propagate the msk window value to each
subflow at xmit time.
Change the packet scheduler to ignore the subflow-level window and use
the msk-level one instead.

Signed-off-by: Paolo Abeni
Signed-off-by: Mat Martineau
---
 net/mptcp/protocol.c | 24 +++++++++++++++---------
 1 file changed, 15 insertions(+), 9 deletions(-)

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 52ed2c0ac901..97a375eabd34 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -1141,19 +1141,20 @@ struct mptcp_sendmsg_info {
 	bool data_lock_held;
 };
 
-static int mptcp_check_allowed_size(struct mptcp_sock *msk, u64 data_seq,
-				    int avail_size)
+static int mptcp_check_allowed_size(const struct mptcp_sock *msk, struct sock *ssk,
+				    u64 data_seq, int avail_size)
 {
 	u64 window_end = mptcp_wnd_end(msk);
+	u64 mptcp_snd_wnd;
 
 	if (__mptcp_check_fallback(msk))
 		return avail_size;
 
-	if (!before64(data_seq + avail_size, window_end)) {
-		u64 allowed_size = window_end - data_seq;
+	mptcp_snd_wnd = window_end - data_seq;
+	avail_size = min_t(unsigned int, mptcp_snd_wnd, avail_size);
 
-		return min_t(unsigned int, allowed_size, avail_size);
-	}
+	if (unlikely(tcp_sk(ssk)->snd_wnd < mptcp_snd_wnd))
+		tcp_sk(ssk)->snd_wnd = min_t(u64, U32_MAX, mptcp_snd_wnd);
 
 	return avail_size;
 }
@@ -1305,7 +1306,7 @@ static int mptcp_sendmsg_frag(struct sock *sk, struct sock *ssk,
 	}
 
 	/* Zero window and all data acked? Probe. */
-	copy = mptcp_check_allowed_size(msk, data_seq, copy);
+	copy = mptcp_check_allowed_size(msk, ssk, data_seq, copy);
 	if (copy == 0) {
 		u64 snd_una = READ_ONCE(msk->snd_una);
 
@@ -1498,11 +1499,16 @@ static struct sock *mptcp_subflow_get_send(struct mptcp_sock *msk)
 	 * to check that subflow has a non empty cwin.
 	 */
 	ssk = send_info[SSK_MODE_ACTIVE].ssk;
-	if (!ssk || !sk_stream_memory_free(ssk) || !tcp_sk(ssk)->snd_wnd)
+	if (!ssk || !sk_stream_memory_free(ssk))
 		return NULL;
 
-	burst = min_t(int, MPTCP_SEND_BURST_SIZE, tcp_sk(ssk)->snd_wnd);
+	burst = min_t(int, MPTCP_SEND_BURST_SIZE, mptcp_wnd_end(msk) - msk->snd_nxt);
 	wmem = READ_ONCE(ssk->sk_wmem_queued);
+	if (!burst) {
+		msk->last_snd = NULL;
+		return ssk;
+	}
+
 	subflow = mptcp_subflow_ctx(ssk);
 	subflow->avg_pacing_rate = div_u64((u64)subflow->avg_pacing_rate * wmem +
 					   READ_ONCE(ssk->sk_pacing_rate) * burst,
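[A minimal standalone C sketch of the two rules the patch applies at xmit
time: clamp what a subflow may send to the MPTCP-level window, and raise a
smaller subflow-level snd_wnd up to the msk-level value so the TCP stack no
longer sticks to a stale per-subflow window. This is not kernel code;
check_allowed_size and fake_subflow are illustrative stand-ins for
mptcp_check_allowed_size() and the subflow's tcp_sock.]

#include <assert.h>
#include <stdint.h>

struct fake_subflow {
	uint32_t tcp_snd_wnd;	/* stand-in for tcp_sk(ssk)->snd_wnd */
};

/* msk_wnd_end: MPTCP-level right edge (stand-in for mptcp_wnd_end(msk))
 * data_seq:    MPTCP-level sequence of the data about to be sent
 * avail_size:  bytes the caller would like to send
 */
static uint32_t check_allowed_size(uint64_t msk_wnd_end, uint64_t data_seq,
				   uint32_t avail_size,
				   struct fake_subflow *sf)
{
	uint64_t mptcp_snd_wnd = msk_wnd_end - data_seq;

	/* never send past the MPTCP-level right edge */
	if (mptcp_snd_wnd < avail_size)
		avail_size = (uint32_t)mptcp_snd_wnd;

	/* propagate a larger msk-level window to the subflow, capped at
	 * what TCP's 32-bit snd_wnd can express (cf. min_t(u64, U32_MAX, ...))
	 */
	if (sf->tcp_snd_wnd < mptcp_snd_wnd)
		sf->tcp_snd_wnd = mptcp_snd_wnd > UINT32_MAX ?
				  UINT32_MAX : (uint32_t)mptcp_snd_wnd;

	return avail_size;
}

int main(void)
{
	struct fake_subflow sf = { .tcp_snd_wnd = 1000 };

	/* an msk window of 64 KiB admits the full 2000 bytes and raises
	 * the subflow's stale 1000-byte window to the msk-level value
	 */
	assert(check_allowed_size(65536, 0, 2000, &sf) == 2000);
	assert(sf.tcp_snd_wnd == 65536);
	return 0;
}

[The U32_MAX cap exists because the msk-level window lives in the 64-bit
extended sequence space while TCP's snd_wnd is a 32-bit field.]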
From patchwork Wed May 4 21:54:05 2022
From: Mat Martineau
To: netdev@vger.kernel.org
Cc: Paolo Abeni, davem@davemloft.net, kuba@kernel.org, edumazet@google.com,
    matthieu.baerts@tessares.net, mptcp@lists.linux.dev, Mat Martineau
Subject: [PATCH net-next 2/5] mptcp: add mib for xmit window sharing
Date: Wed, 4 May 2022 14:54:05 -0700
Message-Id: <20220504215408.349318-3-mathew.j.martineau@linux.intel.com>
In-Reply-To: <20220504215408.349318-1-mathew.j.martineau@linux.intel.com>

From: Paolo Abeni

Bump a counter when snd_wnd is shared among subflows, for
observability's sake.

Signed-off-by: Paolo Abeni
Signed-off-by: Mat Martineau
---
 net/mptcp/mib.c      | 1 +
 net/mptcp/mib.h      | 1 +
 net/mptcp/protocol.c | 4 +++-
 3 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/net/mptcp/mib.c b/net/mptcp/mib.c
index d93a8c9996fd..6a6f8151375a 100644
--- a/net/mptcp/mib.c
+++ b/net/mptcp/mib.c
@@ -56,6 +56,7 @@ static const struct snmp_mib mptcp_snmp_list[] = {
 	SNMP_MIB_ITEM("RcvPruned", MPTCP_MIB_RCVPRUNED),
 	SNMP_MIB_ITEM("SubflowStale", MPTCP_MIB_SUBFLOWSTALE),
 	SNMP_MIB_ITEM("SubflowRecover", MPTCP_MIB_SUBFLOWRECOVER),
+	SNMP_MIB_ITEM("SndWndShared", MPTCP_MIB_SNDWNDSHARED),
 	SNMP_MIB_SENTINEL
 };
 
diff --git a/net/mptcp/mib.h b/net/mptcp/mib.h
index 529d07af9e14..2411510bef66 100644
--- a/net/mptcp/mib.h
+++ b/net/mptcp/mib.h
@@ -49,6 +49,7 @@ enum linux_mptcp_mib_field {
 	MPTCP_MIB_RCVPRUNED,	/* Incoming packet dropped due to memory limit */
 	MPTCP_MIB_SUBFLOWSTALE,	/* Subflows entered 'stale' status */
 	MPTCP_MIB_SUBFLOWRECOVER, /* Subflows returned to active status after being stale */
+	MPTCP_MIB_SNDWNDSHARED,	/* Subflow snd wnd is overridden by msk's one */
 	__MPTCP_MIB_MAX
 };
 
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 97a375eabd34..6710960b74f3 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -1153,8 +1153,10 @@ static int mptcp_check_allowed_size(const struct mptcp_sock *msk, struct sock *s
 	mptcp_snd_wnd = window_end - data_seq;
 	avail_size = min_t(unsigned int, mptcp_snd_wnd, avail_size);
 
-	if (unlikely(tcp_sk(ssk)->snd_wnd < mptcp_snd_wnd))
+	if (unlikely(tcp_sk(ssk)->snd_wnd < mptcp_snd_wnd)) {
 		tcp_sk(ssk)->snd_wnd = min_t(u64, U32_MAX, mptcp_snd_wnd);
+		MPTCP_INC_STATS(sock_net(ssk), MPTCP_MIB_SNDWNDSHARED);
+	}
 
 	return avail_size;
 }
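[A small userspace sketch showing one way to observe the new counter. It
assumes the usual export of the MPTCP MIBs under the "MPTcpExt" group in
/proc/net/netstat, whose two-line header/value layout this program simply
echoes; tools such as nstat would report the counter with the group name
prefixed, e.g. MPTcpExtSndWndShared.]

#include <stdio.h>
#include <string.h>

int main(void)
{
	char line[4096];
	FILE *f = fopen("/proc/net/netstat", "r");

	if (!f) {
		perror("fopen");
		return 1;
	}
	/* print the MPTcpExt name line and the matching value line */
	while (fgets(line, sizeof(line), f)) {
		if (!strncmp(line, "MPTcpExt:", 9))
			fputs(line, stdout);
	}
	fclose(f);
	return 0;
}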
From patchwork Wed May 4 21:54:06 2022
From: Mat Martineau
To: netdev@vger.kernel.org
Cc: Paolo Abeni, davem@davemloft.net, kuba@kernel.org, edumazet@google.com,
    matthieu.baerts@tessares.net, mptcp@lists.linux.dev, Mat Martineau
Subject: [PATCH net-next 3/5] tcp: allow MPTCP to update the announced window
Date: Wed, 4 May 2022 14:54:06 -0700
Message-Id: <20220504215408.349318-4-mathew.j.martineau@linux.intel.com>
In-Reply-To: <20220504215408.349318-1-mathew.j.martineau@linux.intel.com>

From: Paolo Abeni

The MPTCP RFC requires that the MPTCP-level receive window's right edge
never moves backward. Currently the MPTCP code enforces this constraint
while tracking the right edge, but it does not reflect it on the wire,
as MPTCP lacks a suitable hook to update the TCP header accordingly.

This change modifies the existing mptcp_write_options() hook, providing
the current packet's TCP header to the MPTCP protocol, so that the next
patch can implement the above-mentioned constraint.

No functional changes intended.
Signed-off-by: Paolo Abeni
Signed-off-by: Mat Martineau
---
 include/net/mptcp.h   |  2 +-
 net/ipv4/tcp_output.c | 14 ++++++++------
 net/mptcp/options.c   |  2 +-
 3 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/include/net/mptcp.h b/include/net/mptcp.h
index 8b1afd6f5cc4..d4ec894ce67b 100644
--- a/include/net/mptcp.h
+++ b/include/net/mptcp.h
@@ -125,7 +125,7 @@ bool mptcp_established_options(struct sock *sk, struct sk_buff *skb,
 			       struct mptcp_out_options *opts);
 bool mptcp_incoming_options(struct sock *sk, struct sk_buff *skb);
 
-void mptcp_write_options(__be32 *ptr, const struct tcp_sock *tp,
+void mptcp_write_options(struct tcphdr *th, __be32 *ptr, struct tcp_sock *tp,
 			 struct mptcp_out_options *opts);
 
 void mptcp_diag_fill_info(struct mptcp_sock *msk, struct mptcp_info *info);
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 5f91a9536e00..b092228e4342 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -445,12 +445,13 @@ struct tcp_out_options {
 	struct mptcp_out_options mptcp;
 };
 
-static void mptcp_options_write(__be32 *ptr, const struct tcp_sock *tp,
+static void mptcp_options_write(struct tcphdr *th, __be32 *ptr,
+				struct tcp_sock *tp,
 				struct tcp_out_options *opts)
 {
 #if IS_ENABLED(CONFIG_MPTCP)
 	if (unlikely(OPTION_MPTCP & opts->options))
-		mptcp_write_options(ptr, tp, &opts->mptcp);
+		mptcp_write_options(th, ptr, tp, &opts->mptcp);
 #endif
 }
 
@@ -606,9 +607,10 @@ static void bpf_skops_write_hdr_opt(struct sock *sk, struct sk_buff *skb,
  * At least SACK_PERM as the first option is known to lead to a disaster
  * (but it may well be that other scenarios fail similarly).
  */
-static void tcp_options_write(__be32 *ptr, struct tcp_sock *tp,
+static void tcp_options_write(struct tcphdr *th, struct tcp_sock *tp,
 			      struct tcp_out_options *opts)
 {
+	__be32 *ptr = (__be32 *)(th + 1);
 	u16 options = opts->options;	/* mungable copy */
 
 	if (unlikely(OPTION_MD5 & options)) {
@@ -702,7 +704,7 @@ static void tcp_options_write(__be32 *ptr, struct tcp_sock *tp,
 
 	smc_options_write(ptr, &options);
 
-	mptcp_options_write(ptr, tp, opts);
+	mptcp_options_write(th, ptr, tp, opts);
 }
 
 static void smc_set_option(const struct tcp_sock *tp,
@@ -1355,7 +1357,7 @@ static int __tcp_transmit_skb(struct sock *sk, struct sk_buff *skb,
 		th->window	= htons(min(tp->rcv_wnd, 65535U));
 	}
 
-	tcp_options_write((__be32 *)(th + 1), tp, &opts);
+	tcp_options_write(th, tp, &opts);
 
 #ifdef CONFIG_TCP_MD5SIG
 	/* Calculate the MD5 hash, as we have all we need now */
@@ -3591,7 +3593,7 @@ struct sk_buff *tcp_make_synack(const struct sock *sk, struct dst_entry *dst,
 	/* RFC1323: The window in SYN & SYN/ACK segments is never scaled.
 	 */
 	th->window = htons(min(req->rsk_rcv_wnd, 65535U));
-	tcp_options_write((__be32 *)(th + 1), NULL, &opts);
+	tcp_options_write(th, NULL, &opts);
 	th->doff = (tcp_header_size >> 2);
 	__TCP_INC_STATS(sock_net(sk), TCP_MIB_OUTSEGS);
 
diff --git a/net/mptcp/options.c b/net/mptcp/options.c
index e05d9458a025..2570911735ab 100644
--- a/net/mptcp/options.c
+++ b/net/mptcp/options.c
@@ -1265,7 +1265,7 @@ static u16 mptcp_make_csum(const struct mptcp_ext *mpext)
 				 ~csum_unfold(mpext->csum));
 }
 
-void mptcp_write_options(__be32 *ptr, const struct tcp_sock *tp,
+void mptcp_write_options(struct tcphdr *th, __be32 *ptr, struct tcp_sock *tp,
 			 struct mptcp_out_options *opts)
 {
 	const struct sock *ssk = (const struct sock *)tp;
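[The refactor hinges on a simple invariant: the TCP option words always
start immediately after the fixed 20-byte header, so tcp_options_write()
can derive its write pointer from th instead of receiving it, and callers
can hand the header itself down to MPTCP. A standalone C sketch of that
pointer relationship; fake_tcphdr is an illustrative stand-in for struct
tcphdr, not the kernel definition.]

#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

struct fake_tcphdr {		/* illustrative 20-byte fixed header */
	uint16_t source, dest;
	uint32_t seq, ack_seq;
	uint16_t len_flags;
	uint16_t window;	/* the field MPTCP now needs to rewrite */
	uint16_t check, urg_ptr;
};

int main(void)
{
	alignas(uint32_t) unsigned char segment[60] = { 0 };
	struct fake_tcphdr *th = (struct fake_tcphdr *)segment;
	uint32_t *opts = (uint32_t *)(th + 1);	/* cf. (__be32 *)(th + 1) */

	/* the option area begins right after the fixed header, so the
	 * old ptr argument is always recoverable from th itself
	 */
	assert(sizeof(*th) == 20);
	assert((unsigned char *)opts == segment + sizeof(*th));
	return 0;
}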
From patchwork Wed May 4 21:54:07 2022
From: Mat Martineau
To: netdev@vger.kernel.org
Cc: Paolo Abeni, davem@davemloft.net, kuba@kernel.org, edumazet@google.com,
    matthieu.baerts@tessares.net, mptcp@lists.linux.dev, Mat Martineau
Subject: [PATCH net-next 4/5] mptcp: never shrink offered window
Date: Wed, 4 May 2022 14:54:07 -0700
Message-Id: <20220504215408.349318-5-mathew.j.martineau@linux.intel.com>
In-Reply-To: <20220504215408.349318-1-mathew.j.martineau@linux.intel.com>

From: Paolo Abeni

As per the RFC, the offered MPTCP-level window should never shrink.
While we currently track the right edge, we don't enforce the above
constraint on the wire. Additionally, concurrent xmit on different
subflows can end up in an erroneous right-edge update.

Address the above by explicitly updating the announced window and
protecting the update with an additional atomic operation (sic)

Signed-off-by: Paolo Abeni
Signed-off-by: Mat Martineau
---
 net/mptcp/options.c  | 52 ++++++++++++++++++++++++++++++++++++++------
 net/mptcp/protocol.c |  8 +++----
 net/mptcp/protocol.h |  2 +-
 3 files changed, 50 insertions(+), 12 deletions(-)

diff --git a/net/mptcp/options.c b/net/mptcp/options.c
index 2570911735ab..3e3156cfe813 100644
--- a/net/mptcp/options.c
+++ b/net/mptcp/options.c
@@ -1224,20 +1224,58 @@ bool mptcp_incoming_options(struct sock *sk, struct sk_buff *skb)
 	return true;
 }
 
-static void mptcp_set_rwin(const struct tcp_sock *tp)
+static void mptcp_set_rwin(struct tcp_sock *tp, struct tcphdr *th)
 {
 	const struct sock *ssk = (const struct sock *)tp;
-	const struct mptcp_subflow_context *subflow;
+	struct mptcp_subflow_context *subflow;
+	u64 ack_seq, rcv_wnd_old, rcv_wnd_new;
 	struct mptcp_sock *msk;
-	u64 ack_seq;
+	u32 new_win;
+	u64 win;
 
 	subflow = mptcp_subflow_ctx(ssk);
 	msk = mptcp_sk(subflow->conn);
 
-	ack_seq = READ_ONCE(msk->ack_seq) + tp->rcv_wnd;
+	ack_seq = READ_ONCE(msk->ack_seq);
+	rcv_wnd_new = ack_seq + tp->rcv_wnd;
+
+	rcv_wnd_old = atomic64_read(&msk->rcv_wnd_sent);
+	if (after64(rcv_wnd_new, rcv_wnd_old)) {
+		u64 rcv_wnd;
 
-	if (after64(ack_seq, READ_ONCE(msk->rcv_wnd_sent)))
-		WRITE_ONCE(msk->rcv_wnd_sent, ack_seq);
+		for (;;) {
+			rcv_wnd = atomic64_cmpxchg(&msk->rcv_wnd_sent, rcv_wnd_old, rcv_wnd_new);
+
+			if (rcv_wnd == rcv_wnd_old)
+				break;
+			if (before64(rcv_wnd_new, rcv_wnd))
+				goto raise_win;
+			rcv_wnd_old = rcv_wnd;
+		}
+		return;
+	}
+
+	if (rcv_wnd_new != rcv_wnd_old) {
+raise_win:
+		win = rcv_wnd_old - ack_seq;
+		tp->rcv_wnd = min_t(u64, win, U32_MAX);
+		new_win = tp->rcv_wnd;
+
+		/* Make sure we do not exceed the maximum possible
+		 * scaled window.
+		 */
+		if (unlikely(th->syn))
+			new_win = min(new_win, 65535U) << tp->rx_opt.rcv_wscale;
+		if (!tp->rx_opt.rcv_wscale &&
+		    sock_net(ssk)->ipv4.sysctl_tcp_workaround_signed_windows)
+			new_win = min(new_win, MAX_TCP_WINDOW);
+		else
+			new_win = min(new_win, (65535U << tp->rx_opt.rcv_wscale));
+
+		/* RFC1323 scaling applied */
+		new_win >>= tp->rx_opt.rcv_wscale;
+		th->window = htons(new_win);
+	}
 }
 
 u16 __mptcp_make_csum(u64 data_seq, u32 subflow_seq, u16 data_len, __wsum sum)
@@ -1554,7 +1592,7 @@ void mptcp_write_options(struct tcphdr *th, __be32 *ptr, struct tcp_sock *tp,
 	}
 
 	if (tp)
-		mptcp_set_rwin(tp);
+		mptcp_set_rwin(tp, th);
 }
 
 __be32 mptcp_get_reset_option(const struct sk_buff *skb)
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 6710960b74f3..7339448ecbee 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -216,7 +216,7 @@ static void mptcp_data_queue_ofo(struct mptcp_sock *msk, struct sk_buff *skb)
 
 	seq = MPTCP_SKB_CB(skb)->map_seq;
 	end_seq = MPTCP_SKB_CB(skb)->end_seq;
-	max_seq = READ_ONCE(msk->rcv_wnd_sent);
+	max_seq = atomic64_read(&msk->rcv_wnd_sent);
 
 	pr_debug("msk=%p seq=%llx limit=%llx empty=%d", msk, seq, max_seq,
 		 RB_EMPTY_ROOT(&msk->out_of_order_queue));
@@ -225,7 +225,7 @@ static void mptcp_data_queue_ofo(struct mptcp_sock *msk, struct sk_buff *skb)
 		mptcp_drop(sk, skb);
 		pr_debug("oow by %lld, rcv_wnd_sent %llu\n",
 			 (unsigned long long)end_seq - (unsigned long)max_seq,
-			 (unsigned long long)msk->rcv_wnd_sent);
+			 (unsigned long long)atomic64_read(&msk->rcv_wnd_sent));
 		MPTCP_INC_STATS(sock_net(sk), MPTCP_MIB_NODSSWINDOW);
 		return;
 	}
@@ -3004,7 +3004,7 @@ struct sock *mptcp_sk_clone(const struct sock *sk,
 		mptcp_crypto_key_sha(msk->remote_key, NULL, &ack_seq);
 		ack_seq++;
 		WRITE_ONCE(msk->ack_seq, ack_seq);
-		WRITE_ONCE(msk->rcv_wnd_sent, ack_seq);
+		atomic64_set(&msk->rcv_wnd_sent, ack_seq);
 	}
 
 	sock_reset_flag(nsk, SOCK_RCU_FREE);
@@ -3297,9 +3297,9 @@ void mptcp_finish_connect(struct sock *ssk)
 	WRITE_ONCE(msk->write_seq, subflow->idsn + 1);
 	WRITE_ONCE(msk->snd_nxt, msk->write_seq);
 	WRITE_ONCE(msk->ack_seq, ack_seq);
-	WRITE_ONCE(msk->rcv_wnd_sent, ack_seq);
 	WRITE_ONCE(msk->can_ack, 1);
 	WRITE_ONCE(msk->snd_una, msk->write_seq);
+	atomic64_set(&msk->rcv_wnd_sent, ack_seq);
 
 	mptcp_pm_new_connection(msk, ssk, 0);
 
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index f542aeaa5b09..4672901d0dfe 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -257,7 +257,7 @@ struct mptcp_sock {
 	u64		write_seq;
 	u64		snd_nxt;
 	u64		ack_seq;
-	u64		rcv_wnd_sent;
+	atomic64_t	rcv_wnd_sent;
 	u64		rcv_data_fin_seq;
 	int		rmem_fwd_alloc;
 	struct sock	*last_snd;
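[The core of the change is the atomic64_cmpxchg() loop that keeps
rcv_wnd_sent monotonic under concurrent subflow xmit. A standalone
C11-atomics sketch of the same scheme; advance_right_edge is an
illustrative name, and the kernel uses atomic64_cmpxchg() with
after64()/before64() rather than these library calls.]

#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Advance *edge to new_edge only if that moves it forward. Returns false
 * when a concurrent writer already moved the edge to or past new_edge,
 * the case mptcp_set_rwin() handles by re-raising the announced window
 * (the "raise_win" label in the patch).
 */
static bool advance_right_edge(_Atomic uint64_t *edge, uint64_t new_edge)
{
	uint64_t old = atomic_load(edge);

	while ((int64_t)(new_edge - old) > 0) {	/* after64()-style compare */
		/* on failure, old is reloaded with the conflicting value */
		if (atomic_compare_exchange_weak(edge, &old, new_edge))
			return true;
	}
	return false;
}

int main(void)
{
	_Atomic uint64_t edge = 100;

	return advance_right_edge(&edge, 200) && edge == 200 ? 0 : 1;
}

[Serial-number arithmetic on the 64-bit extended sequence space makes the
wrap-around safe, which is why the sketch compares signed differences
instead of raw values.]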
From patchwork Wed May 4 21:54:08 2022
From: Mat Martineau
To: netdev@vger.kernel.org
Cc: Paolo Abeni, davem@davemloft.net, kuba@kernel.org, edumazet@google.com,
    matthieu.baerts@tessares.net, mptcp@lists.linux.dev, Mat Martineau
Subject: [PATCH net-next 5/5] mptcp: add more offered MIBs counter
Date: Wed, 4 May 2022 14:54:08 -0700
Message-Id: <20220504215408.349318-6-mathew.j.martineau@linux.intel.com>
In-Reply-To: <20220504215408.349318-1-mathew.j.martineau@linux.intel.com>

From: Paolo Abeni

Track the exceptional handling of the MPTCP-level offered window with a
few more counters, for observability's sake.
Signed-off-by: Paolo Abeni
Signed-off-by: Mat Martineau
---
 net/mptcp/mib.c     | 3 +++
 net/mptcp/mib.h     | 5 +++++
 net/mptcp/options.c | 6 +++++-
 3 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/net/mptcp/mib.c b/net/mptcp/mib.c
index 6a6f8151375a..0dac2863c6e1 100644
--- a/net/mptcp/mib.c
+++ b/net/mptcp/mib.c
@@ -57,6 +57,9 @@ static const struct snmp_mib mptcp_snmp_list[] = {
 	SNMP_MIB_ITEM("SubflowStale", MPTCP_MIB_SUBFLOWSTALE),
 	SNMP_MIB_ITEM("SubflowRecover", MPTCP_MIB_SUBFLOWRECOVER),
 	SNMP_MIB_ITEM("SndWndShared", MPTCP_MIB_SNDWNDSHARED),
+	SNMP_MIB_ITEM("RcvWndShared", MPTCP_MIB_RCVWNDSHARED),
+	SNMP_MIB_ITEM("RcvWndConflictUpdate", MPTCP_MIB_RCVWNDCONFLICTUPDATE),
+	SNMP_MIB_ITEM("RcvWndConflict", MPTCP_MIB_RCVWNDCONFLICT),
 	SNMP_MIB_SENTINEL
 };
 
diff --git a/net/mptcp/mib.h b/net/mptcp/mib.h
index 2411510bef66..2be3596374f4 100644
--- a/net/mptcp/mib.h
+++ b/net/mptcp/mib.h
@@ -50,6 +50,11 @@ enum linux_mptcp_mib_field {
 	MPTCP_MIB_SUBFLOWSTALE,	/* Subflows entered 'stale' status */
 	MPTCP_MIB_SUBFLOWRECOVER, /* Subflows returned to active status after being stale */
 	MPTCP_MIB_SNDWNDSHARED,	/* Subflow snd wnd is overridden by msk's one */
+	MPTCP_MIB_RCVWNDSHARED,	/* Subflow rcv wnd is overridden by msk's one */
+	MPTCP_MIB_RCVWNDCONFLICTUPDATE,	/* Subflow rcv wnd is overridden by msk's one due to
+					 * conflict with another subflow while updating msk rcv wnd
+					 */
+	MPTCP_MIB_RCVWNDCONFLICT, /* Conflict while updating msk rcv wnd */
 	__MPTCP_MIB_MAX
 };
 
diff --git a/net/mptcp/options.c b/net/mptcp/options.c
index 3e3156cfe813..ac3b7b8a02f6 100644
--- a/net/mptcp/options.c
+++ b/net/mptcp/options.c
@@ -1248,8 +1248,11 @@ static void mptcp_set_rwin(struct tcp_sock *tp, struct tcphdr *th)
 
 			if (rcv_wnd == rcv_wnd_old)
 				break;
-			if (before64(rcv_wnd_new, rcv_wnd))
+			if (before64(rcv_wnd_new, rcv_wnd)) {
+				MPTCP_INC_STATS(sock_net(ssk), MPTCP_MIB_RCVWNDCONFLICTUPDATE);
 				goto raise_win;
+			}
+			MPTCP_INC_STATS(sock_net(ssk), MPTCP_MIB_RCVWNDCONFLICT);
 			rcv_wnd_old = rcv_wnd;
 		}
 		return;
@@ -1275,6 +1278,7 @@ static void mptcp_set_rwin(struct tcp_sock *tp, struct tcphdr *th)
 		/* RFC1323 scaling applied */
 		new_win >>= tp->rx_opt.rcv_wscale;
 		th->window = htons(new_win);
+		MPTCP_INC_STATS(sock_net(ssk), MPTCP_MIB_RCVWNDSHARED);
 	}
 }
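[Tying the counters back to the update loop from the previous patch, here
is the earlier sketch extended with illustrative counters, using plain C11
atomics rather than the kernel's per-netns MPTCP_INC_STATS machinery: a
failed compare-exchange that can still win bumps the conflict counter and
retries, while losing to a writer that already advanced past our edge bumps
the conflict-update counter before the caller re-raises the window. The
third counter, RcvWndShared, is bumped on the raise_win path itself
whenever the announced window is overridden with the msk-level value.]

#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

static _Atomic uint64_t rcvwnd_conflict;	/* ~ RcvWndConflict */
static _Atomic uint64_t rcvwnd_conflict_update;	/* ~ RcvWndConflictUpdate */

static bool advance_right_edge_counted(_Atomic uint64_t *edge,
				       uint64_t new_edge)
{
	uint64_t old = atomic_load(edge);

	if ((int64_t)(new_edge - old) <= 0)
		return false;			/* nothing to advance */

	while (!atomic_compare_exchange_strong(edge, &old, new_edge)) {
		if ((int64_t)(new_edge - old) < 0) {
			/* lost: the edge already moved past our value */
			atomic_fetch_add(&rcvwnd_conflict_update, 1);
			return false;
		}
		/* raced with another subflow but can still win: retry */
		atomic_fetch_add(&rcvwnd_conflict, 1);
	}
	return true;
}

int main(void)
{
	_Atomic uint64_t edge = 100;

	return advance_right_edge_counted(&edge, 200) ? 0 : 1;
}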