[RFC,0/4] mptcp: just another receive path refactor

Message ID cover.1621963632.git.pabeni@redhat.com

Message

Paolo Abeni May 25, 2021, 5:37 p.m. UTC
Some recent issues showed that we perhaps have too much complexity
in the receive path and in the memory accounting.

After the recent changes in release_cb() we can drop most of that
complexity in favour of a more traditional RX/memory accounting scheme.

This is the result.
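
For context, below is a minimal user-space sketch of the deferral
pattern that release_cb() enables; all names and structures are made up
for illustration and do not match the kernel sources. The idea: when the
softirq finds the msk socket lock owned by a process, it only records
the pending work, and the lock owner flushes it when releasing the lock.

	/* Illustrative model only, not kernel code. */
	#include <stdbool.h>
	#include <stdio.h>

	#define DEFER_RX_WORK	(1u << 0)

	struct fake_sock {
		bool owned;		/* "socket lock" held by a process context */
		unsigned int deferred;	/* work postponed until lock release */
	};

	/* Work that must run under the socket lock, e.g. moving skbs to
	 * the msk receive queue and updating the memory accounting. */
	static void do_rx_work(struct fake_sock *sk)
	{
		printf("processing RX data under the socket lock\n");
	}

	/* Softirq path: if the owner holds the lock, just record that
	 * work is pending instead of touching the protected state. */
	static void softirq_rx(struct fake_sock *sk)
	{
		if (sk->owned)
			sk->deferred |= DEFER_RX_WORK;
		else
			do_rx_work(sk);
	}

	/* release_sock() counterpart: the lock owner flushes the deferred
	 * work before dropping the lock. */
	static void release_sock_cb(struct fake_sock *sk)
	{
		if (sk->deferred & DEFER_RX_WORK) {
			sk->deferred &= ~DEFER_RX_WORK;
			do_rx_work(sk);
		}
		sk->owned = false;
	}

	int main(void)
	{
		struct fake_sock sk = { .owned = true };

		softirq_rx(&sk);	/* lock busy: work is deferred */
		release_sock_cb(&sk);	/* owner releases the lock, runs the work */
		return 0;
	}

The series relies on this mechanism to keep the whole RX path under the
msk socket lock (patch 3/4).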

The first 2 patches are actually bugfixes, even if I don't understand
why we almost never hit the condition addressed by patch 1/4 (while
the patched kernel hits it frequently).

The 3rd patch introduces the major change, and the fourth is just
cleanup.

This could have some negative performance effects, as on average more
locking is required for each packet. I'm doing some perf tests and will
report the results.

Paolo Abeni (4):
  mptcp: wake-up readers only for in sequence data.
  mptcp: don't clear MPTCP_DATA_READY in sk_wait_event()
  mptcp: move the whole rx path under msk socket lock protection
  mptcp: cleanup mem accounting.

 net/mptcp/protocol.c | 298 +++++++++----------------------------------
 net/mptcp/protocol.h |  20 +--
 net/mptcp/subflow.c  |  15 +--
 3 files changed, 68 insertions(+), 265 deletions(-)

Comments

Paolo Abeni May 28, 2021, 3:18 p.m. UTC | #1
On Tue, 2021-05-25 at 19:37 +0200, Paolo Abeni wrote:
> This could have some negative performance effects, as on average more
> locking is required for each packet. I'm doing some perf tests and will
> report the results.

There are several different possible scenarios:

1) single subflow, ksoftirqd && user-space process running on the same CPU
2) multiple subflows, ksoftirqds && user-space process running on the same CPU
3) single subflow, ksoftirqd && user-space process running on different CPUs
4) multiple subflows, ksoftirqds && user-space process running on different CPUs

With a single subflow, the most common scenario is ksoftirqd &&
user-space process running on the same CPU. With multiple subflows on
reasonable server H/W we should likely observe a more mixed situation:
softirqs running on multiple CPUs, one of them also hosting the user-
space process. I don't have data for that yet.
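
For reference, this is roughly how I separate scenarios 1) and 3): pin
the user-space receiver to a chosen CPU, while the subflow IRQ/softirq
affinity is set separately (e.g. via /proc/irq/<n>/smp_affinity). The
helper below is a hypothetical sketch, not part of the series:

	#define _GNU_SOURCE
	#include <sched.h>
	#include <stdio.h>
	#include <stdlib.h>

	int main(int argc, char **argv)
	{
		cpu_set_t set;
		int cpu;

		if (argc != 2) {
			fprintf(stderr, "usage: %s <cpu>\n", argv[0]);
			return 1;
		}
		cpu = atoi(argv[1]);

		CPU_ZERO(&set);
		CPU_SET(cpu, &set);

		/* Bind the calling process (the MPTCP receiver) to the given CPU. */
		if (sched_setaffinity(0, sizeof(set), &set)) {
			perror("sched_setaffinity");
			return 1;
		}

		printf("receiver pinned to CPU %d\n", cpu);
		/* ... run the actual recv() loop here ... */
		return 0;
	}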

The figures:

scenario	export branch 		RX path refactor	delta
1)		23Mbps			21Mbps			-8%
2)		30Mbps			19Mbps			-37%
3)		17.8Mbps		17.5Mbps		noise range
4)		1-3Mbps			1-3Mbps			???

The last scenario exposed a bug: we likely don't send MPTCP-level ACKs
frequently enough under some conditions. That *could* possibly be
related to:

https://github.com/multipath-tcp/mptcp_net-next/issues/137

but I'm unsure about that.

The delta in scenario 2) is quite significant.

The root cause is that in such a scenario the user-space process is the
bottleneck: it keeps a CPU fully busy, spending most of the available
cycles memcpying the data into user space.

With the current export branch, the skb movement/enqueuing happens
entirely inside the ksoftirqd threads.

On top of the RX path refactor, some skb handling is performed by
mptcp_release_cb() in the context of the user-space process. That
reduces the number of CPU cycles available for memcpying the data and
thus also reduces the overall throughput.
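
To make the cycle-stealing argument concrete, here is a rough user-space
model (not kernel code, all names invented): the same loop that copies
data to user space also pays for the deferred skb handling when it drops
the socket lock, so fewer cycles per second remain for the copy itself.

	#include <stdio.h>
	#include <string.h>

	#define CHUNK	4096

	struct fake_msk {
		unsigned int pending_skbs;	/* skbs still on the subflows */
		unsigned long copied;		/* bytes handed to user space */
	};

	/* Work formerly done entirely in ksoftirqd: move skbs from the
	 * subflow queues to the msk queue and update the accounting. */
	static void flush_deferred_rx(struct fake_msk *msk)
	{
		while (msk->pending_skbs)
			msk->pending_skbs--;
	}

	/* Sketch of the receive loop after the refactor: the same process
	 * both copies data out and, when dropping the socket lock, pays
	 * for the deferred skb handling. */
	static void recv_loop(struct fake_msk *msk, char *buf, int iterations)
	{
		char data[CHUNK] = { 0 };

		while (iterations--) {
			/* lock_sock(sk) */
			memcpy(buf, data, CHUNK);	/* copy to "user space" */
			msk->copied += CHUNK;
			msk->pending_skbs += 1;		/* data keeps arriving */
			flush_deferred_rx(msk);		/* release_cb equivalent */
			/* release_sock(sk) */
		}
	}

	int main(void)
	{
		static char buf[CHUNK];
		struct fake_msk msk = { 0 };

		recv_loop(&msk, buf, 4);
		printf("copied %lu bytes, %u skbs pending\n",
		       msk.copied, msk.pending_skbs);
		return 0;
	}

In the export branch, the equivalent of flush_deferred_rx() runs in
ksoftirqd instead, leaving the process free to spend its cycles on the
copy.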

I experimented with a different approach, e.g. keeping the skbs
accounted to the incoming subflows, but that does not look feasible.

Input wanted: WDYT of the above?

Thanks!

Paolo