[RFC,net-next,4/5] eventpoll: Trigger napi_busy_loop, if prefer_busy_poll is set

Message ID 20240812125717.413108-5-jdamato@fastly.com (mailing list archive)
State RFC
Delegated to: Netdev Maintainers
Series Suspend IRQs during preferred busy poll

Checks

Context Check Description
netdev/series_format success Posting correctly formatted
netdev/tree_selection success Clearly marked for net-next, async
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 29 this patch: 29
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers success CCed 4 of 4 maintainers
netdev/build_clang success Errors and warnings before: 29 this patch: 29
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 32 this patch: 32
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 10 lines checked
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Joe Damato Aug. 12, 2024, 12:57 p.m. UTC
From: Martin Karsten <mkarsten@uwaterloo.ca>

Setting prefer_busy_poll now leads to an effectively nonblocking
iteration through napi_busy_loop, even when busy_poll_usecs is 0.

Signed-off-by: Martin Karsten <mkarsten@uwaterloo.ca>
Co-developed-by: Joe Damato <jdamato@fastly.com>
Signed-off-by: Joe Damato <jdamato@fastly.com>
Tested-by: Joe Damato <jdamato@fastly.com>
Tested-by: Martin Karsten <mkarsten@uwaterloo.ca>
---
 fs/eventpoll.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

Comments

Christoph Hellwig Aug. 12, 2024, 1:19 p.m. UTC | #1
On Mon, Aug 12, 2024 at 12:57:07PM +0000, Joe Damato wrote:
> From: Martin Karsten <mkarsten@uwaterloo.ca>
> 
> Setting prefer_busy_poll now leads to an effectively nonblocking
> iteration through napi_busy_loop, even when busy_poll_usecs is 0.

Hardcoding calls to the networking code from VFS code seems like
a bad idea.   Not that I disagree with the concept of disabling
interrupts during busy polling, but this needs a proper abstraction
through file_operations.
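
One possible shape for such an abstraction, as a purely hypothetical
sketch: the hook name and signature below are invented for
illustration and do not exist in file_operations today.

/*
 * Hypothetical only: epoll could call a per-file hook like this when
 * present, instead of calling napi_busy_loop() from fs/eventpoll.c.
 */
#include <linux/fs.h>

struct file_operations_sketch {
	/* ... existing file_operations members ... */

	/*
	 * Run one busy-poll iteration on behalf of this file; return
	 * true if new events may be ready. A NULL hook means the file
	 * type does not support busy polling.
	 */
	bool (*busy_poll)(struct file *file, bool prefer_busy_poll);
};
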
Matthew Wilcox (Oracle) Aug. 12, 2024, 4:17 p.m. UTC | #2
On Mon, Aug 12, 2024 at 06:19:35AM -0700, Christoph Hellwig wrote:
> On Mon, Aug 12, 2024 at 12:57:07PM +0000, Joe Damato wrote:
> > From: Martin Karsten <mkarsten@uwaterloo.ca>
> > 
> > Setting prefer_busy_poll now leads to an effectively nonblocking
> > iteration through napi_busy_loop, even when busy_poll_usecs is 0.
> 
> Hardcoding calls to the networking code from VFS code seems like
> a bad idea.   Not that I disagree with the concept of disabling
> interrupts during busy polling, but this needs a proper abstraction
> through file_operations.

I don't understand what's going on with this patch set.  Is it just
working around badly designed hardware?  NVMe is specified in a way that
lets it be completely interruptless if the host is keeping up with the
incoming completions from the device (ie the device will interrupt if a
completion has been posted for N microseconds without being acknowledged).
I assumed this was how network devices worked too, but I didn't check.
Joe Damato Aug. 12, 2024, 5:46 p.m. UTC | #3
On Mon, Aug 12, 2024 at 06:19:35AM -0700, Christoph Hellwig wrote:
> On Mon, Aug 12, 2024 at 12:57:07PM +0000, Joe Damato wrote:
> > From: Martin Karsten <mkarsten@uwaterloo.ca>
> > 
> > Setting prefer_busy_poll now leads to an effectively nonblocking
> > iteration through napi_busy_loop, even when busy_poll_usecs is 0.
> 
> Hardcoding calls to the networking code from VFS code seems like
> a bad idea.   Not that I disagree with the concept of disabling
> interrupts during busy polling, but this needs a proper abstraction
> through file_operations.

Thanks for the feedback; the code modified in this patch was
already calling directly into the networking stack; we didn't add
that call. We added a check on another member of the eventpoll
structure, though.

In general: it may be appropriate for a better abstraction to exist
between fs/eventpoll.c and the networking stack as there are already
many calls into the networking stack from this code.

However, I think that is a much larger change that is not directly
related to what we're proposing, which is: a mechanism for more
efficient epoll-based busy poll which shows significant performance
improvements.

- Joe
Joe Damato Aug. 12, 2024, 5:49 p.m. UTC | #4
On Mon, Aug 12, 2024 at 05:17:38PM +0100, Matthew Wilcox wrote:
> On Mon, Aug 12, 2024 at 06:19:35AM -0700, Christoph Hellwig wrote:
> > On Mon, Aug 12, 2024 at 12:57:07PM +0000, Joe Damato wrote:
> > > From: Martin Karsten <mkarsten@uwaterloo.ca>
> > > 
> > > Setting prefer_busy_poll now leads to an effectively nonblocking
> > > iteration through napi_busy_loop, even when busy_poll_usecs is 0.
> > 
> > Hardcoding calls to the networking code from VFS code seems like
> > a bad idea.   Not that I disagree with the concept of disabling
> > interrupts during busy polling, but this needs a proper abstraction
> > through file_operations.
> 
> I don't understand what's going on with this patch set.  Is it just
> working around badly designed hardware?  NVMe is specified in a way that
> lets it be completely interruptless if the host is keeping up with the
> incoming completions from the device (ie the device will interrupt if a
> completion has been posted for N microseconds without being acknowledged).
> I assumed this was how network devices worked too, but I didn't check.

Thanks for taking a look. I'd kindly point you back to the cover
letter [1], which describes the purpose of the patch set in greater
detail.

At a high level: the networking stack has a mechanism for deferring
interrupts that was introduced in commit 6f8b12d661d0 ("net: napi:
add hard irqs deferral feature") and expanded upon in 7fd3253a7de6
("net: Introduce preferred busy-polling"). We are expanding the
existing mechanisms further so that when applications are busy
polling, IRQs are totally disabled.
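
For reference, the knobs added by 6f8b12d661d0 are exposed per device
via sysfs. A minimal sketch of configuring them from userspace
follows; the device name and values are only illustrative:

#include <stdio.h>

static int set_knob(const char *path, const char *val)
{
	FILE *f = fopen(path, "w");

	if (!f)
		return -1;
	fprintf(f, "%s", val);
	return fclose(f);
}

int main(void)
{
	/* Defer up to 2 hard IRQs; gro_flush_timeout is in nanoseconds,
	 * so 200000 re-arms the IRQ after 200us without progress. */
	set_knob("/sys/class/net/eth0/napi_defer_hard_irqs", "2");
	set_knob("/sys/class/net/eth0/gro_flush_timeout", "200000");
	return 0;
}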

While traditional NAPI does prevent IRQs from being re-enabled, it
runs in softIRQ context and only retrieves the available data at the
time NAPI poll runs. The kernel's pre-existing busy polling, however,
lets the application drive packet processing (instead of NAPI). Busy
polling, when used in combination with various options, allows user
applications to defer IRQs, but as we show in our cover letter, it is
extremely difficult to pick values that are correct for all traffic
cases at all times.
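
The per-socket half of this, from 7fd3253a7de6, is driven via
setsockopt(). A minimal sketch with illustrative values; the fallback
define covers older userspace headers:

#include <sys/socket.h>

#ifndef SO_PREFER_BUSY_POLL
#define SO_PREFER_BUSY_POLL 69	/* include/uapi/asm-generic/socket.h */
#endif

static int enable_busy_poll(int fd)
{
	int usecs = 64;		/* illustrative: spin up to 64us per poll */
	int prefer = 1;		/* ask the stack to defer device IRQs */

	if (setsockopt(fd, SOL_SOCKET, SO_BUSY_POLL, &usecs, sizeof(usecs)))
		return -1;
	return setsockopt(fd, SOL_SOCKET, SO_PREFER_BUSY_POLL, &prefer,
			  sizeof(prefer));
}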

We are introducing an extension of the existing mechanism to allow
epoll-based busy poll applications to run more efficiently by
keeping IRQs disabled throughout the duration of successful busy
poll iterations.

- Joe

[1]: https://lore.kernel.org/netdev/20240812125717.413108-1-jdamato@fastly.com/
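
To make the epoll-based mechanism described above concrete, here is a
minimal userspace sketch that sets prefer_busy_poll on an epoll
context while leaving busy_poll_usecs at 0 -- exactly the combination
this patch makes meaningful. The struct and ioctl number are copied
from include/uapi/linux/eventpoll.h in case installed headers predate
them; everything else is illustrative:

#include <stdint.h>
#include <stdio.h>
#include <sys/epoll.h>
#include <sys/ioctl.h>

#ifndef EPIOCSPARAMS
struct epoll_params {
	uint32_t busy_poll_usecs;	/* 0: no usec budget */
	uint16_t busy_poll_budget;	/* 0: kernel default budget */
	uint8_t prefer_busy_poll;	/* 1: busy poll, defer IRQs */
	uint8_t __pad;			/* must be zero */
};
#define EPIOCSPARAMS _IOW(0x8A, 0x01, struct epoll_params)
#endif

int main(void)
{
	struct epoll_params p = { .prefer_busy_poll = 1 };
	int epfd = epoll_create1(0);

	if (epfd < 0 || ioctl(epfd, EPIOCSPARAMS, &p) < 0) {
		perror("epoll busy-poll setup");
		return 1;
	}
	/* With this patch, epoll_wait() on epfd now triggers
	 * napi_busy_loop() even though busy_poll_usecs is 0. */
	return 0;
}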

Patch

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index f53ca4f7fced..cc47f72005ed 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -420,7 +420,9 @@  static bool busy_loop_ep_timeout(unsigned long start_time,
 
 static bool ep_busy_loop_on(struct eventpoll *ep)
 {
-	return !!ep->busy_poll_usecs || net_busy_loop_on();
+	return !!READ_ONCE(ep->busy_poll_usecs) ||
+	       READ_ONCE(ep->prefer_busy_poll) ||
+	       net_busy_loop_on();
 }
 
 static bool ep_busy_loop_end(void *p, unsigned long start_time)