Message ID | 20240812125717.413108-5-jdamato@fastly.com (mailing list archive) |
---|---|
State | RFC |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | Suspend IRQs during preferred busy poll | expand |
On Mon, Aug 12, 2024 at 12:57:07PM +0000, Joe Damato wrote: > From: Martin Karsten <mkarsten@uwaterloo.ca> > > Setting prefer_busy_poll now leads to an effectively nonblocking > iteration though napi_busy_loop, even when busy_poll_usecs is 0. Hardcoding calls to the networking code from VFS code seems like a bad idea. Not that I disagree with the concept of disabling interrupts during busy polling, but this needs a proper abstraction through file_operations.
On Mon, Aug 12, 2024 at 06:19:35AM -0700, Christoph Hellwig wrote: > On Mon, Aug 12, 2024 at 12:57:07PM +0000, Joe Damato wrote: > > From: Martin Karsten <mkarsten@uwaterloo.ca> > > > > Setting prefer_busy_poll now leads to an effectively nonblocking > > iteration though napi_busy_loop, even when busy_poll_usecs is 0. > > Hardcoding calls to the networking code from VFS code seems like > a bad idea. Not that I disagree with the concept of disabling > interrupts during busy polling, but this needs a proper abstraction > through file_operations. I don't understand what's going on with this patch set. Is it just working around badly designed hardware? NVMe is specified in a way that lets it be completely interruptless if the host is keeping up with the incoming completions from the device (ie the device will interrupt if a completion has been posted for N microseconds without being acknowledged). I assumed this was how network devices worked too, but I didn't check.
On Mon, Aug 12, 2024 at 06:19:35AM -0700, Christoph Hellwig wrote: > On Mon, Aug 12, 2024 at 12:57:07PM +0000, Joe Damato wrote: > > From: Martin Karsten <mkarsten@uwaterloo.ca> > > > > Setting prefer_busy_poll now leads to an effectively nonblocking > > iteration though napi_busy_loop, even when busy_poll_usecs is 0. > > Hardcoding calls to the networking code from VFS code seems like > a bad idea. Not that I disagree with the concept of disabling > interrupts during busy polling, but this needs a proper abstraction > through file_operations. Thanks for the feedback; the code modified in the this patch was already calling directly into the networking stack; we didn't add that call. We added a check on another member of the eventpoll structure, though. In general: it may be appropriate for a better abstraction to exist between fs/eventpoll.c and the networking stack as there are already many calls into the networking stack from this code. However, I think that is a much larger change that is not directly related to what we're proposing, which is: a mechanism for more efficient epoll-based busy poll which shows significant performance improvements. - Joe
On Mon, Aug 12, 2024 at 05:17:38PM +0100, Matthew Wilcox wrote: > On Mon, Aug 12, 2024 at 06:19:35AM -0700, Christoph Hellwig wrote: > > On Mon, Aug 12, 2024 at 12:57:07PM +0000, Joe Damato wrote: > > > From: Martin Karsten <mkarsten@uwaterloo.ca> > > > > > > Setting prefer_busy_poll now leads to an effectively nonblocking > > > iteration though napi_busy_loop, even when busy_poll_usecs is 0. > > > > Hardcoding calls to the networking code from VFS code seems like > > a bad idea. Not that I disagree with the concept of disabling > > interrupts during busy polling, but this needs a proper abstraction > > through file_operations. > > I don't understand what's going on with this patch set. Is it just > working around badly designed hardware? NVMe is specified in a way that > lets it be completely interruptless if the host is keeping up with the > incoming completions from the device (ie the device will interrupt if a > completion has been posted for N microseconds without being acknowledged). > I assumed this was how network devices worked too, but I didn't check. Thanks for taking a look. I'd kindly point you back to the cover letter [1], which describes the purpose of the patch set in greater detail. At a high level: the networking stack has a mechanism for deferring interrupts that was introduced in commit 6f8b12d661d0 ("net: napi: add hard irqs deferral feature") and expanded upon in 7fd3253a7de6 ("net: Introduce preferred busy-polling"). We are expanding the existing mechanisms further so that when applications are busy polling, IRQs are totally disabled. While traditional NAPI does prevent IRQs from being re-enabled, it runs in softIRQ context and only retrieves the available data at the time NAPI poll runs. The kernel's pre-existing busy polling however, lets the application drive packet processing (instead of NAPI). Busy polling when used in combination with various options allowed user applications to defer IRQs, but as we show in our cover letter it is extremely difficult to pick the correct values for all traffic cases at all times. We are introducing an extension of the existing mechanism to allow epoll-based busy poll applications to run more efficiently by keeping IRQs disabled throughout the duration of successful busy poll iterations. - Joe [1]: https://lore.kernel.org/netdev/20240812125717.413108-1-jdamato@fastly.com/
diff --git a/fs/eventpoll.c b/fs/eventpoll.c index f53ca4f7fced..cc47f72005ed 100644 --- a/fs/eventpoll.c +++ b/fs/eventpoll.c @@ -420,7 +420,9 @@ static bool busy_loop_ep_timeout(unsigned long start_time, static bool ep_busy_loop_on(struct eventpoll *ep) { - return !!ep->busy_poll_usecs || net_busy_loop_on(); + return !!READ_ONCE(ep->busy_poll_usecs) || + READ_ONCE(ep->prefer_busy_poll) || + net_busy_loop_on(); } static bool ep_busy_loop_end(void *p, unsigned long start_time)