Message ID | 20250212174329.53793-3-frederic@kernel.org (mailing list archive) |
---|---|
State | New |
Series | net: Fix/prevent napi_schedule() call from bare task context |
Frederic Weisbecker <frederic@kernel.org> :
[...]
> r8152 may call napi_schedule() on device resume time from a bare task
> context without disabling softirqs as the following trace shows:
[...]
> diff --git a/drivers/net/usb/r8152.c b/drivers/net/usb/r8152.c
> index 468c73974046..1325460ae457 100644
> --- a/drivers/net/usb/r8152.c
> +++ b/drivers/net/usb/r8152.c
> @@ -8537,8 +8537,11 @@ static int rtl8152_runtime_resume(struct r8152 *tp)
>  		clear_bit(SELECTIVE_SUSPEND, &tp->flags);
>  		smp_mb__after_atomic();
>  
> -		if (!list_empty(&tp->rx_done))
> +		if (!list_empty(&tp->rx_done)) {
> +			local_bh_disable();
>  			napi_schedule(&tp->napi);
> +			local_bh_enable();
> +		}

AFAIU drivers/net/usb/r8152.c::rtl_work_func_t exhibits the same
problem.
On Wed, Feb 12, 2025 at 09:49:29PM +0100, Francois Romieu wrote:
> Frederic Weisbecker <frederic@kernel.org> :
> [...]
> > r8152 may call napi_schedule() on device resume time from a bare task
> > context without disabling softirqs as the following trace shows:
> [...]
> > diff --git a/drivers/net/usb/r8152.c b/drivers/net/usb/r8152.c
> > index 468c73974046..1325460ae457 100644
> > --- a/drivers/net/usb/r8152.c
> > +++ b/drivers/net/usb/r8152.c
> > @@ -8537,8 +8537,11 @@ static int rtl8152_runtime_resume(struct r8152 *tp)
> >  		clear_bit(SELECTIVE_SUSPEND, &tp->flags);
> >  		smp_mb__after_atomic();
> >  
> > -		if (!list_empty(&tp->rx_done))
> > +		if (!list_empty(&tp->rx_done)) {
> > +			local_bh_disable();
> >  			napi_schedule(&tp->napi);
> > +			local_bh_enable();
> > +		}
> 
> AFAIU drivers/net/usb/r8152.c::rtl_work_func_t exhibits the same
> problem.

It's a workqueue function and softirqs don't seem to be disabled.
Looks like a good catch!

Thanks.

> 
> -- 
> Ueimor
Dear Frederic, dear Francois,

Thank you for the patch and review.

On 12.02.25 at 21:58, Frederic Weisbecker wrote:
> On Wed, Feb 12, 2025 at 09:49:29PM +0100, Francois Romieu wrote:
>> Frederic Weisbecker <frederic@kernel.org> :
>> [...]
>>> r8152 may call napi_schedule() on device resume time from a bare task
>>> context without disabling softirqs as the following trace shows:
>> [...]
>>> diff --git a/drivers/net/usb/r8152.c b/drivers/net/usb/r8152.c
>>> index 468c73974046..1325460ae457 100644
>>> --- a/drivers/net/usb/r8152.c
>>> +++ b/drivers/net/usb/r8152.c
>>> @@ -8537,8 +8537,11 @@ static int rtl8152_runtime_resume(struct r8152 *tp)
>>>  		clear_bit(SELECTIVE_SUSPEND, &tp->flags);
>>>  		smp_mb__after_atomic();
>>>  
>>> -		if (!list_empty(&tp->rx_done))
>>> +		if (!list_empty(&tp->rx_done)) {
>>> +			local_bh_disable();
>>>  			napi_schedule(&tp->napi);
>>> +			local_bh_enable();
>>> +		}
>>
>> AFAIU drivers/net/usb/r8152.c::rtl_work_func_t exhibits the same
>> problem.
>
> It's a workqueue function and softirqs don't seem to be disabled.
> Looks like a good catch!

Tested-by: Paul Menzel <pmenzel@molgen.mpg.de>

Are you going to send a v2, so it might get into Linux 6.14, or is it
too late anyway?

Kind regards,

Paul
diff --git a/drivers/net/usb/r8152.c b/drivers/net/usb/r8152.c
index 468c73974046..1325460ae457 100644
--- a/drivers/net/usb/r8152.c
+++ b/drivers/net/usb/r8152.c
@@ -8537,8 +8537,11 @@ static int rtl8152_runtime_resume(struct r8152 *tp)
 		clear_bit(SELECTIVE_SUSPEND, &tp->flags);
 		smp_mb__after_atomic();
 
-		if (!list_empty(&tp->rx_done))
+		if (!list_empty(&tp->rx_done)) {
+			local_bh_disable();
 			napi_schedule(&tp->napi);
+			local_bh_enable();
+		}
 
 		usb_submit_urb(tp->intr_urb, GFP_NOIO);
 	} else {
napi_schedule() is expected to be called either:

* From an interrupt, where raised softirqs are handled on IRQ exit

* From a softirq disabled section, where raised softirqs are handled on
  the next call to local_bh_enable().

* From a softirq handler, where raised softirqs are handled on the next
  round in do_softirq(), or further deferred to a dedicated kthread.

r8152 may call napi_schedule() on device resume time from a bare task
context without disabling softirqs as the following trace shows:

	__raise_softirq_irqoff
	__napi_schedule
	rtl8152_runtime_resume.isra.0
	rtl8152_resume
	usb_resume_interface.isra.0
	usb_resume_both
	__rpm_callback
	rpm_callback
	rpm_resume
	__pm_runtime_resume
	usb_autoresume_device
	usb_remote_wakeup
	hub_event
	process_one_work
	worker_thread
	kthread
	ret_from_fork
	ret_from_fork_asm

This may result in the NET_RX softirq vector being ignored until the next
interrupt or softirq handling. The delay can be long if the above
kthread leaves the CPU idle and the tick is stopped for a while, as
reported with the following message:

	NOHZ tick-stop error: local softirq work is pending, handler #08!!!

Fix this by disabling softirqs while calling napi_schedule(). The
call to local_bh_enable() will take care of the NET_RX raised vector.

Reported-by: Paul Menzel <pmenzel@molgen.mpg.de>
Closes: 354a2690-9bbf-4ccb-8769-fa94707a9340@molgen.mpg.de
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
 drivers/net/usb/r8152.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)
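
For readers who want to see why the bare task context matters, here is a
toy user-space model of the behaviour described in the changelog. It is
not kernel code: every toy_* name is hypothetical, and the only things
taken from the kernel are the idea of a per-CPU pending mask and, as far
as I can tell, NET_RX being softirq vector 3 (which would make the pending
mask 0x08, matching "handler #08" in the warning). The sketch only shows
that raising a vector outside a softirq disabled section leaves it
pending, while the local_bh_disable()/local_bh_enable() pair flushes it.

```c
#include <assert.h>

/* Hypothetical toy model, user space only. Vector 3 is assumed to
 * correspond to NET_RX, so its pending mask is 1 << 3 == 0x08. */
enum { TOY_NET_RX = 3 };

struct toy_cpu {
	unsigned int pending;      /* bitmask of raised softirq vectors */
	unsigned int bh_depth;     /* nesting depth of toy_local_bh_disable() */
	unsigned int net_rx_runs;  /* how many times the NET_RX handler ran */
};

/* Run every pending vector; only NET_RX is counted in this model. */
static void toy_do_softirq(struct toy_cpu *cpu)
{
	if (cpu->pending & (1u << TOY_NET_RX))
		cpu->net_rx_runs++;
	cpu->pending = 0;
}

/* Models the raise done on the napi_schedule() path: it only sets the
 * pending bit and relies on the caller's context to get it handled. */
static void toy_napi_schedule(struct toy_cpu *cpu)
{
	cpu->pending |= 1u << TOY_NET_RX;
}

static void toy_local_bh_disable(struct toy_cpu *cpu)
{
	cpu->bh_depth++;
}

/* Leaving the outermost softirq disabled section handles pending work. */
static void toy_local_bh_enable(struct toy_cpu *cpu)
{
	assert(cpu->bh_depth > 0);
	if (--cpu->bh_depth == 0 && cpu->pending)
		toy_do_softirq(cpu);
}

/* Bare task context: nothing handles the raised vector afterwards. */
static unsigned int toy_resume_unfixed(struct toy_cpu *cpu)
{
	toy_napi_schedule(cpu);
	return cpu->pending;
}

/* Same call wrapped as in the patch: the enable step handles the vector. */
static unsigned int toy_resume_fixed(struct toy_cpu *cpu)
{
	toy_local_bh_disable(cpu);
	toy_napi_schedule(cpu);
	toy_local_bh_enable(cpu);
	return cpu->pending;
}
```

Starting from a zeroed struct toy_cpu, toy_resume_unfixed() returns 0x08
with net_rx_runs still 0, while toy_resume_fixed() returns 0 with
net_rx_runs at 1. In the real kernel the leftover bit would eventually be
picked up by the next interrupt or softirq round, which this model
deliberately leaves out.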