[2/2] r8152: Call napi_schedule() from proper context

Message ID 20250212174329.53793-3-frederic@kernel.org (mailing list archive)
State New
Series net: Fix/prevent napi_schedule() call from bare task context

Commit Message

Frederic Weisbecker Feb. 12, 2025, 5:43 p.m. UTC
napi_schedule() is expected to be called either:

* From an interrupt, where raised softirqs are handled on IRQ exit

* From a softirq-disabled section, where raised softirqs are handled on
  the next call to local_bh_enable().

* From a softirq handler, where raised softirqs are handled on the next
  round in do_softirq(), or further deferred to a dedicated kthread.

r8152 may call napi_schedule() at device resume time from a bare task
context without disabling softirqs, as the following trace shows:

	__raise_softirq_irqoff
	__napi_schedule
	rtl8152_runtime_resume.isra.0
	rtl8152_resume
	usb_resume_interface.isra.0
	usb_resume_both
	__rpm_callback
	rpm_callback
	rpm_resume
	__pm_runtime_resume
	usb_autoresume_device
	usb_remote_wakeup
	hub_event
	process_one_work
	worker_thread
	kthread
	ret_from_fork
	ret_from_fork_asm

This may result in the NET_RX softirq vector being ignored until the
next interrupt or softirq handling. The delay can be long if the
above kthread leaves the CPU idle and the tick is stopped for a while,
as reported with the following message:

	NOHZ tick-stop error: local softirq work is pending, handler #08!!!
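
For readers unfamiliar with the message, the sketch below shows one way to
read "handler #08". It assumes the number is the pending-softirq bitmask
printed in hex and that bit N corresponds to entry N of the kernel's softirq
enum in include/linux/interrupt.h. The helper name lowest_pending_softirq()
is hypothetical and exists only for this illustration; it is plain userspace
C, not kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Softirq vector names, listed in the order of the kernel's softirq enum.
 * Assumption: bit N of the pending mask stands for entry N of this table.
 */
static const char *const softirq_names[] = {
	"HI", "TIMER", "NET_TX", "NET_RX", "BLOCK",
	"IRQ_POLL", "TASKLET", "SCHED", "HRTIMER", "RCU",
};

#define NR_NAMES (sizeof(softirq_names) / sizeof(softirq_names[0]))

/*
 * Hypothetical helper: return the name of the lowest-numbered vector set
 * in mask, or NULL if no known vector is pending.
 */
static const char *lowest_pending_softirq(unsigned int mask)
{
	for (size_t i = 0; i < NR_NAMES; i++)
		if (mask & (1u << i))
			return softirq_names[i];
	return NULL;
}
```

Under these assumptions, lowest_pending_softirq(0x08) yields "NET_RX"
(0x08 is bit 3), which matches the vector raised by napi_schedule().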

Fix this by disabling softirqs while calling napi_schedule(). The
call to local_bh_enable() will take care of the raised NET_RX vector.

Reported-by: Paul Menzel <pmenzel@molgen.mpg.de>
Closes: 354a2690-9bbf-4ccb-8769-fa94707a9340@molgen.mpg.de
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
 drivers/net/usb/r8152.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)
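
The reasoning in the commit message can be illustrated with a toy userspace
model. This is only a sketch and not kernel code: every toy_* name is made up
for the illustration, the "CPU" is a plain struct, and the model assumes that
raising a softirq merely sets a pending bit which something else must later
notice. It shows why a raise from bare task context can linger, while the
outermost "bh enable" flushes it.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy userspace model, NOT kernel code: all names are illustrative. */
struct toy_cpu {
	unsigned int pending;	/* bitmask of raised softirq vectors */
	unsigned int handled;	/* vectors that have been run so far */
	int bh_disable_depth;	/* nesting depth of toy_bh_disable() */
};

#define TOY_NET_RX (1u << 3)

/* Run everything that is pending, as softirq processing would. */
static void toy_run_pending(struct toy_cpu *cpu)
{
	cpu->handled |= cpu->pending;
	cpu->pending = 0;
}

static void toy_bh_disable(struct toy_cpu *cpu)
{
	cpu->bh_disable_depth++;
}

/* Leaving the outermost disabled section processes pending vectors. */
static void toy_bh_enable(struct toy_cpu *cpu)
{
	assert(cpu->bh_disable_depth > 0);
	if (--cpu->bh_disable_depth == 0 && cpu->pending)
		toy_run_pending(cpu);
}

/* Models napi_schedule(): it only marks the vector as pending. */
static void toy_napi_schedule(struct toy_cpu *cpu)
{
	cpu->pending |= TOY_NET_RX;
}

/* Bare task context: nothing notices the raised vector afterwards. */
static bool toy_resume_unfixed(struct toy_cpu *cpu)
{
	toy_napi_schedule(cpu);
	return (cpu->pending & TOY_NET_RX) != 0;	/* still pending */
}

/* Pattern used by the patch: raise inside a bh-disabled section. */
static bool toy_resume_fixed(struct toy_cpu *cpu)
{
	toy_bh_disable(cpu);
	toy_napi_schedule(cpu);
	toy_bh_enable(cpu);
	return (cpu->pending & TOY_NET_RX) != 0;	/* already handled */
}
```

In this model toy_resume_unfixed() returns true (the vector is left pending
until some later event), whereas toy_resume_fixed() returns false because
toy_bh_enable() has already run it.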

Comments

Francois Romieu Feb. 12, 2025, 8:49 p.m. UTC | #1
Frederic Weisbecker <frederic@kernel.org> :
[...]
> r8152 may call napi_schedule() on device resume time from a bare task
> context without disabling softirqs as the following trace shows:
[...]
> diff --git a/drivers/net/usb/r8152.c b/drivers/net/usb/r8152.c
> index 468c73974046..1325460ae457 100644
> --- a/drivers/net/usb/r8152.c
> +++ b/drivers/net/usb/r8152.c
> @@ -8537,8 +8537,11 @@ static int rtl8152_runtime_resume(struct r8152 *tp)
>  		clear_bit(SELECTIVE_SUSPEND, &tp->flags);
>  		smp_mb__after_atomic();
>  
> -		if (!list_empty(&tp->rx_done))
> +		if (!list_empty(&tp->rx_done)) {
> +			local_bh_disable();
>  			napi_schedule(&tp->napi);
> +			local_bh_enable();
> +		}

AFAIU drivers/net/usb/r8152.c::rtl_work_func_t exhibits the same
problem.
Frederic Weisbecker Feb. 12, 2025, 8:58 p.m. UTC | #2
On Wed, Feb 12, 2025 at 09:49:29PM +0100, Francois Romieu wrote:
> Frederic Weisbecker <frederic@kernel.org> :
> [...]
> > r8152 may call napi_schedule() on device resume time from a bare task
> > context without disabling softirqs as the following trace shows:
> [...]
> > diff --git a/drivers/net/usb/r8152.c b/drivers/net/usb/r8152.c
> > index 468c73974046..1325460ae457 100644
> > --- a/drivers/net/usb/r8152.c
> > +++ b/drivers/net/usb/r8152.c
> > @@ -8537,8 +8537,11 @@ static int rtl8152_runtime_resume(struct r8152 *tp)
> >  		clear_bit(SELECTIVE_SUSPEND, &tp->flags);
> >  		smp_mb__after_atomic();
> >  
> > -		if (!list_empty(&tp->rx_done))
> > +		if (!list_empty(&tp->rx_done)) {
> > +			local_bh_disable();
> >  			napi_schedule(&tp->napi);
> > +			local_bh_enable();
> > +		}
> 
> AFAIU drivers/net/usb/r8152.c::rtl_work_func_t exhibits the same
> problem.

It's a workqueue function and softirqs don't seem to be disabled.
Looks like a good catch!

Thanks.

> 
> -- 
> Ueimor
Paul Menzel Feb. 18, 2025, 8:12 p.m. UTC | #3
Dear Frederic, dear Francois,


Thank you for the patch and review.


On 12.02.25 at 21:58, Frederic Weisbecker wrote:
> On Wed, Feb 12, 2025 at 09:49:29PM +0100, Francois Romieu wrote:
>> Frederic Weisbecker <frederic@kernel.org> :
>> [...]
>>> r8152 may call napi_schedule() on device resume time from a bare task
>>> context without disabling softirqs as the following trace shows:
>> [...]
>>> diff --git a/drivers/net/usb/r8152.c b/drivers/net/usb/r8152.c
>>> index 468c73974046..1325460ae457 100644
>>> --- a/drivers/net/usb/r8152.c
>>> +++ b/drivers/net/usb/r8152.c
>>> @@ -8537,8 +8537,11 @@ static int rtl8152_runtime_resume(struct r8152 *tp)
>>>   		clear_bit(SELECTIVE_SUSPEND, &tp->flags);
>>>   		smp_mb__after_atomic();
>>>   
>>> -		if (!list_empty(&tp->rx_done))
>>> +		if (!list_empty(&tp->rx_done)) {
>>> +			local_bh_disable();
>>>   			napi_schedule(&tp->napi);
>>> +			local_bh_enable();
>>> +		}
>>
>> AFAIU drivers/net/usb/r8152.c::rtl_work_func_t exhibits the same
>> problem.
> 
> It's a workqueue function and softirqs don't seem to be disabled.
> Looks like a good catch!

Tested-by: Paul Menzel <pmenzel@molgen.mpg.de>

Are you going to send a v2, so it might get into Linux 6.14, or is it 
too late anyway?


Kind regards,

Paul
Patch

diff --git a/drivers/net/usb/r8152.c b/drivers/net/usb/r8152.c
index 468c73974046..1325460ae457 100644
--- a/drivers/net/usb/r8152.c
+++ b/drivers/net/usb/r8152.c
@@ -8537,8 +8537,11 @@  static int rtl8152_runtime_resume(struct r8152 *tp)
 		clear_bit(SELECTIVE_SUSPEND, &tp->flags);
 		smp_mb__after_atomic();
 
-		if (!list_empty(&tp->rx_done))
+		if (!list_empty(&tp->rx_done)) {
+			local_bh_disable();
 			napi_schedule(&tp->napi);
+			local_bh_enable();
+		}
 
 		usb_submit_urb(tp->intr_urb, GFP_NOIO);
 	} else {