| Message ID | 20210414080845.11426-1-lijunp213@gmail.com |
|---|---|
| State | Changes Requested |
| Delegated to | Netdev Maintainers |
| Series | [net,v2] net: core: make napi_disable more robust |
On 2021/4/14 16:08, Lijun Pan wrote:
> There are chances that napi_disable can be called twice by a NIC driver.
> This could generate a deadlock. For example,
> the first napi_disable will spin until NAPI_STATE_SCHED is cleared
> by napi_complete_done, then set it again.
> When napi_disable is called the second time, it will loop infinitely
> because no dev->poll will be running to clear NAPI_STATE_SCHED.
>
> Though it is the driver writer's responsibility to make sure it is
> called only once, making napi_disable more robust does not hurt, not
> to mention it can prevent a buggy driver from crashing a system.
> So, we check the napi state bits to make sure that if napi is already
> disabled, we exit the call early enough to avoid spinning infinitely.
>
> Fixes: bea3348eef27 ("[NET]: Make NAPI polling independent of struct net_device objects.")
> Signed-off-by: Lijun Pan <lijunp213@gmail.com>
> ---
> v2: justify that this patch makes napi_disable more robust.
>
>  net/core/dev.c | 18 ++++++++++++++++++
>  1 file changed, 18 insertions(+)
>
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 1f79b9aa9a3f..fa0aa212b7bb 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -6830,6 +6830,24 @@ EXPORT_SYMBOL(netif_napi_add);
>  void napi_disable(struct napi_struct *n)
>  {
>  	might_sleep();
> +
> +	/* make sure napi_disable() runs only once,
> +	 * When napi is disabled, the state bits are like:
> +	 * NAPI_STATE_SCHED (set by previous napi_disable)
> +	 * NAPI_STATE_NPSVC (set by previous napi_disable)
> +	 * NAPI_STATE_DISABLE (cleared by previous napi_disable)
> +	 * NAPI_STATE_PREFER_BUSY_POLL (cleared by previous napi_complete_done)
> +	 * NAPI_STATE_MISSED (cleared by previous napi_complete_done)
> +	 */
> +
> +	if (napi_disable_pending(n))
> +		return;
> +	if (test_bit(NAPI_STATE_SCHED, &n->state) &&
> +	    test_bit(NAPI_STATE_NPSVC, &n->state) &&
> +	    !test_bit(NAPI_STATE_MISSED, &n->state) &&
> +	    !test_bit(NAPI_STATE_PREFER_BUSY_POLL, &n->state))
> +		return;

NAPI_STATE_DISABLE is cleared at the end of napi_disable(), and if a
buggy driver/hw triggers an interrupt and the driver calls
napi_schedule_irqoff(), napi_schedule_prep() may set NAPI_STATE_MISSED
when NAPI_STATE_SCHED is already set. The above check does not seem to
handle that case?

> +
>  	set_bit(NAPI_STATE_DISABLE, &n->state);
>
>  	while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
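For reference, a simplified sketch of the napi_schedule_prep() logic
referenced above; this paraphrases the idea from net/core/dev.c rather
than reproducing the exact source:

static bool napi_schedule_prep_sketch(struct napi_struct *n)
{
	unsigned long val, new;

	do {
		val = READ_ONCE(n->state);
		/* Only NAPI_STATE_DISABLE blocks scheduling. */
		if (unlikely(val & NAPIF_STATE_DISABLE))
			return false;
		new = val | NAPIF_STATE_SCHED;
		/* If SCHED was already set, record MISSED instead -
		 * which would defeat the !MISSED test in the patch.
		 */
		if (val & NAPIF_STATE_SCHED)
			new |= NAPIF_STATE_MISSED;
	} while (cmpxchg(&n->state, val, new) != val);

	return !(val & NAPIF_STATE_SCHED);
}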
On Wed, Apr 14, 2021 at 3:45 AM Yunsheng Lin <linyunsheng@huawei.com> wrote:
>
> On 2021/4/14 16:08, Lijun Pan wrote:
> > There are chances that napi_disable can be called twice by a NIC driver.
> > This could generate a deadlock. For example,
> > the first napi_disable will spin until NAPI_STATE_SCHED is cleared
> > by napi_complete_done, then set it again.
> > When napi_disable is called the second time, it will loop infinitely
> > because no dev->poll will be running to clear NAPI_STATE_SCHED.
> >
> > Though it is the driver writer's responsibility to make sure it is
> > called only once, making napi_disable more robust does not hurt, not
> > to mention it can prevent a buggy driver from crashing a system.
> > So, we check the napi state bits to make sure that if napi is already
> > disabled, we exit the call early enough to avoid spinning infinitely.
> >
> > Fixes: bea3348eef27 ("[NET]: Make NAPI polling independent of struct net_device objects.")
> > Signed-off-by: Lijun Pan <lijunp213@gmail.com>
> > ---
> > v2: justify that this patch makes napi_disable more robust.
> >
> >  net/core/dev.c | 18 ++++++++++++++++++
> >  1 file changed, 18 insertions(+)
> >
> > diff --git a/net/core/dev.c b/net/core/dev.c
> > index 1f79b9aa9a3f..fa0aa212b7bb 100644
> > --- a/net/core/dev.c
> > +++ b/net/core/dev.c
> > @@ -6830,6 +6830,24 @@ EXPORT_SYMBOL(netif_napi_add);
> >  void napi_disable(struct napi_struct *n)
> >  {
> >  	might_sleep();
> > +
> > +	/* make sure napi_disable() runs only once,
> > +	 * When napi is disabled, the state bits are like:
> > +	 * NAPI_STATE_SCHED (set by previous napi_disable)
> > +	 * NAPI_STATE_NPSVC (set by previous napi_disable)
> > +	 * NAPI_STATE_DISABLE (cleared by previous napi_disable)
> > +	 * NAPI_STATE_PREFER_BUSY_POLL (cleared by previous napi_complete_done)
> > +	 * NAPI_STATE_MISSED (cleared by previous napi_complete_done)
> > +	 */
> > +
> > +	if (napi_disable_pending(n))
> > +		return;
> > +	if (test_bit(NAPI_STATE_SCHED, &n->state) &&
> > +	    test_bit(NAPI_STATE_NPSVC, &n->state) &&
> > +	    !test_bit(NAPI_STATE_MISSED, &n->state) &&
> > +	    !test_bit(NAPI_STATE_PREFER_BUSY_POLL, &n->state))
> > +		return;
>
> NAPI_STATE_DISABLE is cleared at the end of napi_disable(), and if a
> buggy driver/hw triggers an interrupt and the driver calls
> napi_schedule_irqoff(), napi_schedule_prep() may set NAPI_STATE_MISSED
> when NAPI_STATE_SCHED is already set. The above check does not seem to
> handle that case?

What I described in the commit message is napi_disable() being called
twice from the same instance, on the same cpu, e.g.:

funcA {
	napi_disable();
	...
	funcB {
		if (blah)
			napi_disable();
		...
	}
	funcC;
}

The scenario you mention seems to have napi already enabled and
scheduled, such that napi_schedule_prep() would set NAPI_STATE_MISSED.
The two scenarios are different per my understanding. Is there a way
that your scenario can finally call into mine? Let me know if I
understand you correctly.

Maybe testing the NAPI_STATE_MISSED bit is not needed, because this
bit is not that reliable.

Lijun
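Rendered as C, the same-cpu double-disable scenario sketched above
might look like this in a hypothetical driver teardown path (struct
and function names are invented for illustration; this is not from any
in-tree driver):

static void example_stop_rx(struct example_adapter *adap)
{
	/* Error path disables napi again. By now SCHED is set (left
	 * set by the first napi_disable()) and no poll is running to
	 * clear it, so this call spins in napi_disable() forever.
	 */
	if (adap->resetting)
		napi_disable(&adap->napi);
}

static void example_close(struct example_adapter *adap)
{
	napi_disable(&adap->napi);	/* first call completes normally */
	example_stop_rx(adap);		/* may disable a second time */
}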
On Wed, 14 Apr 2021 03:08:45 -0500 Lijun Pan wrote:
> There are chances that napi_disable can be called twice by a NIC driver.
> This could generate a deadlock. For example,
> the first napi_disable will spin until NAPI_STATE_SCHED is cleared
> by napi_complete_done, then set it again.
> When napi_disable is called the second time, it will loop infinitely
> because no dev->poll will be running to clear NAPI_STATE_SCHED.
>
> Though it is the driver writer's responsibility to make sure it is
> called only once, making napi_disable more robust does not hurt, not
> to mention it can prevent a buggy driver from crashing a system.
> So, we check the napi state bits to make sure that if napi is already
> disabled, we exit the call early enough to avoid spinning infinitely.

You've already been told by Eric & Dave to fix the driver instead.

Your check is _not_ correct - SCHED && NPSVC && !MISSED && !BUSY_POLL
can well arise without disabling the NAPI.

But regardless, a driver bug should be relatively easy to identify,
with the task getting stuck in napi_disable(). We don't provide
"protection" against taking spin locks or ref counts twice either.
Unless you can show a strong use case, please stop posting new
versions of this patch.
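For context on the claim above: one plausible way to see SCHED &&
NPSVC && !MISSED && !BUSY_POLL with no disable in progress is the
netpoll service path. A rough sketch (simplified from the shape of
net/core/netpoll.c; details vary by kernel version):

static void poll_one_napi_sketch(struct napi_struct *napi)
{
	/* While napi->poll() runs here on a scheduled napi, both
	 * NAPI_STATE_SCHED and NAPI_STATE_NPSVC are set, and MISSED /
	 * PREFER_BUSY_POLL may well be clear - exactly the state the
	 * proposed early return misreads as "already disabled".
	 */
	if (test_and_set_bit(NAPI_STATE_NPSVC, &napi->state))
		return;		/* napi is being disabled, back off */

	napi->poll(napi, 0);	/* zero budget */

	clear_bit(NAPI_STATE_NPSVC, &napi->state);
}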
On 4/15/21 1:21 AM, Jakub Kicinski wrote:
> On Wed, 14 Apr 2021 03:08:45 -0500 Lijun Pan wrote:
>> There are chances that napi_disable can be called twice by a NIC driver.
>> This could generate a deadlock. For example,
>> the first napi_disable will spin until NAPI_STATE_SCHED is cleared
>> by napi_complete_done, then set it again.
>> When napi_disable is called the second time, it will loop infinitely
>> because no dev->poll will be running to clear NAPI_STATE_SCHED.
>>
>> Though it is the driver writer's responsibility to make sure it is
>> called only once, making napi_disable more robust does not hurt, not
>> to mention it can prevent a buggy driver from crashing a system.
>> So, we check the napi state bits to make sure that if napi is already
>> disabled, we exit the call early enough to avoid spinning infinitely.
>
> You've already been told by Eric & Dave to fix the driver instead.
>
> Your check is _not_ correct - SCHED && NPSVC && !MISSED && !BUSY_POLL
> can well arise without disabling the NAPI.
>
> But regardless, a driver bug should be relatively easy to identify,
> with the task getting stuck in napi_disable(). We don't provide
> "protection" against taking spin locks or ref counts twice either.
> Unless you can show a strong use case, please stop posting new
> versions of this patch.

+222

I notice this v2 does not even mention which driver has the issue.
I suspect an out-of-tree driver.
On 4/14/21 10:08 AM, Lijun Pan wrote:
> There are chances that napi_disable can be called twice by a NIC driver.
> This could generate a deadlock. For example,
> the first napi_disable will spin until NAPI_STATE_SCHED is cleared
> by napi_complete_done, then set it again.
> When napi_disable is called the second time, it will loop infinitely
> because no dev->poll will be running to clear NAPI_STATE_SCHED.
>
> Though it is the driver writer's responsibility to make sure it is
> called only once, making napi_disable more robust does not hurt, not
> to mention it can prevent a buggy driver from crashing a system.

This is hard to digest. A buggy driver has plenty of ways to crash the
system.

If you need help to fix the buggy driver, please ask for help.
diff --git a/net/core/dev.c b/net/core/dev.c
index 1f79b9aa9a3f..fa0aa212b7bb 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -6830,6 +6830,24 @@ EXPORT_SYMBOL(netif_napi_add);
 void napi_disable(struct napi_struct *n)
 {
 	might_sleep();
+
+	/* make sure napi_disable() runs only once,
+	 * When napi is disabled, the state bits are like:
+	 * NAPI_STATE_SCHED (set by previous napi_disable)
+	 * NAPI_STATE_NPSVC (set by previous napi_disable)
+	 * NAPI_STATE_DISABLE (cleared by previous napi_disable)
+	 * NAPI_STATE_PREFER_BUSY_POLL (cleared by previous napi_complete_done)
+	 * NAPI_STATE_MISSED (cleared by previous napi_complete_done)
+	 */
+
+	if (napi_disable_pending(n))
+		return;
+	if (test_bit(NAPI_STATE_SCHED, &n->state) &&
+	    test_bit(NAPI_STATE_NPSVC, &n->state) &&
+	    !test_bit(NAPI_STATE_MISSED, &n->state) &&
+	    !test_bit(NAPI_STATE_PREFER_BUSY_POLL, &n->state))
+		return;
+
 	set_bit(NAPI_STATE_DISABLE, &n->state);
 
 	while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
There are chances that napi_disable can be called twice by a NIC driver.
This could generate a deadlock. For example,
the first napi_disable will spin until NAPI_STATE_SCHED is cleared
by napi_complete_done, then set it again.
When napi_disable is called the second time, it will loop infinitely
because no dev->poll will be running to clear NAPI_STATE_SCHED.

Though it is the driver writer's responsibility to make sure it is
called only once, making napi_disable more robust does not hurt, not
to mention it can prevent a buggy driver from crashing a system.
So, we check the napi state bits to make sure that if napi is already
disabled, we exit the call early enough to avoid spinning infinitely.

Fixes: bea3348eef27 ("[NET]: Make NAPI polling independent of struct net_device objects.")
Signed-off-by: Lijun Pan <lijunp213@gmail.com>
---
v2: justify that this patch makes napi_disable more robust.

 net/core/dev.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)
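For reference, the loop the second napi_disable() call gets stuck in;
an abridged version of the v5.12-era function from net/core/dev.c
(trimmed, not verbatim):

void napi_disable(struct napi_struct *n)
{
	might_sleep();
	set_bit(NAPI_STATE_DISABLE, &n->state);

	/* A second caller never gets past this loop: SCHED is still
	 * set from the first call and no dev->poll is pending to
	 * clear it, so test_and_set_bit() keeps returning true.
	 */
	while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
		msleep(1);
	while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
		msleep(1);

	hrtimer_cancel(&n->timer);

	clear_bit(NAPI_STATE_PREFER_BUSY_POLL, &n->state);
	clear_bit(NAPI_STATE_DISABLE, &n->state);
}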