diff mbox series

[net,v2] igb: fix netpoll exit with traffic

Message ID 20211123204000.1597971-1-jesse.brandeburg@intel.com (mailing list archive)
State Accepted
Commit eaeace60778e524a2820d0c0ad60bf80289e292c
Delegated to: Netdev Maintainers
Headers show
Series [net,v2] igb: fix netpoll exit with traffic | expand

Checks

Context Check Description
netdev/tree_selection success Clearly marked for net
netdev/fixes_present success Fixes tag present in non-next series
netdev/subject_prefix success Link
netdev/cover_letter success Single patches do not need cover letters
netdev/patch_count success Link
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 0 this patch: 0
netdev/cc_maintainers fail 1 blamed authors not CCed: alexander.h.duyck@intel.com; 4 maintainers not CCed: anthony.l.nguyen@intel.com alexander.h.duyck@intel.com kuba@kernel.org davem@davemloft.net
netdev/build_clang success Errors and warnings before: 0 this patch: 0
netdev/module_param success Was 0 now: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 0 this patch: 0
netdev/checkpatch warning WARNING: 'occuring' may be misspelled - perhaps 'occurring'?
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Jesse Brandeburg Nov. 23, 2021, 8:40 p.m. UTC
Oleksandr brought a bug report where netpoll causes trace
messages in the log on igb.

Danielle brought this back up as still occuring, so we'll try
again.

[22038.710800] ------------[ cut here ]------------
[22038.710801] igb_poll+0x0/0x1440 [igb] exceeded budget in poll
[22038.710802] WARNING: CPU: 12 PID: 40362 at net/core/netpoll.c:155 netpoll_poll_dev+0x18a/0x1a0

As Alex suggested, change the driver to return work_done at the
exit of napi_poll, which should be safe to do in this driver
because it is not polling multiple queues in this single napi
context (multiple queues attached to one MSI-X vector). Several
other drivers contain the same simple sequence, so I hope
this will not create new problems.

Fixes: 16eb8815c235 ("igb: Refactor clean_rx_irq to reduce overhead and improve performance")
Reported-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Reported-by: Danielle Ratson <danieller@nvidia.com>
Suggested-by: Alexander Duyck <alexander.duyck@gmail.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
---
COMPILE TESTED ONLY! I have no way to reproduce this even on a machine I
have with igb. It works fine to load the igb driver and netconsole with
no errors.
---
v2: simplified patch with an attempt to make it work
v1: original patch that apparently didn't work
---
 drivers/net/ethernet/intel/igb/igb_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Danielle Ratson Nov. 24, 2021, 7:51 a.m. UTC | #1
> -----Original Message-----
> From: Jesse Brandeburg <jesse.brandeburg@intel.com>
> Sent: Tuesday, November 23, 2021 10:40 PM
> To: intel-wired-lan@lists.osuosl.org
> Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>;
> netdev@vger.kernel.org; Oleksandr Natalenko
> <oleksandr@natalenko.name>; Danielle Ratson <danieller@nvidia.com>;
> Alexander Duyck <alexander.duyck@gmail.com>
> Subject: [PATCH net v2] igb: fix netpoll exit with traffic
> 
> Oleksandr brought a bug report where netpoll causes trace messages in the
> log on igb.
> 
> Danielle brought this back up as still occuring, so we'll try again.

Hi Jessi,

Ill run tests with you patch and give you results for if it is ok.
Thanks!

> 
> [22038.710800] ------------[ cut here ]------------ [22038.710801]
> igb_poll+0x0/0x1440 [igb] exceeded budget in poll [22038.710802]
> WARNING: CPU: 12 PID: 40362 at net/core/netpoll.c:155
> netpoll_poll_dev+0x18a/0x1a0
> 
> As Alex suggested, change the driver to return work_done at the exit of
> napi_poll, which should be safe to do in this driver because it is not polling
> multiple queues in this single napi context (multiple queues attached to one
> MSI-X vector). Several other drivers contain the same simple sequence, so I
> hope this will not create new problems.
> 
> Fixes: 16eb8815c235 ("igb: Refactor clean_rx_irq to reduce overhead and
> improve performance")
> Reported-by: Oleksandr Natalenko <oleksandr@natalenko.name>
> Reported-by: Danielle Ratson <danieller@nvidia.com>
> Suggested-by: Alexander Duyck <alexander.duyck@gmail.com>
> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
> ---
> COMPILE TESTED ONLY! I have no way to reproduce this even on a machine I
> have with igb. It works fine to load the igb driver and netconsole with no
> errors.
> ---
> v2: simplified patch with an attempt to make it work
> v1: original patch that apparently didn't work
> ---
>  drivers/net/ethernet/intel/igb/igb_main.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/intel/igb/igb_main.c
> b/drivers/net/ethernet/intel/igb/igb_main.c
> index e647cc89c239..5e24b7ce5a92 100644
> --- a/drivers/net/ethernet/intel/igb/igb_main.c
> +++ b/drivers/net/ethernet/intel/igb/igb_main.c
> @@ -8104,7 +8104,7 @@ static int igb_poll(struct napi_struct *napi, int
> budget)
>  	if (likely(napi_complete_done(napi, work_done)))
>  		igb_ring_irq_enable(q_vector);
> 
> -	return min(work_done, budget - 1);
> +	return work_done;
>  }
> 
>  /**
> --
> 2.33.1
Danielle Ratson Nov. 24, 2021, 8:03 a.m. UTC | #2
> > From: Jesse Brandeburg <jesse.brandeburg@intel.com>
> > Sent: Tuesday, November 23, 2021 10:40 PM
> > To: intel-wired-lan@lists.osuosl.org
> > Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>;
> > netdev@vger.kernel.org; Oleksandr Natalenko
> > <oleksandr@natalenko.name>; Danielle Ratson <danieller@nvidia.com>;
> > Alexander Duyck <alexander.duyck@gmail.com>
> > Subject: [PATCH net v2] igb: fix netpoll exit with traffic
> >
> > Oleksandr brought a bug report where netpoll causes trace messages in
> > the log on igb.
> >
> > Danielle brought this back up as still occuring, so we'll try again.

And also:

WARNING: 'occuring' may be misspelled - perhaps 'occurring'?
#9:
Danielle brought this back up as still occuring, so we'll try
                                       ^^^^^^^^
total: 0 errors, 1 warnings, 0 checks, 8 lines checked

Please Reword.

> 
> Hi Jessi,
> 
> Ill run tests with you patch and give you results for if it is ok.
> Thanks!
> 
> >
> > [22038.710800] ------------[ cut here ]------------ [22038.710801]
> > igb_poll+0x0/0x1440 [igb] exceeded budget in poll [22038.710802]
> > WARNING: CPU: 12 PID: 40362 at net/core/netpoll.c:155
> > netpoll_poll_dev+0x18a/0x1a0
> >
> > As Alex suggested, change the driver to return work_done at the exit
> > of napi_poll, which should be safe to do in this driver because it is
> > not polling multiple queues in this single napi context (multiple
> > queues attached to one MSI-X vector). Several other drivers contain
> > the same simple sequence, so I hope this will not create new problems.
> >
> > Fixes: 16eb8815c235 ("igb: Refactor clean_rx_irq to reduce overhead
> > and improve performance")
> > Reported-by: Oleksandr Natalenko <oleksandr@natalenko.name>
> > Reported-by: Danielle Ratson <danieller@nvidia.com>
> > Suggested-by: Alexander Duyck <alexander.duyck@gmail.com>
> > Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
> > ---
> > COMPILE TESTED ONLY! I have no way to reproduce this even on a machine
> > I have with igb. It works fine to load the igb driver and netconsole
> > with no errors.
> > ---
> > v2: simplified patch with an attempt to make it work
> > v1: original patch that apparently didn't work
> > ---
> >  drivers/net/ethernet/intel/igb/igb_main.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/net/ethernet/intel/igb/igb_main.c
> > b/drivers/net/ethernet/intel/igb/igb_main.c
> > index e647cc89c239..5e24b7ce5a92 100644
> > --- a/drivers/net/ethernet/intel/igb/igb_main.c
> > +++ b/drivers/net/ethernet/intel/igb/igb_main.c
> > @@ -8104,7 +8104,7 @@ static int igb_poll(struct napi_struct *napi,
> > int
> > budget)
> >  	if (likely(napi_complete_done(napi, work_done)))
> >  		igb_ring_irq_enable(q_vector);
> >
> > -	return min(work_done, budget - 1);
> > +	return work_done;
> >  }
> >
> >  /**
> > --
> > 2.33.1
Kris Karas (Bug reporting) Nov. 24, 2021, 10:32 p.m. UTC | #3
Jesse Brandeburg wrote:
> Oleksandr brought a bug report where netpoll causes trace
> messages in the log on igb.
> ---
> -	return min(work_done, budget - 1);
> +	return work_done;

I am able to reproduce the WARNING + stack trace at will.  It occurs on 
every boot.
Please see https://bugzilla.kernel.org/show_bug.cgi?id=212573 for details.

I have tested the simple patch (v2) against both kernels 5.14 and 5.15, 
and am happy to report that this fixes the issue for me.

Kris
Oleksandr Natalenko Nov. 25, 2021, 7:03 a.m. UTC | #4
Hello.

On úterý 23. listopadu 2021 21:40:00 CET Jesse Brandeburg wrote:
> Oleksandr brought a bug report where netpoll causes trace
> messages in the log on igb.
> 
> Danielle brought this back up as still occuring, so we'll try
> again.
> 
> [22038.710800] ------------[ cut here ]------------
> [22038.710801] igb_poll+0x0/0x1440 [igb] exceeded budget in poll
> [22038.710802] WARNING: CPU: 12 PID: 40362 at net/core/netpoll.c:155
> netpoll_poll_dev+0x18a/0x1a0
> 
> As Alex suggested, change the driver to return work_done at the
> exit of napi_poll, which should be safe to do in this driver
> because it is not polling multiple queues in this single napi
> context (multiple queues attached to one MSI-X vector). Several
> other drivers contain the same simple sequence, so I hope
> this will not create new problems.
> 
> Fixes: 16eb8815c235 ("igb: Refactor clean_rx_irq to reduce overhead and
> improve performance") Reported-by: Oleksandr Natalenko
> <oleksandr@natalenko.name>
> Reported-by: Danielle Ratson <danieller@nvidia.com>
> Suggested-by: Alexander Duyck <alexander.duyck@gmail.com>
> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
> ---
> COMPILE TESTED ONLY! I have no way to reproduce this even on a machine I
> have with igb. It works fine to load the igb driver and netconsole with
> no errors.
> ---
> v2: simplified patch with an attempt to make it work
> v1: original patch that apparently didn't work
> ---
>  drivers/net/ethernet/intel/igb/igb_main.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/intel/igb/igb_main.c
> b/drivers/net/ethernet/intel/igb/igb_main.c index
> e647cc89c239..5e24b7ce5a92 100644
> --- a/drivers/net/ethernet/intel/igb/igb_main.c
> +++ b/drivers/net/ethernet/intel/igb/igb_main.c
> @@ -8104,7 +8104,7 @@ static int igb_poll(struct napi_struct *napi, int
> budget) if (likely(napi_complete_done(napi, work_done)))
>  		igb_ring_irq_enable(q_vector);
> 
> -	return min(work_done, budget - 1);
> +	return work_done;
>  }
> 
>  /**

This seems to address the issue for me. I do not see a warning after a couple 
of suspend/resume cycles any more, while previously it occurred after the first 
cycle.

Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>

Thanks!
Danielle Ratson Nov. 25, 2021, 7:37 a.m. UTC | #5
> -----Original Message-----
> From: Jesse Brandeburg <jesse.brandeburg@intel.com>
> Sent: Tuesday, November 23, 2021 10:40 PM
> To: intel-wired-lan@lists.osuosl.org
> Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>;
> netdev@vger.kernel.org; Oleksandr Natalenko
> <oleksandr@natalenko.name>; Danielle Ratson <danieller@nvidia.com>;
> Alexander Duyck <alexander.duyck@gmail.com>
> Subject: [PATCH net v2] igb: fix netpoll exit with traffic
> 
> Oleksandr brought a bug report where netpoll causes trace messages in the
> log on igb.
> 
> Danielle brought this back up as still occuring, so we'll try again.
> 
> [22038.710800] ------------[ cut here ]------------ [22038.710801]
> igb_poll+0x0/0x1440 [igb] exceeded budget in poll [22038.710802]
> WARNING: CPU: 12 PID: 40362 at net/core/netpoll.c:155
> netpoll_poll_dev+0x18a/0x1a0
> 
> As Alex suggested, change the driver to return work_done at the exit of
> napi_poll, which should be safe to do in this driver because it is not polling
> multiple queues in this single napi context (multiple queues attached to one
> MSI-X vector). Several other drivers contain the same simple sequence, so I
> hope this will not create new problems.
> 
> Fixes: 16eb8815c235 ("igb: Refactor clean_rx_irq to reduce overhead and
> improve performance")
> Reported-by: Oleksandr Natalenko <oleksandr@natalenko.name>
> Reported-by: Danielle Ratson <danieller@nvidia.com>
> Suggested-by: Alexander Duyck <alexander.duyck@gmail.com>
> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
> ---
> COMPILE TESTED ONLY! I have no way to reproduce this even on a machine I
> have with igb. It works fine to load the igb driver and netconsole with no
> errors.
> ---
> v2: simplified patch with an attempt to make it work
> v1: original patch that apparently didn't work
> ---
>  drivers/net/ethernet/intel/igb/igb_main.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/intel/igb/igb_main.c
> b/drivers/net/ethernet/intel/igb/igb_main.c
> index e647cc89c239..5e24b7ce5a92 100644
> --- a/drivers/net/ethernet/intel/igb/igb_main.c
> +++ b/drivers/net/ethernet/intel/igb/igb_main.c
> @@ -8104,7 +8104,7 @@ static int igb_poll(struct napi_struct *napi, int
> budget)
>  	if (likely(napi_complete_done(napi, work_done)))
>  		igb_ring_irq_enable(q_vector);
> 
> -	return min(work_done, budget - 1);
> +	return work_done;
>  }
> 
>  /**
> --
> 2.33.1

Tested and looks ok, thanks!
Danielle Ratson Nov. 25, 2021, 8:01 a.m. UTC | #6
> -----Original Message-----
> From: Danielle Ratson
> Sent: Thursday, November 25, 2021 9:37 AM
> To: Jesse Brandeburg <jesse.brandeburg@intel.com>; intel-wired-
> lan@lists.osuosl.org
> Cc: netdev@vger.kernel.org; Oleksandr Natalenko
> <oleksandr@natalenko.name>; Alexander Duyck
> <alexander.duyck@gmail.com>
> Subject: RE: [PATCH net v2] igb: fix netpoll exit with traffic
> 
> 
> 
> > -----Original Message-----
> > From: Jesse Brandeburg <jesse.brandeburg@intel.com>
> > Sent: Tuesday, November 23, 2021 10:40 PM
> > To: intel-wired-lan@lists.osuosl.org
> > Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>;
> > netdev@vger.kernel.org; Oleksandr Natalenko
> > <oleksandr@natalenko.name>; Danielle Ratson <danieller@nvidia.com>;
> > Alexander Duyck <alexander.duyck@gmail.com>
> > Subject: [PATCH net v2] igb: fix netpoll exit with traffic
> >
> > Oleksandr brought a bug report where netpoll causes trace messages in
> > the log on igb.
> >
> > Danielle brought this back up as still occuring, so we'll try again.
> >
> > [22038.710800] ------------[ cut here ]------------ [22038.710801]
> > igb_poll+0x0/0x1440 [igb] exceeded budget in poll [22038.710802]
> > WARNING: CPU: 12 PID: 40362 at net/core/netpoll.c:155
> > netpoll_poll_dev+0x18a/0x1a0
> >
> > As Alex suggested, change the driver to return work_done at the exit
> > of napi_poll, which should be safe to do in this driver because it is
> > not polling multiple queues in this single napi context (multiple
> > queues attached to one MSI-X vector). Several other drivers contain
> > the same simple sequence, so I hope this will not create new problems.
> >
> > Fixes: 16eb8815c235 ("igb: Refactor clean_rx_irq to reduce overhead
> > and improve performance")
> > Reported-by: Oleksandr Natalenko <oleksandr@natalenko.name>
> > Reported-by: Danielle Ratson <danieller@nvidia.com>
> > Suggested-by: Alexander Duyck <alexander.duyck@gmail.com>
> > Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
> > ---
> > COMPILE TESTED ONLY! I have no way to reproduce this even on a machine
> > I have with igb. It works fine to load the igb driver and netconsole
> > with no errors.
> > ---
> > v2: simplified patch with an attempt to make it work
> > v1: original patch that apparently didn't work
> > ---
> >  drivers/net/ethernet/intel/igb/igb_main.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/net/ethernet/intel/igb/igb_main.c
> > b/drivers/net/ethernet/intel/igb/igb_main.c
> > index e647cc89c239..5e24b7ce5a92 100644
> > --- a/drivers/net/ethernet/intel/igb/igb_main.c
> > +++ b/drivers/net/ethernet/intel/igb/igb_main.c
> > @@ -8104,7 +8104,7 @@ static int igb_poll(struct napi_struct *napi,
> > int
> > budget)
> >  	if (likely(napi_complete_done(napi, work_done)))
> >  		igb_ring_irq_enable(q_vector);
> >
> > -	return min(work_done, budget - 1);
> > +	return work_done;
> >  }
> >
> >  /**
> > --
> > 2.33.1
> 
> Tested and looks ok, thanks!

Tested-By: Danielle Ratson <danieller@nvidia.com>
patchwork-bot+netdevbpf@kernel.org Nov. 25, 2021, 3:50 p.m. UTC | #7
Hello:

This patch was applied to netdev/net.git (master)
by Jakub Kicinski <kuba@kernel.org>:

On Tue, 23 Nov 2021 12:40:00 -0800 you wrote:
> Oleksandr brought a bug report where netpoll causes trace
> messages in the log on igb.
> 
> Danielle brought this back up as still occuring, so we'll try
> again.
> 
> [22038.710800] ------------[ cut here ]------------
> [22038.710801] igb_poll+0x0/0x1440 [igb] exceeded budget in poll
> [22038.710802] WARNING: CPU: 12 PID: 40362 at net/core/netpoll.c:155 netpoll_poll_dev+0x18a/0x1a0
> 
> [...]

Here is the summary with links:
  - [net,v2] igb: fix netpoll exit with traffic
    https://git.kernel.org/netdev/net/c/eaeace60778e

You are awesome, thank you!
diff mbox series

Patch

diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index e647cc89c239..5e24b7ce5a92 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -8104,7 +8104,7 @@  static int igb_poll(struct napi_struct *napi, int budget)
 	if (likely(napi_complete_done(napi, work_done)))
 		igb_ring_irq_enable(q_vector);
 
-	return min(work_done, budget - 1);
+	return work_done;
 }
 
 /**