[net] net: macb: Fix lost RX packet wakeup race in NAPI receive

Message ID 20220228183328.338143-1-robert.hancock@calian.com (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers
Series [net] net: macb: Fix lost RX packet wakeup race in NAPI receive

Checks

Context                         Check    Description
netdev/tree_selection           success  Clearly marked for net
netdev/fixes_present            success  Fixes tag present in non-next series
netdev/subject_prefix           success  Link
netdev/cover_letter             success  Single patches do not need cover letters
netdev/patch_count              success  Link
netdev/header_inline            success  No static functions without inline keyword in header files
netdev/build_32bit              success  Errors and warnings before: 25 this patch: 25
netdev/cc_maintainers           success  CCed 6 of 6 maintainers
netdev/build_clang              success  Errors and warnings before: 0 this patch: 0
netdev/module_param             success  Was 0 now: 0
netdev/verify_signedoff         success  Signed-off-by tag matches author and committer
netdev/verify_fixes             success  Fixes tag looks correct
netdev/build_allmodconfig_warn  success  Errors and warnings before: 25 this patch: 25
netdev/checkpatch               warning  WARNING: line length of 82 exceeds 80 columns
netdev/kdoc                     success  Errors and warnings before: 0 this patch: 0
netdev/source_inline            success  Was 0 now: 0

Commit Message

Robert Hancock Feb. 28, 2022, 6:33 p.m. UTC
There is an oddity in the way the RSR register flags propagate to the
ISR register (and the actual interrupt output) on this hardware: it
appears that RSR register bits only result in ISR being asserted if the
interrupt was actually enabled at the time, so enabling interrupts with
RSR bits already set doesn't trigger an interrupt to be raised. There
was already a partial fix for this race in the macb_poll function where
it checked for RSR bits being set and re-triggered NAPI receive.
However, there was still a race window between checking RSR and
actually enabling interrupts, where a lost wakeup could happen. It's
necessary to check again after enabling interrupts to see if RSR was set
just prior to the interrupt being enabled, and re-trigger receive in that
case.

This issue was noticed in a point-to-point UDP request-response protocol
which periodically saw timeouts or abnormally high response times due to
received packets not being processed in a timely fashion. In many
applications, more packets arriving, including TCP retransmissions, would
cause the original packet to be processed, thus masking the issue.

Also change from using napi_reschedule to napi_schedule, as the only
difference is the presence of a return value which wasn't used here
anyway.

Fixes: 02f7a34f34e3 ("net: macb: Re-enable RX interrupt only when RX is done")
Cc: stable@vger.kernel.org
Co-developed-by: Scott McNutt <scott.mcnutt@siriusxm.com>
Signed-off-by: Scott McNutt <scott.mcnutt@siriusxm.com>
Signed-off-by: Robert Hancock <robert.hancock@calian.com>
---
 drivers/net/ethernet/cadence/macb_main.c | 26 ++++++++++++++++++++++--
 1 file changed, 24 insertions(+), 2 deletions(-)
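
The race described in the commit message can be modeled in user space. The
following is a minimal sketch, not driver code: struct mac_model and the
model_* functions are hypothetical names introduced for illustration, and the
hardware behaviour it encodes (an RSR bit only asserts ISR if the interrupt is
enabled at the moment the bit is set) is the behaviour the commit message
reports observing, not something taken from a datasheet.

```c
#include <stdbool.h>

/* Hypothetical user-space model of the RX interrupt logic; not driver code. */
struct mac_model {
	bool rsr;         /* RSR: a received packet is pending */
	bool irq_enabled; /* RX interrupt currently enabled (IER vs. IDR) */
	bool isr;         /* ISR: interrupt asserted, handler would run */
};

/* A packet arrives: RSR is always set, but ISR is only asserted if the
 * interrupt is enabled at that moment.
 */
static void model_packet_arrives(struct mac_model *m)
{
	m->rsr = true;
	if (m->irq_enabled)
		m->isr = true;
}

/* End of a poll that used less than its budget. Returns true if receive
 * processing will run again (rescheduled here, or an interrupt is pending),
 * false if nothing is left to wake it up.
 * packet_in_window injects an arrival between the RSR check and the
 * interrupt enable, which is the window the patch closes.
 */
static bool model_poll_complete(struct mac_model *m, bool double_check,
				bool packet_in_window)
{
	if (m->rsr)
		return true; /* first check: reschedule NAPI */

	if (packet_in_window)
		model_packet_arrives(m);

	/* Models the IER write: enabling does not assert ISR for RSR bits
	 * that were already set.
	 */
	m->irq_enabled = true;

	if (double_check && m->rsr) {
		m->irq_enabled = false; /* models the IDR write */
		return true;            /* models napi_schedule() */
	}

	return m->isr;
}
```

With a packet injected into the window, model_poll_complete() returns false
when double_check is false (RSR is set, but neither an interrupt nor a
reschedule is pending) and true when double_check is true.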

Comments

Jakub Kicinski March 3, 2022, 6:40 a.m. UTC | #1
On Mon, 28 Feb 2022 12:33:28 -0600 Robert Hancock wrote:
> There is an oddity in the way the RSR register flags propagate to the
> ISR register (and the actual interrupt output) on this hardware: it
> appears that RSR register bits only result in ISR being asserted if the
> interrupt was actually enabled at the time, so enabling interrupts with
> RSR bits already set doesn't trigger an interrupt to be raised. There
> was already a partial fix for this race in the macb_poll function where
> it checked for RSR bits being set and re-triggered NAPI receive.
> However, there was still a race window between checking RSR and
> actually enabling interrupts, where a lost wakeup could happen. It's
> necessary to check again after enabling interrupts to see if RSR was set
> just prior to the interrupt being enabled, and re-trigger receive in that
> case.
> 
> This issue was noticed in a point-to-point UDP request-response protocol
> which periodically saw timeouts or abnormally high response times due to
> received packets not being processed in a timely fashion. In many
> applications, more packets arriving, including TCP retransmissions, would
> cause the original packet to be processed, thus masking the issue.
> 
> Also change from using napi_reschedule to napi_schedule, as the only
> difference is the presence of a return value which wasn't used here
> anyway.

Let's leave that out from this particular patch - fixes should be
minimal, this sounds like cleanup.

> Fixes: 02f7a34f34e3 ("net: macb: Re-enable RX interrupt only when RX is done")
> Cc: stable@vger.kernel.org
> Co-developed-by: Scott McNutt <scott.mcnutt@siriusxm.com>
> Signed-off-by: Scott McNutt <scott.mcnutt@siriusxm.com>
> Signed-off-by: Robert Hancock <robert.hancock@calian.com>
> ---
>  drivers/net/ethernet/cadence/macb_main.c | 26 ++++++++++++++++++++++--
>  1 file changed, 24 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c
> index 98498a76ae16..338660fe1d93 100644
> --- a/drivers/net/ethernet/cadence/macb_main.c
> +++ b/drivers/net/ethernet/cadence/macb_main.c
> @@ -1573,14 +1573,36 @@ static int macb_poll(struct napi_struct *napi, int budget)
>  	if (work_done < budget) {
>  		napi_complete_done(napi, work_done);
>  
> -		/* Packets received while interrupts were disabled */
> +		/* RSR bits only seem to propagate to raise interrupts when
> +		 * interrupts are enabled at the time, so if bits are already
> +		 * set due to packets received while interrupts were disabled,
> +		 * they will not cause another interrupt to be generated when
> +		 * interrupts are re-enabled.
> +		 * Check for this case here.
> +		 */
>  		status = macb_readl(bp, RSR);

Which case is more likely - status == 0 or != 0?

MMIO reads are usually expensive, so if status is likely
to be zero, your other suggestion could have lower overhead.
It'd be good to mention this expectation in the commit message
or in a comment here.

>  		if (status) {
>  			if (bp->caps & MACB_CAPS_ISR_CLEAR_ON_WRITE)
>  				queue_writel(queue, ISR, MACB_BIT(RCOMP));
> -			napi_reschedule(napi);
> +			napi_schedule(napi);
>  		} else {
>  			queue_writel(queue, IER, bp->rx_intr_mask);
> +
> +			/* Packets could have been received in the window
> +			 * between the check above and re-enabling interrupts.
> +			 * Therefore, a double-check is required to avoid
> +			 * losing a wakeup. This can potentially race with
> +			 * the interrupt handler doing the same actions if an
> +			 * interrupt is raised just after enabling them, but
> +			 * this should be harmless.
> +			 */
> +			status = macb_readl(bp, RSR);
> +			if (unlikely(status)) {
> +				queue_writel(queue, IDR, bp->rx_intr_mask);
> +				if (bp->caps & MACB_CAPS_ISR_CLEAR_ON_WRITE)
> +					queue_writel(queue, ISR, MACB_BIT(RCOMP));
> +				napi_schedule(napi);
> +			}
>  		}
>  	}
>
Robert Hancock March 3, 2022, 4:01 p.m. UTC | #2
On Wed, 2022-03-02 at 22:40 -0800, Jakub Kicinski wrote:
> On Mon, 28 Feb 2022 12:33:28 -0600 Robert Hancock wrote:
> > There is an oddity in the way the RSR register flags propagate to the
> > ISR register (and the actual interrupt output) on this hardware: it
> > appears that RSR register bits only result in ISR being asserted if the
> > interrupt was actually enabled at the time, so enabling interrupts with
> > RSR bits already set doesn't trigger an interrupt to be raised. There
> > was already a partial fix for this race in the macb_poll function where
> > it checked for RSR bits being set and re-triggered NAPI receive.
> > However, there was still a race window between checking RSR and
> > actually enabling interrupts, where a lost wakeup could happen. It's
> > necessary to check again after enabling interrupts to see if RSR was set
> > just prior to the interrupt being enabled, and re-trigger receive in that
> > case.
> > 
> > This issue was noticed in a point-to-point UDP request-response protocol
> > which periodically saw timeouts or abnormally high response times due to
> > received packets not being processed in a timely fashion. In many
> > applications, more packets arriving, including TCP retransmissions, would
> > cause the original packet to be processed, thus masking the issue.
> > 
> > Also change from using napi_reschedule to napi_schedule, as the only
> > difference is the presence of a return value which wasn't used here
> > anyway.
> 
> Let's leave that out from this particular patch - fixes should be
> minimal, this sounds like cleanup.

Can do.

> 
> > Fixes: 02f7a34f34e3 ("net: macb: Re-enable RX interrupt only when RX is done")
> > Cc: stable@vger.kernel.org
> > Co-developed-by: Scott McNutt <scott.mcnutt@siriusxm.com>
> > Signed-off-by: Scott McNutt <scott.mcnutt@siriusxm.com>
> > Signed-off-by: Robert Hancock <robert.hancock@calian.com>
> > ---
> >  drivers/net/ethernet/cadence/macb_main.c | 26 ++++++++++++++++++++++--
> >  1 file changed, 24 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c
> > index 98498a76ae16..338660fe1d93 100644
> > --- a/drivers/net/ethernet/cadence/macb_main.c
> > +++ b/drivers/net/ethernet/cadence/macb_main.c
> > @@ -1573,14 +1573,36 @@ static int macb_poll(struct napi_struct *napi, int budget)
> >  	if (work_done < budget) {
> >  		napi_complete_done(napi, work_done);
> >  
> > -		/* Packets received while interrupts were disabled */
> > +		/* RSR bits only seem to propagate to raise interrupts when
> > +		 * interrupts are enabled at the time, so if bits are already
> > +		 * set due to packets received while interrupts were disabled,
> > +		 * they will not cause another interrupt to be generated when
> > +		 * interrupts are re-enabled.
> > +		 * Check for this case here.
> > +		 */
> >  		status = macb_readl(bp, RSR);
> 
> Which case is more likely - status == 0 or != 0?
> 
> MMIO reads are usually expensive, so if status is likely
> to be zero, your other suggestion could have lower overhead.
> It'd be good to mention this expectation in the commit message
> or in a comment here.

There was some measurement done on this that motivated a previous patch in this
area:

commit 504ad98df3a6b027ce997ca8f620e949cafb151f
Author: Soren Brinkmann <soren.brinkmann@xilinx.com>
Date:   Sun May 4 15:43:01 2014 -0700

    net: macb: Remove 'unlikely' optimization
    
    Coverage data suggests that the unlikely case of receiving data while
    the receive handler is running may not be that unlikely.
    Coverage data after running iperf for a while:
        91320:  891:        work_done = bp->macbgem_ops.mog_rx(bp, budget);
        91320:  892:        if (work_done < budget) {
         2362:  893:                napi_complete(napi);
            -:  894:
            -:  895:                /* Packets received while interrupts were disabled */
         4724:  896:                status = macb_readl(bp, RSR);
         2362:  897:                if (unlikely(status)) {
          762:  898:                        if (bp->caps & MACB_CAPS_ISR_CLEAR_ON_WRITE)
          762:  899:                                macb_writel(bp, ISR, MACB_BIT(RCOMP));
            -:  900:                        napi_reschedule(napi);
            -:  901:                } else {
         1600:  902:                        macb_writel(bp, IER, MACB_RX_INT_FLAGS);
            -:  903:                }
            -:  904:        }
    
    Signed-off-by: Soren Brinkmann <soren.brinkmann@xilinx.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

So it looks like the non-zero status case was being hit roughly 1/3 of the
time, at least under that particular workload. It may depend heavily on
workload etc. but doesn't seem to be clear-cut to optimize one way or the
other.

For the new "double check" branch, from adding debug in, it appears that one is
hit on the order of a few dozen times a day under constant load, so the
"unlikely" seems appropriate there.
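
For reference, the "roughly 1/3" figure follows from the gcov counts quoted
above: 762 non-zero reads out of 2362 completions, with the remaining 1600
taking the IER path. A small sketch of that arithmetic, where the helper name
is hypothetical and only the counts come from the quoted coverage data:

```c
/* Fraction of poll completions that saw a non-zero RSR.
 * Hypothetical helper; the caller supplies the counts from the coverage data.
 */
static double nonzero_status_fraction(unsigned long nonzero,
				      unsigned long total)
{
	if (total == 0)
		return 0.0;

	return (double)nonzero / (double)total;
}
```

Calling nonzero_status_fraction(762, 2362) gives a value a little above 0.32,
which matches the one-in-three estimate for that iperf workload.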

> 
> >  		if (status) {
> >  			if (bp->caps & MACB_CAPS_ISR_CLEAR_ON_WRITE)
> >  				queue_writel(queue, ISR, MACB_BIT(RCOMP));
> > -			napi_reschedule(napi);
> > +			napi_schedule(napi);
> >  		} else {
> >  			queue_writel(queue, IER, bp->rx_intr_mask);
> > +
> > +			/* Packets could have been received in the window
> > +			 * between the check above and re-enabling interrupts.
> > +			 * Therefore, a double-check is required to avoid
> > +			 * losing a wakeup. This can potentially race with
> > +			 * the interrupt handler doing the same actions if an
> > +			 * interrupt is raised just after enabling them, but
> > +			 * this should be harmless.
> > +			 */
> > +			status = macb_readl(bp, RSR);
> > +			if (unlikely(status)) {
> > +				queue_writel(queue, IDR, bp->rx_intr_mask);
> > +				if (bp->caps & MACB_CAPS_ISR_CLEAR_ON_WRITE)
> > +					queue_writel(queue, ISR, MACB_BIT(RCOMP));
> > +				napi_schedule(napi);
> > +			}
> >  		}
> >  	}
> >

Patch

diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c
index 98498a76ae16..338660fe1d93 100644
--- a/drivers/net/ethernet/cadence/macb_main.c
+++ b/drivers/net/ethernet/cadence/macb_main.c
@@ -1573,14 +1573,36 @@  static int macb_poll(struct napi_struct *napi, int budget)
 	if (work_done < budget) {
 		napi_complete_done(napi, work_done);
 
-		/* Packets received while interrupts were disabled */
+		/* RSR bits only seem to propagate to raise interrupts when
+		 * interrupts are enabled at the time, so if bits are already
+		 * set due to packets received while interrupts were disabled,
+		 * they will not cause another interrupt to be generated when
+		 * interrupts are re-enabled.
+		 * Check for this case here.
+		 */
 		status = macb_readl(bp, RSR);
 		if (status) {
 			if (bp->caps & MACB_CAPS_ISR_CLEAR_ON_WRITE)
 				queue_writel(queue, ISR, MACB_BIT(RCOMP));
-			napi_reschedule(napi);
+			napi_schedule(napi);
 		} else {
 			queue_writel(queue, IER, bp->rx_intr_mask);
+
+			/* Packets could have been received in the window
+			 * between the check above and re-enabling interrupts.
+			 * Therefore, a double-check is required to avoid
+			 * losing a wakeup. This can potentially race with
+			 * the interrupt handler doing the same actions if an
+			 * interrupt is raised just after enabling them, but
+			 * this should be harmless.
+			 */
+			status = macb_readl(bp, RSR);
+			if (unlikely(status)) {
+				queue_writel(queue, IDR, bp->rx_intr_mask);
+				if (bp->caps & MACB_CAPS_ISR_CLEAR_ON_WRITE)
+					queue_writel(queue, ISR, MACB_BIT(RCOMP));
+				napi_schedule(napi);
+			}
 		}
 	}