diff mbox series

[1/2] net: axienet: recheck condition after timeout in mdio_wait()

Message ID 20181030093139.10226-2-kurt@linutronix.de (mailing list archive)
State New, archived
Headers show
Series net: xlinx: mdio: recheck condition after timeout | expand

Commit Message

Kurt Kanzenbach Oct. 30, 2018, 9:31 a.m. UTC
The function could report a false positive if it gets preempted between reading
the XAE_MDIO_MCR_OFFSET register and checking for the timeout.  In such a case,
the condition has to be rechecked to avoid false positives.

Therefore, check for expected condition even after the timeout occurred.

Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
---
 drivers/net/ethernet/xilinx/xilinx_axienet_mdio.c | 21 ++++++++++++++++-----
 1 file changed, 16 insertions(+), 5 deletions(-)

Comments

David Miller Oct. 30, 2018, 6:25 p.m. UTC | #1
From: Kurt Kanzenbach <kurt@linutronix.de>
Date: Tue, 30 Oct 2018 10:31:38 +0100

> The function could report a false positive if it gets preempted between reading
> the XAE_MDIO_MCR_OFFSET register and checking for the timeout.  In such a case,
> the condition has to be rechecked to avoid false positives.
> 
> Therefore, check for expected condition even after the timeout occurred.
> 
> Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
 ...
>  		if (time_before_eq(end, jiffies)) {
> -			WARN_ON(1);
> -			return -ETIMEDOUT;
> +			val = axienet_ior(lp, XAE_MDIO_MCR_OFFSET);
> +			break;
>  		}
> +
>  		udelay(1);
>  	}
> -	return 0;
> +	if (val & XAE_MDIO_MCR_READY_MASK)
> +		return 0;
> +
> +	WARN_ON(1);
> +	return -ETIMEDOUT;

You are not fundamentally changing the situation at all.

The condtion could change right after your last read of
XAR_MDIO_MCR_OFFSET, which is the same thing that happens before your
modifications to this code.

It sounds more like the timeout is slightly too short, and that's the
real problem that causes whatever behavior you think you are fixing
here.

I'm not applying this.
Kurt Kanzenbach Oct. 31, 2018, 1:06 p.m. UTC | #2
On Tue, Oct 30, 2018 at 11:25:11AM -0700, David Miller wrote:
> From: Kurt Kanzenbach <kurt@linutronix.de>
> Date: Tue, 30 Oct 2018 10:31:38 +0100
>
> > The function could report a false positive if it gets preempted between reading
> > the XAE_MDIO_MCR_OFFSET register and checking for the timeout.  In such a case,
> > the condition has to be rechecked to avoid false positives.
> >
> > Therefore, check for expected condition even after the timeout occurred.
> >
> > Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
>  ...
> >  		if (time_before_eq(end, jiffies)) {
> > -			WARN_ON(1);
> > -			return -ETIMEDOUT;
> > +			val = axienet_ior(lp, XAE_MDIO_MCR_OFFSET);
> > +			break;
> >  		}
> > +
> >  		udelay(1);
> >  	}
> > -	return 0;
> > +	if (val & XAE_MDIO_MCR_READY_MASK)
> > +		return 0;
> > +
> > +	WARN_ON(1);
> > +	return -ETIMEDOUT;
>
> You are not fundamentally changing the situation at all.
>
> The condtion could change right after your last read of
> XAR_MDIO_MCR_OFFSET, which is the same thing that happens before your
> modifications to this code.

That's true. The problem is different: If the current task gets
preempted by a higher priority task between checking the condition and
the timeout code, then a timeout might be falsely detected. Consider the
following events:

loop:
 check mdio condition
 ------------------------
 task with real time priority may run for a long time
 ------------------------
 check for timeout
 wait

That's why I've added the recheck of the condition in the timeout case.

>
> It sounds more like the timeout is slightly too short, and that's the
> real problem that causes whatever behavior you think you are fixing
> here.

The timeout value is not the problem here.

Thanks,
Kurt
Sebastian Andrzej Siewior Nov. 5, 2018, 9:16 a.m. UTC | #3
On 2018-10-30 11:25:11 [-0700], David Miller wrote:
> From: Kurt Kanzenbach <kurt@linutronix.de>
> Date: Tue, 30 Oct 2018 10:31:38 +0100
> 
> > The function could report a false positive if it gets preempted between reading
> > the XAE_MDIO_MCR_OFFSET register and checking for the timeout.  In such a case,
> > the condition has to be rechecked to avoid false positives.
> > 
> > Therefore, check for expected condition even after the timeout occurred.
> > 
> > Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
>  ...
> >  		if (time_before_eq(end, jiffies)) {
> > -			WARN_ON(1);
> > -			return -ETIMEDOUT;
> > +			val = axienet_ior(lp, XAE_MDIO_MCR_OFFSET);
> > +			break;
> >  		}
> > +
> >  		udelay(1);
> >  	}
> > -	return 0;
> > +	if (val & XAE_MDIO_MCR_READY_MASK)
> > +		return 0;
> > +
> > +	WARN_ON(1);
> > +	return -ETIMEDOUT;
> 
> You are not fundamentally changing the situation at all.

> The condtion could change right after your last read of
> XAR_MDIO_MCR_OFFSET, which is the same thing that happens before your
> modifications to this code.
> 
> It sounds more like the timeout is slightly too short, and that's the
> real problem that causes whatever behavior you think you are fixing
> here.

There is a timeout of two jiffies. If the condition is not true within
those two jiffies it will attempt to check condition one last time after
the timeout occured.
If the task got preempted after the reading from the register but before
the timeout it is possible that the task gets back on the CPU after the
timeout occured. And since the timeout occured it won't check if the
condition changed:
Time
 0   +---+
     | c | Check for condition (false)
     | c |
     | c |
     | c |
     | c |
     | P | Task gets preempted
     |   |
     | O | Condition is true, task still preempted, no check
     |   |
 2   | T | The timeout is true
     |   |
     |   |
     |   |
     | p | Task gets back on the CPU, no re-check of condition

In the last step, there is no checking of the condition after the
timeout occured and it wrongly assumes that the condition is not true.
Increasing the timeout would help as long as the task gets not preempted
past the new timeout.
The same pattern (check condition after timeout) is also used in
wait_event_timeout() or readx_poll_timeout().  Would you prefer to
refactor this with readx_poll_timeout() instead?

> I'm not applying this.
Please reconsider.

Sebastian
diff mbox series

Patch

diff --git a/drivers/net/ethernet/xilinx/xilinx_axienet_mdio.c b/drivers/net/ethernet/xilinx/xilinx_axienet_mdio.c
index 757a3b37ae8a..4f13125e4942 100644
--- a/drivers/net/ethernet/xilinx/xilinx_axienet_mdio.c
+++ b/drivers/net/ethernet/xilinx/xilinx_axienet_mdio.c
@@ -21,15 +21,26 @@ 
 int axienet_mdio_wait_until_ready(struct axienet_local *lp)
 {
 	unsigned long end = jiffies + 2;
-	while (!(axienet_ior(lp, XAE_MDIO_MCR_OFFSET) &
-		 XAE_MDIO_MCR_READY_MASK)) {
+	u32 val;
+
+	while (1) {
+		val = axienet_ior(lp, XAE_MDIO_MCR_OFFSET);
+
+		if (val & XAE_MDIO_MCR_READY_MASK)
+			return 0;
+
 		if (time_before_eq(end, jiffies)) {
-			WARN_ON(1);
-			return -ETIMEDOUT;
+			val = axienet_ior(lp, XAE_MDIO_MCR_OFFSET);
+			break;
 		}
+
 		udelay(1);
 	}
-	return 0;
+	if (val & XAE_MDIO_MCR_READY_MASK)
+		return 0;
+
+	WARN_ON(1);
+	return -ETIMEDOUT;
 }
 
 /**