[2/3] usb: dwc3: ulpi: Replace CPU-based busyloop with Protocol-based one

Message ID 20201010222351.7323-3-Sergey.Semin@baikalelectronics.ru (mailing list archive)
State Superseded
Series usb: dwc3: ulpi: Fix UPLI registers read/write ops

Commit Message

Serge Semin Oct. 10, 2020, 10:23 p.m. UTC
Originally the detection of a finished ULPI transaction was implemented as
a simple busy-loop that only decremented a counter, with no delays. That is
wrong, since the loop takes a different amount of time to complete on
different systems. If the system bus and CPU are fast enough to outrun the
ULPI bus and the companion PHY reaction, we get a false timeout error. Fix
this by making the busy-loop take the standard bus speed, the address value
and the register access mode into account when calculating its delay.

Here is how the fix works. The ULPI bus is known to be clocked with a 60MHz
signal. In accordance with [1], the ULPI bus protocol is designed to spend 5
and 6 clock periods on immediate register write and read operations
respectively, and 6 and 7 clock periods on extended register writes and
reads. Based on that we can easily pre-calculate the time the controller
needs to perform a requested IO operation. Note that we still preserve the
attempts counter in case the DWC USB3 controller has some internal delays.

[1] UTMI+ Low Pin Interface (ULPI) Specification, Revision 1.1,
    October 20, 2004, pp. 30 - 36.

Fixes: 88bc9d194ff6 ("usb: dwc3: add ULPI interface support")
Signed-off-by: Serge Semin <Sergey.Semin@baikalelectronics.ru>
---
 drivers/usb/dwc3/ulpi.c | 18 +++++++++++++++---
 1 file changed, 15 insertions(+), 3 deletions(-)
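
The delay calculation described in the commit message can be sketched as a
small standalone C fragment. This is only an illustration, not the driver
code: ulpi_access_ns() and ULPI_EXT_THRESHOLD are hypothetical names, and the
0x80 threshold is an assumed value for ULPI_EXT_VENDOR_SPECIFIC. The macros
mirror NSEC_PER_SEC and DIV_ROUND_UP so the fragment builds outside the
kernel.

```c
#include <stdbool.h>
#include <stdint.h>

/* Userspace stand-ins for the kernel macros used by the patch. */
#define NSEC_PER_SEC		1000000000L
#define DIV_ROUND_UP(n, d)	(((n) + (d) - 1) / (d))

/* One period of the 60 MHz ULPI clock, rounded up to whole nanoseconds. */
#define ULPI_TICK_NS		DIV_ROUND_UP(NSEC_PER_SEC, 60000000L)

/* Assumption: addresses at or above this value use extended access. */
#define ULPI_EXT_THRESHOLD	0x80

/*
 * Hypothetical helper: expected duration of one ULPI register access.
 * 5 ticks for an immediate write, plus one tick for an extended
 * address, plus one tick for a read.
 */
static unsigned long ulpi_access_ns(uint8_t addr, bool read)
{
	unsigned long ns = 5L * ULPI_TICK_NS;

	if (addr >= ULPI_EXT_THRESHOLD)
		ns += ULPI_TICK_NS;

	if (read)
		ns += ULPI_TICK_NS;

	return ns;
}
```

With the period rounded up to 17 ns, an immediate write comes out at 85 ns,
an immediate read or an extended write at 102 ns, and an extended read at
119 ns.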

Comments

Felipe Balbi Oct. 27, 2020, 9:18 a.m. UTC | #1
Hi,

Serge Semin <Sergey.Semin@baikalelectronics.ru> writes:

> Originally the procedure of the ULPI transaction finish detection has been
> developed as a simple busy-loop with just decrementing counter and no
> delays. It's wrong since on different systems the loop will take a
> different time to complete. So if the system bus and CPU are fast enough
> to overtake the ULPI bus and the companion PHY reaction, then we'll get to
> take a false timeout error. Fix this by converting the busy-loop procedure
> to take the standard bus speed, address value and the registers access
> mode into account for the busy-loop delay calculation.
>
> Here is the way the fix works. It's known that the ULPI bus is clocked
> with 60MHz signal. In accordance with [1] the ULPI bus protocol is created
> so to spend 5 and 6 clock periods for immediate register write and read
> operations respectively, and 6 and 7 clock periods - for the extended
> register writes and reads. Based on that we can easily pre-calculate the
> time which will be needed for the controller to perform a requested IO
> operation. Note we'll still preserve the attempts counter in case if the
> DWC USB3 controller has got some internals delays.
>
> [1] UTMI+ Low Pin Interface (ULPI) Specification, Revision 1.1,
>     October 20, 2004, pp. 30 - 36.
>
> Fixes: 88bc9d194ff6 ("usb: dwc3: add ULPI interface support")
> Signed-off-by: Serge Semin <Sergey.Semin@baikalelectronics.ru>
> ---
>  drivers/usb/dwc3/ulpi.c | 18 +++++++++++++++---
>  1 file changed, 15 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/usb/dwc3/ulpi.c b/drivers/usb/dwc3/ulpi.c
> index 20f5d9aba317..0dbc826355a5 100644
> --- a/drivers/usb/dwc3/ulpi.c
> +++ b/drivers/usb/dwc3/ulpi.c
> @@ -7,6 +7,8 @@
>   * Author: Heikki Krogerus <heikki.krogerus@linux.intel.com>
>   */
>  
> +#include <linux/delay.h>
> +#include <linux/time64.h>
>  #include <linux/ulpi/regs.h>
>  
>  #include "core.h"
> @@ -17,12 +19,22 @@
>  		DWC3_GUSB2PHYACC_ADDR(ULPI_ACCESS_EXTENDED) | \
>  		DWC3_GUSB2PHYACC_EXTEND_ADDR(a) : DWC3_GUSB2PHYACC_ADDR(a))
>  
> -static int dwc3_ulpi_busyloop(struct dwc3 *dwc)
> +#define DWC3_ULPI_BASE_DELAY	DIV_ROUND_UP(NSEC_PER_SEC, 60000000L)
> +
> +static int dwc3_ulpi_busyloop(struct dwc3 *dwc, u8 addr, bool read)
>  {
> +	unsigned long ns = 5L * DWC3_ULPI_BASE_DELAY;
>  	unsigned count = 1000;
>  	u32 reg;
>  
> +	if (addr >= ULPI_EXT_VENDOR_SPECIFIC)
> +		ns += DWC3_ULPI_BASE_DELAY;
> +
> +	if (read)
> +		ns += DWC3_ULPI_BASE_DELAY;
> +
>  	while (count--) {
> +		ndelay(ns);

could we allow for a sleep here instead of a delay? Also, I wonder if
you need to make this so complex or should we just take the larger
access time of 7 clock cycles.
Serge Semin Oct. 27, 2020, 9:06 p.m. UTC | #2
On Tue, Oct 27, 2020 at 11:18:51AM +0200, Felipe Balbi wrote:

> could we allow for a sleep here instead of a delay?

The kernel ULPI-bus API isn't clear about that. I couldn't find an example of
the ULPI-bus accessors either being used in atomic context or being
implemented with sleeping methods, so there is no definite answer to your
question. Anyway, I added an ms-sleep in a later patch to fix the
suspend-regression problem. I thought that was reasonable for the same reason:
I couldn't find any user calling the accessors in atomic context.

Regarding this patch: I wouldn't suggest replacing the ndelay with a sleep
here, since 5-7 ref clock ticks are enough to finish the transaction in the
vast majority of cases. That's just roughly 80 - 115 ns, which the sleeping
procedures can't achieve.

> Also, I wonder if
> you need to make this so complex or should we just take the larger
> access time of 7 clock cycles.

I wouldn't say it's complex. Here I've implemented a simple calculation of the
time needed to finish the ULPI-bus commands in accordance with the number of
ticks they normally require. As for the while-loop, alas, we can't get rid of
it here for the reason I've described in patch 3 of the series.

-Sergey

> 
> -- 
> balbi
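
To put numbers on the two options discussed in this exchange, here is a rough
C sketch comparing the per-access delay from the patch with the fixed 7-tick
delay suggested in the review. All function names are hypothetical; the 17 ns
tick is the 60 MHz period rounded up, and 1000 is the attempts count used in
the patch.

```c
#include <stdbool.h>

#define ULPI_TICK_NS		17UL	/* one 60 MHz period, rounded up */
#define ULPI_MAX_ATTEMPTS	1000UL	/* attempts count used in the patch */

/* Patch approach: 5 ticks, plus one if extended, plus one if read. */
static unsigned long delay_per_access_ns(bool extended, bool read)
{
	return (5UL + extended + read) * ULPI_TICK_NS;
}

/* Review suggestion: always wait for the longest access, 7 ticks. */
static unsigned long delay_fixed_ns(void)
{
	return 7UL * ULPI_TICK_NS;
}

/* Extra time per poll that the fixed variant spends on a given access. */
static unsigned long fixed_overhead_ns(bool extended, bool read)
{
	return delay_fixed_ns() - delay_per_access_ns(extended, read);
}

/* Upper bound on the time spent in delays before the loop gives up. */
static unsigned long worst_case_timeout_ns(unsigned long delay_ns)
{
	return ULPI_MAX_ATTEMPTS * delay_ns;
}
```

Under these assumptions the fixed variant costs at most 34 ns more per poll
(for an immediate write), and the total time spent delaying before a timeout
is bounded by 85 us to 119 us depending on the access type. Both figures are
far below what a sleeping primitive can resolve.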
Patch

diff --git a/drivers/usb/dwc3/ulpi.c b/drivers/usb/dwc3/ulpi.c
index 20f5d9aba317..0dbc826355a5 100644
--- a/drivers/usb/dwc3/ulpi.c
+++ b/drivers/usb/dwc3/ulpi.c
@@ -7,6 +7,8 @@ 
  * Author: Heikki Krogerus <heikki.krogerus@linux.intel.com>
  */
 
+#include <linux/delay.h>
+#include <linux/time64.h>
 #include <linux/ulpi/regs.h>
 
 #include "core.h"
@@ -17,12 +19,22 @@ 
 		DWC3_GUSB2PHYACC_ADDR(ULPI_ACCESS_EXTENDED) | \
 		DWC3_GUSB2PHYACC_EXTEND_ADDR(a) : DWC3_GUSB2PHYACC_ADDR(a))
 
-static int dwc3_ulpi_busyloop(struct dwc3 *dwc)
+#define DWC3_ULPI_BASE_DELAY	DIV_ROUND_UP(NSEC_PER_SEC, 60000000L)
+
+static int dwc3_ulpi_busyloop(struct dwc3 *dwc, u8 addr, bool read)
 {
+	unsigned long ns = 5L * DWC3_ULPI_BASE_DELAY;
 	unsigned count = 1000;
 	u32 reg;
 
+	if (addr >= ULPI_EXT_VENDOR_SPECIFIC)
+		ns += DWC3_ULPI_BASE_DELAY;
+
+	if (read)
+		ns += DWC3_ULPI_BASE_DELAY;
+
 	while (count--) {
+		ndelay(ns);
 		reg = dwc3_readl(dwc->regs, DWC3_GUSB2PHYACC(0));
 		if (reg & DWC3_GUSB2PHYACC_DONE)
 			return 0;
@@ -47,7 +59,7 @@  static int dwc3_ulpi_read(struct device *dev, u8 addr)
 	reg = DWC3_GUSB2PHYACC_NEWREGREQ | DWC3_ULPI_ADDR(addr);
 	dwc3_writel(dwc->regs, DWC3_GUSB2PHYACC(0), reg);
 
-	ret = dwc3_ulpi_busyloop(dwc);
+	ret = dwc3_ulpi_busyloop(dwc, addr, true);
 	if (ret)
 		return ret;
 
@@ -71,7 +83,7 @@  static int dwc3_ulpi_write(struct device *dev, u8 addr, u8 val)
 	reg |= DWC3_GUSB2PHYACC_WRITE | val;
 	dwc3_writel(dwc->regs, DWC3_GUSB2PHYACC(0), reg);
 
-	return dwc3_ulpi_busyloop(dwc);
+	return dwc3_ulpi_busyloop(dwc, addr, false);
 }
 
 static const struct ulpi_ops dwc3_ulpi_ops = {
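
The behaviour of the reworked loop can be tried out in userspace with a
simulated PHY. The sketch below is an assumption-laden model, not the driver:
struct fake_phy, fake_ndelay(), fake_read_phyacc() and the FAKE_PHYACC_DONE
bit are all made up for illustration, and the -ETIMEDOUT return on exhaustion
is assumed since that part of the function is outside the hunk shown above.

```c
#include <errno.h>
#include <stdint.h>

/* Illustrative "transaction done" bit for the simulated register. */
#define FAKE_PHYACC_DONE	(1u << 24)

/* Simulated PHY state. */
struct fake_phy {
	unsigned int polls_until_done;	/* reads that still see DONE clear */
	unsigned long elapsed_ns;	/* simulated time spent delaying */
};

/* Stand-in for ndelay(): only accounts for the time, does not wait. */
static void fake_ndelay(struct fake_phy *phy, unsigned long ns)
{
	phy->elapsed_ns += ns;
}

/* Stand-in for reading the PHY access register. */
static uint32_t fake_read_phyacc(struct fake_phy *phy)
{
	if (phy->polls_until_done > 0) {
		phy->polls_until_done--;
		return 0;
	}

	return FAKE_PHYACC_DONE;
}

/* Same shape as the patched loop: delay first, then poll, up to 1000 times. */
static int fake_ulpi_busyloop(struct fake_phy *phy, unsigned long ns)
{
	unsigned int count = 1000;

	while (count--) {
		fake_ndelay(phy, ns);
		if (fake_read_phyacc(phy) & FAKE_PHYACC_DONE)
			return 0;
	}

	return -ETIMEDOUT;
}
```

A PHY that is ready on the first poll costs a single delay (119 ns for an
extended read), while one that never reports completion within 1000 polls
ends in -ETIMEDOUT after 119000 ns of simulated delay. Keeping the attempts
counter is what lets the loop absorb extra controller-internal latency.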