diff mbox series

[RFC,2/2] hwmon: (pmbus/ucd9000) Throttle SMBus transfers to avoid poor behaviour

Message ID 20200914122811.3295678-3-andrew@aj.id.au (mailing list archive)
State Changes Requested
Headers show
Series Throttle I2C transfers to UCD9000 devices | expand

Commit Message

Andrew Jeffery Sept. 14, 2020, 12:28 p.m. UTC
Short turn-around times between transfers to e.g. the UCD90320 can lead
to problematic behaviour, including excessive clock stretching, bus
lockups and potential corruption of the device's volatile state.

Introduce transfer throttling for the device with a minimum access
delay of 1ms.

Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
---
 drivers/hwmon/pmbus/ucd9000.c | 6 ++++++
 1 file changed, 6 insertions(+)

Comments

Andrew Jeffery Sept. 15, 2020, 12:09 a.m. UTC | #1
On Mon, 14 Sep 2020, at 23:44, Guenter Roeck wrote:
> On 9/14/20 5:28 AM, Andrew Jeffery wrote:
> > Short turn-around times between transfers to e.g. the UCD90320 can lead
> > to problematic behaviour, including excessive clock stretching, bus
> > lockups and potential corruption of the device's volatile state.
> > 
> > Introduce transfer throttling for the device with a minimum access
> > delay of 1ms.
> > 
> 
> Some Zilker labs devices have the same problem, though not as bad
> to need a 1ms delay. See zl6100.c. Various LTS devices have a similar
> problem, but there it is possible to poll the device until it is ready.
> See ltc2978.c.

Ah, this kind of info is what I was hoping for. Thanks.

> 
> > Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
> > ---
> >  drivers/hwmon/pmbus/ucd9000.c | 6 ++++++
> >  1 file changed, 6 insertions(+)
> > 
> > diff --git a/drivers/hwmon/pmbus/ucd9000.c b/drivers/hwmon/pmbus/ucd9000.c
> > index 81f4c4f166cd..a0b97d035326 100644
> > --- a/drivers/hwmon/pmbus/ucd9000.c
> > +++ b/drivers/hwmon/pmbus/ucd9000.c
> > @@ -9,6 +9,7 @@
> >  #include <linux/debugfs.h>
> >  #include <linux/kernel.h>
> >  #include <linux/module.h>
> > +#include <linux/moduleparam.h>
> >  #include <linux/of_device.h>
> >  #include <linux/init.h>
> >  #include <linux/err.h>
> > @@ -18,6 +19,9 @@
> >  #include <linux/gpio/driver.h>
> >  #include "pmbus.h"
> >  
> > +static unsigned long smbus_delay_us = 1000;
> 
> Is that to be on the super-safe side ? Patch 0 talks about needing 250 uS.

Yeah, 250us was what we came to after 5 minutes of playing with values and a 
logic analyzer, we didn't really take the time to determine a minimum with 
confidence. TI mentioned the minimum time between transfers in their test 
environment is 2.5ms, so 1ms is aggressive by comparison.

> 
> > +module_param(smbus_delay_us, ulong, 0664);
> > +
> 
> I would not want to have this in user control, and it should not affect devices
> not known to be affected. 

Certainly agree with the latter, and regarding the former, it was mostly a
convenient mechanism for me to experiment with values. I agree that it's
not something we would want to be changed arbitrarily by a system admin.

> I would suggest an implementation similar to other
> affected devices; again, see zl6100.c or ltc2978.c for examples.

Yes, thanks for these pointers, I will take a look.

Andrew
Andrew Jeffery Sept. 16, 2020, 5:21 a.m. UTC | #2
On Mon, 14 Sep 2020, at 23:44, Guenter Roeck wrote:
> On 9/14/20 5:28 AM, Andrew Jeffery wrote:
> > Short turn-around times between transfers to e.g. the UCD90320 can lead
> > to problematic behaviour, including excessive clock stretching, bus
> > lockups and potential corruption of the device's volatile state.
> > 
> > Introduce transfer throttling for the device with a minimum access
> > delay of 1ms.
> > 
> 
> Some Zilker labs devices have the same problem, though not as bad
> to need a 1ms delay. See zl6100.c. Various LTS devices have a similar
> problem, but there it is possible to poll the device until it is ready.
> See ltc2978.c.
> 
> > Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
> > ---
> >  drivers/hwmon/pmbus/ucd9000.c | 6 ++++++
> >  1 file changed, 6 insertions(+)
> > 
> > diff --git a/drivers/hwmon/pmbus/ucd9000.c b/drivers/hwmon/pmbus/ucd9000.c
> > index 81f4c4f166cd..a0b97d035326 100644
> > --- a/drivers/hwmon/pmbus/ucd9000.c
> > +++ b/drivers/hwmon/pmbus/ucd9000.c
> > @@ -9,6 +9,7 @@
> >  #include <linux/debugfs.h>
> >  #include <linux/kernel.h>
> >  #include <linux/module.h>
> > +#include <linux/moduleparam.h>
> >  #include <linux/of_device.h>
> >  #include <linux/init.h>
> >  #include <linux/err.h>
> > @@ -18,6 +19,9 @@
> >  #include <linux/gpio/driver.h>
> >  #include "pmbus.h"
> >  
> > +static unsigned long smbus_delay_us = 1000;
> 
> Is that to be on the super-safe side ? Patch 0 talks about needing 250 uS.
> 
> > +module_param(smbus_delay_us, ulong, 0664);
> > +
> 
> I would not want to have this in user control, and it should not affect devices
> not known to be affected. 

Can you clarify what you mean here? Initially I interpreted your statement as 
meaning "Don't impose delays on the UCD90160 when the issues have only been 
demonstrated with the UCD90320". But I've since looked at zl6100.c and its 
delay is also exposed as a module parameter, which makes me wonder whether it 
was unclear that smbus_delay_us here is specific to the driver's i2c_client and 
is not a delay imposed on all SMBus accesses from the associated master. That 
is, with the implementation I've posted here, other (non-UCD9000) devices on 
the same bus are _not_ impacted by this value.

> I would suggest an implementation similar to other
> affected devices; again, see zl6100.c or ltc2978.c for examples.

I've had a look at these two examples. As you suggest the delays in zl6100.c 
look pretty similar to what this series implements in the i2c core. I'm finding 
it hard to dislodge the feeling that open-coding the waits is error prone, but 
to avoid that and not implement the waits in the i2c core means having almost 
duplicate implementations of handlers for i2c_smbus_{read,write}*() and 
pmbus_{read,write}*() calls in the driver.

Andrew
Guenter Roeck Sept. 16, 2020, 3:56 p.m. UTC | #3
On Wed, Sep 16, 2020 at 02:51:08PM +0930, Andrew Jeffery wrote:
> 
> 
> On Mon, 14 Sep 2020, at 23:44, Guenter Roeck wrote:
> > On 9/14/20 5:28 AM, Andrew Jeffery wrote:
> > > Short turn-around times between transfers to e.g. the UCD90320 can lead
> > > to problematic behaviour, including excessive clock stretching, bus
> > > lockups and potential corruption of the device's volatile state.
> > > 
> > > Introduce transfer throttling for the device with a minimum access
> > > delay of 1ms.
> > > 
> > 
> > Some Zilker labs devices have the same problem, though not as bad
> > to need a 1ms delay. See zl6100.c. Various LTS devices have a similar
> > problem, but there it is possible to poll the device until it is ready.
> > See ltc2978.c.
> > 
> > > Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
> > > ---
> > >  drivers/hwmon/pmbus/ucd9000.c | 6 ++++++
> > >  1 file changed, 6 insertions(+)
> > > 
> > > diff --git a/drivers/hwmon/pmbus/ucd9000.c b/drivers/hwmon/pmbus/ucd9000.c
> > > index 81f4c4f166cd..a0b97d035326 100644
> > > --- a/drivers/hwmon/pmbus/ucd9000.c
> > > +++ b/drivers/hwmon/pmbus/ucd9000.c
> > > @@ -9,6 +9,7 @@
> > >  #include <linux/debugfs.h>
> > >  #include <linux/kernel.h>
> > >  #include <linux/module.h>
> > > +#include <linux/moduleparam.h>
> > >  #include <linux/of_device.h>
> > >  #include <linux/init.h>
> > >  #include <linux/err.h>
> > > @@ -18,6 +19,9 @@
> > >  #include <linux/gpio/driver.h>
> > >  #include "pmbus.h"
> > >  
> > > +static unsigned long smbus_delay_us = 1000;
> > 
> > Is that to be on the super-safe side ? Patch 0 talks about needing 250 uS.
> > 
> > > +module_param(smbus_delay_us, ulong, 0664);
> > > +
> > 
> > I would not want to have this in user control, and it should not affect devices
> > not known to be affected. 
> 
> Can you clarify what you mean here? Initially I interpreted your statement as 
> meaning "Don't impose delays on the UCD90160 when the issues have only been 
> demonstrated with the UCD90320". But I've since looked at zl6100.c and its 

Yes, that is what I meant.

> delay is also exposed as a module parameter, which makes me wonder whether it 

My bad. Not how I would implement it today. I'd also not use udelay() if I were
to re-implement this today.

> was unclear that smbus_delay_us here is specific to the driver's i2c_client and 
> is not a delay imposed on all SMBus accesses from the associated master. That 
> is, with the implementation I've posted here, other (non-UCD9000) devices on 
> the same bus are _not_ impacted by this value.
> 
The delay is specific to the driver's i2c client.

> > I would suggest an implementation similar to other
> > affected devices; again, see zl6100.c or ltc2978.c for examples.
> 
> I've had a look at these two examples. As you suggest the delays in zl6100.c 
> look pretty similar to what this series implements in the i2c core. I'm finding 
> it hard to dislodge the feeling that open-coding the waits is error prone, but 
> to avoid that and not implement the waits in the i2c core means having almost 
> duplicate implementations of handlers for i2c_smbus_{read,write}*() and 
> pmbus_{read,write}*() calls in the driver.
> 

Not sure I can follow you here. Anyway, it seems to me that you are set on
an implementation in the i2c core. I personally don't like that approach,
but I'll accept a change in the ucd9000 driver to make use of it. Please
leave the zl6100 code alone, though - it took me long enough to get that
working, and I won't have time to test any changes.

Thanks,
Guenter
Andrew Jeffery Sept. 17, 2020, 12:33 a.m. UTC | #4
On Thu, 17 Sep 2020, at 01:26, Guenter Roeck wrote:
> > I've had a look at these two examples. As you suggest the delays in zl6100.c 
> > look pretty similar to what this series implements in the i2c core. I'm finding 
> > it hard to dislodge the feeling that open-coding the waits is error prone, but 
> > to avoid that and not implement the waits in the i2c core means having almost 
> > duplicate implementations of handlers for i2c_smbus_{read,write}*() and 
> > pmbus_{read,write}*() calls in the driver.
> > 
> 
> Not sure I can follow you here. Anyway, it seems to me that you are set on
> an implementation in the i2c core. I personally don't like that approach,

Not really set on it, but it does seem convenient. I'm looking at whether 
delays resolve the issues we have with the max31785 as well (I have a bunch of 
patches that introduce retries under the various circumstances we've hit poor 
behaviour).

> but I'll accept a change in the ucd9000 driver to make use of it. Please
> leave the zl6100 code alone, though - it took me long enough to get that
> working, and I won't have time to test any changes.

No worries. If you don't have time to test changes it reduces the motivation to 
find a general approach, and so maybe isolating the work-arounds to the ucd9000 
is the way to go.

Thanks for the feedback.

Andrew
diff mbox series

Patch

diff --git a/drivers/hwmon/pmbus/ucd9000.c b/drivers/hwmon/pmbus/ucd9000.c
index 81f4c4f166cd..a0b97d035326 100644
--- a/drivers/hwmon/pmbus/ucd9000.c
+++ b/drivers/hwmon/pmbus/ucd9000.c
@@ -9,6 +9,7 @@ 
 #include <linux/debugfs.h>
 #include <linux/kernel.h>
 #include <linux/module.h>
+#include <linux/moduleparam.h>
 #include <linux/of_device.h>
 #include <linux/init.h>
 #include <linux/err.h>
@@ -18,6 +19,9 @@ 
 #include <linux/gpio/driver.h>
 #include "pmbus.h"
 
+static unsigned long smbus_delay_us = 1000;
+module_param(smbus_delay_us, ulong, 0664);
+
 enum chips { ucd9000, ucd90120, ucd90124, ucd90160, ucd90320, ucd9090,
 	     ucd90910 };
 
@@ -502,6 +506,8 @@  static int ucd9000_probe(struct i2c_client *client,
 				     I2C_FUNC_SMBUS_BLOCK_DATA))
 		return -ENODEV;
 
+	i2c_smbus_throttle_client(client, smbus_delay_us);
+
 	ret = i2c_smbus_read_block_data(client, UCD9000_DEVICE_ID,
 					block_buffer);
 	if (ret < 0) {