diff mbox series

[v1] i2c: imx: Retry transfer on transient failure

Message ID 20220712082415.319738-1-francesco.dolcini@toradex.com (mailing list archive)
State New, archived
Headers show
Series [v1] i2c: imx: Retry transfer on transient failure | expand

Commit Message

Francesco Dolcini July 12, 2022, 8:24 a.m. UTC
From: Oleksandr Suvorov <oleksandr.suvorov@toradex.com>

Set the i2c_adapter retries field to a sensible value. This allows
the i2c core to retry master_xfer()/master_xfer_atomic() when it
returns -EAGAIN. Currently the i2c-imx driver returns -EAGAIN only
on Tx arbitration failure (I2SR_IAL).

Signed-off-by: Oleksandr Suvorov <oleksandr.suvorov@toradex.com>
Signed-off-by: Francesco Dolcini <francesco.dolcini@toradex.com>
---
 drivers/i2c/busses/i2c-imx.c | 2 ++
 1 file changed, 2 insertions(+)

Comments

Marco Felsch July 12, 2022, 8:47 a.m. UTC | #1
Hi Francesco,

On 22-07-12, Francesco Dolcini wrote:
> From: Oleksandr Suvorov <oleksandr.suvorov@toradex.com>
> 
> Set the i2c_adapter retries field to a sensible value. This allows
> the i2c core to retry master_xfer()/master_xfer_atomic() when it
> returns -EAGAIN. Currently the i2c-imx driver returns -EAGAIN only
> on Tx arbitration failure (I2SR_IAL).
> 
> Signed-off-by: Oleksandr Suvorov <oleksandr.suvorov@toradex.com>
> Signed-off-by: Francesco Dolcini <francesco.dolcini@toradex.com>
> ---
>  drivers/i2c/busses/i2c-imx.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/i2c/busses/i2c-imx.c b/drivers/i2c/busses/i2c-imx.c
> index e9e2db68b9fb..26738e713c94 100644
> --- a/drivers/i2c/busses/i2c-imx.c
> +++ b/drivers/i2c/busses/i2c-imx.c
> @@ -54,6 +54,7 @@
>  #define DRIVER_NAME "imx-i2c"
>  
>  #define I2C_IMX_CHECK_DELAY 30000 /* Time to check for bus idle, in NS */
> +#define I2C_IMX_MAX_RETRIES 3     /* Retries on arbitration loss */

Just one question: Why 3 and should we document this within the commit
message?

Regards,
  Marco

>  /*
>   * Enable DMA if transfer byte size is bigger than this threshold.
> @@ -1477,6 +1478,7 @@ static int i2c_imx_probe(struct platform_device *pdev)
>  	i2c_imx->adapter.dev.parent	= &pdev->dev;
>  	i2c_imx->adapter.nr		= pdev->id;
>  	i2c_imx->adapter.dev.of_node	= pdev->dev.of_node;
> +	i2c_imx->adapter.retries	= I2C_IMX_MAX_RETRIES;
>  	i2c_imx->base			= base;
>  	ACPI_COMPANION_SET(&i2c_imx->adapter.dev, ACPI_COMPANION(&pdev->dev));
>  
> -- 
> 2.25.1
> 
> 
>
Uwe Kleine-König July 12, 2022, 9:05 a.m. UTC | #2
On Tue, Jul 12, 2022 at 10:24:15AM +0200, Francesco Dolcini wrote:
> From: Oleksandr Suvorov <oleksandr.suvorov@toradex.com>
> 
> Set the i2c_adapter retries field to a sensible value. This allows
> the i2c core to retry master_xfer()/master_xfer_atomic() when it
> returns -EAGAIN. Currently the i2c-imx driver returns -EAGAIN only
> on Tx arbitration failure (I2SR_IAL).
> 
> Signed-off-by: Oleksandr Suvorov <oleksandr.suvorov@toradex.com>
> Signed-off-by: Francesco Dolcini <francesco.dolcini@toradex.com>

Additional to Marco's questions:

 - I wouldn't have added a define for the constant as IMHO

	i2c_imx->adapter.retries = 3;

   is expressive enough. Thoughts?

 - Is it right this is a driver specific setting? I would have expected
   that this setting is in the realm of the i2c core. Grepping through
   the drivers (git grep \\\<retries\\s\*= drivers/i2c/) shows a wide
   variety between 0 and 5. (For some drivers you have do some deeper
   checking than that grep because driver authors chose to use a define
   for that; see above.)

 - In which situations does this help? Please mention these in the
   commit log.

Best regards
Uwe
Francesco Dolcini July 12, 2022, 9:14 a.m. UTC | #3
Hello Marco

On Tue, Jul 12, 2022 at 10:47:16AM +0200, Marco Felsch wrote:
> On 22-07-12, Francesco Dolcini wrote:
> > From: Oleksandr Suvorov <oleksandr.suvorov@toradex.com>
> > 
> > Set the i2c_adapter retries field to a sensible value. This allows
> > the i2c core to retry master_xfer()/master_xfer_atomic() when it
> > returns -EAGAIN. Currently the i2c-imx driver returns -EAGAIN only
> > on Tx arbitration failure (I2SR_IAL).
> > 
> > Signed-off-by: Oleksandr Suvorov <oleksandr.suvorov@toradex.com>
> > Signed-off-by: Francesco Dolcini <francesco.dolcini@toradex.com>
> > ---
> >  drivers/i2c/busses/i2c-imx.c | 2 ++
> >  1 file changed, 2 insertions(+)
> > 
> > diff --git a/drivers/i2c/busses/i2c-imx.c b/drivers/i2c/busses/i2c-imx.c
> > index e9e2db68b9fb..26738e713c94 100644
> > --- a/drivers/i2c/busses/i2c-imx.c
> > +++ b/drivers/i2c/busses/i2c-imx.c
> > @@ -54,6 +54,7 @@
> >  #define DRIVER_NAME "imx-i2c"
> >  
> >  #define I2C_IMX_CHECK_DELAY 30000 /* Time to check for bus idle, in NS */
> > +#define I2C_IMX_MAX_RETRIES 3     /* Retries on arbitration loss */
> 
> Just one question: Why 3 and should we document this within the commit
> message?

In our tests 3 seems big enough to solve some sporadic failure we
experienced, and small enough to not have any kind of drawback. This is
the meaning of "sensible" value I have in the commit message, other
drivers use the same value.

Francesco
Francesco Dolcini July 12, 2022, 10:05 a.m. UTC | #4
On Tue, Jul 12, 2022 at 11:05:14AM +0200, Uwe Kleine-König wrote:
>  - I wouldn't have added a define for the constant as IMHO
> 	i2c_imx->adapter.retries = 3;
>    is expressive enough. Thoughts?
Fine for me

>  - In which situations does this help? Please mention these in the
>    commit log.
I'll do

Francesco
Wolfram Sang July 12, 2022, 11:32 a.m. UTC | #5
> > +#define I2C_IMX_MAX_RETRIES 3     /* Retries on arbitration loss */
> 
> Just one question: Why 3 and should we document this within the commit
> message?

Note that you can always change this from userspace via i2c-dev and the
I2C_RETRIES ioctl. So, it is really about a default value here.
Marco Felsch July 12, 2022, 12:02 p.m. UTC | #6
On 22-07-12, Wolfram Sang wrote:
> 
> > > +#define I2C_IMX_MAX_RETRIES 3     /* Retries on arbitration loss */
> > 
> > Just one question: Why 3 and should we document this within the commit
> > message?
> 
> Note that you can always change this from userspace via i2c-dev and the
> I2C_RETRIES ioctl. So, it is really about a default value here.

Ah okay, this makes now a bit more sense. So the message should point
out to set it to a more reasonable default value and that 3 looks like
common sense for most of the drivers if I got this correctly. If this is
correct, would it make more sense to set it within the core?

Regards,
  Marco
Francesco Dolcini July 13, 2022, 11:57 a.m. UTC | #7
+ oleksandr.suvorov@foundries.io

Hello all,

On Tue, Jul 12, 2022 at 12:05:04PM +0200, Francesco Dolcini wrote:
> On Tue, Jul 12, 2022 at 11:05:14AM +0200, Uwe Kleine-König wrote:
> > In which situations does this help? Please mention these in the
> > commit log.
> I'll do

I did some investigation on this, unfortunately we have this change
laying around since 1 year, it was written by Oleksandr, and in the
meantime he moved to a new company. I added him to this email thread, so
he can comment in case he remembers more.

We introduced this change while working on OV5640 camera sensor on an
apalis-imx6q evaluation board, without this change we had some sporadic
i2c communication issues. Unfortunately I do not have any better
details.

To me looks like having some (3? 5?) retry as a default is somehow
more reasonable than to never retry, not sure if this should be
implemented as a default for all the i2c adapters. From what I was able
to see that would not be a trivial change (the retry parameter is coming
from the i2c_imx driver, there is no obvious way to have a default in
the i2c core).

Would it work for you to keep the change as it is (just getting rid
of the useless define) and add a little bit more blurb to the commit
message to include the various comments collected so far?

Francesco
Oleksij Rempel July 13, 2022, 1:24 p.m. UTC | #8
On Wed, Jul 13, 2022 at 01:57:50PM +0200, Francesco Dolcini wrote:
> + oleksandr.suvorov@foundries.io
> 
> Hello all,
> 
> On Tue, Jul 12, 2022 at 12:05:04PM +0200, Francesco Dolcini wrote:
> > On Tue, Jul 12, 2022 at 11:05:14AM +0200, Uwe Kleine-König wrote:
> > > In which situations does this help? Please mention these in the
> > > commit log.
> > I'll do
> 
> I did some investigation on this, unfortunately we have this change
> laying around since 1 year, it was written by Oleksandr, and in the
> meantime he moved to a new company. I added him to this email thread, so
> he can comment in case he remembers more.
> 
> We introduced this change while working on OV5640 camera sensor on an
> apalis-imx6q evaluation board, without this change we had some sporadic
> i2c communication issues. Unfortunately I do not have any better
> details.
> 
> To me looks like having some (3? 5?) retry as a default is somehow
> more reasonable than to never retry, not sure if this should be
> implemented as a default for all the i2c adapters. From what I was able
> to see that would not be a trivial change (the retry parameter is coming
> from the i2c_imx driver, there is no obvious way to have a default in
> the i2c core).
> 
> Would it work for you to keep the change as it is (just getting rid
> of the useless define) and add a little bit more blurb to the commit
> message to include the various comments collected so far?

I assume, it is related to reset time or other reason where the camera
is not responding. In this case, amount of retries would depend on I2C
CLK speed and host CPU speed.

May be it is worth to investigate real issue and potentially fix camera driver?

Regards,
Oleksij
Francesco Dolcini July 13, 2022, 1:43 p.m. UTC | #9
On Wed, Jul 13, 2022 at 03:24:37PM +0200, Oleksij Rempel wrote:
> On Wed, Jul 13, 2022 at 01:57:50PM +0200, Francesco Dolcini wrote:
> > + oleksandr.suvorov@foundries.io
> > 
> > Hello all,
> > 
> > On Tue, Jul 12, 2022 at 12:05:04PM +0200, Francesco Dolcini wrote:
> > > On Tue, Jul 12, 2022 at 11:05:14AM +0200, Uwe Kleine-König wrote:
> > > > In which situations does this help? Please mention these in the
> > > > commit log.
> > > I'll do
> > 
> > I did some investigation on this, unfortunately we have this change
> > laying around since 1 year, it was written by Oleksandr, and in the
> > meantime he moved to a new company. I added him to this email thread, so
> > he can comment in case he remembers more.
> > 
> > We introduced this change while working on OV5640 camera sensor on an
> > apalis-imx6q evaluation board, without this change we had some sporadic
> > i2c communication issues. Unfortunately I do not have any better
> > details.
> > 
> > To me looks like having some (3? 5?) retry as a default is somehow
> > more reasonable than to never retry, not sure if this should be
> > implemented as a default for all the i2c adapters. From what I was able
> > to see that would not be a trivial change (the retry parameter is coming
> > from the i2c_imx driver, there is no obvious way to have a default in
> > the i2c core).
> > 
> > Would it work for you to keep the change as it is (just getting rid
> > of the useless define) and add a little bit more blurb to the commit
> > message to include the various comments collected so far?
> 
> I assume, it is related to reset time or other reason where the camera
> is not responding. In this case, amount of retries would depend on I2C
> CLK speed and host CPU speed.
> 

The retry on the I2C IMX driver would trigger only on tx arbitration
failure, that would be the SDA being tied low by the slave in an
unexpected moment, correct? If the camera does not respond it will just
not ack the transaction and that would not be recovered by the retry
in this change.

Can this just a layout/cabling issue with some noise on the SDA line? We
are talking about somehow long board to board cables with various
signals on it. This is an issue that we had for sure in the past,
however I do have record of this only on a different camera.

Francesco
Oleksij Rempel July 13, 2022, 3:57 p.m. UTC | #10
On Wed, Jul 13, 2022 at 03:43:29PM +0200, Francesco Dolcini wrote:
> On Wed, Jul 13, 2022 at 03:24:37PM +0200, Oleksij Rempel wrote:
> > On Wed, Jul 13, 2022 at 01:57:50PM +0200, Francesco Dolcini wrote:
> > > + oleksandr.suvorov@foundries.io
> > > 
> > > Hello all,
> > > 
> > > On Tue, Jul 12, 2022 at 12:05:04PM +0200, Francesco Dolcini wrote:
> > > > On Tue, Jul 12, 2022 at 11:05:14AM +0200, Uwe Kleine-König wrote:
> > > > > In which situations does this help? Please mention these in the
> > > > > commit log.
> > > > I'll do
> > > 
> > > I did some investigation on this, unfortunately we have this change
> > > laying around since 1 year, it was written by Oleksandr, and in the
> > > meantime he moved to a new company. I added him to this email thread, so
> > > he can comment in case he remembers more.
> > > 
> > > We introduced this change while working on OV5640 camera sensor on an
> > > apalis-imx6q evaluation board, without this change we had some sporadic
> > > i2c communication issues. Unfortunately I do not have any better
> > > details.
> > > 
> > > To me looks like having some (3? 5?) retry as a default is somehow
> > > more reasonable than to never retry, not sure if this should be
> > > implemented as a default for all the i2c adapters. From what I was able
> > > to see that would not be a trivial change (the retry parameter is coming
> > > from the i2c_imx driver, there is no obvious way to have a default in
> > > the i2c core).
> > > 
> > > Would it work for you to keep the change as it is (just getting rid
> > > of the useless define) and add a little bit more blurb to the commit
> > > message to include the various comments collected so far?
> > 
> > I assume, it is related to reset time or other reason where the camera
> > is not responding. In this case, amount of retries would depend on I2C
> > CLK speed and host CPU speed.
> > 
> 
> The retry on the I2C IMX driver would trigger only on tx arbitration
> failure, that would be the SDA being tied low by the slave in an
> unexpected moment, correct? 

If it is the case, it is better to understand why. Are there some
special timing requirements?

> If the camera does not respond it will just
> not ack the transaction and that would not be recovered by the retry
> in this change.
> 
> Can this just a layout/cabling issue with some noise on the SDA line? We
> are talking about somehow long board to board cables with various
> signals on it. This is an issue that we had for sure in the past,
> however I do have record of this only on a different camera.

If it is cabling issue, then I would take a look at pinmux
configuration. If it is so noisy, that some errors are expected, then it would
affect camera configuration as well. I mean, system is potentially
writing trash to the config register.
Francesco Dolcini July 13, 2022, 8:25 p.m. UTC | #11
Hello Oleksij,

On Wed, Jul 13, 2022 at 05:57:23PM +0200, Oleksij Rempel wrote:
> On Wed, Jul 13, 2022 at 03:43:29PM +0200, Francesco Dolcini wrote:
> > On Wed, Jul 13, 2022 at 03:24:37PM +0200, Oleksij Rempel wrote:
> > > On Wed, Jul 13, 2022 at 01:57:50PM +0200, Francesco Dolcini wrote:
> > > > + oleksandr.suvorov@foundries.io
> > > > 
> > > > Hello all,
> > > > 
> > > > On Tue, Jul 12, 2022 at 12:05:04PM +0200, Francesco Dolcini wrote:
> > > > > On Tue, Jul 12, 2022 at 11:05:14AM +0200, Uwe Kleine-König wrote:
> > > > > > In which situations does this help? Please mention these in the
> > > > > > commit log.
> > > > > I'll do
> > > > 
> > > > I did some investigation on this, unfortunately we have this change
> > > > laying around since 1 year, it was written by Oleksandr, and in the
> > > > meantime he moved to a new company. I added him to this email thread, so
> > > > he can comment in case he remembers more.
> > > > 
> > > > We introduced this change while working on OV5640 camera sensor on an
> > > > apalis-imx6q evaluation board, without this change we had some sporadic
> > > > i2c communication issues. Unfortunately I do not have any better
> > > > details.
> > > > 
> > > > To me looks like having some (3? 5?) retry as a default is somehow
> > > > more reasonable than to never retry, not sure if this should be
> > > > implemented as a default for all the i2c adapters. From what I was able
> > > > to see that would not be a trivial change (the retry parameter is coming
> > > > from the i2c_imx driver, there is no obvious way to have a default in
> > > > the i2c core).
> > > > 
> > > > Would it work for you to keep the change as it is (just getting rid
> > > > of the useless define) and add a little bit more blurb to the commit
> > > > message to include the various comments collected so far?
> > > 
> > > I assume, it is related to reset time or other reason where the camera
> > > is not responding. In this case, amount of retries would depend on I2C
> > > CLK speed and host CPU speed.
> > > 
> > 
> > The retry on the I2C IMX driver would trigger only on tx arbitration
> > failure, that would be the SDA being tied low by the slave in an
> > unexpected moment, correct? 
> 
> If it is the case, it is better to understand why. Are there some
> special timing requirements?
> 
> > If the camera does not respond it will just
> > not ack the transaction and that would not be recovered by the retry
> > in this change.
> > 
> > Can this just a layout/cabling issue with some noise on the SDA line? We
> > are talking about somehow long board to board cables with various
> > signals on it. This is an issue that we had for sure in the past,
> > however I do have record of this only on a different camera.
> 
> If it is cabling issue, then I would take a look at pinmux
> configuration. If it is so noisy, that some errors are expected, then it would
> affect camera configuration as well. I mean, system is potentially
> writing trash to the config register.

I do not think that this is possible in the way you defined, if SDA is
low when the master is driving it high the master will just stop
transmitting and an arbitration lost interrupt is raised. I guess this
is just the same for any I2C controller, anyway is defined in
`35.7.4 I2C Status Register (I2Cx_I2SR)` in the i.MX6QDL reference manual.

I guess it would be still theoretically possible that the master read
garbage from the slave, I'm not aware of any mechanism to avoid it.

Said that I do not have more details, for some unfortunate reason this
change was laying in our downstream kernel since too long.

Anyway, let's look at this in a different way, from what I was able to
understand digging on this topic retrying on I2C arbitration lost is
just the normal thing to do and the I2C core provides support for this
since ever, the comment in i2c-core is
/* Retry automatically on arbitration loss */.

Setting retries to something like 3 or 5 is just very common, various
drivers have this value in the first commit or had it added later on (as
Uwe already commented)

To me it seems like the most sensible thing to do, is there any reason
why the i2c_imx driver should not do it?

Francesco
Oleksij Rempel July 14, 2022, 7:07 a.m. UTC | #12
Hello Francesco,

On Wed, Jul 13, 2022 at 10:25:41PM +0200, Francesco Dolcini wrote:
> Hello Oleksij,
> 
> On Wed, Jul 13, 2022 at 05:57:23PM +0200, Oleksij Rempel wrote:
> > On Wed, Jul 13, 2022 at 03:43:29PM +0200, Francesco Dolcini wrote:
> > > On Wed, Jul 13, 2022 at 03:24:37PM +0200, Oleksij Rempel wrote:
> > > > On Wed, Jul 13, 2022 at 01:57:50PM +0200, Francesco Dolcini wrote:
> > > > > + oleksandr.suvorov@foundries.io
> > > > > 
> > > > > Hello all,
> > > > > 
> > > > > On Tue, Jul 12, 2022 at 12:05:04PM +0200, Francesco Dolcini wrote:
> > > > > > On Tue, Jul 12, 2022 at 11:05:14AM +0200, Uwe Kleine-König wrote:
> > > > > > > In which situations does this help? Please mention these in the
> > > > > > > commit log.
> > > > > > I'll do
> > > > > 
> > > > > I did some investigation on this, unfortunately we have this change
> > > > > laying around since 1 year, it was written by Oleksandr, and in the
> > > > > meantime he moved to a new company. I added him to this email thread, so
> > > > > he can comment in case he remembers more.
> > > > > 
> > > > > We introduced this change while working on OV5640 camera sensor on an
> > > > > apalis-imx6q evaluation board, without this change we had some sporadic
> > > > > i2c communication issues. Unfortunately I do not have any better
> > > > > details.
> > > > > 
> > > > > To me looks like having some (3? 5?) retry as a default is somehow
> > > > > more reasonable than to never retry, not sure if this should be
> > > > > implemented as a default for all the i2c adapters. From what I was able
> > > > > to see that would not be a trivial change (the retry parameter is coming
> > > > > from the i2c_imx driver, there is no obvious way to have a default in
> > > > > the i2c core).
> > > > > 
> > > > > Would it work for you to keep the change as it is (just getting rid
> > > > > of the useless define) and add a little bit more blurb to the commit
> > > > > message to include the various comments collected so far?
> > > > 
> > > > I assume, it is related to reset time or other reason where the camera
> > > > is not responding. In this case, amount of retries would depend on I2C
> > > > CLK speed and host CPU speed.
> > > > 
> > > 
> > > The retry on the I2C IMX driver would trigger only on tx arbitration
> > > failure, that would be the SDA being tied low by the slave in an
> > > unexpected moment, correct? 
> > 
> > If it is the case, it is better to understand why. Are there some
> > special timing requirements?
> > 
> > > If the camera does not respond it will just
> > > not ack the transaction and that would not be recovered by the retry
> > > in this change.
> > > 
> > > Can this just a layout/cabling issue with some noise on the SDA line? We
> > > are talking about somehow long board to board cables with various
> > > signals on it. This is an issue that we had for sure in the past,
> > > however I do have record of this only on a different camera.
> > 
> > If it is cabling issue, then I would take a look at pinmux
> > configuration. If it is so noisy, that some errors are expected, then it would
> > affect camera configuration as well. I mean, system is potentially
> > writing trash to the config register.
> 
> I do not think that this is possible in the way you defined, if SDA is
> low when the master is driving it high the master will just stop
> transmitting and an arbitration lost interrupt is raised. I guess this
> is just the same for any I2C controller, anyway is defined in
> `35.7.4 I2C Status Register (I2Cx_I2SR)` in the i.MX6QDL reference manual.
> 
> I guess it would be still theoretically possible that the master read
> garbage from the slave, I'm not aware of any mechanism to avoid it.
> 
> Said that I do not have more details, for some unfortunate reason this
> change was laying in our downstream kernel since too long.
> 
> Anyway, let's look at this in a different way, from what I was able to
> understand digging on this topic retrying on I2C arbitration lost is
> just the normal thing to do and the I2C core provides support for this
> since ever, the comment in i2c-core is
> /* Retry automatically on arbitration loss */.
> 
> Setting retries to something like 3 or 5 is just very common, various
> drivers have this value in the first commit or had it added later on (as
> Uwe already commented)
> 
> To me it seems like the most sensible thing to do, is there any reason
> why the i2c_imx driver should not do it?

Here we go:
https://www.i2c-bus.org/i2c-primer/analysing-obscure-problems/master-reports-arbitration-lost/
"Possible reasons are the same as the ones described in “No Acknowledge
From I2C Slave”"

So, lets see what it can be:
https://www.i2c-bus.org/i2c-primer/analysing-obscure-problems/no-acknowledge-from-i2c-slave/
"Possible reasons are:

- The I2C slave could not correctly interpret the data on SDA because the SDA
  high or low-level voltages do not reach its appropriate input
  thresholds.
- The I2C slave missed an SCL cycle because the SCL high or low-level voltages
  do not reach its appropriate input thresholds.
- The I2C slave accidently interpreted a spike etc. as an SCL cycle.

With adequate serial resistors between master and slave, an analog shot
of the signals at the slave’s SDA and SCL pins provides a clue whether
the slave acknowledges and to which SCL clock pulse. The different SDA
low levels due to the serial resistor make it possible to distinguish
acknowledges from the slave from data bits from the master. "

I interpret it, that setting retry count on any non zero value is an
workaround for brocken circuit. It means, on HW development phase we
won't be able to detect HW issue, if retry count will be not 0.

IMHO, it board specific configuration and should not be set by driver.

Regards,
Oleksij
Francesco Dolcini July 14, 2022, 7:34 a.m. UTC | #13
Hello all,

On Thu, Jul 14, 2022 at 09:07:05AM +0200, Oleksij Rempel wrote:
> Hello Francesco,
> 
> On Wed, Jul 13, 2022 at 10:25:41PM +0200, Francesco Dolcini wrote:
> > Hello Oleksij,
> > 
> > On Wed, Jul 13, 2022 at 05:57:23PM +0200, Oleksij Rempel wrote:
> > > On Wed, Jul 13, 2022 at 03:43:29PM +0200, Francesco Dolcini wrote:
> > > > On Wed, Jul 13, 2022 at 03:24:37PM +0200, Oleksij Rempel wrote:
> > > > > On Wed, Jul 13, 2022 at 01:57:50PM +0200, Francesco Dolcini wrote:
> > > > > > + oleksandr.suvorov@foundries.io
> > > > > > 
> > > > > > Hello all,
> > > > > > 
> > > > > > On Tue, Jul 12, 2022 at 12:05:04PM +0200, Francesco Dolcini wrote:
> > > > > > > On Tue, Jul 12, 2022 at 11:05:14AM +0200, Uwe Kleine-König wrote:
> > > > > > > > In which situations does this help? Please mention these in the
> > > > > > > > commit log.
> > > > > > > I'll do
> > > > > > 
> > > > > > I did some investigation on this, unfortunately we have this change
> > > > > > laying around since 1 year, it was written by Oleksandr, and in the
> > > > > > meantime he moved to a new company. I added him to this email thread, so
> > > > > > he can comment in case he remembers more.
> > > > > > 
> > > > > > We introduced this change while working on OV5640 camera sensor on an
> > > > > > apalis-imx6q evaluation board, without this change we had some sporadic
> > > > > > i2c communication issues. Unfortunately I do not have any better
> > > > > > details.
> > > > > > 
> > > > > > To me looks like having some (3? 5?) retry as a default is somehow
> > > > > > more reasonable than to never retry, not sure if this should be
> > > > > > implemented as a default for all the i2c adapters. From what I was able
> > > > > > to see that would not be a trivial change (the retry parameter is coming
> > > > > > from the i2c_imx driver, there is no obvious way to have a default in
> > > > > > the i2c core).
> > > > > > 
> > > > > > Would it work for you to keep the change as it is (just getting rid
> > > > > > of the useless define) and add a little bit more blurb to the commit
> > > > > > message to include the various comments collected so far?
> > > > > 
> > > > > I assume, it is related to reset time or other reason where the camera
> > > > > is not responding. In this case, amount of retries would depend on I2C
> > > > > CLK speed and host CPU speed.
> > > > > 
> > > > 
> > > > The retry on the I2C IMX driver would trigger only on tx arbitration
> > > > failure, that would be the SDA being tied low by the slave in an
> > > > unexpected moment, correct? 
> > > 
> > > If it is the case, it is better to understand why. Are there some
> > > special timing requirements?
> > > 
> > > > If the camera does not respond it will just
> > > > not ack the transaction and that would not be recovered by the retry
> > > > in this change.
> > > > 
> > > > Can this just a layout/cabling issue with some noise on the SDA line? We
> > > > are talking about somehow long board to board cables with various
> > > > signals on it. This is an issue that we had for sure in the past,
> > > > however I do have record of this only on a different camera.
> > > 
> > > If it is cabling issue, then I would take a look at pinmux
> > > configuration. If it is so noisy, that some errors are expected, then it would
> > > affect camera configuration as well. I mean, system is potentially
> > > writing trash to the config register.
> > 
> > I do not think that this is possible in the way you defined, if SDA is
> > low when the master is driving it high the master will just stop
> > transmitting and an arbitration lost interrupt is raised. I guess this
> > is just the same for any I2C controller, anyway is defined in
> > `35.7.4 I2C Status Register (I2Cx_I2SR)` in the i.MX6QDL reference manual.
> > 
> > I guess it would be still theoretically possible that the master read
> > garbage from the slave, I'm not aware of any mechanism to avoid it.
> > 
> > Said that I do not have more details, for some unfortunate reason this
> > change was laying in our downstream kernel since too long.
> > 
> > Anyway, let's look at this in a different way, from what I was able to
> > understand digging on this topic retrying on I2C arbitration lost is
> > just the normal thing to do and the I2C core provides support for this
> > since ever, the comment in i2c-core is
> > /* Retry automatically on arbitration loss */.
> > 
> > Setting retries to something like 3 or 5 is just very common, various
> > drivers have this value in the first commit or had it added later on (as
> > Uwe already commented)
> > 
> > To me it seems like the most sensible thing to do, is there any reason
> > why the i2c_imx driver should not do it?
> 
> Here we go:
> https://www.i2c-bus.org/i2c-primer/analysing-obscure-problems/master-reports-arbitration-lost/
> "Possible reasons are the same as the ones described in “No Acknowledge
> From I2C Slave”"
> 
> So, lets see what it can be:
> https://www.i2c-bus.org/i2c-primer/analysing-obscure-problems/no-acknowledge-from-i2c-slave/
> "Possible reasons are:
> 
> - The I2C slave could not correctly interpret the data on SDA because the SDA
>   high or low-level voltages do not reach its appropriate input
>   thresholds.
> - The I2C slave missed an SCL cycle because the SCL high or low-level voltages
>   do not reach its appropriate input thresholds.
> - The I2C slave accidently interpreted a spike etc. as an SCL cycle.
> 
> With adequate serial resistors between master and slave, an analog shot
> of the signals at the slave’s SDA and SCL pins provides a clue whether
> the slave acknowledges and to which SCL clock pulse. The different SDA
> low levels due to the serial resistor make it possible to distinguish
> acknowledges from the slave from data bits from the master. "
> 
> I interpret it, that setting retry count on any non zero value is an
> workaround for brocken circuit. It means, on HW development phase we
> won't be able to detect HW issue, if retry count will be not 0.
> 
> IMHO, it board specific configuration and should not be set by driver.

Got your point, and in contrast to what you wrote, according to the I2C
spec, a master is required ("must") to restart the transaction in case
the arbitration is lost.

From I2C-bus specification and user manual [1], 3.1.8 Arbitration:

```
No information is lost during the arbitration process. A controller that loses the arbitration
can generate clock pulses until the end of the byte in which it loses the arbitration and
must restart its transaction when the bus is free.
```

In case there is a different view from the one Oleksij expressed here,
let me know and I can improve the commit message, otherwise I guess
we can just drop this change.

[1] https://www.nxp.com/docs/en/user-guide/UM10204.pdf

Francesco
Uwe Kleine-König July 15, 2022, 6:49 a.m. UTC | #14
Hello Francesco,

On Thu, Jul 14, 2022 at 09:34:08AM +0200, Francesco Dolcini wrote:
> On Thu, Jul 14, 2022 at 09:07:05AM +0200, Oleksij Rempel wrote:
> > On Wed, Jul 13, 2022 at 10:25:41PM +0200, Francesco Dolcini wrote:
> > > Hello Oleksij,
> > > 
> > > On Wed, Jul 13, 2022 at 05:57:23PM +0200, Oleksij Rempel wrote:
> > > > On Wed, Jul 13, 2022 at 03:43:29PM +0200, Francesco Dolcini wrote:
> > > > > On Wed, Jul 13, 2022 at 03:24:37PM +0200, Oleksij Rempel wrote:
> > > > > > On Wed, Jul 13, 2022 at 01:57:50PM +0200, Francesco Dolcini wrote:
> > > > > > > + oleksandr.suvorov@foundries.io
> > > > > > > 
> > > > > > > Hello all,
> > > > > > > 
> > > > > > > On Tue, Jul 12, 2022 at 12:05:04PM +0200, Francesco Dolcini wrote:
> > > > > > > > On Tue, Jul 12, 2022 at 11:05:14AM +0200, Uwe Kleine-König wrote:
> > > > > > > > > In which situations does this help? Please mention these in the
> > > > > > > > > commit log.
> > > > > > > > I'll do
> > > > > > > 
> > > > > > > I did some investigation on this, unfortunately we have this change
> > > > > > > laying around since 1 year, it was written by Oleksandr, and in the
> > > > > > > meantime he moved to a new company. I added him to this email thread, so
> > > > > > > he can comment in case he remembers more.
> > > > > > > 
> > > > > > > We introduced this change while working on OV5640 camera sensor on an
> > > > > > > apalis-imx6q evaluation board, without this change we had some sporadic
> > > > > > > i2c communication issues. Unfortunately I do not have any better
> > > > > > > details.
> > > > > > > 
> > > > > > > To me looks like having some (3? 5?) retry as a default is somehow
> > > > > > > more reasonable than to never retry, not sure if this should be
> > > > > > > implemented as a default for all the i2c adapters. From what I was able
> > > > > > > to see that would not be a trivial change (the retry parameter is coming
> > > > > > > from the i2c_imx driver, there is no obvious way to have a default in
> > > > > > > the i2c core).
> > > > > > > 
> > > > > > > Would it work for you to keep the change as it is (just getting rid
> > > > > > > of the useless define) and add a little bit more blurb to the commit
> > > > > > > message to include the various comments collected so far?
> > > > > > 
> > > > > > I assume, it is related to reset time or other reason where the camera
> > > > > > is not responding. In this case, amount of retries would depend on I2C
> > > > > > CLK speed and host CPU speed.
> > > > > > 
> > > > > 
> > > > > The retry on the I2C IMX driver would trigger only on tx arbitration
> > > > > failure, that would be the SDA being tied low by the slave in an
> > > > > unexpected moment, correct? 
> > > > 
> > > > If it is the case, it is better to understand why. Are there some
> > > > special timing requirements?
> > > > 
> > > > > If the camera does not respond it will just
> > > > > not ack the transaction and that would not be recovered by the retry
> > > > > in this change.
> > > > > 
> > > > > Can this just a layout/cabling issue with some noise on the SDA line? We
> > > > > are talking about somehow long board to board cables with various
> > > > > signals on it. This is an issue that we had for sure in the past,
> > > > > however I do have record of this only on a different camera.
> > > > 
> > > > If it is cabling issue, then I would take a look at pinmux
> > > > configuration. If it is so noisy, that some errors are expected, then it would
> > > > affect camera configuration as well. I mean, system is potentially
> > > > writing trash to the config register.
> > > 
> > > I do not think that this is possible in the way you defined, if SDA is
> > > low when the master is driving it high the master will just stop
> > > transmitting and an arbitration lost interrupt is raised. I guess this
> > > is just the same for any I2C controller, anyway is defined in
> > > `35.7.4 I2C Status Register (I2Cx_I2SR)` in the i.MX6QDL reference manual.
> > > 
> > > I guess it would be still theoretically possible that the master read
> > > garbage from the slave, I'm not aware of any mechanism to avoid it.
> > > 
> > > Said that I do not have more details, for some unfortunate reason this
> > > change was laying in our downstream kernel since too long.
> > > 
> > > Anyway, let's look at this in a different way, from what I was able to
> > > understand digging on this topic retrying on I2C arbitration lost is
> > > just the normal thing to do and the I2C core provides support for this
> > > since ever, the comment in i2c-core is
> > > /* Retry automatically on arbitration loss */.
> > > 
> > > Setting retries to something like 3 or 5 is just very common, various
> > > drivers have this value in the first commit or had it added later on (as
> > > Uwe already commented)
> > > 
> > > To me it seems like the most sensible thing to do, is there any reason
> > > why the i2c_imx driver should not do it?
> > 
> > Here we go:
> > https://www.i2c-bus.org/i2c-primer/analysing-obscure-problems/master-reports-arbitration-lost/
> > "Possible reasons are the same as the ones described in “No Acknowledge
> > From I2C Slave”"
> > 
> > So, lets see what it can be:
> > https://www.i2c-bus.org/i2c-primer/analysing-obscure-problems/no-acknowledge-from-i2c-slave/
> > "Possible reasons are:
> > 
> > - The I2C slave could not correctly interpret the data on SDA because the SDA
> >   high or low-level voltages do not reach its appropriate input
> >   thresholds.
> > - The I2C slave missed an SCL cycle because the SCL high or low-level voltages
> >   do not reach its appropriate input thresholds.
> > - The I2C slave accidently interpreted a spike etc. as an SCL cycle.
> > 
> > With adequate serial resistors between master and slave, an analog shot
> > of the signals at the slave’s SDA and SCL pins provides a clue whether
> > the slave acknowledges and to which SCL clock pulse. The different SDA
> > low levels due to the serial resistor make it possible to distinguish
> > acknowledges from the slave from data bits from the master. "
> > 
> > I interpret it, that setting retry count on any non zero value is an
> > workaround for brocken circuit. It means, on HW development phase we
> > won't be able to detect HW issue, if retry count will be not 0.
> > 
> > IMHO, it board specific configuration and should not be set by driver.
> 
> Got your point, and in contrast to what you wrote, according to the I2C
> spec, a master is required ("must") to restart the transaction in case
> the arbitration is lost.
> 
> From I2C-bus specification and user manual [1], 3.1.8 Arbitration:
> 
> ```
> No information is lost during the arbitration process. A controller that loses the arbitration
> can generate clock pulses until the end of the byte in which it loses the arbitration and
> must restart its transaction when the bus is free.
> ```

I'd interpret that differently. IMHO this is not "After a arbitration
loss it's obligatory that the (previously aborted) message is repeated",
but more "On an arbitration loss, the master's transfer had no effect on
the slave, so if the message is still to be sent, it must be repeated
from its very beginning."

Otherwise I'm on Oleksij's side: Unless you have a multi-controller
setup an arbitration loss is a problem with the signal integrity. And
increasing the retry count is only a work around.

Best regards
Uwe
Francesco Dolcini July 15, 2022, 7:24 a.m. UTC | #15
On Fri, Jul 15, 2022 at 08:49:31AM +0200, Uwe Kleine-König wrote:
> Unless you have a multi-controller setup an arbitration loss is a
> problem with the signal integrity. And increasing the retry count is
> only a work around.
Fair enough.

Regarding your comment, I was wondering:

1. There is currently no easy way to enable the retry on arbitration
lost at system level in case the I2C bus is multimaster. Unless we
consider setting the retries value using I2C_RETRIES ioctl the way to
go. Not my specific problem, but I wonder if this situation is relevant
in practice.

2. We do have an I2C bus recovery mechanism implemented to recover from
a stuck bus, isn't this a workaround as retrying on arbitration lost in a
non-multimaster setup?

I guess this discussion is no longer about the original patch I sent,
but from my point of view still interesting, real-life I2C is not
perfect sometimes ... 

Francesco
Uwe Kleine-König July 15, 2022, 8:34 a.m. UTC | #16
Hello Francesco,

On Fri, Jul 15, 2022 at 09:24:32AM +0200, Francesco Dolcini wrote:
> On Fri, Jul 15, 2022 at 08:49:31AM +0200, Uwe Kleine-König wrote:
> > Unless you have a multi-controller setup an arbitration loss is a
> > problem with the signal integrity. And increasing the retry count is
> > only a work around.
> Fair enough.
> 
> Regarding your comment, I was wondering:
> 
> 1. There is currently no easy way to enable the retry on arbitration
> lost at system level in case the I2C bus is multimaster. Unless we
> consider setting the retries value using I2C_RETRIES ioctl the way to
> go. Not my specific problem, but I wonder if this situation is relevant
> in practice.

I think multi-controller is quite unusual. I never saw one, but that
might just be my bubble.

> 2. We do have an I2C bus recovery mechanism implemented to recover from
> a stuck bus, isn't this a workaround as retrying on arbitration lost in a
> non-multimaster setup?

It depends, if you do it to recover after a transfer failure, it would
indeed consider it a work around covering the real problem. But
sometimes there is no practical way around such work arounds. I happens
from time to time that the reason for problem is known, but fixing the
hardware is no option, then you need such workrounds. (This applies to
both, retrying the transfers and resetting the bus.)

Note that even without signal integrity problems an i2c bus can get
stuck. E.g. if the controller resets in the middle of a transfer.

> I guess this discussion is no longer about the original patch I sent,
> but from my point of view still interesting, real-life I2C is not
> perfect sometimes ... 

:-)

Best regards
Uwe
Francesco Dolcini July 15, 2022, 11:45 a.m. UTC | #17
On Fri, Jul 15, 2022 at 10:34:00AM +0200, Uwe Kleine-König wrote:
> On Fri, Jul 15, 2022 at 09:24:32AM +0200, Francesco Dolcini wrote:
> > On Fri, Jul 15, 2022 at 08:49:31AM +0200, Uwe Kleine-König wrote:
> > > Unless you have a multi-controller setup an arbitration loss is a
> > > problem with the signal integrity. And increasing the retry count is
> > > only a work around.
> > Fair enough.
> > 
> > 2. We do have an I2C bus recovery mechanism implemented to recover from
> > a stuck bus, isn't this a workaround as retrying on arbitration lost in a
> > non-multimaster setup?
> 
> It depends, if you do it to recover after a transfer failure, it would
> indeed consider it a work around covering the real problem. But
> sometimes there is no practical way around such work arounds. I happens
> from time to time that the reason for problem is known, but fixing the
> hardware is no option, then you need such workrounds. (This applies to
> both, retrying the transfers and resetting the bus.)

And here I am back to square one, to the original reason we did this
change for :-) I feel like we do not want to merge this now because I
cannot prove/explain the exact details of the issue this is trying to
workaround. Not because _is_ a workaround :-).

Fair enough, thanks Oleksij and Uwe for the valuable discussion.

Francesco
diff mbox series

Patch

diff --git a/drivers/i2c/busses/i2c-imx.c b/drivers/i2c/busses/i2c-imx.c
index e9e2db68b9fb..26738e713c94 100644
--- a/drivers/i2c/busses/i2c-imx.c
+++ b/drivers/i2c/busses/i2c-imx.c
@@ -54,6 +54,7 @@ 
 #define DRIVER_NAME "imx-i2c"
 
 #define I2C_IMX_CHECK_DELAY 30000 /* Time to check for bus idle, in NS */
+#define I2C_IMX_MAX_RETRIES 3     /* Retries on arbitration loss */
 
 /*
  * Enable DMA if transfer byte size is bigger than this threshold.
@@ -1477,6 +1478,7 @@  static int i2c_imx_probe(struct platform_device *pdev)
 	i2c_imx->adapter.dev.parent	= &pdev->dev;
 	i2c_imx->adapter.nr		= pdev->id;
 	i2c_imx->adapter.dev.of_node	= pdev->dev.of_node;
+	i2c_imx->adapter.retries	= I2C_IMX_MAX_RETRIES;
 	i2c_imx->base			= base;
 	ACPI_COMPANION_SET(&i2c_imx->adapter.dev, ACPI_COMPANION(&pdev->dev));