diff mbox series

[v4,2/2] iio: pressure: dps310: Reset chip if MEAS_CFG is corrupt

Message ID 20220809211246.251006-3-eajames@linux.ibm.com (mailing list archive)
State Changes Requested
Headers show
Series iio: pressure: dps310: Reset chip if MEAS_CFG is corrupt | expand

Commit Message

Eddie James Aug. 9, 2022, 9:12 p.m. UTC
Corruption of the MEAS_CFG register has been observed soon after
system boot. In order to recover this scenario, check MEAS_CFG if
measurement isn't ready, and if it's incorrect, reset the DPS310
and execute the startup procedure.

Signed-off-by: Eddie James <eajames@linux.ibm.com>
Reviewed-by: Joel Stanley <joel@jms.id.au>
---
 drivers/iio/pressure/dps310.c | 88 +++++++++++++++++++++++++++++------
 1 file changed, 73 insertions(+), 15 deletions(-)

Comments

Andy Shevchenko Aug. 12, 2022, 10:13 p.m. UTC | #1
On Wed, Aug 10, 2022 at 12:12 AM Eddie James <eajames@linux.ibm.com> wrote:
>
> Corruption of the MEAS_CFG register has been observed soon after
> system boot. In order to recover this scenario, check MEAS_CFG if
> measurement isn't ready, and if it's incorrect, reset the DPS310
> and execute the startup procedure.

Looks like both patches miss the Fixes tag. Can you add them?

...

> +/*
> + * Called with lock held. Returns a negative value on error, a positive value
> + * when the device is not ready, and zero when the device is ready.
> + */
> +static int dps310_check_reset_meas_cfg(struct dps310_data *data, int ready_bit)
> +{
> +       int meas_cfg;

> +       int rc = regmap_read(data->regmap, DPS310_MEAS_CFG, &meas_cfg);
> +
> +       if (rc < 0)
> +               return rc;

Please, split definition and assignment.

> +       /* Device is ready, proceed to measurement */
> +       if (meas_cfg & ready_bit)
> +               return 0;
> +
> +       /* Device is OK, just not ready */
> +       if (meas_cfg & (DPS310_PRS_EN | DPS310_TEMP_EN | DPS310_BACKGROUND))
> +               return 1;
> +
> +       /* DPS310 register state corrupt, better start from scratch */
> +       rc = regmap_write(data->regmap, DPS310_RESET, DPS310_RESET_MAGIC);
> +       if (rc < 0)
> +               return rc;
> +
> +       /* Wait for device chip access: 2.5ms in specification */
> +       usleep_range(2500, 12000);
> +
> +       /* Reinitialize the chip */
> +       rc = dps310_startup(data);
> +       if (rc)
> +               return rc;
> +
> +       dev_info(&data->client->dev,
> +                "recovered from corrupted MEAS_CFG=%02x\n", meas_cfg);
> +       return 1;
> +}
> +
>  static int dps310_read_pres_raw(struct dps310_data *data)
>  {
>         int rc;
> @@ -405,16 +443,26 @@ static int dps310_read_pres_raw(struct dps310_data *data)
>         if (mutex_lock_interruptible(&data->lock))
>                 return -EINTR;
>
> -       rate = dps310_get_pres_samp_freq(data);
> -       timeout = DPS310_POLL_TIMEOUT_US(rate);
> -
> -       /* Poll for sensor readiness; base the timeout upon the sample rate. */
> -       rc = regmap_read_poll_timeout(data->regmap, DPS310_MEAS_CFG, ready,
> -                                     ready & DPS310_PRS_RDY,
> -                                     DPS310_POLL_SLEEP_US(timeout), timeout);
> -       if (rc)
> +       rc = dps310_check_reset_meas_cfg(data, DPS310_PRS_RDY);
> +       if (rc < 0)
>                 goto done;
>
> +       if (rc > 0) {
> +               rate = dps310_get_pres_samp_freq(data);
> +               timeout = DPS310_POLL_TIMEOUT_US(rate);
> +
> +               /*
> +                * Poll for sensor readiness; base the timeout upon the sample
> +                * rate.
> +                */
> +               rc = regmap_read_poll_timeout(data->regmap, DPS310_MEAS_CFG,
> +                                             ready, ready & DPS310_PRS_RDY,
> +                                             DPS310_POLL_SLEEP_US(timeout),
> +                                             timeout);
> +               if (rc)
> +                       goto done;
> +       }

If you split the condition body to a helper, it can be rewritten like
(also note special definition for positive returned numbers):

  rc = ..._reset_meas_cfg(...);
  if (rc  == DPS310_MEAS_NOT_READY)
    rc = ..._new_helper_func(...);
  if (rc)
    goto done;

and looking at this it might be worth considering calling that
conditional in the middle in the _reset_meas_cfg(), so the latter will
return either 0 or negative error code.

> +       rc = dps310_check_reset_meas_cfg(data, DPS310_TMP_RDY);
>         if (rc < 0)
>                 goto done;
>
> +       if (rc > 0) {
> +               rate = dps310_get_temp_samp_freq(data);

Okay, I see this function is different, but still you may realize a
helper from below and something like above suggestion can still be
achieved.

> +               timeout = DPS310_POLL_TIMEOUT_US(rate);
> +
> +               /*
> +                * Poll for sensor readiness; base the timeout upon the sample
> +                * rate.
> +                */
> +               rc = regmap_read_poll_timeout(data->regmap, DPS310_MEAS_CFG,
> +                                             ready, ready & DPS310_TMP_RDY,
> +                                             DPS310_POLL_SLEEP_US(timeout),
> +                                             timeout);
> +               if (rc < 0)

Why out of a sudden ' < 0'?

> +                       goto done;
> +       }

As per above.

>         rc = dps310_read_temp_ready(data);
Eddie James Aug. 15, 2022, 1:59 p.m. UTC | #2
On 8/12/22 17:13, Andy Shevchenko wrote:
> On Wed, Aug 10, 2022 at 12:12 AM Eddie James <eajames@linux.ibm.com> wrote:
>> Corruption of the MEAS_CFG register has been observed soon after
>> system boot. In order to recover this scenario, check MEAS_CFG if
>> measurement isn't ready, and if it's incorrect, reset the DPS310
>> and execute the startup procedure.
> Looks like both patches miss the Fixes tag. Can you add them?


Well this isn't really a software fix - there's no identifiable bug in 
the driver. Just trying to recover the chip in this observed mystery 
scenario.


>
> ...
>
>> +/*
>> + * Called with lock held. Returns a negative value on error, a positive value
>> + * when the device is not ready, and zero when the device is ready.
>> + */
>> +static int dps310_check_reset_meas_cfg(struct dps310_data *data, int ready_bit)
>> +{
>> +       int meas_cfg;
>> +       int rc = regmap_read(data->regmap, DPS310_MEAS_CFG, &meas_cfg);
>> +
>> +       if (rc < 0)
>> +               return rc;
> Please, split definition and assignment.


Ack.


>
>> +       /* Device is ready, proceed to measurement */
>> +       if (meas_cfg & ready_bit)
>> +               return 0;
>> +
>> +       /* Device is OK, just not ready */
>> +       if (meas_cfg & (DPS310_PRS_EN | DPS310_TEMP_EN | DPS310_BACKGROUND))
>> +               return 1;
>> +
>> +       /* DPS310 register state corrupt, better start from scratch */
>> +       rc = regmap_write(data->regmap, DPS310_RESET, DPS310_RESET_MAGIC);
>> +       if (rc < 0)
>> +               return rc;
>> +
>> +       /* Wait for device chip access: 2.5ms in specification */
>> +       usleep_range(2500, 12000);
>> +
>> +       /* Reinitialize the chip */
>> +       rc = dps310_startup(data);
>> +       if (rc)
>> +               return rc;
>> +
>> +       dev_info(&data->client->dev,
>> +                "recovered from corrupted MEAS_CFG=%02x\n", meas_cfg);
>> +       return 1;
>> +}
>> +
>>   static int dps310_read_pres_raw(struct dps310_data *data)
>>   {
>>          int rc;
>> @@ -405,16 +443,26 @@ static int dps310_read_pres_raw(struct dps310_data *data)
>>          if (mutex_lock_interruptible(&data->lock))
>>                  return -EINTR;
>>
>> -       rate = dps310_get_pres_samp_freq(data);
>> -       timeout = DPS310_POLL_TIMEOUT_US(rate);
>> -
>> -       /* Poll for sensor readiness; base the timeout upon the sample rate. */
>> -       rc = regmap_read_poll_timeout(data->regmap, DPS310_MEAS_CFG, ready,
>> -                                     ready & DPS310_PRS_RDY,
>> -                                     DPS310_POLL_SLEEP_US(timeout), timeout);
>> -       if (rc)
>> +       rc = dps310_check_reset_meas_cfg(data, DPS310_PRS_RDY);
>> +       if (rc < 0)
>>                  goto done;
>>
>> +       if (rc > 0) {
>> +               rate = dps310_get_pres_samp_freq(data);
>> +               timeout = DPS310_POLL_TIMEOUT_US(rate);
>> +
>> +               /*
>> +                * Poll for sensor readiness; base the timeout upon the sample
>> +                * rate.
>> +                */
>> +               rc = regmap_read_poll_timeout(data->regmap, DPS310_MEAS_CFG,
>> +                                             ready, ready & DPS310_PRS_RDY,
>> +                                             DPS310_POLL_SLEEP_US(timeout),
>> +                                             timeout);
>> +               if (rc)
>> +                       goto done;
>> +       }
> If you split the condition body to a helper, it can be rewritten like
> (also note special definition for positive returned numbers):
>
>    rc = ..._reset_meas_cfg(...);
>    if (rc  == DPS310_MEAS_NOT_READY)
>      rc = ..._new_helper_func(...);
>    if (rc)
>      goto done;
>
> and looking at this it might be worth considering calling that
> conditional in the middle in the _reset_meas_cfg(), so the latter will
> return either 0 or negative error code.


To be honest that looks more complicated than the way it is now? And I 
don't think I can make it common between the temp and pressure without 
some complicated macro business.


>
>> +       rc = dps310_check_reset_meas_cfg(data, DPS310_TMP_RDY);
>>          if (rc < 0)
>>                  goto done;
>>
>> +       if (rc > 0) {
>> +               rate = dps310_get_temp_samp_freq(data);
> Okay, I see this function is different, but still you may realize a
> helper from below and something like above suggestion can still be
> achieved.
>
>> +               timeout = DPS310_POLL_TIMEOUT_US(rate);
>> +
>> +               /*
>> +                * Poll for sensor readiness; base the timeout upon the sample
>> +                * rate.
>> +                */
>> +               rc = regmap_read_poll_timeout(data->regmap, DPS310_MEAS_CFG,
>> +                                             ready, ready & DPS310_TMP_RDY,
>> +                                             DPS310_POLL_SLEEP_US(timeout),
>> +                                             timeout);
>> +               if (rc < 0)
> Why out of a sudden ' < 0'?


Good point, I'll fix that.


>
>> +                       goto done;
>> +       }
> As per above.
>
>>          rc = dps310_read_temp_ready(data);
>
Jonathan Cameron Aug. 20, 2022, 11:50 a.m. UTC | #3
On Mon, 15 Aug 2022 08:59:03 -0500
Eddie James <eajames@linux.ibm.com> wrote:

> On 8/12/22 17:13, Andy Shevchenko wrote:
> > On Wed, Aug 10, 2022 at 12:12 AM Eddie James <eajames@linux.ibm.com> wrote:  
> >> Corruption of the MEAS_CFG register has been observed soon after
> >> system boot. In order to recover this scenario, check MEAS_CFG if
> >> measurement isn't ready, and if it's incorrect, reset the DPS310
> >> and execute the startup procedure.  
> > Looks like both patches miss the Fixes tag. Can you add them?  
> 
> 
> Well this isn't really a software fix - there's no identifiable bug in 
> the driver. Just trying to recover the chip in this observed mystery 
> scenario.

The tag is useful as well for where to backport this to.
Probably just tag the driver introduction.

Your description makes it clear we aren't finding faults in the
driver - just that it didn't cover this undocumented case!

Jonathan
diff mbox series

Patch

diff --git a/drivers/iio/pressure/dps310.c b/drivers/iio/pressure/dps310.c
index c706a8b423b5..3a1aeeea3cdd 100644
--- a/drivers/iio/pressure/dps310.c
+++ b/drivers/iio/pressure/dps310.c
@@ -393,6 +393,44 @@  static int dps310_get_temp_k(struct dps310_data *data)
 	return scale_factors[ilog2(rc)];
 }
 
+/*
+ * Called with lock held. Returns a negative value on error, a positive value
+ * when the device is not ready, and zero when the device is ready.
+ */
+static int dps310_check_reset_meas_cfg(struct dps310_data *data, int ready_bit)
+{
+	int meas_cfg;
+	int rc = regmap_read(data->regmap, DPS310_MEAS_CFG, &meas_cfg);
+
+	if (rc < 0)
+		return rc;
+
+	/* Device is ready, proceed to measurement */
+	if (meas_cfg & ready_bit)
+		return 0;
+
+	/* Device is OK, just not ready */
+	if (meas_cfg & (DPS310_PRS_EN | DPS310_TEMP_EN | DPS310_BACKGROUND))
+		return 1;
+
+	/* DPS310 register state corrupt, better start from scratch */
+	rc = regmap_write(data->regmap, DPS310_RESET, DPS310_RESET_MAGIC);
+	if (rc < 0)
+		return rc;
+
+	/* Wait for device chip access: 2.5ms in specification */
+	usleep_range(2500, 12000);
+
+	/* Reinitialize the chip */
+	rc = dps310_startup(data);
+	if (rc)
+		return rc;
+
+	dev_info(&data->client->dev,
+		 "recovered from corrupted MEAS_CFG=%02x\n", meas_cfg);
+	return 1;
+}
+
 static int dps310_read_pres_raw(struct dps310_data *data)
 {
 	int rc;
@@ -405,16 +443,26 @@  static int dps310_read_pres_raw(struct dps310_data *data)
 	if (mutex_lock_interruptible(&data->lock))
 		return -EINTR;
 
-	rate = dps310_get_pres_samp_freq(data);
-	timeout = DPS310_POLL_TIMEOUT_US(rate);
-
-	/* Poll for sensor readiness; base the timeout upon the sample rate. */
-	rc = regmap_read_poll_timeout(data->regmap, DPS310_MEAS_CFG, ready,
-				      ready & DPS310_PRS_RDY,
-				      DPS310_POLL_SLEEP_US(timeout), timeout);
-	if (rc)
+	rc = dps310_check_reset_meas_cfg(data, DPS310_PRS_RDY);
+	if (rc < 0)
 		goto done;
 
+	if (rc > 0) {
+		rate = dps310_get_pres_samp_freq(data);
+		timeout = DPS310_POLL_TIMEOUT_US(rate);
+
+		/*
+		 * Poll for sensor readiness; base the timeout upon the sample
+		 * rate.
+		 */
+		rc = regmap_read_poll_timeout(data->regmap, DPS310_MEAS_CFG,
+					      ready, ready & DPS310_PRS_RDY,
+					      DPS310_POLL_SLEEP_US(timeout),
+					      timeout);
+		if (rc)
+			goto done;
+	}
+
 	rc = regmap_bulk_read(data->regmap, DPS310_PRS_BASE, val, sizeof(val));
 	if (rc < 0)
 		goto done;
@@ -454,16 +502,26 @@  static int dps310_read_temp_raw(struct dps310_data *data)
 	if (mutex_lock_interruptible(&data->lock))
 		return -EINTR;
 
-	rate = dps310_get_temp_samp_freq(data);
-	timeout = DPS310_POLL_TIMEOUT_US(rate);
-
-	/* Poll for sensor readiness; base the timeout upon the sample rate. */
-	rc = regmap_read_poll_timeout(data->regmap, DPS310_MEAS_CFG, ready,
-				      ready & DPS310_TMP_RDY,
-				      DPS310_POLL_SLEEP_US(timeout), timeout);
+	rc = dps310_check_reset_meas_cfg(data, DPS310_TMP_RDY);
 	if (rc < 0)
 		goto done;
 
+	if (rc > 0) {
+		rate = dps310_get_temp_samp_freq(data);
+		timeout = DPS310_POLL_TIMEOUT_US(rate);
+
+		/*
+		 * Poll for sensor readiness; base the timeout upon the sample
+		 * rate.
+		 */
+		rc = regmap_read_poll_timeout(data->regmap, DPS310_MEAS_CFG,
+					      ready, ready & DPS310_TMP_RDY,
+					      DPS310_POLL_SLEEP_US(timeout),
+					      timeout);
+		if (rc < 0)
+			goto done;
+	}
+
 	rc = dps310_read_temp_ready(data);
 
 done: