diff mbox series

soundwire: bus: conditionally recheck UNATTACHED status

Message ID 20220830074224.2924179-1-yung-chuan.liao@linux.intel.com (mailing list archive)
State New, archived
Headers show
Series soundwire: bus: conditionally recheck UNATTACHED status | expand

Commit Message

Bard Liao Aug. 30, 2022, 7:42 a.m. UTC
From: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>

In configurations with two amplifiers on the same link, the Intel CI
reports occasional enumeration/initialization timeouts during
suspend-resume stress tests, where one of the two amplifiers becomes
UNATTACHED immediately after being enumerated. This problem was
reported both with Maxim and Realtek codecs, which pointed initially
to a problem with status handling on the Intel side.

The Cadence IP integrated on Intel platforms throws an interrupt when
the status changes, and the information is kept with sticky bits until
cleared. We initially added more checks to make sure the edge
detection did not miss any transition, but that did not improve the
results significantly.

With the recent addition of the read_ping_status() callback, we were
able to show that the status in sticky bits is shown as UNATTACHED
even though the PING frames show the problematic device as
ATTACHED. That completely breaks the entire logic where we assumed
that a peripheral would always re-attach as device0. The resume
timeouts make sense in that in those cases, the
enumeration/initialization never happens a second time.

One possible explanation is that this problem typically happens when a
bus clash is reported, so it could very well be that the detection is
fooled by a transient electrical issue or conflict between two
peripherals.

This patch conditionally double-checks the status reported in the
sticky bits with the actual PING frame status. If the peripheral
reports as attached in PING frames, the early detection based on
sticky bits is discarded.

Note that this patch only corrects issues of false positives on the
manager side.

If the peripheral lost and regain sync, then it would report as
attached on Device0. A peripheral that would not reset its dev_num
would not be compliant with the MIPI specification.

BugLink: https://github.com/thesofproject/linux/issues/3638
BugLink: https://github.com/thesofproject/linux/issues/3325
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Reviewed-by: Rander Wang <rander.wang@intel.com>
Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com>
---
Hi Vinod,

You will need the "ASoC/soundwire: log actual PING status on resume issues"
series which is applied at ASoC tree before appling this patch.

---
 drivers/soundwire/bus.c       | 19 +++++++++++++++++++
 drivers/soundwire/intel.c     |  1 +
 include/linux/soundwire/sdw.h |  3 +++
 3 files changed, 23 insertions(+)

Comments

Pierre-Louis Bossart Sept. 12, 2022, 11:57 a.m. UTC | #1
[top posting]
I reverted this patch in the SOF tree to use Richard Fitzgerald's
series, see

https://github.com/thesofproject/linux/pull/3836

I don't think we want this patch upstream, do we?


On 8/30/22 09:42, Bard Liao wrote:
> From: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
> 
> In configurations with two amplifiers on the same link, the Intel CI
> reports occasional enumeration/initialization timeouts during
> suspend-resume stress tests, where one of the two amplifiers becomes
> UNATTACHED immediately after being enumerated. This problem was
> reported both with Maxim and Realtek codecs, which pointed initially
> to a problem with status handling on the Intel side.
> 
> The Cadence IP integrated on Intel platforms throws an interrupt when
> the status changes, and the information is kept with sticky bits until
> cleared. We initially added more checks to make sure the edge
> detection did not miss any transition, but that did not improve the
> results significantly.
> 
> With the recent addition of the read_ping_status() callback, we were
> able to show that the status in sticky bits is shown as UNATTACHED
> even though the PING frames show the problematic device as
> ATTACHED. That completely breaks the entire logic where we assumed
> that a peripheral would always re-attach as device0. The resume
> timeouts make sense in that in those cases, the
> enumeration/initialization never happens a second time.
> 
> One possible explanation is that this problem typically happens when a
> bus clash is reported, so it could very well be that the detection is
> fooled by a transient electrical issue or conflict between two
> peripherals.
> 
> This patch conditionally double-checks the status reported in the
> sticky bits with the actual PING frame status. If the peripheral
> reports as attached in PING frames, the early detection based on
> sticky bits is discarded.
> 
> Note that this patch only corrects issues of false positives on the
> manager side.
> 
> If the peripheral lost and regain sync, then it would report as
> attached on Device0. A peripheral that would not reset its dev_num
> would not be compliant with the MIPI specification.
> 
> BugLink: https://github.com/thesofproject/linux/issues/3638
> BugLink: https://github.com/thesofproject/linux/issues/3325
> Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
> Reviewed-by: Rander Wang <rander.wang@intel.com>
> Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com>
> ---
> Hi Vinod,
> 
> You will need the "ASoC/soundwire: log actual PING status on resume issues"
> series which is applied at ASoC tree before appling this patch.
> 
> ---
>  drivers/soundwire/bus.c       | 19 +++++++++++++++++++
>  drivers/soundwire/intel.c     |  1 +
>  include/linux/soundwire/sdw.h |  3 +++
>  3 files changed, 23 insertions(+)
> 
> diff --git a/drivers/soundwire/bus.c b/drivers/soundwire/bus.c
> index 2772973eebb1..d0d486f07673 100644
> --- a/drivers/soundwire/bus.c
> +++ b/drivers/soundwire/bus.c
> @@ -1767,6 +1767,25 @@ int sdw_handle_slave_status(struct sdw_bus *bus,
>  		    slave->status != SDW_SLAVE_UNATTACHED) {
>  			dev_warn(&slave->dev, "Slave %d state check1: UNATTACHED, status was %d\n",
>  				 i, slave->status);
> +
> +			if (bus->recheck_unattached && bus->ops->read_ping_status) {
> +				u32 ping_status;
> +
> +				mutex_lock(&bus->msg_lock);
> +
> +				ping_status = bus->ops->read_ping_status(bus);
> +
> +				mutex_unlock(&bus->msg_lock);
> +
> +				ping_status >>= (i * 2);
> +				ping_status &= 0x3;
> +
> +				if (ping_status != 0) {
> +					dev_warn(&slave->dev, "Slave %d state in PING frame is %d, ignoring earlier detection\n",
> +						 i, ping_status);
> +					continue;
> +				}
> +			}
>  			sdw_modify_slave_status(slave, SDW_SLAVE_UNATTACHED);
>  		}
>  	}
> diff --git a/drivers/soundwire/intel.c b/drivers/soundwire/intel.c
> index 25ec9c272239..0c6e674dbf85 100644
> --- a/drivers/soundwire/intel.c
> +++ b/drivers/soundwire/intel.c
> @@ -1311,6 +1311,7 @@ static int intel_link_probe(struct auxiliary_device *auxdev,
>  
>  	bus->link_id = auxdev->id;
>  	bus->dev_num_ida_min = INTEL_DEV_NUM_IDA_MIN;
> +	bus->recheck_unattached = true;
>  
>  	sdw_cdns_probe(cdns);
>  
> diff --git a/include/linux/soundwire/sdw.h b/include/linux/soundwire/sdw.h
> index a2b31d25ea27..51ac71984260 100644
> --- a/include/linux/soundwire/sdw.h
> +++ b/include/linux/soundwire/sdw.h
> @@ -892,6 +892,8 @@ struct sdw_master_ops {
>   * @dev_num_ida_min: if set, defines the minimum values for the IDA
>   * used to allocate system-unique device numbers. This value needs to be
>   * identical across all SoundWire bus in the system.
> + * @recheck_unattached: if set, double-check UNATTACHED status changes
> + * by reading PING frame status.
>   */
>  struct sdw_bus {
>  	struct device *dev;
> @@ -917,6 +919,7 @@ struct sdw_bus {
>  	bool multi_link;
>  	int hw_sync_min_links;
>  	int dev_num_ida_min;
> +	bool recheck_unattached;
>  };
>  
>  int sdw_bus_master_add(struct sdw_bus *bus, struct device *parent,
Liao, Bard Sept. 12, 2022, 4:18 p.m. UTC | #2
> -----Original Message-----
> From: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
> Sent: Monday, September 12, 2022 4:58 AM
> To: Bard Liao <yung-chuan.liao@linux.intel.com>; alsa-devel@alsa-project.org;
> vkoul@kernel.org; broonie@kernel.org
> Cc: vinod.koul@linaro.org; Liao, Bard <bard.liao@intel.com>; linux-
> kernel@vger.kernel.org; Richard Fitzgerald <rf@opensource.cirrus.com>
> Subject: Re: [PATCH] soundwire: bus: conditionally recheck UNATTACHED status
> 
> [top posting]
> I reverted this patch in the SOF tree to use Richard Fitzgerald's
> series, see
> 
> https://github.com/thesofproject/linux/pull/3836
> 
> I don't think we want this patch upstream, do we?

Agree, let's don't merge this upstream.

> 
> 
> On 8/30/22 09:42, Bard Liao wrote:
> > From: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
> >
> > In configurations with two amplifiers on the same link, the Intel CI
> > reports occasional enumeration/initialization timeouts during
> > suspend-resume stress tests, where one of the two amplifiers becomes
> > UNATTACHED immediately after being enumerated. This problem was
> > reported both with Maxim and Realtek codecs, which pointed initially
> > to a problem with status handling on the Intel side.
> >
> > The Cadence IP integrated on Intel platforms throws an interrupt when
> > the status changes, and the information is kept with sticky bits until
> > cleared. We initially added more checks to make sure the edge
> > detection did not miss any transition, but that did not improve the
> > results significantly.
> >
> > With the recent addition of the read_ping_status() callback, we were
> > able to show that the status in sticky bits is shown as UNATTACHED
> > even though the PING frames show the problematic device as
> > ATTACHED. That completely breaks the entire logic where we assumed
> > that a peripheral would always re-attach as device0. The resume
> > timeouts make sense in that in those cases, the
> > enumeration/initialization never happens a second time.
> >
> > One possible explanation is that this problem typically happens when a
> > bus clash is reported, so it could very well be that the detection is
> > fooled by a transient electrical issue or conflict between two
> > peripherals.
> >
> > This patch conditionally double-checks the status reported in the
> > sticky bits with the actual PING frame status. If the peripheral
> > reports as attached in PING frames, the early detection based on
> > sticky bits is discarded.
> >
> > Note that this patch only corrects issues of false positives on the
> > manager side.
> >
> > If the peripheral lost and regain sync, then it would report as
> > attached on Device0. A peripheral that would not reset its dev_num
> > would not be compliant with the MIPI specification.
> >
> > BugLink: https://github.com/thesofproject/linux/issues/3638
> > BugLink: https://github.com/thesofproject/linux/issues/3325
> > Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
> > Reviewed-by: Rander Wang <rander.wang@intel.com>
> > Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com>
> > ---
> > Hi Vinod,
> >
> > You will need the "ASoC/soundwire: log actual PING status on resume issues"
> > series which is applied at ASoC tree before appling this patch.
> >
> > ---
> >  drivers/soundwire/bus.c       | 19 +++++++++++++++++++
> >  drivers/soundwire/intel.c     |  1 +
> >  include/linux/soundwire/sdw.h |  3 +++
> >  3 files changed, 23 insertions(+)
> >
> > diff --git a/drivers/soundwire/bus.c b/drivers/soundwire/bus.c
> > index 2772973eebb1..d0d486f07673 100644
> > --- a/drivers/soundwire/bus.c
> > +++ b/drivers/soundwire/bus.c
> > @@ -1767,6 +1767,25 @@ int sdw_handle_slave_status(struct sdw_bus *bus,
> >  		    slave->status != SDW_SLAVE_UNATTACHED) {
> >  			dev_warn(&slave->dev, "Slave %d state check1:
> UNATTACHED, status was %d\n",
> >  				 i, slave->status);
> > +
> > +			if (bus->recheck_unattached && bus->ops-
> >read_ping_status) {
> > +				u32 ping_status;
> > +
> > +				mutex_lock(&bus->msg_lock);
> > +
> > +				ping_status = bus->ops->read_ping_status(bus);
> > +
> > +				mutex_unlock(&bus->msg_lock);
> > +
> > +				ping_status >>= (i * 2);
> > +				ping_status &= 0x3;
> > +
> > +				if (ping_status != 0) {
> > +					dev_warn(&slave->dev, "Slave %d state
> in PING frame is %d, ignoring earlier detection\n",
> > +						 i, ping_status);
> > +					continue;
> > +				}
> > +			}
> >  			sdw_modify_slave_status(slave,
> SDW_SLAVE_UNATTACHED);
> >  		}
> >  	}
> > diff --git a/drivers/soundwire/intel.c b/drivers/soundwire/intel.c
> > index 25ec9c272239..0c6e674dbf85 100644
> > --- a/drivers/soundwire/intel.c
> > +++ b/drivers/soundwire/intel.c
> > @@ -1311,6 +1311,7 @@ static int intel_link_probe(struct auxiliary_device
> *auxdev,
> >
> >  	bus->link_id = auxdev->id;
> >  	bus->dev_num_ida_min = INTEL_DEV_NUM_IDA_MIN;
> > +	bus->recheck_unattached = true;
> >
> >  	sdw_cdns_probe(cdns);
> >
> > diff --git a/include/linux/soundwire/sdw.h b/include/linux/soundwire/sdw.h
> > index a2b31d25ea27..51ac71984260 100644
> > --- a/include/linux/soundwire/sdw.h
> > +++ b/include/linux/soundwire/sdw.h
> > @@ -892,6 +892,8 @@ struct sdw_master_ops {
> >   * @dev_num_ida_min: if set, defines the minimum values for the IDA
> >   * used to allocate system-unique device numbers. This value needs to be
> >   * identical across all SoundWire bus in the system.
> > + * @recheck_unattached: if set, double-check UNATTACHED status changes
> > + * by reading PING frame status.
> >   */
> >  struct sdw_bus {
> >  	struct device *dev;
> > @@ -917,6 +919,7 @@ struct sdw_bus {
> >  	bool multi_link;
> >  	int hw_sync_min_links;
> >  	int dev_num_ida_min;
> > +	bool recheck_unattached;
> >  };
> >
> >  int sdw_bus_master_add(struct sdw_bus *bus, struct device *parent,
diff mbox series

Patch

diff --git a/drivers/soundwire/bus.c b/drivers/soundwire/bus.c
index 2772973eebb1..d0d486f07673 100644
--- a/drivers/soundwire/bus.c
+++ b/drivers/soundwire/bus.c
@@ -1767,6 +1767,25 @@  int sdw_handle_slave_status(struct sdw_bus *bus,
 		    slave->status != SDW_SLAVE_UNATTACHED) {
 			dev_warn(&slave->dev, "Slave %d state check1: UNATTACHED, status was %d\n",
 				 i, slave->status);
+
+			if (bus->recheck_unattached && bus->ops->read_ping_status) {
+				u32 ping_status;
+
+				mutex_lock(&bus->msg_lock);
+
+				ping_status = bus->ops->read_ping_status(bus);
+
+				mutex_unlock(&bus->msg_lock);
+
+				ping_status >>= (i * 2);
+				ping_status &= 0x3;
+
+				if (ping_status != 0) {
+					dev_warn(&slave->dev, "Slave %d state in PING frame is %d, ignoring earlier detection\n",
+						 i, ping_status);
+					continue;
+				}
+			}
 			sdw_modify_slave_status(slave, SDW_SLAVE_UNATTACHED);
 		}
 	}
diff --git a/drivers/soundwire/intel.c b/drivers/soundwire/intel.c
index 25ec9c272239..0c6e674dbf85 100644
--- a/drivers/soundwire/intel.c
+++ b/drivers/soundwire/intel.c
@@ -1311,6 +1311,7 @@  static int intel_link_probe(struct auxiliary_device *auxdev,
 
 	bus->link_id = auxdev->id;
 	bus->dev_num_ida_min = INTEL_DEV_NUM_IDA_MIN;
+	bus->recheck_unattached = true;
 
 	sdw_cdns_probe(cdns);
 
diff --git a/include/linux/soundwire/sdw.h b/include/linux/soundwire/sdw.h
index a2b31d25ea27..51ac71984260 100644
--- a/include/linux/soundwire/sdw.h
+++ b/include/linux/soundwire/sdw.h
@@ -892,6 +892,8 @@  struct sdw_master_ops {
  * @dev_num_ida_min: if set, defines the minimum values for the IDA
  * used to allocate system-unique device numbers. This value needs to be
  * identical across all SoundWire bus in the system.
+ * @recheck_unattached: if set, double-check UNATTACHED status changes
+ * by reading PING frame status.
  */
 struct sdw_bus {
 	struct device *dev;
@@ -917,6 +919,7 @@  struct sdw_bus {
 	bool multi_link;
 	int hw_sync_min_links;
 	int dev_num_ida_min;
+	bool recheck_unattached;
 };
 
 int sdw_bus_master_add(struct sdw_bus *bus, struct device *parent,