Message ID | 20230111090222.2016499-20-Vijendar.Mukunda@amd.com (mailing list archive) |
---|---|
State | New, archived |
Series | Add soundwire support for Pink Sardine platform |
On 1/11/23 03:02, Vijendar Mukunda wrote:
> To avoid ACP entering into D3 state during slave enumeration and
> initialization on two soundwire controller instances for multiple codecs,
> increase the runtime suspend delay to 3 seconds.

You have a parent PCI device and a set of child devices for each manager. The parent PCI device cannot suspend before all its children are also suspended, so shouldn't the delay be modified at the manager level?

Not getting what this delay is and how this would deal with a lengthy enumeration/initialization process.

> Signed-off-by: Vijendar Mukunda <Vijendar.Mukunda@amd.com>
> ---
> sound/soc/amd/ps/acp63.h | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/sound/soc/amd/ps/acp63.h b/sound/soc/amd/ps/acp63.h
> index 833d0b5aa73d..6c8849f2bcec 100644
> --- a/sound/soc/amd/ps/acp63.h
> +++ b/sound/soc/amd/ps/acp63.h
> @@ -51,7 +51,7 @@
> #define MIN_BUFFER MAX_BUFFER
>
> /* time in ms for runtime suspend delay */
> -#define ACP_SUSPEND_DELAY_MS 2000
> +#define ACP_SUSPEND_DELAY_MS 3000
>
> #define ACP63_DMIC_ADDR 2
> #define ACP63_PDM_MODE_DEVS 3
On 11/01/23 21:32, Pierre-Louis Bossart wrote:
> On 1/11/23 03:02, Vijendar Mukunda wrote:
>> To avoid ACP entering into D3 state during slave enumeration and
>> initialization on two soundwire controller instances for multiple codecs,
>> increase the runtime suspend delay to 3 seconds.
> You have a parent PCI device and a set of child devices for each
> manager. The parent PCI device cannot suspend before all its children
> are also suspended, so shouldn't the delay be modified at the manager level?
>
> Not getting what this delay is and how this would deal with a lengthy
> enumeration/initialization process.

Yes, agreed. Until the child devices are suspended, the parent device will stay in the D0 state. We will rephrase the commit message.

The machine driver node is created by the ACP PCI driver. We have added a delay in the machine driver to make sure that the two manager instances complete codec enumeration and peripheral initialization before the sound card is registered. Without that delay, the card is registered early, before codec initialization has completed, and the manager enters a bad state because of codec read/write failures. We intend to keep the ACP in the D0 state until the sound card is created and the jack controls are initialized. To handle this, we increased the runtime suspend delay at the manager level.
On 1/12/23 05:02, Mukunda,Vijendar wrote:
> Machine driver node will be created by ACP PCI driver.
> We have added delay in machine driver to make sure
> two manager instances completes codec enumeration and
> peripheral initialization before registering the sound card.
> Without adding delay in machine driver will result early card
> registration before codec initialization is completed. Manager
> will enter in to bad state due to codec read/write failures.
> We are intended to keep the ACP in D0 state, till sound card
> is created and jack controls are initialized. To handle, at manager
> level increased runtime suspend delay.

This doesn't look too good. You should not assume any timing dependencies in the machine driver probe. I made that mistake in earlier versions and we had to revisit all this to make sure drivers could be bound/unbound at any time.
On 1/12/2023 08:54, Pierre-Louis Bossart wrote:
> This doesn't look too good. You should not assume any timing
> dependencies in the machine driver probe. I made that mistake in earlier
> versions and we had to revisit all this to make sure drivers could be
> bound/unbound at any time.

Rather than a timing dependency, could you perhaps prohibit runtime PM and have a codec make a callback to indicate it's fully initialized and then allow runtime PM again?
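A rough user-space model of the idea in this message, for illustration only: the probe path takes a usage reference so that the autosuspend timer cannot suspend the device, and a hypothetical codec "initialized" callback drops that reference. In a kernel driver the corresponding calls would presumably be pm_runtime_get_noresume() and pm_runtime_put_autosuspend(); every name in the sketch (pm_model, model_*) is invented for the example and is not a kernel API.

```c
#include <stdbool.h>

/* Hypothetical model of a runtime-PM usage counter (not kernel code). */
struct pm_model {
    int usage_count;   /* outstanding "keep me powered" references */
    bool suspended;    /* true once the device has been suspended */
};

/* Take a reference without touching the hardware, as a probe routine might. */
void model_get_noresume(struct pm_model *pm)
{
    pm->usage_count++;
}

/* Drop a reference; in this model it is called from the codec's
 * "fully initialized" callback. */
void model_put(struct pm_model *pm)
{
    if (pm->usage_count > 0)
        pm->usage_count--;
}

/* The autosuspend timer fires: suspend only if nobody holds a reference.
 * Returns the resulting suspended state. */
bool model_autosuspend_timer_fired(struct pm_model *pm)
{
    if (pm->usage_count == 0)
        pm->suspended = true;
    return pm->suspended;
}
```

With this shape, how long initialization takes no longer matters: the timer can fire any number of times while the reference is held, and suspend only becomes possible after the callback has run.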
On 1/12/23 09:29, Limonciello, Mario wrote:
> On 1/12/2023 08:54, Pierre-Louis Bossart wrote:
>> On 1/12/23 05:02, Mukunda,Vijendar wrote:
>>> Machine driver node will be created by ACP PCI driver.
>>> We have added delay in machine driver to make sure
>>> two manager instances completes codec enumeration and
>>> peripheral initialization before registering the sound card.
>>> Without adding delay in machine driver will result early card
>>> registration before codec initialization is completed. Manager
>>> will enter in to bad state due to codec read/write failures.
>>> We are intended to keep the ACP in D0 state, till sound card
>>> is created and jack controls are initialized. To handle, at manager
>>> level increased runtime suspend delay.
>>
>> This doesn't look too good. You should not assume any timing
>> dependencies in the machine driver probe. I made that mistake in earlier
>> versions and we had to revisit all this to make sure drivers could be
>> bound/unbound at any time.
>
> Rather than a timing dependency, could you perhaps prohibit runtime PM
> and have a codec make a callback to indicate it's fully initialized and
> then allow runtime PM again?

We already have enumeration and initialization 'struct completion' that are used by codec drivers to know if the hardware is usable. We also have pm_runtime_get_sync() in the bus layer to make sure the codec is resumed before being accessed.

The explanations above confuse card registration and manager probe/initialization. These are two different things. Maybe there's indeed a missing part in the SoundWire PM assumptions, but I am not getting what the issue is.
On 12/01/23 21:35, Pierre-Louis Bossart wrote:
> On 1/12/23 09:29, Limonciello, Mario wrote:
>> Rather than a timing dependency, could you perhaps prohibit runtime PM
>> and have a codec make a callback to indicate it's fully initialized and
>> then allow runtime PM again?
> We already have enumeration and initialization 'struct completion' that
> are used by codec drivers to know if the hardware is usable. We also
> have pm_runtime_get_sync() in the bus layer to make sure the codec is
> resumed before being accessed.

Instead of walking through the codec list and checking the completion status of every codec on the link, could we have a solution where, once all codecs are enumerated and initialized, a variable in the bus instance is updated to indicate that all peripherals are initialized? We could then check this variable in the machine driver.

> The explanations above confuse card registration and manager
> probe/initialization. These are two different things. Maybe there's
> indeed a missing part in the SoundWire PM assumptions, but I am not
> getting what the issue is.

We will rephrase the commit message. At the manager level we want to increase the delay to 3s.
> Instead of walking through codec list and checking completion status
> for every codec over the link, can we have some solution where once
> all codecs gets enumerated and initialized, a variable in bus instance
> will be updated to know all peripherals initialized. So that we can
> check this variable in machine driver.

No, because the bus cannot know for sure what codecs to expect on the platform.

This comes from the design: we first create a bunch of devices based on ACPI information, which causes the drivers to probe. Then when the bus starts, codecs that are physically present on the bus will attach and be initialized in the update_status callback. It's perfectly acceptable for devices to be exposed in ACPI and not be present on a board. The bus wouldn't know what is needed.

I am still not clear on what the "early card registration" issue might be. Can you clarify which codec registers are accessed in that case, and whether those are supposed to be managed with regmap? One possibility is that we need to make sure the codec drivers are in regmap cache_only mode at probe time; that may prevent this sort of uncontrolled register access. I had a PR on this that I haven't touched in a while, see [1].

I do recall some issues with the codec jacks, where if the card registration happens too late the codec might have suspended. But we added pm_runtime_resume_and_get in the set_jack_detect callbacks, so that was solved.

[1] https://github.com/thesofproject/linux/pull/3941
On Fri, Jan 13, 2023 at 11:33:09AM -0600, Pierre-Louis Bossart wrote:
> I do recall some issues with the codec jacks, where if the card
> registration happens too late the codec might have suspended. But we
> added pm_runtime_resume_and_get in the set_jack_detect callbacks, so
> that was solved.

Right, I would expect that whatever needs the device to be powered on would be explicitly ensuring that this is done rather than tweaking timeouts - the timeouts should be more of a performance thing to avoid bouncing power too much, not a correctness thing.
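To make the "performance, not correctness" distinction concrete, here is a small user-space model (a sketch under stated assumptions, not driver code). An unreferenced device suspends after it has been idle for autosuspend_delay_ms; an access that first resumes the device succeeds whatever the delay is, while an unguarded access depends on timing. The dev_model structure and all model_* functions are hypothetical names; the guarded path only mimics what pm_runtime_resume_and_get() followed by a put is described as doing in the message above.

```c
#include <stdbool.h>

/* Hypothetical model of a device with an autosuspend timer. */
struct dev_model {
    int usage_count;                   /* active references */
    bool suspended;                    /* low-power state reached */
    unsigned int idle_ms;              /* time spent idle so far */
    unsigned int autosuspend_delay_ms; /* tunable delay */
};

/* Let time pass; an unreferenced device suspends once idle long enough. */
void model_idle(struct dev_model *d, unsigned int ms)
{
    if (d->usage_count > 0)
        return;
    d->idle_ms += ms;
    if (d->idle_ms >= d->autosuspend_delay_ms)
        d->suspended = true;
}

/* Resume the device and take a reference. */
int model_resume_and_get(struct dev_model *d)
{
    d->usage_count++;
    d->suspended = false;
    d->idle_ms = 0;
    return 0;
}

/* Drop the reference and restart the idle timer. */
void model_put_autosuspend(struct dev_model *d)
{
    if (d->usage_count > 0)
        d->usage_count--;
    d->idle_ms = 0;
}

/* A register write that fails when the device is suspended. */
int model_write_unguarded(const struct dev_model *d)
{
    return d->suspended ? -1 : 0;
}

/* The same write, with the power state ensured explicitly first. */
int model_write_guarded(struct dev_model *d)
{
    int ret = model_resume_and_get(d);

    if (ret < 0)
        return ret;
    ret = model_write_unguarded(d);
    model_put_autosuspend(d);
    return ret;
}
```

In this model, changing autosuspend_delay_ms from 2000 to 3000 only moves the point at which the unguarded write starts failing; the guarded write is unaffected by the value.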
On 14/01/23 01:27, Mark Brown wrote:
> On Fri, Jan 13, 2023 at 11:33:09AM -0600, Pierre-Louis Bossart wrote:
>
>> I do recall some issues with the codec jacks, where if the card
>> registration happens too late the codec might have suspended. But we
>> added pm_runtime_resume_and_get in the set_jack_detect callbacks, so
>> that was solved.
> Right, I would expect that whatever needs the device to be powered on
> would be explicitly ensuring that this is done rather than tweaking
> timeouts - the timeouts should be more of a performance thing to avoid
> bouncing power too much, not a correctness thing.

The machine driver probe executes in parallel with the manager driver probe sequence. Because of this, if card registration completes before all peripherals have been enumerated across the multiple links, codec register writes will fail because the codec device numbers have not been assigned yet.

If we understood your suggestion correctly, we shouldn't use any time bounds in the machine driver probe sequence; instead, before registering the sound card, we need to traverse the peripheral initialization completion status for all the managers.
On 1/16/23 02:35, Mukunda,Vijendar wrote:
> Machine driver probe is executed in parallel with Manager driver
> probe sequence. Because of it, before completion of all peripherals
> enumeration across the multiple links, if card registration is
> completed, codec register writes will fail as Codec device numbers
> are not assigned.
>
> If we understood correctly, as per your suggestion, We shouldn't use any
> time bounds in machine driver probe sequence and before registering the
> sound card, need to traverses through all peripheral initialization completion
> status for all the managers.

What's not clear in your reply is this:

What codec registers are accessed as a result of the machine driver probe and card registration, and in what part of the card registration?

Are we talking about SoundWire 'standard' registers for device/port management, about vendor-specific ones that are exposed to userspace, or vendor-specific ones entirely configured by the driver/regmap?

You've got to give us more data or understanding of the sequence to help. Saying there's a race condition doesn't really help if there's nothing that explains what codec registers are accessed and when.
On 16/01/23 20:32, Pierre-Louis Bossart wrote:
> What's not clear in your reply is this:
>
> What codec registers are accessed as a result of the machine driver
> probe and card registration, and in what part of the card registration?
>
> Are we talking about SoundWire 'standard' registers for device/port
> management, about vendor specific ones that are exposed to userspace, or
> vendor-specific ones entirely configured by the driver/regmap.
>
> You've got to give us more data or understanding of the sequence to
> help. Saying there's a race condition doesn't really help if there's
> nothing that explains what codec registers are accessed and when.

We have come across a race condition where sound card registration succeeds before codec enumeration across all the links has completed, and our manager instance goes into a bad state.

Please refer to the link below for error logs.
https://pastebin.com/ZYEN928S
On 1/17/23 05:33, Mukunda,Vijendar wrote:
> We have come across a race condition, where sound card registration
> is successful before codec enumerations across all the links gets completed
> and our manager instance going into bad state.
>
> Please refer below link for error logs.
> https://pastebin.com/ZYEN928S

You have two RT1316 register areas that are accessed while the codec is not even enumerated:

[ 2.755828] rt1316-sdca sdw:0:025d:1316:01:0: ASoC: error at snd_soc_component_update_bits on sdw:0:025d:1316:01:0 for register: [0x41080100] -22
[ 2.758904] rt1316-sdca sdw:0:025d:1316:01:0: ASoC: error at snd_soc_component_update_bits on sdw:0:025d:1316:01:0 for register: [0x00003004] -110

The last one is clearly listed in the regmap list.

You probably want to reverse-engineer what causes these accesses.

I see this suspicious kcontrol definition that might be related:

SOC_SINGLE("Left I Tag Select", 0x3004, 4, 7, 0),
On Tue, Jan 17, 2023 at 05:51:03AM -0600, Pierre-Louis Bossart wrote:
> On 1/17/23 05:33, Mukunda,Vijendar wrote:

> [ 2.758904] rt1316-sdca sdw:0:025d:1316:01:0: ASoC: error at
> snd_soc_component_update_bits on sdw:0:025d:1316:01:0 for register:
> [0x00003004] -110

> The last one is clearly listed in the regmap list.

> You probably want to reverse-engineer what causes these accesses.
> I see this suspicious kcontrol definition that might be related:

> SOC_SINGLE("Left I Tag Select", 0x3004, 4, 7, 0),

Looks like a case for putting the CODEC in cache only mode...
On 1/17/23 06:16, Mark Brown wrote:
> On Tue, Jan 17, 2023 at 05:51:03AM -0600, Pierre-Louis Bossart wrote:
>> I see this suspicious kcontrol definition that might be related:
>
>> SOC_SINGLE("Left I Tag Select", 0x3004, 4, 7, 0),
>
> Looks like a case for putting the CODEC in cache only mode...

Right, and I think we'd need to do this during the probe instead of the hardware initialization (which could happen at a later time).

I started a PR to try and improve regmap handling, see https://github.com/thesofproject/linux/pull/3941

I was trying to solve the case where codecs become unattached, but apparently the problem is hardware-related. One of the suggested improvements was to move the cache_only part earlier to prevent such accesses. Unfortunately the work isn't complete so that PR is just a draft at the moment.
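A toy user-space model of the cache-only idea discussed above, offered as a sketch and not as the regmap implementation: while the map is marked cache-only (set at probe in this model), writes land in a cache and never reach the bus; once the codec has attached, the cached values are flushed. The kernel counterparts would presumably be regcache_cache_only() and regcache_sync(); the regmap_model type, MODEL_NUM_REGS, and the model_* functions are made up for the example.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_NUM_REGS 4

/* Hypothetical register map with a write-back cache (not the regmap API). */
struct regmap_model {
    uint32_t cache[MODEL_NUM_REGS]; /* last value written by the driver */
    bool dirty[MODEL_NUM_REGS];     /* cached but not yet written to hardware */
    uint32_t hw[MODEL_NUM_REGS];    /* stands in for the codec registers */
    bool cache_only;                /* true while the codec is unattached */
    int hw_writes;                  /* number of accesses that hit the bus */
};

/* Write a register: cache-only mode defers the bus access. */
int model_write(struct regmap_model *m, unsigned int idx, uint32_t val)
{
    if (idx >= MODEL_NUM_REGS)
        return -1;
    m->cache[idx] = val;
    if (m->cache_only) {
        m->dirty[idx] = true;
        return 0;
    }
    m->hw[idx] = val;
    m->hw_writes++;
    return 0;
}

/* The codec attached: leave cache-only mode and flush deferred writes. */
void model_attach_and_sync(struct regmap_model *m)
{
    unsigned int i;

    m->cache_only = false;
    for (i = 0; i < MODEL_NUM_REGS; i++) {
        if (m->dirty[i]) {
            m->hw[i] = m->cache[i];
            m->hw_writes++;
            m->dirty[i] = false;
        }
    }
}
```

In this model a control write issued during early card registration returns success without touching the bus, and the value reaches the hardware only after attach, which is the behaviour the cache-only suggestion is aiming for.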
diff --git a/sound/soc/amd/ps/acp63.h b/sound/soc/amd/ps/acp63.h
index 833d0b5aa73d..6c8849f2bcec 100644
--- a/sound/soc/amd/ps/acp63.h
+++ b/sound/soc/amd/ps/acp63.h
@@ -51,7 +51,7 @@
 #define MIN_BUFFER MAX_BUFFER
 
 /* time in ms for runtime suspend delay */
-#define ACP_SUSPEND_DELAY_MS 2000
+#define ACP_SUSPEND_DELAY_MS 3000
 
 #define ACP63_DMIC_ADDR 2
 #define ACP63_PDM_MODE_DEVS 3
To avoid ACP entering into D3 state during slave enumeration and initialization on two soundwire controller instances for multiple codecs, increase the runtime suspend delay to 3 seconds.

Signed-off-by: Vijendar Mukunda <Vijendar.Mukunda@amd.com>
---
sound/soc/amd/ps/acp63.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)