diff mbox series

[RFC] ALSA: hda/hdmi: fix race in handling acomp ELD notification at resume

Message ID 20210407154727.589017-1-kai.vehmanen@linux.intel.com (mailing list archive)
State New

Commit Message

Kai Vehmanen April 7, 2021, 3:47 p.m. UTC
When snd-hda-codec-hdmi is used with an ASoC HDA controller such as SOF
(acomp used for ELD notifications), a display connection change made during
suspend can be lost due to the following sequence of events:

  1. system in S3 suspend
  2. DP/HDMI receiver connected
  3. system resumed
  4. HDA controller resumed, but card->deferred_resume_work not complete
  5. acomp eld_notify callback
  6. eld_notify ignored as power state is not CTL_POWER_D0
  7. HDA resume deferred work completed, power state set to CTL_POWER_D0

This results in losing the notification, and the jack state reported to
user-space is not correct.

The check on step 6 was added in commit 8ae743e82f0b ("ALSA: hda - Skip
ELD notification during system suspend"). It would seem with the deferred
resume logic in ASoC core, this check is not safe.

Fix the issue by modifying the check to only skip ELD notification
processing if power state is D3 or deeper. This helps in the ASoC
controller case as card power state is set to D2 at start of
soc_resume_deferred().

BugLink: https://github.com/thesofproject/linux/issues/2825
Signed-off-by: Kai Vehmanen <kai.vehmanen@linux.intel.com>
---
 sound/pci/hda/patch_hdmi.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

NOTES:
 - I wonder if there is a better way to check for system suspend
   case than looking at snd_power_get_state()
 - 'chip->pm_prepared' is one option, but this is not directly available
   to codec drivers
 - storing PM target in hda_codec_pm_prepare() is perhaps one option


base-commit: 7dc53a38e4ac00d68943bab91deadc67f07d4a0b

Comments

Takashi Iwai April 7, 2021, 4:10 p.m. UTC | #1
On Wed, 07 Apr 2021 17:47:27 +0200,
Kai Vehmanen wrote:
> 
> When snd-hda-codec-hdmi is used with ASoC HDA controller like SOF (acomp
> used for ELD notifications), display connection change done during suspend,
> can be lost due to following sequence of events:
> 
>   1. system in S3 suspend
>   2. DP/HDMI receiver connected
>   3. system resumed
>   4. HDA controller resumed, but card->deferred_resume_work not complete
>   5. acomp eld_notify callback
>   6. eld_notify ignored as power state is not CTL_POWER_D0
>   7. HDA resume deferred work completed, power state set to CTL_POWER_D0
> 
> This results in losing the notification, and the jack state reported to
> user-space is not correct.

Hrm, that's odd.  The logic there is: there is a manual call of
hdmi_present_sense() for each pin in the resume callback of the HDMI
codec driver, so at point 7, update_eld() is invoked from
hdmi_present_sense(), which notifies the state to user-space.

So I don't see what's missing there.  Could you check whether the
scenario above is correct?  The state is updated in
snd_hdac_acomp_get_eld() call in sync_eld_via_acomp().  We can see
what state is returned there at which timing.

The only possible case I can think of now is that the graphics driver
isn't ready for returning the right value at the HDMI codec resume.
But this should have been covered by the device link...


thanks,

Takashi

> The check on step 6 was added in commit 8ae743e82f0b ("ALSA: hda - Skip
> ELD notification during system suspend"). It would seem with the deferred
> resume logic in ASoC core, this check is not safe.
> 
> Fix the issue by modifying the check to only skip ELD notification
> processing if power state is D3 or deeper. This helps in the ASoC
> controller case as card power state is set to D2 at start of
> soc_resume_deferred().
> 
> BugLink: https://github.com/thesofproject/linux/issues/2825
> Signed-off-by: Kai Vehmanen <kai.vehmanen@linux.intel.com>
> ---
>  sound/pci/hda/patch_hdmi.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> NOTES:
>  - I wonder if there is a better way to check for system suspend
>    case than looking at snd_power_get_state()
>  - 'chip->pm_prepared' is one option, but this is not directly available
>    to codec drivers
>  - storing PM target in hda_codec_pm_prepare() is perhaps one option
> 
> diff --git a/sound/pci/hda/patch_hdmi.c b/sound/pci/hda/patch_hdmi.c
> index 5de3666a7101..a43df036db1d 100644
> --- a/sound/pci/hda/patch_hdmi.c
> +++ b/sound/pci/hda/patch_hdmi.c
> @@ -2654,7 +2654,7 @@ static void generic_acomp_pin_eld_notify(void *audio_ptr, int port, int dev_id)
>  	/* skip notification during system suspend (but not in runtime PM);
>  	 * the state will be updated at resume
>  	 */
> -	if (snd_power_get_state(codec->card) != SNDRV_CTL_POWER_D0)
> +	if (snd_power_get_state(codec->card) >= SNDRV_CTL_POWER_D3)
>  		return;
>  	/* ditto during suspend/resume process itself */
>  	if (snd_hdac_is_in_pm(&codec->core))
> @@ -2840,7 +2840,7 @@ static void intel_pin_eld_notify(void *audio_ptr, int port, int pipe)
>  	/* skip notification during system suspend (but not in runtime PM);
>  	 * the state will be updated at resume
>  	 */
> -	if (snd_power_get_state(codec->card) != SNDRV_CTL_POWER_D0)
> +	if (snd_power_get_state(codec->card) >= SNDRV_CTL_POWER_D3)
>  		return;
>  	/* ditto during suspend/resume process itself */
>  	if (snd_hdac_is_in_pm(&codec->core))
> 
> base-commit: 7dc53a38e4ac00d68943bab91deadc67f07d4a0b
> -- 
> 2.31.0
>
Kai Vehmanen April 7, 2021, 4:40 p.m. UTC | #2
Hey,

On Wed, 7 Apr 2021, Takashi Iwai wrote:

> On Wed, 07 Apr 2021 17:47:27 +0200, Kai Vehmanen wrote:
> > 
> > When snd-hda-codec-hdmi is used with ASoC HDA controller like SOF (acomp
> > used for ELD notifications), display connection change done during suspend,
> > can be lost due to following sequence of events:
> > 
> >   1. system in S3 suspend
> >   2. DP/HDMI receiver connected
> >   3. system resumed
> >   4. HDA controller resumed, but card->deferred_resume_work not complete
> >   5. acomp eld_notify callback
> >   6. eld_notify ignored as power state is not CTL_POWER_D0
> >   7. HDA resume deferred work completed, power state set to CTL_POWER_D0
> > 
> > This results in losing the notification, and the jack state reported to
> > user-space is not correct.
> 
> Hrm, that's odd.  The logic there is: there is a manual call of
> hdmi_present_sense() for each pin in the resume call back of HDMI
> codec driver, so at the point 7, update_eld() is invoked from
> hdmi_present_sense(), which notifies the state to user-space.

In the bug case, the codec resume is completed in step (4). i915 is up and 
running but no HDMI/DP receiver is yet found/setup at this point. So HDA 
codec driver resumes and concludes no HDMI/DP receivers are available.

A bit later, the HDMI/DP receiver is found and i915 calls eld_notify. But 
as HDA controller's soc_resume_deferred() is still running, 
card->power_state==D2 still at this point. patch_hdmi.c:*pin_eld_notify() 
checks power_state, figures card is not in D0 and ignores the 
notification.

Then another moment later, HDA controller's deferred resume work completes 
and card power state is set to D0, but at this point there are no actions 
left that would trigger reprocessing the ELD notification.

I now changed this so that if card is in D2, that's good enough and we 
process the notification in patch_hdmi.c:*pin_eld_notify().
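
To make the difference between the two checks concrete, here is a small
user-space sketch (not the driver code). The POWER_* constants are local
stand-ins that I assume mirror the SNDRV_CTL_POWER_* values in
include/uapi/sound/asound.h, and the two helper names are made up for
illustration:

```c
#include <assert.h>
#include <stdbool.h>

/* Local stand-ins for the ALSA control power states. The values are
 * assumed to mirror SNDRV_CTL_POWER_* in include/uapi/sound/asound.h.
 */
enum {
	POWER_D0     = 0x0000,	/* full on */
	POWER_D1     = 0x0100,
	POWER_D2     = 0x0200,	/* set at start of soc_resume_deferred() */
	POWER_D3     = 0x0300,
	POWER_D3HOT  = POWER_D3 | 0x0000,
	POWER_D3COLD = POWER_D3 | 0x0001,
};

/* The ">=" comparison only works if the states are ordered by depth. */
static_assert(POWER_D0 < POWER_D2 && POWER_D2 < POWER_D3HOT &&
	      POWER_D3HOT < POWER_D3COLD, "power states must be ordered");

/* Hypothetical helper: check before the patch, anything but D0 is dropped. */
static bool skip_notify_old(unsigned int card_state)
{
	return card_state != POWER_D0;
}

/* Hypothetical helper: check in this RFC, only D3 or deeper is dropped. */
static bool skip_notify_rfc(unsigned int card_state)
{
	return card_state >= POWER_D3;
}
```

With the card in D2, as in the bug case above, skip_notify_old() returns
true and the notification is lost, while skip_notify_rfc() returns false
and the notification is processed. Both helpers still skip D3hot and D3cold.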

> So I don't see what's missing there.  Could you check whether the
> scenario above is correct?  The state is updated in
> snd_hdac_acomp_get_eld() call in sync_eld_via_acomp().  We can see
> what state is returned there at which timing.

At this point, state for the ports is still disconnected (monitor was
connected while system was in suspend).

> The only possible case I can think of now is that the graphics driver
> isn't ready for returning the right value at the HDMI codec resume.
> But this should have been covered by the device link...

Yes, this seems to be the case. The device link seems to be honoured,
but the fact that 1) monitor/receiver is not immediately found, and 2) 
ASoC core does some of the resume work in a work-queue, opens this race 
still.

Seems quite odd indeed, but I've now got reports of systems where this is 
hit, and unfortunately it's very systematic on these systems. By adding 
some arbitrary delay to soc_resume_deferred(), I could easily hit this
myself as well on the systems I have at hand.

Br, Kai
Takashi Iwai April 8, 2021, 8:33 a.m. UTC | #3
On Wed, 07 Apr 2021 18:40:29 +0200,
Kai Vehmanen wrote:
> 
> Hey,
> 
> On Wed, 7 Apr 2021, Takashi Iwai wrote:
> 
> > On Wed, 07 Apr 2021 17:47:27 +0200, Kai Vehmanen wrote:
> > > 
> > > When snd-hda-codec-hdmi is used with ASoC HDA controller like SOF (acomp
> > > used for ELD notifications), display connection change done during suspend,
> > > can be lost due to following sequence of events:
> > > 
> > >   1. system in S3 suspend
> > >   2. DP/HDMI receiver connected
> > >   3. system resumed
> > >   4. HDA controller resumed, but card->deferred_resume_work not complete
> > >   5. acomp eld_notify callback
> > >   6. eld_notify ignored as power state is not CTL_POWER_D0
> > >   7. HDA resume deferred work completed, power state set to CTL_POWER_D0
> > > 
> > > This results in losing the notification, and the jack state reported to
> > > user-space is not correct.
> > 
> > Hrm, that's odd.  The logic there is: there is a manual call of
> > hdmi_present_sense() for each pin in the resume call back of HDMI
> > codec driver, so at the point 7, update_eld() is invoked from
> > hdmi_present_sense(), which notifies the state to user-space.
> 
> In the bug case, the codec resume is completed in step (4). i915 is up and 
> running but no HDMI/DP receiver is yet found/setup at this point. So HDA 
> codec driver resumes and concludes no HDMI/DP receivers are available.
> 
> A bit later, the HDMI/DP receiver is found and i915 calls eld_notify. But 
> as HDA controller's soc_resume_deferred() is still running, 
> card->power_state==D2 still at this point. patch_hdmi.c:*pin_eld_notify() 
> checks power_state, figures card is not in D0 and ignores the 
> notification.
> 
> Then another moment later, HDA controller's deferred resume work completes 
> and card power state is set to D0, but at this point there are no actions 
> left that would trigger reprocessing the ELD notification.
> 
> I now changed this so that if card is in D2, that's good enough and we 
> process the notification in patch_hdmi.c:*pin_eld_notify().
> 
> > So I don't see what's missing there.  Could you check whether the
> > scenario above is correct?  The state is updated in
> > snd_hdac_acomp_get_eld() call in sync_eld_via_acomp().  We can see
> > what state is returned there at which timing.
> 
> At this point, state for the ports is still disconnected (monitor was
> connected while system was in suspend).

OK, that's a messy problem, indeed.  It's partly because of the ASoC
deferred resume, which is completely independent of the rest of the
resume via the HD-audio bus.  Worse, this can't be managed via the device
link because the resume callback itself has already been processed.

And, IIUC, another part of the problem is that i915 notifies the HPD
*after* the resume completion, right?  Then indeed it can be racy.

> > The only possible case I can think of now is that the graphics driver
> > isn't ready for returning the right value at the HDMI codec resume.
> > But this should have been covered by the device link...
> 
> Yes, this seems to be the case. The device link seems to be honoured,
> but the fact that 1) monitor/receiver is not immediately found, and 2) 
> ASoC core does some of the resume work in a work-queue, opens this race 
> still.
> 
> Seems quite odd indeed, but I've now got reports of systems where this is 
> hit, and unfortunately it's very systematic on these systems. By adding 
> some arbitrary delay to soc_resume_deferred(), I could easily hit this
> myself as well on the systems I have at hand.

Judging from the above, I see no problem with merging the patch as is.
It's not an intrusive change, and in practice it covers (most of) the
ASoC cases.

Another possible fix would be to check dev->power.power_state instead
of the global card state.  This is set in each PM callback in
hda_codec.c to indicate the current PM state of the codec.  Something
like below.  Let me know if this works, too.


thanks,

Takashi

---
--- a/sound/pci/hda/patch_hdmi.c
+++ b/sound/pci/hda/patch_hdmi.c
@@ -2658,7 +2658,7 @@ static void generic_acomp_pin_eld_notify(void *audio_ptr, int port, int dev_id)
 	/* skip notification during system suspend (but not in runtime PM);
 	 * the state will be updated at resume
 	 */
-	if (snd_power_get_state(codec->card) != SNDRV_CTL_POWER_D0)
+	if (codec->core.dev.power.power_state.event != PM_EVENT_ON)
 		return;
 	/* ditto during suspend/resume process itself */
 	if (snd_hdac_is_in_pm(&codec->core))
Kai Vehmanen April 8, 2021, 10:49 a.m. UTC | #4
Hi,

On Thu, 8 Apr 2021, Takashi Iwai wrote:
> OK, that's a messy problem, indeed.  It's partly because of ASoC
> deferred resume that is completely independent from the rest resume
> via HD-audio bus.  More badly, this can't be managed via the device
> link because the resume callback itself has been processed.
>
> And, IIUC, another part of the problem is that i915 notifies the HPD
> *after* the resume completion, right?  Then indeed it can be racy.

yes, exactly.

>> Seems quite odd indeed, but I've now got reports of systems where this is
>> hit, and unfortunately it's very systematic on these systems. By adding
>> some arbitrary delay to soc_resume_deferred(), I could easily hit this
>> myself as well on the systems I have at hand.
>
> Another possible fix would be to check dev->power.power_state instead
> of the global card state.  This is set in each PM callback in
> hda_codec.c to indicate the current PM state of the codec.  Something
> like below.  Let me know if this works, too.

Thanks, this works in my setup and is much cleaner. I think this is also 
more robust. I realized that with snd_power_get_state() check, there is a 
theoretical race still possible if notify comes before 
soc_resume_deferred() gets scheduled (i.e. delay is not within 
soc_resume_deferred() but in getting it scheduled to begin with). This 
would seem really unlikely, but it's a possible race nevertheless.
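
As a rough illustration of that remaining window, here is a toy user-space
model (not kernel code) of the resume timeline. All names are hypothetical.
It assumes the card stays in D3hot until soc_resume_deferred() starts, and
that the per-codec state already reads "on" once the codec resume callback
has run:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in card power states (values assumed to match SNDRV_CTL_POWER_*). */
enum card_state { CARD_D0 = 0x0000, CARD_D2 = 0x0200, CARD_D3HOT = 0x0300 };

/* Simplified, hypothetical per-codec PM state. */
enum codec_state { CODEC_SUSPENDED, CODEC_ON };

static_assert(CARD_D0 < CARD_D2 && CARD_D2 < CARD_D3HOT,
	      "card states must be ordered by depth");

/* One point in time at which eld_notify could arrive. */
struct pm_snapshot {
	const char *when;
	enum card_state card;
	enum codec_state codec;
};

static const struct pm_snapshot resume_timeline[] = {
	{ "system suspended",                     CARD_D3HOT, CODEC_SUSPENDED },
	{ "codec resumed, deferred work pending", CARD_D3HOT, CODEC_ON },
	{ "soc_resume_deferred() running",        CARD_D2,    CODEC_ON },
	{ "soc_resume_deferred() completed",      CARD_D0,    CODEC_ON },
};

enum check_kind {
	CHECK_CARD_D0_ONLY,	/* original check: card state must be D0 */
	CHECK_CARD_BELOW_D3,	/* v1 of this patch: skip only D3 or deeper */
	CHECK_CODEC_STATE,	/* per-codec PM state instead of card state */
};

/* Returns true if a notification arriving at snapshot s is processed. */
static bool notify_processed(enum check_kind kind, const struct pm_snapshot *s)
{
	switch (kind) {
	case CHECK_CARD_D0_ONLY:
		return s->card == CARD_D0;
	case CHECK_CARD_BELOW_D3:
		return s->card < CARD_D3HOT;
	case CHECK_CODEC_STATE:
		return s->codec == CODEC_ON;
	}
	return false;
}

/* Counts the timeline points at which a notification would be processed. */
static int count_processed(enum check_kind kind)
{
	int n = 0;

	for (size_t i = 0; i < sizeof(resume_timeline) / sizeof(resume_timeline[0]); i++)
		n += notify_processed(kind, &resume_timeline[i]);
	return n;
}
```

In this model only the per-codec check handles a notification that arrives
in the second row, after the codec has resumed but before the deferred work
has started. All three variants still drop notifications while the system
is suspended.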

I'll update the patch to use dev->power.power_state, ask people with 
affected systems to double check, and I'll send a V2.

Br, Kai

Patch

diff --git a/sound/pci/hda/patch_hdmi.c b/sound/pci/hda/patch_hdmi.c
index 5de3666a7101..a43df036db1d 100644
--- a/sound/pci/hda/patch_hdmi.c
+++ b/sound/pci/hda/patch_hdmi.c
@@ -2654,7 +2654,7 @@  static void generic_acomp_pin_eld_notify(void *audio_ptr, int port, int dev_id)
 	/* skip notification during system suspend (but not in runtime PM);
 	 * the state will be updated at resume
 	 */
-	if (snd_power_get_state(codec->card) != SNDRV_CTL_POWER_D0)
+	if (snd_power_get_state(codec->card) >= SNDRV_CTL_POWER_D3)
 		return;
 	/* ditto during suspend/resume process itself */
 	if (snd_hdac_is_in_pm(&codec->core))
@@ -2840,7 +2840,7 @@  static void intel_pin_eld_notify(void *audio_ptr, int port, int pipe)
 	/* skip notification during system suspend (but not in runtime PM);
 	 * the state will be updated at resume
 	 */
-	if (snd_power_get_state(codec->card) != SNDRV_CTL_POWER_D0)
+	if (snd_power_get_state(codec->card) >= SNDRV_CTL_POWER_D3)
 		return;
 	/* ditto during suspend/resume process itself */
 	if (snd_hdac_is_in_pm(&codec->core))