[v2,0/2] Add snd_card_disconnect_sync() helper
diff mbox

Message ID 87efq7v2qf.wl%kuninori.morimoto.gx@renesas.com
State New
Headers show

Commit Message

Kuninori Morimoto Oct. 13, 2017, 9:11 a.m. UTC
Hi Takashi

> > I tested this patch-set.
> > I noticed that it doesn't work if I used DPCM
> > (Kernel has Oops).
> > I will investigate it next week
> 
> Could you show the Oops message if you have?
> My concern is whether it happens by stopping at disconnect, or it's
> just another missing piece.

OK, but kernel log doesn't help you.
see below

My environment now is I'm using DPCM.
	FE : rsnd
	FE : rsnd
	BE : ak4613

1st issue is that kernel need below patch.
I guess BE is using dummy driver, and it doesn't have ops(?).
If this is needed, I can post it.

-----------------------
After this patch, my driver side clock counter mismatch Oops happen.
It seems this is because FE side wasn't called snd_pcm_stop();
(= BE side only called snd_pcm_stop()).

I could confirm this by printing who's stop was called by local quick hack.

Maybe timing reason, if kernel has Oops for some reasons,
then, both BE/FE snd_pcm_stop() are called.
If no Oops, BE snd_pcm_stop() only called.

Best regards
---
Kuninori Morimoto

Comments

Takashi Iwai Oct. 13, 2017, 9:43 a.m. UTC | #1
On Fri, 13 Oct 2017 11:11:42 +0200,
Kuninori Morimoto wrote:
> 
> 
> Hi Takashi
> 
> > > I tested this patch-set.
> > > I noticed that it doesn't work if I used DPCM
> > > (Kernel has Oops).
> > > I will investigate it next week
> > 
> > Could you show the Oops message if you have?
> > My concern is whether it happens by stopping at disconnect, or it's
> > just another missing piece.
> 
> OK, but kernel log doesn't help you.
> see below
> 
> My environment now is I'm using DPCM.
> 	FE : rsnd
> 	FE : rsnd
> 	BE : ak4613
> 
> 1st issue is that kernel need below patch.
> I guess BE is using dummy driver, and it doesn't have ops(?).
> If this is needed, I can post it.

No, this can't be right.  Every PCM implementation mandates the
presence of a trigger callback.  It's a must.  If a fix needed, it has
to be fixed in the driver side.

> -----------------------
> diff --git a/sound/core/pcm_native.c b/sound/core/pcm_native.c
> index 2fec2fe..972408b 100644
> --- a/sound/core/pcm_native.c
> +++ b/sound/core/pcm_native.c
> @@ -1241,7 +1241,8 @@ static int snd_pcm_do_stop(struct snd_pcm_substream *substream, int state)
>  {
>  	if (substream->runtime->trigger_master == substream &&
>  	    snd_pcm_running(substream))
> -		substream->ops->trigger(substream, SNDRV_PCM_TRIGGER_STOP);
> +		if (substream->ops && substream->ops->trigger)
> +			substream->ops->trigger(substream, SNDRV_PCM_TRIGGER_STOP);
>  	return 0; /* unconditonally stop all substreams */
>  }
> -----------------------
> After this patch, my driver side clock counter mismatch Oops happen.
> It seems this is because FE side wasn't called snd_pcm_stop();
> (= BE side only called snd_pcm_stop()).

So the check might work around an Oops, but it's no "fix", per se.

> I could confirm this by printing who's stop was called by local quick hack.
> 
> Maybe timing reason, if kernel has Oops for some reasons,
> then, both BE/FE snd_pcm_stop() are called.
> If no Oops, BE snd_pcm_stop() only called.

Any pending delayed work (like rtd->delayed_work)?
This is flushed at soc_cleanup_card_resources(), but it's called at
card removal, thus it's too late for the hot-removal of the
component.


thanks,

Takashi
Mark Brown Oct. 13, 2017, 5:11 p.m. UTC | #2
On Fri, Oct 13, 2017 at 11:43:59AM +0200, Takashi Iwai wrote:
> Kuninori Morimoto wrote:

> > My environment now is I'm using DPCM.
> > 	FE : rsnd
> > 	FE : rsnd
> > 	BE : ak4613

> > 1st issue is that kernel need below patch.
> > I guess BE is using dummy driver, and it doesn't have ops(?).
> > If this is needed, I can post it.

> No, this can't be right.  Every PCM implementation mandates the
> presence of a trigger callback.  It's a must.  If a fix needed, it has
> to be fixed in the driver side.

I suspect this is a DPCM framework thing - when the paths are joined up
at runtime there's going to be at least one trigger in there but with
DPCM it's most likely going to be in the front end, the back end DMA is
often going to be transparent to the system and so not have a trigger.
However if you just iterate over all the PCMs you'll see the decomposed
front and back end links with their dummies in there.  Not sure what the
fix is here, probably we need to be hiding the back end links more from
the ALSA core.

Liam?
Kuninori Morimoto Oct. 16, 2017, 2:26 a.m. UTC | #3
Hi Takashi

> > I could confirm this by printing who's stop was called by local quick hack.
> > 
> > Maybe timing reason, if kernel has Oops for some reasons,
> > then, both BE/FE snd_pcm_stop() are called.
> > If no Oops, BE snd_pcm_stop() only called.
> 
> Any pending delayed work (like rtd->delayed_work)?
> This is flushed at soc_cleanup_card_resources(), but it's called at
> card removal, thus it's too late for the hot-removal of the
> component.

Current my dirver used delayed_work, but we can control it
by snd_soc_runtime_ignore_pmdown_time();
I think my driver now doesn't use delayed_work.

But there is still issue.
I don't know detail, but it seems 
snd_pcm_dev_disconnect() and snd_pcm_relase() are called
it the same time, both are calling snd_pcm_stop().

Then, snd_pcm_relase() side will calls
snd_pcm_detach_substream() and snd_pcm_dev_disconnect() side will die.

Mark's suggestion (= hiding BE) can solve this ?

Best regards
---
Kuninori Morimoto
Takashi Iwai Oct. 16, 2017, 3:37 p.m. UTC | #4
On Mon, 16 Oct 2017 04:26:42 +0200,
Kuninori Morimoto wrote:
> 
> 
> Hi Takashi
> 
> > > I could confirm this by printing who's stop was called by local quick hack.
> > > 
> > > Maybe timing reason, if kernel has Oops for some reasons,
> > > then, both BE/FE snd_pcm_stop() are called.
> > > If no Oops, BE snd_pcm_stop() only called.
> > 
> > Any pending delayed work (like rtd->delayed_work)?
> > This is flushed at soc_cleanup_card_resources(), but it's called at
> > card removal, thus it's too late for the hot-removal of the
> > component.
> 
> Current my dirver used delayed_work, but we can control it
> by snd_soc_runtime_ignore_pmdown_time();
> I think my driver now doesn't use delayed_work.
> 
> But there is still issue.
> I don't know detail, but it seems 
> snd_pcm_dev_disconnect() and snd_pcm_relase() are called
> it the same time, both are calling snd_pcm_stop().
> 
> Then, snd_pcm_relase() side will calls
> snd_pcm_detach_substream() and snd_pcm_dev_disconnect() side will die.

This must be also specific to DPCM.  Something is really wrong there.

Basically snd_pcm_dev_disconnect() and snd_pcm_release() can't race
since both are protected via pcm->open_mutex.  snd_pcm_stop() calls
are protected even more with substream lock.

> Mark's suggestion (= hiding BE) can solve this ?

Some of the issues might be addressed, yes, but I'm skeptical whether
it covers all.  The BE needs proper locking and refcounting that is
coupled with the FE, I suppose.


Takashi
Kuninori Morimoto Oct. 17, 2017, 12:59 a.m. UTC | #5
Hi Takashi-san, Mark

Thank you for your feedback

> > Then, snd_pcm_relase() side will calls
> > snd_pcm_detach_substream() and snd_pcm_dev_disconnect() side will die.
> 
> This must be also specific to DPCM.  Something is really wrong there.
> 
> Basically snd_pcm_dev_disconnect() and snd_pcm_release() can't race
> since both are protected via pcm->open_mutex.  snd_pcm_stop() calls
> are protected even more with substream lock.
> 
> > Mark's suggestion (= hiding BE) can solve this ?
> 
> Some of the issues might be addressed, yes, but I'm skeptical whether
> it covers all.  The BE needs proper locking and refcounting that is
> coupled with the FE, I suppose.

So, this means, your helper patch seems OK,
but DPCM side needs more hack.

Mark
I'm happy to solve this issue, but I'm not good at DPCM.
If you can give me some help/advice, I can debug it.

Best regards
---
Kuninori Morimoto
Takashi Iwai Oct. 17, 2017, 2:34 p.m. UTC | #6
On Tue, 17 Oct 2017 16:25:10 +0200,
Takashi Sakamoto wrote:
> 
> Hi guys,
> 
> On Oct 17 2017 09:59, Kuninori Morimoto wrote:>> Some of the issues might be addressed, yes, but I'm skeptical whether
> >> it covers all.  The BE needs proper locking and refcounting that is
> >> coupled with the FE, I suppose.
> > 
> > So, this means, your helper patch seems OK,
> > but DPCM side needs more hack.
> 
> In my opinion, it's better for us to take enough proofs and explanations
> when changing core functionalities.
> 
> 
> Below patch for snd-dummy is to enable unbind operation to platform_device
> for this module. (but it's too rough to apply mainline.) This works without
> Iwai-san's two patches.
> 
> Let's try:
> $ modprobe snd-dummy
> $ aplay -D hw:Dummy /dev/null &
> $ lsmod | grep dummy
> $ echo -n snd_dummy.0 > /sys/bus/platform/drivers/snd_dummy/unbind
> $ lsmod | grep dummy
> $ cat /proc/asound/cards
> $ modprobe -r snd-dummy
> $ cat /proc/asound/cards
> 
> Actually, we have no addressed issues in these operations. The issue is
> only on ALSA SoC part.

Well, as I mentioned, a potential breakage would appear in the legacy
AC97 codec binding -- if it would do any hot-unplug.  But most of
legacy drivers work fine without my patches just because it has only
top-level unbind.

So, the top-level hot-unplug as you tested with the dummy driver works
as is.  The problem is only about unbinding the middle-layer
component, as mentioned in the patch.  It's found mostly in ASoC, but
not necessarily tied only with ASoC.


> But I note that even if unbind works fine to shift state of sound devices
> into disconnected, poll(2) call to ALSA control character devices does not
> return (e.g. run 'amixer events'). I don't know exactly the cause yet...

The disconnection doesn't close the device by itself (we can't), but
it replaces with the dummy ops so that it never touches the driver
except for closing.  That is, the device is in the idle state and just
accepts closing.


Takashi

> ======== 8< --------
> 
> >From 74bb5c45f0bf36e6487538088c654b88e1efb5b5 Mon Sep 17 00:00:00 2001
> From: Takashi Sakamoto <o-takashi@sakamocchi.jp>
> Date: Tue, 17 Oct 2017 17:58:47 +0900
> Subject: [PATCH] ALSA: dummy: support unbind operation to shift state of sound
>  devices to disconnected
> 
> ---
>  sound/drivers/dummy.c | 39 ++++++++++++++++++++++++++++++++++++---
>  1 file changed, 36 insertions(+), 3 deletions(-)
> 
> diff --git a/sound/drivers/dummy.c b/sound/drivers/dummy.c
> index c0939a0164a6..dcdefd2931c7 100644
> --- a/sound/drivers/dummy.c
> +++ b/sound/drivers/dummy.c
> @@ -92,6 +92,7 @@ MODULE_PARM_DESC(hrtimer, "Use hrtimer as the timer source.");
>  #endif
>  
>  static struct platform_device *devices[SNDRV_CARDS];
> +static struct snd_card *cards[SNDRV_CARDS];
>  
>  #define MIXER_ADDR_MASTER	0
>  #define MIXER_ADDR_LINE		1
> @@ -1134,7 +1135,24 @@ static int snd_dummy_probe(struct platform_device *devptr)
>  
>  static int snd_dummy_remove(struct platform_device *devptr)
>  {
> -	snd_card_free(platform_get_drvdata(devptr));
> +	struct snd_card *card = platform_get_drvdata(devptr);
> +	struct snd_device *dev;
> +	int i;
> +
> +	for (i = 0; i < SNDRV_CARDS; ++i) {
> +		/* Temporary for module removal. */
> +		if (devices[i] == devptr)
> +			cards[i] = card;
> +	}
> +
> +	/*
> +	 * This shifts state of an instance of sound card into disconnected, but
> +	 * don't wait till all of references to instances of sound devices are
> +	 * released.
> +	 */
> +	list_for_each_entry_reverse(dev, &card->devices, list)
> +		snd_device_disconnect(card, dev->device_data);
> +
>  	return 0;
>  }
>  
> @@ -1178,8 +1196,23 @@ static void snd_dummy_unregister_all(void)
>  {
>  	int i;
>  
> -	for (i = 0; i < ARRAY_SIZE(devices); ++i)
> -		platform_device_unregister(devices[i]);
> +	for (i = 0; i < ARRAY_SIZE(devices); ++i) {
> +		struct platform_device *devptr = devices[i];
> +		struct snd_card *card = cards[i];
> +
> +		if (devptr == NULL)
> +			continue;
> +		if (card) {
> +			/*
> +			 * This can wait till the final reference to an
> +			 * instance of sound card is released.
> +			 */
> +			snd_card_free(card);
> +		}
> +
> +		platform_device_unregister(devptr);
> +	}
> +
>  	platform_driver_unregister(&snd_dummy_driver);
>  	free_fake_buffer();
>  }
> -- 
> 2.11.0
>
Takashi Sakamoto Oct. 17, 2017, 3:16 p.m. UTC | #7
On Oct 17 2017 23:34, Takashi Iwai wrote:
> Well, as I mentioned, a potential breakage would appear in the legacy
> AC97 codec binding -- if it would do any hot-unplug.  But most of
> legacy drivers work fine without my patches just because it has only
> top-level unbind.

Yep. Recent drivers get no suffers from the issue because AC'97 is
enough ancient, at least for rsnd.

> So, the top-level hot-unplug as you tested with the dummy driver works
> as is.  The problem is only about unbinding the middle-layer
> component, as mentioned in the patch.  It's found mostly in ASoC, but
> not necessarily tied only with ASoC.

Are there any driver outside of ALSA SoC part to use the feature?
Actually, not. It's quite natural to assume that the feature is somehow
specialized for ALSA SoC part even it it's in ALSA PCM core.

And today I have little time to review the patch. Please check
in-reply-to field of my previous message. This is not a reply to yours.

>> But I note that even if unbind works fine to shift state of sound devices
>> into disconnected, poll(2) call to ALSA control character devices does not
>> return (e.g. run 'amixer events'). I don't know exactly the cause yet...
> 
> The disconnection doesn't close the device by itself (we can't), but
> it replaces with the dummy ops so that it never touches the driver
> except for closing.  That is, the device is in the idle state and just
> accepts closing.

You did misread. I just said that a call of poll(2) doesn't return.

1599 static unsigned int snd_ctl_poll(struct file *file, poll_table * wait)
1600 {
1601         unsigned int mask;
1602         struct snd_ctl_file *ctl;
1603
1604         ctl = file->private_data;
1605         if (!ctl->subscribed)
1606                 return 0;
1607         poll_wait(file, &ctl->change_sleep, wait);

This should be awakened by below lines:

1775 static int snd_ctl_dev_disconnect(struct snd_device *device)
1776 {
1777         struct snd_card *card = device->device_data;
1778         struct snd_ctl_file *ctl;
1779
1780         read_lock(&card->ctl_files_rwlock);
1781         list_for_each_entry(ctl, &card->ctl_files, list) {
1782                 wake_up(&ctl->change_sleep);

As long as I observed, it doesn't even if the above line is executed.
(how odd...) I suspect to hit some other bugs such as buffer-overrun
depending on my environment. But it's out of my current plan and I
didn't investigate further. That's all of what I experience.


Thanks

Takashi Sakamoto
Takashi Iwai Oct. 17, 2017, 4:09 p.m. UTC | #8
On Tue, 17 Oct 2017 17:16:51 +0200,
Takashi Sakamoto wrote:
> 
> On Oct 17 2017 23:34, Takashi Iwai wrote:
> > Well, as I mentioned, a potential breakage would appear in the legacy
> > AC97 codec binding -- if it would do any hot-unplug.  But most of
> > legacy drivers work fine without my patches just because it has only
> > top-level unbind.
> 
> Yep. Recent drivers get no suffers from the issue because AC'97 is
> enough ancient, at least for rsnd.

Erm, no, what I meant is the legacy AC97 codec driver implementation,
i.e. the code in sound/pci/ac97/ac97_codec.c.  The driver using this
codec implementations are all non-ASoC.  But AFAIK most of its usage
is always with the hard-dependency, not via bus binding, so most
likely we don't see the issue in the legacy drivers, luckily or
unluckily.

> > So, the top-level hot-unplug as you tested with the dummy driver works
> > as is.  The problem is only about unbinding the middle-layer
> > component, as mentioned in the patch.  It's found mostly in ASoC, but
> > not necessarily tied only with ASoC.
> 
> Are there any driver outside of ALSA SoC part to use the feature?

If a middle-layer driver can unbind over bus or class, yes, any driver
code may suffer from the very same problem.  The soundbus of AOA
drivers likely has the same problem, for example, and I can imagine
some of media stuff has a similar problem, too.

> Actually, not. It's quite natural to assume that the feature is somehow
> specialized for ALSA SoC part even it it's in ALSA PCM core.

Please stop blaming as if it were only ASoC-specific.  It's the nature
of Linux driver model.  As long as you can unbind some partial
component, it has to sync with the rest for a proper removal.
The lack of such mechanism in ASoC is partly because ALSA core doesn't
provide the capability, and it's the very reason we're discussing
here.

> And today I have little time to review the patch. Please check
> in-reply-to field of my previous message. This is not a reply to yours.
>
> >> But I note that even if unbind works fine to shift state of sound devices
> >> into disconnected, poll(2) call to ALSA control character devices does not
> >> return (e.g. run 'amixer events'). I don't know exactly the cause yet...
> > 
> > The disconnection doesn't close the device by itself (we can't), but
> > it replaces with the dummy ops so that it never touches the driver
> > except for closing.  That is, the device is in the idle state and just
> > accepts closing.
> 
> You did misread. I just said that a call of poll(2) doesn't return.
> 
> 1599 static unsigned int snd_ctl_poll(struct file *file, poll_table * wait)
> 1600 {
> 1601         unsigned int mask;
> 1602         struct snd_ctl_file *ctl;
> 1603
> 1604         ctl = file->private_data;
> 1605         if (!ctl->subscribed)
> 1606                 return 0;
> 1607         poll_wait(file, &ctl->change_sleep, wait);
> 
> This should be awakened by below lines:
> 
> 1775 static int snd_ctl_dev_disconnect(struct snd_device *device)
> 1776 {
> 1777         struct snd_card *card = device->device_data;
> 1778         struct snd_ctl_file *ctl;
> 1779
> 1780         read_lock(&card->ctl_files_rwlock);
> 1781         list_for_each_entry(ctl, &card->ctl_files, list) {
> 1782                 wake_up(&ctl->change_sleep);
> 
> As long as I observed, it doesn't even if the above line is executed.
> (how odd...)

Hm, then it's worth to check, but it's irrelevant from the issue we're
chasing.

> I suspect to hit some other bugs such as buffer-overrun
> depending on my environment.

How can such a thing happen...?

> But it's out of my current plan and I
> didn't investigate further. That's all of what I experience.

OK, thanks for reporting.


Takashi

Patch
diff mbox

diff --git a/sound/core/pcm_native.c b/sound/core/pcm_native.c
index 2fec2fe..972408b 100644
--- a/sound/core/pcm_native.c
+++ b/sound/core/pcm_native.c
@@ -1241,7 +1241,8 @@  static int snd_pcm_do_stop(struct snd_pcm_substream *substream, int state)
 {
 	if (substream->runtime->trigger_master == substream &&
 	    snd_pcm_running(substream))
-		substream->ops->trigger(substream, SNDRV_PCM_TRIGGER_STOP);
+		if (substream->ops && substream->ops->trigger)
+			substream->ops->trigger(substream, SNDRV_PCM_TRIGGER_STOP);
 	return 0; /* unconditonally stop all substreams */
 }
-----------------------