
ASoC: hda: increment codec device refcount when it is added to the card

Message ID 20190530201828.2648-1-ranjani.sridharan@linux.intel.com
State New, archived
Series ASoC: hda: increment codec device refcount when it is added to the card

Commit Message

Ranjani Sridharan May 30, 2019, 8:18 p.m. UTC
Calling snd_device_new() makes the codec device managed by the card.
So, when the card is removed, the refcount of the codec device is
decremented, and if it drops to 0 the codec device's kobject is cleaned
up. This leads to a NULL pointer dereference later on, when the codec
driver is released and attempts to remove its sysfs symlinks. Therefore,
increment the codec device's refcount before adding it to the card to
prevent this.

Signed-off-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com>
---
 sound/pci/hda/hda_codec.c | 8 ++++++++
 1 file changed, 8 insertions(+)

Comments

Pierre-Louis Bossart May 30, 2019, 9 p.m. UTC | #1
On 5/30/19 3:18 PM, Ranjani Sridharan wrote:
> Calling snd_device_new() makes the codec device managed by the card.
> So, when the card is removed, the refcount of the codec device is
> decremented, and if it drops to 0 the codec device's kobject is cleaned
> up. This leads to a NULL pointer dereference later on, when the codec
> driver is released and attempts to remove its sysfs symlinks. Therefore,
> increment the codec device's refcount before adding it to the card to
> prevent this.

Ranjani, you should add a bit of context for the rest of the list...

This patch suggests a solution to a set of sightings occurring when
removing/adding modules in a loop, and the current analysis points to a
difference between the way the HDMI and HDaudio codecs are handled.

https://github.com/thesofproject/linux/issues/981
https://github.com/thesofproject/linux/issues/966
https://github.com/thesofproject/linux/pull/988

Since it's not SOF-specific, it's better to get feedback directly from
the wider ALSA community and maintainers. We probably want to keep
platform-specific/vendor-specific discussions on GitHub and use the
mailing list for framework-level changes like this one.

> 
> Signed-off-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com>
> ---
>   sound/pci/hda/hda_codec.c | 8 ++++++++
>   1 file changed, 8 insertions(+)
> 
> diff --git a/sound/pci/hda/hda_codec.c b/sound/pci/hda/hda_codec.c
> index b20eb7fc83eb..0d5d95b07e19 100644
> --- a/sound/pci/hda/hda_codec.c
> +++ b/sound/pci/hda/hda_codec.c
> @@ -985,6 +985,14 @@ int snd_hda_codec_device_new(struct hda_bus *bus, struct snd_card *card,
>   		codec->core.subsystem_id, codec->core.revision_id);
>   	snd_component_add(card, component);
>   
> +	/*
> +	 * snd_device_new() makes the codec device managed by the card.
> +	 * When the card is removed, the device reference count is
> +	 * decremented. Therefore, increment it here to prevent removing
> +	 * the codec device's kobject when the card is removed.
> +	 */
> +	get_device(hda_codec_dev(codec));
> +
>   	err = snd_device_new(card, SNDRV_DEV_CODEC, codec, &dev_ops);
>   	if (err < 0)
>   		goto error;
>
Takashi Iwai May 31, 2019, 6:11 a.m. UTC | #2
On Thu, 30 May 2019 23:00:10 +0200,
Pierre-Louis Bossart wrote:
> This patch suggests a solution to a set of sightings occurring when
> removing/adding modules in a loop, and the current analysis points to
> a difference between the way the HDMI and HDaudio codecs are handled.

Hm, I still wonder why this doesn't happen with legacy HDA.

What is the shortest way to trigger the bug manually without a script?


thanks,

Takashi


Ranjani Sridharan May 31, 2019, 1:18 p.m. UTC | #3
On Fri, 2019-05-31 at 08:11 +0200, Takashi Iwai wrote:
> Hm, I still wonder why this doesn't happen with legacy HDA.
> 
> What is the shortest way to trigger the bug manually without a
> script?
Hi Takashi,

With SOF, I can reproduce the issue if I just unload the sof_pci_dev
module with rmmod. 

Basically, the remove routine for the SOF PCI device unregisters the
machine driver and then removes the codec device. So the first step,
unregistering the machine driver, frees the card, which decrements the
refcount of the HDA codec's kobject. The HDMI codec, on the other hand,
is not managed by the card, so its refcount is not decremented when the
card is removed.

Thanks,
Ranjani
Takashi Iwai May 31, 2019, 1:25 p.m. UTC | #4
On Fri, 31 May 2019 15:18:03 +0200,
Ranjani Sridharan wrote:
> Hi Takashi,
> 
> With SOF, I can reproduce the issue if I just unload the sof_pci_dev
> module with rmmod. 
> 
> Basically, the remove routine for the SOF PCI device unregisters the
> machine driver and then removes the codec device. So the first step,
> unregistering the machine driver, frees the card, which decrements the
> refcount of the HDA codec's kobject. The HDMI codec, on the other hand,
> is not managed by the card, so its refcount is not decremented when the
> card is removed.

So it's only about hdac_hdmi codec, or only about hdac_hda codec?

And why isn't the HDMI codec managed by the card...?  IOW, isn't that
dangerous -- it means the codec is always removable, even after it is
bound to the card?


thanks,

Takashi
Ranjani Sridharan May 31, 2019, 1:52 p.m. UTC | #5
On Fri, 2019-05-31 at 15:25 +0200, Takashi Iwai wrote:
> On Fri, 31 May 2019 15:18:03 +0200,
> Ranjani Sridharan wrote:
> > 
> > On Fri, 2019-05-31 at 08:11 +0200, Takashi Iwai wrote:
> > > On Thu, 30 May 2019 23:00:10 +0200,
> > > Pierre-Louis Bossart wrote:
> > > > 
> > > > 
> > > > 
> > > > On 5/30/19 3:18 PM, Ranjani Sridharan wrote:
> > > > > Calling snd_device_new() makes the codec devices managed by
> > > > > the
> > > > > card.
> > > > > So, when the card is removed, the refcount for the codec
> > > > > device is decremented and results in the codec device's
> > > > > kobject
> > > > > being cleaned up if the refcount is 0. But, this leads to a
> > > > > NULL
> > > > > pointer exception while attempting to remove the symlinks
> > > > > when
> > > > > the
> > > > > codec driver is released later on. Therefore, increment the
> > > > > codec
> > > > > device's refcount before adding it to the card to prevent
> > > > > this.
> > > > 
> > > > Ranjani, you should add a bit of context for the rest of the
> > > > list...
> > > > 
> > > > This patch suggest a solution to a set of sightings occurring
> > > > when
> > > > removing/adding modules in a loop, and the current analysis
> > > > points
> > > > to
> > > > a difference between the way the HDMI and HDaudio codecs are
> > > > handled.
> > > > 
> > > > https://github.com/thesofproject/linux/issues/981
> > > > https://github.com/thesofproject/linux/issues/966
> > > > https://github.com/thesofproject/linux/pull/988
> > > > 
> > > > Since it's not SOF specific it's better to get feedback
> > > > directly
> > > > from
> > > > the large ALSA community/maintainers. We probably want to focus
> > > > on
> > > > the
> > > > platform-specific/vendor-specific stuff on GitHub and use the
> > > > mailing
> > > > list for such framework-level changes.
> > > 
> > > Hm, I still wonder why this doens't happen with the HDA legacy.
> > > 
> > > What is the shortest way to trigger the bug manually without a
> > > script?
> > 
> > Hi Takashi,
> > 
> > With SOF, I can reproduce the issue if I just unload the
> > sof_pci_dev
> > module with rmmod. 
> > 
> > Basically, the remove routine for the SOF pci device, unregisters
> > the
> > machine driver and then removes the codec device. So the first step
> > of
> > unregistering the machine driver frees the card which decrements
> > the
> > refcount for the HDA codec's kobject. In the case of HDMI codec,
> > since
> > it is not managed by the card, the refcount is not decremented when
> > the
> > card is removed. 
> 
> So it's only about hdac_hdmi codec, or only about hdac_hda codec?

It is only about the hdac_hda codec. 
> 
> And why isn't the HDMI codec managed by the card...?  IOW, isn't that
> dangerous -- it means the codec is always removable, even after it is
> bound to the card?
That is a good point. This probably needs to be fixed as well. I can
include that change if you think it is the right thing to do.

But I was wondering whether it makes sense to increment the refcount
when the device is added to the card with snd_device_new()?
I'm not sure how that would affect the other devices, so I didn't go
down this route.

Thanks,
Ranjani

Takashi Iwai May 31, 2019, 2:02 p.m. UTC | #6
On Fri, 31 May 2019 15:52:27 +0200,
Ranjani Sridharan wrote:
> But I was wondering whether it makes sense to increment the refcount
> when the device is added to the card with snd_device_new()?
> I'm not sure how that would affect the other devices, so I didn't go
> down this route.

If you really mean snd_device_new() calls in general, not specific to
snd_hda_codec_device_new() -- then no, something must be wrong.

And even for snd_hda_codec_device_new(), I'm not sure about it.

Actually, who decrements the device refcount, and at what point...?
It'd be helpful if you could clarify it; then we might see a better
solution or a better explanation.


thanks,

Takashi
Takashi Iwai May 31, 2019, 2:28 p.m. UTC | #7
On Fri, 31 May 2019 16:02:30 +0200,
Takashi Iwai wrote:
> Actually, who decrements the device refcount, and at what point...?
> It'd be helpful if you could clarify it; then we might see a better
> solution or a better explanation.

Now I'm reading the code again.  Is it about the put_device() in
snd_hdac_ext_bus_device_remove()?

void snd_hdac_ext_bus_device_remove(struct hdac_bus *bus)
{
	struct hdac_device *codec, *__codec;
	/*
	 * we need to remove all the codec devices objects created in the
	 * snd_hdac_ext_bus_device_init
	 */
	list_for_each_entry_safe(codec, __codec, &bus->codec_list, list) {
		snd_hdac_device_unregister(codec);
		put_device(&codec->dev);
	}
}

I can't figure out why put_device() is needed here...


Takashi
Ranjani Sridharan May 31, 2019, 3:20 p.m. UTC | #8
On Fri, 2019-05-31 at 16:28 +0200, Takashi Iwai wrote:
> On Fri, 31 May 2019 16:02:30 +0200,
> Takashi Iwai wrote:
> Now I'm reading the code again.  Is it about the put_device() in
> snd_hdac_ext_bus_device_remove()?
> 
> void snd_hdac_ext_bus_device_remove(struct hdac_bus *bus)
> {
> 	struct hdac_device *codec, *__codec;
> 	/*
> 	 * we need to remove all the codec devices objects created in the
> 	 * snd_hdac_ext_bus_device_init
> 	 */
> 	list_for_each_entry_safe(codec, __codec, &bus->codec_list, list) {
> 		snd_hdac_device_unregister(codec);
> 		put_device(&codec->dev);
> 	}
> }
> 
> I can't figure out why put_device() is needed here...
Hi Takashi,

No, this actually comes at the second step in the case of SOF (i.e.,
after the machine driver is unregistered).

Actually, I just found out what's causing the issue. It is
snd_hda_codec_dev_free(), which calls put_device() when snd_card_free()
is invoked. So adding a get_device() in snd_hda_codec_device_new()
would balance the refcount.

On the other hand, removing the put_device() in
snd_hda_codec_dev_free() would also address the problem. I'm not sure
which would be the preferred route.

Thanks,
Ranjani
Takashi Iwai May 31, 2019, 3:37 p.m. UTC | #9
On Fri, 31 May 2019 17:20:33 +0200,
Ranjani Sridharan wrote:
> 
> On Fri, 2019-05-31 at 16:28 +0200, Takashi Iwai wrote:
> > On Fri, 31 May 2019 16:02:30 +0200,
> > Takashi Iwai wrote:
> > > 
> > > On Fri, 31 May 2019 15:52:27 +0200,
> > > Ranjani Sridharan wrote:
> > > > 
> > > > On Fri, 2019-05-31 at 15:25 +0200, Takashi Iwai wrote:
> > > > > On Fri, 31 May 2019 15:18:03 +0200,
> > > > > Ranjani Sridharan wrote:
> > > > > > 
> > > > > > On Fri, 2019-05-31 at 08:11 +0200, Takashi Iwai wrote:
> > > > > > > On Thu, 30 May 2019 23:00:10 +0200,
> > > > > > > Pierre-Louis Bossart wrote:
> > > > > > > > 
> > > > > > > > 
> > > > > > > > 
> > > > > > > > On 5/30/19 3:18 PM, Ranjani Sridharan wrote:
> > > > > > > > > Calling snd_device_new() makes the codec devices
> > > > > > > > > managed by
> > > > > > > > > the
> > > > > > > > > card.
> > > > > > > > > So, when the card is removed, the refcount for the
> > > > > > > > > codec
> > > > > > > > > device is decremented and results in the codec device's
> > > > > > > > > kobject
> > > > > > > > > being cleaned up if the refcount is 0. But, this leads
> > > > > > > > > to a
> > > > > > > > > NULL
> > > > > > > > > pointer exception while attempting to remove the
> > > > > > > > > symlinks
> > > > > > > > > when
> > > > > > > > > the
> > > > > > > > > codec driver is released later on. Therefore, increment
> > > > > > > > > the
> > > > > > > > > codec
> > > > > > > > > device's refcount before adding it to the card to
> > > > > > > > > prevent
> > > > > > > > > this.
> > > > > > > > 
> > > > > > > > Ranjani, you should add a bit of context for the rest of
> > > > > > > > the
> > > > > > > > list...
> > > > > > > > 
> > > > > > > > This patch suggest a solution to a set of sightings
> > > > > > > > occurring
> > > > > > > > when
> > > > > > > > removing/adding modules in a loop, and the current
> > > > > > > > analysis
> > > > > > > > points
> > > > > > > > to
> > > > > > > > a difference between the way the HDMI and HDaudio codecs
> > > > > > > > are
> > > > > > > > handled.
> > > > > > > > 
> > > > > > > > https://github.com/thesofproject/linux/issues/981
> > > > > > > > https://github.com/thesofproject/linux/issues/966
> > > > > > > > https://github.com/thesofproject/linux/pull/988
> > > > > > > > 
> > > > > > > > Since it's not SOF specific it's better to get feedback
> > > > > > > > directly
> > > > > > > > from
> > > > > > > > the large ALSA community/maintainers. We probably want to
> > > > > > > > focus
> > > > > > > > on
> > > > > > > > the
> > > > > > > > platform-specific/vendor-specific stuff on GitHub and use
> > > > > > > > the
> > > > > > > > mailing
> > > > > > > > list for such framework-level changes.
> > > > > > > 
> > > > > > > Hm, I still wonder why this doens't happen with the HDA
> > > > > > > legacy.
> > > > > > > 
> > > > > > > What is the shortest way to trigger the bug manually
> > > > > > > without a
> > > > > > > script?
> > > > > > 
> > > > > > Hi Takashi,
> > > > > > 
> > > > > > With SOF, I can reproduce the issue if I just unload the
> > > > > > sof_pci_dev
> > > > > > module with rmmod. 
> > > > > > 
> > > > > > Basically, the remove routine for the SOF pci device,
> > > > > > unregisters
> > > > > > the
> > > > > > machine driver and then removes the codec device. So the
> > > > > > first step
> > > > > > of
> > > > > > unregistering the machine driver frees the card which
> > > > > > decrements
> > > > > > the
> > > > > > refcount for the HDA codec's kobject. In the case of HDMI
> > > > > > codec,
> > > > > > since
> > > > > > it is not managed by the card, the refcount is not
> > > > > > decremented when
> > > > > > the
> > > > > > card is removed. 
> > > > > 
> > > > > So it's only about hdac_hdmi codec, or only about hdac_hda
> > > > > codec?
> > > > 
> > > > It is only about the hdac_hda codec. 
> > > > > 
> > > > > And why HDMI codec isn't managed by the card...?  IOW, isn't it
> > > > > dangerous -- it means the codec being always removable after
> > > > > bound to
> > > > > the card?
> > > > 
> > > > That is a good point. This probably needs to be fixed as well. I can
> > > > include that change if you think it is the right thing to do.
> > > > 
> > > > But I was wondering whether it makes sense to increment the refcount
> > > > when the device is added to the card with snd_device_new(). I'm not
> > > > sure how that would affect the other devices, so I didn't go down
> > > > this route.
> > > 
> > > If you really mean snd_device_new() calls generically, not specific
> > > to snd_hda_codec_device_new(), then no; something must be wrong.
> > > 
> > > And even for snd_hda_codec_device_new(), I'm not sure about it.
> > > 
> > > Actually, who decrements the device refcount, and at what point?
> > > It'd be helpful if you could clarify that; then we might see a better
> > > solution or a better explanation.
> > 
> > Now I'm reading the code again.  Is it about the put_device() in
> > snd_hdac_ext_bus_device_remove()?
> > 
> > void snd_hdac_ext_bus_device_remove(struct hdac_bus *bus)
> > {
> > 	struct hdac_device *codec, *__codec;
> > 	/*
> > 	 * we need to remove all the codec devices objects created in
> > the
> > 	 * snd_hdac_ext_bus_device_init
> > 	 */
> > 	list_for_each_entry_safe(codec, __codec, &bus->codec_list,
> > list) {
> > 		snd_hdac_device_unregister(codec);
> > 		put_device(&codec->dev);
> > 	}
> > }
> > 
> > I can't figure out why put_device() is needed here...
> 
> Hi Takashi,
> 
> No, this actually comes at the second step in the case of SOF (i.e.,
> after the machine driver is unregistered).
>  
> Actually, I just found out what's causing the issue. It is
> snd_hda_codec_dev_free(), which calls put_device() when snd_card_free()
> is invoked. So adding a get_device() in snd_hda_codec_device_new()
> would balance the refcount.
> 
> On the other hand, removing the put_device() in
> snd_hda_codec_dev_free() would also address the problem. I'm not sure
> which would be the preferred route.

The latter one, I'd say.

Actually the difference between ASoC and the legacy HDA bus is who
releases the device object.  For the legacy HDA bus, it's supposed to be
done via the snd_device_free() chain, while the ASoC bus releases it
explicitly, as shown in my previous post.

So, if anything, I'd paper over it as below.


thanks,

Takashi

--- a/sound/pci/hda/hda_codec.c
+++ b/sound/pci/hda/hda_codec.c
@@ -840,7 +840,12 @@ static int snd_hda_codec_dev_free(struct snd_device *device)
 	if (codec->core.type == HDA_DEV_LEGACY)
 		snd_hdac_device_unregister(&codec->core);
 	codec_display_power(codec, false);
-	put_device(hda_codec_dev(codec));
+	/*
+	 * again, ASoC HD-audio bus manages differently; it's released in
+	 * snd_hdac_ext_bus_device_remove() explicitly
+	 */
+	if (codec->core.type == HDA_DEV_LEGACY)
+		put_device(hda_codec_dev(codec));
 	return 0;
 }
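To make the two release paths easier to compare, here is a minimal userspace C model of the codec device refcount. It is a sketch under simplifying assumptions, not kernel code, and every name in it is hypothetical: model_get()/model_put() stand in for get_device()/put_device(), model_card_free() stands in for the put in snd_hda_codec_dev_free() reached via snd_card_free(), and model_ext_bus_remove() stands in for the unregister plus put_device() in snd_hdac_ext_bus_device_remove(). It assumes the device starts with a single reference and follows the SOF teardown order described in this thread (card freed first, then the codec device removed).

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical userspace model of the HDA codec device refcount.
 * Not kernel code: every name below is made up for illustration.
 */
enum model_codec_type { MODEL_LEGACY, MODEL_ASOC };

/* Which variant of the fix to simulate (hypothetical labels). */
enum model_fix {
	FIX_NONE,            /* unconditional put in the card's dev_free */
	FIX_EXTRA_GET,       /* v1: get_device() before snd_device_new() */
	FIX_LEGACY_ONLY_PUT, /* proposed diff: put only for legacy codecs */
};

struct model_codec {
	enum model_codec_type type;
	int refcount;        /* references held on the device object */
	bool released;       /* set once the refcount drops to zero */
	bool use_after_free; /* set if a released object is touched again */
};

/* Stand-in for get_device(). */
static void model_get(struct model_codec *c)
{
	if (c->released) {
		c->use_after_free = true;
		return;
	}
	c->refcount++;
}

/* Stand-in for put_device(); the last put "releases" the object. */
static void model_put(struct model_codec *c)
{
	if (c->released) {
		c->use_after_free = true;
		return;
	}
	assert(c->refcount > 0); /* a live object always holds a reference */
	if (--c->refcount == 0)
		c->released = true;
}

/* Stand-in for snd_card_free() -> snd_hda_codec_dev_free(). */
static void model_card_free(struct model_codec *c, enum model_fix fix)
{
	if (fix != FIX_LEGACY_ONLY_PUT || c->type == MODEL_LEGACY)
		model_put(c);
}

/* Stand-in for snd_hdac_ext_bus_device_remove(); in this model only
 * ASoC (ext bus) codecs go through it. */
static void model_ext_bus_remove(struct model_codec *c)
{
	if (c->type != MODEL_ASOC)
		return;
	/* like snd_hdac_device_unregister(): touches the object first */
	if (c->released)
		c->use_after_free = true;
	model_put(c);
}

/* Simulate the SOF teardown order: card freed first, then the codec
 * device removed. True if the object ends up released and was never
 * touched after being released. */
bool model_teardown_ok(enum model_codec_type type, enum model_fix fix)
{
	struct model_codec c = { .type = type, .refcount = 1 };

	if (fix == FIX_EXTRA_GET)
		model_get(&c); /* snd_hda_codec_device_new() in the v1 patch */

	model_card_free(&c, fix);
	model_ext_bus_remove(&c);

	return c.released && !c.use_after_free;
}
```

In this model, model_teardown_ok(MODEL_ASOC, FIX_NONE) returns false because the ext-bus path touches an object the card teardown already released, while FIX_LEGACY_ONLY_PUT returns true for both codec types. FIX_EXTRA_GET repairs the ASoC case but, under the model's assumption that nothing else drops the extra reference, leaves a legacy codec with one reference outstanding.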
Ranjani Sridharan May 31, 2019, 3:43 p.m. UTC | #10
> > Hi Takashi,
> > 
> > No, this actually comes at the second step in the case of SOF (i.e.,
> > after the machine driver is unregistered).
> >  
> > Actually, I just found out what's causing the issue. It is
> > snd_hda_codec_dev_free(), which calls put_device() when snd_card_free()
> > is invoked. So adding a get_device() in snd_hda_codec_device_new()
> > would balance the refcount.
> > 
> > On the other hand, removing the put_device() in
> > snd_hda_codec_dev_free() would also address the problem. I'm not sure
> > which would be the preferred route.
> 
> The latter one, I'd say.
> 
> Actually the difference between ASoC and the legacy HDA bus is who
> releases the device object.  For the legacy HDA bus, it's supposed to be
> done via the snd_device_free() chain, while the ASoC bus releases it
> explicitly, as shown in my previous post.
> 
> So, if anything, I'd paper over it as below.

OK, makes sense. Let me send a V2 with the change.

Also, should I look into making the hdac_hdmi codec card-managed as
well?

Thanks,
Ranjani

> 
> 
> thanks,
> 
> Takashi
> 
> --- a/sound/pci/hda/hda_codec.c
> +++ b/sound/pci/hda/hda_codec.c
> @@ -840,7 +840,12 @@ static int snd_hda_codec_dev_free(struct snd_device *device)
>  	if (codec->core.type == HDA_DEV_LEGACY)
>  		snd_hdac_device_unregister(&codec->core);
>  	codec_display_power(codec, false);
> -	put_device(hda_codec_dev(codec));
> +	/*
> +	 * again, ASoC HD-audio bus manages differently; it's released in
> +	 * snd_hdac_ext_bus_device_remove() explicitly
> +	 */
> +	if (codec->core.type == HDA_DEV_LEGACY)
> +		put_device(hda_codec_dev(codec));
>  	return 0;
>  }
>
Takashi Iwai May 31, 2019, 3:45 p.m. UTC | #11
On Fri, 31 May 2019 17:43:53 +0200,
Ranjani Sridharan wrote:
> 
> > > Hi Takashi,
> > > 
> > > No, this actually comes at the second step in the case of SOF (i.e.,
> > > after the machine driver is unregistered).
> > >  
> > > Actually, I just found out what's causing the issue. It is
> > > snd_hda_codec_dev_free(), which calls put_device() when snd_card_free()
> > > is invoked. So adding a get_device() in snd_hda_codec_device_new()
> > > would balance the refcount.
> > > 
> > > On the other hand, removing the put_device() in
> > > snd_hda_codec_dev_free() would also address the problem. I'm not sure
> > > which would be the preferred route.
> > 
> > The latter one, I'd say.
> > 
> > Actually the difference between ASoC and the legacy HDA bus is who
> > releases the device object.  For the legacy HDA bus, it's supposed to be
> > done via the snd_device_free() chain, while the ASoC bus releases it
> > explicitly, as shown in my previous post.
> > 
> > So, if anything, I'd paper over it as below.
> 
> OK, makes sense. Let me send a V2 with the change.
> 
> Also, should I look into making the hdac_hdmi codec card-managed as
> well?

Well, feel free to fix more bugs, of course ;)

BTW, I'll be on vacation for a week starting tomorrow, so my replies will
be delayed.


thanks,

Takashi

Patch

diff --git a/sound/pci/hda/hda_codec.c b/sound/pci/hda/hda_codec.c
index b20eb7fc83eb..0d5d95b07e19 100644
--- a/sound/pci/hda/hda_codec.c
+++ b/sound/pci/hda/hda_codec.c
@@ -985,6 +985,14 @@  int snd_hda_codec_device_new(struct hda_bus *bus, struct snd_card *card,
 		codec->core.subsystem_id, codec->core.revision_id);
 	snd_component_add(card, component);
 
+	/*
+	 * snd_device_new() makes the codec device managed by the card.
+	 * When the card is removed, the device reference count is
+	 * decremented. Therefore, increment it here to prevent removing
+	 * the codec device's kobject when the card is removed.
+	 */
+	get_device(hda_codec_dev(codec));
+
 	err = snd_device_new(card, SNDRV_DEV_CODEC, codec, &dev_ops);
 	if (err < 0)
 		goto error;