diff mbox series

[v2,12/12] ASoC: SOF: Intel: hda-codec: fix memory allocation

Message ID 20190522162142.11525-13-pierre-louis.bossart@linux.intel.com (mailing list archive)
State New, archived
Headers show
Series ASoC: SOF: stability fixes | expand

Commit Message

Pierre-Louis Bossart May 22, 2019, 4:21 p.m. UTC
From: Libin Yang <libin.yang@intel.com>

Align all users of the hdac library to use devm_kzalloc.

Note for backports/stable: the patch ("ALSA: hdac: fix memory release
for SST and SOF drivers") needs to be applied as well.

Fixes: 5507b8103e26 ("ASoC: SOF: Intel: Add support for HDAudio codecs")
Reviewed-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Libin Yang <libin.yang@intel.com>
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
---
 sound/soc/sof/intel/hda-codec.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

Comments

Yang, Libin May 23, 2019, 4:10 a.m. UTC | #1
Please hold on this patch. It seems there is some corner case failed
because of this patch.

Regards,
Libin


>-----Original Message-----
>From: Pierre-Louis Bossart [mailto:pierre-louis.bossart@linux.intel.com]
>Sent: Thursday, May 23, 2019 12:22 AM
>To: alsa-devel@alsa-project.org
>Cc: tiwai@suse.de; broonie@kernel.org; Yang, Libin <libin.yang@intel.com>;
>Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
>Subject: [PATCH v2 12/12] ASoC: SOF: Intel: hda-codec: fix memory allocation
>
>From: Libin Yang <libin.yang@intel.com>
>
>Align all users of the hdac library to use devm_kzalloc.
>
>Note for backports/stable: the patch ("ALSA: hdac: fix memory release for SST
>and SOF drivers") needs to be applied as well.
>
>Fixes: 5507b8103e26 ("ASoC: SOF: Intel: Add support for HDAudio codecs")
>Reviewed-by: Takashi Iwai <tiwai@suse.de>
>Signed-off-by: Libin Yang <libin.yang@intel.com>
>Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
>---
> sound/soc/sof/intel/hda-codec.c | 6 ++----
> 1 file changed, 2 insertions(+), 4 deletions(-)
>
>diff --git a/sound/soc/sof/intel/hda-codec.c b/sound/soc/sof/intel/hda-
>codec.c index b8b37f082309..0d8437b080bf 100644
>--- a/sound/soc/sof/intel/hda-codec.c
>+++ b/sound/soc/sof/intel/hda-codec.c
>@@ -62,8 +62,7 @@ static int hda_codec_probe(struct snd_sof_dev *sdev,
>int address)
> 		address, resp);
>
> #if IS_ENABLED(CONFIG_SND_SOC_SOF_HDA_AUDIO_CODEC)
>-	/* snd_hdac_ext_bus_device_exit will use kfree to free hdev */
>-	hda_priv = kzalloc(sizeof(*hda_priv), GFP_KERNEL);
>+	hda_priv = devm_kzalloc(sdev->dev, sizeof(*hda_priv), GFP_KERNEL);
> 	if (!hda_priv)
> 		return -ENOMEM;
>
>@@ -82,8 +81,7 @@ static int hda_codec_probe(struct snd_sof_dev *sdev,
>int address)
>
> 	return 0;
> #else
>-	/* snd_hdac_ext_bus_device_exit will use kfree to free hdev */
>-	hdev = kzalloc(sizeof(*hdev), GFP_KERNEL);
>+	hdev = devm_kzalloc(sdev->dev, sizeof(*hdev), GFP_KERNEL);
> 	if (!hdev)
> 		return -ENOMEM;
>
>--
>2.20.1
Yang, Libin May 23, 2019, 8:03 a.m. UTC | #2
Please let me describe the issue here. 

The test case is:
1) Unload module with script "sudo ./sof_remove.sh" , 
2) reload module with script "sudo ./sof_insert.sh"

After several rounds of removing and inserting kernel modules,
system will complain like below:
"BUG: unable to handle kernel paging request at 000000292a282031"

sof_remove.sh is used to remove the modules:
remove_module sof_pci_dev
remove_module snd_sof_intel_hda_common
remove_module snd_soc_skl_hda_dsp
remove_module snd_soc_hdac_hdmi
remove_module snd_hda_codec_realtek
remove_module snd_hda_codec_generic
remove_module snd_soc_dmic
remove_module snd_soc_hdac_hda
remove_module snd_sof_intel_hda
remove_module snd_hda_ext_core
remove_module snd_hda_codec
remove_module snd_hda_core

remove_module sof_acpi_dev
remove_module snd_soc_acpi_intel_match
remove_module snd_sof_intel_byt
remove_module snd_sof_intel_hsw
remove_module snd_sof_intel_bdw
remove_module snd_sof_xtensa_dsp
remove_module snd_sof_intel_ipc

remove_module snd_soc_sst_bytcr_rt5640
remove_module snd_soc_sst_bytcr_rt5651
remove_module snd_soc_sst_cht_bsw_rt5645
remove_module snd_soc_sst_cht_bsw_rt5670
remove_module snd_soc_sst_byt_cht_da7213
remove_module snd_soc_sst_bxt_pcm512x
remove_module snd_soc_sst_bxt_tdf8532
remove_module snd_soc_cnl_rt274
remove_module snd_sof_nocodec
remove_module snd_sof

remove_module snd_soc_acpi_intel_match

remove_module snd_soc_rt5670
remove_module snd_soc_rt5645
remove_module snd_soc_rt5651
remove_module snd_soc_rt5640
remove_module snd_soc_rl6231
remove_module snd_soc_da7213
remove_module snd_soc_pcm512x_i2c
remove_module snd_soc_pcm512x
remove_module snd_soc_tdf8532
remove_module snd_soc_rt274
remove_module snd_soc_acpi

remove_module snd_soc_core
remove_module snd_pcm

And sof_insert.sh is to insert the modules:
modprobe snd_soc_rt5670
modprobe snd_soc_rt5645
modprobe snd_soc_rt5651
modprobe snd_soc_rt5640
modprobe snd_soc_da7213
modprobe snd_soc_pcm512x_i2c
#modprobe snd_soc_tdf8532
#modprobe snd_soc_rt274

modprobe sof_acpi_dev
modprobe sof_pci_dev

Regards,
Libin


>-----Original Message-----
>From: Yang, Libin
>Sent: Thursday, May 23, 2019 12:10 PM
>To: 'Pierre-Louis Bossart' <pierre-louis.bossart@linux.intel.com>; alsa-
>devel@alsa-project.org
>Cc: tiwai@suse.de; broonie@kernel.org
>Subject: RE: [PATCH v2 12/12] ASoC: SOF: Intel: hda-codec: fix memory
>allocation
>
>Please hold on this patch. It seems there is some corner case failed because of
>this patch.
>
>Regards,
>Libin
>
>
>>-----Original Message-----
>>From: Pierre-Louis Bossart
>>[mailto:pierre-louis.bossart@linux.intel.com]
>>Sent: Thursday, May 23, 2019 12:22 AM
>>To: alsa-devel@alsa-project.org
>>Cc: tiwai@suse.de; broonie@kernel.org; Yang, Libin
>><libin.yang@intel.com>; Pierre-Louis Bossart
>><pierre-louis.bossart@linux.intel.com>
>>Subject: [PATCH v2 12/12] ASoC: SOF: Intel: hda-codec: fix memory
>>allocation
>>
>>From: Libin Yang <libin.yang@intel.com>
>>
>>Align all users of the hdac library to use devm_kzalloc.
>>
>>Note for backports/stable: the patch ("ALSA: hdac: fix memory release
>>for SST and SOF drivers") needs to be applied as well.
>>
>>Fixes: 5507b8103e26 ("ASoC: SOF: Intel: Add support for HDAudio
>>codecs")
>>Reviewed-by: Takashi Iwai <tiwai@suse.de>
>>Signed-off-by: Libin Yang <libin.yang@intel.com>
>>Signed-off-by: Pierre-Louis Bossart
>><pierre-louis.bossart@linux.intel.com>
>>---
>> sound/soc/sof/intel/hda-codec.c | 6 ++----
>> 1 file changed, 2 insertions(+), 4 deletions(-)
>>
>>diff --git a/sound/soc/sof/intel/hda-codec.c b/sound/soc/sof/intel/hda-
>>codec.c index b8b37f082309..0d8437b080bf 100644
>>--- a/sound/soc/sof/intel/hda-codec.c
>>+++ b/sound/soc/sof/intel/hda-codec.c
>>@@ -62,8 +62,7 @@ static int hda_codec_probe(struct snd_sof_dev *sdev,
>>int address)
>> 		address, resp);
>>
>> #if IS_ENABLED(CONFIG_SND_SOC_SOF_HDA_AUDIO_CODEC)
>>-	/* snd_hdac_ext_bus_device_exit will use kfree to free hdev */
>>-	hda_priv = kzalloc(sizeof(*hda_priv), GFP_KERNEL);
>>+	hda_priv = devm_kzalloc(sdev->dev, sizeof(*hda_priv), GFP_KERNEL);
>> 	if (!hda_priv)
>> 		return -ENOMEM;
>>
>>@@ -82,8 +81,7 @@ static int hda_codec_probe(struct snd_sof_dev *sdev,
>>int address)
>>
>> 	return 0;
>> #else
>>-	/* snd_hdac_ext_bus_device_exit will use kfree to free hdev */
>>-	hdev = kzalloc(sizeof(*hdev), GFP_KERNEL);
>>+	hdev = devm_kzalloc(sdev->dev, sizeof(*hdev), GFP_KERNEL);
>> 	if (!hdev)
>> 		return -ENOMEM;
>>
>>--
>>2.20.1
Takashi Iwai May 23, 2019, 8:15 a.m. UTC | #3
On Thu, 23 May 2019 10:03:03 +0200,
Yang, Libin wrote:
> 
> Please let me describe the issue here. 
> 
> The test case is:
> 1) Unload module with script "sudo ./sof_remove.sh" , 
> 2) reload module with script "sudo ./sof_insert.sh"
> 
> After several rounds of removing and inserting kernel modules,
> system will complain like below:
> "BUG: unable to handle kernel paging request at 000000292a282031"

Did you try some kernel debug options?  It might show what went wrong.


Takashi
Yang, Libin May 23, 2019, 8:21 a.m. UTC | #4
Hi Takashi,

>-----Original Message-----
>From: Takashi Iwai [mailto:tiwai@suse.de]
>Sent: Thursday, May 23, 2019 4:16 PM
>To: Yang, Libin <libin.yang@intel.com>
>Cc: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>; alsa-
>devel@alsa-project.org; broonie@kernel.org
>Subject: Re: [alsa-devel] [PATCH v2 12/12] ASoC: SOF: Intel: hda-codec: fix
>memory allocation
>
>On Thu, 23 May 2019 10:03:03 +0200,
>Yang, Libin wrote:
>>
>> Please let me describe the issue here.
>>
>> The test case is:
>> 1) Unload module with script "sudo ./sof_remove.sh" ,
>> 2) reload module with script "sudo ./sof_insert.sh"
>>
>> After several rounds of removing and inserting kernel modules, system
>> will complain like below:
>> "BUG: unable to handle kernel paging request at 000000292a282031"
>
>Did you try some kernel debug options?  It might show what went wrong.

No, I haven't. I'm not sure which options I can use for this case. Could you
please give me some suggestions?

Regards,
Libin

>
>
>Takashi
Yang, Libin May 23, 2019, 8:24 a.m. UTC | #5
Hi Takashi,


>>-----Original Message-----
>>From: Takashi Iwai [mailto:tiwai@suse.de]
>>Sent: Thursday, May 23, 2019 4:16 PM
>>To: Yang, Libin <libin.yang@intel.com>
>>Cc: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>; alsa-
>>devel@alsa-project.org; broonie@kernel.org
>>Subject: Re: [alsa-devel] [PATCH v2 12/12] ASoC: SOF: Intel: hda-codec:
>>fix memory allocation
>>
>>On Thu, 23 May 2019 10:03:03 +0200,
>>Yang, Libin wrote:
>>>
>>> Please let me describe the issue here.
>>>
>>> The test case is:
>>> 1) Unload module with script "sudo ./sof_remove.sh" ,
>>> 2) reload module with script "sudo ./sof_insert.sh"
>>>
>>> After several rounds of removing and inserting kernel modules, system
>>> will complain like below:
>>> "BUG: unable to handle kernel paging request at 000000292a282031"
>>
>>Did you try some kernel debug options?  It might show what went wrong.
>
>No, I haven't. I'm not sure which options I can use for this case. Could you
>please give me some suggestions?

BTW: I have enabled DEBUG_DEVRES. Are there any other options I can try?

Regards,
Libin 

>
>Regards,
>Libin
>
>>
>>
>>Takashi
Takashi Iwai May 23, 2019, 8:27 a.m. UTC | #6
On Thu, 23 May 2019 10:21:20 +0200,
Yang, Libin wrote:
> 
> Hi Takashi,
> 
> >-----Original Message-----
> >From: Takashi Iwai [mailto:tiwai@suse.de]
> >Sent: Thursday, May 23, 2019 4:16 PM
> >To: Yang, Libin <libin.yang@intel.com>
> >Cc: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>; alsa-
> >devel@alsa-project.org; broonie@kernel.org
> >Subject: Re: [alsa-devel] [PATCH v2 12/12] ASoC: SOF: Intel: hda-codec: fix
> >memory allocation
> >
> >On Thu, 23 May 2019 10:03:03 +0200,
> >Yang, Libin wrote:
> >>
> >> Please let me describe the issue here.
> >>
> >> The test case is:
> >> 1) Unload module with script "sudo ./sof_remove.sh" ,
> >> 2) reload module with script "sudo ./sof_insert.sh"
> >>
> >> After several rounds of removing and inserting kernel modules, system
> >> will complain like below:
> >> "BUG: unable to handle kernel paging request at 000000292a282031"
> >
> >Did you try some kernel debug options?  It might show what went wrong.
> 
> No, I haven't. I'm not sure which options I can use for this case. Could you
> please give me some suggestions?

You can enable CONFIG_DEBUG_DEVRES and adjust the devres.log option
for showing each devres allocation and removal.  And I'd try
CONFIG_DEBUG_SLAB and CONFIG_DEBUG_KMEMLEAK or whatever interesting in
CONFIG_DEBUG section, too.


Takashi
Yang, Libin May 23, 2019, 8:34 a.m. UTC | #7
>-----Original Message-----
>From: Takashi Iwai [mailto:tiwai@suse.de]
>Sent: Thursday, May 23, 2019 4:27 PM
>To: Yang, Libin <libin.yang@intel.com>
>Cc: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>; alsa-
>devel@alsa-project.org; broonie@kernel.org
>Subject: Re: [alsa-devel] [PATCH v2 12/12] ASoC: SOF: Intel: hda-codec: fix
>memory allocation
>
>On Thu, 23 May 2019 10:21:20 +0200,
>Yang, Libin wrote:
>>
>> Hi Takashi,
>>
>> >-----Original Message-----
>> >From: Takashi Iwai [mailto:tiwai@suse.de]
>> >Sent: Thursday, May 23, 2019 4:16 PM
>> >To: Yang, Libin <libin.yang@intel.com>
>> >Cc: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>;
>> >alsa- devel@alsa-project.org; broonie@kernel.org
>> >Subject: Re: [alsa-devel] [PATCH v2 12/12] ASoC: SOF: Intel:
>> >hda-codec: fix memory allocation
>> >
>> >On Thu, 23 May 2019 10:03:03 +0200,
>> >Yang, Libin wrote:
>> >>
>> >> Please let me describe the issue here.
>> >>
>> >> The test case is:
>> >> 1) Unload module with script "sudo ./sof_remove.sh" ,
>> >> 2) reload module with script "sudo ./sof_insert.sh"
>> >>
>> >> After several rounds of removing and inserting kernel modules,
>> >> system will complain like below:
>> >> "BUG: unable to handle kernel paging request at 000000292a282031"
>> >
>> >Did you try some kernel debug options?  It might show what went wrong.
>>
>> No, I haven't. I'm not sure which options I can use for this case.
>> Could you please give me some suggestions?
>
>You can enable CONFIG_DEBUG_DEVRES and adjust the devres.log option for
>showing each devres allocation and removal.  And I'd try
>CONFIG_DEBUG_SLAB and CONFIG_DEBUG_KMEMLEAK or whatever
>interesting in CONFIG_DEBUG section, too.

Thanks. I will have a try.

Regards,
Libin

>
>
>Takashi
Pierre-Louis Bossart May 23, 2019, 11:35 a.m. UTC | #8
On 5/23/19 3:24 AM, Yang, Libin wrote:
> Hi Takashi,
> 
> 
>>> -----Original Message-----
>>> From: Takashi Iwai [mailto:tiwai@suse.de]
>>> Sent: Thursday, May 23, 2019 4:16 PM
>>> To: Yang, Libin <libin.yang@intel.com>
>>> Cc: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>; alsa-
>>> devel@alsa-project.org; broonie@kernel.org
>>> Subject: Re: [alsa-devel] [PATCH v2 12/12] ASoC: SOF: Intel: hda-codec:
>>> fix memory allocation
>>>
>>> On Thu, 23 May 2019 10:03:03 +0200,
>>> Yang, Libin wrote:
>>>>
>>>> Please let me describe the issue here.
>>>>
>>>> The test case is:
>>>> 1) Unload module with script "sudo ./sof_remove.sh" ,
>>>> 2) reload module with script "sudo ./sof_insert.sh"
>>>>
>>>> After several rounds of removing and inserting kernel modules, system
>>>> will complain like below:
>>>> "BUG: unable to handle kernel paging request at 000000292a282031"
>>>
>>> Did you try some kernel debug options?  It might show what went wrong.
>>
>> No, I haven't. I'm not sure which options I can use for this case. Could you
>> please give me some suggestions?
> 
> BTW: I have enabled DEBUG_DEVRES. Are there any other options I can try?

There are already a set of kconfig fragments for debug, see 
https://github.com/thesofproject/kconfig and select memory-debug-defconfig.

I guess I will need to require this test in the SOF CI, I really don't 
get how this issue was not seen earlier. Gah.
Yang, Libin May 24, 2019, 1:10 a.m. UTC | #9
Hi Peirre,

>>>>
>>>> On Thu, 23 May 2019 10:03:03 +0200,
>>>> Yang, Libin wrote:
>>>>>
>>>>> Please let me describe the issue here.
>>>>>
>>>>> The test case is:
>>>>> 1) Unload module with script "sudo ./sof_remove.sh" ,
>>>>> 2) reload module with script "sudo ./sof_insert.sh"
>>>>>
>>>>> After several rounds of removing and inserting kernel modules,
>>>>> system will complain like below:
>>>>> "BUG: unable to handle kernel paging request at 000000292a282031"
>>>>
>>>> Did you try some kernel debug options?  It might show what went wrong.
>>>
>>> No, I haven't. I'm not sure which options I can use for this case.
>>> Could you please give me some suggestions?
>>
>> BTW: I have enabled DEBUG_DEVRES. Are there any other options I can try?
>
>There are already a set of kconfig fragments for debug, see
>https://github.com/thesofproject/kconfig and select memory-debug-defconfig.
>
>I guess I will need to require this test in the SOF CI, I really don't get how this
>issue was not seen earlier. Gah.

This bug can't be reproduced easily sometimes. Sometimes, the bug will be hit 
at the second or third round of the test. And sometimes, we need do more 
rounds to reproduce this bug. This may be the reason we didn't hit this bug before.

Regards,
Libin
Yang, Libin June 3, 2019, 9:10 a.m. UTC | #10
Hi Takashi,

>-----Original Message-----
>From: Takashi Iwai [mailto:tiwai@suse.de]
>Sent: Thursday, May 23, 2019 4:27 PM
>To: Yang, Libin <libin.yang@intel.com>
>Cc: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>; alsa-
>devel@alsa-project.org; broonie@kernel.org
>Subject: Re: [alsa-devel] [PATCH v2 12/12] ASoC: SOF: Intel: hda-codec: fix
>memory allocation
>
>On Thu, 23 May 2019 10:21:20 +0200,
>Yang, Libin wrote:
>>
>> Hi Takashi,
>>
>> >-----Original Message-----
>> >From: Takashi Iwai [mailto:tiwai@suse.de]
>> >Sent: Thursday, May 23, 2019 4:16 PM
>> >To: Yang, Libin <libin.yang@intel.com>
>> >Cc: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>;
>> >alsa- devel@alsa-project.org; broonie@kernel.org
>> >Subject: Re: [alsa-devel] [PATCH v2 12/12] ASoC: SOF: Intel:
>> >hda-codec: fix memory allocation
>> >
>> >On Thu, 23 May 2019 10:03:03 +0200,
>> >Yang, Libin wrote:
>> >>
>> >> Please let me describe the issue here.
>> >>
>> >> The test case is:
>> >> 1) Unload module with script "sudo ./sof_remove.sh" ,
>> >> 2) reload module with script "sudo ./sof_insert.sh"
>> >>
>> >> After several rounds of removing and inserting kernel modules,
>> >> system will complain like below:
>> >> "BUG: unable to handle kernel paging request at 000000292a282031"
>> >
>> >Did you try some kernel debug options?  It might show what went wrong.
>>
>> No, I haven't. I'm not sure which options I can use for this case.
>> Could you please give me some suggestions?
>
>You can enable CONFIG_DEBUG_DEVRES and adjust the devres.log option for
>showing each devres allocation and removal.  And I'd try
>CONFIG_DEBUG_SLAB and CONFIG_DEBUG_KMEMLEAK or whatever
>interesting in CONFIG_DEBUG section, too.

Thanks for your suggestion. After more than 1 week debug, I think maybe 
I have root caused this issue from the devres.log message.

Below is my finding.
1. When initialing the codecs, snd_hdac_ext_bus_device_init() will be called,
and it will set hdev->dev.release = default_release.
However, for analog codec (not hdac_hdmi codec), hdac_hda_codec_probe()
will be called later. And it will call snd_hda_codec_device_new(), which will
reset codec->code.dev.release = snd_hda_codec_dev_release;
This means hdac_hdmi's hdev dev release is default_release() defined in 
hdac_ext_bus.c, and other hda codec's hdev dev release is
snd_hda_codec_dev_release().

Both default_release() and snd_hda_codec_dev_release() will call kfree()
to free the hdac_device (or its container) in the current code.

2. When we run rmmod sof_pci_dev, it will free the sdev. If we use
Struct hdac_device *hdev = devm_kzalloc(sdev->dev...). This means 
hdev will also be freed automatically.

In the removal, snd_hdac_ext_bus_device_remove() will be called
to remove the hdev (in this function it is struct hdac_device *codec.
The name is not aligned in different places). 
However for hdac_hdmi, the hdev->dev is used by other code. 
So calling device.release() (the function default_release()) will
be postponed. After after sdev is freed, the device.release() will
be called. But for devm_xxx, hdev will also be freed when sdev is
freed. This means hdev.dev after sdev is freed is invalid now as
hdev has already freed. It will access invalid memory. This will cause the bug.

So I think we should not use devm_xxx, and let's free the hdev manually.

At the end of this topic, I still found 2 suspicious code in the current code.
1. in sound/soc/intel/skylate/skl.c
it calls hdev = devm_kazlloc() or hda_codec = devm_kzalloc().
As we will call kfree() in the current code, should we replace it with
kzalloc()? Maybe we need cavs drivers owner's help on it.

2. in snd_hdac_ext_bus_device_remove()
It will call snd_hdac_device_unregister() to unregister the hdac_devices
and put_device(&codec->dev) to release the dev.
For analog codec, snd_hdac_device_unregister()  will free the codec->dev's
kobject. And snd_hda_codec_dev_release() will be called to free the 
hdac_device.
So it is invalid to call put_device(&codec->dev). If you print 
refcound_read(&(codec->dev.kobj.kref.refcount)) for analog codec before
put_device(), you will find the refcount has already been 0.

Regards,
Libin 

>
>
>Takashi
Pierre-Louis Bossart June 3, 2019, 1:45 p.m. UTC | #11
Hi Libin,

>>>>> Please let me describe the issue here.
>>>>>
>>>>> The test case is:
>>>>> 1) Unload module with script "sudo ./sof_remove.sh" ,
>>>>> 2) reload module with script "sudo ./sof_insert.sh"
>>>>>
>>>>> After several rounds of removing and inserting kernel modules,
>>>>> system will complain like below:
>>>>> "BUG: unable to handle kernel paging request at 000000292a282031"
>>>>
>>>> Did you try some kernel debug options?  It might show what went wrong.
>>>
>>> No, I haven't. I'm not sure which options I can use for this case.
>>> Could you please give me some suggestions?
>>
>> You can enable CONFIG_DEBUG_DEVRES and adjust the devres.log option for
>> showing each devres allocation and removal.  And I'd try
>> CONFIG_DEBUG_SLAB and CONFIG_DEBUG_KMEMLEAK or whatever
>> interesting in CONFIG_DEBUG section, too.
> 
> Thanks for your suggestion. After more than 1 week debug, I think maybe
> I have root caused this issue from the devres.log message.
> 
> Below is my finding.
> 1. When initialing the codecs, snd_hdac_ext_bus_device_init() will be called,
> and it will set hdev->dev.release = default_release.
> However, for analog codec (not hdac_hdmi codec), hdac_hda_codec_probe()
> will be called later. And it will call snd_hda_codec_device_new(), which will
> reset codec->code.dev.release = snd_hda_codec_dev_release;
> This means hdac_hdmi's hdev dev release is default_release() defined in
> hdac_ext_bus.c, and other hda codec's hdev dev release is
> snd_hda_codec_dev_release().
> 
> Both default_release() and snd_hda_codec_dev_release() will call kfree()
> to free the hdac_device (or its container) in the current code.
> 
> 2. When we run rmmod sof_pci_dev, it will free the sdev. If we use
> Struct hdac_device *hdev = devm_kzalloc(sdev->dev...). This means
> hdev will also be freed automatically.
> 
> In the removal, snd_hdac_ext_bus_device_remove() will be called
> to remove the hdev (in this function it is struct hdac_device *codec.
> The name is not aligned in different places).
> However for hdac_hdmi, the hdev->dev is used by other code.

what other code? Can you elaborate on why the release is delayed?

> So calling device.release() (the function default_release()) will
> be postponed. After after sdev is freed, the device.release() will
> be called. But for devm_xxx, hdev will also be freed when sdev is
> freed. This means hdev.dev after sdev is freed is invalid now as
> hdev has already freed. It will access invalid memory. This will cause the bug.

This is very hard to follow. 4 lines above you wrote the release is 
postponed but the way you describe is looks completely sequential.

> 
> So I think we should not use devm_xxx, and let's free the hdev manually.
> 
> At the end of this topic, I still found 2 suspicious code in the current code.
> 1. in sound/soc/intel/skylate/skl.c
> it calls hdev = devm_kazlloc() or hda_codec = devm_kzalloc().
> As we will call kfree() in the current code, should we replace it with
> kzalloc()? Maybe we need cavs drivers owner's help on it.

maybe you should send a diff suggestion to help everyone understand the 
changes you are referring to?

> 
> 2. in snd_hdac_ext_bus_device_remove()
> It will call snd_hdac_device_unregister() to unregister the hdac_devices
> and put_device(&codec->dev) to release the dev.
> For analog codec, snd_hdac_device_unregister()  will free the codec->dev's
> kobject. And snd_hda_codec_dev_release() will be called to free the
> hdac_device.
> So it is invalid to call put_device(&codec->dev). If you print
> refcound_read(&(codec->dev.kobj.kref.refcount)) for analog codec before
> put_device(), you will find the refcount has already been 0.

Isn't it a different problem though? Does this cause a freeze or is this 
just a bad refcount?
Yang, Libin June 4, 2019, 1:21 a.m. UTC | #12
>>> >On Thu, 23 May 2019 10:03:03 +0200,
>>> >Yang, Libin wrote:
>>> >>
>>> >> Please let me describe the issue here.
>>> >>
>>> >> The test case is:
>>> >> 1) Unload module with script "sudo ./sof_remove.sh" ,
>>> >> 2) reload module with script "sudo ./sof_insert.sh"
>>> >>
>>> >> After several rounds of removing and inserting kernel modules,
>>> >> system will complain like below:
>>> >> "BUG: unable to handle kernel paging request at 000000292a282031"
>>> >
>>> >Did you try some kernel debug options?  It might show what went wrong.
>>>
>>> No, I haven't. I'm not sure which options I can use for this case.
>>> Could you please give me some suggestions?
>>
>>You can enable CONFIG_DEBUG_DEVRES and adjust the devres.log option
>for
>>showing each devres allocation and removal.  And I'd try
>>CONFIG_DEBUG_SLAB and CONFIG_DEBUG_KMEMLEAK or whatever
>interesting in
>>CONFIG_DEBUG section, too.
>
>Thanks for your suggestion. After more than 1 week debug, I think maybe I
>have root caused this issue from the devres.log message.
>
>Below is my finding.
>1. When initialing the codecs, snd_hdac_ext_bus_device_init() will be called,
>and it will set hdev->dev.release = default_release.
>However, for analog codec (not hdac_hdmi codec), hdac_hda_codec_probe()
>will be called later. And it will call snd_hda_codec_device_new(), which will
>reset codec->code.dev.release = snd_hda_codec_dev_release; This means
>hdac_hdmi's hdev dev release is default_release() defined in hdac_ext_bus.c,
>and other hda codec's hdev dev release is snd_hda_codec_dev_release().
>
>Both default_release() and snd_hda_codec_dev_release() will call kfree() to
>free the hdac_device (or its container) in the current code.
>
>2. When we run rmmod sof_pci_dev, it will free the sdev. If we use Struct
>hdac_device *hdev = devm_kzalloc(sdev->dev...). This means hdev will also be
>freed automatically.
>
>In the removal, snd_hdac_ext_bus_device_remove() will be called to remove
>the hdev (in this function it is struct hdac_device *codec.
>The name is not aligned in different places).
>However for hdac_hdmi, the hdev->dev is used by other code.
>So calling device.release() (the function default_release()) will be postponed.
>After after sdev is freed, the device.release() will be called. But for devm_xxx,
>hdev will also be freed when sdev is freed. This means hdev.dev after sdev is
>freed is invalid now as hdev has already freed. It will access invalid memory.
>This will cause the bug.
>
>So I think we should not use devm_xxx, and let's free the hdev manually.
>
>At the end of this topic, I still found 2 suspicious code in the current code.
>1. in sound/soc/intel/skylate/skl.c
>it calls hdev = devm_kazlloc() or hda_codec = devm_kzalloc().
>As we will call kfree() in the current code, should we replace it with kzalloc()?
>Maybe we need cavs drivers owner's help on it.
>
>2. in snd_hdac_ext_bus_device_remove()
>It will call snd_hdac_device_unregister() to unregister the hdac_devices and
>put_device(&codec->dev) to release the dev.
>For analog codec, snd_hdac_device_unregister()  will free the codec->dev's
>kobject. And snd_hda_codec_dev_release() will be called to free the
>hdac_device.
>So it is invalid to call put_device(&codec->dev). If you print
>refcound_read(&(codec->dev.kobj.kref.refcount)) for analog codec before
>put_device(), you will find the refcount has already been 0.

It seems Ranjani's " ASoC: hda: increment codec device refcount
when it is added to the card" patch can fix the second issue.

Regards,
Libin
Yang, Libin June 4, 2019, 1:38 a.m. UTC | #13
Hi Pierre,

>Hi Libin,
>
>>>>>> Please let me describe the issue here.
>>>>>>
>>>>>> The test case is:
>>>>>> 1) Unload module with script "sudo ./sof_remove.sh" ,
>>>>>> 2) reload module with script "sudo ./sof_insert.sh"
>>>>>>
>>>>>> After several rounds of removing and inserting kernel modules,
>>>>>> system will complain like below:
>>>>>> "BUG: unable to handle kernel paging request at 000000292a282031"
>>>>>
>>>>> Did you try some kernel debug options?  It might show what went wrong.
>>>>
>>>> No, I haven't. I'm not sure which options I can use for this case.
>>>> Could you please give me some suggestions?
>>>
>>> You can enable CONFIG_DEBUG_DEVRES and adjust the devres.log option
>>> for showing each devres allocation and removal.  And I'd try
>>> CONFIG_DEBUG_SLAB and CONFIG_DEBUG_KMEMLEAK or whatever
>interesting
>>> in CONFIG_DEBUG section, too.
>>
>> Thanks for your suggestion. After more than 1 week debug, I think
>> maybe I have root caused this issue from the devres.log message.
>>
>> Below is my finding.
>> 1. When initialing the codecs, snd_hdac_ext_bus_device_init() will be
>> called, and it will set hdev->dev.release = default_release.
>> However, for analog codec (not hdac_hdmi codec),
>> hdac_hda_codec_probe() will be called later. And it will call
>> snd_hda_codec_device_new(), which will reset codec->code.dev.release =
>> snd_hda_codec_dev_release; This means hdac_hdmi's hdev dev release is
>> default_release() defined in hdac_ext_bus.c, and other hda codec's
>> hdev dev release is snd_hda_codec_dev_release().
>>
>> Both default_release() and snd_hda_codec_dev_release() will call
>> kfree() to free the hdac_device (or its container) in the current code.
>>
>> 2. When we run rmmod sof_pci_dev, it will free the sdev. If we use
>> Struct hdac_device *hdev = devm_kzalloc(sdev->dev...). This means hdev
>> will also be freed automatically.
>>
>> In the removal, snd_hdac_ext_bus_device_remove() will be called to
>> remove the hdev (in this function it is struct hdac_device *codec.
>> The name is not aligned in different places).
>> However for hdac_hdmi, the hdev->dev is used by other code.
>
>what other code? Can you elaborate on why the release is delayed?

From the dmesg, it is the device_link used for hdac_hdmi that causes
the release delay. Device_link will increase the refcount.

>
>> So calling device.release() (the function default_release()) will be
>> postponed. After after sdev is freed, the device.release() will be
>> called. But for devm_xxx, hdev will also be freed when sdev is freed.
>> This means hdev.dev after sdev is freed is invalid now as hdev has
>> already freed. It will access invalid memory. This will cause the bug.
>
>This is very hard to follow. 4 lines above you wrote the release is postponed
>but the way you describe is looks completely sequential.

For hdac_hdmi, as it is used by other code (device_link), the device release
will not be called immediately. After sdev and hdev is freed, the hdev
device is put by device_link. Then hdev device can be released. But hdev
has already be freed. This is wrong.

>
>>
>> So I think we should not use devm_xxx, and let's free the hdev manually.
>>
>> At the end of this topic, I still found 2 suspicious code in the current code.
>> 1. in sound/soc/intel/skylate/skl.c
>> it calls hdev = devm_kazlloc() or hda_codec = devm_kzalloc().
>> As we will call kfree() in the current code, should we replace it with
>> kzalloc()? Maybe we need cavs drivers owner's help on it.
>
>maybe you should send a diff suggestion to help everyone understand the
>changes you are referring to?

Please see my inline patch:

diff --git a/sound/soc/intel/skylake/skl.c b/sound/soc/intel/skylake/skl.c
index f864f7b..12af99a 100644
--- a/sound/soc/intel/skylake/skl.c
+++ b/sound/soc/intel/skylake/skl.c
@@ -697,8 +697,7 @@ static int probe_codec(struct hdac_bus *bus, int addr)
        dev_dbg(bus->dev, "codec #%d probed OK: %x\n", addr, res);

 #if IS_ENABLED(CONFIG_SND_SOC_INTEL_SKYLAKE_HDAUDIO_CODEC)
-       hda_codec = devm_kzalloc(&skl->pci->dev, sizeof(*hda_codec),
-                                GFP_KERNEL);
+       hda_codec = kzalloc(sizeof(*hda_codec), GFP_KERNEL);
        if (!hda_codec)
                return -ENOMEM;

@@ -716,7 +715,7 @@ static int probe_codec(struct hdac_bus *bus, int addr)
        }
        return 0;
 #else
-       hdev = devm_kzalloc(&skl->pci->dev, sizeof(*hdev), GFP_KERNEL);
+       hdev = kzalloc(sizeof(*hdev), GFP_KERNEL);
        if (!hdev)
                return -ENOMEM;

>
>>
>> 2. in snd_hdac_ext_bus_device_remove() It will call
>> snd_hdac_device_unregister() to unregister the hdac_devices and
>> put_device(&codec->dev) to release the dev.
>> For analog codec, snd_hdac_device_unregister()  will free the
>> codec->dev's kobject. And snd_hda_codec_dev_release() will be called
>> to free the hdac_device.
>> So it is invalid to call put_device(&codec->dev). If you print
>> refcound_read(&(codec->dev.kobj.kref.refcount)) for analog codec
>> before put_device(), you will find the refcount has already been 0.
>
>Isn't it a different problem though? Does this cause a freeze or is this just a
>bad refcount?

Yes, it is a totally different problem. And today morning, I found Ranjani has
a patch " ASoC: hda: increment codec device refcount when it is 
added to the card ", which should fix this bug.

Regards,
Libin
Yang, Libin June 17, 2019, 7:30 a.m. UTC | #14
>-----Original Message-----
>From: Yang, Libin
>Sent: Monday, June 3, 2019 5:10 PM
>To: Takashi Iwai <tiwai@suse.de>
>Cc: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>; alsa-
>devel@alsa-project.org; broonie@kernel.org
>Subject: RE: [alsa-devel] [PATCH v2 12/12] ASoC: SOF: Intel: hda-codec: fix
>memory allocation
>
>Hi Takashi,
>
>>-----Original Message-----
>>From: Takashi Iwai [mailto:tiwai@suse.de]
>>Sent: Thursday, May 23, 2019 4:27 PM
>>To: Yang, Libin <libin.yang@intel.com>
>>Cc: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>; alsa-
>>devel@alsa-project.org; broonie@kernel.org
>>Subject: Re: [alsa-devel] [PATCH v2 12/12] ASoC: SOF: Intel: hda-codec:
>>fix memory allocation
>>
>>On Thu, 23 May 2019 10:21:20 +0200,
>>Yang, Libin wrote:
>>>
>>> Hi Takashi,
>>>
>>> >-----Original Message-----
>>> >From: Takashi Iwai [mailto:tiwai@suse.de]
>>> >Sent: Thursday, May 23, 2019 4:16 PM
>>> >To: Yang, Libin <libin.yang@intel.com>
>>> >Cc: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>;
>>> >alsa- devel@alsa-project.org; broonie@kernel.org
>>> >Subject: Re: [alsa-devel] [PATCH v2 12/12] ASoC: SOF: Intel:
>>> >hda-codec: fix memory allocation
>>> >
>>> >On Thu, 23 May 2019 10:03:03 +0200,
>>> >Yang, Libin wrote:
>>> >>
>>> >> Please let me describe the issue here.
>>> >>
>>> >> The test case is:
>>> >> 1) Unload module with script "sudo ./sof_remove.sh" ,
>>> >> 2) reload module with script "sudo ./sof_insert.sh"
>>> >>
>>> >> After several rounds of removing and inserting kernel modules,
>>> >> system will complain like below:
>>> >> "BUG: unable to handle kernel paging request at 000000292a282031"
>>> >
>>> >Did you try some kernel debug options?  It might show what went wrong.
>>>
>>> No, I haven't. I'm not sure which options I can use for this case.
>>> Could you please give me some suggestions?
>>
>>You can enable CONFIG_DEBUG_DEVRES and adjust the devres.log option
>for
>>showing each devres allocation and removal.  And I'd try
>>CONFIG_DEBUG_SLAB and CONFIG_DEBUG_KMEMLEAK or whatever
>interesting in
>>CONFIG_DEBUG section, too.
>
>Thanks for your suggestion. After more than 1 week debug, I think maybe I
>have root caused this issue from the devres.log message.
>
>Below is my finding.
>1. When initialing the codecs, snd_hdac_ext_bus_device_init() will be called,
>and it will set hdev->dev.release = default_release.
>However, for analog codec (not hdac_hdmi codec), hdac_hda_codec_probe()
>will be called later. And it will call snd_hda_codec_device_new(), which will
>reset codec->code.dev.release = snd_hda_codec_dev_release; This means
>hdac_hdmi's hdev dev release is default_release() defined in hdac_ext_bus.c,
>and other hda codec's hdev dev release is snd_hda_codec_dev_release().
>
>Both default_release() and snd_hda_codec_dev_release() will call kfree() to
>free the hdac_device (or its container) in the current code.
>
>2. When we run rmmod sof_pci_dev, it will free the sdev. If we use Struct
>hdac_device *hdev = devm_kzalloc(sdev->dev...). This means hdev will also be
>freed automatically.
>
>In the removal, snd_hdac_ext_bus_device_remove() will be called to remove
>the hdev (in this function it is struct hdac_device *codec.
>The name is not aligned in different places).
>However for hdac_hdmi, the hdev->dev is used by other code.
>So calling device.release() (the function default_release()) will be postponed.
>After after sdev is freed, the device.release() will be called. But for devm_xxx,
>hdev will also be freed when sdev is freed. This means hdev.dev after sdev is
>freed is invalid now as hdev has already freed. It will access invalid memory.
>This will cause the bug.
>
>So I think we should not use devm_xxx, and let's free the hdev manually.
>
>At the end of this topic, I still found 2 suspicious code in the current code.
>1. in sound/soc/intel/skylate/skl.c
>it calls hdev = devm_kazlloc() or hda_codec = devm_kzalloc().
>As we will call kfree() in the current code, should we replace it with kzalloc()?
>Maybe we need cavs drivers owner's help on it.
>
>2. in snd_hdac_ext_bus_device_remove()
>It will call snd_hdac_device_unregister() to unregister the hdac_devices and
>put_device(&codec->dev) to release the dev.
>For analog codec, snd_hdac_device_unregister()  will free the codec->dev's
>kobject. And snd_hda_codec_dev_release() will be called to free the
>hdac_device.
>So it is invalid to call put_device(&codec->dev). If you print
>refcound_read(&(codec->dev.kobj.kref.refcount)) for analog codec before
>put_device(), you will find the refcount has already been 0.

I tested on the latest code. The behavior on kernel 5.2.0-rc1 is different from
it on kernel 5.0.0 of https://github.com/thesofproject/linux/. (thesofproject
kernel has already been upgraded to 5.2.0-rc1 now)

On kernel 5.0.0 (https://github.com/thesofproject/linux/), when removing
the sof kernel modules, the sequence is:
1. devm_kazlloc_release() to release hdac_device
2. snd_hdac_ext_bus_device_exit(): this function will use hdac_device

However, on kernel 5.2.0-rc1, the sequence is:
1. snd_hdac_ext_bus_device_exit()
2. devm_kazlloc_release() to release hdac_device

So I think it is safe now to use devm_xxx to allocate the hdac_device.
Also, Ranjani found a duplicated release of hdac_device. She has
a patch to fix it.

Regards,
Libin

>
>Regards,
>Libin
>
>>
>>
>>Takashi
diff mbox series

Patch

diff --git a/sound/soc/sof/intel/hda-codec.c b/sound/soc/sof/intel/hda-codec.c
index b8b37f082309..0d8437b080bf 100644
--- a/sound/soc/sof/intel/hda-codec.c
+++ b/sound/soc/sof/intel/hda-codec.c
@@ -62,8 +62,7 @@  static int hda_codec_probe(struct snd_sof_dev *sdev, int address)
 		address, resp);
 
 #if IS_ENABLED(CONFIG_SND_SOC_SOF_HDA_AUDIO_CODEC)
-	/* snd_hdac_ext_bus_device_exit will use kfree to free hdev */
-	hda_priv = kzalloc(sizeof(*hda_priv), GFP_KERNEL);
+	hda_priv = devm_kzalloc(sdev->dev, sizeof(*hda_priv), GFP_KERNEL);
 	if (!hda_priv)
 		return -ENOMEM;
 
@@ -82,8 +81,7 @@  static int hda_codec_probe(struct snd_sof_dev *sdev, int address)
 
 	return 0;
 #else
-	/* snd_hdac_ext_bus_device_exit will use kfree to free hdev */
-	hdev = kzalloc(sizeof(*hdev), GFP_KERNEL);
+	hdev = devm_kzalloc(sdev->dev, sizeof(*hdev), GFP_KERNEL);
 	if (!hdev)
 		return -ENOMEM;