Message ID | 20190522162142.11525-13-pierre-louis.bossart@linux.intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | ASoC: SOF: stability fixes | expand |
Please hold on this patch. It seems there is some corner case failed because of this patch. Regards, Libin >-----Original Message----- >From: Pierre-Louis Bossart [mailto:pierre-louis.bossart@linux.intel.com] >Sent: Thursday, May 23, 2019 12:22 AM >To: alsa-devel@alsa-project.org >Cc: tiwai@suse.de; broonie@kernel.org; Yang, Libin <libin.yang@intel.com>; >Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> >Subject: [PATCH v2 12/12] ASoC: SOF: Intel: hda-codec: fix memory allocation > >From: Libin Yang <libin.yang@intel.com> > >Align all users of the hdac library to use devm_kzalloc. > >Note for backports/stable: the patch ("ALSA: hdac: fix memory release for SST >and SOF drivers") needs to be applied as well. > >Fixes: 5507b8103e26 ("ASoC: SOF: Intel: Add support for HDAudio codecs") >Reviewed-by: Takashi Iwai <tiwai@suse.de> >Signed-off-by: Libin Yang <libin.yang@intel.com> >Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> >--- > sound/soc/sof/intel/hda-codec.c | 6 ++---- > 1 file changed, 2 insertions(+), 4 deletions(-) > >diff --git a/sound/soc/sof/intel/hda-codec.c b/sound/soc/sof/intel/hda- >codec.c index b8b37f082309..0d8437b080bf 100644 >--- a/sound/soc/sof/intel/hda-codec.c >+++ b/sound/soc/sof/intel/hda-codec.c >@@ -62,8 +62,7 @@ static int hda_codec_probe(struct snd_sof_dev *sdev, >int address) > address, resp); > > #if IS_ENABLED(CONFIG_SND_SOC_SOF_HDA_AUDIO_CODEC) >- /* snd_hdac_ext_bus_device_exit will use kfree to free hdev */ >- hda_priv = kzalloc(sizeof(*hda_priv), GFP_KERNEL); >+ hda_priv = devm_kzalloc(sdev->dev, sizeof(*hda_priv), GFP_KERNEL); > if (!hda_priv) > return -ENOMEM; > >@@ -82,8 +81,7 @@ static int hda_codec_probe(struct snd_sof_dev *sdev, >int address) > > return 0; > #else >- /* snd_hdac_ext_bus_device_exit will use kfree to free hdev */ >- hdev = kzalloc(sizeof(*hdev), GFP_KERNEL); >+ hdev = devm_kzalloc(sdev->dev, sizeof(*hdev), GFP_KERNEL); > if (!hdev) > return -ENOMEM; > >-- >2.20.1
Please let me describe the issue here. The test case is: 1) Unload module with script "sudo ./sof_remove.sh" , 2) reload module with script "sudo ./sof_insert.sh" After several rounds of removing and inserting kernel modules, system will complain like below: "BUG: unable to handle kernel paging request at 000000292a282031" sof_remove.sh is used to remove the modules: remove_module sof_pci_dev remove_module snd_sof_intel_hda_common remove_module snd_soc_skl_hda_dsp remove_module snd_soc_hdac_hdmi remove_module snd_hda_codec_realtek remove_module snd_hda_codec_generic remove_module snd_soc_dmic remove_module snd_soc_hdac_hda remove_module snd_sof_intel_hda remove_module snd_hda_ext_core remove_module snd_hda_codec remove_module snd_hda_core remove_module sof_acpi_dev remove_module snd_soc_acpi_intel_match remove_module snd_sof_intel_byt remove_module snd_sof_intel_hsw remove_module snd_sof_intel_bdw remove_module snd_sof_xtensa_dsp remove_module snd_sof_intel_ipc remove_module snd_soc_sst_bytcr_rt5640 remove_module snd_soc_sst_bytcr_rt5651 remove_module snd_soc_sst_cht_bsw_rt5645 remove_module snd_soc_sst_cht_bsw_rt5670 remove_module snd_soc_sst_byt_cht_da7213 remove_module snd_soc_sst_bxt_pcm512x remove_module snd_soc_sst_bxt_tdf8532 remove_module snd_soc_cnl_rt274 remove_module snd_sof_nocodec remove_module snd_sof remove_module snd_soc_acpi_intel_match remove_module snd_soc_rt5670 remove_module snd_soc_rt5645 remove_module snd_soc_rt5651 remove_module snd_soc_rt5640 remove_module snd_soc_rl6231 remove_module snd_soc_da7213 remove_module snd_soc_pcm512x_i2c remove_module snd_soc_pcm512x remove_module snd_soc_tdf8532 remove_module snd_soc_rt274 remove_module snd_soc_acpi remove_module snd_soc_core remove_module snd_pcm And sof_insert.sh is to insert the modules: modprobe snd_soc_rt5670 modprobe snd_soc_rt5645 modprobe snd_soc_rt5651 modprobe snd_soc_rt5640 modprobe snd_soc_da7213 modprobe snd_soc_pcm512x_i2c #modprobe snd_soc_tdf8532 #modprobe snd_soc_rt274 modprobe sof_acpi_dev modprobe sof_pci_dev Regards, Libin >-----Original Message----- >From: Yang, Libin >Sent: Thursday, May 23, 2019 12:10 PM >To: 'Pierre-Louis Bossart' <pierre-louis.bossart@linux.intel.com>; alsa- >devel@alsa-project.org >Cc: tiwai@suse.de; broonie@kernel.org >Subject: RE: [PATCH v2 12/12] ASoC: SOF: Intel: hda-codec: fix memory >allocation > >Please hold on this patch. It seems there is some corner case failed because of >this patch. > >Regards, >Libin > > >>-----Original Message----- >>From: Pierre-Louis Bossart >>[mailto:pierre-louis.bossart@linux.intel.com] >>Sent: Thursday, May 23, 2019 12:22 AM >>To: alsa-devel@alsa-project.org >>Cc: tiwai@suse.de; broonie@kernel.org; Yang, Libin >><libin.yang@intel.com>; Pierre-Louis Bossart >><pierre-louis.bossart@linux.intel.com> >>Subject: [PATCH v2 12/12] ASoC: SOF: Intel: hda-codec: fix memory >>allocation >> >>From: Libin Yang <libin.yang@intel.com> >> >>Align all users of the hdac library to use devm_kzalloc. >> >>Note for backports/stable: the patch ("ALSA: hdac: fix memory release >>for SST and SOF drivers") needs to be applied as well. >> >>Fixes: 5507b8103e26 ("ASoC: SOF: Intel: Add support for HDAudio >>codecs") >>Reviewed-by: Takashi Iwai <tiwai@suse.de> >>Signed-off-by: Libin Yang <libin.yang@intel.com> >>Signed-off-by: Pierre-Louis Bossart >><pierre-louis.bossart@linux.intel.com> >>--- >> sound/soc/sof/intel/hda-codec.c | 6 ++---- >> 1 file changed, 2 insertions(+), 4 deletions(-) >> >>diff --git a/sound/soc/sof/intel/hda-codec.c b/sound/soc/sof/intel/hda- >>codec.c index b8b37f082309..0d8437b080bf 100644 >>--- a/sound/soc/sof/intel/hda-codec.c >>+++ b/sound/soc/sof/intel/hda-codec.c >>@@ -62,8 +62,7 @@ static int hda_codec_probe(struct snd_sof_dev *sdev, >>int address) >> address, resp); >> >> #if IS_ENABLED(CONFIG_SND_SOC_SOF_HDA_AUDIO_CODEC) >>- /* snd_hdac_ext_bus_device_exit will use kfree to free hdev */ >>- hda_priv = kzalloc(sizeof(*hda_priv), GFP_KERNEL); >>+ hda_priv = devm_kzalloc(sdev->dev, sizeof(*hda_priv), GFP_KERNEL); >> if (!hda_priv) >> return -ENOMEM; >> >>@@ -82,8 +81,7 @@ static int hda_codec_probe(struct snd_sof_dev *sdev, >>int address) >> >> return 0; >> #else >>- /* snd_hdac_ext_bus_device_exit will use kfree to free hdev */ >>- hdev = kzalloc(sizeof(*hdev), GFP_KERNEL); >>+ hdev = devm_kzalloc(sdev->dev, sizeof(*hdev), GFP_KERNEL); >> if (!hdev) >> return -ENOMEM; >> >>-- >>2.20.1
On Thu, 23 May 2019 10:03:03 +0200, Yang, Libin wrote: > > Please let me describe the issue here. > > The test case is: > 1) Unload module with script "sudo ./sof_remove.sh" , > 2) reload module with script "sudo ./sof_insert.sh" > > After several rounds of removing and inserting kernel modules, > system will complain like below: > "BUG: unable to handle kernel paging request at 000000292a282031" Did you try some kernel debug options? It might show what went wrong. Takashi
Hi Takashi, >-----Original Message----- >From: Takashi Iwai [mailto:tiwai@suse.de] >Sent: Thursday, May 23, 2019 4:16 PM >To: Yang, Libin <libin.yang@intel.com> >Cc: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>; alsa- >devel@alsa-project.org; broonie@kernel.org >Subject: Re: [alsa-devel] [PATCH v2 12/12] ASoC: SOF: Intel: hda-codec: fix >memory allocation > >On Thu, 23 May 2019 10:03:03 +0200, >Yang, Libin wrote: >> >> Please let me describe the issue here. >> >> The test case is: >> 1) Unload module with script "sudo ./sof_remove.sh" , >> 2) reload module with script "sudo ./sof_insert.sh" >> >> After several rounds of removing and inserting kernel modules, system >> will complain like below: >> "BUG: unable to handle kernel paging request at 000000292a282031" > >Did you try some kernel debug options? It might show what went wrong. No, I haven't. I'm not sure which options I can use for this case. Could you please give me some suggestions? Regards, Libin > > >Takashi
Hi Takashi, >>-----Original Message----- >>From: Takashi Iwai [mailto:tiwai@suse.de] >>Sent: Thursday, May 23, 2019 4:16 PM >>To: Yang, Libin <libin.yang@intel.com> >>Cc: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>; alsa- >>devel@alsa-project.org; broonie@kernel.org >>Subject: Re: [alsa-devel] [PATCH v2 12/12] ASoC: SOF: Intel: hda-codec: >>fix memory allocation >> >>On Thu, 23 May 2019 10:03:03 +0200, >>Yang, Libin wrote: >>> >>> Please let me describe the issue here. >>> >>> The test case is: >>> 1) Unload module with script "sudo ./sof_remove.sh" , >>> 2) reload module with script "sudo ./sof_insert.sh" >>> >>> After several rounds of removing and inserting kernel modules, system >>> will complain like below: >>> "BUG: unable to handle kernel paging request at 000000292a282031" >> >>Did you try some kernel debug options? It might show what went wrong. > >No, I haven't. I'm not sure which options I can use for this case. Could you >please give me some suggestions? BTW: I have enabled DEBUG_DEVRES. Are there any other options I can try? Regards, Libin > >Regards, >Libin > >> >> >>Takashi
On Thu, 23 May 2019 10:21:20 +0200, Yang, Libin wrote: > > Hi Takashi, > > >-----Original Message----- > >From: Takashi Iwai [mailto:tiwai@suse.de] > >Sent: Thursday, May 23, 2019 4:16 PM > >To: Yang, Libin <libin.yang@intel.com> > >Cc: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>; alsa- > >devel@alsa-project.org; broonie@kernel.org > >Subject: Re: [alsa-devel] [PATCH v2 12/12] ASoC: SOF: Intel: hda-codec: fix > >memory allocation > > > >On Thu, 23 May 2019 10:03:03 +0200, > >Yang, Libin wrote: > >> > >> Please let me describe the issue here. > >> > >> The test case is: > >> 1) Unload module with script "sudo ./sof_remove.sh" , > >> 2) reload module with script "sudo ./sof_insert.sh" > >> > >> After several rounds of removing and inserting kernel modules, system > >> will complain like below: > >> "BUG: unable to handle kernel paging request at 000000292a282031" > > > >Did you try some kernel debug options? It might show what went wrong. > > No, I haven't. I'm not sure which options I can use for this case. Could you > please give me some suggestions? You can enable CONFIG_DEBUG_DEVRES and adjust the devres.log option for showing each devres allocation and removal. And I'd try CONFIG_DEBUG_SLAB and CONFIG_DEBUG_KMEMLEAK or whatever interesting in CONFIG_DEBUG section, too. Takashi
>-----Original Message----- >From: Takashi Iwai [mailto:tiwai@suse.de] >Sent: Thursday, May 23, 2019 4:27 PM >To: Yang, Libin <libin.yang@intel.com> >Cc: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>; alsa- >devel@alsa-project.org; broonie@kernel.org >Subject: Re: [alsa-devel] [PATCH v2 12/12] ASoC: SOF: Intel: hda-codec: fix >memory allocation > >On Thu, 23 May 2019 10:21:20 +0200, >Yang, Libin wrote: >> >> Hi Takashi, >> >> >-----Original Message----- >> >From: Takashi Iwai [mailto:tiwai@suse.de] >> >Sent: Thursday, May 23, 2019 4:16 PM >> >To: Yang, Libin <libin.yang@intel.com> >> >Cc: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>; >> >alsa- devel@alsa-project.org; broonie@kernel.org >> >Subject: Re: [alsa-devel] [PATCH v2 12/12] ASoC: SOF: Intel: >> >hda-codec: fix memory allocation >> > >> >On Thu, 23 May 2019 10:03:03 +0200, >> >Yang, Libin wrote: >> >> >> >> Please let me describe the issue here. >> >> >> >> The test case is: >> >> 1) Unload module with script "sudo ./sof_remove.sh" , >> >> 2) reload module with script "sudo ./sof_insert.sh" >> >> >> >> After several rounds of removing and inserting kernel modules, >> >> system will complain like below: >> >> "BUG: unable to handle kernel paging request at 000000292a282031" >> > >> >Did you try some kernel debug options? It might show what went wrong. >> >> No, I haven't. I'm not sure which options I can use for this case. >> Could you please give me some suggestions? > >You can enable CONFIG_DEBUG_DEVRES and adjust the devres.log option for >showing each devres allocation and removal. And I'd try >CONFIG_DEBUG_SLAB and CONFIG_DEBUG_KMEMLEAK or whatever >interesting in CONFIG_DEBUG section, too. Thanks. I will have a try. Regards, Libin > > >Takashi
On 5/23/19 3:24 AM, Yang, Libin wrote: > Hi Takashi, > > >>> -----Original Message----- >>> From: Takashi Iwai [mailto:tiwai@suse.de] >>> Sent: Thursday, May 23, 2019 4:16 PM >>> To: Yang, Libin <libin.yang@intel.com> >>> Cc: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>; alsa- >>> devel@alsa-project.org; broonie@kernel.org >>> Subject: Re: [alsa-devel] [PATCH v2 12/12] ASoC: SOF: Intel: hda-codec: >>> fix memory allocation >>> >>> On Thu, 23 May 2019 10:03:03 +0200, >>> Yang, Libin wrote: >>>> >>>> Please let me describe the issue here. >>>> >>>> The test case is: >>>> 1) Unload module with script "sudo ./sof_remove.sh" , >>>> 2) reload module with script "sudo ./sof_insert.sh" >>>> >>>> After several rounds of removing and inserting kernel modules, system >>>> will complain like below: >>>> "BUG: unable to handle kernel paging request at 000000292a282031" >>> >>> Did you try some kernel debug options? It might show what went wrong. >> >> No, I haven't. I'm not sure which options I can use for this case. Could you >> please give me some suggestions? > > BTW: I have enabled DEBUG_DEVRES. Are there any other options I can try? There are already a set of kconfig fragments for debug, see https://github.com/thesofproject/kconfig and select memory-debug-defconfig. I guess I will need to require this test in the SOF CI, I really don't get how this issue was not seen earlier. Gah.
Hi Peirre, >>>> >>>> On Thu, 23 May 2019 10:03:03 +0200, >>>> Yang, Libin wrote: >>>>> >>>>> Please let me describe the issue here. >>>>> >>>>> The test case is: >>>>> 1) Unload module with script "sudo ./sof_remove.sh" , >>>>> 2) reload module with script "sudo ./sof_insert.sh" >>>>> >>>>> After several rounds of removing and inserting kernel modules, >>>>> system will complain like below: >>>>> "BUG: unable to handle kernel paging request at 000000292a282031" >>>> >>>> Did you try some kernel debug options? It might show what went wrong. >>> >>> No, I haven't. I'm not sure which options I can use for this case. >>> Could you please give me some suggestions? >> >> BTW: I have enabled DEBUG_DEVRES. Are there any other options I can try? > >There are already a set of kconfig fragments for debug, see >https://github.com/thesofproject/kconfig and select memory-debug-defconfig. > >I guess I will need to require this test in the SOF CI, I really don't get how this >issue was not seen earlier. Gah. This bug can't be reproduced easily sometimes. Sometimes, the bug will be hit at the second or third round of the test. And sometimes, we need do more rounds to reproduce this bug. This may be the reason we didn't hit this bug before. Regards, Libin
Hi Takashi, >-----Original Message----- >From: Takashi Iwai [mailto:tiwai@suse.de] >Sent: Thursday, May 23, 2019 4:27 PM >To: Yang, Libin <libin.yang@intel.com> >Cc: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>; alsa- >devel@alsa-project.org; broonie@kernel.org >Subject: Re: [alsa-devel] [PATCH v2 12/12] ASoC: SOF: Intel: hda-codec: fix >memory allocation > >On Thu, 23 May 2019 10:21:20 +0200, >Yang, Libin wrote: >> >> Hi Takashi, >> >> >-----Original Message----- >> >From: Takashi Iwai [mailto:tiwai@suse.de] >> >Sent: Thursday, May 23, 2019 4:16 PM >> >To: Yang, Libin <libin.yang@intel.com> >> >Cc: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>; >> >alsa- devel@alsa-project.org; broonie@kernel.org >> >Subject: Re: [alsa-devel] [PATCH v2 12/12] ASoC: SOF: Intel: >> >hda-codec: fix memory allocation >> > >> >On Thu, 23 May 2019 10:03:03 +0200, >> >Yang, Libin wrote: >> >> >> >> Please let me describe the issue here. >> >> >> >> The test case is: >> >> 1) Unload module with script "sudo ./sof_remove.sh" , >> >> 2) reload module with script "sudo ./sof_insert.sh" >> >> >> >> After several rounds of removing and inserting kernel modules, >> >> system will complain like below: >> >> "BUG: unable to handle kernel paging request at 000000292a282031" >> > >> >Did you try some kernel debug options? It might show what went wrong. >> >> No, I haven't. I'm not sure which options I can use for this case. >> Could you please give me some suggestions? > >You can enable CONFIG_DEBUG_DEVRES and adjust the devres.log option for >showing each devres allocation and removal. And I'd try >CONFIG_DEBUG_SLAB and CONFIG_DEBUG_KMEMLEAK or whatever >interesting in CONFIG_DEBUG section, too. Thanks for your suggestion. After more than 1 week debug, I think maybe I have root caused this issue from the devres.log message. Below is my finding. 1. When initialing the codecs, snd_hdac_ext_bus_device_init() will be called, and it will set hdev->dev.release = default_release. However, for analog codec (not hdac_hdmi codec), hdac_hda_codec_probe() will be called later. And it will call snd_hda_codec_device_new(), which will reset codec->code.dev.release = snd_hda_codec_dev_release; This means hdac_hdmi's hdev dev release is default_release() defined in hdac_ext_bus.c, and other hda codec's hdev dev release is snd_hda_codec_dev_release(). Both default_release() and snd_hda_codec_dev_release() will call kfree() to free the hdac_device (or its container) in the current code. 2. When we run rmmod sof_pci_dev, it will free the sdev. If we use Struct hdac_device *hdev = devm_kzalloc(sdev->dev...). This means hdev will also be freed automatically. In the removal, snd_hdac_ext_bus_device_remove() will be called to remove the hdev (in this function it is struct hdac_device *codec. The name is not aligned in different places). However for hdac_hdmi, the hdev->dev is used by other code. So calling device.release() (the function default_release()) will be postponed. After after sdev is freed, the device.release() will be called. But for devm_xxx, hdev will also be freed when sdev is freed. This means hdev.dev after sdev is freed is invalid now as hdev has already freed. It will access invalid memory. This will cause the bug. So I think we should not use devm_xxx, and let's free the hdev manually. At the end of this topic, I still found 2 suspicious code in the current code. 1. in sound/soc/intel/skylate/skl.c it calls hdev = devm_kazlloc() or hda_codec = devm_kzalloc(). As we will call kfree() in the current code, should we replace it with kzalloc()? Maybe we need cavs drivers owner's help on it. 2. in snd_hdac_ext_bus_device_remove() It will call snd_hdac_device_unregister() to unregister the hdac_devices and put_device(&codec->dev) to release the dev. For analog codec, snd_hdac_device_unregister() will free the codec->dev's kobject. And snd_hda_codec_dev_release() will be called to free the hdac_device. So it is invalid to call put_device(&codec->dev). If you print refcound_read(&(codec->dev.kobj.kref.refcount)) for analog codec before put_device(), you will find the refcount has already been 0. Regards, Libin > > >Takashi
Hi Libin, >>>>> Please let me describe the issue here. >>>>> >>>>> The test case is: >>>>> 1) Unload module with script "sudo ./sof_remove.sh" , >>>>> 2) reload module with script "sudo ./sof_insert.sh" >>>>> >>>>> After several rounds of removing and inserting kernel modules, >>>>> system will complain like below: >>>>> "BUG: unable to handle kernel paging request at 000000292a282031" >>>> >>>> Did you try some kernel debug options? It might show what went wrong. >>> >>> No, I haven't. I'm not sure which options I can use for this case. >>> Could you please give me some suggestions? >> >> You can enable CONFIG_DEBUG_DEVRES and adjust the devres.log option for >> showing each devres allocation and removal. And I'd try >> CONFIG_DEBUG_SLAB and CONFIG_DEBUG_KMEMLEAK or whatever >> interesting in CONFIG_DEBUG section, too. > > Thanks for your suggestion. After more than 1 week debug, I think maybe > I have root caused this issue from the devres.log message. > > Below is my finding. > 1. When initialing the codecs, snd_hdac_ext_bus_device_init() will be called, > and it will set hdev->dev.release = default_release. > However, for analog codec (not hdac_hdmi codec), hdac_hda_codec_probe() > will be called later. And it will call snd_hda_codec_device_new(), which will > reset codec->code.dev.release = snd_hda_codec_dev_release; > This means hdac_hdmi's hdev dev release is default_release() defined in > hdac_ext_bus.c, and other hda codec's hdev dev release is > snd_hda_codec_dev_release(). > > Both default_release() and snd_hda_codec_dev_release() will call kfree() > to free the hdac_device (or its container) in the current code. > > 2. When we run rmmod sof_pci_dev, it will free the sdev. If we use > Struct hdac_device *hdev = devm_kzalloc(sdev->dev...). This means > hdev will also be freed automatically. > > In the removal, snd_hdac_ext_bus_device_remove() will be called > to remove the hdev (in this function it is struct hdac_device *codec. > The name is not aligned in different places). > However for hdac_hdmi, the hdev->dev is used by other code. what other code? Can you elaborate on why the release is delayed? > So calling device.release() (the function default_release()) will > be postponed. After after sdev is freed, the device.release() will > be called. But for devm_xxx, hdev will also be freed when sdev is > freed. This means hdev.dev after sdev is freed is invalid now as > hdev has already freed. It will access invalid memory. This will cause the bug. This is very hard to follow. 4 lines above you wrote the release is postponed but the way you describe is looks completely sequential. > > So I think we should not use devm_xxx, and let's free the hdev manually. > > At the end of this topic, I still found 2 suspicious code in the current code. > 1. in sound/soc/intel/skylate/skl.c > it calls hdev = devm_kazlloc() or hda_codec = devm_kzalloc(). > As we will call kfree() in the current code, should we replace it with > kzalloc()? Maybe we need cavs drivers owner's help on it. maybe you should send a diff suggestion to help everyone understand the changes you are referring to? > > 2. in snd_hdac_ext_bus_device_remove() > It will call snd_hdac_device_unregister() to unregister the hdac_devices > and put_device(&codec->dev) to release the dev. > For analog codec, snd_hdac_device_unregister() will free the codec->dev's > kobject. And snd_hda_codec_dev_release() will be called to free the > hdac_device. > So it is invalid to call put_device(&codec->dev). If you print > refcound_read(&(codec->dev.kobj.kref.refcount)) for analog codec before > put_device(), you will find the refcount has already been 0. Isn't it a different problem though? Does this cause a freeze or is this just a bad refcount?
>>> >On Thu, 23 May 2019 10:03:03 +0200, >>> >Yang, Libin wrote: >>> >> >>> >> Please let me describe the issue here. >>> >> >>> >> The test case is: >>> >> 1) Unload module with script "sudo ./sof_remove.sh" , >>> >> 2) reload module with script "sudo ./sof_insert.sh" >>> >> >>> >> After several rounds of removing and inserting kernel modules, >>> >> system will complain like below: >>> >> "BUG: unable to handle kernel paging request at 000000292a282031" >>> > >>> >Did you try some kernel debug options? It might show what went wrong. >>> >>> No, I haven't. I'm not sure which options I can use for this case. >>> Could you please give me some suggestions? >> >>You can enable CONFIG_DEBUG_DEVRES and adjust the devres.log option >for >>showing each devres allocation and removal. And I'd try >>CONFIG_DEBUG_SLAB and CONFIG_DEBUG_KMEMLEAK or whatever >interesting in >>CONFIG_DEBUG section, too. > >Thanks for your suggestion. After more than 1 week debug, I think maybe I >have root caused this issue from the devres.log message. > >Below is my finding. >1. When initialing the codecs, snd_hdac_ext_bus_device_init() will be called, >and it will set hdev->dev.release = default_release. >However, for analog codec (not hdac_hdmi codec), hdac_hda_codec_probe() >will be called later. And it will call snd_hda_codec_device_new(), which will >reset codec->code.dev.release = snd_hda_codec_dev_release; This means >hdac_hdmi's hdev dev release is default_release() defined in hdac_ext_bus.c, >and other hda codec's hdev dev release is snd_hda_codec_dev_release(). > >Both default_release() and snd_hda_codec_dev_release() will call kfree() to >free the hdac_device (or its container) in the current code. > >2. When we run rmmod sof_pci_dev, it will free the sdev. If we use Struct >hdac_device *hdev = devm_kzalloc(sdev->dev...). This means hdev will also be >freed automatically. > >In the removal, snd_hdac_ext_bus_device_remove() will be called to remove >the hdev (in this function it is struct hdac_device *codec. >The name is not aligned in different places). >However for hdac_hdmi, the hdev->dev is used by other code. >So calling device.release() (the function default_release()) will be postponed. >After after sdev is freed, the device.release() will be called. But for devm_xxx, >hdev will also be freed when sdev is freed. This means hdev.dev after sdev is >freed is invalid now as hdev has already freed. It will access invalid memory. >This will cause the bug. > >So I think we should not use devm_xxx, and let's free the hdev manually. > >At the end of this topic, I still found 2 suspicious code in the current code. >1. in sound/soc/intel/skylate/skl.c >it calls hdev = devm_kazlloc() or hda_codec = devm_kzalloc(). >As we will call kfree() in the current code, should we replace it with kzalloc()? >Maybe we need cavs drivers owner's help on it. > >2. in snd_hdac_ext_bus_device_remove() >It will call snd_hdac_device_unregister() to unregister the hdac_devices and >put_device(&codec->dev) to release the dev. >For analog codec, snd_hdac_device_unregister() will free the codec->dev's >kobject. And snd_hda_codec_dev_release() will be called to free the >hdac_device. >So it is invalid to call put_device(&codec->dev). If you print >refcound_read(&(codec->dev.kobj.kref.refcount)) for analog codec before >put_device(), you will find the refcount has already been 0. It seems Ranjani's " ASoC: hda: increment codec device refcount when it is added to the card" patch can fix the second issue. Regards, Libin
Hi Pierre, >Hi Libin, > >>>>>> Please let me describe the issue here. >>>>>> >>>>>> The test case is: >>>>>> 1) Unload module with script "sudo ./sof_remove.sh" , >>>>>> 2) reload module with script "sudo ./sof_insert.sh" >>>>>> >>>>>> After several rounds of removing and inserting kernel modules, >>>>>> system will complain like below: >>>>>> "BUG: unable to handle kernel paging request at 000000292a282031" >>>>> >>>>> Did you try some kernel debug options? It might show what went wrong. >>>> >>>> No, I haven't. I'm not sure which options I can use for this case. >>>> Could you please give me some suggestions? >>> >>> You can enable CONFIG_DEBUG_DEVRES and adjust the devres.log option >>> for showing each devres allocation and removal. And I'd try >>> CONFIG_DEBUG_SLAB and CONFIG_DEBUG_KMEMLEAK or whatever >interesting >>> in CONFIG_DEBUG section, too. >> >> Thanks for your suggestion. After more than 1 week debug, I think >> maybe I have root caused this issue from the devres.log message. >> >> Below is my finding. >> 1. When initialing the codecs, snd_hdac_ext_bus_device_init() will be >> called, and it will set hdev->dev.release = default_release. >> However, for analog codec (not hdac_hdmi codec), >> hdac_hda_codec_probe() will be called later. And it will call >> snd_hda_codec_device_new(), which will reset codec->code.dev.release = >> snd_hda_codec_dev_release; This means hdac_hdmi's hdev dev release is >> default_release() defined in hdac_ext_bus.c, and other hda codec's >> hdev dev release is snd_hda_codec_dev_release(). >> >> Both default_release() and snd_hda_codec_dev_release() will call >> kfree() to free the hdac_device (or its container) in the current code. >> >> 2. When we run rmmod sof_pci_dev, it will free the sdev. If we use >> Struct hdac_device *hdev = devm_kzalloc(sdev->dev...). This means hdev >> will also be freed automatically. >> >> In the removal, snd_hdac_ext_bus_device_remove() will be called to >> remove the hdev (in this function it is struct hdac_device *codec. >> The name is not aligned in different places). >> However for hdac_hdmi, the hdev->dev is used by other code. > >what other code? Can you elaborate on why the release is delayed? From the dmesg, it is the device_link used for hdac_hdmi that causes the release delay. Device_link will increase the refcount. > >> So calling device.release() (the function default_release()) will be >> postponed. After after sdev is freed, the device.release() will be >> called. But for devm_xxx, hdev will also be freed when sdev is freed. >> This means hdev.dev after sdev is freed is invalid now as hdev has >> already freed. It will access invalid memory. This will cause the bug. > >This is very hard to follow. 4 lines above you wrote the release is postponed >but the way you describe is looks completely sequential. For hdac_hdmi, as it is used by other code (device_link), the device release will not be called immediately. After sdev and hdev is freed, the hdev device is put by device_link. Then hdev device can be released. But hdev has already be freed. This is wrong. > >> >> So I think we should not use devm_xxx, and let's free the hdev manually. >> >> At the end of this topic, I still found 2 suspicious code in the current code. >> 1. in sound/soc/intel/skylate/skl.c >> it calls hdev = devm_kazlloc() or hda_codec = devm_kzalloc(). >> As we will call kfree() in the current code, should we replace it with >> kzalloc()? Maybe we need cavs drivers owner's help on it. > >maybe you should send a diff suggestion to help everyone understand the >changes you are referring to? Please see my inline patch: diff --git a/sound/soc/intel/skylake/skl.c b/sound/soc/intel/skylake/skl.c index f864f7b..12af99a 100644 --- a/sound/soc/intel/skylake/skl.c +++ b/sound/soc/intel/skylake/skl.c @@ -697,8 +697,7 @@ static int probe_codec(struct hdac_bus *bus, int addr) dev_dbg(bus->dev, "codec #%d probed OK: %x\n", addr, res); #if IS_ENABLED(CONFIG_SND_SOC_INTEL_SKYLAKE_HDAUDIO_CODEC) - hda_codec = devm_kzalloc(&skl->pci->dev, sizeof(*hda_codec), - GFP_KERNEL); + hda_codec = kzalloc(sizeof(*hda_codec), GFP_KERNEL); if (!hda_codec) return -ENOMEM; @@ -716,7 +715,7 @@ static int probe_codec(struct hdac_bus *bus, int addr) } return 0; #else - hdev = devm_kzalloc(&skl->pci->dev, sizeof(*hdev), GFP_KERNEL); + hdev = kzalloc(sizeof(*hdev), GFP_KERNEL); if (!hdev) return -ENOMEM; > >> >> 2. in snd_hdac_ext_bus_device_remove() It will call >> snd_hdac_device_unregister() to unregister the hdac_devices and >> put_device(&codec->dev) to release the dev. >> For analog codec, snd_hdac_device_unregister() will free the >> codec->dev's kobject. And snd_hda_codec_dev_release() will be called >> to free the hdac_device. >> So it is invalid to call put_device(&codec->dev). If you print >> refcound_read(&(codec->dev.kobj.kref.refcount)) for analog codec >> before put_device(), you will find the refcount has already been 0. > >Isn't it a different problem though? Does this cause a freeze or is this just a >bad refcount? Yes, it is a totally different problem. And today morning, I found Ranjani has a patch " ASoC: hda: increment codec device refcount when it is added to the card ", which should fix this bug. Regards, Libin
>-----Original Message----- >From: Yang, Libin >Sent: Monday, June 3, 2019 5:10 PM >To: Takashi Iwai <tiwai@suse.de> >Cc: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>; alsa- >devel@alsa-project.org; broonie@kernel.org >Subject: RE: [alsa-devel] [PATCH v2 12/12] ASoC: SOF: Intel: hda-codec: fix >memory allocation > >Hi Takashi, > >>-----Original Message----- >>From: Takashi Iwai [mailto:tiwai@suse.de] >>Sent: Thursday, May 23, 2019 4:27 PM >>To: Yang, Libin <libin.yang@intel.com> >>Cc: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>; alsa- >>devel@alsa-project.org; broonie@kernel.org >>Subject: Re: [alsa-devel] [PATCH v2 12/12] ASoC: SOF: Intel: hda-codec: >>fix memory allocation >> >>On Thu, 23 May 2019 10:21:20 +0200, >>Yang, Libin wrote: >>> >>> Hi Takashi, >>> >>> >-----Original Message----- >>> >From: Takashi Iwai [mailto:tiwai@suse.de] >>> >Sent: Thursday, May 23, 2019 4:16 PM >>> >To: Yang, Libin <libin.yang@intel.com> >>> >Cc: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>; >>> >alsa- devel@alsa-project.org; broonie@kernel.org >>> >Subject: Re: [alsa-devel] [PATCH v2 12/12] ASoC: SOF: Intel: >>> >hda-codec: fix memory allocation >>> > >>> >On Thu, 23 May 2019 10:03:03 +0200, >>> >Yang, Libin wrote: >>> >> >>> >> Please let me describe the issue here. >>> >> >>> >> The test case is: >>> >> 1) Unload module with script "sudo ./sof_remove.sh" , >>> >> 2) reload module with script "sudo ./sof_insert.sh" >>> >> >>> >> After several rounds of removing and inserting kernel modules, >>> >> system will complain like below: >>> >> "BUG: unable to handle kernel paging request at 000000292a282031" >>> > >>> >Did you try some kernel debug options? It might show what went wrong. >>> >>> No, I haven't. I'm not sure which options I can use for this case. >>> Could you please give me some suggestions? >> >>You can enable CONFIG_DEBUG_DEVRES and adjust the devres.log option >for >>showing each devres allocation and removal. And I'd try >>CONFIG_DEBUG_SLAB and CONFIG_DEBUG_KMEMLEAK or whatever >interesting in >>CONFIG_DEBUG section, too. > >Thanks for your suggestion. After more than 1 week debug, I think maybe I >have root caused this issue from the devres.log message. > >Below is my finding. >1. When initialing the codecs, snd_hdac_ext_bus_device_init() will be called, >and it will set hdev->dev.release = default_release. >However, for analog codec (not hdac_hdmi codec), hdac_hda_codec_probe() >will be called later. And it will call snd_hda_codec_device_new(), which will >reset codec->code.dev.release = snd_hda_codec_dev_release; This means >hdac_hdmi's hdev dev release is default_release() defined in hdac_ext_bus.c, >and other hda codec's hdev dev release is snd_hda_codec_dev_release(). > >Both default_release() and snd_hda_codec_dev_release() will call kfree() to >free the hdac_device (or its container) in the current code. > >2. When we run rmmod sof_pci_dev, it will free the sdev. If we use Struct >hdac_device *hdev = devm_kzalloc(sdev->dev...). This means hdev will also be >freed automatically. > >In the removal, snd_hdac_ext_bus_device_remove() will be called to remove >the hdev (in this function it is struct hdac_device *codec. >The name is not aligned in different places). >However for hdac_hdmi, the hdev->dev is used by other code. >So calling device.release() (the function default_release()) will be postponed. >After after sdev is freed, the device.release() will be called. But for devm_xxx, >hdev will also be freed when sdev is freed. This means hdev.dev after sdev is >freed is invalid now as hdev has already freed. It will access invalid memory. >This will cause the bug. > >So I think we should not use devm_xxx, and let's free the hdev manually. > >At the end of this topic, I still found 2 suspicious code in the current code. >1. in sound/soc/intel/skylate/skl.c >it calls hdev = devm_kazlloc() or hda_codec = devm_kzalloc(). >As we will call kfree() in the current code, should we replace it with kzalloc()? >Maybe we need cavs drivers owner's help on it. > >2. in snd_hdac_ext_bus_device_remove() >It will call snd_hdac_device_unregister() to unregister the hdac_devices and >put_device(&codec->dev) to release the dev. >For analog codec, snd_hdac_device_unregister() will free the codec->dev's >kobject. And snd_hda_codec_dev_release() will be called to free the >hdac_device. >So it is invalid to call put_device(&codec->dev). If you print >refcound_read(&(codec->dev.kobj.kref.refcount)) for analog codec before >put_device(), you will find the refcount has already been 0. I tested on the latest code. The behavior on kernel 5.2.0-rc1 is different from it on kernel 5.0.0 of https://github.com/thesofproject/linux/. (thesofproject kernel has already been upgraded to 5.2.0-rc1 now) On kernel 5.0.0 (https://github.com/thesofproject/linux/), when removing the sof kernel modules, the sequence is: 1. devm_kazlloc_release() to release hdac_device 2. snd_hdac_ext_bus_device_exit(): this function will use hdac_device However, on kernel 5.2.0-rc1, the sequence is: 1. snd_hdac_ext_bus_device_exit() 2. devm_kazlloc_release() to release hdac_device So I think it is safe now to use devm_xxx to allocate the hdac_device. Also, Ranjani found a duplicated release of hdac_device. She has a patch to fix it. Regards, Libin > >Regards, >Libin > >> >> >>Takashi
diff --git a/sound/soc/sof/intel/hda-codec.c b/sound/soc/sof/intel/hda-codec.c index b8b37f082309..0d8437b080bf 100644 --- a/sound/soc/sof/intel/hda-codec.c +++ b/sound/soc/sof/intel/hda-codec.c @@ -62,8 +62,7 @@ static int hda_codec_probe(struct snd_sof_dev *sdev, int address) address, resp); #if IS_ENABLED(CONFIG_SND_SOC_SOF_HDA_AUDIO_CODEC) - /* snd_hdac_ext_bus_device_exit will use kfree to free hdev */ - hda_priv = kzalloc(sizeof(*hda_priv), GFP_KERNEL); + hda_priv = devm_kzalloc(sdev->dev, sizeof(*hda_priv), GFP_KERNEL); if (!hda_priv) return -ENOMEM; @@ -82,8 +81,7 @@ static int hda_codec_probe(struct snd_sof_dev *sdev, int address) return 0; #else - /* snd_hdac_ext_bus_device_exit will use kfree to free hdev */ - hdev = kzalloc(sizeof(*hdev), GFP_KERNEL); + hdev = devm_kzalloc(sdev->dev, sizeof(*hdev), GFP_KERNEL); if (!hdev) return -ENOMEM;