diff mbox series

[2/2] hda/i915: split the wait for the component binding

Message ID 20220223195203.13708-3-ramalingam.c@intel.com (mailing list archive)
State New, archived
Headers show
Series hda/i915: split wait for component binding | expand

Commit Message

Ramalingam C Feb. 23, 2022, 7:52 p.m. UTC
Split the wait for component binding from i915 in multiples of
sysctl_hung_task_timeout_secs. This helps to avoid the possible kworker
thread hung detection given below.

<3>[   60.946316] INFO: task kworker/11:1:104 blocked for more than 30
seconds.
<3>[   60.946479]       Tainted: G        W
5.17.0-rc5-CI-CI_DRM_11265+ #1
<3>[   60.946580] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
<6>[   60.946688] task:kworker/11:1    state:D stack:14192 pid:  104
ppid:     2 flags:0x00004000
<6>[   60.946713] Workqueue: events azx_probe_work [snd_hda_intel]
<6>[   60.946740] Call Trace:
<6>[   60.946745]  <TASK>
<6>[   60.946763]  __schedule+0x42c/0xa80
<6>[   60.946797]  schedule+0x3f/0xc0
<6>[   60.946811]  schedule_timeout+0x1be/0x2e0
<6>[   60.946829]  ? del_timer_sync+0xb0/0xb0
<6>[   60.946849]  ? 0xffffffff81000000
<6>[   60.946864]  ? wait_for_completion_timeout+0x79/0x120
<6>[   60.946879]  wait_for_completion_timeout+0xab/0x120
<6>[   60.946906]  snd_hdac_i915_init+0xa5/0xb0 [snd_hda_core]
<6>[   60.946943]  azx_probe_work+0x71/0x84c [snd_hda_intel]
<6>[   60.946974]  process_one_work+0x275/0x5c0
<6>[   60.947010]  worker_thread+0x37/0x370
<6>[   60.947028]  ? process_one_work+0x5c0/0x5c0
<6>[   60.947038]  kthread+0xef/0x120
<6>[   60.947047]  ? kthread_complete_and_exit+0x20/0x20
<6>[   60.947065]  ret_from_fork+0x22/0x30
<6>[   60.947110]  </TASK>

Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
cc: Kai Vehmanen <kai.vehmanen@intel.com>
cc: Lucas De Marchi <lucas.demarchi@intel.com>
---
 sound/hda/hdac_i915.c | 17 ++++++++++++++---
 1 file changed, 14 insertions(+), 3 deletions(-)

Comments

Kai Vehmanen Feb. 24, 2022, 4:34 p.m. UTC | #1
Hi,

On Thu, 24 Feb 2022, Ramalingam C wrote:

> Split the wait for component binding from i915 in multiples of
> sysctl_hung_task_timeout_secs. This helps to avoid the possible kworker
> thread hung detection given below.

while I understand the problem, I'm not sure whether a simpler option
should be chosen. Maybe just split the wait_for_completion_timeout()
into small 5sec iterations, without consulting value of hung_task_timeout.
This would seem unligned with more mainstream use of 
wait_for_completion_timeout() in kernel and still do the job.

I'll loop in Takashi here as well. Basicly the 60 sec timeout in 
hda/hdac_i915.c is getting caught by hung_task_detection logic in builds
where the hung_task_timeout is below 60secs.

I have a patch that tries to avoid hitting the timeout in some of the more 
common cases:
"ALSA: hda/i915 - skip acomp init if no matching display"
https://lists.freedesktop.org/archives/intel-gfx-trybot/2022-February/128278.html
... but we'll still be stuck with some configurations where the timeout 
will be hit. And above needs careful testing.

One logic comment below as well, but I'll quote the whole patch to give
context to Takashi.
 
> <3>[   60.946316] INFO: task kworker/11:1:104 blocked for more than 30
> seconds.
> <3>[   60.946479]       Tainted: G        W
> 5.17.0-rc5-CI-CI_DRM_11265+ #1
> <3>[   60.946580] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> disables this message.
> <6>[   60.946688] task:kworker/11:1    state:D stack:14192 pid:  104
> ppid:     2 flags:0x00004000
> <6>[   60.946713] Workqueue: events azx_probe_work [snd_hda_intel]
> <6>[   60.946740] Call Trace:
> <6>[   60.946745]  <TASK>
> <6>[   60.946763]  __schedule+0x42c/0xa80
> <6>[   60.946797]  schedule+0x3f/0xc0
> <6>[   60.946811]  schedule_timeout+0x1be/0x2e0
> <6>[   60.946829]  ? del_timer_sync+0xb0/0xb0
> <6>[   60.946849]  ? 0xffffffff81000000
> <6>[   60.946864]  ? wait_for_completion_timeout+0x79/0x120
> <6>[   60.946879]  wait_for_completion_timeout+0xab/0x120
> <6>[   60.946906]  snd_hdac_i915_init+0xa5/0xb0 [snd_hda_core]
> <6>[   60.946943]  azx_probe_work+0x71/0x84c [snd_hda_intel]
> <6>[   60.946974]  process_one_work+0x275/0x5c0
> <6>[   60.947010]  worker_thread+0x37/0x370
> <6>[   60.947028]  ? process_one_work+0x5c0/0x5c0
> <6>[   60.947038]  kthread+0xef/0x120
> <6>[   60.947047]  ? kthread_complete_and_exit+0x20/0x20
> <6>[   60.947065]  ret_from_fork+0x22/0x30
> <6>[   60.947110]  </TASK>
> 
> Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
> cc: Kai Vehmanen <kai.vehmanen@intel.com>
> cc: Lucas De Marchi <lucas.demarchi@intel.com>
> ---
>  sound/hda/hdac_i915.c | 17 ++++++++++++++---
>  1 file changed, 14 insertions(+), 3 deletions(-)
> 
> diff --git a/sound/hda/hdac_i915.c b/sound/hda/hdac_i915.c
> index d20a450a9a15..daaeebc5099e 100644
> --- a/sound/hda/hdac_i915.c
> +++ b/sound/hda/hdac_i915.c
> @@ -6,6 +6,7 @@
>  #include <linux/init.h>
>  #include <linux/module.h>
>  #include <linux/pci.h>
> +#include <linux/sched/sysctl.h>
>  #include <sound/core.h>
>  #include <sound/hdaudio.h>
>  #include <sound/hda_i915.h>
> @@ -163,7 +164,8 @@ static bool dg1_gfx_present(void)
>  int snd_hdac_i915_init(struct hdac_bus *bus)
>  {
>  	struct drm_audio_component *acomp;
> -	int err;
> +	unsigned long timeout, ret = 0;
> +	int err, i, itr_cnt;
>  
>  	if (!i915_gfx_present())
>  		return -ENODEV;
> @@ -182,9 +184,18 @@ int snd_hdac_i915_init(struct hdac_bus *bus)
>  	if (!acomp->ops) {
>  		if (!IS_ENABLED(CONFIG_MODULES) ||
>  		    !request_module("i915")) {
5~> +			if (!sysctl_hung_task_timeout_secs) {
> +				itr_cnt = 1;
> +				timeout = msecs_to_jiffies(60 * 1000);
> +			} else {
> +				itr_cnt = DIV_ROUND_UP(60, sysctl_hung_task_timeout_secs);
> +				timeout = msecs_to_jiffies(sysctl_hung_task_timeout_secs * 1000);
> +			}
> +
>  			/* 60s timeout */
> -			wait_for_completion_timeout(&acomp->master_bind_complete,
> -						    msecs_to_jiffies(30 * 1000));
> +			for (i = 0; i < itr_cnt || !ret; i++)
> +				ret = wait_for_completion_timeout(&acomp->master_bind_complete,
> +								  timeout);

I think that should be 'i < itr_cnt && !ret'. If wait_for_completion
returns with a positive value, we should stop waiting as the completion
has been signalled.

>  		}
>  	}
>  	if (!acomp->ops) {
> -- 

Br, Kai
Lucas De Marchi Feb. 24, 2022, 5:13 p.m. UTC | #2
On Thu, Feb 24, 2022 at 01:22:03AM +0530, Ramalingam C wrote:
>Split the wait for component binding from i915 in multiples of
>sysctl_hung_task_timeout_secs. This helps to avoid the possible kworker
>thread hung detection given below.
>
><3>[   60.946316] INFO: task kworker/11:1:104 blocked for more than 30
>seconds.
><3>[   60.946479]       Tainted: G        W
>5.17.0-rc5-CI-CI_DRM_11265+ #1
><3>[   60.946580] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
>disables this message.
><6>[   60.946688] task:kworker/11:1    state:D stack:14192 pid:  104
>ppid:     2 flags:0x00004000
><6>[   60.946713] Workqueue: events azx_probe_work [snd_hda_intel]
><6>[   60.946740] Call Trace:
><6>[   60.946745]  <TASK>
><6>[   60.946763]  __schedule+0x42c/0xa80
><6>[   60.946797]  schedule+0x3f/0xc0
><6>[   60.946811]  schedule_timeout+0x1be/0x2e0
><6>[   60.946829]  ? del_timer_sync+0xb0/0xb0
><6>[   60.946849]  ? 0xffffffff81000000
><6>[   60.946864]  ? wait_for_completion_timeout+0x79/0x120
><6>[   60.946879]  wait_for_completion_timeout+0xab/0x120
><6>[   60.946906]  snd_hdac_i915_init+0xa5/0xb0 [snd_hda_core]
><6>[   60.946943]  azx_probe_work+0x71/0x84c [snd_hda_intel]
><6>[   60.946974]  process_one_work+0x275/0x5c0
><6>[   60.947010]  worker_thread+0x37/0x370
><6>[   60.947028]  ? process_one_work+0x5c0/0x5c0
><6>[   60.947038]  kthread+0xef/0x120
><6>[   60.947047]  ? kthread_complete_and_exit+0x20/0x20
><6>[   60.947065]  ret_from_fork+0x22/0x30
><6>[   60.947110]  </TASK>
>
>Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
>cc: Kai Vehmanen <kai.vehmanen@intel.com>
>cc: Lucas De Marchi <lucas.demarchi@intel.com>

some more Cc needed?

$ ./scripts/get_maintainer.pl sound/hda/hdac_i915.c
Jaroslav Kysela <perex@perex.cz> (maintainer:SOUND)
Takashi Iwai <tiwai@suse.com> (maintainer:SOUND)
Lucas De Marchi <lucas.demarchi@intel.com> (commit_signer:2/2=100%)
Rodrigo Vivi <rodrigo.vivi@intel.com> (commit_signer:1/2=50%)
Ramalingam C <ramalingam.c@intel.com> (commit_signer:1/2=50%,authored:1/2=50%,removed_lines:1/1=100%)
Chris Wilson <chris@chris-wilson.co.uk> (authored:1/2=50%,added_lines:23/24=96%)
alsa-devel@alsa-project.org (moderated list:SOUND)
linux-kernel@vger.kernel.org (open list)


>---
> sound/hda/hdac_i915.c | 17 ++++++++++++++---
> 1 file changed, 14 insertions(+), 3 deletions(-)
>
>diff --git a/sound/hda/hdac_i915.c b/sound/hda/hdac_i915.c
>index d20a450a9a15..daaeebc5099e 100644
>--- a/sound/hda/hdac_i915.c
>+++ b/sound/hda/hdac_i915.c
>@@ -6,6 +6,7 @@
> #include <linux/init.h>
> #include <linux/module.h>
> #include <linux/pci.h>
>+#include <linux/sched/sysctl.h>
> #include <sound/core.h>
> #include <sound/hdaudio.h>
> #include <sound/hda_i915.h>
>@@ -163,7 +164,8 @@ static bool dg1_gfx_present(void)
> int snd_hdac_i915_init(struct hdac_bus *bus)
> {
> 	struct drm_audio_component *acomp;
>-	int err;
>+	unsigned long timeout, ret = 0;
>+	int err, i, itr_cnt;
>
> 	if (!i915_gfx_present())
> 		return -ENODEV;
>@@ -182,9 +184,18 @@ int snd_hdac_i915_init(struct hdac_bus *bus)
> 	if (!acomp->ops) {
> 		if (!IS_ENABLED(CONFIG_MODULES) ||
> 		    !request_module("i915")) {
>+			if (!sysctl_hung_task_timeout_secs) {
>+				itr_cnt = 1;
>+				timeout = msecs_to_jiffies(60 * 1000);
>+			} else {
>+				itr_cnt = DIV_ROUND_UP(60, sysctl_hung_task_timeout_secs);
>+				timeout = msecs_to_jiffies(sysctl_hung_task_timeout_secs * 1000);
>+			}
>+
> 			/* 60s timeout */
>-			wait_for_completion_timeout(&acomp->master_bind_complete,
>-						    msecs_to_jiffies(30 * 1000));
>+			for (i = 0; i < itr_cnt || !ret; i++)

it will return 0 if a timeout occurs so it's trapped forever here? Did
you mean &&?

Also maybe:

	itr_cnt = DIV_ROUND_UP(60, sysctl_hung_task_timeout_secs ?: 60);

Lucas De Marchi

} while ();

>+				ret = wait_for_completion_timeout(&acomp->master_bind_complete,
>+								  timeout);
> 		}
> 	}
> 	if (!acomp->ops) {
>-- 
>2.20.1
>
Takashi Iwai Feb. 25, 2022, 11:42 a.m. UTC | #3
On Thu, 24 Feb 2022 17:34:54 +0100,
Kai Vehmanen wrote:
> 
> Hi,
> 
> On Thu, 24 Feb 2022, Ramalingam C wrote:
> 
> > Split the wait for component binding from i915 in multiples of
> > sysctl_hung_task_timeout_secs. This helps to avoid the possible kworker
> > thread hung detection given below.
> 
> while I understand the problem, I'm not sure whether a simpler option
> should be chosen. Maybe just split the wait_for_completion_timeout()
> into small 5sec iterations, without consulting value of hung_task_timeout.
> This would seem unligned with more mainstream use of 
> wait_for_completion_timeout() in kernel and still do the job.
> 
> I'll loop in Takashi here as well. Basicly the 60 sec timeout in 
> hda/hdac_i915.c is getting caught by hung_task_detection logic in builds
> where the hung_task_timeout is below 60secs.
> 
> I have a patch that tries to avoid hitting the timeout in some of the more 
> common cases:
> "ALSA: hda/i915 - skip acomp init if no matching display"
> https://lists.freedesktop.org/archives/intel-gfx-trybot/2022-February/128278.html
> ... but we'll still be stuck with some configurations where the timeout 
> will be hit. And above needs careful testing.
> 
> One logic comment below as well, but I'll quote the whole patch to give
> context to Takashi.

I agree with Kai, we can just make the wait_for_completion_timeout()
split in a loop independently from hung_task_timeout.  It's simpler,
less error-prone.


thanks,

Takashi
diff mbox series

Patch

diff --git a/sound/hda/hdac_i915.c b/sound/hda/hdac_i915.c
index d20a450a9a15..daaeebc5099e 100644
--- a/sound/hda/hdac_i915.c
+++ b/sound/hda/hdac_i915.c
@@ -6,6 +6,7 @@ 
 #include <linux/init.h>
 #include <linux/module.h>
 #include <linux/pci.h>
+#include <linux/sched/sysctl.h>
 #include <sound/core.h>
 #include <sound/hdaudio.h>
 #include <sound/hda_i915.h>
@@ -163,7 +164,8 @@  static bool dg1_gfx_present(void)
 int snd_hdac_i915_init(struct hdac_bus *bus)
 {
 	struct drm_audio_component *acomp;
-	int err;
+	unsigned long timeout, ret = 0;
+	int err, i, itr_cnt;
 
 	if (!i915_gfx_present())
 		return -ENODEV;
@@ -182,9 +184,18 @@  int snd_hdac_i915_init(struct hdac_bus *bus)
 	if (!acomp->ops) {
 		if (!IS_ENABLED(CONFIG_MODULES) ||
 		    !request_module("i915")) {
+			if (!sysctl_hung_task_timeout_secs) {
+				itr_cnt = 1;
+				timeout = msecs_to_jiffies(60 * 1000);
+			} else {
+				itr_cnt = DIV_ROUND_UP(60, sysctl_hung_task_timeout_secs);
+				timeout = msecs_to_jiffies(sysctl_hung_task_timeout_secs * 1000);
+			}
+
 			/* 60s timeout */
-			wait_for_completion_timeout(&acomp->master_bind_complete,
-						    msecs_to_jiffies(30 * 1000));
+			for (i = 0; i < itr_cnt || !ret; i++)
+				ret = wait_for_completion_timeout(&acomp->master_bind_complete,
+								  timeout);
 		}
 	}
 	if (!acomp->ops) {