ALSA: hda: Use standard waitqueue for RIRB wakeup
diff mbox series

Message ID 20191210145727.22054-1-tiwai@suse.de
State New
Headers show
Series
  • ALSA: hda: Use standard waitqueue for RIRB wakeup
Related show

Commit Message

Takashi Iwai Dec. 10, 2019, 2:57 p.m. UTC
The HD-audio CORB/RIRB communication was programmed in a way that was
documented in the reference in decades ago, which is essentially a
polling in the waiter side.  It's working fine but costs CPU cycles on
some platforms that support only slow communications.  Also, for some
platforms that had unreliable communications, we put longer wait time
(2 ms), which accumulate quite long time if you execute many verbs in
a shot (e.g. at the initialization or resume phase).

This patch attempts to improve the situation by introducing the
standard waitqueue in the RIRB waiter side instead of polling.  The
test results on my machine show significant improvements.  The time
spent for "cat /proc/asound/card*/codec#*" were changed like:

* Intel SKL + Realtek codec
  before the patch:
   0.00user 0.04system 0:00.10elapsed 40.0%CPU
  after the patch:
   0.00user 0.01system 0:00.10elapsed 10.0%CPU

* Nvidia GP107GL + Nvidia HDMI codec
  before the patch:
   0.00user 0.00system 0:02.76elapsed 0.0%CPU
  after the patch:
   0.00user 0.00system 0:00.01elapsed 17.0%CPU

So, for Intel chips, the total time is same, while the total time is
greatly reduced (from 2.76 to 0.01s) for Nvidia chips.
The only negative data here is the increase of CPU time for Nvidia,
but this is the unavoidable cost for faster wakeups, supposedly.

Signed-off-by: Takashi Iwai <tiwai@suse.de>
---
 include/sound/hdaudio.h        |  1 +
 sound/hda/hdac_bus.c           |  1 +
 sound/hda/hdac_controller.c    |  3 +++
 sound/pci/hda/hda_controller.c | 29 ++++++++++++++---------------
 4 files changed, 19 insertions(+), 15 deletions(-)

Comments

Kai Vehmanen Dec. 12, 2019, 5:17 p.m. UTC | #1
Hey,

On Tue, 10 Dec 2019, Takashi Iwai wrote:

> This patch attempts to improve the situation by introducing the
> standard waitqueue in the RIRB waiter side instead of polling.  The

this seems like a good change. It's a bit unfortunate this logic is not 
yet shared and SOF uses the variant in snd_hdac_bus_get_response() -- that 
would need similar treatment. I can make a matching change there and see 
how this fairs in SOF CI. But basic tests with snd-hda-intel on a few 
systems didn't reveal anything odd and code seems correct.

Br, Kai
Takashi Iwai Dec. 12, 2019, 5:23 p.m. UTC | #2
On Thu, 12 Dec 2019 18:17:14 +0100,
Kai Vehmanen wrote:
> 
> Hey,
> 
> On Tue, 10 Dec 2019, Takashi Iwai wrote:
> 
> > This patch attempts to improve the situation by introducing the
> > standard waitqueue in the RIRB waiter side instead of polling.  The
> 
> this seems like a good change. It's a bit unfortunate this logic is not 
> yet shared and SOF uses the variant in snd_hdac_bus_get_response() -- that 
> would need similar treatment. I can make a matching change there and see 
> how this fairs in SOF CI. But basic tests with snd-hda-intel on a few 
> systems didn't reveal anything odd and code seems correct.

Yes, please, the adaption to SOF is really appreciated.
This is a 5.6 material and I applied only the changes I could actually
test as a start.


thanks,

Takashi
Takashi Iwai Dec. 12, 2019, 6:42 p.m. UTC | #3
On Thu, 12 Dec 2019 18:23:08 +0100,
Takashi Iwai wrote:
> 
> On Thu, 12 Dec 2019 18:17:14 +0100,
> Kai Vehmanen wrote:
> > 
> > Hey,
> > 
> > On Tue, 10 Dec 2019, Takashi Iwai wrote:
> > 
> > > This patch attempts to improve the situation by introducing the
> > > standard waitqueue in the RIRB waiter side instead of polling.  The
> > 
> > this seems like a good change. It's a bit unfortunate this logic is not 
> > yet shared and SOF uses the variant in snd_hdac_bus_get_response() -- that 
> > would need similar treatment. I can make a matching change there and see 
> > how this fairs in SOF CI. But basic tests with snd-hda-intel on a few 
> > systems didn't reveal anything odd and code seems correct.
> 
> Yes, please, the adaption to SOF is really appreciated.
> This is a 5.6 material and I applied only the changes I could actually
> test as a start.

And now I realized that adapting this for the common code is trivial,
and it's a good chance to unify the code there.  I'll submit a test
patch set soon later.  Stay tuned.


Takashi
Kai Vehmanen Dec. 13, 2019, 1:05 p.m. UTC | #4
Hey,

On Tue, 10 Dec 2019, Takashi Iwai wrote:

> This patch attempts to improve the situation by introducing the
> standard waitqueue in the RIRB waiter side instead of polling.  The

this patch was part of the testing as well, so looks good. One minor
nit only:

> @@ -216,6 +216,9 @@ void snd_hdac_bus_update_rirb(struct hdac_bus *bus)
>  		else if (bus->rirb.cmds[addr]) {
>  			bus->rirb.res[addr] = res;
>  			bus->rirb.cmds[addr]--;
> +			if (!bus->rirb.cmds[addr] &&
> +			    waitqueue_active(&bus->rirb_wq))
> +				wake_up(&bus->rirb_wq);

Checkpath would like to have a comment here:

WARNING: waitqueue_active without comment
#77: FILE: sound/hda/hdac_controller.c:220:
+                           waitqueue_active(&bus->rirb_wq))

Br, Kai
Takashi Iwai Dec. 13, 2019, 1:34 p.m. UTC | #5
On Fri, 13 Dec 2019 14:05:31 +0100,
Kai Vehmanen wrote:
> 
> Hey,
> 
> On Tue, 10 Dec 2019, Takashi Iwai wrote:
> 
> > This patch attempts to improve the situation by introducing the
> > standard waitqueue in the RIRB waiter side instead of polling.  The
> 
> this patch was part of the testing as well, so looks good. One minor
> nit only:
> 
> > @@ -216,6 +216,9 @@ void snd_hdac_bus_update_rirb(struct hdac_bus *bus)
> >  		else if (bus->rirb.cmds[addr]) {
> >  			bus->rirb.res[addr] = res;
> >  			bus->rirb.cmds[addr]--;
> > +			if (!bus->rirb.cmds[addr] &&
> > +			    waitqueue_active(&bus->rirb_wq))
> > +				wake_up(&bus->rirb_wq);
> 
> Checkpath would like to have a comment here:
> 
> WARNING: waitqueue_active without comment
> #77: FILE: sound/hda/hdac_controller.c:220:
> +                           waitqueue_active(&bus->rirb_wq))

Yeah, that was known to me, too.  Actually it's misleading; what
actually matters is the memory barrier or other synchronization there,
and this should work as is because of the current code path.
(Besides, majority of existing waitqueue_active() have no comments at
all :)

And, now I found wa_has_sleeper() as a better replacement of
waitqueue_active() that explicitly cares about synchronization.
So I'll change with this in a later patch (after unification to
hda-core).


thanks,

Takashi
Takashi Iwai Dec. 13, 2019, 1:42 p.m. UTC | #6
On Fri, 13 Dec 2019 14:34:13 +0100,
Takashi Iwai wrote:
> 
> On Fri, 13 Dec 2019 14:05:31 +0100,
> Kai Vehmanen wrote:
> > 
> > Hey,
> > 
> > On Tue, 10 Dec 2019, Takashi Iwai wrote:
> > 
> > > This patch attempts to improve the situation by introducing the
> > > standard waitqueue in the RIRB waiter side instead of polling.  The
> > 
> > this patch was part of the testing as well, so looks good. One minor
> > nit only:
> > 
> > > @@ -216,6 +216,9 @@ void snd_hdac_bus_update_rirb(struct hdac_bus *bus)
> > >  		else if (bus->rirb.cmds[addr]) {
> > >  			bus->rirb.res[addr] = res;
> > >  			bus->rirb.cmds[addr]--;
> > > +			if (!bus->rirb.cmds[addr] &&
> > > +			    waitqueue_active(&bus->rirb_wq))
> > > +				wake_up(&bus->rirb_wq);
> > 
> > Checkpath would like to have a comment here:
> > 
> > WARNING: waitqueue_active without comment
> > #77: FILE: sound/hda/hdac_controller.c:220:
> > +                           waitqueue_active(&bus->rirb_wq))
> 
> Yeah, that was known to me, too.  Actually it's misleading; what
> actually matters is the memory barrier or other synchronization there,
> and this should work as is because of the current code path.
> (Besides, majority of existing waitqueue_active() have no comments at
> all :)
> 
> And, now I found wa_has_sleeper() as a better replacement of
> waitqueue_active() that explicitly cares about synchronization.
> So I'll change with this in a later patch (after unification to
> hda-core).

On the second thought, the extra barrier should be utterly
superfluous, so I'll go for commenting about the spinlock instead...


Takashi
Jon Hunter Dec. 18, 2019, 2:17 p.m. UTC | #7
On 10/12/2019 14:57, Takashi Iwai wrote:
> The HD-audio CORB/RIRB communication was programmed in a way that was
> documented in the reference in decades ago, which is essentially a
> polling in the waiter side.  It's working fine but costs CPU cycles on
> some platforms that support only slow communications.  Also, for some
> platforms that had unreliable communications, we put longer wait time
> (2 ms), which accumulate quite long time if you execute many verbs in
> a shot (e.g. at the initialization or resume phase).
> 
> This patch attempts to improve the situation by introducing the
> standard waitqueue in the RIRB waiter side instead of polling.  The
> test results on my machine show significant improvements.  The time
> spent for "cat /proc/asound/card*/codec#*" were changed like:
> 
> * Intel SKL + Realtek codec
>   before the patch:
>    0.00user 0.04system 0:00.10elapsed 40.0%CPU
>   after the patch:
>    0.00user 0.01system 0:00.10elapsed 10.0%CPU
> 
> * Nvidia GP107GL + Nvidia HDMI codec
>   before the patch:
>    0.00user 0.00system 0:02.76elapsed 0.0%CPU
>   after the patch:
>    0.00user 0.00system 0:00.01elapsed 17.0%CPU
> 
> So, for Intel chips, the total time is same, while the total time is
> greatly reduced (from 2.76 to 0.01s) for Nvidia chips.
> The only negative data here is the increase of CPU time for Nvidia,
> but this is the unavoidable cost for faster wakeups, supposedly.
> 
> Signed-off-by: Takashi Iwai <tiwai@suse.de>
Starting with next-20191217 I am seeing the following error on one of
our Tegra platforms ...

tegra-hda 3510000.hda: azx_get_response timeout, switching to polling
mode: last cmd=0x404f2d00

Bisect is point to this commit and although it does not cleanly revert,
if I revert this and a couple dependencies on top of -next the issue
goes away. Any thoughts on what could be going on here?

Cheers
Jon
Takashi Iwai Dec. 18, 2019, 2:31 p.m. UTC | #8
On Wed, 18 Dec 2019 15:17:27 +0100,
Jon Hunter wrote:
> 
> 
> On 10/12/2019 14:57, Takashi Iwai wrote:
> > The HD-audio CORB/RIRB communication was programmed in a way that was
> > documented in the reference in decades ago, which is essentially a
> > polling in the waiter side.  It's working fine but costs CPU cycles on
> > some platforms that support only slow communications.  Also, for some
> > platforms that had unreliable communications, we put longer wait time
> > (2 ms), which accumulate quite long time if you execute many verbs in
> > a shot (e.g. at the initialization or resume phase).
> > 
> > This patch attempts to improve the situation by introducing the
> > standard waitqueue in the RIRB waiter side instead of polling.  The
> > test results on my machine show significant improvements.  The time
> > spent for "cat /proc/asound/card*/codec#*" were changed like:
> > 
> > * Intel SKL + Realtek codec
> >   before the patch:
> >    0.00user 0.04system 0:00.10elapsed 40.0%CPU
> >   after the patch:
> >    0.00user 0.01system 0:00.10elapsed 10.0%CPU
> > 
> > * Nvidia GP107GL + Nvidia HDMI codec
> >   before the patch:
> >    0.00user 0.00system 0:02.76elapsed 0.0%CPU
> >   after the patch:
> >    0.00user 0.00system 0:00.01elapsed 17.0%CPU
> > 
> > So, for Intel chips, the total time is same, while the total time is
> > greatly reduced (from 2.76 to 0.01s) for Nvidia chips.
> > The only negative data here is the increase of CPU time for Nvidia,
> > but this is the unavoidable cost for faster wakeups, supposedly.
> > 
> > Signed-off-by: Takashi Iwai <tiwai@suse.de>
> Starting with next-20191217 I am seeing the following error on one of
> our Tegra platforms ...
> 
> tegra-hda 3510000.hda: azx_get_response timeout, switching to polling
> mode: last cmd=0x404f2d00
> 
> Bisect is point to this commit and although it does not cleanly revert,
> if I revert this and a couple dependencies on top of -next the issue
> goes away. Any thoughts on what could be going on here?

Do you see any bad behavior other than the warning message?

If you don't see any dysfunction, I guess that the difference is that
the old code went to the trial mode at first silently (with
dev_dbg()), then switching to polling mode at next.  The trial mode is
basically same as polling mode, but it was just considered to be a
temporary transition, so not warned.

IOW, if my guess is correct, maybe Tegra never worked in the normal
mode but only in the polling mode (but without complaints).
If so, the patch like below would be needed.

To prove my theory, could you check the old code with dyndbg enabled
for sound/pci/hda/hda_controller.c?  If a message like below appears,
it's the case:
  azx_get_response timeout, polling the codec once: last cmd=xxx


thanks,

Takashi

--- a/sound/pci/hda/hda_tegra.c
+++ b/sound/pci/hda/hda_tegra.c
@@ -394,6 +394,7 @@ static int hda_tegra_create(struct snd_card *card,
 	if (err < 0)
 		return err;
 
+	chip->bus.core.polling = 1;
 	chip->bus.core.needs_damn_long_delay = 1;
 
 	err = snd_device_new(card, SNDRV_DEV_LOWLEVEL, chip, &ops);
Jon Hunter Dec. 19, 2019, 12:15 p.m. UTC | #9
On 18/12/2019 14:31, Takashi Iwai wrote:
> On Wed, 18 Dec 2019 15:17:27 +0100,
> Jon Hunter wrote:
>>
>>
>> On 10/12/2019 14:57, Takashi Iwai wrote:
>>> The HD-audio CORB/RIRB communication was programmed in a way that was
>>> documented in the reference in decades ago, which is essentially a
>>> polling in the waiter side.  It's working fine but costs CPU cycles on
>>> some platforms that support only slow communications.  Also, for some
>>> platforms that had unreliable communications, we put longer wait time
>>> (2 ms), which accumulate quite long time if you execute many verbs in
>>> a shot (e.g. at the initialization or resume phase).
>>>
>>> This patch attempts to improve the situation by introducing the
>>> standard waitqueue in the RIRB waiter side instead of polling.  The
>>> test results on my machine show significant improvements.  The time
>>> spent for "cat /proc/asound/card*/codec#*" were changed like:
>>>
>>> * Intel SKL + Realtek codec
>>>   before the patch:
>>>    0.00user 0.04system 0:00.10elapsed 40.0%CPU
>>>   after the patch:
>>>    0.00user 0.01system 0:00.10elapsed 10.0%CPU
>>>
>>> * Nvidia GP107GL + Nvidia HDMI codec
>>>   before the patch:
>>>    0.00user 0.00system 0:02.76elapsed 0.0%CPU
>>>   after the patch:
>>>    0.00user 0.00system 0:00.01elapsed 17.0%CPU
>>>
>>> So, for Intel chips, the total time is same, while the total time is
>>> greatly reduced (from 2.76 to 0.01s) for Nvidia chips.
>>> The only negative data here is the increase of CPU time for Nvidia,
>>> but this is the unavoidable cost for faster wakeups, supposedly.
>>>
>>> Signed-off-by: Takashi Iwai <tiwai@suse.de>
>> Starting with next-20191217 I am seeing the following error on one of
>> our Tegra platforms ...
>>
>> tegra-hda 3510000.hda: azx_get_response timeout, switching to polling
>> mode: last cmd=0x404f2d00
>>
>> Bisect is point to this commit and although it does not cleanly revert,
>> if I revert this and a couple dependencies on top of -next the issue
>> goes away. Any thoughts on what could be going on here?
> 
> Do you see any bad behavior other than the warning message?

I have done some more local testing and so far I don't see any bad
behaviour just the warning.

> If you don't see any dysfunction, I guess that the difference is that
> the old code went to the trial mode at first silently (with
> dev_dbg()), then switching to polling mode at next.  The trial mode is
> basically same as polling mode, but it was just considered to be a
> temporary transition, so not warned.
> 
> IOW, if my guess is correct, maybe Tegra never worked in the normal
> mode but only in the polling mode (but without complaints).
> If so, the patch like below would be needed.
> 
> To prove my theory, could you check the old code with dyndbg enabled
> for sound/pci/hda/hda_controller.c?  If a message like below appears,
> it's the case:
>   azx_get_response timeout, polling the codec once: last cmd=xxx

Yes I tried this and you are correct, this does appear even if v5.5-rc2.

Please note that this timeout is intermittent and so does not always
happen. So it appears to work, but sometimes it can fail.

> --- a/sound/pci/hda/hda_tegra.c
> +++ b/sound/pci/hda/hda_tegra.c
> @@ -394,6 +394,7 @@ static int hda_tegra_create(struct snd_card *card,
>  	if (err < 0)
>  		return err;
>  
> +	chip->bus.core.polling = 1;
>  	chip->bus.core.needs_damn_long_delay = 1;
>  
>  	err = snd_device_new(card, SNDRV_DEV_LOWLEVEL, chip, &ops);

I don't think we want to do this, because so far this is only seen on
one Tegra device and this enable polling for all.

For now you can ignore this report and we will investigate what is
happening on Tegra194 to cause this.

Thanks
Jon

Patch
diff mbox series

diff --git a/include/sound/hdaudio.h b/include/sound/hdaudio.h
index e05b95e83d5a..81373a2efd96 100644
--- a/include/sound/hdaudio.h
+++ b/include/sound/hdaudio.h
@@ -317,6 +317,7 @@  struct hdac_bus {
 	struct hdac_rb corb;
 	struct hdac_rb rirb;
 	unsigned int last_cmd[HDA_MAX_CODECS];	/* last sent command */
+	wait_queue_head_t rirb_wq;
 
 	/* CORB/RIRB and position buffers */
 	struct snd_dma_buffer rb;
diff --git a/sound/hda/hdac_bus.c b/sound/hda/hdac_bus.c
index 8f19876244eb..48b227fff204 100644
--- a/sound/hda/hdac_bus.c
+++ b/sound/hda/hdac_bus.c
@@ -43,6 +43,7 @@  int snd_hdac_bus_init(struct hdac_bus *bus, struct device *dev,
 	mutex_init(&bus->cmd_mutex);
 	mutex_init(&bus->lock);
 	INIT_LIST_HEAD(&bus->hlink_list);
+	init_waitqueue_head(&bus->rirb_wq);
 	bus->irq = -1;
 	return 0;
 }
diff --git a/sound/hda/hdac_controller.c b/sound/hda/hdac_controller.c
index 7e7be8e4dcf9..cd1c3b282657 100644
--- a/sound/hda/hdac_controller.c
+++ b/sound/hda/hdac_controller.c
@@ -216,6 +216,9 @@  void snd_hdac_bus_update_rirb(struct hdac_bus *bus)
 		else if (bus->rirb.cmds[addr]) {
 			bus->rirb.res[addr] = res;
 			bus->rirb.cmds[addr]--;
+			if (!bus->rirb.cmds[addr] &&
+			    waitqueue_active(&bus->rirb_wq))
+				wake_up(&bus->rirb_wq);
 		} else {
 			dev_err_ratelimited(bus->dev,
 				"spurious response %#x:%#x, last cmd=%#08x\n",
diff --git a/sound/pci/hda/hda_controller.c b/sound/pci/hda/hda_controller.c
index 2f3b7a35f2d9..f30a053d981e 100644
--- a/sound/pci/hda/hda_controller.c
+++ b/sound/pci/hda/hda_controller.c
@@ -792,21 +792,25 @@  static int azx_rirb_get_response(struct hdac_bus *bus, unsigned int addr,
 	struct hda_bus *hbus = &chip->bus;
 	unsigned long timeout;
 	unsigned long loopcounter;
-	int do_poll = 0;
+	wait_queue_entry_t wait;
 	bool warned = false;
 
+	init_wait_entry(&wait, 0);
  again:
 	timeout = jiffies + msecs_to_jiffies(1000);
 
 	for (loopcounter = 0;; loopcounter++) {
 		spin_lock_irq(&bus->reg_lock);
-		if (bus->polling_mode || do_poll)
+		if (!bus->polling_mode)
+			prepare_to_wait(&bus->rirb_wq, &wait,
+					TASK_UNINTERRUPTIBLE);
+		if (bus->polling_mode)
 			snd_hdac_bus_update_rirb(bus);
 		if (!bus->rirb.cmds[addr]) {
-			if (!do_poll)
-				bus->poll_count = 0;
 			if (res)
 				*res = bus->rirb.res[addr]; /* the last value */
+			if (!bus->polling_mode)
+				finish_wait(&bus->rirb_wq, &wait);
 			spin_unlock_irq(&bus->reg_lock);
 			return 0;
 		}
@@ -814,7 +818,9 @@  static int azx_rirb_get_response(struct hdac_bus *bus, unsigned int addr,
 		if (time_after(jiffies, timeout))
 			break;
 #define LOOP_COUNT_MAX	3000
-		if (hbus->needs_damn_long_delay ||
+		if (!bus->polling_mode) {
+			schedule_timeout(msecs_to_jiffies(2));
+		} else if (hbus->needs_damn_long_delay ||
 		    loopcounter > LOOP_COUNT_MAX) {
 			if (loopcounter > LOOP_COUNT_MAX && !warned) {
 				dev_dbg_ratelimited(chip->card->dev,
@@ -829,19 +835,12 @@  static int azx_rirb_get_response(struct hdac_bus *bus, unsigned int addr,
 		}
 	}
 
+	if (!bus->polling_mode)
+		finish_wait(&bus->rirb_wq, &wait);
+
 	if (hbus->no_response_fallback)
 		return -EIO;
 
-	if (!bus->polling_mode && bus->poll_count < 2) {
-		dev_dbg(chip->card->dev,
-			"azx_get_response timeout, polling the codec once: last cmd=0x%08x\n",
-			bus->last_cmd[addr]);
-		do_poll = 1;
-		bus->poll_count++;
-		goto again;
-	}
-
-
 	if (!bus->polling_mode) {
 		dev_warn(chip->card->dev,
 			 "azx_get_response timeout, switching to polling mode: last cmd=0x%08x\n",