diff mbox series

[3/3] echoaudio: Address bugs in the interrupt handling

Message ID 20200616131743.4793-3-mark@xwax.org (mailing list archive)
State New, archived
Headers show
Series [1/3] echoaudio: Race conditions around "opencount" | expand

Commit Message

Mark Hills June 16, 2020, 1:17 p.m. UTC
Distorted audio appears occasionally, affecting either playback
or capture and requiring the affected substream to be closed by
all applications and re-opened.

The best way I have found to reproduce the bug is to use dmix in
combination with Chromium, which opens the audio device multiple
times in threads. Anecdotally, the problems appear to have increased
with faster CPUs.

Since applying this patch I have not had problems, where previously
they would occur several times a day.

This patch addresses the following issues:

* Check for progress using the counter from the hardware, not after
  it has been truncated to the buffer.

  There appears to be a possible bug if a whole ringbuffer advances
  between interrupts, it goes unnoticed.

* Remove chip->last_period:

  It's trivial to derive from pipe->last_counter, and inside pipe
  is where it more logically belongs. This has less scope for bugs
  as it is not wrapped to the buffer length.

* Remove the accessing of pipe->dma_counter twice in the interrupt
  handler:

  The value will be changing between accesses. It doesn't look like
  this could cause a bug in practice, assuming the value only goes up.
  Except perhaps if another thread were able to reset it to 0 on the
  second access because chip->lock is not held.

  This may improve the performance of the interrupt handler;
  dma_counter is volatile so access is slow.

The resulting interrupt handling resembles more closely the pattern in
the kernel "Writing an ALSA driver" documentation.

Signed-off-by: Mark Hills <mark@xwax.org>
---
 sound/pci/echoaudio/echoaudio.c | 80 +++++++++++++++++++++------------
 sound/pci/echoaudio/echoaudio.h |  1 -
 2 files changed, 52 insertions(+), 29 deletions(-)

Comments

Takashi Iwai June 16, 2020, 1:35 p.m. UTC | #1
On Tue, 16 Jun 2020 15:17:43 +0200,
Mark Hills wrote:
> 
> +/* Update software pointer to match the hardware
> + *
> + * Return: 1 if we crossed a period threshold, otherwise 0
> + */
> +static int snd_echo_poll_substream(struct snd_pcm_substream *substream)

This is a bit confusing name.  One would imagine from the function
name as if it were about the handling of poll() syscall.

> +{
> +	struct snd_pcm_runtime *runtime = substream->runtime;
> +	struct audiopipe *pipe = runtime->private_data;
> +	unsigned counter, step, period, last_period;
> +	size_t buffer_bytes;
> +
> +	if (pipe->state != PIPE_STATE_STARTED)
> +		return 0;
> +
> +	counter = le32_to_cpu(*pipe->dma_counter);
> +
> +	period = bytes_to_frames(runtime, counter)
> +		/ runtime->period_size;
> +	last_period = bytes_to_frames(runtime, pipe->last_counter)
> +		/ runtime->period_size;
> +
> +	if (period == last_period)
> +		return 0;  /* don't do any work yet */
> +
> +	step = counter - pipe->last_counter;
> +	pipe->last_counter = counter;

Dose this calculation consider the overlap of 32bit integer of the
counter?  (I'm not sure whether the old code did it right, though.)


thanks,

Takashi
Mark Hills June 16, 2020, 2:01 p.m. UTC | #2
On Tue, 16 Jun 2020, Takashi Iwai wrote:

> On Tue, 16 Jun 2020 15:17:43 +0200,
> Mark Hills wrote:
> > 
> > +/* Update software pointer to match the hardware
> > + *
> > + * Return: 1 if we crossed a period threshold, otherwise 0
> > + */
> > +static int snd_echo_poll_substream(struct snd_pcm_substream *substream)
> 
> This is a bit confusing name.  One would imagine from the function
> name as if it were about the handling of poll() syscall.

Poll felt intuitive to me; maybe from FreeBSD where network drivers can 
poll on a timer instead of interrupts. I do know about poll(), though.

How about snd_echo_update_substream()?
 
> > +{
> > +	struct snd_pcm_runtime *runtime = substream->runtime;
> > +	struct audiopipe *pipe = runtime->private_data;
> > +	unsigned counter, step, period, last_period;
> > +	size_t buffer_bytes;
> > +
> > +	if (pipe->state != PIPE_STATE_STARTED)
> > +		return 0;
> > +
> > +	counter = le32_to_cpu(*pipe->dma_counter);
> > +
> > +	period = bytes_to_frames(runtime, counter)
> > +		/ runtime->period_size;
> > +	last_period = bytes_to_frames(runtime, pipe->last_counter)
> > +		/ runtime->period_size;
> > +
> > +	if (period == last_period)
> > +		return 0;  /* don't do any work yet */
> > +
> > +	step = counter - pipe->last_counter;
> > +	pipe->last_counter = counter;
> 
> Dose this calculation consider the overlap of 32bit integer of the
> counter?  (I'm not sure whether the old code did it right, though.)

I believe it does, since (small - big) as unsigned gives small value. And 
period is checked only for equality (not greater than). I'll add a comment 
as such. I have run it with long streams.

Would it be clearer, or should it be that buffer_bytes to "unsigned"? 
Though size_t conveys that it is a byte length, in memory.

In general I haven't deviated from existing code without need to, so I 
inherited these types.

The same goes for the pattern of calculating "step" with unsigned values, 
and then using a loop to wrap it to the buffer:

    while (pipe->position >= buffer_bytes)
            pipe->position -= buffer_bytes;

I've assumed this was a recognised pattern in ALSA code, preferred over 
modulus.
Takashi Iwai June 16, 2020, 2:18 p.m. UTC | #3
On Tue, 16 Jun 2020 16:01:11 +0200,
Mark Hills wrote:
> 
> On Tue, 16 Jun 2020, Takashi Iwai wrote:
> 
> > On Tue, 16 Jun 2020 15:17:43 +0200,
> > Mark Hills wrote:
> > > 
> > > +/* Update software pointer to match the hardware
> > > + *
> > > + * Return: 1 if we crossed a period threshold, otherwise 0
> > > + */
> > > +static int snd_echo_poll_substream(struct snd_pcm_substream *substream)
> > 
> > This is a bit confusing name.  One would imagine from the function
> > name as if it were about the handling of poll() syscall.
> 
> Poll felt intuitive to me; maybe from FreeBSD where network drivers can 
> poll on a timer instead of interrupts. I do know about poll(), though.
> 
> How about snd_echo_update_substream()?

Yes, it's more self-explanatory.  Or even better
snd_echo_update_substream_position() :)

> > > +{
> > > +	struct snd_pcm_runtime *runtime = substream->runtime;
> > > +	struct audiopipe *pipe = runtime->private_data;
> > > +	unsigned counter, step, period, last_period;
> > > +	size_t buffer_bytes;
> > > +
> > > +	if (pipe->state != PIPE_STATE_STARTED)
> > > +		return 0;
> > > +
> > > +	counter = le32_to_cpu(*pipe->dma_counter);
> > > +
> > > +	period = bytes_to_frames(runtime, counter)
> > > +		/ runtime->period_size;
> > > +	last_period = bytes_to_frames(runtime, pipe->last_counter)
> > > +		/ runtime->period_size;
> > > +
> > > +	if (period == last_period)
> > > +		return 0;  /* don't do any work yet */
> > > +
> > > +	step = counter - pipe->last_counter;
> > > +	pipe->last_counter = counter;
> > 
> > Dose this calculation consider the overlap of 32bit integer of the
> > counter?  (I'm not sure whether the old code did it right, though.)
> 
> I believe it does, since (small - big) as unsigned gives small value. And 
> period is checked only for equality (not greater than). I'll add a comment 
> as such. I have run it with long streams.

The problem is that the last_period calculation can be wrong if the
period size isn't aligned.  e.g. when period size is 44100, around the
boundary position, it gets a different last_period value although it
still in the same period.


thanks,

Takashi
Giuliano Pochini June 16, 2020, 7:46 p.m. UTC | #4
On Tue, 16 Jun 2020 14:17:43 +0100
Mark Hills <mark@xwax.org> wrote:

> Distorted audio appears occasionally, affecting either playback
> or capture and requiring the affected substream to be closed by
> all applications and re-opened.

Yes, it also happens here very rarely (one time every some months) and I failed to reproduce the problem.


> The best way I have found to reproduce the bug is to use dmix in
> combination with Chromium, which opens the audio device multiple
> times in threads. Anecdotally, the problems appear to have increased
> with faster CPUs.
> 
> Since applying this patch I have not had problems, where previously
> they would occur several times a day.
> 
> This patch addresses the following issues:
> 
> * Check for progress using the counter from the hardware, not after
>   it has been truncated to the buffer.
> 
>   There appears to be a possible bug if a whole ringbuffer advances
>   between interrupts, it goes unnoticed.

In that case the stream must be restarted anyway due to xrun.


> * Remove chip->last_period:
> 
>   It's trivial to derive from pipe->last_counter, and inside pipe
>   is where it more logically belongs. This has less scope for bugs
>   as it is not wrapped to the buffer length.

Ok.


> * Remove the accessing of pipe->dma_counter twice in the interrupt
>   handler:

Why twice?


> +static int snd_echo_poll_substream(struct snd_pcm_substream *substream)
> [...]
>  static irqreturn_t snd_echo_interrupt(int irq, void *dev_id)
> [...]

Looks fine to me.
Giuliano Pochini June 16, 2020, 10:01 p.m. UTC | #5
On Tue, 16 Jun 2020 14:17:43 +0100
Mark Hills <mark@xwax.org> wrote:

> +	counter = le32_to_cpu(*pipe->dma_counter);
> +
> +	period = bytes_to_frames(runtime, counter)
> +		/ runtime->period_size;
> +	last_period = bytes_to_frames(runtime, pipe->last_counter)
> +		/ runtime->period_size;
> +
> +	if (period == last_period)
> +		return 0;  /* don't do any work yet */
> +
> +	step = counter - pipe->last_counter;

Uhmm.. no, when the dma_counter wraps around the comparison gives wrong
result, unless 4G is divisible by the buffer size.

Please consider this patch (to apply after yours). It computes the new period
using pipe->position before and after adding the amount of bytes tranferred
since the previous period.
Briefly tested - It seems ok.


Signed-off-by: Giuliano Pochini <pochini@shiny.it>

--- sound/pci/echoaudio/echoaudio.c__orig	2020-06-16 23:36:02.818601055 +0200
+++ sound/pci/echoaudio/echoaudio.c	2020-06-16 23:52:46.593325066 +0200
@@ -1781,7 +1781,7 @@
 {
 	struct snd_pcm_runtime *runtime = substream->runtime;
 	struct audiopipe *pipe = runtime->private_data;
-	unsigned counter, step, period, last_period;
+	u32 counter, step, period, last_period, pipe_position;
 	size_t buffer_bytes;
 
 	if (pipe->state != PIPE_STATE_STARTED)
@@ -1789,24 +1789,22 @@
 
 	counter = le32_to_cpu(*pipe->dma_counter);
 
-	period = bytes_to_frames(runtime, counter)
+	step = counter - pipe->last_counter;
+	pipe_position = pipe->position + step;	/* bytes */
+	buffer_bytes = frames_to_bytes(runtime, runtime->buffer_size);
+	while (pipe_position >= buffer_bytes)
+		pipe_position -= buffer_bytes;
+
+	period = bytes_to_frames(runtime, pipe_position)
 		/ runtime->period_size;
-	last_period = bytes_to_frames(runtime, pipe->last_counter)
+	last_period = bytes_to_frames(runtime, pipe->position)
 		/ runtime->period_size;
 
 	if (period == last_period)
 		return 0;  /* don't do any work yet */
 
-	step = counter - pipe->last_counter;
+	pipe->position = pipe_position;
 	pipe->last_counter = counter;
-
-	pipe->position += step;  /* bytes */
-
-	buffer_bytes = frames_to_bytes(runtime, runtime->buffer_size);
-
-	while (pipe->position >= buffer_bytes)
-		pipe->position -= buffer_bytes;
-
 	return 1;
 }
Mark Hills June 17, 2020, 10:51 a.m. UTC | #6
On Tue, 16 Jun 2020, Takashi Iwai wrote:

> On Tue, 16 Jun 2020 16:01:11 +0200,
> Mark Hills wrote:
> > 
> > On Tue, 16 Jun 2020, Takashi Iwai wrote:
> > 
> > > On Tue, 16 Jun 2020 15:17:43 +0200,
> > > Mark Hills wrote:
> > > > 
> > > > +/* Update software pointer to match the hardware
> > > > + *
> > > > + * Return: 1 if we crossed a period threshold, otherwise 0
> > > > + */
> > > > +static int snd_echo_poll_substream(struct snd_pcm_substream *substream)
> > > 
> > > This is a bit confusing name.  One would imagine from the function
> > > name as if it were about the handling of poll() syscall.
> > 
> > Poll felt intuitive to me; maybe from FreeBSD where network drivers can 
> > poll on a timer instead of interrupts. I do know about poll(), though.
> > 
> > How about snd_echo_update_substream()?
> 
> Yes, it's more self-explanatory.  Or even better
> snd_echo_update_substream_position() :)

Out of interest, these are static but the names are globally qualified. I 
see this elsewhere, so I followed, but is this agreed convention?

Because it could be update_substream_position() :)

> > > > +{
> > > > +	struct snd_pcm_runtime *runtime = substream->runtime;
> > > > +	struct audiopipe *pipe = runtime->private_data;
> > > > +	unsigned counter, step, period, last_period;
> > > > +	size_t buffer_bytes;
> > > > +
> > > > +	if (pipe->state != PIPE_STATE_STARTED)
> > > > +		return 0;
> > > > +
> > > > +	counter = le32_to_cpu(*pipe->dma_counter);
> > > > +
> > > > +	period = bytes_to_frames(runtime, counter)
> > > > +		/ runtime->period_size;
> > > > +	last_period = bytes_to_frames(runtime, pipe->last_counter)
> > > > +		/ runtime->period_size;
> > > > +
> > > > +	if (period == last_period)
> > > > +		return 0;  /* don't do any work yet */
> > > > +
> > > > +	step = counter - pipe->last_counter;
> > > > +	pipe->last_counter = counter;
> > > 
> > > Dose this calculation consider the overlap of 32bit integer of the
> > > counter?  (I'm not sure whether the old code did it right, though.)
> > 
> > I believe it does, since (small - big) as unsigned gives small value. And 
> > period is checked only for equality (not greater than). I'll add a comment 
> > as such. I have run it with long streams.
> 
> The problem is that the last_period calculation can be wrong if the
> period size isn't aligned.  e.g. when period size is 44100, around the
> boundary position, it gets a different last_period value although it
> still in the same period.

Agreed, yes.

In which case I think it's clearer to not infer anything about periods 
from the current counter or position, and (effectively) accumulate it.

Would you please make suggestions on the code below?

Because it allowed for further simplification whilst fixing the bugs.

In the end, modulo became clearer than loops and I think it has the bonus 
of being less risky should the counter make a large step.

I'll run for a longer test today.

---

/* Update software pointer to match the hardware
 *
 * \pre chip->lock is held
 */
static void snd_echo_update_substream_position(struct echoaudio *chip,
					struct snd_pcm_substream *substream)
{
	struct snd_pcm_runtime *runtime = substream->runtime;
	struct audiopipe *pipe = runtime->private_data;
	u32 counter, step;
	size_t period_bytes;

	if (pipe->state != PIPE_STATE_STARTED)
		return;

	period_bytes = frames_to_bytes(runtime, runtime->period_size);

	counter = le32_to_cpu(*pipe->dma_counter);

	step = counter - pipe->last_counter;  /* handles wrapping of counter */
	step -= step % period_bytes;  /* acknowledge whole periods only */

	if (step == 0)
		return;  /* haven't advanced a whole period yet */

	pipe->last_counter += step;  /* does not always wrap on a period */
	pipe->position += step;

	/* wraparound the buffer */
	pipe->position %= frames_to_bytes(runtime, runtime->buffer_size);

	/* notify only once, even if multiple periods elapsed */
	spin_unlock(&chip->lock);
	snd_pcm_period_elapsed(substream);
	spin_lock(&chip->lock);
}

static irqreturn_t snd_echo_interrupt(int irq, void *dev_id)
{
	struct echoaudio *chip = dev_id;
	int ss, st;

	spin_lock(&chip->lock);
	st = service_irq(chip);
	if (st < 0) {
		spin_unlock(&chip->lock);
		return IRQ_NONE;
	}
	/* The hardware doesn't tell us which substream caused the irq,
	thus we have to check all running substreams. */
	for (ss = 0; ss < DSP_MAXPIPES; ss++) {
		struct snd_pcm_substream *substream;

		substream = chip->substream[ss];
		if (substream)
			snd_echo_update_substream_position(chip, substream);
	}
	spin_unlock(&chip->lock);

#ifdef ECHOCARD_HAS_MIDI
	if (st > 0 && chip->midi_in) {
		snd_rawmidi_receive(chip->midi_in, chip->midi_buffer, st);
		dev_dbg(chip->card->dev, "rawmidi_iread=%d\n", st);
	}
#endif
	return IRQ_HANDLED;
}

static snd_pcm_uframes_t pcm_pointer(struct snd_pcm_substream *substream)
{
	struct snd_pcm_runtime *runtime = substream->runtime;
	struct audiopipe *pipe = runtime->private_data;

	return bytes_to_frames(runtime, pipe->position);
}
Mark Hills June 17, 2020, 10:57 a.m. UTC | #7
On Tue, 16 Jun 2020, Giuliano Pochini wrote:

> On Tue, 16 Jun 2020 14:17:43 +0100
> Mark Hills <mark@xwax.org> wrote:
> 
> > Distorted audio appears occasionally, affecting either playback
> > or capture and requiring the affected substream to be closed by
> > all applications and re-opened.
> 
> Yes, it also happens here very rarely (one time every some months) and I 
> failed to reproduce the problem.

It frustrated me for years, but I had other work I needed to do. 
Eventually I was working around it several times a day :-/
 
[...]
> > * Check for progress using the counter from the hardware, not after
> >   it has been truncated to the buffer.
> > 
> >   There appears to be a possible bug if a whole ringbuffer advances
> >   between interrupts, it goes unnoticed.
> 
> In that case the stream must be restarted anyway due to xrun.

Yes, but I think it will go unnoticed, so you don't know to re-start.

[...]
> > * Remove the accessing of pipe->dma_counter twice in the interrupt
> >   handler:
> 
> Why twice?

Yes, the interrupt handler calls pcm_pointer() directly to make a 
decision. Then it calls snd_pcm_period_elapsed(), which itself goes on to 
make another call to pcm_pointer(), by this time the bus mastering of the 
interface has adjusted the counter.

The new code processes the counter only once, and this is easier to 
rationalise whether there are bugs.
Mark Hills June 17, 2020, 11:14 a.m. UTC | #8
On Wed, 17 Jun 2020, Giuliano Pochini wrote:

> On Tue, 16 Jun 2020 14:17:43 +0100
> Mark Hills <mark@xwax.org> wrote:
> 
> > +	counter = le32_to_cpu(*pipe->dma_counter);
> > +
> > +	period = bytes_to_frames(runtime, counter)
> > +		/ runtime->period_size;
> > +	last_period = bytes_to_frames(runtime, pipe->last_counter)
> > +		/ runtime->period_size;
> > +
> > +	if (period == last_period)
> > +		return 0;  /* don't do any work yet */
> > +
> > +	step = counter - pipe->last_counter;
> 
> Uhmm.. no, when the dma_counter wraps around the comparison gives wrong 
> result, unless 4G is divisible by the buffer size.

Agree.

> Please consider this patch (to apply after yours). It computes the new 
> period using pipe->position before and after adding the amount of bytes 
> tranferred since the previous period. Briefly tested - It seems ok.
> 
> 
> Signed-off-by: Giuliano Pochini <pochini@shiny.it>
> 
> --- sound/pci/echoaudio/echoaudio.c__orig	2020-06-16 23:36:02.818601055 +0200
> +++ sound/pci/echoaudio/echoaudio.c	2020-06-16 23:52:46.593325066 +0200
> @@ -1781,7 +1781,7 @@
>  {
>  	struct snd_pcm_runtime *runtime = substream->runtime;
>  	struct audiopipe *pipe = runtime->private_data;
> -	unsigned counter, step, period, last_period;
> +	u32 counter, step, period, last_period, pipe_position;
>  	size_t buffer_bytes;
>  
>  	if (pipe->state != PIPE_STATE_STARTED)
> @@ -1789,24 +1789,22 @@
>  
>  	counter = le32_to_cpu(*pipe->dma_counter);
>  
> -	period = bytes_to_frames(runtime, counter)
> +	step = counter - pipe->last_counter;
> +	pipe_position = pipe->position + step;	/* bytes */
> +	buffer_bytes = frames_to_bytes(runtime, runtime->buffer_size);
> +	while (pipe_position >= buffer_bytes)
> +		pipe_position -= buffer_bytes;
> +
> +	period = bytes_to_frames(runtime, pipe_position)
>  		/ runtime->period_size;
> -	last_period = bytes_to_frames(runtime, pipe->last_counter)
> +	last_period = bytes_to_frames(runtime, pipe->position)
>  		/ runtime->period_size;
>  
>  	if (period == last_period)
>  		return 0;  /* don't do any work yet */
>  
> -	step = counter - pipe->last_counter;
> +	pipe->position = pipe_position;
>  	pipe->last_counter = counter;
> -
> -	pipe->position += step;  /* bytes */
> -
> -	buffer_bytes = frames_to_bytes(runtime, runtime->buffer_size);
> -
> -	while (pipe->position >= buffer_bytes)
> -		pipe->position -= buffer_bytes;
> -
>  	return 1;

I think this risks returning to a case where it concludes nothing advances 
if the counter advances by a whole buffer?

You might be able to do the comparison before wrapping pipe_position, but 
hopefully you'll consider my patch in reply to Takashi has more clarity.
Takashi Iwai June 18, 2020, 8:17 a.m. UTC | #9
On Wed, 17 Jun 2020 12:51:05 +0200,
Mark Hills wrote:
> 
> On Tue, 16 Jun 2020, Takashi Iwai wrote:
> 
> > On Tue, 16 Jun 2020 16:01:11 +0200,
> > Mark Hills wrote:
> > > 
> > > On Tue, 16 Jun 2020, Takashi Iwai wrote:
> > > 
> > > > On Tue, 16 Jun 2020 15:17:43 +0200,
> > > > Mark Hills wrote:
> > > > > 
> > > > > +/* Update software pointer to match the hardware
> > > > > + *
> > > > > + * Return: 1 if we crossed a period threshold, otherwise 0
> > > > > + */
> > > > > +static int snd_echo_poll_substream(struct snd_pcm_substream *substream)
> > > > 
> > > > This is a bit confusing name.  One would imagine from the function
> > > > name as if it were about the handling of poll() syscall.
> > > 
> > > Poll felt intuitive to me; maybe from FreeBSD where network drivers can 
> > > poll on a timer instead of interrupts. I do know about poll(), though.
> > > 
> > > How about snd_echo_update_substream()?
> > 
> > Yes, it's more self-explanatory.  Or even better
> > snd_echo_update_substream_position() :)
> 
> Out of interest, these are static but the names are globally qualified. I 
> see this elsewhere, so I followed, but is this agreed convention?
> 
> Because it could be update_substream_position() :)

It's fine to use any name for a local function.


> > > > > +{
> > > > > +	struct snd_pcm_runtime *runtime = substream->runtime;
> > > > > +	struct audiopipe *pipe = runtime->private_data;
> > > > > +	unsigned counter, step, period, last_period;
> > > > > +	size_t buffer_bytes;
> > > > > +
> > > > > +	if (pipe->state != PIPE_STATE_STARTED)
> > > > > +		return 0;
> > > > > +
> > > > > +	counter = le32_to_cpu(*pipe->dma_counter);
> > > > > +
> > > > > +	period = bytes_to_frames(runtime, counter)
> > > > > +		/ runtime->period_size;
> > > > > +	last_period = bytes_to_frames(runtime, pipe->last_counter)
> > > > > +		/ runtime->period_size;
> > > > > +
> > > > > +	if (period == last_period)
> > > > > +		return 0;  /* don't do any work yet */
> > > > > +
> > > > > +	step = counter - pipe->last_counter;
> > > > > +	pipe->last_counter = counter;
> > > > 
> > > > Dose this calculation consider the overlap of 32bit integer of the
> > > > counter?  (I'm not sure whether the old code did it right, though.)
> > > 
> > > I believe it does, since (small - big) as unsigned gives small value. And 
> > > period is checked only for equality (not greater than). I'll add a comment 
> > > as such. I have run it with long streams.
> > 
> > The problem is that the last_period calculation can be wrong if the
> > period size isn't aligned.  e.g. when period size is 44100, around the
> > boundary position, it gets a different last_period value although it
> > still in the same period.
> 
> Agreed, yes.
> 
> In which case I think it's clearer to not infer anything about periods 
> from the current counter or position, and (effectively) accumulate it.
> 
> Would you please make suggestions on the code below?
> 
> Because it allowed for further simplification whilst fixing the bugs.
> 
> In the end, modulo became clearer than loops and I think it has the bonus 
> of being less risky should the counter make a large step.
> 
> I'll run for a longer test today.
> 
> ---
> 
> /* Update software pointer to match the hardware
>  *
>  * \pre chip->lock is held
>  */
> static void snd_echo_update_substream_position(struct echoaudio *chip,
> 					struct snd_pcm_substream *substream)
> {
> 	struct snd_pcm_runtime *runtime = substream->runtime;
> 	struct audiopipe *pipe = runtime->private_data;
> 	u32 counter, step;
> 	size_t period_bytes;
> 
> 	if (pipe->state != PIPE_STATE_STARTED)
> 		return;
> 
> 	period_bytes = frames_to_bytes(runtime, runtime->period_size);
> 
> 	counter = le32_to_cpu(*pipe->dma_counter);
> 
> 	step = counter - pipe->last_counter;  /* handles wrapping of counter */
> 	step -= step % period_bytes;  /* acknowledge whole periods only */
> 
> 	if (step == 0)
> 		return;  /* haven't advanced a whole period yet */
> 	pipe->last_counter += step;  /* does not always wrap on a period */
> 	pipe->position += step;
> 
> 	/* wraparound the buffer */
> 	pipe->position %= frames_to_bytes(runtime, runtime->buffer_size);
> 
> 	/* notify only once, even if multiple periods elapsed */
> 	spin_unlock(&chip->lock);
> 	snd_pcm_period_elapsed(substream);
> 	spin_lock(&chip->lock);
> }
> 
> static irqreturn_t snd_echo_interrupt(int irq, void *dev_id)
> {
> 	struct echoaudio *chip = dev_id;
> 	int ss, st;
> 
> 	spin_lock(&chip->lock);
> 	st = service_irq(chip);
> 	if (st < 0) {
> 		spin_unlock(&chip->lock);
> 		return IRQ_NONE;
> 	}
> 	/* The hardware doesn't tell us which substream caused the irq,
> 	thus we have to check all running substreams. */
> 	for (ss = 0; ss < DSP_MAXPIPES; ss++) {
> 		struct snd_pcm_substream *substream;
> 
> 		substream = chip->substream[ss];
> 		if (substream)
> 			snd_echo_update_substream_position(chip, substream);
> 	}
> 	spin_unlock(&chip->lock);
> 
> #ifdef ECHOCARD_HAS_MIDI
> 	if (st > 0 && chip->midi_in) {
> 		snd_rawmidi_receive(chip->midi_in, chip->midi_buffer, st);
> 		dev_dbg(chip->card->dev, "rawmidi_iread=%d\n", st);
> 	}
> #endif
> 	return IRQ_HANDLED;
> }
> 
> static snd_pcm_uframes_t pcm_pointer(struct snd_pcm_substream *substream)
> {
> 	struct snd_pcm_runtime *runtime = substream->runtime;
> 	struct audiopipe *pipe = runtime->private_data;
> 
> 	return bytes_to_frames(runtime, pipe->position);

I guess this misses the update of the precise position; in your code,
pipe->position gets updated only with the period size at irq handler.


IMO, we should have the code like:

static bool update_stream_position(struct snd_pcm_substream *substream)
{
	// update pipe->position and others, returns true if period elapsed
}

static irqreturn_t snd_echo_interrupt()
{
	spin_lock(&chip->lock);
	....
	if (update_stream_position(substream)) {
		spin_unlock(&chip->lock);
		snd_pcm_period_elapsed(substream);
		spin_lock(&chip->lock);
	}
	....
	spin_unlock(&chip->lock);
	return IRQ_HANDLED;
}

static snd_pcm_uframes_t pcm_pointer(struct snd_pcm_substream *substream)
{
	....
	update_stream_position(substream);
	return bytes_to_frames(runtime, pipe->position);
}


Note that snd_pcm_period_elapsed() isn't needed (must not be done)
from the pointer callback.  The PCM core would detect and handle
properly if the period gets elapsed there.


thanks,

Takashi
Mark Hills June 18, 2020, 11:07 a.m. UTC | #10
On Thu, 18 Jun 2020, Takashi Iwai wrote:

> On Wed, 17 Jun 2020 12:51:05 +0200,
> Mark Hills wrote:
> > 
> > On Tue, 16 Jun 2020, Takashi Iwai wrote:
> > 
> > > On Tue, 16 Jun 2020 16:01:11 +0200,
> > > Mark Hills wrote:
[...]
> > 
> > /* Update software pointer to match the hardware
> >  *
> >  * \pre chip->lock is held
> >  */
> > static void snd_echo_update_substream_position(struct echoaudio *chip,
> > 					struct snd_pcm_substream *substream)
> > {
> > 	struct snd_pcm_runtime *runtime = substream->runtime;
> > 	struct audiopipe *pipe = runtime->private_data;
> > 	u32 counter, step;
> > 	size_t period_bytes;
> > 
> > 	if (pipe->state != PIPE_STATE_STARTED)
> > 		return;
> > 
> > 	period_bytes = frames_to_bytes(runtime, runtime->period_size);
> > 
> > 	counter = le32_to_cpu(*pipe->dma_counter);
> > 
> > 	step = counter - pipe->last_counter;  /* handles wrapping of counter */
> > 	step -= step % period_bytes;  /* acknowledge whole periods only */
> > 
> > 	if (step == 0)
> > 		return;  /* haven't advanced a whole period yet */
> > 	pipe->last_counter += step;  /* does not always wrap on a period */
> > 	pipe->position += step;
> > 
> > 	/* wraparound the buffer */
> > 	pipe->position %= frames_to_bytes(runtime, runtime->buffer_size);
> > 
> > 	/* notify only once, even if multiple periods elapsed */
> > 	spin_unlock(&chip->lock);
> > 	snd_pcm_period_elapsed(substream);
> > 	spin_lock(&chip->lock);
> > }
> > 
> > static irqreturn_t snd_echo_interrupt(int irq, void *dev_id)
> > {
> > 	struct echoaudio *chip = dev_id;
> > 	int ss, st;
> > 
> > 	spin_lock(&chip->lock);
> > 	st = service_irq(chip);
> > 	if (st < 0) {
> > 		spin_unlock(&chip->lock);
> > 		return IRQ_NONE;
> > 	}
> > 	/* The hardware doesn't tell us which substream caused the irq,
> > 	thus we have to check all running substreams. */
> > 	for (ss = 0; ss < DSP_MAXPIPES; ss++) {
> > 		struct snd_pcm_substream *substream;
> > 
> > 		substream = chip->substream[ss];
> > 		if (substream)
> > 			snd_echo_update_substream_position(chip, substream);
> > 	}
> > 	spin_unlock(&chip->lock);
> > 
> > #ifdef ECHOCARD_HAS_MIDI
> > 	if (st > 0 && chip->midi_in) {
> > 		snd_rawmidi_receive(chip->midi_in, chip->midi_buffer, st);
> > 		dev_dbg(chip->card->dev, "rawmidi_iread=%d\n", st);
> > 	}
> > #endif
> > 	return IRQ_HANDLED;
> > }
> > 
> > static snd_pcm_uframes_t pcm_pointer(struct snd_pcm_substream *substream)
> > {
> > 	struct snd_pcm_runtime *runtime = substream->runtime;
> > 	struct audiopipe *pipe = runtime->private_data;
> > 
> > 	return bytes_to_frames(runtime, pipe->position);
> 
> I guess this misses the update of the precise position; in your code,
> pipe->position gets updated only with the period size at irq handler.
> 
> 
> IMO, we should have the code like:
> 
> static bool update_stream_position(struct snd_pcm_substream *substream)
> {
> 	// update pipe->position and others, returns true if period elapsed
> }
> 
> static irqreturn_t snd_echo_interrupt()
> {
> 	spin_lock(&chip->lock);
> 	....
> 	if (update_stream_position(substream)) {
> 		spin_unlock(&chip->lock);
> 		snd_pcm_period_elapsed(substream);
> 		spin_lock(&chip->lock);
> 	}
> 	....
> 	spin_unlock(&chip->lock);
> 	return IRQ_HANDLED;
> }
> 
> static snd_pcm_uframes_t pcm_pointer(struct snd_pcm_substream *substream)
> {
> 	....
> 	update_stream_position(substream);
> 	return bytes_to_frames(runtime, pipe->position);
> }

Thanks.

I certainly understand this in isolation. 

But could I please ask for help with the bigger picture? As it feels 
mismatched.

* Code should take every possible opportunity to update the stream 
  position; interrupts, or explicit pcm_pointer calls (whereas the docs 
  guide towards doing it in the interrupt handler)

* I critiqued (elsewhere in thread) the older interrupt handler for 
  checking the counter, unlocking, calling back into PCM core and checking 
  again a moment later. Whereas this is considered good behaviour.

* Seems like the overall aim is for userland to be able (if it wants to)  
  to poll the soundcard, even outside of periods.

If all the above is true, I would expect interrupt handling to be very 
simple -- it would straight away call into PCM core, join existing if the 
codepaths (to take locks) and do a position update. PCM core would decide 
if a period really elapsed, not the driver. But this is not how it works.

This now relates strongly to a question of locking:

I ran the code (top of this message) all day, with a few instances in 
dmesg (at irregular intervals, not wrapping):

  [161644.076666] snd_echo3g 0000:09:02.0: invalid position: , pos = 4096, buffer size = 4096, period size = 64
  [163232.392501] snd_echo3g 0000:09:02.0: invalid position: , pos = 4096, buffer size = 4096, period size = 64
  [164976.098069] snd_echo3g 0000:09:02.0: invalid position: , pos = 4096, buffer size = 4096, period size = 64
  [165054.946225] snd_echo3g 0000:09:02.0: invalid position: , pos = 4096, buffer size = 4096, period size = 64
  [165312.141545] snd_echo3g 0000:09:02.0: invalid position: , pos = 4096, buffer size = 4096, period size = 64

A definite bug, of course.

However (and I am happy to be corrected) the function never finishes with 
position == buffer size. Only way is a race between interrupt handler and 
pcm_pointer.

Therefore one of these is needed:

* pcm_pointer locks chip->lock

  Even though the docs emphasise PCM core has exclusivity, it it not worth 
  much as it does not protect against the interrupt handler.

  But now interrupt handler becomes ugly in total: take chip->lock, check 
  the counter, release chip->lock, go to PCM core (which takes middle 
  layer locks), take chip->lock again, check counter again, release 
  chip->lock again.

* interrupt handler must make atomic update of pipe->position

  This could have been a solution, but not if we expect pcm_pointer to 
  also invoke the position update. Now we have a race: both the interrupt 
  handler and pcm_position want to read dma_counter and write 
  pipe->position after.

So either do everthing in interrupt, everything in the pointer callback 
(though there isn't the API for this), but doing both does not seem to 
work well (though probably can be made to work if necessary)

If we can clarify the requirements then I do not think it would be hard 
for me to write the code.

[...]
> Takashi

Thanks again,
Takashi Iwai June 18, 2020, 11:21 a.m. UTC | #11
On Thu, 18 Jun 2020 13:07:33 +0200,
Mark Hills wrote:
> 
> On Thu, 18 Jun 2020, Takashi Iwai wrote:
> 
> > On Wed, 17 Jun 2020 12:51:05 +0200,
> > Mark Hills wrote:
> > > 
> > > On Tue, 16 Jun 2020, Takashi Iwai wrote:
> > > 
> > > > On Tue, 16 Jun 2020 16:01:11 +0200,
> > > > Mark Hills wrote:
> [...]
> > > 
> > > /* Update software pointer to match the hardware
> > >  *
> > >  * \pre chip->lock is held
> > >  */
> > > static void snd_echo_update_substream_position(struct echoaudio *chip,
> > > 					struct snd_pcm_substream *substream)
> > > {
> > > 	struct snd_pcm_runtime *runtime = substream->runtime;
> > > 	struct audiopipe *pipe = runtime->private_data;
> > > 	u32 counter, step;
> > > 	size_t period_bytes;
> > > 
> > > 	if (pipe->state != PIPE_STATE_STARTED)
> > > 		return;
> > > 
> > > 	period_bytes = frames_to_bytes(runtime, runtime->period_size);
> > > 
> > > 	counter = le32_to_cpu(*pipe->dma_counter);
> > > 
> > > 	step = counter - pipe->last_counter;  /* handles wrapping of counter */
> > > 	step -= step % period_bytes;  /* acknowledge whole periods only */
> > > 
> > > 	if (step == 0)
> > > 		return;  /* haven't advanced a whole period yet */
> > > 	pipe->last_counter += step;  /* does not always wrap on a period */
> > > 	pipe->position += step;
> > > 
> > > 	/* wraparound the buffer */
> > > 	pipe->position %= frames_to_bytes(runtime, runtime->buffer_size);
> > > 
> > > 	/* notify only once, even if multiple periods elapsed */
> > > 	spin_unlock(&chip->lock);
> > > 	snd_pcm_period_elapsed(substream);
> > > 	spin_lock(&chip->lock);
> > > }
> > > 
> > > static irqreturn_t snd_echo_interrupt(int irq, void *dev_id)
> > > {
> > > 	struct echoaudio *chip = dev_id;
> > > 	int ss, st;
> > > 
> > > 	spin_lock(&chip->lock);
> > > 	st = service_irq(chip);
> > > 	if (st < 0) {
> > > 		spin_unlock(&chip->lock);
> > > 		return IRQ_NONE;
> > > 	}
> > > 	/* The hardware doesn't tell us which substream caused the irq,
> > > 	thus we have to check all running substreams. */
> > > 	for (ss = 0; ss < DSP_MAXPIPES; ss++) {
> > > 		struct snd_pcm_substream *substream;
> > > 
> > > 		substream = chip->substream[ss];
> > > 		if (substream)
> > > 			snd_echo_update_substream_position(chip, substream);
> > > 	}
> > > 	spin_unlock(&chip->lock);
> > > 
> > > #ifdef ECHOCARD_HAS_MIDI
> > > 	if (st > 0 && chip->midi_in) {
> > > 		snd_rawmidi_receive(chip->midi_in, chip->midi_buffer, st);
> > > 		dev_dbg(chip->card->dev, "rawmidi_iread=%d\n", st);
> > > 	}
> > > #endif
> > > 	return IRQ_HANDLED;
> > > }
> > > 
> > > static snd_pcm_uframes_t pcm_pointer(struct snd_pcm_substream *substream)
> > > {
> > > 	struct snd_pcm_runtime *runtime = substream->runtime;
> > > 	struct audiopipe *pipe = runtime->private_data;
> > > 
> > > 	return bytes_to_frames(runtime, pipe->position);
> > 
> > I guess this misses the update of the precise position; in your code,
> > pipe->position gets updated only with the period size at irq handler.
> > 
> > 
> > IMO, we should have the code like:
> > 
> > static bool update_stream_position(struct snd_pcm_substream *substream)
> > {
> > 	// update pipe->position and others, returns true if period elapsed
> > }
> > 
> > static irqreturn_t snd_echo_interrupt()
> > {
> > 	spin_lock(&chip->lock);
> > 	....
> > 	if (update_stream_position(substream)) {
> > 		spin_unlock(&chip->lock);
> > 		snd_pcm_period_elapsed(substream);
> > 		spin_lock(&chip->lock);
> > 	}
> > 	....
> > 	spin_unlock(&chip->lock);
> > 	return IRQ_HANDLED;
> > }
> > 
> > static snd_pcm_uframes_t pcm_pointer(struct snd_pcm_substream *substream)
> > {
> > 	....
> > 	update_stream_position(substream);
> > 	return bytes_to_frames(runtime, pipe->position);
> > }
> 
> Thanks.
> 
> I certainly understand this in isolation. 
> 
> But could I please ask for help with the bigger picture? As it feels 
> mismatched.
> 
> * Code should take every possible opportunity to update the stream 
>   position; interrupts, or explicit pcm_pointer calls (whereas the docs 
>   guide towards doing it in the interrupt handler)
> 
> * I critiqued (elsewhere in thread) the older interrupt handler for 
>   checking the counter, unlocking, calling back into PCM core and checking 
>   again a moment later. Whereas this is considered good behaviour.
> 
> * Seems like the overall aim is for userland to be able (if it wants to)  
>   to poll the soundcard, even outside of periods.

Right, the user-space can query the current position at any time, and
the driver should return the position as precisely as possible.

Some applications (like PulseAudio) sets the interrupt as minimum as
possible while it does schedule the update by itself, judging the
position via the ioctl.  In such operational mode, the accuracy of the
current position query is vital.

> If all the above is true, I would expect interrupt handling to be very 
> simple -- it would straight away call into PCM core, join existing if the 
> codepaths (to take locks) and do a position update. PCM core would decide 
> if a period really elapsed, not the driver. But this is not how it works.
> 
> This now relates strongly to a question of locking:
> 
> I ran the code (top of this message) all day, with a few instances in 
> dmesg (at irregular intervals, not wrapping):
> 
>   [161644.076666] snd_echo3g 0000:09:02.0: invalid position: , pos = 4096, buffer size = 4096, period size = 64
>   [163232.392501] snd_echo3g 0000:09:02.0: invalid position: , pos = 4096, buffer size = 4096, period size = 64
>   [164976.098069] snd_echo3g 0000:09:02.0: invalid position: , pos = 4096, buffer size = 4096, period size = 64
>   [165054.946225] snd_echo3g 0000:09:02.0: invalid position: , pos = 4096, buffer size = 4096, period size = 64
>   [165312.141545] snd_echo3g 0000:09:02.0: invalid position: , pos = 4096, buffer size = 4096, period size = 64
> 
> A definite bug, of course.
> 
> However (and I am happy to be corrected) the function never finishes with 
> position == buffer size. Only way is a race between interrupt handler and 
> pcm_pointer.
> 
> Therefore one of these is needed:
> 
> * pcm_pointer locks chip->lock
> 
>   Even though the docs emphasise PCM core has exclusivity, it it not worth 
>   much as it does not protect against the interrupt handler.
> 
>   But now interrupt handler becomes ugly in total: take chip->lock, check 
>   the counter, release chip->lock, go to PCM core (which takes middle 
>   layer locks), take chip->lock again, check counter again, release 
>   chip->lock again.

Yes, that's the most robust way to go.  If the lock is really costing
too much, you can consider a fast-path by some flag for the irq ->
snd_pcm_period_elapsed() case.

Basically you can do everything in the pointer callback.  The only
requirement in the interrupt handle is basically judge whether you
need the call of snd_pcm_period_elapsed() and call it.  The rest
update could be done in the other places.

> * interrupt handler must make atomic update of pipe->position
> 
>   This could have been a solution, but not if we expect pcm_pointer to 
>   also invoke the position update. Now we have a race: both the interrupt 
>   handler and pcm_position want to read dma_counter and write 
>   pipe->position after.

As mentioned, the update of position at query is essential in majority
of sound systems.


Takashi

> So either do everthing in interrupt, everything in the pointer callback 
> (though there isn't the API for this), but doing both does not seem to 
> work well (though probably can be made to work if necessary)
> 
> If we can clarify the requirements then I do not think it would be hard 
> for me to write the code.
> 
> [...]
> > Takashi
> 
> Thanks again,
> 
> -- 
> Mark
>
Mark Hills June 18, 2020, 12:29 p.m. UTC | #12
On Thu, 18 Jun 2020, Takashi Iwai wrote:

> On Thu, 18 Jun 2020 13:07:33 +0200,
> Mark Hills wrote:
> > 
> > On Thu, 18 Jun 2020, Takashi Iwai wrote:
> > 
> > > On Wed, 17 Jun 2020 12:51:05 +0200,
> > > Mark Hills wrote:
> > [...]
> > But could I please ask for help with the bigger picture? As it feels 
> > mismatched.
> > 
> > * Code should take every possible opportunity to update the stream 
> >   position; interrupts, or explicit pcm_pointer calls (whereas the docs 
> >   guide towards doing it in the interrupt handler)
> > 
> > * I critiqued (elsewhere in thread) the older interrupt handler for 
> >   checking the counter, unlocking, calling back into PCM core and checking 
> >   again a moment later. Whereas this is considered good behaviour.
> > 
> > * Seems like the overall aim is for userland to be able (if it wants to)  
> >   to poll the soundcard, even outside of periods.
> 
> Right, the user-space can query the current position at any time, and
> the driver should return the position as precisely as possible.
> 
> Some applications (like PulseAudio) sets the interrupt as minimum as
> possible while it does schedule the update by itself, judging the
> position via the ioctl.  In such operational mode, the accuracy of the
> current position query is vital.
> 
> > If all the above is true, I would expect interrupt handling to be very 
> > simple -- it would straight away call into PCM core, join existing if the 
> > codepaths (to take locks) and do a position update. PCM core would decide 
> > if a period really elapsed, not the driver. But this is not how it works.
> > 
> > This now relates strongly to a question of locking:
> > 
> > I ran the code (top of this message) all day, with a few instances in 
> > dmesg (at irregular intervals, not wrapping):
> > 
> >   [161644.076666] snd_echo3g 0000:09:02.0: invalid position: , pos = 4096, buffer size = 4096, period size = 64
> >   [163232.392501] snd_echo3g 0000:09:02.0: invalid position: , pos = 4096, buffer size = 4096, period size = 64
> >   [164976.098069] snd_echo3g 0000:09:02.0: invalid position: , pos = 4096, buffer size = 4096, period size = 64
> >   [165054.946225] snd_echo3g 0000:09:02.0: invalid position: , pos = 4096, buffer size = 4096, period size = 64
> >   [165312.141545] snd_echo3g 0000:09:02.0: invalid position: , pos = 4096, buffer size = 4096, period size = 64
> > 
> > A definite bug, of course.
> > 
> > However (and I am happy to be corrected) the function never finishes with 
> > position == buffer size. Only way is a race between interrupt handler and 
> > pcm_pointer.
> > 
> > Therefore one of these is needed:
> > 
> > * pcm_pointer locks chip->lock
> > 
> >   Even though the docs emphasise PCM core has exclusivity, it it not worth 
> >   much as it does not protect against the interrupt handler.
> > 
> >   But now interrupt handler becomes ugly in total: take chip->lock, check 
> >   the counter, release chip->lock, go to PCM core (which takes middle 
> >   layer locks), take chip->lock again, check counter again, release 
> >   chip->lock again.
> 
> Yes, that's the most robust way to go.  If the lock is really costing
> too much, you can consider a fast-path by some flag for the irq ->
> snd_pcm_period_elapsed() case.

I don't understand how a fast path could be made to work, as it can't pass 
data across snd_pcm_period_elapsed() and it still must syncronise access 
between reading dma_counter and writing pipe->position.

Hence questioning if a better design is simpler interrupt handlers that 
just enter PCM core.
 
> Basically you can do everything in the pointer callback.  The only 
> requirement in the interrupt handle is basically judge whether you need 
> the call of snd_pcm_period_elapsed() and call it.  The rest update could 
> be done in the other places.

Thanks, please just another clarification:

I presume that calls to pcm_pointer are completely independent of the 
period notifications?

ie. period notifications are at regular intervals, regardless of whether 
pcm_pointer is called inbetween. pcm_pointer must not reset any state used 
by the interrupt.

Which means we must handle when non-interrupt call to pcm_pointer causes a 
period to elapse. The next interrupt handler must notify.

I can see in the original code uses chip->last_period exclusively by the 
interrupt handler, and I removed it. Some comments around the intent would 
help. I'll cross reference the original code with my new understanding.

My instinct here is that to preserve

- regular period notifications

- handle period_size not aligning with 32-bit counter

- no races between interrupt and pcm_pointer

that the clearest and bug-free implementation may be to separate the 
interrupt (notifications) and pointer updates to separate state.

Then there's no lock and the only crossover is an atomic read of 
dma_counter.

And that's what I will try -- thanks.
Mark Hills June 18, 2020, 1:22 p.m. UTC | #13
On Thu, 18 Jun 2020, Mark Hills wrote:

> On Thu, 18 Jun 2020, Takashi Iwai wrote:
> 
> > On Thu, 18 Jun 2020 13:07:33 +0200,
> > Mark Hills wrote:
> > > 
> > > On Thu, 18 Jun 2020, Takashi Iwai wrote:
> > > 
> > > > On Wed, 17 Jun 2020 12:51:05 +0200,
> > > > Mark Hills wrote:
> > > [...]
> > > But could I please ask for help with the bigger picture? As it feels 
> > > mismatched.
> > > 
> > > * Code should take every possible opportunity to update the stream 
> > >   position; interrupts, or explicit pcm_pointer calls (whereas the docs 
> > >   guide towards doing it in the interrupt handler)
> > > 
> > > * I critiqued (elsewhere in thread) the older interrupt handler for 
> > >   checking the counter, unlocking, calling back into PCM core and checking 
> > >   again a moment later. Whereas this is considered good behaviour.
> > > 
> > > * Seems like the overall aim is for userland to be able (if it wants to)  
> > >   to poll the soundcard, even outside of periods.
> > 
> > Right, the user-space can query the current position at any time, and
> > the driver should return the position as precisely as possible.
> > 
> > Some applications (like PulseAudio) sets the interrupt as minimum as
> > possible while it does schedule the update by itself, judging the
> > position via the ioctl.  In such operational mode, the accuracy of the
> > current position query is vital.
> > 
> > > If all the above is true, I would expect interrupt handling to be very 
> > > simple -- it would straight away call into PCM core, join existing if the 
> > > codepaths (to take locks) and do a position update. PCM core would decide 
> > > if a period really elapsed, not the driver. But this is not how it works.
> > > 
> > > This now relates strongly to a question of locking:
> > > 
> > > I ran the code (top of this message) all day, with a few instances in 
> > > dmesg (at irregular intervals, not wrapping):
> > > 
> > >   [161644.076666] snd_echo3g 0000:09:02.0: invalid position: , pos = 4096, buffer size = 4096, period size = 64
> > >   [163232.392501] snd_echo3g 0000:09:02.0: invalid position: , pos = 4096, buffer size = 4096, period size = 64
> > >   [164976.098069] snd_echo3g 0000:09:02.0: invalid position: , pos = 4096, buffer size = 4096, period size = 64
> > >   [165054.946225] snd_echo3g 0000:09:02.0: invalid position: , pos = 4096, buffer size = 4096, period size = 64
> > >   [165312.141545] snd_echo3g 0000:09:02.0: invalid position: , pos = 4096, buffer size = 4096, period size = 64
> > > 
> > > A definite bug, of course.
> > > 
> > > However (and I am happy to be corrected) the function never finishes with 
> > > position == buffer size. Only way is a race between interrupt handler and 
> > > pcm_pointer.
> > > 
> > > Therefore one of these is needed:
> > > 
> > > * pcm_pointer locks chip->lock
> > > 
> > >   Even though the docs emphasise PCM core has exclusivity, it it not worth 
> > >   much as it does not protect against the interrupt handler.
> > > 
> > >   But now interrupt handler becomes ugly in total: take chip->lock, check 
> > >   the counter, release chip->lock, go to PCM core (which takes middle 
> > >   layer locks), take chip->lock again, check counter again, release 
> > >   chip->lock again.
> > 
> > Yes, that's the most robust way to go.  If the lock is really costing
> > too much, you can consider a fast-path by some flag for the irq ->
> > snd_pcm_period_elapsed() case.
> 
> I don't understand how a fast path could be made to work, as it can't pass 
> data across snd_pcm_period_elapsed() and it still must syncronise access 
> between reading dma_counter and writing pipe->position.
> 
> Hence questioning if a better design is simpler interrupt handlers that 
> just enter PCM core.
>  
> > Basically you can do everything in the pointer callback.  The only 
> > requirement in the interrupt handle is basically judge whether you need 
> > the call of snd_pcm_period_elapsed() and call it.  The rest update could 
> > be done in the other places.
> 
> Thanks, please just another clarification:
> 
> I presume that calls to pcm_pointer are completely independent of the 
> period notifications?
> 
> ie. period notifications are at regular intervals, regardless of whether 
> pcm_pointer is called inbetween. pcm_pointer must not reset any state used 
> by the interrupt.
> 
> Which means we must handle when non-interrupt call to pcm_pointer causes a 
> period to elapse. The next interrupt handler must notify.
> 
> I can see in the original code uses chip->last_period exclusively by the 
> interrupt handler, and I removed it. Some comments around the intent would 
> help. I'll cross reference the original code with my new understanding.
> 
> My instinct here is that to preserve
> 
> - regular period notifications
> 
> - handle period_size not aligning with 32-bit counter
> 
> - no races between interrupt and pcm_pointer
> 
> that the clearest and bug-free implementation may be to separate the 
> interrupt (notifications) and pointer updates to separate state.
> 
> Then there's no lock and the only crossover is an atomic read of 
> dma_counter.
> 
> And that's what I will try -- thanks.

Ok so the implementation would look something like this below,
which I will run for the rest of the day:

* Clear separation of the period notification from position updates; only 
  syncronising is around dma_counter, no need for locks

* All counting is accumulated to avoid bugs in the cases of wrapping
  and non-alignment

It's easier to see in the end results but of course I'll work on a clear 
diff.

---

/******************************************************************************
	IRQ Handling
******************************************************************************/
/* Check if a period has elapsed since last interrupt
 *
 * Don't make any updates to state; PCM core handles this with the
 * correct locks.
 *
 * \return true if a period has elapsed, otherwise false
 */
static bool period_has_elapsed(struct snd_pcm_substream *substream)
{
	struct snd_pcm_runtime *runtime = substream->runtime;
	struct audiopipe *pipe = runtime->private_data;
	u32 counter, step;
	size_t period_bytes;

	if (pipe->state != PIPE_STATE_STARTED)
		return false;

	period_bytes = frames_to_bytes(runtime, runtime->period_size);

	counter = le32_to_cpu(*pipe->dma_counter);  /* presumed atomic */

	step = counter - pipe->last_period;  /* handles wrapping */
	step -= step % period_bytes;  /* acknowledge whole periods only */

	if (step == 0)
		return false;  /* haven't advanced a whole period yet */

	pipe->last_period += step;  /* used exclusively by us */
	return true;
}

static irqreturn_t snd_echo_interrupt(int irq, void *dev_id)
{
	struct echoaudio *chip = dev_id;
	int ss, st;

	spin_lock(&chip->lock);
	st = service_irq(chip);
	if (st < 0) {
		spin_unlock(&chip->lock);
		return IRQ_NONE;
	}
	/* The hardware doesn't tell us which substream caused the irq,
	thus we have to check all running substreams. */
	for (ss = 0; ss < DSP_MAXPIPES; ss++) {
		struct snd_pcm_substream *substream;

		substream = chip->substream[ss];
		if (substream && period_has_elapsed(substream)) {
			spin_unlock(&chip->lock);
			snd_pcm_period_elapsed(substream);
			spin_lock(&chip->lock);
		}
	}
	spin_unlock(&chip->lock);

#ifdef ECHOCARD_HAS_MIDI
	if (st > 0 && chip->midi_in) {
		snd_rawmidi_receive(chip->midi_in, chip->midi_buffer, st);
		dev_dbg(chip->card->dev, "rawmidi_iread=%d\n", st);
	}
#endif
	return IRQ_HANDLED;
}


static snd_pcm_uframes_t pcm_pointer(struct snd_pcm_substream *substream)
{
	struct snd_pcm_runtime *runtime = substream->runtime;
	struct audiopipe *pipe = runtime->private_data;
	u32 counter, step;

	/*
	 * IRQ handling runs concurrently. Do not share tracking of
	 * counter with it, which would race or require locking
	 */

	counter = le32_to_cpu(*pipe->dma_counter);  /* presumed atomic */

	step = counter - pipe->last_counter;  /* handles wrapping */
	pipe->last_counter = counter;

	/* counter doesn't neccessarily wrap on a multiple of
	 * buffer_size, so can't derive the position; must
	 * accumulate */

	pipe->position += step;
	pipe->position %= frames_to_bytes(runtime, runtime->buffer_size); /* wrap */

	return bytes_to_frames(runtime, pipe->position);
}
Giuliano Pochini June 19, 2020, 7:56 p.m. UTC | #14
On Wed, 17 Jun 2020 12:14:42 +0100 (BST)
Mark Hills <mark@xwax.org> wrote:

> On Wed, 17 Jun 2020, Giuliano Pochini wrote:
> [...]
> > -	pipe->position += step;  /* bytes */
> > -
> > -	buffer_bytes = frames_to_bytes(runtime, runtime->buffer_size);
> > -
> > -	while (pipe->position >= buffer_bytes)
> > -		pipe->position -= buffer_bytes;
> > -
> >  	return 1;
> 
> I think this risks returning to a case where it concludes nothing
> advances if the counter advances by a whole buffer?

Yes, it can, but you can detect that case checking for step >= period_bytes.


> You might be able to do the comparison before wrapping pipe_position, but 
> hopefully you'll consider my patch in reply to Takashi has more clarity.

Your patch is very interesting. I didn't take into account the idea of
advancing the position by full periods only. If the PCM subsystem hasn't
changed much since I last checked (I wrote the driver many years ago), it
should work fine (and I'm sure you tested it). But I don't know if
something else requires better resolution.
Mark Hills June 19, 2020, 9:21 p.m. UTC | #15
On Fri, 19 Jun 2020, Giuliano Pochini wrote:

> On Wed, 17 Jun 2020 12:14:42 +0100 (BST)
> Mark Hills <mark@xwax.org> wrote:
> 
[...]
> > You might be able to do the comparison before wrapping pipe_position, 
> > but hopefully you'll consider my patch in reply to Takashi has more 
> > clarity.
> 
> Your patch is very interesting. I didn't take into account the idea of 
> advancing the position by full periods only. If the PCM subsystem hasn't 
> changed much since I last checked (I wrote the driver many years ago), 
> it should work fine (and I'm sure you tested it). But I don't know if 
> something else requires better resolution.

It's funny, but I didn't take account of the opposite; that there was any 
merits to polling inbetween the interrupts for better resolution.

Takashi pointed out the need for this and we had some discussion. Check 
the other thread, where I provided a newer revision of the code.

The good thing is I think we can have all the things we want and be bug 
free, just I have to understand the specification.

It would be great if you would like to take a look at the newer code for 
any problems you can see. I was going to run it for a few days then turn 
it into some patches.
Giuliano Pochini June 28, 2020, 10:02 p.m. UTC | #16
On Fri, 19 Jun 2020 22:21:54 +0100 (BST)
Mark Hills <mark@xwax.org> wrote:

> On Fri, 19 Jun 2020, Giuliano Pochini wrote:
> 
> > On Wed, 17 Jun 2020 12:14:42 +0100 (BST)
> > Mark Hills <mark@xwax.org> wrote:
> > 
> [...]
> > > You might be able to do the comparison before wrapping pipe_position, 
> > > but hopefully you'll consider my patch in reply to Takashi has more 
> > > clarity.
> > 
> > Your patch is very interesting. I didn't take into account the idea of 
> > advancing the position by full periods only. If the PCM subsystem
> > hasn't changed much since I last checked (I wrote the driver many years
> > ago), it should work fine (and I'm sure you tested it). But I don't
> > know if something else requires better resolution.
> 
> It's funny, but I didn't take account of the opposite; that there was any 
> merits to polling inbetween the interrupts for better resolution.
> 
> Takashi pointed out the need for this and we had some discussion. Check 
> the other thread, where I provided a newer revision of the code.
> 
> The good thing is I think we can have all the things we want and be bug 
> free, just I have to understand the specification.
> 
> It would be great if you would like to take a look at the newer code for 
> any problems you can see. I was going to run it for a few days then turn 
> it into some patches.

I looked at your code and I think it's OK. I'm using it for some days
without any problem. I also stressed it with pretty tight timings and it
worked fine all the time.

Since I could not reproduce that problem before, except in some rare random
circumstances, I'm not a good tester at all. At most I can say that your
patch does not make things worse :)
Mark Hills July 1, 2020, 12:25 p.m. UTC | #17
On Mon, 29 Jun 2020, Giuliano Pochini wrote:

> On Fri, 19 Jun 2020 22:21:54 +0100 (BST)
> Mark Hills <mark@xwax.org> wrote:
> 
> > On Fri, 19 Jun 2020, Giuliano Pochini wrote:
> > 
> > > On Wed, 17 Jun 2020 12:14:42 +0100 (BST)
> > > Mark Hills <mark@xwax.org> wrote:
> > > 
> > [...]
> > > > You might be able to do the comparison before wrapping pipe_position, 
> > > > but hopefully you'll consider my patch in reply to Takashi has more 
> > > > clarity.
> > > 
> > > Your patch is very interesting. I didn't take into account the idea of 
> > > advancing the position by full periods only. If the PCM subsystem
> > > hasn't changed much since I last checked (I wrote the driver many years
> > > ago), it should work fine (and I'm sure you tested it). But I don't
> > > know if something else requires better resolution.
> > 
> > It's funny, but I didn't take account of the opposite; that there was any 
> > merits to polling inbetween the interrupts for better resolution.
> > 
> > Takashi pointed out the need for this and we had some discussion. Check 
> > the other thread, where I provided a newer revision of the code.
> > 
> > The good thing is I think we can have all the things we want and be bug 
> > free, just I have to understand the specification.
> > 
> > It would be great if you would like to take a look at the newer code for 
> > any problems you can see. I was going to run it for a few days then turn 
> > it into some patches.
> 
> I looked at your code and I think it's OK. I'm using it for some days 
> without any problem. I also stressed it with pretty tight timings and it 
> worked fine all the time.
> 
> Since I could not reproduce that problem before, except in some rare 
> random circumstances, I'm not a good tester at all. At most I can say 
> that your patch does not make things worse :)

What software are you using on the device, and are you using x86_64 and 
dmix?

I think some issues might be exaggerated by dmix which has a unique way of 
opening the device several times. And then chromium exercises dmix a lot 
with all of its threads/forks. That would I presume be how it exercises 
races between pcm_pointer and interrupts.
Giuliano Pochini July 1, 2020, 2:51 p.m. UTC | #18
On Wed, 1 Jul 2020 13:25:07 +0100 (BST)
Mark Hills <mark@xwax.org> wrote:

> On Mon, 29 Jun 2020, Giuliano Pochini wrote:
> 
> > Since I could not reproduce that problem before, except in some rare 
> > random circumstances, I'm not a good tester at all. At most I can say 
> > that your patch does not make things worse :)
> 
> What software are you using on the device, and are you using x86_64 and 
> dmix?
> 
> I think some issues might be exaggerated by dmix which has a unique way
> of opening the device several times. And then chromium exercises dmix a
> lot with all of its threads/forks. That would I presume be how it
> exercises races between pcm_pointer and interrupts.

x86_64 now. When I wrote the driver I had a powermac. I also can test it on
a x86_32 laptop with an Indigo-IO card.

I tested both plughw and dmix, but I don't use the latter often.
diff mbox series

Patch

diff --git a/sound/pci/echoaudio/echoaudio.c b/sound/pci/echoaudio/echoaudio.c
index 8cf08988959f..12217649fb44 100644
--- a/sound/pci/echoaudio/echoaudio.c
+++ b/sound/pci/echoaudio/echoaudio.c
@@ -2,6 +2,7 @@ 
 /*
  *  ALSA driver for Echoaudio soundcards.
  *  Copyright (C) 2003-2004 Giuliano Pochini <pochini@shiny.it>
+ *  Copyright (C) 2020 Mark Hills <mark@xwax.org>
  */
 
 #include <linux/module.h>
@@ -590,7 +591,6 @@  static int init_engine(struct snd_pcm_substream *substream,
 	/* This stuff is used by the irq handler, so it must be
 	 * initialized before chip->substream
 	 */
-	chip->last_period[pipe_index] = 0;
 	pipe->last_counter = 0;
 	pipe->position = 0;
 	smp_wmb();
@@ -759,7 +759,6 @@  static int pcm_trigger(struct snd_pcm_substream *substream, int cmd)
 				pipe = chip->substream[i]->runtime->private_data;
 				switch (pipe->state) {
 				case PIPE_STATE_STOPPED:
-					chip->last_period[i] = 0;
 					pipe->last_counter = 0;
 					pipe->position = 0;
 					*pipe->dma_counter = 0;
@@ -807,19 +806,8 @@  static snd_pcm_uframes_t pcm_pointer(struct snd_pcm_substream *substream)
 {
 	struct snd_pcm_runtime *runtime = substream->runtime;
 	struct audiopipe *pipe = runtime->private_data;
-	size_t cnt, bufsize, pos;
 
-	cnt = le32_to_cpu(*pipe->dma_counter);
-	pipe->position += cnt - pipe->last_counter;
-	pipe->last_counter = cnt;
-	bufsize = substream->runtime->buffer_size;
-	pos = bytes_to_frames(substream->runtime, pipe->position);
-
-	while (pos >= bufsize) {
-		pipe->position -= frames_to_bytes(substream->runtime, bufsize);
-		pos -= bufsize;
-	}
-	return pos;
+	return bytes_to_frames(runtime, pipe->position);
 }
 
 
@@ -1782,14 +1770,50 @@  static const struct snd_kcontrol_new snd_echo_channels_info = {
 
 
 /******************************************************************************
-	IRQ Handler
+	IRQ Handling
 ******************************************************************************/
 
+/* Update software pointer to match the hardware
+ *
+ * Return: 1 if we crossed a period threshold, otherwise 0
+ */
+static int snd_echo_poll_substream(struct snd_pcm_substream *substream)
+{
+	struct snd_pcm_runtime *runtime = substream->runtime;
+	struct audiopipe *pipe = runtime->private_data;
+	unsigned counter, step, period, last_period;
+	size_t buffer_bytes;
+
+	if (pipe->state != PIPE_STATE_STARTED)
+		return 0;
+
+	counter = le32_to_cpu(*pipe->dma_counter);
+
+	period = bytes_to_frames(runtime, counter)
+		/ runtime->period_size;
+	last_period = bytes_to_frames(runtime, pipe->last_counter)
+		/ runtime->period_size;
+
+	if (period == last_period)
+		return 0;  /* don't do any work yet */
+
+	step = counter - pipe->last_counter;
+	pipe->last_counter = counter;
+
+	pipe->position += step;  /* bytes */
+
+	buffer_bytes = frames_to_bytes(runtime, runtime->buffer_size);
+
+	while (pipe->position >= buffer_bytes)
+		pipe->position -= buffer_bytes;
+
+	return 1;
+}
+
 static irqreturn_t snd_echo_interrupt(int irq, void *dev_id)
 {
 	struct echoaudio *chip = dev_id;
-	struct snd_pcm_substream *substream;
-	int period, ss, st;
+	int ss, st;
 
 	spin_lock(&chip->lock);
 	st = service_irq(chip);
@@ -1800,18 +1824,18 @@  static irqreturn_t snd_echo_interrupt(int irq, void *dev_id)
 	/* The hardware doesn't tell us which substream caused the irq,
 	thus we have to check all running substreams. */
 	for (ss = 0; ss < DSP_MAXPIPES; ss++) {
+		struct snd_pcm_substream *substream;
+
 		substream = chip->substream[ss];
-		if (substream && ((struct audiopipe *)substream->runtime->
-				private_data)->state == PIPE_STATE_STARTED) {
-			period = pcm_pointer(substream) /
-				substream->runtime->period_size;
-			if (period != chip->last_period[ss]) {
-				chip->last_period[ss] = period;
-				spin_unlock(&chip->lock);
-				snd_pcm_period_elapsed(substream);
-				spin_lock(&chip->lock);
-			}
-		}
+		if (!substream)
+			continue;
+
+		if (!snd_echo_poll_substream(substream))
+			continue;
+
+		spin_unlock(&chip->lock);
+		snd_pcm_period_elapsed(substream);
+		spin_lock(&chip->lock);
 	}
 	spin_unlock(&chip->lock);
 
diff --git a/sound/pci/echoaudio/echoaudio.h b/sound/pci/echoaudio/echoaudio.h
index 6fd283e4e676..7ff5d4de6880 100644
--- a/sound/pci/echoaudio/echoaudio.h
+++ b/sound/pci/echoaudio/echoaudio.h
@@ -332,7 +332,6 @@  struct audioformat {
 struct echoaudio {
 	spinlock_t lock;
 	struct snd_pcm_substream *substream[DSP_MAXPIPES];
-	int last_period[DSP_MAXPIPES];
 	struct mutex mode_mutex;
 	u16 num_digital_modes, digital_mode_list[6];
 	u16 num_clock_sources, clock_source_list[10];