
[08/11] ALSA: vsnd: Add timer for period interrupt emulation

Message ID 1502091796-14413-9-git-send-email-andr2000@gmail.com (mailing list archive)
State New, archived

Commit Message

Oleksandr Andrushchenko Aug. 7, 2017, 7:43 a.m. UTC
From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>

The front sound driver has no real interrupts, so the
playback/capture period-elapsed interrupt needs to be emulated;
this is done via a timer. Add the required timer operations,
based on sound/drivers/dummy.c.

Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
---
 sound/drivers/xen-front.c | 121 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 121 insertions(+)

Comments

Clemens Ladisch Aug. 7, 2017, 10:27 a.m. UTC | #1
Oleksandr Andrushchenko wrote:
> Front sound driver has no real interrupts, so
> playback/capture period passed interrupt needs to be emulated:
> this is done via timer. Add required timer operations,
> this is based on sound/drivers/dummy.c.

A 'real' sound card uses the interrupt to synchronize the stream position
between the hardware and the driver.  The hardware triggers an interrupt
immediately after a period has been completely read (for playback) from
the ring buffer by the DMA unit; this tells the driver that it is now
again allowed to write to that part of the buffer.

The dummy driver has no hardware that accesses the buffer, so the period
interrupts are not synchronized to anything.  This is not a suitable
implementation when the samples are actually used.

If you issue interrupts based on the system timer, the position reported
by the .pointer callback and the position where the hardware (backend)
actually accesses the buffer will diverge, which will eventually corrupt
data.

You have to implement period interrupts (and the .pointer callback)
based on when the samples are actually moved from/to the backend.


Regards,
Clemens
Oleksandr Andrushchenko Aug. 7, 2017, 11:30 a.m. UTC | #2
Hi, Clemens!

On 08/07/2017 01:27 PM, Clemens Ladisch wrote:
> Oleksandr Andrushchenko wrote:
>> Front sound driver has no real interrupts, so
>> playback/capture period passed interrupt needs to be emulated:
>> this is done via timer. Add required timer operations,
>> this is based on sound/drivers/dummy.c.
> A 'real' sound card uses the interrupt to synchronize the stream position
> between the hardware and the driver.  The hardware triggers an interrupt
> immediately after a period has been completely read (for playback) from
> the ring buffer by the DMA unit; this tells the driver that it is now
> again allowed to write to that part of the buffer.
Yes, I know that, thank you
> The dummy driver has no hardware that accesses the buffer, so the period
> interrupts are not synchronized to anything.
Exactly
>    This is not a suitable
> implementation when the samples are actually used.

> If you issue interrupts based on the system timer, the position reported
> by the .pointer callback and the position where the hardware (backend)
> actually accesses the buffer will diverge, which will eventually corrupt
> data.
Makes sense, but in my case the buffer from the frontend
is copied into the backend's memory, so they don't share the
same buffer as real hardware does. But it is still possible that
a new portion of data arrives and the backend overwrites
memory which hasn't been played yet, because the pointers are not
synchronized.
> You have to implement period interrupts (and the .pointer callback)
> based on when the samples are actually moved from/to the backend.
Do you think I could implement this in a slightly different way,
without a timer at all, by updating
substream->runtime->hw_ptr_base explicitly in the frontend driver?
This is how it was implemented in [1], see virtualcard_pcm_pointer
(unfortunately, that driver didn't make it into the kernel).
That way, whenever I get an ack/response from the backend that it has
successfully played the buffer, I can update hw_ptr_base in the frontend
and thus always stay in sync with the backend.
> Regards,
> Clemens
Thank you,
Oleksandr

[1] http://marc.info/?l=xen-devel&m=142185395013970&w=4
Clemens Ladisch Aug. 7, 2017, 1:11 p.m. UTC | #3
Oleksandr Andrushchenko wrote:
> On 08/07/2017 01:27 PM, Clemens Ladisch wrote:
>> You have to implement period interrupts (and the .pointer callback)
>> based on when the samples are actually moved from/to the backend.
>
> Do you think I can implement this in a slightly different way,
> without a timer at all, by updating
> substream->runtime->hw_ptr_base explicitly in the frontend driver?

As far as I am aware, hw_ptr_base is an internal field that drivers
are not supposed to change.

Just use your own variable, and return it from the .pointer callback.
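The suggestion above can be sketched in plain C. This is a minimal user-space illustration, not kernel code: the names (`stream_pos`, `stream_ack`, `stream_pointer`) are hypothetical, and in the real driver the state would live in the stream info struct, with `stream_pointer()` being the body of the `.pointer` callback (which returns frames).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical driver-owned position state; illustration only. */
struct stream_pos {
	size_t hw_ptr;      /* frames consumed by the backend, modulo buffer */
	size_t buffer_size; /* ring buffer size in frames */
};

/* Called when the backend acks that 'frames' frames were copied/played. */
static void stream_ack(struct stream_pos *s, size_t frames)
{
	s->hw_ptr = (s->hw_ptr + frames) % s->buffer_size;
}

/* What a .pointer callback would return, in frames. */
static size_t stream_pointer(const struct stream_pos *s)
{
	return s->hw_ptr;
}
```

The point is that the position is advanced only on backend acks, never touched via hw_ptr_base, so the reported pointer cannot diverge from where data was actually consumed.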

> So, that way, whenever I get an ack/response from the backend that it has
> successfully played the buffer

That response should come after every period.

How does that interface work?  Is it possible to change the period size,
or at least to detect what it is?


Regards,
Clemens
Oleksandr Andrushchenko Aug. 7, 2017, 1:38 p.m. UTC | #4
On 08/07/2017 04:11 PM, Clemens Ladisch wrote:
> Oleksandr Andrushchenko wrote:
>> On 08/07/2017 01:27 PM, Clemens Ladisch wrote:
>>> You have to implement period interrupts (and the .pointer callback)
>>> based on when the samples are actually moved from/to the backend.
>> Do you think I can implement this in a slightly different way,
>> without a timer at all, by updating
>> substream->runtime->hw_ptr_base explicitly in the frontend driver?
> As far as I am aware, hw_ptr_base is an internal field that drivers
> are not supposed to change.
I know that and never considered it a good solution;
this is why I have a timer to emulate things.
> Just use your own variable, and return it from the .pointer callback.
This can work, but see below.
>> So, that way, whenever I get an ack/response from the backend that it has
>> successfully played the buffer
> That response should come after every period.

> How does that interface work?
For the buffer received in .copy_user/.copy_kernel we send
a request to the backend and get a response back (async) when it has copied
the bytes into the HW/mixer/etc., so the buffer on the frontend side can be
reused. The amount of bytes in this exchange is not necessarily
a multiple of the period size. Also, there is no way to synchronize the
period sizes of the frontend driver and the backend to make them equal.
There is also no event from the backend in the
protocol to signal that a period has elapsed, so
sending data in period-sized buffers will probably not
help because of possible underruns.
>   Is it possible to change the period size,
> or at least to detect what it is?

Unfortunately no, this is not in the protocol.

>
>
> Regards,
> Clemens
you can see the protocol at [1]

Thank you,
Oleksandr

[1] 
https://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound.git/tree/include/xen/interface/io/sndif.h?h=for-next
Clemens Ladisch Aug. 7, 2017, 1:55 p.m. UTC | #5
Oleksandr Andrushchenko wrote:
> On 08/07/2017 04:11 PM, Clemens Ladisch wrote:
>> How does that interface work?
>
> For the buffer received in .copy_user/.copy_kernel we send
> a request to the backend and get response back (async) when it has copied
> the bytes into HW/mixer/etc, so the buffer at frontend side can be reused.

So if the frontend sends too many (too large) requests, does the
backend wait until there is enough free space in the buffer before
it does the actual copying and then acks?

If yes, then these acks can be used as interrupts.  (You still
have to count frames, and call snd_pcm_period_elapsed() exactly
when a period boundary was reached or crossed.)
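The frame counting described above can be sketched as a small pure function. This is an illustrative sketch, not part of the patch; `period_counter` and `periods_elapsed` are made-up names, and in a driver the caller would invoke snd_pcm_period_elapsed() when the returned count is nonzero.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-stream counter; illustration only. */
struct period_counter {
	size_t period_size; /* frames per period */
	size_t in_period;   /* frames accumulated since the last boundary */
};

/*
 * Account for 'frames' more frames acked by the backend and return how
 * many period boundaries were reached or crossed.  An ack larger than
 * one period naturally yields a count > 1.
 */
static unsigned int periods_elapsed(struct period_counter *c, size_t frames)
{
	unsigned int n = 0;

	c->in_period += frames;
	while (c->in_period >= c->period_size) {
		c->in_period -= c->period_size;
		n++;
	}
	return n;
}
```

This also shows why acks need not align with periods: the counter carries the remainder across acks, so boundaries are detected exactly regardless of the request granularity.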

Splitting a large read/write into smaller requests to the backend
would improve the granularity of the known stream position.

The overall latency would be the sum of the sizes of the frontend
and backend buffers.


Why is the protocol designed this way?  Wasn't the goal to expose
some 'real' sound card?


Regards,
Clemens
Oleksandr Andrushchenko Aug. 7, 2017, 3:14 p.m. UTC | #6
On 08/07/2017 04:55 PM, Clemens Ladisch wrote:
> Oleksandr Andrushchenko wrote:
>> On 08/07/2017 04:11 PM, Clemens Ladisch wrote:
>>> How does that interface work?
>> For the buffer received in .copy_user/.copy_kernel we send
>> a request to the backend and get response back (async) when it has copied
>> the bytes into HW/mixer/etc, so the buffer at frontend side can be reused.
> So if the frontend sends too many (too large) requests, does the
> backend wait until there is enough free space in the buffer before
> it does the actual copying and then acks?
Well, the frontend should be backend agnostic.
In our implementation the backend is a user-space application which sits
either on top of an ALSA driver or PulseAudio, so it acks correspondingly,
e.g. when the ALSA driver completes .copy_user and returns
from the kernel.
> If yes, then these acks can be used as interrupts.
We can probably teach our backend to track elapsed periods for ALSA,
but I am not sure whether this is possible for PulseAudio - do you know
if this is also doable for pulse?

Let's assume the backend blocks until the buffer is played/consumed...
>    (You still
> have to count frames, and call snd_pcm_period_elapsed() exactly
> when a period boundary was reached or crossed.)
... and what if the buffer holds multiple periods, so that the backend sends
a single response for multiple periods (buffers with a fractional number
of periods can be handled separately)?
We would have to either call snd_pcm_period_elapsed once (wrong, because
multiple periods were consumed) or multiple times back to back with no
delay (wrong, because multiple periods would go unreported for quite
some time and then arrive as a burst of events).
Either way the behavior will not be the desired one (please correct me
if I am wrong here).
>
> Splitting a large read/write into smaller requests to the backend
> would improve the granularity of the known stream position.
>
> The overall latency would be the sum of the sizes of the frontend
> and backend buffers.
>
>
> Why is the protocol designed this way?
We also work on para-virtualizing a display device, and there we tried to
use page-flip events from the backend to the frontend, which play a role
similar to period interrupts for audio. When multiple displays (read:
multiple audio streams) were in place, we were flooded with system
interrupts (which are period events in our case)
and performance dropped significantly. This is why we switched to
interrupt emulation, here via a timer for audio. The main measures were:
1. Number of events between front and back
2. Latency
With the timer approach we reduce 1) to the minimum which is a must (no
period interrupts), but 2) is still there.
With emulated period interrupts (protocol events) we have an issue with 1),
and 2) still remains.

So, to me, neither approach solves the problem 100%, so we decided
to stick with timers. I hope this gives more background on why we did
things the way we did.
>   Wasn't the goal to expose
> some 'real' sound card?
>
Yes, but it can be implemented in different ways, please see above.
> Regards,
> Clemens
Thank you for your interest,
Oleksandr
Oleksandr Andrushchenko Aug. 8, 2017, 6:09 a.m. UTC | #7
On 08/07/2017 06:14 PM, Oleksandr Andrushchenko wrote:
>
> On 08/07/2017 04:55 PM, Clemens Ladisch wrote:
>> Oleksandr Andrushchenko wrote:
>>> On 08/07/2017 04:11 PM, Clemens Ladisch wrote:
>>>> How does that interface work?
>>> For the buffer received in .copy_user/.copy_kernel we send
>>> a request to the backend and get response back (async) when it has 
>>> copied
>>> the bytes into HW/mixer/etc, so the buffer at frontend side can be 
>>> reused.
>> So if the frontend sends too many (too large) requests, does the
>> backend wait until there is enough free space in the buffer before
>> it does the actual copying and then acks?
> Well, the frontend should be backend agnostic,
> In our implementation backend is a user-space application which sits
> either on top of ALSA driver or PulseAudio: so, it acks correspondingly,
> e.g, when, for example, ALSA driver completes .copy_user and returns
> from the kernel
>> If yes, then these acks can be used as interrupts.
> we can probably teach our backend to track periods elapsed for ALSA,
> but not sure if it is possible for PulseAudio - do you know if this is 
> also
> doable for pulse?
>
> Let's assume backend blocks until the buffer played/consumed...
>>    (You still
>> have to count frames, and call snd_pcm_period_elapsed() exactly
>> when a period boundary was reached or crossed.)
> ... and what if the buffer has multiple periods? So, that the backend 
> sends
> a single response for multiple periods (buffers with fractional period 
> number
> can be handled separately)?
> We will have to either send snd_pcm_period_elapsed once (wrong, because
> multiple periods consumed) or multiple times at one time with no delay 
> (wrong,
> because there will be a confusion that multiple periods were not 
> reported for quite
> some long time and then there is a burst of events)
> Either way the behavior will not be the one desired (please correct me
> if I am wrong here)
>>
>> Splitting a large read/write into smaller requests to the backend
>> would improve the granularity of the known stream position.
>>
>> The overall latency would be the sum of the sizes of the frontend
>> and backend buffers.
>>
>>
>> Why is the protocol designed this way?
> We also work on para-virtualizing display device and there we tried to 
> use
> page flip events from backend to frontend to signal similar to
> period interrupt for audio. When multiple displays (read multiple 
> audio streams)
> were in place we flooded with the system interrupts (which are period 
> events in our case)
> and performance dropped significantly. This is why we switched to
> interrupt emulation, here via timer for audio. The main measures were:
> 1. Number of events between front and back
> 2. Latency
> With timer approach we reduce 1) to the minimum which is a must (no 
> period
> interrupts), but 2) is still here
> With emulated period interrupts (protocol events) we have issue with 1)
> and still 2) remains.
>
BTW, there is one more approach to solving this [1],
but it uses its own Xen sound protocol and relies heavily
on the Linux implementation, so it cannot be part of a generic protocol.
> So, to me, neither approach solves the problem for 100%, so we decided
> to stick to timers. Hope, this gives more background on why we did things
> the way we did.
>>   Wasn't the goal to expose
>> some 'real' sound card?
>>
> yes, but it can be implemented in different ways, please see above
>> Regards,
>> Clemens
> Thank you for your interest,
> Oleksandr

[1] 
https://github.com/OpenXT/pv-linux-drivers/blob/master/archive/openxt-audio/main.c#L356
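The fractional arithmetic used by sdrv_alsa_timer_update() in the patch below can be sanity-checked outside the kernel. The sketch mirrors the patch's logic as a pure function: positions are kept in units of frames * HZ so jiffies deltas accumulate without division. `MY_HZ` is a stand-in for the kernel's HZ, and `frac_state`/`frac_update` are illustrative names, not kernel APIs.

```c
#include <assert.h>

#define MY_HZ 100 /* stand-in for the kernel HZ */

/* Mirror of the timer state from the patch; illustration only. */
struct frac_state {
	unsigned long base_time;       /* jiffies at last update */
	unsigned int frac_pos;         /* buffer position * HZ */
	unsigned int frac_period_rest; /* frames*HZ left until next boundary */
	unsigned int frac_buffer_size; /* buffer_size * HZ */
	unsigned int frac_period_size; /* period_size * HZ */
	unsigned int rate;             /* frames per second */
	int elapsed;                   /* period boundaries crossed */
};

/* Same steps as sdrv_alsa_timer_update(), with 'now' passed in. */
static void frac_update(struct frac_state *s, unsigned long now)
{
	unsigned long delta = now - s->base_time;

	if (!delta)
		return;
	s->base_time += delta;
	delta *= s->rate;                 /* jiffies -> frames*HZ */
	s->frac_pos += delta;
	while (s->frac_pos >= s->frac_buffer_size)
		s->frac_pos -= s->frac_buffer_size;
	while (s->frac_period_rest <= delta) {
		s->elapsed++;
		s->frac_period_rest += s->frac_period_size;
	}
	s->frac_period_rest -= delta;
}
```

With rate = 48000 and MY_HZ = 100, a 4800-frame period lasts exactly 10 jiffies, so whole-period boundaries land exactly on timer ticks; the `.pointer` value in frames is `frac_pos / HZ`, which is how the divergence Clemens describes creeps in when the backend's real consumption rate drifts from this model.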

Patch

diff --git a/sound/drivers/xen-front.c b/sound/drivers/xen-front.c
index 9f31e6832086..507c5eb343c8 100644
--- a/sound/drivers/xen-front.c
+++ b/sound/drivers/xen-front.c
@@ -67,12 +67,29 @@  struct sh_buf_info {
 	size_t vbuffer_sz;
 };
 
+struct sdev_alsa_timer_info {
+	spinlock_t lock;
+	struct timer_list timer;
+	unsigned long base_time;
+	/* fractional sample position (based HZ) */
+	unsigned int frac_pos;
+	unsigned int frac_period_rest;
+	/* buffer_size * HZ */
+	unsigned int frac_buffer_size;
+	/* period_size * HZ */
+	unsigned int frac_period_size;
+	unsigned int rate;
+	int elapsed;
+	struct snd_pcm_substream *substream;
+};
+
 struct sdev_pcm_stream_info {
 	int unique_id;
 	struct snd_pcm_hardware pcm_hw;
 	struct xdrv_evtchnl_info *evt_chnl;
 	bool is_open;
 	uint8_t req_next_id;
+	struct sdev_alsa_timer_info timer;
 	struct sh_buf_info sh_buf;
 };
 
@@ -148,6 +165,110 @@  static struct sdev_pcm_stream_info *sdrv_stream_get(
 	return stream;
 }
 
+static inline void sdrv_alsa_timer_rearm(struct sdev_alsa_timer_info *dpcm)
+{
+	mod_timer(&dpcm->timer, jiffies +
+		(dpcm->frac_period_rest + dpcm->rate - 1) / dpcm->rate);
+}
+
+static void sdrv_alsa_timer_update(struct sdev_alsa_timer_info *dpcm)
+{
+	unsigned long delta;
+
+	delta = jiffies - dpcm->base_time;
+	if (!delta)
+		return;
+	dpcm->base_time += delta;
+	delta *= dpcm->rate;
+	dpcm->frac_pos += delta;
+	while (dpcm->frac_pos >= dpcm->frac_buffer_size)
+		dpcm->frac_pos -= dpcm->frac_buffer_size;
+	while (dpcm->frac_period_rest <= delta) {
+		dpcm->elapsed++;
+		dpcm->frac_period_rest += dpcm->frac_period_size;
+	}
+	dpcm->frac_period_rest -= delta;
+}
+
+static int sdrv_alsa_timer_start(struct snd_pcm_substream *substream)
+{
+	struct sdev_pcm_stream_info *stream = sdrv_stream_get(substream);
+	struct sdev_alsa_timer_info *dpcm = &stream->timer;
+
+	spin_lock(&dpcm->lock);
+	dpcm->base_time = jiffies;
+	sdrv_alsa_timer_rearm(dpcm);
+	spin_unlock(&dpcm->lock);
+	return 0;
+}
+
+static int sdrv_alsa_timer_stop(struct snd_pcm_substream *substream)
+{
+	struct sdev_pcm_stream_info *stream = sdrv_stream_get(substream);
+	struct sdev_alsa_timer_info *dpcm = &stream->timer;
+
+	spin_lock(&dpcm->lock);
+	del_timer(&dpcm->timer);
+	spin_unlock(&dpcm->lock);
+	return 0;
+}
+
+static int sdrv_alsa_timer_prepare(struct snd_pcm_substream *substream)
+{
+	struct snd_pcm_runtime *runtime = substream->runtime;
+	struct sdev_pcm_stream_info *stream = sdrv_stream_get(substream);
+	struct sdev_alsa_timer_info *dpcm = &stream->timer;
+
+	dpcm->frac_pos = 0;
+	dpcm->rate = runtime->rate;
+	dpcm->frac_buffer_size = runtime->buffer_size * HZ;
+	dpcm->frac_period_size = runtime->period_size * HZ;
+	dpcm->frac_period_rest = dpcm->frac_period_size;
+	dpcm->elapsed = 0;
+	return 0;
+}
+
+static void sdrv_alsa_timer_callback(unsigned long data)
+{
+	struct sdev_alsa_timer_info *dpcm = (struct sdev_alsa_timer_info *)data;
+	int elapsed;
+
+	spin_lock(&dpcm->lock);
+	sdrv_alsa_timer_update(dpcm);
+	sdrv_alsa_timer_rearm(dpcm);
+	elapsed = dpcm->elapsed;
+	dpcm->elapsed = 0;
+	spin_unlock(&dpcm->lock);
+	if (elapsed)
+		snd_pcm_period_elapsed(dpcm->substream);
+}
+
+static snd_pcm_uframes_t sdrv_alsa_timer_pointer(
+	struct snd_pcm_substream *substream)
+{
+	struct sdev_pcm_stream_info *stream = sdrv_stream_get(substream);
+	struct sdev_alsa_timer_info *dpcm = &stream->timer;
+	snd_pcm_uframes_t pos;
+
+	spin_lock(&dpcm->lock);
+	sdrv_alsa_timer_update(dpcm);
+	pos = dpcm->frac_pos / HZ;
+	spin_unlock(&dpcm->lock);
+	return pos;
+}
+
+static int sdrv_alsa_timer_create(struct snd_pcm_substream *substream)
+{
+	struct sdev_pcm_stream_info *stream = sdrv_stream_get(substream);
+	struct sdev_alsa_timer_info *dpcm = &stream->timer;
+
+	spin_lock_init(&dpcm->lock);
+	dpcm->substream = substream;
+	setup_timer(&dpcm->timer, sdrv_alsa_timer_callback,
+		(unsigned long) dpcm);
+	return 0;
+}
+
 static void sdrv_copy_pcm_hw(struct snd_pcm_hardware *dst,
 	struct snd_pcm_hardware *src,
 	struct snd_pcm_hardware *ref_pcm_hw)