[v2] staging: bcm2835-audio: interpolate audio delay

Message ID 20181022191708.GA4659@ubuntu (mailing list archive)
State New, archived
Series [v2] staging: bcm2835-audio: interpolate audio delay

Commit Message

Mike Brady Oct. 22, 2018, 7:17 p.m. UTC
When the BCM2835 audio output is used, userspace sees a jitter up to 10ms
in the audio position, aka "delay" -- the number of frames that must
be output before a new frame would be played.
Make this a bit nicer for userspace by interpolating the position
using the CPU clock.
The overhead is small -- an extra ktime_get() every time a GPU message
is sent -- and another call and a few calculations whenever the delay
is sought from userland.
At 48,000 frames per second, i.e. approximately 20 microseconds per
frame, it would take a clock inaccuracy of
20 microseconds in 10 milliseconds -- 2,000 parts per million --
to result in an inaccurate estimate, whereas
crystal- or resonator-based clocks typically have an
inaccuracy of 10s to 100s of parts per million.
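
The arithmetic above can be checked with a small sketch. It is not part of the patch; the function names are illustrative, and integer division rounds the results down.

```c
#include <stdint.h>

/* Duration of one frame in nanoseconds at the given rate (frames/second). */
static inline uint64_t frame_period_ns(uint32_t rate)
{
	return 1000000000ULL / rate;
}

/*
 * Clock error, in parts per million, needed to misplace the estimate by
 * one whole frame across an interpolation interval of interval_ns.
 */
static inline uint64_t one_frame_error_ppm(uint32_t rate, uint64_t interval_ns)
{
	return frame_period_ns(rate) * 1000000ULL / interval_ns;
}
```

For 48,000 frames per second and a 10 ms interval this gives a frame period of 20,833 ns and a tolerance of 2,083 ppm, in line with the rounded figures quoted above.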

Signed-off-by: Mike Brady <mikebrady@eircom.net>
---
Changes in v2 -- remove inappropriate addition of SNDRV_PCM_INFO_BATCH flag

 .../vc04_services/bcm2835-audio/bcm2835-pcm.c | 20 +++++++++++++++++++
 .../vc04_services/bcm2835-audio/bcm2835.h     |  1 +
 2 files changed, 21 insertions(+)

Comments

Kirill Marinushkin Oct. 22, 2018, 10:25 p.m. UTC | #1
Hello Mike,

AFAIU, this patch is wrong. Please correct me, maybe I misunderstand something.

> The problem that this patch seeks to resolve is that when userland asks for
> the delay

The userspace asks not for delay, but for the pointer.
You modify the function, which is called `snd_bcm2835_pcm_pointer`. Here you are
supposed to increase `alsa_stream->pos` with the proper offset. Instead, you
imitate a delay, but in fact the delay is not increased.

So, the proper solution should be to fix the reported pointer. As a result,
userspace will receive the correct delay, instead of these crazy 10 ms.

> FYI, there is
> a discussion of the effects of a downstream equivalent of this suggested patch
> at:
> https://github.com/raspberrypi/firmware/issues/1026#issuecomment-415746016.

Thank you for the link, it clarified for me what you are trying to achieve.

On 10/22/18 21:17, Mike Brady wrote:
> When the BCM2835 audio output is used, userspace sees a jitter up to 10ms
> in the audio position, aka "delay" -- the number of frames that must
> be output before a new frame would be played.
> Make this a bit nicer for userspace by interpolating the position
> using the CPU clock.
> The overhead is small -- an extra ktime_get() every time a GPU message
> is sent -- and another call and a few calculations whenever the delay
> is sought from userland.
> At 48,000 frames per second, i.e. approximately 20 microseconds per
> frame, it would take a clock inaccuracy of
> 20 microseconds in 10 milliseconds -- 2,000 parts per million --
> to result in an inaccurate estimate, whereas
> crystal- or resonator-based clocks typically have an
> inaccuracy of 10s to 100s of parts per million.
> 
> Signed-off-by: Mike Brady <mikebrady@eircom.net>
> ---
> Changes in v2 -- remove inappropriate addition of SNDRV_PCM_INFO_BATCH flag
> 
>  .../vc04_services/bcm2835-audio/bcm2835-pcm.c | 20 +++++++++++++++++++
>  .../vc04_services/bcm2835-audio/bcm2835.h     |  1 +
>  2 files changed, 21 insertions(+)
> 
> diff --git a/drivers/staging/vc04_services/bcm2835-audio/bcm2835-pcm.c b/drivers/staging/vc04_services/bcm2835-audio/bcm2835-pcm.c
> index e66da11af5cf..9053b996cada 100644
> --- a/drivers/staging/vc04_services/bcm2835-audio/bcm2835-pcm.c
> +++ b/drivers/staging/vc04_services/bcm2835-audio/bcm2835-pcm.c
> @@ -74,6 +74,7 @@ void bcm2835_playback_fifo(struct bcm2835_alsa_stream *alsa_stream,
>  	atomic_set(&alsa_stream->pos, pos);
>  
>  	alsa_stream->period_offset += bytes;
> +	alsa_stream->interpolate_start = ktime_get();
>  	if (alsa_stream->period_offset >= alsa_stream->period_size) {
>  		alsa_stream->period_offset %= alsa_stream->period_size;
>  		snd_pcm_period_elapsed(substream);
> @@ -243,6 +244,7 @@ static int snd_bcm2835_pcm_prepare(struct snd_pcm_substream *substream)
>  	atomic_set(&alsa_stream->pos, 0);
>  	alsa_stream->period_offset = 0;
>  	alsa_stream->draining = false;
> +	alsa_stream->interpolate_start = ktime_get();
>  
>  	return 0;
>  }
> @@ -292,6 +294,24 @@ snd_bcm2835_pcm_pointer(struct snd_pcm_substream *substream)
>  {
>  	struct snd_pcm_runtime *runtime = substream->runtime;
>  	struct bcm2835_alsa_stream *alsa_stream = runtime->private_data;
> +	ktime_t now = ktime_get();
> +
> +	/* Give userspace better delay reporting by interpolating between GPU
> +	 * notifications, assuming audio speed is close enough to the clock
> +	 * used for ktime
> +	 */
> +
> +	if ((ktime_to_ns(alsa_stream->interpolate_start)) &&
> +	    (ktime_compare(alsa_stream->interpolate_start, now) < 0)) {
> +		u64 interval =
> +			(ktime_to_ns(ktime_sub(now,
> +				alsa_stream->interpolate_start)));
> +		u64 frames_output_in_interval =
> +			div_u64((interval * runtime->rate), 1000000000);
> +		snd_pcm_sframes_t frames_output_in_interval_sized =
> +			-frames_output_in_interval;
> +		runtime->delay = frames_output_in_interval_sized;
> +	}
>  
>  	return snd_pcm_indirect_playback_pointer(substream,
>  		&alsa_stream->pcm_indirect,
> diff --git a/drivers/staging/vc04_services/bcm2835-audio/bcm2835.h b/drivers/staging/vc04_services/bcm2835-audio/bcm2835.h
> index e13435d1c205..595ad584243f 100644
> --- a/drivers/staging/vc04_services/bcm2835-audio/bcm2835.h
> +++ b/drivers/staging/vc04_services/bcm2835-audio/bcm2835.h
> @@ -78,6 +78,7 @@ struct bcm2835_alsa_stream {
>  	unsigned int period_offset;
>  	unsigned int buffer_size;
>  	unsigned int period_size;
> +	ktime_t interpolate_start;
>  
>  	struct bcm2835_audio_instance *instance;
>  	int idx;
>
Mike Brady Oct. 24, 2018, 8:20 a.m. UTC | #2
Hi Kirill. Thanks for your comments.

> On 22 Oct 2018, at 23:25, Kirill Marinushkin <k.marinushkin@gmail.com> wrote:
> 
> AFAIU, this patch is wrong. Please correct me, maybe I misunderstand something.
> 
>> The problem that this patch seeks to resolve is that when userland asks for
>> the delay
> 
> The userspace asks not for delay, but for the pointer.

The call in question is snd_pcm_delay. I presume this delay is calculated from knowledge of the stream position “pos”, the period (buffer?) number (and period/buffer size) and the snd_pcm_runtime structure’s “delay” field (“runtime->delay”).

> You modify the function, which is called `snd_bcm2835_pcm_pointer`. Here you are
> supposed to increase `alsa_stream->pos` with the proper offset. Instead, you
> imitate a delay, but in fact the delay is not increased.
> 
> So, the proper solution should be to fix the reported pointer.

I think there is a difficulty with this. The “pos” pointer looks to have to be modulo the buffer size. This causes a problem, as I see it, in that if the calculated (pos + interpolated delay in bytes) is longer than the buffer size, it must be wrapped, but AFAIK we are unable to increment a buffer index used in the snd_pcm_delay calculation. Hence the calculation of the actual position would be wrong. This is why the snd_pcm_runtime delay field is used. On reflection, BTW, the patch assumes that the field's original value was zero — that can be rectified.

> As a result,
> userspace will receive the correct delay, instead of these crazy 10 ms.

Just to point out that with the proposed patch, it appears that the correct delay is being reported (apart, possibly, from any delay originally set in the snd_pcm_runtime “delay” field, as mentioned above).

All the best,
Mike

Kirill Marinushkin Oct. 24, 2018, 6:06 p.m. UTC | #3
Hello Mike,

On 10/24/18 10:20, Mike Brady wrote:
> Hi Kirill. Thanks for your comments.
> 
>> On 22 Oct 2018, at 23:25, Kirill Marinushkin <k.marinushkin@gmail.com> wrote:
>>
>> AFAIU, this patch is wrong. Please correct me, maybe I misunderstand something.
>>
>>> The problem that this patch seeks to resolve is that when userland asks for
>>> the delay
>>
>> The userspace asks not for delay, but for the pointer.
> 
> The call in question is snd_pcm_delay. I presume this delay is calculated from knowledge of the stream position “pos”, the period (buffer?) number (and period/buffer size) and the snd_pcm_runtime structure’s “delay” field (“runtime->delay”).
> 


In kernel, this delay is calculated in `snd_pcm_calc_delay()`, defined at
`sound/core/pcm_native.c:889`. If you analyze how this calculation is done, you
will see that the returned value is derived from the following fields:

* runtime->status->hw_ptr
* runtime->buffer_size
* runtime->control->appl_ptr
* runtime->boundary
* runtime->delay
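
A minimal userspace sketch of how these fields combine for a playback stream, modelled on my reading of the ALSA core. The struct and function names here are hypothetical stand-ins, not kernel APIs, and all values are in frames.

```c
/* Hypothetical stand-in for the relevant snd_pcm_runtime fields (frames). */
struct pcm_model {
	unsigned long hw_ptr;      /* runtime->status->hw_ptr */
	unsigned long appl_ptr;    /* runtime->control->appl_ptr */
	unsigned long buffer_size; /* runtime->buffer_size */
	unsigned long boundary;    /* runtime->boundary, wrap point of both pointers */
	long delay;                /* runtime->delay */
};

/* Frames the application may still write: free space in the ring buffer. */
static long model_playback_avail(const struct pcm_model *m)
{
	long avail = (long)(m->hw_ptr + m->buffer_size - m->appl_ptr);

	if (avail < 0)
		avail += (long)m->boundary;
	else if ((unsigned long)avail >= m->boundary)
		avail -= (long)m->boundary;
	return avail;
}

/* Frames queued but not yet played, plus the driver-reported extra delay. */
static long model_playback_delay(const struct pcm_model *m)
{
	return (long)m->buffer_size - model_playback_avail(m) + m->delay;
}
```

In this model, runtime->delay is the only term a driver can change without touching hw_ptr, which is why a negative value in that field lowers the reported delay.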


>> You modify the function, which is called `snd_bcm2835_pcm_pointer`. Here you are
>> supposed to increase `alsa_stream->pos` with the proper offset. Instead, you
>> imitate a delay, but in fact the delay is not increased.
>>
>> So, the proper solution should be to fix the reported pointer.
> 
> I think there is a difficulty with this. The “pos” pointer looks to have to be modulo the buffer size. This causes a problem, as I see it, in that if the calculated (pos + interpolated delay in bytes) is longer than the buffer size


There is no "interpolated delay". The concept of "interpolated delay" is
incorrect. When you play sound - the pointer increments. But in this commit you
increment the delay, as if sound doesn't play.


>> As a result,
>> userspace will receive the correct delay, instead of these crazy 10 ms.
> 
> Just to point out that with the proposed patch, it appears that the correct delay is being reported (apart, possibly, from any delay originally set in the snd_pcm_runtime “delay” field, as mentioned above).


Then I would like to point out the alsa-lib function `snd_pcm_avail()` - it will
return the wrong value.


Mike Brady Oct. 24, 2018, 7:54 p.m. UTC | #4
Hi Kirill. Thanks again for your comments.

> On 24 Oct 2018, at 19:06, Kirill Marinushkin <k.marinushkin@gmail.com> wrote:
> 
> Hello Mike,
> 
> On 10/24/18 10:20, Mike Brady wrote:
>> Hi Kirill. Thanks for your comments.
>> 
>>> On 22 Oct 2018, at 23:25, Kirill Marinushkin <k.marinushkin@gmail.com> wrote:
>>> 
>>> AFAIU, this patch is wrong. Please correct me, maybe I misunderstand something.
>>> 
>>>> The problem that this patch seeks to resolve is that when userland asks for
>>>> the delay
>>> 
>>> The userspace asks not for delay, but for the pointer.
>> 
>> The call in question is snd_pcm_delay. I presume this delay is calculated from knowledge of the stream position “pos”, the period (buffer?) number (and period/buffer size) and the snd_pcm_runtime structure’s “delay” field (“runtime->delay”).
>> 
> 
> 
> In kernel, this delay is calculated in `snd_pcm_calc_delay()`, defined at
> `sound/core/pcm_native.c:889`. If you analyze how this calculation is done, you
> will see that the returned value is derived from the following fields:
> 
> * runtime->status->hw_ptr
> * runtime->buffer_size
> * runtime->control->appl_ptr
> * runtime->boundary
> * runtime->delay

That’s very useful, thanks.

>>> You modify the function, which is called `snd_bcm2835_pcm_pointer`. Here you are
>>> supposed to increase `alsa_stream->pos` with the proper offset. Instead, you
>>> imitate a delay, but in fact the delay is not increased.
>>> 
>>> So, the proper solution should be to fix the reported pointer.
>> 
>> I think there is a difficulty with this. The “pos” pointer looks to have to be modulo the buffer size. This causes a problem, as I see it, in that if the calculated (pos + interpolated delay in bytes) is longer than the buffer size
> 
> 
> There is no "interpolated delay". The concept of "interpolated delay" is
> incorrect.

Yes, my language here is wrong. What I mean is the estimated number of frames output since the pointer was last updated — let’s call it the `interpolated frame count`.

> When you play sound - the pointer increments.

Unfortunately, when you play sound, the pointer does not actually increment, for up to about 10 milliseconds. I know of no way to actually access the true “live” position of the frame that is being played at any instant; hence the desire to estimate it.

What actually seems to be happening is that when `bcm2835_playback_fifo` is called, the pointer is updated, but as frames are individually output to the DAC, this pointer does not increment. It is not updated until the next time `bcm2835_playback_fifo` is called.
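
The estimate described above can be sketched in userspace C as follows. This is a simplified model of the calculation in the patch, not the driver code itself; the function names are illustrative and the timestamps are assumed to be monotonic nanosecond values.

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/*
 * Estimate how many frames have been played since the position was last
 * updated. Returns 0 when there is no usable reference timestamp.
 */
static uint64_t interpolated_frames(uint64_t last_update_ns, uint64_t now_ns,
				    uint32_t rate)
{
	if (last_update_ns == 0 || now_ns <= last_update_ns)
		return 0;
	return (now_ns - last_update_ns) * rate / NSEC_PER_SEC;
}

/*
 * The patch stores the negated count in runtime->delay, so that the
 * reported delay shrinks while the pointer itself stays put.
 */
static int64_t delay_adjustment(uint64_t last_update_ns, uint64_t now_ns,
				uint32_t rate)
{
	return -(int64_t)interpolated_frames(last_update_ns, now_ns, rate);
}
```

One difference from the patch: when the timestamp check fails, the patch leaves runtime->delay untouched, whereas this model simply returns zero.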

> But in this commit you increment the delay, as if sound doesn't play.

It is true that the patch does make use of the snd_pcm_runtime structure’s “delay” field (aka “runtime->delay” here). That field is defined for: “/* extra delay; typically FIFO size */”. Clearly it is not being used for that here; it is being used simply because it is part of the calculation done in snd_pcm_calc_delay(), as you point out. At present, it looks like that field isn’t being used (it’s set to zero) and is not modified anywhere else in the driver, AFAICS. If it were necessary, it would be a simple matter to preserve whatever value it was given.

>>> As a result,
>>> userspace will receive the correct delay, instead of these crazy 10 ms.
>> 
>> Just to point out that with the proposed patch, it appears that the correct delay is being reported (apart, possibly, from any delay originally set in the snd_pcm_runtime “delay” field, as mentioned above).
> 
> 
> Then I would like to point out the alsa-lib function `snd_pcm_avail()` - it will return the wrong value.

It is already the case that snd_pcm_avail() does not return the true delay. The ALSA documentation states: “The value returned by that call [i.e. the snd_pcm_avail*() functions] is not directly related to the delay…”

Kind regards
Mike

Kirill Marinushkin Oct. 24, 2018, 10:02 p.m. UTC | #5
Hello Mike,

We are not on the same page. What you hear is not what I tell you.
Either you don't understand what happens in your commit, or I don't understand
what happens in the driver.

Hopefully somebody in the community can comment here.

On 10/24/18 21:54, Mike Brady wrote:
>>>> You modify the function, which is called `snd_bcm2835_pcm_pointer`. Here you are
>>>> supposed to increase `alsa_stream->pos` with the proper offset. Instead, you
>>>> imitate a delay, but in fact the delay is not increased.
>>>>
>>>> So, the proper solution should be to fix the reported pointer.
>>>
>>> I think there is a difficulty with this. The “pos” pointer looks to have to be modulo the buffer size. This causes a problem, as I see it, in that if the calculated (pos + interpolated delay in bytes) is longer than the buffer size
>>
>>
>> There is no "interpolated delay". The concept of "interpolated delay" is
>> incorrect.
> 
> Yes, my language here is wrong. What I mean is the estimated number of frames output since the pointer was last updated — let’s call it the `interpolated frame count`.
> 

That's not what I mean. From my perspective, the problem is not in language, but
in the concept which you introduce here.

>> When you play sound - the pointer increments.
> 
> Unfortunately, when you play sound, the pointer does not actually increment, for up to about 10 milliseconds. I know of no way to actually access the true “live” position of the frame that is being played at any instant; hence the desire to estimate it.
> 

Your view of the situation is the opposite of mine. What you see as a
symptom, I see as a root cause. As I see it, you should fix the
pointer not incrementing. Why do you think that it's okay that the pointer is
not updating during playback? Why do you insist that there is a delay? I don't
understand why we are so stuck here.

> What actually seems to be happening is that when `bcm2835_playback_fifo` is called, the pointer is updated, but as frames are individually output to the DAC, this pointer does not increment. It is not updated until the next time `bcm2835_playback_fifo` is called.
> 
>> But in this commit you increment the delay, as if sound doesn't play.
> 
> It is true that the patch does make use of the snd_pcm_runtime structure’s “delay” field (aka “runtime->delay” here). That field is defined for: “/* extra delay; typically FIFO size */”. Clearly it is not being used for that here; it is being used simply because it is part of the calculation done in snd_pcm_calc_delay(), as you point out. At present, it looks like that field isn’t being used (it’s set to zero) and is not modified anywhere else in the driver, AFAICS. If it were necessary, it would be a simple matter to preserve whatever value it was given.
> 

That's not what I am talking about. Somehow we don't understand each other.

>>>> As a result,
>>>> userspace will receive the correct delay, instead of these crazy 10 ms.
>>>
>>> Just to point out that with the proposed patch, it appears that the correct delay is being reported (apart, possibly, from any delay originally set in the snd_pcm_runtime “delay” field, as mentioned above).
>>
>>
>> Then I would like to point out the alsa-lib function `snd_pcm_avail()` - it will return the wrong value.
> 
> It is already the case that snd_pcm_avail() does not return the true delay. The ALSA documentation states: “The value returned by that call [i.e. the snd_pcm_avail*() functions] is not directly related to the delay…”
> 

Do you mean, that you are submitting the patch into the upstream kernel without
reading the code?

snd_pcm_avail() is calculated based on:

* hw_ptr
* buffer_size
* appl_ptr
* boundary

If you fix hw_ptr - it will fix both snd_pcm_delay() and snd_pcm_avail().
Instead, you invent the "interpolated delay", which in fact only compensates for
the wrong hw_ptr instead of fixing it.
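
The difference between the two approaches can be sketched with a small userspace model. The helper names are hypothetical, the arithmetic follows my reading of the playback calculation in the ALSA core, and all values are in frames.

```c
/* Free space the application may write into (playback), in frames. */
static long avail_frames(unsigned long hw_ptr, unsigned long appl_ptr,
			 unsigned long buffer_size, unsigned long boundary)
{
	long avail = (long)(hw_ptr + buffer_size - appl_ptr);

	if (avail < 0)
		avail += (long)boundary;
	else if ((unsigned long)avail >= boundary)
		avail -= (long)boundary;
	return avail;
}

/* Queued frames plus the driver's extra delay term (runtime->delay). */
static long delay_frames(unsigned long hw_ptr, unsigned long appl_ptr,
			 unsigned long buffer_size, unsigned long boundary,
			 long extra_delay)
{
	return (long)buffer_size -
	       avail_frames(hw_ptr, appl_ptr, buffer_size, boundary) +
	       extra_delay;
}
```

With a stale hw_ptr of 4800, an appl_ptr of 7200, a 4800-frame buffer and 240 frames played since the last update, setting the extra delay to -240 and advancing hw_ptr to 5040 both yield a delay of 2160 frames, but only advancing hw_ptr raises the available space from 2400 to 2640.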

Best Regards,
Kirill
Takashi Iwai Oct. 25, 2018, 7:37 a.m. UTC | #6
On Thu, 25 Oct 2018 00:02:34 +0200,
Kirill Marinushkin wrote:
> 
> >> When you play sound - the pointer increments.
> > 
> > Unfortunately, when you play sound, the pointer does not actually increment, for up to about 10 milliseconds. I know of no way to actually access the true “live” position of the frame that is being played at any instant; hence the desire to estimate it.
> > 
> 
> Your view of the situation is the opposite of mine. What you see as a
> symptom, I see as a root cause. As I see it, you should fix the
> pointer not incrementing. Why do you think that it's okay that the pointer is
> not updating during playback? Why do you insist that there is a delay? I don't
> understand why we are so stuck here.

Well, from the API POV, there is nothing wrong with keeping hwptr stuck while
updating only the delay value.  It implies that the hardware chip doesn't
provide the hwptr update.

Though, usually the delay value is provided also from the hardware,
e.g. reading the link position or such.  It's a typical case like
USB-audio, where the hwptr gets updated and the delay indicates the
actual position *behind* hwptr.  That works because hwptr shows the
position in the ring buffer at which you can access the data.  And it
doesn't mean that hwptr is the actually playing position, but it can
be ahead of the current position, when many samples are queued on
FIFO.  The delay is provided to correct the position back to the
actual point.
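
A rough model of that relationship, assuming a playback stream and
ignoring pointer wrap-around, might look like the following. The struct
and function names are hypothetical and exist only for this sketch.

```c
/* Simplified playback state; all values are in frames. Hypothetical model. */
struct stream_state {
	long appl_ptr;	/* where the application will write next */
	long hw_ptr;	/* where the hardware will fetch next */
	long delay;	/* frames fetched past hw_ptr but not yet played */
};

/* Frames that must be played before a newly written frame is heard. */
static long total_delay(const struct stream_state *s)
{
	return (s->appl_ptr - s->hw_ptr) + s->delay;
}

/* Frame being played right now: behind hw_ptr by the queued frames. */
static long playing_position(const struct stream_state *s)
{
	return s->hw_ptr - s->delay;
}
```

In this model a driver can refine the reported playing position either
by moving hw_ptr or by adjusting delay; the two differ only in which
field applications have to look at.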

But, this also doesn't mean that the delay shouldn't be used for the
purpose like this patchset, either.  OTOH, providing a finer hwptr
value would be likely more apps-friendly; there must be many programs
that don't evaluate the delay value properly.

So, I suppose that hwptr update might be a better option if the code
won't become too complex.  Let's see.


One another thing I'd like to point out is that the value given in the
patch is nothing but an estimated position, optimistically calculated
via the system timer.  Mike and I had already discussion in another
thread, and another possible option would be to provide the proper
timestamp-vs-hwptr pair, instead of updating the timestamp always at
the status read.

Maybe it's worth to have a module option to suppress this optimistic
hwptr update behavior, in case something went wrong with clocks?


thanks,

Takashi
Kirill Marinushkin Oct. 25, 2018, 5:20 p.m. UTC | #7
Hello Takashi, Mike,

@Takashi

On 10/25/18 09:37, Takashi Iwai wrote:
> Well, in the API POV, it's nothing wrong to keep hwptr sticking while
> updating only delay value.  It implies that the hardware chip doesn't
> provide the hwptr update.

Thank you for the clarification. Modifying `runtime->delay` from the `pointer`
function looked wrong for me. Now I understand the motivation and the use-case.
I will be more careful when analyzing the code which doesn't fit my expectations.

@Mike

I was wrong. You can ignore my comments. Please don't take them personal: it's
all about having high-quality code in kernel.

Best Regards,
Kirill
Mike Brady Oct. 28, 2018, 2:24 p.m. UTC | #8
Hi Kirill. Thanks for the post.
Mike

> On 25 Oct 2018, at 18:20, Kirill Marinushkin <k.marinushkin@gmail.com> wrote:
> 
> Hello Takashi, Mike,
> 
> @Takashi
> 
> On 10/25/18 09:37, Takashi Iwai wrote:
>> Well, in the API POV, it's nothing wrong to keep hwptr sticking while
>> updating only delay value.  It implies that the hardware chip doesn't
>> provide the hwptr update.
> 
> Thank you for the clarification. Modifying `runtime->delay` from the `pointer`
> function looked wrong for me. Now I understand the motivation and the use-case.
> I will be more careful when analyzing the code which doesn't fit my expectations.
> 
> @Mike
> 
> I was wrong. You can ignore my comments. Please don't take them personal: it's
> all about having high-quality code in kernel.
> 
> Best Regards,
> Kirill
> _______________________________________________
> Alsa-devel mailing list
> Alsa-devel@alsa-project.org
> http://mailman.alsa-project.org/mailman/listinfo/alsa-devel
Mike Brady Oct. 28, 2018, 2:26 p.m. UTC | #9
> On 25 Oct 2018, at 08:37, Takashi Iwai <tiwai@suse.de> wrote:
> 
> On Thu, 25 Oct 2018 00:02:34 +0200,
> Kirill Marinushkin wrote:
>> 
>>>> When you play sound - the pointer increments.
>>> 
>>> Unfortunately, when you play sound, the pointer does not actually increment, for up to about 10 milliseconds. I know of no way to actually access the true “live” position of the frame that is being played at any instant; hence the desire to estimate it.
>>> 
>> 
>> Your vision of the situation is the opposite of my vision. What you see as a
>> symptom - I see as a root cause. As I see, you should fix the
>> pointer-not-incrementing. Why do you think that it's okay that the pointer is
>> not updating during sound play? Why do you insist that there is a delay? I don't
>> understand why we are so stuck here.
> 
> Well, in the API POV, it's nothing wrong to keep hwptr sticking while
> updating only delay value.  It implies that the hardware chip doesn't
> provide the hwptr update.
> 
> Though, usually the delay value is provided also from the hardware,
> e.g. reading the link position or such.  It's a typical case like
> USB-audio, where the hwptr gets updated and the delay indicates the
> actual position *behind* hwptr.  That works because hwptr shows the
> position in the ring buffer at which you can access the data.  And it
> doesn't mean that hwptr is the actually playing position, but it can
> be ahead of the current position, when many samples are queued on
> FIFO.  The delay is provided to correct the position back to the
> actual point.
> 
> But, this also doesn't mean that the delay shouldn't be used for the
> purpose like this patchset, either.  OTOH, providing a finer hwptr
> value would be likely more apps-friendly; there must be many programs
> that don't evaluate the delay value properly.
> 
> So, I suppose that hwptr update might be a better option if the code
> won't become too complex.  Let's see.

Indeed. It will take me a few days to look into this…

Regards
Mike

> One another thing I'd like to point out is that the value given in the
> patch is nothing but an estimated position, optimistically calculated
> via the system timer.  Mike and I had already discussion in another
> thread, and another possible option would be to provide the proper
> timestamp-vs-hwptr pair, instead of updating the timestamp always at
> the status read.
> 
> Maybe it's worth to have a module option to suppress this optimistic
> hwptr update behavior, in case something went wrong with clocks?
> 
> 
> thanks,
> 
> Takashi
Mike Brady Nov. 5, 2018, 3:57 p.m. UTC | #10
Thanks for the comments and suggestions.

> On 25 Oct 2018, at 08:37, Takashi Iwai <tiwai@suse.de> wrote:
> 
> Well, in the API POV, it's nothing wrong to keep hwptr sticking while
> updating only delay value.  It implies that the hardware chip doesn't
> provide the hwptr update.

As I understand it, this driver stages settings, data and status information for the true audio driver which is part of VideoCore (VC). The driver communicates with the VC by sending messages. Responses come back in asynchronous callbacks. There doesn’t seem to be any other source of data or status.

When parameters such as frame format, rate and period size have been set up, the VC executes periodic callbacks to retrieve period-sized buffers of data. At 44,100 frames per second and with standard 444-frame periods, callbacks occur approximately every 10.07 milliseconds.
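
The callback interval follows directly from the period size and the
rate. A small sketch of the arithmetic, using a hypothetical helper
name:

```c
/*
 * Duration of one period in microseconds, rounded to the nearest
 * microsecond. Illustrative helper; not part of the driver.
 */
static unsigned long period_duration_us(unsigned int period_frames,
					unsigned int rate)
{
	unsigned long long us_scaled =
		(unsigned long long)period_frames * 1000000ULL;

	return (unsigned long)((us_scaled + rate / 2) / rate);
}
```

For 444 frames at 44,100 frames per second this yields 10068
microseconds, which is the roughly 10.07 milliseconds quoted above.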

> Though, usually the delay value is provided also from the hardware,
> e.g. reading the link position or such.  It's a typical case like
> USB-audio, where the hwptr gets updated and the delay indicates the
> actual position *behind* hwptr.  That works because hwptr shows the
> position in the ring buffer at which you can access the data.  And it
> doesn't mean that hwptr is the actually playing position, but it can
> be ahead of the current position, when many samples are queued on
> FIFO.  The delay is provided to correct the position back to the
> actual point.

The information that the ALSA snd_pcm_delay() function depends on is updated during these callbacks. Thus, a user program monitoring the snd_pcm_delay() value closely will see sudden jumps in the value every 10 milliseconds or so: a 10 millisecond jitter.
> 
> But, this also doesn't mean that the delay shouldn't be used for the
> purpose like this patchset, either.  OTOH, providing a finer hwptr
> value would be likely more apps-friendly; there must be many programs
> that don't evaluate the delay value properly.

With interpolation, the number of frames that would have been output from the time of the last callback is subtracted from the delay to give a more accurate estimate of the actual delay at the time it is requested.
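
The calculation described here can be sketched in userspace C as
follows. The function names are hypothetical, and clamping the result
at zero is an assumption added for the sketch; it is not something the
patch itself does.

```c
#include <stdint.h>

/*
 * Frames played in elapsed_ns nanoseconds at rate frames per second.
 * The 64-bit product is safe for intervals far longer than one period.
 */
static uint64_t frames_in_interval(uint64_t elapsed_ns, unsigned int rate)
{
	return elapsed_ns * rate / 1000000000ULL;
}

/*
 * Estimated delay between callbacks: the delay known at the last
 * callback minus the frames assumed to have been played since then.
 * Clamped at zero (an assumption of this sketch).
 */
static long interpolated_delay(long delay_at_callback, uint64_t elapsed_ns,
			       unsigned int rate)
{
	long played = (long)frames_in_interval(elapsed_ns, rate);

	return played < delay_at_callback ? delay_at_callback - played : 0;
}
```

Five milliseconds after a callback at 44,100 frames per second, 220
frames are assumed to have been played, so a delay of 1000 frames at
the callback is reported as 780.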

> So, I suppose that hwptr update might be a better option if the code
> won't become too complex.  Let's see.

Having looked at the code, there does not seem to be a good way to avoid interpolation. Later versions of the interface include a message type of VC_AUDIO_MSG_TYPE_LATENCY (see https://github.com/raspberrypi/firmware/blob/master/opt/vc/include/interface/vmcs_host/vc_vchi_audioserv_defs.h#L158) which seems to be a request to return the latency. However, the latency would be returned in an asynchronous callback (see function audio_vchi_callback in bcm2835-vchiq.c). One can wait for the result, but it seems that it could take up to 10 milliseconds (see function bcm2835_audio_send_msg_locked in bcm2835-vchiq.c). This is hardly tolerable, and to avoid it, one would have to store both the latency returned and the time the request was sent (or the time the reply was returned; it’s not clear which would be correct) and interpolate from that to the time the delay is requested. In other words, from the point of view of avoiding interpolation, this is likely to be no better than the present suggestion. There would also be a need to make the latency request periodically, adding to the overhead.

Without getting a good deal more information about the VC, which may not be available, I’m afraid I can’t see a way of getting a better fix on the instantaneous values of pointers such as the hw_ptr. BTW, I have not been able to find a source for the file vc_vchi_audioserv_defs.h, which looks like a Broadcom file and which appears to have two versions. If anyone could point me to the source, I’d be grateful.

> One another thing I'd like to point out is that the value given in the
> patch is nothing but an estimated position, optimistically calculated
> via the system timer.  Mike and I had already discussion in another
> thread, and another possible option would be to provide the proper
> timestamp-vs-hwptr pair, instead of updating the timestamp always at
> the status read.

Agreed — that would give the caller the information needed to do the interpolation for themselves if desired.

> Maybe it's worth to have a module option to suppress this optimistic
> hwptr update behavior, in case something went wrong with clocks?

Following this suggestion, I have updated the patch to include a module parameter ‘enable_delay_interpolation’, and I will post that later for consideration.

Regards
Mike


> thanks,
> 
> Takashi

Takashi Iwai Nov. 5, 2018, 4:11 p.m. UTC | #11
On Mon, 05 Nov 2018 16:57:07 +0100,
Mike Brady wrote:
> 
> > One another thing I'd like to point out is that the value given in the
> > patch is nothing but an estimated position, optimistically calculated
> > via the system timer.  Mike and I had already discussion in another
> > thread, and another possible option would be to provide the proper
> > timestamp-vs-hwptr pair, instead of updating the timestamp always at
> > the status read.
> 
> Agreed — that would give the caller the information needed to do the
> interpolation for themselves if desired.

And now I wonder whether the problem is still present with the latest
code.  There was a (kind of) regression in this regard when we
introduced the fine-grained hardware timestamping, but it should have
been addressed by the commit 20e3f985bb875fea4f86b04eba4b6cc29bfd6b71
    ALSA: pcm: update tstamp only if audio_tstamp changed

Could you double-check whether the tstamp field gets still updated
even if no hwptr (and delay) is changed?


thanks,

Takashi
Mike Brady Nov. 6, 2018, 9:05 p.m. UTC | #12
> On 5 Nov 2018, at 16:11, Takashi Iwai <tiwai@suse.de> wrote:
> 
> On Mon, 05 Nov 2018 16:57:07 +0100,
> Mike Brady wrote:
>> 
>>> One another thing I'd like to point out is that the value given in the
>>> patch is nothing but an estimated position, optimistically calculated
>>> via the system timer.  Mike and I had already discussion in another
>>> thread, and another possible option would be to provide the proper
>>> timestamp-vs-hwptr pair, instead of updating the timestamp always at
>>> the status read.
>> 
>> Agreed — that would give the caller the information needed to do the
>> interpolation for themselves if desired.
> 
> And now I wonder whether the problem is still present with the latest
> code.  There was a (kind of) regression in this regard when we
> introduced the fine-grained hardware timestamping, but it should have
> been addressed by the commit 20e3f985bb875fea4f86b04eba4b6cc29bfd6b71
>    ALSA: pcm: update tstamp only if audio_tstamp changed
> 
> Could you double-check whether the tstamp field gets still updated
> even if no hwptr (and delay) is changed?

Yes, this could be a bit problematic. The function update_audio_tstamp in pcm_lib.c could include the interpolated delay in the calculation of audio_tstamp, and hence
could trigger the update of tstamp.

Another issue, as I see it, is that the audio_tstamp value would depend on whether, and when, a snd_pcm_delay() call (which recalculates the interpolation and puts it into the delay field) was made immediately prior to it. By zeroing the delay when a GPU interrupt occurs, you could be certain that the interpolated delay would be less than or equal to the true delay, but this doesn’t seem very satisfactory: you have neither the timestamp of the last update nor the correctly interpolated timestamp.

Sadly, therefore, I’m now of the view that this approach to interpolating the delay between GPU interrupts is not really viable. Would that be your view?

Regards
Mike

> thanks,
> 
> Takashi
Takashi Iwai Nov. 6, 2018, 9:31 p.m. UTC | #13
On Tue, 06 Nov 2018 22:05:11 +0100,
Mike Brady wrote:
> 
> 
> > On 5 Nov 2018, at 16:11, Takashi Iwai <tiwai@suse.de> wrote:
> > 
> > On Mon, 05 Nov 2018 16:57:07 +0100,
> > Mike Brady wrote:
> >> 
> >>> One another thing I'd like to point out is that the value given in the
> >>> patch is nothing but an estimated position, optimistically calculated
> >>> via the system timer.  Mike and I had already discussion in another
> >>> thread, and another possible option would be to provide the proper
> >>> timestamp-vs-hwptr pair, instead of updating the timestamp always at
> >>> the status read.
> >> 
> >> Agreed — that would give the caller the information needed to do the
> >> interpolation for themselves if desired.
> > 
> > And now I wonder whether the problem is still present with the latest
> > code.  There was a (kind of) regression in this regard when we
> > introduced the fine-grained hardware timestamping, but it should have
> > been addressed by the commit 20e3f985bb875fea4f86b04eba4b6cc29bfd6b71
> >    ALSA: pcm: update tstamp only if audio_tstamp changed
> > 
> > Could you double-check whether the tstamp field gets still updated
> > even if no hwptr (and delay) is changed?
> 
> Yes, this could be a bit problematic. The function update_audio_tstamp in pcm_lib.c could include the interpolated delay in the calculation of audio_tstamp, and hence
> could trigger the update of tstamp.

Well, my question is about the current driver as-is.
It has no runtime->delay, so far, hence audio_tstamp is calculated
only from the hwptr position.  As the corresponding tstamp gets
updated only when the audio_tstamp (i.e. hwptr) is updated, the driver
should provide the consistent pair of audio_tstamp (i.e. hwptr) vs
tstamp.

> Another issue, as I see it, is that the audio_tstamp value would depend on whether, and when, a snd_pcm_delay() call (which recalculates the interpolation and puts it into the delay field) was made immediately prior to it. By zeroing the delay when a GPU interrupt occurs, you could be certain that the interpolated delay would be less than or equal to the true delay, but this doesn’t seem very satisfactory: you have neither the timestamp of the last update nor the correctly interpolated timestamp.

No, audio_tstamp field is updated at snd_pcm_period_elapsed() call as
well as tstamp field.

Basically the driver provides three things: hwptr, tstamp and
audio_tstamp.  For the default configuration (like bcm audio does),
audio_tstamp is calculated from hwptr, so it can be seen as the hwptr
represented in timespec.  OTOH, tstamp is the actual system time that
is updated only when audio_tstamp changes -- which means tstamp gets
updated *only* at snd_pcm_period_elapsed() call on bcm audio.

And, my point is that you should be able to interpolate the actual
position on the user-space side based on this information; it doesn't
have to be done in the kernel at all.
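
A minimal sketch of such a user-space interpolation, assuming the
application has already obtained a consistent (tstamp, audio_tstamp)
pair and a current time from the same clock as tstamp. The helper names
are hypothetical, and the sketch deliberately avoids any ALSA calls so
that only the arithmetic is shown.

```c
#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000LL

static int64_t timespec_to_ns(const struct timespec *ts)
{
	return (int64_t)ts->tv_sec * NSEC_PER_SEC + ts->tv_nsec;
}

/* Convert nanoseconds to frames without overflowing on long streams. */
static int64_t ns_to_frames(int64_t ns, unsigned int rate)
{
	return (ns / NSEC_PER_SEC) * rate +
	       (ns % NSEC_PER_SEC) * rate / NSEC_PER_SEC;
}

/*
 * Estimate the stream position, in frames, at time now.
 * tstamp:       system time of the last position update
 * audio_tstamp: stream position at that update, expressed as time
 * now:          current system time, same clock as tstamp
 */
static int64_t estimate_position_frames(const struct timespec *tstamp,
					const struct timespec *audio_tstamp,
					const struct timespec *now,
					unsigned int rate)
{
	int64_t since_update_ns = timespec_to_ns(now) - timespec_to_ns(tstamp);

	if (since_update_ns < 0)	/* clock went backwards: do not extrapolate */
		since_update_ns = 0;
	return ns_to_frames(timespec_to_ns(audio_tstamp) + since_update_ns,
			    rate);
}
```

The estimate is only as good as the assumption that the audio clock and
the system clock run at the same rate over one period.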

> Sadly, therefore, I’m now of the view that this approach to interpolating the delay between GPU interrupts is not really viable. Would that be your view?

Actually there were some bugs in the past that the tstamp was updated
at each snd_pcm_status(), but it should have been fixed in the recent
kernels.  That's why I asked to re-check the current status.


thanks,

Takashi
Mike Brady Nov. 11, 2018, 6:08 p.m. UTC | #14
> On 6 Nov 2018, at 21:31, Takashi Iwai <tiwai@suse.de> wrote:
> 
> On Tue, 06 Nov 2018 22:05:11 +0100,
> Mike Brady wrote:
>> 
>> 
>>> On 5 Nov 2018, at 16:11, Takashi Iwai <tiwai@suse.de> wrote:
>>> 
>>> On Mon, 05 Nov 2018 16:57:07 +0100,
>>> Mike Brady wrote:
>>>> 
>>>>> One another thing I'd like to point out is that the value given in the
>>>>> patch is nothing but an estimated position, optimistically calculated
>>>>> via the system timer.  Mike and I had already discussion in another
>>>>> thread, and another possible option would be to provide the proper
>>>>> timestamp-vs-hwptr pair, instead of updating the timestamp always at
>>>>> the status read.
>>>> 
>>>> Agreed — that would give the caller the information needed to do the
>>>> interpolation for themselves if desired.
>>> 
>>> And now I wonder whether the problem is still present with the latest
>>> code.  There was a (kind of) regression in this regard when we
>>> introduced the fine-grained hardware timestamping, but it should have
>>> been addressed by the commit 20e3f985bb875fea4f86b04eba4b6cc29bfd6b71
>>>   ALSA: pcm: update tstamp only if audio_tstamp changed
>>> 
>>> Could you double-check whether the tstamp field gets still updated
>>> even if no hwptr (and delay) is changed?
>> 
>> Yes, this could be a bit problematic. The function update_audio_tstamp in pcm_lib.c could include the interpolated delay in the calculation of audio_tstamp, and hence
>> could trigger the update of tstamp.
> 
> Well, my question is about the current driver as-is.
> It has no runtime->delay, so far, hence audio_tstamp is calculated
> only from the hwptr position.  As the corresponding tstamp gets
> updated only when the audio_tstamp (i.e. hwptr) is updated, the driver
> should provide the consistent pair of audio_tstamp (i.e. hwptr) vs
> tstamp.

Fair enough — I had been thinking about the situation with the patch in place.

>> Another issue, as I see it, is that the audio_tstamp value would depend on whether, and when, a snd_pcm_delay() call (which recalculates the interpolation and puts it into the delay field) was made immediately prior to it. By zeroing the delay when a GPU interrupt occurs, you could be certain that the interpolated delay would be less than or equal to the true delay, but this doesn’t seem very satisfactory: you have neither the timestamp of the last update nor the correctly interpolated timestamp.
> 
> No, audio_tstamp field is updated at snd_pcm_period_elapsed() call as
> well as tstamp field.

That's great.

> Basically the driver provides three things: hwptr, tstamp and
> audio_tstamp.  For the default configuration (like bcm audio does),
> audio_tstamp is calculated from hwptr, so it can be seen as the hwptr
> represented in timespec.  OTOH, tstamp is the actual system time that
> is updated only when audio_tstamp changes -- which means tstamp gets
> updated *only* at snd_pcm_period_elapsed() call on bcm audio.
> 
> And, my point is that you should be able to interpolate the actual
> position on the user-space side based on this information; it doesn't
> have to be done in the kernel at all.

That is true, of course. The problem is that the snd_pcm_delay() call is so inaccurate though.

>> Sadly, therefore, I’m now of the view that this approach to interpolating the delay between GPU interrupts is not really viable. Would that be your view?

> Actually there were some bugs in the past that the tstamp was updated
> at each snd_pcm_status(), but it should have been fixed in the recent
> kernels.  That's why I asked to re-check the current status.

Yes, as far as I can see, that's fixed. 

In further testing, however, I noticed that the audio_frames calculation in update_audio_tstamp() in pcm_lib.c didn't include the delay,
so now it does if the delay field is negative, which it is "naturally" in this case. With that change, the delay reported by snd_pcm_delay() and
calculated as you referred to above are consistent.

So, overall, I am happier that this approach is at least viable. But two issues remain, in my view:
First, is it "a good idea"?
Second, the delay field is now being used as a delay if it's positive and an interpolation if it's negative. It works, but would it be better to have an extra "interpolation" field?

I'll post the updated patch shortly.

Thanks,
Mike
Patch

diff --git a/drivers/staging/vc04_services/bcm2835-audio/bcm2835-pcm.c b/drivers/staging/vc04_services/bcm2835-audio/bcm2835-pcm.c
index e66da11af5cf..9053b996cada 100644
--- a/drivers/staging/vc04_services/bcm2835-audio/bcm2835-pcm.c
+++ b/drivers/staging/vc04_services/bcm2835-audio/bcm2835-pcm.c
@@ -74,6 +74,7 @@  void bcm2835_playback_fifo(struct bcm2835_alsa_stream *alsa_stream,
 	atomic_set(&alsa_stream->pos, pos);
 
 	alsa_stream->period_offset += bytes;
+	alsa_stream->interpolate_start = ktime_get();
 	if (alsa_stream->period_offset >= alsa_stream->period_size) {
 		alsa_stream->period_offset %= alsa_stream->period_size;
 		snd_pcm_period_elapsed(substream);
@@ -243,6 +244,7 @@  static int snd_bcm2835_pcm_prepare(struct snd_pcm_substream *substream)
 	atomic_set(&alsa_stream->pos, 0);
 	alsa_stream->period_offset = 0;
 	alsa_stream->draining = false;
+	alsa_stream->interpolate_start = ktime_get();
 
 	return 0;
 }
@@ -292,6 +294,24 @@  snd_bcm2835_pcm_pointer(struct snd_pcm_substream *substream)
 {
 	struct snd_pcm_runtime *runtime = substream->runtime;
 	struct bcm2835_alsa_stream *alsa_stream = runtime->private_data;
+	ktime_t now = ktime_get();
+
+	/* Give userspace better delay reporting by interpolating between GPU
+	 * notifications, assuming audio speed is close enough to the clock
+	 * used for ktime
+	 */
+
+	if ((ktime_to_ns(alsa_stream->interpolate_start)) &&
+	    (ktime_compare(alsa_stream->interpolate_start, now) < 0)) {
+		u64 interval =
+			(ktime_to_ns(ktime_sub(now,
+				alsa_stream->interpolate_start)));
+		u64 frames_output_in_interval =
+			div_u64((interval * runtime->rate), 1000000000);
+		snd_pcm_sframes_t frames_output_in_interval_sized =
+			-frames_output_in_interval;
+		runtime->delay = frames_output_in_interval_sized;
+	}
 
 	return snd_pcm_indirect_playback_pointer(substream,
 		&alsa_stream->pcm_indirect,
diff --git a/drivers/staging/vc04_services/bcm2835-audio/bcm2835.h b/drivers/staging/vc04_services/bcm2835-audio/bcm2835.h
index e13435d1c205..595ad584243f 100644
--- a/drivers/staging/vc04_services/bcm2835-audio/bcm2835.h
+++ b/drivers/staging/vc04_services/bcm2835-audio/bcm2835.h
@@ -78,6 +78,7 @@  struct bcm2835_alsa_stream {
 	unsigned int period_offset;
 	unsigned int buffer_size;
 	unsigned int period_size;
+	ktime_t interpolate_start;
 
 	struct bcm2835_audio_instance *instance;
 	int idx;