[-,dmix,v3,0/1] pcm: dmix: Align slave_hw_ptr to slave period boundary
mbox series

Message ID bdf2f9ca-7ff1-40cb-271d-4a83b32ccf56@de.adit-jv.com
Headers show
Series
  • pcm: dmix: Align slave_hw_ptr to slave period boundary
Related show

Message

Timo Wischer Nov. 6, 2018, 12:48 p.m. UTC
On 10/30/18 14:21, Takashi Iwai wrote:
 > Yes, and is this correct?  Suppose you start a stream at the position
 > one sample before the next period (psize * N + (psize-1)). Then
 > slave_hw_ptr is psize * N.  At the next moment, the dpcm->hw.ptr
 > reaches to psize * (N+1), and snd_pcm_dmix_sync_ptr() gets called.
 > Then slave_hw_ptr will be updated to psize * (N+1) although only one
 > sample has been processed?

I have created a spreadsheet to simulate the behavior of dmix (see 
attachment). I used this sheet to test different corner cases with 
different implementations.
Column B-D describes 3 corner cases with the original implementation. 
Column E-H describes the corner cases with patch v2 applied and column 
I-L describes the corner cases with patch v3.
With patch v3 I was not able to find a corner case which would fail. All 
our internal audio test are also passing fine.

The corner cases are described in the form
N*period+period-1
(N+1)period+1
(N+2)period+1

The first line describes the value of dmix->spcm->hw.ptr when 
snd_pcm_dmix_start() will be called.
The second line describes the hw_ptr when poll() returns for the first 
time and the third line describes the hw_ptr when poll() returns for the 
second time.

In column B is the issue case described when the patch is not applied. 
snd_pcm_avail() returns 2 frames only (B27) but the buffer only contains 
3 frames (B18).
Therefore up to 5 frames are available (a buffer size of 8 frames is 
choosen).

The issue case of patch v2 is shown in column H. snd_pcm_avail() returns 
7 frames (H46) but the buffer contains still 3 frames (H37). Therefore 
there are only 5 frames free to overwrite.

Do you see any corner cases which I missed or any other drawbacks?

Best regards and thanks for your time

Timo

Comments

Takashi Iwai Nov. 6, 2018, 2:27 p.m. UTC | #1
On Tue, 06 Nov 2018 13:48:08 +0100,
Timo Wischer wrote:
> 
> On 10/30/18 14:21, Takashi Iwai wrote:
> > Yes, and is this correct?  Suppose you start a stream at the position
> > one sample before the next period (psize * N + (psize-1)). Then
> > slave_hw_ptr is psize * N.  At the next moment, the dpcm->hw.ptr
> > reaches to psize * (N+1), and snd_pcm_dmix_sync_ptr() gets called.
> > Then slave_hw_ptr will be updated to psize * (N+1) although only one
> > sample has been processed?
> 
> I have created a spreadsheet to simulate the behavior of dmix (see
> attachment). I used this sheet to test different corner cases with
> different implementations.
> Column B-D describes 3 corner cases with the original
> implementation. Column E-H describes the corner cases with patch v2
> applied and column I-L describes the corner cases with patch v3.
> With patch v3 I was not able to find a corner case which would
> fail. All our internal audio test are also passing fine.
> 
> The corner cases are described in the form
> N*period+period-1
> (N+1)period+1
> (N+2)period+1
> 
> The first line describes the value of dmix->spcm->hw.ptr when
> snd_pcm_dmix_start() will be called.
> The second line describes the hw_ptr when poll() returns for the first
> time and the third line describes the hw_ptr when poll() returns for
> the second time.
> 
> In column B is the issue case described when the patch is not
> applied. snd_pcm_avail() returns 2 frames only (B27) but the buffer
> only contains 3 frames (B18).
> Therefore up to 5 frames are available (a buffer size of 8 frames is
> choosen).
> 
> The issue case of patch v2 is shown in column H. snd_pcm_avail()
> returns 7 frames (H46) but the buffer contains still 3 frames
> (H37). Therefore there are only 5 frames free to overwrite.

Hrm, sorry, it's still not clear to me what you've tested and what
these items represent.

> Do you see any corner cases which I missed or any other drawbacks?

I guess that the biggest issue is the understanding of PCM period
wakeup; let me cite the description of a part of your patch:

 "But since 3 snd_pcm_period_elapsed(), 3 periods should be available
  and it should have been able to write.
  If rsnd_pointer during the start was 0x120 which is 3 periods
  then 0x248 - 0x120 =  128 it could go on with write."

This assumption can't be applied.

However, the current implementation of dmix is designed to achieve the
lowest latency by putting the data at the exact position being played
back now.  This, of course, doesn't guarantee the period wakeup = one
period free to fill like the above.  So it's the design choice.

The current code has a workaround for the case of nperiods <= 2, by
setting the initial slave_appl_ptr to the next period start.  This
guarantees that the period wakeup = one period to be filled.  But its
cost is the start latency; the playback doesn't start immediately but
wait until the next period start.


Takashi
Timo Wischer Nov. 6, 2018, 4:15 p.m. UTC | #2
On 11/6/18 15:27, Takashi Iwai wrote:
> I guess that the biggest issue is the understanding of PCM period
> wakeup; let me cite the description of a part of your patch:
>
>   "But since 3 snd_pcm_period_elapsed(), 3 periods should be available
>    and it should have been able to write.
>    If rsnd_pointer during the start was 0x120 which is 3 periods
>    then 0x248 - 0x120 =  128 it could go on with write."
>
> This assumption can't be applied.
>
> However, the current implementation of dmix is designed to achieve the
> lowest latency by putting the data at the exact position being played
> back now.  This, of course, doesn't guarantee the period wakeup = one
> period free to fill like the above.  So it's the design choice.
>
> The current code has a workaround for the case of nperiods <= 2, by
> setting the initial slave_appl_ptr to the next period start.  This
> guarantees that the period wakeup = one period to be filled.  But its
> cost is the start latency; the playback doesn't start immediately but
> wait until the next period start.
>
>
> Takashi

Thanks for this explanation. That helps me to understand.

Now, I am thinking about a solution to align the slave_hw_ptr and 
slave_appl_ptr to the slave_period (round down) only on start up.
So we keep this low latency. (At start up it is the same behavior as 
known from old sound drivers)

As known from the past in worst case it will drop up to slave_period-1 
frames.
If the application uses big periods and the drop is not acceptable a 
small slave_period using the var_period feature could be configured.
But I think it is also not acceptable for an application which is using 
really big periods to wait sometimes one period longer.

There is always a higher probability of an under run whenever a poll() 
wakeup will not result in a write.
For example on systems with high load this could be an issue.

If you are anyway not happy with this solution may be we can think about 
an option to disable this feature
like "min_drop_on_start true"
But then we also should describe the drawbacks of this option like
* Delay of snd_pcm_wait() up to 2*periods
* Higher probability of Xruns


Best regards

Timo
Takashi Iwai Nov. 6, 2018, 4:32 p.m. UTC | #3
On Tue, 06 Nov 2018 17:15:31 +0100,
Timo Wischer wrote:
> 
> On 11/6/18 15:27, Takashi Iwai wrote:
> > I guess that the biggest issue is the understanding of PCM period
> > wakeup; let me cite the description of a part of your patch:
> >
> >   "But since 3 snd_pcm_period_elapsed(), 3 periods should be available
> >    and it should have been able to write.
> >    If rsnd_pointer during the start was 0x120 which is 3 periods
> >    then 0x248 - 0x120 =  128 it could go on with write."
> >
> > This assumption can't be applied.
> >
> > However, the current implementation of dmix is designed to achieve the
> > lowest latency by putting the data at the exact position being played
> > back now.  This, of course, doesn't guarantee the period wakeup = one
> > period free to fill like the above.  So it's the design choice.
> >
> > The current code has a workaround for the case of nperiods <= 2, by
> > setting the initial slave_appl_ptr to the next period start.  This
> > guarantees that the period wakeup = one period to be filled.  But its
> > cost is the start latency; the playback doesn't start immediately but
> > wait until the next period start.
> >
> >
> > Takashi
> 
> Thanks for this explanation. That helps me to understand.
> 
> Now, I am thinking about a solution to align the slave_hw_ptr and
> slave_appl_ptr to the slave_period (round down) only on start up.
> So we keep this low latency. (At start up it is the same behavior as
> known from old sound drivers)
> 
> As known from the past in worst case it will drop up to slave_period-1 
> frames.
> If the application uses big periods and the drop is not acceptable a
> small slave_period using the var_period feature could be configured.
> But I think it is also not acceptable for an application which is
> using really big periods to wait sometimes one period longer.

Well, which case is acceptable and which isn't -- we can't define it
in our side so easily.  This really depends on how the application is
written and how the hardware setup is.

For example, imagine to play a very short beep sound with dmix.
Dropping samples (at most period-1 size) may result in silence, and
that's not acceptable.

> There is always a higher probability of an under run whenever a poll()
> wakeup will not result in a write.
> For example on systems with high load this could be an issue.
> 
> If you are anyway not happy with this solution may be we can think
> about an option to disable this feature
> like "min_drop_on_start true"
> But then we also should describe the drawbacks of this option like
> * Delay of snd_pcm_wait() up to 2*periods
> * Higher probability of Xruns

This probably depends on how many periods are present on the buffer.
For the standard PC onboard sounds, 16 periods are used as default,
and that's enough for avoiding xruns for normal applications.

I guess we may provide an option for choosing one of the following
modes:

- current default mode;
  no latency, but wakeup may be delayed, and may require 2*periods

- appl_ptr round up to the next period;
  the current workaround for periods <= 2.
  This imposes the start latency.

- round down to the current period start;
  No latency, but will drop a few samples at start.

... and ideally, provide a mode "auto" for choosing the appropriate
one depending on the situation by a smart guess.


thanks,

Takashi