fsl_ssi.c: Getting channel slips with fsl_ssi.c in TDM (network) mode.
diff mbox

Message ID CAG5mAdzJ2_MizAq2ACXwnA=wfO=xujGbUdB2OvcDi=7fHQfyBQ@mail.gmail.com
State Not Applicable
Headers show

Commit Message

Caleb Crome Oct. 29, 2015, 10:23 p.m. UTC
On Thu, Oct 29, 2015 at 12:28 PM, Nicolin Chen <nicoleotsuka@gmail.com> wrote:
> On Thu, Oct 29, 2015 at 12:06:16PM -0700, Caleb Crome wrote:
>
>> This actually is exactly what I'm seeing now.  I'm seeing the
>> *startup* happening from the trigger starting up slipped.  So this
>> does make perfect sense to me.
>
> I saw your problem in the other reply. And I suggested you to let
> DMA work first before SSI gets enabled. As SDMA in that case would
> transfer one burst length (16 if you applied my patch I sent you)
> and pause before SSI gets enabled. Then SSI would have enough data
> to send out without any startup issue.

Ah ha, you are exactly right.  The root cause is that TE and SSIE are
enabled at the same regmap write, with no opportunity for delay
between the SSIE and TE.
DMA can only get going if SSIE is enabled, and the only place SSIE
gets enabled is exactly the same line that TE gets enabled.

specifically: regmap_update_bits(regs, CCSR_SSI_SCR, vals->scr, vals->scr);

I've looked over your emails and I don't see the patch that shows a
pause between SSIE enable and TE enable.  (I do see the dual-fifo
example -- thank you!  I'll give that a try -- it may further reduce
stress on the system).

Here is a patch that solves the issue much more elegantly than my previous one:

>
>> It occurred to me that perhaps the problem has to do when exactly when
>> during the frame-sync period the fsl_ssi_trigger function was called.
>> Perhaps, if it's called near the end or beginning of a frame, somehow
>
> I don't know how you measured if it's before of after. But the frame
> should not start until trigger() gets call -- more clearly SSIEN and
> TE get enabled. From my point of view, you problem should be caused
> by SSI getting enabled without enough data in the FIFO. And that's
> what I just described in the previous paragraph and previous reply.

Yep, that sure seems to be it.  This patch above never seems to have a
bad start.
Is adding the udelay the best way to put a delay between SSIE and TE enable?
Are there any other mechanisms for that?

Thanks so much for your attention and help!  I think I can finally
move forward with the MX6 on a bunch of projects now :-)

(well, I still have to test Rx and verify full dulex perfection there
too, but this is a great start)

-Caleb

Comments

Nicolin Chen Oct. 29, 2015, 10:47 p.m. UTC | #1
On Thu, Oct 29, 2015 at 03:23:41PM -0700, Caleb Crome wrote:

> > I saw your problem in the other reply. And I suggested you to let
> > DMA work first before SSI gets enabled. As SDMA in that case would
> > transfer one burst length (16 if you applied my patch I sent you)
> > and pause before SSI gets enabled. Then SSI would have enough data
> > to send out without any startup issue.
> 
> Ah ha, you are exactly right.  The root cause is that TE and SSIE are
> enabled at the same regmap write, with no opportunity for delay
> between the SSIE and TE.
> DMA can only get going if SSIE is enabled, and the only place SSIE
> gets enabled is exactly the same line that TE gets enabled.

A little difference between your point and mine is that you think
DMA request only starts when SSIE and TDMAE both get set while I
only think about TDMAE. It's hard to say which one is correct as
it depends on the design of IP wrapper but you can fairly test it
with your change below: Mask both TE with SSIE and set them after
the delay. If it doesn't work, yours is the correct one.

> I've looked over your emails and I don't see the patch that shows a

You may need to open an offline email that I sent you with patches
in its attachment. I can see it via Gmail anyway.

> pause between SSIE enable and TE enable.  (I do see the dual-fifo
> example -- thank you!  I'll give that a try -- it may further reduce
> stress on the system).

I'm sure dual FIFO will get better performance. But the example I
gave you doesn't set RX parameters so well. You may need to fine
tune it later.

> Is adding the udelay the best way to put a delay between SSIE and TE enable?
> Are there any other mechanisms for that?

Having a delay is much safer for you but surely it's not a common
practice that's best all other platforms such as two-channel cases
and those who needs performance.

I encourage you to try to follow one of patches I gave you that
sets TDMAE/RDMAE at the beginning of the trigger(). Surely you may
change it to TDMAE | SSIE after you find out that SSIE is indeed
required. If you are still having trouble, adding a delay would
be nice for you but it may be hard for me to ack it if you want
to merge it in the driver.

Nicolin
Caleb Crome Oct. 29, 2015, 11:33 p.m. UTC | #2
On Thu, Oct 29, 2015 at 3:47 PM, Nicolin Chen <nicoleotsuka@gmail.com> wrote:
> On Thu, Oct 29, 2015 at 03:23:41PM -0700, Caleb Crome wrote:
>
>> > I saw your problem in the other reply. And I suggested you to let
>> > DMA work first before SSI gets enabled. As SDMA in that case would
>> > transfer one burst length (16 if you applied my patch I sent you)
>> > and pause before SSI gets enabled. Then SSI would have enough data
>> > to send out without any startup issue.
>>
>> Ah ha, you are exactly right.  The root cause is that TE and SSIE are
>> enabled at the same regmap write, with no opportunity for delay
>> between the SSIE and TE.
>> DMA can only get going if SSIE is enabled, and the only place SSIE
>> gets enabled is exactly the same line that TE gets enabled.
>
> A little difference between your point and mine is that you think
> DMA request only starts when SSIE and TDMAE both get set while I
> only think about TDMAE. It's hard to say which one is correct as
> it depends on the design of IP wrapper but you can fairly test it
> with your change below: Mask both TE with SSIE and set them after
> the delay. If it doesn't work, yours is the correct one.

Ah, that's one thing that's very clear in the FSL datasheet:  the
FIFOs are ZEROED if SSIE is 0.  This means that even if the DMA were
trying to dump data in before SSIE is enabled, the data would go to
bit heaven.

The docs for TE say, "The normal transmit enable sequence is to write
data to the STX register(s) and then set the TE bit." (page 5145 of
IMX6SDLRM.pdf)

So in the DMA + fifo case the words, "write data to the STX
register(s)" imply that it's actually DMA writing to FIFOs, which then
write the STX register.  So, the sequence must be:  enable SSIE &
TDMAE to allow DMA to write to the fifo, then later enable TE, right?

>
>> I've looked over your emails and I don't see the patch that shows a
>
> You may need to open an offline email that I sent you with patches
> in its attachment. I can see it via Gmail anyway.

Will do.  Thanks.

>
>> pause between SSIE enable and TE enable.  (I do see the dual-fifo
>> example -- thank you!  I'll give that a try -- it may further reduce
>> stress on the system).
>
> I'm sure dual FIFO will get better performance. But the example I
> gave you doesn't set RX parameters so well. You may need to fine
> tune it later.
>
>> Is adding the udelay the best way to put a delay between SSIE and TE enable?
>> Are there any other mechanisms for that?
>
> Having a delay is much safer for you but surely it's not a common
> practice that's best all other platforms such as two-channel cases
> and those who needs performance.
>
> I encourage you to try to follow one of patches I gave you that
> sets TDMAE/RDMAE at the beginning of the trigger(). Surely you may
> change it to TDMAE | SSIE after you find out that SSIE is indeed
> required. If you are still having trouble, adding a delay would
> be nice for you but it may be hard for me to ack it if you want
> to merge it in the driver.

I now I see your patch!  Okay, I'll give that a go, but it's still
just a race condition between the regmap_update_bits with TDMAE (your
patch) verses the regmap_update_bits from fsl_ssi_config. You're just
hoping that a DMA write happens between TDMAE and the end of
fsl_ssi_config where TE is enabled.

Now I think I get it though.  We do TMDAE + SSIEN like your patch,
then a short while loop on SFCSR.TFCNT0.  After the first word gets
written to the fifo, TFCNT0 should go > 0, and then we can release TE.

There may be a better status register to wait on but TFCNT0 seems like
it will do the trick.

What do you think of that solution?  Any better register to wait on?
Would that be acceptable to merge into the driver?

Thanks,
  -Caleb

>
> Nicolin
Nicolin Chen Oct. 30, 2015, 1:29 a.m. UTC | #3
On Thu, Oct 29, 2015 at 04:33:26PM -0700, Caleb Crome wrote:

> > A little difference between your point and mine is that you think
> > DMA request only starts when SSIE and TDMAE both get set while I
> > only think about TDMAE. It's hard to say which one is correct as
> > it depends on the design of IP wrapper but you can fairly test it
> > with your change below: Mask both TE with SSIE and set them after
> > the delay. If it doesn't work, yours is the correct one.
> 
> Ah, that's one thing that's very clear in the FSL datasheet:  the
> FIFOs are ZEROED if SSIE is 0.  This means that even if the DMA were
> trying to dump data in before SSIE is enabled, the data would go to
> bit heaven.
> 
> The docs for TE say, "The normal transmit enable sequence is to write
> data to the STX register(s) and then set the TE bit." (page 5145 of
> IMX6SDLRM.pdf)
> 
> So in the DMA + fifo case the words, "write data to the STX
> register(s)" imply that it's actually DMA writing to FIFOs, which then
> write the STX register.  So, the sequence must be:  enable SSIE &
> TDMAE to allow DMA to write to the fifo, then later enable TE, right?

You have the point. If SSIEN is being treated as the reset signal
internally, any write enable signal could be ignored.

> > I encourage you to try to follow one of patches I gave you that
> > sets TDMAE/RDMAE at the beginning of the trigger(). Surely you may
> > change it to TDMAE | SSIE after you find out that SSIE is indeed
> > required. If you are still having trouble, adding a delay would
> > be nice for you but it may be hard for me to ack it if you want
> > to merge it in the driver.
> 
> I now I see your patch!  Okay, I'll give that a go, but it's still
> just a race condition between the regmap_update_bits with TDMAE (your
> patch) verses the regmap_update_bits from fsl_ssi_config. You're just
> hoping that a DMA write happens between TDMAE and the end of
> fsl_ssi_config where TE is enabled.

DMA transaction will be issued once BD is ready (in SDMA driver)
and SSI sends a DMA request. So I'm hoping that the context
latency between the regmap_update_bits() and TE setting should be
enough for DMA to fill the FIFO.

> Now I think I get it though.  We do TMDAE + SSIEN like your patch,
> then a short while loop on SFCSR.TFCNT0.  After the first word gets
> written to the fifo, TFCNT0 should go > 0, and then we can release TE.
> 
> There may be a better status register to wait on but TFCNT0 seems like
> it will do the trick.

Waiting for TFCNT0 sounds reasonable to me as long as the code is
well commented.
Arnaud Mouiche Oct. 30, 2015, 8:29 a.m. UTC | #4
Hi,

Le 30/10/2015 02:29, Nicolin Chen a écrit :
> On Thu, Oct 29, 2015 at 04:33:26PM -0700, Caleb Crome wrote:
>
>>> A little difference between your point and mine is that you think
>>> DMA request only starts when SSIE and TDMAE both get set while I
>>> only think about TDMAE. It's hard to say which one is correct as
>>> it depends on the design of IP wrapper but you can fairly test it
>>> with your change below: Mask both TE with SSIE and set them after
>>> the delay. If it doesn't work, yours is the correct one.
>> Ah, that's one thing that's very clear in the FSL datasheet:  the
>> FIFOs are ZEROED if SSIE is 0.  This means that even if the DMA were
>> trying to dump data in before SSIE is enabled, the data would go to
>> bit heaven.
>>
>> The docs for TE say, "The normal transmit enable sequence is to write
>> data to the STX register(s) and then set the TE bit." (page 5145 of
>> IMX6SDLRM.pdf)
>>
>> So in the DMA + fifo case the words, "write data to the STX
>> register(s)" imply that it's actually DMA writing to FIFOs, which then
>> write the STX register.  So, the sequence must be:  enable SSIE &
>> TDMAE to allow DMA to write to the fifo, then later enable TE, right?
> You have the point. If SSIEN is being treated as the reset signal
> internally, any write enable signal could be ignored.
>
>>> I encourage you to try to follow one of patches I gave you that
>>> sets TDMAE/RDMAE at the beginning of the trigger(). Surely you may
>>> change it to TDMAE | SSIE after you find out that SSIE is indeed
>>> required. If you are still having trouble, adding a delay would
>>> be nice for you but it may be hard for me to ack it if you want
>>> to merge it in the driver.
>> I now I see your patch!  Okay, I'll give that a go, but it's still
>> just a race condition between the regmap_update_bits with TDMAE (your
>> patch) verses the regmap_update_bits from fsl_ssi_config. You're just
>> hoping that a DMA write happens between TDMAE and the end of
>> fsl_ssi_config where TE is enabled.
> DMA transaction will be issued once BD is ready (in SDMA driver)
> and SSI sends a DMA request. So I'm hoping that the context
> latency between the regmap_update_bits() and TE setting should be
> enough for DMA to fill the FIFO.
>
>> Now I think I get it though.  We do TMDAE + SSIEN like your patch,
>> then a short while loop on SFCSR.TFCNT0.  After the first word gets
>> written to the fifo, TFCNT0 should go > 0, and then we can release TE.
>>
>> There may be a better status register to wait on but TFCNT0 seems like
>> it will do the trick.
> Waiting for TFCNT0 sounds reasonable to me as long as the code is
> well commented.

At imx50 age, I remember one workaround was to fill the fifo manually, 
writing directly a number of samples (equal to the number of slots for 
one frame to keep the synchronization), and then, enable the TMDAE.
This just allow to not have to wait an undefined period of time for the 
DMA to be ready.
But, on the other hand, if the time to wait the DMA is short enough, it 
should not be an issue.

Regards,
Arnaud
Arnaud Mouiche Oct. 30, 2015, 8:45 a.m. UTC | #5
Hi again,

Le 30/10/2015 09:29, arnaud.mouiche@invoxia.com a écrit :
> Hi,
>
> Le 30/10/2015 02:29, Nicolin Chen a écrit :
>> On Thu, Oct 29, 2015 at 04:33:26PM -0700, Caleb Crome wrote:
>>
>>>> A little difference between your point and mine is that you think
>>>> DMA request only starts when SSIE and TDMAE both get set while I
>>>> only think about TDMAE. It's hard to say which one is correct as
>>>> it depends on the design of IP wrapper but you can fairly test it
>>>> with your change below: Mask both TE with SSIE and set them after
>>>> the delay. If it doesn't work, yours is the correct one.
>>> Ah, that's one thing that's very clear in the FSL datasheet: the
>>> FIFOs are ZEROED if SSIE is 0.  This means that even if the DMA were
>>> trying to dump data in before SSIE is enabled, the data would go to
>>> bit heaven.
>>>
>>> The docs for TE say, "The normal transmit enable sequence is to write
>>> data to the STX register(s) and then set the TE bit." (page 5145 of
>>> IMX6SDLRM.pdf)
>>>
>>> So in the DMA + fifo case the words, "write data to the STX
>>> register(s)" imply that it's actually DMA writing to FIFOs, which then
>>> write the STX register.  So, the sequence must be:  enable SSIE &
>>> TDMAE to allow DMA to write to the fifo, then later enable TE, right?
>> You have the point. If SSIEN is being treated as the reset signal
>> internally, any write enable signal could be ignored.
>>
>>>> I encourage you to try to follow one of patches I gave you that
>>>> sets TDMAE/RDMAE at the beginning of the trigger(). Surely you may
>>>> change it to TDMAE | SSIE after you find out that SSIE is indeed
>>>> required. If you are still having trouble, adding a delay would
>>>> be nice for you but it may be hard for me to ack it if you want
>>>> to merge it in the driver.
>>> I now I see your patch!  Okay, I'll give that a go, but it's still
>>> just a race condition between the regmap_update_bits with TDMAE (your
>>> patch) verses the regmap_update_bits from fsl_ssi_config. You're just
>>> hoping that a DMA write happens between TDMAE and the end of
>>> fsl_ssi_config where TE is enabled.
>> DMA transaction will be issued once BD is ready (in SDMA driver)
>> and SSI sends a DMA request. So I'm hoping that the context
>> latency between the regmap_update_bits() and TE setting should be
>> enough for DMA to fill the FIFO.
>>
>>> Now I think I get it though.  We do TMDAE + SSIEN like your patch,
>>> then a short while loop on SFCSR.TFCNT0.  After the first word gets
>>> written to the fifo, TFCNT0 should go > 0, and then we can release TE.
>>>
>>> There may be a better status register to wait on but TFCNT0 seems like
>>> it will do the trick.
>> Waiting for TFCNT0 sounds reasonable to me as long as the code is
>> well commented.
>
> At imx50 age, I remember one workaround was to fill the fifo manually, 
> writing directly a number of samples (equal to the number of slots for 
> one frame to keep the synchronization), and then, enable the TMDAE.
> This just allow to not have to wait an undefined period of time for 
> the DMA to be ready.
> But, on the other hand, if the time to wait the DMA is short enough, 
> it should not be an issue.
>
> Regards,
> Arnaud
>
In the same idea, they were other similar issues to deal with concerning 
the RX and TX fifo.

1) Still some samples in the TX fifo when stoping/ re-starting the TX, 
while RX stream is going on.
Since we can't reset the TX fifo content without disabling SSIEN, 
possible samples filled by the TX DMA are still there when the TX stream 
stops. And when we start it again, they introduce a random 
de-synchronization of the output.
The workaround for this case was to add additional zero samples in the 
fifo manually to reach a multiple of the frame size.
But I would prefer a way to empty manually the fifo instead. If 
Freescale can help us to find another way as they know the internal of 
the SSI...

2) the same for RX fifo, if the RX stream is stopped/-restarted, while 
TX stream is not stopped.
We may still have some samples in the RX fifo, and those fifo must be 
removed before starting the RX again.
This was more simple in this case, since we only need to read the RX 
register manually until the fifo is empty, before enabling the DMA.

Obviously, disabling the SSIEN completely to start on good basis is not 
possible since we will lose a random number of samples already present 
in the fifo corresponding to the stream we don't want to stop, and this 
number of samples may not be a multiple of slots.

Arnaud
Nicolin Chen Oct. 30, 2015, 3:49 p.m. UTC | #6
On Fri, Oct 30, 2015 at 09:29:02AM +0100, arnaud.mouiche@invoxia.com wrote:

> At imx50 age, I remember one workaround was to fill the fifo
> manually, writing directly a number of samples (equal to the number
> of slots for one frame to keep the synchronization), and then,
> enable the TMDAE.
> This just allow to not have to wait an undefined period of time for
> the DMA to be ready.
> But, on the other hand, if the time to wait the DMA is short enough,
> it should not be an issue.

Nice input. This reminds me of the zero-filling step inside the
ESAI startup procedure:

	case SNDRV_PCM_TRIGGER_START:
	case SNDRV_PCM_TRIGGER_RESUME:
	case SNDRV_PCM_TRIGGER_PAUSE_RELEASE:
		regmap_update_bits(esai_priv->regmap, REG_ESAI_xFCR(tx),
				   ESAI_xFCR_xFEN_MASK, ESAI_xFCR_xFEN);

		/* Write initial words reqiured by ESAI as normal procedure */
		for (i = 0; tx && i < channels; i++)
			regmap_write(esai_priv->regmap, REG_ESAI_ETDR, 0x0);

		regmap_update_bits(esai_priv->regmap, REG_ESAI_xCR(tx),
				   tx ? ESAI_xCR_TE_MASK : ESAI_xCR_RE_MASK,
				   tx ? ESAI_xCR_TE(pins) : ESAI_xCR_RE(pins));
		break;

It's exactly the same thing to prevent underrun. This might be
a reasonable alternative option due to no polling overhead and
timeout handling.

Thanks
Nicolin
Nicolin Chen Oct. 30, 2015, 4:07 p.m. UTC | #7
Add Shengjiu.

On Fri, Oct 30, 2015 at 09:45:32AM +0100, arnaud.mouiche@invoxia.com wrote:

> >At imx50 age, I remember one workaround was to fill the fifo
> >manually, writing directly a number of samples (equal to the
> >number of slots for one frame to keep the synchronization), and
> >then, enable the TMDAE.
> >This just allow to not have to wait an undefined period of time
> >for the DMA to be ready.
> >But, on the other hand, if the time to wait the DMA is short
> >enough, it should not be an issue.

> In the same idea, they were other similar issues to deal with
> concerning the RX and TX fifo.
> 
> 1) Still some samples in the TX fifo when stoping/ re-starting the
> TX, while RX stream is going on.
> Since we can't reset the TX fifo content without disabling SSIEN,
> possible samples filled by the TX DMA are still there when the TX
> stream stops. And when we start it again, they introduce a random
> de-synchronization of the output.
> The workaround for this case was to add additional zero samples in
> the fifo manually to reach a multiple of the frame size.
> But I would prefer a way to empty manually the fifo instead. If
> Freescale can help us to find another way as they know the internal
> of the SSI...
> 
> 2) the same for RX fifo, if the RX stream is stopped/-restarted,
> while TX stream is not stopped.
> We may still have some samples in the RX fifo, and those fifo must
> be removed before starting the RX again.
> This was more simple in this case, since we only need to read the RX
> register manually until the fifo is empty, before enabling the DMA.
> 
> Obviously, disabling the SSIEN completely to start on good basis is
> not possible since we will lose a random number of samples already
> present in the fifo corresponding to the stream we don't want to
> stop, and this number of samples may not be a multiple of slots.

You are right. Since SSI doesn't have separate FIFO reset bits
for TX and RX respectively. These could be issues.
Caleb Crome Oct. 30, 2015, 6:10 p.m. UTC | #8
On Fri, Oct 30, 2015 at 8:49 AM, Nicolin Chen <nicoleotsuka@gmail.com> wrote:
> On Fri, Oct 30, 2015 at 09:29:02AM +0100, arnaud.mouiche@invoxia.com wrote:
>
>> At imx50 age, I remember one workaround was to fill the fifo
>> manually, writing directly a number of samples (equal to the number
>> of slots for one frame to keep the synchronization), and then,
>> enable the TMDAE.
>> This just allow to not have to wait an undefined period of time for
>> the DMA to be ready.
>> But, on the other hand, if the time to wait the DMA is short enough,
>> it should not be an issue.
>
> Nice input. This reminds me of the zero-filling step inside the
> ESAI startup procedure:
>
>         case SNDRV_PCM_TRIGGER_START:
>         case SNDRV_PCM_TRIGGER_RESUME:
>         case SNDRV_PCM_TRIGGER_PAUSE_RELEASE:
>                 regmap_update_bits(esai_priv->regmap, REG_ESAI_xFCR(tx),
>                                    ESAI_xFCR_xFEN_MASK, ESAI_xFCR_xFEN);
>
>                 /* Write initial words reqiured by ESAI as normal procedure */
>                 for (i = 0; tx && i < channels; i++)
>                         regmap_write(esai_priv->regmap, REG_ESAI_ETDR, 0x0);
>
>                 regmap_update_bits(esai_priv->regmap, REG_ESAI_xCR(tx),
>                                    tx ? ESAI_xCR_TE_MASK : ESAI_xCR_RE_MASK,
>                                    tx ? ESAI_xCR_TE(pins) : ESAI_xCR_RE(pins));
>                 break;
>
> It's exactly the same thing to prevent underrun. This might be
> a reasonable alternative option due to no polling overhead and
> timeout handling.
>
> Thanks
> Nicolin

Interesting, but in the SSI case, the # channels can easily be larger
than the fifo size.  For example, I'm using 16 channels, and the fifo
size is only 15.  Of course, with the dual fifo enabled, then this
restriction is eased to a maximum # of slots of 30.

I'll do a quick check this morning with Nicolin's suggested patch, and
see how many loops it needs to poll before the fifo gets a word.  My
suspicion is that Nicolin's hunch is correct: that is,  that the
context switch provides enough time for the DMA to push a word.

Will check back shortly.

Then, we'll have the problem that Arnaud bring up -- when starting the
Rx after Tx, we will have fresh new problems.  In fact, I'm 100% sure
we will -- I already see it with a portaudio program that must bring
up tx and rx separately.  I think I'll start a new thread for that
problem once this one is put to bed.

-Caleb
Caleb Crome Oct. 30, 2015, 10:04 p.m. UTC | #9
On Thu, Oct 29, 2015 at 6:29 PM, Nicolin Chen <nicoleotsuka@gmail.com> wrote:
> On Thu, Oct 29, 2015 at 04:33:26PM -0700, Caleb Crome wrote:
>
>> > A little difference between your point and mine is that you think
>> > DMA request only starts when SSIE and TDMAE both get set while I
>> > only think about TDMAE. It's hard to say which one is correct as
>> > it depends on the design of IP wrapper but you can fairly test it
>> > with your change below: Mask both TE with SSIE and set them after
>> > the delay. If it doesn't work, yours is the correct one.
>>
>> Ah, that's one thing that's very clear in the FSL datasheet:  the
>> FIFOs are ZEROED if SSIE is 0.  This means that even if the DMA were
>> trying to dump data in before SSIE is enabled, the data would go to
>> bit heaven.
>>
>> The docs for TE say, "The normal transmit enable sequence is to write
>> data to the STX register(s) and then set the TE bit." (page 5145 of
>> IMX6SDLRM.pdf)
>>
>> So in the DMA + fifo case the words, "write data to the STX
>> register(s)" imply that it's actually DMA writing to FIFOs, which then
>> write the STX register.  So, the sequence must be:  enable SSIE &
>> TDMAE to allow DMA to write to the fifo, then later enable TE, right?
>
> You have the point. If SSIEN is being treated as the reset signal
> internally, any write enable signal could be ignored.
>
>> > I encourage you to try to follow one of patches I gave you that
>> > sets TDMAE/RDMAE at the beginning of the trigger(). Surely you may
>> > change it to TDMAE | SSIE after you find out that SSIE is indeed
>> > required. If you are still having trouble, adding a delay would
>> > be nice for you but it may be hard for me to ack it if you want
>> > to merge it in the driver.
>>
>> I now I see your patch!  Okay, I'll give that a go, but it's still
>> just a race condition between the regmap_update_bits with TDMAE (your
>> patch) verses the regmap_update_bits from fsl_ssi_config. You're just
>> hoping that a DMA write happens between TDMAE and the end of
>> fsl_ssi_config where TE is enabled.
>
> DMA transaction will be issued once BD is ready (in SDMA driver)
> and SSI sends a DMA request. So I'm hoping that the context
> latency between the regmap_update_bits() and TE setting should be
> enough for DMA to fill the FIFO.
>
>> Now I think I get it though.  We do TMDAE + SSIEN like your patch,
>> then a short while loop on SFCSR.TFCNT0.  After the first word gets
>> written to the fifo, TFCNT0 should go > 0, and then we can release TE.
>>
>> There may be a better status register to wait on but TFCNT0 seems like
>> it will do the trick.
>
> Waiting for TFCNT0 sounds reasonable to me as long as the code is
> well commented.

Okay, so I tried out your patch and it has 2 separate issues:  the
first related to missing samples (even when I enabled the SSIEN), and
the second relating to the dual fifo.
So, first the missing samples.

**********************
** Problem 1 *******
**********************
When I enable your patch in single fifo mode we get maxburst lost
samples at the beginning of the stream.  The important bit is below.
It doesn't matter if the SSIEN comes before or after the TDMAE, the
samples are lost.  Surely because something in the fsl_ssi_config
clears the fifo.  I didn't look into that yet.
--------------------------- snip -------------------
diff --git a/sound/soc/fsl/fsl_ssi.c b/sound/soc/fsl/fsl_ssi.c
index 73778c2..9d414e8 100644
@@ -1039,6 +1070,20 @@ static int fsl_ssi_trigger(struct
snd_pcm_substream *substream, int cmd,
     case SNDRV_PCM_TRIGGER_START:
     case SNDRV_PCM_TRIGGER_RESUME:
     case SNDRV_PCM_TRIGGER_PAUSE_RELEASE:
+        /* eanble SSI */
+        regmap_update_bits(regs, CCSR_SSI_SCR,
+                   CCSR_SSI_SCR_SSIEN,
+                   CCSR_SSI_SCR_SSIEN);
+        /* and enable DMA to start data pumping */
+        if (substream->stream == SNDRV_PCM_STREAM_PLAYBACK)
+            regmap_update_bits(regs, CCSR_SSI_SIER,
+                       CCSR_SSI_SIER_TDMAE,
+                       CCSR_SSI_SIER_TDMAE);
+        else
+            regmap_update_bits(regs, CCSR_SSI_SIER,
+                       CCSR_SSI_SIER_RDMAE,
+                       CCSR_SSI_SIER_RDMAE);
+
         if (substream->stream == SNDRV_PCM_STREAM_PLAYBACK)
             fsl_ssi_tx_config(ssi_private, true);
         else

--------------------------- snip -------------------

If instead, I do the SSIEN at the bottom of the fsl_ssi_configure
function like this, it works perfectly (in single fifo mode)

--------------------------- snip -------------------
+++ b/sound/soc/fsl/fsl_ssi.c
@@ -435,8 +435,49 @@ static void fsl_ssi_config(struct fsl_ssi_private
*ssi_private, bool enable,

 config_done:
     /* Enabling of subunits is done after configuration */
-    if (enable)
+    if (enable) {
+        /* eanble SSI */
+        regmap_update_bits(regs, CCSR_SSI_SCR,
+                   CCSR_SSI_SCR_SSIEN,
+                   CCSR_SSI_SCR_SSIEN);
+        /*
+         * We must wait here until the DMA actually manages to
+         * get a word into the Tx FIFO.  Only if starting a Tx
+         * stream.
+         * In tests on an MX6 at 1GHz clock speed, the do
+         * loop below never iterated at all (i.e. it dropped
+         * through without repeating ever.  which means that
+         * the DMA has had time to get some words into the TX
+         * buffer. In fact, the tfcnt0 was always 13, so it
+         * was quite full by the time it reached this point,
+         * so this do loop should never be a bottleneck.  If
+         * max iterations is hit, then something might be
+         * wrong.  report it in that case.
+         */
+        int tfcnt0 = 0;
+        int tfcnt1 = 1;
+        int max_iterations = 1000;
+        if (vals->scr & CCSR_SSI_SCR_TE) {
+            u32 sfcsr;
+            do {
+                regmap_read(regs, CCSR_SSI_SFCSR, &sfcsr);
+                tfcnt0 = CCSR_SSI_SFCSR_TFCNT0(sfcsr);
+                tfcnt1 = CCSR_SSI_SFCSR_TFCNT0(sfcsr);
+            } while(max_iterations-- && (tfcnt0 == 0) && (tfcnt1 == 0));
+        }
+        if (max_iterations <= 0) {
+            /*
+             * The DMA definitely should have stuck at
+             * least a word into the FIFO by now.  Report
+             * an error, but continue on blindly anyway,
+             * even though the SSI might not start right.
+             */
+            struct platform_device *pdev = ssi_private->pdev;
+            dev_err(&pdev->dev, "max_iterations reached when"
+                "starting SSI Tx\n");
+        }
         regmap_update_bits(regs, CCSR_SSI_SCR, vals->scr, vals->scr);
+    }
 }
--------------------------- snip -------------------

When all is fixed, the data comes in perfect order:
(most significant nibble is channel # from wav file, least significant
12 bits are frame number).
(hope that lines don't wrap at 79 characters...)
0000,1000,2000,3000,4000,5000,6000,7000,8000,9000,a000,b000,c000,d000,e000,f000
0001,1001,2001,3001,4001,5001,6001,7001,8001,9001,a001,b001,c001,d001,e001,f001
0002,1002,2002,3002,4002,5002,6002,7002,8002,9002,a002,b002,c002,d002,e002,f002
0003,1003,2003,3003,4003,5003,6003,7003,8003,9003,a003,b003,c003,d003,e003,f003
...
00fd,10fd,20fd,30fd,40fd,50fd,60fd,70fd,80fd,90fd,a0fd,b0fd,c0fd,d0fd,e0fd,f0fd
00fe,10fe,20fe,30fe,40fe,50fe,60fe,70fe,80fe,90fe,a0fe,b0fe,c0fe,d0fe,e0fe,f0fe
00ff,10ff,20ff,30ff,40ff,50ff,60ff,70ff,80ff,90ff,a0ff,b0ff,c0ff,d0ff,e0ff,f0ff
and all zeros after that.

FYI, this is a 256 frame, 16 channel sound file for doing this test.

**********************
** Problem 2 *******
**********************
When the dual fifo mode is enabled, the data comes in the following order:
0000,1000,2000,3000,4000,5000,6000,7000,8000,9000,a000,b000,c000,d000,e000,f000
0001,1001,2001,3001,4001,5001,6001,7001,8001,9001,a001,b001,c001,d001,e001,1002
0002,3002,2002,5002,4002,7002,6002,9002,8002,b002,a002,d002,c002,f002,e002,1003
0003,3003,2003,5003,4003,7003,6003,9003,8003,b003,a003,d003,c003,f003,e003,1004

Strange, right?  Frame 0 is perfect, however after the first frame,
the data gets scrambled and from then on is wrong.
The pattern stays consistent after frame 2.


Looks like I need to stick with single fifo for now.

So, bottom line is:  single fifo seems to work perfectly with my
proposed fix above.  Dual fifo doesn't seem to work.

-caleb
Caleb Crome Oct. 30, 2015, 10:35 p.m. UTC | #10
On Fri, Oct 30, 2015 at 3:04 PM, Caleb Crome <caleb@crome.org> wrote:
> On Thu, Oct 29, 2015 at 6:29 PM, Nicolin Chen <nicoleotsuka@gmail.com> wrote:
>> On Thu, Oct 29, 2015 at 04:33:26PM -0700, Caleb Crome wrote:
>>
>> Waiting for TFCNT0 sounds reasonable to me as long as the code is
>> well commented.
>
> Okay, so I tried out your patch and it has 2 separate issues:  the
> first related to missing samples (even when I enabled the SSIEN), and
> the second relating to the dual fifo.
> So, first the missing samples.
>
>
> **********************
> ** Problem 2 *******
> **********************
> When the dual fifo mode is enabled, the data comes in the following order:
> 0000,1000,2000,3000,4000,5000,6000,7000,8000,9000,a000,b000,c000,d000,e000,f000
> 0001,1001,2001,3001,4001,5001,6001,7001,8001,9001,a001,b001,c001,d001,e001,1002
> 0002,3002,2002,5002,4002,7002,6002,9002,8002,b002,a002,d002,c002,f002,e002,1003
> 0003,3003,2003,5003,4003,7003,6003,9003,8003,b003,a003,d003,c003,f003,e003,1004
>
> Strange, right?  Frame 0 is perfect, however after the first frame,
> the data gets scrambled and from then on is wrong.
> The pattern stays consistent after frame 2.
>
>
> Looks like I need to stick with single fifo for now.
>
> So, bottom line is:  single fifo seems to work perfectly with my
> proposed fix above.  Dual fifo doesn't seem to work.
>

I realized the problem with the dual fifo mode:  it's that you had set
the maxburst to 16 in your patch.  I guess this must be the maxburst
to a single fifo maybe?  When I set the fual fifo maxburst back to 8,
I get perfection again!

So, now bottom line is:  both single and dual ffio modes work :-)

So, with this patch (I'll send separately as a proper patch), Tx will
work perfectly.

Next thing to tackle is:  doing starts/stops on Tx/Rx in different orders:

So far, I've only verified:
TxStart - TxEnd

Next is:
RxStart - RxEnd

And the combinations:
TxStart - RxStart - TxEnd - RxEnd
TxStart - RxStart - RxEnd - TxEnd
RxStart - TxStart - RxEnd - TxEnd
RxStart - TxStart - TxEnd - RxEnd

Maybe the rest won't be as difficult :-/


> -caleb
Nicolin Chen Oct. 31, 2015, 1:32 a.m. UTC | #11
On Fri, Oct 30, 2015 at 03:35:17PM -0700, Caleb Crome wrote:

> > **********************
> > ** Problem 2 *******
> > **********************

> I realized the problem with the dual fifo mode:  it's that you had set
> the maxburst to 16 in your patch.  I guess this must be the maxburst
> to a single fifo maybe?  When I set the fual fifo maxburst back to 8,
> I get perfection again!

I actually set 16 for dual FIFO and tested with 2-channel playback.
Since you have dual FIFO right now, burst 8 doesn't hurt a lot as
SPBA should be a pretty dedicated bus from SDMA point of view. As
long as you don't have too many SPBA modules working with SDMA. It
should work for you.
Nicolin Chen Oct. 31, 2015, 1:48 a.m. UTC | #12
On Fri, Oct 30, 2015 at 03:04:37PM -0700, Caleb Crome wrote:
 
> **********************
> ** Problem 1 *******
> **********************
> When I enable your patch in single fifo mode we get maxburst lost
> samples at the beginning of the stream.  The important bit is below.
> It doesn't matter if the SSIEN comes before or after the TDMAE, the
> samples are lost.  Surely because something in the fsl_ssi_config
> clears the fifo.  I didn't look into that yet.

The config function has gone way complicated than I realized.
But you may need to dig a little bit to make a perfect point
to insert your change as it might break other platforms: AC97
and old i.MX and MPC.

> --------------------------- snip -------------------
> 
> If instead, I do the SSIEN at the bottom of the fsl_ssi_configure
> function like this, it works perfectly (in single fifo mode)
> 
> --------------------------- snip -------------------
> +++ b/sound/soc/fsl/fsl_ssi.c
> @@ -435,8 +435,49 @@ static void fsl_ssi_config(struct fsl_ssi_private
> *ssi_private, bool enable,
> 
>  config_done:
>      /* Enabling of subunits is done after configuration */
> -    if (enable)
> +    if (enable) {
> +        /* eanble SSI */
> +        regmap_update_bits(regs, CCSR_SSI_SCR,
> +                   CCSR_SSI_SCR_SSIEN,
> +                   CCSR_SSI_SCR_SSIEN);
> +        /*
> +         * We must wait here until the DMA actually manages to
> +         * get a word into the Tx FIFO.  Only if starting a Tx
> +         * stream.
> +         * In tests on an MX6 at 1GHz clock speed, the do
> +         * loop below never iterated at all (i.e. it dropped
> +         * through without repeating ever.  which means that
> +         * the DMA has had time to get some words into the TX
> +         * buffer. In fact, the tfcnt0 was always 13, so it
> +         * was quite full by the time it reached this point,
> +         * so this do loop should never be a bottleneck.  If
> +         * max iterations is hit, then something might be
> +         * wrong.  report it in that case.
> +         */
> +        int tfcnt0 = 0;
> +        int tfcnt1 = 1;
> +        int max_iterations = 1000;
> +        if (vals->scr & CCSR_SSI_SCR_TE) {
> +            u32 sfcsr;
> +            do {
> +                regmap_read(regs, CCSR_SSI_SFCSR, &sfcsr);
> +                tfcnt0 = CCSR_SSI_SFCSR_TFCNT0(sfcsr);
> +                tfcnt1 = CCSR_SSI_SFCSR_TFCNT0(sfcsr);
> +            } while(max_iterations-- && (tfcnt0 == 0) && (tfcnt1 == 0));
> +        }
> +        if (max_iterations <= 0) {
> +            /*
> +             * The DMA definitely should have stuck at
> +             * least a word into the FIFO by now.  Report
> +             * an error, but continue on blindly anyway,
> +             * even though the SSI might not start right.
> +             */
> +            struct platform_device *pdev = ssi_private->pdev;
> +            dev_err(&pdev->dev, "max_iterations reached when"
> +                "starting SSI Tx\n");
> +        }

Looks like polling is the only way to safely kick off. It's okay.
But I would like to see how the change will be after merging the
simultaneous TE/RE work around. It may need go a long way with
other platform users.
Caleb Crome Oct. 31, 2015, 4:12 p.m. UTC | #13
On Fri, Oct 30, 2015 at 6:32 PM, Nicolin Chen <nicoleotsuka@gmail.com> wrote:
> On Fri, Oct 30, 2015 at 03:35:17PM -0700, Caleb Crome wrote:
>
>> > **********************
>> > ** Problem 2 *******
>> > **********************
>
>> I realized the problem with the dual fifo mode:  it's that you had set
>> the maxburst to 16 in your patch.  I guess this must be the maxburst
>> to a single fifo maybe?  When I set the fual fifo maxburst back to 8,
>> I get perfection again!
>
> I actually set 16 for dual FIFO and tested with 2-channel playback.
> Since you have dual FIFO right now, burst 8 doesn't hurt a lot as
> SPBA should be a pretty dedicated bus from SDMA point of view. As
> long as you don't have too many SPBA modules working with SDMA. It
> should work for you.

But did check for bit perfect playback from start to end of your file?
 Or simply that it sounded right?  it appears that with maxburst=16,
after about 30 samples, the data plays out without further slipping.
This would appear 'working' without careful analysis because it could
slip an even number of samples and each channel would end up in the
right slot after a few frames.  maxbust=16 *definitely* does not work
for me.

-Caleb


> _______________________________________________
> Alsa-devel mailing list
> Alsa-devel@alsa-project.org
> http://mailman.alsa-project.org/mailman/listinfo/alsa-devel
Caleb Crome Oct. 31, 2015, 4:22 p.m. UTC | #14
On Fri, Oct 30, 2015 at 6:48 PM, Nicolin Chen <nicoleotsuka@gmail.com> wrote:
> On Fri, Oct 30, 2015 at 03:04:37PM -0700, Caleb Crome wrote:
>
>> **********************
>> ** Problem 1 *******
>> **********************
>> When I enable your patch in single fifo mode we get maxburst lost
>> samples at the beginning of the stream.  The important bit is below.
>> It doesn't matter if the SSIEN comes before or after the TDMAE, the
>> samples are lost.  Surely because something in the fsl_ssi_config
>> clears the fifo.  I didn't look into that yet.
>
> The config function has gone way complicated than I realized.

I know!  It's quite complex and has taken me some time to pretty much
understand.

> But you may need to dig a little bit to make a perfect point
> to insert your change as it might break other platforms: AC97
> and old i.MX and MPC.

The simple change to first enable SSIE, then TX should not cause any
issues I think.  The current code enables TE and SSIE at the same
time, which is clearly against what the datasheet says to do.  I
believe the end of the fsl_ssi_config file in fact is the right place
to do it, regardless of platform or format.  (As long as the fifo &
dima are enabled).

>



>> --------------------------- snip -------------------
>>
>> If instead, I do the SSIEN at the bottom of the fsl_ssi_configure
>> function like this, it works perfectly (in single fifo mode)
>>
>> --------------------------- snip -------------------
>> +++ b/sound/soc/fsl/fsl_ssi.c
>> @@ -435,8 +435,49 @@ static void fsl_ssi_config(struct fsl_ssi_private
>> *ssi_private, bool enable,
>>
>>  config_done:
>>      /* Enabling of subunits is done after configuration */
>> -    if (enable)
>> +    if (enable) {
>> +        /* eanble SSI */
>> +        regmap_update_bits(regs, CCSR_SSI_SCR,
>> +                   CCSR_SSI_SCR_SSIEN,
>> +                   CCSR_SSI_SCR_SSIEN);
>> +        /*
>> +         * We must wait here until the DMA actually manages to
>> +         * get a word into the Tx FIFO.  Only if starting a Tx
>> +         * stream.
>> +         * In tests on an MX6 at 1GHz clock speed, the do
>> +         * loop below never iterated at all (i.e. it dropped
>> +         * through without repeating ever.  which means that
>> +         * the DMA has had time to get some words into the TX
>> +         * buffer. In fact, the tfcnt0 was always 13, so it
>> +         * was quite full by the time it reached this point,
>> +         * so this do loop should never be a bottleneck.  If
>> +         * max iterations is hit, then something might be
>> +         * wrong.  report it in that case.
>> +         */
>> +        int tfcnt0 = 0;
>> +        int tfcnt1 = 1;
>> +        int max_iterations = 1000;
>> +        if (vals->scr & CCSR_SSI_SCR_TE) {
>> +            u32 sfcsr;
>> +            do {
>> +                regmap_read(regs, CCSR_SSI_SFCSR, &sfcsr);
>> +                tfcnt0 = CCSR_SSI_SFCSR_TFCNT0(sfcsr);
>> +                tfcnt1 = CCSR_SSI_SFCSR_TFCNT0(sfcsr);
>> +            } while(max_iterations-- && (tfcnt0 == 0) && (tfcnt1 == 0));
>> +        }
>> +        if (max_iterations <= 0) {
>> +            /*
>> +             * The DMA definitely should have stuck at
>> +             * least a word into the FIFO by now.  Report
>> +             * an error, but continue on blindly anyway,
>> +             * even though the SSI might not start right.
>> +             */
>> +            struct platform_device *pdev = ssi_private->pdev;
>> +            dev_err(&pdev->dev, "max_iterations reached when"
>> +                "starting SSI Tx\n");
>> +        }
>
> Looks like polling is the only way to safely kick off. It's okay.

'polling' is a strong word because it never actually loops :-)
Perhaps 'checking' is a better word.

> But I would like to see how the change will be after merging the
> simultaneous TE/RE work around.

I'm not sure I understand this statement.  What's the simultaneous
TE/RE work around?  It appears that the trigger function *only* does
either one or the other, and not simultaneously. Is there a patch in
the works to make tx/rx happen simultaneously?

-Caleb
Nicolin Chen Nov. 2, 2015, 5:22 p.m. UTC | #15
On Sat, Oct 31, 2015 at 09:22:54AM -0700, Caleb Crome wrote:
 
> > But I would like to see how the change will be after merging the
> > simultaneous TE/RE work around.
> 
> I'm not sure I understand this statement.  What's the simultaneous
> TE/RE work around?  It appears that the trigger function *only* does
> either one or the other, and not simultaneously. Is there a patch in
> the works to make tx/rx happen simultaneously?

I mean those problems Arnaud mentioned.

Patch
diff mbox

diff --git a/sound/soc/fsl/fsl_ssi.c b/sound/soc/fsl/fsl_ssi.c
index 73778c2..0bb5e52 100644
--- a/sound/soc/fsl/fsl_ssi.c
+++ b/sound/soc/fsl/fsl_ssi.c
@@ -435,8 +475,27 @@  static void fsl_ssi_config(struct fsl_ssi_private
*ssi_private, bool enable,

 config_done:
     /* Enabling of subunits is done after configuration */
-    if (enable)
+    if (enable) {
+        /*
+         * don't enable TE/RE and SSIEN at the same time.
+         * enable SSIEN, but delay enabling of TE to
+         * allow time for DMA buffer to fill.
+         */
+        u32 mask = vals->scr & ~CCSR_SSI_SCR_TE;
+        if (mask != vals->scr) {
+            /* enabling TE in this call.  don't enable it
+             * until some delay after SSIE gets
+             * enabled. */
+            if (vals->scr & ~CCSR_SSI_SCR_TE) {
+                regmap_update_bits(regs, CCSR_SSI_SCR,
+                           mask, vals->scr);
+                udelay(50); /* give the DMA a chance
+                         * to fill the TX buffer
+                         * after SSIE is enabled. */
+            }
+        }
         regmap_update_bits(regs, CCSR_SSI_SCR, vals->scr, vals->scr);
+    }
 }