diff mbox

[RFC] dmaengine: omap-dma: Allow DMA controller to prefetch data

Message ID 20121018222046.GA28541@animalcreek.com (mailing list archive)
State New, archived
Headers show

Commit Message

Mark Greer Oct. 18, 2012, 10:20 p.m. UTC
Enable DMA prefetching by setting the 'OMAP_DMA_DST_SYNC_PREFETCH'
flag whenever there is a destination synchronized DMA transfer.
Prefetching is not allowed on source synchronized DMA transfers.

Enabling prefetch significantly improves DMA performance.
For example, running 'modprobe tcrypt sec=2 mode=403' which
exercises the omap-sham driver on an am37x EVM yeilds the
following results:

a) With prefetch disabled

testing speed of async sha1
test  0 (   16 byte blocks,   16 bytes per update,   1 updates):  24049 opers/sec,    384784 bytes/sec
test  1 (   64 byte blocks,   16 bytes per update,   4 updates):  22030 opers/sec,   1409920 bytes/sec
test  2 (   64 byte blocks,   64 bytes per update,   1 updates):  24055 opers/sec,   1539520 bytes/sec
test  3 (  256 byte blocks,   16 bytes per update,  16 updates):   7648 opers/sec,   1958016 bytes/sec
test  4 (  256 byte blocks,   64 bytes per update,   4 updates):   7918 opers/sec,   2027008 bytes/sec
test  5 (  256 byte blocks,  256 bytes per update,   1 updates):   8000 opers/sec,   2048000 bytes/sec
test  6 ( 1024 byte blocks,   16 bytes per update,  64 updates):   3295 opers/sec,   3374080 bytes/sec
test  7 ( 1024 byte blocks,  256 bytes per update,   4 updates):   3602 opers/sec,   3688960 bytes/sec
test  8 ( 1024 byte blocks, 1024 bytes per update,   1 updates):   3753 opers/sec,   3843072 bytes/sec
test  9 ( 2048 byte blocks,   16 bytes per update, 128 updates):   3239 opers/sec,   6633472 bytes/sec
test 10 ( 2048 byte blocks,  256 bytes per update,   8 updates):   3557 opers/sec,   7284736 bytes/sec
test 11 ( 2048 byte blocks, 1024 bytes per update,   2 updates):   3591 opers/sec,   7354368 bytes/sec
test 12 ( 2048 byte blocks, 2048 bytes per update,   1 updates):   3598 opers/sec,   7369728 bytes/sec
test 13 ( 4096 byte blocks,   16 bytes per update, 256 updates):   1751 opers/sec,   7174144 bytes/sec
test 14 ( 4096 byte blocks,  256 bytes per update,  16 updates):   2302 opers/sec,   9431040 bytes/sec
test 15 ( 4096 byte blocks, 1024 bytes per update,   4 updates):   2087 opers/sec,   8548352 bytes/sec
test 16 ( 4096 byte blocks, 4096 bytes per update,   1 updates):   2050 opers/sec,   8398848 bytes/sec
test 17 ( 8192 byte blocks,   16 bytes per update, 512 updates):    864 opers/sec,   7077888 bytes/sec
test 18 ( 8192 byte blocks,  256 bytes per update,  32 updates):    993 opers/sec,   8138752 bytes/sec
test 19 ( 8192 byte blocks, 1024 bytes per update,   8 updates):    936 opers/sec,   7671808 bytes/sec
test 20 ( 8192 byte blocks, 4096 bytes per update,   2 updates):   1048 opers/sec,   8589312 bytes/sec
test 21 ( 8192 byte blocks, 8192 bytes per update,   1 updates):   1274 opers/sec,  10436608 bytes/sec

b) With prefetch enabled

testing speed of async sha1
test  0 (   16 byte blocks,   16 bytes per update,   1 updates):  23868 opers/sec,    381888 bytes/sec
test  1 (   64 byte blocks,   16 bytes per update,   4 updates):  21928 opers/sec,   1403424 bytes/sec
test  2 (   64 byte blocks,   64 bytes per update,   1 updates):  23910 opers/sec,   1530272 bytes/sec
test  3 (  256 byte blocks,   16 bytes per update,  16 updates):   7664 opers/sec,   1962112 bytes/sec
test  4 (  256 byte blocks,   64 bytes per update,   4 updates):   7924 opers/sec,   2028672 bytes/sec
test  5 (  256 byte blocks,  256 bytes per update,   1 updates):   8006 opers/sec,   2049536 bytes/sec
test  6 ( 1024 byte blocks,   16 bytes per update,  64 updates):   3276 opers/sec,   3355136 bytes/sec
test  7 ( 1024 byte blocks,  256 bytes per update,   4 updates):   3856 opers/sec,   3949056 bytes/sec
test  8 ( 1024 byte blocks, 1024 bytes per update,   1 updates):   3634 opers/sec,   3721728 bytes/sec
test  9 ( 2048 byte blocks,   16 bytes per update, 128 updates):   3257 opers/sec,   6670336 bytes/sec
test 10 ( 2048 byte blocks,  256 bytes per update,   8 updates):   3604 opers/sec,   7380992 bytes/sec
test 11 ( 2048 byte blocks, 1024 bytes per update,   2 updates):   3604 opers/sec,   7380992 bytes/sec
test 12 ( 2048 byte blocks, 2048 bytes per update,   1 updates):   3624 opers/sec,   7422976 bytes/sec
test 13 ( 4096 byte blocks,   16 bytes per update, 256 updates):   2698 opers/sec,  11051008 bytes/sec
test 14 ( 4096 byte blocks,  256 bytes per update,  16 updates):   3500 opers/sec,  14336000 bytes/sec
test 15 ( 4096 byte blocks, 1024 bytes per update,   4 updates):   3596 opers/sec,  14729216 bytes/sec
test 16 ( 4096 byte blocks, 4096 bytes per update,   1 updates):   3588 opers/sec,  14698496 bytes/sec
test 17 ( 8192 byte blocks,   16 bytes per update, 512 updates):   1319 opers/sec,  10809344 bytes/sec
test 18 ( 8192 byte blocks,  256 bytes per update,  32 updates):   1550 opers/sec,  12701696 bytes/sec
test 19 ( 8192 byte blocks, 1024 bytes per update,   8 updates):   1164 opers/sec,   9539584 bytes/sec
test 20 ( 8192 byte blocks, 4096 bytes per update,   2 updates):   1802 opers/sec,  14766080 bytes/sec
test 21 ( 8192 byte blocks, 8192 bytes per update,   1 updates):   1720 opers/sec,  14094336 bytes/sec

CC: Peter Ujfalusi <peter.ujfalusi@ti.com>
CC: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Mark A. Greer <mgreer@animalcreek.com>
---

This patch seems fairly stable but I've only tested omap-sham (crypto)
and omap_hsmmc (mmc) on an am37x EVM.  I also enabled burst mode but
that made the system unstable when exercising either omap-sham or
omap_hsmmc.  I'm unaware of any errata that would make this an unwanted
modification but I haven't checked all of the SoCs.  Are there other
reasons that this should be applied??

The different types of hardware that I have is somewhat limited so if
you have some different platforms/SoCs, please give this patch a try.
It should apply cleanly against recent k.o. kernels.

Note that the current omap-sham driver doesn't use the dmaengine API
but I have a set of patches to convert it which is what I used when
testing.  I will submit those patches once they're ready (next day or so).
Also note that an am37xx GP actually does have sham hardware and yours
might too if you look closely.  If so, you'll have hack omap_sham_mod_init()
to use it.

Thanks,

Mark

 drivers/dma/omap-dma.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Comments

Mark Greer Oct. 18, 2012, 10:46 p.m. UTC | #1
On Thu, Oct 18, 2012 at 03:20:46PM -0700, Mark A. Greer wrote:

> This patch seems fairly stable but I've only tested omap-sham (crypto)
> and omap_hsmmc (mmc) on an am37x EVM.  I also enabled burst mode but
> that made the system unstable when exercising either omap-sham or
> omap_hsmmc.  I'm unaware of any errata that would make this an unwanted
> modification but I haven't checked all of the SoCs.  Are there other
> reasons that this should be applied??
                          ^not
Russell King - ARM Linux Oct. 18, 2012, 10:55 p.m. UTC | #2
On Thu, Oct 18, 2012 at 03:20:46PM -0700, Mark A. Greer wrote:
> This patch seems fairly stable but I've only tested omap-sham (crypto)
> and omap_hsmmc (mmc) on an am37x EVM.  I also enabled burst mode but
> that made the system unstable when exercising either omap-sham or
> omap_hsmmc.  I'm unaware of any errata that would make this an unwanted
> modification but I haven't checked all of the SoCs.  Are there other
> reasons that this should be applied??

It definitely needs checking with audio, because it will affect the
pointer position in relation to audio output, and it will have an
effect on how much audio data is lost over a pause/resume event.

Unfortunately, the OMAP DMA hardware has no way to do a proper "pause",
it can only do a "stop" which involves dumping its FIFOs on the floor
in the case of anything but a DEV->MEM transfer.  So the more data
held in the DMA hardware, the more is lost on pause.
Mark Greer Oct. 18, 2012, 11:24 p.m. UTC | #3
On Thu, Oct 18, 2012 at 11:55:40PM +0100, Russell King - ARM Linux wrote:
> On Thu, Oct 18, 2012 at 03:20:46PM -0700, Mark A. Greer wrote:
> > This patch seems fairly stable but I've only tested omap-sham (crypto)
> > and omap_hsmmc (mmc) on an am37x EVM.  I also enabled burst mode but
> > that made the system unstable when exercising either omap-sham or
> > omap_hsmmc.  I'm unaware of any errata that would make this an unwanted
> > modification but I haven't checked all of the SoCs.  Are there other
> > reasons that this should be applied??
> 
> It definitely needs checking with audio, because it will affect the
> pointer position in relation to audio output, and it will have an
> effect on how much audio data is lost over a pause/resume event.
> 
> Unfortunately, the OMAP DMA hardware has no way to do a proper "pause",
> it can only do a "stop" which involves dumping its FIFOs on the floor
> in the case of anything but a DEV->MEM transfer.  So the more data
> held in the DMA hardware, the more is lost on pause.

Hmm, interesting.

Is there a way to tweak DMA params like this on a per logical channel
basis using the dmaengine API?  I don't see any but I could have missed it.

If not, are you open to adding such a thing (e.g., extend 'enum dma_ctrl_flags'
with DMA_ENABL_PREFETCH)?

Mark
--
Russell King - ARM Linux Oct. 18, 2012, 11:33 p.m. UTC | #4
On Thu, Oct 18, 2012 at 04:24:05PM -0700, Mark A. Greer wrote:
> On Thu, Oct 18, 2012 at 11:55:40PM +0100, Russell King - ARM Linux wrote:
> > On Thu, Oct 18, 2012 at 03:20:46PM -0700, Mark A. Greer wrote:
> > > This patch seems fairly stable but I've only tested omap-sham (crypto)
> > > and omap_hsmmc (mmc) on an am37x EVM.  I also enabled burst mode but
> > > that made the system unstable when exercising either omap-sham or
> > > omap_hsmmc.  I'm unaware of any errata that would make this an unwanted
> > > modification but I haven't checked all of the SoCs.  Are there other
> > > reasons that this should be applied??
> > 
> > It definitely needs checking with audio, because it will affect the
> > pointer position in relation to audio output, and it will have an
> > effect on how much audio data is lost over a pause/resume event.
> > 
> > Unfortunately, the OMAP DMA hardware has no way to do a proper "pause",
> > it can only do a "stop" which involves dumping its FIFOs on the floor
> > in the case of anything but a DEV->MEM transfer.  So the more data
> > held in the DMA hardware, the more is lost on pause.
> 
> Hmm, interesting.
> 
> Is there a way to tweak DMA params like this on a per logical channel
> basis using the dmaengine API?  I don't see any but I could have missed it.
> 
> If not, are you open to adding such a thing (e.g., extend 'enum dma_ctrl_flags'
> with DMA_ENABL_PREFETCH)?

I would suggest getting some feedback from the ASoC people first, before
trying to invent new APIs to work around this stuff.  If they can live
with having prefetch enabled on OMAP then there isn't an issue here.  If
not, we need a solution to this.

I do not believe that precisely stopping and starting playback across a
suspend/resume event is really necessary (it's desirable but the world
doesn't collapse if you miss a few samples.)  It could be more of an
issue for pause/resume though, but as I say, that's for ASoC people to
comment on.

I'm merely pointing out here that we need their feedback here before
deciding if there's anything further that needs to happen.
Mark Greer Oct. 18, 2012, 11:50 p.m. UTC | #5
On Fri, Oct 19, 2012 at 12:33:35AM +0100, Russell King - ARM Linux wrote:
> On Thu, Oct 18, 2012 at 04:24:05PM -0700, Mark A. Greer wrote:
> > On Thu, Oct 18, 2012 at 11:55:40PM +0100, Russell King - ARM Linux wrote:
> > > On Thu, Oct 18, 2012 at 03:20:46PM -0700, Mark A. Greer wrote:
> > > > This patch seems fairly stable but I've only tested omap-sham (crypto)
> > > > and omap_hsmmc (mmc) on an am37x EVM.  I also enabled burst mode but
> > > > that made the system unstable when exercising either omap-sham or
> > > > omap_hsmmc.  I'm unaware of any errata that would make this an unwanted
> > > > modification but I haven't checked all of the SoCs.  Are there other
> > > > reasons that this should be applied??
> > > 
> > > It definitely needs checking with audio, because it will affect the
> > > pointer position in relation to audio output, and it will have an
> > > effect on how much audio data is lost over a pause/resume event.
> > > 
> > > Unfortunately, the OMAP DMA hardware has no way to do a proper "pause",
> > > it can only do a "stop" which involves dumping its FIFOs on the floor
> > > in the case of anything but a DEV->MEM transfer.  So the more data
> > > held in the DMA hardware, the more is lost on pause.
> > 
> > Hmm, interesting.
> > 
> > Is there a way to tweak DMA params like this on a per logical channel
> > basis using the dmaengine API?  I don't see any but I could have missed it.
> > 
> > If not, are you open to adding such a thing (e.g., extend 'enum dma_ctrl_flags'
> > with DMA_ENABL_PREFETCH)?
> 
> I would suggest getting some feedback from the ASoC people first, before
> trying to invent new APIs to work around this stuff.  If they can live
> with having prefetch enabled on OMAP then there isn't an issue here.  If
> not, we need a solution to this.
> 
> I do not believe that precisely stopping and starting playback across a
> suspend/resume event is really necessary (it's desirable but the world
> doesn't collapse if you miss a few samples.)  It could be more of an
> issue for pause/resume though, but as I say, that's for ASoC people to
> comment on.
> 
> I'm merely pointing out here that we need their feedback here before
> deciding if there's anything further that needs to happen.

Thanks, I will do that.
Peter Ujfalusi Oct. 19, 2012, 12:45 p.m. UTC | #6
Hi,

On 10/19/2012 01:33 AM, Russell King - ARM Linux wrote:
> I would suggest getting some feedback from the ASoC people first, before
> trying to invent new APIs to work around this stuff.  If they can live
> with having prefetch enabled on OMAP then there isn't an issue here.  If
> not, we need a solution to this.
> 
> I do not believe that precisely stopping and starting playback across a
> suspend/resume event is really necessary (it's desirable but the world
> doesn't collapse if you miss a few samples.)  It could be more of an
> issue for pause/resume though, but as I say, that's for ASoC people to
> comment on.

There is another issue with the prefetch in audio:
we tend to like to know the position of the DMA and also to know how much data
we have stored in buffers, FIFOs. This information is used by userspace to do
echo cancellation and also used by PA for example to do runtime mixing
directly in the audio buffer. We have means to extract this information from
McBSP for example (and from tlv320dac33 codec) but AFAIK this information can
not be retrieved from sDMA.
We could assume that the sDMA FIFO is kept full and report that as a 'delay'
or do not account this information.

For now I think the cyclic mode should not set the prefetch. If I recall right
the cyclic mode is only used by audio at the moment.

> I'm merely pointing out here that we need their feedback here before
> deciding if there's anything further that needs to happen.

Thanks Russell, I'll take a look at the implication of the prefetch for audio.
Mark Greer Nov. 5, 2012, 10:06 p.m. UTC | #7
On Fri, Oct 19, 2012 at 02:45:55PM +0200, Péter Ujfalusi wrote:
> Hi,
> 
> On 10/19/2012 01:33 AM, Russell King - ARM Linux wrote:
> > I would suggest getting some feedback from the ASoC people first, before
> > trying to invent new APIs to work around this stuff.  If they can live
> > with having prefetch enabled on OMAP then there isn't an issue here.  If
> > not, we need a solution to this.
> > 
> > I do not believe that precisely stopping and starting playback across a
> > suspend/resume event is really necessary (it's desirable but the world
> > doesn't collapse if you miss a few samples.)  It could be more of an
> > issue for pause/resume though, but as I say, that's for ASoC people to
> > comment on.
> 
> There is another issue with the prefetch in audio:
> we tend to like to know the position of the DMA and also to know how much data
> we have stored in buffers, FIFOs. This information is used by userspace to do
> echo cancellation and also used by PA for example to do runtime mixing
> directly in the audio buffer. We have means to extract this information from
> McBSP for example (and from tlv320dac33 codec) but AFAIK this information can
> not be retrieved from sDMA.
> We could assume that the sDMA FIFO is kept full and report that as a 'delay'
> or do not account this information.
> 
> For now I think the cyclic mode should not set the prefetch. If I recall right
> the cyclic mode is only used by audio at the moment.
> 
> > I'm merely pointing out here that we need their feedback here before
> > deciding if there's anything further that needs to happen.
> 
> Thanks Russell, I'll take a look at the implication of the prefetch for audio.

Ping?
Russell King - ARM Linux Nov. 17, 2012, 11:31 a.m. UTC | #8
On Fri, Oct 19, 2012 at 02:45:55PM +0200, Péter Ujfalusi wrote:
> Hi,
> 
> On 10/19/2012 01:33 AM, Russell King - ARM Linux wrote:
> > I'm merely pointing out here that we need their feedback here before
> > deciding if there's anything further that needs to happen.
> 
> Thanks Russell, I'll take a look at the implication of the prefetch for audio.

Péter, any news?
Peter Ujfalusi Nov. 19, 2012, 10 a.m. UTC | #9
On 11/17/2012 12:31 PM, Russell King - ARM Linux wrote:
> On Fri, Oct 19, 2012 at 02:45:55PM +0200, Péter Ujfalusi wrote:
>> Hi,
>>
>> On 10/19/2012 01:33 AM, Russell King - ARM Linux wrote:
>>> I'm merely pointing out here that we need their feedback here before
>>> deciding if there's anything further that needs to happen.
>>
>> Thanks Russell, I'll take a look at the implication of the prefetch for audio.
> 
> Péter, any news?

Sorry, I was carried away with other things...
I did some testing with and without the dma prefetch for audio (on BeagleBoard
with mplayer: -ao alsa/pulse).
When we have prefetch enabled we tend to resume after pause from a slightly
off place than when the prefetch is disabled.
I also remember that in n9 we have had some issue with the DMA prefetch.

So I would for sure enable the prefetch for non cyclic DMA.
For the cyclic I would not enable it for now. I need to dig a bit deeper in
DMA/McBSP to have better view on the issue.

I have one comment to the patch itself as well.
Peter Ujfalusi Nov. 19, 2012, 10:01 a.m. UTC | #10
Hi,

On 10/19/2012 12:20 AM, Mark A. Greer wrote:
> Enable DMA prefetching by setting the 'OMAP_DMA_DST_SYNC_PREFETCH'
> flag whenever there is a destination synchronized DMA transfer.
> Prefetching is not allowed on source synchronized DMA transfers.
> 
> Enabling prefetch significantly improves DMA performance.
> For example, running 'modprobe tcrypt sec=2 mode=403' which
> exercises the omap-sham driver on an am37x EVM yeilds the
> following results:

>  drivers/dma/omap-dma.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/dma/omap-dma.c b/drivers/dma/omap-dma.c
> index bb2d8e7..aadddb2 100644
> --- a/drivers/dma/omap-dma.c
> +++ b/drivers/dma/omap-dma.c
> @@ -310,7 +310,7 @@ static struct dma_async_tx_descriptor *omap_dma_prep_slave_sg(
>  		dev_addr = c->cfg.dst_addr;
>  		dev_width = c->cfg.dst_addr_width;
>  		burst = c->cfg.dst_maxburst;
> -		sync_type = OMAP_DMA_DST_SYNC;
> +		sync_type = OMAP_DMA_DST_SYNC | OMAP_DMA_DST_SYNC_PREFETCH;

This should be:
 -		sync_type = OMAP_DMA_DST_SYNC;
 +		sync_type = OMAP_DMA_DST_SYNC_PREFETCH;


>  	} else {
>  		dev_err(chan->device->dev, "%s: bad direction?\n", __func__);
>  		return NULL;
> @@ -387,7 +387,7 @@ static struct dma_async_tx_descriptor *omap_dma_prep_dma_cyclic(
>  		dev_addr = c->cfg.dst_addr;
>  		dev_width = c->cfg.dst_addr_width;
>  		burst = c->cfg.dst_maxburst;
> -		sync_type = OMAP_DMA_DST_SYNC;
> +		sync_type = OMAP_DMA_DST_SYNC | OMAP_DMA_DST_SYNC_PREFETCH;

We should not enable the prefetch for cyclic right now. We will investigate it
more.

>  	} else {
>  		dev_err(chan->device->dev, "%s: bad direction?\n", __func__);
>  		return NULL;
>
Russell King - ARM Linux Nov. 19, 2012, 12:16 p.m. UTC | #11
On Mon, Nov 19, 2012 at 11:00:15AM +0100, Peter Ujfalusi wrote:
> When we have prefetch enabled we tend to resume after pause from a slightly
> off place than when the prefetch is disabled.

That is exactly what I was expecting would happen having read how the
DMA works.

With memory->device transfers, if you stop them (there is no pause), the
DMA hardware discards its buffers.  So if you allow it to prefetch data,
you're going to throw more data away when you stop it.

For device->memory transfers, at least it will write the contents back to
memory before stopping.
Mark Greer Nov. 19, 2012, 4:19 p.m. UTC | #12
On Mon, Nov 19, 2012 at 11:01:44AM +0100, Peter Ujfalusi wrote:
> Hi,
> 
> On 10/19/2012 12:20 AM, Mark A. Greer wrote:
> > Enable DMA prefetching by setting the 'OMAP_DMA_DST_SYNC_PREFETCH'
> > flag whenever there is a destination synchronized DMA transfer.
> > Prefetching is not allowed on source synchronized DMA transfers.
> > 
> > Enabling prefetch significantly improves DMA performance.
> > For example, running 'modprobe tcrypt sec=2 mode=403' which
> > exercises the omap-sham driver on an am37x EVM yeilds the
> > following results:
> 
> >  drivers/dma/omap-dma.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/dma/omap-dma.c b/drivers/dma/omap-dma.c
> > index bb2d8e7..aadddb2 100644
> > --- a/drivers/dma/omap-dma.c
> > +++ b/drivers/dma/omap-dma.c
> > @@ -310,7 +310,7 @@ static struct dma_async_tx_descriptor *omap_dma_prep_slave_sg(
> >  		dev_addr = c->cfg.dst_addr;
> >  		dev_width = c->cfg.dst_addr_width;
> >  		burst = c->cfg.dst_maxburst;
> > -		sync_type = OMAP_DMA_DST_SYNC;
> > +		sync_type = OMAP_DMA_DST_SYNC | OMAP_DMA_DST_SYNC_PREFETCH;
> 
> This should be:
>  -		sync_type = OMAP_DMA_DST_SYNC;
>  +		sync_type = OMAP_DMA_DST_SYNC_PREFETCH;

Ah, right.  I will fix.

> >  	} else {
> >  		dev_err(chan->device->dev, "%s: bad direction?\n", __func__);
> >  		return NULL;
> > @@ -387,7 +387,7 @@ static struct dma_async_tx_descriptor *omap_dma_prep_dma_cyclic(
> >  		dev_addr = c->cfg.dst_addr;
> >  		dev_width = c->cfg.dst_addr_width;
> >  		burst = c->cfg.dst_maxburst;
> > -		sync_type = OMAP_DMA_DST_SYNC;
> > +		sync_type = OMAP_DMA_DST_SYNC | OMAP_DMA_DST_SYNC_PREFETCH;
> 
> We should not enable the prefetch for cyclic right now. We will investigate it
> more.

OK.  I'll send a patch in a few minutes.

Mark
--
Mark Greer Nov. 19, 2012, 4:49 p.m. UTC | #13
On Mon, Nov 19, 2012 at 09:19:15AM -0700, Mark A. Greer wrote:
> On Mon, Nov 19, 2012 at 11:01:44AM +0100, Peter Ujfalusi wrote:
> > Hi,
> > 
> > On 10/19/2012 12:20 AM, Mark A. Greer wrote:
> > > Enable DMA prefetching by setting the 'OMAP_DMA_DST_SYNC_PREFETCH'
> > > flag whenever there is a destination synchronized DMA transfer.
> > > Prefetching is not allowed on source synchronized DMA transfers.
> > > 
> > > Enabling prefetch significantly improves DMA performance.
> > > For example, running 'modprobe tcrypt sec=2 mode=403' which
> > > exercises the omap-sham driver on an am37x EVM yeilds the
> > > following results:
> > 
> > >  drivers/dma/omap-dma.c | 4 ++--
> > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/drivers/dma/omap-dma.c b/drivers/dma/omap-dma.c
> > > index bb2d8e7..aadddb2 100644
> > > --- a/drivers/dma/omap-dma.c
> > > +++ b/drivers/dma/omap-dma.c
> > > @@ -310,7 +310,7 @@ static struct dma_async_tx_descriptor *omap_dma_prep_slave_sg(
> > >  		dev_addr = c->cfg.dst_addr;
> > >  		dev_width = c->cfg.dst_addr_width;
> > >  		burst = c->cfg.dst_maxburst;
> > > -		sync_type = OMAP_DMA_DST_SYNC;
> > > +		sync_type = OMAP_DMA_DST_SYNC | OMAP_DMA_DST_SYNC_PREFETCH;
> > 
> > This should be:
> >  -		sync_type = OMAP_DMA_DST_SYNC;
> >  +		sync_type = OMAP_DMA_DST_SYNC_PREFETCH;
> 
> Ah, right.  I will fix.
> 
> > >  	} else {
> > >  		dev_err(chan->device->dev, "%s: bad direction?\n", __func__);
> > >  		return NULL;
> > > @@ -387,7 +387,7 @@ static struct dma_async_tx_descriptor *omap_dma_prep_dma_cyclic(
> > >  		dev_addr = c->cfg.dst_addr;
> > >  		dev_width = c->cfg.dst_addr_width;
> > >  		burst = c->cfg.dst_maxburst;
> > > -		sync_type = OMAP_DMA_DST_SYNC;
> > > +		sync_type = OMAP_DMA_DST_SYNC | OMAP_DMA_DST_SYNC_PREFETCH;
> > 
> > We should not enable the prefetch for cyclic right now. We will investigate it
> > more.
> 
> OK.  I'll send a patch in a few minutes.

https://patchwork.kernel.org/patch/1765941/
diff mbox

Patch

diff --git a/drivers/dma/omap-dma.c b/drivers/dma/omap-dma.c
index bb2d8e7..aadddb2 100644
--- a/drivers/dma/omap-dma.c
+++ b/drivers/dma/omap-dma.c
@@ -310,7 +310,7 @@  static struct dma_async_tx_descriptor *omap_dma_prep_slave_sg(
 		dev_addr = c->cfg.dst_addr;
 		dev_width = c->cfg.dst_addr_width;
 		burst = c->cfg.dst_maxburst;
-		sync_type = OMAP_DMA_DST_SYNC;
+		sync_type = OMAP_DMA_DST_SYNC | OMAP_DMA_DST_SYNC_PREFETCH;
 	} else {
 		dev_err(chan->device->dev, "%s: bad direction?\n", __func__);
 		return NULL;
@@ -387,7 +387,7 @@  static struct dma_async_tx_descriptor *omap_dma_prep_dma_cyclic(
 		dev_addr = c->cfg.dst_addr;
 		dev_width = c->cfg.dst_addr_width;
 		burst = c->cfg.dst_maxburst;
-		sync_type = OMAP_DMA_DST_SYNC;
+		sync_type = OMAP_DMA_DST_SYNC | OMAP_DMA_DST_SYNC_PREFETCH;
 	} else {
 		dev_err(chan->device->dev, "%s: bad direction?\n", __func__);
 		return NULL;