
dmaengine and using tx descriptors multiple times

Message ID 87zj7hz1w9.fsf@free.fr (mailing list archive)
State Not Applicable
Headers show

Commit Message

Robert Jarzmik March 13, 2015, 9:23 a.m. UTC
Vinod Koul <vinod.koul@intel.com> writes:

> On Mon, Mar 09, 2015 at 11:18:53PM +0100, Robert Jarzmik wrote:
>> And here comes my question: how do I free the hardware descriptors for tx now?
>> As I chose not to free them upon tx completion, is there an API call to do it?
> The problem with this is that we are assuming the
> DMA_PREP_CONTINUE/DMA_PREP_ACK etc is valid for slave. These were defined
> for async_tx. I have added Dan on how they handled it there
For Dan, [2] is a reminder of the usecase, so that it doesn't get lost in the
conversation.

> But, I think we should not use these for slave transactions.
>
> In your case what is the benefit you get from reusing the descriptors?
In my case, i.e. the pxa architecture conversion to dmaengine, I'm converting the
camera capture interface.

These scatter-gather buffers are huge (several MB, depending on sensor
definition). Moreover the video4linux2 API relies on the fact that you can queue
up a buffer multiple times.

As these buffers are huge, the preparation is not negligible (a thousand
hardware linked descriptors for one frame), and there are usually 6 frames in the
ring. The pxa architecture (at least pxa27x) is quite old, and the cost to
recalculate the descriptor chain is not negligible (2 ms to 10 ms). Moreover this
time is needed to handle unqueued buffers: store them, compress them, etc.

As this is video capture, the real-time aspect is important. Missing one frame
here and there makes the video unpleasant for a human viewer (as opposed to the
disk/IO subsystem, where this is not as important).

Ah, and to be more precise about what I'm thinking of, [1] is what I have in
mind. I don't know if it's the way to go, but it will make the query more
concrete. I also have a driver in preparation for the pxa architecture using
this, and I want to check whether the requirements are met for me to upstream it.

Cheers.

Robert

[1] Patch to support conversation
---<8---

[2] Usecase for Dan to understand the conversation
> My usecase is as follows, from a driver point of view :
>  - tx = dmaengine_prep_slave_sg(dma_chan, sg, sglen, DMA_PREP_CONTINUE);
>  - dma_map_sg(dev, sg, sglen, DMA_DEV_TO_MEM)
>  - dmaengine_submit(tx)
>  - dma_async_issue_pending(dma_chan);
>  - ... wait for tx->callback()
>    => here, because of DMA_PREP_CONTINUE, the hardware descriptors are not
>    freed, which is what I need so that I can reuse them
> 
>  - dma_sync_sg_for_cpu(dev, sg, sglen, )
>  - copy video buffer somewhere (or not if frame interleaving)
> 
>  - dmaengine_submit(tx)
>  - dma_async_issue_pending(dma_chan)
>    => the video buffer is captured again, so far so good
>  - ... wait for tx->callback()
> 
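The reuse pattern in [2] can be mocked in plain userspace C to show where the saving comes from: the expensive hardware descriptor chain is built once at prepare time, and each resubmission reuses it. All types and function names below are hypothetical stand-ins, not the real dmaengine API:

```c
#include <stdlib.h>

/* Hypothetical stand-ins for the objects discussed in this thread. */
struct hw_desc { struct hw_desc *next; };

struct tx_desc {
	struct hw_desc *chain;   /* hardware linked list, expensive to build */
	int chain_builds;        /* how many times the chain was computed    */
	int submissions;         /* how many times the tx was (re)submitted  */
};

/* Models dmaengine_prep_slave_sg(): builds the hardware descriptor chain. */
static struct tx_desc *prep_slave_sg(int nr_hw_descs)
{
	struct tx_desc *tx = calloc(1, sizeof(*tx));

	for (int i = 0; i < nr_hw_descs; i++) {
		struct hw_desc *d = calloc(1, sizeof(*d));

		d->next = tx->chain;
		tx->chain = d;
	}
	tx->chain_builds = 1;
	return tx;
}

/* Models dmaengine_submit() + issue_pending(): reuses the existing chain. */
static void submit(struct tx_desc *tx)
{
	tx->submissions++;
}

static void free_tx(struct tx_desc *tx)
{
	struct hw_desc *d = tx->chain;

	while (d) {
		struct hw_desc *next = d->next;

		free(d);
		d = next;
	}
	free(tx);
}

/* Capture `frames` frames while reusing one prepared tx. */
static struct tx_desc *capture_loop(int frames)
{
	struct tx_desc *tx = prep_slave_sg(1000); /* ~1000 descs per frame */

	for (int i = 0; i < frames; i++)
		submit(tx); /* per-frame cost: no chain rebuild */
	return tx;
}
```

With reuse, N frames cost one chain build instead of N, which is exactly the 2-10 ms per frame that would otherwise be spent recomputing descriptors.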
--
To unsubscribe from this list: send the line "unsubscribe dmaengine" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Vinod Koul March 16, 2015, 5:28 a.m. UTC | #1
On Fri, Mar 13, 2015 at 10:23:50AM +0100, Robert Jarzmik wrote:
> Vinod Koul <vinod.koul@intel.com> writes:
> 
> > On Mon, Mar 09, 2015 at 11:18:53PM +0100, Robert Jarzmik wrote:
> >> And here comes my question: how do I free the hardware descriptors for tx now?
> >> As I chose not to free them upon tx completion, is there an API call to do it?
> > The problem with this is that we are assuming the
> > DMA_PREP_CONTINUE/DMA_PREP_ACK etc is valid for slave. These were defined
> > for async_tx. I have added Dan on how they handled it there
> For Dan, [2] is a reminder of the usecase, so that it doesn't get lost in the
> conversation.
> 
> > But, I think we should not use these for slave transactions.
> >
> > In your case what is the benefit you get from reusing the descriptors?
> In my case, i.e. the pxa architecture conversion to dmaengine, I'm converting the
> camera capture interface.
> 
> These scatter-gather buffers are huge (several MB, depending on sensor
> definition). Moreover the video4linux2 API relies on the fact that you can queue
> up a buffer multiple times.
> 
> As these buffers are huge, the preparation is not negligible (a thousand
> hardware linked descriptors for one frame), and there are usually 6 frames in the
> ring. The pxa architecture (at least pxa27x) is quite old, and the cost to
> recalculate the descriptor chain is not negligible (2 ms to 10 ms). Moreover this
> time is needed to handle unqueued buffers: store them, compress them, etc.
The dmaengine driver is not expected to do buffer mapping for slave cases,
so I am still not clear where the savings are coming from. You can keep the
buffer ready or prepare ahead of time and submit when required.

> As this is video capture, the real-time aspect is important. Missing one frame
> here and there makes the video unpleasant for a human viewer (as opposed to the
> disk/IO subsystem, where this is not as important).
And preparing the buffer ahead of time can help.
> 
> Ah, and to be more precise about what I'm thinking of, [1] is what I have in
> mind. I don't know if it's the way to go, but it will make the query more
> concrete. I also have a driver in preparation for the pxa architecture using
> this, and I want to check whether the requirements are met for me to upstream it.
> 
> Cheers.
> 
> Robert
> 
> [1] Patch to support conversation
Before this pls read up on this Documentation/crypto/async-tx-api.txt
Section 3.3
Robert Jarzmik March 16, 2015, 7:05 a.m. UTC | #2
Vinod Koul <vinod.koul@intel.com> writes:

Hi Vinod,

> On Fri, Mar 13, 2015 at 10:23:50AM +0100, Robert Jarzmik wrote:
>> As these buffers are huge, the preparation is not negligible (a thousand
>> hardware linked descriptors for one frame), and there are usually 6 frames in the
>> ring. The pxa architecture (at least pxa27x) is quite old, and the cost to
>> recalculate the descriptor chain is not negligible (2 ms to 10 ms). Moreover this
>> time is needed to handle unqueued buffers: store them, compress them, etc.
> The dmaengine driver is not expected to do buffer mapping for slave cases,
> so I am still not clear where the savings are coming from.

Why are you talking about "buffer mapping"? It's not a matter of mapping, it's
the _hardware_ descriptor calculation, i.e. the structures the DMA IP consumes
to perform the "linked list chain". These structures are calculated from the
dma mapping done at driver level (which is done only once, at preparation).

> You can keep the buffer ready or prepare ahead of time and submit when
> required.
Well, as long as the image capture is ongoing, for obvious reasons I can't. As
userspace can dequeue and requeue a buffer at any time, I can't "prepare in
advance". It is userspace that drives the timing, and userspace is not very
real-time friendly...

>> As this is video capture, the real-time aspect is important. Missing one frame
>> here and there makes the video unpleasant for a human viewer (as opposed to the
>> disk/IO subsystem, where this is not as important).
> And preparing buffer ahead of time can help
As I said above, it's userspace-driven.

>> Ah, and to be more precise about what I'm thinking of, [1] is what I have in
>> mind. I don't know if it's the way to go, but it will make the query more
>> concrete. I also have a driver in preparation for the pxa architecture using
>> this, and I want to check whether the requirements are met for me to upstream it.
>> 
>> Cheers.
>> 
>> Robert
>> 
>> [1] Patch to support conversation
> Before this pls read up on this Documentation/crypto/async-tx-api.txt
> Section 3.3
Ah, you mean the "ACK" flag is what I'm looking for, right? I can use that, as
long as I can resubmit a tx without "acking" it, which looks fine wrt the
documentation you made me read.
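For reference, the ACK protocol described in section 3.3 can be sketched as a userspace mock: completion reclaims a descriptor only once the client has acked it, so an unacked descriptor survives completion and can be resubmitted. The flag name mirrors the kernel's DMA_CTRL_ACK, but the types and helpers here are hypothetical simplifications of async_tx_ack()/async_tx_test_ack():

```c
#include <stdbool.h>

#define DMA_CTRL_ACK (1 << 0)  /* same bit meaning as the kernel flag */

struct mock_tx {
	unsigned long flags;
	bool freed;
};

/* Client side: mark the tx as no longer needed (kernel: async_tx_ack()). */
static void tx_ack(struct mock_tx *tx)
{
	tx->flags |= DMA_CTRL_ACK;
}

/* Kernel: async_tx_test_ack(). */
static bool tx_test_ack(const struct mock_tx *tx)
{
	return tx->flags & DMA_CTRL_ACK;
}

/* Driver completion path: only reclaim descriptors the client acked. */
static void complete_tx(struct mock_tx *tx)
{
	if (tx_test_ack(tx))
		tx->freed = true; /* would free the hardware descriptors here */
	/* else: keep the descriptor chain alive for resubmission */
}
```

The design point is that the client, not the driver, decides when the descriptor may be reclaimed, which is what makes resubmission safe.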

One more thing: there is another usecase for which you'll want a release
function:
 - driver does a tx = dmaengine_prep_slave_sg()
 - driver is forced to shut down:
   => dmaengine_terminate_all()

The tx was never submitted. How should the resources be freed in that case?

Cheers.

--
Robert
Vinod Koul March 18, 2015, 4:25 p.m. UTC | #3
On Mon, Mar 16, 2015 at 08:05:21AM +0100, Robert Jarzmik wrote:
> > On Fri, Mar 13, 2015 at 10:23:50AM +0100, Robert Jarzmik wrote:
> > Before this pls read up on this Documentation/crypto/async-tx-api.txt
> > Section 3.3
> Ah, you mean the "ACK" flag is what I'm looking for, right? I can use that, as
> long as I can resubmit a tx without "acking" it, which looks fine wrt the
> documentation you made me read.
That is how the async_tx API is supposed to use this.

> 
> One more thing: there is another usecase for which you'll want a release
> function:
>  - driver does a tx = dmaengine_prep_slave_sg()
>  - driver is forced to shut down:
>    => dmaengine_terminate_all()
> 
> The tx was never submitted. How should the resources be freed in that case?
Well, dmaengine_terminate_all() is designed to clean up the channel. Once this
call is invoked, all descriptors on all lists (active, pending, freed) are
cleaned up. All transactions must be cleaned up, including the currently active
ones.
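That teardown behaviour can be modeled as walking every descriptor list, including descriptors that were prepared but never submitted. A hypothetical userspace sketch (stub types and list names, not real driver code):

```c
#include <stdlib.h>

struct desc {
	struct desc *next;
};

/* A channel keeps its descriptors on several lists, as virt-dma does. */
struct chan {
	struct desc *allocated; /* prepared, never submitted  */
	struct desc *pending;   /* submitted, not yet issued  */
	struct desc *active;    /* currently running          */
	int live;               /* outstanding descriptor count */
};

/* Allocate a descriptor and put it on the given list. */
static void push(struct chan *c, struct desc **list)
{
	struct desc *d = calloc(1, sizeof(*d));

	d->next = *list;
	*list = d;
	c->live++;
}

static void free_list(struct chan *c, struct desc **list)
{
	while (*list) {
		struct desc *d = *list;

		*list = d->next;
		free(d);
		c->live--;
	}
}

/* Models dmaengine_terminate_all(): reclaim every list, active included. */
static void terminate_all(struct chan *c)
{
	free_list(c, &c->active);
	free_list(c, &c->pending);
	free_list(c, &c->allocated);
}
```

Note the hypothetical "allocated" list: it is what lets terminate_all() also reclaim a tx that was prepared but never submitted, which is precisely the case raised in the question above.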

Thanks
Jarzmik, Robert March 18, 2015, 4:48 p.m. UTC | #4
Vinod Koul <vinod.koul@intel.com> writes:

> On Mon, Mar 16, 2015 at 08:05:21AM +0100, Robert Jarzmik wrote:
>> > On Fri, Mar 13, 2015 at 10:23:50AM +0100, Robert Jarzmik wrote:
>> > Before this pls read up on this Documentation/crypto/async-tx-api.txt
>> > Section 3.3
>> Ah, you mean the "ACK" flag is what I'm looking for, right? I can use that, as
>> long as I can resubmit a tx without "acking" it, which looks fine wrt the
>> documentation you made me read.
> That is how async_tx API is supposed to use this
Great, works for me.

>> 
>> One more thing: there is another usecase for which you'll want a release
>> function:
>>  - driver does a tx = dmaengine_prep_slave_sg()
>>  - driver is forced to shut down:
>>    => dmaengine_terminate_all()
>> 
>> The tx was never submitted. How should the resources be freed in that case?
> Well, dmaengine_terminate_all() is designed to clean up the channel. Once this
> call is invoked, all descriptors on all lists (active, pending, freed) are
> cleaned up. All transactions must be cleaned up, including the currently active
> ones.

Ah ok, I understand: so it's the slave driver's responsibility to keep track of
the transaction so that it can be freed. Actually, the slave drivers I had a
look at don't keep that "track".

And that's perfectly in sync with one of my future patches to "virt-dma", which
adds another queue, "allocated", to complement the existing submitted, issued
and completed ones.

I'm almost finished with my new dma driver for pxa architectures; I have been
testing it for a week on camera flows. Once that is over, I'll submit it, to
kill off the dma support in arch/arm/mach-pxa.

Thanks for your time, I have all the information I needed to finish it now.

Cheers.

Patch

diff --git a/include/linux/dmaengine.h b/include/linux/dmaengine.h
index b6997a0..b2a975c 100644
--- a/include/linux/dmaengine.h
+++ b/include/linux/dmaengine.h
@@ -432,6 +432,8 @@  struct dmaengine_unmap_data {
  * @chan: target channel for this operation
  * @tx_submit: accept the descriptor, assign ordered cookie and mark the
  * descriptor pending. To be pushed on .issue_pending() call
+ * @tx_release: optional method to release tx resources if the complete()
+ *	was programmed to not release the descriptor
  * @callback: routine to call after this operation is complete
  * @callback_param: general parameter to pass to the callback routine
  * ---async_tx api specific fields---
@@ -445,6 +447,7 @@  struct dma_async_tx_descriptor {
 	dma_addr_t phys;
 	struct dma_chan *chan;
 	dma_cookie_t (*tx_submit)(struct dma_async_tx_descriptor *tx);
+	void (*tx_release)(struct dma_async_tx_descriptor *tx);
 	dma_async_tx_callback callback;
 	void *callback_param;
 	struct dmaengine_unmap_data *unmap;
@@ -796,6 +799,12 @@  static inline dma_cookie_t dmaengine_submit(struct dma_async_tx_descriptor *desc
 	return desc->tx_submit(desc);
 }
 
+static inline void dmaengine_tx_release(struct dma_async_tx_descriptor *desc)
+{
+	if (desc->tx_release)
+		desc->tx_release(desc);
+}
+
 static inline bool dmaengine_check_align(u8 align, size_t off1, size_t off2, size_t len)
 {
 	size_t mask;