Message ID: 87zj7hz1w9.fsf@free.fr (mailing list archive)
State: Not Applicable
On Fri, Mar 13, 2015 at 10:23:50AM +0100, Robert Jarzmik wrote:
> Vinod Koul <vinod.koul@intel.com> writes:
>
> > On Mon, Mar 09, 2015 at 11:18:53PM +0100, Robert Jarzmik wrote:
> >> And here comes my question: how do I free the hardware descriptors for tx now?
> >> As I chose to not free them upon tx completion, is there an API call to do it?
> > The problem with this is that we are assuming the
> > DMA_PREP_CONTINUE/DMA_PREP_ACK etc. is valid for slave. These were defined
> > for async_tx. I have added Dan on how they handled it there
> For Dan, in [2] a reminder for the use case, in order to not be lost in the
> conversation.
>
> > But, I think we should not use these for slave transactions.
> >
> > In your case what is the benefit you get from reusing the descriptors?
> In my case, i.e. the pxa architecture conversion to dmaengine, I'm converting
> the camera capture interface.
>
> These scatter-gather buffers are huge (several MB, depending on sensor
> definition). Moreover, the video4linux2 API relies on the fact that you can
> queue up a buffer multiple times.
>
> As these buffers are huge, the preparation is not negligible (a thousand
> hardware linked descriptors for 1 frame), and there are usually 6 frames in
> the ring. The pxa architecture (at least pxa27x) is quite old, and the cost
> to recalculate the descriptors chain is not negligible (2ms to 10ms).
> Moreover, this time is needed to handle unqueued buffers: store them or
> compress them, etc.
The dmaengine driver is not expected to do buffer mapping for slave cases, so
I am still not clear where the savings are coming from.

You can keep the buffer ready, or prepare ahead of time and submit when
required.

> As this is video capture, the real-time aspect is important. Missing one
> frame here and there makes the video human-unfriendly (as opposed to the
> disk/io subsystem, where this is not that important).
And preparing the buffer ahead of time can help.

> Ah, and to be more precise about what I'm thinking of [1], this is what I
> have in mind. I don't know if it's the way to go, but it will make the query
> more concrete. I also have a driver in preparation for the pxa architecture
> with this, and I want to check if the requirements are met for me to
> upstream it.
>
> Cheers.
>
> Robert
>
> [1] Patch to support conversation

Before this, pls read up on Documentation/crypto/async-tx-api.txt,
Section 3.3.
Vinod Koul <vinod.koul@intel.com> writes:

Hi Vinod,

> On Fri, Mar 13, 2015 at 10:23:50AM +0100, Robert Jarzmik wrote:
>> As these buffers are huge, the preparation is not negligible (a thousand
>> hardware linked descriptors for 1 frame), and there are usually 6 frames in
>> the ring. The pxa architecture (at least pxa27x) is quite old, and the cost
>> to recalculate the descriptors chain is not negligible (2ms to 10ms).
>> Moreover, this time is needed to handle unqueued buffers: store them or
>> compress them, etc.
> the dmaengine driver is not expected to do buffer mapping for slave cases,
> so I am still not clear where the savings are coming from.
Why are you talking about "buffer mapping"? It's not a matter of mapping,
it's _hardware_ descriptor calculation, i.e. the structures the DMA IP
consumes to perform the "linked list chain". These structures are calculated
from the DMA mapping done at driver level (which is done only once, at
preparation).

> You can keep the buffer ready, or prepare ahead of time and submit when
> required.
Well, as long as the image capture is ongoing, for obvious reasons I can't.
As the userspace can dequeue and requeue a buffer at any time, I can't
"prepare in advance". It's the userspace which is driving the timing, and
userspace is not very real-time friendly ...

>> As this is video capture, the real-time aspect is important. Missing one
>> frame here and there makes the video human-unfriendly (as opposed to the
>> disk/io subsystem, where this is not that important).
> And preparing the buffer ahead of time can help.
As I said above, it's userspace driven.

>> Ah, and to be more precise about what I'm thinking of [1], this is what I
>> have in mind. I don't know if it's the way to go, but it will make the
>> query more concrete. I also have a driver in preparation for the pxa
>> architecture with this, and I want to check if the requirements are met
>> for me to upstream it.
>>
>> Cheers.
>>
>> Robert
>>
>> [1] Patch to support conversation
> Before this, pls read up on Documentation/crypto/async-tx-api.txt,
> Section 3.3.
Ah, you mean the "ACK" flag is what I'm looking for, right? I can use that,
as long as I can resubmit a tx without "acking" it, which looks fine wrt the
documentation you made me read.

One more thing: there is another use case, in which you'll want to have a
release function:
 - driver does a tx = dmaengine_prep_slave_sg()
 - driver is forced to shut down:
   => dmaengine_terminate_all()

The tx was never submitted. How should the resources be freed in that case?

Cheers.

--
Robert
On Mon, Mar 16, 2015 at 08:05:21AM +0100, Robert Jarzmik wrote:
> > On Fri, Mar 13, 2015 at 10:23:50AM +0100, Robert Jarzmik wrote:
> > Before this, pls read up on Documentation/crypto/async-tx-api.txt,
> > Section 3.3.
> Ah, you mean the "ACK" flag is what I'm looking for, right? I can use that,
> as long as I can resubmit a tx without "acking" it, which looks fine wrt
> the documentation you made me read.
That is how the async_tx API is supposed to use this.

> One more thing: there is another use case, in which you'll want to have a
> release function:
>  - driver does a tx = dmaengine_prep_slave_sg()
>  - driver is forced to shut down:
>    => dmaengine_terminate_all()
>
> The tx was never submitted. How should the resources be freed in that case?
Well, dmaengine_terminate_all() is designed to clean up the channel. Once
this call is invoked, all descriptors in all lists (active, pending, freed)
are cleaned up. All transactions must be cleaned, including the currently
active ones.

Thanks
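The terminate_all() semantics Vinod describes — every descriptor on every channel list is reclaimed, submitted or not — can be sketched with a small user-space model (the list layout and names below are illustrative only, not the kernel's channel structures):

```c
#include <stdlib.h>

/* Toy model of dmaengine_terminate_all(): every descriptor on every
 * channel list is reclaimed, including descriptors that were prepared
 * but never submitted or issued by the client. */
struct desc { struct desc *next; };

struct chan {
    struct desc *active;   /* currently running transfers */
    struct desc *pending;  /* queued but not yet started  */
};

/* Free a whole singly-linked list; return how many were freed. */
static int free_list(struct desc **head)
{
    int n = 0;
    while (*head) {
        struct desc *d = *head;
        *head = d->next;
        free(d);
        n++;
    }
    return n;
}

/* The channel is left empty: nothing survives termination. */
static int terminate_all(struct chan *c)
{
    return free_list(&c->active) + free_list(&c->pending);
}

/* Helper to populate a list with n descriptors. */
static void push(struct desc **head, int n)
{
    while (n--) {
        struct desc *d = malloc(sizeof(*d));
        d->next = *head;
        *head = d;
    }
}
```

In this model, a descriptor that was prepared but never submitted would simply sit on one of the channel's lists until termination sweeps it up, which is the answer to Robert's shutdown question.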
Vinod Koul <vinod.koul@intel.com> writes:

> On Mon, Mar 16, 2015 at 08:05:21AM +0100, Robert Jarzmik wrote:
>> > On Fri, Mar 13, 2015 at 10:23:50AM +0100, Robert Jarzmik wrote:
>> > Before this, pls read up on Documentation/crypto/async-tx-api.txt,
>> > Section 3.3.
>> Ah, you mean the "ACK" flag is what I'm looking for, right? I can use
>> that, as long as I can resubmit a tx without "acking" it, which looks fine
>> wrt the documentation you made me read.
> That is how the async_tx API is supposed to use this.
Great, works for me.

>> One more thing: there is another use case, in which you'll want to have a
>> release function:
>>  - driver does a tx = dmaengine_prep_slave_sg()
>>  - driver is forced to shut down:
>>    => dmaengine_terminate_all()
>>
>> The tx was never submitted. How should the resources be freed in that case?
> Well, dmaengine_terminate_all() is designed to clean up the channel. Once
> this call is invoked, all descriptors in all lists (active, pending, freed)
> are cleaned up. All transactions must be cleaned, including the currently
> active ones.
Ah OK, I understand: so it's the slave driver's responsibility to keep track
of the transactions so that they can be freed. Actually, the slave drivers I
had a look at don't keep that "track".

And that's perfectly in sync with one of my future patches to "virt-dma",
which creates another queue, "allocated", to complement the existing
submitted, issued and completed ones.

I'm almost finished with my new DMA driver for pxa architectures; I have been
testing it for a week on camera flows. Once that is done, I'll submit it, to
kill off the DMA support in arch/arm/mach-pxa ...

Thanks for your time, I have all the information I needed to finish it now.

Cheers.
diff --git a/include/linux/dmaengine.h b/include/linux/dmaengine.h
index b6997a0..b2a975c 100644
--- a/include/linux/dmaengine.h
+++ b/include/linux/dmaengine.h
@@ -432,6 +432,8 @@ struct dmaengine_unmap_data {
  * @chan: target channel for this operation
  * @tx_submit: accept the descriptor, assign ordered cookie and mark the
  *	descriptor pending. To be pushed on .issue_pending() call
+ * @tx_release: optional method to release tx resources if the complete()
+ *	was programmed to not release the descriptor
  * @callback: routine to call after this operation is complete
  * @callback_param: general parameter to pass to the callback routine
  * ---async_tx api specific fields---
@@ -445,6 +447,7 @@ struct dma_async_tx_descriptor {
 	dma_addr_t phys;
 	struct dma_chan *chan;
 	dma_cookie_t (*tx_submit)(struct dma_async_tx_descriptor *tx);
+	void (*tx_release)(struct dma_async_tx_descriptor *tx);
 	dma_async_tx_callback callback;
 	void *callback_param;
 	struct dmaengine_unmap_data *unmap;
@@ -796,6 +799,12 @@ static inline dma_cookie_t dmaengine_submit(struct dma_async_tx_descriptor *desc)
 	return desc->tx_submit(desc);
 }
 
+static inline void dmaengine_tx_release(struct dma_async_tx_descriptor *desc)
+{
+	if (desc->tx_release)
+		desc->tx_release(desc);
+}
+
 static inline bool dmaengine_check_align(u8 align, size_t off1, size_t off2, size_t len)
 {
 	size_t mask;