Message ID | 1441539654-19055-1-git-send-email-robert.jarzmik@free.fr (mailing list archive) |
---|---|
State | Changes Requested |
Headers | show |
Robert Jarzmik <robert.jarzmik@free.fr> writes: > This patch attempts to enhance the case of a transfer submitted multiple > times, and where the cost of creating the descriptors chain is not > negligible. > > This happens with big video buffers (several megabytes, ie. several > thousands of linked descriptors in one scatter-gather list). In these > cases, a video driver would want to do : > - tx = dmaengine_prep_slave_sg() > - dma_engine_submit(tx); > - dma_async_issue_pending() > - wait for video completion > - read video data (or not, skipping a frame is also possible) > - dma_engine_submit(tx) > => here, the descriptors chain recalculation will take time > => the dma coherent allocation over and over might create holes in > the dma pool, which is counter-productive. > - dma_async_issue_pending() > - etc ... > > In order to cope with this case, virt-dma is modified to prevent freeing > the descriptors upon completion if DMA_CTRL_REUSE flag is set in the > transfer. > > This patch is a respin of the former DMA_CTRL_ACK approach, which was > reverted due to a regression in audio drivers. > > Signed-off-by: Robert Jarzmik <robert.jarzmik@free.fr> Hi Jun, Lars-Peter and Vinod, The revert of the former patch of this type, 8c8fe97b2b8a ("Revert "dmaengine: virt-dma: don't always free descriptor upon completion"") broke pxa_dma driver. The reason behind is that pxa_dma was designed, upon transfer submission (in pxad_tx_submit()), to "move" the virtual descriptor to the submitted list, rather than adding it at its tail (because it was previously on the allocated list). As a consequence, the list_head is not initialized, pxa_dma fails, and the current status is that pxa_dma Oopses in linux-next, and therefore in Linus's tree once the merge window is closed. I'd really like to have this patch as the fix, ie. that at transfer preparation in vchan_tx_prep(), the virtual descriptor is added on "allocated" list tail. But to be sure there is no regression on other platforms, I need a test, especially from someone who suffered from the former version of this patch (where we had DMA_CTRL_ACK instead of DMA_CTRL_REUSE). Could anybody give it a try, as I need a fix to go in the early -rc of v4.3 please ? Cheers.
Robert Jarzmik <robert.jarzmik@free.fr> writes: > Robert Jarzmik <robert.jarzmik@free.fr> writes: > Hi Jun, Lars-Peter and Vinod, > > The revert of the former patch of this type, 8c8fe97b2b8a ("Revert "dmaengine: > virt-dma: don't always free descriptor upon completion"") broke pxa_dma > driver. So ... what's the plan, Vinod ? I need to fix pxa-dma driver in the early -rc stages, how does it work with your schedule ? Cheers. -- Robert -- To unsubscribe from this list: send the line "unsubscribe dmaengine" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sun, Sep 06, 2015 at 01:40:52PM +0200, Robert Jarzmik wrote: > @@ -29,7 +29,7 @@ dma_cookie_t vchan_tx_submit(struct dma_async_tx_descriptor *tx) > spin_lock_irqsave(&vc->lock, flags); > cookie = dma_cookie_assign(tx); > > - list_add_tail(&vd->node, &vc->desc_submitted); > + list_move_tail(&vd->node, &vc->desc_submitted); I am not sure I follow this, why move ? > +int vchan_tx_desc_free(struct dma_async_tx_descriptor *tx) > +{ > + struct virt_dma_chan *vc = to_virt_chan(tx->chan); > + struct virt_dma_desc *vd = to_virt_desc(tx); > + unsigned long flags; > + > + spin_lock_irqsave(&vc->lock, flags); > + list_del(&vd->node); > + spin_unlock_irqrestore(&vc->lock, flags); > + > + dev_dbg(vc->chan.device->dev, "vchan %p: txd %p[%x]: freeing\n", > + vc, vd, vd->tx.cookie); > + vc->desc_free(vd); > + return 0; > +} > +EXPORT_SYMBOL_GPL(vchan_tx_desc_free); this one seems okay, can you add comments here and also update documentation for this > @@ -96,9 +115,13 @@ void vchan_dma_desc_free_list(struct virt_dma_chan *vc, struct list_head *head) > while (!list_empty(head)) { > struct virt_dma_desc *vd = list_first_entry(head, > struct virt_dma_desc, node); > - list_del(&vd->node); > - dev_dbg(vc->chan.device->dev, "txd %p: freeing\n", vd); > - vc->desc_free(vd); > + if (dmaengine_desc_test_reuse(&vd->tx)) { > + list_move_tail(&vd->node, &vc->desc_allocated); should we check this if the list is to be freed? Why would anyone call free except when cleaning up ? > static inline void vchan_free_chan_resources(struct virt_dma_chan *vc) > { > + struct virt_dma_desc *vd; > unsigned long flags; > LIST_HEAD(head); > > spin_lock_irqsave(&vc->lock, flags); > vchan_get_all_descriptors(vc, &head); > + list_for_each_entry(vd, &head, node) > + dmaengine_desc_clear_reuse(&vd->tx); why do we want to do this, if we are freeing them
Vinod Koul <vinod.koul@intel.com> writes: > On Sun, Sep 06, 2015 at 01:40:52PM +0200, Robert Jarzmik wrote: >> @@ -29,7 +29,7 @@ dma_cookie_t vchan_tx_submit(struct dma_async_tx_descriptor *tx) >> spin_lock_irqsave(&vc->lock, flags); >> cookie = dma_cookie_assign(tx); >> >> - list_add_tail(&vd->node, &vc->desc_submitted); >> + list_move_tail(&vd->node, &vc->desc_submitted); > > I am not sure I follow this, why move ? Because it is already on the allocated list (see vchan_tx_prep()). >> +int vchan_tx_desc_free(struct dma_async_tx_descriptor *tx) >> +{ >> + struct virt_dma_chan *vc = to_virt_chan(tx->chan); >> + struct virt_dma_desc *vd = to_virt_desc(tx); >> + unsigned long flags; >> + >> + spin_lock_irqsave(&vc->lock, flags); >> + list_del(&vd->node); >> + spin_unlock_irqrestore(&vc->lock, flags); >> + >> + dev_dbg(vc->chan.device->dev, "vchan %p: txd %p[%x]: freeing\n", >> + vc, vd, vd->tx.cookie); >> + vc->desc_free(vd); >> + return 0; >> +} >> +EXPORT_SYMBOL_GPL(vchan_tx_desc_free); > this one seems okay, can you add comments here and also update documentation > for this Ah yes, very good point, doxygen like documentation, got it. >> @@ -96,9 +115,13 @@ void vchan_dma_desc_free_list(struct virt_dma_chan *vc, struct list_head *head) >> while (!list_empty(head)) { >> struct virt_dma_desc *vd = list_first_entry(head, >> struct virt_dma_desc, node); >> - list_del(&vd->node); >> - dev_dbg(vc->chan.device->dev, "txd %p: freeing\n", vd); >> - vc->desc_free(vd); >> + if (dmaengine_desc_test_reuse(&vd->tx)) { >> + list_move_tail(&vd->node, &vc->desc_allocated); > > should we check this if the list is to be freed? Why would anyone call free > except when cleaning up ? Yes, there is a corner case : a driver does a dmaengine_terminate_all(), to stop the dma transfers (a missed video frame for example). Then upon the sync signal, it resubmits the reusable transfers it had. Or said differently, reusable transfers should continue to be reusable through a dmaengine_terminate_all(), hence this test. And there is a link with your comment below. >> static inline void vchan_free_chan_resources(struct virt_dma_chan *vc) >> { >> + struct virt_dma_desc *vd; >> unsigned long flags; >> LIST_HEAD(head); >> >> spin_lock_irqsave(&vc->lock, flags); >> vchan_get_all_descriptors(vc, &head); >> + list_for_each_entry(vd, &head, node) >> + dmaengine_desc_clear_reuse(&vd->tx); > > why do we want to do this, if we are freeing them Because if we don't clear reuse flag, vchan_dma_desc_free_list() won't really free the descriptor, but move it to allocated list. Cheers.
diff --git a/drivers/dma/virt-dma.c b/drivers/dma/virt-dma.c index 6f80432a3f0a..e33caa87cbbb 100644 --- a/drivers/dma/virt-dma.c +++ b/drivers/dma/virt-dma.c @@ -29,7 +29,7 @@ dma_cookie_t vchan_tx_submit(struct dma_async_tx_descriptor *tx) spin_lock_irqsave(&vc->lock, flags); cookie = dma_cookie_assign(tx); - list_add_tail(&vd->node, &vc->desc_submitted); + list_move_tail(&vd->node, &vc->desc_submitted); spin_unlock_irqrestore(&vc->lock, flags); dev_dbg(vc->chan.device->dev, "vchan %p: txd %p[%x]: submitted\n", @@ -39,6 +39,23 @@ dma_cookie_t vchan_tx_submit(struct dma_async_tx_descriptor *tx) } EXPORT_SYMBOL_GPL(vchan_tx_submit); +int vchan_tx_desc_free(struct dma_async_tx_descriptor *tx) +{ + struct virt_dma_chan *vc = to_virt_chan(tx->chan); + struct virt_dma_desc *vd = to_virt_desc(tx); + unsigned long flags; + + spin_lock_irqsave(&vc->lock, flags); + list_del(&vd->node); + spin_unlock_irqrestore(&vc->lock, flags); + + dev_dbg(vc->chan.device->dev, "vchan %p: txd %p[%x]: freeing\n", + vc, vd, vd->tx.cookie); + vc->desc_free(vd); + return 0; +} +EXPORT_SYMBOL_GPL(vchan_tx_desc_free); + struct virt_dma_desc *vchan_find_desc(struct virt_dma_chan *vc, dma_cookie_t cookie) { @@ -83,8 +100,10 @@ static void vchan_complete(unsigned long arg) cb_data = vd->tx.callback_param; list_del(&vd->node); - - vc->desc_free(vd); + if (dmaengine_desc_test_reuse(&vd->tx)) + list_add(&vd->node, &vc->desc_allocated); + else + vc->desc_free(vd); if (cb) cb(cb_data); @@ -96,9 +115,13 @@ void vchan_dma_desc_free_list(struct virt_dma_chan *vc, struct list_head *head) while (!list_empty(head)) { struct virt_dma_desc *vd = list_first_entry(head, struct virt_dma_desc, node); - list_del(&vd->node); - dev_dbg(vc->chan.device->dev, "txd %p: freeing\n", vd); - vc->desc_free(vd); + if (dmaengine_desc_test_reuse(&vd->tx)) { + list_move_tail(&vd->node, &vc->desc_allocated); + } else { + dev_dbg(vc->chan.device->dev, "txd %p: freeing\n", vd); + list_del(&vd->node); + vc->desc_free(vd); + } } } EXPORT_SYMBOL_GPL(vchan_dma_desc_free_list); @@ -108,6 +131,7 @@ void vchan_init(struct virt_dma_chan *vc, struct dma_device *dmadev) dma_cookie_init(&vc->chan); spin_lock_init(&vc->lock); + INIT_LIST_HEAD(&vc->desc_allocated); INIT_LIST_HEAD(&vc->desc_submitted); INIT_LIST_HEAD(&vc->desc_issued); INIT_LIST_HEAD(&vc->desc_completed); diff --git a/drivers/dma/virt-dma.h b/drivers/dma/virt-dma.h index 181b95267866..a4156af93d88 100644 --- a/drivers/dma/virt-dma.h +++ b/drivers/dma/virt-dma.h @@ -29,6 +29,7 @@ struct virt_dma_chan { spinlock_t lock; /* protected by vc.lock */ + struct list_head desc_allocated; struct list_head desc_submitted; struct list_head desc_issued; struct list_head desc_completed; @@ -55,10 +56,17 @@ static inline struct dma_async_tx_descriptor *vchan_tx_prep(struct virt_dma_chan struct virt_dma_desc *vd, unsigned long tx_flags) { extern dma_cookie_t vchan_tx_submit(struct dma_async_tx_descriptor *); + extern int vchan_tx_desc_free(struct dma_async_tx_descriptor *); + unsigned long flags; dma_async_tx_descriptor_init(&vd->tx, &vc->chan); vd->tx.flags = tx_flags; vd->tx.tx_submit = vchan_tx_submit; + vd->tx.desc_free = vchan_tx_desc_free; + + spin_lock_irqsave(&vc->lock, flags); + list_add_tail(&vd->node, &vc->desc_allocated); + spin_unlock_irqrestore(&vc->lock, flags); return &vd->tx; } @@ -122,7 +130,8 @@ static inline struct virt_dma_desc *vchan_next_desc(struct virt_dma_chan *vc) } /** - * vchan_get_all_descriptors - obtain all submitted and issued descriptors + * vchan_get_all_descriptors - obtain all allocated, submitted and issued + * descriptors * vc: virtual channel to get descriptors from * head: list of descriptors found * @@ -134,6 +143,7 @@ static inline struct virt_dma_desc *vchan_next_desc(struct virt_dma_chan *vc) static inline void vchan_get_all_descriptors(struct virt_dma_chan *vc, struct list_head *head) { + list_splice_tail_init(&vc->desc_allocated, head); list_splice_tail_init(&vc->desc_submitted, head); list_splice_tail_init(&vc->desc_issued, head); list_splice_tail_init(&vc->desc_completed, head); @@ -141,11 +151,14 @@ static inline void vchan_get_all_descriptors(struct virt_dma_chan *vc, static inline void vchan_free_chan_resources(struct virt_dma_chan *vc) { + struct virt_dma_desc *vd; unsigned long flags; LIST_HEAD(head); spin_lock_irqsave(&vc->lock, flags); vchan_get_all_descriptors(vc, &head); + list_for_each_entry(vd, &head, node) + dmaengine_desc_clear_reuse(&vd->tx); spin_unlock_irqrestore(&vc->lock, flags); vchan_dma_desc_free_list(vc, &head);
This patch attempts to enhance the case of a transfer submitted multiple times, and where the cost of creating the descriptors chain is not negligible. This happens with big video buffers (several megabytes, ie. several thousands of linked descriptors in one scatter-gather list). In these cases, a video driver would want to do : - tx = dmaengine_prep_slave_sg() - dma_engine_submit(tx); - dma_async_issue_pending() - wait for video completion - read video data (or not, skipping a frame is also possible) - dma_engine_submit(tx) => here, the descriptors chain recalculation will take time => the dma coherent allocation over and over might create holes in the dma pool, which is counter-productive. - dma_async_issue_pending() - etc ... In order to cope with this case, virt-dma is modified to prevent freeing the descriptors upon completion if DMA_CTRL_REUSE flag is set in the transfer. This patch is a respin of the former DMA_CTRL_ACK approach, which was reverted due to a regression in audio drivers. Signed-off-by: Robert Jarzmik <robert.jarzmik@free.fr> --- drivers/dma/virt-dma.c | 36 ++++++++++++++++++++++++++++++------ drivers/dma/virt-dma.h | 15 ++++++++++++++- 2 files changed, 44 insertions(+), 7 deletions(-)