| Message ID | 1468465076-27324-1-git-send-email-okaya@codeaurora.org (mailing list archive) |
|---|---|
| State | Changes Requested |
| Headers | show |
Hi Vinod,

On 7/13/2016 10:57 PM, Sinan Kaya wrote:
> There is a race condition between the data transfer callback and the
> descriptor free code. The callback routine may decide to clear the
> resources even though the descriptor has not yet been freed.
>
> Instead of calling the callback first and then releasing the memory,
> this change reorders the code to return the descriptor to the free
> pool first and then call the user-provided callback.
>
> Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
> ---
>  drivers/dma/qcom/hidma.c | 26 +++++++++++++++-----------
>  1 file changed, 15 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/dma/qcom/hidma.c b/drivers/dma/qcom/hidma.c
> index 41b5c6d..c41696e 100644
> --- a/drivers/dma/qcom/hidma.c
> +++ b/drivers/dma/qcom/hidma.c
> @@ -111,6 +111,7 @@ static void hidma_process_completed(struct hidma_chan *mchan)
>  	struct dma_async_tx_descriptor *desc;
>  	dma_cookie_t last_cookie;
>  	struct hidma_desc *mdesc;
> +	struct hidma_desc *next;
>  	unsigned long irqflags;
>  	struct list_head list;
>
> @@ -122,28 +123,31 @@ static void hidma_process_completed(struct hidma_chan *mchan)
>  	spin_unlock_irqrestore(&mchan->lock, irqflags);
>
>  	/* Execute callbacks and run dependencies */
> -	list_for_each_entry(mdesc, &list, node) {
> -		enum dma_status llstat;
> +	list_for_each_entry_safe(mdesc, next, &list, node) {
> +		dma_async_tx_callback callback;
> +		void *param;
>
>  		desc = &mdesc->desc;
>
>  		spin_lock_irqsave(&mchan->lock, irqflags);
> -		dma_cookie_complete(desc);
> +		if (hidma_ll_status(mdma->lldev, mdesc->tre_ch)
> +				== DMA_COMPLETE)
> +			dma_cookie_complete(desc);

It looks like I introduced a behavioral change while refactoring the code.
The previous code would call the callback only if the transfer was
successful, but it would always call dma_cookie_complete().

The new behavior is to call dma_cookie_complete() only if the transfer was
successful, and to call the callback even in the error case. The client
then has to query whether the transfer was successful.

Which one is the correct behavior?

>  		spin_unlock_irqrestore(&mchan->lock, irqflags);
>
> -		llstat = hidma_ll_status(mdma->lldev, mdesc->tre_ch);
> -		if (desc->callback && (llstat == DMA_COMPLETE))
> -			desc->callback(desc->callback_param);
> +		callback = desc->callback;
> +		param = desc->callback_param;
>
>  		last_cookie = desc->cookie;
>  		dma_run_dependencies(desc);
> -	}
>
> -	/* Free descriptors */
> -	spin_lock_irqsave(&mchan->lock, irqflags);
> -	list_splice_tail_init(&list, &mchan->free);
> -	spin_unlock_irqrestore(&mchan->lock, irqflags);
> +		spin_lock_irqsave(&mchan->lock, irqflags);
> +		list_move(&mdesc->node, &mchan->free);
> +		spin_unlock_irqrestore(&mchan->lock, irqflags);
>
> +		if (callback)
> +			callback(param);
> +	}
>  }
>
>  /*
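The fix above reduces to a small ordering pattern: snapshot the callback
pointers while the driver still owns the descriptor, return the descriptor
to the free pool under the channel lock, and only then invoke the
callback. A minimal sketch of that pattern — the my_chan/my_desc
structures are illustrative stand-ins, not the actual hidma code:

struct my_desc {
	struct dma_async_tx_descriptor txd;
	struct list_head node;
};

struct my_chan {
	spinlock_t lock;
	struct list_head completed;	/* filled by the completion ISR */
	struct list_head free;
};

static void my_process_completed(struct my_chan *chan)
{
	struct my_desc *d, *next;
	unsigned long flags;
	LIST_HEAD(done);

	/* Grab the completed descriptors in one shot. */
	spin_lock_irqsave(&chan->lock, flags);
	list_splice_tail_init(&chan->completed, &done);
	spin_unlock_irqrestore(&chan->lock, flags);

	/* _safe variant: entries are unlinked while walking the list. */
	list_for_each_entry_safe(d, next, &done, node) {
		dma_async_tx_callback cb = d->txd.callback;
		void *param = d->txd.callback_param;

		/* 1. Give the descriptor back to the free pool first... */
		spin_lock_irqsave(&chan->lock, flags);
		dma_cookie_complete(&d->txd);
		list_move(&d->node, &chan->free);
		spin_unlock_irqrestore(&chan->lock, flags);

		/*
		 * 2. ...then run the callback. Even if the client frees the
		 * channel resources from inside it, this descriptor is
		 * already back on the free list and cannot be leaked.
		 */
		if (cb)
			cb(param);
	}
}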
On Fri, Jul 15, 2016 at 09:00:52PM -0400, Sinan Kaya wrote:
> Hi Vinod,
>
> On 7/13/2016 10:57 PM, Sinan Kaya wrote:
> > There is a race condition between the data transfer callback and the
> > descriptor free code. The callback routine may decide to clear the
> > resources even though the descriptor has not yet been freed.
[...]
> It looks like I introduced a behavioral change while refactoring the code.
> The previous code would call the callback only if the transfer was
> successful, but it would always call dma_cookie_complete().
>
> The new behavior is to call dma_cookie_complete() only if the transfer was
> successful, and to call the callback even in the error case. The client
> then has to query whether the transfer was successful.
>
> Which one is the correct behavior?

Hi Sinan,

The cookie is always completed. That simply means the transaction was
completed; it carries no indication of whether the transaction was
successful or not.

Up till now we had no way of reporting errors. Dave's series adds that,
so you can use it.

The callback is optional for users. Again, we don't convey success of the
transaction; it is a callback for reporting that the transaction was
completed. So invoking the callback makes sense every time.

HTH
Hi,

On 7/24/2016 2:24 AM, Vinod Koul wrote:
> On Fri, Jul 15, 2016 at 09:00:52PM -0400, Sinan Kaya wrote:
>> It looks like I introduced a behavioral change while refactoring the code.
[...]
>> Which one is the correct behavior?
>
> Hi Sinan,
>
> The cookie is always completed. That simply means the transaction was
> completed; it carries no indication of whether the transaction was
> successful or not.
>
> Up till now we had no way of reporting errors. Dave's series adds that,
> so you can use it.
>
> The callback is optional for users. Again, we don't convey success of the
> transaction; it is a callback for reporting that the transaction was
> completed. So invoking the callback makes sense every time.
>
> HTH

Let's put Dave's series aside for the moment and assume an error case
where something bad happened during the transfer. I can add the error code
once Dave's series gets merged.

Here is the callback from dmatest:

static void dmatest_callback(void *arg)
{
	done->done = true;
}

Here is how the request is made:

	dma_async_issue_pending(chan);

	wait_event_freezable_timeout(done_wait, done.done,
				     msecs_to_jiffies(params->timeout));

	status = dma_async_is_tx_complete(chan, cookie, NULL, NULL);
	if (!done.done) {
		/* timeout */
	} else if (status != DMA_COMPLETE) {
		/* error */
	}
	/* success */

Based on what I see here, receiving the callback all the time is OK. The
client first checks whether the callback was received, then checks the
status via tx_status.

In the error case mentioned, the callback will be executed and done.done
will be true. If I set dma_cookie_complete(desc) in the error case, it
would be wrong to tell the client that the transfer was successful.

In my opinion, the new behavior is correct. Calling
dma_cookie_complete(desc) all the time is not. Do you agree?

If yes, I can divide this patch into two: one to correct the ordering, and
another one for the behavioral change.
On Mon, Jul 25, 2016 at 10:19:44AM -0400, Sinan Kaya wrote:
> Let's put Dave's series aside for the moment and assume an error case
> where something bad happened during the transfer. I can add the error
> code once Dave's series gets merged.

Fair enough..

> Here is the callback from dmatest.
[...]
> Based on what I see here, receiving the callback all the time is OK. The
> client first checks whether the callback was received.

The callback is optional from the API PoV. Yes, people do implement it :)

> Next, the client checks the status via tx_status.
>
> In the error case mentioned, the callback will be executed and done.done
> will be true.
>
> If I set dma_cookie_complete(desc) in the error case, it would be wrong
> to tell the client that the transfer was successful.

And here is the thing that you missed :)

Dmaengine tells you the transaction is complete. It does not say whether
the txn was a success or failure. It can transfer data and not say whether
the data was correct. A successful transaction implies data integrity as
well, which dmaengine can't provide.

> In my opinion, the new behavior is correct. Calling
> dma_cookie_complete(desc) all the time is not. Do you agree?
>
> If yes, I can divide this patch into two: one to correct the ordering,
> and another one for the behavioral change.

See above.. A callback or tx_status will only tell you the txn is
completed. That is why we have DMA_COMPLETE and not DMA_SUCCESS.

So the current order seems fine to me!
On 8/4/2016 8:55 AM, Vinod Koul wrote:
> Dmaengine tells you the transaction is complete. It does not say whether
> the txn was a success or failure. It can transfer data and not say
> whether the data was correct. A successful transaction implies data
> integrity as well, which dmaengine can't provide.

Thanks for describing this. I was confused about DMA_SUCCESS and
DMA_COMPLETE. I now understand that the tx_status API just returns the
information that the request was executed, whether the result is an error
or not. This makes sense now.

However, if the txn is a failure, then we should never call the client
callback, since the DMA engine cannot provide such feedback to the client
without Dave's patch. You are saying that calling the callback is
optional. Then the callback cannot be optional in the error case for the
old behavior.

How does the client know whether the memcpy executed or not? The client
got its callback and tx_status is also DMA_COMPLETE. Is the client
supposed to do a memcmp? (BTW, that doesn't make sense.)

>> In my opinion, the new behavior is correct. Calling
>> dma_cookie_complete(desc) all the time is not. Do you agree?
>>
>> If yes, I can divide this patch into two: one to correct the ordering,
>> and another one for the behavioral change.
>
> See above..
>
> A callback or tx_status will only tell you the txn is completed. That is
> why we have DMA_COMPLETE and not DMA_SUCCESS.

Still, calling the callback and returning DMA_COMPLETE isn't right. There
is no indication of an actual DMA error. The transaction is complete but
data integrity failed.

> So the current order seems fine to me!

I posted v2 of this patch without introducing the behavior change, leaving
the behavior discussion for another patch. The current code will not call
the callback if an error was observed.

This patch is needed to fix a race condition, as the commit message
describes: the callback is called before returning the descriptor to the
free pool. If the client calls free resources from the callback, the
descriptor that was not yet returned to the free pool gets lost due to the
race condition.

I'll refactor the code after Dave's change to pass the error code while
calling the callback. That will be a different patch anyhow.
On Thu, Aug 04, 2016 at 10:17:24AM -0400, Sinan Kaya wrote:
> Thanks for describing this. I was confused about DMA_SUCCESS and
> DMA_COMPLETE. I now understand that the tx_status API just returns the
> information that the request was executed, whether the result is an
> error or not. This makes sense now.
>
> However, if the txn is a failure, then we should never call the client
> callback, since the DMA engine cannot provide such feedback to the
> client without Dave's patch. You are saying that calling the callback is
> optional. Then the callback cannot be optional in the error case for the
> old behavior.
>
> How does the client know whether the memcpy executed or not? The client
> got its callback and tx_status is also DMA_COMPLETE.

If an error occurred, then dma_async_is_tx_complete() is supposed to
return DMA_ERROR. It's up to the DMA engine driver to ensure that this
happens if it has error detection abilities.

Most of the helpers in drivers/dma/dmaengine.h are there to _assist_ the
driver writer - they can't do magic. dma_cookie_status() will return, from
the point of view of the generic DMA code, what the status of a particular
cookie is, and the cookie state. It doesn't take care of whether a
particular transaction associated with a cookie failed or not - that's up
to the driver.

So, if dma_cookie_status() says that a cookie has DMA_COMPLETE status, and
the DMA engine is able to detect errors on individual transfers, then the
driver needs to do a further status lookup to determine whether the
particular transaction referred to by the cookie did fail, and modify the
returned status appropriately.

If dma_cookie_status() says that the cookie is DMA_IN_PROGRESS, then the
driver is expected to calculate and report the residue (the remaining
number of bytes) of the referred-to transaction.
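A minimal sketch of the .device_tx_status shape Russell describes — the
my_desc_failed() and my_get_residue() lookups (and to_my_chan()) are
hypothetical driver-side helpers, not existing API:

static enum dma_status my_tx_status(struct dma_chan *chan,
				    dma_cookie_t cookie,
				    struct dma_tx_state *state)
{
	struct my_chan *mchan = to_my_chan(chan);
	enum dma_status ret;

	/* Generic bookkeeping: DMA_COMPLETE or DMA_IN_PROGRESS. */
	ret = dma_cookie_status(chan, cookie, state);
	if (ret == DMA_COMPLETE) {
		/*
		 * The cookie has passed through the completion path.
		 * If the hardware recorded an error for this particular
		 * transaction, refine the answer to DMA_ERROR.
		 */
		if (my_desc_failed(mchan, cookie))	/* hypothetical */
			ret = DMA_ERROR;
		return ret;
	}

	/* Still in flight: report residue if the hardware can tell us. */
	dma_set_residue(state, my_get_residue(mchan, cookie));	/* hypothetical */
	return ret;
}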
On 8/4/2016 10:40 AM, Russell King - ARM Linux wrote:
> If an error occurred, then dma_async_is_tx_complete() is supposed to
> return DMA_ERROR. It's up to the DMA engine driver to ensure that this
> happens if it has error detection abilities.

Well, that's not what I'm getting from Vinod, and also not what I see in
the current usage in most of the drivers that I looked at.

The current drivers implement tx_status as follows:

static enum dma_status xyz_tx_status(struct dma_chan *chan,
		dma_cookie_t cookie, struct dma_tx_state *state)
{
	...
	ret = dma_cookie_status(&c->vc.chan, cookie, state);
	if (ret == DMA_COMPLETE)
		return ret;
	...
}

What Vinod is telling me is that I need to set the cookie to complete,
whether the transaction is successful or not, as long as the request was
accepted by the HW. xyz_tx_status is then just an indication that the
transaction was accepted by the HW. An error can happen as a result of
transaction execution.

If I call dma_cookie_complete() for all transactions, regardless of
transaction success, then xyz_tx_status returns DMA_COMPLETE. This also
matches what Vinod is saying: the transaction is complete, but it may not
be a success. I'm saying that if we follow this scheme, then we should not
call the callback.

I also made the argument that the driver should not call
dma_cookie_complete() for failure cases and should somehow return an error
for transactions that failed. That is your suggestion. I don't think it
matches Vinod's expectation.

> Most of the helpers in drivers/dma/dmaengine.h are there to _assist_ the
> driver writer - they can't do magic.
[...]
> If dma_cookie_status() says that the cookie is DMA_IN_PROGRESS, then the
> driver is expected to calculate and report the residue (the remaining
> number of bytes) of the referred-to transaction.

This part is fine. I'm worried about the transactions that are failing.
On Thu, Aug 04, 2016 at 11:27:46AM -0400, Sinan Kaya wrote:
> Well, that's not what I'm getting from Vinod, and also not what I see in
> the current usage in most of the drivers that I looked at.
>
> The current drivers implement tx_status as follows.
[...]

And... how many of those drivers detect an error occurring in the DMA? If
they don't detect an error during DMA, I'm curious what you would expect
the above to look like.

> I also made the argument that the driver should not call
> dma_cookie_complete() for failure cases and should somehow return an
> error for transactions that failed.

You can't omit individual dma_cookie_complete() calls like that (I think
you have little understanding of how the cookie system works).

The cookies are a continuous pool of numbers. Each transaction gets
allocated a cookie which is incremented from the previous transaction. Any
transaction can be identified as complete or pending depending on whether
its cookie value is earlier or later than the current completion cookie
value.

What this means is that if you try to do this:

- transaction 1 completes successfully, call dma_cookie_complete()
- transaction 2 completes with an error, omit dma_cookie_complete()
- transaction 3 completes successfully, call dma_cookie_complete()

the effect at the end of that sequence will be as if transaction 2 had
called dma_cookie_complete() - because the completed cookie value will now
be ahead of transaction 2's cookie.

So, playing games with dma_cookie_complete() really won't work. Please
forget this idea. dma_cookie_complete() merely indicates whether the
transaction completed or is still in progress. It's got nothing to do with
error status.

What you instead need to do is to find some way to record in your driver
that transaction 2 failed, and when dma_cookie_status() says that a
transaction has DMA_COMPLETE status, you need to look up whether it
failed.
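The comparison Russell describes is the heart of dma_async_is_complete()
in include/linux/dmaengine.h: completion is a window over a monotonically
increasing counter, not a per-transaction flag. Slightly simplified from
the kernel header (the two branches handle cookie-counter wrap-around):

static inline enum dma_status dma_async_is_complete(dma_cookie_t cookie,
		dma_cookie_t last_complete, dma_cookie_t last_used)
{
	if (last_complete <= last_used) {
		if ((cookie <= last_complete) || (cookie > last_used))
			return DMA_COMPLETE;
	} else {
		if ((cookie <= last_complete) && (cookie > last_used))
			return DMA_COMPLETE;
	}
	return DMA_IN_PROGRESS;
}

Completing transaction 3 moves last_complete past transaction 2's cookie,
so cookie 2 falls inside the completed window whether or not its own
dma_cookie_complete() ever ran — which is why per-cookie error state has
to live outside the cookie counter.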
On 08/04/2016 05:38 PM, Russell King - ARM Linux wrote:
[...]
> What you instead need to do is to find some way to record in your driver
> that transaction 2 failed, and when dma_cookie_status() says that a
> transaction has DMA_COMPLETE status, you need to look up whether it
> failed.

In my opinion this is where the current API is broken by design. For each
transfer that fails, you need to store the cookie associated with that
transfer in some kind of lookup table. Since there is no lifetime
associated with a cookie, entries in this table would need to be retained
forever, and it will grow without bound.

Ideally we'd mark error reporting through this interface as deprecated and
discourage new users of the interface. As far as I can see, most of the
few drivers that do return DMA_ERROR get it wrong anyway, e.g. they return
it unconditionally for all cookies when an error occurred for any of them.

- Lars
On 8/4/2016 11:38 AM, Russell King - ARM Linux wrote:
> On Thu, Aug 04, 2016 at 11:27:46AM -0400, Sinan Kaya wrote:
>> The current drivers implement tx_status as follows.
[...]

I have patiently looked at all the dma_cookie_status() calls in LXR. I
have not seen a single one implementing what you are suggesting. I can't
believe that there is no driver that does error checking. Feel free to
correct me if you have a good example I could use as a reference.

> And... how many of those drivers detect an error occurring in the DMA?
> If they don't detect an error during DMA, I'm curious what you would
> expect the above to look like.

I have this function that returns the real status of the transaction:

enum dma_status hidma_ll_status(struct hidma_lldev *llhndl, u32 tre_ch);

I can do another lookup here.

>> I also made the argument that the driver should not call
>> dma_cookie_complete() for failure cases and should somehow return an
>> error for transactions that failed.
>
> You can't omit individual dma_cookie_complete() calls like that (I think
> you have little understanding of how the cookie system works).
[...]
> So, playing games with dma_cookie_complete() really won't work. Please
> forget this idea. dma_cookie_complete() merely indicates whether the
> transaction completed or is still in progress. It's got nothing to do
> with error status.

Fair enough. All the cookie business seemed very convoluted to me when I
tried to understand how it works. I relied on what other drivers were
doing instead.

> What you instead need to do is to find some way to record in your driver
> that transaction 2 failed, and when dma_cookie_status() says that a
> transaction has DMA_COMPLETE status, you need to look up whether it
> failed.

I already have this information available. I just need confirmation from
Vinod that this is the right way to do it in terms of the DMA engine API.

The other way is that I can feed this information into what Dave just
introduced as part of the callback mechanism, and not touch this.

The main discussion here is that "tx_status does not indicate whether the
final transaction is successful or not", whether or not the driver has the
capability to detect errors. It is an API question, not an implementation
question.
On 08/04/2016 06:08 PM, Sinan Kaya wrote:
[...]
> The other way is that I can feed this information into what Dave just
> introduced as part of the callback mechanism, and not touch this.

Use the callback mechanism. It is a lot easier to implement correctly than
the tx_status() mechanism.

> The main discussion here is that
>
> "tx_status does not indicate whether the final transaction is successful
> or not", whether or not the driver has the capability to detect errors.

tx_status() is supposed to be able to indicate whether a transfer failed
or not. But in my opinion this feature is broken by design, and
implementing it correctly is very difficult without creating memory leaks.
Which is probably why so few drivers actually implement it.
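For reference, the callback mechanism being recommended here is the
result-reporting extension from Dave Jiang's series, which eventually
landed as the dmaengine_result API. A sketch of how a driver's completion
path might use it; the surrounding my_complete_desc() function and the err
flag are hypothetical:

static void my_complete_desc(struct dma_async_tx_descriptor *desc,
			     bool err)
{
	/* Completion result handed to the client along with the callback. */
	struct dmaengine_result res = {
		.result  = err ? DMA_TRANS_READ_FAILED : DMA_TRANS_NOERROR,
		.residue = 0,
	};

	/* "Complete" still only means "done", not "successful". */
	dma_cookie_complete(desc);

	/* Prefer the result-aware callback when the client provides one. */
	if (desc->callback_result)
		desc->callback_result(desc->callback_param, &res);
	else if (desc->callback)
		desc->callback(desc->callback_param);
}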
Lars-Peter Clausen <lars@metafoo.de> writes:
> Use the callback mechanism. It is a lot easier to implement correctly
> than the tx_status() mechanism.
>
> tx_status() is supposed to be able to indicate whether a transfer failed
> or not. But in my opinion this feature is broken by design, and
> implementing it correctly is very difficult without creating memory
> leaks. Which is probably why so few drivers actually implement it.

I think you can implement the error reporting by remembering the "last"
cookie where an error occurred, as in:
- e093bf60ca49 ("dmaengine: pxa: handle bus errors")
  => see the part of the commit message beginning with "As dma_cookie_status() ..."

The point about error reporting was already discussed with Vinod here:
https://lkml.org/lkml/2016/4/13/471
=> Vinod and I were both seeing that the reporting can be improved.

Yet the DMA API semantics as described by Russell can be implemented and
used to find out whether a specific tx failed or not; it's rather the "in
progress" part that I find misleading.

Cheers.
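The "last error cookie" scheme amounts to one extra dma_cookie_t per
channel. A rough sketch of the pattern (simplified, not the actual
pxa_dma code; my_chan, my_desc and to_my_chan() are hypothetical):

/* In the error interrupt path: remember which transaction failed. */
static void my_handle_error_irq(struct my_chan *mchan, struct my_desc *bad)
{
	mchan->last_error_cookie = bad->txd.cookie;
}

/* In .device_tx_status: refine the generic answer for that one cookie. */
static enum dma_status my_tx_status(struct dma_chan *chan,
				    dma_cookie_t cookie,
				    struct dma_tx_state *state)
{
	struct my_chan *mchan = to_my_chan(chan);
	enum dma_status ret = dma_cookie_status(chan, cookie, state);

	if (ret == DMA_COMPLETE && cookie == mchan->last_error_cookie)
		ret = DMA_ERROR;

	return ret;
}

As Lars points out in the next message, this can only represent the most
recent failure; earlier failed cookies still read back as DMA_COMPLETE.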
On 08/05/2016 08:32 AM, Robert Jarzmik wrote:
> I think you can implement the error reporting by remembering the "last"
> cookie where an error occurred, as in:
> - e093bf60ca49 ("dmaengine: pxa: handle bus errors")
>   => see the part of the commit message beginning with "As dma_cookie_status() ..."
>
> The point about error reporting was already discussed with Vinod here:
> https://lkml.org/lkml/2016/4/13/471
> => Vinod and I were both seeing that the reporting can be improved.

But that allows you to report the error only for the last descriptor that
failed. If two or more have failed, any but the last will return
DMA_COMPLETE instead of DMA_ERROR.

In your case that sort of works, because you completely stop execution
after a descriptor fails. But, as you noted in the commit message, this
will report misleading status values for the descriptors after the one
that failed.

I believe we need to invest some effort to come up with clear semantics
for the expected behavior when transferring a descriptor fails,
potentially allowing clients to choose the desired behavior, e.g. either
abort all descriptors on error or continue with the next one.
On 8/5/2016 4:34 AM, Lars-Peter Clausen wrote:
> I believe we need to invest some effort to come up with clear semantics
> for the expected behavior when transferring a descriptor fails,
> potentially allowing clients to choose the desired behavior, e.g. either
> abort all descriptors on error or continue with the next one.

I agree. I was leaning towards not calling the callback when an error
happens, to keep the implementation simple and backwards compatible.
After Dave's change, I need to call the callback with the actual error in
question.

Now I have a broken tx_status. If I implement DMA_ERROR in tx_status as
Russell indicated, then I have the address space explosion problem you
pointed out. If I report the error only for the last failing cookie, is
that good enough?

Another approach: tx_status becomes just an indication of the HW
accepting the request. All existing clients would need to be changed to
use Dave's error reporting to decide on the actual success or failure of
a request that was accepted by the HW. tx_status could no longer be used
to check for transaction errors; it could still be used to see whether
the HW accepted the request (parameter checking etc., but not the final
result). I like this one better.
On Thu, Aug 04, 2016 at 10:17:24AM -0400, Sinan Kaya wrote:
> However, if the txn is a failure, then we should never call the client
> callback, since the DMA engine cannot provide such feedback to the
> client without Dave's patch. You are saying that calling the callback is
> optional.

Yes, the callback is optional, *BUT* not for the driver. If a user has set
a callback, you _have_ to invoke it. That is the expectation from the user
and you cannot selectively choose! When I say the callback is optional, it
means only from the caller's PoV, not from the dmaengine driver's. The
caller may not have set it!

> Then the callback cannot be optional in the error case for the old
> behavior.
>
> How does the client know whether the memcpy executed or not? The client
> got its callback and tx_status is also DMA_COMPLETE.

So the cookie status should be checked after the callback. We can also
report DMA_ERROR if you can detect one. Some hardware can do that (not
very common though), but even if you report DMA_COMPLETE, that never
implies success of the transaction.

> Is the client supposed to do a memcmp? (BTW, that doesn't make sense.)

If someone really wants to check the data, then yes. But I don't really
see the point in that. Users would anyway complain about the bad data, and
you would fix the bug :)

> Still, calling the callback and returning DMA_COMPLETE isn't right.
> There is no indication of an actual DMA error. The transaction is
> complete but data integrity failed.

Yes, that is the idea.. You are telling the client that a transaction has
been completed from the controller's side. Whether the data transfer and
its integrity are guaranteed, we cannot know.. Data integrity is a
function of many other factors, some of which are not in the control of
the dmaengine, so it cannot guarantee success. Even the controllers which
can report errors can do so only from the dmaengine PoV, not really from
the system PoV.

> I posted v2 of this patch without introducing the behavior change,
> leaving the behavior discussion for another patch. The current code will
> not call the callback if an error was observed.

That's not right, as explained above.

> This patch is needed to fix a race condition, as the commit message
> describes: the callback is called before returning the descriptor to the
> free pool.
>
> If the client calls free resources, the descriptor that was not returned
> to the free pool gets lost due to the race condition.

Hmmm, if you have txns pending and the client wants to free up, shouldn't
the pending txns be cleaned up? Sounds like a different bug to me.. So if
I submit 5 txns and now want to free up, will you still leak descriptors?
Doesn't sound like the right behaviour to me.

> I'll refactor the code after Dave's change to pass the error code while
> calling the callback. That will be a different patch anyhow.

Yes, the error reporting is different.
On Thu, Aug 04, 2016 at 11:27:46AM -0400, Sinan Kaya wrote:
> On 8/4/2016 10:40 AM, Russell King - ARM Linux wrote:
> > If an error occurred, then dma_async_is_tx_complete() is supposed to
> > return DMA_ERROR. It's up to the DMA engine driver to ensure that this
> > happens if it has error detection abilities.
>
> Well, that's not what I'm getting from Vinod, and also not what I see in
> the current usage in most of the drivers that I looked at.

Sorry, but you are not interpreting it correctly. Russell and I are saying
the same thing!

> The current drivers implement tx_status as follows.
[...]
> What Vinod is telling me is that I need to set the cookie to complete,
> whether the transaction is successful or not, as long as the request was
> accepted by the HW. xyz_tx_status is then just an indication that the
> transaction was accepted by the HW. An error can happen as a result of
> transaction execution.

Nope. If the txn is completed, you mark it complete. If you can detect an
error (can you??), then you can report DMA_ERROR. In the latter case, do
not use dma_async_is_complete() to check. You would need to store and
report that cookie 'x' failed, which you report in .tx_status().

> If I call dma_cookie_complete() for all transactions, regardless of
> transaction success, then xyz_tx_status returns DMA_COMPLETE.

Again, that is based on your implementation.

> This also matches what Vinod is saying: the transaction is complete, but
> it may not be a success. I'm saying that if we follow this scheme, then
> we should not call the callback.

That is not in the driver's control. If the callback is set, you have to
call it. The client may choose not to set it.

> I also made the argument that the driver should not call
> dma_cookie_complete() for failure cases and should somehow return an
> error for transactions that failed. That is your suggestion.
>
> I don't think it matches Vinod's expectation.

It does!

> > If dma_cookie_status() says that the cookie is DMA_IN_PROGRESS, then
> > the driver is expected to calculate and report the residue (the
> > remaining number of bytes) of the referred-to transaction.
>
> This part is fine. I'm worried about the transactions that are failing.

And your issue is completely orthogonal to this debate. I am not saying we
should not discuss it, but your fix would be entirely different here
(going by the data you have provided till now).
On Thu, Aug 04, 2016 at 05:59:30PM +0200, Lars-Peter Clausen wrote:
> In my opinion this is where the current API is broken by design. For
> each transfer that fails, you need to store the cookie associated with
> that transfer in some kind of lookup table. Since there is no lifetime
> associated with a cookie, entries in this table would need to be
> retained forever, and it will grow without bound.

And how many drivers can report errors? And how many drivers can guarantee
that DMA_COMPLETE implies the transaction was successful?

> Ideally we'd mark error reporting through this interface as deprecated
> and discourage new users of the interface. As far as I can see, most of
> the few drivers that do return DMA_ERROR get it wrong anyway, e.g. they
> return it unconditionally for all cookies when an error occurred for any
> of them.

Error reporting is quite tricky, as detection is a problem. So yes, if you
can do so, it is highly encouraged to report using the new interface,
which is better than the client checking after the callback.

Btw, what is the behaviour after an error? I would think that the client
will see an error and report to the upper layer while initiating closure
of the transaction. So does the driver need to keep the state for a longer
time :-)
On 2016-08-08 04:51, Vinod Koul wrote:
>> This patch is needed to fix a race condition, as the commit message
>> describes: the callback is called before returning the descriptor to
>> the free pool.
>>
>> If the client calls free resources, the descriptor that was not
>> returned to the free pool gets lost due to the race condition.
>
> Hmmm, if you have txns pending and the client wants to free up,
> shouldn't the pending txns be cleaned up? Sounds like a different bug to
> me..
>
> So if I submit 5 txns and now want to free up, will you still leak
> descriptors? Doesn't sound like the right behaviour to me.

If free is called from the callback, the current code will leak the
descriptor whose callback is running. It will release the other 4.
Because of the ordering problem, that descriptor is in neither the
active, pending, nor free pool.

I check for pending txns by looking at the active and queued lists when
free is called. After the callback, I put the descriptor back into the
free pool. At that moment, it is already too late.

>> I'll refactor the code after Dave's change to pass the error code while
>> calling the callback. That will be a different patch anyhow.
>
> Yes, the error reporting is different.
On 08/08/2016 11:08 AM, Vinod Koul wrote:
> On Thu, Aug 04, 2016 at 05:59:30PM +0200, Lars-Peter Clausen wrote:
>> In my opinion this is where the current API is broken by design.
[...]
>
> And how many drivers can report errors? And how many drivers can
> guarantee that DMA_COMPLETE implies the transaction was successful?

The former just a handful, the latter hopefully all.

> Error reporting is quite tricky, as detection is a problem. So yes, if
> you can do so, it is highly encouraged to report using the new
> interface, which is better than the client checking after the callback.
>
> Btw, what is the behaviour after an error? I would think that the client
> will see an error and report to the upper layer while initiating closure
> of the transaction. So does the driver need to keep the state for a
> longer time :-)

The problem is that this is not really clearly defined.

1) What should be done when multiple descriptors are queued and an error
is encountered on one of them? Should the descriptors after the failing
one in the queue be discarded, or should they be executed as normal?

2) How long does an error result need to be retained? Can it be discarded
when terminate_all() is called, or when the next issue_pending() is
called, or should it be retained forever?

Unless we can clearly define the semantics of error reporting, it is very
difficult for drivers to use it. Which is probably one of the reasons why
there are only very few dmaengine consumers that do actual error checking.
On 8/8/2016 5:02 AM, Vinod Koul wrote:
>> What Vinod is telling me is that I need to set the cookie to complete,
>> whether the transaction is successful or not, as long as the request
>> was accepted by the HW. xyz_tx_status is then just an indication that
>> the transaction was accepted by the HW. An error can happen as a result
>> of transaction execution.
>
> Nope. If the txn is completed, you mark it complete. If you can detect
> an error (can you??), then you can report DMA_ERROR.

Yes, the HW reports whether a transaction failed or not. I have this
information available in the hidma_ll_status function for a limited amount
of time, until the descriptor gets reused.

> In the latter case, do not use dma_async_is_complete() to check. You
> would need to store and report that cookie 'x' failed, which you report
> in .tx_status().

I really don't like the idea of telling the client: "hey, I finished your
work and I guarantee you it is complete; a month from now, by the way, I
actually didn't do the work that day and I did not tell you."

That's why I preferred not to call the callback when I observe an error,
which I think makes more sense. Where is the reliability in this? Random
bugs showing up at random times. I'd rather not call the callback and be
safe. Especially if you are talking about servers, this is plainly
unacceptable.

As Lars-Peter and I indicated in my last email, I think we need to kill
this tx_status API and convert all the clients to use Dave's interface. It
is practically impossible to implement a reliable tx_status function.

Once this transition happens, I can implement Dave's interface, not
before. Again, it will be a different patch than this one. I think v2 of
this patch needs to go in as it is:

https://lkml.org/lkml/2016/7/31/64
On Mon, Aug 08, 2016 at 02:25:28PM +0200, Lars-Peter Clausen wrote:
> The problem is that this is not really clearly defined.
>
> 1) What should be done when multiple descriptors are queued and an error
> is encountered on one of them? Should the descriptors after the failing
> one in the queue be discarded, or should they be executed as normal?

That is the client's call. But a reasonable way would be for the client to
propagate those errors up, so it can terminate.

> 2) How long does an error result need to be retained? Can it be
> discarded when terminate_all() is called, or when the next
> issue_pending() is called, or should it be retained forever?

Up till the next terminate_all().

> Unless we can clearly define the semantics of error reporting, it is
> very difficult for drivers to use it. Which is probably one of the
> reasons why there are only very few dmaengine consumers that do actual
> error checking.

Yes, agreed, but the most reasonable behaviour is to terminate. Also, I
would expect the error reporting to be done through the new API, with the
client explicitly told that we found an error (if we can).

~Vinod
On Mon, Aug 08, 2016 at 10:45:24AM -0400, Sinan Kaya wrote:
> Yes, the HW reports whether a transaction failed or not. I have this
> information available in the hidma_ll_status function for a limited
> amount of time, until the descriptor gets reused.
>
> I really don't like the idea of telling the client: "hey, I finished
> your work and I guarantee you it is complete; a month from now, by the
> way, I actually didn't do the work that day and I did not tell you."

As I said previously, controllers cannot detect all errors. In a system, a
DMA burst may go bad due to various issues which the controller has no
handle on. So from a system PoV we cannot declare success!

> That's why I preferred not to call the callback when I observe an error,
> which I think makes more sense.

That doesn't make sense. A client that set a callback expects you to call
it. The result queried may be "txn completed" or an error. Since you have
the means, please report..

> Where is the reliability in this? Random bugs showing up at random
> times. I'd rather not call the callback and be safe. Especially if you
> are talking about servers, this is plainly unacceptable.

How does ignoring the client's wish solve this? You are really on the
wrong path here.

> As Lars-Peter and I indicated in my last email, I think we need to kill
> this tx_status API and convert all the clients to use Dave's interface.
> It is practically impossible to implement a reliable tx_status function.
>
> Once this transition happens, I can implement Dave's interface, not
> before. Again, it will be a different patch than this one. I think v2 of
> this patch needs to go in as it is:
>
> https://lkml.org/lkml/2016/7/31/64

I haven't looked at the patch. If it is not invoking the callback set by
the user, then I am not taking it. Sorry, we don't choose over the
client's wish.

Thanks
On 8/10/2016 1:28 PM, Vinod Koul wrote:
>> That's why I preferred not to call the callback when I observe an
>> error, which I think makes more sense.
>
> That doesn't make sense. A client that set a callback expects you to
> call it. The result queried may be "txn completed" or an error. Since
> you have the means, please report..

If there is a good way to fix tx_status, I can certainly do so. I just
need to make sure my implementation is robust and reliable.

I saw your reply that we need to keep this information around until
terminate_all is called. What is a good implementation strategy? Keep a
size-limited list of error cookies and flush them in terminate_all? What
should I do if the code reaches the size limit? Size the error cookie
list at double the number of available descriptors?

>> Again, it will be a different patch than this one. I think v2 of this
>> patch needs to go in as it is.
>>
>> https://lkml.org/lkml/2016/7/31/64
>
> I haven't looked at the patch. If it is not invoking the callback set by
> the user, then I am not taking it. Sorry, we don't choose over the
> client's wish.

OK. The problem you are referring to is something else and needs to be
addressed separately. I can create a series that first implements a
reliable tx_status based on your recommendation above, then changes the
current behavior so that the client callback is always executed, as you
requested. After that comes this patch, to fix the free order.
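A sketch of the bookkeeping being discussed: record failing cookies on a
per-channel list as errors are detected, consult it from tx_status, and
flush it from terminate_all (the agreed lifetime boundary). All names are
hypothetical; locking of err_list is omitted for brevity:

struct my_err_entry {
	struct list_head node;
	dma_cookie_t cookie;
};

/* On a hardware error interrupt: remember the failing cookie. */
static void my_record_error(struct my_chan *mchan, dma_cookie_t cookie)
{
	struct my_err_entry *e = kzalloc(sizeof(*e), GFP_ATOMIC);

	if (e) {
		e->cookie = cookie;
		list_add_tail(&e->node, &mchan->err_list);
	}
}

/* From .device_tx_status: was this completed cookie one of the failures? */
static bool my_cookie_failed(struct my_chan *mchan, dma_cookie_t cookie)
{
	struct my_err_entry *e;

	list_for_each_entry(e, &mchan->err_list, node)
		if (e->cookie == cookie)
			return true;
	return false;
}

/* From .device_terminate_all: discard the recorded error state. */
static void my_flush_errors(struct my_chan *mchan)
{
	struct my_err_entry *e, *tmp;

	list_for_each_entry_safe(e, tmp, &mchan->err_list, node) {
		list_del(&e->node);
		kfree(e);
	}
}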
<sorry for the delay, i was out>

On Wed, Aug 10, 2016 at 01:31:21PM -0400, Sinan Kaya wrote:
> On 8/10/2016 1:28 PM, Vinod Koul wrote:
[...]
> If there is a good way to fix tx_status, I can certainly do so. I just need
> to make sure my implementation is robust and reliable.
>
> I saw your reply that we need to keep this information around until
> terminate_all is called.
>
> What is a good implementation strategy?
>
> Keep a size-limited list with error cookies and flush them in terminate_all?

I think so; terminate_all cleans up the channel anyway. Btw, what is the
behaviour on error? Do you terminate or something else?

> What should I do if the code reaches the size limit?
>
> Size the error cookie list at double the number of available descriptors?

That won't help, as you would have two values per cookie and we don't want
that :)

> Ok. The problem you are referring to is something else and needs to be
> addressed separately. I can create a series: first, implement a reliable
> tx_status based on your recommendation above.
>
> Then, change the current behavior so that the client callback is always
> executed, as you requested.
>
> After that, this patch to fix the free order.
On 8/18/2016 10:48 PM, Vinod Koul wrote:
>> Keep a size-limited list with error cookies and flush them in terminate_all?
> I think so; terminate_all cleans up the channel anyway. Btw, what is the
> behaviour on error? Do you terminate or something else?

On error, I flush all outstanding transactions with an error code and I reset
the channel. After the reset, the DMA channel is functional again. The client
doesn't need to shut down anything.
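[A rough sketch of the flush-and-reset error path described above, under the
assumption that the channel keeps active/prepared/free descriptor lists. The
list names, hidma_record_error (from the earlier sketch), and the reset
helper are all illustrative, not the real driver internals:]

	static void hidma_error_isr(struct hidma_chan *mchan)
	{
		struct hidma_desc *mdesc, *tmp;
		unsigned long flags;
		LIST_HEAD(flush);

		spin_lock_irqsave(&mchan->lock, flags);
		/* Collect both active and pending descriptors */
		list_splice_tail_init(&mchan->active, &flush);
		list_splice_tail_init(&mchan->prepared, &flush);
		spin_unlock_irqrestore(&mchan->lock, flags);

		list_for_each_entry_safe(mdesc, tmp, &flush, node) {
			dma_async_tx_callback callback = mdesc->desc.callback;
			void *param = mdesc->desc.callback_param;

			/* Remember the failure for tx_status() reporting */
			hidma_record_error(mchan, mdesc->desc.cookie);

			spin_lock_irqsave(&mchan->lock, flags);
			list_move(&mdesc->node, &mchan->free);
			spin_unlock_irqrestore(&mchan->lock, flags);

			/* Free first, then notify -- same ordering as the patch */
			if (callback)
				callback(param);
		}

		/* Reset the HW so the channel is usable again (assumed helper) */
		hidma_channel_reset(mchan);
	}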
On Thu, Aug 18, 2016 at 11:26:28PM -0400, Sinan Kaya wrote:
> On 8/18/2016 10:48 PM, Vinod Koul wrote:
> >> Keep a size-limited list with error cookies and flush them in terminate_all?
> > I think so; terminate_all cleans up the channel anyway. Btw, what is the
> > behaviour on error? Do you terminate or something else?
>
> On error, I flush all outstanding transactions with an error code and I
> reset the channel. After the reset, the DMA channel is functional again.
> The client doesn't need to shut down anything.

You mean from the client context or the driver?
On 8/18/2016 11:42 PM, Vinod Koul wrote:
> On Thu, Aug 18, 2016 at 11:26:28PM -0400, Sinan Kaya wrote:
[...]
>> On error, I flush all outstanding transactions with an error code and I
>> reset the channel. After the reset, the DMA channel is functional again.
>> The client doesn't need to shut down anything.
>
> You mean from the client context or the driver?

The client doesn't need to call device_free_chan_resources and
device_terminate_all, to be specific. The client can certainly call these if
it needs to, but it is not required to in order to recover the channel.

After the reset in the error condition, the client can continue issuing new
requests with tx_submit and device_issue_pending as usual.
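[A client-side sketch of the recovery flow described above, using only the
standard dmaengine client API: after a failed transfer, the client simply
prepares and submits a new transaction. The function name and error handling
are illustrative:]

	static int client_memcpy_once(struct dma_chan *chan, dma_addr_t dst,
				      dma_addr_t src, size_t len)
	{
		struct dma_async_tx_descriptor *tx;
		dma_cookie_t cookie;

		tx = dmaengine_prep_dma_memcpy(chan, dst, src, len,
					       DMA_PREP_INTERRUPT);
		if (!tx)
			return -ENOMEM;

		cookie = dmaengine_submit(tx);
		if (dma_submit_error(cookie))
			return -EIO;

		dma_async_issue_pending(chan);

		/* ... wait for the completion callback, then query status ... */
		if (dma_async_is_tx_complete(chan, cookie, NULL, NULL)
				== DMA_ERROR)
			return -EIO;	/* driver reset the channel; safe to retry */

		return 0;
	}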
On Thu, Aug 18, 2016 at 11:48:52PM -0400, Sinan Kaya wrote:
> On 8/18/2016 11:42 PM, Vinod Koul wrote:
[...]
> The client doesn't need to call device_free_chan_resources and
> device_terminate_all, to be specific. The client can certainly call these
> if it needs to, but it is not required to in order to recover the channel.

You didn't answer my question!

On error you said you flush, so who does that?

> After the reset in the error condition, the client can continue issuing new
> requests with tx_submit and device_issue_pending as usual.
On 2016-08-19 01:52, Vinod Koul wrote:
> On Thu, Aug 18, 2016 at 11:48:52PM -0400, Sinan Kaya wrote:
[...]
> You didn't answer my question!
>
> On error you said you flush, so who does that?

This is done by the driver in interrupt context, when an error interrupt is
received. All transactions are posted and the hw is reset.
On Fri, Aug 19, 2016 at 07:13:43AM -0400, okaya@codeaurora.org wrote:
> On 2016-08-19 01:52, Vinod Koul wrote:
[...]
> This is done by the driver in interrupt context, when an error interrupt is
> received. All transactions are posted and the hw is reset.

Hmmm, what about the txns which are pending? Do you clear them..?
On 8/19/2016 1:02 PM, Vinod Koul wrote:
> On Fri, Aug 19, 2016 at 07:13:43AM -0400, okaya@codeaurora.org wrote:
[...]
>> This is done by the driver in interrupt context, when an error interrupt
>> is received. All transactions are posted and the hw is reset.
>
> Hmmm, what about the txns which are pending? Do you clear them..?

Yes, I clear both pending and active transactions.
On Fri, Aug 19, 2016 at 01:21:34PM -0400, Sinan Kaya wrote:
> On 8/19/2016 1:02 PM, Vinod Koul wrote:
[...]
>>> Hmmm, what about the txns which are pending? Do you clear them..?
>>
>> Yes, I clear both pending and active transactions.

Okay, in that case you can consider the below:

1. dmaengine asserts the error interrupt
2. Driver receives it and marks the txn as error
3. Driver completes the txn and notifies the client. No further submissions.
   Drop the locks before calling the callback, as subsequent processing by
   the client may be in the callback thread.
4. Client invokes status and you can return error
5. On error, client calls terminate_all. You can reset the channel and free
   all descriptors in the active, pending and completed lists
6. Client prepares a new txn and so on..

Thanks
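[A sketch of step 5 of the sequence above: terminate_all returns every
descriptor to the free pool and resets the channel. The list names,
hidma_flush_errors (from the earlier sketch), and the reset helper are
assumptions, not the driver's actual code:]

	static int hidma_terminate_all(struct dma_chan *dmach)
	{
		struct hidma_chan *mchan = to_hidma_chan(dmach);
		unsigned long flags;
		LIST_HEAD(reclaim);

		spin_lock_irqsave(&mchan->lock, flags);
		/* Reclaim active, pending and completed descriptors */
		list_splice_tail_init(&mchan->active, &reclaim);
		list_splice_tail_init(&mchan->prepared, &reclaim);
		list_splice_tail_init(&mchan->completed, &reclaim);
		list_splice_tail_init(&reclaim, &mchan->free);
		spin_unlock_irqrestore(&mchan->lock, flags);

		/* Forget recorded error cookies now that the client has seen them */
		hidma_flush_errors(mchan);

		/* Reset the HW so new transactions can be accepted (assumed helper) */
		return hidma_channel_reset(mchan);
	}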
On 8/22/2016 2:08 AM, Vinod Koul wrote:
> On Fri, Aug 19, 2016 at 01:21:34PM -0400, Sinan Kaya wrote:
[...]
> Okay, in that case you can consider the below:
>
> 1. dmaengine asserts the error interrupt
> 2. Driver receives it and marks the txn as error
> 3. Driver completes the txn and notifies the client. No further
>    submissions. Drop the locks before calling the callback, as subsequent
>    processing by the client may be in the callback thread.
> 4. Client invokes status and you can return error
> 5. On error, client calls terminate_all. You can reset the channel and free
>    all descriptors in the active, pending and completed lists
> 6. Client prepares a new txn and so on..

Just to be clear, you are telling me not to accept any new transactions
until terminate_all is called, right?

> Thanks
On Mon, Aug 22, 2016 at 09:27:04AM -0400, Sinan Kaya wrote:
[...]
> Just to be clear, you are telling me not to accept any new transactions
> until terminate_all is called, right?

You may return an error for those cases, but my guess is that since you are
interrupt driven, the error propagation through the callback and the status
query will be faster anyway.
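[A sketch of returning an error for submissions made while the channel is in
the error state, until terminate_all recovers it. The in_error flag and the
prepared list are hypothetical; dma_cookie_assign is the standard helper:]

	static dma_cookie_t hidma_tx_submit(struct dma_async_tx_descriptor *txd)
	{
		struct hidma_chan *mchan = to_hidma_chan(txd->chan);
		struct hidma_desc *mdesc =
			container_of(txd, struct hidma_desc, desc);
		unsigned long flags;
		dma_cookie_t cookie;

		spin_lock_irqsave(&mchan->lock, flags);
		if (mchan->in_error) {
			/* Channel faulted: refuse until terminate_all() runs */
			spin_unlock_irqrestore(&mchan->lock, flags);
			return -ENODEV;	/* dma_submit_error() catches this */
		}
		cookie = dma_cookie_assign(txd);
		list_add_tail(&mdesc->node, &mchan->prepared);
		spin_unlock_irqrestore(&mchan->lock, flags);

		return cookie;
	}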
diff --git a/drivers/dma/qcom/hidma.c b/drivers/dma/qcom/hidma.c
index 41b5c6d..c41696e 100644
--- a/drivers/dma/qcom/hidma.c
+++ b/drivers/dma/qcom/hidma.c
@@ -111,6 +111,7 @@ static void hidma_process_completed(struct hidma_chan *mchan)
 	struct dma_async_tx_descriptor *desc;
 	dma_cookie_t last_cookie;
 	struct hidma_desc *mdesc;
+	struct hidma_desc *next;
 	unsigned long irqflags;
 	struct list_head list;
 
@@ -122,28 +123,31 @@ static void hidma_process_completed(struct hidma_chan *mchan)
 	spin_unlock_irqrestore(&mchan->lock, irqflags);
 
 	/* Execute callbacks and run dependencies */
-	list_for_each_entry(mdesc, &list, node) {
-		enum dma_status llstat;
+	list_for_each_entry_safe(mdesc, next, &list, node) {
+		dma_async_tx_callback callback;
+		void *param;
 
 		desc = &mdesc->desc;
 
 		spin_lock_irqsave(&mchan->lock, irqflags);
-		dma_cookie_complete(desc);
+		if (hidma_ll_status(mdma->lldev, mdesc->tre_ch)
+				== DMA_COMPLETE)
+			dma_cookie_complete(desc);
 		spin_unlock_irqrestore(&mchan->lock, irqflags);
 
-		llstat = hidma_ll_status(mdma->lldev, mdesc->tre_ch);
-		if (desc->callback && (llstat == DMA_COMPLETE))
-			desc->callback(desc->callback_param);
+		callback = desc->callback;
+		param = desc->callback_param;
 
 		last_cookie = desc->cookie;
 		dma_run_dependencies(desc);
-	}
 
-	/* Free descriptors */
-	spin_lock_irqsave(&mchan->lock, irqflags);
-	list_splice_tail_init(&list, &mchan->free);
-	spin_unlock_irqrestore(&mchan->lock, irqflags);
+		spin_lock_irqsave(&mchan->lock, irqflags);
+		list_move(&mdesc->node, &mchan->free);
+		spin_unlock_irqrestore(&mchan->lock, irqflags);
 
+		if (callback)
+			callback(param);
+	}
 }
 
 /*
There is a race condition between the data transfer callback and the
descriptor free code. The callback routine may decide to clear the resources
even though the descriptor has not yet been freed.

Instead of calling the callback first and then releasing the memory, this
code changes the order to return the descriptor back to the free pool first
and then call the user-provided callback.

Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
---
 drivers/dma/qcom/hidma.c | 26 +++++++++++++++-----------
 1 file changed, 15 insertions(+), 11 deletions(-)