Message ID | 1363382956-14557-1-git-send-email-dianders@chromium.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Hi Doug, Great..i have found the problem like this. I will check your patch..and share the result. Best Regards, Jaehoon Chung On 03/16/2013 06:29 AM, Doug Anderson wrote: > On a flaky piece of hardware that seems good at generating CRC errors, > we have found that often times the CRC errors don't get reported > properly when using CONFIG_MMC_DW_IDMAC (they get reported OK when > using pio). > > The flow that happens is: > 1. dw_mci_interrupt() fires and status=80b8, pending=8088 so that > we hit (pending & DW_MCI_DATA_ERROR_FLAGS). We store 8088 in > data_status and set EVENT_DATA_ERROR in host->pending_events > 2. We schedule the tasklet and it runs. > 3. We're in STATE_SENDING_DATA in the tasklet and see > EVENT_DATA_ERROR so we dw_mci_stop_dma(). > 4. dw_mci_stop_dma() calls dw_mci_idmac_stop_dma() and > dw_mci_dma_cleanup(). These stop dma but _don't_ set > EVENT_XFER_COMPLETE (since we're host->using_dma). > 5. data->stop is NULL so we don't send a stop command. > 6. We move onto STATE_DATA_ERROR and loop again in the tasklet. > 7. We hit STATE_DATA_ERROR but the transfer isn't done, so the tasklet > stops. > > We never seem to get any additional DMA interrupts that cause > EVENT_XFER_COMPLETE and restart the tasklet so we just hang. That > doesn't seem surprising given that we've stopped DMA. > > We did put a print at the end of dw_mci_interrupt() to show the result > of the "mci_readl(host, IDSTS)" and saw 0xa000 in the case of the > above CRC error. > > A proposed fix for this is to ignore (but still clear) the > EVENT_XFER_COMPLETE in STATE_DATA_ERROR in the tasklet. > > Reported-by: Bing Zhao <bzhao@marvell.com> > Signed-off-by: Doug Anderson <dianders@chromium.org> > --- > drivers/mmc/host/dw_mmc.c | 5 +---- > 1 file changed, 1 insertion(+), 4 deletions(-) > > diff --git a/drivers/mmc/host/dw_mmc.c b/drivers/mmc/host/dw_mmc.c > index 9834221..696b3bb 100644 > --- a/drivers/mmc/host/dw_mmc.c > +++ b/drivers/mmc/host/dw_mmc.c > @@ -1137,10 +1137,7 @@ static void dw_mci_tasklet_func(unsigned long priv) > goto unlock; > > case STATE_DATA_ERROR: > - if (!test_and_clear_bit(EVENT_XFER_COMPLETE, > - &host->pending_events)) > - break; > - > + clear_bit(EVENT_XFER_COMPLETE, &host->pending_events); > state = STATE_DATA_BUSY; > break; > } > -- To unsubscribe from this list: send the line "unsubscribe linux-mmc" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Jaehoon, On Mon, Mar 18, 2013 at 3:21 AM, Jaehoon Chung <jh80.chung@samsung.com> wrote: > Hi Doug, > > Great..i have found the problem like this. > I will check your patch..and share the result. Did you have any time to check this patch? Thanks! -Doug -- To unsubscribe from this list: send the line "unsubscribe linux-mmc" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi Doug, You're right..it's something wrong. Actually i didn't test with your patch, but your commit message is reasonable. I will check until next week after test. Best Regards, Jaehoon Chung On 03/27/2013 03:06 AM, Doug Anderson wrote: > Jaehoon, > > On Mon, Mar 18, 2013 at 3:21 AM, Jaehoon Chung <jh80.chung@samsung.com> wrote: >> Hi Doug, >> >> Great..i have found the problem like this. >> I will check your patch..and share the result. > > Did you have any time to check this patch? > > Thanks! > > -Doug > -- To unsubscribe from this list: send the line "unsubscribe linux-mmc" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Looks good to me. Reviewed-by: Jaehoon Chung <jh80.chung@samsung.com> Best Regards, Jaehoon Chung On 04/05/2013 05:18 PM, Jaehoon Chung wrote: > Hi Doug, > > You're right..it's something wrong. > Actually i didn't test with your patch, but your commit message is reasonable. > > I will check until next week after test. > > Best Regards, > Jaehoon Chung > > On 03/27/2013 03:06 AM, Doug Anderson wrote: >> Jaehoon, >> >> On Mon, Mar 18, 2013 at 3:21 AM, Jaehoon Chung <jh80.chung@samsung.com> wrote: >>> Hi Doug, >>> >>> Great..i have found the problem like this. >>> I will check your patch..and share the result. >> >> Did you have any time to check this patch? >> >> Thanks! >> >> -Doug >> > > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ > -- To unsubscribe from this list: send the line "unsubscribe linux-mmc" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Friday, April 05, 2013, Jaehoon Chung wrote: > Hi Doug, > > You're right..it's something wrong. > Actually i didn't test with your patch, but your commit message is reasonable. > > I will check until next week after test. Doug Anderson, Jaehoon Chung, Sorry for late response. Could I explain this problem more? I guess Doug are debugging it with wifi, right? The problem happens when dw_mci_stop_dma is called in the middle of data transfers. If data error occurs in the end of block, EVENT_XFER_COMPLETE might be set. So, it's fine. Actually, dw_mci_idmac_stop_dma stops the dma working, there is no further interrupt for dma completion. There are two solutions we have applied. #1. deferring the call of dw_mci_stop_dma until EVENT_XFER_COMPLETE flag is set into pending_events. In this case, dma transfer will be continued with error. @@ -1062,7 +1062,6 @@ static void dw_mci_tasklet_func(unsigned long priv) case STATE_SENDING_DATA: if (test_and_clear_bit(EVENT_DATA_ERROR, &host->pending_events)) { - dw_mci_stop_dma(host); if (data->stop) send_stop_cmd(host, data); state = STATE_DATA_ERROR; @@ -1155,6 +1154,9 @@ static void dw_mci_tasklet_func(unsigned long priv) &host->pending_events)) break; + dw_mci_stop_dma(host); + set_bit(EVENT_XFER_COMPLETE, &host->completed_events); + state = STATE_DATA_BUSY; break; #2. set EVENT_XFER_COMPLETE flag when dw_mci_stop_dma is called regardless using_dma. @@ -299,10 +299,9 @@ static void dw_mci_stop_dma(struct dw_mci *host) if (host->using_dma) { host->dma_ops->stop(host); host->dma_ops->cleanup(host); - } else { - /* Data transfer was stopped by the interrupt handler */ - set_bit(EVENT_XFER_COMPLETE, &host->pending_events); } + + set_bit(EVENT_XFER_COMPLETE, &host->pending_events); } If you have any opinion, please let me know. Thanks, Seungwon Jeon > > Best Regards, > Jaehoon Chung > > On 03/27/2013 03:06 AM, Doug Anderson wrote: > > Jaehoon, > > > > On Mon, Mar 18, 2013 at 3:21 AM, Jaehoon Chung <jh80.chung@samsung.com> wrote: > >> Hi Doug, > >> > >> Great..i have found the problem like this. > >> I will check your patch..and share the result. > > > > Did you have any time to check this patch? > > > > Thanks! > > > > -Doug > > > > -- > To unsubscribe from this list: send the line "unsubscribe linux-mmc" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-mmc" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Seungwon, On Mon, Apr 8, 2013 at 5:17 AM, Seungwon Jeon <tgih.jun@samsung.com> wrote: > I guess Doug are debugging it with wifi, right? Yes, we're debugging it on the Samsung ARM Chromebook on a part that has an SDIO WiFi module by Marvell. Bing Zhao (CCed) has a unit in hand that generates lots of CRC errors and has been testing patches I've sent him. > The problem happens when dw_mci_stop_dma is called in the middle of data transfers. > If data error occurs in the end of block, EVENT_XFER_COMPLETE might be set. So, it's fine. > Actually, dw_mci_idmac_stop_dma stops the dma working, there is no further interrupt for dma completion. That sounds right to me. > There are two solutions we have applied. I'm a little confused. Have you already applied one or both of the solutions you list below, or are you proposing them as alternates to the patch I submitted? > #1. deferring the call of dw_mci_stop_dma until EVENT_XFER_COMPLETE flag is set into pending_events. > In this case, dma transfer will be continued with error. > > @@ -1062,7 +1062,6 @@ static void dw_mci_tasklet_func(unsigned long priv) > case STATE_SENDING_DATA: > if (test_and_clear_bit(EVENT_DATA_ERROR, > &host->pending_events)) { > - dw_mci_stop_dma(host); > if (data->stop) > send_stop_cmd(host, data); > state = STATE_DATA_ERROR; > @@ -1155,6 +1154,9 @@ static void dw_mci_tasklet_func(unsigned long priv) > &host->pending_events)) > break; > > + dw_mci_stop_dma(host); > + set_bit(EVENT_XFER_COMPLETE, &host->completed_events); > + > state = STATE_DATA_BUSY; > break; I can't say that I'm quite familiar enough with the intricate details of the driver to know whether this is a good idea or guaranteed to work. Do we really think that we'll still get the end of the transfer properly if we've seen an error already? I worry that we won't. > #2. set EVENT_XFER_COMPLETE flag when dw_mci_stop_dma is called regardless using_dma. > > @@ -299,10 +299,9 @@ static void dw_mci_stop_dma(struct dw_mci *host) > if (host->using_dma) { > host->dma_ops->stop(host); > host->dma_ops->cleanup(host); > - } else { > - /* Data transfer was stopped by the interrupt handler */ > - set_bit(EVENT_XFER_COMPLETE, &host->pending_events); > } > + > + set_bit(EVENT_XFER_COMPLETE, &host->pending_events); > } This is fairly similar to my patch but goes further. I believe my patch has this effect but only for the call to dw_mci_stop_dma() in STATE_SENDING_DATA in the tasklet. Your affects all 3 calls to dw_mci_stop_dma(). This seems reasonable but I don't have confidence in my understanding of this driver's state machine (especially with regards to the error conditions) that I can say which is better. If you think that this is a more correct solution than mine then we can give it some testing. Thanks! -Doug -- To unsubscribe from this list: send the line "unsubscribe linux-mmc" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tuesday, April 09, 2013, Doug Anderson wrote: > Seungwon, > > On Mon, Apr 8, 2013 at 5:17 AM, Seungwon Jeon <tgih.jun@samsung.com> wrote: > > I guess Doug are debugging it with wifi, right? > > Yes, we're debugging it on the Samsung ARM Chromebook on a part that > has an SDIO WiFi module by Marvell. Bing Zhao (CCed) has a unit in > hand that generates lots of CRC errors and has been testing patches > I've sent him. > > > > The problem happens when dw_mci_stop_dma is called in the middle of data transfers. > > If data error occurs in the end of block, EVENT_XFER_COMPLETE might be set. So, it's fine. > > Actually, dw_mci_idmac_stop_dma stops the dma working, there is no further interrupt for dma > completion. > > That sounds right to me. > > > > There are two solutions we have applied. > > I'm a little confused. Have you already applied one or both of the > solutions you list below, or are you proposing them as alternates to > the patch I submitted? Yes, first one already has been applied. I wanted to introduce our fix. Did you try to test with these fixes? > > > #1. deferring the call of dw_mci_stop_dma until EVENT_XFER_COMPLETE flag is set into pending_events. > > In this case, dma transfer will be continued with error. > > > > @@ -1062,7 +1062,6 @@ static void dw_mci_tasklet_func(unsigned long priv) > > case STATE_SENDING_DATA: > > if (test_and_clear_bit(EVENT_DATA_ERROR, > > &host->pending_events)) { > > - dw_mci_stop_dma(host); > > if (data->stop) > > send_stop_cmd(host, data); > > state = STATE_DATA_ERROR; > > @@ -1155,6 +1154,9 @@ static void dw_mci_tasklet_func(unsigned long priv) > > &host->pending_events)) > > break; > > > > + dw_mci_stop_dma(host); > > + set_bit(EVENT_XFER_COMPLETE, &host->completed_events); > > + > > state = STATE_DATA_BUSY; > > break; > > I can't say that I'm quite familiar enough with the intricate details > of the driver to know whether this is a good idea or guaranteed to > work. Do we really think that we'll still get the end of the transfer > properly if we've seen an error already? I worry that we won't. For example, let's pretend data CRC error occurs during data read. Peer device doesn't know that error occurrence and data transmission still keeps going. dma will run as long as host doesn't take the stop or see the end of descriptor. > > > > #2. set EVENT_XFER_COMPLETE flag when dw_mci_stop_dma is called regardless using_dma. > > > > @@ -299,10 +299,9 @@ static void dw_mci_stop_dma(struct dw_mci *host) > > if (host->using_dma) { > > host->dma_ops->stop(host); > > host->dma_ops->cleanup(host); > > - } else { > > - /* Data transfer was stopped by the interrupt handler */ > > - set_bit(EVENT_XFER_COMPLETE, &host->pending_events); > > } > > + > > + set_bit(EVENT_XFER_COMPLETE, &host->pending_events); > > } > > This is fairly similar to my patch but goes further. I believe my > patch has this effect but only for the call to dw_mci_stop_dma() in > STATE_SENDING_DATA in the tasklet. Your affects all 3 calls to > dw_mci_stop_dma(). > > This seems reasonable but I don't have confidence in my understanding > of this driver's state machine (especially with regards to the error > conditions) that I can say which is better. If you think that this is > a more correct solution than mine then we can give it some testing. Yes. As a result, both patches prevent tasklet's hanging. In that regard, two patches give the similar effect. But I think your fix are just removing the test_bit to wait for EVENT_XFER_COMPLETE. 'clear_bit(...) part which is added might be of no effect. It doesn't make sense a bit. <quotation> case STATE_DATA_ERROR: - if (!test_and_clear_bit(EVENT_XFER_COMPLETE, - &host->pending_events)) - break; - + clear_bit(EVENT_XFER_COMPLETE, &host->pending_events); </quotation> Thanks, Seugwon Jeon > > Thanks! > > -Doug > -- > To unsubscribe from this list: send the line "unsubscribe linux-mmc" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-mmc" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 04/10/2013 04:02 PM, Seungwon Jeon wrote: > On Tuesday, April 09, 2013, Doug Anderson wrote: >> Seungwon, >> >> On Mon, Apr 8, 2013 at 5:17 AM, Seungwon Jeon <tgih.jun@samsung.com> wrote: >>> I guess Doug are debugging it with wifi, right? >> >> Yes, we're debugging it on the Samsung ARM Chromebook on a part that >> has an SDIO WiFi module by Marvell. Bing Zhao (CCed) has a unit in >> hand that generates lots of CRC errors and has been testing patches >> I've sent him. >> >> >>> The problem happens when dw_mci_stop_dma is called in the middle of data transfers. >>> If data error occurs in the end of block, EVENT_XFER_COMPLETE might be set. So, it's fine. >>> Actually, dw_mci_idmac_stop_dma stops the dma working, there is no further interrupt for dma >> completion. >> >> That sounds right to me. >> >> >>> There are two solutions we have applied. >> >> I'm a little confused. Have you already applied one or both of the >> solutions you list below, or are you proposing them as alternates to >> the patch I submitted? > Yes, first one already has been applied. > I wanted to introduce our fix. Did you try to test with these fixes? Actually i have tested with Seungwon's fixes. It looks good. > >> >>> #1. deferring the call of dw_mci_stop_dma until EVENT_XFER_COMPLETE flag is set into pending_events. >>> In this case, dma transfer will be continued with error. >>> >>> @@ -1062,7 +1062,6 @@ static void dw_mci_tasklet_func(unsigned long priv) >>> case STATE_SENDING_DATA: >>> if (test_and_clear_bit(EVENT_DATA_ERROR, >>> &host->pending_events)) { >>> - dw_mci_stop_dma(host); >>> if (data->stop) >>> send_stop_cmd(host, data); >>> state = STATE_DATA_ERROR; >>> @@ -1155,6 +1154,9 @@ static void dw_mci_tasklet_func(unsigned long priv) >>> &host->pending_events)) >>> break; >>> >>> + dw_mci_stop_dma(host); >>> + set_bit(EVENT_XFER_COMPLETE, &host->completed_events); >>> + >>> state = STATE_DATA_BUSY; >>> break; >> >> I can't say that I'm quite familiar enough with the intricate details >> of the driver to know whether this is a good idea or guaranteed to >> work. Do we really think that we'll still get the end of the transfer >> properly if we've seen an error already? I worry that we won't. > For example, let's pretend data CRC error occurs during data read. > Peer device doesn't know that error occurrence and data transmission still keeps going. > dma will run as long as host doesn't take the stop or see the end of descriptor. >> >> >>> #2. set EVENT_XFER_COMPLETE flag when dw_mci_stop_dma is called regardless using_dma. >>> >>> @@ -299,10 +299,9 @@ static void dw_mci_stop_dma(struct dw_mci *host) >>> if (host->using_dma) { >>> host->dma_ops->stop(host); >>> host->dma_ops->cleanup(host); >>> - } else { >>> - /* Data transfer was stopped by the interrupt handler */ >>> - set_bit(EVENT_XFER_COMPLETE, &host->pending_events); >>> } >>> + >>> + set_bit(EVENT_XFER_COMPLETE, &host->pending_events); >>> } >> >> This is fairly similar to my patch but goes further. I believe my >> patch has this effect but only for the call to dw_mci_stop_dma() in >> STATE_SENDING_DATA in the tasklet. Your affects all 3 calls to >> dw_mci_stop_dma(). I think we can also use the second approach. but i think that it also needs to test with this. >> >> This seems reasonable but I don't have confidence in my understanding >> of this driver's state machine (especially with regards to the error >> conditions) that I can say which is better. If you think that this is >> a more correct solution than mine then we can give it some testing. > Yes. As a result, both patches prevent tasklet's hanging. > In that regard, two patches give the similar effect. > But I think your fix are just removing the test_bit to wait for EVENT_XFER_COMPLETE. > 'clear_bit(...) part which is added might be of no effect. > It doesn't make sense a bit. > > <quotation> > case STATE_DATA_ERROR: > - if (!test_and_clear_bit(EVENT_XFER_COMPLETE, > - &host->pending_events)) > - break; > - > + clear_bit(EVENT_XFER_COMPLETE, &host->pending_events); > </quotation> > > Thanks, > Seugwon Jeon >> >> Thanks! >> >> -Doug >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-mmc" in >> the body of a message to majordomo@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html > > -- > To unsubscribe from this list: send the line "unsubscribe linux-mmc" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- To unsubscribe from this list: send the line "unsubscribe linux-mmc" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Seungwon, On Wed, Apr 10, 2013 at 12:02 AM, Seungwon Jeon <tgih.jun@samsung.com> wrote: >> > There are two solutions we have applied. >> >> I'm a little confused. Have you already applied one or both of the >> solutions you list below, or are you proposing them as alternates to >> the patch I submitted? > Yes, first one already has been applied. > I wanted to introduce our fix. Did you try to test with these fixes? I'm coming back to this after being quite distracted for a while. I'm a little confused in that you said that your first fix was already applied. I don't see it anywhere. Did you mean that you've already applied it locally, or that it's applied in some git tree somewhere? If so, can you point me to it? If this hasn't been sent out anywhere, perhaps you could send out official patches? I don't have the failing unit myself, so we'll have to get Bing to try the patches. You are suggesting that we try applying both of your patches, right? -Doug -- To unsubscribe from this list: send the line "unsubscribe linux-mmc" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thursday, June 13, 2013, Doug Anderson wrote: > Seungwon, > > On Wed, Apr 10, 2013 at 12:02 AM, Seungwon Jeon <tgih.jun@samsung.com> wrote: > >> > There are two solutions we have applied. > >> > >> I'm a little confused. Have you already applied one or both of the > >> solutions you list below, or are you proposing them as alternates to > >> the patch I submitted? > > Yes, first one already has been applied. > > I wanted to introduce our fix. Did you try to test with these fixes? > > I'm coming back to this after being quite distracted for a while. > > I'm a little confused in that you said that your first fix was already > applied. I don't see it anywhere. Did you mean that you've already > applied it locally, or that it's applied in some git tree somewhere? > If so, can you point me to it? > > If this hasn't been sent out anywhere, perhaps you could send out > official patches? Currently, it has just applied for some projects, not official patch. > > I don't have the failing unit myself, so we'll have to get Bing to try > the patches. You are suggesting that we try applying both of your > patches, right? Did you test the patch? I wonder that both are good for your side. Thanks, Seungwon Jeon > > -Doug -- To unsubscribe from this list: send the line "unsubscribe linux-mmc" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi Seungwon, > > I don't have the failing unit myself, so we'll have to get Bing to try > > the patches. You are suggesting that we try applying both of your > > patches, right? > Did you test the patch? > I wonder that both are good for your side. I tested Doug's patch on my failing unit. Without this patch, mmc got hung_task and rebooted eventually. With this patch applied, mmc returns CRC error instead of hung_task, and the error is handled gracefully. Thanks, Bing > > Thanks, > Seungwon Jeon -- To unsubscribe from this list: send the line "unsubscribe linux-mmc" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Bing, On Tue, Jun 18, 2013 at 12:46 PM, Bing Zhao <bzhao@marvell.com> wrote: > Hi Seungwon, > >> > I don't have the failing unit myself, so we'll have to get Bing to try >> > the patches. You are suggesting that we try applying both of your >> > patches, right? >> Did you test the patch? >> I wonder that both are good for your side. > > I tested Doug's patch on my failing unit. > > Without this patch, mmc got hung_task and rebooted eventually. > With this patch applied, mmc returns CRC error instead of hung_task, and the error is handled gracefully. Have you tried one or both or Seungwon's fixes without mine? -Doug -- To unsubscribe from this list: send the line "unsubscribe linux-mmc" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi Doug, > > I tested Doug's patch on my failing unit. > > > > Without this patch, mmc got hung_task and rebooted eventually. > > With this patch applied, mmc returns CRC error instead of hung_task, and the error is handled > gracefully. > > Have you tried one or both or Seungwon's fixes without mine? I only tested your original patch. That was several months ago I think. I still have that unit. Let me know if you want me to try Seungwon's patches. Thanks, Bing > > -Doug -- To unsubscribe from this list: send the line "unsubscribe linux-mmc" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi All, I tested with Seungwon's patch and this patch. I'm agreed with Seungwon's opinion. Best Regards, Jaehoon Chung On 06/18/2013 09:36 PM, Seungwon Jeon wrote: > On Thursday, June 13, 2013, Doug Anderson wrote: >> Seungwon, >> >> On Wed, Apr 10, 2013 at 12:02 AM, Seungwon Jeon <tgih.jun@samsung.com> wrote: >>>>> There are two solutions we have applied. >>>> >>>> I'm a little confused. Have you already applied one or both of the >>>> solutions you list below, or are you proposing them as alternates to >>>> the patch I submitted? >>> Yes, first one already has been applied. >>> I wanted to introduce our fix. Did you try to test with these fixes? >> >> I'm coming back to this after being quite distracted for a while. >> >> I'm a little confused in that you said that your first fix was already >> applied. I don't see it anywhere. Did you mean that you've already >> applied it locally, or that it's applied in some git tree somewhere? >> If so, can you point me to it? >> >> If this hasn't been sent out anywhere, perhaps you could send out >> official patches? > Currently, it has just applied for some projects, not official patch. >> >> I don't have the failing unit myself, so we'll have to get Bing to try >> the patches. You are suggesting that we try applying both of your >> patches, right? > Did you test the patch? > I wonder that both are good for your side. > > Thanks, > Seungwon Jeon >> >> -Doug > > -- > To unsubscribe from this list: send the line "unsubscribe linux-mmc" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- To unsubscribe from this list: send the line "unsubscribe linux-mmc" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Bing, On Tue, Jun 18, 2013 at 1:01 PM, Bing Zhao <bzhao@marvell.com> wrote: > Hi Doug, > >> > I tested Doug's patch on my failing unit. >> > >> > Without this patch, mmc got hung_task and rebooted eventually. >> > With this patch applied, mmc returns CRC error instead of hung_task, and the error is handled >> gracefully. >> >> Have you tried one or both or Seungwon's fixes without mine? > > I only tested your original patch. That was several months ago I think. > > I still have that unit. Let me know if you want me to try Seungwon's patches. I think the proposal on the table is to take Seungwon's patches instead of mine. Assuming they solve your problems, I'm OK with that. I think he was requesting testing the first of his two patches alone and then both of his two patches together. Thanks! -Doug -- To unsubscribe from this list: send the line "unsubscribe linux-mmc" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> I think the proposal on the table is to take Seungwon's patches > instead of mine. Assuming they solve your problems, I'm OK with that. > I think he was requesting testing the first of his two patches alone > and then both of his two patches together. Test #1: Swungwon's patch #1 alone [1] Test #2: Swungwon's patch #2 alone [1] Test #3: Swungwon's patch #1 and #2 [1] Test #4: Doug's original patch [2] Test #1 and #3: it doesn't work; system reboots due to kernel hung_task Test #2 and #4: it works; instead of hung_task driver gets CRC error (which is expected) Thanks, Bing [1] https://lkml.org/lkml/2013/4/8/316 [2] https://lkml.org/lkml/2013/3/15/583 -- To unsubscribe from this list: send the line "unsubscribe linux-mmc" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi Jaehoon, Do you have the same result? Could you share the result? Thanks, Seungwon Jeon On Tuesday, June 25 2013, Bing Zhao wrote: > > I think the proposal on the table is to take Seungwon's patches > > instead of mine. Assuming they solve your problems, I'm OK with that. > > I think he was requesting testing the first of his two patches alone > > and then both of his two patches together. > > Test #1: Swungwon's patch #1 alone [1] > Test #2: Swungwon's patch #2 alone [1] > Test #3: Swungwon's patch #1 and #2 [1] > Test #4: Doug's original patch [2] > > Test #1 and #3: it doesn't work; system reboots due to kernel hung_task > Test #2 and #4: it works; instead of hung_task driver gets CRC error (which is expected) > > Thanks, > Bing > > [1] https://lkml.org/lkml/2013/4/8/316 > [2] https://lkml.org/lkml/2013/3/15/583 > > -- > To unsubscribe from this list: send the line "unsubscribe linux-mmc" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-mmc" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi Seungwon, I didn't get the same result..In my case, it's working fine. But as Bing's result, i will check more and share the result. Best Regards, Jaehoon Chung On 06/26/2013 10:53 AM, Seungwon Jeon wrote: > Hi Jaehoon, > Do you have the same result? > Could you share the result? > > Thanks, > Seungwon Jeon > On Tuesday, June 25 2013, Bing Zhao wrote: >>> I think the proposal on the table is to take Seungwon's patches >>> instead of mine. Assuming they solve your problems, I'm OK with that. >>> I think he was requesting testing the first of his two patches alone >>> and then both of his two patches together. >> >> Test #1: Swungwon's patch #1 alone [1] >> Test #2: Swungwon's patch #2 alone [1] >> Test #3: Swungwon's patch #1 and #2 [1] >> Test #4: Doug's original patch [2] >> >> Test #1 and #3: it doesn't work; system reboots due to kernel hung_task >> Test #2 and #4: it works; instead of hung_task driver gets CRC error (which is expected) >> >> Thanks, >> Bing >> >> [1] https://lkml.org/lkml/2013/4/8/316 >> [2] https://lkml.org/lkml/2013/3/15/583 >> >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-mmc" in >> the body of a message to majordomo@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html > > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ > -- To unsubscribe from this list: send the line "unsubscribe linux-mmc" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> I didn't get the same result..In my case, it's working fine. > But as Bing's result, i will check more and share the result. I want to add that my kernel is from "cros/release-R26-3701.B" branch with this HEAD: 4e13a9c CHERRY-PICK: UPSTREAM: loop: prevent bdev freeing while device in use Thanks, Bing -- To unsubscribe from this list: send the line "unsubscribe linux-mmc" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/drivers/mmc/host/dw_mmc.c b/drivers/mmc/host/dw_mmc.c index 9834221..696b3bb 100644 --- a/drivers/mmc/host/dw_mmc.c +++ b/drivers/mmc/host/dw_mmc.c @@ -1137,10 +1137,7 @@ static void dw_mci_tasklet_func(unsigned long priv) goto unlock; case STATE_DATA_ERROR: - if (!test_and_clear_bit(EVENT_XFER_COMPLETE, - &host->pending_events)) - break; - + clear_bit(EVENT_XFER_COMPLETE, &host->pending_events); state = STATE_DATA_BUSY; break; }
On a flaky piece of hardware that seems good at generating CRC errors, we have found that often times the CRC errors don't get reported properly when using CONFIG_MMC_DW_IDMAC (they get reported OK when using pio). The flow that happens is: 1. dw_mci_interrupt() fires and status=80b8, pending=8088 so that we hit (pending & DW_MCI_DATA_ERROR_FLAGS). We store 8088 in data_status and set EVENT_DATA_ERROR in host->pending_events 2. We schedule the tasklet and it runs. 3. We're in STATE_SENDING_DATA in the tasklet and see EVENT_DATA_ERROR so we dw_mci_stop_dma(). 4. dw_mci_stop_dma() calls dw_mci_idmac_stop_dma() and dw_mci_dma_cleanup(). These stop dma but _don't_ set EVENT_XFER_COMPLETE (since we're host->using_dma). 5. data->stop is NULL so we don't send a stop command. 6. We move onto STATE_DATA_ERROR and loop again in the tasklet. 7. We hit STATE_DATA_ERROR but the transfer isn't done, so the tasklet stops. We never seem to get any additional DMA interrupts that cause EVENT_XFER_COMPLETE and restart the tasklet so we just hang. That doesn't seem surprising given that we've stopped DMA. We did put a print at the end of dw_mci_interrupt() to show the result of the "mci_readl(host, IDSTS)" and saw 0xa000 in the case of the above CRC error. A proposed fix for this is to ignore (but still clear) the EVENT_XFER_COMPLETE in STATE_DATA_ERROR in the tasklet. Reported-by: Bing Zhao <bzhao@marvell.com> Signed-off-by: Doug Anderson <dianders@chromium.org> --- drivers/mmc/host/dw_mmc.c | 5 +---- 1 file changed, 1 insertion(+), 4 deletions(-)