diff mbox

RFC: mmc: dw_mmc: Always go to STATE_DATA_BUSY from STATE_DATA_ERROR

Message ID 1363382956-14557-1-git-send-email-dianders@chromium.org (mailing list archive)
State New, archived
Headers show

Commit Message

Doug Anderson March 15, 2013, 9:29 p.m. UTC
On a flaky piece of hardware that seems good at generating CRC errors,
we have found that often times the CRC errors don't get reported
properly when using CONFIG_MMC_DW_IDMAC (they get reported OK when
using pio).

The flow that happens is:
1. dw_mci_interrupt() fires and status=80b8, pending=8088 so that
   we hit (pending & DW_MCI_DATA_ERROR_FLAGS).  We store 8088 in
   data_status and set EVENT_DATA_ERROR in host->pending_events
2. We schedule the tasklet and it runs.
3. We're in STATE_SENDING_DATA in the tasklet and see
   EVENT_DATA_ERROR so we dw_mci_stop_dma().
4. dw_mci_stop_dma() calls dw_mci_idmac_stop_dma() and
   dw_mci_dma_cleanup().  These stop dma but _don't_ set
   EVENT_XFER_COMPLETE (since we're host->using_dma).
5. data->stop is NULL so we don't send a stop command.
6. We move onto STATE_DATA_ERROR and loop again in the tasklet.
7. We hit STATE_DATA_ERROR but the transfer isn't done, so the tasklet
   stops.

We never seem to get any additional DMA interrupts that cause
EVENT_XFER_COMPLETE and restart the tasklet so we just hang.  That
doesn't seem surprising given that we've stopped DMA.

We did put a print at the end of dw_mci_interrupt() to show the result
of the "mci_readl(host, IDSTS)" and saw 0xa000 in the case of the
above CRC error.

A proposed fix for this is to ignore (but still clear) the
EVENT_XFER_COMPLETE in STATE_DATA_ERROR in the tasklet.

Reported-by: Bing Zhao <bzhao@marvell.com>
Signed-off-by: Doug Anderson <dianders@chromium.org>
---
 drivers/mmc/host/dw_mmc.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

Comments

Jaehoon Chung March 18, 2013, 10:21 a.m. UTC | #1
Hi Doug,

Great..i have found the problem like this.
I will check your patch..and share the result.

Best Regards,
Jaehoon Chung

On 03/16/2013 06:29 AM, Doug Anderson wrote:
> On a flaky piece of hardware that seems good at generating CRC errors,
> we have found that often times the CRC errors don't get reported
> properly when using CONFIG_MMC_DW_IDMAC (they get reported OK when
> using pio).
> 
> The flow that happens is:
> 1. dw_mci_interrupt() fires and status=80b8, pending=8088 so that
>    we hit (pending & DW_MCI_DATA_ERROR_FLAGS).  We store 8088 in
>    data_status and set EVENT_DATA_ERROR in host->pending_events
> 2. We schedule the tasklet and it runs.
> 3. We're in STATE_SENDING_DATA in the tasklet and see
>    EVENT_DATA_ERROR so we dw_mci_stop_dma().
> 4. dw_mci_stop_dma() calls dw_mci_idmac_stop_dma() and
>    dw_mci_dma_cleanup().  These stop dma but _don't_ set
>    EVENT_XFER_COMPLETE (since we're host->using_dma).
> 5. data->stop is NULL so we don't send a stop command.
> 6. We move onto STATE_DATA_ERROR and loop again in the tasklet.
> 7. We hit STATE_DATA_ERROR but the transfer isn't done, so the tasklet
>    stops.
> 
> We never seem to get any additional DMA interrupts that cause
> EVENT_XFER_COMPLETE and restart the tasklet so we just hang.  That
> doesn't seem surprising given that we've stopped DMA.
> 
> We did put a print at the end of dw_mci_interrupt() to show the result
> of the "mci_readl(host, IDSTS)" and saw 0xa000 in the case of the
> above CRC error.
> 
> A proposed fix for this is to ignore (but still clear) the
> EVENT_XFER_COMPLETE in STATE_DATA_ERROR in the tasklet.
> 
> Reported-by: Bing Zhao <bzhao@marvell.com>
> Signed-off-by: Doug Anderson <dianders@chromium.org>
> ---
>  drivers/mmc/host/dw_mmc.c | 5 +----
>  1 file changed, 1 insertion(+), 4 deletions(-)
> 
> diff --git a/drivers/mmc/host/dw_mmc.c b/drivers/mmc/host/dw_mmc.c
> index 9834221..696b3bb 100644
> --- a/drivers/mmc/host/dw_mmc.c
> +++ b/drivers/mmc/host/dw_mmc.c
> @@ -1137,10 +1137,7 @@ static void dw_mci_tasklet_func(unsigned long priv)
>  			goto unlock;
>  
>  		case STATE_DATA_ERROR:
> -			if (!test_and_clear_bit(EVENT_XFER_COMPLETE,
> -						&host->pending_events))
> -				break;
> -
> +			clear_bit(EVENT_XFER_COMPLETE, &host->pending_events);
>  			state = STATE_DATA_BUSY;
>  			break;
>  		}
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-mmc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Doug Anderson March 26, 2013, 6:06 p.m. UTC | #2
Jaehoon,

On Mon, Mar 18, 2013 at 3:21 AM, Jaehoon Chung <jh80.chung@samsung.com> wrote:
> Hi Doug,
>
> Great..i have found the problem like this.
> I will check your patch..and share the result.

Did you have any time to check this patch?

Thanks!

-Doug
--
To unsubscribe from this list: send the line "unsubscribe linux-mmc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jaehoon Chung April 5, 2013, 8:18 a.m. UTC | #3
Hi Doug,

You're right..it's something wrong.
Actually i didn't test with your patch, but your commit message is reasonable.

I will check until next week after test.

Best Regards,
Jaehoon Chung

On 03/27/2013 03:06 AM, Doug Anderson wrote:
> Jaehoon,
> 
> On Mon, Mar 18, 2013 at 3:21 AM, Jaehoon Chung <jh80.chung@samsung.com> wrote:
>> Hi Doug,
>>
>> Great..i have found the problem like this.
>> I will check your patch..and share the result.
> 
> Did you have any time to check this patch?
> 
> Thanks!
> 
> -Doug
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-mmc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jaehoon Chung April 8, 2013, 5:10 a.m. UTC | #4
Looks good to me.

Reviewed-by: Jaehoon Chung <jh80.chung@samsung.com>

Best Regards,
Jaehoon Chung

On 04/05/2013 05:18 PM, Jaehoon Chung wrote:
> Hi Doug,
> 
> You're right..it's something wrong.
> Actually i didn't test with your patch, but your commit message is reasonable.
> 
> I will check until next week after test.
> 
> Best Regards,
> Jaehoon Chung
> 
> On 03/27/2013 03:06 AM, Doug Anderson wrote:
>> Jaehoon,
>>
>> On Mon, Mar 18, 2013 at 3:21 AM, Jaehoon Chung <jh80.chung@samsung.com> wrote:
>>> Hi Doug,
>>>
>>> Great..i have found the problem like this.
>>> I will check your patch..and share the result.
>>
>> Did you have any time to check this patch?
>>
>> Thanks!
>>
>> -Doug
>>
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-mmc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Seungwon Jeon April 8, 2013, 12:17 p.m. UTC | #5
On Friday, April 05, 2013, Jaehoon Chung wrote:
> Hi Doug,
> 
> You're right..it's something wrong.
> Actually i didn't test with your patch, but your commit message is reasonable.
> 
> I will check until next week after test.
Doug Anderson, Jaehoon Chung,

Sorry for late response.
Could I explain this problem more?
I guess Doug are debugging it with wifi, right?
The problem happens when dw_mci_stop_dma is called in the middle of data transfers.
If data error occurs in the end of block, EVENT_XFER_COMPLETE might be set. So, it's fine.
Actually, dw_mci_idmac_stop_dma stops the dma working, there is no further interrupt for dma completion.
There are two solutions we have applied.

#1. deferring the call of dw_mci_stop_dma until EVENT_XFER_COMPLETE flag is set into pending_events.
In this case, dma transfer will be continued with error.

@@ -1062,7 +1062,6 @@ static void dw_mci_tasklet_func(unsigned long priv)
                case STATE_SENDING_DATA:
                        if (test_and_clear_bit(EVENT_DATA_ERROR,
                                               &host->pending_events)) {
-                               dw_mci_stop_dma(host);
                                if (data->stop)
                                        send_stop_cmd(host, data);
                                state = STATE_DATA_ERROR;
@@ -1155,6 +1154,9 @@ static void dw_mci_tasklet_func(unsigned long priv)
                                                &host->pending_events))
                                break;

+                       dw_mci_stop_dma(host);
+                       set_bit(EVENT_XFER_COMPLETE, &host->completed_events);
+
                        state = STATE_DATA_BUSY;
                        break;
                        
#2. set EVENT_XFER_COMPLETE flag when dw_mci_stop_dma is called regardless using_dma.

@@ -299,10 +299,9 @@ static void dw_mci_stop_dma(struct dw_mci *host)
        if (host->using_dma) {
                host->dma_ops->stop(host);
                host->dma_ops->cleanup(host);
-       } else {
-               /* Data transfer was stopped by the interrupt handler */
-               set_bit(EVENT_XFER_COMPLETE, &host->pending_events);
        }
+
+       set_bit(EVENT_XFER_COMPLETE, &host->pending_events);
 }

If you have any opinion, please let me know.

Thanks,
Seungwon Jeon 
> 
> Best Regards,
> Jaehoon Chung
> 
> On 03/27/2013 03:06 AM, Doug Anderson wrote:
> > Jaehoon,
> >
> > On Mon, Mar 18, 2013 at 3:21 AM, Jaehoon Chung <jh80.chung@samsung.com> wrote:
> >> Hi Doug,
> >>
> >> Great..i have found the problem like this.
> >> I will check your patch..and share the result.
> >
> > Did you have any time to check this patch?
> >
> > Thanks!
> >
> > -Doug
> >
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-mmc" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line "unsubscribe linux-mmc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Doug Anderson April 8, 2013, 11:09 p.m. UTC | #6
Seungwon,

On Mon, Apr 8, 2013 at 5:17 AM, Seungwon Jeon <tgih.jun@samsung.com> wrote:
> I guess Doug are debugging it with wifi, right?

Yes, we're debugging it on the Samsung ARM Chromebook on a part that
has an SDIO WiFi module by Marvell.  Bing Zhao (CCed) has a unit in
hand that generates lots of CRC errors and has been testing patches
I've sent him.


> The problem happens when dw_mci_stop_dma is called in the middle of data transfers.
> If data error occurs in the end of block, EVENT_XFER_COMPLETE might be set. So, it's fine.
> Actually, dw_mci_idmac_stop_dma stops the dma working, there is no further interrupt for dma completion.

That sounds right to me.


> There are two solutions we have applied.

I'm a little confused.  Have you already applied one or both of the
solutions you list below, or are you proposing them as alternates to
the patch I submitted?

> #1. deferring the call of dw_mci_stop_dma until EVENT_XFER_COMPLETE flag is set into pending_events.
> In this case, dma transfer will be continued with error.
>
> @@ -1062,7 +1062,6 @@ static void dw_mci_tasklet_func(unsigned long priv)
>                 case STATE_SENDING_DATA:
>                         if (test_and_clear_bit(EVENT_DATA_ERROR,
>                                                &host->pending_events)) {
> -                               dw_mci_stop_dma(host);
>                                 if (data->stop)
>                                         send_stop_cmd(host, data);
>                                 state = STATE_DATA_ERROR;
> @@ -1155,6 +1154,9 @@ static void dw_mci_tasklet_func(unsigned long priv)
>                                                 &host->pending_events))
>                                 break;
>
> +                       dw_mci_stop_dma(host);
> +                       set_bit(EVENT_XFER_COMPLETE, &host->completed_events);
> +
>                         state = STATE_DATA_BUSY;
>                         break;

I can't say that I'm quite familiar enough with the intricate details
of the driver to know whether this is a good idea or guaranteed to
work.  Do we really think that we'll still get the end of the transfer
properly if we've seen an error already?  I worry that we won't.


> #2. set EVENT_XFER_COMPLETE flag when dw_mci_stop_dma is called regardless using_dma.
>
> @@ -299,10 +299,9 @@ static void dw_mci_stop_dma(struct dw_mci *host)
>         if (host->using_dma) {
>                 host->dma_ops->stop(host);
>                 host->dma_ops->cleanup(host);
> -       } else {
> -               /* Data transfer was stopped by the interrupt handler */
> -               set_bit(EVENT_XFER_COMPLETE, &host->pending_events);
>         }
> +
> +       set_bit(EVENT_XFER_COMPLETE, &host->pending_events);
>  }

This is fairly similar to my patch but goes further.  I believe my
patch has this effect but only for the call to dw_mci_stop_dma() in
STATE_SENDING_DATA in the tasklet.  Your affects all 3 calls to
dw_mci_stop_dma().

This seems reasonable but I don't have confidence in my understanding
of this driver's state machine (especially with regards to the error
conditions) that I can say which is better.  If you think that this is
a more correct solution than mine then we can give it some testing.

Thanks!

-Doug
--
To unsubscribe from this list: send the line "unsubscribe linux-mmc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Seungwon Jeon April 10, 2013, 7:02 a.m. UTC | #7
On Tuesday, April 09, 2013, Doug Anderson wrote:
> Seungwon,
> 
> On Mon, Apr 8, 2013 at 5:17 AM, Seungwon Jeon <tgih.jun@samsung.com> wrote:
> > I guess Doug are debugging it with wifi, right?
> 
> Yes, we're debugging it on the Samsung ARM Chromebook on a part that
> has an SDIO WiFi module by Marvell.  Bing Zhao (CCed) has a unit in
> hand that generates lots of CRC errors and has been testing patches
> I've sent him.
> 
> 
> > The problem happens when dw_mci_stop_dma is called in the middle of data transfers.
> > If data error occurs in the end of block, EVENT_XFER_COMPLETE might be set. So, it's fine.
> > Actually, dw_mci_idmac_stop_dma stops the dma working, there is no further interrupt for dma
> completion.
> 
> That sounds right to me.
> 
> 
> > There are two solutions we have applied.
> 
> I'm a little confused.  Have you already applied one or both of the
> solutions you list below, or are you proposing them as alternates to
> the patch I submitted?
Yes, first one already has been applied.
I wanted to introduce our fix. Did you try to test with these fixes?

> 
> > #1. deferring the call of dw_mci_stop_dma until EVENT_XFER_COMPLETE flag is set into pending_events.
> > In this case, dma transfer will be continued with error.
> >
> > @@ -1062,7 +1062,6 @@ static void dw_mci_tasklet_func(unsigned long priv)
> >                 case STATE_SENDING_DATA:
> >                         if (test_and_clear_bit(EVENT_DATA_ERROR,
> >                                                &host->pending_events)) {
> > -                               dw_mci_stop_dma(host);
> >                                 if (data->stop)
> >                                         send_stop_cmd(host, data);
> >                                 state = STATE_DATA_ERROR;
> > @@ -1155,6 +1154,9 @@ static void dw_mci_tasklet_func(unsigned long priv)
> >                                                 &host->pending_events))
> >                                 break;
> >
> > +                       dw_mci_stop_dma(host);
> > +                       set_bit(EVENT_XFER_COMPLETE, &host->completed_events);
> > +
> >                         state = STATE_DATA_BUSY;
> >                         break;
> 
> I can't say that I'm quite familiar enough with the intricate details
> of the driver to know whether this is a good idea or guaranteed to
> work.  Do we really think that we'll still get the end of the transfer
> properly if we've seen an error already?  I worry that we won't.
For example, let's pretend data CRC error occurs during data read.
Peer device doesn't know that error occurrence and data transmission still keeps going.
dma will run as long as host doesn't take the stop or see the end of descriptor.
> 
> 
> > #2. set EVENT_XFER_COMPLETE flag when dw_mci_stop_dma is called regardless using_dma.
> >
> > @@ -299,10 +299,9 @@ static void dw_mci_stop_dma(struct dw_mci *host)
> >         if (host->using_dma) {
> >                 host->dma_ops->stop(host);
> >                 host->dma_ops->cleanup(host);
> > -       } else {
> > -               /* Data transfer was stopped by the interrupt handler */
> > -               set_bit(EVENT_XFER_COMPLETE, &host->pending_events);
> >         }
> > +
> > +       set_bit(EVENT_XFER_COMPLETE, &host->pending_events);
> >  }
> 
> This is fairly similar to my patch but goes further.  I believe my
> patch has this effect but only for the call to dw_mci_stop_dma() in
> STATE_SENDING_DATA in the tasklet.  Your affects all 3 calls to
> dw_mci_stop_dma().
> 
> This seems reasonable but I don't have confidence in my understanding
> of this driver's state machine (especially with regards to the error
> conditions) that I can say which is better.  If you think that this is
> a more correct solution than mine then we can give it some testing.
Yes. As a result, both patches prevent tasklet's hanging.
In that regard, two patches give the similar effect.
But I think your fix are just removing the test_bit to wait for EVENT_XFER_COMPLETE.
'clear_bit(...) part which is added might be of no effect.
It doesn't make sense a bit.

<quotation>
 		case STATE_DATA_ERROR:
-			if (!test_and_clear_bit(EVENT_XFER_COMPLETE,
-						&host->pending_events))
-				break;
-
+			clear_bit(EVENT_XFER_COMPLETE, &host->pending_events);
</quotation>

Thanks,
Seugwon Jeon
> 
> Thanks!
> 
> -Doug
> --
> To unsubscribe from this list: send the line "unsubscribe linux-mmc" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line "unsubscribe linux-mmc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jaehoon Chung April 10, 2013, 8:51 a.m. UTC | #8
On 04/10/2013 04:02 PM, Seungwon Jeon wrote:
> On Tuesday, April 09, 2013, Doug Anderson wrote:
>> Seungwon,
>>
>> On Mon, Apr 8, 2013 at 5:17 AM, Seungwon Jeon <tgih.jun@samsung.com> wrote:
>>> I guess Doug are debugging it with wifi, right?
>>
>> Yes, we're debugging it on the Samsung ARM Chromebook on a part that
>> has an SDIO WiFi module by Marvell.  Bing Zhao (CCed) has a unit in
>> hand that generates lots of CRC errors and has been testing patches
>> I've sent him.
>>
>>
>>> The problem happens when dw_mci_stop_dma is called in the middle of data transfers.
>>> If data error occurs in the end of block, EVENT_XFER_COMPLETE might be set. So, it's fine.
>>> Actually, dw_mci_idmac_stop_dma stops the dma working, there is no further interrupt for dma
>> completion.
>>
>> That sounds right to me.
>>
>>
>>> There are two solutions we have applied.
>>
>> I'm a little confused.  Have you already applied one or both of the
>> solutions you list below, or are you proposing them as alternates to
>> the patch I submitted?
> Yes, first one already has been applied.
> I wanted to introduce our fix. Did you try to test with these fixes?
Actually i have tested with Seungwon's fixes. It looks good.

> 
>>
>>> #1. deferring the call of dw_mci_stop_dma until EVENT_XFER_COMPLETE flag is set into pending_events.
>>> In this case, dma transfer will be continued with error.
>>>
>>> @@ -1062,7 +1062,6 @@ static void dw_mci_tasklet_func(unsigned long priv)
>>>                 case STATE_SENDING_DATA:
>>>                         if (test_and_clear_bit(EVENT_DATA_ERROR,
>>>                                                &host->pending_events)) {
>>> -                               dw_mci_stop_dma(host);
>>>                                 if (data->stop)
>>>                                         send_stop_cmd(host, data);
>>>                                 state = STATE_DATA_ERROR;
>>> @@ -1155,6 +1154,9 @@ static void dw_mci_tasklet_func(unsigned long priv)
>>>                                                 &host->pending_events))
>>>                                 break;
>>>
>>> +                       dw_mci_stop_dma(host);
>>> +                       set_bit(EVENT_XFER_COMPLETE, &host->completed_events);
>>> +
>>>                         state = STATE_DATA_BUSY;
>>>                         break;
>>
>> I can't say that I'm quite familiar enough with the intricate details
>> of the driver to know whether this is a good idea or guaranteed to
>> work.  Do we really think that we'll still get the end of the transfer
>> properly if we've seen an error already?  I worry that we won't.
> For example, let's pretend data CRC error occurs during data read.
> Peer device doesn't know that error occurrence and data transmission still keeps going.
> dma will run as long as host doesn't take the stop or see the end of descriptor.
>>
>>
>>> #2. set EVENT_XFER_COMPLETE flag when dw_mci_stop_dma is called regardless using_dma.
>>>
>>> @@ -299,10 +299,9 @@ static void dw_mci_stop_dma(struct dw_mci *host)
>>>         if (host->using_dma) {
>>>                 host->dma_ops->stop(host);
>>>                 host->dma_ops->cleanup(host);
>>> -       } else {
>>> -               /* Data transfer was stopped by the interrupt handler */
>>> -               set_bit(EVENT_XFER_COMPLETE, &host->pending_events);
>>>         }
>>> +
>>> +       set_bit(EVENT_XFER_COMPLETE, &host->pending_events);
>>>  }
>>
>> This is fairly similar to my patch but goes further.  I believe my
>> patch has this effect but only for the call to dw_mci_stop_dma() in
>> STATE_SENDING_DATA in the tasklet.  Your affects all 3 calls to
>> dw_mci_stop_dma().
I think we can also use the second approach.
but i think that it also needs to test with this.
>>
>> This seems reasonable but I don't have confidence in my understanding
>> of this driver's state machine (especially with regards to the error
>> conditions) that I can say which is better.  If you think that this is
>> a more correct solution than mine then we can give it some testing.
> Yes. As a result, both patches prevent tasklet's hanging.
> In that regard, two patches give the similar effect.
> But I think your fix are just removing the test_bit to wait for EVENT_XFER_COMPLETE.
> 'clear_bit(...) part which is added might be of no effect.
> It doesn't make sense a bit.
> 
> <quotation>
>  		case STATE_DATA_ERROR:
> -			if (!test_and_clear_bit(EVENT_XFER_COMPLETE,
> -						&host->pending_events))
> -				break;
> -
> +			clear_bit(EVENT_XFER_COMPLETE, &host->pending_events);
> </quotation>
> 
> Thanks,
> Seugwon Jeon
>>
>> Thanks!
>>
>> -Doug
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-mmc" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-mmc" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-mmc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Doug Anderson June 12, 2013, 7:06 p.m. UTC | #9
Seungwon,

On Wed, Apr 10, 2013 at 12:02 AM, Seungwon Jeon <tgih.jun@samsung.com> wrote:
>> > There are two solutions we have applied.
>>
>> I'm a little confused.  Have you already applied one or both of the
>> solutions you list below, or are you proposing them as alternates to
>> the patch I submitted?
> Yes, first one already has been applied.
> I wanted to introduce our fix. Did you try to test with these fixes?

I'm coming back to this after being quite distracted for a while.

I'm a little confused in that you said that your first fix was already
applied.  I don't see it anywhere.  Did you mean that you've already
applied it locally, or that it's applied in some git tree somewhere?
If so, can you point me to it?

If this hasn't been sent out anywhere, perhaps you could send out
official patches?

I don't have the failing unit myself, so we'll have to get Bing to try
the patches.  You are suggesting that we try applying both of your
patches, right?

-Doug
--
To unsubscribe from this list: send the line "unsubscribe linux-mmc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Seungwon Jeon June 18, 2013, 12:36 p.m. UTC | #10
On Thursday, June 13, 2013, Doug Anderson wrote:
> Seungwon,
> 
> On Wed, Apr 10, 2013 at 12:02 AM, Seungwon Jeon <tgih.jun@samsung.com> wrote:
> >> > There are two solutions we have applied.
> >>
> >> I'm a little confused.  Have you already applied one or both of the
> >> solutions you list below, or are you proposing them as alternates to
> >> the patch I submitted?
> > Yes, first one already has been applied.
> > I wanted to introduce our fix. Did you try to test with these fixes?
> 
> I'm coming back to this after being quite distracted for a while.
> 
> I'm a little confused in that you said that your first fix was already
> applied.  I don't see it anywhere.  Did you mean that you've already
> applied it locally, or that it's applied in some git tree somewhere?
> If so, can you point me to it?
> 
> If this hasn't been sent out anywhere, perhaps you could send out
> official patches?
Currently, it has just applied for some projects, not official patch.
> 
> I don't have the failing unit myself, so we'll have to get Bing to try
> the patches.  You are suggesting that we try applying both of your
> patches, right?
Did you test the patch?
I wonder that both are good for your side.

Thanks,
Seungwon Jeon
> 
> -Doug

--
To unsubscribe from this list: send the line "unsubscribe linux-mmc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bing Zhao June 18, 2013, 7:46 p.m. UTC | #11
Hi Seungwon,

> > I don't have the failing unit myself, so we'll have to get Bing to try
> > the patches.  You are suggesting that we try applying both of your
> > patches, right?
> Did you test the patch?
> I wonder that both are good for your side.

I tested Doug's patch on my failing unit.

Without this patch, mmc got hung_task and rebooted eventually.
With this patch applied, mmc returns CRC error instead of hung_task, and the error is handled gracefully.

Thanks,
Bing

> 
> Thanks,
> Seungwon Jeon

--
To unsubscribe from this list: send the line "unsubscribe linux-mmc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Doug Anderson June 18, 2013, 7:52 p.m. UTC | #12
Bing,

On Tue, Jun 18, 2013 at 12:46 PM, Bing Zhao <bzhao@marvell.com> wrote:
> Hi Seungwon,
>
>> > I don't have the failing unit myself, so we'll have to get Bing to try
>> > the patches.  You are suggesting that we try applying both of your
>> > patches, right?
>> Did you test the patch?
>> I wonder that both are good for your side.
>
> I tested Doug's patch on my failing unit.
>
> Without this patch, mmc got hung_task and rebooted eventually.
> With this patch applied, mmc returns CRC error instead of hung_task, and the error is handled gracefully.

Have you tried one or both or Seungwon's fixes without mine?

-Doug
--
To unsubscribe from this list: send the line "unsubscribe linux-mmc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bing Zhao June 18, 2013, 8:01 p.m. UTC | #13
Hi Doug,

> > I tested Doug's patch on my failing unit.
> >
> > Without this patch, mmc got hung_task and rebooted eventually.
> > With this patch applied, mmc returns CRC error instead of hung_task, and the error is handled
> gracefully.
> 
> Have you tried one or both or Seungwon's fixes without mine?

I only tested your original patch. That was several months ago I think.

I still have that unit. Let me know if you want me to try Seungwon's patches.

Thanks,
Bing

> 
> -Doug
--
To unsubscribe from this list: send the line "unsubscribe linux-mmc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jaehoon Chung June 20, 2013, 1:49 a.m. UTC | #14
Hi All,

I tested with Seungwon's patch and this patch.
I'm agreed with Seungwon's opinion.

Best Regards,
Jaehoon Chung

On 06/18/2013 09:36 PM, Seungwon Jeon wrote:
> On Thursday, June 13, 2013, Doug Anderson wrote:
>> Seungwon,
>>
>> On Wed, Apr 10, 2013 at 12:02 AM, Seungwon Jeon <tgih.jun@samsung.com> wrote:
>>>>> There are two solutions we have applied.
>>>>
>>>> I'm a little confused.  Have you already applied one or both of the
>>>> solutions you list below, or are you proposing them as alternates to
>>>> the patch I submitted?
>>> Yes, first one already has been applied.
>>> I wanted to introduce our fix. Did you try to test with these fixes?
>>
>> I'm coming back to this after being quite distracted for a while.
>>
>> I'm a little confused in that you said that your first fix was already
>> applied.  I don't see it anywhere.  Did you mean that you've already
>> applied it locally, or that it's applied in some git tree somewhere?
>> If so, can you point me to it?
>>
>> If this hasn't been sent out anywhere, perhaps you could send out
>> official patches?
> Currently, it has just applied for some projects, not official patch.
>>
>> I don't have the failing unit myself, so we'll have to get Bing to try
>> the patches.  You are suggesting that we try applying both of your
>> patches, right?
> Did you test the patch?
> I wonder that both are good for your side.
> 
> Thanks,
> Seungwon Jeon
>>
>> -Doug
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-mmc" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-mmc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Doug Anderson June 21, 2013, 3:33 a.m. UTC | #15
Bing,

On Tue, Jun 18, 2013 at 1:01 PM, Bing Zhao <bzhao@marvell.com> wrote:
> Hi Doug,
>
>> > I tested Doug's patch on my failing unit.
>> >
>> > Without this patch, mmc got hung_task and rebooted eventually.
>> > With this patch applied, mmc returns CRC error instead of hung_task, and the error is handled
>> gracefully.
>>
>> Have you tried one or both or Seungwon's fixes without mine?
>
> I only tested your original patch. That was several months ago I think.
>
> I still have that unit. Let me know if you want me to try Seungwon's patches.

I think the proposal on the table is to take Seungwon's patches
instead of mine.  Assuming they solve your problems, I'm OK with that.
 I think he was requesting testing the first of his two patches alone
and then both of his two patches together.

Thanks!

-Doug
--
To unsubscribe from this list: send the line "unsubscribe linux-mmc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bing Zhao June 25, 2013, 3:54 a.m. UTC | #16
> I think the proposal on the table is to take Seungwon's patches
> instead of mine.  Assuming they solve your problems, I'm OK with that.
>  I think he was requesting testing the first of his two patches alone
> and then both of his two patches together.

Test #1: Swungwon's patch #1 alone [1]
Test #2: Swungwon's patch #2 alone [1]
Test #3: Swungwon's patch #1 and #2 [1]
Test #4: Doug's original patch [2]

Test #1 and #3: it doesn't work; system reboots due to kernel hung_task
Test #2 and #4: it works; instead of hung_task driver gets CRC error (which is expected)

Thanks,
Bing

[1] https://lkml.org/lkml/2013/4/8/316
[2] https://lkml.org/lkml/2013/3/15/583

--
To unsubscribe from this list: send the line "unsubscribe linux-mmc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Seungwon Jeon June 26, 2013, 1:53 a.m. UTC | #17
Hi Jaehoon,
Do you have the same result?
Could you share the result?

Thanks,
Seungwon Jeon
On Tuesday, June 25 2013, Bing Zhao wrote:
> > I think the proposal on the table is to take Seungwon's patches
> > instead of mine.  Assuming they solve your problems, I'm OK with that.
> >  I think he was requesting testing the first of his two patches alone
> > and then both of his two patches together.
> 
> Test #1: Swungwon's patch #1 alone [1]
> Test #2: Swungwon's patch #2 alone [1]
> Test #3: Swungwon's patch #1 and #2 [1]
> Test #4: Doug's original patch [2]
> 
> Test #1 and #3: it doesn't work; system reboots due to kernel hung_task
> Test #2 and #4: it works; instead of hung_task driver gets CRC error (which is expected)
> 
> Thanks,
> Bing
> 
> [1] https://lkml.org/lkml/2013/4/8/316
> [2] https://lkml.org/lkml/2013/3/15/583
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-mmc" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line "unsubscribe linux-mmc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jaehoon Chung June 27, 2013, 3:36 a.m. UTC | #18
Hi Seungwon,

I didn't get the same result..In my case, it's working fine.
But as Bing's result, i will check more and share the result.

Best Regards,
Jaehoon Chung

On 06/26/2013 10:53 AM, Seungwon Jeon wrote:
> Hi Jaehoon,
> Do you have the same result?
> Could you share the result?
> 
> Thanks,
> Seungwon Jeon
> On Tuesday, June 25 2013, Bing Zhao wrote:
>>> I think the proposal on the table is to take Seungwon's patches
>>> instead of mine.  Assuming they solve your problems, I'm OK with that.
>>>  I think he was requesting testing the first of his two patches alone
>>> and then both of his two patches together.
>>
>> Test #1: Swungwon's patch #1 alone [1]
>> Test #2: Swungwon's patch #2 alone [1]
>> Test #3: Swungwon's patch #1 and #2 [1]
>> Test #4: Doug's original patch [2]
>>
>> Test #1 and #3: it doesn't work; system reboots due to kernel hung_task
>> Test #2 and #4: it works; instead of hung_task driver gets CRC error (which is expected)
>>
>> Thanks,
>> Bing
>>
>> [1] https://lkml.org/lkml/2013/4/8/316
>> [2] https://lkml.org/lkml/2013/3/15/583
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-mmc" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-mmc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bing Zhao June 27, 2013, 6:18 p.m. UTC | #19
> I didn't get the same result..In my case, it's working fine.
> But as Bing's result, i will check more and share the result.

I want to add that my kernel is from "cros/release-R26-3701.B" branch with this HEAD:

4e13a9c CHERRY-PICK: UPSTREAM: loop: prevent bdev freeing while device in use

Thanks,
Bing
--
To unsubscribe from this list: send the line "unsubscribe linux-mmc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/mmc/host/dw_mmc.c b/drivers/mmc/host/dw_mmc.c
index 9834221..696b3bb 100644
--- a/drivers/mmc/host/dw_mmc.c
+++ b/drivers/mmc/host/dw_mmc.c
@@ -1137,10 +1137,7 @@  static void dw_mci_tasklet_func(unsigned long priv)
 			goto unlock;
 
 		case STATE_DATA_ERROR:
-			if (!test_and_clear_bit(EVENT_XFER_COMPLETE,
-						&host->pending_events))
-				break;
-
+			clear_bit(EVENT_XFER_COMPLETE, &host->pending_events);
 			state = STATE_DATA_BUSY;
 			break;
 		}