diff mbox

[09/23] mmc: sdhci: fix incorrect get data interrupt during no data transfer

Message ID 1460741387-23815-10-git-send-email-aisheng.dong@nxp.com (mailing list archive)
State New, archived
Headers show

Commit Message

Dong Aisheng April 15, 2016, 5:29 p.m. UTC
Current code will report the wrong data interrupt got when no
data operation in progress assumed by getting !host->data in
sdhci_data_irq().

For a data command handling process, the driver will call
sdhci_finish_data() and clear host->data in case any data error,
then card finish_tasklet will do the rest controller reset work.

Before the tasklet got run, however, controllers may report the
TC(Transfer Complete) interrupt (SDHCI_INT_DATA_END) a bit later
than data CRC error and data end bit error interrupts for single
block transfer or the last block of multiblock transfer.

Controller usually detects and generates data CRC/end bit error
interrupts once one block on the bus is transferred completely.
For single block transfer, since there's only one bock to transfer,
the controller will report transfer complete interrupt as well,
but until the data in controller FIFO has been successfully
transferred to memory. The time gap of TC and CRC interrupt depends on
the system busy state at that point and memory bus access speed.

So it is possible when TC interrupt generated, host->data is already
equal to NULL due to cleared by former CRC/Data End Bit error which
is reasonable.

Thus we DO NOT report the weird data interrupt event for this case.

Else we may easily see warning below during SD3.0 card manually tuning
process (calling mmc_send_tuning() which is a single block transfer)
mmc0: Got data interrupt 0x00000002 even though no data operation was in progress

The detailed command log is as follows:
[ 1657.920983] mmc0: starting CMD19 arg 00000000 flags 00000035
[ 1657.921009] mmc0:     blksz 64 blocks 1 flags 00000200 tsac 150 ms nsac 0
[ 1657.921085] sdhci [sdhci_irq()]: *** mmc0 got interrupt: 0x00200001
[ 1657.921112] sdhci [sdhci_irq()]: *** mmc0 got interrupt: 0x00000002
[ 1657.921131] mmc0: Got data interrupt 0x00000002 even though no data operation was in progress.
[ 1657.929761] sdhci: =========== REGISTER DUMP (mmc0)===========
[ 1657.929780] sdhci: Sys addr: 0x3d5d6380 | Version:  0x00000002
[ 1657.929796] sdhci: Blk size: 0x00000040 | Blk cnt:  0x00000001
[ 1657.929814] sdhci: Argument: 0x00000000 | Trn mode: 0x00000013
[ 1657.929831] sdhci: Present:  0x01fd8008 | Host ctl: 0x00000023
[ 1657.929847] sdhci: Power:    0x00000002 | Blk gap:  0x00000080
[ 1657.929863] sdhci: Wake-up:  0x00000008 | Clock:    0x0000000f
[ 1657.929879] sdhci: Timeout:  0x0000000f | Int stat: 0x00000000
[ 1657.929896] sdhci: Int enab: 0x107f008b | Sig enab: 0x107f008b
[ 1657.929914] sdhci: AC12 err: 0x00000000 | Slot int: 0x00000003
[ 1657.929932] sdhci: Caps:     0x07eb0000 | Caps_1:   0x00002007
[ 1657.929949] sdhci: Cmd:      0x0000133a | Max curr: 0x00ffffff
[ 1657.929965] sdhci: Host ctl2: 0x000000c8
[ 1657.929981] sdhci: ADMA Err: 0x00000000 | ADMA Ptr: 0x8f042208
[ 1657.929995] sdhci: ===========================================
[ 1657.930156] mmc0: req done (CMD19): 0: 00000900 00000000 00000000 00000000
[ 1657.930179] mmc0:     0 bytes transferred: -84

It shows we first have a data CRC error interrupt then a data transfer
complete interrupt.
Then we got the !host->data case in sdhci_data_irq().

CC: stable <stable@vger.kernel.org>
Signed-off-by: Dong Aisheng <aisheng.dong@nxp.com>
---
 drivers/mmc/host/sdhci.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

Comments

Adrian Hunter May 10, 2016, 6:51 a.m. UTC | #1
On 15/04/16 20:29, Dong Aisheng wrote:
> Current code will report the wrong data interrupt got when no
> data operation in progress assumed by getting !host->data in
> sdhci_data_irq().
> 
> For a data command handling process, the driver will call
> sdhci_finish_data() and clear host->data in case any data error,
> then card finish_tasklet will do the rest controller reset work.
> 
> Before the tasklet got run, however, controllers may report the
> TC(Transfer Complete) interrupt (SDHCI_INT_DATA_END) a bit later
> than data CRC error and data end bit error interrupts for single
> block transfer or the last block of multiblock transfer.
> 
> Controller usually detects and generates data CRC/end bit error
> interrupts once one block on the bus is transferred completely.
> For single block transfer, since there's only one bock to transfer,
> the controller will report transfer complete interrupt as well,
> but until the data in controller FIFO has been successfully
> transferred to memory. The time gap of TC and CRC interrupt depends on
> the system busy state at that point and memory bus access speed.
> 
> So it is possible when TC interrupt generated, host->data is already
> equal to NULL due to cleared by former CRC/Data End Bit error which
> is reasonable.
> 
> Thus we DO NOT report the weird data interrupt event for this case.
> 
> Else we may easily see warning below during SD3.0 card manually tuning
> process (calling mmc_send_tuning() which is a single block transfer)
> mmc0: Got data interrupt 0x00000002 even though no data operation was in progress
> 
> The detailed command log is as follows:
> [ 1657.920983] mmc0: starting CMD19 arg 00000000 flags 00000035
> [ 1657.921009] mmc0:     blksz 64 blocks 1 flags 00000200 tsac 150 ms nsac 0
> [ 1657.921085] sdhci [sdhci_irq()]: *** mmc0 got interrupt: 0x00200001
> [ 1657.921112] sdhci [sdhci_irq()]: *** mmc0 got interrupt: 0x00000002
> [ 1657.921131] mmc0: Got data interrupt 0x00000002 even though no data operation was in progress.
> [ 1657.929761] sdhci: =========== REGISTER DUMP (mmc0)===========
> [ 1657.929780] sdhci: Sys addr: 0x3d5d6380 | Version:  0x00000002
> [ 1657.929796] sdhci: Blk size: 0x00000040 | Blk cnt:  0x00000001
> [ 1657.929814] sdhci: Argument: 0x00000000 | Trn mode: 0x00000013
> [ 1657.929831] sdhci: Present:  0x01fd8008 | Host ctl: 0x00000023
> [ 1657.929847] sdhci: Power:    0x00000002 | Blk gap:  0x00000080
> [ 1657.929863] sdhci: Wake-up:  0x00000008 | Clock:    0x0000000f
> [ 1657.929879] sdhci: Timeout:  0x0000000f | Int stat: 0x00000000
> [ 1657.929896] sdhci: Int enab: 0x107f008b | Sig enab: 0x107f008b
> [ 1657.929914] sdhci: AC12 err: 0x00000000 | Slot int: 0x00000003
> [ 1657.929932] sdhci: Caps:     0x07eb0000 | Caps_1:   0x00002007
> [ 1657.929949] sdhci: Cmd:      0x0000133a | Max curr: 0x00ffffff
> [ 1657.929965] sdhci: Host ctl2: 0x000000c8
> [ 1657.929981] sdhci: ADMA Err: 0x00000000 | ADMA Ptr: 0x8f042208
> [ 1657.929995] sdhci: ===========================================
> [ 1657.930156] mmc0: req done (CMD19): 0: 00000900 00000000 00000000 00000000
> [ 1657.930179] mmc0:     0 bytes transferred: -84
> 
> It shows we first have a data CRC error interrupt then a data transfer
> complete interrupt.
> Then we got the !host->data case in sdhci_data_irq().
> 
> CC: stable <stable@vger.kernel.org>
> Signed-off-by: Dong Aisheng <aisheng.dong@nxp.com>
> ---
>  drivers/mmc/host/sdhci.c | 11 +++++++++++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/drivers/mmc/host/sdhci.c b/drivers/mmc/host/sdhci.c
> index 40e3551..2eb0e34 100644
> --- a/drivers/mmc/host/sdhci.c
> +++ b/drivers/mmc/host/sdhci.c
> @@ -2325,6 +2325,17 @@ static void sdhci_data_irq(struct sdhci_host *host, u32 intmask)
>  			}
>  		}
>  
> +		/*
> +		 * The "data complete" interrupt is possible to happen a bit
> +		 * later than CRC error and data end bit error interrupts
> +		 * separately for single block transfer or the last block of
> +		 * multiblock transfer. For this case, we DO NOT report the
> +		 * weird data interrupt event.
> +		 */
> +		if ((intmask & SDHCI_INT_DATA_END) &&
> +		    (host->mrq && host->mrq->data && host->mrq->data->error))
> +			return;

This could be generalized a bit more i.e. what about:

		/*
		 * After an error and before the the data circuit is reset in
		 * sdhci_tasklet_finish, we could get more interrupts, but we
		 * already have an error, so ignore them.
		 */
		if (host->mrq && host->mrq->data && host->mrq->data->error)
			return;

> +
>  		pr_err("%s: Got data interrupt 0x%08x even though no data operation was in progress.\n",
>  		       mmc_hostname(host->mmc), (unsigned)intmask);
>  		sdhci_dumpregs(host);
>
Ritesh Harjani (IBM) May 17, 2016, 4:31 a.m. UTC | #2
On Tue, May 10, 2016 at 12:21 PM, Adrian Hunter <adrian.hunter@intel.com> wrote:
> On 15/04/16 20:29, Dong Aisheng wrote:
>> Current code will report the wrong data interrupt got when no
>> data operation in progress assumed by getting !host->data in
>> sdhci_data_irq().
>>
>> For a data command handling process, the driver will call
>> sdhci_finish_data() and clear host->data in case any data error,
>> then card finish_tasklet will do the rest controller reset work.
>>
>> Before the tasklet got run, however, controllers may report the
>> TC(Transfer Complete) interrupt (SDHCI_INT_DATA_END) a bit later
>> than data CRC error and data end bit error interrupts for single
>> block transfer or the last block of multiblock transfer.

What if the controller generates Transfer complete interrupt bit more slower
and the tasklet do get a chance to run. In that case even host->mrq
will become NULL.
Actually if that happens it's even more worse since the driver may
start issuing other commands to controller.

Are we covering that case here? Do you think that this case is possible?


>>
>> Controller usually detects and generates data CRC/end bit error
>> interrupts once one block on the bus is transferred completely.
>> For single block transfer, since there's only one bock to transfer,
>> the controller will report transfer complete interrupt as well,
>> but until the data in controller FIFO has been successfully
>> transferred to memory. The time gap of TC and CRC interrupt depends on
>> the system busy state at that point and memory bus access speed.
>>
>> So it is possible when TC interrupt generated, host->data is already
>> equal to NULL due to cleared by former CRC/Data End Bit error which
>> is reasonable.
>>
>> Thus we DO NOT report the weird data interrupt event for this case.
>>
>> Else we may easily see warning below during SD3.0 card manually tuning
>> process (calling mmc_send_tuning() which is a single block transfer)
>> mmc0: Got data interrupt 0x00000002 even though no data operation was in progress
>>
>> The detailed command log is as follows:
>> [ 1657.920983] mmc0: starting CMD19 arg 00000000 flags 00000035
>> [ 1657.921009] mmc0:     blksz 64 blocks 1 flags 00000200 tsac 150 ms nsac 0
>> [ 1657.921085] sdhci [sdhci_irq()]: *** mmc0 got interrupt: 0x00200001
>> [ 1657.921112] sdhci [sdhci_irq()]: *** mmc0 got interrupt: 0x00000002
>> [ 1657.921131] mmc0: Got data interrupt 0x00000002 even though no data operation was in progress.
>> [ 1657.929761] sdhci: =========== REGISTER DUMP (mmc0)===========
>> [ 1657.929780] sdhci: Sys addr: 0x3d5d6380 | Version:  0x00000002
>> [ 1657.929796] sdhci: Blk size: 0x00000040 | Blk cnt:  0x00000001
>> [ 1657.929814] sdhci: Argument: 0x00000000 | Trn mode: 0x00000013
>> [ 1657.929831] sdhci: Present:  0x01fd8008 | Host ctl: 0x00000023
>> [ 1657.929847] sdhci: Power:    0x00000002 | Blk gap:  0x00000080
>> [ 1657.929863] sdhci: Wake-up:  0x00000008 | Clock:    0x0000000f
>> [ 1657.929879] sdhci: Timeout:  0x0000000f | Int stat: 0x00000000
>> [ 1657.929896] sdhci: Int enab: 0x107f008b | Sig enab: 0x107f008b
>> [ 1657.929914] sdhci: AC12 err: 0x00000000 | Slot int: 0x00000003
>> [ 1657.929932] sdhci: Caps:     0x07eb0000 | Caps_1:   0x00002007
>> [ 1657.929949] sdhci: Cmd:      0x0000133a | Max curr: 0x00ffffff
>> [ 1657.929965] sdhci: Host ctl2: 0x000000c8
>> [ 1657.929981] sdhci: ADMA Err: 0x00000000 | ADMA Ptr: 0x8f042208
>> [ 1657.929995] sdhci: ===========================================
>> [ 1657.930156] mmc0: req done (CMD19): 0: 00000900 00000000 00000000 00000000
>> [ 1657.930179] mmc0:     0 bytes transferred: -84
>>
>> It shows we first have a data CRC error interrupt then a data transfer
>> complete interrupt.
>> Then we got the !host->data case in sdhci_data_irq().
>>
>> CC: stable <stable@vger.kernel.org>
>> Signed-off-by: Dong Aisheng <aisheng.dong@nxp.com>
>> ---
>>  drivers/mmc/host/sdhci.c | 11 +++++++++++
>>  1 file changed, 11 insertions(+)
>>
>> diff --git a/drivers/mmc/host/sdhci.c b/drivers/mmc/host/sdhci.c
>> index 40e3551..2eb0e34 100644
>> --- a/drivers/mmc/host/sdhci.c
>> +++ b/drivers/mmc/host/sdhci.c
>> @@ -2325,6 +2325,17 @@ static void sdhci_data_irq(struct sdhci_host *host, u32 intmask)
>>                       }
>>               }
>>
>> +             /*
>> +              * The "data complete" interrupt is possible to happen a bit
>> +              * later than CRC error and data end bit error interrupts
>> +              * separately for single block transfer or the last block of
>> +              * multiblock transfer. For this case, we DO NOT report the
>> +              * weird data interrupt event.
>> +              */
>> +             if ((intmask & SDHCI_INT_DATA_END) &&
>> +                 (host->mrq && host->mrq->data && host->mrq->data->error))
>> +                     return;
>
> This could be generalized a bit more i.e. what about:
>
>                 /*
>                  * After an error and before the the data circuit is reset in
>                  * sdhci_tasklet_finish, we could get more interrupts, but we
>                  * already have an error, so ignore them.
>                  */
>                 if (host->mrq && host->mrq->data && host->mrq->data->error)
>                         return;
>
>> +
>>               pr_err("%s: Got data interrupt 0x%08x even though no data operation was in progress.\n",
>>                      mmc_hostname(host->mmc), (unsigned)intmask);
>>               sdhci_dumpregs(host);
>>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-mmc" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
Adrian Hunter May 17, 2016, 5:58 a.m. UTC | #3
On 17/05/16 07:31, Ritesh Harjani wrote:
> On Tue, May 10, 2016 at 12:21 PM, Adrian Hunter <adrian.hunter@intel.com> wrote:
>> On 15/04/16 20:29, Dong Aisheng wrote:
>>> Current code will report the wrong data interrupt got when no
>>> data operation in progress assumed by getting !host->data in
>>> sdhci_data_irq().
>>>
>>> For a data command handling process, the driver will call
>>> sdhci_finish_data() and clear host->data in case any data error,
>>> then card finish_tasklet will do the rest controller reset work.
>>>
>>> Before the tasklet got run, however, controllers may report the
>>> TC(Transfer Complete) interrupt (SDHCI_INT_DATA_END) a bit later
>>> than data CRC error and data end bit error interrupts for single
>>> block transfer or the last block of multiblock transfer.
> 
> What if the controller generates Transfer complete interrupt bit more slower
> and the tasklet do get a chance to run. In that case even host->mrq
> will become NULL.
> Actually if that happens it's even more worse since the driver may
> start issuing other commands to controller.
> 
> Are we covering that case here? Do you think that this case is possible?

After data error, the host driver resets the controller so there will be no
more interrupts relating to that mrq.

This patch is avoiding unnecessary warnings until that happens.

> 
> 
>>>
>>> Controller usually detects and generates data CRC/end bit error
>>> interrupts once one block on the bus is transferred completely.
>>> For single block transfer, since there's only one bock to transfer,
>>> the controller will report transfer complete interrupt as well,
>>> but until the data in controller FIFO has been successfully
>>> transferred to memory. The time gap of TC and CRC interrupt depends on
>>> the system busy state at that point and memory bus access speed.
>>>
>>> So it is possible when TC interrupt generated, host->data is already
>>> equal to NULL due to cleared by former CRC/Data End Bit error which
>>> is reasonable.
>>>
>>> Thus we DO NOT report the weird data interrupt event for this case.
>>>
>>> Else we may easily see warning below during SD3.0 card manually tuning
>>> process (calling mmc_send_tuning() which is a single block transfer)
>>> mmc0: Got data interrupt 0x00000002 even though no data operation was in progress
>>>
>>> The detailed command log is as follows:
>>> [ 1657.920983] mmc0: starting CMD19 arg 00000000 flags 00000035
>>> [ 1657.921009] mmc0:     blksz 64 blocks 1 flags 00000200 tsac 150 ms nsac 0
>>> [ 1657.921085] sdhci [sdhci_irq()]: *** mmc0 got interrupt: 0x00200001
>>> [ 1657.921112] sdhci [sdhci_irq()]: *** mmc0 got interrupt: 0x00000002
>>> [ 1657.921131] mmc0: Got data interrupt 0x00000002 even though no data operation was in progress.
>>> [ 1657.929761] sdhci: =========== REGISTER DUMP (mmc0)===========
>>> [ 1657.929780] sdhci: Sys addr: 0x3d5d6380 | Version:  0x00000002
>>> [ 1657.929796] sdhci: Blk size: 0x00000040 | Blk cnt:  0x00000001
>>> [ 1657.929814] sdhci: Argument: 0x00000000 | Trn mode: 0x00000013
>>> [ 1657.929831] sdhci: Present:  0x01fd8008 | Host ctl: 0x00000023
>>> [ 1657.929847] sdhci: Power:    0x00000002 | Blk gap:  0x00000080
>>> [ 1657.929863] sdhci: Wake-up:  0x00000008 | Clock:    0x0000000f
>>> [ 1657.929879] sdhci: Timeout:  0x0000000f | Int stat: 0x00000000
>>> [ 1657.929896] sdhci: Int enab: 0x107f008b | Sig enab: 0x107f008b
>>> [ 1657.929914] sdhci: AC12 err: 0x00000000 | Slot int: 0x00000003
>>> [ 1657.929932] sdhci: Caps:     0x07eb0000 | Caps_1:   0x00002007
>>> [ 1657.929949] sdhci: Cmd:      0x0000133a | Max curr: 0x00ffffff
>>> [ 1657.929965] sdhci: Host ctl2: 0x000000c8
>>> [ 1657.929981] sdhci: ADMA Err: 0x00000000 | ADMA Ptr: 0x8f042208
>>> [ 1657.929995] sdhci: ===========================================
>>> [ 1657.930156] mmc0: req done (CMD19): 0: 00000900 00000000 00000000 00000000
>>> [ 1657.930179] mmc0:     0 bytes transferred: -84
>>>
>>> It shows we first have a data CRC error interrupt then a data transfer
>>> complete interrupt.
>>> Then we got the !host->data case in sdhci_data_irq().
>>>
>>> CC: stable <stable@vger.kernel.org>
>>> Signed-off-by: Dong Aisheng <aisheng.dong@nxp.com>
>>> ---
>>>  drivers/mmc/host/sdhci.c | 11 +++++++++++
>>>  1 file changed, 11 insertions(+)
>>>
>>> diff --git a/drivers/mmc/host/sdhci.c b/drivers/mmc/host/sdhci.c
>>> index 40e3551..2eb0e34 100644
>>> --- a/drivers/mmc/host/sdhci.c
>>> +++ b/drivers/mmc/host/sdhci.c
>>> @@ -2325,6 +2325,17 @@ static void sdhci_data_irq(struct sdhci_host *host, u32 intmask)
>>>                       }
>>>               }
>>>
>>> +             /*
>>> +              * The "data complete" interrupt is possible to happen a bit
>>> +              * later than CRC error and data end bit error interrupts
>>> +              * separately for single block transfer or the last block of
>>> +              * multiblock transfer. For this case, we DO NOT report the
>>> +              * weird data interrupt event.
>>> +              */
>>> +             if ((intmask & SDHCI_INT_DATA_END) &&
>>> +                 (host->mrq && host->mrq->data && host->mrq->data->error))
>>> +                     return;
>>
>> This could be generalized a bit more i.e. what about:
>>
>>                 /*
>>                  * After an error and before the the data circuit is reset in
>>                  * sdhci_tasklet_finish, we could get more interrupts, but we
>>                  * already have an error, so ignore them.
>>                  */
>>                 if (host->mrq && host->mrq->data && host->mrq->data->error)
>>                         return;
>>
>>> +
>>>               pr_err("%s: Got data interrupt 0x%08x even though no data operation was in progress.\n",
>>>                      mmc_hostname(host->mmc), (unsigned)intmask);
>>>               sdhci_dumpregs(host);
>>>
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-mmc" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
Dong Aisheng May 26, 2016, 11:41 a.m. UTC | #4
On Tue, May 10, 2016 at 09:51:31AM +0300, Adrian Hunter wrote:
> On 15/04/16 20:29, Dong Aisheng wrote:
> > Current code will report the wrong data interrupt got when no
> > data operation in progress assumed by getting !host->data in
> > sdhci_data_irq().
> > 
> > For a data command handling process, the driver will call
> > sdhci_finish_data() and clear host->data in case any data error,
> > then card finish_tasklet will do the rest controller reset work.
> > 
> > Before the tasklet got run, however, controllers may report the
> > TC(Transfer Complete) interrupt (SDHCI_INT_DATA_END) a bit later
> > than data CRC error and data end bit error interrupts for single
> > block transfer or the last block of multiblock transfer.
> > 
> > Controller usually detects and generates data CRC/end bit error
> > interrupts once one block on the bus is transferred completely.
> > For single block transfer, since there's only one bock to transfer,
> > the controller will report transfer complete interrupt as well,
> > but until the data in controller FIFO has been successfully
> > transferred to memory. The time gap of TC and CRC interrupt depends on
> > the system busy state at that point and memory bus access speed.
> > 
> > So it is possible when TC interrupt generated, host->data is already
> > equal to NULL due to cleared by former CRC/Data End Bit error which
> > is reasonable.
> > 
> > Thus we DO NOT report the weird data interrupt event for this case.
> > 
> > Else we may easily see warning below during SD3.0 card manually tuning
> > process (calling mmc_send_tuning() which is a single block transfer)
> > mmc0: Got data interrupt 0x00000002 even though no data operation was in progress
> > 
> > The detailed command log is as follows:
> > [ 1657.920983] mmc0: starting CMD19 arg 00000000 flags 00000035
> > [ 1657.921009] mmc0:     blksz 64 blocks 1 flags 00000200 tsac 150 ms nsac 0
> > [ 1657.921085] sdhci [sdhci_irq()]: *** mmc0 got interrupt: 0x00200001
> > [ 1657.921112] sdhci [sdhci_irq()]: *** mmc0 got interrupt: 0x00000002
> > [ 1657.921131] mmc0: Got data interrupt 0x00000002 even though no data operation was in progress.
> > [ 1657.929761] sdhci: =========== REGISTER DUMP (mmc0)===========
> > [ 1657.929780] sdhci: Sys addr: 0x3d5d6380 | Version:  0x00000002
> > [ 1657.929796] sdhci: Blk size: 0x00000040 | Blk cnt:  0x00000001
> > [ 1657.929814] sdhci: Argument: 0x00000000 | Trn mode: 0x00000013
> > [ 1657.929831] sdhci: Present:  0x01fd8008 | Host ctl: 0x00000023
> > [ 1657.929847] sdhci: Power:    0x00000002 | Blk gap:  0x00000080
> > [ 1657.929863] sdhci: Wake-up:  0x00000008 | Clock:    0x0000000f
> > [ 1657.929879] sdhci: Timeout:  0x0000000f | Int stat: 0x00000000
> > [ 1657.929896] sdhci: Int enab: 0x107f008b | Sig enab: 0x107f008b
> > [ 1657.929914] sdhci: AC12 err: 0x00000000 | Slot int: 0x00000003
> > [ 1657.929932] sdhci: Caps:     0x07eb0000 | Caps_1:   0x00002007
> > [ 1657.929949] sdhci: Cmd:      0x0000133a | Max curr: 0x00ffffff
> > [ 1657.929965] sdhci: Host ctl2: 0x000000c8
> > [ 1657.929981] sdhci: ADMA Err: 0x00000000 | ADMA Ptr: 0x8f042208
> > [ 1657.929995] sdhci: ===========================================
> > [ 1657.930156] mmc0: req done (CMD19): 0: 00000900 00000000 00000000 00000000
> > [ 1657.930179] mmc0:     0 bytes transferred: -84
> > 
> > It shows we first have a data CRC error interrupt then a data transfer
> > complete interrupt.
> > Then we got the !host->data case in sdhci_data_irq().
> > 
> > CC: stable <stable@vger.kernel.org>
> > Signed-off-by: Dong Aisheng <aisheng.dong@nxp.com>
> > ---
> >  drivers/mmc/host/sdhci.c | 11 +++++++++++
> >  1 file changed, 11 insertions(+)
> > 
> > diff --git a/drivers/mmc/host/sdhci.c b/drivers/mmc/host/sdhci.c
> > index 40e3551..2eb0e34 100644
> > --- a/drivers/mmc/host/sdhci.c
> > +++ b/drivers/mmc/host/sdhci.c
> > @@ -2325,6 +2325,17 @@ static void sdhci_data_irq(struct sdhci_host *host, u32 intmask)
> >  			}
> >  		}
> >  
> > +		/*
> > +		 * The "data complete" interrupt is possible to happen a bit
> > +		 * later than CRC error and data end bit error interrupts
> > +		 * separately for single block transfer or the last block of
> > +		 * multiblock transfer. For this case, we DO NOT report the
> > +		 * weird data interrupt event.
> > +		 */
> > +		if ((intmask & SDHCI_INT_DATA_END) &&
> > +		    (host->mrq && host->mrq->data && host->mrq->data->error))
> > +			return;
> 
> This could be generalized a bit more i.e. what about:
> 
> 		/*
> 		 * After an error and before the the data circuit is reset in
> 		 * sdhci_tasklet_finish, we could get more interrupts, but we
> 		 * already have an error, so ignore them.
> 		 */
> 		if (host->mrq && host->mrq->data && host->mrq->data->error)
> 			return;
> 

Sounds good to me.
If you wouldn't mind, i'd like to add a bit more:
		/*
		 * After an error and before the data circuit is reset in
		 * sdhci_tasklet_finish, we could get more interrupts
		 * e.g. "data complete" interrupt, but we already have
		 * an error, so ignore them.
		 */

Regards
Dong Aisheng

> > +
> >  		pr_err("%s: Got data interrupt 0x%08x even though no data operation was in progress.\n",
> >  		       mmc_hostname(host->mmc), (unsigned)intmask);
> >  		sdhci_dumpregs(host);
> > 
>
Adrian Hunter May 26, 2016, 11:59 a.m. UTC | #5
On 26/05/16 14:41, Dong Aisheng wrote:
> On Tue, May 10, 2016 at 09:51:31AM +0300, Adrian Hunter wrote:
>> On 15/04/16 20:29, Dong Aisheng wrote:
>>> Current code will report the wrong data interrupt got when no
>>> data operation in progress assumed by getting !host->data in
>>> sdhci_data_irq().
>>>
>>> For a data command handling process, the driver will call
>>> sdhci_finish_data() and clear host->data in case any data error,
>>> then card finish_tasklet will do the rest controller reset work.
>>>
>>> Before the tasklet got run, however, controllers may report the
>>> TC(Transfer Complete) interrupt (SDHCI_INT_DATA_END) a bit later
>>> than data CRC error and data end bit error interrupts for single
>>> block transfer or the last block of multiblock transfer.
>>>
>>> Controller usually detects and generates data CRC/end bit error
>>> interrupts once one block on the bus is transferred completely.
>>> For single block transfer, since there's only one bock to transfer,
>>> the controller will report transfer complete interrupt as well,
>>> but until the data in controller FIFO has been successfully
>>> transferred to memory. The time gap of TC and CRC interrupt depends on
>>> the system busy state at that point and memory bus access speed.
>>>
>>> So it is possible when TC interrupt generated, host->data is already
>>> equal to NULL due to cleared by former CRC/Data End Bit error which
>>> is reasonable.
>>>
>>> Thus we DO NOT report the weird data interrupt event for this case.
>>>
>>> Else we may easily see warning below during SD3.0 card manually tuning
>>> process (calling mmc_send_tuning() which is a single block transfer)
>>> mmc0: Got data interrupt 0x00000002 even though no data operation was in progress
>>>
>>> The detailed command log is as follows:
>>> [ 1657.920983] mmc0: starting CMD19 arg 00000000 flags 00000035
>>> [ 1657.921009] mmc0:     blksz 64 blocks 1 flags 00000200 tsac 150 ms nsac 0
>>> [ 1657.921085] sdhci [sdhci_irq()]: *** mmc0 got interrupt: 0x00200001
>>> [ 1657.921112] sdhci [sdhci_irq()]: *** mmc0 got interrupt: 0x00000002
>>> [ 1657.921131] mmc0: Got data interrupt 0x00000002 even though no data operation was in progress.
>>> [ 1657.929761] sdhci: =========== REGISTER DUMP (mmc0)===========
>>> [ 1657.929780] sdhci: Sys addr: 0x3d5d6380 | Version:  0x00000002
>>> [ 1657.929796] sdhci: Blk size: 0x00000040 | Blk cnt:  0x00000001
>>> [ 1657.929814] sdhci: Argument: 0x00000000 | Trn mode: 0x00000013
>>> [ 1657.929831] sdhci: Present:  0x01fd8008 | Host ctl: 0x00000023
>>> [ 1657.929847] sdhci: Power:    0x00000002 | Blk gap:  0x00000080
>>> [ 1657.929863] sdhci: Wake-up:  0x00000008 | Clock:    0x0000000f
>>> [ 1657.929879] sdhci: Timeout:  0x0000000f | Int stat: 0x00000000
>>> [ 1657.929896] sdhci: Int enab: 0x107f008b | Sig enab: 0x107f008b
>>> [ 1657.929914] sdhci: AC12 err: 0x00000000 | Slot int: 0x00000003
>>> [ 1657.929932] sdhci: Caps:     0x07eb0000 | Caps_1:   0x00002007
>>> [ 1657.929949] sdhci: Cmd:      0x0000133a | Max curr: 0x00ffffff
>>> [ 1657.929965] sdhci: Host ctl2: 0x000000c8
>>> [ 1657.929981] sdhci: ADMA Err: 0x00000000 | ADMA Ptr: 0x8f042208
>>> [ 1657.929995] sdhci: ===========================================
>>> [ 1657.930156] mmc0: req done (CMD19): 0: 00000900 00000000 00000000 00000000
>>> [ 1657.930179] mmc0:     0 bytes transferred: -84
>>>
>>> It shows we first have a data CRC error interrupt then a data transfer
>>> complete interrupt.
>>> Then we got the !host->data case in sdhci_data_irq().
>>>
>>> CC: stable <stable@vger.kernel.org>
>>> Signed-off-by: Dong Aisheng <aisheng.dong@nxp.com>
>>> ---
>>>  drivers/mmc/host/sdhci.c | 11 +++++++++++
>>>  1 file changed, 11 insertions(+)
>>>
>>> diff --git a/drivers/mmc/host/sdhci.c b/drivers/mmc/host/sdhci.c
>>> index 40e3551..2eb0e34 100644
>>> --- a/drivers/mmc/host/sdhci.c
>>> +++ b/drivers/mmc/host/sdhci.c
>>> @@ -2325,6 +2325,17 @@ static void sdhci_data_irq(struct sdhci_host *host, u32 intmask)
>>>  			}
>>>  		}
>>>  
>>> +		/*
>>> +		 * The "data complete" interrupt is possible to happen a bit
>>> +		 * later than CRC error and data end bit error interrupts
>>> +		 * separately for single block transfer or the last block of
>>> +		 * multiblock transfer. For this case, we DO NOT report the
>>> +		 * weird data interrupt event.
>>> +		 */
>>> +		if ((intmask & SDHCI_INT_DATA_END) &&
>>> +		    (host->mrq && host->mrq->data && host->mrq->data->error))
>>> +			return;
>>
>> This could be generalized a bit more i.e. what about:
>>
>> 		/*
>> 		 * After an error and before the the data circuit is reset in
>> 		 * sdhci_tasklet_finish, we could get more interrupts, but we
>> 		 * already have an error, so ignore them.
>> 		 */
>> 		if (host->mrq && host->mrq->data && host->mrq->data->error)
>> 			return;
>>
> 
> Sounds good to me.
> If you wouldn't mind, i'd like to add a bit more:
> 		/*
> 		 * After an error and before the data circuit is reset in
> 		 * sdhci_tasklet_finish, we could get more interrupts
> 		 * e.g. "data complete" interrupt, but we already have
> 		 * an error, so ignore them.
> 		 */
> 

Fine by me
Ritesh Harjani (IBM) May 26, 2016, 2:59 p.m. UTC | #6
Hi Adrian,


On Tue, May 17, 2016 at 11:28 AM, Adrian Hunter <adrian.hunter@intel.com> wrote:
> On 17/05/16 07:31, Ritesh Harjani wrote:
>> On Tue, May 10, 2016 at 12:21 PM, Adrian Hunter <adrian.hunter@intel.com> wrote:
>>> On 15/04/16 20:29, Dong Aisheng wrote:
>>>> Current code will report the wrong data interrupt got when no
>>>> data operation in progress assumed by getting !host->data in
>>>> sdhci_data_irq().
>>>>
>>>> For a data command handling process, the driver will call
>>>> sdhci_finish_data() and clear host->data in case any data error,
>>>> then card finish_tasklet will do the rest controller reset work.
>>>>
>>>> Before the tasklet got run, however, controllers may report the
>>>> TC(Transfer Complete) interrupt (SDHCI_INT_DATA_END) a bit later
>>>> than data CRC error and data end bit error interrupts for single
>>>> block transfer or the last block of multiblock transfer.
>>
>> What if the controller generates Transfer complete interrupt bit more slower
>> and the tasklet do get a chance to run. In that case even host->mrq
>> will become NULL.
>> Actually if that happens it's even more worse since the driver may
>> start issuing other commands to controller.
>>
>> Are we covering that case here? Do you think that this case is possible?
>
> After data error, the host driver resets the controller so there will be no
> more interrupts relating to that mrq.
Resetting will happen in sdhci_tasklet_finish.
My case was -
1. Controller raised a Data CRC error.
2. sdhci_irq made host->data = NULL in sdhci_finish_data and scheduled
the finish_tasklet.
3. while the tasklet is running and before resetting CMD and Data
machine, TC interrupt is raised by controller.
4. sdhci_irq will call for sdhci_data_irq.

But I get it that the second interrupt will end up printing the same
error since host->data will be found as NULL in sdhci_data_irq and
will return.
So below patch should do the job :)

>
> This patch is avoiding unnecessary warnings until that happens.
Yes.



>
>>
>>
>>>>
>>>> Controller usually detects and generates data CRC/end bit error
>>>> interrupts once one block on the bus is transferred completely.
>>>> For single block transfer, since there's only one bock to transfer,
>>>> the controller will report transfer complete interrupt as well,
>>>> but until the data in controller FIFO has been successfully
>>>> transferred to memory. The time gap of TC and CRC interrupt depends on
>>>> the system busy state at that point and memory bus access speed.
>>>>
>>>> So it is possible when TC interrupt generated, host->data is already
>>>> equal to NULL due to cleared by former CRC/Data End Bit error which
>>>> is reasonable.
>>>>
>>>> Thus we DO NOT report the weird data interrupt event for this case.
>>>>
>>>> Else we may easily see warning below during SD3.0 card manually tuning
>>>> process (calling mmc_send_tuning() which is a single block transfer)
>>>> mmc0: Got data interrupt 0x00000002 even though no data operation was in progress
>>>>
>>>> The detailed command log is as follows:
>>>> [ 1657.920983] mmc0: starting CMD19 arg 00000000 flags 00000035
>>>> [ 1657.921009] mmc0:     blksz 64 blocks 1 flags 00000200 tsac 150 ms nsac 0
>>>> [ 1657.921085] sdhci [sdhci_irq()]: *** mmc0 got interrupt: 0x00200001
>>>> [ 1657.921112] sdhci [sdhci_irq()]: *** mmc0 got interrupt: 0x00000002
>>>> [ 1657.921131] mmc0: Got data interrupt 0x00000002 even though no data operation was in progress.
>>>> [ 1657.929761] sdhci: =========== REGISTER DUMP (mmc0)===========
>>>> [ 1657.929780] sdhci: Sys addr: 0x3d5d6380 | Version:  0x00000002
>>>> [ 1657.929796] sdhci: Blk size: 0x00000040 | Blk cnt:  0x00000001
>>>> [ 1657.929814] sdhci: Argument: 0x00000000 | Trn mode: 0x00000013
>>>> [ 1657.929831] sdhci: Present:  0x01fd8008 | Host ctl: 0x00000023
>>>> [ 1657.929847] sdhci: Power:    0x00000002 | Blk gap:  0x00000080
>>>> [ 1657.929863] sdhci: Wake-up:  0x00000008 | Clock:    0x0000000f
>>>> [ 1657.929879] sdhci: Timeout:  0x0000000f | Int stat: 0x00000000
>>>> [ 1657.929896] sdhci: Int enab: 0x107f008b | Sig enab: 0x107f008b
>>>> [ 1657.929914] sdhci: AC12 err: 0x00000000 | Slot int: 0x00000003
>>>> [ 1657.929932] sdhci: Caps:     0x07eb0000 | Caps_1:   0x00002007
>>>> [ 1657.929949] sdhci: Cmd:      0x0000133a | Max curr: 0x00ffffff
>>>> [ 1657.929965] sdhci: Host ctl2: 0x000000c8
>>>> [ 1657.929981] sdhci: ADMA Err: 0x00000000 | ADMA Ptr: 0x8f042208
>>>> [ 1657.929995] sdhci: ===========================================
>>>> [ 1657.930156] mmc0: req done (CMD19): 0: 00000900 00000000 00000000 00000000
>>>> [ 1657.930179] mmc0:     0 bytes transferred: -84
>>>>
>>>> It shows we first have a data CRC error interrupt then a data transfer
>>>> complete interrupt.
>>>> Then we got the !host->data case in sdhci_data_irq().
>>>>
>>>> CC: stable <stable@vger.kernel.org>
>>>> Signed-off-by: Dong Aisheng <aisheng.dong@nxp.com>
>>>> ---
>>>>  drivers/mmc/host/sdhci.c | 11 +++++++++++
>>>>  1 file changed, 11 insertions(+)
>>>>
>>>> diff --git a/drivers/mmc/host/sdhci.c b/drivers/mmc/host/sdhci.c
>>>> index 40e3551..2eb0e34 100644
>>>> --- a/drivers/mmc/host/sdhci.c
>>>> +++ b/drivers/mmc/host/sdhci.c
>>>> @@ -2325,6 +2325,17 @@ static void sdhci_data_irq(struct sdhci_host *host, u32 intmask)
>>>>                       }
>>>>               }
>>>>
>>>> +             /*
>>>> +              * The "data complete" interrupt is possible to happen a bit
>>>> +              * later than CRC error and data end bit error interrupts
>>>> +              * separately for single block transfer or the last block of
>>>> +              * multiblock transfer. For this case, we DO NOT report the
>>>> +              * weird data interrupt event.
>>>> +              */
>>>> +             if ((intmask & SDHCI_INT_DATA_END) &&
>>>> +                 (host->mrq && host->mrq->data && host->mrq->data->error))
>>>> +                     return;
>>>
>>> This could be generalized a bit more i.e. what about:
>>>
>>>                 /*
>>>                  * After an error and before the the data circuit is reset in
>>>                  * sdhci_tasklet_finish, we could get more interrupts, but we
>>>                  * already have an error, so ignore them.
>>>                  */
>>>                 if (host->mrq && host->mrq->data && host->mrq->data->error)
>>>                         return;
>>>
>>>> +
>>>>               pr_err("%s: Got data interrupt 0x%08x even though no data operation was in progress.\n",
>>>>                      mmc_hostname(host->mmc), (unsigned)intmask);
>>>>               sdhci_dumpregs(host);
>>>>
>>>
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-mmc" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>
diff mbox

Patch

diff --git a/drivers/mmc/host/sdhci.c b/drivers/mmc/host/sdhci.c
index 40e3551..2eb0e34 100644
--- a/drivers/mmc/host/sdhci.c
+++ b/drivers/mmc/host/sdhci.c
@@ -2325,6 +2325,17 @@  static void sdhci_data_irq(struct sdhci_host *host, u32 intmask)
 			}
 		}
 
+		/*
+		 * The "data complete" interrupt is possible to happen a bit
+		 * later than CRC error and data end bit error interrupts
+		 * separately for single block transfer or the last block of
+		 * multiblock transfer. For this case, we DO NOT report the
+		 * weird data interrupt event.
+		 */
+		if ((intmask & SDHCI_INT_DATA_END) &&
+		    (host->mrq && host->mrq->data && host->mrq->data->error))
+			return;
+
 		pr_err("%s: Got data interrupt 0x%08x even though no data operation was in progress.\n",
 		       mmc_hostname(host->mmc), (unsigned)intmask);
 		sdhci_dumpregs(host);