[3/3] arm64: dts: rockchip: Disable DCMDs on RK3399's eMMC controller.

Message ID 20190301164349.60589-3-christoph.muellner@theobroma-systems.com (mailing list archive)
State New, archived
Series [1/3] dt-bindings: mmc: Add DTS property to disable DCMDs on Arasan controllers

Commit Message

Christoph Muellner March 1, 2019, 4:43 p.m. UTC
When using direct commands (DCMDs) on an RK3399, we get spurious
CQE completion interrupts for the DCMD transaction slot (#31):

[  931.196520] ------------[ cut here ]------------
[  931.201702] mmc1: cqhci: spurious TCN for tag 31
[  931.206906] WARNING: CPU: 0 PID: 1433 at /usr/src/kernel/drivers/mmc/host/cqhci.c:725 cqhci_irq+0x2e4/0x490
[  931.206909] Modules linked in:
[  931.206918] CPU: 0 PID: 1433 Comm: irq/29-mmc1 Not tainted 4.19.8-rt6-funkadelic #1
[  931.206920] Hardware name: Theobroma Systems RK3399-Q7 SoM (DT)
[  931.206924] pstate: 40000005 (nZcv daif -PAN -UAO)
[  931.206927] pc : cqhci_irq+0x2e4/0x490
[  931.206931] lr : cqhci_irq+0x2e4/0x490
[  931.206933] sp : ffff00000e54bc80
[  931.206934] x29: ffff00000e54bc80 x28: 0000000000000000
[  931.206939] x27: 0000000000000001 x26: ffff000008f217e8
[  931.206944] x25: ffff8000f02ef030 x24: ffff0000091417b0
[  931.206948] x23: ffff0000090aa000 x22: ffff8000f008b000
[  931.206953] x21: 0000000000000002 x20: 000000000000001f
[  931.206957] x19: ffff8000f02ef018 x18: ffffffffffffffff
[  931.206961] x17: 0000000000000000 x16: 0000000000000000
[  931.206966] x15: ffff0000090aa6c8 x14: 0720072007200720
[  931.206970] x13: 0720072007200720 x12: 0720072007200720
[  931.206975] x11: 0720072007200720 x10: 0720072007200720
[  931.206980] x9 : 0720072007200720 x8 : 0720072007200720
[  931.206984] x7 : 0720073107330720 x6 : 00000000000005a0
[  931.206988] x5 : ffff00000860d4b0 x4 : 0000000000000000
[  931.206993] x3 : 0000000000000001 x2 : 0000000000000001
[  931.206997] x1 : 1bde3a91b0d4d900 x0 : 0000000000000000
[  931.207001] Call trace:
[  931.207005]  cqhci_irq+0x2e4/0x490
[  931.207009]  sdhci_arasan_cqhci_irq+0x5c/0x90
[  931.207013]  sdhci_irq+0x98/0x930
[  931.207019]  irq_forced_thread_fn+0x2c/0xa0
[  931.207023]  irq_thread+0x114/0x1c0
[  931.207027]  kthread+0x128/0x130
[  931.207032]  ret_from_fork+0x10/0x20
[  931.207035] ---[ end trace 0000000000000002 ]---

The driver reports only the first spurious interrupt because it uses
WARN_ONCE(). Changing this to WARN() shows that the interrupt occurs
quite frequently (up to once a second).

Since the eMMC 5.1 specification, where CQE and CQHCI are specified,
does not say that spurious TCN interrupts for DCMDs can simply be
ignored, we must assume that this feature does not work reliably.

The current implementation uses DCMD for REQ_OP_FLUSH only, and
I could not see any performance/power impact when disabling
this optional feature for RK3399.

Therefore this patch disables DCMDs for RK3399.

Signed-off-by: Christoph Muellner <christoph.muellner@theobroma-systems.com>
Signed-off-by: Philipp Tomsich <philipp.tomsich@theobroma-systems.com>
---
 arch/arm64/boot/dts/rockchip/rk3399.dtsi | 1 +
 1 file changed, 1 insertion(+)

Comments

Shawn Lin March 2, 2019, 12:47 a.m. UTC | #1
On 2019/3/2 0:43, Christoph Muellner wrote:
> When using direct commands (DCMDs) on an RK3399, we get spurious
> CQE completion interrupts for the DCMD transaction slot (#31):

I haven't seen this. Have you tried any newer code, for instance linux-next?

> 
> The driver reports only the first spurious interrupt because it uses
> WARN_ONCE(). Changing this to WARN() shows that the interrupt occurs
> quite frequently (up to once a second).
> 
> Since the eMMC 5.1 specification, where CQE and CQHCI are specified,
> does not say that spurious TCN interrupts for DCMDs can simply be
> ignored, we must assume that this feature does not work reliably.
> 
> The current implementation uses DCMD for REQ_OP_FLUSH only, and
> I could not see any performance/power impact when disabling
> this optional feature for RK3399.
> 
> Therefore this patch disables DCMDs for RK3399.

We need to sort out the problem and see whether it can be solved;
otherwise we simply remove MMC_CAP2_CQE_DCMD from sdhci-of-arasan.

Christoph Muellner March 2, 2019, 8:26 a.m. UTC | #2
Hi Shawn,

On 3/2/19 1:47 AM, Shawn Lin wrote:
> On 2019/3/2 0:43, Christoph Muellner wrote:
>> When using direct commands (DCMDs) on an RK3399, we get spurious
>> CQE completion interrupts for the DCMD transaction slot (#31):
> 
> I haven't seen this. Have you tried any newer code, for instance linux-next?

I can reproduce this with all kernel versions from 4.16 up to
linus/master. So all kernels with the cqhci driver (which was merged
for 4.15) are affected.

All I need to do to reproduce the issue is boot the system with the
root file system on the eMMC. I use a rootfs based on Debian stable.

> 
>>
>> The driver reports only the first spurious interrupt because it uses
>> WARN_ONCE(). Changing this to WARN() shows that the interrupt occurs
>> quite frequently (up to once a second).
>>
>> Since the eMMC 5.1 specification, where CQE and CQHCI are specified,
>> does not say that spurious TCN interrupts for DCMDs can simply be
>> ignored, we must assume that this feature does not work reliably.
>>
>> The current implementation uses DCMD for REQ_OP_FLUSH only, and
>> I could not see any performance/power impact when disabling
>> this optional feature for RK3399.
>>
>> Therefore this patch disables DCMDs for RK3399.
> 
> We need to sort out the problem and see whether it can be solved;
> otherwise we simply remove MMC_CAP2_CQE_DCMD from sdhci-of-arasan.

I fully agree that we should address this in the driver
if the driver turns out to be buggy.

Therefore I debugged the issue, using an event log based on
atomic_t variables to observe what is going on.
The controller does indeed raise a second, spurious interrupt
every now and then, i.e. an interrupt for a slot whose doorbell
bit was not set beforehand.
Only slot #31 is affected (so only DCMDs),
and only if DCMD support is enabled.
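
For illustration, the kind of bookkeeping described above can be
modelled in user space. This is a minimal sketch, not the
instrumentation actually used: the names cq_model, cq_event_log,
cq_issue and cq_completion are hypothetical, C11 atomics stand in for
the kernel's atomic_t, and a pending bitmask stands in for the doorbell
state. A completion notification for a slot whose bit is not set is
counted as spurious.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_SLOTS 32	/* number of task slots modelled here */
#define DCMD_SLOT 31	/* slot used for direct commands, as in the trace */

/* Hypothetical event log: one counter per slot and event type. */
struct cq_event_log {
	atomic_uint issued[NUM_SLOTS];		/* task queued, doorbell set */
	atomic_uint completed[NUM_SLOTS];	/* TCN matched a pending task */
	atomic_uint spurious[NUM_SLOTS];	/* TCN without a pending task */
};

struct cq_model {
	atomic_uint_least32_t pending;	/* bit n set: slot n is in flight */
	struct cq_event_log log;
};

/* Record that a task was queued on the given slot. */
static void cq_issue(struct cq_model *cq, unsigned int slot)
{
	atomic_fetch_or(&cq->pending, UINT32_C(1) << slot);
	atomic_fetch_add(&cq->log.issued[slot], 1);
}

/*
 * Handle a completion notification for the given slot.
 * Returns true if it matched a pending task, false if it was spurious.
 */
static bool cq_completion(struct cq_model *cq, unsigned int slot)
{
	uint_least32_t bit = UINT32_C(1) << slot;
	uint_least32_t old = atomic_fetch_and(&cq->pending, ~bit);

	if (old & bit) {
		atomic_fetch_add(&cq->log.completed[slot], 1);
		return true;
	}

	atomic_fetch_add(&cq->log.spurious[slot], 1);
	return false;
}
```

In this model, issuing one task on DCMD_SLOT and then reporting two
completions for it leaves one count in completed[31] and one in
spurious[31], which is the pattern described for the DCMD slot.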

I disagree that we should disable it for sdhci-of-arasan (i.e. for all
Arasan eMMC 5.1 based controllers), because I cannot say that all
Arasan eMMC 5.1 based implementations are affected.
I only know that the one in the RK3399 is affected (mainly because I
don't have access to other devices with this IP core). Therefore the
series disables it for RK3399 only.

Thanks,
Christoph


Patch

diff --git a/arch/arm64/boot/dts/rockchip/rk3399.dtsi b/arch/arm64/boot/dts/rockchip/rk3399.dtsi
index 6cc1c9fa4ea6..1bbf0da4e01d 100644
--- a/arch/arm64/boot/dts/rockchip/rk3399.dtsi
+++ b/arch/arm64/boot/dts/rockchip/rk3399.dtsi
@@ -333,6 +333,7 @@ 
 		phys = <&emmc_phy>;
 		phy-names = "phy_arasan";
 		power-domains = <&power RK3399_PD_EMMC>;
+		disable-cqe-dcmd;
 		status = "disabled";
 	};
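
As a rough sketch of how a host driver could honour a boolean property
such as disable-cqe-dcmd, the logic amounts to advertising the DCMD
capability only when the property is absent. The code below is a
user-space model under stated assumptions, not the driver change from
this series (which is not shown here): the function name cqe_caps and
the bit values are hypothetical; in the kernel, the capability
discussed in this thread is MMC_CAP2_CQE_DCMD.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit values for this sketch; the kernel defines its own. */
#define CAP2_CQE	(UINT32_C(1) << 0)	/* command queue engine usable */
#define CAP2_CQE_DCMD	(UINT32_C(1) << 1)	/* direct commands usable */

/*
 * Compute the CQE-related capability bits for a host.
 * has_cqe:          the controller provides a command queue engine
 * disable_cqe_dcmd: the device tree node carries "disable-cqe-dcmd"
 */
static uint32_t cqe_caps(bool has_cqe, bool disable_cqe_dcmd)
{
	uint32_t caps2 = 0;

	if (!has_cqe)
		return 0;

	caps2 |= CAP2_CQE;
	if (!disable_cqe_dcmd)
		caps2 |= CAP2_CQE_DCMD;

	return caps2;
}
```

With this shape, a controller whose node carries the property keeps
command queueing but never uses slot 31 for direct commands, while
other users of the same IP core are unaffected.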