diff mbox series

mmc: sdhci-pci: disable intel voltage switch if unsupported

Message ID 20181018102112.31054-1-anisse@astier.eu (mailing list archive)
State New, archived
Headers show
Series mmc: sdhci-pci: disable intel voltage switch if unsupported | expand

Commit Message

Anisse Astier Oct. 18, 2018, 10:21 a.m. UTC
If we don't have the voltage switch DSM methods available, there's no
point in advertising to the rest of the kernel that we support 1.8V, or
support voltage switch at all.

This fixes an issue on a Gemini Lake (GLK) laptop : eMMC driver will
timeout on boot (from 60seconds to 10minutes ) as the cqhci attempts CQE
recovery after a failed voltage switch. In earlier kernels, the problem
existed, but only delayed boot for about 10 seconds after an I/O error,
allowing booting on the eMMC (almost) unnoticed.

Signed-off-by: Anisse Astier <anisse@astier.eu>
---
Hi,

I'm looking forward to your feedback on this patch. Is it safe to
have no voltage switch method ? Should the quirk be used or skipped
entirely ?

It seemed to work well in my tests, as the mmc core works properly.

Regards,

Anisse

 drivers/mmc/host/sdhci-pci-core.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

Comments

Adrian Hunter Oct. 19, 2018, 7:07 a.m. UTC | #1
On 18/10/18 1:21 PM, Anisse Astier wrote:
> If we don't have the voltage switch DSM methods available, there's no
> point in advertising to the rest of the kernel that we support 1.8V, or
> support voltage switch at all.
> 
> This fixes an issue on a Gemini Lake (GLK) laptop : eMMC driver will
> timeout on boot (from 60seconds to 10minutes ) as the cqhci attempts CQE
> recovery after a failed voltage switch. In earlier kernels, the problem
> existed, but only delayed boot for about 10 seconds after an I/O error,
> allowing booting on the eMMC (almost) unnoticed.

Can you send the kernel messages?  Which kernel is it?  Which laptop?  An
acpidump might help too.

> 
> Signed-off-by: Anisse Astier <anisse@astier.eu>
> ---
> Hi,
> 
> I'm looking forward to your feedback on this patch. Is it safe to
> have no voltage switch method ? Should the quirk be used or skipped
> entirely ?

The eMMC is typically 1.8V only and the voltage switch does nothing.  I have
been meaning to tidy it up, but it seemed like it was working so it didn't
get much priority, until now.

> 
> It seemed to work well in my tests, as the mmc core works properly.
> 
> Regards,
> 
> Anisse
> 
>  drivers/mmc/host/sdhci-pci-core.c | 17 +++++++++++++++++
>  1 file changed, 17 insertions(+)
> 
> diff --git a/drivers/mmc/host/sdhci-pci-core.c b/drivers/mmc/host/sdhci-pci-core.c
> index 7bfd366d970d..8fac8cbc5ce8 100644
> --- a/drivers/mmc/host/sdhci-pci-core.c
> +++ b/drivers/mmc/host/sdhci-pci-core.c
> @@ -706,10 +706,27 @@ static int intel_execute_tuning(struct mmc_host *mmc, u32 opcode)
>  static void byt_probe_slot(struct sdhci_pci_slot *slot)
>  {
>  	struct mmc_host_ops *ops = &slot->host->mmc_host_ops;
> +	struct intel_host *intel_host = sdhci_pci_priv(slot);
>  
>  	byt_read_dsm(slot);
>  
>  	ops->execute_tuning = intel_execute_tuning;
> +
> +	/* Check if we have the appropriate voltage switch DSM methods */
> +	if (!(intel_host->dsm_fns & (1 << INTEL_DSM_V18_SWITCH))) {
> +		pr_info("%s: no DSM function for 1.8 voltage switch\n",
> +			 mmc_hostname(slot->host->mmc));
> +		slot->host->quirks2 |= SDHCI_QUIRK2_NO_1_8_V;
> +
> +		if (!(intel_host->dsm_fns & (1 << INTEL_DSM_V33_SWITCH))) {
> +			/* No voltage switching supported at all, there's no
> +			 * point in installing the callback: return.
> +			 */
> +			pr_info("%s: Voltage switching unsupported\n",
> +				mmc_hostname(slot->host->mmc));
> +			return;
> +		}
> +	}
>  	ops->start_signal_voltage_switch = intel_start_signal_voltage_switch;
>  }
>  
>
Adrian Hunter Oct. 19, 2018, 1:19 p.m. UTC | #2
On 19/10/18 12:26 PM, Anisse Astier wrote:
> Hi Adrian,
> 
> On Fri, Oct 19, 2018 at 10:07:38AM +0300, Adrian Hunter wrote:
>> On 18/10/18 1:21 PM, Anisse Astier wrote:
>>> If we don't have the voltage switch DSM methods available, there's no
>>> point in advertising to the rest of the kernel that we support 1.8V, or
>>> support voltage switch at all.
>>>
>>> This fixes an issue on a Gemini Lake (GLK) laptop : eMMC driver will
>>> timeout on boot (from 60seconds to 10minutes ) as the cqhci attempts CQE
>>> recovery after a failed voltage switch. In earlier kernels, the problem
>>> existed, but only delayed boot for about 10 seconds after an I/O error,
>>> allowing booting on the eMMC (almost) unnoticed.
>>
>> Can you send the kernel messages?  Which kernel is it?  Which laptop?  An
>> acpidump might help too.
> 
> You're right, I should have started with this. I have attached various
> dmesg traces:
>  - dmesg-4.18.3-CQE-traces.txt : the original issue that was
>    encountered, it shows the multiple CQE recovery timeouts, each taking
>    about 60s
>  - dmesg-4.19-rc8.noquirk.txt : a boot where the CQE recovery works, and
>    only an I/O error is shown. I've reduced it to the mmc/sdhci traces.
>  - dmesg-4.19-rc8.noquirk-with-error.txt : a boot where it fails, this
>    is similar to the first one, but with a more recent kernel
>  - dmesg-4.19-rc8.withquirk.txt : a boot with sdhci.debug_quirks2=0x90c
>    on the command line. I've added the SDHCI_QUIRK2_NO_1_8_V quirk to
>    the other ones present in the driver. You can see there's no CQE
>    recovery or mmc I/O errors.
> 
> 
> I've reproduced the issue with linux 4.17, 4.18 and 4.19-rc8. The laptop
> is a noname laptop with an Insyde EFI firmware (Manufacturer: Notebook,
> Product Name: N75_77GU).
> 
> You'll find the acpidump in the attachment. I've looked into another GLK
> laptop's tables, and the implemented acpi methods seem to only do a
> sleep(), which isn't really helpful. I've tried to add a similmar msleep
> in the voltage_switch function, but it didn't seem to help.

Please try disabling CQE i.e.

diff --git a/drivers/mmc/host/sdhci-pci-core.c b/drivers/mmc/host/sdhci-pci-core.c
index e53333c695b3..c0f8348f75f7 100644
--- a/drivers/mmc/host/sdhci-pci-core.c
+++ b/drivers/mmc/host/sdhci-pci-core.c
@@ -732,7 +732,7 @@ static int glk_emmc_probe_slot(struct sdhci_pci_slot *slot)
 {
 	int ret = byt_emmc_probe_slot(slot);

-	slot->host->mmc->caps2 |= MMC_CAP2_CQE;
+	//slot->host->mmc->caps2 |= MMC_CAP2_CQE;

 	if (slot->chip->pdev->device != PCI_DEVICE_ID_INTEL_GLK_EMMC) {
 		slot->host->mmc->caps2 |= MMC_CAP2_HS400_ES,


> 
>>
>>>
>>> Signed-off-by: Anisse Astier <anisse@astier.eu>
>>> ---
>>> Hi,
>>>
>>> I'm looking forward to your feedback on this patch. Is it safe to
>>> have no voltage switch method ? Should the quirk be used or skipped
>>> entirely ?
>>
>> The eMMC is typically 1.8V only and the voltage switch does nothing.  I have
>> been meaning to tidy it up, but it seemed like it was working so it didn't
>> get much priority, until now.
> 
> Answering my own question, only the quirk is important; having a voltage
> switch callback or not has no impact.
> 
> The kernel seems to assume that it's started at 3.3V, but maybe this is
> the issue then ?
> 
> Regards,
> 
> Anisse
>
Anisse Astier Oct. 19, 2018, 1:42 p.m. UTC | #3
On Fri, Oct 19, 2018 at 04:19:39PM +0300, Adrian Hunter wrote:
> On 19/10/18 12:26 PM, Anisse Astier wrote:
> > Hi Adrian,
> > 
> > On Fri, Oct 19, 2018 at 10:07:38AM +0300, Adrian Hunter wrote:
> >> On 18/10/18 1:21 PM, Anisse Astier wrote:
> >>> If we don't have the voltage switch DSM methods available, there's no
> >>> point in advertising to the rest of the kernel that we support 1.8V, or
> >>> support voltage switch at all.
> >>>
> >>> This fixes an issue on a Gemini Lake (GLK) laptop : eMMC driver will
> >>> timeout on boot (from 60seconds to 10minutes ) as the cqhci attempts CQE
> >>> recovery after a failed voltage switch. In earlier kernels, the problem
> >>> existed, but only delayed boot for about 10 seconds after an I/O error,
> >>> allowing booting on the eMMC (almost) unnoticed.
> >>
> >> Can you send the kernel messages?  Which kernel is it?  Which laptop?  An
> >> acpidump might help too.
> > 
> > You're right, I should have started with this. I have attached various
> > dmesg traces:
> >  - dmesg-4.18.3-CQE-traces.txt : the original issue that was
> >    encountered, it shows the multiple CQE recovery timeouts, each taking
> >    about 60s
> >  - dmesg-4.19-rc8.noquirk.txt : a boot where the CQE recovery works, and
> >    only an I/O error is shown. I've reduced it to the mmc/sdhci traces.
> >  - dmesg-4.19-rc8.noquirk-with-error.txt : a boot where it fails, this
> >    is similar to the first one, but with a more recent kernel
> >  - dmesg-4.19-rc8.withquirk.txt : a boot with sdhci.debug_quirks2=0x90c
> >    on the command line. I've added the SDHCI_QUIRK2_NO_1_8_V quirk to
> >    the other ones present in the driver. You can see there's no CQE
> >    recovery or mmc I/O errors.
> > 
> > 
> > I've reproduced the issue with linux 4.17, 4.18 and 4.19-rc8. The laptop
> > is a noname laptop with an Insyde EFI firmware (Manufacturer: Notebook,
> > Product Name: N75_77GU).
> > 
> > You'll find the acpidump in the attachment. I've looked into another GLK
> > laptop's tables, and the implemented acpi methods seem to only do a
> > sleep(), which isn't really helpful. I've tried to add a similmar msleep
> > in the voltage_switch function, but it didn't seem to help.
> 
> Please try disabling CQE i.e.
> 
> diff --git a/drivers/mmc/host/sdhci-pci-core.c b/drivers/mmc/host/sdhci-pci-core.c
> index e53333c695b3..c0f8348f75f7 100644
> --- a/drivers/mmc/host/sdhci-pci-core.c
> +++ b/drivers/mmc/host/sdhci-pci-core.c
> @@ -732,7 +732,7 @@ static int glk_emmc_probe_slot(struct sdhci_pci_slot *slot)
>  {
>  	int ret = byt_emmc_probe_slot(slot);
> 
> -	slot->host->mmc->caps2 |= MMC_CAP2_CQE;
> +	//slot->host->mmc->caps2 |= MMC_CAP2_CQE;
> 
>  	if (slot->chip->pdev->device != PCI_DEVICE_ID_INTEL_GLK_EMMC) {
>  		slot->host->mmc->caps2 |= MMC_CAP2_HS400_ES,
> 

I did that, and while we don't have the long CQE timeouts, we still have
an I/O error:

[    0.468934] PCI: MMCONFIG for domain 0000 [bus 00-3f] at [mem 0xe0000000-0xe3ffffff] (base 0xe0000000)
[    0.468934] PCI: MMCONFIG at [mem 0xe0000000-0xe3ffffff] reserved in E820
[    0.621318] acpi PNP0A08:00: [Firmware Info]: MMCONFIG for domain 0000 [bus 00-3f] only partially covers this bridge
[    3.727365] sdhci: Secure Digital Host Controller Interface driver
[    3.727365] sdhci: Copyright(c) Pierre Ossman
[    3.729638] sdhci-pci 0000:00:1c.0: SDHCI controller found [8086:31cc] (rev 3)
[    3.731801] mmc0: no DSM function for 1.8 voltage switch
[    3.731802] mmc0: Voltage switching unsupported
[    3.731872] mmc0: CQHCI version 5.10
[    3.735646] mmc0: SDHCI controller on PCI [0000:00:1c.0] using ADMA 64-bit
[    3.816485] mmc0: new HS400 MMC card at address 0001
[    3.819219] mmcblk0: mmc0:0001 M52532 29.1 GiB 
[    3.819635] mmcblk0boot0: mmc0:0001 M52532 partition 1 4.00 MiB
[    3.820056] mmcblk0boot1: mmc0:0001 M52532 partition 2 4.00 MiB
[    3.820225] mmcblk0rpmb: mmc0:0001 M52532 partition 3 4.00 MiB, chardev (247:0)
[    3.823106]  mmcblk0: p1 p2 p3 p4 p5 p6
[    4.135118] bcache: register_cache() registered cache device mmcblk0p6
[    5.627275] sr 1:0:0:0: [sr0] scsi3-mmc drive: 24x/24x writer dvd-ram cd/rw xa/form2 cdda tray
[   16.251559] mmc0: Timeout waiting for hardware interrupt.
[   16.251572] mmc0: sdhci: ============ SDHCI REGISTER DUMP ===========
[   16.251581] mmc0: sdhci: Sys addr:  0x00000020 | Version:  0x00001002
[   16.251589] mmc0: sdhci: Blk size:  0x00007200 | Blk cnt:  0x00000020
[   16.251596] mmc0: sdhci: Argument:  0x000f4000 | Trn mode: 0x0000003b
[   16.251603] mmc0: sdhci: Present:   0x1fff0206 | Host ctl: 0x0000003d
[   16.251609] mmc0: sdhci: Power:     0x0000000b | Blk gap:  0x00000080
[   16.251616] mmc0: sdhci: Wake-up:   0x00000000 | Clock:    0x00000007
[   16.251623] mmc0: sdhci: Timeout:   0x00000006 | Int stat: 0x00000000
[   16.251630] mmc0: sdhci: Int enab:  0x02ff000b | Sig enab: 0x02ff000b
[   16.251637] mmc0: sdhci: AC12 err:  0x00000000 | Slot int: 0x00000000
[   16.251644] mmc0: sdhci: Caps:      0x546ec881 | Caps_1:   0x80000807
[   16.251651] mmc0: sdhci: Cmd:       0x0000123a | Max curr: 0x00000000
[   16.251658] mmc0: sdhci: Resp[0]:   0x00000800 | Resp[1]:  0x00000000
[   16.251665] mmc0: sdhci: Resp[2]:   0x00000000 | Resp[3]:  0x00000900
[   16.251671] mmc0: sdhci: Host ctl2: 0x0000000d
[   16.251679] mmc0: sdhci: ADMA Err:  0x00000000 | ADMA Ptr: 0x0000000173126200
[   16.251682] mmc0: sdhci: ============================================
[   16.252462] mmc0: mmc_hs400_to_hs200 failed, error -84
[   16.253487] mmcblk0: error -84 requesting status
[   16.253856] mmc0: mmc_hs400_to_hs200 failed, error -84
[   16.254111] mmc0: cache flush error -84
[   16.370768] EXT4-fs (mmcblk0p2): mounted filesystem with ordered data mode. Opts: (null)
[   17.310793] EXT4-fs (mmcblk0p2): re-mounted. Opts: errors=remount-ro
[   17.632001] bcache: register_bcache() error /dev/mmcblk0p6: device already registered
[   18.135208] Adding 524284k swap on /dev/mmcblk0p3.  Priority:-2 extents:1 across:524284k SSFS
[   18.818304] EXT4-fs (mmcblk0p4): mounted filesystem with ordered data mode. Opts: errors=remount-ro
[   18.818848] EXT4-fs (mmcblk0p5): mounted filesystem with ordered data mode. Opts: (null)


Anisse
Anisse Astier Oct. 19, 2018, 2:13 p.m. UTC | #4
On Fri, Oct 19, 2018 at 03:42:37PM +0200, Anisse Astier wrote:
> On Fri, Oct 19, 2018 at 04:19:39PM +0300, Adrian Hunter wrote:
> > On 19/10/18 12:26 PM, Anisse Astier wrote:
> > > Hi Adrian,
> > > 
> > > On Fri, Oct 19, 2018 at 10:07:38AM +0300, Adrian Hunter wrote:
> > >> On 18/10/18 1:21 PM, Anisse Astier wrote:
> > >>> If we don't have the voltage switch DSM methods available, there's no
> > >>> point in advertising to the rest of the kernel that we support 1.8V, or
> > >>> support voltage switch at all.
> > >>>
> > >>> This fixes an issue on a Gemini Lake (GLK) laptop : eMMC driver will
> > >>> timeout on boot (from 60seconds to 10minutes ) as the cqhci attempts CQE
> > >>> recovery after a failed voltage switch. In earlier kernels, the problem
> > >>> existed, but only delayed boot for about 10 seconds after an I/O error,
> > >>> allowing booting on the eMMC (almost) unnoticed.
> > >>
> > >> Can you send the kernel messages?  Which kernel is it?  Which laptop?  An
> > >> acpidump might help too.
> > > 
> > > You're right, I should have started with this. I have attached various
> > > dmesg traces:
> > >  - dmesg-4.18.3-CQE-traces.txt : the original issue that was
> > >    encountered, it shows the multiple CQE recovery timeouts, each taking
> > >    about 60s
> > >  - dmesg-4.19-rc8.noquirk.txt : a boot where the CQE recovery works, and
> > >    only an I/O error is shown. I've reduced it to the mmc/sdhci traces.
> > >  - dmesg-4.19-rc8.noquirk-with-error.txt : a boot where it fails, this
> > >    is similar to the first one, but with a more recent kernel
> > >  - dmesg-4.19-rc8.withquirk.txt : a boot with sdhci.debug_quirks2=0x90c
> > >    on the command line. I've added the SDHCI_QUIRK2_NO_1_8_V quirk to
> > >    the other ones present in the driver. You can see there's no CQE
> > >    recovery or mmc I/O errors.
> > > 
> > > 
> > > I've reproduced the issue with linux 4.17, 4.18 and 4.19-rc8. The laptop
> > > is a noname laptop with an Insyde EFI firmware (Manufacturer: Notebook,
> > > Product Name: N75_77GU).
> > > 
> > > You'll find the acpidump in the attachment. I've looked into another GLK
> > > laptop's tables, and the implemented acpi methods seem to only do a
> > > sleep(), which isn't really helpful. I've tried to add a similmar msleep
> > > in the voltage_switch function, but it didn't seem to help.
> > 
> > Please try disabling CQE i.e.
> > 

The errors might be chained as well, each timeout taking about 10
seconds (instead of 60s with CQE), and delaying the boot:

[    0.512744] PCI: MMCONFIG for domain 0000 [bus 00-3f] at [mem 0xe0000000-0xe3ffffff] (base 0xe0000000)
[    0.512744] PCI: MMCONFIG at [mem 0xe0000000-0xe3ffffff] reserved in E820
[    0.665246] acpi PNP0A08:00: [Firmware Info]: MMCONFIG for domain 0000 [bus 00-3f] only partially covers this bridge
[    4.145041] sdhci: Secure Digital Host Controller Interface driver
[    4.145043] sdhci: Copyright(c) Pierre Ossman
[    4.146450] sdhci-pci 0000:00:1c.0: SDHCI controller found [8086:31cc] (rev 3)
[    4.146681] mmc0: no DSM function for 1.8 voltage switch
[    4.146682] mmc0: Voltage switching unsupported
[    4.146737] mmc0: CQHCI version 5.10
[    4.151399] mmc0: SDHCI controller on PCI [0000:00:1c.0] using ADMA 64-bit
[    4.248620] mmc0: new HS400 MMC card at address 0001
[    4.249722] mmcblk0: mmc0:0001 M52532 29.1 GiB
[    4.249944] mmcblk0boot0: mmc0:0001 M52532 partition 1 4.00 MiB
[    4.250181] mmcblk0boot1: mmc0:0001 M52532 partition 2 4.00 MiB
[    4.250275] mmcblk0rpmb: mmc0:0001 M52532 partition 3 4.00 MiB, chardev (247:0)
[    4.252688]  mmcblk0: p1 p2 p3 p4 p5 p6
[    4.555256] bcache: register_cache() registered cache device mmcblk0p6
[    5.990918] sr 1:0:0:0: [sr0] scsi3-mmc drive: 24x/24x writer dvd-ram cd/rw xa/form2 cdda tray
[   16.535288] mmc0: Timeout waiting for hardware interrupt.
[   16.535300] mmc0: sdhci: ============ SDHCI REGISTER DUMP ===========
[   16.535308] mmc0: sdhci: Sys addr:  0x00000020 | Version:  0x00001002
[   16.535316] mmc0: sdhci: Blk size:  0x00007200 | Blk cnt:  0x00000020
[   16.535322] mmc0: sdhci: Argument:  0x000f4000 | Trn mode: 0x0000003b
[   16.535329] mmc0: sdhci: Present:   0x1fff0001 | Host ctl: 0x0000003d
[   16.535336] mmc0: sdhci: Power:     0x0000000b | Blk gap:  0x00000080
[   16.535343] mmc0: sdhci: Wake-up:   0x00000000 | Clock:    0x00000007
[   16.535350] mmc0: sdhci: Timeout:   0x00000006 | Int stat: 0x00000000
[   16.535358] mmc0: sdhci: Int enab:  0x02ff000b | Sig enab: 0x02ff000b
[   16.535364] mmc0: sdhci: AC12 err:  0x00000000 | Slot int: 0x00000000
[   16.535370] mmc0: sdhci: Caps:      0x546ec881 | Caps_1:   0x80000807
[   16.535377] mmc0: sdhci: Cmd:       0x0000123a | Max curr: 0x00000000
[   16.535384] mmc0: sdhci: Resp[0]:   0x00000900 | Resp[1]:  0x00000000
[   16.535391] mmc0: sdhci: Resp[2]:   0x00000000 | Resp[3]:  0x00000000
[   16.535397] mmc0: sdhci: Host ctl2: 0x0000000d
[   16.535405] mmc0: sdhci: ADMA Err:  0x00000000 | ADMA Ptr: 0x0000000172c5b200
[   16.535408] mmc0: sdhci: ============================================
[   26.775373] mmc0: Timeout waiting for hardware interrupt.
[   26.775386] mmc0: sdhci: ============ SDHCI REGISTER DUMP ===========
[   26.775396] mmc0: sdhci: Sys addr:  0x00000008 | Version:  0x00001002
[   26.775403] mmc0: sdhci: Blk size:  0x00007200 | Blk cnt:  0x00000008
[   26.775410] mmc0: sdhci: Argument:  0x00cf3f80 | Trn mode: 0x0000003b
[   26.775417] mmc0: sdhci: Present:   0x1fff0001 | Host ctl: 0x0000003d
[   26.775424] mmc0: sdhci: Power:     0x0000000b | Blk gap:  0x00000080
[   26.775430] mmc0: sdhci: Wake-up:   0x00000000 | Clock:    0x00000007
[   26.775437] mmc0: sdhci: Timeout:   0x00000006 | Int stat: 0x00000000
[   26.775444] mmc0: sdhci: Int enab:  0x02ff000b | Sig enab: 0x02ff000b
[   26.775451] mmc0: sdhci: AC12 err:  0x00000000 | Slot int: 0x00000000
[   26.775458] mmc0: sdhci: Caps:      0x546ec881 | Caps_1:   0x80000807
[   26.775465] mmc0: sdhci: Cmd:       0x0000123a | Max curr: 0x00000000
[   26.775472] mmc0: sdhci: Resp[0]:   0x00000900 | Resp[1]:  0x00000000
[   26.775479] mmc0: sdhci: Resp[2]:   0x00000000 | Resp[3]:  0x00000800
[   26.775484] mmc0: sdhci: Host ctl2: 0x0000000d
[   26.775492] mmc0: sdhci: ADMA Err:  0x00000000 | ADMA Ptr: 0x0000000172c5b200
[   26.775495] mmc0: sdhci: ============================================
[   37.015373] mmc0: Timeout waiting for hardware interrupt.
[   37.015386] mmc0: sdhci: ============ SDHCI REGISTER DUMP ===========
[   37.015396] mmc0: sdhci: Sys addr:  0x00000008 | Version:  0x00001002
[   37.015403] mmc0: sdhci: Blk size:  0x00007200 | Blk cnt:  0x00000008
[   37.015410] mmc0: sdhci: Argument:  0x000f4000 | Trn mode: 0x0000003b
[   37.015418] mmc0: sdhci: Present:   0x1fff0206 | Host ctl: 0x0000003d
[   37.015425] mmc0: sdhci: Power:     0x0000000b | Blk gap:  0x00000080
[   37.015432] mmc0: sdhci: Wake-up:   0x00000000 | Clock:    0x00000007
[   37.015438] mmc0: sdhci: Timeout:   0x00000006 | Int stat: 0x00000000
[   37.015445] mmc0: sdhci: Int enab:  0x02ff000b | Sig enab: 0x02ff000b
[   37.015452] mmc0: sdhci: AC12 err:  0x00000000 | Slot int: 0x00000000
[   37.015459] mmc0: sdhci: Caps:      0x546ec881 | Caps_1:   0x80000807
[   37.015466] mmc0: sdhci: Cmd:       0x0000123a | Max curr: 0x00000000
[   37.015473] mmc0: sdhci: Resp[0]:   0x00000800 | Resp[1]:  0x00000000
[   37.015480] mmc0: sdhci: Resp[2]:   0x00000000 | Resp[3]:  0x00000900
[   37.015486] mmc0: sdhci: Host ctl2: 0x0000000d
[   37.015494] mmc0: sdhci: ADMA Err:  0x00000000 | ADMA Ptr: 0x0000000172c5b200
[   37.015497] mmc0: sdhci: ============================================
[   37.016584] mmc0: mmc_hs400_to_hs200 failed, error -110
[   37.017321] mmc0: mmc_hs400_to_hs200 failed, error -110
[   47.255376] mmc0: Timeout waiting for hardware interrupt.
[   47.255389] mmc0: sdhci: ============ SDHCI REGISTER DUMP ===========
[   47.255397] mmc0: sdhci: Sys addr:  0x00000008 | Version:  0x00001002
[   47.255405] mmc0: sdhci: Blk size:  0x00007200 | Blk cnt:  0x00000008
[   47.255412] mmc0: sdhci: Argument:  0x000f4000 | Trn mode: 0x0000003b
[   47.255420] mmc0: sdhci: Present:   0x1fff0001 | Host ctl: 0x0000003d
[   47.255426] mmc0: sdhci: Power:     0x0000000b | Blk gap:  0x00000080
[   47.255434] mmc0: sdhci: Wake-up:   0x00000000 | Clock:    0x00000207
[   47.255441] mmc0: sdhci: Timeout:   0x00000006 | Int stat: 0x00000000
[   47.255448] mmc0: sdhci: Int enab:  0x02ff000b | Sig enab: 0x02ff000b
[   47.255455] mmc0: sdhci: AC12 err:  0x00000000 | Slot int: 0x00000000
[   47.255463] mmc0: sdhci: Caps:      0x546ec881 | Caps_1:   0x80000807
[   47.255470] mmc0: sdhci: Cmd:       0x0000123a | Max curr: 0x00000000
[   47.255477] mmc0: sdhci: Resp[0]:   0x00400900 | Resp[1]:  0x00000000
[   47.255483] mmc0: sdhci: Resp[2]:   0x00000000 | Resp[3]:  0x00000100
[   47.255489] mmc0: sdhci: Host ctl2: 0x0000000d
[   47.255497] mmc0: sdhci: ADMA Err:  0x00000000 | ADMA Ptr: 0x0000000172c5b200
[   47.255500] mmc0: sdhci: ============================================
[   57.495377] mmc0: Timeout waiting for hardware interrupt.
[   57.495390] mmc0: sdhci: ============ SDHCI REGISTER DUMP ===========
[   57.495399] mmc0: sdhci: Sys addr:  0x80000001 | Version:  0x00001002
[   57.495406] mmc0: sdhci: Blk size:  0x00007200 | Blk cnt:  0x00000001
[   57.495413] mmc0: sdhci: Argument:  0x0139d5a2 | Trn mode: 0x0000002b
[   57.495419] mmc0: sdhci: Present:   0x1fff0106 | Host ctl: 0x0000003d
[   57.495425] mmc0: sdhci: Power:     0x0000000b | Blk gap:  0x00000080
[   57.495433] mmc0: sdhci: Wake-up:   0x00000000 | Clock:    0x00000207
[   57.495440] mmc0: sdhci: Timeout:   0x00000008 | Int stat: 0x00000000
[   57.495446] mmc0: sdhci: Int enab:  0x02ff000b | Sig enab: 0x02ff000b
[   57.495453] mmc0: sdhci: AC12 err:  0x00000000 | Slot int: 0x00000000
[   57.495461] mmc0: sdhci: Caps:      0x546ec881 | Caps_1:   0x80000807
[   57.495467] mmc0: sdhci: Cmd:       0x0000193a | Max curr: 0x00000000
[   57.495474] mmc0: sdhci: Resp[0]:   0x00000000 | Resp[1]:  0x00000000
[   57.495480] mmc0: sdhci: Resp[2]:   0x00000000 | Resp[3]:  0x00000900
[   57.495487] mmc0: sdhci: Host ctl2: 0x0000000d
[   57.495495] mmc0: sdhci: ADMA Err:  0x00000000 | ADMA Ptr: 0x0000000172c5b200
[   57.495498] mmc0: sdhci: ============================================
[   57.496650] mmc0: mmc_hs400_to_hs200 failed, error -84
[   57.498397] mmcblk0: error -84 requesting status
[   57.498924] mmc0: mmc_hs400_to_hs200 failed, error -84
[   57.499876] mmc0: cache flush error -84
[   57.601521] EXT4-fs (mmcblk0p2): mounted filesystem with ordered data mode. Opts: (null)
[   58.513170] EXT4-fs (mmcblk0p2): re-mounted. Opts: errors=remount-ro
[   58.852299] bcache: register_bcache() error /dev/mmcblk0p6: device already registered
[   59.362150] Adding 524284k swap on /dev/mmcblk0p3.  Priority:-2 extents:1 across:524284k SSFS
[   59.997479] EXT4-fs (mmcblk0p5): mounted filesystem with ordered data mode. Opts: (null)
[   59.998027] EXT4-fs (mmcblk0p4): mounted filesystem with ordered data mode. Opts: errors=remount-ro


Anisse
Anisse Astier Oct. 22, 2018, 9:14 a.m. UTC | #5
Hi,

On Mon, Oct 22, 2018 at 11:47:30AM +0300, Adrian Hunter wrote:
> On 19/10/18 4:42 PM, Anisse Astier wrote:
> > On Fri, Oct 19, 2018 at 04:19:39PM +0300, Adrian Hunter wrote:
> >> On 19/10/18 12:26 PM, Anisse Astier wrote:
> >>> Hi Adrian,
> >>>
> >>> On Fri, Oct 19, 2018 at 10:07:38AM +0300, Adrian Hunter wrote:
> >>>> On 18/10/18 1:21 PM, Anisse Astier wrote:
> >>>>> If we don't have the voltage switch DSM methods available, there's no
> >>>>> point in advertising to the rest of the kernel that we support 1.8V, or
> >>>>> support voltage switch at all.
> >>>>>
> >>>>> This fixes an issue on a Gemini Lake (GLK) laptop : eMMC driver will
> >>>>> timeout on boot (from 60seconds to 10minutes ) as the cqhci attempts CQE
> >>>>> recovery after a failed voltage switch. In earlier kernels, the problem
> >>>>> existed, but only delayed boot for about 10 seconds after an I/O error,
> >>>>> allowing booting on the eMMC (almost) unnoticed.
> >>>>
> >>>> Can you send the kernel messages?  Which kernel is it?  Which laptop?  An
> >>>> acpidump might help too.
> >>>
> >>> You're right, I should have started with this. I have attached various
> >>> dmesg traces:
> >>>  - dmesg-4.18.3-CQE-traces.txt : the original issue that was
> >>>    encountered, it shows the multiple CQE recovery timeouts, each taking
> >>>    about 60s
> >>>  - dmesg-4.19-rc8.noquirk.txt : a boot where the CQE recovery works, and
> >>>    only an I/O error is shown. I've reduced it to the mmc/sdhci traces.
> >>>  - dmesg-4.19-rc8.noquirk-with-error.txt : a boot where it fails, this
> >>>    is similar to the first one, but with a more recent kernel
> >>>  - dmesg-4.19-rc8.withquirk.txt : a boot with sdhci.debug_quirks2=0x90c
> >>>    on the command line. I've added the SDHCI_QUIRK2_NO_1_8_V quirk to
> >>>    the other ones present in the driver. You can see there's no CQE
> >>>    recovery or mmc I/O errors.
> >>>
> >>>
> >>> I've reproduced the issue with linux 4.17, 4.18 and 4.19-rc8. The laptop
> >>> is a noname laptop with an Insyde EFI firmware (Manufacturer: Notebook,
> >>> Product Name: N75_77GU).
> >>>
> >>> You'll find the acpidump in the attachment. I've looked into another GLK
> >>> laptop's tables, and the implemented acpi methods seem to only do a
> >>> sleep(), which isn't really helpful. I've tried to add a similmar msleep
> >>> in the voltage_switch function, but it didn't seem to help.
> >>
> >> Please try disabling CQE i.e.
> >>
> >> diff --git a/drivers/mmc/host/sdhci-pci-core.c b/drivers/mmc/host/sdhci-pci-core.c
> >> index e53333c695b3..c0f8348f75f7 100644
> >> --- a/drivers/mmc/host/sdhci-pci-core.c
> >> +++ b/drivers/mmc/host/sdhci-pci-core.c
> >> @@ -732,7 +732,7 @@ static int glk_emmc_probe_slot(struct sdhci_pci_slot *slot)
> >>  {
> >>  	int ret = byt_emmc_probe_slot(slot);
> >>
> >> -	slot->host->mmc->caps2 |= MMC_CAP2_CQE;
> >> +	//slot->host->mmc->caps2 |= MMC_CAP2_CQE;
> >>
> >>  	if (slot->chip->pdev->device != PCI_DEVICE_ID_INTEL_GLK_EMMC) {
> >>  		slot->host->mmc->caps2 |= MMC_CAP2_HS400_ES,
> >>
> > 
> > I did that, and while we don't have the long CQE timeouts, we still have
> > an I/O error:
> > 
> > [    0.468934] PCI: MMCONFIG for domain 0000 [bus 00-3f] at [mem 0xe0000000-0xe3ffffff] (base 0xe0000000)
> > [    0.468934] PCI: MMCONFIG at [mem 0xe0000000-0xe3ffffff] reserved in E820
> > [    0.621318] acpi PNP0A08:00: [Firmware Info]: MMCONFIG for domain 0000 [bus 00-3f] only partially covers this bridge
> > [    3.727365] sdhci: Secure Digital Host Controller Interface driver
> > [    3.727365] sdhci: Copyright(c) Pierre Ossman
> > [    3.729638] sdhci-pci 0000:00:1c.0: SDHCI controller found [8086:31cc] (rev 3)
> > [    3.731801] mmc0: no DSM function for 1.8 voltage switch
> > [    3.731802] mmc0: Voltage switching unsupported
> > [    3.731872] mmc0: CQHCI version 5.10
> > [    3.735646] mmc0: SDHCI controller on PCI [0000:00:1c.0] using ADMA 64-bit
> > [    3.816485] mmc0: new HS400 MMC card at address 0001
> > [    3.819219] mmcblk0: mmc0:0001 M52532 29.1 GiB 
> > [    3.819635] mmcblk0boot0: mmc0:0001 M52532 partition 1 4.00 MiB
> > [    3.820056] mmcblk0boot1: mmc0:0001 M52532 partition 2 4.00 MiB
> > [    3.820225] mmcblk0rpmb: mmc0:0001 M52532 partition 3 4.00 MiB, chardev (247:0)
> > [    3.823106]  mmcblk0: p1 p2 p3 p4 p5 p6
> > [    4.135118] bcache: register_cache() registered cache device mmcblk0p6
> > [    5.627275] sr 1:0:0:0: [sr0] scsi3-mmc drive: 24x/24x writer dvd-ram cd/rw xa/form2 cdda tray
> > [   16.251559] mmc0: Timeout waiting for hardware interrupt.
> > [   16.251572] mmc0: sdhci: ============ SDHCI REGISTER DUMP ===========
> > [   16.251581] mmc0: sdhci: Sys addr:  0x00000020 | Version:  0x00001002
> > [   16.251589] mmc0: sdhci: Blk size:  0x00007200 | Blk cnt:  0x00000020
> > [   16.251596] mmc0: sdhci: Argument:  0x000f4000 | Trn mode: 0x0000003b
> > [   16.251603] mmc0: sdhci: Present:   0x1fff0206 | Host ctl: 0x0000003d
> > [   16.251609] mmc0: sdhci: Power:     0x0000000b | Blk gap:  0x00000080
> > [   16.251616] mmc0: sdhci: Wake-up:   0x00000000 | Clock:    0x00000007
> > [   16.251623] mmc0: sdhci: Timeout:   0x00000006 | Int stat: 0x00000000
> > [   16.251630] mmc0: sdhci: Int enab:  0x02ff000b | Sig enab: 0x02ff000b
> > [   16.251637] mmc0: sdhci: AC12 err:  0x00000000 | Slot int: 0x00000000
> > [   16.251644] mmc0: sdhci: Caps:      0x546ec881 | Caps_1:   0x80000807
> > [   16.251651] mmc0: sdhci: Cmd:       0x0000123a | Max curr: 0x00000000
> > [   16.251658] mmc0: sdhci: Resp[0]:   0x00000800 | Resp[1]:  0x00000000
> > [   16.251665] mmc0: sdhci: Resp[2]:   0x00000000 | Resp[3]:  0x00000900
> > [   16.251671] mmc0: sdhci: Host ctl2: 0x0000000d
> > [   16.251679] mmc0: sdhci: ADMA Err:  0x00000000 | ADMA Ptr: 0x0000000173126200
> > [   16.251682] mmc0: sdhci: ============================================
> > [   16.252462] mmc0: mmc_hs400_to_hs200 failed, error -84
> > [   16.253487] mmcblk0: error -84 requesting status
> > [   16.253856] mmc0: mmc_hs400_to_hs200 failed, error -84
> > [   16.254111] mmc0: cache flush error -84
> > [   16.370768] EXT4-fs (mmcblk0p2): mounted filesystem with ordered data mode. Opts: (null)
> > [   17.310793] EXT4-fs (mmcblk0p2): re-mounted. Opts: errors=remount-ro
> > [   17.632001] bcache: register_bcache() error /dev/mmcblk0p6: device already registered
> > [   18.135208] Adding 524284k swap on /dev/mmcblk0p3.  Priority:-2 extents:1 across:524284k SSFS
> > [   18.818304] EXT4-fs (mmcblk0p4): mounted filesystem with ordered data mode. Opts: errors=remount-ro
> > [   18.818848] EXT4-fs (mmcblk0p5): mounted filesystem with ordered data mode. Opts: (null)
> 
> Let's get some more debugging messages.  If you configure dynamic debug then
> add kernel command line parameter:
> 
> 	dyndbg="file drivers/mmc/core/* +p;file drivers/mmc/host/* +p"
> 
> with a recent kernel, still with CQE disabled, but no quirks or other
> workarounds, then it will produce a large number of mmc messages.
> 

You'll find the full dmesg here:
https://anisse.astier.eu/static/dmesg-4.19-rc8.nocqe-verbose.xz

I don't expect it to live very long, so I've tried quoting something
useful before the first register dump, below:

[    4.471082] <mmc0: starting CMD23 arg 80000004 flags 00000015>
[    4.471084] mmc0: starting CMD25 arg 01237156 flags 000000b5
[    4.471086] mmc0:     blksz 512 blocks 4 flags 00001100 tsac 1600 ms nsac 4000
[    4.471087] mmc0:     CMD12 arg 00000000 flags 0000049d
[    4.472140] mmc0: req done <CMD23>: 0: 00000000 00000000 00000000 00000000
[    4.472142] mmc0: req done (CMD25): 0: 00000900 00000000 00000000 00000000
[    4.472143] mmc0:     2048 bytes transferred: 0
[    4.472145] mmc0:     (CMD12): 0: 00000000 00000000 00000000 00000000
[    4.472184] mmc0: starting CMD13 arg 00010000 flags 00000195
[    4.472236] mmc0: req done (CMD13): 0: 00000900 00000000 00000000 00000000
[    4.472284] mmc0: starting CMD6 arg 03200101 flags 0000049d
[    4.472339] mmc0: req done (CMD6): 0: 00000800 00000000 00000000 00000000
[    4.472381] mmc0: starting CMD13 arg 00010000 flags 00000195
[    4.472433] mmc0: req done (CMD13): 0: 00000900 00000000 00000000 00000000
[    4.472480] <mmc0: starting CMD23 arg 80000001 flags 00000015>
[    4.472482] mmc0: starting CMD25 arg 01218c00 flags 000000b5
[    4.472483] mmc0:     blksz 512 blocks 1 flags 00001100 tsac 1600 ms nsac 4000
[    4.472485] mmc0:     CMD12 arg 00000000 flags 0000049d
[    4.473056] mmc0: req done <CMD23>: 0: 00000000 00000000 00000000 00000000
[    4.473058] mmc0: req done (CMD25): 0: 00000900 00000000 00000000 00000000
[    4.473059] mmc0:     512 bytes transferred: 0
[    4.473060] mmc0:     (CMD12): 0: 00000000 00000000 00000000 00000000
[    4.473069] mmc0: starting CMD13 arg 00010000 flags 00000195
[    4.473100] mmc0: req done (CMD13): 0: 00000900 00000000 00000000 00000000
[    4.487852] mmc1: req done (CMD52): -110: 00000000 00000000 00000000 00000000
[    4.487864] mmc1: clock 400000Hz busmode 2 powermode 2 cs 1 Vdd 21 width 1 timing 0
[    4.489242] mmc1: starting CMD0 arg 00000000 flags 000000c0
[    4.489630] mmc1: req done (CMD0): 0: 00000000 00000000 00000000 00000000
[    4.490924] mmc1: clock 400000Hz busmode 2 powermode 2 cs 0 Vdd 21 width 1 timing 0
[    4.492295] mmc1: starting CMD8 arg 000001aa flags 000002f5
[    4.503980] usb 1-6: new high-speed USB device number 2 using xhci_hcd
[    4.595865] mmc1: req done (CMD8): -110: 00000000 00000000 00000000 00000000
[    4.595879] mmc1: starting CMD5 arg 00000000 flags 000002e1
[    4.648038] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[    4.655909] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[    4.658638] ata2.00: ATAPI: HL-DT-ST DVDRAM GUD0N, 1.00, max UDMA/133
[    4.663327] ata2.00: configured for UDMA/133
[    4.686681] ata1.00: ATA-10: ST1000LM048-2E7172, SDM1, max UDMA/133
[    4.686683] ata1.00: 1953525168 sectors, multi 16: LBA48 NCQ (depth 32), AA
[    4.699867] mmc1: req failed (CMD5): -110, retrying...
[    4.743663] usb 1-6: New USB device found, idVendor=04f2, idProduct=b59e, bcdDevice=10.19
[    4.743666] usb 1-6: New USB device strings: Mfr=3, Product=1, SerialNumber=2
[    4.743668] usb 1-6: Product: Chicony USB2.0 Camera
[    4.743670] usb 1-6: Manufacturer: Chicony Electronics Co.,Ltd.
[    4.743672] usb 1-6: SerialNumber: 200901010001
[    4.803875] mmc1: req failed (CMD5): -110, retrying...
[    4.823978] ata1.00: configured for UDMA/133
[    4.824317] scsi 0:0:0:0: Direct-Access     ATA      ST1000LM048-2E71 SDM1 PQ: 0 ANSI: 5
[    4.827949] scsi 1:0:0:0: CD-ROM            HL-DT-ST DVDRAM GUD0N     1.00 PQ: 0 ANSI: 5
[    4.871980] usb 1-9: new full-speed USB device number 3 using xhci_hcd
[    4.907876] mmc1: req failed (CMD5): -110, retrying...
[    5.011958] mmc1: req done (CMD5): -110: 00000000 00000000 00000000 00000000
[    5.011978] mmc1: starting CMD55 arg 00000000 flags 000000f5
[    5.098795] usb 1-9: New USB device found, idVendor=8087, idProduct=0aaa, bcdDevice= 0.02
[    5.098798] usb 1-9: New USB device strings: Mfr=0, Product=0, SerialNumber=0
[    5.115993] mmc1: req done (CMD55): -110: 00000000 00000000 00000000 00000000
[    5.116009] mmc1: starting CMD55 arg 00000000 flags 000000f5
[    5.219941] mmc1: req done (CMD55): -110: 00000000 00000000 00000000 00000000
[    5.219957] mmc1: starting CMD55 arg 00000000 flags 000000f5
[    5.323853] mmc1: req done (CMD55): -110: 00000000 00000000 00000000 00000000
[    5.323865] mmc1: starting CMD55 arg 00000000 flags 000000f5
[    5.377294] psmouse serio2: synaptics: queried max coordinates: x [..5674], y [..4764]
[    5.427864] mmc1: req done (CMD55): -110: 00000000 00000000 00000000 00000000
[    5.427880] mmc1: clock 400000Hz busmode 1 powermode 2 cs 0 Vdd 21 width 1 timing 0
[    5.427992] mmc1: starting CMD1 arg 00000000 flags 000000e1
[    5.504366] psmouse serio2: synaptics: queried min coordinates: x [1266..], y [1086..]
[    5.504374] psmouse serio2: synaptics: The touchpad can support a better bus than the too old PS/2 protocol. Make sure MOUSE_PS2_SYNAPTICS_SMBUS and RMI4_SMB are enabled to get a better touchpad experience.
[    5.531859] mmc1: req done (CMD1): -110: 00000000 00000000 00000000 00000000
[    5.531875] mmc1: clock 0Hz busmode 2 powermode 0 cs 0 Vdd 0 width 1 timing 0
[    5.533420] mmc1: mmc_rescan_try_freq: trying to init card at 300000 Hz
[    5.533423] mmc1: clock 0Hz busmode 2 powermode 1 cs 0 Vdd 21 width 1 timing 0
[    5.534192] rtsx_pci_sdmmc rtsx_pci_sdmmc.0: Initial signal voltage of 3.3v
[    5.546735] mmc1: clock 300000Hz busmode 2 powermode 2 cs 0 Vdd 21 width 1 timing 0
[    5.559367] mmc1: starting CMD52 arg 00000c00 flags 00000195
[    5.657443] psmouse serio2: synaptics: Touchpad model: 1, fw: 8.2, id: 0x1e2b1, caps: 0xf00223/0x840300/0x26800/0x0, board id: 3189, fw id: 2302754
[    5.659997] mmc1: req done (CMD52): -110: 00000000 00000000 00000000 00000000
[    5.660015] mmc1: starting CMD52 arg 80000c08 flags 00000195
[    5.710015] input: SynPS/2 Synaptics TouchPad as /devices/platform/i8042/serio2/input/input13
[    5.763975] mmc1: req done (CMD52): -110: 00000000 00000000 00000000 00000000
[    5.763992] mmc1: clock 300000Hz busmode 2 powermode 2 cs 1 Vdd 21 width 1 timing 0
[    5.765430] mmc1: starting CMD0 arg 00000000 flags 000000c0
[    5.765962] mmc1: req done (CMD0): 0: 00000000 00000000 00000000 00000000
[    5.767388] mmc1: clock 300000Hz busmode 2 powermode 2 cs 0 Vdd 21 width 1 timing 0
[    5.768902] mmc1: starting CMD8 arg 000001aa flags 000002f5
[    5.871989] mmc1: req done (CMD8): -110: 00000000 00000000 00000000 00000000
[    5.872007] mmc1: starting CMD5 arg 00000000 flags 000002e1
[    5.976016] mmc1: req failed (CMD5): -110, retrying...
[    6.046317] Console: switching to colour frame buffer device 200x56
[    6.069187] i915 0000:00:02.0: fb0: inteldrmfb frame buffer device
[    6.079995] mmc1: req failed (CMD5): -110, retrying...
[    6.093063] sd 0:0:0:0: [sda] 1953525168 512-byte logical blocks: (1.00 TB/932 GiB)
[    6.093068] sd 0:0:0:0: [sda] 4096-byte physical blocks
[    6.093083] sd 0:0:0:0: [sda] Write Protect is off
[    6.093087] sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
[    6.093106] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[    6.127609] sr 1:0:0:0: [sr0] scsi3-mmc drive: 24x/24x writer dvd-ram cd/rw xa/form2 cdda tray
[    6.127614] cdrom: Uniform CD-ROM driver Revision: 3.20
[    6.128132] sr 1:0:0:0: Attached scsi CD-ROM sr0
[    6.183982] mmc1: req failed (CMD5): -110, retrying...
[    6.186423] sd 0:0:0:0: [sda] Attached SCSI disk
[    6.287847] mmc1: req done (CMD5): -110: 00000000 00000000 00000000 00000000
[    6.287869] mmc1: starting CMD55 arg 00000000 flags 000000f5
[    6.299035] bcache: register_bdev() registered backing device sda
[    6.395876] mmc1: req done (CMD55): -110: 00000000 00000000 00000000 00000000
[    6.395890] mmc1: starting CMD55 arg 00000000 flags 000000f5
[    6.499984] mmc1: req done (CMD55): -110: 00000000 00000000 00000000 00000000
[    6.500005] mmc1: starting CMD55 arg 00000000 flags 000000f5
[    6.603976] mmc1: req done (CMD55): -110: 00000000 00000000 00000000 00000000
[    6.603999] mmc1: starting CMD55 arg 00000000 flags 000000f5
[    6.620171] <mmc0: starting CMD23 arg 00000020 flags 00000015>
[    6.620177] mmc0: starting CMD18 arg 000f4000 flags 000000b5
[    6.620181] mmc0:     blksz 512 blocks 32 flags 00000200 tsac 400 ms nsac 1000
[    6.620183] mmc0:     CMD12 arg 00000000 flags 00000095
[    6.622351] mmc0: req done <CMD23>: 0: 00000000 00000000 00000000 00000000
[    6.622357] mmc0: req done (CMD18): 0: 00000900 00000000 00000000 00000000
[    6.622359] mmc0:     16384 bytes transferred: 0
[    6.622362] mmc0:     (CMD12): 0: 00000000 00000000 00000000 00000000
[    6.629789] <mmc0: starting CMD23 arg 00000020 flags 00000015>
[    6.629794] mmc0: starting CMD18 arg 000f4000 flags 000000b5
[    6.629798] mmc0:     blksz 512 blocks 32 flags 00000200 tsac 400 ms nsac 1000
[    6.629801] mmc0:     CMD12 arg 00000000 flags 00000095
[    6.630279] mmc0: req done <CMD23>: 0: 00000000 00000000 00000000 00000000
[    6.630283] mmc0: req done (CMD18): 0: 00000900 00000000 00000000 00000000
[    6.630286] mmc0:     16384 bytes transferred: 0
[    6.630289] mmc0:     (CMD12): 0: 00000000 00000000 00000000 00000000
[    6.634284] <mmc0: starting CMD23 arg 00000020 flags 00000015>
[    6.634288] mmc0: starting CMD18 arg 000f4000 flags 000000b5
[    6.634292] mmc0:     blksz 512 blocks 32 flags 00000200 tsac 400 ms nsac 1000
[    6.634295] mmc0:     CMD12 arg 00000000 flags 00000095
[    6.634771] mmc0: req done <CMD23>: 0: 00000000 00000000 00000000 00000000
[    6.634775] mmc0: req done (CMD18): 0: 00000900 00000000 00000000 00000000
[    6.634778] mmc0:     16384 bytes transferred: 0
[    6.634781] mmc0:     (CMD12): 0: 00000000 00000000 00000000 00000000
[    6.641766] <mmc0: starting CMD23 arg 00000008 flags 00000015>
[    6.641768] mmc0: starting CMD18 arg 00cf3f80 flags 000000b5
[    6.641770] mmc0:     blksz 512 blocks 8 flags 00000200 tsac 400 ms nsac 1000
[    6.641771] mmc0:     CMD12 arg 00000000 flags 00000095
[    6.641972] mmc0: req done <CMD23>: 0: 00000000 00000000 00000000 00000000
[    6.641974] mmc0: req done (CMD18): 0: 00000900 00000000 00000000 00000000
[    6.641975] mmc0:     4096 bytes transferred: 0
[    6.641977] mmc0:     (CMD12): 0: 00000000 00000000 00000000 00000000
[    6.642048] <mmc0: starting CMD23 arg 00000008 flags 00000015>
[    6.642050] mmc0: starting CMD18 arg 00cf3ff0 flags 000000b5
[    6.642052] mmc0:     blksz 512 blocks 8 flags 00000200 tsac 400 ms nsac 1000
[    6.642053] mmc0:     CMD12 arg 00000000 flags 00000095
[    6.707986] mmc1: req done (CMD55): -110: 00000000 00000000 00000000 00000000
[    6.708008] mmc1: clock 300000Hz busmode 1 powermode 2 cs 0 Vdd 21 width 1 timing 0
[    6.708117] mmc1: starting CMD1 arg 00000000 flags 000000e1
[    6.811990] mmc1: req done (CMD1): -110: 00000000 00000000 00000000 00000000
[    6.812013] mmc1: clock 0Hz busmode 2 powermode 0 cs 0 Vdd 0 width 1 timing 0
[    6.813513] mmc1: mmc_rescan_try_freq: trying to init card at 250000 Hz
[    6.813517] mmc1: clock 0Hz busmode 2 powermode 1 cs 0 Vdd 21 width 1 timing 0
[    6.814257] rtsx_pci_sdmmc rtsx_pci_sdmmc.0: Initial signal voltage of 3.3v
[    6.826921] mmc1: clock 250000Hz busmode 2 powermode 2 cs 0 Vdd 21 width 1 timing 0
[    6.839686] mmc1: starting CMD52 arg 00000c00 flags 00000195
[    6.939992] mmc1: req done (CMD52): -110: 00000000 00000000 00000000 00000000
[    6.940015] mmc1: starting CMD52 arg 80000c08 flags 00000195
[    7.043873] mmc1: req done (CMD52): -110: 00000000 00000000 00000000 00000000
[    7.043893] mmc1: clock 250000Hz busmode 2 powermode 2 cs 1 Vdd 21 width 1 timing 0
[    7.045248] mmc1: starting CMD0 arg 00000000 flags 000000c0
[    7.045747] mmc1: req done (CMD0): 0: 00000000 00000000 00000000 00000000
[    7.047033] mmc1: clock 250000Hz busmode 2 powermode 2 cs 0 Vdd 21 width 1 timing 0
[    7.048383] mmc1: starting CMD8 arg 000001aa flags 000002f5
[    7.151991] mmc1: req done (CMD8): -110: 00000000 00000000 00000000 00000000
[    7.152014] mmc1: starting CMD5 arg 00000000 flags 000002e1
[    7.256004] mmc1: req failed (CMD5): -110, retrying...
[    7.359998] mmc1: req failed (CMD5): -110, retrying...
[    7.463892] mmc1: req failed (CMD5): -110, retrying...
[    7.567993] mmc1: req done (CMD5): -110: 00000000 00000000 00000000 00000000
[    7.568017] mmc1: starting CMD55 arg 00000000 flags 000000f5
[    7.671992] mmc1: req done (CMD55): -110: 00000000 00000000 00000000 00000000
[    7.672014] mmc1: starting CMD55 arg 00000000 flags 000000f5
[    7.775984] mmc1: req done (CMD55): -110: 00000000 00000000 00000000 00000000
[    7.776008] mmc1: starting CMD55 arg 00000000 flags 000000f5
[    7.879993] mmc1: req done (CMD55): -110: 00000000 00000000 00000000 00000000
[    7.880015] mmc1: starting CMD55 arg 00000000 flags 000000f5
[    7.983992] mmc1: req done (CMD55): -110: 00000000 00000000 00000000 00000000
[    7.984014] mmc1: clock 250000Hz busmode 1 powermode 2 cs 0 Vdd 21 width 1 timing 0
[    7.984112] mmc1: starting CMD1 arg 00000000 flags 000000e1
[    8.087987] mmc1: req done (CMD1): -110: 00000000 00000000 00000000 00000000
[    8.088009] mmc1: clock 0Hz busmode 2 powermode 0 cs 0 Vdd 0 width 1 timing 0
[   16.823934] mmc0: Timeout waiting for hardware interrupt.
[   16.823947] mmc0: sdhci: ============ SDHCI REGISTER DUMP ===========
[   16.823956] mmc0: sdhci: Sys addr:  0x00000008 | Version:  0x00001002
[   16.823964] mmc0: sdhci: Blk size:  0x00007200 | Blk cnt:  0x00000008
[   16.823971] mmc0: sdhci: Argument:  0x00cf3ff0 | Trn mode: 0x0000003b
[   16.823978] mmc0: sdhci: Present:   0x1fff0001 | Host ctl: 0x0000003d
[   16.823985] mmc0: sdhci: Power:     0x0000000b | Blk gap:  0x00000080
[   16.823992] mmc0: sdhci: Wake-up:   0x00000000 | Clock:    0x00000007
[   16.823999] mmc0: sdhci: Timeout:   0x00000006 | Int stat: 0x00000000
[   16.824006] mmc0: sdhci: Int enab:  0x02ff000b | Sig enab: 0x02ff000b
[   16.824013] mmc0: sdhci: AC12 err:  0x00000000 | Slot int: 0x00000000
[   16.824020] mmc0: sdhci: Caps:      0x546ec881 | Caps_1:   0x80000807
[   16.824027] mmc0: sdhci: Cmd:       0x0000123a | Max curr: 0x00000000
[   16.824034] mmc0: sdhci: Resp[0]:   0x00000900 | Resp[1]:  0x00000000
[   16.824041] mmc0: sdhci: Resp[2]:   0x00000000 | Resp[3]:  0x00000900
[   16.824047] mmc0: sdhci: Host ctl2: 0x0000000d
[   16.824055] mmc0: sdhci: ADMA Err:  0x00000000 | ADMA Ptr: 0x000000016bbf5200
[   16.824058] mmc0: sdhci: ============================================


Regards,

Anisse
Adrian Hunter Oct. 22, 2018, 12:34 p.m. UTC | #6
On 22/10/18 12:14 PM, Anisse Astier wrote:
> Hi,
> 
> On Mon, Oct 22, 2018 at 11:47:30AM +0300, Adrian Hunter wrote:
>> On 19/10/18 4:42 PM, Anisse Astier wrote:
>>> On Fri, Oct 19, 2018 at 04:19:39PM +0300, Adrian Hunter wrote:
>>>> On 19/10/18 12:26 PM, Anisse Astier wrote:
>>>>> Hi Adrian,
>>>>>
>>>>> On Fri, Oct 19, 2018 at 10:07:38AM +0300, Adrian Hunter wrote:
>>>>>> On 18/10/18 1:21 PM, Anisse Astier wrote:
>>>>>>> If we don't have the voltage switch DSM methods available, there's no
>>>>>>> point in advertising to the rest of the kernel that we support 1.8V, or
>>>>>>> support voltage switch at all.
>>>>>>>
>>>>>>> This fixes an issue on a Gemini Lake (GLK) laptop : eMMC driver will
>>>>>>> timeout on boot (from 60seconds to 10minutes ) as the cqhci attempts CQE
>>>>>>> recovery after a failed voltage switch. In earlier kernels, the problem
>>>>>>> existed, but only delayed boot for about 10 seconds after an I/O error,
>>>>>>> allowing booting on the eMMC (almost) unnoticed.
>>>>>>
>>>>>> Can you send the kernel messages?  Which kernel is it?  Which laptop?  An
>>>>>> acpidump might help too.
>>>>>
>>>>> You're right, I should have started with this. I have attached various
>>>>> dmesg traces:
>>>>>  - dmesg-4.18.3-CQE-traces.txt : the original issue that was
>>>>>    encountered, it shows the multiple CQE recovery timeouts, each taking
>>>>>    about 60s
>>>>>  - dmesg-4.19-rc8.noquirk.txt : a boot where the CQE recovery works, and
>>>>>    only an I/O error is shown. I've reduced it to the mmc/sdhci traces.
>>>>>  - dmesg-4.19-rc8.noquirk-with-error.txt : a boot where it fails, this
>>>>>    is similar to the first one, but with a more recent kernel
>>>>>  - dmesg-4.19-rc8.withquirk.txt : a boot with sdhci.debug_quirks2=0x90c
>>>>>    on the command line. I've added the SDHCI_QUIRK2_NO_1_8_V quirk to
>>>>>    the other ones present in the driver. You can see there's no CQE
>>>>>    recovery or mmc I/O errors.
>>>>>
>>>>>
>>>>> I've reproduced the issue with linux 4.17, 4.18 and 4.19-rc8. The laptop
>>>>> is a noname laptop with an Insyde EFI firmware (Manufacturer: Notebook,
>>>>> Product Name: N75_77GU).
>>>>>
>>>>> You'll find the acpidump in the attachment. I've looked into another GLK
>>>>> laptop's tables, and the implemented acpi methods seem to only do a
>>>>> sleep(), which isn't really helpful. I've tried to add a similmar msleep
>>>>> in the voltage_switch function, but it didn't seem to help.
>>>>
>>>> Please try disabling CQE i.e.
>>>>
>>>> diff --git a/drivers/mmc/host/sdhci-pci-core.c b/drivers/mmc/host/sdhci-pci-core.c
>>>> index e53333c695b3..c0f8348f75f7 100644
>>>> --- a/drivers/mmc/host/sdhci-pci-core.c
>>>> +++ b/drivers/mmc/host/sdhci-pci-core.c
>>>> @@ -732,7 +732,7 @@ static int glk_emmc_probe_slot(struct sdhci_pci_slot *slot)
>>>>  {
>>>>  	int ret = byt_emmc_probe_slot(slot);
>>>>
>>>> -	slot->host->mmc->caps2 |= MMC_CAP2_CQE;
>>>> +	//slot->host->mmc->caps2 |= MMC_CAP2_CQE;
>>>>
>>>>  	if (slot->chip->pdev->device != PCI_DEVICE_ID_INTEL_GLK_EMMC) {
>>>>  		slot->host->mmc->caps2 |= MMC_CAP2_HS400_ES,
>>>>
>>>
>>> I did that, and while we don't have the long CQE timeouts, we still have
>>> an I/O error:
>>>
>>> [    0.468934] PCI: MMCONFIG for domain 0000 [bus 00-3f] at [mem 0xe0000000-0xe3ffffff] (base 0xe0000000)
>>> [    0.468934] PCI: MMCONFIG at [mem 0xe0000000-0xe3ffffff] reserved in E820
>>> [    0.621318] acpi PNP0A08:00: [Firmware Info]: MMCONFIG for domain 0000 [bus 00-3f] only partially covers this bridge
>>> [    3.727365] sdhci: Secure Digital Host Controller Interface driver
>>> [    3.727365] sdhci: Copyright(c) Pierre Ossman
>>> [    3.729638] sdhci-pci 0000:00:1c.0: SDHCI controller found [8086:31cc] (rev 3)
>>> [    3.731801] mmc0: no DSM function for 1.8 voltage switch
>>> [    3.731802] mmc0: Voltage switching unsupported
>>> [    3.731872] mmc0: CQHCI version 5.10
>>> [    3.735646] mmc0: SDHCI controller on PCI [0000:00:1c.0] using ADMA 64-bit
>>> [    3.816485] mmc0: new HS400 MMC card at address 0001
>>> [    3.819219] mmcblk0: mmc0:0001 M52532 29.1 GiB 
>>> [    3.819635] mmcblk0boot0: mmc0:0001 M52532 partition 1 4.00 MiB
>>> [    3.820056] mmcblk0boot1: mmc0:0001 M52532 partition 2 4.00 MiB
>>> [    3.820225] mmcblk0rpmb: mmc0:0001 M52532 partition 3 4.00 MiB, chardev (247:0)
>>> [    3.823106]  mmcblk0: p1 p2 p3 p4 p5 p6
>>> [    4.135118] bcache: register_cache() registered cache device mmcblk0p6
>>> [    5.627275] sr 1:0:0:0: [sr0] scsi3-mmc drive: 24x/24x writer dvd-ram cd/rw xa/form2 cdda tray
>>> [   16.251559] mmc0: Timeout waiting for hardware interrupt.
>>> [   16.251572] mmc0: sdhci: ============ SDHCI REGISTER DUMP ===========
>>> [   16.251581] mmc0: sdhci: Sys addr:  0x00000020 | Version:  0x00001002
>>> [   16.251589] mmc0: sdhci: Blk size:  0x00007200 | Blk cnt:  0x00000020
>>> [   16.251596] mmc0: sdhci: Argument:  0x000f4000 | Trn mode: 0x0000003b
>>> [   16.251603] mmc0: sdhci: Present:   0x1fff0206 | Host ctl: 0x0000003d
>>> [   16.251609] mmc0: sdhci: Power:     0x0000000b | Blk gap:  0x00000080
>>> [   16.251616] mmc0: sdhci: Wake-up:   0x00000000 | Clock:    0x00000007
>>> [   16.251623] mmc0: sdhci: Timeout:   0x00000006 | Int stat: 0x00000000
>>> [   16.251630] mmc0: sdhci: Int enab:  0x02ff000b | Sig enab: 0x02ff000b
>>> [   16.251637] mmc0: sdhci: AC12 err:  0x00000000 | Slot int: 0x00000000
>>> [   16.251644] mmc0: sdhci: Caps:      0x546ec881 | Caps_1:   0x80000807
>>> [   16.251651] mmc0: sdhci: Cmd:       0x0000123a | Max curr: 0x00000000
>>> [   16.251658] mmc0: sdhci: Resp[0]:   0x00000800 | Resp[1]:  0x00000000
>>> [   16.251665] mmc0: sdhci: Resp[2]:   0x00000000 | Resp[3]:  0x00000900
>>> [   16.251671] mmc0: sdhci: Host ctl2: 0x0000000d
>>> [   16.251679] mmc0: sdhci: ADMA Err:  0x00000000 | ADMA Ptr: 0x0000000173126200
>>> [   16.251682] mmc0: sdhci: ============================================
>>> [   16.252462] mmc0: mmc_hs400_to_hs200 failed, error -84
>>> [   16.253487] mmcblk0: error -84 requesting status
>>> [   16.253856] mmc0: mmc_hs400_to_hs200 failed, error -84
>>> [   16.254111] mmc0: cache flush error -84
>>> [   16.370768] EXT4-fs (mmcblk0p2): mounted filesystem with ordered data mode. Opts: (null)
>>> [   17.310793] EXT4-fs (mmcblk0p2): re-mounted. Opts: errors=remount-ro
>>> [   17.632001] bcache: register_bcache() error /dev/mmcblk0p6: device already registered
>>> [   18.135208] Adding 524284k swap on /dev/mmcblk0p3.  Priority:-2 extents:1 across:524284k SSFS
>>> [   18.818304] EXT4-fs (mmcblk0p4): mounted filesystem with ordered data mode. Opts: errors=remount-ro
>>> [   18.818848] EXT4-fs (mmcblk0p5): mounted filesystem with ordered data mode. Opts: (null)
>>
>> Let's get some more debugging messages.  If you configure dynamic debug then
>> add kernel command line parameter:
>>
>> 	dyndbg="file drivers/mmc/core/* +p;file drivers/mmc/host/* +p"
>>
>> with a recent kernel, still with CQE disabled, but no quirks or other
>> workarounds, then it will produce a large number of mmc messages.
>>
> 
> You'll find the full dmesg here:
> https://anisse.astier.eu/static/dmesg-4.19-rc8.nocqe-verbose.xz
> 
> I don't expect it to live very long, so I've tried quoting something
> useful before the first register dump, below:
> 
> [    4.471082] <mmc0: starting CMD23 arg 80000004 flags 00000015>
> [    4.471084] mmc0: starting CMD25 arg 01237156 flags 000000b5
> [    4.471086] mmc0:     blksz 512 blocks 4 flags 00001100 tsac 1600 ms nsac 4000
> [    4.471087] mmc0:     CMD12 arg 00000000 flags 0000049d
> [    4.472140] mmc0: req done <CMD23>: 0: 00000000 00000000 00000000 00000000
> [    4.472142] mmc0: req done (CMD25): 0: 00000900 00000000 00000000 00000000
> [    4.472143] mmc0:     2048 bytes transferred: 0
> [    4.472145] mmc0:     (CMD12): 0: 00000000 00000000 00000000 00000000
> [    4.472184] mmc0: starting CMD13 arg 00010000 flags 00000195
> [    4.472236] mmc0: req done (CMD13): 0: 00000900 00000000 00000000 00000000
> [    4.472284] mmc0: starting CMD6 arg 03200101 flags 0000049d
> [    4.472339] mmc0: req done (CMD6): 0: 00000800 00000000 00000000 00000000
> [    4.472381] mmc0: starting CMD13 arg 00010000 flags 00000195
> [    4.472433] mmc0: req done (CMD13): 0: 00000900 00000000 00000000 00000000
> [    4.472480] <mmc0: starting CMD23 arg 80000001 flags 00000015>
> [    4.472482] mmc0: starting CMD25 arg 01218c00 flags 000000b5
> [    4.472483] mmc0:     blksz 512 blocks 1 flags 00001100 tsac 1600 ms nsac 4000
> [    4.472485] mmc0:     CMD12 arg 00000000 flags 0000049d
> [    4.473056] mmc0: req done <CMD23>: 0: 00000000 00000000 00000000 00000000
> [    4.473058] mmc0: req done (CMD25): 0: 00000900 00000000 00000000 00000000
> [    4.473059] mmc0:     512 bytes transferred: 0
> [    4.473060] mmc0:     (CMD12): 0: 00000000 00000000 00000000 00000000
> [    4.473069] mmc0: starting CMD13 arg 00010000 flags 00000195
> [    4.473100] mmc0: req done (CMD13): 0: 00000900 00000000 00000000 00000000
> [    4.487852] mmc1: req done (CMD52): -110: 00000000 00000000 00000000 00000000
> [    4.487864] mmc1: clock 400000Hz busmode 2 powermode 2 cs 1 Vdd 21 width 1 timing 0
> [    4.489242] mmc1: starting CMD0 arg 00000000 flags 000000c0
> [    4.489630] mmc1: req done (CMD0): 0: 00000000 00000000 00000000 00000000
> [    4.490924] mmc1: clock 400000Hz busmode 2 powermode 2 cs 0 Vdd 21 width 1 timing 0
> [    4.492295] mmc1: starting CMD8 arg 000001aa flags 000002f5
> [    4.503980] usb 1-6: new high-speed USB device number 2 using xhci_hcd
> [    4.595865] mmc1: req done (CMD8): -110: 00000000 00000000 00000000 00000000
> [    4.595879] mmc1: starting CMD5 arg 00000000 flags 000002e1
> [    4.648038] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
> [    4.655909] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
> [    4.658638] ata2.00: ATAPI: HL-DT-ST DVDRAM GUD0N, 1.00, max UDMA/133
> [    4.663327] ata2.00: configured for UDMA/133
> [    4.686681] ata1.00: ATA-10: ST1000LM048-2E7172, SDM1, max UDMA/133
> [    4.686683] ata1.00: 1953525168 sectors, multi 16: LBA48 NCQ (depth 32), AA
> [    4.699867] mmc1: req failed (CMD5): -110, retrying...
> [    4.743663] usb 1-6: New USB device found, idVendor=04f2, idProduct=b59e, bcdDevice=10.19
> [    4.743666] usb 1-6: New USB device strings: Mfr=3, Product=1, SerialNumber=2
> [    4.743668] usb 1-6: Product: Chicony USB2.0 Camera
> [    4.743670] usb 1-6: Manufacturer: Chicony Electronics Co.,Ltd.
> [    4.743672] usb 1-6: SerialNumber: 200901010001
> [    4.803875] mmc1: req failed (CMD5): -110, retrying...
> [    4.823978] ata1.00: configured for UDMA/133
> [    4.824317] scsi 0:0:0:0: Direct-Access     ATA      ST1000LM048-2E71 SDM1 PQ: 0 ANSI: 5
> [    4.827949] scsi 1:0:0:0: CD-ROM            HL-DT-ST DVDRAM GUD0N     1.00 PQ: 0 ANSI: 5
> [    4.871980] usb 1-9: new full-speed USB device number 3 using xhci_hcd
> [    4.907876] mmc1: req failed (CMD5): -110, retrying...
> [    5.011958] mmc1: req done (CMD5): -110: 00000000 00000000 00000000 00000000
> [    5.011978] mmc1: starting CMD55 arg 00000000 flags 000000f5
> [    5.098795] usb 1-9: New USB device found, idVendor=8087, idProduct=0aaa, bcdDevice= 0.02
> [    5.098798] usb 1-9: New USB device strings: Mfr=0, Product=0, SerialNumber=0
> [    5.115993] mmc1: req done (CMD55): -110: 00000000 00000000 00000000 00000000
> [    5.116009] mmc1: starting CMD55 arg 00000000 flags 000000f5
> [    5.219941] mmc1: req done (CMD55): -110: 00000000 00000000 00000000 00000000
> [    5.219957] mmc1: starting CMD55 arg 00000000 flags 000000f5
> [    5.323853] mmc1: req done (CMD55): -110: 00000000 00000000 00000000 00000000
> [    5.323865] mmc1: starting CMD55 arg 00000000 flags 000000f5
> [    5.377294] psmouse serio2: synaptics: queried max coordinates: x [..5674], y [..4764]
> [    5.427864] mmc1: req done (CMD55): -110: 00000000 00000000 00000000 00000000
> [    5.427880] mmc1: clock 400000Hz busmode 1 powermode 2 cs 0 Vdd 21 width 1 timing 0
> [    5.427992] mmc1: starting CMD1 arg 00000000 flags 000000e1
> [    5.504366] psmouse serio2: synaptics: queried min coordinates: x [1266..], y [1086..]
> [    5.504374] psmouse serio2: synaptics: The touchpad can support a better bus than the too old PS/2 protocol. Make sure MOUSE_PS2_SYNAPTICS_SMBUS and RMI4_SMB are enabled to get a better touchpad experience.
> [    5.531859] mmc1: req done (CMD1): -110: 00000000 00000000 00000000 00000000
> [    5.531875] mmc1: clock 0Hz busmode 2 powermode 0 cs 0 Vdd 0 width 1 timing 0
> [    5.533420] mmc1: mmc_rescan_try_freq: trying to init card at 300000 Hz
> [    5.533423] mmc1: clock 0Hz busmode 2 powermode 1 cs 0 Vdd 21 width 1 timing 0
> [    5.534192] rtsx_pci_sdmmc rtsx_pci_sdmmc.0: Initial signal voltage of 3.3v
> [    5.546735] mmc1: clock 300000Hz busmode 2 powermode 2 cs 0 Vdd 21 width 1 timing 0
> [    5.559367] mmc1: starting CMD52 arg 00000c00 flags 00000195
> [    5.657443] psmouse serio2: synaptics: Touchpad model: 1, fw: 8.2, id: 0x1e2b1, caps: 0xf00223/0x840300/0x26800/0x0, board id: 3189, fw id: 2302754
> [    5.659997] mmc1: req done (CMD52): -110: 00000000 00000000 00000000 00000000
> [    5.660015] mmc1: starting CMD52 arg 80000c08 flags 00000195
> [    5.710015] input: SynPS/2 Synaptics TouchPad as /devices/platform/i8042/serio2/input/input13
> [    5.763975] mmc1: req done (CMD52): -110: 00000000 00000000 00000000 00000000
> [    5.763992] mmc1: clock 300000Hz busmode 2 powermode 2 cs 1 Vdd 21 width 1 timing 0
> [    5.765430] mmc1: starting CMD0 arg 00000000 flags 000000c0
> [    5.765962] mmc1: req done (CMD0): 0: 00000000 00000000 00000000 00000000
> [    5.767388] mmc1: clock 300000Hz busmode 2 powermode 2 cs 0 Vdd 21 width 1 timing 0
> [    5.768902] mmc1: starting CMD8 arg 000001aa flags 000002f5
> [    5.871989] mmc1: req done (CMD8): -110: 00000000 00000000 00000000 00000000
> [    5.872007] mmc1: starting CMD5 arg 00000000 flags 000002e1
> [    5.976016] mmc1: req failed (CMD5): -110, retrying...
> [    6.046317] Console: switching to colour frame buffer device 200x56
> [    6.069187] i915 0000:00:02.0: fb0: inteldrmfb frame buffer device
> [    6.079995] mmc1: req failed (CMD5): -110, retrying...
> [    6.093063] sd 0:0:0:0: [sda] 1953525168 512-byte logical blocks: (1.00 TB/932 GiB)
> [    6.093068] sd 0:0:0:0: [sda] 4096-byte physical blocks
> [    6.093083] sd 0:0:0:0: [sda] Write Protect is off
> [    6.093087] sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
> [    6.093106] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
> [    6.127609] sr 1:0:0:0: [sr0] scsi3-mmc drive: 24x/24x writer dvd-ram cd/rw xa/form2 cdda tray
> [    6.127614] cdrom: Uniform CD-ROM driver Revision: 3.20
> [    6.128132] sr 1:0:0:0: Attached scsi CD-ROM sr0
> [    6.183982] mmc1: req failed (CMD5): -110, retrying...
> [    6.186423] sd 0:0:0:0: [sda] Attached SCSI disk
> [    6.287847] mmc1: req done (CMD5): -110: 00000000 00000000 00000000 00000000
> [    6.287869] mmc1: starting CMD55 arg 00000000 flags 000000f5
> [    6.299035] bcache: register_bdev() registered backing device sda
> [    6.395876] mmc1: req done (CMD55): -110: 00000000 00000000 00000000 00000000
> [    6.395890] mmc1: starting CMD55 arg 00000000 flags 000000f5
> [    6.499984] mmc1: req done (CMD55): -110: 00000000 00000000 00000000 00000000
> [    6.500005] mmc1: starting CMD55 arg 00000000 flags 000000f5
> [    6.603976] mmc1: req done (CMD55): -110: 00000000 00000000 00000000 00000000
> [    6.603999] mmc1: starting CMD55 arg 00000000 flags 000000f5
> [    6.620171] <mmc0: starting CMD23 arg 00000020 flags 00000015>
> [    6.620177] mmc0: starting CMD18 arg 000f4000 flags 000000b5
> [    6.620181] mmc0:     blksz 512 blocks 32 flags 00000200 tsac 400 ms nsac 1000
> [    6.620183] mmc0:     CMD12 arg 00000000 flags 00000095
> [    6.622351] mmc0: req done <CMD23>: 0: 00000000 00000000 00000000 00000000
> [    6.622357] mmc0: req done (CMD18): 0: 00000900 00000000 00000000 00000000
> [    6.622359] mmc0:     16384 bytes transferred: 0
> [    6.622362] mmc0:     (CMD12): 0: 00000000 00000000 00000000 00000000
> [    6.629789] <mmc0: starting CMD23 arg 00000020 flags 00000015>
> [    6.629794] mmc0: starting CMD18 arg 000f4000 flags 000000b5
> [    6.629798] mmc0:     blksz 512 blocks 32 flags 00000200 tsac 400 ms nsac 1000
> [    6.629801] mmc0:     CMD12 arg 00000000 flags 00000095
> [    6.630279] mmc0: req done <CMD23>: 0: 00000000 00000000 00000000 00000000
> [    6.630283] mmc0: req done (CMD18): 0: 00000900 00000000 00000000 00000000
> [    6.630286] mmc0:     16384 bytes transferred: 0
> [    6.630289] mmc0:     (CMD12): 0: 00000000 00000000 00000000 00000000
> [    6.634284] <mmc0: starting CMD23 arg 00000020 flags 00000015>
> [    6.634288] mmc0: starting CMD18 arg 000f4000 flags 000000b5
> [    6.634292] mmc0:     blksz 512 blocks 32 flags 00000200 tsac 400 ms nsac 1000
> [    6.634295] mmc0:     CMD12 arg 00000000 flags 00000095
> [    6.634771] mmc0: req done <CMD23>: 0: 00000000 00000000 00000000 00000000
> [    6.634775] mmc0: req done (CMD18): 0: 00000900 00000000 00000000 00000000
> [    6.634778] mmc0:     16384 bytes transferred: 0
> [    6.634781] mmc0:     (CMD12): 0: 00000000 00000000 00000000 00000000
> [    6.641766] <mmc0: starting CMD23 arg 00000008 flags 00000015>
> [    6.641768] mmc0: starting CMD18 arg 00cf3f80 flags 000000b5
> [    6.641770] mmc0:     blksz 512 blocks 8 flags 00000200 tsac 400 ms nsac 1000
> [    6.641771] mmc0:     CMD12 arg 00000000 flags 00000095
> [    6.641972] mmc0: req done <CMD23>: 0: 00000000 00000000 00000000 00000000
> [    6.641974] mmc0: req done (CMD18): 0: 00000900 00000000 00000000 00000000
> [    6.641975] mmc0:     4096 bytes transferred: 0
> [    6.641977] mmc0:     (CMD12): 0: 00000000 00000000 00000000 00000000
> [    6.642048] <mmc0: starting CMD23 arg 00000008 flags 00000015>
> [    6.642050] mmc0: starting CMD18 arg 00cf3ff0 flags 000000b5
> [    6.642052] mmc0:     blksz 512 blocks 8 flags 00000200 tsac 400 ms nsac 1000
> [    6.642053] mmc0:     CMD12 arg 00000000 flags 00000095
> [    6.707986] mmc1: req done (CMD55): -110: 00000000 00000000 00000000 00000000
> [    6.708008] mmc1: clock 300000Hz busmode 1 powermode 2 cs 0 Vdd 21 width 1 timing 0
> [    6.708117] mmc1: starting CMD1 arg 00000000 flags 000000e1
> [    6.811990] mmc1: req done (CMD1): -110: 00000000 00000000 00000000 00000000
> [    6.812013] mmc1: clock 0Hz busmode 2 powermode 0 cs 0 Vdd 0 width 1 timing 0
> [    6.813513] mmc1: mmc_rescan_try_freq: trying to init card at 250000 Hz
> [    6.813517] mmc1: clock 0Hz busmode 2 powermode 1 cs 0 Vdd 21 width 1 timing 0
> [    6.814257] rtsx_pci_sdmmc rtsx_pci_sdmmc.0: Initial signal voltage of 3.3v
> [    6.826921] mmc1: clock 250000Hz busmode 2 powermode 2 cs 0 Vdd 21 width 1 timing 0
> [    6.839686] mmc1: starting CMD52 arg 00000c00 flags 00000195
> [    6.939992] mmc1: req done (CMD52): -110: 00000000 00000000 00000000 00000000
> [    6.940015] mmc1: starting CMD52 arg 80000c08 flags 00000195
> [    7.043873] mmc1: req done (CMD52): -110: 00000000 00000000 00000000 00000000
> [    7.043893] mmc1: clock 250000Hz busmode 2 powermode 2 cs 1 Vdd 21 width 1 timing 0
> [    7.045248] mmc1: starting CMD0 arg 00000000 flags 000000c0
> [    7.045747] mmc1: req done (CMD0): 0: 00000000 00000000 00000000 00000000
> [    7.047033] mmc1: clock 250000Hz busmode 2 powermode 2 cs 0 Vdd 21 width 1 timing 0
> [    7.048383] mmc1: starting CMD8 arg 000001aa flags 000002f5
> [    7.151991] mmc1: req done (CMD8): -110: 00000000 00000000 00000000 00000000
> [    7.152014] mmc1: starting CMD5 arg 00000000 flags 000002e1
> [    7.256004] mmc1: req failed (CMD5): -110, retrying...
> [    7.359998] mmc1: req failed (CMD5): -110, retrying...
> [    7.463892] mmc1: req failed (CMD5): -110, retrying...
> [    7.567993] mmc1: req done (CMD5): -110: 00000000 00000000 00000000 00000000
> [    7.568017] mmc1: starting CMD55 arg 00000000 flags 000000f5
> [    7.671992] mmc1: req done (CMD55): -110: 00000000 00000000 00000000 00000000
> [    7.672014] mmc1: starting CMD55 arg 00000000 flags 000000f5
> [    7.775984] mmc1: req done (CMD55): -110: 00000000 00000000 00000000 00000000
> [    7.776008] mmc1: starting CMD55 arg 00000000 flags 000000f5
> [    7.879993] mmc1: req done (CMD55): -110: 00000000 00000000 00000000 00000000
> [    7.880015] mmc1: starting CMD55 arg 00000000 flags 000000f5
> [    7.983992] mmc1: req done (CMD55): -110: 00000000 00000000 00000000 00000000
> [    7.984014] mmc1: clock 250000Hz busmode 1 powermode 2 cs 0 Vdd 21 width 1 timing 0
> [    7.984112] mmc1: starting CMD1 arg 00000000 flags 000000e1
> [    8.087987] mmc1: req done (CMD1): -110: 00000000 00000000 00000000 00000000
> [    8.088009] mmc1: clock 0Hz busmode 2 powermode 0 cs 0 Vdd 0 width 1 timing 0
> [   16.823934] mmc0: Timeout waiting for hardware interrupt.
> [   16.823947] mmc0: sdhci: ============ SDHCI REGISTER DUMP ===========
> [   16.823956] mmc0: sdhci: Sys addr:  0x00000008 | Version:  0x00001002
> [   16.823964] mmc0: sdhci: Blk size:  0x00007200 | Blk cnt:  0x00000008
> [   16.823971] mmc0: sdhci: Argument:  0x00cf3ff0 | Trn mode: 0x0000003b
> [   16.823978] mmc0: sdhci: Present:   0x1fff0001 | Host ctl: 0x0000003d
> [   16.823985] mmc0: sdhci: Power:     0x0000000b | Blk gap:  0x00000080
> [   16.823992] mmc0: sdhci: Wake-up:   0x00000000 | Clock:    0x00000007
> [   16.823999] mmc0: sdhci: Timeout:   0x00000006 | Int stat: 0x00000000
> [   16.824006] mmc0: sdhci: Int enab:  0x02ff000b | Sig enab: 0x02ff000b
> [   16.824013] mmc0: sdhci: AC12 err:  0x00000000 | Slot int: 0x00000000
> [   16.824020] mmc0: sdhci: Caps:      0x546ec881 | Caps_1:   0x80000807
> [   16.824027] mmc0: sdhci: Cmd:       0x0000123a | Max curr: 0x00000000
> [   16.824034] mmc0: sdhci: Resp[0]:   0x00000900 | Resp[1]:  0x00000000
> [   16.824041] mmc0: sdhci: Resp[2]:   0x00000000 | Resp[3]:  0x00000900
> [   16.824047] mmc0: sdhci: Host ctl2: 0x0000000d
> [   16.824055] mmc0: sdhci: ADMA Err:  0x00000000 | ADMA Ptr: 0x000000016bbf5200
> [   16.824058] mmc0: sdhci: ============================================

Could be a BIOS issue.  Do you have the most up to date BIOS?
Anisse Astier Oct. 22, 2018, 1:40 p.m. UTC | #7
On Mon, Oct 22, 2018 at 03:34:24PM +0300, Adrian Hunter wrote:
> On 22/10/18 12:14 PM, Anisse Astier wrote:
[snip]
> > [   16.823947] mmc0: sdhci: ============ SDHCI REGISTER DUMP ===========
> > [   16.823956] mmc0: sdhci: Sys addr:  0x00000008 | Version:  0x00001002
> > [   16.823964] mmc0: sdhci: Blk size:  0x00007200 | Blk cnt:  0x00000008
> > [   16.823971] mmc0: sdhci: Argument:  0x00cf3ff0 | Trn mode: 0x0000003b
> > [   16.823978] mmc0: sdhci: Present:   0x1fff0001 | Host ctl: 0x0000003d
> > [   16.823985] mmc0: sdhci: Power:     0x0000000b | Blk gap:  0x00000080
> > [   16.823992] mmc0: sdhci: Wake-up:   0x00000000 | Clock:    0x00000007
> > [   16.823999] mmc0: sdhci: Timeout:   0x00000006 | Int stat: 0x00000000
> > [   16.824006] mmc0: sdhci: Int enab:  0x02ff000b | Sig enab: 0x02ff000b
> > [   16.824013] mmc0: sdhci: AC12 err:  0x00000000 | Slot int: 0x00000000
> > [   16.824020] mmc0: sdhci: Caps:      0x546ec881 | Caps_1:   0x80000807
> > [   16.824027] mmc0: sdhci: Cmd:       0x0000123a | Max curr: 0x00000000
> > [   16.824034] mmc0: sdhci: Resp[0]:   0x00000900 | Resp[1]:  0x00000000
> > [   16.824041] mmc0: sdhci: Resp[2]:   0x00000000 | Resp[3]:  0x00000900
> > [   16.824047] mmc0: sdhci: Host ctl2: 0x0000000d
> > [   16.824055] mmc0: sdhci: ADMA Err:  0x00000000 | ADMA Ptr: 0x000000016bbf5200
> > [   16.824058] mmc0: sdhci: ============================================
> 
> Could be a BIOS issue.  Do you have the most up to date BIOS?

What makes you think it could be ?

I have looked into the BIOS. This specific laptop model is a Clevo
N770GU. It does not have publicly-downloadable BIOS updates. Even it had
a BIOS fix, shouldn't we also have a best effort fix in Linux for all
those machines that are never updated ?

Regards,

Anisse
Adrian Hunter Oct. 23, 2018, 1:47 p.m. UTC | #8
On 22/10/18 4:40 PM, Anisse Astier wrote:
> On Mon, Oct 22, 2018 at 03:34:24PM +0300, Adrian Hunter wrote:
>> On 22/10/18 12:14 PM, Anisse Astier wrote:
> [snip]
>>> [   16.823947] mmc0: sdhci: ============ SDHCI REGISTER DUMP ===========
>>> [   16.823956] mmc0: sdhci: Sys addr:  0x00000008 | Version:  0x00001002
>>> [   16.823964] mmc0: sdhci: Blk size:  0x00007200 | Blk cnt:  0x00000008
>>> [   16.823971] mmc0: sdhci: Argument:  0x00cf3ff0 | Trn mode: 0x0000003b
>>> [   16.823978] mmc0: sdhci: Present:   0x1fff0001 | Host ctl: 0x0000003d
>>> [   16.823985] mmc0: sdhci: Power:     0x0000000b | Blk gap:  0x00000080
>>> [   16.823992] mmc0: sdhci: Wake-up:   0x00000000 | Clock:    0x00000007
>>> [   16.823999] mmc0: sdhci: Timeout:   0x00000006 | Int stat: 0x00000000
>>> [   16.824006] mmc0: sdhci: Int enab:  0x02ff000b | Sig enab: 0x02ff000b
>>> [   16.824013] mmc0: sdhci: AC12 err:  0x00000000 | Slot int: 0x00000000
>>> [   16.824020] mmc0: sdhci: Caps:      0x546ec881 | Caps_1:   0x80000807
>>> [   16.824027] mmc0: sdhci: Cmd:       0x0000123a | Max curr: 0x00000000
>>> [   16.824034] mmc0: sdhci: Resp[0]:   0x00000900 | Resp[1]:  0x00000000
>>> [   16.824041] mmc0: sdhci: Resp[2]:   0x00000000 | Resp[3]:  0x00000900
>>> [   16.824047] mmc0: sdhci: Host ctl2: 0x0000000d
>>> [   16.824055] mmc0: sdhci: ADMA Err:  0x00000000 | ADMA Ptr: 0x000000016bbf5200
>>> [   16.824058] mmc0: sdhci: ============================================
>>
>> Could be a BIOS issue.  Do you have the most up to date BIOS?
> 
> What makes you think it could be ?

It seems like the host controller just stopped for no apparent reason.

> I have looked into the BIOS. This specific laptop model is a Clevo
> N770GU. It does not have publicly-downloadable BIOS updates. Even it had
> a BIOS fix, shouldn't we also have a best effort fix in Linux for all
> those machines that are never updated ?

It may come to that.

Here are a couple of patches to get a bit more information.  Also, if you
config sdhci as built-in (y instead of m) then we should see a debug message
from the interrupt handler.
From 12a4a0e83a439577534b4e58888635a1b4efd575 Mon Sep 17 00:00:00 2001
From: Adrian Hunter <adrian.hunter@intel.com>
Date: Mon, 22 Oct 2018 10:09:34 +0300
Subject: [PATCH 1/2] HACK: mmc: sdhci: Add GLK registers to register dump

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 drivers/mmc/host/sdhci.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/mmc/host/sdhci.c b/drivers/mmc/host/sdhci.c
index 99bdae53fa2e..ae87b9474b1b 100644
--- a/drivers/mmc/host/sdhci.c
+++ b/drivers/mmc/host/sdhci.c
@@ -56,6 +56,8 @@ static void sdhci_enable_preset_value(struct sdhci_host *host, bool enable);
 
 void sdhci_dumpregs(struct sdhci_host *host)
 {
+	int reg;
+
 	SDHCI_DUMP("============ SDHCI REGISTER DUMP ===========\n");
 
 	SDHCI_DUMP("Sys addr:  0x%08x | Version:  0x%08x\n",
@@ -113,6 +115,12 @@ void sdhci_dumpregs(struct sdhci_host *host)
 		}
 	}
 
+	for (reg = 0x804; reg <= 0x840; reg += 4) {
+		if (reg == 0x80c || reg == 0x3c)
+			continue;
+		SDHCI_DUMP("%#x:     0x%08x\n", reg, sdhci_readl(host, reg));
+	}
+
 	SDHCI_DUMP("============================================\n");
 }
 EXPORT_SYMBOL_GPL(sdhci_dumpregs);
Anisse Astier Oct. 25, 2018, 10:06 a.m. UTC | #9
On Tue, Oct 23, 2018 at 04:47:05PM +0300, Adrian Hunter wrote:
> On 22/10/18 4:40 PM, Anisse Astier wrote:
> > On Mon, Oct 22, 2018 at 03:34:24PM +0300, Adrian Hunter wrote:
> >> On 22/10/18 12:14 PM, Anisse Astier wrote:
> > [snip]
> >>> [   16.823947] mmc0: sdhci: ============ SDHCI REGISTER DUMP ===========
> >>> [   16.823956] mmc0: sdhci: Sys addr:  0x00000008 | Version:  0x00001002
> >>> [   16.823964] mmc0: sdhci: Blk size:  0x00007200 | Blk cnt:  0x00000008
> >>> [   16.823971] mmc0: sdhci: Argument:  0x00cf3ff0 | Trn mode: 0x0000003b
> >>> [   16.823978] mmc0: sdhci: Present:   0x1fff0001 | Host ctl: 0x0000003d
> >>> [   16.823985] mmc0: sdhci: Power:     0x0000000b | Blk gap:  0x00000080
> >>> [   16.823992] mmc0: sdhci: Wake-up:   0x00000000 | Clock:    0x00000007
> >>> [   16.823999] mmc0: sdhci: Timeout:   0x00000006 | Int stat: 0x00000000
> >>> [   16.824006] mmc0: sdhci: Int enab:  0x02ff000b | Sig enab: 0x02ff000b
> >>> [   16.824013] mmc0: sdhci: AC12 err:  0x00000000 | Slot int: 0x00000000
> >>> [   16.824020] mmc0: sdhci: Caps:      0x546ec881 | Caps_1:   0x80000807
> >>> [   16.824027] mmc0: sdhci: Cmd:       0x0000123a | Max curr: 0x00000000
> >>> [   16.824034] mmc0: sdhci: Resp[0]:   0x00000900 | Resp[1]:  0x00000000
> >>> [   16.824041] mmc0: sdhci: Resp[2]:   0x00000000 | Resp[3]:  0x00000900
> >>> [   16.824047] mmc0: sdhci: Host ctl2: 0x0000000d
> >>> [   16.824055] mmc0: sdhci: ADMA Err:  0x00000000 | ADMA Ptr: 0x000000016bbf5200
> >>> [   16.824058] mmc0: sdhci: ============================================
> >>
> >> Could be a BIOS issue.  Do you have the most up to date BIOS?
> > 
> > What makes you think it could be ?
> 
> It seems like the host controller just stopped for no apparent reason.
> 
> > I have looked into the BIOS. This specific laptop model is a Clevo
> > N770GU. It does not have publicly-downloadable BIOS updates. Even it had
> > a BIOS fix, shouldn't we also have a best effort fix in Linux for all
> > those machines that are never updated ?
> 
> It may come to that.
> 
> Here are a couple of patches to get a bit more information.  Also, if you
> config sdhci as built-in (y instead of m) then we should see a debug message
> from the interrupt handler.

Please find the new log here:
https://anisse.astier.eu/static/dmesg-4.19.nocqe-builtin-regdump.txt.xz

CQHCI is still disabled, sdhci is built-in, and it has your register
dump patches applied. Here is the register dump after the first (and
only) timeout:

[   16.312690] mmc0: Timeout waiting for hardware interrupt.
[   16.312702] mmc0: sdhci: ============ SDHCI REGISTER DUMP ===========
[   16.312711] mmc0: sdhci: Sys addr:  0x00000008 | Version:  0x00001002
[   16.312718] mmc0: sdhci: Blk size:  0x00007200 | Blk cnt:  0x00000008
[   16.312726] mmc0: sdhci: Argument:  0x00cf3f80 | Trn mode: 0x0000003b
[   16.312733] mmc0: sdhci: Present:   0x1fff0206 | Host ctl: 0x0000003d
[   16.312740] mmc0: sdhci: Power:     0x0000000b | Blk gap:  0x00000080
[   16.312747] mmc0: sdhci: Wake-up:   0x00000000 | Clock:    0x00000007
[   16.312753] mmc0: sdhci: Timeout:   0x00000006 | Int stat: 0x00000000
[   16.312760] mmc0: sdhci: Int enab:  0x02ff000b | Sig enab: 0x02ff000b
[   16.312767] mmc0: sdhci: AC12 err:  0x00000000 | Slot int: 0x00000000
[   16.312774] mmc0: sdhci: Caps:      0x546ec881 | Caps_1:   0x80000807
[   16.312780] mmc0: sdhci: Cmd:       0x0000123a | Max curr: 0x00000000
[   16.312786] mmc0: sdhci: Resp[0]:   0x00000800 | Resp[1]:  0x00000000
[   16.312793] mmc0: sdhci: Resp[2]:   0x00000000 | Resp[3]:  0x00000900
[   16.312799] mmc0: sdhci: Host ctl2: 0x0000000d
[   16.312807] mmc0: sdhci: ADMA Err:  0x00000000 | ADMA Ptr: 0x000000016ab6d200
[   16.312814] mmc0: sdhci: 0x804:     0x00000800
[   16.312821] mmc0: sdhci: 0x808:     0x00000800
[   16.312827] mmc0: sdhci: 0x810:     0x0000005a
[   16.312833] mmc0: sdhci: 0x814:     0x3050eb1e
[   16.312839] mmc0: sdhci: 0x818:     0x040040c8
[   16.312845] mmc0: sdhci: 0x81c:     0x00000008
[   16.312851] mmc0: sdhci: 0x820:     0x00000502
[   16.312858] mmc0: sdhci: 0x824:     0x00000811
[   16.312864] mmc0: sdhci: 0x828:     0x1c2a2927
[   16.312870] mmc0: sdhci: 0x82c:     0x000d162f
[   16.312876] mmc0: sdhci: 0x830:     0x00000a0a
[   16.312882] mmc0: sdhci: 0x834:     0x0001003b
[   16.312889] mmc0: sdhci: 0x838:     0x00800001
[   16.312895] mmc0: sdhci: 0x83c:     0x00000000
[   16.312901] mmc0: sdhci: 0x840:     0x00000000
[   16.312904] mmc0: sdhci: ============================================


Regards,

Anisse
Anisse Astier Nov. 4, 2018, 4:33 p.m. UTC | #10
Hi Adrian,

On Thu, Oct 25, 2018 at 12:06:31PM +0200, Anisse Astier wrote:
> On Tue, Oct 23, 2018 at 04:47:05PM +0300, Adrian Hunter wrote:
> > Here are a couple of patches to get a bit more information.  Also, if you
> > config sdhci as built-in (y instead of m) then we should see a debug message
> > from the interrupt handler.
> 
> Please find the new log here:
> https://anisse.astier.eu/static/dmesg-4.19.nocqe-builtin-regdump.txt.xz
> 
> CQHCI is still disabled, sdhci is built-in, and it has your register
> dump patches applied. Here is the register dump after the first (and
> only) timeout:
> 
> [   16.312690] mmc0: Timeout waiting for hardware interrupt.
> [   16.312702] mmc0: sdhci: ============ SDHCI REGISTER DUMP ===========
> [   16.312711] mmc0: sdhci: Sys addr:  0x00000008 | Version:  0x00001002
> [   16.312718] mmc0: sdhci: Blk size:  0x00007200 | Blk cnt:  0x00000008
> [   16.312726] mmc0: sdhci: Argument:  0x00cf3f80 | Trn mode: 0x0000003b
> [   16.312733] mmc0: sdhci: Present:   0x1fff0206 | Host ctl: 0x0000003d
> [   16.312740] mmc0: sdhci: Power:     0x0000000b | Blk gap:  0x00000080
> [   16.312747] mmc0: sdhci: Wake-up:   0x00000000 | Clock:    0x00000007
> [   16.312753] mmc0: sdhci: Timeout:   0x00000006 | Int stat: 0x00000000
> [   16.312760] mmc0: sdhci: Int enab:  0x02ff000b | Sig enab: 0x02ff000b
> [   16.312767] mmc0: sdhci: AC12 err:  0x00000000 | Slot int: 0x00000000
> [   16.312774] mmc0: sdhci: Caps:      0x546ec881 | Caps_1:   0x80000807
> [   16.312780] mmc0: sdhci: Cmd:       0x0000123a | Max curr: 0x00000000
> [   16.312786] mmc0: sdhci: Resp[0]:   0x00000800 | Resp[1]:  0x00000000
> [   16.312793] mmc0: sdhci: Resp[2]:   0x00000000 | Resp[3]:  0x00000900
> [   16.312799] mmc0: sdhci: Host ctl2: 0x0000000d
> [   16.312807] mmc0: sdhci: ADMA Err:  0x00000000 | ADMA Ptr: 0x000000016ab6d200
> [   16.312814] mmc0: sdhci: 0x804:     0x00000800
> [   16.312821] mmc0: sdhci: 0x808:     0x00000800
> [   16.312827] mmc0: sdhci: 0x810:     0x0000005a
> [   16.312833] mmc0: sdhci: 0x814:     0x3050eb1e
> [   16.312839] mmc0: sdhci: 0x818:     0x040040c8
> [   16.312845] mmc0: sdhci: 0x81c:     0x00000008
> [   16.312851] mmc0: sdhci: 0x820:     0x00000502
> [   16.312858] mmc0: sdhci: 0x824:     0x00000811
> [   16.312864] mmc0: sdhci: 0x828:     0x1c2a2927
> [   16.312870] mmc0: sdhci: 0x82c:     0x000d162f
> [   16.312876] mmc0: sdhci: 0x830:     0x00000a0a
> [   16.312882] mmc0: sdhci: 0x834:     0x0001003b
> [   16.312889] mmc0: sdhci: 0x838:     0x00800001
> [   16.312895] mmc0: sdhci: 0x83c:     0x00000000
> [   16.312901] mmc0: sdhci: 0x840:     0x00000000
> [   16.312904] mmc0: sdhci: ============================================


Is this of any help to understand what's wrong ?

Regards,

Anisse
Adrian Hunter Nov. 5, 2018, 1:37 p.m. UTC | #11
On 4/11/18 6:33 PM, Anisse Astier wrote:
> Hi Adrian,
> 
> On Thu, Oct 25, 2018 at 12:06:31PM +0200, Anisse Astier wrote:
>> On Tue, Oct 23, 2018 at 04:47:05PM +0300, Adrian Hunter wrote:
>>> Here are a couple of patches to get a bit more information.  Also, if you
>>> config sdhci as built-in (y instead of m) then we should see a debug message
>>> from the interrupt handler.
>>
>> Please find the new log here:
>> https://anisse.astier.eu/static/dmesg-4.19.nocqe-builtin-regdump.txt.xz
>>
>> CQHCI is still disabled, sdhci is built-in, and it has your register
>> dump patches applied. Here is the register dump after the first (and
>> only) timeout:
>>
>> [   16.312690] mmc0: Timeout waiting for hardware interrupt.
>> [   16.312702] mmc0: sdhci: ============ SDHCI REGISTER DUMP ===========
>> [   16.312711] mmc0: sdhci: Sys addr:  0x00000008 | Version:  0x00001002
>> [   16.312718] mmc0: sdhci: Blk size:  0x00007200 | Blk cnt:  0x00000008
>> [   16.312726] mmc0: sdhci: Argument:  0x00cf3f80 | Trn mode: 0x0000003b
>> [   16.312733] mmc0: sdhci: Present:   0x1fff0206 | Host ctl: 0x0000003d
>> [   16.312740] mmc0: sdhci: Power:     0x0000000b | Blk gap:  0x00000080
>> [   16.312747] mmc0: sdhci: Wake-up:   0x00000000 | Clock:    0x00000007
>> [   16.312753] mmc0: sdhci: Timeout:   0x00000006 | Int stat: 0x00000000
>> [   16.312760] mmc0: sdhci: Int enab:  0x02ff000b | Sig enab: 0x02ff000b
>> [   16.312767] mmc0: sdhci: AC12 err:  0x00000000 | Slot int: 0x00000000
>> [   16.312774] mmc0: sdhci: Caps:      0x546ec881 | Caps_1:   0x80000807
>> [   16.312780] mmc0: sdhci: Cmd:       0x0000123a | Max curr: 0x00000000
>> [   16.312786] mmc0: sdhci: Resp[0]:   0x00000800 | Resp[1]:  0x00000000
>> [   16.312793] mmc0: sdhci: Resp[2]:   0x00000000 | Resp[3]:  0x00000900
>> [   16.312799] mmc0: sdhci: Host ctl2: 0x0000000d
>> [   16.312807] mmc0: sdhci: ADMA Err:  0x00000000 | ADMA Ptr: 0x000000016ab6d200
>> [   16.312814] mmc0: sdhci: 0x804:     0x00000800
>> [   16.312821] mmc0: sdhci: 0x808:     0x00000800
>> [   16.312827] mmc0: sdhci: 0x810:     0x0000005a
>> [   16.312833] mmc0: sdhci: 0x814:     0x3050eb1e
>> [   16.312839] mmc0: sdhci: 0x818:     0x040040c8
>> [   16.312845] mmc0: sdhci: 0x81c:     0x00000008
>> [   16.312851] mmc0: sdhci: 0x820:     0x00000502
>> [   16.312858] mmc0: sdhci: 0x824:     0x00000811
>> [   16.312864] mmc0: sdhci: 0x828:     0x1c2a2927
>> [   16.312870] mmc0: sdhci: 0x82c:     0x000d162f
>> [   16.312876] mmc0: sdhci: 0x830:     0x00000a0a
>> [   16.312882] mmc0: sdhci: 0x834:     0x0001003b
>> [   16.312889] mmc0: sdhci: 0x838:     0x00800001
>> [   16.312895] mmc0: sdhci: 0x83c:     0x00000000
>> [   16.312901] mmc0: sdhci: 0x840:     0x00000000
>> [   16.312904] mmc0: sdhci: ============================================
> 
> 
> Is this of any help to understand what's wrong ?

Sorry, I will look at this soon.
Anisse Astier Nov. 8, 2018, 3:55 p.m. UTC | #12
On Mon, Nov 05, 2018 at 03:37:05PM +0200, Adrian Hunter wrote:
> On 4/11/18 6:33 PM, Anisse Astier wrote:
> > Hi Adrian,
> > 
> > On Thu, Oct 25, 2018 at 12:06:31PM +0200, Anisse Astier wrote:
> >> On Tue, Oct 23, 2018 at 04:47:05PM +0300, Adrian Hunter wrote:
> >>> Here are a couple of patches to get a bit more information.  Also, if you
> >>> config sdhci as built-in (y instead of m) then we should see a debug message
> >>> from the interrupt handler.
> >>
> >> Please find the new log here:
> >> https://anisse.astier.eu/static/dmesg-4.19.nocqe-builtin-regdump.txt.xz
> >>
> >> CQHCI is still disabled, sdhci is built-in, and it has your register
> >> dump patches applied. Here is the register dump after the first (and
> >> only) timeout:
> >>
> >> [   16.312690] mmc0: Timeout waiting for hardware interrupt.
> >> [   16.312702] mmc0: sdhci: ============ SDHCI REGISTER DUMP ===========
> >> [   16.312711] mmc0: sdhci: Sys addr:  0x00000008 | Version:  0x00001002
> >> [   16.312718] mmc0: sdhci: Blk size:  0x00007200 | Blk cnt:  0x00000008
> >> [   16.312726] mmc0: sdhci: Argument:  0x00cf3f80 | Trn mode: 0x0000003b
> >> [   16.312733] mmc0: sdhci: Present:   0x1fff0206 | Host ctl: 0x0000003d
> >> [   16.312740] mmc0: sdhci: Power:     0x0000000b | Blk gap:  0x00000080
> >> [   16.312747] mmc0: sdhci: Wake-up:   0x00000000 | Clock:    0x00000007
> >> [   16.312753] mmc0: sdhci: Timeout:   0x00000006 | Int stat: 0x00000000
> >> [   16.312760] mmc0: sdhci: Int enab:  0x02ff000b | Sig enab: 0x02ff000b
> >> [   16.312767] mmc0: sdhci: AC12 err:  0x00000000 | Slot int: 0x00000000
> >> [   16.312774] mmc0: sdhci: Caps:      0x546ec881 | Caps_1:   0x80000807
> >> [   16.312780] mmc0: sdhci: Cmd:       0x0000123a | Max curr: 0x00000000
> >> [   16.312786] mmc0: sdhci: Resp[0]:   0x00000800 | Resp[1]:  0x00000000
> >> [   16.312793] mmc0: sdhci: Resp[2]:   0x00000000 | Resp[3]:  0x00000900
> >> [   16.312799] mmc0: sdhci: Host ctl2: 0x0000000d
> >> [   16.312807] mmc0: sdhci: ADMA Err:  0x00000000 | ADMA Ptr: 0x000000016ab6d200
> >> [   16.312814] mmc0: sdhci: 0x804:     0x00000800
> >> [   16.312821] mmc0: sdhci: 0x808:     0x00000800
> >> [   16.312827] mmc0: sdhci: 0x810:     0x0000005a
> >> [   16.312833] mmc0: sdhci: 0x814:     0x3050eb1e
> >> [   16.312839] mmc0: sdhci: 0x818:     0x040040c8
> >> [   16.312845] mmc0: sdhci: 0x81c:     0x00000008
> >> [   16.312851] mmc0: sdhci: 0x820:     0x00000502
> >> [   16.312858] mmc0: sdhci: 0x824:     0x00000811
> >> [   16.312864] mmc0: sdhci: 0x828:     0x1c2a2927
> >> [   16.312870] mmc0: sdhci: 0x82c:     0x000d162f
> >> [   16.312876] mmc0: sdhci: 0x830:     0x00000a0a
> >> [   16.312882] mmc0: sdhci: 0x834:     0x0001003b
> >> [   16.312889] mmc0: sdhci: 0x838:     0x00800001
> >> [   16.312895] mmc0: sdhci: 0x83c:     0x00000000
> >> [   16.312901] mmc0: sdhci: 0x840:     0x00000000
> >> [   16.312904] mmc0: sdhci: ============================================
> > 
> > 
> > Is this of any help to understand what's wrong ?
> 
> Sorry, I will look at this soon.
> 

Thanks.


Anisse
Adrian Hunter Nov. 8, 2018, 5:43 p.m. UTC | #13
> -----Original Message-----
> From: Anisse Astier [mailto:anisse@astier.eu]
> Sent: Thursday, November 8, 2018 5:55 PM
> To: Hunter, Adrian <adrian.hunter@intel.com>
> Cc: linux-mmc@vger.kernel.org; Ulf Hansson <ulf.hansson@linaro.org>
> Subject: Re: [PATCH] mmc: sdhci-pci: disable intel voltage switch if
> unsupported
> 
> On Mon, Nov 05, 2018 at 03:37:05PM +0200, Adrian Hunter wrote:
> > On 4/11/18 6:33 PM, Anisse Astier wrote:
> > > Hi Adrian,
> > >
> > > On Thu, Oct 25, 2018 at 12:06:31PM +0200, Anisse Astier wrote:
> > >> On Tue, Oct 23, 2018 at 04:47:05PM +0300, Adrian Hunter wrote:
> > >>> Here are a couple of patches to get a bit more information.  Also,
> > >>> if you config sdhci as built-in (y instead of m) then we should
> > >>> see a debug message from the interrupt handler.
> > >>
> > >> Please find the new log here:
> > >> https://anisse.astier.eu/static/dmesg-4.19.nocqe-builtin-regdump.tx
> > >> t.xz
> > >>
> > >> CQHCI is still disabled, sdhci is built-in, and it has your
> > >> register dump patches applied. Here is the register dump after the
> > >> first (and
> > >> only) timeout:
> > >>
> > >> [   16.312690] mmc0: Timeout waiting for hardware interrupt.
> > >> [   16.312702] mmc0: sdhci: ============ SDHCI REGISTER DUMP
> ===========
> > >> [   16.312711] mmc0: sdhci: Sys addr:  0x00000008 | Version:  0x00001002
> > >> [   16.312718] mmc0: sdhci: Blk size:  0x00007200 | Blk cnt:  0x00000008
> > >> [   16.312726] mmc0: sdhci: Argument:  0x00cf3f80 | Trn mode:
> 0x0000003b
> > >> [   16.312733] mmc0: sdhci: Present:   0x1fff0206 | Host ctl: 0x0000003d
> > >> [   16.312740] mmc0: sdhci: Power:     0x0000000b | Blk gap:  0x00000080
> > >> [   16.312747] mmc0: sdhci: Wake-up:   0x00000000 | Clock:    0x00000007
> > >> [   16.312753] mmc0: sdhci: Timeout:   0x00000006 | Int stat: 0x00000000
> > >> [   16.312760] mmc0: sdhci: Int enab:  0x02ff000b | Sig enab: 0x02ff000b
> > >> [   16.312767] mmc0: sdhci: AC12 err:  0x00000000 | Slot int: 0x00000000
> > >> [   16.312774] mmc0: sdhci: Caps:      0x546ec881 | Caps_1:   0x80000807
> > >> [   16.312780] mmc0: sdhci: Cmd:       0x0000123a | Max curr: 0x00000000
> > >> [   16.312786] mmc0: sdhci: Resp[0]:   0x00000800 | Resp[1]:  0x00000000
> > >> [   16.312793] mmc0: sdhci: Resp[2]:   0x00000000 | Resp[3]:  0x00000900
> > >> [   16.312799] mmc0: sdhci: Host ctl2: 0x0000000d
> > >> [   16.312807] mmc0: sdhci: ADMA Err:  0x00000000 | ADMA Ptr:
> 0x000000016ab6d200
> > >> [   16.312814] mmc0: sdhci: 0x804:     0x00000800
> > >> [   16.312821] mmc0: sdhci: 0x808:     0x00000800
> > >> [   16.312827] mmc0: sdhci: 0x810:     0x0000005a
> > >> [   16.312833] mmc0: sdhci: 0x814:     0x3050eb1e
> > >> [   16.312839] mmc0: sdhci: 0x818:     0x040040c8
> > >> [   16.312845] mmc0: sdhci: 0x81c:     0x00000008
> > >> [   16.312851] mmc0: sdhci: 0x820:     0x00000502
> > >> [   16.312858] mmc0: sdhci: 0x824:     0x00000811
> > >> [   16.312864] mmc0: sdhci: 0x828:     0x1c2a2927
> > >> [   16.312870] mmc0: sdhci: 0x82c:     0x000d162f
> > >> [   16.312876] mmc0: sdhci: 0x830:     0x00000a0a
> > >> [   16.312882] mmc0: sdhci: 0x834:     0x0001003b
> > >> [   16.312889] mmc0: sdhci: 0x838:     0x00800001
> > >> [   16.312895] mmc0: sdhci: 0x83c:     0x00000000
> > >> [   16.312901] mmc0: sdhci: 0x840:     0x00000000
> > >> [   16.312904] mmc0: sdhci:
> ============================================
> > >
> > >
> > > Is this of any help to understand what's wrong ?
> >
> > Sorry, I will look at this soon.
> >

I found some time today.  It seems that the ACPI _PS3 method is failing to save the tuning value.
That results in a CRC error, but the driver does not recover very gracefully.

I will look at making a patch for the driver recovery, but also add some diagnostics prints to the ACPI DSDT to try to figure out what is going wrong there.
Anisse Astier Nov. 16, 2018, 9:37 a.m. UTC | #14
Hi Adrian, 

On Thu, Nov 08, 2018 at 05:43:32PM +0000, Hunter, Adrian wrote:
> > -----Original Message-----
> > From: Anisse Astier [mailto:anisse@astier.eu]
> > Sent: Thursday, November 8, 2018 5:55 PM
> > To: Hunter, Adrian <adrian.hunter@intel.com>
> > Cc: linux-mmc@vger.kernel.org; Ulf Hansson <ulf.hansson@linaro.org>
> > Subject: Re: [PATCH] mmc: sdhci-pci: disable intel voltage switch if
> > unsupported
> > 
> > On Mon, Nov 05, 2018 at 03:37:05PM +0200, Adrian Hunter wrote:
> > > On 4/11/18 6:33 PM, Anisse Astier wrote:
> > > > Hi Adrian,
> > > >
> > > > On Thu, Oct 25, 2018 at 12:06:31PM +0200, Anisse Astier wrote:
> > > >> On Tue, Oct 23, 2018 at 04:47:05PM +0300, Adrian Hunter wrote:
> > > >>> Here are a couple of patches to get a bit more information.  Also,
> > > >>> if you config sdhci as built-in (y instead of m) then we should
> > > >>> see a debug message from the interrupt handler.
> > > >>
> > > >> Please find the new log here:
> > > >> https://anisse.astier.eu/static/dmesg-4.19.nocqe-builtin-regdump.tx
> > > >> t.xz
> > > >>
> > > >> CQHCI is still disabled, sdhci is built-in, and it has your
> > > >> register dump patches applied. Here is the register dump after the
> > > >> first (and
> > > >> only) timeout:
> > > >>
> > > >> [   16.312690] mmc0: Timeout waiting for hardware interrupt.
> > > >> [   16.312702] mmc0: sdhci: ============ SDHCI REGISTER DUMP
> > ===========
> > > >> [   16.312711] mmc0: sdhci: Sys addr:  0x00000008 | Version:  0x00001002
> > > >> [   16.312718] mmc0: sdhci: Blk size:  0x00007200 | Blk cnt:  0x00000008
> > > >> [   16.312726] mmc0: sdhci: Argument:  0x00cf3f80 | Trn mode:
> > 0x0000003b
> > > >> [   16.312733] mmc0: sdhci: Present:   0x1fff0206 | Host ctl: 0x0000003d
> > > >> [   16.312740] mmc0: sdhci: Power:     0x0000000b | Blk gap:  0x00000080
> > > >> [   16.312747] mmc0: sdhci: Wake-up:   0x00000000 | Clock:    0x00000007
> > > >> [   16.312753] mmc0: sdhci: Timeout:   0x00000006 | Int stat: 0x00000000
> > > >> [   16.312760] mmc0: sdhci: Int enab:  0x02ff000b | Sig enab: 0x02ff000b
> > > >> [   16.312767] mmc0: sdhci: AC12 err:  0x00000000 | Slot int: 0x00000000
> > > >> [   16.312774] mmc0: sdhci: Caps:      0x546ec881 | Caps_1:   0x80000807
> > > >> [   16.312780] mmc0: sdhci: Cmd:       0x0000123a | Max curr: 0x00000000
> > > >> [   16.312786] mmc0: sdhci: Resp[0]:   0x00000800 | Resp[1]:  0x00000000
> > > >> [   16.312793] mmc0: sdhci: Resp[2]:   0x00000000 | Resp[3]:  0x00000900
> > > >> [   16.312799] mmc0: sdhci: Host ctl2: 0x0000000d
> > > >> [   16.312807] mmc0: sdhci: ADMA Err:  0x00000000 | ADMA Ptr:
> > 0x000000016ab6d200
> > > >> [   16.312814] mmc0: sdhci: 0x804:     0x00000800
> > > >> [   16.312821] mmc0: sdhci: 0x808:     0x00000800
> > > >> [   16.312827] mmc0: sdhci: 0x810:     0x0000005a
> > > >> [   16.312833] mmc0: sdhci: 0x814:     0x3050eb1e
> > > >> [   16.312839] mmc0: sdhci: 0x818:     0x040040c8
> > > >> [   16.312845] mmc0: sdhci: 0x81c:     0x00000008
> > > >> [   16.312851] mmc0: sdhci: 0x820:     0x00000502
> > > >> [   16.312858] mmc0: sdhci: 0x824:     0x00000811
> > > >> [   16.312864] mmc0: sdhci: 0x828:     0x1c2a2927
> > > >> [   16.312870] mmc0: sdhci: 0x82c:     0x000d162f
> > > >> [   16.312876] mmc0: sdhci: 0x830:     0x00000a0a
> > > >> [   16.312882] mmc0: sdhci: 0x834:     0x0001003b
> > > >> [   16.312889] mmc0: sdhci: 0x838:     0x00800001
> > > >> [   16.312895] mmc0: sdhci: 0x83c:     0x00000000
> > > >> [   16.312901] mmc0: sdhci: 0x840:     0x00000000
> > > >> [   16.312904] mmc0: sdhci:
> > ============================================
> > > >
> > > >
> > > > Is this of any help to understand what's wrong ?
> > >
> > > Sorry, I will look at this soon.
> > >
> 
> I found some time today.  It seems that the ACPI _PS3 method is failing to save the tuning value.
> That results in a CRC error, but the driver does not recover very gracefully.
> 
> I will look at making a patch for the driver recovery, but also add some diagnostics prints to the ACPI DSDT to try to figure out what is going wrong there.
> 
> 

Have you had the time to look at what diagnostics could be added ?

Regards,

Anisse
Adrian Hunter Nov. 16, 2018, 9:50 a.m. UTC | #15
On 16/11/18 11:37 AM, Anisse Astier wrote:
> Hi Adrian, 
> 
> On Thu, Nov 08, 2018 at 05:43:32PM +0000, Hunter, Adrian wrote:
>>> -----Original Message-----
>>> From: Anisse Astier [mailto:anisse@astier.eu]
>>> Sent: Thursday, November 8, 2018 5:55 PM
>>> To: Hunter, Adrian <adrian.hunter@intel.com>
>>> Cc: linux-mmc@vger.kernel.org; Ulf Hansson <ulf.hansson@linaro.org>
>>> Subject: Re: [PATCH] mmc: sdhci-pci: disable intel voltage switch if
>>> unsupported
>>>
>>> On Mon, Nov 05, 2018 at 03:37:05PM +0200, Adrian Hunter wrote:
>>>> On 4/11/18 6:33 PM, Anisse Astier wrote:
>>>>> Hi Adrian,
>>>>>
>>>>> On Thu, Oct 25, 2018 at 12:06:31PM +0200, Anisse Astier wrote:
>>>>>> On Tue, Oct 23, 2018 at 04:47:05PM +0300, Adrian Hunter wrote:
>>>>>>> Here are a couple of patches to get a bit more information.  Also,
>>>>>>> if you config sdhci as built-in (y instead of m) then we should
>>>>>>> see a debug message from the interrupt handler.
>>>>>>
>>>>>> Please find the new log here:
>>>>>> https://anisse.astier.eu/static/dmesg-4.19.nocqe-builtin-regdump.tx
>>>>>> t.xz
>>>>>>
>>>>>> CQHCI is still disabled, sdhci is built-in, and it has your
>>>>>> register dump patches applied. Here is the register dump after the
>>>>>> first (and
>>>>>> only) timeout:
>>>>>>
>>>>>> [   16.312690] mmc0: Timeout waiting for hardware interrupt.
>>>>>> [   16.312702] mmc0: sdhci: ============ SDHCI REGISTER DUMP
>>> ===========
>>>>>> [   16.312711] mmc0: sdhci: Sys addr:  0x00000008 | Version:  0x00001002
>>>>>> [   16.312718] mmc0: sdhci: Blk size:  0x00007200 | Blk cnt:  0x00000008
>>>>>> [   16.312726] mmc0: sdhci: Argument:  0x00cf3f80 | Trn mode:
>>> 0x0000003b
>>>>>> [   16.312733] mmc0: sdhci: Present:   0x1fff0206 | Host ctl: 0x0000003d
>>>>>> [   16.312740] mmc0: sdhci: Power:     0x0000000b | Blk gap:  0x00000080
>>>>>> [   16.312747] mmc0: sdhci: Wake-up:   0x00000000 | Clock:    0x00000007
>>>>>> [   16.312753] mmc0: sdhci: Timeout:   0x00000006 | Int stat: 0x00000000
>>>>>> [   16.312760] mmc0: sdhci: Int enab:  0x02ff000b | Sig enab: 0x02ff000b
>>>>>> [   16.312767] mmc0: sdhci: AC12 err:  0x00000000 | Slot int: 0x00000000
>>>>>> [   16.312774] mmc0: sdhci: Caps:      0x546ec881 | Caps_1:   0x80000807
>>>>>> [   16.312780] mmc0: sdhci: Cmd:       0x0000123a | Max curr: 0x00000000
>>>>>> [   16.312786] mmc0: sdhci: Resp[0]:   0x00000800 | Resp[1]:  0x00000000
>>>>>> [   16.312793] mmc0: sdhci: Resp[2]:   0x00000000 | Resp[3]:  0x00000900
>>>>>> [   16.312799] mmc0: sdhci: Host ctl2: 0x0000000d
>>>>>> [   16.312807] mmc0: sdhci: ADMA Err:  0x00000000 | ADMA Ptr:
>>> 0x000000016ab6d200
>>>>>> [   16.312814] mmc0: sdhci: 0x804:     0x00000800
>>>>>> [   16.312821] mmc0: sdhci: 0x808:     0x00000800
>>>>>> [   16.312827] mmc0: sdhci: 0x810:     0x0000005a
>>>>>> [   16.312833] mmc0: sdhci: 0x814:     0x3050eb1e
>>>>>> [   16.312839] mmc0: sdhci: 0x818:     0x040040c8
>>>>>> [   16.312845] mmc0: sdhci: 0x81c:     0x00000008
>>>>>> [   16.312851] mmc0: sdhci: 0x820:     0x00000502
>>>>>> [   16.312858] mmc0: sdhci: 0x824:     0x00000811
>>>>>> [   16.312864] mmc0: sdhci: 0x828:     0x1c2a2927
>>>>>> [   16.312870] mmc0: sdhci: 0x82c:     0x000d162f
>>>>>> [   16.312876] mmc0: sdhci: 0x830:     0x00000a0a
>>>>>> [   16.312882] mmc0: sdhci: 0x834:     0x0001003b
>>>>>> [   16.312889] mmc0: sdhci: 0x838:     0x00800001
>>>>>> [   16.312895] mmc0: sdhci: 0x83c:     0x00000000
>>>>>> [   16.312901] mmc0: sdhci: 0x840:     0x00000000
>>>>>> [   16.312904] mmc0: sdhci:
>>> ============================================
>>>>>
>>>>>
>>>>> Is this of any help to understand what's wrong ?
>>>>
>>>> Sorry, I will look at this soon.
>>>>
>>
>> I found some time today.  It seems that the ACPI _PS3 method is failing to save the tuning value.
>> That results in a CRC error, but the driver does not recover very gracefully.
>>
>> I will look at making a patch for the driver recovery, but also add some diagnostics prints to the ACPI DSDT to try to figure out what is going wrong there.
>>
>>
> 
> Have you had the time to look at what diagnostics could be added ?

I have a patch, but I am still testing it.
Adrian Hunter Nov. 16, 2018, 2:01 p.m. UTC | #16
On 16/11/18 11:50 AM, Adrian Hunter wrote:
> On 16/11/18 11:37 AM, Anisse Astier wrote:
>> Hi Adrian, 
>>
>> On Thu, Nov 08, 2018 at 05:43:32PM +0000, Hunter, Adrian wrote:
>>>> -----Original Message-----
>>>> From: Anisse Astier [mailto:anisse@astier.eu]
>>>> Sent: Thursday, November 8, 2018 5:55 PM
>>>> To: Hunter, Adrian <adrian.hunter@intel.com>
>>>> Cc: linux-mmc@vger.kernel.org; Ulf Hansson <ulf.hansson@linaro.org>
>>>> Subject: Re: [PATCH] mmc: sdhci-pci: disable intel voltage switch if
>>>> unsupported
>>>>
>>>> On Mon, Nov 05, 2018 at 03:37:05PM +0200, Adrian Hunter wrote:
>>>>> On 4/11/18 6:33 PM, Anisse Astier wrote:
>>>>>> Hi Adrian,
>>>>>>
>>>>>> On Thu, Oct 25, 2018 at 12:06:31PM +0200, Anisse Astier wrote:
>>>>>>> On Tue, Oct 23, 2018 at 04:47:05PM +0300, Adrian Hunter wrote:
>>>>>>>> Here are a couple of patches to get a bit more information.  Also,
>>>>>>>> if you config sdhci as built-in (y instead of m) then we should
>>>>>>>> see a debug message from the interrupt handler.
>>>>>>>
>>>>>>> Please find the new log here:
>>>>>>> https://anisse.astier.eu/static/dmesg-4.19.nocqe-builtin-regdump.tx
>>>>>>> t.xz
>>>>>>>
>>>>>>> CQHCI is still disabled, sdhci is built-in, and it has your
>>>>>>> register dump patches applied. Here is the register dump after the
>>>>>>> first (and
>>>>>>> only) timeout:
>>>>>>>
>>>>>>> [   16.312690] mmc0: Timeout waiting for hardware interrupt.
>>>>>>> [   16.312702] mmc0: sdhci: ============ SDHCI REGISTER DUMP
>>>> ===========
>>>>>>> [   16.312711] mmc0: sdhci: Sys addr:  0x00000008 | Version:  0x00001002
>>>>>>> [   16.312718] mmc0: sdhci: Blk size:  0x00007200 | Blk cnt:  0x00000008
>>>>>>> [   16.312726] mmc0: sdhci: Argument:  0x00cf3f80 | Trn mode:
>>>> 0x0000003b
>>>>>>> [   16.312733] mmc0: sdhci: Present:   0x1fff0206 | Host ctl: 0x0000003d
>>>>>>> [   16.312740] mmc0: sdhci: Power:     0x0000000b | Blk gap:  0x00000080
>>>>>>> [   16.312747] mmc0: sdhci: Wake-up:   0x00000000 | Clock:    0x00000007
>>>>>>> [   16.312753] mmc0: sdhci: Timeout:   0x00000006 | Int stat: 0x00000000
>>>>>>> [   16.312760] mmc0: sdhci: Int enab:  0x02ff000b | Sig enab: 0x02ff000b
>>>>>>> [   16.312767] mmc0: sdhci: AC12 err:  0x00000000 | Slot int: 0x00000000
>>>>>>> [   16.312774] mmc0: sdhci: Caps:      0x546ec881 | Caps_1:   0x80000807
>>>>>>> [   16.312780] mmc0: sdhci: Cmd:       0x0000123a | Max curr: 0x00000000
>>>>>>> [   16.312786] mmc0: sdhci: Resp[0]:   0x00000800 | Resp[1]:  0x00000000
>>>>>>> [   16.312793] mmc0: sdhci: Resp[2]:   0x00000000 | Resp[3]:  0x00000900
>>>>>>> [   16.312799] mmc0: sdhci: Host ctl2: 0x0000000d
>>>>>>> [   16.312807] mmc0: sdhci: ADMA Err:  0x00000000 | ADMA Ptr:
>>>> 0x000000016ab6d200
>>>>>>> [   16.312814] mmc0: sdhci: 0x804:     0x00000800
>>>>>>> [   16.312821] mmc0: sdhci: 0x808:     0x00000800
>>>>>>> [   16.312827] mmc0: sdhci: 0x810:     0x0000005a
>>>>>>> [   16.312833] mmc0: sdhci: 0x814:     0x3050eb1e
>>>>>>> [   16.312839] mmc0: sdhci: 0x818:     0x040040c8
>>>>>>> [   16.312845] mmc0: sdhci: 0x81c:     0x00000008
>>>>>>> [   16.312851] mmc0: sdhci: 0x820:     0x00000502
>>>>>>> [   16.312858] mmc0: sdhci: 0x824:     0x00000811
>>>>>>> [   16.312864] mmc0: sdhci: 0x828:     0x1c2a2927
>>>>>>> [   16.312870] mmc0: sdhci: 0x82c:     0x000d162f
>>>>>>> [   16.312876] mmc0: sdhci: 0x830:     0x00000a0a
>>>>>>> [   16.312882] mmc0: sdhci: 0x834:     0x0001003b
>>>>>>> [   16.312889] mmc0: sdhci: 0x838:     0x00800001
>>>>>>> [   16.312895] mmc0: sdhci: 0x83c:     0x00000000
>>>>>>> [   16.312901] mmc0: sdhci: 0x840:     0x00000000
>>>>>>> [   16.312904] mmc0: sdhci:
>>>> ============================================
>>>>>>
>>>>>>
>>>>>> Is this of any help to understand what's wrong ?
>>>>>
>>>>> Sorry, I will look at this soon.
>>>>>
>>>
>>> I found some time today.  It seems that the ACPI _PS3 method is failing to save the tuning value.
>>> That results in a CRC error, but the driver does not recover very gracefully.
>>>
>>> I will look at making a patch for the driver recovery, but also add some diagnostics prints to the ACPI DSDT to try to figure out what is going wrong there.
>>>
>>>
>>
>> Have you had the time to look at what diagnostics could be added ?
> 
> I have a patch, but I am still testing it.
> 

Please try the attached patch
From 5c33302367e7852aa0a723e311b1749bea59f055 Mon Sep 17 00:00:00 2001
From: Adrian Hunter <adrian.hunter@intel.com>
Date: Fri, 16 Nov 2018 14:02:12 +0200
Subject: [PATCH] mmc: sdhci-pci: Workaround GLK firmware failing to restore
 the tuning value

GLK firmware can indicate that the tuning value will be restored after
runtime suspend, but not actually do that. Add a workaround that detects
such cases, and lets the driver do re-tuning instead.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 drivers/mmc/host/sdhci-pci-core.c | 70 ++++++++++++++++++++++++++++++-
 1 file changed, 68 insertions(+), 2 deletions(-)

diff --git a/drivers/mmc/host/sdhci-pci-core.c b/drivers/mmc/host/sdhci-pci-core.c
index 7bfd366d970d..dc073613cbb5 100644
--- a/drivers/mmc/host/sdhci-pci-core.c
+++ b/drivers/mmc/host/sdhci-pci-core.c
@@ -462,6 +462,9 @@ struct intel_host {
 	u32	dsm_fns;
 	int	drv_strength;
 	bool	d3_retune;
+	bool	rpm_retune_ok;
+	u32	glk_rx_ctrl1;
+	u32	glk_tun_val;
 };
 
 static const guid_t intel_dsm_guid =
@@ -791,6 +794,69 @@ static int glk_emmc_add_host(struct sdhci_pci_slot *slot)
 	return ret;
 }
 
+#ifdef CONFIG_PM
+/* Workaround firmware failing to restore the tuning value */
+static void glk_rpm_retune_wa(struct sdhci_pci_chip *chip, bool susp)
+{
+	struct sdhci_pci_slot *slot = chip->slots[0];
+	struct intel_host *intel_host = sdhci_pci_priv(slot);
+	struct sdhci_host *host = slot->host;
+	u32 glk_rx_ctrl1;
+	u32 glk_tun_val;
+	u32 dly;
+
+	if (intel_host->rpm_retune_ok || !mmc_can_retune(host->mmc))
+		return;
+
+	glk_rx_ctrl1 = sdhci_readl(host, 0x834);
+	glk_tun_val = sdhci_readl(host, 0x840);
+
+	if (susp) {
+		intel_host->glk_rx_ctrl1 = glk_rx_ctrl1;
+		intel_host->glk_tun_val = glk_tun_val;
+		return;
+	}
+
+	if (!intel_host->glk_tun_val)
+		return;
+
+	if (glk_rx_ctrl1 != intel_host->glk_rx_ctrl1) {
+		intel_host->rpm_retune_ok = true;
+		return;
+	}
+
+	dly = ((glk_rx_ctrl1 & 0x3f00) >> 8) + (intel_host->glk_tun_val << 1);
+	if ((dly & 0x7f) == (glk_rx_ctrl1 & 0x7f))
+		return;
+
+	intel_host->rpm_retune_ok = true;
+	chip->rpm_retune = true;
+	mmc_retune_needed(host->mmc);
+	pr_info("%s: Requiring re-tune after rpm resume", mmc_hostname(host->mmc));
+}
+
+static void glk_rpm_retune_chk(struct sdhci_pci_chip *chip, bool susp)
+{
+	if (chip->pdev->device == PCI_DEVICE_ID_INTEL_GLK_EMMC &&
+	    !chip->rpm_retune)
+		glk_rpm_retune_wa(chip, susp);
+}
+
+static int glk_runtime_suspend(struct sdhci_pci_chip *chip)
+{
+	glk_rpm_retune_chk(chip, true);
+
+	return sdhci_cqhci_runtime_suspend(chip);
+}
+
+static int glk_runtime_resume(struct sdhci_pci_chip *chip)
+{
+	glk_rpm_retune_chk(chip, false);
+
+	return sdhci_cqhci_runtime_resume(chip);
+}
+#endif
+
 #ifdef CONFIG_ACPI
 static int ni_set_max_freq(struct sdhci_pci_slot *slot)
 {
@@ -879,8 +945,8 @@ static const struct sdhci_pci_fixes sdhci_intel_glk_emmc = {
 	.resume			= sdhci_cqhci_resume,
 #endif
 #ifdef CONFIG_PM
-	.runtime_suspend	= sdhci_cqhci_runtime_suspend,
-	.runtime_resume		= sdhci_cqhci_runtime_resume,
+	.runtime_suspend	= glk_runtime_suspend,
+	.runtime_resume		= glk_runtime_resume,
 #endif
 	.quirks			= SDHCI_QUIRK_NO_ENDATTR_IN_NOPDESC,
 	.quirks2		= SDHCI_QUIRK2_PRESET_VALUE_BROKEN |
Anisse Astier Nov. 16, 2018, 4:57 p.m. UTC | #17
Hi Adrian,

On Fri, Nov 16, 2018 at 04:01:36PM +0200, Adrian Hunter wrote:
> On 16/11/18 11:50 AM, Adrian Hunter wrote:
> > On 16/11/18 11:37 AM, Anisse Astier wrote:
> >> Hi Adrian, 
> >>
> >> On Thu, Nov 08, 2018 at 05:43:32PM +0000, Hunter, Adrian wrote:
> >>> [snip]
> >>> I found some time today.  It seems that the ACPI _PS3 method is failing to save the tuning value.
> >>> That results in a CRC error, but the driver does not recover very gracefully.
> >>>
> >>> I will look at making a patch for the driver recovery, but also add some diagnostics prints to the ACPI DSDT to try to figure out what is going wrong there.
> >>>
> >>>
> >>
> >> Have you had the time to look at what diagnostics could be added ?
> > 
> > I have a patch, but I am still testing it.
> > 
> 
> Please try the attached patch

> From 5c33302367e7852aa0a723e311b1749bea59f055 Mon Sep 17 00:00:00 2001
> From: Adrian Hunter <adrian.hunter@intel.com>
> Date: Fri, 16 Nov 2018 14:02:12 +0200
> Subject: [PATCH] mmc: sdhci-pci: Workaround GLK firmware failing to restore
>  the tuning value
> 
> GLK firmware can indicate that the tuning value will be restored after
> runtime suspend, but not actually do that. Add a workaround that detects
> such cases, and lets the driver do re-tuning instead.
> 

This looks like it solves the issue. Please find the boot log here:

https://anisse.astier.eu/static/dmesg-sdhci-glk-workaround.txt.xz

We still have I/O errors I don't understand, but no more long CQE
timeouts.

Please tell me if you'd like me to do any other test.

Regards,

Anisse
Adrian Hunter Nov. 19, 2018, 7:37 a.m. UTC | #18
On 16/11/18 6:57 PM, Anisse Astier wrote:
> Hi Adrian,
> 
> On Fri, Nov 16, 2018 at 04:01:36PM +0200, Adrian Hunter wrote:
>> On 16/11/18 11:50 AM, Adrian Hunter wrote:
>>> On 16/11/18 11:37 AM, Anisse Astier wrote:
>>>> Hi Adrian, 
>>>>
>>>> On Thu, Nov 08, 2018 at 05:43:32PM +0000, Hunter, Adrian wrote:
>>>>> [snip]
>>>>> I found some time today.  It seems that the ACPI _PS3 method is failing to save the tuning value.
>>>>> That results in a CRC error, but the driver does not recover very gracefully.
>>>>>
>>>>> I will look at making a patch for the driver recovery, but also add some diagnostics prints to the ACPI DSDT to try to figure out what is going wrong there.
>>>>>
>>>>>
>>>>
>>>> Have you had the time to look at what diagnostics could be added ?
>>>
>>> I have a patch, but I am still testing it.
>>>
>>
>> Please try the attached patch
> 
>> From 5c33302367e7852aa0a723e311b1749bea59f055 Mon Sep 17 00:00:00 2001
>> From: Adrian Hunter <adrian.hunter@intel.com>
>> Date: Fri, 16 Nov 2018 14:02:12 +0200
>> Subject: [PATCH] mmc: sdhci-pci: Workaround GLK firmware failing to restore
>>  the tuning value
>>
>> GLK firmware can indicate that the tuning value will be restored after
>> runtime suspend, but not actually do that. Add a workaround that detects
>> such cases, and lets the driver do re-tuning instead.
>>
> 
> This looks like it solves the issue. Please find the boot log here:
> 
> https://anisse.astier.eu/static/dmesg-sdhci-glk-workaround.txt.xz
> 
> We still have I/O errors I don't understand, but no more long CQE
> timeouts.

Please try this patch instead.
From ff445c899af814f0623ef191213b9d7ddb51aa69 Mon Sep 17 00:00:00 2001
From: Adrian Hunter <adrian.hunter@intel.com>
Date: Fri, 16 Nov 2018 14:02:12 +0200
Subject: [PATCH V2] mmc: sdhci-pci: Workaround GLK firmware failing to restore
 the tuning value

GLK firmware can indicate that the tuning value will be restored after
runtime suspend, but not actually do that. Add a workaround that detects
such cases, and lets the driver do re-tuning instead.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 drivers/mmc/host/sdhci-pci-core.c | 79 ++++++++++++++++++++++++++++++-
 1 file changed, 77 insertions(+), 2 deletions(-)

diff --git a/drivers/mmc/host/sdhci-pci-core.c b/drivers/mmc/host/sdhci-pci-core.c
index 7bfd366d970d..f5176d974a88 100644
--- a/drivers/mmc/host/sdhci-pci-core.c
+++ b/drivers/mmc/host/sdhci-pci-core.c
@@ -12,6 +12,7 @@
  *     - JMicron (hardware and technical support)
  */
 
+#include <linux/bitfield.h>
 #include <linux/string.h>
 #include <linux/delay.h>
 #include <linux/highmem.h>
@@ -462,6 +463,9 @@ struct intel_host {
 	u32	dsm_fns;
 	int	drv_strength;
 	bool	d3_retune;
+	bool	rpm_retune_ok;
+	u32	glk_rx_ctrl1;
+	u32	glk_tun_val;
 };
 
 static const guid_t intel_dsm_guid =
@@ -791,6 +795,77 @@ static int glk_emmc_add_host(struct sdhci_pci_slot *slot)
 	return ret;
 }
 
+#ifdef CONFIG_PM
+#define GLK_RX_CTRL1	0x834
+#define GLK_TUN_VAL	0x840
+#define GLK_PATH_PLL	GENMASK(13, 8)
+#define GLK_DLY		GENMASK(6, 0)
+/* Workaround firmware failing to restore the tuning value */
+static void glk_rpm_retune_wa(struct sdhci_pci_chip *chip, bool susp)
+{
+	struct sdhci_pci_slot *slot = chip->slots[0];
+	struct intel_host *intel_host = sdhci_pci_priv(slot);
+	struct sdhci_host *host = slot->host;
+	u32 glk_rx_ctrl1;
+	u32 glk_tun_val;
+	u32 dly;
+
+	if (intel_host->rpm_retune_ok || !mmc_can_retune(host->mmc))
+		return;
+
+	glk_rx_ctrl1 = sdhci_readl(host, GLK_RX_CTRL1);
+	glk_tun_val = sdhci_readl(host, GLK_TUN_VAL);
+
+	if (susp) {
+		intel_host->glk_rx_ctrl1 = glk_rx_ctrl1;
+		intel_host->glk_tun_val = glk_tun_val;
+		return;
+	}
+
+	if (!intel_host->glk_tun_val)
+		return;
+
+	if (glk_rx_ctrl1 != intel_host->glk_rx_ctrl1) {
+		intel_host->rpm_retune_ok = true;
+		return;
+	}
+
+	dly = FIELD_PREP(GLK_DLY, FIELD_GET(GLK_PATH_PLL, glk_rx_ctrl1) +
+				  (intel_host->glk_tun_val << 1));
+	if (dly == FIELD_GET(GLK_DLY, glk_rx_ctrl1))
+		return;
+
+	glk_rx_ctrl1 = (glk_rx_ctrl1 & ~GLK_DLY) | dly;
+	sdhci_writel(host, glk_rx_ctrl1, GLK_RX_CTRL1);
+
+	intel_host->rpm_retune_ok = true;
+	chip->rpm_retune = true;
+	mmc_retune_needed(host->mmc);
+	pr_info("%s: Requiring re-tune after rpm resume", mmc_hostname(host->mmc));
+}
+
+static void glk_rpm_retune_chk(struct sdhci_pci_chip *chip, bool susp)
+{
+	if (chip->pdev->device == PCI_DEVICE_ID_INTEL_GLK_EMMC &&
+	    !chip->rpm_retune)
+		glk_rpm_retune_wa(chip, susp);
+}
+
+static int glk_runtime_suspend(struct sdhci_pci_chip *chip)
+{
+	glk_rpm_retune_chk(chip, true);
+
+	return sdhci_cqhci_runtime_suspend(chip);
+}
+
+static int glk_runtime_resume(struct sdhci_pci_chip *chip)
+{
+	glk_rpm_retune_chk(chip, false);
+
+	return sdhci_cqhci_runtime_resume(chip);
+}
+#endif
+
 #ifdef CONFIG_ACPI
 static int ni_set_max_freq(struct sdhci_pci_slot *slot)
 {
@@ -879,8 +954,8 @@ static const struct sdhci_pci_fixes sdhci_intel_glk_emmc = {
 	.resume			= sdhci_cqhci_resume,
 #endif
 #ifdef CONFIG_PM
-	.runtime_suspend	= sdhci_cqhci_runtime_suspend,
-	.runtime_resume		= sdhci_cqhci_runtime_resume,
+	.runtime_suspend	= glk_runtime_suspend,
+	.runtime_resume		= glk_runtime_resume,
 #endif
 	.quirks			= SDHCI_QUIRK_NO_ENDATTR_IN_NOPDESC,
 	.quirks2		= SDHCI_QUIRK2_PRESET_VALUE_BROKEN |
Anisse Astier Nov. 19, 2018, 9:48 a.m. UTC | #19
Hi Adrian,

On Mon, Nov 19, 2018 at 09:37:40AM +0200, Adrian Hunter wrote:
> On 16/11/18 6:57 PM, Anisse Astier wrote:
> > Hi Adrian,
> > 
> > On Fri, Nov 16, 2018 at 04:01:36PM +0200, Adrian Hunter wrote:
> >> On 16/11/18 11:50 AM, Adrian Hunter wrote:
> >>> On 16/11/18 11:37 AM, Anisse Astier wrote:
> >>>> Hi Adrian, 
> >>>>
> >>>> On Thu, Nov 08, 2018 at 05:43:32PM +0000, Hunter, Adrian wrote:
> >>>>> [snip]
> >>>>> I found some time today.  It seems that the ACPI _PS3 method is failing to save the tuning value.
> >>>>> That results in a CRC error, but the driver does not recover very gracefully.
> >>>>>
> >>>>> I will look at making a patch for the driver recovery, but also add some diagnostics prints to the ACPI DSDT to try to figure out what is going wrong there.
> >>>>>
> >>>>>
> >>>>
> >>>> Have you had the time to look at what diagnostics could be added ?
> >>>
> >>> I have a patch, but I am still testing it.
> >>>
> >>
> >> Please try the attached patch
> > 
> >> From 5c33302367e7852aa0a723e311b1749bea59f055 Mon Sep 17 00:00:00 2001
> >> From: Adrian Hunter <adrian.hunter@intel.com>
> >> Date: Fri, 16 Nov 2018 14:02:12 +0200
> >> Subject: [PATCH] mmc: sdhci-pci: Workaround GLK firmware failing to restore
> >>  the tuning value
> >>
> >> GLK firmware can indicate that the tuning value will be restored after
> >> runtime suspend, but not actually do that. Add a workaround that detects
> >> such cases, and lets the driver do re-tuning instead.
> >>
> > 
> > This looks like it solves the issue. Please find the boot log here:
> > 
> > https://anisse.astier.eu/static/dmesg-sdhci-glk-workaround.txt.xz
> > 
> > We still have I/O errors I don't understand, but no more long CQE
> > timeouts.
> 
> Please try this patch instead.

> From ff445c899af814f0623ef191213b9d7ddb51aa69 Mon Sep 17 00:00:00 2001
> From: Adrian Hunter <adrian.hunter@intel.com>
> Date: Fri, 16 Nov 2018 14:02:12 +0200
> Subject: [PATCH V2] mmc: sdhci-pci: Workaround GLK firmware failing to restore
>  the tuning value
> 
> GLK firmware can indicate that the tuning value will be restored after
> runtime suspend, but not actually do that. Add a workaround that detects
> such cases, and lets the driver do re-tuning instead.
> 
> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>


Here is the boot log for the v2 of the patch:
https://anisse.astier.eu/static/dmesg-sdhci-glk-workaround-v2.txt.xz

There are no more I/O errors.

Regards,

Anisse
diff mbox series

Patch

diff --git a/drivers/mmc/host/sdhci-pci-core.c b/drivers/mmc/host/sdhci-pci-core.c
index 7bfd366d970d..8fac8cbc5ce8 100644
--- a/drivers/mmc/host/sdhci-pci-core.c
+++ b/drivers/mmc/host/sdhci-pci-core.c
@@ -706,10 +706,27 @@  static int intel_execute_tuning(struct mmc_host *mmc, u32 opcode)
 static void byt_probe_slot(struct sdhci_pci_slot *slot)
 {
 	struct mmc_host_ops *ops = &slot->host->mmc_host_ops;
+	struct intel_host *intel_host = sdhci_pci_priv(slot);
 
 	byt_read_dsm(slot);
 
 	ops->execute_tuning = intel_execute_tuning;
+
+	/* Check if we have the appropriate voltage switch DSM methods */
+	if (!(intel_host->dsm_fns & (1 << INTEL_DSM_V18_SWITCH))) {
+		pr_info("%s: no DSM function for 1.8 voltage switch\n",
+			 mmc_hostname(slot->host->mmc));
+		slot->host->quirks2 |= SDHCI_QUIRK2_NO_1_8_V;
+
+		if (!(intel_host->dsm_fns & (1 << INTEL_DSM_V33_SWITCH))) {
+			/* No voltage switching supported at all, there's no
+			 * point in installing the callback: return.
+			 */
+			pr_info("%s: Voltage switching unsupported\n",
+				mmc_hostname(slot->host->mmc));
+			return;
+		}
+	}
 	ops->start_signal_voltage_switch = intel_start_signal_voltage_switch;
 }