diff mbox series

[RFT] i2c: designware: ratelimit 'transfer when suspended' errors

Message ID 20190424161632.4994-1-wsa+renesas@sang-engineering.com (mailing list archive)
State Accepted
Delegated to: Geert Uytterhoeven
Headers show
Series [RFT] i2c: designware: ratelimit 'transfer when suspended' errors | expand

Commit Message

Wolfram Sang April 24, 2019, 4:16 p.m. UTC
There are two problems with dev_err() here. One: It is not ratelimited.
Two: We don't see which driver tried to transfer something with a
suspended adapter. Switch to dev_WARN_ONCE to fix both issues. Drawback
is that we don't see if multiple drivers are trying to transfer while
suspended. They need to be discovered one after the other now. This is
better than a high CPU load because a really broken driver might try to
resend endlessly.

Link: https://bugs.archlinux.org/task/62391
Fixes: 275154155538 ("i2c: designware: Do not allow i2c_dw_xfer() calls while suspended")
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
---

skdnik: Would you be so kind and test this patch? I can only build-test here.

I have a prototype to fix the similar issue in the core, but this needs more
testing first, so I am sending this one out already.

 drivers/i2c/busses/i2c-designware-master.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

Comments

skidnik April 24, 2019, 8:54 p.m. UTC | #1
> There are two problems with dev_err() here. One: It is not ratelimited.
> Two: We don't see which driver tried to transfer something with a
> suspended adapter. Switch to dev_WARN_ONCE to fix both issues. Drawback
> is that we don't see if multiple drivers are trying to transfer while
> suspended. They need to be discovered one after the other now. This is
> better than a high CPU load because a really broken driver might try to
> resend endlessly.
>
> Link: https://bugs.archlinux.org/task/62391
> Fixes: 275154155538 ("i2c: designware: Do not allow i2c_dw_xfer() calls while suspended")
> Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
> ---
>
> skdnik: Would you be so kind and test this patch? I can only build-test here.
>
> I have a prototype to fix the similar issue in the core, but this needs more
> testing first, so I am sending this one out already.
>
>   drivers/i2c/busses/i2c-designware-master.c | 3 +--
>   1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/drivers/i2c/busses/i2c-designware-master.c b/drivers/i2c/busses/i2c-designware-master.c
> index bb8e3f149979..d464799e40a3 100644
> --- a/drivers/i2c/busses/i2c-designware-master.c
> +++ b/drivers/i2c/busses/i2c-designware-master.c
> @@ -426,8 +426,7 @@ i2c_dw_xfer(struct i2c_adapter *adap, struct i2c_msg msgs[], int num)
>   
>   	pm_runtime_get_sync(dev->dev);
>   
> -	if (dev->suspended) {
> -		dev_err(dev->dev, "Error %s call while suspended\n", __func__);
> +	if (dev_WARN_ONCE(dev->dev, dev->suspended, "Transfer while suspended\n")) {
>   		ret = -ESHUTDOWN;
>   		goto done_nolock;
>   	}
I have little experience in compiling and applying patches, guess I can 
add this diff as a patch to linux-mainline PKGBUILD and try to build it. 
I'll try.
Wolfram Sang April 25, 2019, 7:47 a.m. UTC | #2
On Wed, Apr 24, 2019 at 11:54:17PM +0300, skidnik wrote:
> > There are two problems with dev_err() here. One: It is not ratelimited.
> > Two: We don't see which driver tried to transfer something with a
> > suspended adapter. Switch to dev_WARN_ONCE to fix both issues. Drawback
> > is that we don't see if multiple drivers are trying to transfer while
> > suspended. They need to be discovered one after the other now. This is
> > better than a high CPU load because a really broken driver might try to
> > resend endlessly.
> > 
> > Link: https://bugs.archlinux.org/task/62391
> > Fixes: 275154155538 ("i2c: designware: Do not allow i2c_dw_xfer() calls while suspended")
> > Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
...
> I have little experience in compiling and applying patches, guess I can add
> this diff as a patch to linux-mainline PKGBUILD and try to build it. I'll
> try.

Great, thanks for trying!
Jarkko Nikula April 25, 2019, 8:18 a.m. UTC | #3
On 4/24/19 7:16 PM, Wolfram Sang wrote:
> There are two problems with dev_err() here. One: It is not ratelimited.
> Two: We don't see which driver tried to transfer something with a
> suspended adapter. Switch to dev_WARN_ONCE to fix both issues. Drawback
> is that we don't see if multiple drivers are trying to transfer while
> suspended. They need to be discovered one after the other now. This is
> better than a high CPU load because a really broken driver might try to
> resend endlessly.
> 
> Link: https://bugs.archlinux.org/task/62391
> Fixes: 275154155538 ("i2c: designware: Do not allow i2c_dw_xfer() calls while suspended")
> Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
> ---
> 
> skdnik: Would you be so kind and test this patch? I can only build-test here.
> 
> I have a prototype to fix the similar issue in the core, but this needs more
> testing first, so I am sending this one out already.
> 
>   drivers/i2c/busses/i2c-designware-master.c | 3 +--
>   1 file changed, 1 insertion(+), 2 deletions(-)
> 
Acked-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>
skidnik April 25, 2019, 7:21 p.m. UTC | #4
On 4/24/19 7:16 PM, Wolfram Sang wrote:
> There are two problems with dev_err() here. One: It is not ratelimited.
> Two: We don't see which driver tried to transfer something with a
> suspended adapter. Switch to dev_WARN_ONCE to fix both issues. Drawback
> is that we don't see if multiple drivers are trying to transfer while
> suspended. They need to be discovered one after the other now. This is
> better than a high CPU load because a really broken driver might try to
> resend endlessly.
> 
> Link: https://bugs.archlinux.org/task/62391
> Fixes: 275154155538 ("i2c: designware: Do not allow i2c_dw_xfer() calls while suspended")
> Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
> ---
> 
> skdnik: Would you be so kind and test this patch? I can only build-test here.
> 
> I have a prototype to fix the similar issue in the core, but this needs more
> testing first, so I am sending this one out already.
> 
>   drivers/i2c/busses/i2c-designware-master.c | 3 +--
>   1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/drivers/i2c/busses/i2c-designware-master.c b/drivers/i2c/busses/i2c-designware-master.c
> index bb8e3f149979..d464799e40a3 100644
> --- a/drivers/i2c/busses/i2c-designware-master.c
> +++ b/drivers/i2c/busses/i2c-designware-master.c
> @@ -426,8 +426,7 @@ i2c_dw_xfer(struct i2c_adapter *adap, struct i2c_msg msgs[], int num)
>   
>   	pm_runtime_get_sync(dev->dev);
>   
> -	if (dev->suspended) {
> -		dev_err(dev->dev, "Error %s call while suspended\n", __func__);
> +	if (dev_WARN_ONCE(dev->dev, dev->suspended, "Transfer while suspended\n")) {
>   		ret = -ESHUTDOWN;
>   		goto done_nolock;
>   	}
> 
Patched, biult and tested. This is the new output:
---
[  285.441552] i2c_designware 808622C1:00: Transfer while suspended
[  285.441595] WARNING: CPU: 2 PID: 1966 at 
drivers/i2c/busses/i2c-designware-master.c:429 i2c_dw_xfer+0x23b/0x290
[  285.441597] Modules linked in: 8021q garp mrp stp llc ccm 
snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic 
hid_sensor_gyro_3d hid_sensor_incl_3d hid_sens
or_magn_3d hid_sensor_accel_3d hid_sensor_rotation hid_sensor_trigger 
industrialio_triggered_buffer kfifo_buf hid_sensor_iio_common 
industrialio btusb btrtl btbcm btintel
  bluetooth uvcvideo ecdh_generic joydev mousedev intel_rapl 
intel_powerclamp kvm_intel hid_rmi rmi_core kvm videobuf2_vmalloc 
videobuf2_memops videobuf2_v4l2 arc4 videobu
f2_common ofpart cmdlinepart videodev intel_spi_platform iwlmvm 
intel_spi mac80211 spi_nor media irqbypass iTCO_wdt iTCO_vendor_support 
i915 mtd nls_iso8859_1 crct10dif_p
clmul nls_cp437 iwlwifi crc32_pclmul vfat fat ghash_clmulni_intel 
wmi_bmof dell_wmi i2c_algo_bit snd_hda_intel drm_kms_helper dell_laptop 
ledtrig_audio snd_hda_codec dell
_smbios cfg80211 snd_hda_core aesni_intel dell_wmi_descriptor dcdbas 
snd_hwdep dell_smm_hwmon aes_x86_64 snd_pcm crypto_simd drm cryptd 
snd_timer glue_helper
[  285.441696]  intel_cstate psmouse pcspkr input_leds snd mei_txe 
tpm_crb soundcore intel_gtt mei agpgart lpc_ich i2c_i801 tpm_tis 
tpm_tis_core syscopyarea sysfillrect s
ysimgblt fb_sys_fops processor_thermal_device intel_xhci_usb_role_switch 
roles intel_soc_dts_iosf tpm wmi battery rng_core i2c_hid dell_rbtn 
evdev rfkill mac_hid pcc_cpuf
req ac soc_button_array intel_vbtn int3403_thermal int3400_thermal 
acpi_thermal_rel int340x_thermal_zone sparse_keymap coretemp msr 
crypto_user ip_tables x_tables ext4 cr
c32c_generic crc16 mbcache jbd2 sd_mod hid_sensor_hub hid_multitouch 
hid_generic usbhid hid serio_raw atkbd libps2 ahci libahci xhci_pci 
libata sdhci_pci crc32c_intel xhc
i_hcd cqhci scsi_mod sdhci mmc_core i8042 serio
[  285.441784] CPU: 2 PID: 1966 Comm: kworker/u8:56 Not tainted 
5.1.0-rc6-mainline-master-00064-gcd8dead0c394 #1
[  285.441788] Hardware name: Dell Inc. Inspiron 11-3168/0G70RP, BIOS 
2.3.0 01/31/2018
[  285.441796] Workqueue: events_unbound async_run_entry_fn
[  285.441804] RIP: 0010:i2c_dw_xfer+0x23b/0x290
[  285.441810] Code: 05 be 32 ca 00 01 48 85 ed 75 05 49 8b 6c 24 10 4c 
89 e7 e8 f7 b5 f4 ff 48 89 ea 48 c7 c7 90 a7 b1 86 48 89 c6 e8 3f 6a a1 
ff <0f> 0b bd 94 ff ff ff
4c 8b 23 e9 49 fe ff ff 83 f8 01 74 14 8b 83
[  285.441814] RSP: 0000:ffffb40a44d9fc70 EFLAGS: 00010282
[  285.441819] RAX: 0000000000000000 RBX: ffff9e313a4d8818 RCX: 
0000000000000000
[  285.441822] RDX: 0000000000000000 RSI: 0000000000000096 RDI: 
00000000ffffffff
[  285.441824] RBP: ffff9e313a4bc640 R08: 0000000000000000 R09: 
0000000000021500
[  285.441828] R10: 000000097a05f1bc R11: 0000000000010148 R12: 
ffff9e313a4ea810
[  285.441831] R13: ffff9e313a4d88c8 R14: 00000000ffffee52 R15: 
ffff9e313a4fb000
[  285.441835] FS:  0000000000000000(0000) GS:ffff9e313bb00000(0000) 
knlGS:0000000000000000
[  285.441838] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  285.441840] CR2: 0000000000000000 CR3: 000000002b60e000 CR4: 
00000000001006e0
[  285.441844] Call Trace:
[  285.441860]  __i2c_transfer+0x153/0x4a0
[  285.441868]  i2c_transfer+0x51/0xc0
[  285.441881]  __i2c_hid_command+0x1ed/0x2c0 [i2c_hid]
[  285.441889]  ? finish_wait+0x2f/0x60
[  285.441897]  ? _raw_spin_unlock_irqrestore+0x20/0x40
[  285.441902]  ? async_synchronize_cookie_domain+0xdf/0x150
[  285.441907]  ? wait_woken+0x80/0x80
[  285.441915]  ? chv_gpio_irq_mask_unmask+0x2b/0xb0
[  285.441923]  i2c_hid_set_power+0x6c/0x100 [i2c_hid]
[  285.441931]  i2c_hid_resume+0x85/0x100 [i2c_hid]
[  285.441939]  ? i2c_hid_runtime_resume+0x30/0x30 [i2c_hid]
[  285.441945]  dpm_run_callback+0x4f/0x180
[  285.441953]  device_resume+0xac/0x1f0
[  285.441959]  async_resume+0x19/0x30
[  285.441964]  async_run_entry_fn+0x37/0x140
[  285.441972]  process_one_work+0x1eb/0x410
[  285.441980]  worker_thread+0x2d/0x3d0
[  285.441986]  ? process_one_work+0x410/0x410
[  285.441991]  kthread+0x112/0x130
[  285.441997]  ? kthread_park+0x80/0x80
[  285.442003]  ret_from_fork+0x35/0x40
[  285.442013] ---[ end trace c98b327600d4a21b ]---
[  285.442042] i2c_hid i2c-DELL074D:00: failed to change power setting.
[  285.442051] PM: dpm_run_callback(): i2c_hid_resume+0x0/0x100 
[i2c_hid] returns -108
[  285.442063] PM: Device i2c-DELL074D:00 failed to restore async: error 
-108
---
outputs this once.
Full dmesg from boot to resume from hibernation: 
https://pastebin.com/raw/9akVTHL4
sorry for the weird kernel naming.
This solves system journal flooding, but I still have one core under 
full load after resume from hibernation. The touchpad attached to that 
bus works perfectly fine after resume.
Wolfram Sang April 26, 2019, 2:44 a.m. UTC | #5
> Patched, biult and tested. This is the new output:

Thank you!

> This solves system journal flooding, but I still have one core under full

Good, so the patch is OK and a first step.

> load after resume from hibernation. The touchpad attached to that bus works
> perfectly fine after resume.

Interesting that the touchpad works. From a first look, I can't tell why
there is the high CPU load. Analyzing the call trace, I can see a flaw
in the designware PCI driver, though. It uses UNIVERSAL_DEV_PM_OPS(),
but should use SET_LATE_SYSTEM_SLEEP_PM_OPS() like the platform driver,
so it will be a bit later and allow regular devices to suspend.

Andy? Jarkko? Mika?
Andy Shevchenko April 26, 2019, 2:29 p.m. UTC | #6
On Wed, Apr 24, 2019 at 06:16:32PM +0200, Wolfram Sang wrote:
> There are two problems with dev_err() here. One: It is not ratelimited.
> Two: We don't see which driver tried to transfer something with a
> suspended adapter. Switch to dev_WARN_ONCE to fix both issues. Drawback
> is that we don't see if multiple drivers are trying to transfer while
> suspended. They need to be discovered one after the other now. This is
> better than a high CPU load because a really broken driver might try to
> resend endlessly.
> 

Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>

> Link: https://bugs.archlinux.org/task/62391
> Fixes: 275154155538 ("i2c: designware: Do not allow i2c_dw_xfer() calls while suspended")
> Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
> ---
> 
> skdnik: Would you be so kind and test this patch? I can only build-test here.
> 
> I have a prototype to fix the similar issue in the core, but this needs more
> testing first, so I am sending this one out already.
> 
>  drivers/i2c/busses/i2c-designware-master.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/drivers/i2c/busses/i2c-designware-master.c b/drivers/i2c/busses/i2c-designware-master.c
> index bb8e3f149979..d464799e40a3 100644
> --- a/drivers/i2c/busses/i2c-designware-master.c
> +++ b/drivers/i2c/busses/i2c-designware-master.c
> @@ -426,8 +426,7 @@ i2c_dw_xfer(struct i2c_adapter *adap, struct i2c_msg msgs[], int num)
>  
>  	pm_runtime_get_sync(dev->dev);
>  
> -	if (dev->suspended) {
> -		dev_err(dev->dev, "Error %s call while suspended\n", __func__);
> +	if (dev_WARN_ONCE(dev->dev, dev->suspended, "Transfer while suspended\n")) {
>  		ret = -ESHUTDOWN;
>  		goto done_nolock;
>  	}
> -- 
> 2.11.0
>
Hans de Goede April 30, 2019, 2:09 p.m. UTC | #7
Hi,

On 4/25/19 9:21 PM, skidnik wrote:
> On 4/24/19 7:16 PM, Wolfram Sang wrote:
>> There are two problems with dev_err() here. One: It is not ratelimited.
>> Two: We don't see which driver tried to transfer something with a
>> suspended adapter. Switch to dev_WARN_ONCE to fix both issues. Drawback
>> is that we don't see if multiple drivers are trying to transfer while
>> suspended. They need to be discovered one after the other now. This is
>> better than a high CPU load because a really broken driver might try to
>> resend endlessly.
>>
>> Link: https://bugs.archlinux.org/task/62391
>> Fixes: 275154155538 ("i2c: designware: Do not allow i2c_dw_xfer() calls while suspended")
>> Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
>> ---
>>
>> skdnik: Would you be so kind and test this patch? I can only build-test here.
>>
>> I have a prototype to fix the similar issue in the core, but this needs more
>> testing first, so I am sending this one out already.
>>
>>   drivers/i2c/busses/i2c-designware-master.c | 3 +--
>>   1 file changed, 1 insertion(+), 2 deletions(-)
>>
>> diff --git a/drivers/i2c/busses/i2c-designware-master.c b/drivers/i2c/busses/i2c-designware-master.c
>> index bb8e3f149979..d464799e40a3 100644
>> --- a/drivers/i2c/busses/i2c-designware-master.c
>> +++ b/drivers/i2c/busses/i2c-designware-master.c
>> @@ -426,8 +426,7 @@ i2c_dw_xfer(struct i2c_adapter *adap, struct i2c_msg msgs[], int num)
>>       pm_runtime_get_sync(dev->dev);
>> -    if (dev->suspended) {
>> -        dev_err(dev->dev, "Error %s call while suspended\n", __func__);
>> +    if (dev_WARN_ONCE(dev->dev, dev->suspended, "Transfer while suspended\n")) {
>>           ret = -ESHUTDOWN;
>>           goto done_nolock;
>>       }
>>
> Patched, biult and tested. This is the new output:
> ---
> [  285.441552] i2c_designware 808622C1:00: Transfer while suspended
> [  285.441595] WARNING: CPU: 2 PID: 1966 at drivers/i2c/busses/i2c-designware-master.c:429 i2c_dw_xfer+0x23b/0x290
> [  285.441597] Modules linked in: 8021q garp mrp stp llc ccm snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic hid_sensor_gyro_3d hid_sensor_incl_3d hid_sens
> or_magn_3d hid_sensor_accel_3d hid_sensor_rotation hid_sensor_trigger industrialio_triggered_buffer kfifo_buf hid_sensor_iio_common industrialio btusb btrtl btbcm btintel
>   bluetooth uvcvideo ecdh_generic joydev mousedev intel_rapl intel_powerclamp kvm_intel hid_rmi rmi_core kvm videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 arc4 videobu
> f2_common ofpart cmdlinepart videodev intel_spi_platform iwlmvm intel_spi mac80211 spi_nor media irqbypass iTCO_wdt iTCO_vendor_support i915 mtd nls_iso8859_1 crct10dif_p
> clmul nls_cp437 iwlwifi crc32_pclmul vfat fat ghash_clmulni_intel wmi_bmof dell_wmi i2c_algo_bit snd_hda_intel drm_kms_helper dell_laptop ledtrig_audio snd_hda_codec dell
> _smbios cfg80211 snd_hda_core aesni_intel dell_wmi_descriptor dcdbas snd_hwdep dell_smm_hwmon aes_x86_64 snd_pcm crypto_simd drm cryptd snd_timer glue_helper
> [  285.441696]  intel_cstate psmouse pcspkr input_leds snd mei_txe tpm_crb soundcore intel_gtt mei agpgart lpc_ich i2c_i801 tpm_tis tpm_tis_core syscopyarea sysfillrect s
> ysimgblt fb_sys_fops processor_thermal_device intel_xhci_usb_role_switch roles intel_soc_dts_iosf tpm wmi battery rng_core i2c_hid dell_rbtn evdev rfkill mac_hid pcc_cpuf
> req ac soc_button_array intel_vbtn int3403_thermal int3400_thermal acpi_thermal_rel int340x_thermal_zone sparse_keymap coretemp msr crypto_user ip_tables x_tables ext4 cr
> c32c_generic crc16 mbcache jbd2 sd_mod hid_sensor_hub hid_multitouch hid_generic usbhid hid serio_raw atkbd libps2 ahci libahci xhci_pci libata sdhci_pci crc32c_intel xhc
> i_hcd cqhci scsi_mod sdhci mmc_core i8042 serio
> [  285.441784] CPU: 2 PID: 1966 Comm: kworker/u8:56 Not tainted 5.1.0-rc6-mainline-master-00064-gcd8dead0c394 #1
> [  285.441788] Hardware name: Dell Inc. Inspiron 11-3168/0G70RP, BIOS 2.3.0 01/31/2018
> [  285.441796] Workqueue: events_unbound async_run_entry_fn
> [  285.441804] RIP: 0010:i2c_dw_xfer+0x23b/0x290
> [  285.441810] Code: 05 be 32 ca 00 01 48 85 ed 75 05 49 8b 6c 24 10 4c 89 e7 e8 f7 b5 f4 ff 48 89 ea 48 c7 c7 90 a7 b1 86 48 89 c6 e8 3f 6a a1 ff <0f> 0b bd 94 ff ff ff
> 4c 8b 23 e9 49 fe ff ff 83 f8 01 74 14 8b 83
> [  285.441814] RSP: 0000:ffffb40a44d9fc70 EFLAGS: 00010282
> [  285.441819] RAX: 0000000000000000 RBX: ffff9e313a4d8818 RCX: 0000000000000000
> [  285.441822] RDX: 0000000000000000 RSI: 0000000000000096 RDI: 00000000ffffffff
> [  285.441824] RBP: ffff9e313a4bc640 R08: 0000000000000000 R09: 0000000000021500
> [  285.441828] R10: 000000097a05f1bc R11: 0000000000010148 R12: ffff9e313a4ea810
> [  285.441831] R13: ffff9e313a4d88c8 R14: 00000000ffffee52 R15: ffff9e313a4fb000
> [  285.441835] FS:  0000000000000000(0000) GS:ffff9e313bb00000(0000) knlGS:0000000000000000
> [  285.441838] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  285.441840] CR2: 0000000000000000 CR3: 000000002b60e000 CR4: 00000000001006e0
> [  285.441844] Call Trace:
> [  285.441860]  __i2c_transfer+0x153/0x4a0
> [  285.441868]  i2c_transfer+0x51/0xc0
> [  285.441881]  __i2c_hid_command+0x1ed/0x2c0 [i2c_hid]
> [  285.441889]  ? finish_wait+0x2f/0x60
> [  285.441897]  ? _raw_spin_unlock_irqrestore+0x20/0x40
> [  285.441902]  ? async_synchronize_cookie_domain+0xdf/0x150
> [  285.441907]  ? wait_woken+0x80/0x80
> [  285.441915]  ? chv_gpio_irq_mask_unmask+0x2b/0xb0
> [  285.441923]  i2c_hid_set_power+0x6c/0x100 [i2c_hid]
> [  285.441931]  i2c_hid_resume+0x85/0x100 [i2c_hid]
> [  285.441939]  ? i2c_hid_runtime_resume+0x30/0x30 [i2c_hid]
> [  285.441945]  dpm_run_callback+0x4f/0x180
> [  285.441953]  device_resume+0xac/0x1f0
> [  285.441959]  async_resume+0x19/0x30
> [  285.441964]  async_run_entry_fn+0x37/0x140
> [  285.441972]  process_one_work+0x1eb/0x410
> [  285.441980]  worker_thread+0x2d/0x3d0
> [  285.441986]  ? process_one_work+0x410/0x410
> [  285.441991]  kthread+0x112/0x130
> [  285.441997]  ? kthread_park+0x80/0x80
> [  285.442003]  ret_from_fork+0x35/0x40
> [  285.442013] ---[ end trace c98b327600d4a21b ]---
> [  285.442042] i2c_hid i2c-DELL074D:00: failed to change power setting.
> [  285.442051] PM: dpm_run_callback(): i2c_hid_resume+0x0/0x100 [i2c_hid] returns -108
> [  285.442063] PM: Device i2c-DELL074D:00 failed to restore async: error -108
> ---
> outputs this once.
> Full dmesg from boot to resume from hibernation: https://pastebin.com/raw/9akVTHL4
> sorry for the weird kernel naming.
> This solves system journal flooding, but I still have one core under full load after resume from hibernation. The touchpad attached to that bus works perfectly fine after resume.

Ah, this is on a resume from hibernate, in that case I think this patch will fix this:

https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git/commit/?id=c8afd03486c26accdda4846e5561aa3f8e862a9d

Can you build a kernel with that patch added?

Regards,

Hans
skidnik April 30, 2019, 8:15 p.m. UTC | #8
On 4/30/19 5:09 PM, Hans de Goede wrote:
> Hi,
> 
> On 4/25/19 9:21 PM, skidnik wrote:
>> On 4/24/19 7:16 PM, Wolfram Sang wrote:
>>> There are two problems with dev_err() here. One: It is not ratelimited.
>>> Two: We don't see which driver tried to transfer something with a
>>> suspended adapter. Switch to dev_WARN_ONCE to fix both issues. Drawback
>>> is that we don't see if multiple drivers are trying to transfer while
>>> suspended. They need to be discovered one after the other now. This is
>>> better than a high CPU load because a really broken driver might try to
>>> resend endlessly.
>>>
>>> Link: https://bugs.archlinux.org/task/62391
>>> Fixes: 275154155538 ("i2c: designware: Do not allow i2c_dw_xfer() 
>>> calls while suspended")
>>> Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
>>> ---
>>>
>>> skdnik: Would you be so kind and test this patch? I can only 
>>> build-test here.
>>>
>>> I have a prototype to fix the similar issue in the core, but this 
>>> needs more
>>> testing first, so I am sending this one out already.
>>>
>>>   drivers/i2c/busses/i2c-designware-master.c | 3 +--
>>>   1 file changed, 1 insertion(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/i2c/busses/i2c-designware-master.c 
>>> b/drivers/i2c/busses/i2c-designware-master.c
>>> index bb8e3f149979..d464799e40a3 100644
>>> --- a/drivers/i2c/busses/i2c-designware-master.c
>>> +++ b/drivers/i2c/busses/i2c-designware-master.c
>>> @@ -426,8 +426,7 @@ i2c_dw_xfer(struct i2c_adapter *adap, struct 
>>> i2c_msg msgs[], int num)
>>>       pm_runtime_get_sync(dev->dev);
>>> -    if (dev->suspended) {
>>> -        dev_err(dev->dev, "Error %s call while suspended\n", __func__);
>>> +    if (dev_WARN_ONCE(dev->dev, dev->suspended, "Transfer while 
>>> suspended\n")) {
>>>           ret = -ESHUTDOWN;
>>>           goto done_nolock;
>>>       }
>>>
>> This solves system journal flooding, but I still have one core under 
>> full load after resume from hibernation. The touchpad attached to that 
>> bus works perfectly fine after resume.
> 
> Ah, this is on a resume from hibernate, in that case I think this patch 
> will fix this:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git/commit/?id=c8afd03486c26accdda4846e5561aa3f8e862a9d 
> 
> 
> Can you build a kernel with that patch added?
> 
> Regards,
> 
> Hans
> 
Built linux-5.1.0-rc7 (linux-mainline PKGBUILD from aur) with both 
patches applied.

This has fixed the issue. After resume from hibernation idle CPU usage 
goes back to normal (around 1%). The i2c_designware related error 
doesn't appear in kernel log.

I'm going to report back to Archlinux bug tracker that the issue has 
been fixed. Please can you tell which kernel release will have these 
fixes so that I can provide relevant information.

Thanks for the great work, regards
skidnik
Hans de Goede May 1, 2019, 1:59 p.m. UTC | #9
Hi,

On 30-04-19 22:15, skidnik wrote:
> On 4/30/19 5:09 PM, Hans de Goede wrote:
>> Hi,
>>
>> On 4/25/19 9:21 PM, skidnik wrote:
>>> On 4/24/19 7:16 PM, Wolfram Sang wrote:
>>>> There are two problems with dev_err() here. One: It is not ratelimited.
>>>> Two: We don't see which driver tried to transfer something with a
>>>> suspended adapter. Switch to dev_WARN_ONCE to fix both issues. Drawback
>>>> is that we don't see if multiple drivers are trying to transfer while
>>>> suspended. They need to be discovered one after the other now. This is
>>>> better than a high CPU load because a really broken driver might try to
>>>> resend endlessly.
>>>>
>>>> Link: https://bugs.archlinux.org/task/62391
>>>> Fixes: 275154155538 ("i2c: designware: Do not allow i2c_dw_xfer() calls while suspended")
>>>> Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
>>>> ---
>>>>
>>>> skdnik: Would you be so kind and test this patch? I can only build-test here.
>>>>
>>>> I have a prototype to fix the similar issue in the core, but this needs more
>>>> testing first, so I am sending this one out already.
>>>>
>>>>   drivers/i2c/busses/i2c-designware-master.c | 3 +--
>>>>   1 file changed, 1 insertion(+), 2 deletions(-)
>>>>
>>>> diff --git a/drivers/i2c/busses/i2c-designware-master.c b/drivers/i2c/busses/i2c-designware-master.c
>>>> index bb8e3f149979..d464799e40a3 100644
>>>> --- a/drivers/i2c/busses/i2c-designware-master.c
>>>> +++ b/drivers/i2c/busses/i2c-designware-master.c
>>>> @@ -426,8 +426,7 @@ i2c_dw_xfer(struct i2c_adapter *adap, struct i2c_msg msgs[], int num)
>>>>       pm_runtime_get_sync(dev->dev);
>>>> -    if (dev->suspended) {
>>>> -        dev_err(dev->dev, "Error %s call while suspended\n", __func__);
>>>> +    if (dev_WARN_ONCE(dev->dev, dev->suspended, "Transfer while suspended\n")) {
>>>>           ret = -ESHUTDOWN;
>>>>           goto done_nolock;
>>>>       }
>>>>
>>> This solves system journal flooding, but I still have one core under full load after resume from hibernation. The touchpad attached to that bus works perfectly fine after resume.
>>
>> Ah, this is on a resume from hibernate, in that case I think this patch will fix this:
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git/commit/?id=c8afd03486c26accdda4846e5561aa3f8e862a9d
>>
>> Can you build a kernel with that patch added?
>>
>> Regards,
>>
>> Hans
>>
> Built linux-5.1.0-rc7 (linux-mainline PKGBUILD from aur) with both patches applied.
> 
> This has fixed the issue. After resume from hibernation idle CPU usage goes back to normal (around 1%). The i2c_designware related error doesn't appear in kernel log.
> 
> I'm going to report back to Archlinux bug tracker that the issue has been fixed. Please can you tell which kernel release will have these fixes so that I can provide relevant information.

The patch is scheduled for inclusion into 5.2 and it has a Cc: stable,
so it will get backported to older kernels like the 5.1.x and 5.0.x releases
once it is merged into Linus' tree.

Regards,

Hans
Wolfram Sang May 2, 2019, 4:29 p.m. UTC | #10
On Wed, Apr 24, 2019 at 06:16:32PM +0200, Wolfram Sang wrote:
> There are two problems with dev_err() here. One: It is not ratelimited.
> Two: We don't see which driver tried to transfer something with a
> suspended adapter. Switch to dev_WARN_ONCE to fix both issues. Drawback
> is that we don't see if multiple drivers are trying to transfer while
> suspended. They need to be discovered one after the other now. This is
> better than a high CPU load because a really broken driver might try to
> resend endlessly.
> 
> Link: https://bugs.archlinux.org/task/62391
> Fixes: 275154155538 ("i2c: designware: Do not allow i2c_dw_xfer() calls while suspended")
> Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>

Applied to for-current-fixed, thanks!
Wolfram Sang May 2, 2019, 4:31 p.m. UTC | #11
> I'm going to report back to Archlinux bug tracker that the issue has been
> fixed. Please can you tell which kernel release will have these fixes so
> that I can provide relevant information.

The above patch will be in the new kernel next week, be it rc8 or 5.1.

> Thanks for the great work, regards

And thanks to you for all the help testing on HW!
diff mbox series

Patch

diff --git a/drivers/i2c/busses/i2c-designware-master.c b/drivers/i2c/busses/i2c-designware-master.c
index bb8e3f149979..d464799e40a3 100644
--- a/drivers/i2c/busses/i2c-designware-master.c
+++ b/drivers/i2c/busses/i2c-designware-master.c
@@ -426,8 +426,7 @@  i2c_dw_xfer(struct i2c_adapter *adap, struct i2c_msg msgs[], int num)
 
 	pm_runtime_get_sync(dev->dev);
 
-	if (dev->suspended) {
-		dev_err(dev->dev, "Error %s call while suspended\n", __func__);
+	if (dev_WARN_ONCE(dev->dev, dev->suspended, "Transfer while suspended\n")) {
 		ret = -ESHUTDOWN;
 		goto done_nolock;
 	}