diff mbox series

HID: logitech-dj: Fix crash when initial logi_dj_recv_query_paired_devices fails

Message ID 20190825153542.79245-1-hdegoede@redhat.com (mailing list archive)
State Mainlined
Commit 8ccff2843fb4e6d9d26e5ae9ffe9840b38b92638
Delegated to: Jiri Kosina
Headers show
Series HID: logitech-dj: Fix crash when initial logi_dj_recv_query_paired_devices fails | expand

Commit Message

Hans de Goede Aug. 25, 2019, 3:35 p.m. UTC
Before this commit dj_probe would exit with an error if the initial
logi_dj_recv_query_paired_devices fails. The initial call may fail
when the receiver is connected through a kvm and the focus is away.

When the call fails this causes 2 problems:

1) dj_probe calls logi_dj_recv_query_paired_devices after calling
hid_device_io_start() so a HID report may have been received in between
and our delayedwork_callback may be running. It seems that the initial
logi_dj_recv_query_paired_devices failure happening with some KVMs triggers
this exact scenario, causing the work-queue to run on free-ed memory,
leading to:

 BUG: unable to handle page fault for address: 0000000000001e88
 #PF: supervisor read access in kernel mode
 #PF: error_code(0x0000) - not-present page
 PGD 0 P4D 0
 Oops: 0000 [#1] SMP PTI
 CPU: 3 PID: 257 Comm: kworker/3:3 Tainted: G           OE     5.3.0-rc5+ #100
 Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./B150M Pro4S/D3, BIOS P7.10 12/06/2016
 Workqueue: events 0xffffffffc02ba200
 RIP: 0010:0xffffffffc02ba1bd
 Code: e8 e8 13 00 d8 48 89 c5 48 85 c0 74 4c 48 8b 7b 10 48 89 ea b9 07 00 00 00 41 b9 09 00 00 00 41 b8 01 00 00 00 be 10 00 00 00 <48> 8b 87 88 1e 00 00 48 8b 40 40 e8 b3 6b b4 d8 48 89 ef 41 89 c4
 RSP: 0018:ffffb760c046bdb8 EFLAGS: 00010286
 RAX: ffff935038ea4550 RBX: ffff935046778000 RCX: 0000000000000007
 RDX: ffff935038ea4550 RSI: 0000000000000010 RDI: 0000000000000000
 RBP: ffff935038ea4550 R08: 0000000000000001 R09: 0000000000000009
 R10: 000000000000e011 R11: 0000000000000001 R12: ffff9350467780e8
 R13: ffff935046778000 R14: 0000000000000000 R15: ffff935046778070
 FS:  0000000000000000(0000) GS:ffff935054e00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000001e88 CR3: 000000075a612002 CR4: 00000000003606e0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
 Call Trace:
  0xffffffffc02ba2f7
  ? process_one_work+0x1b1/0x560
  process_one_work+0x234/0x560
  worker_thread+0x50/0x3b0
  kthread+0x10a/0x140
  ? process_one_work+0x560/0x560
  ? kthread_park+0x80/0x80
  ret_from_fork+0x3a/0x50
 Modules linked in: vboxpci(O) vboxnetadp(O) vboxnetflt(O) vboxdrv(O) bnep vfat fat btusb btrtl btbcm btintel bluetooth intel_rapl_msr ecdh_generic rfkill ecc snd_usb_audio snd_usbmidi_lib intel_rapl_common snd_rawmidi mc x86_pkg_temp_thermal intel_powerclamp coretemp iTCO_wdt iTCO_vendor_support mei_wdt mei_hdcp ppdev kvm_intel kvm irqbypass crct10dif_pclmul crc32_generic crc32_pclmul snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio ghash_clmulni_intel intel_cstate snd_hda_intel snd_hda_codec intel_uncore snd_hda_core snd_hwdep intel_rapl_perf snd_seq snd_seq_device snd_pcm snd_timer intel_wmi_thunderbolt snd e1000e soundcore mxm_wmi i2c_i801 bfq mei_me mei intel_pch_thermal parport_pc parport acpi_pad binfmt_misc hid_lg_g15(E) hid_logitech_dj(E) i915 crc32c_intel i2c_algo_bit drm_kms_helper nvme nvme_core drm wmi video uas usb_storage i2c_dev
 CR2: 0000000000001e88
 ---[ end trace 1d3f8afdcfcbd842 ]---

2) Even if we were to fix 1. by making sure the work is stopped before
failing probe, failing probe is the wrong thing to do, we have
logi_dj_recv_queue_unknown_work to deal with the initial
logi_dj_recv_query_paired_devices failure.

Rather then error-ing out of the probe, causing the receiver to not work at
all we should rely on this, so that the attached devices will get properly
enumerated once the KVM focus is switched back.

Cc: stable@vger.kernel.org
Fixes: 74808f9115ce ("HID: logitech-dj: add support for non unifying receivers")
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
---
 drivers/hid/hid-logitech-dj.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

Comments

Benjamin Tissoires Aug. 26, 2019, 7:46 a.m. UTC | #1
Hi Hans,

On Sun, Aug 25, 2019 at 5:35 PM Hans de Goede <hdegoede@redhat.com> wrote:
>
> Before this commit dj_probe would exit with an error if the initial
> logi_dj_recv_query_paired_devices fails. The initial call may fail
> when the receiver is connected through a kvm and the focus is away.
>
> When the call fails this causes 2 problems:
>
> 1) dj_probe calls logi_dj_recv_query_paired_devices after calling
> hid_device_io_start() so a HID report may have been received in between
> and our delayedwork_callback may be running. It seems that the initial
> logi_dj_recv_query_paired_devices failure happening with some KVMs triggers
> this exact scenario, causing the work-queue to run on free-ed memory,
> leading to:
>
>  BUG: unable to handle page fault for address: 0000000000001e88
>  #PF: supervisor read access in kernel mode
>  #PF: error_code(0x0000) - not-present page
>  PGD 0 P4D 0
>  Oops: 0000 [#1] SMP PTI
>  CPU: 3 PID: 257 Comm: kworker/3:3 Tainted: G           OE     5.3.0-rc5+ #100
>  Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./B150M Pro4S/D3, BIOS P7.10 12/06/2016
>  Workqueue: events 0xffffffffc02ba200
>  RIP: 0010:0xffffffffc02ba1bd
>  Code: e8 e8 13 00 d8 48 89 c5 48 85 c0 74 4c 48 8b 7b 10 48 89 ea b9 07 00 00 00 41 b9 09 00 00 00 41 b8 01 00 00 00 be 10 00 00 00 <48> 8b 87 88 1e 00 00 48 8b 40 40 e8 b3 6b b4 d8 48 89 ef 41 89 c4
>  RSP: 0018:ffffb760c046bdb8 EFLAGS: 00010286
>  RAX: ffff935038ea4550 RBX: ffff935046778000 RCX: 0000000000000007
>  RDX: ffff935038ea4550 RSI: 0000000000000010 RDI: 0000000000000000
>  RBP: ffff935038ea4550 R08: 0000000000000001 R09: 0000000000000009
>  R10: 000000000000e011 R11: 0000000000000001 R12: ffff9350467780e8
>  R13: ffff935046778000 R14: 0000000000000000 R15: ffff935046778070
>  FS:  0000000000000000(0000) GS:ffff935054e00000(0000) knlGS:0000000000000000
>  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>  CR2: 0000000000001e88 CR3: 000000075a612002 CR4: 00000000003606e0
>  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>  Call Trace:
>   0xffffffffc02ba2f7
>   ? process_one_work+0x1b1/0x560
>   process_one_work+0x234/0x560
>   worker_thread+0x50/0x3b0
>   kthread+0x10a/0x140
>   ? process_one_work+0x560/0x560
>   ? kthread_park+0x80/0x80
>   ret_from_fork+0x3a/0x50
>  Modules linked in: vboxpci(O) vboxnetadp(O) vboxnetflt(O) vboxdrv(O) bnep vfat fat btusb btrtl btbcm btintel bluetooth intel_rapl_msr ecdh_generic rfkill ecc snd_usb_audio snd_usbmidi_lib intel_rapl_common snd_rawmidi mc x86_pkg_temp_thermal intel_powerclamp coretemp iTCO_wdt iTCO_vendor_support mei_wdt mei_hdcp ppdev kvm_intel kvm irqbypass crct10dif_pclmul crc32_generic crc32_pclmul snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio ghash_clmulni_intel intel_cstate snd_hda_intel snd_hda_codec intel_uncore snd_hda_core snd_hwdep intel_rapl_perf snd_seq snd_seq_device snd_pcm snd_timer intel_wmi_thunderbolt snd e1000e soundcore mxm_wmi i2c_i801 bfq mei_me mei intel_pch_thermal parport_pc parport acpi_pad binfmt_misc hid_lg_g15(E) hid_logitech_dj(E) i915 crc32c_intel i2c_algo_bit drm_kms_helper nvme nvme_core drm wmi video uas usb_storage i2c_dev
>  CR2: 0000000000001e88
>  ---[ end trace 1d3f8afdcfcbd842 ]---
>
> 2) Even if we were to fix 1. by making sure the work is stopped before
> failing probe, failing probe is the wrong thing to do, we have
> logi_dj_recv_queue_unknown_work to deal with the initial
> logi_dj_recv_query_paired_devices failure.
>
> Rather then error-ing out of the probe, causing the receiver to not work at
> all we should rely on this, so that the attached devices will get properly
> enumerated once the KVM focus is switched back.
>
> Cc: stable@vger.kernel.org
> Fixes: 74808f9115ce ("HID: logitech-dj: add support for non unifying receivers")
> Signed-off-by: Hans de Goede <hdegoede@redhat.com>
> ---

Patch looks good to me, but doesn't logi_dj_recv_switch_to_dj_mode()
has the same potential issue?

Cheers,
Benjamin

>  drivers/hid/hid-logitech-dj.c | 10 +++++-----
>  1 file changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/hid/hid-logitech-dj.c b/drivers/hid/hid-logitech-dj.c
> index cc47f948c1d0..7badbaa18878 100644
> --- a/drivers/hid/hid-logitech-dj.c
> +++ b/drivers/hid/hid-logitech-dj.c
> @@ -1734,14 +1734,14 @@ static int logi_dj_probe(struct hid_device *hdev,
>                 if (retval < 0) {
>                         hid_err(hdev, "%s: logi_dj_recv_query_paired_devices error:%d\n",
>                                 __func__, retval);
> -                       goto logi_dj_recv_query_paired_devices_failed;
> +                       /*
> +                        * This can happen with a KVM, let the probe succeed,
> +                        * logi_dj_recv_queue_unknown_work will retry later.
> +                        */
>                 }
>         }
>
> -       return retval;
> -
> -logi_dj_recv_query_paired_devices_failed:
> -       hid_hw_close(hdev);
> +       return 0;
>
>  llopen_failed:
>  switch_to_dj_mode_fail:
> --
> 2.23.0
>
Hans de Goede Aug. 26, 2019, 9:03 a.m. UTC | #2
Hi,

On 26-08-19 09:46, Benjamin Tissoires wrote:
> Hi Hans,
> 
> On Sun, Aug 25, 2019 at 5:35 PM Hans de Goede <hdegoede@redhat.com> wrote:
>>
>> Before this commit dj_probe would exit with an error if the initial
>> logi_dj_recv_query_paired_devices fails. The initial call may fail
>> when the receiver is connected through a kvm and the focus is away.
>>
>> When the call fails this causes 2 problems:
>>
>> 1) dj_probe calls logi_dj_recv_query_paired_devices after calling
>> hid_device_io_start() so a HID report may have been received in between
>> and our delayedwork_callback may be running. It seems that the initial
>> logi_dj_recv_query_paired_devices failure happening with some KVMs triggers
>> this exact scenario, causing the work-queue to run on free-ed memory,
>> leading to:
>>
>>   BUG: unable to handle page fault for address: 0000000000001e88
>>   #PF: supervisor read access in kernel mode
>>   #PF: error_code(0x0000) - not-present page
>>   PGD 0 P4D 0
>>   Oops: 0000 [#1] SMP PTI
>>   CPU: 3 PID: 257 Comm: kworker/3:3 Tainted: G           OE     5.3.0-rc5+ #100
>>   Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./B150M Pro4S/D3, BIOS P7.10 12/06/2016
>>   Workqueue: events 0xffffffffc02ba200
>>   RIP: 0010:0xffffffffc02ba1bd
>>   Code: e8 e8 13 00 d8 48 89 c5 48 85 c0 74 4c 48 8b 7b 10 48 89 ea b9 07 00 00 00 41 b9 09 00 00 00 41 b8 01 00 00 00 be 10 00 00 00 <48> 8b 87 88 1e 00 00 48 8b 40 40 e8 b3 6b b4 d8 48 89 ef 41 89 c4
>>   RSP: 0018:ffffb760c046bdb8 EFLAGS: 00010286
>>   RAX: ffff935038ea4550 RBX: ffff935046778000 RCX: 0000000000000007
>>   RDX: ffff935038ea4550 RSI: 0000000000000010 RDI: 0000000000000000
>>   RBP: ffff935038ea4550 R08: 0000000000000001 R09: 0000000000000009
>>   R10: 000000000000e011 R11: 0000000000000001 R12: ffff9350467780e8
>>   R13: ffff935046778000 R14: 0000000000000000 R15: ffff935046778070
>>   FS:  0000000000000000(0000) GS:ffff935054e00000(0000) knlGS:0000000000000000
>>   CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>   CR2: 0000000000001e88 CR3: 000000075a612002 CR4: 00000000003606e0
>>   DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>   DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>   Call Trace:
>>    0xffffffffc02ba2f7
>>    ? process_one_work+0x1b1/0x560
>>    process_one_work+0x234/0x560
>>    worker_thread+0x50/0x3b0
>>    kthread+0x10a/0x140
>>    ? process_one_work+0x560/0x560
>>    ? kthread_park+0x80/0x80
>>    ret_from_fork+0x3a/0x50
>>   Modules linked in: vboxpci(O) vboxnetadp(O) vboxnetflt(O) vboxdrv(O) bnep vfat fat btusb btrtl btbcm btintel bluetooth intel_rapl_msr ecdh_generic rfkill ecc snd_usb_audio snd_usbmidi_lib intel_rapl_common snd_rawmidi mc x86_pkg_temp_thermal intel_powerclamp coretemp iTCO_wdt iTCO_vendor_support mei_wdt mei_hdcp ppdev kvm_intel kvm irqbypass crct10dif_pclmul crc32_generic crc32_pclmul snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio ghash_clmulni_intel intel_cstate snd_hda_intel snd_hda_codec intel_uncore snd_hda_core snd_hwdep intel_rapl_perf snd_seq snd_seq_device snd_pcm snd_timer intel_wmi_thunderbolt snd e1000e soundcore mxm_wmi i2c_i801 bfq mei_me mei intel_pch_thermal parport_pc parport acpi_pad binfmt_misc hid_lg_g15(E) hid_logitech_dj(E) i915 crc32c_intel i2c_algo_bit drm_kms_helper nvme nvme_core drm wmi video uas usb_storage i2c_dev
>>   CR2: 0000000000001e88
>>   ---[ end trace 1d3f8afdcfcbd842 ]---
>>
>> 2) Even if we were to fix 1. by making sure the work is stopped before
>> failing probe, failing probe is the wrong thing to do, we have
>> logi_dj_recv_queue_unknown_work to deal with the initial
>> logi_dj_recv_query_paired_devices failure.
>>
>> Rather then error-ing out of the probe, causing the receiver to not work at
>> all we should rely on this, so that the attached devices will get properly
>> enumerated once the KVM focus is switched back.
>>
>> Cc: stable@vger.kernel.org
>> Fixes: 74808f9115ce ("HID: logitech-dj: add support for non unifying receivers")
>> Signed-off-by: Hans de Goede <hdegoede@redhat.com>
>> ---
> 
> Patch looks good to me, but doesn't logi_dj_recv_switch_to_dj_mode()
> has the same potential issue?

logi_dj_recv_query_paired_devices() solicits data being send from the
device, so we do: hid_device_io_start(hdev); just before calling it.

logi_dj_recv_switch_to_dj_mode() is done before the hid_device_io_start()
so it cannot cause the work to get queued.

Also logi_dj_recv_switch_to_dj_mode() failing is something we cannot
recover from.

Regards,

Hans



> 
> Cheers,
> Benjamin
> 
>>   drivers/hid/hid-logitech-dj.c | 10 +++++-----
>>   1 file changed, 5 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/hid/hid-logitech-dj.c b/drivers/hid/hid-logitech-dj.c
>> index cc47f948c1d0..7badbaa18878 100644
>> --- a/drivers/hid/hid-logitech-dj.c
>> +++ b/drivers/hid/hid-logitech-dj.c
>> @@ -1734,14 +1734,14 @@ static int logi_dj_probe(struct hid_device *hdev,
>>                  if (retval < 0) {
>>                          hid_err(hdev, "%s: logi_dj_recv_query_paired_devices error:%d\n",
>>                                  __func__, retval);
>> -                       goto logi_dj_recv_query_paired_devices_failed;
>> +                       /*
>> +                        * This can happen with a KVM, let the probe succeed,
>> +                        * logi_dj_recv_queue_unknown_work will retry later.
>> +                        */
>>                  }
>>          }
>>
>> -       return retval;
>> -
>> -logi_dj_recv_query_paired_devices_failed:
>> -       hid_hw_close(hdev);
>> +       return 0;
>>
>>   llopen_failed:
>>   switch_to_dj_mode_fail:
>> --
>> 2.23.0
>>
Benjamin Tissoires Aug. 26, 2019, 9:41 a.m. UTC | #3
On Mon, Aug 26, 2019 at 11:04 AM Hans de Goede <hdegoede@redhat.com> wrote:
>
> Hi,
>
> On 26-08-19 09:46, Benjamin Tissoires wrote:
> > Hi Hans,
> >
> > On Sun, Aug 25, 2019 at 5:35 PM Hans de Goede <hdegoede@redhat.com> wrote:
> >>
> >> Before this commit dj_probe would exit with an error if the initial
> >> logi_dj_recv_query_paired_devices fails. The initial call may fail
> >> when the receiver is connected through a kvm and the focus is away.
> >>
> >> When the call fails this causes 2 problems:
> >>
> >> 1) dj_probe calls logi_dj_recv_query_paired_devices after calling
> >> hid_device_io_start() so a HID report may have been received in between
> >> and our delayedwork_callback may be running. It seems that the initial
> >> logi_dj_recv_query_paired_devices failure happening with some KVMs triggers
> >> this exact scenario, causing the work-queue to run on free-ed memory,
> >> leading to:
> >>
> >>   BUG: unable to handle page fault for address: 0000000000001e88
> >>   #PF: supervisor read access in kernel mode
> >>   #PF: error_code(0x0000) - not-present page
> >>   PGD 0 P4D 0
> >>   Oops: 0000 [#1] SMP PTI
> >>   CPU: 3 PID: 257 Comm: kworker/3:3 Tainted: G           OE     5.3.0-rc5+ #100
> >>   Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./B150M Pro4S/D3, BIOS P7.10 12/06/2016
> >>   Workqueue: events 0xffffffffc02ba200
> >>   RIP: 0010:0xffffffffc02ba1bd
> >>   Code: e8 e8 13 00 d8 48 89 c5 48 85 c0 74 4c 48 8b 7b 10 48 89 ea b9 07 00 00 00 41 b9 09 00 00 00 41 b8 01 00 00 00 be 10 00 00 00 <48> 8b 87 88 1e 00 00 48 8b 40 40 e8 b3 6b b4 d8 48 89 ef 41 89 c4
> >>   RSP: 0018:ffffb760c046bdb8 EFLAGS: 00010286
> >>   RAX: ffff935038ea4550 RBX: ffff935046778000 RCX: 0000000000000007
> >>   RDX: ffff935038ea4550 RSI: 0000000000000010 RDI: 0000000000000000
> >>   RBP: ffff935038ea4550 R08: 0000000000000001 R09: 0000000000000009
> >>   R10: 000000000000e011 R11: 0000000000000001 R12: ffff9350467780e8
> >>   R13: ffff935046778000 R14: 0000000000000000 R15: ffff935046778070
> >>   FS:  0000000000000000(0000) GS:ffff935054e00000(0000) knlGS:0000000000000000
> >>   CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >>   CR2: 0000000000001e88 CR3: 000000075a612002 CR4: 00000000003606e0
> >>   DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> >>   DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> >>   Call Trace:
> >>    0xffffffffc02ba2f7
> >>    ? process_one_work+0x1b1/0x560
> >>    process_one_work+0x234/0x560
> >>    worker_thread+0x50/0x3b0
> >>    kthread+0x10a/0x140
> >>    ? process_one_work+0x560/0x560
> >>    ? kthread_park+0x80/0x80
> >>    ret_from_fork+0x3a/0x50
> >>   Modules linked in: vboxpci(O) vboxnetadp(O) vboxnetflt(O) vboxdrv(O) bnep vfat fat btusb btrtl btbcm btintel bluetooth intel_rapl_msr ecdh_generic rfkill ecc snd_usb_audio snd_usbmidi_lib intel_rapl_common snd_rawmidi mc x86_pkg_temp_thermal intel_powerclamp coretemp iTCO_wdt iTCO_vendor_support mei_wdt mei_hdcp ppdev kvm_intel kvm irqbypass crct10dif_pclmul crc32_generic crc32_pclmul snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio ghash_clmulni_intel intel_cstate snd_hda_intel snd_hda_codec intel_uncore snd_hda_core snd_hwdep intel_rapl_perf snd_seq snd_seq_device snd_pcm snd_timer intel_wmi_thunderbolt snd e1000e soundcore mxm_wmi i2c_i801 bfq mei_me mei intel_pch_thermal parport_pc parport acpi_pad binfmt_misc hid_lg_g15(E) hid_logitech_dj(E) i915 crc32c_intel i2c_algo_bit drm_kms_helper nvme nvme_core drm wmi video uas usb_storage i2c_dev
> >>   CR2: 0000000000001e88
> >>   ---[ end trace 1d3f8afdcfcbd842 ]---
> >>
> >> 2) Even if we were to fix 1. by making sure the work is stopped before
> >> failing probe, failing probe is the wrong thing to do, we have
> >> logi_dj_recv_queue_unknown_work to deal with the initial
> >> logi_dj_recv_query_paired_devices failure.
> >>
> >> Rather then error-ing out of the probe, causing the receiver to not work at
> >> all we should rely on this, so that the attached devices will get properly
> >> enumerated once the KVM focus is switched back.
> >>
> >> Cc: stable@vger.kernel.org
> >> Fixes: 74808f9115ce ("HID: logitech-dj: add support for non unifying receivers")
> >> Signed-off-by: Hans de Goede <hdegoede@redhat.com>
> >> ---
> >
> > Patch looks good to me, but doesn't logi_dj_recv_switch_to_dj_mode()
> > has the same potential issue?
>
> logi_dj_recv_query_paired_devices() solicits data being send from the
> device, so we do: hid_device_io_start(hdev); just before calling it.

Right, so the patch is just about fixing the workqueue item (and a
little bit of the device too)

[/me tries to understand the KVM implications]

>
> logi_dj_recv_switch_to_dj_mode() is done before the hid_device_io_start()
> so it cannot cause the work to get queued.
>
> Also logi_dj_recv_switch_to_dj_mode() failing is something we cannot
> recover from.

There is one thing I do not get.
When the KVM hadn't the focus on the device, how can it not forward
reports when you can actually call logi_dj_recv_switch_to_dj_mode()?
Does it present the device to all the hosts connected to it and just
filter the input reports to the "focused" one, or is it something
different?

And in the case logi_dj_recv_switch_to_dj_mode() failed, on a kvm it
should not have much implications, because as long as one host
converts the receiver to the DJ mode and the receiver keeps being
powered on, it won't change back to the non DJ mode...

Anyway, I have queued the patch locally for testing, and will push it soon.

Cheers,
Benjamin

> >>   drivers/hid/hid-logitech-dj.c | 10 +++++-----
> >>   1 file changed, 5 insertions(+), 5 deletions(-)
> >>
> >> diff --git a/drivers/hid/hid-logitech-dj.c b/drivers/hid/hid-logitech-dj.c
> >> index cc47f948c1d0..7badbaa18878 100644
> >> --- a/drivers/hid/hid-logitech-dj.c
> >> +++ b/drivers/hid/hid-logitech-dj.c
> >> @@ -1734,14 +1734,14 @@ static int logi_dj_probe(struct hid_device *hdev,
> >>                  if (retval < 0) {
> >>                          hid_err(hdev, "%s: logi_dj_recv_query_paired_devices error:%d\n",
> >>                                  __func__, retval);
> >> -                       goto logi_dj_recv_query_paired_devices_failed;
> >> +                       /*
> >> +                        * This can happen with a KVM, let the probe succeed,
> >> +                        * logi_dj_recv_queue_unknown_work will retry later.
> >> +                        */
> >>                  }
> >>          }
> >>
> >> -       return retval;
> >> -
> >> -logi_dj_recv_query_paired_devices_failed:
> >> -       hid_hw_close(hdev);
> >> +       return 0;
> >>
> >>   llopen_failed:
> >>   switch_to_dj_mode_fail:
> >> --
> >> 2.23.0
> >>
Benjamin Tissoires Aug. 26, 2019, 10:01 a.m. UTC | #4
On Mon, Aug 26, 2019 at 11:41 AM Benjamin Tissoires
<benjamin.tissoires@redhat.com> wrote:
>
> On Mon, Aug 26, 2019 at 11:04 AM Hans de Goede <hdegoede@redhat.com> wrote:
> >
> > Hi,
> >
> > On 26-08-19 09:46, Benjamin Tissoires wrote:
> > > Hi Hans,
> > >
> > > On Sun, Aug 25, 2019 at 5:35 PM Hans de Goede <hdegoede@redhat.com> wrote:
> > >>
> > >> Before this commit dj_probe would exit with an error if the initial
> > >> logi_dj_recv_query_paired_devices fails. The initial call may fail
> > >> when the receiver is connected through a kvm and the focus is away.
> > >>
> > >> When the call fails this causes 2 problems:
> > >>
> > >> 1) dj_probe calls logi_dj_recv_query_paired_devices after calling
> > >> hid_device_io_start() so a HID report may have been received in between
> > >> and our delayedwork_callback may be running. It seems that the initial
> > >> logi_dj_recv_query_paired_devices failure happening with some KVMs triggers
> > >> this exact scenario, causing the work-queue to run on free-ed memory,
> > >> leading to:
> > >>
> > >>   BUG: unable to handle page fault for address: 0000000000001e88
> > >>   #PF: supervisor read access in kernel mode
> > >>   #PF: error_code(0x0000) - not-present page
> > >>   PGD 0 P4D 0
> > >>   Oops: 0000 [#1] SMP PTI
> > >>   CPU: 3 PID: 257 Comm: kworker/3:3 Tainted: G           OE     5.3.0-rc5+ #100
> > >>   Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./B150M Pro4S/D3, BIOS P7.10 12/06/2016
> > >>   Workqueue: events 0xffffffffc02ba200
> > >>   RIP: 0010:0xffffffffc02ba1bd
> > >>   Code: e8 e8 13 00 d8 48 89 c5 48 85 c0 74 4c 48 8b 7b 10 48 89 ea b9 07 00 00 00 41 b9 09 00 00 00 41 b8 01 00 00 00 be 10 00 00 00 <48> 8b 87 88 1e 00 00 48 8b 40 40 e8 b3 6b b4 d8 48 89 ef 41 89 c4
> > >>   RSP: 0018:ffffb760c046bdb8 EFLAGS: 00010286
> > >>   RAX: ffff935038ea4550 RBX: ffff935046778000 RCX: 0000000000000007
> > >>   RDX: ffff935038ea4550 RSI: 0000000000000010 RDI: 0000000000000000
> > >>   RBP: ffff935038ea4550 R08: 0000000000000001 R09: 0000000000000009
> > >>   R10: 000000000000e011 R11: 0000000000000001 R12: ffff9350467780e8
> > >>   R13: ffff935046778000 R14: 0000000000000000 R15: ffff935046778070
> > >>   FS:  0000000000000000(0000) GS:ffff935054e00000(0000) knlGS:0000000000000000
> > >>   CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > >>   CR2: 0000000000001e88 CR3: 000000075a612002 CR4: 00000000003606e0
> > >>   DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > >>   DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > >>   Call Trace:
> > >>    0xffffffffc02ba2f7
> > >>    ? process_one_work+0x1b1/0x560
> > >>    process_one_work+0x234/0x560
> > >>    worker_thread+0x50/0x3b0
> > >>    kthread+0x10a/0x140
> > >>    ? process_one_work+0x560/0x560
> > >>    ? kthread_park+0x80/0x80
> > >>    ret_from_fork+0x3a/0x50
> > >>   Modules linked in: vboxpci(O) vboxnetadp(O) vboxnetflt(O) vboxdrv(O) bnep vfat fat btusb btrtl btbcm btintel bluetooth intel_rapl_msr ecdh_generic rfkill ecc snd_usb_audio snd_usbmidi_lib intel_rapl_common snd_rawmidi mc x86_pkg_temp_thermal intel_powerclamp coretemp iTCO_wdt iTCO_vendor_support mei_wdt mei_hdcp ppdev kvm_intel kvm irqbypass crct10dif_pclmul crc32_generic crc32_pclmul snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio ghash_clmulni_intel intel_cstate snd_hda_intel snd_hda_codec intel_uncore snd_hda_core snd_hwdep intel_rapl_perf snd_seq snd_seq_device snd_pcm snd_timer intel_wmi_thunderbolt snd e1000e soundcore mxm_wmi i2c_i801 bfq mei_me mei intel_pch_thermal parport_pc parport acpi_pad binfmt_misc hid_lg_g15(E) hid_logitech_dj(E) i915 crc32c_intel i2c_algo_bit drm_kms_helper nvme nvme_core drm wmi video uas usb_storage i2c_dev
> > >>   CR2: 0000000000001e88
> > >>   ---[ end trace 1d3f8afdcfcbd842 ]---
> > >>
> > >> 2) Even if we were to fix 1. by making sure the work is stopped before
> > >> failing probe, failing probe is the wrong thing to do, we have
> > >> logi_dj_recv_queue_unknown_work to deal with the initial
> > >> logi_dj_recv_query_paired_devices failure.
> > >>
> > >> Rather then error-ing out of the probe, causing the receiver to not work at
> > >> all we should rely on this, so that the attached devices will get properly
> > >> enumerated once the KVM focus is switched back.
> > >>
> > >> Cc: stable@vger.kernel.org
> > >> Fixes: 74808f9115ce ("HID: logitech-dj: add support for non unifying receivers")
> > >> Signed-off-by: Hans de Goede <hdegoede@redhat.com>
> > >> ---
> > >
> > > Patch looks good to me, but doesn't logi_dj_recv_switch_to_dj_mode()
> > > has the same potential issue?
> >
> > logi_dj_recv_query_paired_devices() solicits data being send from the
> > device, so we do: hid_device_io_start(hdev); just before calling it.
>
> Right, so the patch is just about fixing the workqueue item (and a
> little bit of the device too)
>
> [/me tries to understand the KVM implications]
>
> >
> > logi_dj_recv_switch_to_dj_mode() is done before the hid_device_io_start()
> > so it cannot cause the work to get queued.
> >
> > Also logi_dj_recv_switch_to_dj_mode() failing is something we cannot
> > recover from.
>
> There is one thing I do not get.
> When the KVM hadn't the focus on the device, how can it not forward
> reports when you can actually call logi_dj_recv_switch_to_dj_mode()?
> Does it present the device to all the hosts connected to it and just
> filter the input reports to the "focused" one, or is it something
> different?
>
> And in the case logi_dj_recv_switch_to_dj_mode() failed, on a kvm it
> should not have much implications, because as long as one host
> converts the receiver to the DJ mode and the receiver keeps being
> powered on, it won't change back to the non DJ mode...
>
> Anyway, I have queued the patch locally for testing, and will push it soon.

FWIW, patch is now applied in f-5.3/upstream/fixes

But I still would like to have your input on my KVM questions above :)

Cheers,
Benjamin

>
> Cheers,
> Benjamin
>
> > >>   drivers/hid/hid-logitech-dj.c | 10 +++++-----
> > >>   1 file changed, 5 insertions(+), 5 deletions(-)
> > >>
> > >> diff --git a/drivers/hid/hid-logitech-dj.c b/drivers/hid/hid-logitech-dj.c
> > >> index cc47f948c1d0..7badbaa18878 100644
> > >> --- a/drivers/hid/hid-logitech-dj.c
> > >> +++ b/drivers/hid/hid-logitech-dj.c
> > >> @@ -1734,14 +1734,14 @@ static int logi_dj_probe(struct hid_device *hdev,
> > >>                  if (retval < 0) {
> > >>                          hid_err(hdev, "%s: logi_dj_recv_query_paired_devices error:%d\n",
> > >>                                  __func__, retval);
> > >> -                       goto logi_dj_recv_query_paired_devices_failed;
> > >> +                       /*
> > >> +                        * This can happen with a KVM, let the probe succeed,
> > >> +                        * logi_dj_recv_queue_unknown_work will retry later.
> > >> +                        */
> > >>                  }
> > >>          }
> > >>
> > >> -       return retval;
> > >> -
> > >> -logi_dj_recv_query_paired_devices_failed:
> > >> -       hid_hw_close(hdev);
> > >> +       return 0;
> > >>
> > >>   llopen_failed:
> > >>   switch_to_dj_mode_fail:
> > >> --
> > >> 2.23.0
> > >>
Hans de Goede Aug. 26, 2019, 11:19 a.m. UTC | #5
Hi,

On 26-08-19 11:41, Benjamin Tissoires wrote:
> On Mon, Aug 26, 2019 at 11:04 AM Hans de Goede <hdegoede@redhat.com> wrote:
>>
>> Hi,
>>
>> On 26-08-19 09:46, Benjamin Tissoires wrote:
>>> Hi Hans,
>>>
>>> On Sun, Aug 25, 2019 at 5:35 PM Hans de Goede <hdegoede@redhat.com> wrote:
>>>>
>>>> Before this commit dj_probe would exit with an error if the initial
>>>> logi_dj_recv_query_paired_devices fails. The initial call may fail
>>>> when the receiver is connected through a kvm and the focus is away.
>>>>
>>>> When the call fails this causes 2 problems:
>>>>
>>>> 1) dj_probe calls logi_dj_recv_query_paired_devices after calling
>>>> hid_device_io_start() so a HID report may have been received in between
>>>> and our delayedwork_callback may be running. It seems that the initial
>>>> logi_dj_recv_query_paired_devices failure happening with some KVMs triggers
>>>> this exact scenario, causing the work-queue to run on free-ed memory,
>>>> leading to:
>>>>
>>>>    BUG: unable to handle page fault for address: 0000000000001e88
>>>>    #PF: supervisor read access in kernel mode
>>>>    #PF: error_code(0x0000) - not-present page
>>>>    PGD 0 P4D 0
>>>>    Oops: 0000 [#1] SMP PTI
>>>>    CPU: 3 PID: 257 Comm: kworker/3:3 Tainted: G           OE     5.3.0-rc5+ #100
>>>>    Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./B150M Pro4S/D3, BIOS P7.10 12/06/2016
>>>>    Workqueue: events 0xffffffffc02ba200
>>>>    RIP: 0010:0xffffffffc02ba1bd
>>>>    Code: e8 e8 13 00 d8 48 89 c5 48 85 c0 74 4c 48 8b 7b 10 48 89 ea b9 07 00 00 00 41 b9 09 00 00 00 41 b8 01 00 00 00 be 10 00 00 00 <48> 8b 87 88 1e 00 00 48 8b 40 40 e8 b3 6b b4 d8 48 89 ef 41 89 c4
>>>>    RSP: 0018:ffffb760c046bdb8 EFLAGS: 00010286
>>>>    RAX: ffff935038ea4550 RBX: ffff935046778000 RCX: 0000000000000007
>>>>    RDX: ffff935038ea4550 RSI: 0000000000000010 RDI: 0000000000000000
>>>>    RBP: ffff935038ea4550 R08: 0000000000000001 R09: 0000000000000009
>>>>    R10: 000000000000e011 R11: 0000000000000001 R12: ffff9350467780e8
>>>>    R13: ffff935046778000 R14: 0000000000000000 R15: ffff935046778070
>>>>    FS:  0000000000000000(0000) GS:ffff935054e00000(0000) knlGS:0000000000000000
>>>>    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>    CR2: 0000000000001e88 CR3: 000000075a612002 CR4: 00000000003606e0
>>>>    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>>>    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>>>    Call Trace:
>>>>     0xffffffffc02ba2f7
>>>>     ? process_one_work+0x1b1/0x560
>>>>     process_one_work+0x234/0x560
>>>>     worker_thread+0x50/0x3b0
>>>>     kthread+0x10a/0x140
>>>>     ? process_one_work+0x560/0x560
>>>>     ? kthread_park+0x80/0x80
>>>>     ret_from_fork+0x3a/0x50
>>>>    Modules linked in: vboxpci(O) vboxnetadp(O) vboxnetflt(O) vboxdrv(O) bnep vfat fat btusb btrtl btbcm btintel bluetooth intel_rapl_msr ecdh_generic rfkill ecc snd_usb_audio snd_usbmidi_lib intel_rapl_common snd_rawmidi mc x86_pkg_temp_thermal intel_powerclamp coretemp iTCO_wdt iTCO_vendor_support mei_wdt mei_hdcp ppdev kvm_intel kvm irqbypass crct10dif_pclmul crc32_generic crc32_pclmul snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio ghash_clmulni_intel intel_cstate snd_hda_intel snd_hda_codec intel_uncore snd_hda_core snd_hwdep intel_rapl_perf snd_seq snd_seq_device snd_pcm snd_timer intel_wmi_thunderbolt snd e1000e soundcore mxm_wmi i2c_i801 bfq mei_me mei intel_pch_thermal parport_pc parport acpi_pad binfmt_misc hid_lg_g15(E) hid_logitech_dj(E) i915 crc32c_intel i2c_algo_bit drm_kms_helper nvme nvme_core drm wmi video uas usb_storage i2c_dev
>>>>    CR2: 0000000000001e88
>>>>    ---[ end trace 1d3f8afdcfcbd842 ]---
>>>>
>>>> 2) Even if we were to fix 1. by making sure the work is stopped before
>>>> failing probe, failing probe is the wrong thing to do, we have
>>>> logi_dj_recv_queue_unknown_work to deal with the initial
>>>> logi_dj_recv_query_paired_devices failure.
>>>>
>>>> Rather then error-ing out of the probe, causing the receiver to not work at
>>>> all we should rely on this, so that the attached devices will get properly
>>>> enumerated once the KVM focus is switched back.
>>>>
>>>> Cc: stable@vger.kernel.org
>>>> Fixes: 74808f9115ce ("HID: logitech-dj: add support for non unifying receivers")
>>>> Signed-off-by: Hans de Goede <hdegoede@redhat.com>
>>>> ---
>>>
>>> Patch looks good to me, but doesn't logi_dj_recv_switch_to_dj_mode()
>>> has the same potential issue?
>>
>> logi_dj_recv_query_paired_devices() solicits data being send from the
>> device, so we do: hid_device_io_start(hdev); just before calling it.
> 
> Right, so the patch is just about fixing the workqueue item (and a
> little bit of the device too)
> 
> [/me tries to understand the KVM implications]
> 
>>
>> logi_dj_recv_switch_to_dj_mode() is done before the hid_device_io_start()
>> so it cannot cause the work to get queued.
>>
>> Also logi_dj_recv_switch_to_dj_mode() failing is something we cannot
>> recover from.
> 
> There is one thing I do not get.
> When the KVM hadn't the focus on the device, how can it not forward
> reports when you can actually call logi_dj_recv_switch_to_dj_mode()?
> Does it present the device to all the hosts connected to it and just
> filter the input reports to the "focused" one, or is it something
> different?

I honestly do not know, I have tested with this exact same kvm before and
then the logi_dj_recv_query_paired_devices() call worked, but the
input-reports send in response to it where forwarded to the wrong machine.

And later when focus got restored the unknown handler would redo
logi_dj_recv_query_paired_devices() and all was well.

For some reason, yesterday when I hit this bug and it crashes my workstation
as well as another machine attached to the KVM also running 5.3-rc5, I am now
getting -32 (EPIPE / USB endpoint stalled) errors from the
logi_dj_recv_switch_to_dj_mode() call. I think it may have something TODO
with which of the 2 USB ports on the KVM for mouse and kbd the dongle is in,
but I have not confirmed this yet.

Also I did do a firmware update of the KVM a while ago, I would like to think
that that was done before my previous round of testing. But I'm not sure.

TL;DR: The logi_dj_recv_query_paired_devices() call failing with EPIPE
when the focus is switched away is new and I do not know why this is
happening, but it does crash my machine(s) without the fix.

logi_dj_recv_switch_to_dj_mode() OTOH seems to not care where the focus is...

> And in the case logi_dj_recv_switch_to_dj_mode() failed, on a kvm it
> should not have much implications, because as long as one host
> converts the receiver to the DJ mode and the receiver keeps being
> powered on, it won't change back to the non DJ mode...

True.

> Anyway, I have queued the patch locally for testing, and will push it soon.

Thanks.

Regards,

Hans



>>>>    drivers/hid/hid-logitech-dj.c | 10 +++++-----
>>>>    1 file changed, 5 insertions(+), 5 deletions(-)
>>>>
>>>> diff --git a/drivers/hid/hid-logitech-dj.c b/drivers/hid/hid-logitech-dj.c
>>>> index cc47f948c1d0..7badbaa18878 100644
>>>> --- a/drivers/hid/hid-logitech-dj.c
>>>> +++ b/drivers/hid/hid-logitech-dj.c
>>>> @@ -1734,14 +1734,14 @@ static int logi_dj_probe(struct hid_device *hdev,
>>>>                   if (retval < 0) {
>>>>                           hid_err(hdev, "%s: logi_dj_recv_query_paired_devices error:%d\n",
>>>>                                   __func__, retval);
>>>> -                       goto logi_dj_recv_query_paired_devices_failed;
>>>> +                       /*
>>>> +                        * This can happen with a KVM, let the probe succeed,
>>>> +                        * logi_dj_recv_queue_unknown_work will retry later.
>>>> +                        */
>>>>                   }
>>>>           }
>>>>
>>>> -       return retval;
>>>> -
>>>> -logi_dj_recv_query_paired_devices_failed:
>>>> -       hid_hw_close(hdev);
>>>> +       return 0;
>>>>
>>>>    llopen_failed:
>>>>    switch_to_dj_mode_fail:
>>>> --
>>>> 2.23.0
>>>>
diff mbox series

Patch

diff --git a/drivers/hid/hid-logitech-dj.c b/drivers/hid/hid-logitech-dj.c
index cc47f948c1d0..7badbaa18878 100644
--- a/drivers/hid/hid-logitech-dj.c
+++ b/drivers/hid/hid-logitech-dj.c
@@ -1734,14 +1734,14 @@  static int logi_dj_probe(struct hid_device *hdev,
 		if (retval < 0) {
 			hid_err(hdev, "%s: logi_dj_recv_query_paired_devices error:%d\n",
 				__func__, retval);
-			goto logi_dj_recv_query_paired_devices_failed;
+			/*
+			 * This can happen with a KVM, let the probe succeed,
+			 * logi_dj_recv_queue_unknown_work will retry later.
+			 */
 		}
 	}
 
-	return retval;
-
-logi_dj_recv_query_paired_devices_failed:
-	hid_hw_close(hdev);
+	return 0;
 
 llopen_failed:
 switch_to_dj_mode_fail: