diff mbox

NULL pointer dereference in xhci_suspend

Message ID 09a9f943-f93b-a644-d049-8c3c54253efe@linux.intel.com (mailing list archive)
State New, archived
Headers show

Commit Message

Mathias Nyman April 19, 2018, 11:17 a.m. UTC
On 18.04.2018 20:30, Nazar Mokrynskyi wrote:
> Hi everyone, I'm running a VM using libvirt on kernel 4.15 (Ubuntu 18.04).
> 
> ASMedia USB 3.1 Gen 2 controller is assigned to VM like this:
> 
>      <hostdev mode='subsystem' type='pci' managed='yes'>
>        <source>
>          <address domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
>        </source>
>        <rom bar='off'/>
>        <address type='pci' domain='0x0000' bus='0x07' slot='0x00' function='0x0'/>
>      </hostdev>
> 
> Occasionally VM start fails with following error and I'm unable to start a VM anymore until reboot:
> 
> [40061.976095] BUG: unable to handle kernel NULL pointer dereference at 0000000000000228
> [40061.976107] IP: xhci_suspend+0x3a/0x490
> [40061.976109] PGD 0 P4D 0
> [40061.976115] Oops: 0000 [#1] SMP PTI
> [40061.976118] Modules linked in: cpuid vhost_net vhost tap vfio_pci vfio_virqfd vfio_iommu_type1 vfio xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack libcrc32c ipt_REJECT nf_reject_ipv4 xt_tcpudp bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables devlink iptable_filter binfmt_misc snd_hda_codec_hdmi intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass intel_cstate nls_iso8859_1 mxm_wmi snd_hda_intel snd_usb_audio snd_hda_codec snd_usbmidi_lib snd_hda_core snd_rawmidi snd_hwdep snd_seq_device joydev input_leds intel_rapl_perf snd_pcm cdc_acm snd_timer snd mei_me soundcore mei shpchp wmi nvidia_uvm(POE) acpi_pad mac_hid sch_fq_codel lp parport bfq ip_tables
> [40061.976185]  x_tables autofs4 btrfs xor zstd_compress raid6_pq algif_skcipher af_alg dm_crypt hid_generic usbhid hid uas usb_storage crct10dif_pclmul crc32_pclmul nvidia_drm(POE) ghash_clmulni_intel nvidia_modeset(POE) pcbc i915 nvidia(POE) aesni_intel aes_x86_64 crypto_simd i2c_algo_bit glue_helper cryptd drm_kms_helper e1000e syscopyarea sysfillrect sysimgblt fb_sys_fops ptp nvme pps_core drm nvme_core ahci ipmi_devintf libahci ipmi_msghandler video
> [40061.976230] CPU: 7 PID: 29745 Comm: kworker/7:0 Tainted: P           OE    4.15.0-17-generic #18-Ubuntu
> [40061.976232] Hardware name: Micro-Star International Co., Ltd. MS-7B45/Z370 GAMING PRO CARBON (MS-7B45), BIOS A.51 03/29/2018
> [40061.976239] Workqueue: pm pm_runtime_work
> [40061.976245] RIP: 0010:xhci_suspend+0x3a/0x490
> [40061.976248] RSP: 0018:ffffa5dca0747c40 EFLAGS: 00010246
> [40061.976252] RAX: 00000000ffffffea RBX: ffff913dc3106000 RCX: 0000000000000000
> [40061.976254] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff913dc3106230
> [40061.976257] RBP: ffffa5dca0747c78 R08: 0000000000000000 R09: ffffa5dca0747db8
> [40061.976259] R10: 0000000000000000 R11: 00000000000001cc R12: ffff913dc3106000
> [40061.976262] R13: ffff913dc3106000 R14: ffff913dc3106230 R15: ffffffffb86efaa0
> [40061.976265] FS:  0000000000000000(0000) GS:ffff913e2e7c0000(0000) knlGS:0000000000000000
> [40061.976268] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [40061.976271] CR2: 0000000000000228 CR3: 000000031460a003 CR4: 00000000003606e0
> [40061.976274] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [40061.976276] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [40061.976278] Call Trace:
> [40061.976286]  ? pci_pm_runtime_resume+0xa0/0xa0
> [40061.976292]  xhci_pci_suspend+0x5a/0xd0
> [40061.976297]  suspend_common+0x85/0x160
> [40061.976302]  hcd_pci_runtime_suspend+0x1b/0x50
> [40061.976306]  pci_pm_runtime_suspend+0x64/0x180
> [40061.976311]  ? pci_pm_runtime_resume+0xa0/0xa0
> [40061.976315]  __rpm_callback+0xca/0x210
> [40061.976320]  ? __switch_to_asm+0x34/0x70
> [40061.976324]  ? __switch_to_asm+0x40/0x70
> [40061.976329]  rpm_callback+0x24/0x80
> [40061.976333]  ? pci_pm_runtime_resume+0xa0/0xa0
> [40061.976338]  rpm_suspend+0x137/0x640
> [40061.976343]  rpm_idle+0x58/0x2a0
> [40061.976348]  pm_runtime_work+0x92/0xa0
> [40061.976353]  process_one_work+0x1de/0x410
> [40061.976357]  worker_thread+0x32/0x410
> [40061.976362]  kthread+0x121/0x140
> [40061.976366]  ? process_one_work+0x410/0x410
> [40061.976371]  ? kthread_create_worker_on_cpu+0x70/0x70
> [40061.976376]  ? do_syscall_64+0x73/0x130
> [40061.976380]  ? SyS_exit_group+0x14/0x20
> [40061.976383]  ret_from_fork+0x35/0x40
> [40061.976387] Code: 41 54 53 48 83 ec 10 4c 8b 2f 41 8b 85 28 02 00 00 85 c0 0f 84 d1 01 00 00 83 f8 04 0f 85 51 04 00 00 48 8b 57 08 b8 ea ff ff ff <83> ba 28 02 00 00 04 0f 85 04 01 00 00 41 89 f6 49 89 fc e8 9e
> [40061.976443] RIP: xhci_suspend+0x3a/0x490 RSP: ffffa5dca0747c40
> [40061.976445] CR2: 0000000000000228
> [40061.976449] ---[ end trace 5f17622537522ef8 ]---
> 
> Nothing was attached to this controller when error happened.

Looks like xhci controller was runtime suspended before everything was set up.

Normally PCI core will enable runtime pm, forbid it, and increase usage counter (0 -> 1) before pci driver probe is called.
xhci_pci_probe() increases usage count again (1 -> 2) before creating two hcds.
the first hcd is created using  usb_hcd_pci_probe() which decreases usage (2 -> 1),
xhci_pci_probe() creates and adds second hcd manually, after which it decreases usage (1 -> 0)

I'm not sure how running a VM affects this, but in case usage counter isn't already increased
before xhci_pci_probe() is called then we might runtime suspend after the first hcd is added,
but before second one is set up.

To know what is going on could you try adding the below code, and see if you get any
new messages in dmesg.

  diff --git a/drivers/usb/host/xhci-pci.c b/drivers/usb/host/xhci-pci.c
index 1aad89b..b16c634 100644


Thanks
Mathias
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Nazar Mokrynskyi April 19, 2018, 6:29 p.m. UTC | #1
19.04.18 14:17, Mathias Nyman пише:
> On 18.04.2018 20:30, Nazar Mokrynskyi wrote:
>> Hi everyone, I'm running a VM using libvirt on kernel 4.15 (Ubuntu 18.04).
>>
>> ASMedia USB 3.1 Gen 2 controller is assigned to VM like this:
>>
>>      <hostdev mode='subsystem' type='pci' managed='yes'>
>>        <source>
>>          <address domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
>>        </source>
>>        <rom bar='off'/>
>>        <address type='pci' domain='0x0000' bus='0x07' slot='0x00' function='0x0'/>
>>      </hostdev>
>>
>> Occasionally VM start fails with following error and I'm unable to start a VM anymore until reboot:
>>
>> [40061.976095] BUG: unable to handle kernel NULL pointer dereference at 0000000000000228
>> [40061.976107] IP: xhci_suspend+0x3a/0x490
>> [40061.976109] PGD 0 P4D 0
>> [40061.976115] Oops: 0000 [#1] SMP PTI
>> [40061.976118] Modules linked in: cpuid vhost_net vhost tap vfio_pci vfio_virqfd vfio_iommu_type1 vfio xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack libcrc32c ipt_REJECT nf_reject_ipv4 xt_tcpudp bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables devlink iptable_filter binfmt_misc snd_hda_codec_hdmi intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass intel_cstate nls_iso8859_1 mxm_wmi snd_hda_intel snd_usb_audio snd_hda_codec snd_usbmidi_lib snd_hda_core snd_rawmidi snd_hwdep snd_seq_device joydev input_leds intel_rapl_perf snd_pcm cdc_acm snd_timer snd mei_me soundcore mei shpchp wmi nvidia_uvm(POE) acpi_pad mac_hid sch_fq_codel lp parport bfq ip_tables
>> [40061.976185]  x_tables autofs4 btrfs xor zstd_compress raid6_pq algif_skcipher af_alg dm_crypt hid_generic usbhid hid uas usb_storage crct10dif_pclmul crc32_pclmul nvidia_drm(POE) ghash_clmulni_intel nvidia_modeset(POE) pcbc i915 nvidia(POE) aesni_intel aes_x86_64 crypto_simd i2c_algo_bit glue_helper cryptd drm_kms_helper e1000e syscopyarea sysfillrect sysimgblt fb_sys_fops ptp nvme pps_core drm nvme_core ahci ipmi_devintf libahci ipmi_msghandler video
>> [40061.976230] CPU: 7 PID: 29745 Comm: kworker/7:0 Tainted: P           OE    4.15.0-17-generic #18-Ubuntu
>> [40061.976232] Hardware name: Micro-Star International Co., Ltd. MS-7B45/Z370 GAMING PRO CARBON (MS-7B45), BIOS A.51 03/29/2018
>> [40061.976239] Workqueue: pm pm_runtime_work
>> [40061.976245] RIP: 0010:xhci_suspend+0x3a/0x490
>> [40061.976248] RSP: 0018:ffffa5dca0747c40 EFLAGS: 00010246
>> [40061.976252] RAX: 00000000ffffffea RBX: ffff913dc3106000 RCX: 0000000000000000
>> [40061.976254] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff913dc3106230
>> [40061.976257] RBP: ffffa5dca0747c78 R08: 0000000000000000 R09: ffffa5dca0747db8
>> [40061.976259] R10: 0000000000000000 R11: 00000000000001cc R12: ffff913dc3106000
>> [40061.976262] R13: ffff913dc3106000 R14: ffff913dc3106230 R15: ffffffffb86efaa0
>> [40061.976265] FS:  0000000000000000(0000) GS:ffff913e2e7c0000(0000) knlGS:0000000000000000
>> [40061.976268] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [40061.976271] CR2: 0000000000000228 CR3: 000000031460a003 CR4: 00000000003606e0
>> [40061.976274] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> [40061.976276] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>> [40061.976278] Call Trace:
>> [40061.976286]  ? pci_pm_runtime_resume+0xa0/0xa0
>> [40061.976292]  xhci_pci_suspend+0x5a/0xd0
>> [40061.976297]  suspend_common+0x85/0x160
>> [40061.976302]  hcd_pci_runtime_suspend+0x1b/0x50
>> [40061.976306]  pci_pm_runtime_suspend+0x64/0x180
>> [40061.976311]  ? pci_pm_runtime_resume+0xa0/0xa0
>> [40061.976315]  __rpm_callback+0xca/0x210
>> [40061.976320]  ? __switch_to_asm+0x34/0x70
>> [40061.976324]  ? __switch_to_asm+0x40/0x70
>> [40061.976329]  rpm_callback+0x24/0x80
>> [40061.976333]  ? pci_pm_runtime_resume+0xa0/0xa0
>> [40061.976338]  rpm_suspend+0x137/0x640
>> [40061.976343]  rpm_idle+0x58/0x2a0
>> [40061.976348]  pm_runtime_work+0x92/0xa0
>> [40061.976353]  process_one_work+0x1de/0x410
>> [40061.976357]  worker_thread+0x32/0x410
>> [40061.976362]  kthread+0x121/0x140
>> [40061.976366]  ? process_one_work+0x410/0x410
>> [40061.976371]  ? kthread_create_worker_on_cpu+0x70/0x70
>> [40061.976376]  ? do_syscall_64+0x73/0x130
>> [40061.976380]  ? SyS_exit_group+0x14/0x20
>> [40061.976383]  ret_from_fork+0x35/0x40
>> [40061.976387] Code: 41 54 53 48 83 ec 10 4c 8b 2f 41 8b 85 28 02 00 00 85 c0 0f 84 d1 01 00 00 83 f8 04 0f 85 51 04 00 00 48 8b 57 08 b8 ea ff ff ff <83> ba 28 02 00 00 04 0f 85 04 01 00 00 41 89 f6 49 89 fc e8 9e
>> [40061.976443] RIP: xhci_suspend+0x3a/0x490 RSP: ffffa5dca0747c40
>> [40061.976445] CR2: 0000000000000228
>> [40061.976449] ---[ end trace 5f17622537522ef8 ]---
>>
>> Nothing was attached to this controller when error happened.
>
> Looks like xhci controller was runtime suspended before everything was set up.
>
> Normally PCI core will enable runtime pm, forbid it, and increase usage counter (0 -> 1) before pci driver probe is called.
> xhci_pci_probe() increases usage count again (1 -> 2) before creating two hcds.
> the first hcd is created using  usb_hcd_pci_probe() which decreases usage (2 -> 1),
> xhci_pci_probe() creates and adds second hcd manually, after which it decreases usage (1 -> 0)
>
> I'm not sure how running a VM affects this, but in case usage counter isn't already increased
> before xhci_pci_probe() is called then we might runtime suspend after the first hcd is added,
> but before second one is set up.
>
> To know what is going on could you try adding the below code, and see if you get any
> new messages in dmesg.
>
>  diff --git a/drivers/usb/host/xhci-pci.c b/drivers/usb/host/xhci-pci.c
> index 1aad89b..b16c634 100644
> --- a/drivers/usb/host/xhci-pci.c
> +++ b/drivers/usb/host/xhci-pci.c
> @@ -317,6 +317,7 @@ static int xhci_pci_probe(struct pci_dev *dev, const struct pci_device_id *id)
>  dealloc_usb2_hcd:
>         usb_hcd_pci_remove(dev);
>  put_runtime_pm:
> +       dev_info(&dev->dev, "SUSPEND: FAIL in probe, still decrease usage");
>         pm_runtime_put_noidle(&dev->dev);
>         return retval;
>  }
> diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c
> index da6dbe3..98bec53 100644
> --- a/drivers/usb/host/xhci.c
> +++ b/drivers/usb/host/xhci.c
> @@ -863,8 +863,15 @@ int xhci_suspend(struct xhci_hcd *xhci, bool do_wakeup)
>         struct usb_hcd          *hcd = xhci_to_hcd(xhci);
>         u32                     command;
>  
> -       if (!hcd->state)
> +       if (!hcd || !hcd->state) {
> +               xhci_err(xhci, "SUSPEND: FAIL No hcd or hcd->state\n");
>                 return 0;
> +       }
> +
> +       if (!xhci->shared_hcd || !xhci->shared_hcd->state) {
> +               xhci_err(xhci, "SUSPEND: FAIL No shared hcd or state\n");
> +               return 0;
> +       }
>  
>         if (hcd->state != HC_STATE_SUSPENDED ||
>                         xhci->shared_hcd->state != HC_STATE_SUSPENDED)
>
>
> Thanks
> Mathias

Thanks for quick response!

I'll recompile latest development kernel with mentioned patch and let you know when it happens again.

Sincerely, Nazar Mokrynskyi
github.com/nazar-pc

--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

--- a/drivers/usb/host/xhci-pci.c
+++ b/drivers/usb/host/xhci-pci.c
@@ -317,6 +317,7 @@  static int xhci_pci_probe(struct pci_dev *dev, const struct pci_device_id *id)
  dealloc_usb2_hcd:
         usb_hcd_pci_remove(dev);
  put_runtime_pm:
+       dev_info(&dev->dev, "SUSPEND: FAIL in probe, still decrease usage");
         pm_runtime_put_noidle(&dev->dev);
         return retval;
  }
diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c
index da6dbe3..98bec53 100644
--- a/drivers/usb/host/xhci.c
+++ b/drivers/usb/host/xhci.c
@@ -863,8 +863,15 @@  int xhci_suspend(struct xhci_hcd *xhci, bool do_wakeup)
         struct usb_hcd          *hcd = xhci_to_hcd(xhci);
         u32                     command;
  
-       if (!hcd->state)
+       if (!hcd || !hcd->state) {
+               xhci_err(xhci, "SUSPEND: FAIL No hcd or hcd->state\n");
                 return 0;
+       }
+
+       if (!xhci->shared_hcd || !xhci->shared_hcd->state) {
+               xhci_err(xhci, "SUSPEND: FAIL No shared hcd or state\n");
+               return 0;
+       }
  
         if (hcd->state != HC_STATE_SUSPENDED ||
                         xhci->shared_hcd->state != HC_STATE_SUSPENDED)