Message ID | 1595419068-4812-1-git-send-email-WeitaoWang-oc@zhaoxin.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | USB:Fix kernel NULL pointer when unbind UHCI form vfio-pci | expand |
On Wed, Jul 22, 2020 at 07:57:48PM +0800, WeitaoWangoc wrote: > This bug is found in Zhaoxin platform, but it's a commom code bug. > Fail sequence: > step1: Unbind UHCI controller from native driver; > step2: Bind UHCI controller to vfio-pci, which will put UHCI controller in one vfio > group's device list and set UHCI's dev->driver_data to struct vfio-pci(for UHCI) Hah, that works? How do you do that properly? What code does that? > step3: Unbind EHCI controller from native driver, will try to tell UHCI native driver > that "I'm removed by set companion_hcd->self.hs_companion to NULL. However, > companion_hcd get from UHCI's dev->driver_data that has modified by vfio-pci > already.So, the vfio-pci structure will be damaged! Because you are messing around with bind/unbind "by hand", which is never guaranteed to work properly. It's amazing any of that works at all... > step4: Bind EHCI controller to vfio-pci driver, which will put EHCI controller in the > same vfio group as UHCI controller; > ... ... > step5: Unbind UHCI controller from vfio-pci, which will delete UHCI from vfio group' > device list that has been damaged in step 3. So,delete operation can random > result into a NULL pointer dereference with the below stack dump. > step6: Bind UHCI controller to native driver; > step7: Unbind EHCI controller from vfio-pci, which will try to remove EHCI controller > from the vfio group; > step8: Bind EHCI controller to native driver; > > [ 929.114641] uhci_hcd 0000:00:10.0: remove, state 1 > [ 929.114652] usb usb1: USB disconnect, device number 1 > [ 929.114655] usb 1-1: USB disconnect, device number 2 > [ 929.270313] usb 1-2: USB disconnect, device number 3 > [ 929.318404] uhci_hcd 0000:00:10.0: USB bus 1 deregistered > [ 929.343029] uhci_hcd 0000:00:10.1: remove, state 4 > [ 929.343045] usb usb3: USB disconnect, device number 1 > [ 929.343685] uhci_hcd 0000:00:10.1: USB bus 3 deregistered > [ 929.369087] ehci-pci 0000:00:10.7: remove, state 4 > [ 929.369102] usb usb4: USB disconnect, device number 1 > [ 929.370325] ehci-pci 0000:00:10.7: USB bus 4 deregistered > [ 932.398494] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000 > [ 932.398496] PGD 42a67d067 P4D 42a67d067 PUD 42a65f067 PMD 0 > [ 932.398502] Oops: 0002 [#2] SMP NOPTI > [ 932.398505] CPU: 2 PID: 7824 Comm: vfio_unbind.sh Tainted: P D 4.19.65-2020051917-rainos #1 4.19 is a bit old, what about 5.7? > [ 932.398506] Hardware name: Shanghai Zhaoxin Semiconductor Co., Ltd. HX002EH/HX002EH, > BIOS HX002EH0_01_R480_R_200408 04/08/2020 > [ 932.398513] RIP: 0010:vfio_device_put+0x31/0xa0 [vfio] > [ 932.398515] Code: 89 e5 41 54 53 4c 8b 67 18 48 89 fb 49 8d 74 24 30 e8 e3 0e f3 de > 84 c0 74 67 48 8b 53 20 48 8b 43 28 48 8b 7b 18 48 89 42 08 <48> 89 10 > 48 b8 00 01 00 00 00 00 ad de 48 89 43 20 48 b8 00 02 00 > [ 932.398516] RSP: 0018:ffffbbfd04cffc18 EFLAGS: 00010202 > [ 932.398518] RAX: 0000000000000000 RBX: ffff92c7ea717880 RCX: 0000000000000000 > [ 932.398519] RDX: ffff92c7ea713620 RSI: ffff92c7ea713630 RDI: ffff92c7ea713600 > [ 932.398521] RBP: ffffbbfd04cffc28 R08: ffff92c7f02a8080 R09: ffff92c7efc03980 > [ 932.398522] R10: ffffbbfd04cff9a8 R11: 0000000000000000 R12: ffff92c7ea713600 > [ 932.398523] R13: ffff92c7ed8bb0a8 R14: ffff92c7ea717880 R15: 0000000000000000 > [ 932.398525] FS: 00007f3031500740(0000) GS:ffff92c7f0280000(0000) knlGS:0000000000000000 > [ 932.398526] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [ 932.398527] CR2: 0000000000000000 CR3: 0000000428626004 CR4: 0000000000160ee0 > [ 932.398528] Call Trace: > [ 932.398534] vfio_del_group_dev+0xe8/0x2a0 [vfio] > [ 932.398539] ? __blocking_notifier_call_chain+0x52/0x60 > [ 932.398542] ? do_wait_intr_irq+0x90/0x90 > [ 932.398546] ? iommu_bus_notifier+0x75/0x100 > [ 932.398551] vfio_pci_remove+0x20/0xa0 [vfio_pci] > [ 932.398554] pci_device_remove+0x3e/0xc0 > [ 932.398557] device_release_driver_internal+0x17a/0x240 > [ 932.398560] device_release_driver+0x12/0x20 > [ 932.398561] unbind_store+0xee/0x180 > [ 932.398564] drv_attr_store+0x27/0x40 > [ 932.398567] sysfs_kf_write+0x3c/0x50 > [ 932.398568] kernfs_fop_write+0x125/0x1a0 > [ 932.398572] __vfs_write+0x3a/0x190 > [ 932.398575] ? apparmor_file_permission+0x1a/0x20 > [ 932.398577] ? security_file_permission+0x3b/0xc0 > [ 932.398581] ? _cond_resched+0x1a/0x50 > [ 932.398582] vfs_write+0xb8/0x1b0 > [ 932.398584] ksys_write+0x5c/0xe0 > [ 932.398586] __x64_sys_write+0x1a/0x20 > [ 932.398589] do_syscall_64+0x5a/0x110 > [ 932.398592] entry_SYSCALL_64_after_hwframe+0x44/0xa9 > > Using virt-manager/qemu to boot guest os, we can see the same fail sequence! > > Fix this by check for UHCI driver loaded or not before modifiy UHCI's dev->driver_data, > which will happen in ehci native driver probe/remove. > > Signed-off-by: WeitaoWangoc <WeitaoWang-oc@zhaoxin.com> > --- > drivers/usb/core/hcd-pci.c | 5 +++++ > 1 file changed, 5 insertions(+) > > diff --git a/drivers/usb/core/hcd-pci.c b/drivers/usb/core/hcd-pci.c > index 1547aa6..484f2a0 100644 > --- a/drivers/usb/core/hcd-pci.c > +++ b/drivers/usb/core/hcd-pci.c > @@ -34,6 +34,7 @@ static DECLARE_RWSEM(companions_rwsem); > #define CL_OHCI PCI_CLASS_SERIAL_USB_OHCI > #define CL_EHCI PCI_CLASS_SERIAL_USB_EHCI > > +#define PCI_DEV_DRV_FLAG 2 Why is the USB core allowed to touch a private PCI structure field? I do not think this is allowed. And why is this a USB host controller-specific issue, shouldn't this type of thing be fixed in the PCI core? thanks, greg k-h
On Wed, Jul 22, 2020 at 02:44:14PM +0200, Greg KH wrote: > On Wed, Jul 22, 2020 at 07:57:48PM +0800, WeitaoWangoc wrote: > > This bug is found in Zhaoxin platform, but it's a commom code bug. > > Fail sequence: > > step1: Unbind UHCI controller from native driver; > > step2: Bind UHCI controller to vfio-pci, which will put UHCI controller in one vfio > > group's device list and set UHCI's dev->driver_data to struct vfio-pci(for UHCI) > > Hah, that works? How do you do that properly? What code does that? Yeah, that can't possibly work. The USB core expects that any host controller device (or at least, any PCI host controller device) has its driver_data set to point to a struct usb_hcd. It doesn't expect a host controller to be bound to anything other than a host controller driver. Things could easily go very wrong here. For example, suppose at this point the ehci-hcd driver just happens to bind to the EHCI controller. When this happens, the EHCI controller hardware takes over all the USB connections that were routed to the UHCI controller. How will vfio-pci deal with that? Pretty ungracefully, I imagine. The only way to make this work at all is to unbind both uhci-hcd and ehci-hcd first. Then after both are finished you can safely bind vfio-pci to the EHCI controller and the UHCI controllers (in that order). Alan Stern
On Wed, Jul 22, 2020 at 02:44:14PM +0200, Greg KH wrote: > On Wed, Jul 22, 2020 at 07:57:48PM +0800, WeitaoWangoc wrote: > > This bug is found in Zhaoxin platform, but it's a commom code bug. > > Fail sequence: > > step1: Unbind UHCI controller from native driver; > > step2: Bind UHCI controller to vfio-pci, which will put UHCI controller in one > vfio > > group's device list and set UHCI's dev->driver_data to struct > vfio-pci(for UHCI) > > Hah, that works? How do you do that properly? What code does that? Drivers/vfio/vfio.c The function vfio_group_create_device will set UHCI dev->driver_data to vfio-device struct. > > step3: Unbind EHCI controller from native driver, will try to tell UHCI native > driver > > that "I'm removed by set companion_hcd->self.hs_companion to > NULL. However, > > companion_hcd get from UHCI's dev->driver_data that has modified > by vfio-pci > > already.So, the vfio-pci structure will be damaged! > > Because you are messing around with bind/unbind "by hand", which is > never guaranteed to work properly. > > It's amazing any of that works at all... If adjust the sequence of UHCI/EHCI unbind form native driver, that can avoid Null Pointer dereference. Eg. Step3-->stet4-->step1-->step2. Our experiments prove that this sequence is indeed good as expected. However, some application software such as virt-manager/qemu assign /EHCI to guest OS has the same bind/unbind sequence as test “by hand”. > 4.19.65-2020051917-rainos #1 > > 4.19 is a bit old, what about 5.7? 5.7.7 has the same issue. > > +#define PCI_DEV_DRV_FLAG 2 > > Why is the USB core allowed to touch a private PCI structure field? > > I do not think this is allowed. > > And why is this a USB host controller-specific issue, shouldn't this > type of thing be fixed in the PCI core? When ehci hcd_driver bind or unbind to ehci, it will touch/modify UHCI usb_hcd that get from UHCI's dev->driver_data. However, UHCI maybe bind to a another driver(vfio-pci) and it's driver_data will be rewritten. what we should do is to prevent ehci_hcd to modify UHCI's dev->driver_data when UHCI has already bound to vfio-pci. Thanks weitaowang 保密声明: 本邮件含有保密或专有信息,仅供指定收件人使用。严禁对本邮件或其内容做任何未经授权的查阅、使用、复制或转发。 CONFIDENTIAL NOTE: This email contains confidential or legally privileged information and is for the sole use of its intended recipient. Any unauthorized review, use, copying or forwarding of this email or the content of this email is strictly prohibited.
On , Jul 22, 2020 at 02:44:14PM +0200, Alan wrote: > On Wed, Jul 22, 2020 at 02:44:14PM +0200, Greg KH wrote: > > On Wed, Jul 22, 2020 at 07:57:48PM +0800, WeitaoWangoc wrote: > > > This bug is found in Zhaoxin platform, but it's a commom code bug. > > > Fail sequence: > > > step1: Unbind UHCI controller from native driver; > > > step2: Bind UHCI controller to vfio-pci, which will put UHCI controller in one > vfio > > > group's device list and set UHCI's dev->driver_data to struct > vfio-pci(for UHCI) > > > > Hah, that works? How do you do that properly? What code does that? > > Yeah, that can't possibly work. The USB core expects that any host > controller device (or at least, any PCI host controller device) has its > driver_data set to point to a struct usb_hcd. It doesn't expect a host > controller to be bound to anything other than a host controller driver. > > Things could easily go very wrong here. For example, suppose at this > point the ehci-hcd driver just happens to bind to the EHCI controller. > When this happens, the EHCI controller hardware takes over all the USB > connections that were routed to the UHCI controller. How will vfio-pci > deal with that? Pretty ungracefully, I imagine. > > The only way to make this work at all is to unbind both uhci-hcd and > ehci-hcd first. Then after both are finished you can safely bind > vfio-pci to the EHCI controller and the UHCI controllers (in that > order). > I'm agree with you, unbind both uhci-hcd and ehci-hcd first then bind to vfio-pci is a more reasonable sequence. Our experiments prove that this sequence is indeed good as expected. However, I did not find a formal document to prescribe this order. Unfortunately, some application software such as virt-manager/qemu assign UHCI/EHCI to guest OS has the same bind/unbind sequence as test “by hand”. Do we need to consider compatibility with this application scenario? The following log is captured when starting then shutdown the virtual machine. /* starting virtual machine*/ [ 2785.250001] audit: type=1400 audit(1594375837.191:48): apparmor="STATUS" operation="profile_load" profile="unconfined" name="libvirt-fa674e73-67a2-4672-8524-e62dea8a3c6c" pid=2008 comm="apparmor_parser" [ 2785.467510] uhci_hcd 0000:00:10.0: remove, state 4 [ 2785.472426] usb usb1: USB disconnect, device number 1 /*bind 0000:00:10.0 to vfio-pci*/ [ 2785.478798] uhci_hcd 0000:00:10.0: USB bus 1 deregistered [ 2785.568741] uhci_hcd 0000:00:10.1: remove, state 1 [ 2785.573562] usb usb2: USB disconnect, device number 1 [ 2785.578793] usb 2-2: USB disconnect, device number 3 [ 2785.758016] uhci_hcd 0000:00:10.1: USB bus 2 deregistered /*bind 0000:00:10.1 to vfio-pci*/ [ 2786.037448] ehci-pci 0000:00:10.7: remove, state 4 [ 2786.042460] usb usb3: USB disconnect, device number 1 [ 2786.048700] ehci-pci 0000:00:10.7: USB bus 3 deregistered /*bind 0000:00:10.7 to vfio-pci*/ [ 2787.518041] audit: type=1400 audit(1594375839.459:49): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="libvirt-fa674e73-67a2-4672-8524-e62dea8a3c6c" pid=2034 comm="apparmor_parser" [ 2788.290706] audit: type=1400 audit(1594375840.231:50): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="libvirt-fa674e73-67a2-4672-8524-e62dea8a3c6c" pid=2037 comm="apparmor_parser" [ 2788.960070] audit: type=1400 audit(1594375840.899:51): apparmor="STATUS" operation="profile_replace" info="same as current profile, skipping" profile="unconfined" name="libvirt-fa674e73-67a2-4672-8524-e62dea8a3c6c" pid=2040 comm="apparmor_parser" [ 2788.968821] virbr0: port 2(vnet0) entered blocking state [ 2788.988159] virbr0: port 2(vnet0) entered disabled state [ 2788.993711] device vnet0 entered promiscuous mode [ 2788.999453] virbr0: port 2(vnet0) entered blocking state [ 2789.005053] virbr0: port 2(vnet0) entered listening state [ 2789.098717] systemd-journald[286]: Successfully sent stream file descriptor to service manager. [ 2789.564241] audit: type=1400 audit(1594375841.507:52): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="libvirt-fa674e73-67a2-4672-8524-e62dea8a3c6c" pid=2065 comm="apparmor_parser" [ 2791.028028] virbr0: port 2(vnet0) entered learning state [ 2793.047999] virbr0: port 2(vnet0) entered forwarding state [ 2793.053449] virbr0: topology change detected, propagating [ 2793.433604] vfio_cap_init: 0000:00:10.7 hiding cap 0xa /* shutdown virtual machine*/ [ 3838.772058] systemd-journald[286]: Successfully sent stream file descriptor to service manager. [ 3838.815819] systemd-journald[286]: Successfully sent stream file descriptor to service manager. [ 3838.871002] systemd-journald[286]: Successfully sent stream file descriptor to service manager. [ 3838.884606] systemd-journald[286]: Successfully sent stream file descriptor to service manager. [ 3838.894514] systemd-journald[286]: Successfully sent stream file descriptor to service manager. [ 3838.896791] rfkill: input handler enabled [ 3838.903896] systemd-journald[286]: Successfully sent stream file descriptor to service manager. [ 3838.907998] systemd[1]: Sent message type=signal sender=n/a destination=n/a path=/org/freedesktop/systemd1/unit/packagekit_2eservice interface=org.freedesktop.DBus.Properties member=PropertiesChanged cookie=952 reply_cookie=0 signature=sa{sv}as error-name=n/a error-message=n/a [ 3838.949500] systemd-journald[286]: Successfully sent stream file descriptor to service manager. [ 3839.002757] systemd-journald[286]: Successfully sent stream file descriptor to service manager. [ 3839.182053] systemd-journald[286]: Successfully sent stream file descriptor to service manager. [ 3839.191313] systemd-journald[286]: Successfully sent stream file descriptor to service manager. [ 3838.302725] libvirt-guests.sh[2161]: Running guests on default URI: generic [ 3838.306783] libvirt-guests.sh[2161]: Shutting down guests on default URI... [ 3838.415103] libvirt-guests.sh[2161]: Starting shutdown on guest: generic plymouth-poweroff.service [ OK ] Stopped Firmware update daemon. [ OK ] Stopped Session 1 of user wang. [ OK ] Removed slice User Slice of wang. Stopping Permit User Sessions... Stopping Login Service... [ OK ] Stopped Permit User Sessions. [ OK ] Unmounted Mount unit for core, revision 9289. [ OK ] Unmounted Mount unit for gnome-system-monitor, revision 148. [ OK ] Unmounted Mount unit for gnome-3-34-1804, revision 33. [ OK ] Unmounted Mount unit for gnome-logs, revision 81. [ OK ] Unmounted Mount unit for core18, revision 1754. [ OK ] Unmounted Mount unit for gnome-3-28-1804, revision 116. [ OK ] Unmounted Mount unit for gnome-characters, revision 550. [ OK ] Unmounted Mount unit for gnome-3-34-1804, revision 36. [ OK ] Unmounted Mount unit for core, revision 9436. [ OK ] Unmounted Mount unit for gnome-characters, revision 539. [ OK ] Unmounted Mount unit for gtk-common-themes, revision 1440. [ OK ] Unmounted Mount unit for gtk-common-themes, revision 1506. [ OK ] Unmounted Mount unit for core18, revision 1668. [ OK ] Unmounted Mount unit for gnome-calculator, revision 544. [ OK ] Unmounted Mount unit for gnome-calculator, revision 748. [ OK ] Unmounted Mount unit for gnome-logs, revision 100. [ OK ] Unmounted Mount unit for gnome-3-28-1804, revision 128. [ OK ] Stopped Login Service. [ OK ] Stopped target User and Group Name Lookups. [ 3839.471635] libvirt-guests.sh[2161]: Waiting for 1 guests to shut down, 120 seconds left [ 3841.824842] virbr0: port 2(vnet0) entered disabled state [ 3841.832949] device vnet0 left promiscuous mode [ 3841.837393] virbr0: port 2(vnet0) entered disabled state [[ 3843.167495] audit: type=1400 audit(1594376895.107:53): apparmor="STATUS" operation="profile_remove" profile="unconfined" name="libvirt-fa674e73-67a2-4672-8524-e62dea8a3c6c" pid=2301 comm="apparmor_parser" [ 3843.246397] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000 [ 3843.254157] PGD 0 P4D 0 [ 3843.256671] Oops: 0002 [#1] SMP NOPTI [ 3843.260301] CPU: 1 PID: 1812 Comm: libvirtd Not tainted 4.19.65 #10 [ 3843.266511] Hardware name: Shanghai Zhaoxin Semiconductor Co., Ltd. HX002EA/HX002EA, BIOS HX002EA0_03_R490_D_200707 07/07/2020 [ 3843.277804] RIP: 0010:vfio_device_put+0xa5/0x140 [vfio] [ 3843.283129] Code: 1c 4e ce 48 8b 73 28 48 8b 53 20 48 c7 c7 48 54 61 c0 e8 51 1c 4e ce 48 8b 53 20 48 8b 43 28 48 c7 c7 80 54 61 c0 48 89 42 08 <48> 89 10 48 b8 00 01 00 00 00 00 ad de 48 89 43 20 48 05 00 01 00 [ 3843.301726] RSP: 0018:ffff8fde2210bc28 EFLAGS: 00010282 [ 3843.306905] RAX: 0000000000000000 RBX: ffff8fde1a59ed00 RCX: 0000000000000000 [ 3843.313975] RDX: ffff8fdde27d6820 RSI: ffff8fde2fa96438 RDI: ffffffffc0615480 [ 3843.321118] RBP: ffff8fde2210bc48 R08: 0000000000000c90 R09: 3d7478656e2c303d [ 3843.328190] R10: ffff8fdde2284520 R11: 3032383664373265 R12: ffff8fdde27d6800 [ 3843.335314] R13: ffff8fde1a59ed20 R14: ffff8fdde27d6800 R15: 0000000000000000 [ 3843.342390] FS: 00007f91fe3d6700(0000) GS:ffff8fde2fa80000(0000) knlGS:0000000000000000 [ 3843.350457] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 3843.356221] CR2: 0000000000000000 CR3: 0000000425fec005 CR4: 0000000000160ee0 [ 3843.363312] Call Trace: [ 3843.365778] vfio_del_group_dev+0x105/0x2e0 [vfio] [ 3843.370537] ? do_wait_intr_irq+0x90/0x90 [ 3843.374550] ? vprintk_func+0x47/0xc0 [ 3843.378202] vfio_pci_remove+0x20/0xe0 [vfio_pci] [ 3843.382900] pci_device_remove+0x51/0xd0 [ 3843.386799] device_release_driver_internal+0x18d/0x250 [ 3843.392042] device_release_driver+0x12/0x20 [ 3843.396324] unbind_store+0xbd/0x190 [ 3843.399869] drv_attr_store+0x27/0x40 [ 3843.403541] sysfs_kf_write+0x3c/0x50 [ 3843.407251] kernfs_fop_write+0x125/0x1a0 [ 3843.411431] __vfs_write+0x3a/0x190 [ 3843.414902] ? apparmor_file_permission+0x1a/0x20 [ 3843.419648] ? security_file_permission+0x31/0xc0 [ 3843.424648] ? _cond_resched+0x19/0x40 [ 3843.428407] vfs_write+0xb1/0x1a0 [ 3843.431696] ksys_write+0x5c/0xe0 [ 3843.435048] __x64_sys_write+0x1a/0x20 [ 3843.439056] do_syscall_64+0x5a/0x120 [ 3843.442713] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 3843.447757] RIP: 0033:0x7f924e2902b7 [ 3843.451385] Code: 44 00 00 41 54 55 49 89 d4 53 48 89 f5 89 fb 48 83 ec 10 e8 5b fd ff ff 4c 89 e2 41 89 c0 48 89 ee 89 df b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 35 44 89 c7 48 89 44 24 08 e8 94 fd ff ff 48 [ 3843.470696] RSP: 002b:00007f91fe3d5810 EFLAGS: 00000293 ORIG_RAX: 0000000000000001 [ 3843.478508] RAX: ffffffffffffffda RBX: 0000000000000016 RCX: 00007f924e2902b7 [ 3843.485627] RDX: 000000000000000c RSI: 00007f921001c784 RDI: 0000000000000016 [ 3843.493001] RBP: 00007f921001c784 R08: 0000000000000000 R09: 000000000000000d [ 3843.500120] R10: 0000000000000000 R11: 0000000000000293 R12: 000000000000000c [ 3843.507413] R13: 0000000000000000 R14: 0000000000000016 R15: 00007f91e8112e20 [ 3843.514541] Modules linked in: vfio_pci vfio_virqfd vfio_iommu_type1 vfio xt_CHECKSUM iptable_mangle ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c ipt_REJECT nf_reject_ipv4 xt_tcpudp bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter bpfilter amdgpu chash gpu_sched nls_iso8859_1 snd_hda_codec_hdmi radeon snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_pcm via_cputemp hwmon_vid kvm_intel kvm snd_seq_midi ttm snd_seq_midi_event irqbypass snd_rawmidi crct10dif_pclmul crc32_pclmul drm_kms_helper ghash_clmulni_intel snd_seq pcbc aesni_intel aes_x86_64 crypto_simd drm snd_seq_device cryptd glue_helper joydev snd_timer input_leds serio_raw snd i2c_algo_bit fb_sys_fops syscopyarea video sysfillrect sysimgblt [ 3843.586463] soundcore mac_hid sch_fq_codel parport_pc ppdev lp parport ip_tables x_tables autofs4 hid_generic ahci psmouse r8169 usbhid hid libahci realtek [ 3843.600637] CR2: 0000000000000000 [ 3843.604108] ---[ end trace 65d72623b84bf7a3 ]--- [ 3843.608839] RIP: 0010:vfio_device_put+0xa5/0x140 [vfio] [ 3843.614209] Code: 1c 4e ce 48 8b 73 28 48 8b 53 20 48 c7 c7 48 54 61 c0 e8 51 1c 4e ce 48 8b 53 20 48 8b 43 28 48 c7 c7 80 54 61 c0 48 89 42 08 <48> 89 10 48 b8 00 01 00 00 00 00 ad de 48 89 43 20 48 05 00 01 00 [ 3843.633419] RSP: 0018:ffff8fde2210bc28 EFLAGS: 00010282 [ 3843.638816] RAX: 0000000000000000 RBX: ffff8fde1a59ed00 RCX: 0000000000000000 [ 3843.646133] RDX: ffff8fdde27d6820 RSI: ffff8fde2fa96438 RDI: ffffffffc0615480 [ 3843.653551] RBP: ffff8fde2210bc48 R08: 0000000000000c90 R09: 3d7478656e2c303d [ 3843.661304] R10: ffff8fdde2284520 R11: 3032383664373265 R12: ffff8fdde27d6800 [ 3843.668540] R13: ffff8fde1a59ed20 R14: ffff8fdde27d6800 R15: 0000000000000000 [ 3843.676029] FS: 00007f91fe3d6700(0000) GS:ffff8fde2fa80000(0000) knlGS:0000000000000000 [ 3843.684398] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 3843.690269] CR2: 0000000000000000 CR3: 0000000425fec005 CR4: 0000000000160ee0 Thanks weitaowang 保密声明: 本邮件含有保密或专有信息,仅供指定收件人使用。严禁对本邮件或其内容做任何未经授权的查阅、使用、复制或转发。 CONFIDENTIAL NOTE: This email contains confidential or legally privileged information and is for the sole use of its intended recipient. Any unauthorized review, use, copying or forwarding of this email or the content of this email is strictly prohibited.
On Wed, 22 Jul 2020 19:57:48 +0800 WeitaoWangoc <WeitaoWang-oc@zhaoxin.com> wrote: > This bug is found in Zhaoxin platform, but it's a commom code bug. > Fail sequence: > step1: Unbind UHCI controller from native driver; > step2: Bind UHCI controller to vfio-pci, which will put UHCI controller in one vfio > group's device list and set UHCI's dev->driver_data to struct vfio-pci(for UHCI) > step3: Unbind EHCI controller from native driver, will try to tell UHCI native driver > that "I'm removed by set companion_hcd->self.hs_companion to NULL. However, > companion_hcd get from UHCI's dev->driver_data that has modified by vfio-pci > already.So, the vfio-pci structure will be damaged! > step4: Bind EHCI controller to vfio-pci driver, which will put EHCI controller in the > same vfio group as UHCI controller; > ... ... > step5: Unbind UHCI controller from vfio-pci, which will delete UHCI from vfio group' > device list that has been damaged in step 3. So,delete operation can random > result into a NULL pointer dereference with the below stack dump. > step6: Bind UHCI controller to native driver; > step7: Unbind EHCI controller from vfio-pci, which will try to remove EHCI controller > from the vfio group; > step8: Bind EHCI controller to native driver; > > [ 929.114641] uhci_hcd 0000:00:10.0: remove, state 1 > [ 929.114652] usb usb1: USB disconnect, device number 1 > [ 929.114655] usb 1-1: USB disconnect, device number 2 > [ 929.270313] usb 1-2: USB disconnect, device number 3 > [ 929.318404] uhci_hcd 0000:00:10.0: USB bus 1 deregistered > [ 929.343029] uhci_hcd 0000:00:10.1: remove, state 4 > [ 929.343045] usb usb3: USB disconnect, device number 1 > [ 929.343685] uhci_hcd 0000:00:10.1: USB bus 3 deregistered > [ 929.369087] ehci-pci 0000:00:10.7: remove, state 4 > [ 929.369102] usb usb4: USB disconnect, device number 1 > [ 929.370325] ehci-pci 0000:00:10.7: USB bus 4 deregistered > [ 932.398494] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000 > [ 932.398496] PGD 42a67d067 P4D 42a67d067 PUD 42a65f067 PMD 0 > [ 932.398502] Oops: 0002 [#2] SMP NOPTI > [ 932.398505] CPU: 2 PID: 7824 Comm: vfio_unbind.sh Tainted: P D 4.19.65-2020051917-rainos #1 > [ 932.398506] Hardware name: Shanghai Zhaoxin Semiconductor Co., Ltd. HX002EH/HX002EH, > BIOS HX002EH0_01_R480_R_200408 04/08/2020 > [ 932.398513] RIP: 0010:vfio_device_put+0x31/0xa0 [vfio] > [ 932.398515] Code: 89 e5 41 54 53 4c 8b 67 18 48 89 fb 49 8d 74 24 30 e8 e3 0e f3 de > 84 c0 74 67 48 8b 53 20 48 8b 43 28 48 8b 7b 18 48 89 42 08 <48> 89 10 > 48 b8 00 01 00 00 00 00 ad de 48 89 43 20 48 b8 00 02 00 > [ 932.398516] RSP: 0018:ffffbbfd04cffc18 EFLAGS: 00010202 > [ 932.398518] RAX: 0000000000000000 RBX: ffff92c7ea717880 RCX: 0000000000000000 > [ 932.398519] RDX: ffff92c7ea713620 RSI: ffff92c7ea713630 RDI: ffff92c7ea713600 > [ 932.398521] RBP: ffffbbfd04cffc28 R08: ffff92c7f02a8080 R09: ffff92c7efc03980 > [ 932.398522] R10: ffffbbfd04cff9a8 R11: 0000000000000000 R12: ffff92c7ea713600 > [ 932.398523] R13: ffff92c7ed8bb0a8 R14: ffff92c7ea717880 R15: 0000000000000000 > [ 932.398525] FS: 00007f3031500740(0000) GS:ffff92c7f0280000(0000) knlGS:0000000000000000 > [ 932.398526] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [ 932.398527] CR2: 0000000000000000 CR3: 0000000428626004 CR4: 0000000000160ee0 > [ 932.398528] Call Trace: > [ 932.398534] vfio_del_group_dev+0xe8/0x2a0 [vfio] > [ 932.398539] ? __blocking_notifier_call_chain+0x52/0x60 > [ 932.398542] ? do_wait_intr_irq+0x90/0x90 > [ 932.398546] ? iommu_bus_notifier+0x75/0x100 > [ 932.398551] vfio_pci_remove+0x20/0xa0 [vfio_pci] > [ 932.398554] pci_device_remove+0x3e/0xc0 > [ 932.398557] device_release_driver_internal+0x17a/0x240 > [ 932.398560] device_release_driver+0x12/0x20 > [ 932.398561] unbind_store+0xee/0x180 > [ 932.398564] drv_attr_store+0x27/0x40 > [ 932.398567] sysfs_kf_write+0x3c/0x50 > [ 932.398568] kernfs_fop_write+0x125/0x1a0 > [ 932.398572] __vfs_write+0x3a/0x190 > [ 932.398575] ? apparmor_file_permission+0x1a/0x20 > [ 932.398577] ? security_file_permission+0x3b/0xc0 > [ 932.398581] ? _cond_resched+0x1a/0x50 > [ 932.398582] vfs_write+0xb8/0x1b0 > [ 932.398584] ksys_write+0x5c/0xe0 > [ 932.398586] __x64_sys_write+0x1a/0x20 > [ 932.398589] do_syscall_64+0x5a/0x110 > [ 932.398592] entry_SYSCALL_64_after_hwframe+0x44/0xa9 > > Using virt-manager/qemu to boot guest os, we can see the same fail sequence! > > Fix this by check for UHCI driver loaded or not before modifiy UHCI's dev->driver_data, > which will happen in ehci native driver probe/remove. > > Signed-off-by: WeitaoWangoc <WeitaoWang-oc@zhaoxin.com> > --- > drivers/usb/core/hcd-pci.c | 5 +++++ > 1 file changed, 5 insertions(+) > > diff --git a/drivers/usb/core/hcd-pci.c b/drivers/usb/core/hcd-pci.c > index 1547aa6..484f2a0 100644 > --- a/drivers/usb/core/hcd-pci.c > +++ b/drivers/usb/core/hcd-pci.c > @@ -34,6 +34,7 @@ static DECLARE_RWSEM(companions_rwsem); > #define CL_OHCI PCI_CLASS_SERIAL_USB_OHCI > #define CL_EHCI PCI_CLASS_SERIAL_USB_EHCI > > +#define PCI_DEV_DRV_FLAG 2 > static inline int is_ohci_or_uhci(struct pci_dev *pdev) > { > return pdev->class == CL_OHCI || pdev->class == CL_UHCI; > @@ -68,6 +69,8 @@ static void for_each_companion(struct pci_dev *pdev, struct usb_hcd *hcd, > if (companion->class != CL_UHCI && companion->class != CL_OHCI && > companion->class != CL_EHCI) > continue; > + if (!(companion->priv_flags & PCI_DEV_DRV_FLAG)) But pci_dev.priv_flags is private data for the driver that currently owns the device, which could be vfio-pci. This is really no different than assuming the structure at device.driver_data. If vfio-pci were to make legitimate use of pci_dev.priv_flags, this could simply blow up again. Should there instead be some sort of registration interface where hcd complaint drivers register their devices and only those registered devices can have their driver private data arbitrarily poked by another driver? Thanks, Alex > + continue; > > companion_hcd = pci_get_drvdata(companion); > if (!companion_hcd || !companion_hcd->self.root_hub) > @@ -254,6 +257,7 @@ int usb_hcd_pci_probe(struct pci_dev *dev, const struct pci_device_id *id, > } > > pci_set_master(dev); > + dev->priv_flags | = PCI_DEV_DRV_FLAG; > > /* Note: dev_set_drvdata must be called while holding the rwsem */ > if (dev->class == CL_EHCI) { > @@ -326,6 +330,7 @@ void usb_hcd_pci_remove(struct pci_dev *dev) > local_irq_disable(); > usb_hcd_irq(0, hcd); > local_irq_enable(); > + dev->priv_flags & = ~PCI_DEV_DRV_FLAG; > > /* Note: dev_set_drvdata must be called while holding the rwsem */ > if (dev->class == CL_EHCI) { > -- > 2.7.4 > > > > 保密声明: > 本邮件含有保密或专有信息,仅供指定收件人使用。严禁对本邮件或其内容做任何未经授权的查阅、使用、复制或转发。 > CONFIDENTIAL NOTE: > This email contains confidential or legally privileged information and is for the sole use of its intended recipient. Any unauthorized review, use, copying or forwarding of this email or the content of this email is strictly prohibited. >
On Thu, 23 Jul 2020 02:59:55 +0000 "Weitao Wang(BJ-RD)" <WeitaoWang@zhaoxin.com> wrote: > On , Jul 22, 2020 at 02:44:14PM +0200, Alan wrote: > > On Wed, Jul 22, 2020 at 02:44:14PM +0200, Greg KH wrote: > > > On Wed, Jul 22, 2020 at 07:57:48PM +0800, WeitaoWangoc wrote: > > > > This bug is found in Zhaoxin platform, but it's a commom code bug. > > > > Fail sequence: > > > > step1: Unbind UHCI controller from native driver; > > > > step2: Bind UHCI controller to vfio-pci, which will put UHCI controller in one > > vfio > > > > group's device list and set UHCI's dev->driver_data to struct > > vfio-pci(for UHCI) > > > > > > Hah, that works? How do you do that properly? What code does that? > > > > Yeah, that can't possibly work. The USB core expects that any host > > controller device (or at least, any PCI host controller device) has its > > driver_data set to point to a struct usb_hcd. It doesn't expect a host > > controller to be bound to anything other than a host controller driver. > > > > Things could easily go very wrong here. For example, suppose at this > > point the ehci-hcd driver just happens to bind to the EHCI controller. > > When this happens, the EHCI controller hardware takes over all the USB > > connections that were routed to the UHCI controller. How will vfio-pci > > deal with that? Pretty ungracefully, I imagine. The issue I believe we're seeing here is not with vfio-pci trying to do anything with the device, the IOMMU grouping would prevent a user from opening any device within the group while other devices within the group are bound to host drivers. So it should be fine if the EHCI device takes over the other ports, but it's not ok that ehci-hcd assumes the driver private data for any other UHCI/OHCI/EHCI device in the same slot is something that it's free to modify. It really seems there should be some sort of companion device registration/opt-in rather than modifying unvalidated data. > > The only way to make this work at all is to unbind both uhci-hcd and > > ehci-hcd first. Then after both are finished you can safely bind > > vfio-pci to the EHCI controller and the UHCI controllers (in that > > order). > > > I'm agree with you, unbind both uhci-hcd and ehci-hcd first then bind to > vfio-pci is a more reasonable sequence. Our experiments prove that this > sequence is indeed good as expected. > However, I did not find a formal document to prescribe this order. > Unfortunately, some application software such as virt-manager/qemu assign > UHCI/EHCI to guest OS has the same bind/unbind sequence as test “by hand”. > Do we need to consider compatibility with this application scenario? Unbinding all functions first, before binding any to vfio-pci should indeed work, thanks to the for_each_companion() function at least testing for null private data before going further. I'd still argue though that these hcd drivers are overstepping their authority by walking the PCI bus and assuming any device in the same slot, matching a set of class codes, is making use of a driver with a known data structure that they're allowed to modify. Even if we claim that the user needs to know what they're doing when they change driver binding, that's a pretty subtle interaction with no validation. Thanks, Alex
On Thu, Jul 23, 2020 at 02:59:55AM +0000, Weitao Wang(BJ-RD) wrote: > CONFIDENTIAL NOTE: > This email contains confidential or legally privileged information and is for the sole use of its intended recipient. Any unauthorized review, use, copying or forwarding of this email or the content of this email is strictly prohibited. This footer is not compatible with Linux mailing lists, sorry, I am not allowed to respond to it. greg k-h
On Thu,23 July 2020 04:18:00 +0000 Alex wrote: > On Wed, 22 Jul 2020 19:57:48 +0800 > WeitaoWangoc <WeitaoWang-oc@zhaoxin.com> wrote: > > > drivers/usb/core/hcd-pci.c | 5 +++++ > > 1 file changed, 5 insertions(+) > > > > diff --git a/drivers/usb/core/hcd-pci.c b/drivers/usb/core/hcd-pci.c > > index 1547aa6..484f2a0 100644 > > --- a/drivers/usb/core/hcd-pci.c > > +++ b/drivers/usb/core/hcd-pci.c > > @@ -34,6 +34,7 @@ static DECLARE_RWSEM(companions_rwsem); > > #define CL_OHCI PCI_CLASS_SERIAL_USB_OHCI > > #define CL_EHCI PCI_CLASS_SERIAL_USB_EHCI > > > > +#define PCI_DEV_DRV_FLAG 2 > > static inline int is_ohci_or_uhci(struct pci_dev *pdev) { > > return pdev->class == CL_OHCI || pdev->class == CL_UHCI; @@ > > -68,6 +69,8 @@ static void for_each_companion(struct pci_dev *pdev, struct > usb_hcd *hcd, > > if (companion->class != CL_UHCI && companion->class != > CL_OHCI && > > companion->class != CL_EHCI) > > continue; > > + if (!(companion->priv_flags & PCI_DEV_DRV_FLAG)) > > But pci_dev.priv_flags is private data for the driver that currently > owns the device, which could be vfio-pci. This is really no different > than assuming the structure at device.driver_data. If vfio-pci were to > make legitimate use of pci_dev.priv_flags, this could simply blow up > again. Should there instead be some sort of registration interface > where hcd complaint drivers register their devices and only those > registered devices can have their driver private data arbitrarily poked > by another driver? Thanks, Thanks for your explanation. Set pci_dev.priv_flags is really not a reasonable approach. Are there any more detailed suggestions to patch this issue? Thanks Weitaowang
On Thu, Jul 23, 2020 at 08:36:25AM +0000, WeitaoWang-oc wrote: > > On Thu,23 July 2020 04:18:00 +0000 Alex wrote: > > On Wed, 22 Jul 2020 19:57:48 +0800 > > WeitaoWangoc <WeitaoWang-oc@zhaoxin.com> wrote: > > > > > drivers/usb/core/hcd-pci.c | 5 +++++ > > > 1 file changed, 5 insertions(+) > > > > > > diff --git a/drivers/usb/core/hcd-pci.c b/drivers/usb/core/hcd-pci.c > > > index 1547aa6..484f2a0 100644 > > > --- a/drivers/usb/core/hcd-pci.c > > > +++ b/drivers/usb/core/hcd-pci.c > > > @@ -34,6 +34,7 @@ static DECLARE_RWSEM(companions_rwsem); > > > #define CL_OHCI PCI_CLASS_SERIAL_USB_OHCI > > > #define CL_EHCI PCI_CLASS_SERIAL_USB_EHCI > > > > > > +#define PCI_DEV_DRV_FLAG 2 > > > static inline int is_ohci_or_uhci(struct pci_dev *pdev) { > > > return pdev->class == CL_OHCI || pdev->class == CL_UHCI; @@ > > > -68,6 +69,8 @@ static void for_each_companion(struct pci_dev *pdev, struct > > usb_hcd *hcd, > > > if (companion->class != CL_UHCI && companion->class != > > CL_OHCI && > > > companion->class != CL_EHCI) > > > continue; > > > + if (!(companion->priv_flags & PCI_DEV_DRV_FLAG)) > > > > But pci_dev.priv_flags is private data for the driver that currently > > owns the device, which could be vfio-pci. This is really no different > > than assuming the structure at device.driver_data. If vfio-pci were to > > make legitimate use of pci_dev.priv_flags, this could simply blow up > > again. Should there instead be some sort of registration interface > > where hcd complaint drivers register their devices and only those > > registered devices can have their driver private data arbitrarily poked > > by another driver? Thanks, > > Thanks for your explanation. Set pci_dev.priv_flags is really not a > reasonable approach. Are there any more detailed suggestions > to patch this issue? This is not a kernel issue, it is a "do not do this in this way from userspace" issue :) thanks, greg k-h
On Wed, Jul 22, 2020 at 10:18:17PM -0600, Alex Williamson wrote: > On Thu, 23 Jul 2020 02:59:55 +0000 > "Weitao Wang(BJ-RD)" <WeitaoWang@zhaoxin.com> wrote: > > > On , Jul 22, 2020 at 02:44:14PM +0200, Alan wrote: > > > On Wed, Jul 22, 2020 at 02:44:14PM +0200, Greg KH wrote: > > > > On Wed, Jul 22, 2020 at 07:57:48PM +0800, WeitaoWangoc wrote: > > > > > This bug is found in Zhaoxin platform, but it's a commom code bug. > > > > > Fail sequence: > > > > > step1: Unbind UHCI controller from native driver; > > > > > step2: Bind UHCI controller to vfio-pci, which will put UHCI controller in one > > > vfio > > > > > group's device list and set UHCI's dev->driver_data to struct > > > vfio-pci(for UHCI) > > > > > > > > Hah, that works? How do you do that properly? What code does that? > > > > > > Yeah, that can't possibly work. The USB core expects that any host > > > controller device (or at least, any PCI host controller device) has its > > > driver_data set to point to a struct usb_hcd. It doesn't expect a host > > > controller to be bound to anything other than a host controller driver. > > > > > > Things could easily go very wrong here. For example, suppose at this > > > point the ehci-hcd driver just happens to bind to the EHCI controller. > > > When this happens, the EHCI controller hardware takes over all the USB > > > connections that were routed to the UHCI controller. How will vfio-pci > > > deal with that? Pretty ungracefully, I imagine. > > The issue I believe we're seeing here is not with vfio-pci trying to do > anything with the device, the IOMMU grouping would prevent a user from > opening any device within the group while other devices within the > group are bound to host drivers. You've lost me. (A) What is IOMMU grouping? (B) How does it prevent users from opening devices? (C) What do users have to do with the problem anyway (USB host controllers and drivers have to do things on their own even without user intervention)? > So it should be fine if the EHCI > device takes over the other ports, but it's not ok that ehci-hcd > assumes the driver private data for any other UHCI/OHCI/EHCI device in > the same slot is something that it's free to modify. It really seems > there should be some sort of companion device registration/opt-in > rather than modifying unvalidated data. Until now that hasn't been necessary, since nobody wanted to bind a different driver to these devices. > > > The only way to make this work at all is to unbind both uhci-hcd and > > > ehci-hcd first. Then after both are finished you can safely bind > > > vfio-pci to the EHCI controller and the UHCI controllers (in that > > > order). > > > > > I'm agree with you, unbind both uhci-hcd and ehci-hcd first then bind to > > vfio-pci is a more reasonable sequence. Our experiments prove that this > > sequence is indeed good as expected. > > However, I did not find a formal document to prescribe this order. > > Unfortunately, some application software such as virt-manager/qemu assign > > UHCI/EHCI to guest OS has the same bind/unbind sequence as test “by hand”. > > Do we need to consider compatibility with this application scenario? > > Unbinding all functions first, before binding any to vfio-pci should > indeed work, thanks to the for_each_companion() function at least > testing for null private data before going further. I'd still argue > though that these hcd drivers are overstepping their authority by > walking the PCI bus and assuming any device in the same slot, matching > a set of class codes, is making use of a driver with a known data > structure that they're allowed to modify. Until recently that has been a valid assumption. > Even if we claim that the > user needs to know what they're doing when they change driver binding, > that's a pretty subtle interaction with no validation. Thanks, It's worse than that. We're not just dealing with a software interaction issue -- the _hardware_ for these devices also interacts. That's the real reason why the driver for the device on one slot has to be aware of the driver for the device on a different slot. Adding a mechanism for software registration or validation won't fix the hardware issues. Relying on a protocol that requires all the devices to be unbound before any of them are bound to a new class of drivers, on the other hand, will fix the problem. Alan Stern
On Thu, 23 Jul 2020 11:38:21 -0400 Alan Stern <stern@rowland.harvard.edu> wrote: > On Wed, Jul 22, 2020 at 10:18:17PM -0600, Alex Williamson wrote: > > On Thu, 23 Jul 2020 02:59:55 +0000 > > "Weitao Wang(BJ-RD)" <WeitaoWang@zhaoxin.com> wrote: > > > > > On , Jul 22, 2020 at 02:44:14PM +0200, Alan wrote: > > > > On Wed, Jul 22, 2020 at 02:44:14PM +0200, Greg KH wrote: > > > > > On Wed, Jul 22, 2020 at 07:57:48PM +0800, WeitaoWangoc wrote: > > > > > > This bug is found in Zhaoxin platform, but it's a commom code bug. > > > > > > Fail sequence: > > > > > > step1: Unbind UHCI controller from native driver; > > > > > > step2: Bind UHCI controller to vfio-pci, which will put UHCI controller in one > > > > vfio > > > > > > group's device list and set UHCI's dev->driver_data to struct > > > > vfio-pci(for UHCI) > > > > > > > > > > Hah, that works? How do you do that properly? What code does that? > > > > > > > > Yeah, that can't possibly work. The USB core expects that any host > > > > controller device (or at least, any PCI host controller device) has its > > > > driver_data set to point to a struct usb_hcd. It doesn't expect a host > > > > controller to be bound to anything other than a host controller driver. > > > > > > > > Things could easily go very wrong here. For example, suppose at this > > > > point the ehci-hcd driver just happens to bind to the EHCI controller. > > > > When this happens, the EHCI controller hardware takes over all the USB > > > > connections that were routed to the UHCI controller. How will vfio-pci > > > > deal with that? Pretty ungracefully, I imagine. > > > > The issue I believe we're seeing here is not with vfio-pci trying to do > > anything with the device, the IOMMU grouping would prevent a user from > > opening any device within the group while other devices within the > > group are bound to host drivers. > > You've lost me. (A) What is IOMMU grouping? (B) How does it prevent > users from opening devices? (C) What do users have to do with the > problem anyway (USB host controllers and drivers have to do things on > their own even without user intervention)? The alternate driver in question here is vfio-pci, which allows IOMMU protected userspace driver access to a device. A primary use case of vfio is to assign PCI devices to a VM, where QEMU is the userspace driver. An IOMMU group is a set of one or more devices that are considered to be DMA isolated from other groups of devices. DMA isolation includes, for instance, the potential for peer-to-peer between devices which cannot be managed by the IOMMU. DMA between PCIe functions within a multifunction slot are generally considered to be non-isolated from one another unless they implement PCIe Access Control Services (ACS) to indicate isolation. I've never seen a USB controller implement ACS, nor should they due to the interactions between functions, therefore all the USB functions within a slot will be grouped together. The vfio framework will not allow users to access groups for which some of the devices within the group are bound to active host drivers, therefore in this scenario where we have one USB function bound to a host driver and the other bound to vfio-pci, the user would not be able to access the device and the vfio-pci usage of the device is essentially nothing more than a stub driver until driver binding of the other devices within the group changes. IOW, vfio-pci is not trying to make use of the device with this split binding, the report here is only a fault seen in the process of moving all of the functions to vfio-pci, which would then make the devices accessible to the user. > > So it should be fine if the EHCI > > device takes over the other ports, but it's not ok that ehci-hcd > > assumes the driver private data for any other UHCI/OHCI/EHCI device in > > the same slot is something that it's free to modify. It really seems > > there should be some sort of companion device registration/opt-in > > rather than modifying unvalidated data. > > Until now that hasn't been necessary, since nobody wanted to bind a > different driver to these devices. > > > > > The only way to make this work at all is to unbind both uhci-hcd and > > > > ehci-hcd first. Then after both are finished you can safely bind > > > > vfio-pci to the EHCI controller and the UHCI controllers (in that > > > > order). > > > > > > > I'm agree with you, unbind both uhci-hcd and ehci-hcd first then bind to > > > vfio-pci is a more reasonable sequence. Our experiments prove that this > > > sequence is indeed good as expected. > > > However, I did not find a formal document to prescribe this order. > > > Unfortunately, some application software such as virt-manager/qemu assign > > > UHCI/EHCI to guest OS has the same bind/unbind sequence as test “by hand”. > > > Do we need to consider compatibility with this application scenario? > > > > Unbinding all functions first, before binding any to vfio-pci should > > indeed work, thanks to the for_each_companion() function at least > > testing for null private data before going further. I'd still argue > > though that these hcd drivers are overstepping their authority by > > walking the PCI bus and assuming any device in the same slot, matching > > a set of class codes, is making use of a driver with a known data > > structure that they're allowed to modify. > > Until recently that has been a valid assumption. > > > Even if we claim that the > > user needs to know what they're doing when they change driver binding, > > that's a pretty subtle interaction with no validation. Thanks, > > It's worse than that. We're not just dealing with a software > interaction issue -- the _hardware_ for these devices also interacts. > That's the real reason why the driver for the device on one slot has to > be aware of the driver for the device on a different slot. > > Adding a mechanism for software registration or validation won't fix the > hardware issues. Relying on a protocol that requires all the devices to > be unbound before any of them are bound to a new class of drivers, on > the other hand, will fix the problem. The IOMMU grouping restriction does solve the hardware issue, so long as one driver doesn't blindly assume the driver private data for another device and modify it. I do agree that your solution would work, requiring all devices are unbound before any can be bound, but it also seems difficult to manage. The issue is largely unique to USB AFAIK. On the other hand, drivers coordinating with each other to register their _private_ data as share-able within a set of drivers seems like a much more direct and explicit interaction between the drivers. Thanks, Alex
On Thu, Jul 23, 2020 at 10:17:35AM -0600, Alex Williamson wrote: > The IOMMU grouping restriction does solve the hardware issue, so long > as one driver doesn't blindly assume the driver private data for > another device and modify it. Correction: The IOMMU grouping restriction solves the hardware issue for vfio-pci. It won't necessarily help if some other driver comes along and wants to bind to this hardware. > I do agree that your solution would > work, requiring all devices are unbound before any can be bound, but it > also seems difficult to manage. The issue is largely unique to USB > AFAIK. On the other hand, drivers coordinating with each other to > register their _private_ data as share-able within a set of drivers > seems like a much more direct and explicit interaction between the > drivers. Thanks, Yes, that makes sense. But it would have to be implemented in the driver core, not in particular subsystems like USB or PCI. And it might be seen as overkill, given that only UHCI/OHCI/EHCI devices require this sort of sharing AFAIK. Also, when you think about it, what form would such coordination among drivers take? From your description, it sounds like the drivers would agree to avoid accessing each other's private data if the proper registration wasn't in place. On the other hand, a stronger and perhaps more robust approach would be to enforce the condition that non-cooperating drivers are never bound to devices in the same group at the same time. That's basically what I'm proposing here -- the question is whether the enforcement should be instituted in the kernel or should merely be part of a standard protocol followed by userspace drivers. Given that it's currently needed in only one place, it seems reasonable to leave this as a "gentlemen's agreement" in userspace for the time being instead of adding it to the kernel. Alan Stern
On Thu, 23 Jul 2020 12:38:21 -0400, Alan wrote: > On Thu, Jul 23, 2020 at 10:17:35AM -0600, Alex Williamson wrote: > > The IOMMU grouping restriction does solve the hardware issue, so long > > as one driver doesn't blindly assume the driver private data for > > another device and modify it. > > Correction: The IOMMU grouping restriction solves the hardware issue for > vfio-pci. It won't necessarily help if some other driver comes along > and wants to bind to this hardware. > > > I do agree that your solution would > > work, requiring all devices are unbound before any can be bound, but it > > also seems difficult to manage. The issue is largely unique to USB > > AFAIK. On the other hand, drivers coordinating with each other to > > register their _private_ data as share-able within a set of drivers > > seems like a much more direct and explicit interaction between the > > drivers. Thanks, > > Yes, that makes sense. But it would have to be implemented in the > driver core, not in particular subsystems like USB or PCI. And it might > be seen as overkill, given that only UHCI/OHCI/EHCI devices require this > sort of sharing AFAIK. > > Also, when you think about it, what form would such coordination among > drivers take? From your description, it sounds like the drivers would > agree to avoid accessing each other's private data if the proper > registration wasn't in place. > > On the other hand, a stronger and perhaps more robust approach would be > to enforce the condition that non-cooperating drivers are never bound to > devices in the same group at the same time. That's basically what I'm > proposing here -- the question is whether the enforcement should be > instituted in the kernel or should merely be part of a standard protocol > followed by userspace drivers. > > Given that it's currently needed in only one place, it seems reasonable > to leave this as a "gentlemen's agreement" in userspace for the time > being instead of adding it to the kernel. > Provided that EHCI and UHCI host controller declare not support P2P and ACS. So, we can assign EHCI and UHCI host controller to different IOMMU group separately. We assign EHCI host controller to host and assign UHCI host controller to VM. Then, ehci_hcd driver load/unload operation in host will cause the same issue as discussed Thanks Weitao
On Fri, 24 Jul 2020 12:57:49 +0000 WeitaoWang-oc <WeitaoWang-oc@zhaoxin.com> wrote: > On Thu, 23 Jul 2020 12:38:21 -0400, Alan wrote: > > On Thu, Jul 23, 2020 at 10:17:35AM -0600, Alex Williamson wrote: > > > The IOMMU grouping restriction does solve the hardware issue, so long > > > as one driver doesn't blindly assume the driver private data for > > > another device and modify it. > > > > Correction: The IOMMU grouping restriction solves the hardware issue for > > vfio-pci. It won't necessarily help if some other driver comes along > > and wants to bind to this hardware. > > > > > I do agree that your solution would > > > work, requiring all devices are unbound before any can be bound, but it > > > also seems difficult to manage. The issue is largely unique to USB > > > AFAIK. On the other hand, drivers coordinating with each other to > > > register their _private_ data as share-able within a set of drivers > > > seems like a much more direct and explicit interaction between the > > > drivers. Thanks, > > > > Yes, that makes sense. But it would have to be implemented in the > > driver core, not in particular subsystems like USB or PCI. And it might > > be seen as overkill, given that only UHCI/OHCI/EHCI devices require this > > sort of sharing AFAIK. > > > > Also, when you think about it, what form would such coordination among > > drivers take? From your description, it sounds like the drivers would > > agree to avoid accessing each other's private data if the proper > > registration wasn't in place. > > > > On the other hand, a stronger and perhaps more robust approach would be > > to enforce the condition that non-cooperating drivers are never bound to > > devices in the same group at the same time. That's basically what I'm > > proposing here -- the question is whether the enforcement should be > > instituted in the kernel or should merely be part of a standard protocol > > followed by userspace drivers. > > > > Given that it's currently needed in only one place, it seems reasonable > > to leave this as a "gentlemen's agreement" in userspace for the time > > being instead of adding it to the kernel. > > > > Provided that EHCI and UHCI host controller declare not support P2P and > ACS. So, we can assign EHCI and UHCI host controller to different IOMMU > group separately. We assign EHCI host controller to host and assign UHCI > host controller to VM. Then, ehci_hcd driver load/unload operation in host > will cause the same issue as discussed And you have an example of such a device? I expect these do not exist, nor should they. It seems like it would be an improper use of ACS. Thanks, Alex
On Fri, 24 Jul 2020 19:17:49 +0000 Alex wrote: > On Fri, 24 Jul 2020 12:57:49 +0000 > WeitaoWang-oc <WeitaoWang-oc@zhaoxin.com> wrote: > > > On Thu, 23 Jul 2020 12:38:21 -0400, Alan wrote: > > > On Thu, Jul 23, 2020 at 10:17:35AM -0600, Alex Williamson wrote: > > > > The IOMMU grouping restriction does solve the hardware issue, so long > > > > as one driver doesn't blindly assume the driver private data for > > > > another device and modify it. > > > > > > Correction: The IOMMU grouping restriction solves the hardware issue for > > > vfio-pci. It won't necessarily help if some other driver comes along > > > and wants to bind to this hardware. > > > > > > > I do agree that your solution would > > > > work, requiring all devices are unbound before any can be bound, but it > > > > also seems difficult to manage. The issue is largely unique to USB > > > > AFAIK. On the other hand, drivers coordinating with each other to > > > > register their _private_ data as share-able within a set of drivers > > > > seems like a much more direct and explicit interaction between the > > > > drivers. Thanks, > > > > > > Yes, that makes sense. But it would have to be implemented in the > > > driver core, not in particular subsystems like USB or PCI. And it might > > > be seen as overkill, given that only UHCI/OHCI/EHCI devices require this > > > sort of sharing AFAIK. > > > > > > Also, when you think about it, what form would such coordination among > > > drivers take? From your description, it sounds like the drivers would > > > agree to avoid accessing each other's private data if the proper > > > registration wasn't in place. > > > > > > On the other hand, a stronger and perhaps more robust approach would be > > > to enforce the condition that non-cooperating drivers are never bound to > > > devices in the same group at the same time. That's basically what I'm > > > proposing here -- the question is whether the enforcement should be > > > instituted in the kernel or should merely be part of a standard protocol > > > followed by userspace drivers. > > > > > > Given that it's currently needed in only one place, it seems reasonable > > > to leave this as a "gentlemen's agreement" in userspace for the time > > > being instead of adding it to the kernel. > > > > > > > Provided that EHCI and UHCI host controller declare not support P2P and > > ACS. So, we can assign EHCI and UHCI host controller to different IOMMU > > group separately. We assign EHCI host controller to host and assign UHCI > > host controller to VM. Then, ehci_hcd driver load/unload operation in host > > will cause the same issue as discussed > > And you have an example of such a device? I expect these do not exist, > nor should they. It seems like it would be an improper use of ACS. > Thanks, In kernel source code tree drivers/pci/quirks.c, There is a device list declared by pci_dev_acs_enabled. In which list, such as multi-function device without acs capability not allow peer-to-peer bewteen functions. Those device can be assign to to different IOMMU group separately. Thnaks weitao
> -----Original Mail----- > Sender : Alan Stern <stern@rowland.harvard.edu> > Time : 2020/7/24 0:39 > Receiver : Alex Williamson <alex.williamson@redhat.com> > CC : Weitao Wang(BJ-RD) <WeitaoWang@zhaoxin.com>; Greg KH > <gregkh@linuxfoundation.org>; WeitaoWang-oc > <WeitaoWang-oc@zhaoxin.com>; mathias.nyman@linux.intel.com; > ulf.hansson@linaro.org; vkoul@kernel.org; hslester96@gmail.com; > linux-usb@vger.kernel.org; linux-kernel@vger.kernel.org; > Carsten_Schmid@mentor.com; efremov@linux.com; Tony W. Wang(XA-RD) > <TonyWWang@zhaoxin.com>; Cobe Chen(BJ-RD) <CobeChen@zhaoxin.com>; > Tim Guo(BJ-RD) <TimGuo@zhaoxin.com>; wwt8723@163.com > Subject : Re: [PATCH] USB:Fix kernel NULL pointer when unbind UHCI form vfio-pci > > On Thu, Jul 23, 2020 at 10:17:35AM -0600, Alex Williamson wrote: > > The IOMMU grouping restriction does solve the hardware issue, so long > > as one driver doesn't blindly assume the driver private data for > > another device and modify it. > > Correction: The IOMMU grouping restriction solves the hardware issue for > vfio-pci. It won't necessarily help if some other driver comes along and wants > to bind to this hardware. > > > I do agree that your solution would > > work, requiring all devices are unbound before any can be bound, but > > it also seems difficult to manage. The issue is largely unique to USB > > AFAIK. On the other hand, drivers coordinating with each other to > > register their _private_ data as share-able within a set of drivers > > seems like a much more direct and explicit interaction between the > > drivers. Thanks, > > Yes, that makes sense. But it would have to be implemented in the driver core, > not in particular subsystems like USB or PCI. And it might be seen as overkill, > given that only UHCI/OHCI/EHCI devices require this sort of sharing AFAIK. > > Also, when you think about it, what form would such coordination among drivers > take? From your description, it sounds like the drivers would agree to avoid > accessing each other's private data if the proper registration wasn't in place. > > On the other hand, a stronger and perhaps more robust approach would be to > enforce the condition that non-cooperating drivers are never bound to devices > in the same group at the same time. That's basically what I'm proposing here > -- the question is whether the enforcement should be instituted in the kernel or > should merely be part of a standard protocol followed by userspace drivers. > > Given that it's currently needed in only one place, it seems reasonable to leave > this as a "gentlemen's agreement" in userspace for the time being instead of > adding it to the kernel. But Qemu and Virt-manager don't comply with this "gentlemen's agreement" even if the EHCI and UHCI sit in the same IOMMU group. And the current behavior of Qemu and linux USB driver will cause a catastrophe. Maybe the hackers can use this BUG to attack the computer. We shouldn't believe the VMM always comply with the "gentlemen's agreement", I think. > > Alan Stern
diff --git a/drivers/usb/core/hcd-pci.c b/drivers/usb/core/hcd-pci.c index 1547aa6..484f2a0 100644 --- a/drivers/usb/core/hcd-pci.c +++ b/drivers/usb/core/hcd-pci.c @@ -34,6 +34,7 @@ static DECLARE_RWSEM(companions_rwsem); #define CL_OHCI PCI_CLASS_SERIAL_USB_OHCI #define CL_EHCI PCI_CLASS_SERIAL_USB_EHCI +#define PCI_DEV_DRV_FLAG 2 static inline int is_ohci_or_uhci(struct pci_dev *pdev) { return pdev->class == CL_OHCI || pdev->class == CL_UHCI; @@ -68,6 +69,8 @@ static void for_each_companion(struct pci_dev *pdev, struct usb_hcd *hcd, if (companion->class != CL_UHCI && companion->class != CL_OHCI && companion->class != CL_EHCI) continue; + if (!(companion->priv_flags & PCI_DEV_DRV_FLAG)) + continue; companion_hcd = pci_get_drvdata(companion); if (!companion_hcd || !companion_hcd->self.root_hub) @@ -254,6 +257,7 @@ int usb_hcd_pci_probe(struct pci_dev *dev, const struct pci_device_id *id, } pci_set_master(dev); + dev->priv_flags | = PCI_DEV_DRV_FLAG; /* Note: dev_set_drvdata must be called while holding the rwsem */ if (dev->class == CL_EHCI) { @@ -326,6 +330,7 @@ void usb_hcd_pci_remove(struct pci_dev *dev) local_irq_disable(); usb_hcd_irq(0, hcd); local_irq_enable(); + dev->priv_flags & = ~PCI_DEV_DRV_FLAG; /* Note: dev_set_drvdata must be called while holding the rwsem */ if (dev->class == CL_EHCI) {
This bug is found in Zhaoxin platform, but it's a commom code bug. Fail sequence: step1: Unbind UHCI controller from native driver; step2: Bind UHCI controller to vfio-pci, which will put UHCI controller in one vfio group's device list and set UHCI's dev->driver_data to struct vfio-pci(for UHCI) step3: Unbind EHCI controller from native driver, will try to tell UHCI native driver that "I'm removed by set companion_hcd->self.hs_companion to NULL. However, companion_hcd get from UHCI's dev->driver_data that has modified by vfio-pci already.So, the vfio-pci structure will be damaged! step4: Bind EHCI controller to vfio-pci driver, which will put EHCI controller in the same vfio group as UHCI controller; ... ... step5: Unbind UHCI controller from vfio-pci, which will delete UHCI from vfio group' device list that has been damaged in step 3. So,delete operation can random result into a NULL pointer dereference with the below stack dump. step6: Bind UHCI controller to native driver; step7: Unbind EHCI controller from vfio-pci, which will try to remove EHCI controller from the vfio group; step8: Bind EHCI controller to native driver; [ 929.114641] uhci_hcd 0000:00:10.0: remove, state 1 [ 929.114652] usb usb1: USB disconnect, device number 1 [ 929.114655] usb 1-1: USB disconnect, device number 2 [ 929.270313] usb 1-2: USB disconnect, device number 3 [ 929.318404] uhci_hcd 0000:00:10.0: USB bus 1 deregistered [ 929.343029] uhci_hcd 0000:00:10.1: remove, state 4 [ 929.343045] usb usb3: USB disconnect, device number 1 [ 929.343685] uhci_hcd 0000:00:10.1: USB bus 3 deregistered [ 929.369087] ehci-pci 0000:00:10.7: remove, state 4 [ 929.369102] usb usb4: USB disconnect, device number 1 [ 929.370325] ehci-pci 0000:00:10.7: USB bus 4 deregistered [ 932.398494] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000 [ 932.398496] PGD 42a67d067 P4D 42a67d067 PUD 42a65f067 PMD 0 [ 932.398502] Oops: 0002 [#2] SMP NOPTI [ 932.398505] CPU: 2 PID: 7824 Comm: vfio_unbind.sh Tainted: P D 4.19.65-2020051917-rainos #1 [ 932.398506] Hardware name: Shanghai Zhaoxin Semiconductor Co., Ltd. HX002EH/HX002EH, BIOS HX002EH0_01_R480_R_200408 04/08/2020 [ 932.398513] RIP: 0010:vfio_device_put+0x31/0xa0 [vfio] [ 932.398515] Code: 89 e5 41 54 53 4c 8b 67 18 48 89 fb 49 8d 74 24 30 e8 e3 0e f3 de 84 c0 74 67 48 8b 53 20 48 8b 43 28 48 8b 7b 18 48 89 42 08 <48> 89 10 48 b8 00 01 00 00 00 00 ad de 48 89 43 20 48 b8 00 02 00 [ 932.398516] RSP: 0018:ffffbbfd04cffc18 EFLAGS: 00010202 [ 932.398518] RAX: 0000000000000000 RBX: ffff92c7ea717880 RCX: 0000000000000000 [ 932.398519] RDX: ffff92c7ea713620 RSI: ffff92c7ea713630 RDI: ffff92c7ea713600 [ 932.398521] RBP: ffffbbfd04cffc28 R08: ffff92c7f02a8080 R09: ffff92c7efc03980 [ 932.398522] R10: ffffbbfd04cff9a8 R11: 0000000000000000 R12: ffff92c7ea713600 [ 932.398523] R13: ffff92c7ed8bb0a8 R14: ffff92c7ea717880 R15: 0000000000000000 [ 932.398525] FS: 00007f3031500740(0000) GS:ffff92c7f0280000(0000) knlGS:0000000000000000 [ 932.398526] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 932.398527] CR2: 0000000000000000 CR3: 0000000428626004 CR4: 0000000000160ee0 [ 932.398528] Call Trace: [ 932.398534] vfio_del_group_dev+0xe8/0x2a0 [vfio] [ 932.398539] ? __blocking_notifier_call_chain+0x52/0x60 [ 932.398542] ? do_wait_intr_irq+0x90/0x90 [ 932.398546] ? iommu_bus_notifier+0x75/0x100 [ 932.398551] vfio_pci_remove+0x20/0xa0 [vfio_pci] [ 932.398554] pci_device_remove+0x3e/0xc0 [ 932.398557] device_release_driver_internal+0x17a/0x240 [ 932.398560] device_release_driver+0x12/0x20 [ 932.398561] unbind_store+0xee/0x180 [ 932.398564] drv_attr_store+0x27/0x40 [ 932.398567] sysfs_kf_write+0x3c/0x50 [ 932.398568] kernfs_fop_write+0x125/0x1a0 [ 932.398572] __vfs_write+0x3a/0x190 [ 932.398575] ? apparmor_file_permission+0x1a/0x20 [ 932.398577] ? security_file_permission+0x3b/0xc0 [ 932.398581] ? _cond_resched+0x1a/0x50 [ 932.398582] vfs_write+0xb8/0x1b0 [ 932.398584] ksys_write+0x5c/0xe0 [ 932.398586] __x64_sys_write+0x1a/0x20 [ 932.398589] do_syscall_64+0x5a/0x110 [ 932.398592] entry_SYSCALL_64_after_hwframe+0x44/0xa9 Using virt-manager/qemu to boot guest os, we can see the same fail sequence! Fix this by check for UHCI driver loaded or not before modifiy UHCI's dev->driver_data, which will happen in ehci native driver probe/remove. Signed-off-by: WeitaoWangoc <WeitaoWang-oc@zhaoxin.com> --- drivers/usb/core/hcd-pci.c | 5 +++++ 1 file changed, 5 insertions(+) -- 2.7.4 保密声明: 本邮件含有保密或专有信息,仅供指定收件人使用。严禁对本邮件或其内容做任何未经授权的查阅、使用、复制或转发。 CONFIDENTIAL NOTE: This email contains confidential or legally privileged information and is for the sole use of its intended recipient. Any unauthorized review, use, copying or forwarding of this email or the content of this email is strictly prohibited.