
PCI: hv: Fix panic by calling hv_pci_remove_slots() earlier

Message ID PU1P153MB0169DBCFEE7257F5BB93580ABFD90@PU1P153MB0169.APCP153.PROD.OUTLOOK.COM
State Superseded, archived
Series PCI: hv: Fix panic by calling hv_pci_remove_slots() earlier

Commit Message

Dexuan Cui Aug. 2, 2019, 1:32 a.m. UTC
When a slot is removed, the pci_dev must still exist.

pci_remove_root_bus() removes and frees all the pci_devs, so
hv_pci_remove_slots() must be called before pci_remove_root_bus();
otherwise a general protection fault can happen if the kernel is built
with the memory debugging options.

Fixes: 15becc2b56c6 ("PCI: hv: Add hv_pci_remove_slots() when we unload the driver")
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Cc: stable@vger.kernel.org

---

When pci-hyperv is unloaded, this panic can happen:

 general protection fault:
 CPU: 2 PID: 1091 Comm: rmmod Not tainted 5.2.0+
 RIP: 0010:pci_slot_release+0x30/0xd0
 Call Trace:
  kobject_release+0x65/0x190
  pci_destroy_slot+0x25/0x60
  hv_pci_remove+0xec/0x110 [pci_hyperv]
  vmbus_remove+0x20/0x30 [hv_vmbus]
  device_release_driver_internal+0xd5/0x1b0
  driver_detach+0x44/0x7c
  bus_remove_driver+0x75/0xc7
  vmbus_driver_unregister+0x50/0xbd [hv_vmbus]
  __x64_sys_delete_module+0x136/0x200
  do_syscall_64+0x5e/0x220

 drivers/pci/controller/pci-hyperv.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Bjorn Helgaas Aug. 2, 2019, 7:40 p.m. UTC | #1
Hi Dexuan,

The subject line only describes the mechanical code change, which is
obvious from the patch.  It would be better if we could say something
about *why* we need this.

On Fri, Aug 02, 2019 at 01:32:28AM +0000, Dexuan Cui wrote:
> 
> When a slot is removed, the pci_dev must still exist.
> 
> pci_remove_root_bus() removes and frees all the pci_devs, so
> hv_pci_remove_slots() must be called before pci_remove_root_bus(),
> otherwise a general protection fault can happen, if the kernel is built

"general protection fault" is an x86 term that doesn't really say what
the issue is.  I suspect this would be a "use-after-free" problem.

> with the memory debugging options.
> 
> Fixes: 15becc2b56c6 ("PCI: hv: Add hv_pci_remove_slots() when we unload the driver")
> Signed-off-by: Dexuan Cui <decui@microsoft.com>
> Cc: stable@vger.kernel.org
> 
> ---
> 
> When pci-hyperv is unloaded, this panic can happen:
> 
>  general protection fault:
>  CPU: 2 PID: 1091 Comm: rmmod Not tainted 5.2.0+
>  RIP: 0010:pci_slot_release+0x30/0xd0
>  Call Trace:
>   kobject_release+0x65/0x190
>   pci_destroy_slot+0x25/0x60
>   hv_pci_remove+0xec/0x110 [pci_hyperv]
>   vmbus_remove+0x20/0x30 [hv_vmbus]
>   device_release_driver_internal+0xd5/0x1b0
>   driver_detach+0x44/0x7c
>   bus_remove_driver+0x75/0xc7
>   vmbus_driver_unregister+0x50/0xbd [hv_vmbus]
>   __x64_sys_delete_module+0x136/0x200
>   do_syscall_64+0x5e/0x220
> 
>  drivers/pci/controller/pci-hyperv.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
> index 6b9cc6e60a..68c611d 100644
> --- a/drivers/pci/controller/pci-hyperv.c
> +++ b/drivers/pci/controller/pci-hyperv.c
> @@ -2757,8 +2757,8 @@ static int hv_pci_remove(struct hv_device *hdev)
>  		/* Remove the bus from PCI's point of view. */
>  		pci_lock_rescan_remove();
>  		pci_stop_root_bus(hbus->pci_bus);
> -		pci_remove_root_bus(hbus->pci_bus);
>  		hv_pci_remove_slots(hbus);
> +		pci_remove_root_bus(hbus->pci_bus);

I'm curious about why we need hv_pci_remove_slots() at all.  None of
the other callers of pci_stop_root_bus() and pci_remove_root_bus() do
anything similar to hv_pci_remove_slots().

Surely some of those callers also support slots, so there must be some
other path that calls pci_destroy_slot() in those cases.  Can we use a
similar strategy here?

>  		pci_unlock_rescan_remove();
>  		hbus->state = hv_pcibus_removed;
>  	}
> -- 
> 1.8.3.1
>
Dexuan Cui Aug. 2, 2019, 8:31 p.m. UTC | #2
> From: Bjorn Helgaas <helgaas@kernel.org>
> Sent: Friday, August 2, 2019 12:41 PM
> The subject line only describes the mechanical code change, which is
> obvious from the patch.  It would be better if we could say something
> about *why* we need this.

Hi Bjorn,
Sorry. I'll try to write a better changelog in v2. :-)
 
> On Fri, Aug 02, 2019 at 01:32:28AM +0000, Dexuan Cui wrote:
> >
> > When a slot is removed, the pci_dev must still exist.
> >
> > pci_remove_root_bus() removes and frees all the pci_devs, so
> > hv_pci_remove_slots() must be called before pci_remove_root_bus(),
> > otherwise a general protection fault can happen, if the kernel is built
> 
> "general protection fault" is an x86 term that doesn't really say what
> the issue is.  I suspect this would be a "use-after-free" problem.

Yes, it's use-after-free. I'll fix the wording.
 
> > --- a/drivers/pci/controller/pci-hyperv.c
> > +++ b/drivers/pci/controller/pci-hyperv.c
> > @@ -2757,8 +2757,8 @@ static int hv_pci_remove(struct hv_device *hdev)
> >  		/* Remove the bus from PCI's point of view. */
> >  		pci_lock_rescan_remove();
> >  		pci_stop_root_bus(hbus->pci_bus);
> > -		pci_remove_root_bus(hbus->pci_bus);
> >  		hv_pci_remove_slots(hbus);
> > +		pci_remove_root_bus(hbus->pci_bus);
> 
> I'm curious about why we need hv_pci_remove_slots() at all.  None of
> the other callers of pci_stop_root_bus() and pci_remove_root_bus() do
> anything similar to hv_pci_remove_slots().
> 
> Surely some of those callers also support slots, so there must be some
> other path that calls pci_destroy_slot() in those cases.  Can we use a
> similar strategy here?

Stephen Hemminger originally added the slot code to pci-hyperv.c in
a15f2c08c708 ("PCI: hv: support reporting serial number as slot information"),
so he may know this better. My understanding is that we cannot use a strategy
similar to the one used by the 2 other users of pci_create_slot():

drivers/pci/hotplug/pci_hotplug_core.c calls pci_create_slot().
It looks like drivers/pci/hotplug/ is quite different from pci-hyperv.c, because
pci-hyperv uses a simple *private* hot-plug protocol, which makes it impossible
to use the pci_hp_register() and pci_hp_destroy() -> pci_destroy_slot() APIs.

drivers/acpi/pci_slot.c calls pci_create_slot() and saves the created slots in
the static "slot_list" list in the same file. Again, since pci-hyperv uses a private
PCI-device-discovery protocol (which is based on VMBus rather than the emulated
ACPI and PCI), acpi_pci_slot_enumerate() cannot find the PCI devices that are
discovered by pci-hyperv, so we cannot use the standard register_slot() ->
pci_create_slot() path to create the slots, and hence acpi_pci_slot_remove() ->
pci_destroy_slot() cannot work for pci-hyperv.
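Because neither the hotplug core nor drivers/acpi/pci_slot.c knows about the slots pci-hyperv creates, the driver has to track and destroy them itself. The following is a minimal userspace sketch of that bookkeeping, loosely in the spirit of hv_pci_remove_slots(); struct child, create_slot(), destroy_slot(), remove_slots() and demo_remove_slots() are hypothetical names, and malloc()/free() stand in for pci_create_slot()/pci_destroy_slot(). It assumes, for illustration, that each child device record keeps a pointer to the slot created for it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a slot returned by pci_create_slot(). */
struct slot {
    int nr;
};

/* Hypothetical per-child record that remembers the slot the driver
 * created for that child (NULL if none). */
struct child {
    struct slot *slot;
    struct child *next;
};

static int slots_live;   /* slots created but not yet destroyed */

static struct slot *create_slot(int nr)
{
    struct slot *s = malloc(sizeof(*s));

    if (s) {
        s->nr = nr;
        slots_live++;
    }
    return s;
}

static void destroy_slot(struct slot *s)
{
    assert(slots_live > 0);  /* never destroy more slots than were created */
    free(s);
    slots_live--;
}

/* Driver-private teardown: destroy only the slots this driver created.
 * Per the commit message, this must run while the devices still exist,
 * i.e. before the root bus is removed. */
static void remove_slots(struct child *head)
{
    for (struct child *c = head; c != NULL; c = c->next) {
        if (c->slot == NULL)
            continue;
        destroy_slot(c->slot);
        c->slot = NULL;      /* makes a repeated call harmless */
    }
}

/* Builds three children (one without a slot), runs the teardown twice,
 * and returns how many slots are still live. */
int demo_remove_slots(void)
{
    struct child c2 = { .slot = NULL, .next = NULL };
    struct child c1 = { .slot = create_slot(1), .next = &c2 };
    struct child c0 = { .slot = create_slot(0), .next = &c1 };

    remove_slots(&c0);
    remove_slots(&c0);
    return slots_live;
}
```

Clearing each pointer after destroying the slot is a design choice of this sketch: it keeps a second teardown call from freeing the same slot twice.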

I think I can use this as the v2 changelog:

The slot must be removed before the pci_dev is removed, otherwise a panic
can happen due to use-after-free.

Thanks,
Dexuan
Bjorn Helgaas Aug. 2, 2019, 10:15 p.m. UTC | #3
On Fri, Aug 02, 2019 at 08:31:26PM +0000, Dexuan Cui wrote:
> > From: Bjorn Helgaas <helgaas@kernel.org>
> > Sent: Friday, August 2, 2019 12:41 PM
> > The subject line only describes the mechanical code change, which is
> > obvious from the patch.  It would be better if we could say something
> > about *why* we need this.
> 
> Hi Bjorn,
> Sorry. I'll try to write a better changelog in v2. :-)
>  
> > On Fri, Aug 02, 2019 at 01:32:28AM +0000, Dexuan Cui wrote:
> > >
> > > When a slot is removed, the pci_dev must still exist.
> > >
> > > pci_remove_root_bus() removes and frees all the pci_devs, so
> > > hv_pci_remove_slots() must be called before pci_remove_root_bus(),
> > > otherwise a general protection fault can happen, if the kernel is built
> > 
> > "general protection fault" is an x86 term that doesn't really say what
> > the issue is.  I suspect this would be a "use-after-free" problem.
> 
> Yes, it's use-after-free. I'll fix the wording.
>  
> > > --- a/drivers/pci/controller/pci-hyperv.c
> > > +++ b/drivers/pci/controller/pci-hyperv.c
> > > @@ -2757,8 +2757,8 @@ static int hv_pci_remove(struct hv_device *hdev)
> > >  		/* Remove the bus from PCI's point of view. */
> > >  		pci_lock_rescan_remove();
> > >  		pci_stop_root_bus(hbus->pci_bus);
> > > -		pci_remove_root_bus(hbus->pci_bus);
> > >  		hv_pci_remove_slots(hbus);
> > > +		pci_remove_root_bus(hbus->pci_bus);
> > 
> > I'm curious about why we need hv_pci_remove_slots() at all.  None of
> > the other callers of pci_stop_root_bus() and pci_remove_root_bus() do
> > anything similar to hv_pci_remove_slots().
> > 
> > Surely some of those callers also support slots, so there must be some
> > other path that calls pci_destroy_slot() in those cases.  Can we use a
> > similar strategy here?
> 
> Stephen Hemminger originally added the slot code to pci-hyperv.c in
> a15f2c08c708 ("PCI: hv: support reporting serial number as slot information"),
> so he may know this better. My understanding is that we cannot use a strategy
> similar to the one used by the 2 other users of pci_create_slot():
> 
> drivers/pci/hotplug/pci_hotplug_core.c calls pci_create_slot().
> It looks like drivers/pci/hotplug/ is quite different from pci-hyperv.c, because
> pci-hyperv uses a simple *private* hot-plug protocol, which makes it impossible
> to use the pci_hp_register() and pci_hp_destroy() -> pci_destroy_slot() APIs.
> 
> drivers/acpi/pci_slot.c calls pci_create_slot() and saves the created slots in
> the static "slot_list" list in the same file. Again, since pci-hyperv uses a private
> PCI-device-discovery protocol (which is based on VMBus rather than the emulated
> ACPI and PCI), acpi_pci_slot_enumerate() cannot find the PCI devices that are
> discovered by pci-hyperv, so we cannot use the standard register_slot() ->
> pci_create_slot() path to create the slots, and hence acpi_pci_slot_remove() ->
> pci_destroy_slot() cannot work for pci-hyperv.

Hmm, ok.  This still doesn't seem right to me, but I think the bottom
line will be that the current slot registration interfaces just don't
work quite right for all the cases we want them to.

Maybe it would be a good project for somebody to rethink them, but it
doesn't seem practical for *this* patch.  Thanks for looking into it
this far!

> I think I can use this as the v2 changelog:
> 
> The slot must be removed before the pci_dev is removed, otherwise a panic
> can happen due to use-after-free.

Sounds good.

Bjorn

Patch

diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
index 6b9cc6e60a..68c611d 100644
--- a/drivers/pci/controller/pci-hyperv.c
+++ b/drivers/pci/controller/pci-hyperv.c
@@ -2757,8 +2757,8 @@ static int hv_pci_remove(struct hv_device *hdev)
 		/* Remove the bus from PCI's point of view. */
 		pci_lock_rescan_remove();
 		pci_stop_root_bus(hbus->pci_bus);
-		pci_remove_root_bus(hbus->pci_bus);
 		hv_pci_remove_slots(hbus);
+		pci_remove_root_bus(hbus->pci_bus);
 		pci_unlock_rescan_remove();
 		hbus->state = hv_pcibus_removed;
 	}