Message ID | 20250307003956.22018-1-emil.s.tantilov@intel.com (mailing list archive) |
---|---|
State | Awaiting Upstream |
Delegated to: | Netdev Maintainers |
Series | [iwl-net] idpf: fix adapter NULL pointer dereference on reboot |
On Thu, Mar 06, 2025 at 04:39:56PM -0800, Emil Tantilov wrote:
> Driver calls idpf_remove() from idpf_shutdown(), which can end up
> calling idpf_remove() again when disabling SRIOV.
>

The same is done in other drivers (ice, iavf). Why is it a problem here?
I am asking because having one function to remove is pretty handy.
Maybe the problem can be fixed by some changes in idpf_remove() instead?

> echo 1 > /sys/class/net/<netif>/device/sriov_numvfs
> reboot
>
> BUG: kernel NULL pointer dereference, address: 0000000000000020
> ...
> RIP: 0010:idpf_remove+0x22/0x1f0 [idpf]
> ...
> ? idpf_remove+0x22/0x1f0 [idpf]
> ? idpf_remove+0x1e4/0x1f0 [idpf]
> pci_device_remove+0x3f/0xb0
> device_release_driver_internal+0x19f/0x200
> pci_stop_bus_device+0x6d/0x90
> pci_stop_and_remove_bus_device+0x12/0x20
> pci_iov_remove_virtfn+0xbe/0x120
> sriov_disable+0x34/0xe0
> idpf_sriov_configure+0x58/0x140 [idpf]
> idpf_remove+0x1b9/0x1f0 [idpf]
> idpf_shutdown+0x12/0x30 [idpf]
> pci_device_shutdown+0x35/0x60
> device_shutdown+0x156/0x200
> ...
>
> Replace the direct idpf_remove() call in idpf_shutdown() with
> idpf_vc_core_deinit() and idpf_deinit_dflt_mbx(), which perform
> the bulk of the cleanup, such as stopping the init task, freeing IRQs,
> destroying the vports and freeing the mailbox.
>
> Reported-by: Yuying Ma <yuma@redhat.com>
> Fixes: e850efed5e15 ("idpf: add module register and probe functionality")
> Reviewed-by: Madhu Chittim <madhu.chittim@intel.com>
> Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
> ---
>  drivers/net/ethernet/intel/idpf/idpf_main.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/intel/idpf/idpf_main.c b/drivers/net/ethernet/intel/idpf/idpf_main.c
> index b6c515d14cbf..bec4a02c5373 100644
> --- a/drivers/net/ethernet/intel/idpf/idpf_main.c
> +++ b/drivers/net/ethernet/intel/idpf/idpf_main.c
> @@ -87,7 +87,11 @@ static void idpf_remove(struct pci_dev *pdev)
>   */
>  static void idpf_shutdown(struct pci_dev *pdev)
>  {
> -	idpf_remove(pdev);
> +	struct idpf_adapter *adapter = pci_get_drvdata(pdev);
> +
> +	cancel_delayed_work_sync(&adapter->vc_event_task);
> +	idpf_vc_core_deinit(adapter);
> +	idpf_deinit_dflt_mbx(adapter);
>
>  	if (system_state == SYSTEM_POWER_OFF)
>  		pci_set_power_state(pdev, PCI_D3hot);
> --
> 2.17.2
On Thu, Mar 06, 2025 at 04:39:56PM -0800, Emil Tantilov wrote:
> Driver calls idpf_remove() from idpf_shutdown(), which can end up
> calling idpf_remove() again when disabling SRIOV.
>
> echo 1 > /sys/class/net/<netif>/device/sriov_numvfs
> reboot
>
> BUG: kernel NULL pointer dereference, address: 0000000000000020
> ...
> RIP: 0010:idpf_remove+0x22/0x1f0 [idpf]
> ...
> ? idpf_remove+0x22/0x1f0 [idpf]
> ? idpf_remove+0x1e4/0x1f0 [idpf]
> pci_device_remove+0x3f/0xb0
> device_release_driver_internal+0x19f/0x200
> pci_stop_bus_device+0x6d/0x90
> pci_stop_and_remove_bus_device+0x12/0x20
> pci_iov_remove_virtfn+0xbe/0x120
> sriov_disable+0x34/0xe0
> idpf_sriov_configure+0x58/0x140 [idpf]
> idpf_remove+0x1b9/0x1f0 [idpf]
> idpf_shutdown+0x12/0x30 [idpf]
> pci_device_shutdown+0x35/0x60
> device_shutdown+0x156/0x200
> ...
>
> Replace the direct idpf_remove() call in idpf_shutdown() with
> idpf_vc_core_deinit() and idpf_deinit_dflt_mbx(), which perform
> the bulk of the cleanup, such as stopping the init task, freeing IRQs,
> destroying the vports and freeing the mailbox.

Hi Emil,

I think it would be worth adding some commentary on the rest of
the clean-up performed by idpf_remove() and why it is correct
to no longer do so directly from a call to idpf_remove() from
idpf_shutdown() (IOW, it isn't clear to me :).

...
On 3/9/2025 11:22 PM, Simon Horman wrote:
> On Thu, Mar 06, 2025 at 04:39:56PM -0800, Emil Tantilov wrote:
>> Driver calls idpf_remove() from idpf_shutdown(), which can end up
>> calling idpf_remove() again when disabling SRIOV.
>>
>> echo 1 > /sys/class/net/<netif>/device/sriov_numvfs
>> reboot
>>
>> BUG: kernel NULL pointer dereference, address: 0000000000000020
>> ...
>> RIP: 0010:idpf_remove+0x22/0x1f0 [idpf]
>> ...
>> ? idpf_remove+0x22/0x1f0 [idpf]
>> ? idpf_remove+0x1e4/0x1f0 [idpf]
>> pci_device_remove+0x3f/0xb0
>> device_release_driver_internal+0x19f/0x200
>> pci_stop_bus_device+0x6d/0x90
>> pci_stop_and_remove_bus_device+0x12/0x20
>> pci_iov_remove_virtfn+0xbe/0x120
>> sriov_disable+0x34/0xe0
>> idpf_sriov_configure+0x58/0x140 [idpf]
>> idpf_remove+0x1b9/0x1f0 [idpf]
>> idpf_shutdown+0x12/0x30 [idpf]
>> pci_device_shutdown+0x35/0x60
>> device_shutdown+0x156/0x200
>> ...
>>
>> Replace the direct idpf_remove() call in idpf_shutdown() with
>> idpf_vc_core_deinit() and idpf_deinit_dflt_mbx(), which perform
>> the bulk of the cleanup, such as stopping the init task, freeing IRQs,
>> destroying the vports and freeing the mailbox.
>
> Hi Emil,
>
> I think it would be worth adding some commentary on the rest of
> the clean-up performed by idpf_remove() and why it is correct

The main reason behind the change is to avoid calling sriov_disable(),
which ends up calling idpf_remove() again via pci_device_remove(). In
that situation idpf_remove() will crash, as it attempts to access the
adapter pointer, which was already freed.

> to no longer do so directly from a call to idpf_remove() from
> idpf_shutdown() (IOW, it isn't clear to me :).

I assume you are asking what portion of idpf_remove() will no longer be
present in idpf_shutdown() as a result? Aside from not calling
sriov_disable(), there is a small cleanup of stale netdevs and the
destruction of WQs, neither of which seemed to be needed on shutdown.

Then again, I was not able to find documentation on what steps are
required for shutdown, so I mostly checked how other drivers handle it
(where there is no 1:1 overlap between shutdown and remove) and applied
similar steps to idpf. Ideally I do not wish to do more than is needed
for that flow.

>
> ...
On 3/6/2025 9:58 PM, Michal Swiatkowski wrote:
> On Thu, Mar 06, 2025 at 04:39:56PM -0800, Emil Tantilov wrote:
>> Driver calls idpf_remove() from idpf_shutdown(), which can end up
>> calling idpf_remove() again when disabling SRIOV.
>>
>
> The same is done in other drivers (ice, iavf). Why is it a problem here?
> I am asking because having one function to remove is pretty handy.
> Maybe the problem can be fixed by some changes in idpf_remove() instead?

It was indeed handy, until we ran into the crash. I did look into fixing
it in idpf_remove(), but I don't think I have a lot of options. I can
simply check and exit on adapter being NULL, but these types of checks
are usually frowned upon, so I looked into alternatives.

The main difference between idpf and ice is that idpf will load on both
VF and PF devices. From what I can tell, the VFs created by ice are
supported by iavf (0x1889 device id). With VFs created, on idpf, we end
up calling into idpf_remove() twice: first on shutdown, and then again
when idpf_remove() calls into sriov_disable(), because the VF devices
have the same driver, hence the same remove routine.

>
>> echo 1 > /sys/class/net/<netif>/device/sriov_numvfs
>> reboot
>>
>> BUG: kernel NULL pointer dereference, address: 0000000000000020
>> ...
>> RIP: 0010:idpf_remove+0x22/0x1f0 [idpf]
>> ...
>> ? idpf_remove+0x22/0x1f0 [idpf]
>> ? idpf_remove+0x1e4/0x1f0 [idpf]
>> pci_device_remove+0x3f/0xb0
>> device_release_driver_internal+0x19f/0x200
>> pci_stop_bus_device+0x6d/0x90
>> pci_stop_and_remove_bus_device+0x12/0x20
>> pci_iov_remove_virtfn+0xbe/0x120
>> sriov_disable+0x34/0xe0
>> idpf_sriov_configure+0x58/0x140 [idpf]
>> idpf_remove+0x1b9/0x1f0 [idpf]
>> idpf_shutdown+0x12/0x30 [idpf]
>> pci_device_shutdown+0x35/0x60
>> device_shutdown+0x156/0x200
>> ...
>>
>> Replace the direct idpf_remove() call in idpf_shutdown() with
>> idpf_vc_core_deinit() and idpf_deinit_dflt_mbx(), which perform
>> the bulk of the cleanup, such as stopping the init task, freeing IRQs,
>> destroying the vports and freeing the mailbox.
>>
>> Reported-by: Yuying Ma <yuma@redhat.com>
>> Fixes: e850efed5e15 ("idpf: add module register and probe functionality")
>> Reviewed-by: Madhu Chittim <madhu.chittim@intel.com>
>> Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
>> ---
>>  drivers/net/ethernet/intel/idpf/idpf_main.c | 6 +++++-
>>  1 file changed, 5 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/net/ethernet/intel/idpf/idpf_main.c b/drivers/net/ethernet/intel/idpf/idpf_main.c
>> index b6c515d14cbf..bec4a02c5373 100644
>> --- a/drivers/net/ethernet/intel/idpf/idpf_main.c
>> +++ b/drivers/net/ethernet/intel/idpf/idpf_main.c
>> @@ -87,7 +87,11 @@ static void idpf_remove(struct pci_dev *pdev)
>>   */
>>  static void idpf_shutdown(struct pci_dev *pdev)
>>  {
>> -	idpf_remove(pdev);
>> +	struct idpf_adapter *adapter = pci_get_drvdata(pdev);
>> +
>> +	cancel_delayed_work_sync(&adapter->vc_event_task);
>> +	idpf_vc_core_deinit(adapter);
>> +	idpf_deinit_dflt_mbx(adapter);
>>
>>  	if (system_state == SYSTEM_POWER_OFF)
>>  		pci_set_power_state(pdev, PCI_D3hot);
>> --
>> 2.17.2
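
For readers following the thread, the double call described above can be pictured with a small user-space model. This is only a sketch, not driver code: every name in it (model_dev, model_adapter, model_remove, model_shutdown_old, model_shutdown_new, simulate_shutdown) is hypothetical, the adapter being "freed" is modeled by clearing a pointer, and the partial cleanup done by the patched idpf_shutdown() is not modeled at all. It assumes the order given in the reply: the VF goes through shutdown first, then the PF's remove reaches the VF again through SR-IOV disable.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct idpf_adapter and struct pci_dev. */
struct model_adapter { int unused; };

struct model_dev {
	struct model_adapter *drvdata;	/* models pci_get_drvdata(pdev) */
	struct model_dev *vf;		/* non-NULL only on the PF */
};

/* Number of times remove was entered with no adapter left. */
static int null_adapter_hits;

/* Models a remove routine shared by PF and VF (same driver for both). */
static void model_remove(struct model_dev *dev)
{
	assert(dev != NULL);

	if (!dev->drvdata) {
		/* The real routine would dereference the adapter here. */
		null_adapter_hits++;
		return;
	}

	/* Models SR-IOV disable calling remove on the VF device. */
	if (dev->vf)
		model_remove(dev->vf);

	/* Models the adapter being released. */
	dev->drvdata = NULL;
}

/* Old behaviour: shutdown is a full remove. */
static void model_shutdown_old(struct model_dev *dev)
{
	model_remove(dev);
}

/* Patched behaviour: no SR-IOV disable, adapter left in place. */
static void model_shutdown_new(struct model_dev *dev)
{
	(void)dev;
}

/* Returns how many times remove ran without an adapter. */
int simulate_shutdown(int patched)
{
	struct model_adapter vf_adapter = { 0 }, pf_adapter = { 0 };
	struct model_dev vf = { &vf_adapter, NULL };
	struct model_dev pf = { &pf_adapter, &vf };

	null_adapter_hits = 0;

	/* Assumed order: VF is shut down first, then the PF. */
	if (patched) {
		model_shutdown_new(&vf);
		model_shutdown_new(&pf);
	} else {
		model_shutdown_old(&vf);
		model_shutdown_old(&pf);
	}

	return null_adapter_hits;
}
```

In this model simulate_shutdown(0) reports one entry into remove with no adapter, which corresponds to the second idpf_remove() in the trace, while simulate_shutdown(1) reports none because shutdown never reaches the SR-IOV disable step. The early return in model_remove() is the "check and exit on adapter being NULL" alternative mentioned in the reply; it is in the model only so the program can count the event instead of faulting.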
diff --git a/drivers/net/ethernet/intel/idpf/idpf_main.c b/drivers/net/ethernet/intel/idpf/idpf_main.c
index b6c515d14cbf..bec4a02c5373 100644
--- a/drivers/net/ethernet/intel/idpf/idpf_main.c
+++ b/drivers/net/ethernet/intel/idpf/idpf_main.c
@@ -87,7 +87,11 @@ static void idpf_remove(struct pci_dev *pdev)
  */
 static void idpf_shutdown(struct pci_dev *pdev)
 {
-	idpf_remove(pdev);
+	struct idpf_adapter *adapter = pci_get_drvdata(pdev);
+
+	cancel_delayed_work_sync(&adapter->vc_event_task);
+	idpf_vc_core_deinit(adapter);
+	idpf_deinit_dflt_mbx(adapter);
 
 	if (system_state == SYSTEM_POWER_OFF)
 		pci_set_power_state(pdev, PCI_D3hot);