
[iwl-net,v2] ice: Fix use after free during unload with ports in bridge

Message ID 20241009151835.5971-1-marcin.szycik@linux.intel.com (mailing list archive)
State Awaiting Upstream
Delegated to: Netdev Maintainers
Series [iwl-net,v2] ice: Fix use after free during unload with ports in bridge

Checks

Context                         Check    Description
netdev/series_format            success  Single patches do not need cover letters
netdev/tree_selection           success  Clearly marked for net
netdev/ynl                      success  Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present            success  Fixes tag present in non-next series
netdev/header_inline            success  No static functions without inline keyword in header files
netdev/build_32bit              success  Errors and warnings before: 6 this patch: 6
netdev/build_tools              success  No tools touched, skip
netdev/cc_maintainers           fail     3 blamed authors not CCed: anthony.l.nguyen@intel.com wojciech.drewek@intel.com piotr.raczynski@intel.com; 7 maintainers not CCed: anthony.l.nguyen@intel.com edumazet@google.com wojciech.drewek@intel.com piotr.raczynski@intel.com przemyslaw.kitszel@intel.com pabeni@redhat.com kuba@kernel.org
netdev/build_clang              success  Errors and warnings before: 6 this patch: 6
netdev/verify_signedoff         success  Signed-off-by tag matches author and committer
netdev/deprecated_api           success  None detected
netdev/check_selftest           success  No net selftest shell script
netdev/verify_fixes             success  Fixes tag looks correct
netdev/build_allmodconfig_warn  success  Errors and warnings before: 5 this patch: 5
netdev/checkpatch               success  total: 0 errors, 0 warnings, 0 checks, 15 lines checked
netdev/build_clang_rust         success  No Rust files in patch. Skipping build
netdev/kdoc                     success  Errors and warnings before: 8 this patch: 8
netdev/source_inline            success  Was 0 now: 0

Commit Message

Marcin Szycik Oct. 9, 2024, 3:18 p.m. UTC
Unloading the ice driver while switchdev port representors are added to
a bridge can lead to a kernel panic. Reproducer:

  modprobe ice

  devlink dev eswitch set $PF1_PCI mode switchdev

  ip link add $BR type bridge
  ip link set $BR up

  echo 2 > /sys/class/net/$PF1/device/sriov_numvfs
  sleep 2

  ip link set $PF1 master $BR
  ip link set $VF1_PR master $BR
  ip link set $VF2_PR master $BR
  ip link set $PF1 up
  ip link set $VF1_PR up
  ip link set $VF2_PR up
  ip link set $VF1 up

  rmmod irdma ice

When unloading the driver, ice_eswitch_detach() is eventually called as
part of VF freeing. First, it removes a port representor from xarray,
then unregister_netdev() is called (via repr->ops.rem()), and finally
the representor is deallocated. The problem comes from the bridge doing
its own deinit at the same time. unregister_netdev() triggers a notifier
chain, resulting in ice_eswitch_br_port_deinit() being called. It should
set repr->br_port = NULL, but this does not happen since repr has
already been removed from xarray and is not found. Regardless, it goes
on to deallocate br_port. At this point, repr is not yet freed and still
holds the stale br_port pointer, so an fdb event can happen in which
ice_eswitch_br_fdb_event_work() takes repr->br_port and tries to use it,
causing a panic (use after free).

Note that this only happens with 2 or more port representors added to
the bridge, since with only one representor port, the bridge deinit is
slightly different (ice_eswitch_br_port_deinit() is called via
ice_eswitch_br_ports_flush(), not ice_eswitch_br_port_unlink()).

Trace:
  Oops: general protection fault, probably for non-canonical address 0xf129010fd1a93284: 0000 [#1] PREEMPT SMP KASAN NOPTI
  KASAN: maybe wild-memory-access in range [0x8948287e8d499420-0x8948287e8d499427]
  (...)
  Workqueue: ice_bridge_wq ice_eswitch_br_fdb_event_work [ice]
  RIP: 0010:__rht_bucket_nested+0xb4/0x180
  (...)
  Call Trace:
   (...)
   ice_eswitch_br_fdb_find+0x3fa/0x550 [ice]
   ? __pfx_ice_eswitch_br_fdb_find+0x10/0x10 [ice]
   ice_eswitch_br_fdb_event_work+0x2de/0x1e60 [ice]
   ? __schedule+0xf60/0x5210
   ? mutex_lock+0x91/0xe0
   ? __pfx_ice_eswitch_br_fdb_event_work+0x10/0x10 [ice]
   ? ice_eswitch_br_update_work+0x1f4/0x310 [ice]
   (...)

A workaround is available: brctl setageing $BR 0, which stops the bridge
from adding fdb entries altogether.

Change the order of operations in ice_eswitch_detach(): move the call to
unregister_netdev() before removing repr from xarray. This way
repr->br_port will be correctly set to NULL in
ice_eswitch_br_port_deinit(), preventing a panic.

Fixes: fff292b47ac1 ("ice: add VF representors one by one")
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com>
---
v2: Added trace excerpt
---
 drivers/net/ethernet/intel/ice/ice_eswitch.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Comments

Buvaneswaran, Sujai Oct. 22, 2024, 7:57 a.m. UTC | #1
> -----Original Message-----
> From: Intel-wired-lan <intel-wired-lan-bounces@osuosl.org> On Behalf Of
> Marcin Szycik
> Sent: Wednesday, October 9, 2024 8:49 PM
> To: intel-wired-lan@lists.osuosl.org
> Cc: netdev@vger.kernel.org; Paul Menzel <pmenzel@molgen.mpg.de>;
> Marcin Szycik <marcin.szycik@linux.intel.com>; Michal Swiatkowski
> <michal.swiatkowski@linux.intel.com>
> Subject: [Intel-wired-lan] [PATCH iwl-net v2] ice: Fix use after free during
> unload with ports in bridge
> 
Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com>

Patch

diff --git a/drivers/net/ethernet/intel/ice/ice_eswitch.c b/drivers/net/ethernet/intel/ice/ice_eswitch.c
index c0b3e70a7ea3..fb527434b58b 100644
--- a/drivers/net/ethernet/intel/ice/ice_eswitch.c
+++ b/drivers/net/ethernet/intel/ice/ice_eswitch.c
@@ -552,13 +552,14 @@  int ice_eswitch_attach_sf(struct ice_pf *pf, struct ice_dynamic_port *sf)
 static void ice_eswitch_detach(struct ice_pf *pf, struct ice_repr *repr)
 {
 	ice_eswitch_stop_reprs(pf);
+	repr->ops.rem(repr);
+
 	xa_erase(&pf->eswitch.reprs, repr->id);
 
 	if (xa_empty(&pf->eswitch.reprs))
 		ice_eswitch_disable_switchdev(pf);
 
 	ice_eswitch_release_repr(pf, repr);
-	repr->ops.rem(repr);
 	ice_repr_destroy(repr);
 
 	if (xa_empty(&pf->eswitch.reprs)) {