Message ID | 20241009124912.9774-2-marcin.szycik@linux.intel.com (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [iwl-net] ice: Fix use after free during unload with ports in bridge | expand |
Dear Marcin, Thank you for the patch, and the reproducer and detailed commit message. Am 09.10.24 um 14:49 schrieb Marcin Szycik: > Unloading the ice driver while switchdev port representors are added to > a bridge can lead to kernel panic. Reproducer: > > modprobe ice > > devlink dev eswitch set $PF1_PCI mode switchdev > > ip link add $BR type bridge > ip link set $BR up > > echo 2 > /sys/class/net/$PF1/device/sriov_numvfs > sleep 2 > > ip link set $PF1 master $BR > ip link set $VF1_PR master $BR > ip link set $VF2_PR master $BR > ip link set $PF1 up > ip link set $VF1_PR up > ip link set $VF2_PR up > ip link set $VF1 up > > rmmod irdma ice For people hitting the issue, an excerpt from the panic would also be nice, so it can be found more easily. > When unloading the driver, ice_eswitch_detach() is eventually called as > part of VF freeing. First, it removes a port representor from xarray, > then unregister_netdev() is called (via repr->ops.rem()), finally > representor is deallocated. The problem comes from the bridge doing its > own deinit at the same time. unregister_netdev() triggers a notifier > chain, resulting in ice_eswitch_br_port_deinit() being called. It should > set repr->br_port = NULL, but this does not happen since repr has > already been removed from xarray and is not found. Regardless, it > finishes up deallocating br_port. At this point, repr is still not freed > and an fdb event can happen, in which ice_eswitch_br_fdb_event_work() > takes repr->br_port and tries to use it, which causes a panic (use after > free). > > Note that this only happens with 2 or more port representors added to > the bridge, since with only one representor port, the bridge deinit is > slightly different (ice_eswitch_br_port_deinit() is called via > ice_eswitch_br_ports_flush(), not ice_eswitch_br_port_unlink()). > > A workaround is available: brctl setageing $BR 0, which stops the bridge > from adding fdb entries altogether. > > Change the order of operations in ice_eswitch_detach(): move the call to > unregister_netdev() before removing repr from xarray. This way > repr->br_port will be correctly set to NULL in > ice_eswitch_br_port_deinit(), preventing a panic. > > Fixes: fff292b47ac1 ("ice: add VF representors one by one") > Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> > Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com> > --- > drivers/net/ethernet/intel/ice/ice_eswitch.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/drivers/net/ethernet/intel/ice/ice_eswitch.c b/drivers/net/ethernet/intel/ice/ice_eswitch.c > index c0b3e70a7ea3..fb527434b58b 100644 > --- a/drivers/net/ethernet/intel/ice/ice_eswitch.c > +++ b/drivers/net/ethernet/intel/ice/ice_eswitch.c > @@ -552,13 +552,14 @@ int ice_eswitch_attach_sf(struct ice_pf *pf, struct ice_dynamic_port *sf) > static void ice_eswitch_detach(struct ice_pf *pf, struct ice_repr *repr) > { > ice_eswitch_stop_reprs(pf); > + repr->ops.rem(repr); > + > xa_erase(&pf->eswitch.reprs, repr->id); > > if (xa_empty(&pf->eswitch.reprs)) > ice_eswitch_disable_switchdev(pf); > > ice_eswitch_release_repr(pf, repr); > - repr->ops.rem(repr); > ice_repr_destroy(repr); > > if (xa_empty(&pf->eswitch.reprs)) { Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Kind regards, Paul
On 09.10.2024 15:12, Paul Menzel wrote: > Dear Marcin, > > > Thank you for the patch, and the reproducer and detailed commit message. > > Am 09.10.24 um 14:49 schrieb Marcin Szycik: >> Unloading the ice driver while switchdev port representors are added to >> a bridge can lead to kernel panic. Reproducer: >> >> modprobe ice >> >> devlink dev eswitch set $PF1_PCI mode switchdev >> >> ip link add $BR type bridge >> ip link set $BR up >> >> echo 2 > /sys/class/net/$PF1/device/sriov_numvfs >> sleep 2 >> >> ip link set $PF1 master $BR >> ip link set $VF1_PR master $BR >> ip link set $VF2_PR master $BR >> ip link set $PF1 up >> ip link set $VF1_PR up >> ip link set $VF2_PR up >> ip link set $VF1 up >> >> rmmod irdma ice > > For people hitting the issue, an excerpt from the panic would also be nice, so it can be found more easily. Will add in v2, thanks. Marcin >> When unloading the driver, ice_eswitch_detach() is eventually called as >> part of VF freeing. First, it removes a port representor from xarray, >> then unregister_netdev() is called (via repr->ops.rem()), finally >> representor is deallocated. The problem comes from the bridge doing its >> own deinit at the same time. unregister_netdev() triggers a notifier >> chain, resulting in ice_eswitch_br_port_deinit() being called. It should >> set repr->br_port = NULL, but this does not happen since repr has >> already been removed from xarray and is not found. Regardless, it >> finishes up deallocating br_port. At this point, repr is still not freed >> and an fdb event can happen, in which ice_eswitch_br_fdb_event_work() >> takes repr->br_port and tries to use it, which causes a panic (use after >> free). >> >> Note that this only happens with 2 or more port representors added to >> the bridge, since with only one representor port, the bridge deinit is >> slightly different (ice_eswitch_br_port_deinit() is called via >> ice_eswitch_br_ports_flush(), not ice_eswitch_br_port_unlink()). >> >> A workaround is available: brctl setageing $BR 0, which stops the bridge >> from adding fdb entries altogether. >> >> Change the order of operations in ice_eswitch_detach(): move the call to >> unregister_netdev() before removing repr from xarray. This way >> repr->br_port will be correctly set to NULL in >> ice_eswitch_br_port_deinit(), preventing a panic. >> >> Fixes: fff292b47ac1 ("ice: add VF representors one by one") >> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> >> Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com> >> --- >> drivers/net/ethernet/intel/ice/ice_eswitch.c | 3 ++- >> 1 file changed, 2 insertions(+), 1 deletion(-) >> >> diff --git a/drivers/net/ethernet/intel/ice/ice_eswitch.c b/drivers/net/ethernet/intel/ice/ice_eswitch.c >> index c0b3e70a7ea3..fb527434b58b 100644 >> --- a/drivers/net/ethernet/intel/ice/ice_eswitch.c >> +++ b/drivers/net/ethernet/intel/ice/ice_eswitch.c >> @@ -552,13 +552,14 @@ int ice_eswitch_attach_sf(struct ice_pf *pf, struct ice_dynamic_port *sf) >> static void ice_eswitch_detach(struct ice_pf *pf, struct ice_repr *repr) >> { >> ice_eswitch_stop_reprs(pf); >> + repr->ops.rem(repr); >> + >> xa_erase(&pf->eswitch.reprs, repr->id); >> if (xa_empty(&pf->eswitch.reprs)) >> ice_eswitch_disable_switchdev(pf); >> ice_eswitch_release_repr(pf, repr); >> - repr->ops.rem(repr); >> ice_repr_destroy(repr); >> if (xa_empty(&pf->eswitch.reprs)) { > > Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> > > > Kind regards, > > Paul
diff --git a/drivers/net/ethernet/intel/ice/ice_eswitch.c b/drivers/net/ethernet/intel/ice/ice_eswitch.c index c0b3e70a7ea3..fb527434b58b 100644 --- a/drivers/net/ethernet/intel/ice/ice_eswitch.c +++ b/drivers/net/ethernet/intel/ice/ice_eswitch.c @@ -552,13 +552,14 @@ int ice_eswitch_attach_sf(struct ice_pf *pf, struct ice_dynamic_port *sf) static void ice_eswitch_detach(struct ice_pf *pf, struct ice_repr *repr) { ice_eswitch_stop_reprs(pf); + repr->ops.rem(repr); + xa_erase(&pf->eswitch.reprs, repr->id); if (xa_empty(&pf->eswitch.reprs)) ice_eswitch_disable_switchdev(pf); ice_eswitch_release_repr(pf, repr); - repr->ops.rem(repr); ice_repr_destroy(repr); if (xa_empty(&pf->eswitch.reprs)) {