Message ID | 0cc12ebb07c4d4c41a1265ee2c28b392ff997a86.1713797103.git.petrm@nvidia.com (mailing list archive) |
---|---|
State | Accepted |
Commit | fb4e2b70a7194b209fc7320bbf33b375f7114bd5 |
Delegated to: | Netdev Maintainers |
Series | mlxsw: Various ACL fixes |
On Mon, Apr 22, 2024 at 05:26:02PM +0200, Petr Machata wrote:
> From: Ido Schimmel <idosch@nvidia.com>
>
> The rehash delayed work is rescheduled with a delay if the number of
> credits at end of the work is not negative as supposedly it means that
> the migration ended. Otherwise, it is rescheduled immediately.
>
> After "mlxsw: spectrum_acl_tcam: Fix possible use-after-free during
> rehash" the above is no longer accurate as a non-negative number of
> credits is no longer indicative of the migration being done. It can also
> happen if the work encountered an error in which case the migration will
> resume the next time the work is scheduled.
>
> The significance of the above is that it is possible for the work to be
> pending and associated with hints that were allocated when the migration
> started. This leads to the hints being leaked [1] when the work is
> canceled while pending as part of ACL region dismantle.
>
> Fix by freeing the hints if hints are associated with a work that was
> canceled while pending.
>
> Blame the original commit since the reliance on not having a pending
> work associated with hints is fragile.
>
> [1]
> unreferenced object 0xffff88810e7c3000 (size 256):
>   comm "kworker/0:16", pid 176, jiffies 4295460353
>   hex dump (first 32 bytes):
>     00 30 95 11 81 88 ff ff 61 00 00 00 00 00 00 80  .0......a.......
>     00 00 61 00 40 00 00 00 00 00 00 00 04 00 00 00  ..a.@...........
>   backtrace (crc 2544ddb9):
>     [<00000000cf8cfab3>] kmalloc_trace+0x23f/0x2a0
>     [<000000004d9a1ad9>] objagg_hints_get+0x42/0x390
>     [<000000000b143cf3>] mlxsw_sp_acl_erp_rehash_hints_get+0xca/0x400
>     [<0000000059bdb60a>] mlxsw_sp_acl_tcam_vregion_rehash_work+0x868/0x1160
>     [<00000000e81fd734>] process_one_work+0x59c/0xf20
>     [<00000000ceee9e81>] worker_thread+0x799/0x12c0
>     [<00000000bda6fe39>] kthread+0x246/0x300
>     [<0000000070056d23>] ret_from_fork+0x34/0x70
>     [<00000000dea2b93e>] ret_from_fork_asm+0x1a/0x30
>
> Fixes: c9c9af91f1d9 ("mlxsw: spectrum_acl: Allow to interrupt/continue rehash work")
> Signed-off-by: Ido Schimmel <idosch@nvidia.com>
> Tested-by: Alexander Zubkov <green@qrator.net>
> Signed-off-by: Petr Machata <petrm@nvidia.com>

Reviewed-by: Simon Horman <horms@kernel.org>
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_tcam.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_tcam.c
index 89a5ebc3463f..92a406f02eae 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_tcam.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_tcam.c
@@ -836,10 +836,14 @@ mlxsw_sp_acl_tcam_vregion_destroy(struct mlxsw_sp *mlxsw_sp,
 	struct mlxsw_sp_acl_tcam *tcam = vregion->tcam;
 
 	if (vgroup->vregion_rehash_enabled && ops->region_rehash_hints_get) {
+		struct mlxsw_sp_acl_tcam_rehash_ctx *ctx = &vregion->rehash.ctx;
+
 		mutex_lock(&tcam->lock);
 		list_del(&vregion->tlist);
 		mutex_unlock(&tcam->lock);
-		cancel_delayed_work_sync(&vregion->rehash.dw);
+		if (cancel_delayed_work_sync(&vregion->rehash.dw) &&
+		    ctx->hints_priv)
+			ops->region_rehash_hints_put(ctx->hints_priv);
 	}
 	mlxsw_sp_acl_tcam_vgroup_vregion_detach(mlxsw_sp, vregion);
 	if (vregion->region2)
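
For readers who want to see the leak and the fix in isolation, here is a minimal userspace sketch of the pattern the commit message describes. It is not driver code: every `toy_*` name is hypothetical, and the delayed work is reduced to a single `pending` flag. The only behavior borrowed from the kernel is that `cancel_delayed_work_sync()` returns true when the work was still pending, which is what the patch relies on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a delayed work item: it only tracks "pending". */
struct toy_dwork {
	bool pending;
};

/* Hypothetical stand-in for the rehash context that owns the hints. */
struct toy_rehash_ctx {
	void *hints_priv;
};

/* Number of hints objects currently allocated (a toy leak detector). */
static int toy_live_hints;

static void *toy_hints_get(void)
{
	void *hints = malloc(256);

	if (hints)
		toy_live_hints++;
	return hints;
}

static void toy_hints_put(void *hints)
{
	free(hints);
	toy_live_hints--;
}

/* Same return convention as cancel_delayed_work_sync():
 * true if the work was pending when it was canceled.
 */
static bool toy_cancel_sync(struct toy_dwork *dw)
{
	bool was_pending = dw->pending;

	dw->pending = false;
	return was_pending;
}

/* Models a migration that stopped early (for example on error):
 * the hints stay attached and the work is queued again.
 */
static void toy_interrupt_migration(struct toy_dwork *dw,
				    struct toy_rehash_ctx *ctx)
{
	ctx->hints_priv = toy_hints_get();
	dw->pending = true;
}

/* Models region teardown. With apply_fix set, hints attached to a work
 * that was canceled while pending are released, as in the patch.
 * Returns the number of hints objects still allocated afterwards.
 */
static int toy_destroy(struct toy_dwork *dw, struct toy_rehash_ctx *ctx,
		       bool apply_fix)
{
	bool was_pending = toy_cancel_sync(dw);

	if (apply_fix && was_pending && ctx->hints_priv) {
		toy_hints_put(ctx->hints_priv);
		ctx->hints_priv = NULL;
	}
	return toy_live_hints;
}
```

Calling `toy_interrupt_migration()` followed by `toy_destroy(..., false)` leaves one hints object allocated, which corresponds to the kmemleak report in [1]. With `apply_fix` set to true, the same sequence ends with none. Clearing `hints_priv` after the put is a choice made for this sketch only; the patch does not need it because the vregion is being destroyed.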