[PATCHSET,v2,0/4] xfs: fix cpu hotplug mess

Message ID 169335040678.3522698.12786707653439539265.stgit@frogsfrogsfrogs (mailing list archive)

Message

Darrick J. Wong Aug. 29, 2023, 11:06 p.m. UTC
Hi all,

Ritesh and Eric separately reported crashes in XFS's hook function for
CPU hot remove if the remove event races with a filesystem being
mounted.  I also noticed via generic/650 that once in a while the log
will shut down over an apparent overrun of a transaction reservation;
this turned out to be due to CIL percpu list aggregation failing to pick
up the percpu list items from a dying CPU.

Either way, the solution here is to eliminate the need for a CPU dying
hook by using a private cpumask to track which CPUs have added to their
percpu lists directly, and iterating with that mask.  This fixes the log
problems and (I think) solves a theoretical UAF bug in the inodegc code
too.
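
To illustrate the idea, here is a rough userspace sketch, not the code
from the patches.  It assumes a made-up NR_CPUS of 8, a fixed-size array
standing in for each percpu list, and C11 atomics standing in for the
kernel's cpumask helpers; the names pcp_add, pcp_aggregate and
active_mask are hypothetical.  A CPU sets its bit when it queues
something, and the aggregator walks only the bits that are set, so it
never needs to know whether a CPU has since gone offline.

```c
#include <assert.h>
#include <stdatomic.h>

#define NR_CPUS 8    /* hypothetical; small enough to fit in one word */
#define PCP_MAX 16   /* hypothetical capacity of each per-CPU list */

struct pcp_list {
	int items[PCP_MAX];
	int count;
};

static struct pcp_list pcp[NR_CPUS];

/* Bit N is set once CPU N has queued at least one item. */
static atomic_ulong active_mask;

/* Queue an item on a CPU's list, then record that CPU in the mask. */
static void pcp_add(int cpu, int item)
{
	struct pcp_list *l = &pcp[cpu];

	assert(cpu >= 0 && cpu < NR_CPUS);
	assert(l->count < PCP_MAX);
	l->items[l->count++] = item;

	/* Release ordering: the item is published before the bit is. */
	atomic_fetch_or_explicit(&active_mask, 1UL << cpu,
				 memory_order_release);
}

/*
 * Drain every list whose bit is set and return the sum of the items.
 * The walk is driven by the private mask, not by which CPUs are
 * currently online, so items left behind by a dead CPU are still found.
 */
static int pcp_aggregate(void)
{
	unsigned long mask = atomic_exchange_explicit(&active_mask, 0UL,
						      memory_order_acquire);
	int total = 0;

	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		if (!(mask & (1UL << cpu)))
			continue;
		for (int i = 0; i < pcp[cpu].count; i++)
			total += pcp[cpu].items[i];
		pcp[cpu].count = 0;
	}
	return total;
}
```

The sketch is single-threaded and leaves out the per-list locking that
concurrent adders and an aggregator would need; it only shows why a CPU
dying hook becomes unnecessary once the mask records who has queued work.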

v2: fix a few put_cpu uses, add necessary memory barriers, and use
    atomic cpumask operations

If you're going to start using this code, I strongly recommend pulling
from my git trees, which are linked below.

This has been lightly tested with fstests.  Enjoy!
Comments and questions are, as always, welcome.

--D

kernel git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=fix-percpu-lists-6.6
---
 fs/xfs/xfs_icache.c        |   78 ++++++++++++++--------------------------
 fs/xfs/xfs_icache.h        |    1 -
 fs/xfs/xfs_log_cil.c       |   52 ++++++++-------------------
 fs/xfs/xfs_log_priv.h      |   14 +++----
 fs/xfs/xfs_mount.h         |    7 ++--
 fs/xfs/xfs_super.c         |   86 +-------------------------------------------
 include/linux/cpuhotplug.h |    1 -
 7 files changed, 56 insertions(+), 183 deletions(-)

Comments

Dave Chinner Aug. 30, 2023, 12:14 a.m. UTC | #1
On Tue, Aug 29, 2023 at 04:06:46PM -0700, Darrick J. Wong wrote:
> Hi all,
> 
> Ritesh and Eric separately reported crashes in XFS's hook function for
> CPU hot remove if the remove event races with a filesystem being
> mounted.  I also noticed via generic/650 that once in a while the log
> will shut down over an apparent overrun of a transaction reservation;
> this turned out to be due to CIL percpu list aggregation failing to pick
> up the percpu list items from a dying CPU.
> 
> Either way, the solution here is to eliminate the need for a CPU dying
> hook by using a private cpumask to track which CPUs have added to their
> percpu lists directly, and iterating with that mask.  This fixes the log
> problems and (I think) solves a theoretical UAF bug in the inodegc code
> too.
> 
> v2: fix a few put_cpu uses, add necessary memory barriers, and use
>     atomic cpumask operations
> 
> If you're going to start using this code, I strongly recommend pulling
> from my git trees, which are linked below.
> 
> This has been lightly tested with fstests.  Enjoy!
> Comments and questions are, as always, welcome.

Series looks good. Removes a bunch of code and makes things more
reliable, so what's not to like about it?

Reviewed-by: Dave Chinner <dchinner@redhat.com>