[0/6,v2] xfs: more shutdown/recovery fixes

Message ID: 20220324002103.710477-1-david@fromorbit.com
Dave Chinner March 24, 2022, 12:20 a.m. UTC
Hi folks,

V2 of this patchset has blown out from 2 to 6 patches because of
the sudden explosion of everyone having new problems with
shutdown/recovery behaviour. Patches 3-6 are new patches in the
series.

Patch 3 addresses the shutdown log force wakeup failure Brian
reported here:

https://lore.kernel.org/linux-xfs/YjneHEoFRDXu+EcA@bfoster/

Patches 4-6 fix a long standing shutdown race where
xfs_trans_commit() can abort modified log items and leave them
unpinned and dirty in memory while the log is still running,
allowing unjournalled, incomplete changes to be written back to disk
before the log is shut down. This race condition has been around
for a long time - it looks to be a zero-day bug in the original
shutdown code introduced in January 1997.

Fixing this requires three things: the log must be able to shut down
independently of the mount (i.e. from log IO completion context),
mount shutdowns must be forced to wait until the log shutdown is
complete, and log shutdowns must also shut down the mount. The last
is needed because otherwise shit just breaks all over the place:
random stuff errors out on log shutdown while xfs_is_shutdown() is
not set, so those errors are not handled appropriately by high level
code. Or it just assert fails because the mount isn't shut down.

Once all that is done, we can fix xfs_trans_commit() and
xfs_trans_cancel() to not leak aborted items into memory until the
log is fully shut down.
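
To make the ordering described above concrete, here is a rough,
single-threaded userspace sketch. It is not the kernel code: every
toy_* name is hypothetical, the real patches deal with locking and
wakeups that are left out here, and "waiting" for the log shutdown is
modelled as a plain synchronous call.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model; none of these names exist in the kernel. */
struct toy_log {
	bool shut_down;		/* log accepts no further writes */
};

struct toy_mount {
	bool shut_down;		/* what high level code checks on error */
	struct toy_log *log;
};

/*
 * The log can be shut down without going through the mount first
 * (think: from log IO completion). It also marks the mount shut down
 * so errors caused by the dead log are handled as shutdown errors.
 */
static void toy_log_shutdown(struct toy_mount *mp)
{
	mp->log->shut_down = true;
	mp->shut_down = true;
}

/*
 * A mount shutdown does not return until the log is shut down. In
 * this single-threaded model that is just a synchronous call.
 */
static void toy_mount_shutdown(struct toy_mount *mp)
{
	if (!mp->log->shut_down)
		toy_log_shutdown(mp);
	mp->shut_down = true;
	assert(mp->log->shut_down);	/* post-condition of the ordering */
}

/*
 * Assumed gate for the commit/cancel abort path: aborted, dirty items
 * may only be unpinned once the log can no longer write anything.
 */
static bool toy_can_unpin_aborted_item(const struct toy_mount *mp)
{
	return mp->log->shut_down;
}
```

The point of the gate is that an aborted item never becomes visible
to writeback while the toy log is still live, whichever side started
the shutdown.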

This now makes recoveryloop largely stable on my test machines. I am
still seeing failures, but they are one-off, whacky things (like
weird udev/netlink memory freeing warnings) that I'm unable to
reproduce in any way.

-Dave.

Version 2:
- rework inode cluster buffer checks in inode item pushing (patch 1)
- clean up comments and separation of inode abort behaviour (p1)
- Fix shutdown callback/log force wakeup ordering issue (p3)
- Fix writeback of aborted, incomplete, unlogged changes during
  shutdown races (p4-6)

Version 1:
- https://lore.kernel.org/linux-xfs/20220321012329.376307-1-david@fromorbit.com/