Message ID | 20241112221920.1105007-1-david@fromorbit.com (mailing list archive)
---|---
Series | xfs: miscellaneous bug fixes
On Wed, Nov 13, 2024 at 09:05:13AM +1100, Dave Chinner wrote:
> These are three bug fixes for recent issues.
>
> The first is a repost of the original patch to prevent allocation of
> sparse inode clusters at the end of an unaligned runt AG. There
> was plenty of discussion over that fix here:
>
> https://lore.kernel.org/linux-xfs/20241024025142.4082218-1-david@fromorbit.com/
>
> And the outcome of that discussion is that we can't allow sparse
> inode clusters overlapping the end of the runt AG without an on-disk
> format definition change. Hence this patch, which ensures the check
> is done correctly, is the only change we need to make to the kernel
> to avoid this problem in the future.
>
> Filesystems that have this problem on disk will need to run
> xfs_repair to remove the bad cluster, but no data loss is possible
> from this because the kernel currently disallows inode allocation
> from the bad cluster, and so none of the inodes in the sparse
> cluster can actually be used. Hence no data loss or other metadata
> corruption is possible from this situation; all we need to do is
> ensure that it doesn't happen again once repair has done its work.

<shrug> How many systems are in this state? Would those users rather
we fix the validation code in repair/scrub/wherever to allow ichunks
that overrun the end of a runt AG?

--D

> The other two patches are for issues I've recently hit when running
> lots of fstests in parallel. That changes loading, and hence the
> timing of events during tests, exposing latent race conditions in
> the code. The quota fix removes racy debug code that has been there
> since the quota code was first committed in 1996.
>
> The log shutdown race fix is a much more recent issue, created by
> trying to ensure shutdowns operate in a sane and predictable manner.
> The logic flaw is that we allow multiple log shutdowns to start and
> force the log before selecting a single log shutdown task. This
> leads to a situation where shutdown log item callback processing
> gets stuck waiting on a task holding a buffer lock that is waiting
> on a log force that is waiting on shutdown log item callback
> processing to complete...
>
> Thoughts?
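The deadlock Dave describes hinges on electing exactly one shutdown
task before anyone issues a log force. A minimal sketch of that
ordering, assuming a hypothetical log structure and flag name rather
than the actual XFS identifiers:

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the log; not the real struct xlog. */
struct xlog_sketch {
	atomic_flag	shutdown_started;
	bool		shutdown_done;
};

/* Stand-in for forcing the log to stable storage. */
static void log_force(struct xlog_sketch *log)
{
}

static void xlog_shutdown_sketch(struct xlog_sketch *log)
{
	/*
	 * Elect exactly one shutdown task *before* forcing the log.
	 * Losing racers return immediately instead of issuing their
	 * own log force, so shutdown log item callback processing can
	 * never end up blocked behind a log force issued by a second,
	 * un-elected shutdown task.
	 */
	if (atomic_flag_test_and_set(&log->shutdown_started))
		return;		/* another task is already shutting down */

	log_force(log);		/* only the elected task forces the log */
	log->shutdown_done = true;
}
```

The racy ordering runs the log force unconditionally and only then
picks a single shutdown task; that window is what the fix closes.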
On Tue, Nov 12, 2024 at 03:59:46PM -0800, Darrick J. Wong wrote:
> On Wed, Nov 13, 2024 at 09:05:13AM +1100, Dave Chinner wrote:
> > These are three bug fixes for recent issues.
> >
> > The first is a repost of the original patch to prevent allocation
> > of sparse inode clusters at the end of an unaligned runt AG. There
> > was plenty of discussion over that fix here:
> >
> > https://lore.kernel.org/linux-xfs/20241024025142.4082218-1-david@fromorbit.com/
> >
> > And the outcome of that discussion is that we can't allow sparse
> > inode clusters overlapping the end of the runt AG without an
> > on-disk format definition change. Hence this patch, which ensures
> > the check is done correctly, is the only change we need to make to
> > the kernel to avoid this problem in the future.
> >
> > Filesystems that have this problem on disk will need to run
> > xfs_repair to remove the bad cluster, but no data loss is possible
> > from this because the kernel currently disallows inode allocation
> > from the bad cluster, and so none of the inodes in the sparse
> > cluster can actually be used. Hence no data loss or other metadata
> > corruption is possible from this situation; all we need to do is
> > ensure that it doesn't happen again once repair has done its work.
>
> <shrug> How many systems are in this state?

Some. Maybe many. Unfortunately, the number is largely
unquantifiable. However, when it happens it dumps corruption reports
in the log, so I'd say there aren't that many of them out there,
because we aren't getting swamped with corruption reports.

> Would those users rather we fix the validation code in
> repair/scrub/wherever to allow ichunks that overrun the end of a
> runt AG?

Uh, the previous discussion ended at "considering inode chunks
overlapping the end of the runt AG as valid requires an incompat
feature flag, because older kernels cannot access inodes in that
location". i.e. older kernels will flag those inodes as corrupt if we
don't add an incompat feature flag to indicate they are valid.

At that point, users are forced to upgrade their userspace tools to
do anything with the filesystem that the kernel added the new
incompat feature flag to on upgrade. That's a much worse situation,
because they might not realise they need to upgrade all the userspace
tools and disaster recovery utilities to handle this new format that
the kernel upgrade introduced....

The repair/scrub/whatever code already detects and fixes the issue by
removing the bad cluster from the runt AG. We just need to stop the
kernel creating the bad clusters again. IOWs, it is simpler for
everyone to fix the bug like this and continue to consider a sparse
inode cluster overlapping the end of the AG invalid.

Alternatively, if users can grow the block device, they can simply
round the size of the block device up to a whole inode chunk. They
don't need to run repair to fix the issue; the cluster is now valid
because a whole chunk will fit at the end of the runt AG.

-Dave.
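For concreteness, a minimal sketch of the two resolutions discussed
above: the bounds check the kernel fix enforces, and the round-up
that growing the device performs. The names, types, and example chunk
size are illustrative assumptions, not the actual XFS code:

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t block_t;	/* filesystem block number */

#define CHUNK_BLOCKS	8U	/* blocks per full inode chunk (example) */

/*
 * Kernel-side fix: an inode chunk is valid only if it ends at or
 * before the end of the AG. On an unaligned runt AG, a chunk-aligned
 * start block can still run past the end of the AG, so the check must
 * compare the chunk's end, not its start, against the AG size.
 */
static bool ichunk_fits_in_ag(block_t chunk_start, block_t ag_blocks)
{
	return chunk_start + CHUNK_BLOCKS <= ag_blocks;
}

/*
 * Grow alternative: rounding the device (and hence the runt AG) up to
 * a whole inode chunk makes the existing cluster valid, because a
 * full chunk now fits at the end of the AG.
 */
static block_t round_up_to_chunk(block_t dev_blocks)
{
	return (dev_blocks + CHUNK_BLOCKS - 1) / CHUNK_BLOCKS * CHUNK_BLOCKS;
}
```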