Message ID | 20200504141154.55887-1-bfoster@redhat.com (mailing list archive) |
---|---|
Headers | show |
Series | xfs: flush related error handling cleanups | expand |
On Mon, May 04, 2020 at 10:11:37AM -0400, Brian Foster wrote: > Hi all, > > I think everything has been reviewed to this point. Only minor changes > noted below in this release. A git repo is available here[1]. > > The only outstanding feedback that I'm aware of is Dave's comment on > patch 7 of v3 [2] regarding the shutdown assert check. I'm not aware of > any means to get through xfs_wait_buftarg() with a dirty buffer that > hasn't undergone the permanant error sequence and thus shut down the fs. # echo 0 > /sys/fs/xfs/<dev>/fail_at_unmount And now any error with a "retry forever" config (the default) will be collected by xfs_buftarg_wait() without a preceeding shutdown as xfs_buf_iodone_callback_error() will not treat it as a permanent error during unmount. i.e. this doesn't trigger: /* At unmount we may treat errors differently */ if ((mp->m_flags & XFS_MOUNT_UNMOUNTING) && mp->m_fail_unmount) goto permanent_error; and so the error handling just marks it with a write error and lets it go for a write retry in future. These are then collected in xfs_buftarg_wait() as nothing is going to retry them once unmount gets to this point... Cheers, Dave.
On Tue, May 05, 2020 at 07:53:07AM +1000, Dave Chinner wrote: > On Mon, May 04, 2020 at 10:11:37AM -0400, Brian Foster wrote: > > Hi all, > > > > I think everything has been reviewed to this point. Only minor changes > > noted below in this release. A git repo is available here[1]. > > > > The only outstanding feedback that I'm aware of is Dave's comment on > > patch 7 of v3 [2] regarding the shutdown assert check. I'm not aware of > > any means to get through xfs_wait_buftarg() with a dirty buffer that > > hasn't undergone the permanant error sequence and thus shut down the fs. > > # echo 0 > /sys/fs/xfs/<dev>/fail_at_unmount > > And now any error with a "retry forever" config (the default) will > be collected by xfs_buftarg_wait() without a preceeding shutdown as > xfs_buf_iodone_callback_error() will not treat it as a permanent > error during unmount. i.e. this doesn't trigger: > > /* At unmount we may treat errors differently */ > if ((mp->m_flags & XFS_MOUNT_UNMOUNTING) && mp->m_fail_unmount) > goto permanent_error; > > and so the error handling just marks it with a write error and lets > it go for a write retry in future. These are then collected in > xfs_buftarg_wait() as nothing is going to retry them once unmount > gets to this point... > That doesn't accurately describe the behavior of that configuration, though. "Retry forever" means that dirty buffers are going to cycle through submission retries and the unmount is going to hang indefinitely (on pushing the AIL). Indeed, preventing this unmount hang is the original purpose of the fail at unmount knob (commit here[1]). IOW, we don't get to xfs_wait_buftarg() in that scenario until all dirty buffers are either written back successfully or the error configuration changes to process the failures as permanent errors and shuts down the fs. This can be confirmed easily with the buffer I/O error injection patch (use a value of 1 to simulate persistent errors). Brian [1] e6b3bb78962e6 ("xfs: add "fail at unmount" error handling configuration") > Cheers, > > Dave. > -- > Dave Chinner > david@fromorbit.com >
On Tue, May 05, 2020 at 07:58:10AM -0400, Brian Foster wrote: > On Tue, May 05, 2020 at 07:53:07AM +1000, Dave Chinner wrote: > > On Mon, May 04, 2020 at 10:11:37AM -0400, Brian Foster wrote: > > > Hi all, > > > > > > I think everything has been reviewed to this point. Only minor changes > > > noted below in this release. A git repo is available here[1]. > > > > > > The only outstanding feedback that I'm aware of is Dave's comment on > > > patch 7 of v3 [2] regarding the shutdown assert check. I'm not aware of > > > any means to get through xfs_wait_buftarg() with a dirty buffer that > > > hasn't undergone the permanant error sequence and thus shut down the fs. > > > > # echo 0 > /sys/fs/xfs/<dev>/fail_at_unmount > > > > And now any error with a "retry forever" config (the default) will > > be collected by xfs_buftarg_wait() without a preceeding shutdown as > > xfs_buf_iodone_callback_error() will not treat it as a permanent > > error during unmount. i.e. this doesn't trigger: > > > > /* At unmount we may treat errors differently */ > > if ((mp->m_flags & XFS_MOUNT_UNMOUNTING) && mp->m_fail_unmount) > > goto permanent_error; > > > > and so the error handling just marks it with a write error and lets > > it go for a write retry in future. These are then collected in > > xfs_buftarg_wait() as nothing is going to retry them once unmount > > gets to this point... > > > > That doesn't accurately describe the behavior of that configuration, > though. "Retry forever" means that dirty buffers are going to cycle > through submission retries and the unmount is going to hang indefinitely > (on pushing the AIL). Indeed, preventing this unmount hang is the > original purpose of the fail at unmount knob (commit here[1]). So this patch is relying on something else hanging before we get to xfs_wait_buftarg and hence "we don't have to handle that here". Please document that assumption along with the ASSERT, because if we change AIL behaviour to prevent retry-forever hangs with freeze and/or remount-ro we're going to hit this assert... Cheers, Dave.
On Wed, May 06, 2020 at 08:36:43AM +1000, Dave Chinner wrote: > On Tue, May 05, 2020 at 07:58:10AM -0400, Brian Foster wrote: > > On Tue, May 05, 2020 at 07:53:07AM +1000, Dave Chinner wrote: > > > On Mon, May 04, 2020 at 10:11:37AM -0400, Brian Foster wrote: > > > > Hi all, > > > > > > > > I think everything has been reviewed to this point. Only minor changes > > > > noted below in this release. A git repo is available here[1]. > > > > > > > > The only outstanding feedback that I'm aware of is Dave's comment on > > > > patch 7 of v3 [2] regarding the shutdown assert check. I'm not aware of > > > > any means to get through xfs_wait_buftarg() with a dirty buffer that > > > > hasn't undergone the permanant error sequence and thus shut down the fs. > > > > > > # echo 0 > /sys/fs/xfs/<dev>/fail_at_unmount > > > > > > And now any error with a "retry forever" config (the default) will > > > be collected by xfs_buftarg_wait() without a preceeding shutdown as > > > xfs_buf_iodone_callback_error() will not treat it as a permanent > > > error during unmount. i.e. this doesn't trigger: > > > > > > /* At unmount we may treat errors differently */ > > > if ((mp->m_flags & XFS_MOUNT_UNMOUNTING) && mp->m_fail_unmount) > > > goto permanent_error; > > > > > > and so the error handling just marks it with a write error and lets > > > it go for a write retry in future. These are then collected in > > > xfs_buftarg_wait() as nothing is going to retry them once unmount > > > gets to this point... > > > > > > > That doesn't accurately describe the behavior of that configuration, > > though. "Retry forever" means that dirty buffers are going to cycle > > through submission retries and the unmount is going to hang indefinitely > > (on pushing the AIL). Indeed, preventing this unmount hang is the > > original purpose of the fail at unmount knob (commit here[1]). > > So this patch is relying on something else hanging before we get to > xfs_wait_buftarg and hence "we don't have to handle that here". > Yes, though it's the existing code that relies on the external push to block until everything is written back or a permanent error is issued. This patch just splits up and rate limits a warning message. I added the assert because of the previous discussion around making sure this situation is handled properly and because that error processing dependency is not obvious. > Please document that assumption along with the ASSERT, because if we > change AIL behaviour to prevent retry-forever hangs with freeze > and/or remount-ro we're going to hit this assert... > Sure, I'll add a comment.. Brian > Cheers, > > Dave. > -- > Dave Chinner > david@fromorbit.com >