Message ID | 20240902075045.1037365-1-chandanbabu@kernel.org (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | [RFC] xfs: Prevent umount from indefinitely waiting on XFS_IFLUSHING flag on stale inodes | expand |
On Mon, Sep 02, 2024 at 01:20:41PM +0530, Chandan Babu R wrote: > Executing xfs/057 can lead to an unmount task to wait indefinitely for > XFS_IFLUSHING flag on some inodes to be cleared. The following timeline > describes as to how inodes can get into such a state. > > Task A Task B Iclog endio processing > ---------------------------------------------------------------------------- > Inodes are freed > > Inodes items are > added to the CIL > > CIL contents are > written to iclog > > iclog->ic_fail_crc > is set to true > > iclog is submitted > for writing to the > disk > > Last inode in the cluster > buffer is freed > > XFS_[ISTALE/IFLUSHING] is > set on all inodes in the > cluster buffer > > XFS_STALE is set on > the cluster buffer > iclog crc error is detected > ... during endio processing > During xfs_trans_commit, Set XFS_LI_ABORTED on inode > log shutdown is detected items > on xfs_buf_log_item - Unpin the inode since it > is stale and return -1 > xfs_buf_log_item is freed How do we get the buffer log item freed here? It should be in the CIL and/or the AIL because the unlinked inode list updates should have already logged directly to that buffer and committed in in previous transactions. > Inode log items are not > xfs_buf is not freed here processed further since > since b_hold has a xfs_inode_item_committed() > non-zero value returns -1 > > During normal operation, the stale inodes are processed by > xfs_buf_item_unpin() => xfs_buf_inode_iodone(). This ends up calling > xfs_iflush_abort() which in turn clears the XFS_IFLUSHING flag. However, in > the case of this bug, the xfs_buf_log_item is freed just before the high level > transaction is committed to the CIL. > > To overcome this bug, this commit removes the check for log shutdown during > high level transaction commit operation. The log items in the high level > transaction will now be committed to the CIL despite the log being > shutdown. This will allow the CIL processing logic (i.e. xlog_cil_push_work()) > to invoke xlog_cil_committed() as part of error handling. This will cause > xfs_buf log item to to be unpinned and the corresponding inodes to be aborted > and have their XFS_IFLUSHING flag cleared. I don't know exactly how the problem arose, but I can say for certain that the proposed fix is not valid. Removing that specific log shutdown check re-opens a race condition which can causes on disk corruption. The shutdown was specifically placed to close that race - See commit 3c4cb76bce43 ("xfs: xfs_trans_commit() path must check for log shutdown") for details. I have no idea what the right way to fix this is yet, but removing the shutdown check isn't it... -Dave.
On Mon, Sep 02, 2024 at 11:24:00 PM +1000, Dave Chinner wrote: > On Mon, Sep 02, 2024 at 01:20:41PM +0530, Chandan Babu R wrote: >> Executing xfs/057 can lead to an unmount task to wait indefinitely for >> XFS_IFLUSHING flag on some inodes to be cleared. The following timeline >> describes as to how inodes can get into such a state. >> >> Task A Task B Iclog endio processing >> ---------------------------------------------------------------------------- >> Inodes are freed >> >> Inodes items are >> added to the CIL >> >> CIL contents are >> written to iclog >> >> iclog->ic_fail_crc >> is set to true >> >> iclog is submitted >> for writing to the >> disk >> >> Last inode in the cluster >> buffer is freed >> >> XFS_[ISTALE/IFLUSHING] is >> set on all inodes in the >> cluster buffer >> >> XFS_STALE is set on >> the cluster buffer >> iclog crc error is detected >> ... during endio processing >> During xfs_trans_commit, Set XFS_LI_ABORTED on inode >> log shutdown is detected items >> on xfs_buf_log_item - Unpin the inode since it >> is stale and return -1 >> xfs_buf_log_item is freed > > How do we get the buffer log item freed here? It should be in the > CIL and/or the AIL because the unlinked inode list updates should > have already logged directly to that buffer and committed in in > previous transactions. > Apologies for the late response. I was held up with other work and also accidently overwrote the perf.data file. I had to spend some time to get a good trace file. Apart from the inode freed by Task B (the last inode to be freed), the remaining inodes of the cluster buffer would have gone through the following steps during inode inactivation processing, 1. Add the corresponding inode items to the list at bp->b_li_list. 2. Free the corresponding buffer log item because it is neither dirty nor stale and the refcount of buffer log item was 1 before executing xfs_buf_item_put(). Please refer to xfs_inode_item_precommit() => xfs_trans_brelse() => xfs_buf_item_put(). 3. The inode log item is then added to the CIL. At this point in execution, I can't seem to find the reason for the buffer log item to be on either the CIL or AIL. The new trace file has a slightly different timeline, Task A Task B Iclog endio processing CIL push worker ------------------+---------------------------+-----------------------------+------------------------------ Inodes are freed Inodes items are added to the CIL Last inode in the cluster buffer is freed XFS_[ISTALE/IFLUSHING] is set on all inodes in the cluster buffer XFS_STALE is set on Probably waiting on the cluster buffer cil->xc_[start/commit]_wait ... iclog crc error is detected ... during endio processing. This checkpoint transaction was mostly submitted before any of the inodes from the cluster buffer were freed. Set XLOG_IO_ERROR and XFS_OPSTATE_SHUTDOWN. Wake up waiters waiting on cil->xc_[start/commit]_wait During xfs_trans_commit, Execute xlog_cil_committed() log shutdown is detected as part of error handling. XFS_LI_ABORTED is set Set XFS_LI_ABORTED on inode on xfs_buf_log_item items. xfs_buf_log_item is freed xfs_inode_item_committed() - Unpin the inode since it xfs_buf is not freed here is stale and return -1 since b_hold has a non-zero value Inode log items are not processed further since xfs_inode_item_committed() returns -1 The above time line was contructed based on the following trace data, xfs__xfs_iunlink_remove 2 02114.099694694 9903 kworker/2:3-eve dev=7340037, agno=0, agino=53801 [ffffffffc1a9e039] xfs_iunlink_remove_inode+0x4c9 (/lib/modules/6.11.0-rc3+/kernel/fs/xfs/xfs.ko) [ffffffffc1a9e039] xfs_iunlink_remove_inode+0x4c9 (/lib/modules/6.11.0-rc3+/kernel/fs/xfs/xfs.ko) [ffffffffc1aa0254] xfs_iunlink_remove+0x94 (/lib/modules/6.11.0-rc3+/kernel/fs/xfs/xfs.ko) [ffffffffc1aa0745] xfs_inode_uninit+0x95 (/lib/modules/6.11.0-rc3+/kernel/fs/xfs/xfs.ko) [ffffffffc1b4c1f6] xfs_ifree+0x246 (/lib/modules/6.11.0-rc3+/kernel/fs/xfs/xfs.ko) [ffffffffc1b4c68e] xfs_inactive_ifree+0x18e (/lib/modules/6.11.0-rc3+/kernel/fs/xfs/xfs.ko) [ffffffffc1b4cdcd] xfs_inactive+0x4fd (/lib/modules/6.11.0-rc3+/kernel/fs/xfs/xfs.ko) [ffffffffc1b2f443] xfs_inodegc_worker+0x273 (/lib/modules/6.11.0-rc3+/kernel/fs/xfs/xfs.ko) [ffffffff878623a3] process_one_work+0x603 ([kernel.kallsyms]) [ffffffff87865560] worker_thread+0x7d0 ([kernel.kallsyms]) [ffffffff8787f9cb] kthread+0x2eb ([kernel.kallsyms]) [ffffffff876d9424] ret_from_fork+0x34 ([kernel.kallsyms]) [ffffffff876094ea] ret_from_fork_asm+0x1a ([kernel.kallsyms]) 53801 is the last inode in the cluster buffer to be freed. xfs__xfs_trans_binval 2 02114.128632378 9903 kworker/2:3-eve dev=7340037, buf_bno=53792, buf_len=32, buf_hold=35, buf_pincount=0, buf_lockval=0, buf_flags=DONE | INODES | PAGES, bli_recur=0, bli_refcount=1, bli_flags=STALE_INODE, li_flags= [ffffffffc1bdac19] xfs_trans_binval+0x2b9 (/lib/modules/6.11.0-rc3+/kernel/fs/xfs/xfs.ko) [ffffffffc1bdac19] xfs_trans_binval+0x2b9 (/lib/modules/6.11.0-rc3+/kernel/fs/xfs/xfs.ko) [ffffffffc1b48611] xfs_ifree_cluster+0x4a1 (/lib/modules/6.11.0-rc3+/kernel/fs/xfs/xfs.ko) [ffffffffc1b4c336] xfs_ifree+0x386 (/lib/modules/6.11.0-rc3+/kernel/fs/xfs/xfs.ko) [ffffffffc1b4c68e] xfs_inactive_ifree+0x18e (/lib/modules/6.11.0-rc3+/kernel/fs/xfs/xfs.ko) [ffffffffc1b4cdcd] xfs_inactive+0x4fd (/lib/modules/6.11.0-rc3+/kernel/fs/xfs/xfs.ko) [ffffffffc1b2f443] xfs_inodegc_worker+0x273 (/lib/modules/6.11.0-rc3+/kernel/fs/xfs/xfs.ko) [ffffffff878623a3] process_one_work+0x603 ([kernel.kallsyms]) [ffffffff87865560] worker_thread+0x7d0 ([kernel.kallsyms]) [ffffffff8787f9cb] kthread+0x2eb ([kernel.kallsyms]) [ffffffff876d9424] ret_from_fork+0x34 ([kernel.kallsyms]) [ffffffff876094ea] ret_from_fork_asm+0x1a ([kernel.kallsyms]) Cluster buffer is invalidated. xfs__xfs_buf_item_release 2 02114.128823015 9903 kworker/2:3-eve dev=7340037, buf_bno=53792, buf_len=32, buf_hold=34, buf_pincount=0, buf_lockval=0, buf_flags=DONE | STALE | INODES | PAGES, bli_recur=0, bli_refcount=1, bli_flags=STALE | STALE_INODE, li_flags=ABORTED [ffffffffc1b9e488] xfs_buf_item_release+0x3d8 (/lib/modules/6.11.0-rc3+/kernel/fs/xfs/xfs.ko) [ffffffffc1b9e488] xfs_buf_item_release+0x3d8 (/lib/modules/6.11.0-rc3+/kernel/fs/xfs/xfs.ko) [ffffffffc1b7528d] xfs_trans_free_items+0xcd (/lib/modules/6.11.0-rc3+/kernel/fs/xfs/xfs.ko) [ffffffffc1b77b5b] __xfs_trans_commit+0x29b (/lib/modules/6.11.0-rc3+/kernel/fs/xfs/xfs.ko) [ffffffffc1b796da] xfs_trans_roll+0x11a (/lib/modules/6.11.0-rc3+/kernel/fs/xfs/xfs.ko) [ffffffffc1a306ba] xfs_defer_trans_roll+0x11a (/lib/modules/6.11.0-rc3+/kernel/fs/xfs/xfs.ko) [ffffffffc1a31b27] xfs_defer_finish_noroll+0x667 (/lib/modules/6.11.0-rc3+/kernel/fs/xfs/xfs.ko) [ffffffffc1b77ce6] __xfs_trans_commit+0x426 (/lib/modules/6.11.0-rc3+/kernel/fs/xfs/xfs.ko) [ffffffffc1b4c6f0] xfs_inactive_ifree+0x1f0 (/lib/modules/6.11.0-rc3+/kernel/fs/xfs/xfs.ko) [ffffffffc1b4cdcd] xfs_inactive+0x4fd (/lib/modules/6.11.0-rc3+/kernel/fs/xfs/xfs.ko) [ffffffffc1b2f443] xfs_inodegc_worker+0x273 (/lib/modules/6.11.0-rc3+/kernel/fs/xfs/xfs.ko) [ffffffff878623a3] process_one_work+0x603 ([kernel.kallsyms]) [ffffffff87865560] worker_thread+0x7d0 ([kernel.kallsyms]) [ffffffff8787f9cb] kthread+0x2eb ([kernel.kallsyms]) [ffffffff876d9424] ret_from_fork+0x34 ([kernel.kallsyms]) [ffffffff876094ea] ret_from_fork_asm+0x1a ([kernel.kallsyms]) Buffer log item is aborted by the higher level transaction. Please note that buffer log items's refcount is set to 1. xfs__xfs_buf_rele 2 02114.129083754 9903 kworker/2:3-eve dev=7340037, bno=53792, nblks=32, hold=33, pincount=0, lockval=1, flags=DONE | STALE | INODES | PAGES, caller_ip=18446744072664601229, buf_ops=18446744072669101824 [ffffffffc1af372d] xfs_buf_rele_cached+0x3ed (/lib/modules/6.11.0-rc3+/kernel/fs/xfs/xfs.ko) [ffffffffc1af372d] xfs_buf_rele_cached+0x3ed (/lib/modules/6.11.0-rc3+/kernel/fs/xfs/xfs.ko) [ffffffffc1b7528d] xfs_trans_free_items+0xcd (/lib/modules/6.11.0-rc3+/kernel/fs/xfs/xfs.ko) [ffffffffc1b77b5b] __xfs_trans_commit+0x29b (/lib/modules/6.11.0-rc3+/kernel/fs/xfs/xfs.ko) [ffffffffc1b796da] xfs_trans_roll+0x11a (/lib/modules/6.11.0-rc3+/kernel/fs/xfs/xfs.ko) [ffffffffc1a306ba] xfs_defer_trans_roll+0x11a (/lib/modules/6.11.0-rc3+/kernel/fs/xfs/xfs.ko) [ffffffffc1a31b27] xfs_defer_finish_noroll+0x667 (/lib/modules/6.11.0-rc3+/kernel/fs/xfs/xfs.ko) [ffffffffc1b77ce6] __xfs_trans_commit+0x426 (/lib/modules/6.11.0-rc3+/kernel/fs/xfs/xfs.ko) [ffffffffc1b4c6f0] xfs_inactive_ifree+0x1f0 (/lib/modules/6.11.0-rc3+/kernel/fs/xfs/xfs.ko) [ffffffffc1b4cdcd] xfs_inactive+0x4fd (/lib/modules/6.11.0-rc3+/kernel/fs/xfs/xfs.ko) [ffffffffc1b2f443] xfs_inodegc_worker+0x273 (/lib/modules/6.11.0-rc3+/kernel/fs/xfs/xfs.ko) [ffffffff878623a3] process_one_work+0x603 ([kernel.kallsyms]) [ffffffff87865560] worker_thread+0x7d0 ([kernel.kallsyms]) [ffffffff8787f9cb] kthread+0x2eb ([kernel.kallsyms]) [ffffffff876d9424] ret_from_fork+0x34 ([kernel.kallsyms]) [ffffffff876094ea] ret_from_fork_asm+0x1a ([kernel.kallsyms]) xfs_buf is not released since the new 'hold' count will be 32; However the caller i.e. xfs_buf_item_relse() will free the xfs_buf_log_item xfs__xfs_inode_unpin 2 02115.142605975 5368 kworker/u16:0-e dev=7340037, ino=53792, count=0, pincount=2, caller_ip=18446744072664863980 [ffffffffc1bb5344] xfs_inode_item_unpin+0x174 (/lib/modules/6.11.0-rc3+/kernel/fs/xfs/xfs.ko) [ffffffffc1bb5344] xfs_inode_item_unpin+0x174 (/lib/modules/6.11.0-rc3+/kernel/fs/xfs/xfs.ko) [ffffffffc1bb54ec] xfs_inode_item_committed+0x9c (/lib/modules/6.11.0-rc3+/kernel/fs/xfs/xfs.ko) [ffffffffc1b915a0] xlog_cil_ail_insert+0x330 (/lib/modules/6.11.0-rc3+/kernel/fs/xfs/xfs.ko) [ffffffffc1b92104] xlog_cil_committed+0x454 (/lib/modules/6.11.0-rc3+/kernel/fs/xfs/xfs.ko) [ffffffffc1b94569] xlog_cil_push_work+0x11a9 (/lib/modules/6.11.0-rc3+/kernel/fs/xfs/xfs.ko) [ffffffff878623a3] process_one_work+0x603 ([kernel.kallsyms]) [ffffffff87865560] worker_thread+0x7d0 ([kernel.kallsyms]) [ffffffff8787f9cb] kthread+0x2eb ([kernel.kallsyms]) [ffffffff876d9424] ret_from_fork+0x34 ([kernel.kallsyms]) [ffffffff876094ea] ret_from_fork_asm+0x1a ([kernel.kallsyms]) CIL push worker is executing error handling code; 53792 is one of the inodes which was freed and committed to the CIL by Task A. It is being unpinned since the inode is stale. xfs__xfs_inode_unpin 2 02115.312093611 5368 kworker/u16:0-e dev=7340037, ino=53800, count=0, pincount=1, caller_ip=18446744072664863980 [ffffffffc1bb5344] xfs_inode_item_unpin+0x174 (/lib/modules/6.11.0-rc3+/kernel/fs/xfs/xfs.ko) [ffffffffc1bb5344] xfs_inode_item_unpin+0x174 (/lib/modules/6.11.0-rc3+/kernel/fs/xfs/xfs.ko) [ffffffffc1bb54ec] xfs_inode_item_committed+0x9c (/lib/modules/6.11.0-rc3+/kernel/fs/xfs/xfs.ko) [ffffffffc1b915a0] xlog_cil_ail_insert+0x330 (/lib/modules/6.11.0-rc3+/kernel/fs/xfs/xfs.ko) [ffffffffc1b92104] xlog_cil_committed+0x454 (/lib/modules/6.11.0-rc3+/kernel/fs/xfs/xfs.ko) [ffffffffc1b94569] xlog_cil_push_work+0x11a9 (/lib/modules/6.11.0-rc3+/kernel/fs/xfs/xfs.ko) [ffffffff878623a3] process_one_work+0x603 ([kernel.kallsyms]) [ffffffff87865560] worker_thread+0x7d0 ([kernel.kallsyms]) [ffffffff8787f9cb] kthread+0x2eb ([kernel.kallsyms]) [ffffffff876d9424] ret_from_fork+0x34 ([kernel.kallsyms]) [ffffffff876094ea] ret_from_fork_asm+0x1a ([kernel.kallsyms]) Another stale inode being unpinned. >> >> To overcome this bug, this commit removes the check for log shutdown during >> high level transaction commit operation. The log items in the high level >> transaction will now be committed to the CIL despite the log being >> shutdown. This will allow the CIL processing logic (i.e. xlog_cil_push_work()) >> to invoke xlog_cil_committed() as part of error handling. This will cause >> xfs_buf log item to to be unpinned and the corresponding inodes to be aborted >> and have their XFS_IFLUSHING flag cleared. > > I don't know exactly how the problem arose, but I can say for > certain that the proposed fix is not valid. Removing that specific > log shutdown check re-opens a race condition which can causes on > disk corruption. The shutdown was specifically placed to close that > race - See commit 3c4cb76bce43 ("xfs: xfs_trans_commit() path must > check for log shutdown") for details. > > I have no idea what the right way to fix this is yet, but removing > the shutdown check isn't it... > I haven't read the commit yet. I will do so and get back.
On Thu, Sep 05, 2024 at 06:12:29 PM +0530, Chandan Babu R wrote: > On Mon, Sep 02, 2024 at 11:24:00 PM +1000, Dave Chinner wrote: >> On Mon, Sep 02, 2024 at 01:20:41PM +0530, Chandan Babu R wrote: >>> Executing xfs/057 can lead to an unmount task to wait indefinitely for >>> XFS_IFLUSHING flag on some inodes to be cleared. The following timeline >>> describes as to how inodes can get into such a state. >>> >>> Task A Task B Iclog endio processing >>> ---------------------------------------------------------------------------- >>> Inodes are freed >>> >>> Inodes items are >>> added to the CIL >>> >>> CIL contents are >>> written to iclog >>> >>> iclog->ic_fail_crc >>> is set to true >>> >>> iclog is submitted >>> for writing to the >>> disk >>> >>> Last inode in the cluster >>> buffer is freed >>> >>> XFS_[ISTALE/IFLUSHING] is >>> set on all inodes in the >>> cluster buffer >>> >>> XFS_STALE is set on >>> the cluster buffer >>> iclog crc error is detected >>> ... during endio processing >>> During xfs_trans_commit, Set XFS_LI_ABORTED on inode >>> log shutdown is detected items >>> on xfs_buf_log_item - Unpin the inode since it >>> is stale and return -1 >>> xfs_buf_log_item is freed >> >> How do we get the buffer log item freed here? It should be in the >> CIL and/or the AIL because the unlinked inode list updates should >> have already logged directly to that buffer and committed in in >> previous transactions. >> > > Apologies for the late response. I was held up with other work and also > accidently overwrote the perf.data file. I had to spend some time to get a > good trace file. > > Apart from the inode freed by Task B (the last inode to be freed), the > remaining inodes of the cluster buffer would have gone through the following > steps during inode inactivation processing, > 1. Add the corresponding inode items to the list at bp->b_li_list. > 2. Free the corresponding buffer log item because it is neither dirty nor > stale and the refcount of buffer log item was 1 before executing > xfs_buf_item_put(). Please refer to xfs_inode_item_precommit() => > xfs_trans_brelse() => xfs_buf_item_put(). > 3. The inode log item is then added to the CIL. > > At this point in execution, I can't seem to find the reason for the buffer log > item to be on either the CIL or AIL. > > > The new trace file has a slightly different timeline, > > Task A Task B Iclog endio processing CIL push worker > ------------------+---------------------------+-----------------------------+------------------------------ > Inodes are freed > > Inodes items are > added to the CIL > > > Last inode in the cluster > buffer is freed > > XFS_[ISTALE/IFLUSHING] is > set on all inodes in the > cluster buffer > > XFS_STALE is set on Probably waiting on > the cluster buffer cil->xc_[start/commit]_wait > > ... iclog crc error is detected ... > during endio processing. > This checkpoint transaction > was mostly submitted before > any of the inodes from the > cluster buffer were freed. > > Set XLOG_IO_ERROR and > XFS_OPSTATE_SHUTDOWN. > > Wake up waiters waiting on > cil->xc_[start/commit]_wait > During xfs_trans_commit, Execute xlog_cil_committed() > log shutdown is detected as part of error handling. > > XFS_LI_ABORTED is set Set XFS_LI_ABORTED on inode > on xfs_buf_log_item items. > > xfs_buf_log_item is freed xfs_inode_item_committed() > - Unpin the inode since it > xfs_buf is not freed here is stale and return -1 > since b_hold has a > non-zero value Inode log items are not > processed further since > xfs_inode_item_committed() > returns -1 > > The above time line was contructed based on the following trace data, > > xfs__xfs_iunlink_remove 2 02114.099694694 9903 kworker/2:3-eve > dev=7340037, agno=0, agino=53801 > [ffffffffc1a9e039] xfs_iunlink_remove_inode+0x4c9 (/lib/modules/6.11.0-rc3+/kernel/fs/xfs/xfs.ko) > [ffffffffc1a9e039] xfs_iunlink_remove_inode+0x4c9 (/lib/modules/6.11.0-rc3+/kernel/fs/xfs/xfs.ko) > [ffffffffc1aa0254] xfs_iunlink_remove+0x94 (/lib/modules/6.11.0-rc3+/kernel/fs/xfs/xfs.ko) > [ffffffffc1aa0745] xfs_inode_uninit+0x95 (/lib/modules/6.11.0-rc3+/kernel/fs/xfs/xfs.ko) > [ffffffffc1b4c1f6] xfs_ifree+0x246 (/lib/modules/6.11.0-rc3+/kernel/fs/xfs/xfs.ko) > [ffffffffc1b4c68e] xfs_inactive_ifree+0x18e (/lib/modules/6.11.0-rc3+/kernel/fs/xfs/xfs.ko) > [ffffffffc1b4cdcd] xfs_inactive+0x4fd (/lib/modules/6.11.0-rc3+/kernel/fs/xfs/xfs.ko) > [ffffffffc1b2f443] xfs_inodegc_worker+0x273 (/lib/modules/6.11.0-rc3+/kernel/fs/xfs/xfs.ko) > [ffffffff878623a3] process_one_work+0x603 ([kernel.kallsyms]) > [ffffffff87865560] worker_thread+0x7d0 ([kernel.kallsyms]) > [ffffffff8787f9cb] kthread+0x2eb ([kernel.kallsyms]) > [ffffffff876d9424] ret_from_fork+0x34 ([kernel.kallsyms]) > [ffffffff876094ea] ret_from_fork_asm+0x1a ([kernel.kallsyms]) > > 53801 is the last inode in the cluster buffer to be freed. > > > xfs__xfs_trans_binval 2 02114.128632378 9903 kworker/2:3-eve dev=7340037, > buf_bno=53792, buf_len=32, buf_hold=35, buf_pincount=0, buf_lockval=0, > buf_flags=DONE | INODES | PAGES, bli_recur=0, bli_refcount=1, > bli_flags=STALE_INODE, li_flags= > [ffffffffc1bdac19] xfs_trans_binval+0x2b9 (/lib/modules/6.11.0-rc3+/kernel/fs/xfs/xfs.ko) > [ffffffffc1bdac19] xfs_trans_binval+0x2b9 (/lib/modules/6.11.0-rc3+/kernel/fs/xfs/xfs.ko) > [ffffffffc1b48611] xfs_ifree_cluster+0x4a1 (/lib/modules/6.11.0-rc3+/kernel/fs/xfs/xfs.ko) > [ffffffffc1b4c336] xfs_ifree+0x386 (/lib/modules/6.11.0-rc3+/kernel/fs/xfs/xfs.ko) > [ffffffffc1b4c68e] xfs_inactive_ifree+0x18e (/lib/modules/6.11.0-rc3+/kernel/fs/xfs/xfs.ko) > [ffffffffc1b4cdcd] xfs_inactive+0x4fd (/lib/modules/6.11.0-rc3+/kernel/fs/xfs/xfs.ko) > [ffffffffc1b2f443] xfs_inodegc_worker+0x273 (/lib/modules/6.11.0-rc3+/kernel/fs/xfs/xfs.ko) > [ffffffff878623a3] process_one_work+0x603 ([kernel.kallsyms]) > [ffffffff87865560] worker_thread+0x7d0 ([kernel.kallsyms]) > [ffffffff8787f9cb] kthread+0x2eb ([kernel.kallsyms]) > [ffffffff876d9424] ret_from_fork+0x34 ([kernel.kallsyms]) > [ffffffff876094ea] ret_from_fork_asm+0x1a ([kernel.kallsyms]) > > Cluster buffer is invalidated. > > > xfs__xfs_buf_item_release 2 02114.128823015 9903 kworker/2:3-eve dev=7340037, > buf_bno=53792, buf_len=32, buf_hold=34, buf_pincount=0, buf_lockval=0, > buf_flags=DONE | STALE | INODES | PAGES, bli_recur=0, bli_refcount=1, > bli_flags=STALE | STALE_INODE, li_flags=ABORTED > [ffffffffc1b9e488] xfs_buf_item_release+0x3d8 (/lib/modules/6.11.0-rc3+/kernel/fs/xfs/xfs.ko) > [ffffffffc1b9e488] xfs_buf_item_release+0x3d8 (/lib/modules/6.11.0-rc3+/kernel/fs/xfs/xfs.ko) > [ffffffffc1b7528d] xfs_trans_free_items+0xcd (/lib/modules/6.11.0-rc3+/kernel/fs/xfs/xfs.ko) > [ffffffffc1b77b5b] __xfs_trans_commit+0x29b (/lib/modules/6.11.0-rc3+/kernel/fs/xfs/xfs.ko) > [ffffffffc1b796da] xfs_trans_roll+0x11a (/lib/modules/6.11.0-rc3+/kernel/fs/xfs/xfs.ko) > [ffffffffc1a306ba] xfs_defer_trans_roll+0x11a (/lib/modules/6.11.0-rc3+/kernel/fs/xfs/xfs.ko) > [ffffffffc1a31b27] xfs_defer_finish_noroll+0x667 (/lib/modules/6.11.0-rc3+/kernel/fs/xfs/xfs.ko) > [ffffffffc1b77ce6] __xfs_trans_commit+0x426 (/lib/modules/6.11.0-rc3+/kernel/fs/xfs/xfs.ko) > [ffffffffc1b4c6f0] xfs_inactive_ifree+0x1f0 (/lib/modules/6.11.0-rc3+/kernel/fs/xfs/xfs.ko) > [ffffffffc1b4cdcd] xfs_inactive+0x4fd (/lib/modules/6.11.0-rc3+/kernel/fs/xfs/xfs.ko) > [ffffffffc1b2f443] xfs_inodegc_worker+0x273 (/lib/modules/6.11.0-rc3+/kernel/fs/xfs/xfs.ko) > [ffffffff878623a3] process_one_work+0x603 ([kernel.kallsyms]) > [ffffffff87865560] worker_thread+0x7d0 ([kernel.kallsyms]) > [ffffffff8787f9cb] kthread+0x2eb ([kernel.kallsyms]) > [ffffffff876d9424] ret_from_fork+0x34 ([kernel.kallsyms]) > [ffffffff876094ea] ret_from_fork_asm+0x1a ([kernel.kallsyms]) > > Buffer log item is aborted by the higher level transaction. Please note that > buffer log items's refcount is set to 1. > > xfs__xfs_buf_rele 2 02114.129083754 9903 kworker/2:3-eve > dev=7340037, bno=53792, nblks=32, hold=33, pincount=0, lockval=1, flags=DONE | > STALE | INODES | PAGES, caller_ip=18446744072664601229, > buf_ops=18446744072669101824 > [ffffffffc1af372d] xfs_buf_rele_cached+0x3ed (/lib/modules/6.11.0-rc3+/kernel/fs/xfs/xfs.ko) > [ffffffffc1af372d] xfs_buf_rele_cached+0x3ed (/lib/modules/6.11.0-rc3+/kernel/fs/xfs/xfs.ko) > [ffffffffc1b7528d] xfs_trans_free_items+0xcd (/lib/modules/6.11.0-rc3+/kernel/fs/xfs/xfs.ko) > [ffffffffc1b77b5b] __xfs_trans_commit+0x29b (/lib/modules/6.11.0-rc3+/kernel/fs/xfs/xfs.ko) > [ffffffffc1b796da] xfs_trans_roll+0x11a (/lib/modules/6.11.0-rc3+/kernel/fs/xfs/xfs.ko) > [ffffffffc1a306ba] xfs_defer_trans_roll+0x11a (/lib/modules/6.11.0-rc3+/kernel/fs/xfs/xfs.ko) > [ffffffffc1a31b27] xfs_defer_finish_noroll+0x667 (/lib/modules/6.11.0-rc3+/kernel/fs/xfs/xfs.ko) > [ffffffffc1b77ce6] __xfs_trans_commit+0x426 (/lib/modules/6.11.0-rc3+/kernel/fs/xfs/xfs.ko) > [ffffffffc1b4c6f0] xfs_inactive_ifree+0x1f0 (/lib/modules/6.11.0-rc3+/kernel/fs/xfs/xfs.ko) > [ffffffffc1b4cdcd] xfs_inactive+0x4fd (/lib/modules/6.11.0-rc3+/kernel/fs/xfs/xfs.ko) > [ffffffffc1b2f443] xfs_inodegc_worker+0x273 (/lib/modules/6.11.0-rc3+/kernel/fs/xfs/xfs.ko) > [ffffffff878623a3] process_one_work+0x603 ([kernel.kallsyms]) > [ffffffff87865560] worker_thread+0x7d0 ([kernel.kallsyms]) > [ffffffff8787f9cb] kthread+0x2eb ([kernel.kallsyms]) > [ffffffff876d9424] ret_from_fork+0x34 ([kernel.kallsyms]) > [ffffffff876094ea] ret_from_fork_asm+0x1a ([kernel.kallsyms]) > > xfs_buf is not released since the new 'hold' count will be 32; However the > caller i.e. xfs_buf_item_relse() will free the xfs_buf_log_item > > xfs__xfs_inode_unpin 2 02115.142605975 5368 kworker/u16:0-e dev=7340037, > ino=53792, count=0, pincount=2, caller_ip=18446744072664863980 > [ffffffffc1bb5344] xfs_inode_item_unpin+0x174 (/lib/modules/6.11.0-rc3+/kernel/fs/xfs/xfs.ko) > [ffffffffc1bb5344] xfs_inode_item_unpin+0x174 (/lib/modules/6.11.0-rc3+/kernel/fs/xfs/xfs.ko) > [ffffffffc1bb54ec] xfs_inode_item_committed+0x9c (/lib/modules/6.11.0-rc3+/kernel/fs/xfs/xfs.ko) > [ffffffffc1b915a0] xlog_cil_ail_insert+0x330 (/lib/modules/6.11.0-rc3+/kernel/fs/xfs/xfs.ko) > [ffffffffc1b92104] xlog_cil_committed+0x454 (/lib/modules/6.11.0-rc3+/kernel/fs/xfs/xfs.ko) > [ffffffffc1b94569] xlog_cil_push_work+0x11a9 (/lib/modules/6.11.0-rc3+/kernel/fs/xfs/xfs.ko) > [ffffffff878623a3] process_one_work+0x603 ([kernel.kallsyms]) > [ffffffff87865560] worker_thread+0x7d0 ([kernel.kallsyms]) > [ffffffff8787f9cb] kthread+0x2eb ([kernel.kallsyms]) > [ffffffff876d9424] ret_from_fork+0x34 ([kernel.kallsyms]) > [ffffffff876094ea] ret_from_fork_asm+0x1a ([kernel.kallsyms]) > > CIL push worker is executing error handling code; 53792 is one of the inodes > which was freed and committed to the CIL by Task A. It is being unpinned since > the inode is stale. > > xfs__xfs_inode_unpin 2 02115.312093611 5368 kworker/u16:0-e > dev=7340037, ino=53800, count=0, pincount=1, > caller_ip=18446744072664863980 > [ffffffffc1bb5344] xfs_inode_item_unpin+0x174 (/lib/modules/6.11.0-rc3+/kernel/fs/xfs/xfs.ko) > [ffffffffc1bb5344] xfs_inode_item_unpin+0x174 (/lib/modules/6.11.0-rc3+/kernel/fs/xfs/xfs.ko) > [ffffffffc1bb54ec] xfs_inode_item_committed+0x9c (/lib/modules/6.11.0-rc3+/kernel/fs/xfs/xfs.ko) > [ffffffffc1b915a0] xlog_cil_ail_insert+0x330 (/lib/modules/6.11.0-rc3+/kernel/fs/xfs/xfs.ko) > [ffffffffc1b92104] xlog_cil_committed+0x454 (/lib/modules/6.11.0-rc3+/kernel/fs/xfs/xfs.ko) > [ffffffffc1b94569] xlog_cil_push_work+0x11a9 (/lib/modules/6.11.0-rc3+/kernel/fs/xfs/xfs.ko) > [ffffffff878623a3] process_one_work+0x603 ([kernel.kallsyms]) > [ffffffff87865560] worker_thread+0x7d0 ([kernel.kallsyms]) > [ffffffff8787f9cb] kthread+0x2eb ([kernel.kallsyms]) > [ffffffff876d9424] ret_from_fork+0x34 ([kernel.kallsyms]) > [ffffffff876094ea] ret_from_fork_asm+0x1a ([kernel.kallsyms]) > > Another stale inode being unpinned. > >>> >>> To overcome this bug, this commit removes the check for log shutdown during >>> high level transaction commit operation. The log items in the high level >>> transaction will now be committed to the CIL despite the log being >>> shutdown. This will allow the CIL processing logic (i.e. xlog_cil_push_work()) >>> to invoke xlog_cil_committed() as part of error handling. This will cause >>> xfs_buf log item to to be unpinned and the corresponding inodes to be aborted >>> and have their XFS_IFLUSHING flag cleared. >> >> I don't know exactly how the problem arose, but I can say for >> certain that the proposed fix is not valid. Removing that specific >> log shutdown check re-opens a race condition which can causes on >> disk corruption. The shutdown was specifically placed to close that >> race - See commit 3c4cb76bce43 ("xfs: xfs_trans_commit() path must >> check for log shutdown") for details. >> >> I have no idea what the right way to fix this is yet, but removing >> the shutdown check isn't it... >> Commit 3c4cb76bce43 describes the following scenario, 1. Filesystem is shutdown but the log remains operational. 2. High-level transaction commit (i.e. xfs_trans_commit()) notices the fs shutdown. Hence it aborts the dirty log items. One of the log items being aborted is an inode log item. 3. An inode cluster writeback is executed. Here, we come across the previously aborted inode log item. The inode log item is currently unpinned and dirty. Hence, the inode is included in the cluster buffer writeback. 4. Cluster buffer IO completion tries to remove the inode log item from the AIL and hence trips over an assert statement since the log item was never on the AIL. This indicates that the inode was never written to the journal. Hence the commit 3c4cb76bce43 will abort the transaction commit only when the log has been shutdown. With the log shutdown check removed, we can end up with the following cases during high-level transaction commit operation, 1. The filesystem is shutdown while the log remains operational. In this case, the log items are committed to the CIL where they are pinned before unlocking them. This should prevent the inode cluster writeback code from including such an inode for writeback since the corresponding log item is pinned. From here onwards, the normal flow of log items from the CIL to the AIL occurs after the contents of the log items are written to the journal and then later unpinned. The above logic holds true even without applying the "RFC patch" presented in the mail. 2. The log is shutdown. As in the previous case, the log items are moved to the CIL where they are pinned before unlocking them. The pinning of the log items prevents the inode cluster writeback code from including the pinned inode in its writeback operation. These log items are then processed by xlog_cil_committed() which gets invoked as part of error handling by xlog_cil_push_work(). I can't seem to find the scenario in which the above "RFC patch" can cause the aborted log item's inode to be included by the cluster writeback code.
On Mon, Sep 16, 2024 at 11:14:32AM +0530, Chandan Babu R wrote: > On Thu, Sep 05, 2024 at 06:12:29 PM +0530, Chandan Babu R wrote: > >>> To overcome this bug, this commit removes the check for log shutdown during > >>> high level transaction commit operation. The log items in the high level > >>> transaction will now be committed to the CIL despite the log being > >>> shutdown. This will allow the CIL processing logic (i.e. xlog_cil_push_work()) > >>> to invoke xlog_cil_committed() as part of error handling. This will cause > >>> xfs_buf log item to to be unpinned and the corresponding inodes to be aborted > >>> and have their XFS_IFLUSHING flag cleared. > >> > >> I don't know exactly how the problem arose, but I can say for > >> certain that the proposed fix is not valid. Removing that specific > >> log shutdown check re-opens a race condition which can causes on > >> disk corruption. The shutdown was specifically placed to close that > >> race - See commit 3c4cb76bce43 ("xfs: xfs_trans_commit() path must > >> check for log shutdown") for details. > >> > >> I have no idea what the right way to fix this is yet, but removing > >> the shutdown check isn't it... > >> > > Commit 3c4cb76bce43 describes the following scenario, > > 1. Filesystem is shutdown but the log remains operational. > 2. High-level transaction commit (i.e. xfs_trans_commit()) notices the fs > shutdown. Hence it aborts the dirty log items. One of the log items being > aborted is an inode log item. > 3. An inode cluster writeback is executed. Here, we come across the previously > aborted inode log item. The inode log item is currently unpinned and > dirty. Hence, the inode is included in the cluster buffer writeback. > 4. Cluster buffer IO completion tries to remove the inode log item from the > AIL and hence trips over an assert statement since the log item was never > on the AIL. This indicates that the inode was never written to the journal. > > Hence the commit 3c4cb76bce43 will abort the transaction commit only when the > log has been shutdown. > > With the log shutdown check removed, we can end up with the following cases > during high-level transaction commit operation, > 1. The filesystem is shutdown while the log remains operational. > In this case, the log items are committed to the CIL where they are pinned > before unlocking them. > This should prevent the inode cluster writeback code > from including such an inode for writeback since the corresponding log item > is pinned. From here onwards, the normal flow of log items from the CIL to > the AIL occurs after the contents of the log items are written to the > journal and then later unpinned. > The above logic holds true even without applying the "RFC patch" presented > in the mail. > > 2. The log is shutdown. > As in the previous case, the log items are moved to the CIL where they are > pinned before unlocking them. The pinning of the log items prevents > the inode cluster writeback code from including the pinned inode in its > writeback operation. These log items are then processed by > xlog_cil_committed() which gets invoked as part of error handling by > xlog_cil_push_work(). It's more complex than that. 3. We get an error returned from xfs_defer_finish_noroll() or xfs_trans_run_precommits(). In these cases we skip the insertion into the CIL altogether and cancel the dirty transaction with dirty log items exactly as we currently do right now. This is why, in that series of commits, I changed all the shutdown checks for metadata operations to use xlog_is_shutdown() as opposed to xfs_is_shutdown() - to ensure that shutdown checks are consistent with what __xfs_trans_commit() required. Hence these paths would all behave the same way when dirty transactions are canceled, and that leads to the next point: 4. Random high level code that does this: if (xlog_is_shutdown(log)) xfs_trans_cancel(tp); has the same problem - we abort dirty items and release them when the log has been shut down. In these cases, we have no possible mechanism to insert them into the CIL to avoid the same shutdown race condition. IOWs, removing the shutdown check from __xfs_trans_commit() doesn't resolve the problem entirely as there are other paths that lead to the same situation where we cancel transactions due to the log being shut down. 5. An the architectural issue with the fix: inserting the items into the CIL instead of aborting them requires something or someone to push the CIL to error out the items and release them. Who is responsible for invoking that push? It's not xlog_cil_commit(). That only pushes on the CIL if the amount of queued work is over the push threshold, otherwise it will not queue xlog_cil_push_work() to run again. Hence every call to xfs_trans_commit() is now absolutely reliant on some future code pushing the CIL to free the items that were committed to the CIL after the log is shut down. I think that is eventually caught by the xfs_log_unmount() path doing a log force. i.e. xfs_log_unmount xfs_log_clean xfs_log_quiesce xfs_log_force xlog_cil_force xlog_cil_force_seq xlog_cil_push_now However, this is reliant on there being no shutdown checks in this path, and xlog_cil_push_work() avoiding any shutdown checks until it's done all the work needed to be able to cancle the pending log items, right? IOWs, there's lots of things that have to be done just right for this "future cleanup" mechanism to work, and we have to be very careful not to place a shutdown check in just the wrong spot otherwise we can break the shutdown processing. We know how problematic this can be - this "cleanup is somebody else's future problem" model is how we used to handle shutdown and it was an -utter mess-. Christoph and I spent a lot of time years ago fixing the shutdown mess by moving to a model where the current holder of an item is responsible for releasing that item when a shutdown is detected. That included xfs_trans_commit(). This change of shutdown processing model was one of the reasons that tests like generic/388 (and other fstress+shutdown tests) are largely reliable now - they don't crash, hang, leak or trigger UAF errors all the time like they used to. Shutdown reliability used to be -horrible-. Hence I really don't want to see us moving back towards a "cleanup is somebody else's future problem" model, regardless of whether it works in this case or not, because it's proven to be a broken model. It's probably obvious by now that racing shutdowns with cancelling dirty transaction safely isn't a simple problem. That's why I said "I have no idea what the right way to fix this is yet" - I didn't have time to write a long, long email explaining all this. I -think- that the ->iop_shutdown() mechanism I described to Christoph recently can be applied here, too. That provided a mechanism for the AIL push to be able to process items safely once a shutdown had been detected, and I think it applies equally here. i.e. we know the log has been shut down, and so we need to error out the dirty log items via IO error mechanisms rather than simply aborting the log items and hoping everyone notices that it's been aborted.... -Dave.
On Wed, Sep 18, 2024 at 11:20:12 AM +1000, Dave Chinner wrote: > On Mon, Sep 16, 2024 at 11:14:32AM +0530, Chandan Babu R wrote: >> On Thu, Sep 05, 2024 at 06:12:29 PM +0530, Chandan Babu R wrote: >> >>> To overcome this bug, this commit removes the check for log shutdown during >> >>> high level transaction commit operation. The log items in the high level >> >>> transaction will now be committed to the CIL despite the log being >> >>> shutdown. This will allow the CIL processing logic (i.e. xlog_cil_push_work()) >> >>> to invoke xlog_cil_committed() as part of error handling. This will cause >> >>> xfs_buf log item to to be unpinned and the corresponding inodes to be aborted >> >>> and have their XFS_IFLUSHING flag cleared. >> >> >> >> I don't know exactly how the problem arose, but I can say for >> >> certain that the proposed fix is not valid. Removing that specific >> >> log shutdown check re-opens a race condition which can causes on >> >> disk corruption. The shutdown was specifically placed to close that >> >> race - See commit 3c4cb76bce43 ("xfs: xfs_trans_commit() path must >> >> check for log shutdown") for details. >> >> >> >> I have no idea what the right way to fix this is yet, but removing >> >> the shutdown check isn't it... >> >> >> >> Commit 3c4cb76bce43 describes the following scenario, >> >> 1. Filesystem is shutdown but the log remains operational. >> 2. High-level transaction commit (i.e. xfs_trans_commit()) notices the fs >> shutdown. Hence it aborts the dirty log items. One of the log items being >> aborted is an inode log item. >> 3. An inode cluster writeback is executed. Here, we come across the previously >> aborted inode log item. The inode log item is currently unpinned and >> dirty. Hence, the inode is included in the cluster buffer writeback. >> 4. Cluster buffer IO completion tries to remove the inode log item from the >> AIL and hence trips over an assert statement since the log item was never >> on the AIL. This indicates that the inode was never written to the journal. >> >> Hence the commit 3c4cb76bce43 will abort the transaction commit only when the >> log has been shutdown. >> >> With the log shutdown check removed, we can end up with the following cases >> during high-level transaction commit operation, >> 1. The filesystem is shutdown while the log remains operational. >> In this case, the log items are committed to the CIL where they are pinned >> before unlocking them. >> This should prevent the inode cluster writeback code >> from including such an inode for writeback since the corresponding log item >> is pinned. From here onwards, the normal flow of log items from the CIL to >> the AIL occurs after the contents of the log items are written to the >> journal and then later unpinned. >> The above logic holds true even without applying the "RFC patch" presented >> in the mail. >> >> 2. The log is shutdown. >> As in the previous case, the log items are moved to the CIL where they are >> pinned before unlocking them. The pinning of the log items prevents >> the inode cluster writeback code from including the pinned inode in its >> writeback operation. These log items are then processed by >> xlog_cil_committed() which gets invoked as part of error handling by >> xlog_cil_push_work(). > > It's more complex than that. > > 3. We get an error returned from xfs_defer_finish_noroll() or > xfs_trans_run_precommits(). > Commit b5f17bec1213a3ed2f4d79ad4c566e00cabe2a9b has the following, 2. xfs_force_shutdown() is used in places to cause the current modification to be aborted via xfs_trans_commit() because it may be impractical or impossible to cancel the transaction directly, and hence xfs_trans_commit() must cancel transactions when xfs_is_shutdown() is true in this situation. But we can't do that because of #1. I see that xfs_trans_run_precommits() and xfs_defer_finish_noroll() both call xfs_force_shutdown() when they encounter an error. But these functions could have percolated the error code to __xfs_trans_commit() and let __xfs_trans_commit() deal with the scenario more gracefully rather than checking for log shutdown. Can you please explain why this is impractical or impossible?
diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c index bdf3704dc301..b43436c8abaa 100644 --- a/fs/xfs/xfs_trans.c +++ b/fs/xfs/xfs_trans.c @@ -868,17 +868,6 @@ __xfs_trans_commit( if (!(tp->t_flags & XFS_TRANS_DIRTY)) goto out_unreserve; - /* - * We must check against log shutdown here because we cannot abort log - * items and leave them dirty, inconsistent and unpinned in memory while - * the log is active. This leaves them open to being written back to - * disk, and that will lead to on-disk corruption. - */ - if (xlog_is_shutdown(log)) { - error = -EIO; - goto out_unreserve; - } - ASSERT(tp->t_ticket != NULL); /*
Executing xfs/057 can lead to an unmount task to wait indefinitely for XFS_IFLUSHING flag on some inodes to be cleared. The following timeline describes as to how inodes can get into such a state. Task A Task B Iclog endio processing ---------------------------------------------------------------------------- Inodes are freed Inodes items are added to the CIL CIL contents are written to iclog iclog->ic_fail_crc is set to true iclog is submitted for writing to the disk Last inode in the cluster buffer is freed XFS_[ISTALE/IFLUSHING] is set on all inodes in the cluster buffer XFS_STALE is set on the cluster buffer iclog crc error is detected ... during endio processing During xfs_trans_commit, Set XFS_LI_ABORTED on inode log shutdown is detected items XFS_LI_ABORTED is set xfs_inode_item_committed() on xfs_buf_log_item - Unpin the inode since it is stale and return -1 xfs_buf_log_item is freed Inode log items are not xfs_buf is not freed here processed further since since b_hold has a xfs_inode_item_committed() non-zero value returns -1 During normal operation, the stale inodes are processed by xfs_buf_item_unpin() => xfs_buf_inode_iodone(). This ends up calling xfs_iflush_abort() which in turn clears the XFS_IFLUSHING flag. However, in the case of this bug, the xfs_buf_log_item is freed just before the high level transaction is committed to the CIL. To overcome this bug, this commit removes the check for log shutdown during high level transaction commit operation. The log items in the high level transaction will now be committed to the CIL despite the log being shutdown. This will allow the CIL processing logic (i.e. xlog_cil_push_work()) to invoke xlog_cil_committed() as part of error handling. This will cause xfs_buf log item to to be unpinned and the corresponding inodes to be aborted and have their XFS_IFLUSHING flag cleared. Signed-off-by: Chandan Babu R <chandanbabu@kernel.org> --- PS: I have tested this patch by executing xfs/057 in a loop for about 24 hours. On a non-patched kernel, this issue gets recreated within 24 hours. fs/xfs/xfs_trans.c | 11 ----------- 1 file changed, 11 deletions(-)