diff mbox series

[v2] xfs: fix hung when transaction commit fail in xfs_inactive_ifree

Message ID 20230227062952.GA53788@ceph-admin (mailing list archive)
State New, archived
Headers show
Series [v2] xfs: fix hung when transaction commit fail in xfs_inactive_ifree | expand

Commit Message

Long Li Feb. 27, 2023, 6:29 a.m. UTC
After running unplug disk test and unmount filesystem, the umount thread
hung all the time.

 crash> dmesg
 sd 0:0:0:0: rejecting I/O to offline device
 XFS (sda): log I/O error -5
 XFS (sda): Corruption of in-memory data (0x8) detected at xfs_defer_finish_noroll+0x12e0/0x1cf0
	(fs/xfs/libxfs/xfs_defer.c:504).  Shutting down filesystem.
 XFS (sda): Please unmount the filesystem and rectify the problem(s)
 XFS (sda): xfs_inactive_ifree: xfs_trans_commit returned error -5
 XFS (sda): Unmounting Filesystem

 crash> bt 3368
 PID: 3368   TASK: ffff88801bcd8040  CPU: 3   COMMAND: "umount"
  #0 [ffffc900086a7ae0] __schedule at ffffffff83d3fd25
  #1 [ffffc900086a7be8] schedule at ffffffff83d414dd
  #2 [ffffc900086a7c10] xfs_ail_push_all_sync at ffffffff8256db24
  #3 [ffffc900086a7d18] xfs_unmount_flush_inodes at ffffffff824ee7e2
  #4 [ffffc900086a7d28] xfs_unmountfs at ffffffff824f2eff
  #5 [ffffc900086a7da8] xfs_fs_put_super at ffffffff82503e69
  #6 [ffffc900086a7de8] generic_shutdown_super at ffffffff81aeb8cd
  #7 [ffffc900086a7e10] kill_block_super at ffffffff81aefcfa
  #8 [ffffc900086a7e30] deactivate_locked_super at ffffffff81aeb2da
  #9 [ffffc900086a7e48] deactivate_super at ffffffff81aeb639
 #10 [ffffc900086a7e68] cleanup_mnt at ffffffff81b6ddd5
 #11 [ffffc900086a7ea0] __cleanup_mnt at ffffffff81b6dfdf
 #12 [ffffc900086a7eb0] task_work_run at ffffffff8126e5cf
 #13 [ffffc900086a7ef8] exit_to_user_mode_prepare at ffffffff813fa136
 #14 [ffffc900086a7f28] syscall_exit_to_user_mode at ffffffff83d25dbb
 #15 [ffffc900086a7f40] do_syscall_64 at ffffffff83d1f8d9
 #16 [ffffc900086a7f50] entry_SYSCALL_64_after_hwframe at ffffffff83e00085

When we free a cluster buffer from xfs_ifree_cluster, all the inodes in
cache are marked XFS_ISTALE. On journal commit dirty stale inodes as are
handled by both buffer and inode log items, inodes marked as XFS_ISTALE
in AIL will be removed from the AIL because the buffer log item will clean
it. If the transaction commit fails in the xfs_inactive_ifree(), inodes
marked as XFS_ISTALE will be left in AIL due to buf log item is not
committed, this will cause the unmount thread above to be blocked all the
time. Abort inodes flushing associated with the buffer that is stale when
buf item release, prevent inode item left in AIL and can not being pushed.

Signed-off-by: Long Li <leo.lilong@huawei.com>
---
 fs/xfs/xfs_buf_item.c | 19 +++++++++++++++++--
 1 file changed, 17 insertions(+), 2 deletions(-)

Comments

Darrick J. Wong March 1, 2023, 1:05 a.m. UTC | #1
On Mon, Feb 27, 2023 at 02:29:52PM +0800, Long Li wrote:
> After running unplug disk test and unmount filesystem, the umount thread
> hung all the time.
> 
>  crash> dmesg
>  sd 0:0:0:0: rejecting I/O to offline device
>  XFS (sda): log I/O error -5
>  XFS (sda): Corruption of in-memory data (0x8) detected at xfs_defer_finish_noroll+0x12e0/0x1cf0
> 	(fs/xfs/libxfs/xfs_defer.c:504).  Shutting down filesystem.
>  XFS (sda): Please unmount the filesystem and rectify the problem(s)
>  XFS (sda): xfs_inactive_ifree: xfs_trans_commit returned error -5
>  XFS (sda): Unmounting Filesystem
> 
>  crash> bt 3368
>  PID: 3368   TASK: ffff88801bcd8040  CPU: 3   COMMAND: "umount"
>   #0 [ffffc900086a7ae0] __schedule at ffffffff83d3fd25
>   #1 [ffffc900086a7be8] schedule at ffffffff83d414dd
>   #2 [ffffc900086a7c10] xfs_ail_push_all_sync at ffffffff8256db24
>   #3 [ffffc900086a7d18] xfs_unmount_flush_inodes at ffffffff824ee7e2
>   #4 [ffffc900086a7d28] xfs_unmountfs at ffffffff824f2eff
>   #5 [ffffc900086a7da8] xfs_fs_put_super at ffffffff82503e69
>   #6 [ffffc900086a7de8] generic_shutdown_super at ffffffff81aeb8cd
>   #7 [ffffc900086a7e10] kill_block_super at ffffffff81aefcfa
>   #8 [ffffc900086a7e30] deactivate_locked_super at ffffffff81aeb2da
>   #9 [ffffc900086a7e48] deactivate_super at ffffffff81aeb639
>  #10 [ffffc900086a7e68] cleanup_mnt at ffffffff81b6ddd5
>  #11 [ffffc900086a7ea0] __cleanup_mnt at ffffffff81b6dfdf
>  #12 [ffffc900086a7eb0] task_work_run at ffffffff8126e5cf
>  #13 [ffffc900086a7ef8] exit_to_user_mode_prepare at ffffffff813fa136
>  #14 [ffffc900086a7f28] syscall_exit_to_user_mode at ffffffff83d25dbb
>  #15 [ffffc900086a7f40] do_syscall_64 at ffffffff83d1f8d9
>  #16 [ffffc900086a7f50] entry_SYSCALL_64_after_hwframe at ffffffff83e00085
> 
> When we free a cluster buffer from xfs_ifree_cluster, all the inodes in
> cache are marked XFS_ISTALE. On journal commit dirty stale inodes as are
> handled by both buffer and inode log items, inodes marked as XFS_ISTALE
> in AIL will be removed from the AIL because the buffer log item will clean
> it. If the transaction commit fails in the xfs_inactive_ifree(), inodes
> marked as XFS_ISTALE will be left in AIL due to buf log item is not
> committed,

Ah.  So the inode log items *are* in the AIL, but the buffer log item
for the inode cluster buffer is /not/ in the AIL?

Is it possible for neither inode nor cluster buffer are in the AIL?
I think the answer is no because freeing the inode will put it in the
AIL?

> this will cause the unmount thread above to be blocked all the
> time. Abort inodes flushing associated with the buffer that is stale when
> buf item release, prevent inode item left in AIL and can not being pushed.
> 
> Signed-off-by: Long Li <leo.lilong@huawei.com>
> ---
>  fs/xfs/xfs_buf_item.c | 19 +++++++++++++++++--
>  1 file changed, 17 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/xfs/xfs_buf_item.c b/fs/xfs/xfs_buf_item.c
> index df7322ed73fa..825e638d1088 100644
> --- a/fs/xfs/xfs_buf_item.c
> +++ b/fs/xfs/xfs_buf_item.c
> @@ -659,14 +659,16 @@ xfs_buf_item_release(
>  {
>  	struct xfs_buf_log_item	*bip = BUF_ITEM(lip);
>  	struct xfs_buf		*bp = bip->bli_buf;
> +	struct xfs_inode_log_item *iip;
> +	struct xfs_log_item     *lp, *n;

tabs here not spaces       ^^^^^

>  	bool			released;
>  	bool			hold = bip->bli_flags & XFS_BLI_HOLD;
>  	bool			stale = bip->bli_flags & XFS_BLI_STALE;
> +	bool			aborted = test_bit(XFS_LI_ABORTED,
> +						   &lip->li_flags);
>  #if defined(DEBUG) || defined(XFS_WARN)
>  	bool			ordered = bip->bli_flags & XFS_BLI_ORDERED;
>  	bool			dirty = bip->bli_flags & XFS_BLI_DIRTY;
> -	bool			aborted = test_bit(XFS_LI_ABORTED,
> -						   &lip->li_flags);
>  #endif
>  
>  	trace_xfs_buf_item_release(bip);
> @@ -679,6 +681,19 @@ xfs_buf_item_release(
>  	       (ordered && dirty && !xfs_buf_item_dirty_format(bip)));
>  	ASSERT(!stale || (bip->__bli_format.blf_flags & XFS_BLF_CANCEL));
>  
> +	/*
> +	 * If it is an inode buffer and item marked as stale, abort flushing
> +	 * inodes associated with the buf, prevent inode item left in AIL.
> +	 */
> +	if (aborted && (bip->bli_flags & XFS_BLI_STALE_INODE)) {
> +		list_for_each_entry_safe(lp, n, &bp->b_li_list, li_bio_list) {
> +			iip = (struct xfs_inode_log_item *)lp;

Use container_of(), not raw casting.

> +
> +			if (xfs_iflags_test(iip->ili_inode, XFS_ISTALE))
> +				xfs_iflush_abort(iip->ili_inode);
> +		}
> +	}
> +
>  	/*
>  	 * Clear the buffer's association with this transaction and
>  	 * per-transaction state from the bli, which has been copied above.
> -- 
> 2.31.1
>
Dave Chinner March 1, 2023, 5:01 a.m. UTC | #2
On Tue, Feb 28, 2023 at 05:05:26PM -0800, Darrick J. Wong wrote:
> On Mon, Feb 27, 2023 at 02:29:52PM +0800, Long Li wrote:
> > After running unplug disk test and unmount filesystem, the umount thread
> > hung all the time.
> > 
> >  crash> dmesg
> >  sd 0:0:0:0: rejecting I/O to offline device
> >  XFS (sda): log I/O error -5
> >  XFS (sda): Corruption of in-memory data (0x8) detected at xfs_defer_finish_noroll+0x12e0/0x1cf0
> > 	(fs/xfs/libxfs/xfs_defer.c:504).  Shutting down filesystem.
> >  XFS (sda): Please unmount the filesystem and rectify the problem(s)
> >  XFS (sda): xfs_inactive_ifree: xfs_trans_commit returned error -5
> >  XFS (sda): Unmounting Filesystem
> > 
> >  crash> bt 3368
> >  PID: 3368   TASK: ffff88801bcd8040  CPU: 3   COMMAND: "umount"
> >   #0 [ffffc900086a7ae0] __schedule at ffffffff83d3fd25
> >   #1 [ffffc900086a7be8] schedule at ffffffff83d414dd
> >   #2 [ffffc900086a7c10] xfs_ail_push_all_sync at ffffffff8256db24
> >   #3 [ffffc900086a7d18] xfs_unmount_flush_inodes at ffffffff824ee7e2
> >   #4 [ffffc900086a7d28] xfs_unmountfs at ffffffff824f2eff
> >   #5 [ffffc900086a7da8] xfs_fs_put_super at ffffffff82503e69
> >   #6 [ffffc900086a7de8] generic_shutdown_super at ffffffff81aeb8cd
> >   #7 [ffffc900086a7e10] kill_block_super at ffffffff81aefcfa
> >   #8 [ffffc900086a7e30] deactivate_locked_super at ffffffff81aeb2da
> >   #9 [ffffc900086a7e48] deactivate_super at ffffffff81aeb639
> >  #10 [ffffc900086a7e68] cleanup_mnt at ffffffff81b6ddd5
> >  #11 [ffffc900086a7ea0] __cleanup_mnt at ffffffff81b6dfdf
> >  #12 [ffffc900086a7eb0] task_work_run at ffffffff8126e5cf
> >  #13 [ffffc900086a7ef8] exit_to_user_mode_prepare at ffffffff813fa136
> >  #14 [ffffc900086a7f28] syscall_exit_to_user_mode at ffffffff83d25dbb
> >  #15 [ffffc900086a7f40] do_syscall_64 at ffffffff83d1f8d9
> >  #16 [ffffc900086a7f50] entry_SYSCALL_64_after_hwframe at ffffffff83e00085
> > 
> > When we free a cluster buffer from xfs_ifree_cluster, all the inodes in
> > cache are marked XFS_ISTALE. On journal commit dirty stale inodes as are
> > handled by both buffer and inode log items, inodes marked as XFS_ISTALE
> > in AIL will be removed from the AIL because the buffer log item will clean
> > it. If the transaction commit fails in the xfs_inactive_ifree(), inodes
> > marked as XFS_ISTALE will be left in AIL due to buf log item is not
> > committed,
> 
> Ah.  So the inode log items *are* in the AIL, but the buffer log item
> for the inode cluster buffer is /not/ in the AIL?

Which is the rare case, and I think can only happen if an unlinked
file is held open until the unlinked list mods that last logged the
buffer have been written to disk. We can keep modifying the inode
and having it logged while the buffer is clean and has no active log
item...


> Is it possible for neither inode nor cluster buffer are in the AIL?
> I think the answer is no because freeing the inode will put it in the
> AIL?

I think the answer is yes, because after an unlink the buffer log
item should be in the AIL at the same LSN as the inode log item when
the unlink transaction updated both of them. Pushing a dirty inode
flush all the dirty inodes and so both the inode and buffer items
get cleaned and removed from the AIL in the same IO completion.

Hence if the unlinked inode has been held open long enough for
metadata writeback to complete, close() can trigger inactivation on
both a clean inode cluster buffer and clean inode log item. i.e.
neither are in the AIL at the time the inactivation and inode chunk
freeing starts, and the commit has to insert both.


> > @@ -679,6 +681,19 @@ xfs_buf_item_release(
> >  	       (ordered && dirty && !xfs_buf_item_dirty_format(bip)));
> >  	ASSERT(!stale || (bip->__bli_format.blf_flags & XFS_BLF_CANCEL));
> >  
> > +	/*
> > +	 * If it is an inode buffer and item marked as stale, abort flushing
> > +	 * inodes associated with the buf, prevent inode item left in AIL.
> > +	 */
> > +	if (aborted && (bip->bli_flags & XFS_BLI_STALE_INODE)) {
> > +		list_for_each_entry_safe(lp, n, &bp->b_li_list, li_bio_list) {
> > +			iip = (struct xfs_inode_log_item *)lp;
> 
> Use container_of(), not raw casting.
> 
> > +
> > +			if (xfs_iflags_test(iip->ili_inode, XFS_ISTALE))
> > +				xfs_iflush_abort(iip->ili_inode);
> > +		}
> > +	}
> > +

This is closer to the sort of fix that is needed, but this should
not be done until the last reference to the buf log item goes away.
i.e. in xfs_buf_item_put().

But then I look at the same conditions in xfs_buf_item_unpin(),
which is the normal path that runs this stale inode cleanup, it does
this for stale buffers if it drops the last reference to the buf log
item.

                /*
                 * If we get called here because of an IO error, we may or may
                 * not have the item on the AIL. xfs_trans_ail_delete() will
                 * take care of that situation. xfs_trans_ail_delete() drops
                 * the AIL lock.
                 */
                if (bip->bli_flags & XFS_BLI_STALE_INODE) {
                        xfs_buf_item_done(bp);
                        xfs_buf_inode_iodone(bp);
                        ASSERT(list_empty(&bp->b_li_list));
                } else {
                        xfs_trans_ail_delete(lip, SHUTDOWN_LOG_IO_ERROR);
                        xfs_buf_item_relse(bp);
                        ASSERT(bp->b_log_item == NULL);
                }
		xfs_buf_relse(bp);

And if the buffer is not stale, then it runs it through
xfs_buf_ioend_fail() to actually mark the attached log items as
failed.

So it seems to me that the cleanup needed here is more complex than
unconditionally aborting stale inodes, but I haven't had a chance to
think it through fully yet. This is one of the more complex corners
of the buffer/inode item life cycles, and it's been a source of
shutdown issues for a long time....

-Dave.
Long Li March 9, 2023, 2:10 p.m. UTC | #3
On Wed, Mar 01, 2023 at 04:01:35PM +1100, Dave Chinner wrote:
> On Tue, Feb 28, 2023 at 05:05:26PM -0800, Darrick J. Wong wrote:
> > On Mon, Feb 27, 2023 at 02:29:52PM +0800, Long Li wrote:
> > > After running unplug disk test and unmount filesystem, the umount thread
> > > hung all the time.
> > > 
> > >  crash> dmesg
> > >  sd 0:0:0:0: rejecting I/O to offline device
> > >  XFS (sda): log I/O error -5
> > >  XFS (sda): Corruption of in-memory data (0x8) detected at xfs_defer_finish_noroll+0x12e0/0x1cf0
> > > 	(fs/xfs/libxfs/xfs_defer.c:504).  Shutting down filesystem.
> > >  XFS (sda): Please unmount the filesystem and rectify the problem(s)
> > >  XFS (sda): xfs_inactive_ifree: xfs_trans_commit returned error -5
> > >  XFS (sda): Unmounting Filesystem
> > > 
> > >  crash> bt 3368
> > >  PID: 3368   TASK: ffff88801bcd8040  CPU: 3   COMMAND: "umount"
> > >   #0 [ffffc900086a7ae0] __schedule at ffffffff83d3fd25
> > >   #1 [ffffc900086a7be8] schedule at ffffffff83d414dd
> > >   #2 [ffffc900086a7c10] xfs_ail_push_all_sync at ffffffff8256db24
> > >   #3 [ffffc900086a7d18] xfs_unmount_flush_inodes at ffffffff824ee7e2
> > >   #4 [ffffc900086a7d28] xfs_unmountfs at ffffffff824f2eff
> > >   #5 [ffffc900086a7da8] xfs_fs_put_super at ffffffff82503e69
> > >   #6 [ffffc900086a7de8] generic_shutdown_super at ffffffff81aeb8cd
> > >   #7 [ffffc900086a7e10] kill_block_super at ffffffff81aefcfa
> > >   #8 [ffffc900086a7e30] deactivate_locked_super at ffffffff81aeb2da
> > >   #9 [ffffc900086a7e48] deactivate_super at ffffffff81aeb639
> > >  #10 [ffffc900086a7e68] cleanup_mnt at ffffffff81b6ddd5
> > >  #11 [ffffc900086a7ea0] __cleanup_mnt at ffffffff81b6dfdf
> > >  #12 [ffffc900086a7eb0] task_work_run at ffffffff8126e5cf
> > >  #13 [ffffc900086a7ef8] exit_to_user_mode_prepare at ffffffff813fa136
> > >  #14 [ffffc900086a7f28] syscall_exit_to_user_mode at ffffffff83d25dbb
> > >  #15 [ffffc900086a7f40] do_syscall_64 at ffffffff83d1f8d9
> > >  #16 [ffffc900086a7f50] entry_SYSCALL_64_after_hwframe at ffffffff83e00085
> > > 
> > > When we free a cluster buffer from xfs_ifree_cluster, all the inodes in
> > > cache are marked XFS_ISTALE. On journal commit dirty stale inodes as are
> > > handled by both buffer and inode log items, inodes marked as XFS_ISTALE
> > > in AIL will be removed from the AIL because the buffer log item will clean
> > > it. If the transaction commit fails in the xfs_inactive_ifree(), inodes
> > > marked as XFS_ISTALE will be left in AIL due to buf log item is not
> > > committed,
> > 
> > Ah.  So the inode log items *are* in the AIL, but the buffer log item
> > for the inode cluster buffer is /not/ in the AIL?
> 
> Which is the rare case, and I think can only happen if an unlinked
> file is held open until the unlinked list mods that last logged the
> buffer have been written to disk. We can keep modifying the inode
> and having it logged while the buffer is clean and has no active log
> item...
> 

Yes, buffer log item may be in the AIL, but it could be delete from AIL by
xfs_buf_item_release when transaction commit fail.
> 
> > Is it possible for neither inode nor cluster buffer are in the AIL?
> > I think the answer is no because freeing the inode will put it in the
> > AIL?
> 
> I think the answer is yes, because after an unlink the buffer log
> item should be in the AIL at the same LSN as the inode log item when
> the unlink transaction updated both of them. Pushing a dirty inode
> flush all the dirty inodes and so both the inode and buffer items
> get cleaned and removed from the AIL in the same IO completion.
> 
> Hence if the unlinked inode has been held open long enough for
> metadata writeback to complete, close() can trigger inactivation on
> both a clean inode cluster buffer and clean inode log item. i.e.
> neither are in the AIL at the time the inactivation and inode chunk
> freeing starts, and the commit has to insert both.
> 
> 
> > > @@ -679,6 +681,19 @@ xfs_buf_item_release(
> > >  	       (ordered && dirty && !xfs_buf_item_dirty_format(bip)));
> > >  	ASSERT(!stale || (bip->__bli_format.blf_flags & XFS_BLF_CANCEL));
> > >  
> > > +	/*
> > > +	 * If it is an inode buffer and item marked as stale, abort flushing
> > > +	 * inodes associated with the buf, prevent inode item left in AIL.
> > > +	 */
> > > +	if (aborted && (bip->bli_flags & XFS_BLI_STALE_INODE)) {
> > > +		list_for_each_entry_safe(lp, n, &bp->b_li_list, li_bio_list) {
> > > +			iip = (struct xfs_inode_log_item *)lp;
> > 
> > Use container_of(), not raw casting.
> > 
> > > +
> > > +			if (xfs_iflags_test(iip->ili_inode, XFS_ISTALE))
> > > +				xfs_iflush_abort(iip->ili_inode);
> > > +		}
> > > +	}
> > > +
> 
> This is closer to the sort of fix that is needed, but this should
> not be done until the last reference to the buf log item goes away.
> i.e. in xfs_buf_item_put().
> 
Agree with you, Take a look at the other code, aborting stale inodes are
after last reference to the buf log item.

> But then I look at the same conditions in xfs_buf_item_unpin(),
> which is the normal path that runs this stale inode cleanup, it does
> this for stale buffers if it drops the last reference to the buf log
> item.
> 
>                 /*
>                  * If we get called here because of an IO error, we may or may
>                  * not have the item on the AIL. xfs_trans_ail_delete() will
>                  * take care of that situation. xfs_trans_ail_delete() drops
>                  * the AIL lock.
>                  */
>                 if (bip->bli_flags & XFS_BLI_STALE_INODE) {
>                         xfs_buf_item_done(bp);
>                         xfs_buf_inode_iodone(bp);
>                         ASSERT(list_empty(&bp->b_li_list));
>                 } else {
>                         xfs_trans_ail_delete(lip, SHUTDOWN_LOG_IO_ERROR);
>                         xfs_buf_item_relse(bp);
>                         ASSERT(bp->b_log_item == NULL);
>                 }
> 		xfs_buf_relse(bp);
> 
> And if the buffer is not stale, then it runs it through
> xfs_buf_ioend_fail() to actually mark the attached log items as
> failed.
> 
> So it seems to me that the cleanup needed here is more complex than
> unconditionally aborting stale inodes, but I haven't had a chance to
> think it through fully yet. This is one of the more complex corners
> of the buffer/inode item life cycles, and it's been a source of
> shutdown issues for a long time....
> 
Sorry I replied so late, I tried to think a little clearer. The normal path
that runs stale inode cleanup in xfs_buf_item_unpin() just release buf log
item and aborting stale inodes, three will be no non-stale inode in the buf
b_li_list list because of buf lock, so I think it is enougth aborting stale
inodes. There may be something I haven't thought through, thanks for pointing
it out.

As you said, it is a complex corners of the buffer/inode item life cycles.
The problem [1] about inode log item UAF that I tried to solve previously,
it can't be solved simply due to inode log item life cycles, and I don't 
have a new solution yet.

[1] https://patchwork.kernel.org/project/xfs/patch/20230211022941.GA1515023@ceph-admin/

Thanks
Long Li
diff mbox series

Patch

diff --git a/fs/xfs/xfs_buf_item.c b/fs/xfs/xfs_buf_item.c
index df7322ed73fa..825e638d1088 100644
--- a/fs/xfs/xfs_buf_item.c
+++ b/fs/xfs/xfs_buf_item.c
@@ -659,14 +659,16 @@  xfs_buf_item_release(
 {
 	struct xfs_buf_log_item	*bip = BUF_ITEM(lip);
 	struct xfs_buf		*bp = bip->bli_buf;
+	struct xfs_inode_log_item *iip;
+	struct xfs_log_item     *lp, *n;
 	bool			released;
 	bool			hold = bip->bli_flags & XFS_BLI_HOLD;
 	bool			stale = bip->bli_flags & XFS_BLI_STALE;
+	bool			aborted = test_bit(XFS_LI_ABORTED,
+						   &lip->li_flags);
 #if defined(DEBUG) || defined(XFS_WARN)
 	bool			ordered = bip->bli_flags & XFS_BLI_ORDERED;
 	bool			dirty = bip->bli_flags & XFS_BLI_DIRTY;
-	bool			aborted = test_bit(XFS_LI_ABORTED,
-						   &lip->li_flags);
 #endif
 
 	trace_xfs_buf_item_release(bip);
@@ -679,6 +681,19 @@  xfs_buf_item_release(
 	       (ordered && dirty && !xfs_buf_item_dirty_format(bip)));
 	ASSERT(!stale || (bip->__bli_format.blf_flags & XFS_BLF_CANCEL));
 
+	/*
+	 * If it is an inode buffer and item marked as stale, abort flushing
+	 * inodes associated with the buf, prevent inode item left in AIL.
+	 */
+	if (aborted && (bip->bli_flags & XFS_BLI_STALE_INODE)) {
+		list_for_each_entry_safe(lp, n, &bp->b_li_list, li_bio_list) {
+			iip = (struct xfs_inode_log_item *)lp;
+
+			if (xfs_iflags_test(iip->ili_inode, XFS_ISTALE))
+				xfs_iflush_abort(iip->ili_inode);
+		}
+	}
+
 	/*
 	 * Clear the buffer's association with this transaction and
 	 * per-transaction state from the bli, which has been copied above.