diff mbox series

[2/3] xfs: abort intent items when recovery intents fail

Message ID 20230629131725.945004-3-leo.lilong@huawei.com (mailing list archive)
State Superseded
Headers show
Series xfs: fix two problem when recovery intents fails | expand

Commit Message

Long Li June 29, 2023, 1:17 p.m. UTC
When recovery intents, it may capture some deferred ops and commit the new
intent items, if recovery intents fails, there will be no done item drop
the reference to the new intent item. New intent items will left in AIL
and caused mount thread hung all the time as fllows:

    [root@localhost ~]# cat /proc/539/stack
    [<0>] xfs_ail_push_all_sync+0x174/0x230
    [<0>] xfs_unmount_flush_inodes+0x8d/0xd0
    [<0>] xfs_mountfs+0x15f7/0x1e70
    [<0>] xfs_fs_fill_super+0x10ec/0x1b20
    [<0>] get_tree_bdev+0x3c8/0x730
    [<0>] vfs_get_tree+0x89/0x2c0
    [<0>] path_mount+0xecf/0x1800
    [<0>] do_mount+0xf3/0x110
    [<0>] __x64_sys_mount+0x154/0x1f0
    [<0>] do_syscall_64+0x39/0x80
    [<0>] entry_SYSCALL_64_after_hwframe+0x63/0xcd

During intent item recover, if transaction that have deferred ops commmit
fails in xfs_defer_ops_capture_and_commit(), defer capture would not added
to capture list, so matching done items would not commit when finish defer
operations, this will cause intent item leaks:

unreferenced object 0xffff888016719108 (size 432):
  comm "mount", pid 529, jiffies 4294706839 (age 144.463s)
  hex dump (first 32 bytes):
    08 91 71 16 80 88 ff ff 08 91 71 16 80 88 ff ff  ..q.......q.....
    18 91 71 16 80 88 ff ff 18 91 71 16 80 88 ff ff  ..q.......q.....
  backtrace:
    [<ffffffff8230c68f>] xfs_efi_init+0x18f/0x1d0
    [<ffffffff8230c720>] xfs_extent_free_create_intent+0x50/0x150
    [<ffffffff821b671a>] xfs_defer_create_intents+0x16a/0x340
    [<ffffffff821bac3e>] xfs_defer_ops_capture_and_commit+0x8e/0xad0
    [<ffffffff82322bb9>] xfs_cui_item_recover+0x819/0x980
    [<ffffffff823289b6>] xlog_recover_process_intents+0x246/0xb70
    [<ffffffff8233249a>] xlog_recover_finish+0x8a/0x9a0
    [<ffffffff822eeafb>] xfs_log_mount_finish+0x2bb/0x4a0
    [<ffffffff822c0f4f>] xfs_mountfs+0x14bf/0x1e70
    [<ffffffff822d1f80>] xfs_fs_fill_super+0x10d0/0x1b20
    [<ffffffff81a21fa2>] get_tree_bdev+0x3d2/0x6d0
    [<ffffffff81a1ee09>] vfs_get_tree+0x89/0x2c0
    [<ffffffff81a9f35f>] path_mount+0xecf/0x1800
    [<ffffffff81a9fd83>] do_mount+0xf3/0x110
    [<ffffffff81aa00e4>] __x64_sys_mount+0x154/0x1f0
    [<ffffffff83968739>] do_syscall_64+0x39/0x80

Fix the problem above by abort intent items that don't have a done item
when recovery intents fail.

Fixes: e6fff81e4870 ("xfs: proper replay of deferred ops queued during log recovery")
Signed-off-by: Long Li <leo.lilong@huawei.com>
---
 fs/xfs/libxfs/xfs_defer.c | 1 +
 fs/xfs/libxfs/xfs_defer.h | 1 +
 fs/xfs/xfs_log_recover.c  | 1 +
 3 files changed, 3 insertions(+)

Comments

Darrick J. Wong June 29, 2023, 2:42 p.m. UTC | #1
On Thu, Jun 29, 2023 at 09:17:24PM +0800, Long Li wrote:
> When recovery intents, it may capture some deferred ops and commit the new
> intent items, if recovery intents fails, there will be no done item drop
> the reference to the new intent item. New intent items will left in AIL
> and caused mount thread hung all the time as fllows:

Er... let me try rewriting this a bit:

"When recovering intents, we capture newly created intent items as part
of committing recovered intent items.  If intent recovery fails at a
later point, we forget to remove those newly created intent items from
the AIL and hang:

> 
>     [root@localhost ~]# cat /proc/539/stack
>     [<0>] xfs_ail_push_all_sync+0x174/0x230
>     [<0>] xfs_unmount_flush_inodes+0x8d/0xd0
>     [<0>] xfs_mountfs+0x15f7/0x1e70
>     [<0>] xfs_fs_fill_super+0x10ec/0x1b20
>     [<0>] get_tree_bdev+0x3c8/0x730
>     [<0>] vfs_get_tree+0x89/0x2c0
>     [<0>] path_mount+0xecf/0x1800
>     [<0>] do_mount+0xf3/0x110
>     [<0>] __x64_sys_mount+0x154/0x1f0
>     [<0>] do_syscall_64+0x39/0x80
>     [<0>] entry_SYSCALL_64_after_hwframe+0x63/0xcd
> 
> During intent item recover, if transaction that have deferred ops commmit
> fails in xfs_defer_ops_capture_and_commit(), defer capture would not added
> to capture list, so matching done items would not commit when finish defer
> operations, this will cause intent item leaks:

"Intent recovery hasn't created done items for these newly created
intent items, so the capture structure is the sole owner of the captured
intent items.  We must release them explicitly or else they leak:

> unreferenced object 0xffff888016719108 (size 432):
>   comm "mount", pid 529, jiffies 4294706839 (age 144.463s)
>   hex dump (first 32 bytes):
>     08 91 71 16 80 88 ff ff 08 91 71 16 80 88 ff ff  ..q.......q.....
>     18 91 71 16 80 88 ff ff 18 91 71 16 80 88 ff ff  ..q.......q.....
>   backtrace:
>     [<ffffffff8230c68f>] xfs_efi_init+0x18f/0x1d0
>     [<ffffffff8230c720>] xfs_extent_free_create_intent+0x50/0x150
>     [<ffffffff821b671a>] xfs_defer_create_intents+0x16a/0x340
>     [<ffffffff821bac3e>] xfs_defer_ops_capture_and_commit+0x8e/0xad0
>     [<ffffffff82322bb9>] xfs_cui_item_recover+0x819/0x980
>     [<ffffffff823289b6>] xlog_recover_process_intents+0x246/0xb70
>     [<ffffffff8233249a>] xlog_recover_finish+0x8a/0x9a0
>     [<ffffffff822eeafb>] xfs_log_mount_finish+0x2bb/0x4a0
>     [<ffffffff822c0f4f>] xfs_mountfs+0x14bf/0x1e70
>     [<ffffffff822d1f80>] xfs_fs_fill_super+0x10d0/0x1b20
>     [<ffffffff81a21fa2>] get_tree_bdev+0x3d2/0x6d0
>     [<ffffffff81a1ee09>] vfs_get_tree+0x89/0x2c0
>     [<ffffffff81a9f35f>] path_mount+0xecf/0x1800
>     [<ffffffff81a9fd83>] do_mount+0xf3/0x110
>     [<ffffffff81aa00e4>] __x64_sys_mount+0x154/0x1f0
>     [<ffffffff83968739>] do_syscall_64+0x39/0x80
> 
> Fix the problem above by abort intent items that don't have a done item
> when recovery intents fail.
> 
> Fixes: e6fff81e4870 ("xfs: proper replay of deferred ops queued during log recovery")
> Signed-off-by: Long Li <leo.lilong@huawei.com>
> ---
>  fs/xfs/libxfs/xfs_defer.c | 1 +
>  fs/xfs/libxfs/xfs_defer.h | 1 +
>  fs/xfs/xfs_log_recover.c  | 1 +
>  3 files changed, 3 insertions(+)
> 
> diff --git a/fs/xfs/libxfs/xfs_defer.c b/fs/xfs/libxfs/xfs_defer.c
> index 7ec6812fa625..b2b46d142281 100644
> --- a/fs/xfs/libxfs/xfs_defer.c
> +++ b/fs/xfs/libxfs/xfs_defer.c
> @@ -809,6 +809,7 @@ xfs_defer_ops_capture_and_commit(
>  	/* Commit the transaction and add the capture structure to the list. */
>  	error = xfs_trans_commit(tp);
>  	if (error) {
> +		xfs_defer_pending_abort(mp, &dfc->dfc_dfops);
>  		xfs_defer_ops_capture_free(mp, dfc);

I prefer that we not go extern'ing two functions that mess around with
internal state.  Could you instead add the xfs_defer_pending_abort call
to the start of xfs_defer_ops_capture_free, and rename
xfs_defer_ops_capture_free to xfs_defer_ops_capture_abort?

Other than those two things, I /think/ this looks correct.  Assuming my
understanding of the problem is reflected in the tweaks I made to the
commit message. ;)

--D

>  		return error;
>  	}
> diff --git a/fs/xfs/libxfs/xfs_defer.h b/fs/xfs/libxfs/xfs_defer.h
> index 114a3a4930a3..c3775014f7ab 100644
> --- a/fs/xfs/libxfs/xfs_defer.h
> +++ b/fs/xfs/libxfs/xfs_defer.h
> @@ -37,6 +37,7 @@ struct xfs_defer_pending {
>  	enum xfs_defer_ops_type		dfp_type;
>  };
>  
> +void xfs_defer_pending_abort(struct xfs_mount *mp, struct list_head *dop_list);
>  void xfs_defer_add(struct xfs_trans *tp, enum xfs_defer_ops_type type,
>  		struct list_head *h);
>  int xfs_defer_finish_noroll(struct xfs_trans **tp);
> diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
> index 82c81d20459d..924beecf07bb 100644
> --- a/fs/xfs/xfs_log_recover.c
> +++ b/fs/xfs/xfs_log_recover.c
> @@ -2511,6 +2511,7 @@ xlog_abort_defer_ops(
>  
>  	list_for_each_entry_safe(dfc, next, capture_list, dfc_list) {
>  		list_del_init(&dfc->dfc_list);
> +		xfs_defer_pending_abort(mp, &dfc->dfc_dfops);
>  		xfs_defer_ops_capture_free(mp, dfc);
>  	}
>  }
> -- 
> 2.31.1
>
Long Li June 30, 2023, 1:51 a.m. UTC | #2
On Thu, Jun 29, 2023 at 07:42:36AM -0700, Darrick J. Wong wrote:
> On Thu, Jun 29, 2023 at 09:17:24PM +0800, Long Li wrote:
> > When recovery intents, it may capture some deferred ops and commit the new
> > intent items, if recovery intents fails, there will be no done item drop
> > the reference to the new intent item. New intent items will left in AIL
> > and caused mount thread hung all the time as fllows:
> 
> Er... let me try rewriting this a bit:
> 

I have tried my best to express it clearly, but there are still some
language barriers. Thank you very much for making the commit message
clearer.

> "When recovering intents, we capture newly created intent items as part
> of committing recovered intent items.  If intent recovery fails at a
> later point, we forget to remove those newly created intent items from
> the AIL and hang:
> 
> > 
> >     [root@localhost ~]# cat /proc/539/stack
> >     [<0>] xfs_ail_push_all_sync+0x174/0x230
> >     [<0>] xfs_unmount_flush_inodes+0x8d/0xd0
> >     [<0>] xfs_mountfs+0x15f7/0x1e70
> >     [<0>] xfs_fs_fill_super+0x10ec/0x1b20
> >     [<0>] get_tree_bdev+0x3c8/0x730
> >     [<0>] vfs_get_tree+0x89/0x2c0
> >     [<0>] path_mount+0xecf/0x1800
> >     [<0>] do_mount+0xf3/0x110
> >     [<0>] __x64_sys_mount+0x154/0x1f0
> >     [<0>] do_syscall_64+0x39/0x80
> >     [<0>] entry_SYSCALL_64_after_hwframe+0x63/0xcd
> > 
> > During intent item recover, if transaction that have deferred ops commmit
> > fails in xfs_defer_ops_capture_and_commit(), defer capture would not added
> > to capture list, so matching done items would not commit when finish defer
> > operations, this will cause intent item leaks:
> 
> "Intent recovery hasn't created done items for these newly created
> intent items, so the capture structure is the sole owner of the captured
> intent items.  We must release them explicitly or else they leak:
> 
> > unreferenced object 0xffff888016719108 (size 432):
> >   comm "mount", pid 529, jiffies 4294706839 (age 144.463s)
> >   hex dump (first 32 bytes):
> >     08 91 71 16 80 88 ff ff 08 91 71 16 80 88 ff ff  ..q.......q.....
> >     18 91 71 16 80 88 ff ff 18 91 71 16 80 88 ff ff  ..q.......q.....
> >   backtrace:
> >     [<ffffffff8230c68f>] xfs_efi_init+0x18f/0x1d0
> >     [<ffffffff8230c720>] xfs_extent_free_create_intent+0x50/0x150
> >     [<ffffffff821b671a>] xfs_defer_create_intents+0x16a/0x340
> >     [<ffffffff821bac3e>] xfs_defer_ops_capture_and_commit+0x8e/0xad0
> >     [<ffffffff82322bb9>] xfs_cui_item_recover+0x819/0x980
> >     [<ffffffff823289b6>] xlog_recover_process_intents+0x246/0xb70
> >     [<ffffffff8233249a>] xlog_recover_finish+0x8a/0x9a0
> >     [<ffffffff822eeafb>] xfs_log_mount_finish+0x2bb/0x4a0
> >     [<ffffffff822c0f4f>] xfs_mountfs+0x14bf/0x1e70
> >     [<ffffffff822d1f80>] xfs_fs_fill_super+0x10d0/0x1b20
> >     [<ffffffff81a21fa2>] get_tree_bdev+0x3d2/0x6d0
> >     [<ffffffff81a1ee09>] vfs_get_tree+0x89/0x2c0
> >     [<ffffffff81a9f35f>] path_mount+0xecf/0x1800
> >     [<ffffffff81a9fd83>] do_mount+0xf3/0x110
> >     [<ffffffff81aa00e4>] __x64_sys_mount+0x154/0x1f0
> >     [<ffffffff83968739>] do_syscall_64+0x39/0x80
> > 
> > Fix the problem above by abort intent items that don't have a done item
> > when recovery intents fail.
> > 
> > Fixes: e6fff81e4870 ("xfs: proper replay of deferred ops queued during log recovery")
> > Signed-off-by: Long Li <leo.lilong@huawei.com>
> > ---
> >  fs/xfs/libxfs/xfs_defer.c | 1 +
> >  fs/xfs/libxfs/xfs_defer.h | 1 +
> >  fs/xfs/xfs_log_recover.c  | 1 +
> >  3 files changed, 3 insertions(+)
> > 
> > diff --git a/fs/xfs/libxfs/xfs_defer.c b/fs/xfs/libxfs/xfs_defer.c
> > index 7ec6812fa625..b2b46d142281 100644
> > --- a/fs/xfs/libxfs/xfs_defer.c
> > +++ b/fs/xfs/libxfs/xfs_defer.c
> > @@ -809,6 +809,7 @@ xfs_defer_ops_capture_and_commit(
> >  	/* Commit the transaction and add the capture structure to the list. */
> >  	error = xfs_trans_commit(tp);
> >  	if (error) {
> > +		xfs_defer_pending_abort(mp, &dfc->dfc_dfops);
> >  		xfs_defer_ops_capture_free(mp, dfc);
> 
> I prefer that we not go extern'ing two functions that mess around with
> internal state.  Could you instead add the xfs_defer_pending_abort call
> to the start of xfs_defer_ops_capture_free, and rename
> xfs_defer_ops_capture_free to xfs_defer_ops_capture_abort?
> 

Thanks for your suggestion, it seems reasonable and clean enough, it will
change in the next version.

> Other than those two things, I /think/ this looks correct.  Assuming my
> understanding of the problem is reflected in the tweaks I made to the
> commit message. ;)
> 
> --D

Thanks for your review again, the commit message you wrote is exactly
what I want to describe, so I will send a new version later. 

ddThanks,
Long Li

> 
> >  		return error;
> >  	}
> > diff --git a/fs/xfs/libxfs/xfs_defer.h b/fs/xfs/libxfs/xfs_defer.h
> > index 114a3a4930a3..c3775014f7ab 100644
> > --- a/fs/xfs/libxfs/xfs_defer.h
> > +++ b/fs/xfs/libxfs/xfs_defer.h
> > @@ -37,6 +37,7 @@ struct xfs_defer_pending {
> >  	enum xfs_defer_ops_type		dfp_type;
> >  };
> >  
> > +void xfs_defer_pending_abort(struct xfs_mount *mp, struct list_head *dop_list);
> >  void xfs_defer_add(struct xfs_trans *tp, enum xfs_defer_ops_type type,
> >  		struct list_head *h);
> >  int xfs_defer_finish_noroll(struct xfs_trans **tp);
> > diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
> > index 82c81d20459d..924beecf07bb 100644
> > --- a/fs/xfs/xfs_log_recover.c
> > +++ b/fs/xfs/xfs_log_recover.c
> > @@ -2511,6 +2511,7 @@ xlog_abort_defer_ops(
> >  
> >  	list_for_each_entry_safe(dfc, next, capture_list, dfc_list) {
> >  		list_del_init(&dfc->dfc_list);
> > +		xfs_defer_pending_abort(mp, &dfc->dfc_dfops);
> >  		xfs_defer_ops_capture_free(mp, dfc);
> >  	}
> >  }
> > -- 
> > 2.31.1
> >
diff mbox series

Patch

diff --git a/fs/xfs/libxfs/xfs_defer.c b/fs/xfs/libxfs/xfs_defer.c
index 7ec6812fa625..b2b46d142281 100644
--- a/fs/xfs/libxfs/xfs_defer.c
+++ b/fs/xfs/libxfs/xfs_defer.c
@@ -809,6 +809,7 @@  xfs_defer_ops_capture_and_commit(
 	/* Commit the transaction and add the capture structure to the list. */
 	error = xfs_trans_commit(tp);
 	if (error) {
+		xfs_defer_pending_abort(mp, &dfc->dfc_dfops);
 		xfs_defer_ops_capture_free(mp, dfc);
 		return error;
 	}
diff --git a/fs/xfs/libxfs/xfs_defer.h b/fs/xfs/libxfs/xfs_defer.h
index 114a3a4930a3..c3775014f7ab 100644
--- a/fs/xfs/libxfs/xfs_defer.h
+++ b/fs/xfs/libxfs/xfs_defer.h
@@ -37,6 +37,7 @@  struct xfs_defer_pending {
 	enum xfs_defer_ops_type		dfp_type;
 };
 
+void xfs_defer_pending_abort(struct xfs_mount *mp, struct list_head *dop_list);
 void xfs_defer_add(struct xfs_trans *tp, enum xfs_defer_ops_type type,
 		struct list_head *h);
 int xfs_defer_finish_noroll(struct xfs_trans **tp);
diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index 82c81d20459d..924beecf07bb 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -2511,6 +2511,7 @@  xlog_abort_defer_ops(
 
 	list_for_each_entry_safe(dfc, next, capture_list, dfc_list) {
 		list_del_init(&dfc->dfc_list);
+		xfs_defer_pending_abort(mp, &dfc->dfc_dfops);
 		xfs_defer_ops_capture_free(mp, dfc);
 	}
 }