diff mbox series

[RFC] btrfs: trigger orphan inodes cleanup during sync ioctl

Message ID 79d32abc0d6c1a3afcf9224bd44875f5594c80b6.1683848508.git.wqu@suse.com (mailing list archive)
State New, archived
Headers show
Series [RFC] btrfs: trigger orphan inodes cleanup during sync ioctl | expand

Commit Message

Qu Wenruo May 11, 2023, 11:42 p.m. UTC
There is an internal error report that scrub found an error in an orphan
inode's data.

However there are very limited ways to cleanup such orphan inodes:

- btrfs_start_pre_rw_mount()
  This happens at either mount, or RO->RW switch.
  This is not a valiable solution for rootfs which may not be unmounted
  or RO mounted.

  Furthermore this doesn't cover every subvolume, it only covers the
  currently cached subvolumes.

- btrfs_lookup_dentry()
  This happens when we first lookup the subvolume dentry.
  But dentry can be cached thus it's not ensured to be triggered every
  time.

- create_snapshot()
  This only happens for the created snapshot, not the source one.

This means if we didn't trigger orphan items cleanup, there is really no
way to manually trigger it.

Thus this patch would add a manual trigger for sync ioctl.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
Reason for RFC:
Although this patch is very small, it involves a behavior change and
hugely delay the sync ioctl.
---
 fs/btrfs/ioctl.c | 5 +++++
 1 file changed, 5 insertions(+)

Comments

David Sterba May 11, 2023, 11:48 p.m. UTC | #1
On Fri, May 12, 2023 at 07:42:05AM +0800, Qu Wenruo wrote:
> There is an internal error report that scrub found an error in an orphan
> inode's data.
> 
> However there are very limited ways to cleanup such orphan inodes:
> 
> - btrfs_start_pre_rw_mount()
>   This happens at either mount, or RO->RW switch.
>   This is not a valiable solution for rootfs which may not be unmounted
>   or RO mounted.
> 
>   Furthermore this doesn't cover every subvolume, it only covers the
>   currently cached subvolumes.
> 
> - btrfs_lookup_dentry()
>   This happens when we first lookup the subvolume dentry.
>   But dentry can be cached thus it's not ensured to be triggered every
>   time.
> 
> - create_snapshot()
>   This only happens for the created snapshot, not the source one.
> 
> This means if we didn't trigger orphan items cleanup, there is really no
> way to manually trigger it.
> 
> Thus this patch would add a manual trigger for sync ioctl.
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
> Reason for RFC:
> Although this patch is very small, it involves a behavior change and
> hugely delay the sync ioctl.

I think this is ok, sync is expected to do all it needs and be
potentially slow. We once extended sync to start subvolume cleanup,
which is not slowing down sync itself but is adding more semantics to
it.

The dentry lookup that starts orphan cleanup is something that I'd
otherwise consider sufficient but if there's a report it is not enough.
Also we can't recommend a workaround to cd into a random directory that
was not used recently, we don't know the state of the dentry cache so
making it explicit in sync() which is a known slow operation is quite
reasonable.
David Sterba May 22, 2023, 11:43 a.m. UTC | #2
On Fri, May 12, 2023 at 07:42:05AM +0800, Qu Wenruo wrote:
> There is an internal error report that scrub found an error in an orphan
> inode's data.
> 
> However there are very limited ways to cleanup such orphan inodes:
> 
> - btrfs_start_pre_rw_mount()
>   This happens at either mount, or RO->RW switch.
>   This is not a valiable solution for rootfs which may not be unmounted
>   or RO mounted.
> 
>   Furthermore this doesn't cover every subvolume, it only covers the
>   currently cached subvolumes.
> 
> - btrfs_lookup_dentry()
>   This happens when we first lookup the subvolume dentry.
>   But dentry can be cached thus it's not ensured to be triggered every
>   time.
> 
> - create_snapshot()
>   This only happens for the created snapshot, not the source one.
> 
> This means if we didn't trigger orphan items cleanup, there is really no
> way to manually trigger it.
> 
> Thus this patch would add a manual trigger for sync ioctl.
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
> Reason for RFC:
> Although this patch is very small, it involves a behavior change and
> hugely delay the sync ioctl.
> ---
>  fs/btrfs/ioctl.c | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> index df7603c30686..51ad2f0f9dd8 100644
> --- a/fs/btrfs/ioctl.c
> +++ b/fs/btrfs/ioctl.c
> @@ -3111,6 +3111,11 @@ static noinline long btrfs_ioctl_start_sync(struct btrfs_root *root,
>  {
>  	struct btrfs_trans_handle *trans;
>  	u64 transid;
> +	int ret;
> +
> +	ret = btrfs_orphan_cleanup(root);
> +	if (ret < 0)
> +		return ret;

I think this should not fail the whole sync ioctl, namely when it would
not even try to start the transaction commit.
Qu Wenruo May 22, 2023, 11:51 a.m. UTC | #3
On 2023/5/22 19:43, David Sterba wrote:
> On Fri, May 12, 2023 at 07:42:05AM +0800, Qu Wenruo wrote:
>> There is an internal error report that scrub found an error in an orphan
>> inode's data.
>>
>> However there are very limited ways to cleanup such orphan inodes:
>>
>> - btrfs_start_pre_rw_mount()
>>    This happens at either mount, or RO->RW switch.
>>    This is not a valiable solution for rootfs which may not be unmounted
>>    or RO mounted.
>>
>>    Furthermore this doesn't cover every subvolume, it only covers the
>>    currently cached subvolumes.
>>
>> - btrfs_lookup_dentry()
>>    This happens when we first lookup the subvolume dentry.
>>    But dentry can be cached thus it's not ensured to be triggered every
>>    time.
>>
>> - create_snapshot()
>>    This only happens for the created snapshot, not the source one.
>>
>> This means if we didn't trigger orphan items cleanup, there is really no
>> way to manually trigger it.
>>
>> Thus this patch would add a manual trigger for sync ioctl.
>>
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>> ---
>> Reason for RFC:
>> Although this patch is very small, it involves a behavior change and
>> hugely delay the sync ioctl.
>> ---
>>   fs/btrfs/ioctl.c | 5 +++++
>>   1 file changed, 5 insertions(+)
>>
>> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
>> index df7603c30686..51ad2f0f9dd8 100644
>> --- a/fs/btrfs/ioctl.c
>> +++ b/fs/btrfs/ioctl.c
>> @@ -3111,6 +3111,11 @@ static noinline long btrfs_ioctl_start_sync(struct btrfs_root *root,
>>   {
>>   	struct btrfs_trans_handle *trans;
>>   	u64 transid;
>> +	int ret;
>> +
>> +	ret = btrfs_orphan_cleanup(root);
>> +	if (ret < 0)
>> +		return ret;
>
> I think this should not fail the whole sync ioctl, namely when it would
> not even try to start the transaction commit.

If the cleanup failed, it's mostly possible that the subvolume tree is
corrupted, and we would still abort transaction during that subvolume
tree operation.

Thanks,
Qu
David Sterba May 22, 2023, 11:53 a.m. UTC | #4
On Mon, May 22, 2023 at 07:51:17PM +0800, Qu Wenruo wrote:
> On 2023/5/22 19:43, David Sterba wrote:
> > On Fri, May 12, 2023 at 07:42:05AM +0800, Qu Wenruo wrote:
> >> There is an internal error report that scrub found an error in an orphan
> >> inode's data.
> >>
> >> However there are very limited ways to cleanup such orphan inodes:
> >>
> >> - btrfs_start_pre_rw_mount()
> >>    This happens at either mount, or RO->RW switch.
> >>    This is not a valiable solution for rootfs which may not be unmounted
> >>    or RO mounted.
> >>
> >>    Furthermore this doesn't cover every subvolume, it only covers the
> >>    currently cached subvolumes.
> >>
> >> - btrfs_lookup_dentry()
> >>    This happens when we first lookup the subvolume dentry.
> >>    But dentry can be cached thus it's not ensured to be triggered every
> >>    time.
> >>
> >> - create_snapshot()
> >>    This only happens for the created snapshot, not the source one.
> >>
> >> This means if we didn't trigger orphan items cleanup, there is really no
> >> way to manually trigger it.
> >>
> >> Thus this patch would add a manual trigger for sync ioctl.
> >>
> >> Signed-off-by: Qu Wenruo <wqu@suse.com>
> >> ---
> >> Reason for RFC:
> >> Although this patch is very small, it involves a behavior change and
> >> hugely delay the sync ioctl.
> >> ---
> >>   fs/btrfs/ioctl.c | 5 +++++
> >>   1 file changed, 5 insertions(+)
> >>
> >> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> >> index df7603c30686..51ad2f0f9dd8 100644
> >> --- a/fs/btrfs/ioctl.c
> >> +++ b/fs/btrfs/ioctl.c
> >> @@ -3111,6 +3111,11 @@ static noinline long btrfs_ioctl_start_sync(struct btrfs_root *root,
> >>   {
> >>   	struct btrfs_trans_handle *trans;
> >>   	u64 transid;
> >> +	int ret;
> >> +
> >> +	ret = btrfs_orphan_cleanup(root);
> >> +	if (ret < 0)
> >> +		return ret;
> >
> > I think this should not fail the whole sync ioctl, namely when it would
> > not even try to start the transaction commit.
> 
> If the cleanup failed, it's mostly possible that the subvolume tree is
> corrupted, and we would still abort transaction during that subvolume
> tree operation.

Yes, and that's the right place to detect the error.
David Sterba May 22, 2023, 12:02 p.m. UTC | #5
On Mon, May 22, 2023 at 01:53:32PM +0200, David Sterba wrote:
> On Mon, May 22, 2023 at 07:51:17PM +0800, Qu Wenruo wrote:
> > On 2023/5/22 19:43, David Sterba wrote:
> > > On Fri, May 12, 2023 at 07:42:05AM +0800, Qu Wenruo wrote:
> > >> --- a/fs/btrfs/ioctl.c
> > >> +++ b/fs/btrfs/ioctl.c
> > >> @@ -3111,6 +3111,11 @@ static noinline long btrfs_ioctl_start_sync(struct btrfs_root *root,
> > >>   {
> > >>   	struct btrfs_trans_handle *trans;
> > >>   	u64 transid;
> > >> +	int ret;
> > >> +
> > >> +	ret = btrfs_orphan_cleanup(root);
> > >> +	if (ret < 0)
> > >> +		return ret;
> > >
> > > I think this should not fail the whole sync ioctl, namely when it would
> > > not even try to start the transaction commit.
> > 
> > If the cleanup failed, it's mostly possible that the subvolume tree is
> > corrupted, and we would still abort transaction during that subvolume
> > tree operation.
> 
> Yes, and that's the right place to detect the error.

I've updated with

---
-       int ret;
 
-       ret = btrfs_orphan_cleanup(root);
-       if (ret < 0)
-               return ret;
+       /*
+        * Start orphan cleanup here for the given root in case it hasn't been
+        * started already by other means. Errors are handled in the other
+        * functions during transaction commit.
+        */
+       btrfs_orphan_cleanup(root);
---
diff mbox series

Patch

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index df7603c30686..51ad2f0f9dd8 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -3111,6 +3111,11 @@  static noinline long btrfs_ioctl_start_sync(struct btrfs_root *root,
 {
 	struct btrfs_trans_handle *trans;
 	u64 transid;
+	int ret;
+
+	ret = btrfs_orphan_cleanup(root);
+	if (ret < 0)
+		return ret;
 
 	trans = btrfs_attach_transaction_barrier(root);
 	if (IS_ERR(trans)) {