diff mbox series

btrfs: don't commit transaction for every subvol create

Message ID 61e8946ae040075ce2fe378e39b500c4ac97e8a3.1681151504.git.sweettea-kernel@dorminy.me (mailing list archive)
State New, archived
Headers show
Series btrfs: don't commit transaction for every subvol create | expand

Commit Message

Sweet Tea Dorminy April 11, 2023, 7:10 p.m. UTC
Recently a Meta-internal workload encountered subvolume creation taking
up to 2s each, significantly slower than directory creation. As they
were hoping to be able to use subvolumes instead of directories, and
were looking to create hundreds, this was a significant issue. After
Josef investigated, it turned out to be due to the transaction commit
currently performed at the end of subvolume creation.

This change improves the workload by not doing transaction commit for every
subvolume creation, and merely requiring a transaction commit on fsync.
In the worst case, of doing a subvolume create and fsync in a loop, this
should require an equal amount of time to the current scheme; and in the
best case, the internal workload creating hundreds of subvols before
fsyncing is greatly improved.

While it would be nice to be able to use the log tree and use the normal
fsync path, logtree replay can't deal with new subvolume inodes
presently.

It's possible that there's some reason that the transaction commit is
necessary for correctness during subvolume creation; however,
git logs indicate that the commit dates back to the beginning of
subvolume creation, and there are no notes on why it would be necessary.

Signed-off-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
---
 fs/btrfs/ioctl.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

Comments

Neal Gompa April 12, 2023, 1:45 a.m. UTC | #1
On Tue, Apr 11, 2023 at 3:22 PM Sweet Tea Dorminy
<sweettea-kernel@dorminy.me> wrote:
>
> Recently a Meta-internal workload encountered subvolume creation taking
> up to 2s each, significantly slower than directory creation. As they
> were hoping to be able to use subvolumes instead of directories, and
> were looking to create hundreds, this was a significant issue. After
> Josef investigated, it turned out to be due to the transaction commit
> currently performed at the end of subvolume creation.
>
> This change improves the workload by not doing transaction commit for every
> subvolume creation, and merely requiring a transaction commit on fsync.
> In the worst case, of doing a subvolume create and fsync in a loop, this
> should require an equal amount of time to the current scheme; and in the
> best case, the internal workload creating hundreds of subvols before
> fsyncing is greatly improved.
>
> While it would be nice to be able to use the log tree and use the normal
> fsync path, logtree replay can't deal with new subvolume inodes
> presently.
>
> It's possible that there's some reason that the transaction commit is
> necessary for correctness during subvolume creation; however,
> git logs indicate that the commit dates back to the beginning of
> subvolume creation, and there are no notes on why it would be necessary.
>
> Signed-off-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
> ---
>  fs/btrfs/ioctl.c | 7 +++----
>  1 file changed, 3 insertions(+), 4 deletions(-)
>
> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> index 25833b4eeaf5..a6f1ee2dc1b9 100644
> --- a/fs/btrfs/ioctl.c
> +++ b/fs/btrfs/ioctl.c
> @@ -647,6 +647,8 @@ static noinline int create_subvol(struct mnt_idmap *idmap,
>         }
>         trans->block_rsv = &block_rsv;
>         trans->bytes_reserved = block_rsv.size;
> +       /* tree log can't currently deal with an inode which is a new root */
> +       btrfs_set_log_full_commit(trans);
>
>         ret = btrfs_qgroup_inherit(trans, 0, objectid, inherit);
>         if (ret)
> @@ -755,10 +757,7 @@ static noinline int create_subvol(struct mnt_idmap *idmap,
>         trans->bytes_reserved = 0;
>         btrfs_subvolume_release_metadata(root, &block_rsv);
>
> -       if (ret)
> -               btrfs_end_transaction(trans);
> -       else
> -               ret = btrfs_commit_transaction(trans);
> +       btrfs_end_transaction(trans);
>  out_new_inode_args:
>         btrfs_new_inode_args_destroy(&new_inode_args);
>  out_inode:
> --
> 2.40.0
>

Wouldn't the consequence of this mean that we lose some guarantees
around subvolume creation? That is, without it being committed, we
would have a window in which the subvolume and data can be written
without being written to disk? If so, how large is that window?
Qu Wenruo April 12, 2023, 2:29 a.m. UTC | #2
On 2023/4/12 03:10, Sweet Tea Dorminy wrote:
> Recently a Meta-internal workload encountered subvolume creation taking
> up to 2s each, significantly slower than directory creation. As they
> were hoping to be able to use subvolumes instead of directories, and
> were looking to create hundreds, this was a significant issue. After
> Josef investigated, it turned out to be due to the transaction commit
> currently performed at the end of subvolume creation.
> 
> This change improves the workload by not doing transaction commit for every
> subvolume creation, and merely requiring a transaction commit on fsync.
> In the worst case, of doing a subvolume create and fsync in a loop, this
> should require an equal amount of time to the current scheme; and in the
> best case, the internal workload creating hundreds of subvols before
> fsyncing is greatly improved.
> 
> While it would be nice to be able to use the log tree and use the normal
> fsync path, logtree replay can't deal with new subvolume inodes
> presently.
> 
> It's possible that there's some reason that the transaction commit is
> necessary for correctness during subvolume creation; however,
> git logs indicate that the commit dates back to the beginning of
> subvolume creation, and there are no notes on why it would be necessary.

For subvolume creation it looks fine.

My main concern related to this topic is mostly around snapshots:

- Snapshots can only be created during commit transaction
- Snapshots qgroup accounting
   For now we only support quick path (aka, inherit the old numbers from
   the source, and that's no other qgroup level involved).

But for subvolume creation, those are not involved at all.

So the idea of skipping transaction commit looks solid to me.

Reviewed-by: Qu Wenruo <wqu@suse.com>

Thanks,
Qu

> 
> Signed-off-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
> ---
>   fs/btrfs/ioctl.c | 7 +++----
>   1 file changed, 3 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> index 25833b4eeaf5..a6f1ee2dc1b9 100644
> --- a/fs/btrfs/ioctl.c
> +++ b/fs/btrfs/ioctl.c
> @@ -647,6 +647,8 @@ static noinline int create_subvol(struct mnt_idmap *idmap,
>   	}
>   	trans->block_rsv = &block_rsv;
>   	trans->bytes_reserved = block_rsv.size;
> +	/* tree log can't currently deal with an inode which is a new root */
> +	btrfs_set_log_full_commit(trans);
>   
>   	ret = btrfs_qgroup_inherit(trans, 0, objectid, inherit);
>   	if (ret)
> @@ -755,10 +757,7 @@ static noinline int create_subvol(struct mnt_idmap *idmap,
>   	trans->bytes_reserved = 0;
>   	btrfs_subvolume_release_metadata(root, &block_rsv);
>   
> -	if (ret)
> -		btrfs_end_transaction(trans);
> -	else
> -		ret = btrfs_commit_transaction(trans);
> +	btrfs_end_transaction(trans);
>   out_new_inode_args:
>   	btrfs_new_inode_args_destroy(&new_inode_args);
>   out_inode:
Qu Wenruo April 12, 2023, 2:34 a.m. UTC | #3
On 2023/4/12 09:45, Neal Gompa wrote:
> On Tue, Apr 11, 2023 at 3:22 PM Sweet Tea Dorminy
> <sweettea-kernel@dorminy.me> wrote:
>>
>> Recently a Meta-internal workload encountered subvolume creation taking
>> up to 2s each, significantly slower than directory creation. As they
>> were hoping to be able to use subvolumes instead of directories, and
>> were looking to create hundreds, this was a significant issue. After
>> Josef investigated, it turned out to be due to the transaction commit
>> currently performed at the end of subvolume creation.
>>
>> This change improves the workload by not doing transaction commit for every
>> subvolume creation, and merely requiring a transaction commit on fsync.
>> In the worst case, of doing a subvolume create and fsync in a loop, this
>> should require an equal amount of time to the current scheme; and in the
>> best case, the internal workload creating hundreds of subvols before
>> fsyncing is greatly improved.
>>
>> While it would be nice to be able to use the log tree and use the normal
>> fsync path, logtree replay can't deal with new subvolume inodes
>> presently.
>>
>> It's possible that there's some reason that the transaction commit is
>> necessary for correctness during subvolume creation; however,
>> git logs indicate that the commit dates back to the beginning of
>> subvolume creation, and there are no notes on why it would be necessary.
>>
>> Signed-off-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
>> ---
>>   fs/btrfs/ioctl.c | 7 +++----
>>   1 file changed, 3 insertions(+), 4 deletions(-)
>>
>> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
>> index 25833b4eeaf5..a6f1ee2dc1b9 100644
>> --- a/fs/btrfs/ioctl.c
>> +++ b/fs/btrfs/ioctl.c
>> @@ -647,6 +647,8 @@ static noinline int create_subvol(struct mnt_idmap *idmap,
>>          }
>>          trans->block_rsv = &block_rsv;
>>          trans->bytes_reserved = block_rsv.size;
>> +       /* tree log can't currently deal with an inode which is a new root */
>> +       btrfs_set_log_full_commit(trans);
>>
>>          ret = btrfs_qgroup_inherit(trans, 0, objectid, inherit);
>>          if (ret)
>> @@ -755,10 +757,7 @@ static noinline int create_subvol(struct mnt_idmap *idmap,
>>          trans->bytes_reserved = 0;
>>          btrfs_subvolume_release_metadata(root, &block_rsv);
>>
>> -       if (ret)
>> -               btrfs_end_transaction(trans);
>> -       else
>> -               ret = btrfs_commit_transaction(trans);
>> +       btrfs_end_transaction(trans);
>>   out_new_inode_args:
>>          btrfs_new_inode_args_destroy(&new_inode_args);
>>   out_inode:
>> --
>> 2.40.0
>>
> 
> Wouldn't the consequence of this mean that we lose some guarantees
> around subvolume creation? That is, without it being committed, we
> would have a window in which the subvolume and data can be written
> without being written to disk? If so, how large is that window?

That can be avoided in btrfs-progs, by adding some -C|-c options just 
like subvolume deletion to ensure the case.


If you're really concern about the window size, it's no smaller than 
commit interval (by default 30s).


But that's not the full story, if you do fsync() on any inode of the new 
subvolume, it should cause a full commit transaction, which would write 
every needed metadata, including the subvolume itself, to the disk.

So the end result is, you don't really need to bother the subvolume 
itself, just do your usual fsync() work on the inode you're really 
concerned, no need to bother the subvolume itself.

Thanks,
Qu

> 
>
Qu Wenruo April 12, 2023, 2:35 a.m. UTC | #4
On 2023/4/12 10:34, Qu Wenruo wrote:
> 
> 
> On 2023/4/12 09:45, Neal Gompa wrote:
>> On Tue, Apr 11, 2023 at 3:22 PM Sweet Tea Dorminy
>> <sweettea-kernel@dorminy.me> wrote:
>>>
>>> Recently a Meta-internal workload encountered subvolume creation taking
>>> up to 2s each, significantly slower than directory creation. As they
>>> were hoping to be able to use subvolumes instead of directories, and
>>> were looking to create hundreds, this was a significant issue. After
>>> Josef investigated, it turned out to be due to the transaction commit
>>> currently performed at the end of subvolume creation.
>>>
>>> This change improves the workload by not doing transaction commit for 
>>> every
>>> subvolume creation, and merely requiring a transaction commit on fsync.
>>> In the worst case, of doing a subvolume create and fsync in a loop, this
>>> should require an equal amount of time to the current scheme; and in the
>>> best case, the internal workload creating hundreds of subvols before
>>> fsyncing is greatly improved.
>>>
>>> While it would be nice to be able to use the log tree and use the normal
>>> fsync path, logtree replay can't deal with new subvolume inodes
>>> presently.
>>>
>>> It's possible that there's some reason that the transaction commit is
>>> necessary for correctness during subvolume creation; however,
>>> git logs indicate that the commit dates back to the beginning of
>>> subvolume creation, and there are no notes on why it would be necessary.
>>>
>>> Signed-off-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
>>> ---
>>>   fs/btrfs/ioctl.c | 7 +++----
>>>   1 file changed, 3 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
>>> index 25833b4eeaf5..a6f1ee2dc1b9 100644
>>> --- a/fs/btrfs/ioctl.c
>>> +++ b/fs/btrfs/ioctl.c
>>> @@ -647,6 +647,8 @@ static noinline int create_subvol(struct 
>>> mnt_idmap *idmap,
>>>          }
>>>          trans->block_rsv = &block_rsv;
>>>          trans->bytes_reserved = block_rsv.size;
>>> +       /* tree log can't currently deal with an inode which is a new 
>>> root */
>>> +       btrfs_set_log_full_commit(trans);
>>>
>>>          ret = btrfs_qgroup_inherit(trans, 0, objectid, inherit);
>>>          if (ret)
>>> @@ -755,10 +757,7 @@ static noinline int create_subvol(struct 
>>> mnt_idmap *idmap,
>>>          trans->bytes_reserved = 0;
>>>          btrfs_subvolume_release_metadata(root, &block_rsv);
>>>
>>> -       if (ret)
>>> -               btrfs_end_transaction(trans);
>>> -       else
>>> -               ret = btrfs_commit_transaction(trans);
>>> +       btrfs_end_transaction(trans);
>>>   out_new_inode_args:
>>>          btrfs_new_inode_args_destroy(&new_inode_args);
>>>   out_inode:
>>> -- 
>>> 2.40.0
>>>
>>
>> Wouldn't the consequence of this mean that we lose some guarantees
>> around subvolume creation? That is, without it being committed, we
>> would have a window in which the subvolume and data can be written
>> without being written to disk? If so, how large is that window?
> 
> That can be avoided in btrfs-progs, by adding some -C|-c options just 
> like subvolume deletion to ensure the case.
> 
> 
> If you're really concern about the window size, it's no smaller than 
> commit interval (by default 30s).

s/smaller/larger/

> 
> 
> But that's not the full story, if you do fsync() on any inode of the new 
> subvolume, it should cause a full commit transaction, which would write 
> every needed metadata, including the subvolume itself, to the disk.
> 
> So the end result is, you don't really need to bother the subvolume 
> itself, just do your usual fsync() work on the inode you're really 
> concerned, no need to bother the subvolume itself.
> 
> Thanks,
> Qu
> 
>>
>>
Josef Bacik April 12, 2023, 1:49 p.m. UTC | #5
On Tue, Apr 11, 2023 at 09:45:34PM -0400, Neal Gompa wrote:
> On Tue, Apr 11, 2023 at 3:22 PM Sweet Tea Dorminy
> <sweettea-kernel@dorminy.me> wrote:
> >
> > Recently a Meta-internal workload encountered subvolume creation taking
> > up to 2s each, significantly slower than directory creation. As they
> > were hoping to be able to use subvolumes instead of directories, and
> > were looking to create hundreds, this was a significant issue. After
> > Josef investigated, it turned out to be due to the transaction commit
> > currently performed at the end of subvolume creation.
> >
> > This change improves the workload by not doing transaction commit for every
> > subvolume creation, and merely requiring a transaction commit on fsync.
> > In the worst case, of doing a subvolume create and fsync in a loop, this
> > should require an equal amount of time to the current scheme; and in the
> > best case, the internal workload creating hundreds of subvols before
> > fsyncing is greatly improved.
> >
> > While it would be nice to be able to use the log tree and use the normal
> > fsync path, logtree replay can't deal with new subvolume inodes
> > presently.
> >
> > It's possible that there's some reason that the transaction commit is
> > necessary for correctness during subvolume creation; however,
> > git logs indicate that the commit dates back to the beginning of
> > subvolume creation, and there are no notes on why it would be necessary.
> >
> > Signed-off-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
> > ---
> >  fs/btrfs/ioctl.c | 7 +++----
> >  1 file changed, 3 insertions(+), 4 deletions(-)
> >
> > diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> > index 25833b4eeaf5..a6f1ee2dc1b9 100644
> > --- a/fs/btrfs/ioctl.c
> > +++ b/fs/btrfs/ioctl.c
> > @@ -647,6 +647,8 @@ static noinline int create_subvol(struct mnt_idmap *idmap,
> >         }
> >         trans->block_rsv = &block_rsv;
> >         trans->bytes_reserved = block_rsv.size;
> > +       /* tree log can't currently deal with an inode which is a new root */
> > +       btrfs_set_log_full_commit(trans);
> >
> >         ret = btrfs_qgroup_inherit(trans, 0, objectid, inherit);
> >         if (ret)
> > @@ -755,10 +757,7 @@ static noinline int create_subvol(struct mnt_idmap *idmap,
> >         trans->bytes_reserved = 0;
> >         btrfs_subvolume_release_metadata(root, &block_rsv);
> >
> > -       if (ret)
> > -               btrfs_end_transaction(trans);
> > -       else
> > -               ret = btrfs_commit_transaction(trans);
> > +       btrfs_end_transaction(trans);
> >  out_new_inode_args:
> >         btrfs_new_inode_args_destroy(&new_inode_args);
> >  out_inode:
> > --
> > 2.40.0
> >
> 
> Wouldn't the consequence of this mean that we lose some guarantees
> around subvolume creation? That is, without it being committed, we
> would have a window in which the subvolume and data can be written
> without being written to disk? If so, how large is that window?

It's the same as a normal directory, we create a subvol and then create a file
in that subvol and start writing to that file, then lose power before the
transaction commits, we lose everything.  The same would happen for a new
directory.

The only concern here that's different is with the directory case you could
fsync() the new file and it would all go into the tree log.  As pointed out
above the tree log stuff can't handle a subvolume that doesn't exist yet, so in
the above case if the user fsync()'s that new file it'll commit the whole
transaction and everything will be fine.  Thanks,

Josef
Josef Bacik April 12, 2023, 1:51 p.m. UTC | #6
On Tue, Apr 11, 2023 at 03:10:53PM -0400, Sweet Tea Dorminy wrote:
> Recently a Meta-internal workload encountered subvolume creation taking
> up to 2s each, significantly slower than directory creation. As they
> were hoping to be able to use subvolumes instead of directories, and
> were looking to create hundreds, this was a significant issue. After
> Josef investigated, it turned out to be due to the transaction commit
> currently performed at the end of subvolume creation.
> 
> This change improves the workload by not doing transaction commit for every
> subvolume creation, and merely requiring a transaction commit on fsync.
> In the worst case, of doing a subvolume create and fsync in a loop, this
> should require an equal amount of time to the current scheme; and in the
> best case, the internal workload creating hundreds of subvols before
> fsyncing is greatly improved.
> 
> While it would be nice to be able to use the log tree and use the normal
> fsync path, logtree replay can't deal with new subvolume inodes
> presently.
> 
> It's possible that there's some reason that the transaction commit is
> necessary for correctness during subvolume creation; however,
> git logs indicate that the commit dates back to the beginning of
> subvolume creation, and there are no notes on why it would be necessary.
> 
> Signed-off-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>

Reviewed-by: Josef Bacik <josef@toxicpanda.com>

Thanks,

Josef
Neal Gompa April 12, 2023, 10:35 p.m. UTC | #7
On Tue, Apr 11, 2023 at 3:22 PM Sweet Tea Dorminy
<sweettea-kernel@dorminy.me> wrote:
>
> Recently a Meta-internal workload encountered subvolume creation taking
> up to 2s each, significantly slower than directory creation. As they
> were hoping to be able to use subvolumes instead of directories, and
> were looking to create hundreds, this was a significant issue. After
> Josef investigated, it turned out to be due to the transaction commit
> currently performed at the end of subvolume creation.
>
> This change improves the workload by not doing transaction commit for every
> subvolume creation, and merely requiring a transaction commit on fsync.
> In the worst case, of doing a subvolume create and fsync in a loop, this
> should require an equal amount of time to the current scheme; and in the
> best case, the internal workload creating hundreds of subvols before
> fsyncing is greatly improved.
>
> While it would be nice to be able to use the log tree and use the normal
> fsync path, logtree replay can't deal with new subvolume inodes
> presently.
>
> It's possible that there's some reason that the transaction commit is
> necessary for correctness during subvolume creation; however,
> git logs indicate that the commit dates back to the beginning of
> subvolume creation, and there are no notes on why it would be necessary.
>
> Signed-off-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
> ---
>  fs/btrfs/ioctl.c | 7 +++----
>  1 file changed, 3 insertions(+), 4 deletions(-)
>
> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> index 25833b4eeaf5..a6f1ee2dc1b9 100644
> --- a/fs/btrfs/ioctl.c
> +++ b/fs/btrfs/ioctl.c
> @@ -647,6 +647,8 @@ static noinline int create_subvol(struct mnt_idmap *idmap,
>         }
>         trans->block_rsv = &block_rsv;
>         trans->bytes_reserved = block_rsv.size;
> +       /* tree log can't currently deal with an inode which is a new root */
> +       btrfs_set_log_full_commit(trans);
>
>         ret = btrfs_qgroup_inherit(trans, 0, objectid, inherit);
>         if (ret)
> @@ -755,10 +757,7 @@ static noinline int create_subvol(struct mnt_idmap *idmap,
>         trans->bytes_reserved = 0;
>         btrfs_subvolume_release_metadata(root, &block_rsv);
>
> -       if (ret)
> -               btrfs_end_transaction(trans);
> -       else
> -               ret = btrfs_commit_transaction(trans);
> +       btrfs_end_transaction(trans);
>  out_new_inode_args:
>         btrfs_new_inode_args_destroy(&new_inode_args);
>  out_inode:
> --
> 2.40.0
>

Reviewed-by: Neal Gompa <neal@gompa.dev>
David Sterba April 17, 2023, 6:46 p.m. UTC | #8
On Tue, Apr 11, 2023 at 03:10:53PM -0400, Sweet Tea Dorminy wrote:
> Recently a Meta-internal workload encountered subvolume creation taking
> up to 2s each, significantly slower than directory creation. As they
> were hoping to be able to use subvolumes instead of directories, and
> were looking to create hundreds, this was a significant issue. After
> Josef investigated, it turned out to be due to the transaction commit
> currently performed at the end of subvolume creation.
> 
> This change improves the workload by not doing transaction commit for every
> subvolume creation, and merely requiring a transaction commit on fsync.
> In the worst case, of doing a subvolume create and fsync in a loop, this
> should require an equal amount of time to the current scheme; and in the
> best case, the internal workload creating hundreds of subvols before
> fsyncing is greatly improved.
> 
> While it would be nice to be able to use the log tree and use the normal
> fsync path, logtree replay can't deal with new subvolume inodes
> presently.
> 
> It's possible that there's some reason that the transaction commit is
> necessary for correctness during subvolume creation; however,
> git logs indicate that the commit dates back to the beginning of
> subvolume creation, and there are no notes on why it would be necessary.
> 
> Signed-off-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>

For subvolume creation the situation is much easier than with deletion,
as said by others, the semantics is similar to creating a directory,
fsync works to make sure the data are persistent.

So I think it's safe and allows the use case of creating many subvolumes
without the full transaction commit overhead. And we don't need the
commandline options. Added to misc-next, thanks.
Sweet Tea Dorminy April 21, 2023, 12:58 p.m. UTC | #9
> 
> So I think it's safe and allows the use case of creating many subvolumes
> without the full transaction commit overhead. And we don't need the
> commandline options. Added to misc-next, thanks.

Just to check, have you pushed misc-next containing this to github? I 
can't find it in git log.
Thanks!
David Sterba April 21, 2023, 3:23 p.m. UTC | #10
On Fri, Apr 21, 2023 at 08:58:54AM -0400, Sweet Tea Dorminy wrote:
> 
> > 
> > So I think it's safe and allows the use case of creating many subvolumes
> > without the full transaction commit overhead. And we don't need the
> > commandline options. Added to misc-next, thanks.
> 
> Just to check, have you pushed misc-next containing this to github? I 
> can't find it in git log.

Maybe I forgot to push the latest changes, now pushed.
diff mbox series

Patch

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 25833b4eeaf5..a6f1ee2dc1b9 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -647,6 +647,8 @@  static noinline int create_subvol(struct mnt_idmap *idmap,
 	}
 	trans->block_rsv = &block_rsv;
 	trans->bytes_reserved = block_rsv.size;
+	/* tree log can't currently deal with an inode which is a new root */
+	btrfs_set_log_full_commit(trans);
 
 	ret = btrfs_qgroup_inherit(trans, 0, objectid, inherit);
 	if (ret)
@@ -755,10 +757,7 @@  static noinline int create_subvol(struct mnt_idmap *idmap,
 	trans->bytes_reserved = 0;
 	btrfs_subvolume_release_metadata(root, &block_rsv);
 
-	if (ret)
-		btrfs_end_transaction(trans);
-	else
-		ret = btrfs_commit_transaction(trans);
+	btrfs_end_transaction(trans);
 out_new_inode_args:
 	btrfs_new_inode_args_destroy(&new_inode_args);
 out_inode: