diff mbox series

btrfs: Allow btrfs_truncate_block() to fallback to nocow for data space reservation

Message ID 20200130052822.11765-1-wqu@suse.com (mailing list archive)
State New, archived
Headers show
Series btrfs: Allow btrfs_truncate_block() to fallback to nocow for data space reservation | expand

Commit Message

Qu Wenruo Jan. 30, 2020, 5:28 a.m. UTC
[BUG]
When the data space is exhausted, even the inode has NOCOW attribute,
btrfs will still refuse to truncate unaligned range due to ENOSPC.

The following script can reproduce it pretty easily:
  #!/bin/bash

  dev=/dev/test/test
  mnt=/mnt/btrfs

  umount $dev &> /dev/null
  umount $mnt&> /dev/null

  mkfs.btrfs -f $dev -b 1G
  mount -o nospace_cache $dev $mnt
  touch $mnt/foobar
  chattr +C $mnt/foobar

  xfs_io -f -c "pwrite -b 4k 0 4k" $mnt/foobar > /dev/null
  xfs_io -f -c "pwrite -b 4k 0 1G" $mnt/padding &> /dev/null
  sync

  xfs_io -c "fpunch 0 2k" $mnt/foobar
  umount $mnt

Current btrfs will fail at the fpunch part.

[CAUSE]
Because btrfs_truncate_block() always reserve space without checking the
NOCOW attribute.

Since the writeback path follows NOCOW bit, we only need to bother the
space reservation code in btrfs_truncate_block().

[FIX]
Make btrfs_truncate_block() to follow btrfs_buffered_write() to try to
reserve data space first, and falls back to NOCOW check only when we
don't have enough space.

Such always-try-reserve is an optimization introduced in
btrfs_buffered_write(), to avoid expensive btrfs_check_can_nocow() call.

Since now check_can_nocow() is needed outside of inode.c, also export it
and rename it to btrfs_check_can_nocow().

Reported-by: Martin Doucha <martin.doucha@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
Test case will be submitted to fstests by the reporter.
---
 fs/btrfs/ctree.h |  2 ++
 fs/btrfs/file.c  | 10 +++++-----
 fs/btrfs/inode.c | 41 ++++++++++++++++++++++++++++++++++-------
 3 files changed, 41 insertions(+), 12 deletions(-)

Comments

Filipe Manana Jan. 30, 2020, 10:02 a.m. UTC | #1
On Thu, Jan 30, 2020 at 5:30 AM Qu Wenruo <wqu@suse.com> wrote:
>
> [BUG]
> When the data space is exhausted, even the inode has NOCOW attribute,
> btrfs will still refuse to truncate unaligned range due to ENOSPC.
>
> The following script can reproduce it pretty easily:
>   #!/bin/bash
>
>   dev=/dev/test/test
>   mnt=/mnt/btrfs
>
>   umount $dev &> /dev/null
>   umount $mnt&> /dev/null
>
>   mkfs.btrfs -f $dev -b 1G
>   mount -o nospace_cache $dev $mnt
>   touch $mnt/foobar
>   chattr +C $mnt/foobar
>
>   xfs_io -f -c "pwrite -b 4k 0 4k" $mnt/foobar > /dev/null
>   xfs_io -f -c "pwrite -b 4k 0 1G" $mnt/padding &> /dev/null
>   sync
>
>   xfs_io -c "fpunch 0 2k" $mnt/foobar
>   umount $mnt
>
> Current btrfs will fail at the fpunch part.
>
> [CAUSE]
> Because btrfs_truncate_block() always reserve space without checking the
> NOCOW attribute.
>
> Since the writeback path follows NOCOW bit, we only need to bother the
> space reservation code in btrfs_truncate_block().
>
> [FIX]
> Make btrfs_truncate_block() to follow btrfs_buffered_write() to try to
> reserve data space first, and falls back to NOCOW check only when we
> don't have enough space.
>
> Such always-try-reserve is an optimization introduced in
> btrfs_buffered_write(), to avoid expensive btrfs_check_can_nocow() call.
>
> Since now check_can_nocow() is needed outside of inode.c, also export it
> and rename it to btrfs_check_can_nocow().
>
> Reported-by: Martin Doucha <martin.doucha@suse.com>
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
> Test case will be submitted to fstests by the reporter.

Well, this is a sudden change of mind, isn't it? :)

We had btrfs/172, which you removed very recently, that precisely tested this:

https://git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git/commit/?id=538d8a4bcc782258f8f95fae815d5e859dee9126

Even though there are several reasons why this can still fail (at
writeback time), like regular buffered writes through the family of
write() syscalls can, I think it's perfectly fine to have this
behaviour.

Reviewed-by: Filipe Manana <fdmanana@suse.com>

So I think we can just resurrect btrfs/172 now...

> ---
>  fs/btrfs/ctree.h |  2 ++
>  fs/btrfs/file.c  | 10 +++++-----
>  fs/btrfs/inode.c | 41 ++++++++++++++++++++++++++++++++++-------
>  3 files changed, 41 insertions(+), 12 deletions(-)
>
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index 54efb21c2727..b5639f3461e4 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -2954,6 +2954,8 @@ int btrfs_fdatawrite_range(struct inode *inode, loff_t start, loff_t end);
>  loff_t btrfs_remap_file_range(struct file *file_in, loff_t pos_in,
>                               struct file *file_out, loff_t pos_out,
>                               loff_t len, unsigned int remap_flags);
> +int btrfs_check_can_nocow(struct btrfs_inode *inode, loff_t pos,
> +                         size_t *write_bytes);
>
>  /* tree-defrag.c */
>  int btrfs_defrag_leaves(struct btrfs_trans_handle *trans,
> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
> index 8d47c76b7bd1..8dc084600f4e 100644
> --- a/fs/btrfs/file.c
> +++ b/fs/btrfs/file.c
> @@ -1544,8 +1544,8 @@ lock_and_cleanup_extent_if_need(struct btrfs_inode *inode, struct page **pages,
>         return ret;
>  }
>
> -static noinline int check_can_nocow(struct btrfs_inode *inode, loff_t pos,
> -                                   size_t *write_bytes)
> +int btrfs_check_can_nocow(struct btrfs_inode *inode, loff_t pos,
> +                         size_t *write_bytes)
>  {
>         struct btrfs_fs_info *fs_info = inode->root->fs_info;
>         struct btrfs_root *root = inode->root;
> @@ -1645,8 +1645,8 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
>                 if (ret < 0) {
>                         if ((BTRFS_I(inode)->flags & (BTRFS_INODE_NODATACOW |
>                                                       BTRFS_INODE_PREALLOC)) &&
> -                           check_can_nocow(BTRFS_I(inode), pos,
> -                                       &write_bytes) > 0) {
> +                           btrfs_check_can_nocow(BTRFS_I(inode), pos,
> +                                                 &write_bytes) > 0) {
>                                 /*
>                                  * For nodata cow case, no need to reserve
>                                  * data space.
> @@ -1923,7 +1923,7 @@ static ssize_t btrfs_file_write_iter(struct kiocb *iocb,
>                  */
>                 if (!(BTRFS_I(inode)->flags & (BTRFS_INODE_NODATACOW |
>                                               BTRFS_INODE_PREALLOC)) ||
> -                   check_can_nocow(BTRFS_I(inode), pos, &count) <= 0) {
> +                   btrfs_check_can_nocow(BTRFS_I(inode), pos, &count) <= 0) {
>                         inode_unlock(inode);
>                         return -EAGAIN;
>                 }
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 5509c41a4f43..b5ae4bbf1ad4 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -4974,11 +4974,13 @@ int btrfs_truncate_block(struct inode *inode, loff_t from, loff_t len,
>         struct extent_state *cached_state = NULL;
>         struct extent_changeset *data_reserved = NULL;
>         char *kaddr;
> +       bool only_release_metadata = false;
>         u32 blocksize = fs_info->sectorsize;
>         pgoff_t index = from >> PAGE_SHIFT;
>         unsigned offset = from & (blocksize - 1);
>         struct page *page;
>         gfp_t mask = btrfs_alloc_write_mask(mapping);
> +       size_t write_bytes = blocksize;
>         int ret = 0;
>         u64 block_start;
>         u64 block_end;
> @@ -4990,11 +4992,26 @@ int btrfs_truncate_block(struct inode *inode, loff_t from, loff_t len,
>         block_start = round_down(from, blocksize);
>         block_end = block_start + blocksize - 1;
>
> -       ret = btrfs_delalloc_reserve_space(inode, &data_reserved,
> -                                          block_start, blocksize);
> -       if (ret)
> +       ret = btrfs_check_data_free_space(inode, &data_reserved, block_start,
> +                                         blocksize);
> +       if (ret < 0) {
> +               if ((BTRFS_I(inode)->flags & (BTRFS_INODE_NODATACOW |
> +                                             BTRFS_INODE_PREALLOC)) &&
> +                   btrfs_check_can_nocow(BTRFS_I(inode), block_start,
> +                                         &write_bytes) > 0) {
> +                       /* For nocow case, no need to reserve data space. */
> +                       only_release_metadata = true;
> +               } else {
> +                       goto out;
> +               }
> +       }
> +       ret = btrfs_delalloc_reserve_metadata(BTRFS_I(inode), blocksize);
> +       if (ret < 0) {
> +               if (!only_release_metadata)
> +                       btrfs_free_reserved_data_space(inode, data_reserved,
> +                                       block_start, blocksize);
>                 goto out;
> -
> +       }
>  again:
>         page = find_or_create_page(mapping, index, mask);
>         if (!page) {
> @@ -5063,10 +5080,20 @@ int btrfs_truncate_block(struct inode *inode, loff_t from, loff_t len,
>         set_page_dirty(page);
>         unlock_extent_cached(io_tree, block_start, block_end, &cached_state);
>
> +       if (only_release_metadata)
> +               set_extent_bit(&BTRFS_I(inode)->io_tree, block_start,
> +                               block_end, EXTENT_NORESERVE, NULL, NULL,
> +                               GFP_NOFS);
> +
>  out_unlock:
> -       if (ret)
> -               btrfs_delalloc_release_space(inode, data_reserved, block_start,
> -                                            blocksize, true);
> +       if (ret) {
> +               if (!only_release_metadata)
> +                       btrfs_delalloc_release_space(inode, data_reserved,
> +                                       block_start, blocksize, true);
> +               else
> +                       btrfs_delalloc_release_metadata(BTRFS_I(inode),
> +                                       blocksize, true);

I usually find it more intuitive to have it the other way around:

if (only_release_metadata)
  ...
else
  ...

E.g., positive case first, negative in the else branch. But that's
likely too much of a personal preference.

Thanks.

> +       }
>         btrfs_delalloc_release_extents(BTRFS_I(inode), blocksize);
>         unlock_page(page);
>         put_page(page);
> --
> 2.25.0
>
Qu Wenruo Jan. 30, 2020, 10:36 a.m. UTC | #2
On 2020/1/30 下午6:02, Filipe Manana wrote:
> On Thu, Jan 30, 2020 at 5:30 AM Qu Wenruo <wqu@suse.com> wrote:
>>
>> [BUG]
>> When the data space is exhausted, even the inode has NOCOW attribute,
>> btrfs will still refuse to truncate unaligned range due to ENOSPC.
>>
>> The following script can reproduce it pretty easily:
>>   #!/bin/bash
>>
>>   dev=/dev/test/test
>>   mnt=/mnt/btrfs
>>
>>   umount $dev &> /dev/null
>>   umount $mnt&> /dev/null
>>
>>   mkfs.btrfs -f $dev -b 1G
>>   mount -o nospace_cache $dev $mnt
>>   touch $mnt/foobar
>>   chattr +C $mnt/foobar
>>
>>   xfs_io -f -c "pwrite -b 4k 0 4k" $mnt/foobar > /dev/null
>>   xfs_io -f -c "pwrite -b 4k 0 1G" $mnt/padding &> /dev/null
>>   sync
>>
>>   xfs_io -c "fpunch 0 2k" $mnt/foobar
>>   umount $mnt
>>
>> Current btrfs will fail at the fpunch part.
>>
>> [CAUSE]
>> Because btrfs_truncate_block() always reserve space without checking the
>> NOCOW attribute.
>>
>> Since the writeback path follows NOCOW bit, we only need to bother the
>> space reservation code in btrfs_truncate_block().
>>
>> [FIX]
>> Make btrfs_truncate_block() to follow btrfs_buffered_write() to try to
>> reserve data space first, and falls back to NOCOW check only when we
>> don't have enough space.
>>
>> Such always-try-reserve is an optimization introduced in
>> btrfs_buffered_write(), to avoid expensive btrfs_check_can_nocow() call.
>>
>> Since now check_can_nocow() is needed outside of inode.c, also export it
>> and rename it to btrfs_check_can_nocow().
>>
>> Reported-by: Martin Doucha <martin.doucha@suse.com>
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>> ---
>> Test case will be submitted to fstests by the reporter.
> 
> Well, this is a sudden change of mind, isn't it? :)
> 
> We had btrfs/172, which you removed very recently, that precisely tested this:
> 
> https://git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git/commit/?id=538d8a4bcc782258f8f95fae815d5e859dee9126

I didn't notice the nodatacow mount option. Super duper big facepalm.

All my bad, especially feel sorry for Anand.

With nodatacow mount option there, that test case in fact makes a lot of
sense.
Sorry again for that.

Anand, mind to resubmit it to generic group?

Thanks,
Qu

> 
> Even though there are several reasons why this can still fail (at
> writeback time), like regular buffered writes through the family of
> write() syscalls can, I think it's perfectly fine to have this
> behaviour.
> 
> Reviewed-by: Filipe Manana <fdmanana@suse.com>
> 
> So I think we can just resurrect btrfs/172 now...
> 
>> ---
>>  fs/btrfs/ctree.h |  2 ++
>>  fs/btrfs/file.c  | 10 +++++-----
>>  fs/btrfs/inode.c | 41 ++++++++++++++++++++++++++++++++++-------
>>  3 files changed, 41 insertions(+), 12 deletions(-)
>>
>> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
>> index 54efb21c2727..b5639f3461e4 100644
>> --- a/fs/btrfs/ctree.h
>> +++ b/fs/btrfs/ctree.h
>> @@ -2954,6 +2954,8 @@ int btrfs_fdatawrite_range(struct inode *inode, loff_t start, loff_t end);
>>  loff_t btrfs_remap_file_range(struct file *file_in, loff_t pos_in,
>>                               struct file *file_out, loff_t pos_out,
>>                               loff_t len, unsigned int remap_flags);
>> +int btrfs_check_can_nocow(struct btrfs_inode *inode, loff_t pos,
>> +                         size_t *write_bytes);
>>
>>  /* tree-defrag.c */
>>  int btrfs_defrag_leaves(struct btrfs_trans_handle *trans,
>> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
>> index 8d47c76b7bd1..8dc084600f4e 100644
>> --- a/fs/btrfs/file.c
>> +++ b/fs/btrfs/file.c
>> @@ -1544,8 +1544,8 @@ lock_and_cleanup_extent_if_need(struct btrfs_inode *inode, struct page **pages,
>>         return ret;
>>  }
>>
>> -static noinline int check_can_nocow(struct btrfs_inode *inode, loff_t pos,
>> -                                   size_t *write_bytes)
>> +int btrfs_check_can_nocow(struct btrfs_inode *inode, loff_t pos,
>> +                         size_t *write_bytes)
>>  {
>>         struct btrfs_fs_info *fs_info = inode->root->fs_info;
>>         struct btrfs_root *root = inode->root;
>> @@ -1645,8 +1645,8 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
>>                 if (ret < 0) {
>>                         if ((BTRFS_I(inode)->flags & (BTRFS_INODE_NODATACOW |
>>                                                       BTRFS_INODE_PREALLOC)) &&
>> -                           check_can_nocow(BTRFS_I(inode), pos,
>> -                                       &write_bytes) > 0) {
>> +                           btrfs_check_can_nocow(BTRFS_I(inode), pos,
>> +                                                 &write_bytes) > 0) {
>>                                 /*
>>                                  * For nodata cow case, no need to reserve
>>                                  * data space.
>> @@ -1923,7 +1923,7 @@ static ssize_t btrfs_file_write_iter(struct kiocb *iocb,
>>                  */
>>                 if (!(BTRFS_I(inode)->flags & (BTRFS_INODE_NODATACOW |
>>                                               BTRFS_INODE_PREALLOC)) ||
>> -                   check_can_nocow(BTRFS_I(inode), pos, &count) <= 0) {
>> +                   btrfs_check_can_nocow(BTRFS_I(inode), pos, &count) <= 0) {
>>                         inode_unlock(inode);
>>                         return -EAGAIN;
>>                 }
>> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
>> index 5509c41a4f43..b5ae4bbf1ad4 100644
>> --- a/fs/btrfs/inode.c
>> +++ b/fs/btrfs/inode.c
>> @@ -4974,11 +4974,13 @@ int btrfs_truncate_block(struct inode *inode, loff_t from, loff_t len,
>>         struct extent_state *cached_state = NULL;
>>         struct extent_changeset *data_reserved = NULL;
>>         char *kaddr;
>> +       bool only_release_metadata = false;
>>         u32 blocksize = fs_info->sectorsize;
>>         pgoff_t index = from >> PAGE_SHIFT;
>>         unsigned offset = from & (blocksize - 1);
>>         struct page *page;
>>         gfp_t mask = btrfs_alloc_write_mask(mapping);
>> +       size_t write_bytes = blocksize;
>>         int ret = 0;
>>         u64 block_start;
>>         u64 block_end;
>> @@ -4990,11 +4992,26 @@ int btrfs_truncate_block(struct inode *inode, loff_t from, loff_t len,
>>         block_start = round_down(from, blocksize);
>>         block_end = block_start + blocksize - 1;
>>
>> -       ret = btrfs_delalloc_reserve_space(inode, &data_reserved,
>> -                                          block_start, blocksize);
>> -       if (ret)
>> +       ret = btrfs_check_data_free_space(inode, &data_reserved, block_start,
>> +                                         blocksize);
>> +       if (ret < 0) {
>> +               if ((BTRFS_I(inode)->flags & (BTRFS_INODE_NODATACOW |
>> +                                             BTRFS_INODE_PREALLOC)) &&
>> +                   btrfs_check_can_nocow(BTRFS_I(inode), block_start,
>> +                                         &write_bytes) > 0) {
>> +                       /* For nocow case, no need to reserve data space. */
>> +                       only_release_metadata = true;
>> +               } else {
>> +                       goto out;
>> +               }
>> +       }
>> +       ret = btrfs_delalloc_reserve_metadata(BTRFS_I(inode), blocksize);
>> +       if (ret < 0) {
>> +               if (!only_release_metadata)
>> +                       btrfs_free_reserved_data_space(inode, data_reserved,
>> +                                       block_start, blocksize);
>>                 goto out;
>> -
>> +       }
>>  again:
>>         page = find_or_create_page(mapping, index, mask);
>>         if (!page) {
>> @@ -5063,10 +5080,20 @@ int btrfs_truncate_block(struct inode *inode, loff_t from, loff_t len,
>>         set_page_dirty(page);
>>         unlock_extent_cached(io_tree, block_start, block_end, &cached_state);
>>
>> +       if (only_release_metadata)
>> +               set_extent_bit(&BTRFS_I(inode)->io_tree, block_start,
>> +                               block_end, EXTENT_NORESERVE, NULL, NULL,
>> +                               GFP_NOFS);
>> +
>>  out_unlock:
>> -       if (ret)
>> -               btrfs_delalloc_release_space(inode, data_reserved, block_start,
>> -                                            blocksize, true);
>> +       if (ret) {
>> +               if (!only_release_metadata)
>> +                       btrfs_delalloc_release_space(inode, data_reserved,
>> +                                       block_start, blocksize, true);
>> +               else
>> +                       btrfs_delalloc_release_metadata(BTRFS_I(inode),
>> +                                       blocksize, true);
> 
> I usually find it more intuitive to have it the other way around:
> 
> if (only_release_metadata)
>   ...
> else
>   ...
> 
> E.g., positive case first, negative in the else branch. But that's
> likely too much of a personal preference.
> 
> Thanks.
> 
>> +       }
>>         btrfs_delalloc_release_extents(BTRFS_I(inode), blocksize);
>>         unlock_page(page);
>>         put_page(page);
>> --
>> 2.25.0
>>
> 
>
Filipe Manana Jan. 30, 2020, 10:46 a.m. UTC | #3
On Thu, Jan 30, 2020 at 10:36 AM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>
>
>
> On 2020/1/30 下午6:02, Filipe Manana wrote:
> > On Thu, Jan 30, 2020 at 5:30 AM Qu Wenruo <wqu@suse.com> wrote:
> >>
> >> [BUG]
> >> When the data space is exhausted, even the inode has NOCOW attribute,
> >> btrfs will still refuse to truncate unaligned range due to ENOSPC.
> >>
> >> The following script can reproduce it pretty easily:
> >>   #!/bin/bash
> >>
> >>   dev=/dev/test/test
> >>   mnt=/mnt/btrfs
> >>
> >>   umount $dev &> /dev/null
> >>   umount $mnt&> /dev/null
> >>
> >>   mkfs.btrfs -f $dev -b 1G
> >>   mount -o nospace_cache $dev $mnt
> >>   touch $mnt/foobar
> >>   chattr +C $mnt/foobar
> >>
> >>   xfs_io -f -c "pwrite -b 4k 0 4k" $mnt/foobar > /dev/null
> >>   xfs_io -f -c "pwrite -b 4k 0 1G" $mnt/padding &> /dev/null
> >>   sync
> >>
> >>   xfs_io -c "fpunch 0 2k" $mnt/foobar
> >>   umount $mnt
> >>
> >> Current btrfs will fail at the fpunch part.
> >>
> >> [CAUSE]
> >> Because btrfs_truncate_block() always reserve space without checking the
> >> NOCOW attribute.
> >>
> >> Since the writeback path follows NOCOW bit, we only need to bother the
> >> space reservation code in btrfs_truncate_block().
> >>
> >> [FIX]
> >> Make btrfs_truncate_block() to follow btrfs_buffered_write() to try to
> >> reserve data space first, and falls back to NOCOW check only when we
> >> don't have enough space.
> >>
> >> Such always-try-reserve is an optimization introduced in
> >> btrfs_buffered_write(), to avoid expensive btrfs_check_can_nocow() call.
> >>
> >> Since now check_can_nocow() is needed outside of inode.c, also export it
> >> and rename it to btrfs_check_can_nocow().
> >>
> >> Reported-by: Martin Doucha <martin.doucha@suse.com>
> >> Signed-off-by: Qu Wenruo <wqu@suse.com>
> >> ---
> >> Test case will be submitted to fstests by the reporter.
> >
> > Well, this is a sudden change of mind, isn't it? :)
> >
> > We had btrfs/172, which you removed very recently, that precisely tested this:
> >
> > https://git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git/commit/?id=538d8a4bcc782258f8f95fae815d5e859dee9126
>
> I didn't notice the nodatacow mount option. Super duper big facepalm.
>
> All my bad, especially feel sorry for Anand.
>
> With nodatacow mount option there, that test case in fact makes a lot of
> sense.
> Sorry again for that.
>
> Anand, mind to resubmit it to generic group?

Why the generic group?
The nodatacow mount option is btrfs specific, and most filesystems
don't support chattr +C (ext4 for example).

>
> Thanks,
> Qu
>
> >
> > Even though there are several reasons why this can still fail (at
> > writeback time), like regular buffered writes through the family of
> > write() syscalls can, I think it's perfectly fine to have this
> > behaviour.
> >
> > Reviewed-by: Filipe Manana <fdmanana@suse.com>
> >
> > So I think we can just resurrect btrfs/172 now...
> >
> >> ---
> >>  fs/btrfs/ctree.h |  2 ++
> >>  fs/btrfs/file.c  | 10 +++++-----
> >>  fs/btrfs/inode.c | 41 ++++++++++++++++++++++++++++++++++-------
> >>  3 files changed, 41 insertions(+), 12 deletions(-)
> >>
> >> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> >> index 54efb21c2727..b5639f3461e4 100644
> >> --- a/fs/btrfs/ctree.h
> >> +++ b/fs/btrfs/ctree.h
> >> @@ -2954,6 +2954,8 @@ int btrfs_fdatawrite_range(struct inode *inode, loff_t start, loff_t end);
> >>  loff_t btrfs_remap_file_range(struct file *file_in, loff_t pos_in,
> >>                               struct file *file_out, loff_t pos_out,
> >>                               loff_t len, unsigned int remap_flags);
> >> +int btrfs_check_can_nocow(struct btrfs_inode *inode, loff_t pos,
> >> +                         size_t *write_bytes);
> >>
> >>  /* tree-defrag.c */
> >>  int btrfs_defrag_leaves(struct btrfs_trans_handle *trans,
> >> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
> >> index 8d47c76b7bd1..8dc084600f4e 100644
> >> --- a/fs/btrfs/file.c
> >> +++ b/fs/btrfs/file.c
> >> @@ -1544,8 +1544,8 @@ lock_and_cleanup_extent_if_need(struct btrfs_inode *inode, struct page **pages,
> >>         return ret;
> >>  }
> >>
> >> -static noinline int check_can_nocow(struct btrfs_inode *inode, loff_t pos,
> >> -                                   size_t *write_bytes)
> >> +int btrfs_check_can_nocow(struct btrfs_inode *inode, loff_t pos,
> >> +                         size_t *write_bytes)
> >>  {
> >>         struct btrfs_fs_info *fs_info = inode->root->fs_info;
> >>         struct btrfs_root *root = inode->root;
> >> @@ -1645,8 +1645,8 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
> >>                 if (ret < 0) {
> >>                         if ((BTRFS_I(inode)->flags & (BTRFS_INODE_NODATACOW |
> >>                                                       BTRFS_INODE_PREALLOC)) &&
> >> -                           check_can_nocow(BTRFS_I(inode), pos,
> >> -                                       &write_bytes) > 0) {
> >> +                           btrfs_check_can_nocow(BTRFS_I(inode), pos,
> >> +                                                 &write_bytes) > 0) {
> >>                                 /*
> >>                                  * For nodata cow case, no need to reserve
> >>                                  * data space.
> >> @@ -1923,7 +1923,7 @@ static ssize_t btrfs_file_write_iter(struct kiocb *iocb,
> >>                  */
> >>                 if (!(BTRFS_I(inode)->flags & (BTRFS_INODE_NODATACOW |
> >>                                               BTRFS_INODE_PREALLOC)) ||
> >> -                   check_can_nocow(BTRFS_I(inode), pos, &count) <= 0) {
> >> +                   btrfs_check_can_nocow(BTRFS_I(inode), pos, &count) <= 0) {
> >>                         inode_unlock(inode);
> >>                         return -EAGAIN;
> >>                 }
> >> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> >> index 5509c41a4f43..b5ae4bbf1ad4 100644
> >> --- a/fs/btrfs/inode.c
> >> +++ b/fs/btrfs/inode.c
> >> @@ -4974,11 +4974,13 @@ int btrfs_truncate_block(struct inode *inode, loff_t from, loff_t len,
> >>         struct extent_state *cached_state = NULL;
> >>         struct extent_changeset *data_reserved = NULL;
> >>         char *kaddr;
> >> +       bool only_release_metadata = false;
> >>         u32 blocksize = fs_info->sectorsize;
> >>         pgoff_t index = from >> PAGE_SHIFT;
> >>         unsigned offset = from & (blocksize - 1);
> >>         struct page *page;
> >>         gfp_t mask = btrfs_alloc_write_mask(mapping);
> >> +       size_t write_bytes = blocksize;
> >>         int ret = 0;
> >>         u64 block_start;
> >>         u64 block_end;
> >> @@ -4990,11 +4992,26 @@ int btrfs_truncate_block(struct inode *inode, loff_t from, loff_t len,
> >>         block_start = round_down(from, blocksize);
> >>         block_end = block_start + blocksize - 1;
> >>
> >> -       ret = btrfs_delalloc_reserve_space(inode, &data_reserved,
> >> -                                          block_start, blocksize);
> >> -       if (ret)
> >> +       ret = btrfs_check_data_free_space(inode, &data_reserved, block_start,
> >> +                                         blocksize);
> >> +       if (ret < 0) {
> >> +               if ((BTRFS_I(inode)->flags & (BTRFS_INODE_NODATACOW |
> >> +                                             BTRFS_INODE_PREALLOC)) &&
> >> +                   btrfs_check_can_nocow(BTRFS_I(inode), block_start,
> >> +                                         &write_bytes) > 0) {
> >> +                       /* For nocow case, no need to reserve data space. */
> >> +                       only_release_metadata = true;
> >> +               } else {
> >> +                       goto out;
> >> +               }
> >> +       }
> >> +       ret = btrfs_delalloc_reserve_metadata(BTRFS_I(inode), blocksize);
> >> +       if (ret < 0) {
> >> +               if (!only_release_metadata)
> >> +                       btrfs_free_reserved_data_space(inode, data_reserved,
> >> +                                       block_start, blocksize);
> >>                 goto out;
> >> -
> >> +       }
> >>  again:
> >>         page = find_or_create_page(mapping, index, mask);
> >>         if (!page) {
> >> @@ -5063,10 +5080,20 @@ int btrfs_truncate_block(struct inode *inode, loff_t from, loff_t len,
> >>         set_page_dirty(page);
> >>         unlock_extent_cached(io_tree, block_start, block_end, &cached_state);
> >>
> >> +       if (only_release_metadata)
> >> +               set_extent_bit(&BTRFS_I(inode)->io_tree, block_start,
> >> +                               block_end, EXTENT_NORESERVE, NULL, NULL,
> >> +                               GFP_NOFS);
> >> +
> >>  out_unlock:
> >> -       if (ret)
> >> -               btrfs_delalloc_release_space(inode, data_reserved, block_start,
> >> -                                            blocksize, true);
> >> +       if (ret) {
> >> +               if (!only_release_metadata)
> >> +                       btrfs_delalloc_release_space(inode, data_reserved,
> >> +                                       block_start, blocksize, true);
> >> +               else
> >> +                       btrfs_delalloc_release_metadata(BTRFS_I(inode),
> >> +                                       blocksize, true);
> >
> > I usually find it more intuitive to have it the other way around:
> >
> > if (only_release_metadata)
> >   ...
> > else
> >   ...
> >
> > E.g., positive case first, negative in the else branch. But that's
> > likely too much of a personal preference.
> >
> > Thanks.
> >
> >> +       }
> >>         btrfs_delalloc_release_extents(BTRFS_I(inode), blocksize);
> >>         unlock_page(page);
> >>         put_page(page);
> >> --
> >> 2.25.0
> >>
> >
> >
>
Qu Wenruo Jan. 30, 2020, 11:02 a.m. UTC | #4
On 2020/1/30 下午6:46, Filipe Manana wrote:
> On Thu, Jan 30, 2020 at 10:36 AM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>>
>>
>>
>> On 2020/1/30 下午6:02, Filipe Manana wrote:
>>> On Thu, Jan 30, 2020 at 5:30 AM Qu Wenruo <wqu@suse.com> wrote:
>>>>
>>>> [BUG]
>>>> When the data space is exhausted, even the inode has NOCOW attribute,
>>>> btrfs will still refuse to truncate unaligned range due to ENOSPC.
>>>>
>>>> The following script can reproduce it pretty easily:
>>>>   #!/bin/bash
>>>>
>>>>   dev=/dev/test/test
>>>>   mnt=/mnt/btrfs
>>>>
>>>>   umount $dev &> /dev/null
>>>>   umount $mnt&> /dev/null
>>>>
>>>>   mkfs.btrfs -f $dev -b 1G
>>>>   mount -o nospace_cache $dev $mnt
>>>>   touch $mnt/foobar
>>>>   chattr +C $mnt/foobar
>>>>
>>>>   xfs_io -f -c "pwrite -b 4k 0 4k" $mnt/foobar > /dev/null
>>>>   xfs_io -f -c "pwrite -b 4k 0 1G" $mnt/padding &> /dev/null
>>>>   sync
>>>>
>>>>   xfs_io -c "fpunch 0 2k" $mnt/foobar
>>>>   umount $mnt
>>>>
>>>> Current btrfs will fail at the fpunch part.
>>>>
>>>> [CAUSE]
>>>> Because btrfs_truncate_block() always reserve space without checking the
>>>> NOCOW attribute.
>>>>
>>>> Since the writeback path follows NOCOW bit, we only need to bother the
>>>> space reservation code in btrfs_truncate_block().
>>>>
>>>> [FIX]
>>>> Make btrfs_truncate_block() to follow btrfs_buffered_write() to try to
>>>> reserve data space first, and falls back to NOCOW check only when we
>>>> don't have enough space.
>>>>
>>>> Such always-try-reserve is an optimization introduced in
>>>> btrfs_buffered_write(), to avoid expensive btrfs_check_can_nocow() call.
>>>>
>>>> Since now check_can_nocow() is needed outside of inode.c, also export it
>>>> and rename it to btrfs_check_can_nocow().
>>>>
>>>> Reported-by: Martin Doucha <martin.doucha@suse.com>
>>>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>>>> ---
>>>> Test case will be submitted to fstests by the reporter.
>>>
>>> Well, this is a sudden change of mind, isn't it? :)
>>>
>>> We had btrfs/172, which you removed very recently, that precisely tested this:
>>>
>>> https://git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git/commit/?id=538d8a4bcc782258f8f95fae815d5e859dee9126
>>
>> I didn't notice the nodatacow mount option. Super duper big facepalm.
>>
>> All my bad, especially feel sorry for Anand.
>>
>> With nodatacow mount option there, that test case in fact makes a lot of
>> sense.
>> Sorry again for that.
>>
>> Anand, mind to resubmit it to generic group?
> 
> Why the generic group?

Since all other fses should have the same behavior.
Either it supports COW, and get disabled by that chattr, and go ahead.
Or it doesn't support COW, the truncate should go overwrite directly
with or without that chattr +C.

> The nodatacow mount option is btrfs specific, and most filesystems
> don't support chattr +C (ext4 for example).

We can just ignore the chattr call for unsupported fs, and go ahead
without any problem.

Thanks,
Qu

> 
>>
>> Thanks,
>> Qu
>>
>>>
>>> Even though there are several reasons why this can still fail (at
>>> writeback time), like regular buffered writes through the family of
>>> write() syscalls can, I think it's perfectly fine to have this
>>> behaviour.
>>>
>>> Reviewed-by: Filipe Manana <fdmanana@suse.com>
>>>
>>> So I think we can just resurrect btrfs/172 now...
>>>
>>>> ---
>>>>  fs/btrfs/ctree.h |  2 ++
>>>>  fs/btrfs/file.c  | 10 +++++-----
>>>>  fs/btrfs/inode.c | 41 ++++++++++++++++++++++++++++++++++-------
>>>>  3 files changed, 41 insertions(+), 12 deletions(-)
>>>>
>>>> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
>>>> index 54efb21c2727..b5639f3461e4 100644
>>>> --- a/fs/btrfs/ctree.h
>>>> +++ b/fs/btrfs/ctree.h
>>>> @@ -2954,6 +2954,8 @@ int btrfs_fdatawrite_range(struct inode *inode, loff_t start, loff_t end);
>>>>  loff_t btrfs_remap_file_range(struct file *file_in, loff_t pos_in,
>>>>                               struct file *file_out, loff_t pos_out,
>>>>                               loff_t len, unsigned int remap_flags);
>>>> +int btrfs_check_can_nocow(struct btrfs_inode *inode, loff_t pos,
>>>> +                         size_t *write_bytes);
>>>>
>>>>  /* tree-defrag.c */
>>>>  int btrfs_defrag_leaves(struct btrfs_trans_handle *trans,
>>>> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
>>>> index 8d47c76b7bd1..8dc084600f4e 100644
>>>> --- a/fs/btrfs/file.c
>>>> +++ b/fs/btrfs/file.c
>>>> @@ -1544,8 +1544,8 @@ lock_and_cleanup_extent_if_need(struct btrfs_inode *inode, struct page **pages,
>>>>         return ret;
>>>>  }
>>>>
>>>> -static noinline int check_can_nocow(struct btrfs_inode *inode, loff_t pos,
>>>> -                                   size_t *write_bytes)
>>>> +int btrfs_check_can_nocow(struct btrfs_inode *inode, loff_t pos,
>>>> +                         size_t *write_bytes)
>>>>  {
>>>>         struct btrfs_fs_info *fs_info = inode->root->fs_info;
>>>>         struct btrfs_root *root = inode->root;
>>>> @@ -1645,8 +1645,8 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
>>>>                 if (ret < 0) {
>>>>                         if ((BTRFS_I(inode)->flags & (BTRFS_INODE_NODATACOW |
>>>>                                                       BTRFS_INODE_PREALLOC)) &&
>>>> -                           check_can_nocow(BTRFS_I(inode), pos,
>>>> -                                       &write_bytes) > 0) {
>>>> +                           btrfs_check_can_nocow(BTRFS_I(inode), pos,
>>>> +                                                 &write_bytes) > 0) {
>>>>                                 /*
>>>>                                  * For nodata cow case, no need to reserve
>>>>                                  * data space.
>>>> @@ -1923,7 +1923,7 @@ static ssize_t btrfs_file_write_iter(struct kiocb *iocb,
>>>>                  */
>>>>                 if (!(BTRFS_I(inode)->flags & (BTRFS_INODE_NODATACOW |
>>>>                                               BTRFS_INODE_PREALLOC)) ||
>>>> -                   check_can_nocow(BTRFS_I(inode), pos, &count) <= 0) {
>>>> +                   btrfs_check_can_nocow(BTRFS_I(inode), pos, &count) <= 0) {
>>>>                         inode_unlock(inode);
>>>>                         return -EAGAIN;
>>>>                 }
>>>> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
>>>> index 5509c41a4f43..b5ae4bbf1ad4 100644
>>>> --- a/fs/btrfs/inode.c
>>>> +++ b/fs/btrfs/inode.c
>>>> @@ -4974,11 +4974,13 @@ int btrfs_truncate_block(struct inode *inode, loff_t from, loff_t len,
>>>>         struct extent_state *cached_state = NULL;
>>>>         struct extent_changeset *data_reserved = NULL;
>>>>         char *kaddr;
>>>> +       bool only_release_metadata = false;
>>>>         u32 blocksize = fs_info->sectorsize;
>>>>         pgoff_t index = from >> PAGE_SHIFT;
>>>>         unsigned offset = from & (blocksize - 1);
>>>>         struct page *page;
>>>>         gfp_t mask = btrfs_alloc_write_mask(mapping);
>>>> +       size_t write_bytes = blocksize;
>>>>         int ret = 0;
>>>>         u64 block_start;
>>>>         u64 block_end;
>>>> @@ -4990,11 +4992,26 @@ int btrfs_truncate_block(struct inode *inode, loff_t from, loff_t len,
>>>>         block_start = round_down(from, blocksize);
>>>>         block_end = block_start + blocksize - 1;
>>>>
>>>> -       ret = btrfs_delalloc_reserve_space(inode, &data_reserved,
>>>> -                                          block_start, blocksize);
>>>> -       if (ret)
>>>> +       ret = btrfs_check_data_free_space(inode, &data_reserved, block_start,
>>>> +                                         blocksize);
>>>> +       if (ret < 0) {
>>>> +               if ((BTRFS_I(inode)->flags & (BTRFS_INODE_NODATACOW |
>>>> +                                             BTRFS_INODE_PREALLOC)) &&
>>>> +                   btrfs_check_can_nocow(BTRFS_I(inode), block_start,
>>>> +                                         &write_bytes) > 0) {
>>>> +                       /* For nocow case, no need to reserve data space. */
>>>> +                       only_release_metadata = true;
>>>> +               } else {
>>>> +                       goto out;
>>>> +               }
>>>> +       }
>>>> +       ret = btrfs_delalloc_reserve_metadata(BTRFS_I(inode), blocksize);
>>>> +       if (ret < 0) {
>>>> +               if (!only_release_metadata)
>>>> +                       btrfs_free_reserved_data_space(inode, data_reserved,
>>>> +                                       block_start, blocksize);
>>>>                 goto out;
>>>> -
>>>> +       }
>>>>  again:
>>>>         page = find_or_create_page(mapping, index, mask);
>>>>         if (!page) {
>>>> @@ -5063,10 +5080,20 @@ int btrfs_truncate_block(struct inode *inode, loff_t from, loff_t len,
>>>>         set_page_dirty(page);
>>>>         unlock_extent_cached(io_tree, block_start, block_end, &cached_state);
>>>>
>>>> +       if (only_release_metadata)
>>>> +               set_extent_bit(&BTRFS_I(inode)->io_tree, block_start,
>>>> +                               block_end, EXTENT_NORESERVE, NULL, NULL,
>>>> +                               GFP_NOFS);
>>>> +
>>>>  out_unlock:
>>>> -       if (ret)
>>>> -               btrfs_delalloc_release_space(inode, data_reserved, block_start,
>>>> -                                            blocksize, true);
>>>> +       if (ret) {
>>>> +               if (!only_release_metadata)
>>>> +                       btrfs_delalloc_release_space(inode, data_reserved,
>>>> +                                       block_start, blocksize, true);
>>>> +               else
>>>> +                       btrfs_delalloc_release_metadata(BTRFS_I(inode),
>>>> +                                       blocksize, true);
>>>
>>> I usually find it more intuitive to have it the other way around:
>>>
>>> if (only_release_metadata)
>>>   ...
>>> else
>>>   ...
>>>
>>> E.g., positive case first, negative in the else branch. But that's
>>> likely too much of a personal preference.
>>>
>>> Thanks.
>>>
>>>> +       }
>>>>         btrfs_delalloc_release_extents(BTRFS_I(inode), blocksize);
>>>>         unlock_page(page);
>>>>         put_page(page);
>>>> --
>>>> 2.25.0
>>>>
>>>
>>>
>>
> 
>
Anand Jain Jan. 31, 2020, 4:42 a.m. UTC | #5
On 30/1/20 7:02 PM, Qu Wenruo wrote:
> 
> 
> On 2020/1/30 下午6:46, Filipe Manana wrote:
>> On Thu, Jan 30, 2020 at 10:36 AM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>>>
>>>
>>>
>>> On 2020/1/30 下午6:02, Filipe Manana wrote:
>>>> On Thu, Jan 30, 2020 at 5:30 AM Qu Wenruo <wqu@suse.com> wrote:
>>>>>
>>>>> [BUG]
>>>>> When the data space is exhausted, even the inode has NOCOW attribute,
>>>>> btrfs will still refuse to truncate unaligned range due to ENOSPC.
>>>>>
>>>>> The following script can reproduce it pretty easily:
>>>>>    #!/bin/bash
>>>>>
>>>>>    dev=/dev/test/test
>>>>>    mnt=/mnt/btrfs
>>>>>
>>>>>    umount $dev &> /dev/null
>>>>>    umount $mnt&> /dev/null
>>>>>
>>>>>    mkfs.btrfs -f $dev -b 1G
>>>>>    mount -o nospace_cache $dev $mnt
>>>>>    touch $mnt/foobar
>>>>>    chattr +C $mnt/foobar
>>>>>
>>>>>    xfs_io -f -c "pwrite -b 4k 0 4k" $mnt/foobar > /dev/null
>>>>>    xfs_io -f -c "pwrite -b 4k 0 1G" $mnt/padding &> /dev/null
>>>>>    sync
>>>>>
>>>>>    xfs_io -c "fpunch 0 2k" $mnt/foobar
>>>>>    umount $mnt
>>>>>
>>>>> Current btrfs will fail at the fpunch part.
>>>>>
>>>>> [CAUSE]
>>>>> Because btrfs_truncate_block() always reserve space without checking the
>>>>> NOCOW attribute.
>>>>>
>>>>> Since the writeback path follows NOCOW bit, we only need to bother the
>>>>> space reservation code in btrfs_truncate_block().
>>>>>
>>>>> [FIX]
>>>>> Make btrfs_truncate_block() to follow btrfs_buffered_write() to try to
>>>>> reserve data space first, and falls back to NOCOW check only when we
>>>>> don't have enough space.
>>>>>
>>>>> Such always-try-reserve is an optimization introduced in
>>>>> btrfs_buffered_write(), to avoid expensive btrfs_check_can_nocow() call.
>>>>>
>>>>> Since now check_can_nocow() is needed outside of inode.c, also export it
>>>>> and rename it to btrfs_check_can_nocow().
>>>>>
>>>>> Reported-by: Martin Doucha <martin.doucha@suse.com>
>>>>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>>>>> ---
>>>>> Test case will be submitted to fstests by the reporter.
>>>>
>>>> Well, this is a sudden change of mind, isn't it? :)
>>>>
>>>> We had btrfs/172, which you removed very recently, that precisely tested this:
>>>>
>>>> https://git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git/commit/?id=538d8a4bcc782258f8f95fae815d5e859dee9126
>>>
>>> I didn't notice the nodatacow mount option. Super duper big facepalm.
>>>
>>> All my bad, especially feel sorry for Anand.
>>>
>>> With nodatacow mount option there, that test case in fact makes a lot of
>>> sense.
>>> Sorry again for that.
>>>
>>> Anand, mind to resubmit it to generic group?


  Lets restore it step by step. Patch sent to restore.
  You may like bring it to generic.

  Tested-by: Anand Jain <anand.jain@oracle.com>

Thanks, Anand


>> Why the generic group?
> 
> Since all other fses should have the same behavior.
> Either it supports COW, and get disabled by that chattr, and go ahead.
> Or it doesn't support COW, the truncate should go overwrite directly
> with or without that chattr +C.
> 
>> The nodatacow mount option is btrfs specific, and most filesystems
>> don't support chattr +C (ext4 for example).
> 
> We can just ignore the chattr call for unsupported fs, and go ahead
> without any problem.



> Thanks,
> Qu
> 
>>
>>>
>>> Thanks,
>>> Qu
>>>
>>>>
>>>> Even though there are several reasons why this can still fail (at
>>>> writeback time), like regular buffered writes through the family of
>>>> write() syscalls can, I think it's perfectly fine to have this
>>>> behaviour.
>>>>
>>>> Reviewed-by: Filipe Manana <fdmanana@suse.com>
>>>>
>>>> So I think we can just resurrect btrfs/172 now...
>>>>
>>>>> ---
>>>>>   fs/btrfs/ctree.h |  2 ++
>>>>>   fs/btrfs/file.c  | 10 +++++-----
>>>>>   fs/btrfs/inode.c | 41 ++++++++++++++++++++++++++++++++++-------
>>>>>   3 files changed, 41 insertions(+), 12 deletions(-)
>>>>>
>>>>> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
>>>>> index 54efb21c2727..b5639f3461e4 100644
>>>>> --- a/fs/btrfs/ctree.h
>>>>> +++ b/fs/btrfs/ctree.h
>>>>> @@ -2954,6 +2954,8 @@ int btrfs_fdatawrite_range(struct inode *inode, loff_t start, loff_t end);
>>>>>   loff_t btrfs_remap_file_range(struct file *file_in, loff_t pos_in,
>>>>>                                struct file *file_out, loff_t pos_out,
>>>>>                                loff_t len, unsigned int remap_flags);
>>>>> +int btrfs_check_can_nocow(struct btrfs_inode *inode, loff_t pos,
>>>>> +                         size_t *write_bytes);
>>>>>
>>>>>   /* tree-defrag.c */
>>>>>   int btrfs_defrag_leaves(struct btrfs_trans_handle *trans,
>>>>> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
>>>>> index 8d47c76b7bd1..8dc084600f4e 100644
>>>>> --- a/fs/btrfs/file.c
>>>>> +++ b/fs/btrfs/file.c
>>>>> @@ -1544,8 +1544,8 @@ lock_and_cleanup_extent_if_need(struct btrfs_inode *inode, struct page **pages,
>>>>>          return ret;
>>>>>   }
>>>>>
>>>>> -static noinline int check_can_nocow(struct btrfs_inode *inode, loff_t pos,
>>>>> -                                   size_t *write_bytes)
>>>>> +int btrfs_check_can_nocow(struct btrfs_inode *inode, loff_t pos,
>>>>> +                         size_t *write_bytes)
>>>>>   {
>>>>>          struct btrfs_fs_info *fs_info = inode->root->fs_info;
>>>>>          struct btrfs_root *root = inode->root;
>>>>> @@ -1645,8 +1645,8 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
>>>>>                  if (ret < 0) {
>>>>>                          if ((BTRFS_I(inode)->flags & (BTRFS_INODE_NODATACOW |
>>>>>                                                        BTRFS_INODE_PREALLOC)) &&
>>>>> -                           check_can_nocow(BTRFS_I(inode), pos,
>>>>> -                                       &write_bytes) > 0) {
>>>>> +                           btrfs_check_can_nocow(BTRFS_I(inode), pos,
>>>>> +                                                 &write_bytes) > 0) {
>>>>>                                  /*
>>>>>                                   * For nodata cow case, no need to reserve
>>>>>                                   * data space.
>>>>> @@ -1923,7 +1923,7 @@ static ssize_t btrfs_file_write_iter(struct kiocb *iocb,
>>>>>                   */
>>>>>                  if (!(BTRFS_I(inode)->flags & (BTRFS_INODE_NODATACOW |
>>>>>                                                BTRFS_INODE_PREALLOC)) ||
>>>>> -                   check_can_nocow(BTRFS_I(inode), pos, &count) <= 0) {
>>>>> +                   btrfs_check_can_nocow(BTRFS_I(inode), pos, &count) <= 0) {
>>>>>                          inode_unlock(inode);
>>>>>                          return -EAGAIN;
>>>>>                  }
>>>>> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
>>>>> index 5509c41a4f43..b5ae4bbf1ad4 100644
>>>>> --- a/fs/btrfs/inode.c
>>>>> +++ b/fs/btrfs/inode.c
>>>>> @@ -4974,11 +4974,13 @@ int btrfs_truncate_block(struct inode *inode, loff_t from, loff_t len,
>>>>>          struct extent_state *cached_state = NULL;
>>>>>          struct extent_changeset *data_reserved = NULL;
>>>>>          char *kaddr;
>>>>> +       bool only_release_metadata = false;
>>>>>          u32 blocksize = fs_info->sectorsize;
>>>>>          pgoff_t index = from >> PAGE_SHIFT;
>>>>>          unsigned offset = from & (blocksize - 1);
>>>>>          struct page *page;
>>>>>          gfp_t mask = btrfs_alloc_write_mask(mapping);
>>>>> +       size_t write_bytes = blocksize;
>>>>>          int ret = 0;
>>>>>          u64 block_start;
>>>>>          u64 block_end;
>>>>> @@ -4990,11 +4992,26 @@ int btrfs_truncate_block(struct inode *inode, loff_t from, loff_t len,
>>>>>          block_start = round_down(from, blocksize);
>>>>>          block_end = block_start + blocksize - 1;
>>>>>
>>>>> -       ret = btrfs_delalloc_reserve_space(inode, &data_reserved,
>>>>> -                                          block_start, blocksize);
>>>>> -       if (ret)
>>>>> +       ret = btrfs_check_data_free_space(inode, &data_reserved, block_start,
>>>>> +                                         blocksize);
>>>>> +       if (ret < 0) {
>>>>> +               if ((BTRFS_I(inode)->flags & (BTRFS_INODE_NODATACOW |
>>>>> +                                             BTRFS_INODE_PREALLOC)) &&
>>>>> +                   btrfs_check_can_nocow(BTRFS_I(inode), block_start,
>>>>> +                                         &write_bytes) > 0) {
>>>>> +                       /* For nocow case, no need to reserve data space. */
>>>>> +                       only_release_metadata = true;
>>>>> +               } else {
>>>>> +                       goto out;
>>>>> +               }
>>>>> +       }
>>>>> +       ret = btrfs_delalloc_reserve_metadata(BTRFS_I(inode), blocksize);
>>>>> +       if (ret < 0) {
>>>>> +               if (!only_release_metadata)
>>>>> +                       btrfs_free_reserved_data_space(inode, data_reserved,
>>>>> +                                       block_start, blocksize);
>>>>>                  goto out;
>>>>> -
>>>>> +       }
>>>>>   again:
>>>>>          page = find_or_create_page(mapping, index, mask);
>>>>>          if (!page) {
>>>>> @@ -5063,10 +5080,20 @@ int btrfs_truncate_block(struct inode *inode, loff_t from, loff_t len,
>>>>>          set_page_dirty(page);
>>>>>          unlock_extent_cached(io_tree, block_start, block_end, &cached_state);
>>>>>
>>>>> +       if (only_release_metadata)
>>>>> +               set_extent_bit(&BTRFS_I(inode)->io_tree, block_start,
>>>>> +                               block_end, EXTENT_NORESERVE, NULL, NULL,
>>>>> +                               GFP_NOFS);
>>>>> +
>>>>>   out_unlock:
>>>>> -       if (ret)
>>>>> -               btrfs_delalloc_release_space(inode, data_reserved, block_start,
>>>>> -                                            blocksize, true);
>>>>> +       if (ret) {
>>>>> +               if (!only_release_metadata)
>>>>> +                       btrfs_delalloc_release_space(inode, data_reserved,
>>>>> +                                       block_start, blocksize, true);
>>>>> +               else
>>>>> +                       btrfs_delalloc_release_metadata(BTRFS_I(inode),
>>>>> +                                       blocksize, true);
>>>>
>>>> I usually find it more intuitive to have it the other way around:
>>>>
>>>> if (only_release_metadata)
>>>>    ...
>>>> else
>>>>    ...
>>>>
>>>> E.g., positive case first, negative in the else branch. But that's
>>>> likely too much of a personal preference.
>>>>
>>>> Thanks.
>>>>
>>>>> +       }
>>>>>          btrfs_delalloc_release_extents(BTRFS_I(inode), blocksize);
>>>>>          unlock_page(page);
>>>>>          put_page(page);
>>>>> --
>>>>> 2.25.0
>>>>>
>>>>
>>>>
>>>
>>
>>
>
diff mbox series

Patch

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 54efb21c2727..b5639f3461e4 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -2954,6 +2954,8 @@  int btrfs_fdatawrite_range(struct inode *inode, loff_t start, loff_t end);
 loff_t btrfs_remap_file_range(struct file *file_in, loff_t pos_in,
 			      struct file *file_out, loff_t pos_out,
 			      loff_t len, unsigned int remap_flags);
+int btrfs_check_can_nocow(struct btrfs_inode *inode, loff_t pos,
+			  size_t *write_bytes);
 
 /* tree-defrag.c */
 int btrfs_defrag_leaves(struct btrfs_trans_handle *trans,
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 8d47c76b7bd1..8dc084600f4e 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1544,8 +1544,8 @@  lock_and_cleanup_extent_if_need(struct btrfs_inode *inode, struct page **pages,
 	return ret;
 }
 
-static noinline int check_can_nocow(struct btrfs_inode *inode, loff_t pos,
-				    size_t *write_bytes)
+int btrfs_check_can_nocow(struct btrfs_inode *inode, loff_t pos,
+			  size_t *write_bytes)
 {
 	struct btrfs_fs_info *fs_info = inode->root->fs_info;
 	struct btrfs_root *root = inode->root;
@@ -1645,8 +1645,8 @@  static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
 		if (ret < 0) {
 			if ((BTRFS_I(inode)->flags & (BTRFS_INODE_NODATACOW |
 						      BTRFS_INODE_PREALLOC)) &&
-			    check_can_nocow(BTRFS_I(inode), pos,
-					&write_bytes) > 0) {
+			    btrfs_check_can_nocow(BTRFS_I(inode), pos,
+						  &write_bytes) > 0) {
 				/*
 				 * For nodata cow case, no need to reserve
 				 * data space.
@@ -1923,7 +1923,7 @@  static ssize_t btrfs_file_write_iter(struct kiocb *iocb,
 		 */
 		if (!(BTRFS_I(inode)->flags & (BTRFS_INODE_NODATACOW |
 					      BTRFS_INODE_PREALLOC)) ||
-		    check_can_nocow(BTRFS_I(inode), pos, &count) <= 0) {
+		    btrfs_check_can_nocow(BTRFS_I(inode), pos, &count) <= 0) {
 			inode_unlock(inode);
 			return -EAGAIN;
 		}
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 5509c41a4f43..b5ae4bbf1ad4 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -4974,11 +4974,13 @@  int btrfs_truncate_block(struct inode *inode, loff_t from, loff_t len,
 	struct extent_state *cached_state = NULL;
 	struct extent_changeset *data_reserved = NULL;
 	char *kaddr;
+	bool only_release_metadata = false;
 	u32 blocksize = fs_info->sectorsize;
 	pgoff_t index = from >> PAGE_SHIFT;
 	unsigned offset = from & (blocksize - 1);
 	struct page *page;
 	gfp_t mask = btrfs_alloc_write_mask(mapping);
+	size_t write_bytes = blocksize;
 	int ret = 0;
 	u64 block_start;
 	u64 block_end;
@@ -4990,11 +4992,26 @@  int btrfs_truncate_block(struct inode *inode, loff_t from, loff_t len,
 	block_start = round_down(from, blocksize);
 	block_end = block_start + blocksize - 1;
 
-	ret = btrfs_delalloc_reserve_space(inode, &data_reserved,
-					   block_start, blocksize);
-	if (ret)
+	ret = btrfs_check_data_free_space(inode, &data_reserved, block_start,
+					  blocksize);
+	if (ret < 0) {
+		if ((BTRFS_I(inode)->flags & (BTRFS_INODE_NODATACOW |
+					      BTRFS_INODE_PREALLOC)) &&
+		    btrfs_check_can_nocow(BTRFS_I(inode), block_start,
+					  &write_bytes) > 0) {
+			/* For nocow case, no need to reserve data space. */
+			only_release_metadata = true;
+		} else {
+			goto out;
+		}
+	}
+	ret = btrfs_delalloc_reserve_metadata(BTRFS_I(inode), blocksize);
+	if (ret < 0) {
+		if (!only_release_metadata)
+			btrfs_free_reserved_data_space(inode, data_reserved,
+					block_start, blocksize);
 		goto out;
-
+	}
 again:
 	page = find_or_create_page(mapping, index, mask);
 	if (!page) {
@@ -5063,10 +5080,20 @@  int btrfs_truncate_block(struct inode *inode, loff_t from, loff_t len,
 	set_page_dirty(page);
 	unlock_extent_cached(io_tree, block_start, block_end, &cached_state);
 
+	if (only_release_metadata)
+		set_extent_bit(&BTRFS_I(inode)->io_tree, block_start,
+				block_end, EXTENT_NORESERVE, NULL, NULL,
+				GFP_NOFS);
+
 out_unlock:
-	if (ret)
-		btrfs_delalloc_release_space(inode, data_reserved, block_start,
-					     blocksize, true);
+	if (ret) {
+		if (!only_release_metadata)
+			btrfs_delalloc_release_space(inode, data_reserved,
+					block_start, blocksize, true);
+		else
+			btrfs_delalloc_release_metadata(BTRFS_I(inode),
+					blocksize, true);
+	}
 	btrfs_delalloc_release_extents(BTRFS_I(inode), blocksize);
 	unlock_page(page);
 	put_page(page);