[1/3] btrfs: Don't discard unwritten extents
diff mbox series

Message ID 20191121120331.29070-2-nborisov@suse.com
State New
Headers show
Series
  • 3 misc patches
Related show

Commit Message

Nikolay Borisov Nov. 21, 2019, 12:03 p.m. UTC
All callers of btrfs_free_reserved_extent (respectively
__btrfs_free_reserved_extent with in set to 0) pass in extents which
have only been reserved but not yet written to. Namely,

* In cow_file_range that function is called only if create_io_em fails or
btrfs_add_ordered_extent fail, both of which happen _before_ any IO is
submitted to the newly reserved range.

* In submit_compressed_extents the code flow is similar - out_free_reserve
can be called only before btrfs_submit_compressed_write which is where
any writes to the range could occur.

* btrfs_new_extent_direct also calls btrfs_free_reserved_extent only
if extent_map fails, before any IO is issued.

* __btrfs_prealloc_file_range also calls btrfs_free_reserved_extent
in case insertion of the metadata fails.

* btrfs_alloc_tree_block again can only be called in case in-memory
operations fail, before any IO is submitted.

* btrfs_finish_ordered_io - this is the only caller where discarding
the extent could have a material effect, since it can be called for
an extent which was partially written.

With this change the submission of discards is optimised since discards
are now not being created for extents which are known to not have been
touched on disk.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
---
 fs/btrfs/extent-tree.c | 2 --
 fs/btrfs/inode.c       | 7 ++++++-
 2 files changed, 6 insertions(+), 3 deletions(-)

Comments

David Sterba Nov. 27, 2019, 4:06 p.m. UTC | #1
On Thu, Nov 21, 2019 at 02:03:29PM +0200, Nikolay Borisov wrote:
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -4169,8 +4169,6 @@ static int __btrfs_free_reserved_extent(struct btrfs_fs_info *fs_info,
>  		if (ret)
>  			goto out;
>  	} else {
> -		if (btrfs_test_opt(fs_info, DISCARD))
> -			ret = btrfs_discard_extent(fs_info, start, len, NULL);
>  		btrfs_add_free_space(cache, start, len);
>  		btrfs_free_reserved_bytes(cache, len, delalloc);
>  		trace_btrfs_reserved_extent_free(fs_info, start, len);
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 0ac0f5b33003..5d80fe030e79 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -3250,10 +3250,15 @@ static int btrfs_finish_ordered_io(struct btrfs_ordered_extent *ordered_extent)
>  		if ((ret || !logical_len) &&
>  		    clear_reserved_extent &&
>  		    !test_bit(BTRFS_ORDERED_NOCOW, &ordered_extent->flags) &&
> -		    !test_bit(BTRFS_ORDERED_PREALLOC, &ordered_extent->flags))
> +		    !test_bit(BTRFS_ORDERED_PREALLOC, &ordered_extent->flags)) {
>  			btrfs_free_reserved_extent(fs_info,
>  						   ordered_extent->start,
>  						   ordered_extent->disk_len, 1);
> +			if (ret && btrfs_test_opt(fs_info, DISCARD))
> +				btrfs_discard_extent(fs_info,
> +				ordered_extent->start, ordered_extent->disk_len,
> +				NULL);

This brings back vague memories of misplaced discard (in
finish_ordered_io), that was quite hard to catch. I can't find the fix
though. Filipe, is it the same issue?
David Sterba Nov. 27, 2019, 4:15 p.m. UTC | #2
On Wed, Nov 27, 2019 at 05:06:00PM +0100, David Sterba wrote:
> On Thu, Nov 21, 2019 at 02:03:29PM +0200, Nikolay Borisov wrote:
> > --- a/fs/btrfs/extent-tree.c
> > +++ b/fs/btrfs/extent-tree.c
> > @@ -4169,8 +4169,6 @@ static int __btrfs_free_reserved_extent(struct btrfs_fs_info *fs_info,
> >  		if (ret)
> >  			goto out;
> >  	} else {
> > -		if (btrfs_test_opt(fs_info, DISCARD))
> > -			ret = btrfs_discard_extent(fs_info, start, len, NULL);
> >  		btrfs_add_free_space(cache, start, len);
> >  		btrfs_free_reserved_bytes(cache, len, delalloc);
> >  		trace_btrfs_reserved_extent_free(fs_info, start, len);
> > diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> > index 0ac0f5b33003..5d80fe030e79 100644
> > --- a/fs/btrfs/inode.c
> > +++ b/fs/btrfs/inode.c
> > @@ -3250,10 +3250,15 @@ static int btrfs_finish_ordered_io(struct btrfs_ordered_extent *ordered_extent)
> >  		if ((ret || !logical_len) &&
> >  		    clear_reserved_extent &&
> >  		    !test_bit(BTRFS_ORDERED_NOCOW, &ordered_extent->flags) &&
> > -		    !test_bit(BTRFS_ORDERED_PREALLOC, &ordered_extent->flags))
> > +		    !test_bit(BTRFS_ORDERED_PREALLOC, &ordered_extent->flags)) {
> >  			btrfs_free_reserved_extent(fs_info,
> >  						   ordered_extent->start,
> >  						   ordered_extent->disk_len, 1);
> > +			if (ret && btrfs_test_opt(fs_info, DISCARD))
> > +				btrfs_discard_extent(fs_info,
> > +				ordered_extent->start, ordered_extent->disk_len,
> > +				NULL);
> 
> This brings back vague memories of misplaced discard (in
> finish_ordered_io), that was quite hard to catch. I can't find the fix
> though. Filipe, is it the same issue?

Closest to what I thought are dcc82f4783ad9 "Btrfs: fix log tree
corruption when fs mounted with -o discard" or 678886bdc6378 "Btrfs: fix
fs corruption on transaction abort if device supports discard".
Nikolay Borisov Nov. 27, 2019, 4:23 p.m. UTC | #3
On 27.11.19 г. 18:15 ч., David Sterba wrote:
> On Wed, Nov 27, 2019 at 05:06:00PM +0100, David Sterba wrote:
>> On Thu, Nov 21, 2019 at 02:03:29PM +0200, Nikolay Borisov wrote:
>>> --- a/fs/btrfs/extent-tree.c
>>> +++ b/fs/btrfs/extent-tree.c
>>> @@ -4169,8 +4169,6 @@ static int __btrfs_free_reserved_extent(struct btrfs_fs_info *fs_info,
>>>  		if (ret)
>>>  			goto out;
>>>  	} else {
>>> -		if (btrfs_test_opt(fs_info, DISCARD))
>>> -			ret = btrfs_discard_extent(fs_info, start, len, NULL);
>>>  		btrfs_add_free_space(cache, start, len);
>>>  		btrfs_free_reserved_bytes(cache, len, delalloc);
>>>  		trace_btrfs_reserved_extent_free(fs_info, start, len);
>>> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
>>> index 0ac0f5b33003..5d80fe030e79 100644
>>> --- a/fs/btrfs/inode.c
>>> +++ b/fs/btrfs/inode.c
>>> @@ -3250,10 +3250,15 @@ static int btrfs_finish_ordered_io(struct btrfs_ordered_extent *ordered_extent)
>>>  		if ((ret || !logical_len) &&
>>>  		    clear_reserved_extent &&
>>>  		    !test_bit(BTRFS_ORDERED_NOCOW, &ordered_extent->flags) &&
>>> -		    !test_bit(BTRFS_ORDERED_PREALLOC, &ordered_extent->flags))
>>> +		    !test_bit(BTRFS_ORDERED_PREALLOC, &ordered_extent->flags)) {
>>>  			btrfs_free_reserved_extent(fs_info,
>>>  						   ordered_extent->start,
>>>  						   ordered_extent->disk_len, 1);
>>> +			if (ret && btrfs_test_opt(fs_info, DISCARD))
>>> +				btrfs_discard_extent(fs_info,
>>> +				ordered_extent->start, ordered_extent->disk_len,
>>> +				NULL);
>>
>> This brings back vague memories of misplaced discard (in
>> finish_ordered_io), that was quite hard to catch. I can't find the fix
>> though. Filipe, is it the same issue?
> 
> Closest to what I thought are dcc82f4783ad9 "Btrfs: fix log tree
> corruption when fs mounted with -o discard" or 678886bdc6378 "Btrfs: fix
> fs corruption on transaction abort if device supports discard".
> 

Pinned extents are discarded when btrfs_finish_extent_commit is called
from transaction commit. If you look closer you'd see
btrfs_Free_Reserved_extent here means we call into
__btrfs_Free_Reserved_extent with pin equal to false, so the original
code would have gone to the else {} clause where the discard was. By
predicating on (ret) I only handle the case where an ordered extent
(which only applies to data) is potentially partially written.
Filipe Manana Nov. 27, 2019, 5:37 p.m. UTC | #4
On Wed, Nov 27, 2019 at 4:08 PM David Sterba <dsterba@suse.cz> wrote:
>
> On Thu, Nov 21, 2019 at 02:03:29PM +0200, Nikolay Borisov wrote:
> > --- a/fs/btrfs/extent-tree.c
> > +++ b/fs/btrfs/extent-tree.c
> > @@ -4169,8 +4169,6 @@ static int __btrfs_free_reserved_extent(struct btrfs_fs_info *fs_info,
> >               if (ret)
> >                       goto out;
> >       } else {
> > -             if (btrfs_test_opt(fs_info, DISCARD))
> > -                     ret = btrfs_discard_extent(fs_info, start, len, NULL);
> >               btrfs_add_free_space(cache, start, len);
> >               btrfs_free_reserved_bytes(cache, len, delalloc);
> >               trace_btrfs_reserved_extent_free(fs_info, start, len);
> > diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> > index 0ac0f5b33003..5d80fe030e79 100644
> > --- a/fs/btrfs/inode.c
> > +++ b/fs/btrfs/inode.c
> > @@ -3250,10 +3250,15 @@ static int btrfs_finish_ordered_io(struct btrfs_ordered_extent *ordered_extent)
> >               if ((ret || !logical_len) &&
> >                   clear_reserved_extent &&
> >                   !test_bit(BTRFS_ORDERED_NOCOW, &ordered_extent->flags) &&
> > -                 !test_bit(BTRFS_ORDERED_PREALLOC, &ordered_extent->flags))
> > +                 !test_bit(BTRFS_ORDERED_PREALLOC, &ordered_extent->flags)) {
> >                       btrfs_free_reserved_extent(fs_info,
> >                                                  ordered_extent->start,
> >                                                  ordered_extent->disk_len, 1);
> > +                     if (ret && btrfs_test_opt(fs_info, DISCARD))
> > +                             btrfs_discard_extent(fs_info,
> > +                             ordered_extent->start, ordered_extent->disk_len,
> > +                             NULL);
>
> This brings back vague memories of misplaced discard (in
> finish_ordered_io), that was quite hard to catch. I can't find the fix
> though. Filipe, is it the same issue?

It's different (as you identified already in another reply).
Even though this is rarely hit on a healthy system, it looks fine to me.

Reviewed-by: Filipe Manana <fdmanana@suse.com>

thanks

Patch
diff mbox series

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index fe0f33dd344d..613c7bbf5cbd 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -4169,8 +4169,6 @@  static int __btrfs_free_reserved_extent(struct btrfs_fs_info *fs_info,
 		if (ret)
 			goto out;
 	} else {
-		if (btrfs_test_opt(fs_info, DISCARD))
-			ret = btrfs_discard_extent(fs_info, start, len, NULL);
 		btrfs_add_free_space(cache, start, len);
 		btrfs_free_reserved_bytes(cache, len, delalloc);
 		trace_btrfs_reserved_extent_free(fs_info, start, len);
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 0ac0f5b33003..5d80fe030e79 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -3250,10 +3250,15 @@  static int btrfs_finish_ordered_io(struct btrfs_ordered_extent *ordered_extent)
 		if ((ret || !logical_len) &&
 		    clear_reserved_extent &&
 		    !test_bit(BTRFS_ORDERED_NOCOW, &ordered_extent->flags) &&
-		    !test_bit(BTRFS_ORDERED_PREALLOC, &ordered_extent->flags))
+		    !test_bit(BTRFS_ORDERED_PREALLOC, &ordered_extent->flags)) {
 			btrfs_free_reserved_extent(fs_info,
 						   ordered_extent->start,
 						   ordered_extent->disk_len, 1);
+			if (ret && btrfs_test_opt(fs_info, DISCARD))
+				btrfs_discard_extent(fs_info,
+				ordered_extent->start, ordered_extent->disk_len,
+				NULL);
+		}
 	}