[V2] Btrfs: really fix trim 0 bytes after a device delete
diff mbox

Message ID 21672.2859.468940.450115@localhost.localdomain
State New, archived
Headers show

Commit Message

Lutz Euler Jan. 3, 2015, 3:30 p.m. UTC
Commit 2cac13e41bf5b99ffc426bd28dfd2248df1dfa67, "fix trim 0 bytes after
a device delete", said:
  A user reported a bug of btrfs's trim, that is we will trim 0 bytes
  after a device delete.
The commit didn't attack the root of the problem so did not fix the bug
except for a special case.

For block discard, btrfs_trim_fs directly compares the range passed in
against the filesystem's objectids. The former is bounded by the sum of
the sizes of the devices of the filesystem, the latter is a completely
unrelated set of intervals of 64-bit integers. The bug reported occurred
as the smallest objectid was larger than the sum of the device sizes.
The above mentioned commit only fixed the case where the smallest
objectid is nonzero and the largest objectid less than the sum of the
device sizes, but it still trims too little if the largest objectid is
larger than that, and nothing in the reported situation.

The current mapping between the given range and the objectids is thus
clearly broken, so, to fix the bug and as a first step towards a
complete solution, simply ignore the range parameter's start and length
fields and always trim the whole filesystem. (While this makes it
impossible to trim a filesystem only partly, due to the broken mapping
this often didn't work anyway.)

V2:
- Rebased onto 3.9. (still applies to and works with 3.19-rc2)
- Take range->minlen into account.

Reported-by: Lutz Euler <lutz.euler@freenet.de>
Signed-off-by: Lutz Euler <lutz.euler@freenet.de>
---
 fs/btrfs/extent-tree.c |   25 +++++++++++--------------
 1 files changed, 11 insertions(+), 14 deletions(-)

Comments

Lutz Euler Jan. 3, 2015, 4:16 p.m. UTC | #1
Martin Steigerwald wrote:

> I have a 3.19-rc2 with a patch and a working fstrim now:
[...]
> I leave it to the patch author to come up with it on the mailing list :)

That would be me. I have just sent in the patch; please see
http://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg40618.html

For a discussion about the issue that goes into more detail than the
commit message and the code comments please see this thread:
http://comments.gmane.org/gmane.comp.file-systems.btrfs/15597
At that time I had sent in a first version of this patch but it has not
been applied so far. I made the new version as the old one did not apply
any more since about kernel 3.9., and had a flaw (range->minlen was
being ignored).

I am using this patch on my system since two years without problems with
kernel versions ranging from 3.3 to 3.16.

Looking forward to a review,

Lutz Euler
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Martin Steigerwald Jan. 5, 2015, 4:59 p.m. UTC | #2
Happy new year!

Am Samstag, 3. Januar 2015, 16:30:51 schrieb Lutz Euler:
> Commit 2cac13e41bf5b99ffc426bd28dfd2248df1dfa67, "fix trim 0 bytes after
> a device delete", said:
>   A user reported a bug of btrfs's trim, that is we will trim 0 bytes
>   after a device delete.
> The commit didn't attack the root of the problem so did not fix the bug
> except for a special case.
> 
> For block discard, btrfs_trim_fs directly compares the range passed in
> against the filesystem's objectids. The former is bounded by the sum of
> the sizes of the devices of the filesystem, the latter is a completely
> unrelated set of intervals of 64-bit integers. The bug reported occurred
> as the smallest objectid was larger than the sum of the device sizes.
> The above mentioned commit only fixed the case where the smallest
> objectid is nonzero and the largest objectid less than the sum of the
> device sizes, but it still trims too little if the largest objectid is
> larger than that, and nothing in the reported situation.
> 
> The current mapping between the given range and the objectids is thus
> clearly broken, so, to fix the bug and as a first step towards a
> complete solution, simply ignore the range parameter's start and length
> fields and always trim the whole filesystem. (While this makes it
> impossible to trim a filesystem only partly, due to the broken mapping
> this often didn't work anyway.)
> 
> V2:
> - Rebased onto 3.9. (still applies to and works with 3.19-rc2)
> - Take range->minlen into account.
> 
> Reported-by: Lutz Euler <lutz.euler@freenet.de>
> Signed-off-by: Lutz Euler <lutz.euler@freenet.de>

Is that the patch you send me for testing?

If so, feel free to add:

Reported-and-tested-by: Martin Steigerwald <martin@lichtvoll.de>

If not I can retest with this one.

Thanks,
Martin

> ---
>  fs/btrfs/extent-tree.c |   25 +++++++++++--------------
>  1 files changed, 11 insertions(+), 14 deletions(-)
> 
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index cfb3cf7..81006c1 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -8824,26 +8824,23 @@ int btrfs_trim_fs(struct btrfs_root *root, struct fstrim_range *range)
>  	u64 start;
>  	u64 end;
>  	u64 trimmed = 0;
> -	u64 total_bytes = btrfs_super_total_bytes(fs_info->super_copy);
>  	int ret = 0;
>  
>  	/*
> -	 * try to trim all FS space, our block group may start from non-zero.
> +	 * The range passed in is a subinterval of the interval from 0
> +	 * to the sum of the sizes of the devices of the filesystem.
> +	 * The objectid's used in the filesystem can span any set of
> +	 * subintervals of the interval from 0 to (u64)-1. As there is
> +	 * neither a simple nor an agreed upon mapping between these
> +	 * two ranges we ignore the range parameter's start and len
> +	 * fields and always trim the whole filesystem (that is, only
> +	 * the free space in allocated chunks).
>  	 */
> -	if (range->len == total_bytes)
> -		cache = btrfs_lookup_first_block_group(fs_info, range->start);
> -	else
> -		cache = btrfs_lookup_block_group(fs_info, range->start);
> +	cache = btrfs_lookup_first_block_group(fs_info, 0);
>  
>  	while (cache) {
> -		if (cache->key.objectid >= (range->start + range->len)) {
> -			btrfs_put_block_group(cache);
> -			break;
> -		}
> -
> -		start = max(range->start, cache->key.objectid);
> -		end = min(range->start + range->len,
> -				cache->key.objectid + cache->key.offset);
> +		start = cache->key.objectid;
> +		end = cache->key.objectid + cache->key.offset;
>  
>  		if (end - start >= range->minlen) {
>  			if (!block_group_cache_done(cache)) {
>
Lutz Euler Jan. 5, 2015, 7:29 p.m. UTC | #3
Hello,

happy new year to you, too!

[Martin Steigerwald:]
> Happy new year!
> 
> Am Samstag, 3. Januar 2015, 16:30:51 schrieb Lutz Euler:
> > Commit 2cac13e41bf5b99ffc426bd28dfd2248df1dfa67, "fix trim 0 bytes after
> > a device delete", said:
> >   A user reported a bug of btrfs's trim, that is we will trim 0 bytes
> >   after a device delete.
> > The commit didn't attack the root of the problem so did not fix the bug
> > except for a special case.
> > 
> > For block discard, btrfs_trim_fs directly compares the range passed in
> > against the filesystem's objectids. The former is bounded by the sum of
> > the sizes of the devices of the filesystem, the latter is a completely
> > unrelated set of intervals of 64-bit integers. The bug reported occurred
> > as the smallest objectid was larger than the sum of the device sizes.
> > The above mentioned commit only fixed the case where the smallest
> > objectid is nonzero and the largest objectid less than the sum of the
> > device sizes, but it still trims too little if the largest objectid is
> > larger than that, and nothing in the reported situation.
> > 
> > The current mapping between the given range and the objectids is thus
> > clearly broken, so, to fix the bug and as a first step towards a
> > complete solution, simply ignore the range parameter's start and length
> > fields and always trim the whole filesystem. (While this makes it
> > impossible to trim a filesystem only partly, due to the broken mapping
> > this often didn't work anyway.)
> > 
> > V2:
> > - Rebased onto 3.9. (still applies to and works with 3.19-rc2)
> > - Take range->minlen into account.
> > 
> > Reported-by: Lutz Euler <lutz.euler@freenet.de>
> > Signed-off-by: Lutz Euler <lutz.euler@freenet.de>
> 
> Is that the patch you send me for testing?

Yes, it is.

Kind regards,

Lutz

> If so, feel free to add:
> 
> Reported-and-tested-by: Martin Steigerwald <martin@lichtvoll.de>
> 
> If not I can retest with this one.
> 
> Thanks,
> Martin
> 
> > ---
> >  fs/btrfs/extent-tree.c |   25 +++++++++++--------------
> >  1 files changed, 11 insertions(+), 14 deletions(-)
> > 
> > diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> > index cfb3cf7..81006c1 100644
> > --- a/fs/btrfs/extent-tree.c
> > +++ b/fs/btrfs/extent-tree.c
> > @@ -8824,26 +8824,23 @@ int btrfs_trim_fs(struct btrfs_root *root, struct fstrim_range *range)
> >  	u64 start;
> >  	u64 end;
> >  	u64 trimmed = 0;
> > -	u64 total_bytes = btrfs_super_total_bytes(fs_info->super_copy);
> >  	int ret = 0;
> >  
> >  	/*
> > -	 * try to trim all FS space, our block group may start from non-zero.
> > +	 * The range passed in is a subinterval of the interval from 0
> > +	 * to the sum of the sizes of the devices of the filesystem.
> > +	 * The objectid's used in the filesystem can span any set of
> > +	 * subintervals of the interval from 0 to (u64)-1. As there is
> > +	 * neither a simple nor an agreed upon mapping between these
> > +	 * two ranges we ignore the range parameter's start and len
> > +	 * fields and always trim the whole filesystem (that is, only
> > +	 * the free space in allocated chunks).
> >  	 */
> > -	if (range->len == total_bytes)
> > -		cache = btrfs_lookup_first_block_group(fs_info, range->start);
> > -	else
> > -		cache = btrfs_lookup_block_group(fs_info, range->start);
> > +	cache = btrfs_lookup_first_block_group(fs_info, 0);
> >  
> >  	while (cache) {
> > -		if (cache->key.objectid >= (range->start + range->len)) {
> > -			btrfs_put_block_group(cache);
> > -			break;
> > -		}
> > -
> > -		start = max(range->start, cache->key.objectid);
> > -		end = min(range->start + range->len,
> > -				cache->key.objectid + cache->key.offset);
> > +		start = cache->key.objectid;
> > +		end = cache->key.objectid + cache->key.offset;
> >  
> >  		if (end - start >= range->minlen) {
> >  			if (!block_group_cache_done(cache)) {
> > 
> 
> -- 
> Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
> GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Martin Steigerwald May 1, 2015, 10:43 a.m. UTC | #4
Hello!

The following patch fixes the trimming on my /home Dual SSD BTRFS RAID 1
for quite some kernel releases already. Without it, it doesn´t trim
anything, with it, it trims possibly.

What would be required to get this or a similar fix into the kernel?

In either case this gets my:

Reported-and-tested-by: Martin Steigerwald <martin@lichtvoll.de>

I used trimming with this patch countless of times and scrub still tells
all is fine.

Thanks,
Martin

Am Samstag, 3. Januar 2015, 16:30:51 schrieb Lutz Euler:
> Commit 2cac13e41bf5b99ffc426bd28dfd2248df1dfa67, "fix trim 0 bytes after
> a device delete", said:
>   A user reported a bug of btrfs's trim, that is we will trim 0 bytes
>   after a device delete.
> The commit didn't attack the root of the problem so did not fix the bug
> except for a special case.
> 
> For block discard, btrfs_trim_fs directly compares the range passed in
> against the filesystem's objectids. The former is bounded by the sum of
> the sizes of the devices of the filesystem, the latter is a completely
> unrelated set of intervals of 64-bit integers. The bug reported occurred
> as the smallest objectid was larger than the sum of the device sizes.
> The above mentioned commit only fixed the case where the smallest
> objectid is nonzero and the largest objectid less than the sum of the
> device sizes, but it still trims too little if the largest objectid is
> larger than that, and nothing in the reported situation.
> 
> The current mapping between the given range and the objectids is thus
> clearly broken, so, to fix the bug and as a first step towards a
> complete solution, simply ignore the range parameter's start and length
> fields and always trim the whole filesystem. (While this makes it
> impossible to trim a filesystem only partly, due to the broken mapping
> this often didn't work anyway.)
> 
> V2:
> - Rebased onto 3.9. (still applies to and works with 3.19-rc2)
> - Take range->minlen into account.
> 
> Reported-by: Lutz Euler <lutz.euler@freenet.de>
> Signed-off-by: Lutz Euler <lutz.euler@freenet.de>
> ---
>  fs/btrfs/extent-tree.c |   25 +++++++++++--------------
>  1 files changed, 11 insertions(+), 14 deletions(-)
> 
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index cfb3cf7..81006c1 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -8824,26 +8824,23 @@ int btrfs_trim_fs(struct btrfs_root *root, struct fstrim_range *range)
>  	u64 start;
>  	u64 end;
>  	u64 trimmed = 0;
> -	u64 total_bytes = btrfs_super_total_bytes(fs_info->super_copy);
>  	int ret = 0;
>  
>  	/*
> -	 * try to trim all FS space, our block group may start from non-zero.
> +	 * The range passed in is a subinterval of the interval from 0
> +	 * to the sum of the sizes of the devices of the filesystem.
> +	 * The objectid's used in the filesystem can span any set of
> +	 * subintervals of the interval from 0 to (u64)-1. As there is
> +	 * neither a simple nor an agreed upon mapping between these
> +	 * two ranges we ignore the range parameter's start and len
> +	 * fields and always trim the whole filesystem (that is, only
> +	 * the free space in allocated chunks).
>  	 */
> -	if (range->len == total_bytes)
> -		cache = btrfs_lookup_first_block_group(fs_info, range->start);
> -	else
> -		cache = btrfs_lookup_block_group(fs_info, range->start);
> +	cache = btrfs_lookup_first_block_group(fs_info, 0);
>  
>  	while (cache) {
> -		if (cache->key.objectid >= (range->start + range->len)) {
> -			btrfs_put_block_group(cache);
> -			break;
> -		}
> -
> -		start = max(range->start, cache->key.objectid);
> -		end = min(range->start + range->len,
> -				cache->key.objectid + cache->key.offset);
> +		start = cache->key.objectid;
> +		end = cache->key.objectid + cache->key.offset;
>  
>  		if (end - start >= range->minlen) {
>  			if (!block_group_cache_done(cache)) {
>
Rich Freeman May 19, 2015, 3:18 p.m. UTC | #5
On Sat, Jan 3, 2015 at 11:16 AM, Lutz Euler <lutz.euler@freenet.de> wrote:
> Martin Steigerwald wrote:
>
>> I have a 3.19-rc2 with a patch and a working fstrim now:
> [...]
>> I leave it to the patch author to come up with it on the mailing list :)
>
> That would be me. I have just sent in the patch; please see
> http://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg40618.html
>
> For a discussion about the issue that goes into more detail than the
> commit message and the code comments please see this thread:
> http://comments.gmane.org/gmane.comp.file-systems.btrfs/15597
> At that time I had sent in a first version of this patch but it has not
> been applied so far. I made the new version as the old one did not apply
> any more since about kernel 3.9., and had a flaw (range->minlen was
> being ignored).
>
> I am using this patch on my system since two years without problems with
> kernel versions ranging from 3.3 to 3.16.
>
> Looking forward to a review,
>

Did this go anywhere?  I don't see this in 3.18.13 unless I'm missing
it, and I also have been unable to fstrim my ssd btrfs volume for as
long as I've had it on an SSD.

--
Rich
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Patch
diff mbox

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index cfb3cf7..81006c1 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -8824,26 +8824,23 @@  int btrfs_trim_fs(struct btrfs_root *root, struct fstrim_range *range)
 	u64 start;
 	u64 end;
 	u64 trimmed = 0;
-	u64 total_bytes = btrfs_super_total_bytes(fs_info->super_copy);
 	int ret = 0;
 
 	/*
-	 * try to trim all FS space, our block group may start from non-zero.
+	 * The range passed in is a subinterval of the interval from 0
+	 * to the sum of the sizes of the devices of the filesystem.
+	 * The objectid's used in the filesystem can span any set of
+	 * subintervals of the interval from 0 to (u64)-1. As there is
+	 * neither a simple nor an agreed upon mapping between these
+	 * two ranges we ignore the range parameter's start and len
+	 * fields and always trim the whole filesystem (that is, only
+	 * the free space in allocated chunks).
 	 */
-	if (range->len == total_bytes)
-		cache = btrfs_lookup_first_block_group(fs_info, range->start);
-	else
-		cache = btrfs_lookup_block_group(fs_info, range->start);
+	cache = btrfs_lookup_first_block_group(fs_info, 0);
 
 	while (cache) {
-		if (cache->key.objectid >= (range->start + range->len)) {
-			btrfs_put_block_group(cache);
-			break;
-		}
-
-		start = max(range->start, cache->key.objectid);
-		end = min(range->start + range->len,
-				cache->key.objectid + cache->key.offset);
+		start = cache->key.objectid;
+		end = cache->key.objectid + cache->key.offset;
 
 		if (end - start >= range->minlen) {
 			if (!block_group_cache_done(cache)) {