diff mbox

Btrfs: only exclude supers in the range of our block group V2

Message ID

1366742934-4214-1-git-send-email-jbacik@fusionio.com (mailing list archive)

State

New, archived

Headers

show

From: Josef Bacik <jbacik@fusionio.com>
To: <linux-btrfs@vger.kernel.org>
Subject: [PATCH] Btrfs: only exclude supers in the range of our block group
	V2
Date: Tue, 23 Apr 2013 14:48:54 -0400
Message-ID: <1366742934-4214-1-git-send-email-jbacik@fusionio.com>
MIME-Version: 1.0
Content-Type: text/plain
Sender: linux-btrfs-owner@vger.kernel.org
Precedence: bulk

Commit Message

Josef Bacik April 23, 2013, 6:48 p.m. UTC

If we fail to load block groups halfway through we can leave extent_state's on
the excluded tree.  This is because we just lookup the supers and add them to
the excluded tree regardless of which block group we are looking at currently.
This is a problem because we remove the excluded extents for the range of the
block group only, so if we don't ever load a block group for one of the excluded
extents we won't ever free it.  This fixes the problem by only adding excluded
extents if it falls in the block group range we care about.  With this patch
we're no longer leaking space when we fail to read all of the block groups.
Thanks,

Signed-off-by: Josef Bacik <jbacik@fusionio.com>
---
V1->V2: fixed a slight problem where i should have been comparing to the end of
hte block group not the begining.

 fs/btrfs/extent-tree.c |   24 +++++++++++++++++++++---
 1 files changed, 21 insertions(+), 3 deletions(-)

Comments

Liu Bo April 24, 2013, 8:57 a.m. UTC | #1

On Tue, Apr 23, 2013 at 02:48:54PM -0400, Josef Bacik wrote:
> If we fail to load block groups halfway through we can leave extent_state's on
> the excluded tree.  This is because we just lookup the supers and add them to
> the excluded tree regardless of which block group we are looking at currently.
> This is a problem because we remove the excluded extents for the range of the
> block group only, so if we don't ever load a block group for one of the excluded
> extents we won't ever free it.  This fixes the problem by only adding excluded
> extents if it falls in the block group range we care about.  With this patch
> we're no longer leaking space when we fail to read all of the block groups.
> Thanks,
> 
> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
> ---
> V1->V2: fixed a slight problem where i should have been comparing to the end of
> hte block group not the begining.
> 
>  fs/btrfs/extent-tree.c |   24 +++++++++++++++++++++---
>  1 files changed, 21 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index b441be3..a81f689 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -270,9 +270,27 @@ static int exclude_super_stripes(struct btrfs_root *root,
>  			return ret;
>  
>  		while (nr--) {
> -			cache->bytes_super += stripe_len;
> -			ret = add_excluded_extent(root, logical[nr],
> -						  stripe_len);
> +			u64 start, len;
> +
> +			if (logical[nr] > cache->key.objectid +
> +			    cache->key.offset)
> +				continue;
> +
> +			if (logical[nr] + stripe_len <= cache->key.objectid)
> +				continue;

hmm...I just doubt that these two cases can happen.

btrfs_rmap_block() ensures that logical[nr] will be larger than
cache->key.objectid.

Am I missing something?

thanks,
liubo

> +
> +			start = logical[nr];
> +			if (start < cache->key.objectid) {
> +				start = cache->key.objectid;
> +				len = (logical[nr] + stripe_len) - start;
> +			} else {
> +				len = min_t(u64, stripe_len,
> +					    cache->key.objectid +
> +					    cache->key.offset - start);
> +			}
> +
> +			cache->bytes_super += len;
> +			ret = add_excluded_extent(root, start, len);
>  			if (ret) {
>  				kfree(logical);
>  				return ret;
> -- 
> 1.7.7.6
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Josef Bacik April 24, 2013, 1 p.m. UTC | #2

On Wed, Apr 24, 2013 at 02:57:40AM -0600, Liu Bo wrote:
> On Tue, Apr 23, 2013 at 02:48:54PM -0400, Josef Bacik wrote:
> > If we fail to load block groups halfway through we can leave extent_state's on
> > the excluded tree.  This is because we just lookup the supers and add them to
> > the excluded tree regardless of which block group we are looking at currently.
> > This is a problem because we remove the excluded extents for the range of the
> > block group only, so if we don't ever load a block group for one of the excluded
> > extents we won't ever free it.  This fixes the problem by only adding excluded
> > extents if it falls in the block group range we care about.  With this patch
> > we're no longer leaking space when we fail to read all of the block groups.
> > Thanks,
> > 
> > Signed-off-by: Josef Bacik <jbacik@fusionio.com>
> > ---
> > V1->V2: fixed a slight problem where i should have been comparing to the end of
> > hte block group not the begining.
> > 
> >  fs/btrfs/extent-tree.c |   24 +++++++++++++++++++++---
> >  1 files changed, 21 insertions(+), 3 deletions(-)
> > 
> > diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> > index b441be3..a81f689 100644
> > --- a/fs/btrfs/extent-tree.c
> > +++ b/fs/btrfs/extent-tree.c
> > @@ -270,9 +270,27 @@ static int exclude_super_stripes(struct btrfs_root *root,
> >  			return ret;
> >  
> >  		while (nr--) {
> > -			cache->bytes_super += stripe_len;
> > -			ret = add_excluded_extent(root, logical[nr],
> > -						  stripe_len);
> > +			u64 start, len;
> > +
> > +			if (logical[nr] > cache->key.objectid +
> > +			    cache->key.offset)
> > +				continue;
> > +
> > +			if (logical[nr] + stripe_len <= cache->key.objectid)
> > +				continue;
> 
> hmm...I just doubt that these two cases can happen.
> 
> btrfs_rmap_block() ensures that logical[nr] will be larger than
> cache->key.objectid.
> 
> Am I missing something?

Yeah, we can still get ranges that are past the end of the cache, just put a
printk in there and you'll see it happen.  Now it's not likely that a logical
will be less than the start but better safe than sorry.  Thanks,

Josef
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Liu Bo April 24, 2013, 2:43 p.m. UTC | #3

On Wed, Apr 24, 2013 at 09:00:16AM -0400, Josef Bacik wrote:
> On Wed, Apr 24, 2013 at 02:57:40AM -0600, Liu Bo wrote:
> > On Tue, Apr 23, 2013 at 02:48:54PM -0400, Josef Bacik wrote:
> > > If we fail to load block groups halfway through we can leave extent_state's on
> > > the excluded tree.  This is because we just lookup the supers and add them to
> > > the excluded tree regardless of which block group we are looking at currently.
> > > This is a problem because we remove the excluded extents for the range of the
> > > block group only, so if we don't ever load a block group for one of the excluded
> > > extents we won't ever free it.  This fixes the problem by only adding excluded
> > > extents if it falls in the block group range we care about.  With this patch
> > > we're no longer leaking space when we fail to read all of the block groups.
> > > Thanks,
> > > 
> > > Signed-off-by: Josef Bacik <jbacik@fusionio.com>
> > > ---
> > > V1->V2: fixed a slight problem where i should have been comparing to the end of
> > > hte block group not the begining.
> > > 
> > >  fs/btrfs/extent-tree.c |   24 +++++++++++++++++++++---
> > >  1 files changed, 21 insertions(+), 3 deletions(-)
> > > 
> > > diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> > > index b441be3..a81f689 100644
> > > --- a/fs/btrfs/extent-tree.c
> > > +++ b/fs/btrfs/extent-tree.c
> > > @@ -270,9 +270,27 @@ static int exclude_super_stripes(struct btrfs_root *root,
> > >  			return ret;
> > >  
> > >  		while (nr--) {
> > > -			cache->bytes_super += stripe_len;
> > > -			ret = add_excluded_extent(root, logical[nr],
> > > -						  stripe_len);
> > > +			u64 start, len;
> > > +
> > > +			if (logical[nr] > cache->key.objectid +
> > > +			    cache->key.offset)
> > > +				continue;
> > > +
> > > +			if (logical[nr] + stripe_len <= cache->key.objectid)
> > > +				continue;
> > 
> > hmm...I just doubt that these two cases can happen.
> > 
> > btrfs_rmap_block() ensures that logical[nr] will be larger than
> > cache->key.objectid.
> > 
> > Am I missing something?
> 
> Yeah, we can still get ranges that are past the end of the cache, just put a
> printk in there and you'll see it happen.  Now it's not likely that a logical
> will be less than the start but better safe than sorry.  Thanks,
> 

But if it's really past the end of the cache, there might be something wrong in
btrfs_rmap_block() IMO.

Ok, I'll dig it somehow.

thanks,
liubo
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Josef Bacik April 24, 2013, 2:48 p.m. UTC | #4

On Wed, Apr 24, 2013 at 08:43:21AM -0600, Liu Bo wrote:
> On Wed, Apr 24, 2013 at 09:00:16AM -0400, Josef Bacik wrote:
> > On Wed, Apr 24, 2013 at 02:57:40AM -0600, Liu Bo wrote:
> > > On Tue, Apr 23, 2013 at 02:48:54PM -0400, Josef Bacik wrote:
> > > > If we fail to load block groups halfway through we can leave extent_state's on
> > > > the excluded tree.  This is because we just lookup the supers and add them to
> > > > the excluded tree regardless of which block group we are looking at currently.
> > > > This is a problem because we remove the excluded extents for the range of the
> > > > block group only, so if we don't ever load a block group for one of the excluded
> > > > extents we won't ever free it.  This fixes the problem by only adding excluded
> > > > extents if it falls in the block group range we care about.  With this patch
> > > > we're no longer leaking space when we fail to read all of the block groups.
> > > > Thanks,
> > > > 
> > > > Signed-off-by: Josef Bacik <jbacik@fusionio.com>
> > > > ---
> > > > V1->V2: fixed a slight problem where i should have been comparing to the end of
> > > > hte block group not the begining.
> > > > 
> > > >  fs/btrfs/extent-tree.c |   24 +++++++++++++++++++++---
> > > >  1 files changed, 21 insertions(+), 3 deletions(-)
> > > > 
> > > > diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> > > > index b441be3..a81f689 100644
> > > > --- a/fs/btrfs/extent-tree.c
> > > > +++ b/fs/btrfs/extent-tree.c
> > > > @@ -270,9 +270,27 @@ static int exclude_super_stripes(struct btrfs_root *root,
> > > >  			return ret;
> > > >  
> > > >  		while (nr--) {
> > > > -			cache->bytes_super += stripe_len;
> > > > -			ret = add_excluded_extent(root, logical[nr],
> > > > -						  stripe_len);
> > > > +			u64 start, len;
> > > > +
> > > > +			if (logical[nr] > cache->key.objectid +
> > > > +			    cache->key.offset)
> > > > +				continue;
> > > > +
> > > > +			if (logical[nr] + stripe_len <= cache->key.objectid)
> > > > +				continue;
> > > 
> > > hmm...I just doubt that these two cases can happen.
> > > 
> > > btrfs_rmap_block() ensures that logical[nr] will be larger than
> > > cache->key.objectid.
> > > 
> > > Am I missing something?
> > 
> > Yeah, we can still get ranges that are past the end of the cache, just put a
> > printk in there and you'll see it happen.  Now it's not likely that a logical
> > will be less than the start but better safe than sorry.  Thanks,
> > 
> 
> But if it's really past the end of the cache, there might be something wrong in
> btrfs_rmap_block() IMO.
> 
> Ok, I'll dig it somehow.
> 

It's doing it right, we just loop through all of the supers, which we have no
idea where they show up logically.  It's not a problem with rmap, it's doing the
right thing, we just need to add this extra check because rmap is not bounded in
its logical search.  Thanks,

Josef
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Liu Bo April 25, 2013, 3:48 a.m. UTC | #5

On Wed, Apr 24, 2013 at 10:48:09AM -0400, Josef Bacik wrote:
> On Wed, Apr 24, 2013 at 08:43:21AM -0600, Liu Bo wrote:
> > On Wed, Apr 24, 2013 at 09:00:16AM -0400, Josef Bacik wrote:
> > > On Wed, Apr 24, 2013 at 02:57:40AM -0600, Liu Bo wrote:
> > > > On Tue, Apr 23, 2013 at 02:48:54PM -0400, Josef Bacik wrote:
> > > > > If we fail to load block groups halfway through we can leave extent_state's on
> > > > > the excluded tree.  This is because we just lookup the supers and add them to
> > > > > the excluded tree regardless of which block group we are looking at currently.
> > > > > This is a problem because we remove the excluded extents for the range of the
> > > > > block group only, so if we don't ever load a block group for one of the excluded
> > > > > extents we won't ever free it.  This fixes the problem by only adding excluded
> > > > > extents if it falls in the block group range we care about.  With this patch
> > > > > we're no longer leaking space when we fail to read all of the block groups.
> > > > > Thanks,
> > > > > 
> > > > > Signed-off-by: Josef Bacik <jbacik@fusionio.com>
> > > > > ---
> > > > > V1->V2: fixed a slight problem where i should have been comparing to the end of
> > > > > hte block group not the begining.
> > > > > 
> > > > >  fs/btrfs/extent-tree.c |   24 +++++++++++++++++++++---
> > > > >  1 files changed, 21 insertions(+), 3 deletions(-)
> > > > > 
> > > > > diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> > > > > index b441be3..a81f689 100644
> > > > > --- a/fs/btrfs/extent-tree.c
> > > > > +++ b/fs/btrfs/extent-tree.c
> > > > > @@ -270,9 +270,27 @@ static int exclude_super_stripes(struct btrfs_root *root,
> > > > >  			return ret;
> > > > >  
> > > > >  		while (nr--) {
> > > > > -			cache->bytes_super += stripe_len;
> > > > > -			ret = add_excluded_extent(root, logical[nr],
> > > > > -						  stripe_len);
> > > > > +			u64 start, len;
> > > > > +
> > > > > +			if (logical[nr] > cache->key.objectid +
> > > > > +			    cache->key.offset)
> > > > > +				continue;
> > > > > +
> > > > > +			if (logical[nr] + stripe_len <= cache->key.objectid)
> > > > > +				continue;
> > > > 
> > > > hmm...I just doubt that these two cases can happen.
> > > > 
> > > > btrfs_rmap_block() ensures that logical[nr] will be larger than
> > > > cache->key.objectid.
> > > > 
> > > > Am I missing something?
> > > 
> > > Yeah, we can still get ranges that are past the end of the cache, just put a
> > > printk in there and you'll see it happen.  Now it's not likely that a logical
> > > will be less than the start but better safe than sorry.  Thanks,
> > > 
> > 
> > But if it's really past the end of the cache, there might be something wrong in
> > btrfs_rmap_block() IMO.
> > 
> > Ok, I'll dig it somehow.
> > 
> 
> It's doing it right, we just loop through all of the supers, which we have no
> idea where they show up logically.  It's not a problem with rmap, it's doing the
> right thing, we just need to add this extra check because rmap is not bounded in
> its logical search.  Thanks,

Sorry, I still didn't get how this happens...

I'll try to create new btrfs with raid0, raid1, raid10, raid5, raid6...

Could you please show me the testcase or something so that I can persuade
myself?

thanks,
liubo
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Josef Bacik April 25, 2013, 1:04 p.m. UTC | #6

On Wed, Apr 24, 2013 at 09:48:27PM -0600, Liu Bo wrote:
> On Wed, Apr 24, 2013 at 10:48:09AM -0400, Josef Bacik wrote:
> > On Wed, Apr 24, 2013 at 08:43:21AM -0600, Liu Bo wrote:
> > > On Wed, Apr 24, 2013 at 09:00:16AM -0400, Josef Bacik wrote:
> > > > On Wed, Apr 24, 2013 at 02:57:40AM -0600, Liu Bo wrote:
> > > > > On Tue, Apr 23, 2013 at 02:48:54PM -0400, Josef Bacik wrote:
> > > > > > If we fail to load block groups halfway through we can leave extent_state's on
> > > > > > the excluded tree.  This is because we just lookup the supers and add them to
> > > > > > the excluded tree regardless of which block group we are looking at currently.
> > > > > > This is a problem because we remove the excluded extents for the range of the
> > > > > > block group only, so if we don't ever load a block group for one of the excluded
> > > > > > extents we won't ever free it.  This fixes the problem by only adding excluded
> > > > > > extents if it falls in the block group range we care about.  With this patch
> > > > > > we're no longer leaking space when we fail to read all of the block groups.
> > > > > > Thanks,
> > > > > > 
> > > > > > Signed-off-by: Josef Bacik <jbacik@fusionio.com>
> > > > > > ---
> > > > > > V1->V2: fixed a slight problem where i should have been comparing to the end of
> > > > > > hte block group not the begining.
> > > > > > 
> > > > > >  fs/btrfs/extent-tree.c |   24 +++++++++++++++++++++---
> > > > > >  1 files changed, 21 insertions(+), 3 deletions(-)
> > > > > > 
> > > > > > diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> > > > > > index b441be3..a81f689 100644
> > > > > > --- a/fs/btrfs/extent-tree.c
> > > > > > +++ b/fs/btrfs/extent-tree.c
> > > > > > @@ -270,9 +270,27 @@ static int exclude_super_stripes(struct btrfs_root *root,
> > > > > >  			return ret;
> > > > > >  
> > > > > >  		while (nr--) {
> > > > > > -			cache->bytes_super += stripe_len;
> > > > > > -			ret = add_excluded_extent(root, logical[nr],
> > > > > > -						  stripe_len);
> > > > > > +			u64 start, len;
> > > > > > +
> > > > > > +			if (logical[nr] > cache->key.objectid +
> > > > > > +			    cache->key.offset)
> > > > > > +				continue;
> > > > > > +
> > > > > > +			if (logical[nr] + stripe_len <= cache->key.objectid)
> > > > > > +				continue;
> > > > > 
> > > > > hmm...I just doubt that these two cases can happen.
> > > > > 
> > > > > btrfs_rmap_block() ensures that logical[nr] will be larger than
> > > > > cache->key.objectid.
> > > > > 
> > > > > Am I missing something?
> > > > 
> > > > Yeah, we can still get ranges that are past the end of the cache, just put a
> > > > printk in there and you'll see it happen.  Now it's not likely that a logical
> > > > will be less than the start but better safe than sorry.  Thanks,
> > > > 
> > > 
> > > But if it's really past the end of the cache, there might be something wrong in
> > > btrfs_rmap_block() IMO.
> > > 
> > > Ok, I'll dig it somehow.
> > > 
> > 
> > It's doing it right, we just loop through all of the supers, which we have no
> > idea where they show up logically.  It's not a problem with rmap, it's doing the
> > right thing, we just need to add this extra check because rmap is not bounded in
> > its logical search.  Thanks,
> 
> Sorry, I still didn't get how this happens...
> 
> I'll try to create new btrfs with raid0, raid1, raid10, raid5, raid6...
> 
> Could you please show me the testcase or something so that I can persuade
> myself?
> 

Ok I see what happened, I was using an old btrfs-image which makes one big chunk
to cover the entire file system, so that is how I was getting logical values
higher than the cache.  So not a normal case for sure, but since it is possible
for it to happen in a bad file system situation we should still leave it.
Thanks,

Josef
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

David Sterba April 25, 2013, 1:27 p.m. UTC | #7

On Thu, Apr 25, 2013 at 09:04:13AM -0400, Josef Bacik wrote:
> > Sorry, I still didn't get how this happens...
> > 
> > I'll try to create new btrfs with raid0, raid1, raid10, raid5, raid6...
> > 
> > Could you please show me the testcase or something so that I can persuade
> > myself?
> > 
> 
> Ok I see what happened, I was using an old btrfs-image which makes one big chunk
> to cover the entire file system, so that is how I was getting logical values
> higher than the cache.  So not a normal case for sure, but since it is possible
> for it to happen in a bad file system situation we should still leave it.

I agree, various versions of btrfs-progs are scattered among distros,
kernel would be more robust when mounting a fs image created by an
unknown btrfs-image versoin.

david
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

diff mbox

Patch

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index b441be3..a81f689 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -270,9 +270,27 @@  static int exclude_super_stripes(struct btrfs_root *root,
 			return ret;
 
 		while (nr--) {
-			cache->bytes_super += stripe_len;
-			ret = add_excluded_extent(root, logical[nr],
-						  stripe_len);
+			u64 start, len;
+
+			if (logical[nr] > cache->key.objectid +
+			    cache->key.offset)
+				continue;
+
+			if (logical[nr] + stripe_len <= cache->key.objectid)
+				continue;
+
+			start = logical[nr];
+			if (start < cache->key.objectid) {
+				start = cache->key.objectid;
+				len = (logical[nr] + stripe_len) - start;
+			} else {
+				len = min_t(u64, stripe_len,
+					    cache->key.objectid +
+					    cache->key.offset - start);
+			}
+
+			cache->bytes_super += len;
+			ret = add_excluded_extent(root, start, len);
 			if (ret) {
 				kfree(logical);
 				return ret;