
Still getting a lot of -28 (ENOSPC?) errors during balance

Message ID 20130402134626.GO1876@localhost.localdomain (mailing list archive)

Commit Message

Josef Bacik April 2, 2013, 1:46 p.m. UTC
On Tue, Apr 02, 2013 at 02:04:52AM -0600, Roman Mamedov wrote:
> Hello,
> 
> With kernel 3.7.10 patched with "Btrfs: limit the global reserve to 512mb"
> (the problem was also occurring without this patch, and seemed even worse).
> 
> At the start of balance:
> 
> Data: total=31.85GB, used=9.96GB
> System: total=4.00MB, used=16.00KB
> Metadata: total=1.01GB, used=696.17MB
> 
> "btrfs balance start -musage=5 -dusage=5" is going on for about 50 minutes
> 
> Current situation:
> 
> Balance on '/mnt/r1/' is running
> 1 out of about 2 chunks balanced (20 considered),  50% left
> 
> Data: total=30.85GB, used=10.04GB
> System: total=4.00MB, used=16.00KB
> Metadata: total=1.01GB, used=851.69MB
> 
> And a constant stream of these in dmesg:
> 

Can you try this out and see if it helps?  Thanks,

Josef

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Roman Mamedov April 2, 2013, 4:55 p.m. UTC | #1
On Tue, 2 Apr 2013 09:46:26 -0400
Josef Bacik <jbacik@fusionio.com> wrote:

> On Tue, Apr 02, 2013 at 02:04:52AM -0600, Roman Mamedov wrote:
> > Hello,
> > 
> > With kernel 3.7.10 patched with "Btrfs: limit the global reserve to 512mb"
> > (the problem was also occurring without this patch, and seemed even worse).
> > 
> > At the start of balance:
> > 
> > Data: total=31.85GB, used=9.96GB
> > System: total=4.00MB, used=16.00KB
> > Metadata: total=1.01GB, used=696.17MB
> > 
> > "btrfs balance start -musage=5 -dusage=5" is going on for about 50 minutes
> > 
> > Current situation:
> > 
> > Balance on '/mnt/r1/' is running
> > 1 out of about 2 chunks balanced (20 considered),  50% left
> > 
> > Data: total=30.85GB, used=10.04GB
> > System: total=4.00MB, used=16.00KB
> > Metadata: total=1.01GB, used=851.69MB
> > 
> > And a constant stream of these in dmesg:
> > 
> 
> Can you try this out and see if it helps?  Thanks,

Hello,

Well that balance has now completed, and unfortunately I don't have a complete
image of the filesystem from before to apply the patch and check whether the
same operation goes better this time.

I'll keep it in mind and will try to test it out if I run into a similar
situation again on some filesystem.

Generally, what seems to make me run into various problems with balance is the
following usage scenario: on an active filesystem (used as /home and the root
FS), a snapshot is made every 30 minutes with a unique (timestamped) name, and
once a day snapshots older than two days are purged. And it goes like this for
months.

Another variant of this is a backup partition, where snapshots are made every
six hours, and all snapshots are kept for 1-3 months before getting purged.

I guess this kind of usage causes a lot of internal fragmentation or
something, which makes it difficult for a balance to find enough free space to
work with.

> 
> Josef
> 
> diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
> index 0d89ff0..9830e86 100644
> --- a/fs/btrfs/relocation.c
> +++ b/fs/btrfs/relocation.c
> @@ -2548,6 +2548,13 @@ static int do_relocation(struct btrfs_trans_handle *trans,
>  	list_for_each_entry(edge, &node->upper, list[LOWER]) {
>  		cond_resched();
>  
> +		ret = btrfs_block_rsv_refill(rc->extent_root, rc->block_rsv,
> +					     rc->extent_root->leafsize,
> +					     BTRFS_RESERVE_FLUSH_ALL);
> +		if (ret) {
> +			err = ret;
> +			break;
> +		}
>  		upper = edge->node[UPPER];
>  		root = select_reloc_root(trans, rc, upper, edges, &nr);
>  		BUG_ON(!root);
Josef Bacik April 2, 2013, 5 p.m. UTC | #2
On Tue, Apr 02, 2013 at 10:55:04AM -0600, Roman Mamedov wrote:
> On Tue, 2 Apr 2013 09:46:26 -0400
> Josef Bacik <jbacik@fusionio.com> wrote:
> 
> > On Tue, Apr 02, 2013 at 02:04:52AM -0600, Roman Mamedov wrote:
> > > Hello,
> > > 
> > > With kernel 3.7.10 patched with "Btrfs: limit the global reserve to 512mb"
> > > (the problem was also occurring without this patch, and seemed even worse).
> > > 
> > > At the start of balance:
> > > 
> > > Data: total=31.85GB, used=9.96GB
> > > System: total=4.00MB, used=16.00KB
> > > Metadata: total=1.01GB, used=696.17MB
> > > 
> > > "btrfs balance start -musage=5 -dusage=5" is going on for about 50 minutes
> > > 
> > > Current situation:
> > > 
> > > Balance on '/mnt/r1/' is running
> > > 1 out of about 2 chunks balanced (20 considered),  50% left
> > > 
> > > Data: total=30.85GB, used=10.04GB
> > > System: total=4.00MB, used=16.00KB
> > > Metadata: total=1.01GB, used=851.69MB
> > > 
> > > And a constant stream of these in dmesg:
> > > 
> > 
> > Can you try this out and see if it helps?  Thanks,
> 
> Hello,
> 
> Well that balance has now completed, and unfortunately I don't have a complete
> image of the filesystem from before to apply the patch and check whether the
> same operation goes better this time.
> 
> I'll keep it in mind and will try to test it out if I run into a similar
> situation again on some filesystem.
> 
> Generally, what seems to make me run into various problems with balance is the
> following usage scenario: on an active filesystem (used as /home and the root
> FS), a snapshot is made every 30 minutes with a unique (timestamped) name, and
> once a day snapshots older than two days are purged. And it goes like this for
> months.
> 
> Another variant of this is a backup partition, where snapshots are made every
> six hours, and all snapshots are kept for 1-3 months before getting purged.
> 
> I guess this kind of usage causes a lot of internal fragmentation or
> something, which makes it difficult for a balance to find enough free space to
> work with.
> 

Well, one thing to keep in mind is that these warnings are truly just
warnings: balance finds space and uses it. It's just that our internal space
reservation calculations are coming up short, so it's letting us know we need
to adjust our math. So really we need to sit down and adjust how balance does
its reservations so we can stop these warnings from happening at all, but they
don't affect the actual balance other than making it super noisy and slow.
Thanks,

Josef

Patch

diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 0d89ff0..9830e86 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -2548,6 +2548,13 @@ static int do_relocation(struct btrfs_trans_handle *trans,
 	list_for_each_entry(edge, &node->upper, list[LOWER]) {
 		cond_resched();
 
+		ret = btrfs_block_rsv_refill(rc->extent_root, rc->block_rsv,
+					     rc->extent_root->leafsize,
+					     BTRFS_RESERVE_FLUSH_ALL);
+		if (ret) {
+			err = ret;
+			break;
+		}
 		upper = edge->node[UPPER];
 		root = select_reloc_root(trans, rc, upper, edges, &nr);
 		BUG_ON(!root);