
[V2,10/10] Btrfs: reclaim the reserved metadata space at background

Message ID 1394085304-32589-10-git-send-email-miaox@cn.fujitsu.com
State Superseded, archived

Commit Message

Miao Xie March 6, 2014, 5:55 a.m. UTC
Before applying this patch, a task had to reclaim the metadata space by
itself when there was not enough of it. When a task started the space
reclamation, all the other tasks that wanted to reserve metadata space
were blocked. In some cases they were blocked for a long time, which
made the performance fluctuate wildly.

So we introduce background metadata space reclamation: when the space
is about to be exhausted, we queue a reclaim work item on a workqueue,
and the workqueue's worker reclaims the reserved space in the
background. This way, in most cases tasks need not reclaim the space by
themselves, and even if they have to reclaim the space or are blocked
waiting for the reclamation, they will get enough space more quickly.

We needn't worry about the early enospc problem because all the reclaim
work is serialized by the lock.
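The scheme described above can be sketched as a single-threaded user-space simulation. This is not the kernel code: `struct meta_space`, `reserve_bytes()` and `run_reclaim_worker()` are hypothetical names, a `pending` flag stands in for the queued work item and the `work_busy()` check, and the worker always aims for 90% usage (the patch picks 95% or 90% depending on `can_overcommit()`).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one metadata space_info. */
struct meta_space {
	uint64_t total;      /* total metadata bytes */
	uint64_t used;       /* used + reserved + pinned + readonly + may_use */
	bool pending;        /* stands in for a queued reclaim work item */
	unsigned int queued; /* how many times reclaim work was queued */
};

/* Same arithmetic as the kernel's div_factor_fine(): num * factor / 100. */
static uint64_t pct(uint64_t num, unsigned int factor)
{
	return num * factor / 100;
}

/* Reserve bytes; queue background reclaim once usage reaches 95%. */
static bool reserve_bytes(struct meta_space *s, uint64_t bytes)
{
	if (s->used + bytes > s->total)
		return false;       /* the reserving task would flush by itself */
	s->used += bytes;
	if (s->used >= pct(s->total, 95) && !s->pending) {
		s->pending = true;  /* like queue_work() guarded by !work_busy() */
		s->queued++;
	}
	return true;
}

/* Background worker: pretend flushing brings usage down to 90%. */
static void run_reclaim_worker(struct meta_space *s)
{
	uint64_t target = pct(s->total, 90);

	assert(s->pending);
	if (s->used > target)
		s->used = target;
	s->pending = false;
}
```

The point of the `pending` flag is that many reservers crossing the threshold still produce only one queued work item, while none of them blocks on the reclamation itself.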

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
---
So far I have only done some simple tests; I'll do more performance
testing and send out the results.

Changelog v1 -> v2:
- change the reclaim size.
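The v2 reclaim size, computed by btrfs_calc_reclaim_metadata_size() in the patch, can be approximated in user space as below. Assumptions: `fits()` is a hypothetical stand-in for can_overcommit() that allows no overcommit at all, and, unlike the patch, the sketch clamps the result at 0 instead of computing `used - expected` unconditionally.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SZ_1M (1024ULL * 1024)

/* num * factor / 100, as in the kernel's div_factor_fine(). */
static uint64_t div_factor_fine(uint64_t num, unsigned int factor)
{
	return num * factor / 100;
}

/* Hypothetical stand-in for can_overcommit(): no real overcommit. */
static bool fits(uint64_t total, uint64_t used, uint64_t bytes)
{
	return used + bytes <= total;
}

static uint64_t calc_reclaim_size(unsigned int online_cpus, uint64_t total,
				  uint64_t used)
{
	uint64_t to_reclaim = (uint64_t)online_cpus * SZ_1M;
	uint64_t expected;

	if (to_reclaim > 16 * SZ_1M)
		to_reclaim = 16 * SZ_1M;  /* min(ncpus * 1M, 16M) */
	if (fits(total, used, to_reclaim))
		return 0;                 /* enough headroom, nothing to reclaim */

	/* Aim lower (90%) when even 1M more cannot be accommodated. */
	expected = div_factor_fine(total, fits(total, used, SZ_1M) ? 95 : 90);
	return used > expected ? used - expected : 0;
}
```

For example, with 4 CPUs, 100M of metadata and 98M in use, 4M does not fit but 1M does, so the target is 95M and 3M is reclaimed.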
---
 fs/btrfs/ctree.h       |  6 +++
 fs/btrfs/disk-io.c     |  3 ++
 fs/btrfs/extent-tree.c | 99 +++++++++++++++++++++++++++++++++++++++++++++++++-
 fs/btrfs/super.c       |  1 +
 4 files changed, 108 insertions(+), 1 deletion(-)

Comments

Josef Bacik March 10, 2014, 1:35 p.m. UTC | #1
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 03/06/2014 12:55 AM, Miao Xie wrote:
> [...]
> We needn't worry about the early enospc problem because all the
> reclaim work is serialized by the lock.

This causes generic/015 to fail with early enospc, I'm kicking this
patch out, I'll take the rest.  Thanks,

Josef

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQIcBAEBAgAGBQJTHb+NAAoJEANb+wAKly3BCcUP/jGmW85hiurfTF7eom+wzDcr
nxqvdTB/F21UJU1RRrb92CdYRYb9d4hHKhXE5OK+qamE+K55GEtgCUWCLQgDfJJL
Wx0aUD/pTqv3J5S5zM43UBJkn2ZR99Q7hJzm9PPMSMn7hBgK87QUEme8HerCPUgY
0VS4OcqUGhg88qO8GjdEFLnHawhjMDw9iGPUi+tMdCEmr9aQQo8ntiahdVKyTHej
vSRQRs0igvAt73OWHXiP6vc4LOQdu1vKCFdbxhgg+duKjNOHfUoaiiaUiGhWIA9l
BcTWd62bEJNOaXd6k06GzhpCWzMM6faTLfjI6XADUFY0VZ79akzk2KAO6YdaLz8w
3IAKN1chTpr7q7oPuRDgDQuwwdeLPImN29CKlAF3jlSRJEblM8CKoXYD1fyqVwDy
c1mA6mMUJnEnXrkJ/Pb5zuNIZMAlU+v3d6CCjYKHMACORvJeZVlg9gLLMATaAJIA
xLjFlzbgSbp/OUNuBuS4YGIaa51aAyODd2h1T3E+T5JYbVkA39N3Ni9HODE8AuSE
E6U/06FK47L0e5uGFrM3tMTL0XBF62C1iml4NsjOWgiERz8lFDdFVArgXamCVacM
1+VdeLLS88RHFEuwlMBy/ZQBdnvWCVsNVjYukuxntmWbSWrsLUFUSzExWnp+7TAO
xkEd2yMw75yasTVGKSXU
=Q/fM
-----END PGP SIGNATURE-----
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Miao Xie May 8, 2014, 2:06 a.m. UTC | #2
On Mon, 10 Mar 2014 09:35:13 -0400, Josef Bacik wrote:
> On 03/06/2014 12:55 AM, Miao Xie wrote:
>> [...]
>> We needn't worry about the early enospc problem because all the
>> reclaim work is serialized by the lock.
> 
> This causes generic/015 to fail with early enospc, I'm kicking this
> patch out, I'll take the rest.  Thanks,

It is not an early enospc problem.

This test checks whether the space of a file is released immediately
after the file is deleted. In fact, the result of the test is unstable,
because the kernel may be syncing the file data when we delete it, and
if so, the space of the file is not released immediately.

But the case above is rare, because the fs in this test is only 50MB
while most machines have a lot of memory (maybe > 1GB). So there are
few dirty pages and the background flusher may not be woken up
immediately; nobody holds the inode of the test file after we delete
it, and its space can be released immediately.

After applying this patch, we flush the dirty pages, because the
background metadata space reclaimer finds that the metadata space is
about to be used up (less than 5% of the total metadata space is left)
and needs to flush dirty pages to reclaim some delalloc metadata space.
That is, this patch makes the above case much more likely to happen.
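The 5% condition above corresponds to need_do_async_reclaim() in the patch, which fires once used >= div_factor_fine(total_bytes, 95). A minimal sketch of just that arithmetic follows; `over_reclaim_threshold()` is a hypothetical name, the fs-closing and remount checks are left out, and the 8 MiB metadata size in the example is hypothetical, not measured in generic/015.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* num * factor / 100, as in the kernel's div_factor_fine(). */
static uint64_t div_factor_fine(uint64_t num, unsigned int factor)
{
	return num * factor / 100;
}

/* Size check of need_do_async_reclaim() only. */
static bool over_reclaim_threshold(uint64_t total_bytes, uint64_t used)
{
	return used >= div_factor_fine(total_bytes, 95);
}
```

With an 8 MiB metadata space the threshold is 8388608 * 95 / 100 = 7969177 bytes after integer division, so background reclaim starts once at most about 410 KiB is left.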

Anyway, we need to improve this patch even though it is not a bug. I
will send out a new version.

Thanks
Miao

Patch

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index ec47aa9..21f156b 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -33,6 +33,7 @@ 
 #include <asm/kmap_types.h>
 #include <linux/pagemap.h>
 #include <linux/btrfs.h>
+#include <linux/workqueue.h>
 #include "extent_io.h"
 #include "extent_map.h"
 #include "async-thread.h"
@@ -1305,6 +1306,8 @@  struct btrfs_stripe_hash_table {
 
 #define BTRFS_STRIPE_HASH_TABLE_BITS 11
 
+void btrfs_init_async_reclaim_work(struct work_struct *work);
+
 /* fs_info */
 struct reloc_control;
 struct btrfs_device;
@@ -1681,6 +1684,9 @@  struct btrfs_fs_info {
 
 	struct semaphore uuid_tree_rescan_sem;
 	unsigned int update_uuid_tree_gen:1;
+
+	/* Used to reclaim the metadata space in the background. */
+	struct work_struct async_reclaim_work;
 };
 
 /*
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 2bb0bbd..d77516e 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2237,6 +2237,7 @@  int open_ctree(struct super_block *sb,
 	atomic_set(&fs_info->balance_cancel_req, 0);
 	fs_info->balance_ctl = NULL;
 	init_waitqueue_head(&fs_info->balance_wait_q);
+	btrfs_init_async_reclaim_work(&fs_info->async_reclaim_work);
 
 	sb->s_blocksize = 4096;
 	sb->s_blocksize_bits = blksize_bits(4096);
@@ -3580,6 +3581,8 @@  int close_ctree(struct btrfs_root *root)
 	/* clear out the rbtree of defraggable inodes */
 	btrfs_cleanup_defrag_inodes(fs_info);
 
+	cancel_work_sync(&fs_info->async_reclaim_work);
+
 	if (!(fs_info->sb->s_flags & MS_RDONLY)) {
 		ret = btrfs_commit_super(root);
 		if (ret)
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index da43003..6640d28 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -4200,6 +4200,98 @@  static int flush_space(struct btrfs_root *root,
 
 	return ret;
 }
+
+static inline u64
+btrfs_calc_reclaim_metadata_size(struct btrfs_root *root,
+				 struct btrfs_space_info *space_info)
+{
+	u64 used;
+	u64 expected;
+	u64 to_reclaim;
+
+	to_reclaim = min_t(u64, num_online_cpus() * 1024 * 1024,
+				16 * 1024 * 1024);
+	spin_lock(&space_info->lock);
+	if (can_overcommit(root, space_info, to_reclaim,
+			   BTRFS_RESERVE_FLUSH_ALL)) {
+		to_reclaim = 0;
+		goto out;
+	}
+
+	used = space_info->bytes_used + space_info->bytes_reserved +
+	       space_info->bytes_pinned + space_info->bytes_readonly +
+	       space_info->bytes_may_use;
+	if (can_overcommit(root, space_info, 1024 * 1024,
+			   BTRFS_RESERVE_FLUSH_ALL))
+		expected = div_factor_fine(space_info->total_bytes, 95);
+	else
+		expected = div_factor_fine(space_info->total_bytes, 90);
+	to_reclaim = used - expected;
+out:
+	spin_unlock(&space_info->lock);
+
+	return to_reclaim;
+}
+
+static inline int need_do_async_reclaim(struct btrfs_space_info *space_info,
+					struct btrfs_fs_info *fs_info, u64 used)
+{
+	return (used >= div_factor_fine(space_info->total_bytes, 95) &&
+		!btrfs_fs_closing(fs_info) &&
+		!test_bit(BTRFS_FS_STATE_REMOUNTING, &fs_info->fs_state));
+}
+
+static int btrfs_need_do_async_reclaim(struct btrfs_space_info *space_info,
+				       struct btrfs_fs_info *fs_info)
+{
+	u64 used;
+
+	spin_lock(&space_info->lock);
+	used = space_info->bytes_used + space_info->bytes_reserved +
+	       space_info->bytes_pinned + space_info->bytes_readonly +
+	       space_info->bytes_may_use;
+	if (need_do_async_reclaim(space_info, fs_info, used)) {
+		spin_unlock(&space_info->lock);
+		return 1;
+	}
+	spin_unlock(&space_info->lock);
+
+	return 0;
+}
+
+static void btrfs_async_reclaim_metadata_space(struct work_struct *work)
+{
+	struct btrfs_fs_info *fs_info;
+	struct btrfs_space_info *space_info;
+	u64 to_reclaim;
+	int flush_state;
+
+	fs_info = container_of(work, struct btrfs_fs_info, async_reclaim_work);
+	space_info = __find_space_info(fs_info, BTRFS_BLOCK_GROUP_METADATA);
+
+	to_reclaim = btrfs_calc_reclaim_metadata_size(fs_info->fs_root,
+						      space_info);
+	if (!to_reclaim)
+		return;
+
+	flush_state = FLUSH_DELAYED_ITEMS_NR;
+	do {
+		flush_space(fs_info->fs_root, space_info, to_reclaim,
+			    to_reclaim, flush_state);
+		flush_state++;
+		if (!btrfs_need_do_async_reclaim(space_info, fs_info))
+			return;
+	} while (flush_state <= COMMIT_TRANS);
+
+	if (btrfs_need_do_async_reclaim(space_info, fs_info))
+		queue_work(system_unbound_wq, work);
+}
+
+void btrfs_init_async_reclaim_work(struct work_struct *work)
+{
+	INIT_WORK(work, btrfs_async_reclaim_metadata_space);
+}
+
 /**
  * reserve_metadata_bytes - try to reserve bytes from the block_rsv's space
  * @root - the root we're allocating for
@@ -4307,8 +4399,13 @@  again:
 	if (ret && flush != BTRFS_RESERVE_NO_FLUSH) {
 		flushing = true;
 		space_info->flush = 1;
+	} else if (!ret && space_info->flags & BTRFS_BLOCK_GROUP_METADATA) {
+		used += orig_bytes;
+		if (need_do_async_reclaim(space_info, root->fs_info, used) &&
+		    !work_busy(&root->fs_info->async_reclaim_work))
+			queue_work(system_unbound_wq,
+				   &root->fs_info->async_reclaim_work);
 	}
-
 	spin_unlock(&space_info->lock);
 
 	if (!ret || flush == BTRFS_RESERVE_NO_FLUSH)
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 97cc241..7cc7423 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -1401,6 +1401,7 @@  static int btrfs_remount(struct super_block *sb, int *flags, char *data)
 		 * this also happens on 'umount -rf' or on shutdown, when
 		 * the filesystem is busy.
 		 */
+		cancel_work_sync(&fs_info->async_reclaim_work);
 
 		/* wait for the uuid_scan task to finish */
 		down(&fs_info->uuid_tree_rescan_sem);
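The control flow of btrfs_async_reclaim_metadata_space() above can be modeled in user space as follows. Assumptions: only FLUSH_DELAYED_ITEMS_NR and COMMIT_TRANS appear in the patch, so the intermediate state names and values are assumed to mirror extent-tree.c of that time; `sim_flush_space()` is a hypothetical stand-in that frees a fixed amount per state; the initial `to_reclaim == 0` early return is omitted; and requeueing is reported through the return value instead of queue_work().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed ordering of the flush states, cheapest first. */
enum flush_state {
	FLUSH_DELAYED_ITEMS_NR = 1,
	FLUSH_DELAYED_ITEMS,
	FLUSH_DELALLOC,
	FLUSH_DELALLOC_WAIT,
	ALLOC_CHUNK,
	COMMIT_TRANS,
};

struct sim_space {
	uint64_t total;
	uint64_t used;
	uint64_t freed_per_state; /* hypothetical: bytes each flush frees */
	int last_state;           /* deepest state the worker reached */
};

/* Same 95% trigger as need_do_async_reclaim(). */
static bool need_reclaim(const struct sim_space *s)
{
	return s->used >= s->total * 95 / 100;
}

/* Hypothetical flush_space(): every state frees a fixed amount. */
static void sim_flush_space(struct sim_space *s, int state)
{
	uint64_t n = s->freed_per_state < s->used ? s->freed_per_state : s->used;

	s->used -= n;
	s->last_state = state;
}

/* Returns true when the worker would requeue itself. */
static bool sim_reclaim_worker(struct sim_space *s)
{
	int state = FLUSH_DELAYED_ITEMS_NR;

	do {
		sim_flush_space(s, state);
		state++;
		if (!need_reclaim(s))
			return false;   /* below the trigger: stop escalating */
	} while (state <= COMMIT_TRANS);

	return need_reclaim(s);     /* still short after a commit: requeue */
}
```

The worker escalates from the cheapest flush to a transaction commit and stops as soon as usage drops below the trigger, so a transaction commit is only reached when the cheaper states were not enough.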