
[RFC] Btrfs: Allow the compressed extent size limit to be modified.

Message ID 1360167017-19632-1-git-send-email-mitch.harder@sabayonlinux.org

Commit Message

Mitch Harder Feb. 6, 2013, 4:10 p.m. UTC
Allow the limit on the size of compressed extents to be modified
via a mount-time configuration option.

The size of compressed extents is currently limited to 128K, which
leads to fragmentation of the extents (although the extents themselves
may still be located contiguously).  This limit is in place to ease the
RAM requirements when spreading compression across several CPUs, and to
keep the amount of IO required for a random read reasonably small.

This patch is still preliminary.

In this version of the patch, the allowed compressed extent size is
restricted to 128K (the default) and 512K.  I wanted to extensively
test a single alternative value before expanding and testing a wider
range of parameters.
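
For example, to select the larger limit at mount time (device name
illustrative):

	mount -o compressed_extent_size=512 /dev/sdX /mnt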

I submitted a similar patch about a year and a half ago where the
change was hard-coded and not tuneable.

http://comments.gmane.org/gmane.comp.file-systems.btrfs/10516

---
 fs/btrfs/ctree.h      |  3 +++
 fs/btrfs/disk-io.c    |  1 +
 fs/btrfs/inode.c      |  8 ++++----
 fs/btrfs/relocation.c |  7 ++++---
 fs/btrfs/super.c      | 19 ++++++++++++++++++-
 5 files changed, 30 insertions(+), 8 deletions(-)

Comments

Zach Brown Feb. 6, 2013, 6:46 p.m. UTC | #1
> +	unsigned compressed_extent_size;

It kind of jumps out that this mentions neither that it's the max nor
that it's in KB.  How about max_compressed_extent_kb?
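
Something like (just a sketch):

	/* max size of a compressed extent, in KB */
	unsigned max_compressed_extent_kb;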

> +	fs_info->compressed_extent_size = 128;

I'd put a DEFAULT_MAX_EXTENTS up by the MAX_ definition instead of using
a raw 128 here.
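
Something along these lines (a sketch; name and placement per the
suggestion):

	/* next to the MAX_EXTENTS definition */
	#define DEFAULT_MAX_EXTENTS	128

	/* in open_ctree() */
	fs_info->compressed_extent_size = DEFAULT_MAX_EXTENTS;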

> +	unsigned long max_compressed = root->fs_info->compressed_extent_size * 1024;
> +	unsigned long max_uncompressed = root->fs_info->compressed_extent_size * 1024;

(so max_compressed is in bytes)

>  	nr_pages = (end >> PAGE_CACHE_SHIFT) - (start >> PAGE_CACHE_SHIFT) + 1;
> -	nr_pages = min(nr_pages, (128 * 1024UL) / PAGE_CACHE_SIZE);
> +	nr_pages = min(nr_pages, (max_compressed * 1024UL) / PAGE_CACHE_SIZE);

(and now that expression adds another * 1024, allowing {128,512}MB
extents :))
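
Dropping the extra multiplication should fix it, since max_compressed
is already in bytes:

	nr_pages = min(nr_pages, max_compressed / PAGE_CACHE_SIZE);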

>  	 * We also want to make sure the amount of IO required to do
>  	 * a random read is reasonably small, so we limit the size of
> -	 * a compressed extent to 128k.
> +	 * a compressed extent (default of 128k).

Just drop the value so that this comment doesn't need to be updated
again.

-	 * a compressed extent to 128k.
+	 * a compressed extent.

> +	{Opt_compr_extent_size, "compressed_extent_size=%d"},

It's even more important to make the exposed option self-documenting
than it was to get the fs_info member right.
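
For example, mirroring the member name suggested above:

	{Opt_compr_extent_size, "max_compressed_extent_kb=%d"},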

> +			if ((intarg == 128) || (intarg == 512)) {
> +				info->compressed_extent_size = intarg;
> +				printk(KERN_INFO "btrfs: compressed extent size %d\n",
> +				       info->compressed_extent_size);
> +			} else {
> +				printk(KERN_INFO "btrfs: "
> +					"Invalid compressed extent size,"
> +					" using default.\n");

I'd print the default value when it's used and would include a unit in
both.
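
Roughly (a sketch, reusing the DEFAULT_MAX_EXTENTS suggested above):

	if (intarg == 128 || intarg == 512) {
		info->compressed_extent_size = intarg;
		printk(KERN_INFO "btrfs: max compressed extent size %uKB\n",
		       info->compressed_extent_size);
	} else {
		printk(KERN_INFO "btrfs: invalid compressed extent size, "
		       "using default of %uKB\n", DEFAULT_MAX_EXTENTS);
	}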

> +	if (btrfs_test_opt(root, COMPR_EXTENT_SIZE))
> +		seq_printf(seq, ",compressed_extent_size=%d",
> +			   (unsigned long long)info->compressed_extent_size);

The (ull) cast doesn't match the %d format and wouldn't be needed if it
was printed with %u.
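
i.e.:

	seq_printf(seq, ",compressed_extent_size=%u",
		   info->compressed_extent_size);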

- z
Mitch Harder Feb. 6, 2013, 11:41 p.m. UTC | #2
On Wed, Feb 6, 2013 at 12:46 PM, Zach Brown <zab@redhat.com> wrote:
>> +     unsigned long max_compressed = root->fs_info->compressed_extent_size * 1024;
>> +     unsigned long max_uncompressed = root->fs_info->compressed_extent_size * 1024;
>
> (so max_compressed is in bytes)
>
>>       nr_pages = (end >> PAGE_CACHE_SHIFT) - (start >> PAGE_CACHE_SHIFT) + 1;
>> -     nr_pages = min(nr_pages, (128 * 1024UL) / PAGE_CACHE_SIZE);
>> +     nr_pages = min(nr_pages, (max_compressed * 1024UL) / PAGE_CACHE_SIZE);
>
> (and now that expression adds another * 1024, allowing {128,512}MB
> extents :))
>

Yuk!  I'm surprised this never manifested as a problem during testing.


Thanks for the review.

All these comments make sense, and I should be able to work them in.
David Sterba Feb. 20, 2013, 12:54 p.m. UTC | #3
On Wed, Feb 06, 2013 at 05:41:42PM -0600, Mitch Harder wrote:
> On Wed, Feb 6, 2013 at 12:46 PM, Zach Brown <zab@redhat.com> wrote:
> >> +     unsigned long max_compressed = root->fs_info->compressed_extent_size * 1024;
> >> +     unsigned long max_uncompressed = root->fs_info->compressed_extent_size * 1024;
> >
> > (so max_compressed is in bytes)
> >
> >>       nr_pages = (end >> PAGE_CACHE_SHIFT) - (start >> PAGE_CACHE_SHIFT) + 1;
> >> -     nr_pages = min(nr_pages, (128 * 1024UL) / PAGE_CACHE_SIZE);
> >> +     nr_pages = min(nr_pages, (max_compressed * 1024UL) / PAGE_CACHE_SIZE);
> >
> > (and now that expression adds another * 1024, allowing {128,512}MB
> > extents :))
> >
> 
> Yuk!  I'm surprised this never manifested as a problem during testing.

Answer for the curious:

cow_file_range_async()

		if (BTRFS_I(inode)->flags & BTRFS_INODE_NOCOMPRESS)
			cur_end = end;
		else
			cur_end = min(end, start + 512 * 1024 - 1);

in the case of a compressed extent the end is thus at most start +
512K, and compress_file_range() calculates nr_pages from that [start,
end] range.  So nr_pages is already capped at 128 pages, and the bogus
{128,512}MB bound from the extra * 1024 never comes into play.

So, the 512 there should be replaced with the max_compressed_extent_kb
value as well.  As a rough sketch (using the member name suggested
earlier in the thread):
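
	cur_end = min(end, start +
		      fs_info->max_compressed_extent_kb * 1024 - 1);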
I've experimented with higher values; it starts to break at 1024.  The
errors seem to be around calculating block checksums, which then do not
fit into one page, roughly:

blocks = 1M / 4096 = 256
csum bytes = sector_sum size * blocks = 16 * 256 = 4096
leaf headers + csum bytes > PAGE_SIZE and slab debug reports corruption

A max of 768 worked fine; the safe maximum is ~900.


david

david

Patch

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 547b7b0..f37ec32 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1477,6 +1477,8 @@  struct btrfs_fs_info {
 	unsigned data_chunk_allocations;
 	unsigned metadata_ratio;
 
+	unsigned compressed_extent_size;
+
 	void *bdev_holder;
 
 	/* private scrub information */
@@ -1829,6 +1831,7 @@  struct btrfs_ioctl_defrag_range_args {
 #define BTRFS_MOUNT_CHECK_INTEGRITY	(1 << 20)
 #define BTRFS_MOUNT_CHECK_INTEGRITY_INCLUDING_EXTENT_DATA (1 << 21)
 #define BTRFS_MOUNT_PANIC_ON_FATAL_ERROR	(1 << 22)
+#define BTRFS_MOUNT_COMPR_EXTENT_SIZE (1 << 23)
 
 #define btrfs_clear_opt(o, opt)		((o) &= ~BTRFS_MOUNT_##opt)
 #define btrfs_set_opt(o, opt)		((o) |= BTRFS_MOUNT_##opt)
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 830bc17..2d2be03 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2056,6 +2056,7 @@  int open_ctree(struct super_block *sb,
 	fs_info->trans_no_join = 0;
 	fs_info->free_chunk_space = 0;
 	fs_info->tree_mod_log = RB_ROOT;
+	fs_info->compressed_extent_size = 128;
 
 	/* readahead state */
 	INIT_RADIX_TREE(&fs_info->reada_tree, GFP_NOFS & ~__GFP_WAIT);
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 148abeb..5b81b56 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -346,8 +346,8 @@  static noinline int compress_file_range(struct inode *inode,
 	unsigned long nr_pages_ret = 0;
 	unsigned long total_compressed = 0;
 	unsigned long total_in = 0;
-	unsigned long max_compressed = 128 * 1024;
-	unsigned long max_uncompressed = 128 * 1024;
+	unsigned long max_compressed = root->fs_info->compressed_extent_size * 1024;
+	unsigned long max_uncompressed = root->fs_info->compressed_extent_size * 1024;
 	int i;
 	int will_compress;
 	int compress_type = root->fs_info->compress_type;
@@ -361,7 +361,7 @@  static noinline int compress_file_range(struct inode *inode,
 again:
 	will_compress = 0;
 	nr_pages = (end >> PAGE_CACHE_SHIFT) - (start >> PAGE_CACHE_SHIFT) + 1;
-	nr_pages = min(nr_pages, (128 * 1024UL) / PAGE_CACHE_SIZE);
+	nr_pages = min(nr_pages, (max_compressed * 1024UL) / PAGE_CACHE_SIZE);
 
 	/*
 	 * we don't want to send crud past the end of i_size through
@@ -386,7 +386,7 @@  again:
 	 *
 	 * We also want to make sure the amount of IO required to do
 	 * a random read is reasonably small, so we limit the size of
-	 * a compressed extent to 128k.
+	 * a compressed extent (default of 128k).
 	 */
 	total_compressed = min(total_compressed, max_uncompressed);
 	num_bytes = (end - start + blocksize) & ~(blocksize - 1);
diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 300e09a..8d6f6bf 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -144,7 +144,7 @@  struct tree_block {
 	unsigned int key_ready:1;
 };
 
-#define MAX_EXTENTS 128
+#define MAX_EXTENTS 512
 
 struct file_extent_cluster {
 	u64 start;
@@ -3055,6 +3055,7 @@  int relocate_data_extent(struct inode *inode, struct btrfs_key *extent_key,
 			 struct file_extent_cluster *cluster)
 {
 	int ret;
+	struct btrfs_fs_info *fs_info = BTRFS_I(inode)->root->fs_info;
 
 	if (cluster->nr > 0 && extent_key->objectid != cluster->end + 1) {
 		ret = relocate_file_extent_cluster(inode, cluster);
@@ -3066,12 +3067,12 @@  int relocate_data_extent(struct inode *inode, struct btrfs_key *extent_key,
 	if (!cluster->nr)
 		cluster->start = extent_key->objectid;
 	else
-		BUG_ON(cluster->nr >= MAX_EXTENTS);
+		BUG_ON(cluster->nr >= fs_info->compressed_extent_size);
 	cluster->end = extent_key->objectid + extent_key->offset - 1;
 	cluster->boundary[cluster->nr] = extent_key->objectid;
 	cluster->nr++;
 
-	if (cluster->nr >= MAX_EXTENTS) {
+	if (cluster->nr >= fs_info->compressed_extent_size) {
 		ret = relocate_file_extent_cluster(inode, cluster);
 		if (ret)
 			return ret;
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index d8982e9..6e9dc69 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -322,7 +322,7 @@  enum {
 	Opt_no_space_cache, Opt_recovery, Opt_skip_balance,
 	Opt_check_integrity, Opt_check_integrity_including_extent_data,
 	Opt_check_integrity_print_mask, Opt_fatal_errors,
-	Opt_err,
+	Opt_compr_extent_size, Opt_err,
 };
 
 static match_table_t tokens = {
@@ -361,6 +361,7 @@  static match_table_t tokens = {
 	{Opt_check_integrity, "check_int"},
 	{Opt_check_integrity_including_extent_data, "check_int_data"},
 	{Opt_check_integrity_print_mask, "check_int_print_mask=%d"},
+	{Opt_compr_extent_size, "compressed_extent_size=%d"},
 	{Opt_fatal_errors, "fatal_errors=%s"},
 	{Opt_err, NULL},
 };
@@ -612,6 +613,19 @@  int btrfs_parse_options(struct btrfs_root *root, char *options)
 			ret = -EINVAL;
 			goto out;
 #endif
+		case Opt_compr_extent_size:
+			intarg = 0;
+			match_int(&args[0], &intarg);
+			if ((intarg == 128) || (intarg == 512)) {
+				info->compressed_extent_size = intarg;
+				printk(KERN_INFO "btrfs: compressed extent size %d\n",
+				       info->compressed_extent_size);
+			} else {
+				printk(KERN_INFO "btrfs: "
+					"Invalid compressed extent size,"
+					" using default.\n");
+			}
+			break;
 		case Opt_fatal_errors:
 			if (strcmp(args[0].from, "panic") == 0)
 				btrfs_set_opt(info->mount_opt,
@@ -951,6 +965,9 @@  static int btrfs_show_options(struct seq_file *seq, struct dentry *dentry)
 		seq_puts(seq, ",skip_balance");
 	if (btrfs_test_opt(root, PANIC_ON_FATAL_ERROR))
 		seq_puts(seq, ",fatal_errors=panic");
+	if (btrfs_test_opt(root, COMPR_EXTENT_SIZE))
+		seq_printf(seq, ",compressed_extent_size=%d",
+			   (unsigned long long)info->compressed_extent_size);
 	return 0;
 }