Message ID | 1464918615-6458-1-git-send-email-andrew@asquaredlabs.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Thu, Jun 02, 2016 at 09:50:15PM -0400, Andrew Armenia wrote: > This patch adds mount option 'chunk_width_limit=X', which when set forces > the chunk allocator to use only up to X devices when allocating a chunk. > This may help reduce the seek penalties seen in filesystems with large > numbers of devices. There is a use for such limit but passing it as a mount is not the right way to do it. Changing the value requires remount, different raid levels might need different values, it's not persistent etc. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Jun 6, 2016 at 5:17 AM, David Sterba <dsterba@suse.cz> wrote: > On Thu, Jun 02, 2016 at 09:50:15PM -0400, Andrew Armenia wrote: >> This patch adds mount option 'chunk_width_limit=X', which when set forces >> the chunk allocator to use only up to X devices when allocating a chunk. >> This may help reduce the seek penalties seen in filesystems with large >> numbers of devices. > > There is a use for such limit but passing it as a mount is not the right > way to do it. Changing the value requires remount, different raid levels > might need different values, it's not persistent etc. Thanks for the feedback. I agree that it's less than an ideal way to do it. I guess I was a bit quick to release proof-of-concept code. I'll try to do some more work on the configuration side of things and resubmit. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Jun 06, 2016 at 09:43:19AM -0400, Andrew Armenia wrote: > On Mon, Jun 6, 2016 at 5:17 AM, David Sterba <dsterba@suse.cz> wrote: > > On Thu, Jun 02, 2016 at 09:50:15PM -0400, Andrew Armenia wrote: > >> This patch adds mount option 'chunk_width_limit=X', which when set forces > >> the chunk allocator to use only up to X devices when allocating a chunk. > >> This may help reduce the seek penalties seen in filesystems with large > >> numbers of devices. > > > > There is a use for such limit but passing it as a mount is not the right > > way to do it. Changing the value requires remount, different raid levels > > might need different values, it's not persistent etc. > > Thanks for the feedback. I agree that it's less than an ideal way to > do it. I guess I was a bit quick to release proof-of-concept code. > I'll try to do some more work on the configuration side of things and > resubmit. Last time something like this came up in discussion, the consensus was to put the configuration in xattrs in the btrfs namespace. There should be one with a name to indicate "global" configuration, applied to the top of subvol 5. If the feature allows configuration on a per-subvol or per-object basis, then there should also be a name for the relevant xattr (also in the btrfs namespace) that can be created on each object as required. Hugo.
On Mon, Jun 6, 2016 at 11:00 AM, Hugo Mills <hugo@carfax.org.uk> wrote: > On Mon, Jun 06, 2016 at 09:43:19AM -0400, Andrew Armenia wrote: >> On Mon, Jun 6, 2016 at 5:17 AM, David Sterba <dsterba@suse.cz> wrote: >> > On Thu, Jun 02, 2016 at 09:50:15PM -0400, Andrew Armenia wrote: >> >> This patch adds mount option 'chunk_width_limit=X', which when set forces >> >> the chunk allocator to use only up to X devices when allocating a chunk. >> >> This may help reduce the seek penalties seen in filesystems with large >> >> numbers of devices. >> > >> > There is a use for such limit but passing it as a mount is not the right >> > way to do it. Changing the value requires remount, different raid levels >> > might need different values, it's not persistent etc. >> >> Thanks for the feedback. I agree that it's less than an ideal way to >> do it. I guess I was a bit quick to release proof-of-concept code. >> I'll try to do some more work on the configuration side of things and >> resubmit. > > Last time something like this came up in discussion, the consensus > was to put the configuration in xattrs in the btrfs namespace. There > should be one with a name to indicate "global" configuration, applied > to the top of subvol 5. If the feature allows configuration on a > per-subvol or per-object basis, then there should also be a name for > the relevant xattr (also in the btrfs namespace) that can be created > on each object as required. > > Hugo. > > -- > Hugo Mills | Klytus, I'm bored. What plaything can you offer me > hugo@... carfax.org.uk | today? > http://carfax.org.uk/ | > PGP: E2AB1DE4 | Ming the Merciless, Flash Gordon OK, since I'm new here I'll throw my thoughts out before writing and submitting another patch. I wouldn't want to have to read the xattrs from disk every time a chunk needs to be allocated. That means the allocator configuration will need to be cached in the fs_info structure. My current line of thinking is to create a btrfs_allocator_config structure containing all of the allocator parameters - for now a user-set devs_max for each RAID level. I'd add a couple ioctls to allow the userspace tools to get and set these parameters. The "set" ioctl would also persist this configuration to one or more xattrs on the root of subvol 5 per Hugo's suggestion. I'm still a bit hung up on the proper way to read the config back at mount time. Patching the mount routines doesn't seem like a good way to do it. Perhaps just doing it at the first invocation of __btrfs_alloc_chunk is best? Any other thoughts? -Andrew -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 06/03/2016 09:50 AM, Andrew Armenia wrote: > This patch adds mount option 'chunk_width_limit=X', which when set forces > the chunk allocator to use only up to X devices when allocating a chunk. > This may help reduce the seek penalties seen in filesystems with large > numbers of devices. Have you reviewed implementations like device allocation grouping? Some info is in the btrfs project ideas.. https://btrfs.wiki.kernel.org/index.php/Project_ideas Chunk allocation groups Limits on number of stripes (stripe width) Linear chunk allocation mode (Device allocation grouping is important for enterprise storage solutions). Thanks, Anand -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sun, Jun 12, 2016 at 10:06 PM, Anand Jain <anand.jain@oracle.com> wrote: > > > On 06/03/2016 09:50 AM, Andrew Armenia wrote: >> >> This patch adds mount option 'chunk_width_limit=X', which when set forces >> the chunk allocator to use only up to X devices when allocating a chunk. >> This may help reduce the seek penalties seen in filesystems with large >> numbers of devices. > > > Have you reviewed implementations like device allocation grouping? > Some info is in the btrfs project ideas.. > > https://btrfs.wiki.kernel.org/index.php/Project_ideas > Chunk allocation groups > Limits on number of stripes (stripe width) > Linear chunk allocation mode > > (Device allocation grouping is important for enterprise storage solutions). > > Thanks, Anand I have looked at those ideas; allocation groups are what I'm ideally after but I decided to start out small. I just spotted the dev_group field in the device tree that appears to be currently unused, so perhaps I will look at developing a group-aware allocator instead of just limiting the chunk width. -Andrew -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Jun 14, 2016 at 03:44:47PM -0400, Andrew Armenia wrote: > On Sun, Jun 12, 2016 at 10:06 PM, Anand Jain <anand.jain@oracle.com> wrote: > > > > > > On 06/03/2016 09:50 AM, Andrew Armenia wrote: > >> > >> This patch adds mount option 'chunk_width_limit=X', which when set forces > >> the chunk allocator to use only up to X devices when allocating a chunk. > >> This may help reduce the seek penalties seen in filesystems with large > >> numbers of devices. > > > > > > Have you reviewed implementations like device allocation grouping? > > Some info is in the btrfs project ideas.. > > > > https://btrfs.wiki.kernel.org/index.php/Project_ideas > > Chunk allocation groups > > Limits on number of stripes (stripe width) > > Linear chunk allocation mode > > > > (Device allocation grouping is important for enterprise storage solutions). > > > > Thanks, Anand > > I have looked at those ideas; allocation groups are what I'm ideally > after but I decided to start out small. I just spotted the dev_group > field in the device tree that appears to be currently unused, so > perhaps I will look at developing a group-aware allocator instead of > just limiting the chunk width. I made some design notes on a generalised approach for this a while ago: http://www.spinics.net/lists/linux-btrfs/msg33782.html http://www.spinics.net/lists/linux-btrfs/msg33916.html Hugo.
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 101c3cf..27b6f8f 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -754,6 +754,9 @@ struct btrfs_fs_info { unsigned long pending_changes; unsigned long compress_type:4; int commit_interval; + + int chunk_width_limit; + /* * It is a suggestive number, the read side is safe even it gets a * wrong number because we will write out the data into a regular diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c index 4e59a91..3da5220 100644 --- a/fs/btrfs/super.c +++ b/fs/btrfs/super.c @@ -300,7 +300,7 @@ enum { Opt_commit_interval, Opt_barrier, Opt_nodefrag, Opt_nodiscard, Opt_noenospc_debug, Opt_noflushoncommit, Opt_acl, Opt_datacow, Opt_datasum, Opt_treelog, Opt_noinode_cache, Opt_usebackuproot, - Opt_nologreplay, Opt_norecovery, + Opt_nologreplay, Opt_norecovery, Opt_width_limit, #ifdef CONFIG_BTRFS_DEBUG Opt_fragment_data, Opt_fragment_metadata, Opt_fragment_all, #endif @@ -360,6 +360,7 @@ static const match_table_t tokens = { {Opt_rescan_uuid_tree, "rescan_uuid_tree"}, {Opt_fatal_errors, "fatal_errors=%s"}, {Opt_commit_interval, "commit=%d"}, + {Opt_width_limit, "chunk_width_limit=%d"}, #ifdef CONFIG_BTRFS_DEBUG {Opt_fragment_data, "fragment=data"}, {Opt_fragment_metadata, "fragment=metadata"}, @@ -782,6 +783,22 @@ int btrfs_parse_options(struct btrfs_root *root, char *options, info->commit_interval = BTRFS_DEFAULT_COMMIT_INTERVAL; } break; + case Opt_width_limit: + intarg = 0; + ret = match_int(&args[0], &intarg); + if (ret < 0) { + btrfs_err(root->fs_info, "invalid chunk width limit"); + ret = -EINVAL; + goto out; + } + + if (intarg > 0) { + info->chunk_width_limit = intarg; + } else { + btrfs_info(root->fs_info, "chunk width is unlimited"); + info->chunk_width_limit = 0; + } + break; #ifdef CONFIG_BTRFS_DEBUG case Opt_fragment_all: btrfs_info(root->fs_info, "fragmenting all space"); @@ -1207,6 +1224,9 @@ static int btrfs_show_options(struct seq_file *seq, struct dentry *dentry) if (info->thread_pool_size != min_t(unsigned long, num_online_cpus() + 2, 8)) seq_printf(seq, ",thread_pool=%d", info->thread_pool_size); + if (info->chunk_width_limit != 0) + seq_printf(seq, ",chunk_width_limit=%d", + info->chunk_width_limit); if (btrfs_test_opt(root, COMPRESS)) { if (info->compress_type == BTRFS_COMPRESS_ZLIB) compress_type = "zlib"; diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index bdc6256..6d0d35d 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -4558,6 +4558,32 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans, devs_increment = btrfs_raid_array[index].devs_increment; ncopies = btrfs_raid_array[index].ncopies; + /* + * if we have a statically-configured chunk width, and the type doesn't + * specify one, go ahead and use the statically-configured max instead. + * + * If the static value is greater than the BTRFS_MAX_DEVS for the + * chunk tree, we ignore it. + * + * Also, we ignore the static value for system chunks. + */ + if ( + devs_max == 0 && info->chunk_width_limit != 0 + && !(type & BTRFS_BLOCK_GROUP_SYSTEM) + && info->chunk_width_limit <= BTRFS_MAX_DEVS(info->chunk_root) + ) { + if (info->chunk_width_limit >= devs_min) { + devs_max = info->chunk_width_limit; + } else { + /* warn that the static devs_max is unusable */ + btrfs_warn(info, + "can't satisfy max chunk width of %d; " + "minimum %d devices needed", + info->chunk_width_limit, devs_max + ); + } + } + if (type & BTRFS_BLOCK_GROUP_DATA) { max_stripe_size = SZ_1G; max_chunk_size = 10 * max_stripe_size;
This patch adds mount option 'chunk_width_limit=X', which when set forces the chunk allocator to use only up to X devices when allocating a chunk. This may help reduce the seek penalties seen in filesystems with large numbers of devices. Signed-off-by: Andrew Armenia <andrew@asquaredlabs.com> --- fs/btrfs/ctree.h | 3 +++ fs/btrfs/super.c | 22 +++++++++++++++++++++- fs/btrfs/volumes.c | 26 ++++++++++++++++++++++++++ 3 files changed, 50 insertions(+), 1 deletion(-)