diff mbox series

fuse: Enable dynamic configuration of FUSE_MAX_MAX_PAGES

Message ID 20240628001355.243805-1-joannelkoong@gmail.com (mailing list archive)
State New, archived
Headers show
Series fuse: Enable dynamic configuration of FUSE_MAX_MAX_PAGES | expand

Commit Message

Joanne Koong June 28, 2024, 12:13 a.m. UTC
Introduce the capability to dynamically configure the FUSE_MAX_MAX_PAGES
limit through a sysctl. This enhancement allows system administrators to
adjust the value based on system-specific requirements.

This removes the previous static limit of 256 max pages, which limits
the max write size of a request to 1 MiB (on 4096 pagesize systems).
Having the ability to up the max write size beyond 1 MiB allows for the
perf improvements detailed in this thread [1].

$ sysctl -a | grep fuse_max_max_pages
fs.fuse.fuse_max_max_pages = 256

$ sysctl -n fs.fuse.fuse_max_max_pages
256

$ echo 1024 | sudo tee /proc/sys/fs/fuse/fuse_max_max_pages
1024

$ sysctl -n fs.fuse.fuse_max_max_pages
1024

[1] https://lore.kernel.org/linux-fsdevel/20240124070512.52207-1-jefflexu@linux.alibaba.com/T/#u

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
---
 Documentation/admin-guide/sysctl/fs.rst | 10 +++++++
 fs/fuse/Makefile                        |  2 +-
 fs/fuse/fuse_i.h                        | 14 +++++++---
 fs/fuse/inode.c                         | 11 +++++++-
 fs/fuse/ioctl.c                         |  4 ++-
 fs/fuse/sysctl.c                        | 35 +++++++++++++++++++++++++
 6 files changed, 70 insertions(+), 6 deletions(-)
 create mode 100644 fs/fuse/sysctl.c

Comments

Josef Bacik June 28, 2024, 6:03 p.m. UTC | #1
On Thu, Jun 27, 2024 at 05:13:55PM -0700, Joanne Koong wrote:
> Introduce the capability to dynamically configure the FUSE_MAX_MAX_PAGES
> limit through a sysctl. This enhancement allows system administrators to
> adjust the value based on system-specific requirements.
> 
> This removes the previous static limit of 256 max pages, which limits
> the max write size of a request to 1 MiB (on 4096 pagesize systems).
> Having the ability to up the max write size beyond 1 MiB allows for the
> perf improvements detailed in this thread [1].
> 
> $ sysctl -a | grep fuse_max_max_pages
> fs.fuse.fuse_max_max_pages = 256
> 
> $ sysctl -n fs.fuse.fuse_max_max_pages
> 256
> 
> $ echo 1024 | sudo tee /proc/sys/fs/fuse/fuse_max_max_pages
> 1024
> 
> $ sysctl -n fs.fuse.fuse_max_max_pages
> 1024
> 
> [1] https://lore.kernel.org/linux-fsdevel/20240124070512.52207-1-jefflexu@linux.alibaba.com/T/#u
> 
> Signed-off-by: Joanne Koong <joannelkoong@gmail.com>

Overall the change is great, and I see why you named it the way you did, but I
think may be 'fuse_max_pages_limit' may be a better name?  The original constant
name wasn't great, but it was fine in its context.  I think having it as an
interface we should name it something less silly.

I'm not married to this thought, what do the rest of you think?  Whatever name
we settle on is fine, you can add

Reviewed-by: Josef Bacik <josef@toxicpanda.com>

once we settle on the right name for this.  Thanks,

Josef
Joanne Koong July 1, 2024, 4:48 p.m. UTC | #2
On Fri, Jun 28, 2024 at 11:03 AM Josef Bacik <josef@toxicpanda.com> wrote:
>
> On Thu, Jun 27, 2024 at 05:13:55PM -0700, Joanne Koong wrote:
> > Introduce the capability to dynamically configure the FUSE_MAX_MAX_PAGES
> > limit through a sysctl. This enhancement allows system administrators to
> > adjust the value based on system-specific requirements.
> >
> > This removes the previous static limit of 256 max pages, which limits
> > the max write size of a request to 1 MiB (on 4096 pagesize systems).
> > Having the ability to up the max write size beyond 1 MiB allows for the
> > perf improvements detailed in this thread [1].
> >
> > $ sysctl -a | grep fuse_max_max_pages
> > fs.fuse.fuse_max_max_pages = 256
> >
> > $ sysctl -n fs.fuse.fuse_max_max_pages
> > 256
> >
> > $ echo 1024 | sudo tee /proc/sys/fs/fuse/fuse_max_max_pages
> > 1024
> >
> > $ sysctl -n fs.fuse.fuse_max_max_pages
> > 1024
> >
> > [1] https://lore.kernel.org/linux-fsdevel/20240124070512.52207-1-jefflexu@linux.alibaba.com/T/#u
> >
> > Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
>
> Overall the change is great, and I see why you named it the way you did, but I
> think may be 'fuse_max_pages_limit' may be a better name?  The original constant
> name wasn't great, but it was fine in its context.  I think having it as an
> interface we should name it something less silly.

'fuse_max_pages_limit' sounds great to me. I'll submit v2 with this rename.

>
> I'm not married to this thought, what do the rest of you think?  Whatever name
> we settle on is fine, you can add
>
> Reviewed-by: Josef Bacik <josef@toxicpanda.com>
>
> once we settle on the right name for this.  Thanks,
>
> Josef
Sweet Tea Dorminy July 1, 2024, 7:15 p.m. UTC | #3
On 7/1/24 12:48 PM, Joanne Koong wrote:
> On Fri, Jun 28, 2024 at 11:03 AM Josef Bacik <josef@toxicpanda.com> wrote:
>>
>> On Thu, Jun 27, 2024 at 05:13:55PM -0700, Joanne Koong wrote:
>>> Introduce the capability to dynamically configure the FUSE_MAX_MAX_PAGES
>>> limit through a sysctl. This enhancement allows system administrators to
>>> adjust the value based on system-specific requirements.
>>>
>>> This removes the previous static limit of 256 max pages, which limits
>>> the max write size of a request to 1 MiB (on 4096 pagesize systems).
>>> Having the ability to up the max write size beyond 1 MiB allows for the
>>> perf improvements detailed in this thread [1].
>>>
>>> $ sysctl -a | grep fuse_max_max_pages
>>> fs.fuse.fuse_max_max_pages = 256
>>>
>>> $ sysctl -n fs.fuse.fuse_max_max_pages
>>> 256
>>>
>>> $ echo 1024 | sudo tee /proc/sys/fs/fuse/fuse_max_max_pages
>>> 1024
>>>
>>> $ sysctl -n fs.fuse.fuse_max_max_pages
>>> 1024
>>>
>>> [1] https://lore.kernel.org/linux-fsdevel/20240124070512.52207-1-jefflexu@linux.alibaba.com/T/#u
>>>
>>> Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
>>
>> Overall the change is great, and I see why you named it the way you did, but I
>> think may be 'fuse_max_pages_limit' may be a better name?  The original constant
>> name wasn't great, but it was fine in its context.  I think having it as an
>> interface we should name it something less silly.
> 
> 'fuse_max_pages_limit' sounds great to me. I'll submit v2 with this rename.
> 

Whatever-it's-called might make sense to be in bytes, since max_read, 
max_write are the values visible to a userspace fuse server and are in 
bytes.

Additionally, the sysctl should probably disallow values at or above 
1<<16 pages -- fuse_init_out expects that max_pages fits in a u16, and 
there are a couple of places where fc->max_pages << PAGE_SHIFT is 
expected to fit in a u32. fc->max_pages is constrained by max_max_pages 
at present so this is true but would not be true if max_max_pages is 
only constrained by being a uint.
Joanne Koong July 1, 2024, 11:28 p.m. UTC | #4
On Mon, Jul 1, 2024 at 12:15 PM Sweet Tea Dorminy
<sweettea-kernel@dorminy.me> wrote:
>
> On 7/1/24 12:48 PM, Joanne Koong wrote:
> > On Fri, Jun 28, 2024 at 11:03 AM Josef Bacik <josef@toxicpanda.com> wrote:
> >>
> >> On Thu, Jun 27, 2024 at 05:13:55PM -0700, Joanne Koong wrote:
> >>> Introduce the capability to dynamically configure the FUSE_MAX_MAX_PAGES
> >>> limit through a sysctl. This enhancement allows system administrators to
> >>> adjust the value based on system-specific requirements.
> >>>
> >>> This removes the previous static limit of 256 max pages, which limits
> >>> the max write size of a request to 1 MiB (on 4096 pagesize systems).
> >>> Having the ability to up the max write size beyond 1 MiB allows for the
> >>> perf improvements detailed in this thread [1].
> >>>
> >>> $ sysctl -a | grep fuse_max_max_pages
> >>> fs.fuse.fuse_max_max_pages = 256
> >>>
> >>> $ sysctl -n fs.fuse.fuse_max_max_pages
> >>> 256
> >>>
> >>> $ echo 1024 | sudo tee /proc/sys/fs/fuse/fuse_max_max_pages
> >>> 1024
> >>>
> >>> $ sysctl -n fs.fuse.fuse_max_max_pages
> >>> 1024
> >>>
> >>> [1] https://lore.kernel.org/linux-fsdevel/20240124070512.52207-1-jefflexu@linux.alibaba.com/T/#u
> >>>
> >>> Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> >>
> >> Overall the change is great, and I see why you named it the way you did, but I
> >> think may be 'fuse_max_pages_limit' may be a better name?  The original constant
> >> name wasn't great, but it was fine in its context.  I think having it as an
> >> interface we should name it something less silly.
> >
> > 'fuse_max_pages_limit' sounds great to me. I'll submit v2 with this rename.
> >
>
> Whatever-it's-called might make sense to be in bytes, since max_read,
> max_write are the values visible to a userspace fuse server and are in
> bytes.

I like this in theory but in practice, I think this adds more
logic/complexity in the kernel than it's worth, since we will need to
keep track of and enforce non-page-size-aligned values that may be set
as the limit. For example, if the sysctl value gets set to 1048577
bytes (eg 1 MiB + 1 byte) and the user sets "max_write" to 1048578
bytes in the init callback, we need to make sure we cap writes to
exactly 1048577 bytes, which means we'll need to keep track of this in
the kernel at the byte granularity. Additionally, there are some
places where fc->max_pages is used (eg for fuse_readahead) where it
doesn't quite make sense to have this limit be a non-page-size-aligned
number.

My personal vote is to keep it simple with page size granularity, but
I'm happy to talk more about this if you or anyone else feels
strongly.

>
> Additionally, the sysctl should probably disallow values at or above
> 1<<16 pages -- fuse_init_out expects that max_pages fits in a u16, and
> there are a couple of places where fc->max_pages << PAGE_SHIFT is
> expected to fit in a u32. fc->max_pages is constrained by max_max_pages
> at present so this is true but would not be true if max_max_pages is
> only constrained by being a uint.

Good point! I missed that fc->max_pages will be bounded anyways by the
uint16_t "fuse_init_out->max_pages", so setting larger values than u16
won't do anything. I'll make this change in v2.
(For the fc->max_pages << PAGE_SHIFT being expected to fit in a u32, I
only see that for fuse_verify_ioctl_iov() but that function looks like
it should be defining "max" to size_t instead of a u32 and then
casting to size_t? Not relevant to this conversation since I'll be
changing it to u16 but I was just curious)

Thanks for taking a look!
diff mbox series

Patch

diff --git a/Documentation/admin-guide/sysctl/fs.rst b/Documentation/admin-guide/sysctl/fs.rst
index 47499a1742bd..3f96e2f47b01 100644
--- a/Documentation/admin-guide/sysctl/fs.rst
+++ b/Documentation/admin-guide/sysctl/fs.rst
@@ -332,3 +332,13 @@  Each "watch" costs roughly 90 bytes on a 32-bit kernel, and roughly 160 bytes
 on a 64-bit one.
 The current default value for ``max_user_watches`` is 4% of the
 available low memory, divided by the "watch" cost in bytes.
+
+5. /proc/sys/fs/fuse - Configuration options for FUSE filesystems
+=====================================================================
+
+This directory contains the following configuration options for FUSE
+filesystems:
+
+``/proc/sys/fs/fuse/fuse_max_max_pages`` is a read/write file for
+setting/getting the maximum number of pages that can be used for servicing
+requests in FUSE.
diff --git a/fs/fuse/Makefile b/fs/fuse/Makefile
index 6e0228c6d0cb..cd4ef3e08ebf 100644
--- a/fs/fuse/Makefile
+++ b/fs/fuse/Makefile
@@ -7,7 +7,7 @@  obj-$(CONFIG_FUSE_FS) += fuse.o
 obj-$(CONFIG_CUSE) += cuse.o
 obj-$(CONFIG_VIRTIO_FS) += virtiofs.o
 
-fuse-y := dev.o dir.o file.o inode.o control.o xattr.o acl.o readdir.o ioctl.o
+fuse-y := dev.o dir.o file.o inode.o control.o xattr.o acl.o readdir.o ioctl.o sysctl.o
 fuse-y += iomode.o
 fuse-$(CONFIG_FUSE_DAX) += dax.o
 fuse-$(CONFIG_FUSE_PASSTHROUGH) += passthrough.o
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index f23919610313..0c9aaf626341 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -35,9 +35,6 @@ 
 /** Default max number of pages that can be used in a single read request */
 #define FUSE_DEFAULT_MAX_PAGES_PER_REQ 32
 
-/** Maximum of max_pages received in init_out */
-#define FUSE_MAX_MAX_PAGES 256
-
 /** Bias for fi->writectr, meaning new writepages must not be sent */
 #define FUSE_NOWRITE INT_MIN
 
@@ -47,6 +44,9 @@ 
 /** Number of dentries for each connection in the control filesystem */
 #define FUSE_CTL_NUM_DENTRIES 5
 
+/** Maximum of max_pages received in init_out */
+extern unsigned int fuse_max_max_pages;
+
 /** List of active connections */
 extern struct list_head fuse_conn_list;
 
@@ -1472,4 +1472,12 @@  ssize_t fuse_passthrough_splice_write(struct pipe_inode_info *pipe,
 				      size_t len, unsigned int flags);
 ssize_t fuse_passthrough_mmap(struct file *file, struct vm_area_struct *vma);
 
+#ifdef CONFIG_SYSCTL
+extern int fuse_sysctl_register(void);
+extern void fuse_sysctl_unregister(void);
+#else
+#define fuse_sysctl_register()		(0)
+#define fuse_sysctl_unregister()	do { } while (0)
+#endif /* CONFIG_SYSCTL */
+
 #endif /* _FS_FUSE_I_H */
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index 99e44ea7d875..5d29a92389e6 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -35,6 +35,8 @@  DEFINE_MUTEX(fuse_mutex);
 
 static int set_global_limit(const char *val, const struct kernel_param *kp);
 
+unsigned int fuse_max_max_pages = 256;
+
 unsigned max_user_bgreq;
 module_param_call(max_user_bgreq, set_global_limit, param_get_uint,
 		  &max_user_bgreq, 0644);
@@ -932,7 +934,7 @@  void fuse_conn_init(struct fuse_conn *fc, struct fuse_mount *fm,
 	fc->pid_ns = get_pid_ns(task_active_pid_ns(current));
 	fc->user_ns = get_user_ns(user_ns);
 	fc->max_pages = FUSE_DEFAULT_MAX_PAGES_PER_REQ;
-	fc->max_pages_limit = FUSE_MAX_MAX_PAGES;
+	fc->max_pages_limit = fuse_max_max_pages;
 
 	if (IS_ENABLED(CONFIG_FUSE_PASSTHROUGH))
 		fuse_backing_files_init(fc);
@@ -2039,8 +2041,14 @@  static int __init fuse_fs_init(void)
 	if (err)
 		goto out3;
 
+	err = fuse_sysctl_register();
+	if (err)
+		goto out4;
+
 	return 0;
 
+ out4:
+	unregister_filesystem(&fuse_fs_type);
  out3:
 	unregister_fuseblk();
  out2:
@@ -2053,6 +2061,7 @@  static void fuse_fs_cleanup(void)
 {
 	unregister_filesystem(&fuse_fs_type);
 	unregister_fuseblk();
+	fuse_sysctl_unregister();
 
 	/*
 	 * Make sure all delayed rcu free inodes are flushed before we
diff --git a/fs/fuse/ioctl.c b/fs/fuse/ioctl.c
index 572ce8a82ceb..a6c8ee551635 100644
--- a/fs/fuse/ioctl.c
+++ b/fs/fuse/ioctl.c
@@ -10,6 +10,8 @@ 
 #include <linux/fileattr.h>
 #include <linux/fsverity.h>
 
+#define FUSE_VERITY_ENABLE_ARG_MAX_PAGES 256
+
 static ssize_t fuse_send_ioctl(struct fuse_mount *fm, struct fuse_args *args,
 			       struct fuse_ioctl_out *outarg)
 {
@@ -140,7 +142,7 @@  static int fuse_setup_enable_verity(unsigned long arg, struct iovec *iov,
 {
 	struct fsverity_enable_arg enable;
 	struct fsverity_enable_arg __user *uarg = (void __user *)arg;
-	const __u32 max_buffer_len = FUSE_MAX_MAX_PAGES * PAGE_SIZE;
+	const __u32 max_buffer_len = FUSE_VERITY_ENABLE_ARG_MAX_PAGES * PAGE_SIZE;
 
 	if (copy_from_user(&enable, uarg, sizeof(enable)))
 		return -EFAULT;
diff --git a/fs/fuse/sysctl.c b/fs/fuse/sysctl.c
new file mode 100644
index 000000000000..0dbcb9688f73
--- /dev/null
+++ b/fs/fuse/sysctl.c
@@ -0,0 +1,35 @@ 
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * linux/fs/fuse/fuse_sysctl.c
+ *
+ * Sysctl interface to fuse parameters
+ */
+#include <linux/sysctl.h>
+
+#include "fuse_i.h"
+
+static struct ctl_table_header *fuse_table_header;
+
+static struct ctl_table fuse_sysctl_table[] = {
+	{
+		.procname	= "fuse_max_max_pages",
+		.data		= &fuse_max_max_pages,
+		.maxlen		= sizeof(fuse_max_max_pages),
+		.mode		= 0644,
+		.proc_handler	= proc_douintvec,
+	},
+};
+
+int fuse_sysctl_register(void)
+{
+	fuse_table_header = register_sysctl("fs/fuse", fuse_sysctl_table);
+	if (!fuse_table_header)
+		return -ENOMEM;
+	return 0;
+}
+
+void fuse_sysctl_unregister(void)
+{
+	unregister_sysctl_table(fuse_table_header);
+	fuse_table_header = NULL;
+}