[RFC,v4,00/36] Fuse-BPF and plans on merging with Fuse Passthrough

Message ID	20240329015351.624249-1-drosen@google.com (mailing list archive)
Headers	show Received: from mail-yb1-f201.google.com (mail-yb1-f201.google.com [209.85.219.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id DF7D8A927 for <bpf@vger.kernel.org>; Fri, 29 Mar 2024 01:54:00 +0000 (UTC) Date: Thu, 28 Mar 2024 18:53:15 -0700 Precedence: bulk Mime-Version: 1.0 Message-ID: <20240329015351.624249-1-drosen@google.com> Subject: [RFC PATCH v4 00/36] Fuse-BPF and plans on merging with Fuse Passthrough From: Daniel Rosenberg <drosen@google.com> To: Miklos Szeredi <miklos@szeredi.hu>, bpf@vger.kernel.org, Alexei Starovoitov <ast@kernel.org> Cc: Amir Goldstein <amir73il@gmail.com>, linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-unionfs@vger.kernel.org, Daniel Borkmann <daniel@iogearbox.net>, John Fastabend <john.fastabend@gmail.com>, Andrii Nakryiko <andrii@kernel.org>, Martin KaFai Lau <martin.lau@linux.dev>, Song Liu <song@kernel.org>, Eduard Zingerman <eddyz87@gmail.com>, Yonghong Song <yonghong.song@linux.dev>, KP Singh <kpsingh@kernel.org>, Stanislav Fomichev <sdf@google.com>, Hao Luo <haoluo@google.com>, Jiri Olsa <jolsa@kernel.org>, Shuah Khan <shuah@kernel.org>, Jonathan Corbet <corbet@lwn.net>, Joanne Koong <joannelkoong@gmail.com>, Mykola Lysenko <mykolal@fb.com>, Christian Brauner <brauner@kernel.org>, kernel-team@android.com, Daniel Rosenberg <drosen@google.com> Content-Type: text/plain; charset="UTF-8"
Series	Fuse-BPF and plans on merging with Fuse Passthrough \| expand [RFC,v4,00/36] Fuse-BPF and plans on merging with Fuse Passthrough [RFC,v4,01/36] fuse-bpf: Update fuse side uapi [RFC,v4,02/36] fuse-bpf: Add data structures for fuse-bpf [RFC,v4,03/36] fuse-bpf: Prepare for fuse-bpf patch [RFC,v4,04/36] fuse: Add fuse-bpf, a stacked fs extension for FUSE [RFC,v4,05/36] fuse-bpf: Add ioctl interface for /dev/fuse [RFC,v4,06/36] fuse-bpf: Don't support export_operations [RFC,v4,07/36] fuse-bpf: Add support for access [RFC,v4,08/36] fuse-bpf: Partially add mapping support [RFC,v4,09/36] fuse-bpf: Add lseek support [RFC,v4,10/36] fuse-bpf: Add support for fallocate [RFC,v4,11/36] fuse-bpf: Support file/dir open/close [RFC,v4,12/36] fuse-bpf: Support mknod/unlink/mkdir/rmdir [RFC,v4,13/36] fuse-bpf: Add support for read/write iter [RFC,v4,14/36] fuse-bpf: support readdir [RFC,v4,15/36] fuse-bpf: Add support for sync operations [RFC,v4,16/36] fuse-bpf: Add Rename support [RFC,v4,17/36] fuse-bpf: Add attr support [RFC,v4,18/36] fuse-bpf: Add support for FUSE_COPY_FILE_RANGE [RFC,v4,19/36] fuse-bpf: Add xattr support [RFC,v4,20/36] fuse-bpf: Add symlink/link support [RFC,v4,21/36] fuse-bpf: Add partial flock support [RFC,v4,22/36] fuse-bpf: Add partial ioctl support [RFC,v4,23/36] fuse-bpf: allow mounting with no userspace daemon [RFC,v4,24/36] fuse-bpf: Add fuse-bpf constants [RFC,v4,25/36] bpf: Increase struct_op max members [RFC,v4,26/36] WIP: bpf: Add fuse_ops struct_op programs [RFC,v4,27/36] fuse-bpf: Export Functions [RFC,v4,28/36] fuse: Provide registration functions for fuse-bpf [RFC,v4,29/36] fuse-bpf: Set fuse_ops at mount or lookup time [RFC,v4,30/36] fuse-bpf: Call bpf for pre/post filters [RFC,v4,31/36] fuse-bpf: Add userspace pre/post filters [RFC,v4,32/36] WIP: fuse-bpf: add error_out [RFC,v4,33/36] fuse-bpf: Add default filter op [RFC,v4,34/36] tools: Add FUSE, update bpf includes [RFC,v4,35/36] fuse-bpf: Add selftests [RFC,v4,36/36] fuse: Provide easy way to test fuse struct_op call

Message ID

20240329015351.624249-1-drosen@google.com (mailing list archive)

Headers

Date: Thu, 28 Mar 2024 18:53:15 -0700
Precedence: bulk
Mime-Version: 1.0
Message-ID: <20240329015351.624249-1-drosen@google.com>
Subject: [RFC PATCH v4 00/36] Fuse-BPF and plans on merging with Fuse
 Passthrough
From: Daniel Rosenberg <drosen@google.com>
To: Miklos Szeredi <miklos@szeredi.hu>, bpf@vger.kernel.org,
	Alexei Starovoitov <ast@kernel.org>
Cc: Amir Goldstein <amir73il@gmail.com>, linux-kernel@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, linux-unionfs@vger.kernel.org,
	Daniel Borkmann <daniel@iogearbox.net>,
 John Fastabend <john.fastabend@gmail.com>,
	Andrii Nakryiko <andrii@kernel.org>, Martin KaFai Lau <martin.lau@linux.dev>,
 Song Liu <song@kernel.org>,
	Eduard Zingerman <eddyz87@gmail.com>,
 Yonghong Song <yonghong.song@linux.dev>,
	KP Singh <kpsingh@kernel.org>, Stanislav Fomichev <sdf@google.com>,
 Hao Luo <haoluo@google.com>,
	Jiri Olsa <jolsa@kernel.org>, Shuah Khan <shuah@kernel.org>,
 Jonathan Corbet <corbet@lwn.net>,
	Joanne Koong <joannelkoong@gmail.com>, Mykola Lysenko <mykolal@fb.com>,
	Christian Brauner <brauner@kernel.org>, kernel-team@android.com,
	Daniel Rosenberg <drosen@google.com>
Content-Type: text/plain; charset="UTF-8"

Series

Fuse-BPF and plans on merging with Fuse Passthrough | expand

Message

Daniel Rosenberg March 29, 2024, 1:53 a.m. UTC

I've recently gotten some time to re-focus on fuse-bpf efforts, and
had some questions on how to best integrate with recent changes that
have landed in the last year. I've included a rebased version (ontop
of bpf-next e63985ecd226 ("bpf, riscv64/cfi: Support kCFI + BPF on
riscv64") of the old patchset for reference here.

On the bpf end, I'm struggling a little bit with the interface for
selecting programs. I'd like to be able to pass the map id to fuse,
since that's a value userspace already knows the program by. Would it
be reasonable to either pass that ID down to the registration
function, or otherwise provide a path for a separate module to
translate from a map id to a struct_op program?

On the fuse end, I'm wondering how the interface will extend to
directories. At LSFMMBPF last year, some people brought up concerns
with the interface we had, specifically that it required opens to get
fds, which we'd then use to respond to lookup requests, adding a lot
of extra overhead. I had been planning to switch to a path that the
response would supply instead, likely limited by RESOLVE_BENEATH. That
seems pretty different from Fuse Passthrough's approach. Are there any
current plans on how that interface will extend for a directory
passthrough?

Could someone clarify why passthrough has an extra layer to register
for use as a backing file? Does the ioctl provide some additional
tracking purpose? I recall there being some security issue around
directly responding with the fd. In fuse-bpf, we were handling this by
responding to the fuse request via an ioctl in those cases.

Passthrough also maintains a separate cred instance for each backing
file. I had been planning to have a single one for the userspace
daemon, likely grabbed during the init response. I'm unsure how the
current Passthrough method there should scale to directories.

Now on to my plans. Struct ops programs have more dynamic support now
[1]. I'm hoping to be able to move most of the Fuse BPF related code
to live closer to Fuse, and to have it more neatly encapsulated when
building as a module. I'm not sure if everything that's needed for
that exists, but I need to play with it a bit more to understand what
I'm missing. I'll probably show up at bpf office hours at some point.

Struct ops have proper multi page support now [2], which removes
another patch. I'm still slightly over the struct ops limit, but that
may change with other changes I'm considering.

I'm very excited to see the new generic stacking filesystem support
with backing-file [3]. I imagine in time we'll extend that to have a
backing-inode as well, for the various inode_operations. That will
definitely involve a lot of refactoring of the way fuse-bpf is
currently structured, but it's clearly the right way forward.

I'm glad to see fuse passthrough, which provides a subset of the
fuse-bpf functionality, has landed[4]. I'm planning to rework the
patch set to integrate better with that. First off, I've been
considering splitting up the bpf progam into a dentry, inode, and file
set. That has the added bonus of pushing us back down below the
current struct_op function list limits. I would want to establish some
linkage between the sets, so that you could still just set the bpf
program at a folder level, and have all objects underneath inherit the
correct program. That's not an issue for a version with just file
support, but I'll want to ensure the interface extends naturally. With
the increased module support, I plan to redo all of the bpf program
linking anyways. The existing code was a temporary placeholder while
the method of registering struct ops programs was still in flux.

My plan for the next patch set is to prune down to just the file
operations. That removes a lot of the tricky questions for the moment,
and should shrink down the patch set massively. Along with that, I'll
clean up the struct_op implementation to take more advantage of the
recent bpf additions.

[1] https://lore.kernel.org/r/20240119225005.668602-12-thinker.li@gmail.com
[2] https://lore.kernel.org/all/20240224223418.526631-3-thinker.li@gmail.com/
[3] https://lore.kernel.org/all/20240105-vfs-rw-9b5809292b57@brauner/
[4] https://lore.kernel.org/all/CAJfpegsZoLMfcpBXBPr7wdAnuXfAYUZYyinru3jrOWWEz7DJPQ@mail.gmail.com/


Daniel Rosenberg (36):
  fuse-bpf: Update fuse side uapi
  fuse-bpf: Add data structures for fuse-bpf
  fuse-bpf: Prepare for fuse-bpf patch
  fuse: Add fuse-bpf, a stacked fs extension for FUSE
  fuse-bpf: Add ioctl interface for /dev/fuse
  fuse-bpf: Don't support export_operations
  fuse-bpf: Add support for access
  fuse-bpf: Partially add mapping support
  fuse-bpf: Add lseek support
  fuse-bpf: Add support for fallocate
  fuse-bpf: Support file/dir open/close
  fuse-bpf: Support mknod/unlink/mkdir/rmdir
  fuse-bpf: Add support for read/write iter
  fuse-bpf: support readdir
  fuse-bpf: Add support for sync operations
  fuse-bpf: Add Rename support
  fuse-bpf: Add attr support
  fuse-bpf: Add support for FUSE_COPY_FILE_RANGE
  fuse-bpf: Add xattr support
  fuse-bpf: Add symlink/link support
  fuse-bpf: Add partial flock support
  fuse-bpf: Add partial ioctl support
  fuse-bpf: allow mounting with no userspace daemon
  fuse-bpf: Add fuse-bpf constants
  bpf: Increase struct_op max members
  WIP: bpf: Add fuse_ops struct_op programs
  fuse-bpf: Export Functions
  fuse: Provide registration functions for fuse-bpf
  fuse-bpf: Set fuse_ops at mount or lookup time
  fuse-bpf: Call bpf for pre/post filters
  fuse-bpf: Add userspace pre/post filters
  WIP: fuse-bpf: add error_out
  fuse-bpf: Add default filter op
  tools: Add FUSE, update bpf includes
  fuse-bpf: Add selftests
  fuse: Provide easy way to test fuse struct_op call

 fs/fuse/Kconfig                               |    8 +
 fs/fuse/Makefile                              |    1 +
 fs/fuse/backing.c                             | 4287 +++++++++++++++++
 fs/fuse/bpf_register.c                        |  207 +
 fs/fuse/control.c                             |    2 +-
 fs/fuse/dev.c                                 |   85 +-
 fs/fuse/dir.c                                 |  318 +-
 fs/fuse/file.c                                |  126 +-
 fs/fuse/fuse_i.h                              |  472 +-
 fs/fuse/inode.c                               |  377 +-
 fs/fuse/ioctl.c                               |   11 +-
 fs/fuse/readdir.c                             |    5 +
 fs/fuse/xattr.c                               |   18 +
 include/linux/bpf.h                           |    2 +-
 include/linux/bpf_fuse.h                      |  285 ++
 include/uapi/linux/bpf.h                      |   13 +
 include/uapi/linux/fuse.h                     |   41 +
 kernel/bpf/Makefile                           |    4 +
 kernel/bpf/bpf_fuse.c                         |  716 +++
 kernel/bpf/bpf_struct_ops.c                   |    2 +
 kernel/bpf/btf.c                              |    1 +
 kernel/bpf/verifier.c                         |   10 +-
 tools/include/uapi/linux/bpf.h                |   13 +
 tools/include/uapi/linux/fuse.h               | 1197 +++++
 .../selftests/filesystems/fuse/.gitignore     |    2 +
 .../selftests/filesystems/fuse/Makefile       |  189 +
 .../testing/selftests/filesystems/fuse/OWNERS |    2 +
 .../selftests/filesystems/fuse/bpf_common.h   |   51 +
 .../selftests/filesystems/fuse/bpf_loader.c   |  597 +++
 .../testing/selftests/filesystems/fuse/fd.txt |   21 +
 .../selftests/filesystems/fuse/fd_bpf.bpf.c   |  397 ++
 .../selftests/filesystems/fuse/fuse_daemon.c  |  300 ++
 .../selftests/filesystems/fuse/fuse_test.c    | 2476 ++++++++++
 .../filesystems/fuse/struct_op_test.bpf.c     |  642 +++
 .../selftests/filesystems/fuse/test.bpf.c     | 1045 ++++
 .../filesystems/fuse/test_framework.h         |  172 +
 .../selftests/filesystems/fuse/test_fuse.h    |  494 ++
 37 files changed, 14385 insertions(+), 204 deletions(-)
 create mode 100644 fs/fuse/backing.c
 create mode 100644 fs/fuse/bpf_register.c
 create mode 100644 include/linux/bpf_fuse.h
 create mode 100644 kernel/bpf/bpf_fuse.c
 create mode 100644 tools/include/uapi/linux/fuse.h
 create mode 100644 tools/testing/selftests/filesystems/fuse/.gitignore
 create mode 100644 tools/testing/selftests/filesystems/fuse/Makefile
 create mode 100644 tools/testing/selftests/filesystems/fuse/OWNERS
 create mode 100644 tools/testing/selftests/filesystems/fuse/bpf_common.h
 create mode 100644 tools/testing/selftests/filesystems/fuse/bpf_loader.c
 create mode 100644 tools/testing/selftests/filesystems/fuse/fd.txt
 create mode 100644 tools/testing/selftests/filesystems/fuse/fd_bpf.bpf.c
 create mode 100644 tools/testing/selftests/filesystems/fuse/fuse_daemon.c
 create mode 100644 tools/testing/selftests/filesystems/fuse/fuse_test.c
 create mode 100644 tools/testing/selftests/filesystems/fuse/struct_op_test.bpf.c
 create mode 100644 tools/testing/selftests/filesystems/fuse/test.bpf.c
 create mode 100644 tools/testing/selftests/filesystems/fuse/test_framework.h
 create mode 100644 tools/testing/selftests/filesystems/fuse/test_fuse.h


base-commit: e63985ecd22681c7f5975f2e8637187a326b6791

Comments

Amir Goldstein March 29, 2024, 6:44 a.m. UTC | #1

Hi Daniel,

On Fri, Mar 29, 2024 at 4:54 AM Daniel Rosenberg <drosen@google.com> wrote:
>
> I've recently gotten some time to re-focus on fuse-bpf efforts, and

Glad to hear that this effort is still on track!

> had some questions on how to best integrate with recent changes that
> have landed in the last year. I've included a rebased version (ontop
> of bpf-next e63985ecd226 ("bpf, riscv64/cfi: Support kCFI + BPF on
> riscv64") of the old patchset for reference here.
>
> On the bpf end, I'm struggling a little bit with the interface for
> selecting programs. I'd like to be able to pass the map id to fuse,
> since that's a value userspace already knows the program by. Would it
> be reasonable to either pass that ID down to the registration
> function, or otherwise provide a path for a separate module to
> translate from a map id to a struct_op program?
>
> On the fuse end, I'm wondering how the interface will extend to
> directories. At LSFMMBPF last year, some people brought up concerns
> with the interface we had, specifically that it required opens to get
> fds, which we'd then use to respond to lookup requests, adding a lot
> of extra overhead. I had been planning to switch to a path that the
> response would supply instead, likely limited by RESOLVE_BENEATH. That
> seems pretty different from Fuse Passthrough's approach. Are there any
> current plans on how that interface will extend for a directory
> passthrough?
>

My plan was to start from passthrough ioctl with O_PATH fd on lookup
and deal with performance improvements later when there are actual
workloads that report a problem and that depends where the overhead is.

Is it with the opening of O_PATH fds?
Is it with the passthtough ioctls?
If latter, then when fuse uring is merged, I think we could get loose
the ioctls anyway.

> Could someone clarify why passthrough has an extra layer to register
> for use as a backing file? Does the ioctl provide some additional
> tracking purpose? I recall there being some security issue around
> directly responding with the fd. In fuse-bpf, we were handling this by
> responding to the fuse request via an ioctl in those cases.
>

The original reason was to mitigate an attack vector of fooling a
privileged process into writing the fd (number) to /dev/fuse to
gain access to a backing file this way.

The fuse-bpf way of doing all responds with ioctls seems fine for
this purpose, but note that the explicit setup also provides feedback
to the server in case the passthrough cannot be accomplished
for a specific inode (e.g. because of stacking depths overflow)
and that is a big benefit IMO.

> Passthrough also maintains a separate cred instance for each backing
> file. I had been planning to have a single one for the userspace
> daemon, likely grabbed during the init response. I'm unsure how the
> current Passthrough method there should scale to directories.
>

Using a global cred should be fine, just as overlayfs does.
The specific inode passthrough setup could mention if the global
cred should be used.

However, note that overlayfs needs to handle some special cases
when using mounter creds (e.g.: ovl_create_or_link() and dropping
of CAP_SYS_RESOURCE).

If you are going to mimic all this, better have that in the stacking fs
common code.

> Now on to my plans. Struct ops programs have more dynamic support now
> [1]. I'm hoping to be able to move most of the Fuse BPF related code
> to live closer to Fuse, and to have it more neatly encapsulated when
> building as a module. I'm not sure if everything that's needed for
> that exists, but I need to play with it a bit more to understand what
> I'm missing. I'll probably show up at bpf office hours at some point.
>
> Struct ops have proper multi page support now [2], which removes
> another patch. I'm still slightly over the struct ops limit, but that
> may change with other changes I'm considering.
>
> I'm very excited to see the new generic stacking filesystem support
> with backing-file [3]. I imagine in time we'll extend that to have a
> backing-inode as well, for the various inode_operations. That will

Yes, that was the plan.
I had initially planned to start an fsstack library, but I found out that
this "library" already exists (fs/stack.c), but it has code that is not a
good abstraction for ovl/fuse and ecryptfs will not be changed to
improve the abstraction, so we first need to burn this lib and start over:
https://github.com/amir73il/linux/commits/fsstack
Then, we can move backing_file.c into fs/stack/.

> definitely involve a lot of refactoring of the way fuse-bpf is
> currently structured, but it's clearly the right way forward.
>
> I'm glad to see fuse passthrough, which provides a subset of the
> fuse-bpf functionality, has landed[4]. I'm planning to rework the
> patch set to integrate better with that. First off, I've been
> considering splitting up the bpf progam into a dentry, inode, and file

That sounds like a good plan, but also, please remember Miklos' request -
please split the patch sets for review to:
1. FUSE-passthrough-all-mode
2. Attach BPF program

We FUSE developers must be able to review the FUSE/passthough changes
without any BPF code at all (which we have little understanding thereof)

As a merge strategy, I think we need to aim for merging all the FUSE
passthrough infrastructure needed for passthrough of inode operations
strictly before merging any FUSE-BPF specific code.

In parallel you may get BPF infrastructure merged, but integrating FUSE+BPF,
should be done only after all infrastructure is already merged IMO.

> set. That has the added bonus of pushing us back down below the
> current struct_op function list limits. I would want to establish some
> linkage between the sets, so that you could still just set the bpf
> program at a folder level, and have all objects underneath inherit the
> correct program. That's not an issue for a version with just file
> support, but I'll want to ensure the interface extends naturally. With
> the increased module support, I plan to redo all of the bpf program
> linking anyways. The existing code was a temporary placeholder while
> the method of registering struct ops programs was still in flux.
>

So I don't think there is any point in anyone actually reviewing the
v4 patch set that you just posted?

> My plan for the next patch set is to prune down to just the file

Please explain what you mean by that.
How are fuse-bpf file operations expected to be used and specifically,
How are they expected to extend the current FUSE passthrough functionality?

Do you mean that an passthrough setup will include a reference to a bpf
program that will be used to decide per read/write/splice operation
whether it should be passed through to backing file or sent to server
direct_io style?

I just wanted to make sure that you are aware of the fact that direct io
to server is the only mode of io that is allowed on an inode with an attached
backing file.

Thanks,
Amir.

> operations. That removes a lot of the tricky questions for the moment,
> and should shrink down the patch set massively. Along with that, I'll
> clean up the struct_op implementation to take more advantage of the
> recent bpf additions.
>
> [1] https://lore.kernel.org/r/20240119225005.668602-12-thinker.li@gmail.com
> [2] https://lore.kernel.org/all/20240224223418.526631-3-thinker.li@gmail.com/
> [3] https://lore.kernel.org/all/20240105-vfs-rw-9b5809292b57@brauner/
> [4] https://lore.kernel.org/all/CAJfpegsZoLMfcpBXBPr7wdAnuXfAYUZYyinru3jrOWWEz7DJPQ@mail.gmail.com/
>
>
> Daniel Rosenberg (36):
>   fuse-bpf: Update fuse side uapi
>   fuse-bpf: Add data structures for fuse-bpf
>   fuse-bpf: Prepare for fuse-bpf patch
>   fuse: Add fuse-bpf, a stacked fs extension for FUSE
>   fuse-bpf: Add ioctl interface for /dev/fuse
>   fuse-bpf: Don't support export_operations
>   fuse-bpf: Add support for access
>   fuse-bpf: Partially add mapping support
>   fuse-bpf: Add lseek support
>   fuse-bpf: Add support for fallocate
>   fuse-bpf: Support file/dir open/close
>   fuse-bpf: Support mknod/unlink/mkdir/rmdir
>   fuse-bpf: Add support for read/write iter
>   fuse-bpf: support readdir
>   fuse-bpf: Add support for sync operations
>   fuse-bpf: Add Rename support
>   fuse-bpf: Add attr support
>   fuse-bpf: Add support for FUSE_COPY_FILE_RANGE
>   fuse-bpf: Add xattr support
>   fuse-bpf: Add symlink/link support
>   fuse-bpf: Add partial flock support
>   fuse-bpf: Add partial ioctl support
>   fuse-bpf: allow mounting with no userspace daemon
>   fuse-bpf: Add fuse-bpf constants
>   bpf: Increase struct_op max members
>   WIP: bpf: Add fuse_ops struct_op programs
>   fuse-bpf: Export Functions
>   fuse: Provide registration functions for fuse-bpf
>   fuse-bpf: Set fuse_ops at mount or lookup time
>   fuse-bpf: Call bpf for pre/post filters
>   fuse-bpf: Add userspace pre/post filters
>   WIP: fuse-bpf: add error_out
>   fuse-bpf: Add default filter op
>   tools: Add FUSE, update bpf includes
>   fuse-bpf: Add selftests
>   fuse: Provide easy way to test fuse struct_op call
>
>  fs/fuse/Kconfig                               |    8 +
>  fs/fuse/Makefile                              |    1 +
>  fs/fuse/backing.c                             | 4287 +++++++++++++++++
>  fs/fuse/bpf_register.c                        |  207 +
>  fs/fuse/control.c                             |    2 +-
>  fs/fuse/dev.c                                 |   85 +-
>  fs/fuse/dir.c                                 |  318 +-
>  fs/fuse/file.c                                |  126 +-
>  fs/fuse/fuse_i.h                              |  472 +-
>  fs/fuse/inode.c                               |  377 +-
>  fs/fuse/ioctl.c                               |   11 +-
>  fs/fuse/readdir.c                             |    5 +
>  fs/fuse/xattr.c                               |   18 +
>  include/linux/bpf.h                           |    2 +-
>  include/linux/bpf_fuse.h                      |  285 ++
>  include/uapi/linux/bpf.h                      |   13 +
>  include/uapi/linux/fuse.h                     |   41 +
>  kernel/bpf/Makefile                           |    4 +
>  kernel/bpf/bpf_fuse.c                         |  716 +++
>  kernel/bpf/bpf_struct_ops.c                   |    2 +
>  kernel/bpf/btf.c                              |    1 +
>  kernel/bpf/verifier.c                         |   10 +-
>  tools/include/uapi/linux/bpf.h                |   13 +
>  tools/include/uapi/linux/fuse.h               | 1197 +++++
>  .../selftests/filesystems/fuse/.gitignore     |    2 +
>  .../selftests/filesystems/fuse/Makefile       |  189 +
>  .../testing/selftests/filesystems/fuse/OWNERS |    2 +
>  .../selftests/filesystems/fuse/bpf_common.h   |   51 +
>  .../selftests/filesystems/fuse/bpf_loader.c   |  597 +++
>  .../testing/selftests/filesystems/fuse/fd.txt |   21 +
>  .../selftests/filesystems/fuse/fd_bpf.bpf.c   |  397 ++
>  .../selftests/filesystems/fuse/fuse_daemon.c  |  300 ++
>  .../selftests/filesystems/fuse/fuse_test.c    | 2476 ++++++++++
>  .../filesystems/fuse/struct_op_test.bpf.c     |  642 +++
>  .../selftests/filesystems/fuse/test.bpf.c     | 1045 ++++
>  .../filesystems/fuse/test_framework.h         |  172 +
>  .../selftests/filesystems/fuse/test_fuse.h    |  494 ++
>  37 files changed, 14385 insertions(+), 204 deletions(-)
>  create mode 100644 fs/fuse/backing.c
>  create mode 100644 fs/fuse/bpf_register.c
>  create mode 100644 include/linux/bpf_fuse.h
>  create mode 100644 kernel/bpf/bpf_fuse.c
>  create mode 100644 tools/include/uapi/linux/fuse.h
>  create mode 100644 tools/testing/selftests/filesystems/fuse/.gitignore
>  create mode 100644 tools/testing/selftests/filesystems/fuse/Makefile
>  create mode 100644 tools/testing/selftests/filesystems/fuse/OWNERS
>  create mode 100644 tools/testing/selftests/filesystems/fuse/bpf_common.h
>  create mode 100644 tools/testing/selftests/filesystems/fuse/bpf_loader.c
>  create mode 100644 tools/testing/selftests/filesystems/fuse/fd.txt
>  create mode 100644 tools/testing/selftests/filesystems/fuse/fd_bpf.bpf.c
>  create mode 100644 tools/testing/selftests/filesystems/fuse/fuse_daemon.c
>  create mode 100644 tools/testing/selftests/filesystems/fuse/fuse_test.c
>  create mode 100644 tools/testing/selftests/filesystems/fuse/struct_op_test.bpf.c
>  create mode 100644 tools/testing/selftests/filesystems/fuse/test.bpf.c
>  create mode 100644 tools/testing/selftests/filesystems/fuse/test_framework.h
>  create mode 100644 tools/testing/selftests/filesystems/fuse/test_fuse.h
>
>
> base-commit: e63985ecd22681c7f5975f2e8637187a326b6791
> --
> 2.44.0.478.gd926399ef9-goog
>

Daniel Rosenberg March 30, 2024, 12:59 a.m. UTC | #2

On Thu, Mar 28, 2024 at 11:45 PM Amir Goldstein <amir73il@gmail.com> wrote:
>
> My plan was to start from passthrough ioctl with O_PATH fd on lookup
> and deal with performance improvements later when there are actual
> workloads that report a problem and that depends where the overhead is.
>
> Is it with the opening of O_PATH fds?
> Is it with the passthtough ioctls?
> If latter, then when fuse uring is merged, I think we could get loose
> the ioctls anyway.
>

I'm not terribly sure. Ideally I would have cc'ed them on this email,
but I didn't take down contact info with my notes. I was under the
impression that it was triggering all of the opens before an actual
open was needed, for example, during ls -l, but I don't know if that
was with O_PATH fds. I was a bit concerned that a performance fix
there might end up needing a different interface, and managing
multiple of those could get pretty cluttered. But I agree that it
doesn't make sense to do anything there without a concrete use-case
and issue.

>
> The original reason was to mitigate an attack vector of fooling a
> privileged process into writing the fd (number) to /dev/fuse to
> gain access to a backing file this way.
>
> The fuse-bpf way of doing all responds with ioctls seems fine for
> this purpose, but note that the explicit setup also provides feedback
> to the server in case the passthrough cannot be accomplished
> for a specific inode (e.g. because of stacking depths overflow)
> and that is a big benefit IMO.
>

That certainly informs the daemon of the error earlier. So long as we
can still run the complete passthrough mode serverless that's fine by
me. I've found that mode helpful for running filesystem tests on pure
backing mode, plus I imagine some simple Fuse filesystems could get
away with only the bpf programs.

>
> Using a global cred should be fine, just as overlayfs does.
> The specific inode passthrough setup could mention if the global
> cred should be used.
>
> However, note that overlayfs needs to handle some special cases
> when using mounter creds (e.g.: ovl_create_or_link() and dropping
> of CAP_SYS_RESOURCE).
>
> If you are going to mimic all this, better have that in the stacking fs
> common code.
>

Sure. The less duplicate code the better :)

>
> That sounds like a good plan, but also, please remember Miklos' request -
> please split the patch sets for review to:
> 1. FUSE-passthrough-all-mode
> 2. Attach BPF program
>
> We FUSE developers must be able to review the FUSE/passthough changes
> without any BPF code at all (which we have little understanding thereof)
>
> As a merge strategy, I think we need to aim for merging all the FUSE
> passthrough infrastructure needed for passthrough of inode operations
> strictly before merging any FUSE-BPF specific code.
>
> In parallel you may get BPF infrastructure merged, but integrating FUSE+BPF,
> should be done only after all infrastructure is already merged IMO.
>

Ok. I'll probably mess around with the module stuff at least, in order
to work out if everything I need is present on the bpf side. Do you
know if anyone is actively working on extending the file-backing work
to something like inode-backing? I don't want to duplicate work there,
but I'd be happy to start looking at it. Otherwise I'd focus on the
bpf end for now. I expect we'll want to be able to optionally set the
bpf program at the same place where we set the backing file/inode.
Hence the spit into a file and inode program set. I'm still thinking
over what the best way to address the programs is...

>
> So I don't think there is any point in anyone actually reviewing the
> v4 patch set that you just posted?
>

Correct. The only reason I included it was as a reference for the sort
of stuff fuse-bpf is currently doing.

>
> Please explain what you mean by that.
> How are fuse-bpf file operations expected to be used and specifically,
> How are they expected to extend the current FUSE passthrough functionality?
>
> Do you mean that an passthrough setup will include a reference to a bpf
> program that will be used to decide per read/write/splice operation
> whether it should be passed through to backing file or sent to server
> direct_io style?
>

So in the current fuse-bpf setup, the bpf program does two things. It
can edit certain parameters, and it can indicate what the next action
should be. That action could be queuing up the post filter after the
backing operation, deferring to a userspace pre/post filter, or going
back to normal fuse operations.
The latter one isn't currently very well fleshed out. Unless you do
some specific tracking, under existing fuse-bpf you'd have a node id
of 0, and userspace can't make much out of that. With that aside,
there's all sorts of caching nightmares to deal with there.

We're only using the parameter changing currently in our use cases. I
wouldn't be opposed to leaving the falling back to fuse for specific
operations out of v1 of the bpf enhancements, especially if we have
the userspace pre/post filters available.
So you'd optionally specify a bpf program to use with the backing
file. That would allow you to manipulate some data in the files like
you might in Fuse itself. For instance, data redaction. You could null
out location metadata in images, provided a map or something with the
offsets that should be nulled. You could also prepend some data at the
beginning of a file by adjusting offsets and attrs and whatnot. I
could imagine having multiple backing files, and the bpf program
splitting a read into multiple parts to handle parts of it using
different backing files, although that's not in the current design.

>
> I just wanted to make sure that you are aware of the fact that direct io
> to server is the only mode of io that is allowed on an inode with an attached
> backing file.
>
> Thanks,
> Amir.
>

Can you not read/write without interacting with the server? Or do you
mean FOPEN_DIRECT_IO sends some file ops to the server even in
passthrough mode? At the moment I'm tempted to follow the same
mechanics passthrough is using. The only exception would be possibly
tossing back to the server, which I mentioned above. That'd only
happen for, say, read, if we're not under FOPEN_DIRECT_IO. I've not
looked too closely at FOPEN_DIRECT_IO. In Fuse bpf we currently have
bpf mode taking priority. Are there any email threads I should look at
for more background there?

-Daniel

Amir Goldstein April 1, 2024, 2:43 p.m. UTC | #3

> >
> > That sounds like a good plan, but also, please remember Miklos' request -
> > please split the patch sets for review to:
> > 1. FUSE-passthrough-all-mode
> > 2. Attach BPF program
> >
> > We FUSE developers must be able to review the FUSE/passthough changes
> > without any BPF code at all (which we have little understanding thereof)
> >
> > As a merge strategy, I think we need to aim for merging all the FUSE
> > passthrough infrastructure needed for passthrough of inode operations
> > strictly before merging any FUSE-BPF specific code.
> >
> > In parallel you may get BPF infrastructure merged, but integrating FUSE+BPF,
> > should be done only after all infrastructure is already merged IMO.
> >
>
> Ok. I'll probably mess around with the module stuff at least, in order
> to work out if everything I need is present on the bpf side. Do you
> know if anyone is actively working on extending the file-backing work
> to something like inode-backing? I don't want to duplicate work there,

I am actively *thinking* about working on passthrough for getattr/getxattr.
As soon as I come up with something concrete I will let you know.

> but I'd be happy to start looking at it. Otherwise I'd focus on the
> bpf end for now. I expect we'll want to be able to optionally set the
> bpf program at the same place where we set the backing file/inode.
> Hence the spit into a file and inode program set. I'm still thinking
> over what the best way to address the programs is...
>

My thoughts were doing something similar to FOPEN_PASSTHROUGH
but in response to LOOKUP request and in that case the fuse inode
will enter passthrough mode early and will not leave passthrough mode
until inode is evicted.

> > Please explain what you mean by that.
> > How are fuse-bpf file operations expected to be used and specifically,
> > How are they expected to extend the current FUSE passthrough functionality?
> >
> > Do you mean that an passthrough setup will include a reference to a bpf
> > program that will be used to decide per read/write/splice operation
> > whether it should be passed through to backing file or sent to server
> > direct_io style?
> >
>
> So in the current fuse-bpf setup, the bpf program does two things. It
> can edit certain parameters, and it can indicate what the next action
> should be. That action could be queuing up the post filter after the
> backing operation, deferring to a userspace pre/post filter, or going
> back to normal fuse operations.
> The latter one isn't currently very well fleshed out. Unless you do
> some specific tracking, under existing fuse-bpf you'd have a node id
> of 0, and userspace can't make much out of that. With that aside,

node id 0 sounds weird.
I was wondering if and how a passthrough lookup operation would work.
The only thing I can think of is that in this setup, fuse must use the backing
file st_ino as the fuse node id, so that the kernel can instantiate a fuse inode
before the server knows about it.

> there's all sorts of caching nightmares to deal with there.
>

Yeh...

> We're only using the parameter changing currently in our use cases. I
> wouldn't be opposed to leaving the falling back to fuse for specific
> operations out of v1 of the bpf enhancements, especially if we have
> the userspace pre/post filters available.
> So you'd optionally specify a bpf program to use with the backing
> file. That would allow you to manipulate some data in the files like
> you might in Fuse itself. For instance, data redaction. You could null
> out location metadata in images, provided a map or something with the
> offsets that should be nulled. You could also prepend some data at the
> beginning of a file by adjusting offsets and attrs and whatnot. I
> could imagine having multiple backing files, and the bpf program
> splitting a read into multiple parts to handle parts of it using
> different backing files, although that's not in the current design.
>

Lots of plans ;)

>
> Can you not read/write without interacting with the server? Or do you
> mean FOPEN_DIRECT_IO sends some file ops to the server even in
> passthrough mode?

FOPEN_DIRECT_IO sends write() and read() to the server even in
passthrough mode.

> At the moment I'm tempted to follow the same
> mechanics passthrough is using. The only exception would be possibly
> tossing back to the server, which I mentioned above. That'd only
> happen for, say, read, if we're not under FOPEN_DIRECT_IO. I've not
> looked too closely at FOPEN_DIRECT_IO. In Fuse bpf we currently have
> bpf mode taking priority. Are there any email threads I should look at
> for more background there?

Maybe this patch set:
https://lore.kernel.org/linux-fsdevel/20240208170603.2078871-1-amir73il@gmail.com/

Bernd and I worked on it together as a prerequisite to fuse passthrough.
Benrd has some followup direct_io re-factoring patches.

Thanks,
Amir.

Bernd Schubert April 1, 2024, 8:27 p.m. UTC | #4

On 4/1/24 16:43, Amir Goldstein wrote:
> 
>>
>> Can you not read/write without interacting with the server? Or do you
>> mean FOPEN_DIRECT_IO sends some file ops to the server even in
>> passthrough mode?
> 
> FOPEN_DIRECT_IO sends write() and read() to the server even in
> passthrough mode.
> 
>> At the moment I'm tempted to follow the same
>> mechanics passthrough is using. The only exception would be possibly
>> tossing back to the server, which I mentioned above. That'd only
>> happen for, say, read, if we're not under FOPEN_DIRECT_IO. I've not
>> looked too closely at FOPEN_DIRECT_IO. In Fuse bpf we currently have
>> bpf mode taking priority. Are there any email threads I should look at
>> for more background there?
> 
> Maybe this patch set:
> https://lore.kernel.org/linux-fsdevel/20240208170603.2078871-1-amir73il@gmail.com/
> 
> Bernd and I worked on it together as a prerequisite to fuse passthrough.
> Benrd has some followup direct_io re-factoring patches.


They are in this branch
https://github.com/bsbernd/linux/commits/fuse-dio-v5/

Going to rebase it to 6.9 this week and will send it out then. I had
been rather occupied with with ddn internal work and didn't have the
time since February...


Thanks,
Bernd