diff mbox series

[v13,05/10] fuse: Handle asynchronous read and write in passthrough

Message ID 20230519125705.598234-6-amir73il@gmail.com (mailing list archive)
State New, archived
Headers show
Series fuse: Add support for passthrough read/write | expand

Commit Message

Amir Goldstein May 19, 2023, 12:57 p.m. UTC
Extend the passthrough feature by handling asynchronous IO both for read
and write operations.

When an AIO request is received, if the request targets a FUSE file with
the passthrough functionality enabled, a new identical AIO request is
created.  The new request targets the backing file and gets assigned
a special FUSE passthrough AIO completion callback.

When the backing file AIO request is completed, the FUSE
passthrough AIO completion callback is executed and propagates the
completion signal to the FUSE AIO request by triggering its completion
callback as well.

Signed-off-by: Alessio Balsini <balsini@android.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/fuse/passthrough.c | 82 +++++++++++++++++++++++++++++++++++++++----
 1 file changed, 75 insertions(+), 7 deletions(-)

Comments

Miklos Szeredi May 22, 2023, 3:20 p.m. UTC | #1
On Fri, 19 May 2023 at 14:57, Amir Goldstein <amir73il@gmail.com> wrote:
>
> Extend the passthrough feature by handling asynchronous IO both for read
> and write operations.
>
> When an AIO request is received, if the request targets a FUSE file with
> the passthrough functionality enabled, a new identical AIO request is
> created.  The new request targets the backing file and gets assigned
> a special FUSE passthrough AIO completion callback.
>
> When the backing file AIO request is completed, the FUSE
> passthrough AIO completion callback is executed and propagates the
> completion signal to the FUSE AIO request by triggering its completion
> callback as well.

Overlayfs added refcounting to the async req (commit 9a2544037600
("ovl: fix use after free in struct ovl_aio_req")).  Is that not
needed for fuse as well?

Would it make sense to try and merge the two implementations?

Thanks,
Miklos
Amir Goldstein May 24, 2023, 10:03 a.m. UTC | #2
On Mon, May 22, 2023 at 6:20 PM Miklos Szeredi <miklos@szeredi.hu> wrote:
>
> On Fri, 19 May 2023 at 14:57, Amir Goldstein <amir73il@gmail.com> wrote:
> >
> > Extend the passthrough feature by handling asynchronous IO both for read
> > and write operations.
> >
> > When an AIO request is received, if the request targets a FUSE file with
> > the passthrough functionality enabled, a new identical AIO request is
> > created.  The new request targets the backing file and gets assigned
> > a special FUSE passthrough AIO completion callback.
> >
> > When the backing file AIO request is completed, the FUSE
> > passthrough AIO completion callback is executed and propagates the
> > completion signal to the FUSE AIO request by triggering its completion
> > callback as well.
>
> Overlayfs added refcounting to the async req (commit 9a2544037600
> ("ovl: fix use after free in struct ovl_aio_req")).  Is that not
> needed for fuse as well?
>
> Would it make sense to try and merge the two implementations?
>

Makes sense - I will look into it.

Thanks,
Amir.
Amir Goldstein Aug. 21, 2023, 3:27 p.m. UTC | #3
On Wed, May 24, 2023 at 1:03 PM Amir Goldstein <amir73il@gmail.com> wrote:
>
> On Mon, May 22, 2023 at 6:20 PM Miklos Szeredi <miklos@szeredi.hu> wrote:
> >
> > On Fri, 19 May 2023 at 14:57, Amir Goldstein <amir73il@gmail.com> wrote:
> > >
> > > Extend the passthrough feature by handling asynchronous IO both for read
> > > and write operations.
> > >
> > > When an AIO request is received, if the request targets a FUSE file with
> > > the passthrough functionality enabled, a new identical AIO request is
> > > created.  The new request targets the backing file and gets assigned
> > > a special FUSE passthrough AIO completion callback.
> > >
> > > When the backing file AIO request is completed, the FUSE
> > > passthrough AIO completion callback is executed and propagates the
> > > completion signal to the FUSE AIO request by triggering its completion
> > > callback as well.
> >
> > Overlayfs added refcounting to the async req (commit 9a2544037600
> > ("ovl: fix use after free in struct ovl_aio_req")).  Is that not
> > needed for fuse as well?
> >
> > Would it make sense to try and merge the two implementations?
> >
>
> Makes sense - I will look into it.


Hi Miklos,

Getting back to this.
Did you mean something like that? (only compile tested)

https://github.com/amir73il/linux/commits/backing_fs

If yes, then I wonder:
1. Is the difference between FUSE_IOCB_MASK and OVL_IOCB_MASK
    (i.e. the APPEND flag) intentional?
2. What would be the right way to do ovl_copyattr() on io completion?
    Pass another completion handler to read/write helpers?
    This seems a bit ugly. Do you have a nicer idea?

Thanks,
Amir.
Amir Goldstein Aug. 22, 2023, 10:18 a.m. UTC | #4
On Mon, Aug 21, 2023 at 6:27 PM Amir Goldstein <amir73il@gmail.com> wrote:
>
> On Wed, May 24, 2023 at 1:03 PM Amir Goldstein <amir73il@gmail.com> wrote:
> >
> > On Mon, May 22, 2023 at 6:20 PM Miklos Szeredi <miklos@szeredi.hu> wrote:
> > >
> > > On Fri, 19 May 2023 at 14:57, Amir Goldstein <amir73il@gmail.com> wrote:
> > > >
> > > > Extend the passthrough feature by handling asynchronous IO both for read
> > > > and write operations.
> > > >
> > > > When an AIO request is received, if the request targets a FUSE file with
> > > > the passthrough functionality enabled, a new identical AIO request is
> > > > created.  The new request targets the backing file and gets assigned
> > > > a special FUSE passthrough AIO completion callback.
> > > >
> > > > When the backing file AIO request is completed, the FUSE
> > > > passthrough AIO completion callback is executed and propagates the
> > > > completion signal to the FUSE AIO request by triggering its completion
> > > > callback as well.
> > >
> > > Overlayfs added refcounting to the async req (commit 9a2544037600
> > > ("ovl: fix use after free in struct ovl_aio_req")).  Is that not
> > > needed for fuse as well?
> > >
> > > Would it make sense to try and merge the two implementations?
> > >
> >
> > Makes sense - I will look into it.
>
>
> Hi Miklos,
>
> Getting back to this.
> Did you mean something like that? (only compile tested)
>
> https://github.com/amir73il/linux/commits/backing_fs
>
> If yes, then I wonder:
> 1. Is the difference between FUSE_IOCB_MASK and OVL_IOCB_MASK
>     (i.e. the APPEND flag) intentional?
> 2. What would be the right way to do ovl_copyattr() on io completion?
>     Pass another completion handler to read/write helpers?
>     This seems a bit ugly. Do you have a nicer idea?
>

Hmm. Looking closer, ovl_copyattr() in ovl_aio_cleanup_handler()
seems a bit racy as it is not done under inode_lock().

I wonder if it is enough to fix that by adding the lock or if we need
to resort to a more complicated scheme like FUSE_I_SIZE_UNSTABLE
for overlayfs aio?

Thanks,
Amir.
Miklos Szeredi Aug. 22, 2023, 11:03 a.m. UTC | #5
On Tue, 22 Aug 2023 at 12:18, Amir Goldstein <amir73il@gmail.com> wrote:
>
> On Mon, Aug 21, 2023 at 6:27 PM Amir Goldstein <amir73il@gmail.com> wrote:

> > Getting back to this.
> > Did you mean something like that? (only compile tested)
> >
> > https://github.com/amir73il/linux/commits/backing_fs
> >
> > If yes, then I wonder:
> > 1. Is the difference between FUSE_IOCB_MASK and OVL_IOCB_MASK
> >     (i.e. the APPEND flag) intentional?

Setting IOCB_APPEND on the backing file doesn't make a difference as
long as the backing file is not modified during the write.

In overlayfs the case of the backing file being modified is not
defined, so I guess that's the reason to omit it.  However I don't see
a problem with setting it on the backing file either, the file
size/position is synchronized after the write, so nothing bad should
happen if the backing file was modified.

> > 2. What would be the right way to do ovl_copyattr() on io completion?
> >     Pass another completion handler to read/write helpers?
> >     This seems a bit ugly. Do you have a nicer idea?
> >

Ugh, I missed that little detail.   I don't have a better idea than to
use a callback function.

>
> Hmm. Looking closer, ovl_copyattr() in ovl_aio_cleanup_handler()
> seems a bit racy as it is not done under inode_lock().
>
> I wonder if it is enough to fix that by adding the lock or if we need
> to resort to a more complicated scheme like FUSE_I_SIZE_UNSTABLE
> for overlayfs aio?

Quite recently rename didn't take inode lock on source, so
ovl_aio_cleanup_handler() wasn't the only unlocked instance.

I don't see a strong reason to always lock the inode before
ovl_copyattr(), but I could be wrong.

Thanks,
Miklos
Amir Goldstein Aug. 22, 2023, 1:22 p.m. UTC | #6
On Tue, Aug 22, 2023 at 2:03 PM Miklos Szeredi <miklos@szeredi.hu> wrote:
>
> On Tue, 22 Aug 2023 at 12:18, Amir Goldstein <amir73il@gmail.com> wrote:
> >
> > On Mon, Aug 21, 2023 at 6:27 PM Amir Goldstein <amir73il@gmail.com> wrote:
>
> > > Getting back to this.
> > > Did you mean something like that? (only compile tested)
> > >
> > > https://github.com/amir73il/linux/commits/backing_fs
> > >
> > > If yes, then I wonder:
> > > 1. Is the difference between FUSE_IOCB_MASK and OVL_IOCB_MASK
> > >     (i.e. the APPEND flag) intentional?
>
> Setting IOCB_APPEND on the backing file doesn't make a difference as
> long as the backing file is not modified during the write.
>
> In overlayfs the case of the backing file being modified is not
> defined, so I guess that's the reason to omit it.  However I don't see
> a problem with setting it on the backing file either, the file
> size/position is synchronized after the write, so nothing bad should
> happen if the backing file was modified.
>
> > > 2. What would be the right way to do ovl_copyattr() on io completion?
> > >     Pass another completion handler to read/write helpers?
> > >     This seems a bit ugly. Do you have a nicer idea?
> > >
>
> Ugh, I missed that little detail.   I don't have a better idea than to
> use a callback function.
>
> >
> > Hmm. Looking closer, ovl_copyattr() in ovl_aio_cleanup_handler()
> > seems a bit racy as it is not done under inode_lock().
> >
> > I wonder if it is enough to fix that by adding the lock or if we need
> > to resort to a more complicated scheme like FUSE_I_SIZE_UNSTABLE
> > for overlayfs aio?
>
> Quite recently rename didn't take inode lock on source, so
> ovl_aio_cleanup_handler() wasn't the only unlocked instance.
>
> I don't see a strong reason to always lock the inode before
> ovl_copyattr(), but I could be wrong.
>

IDK, ovl_copyattr() looks like a textbook example of a race
if not protected by something because it reads a bunch of stuff
from realinode and then writes a bunch of stuff to inode.

Anyway, I guess it wouldn't hurt to wrap it with inode_lock()
in the ovl completion callback.

Thanks,
Amir.
Miklos Szeredi Aug. 22, 2023, 2:06 p.m. UTC | #7
On Tue, 22 Aug 2023 at 15:22, Amir Goldstein <amir73il@gmail.com> wrote:

> IDK, ovl_copyattr() looks like a textbook example of a race
> if not protected by something because it reads a bunch of stuff
> from realinode and then writes a bunch of stuff to inode.

Yeah, you are right.

> Anyway, I guess it wouldn't hurt to wrap it with inode_lock()
> in the ovl completion callback.

Okay.

Thanks,
Miklos
Amir Goldstein Aug. 24, 2023, 12:11 p.m. UTC | #8
On Tue, Aug 22, 2023 at 2:03 PM Miklos Szeredi <miklos@szeredi.hu> wrote:
>
> On Tue, 22 Aug 2023 at 12:18, Amir Goldstein <amir73il@gmail.com> wrote:
> >
> > On Mon, Aug 21, 2023 at 6:27 PM Amir Goldstein <amir73il@gmail.com> wrote:
>
> > > Getting back to this.
> > > Did you mean something like that? (only compile tested)
> > >
> > > https://github.com/amir73il/linux/commits/backing_fs
> > >
> > > If yes, then I wonder:
> > > 1. Is the difference between FUSE_IOCB_MASK and OVL_IOCB_MASK
> > >     (i.e. the APPEND flag) intentional?
>
> Setting IOCB_APPEND on the backing file doesn't make a difference as
> long as the backing file is not modified during the write.
>
> In overlayfs the case of the backing file being modified is not
> defined, so I guess that's the reason to omit it.  However I don't see
> a problem with setting it on the backing file either, the file
> size/position is synchronized after the write, so nothing bad should
> happen if the backing file was modified.
>

That raises the question if FUSE passthrough behavior is defined when
backing file is being modified. Should it be any different than ovl?
I don't really care if we set IOCB_APPEND or not, just if we need
a different mask for ovl and FUSE.

> > > 2. What would be the right way to do ovl_copyattr() on io completion?
> > >     Pass another completion handler to read/write helpers?
> > >     This seems a bit ugly. Do you have a nicer idea?
> > >
>
> Ugh, I missed that little detail.   I don't have a better idea than to
> use a callback function.
>

Ok. added the cleanup callback.
I think it's not that bad?

https://github.com/amir73il/linux/commits/backing_fs

> >
> > Hmm. Looking closer, ovl_copyattr() in ovl_aio_cleanup_handler()
> > seems a bit racy as it is not done under inode_lock().
> >

Decided to fix that by taking ovl inode spinlock inside ovl_copyattr()
and did the same for ovl_file_accessed() for read ops.

I've found and fixed two other issues with aio completion on this branch,
one of them is a fix for a possible realfile refcount leak, so will probably
need to backport this one.

Thanks,
Amir.
Miklos Szeredi Aug. 29, 2023, 12:42 p.m. UTC | #9
On Thu, 24 Aug 2023 at 14:12, Amir Goldstein <amir73il@gmail.com> wrote:
>
> On Tue, Aug 22, 2023 at 2:03 PM Miklos Szeredi <miklos@szeredi.hu> wrote:
> >
> > On Tue, 22 Aug 2023 at 12:18, Amir Goldstein <amir73il@gmail.com> wrote:
> > >
> > > On Mon, Aug 21, 2023 at 6:27 PM Amir Goldstein <amir73il@gmail.com> wrote:
> >
> > > > Getting back to this.
> > > > Did you mean something like that? (only compile tested)
> > > >
> > > > https://github.com/amir73il/linux/commits/backing_fs
> > > >
> > > > If yes, then I wonder:
> > > > 1. Is the difference between FUSE_IOCB_MASK and OVL_IOCB_MASK
> > > >     (i.e. the APPEND flag) intentional?
> >
> > Setting IOCB_APPEND on the backing file doesn't make a difference as
> > long as the backing file is not modified during the write.
> >
> > In overlayfs the case of the backing file being modified is not
> > defined, so I guess that's the reason to omit it.  However I don't see
> > a problem with setting it on the backing file either, the file
> > size/position is synchronized after the write, so nothing bad should
> > happen if the backing file was modified.
> >
>
> That raises the question if FUSE passthrough behavior is defined when
> backing file is being modified. Should it be any different than ovl?
> I don't really care if we set IOCB_APPEND or not, just if we need
> a different mask for ovl and FUSE.

Dunno.

I feel that instead of making the behavior defined when file is
modified behind fuse's back, there should be some mechanism between
the server and the client (userspace and kernel) that allows proper
handling of "remote" modification (oplocks/leases/younameit).

I thought about this quite a bit, and even have some patches for the
read-only lease case.  But this is far from trivial.

In the meantime just setting IOCB_APPEND is quite harmless, so I think
we should do it for both overlayfs and fuse for consistency.

>
> > > > 2. What would be the right way to do ovl_copyattr() on io completion?
> > > >     Pass another completion handler to read/write helpers?
> > > >     This seems a bit ugly. Do you have a nicer idea?
> > > >
> >
> > Ugh, I missed that little detail.   I don't have a better idea than to
> > use a callback function.
> >
>
> Ok. added the cleanup callback.
> I think it's not that bad?
>
> https://github.com/amir73il/linux/commits/backing_fs

Looks good.


>
> > >
> > > Hmm. Looking closer, ovl_copyattr() in ovl_aio_cleanup_handler()
> > > seems a bit racy as it is not done under inode_lock().
> > >
>
> Decided to fix that by taking ovl inode spinlock inside ovl_copyattr()
> and did the same for ovl_file_accessed() for read ops.
>
> I've found and fixed two other issues with aio completion on this branch,
> one of them is a fix for a possible realfile refcount leak, so will probably
> need to backport this one.

Great.  Thanks.

Miklos
diff mbox series

Patch

diff --git a/fs/fuse/passthrough.c b/fs/fuse/passthrough.c
index 9d81f3982c96..2ccd2d6de736 100644
--- a/fs/fuse/passthrough.c
+++ b/fs/fuse/passthrough.c
@@ -9,6 +9,36 @@ 
 #define FUSE_IOCB_MASK                                                  \
 	(IOCB_APPEND | IOCB_DSYNC | IOCB_HIPRI | IOCB_NOWAIT | IOCB_SYNC)
 
+struct fuse_aio_req {
+	struct kiocb iocb;
+	struct kiocb *iocb_fuse;
+};
+
+static void fuse_aio_cleanup_handler(struct fuse_aio_req *aio_req)
+{
+	struct kiocb *iocb = &aio_req->iocb;
+	struct kiocb *iocb_fuse = aio_req->iocb_fuse;
+
+	if (iocb->ki_flags & IOCB_WRITE) {
+		__sb_writers_acquired(file_inode(iocb->ki_filp)->i_sb,
+				      SB_FREEZE_WRITE);
+		file_end_write(iocb->ki_filp);
+	}
+
+	iocb_fuse->ki_pos = iocb->ki_pos;
+	kfree(aio_req);
+}
+
+static void fuse_aio_rw_complete(struct kiocb *iocb, long res)
+{
+	struct fuse_aio_req *aio_req =
+		container_of(iocb, struct fuse_aio_req, iocb);
+	struct kiocb *iocb_fuse = aio_req->iocb_fuse;
+
+	fuse_aio_cleanup_handler(aio_req);
+	iocb_fuse->ki_complete(iocb_fuse, res);
+}
+
 ssize_t fuse_passthrough_read_iter(struct kiocb *iocb_fuse,
 				   struct iov_iter *iter)
 {
@@ -21,8 +51,24 @@  ssize_t fuse_passthrough_read_iter(struct kiocb *iocb_fuse,
 	if (!iov_iter_count(iter))
 		return 0;
 
-	rwf = iocb_to_rw_flags(iocb_fuse->ki_flags, FUSE_IOCB_MASK);
-	ret = vfs_iter_read(passthrough_filp, iter, &iocb_fuse->ki_pos, rwf);
+	if (is_sync_kiocb(iocb_fuse)) {
+		rwf = iocb_to_rw_flags(iocb_fuse->ki_flags, FUSE_IOCB_MASK);
+		ret = vfs_iter_read(passthrough_filp, iter, &iocb_fuse->ki_pos,
+				    rwf);
+	} else {
+		struct fuse_aio_req *aio_req;
+
+		aio_req = kmalloc(sizeof(struct fuse_aio_req), GFP_KERNEL);
+		if (!aio_req)
+			return -ENOMEM;
+
+		aio_req->iocb_fuse = iocb_fuse;
+		kiocb_clone(&aio_req->iocb, iocb_fuse, passthrough_filp);
+		aio_req->iocb.ki_complete = fuse_aio_rw_complete;
+		ret = call_read_iter(passthrough_filp, &aio_req->iocb, iter);
+		if (ret != -EIOCBQUEUED)
+			fuse_aio_cleanup_handler(aio_req);
+	}
 
 	return ret;
 }
@@ -34,6 +80,7 @@  ssize_t fuse_passthrough_write_iter(struct kiocb *iocb_fuse,
 	struct fuse_file *ff = fuse_filp->private_data;
 	struct inode *fuse_inode = file_inode(fuse_filp);
 	struct file *passthrough_filp = ff->passthrough->filp;
+	struct inode *passthrough_inode = file_inode(passthrough_filp);
 	ssize_t ret;
 	rwf_t rwf;
 
@@ -42,11 +89,32 @@  ssize_t fuse_passthrough_write_iter(struct kiocb *iocb_fuse,
 
 	inode_lock(fuse_inode);
 
-	file_start_write(passthrough_filp);
-	rwf = iocb_to_rw_flags(iocb_fuse->ki_flags, FUSE_IOCB_MASK);
-	ret = vfs_iter_write(passthrough_filp, iter, &iocb_fuse->ki_pos, rwf);
-	file_end_write(passthrough_filp);
-
+	if (is_sync_kiocb(iocb_fuse)) {
+		file_start_write(passthrough_filp);
+		rwf = iocb_to_rw_flags(iocb_fuse->ki_flags, FUSE_IOCB_MASK);
+		ret = vfs_iter_write(passthrough_filp, iter, &iocb_fuse->ki_pos,
+				     rwf);
+		file_end_write(passthrough_filp);
+	} else {
+		struct fuse_aio_req *aio_req;
+
+		aio_req = kmalloc(sizeof(struct fuse_aio_req), GFP_KERNEL);
+		if (!aio_req) {
+			ret = -ENOMEM;
+			goto out;
+		}
+
+		file_start_write(passthrough_filp);
+		__sb_writers_release(passthrough_inode->i_sb, SB_FREEZE_WRITE);
+
+		aio_req->iocb_fuse = iocb_fuse;
+		kiocb_clone(&aio_req->iocb, iocb_fuse, passthrough_filp);
+		aio_req->iocb.ki_complete = fuse_aio_rw_complete;
+		ret = call_write_iter(passthrough_filp, &aio_req->iocb, iter);
+		if (ret != -EIOCBQUEUED)
+			fuse_aio_cleanup_handler(aio_req);
+	}
+out:
 	inode_unlock(fuse_inode);
 
 	return ret;