diff mbox series

fuse: add FOPEN_FETCH_ATTR flag for fetching attributes after open

Message ID 20240813212149.1909627-1-joannelkoong@gmail.com (mailing list archive)
State New
Headers show
Series fuse: add FOPEN_FETCH_ATTR flag for fetching attributes after open | expand

Commit Message

Joanne Koong Aug. 13, 2024, 9:21 p.m. UTC
Add FOPEN_FETCH_ATTR flag to indicate that attributes should be
fetched from the server after an open.

For fuse servers that are backed by network filesystems, this is
needed to ensure that file attributes are up to date between
consecutive open calls.

For example, if there is a file that is opened on two fuse mounts,
in the following scenario:

on mount A, open file.txt w/ O_APPEND, write "hi", close file
on mount B, open file.txt w/ O_APPEND, write "world", close file
on mount A, open file.txt w/ O_APPEND, write "123", close file

when the file is reopened on mount A, the file inode contains the old
size and the last append will overwrite the data that was written when
the file was opened/written on mount B.

(This corruption can be reproduced on the example libfuse passthrough_hp
server with writeback caching disabled and nopassthrough)

Having this flag as an option enables parity with NFS's close-to-open
consistency.

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
---
 fs/fuse/file.c            | 7 ++++++-
 include/uapi/linux/fuse.h | 7 ++++++-
 2 files changed, 12 insertions(+), 2 deletions(-)

Comments

Joanne Koong Aug. 13, 2024, 9:37 p.m. UTC | #1
On Tue, Aug 13, 2024 at 2:22 PM Joanne Koong <joannelkoong@gmail.com> wrote:
>
> Add FOPEN_FETCH_ATTR flag to indicate that attributes should be
> fetched from the server after an open.
>
> For fuse servers that are backed by network filesystems, this is
> needed to ensure that file attributes are up to date between
> consecutive open calls.
>
> For example, if there is a file that is opened on two fuse mounts,
> in the following scenario:
>
> on mount A, open file.txt w/ O_APPEND, write "hi", close file
> on mount B, open file.txt w/ O_APPEND, write "world", close file
> on mount A, open file.txt w/ O_APPEND, write "123", close file
>
> when the file is reopened on mount A, the file inode contains the old
> size and the last append will overwrite the data that was written when
> the file was opened/written on mount B.
>
> (This corruption can be reproduced on the example libfuse passthrough_hp
> server with writeback caching disabled and nopassthrough)
>
> Having this flag as an option enables parity with NFS's close-to-open
> consistency.
>

This is the corresponding libfuse change:
https://github.com/libfuse/libfuse/pull/1012/commits/a59f8e0f5ac9e09c0a54c5cd94e6bca7b635f57d

This is for the same overwrite/corruption issue Sweet Tea brought up
earlier this year in March in this thread:
https://lore.kernel.org/linux-fsdevel/9d71a4fd1f1d8d4cfc28480f01e5fe3dc5a7e3f0.1709821568.git.sweettea-kernel@dorminy.me/#t

In that thread, there was mention that Bernd's atomic open patchset
may solve the issue. However, I don't think this works for cases where
open is called without needing a fuse lookup.

An alternative approach we considered is similar to Sweet Tea's
approach, but gated behind an optional config flag that calls
fuse_update_attributes() for statx_size at the beginning of every
write and removes the FUSE_STATX_MODSIZE invalidation that every write
does (in fuse_write_update_attr). The latter is needed or else every
fuse_update_attributes() call in the beginning of the write will
trigger a FUSE_GETATTR request to the server.
Bernd Schubert Aug. 13, 2024, 9:44 p.m. UTC | #2
On 8/13/24 23:21, Joanne Koong wrote:
> Add FOPEN_FETCH_ATTR flag to indicate that attributes should be
> fetched from the server after an open.
> 
> For fuse servers that are backed by network filesystems, this is
> needed to ensure that file attributes are up to date between
> consecutive open calls.
> 
> For example, if there is a file that is opened on two fuse mounts,
> in the following scenario:
> 
> on mount A, open file.txt w/ O_APPEND, write "hi", close file
> on mount B, open file.txt w/ O_APPEND, write "world", close file
> on mount A, open file.txt w/ O_APPEND, write "123", close file
> 
> when the file is reopened on mount A, the file inode contains the old
> size and the last append will overwrite the data that was written when
> the file was opened/written on mount B.
> 
> (This corruption can be reproduced on the example libfuse passthrough_hp
> server with writeback caching disabled and nopassthrough)
> 
> Having this flag as an option enables parity with NFS's close-to-open
> consistency.
> 
> Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> ---
>  fs/fuse/file.c            | 7 ++++++-
>  include/uapi/linux/fuse.h | 7 ++++++-
>  2 files changed, 12 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/fuse/file.c b/fs/fuse/file.c
> index f39456c65ed7..437487ce413d 100644
> --- a/fs/fuse/file.c
> +++ b/fs/fuse/file.c
> @@ -264,7 +264,12 @@ static int fuse_open(struct inode *inode, struct file *file)
>  	err = fuse_do_open(fm, get_node_id(inode), file, false);
>  	if (!err) {
>  		ff = file->private_data;
> -		err = fuse_finish_open(inode, file);
> +		if (ff->open_flags & FOPEN_FETCH_ATTR) {
> +			fuse_invalidate_attr(inode);
> +			err = fuse_update_attributes(inode, file, STATX_BASIC_STATS);
> +		}
> +		if (!err)
> +			err = fuse_finish_open(inode, file);
>  		if (err)
>  			fuse_sync_release(fi, ff, file->f_flags);
>  		else if (is_truncate)

I didn't come to it yet, but I actually wanted to update Dharmendras/my
atomic open patches - giving up all the vfs changes (for now) and then
always use atomic open if available, for FUSE_OPEN and FUSE_CREATE. And
then update attributes through that.
Would that be an alternative for you? Would basically require to add an
atomic_open method into your file system.

Definitely more complex than your solution, but avoids a another
kernel/userspace transition.

Thanks,
Bernd
Joanne Koong Aug. 13, 2024, 9:57 p.m. UTC | #3
On Tue, Aug 13, 2024 at 2:44 PM Bernd Schubert
<bernd.schubert@fastmail.fm> wrote:
>
> On 8/13/24 23:21, Joanne Koong wrote:
> > Add FOPEN_FETCH_ATTR flag to indicate that attributes should be
> > fetched from the server after an open.
> >
> > For fuse servers that are backed by network filesystems, this is
> > needed to ensure that file attributes are up to date between
> > consecutive open calls.
> >
> > For example, if there is a file that is opened on two fuse mounts,
> > in the following scenario:
> >
> > on mount A, open file.txt w/ O_APPEND, write "hi", close file
> > on mount B, open file.txt w/ O_APPEND, write "world", close file
> > on mount A, open file.txt w/ O_APPEND, write "123", close file
> >
> > when the file is reopened on mount A, the file inode contains the old
> > size and the last append will overwrite the data that was written when
> > the file was opened/written on mount B.
> >
> > (This corruption can be reproduced on the example libfuse passthrough_hp
> > server with writeback caching disabled and nopassthrough)
> >
> > Having this flag as an option enables parity with NFS's close-to-open
> > consistency.
> >
> > Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> > ---
> >  fs/fuse/file.c            | 7 ++++++-
> >  include/uapi/linux/fuse.h | 7 ++++++-
> >  2 files changed, 12 insertions(+), 2 deletions(-)
> >
> > diff --git a/fs/fuse/file.c b/fs/fuse/file.c
> > index f39456c65ed7..437487ce413d 100644
> > --- a/fs/fuse/file.c
> > +++ b/fs/fuse/file.c
> > @@ -264,7 +264,12 @@ static int fuse_open(struct inode *inode, struct file *file)
> >       err = fuse_do_open(fm, get_node_id(inode), file, false);
> >       if (!err) {
> >               ff = file->private_data;
> > -             err = fuse_finish_open(inode, file);
> > +             if (ff->open_flags & FOPEN_FETCH_ATTR) {
> > +                     fuse_invalidate_attr(inode);
> > +                     err = fuse_update_attributes(inode, file, STATX_BASIC_STATS);
> > +             }
> > +             if (!err)
> > +                     err = fuse_finish_open(inode, file);
> >               if (err)
> >                       fuse_sync_release(fi, ff, file->f_flags);
> >               else if (is_truncate)
>
> I didn't come to it yet, but I actually wanted to update Dharmendras/my
> atomic open patches - giving up all the vfs changes (for now) and then
> always use atomic open if available, for FUSE_OPEN and FUSE_CREATE. And
> then update attributes through that.
> Would that be an alternative for you? Would basically require to add an
> atomic_open method into your file system.
>
> Definitely more complex than your solution, but avoids a another
> kernel/userspace transition.

Hi Bernd,

Unfortunately I don't think this is an alternative for my use case. I
haven't looked closely at the implementation details of your atomic
open patchset yet but if I'm understanding the gist of it correctly,
it bundles the lookup with the open into 1 request, where the
attributes can be passed from server -> kernel through the reply to
that request. I think in the case I'm working on, the file open call
does not require a lookup so it can't take advantage of your feature.
I just tested it on libfuse on the passthrough_hp server (with no
writeback caching and nopassthrough) on the example in the commit
message and I'm not seeing any lookup request being sent for that last
open call (for writing "123").

Thanks,
Joanne

>
> Thanks,
> Bernd
Bernd Schubert Aug. 13, 2024, 10:41 p.m. UTC | #4
On August 13, 2024 11:57:44 PM GMT+02:00, Joanne Koong <joannelkoong@gmail.com> wrote:
>On Tue, Aug 13, 2024 at 2:44 PM Bernd Schubert
><bernd.schubert@fastmail.fm> wrote:
>>
>> On 8/13/24 23:21, Joanne Koong wrote:
>> > Add FOPEN_FETCH_ATTR flag to indicate that attributes should be
>> > fetched from the server after an open.
>> >
>> > For fuse servers that are backed by network filesystems, this is
>> > needed to ensure that file attributes are up to date between
>> > consecutive open calls.
>> >
>> > For example, if there is a file that is opened on two fuse mounts,
>> > in the following scenario:
>> >
>> > on mount A, open file.txt w/ O_APPEND, write "hi", close file
>> > on mount B, open file.txt w/ O_APPEND, write "world", close file
>> > on mount A, open file.txt w/ O_APPEND, write "123", close file
>> >
>> > when the file is reopened on mount A, the file inode contains the old
>> > size and the last append will overwrite the data that was written when
>> > the file was opened/written on mount B.
>> >
>> > (This corruption can be reproduced on the example libfuse passthrough_hp
>> > server with writeback caching disabled and nopassthrough)
>> >
>> > Having this flag as an option enables parity with NFS's close-to-open
>> > consistency.
>> >
>> > Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
>> > ---
>> >  fs/fuse/file.c            | 7 ++++++-
>> >  include/uapi/linux/fuse.h | 7 ++++++-
>> >  2 files changed, 12 insertions(+), 2 deletions(-)
>> >
>> > diff --git a/fs/fuse/file.c b/fs/fuse/file.c
>> > index f39456c65ed7..437487ce413d 100644
>> > --- a/fs/fuse/file.c
>> > +++ b/fs/fuse/file.c
>> > @@ -264,7 +264,12 @@ static int fuse_open(struct inode *inode, struct file *file)
>> >       err = fuse_do_open(fm, get_node_id(inode), file, false);
>> >       if (!err) {
>> >               ff = file->private_data;
>> > -             err = fuse_finish_open(inode, file);
>> > +             if (ff->open_flags & FOPEN_FETCH_ATTR) {
>> > +                     fuse_invalidate_attr(inode);
>> > +                     err = fuse_update_attributes(inode, file, STATX_BASIC_STATS);
>> > +             }
>> > +             if (!err)
>> > +                     err = fuse_finish_open(inode, file);
>> >               if (err)
>> >                       fuse_sync_release(fi, ff, file->f_flags);
>> >               else if (is_truncate)
>>
>> I didn't come to it yet, but I actually wanted to update Dharmendras/my
>> atomic open patches - giving up all the vfs changes (for now) and then
>> always use atomic open if available, for FUSE_OPEN and FUSE_CREATE. And
>> then update attributes through that.
>> Would that be an alternative for you? Would basically require to add an
>> atomic_open method into your file system.
>>
>> Definitely more complex than your solution, but avoids a another
>> kernel/userspace transition.
>
>Hi Bernd,
>
>Unfortunately I don't think this is an alternative for my use case. I
>haven't looked closely at the implementation details of your atomic
>open patchset yet but if I'm understanding the gist of it correctly,
>it bundles the lookup with the open into 1 request, where the
>attributes can be passed from server -> kernel through the reply to
>that request. I think in the case I'm working on, the file open call
>does not require a lookup so it can't take advantage of your feature.
>I just tested it on libfuse on the passthrough_hp server (with no
>writeback caching and nopassthrough) on the example in the commit
>message and I'm not seeing any lookup request being sent for that last
>open call (for writing "123").
>


Hi Joanne,

gets late here and I'm typing on my phone.  I hope formatting is ok.

what I meant is that we use the atomic open op code for both, lookup-open and plain open - i.e. we always update attributes on open. Past atomic open patches did not do that yet, but I later realized that always using atomic open op 

- avoids the data corruption you run into
- probably no need for atomic-revalidate-open vfs patches anymore  as we can now safely set a high attr timeout


Kind of the same as your patch, just through a new op code.

Thanks, 
Bernd
Jingbo Xu Aug. 14, 2024, 2:46 a.m. UTC | #5
On 8/14/24 5:21 AM, Joanne Koong wrote:
> Add FOPEN_FETCH_ATTR flag to indicate that attributes should be
> fetched from the server after an open.
> 
> For fuse servers that are backed by network filesystems, this is
> needed to ensure that file attributes are up to date between
> consecutive open calls.
> 
> For example, if there is a file that is opened on two fuse mounts,
> in the following scenario:
> 
> on mount A, open file.txt w/ O_APPEND, write "hi", close file
> on mount B, open file.txt w/ O_APPEND, write "world", close file
> on mount A, open file.txt w/ O_APPEND, write "123", close file
> 
> when the file is reopened on mount A, the file inode contains the old
> size and the last append will overwrite the data that was written when
> the file was opened/written on mount B.
> 
> (This corruption can be reproduced on the example libfuse passthrough_hp
> server with writeback caching disabled and nopassthrough)
> 
> Having this flag as an option enables parity with NFS's close-to-open
> consistency.

It seems a general demand for close-to-open consistency similar to NFS
when the backend store for FUSE is a NFS-like filesystem.  We have a
similar private implementation for close-to-open consistency in our
internal distribution.  Also FYI there was a similar proposal for this:

https://lore.kernel.org/linux-fsdevel/20220608104202.19461-1-zhangjiachen.jaycee@bytedance.com/

> 
> Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> ---
>  fs/fuse/file.c            | 7 ++++++-
>  include/uapi/linux/fuse.h | 7 ++++++-
>  2 files changed, 12 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/fuse/file.c b/fs/fuse/file.c
> index f39456c65ed7..437487ce413d 100644
> --- a/fs/fuse/file.c
> +++ b/fs/fuse/file.c
> @@ -264,7 +264,12 @@ static int fuse_open(struct inode *inode, struct file *file)
>  	err = fuse_do_open(fm, get_node_id(inode), file, false);
>  	if (!err) {
>  		ff = file->private_data;
> -		err = fuse_finish_open(inode, file);
> +		if (ff->open_flags & FOPEN_FETCH_ATTR) {
> +			fuse_invalidate_attr(inode);
> +			err = fuse_update_attributes(inode, file, STATX_BASIC_STATS);
> +		}
> +		if (!err)
> +			err = fuse_finish_open(inode, file);
>  		if (err)
>  			fuse_sync_release(fi, ff, file->f_flags);
>  		else if (is_truncate)
> diff --git a/include/uapi/linux/fuse.h b/include/uapi/linux/fuse.h
> index d08b99d60f6f..f5d1af6fe352 100644
> --- a/include/uapi/linux/fuse.h
> +++ b/include/uapi/linux/fuse.h
> @@ -217,6 +217,9 @@
>   *  - add backing_id to fuse_open_out, add FOPEN_PASSTHROUGH open flag
>   *  - add FUSE_NO_EXPORT_SUPPORT init flag
>   *  - add FUSE_NOTIFY_RESEND, add FUSE_HAS_RESEND init flag
> + *
> + *  7.41
> + *  - add FOPEN_FETCH_ATTR
>   */
>  
>  #ifndef _LINUX_FUSE_H
> @@ -252,7 +255,7 @@
>  #define FUSE_KERNEL_VERSION 7
>  
>  /** Minor version number of this interface */
> -#define FUSE_KERNEL_MINOR_VERSION 40
> +#define FUSE_KERNEL_MINOR_VERSION 41
>  
>  /** The node ID of the root inode */
>  #define FUSE_ROOT_ID 1
> @@ -360,6 +363,7 @@ struct fuse_file_lock {
>   * FOPEN_NOFLUSH: don't flush data cache on close (unless FUSE_WRITEBACK_CACHE)
>   * FOPEN_PARALLEL_DIRECT_WRITES: Allow concurrent direct writes on the same inode
>   * FOPEN_PASSTHROUGH: passthrough read/write io for this open file
> + * FOPEN_FETCH_ATTR: attributes are fetched after file is opened
>   */
>  #define FOPEN_DIRECT_IO		(1 << 0)
>  #define FOPEN_KEEP_CACHE	(1 << 1)
> @@ -369,6 +373,7 @@ struct fuse_file_lock {
>  #define FOPEN_NOFLUSH		(1 << 5)
>  #define FOPEN_PARALLEL_DIRECT_WRITES	(1 << 6)
>  #define FOPEN_PASSTHROUGH	(1 << 7)
> +#define FOPEN_FETCH_ATTR	(1 << 8)
>  
>  /**
>   * INIT request/reply flags

Does this close-to-open consistency support writeback mode? AFAIK, the
cached ctime/mtime/size at the kernel side are always trusted while
these attributes from the server are dropped, see:

```
fuse_update_attributes
    fuse_update_get_attr
        cache_mask = fuse_get_cache_mask(inode)
            if writeback mode:
                return STATX_MTIME | STATX_CTIME | STATX_SIZE
```

Also FYI there's a similar proposal for enhancing the close-to-open
consistency in writeback mode to fix the above issue:

https://lore.kernel.org/linux-fsdevel/20220624055825.29183-1-zhangjiachen.jaycee@bytedance.com/

Besides, IIUC this patch only implements the revalidate-on-open semantic
for metadata.  To fulfill the full close-to-open consistency, do you
need to disable FOPEN_KEEP_CACHE to fulfill the revalidate-on-open
semantic for data? (Though the revalidate-on-open semantic for data is
not needed in your append-only case.)
Joanne Koong Aug. 14, 2024, 5:18 p.m. UTC | #6
On Tue, Aug 13, 2024 at 3:41 PM Bernd Schubert
<bernd.schubert@fastmail.fm> wrote:
>
> On August 13, 2024 11:57:44 PM GMT+02:00, Joanne Koong <joannelkoong@gmail.com> wrote:
> >On Tue, Aug 13, 2024 at 2:44 PM Bernd Schubert
> ><bernd.schubert@fastmail.fm> wrote:
> >>
> >> On 8/13/24 23:21, Joanne Koong wrote:
> >> > Add FOPEN_FETCH_ATTR flag to indicate that attributes should be
> >> > fetched from the server after an open.
> >> >
> >> > For fuse servers that are backed by network filesystems, this is
> >> > needed to ensure that file attributes are up to date between
> >> > consecutive open calls.
> >> >
> >> > For example, if there is a file that is opened on two fuse mounts,
> >> > in the following scenario:
> >> >
> >> > on mount A, open file.txt w/ O_APPEND, write "hi", close file
> >> > on mount B, open file.txt w/ O_APPEND, write "world", close file
> >> > on mount A, open file.txt w/ O_APPEND, write "123", close file
> >> >
> >> > when the file is reopened on mount A, the file inode contains the old
> >> > size and the last append will overwrite the data that was written when
> >> > the file was opened/written on mount B.
> >> >
> >> > (This corruption can be reproduced on the example libfuse passthrough_hp
> >> > server with writeback caching disabled and nopassthrough)
> >> >
> >> > Having this flag as an option enables parity with NFS's close-to-open
> >> > consistency.
> >> >
> >> > Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> >> > ---
> >> >  fs/fuse/file.c            | 7 ++++++-
> >> >  include/uapi/linux/fuse.h | 7 ++++++-
> >> >  2 files changed, 12 insertions(+), 2 deletions(-)
> >> >
> >> > diff --git a/fs/fuse/file.c b/fs/fuse/file.c
> >> > index f39456c65ed7..437487ce413d 100644
> >> > --- a/fs/fuse/file.c
> >> > +++ b/fs/fuse/file.c
> >> > @@ -264,7 +264,12 @@ static int fuse_open(struct inode *inode, struct file *file)
> >> >       err = fuse_do_open(fm, get_node_id(inode), file, false);
> >> >       if (!err) {
> >> >               ff = file->private_data;
> >> > -             err = fuse_finish_open(inode, file);
> >> > +             if (ff->open_flags & FOPEN_FETCH_ATTR) {
> >> > +                     fuse_invalidate_attr(inode);
> >> > +                     err = fuse_update_attributes(inode, file, STATX_BASIC_STATS);
> >> > +             }
> >> > +             if (!err)
> >> > +                     err = fuse_finish_open(inode, file);
> >> >               if (err)
> >> >                       fuse_sync_release(fi, ff, file->f_flags);
> >> >               else if (is_truncate)
> >>
> >> I didn't come to it yet, but I actually wanted to update Dharmendras/my
> >> atomic open patches - giving up all the vfs changes (for now) and then
> >> always use atomic open if available, for FUSE_OPEN and FUSE_CREATE. And
> >> then update attributes through that.
> >> Would that be an alternative for you? Would basically require to add an
> >> atomic_open method into your file system.
> >>
> >> Definitely more complex than your solution, but avoids a another
> >> kernel/userspace transition.
> >
> >Hi Bernd,
> >
> >Unfortunately I don't think this is an alternative for my use case. I
> >haven't looked closely at the implementation details of your atomic
> >open patchset yet but if I'm understanding the gist of it correctly,
> >it bundles the lookup with the open into 1 request, where the
> >attributes can be passed from server -> kernel through the reply to
> >that request. I think in the case I'm working on, the file open call
> >does not require a lookup so it can't take advantage of your feature.
> >I just tested it on libfuse on the passthrough_hp server (with no
> >writeback caching and nopassthrough) on the example in the commit
> >message and I'm not seeing any lookup request being sent for that last
> >open call (for writing "123").
> >
>
>
> Hi Joanne,
>
> gets late here and I'm typing on my phone.  I hope formatting is ok.
>
> what I meant is that we use the atomic open op code for both, lookup-open and plain open - i.e. we always update attributes on open. Past atomic open patches did not do that yet, but I later realized that always using atomic open op
>
> - avoids the data corruption you run into
> - probably no need for atomic-revalidate-open vfs patches anymore  as we can now safely set a high attr timeout
>
>
> Kind of the same as your patch, just through a new op code.

Awesome, thanks for the context Bernd. I think this works for our use
case then. To confirm the "we will always update attributes on open"
part, this will only send the FUSE_GETATTR request to the server if
the server has invalidated the inode (eg through the
fuse_lowlevel_notify_inval_inode() api), otherwise this will not send
an extra FUSE_GETATTR request, correct? Other than the attribute
updating, would there be any other differences from using plain open
vs the atomic open version of plain open?

Do you have a tentative timeline in mind for when the next iteration
of the atomic open patchset would be out?

Thanks,
Joanne
>
> Thanks,
> Bernd
>
Joanne Koong Aug. 14, 2024, 5:50 p.m. UTC | #7
On Tue, Aug 13, 2024 at 7:46 PM Jingbo Xu <jefflexu@linux.alibaba.com> wrote:
>
> On 8/14/24 5:21 AM, Joanne Koong wrote:
> > Add FOPEN_FETCH_ATTR flag to indicate that attributes should be
> > fetched from the server after an open.
> >
> > For fuse servers that are backed by network filesystems, this is
> > needed to ensure that file attributes are up to date between
> > consecutive open calls.
> >
> > For example, if there is a file that is opened on two fuse mounts,
> > in the following scenario:
> >
> > on mount A, open file.txt w/ O_APPEND, write "hi", close file
> > on mount B, open file.txt w/ O_APPEND, write "world", close file
> > on mount A, open file.txt w/ O_APPEND, write "123", close file
> >
> > when the file is reopened on mount A, the file inode contains the old
> > size and the last append will overwrite the data that was written when
> > the file was opened/written on mount B.
> >
> > (This corruption can be reproduced on the example libfuse passthrough_hp
> > server with writeback caching disabled and nopassthrough)
> >
> > Having this flag as an option enables parity with NFS's close-to-open
> > consistency.
>
> It seems a general demand for close-to-open consistency similar to NFS
> when the backend store for FUSE is a NFS-like filesystem.  We have a
> similar private implementation for close-to-open consistency in our
> internal distribution.  Also FYI there was a similar proposal for this:
>
> https://lore.kernel.org/linux-fsdevel/20220608104202.19461-1-zhangjiachen.jaycee@bytedance.com/

Thanks for the link. I think that proposal though only invalidates the
attributes on file open, but the use case we have needs the attributes
to be updated on open (so that the subsequent write call uses the
updated file size).

>
> >
> > Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> > ---
> >  fs/fuse/file.c            | 7 ++++++-
> >  include/uapi/linux/fuse.h | 7 ++++++-
> >  2 files changed, 12 insertions(+), 2 deletions(-)
> >
> > diff --git a/fs/fuse/file.c b/fs/fuse/file.c
> > index f39456c65ed7..437487ce413d 100644
> > --- a/fs/fuse/file.c
> > +++ b/fs/fuse/file.c
> > @@ -264,7 +264,12 @@ static int fuse_open(struct inode *inode, struct file *file)
> >       err = fuse_do_open(fm, get_node_id(inode), file, false);
> >       if (!err) {
> >               ff = file->private_data;
> > -             err = fuse_finish_open(inode, file);
> > +             if (ff->open_flags & FOPEN_FETCH_ATTR) {
> > +                     fuse_invalidate_attr(inode);
> > +                     err = fuse_update_attributes(inode, file, STATX_BASIC_STATS);
> > +             }
> > +             if (!err)
> > +                     err = fuse_finish_open(inode, file);
> >               if (err)
> >                       fuse_sync_release(fi, ff, file->f_flags);
> >               else if (is_truncate)
> > diff --git a/include/uapi/linux/fuse.h b/include/uapi/linux/fuse.h
> > index d08b99d60f6f..f5d1af6fe352 100644
> > --- a/include/uapi/linux/fuse.h
> > +++ b/include/uapi/linux/fuse.h
> > @@ -217,6 +217,9 @@
> >   *  - add backing_id to fuse_open_out, add FOPEN_PASSTHROUGH open flag
> >   *  - add FUSE_NO_EXPORT_SUPPORT init flag
> >   *  - add FUSE_NOTIFY_RESEND, add FUSE_HAS_RESEND init flag
> > + *
> > + *  7.41
> > + *  - add FOPEN_FETCH_ATTR
> >   */
> >
> >  #ifndef _LINUX_FUSE_H
> > @@ -252,7 +255,7 @@
> >  #define FUSE_KERNEL_VERSION 7
> >
> >  /** Minor version number of this interface */
> > -#define FUSE_KERNEL_MINOR_VERSION 40
> > +#define FUSE_KERNEL_MINOR_VERSION 41
> >
> >  /** The node ID of the root inode */
> >  #define FUSE_ROOT_ID 1
> > @@ -360,6 +363,7 @@ struct fuse_file_lock {
> >   * FOPEN_NOFLUSH: don't flush data cache on close (unless FUSE_WRITEBACK_CACHE)
> >   * FOPEN_PARALLEL_DIRECT_WRITES: Allow concurrent direct writes on the same inode
> >   * FOPEN_PASSTHROUGH: passthrough read/write io for this open file
> > + * FOPEN_FETCH_ATTR: attributes are fetched after file is opened
> >   */
> >  #define FOPEN_DIRECT_IO              (1 << 0)
> >  #define FOPEN_KEEP_CACHE     (1 << 1)
> > @@ -369,6 +373,7 @@ struct fuse_file_lock {
> >  #define FOPEN_NOFLUSH                (1 << 5)
> >  #define FOPEN_PARALLEL_DIRECT_WRITES (1 << 6)
> >  #define FOPEN_PASSTHROUGH    (1 << 7)
> > +#define FOPEN_FETCH_ATTR     (1 << 8)
> >
> >  /**
> >   * INIT request/reply flags
>
> Does this close-to-open consistency support writeback mode? AFAIK, the
> cached ctime/mtime/size at the kernel side are always trusted while
> these attributes from the server are dropped, see:

No, the use case we're running into doesn't have the writeback cache
enabled, so we haven't looked into what would be needed for full CTO
consistency on writeback mode.

>
> ```
> fuse_update_attributes
>     fuse_update_get_attr
>         cache_mask = fuse_get_cache_mask(inode)
>             if writeback mode:
>                 return STATX_MTIME | STATX_CTIME | STATX_SIZE
> ```
>
> Also FYI there's a similar proposal for enhancing the close-to-open
> consistency in writeback mode to fix the above issue:
>
> https://lore.kernel.org/linux-fsdevel/20220624055825.29183-1-zhangjiachen.jaycee@bytedance.com/
>
> Besides, IIUC this patch only implements the revalidate-on-open semantic
> for metadata.  To fulfill the full close-to-open consistency, do you
> need to disable FOPEN_KEEP_CACHE to fulfill the revalidate-on-open
> semantic for data? (Though the revalidate-on-open semantic for data is
> not needed in your append-only case.)

This patch is aimed at addressing the overwrite/corruption issue we've
been seeing rather than adding full support for close-to-open
consistency. We haven't so far had a need for revalidate-on-open for
data. I should have phrased the last sentence of the commit message as
"enables better parity with NFS's close-to-open consistency" :)


Thanks,
Joanne

>
> --
> Thanks,
> Jingbo
Bernd Schubert Aug. 14, 2024, 5:52 p.m. UTC | #8
On 8/14/24 19:18, Joanne Koong wrote:
> On Tue, Aug 13, 2024 at 3:41 PM Bernd Schubert
> <bernd.schubert@fastmail.fm> wrote:
>>
>> On August 13, 2024 11:57:44 PM GMT+02:00, Joanne Koong <joannelkoong@gmail.com> wrote:
>>> On Tue, Aug 13, 2024 at 2:44 PM Bernd Schubert
>>> <bernd.schubert@fastmail.fm> wrote:
>>>>
>>>> On 8/13/24 23:21, Joanne Koong wrote:
>>>>> Add FOPEN_FETCH_ATTR flag to indicate that attributes should be
>>>>> fetched from the server after an open.
>>>>>
>>>>> For fuse servers that are backed by network filesystems, this is
>>>>> needed to ensure that file attributes are up to date between
>>>>> consecutive open calls.
>>>>>
>>>>> For example, if there is a file that is opened on two fuse mounts,
>>>>> in the following scenario:
>>>>>
>>>>> on mount A, open file.txt w/ O_APPEND, write "hi", close file
>>>>> on mount B, open file.txt w/ O_APPEND, write "world", close file
>>>>> on mount A, open file.txt w/ O_APPEND, write "123", close file
>>>>>
>>>>> when the file is reopened on mount A, the file inode contains the old
>>>>> size and the last append will overwrite the data that was written when
>>>>> the file was opened/written on mount B.
>>>>>
>>>>> (This corruption can be reproduced on the example libfuse passthrough_hp
>>>>> server with writeback caching disabled and nopassthrough)
>>>>>
>>>>> Having this flag as an option enables parity with NFS's close-to-open
>>>>> consistency.
>>>>>
>>>>> Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
>>>>> ---
>>>>>  fs/fuse/file.c            | 7 ++++++-
>>>>>  include/uapi/linux/fuse.h | 7 ++++++-
>>>>>  2 files changed, 12 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/fs/fuse/file.c b/fs/fuse/file.c
>>>>> index f39456c65ed7..437487ce413d 100644
>>>>> --- a/fs/fuse/file.c
>>>>> +++ b/fs/fuse/file.c
>>>>> @@ -264,7 +264,12 @@ static int fuse_open(struct inode *inode, struct file *file)
>>>>>       err = fuse_do_open(fm, get_node_id(inode), file, false);
>>>>>       if (!err) {
>>>>>               ff = file->private_data;
>>>>> -             err = fuse_finish_open(inode, file);
>>>>> +             if (ff->open_flags & FOPEN_FETCH_ATTR) {
>>>>> +                     fuse_invalidate_attr(inode);
>>>>> +                     err = fuse_update_attributes(inode, file, STATX_BASIC_STATS);
>>>>> +             }
>>>>> +             if (!err)
>>>>> +                     err = fuse_finish_open(inode, file);
>>>>>               if (err)
>>>>>                       fuse_sync_release(fi, ff, file->f_flags);
>>>>>               else if (is_truncate)
>>>>
>>>> I didn't come to it yet, but I actually wanted to update Dharmendras/my
>>>> atomic open patches - giving up all the vfs changes (for now) and then
>>>> always use atomic open if available, for FUSE_OPEN and FUSE_CREATE. And
>>>> then update attributes through that.
>>>> Would that be an alternative for you? Would basically require to add an
>>>> atomic_open method into your file system.
>>>>
>>>> Definitely more complex than your solution, but avoids a another
>>>> kernel/userspace transition.
>>>
>>> Hi Bernd,
>>>
>>> Unfortunately I don't think this is an alternative for my use case. I
>>> haven't looked closely at the implementation details of your atomic
>>> open patchset yet but if I'm understanding the gist of it correctly,
>>> it bundles the lookup with the open into 1 request, where the
>>> attributes can be passed from server -> kernel through the reply to
>>> that request. I think in the case I'm working on, the file open call
>>> does not require a lookup so it can't take advantage of your feature.
>>> I just tested it on libfuse on the passthrough_hp server (with no
>>> writeback caching and nopassthrough) on the example in the commit
>>> message and I'm not seeing any lookup request being sent for that last
>>> open call (for writing "123").
>>>
>>
>>
>> Hi Joanne,
>>
>> gets late here and I'm typing on my phone.  I hope formatting is ok.
>>
>> what I meant is that we use the atomic open op code for both, lookup-open and plain open - i.e. we always update attributes on open. Past atomic open patches did not do that yet, but I later realized that always using atomic open op
>>
>> - avoids the data corruption you run into
>> - probably no need for atomic-revalidate-open vfs patches anymore  as we can now safely set a high attr timeout
>>
>>
>> Kind of the same as your patch, just through a new op code.
> 
> Awesome, thanks for the context Bernd. I think this works for our use
> case then. To confirm the "we will always update attributes on open"
> part, this will only send the FUSE_GETATTR request to the server if
> the server has invalidated the inode (eg through the
> fuse_lowlevel_notify_inval_inode() api), otherwise this will not send
> an extra FUSE_GETATTR request, correct? Other than the attribute

If we send FUSE_OPEN_ATOMIC (or whatever we name it) in
fuse_file_open(), it would always ask server side for attributes.
I.e. we assume that a server that has atomic open implemented can easily
provide attributes or asks for close-to-open coherency.


I'm not sure if I correctly understood your questions about
notifications and FUSE_GETATTR - from my point of view that that is
entirely independent from open. And personally I try to reduce
kernel/userspace transitions - additional notifications and FUSE_GETATTR
are not helpful here :)

> updating, would there be any other differences from using plain open
> vs the atomic open version of plain open?

Just the additional file attributes and complexity that brings.

> 
> Do you have a tentative timeline in mind for when the next iteration
> of the atomic open patchset would be out?

I wanted to have new fuse-uring patches ready by last week, but I'm
still refactoring things - changing things on top of the existing series
is easy, rebasing it is painful...  I can _try_ to make a raw new
atomic-open patch set during the next days (till Sunday), but not promised.


Thanks,
Bernd
Joanne Koong Aug. 14, 2024, 6:06 p.m. UTC | #9
On Wed, Aug 14, 2024 at 10:52 AM Bernd Schubert
<bernd.schubert@fastmail.fm> wrote:
>
>
>
> On 8/14/24 19:18, Joanne Koong wrote:
> > On Tue, Aug 13, 2024 at 3:41 PM Bernd Schubert
> > <bernd.schubert@fastmail.fm> wrote:
> >>
> >> On August 13, 2024 11:57:44 PM GMT+02:00, Joanne Koong <joannelkoong@gmail.com> wrote:
> >>> On Tue, Aug 13, 2024 at 2:44 PM Bernd Schubert
> >>> <bernd.schubert@fastmail.fm> wrote:
> >>>>
> >>>> On 8/13/24 23:21, Joanne Koong wrote:
> >>>>> Add FOPEN_FETCH_ATTR flag to indicate that attributes should be
> >>>>> fetched from the server after an open.
> >>>>>
> >>>>> For fuse servers that are backed by network filesystems, this is
> >>>>> needed to ensure that file attributes are up to date between
> >>>>> consecutive open calls.
> >>>>>
> >>>>> For example, if there is a file that is opened on two fuse mounts,
> >>>>> in the following scenario:
> >>>>>
> >>>>> on mount A, open file.txt w/ O_APPEND, write "hi", close file
> >>>>> on mount B, open file.txt w/ O_APPEND, write "world", close file
> >>>>> on mount A, open file.txt w/ O_APPEND, write "123", close file
> >>>>>
> >>>>> when the file is reopened on mount A, the file inode contains the old
> >>>>> size and the last append will overwrite the data that was written when
> >>>>> the file was opened/written on mount B.
> >>>>>
> >>>>> (This corruption can be reproduced on the example libfuse passthrough_hp
> >>>>> server with writeback caching disabled and nopassthrough)
> >>>>>
> >>>>> Having this flag as an option enables parity with NFS's close-to-open
> >>>>> consistency.
> >>>>>
> >>>>> Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> >>>>> ---
> >>>>>  fs/fuse/file.c            | 7 ++++++-
> >>>>>  include/uapi/linux/fuse.h | 7 ++++++-
> >>>>>  2 files changed, 12 insertions(+), 2 deletions(-)
> >>>>>
> >>>>> diff --git a/fs/fuse/file.c b/fs/fuse/file.c
> >>>>> index f39456c65ed7..437487ce413d 100644
> >>>>> --- a/fs/fuse/file.c
> >>>>> +++ b/fs/fuse/file.c
> >>>>> @@ -264,7 +264,12 @@ static int fuse_open(struct inode *inode, struct file *file)
> >>>>>       err = fuse_do_open(fm, get_node_id(inode), file, false);
> >>>>>       if (!err) {
> >>>>>               ff = file->private_data;
> >>>>> -             err = fuse_finish_open(inode, file);
> >>>>> +             if (ff->open_flags & FOPEN_FETCH_ATTR) {
> >>>>> +                     fuse_invalidate_attr(inode);
> >>>>> +                     err = fuse_update_attributes(inode, file, STATX_BASIC_STATS);
> >>>>> +             }
> >>>>> +             if (!err)
> >>>>> +                     err = fuse_finish_open(inode, file);
> >>>>>               if (err)
> >>>>>                       fuse_sync_release(fi, ff, file->f_flags);
> >>>>>               else if (is_truncate)
> >>>>
> >>>> I didn't come to it yet, but I actually wanted to update Dharmendras/my
> >>>> atomic open patches - giving up all the vfs changes (for now) and then
> >>>> always use atomic open if available, for FUSE_OPEN and FUSE_CREATE. And
> >>>> then update attributes through that.
> >>>> Would that be an alternative for you? Would basically require to add an
> >>>> atomic_open method into your file system.
> >>>>
> >>>> Definitely more complex than your solution, but avoids a another
> >>>> kernel/userspace transition.
> >>>
> >>> Hi Bernd,
> >>>
> >>> Unfortunately I don't think this is an alternative for my use case. I
> >>> haven't looked closely at the implementation details of your atomic
> >>> open patchset yet but if I'm understanding the gist of it correctly,
> >>> it bundles the lookup with the open into 1 request, where the
> >>> attributes can be passed from server -> kernel through the reply to
> >>> that request. I think in the case I'm working on, the file open call
> >>> does not require a lookup so it can't take advantage of your feature.
> >>> I just tested it on libfuse on the passthrough_hp server (with no
> >>> writeback caching and nopassthrough) on the example in the commit
> >>> message and I'm not seeing any lookup request being sent for that last
> >>> open call (for writing "123").
> >>>
> >>
> >>
> >> Hi Joanne,
> >>
> >> gets late here and I'm typing on my phone.  I hope formatting is ok.
> >>
> >> what I meant is that we use the atomic open op code for both, lookup-open and plain open - i.e. we always update attributes on open. Past atomic open patches did not do that yet, but I later realized that always using atomic open op
> >>
> >> - avoids the data corruption you run into
> >> - probably no need for atomic-revalidate-open vfs patches anymore  as we can now safely set a high attr timeout
> >>
> >>
> >> Kind of the same as your patch, just through a new op code.
> >
> > Awesome, thanks for the context Bernd. I think this works for our use
> > case then. To confirm the "we will always update attributes on open"
> > part, this will only send the FUSE_GETATTR request to the server if
> > the server has invalidated the inode (eg through the
> > fuse_lowlevel_notify_inval_inode() api), otherwise this will not send
> > an extra FUSE_GETATTR request, correct? Other than the attribute
>
> If we send FUSE_OPEN_ATOMIC (or whatever we name it) in
> fuse_file_open(), it would always ask server side for attributes.

Oh I see, the FUSE_OPEN_ATOMIC request itself would ask for attributes
and the attributes would be sent by the server as the reply to the
FUSE_ATOMIC_OPEN. This sounds great! in my patch, there's an
additional FUSE_GETATTR request incurred to get the attributes.

> I.e. we assume that a server that has atomic open implemented can easily
> provide attributes or asks for close-to-open coherency.
>
>
> I'm not sure if I correctly understood your questions about
> notifications and FUSE_GETATTR - from my point of view that that is
> entirely independent from open. And personally I try to reduce

I missed that the attributes would be bundled with FUSE_OPEN_ATOMIC so
I thought we would need an additional FUSE_GETATTR request to get
them. Apologies for the confusion!

> kernel/userspace transitions - additional notifications and FUSE_GETATTR
> are not helpful here :)
>
> > updating, would there be any other differences from using plain open
> > vs the atomic open version of plain open?
>
> Just the additional file attributes and complexity that brings.
>
> >
> > Do you have a tentative timeline in mind for when the next iteration
> > of the atomic open patchset would be out?
>
> I wanted to have new fuse-uring patches ready by last week, but I'm
> still refactoring things - changing things on top of the existing series
> is easy, rebasing it is painful...  I can _try_ to make a raw new
> atomic-open patch set during the next days (till Sunday), but not promised.
>

Sounds great. thanks for your work on this!

>
> Thanks,
> Bernd
Bernd Schubert Aug. 19, 2024, 10:36 p.m. UTC | #10
On 8/14/24 20:06, Joanne Koong wrote:
> On Wed, Aug 14, 2024 at 10:52 AM Bernd Schubert
> <bernd.schubert@fastmail.fm> wrote:
>>
>>
>>
>> On 8/14/24 19:18, Joanne Koong wrote:
>>> On Tue, Aug 13, 2024 at 3:41 PM Bernd Schubert
>>> <bernd.schubert@fastmail.fm> wrote:
>>>>
>>>> On August 13, 2024 11:57:44 PM GMT+02:00, Joanne Koong <joannelkoong@gmail.com> wrote:
>>>>> On Tue, Aug 13, 2024 at 2:44 PM Bernd Schubert
>>>>> <bernd.schubert@fastmail.fm> wrote:
>>>>>>
>>>>>> On 8/13/24 23:21, Joanne Koong wrote:
>>>>>>> Add FOPEN_FETCH_ATTR flag to indicate that attributes should be
>>>>>>> fetched from the server after an open.
>>>>>>>
>>>>>>> For fuse servers that are backed by network filesystems, this is
>>>>>>> needed to ensure that file attributes are up to date between
>>>>>>> consecutive open calls.
>>>>>>>
>>>>>>> For example, if there is a file that is opened on two fuse mounts,
>>>>>>> in the following scenario:
>>>>>>>
>>>>>>> on mount A, open file.txt w/ O_APPEND, write "hi", close file
>>>>>>> on mount B, open file.txt w/ O_APPEND, write "world", close file
>>>>>>> on mount A, open file.txt w/ O_APPEND, write "123", close file
>>>>>>>
>>>>>>> when the file is reopened on mount A, the file inode contains the old
>>>>>>> size and the last append will overwrite the data that was written when
>>>>>>> the file was opened/written on mount B.
>>>>>>>
>>>>>>> (This corruption can be reproduced on the example libfuse passthrough_hp
>>>>>>> server with writeback caching disabled and nopassthrough)
>>>>>>>
>>>>>>> Having this flag as an option enables parity with NFS's close-to-open
>>>>>>> consistency.
>>>>>>>
>>>>>>> Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
>>>>>>> ---
>>>>>>>  fs/fuse/file.c            | 7 ++++++-
>>>>>>>  include/uapi/linux/fuse.h | 7 ++++++-
>>>>>>>  2 files changed, 12 insertions(+), 2 deletions(-)
>>>>>>>
>>>>>>> diff --git a/fs/fuse/file.c b/fs/fuse/file.c
>>>>>>> index f39456c65ed7..437487ce413d 100644
>>>>>>> --- a/fs/fuse/file.c
>>>>>>> +++ b/fs/fuse/file.c
>>>>>>> @@ -264,7 +264,12 @@ static int fuse_open(struct inode *inode, struct file *file)
>>>>>>>       err = fuse_do_open(fm, get_node_id(inode), file, false);
>>>>>>>       if (!err) {
>>>>>>>               ff = file->private_data;
>>>>>>> -             err = fuse_finish_open(inode, file);
>>>>>>> +             if (ff->open_flags & FOPEN_FETCH_ATTR) {
>>>>>>> +                     fuse_invalidate_attr(inode);
>>>>>>> +                     err = fuse_update_attributes(inode, file, STATX_BASIC_STATS);
>>>>>>> +             }
>>>>>>> +             if (!err)
>>>>>>> +                     err = fuse_finish_open(inode, file);
>>>>>>>               if (err)
>>>>>>>                       fuse_sync_release(fi, ff, file->f_flags);
>>>>>>>               else if (is_truncate)
>>>>>>
>>>>>> I didn't come to it yet, but I actually wanted to update Dharmendras/my
>>>>>> atomic open patches - giving up all the vfs changes (for now) and then
>>>>>> always use atomic open if available, for FUSE_OPEN and FUSE_CREATE. And
>>>>>> then update attributes through that.
>>>>>> Would that be an alternative for you? Would basically require to add an
>>>>>> atomic_open method into your file system.
>>>>>>
>>>>>> Definitely more complex than your solution, but avoids a another
>>>>>> kernel/userspace transition.
>>>>>
>>>>> Hi Bernd,
>>>>>
>>>>> Unfortunately I don't think this is an alternative for my use case. I
>>>>> haven't looked closely at the implementation details of your atomic
>>>>> open patchset yet but if I'm understanding the gist of it correctly,
>>>>> it bundles the lookup with the open into 1 request, where the
>>>>> attributes can be passed from server -> kernel through the reply to
>>>>> that request. I think in the case I'm working on, the file open call
>>>>> does not require a lookup so it can't take advantage of your feature.
>>>>> I just tested it on libfuse on the passthrough_hp server (with no
>>>>> writeback caching and nopassthrough) on the example in the commit
>>>>> message and I'm not seeing any lookup request being sent for that last
>>>>> open call (for writing "123").
>>>>>
>>>>
>>>>
>>>> Hi Joanne,
>>>>
>>>> gets late here and I'm typing on my phone.  I hope formatting is ok.
>>>>
>>>> what I meant is that we use the atomic open op code for both, lookup-open and plain open - i.e. we always update attributes on open. Past atomic open patches did not do that yet, but I later realized that always using atomic open op
>>>>
>>>> - avoids the data corruption you run into
>>>> - probably no need for atomic-revalidate-open vfs patches anymore  as we can now safely set a high attr timeout
>>>>
>>>>
>>>> Kind of the same as your patch, just through a new op code.
>>>
>>> Awesome, thanks for the context Bernd. I think this works for our use
>>> case then. To confirm the "we will always update attributes on open"
>>> part, this will only send the FUSE_GETATTR request to the server if
>>> the server has invalidated the inode (eg through the
>>> fuse_lowlevel_notify_inval_inode() api), otherwise this will not send
>>> an extra FUSE_GETATTR request, correct? Other than the attribute
>>
>> If we send FUSE_OPEN_ATOMIC (or whatever we name it) in
>> fuse_file_open(), it would always ask server side for attributes.
> 
> Oh I see, the FUSE_OPEN_ATOMIC request itself would ask for attributes
> and the attributes would be sent by the server as the reply to the
> FUSE_ATOMIC_OPEN. This sounds great! in my patch, there's an
> additional FUSE_GETATTR request incurred to get the attributes.
> 
>> I.e. we assume that a server that has atomic open implemented can easily
>> provide attributes or asks for close-to-open coherency.
>>
>>
>> I'm not sure if I correctly understood your questions about
>> notifications and FUSE_GETATTR - from my point of view that that is
>> entirely independent from open. And personally I try to reduce
> 
> I missed that the attributes would be bundled with FUSE_OPEN_ATOMIC so
> I thought we would need an additional FUSE_GETATTR request to get
> them. Apologies for the confusion!
> 
>> kernel/userspace transitions - additional notifications and FUSE_GETATTR
>> are not helpful here :)
>>
>>> updating, would there be any other differences from using plain open
>>> vs the atomic open version of plain open?
>>
>> Just the additional file attributes and complexity that brings.
>>
>>>
>>> Do you have a tentative timeline in mind for when the next iteration
>>> of the atomic open patchset would be out?
>>
>> I wanted to have new fuse-uring patches ready by last week, but I'm
>> still refactoring things - changing things on top of the existing series
>> is easy, rebasing it is painful...  I can _try_ to make a raw new
>> atomic-open patch set during the next days (till Sunday), but not promised.
>>
> 
> Sounds great. thanks for your work on this!

Here is a totally untested (and probably ugly) version of what I had in my 
mind 

https://github.com/bsbernd/linux/commits/open-getattr/
https://github.com/libfuse/libfuse/pull/1020

(It builds, but nothing more tested).

Instead of rather complex atomic-open it adds FUSE_OPEN_GETATTR and hooks into
fuse_file_open. 
I was considering to hook into fuse_do_open, but that would cause quite some code
dup for fuse_file_open. We need the inode to update attributes and in fuse_do_open
we could use file->f_inode, but I didn't verify if it is reliable at this stage
(do_dentry_open() assignes it, but I didn't verify possible other code paths) - for
now I added the inode parameter to all code paths.


Going to test and clean it up tomorrow.


Thanks,
Bernd
Joanne Koong Aug. 20, 2024, 1:15 a.m. UTC | #11
On Mon, Aug 19, 2024 at 3:36 PM Bernd Schubert
<bernd.schubert@fastmail.fm> wrote:
>
>
>
> On 8/14/24 20:06, Joanne Koong wrote:
> > On Wed, Aug 14, 2024 at 10:52 AM Bernd Schubert
> > <bernd.schubert@fastmail.fm> wrote:
> >>
> >>
> >>
> >> On 8/14/24 19:18, Joanne Koong wrote:
> >>> On Tue, Aug 13, 2024 at 3:41 PM Bernd Schubert
> >>> <bernd.schubert@fastmail.fm> wrote:
> >>>>
> >>>> On August 13, 2024 11:57:44 PM GMT+02:00, Joanne Koong <joannelkoong@gmail.com> wrote:
> >>>>> On Tue, Aug 13, 2024 at 2:44 PM Bernd Schubert
> >>>>> <bernd.schubert@fastmail.fm> wrote:
> >>>>>>
> >>>>>> On 8/13/24 23:21, Joanne Koong wrote:
> >>>>>>> Add FOPEN_FETCH_ATTR flag to indicate that attributes should be
> >>>>>>> fetched from the server after an open.
> >>>>>>>
> >>>>>>> For fuse servers that are backed by network filesystems, this is
> >>>>>>> needed to ensure that file attributes are up to date between
> >>>>>>> consecutive open calls.
> >>>>>>>
> >>>>>>> For example, if there is a file that is opened on two fuse mounts,
> >>>>>>> in the following scenario:
> >>>>>>>
> >>>>>>> on mount A, open file.txt w/ O_APPEND, write "hi", close file
> >>>>>>> on mount B, open file.txt w/ O_APPEND, write "world", close file
> >>>>>>> on mount A, open file.txt w/ O_APPEND, write "123", close file
> >>>>>>>
> >>>>>>> when the file is reopened on mount A, the file inode contains the old
> >>>>>>> size and the last append will overwrite the data that was written when
> >>>>>>> the file was opened/written on mount B.
> >>>>>>>
> >>>>>>> (This corruption can be reproduced on the example libfuse passthrough_hp
> >>>>>>> server with writeback caching disabled and nopassthrough)
> >>>>>>>
> >>>>>>> Having this flag as an option enables parity with NFS's close-to-open
> >>>>>>> consistency.
> >>>>>>>
> >>>>>>> Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> >>>>>>> ---
> >>>>>>>  fs/fuse/file.c            | 7 ++++++-
> >>>>>>>  include/uapi/linux/fuse.h | 7 ++++++-
> >>>>>>>  2 files changed, 12 insertions(+), 2 deletions(-)
> >>>>>>>
> >>>>>>> diff --git a/fs/fuse/file.c b/fs/fuse/file.c
> >>>>>>> index f39456c65ed7..437487ce413d 100644
> >>>>>>> --- a/fs/fuse/file.c
> >>>>>>> +++ b/fs/fuse/file.c
> >>>>>>> @@ -264,7 +264,12 @@ static int fuse_open(struct inode *inode, struct file *file)
> >>>>>>>       err = fuse_do_open(fm, get_node_id(inode), file, false);
> >>>>>>>       if (!err) {
> >>>>>>>               ff = file->private_data;
> >>>>>>> -             err = fuse_finish_open(inode, file);
> >>>>>>> +             if (ff->open_flags & FOPEN_FETCH_ATTR) {
> >>>>>>> +                     fuse_invalidate_attr(inode);
> >>>>>>> +                     err = fuse_update_attributes(inode, file, STATX_BASIC_STATS);
> >>>>>>> +             }
> >>>>>>> +             if (!err)
> >>>>>>> +                     err = fuse_finish_open(inode, file);
> >>>>>>>               if (err)
> >>>>>>>                       fuse_sync_release(fi, ff, file->f_flags);
> >>>>>>>               else if (is_truncate)
> >>>>>>
> >>>>>> I didn't come to it yet, but I actually wanted to update Dharmendras/my
> >>>>>> atomic open patches - giving up all the vfs changes (for now) and then
> >>>>>> always use atomic open if available, for FUSE_OPEN and FUSE_CREATE. And
> >>>>>> then update attributes through that.
> >>>>>> Would that be an alternative for you? Would basically require to add an
> >>>>>> atomic_open method into your file system.
> >>>>>>
> >>>>>> Definitely more complex than your solution, but avoids a another
> >>>>>> kernel/userspace transition.
> >>>>>
> >>>>> Hi Bernd,
> >>>>>
> >>>>> Unfortunately I don't think this is an alternative for my use case. I
> >>>>> haven't looked closely at the implementation details of your atomic
> >>>>> open patchset yet but if I'm understanding the gist of it correctly,
> >>>>> it bundles the lookup with the open into 1 request, where the
> >>>>> attributes can be passed from server -> kernel through the reply to
> >>>>> that request. I think in the case I'm working on, the file open call
> >>>>> does not require a lookup so it can't take advantage of your feature.
> >>>>> I just tested it on libfuse on the passthrough_hp server (with no
> >>>>> writeback caching and nopassthrough) on the example in the commit
> >>>>> message and I'm not seeing any lookup request being sent for that last
> >>>>> open call (for writing "123").
> >>>>>
> >>>>
> >>>>
> >>>> Hi Joanne,
> >>>>
> >>>> gets late here and I'm typing on my phone.  I hope formatting is ok.
> >>>>
> >>>> what I meant is that we use the atomic open op code for both, lookup-open and plain open - i.e. we always update attributes on open. Past atomic open patches did not do that yet, but I later realized that always using atomic open op
> >>>>
> >>>> - avoids the data corruption you run into
> >>>> - probably no need for atomic-revalidate-open vfs patches anymore  as we can now safely set a high attr timeout
> >>>>
> >>>>
> >>>> Kind of the same as your patch, just through a new op code.
> >>>
> >>> Awesome, thanks for the context Bernd. I think this works for our use
> >>> case then. To confirm the "we will always update attributes on open"
> >>> part, this will only send the FUSE_GETATTR request to the server if
> >>> the server has invalidated the inode (eg through the
> >>> fuse_lowlevel_notify_inval_inode() api), otherwise this will not send
> >>> an extra FUSE_GETATTR request, correct? Other than the attribute
> >>
> >> If we send FUSE_OPEN_ATOMIC (or whatever we name it) in
> >> fuse_file_open(), it would always ask server side for attributes.
> >
> > Oh I see, the FUSE_OPEN_ATOMIC request itself would ask for attributes
> > and the attributes would be sent by the server as the reply to the
> > FUSE_ATOMIC_OPEN. This sounds great! in my patch, there's an
> > additional FUSE_GETATTR request incurred to get the attributes.
> >
> >> I.e. we assume that a server that has atomic open implemented can easily
> >> provide attributes or asks for close-to-open coherency.
> >>
> >>
> >> I'm not sure if I correctly understood your questions about
> >> notifications and FUSE_GETATTR - from my point of view that that is
> >> entirely independent from open. And personally I try to reduce
> >
> > I missed that the attributes would be bundled with FUSE_OPEN_ATOMIC so
> > I thought we would need an additional FUSE_GETATTR request to get
> > them. Apologies for the confusion!
> >
> >> kernel/userspace transitions - additional notifications and FUSE_GETATTR
> >> are not helpful here :)
> >>
> >>> updating, would there be any other differences from using plain open
> >>> vs the atomic open version of plain open?
> >>
> >> Just the additional file attributes and complexity that brings.
> >>
> >>>
> >>> Do you have a tentative timeline in mind for when the next iteration
> >>> of the atomic open patchset would be out?
> >>
> >> I wanted to have new fuse-uring patches ready by last week, but I'm
> >> still refactoring things - changing things on top of the existing series
> >> is easy, rebasing it is painful...  I can _try_ to make a raw new
> >> atomic-open patch set during the next days (till Sunday), but not promised.
> >>
> >
> > Sounds great. thanks for your work on this!
>
> Here is a totally untested (and probably ugly) version of what I had in my
> mind
>
> https://github.com/bsbernd/linux/commits/open-getattr/
> https://github.com/libfuse/libfuse/pull/1020
>
> (It builds, but nothing more tested).
>
> Instead of rather complex atomic-open it adds FUSE_OPEN_GETATTR and hooks into
> fuse_file_open.
> I was considering to hook into fuse_do_open, but that would cause quite some code
> dup for fuse_file_open. We need the inode to update attributes and in fuse_do_open
> we could use file->f_inode, but I didn't verify if it is reliable at this stage
> (do_dentry_open() assignes it, but I didn't verify possible other code paths) - for
> now I added the inode parameter to all code paths.
>
>
> Going to test and clean it up tomorrow.

Thanks for the update, Bernd!

>
>
> Thanks,
> Bernd
diff mbox series

Patch

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index f39456c65ed7..437487ce413d 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -264,7 +264,12 @@  static int fuse_open(struct inode *inode, struct file *file)
 	err = fuse_do_open(fm, get_node_id(inode), file, false);
 	if (!err) {
 		ff = file->private_data;
-		err = fuse_finish_open(inode, file);
+		if (ff->open_flags & FOPEN_FETCH_ATTR) {
+			fuse_invalidate_attr(inode);
+			err = fuse_update_attributes(inode, file, STATX_BASIC_STATS);
+		}
+		if (!err)
+			err = fuse_finish_open(inode, file);
 		if (err)
 			fuse_sync_release(fi, ff, file->f_flags);
 		else if (is_truncate)
diff --git a/include/uapi/linux/fuse.h b/include/uapi/linux/fuse.h
index d08b99d60f6f..f5d1af6fe352 100644
--- a/include/uapi/linux/fuse.h
+++ b/include/uapi/linux/fuse.h
@@ -217,6 +217,9 @@ 
  *  - add backing_id to fuse_open_out, add FOPEN_PASSTHROUGH open flag
  *  - add FUSE_NO_EXPORT_SUPPORT init flag
  *  - add FUSE_NOTIFY_RESEND, add FUSE_HAS_RESEND init flag
+ *
+ *  7.41
+ *  - add FOPEN_FETCH_ATTR
  */
 
 #ifndef _LINUX_FUSE_H
@@ -252,7 +255,7 @@ 
 #define FUSE_KERNEL_VERSION 7
 
 /** Minor version number of this interface */
-#define FUSE_KERNEL_MINOR_VERSION 40
+#define FUSE_KERNEL_MINOR_VERSION 41
 
 /** The node ID of the root inode */
 #define FUSE_ROOT_ID 1
@@ -360,6 +363,7 @@  struct fuse_file_lock {
  * FOPEN_NOFLUSH: don't flush data cache on close (unless FUSE_WRITEBACK_CACHE)
  * FOPEN_PARALLEL_DIRECT_WRITES: Allow concurrent direct writes on the same inode
  * FOPEN_PASSTHROUGH: passthrough read/write io for this open file
+ * FOPEN_FETCH_ATTR: attributes are fetched after file is opened
  */
 #define FOPEN_DIRECT_IO		(1 << 0)
 #define FOPEN_KEEP_CACHE	(1 << 1)
@@ -369,6 +373,7 @@  struct fuse_file_lock {
 #define FOPEN_NOFLUSH		(1 << 5)
 #define FOPEN_PARALLEL_DIRECT_WRITES	(1 << 6)
 #define FOPEN_PASSTHROUGH	(1 << 7)
+#define FOPEN_FETCH_ATTR	(1 << 8)
 
 /**
  * INIT request/reply flags