fuse: Allow to align reads/writes

Message ID 20240702163108.616342-1-bschubert@ddn.com (mailing list archive)
State New

Commit Message

Bernd Schubert July 2, 2024, 4:31 p.m. UTC
Read/write IOs should be page aligned, as the fuse server
might otherwise need to copy data to another buffer in
order to fulfill network or device storage requirements.

A simple reproducer is libfuse's example/passthrough*:
open a file with O_DIRECT. Without this change,
writing to that file failed with -EINVAL if the underlying
file system was ext4 (for passthrough_hp the
'passthrough' feature has to be disabled).

Given that this needs server-side changes, a new feature flag is
introduced.

The disadvantage of aligned writes is that the server side
needs another splice syscall (when splice is used) to seek
over the unaligned area, i.e. syscall and memory copy overhead.

Signed-off-by: Bernd Schubert <bschubert@ddn.com>

---
From an implementation point of view, 'struct fuse_in_arg' /
'struct fuse_arg' gets another parameter, 'align_size', which has to
be set by fuse_write_args_fill(). For all other fuse operations this
parameter has to be 0, which is guaranteed by the existing
initialization via FUSE_ARGS and the C99-style
initialization { .size = 0, .value = NULL }, i.e. other members are
zero.
Another choice would have been to extend fuse_write_in to
PAGE_SIZE - sizeof(fuse_in_header), but then it would be an
arch/PAGE_SIZE-dependent struct size and would also require
lots of stack usage.
---
 fs/fuse/dev.c             | 21 +++++++++++++++++++--
 fs/fuse/file.c            | 12 ++++++++++++
 fs/fuse/fuse_i.h          |  9 +++++++--
 fs/fuse/inode.c           |  5 ++++-
 include/uapi/linux/fuse.h | 13 +++++++++++--
 5 files changed, 53 insertions(+), 7 deletions(-)
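
As a rough illustration of why the reproducer above can fail (a sketch, not
taken from the patch or from libfuse): O_DIRECT writes generally need a
suitably aligned memory buffer, and a write payload that sits right behind the
request headers in a page-aligned buffer is not aligned itself. The helper
names below are hypothetical, and the 80-byte header offset used in the
comments is an assumption about the combined size of the two request headers.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: true if 'p' is a multiple of 'align' (a power of two). */
static bool ptr_is_aligned(const void *p, size_t align)
{
	return ((uintptr_t)p & (align - 1)) == 0;
}

/*
 * Allocate an aligned request buffer, roughly as a fuse server might.
 * Returns NULL on failure; release the buffer with free().
 */
static void *alloc_request_buffer(size_t size, size_t align)
{
	void *buf = NULL;

	if (posix_memalign(&buf, align, size) != 0)
		return NULL;
	return buf;
}

/*
 * With a 4096-byte aligned buffer and an assumed 80 bytes of headers,
 * ptr_is_aligned((char *)buf + 80, 4096) is false, so passing that
 * address to a write on an O_DIRECT file descriptor may be rejected.
 * ptr_is_aligned((char *)buf + 4096, 4096) is true, which is where an
 * aligned payload would start.
 */
```

Without alignment, the server either copies the payload into a second aligned
buffer or sees the write fail, which matches the -EINVAL described above.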

Comments

Bernd Schubert July 3, 2024, 11:59 a.m. UTC | #1
On 7/2/24 18:31, Bernd Schubert wrote:
> Read/writes IOs should be page aligned as fuse server
> might need to copy data to another buffer otherwise in
> order to fulfill network or device storage requirements.

Sorry, the subject line and the description above wrongly mention reads -
this change is about writes only and is also only required for writes.



Thanks,
Bernd
Josef Bacik July 3, 2024, 3:15 p.m. UTC | #2
On Tue, Jul 02, 2024 at 06:31:08PM +0200, Bernd Schubert wrote:

Can I see the libfuse side of this?  I'm confused why we need the align_size at
all?  Is it enough to just say that this connection is aligned, negotiate what
the alignment is up front, and then avoid sending it along on every write?
Thanks,

Josef
Bernd Schubert July 3, 2024, 3:58 p.m. UTC | #3
On 7/3/24 17:15, Josef Bacik wrote:
> On Tue, Jul 02, 2024 at 06:31:08PM +0200, Bernd Schubert wrote:
> 
> Can I see the libfuse side of this?  I'm confused why we need the align_size at
> all?  Is it enough to just say that this connection is aligned, negotiate what
> the alignment is up front, and then avoid sending it along on every write?

Sure, I had forgotten to post it
https://github.com/bsbernd/libfuse/commit/89049d066efade047a72bcd1af8ad68061b11e7c

We could also just act on fc->align_writes / FUSE_ALIGN_WRITES and always use
sizeof(struct fuse_in_header) + sizeof(struct fuse_write_in) in libfuse, which would
avoid sending it inside of fuse_write_in. We would still need to add it to struct fuse_in_arg,
unless you want to check the request type within fuse_copy_args().

The part I don't like in general about the current fuse header handling (besides alignment)
is that any header size change will break the fuse server and therefore needs to be handled
very carefully. See for example libfuse commit 681a0c1178fa.



Thanks,
Bernd
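
To make the fixed-offset idea above concrete, here is a minimal sketch of the
arithmetic a server could use if the payload offset is derived only from the
two header sizes and a negotiated alignment. The function names are
hypothetical, and the two size constants (40 bytes each) are assumptions about
the current uapi structs that should be checked against
include/uapi/linux/fuse.h; a real server would use sizeof() instead.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed sizes of struct fuse_in_header and struct fuse_write_in. */
#define DEMO_IN_HEADER_SIZE 40u
#define DEMO_WRITE_IN_SIZE  40u

/* Number of padding bytes needed so that hdr_len + pad is a multiple of align. */
static size_t pad_to_alignment(size_t hdr_len, size_t align)
{
	size_t rem = hdr_len % align;

	return rem ? align - rem : 0;
}

/* Offset within the request buffer at which the aligned payload starts. */
static size_t aligned_payload_offset(size_t hdr_len, size_t align)
{
	return hdr_len + pad_to_alignment(hdr_len, align);
}
```

With the assumed sizes and a 4096-byte alignment, the headers occupy 80 bytes,
the padding is 4016 bytes, and the payload starts at offset 4096, so nothing
extra has to be sent with each write.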
Josef Bacik July 3, 2024, 5:30 p.m. UTC | #4
On Wed, Jul 03, 2024 at 05:58:20PM +0200, Bernd Schubert wrote:
> 
> 
> On 7/3/24 17:15, Josef Bacik wrote:
> > On Tue, Jul 02, 2024 at 06:31:08PM +0200, Bernd Schubert wrote:
> > 
> > Can I see the libfuse side of this?  I'm confused why we need the align_size at
> > all?  Is it enough to just say that this connection is aligned, negotiate what
> > the alignment is up front, and then avoid sending it along on every write?
> 
> Sure, I had forgotten to post it
> https://github.com/bsbernd/libfuse/commit/89049d066efade047a72bcd1af8ad68061b11e7c
> 
> We could also just act on fc->align_writes / FUSE_ALIGN_WRITES and always use 
> sizeof(struct fuse_in_header) + sizeof(struct fuse_write_in) in libfuse and would
> avoid to send it inside of fuse_write_in. We still need to add it to struct fuse_in_arg,
> unless you want to check the request type within fuse_copy_args().

I think I like this approach better, at the very least it allows us to use the
padding for other silly things in the future.

> 
> The part I don't like in general about current fuse header handling (besides alignment)
> is that any header size changes will break fuse server and therefore need to be very
> carefully handled. See for example libfuse commit 681a0c1178fa.
> 

Agreed, if we could have the length of the control struct in the header, then
things would be a lot simpler to extend later on, but here we are.  Thanks,

Josef
Joanne Koong July 3, 2024, 5:49 p.m. UTC | #5
On Wed, Jul 3, 2024 at 10:30 AM Josef Bacik <josef@toxicpanda.com> wrote:
>
> On Wed, Jul 03, 2024 at 05:58:20PM +0200, Bernd Schubert wrote:
> >
> >
> > On 7/3/24 17:15, Josef Bacik wrote:
> > > On Tue, Jul 02, 2024 at 06:31:08PM +0200, Bernd Schubert wrote:
> > >
> > > Can I see the libfuse side of this?  I'm confused why we need the align_size at
> > > all?  Is it enough to just say that this connection is aligned, negotiate what
> > > the alignment is up front, and then avoid sending it along on every write?
> >
> > Sure, I had forgotten to post it
> > https://github.com/bsbernd/libfuse/commit/89049d066efade047a72bcd1af8ad68061b11e7c
> >
> > We could also just act on fc->align_writes / FUSE_ALIGN_WRITES and always use
> > sizeof(struct fuse_in_header) + sizeof(struct fuse_write_in) in libfuse and would
> > avoid to send it inside of fuse_write_in. We still need to add it to struct fuse_in_arg,
> > unless you want to check the request type within fuse_copy_args().
>
> I think I like this approach better, at the very least it allows us to use the
> padding for other silly things in the future.
>

This approach seems cleaner to me as well.
I also like the idea of having callers pass in whether alignment
should be done or not to fuse_copy_args() instead of adding
"align_writes" to struct fuse_in_arg.

Thanks,
Joanne

> >
> > The part I don't like in general about current fuse header handling (besides alignment)
> > is that any header size changes will break fuse server and therefore need to be very
> > carefully handled. See for example libfuse commit 681a0c1178fa.
> >
>
> Agreed, if we could have the length of the control struct in the header then
> then things would be a lot simpler to extend later on, but here we are.  Thanks,
>
> Josef
>
Bernd Schubert July 3, 2024, 6:07 p.m. UTC | #6
On 7/3/24 19:49, Joanne Koong wrote:
> On Wed, Jul 3, 2024 at 10:30 AM Josef Bacik <josef@toxicpanda.com> wrote:
>>
>> On Wed, Jul 03, 2024 at 05:58:20PM +0200, Bernd Schubert wrote:
>>>
>>>
>>> On 7/3/24 17:15, Josef Bacik wrote:
>>>> On Tue, Jul 02, 2024 at 06:31:08PM +0200, Bernd Schubert wrote:
>>>>
>>>> Can I see the libfuse side of this?  I'm confused why we need the align_size at
>>>> all?  Is it enough to just say that this connection is aligned, negotiate what
>>>> the alignment is up front, and then avoid sending it along on every write?
>>>
>>> Sure, I had forgotten to post it
>>> https://github.com/bsbernd/libfuse/commit/89049d066efade047a72bcd1af8ad68061b11e7c
>>>
>>> We could also just act on fc->align_writes / FUSE_ALIGN_WRITES and always use
>>> sizeof(struct fuse_in_header) + sizeof(struct fuse_write_in) in libfuse and would
>>> avoid to send it inside of fuse_write_in. We still need to add it to struct fuse_in_arg,
>>> unless you want to check the request type within fuse_copy_args().
>>
>> I think I like this approach better, at the very least it allows us to use the
>> padding for other silly things in the future.
>>
> 
> This approach seems cleaner to me as well.
> I also like the idea of having callers pass in whether alignment
> should be done or not to fuse_copy_args() instead of adding
> "align_writes" to struct fuse_in_arg.

There is no FUSE_WRITE-specific caller of fuse_copy_args(); it is called
from fuse_dev_do_read() for all request types. I'm going to add request
parsing within fuse_copy_args(); I can't decide which of the two
versions I like less.

Thanks,
Bernd
Joanne Koong July 3, 2024, 8:28 p.m. UTC | #7
On Wed, Jul 3, 2024 at 11:08 AM Bernd Schubert <bschubert@ddn.com> wrote:
>
> On 7/3/24 19:49, Joanne Koong wrote:
> > On Wed, Jul 3, 2024 at 10:30 AM Josef Bacik <josef@toxicpanda.com> wrote:
> >>
> >> On Wed, Jul 03, 2024 at 05:58:20PM +0200, Bernd Schubert wrote:
> >>>
> >>>
> >>> On 7/3/24 17:15, Josef Bacik wrote:
> >>>> On Tue, Jul 02, 2024 at 06:31:08PM +0200, Bernd Schubert wrote:
> >>>>
> >>>> Can I see the libfuse side of this?  I'm confused why we need the align_size at
> >>>> all?  Is it enough to just say that this connection is aligned, negotiate what
> >>>> the alignment is up front, and then avoid sending it along on every write?
> >>>
> >>> Sure, I had forgotten to post it
> >>> https://github.com/bsbernd/libfuse/commit/89049d066efade047a72bcd1af8ad68061b11e7c
> >>>
> >>> We could also just act on fc->align_writes / FUSE_ALIGN_WRITES and always use
> >>> sizeof(struct fuse_in_header) + sizeof(struct fuse_write_in) in libfuse and would
> >>> avoid to send it inside of fuse_write_in. We still need to add it to struct fuse_in_arg,
> >>> unless you want to check the request type within fuse_copy_args().
> >>
> >> I think I like this approach better, at the very least it allows us to use the
> >> padding for other silly things in the future.
> >>
> >
> > This approach seems cleaner to me as well.
> > I also like the idea of having callers pass in whether alignment
> > should be done or not to fuse_copy_args() instead of adding
> > "align_writes" to struct fuse_in_arg.
>
> There is no caller for FUSE_WRITE for fuse_copy_args(), but it is called
> from fuse_dev_do_read for all request types. I'm going to add in request
> parsing within fuse_copy_args, I can't decide myself which of both
> versions I like less.

Sorry, I should have clarified better :) By callers, I meant callers of
fuse_copy_args(). I'm still getting up to speed with the fuse code, but
it looks like it gets called by both fuse_dev_do_read() and
fuse_dev_do_write() (through copy_out_args() -> fuse_copy_args()). The
cleanest solution to me seems to be to pass in from those callers
whether the request should be page-aligned after the headers or not,
instead of doing the request parsing within fuse_copy_args() itself. I
think if we do the request parsing within fuse_copy_args(), then we
would also need some way to differentiate between FUSE_WRITE
requests from the dev_do_read vs dev_do_write side (since, as I
understand it, writes only need to be aligned for dev_do_read write
requests).

Thanks,
Joanne

>
> Thanks,
> Bernd
>
Bernd Schubert July 3, 2024, 8:44 p.m. UTC | #8
On 7/3/24 22:28, Joanne Koong wrote:
> On Wed, Jul 3, 2024 at 11:08 AM Bernd Schubert <bschubert@ddn.com> wrote:
>>
>> On 7/3/24 19:49, Joanne Koong wrote:
>>> On Wed, Jul 3, 2024 at 10:30 AM Josef Bacik <josef@toxicpanda.com> wrote:
>>>>
>>>> On Wed, Jul 03, 2024 at 05:58:20PM +0200, Bernd Schubert wrote:
>>>>>
>>>>>
>>>>> On 7/3/24 17:15, Josef Bacik wrote:
>>>>>> On Tue, Jul 02, 2024 at 06:31:08PM +0200, Bernd Schubert wrote:
>>>>>>
>>>>>> Can I see the libfuse side of this?  I'm confused why we need the align_size at
>>>>>> all?  Is it enough to just say that this connection is aligned, negotiate what
>>>>>> the alignment is up front, and then avoid sending it along on every write?
>>>>>
>>>>> Sure, I had forgotten to post it
>>>>> https://github.com/bsbernd/libfuse/commit/89049d066efade047a72bcd1af8ad68061b11e7c
>>>>>
>>>>> We could also just act on fc->align_writes / FUSE_ALIGN_WRITES and always use
>>>>> sizeof(struct fuse_in_header) + sizeof(struct fuse_write_in) in libfuse and would
>>>>> avoid to send it inside of fuse_write_in. We still need to add it to struct fuse_in_arg,
>>>>> unless you want to check the request type within fuse_copy_args().
>>>>
>>>> I think I like this approach better, at the very least it allows us to use the
>>>> padding for other silly things in the future.
>>>>
>>>
>>> This approach seems cleaner to me as well.
>>> I also like the idea of having callers pass in whether alignment
>>> should be done or not to fuse_copy_args() instead of adding
>>> "align_writes" to struct fuse_in_arg.
>>
>> There is no caller for FUSE_WRITE for fuse_copy_args(), but it is called
>> from fuse_dev_do_read for all request types. I'm going to add in request
>> parsing within fuse_copy_args, I can't decide myself which of both
>> versions I like less.
> 
> Sorry I should have clarified better :) By callers, I meant callers to
> fuse_copy_args(). I'm still getting up to speed with the fuse code but
> it looks like it gets called by both fuse_dev_do_read and
> fuse_dev_do_write (through copy_out_args() -> fuse_copy_args()). The
> cleanest solution to me seems like to pass in from those callers
> whether the request should be page-aligned after the headers or not,
> instead of doing the request parsing within fuse_copy_args() itself. I
> think if we do the request parsing within fuse_copy_args() then we
> would also need to have some way to differentiate between FUSE_WRITE
> requests from the dev_do_read vs dev_do_write side (since, as I
> understand it, writes only needs to be aligned for dev_do_read write
> requests).

fuse_dev_do_write() is used to submit results from the fuse server
(userspace), i.e. it is not interesting here. If we don't parse in
fuse_copy_args(), we would have to do that in fuse_dev_do_read() - it
doesn't have knowledge about the request it handles either - it just
takes from the lists what is there. So if we don't want to have it encoded
in fuse_in_arg, there has to be request type checking. Given the existing
number of conditions in fuse_dev_do_read(), I would like to avoid adding
even more there.


Thanks,
Bernd
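
The "encode it in the argument" variant discussed above can be sketched in
plain userspace C as follows. This is only an illustration: struct
demo_in_arg and demo_copy_args() are hypothetical stand-ins, not the kernel's
struct fuse_in_arg or fuse_copy_args(), and the choice to pad *before* an
argument that carries a non-zero align_size is an assumption made here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an input argument descriptor. */
struct demo_in_arg {
	size_t size;
	const void *value;
	size_t align_size;	/* 0 means no alignment requested */
};

/*
 * Copy all arguments into 'dst', starting at 'start_off'. An argument with
 * a non-zero align_size is placed at the next multiple of that alignment;
 * the gap is zero-filled. Returns the end offset, or 0 if 'dst' is too small.
 */
static size_t demo_copy_args(unsigned char *dst, size_t dst_len,
			     const struct demo_in_arg *args, size_t nargs,
			     size_t start_off)
{
	size_t off = start_off;

	if (off > dst_len)
		return 0;

	for (size_t i = 0; i < nargs; i++) {
		size_t align = args[i].align_size;

		if (align) {
			size_t rem = off % align;
			size_t pad = rem ? align - rem : 0;

			if (pad > dst_len - off)
				return 0;
			memset(dst + off, 0, pad);
			off += pad;
		}
		if (args[i].size > dst_len - off)
			return 0;
		if (args[i].size)
			memcpy(dst + off, args[i].value, args[i].size);
		off += args[i].size;
	}
	return off;
}
```

The copy loop never looks at the request type; only the code that fills in the
write arguments sets align_size, and every other request leaves it at zero.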
Josef Bacik July 4, 2024, 3:10 p.m. UTC | #9
On Wed, Jul 03, 2024 at 10:44:28PM +0200, Bernd Schubert wrote:
> 
> 
> On 7/3/24 22:28, Joanne Koong wrote:
> > On Wed, Jul 3, 2024 at 11:08 AM Bernd Schubert <bschubert@ddn.com> wrote:
> >>
> >> On 7/3/24 19:49, Joanne Koong wrote:
> >>> On Wed, Jul 3, 2024 at 10:30 AM Josef Bacik <josef@toxicpanda.com> wrote:
> >>>>
> >>>> On Wed, Jul 03, 2024 at 05:58:20PM +0200, Bernd Schubert wrote:
> >>>>>
> >>>>>
> >>>>> On 7/3/24 17:15, Josef Bacik wrote:
> >>>>>> On Tue, Jul 02, 2024 at 06:31:08PM +0200, Bernd Schubert wrote:
> >>>>>>
> >>>>>> Can I see the libfuse side of this?  I'm confused why we need the align_size at
> >>>>>> all?  Is it enough to just say that this connection is aligned, negotiate what
> >>>>>> the alignment is up front, and then avoid sending it along on every write?
> >>>>>
> >>>>> Sure, I had forgotten to post it
> >>>>> https://github.com/bsbernd/libfuse/commit/89049d066efade047a72bcd1af8ad68061b11e7c
> >>>>>
> >>>>> We could also just act on fc->align_writes / FUSE_ALIGN_WRITES and always use
> >>>>> sizeof(struct fuse_in_header) + sizeof(struct fuse_write_in) in libfuse and would
> >>>>> avoid to send it inside of fuse_write_in. We still need to add it to struct fuse_in_arg,
> >>>>> unless you want to check the request type within fuse_copy_args().
> >>>>
> >>>> I think I like this approach better, at the very least it allows us to use the
> >>>> padding for other silly things in the future.
> >>>>
> >>>
> >>> This approach seems cleaner to me as well.
> >>> I also like the idea of having callers pass in whether alignment
> >>> should be done or not to fuse_copy_args() instead of adding
> >>> "align_writes" to struct fuse_in_arg.
> >>
> >> There is no caller for FUSE_WRITE for fuse_copy_args(), but it is called
> >> from fuse_dev_do_read for all request types. I'm going to add in request
> >> parsing within fuse_copy_args, I can't decide myself which of both
> >> versions I like less.
> > 
> > Sorry I should have clarified better :) By callers, I meant callers to
> > fuse_copy_args(). I'm still getting up to speed with the fuse code but
> > it looks like it gets called by both fuse_dev_do_read and
> > fuse_dev_do_write (through copy_out_args() -> fuse_copy_args()). The
> > cleanest solution to me seems like to pass in from those callers
> > whether the request should be page-aligned after the headers or not,
> > instead of doing the request parsing within fuse_copy_args() itself. I
> > think if we do the request parsing within fuse_copy_args() then we
> > would also need to have some way to differentiate between FUSE_WRITE
> > requests from the dev_do_read vs dev_do_write side (since, as I
> > understand it, writes only needs to be aligned for dev_do_read write
> > requests).
> 
> fuse_dev_do_write() is used to submit results from fuse server
> (userspace), i.e. not interesting here. If we don't parse in
> fuse_copy_args(), we would have to do that in fuse_dev_do_read() - it
> doesn't have knowledge about the request it handles either - it just
> takes from lists what is there. So if we don't want to have it encoded
> in fuse_in_arg, there has to request type checking. Given the existing
> number of conditions in fuse_dev_do_read, I would like to avoid adding
> in even more there.
> 

Your original alternative I think is better, leave it in fuse_in_arg and take it
out of the write arg.  Thanks,

Josef
Bernd Schubert July 4, 2024, 3:49 p.m. UTC | #10
On 7/4/24 17:10, Josef Bacik wrote:
> On Wed, Jul 03, 2024 at 10:44:28PM +0200, Bernd Schubert wrote:
>>
>>
>> On 7/3/24 22:28, Joanne Koong wrote:
>>> On Wed, Jul 3, 2024 at 11:08 AM Bernd Schubert <bschubert@ddn.com> wrote:
>>>>
>>>> On 7/3/24 19:49, Joanne Koong wrote:
>>>>> On Wed, Jul 3, 2024 at 10:30 AM Josef Bacik <josef@toxicpanda.com> wrote:
>>>>>>
>>>>>> On Wed, Jul 03, 2024 at 05:58:20PM +0200, Bernd Schubert wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 7/3/24 17:15, Josef Bacik wrote:
>>>>>>>> On Tue, Jul 02, 2024 at 06:31:08PM +0200, Bernd Schubert wrote:
>>>>>>>>
>>>>>>>> Can I see the libfuse side of this?  I'm confused why we need the align_size at
>>>>>>>> all?  Is it enough to just say that this connection is aligned, negotiate what
>>>>>>>> the alignment is up front, and then avoid sending it along on every write?
>>>>>>>
>>>>>>> Sure, I had forgotten to post it
>>>>>>> https://github.com/bsbernd/libfuse/commit/89049d066efade047a72bcd1af8ad68061b11e7c
>>>>>>>
>>>>>>> We could also just act on fc->align_writes / FUSE_ALIGN_WRITES and always use
>>>>>>> sizeof(struct fuse_in_header) + sizeof(struct fuse_write_in) in libfuse and would
>>>>>>> avoid sending it inside of fuse_write_in. We still need to add it to struct fuse_in_arg,
>>>>>>> unless you want to check the request type within fuse_copy_args().
>>>>>>
>>>>>> I think I like this approach better, at the very least it allows us to use the
>>>>>> padding for other silly things in the future.
>>>>>>
>>>>>
>>>>> This approach seems cleaner to me as well.
>>>>> I also like the idea of having callers pass in whether alignment
>>>>> should be done or not to fuse_copy_args() instead of adding
>>>>> "align_writes" to struct fuse_in_arg.
>>>>
>>>> There is no caller for FUSE_WRITE for fuse_copy_args(), but it is called
>>>> from fuse_dev_do_read for all request types. I'm going to add in request
>>>> parsing within fuse_copy_args, I can't decide myself which of both
>>>> versions I like less.
>>>
>>> Sorry I should have clarified better :) By callers, I meant callers to
>>> fuse_copy_args(). I'm still getting up to speed with the fuse code but
>>> it looks like it gets called by both fuse_dev_do_read and
>>> fuse_dev_do_write (through copy_out_args() -> fuse_copy_args()). The
>>> cleanest solution to me seems like to pass in from those callers
>>> whether the request should be page-aligned after the headers or not,
>>> instead of doing the request parsing within fuse_copy_args() itself. I
>>> think if we do the request parsing within fuse_copy_args() then we
>>> would also need to have some way to differentiate between FUSE_WRITE
>>> requests from the dev_do_read vs dev_do_write side (since, as I
>>> understand it, writes only need to be aligned for dev_do_read write
>>> requests).
>>
>> fuse_dev_do_write() is used to submit results from fuse server
>> (userspace), i.e. not interesting here. If we don't parse in
>> fuse_copy_args(), we would have to do that in fuse_dev_do_read() - it
>> doesn't have knowledge about the request it handles either - it just
>> takes from lists what is there. So if we don't want to have it encoded
>> in fuse_in_arg, there has to be request type checking. Given the existing
>> number of conditions in fuse_dev_do_read, I would like to avoid adding
>> in even more there.
>>
> 
> Your original alternative I think is better, leave it in fuse_in_arg and take it
> out of the write arg.  Thanks,

Thank you! I'm going to send out a new version in one or two days. 
Currently I believe we don't need the actual alignment size at all and 
can just use:

static void fuse_copy_align(struct fuse_copy_state *cs)
{
	cs->offset += cs->len;
	cs->len = 0;
}

Because offset and len are for the current page.

I need to ponder about it a bit, checking if there is any exception...
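
A rough userspace sketch of why the two variants should agree when the
server's buffer starts on a page boundary, and where they differ when it
does not. This is not kernel code: struct demo_copy_state, all demo_* names
and the 4 KiB page size are assumptions made for illustration, and the two
header structs are only assumed to mirror struct fuse_in_header and
struct fuse_write_in.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define DEMO_PAGE_SIZE 4096u	/* assumption: 4 KiB pages */

/* Hypothetical stand-in for the two fuse_copy_state fields used here */
struct demo_copy_state {
	unsigned int len;	/* bytes left in the current page */
	unsigned int offset;	/* position within the current page */
};

/* Layout assumed to mirror struct fuse_in_header */
struct demo_in_header {
	uint32_t len;
	uint32_t opcode;
	uint64_t unique;
	uint64_t nodeid;
	uint32_t uid;
	uint32_t gid;
	uint32_t pid;
	uint16_t total_extlen;
	uint16_t padding;
};

/* Layout assumed to mirror struct fuse_write_in */
struct demo_write_in {
	uint64_t fh;
	uint64_t offset;
	uint32_t size;
	uint32_t write_flags;
	uint64_t lock_owner;
	uint32_t flags;
	uint32_t padding;
};

/* Variant from the posted patch: skip an explicit number of bytes */
static int demo_align_explicit(struct demo_copy_state *cs,
			       unsigned int align_size)
{
	if (cs->len < align_size)
		return -EINVAL;
	cs->len -= align_size;
	cs->offset += align_size;
	return 0;
}

/* Simplified variant: skip whatever is left of the current page */
static void demo_align_to_page_end(struct demo_copy_state *cs)
{
	cs->offset += cs->len;
	cs->len = 0;
}

/*
 * Models the copy state right after both headers were copied into a
 * page whose usable part begins 'start' bytes into the page.
 */
static struct demo_copy_state demo_after_headers(unsigned int start)
{
	unsigned int hdr = sizeof(struct demo_in_header) +
			   sizeof(struct demo_write_in);
	struct demo_copy_state cs = {
		.len = DEMO_PAGE_SIZE - start - hdr,
		.offset = start + hdr,
	};

	return cs;
}
```

With start == 0 both helpers leave the state at the end of the page, so the
explicit align_size carries no extra information. With a non-zero start the
explicit variant returns -EINVAL while the simplified one still moves to the
page end; that difference is the kind of exception that would need checking.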


Thanks,
Bernd
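
For the variant favoured in the thread, where align_size is taken out of
fuse_write_in, the server would have to work out the payload position on
its own. A minimal sketch, assuming the request buffer starts on a page
boundary and that the kernel pads the headers up to the next page;
demo_payload_offset() is a hypothetical helper, not a libfuse function:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Round the combined header size up to the next multiple of page_size.
 * Under the assumptions above, this is where the write payload would
 * begin inside the request buffer.
 */
static size_t demo_payload_offset(size_t hdr_size, size_t page_size)
{
	return (hdr_size + page_size - 1) / page_size * page_size;
}
```

A server would pass sizeof(struct fuse_in_header) +
sizeof(struct fuse_write_in) as hdr_size and could obtain page_size from
sysconf(_SC_PAGESIZE).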

Patch

diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 9eb191b5c4de..a13793507d0b 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -1009,6 +1009,20 @@  static int fuse_copy_one(struct fuse_copy_state *cs, void *val, unsigned size)
 	return 0;
 }
 
+static int fuse_copy_align(struct fuse_copy_state *cs, unsigned int align_size)
+{
+	/* Might happen if fuse-server does not use page aligned buffers */
+	if (cs->len < align_size) {
+		pr_info("Remaining cs->len (%u) too small for alignment (%u)\n",
+			cs->len, align_size);
+		return -EINVAL;
+	}
+	cs->len -= align_size;
+	cs->offset += align_size;
+
+	return 0;
+}
+
 /* Copy request arguments to/from userspace buffer */
 static int fuse_copy_args(struct fuse_copy_state *cs, unsigned numargs,
 			  unsigned argpages, struct fuse_arg *args,
@@ -1019,10 +1033,13 @@  static int fuse_copy_args(struct fuse_copy_state *cs, unsigned numargs,
 
 	for (i = 0; !err && i < numargs; i++)  {
 		struct fuse_arg *arg = &args[i];
-		if (i == numargs - 1 && argpages)
+		if (i == numargs - 1 && argpages) {
 			err = fuse_copy_pages(cs, arg->size, zeroing);
-		else
+		} else {
 			err = fuse_copy_one(cs, arg->value, arg->size);
+			if (!err && arg->align_size)
+				err = fuse_copy_align(cs, arg->align_size);
+		}
 	}
 	return err;
 }
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index f39456c65ed7..0e1c540c6139 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -1062,6 +1062,18 @@  static void fuse_write_args_fill(struct fuse_io_args *ia, struct fuse_file *ff,
 		args->in_args[0].size = FUSE_COMPAT_WRITE_IN_SIZE;
 	else
 		args->in_args[0].size = sizeof(ia->write.in);
+
+	if (ff->fm->fc->align_writes) {
+		/*
+		 * add an extra alignment offset after the fuse header to
+		 * the next page
+		 */
+		args->in_args[0].align_size = PAGE_SIZE -
+					      sizeof(struct fuse_in_header) -
+					      sizeof(ia->write.in);
+		ia->write.in.align_size = args->in_args[0].align_size;
+	}
+
 	args->in_args[0].value = &ia->write.in;
 	args->in_args[1].size = count;
 	args->out_numargs = 1;
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index f23919610313..cb15153c6785 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -275,13 +275,15 @@  struct fuse_file {
 
 /** One input argument of a request */
 struct fuse_in_arg {
-	unsigned size;
+	unsigned int size;
+	unsigned int align_size;
 	const void *value;
 };
 
 /** One output argument of a request */
 struct fuse_arg {
-	unsigned size;
+	unsigned int size;
+	unsigned int align_size;
 	void *value;
 };
 
@@ -860,6 +862,9 @@  struct fuse_conn {
 	/** Passthrough support for read/write IO */
 	unsigned int passthrough:1;
 
+	/** Should (write) data be page aligned? */
+	unsigned int align_writes:1;
+
 	/** Maximum stack depth for passthrough backing files */
 	int max_stack_depth;
 
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index 99e44ea7d875..e8b42859f553 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -1331,6 +1331,9 @@  static void process_init_reply(struct fuse_mount *fm, struct fuse_args *args,
 			}
 			if (flags & FUSE_NO_EXPORT_SUPPORT)
 				fm->sb->s_export_op = &fuse_export_fid_operations;
+
+			if (flags & FUSE_ALIGN_WRITES)
+				fc->align_writes = 1;
 		} else {
 			ra_pages = fc->max_read / PAGE_SIZE;
 			fc->no_lock = 1;
@@ -1378,7 +1381,7 @@  void fuse_send_init(struct fuse_mount *fm)
 		FUSE_HANDLE_KILLPRIV_V2 | FUSE_SETXATTR_EXT | FUSE_INIT_EXT |
 		FUSE_SECURITY_CTX | FUSE_CREATE_SUPP_GROUP |
 		FUSE_HAS_EXPIRE_ONLY | FUSE_DIRECT_IO_ALLOW_MMAP |
-		FUSE_NO_EXPORT_SUPPORT | FUSE_HAS_RESEND;
+		FUSE_NO_EXPORT_SUPPORT | FUSE_HAS_RESEND | FUSE_ALIGN_WRITES;
 #ifdef CONFIG_FUSE_DAX
 	if (fm->fc->dax)
 		flags |= FUSE_MAP_ALIGNMENT;
diff --git a/include/uapi/linux/fuse.h b/include/uapi/linux/fuse.h
index d08b99d60f6f..4f5ddd7fe9b4 100644
--- a/include/uapi/linux/fuse.h
+++ b/include/uapi/linux/fuse.h
@@ -217,6 +217,11 @@ 
  *  - add backing_id to fuse_open_out, add FOPEN_PASSTHROUGH open flag
  *  - add FUSE_NO_EXPORT_SUPPORT init flag
  *  - add FUSE_NOTIFY_RESEND, add FUSE_HAS_RESEND init flag
+ *
+ * 7.41
+ *  - add FUSE_ALIGN_WRITES init flag
+ *  - make use of padding in struct fuse_write_in when
+ *    initialization agrees on aligned writes
  */
 
 #ifndef _LINUX_FUSE_H
@@ -252,7 +257,7 @@ 
 #define FUSE_KERNEL_VERSION 7
 
 /** Minor version number of this interface */
-#define FUSE_KERNEL_MINOR_VERSION 40
+#define FUSE_KERNEL_MINOR_VERSION 41
 
 /** The node ID of the root inode */
 #define FUSE_ROOT_ID 1
@@ -421,6 +426,8 @@  struct fuse_file_lock {
  * FUSE_NO_EXPORT_SUPPORT: explicitly disable export support
  * FUSE_HAS_RESEND: kernel supports resending pending requests, and the high bit
  *		    of the request ID indicates resend requests
+ * FUSE_ALIGN_WRITES: For opcode FUSE_WRITE, data follow the headers with a
+ *		      page aligned offset
  */
 #define FUSE_ASYNC_READ		(1 << 0)
 #define FUSE_POSIX_LOCKS	(1 << 1)
@@ -463,6 +470,7 @@  struct fuse_file_lock {
 #define FUSE_PASSTHROUGH	(1ULL << 37)
 #define FUSE_NO_EXPORT_SUPPORT	(1ULL << 38)
 #define FUSE_HAS_RESEND		(1ULL << 39)
+#define FUSE_ALIGN_WRITES	(1ULL << 40)
 
 /* Obsolete alias for FUSE_DIRECT_IO_ALLOW_MMAP */
 #define FUSE_DIRECT_IO_RELAX	FUSE_DIRECT_IO_ALLOW_MMAP
@@ -496,6 +504,7 @@  struct fuse_file_lock {
  * FUSE_WRITE_CACHE: delayed write from page cache, file handle is guessed
  * FUSE_WRITE_LOCKOWNER: lock_owner field is valid
  * FUSE_WRITE_KILL_SUIDGID: kill suid and sgid bits
+ * FUSE_WRITE_ALIGNED: Data are at a page size aligned offset
  */
 #define FUSE_WRITE_CACHE	(1 << 0)
 #define FUSE_WRITE_LOCKOWNER	(1 << 1)
@@ -812,7 +821,7 @@  struct fuse_write_in {
 	uint32_t	write_flags;
 	uint64_t	lock_owner;
 	uint32_t	flags;
-	uint32_t	padding;
+	uint32_t	align_size; /* extra alignment offset to the next page */
 };
 
 struct fuse_write_out {