diff mbox series

[v3,9/9] fuse: allow parallel dio writes with FUSE_DIRECT_IO_ALLOW_MMAP

Message ID 20240208170603.2078871-10-amir73il@gmail.com (mailing list archive)
State New
Headers show
Series fuse: inode IO modes and mmap + parallel dio | expand

Commit Message

Amir Goldstein Feb. 8, 2024, 5:06 p.m. UTC
Instead of denying caching mode on parallel dio open, deny caching
open only while parallel dio are in-progress and wait for in-progress
parallel dio writes before entering inode caching io mode.

This allows executing parallel dio when inode is not in caching mode
even if shared mmap is allowed, but no mmaps have been performed on
the inode in question.

An mmap on direct_io file now waits for all in-progress parallel dio
writes to complete, so parallel dio writes together with
FUSE_DIRECT_IO_ALLOW_MMAP is enabled by this commit.

Signed-off-by: Bernd Schubert <bschubert@ddn.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/fuse/file.c   | 28 ++++++++++++++++++++--------
 fs/fuse/fuse_i.h |  3 +++
 fs/fuse/iomode.c | 45 ++++++++++++++++++++++++++++++++++++---------
 3 files changed, 59 insertions(+), 17 deletions(-)

Comments

Miklos Szeredi Feb. 9, 2024, 10:50 a.m. UTC | #1
On Thu, 8 Feb 2024 at 18:09, Amir Goldstein <amir73il@gmail.com> wrote:

>  static int fuse_inode_get_io_cache(struct fuse_inode *fi)
>  {
> +       int err = 0;
> +
>         assert_spin_locked(&fi->lock);
> -       if (fi->iocachectr < 0)
> -               return -ETXTBSY;
> -       if (fi->iocachectr++ == 0)
> -               set_bit(FUSE_I_CACHE_IO_MODE, &fi->state);
> -       return 0;
> +       /*
> +        * Setting the bit advises new direct-io writes to use an exclusive
> +        * lock - without it the wait below might be forever.
> +        */
> +       set_bit(FUSE_I_CACHE_IO_MODE, &fi->state);
> +       while (!err && fuse_is_io_cache_wait(fi)) {
> +               spin_unlock(&fi->lock);
> +               err = wait_event_killable(fi->direct_io_waitq,
> +                                         !fuse_is_io_cache_wait(fi));
> +               spin_lock(&fi->lock);
> +       }
> +       /*
> +        * Enter caching mode or clear the FUSE_I_CACHE_IO_MODE bit if we
> +        * failed to enter caching mode and no other caching open exists.
> +        */
> +       if (!err)
> +               fi->iocachectr++;
> +       else if (fi->iocachectr <= 0)
> +               clear_bit(FUSE_I_CACHE_IO_MODE, &fi->state);

This seems wrong:  if the current task is killed, and there's anther
task trying to get cached open mode, then clearing
FUSE_I_CACHE_IO_MODE will allow new parallel writes, breaking this
logic.

I'd suggest fixing this by not making the wait killable.  It's just a
nice to have, but without all the other waits being killable (e.g.
filesystem locks) it's just a drop in the ocean.

Thanks,
Miklos
Bernd Schubert Feb. 9, 2024, 11:21 a.m. UTC | #2
On 2/9/24 11:50, Miklos Szeredi wrote:
> On Thu, 8 Feb 2024 at 18:09, Amir Goldstein <amir73il@gmail.com> wrote:
> 
>>  static int fuse_inode_get_io_cache(struct fuse_inode *fi)
>>  {
>> +       int err = 0;
>> +
>>         assert_spin_locked(&fi->lock);
>> -       if (fi->iocachectr < 0)
>> -               return -ETXTBSY;
>> -       if (fi->iocachectr++ == 0)
>> -               set_bit(FUSE_I_CACHE_IO_MODE, &fi->state);
>> -       return 0;
>> +       /*
>> +        * Setting the bit advises new direct-io writes to use an exclusive
>> +        * lock - without it the wait below might be forever.
>> +        */
>> +       set_bit(FUSE_I_CACHE_IO_MODE, &fi->state);
>> +       while (!err && fuse_is_io_cache_wait(fi)) {
>> +               spin_unlock(&fi->lock);
>> +               err = wait_event_killable(fi->direct_io_waitq,
>> +                                         !fuse_is_io_cache_wait(fi));
>> +               spin_lock(&fi->lock);
>> +       }
>> +       /*
>> +        * Enter caching mode or clear the FUSE_I_CACHE_IO_MODE bit if we
>> +        * failed to enter caching mode and no other caching open exists.
>> +        */
>> +       if (!err)
>> +               fi->iocachectr++;
>> +       else if (fi->iocachectr <= 0)
>> +               clear_bit(FUSE_I_CACHE_IO_MODE, &fi->state);
> 
> This seems wrong:  if the current task is killed, and there's anther
> task trying to get cached open mode, then clearing
> FUSE_I_CACHE_IO_MODE will allow new parallel writes, breaking this
> logic.

This is called holding a spin lock, another task cannot enter here?
Neither can direct-IO, because it is also locked out. The bit helps DIO
code to avoid trying to do parallel DIO without the need to take a spin
lock. When DIO decides it wants to do parallel IO, it first has to get
past fi->iocachectr < 0 - if there is another task trying to do cache
IO, either DIO gets < 0 first and the other cache task has to wait, or
cache tasks gets > 0 and dio will continue with the exclusive lock. Or
do I miss something?


> 
> I'd suggest fixing this by not making the wait killable.  It's just a
> nice to have, but without all the other waits being killable (e.g.
> filesystem locks) it's just a drop in the ocean.


Thanks,
Bernd
Bernd Schubert Feb. 9, 2024, 11:48 a.m. UTC | #3
On 2/9/24 12:21, Bernd Schubert wrote:
> 
> 
> On 2/9/24 11:50, Miklos Szeredi wrote:
>> On Thu, 8 Feb 2024 at 18:09, Amir Goldstein <amir73il@gmail.com> wrote:
>>
>>>  static int fuse_inode_get_io_cache(struct fuse_inode *fi)
>>>  {
>>> +       int err = 0;
>>> +
>>>         assert_spin_locked(&fi->lock);
>>> -       if (fi->iocachectr < 0)
>>> -               return -ETXTBSY;
>>> -       if (fi->iocachectr++ == 0)
>>> -               set_bit(FUSE_I_CACHE_IO_MODE, &fi->state);
>>> -       return 0;
>>> +       /*
>>> +        * Setting the bit advises new direct-io writes to use an exclusive
>>> +        * lock - without it the wait below might be forever.
>>> +        */
>>> +       set_bit(FUSE_I_CACHE_IO_MODE, &fi->state);
>>> +       while (!err && fuse_is_io_cache_wait(fi)) {
>>> +               spin_unlock(&fi->lock);
>>> +               err = wait_event_killable(fi->direct_io_waitq,
>>> +                                         !fuse_is_io_cache_wait(fi));
>>> +               spin_lock(&fi->lock);
>>> +       }
>>> +       /*
>>> +        * Enter caching mode or clear the FUSE_I_CACHE_IO_MODE bit if we
>>> +        * failed to enter caching mode and no other caching open exists.
>>> +        */
>>> +       if (!err)
>>> +               fi->iocachectr++;
>>> +       else if (fi->iocachectr <= 0)
>>> +               clear_bit(FUSE_I_CACHE_IO_MODE, &fi->state);
>>
>> This seems wrong:  if the current task is killed, and there's anther
>> task trying to get cached open mode, then clearing
>> FUSE_I_CACHE_IO_MODE will allow new parallel writes, breaking this
>> logic.
> 
> This is called holding a spin lock, another task cannot enter here?
> Neither can direct-IO, because it is also locked out. The bit helps DIO
> code to avoid trying to do parallel DIO without the need to take a spin
> lock. When DIO decides it wants to do parallel IO, it first has to get
> past fi->iocachectr < 0 - if there is another task trying to do cache
> IO, either DIO gets < 0 first and the other cache task has to wait, or
> cache tasks gets > 0 and dio will continue with the exclusive lock. Or
> do I miss something?

Now I see what you mean, there is an unlock and another task might have also already set the bit

I think this should do

diff --git a/fs/fuse/iomode.c b/fs/fuse/iomode.c
index acd0833ae873..7c22edd674cb 100644
--- a/fs/fuse/iomode.c
+++ b/fs/fuse/iomode.c
@@ -41,6 +41,8 @@ static int fuse_inode_get_io_cache(struct fuse_inode *fi)
                err = wait_event_killable(fi->direct_io_waitq,
                                          !fuse_is_io_cache_wait(fi));
                spin_lock(&fi->lock);
+               if (!err)
+			/* Another interrupted task might have unset it */
+                       set_bit(FUSE_I_CACHE_IO_MODE, &fi->state);
        }
        /*
         * Enter caching mode or clear the FUSE_I_CACHE_IO_MODE bit if we
Amir Goldstein Feb. 9, 2024, 12:12 p.m. UTC | #4
On Fri, Feb 9, 2024 at 1:48 PM Bernd Schubert
<bernd.schubert@fastmail.fm> wrote:
>
>
>
> On 2/9/24 12:21, Bernd Schubert wrote:
> >
> >
> > On 2/9/24 11:50, Miklos Szeredi wrote:
> >> On Thu, 8 Feb 2024 at 18:09, Amir Goldstein <amir73il@gmail.com> wrote:
> >>
> >>>  static int fuse_inode_get_io_cache(struct fuse_inode *fi)
> >>>  {
> >>> +       int err = 0;
> >>> +
> >>>         assert_spin_locked(&fi->lock);
> >>> -       if (fi->iocachectr < 0)
> >>> -               return -ETXTBSY;
> >>> -       if (fi->iocachectr++ == 0)
> >>> -               set_bit(FUSE_I_CACHE_IO_MODE, &fi->state);
> >>> -       return 0;
> >>> +       /*
> >>> +        * Setting the bit advises new direct-io writes to use an exclusive
> >>> +        * lock - without it the wait below might be forever.
> >>> +        */
> >>> +       set_bit(FUSE_I_CACHE_IO_MODE, &fi->state);
> >>> +       while (!err && fuse_is_io_cache_wait(fi)) {
> >>> +               spin_unlock(&fi->lock);
> >>> +               err = wait_event_killable(fi->direct_io_waitq,
> >>> +                                         !fuse_is_io_cache_wait(fi));
> >>> +               spin_lock(&fi->lock);
> >>> +       }
> >>> +       /*
> >>> +        * Enter caching mode or clear the FUSE_I_CACHE_IO_MODE bit if we
> >>> +        * failed to enter caching mode and no other caching open exists.
> >>> +        */
> >>> +       if (!err)
> >>> +               fi->iocachectr++;
> >>> +       else if (fi->iocachectr <= 0)
> >>> +               clear_bit(FUSE_I_CACHE_IO_MODE, &fi->state);
> >>
> >> This seems wrong:  if the current task is killed, and there's anther
> >> task trying to get cached open mode, then clearing
> >> FUSE_I_CACHE_IO_MODE will allow new parallel writes, breaking this
> >> logic.
> >
> > This is called holding a spin lock, another task cannot enter here?
> > Neither can direct-IO, because it is also locked out. The bit helps DIO
> > code to avoid trying to do parallel DIO without the need to take a spin
> > lock. When DIO decides it wants to do parallel IO, it first has to get
> > past fi->iocachectr < 0 - if there is another task trying to do cache
> > IO, either DIO gets < 0 first and the other cache task has to wait, or
> > cache tasks gets > 0 and dio will continue with the exclusive lock. Or
> > do I miss something?
>
> Now I see what you mean, there is an unlock and another task might have also already set the bit
>
> I think this should do
>
> diff --git a/fs/fuse/iomode.c b/fs/fuse/iomode.c
> index acd0833ae873..7c22edd674cb 100644
> --- a/fs/fuse/iomode.c
> +++ b/fs/fuse/iomode.c
> @@ -41,6 +41,8 @@ static int fuse_inode_get_io_cache(struct fuse_inode *fi)
>                 err = wait_event_killable(fi->direct_io_waitq,
>                                           !fuse_is_io_cache_wait(fi));
>                 spin_lock(&fi->lock);
> +               if (!err)
> +                       /* Another interrupted task might have unset it */
> +                       set_bit(FUSE_I_CACHE_IO_MODE, &fi->state);
>         }
>         /*
>          * Enter caching mode or clear the FUSE_I_CACHE_IO_MODE bit if we

I think this race can happen even if we remove killable_
not sure - anyway, with fuse passthrough there is another error
condition:

        /*
         * Check if inode entered passthrough io mode while waiting for parallel
         * dio write completion.
         */
        if (fuse_inode_backing(fi))
                err = -ETXTBSY;

But in this condition, all waiting tasks should abort the wait,
so it does not seem a problem to clean the flag.

Anyway, IMO it is better to set the flag before every wait and on
success. Like below.

Thanks,
Amir.

--- a/fs/fuse/iomode.c
+++ b/fs/fuse/iomode.c
@@ -35,8 +35,8 @@ static int fuse_inode_get_io_cache(struct fuse_inode *fi)
         * Setting the bit advises new direct-io writes to use an exclusive
         * lock - without it the wait below might be forever.
         */
-       set_bit(FUSE_I_CACHE_IO_MODE, &fi->state);
        while (!err && fuse_is_io_cache_wait(fi)) {
+               set_bit(FUSE_I_CACHE_IO_MODE, &fi->state);
                spin_unlock(&fi->lock);
                err = wait_event_killable(fi->direct_io_waitq,
                                          !fuse_is_io_cache_wait(fi));
@@ -53,8 +53,8 @@ static int fuse_inode_get_io_cache(struct fuse_inode *fi)
         * Enter caching mode or clear the FUSE_I_CACHE_IO_MODE bit if we
         * failed to enter caching mode and no other caching open exists.
         */
-       if (!err)
-               fi->iocachectr++;
+       if (!err && fi->iocachectr++ == 0)
+               set_bit(FUSE_I_CACHE_IO_MODE, &fi->state);
        else if (fi->iocachectr <= 0)
                clear_bit(FUSE_I_CACHE_IO_MODE, &fi->state);
        return err;
Amir Goldstein Feb. 9, 2024, 12:19 p.m. UTC | #5
On Fri, Feb 9, 2024 at 2:12 PM Amir Goldstein <amir73il@gmail.com> wrote:
>
> On Fri, Feb 9, 2024 at 1:48 PM Bernd Schubert
> <bernd.schubert@fastmail.fm> wrote:
> >
> >
> >
> > On 2/9/24 12:21, Bernd Schubert wrote:
> > >
> > >
> > > On 2/9/24 11:50, Miklos Szeredi wrote:
> > >> On Thu, 8 Feb 2024 at 18:09, Amir Goldstein <amir73il@gmail.com> wrote:
> > >>
> > >>>  static int fuse_inode_get_io_cache(struct fuse_inode *fi)
> > >>>  {
> > >>> +       int err = 0;
> > >>> +
> > >>>         assert_spin_locked(&fi->lock);
> > >>> -       if (fi->iocachectr < 0)
> > >>> -               return -ETXTBSY;
> > >>> -       if (fi->iocachectr++ == 0)
> > >>> -               set_bit(FUSE_I_CACHE_IO_MODE, &fi->state);
> > >>> -       return 0;
> > >>> +       /*
> > >>> +        * Setting the bit advises new direct-io writes to use an exclusive
> > >>> +        * lock - without it the wait below might be forever.
> > >>> +        */
> > >>> +       set_bit(FUSE_I_CACHE_IO_MODE, &fi->state);
> > >>> +       while (!err && fuse_is_io_cache_wait(fi)) {
> > >>> +               spin_unlock(&fi->lock);
> > >>> +               err = wait_event_killable(fi->direct_io_waitq,
> > >>> +                                         !fuse_is_io_cache_wait(fi));
> > >>> +               spin_lock(&fi->lock);
> > >>> +       }
> > >>> +       /*
> > >>> +        * Enter caching mode or clear the FUSE_I_CACHE_IO_MODE bit if we
> > >>> +        * failed to enter caching mode and no other caching open exists.
> > >>> +        */
> > >>> +       if (!err)
> > >>> +               fi->iocachectr++;
> > >>> +       else if (fi->iocachectr <= 0)
> > >>> +               clear_bit(FUSE_I_CACHE_IO_MODE, &fi->state);
> > >>
> > >> This seems wrong:  if the current task is killed, and there's anther
> > >> task trying to get cached open mode, then clearing
> > >> FUSE_I_CACHE_IO_MODE will allow new parallel writes, breaking this
> > >> logic.
> > >
> > > This is called holding a spin lock, another task cannot enter here?
> > > Neither can direct-IO, because it is also locked out. The bit helps DIO
> > > code to avoid trying to do parallel DIO without the need to take a spin
> > > lock. When DIO decides it wants to do parallel IO, it first has to get
> > > past fi->iocachectr < 0 - if there is another task trying to do cache
> > > IO, either DIO gets < 0 first and the other cache task has to wait, or
> > > cache tasks gets > 0 and dio will continue with the exclusive lock. Or
> > > do I miss something?
> >
> > Now I see what you mean, there is an unlock and another task might have also already set the bit
> >
> > I think this should do
> >
> > diff --git a/fs/fuse/iomode.c b/fs/fuse/iomode.c
> > index acd0833ae873..7c22edd674cb 100644
> > --- a/fs/fuse/iomode.c
> > +++ b/fs/fuse/iomode.c
> > @@ -41,6 +41,8 @@ static int fuse_inode_get_io_cache(struct fuse_inode *fi)
> >                 err = wait_event_killable(fi->direct_io_waitq,
> >                                           !fuse_is_io_cache_wait(fi));
> >                 spin_lock(&fi->lock);
> > +               if (!err)
> > +                       /* Another interrupted task might have unset it */
> > +                       set_bit(FUSE_I_CACHE_IO_MODE, &fi->state);
> >         }
> >         /*
> >          * Enter caching mode or clear the FUSE_I_CACHE_IO_MODE bit if we
>
> I think this race can happen even if we remove killable_
> not sure - anyway, with fuse passthrough there is another error
> condition:
>
>         /*
>          * Check if inode entered passthrough io mode while waiting for parallel
>          * dio write completion.
>          */
>         if (fuse_inode_backing(fi))
>                 err = -ETXTBSY;
>
> But in this condition, all waiting tasks should abort the wait,
> so it does not seem a problem to clean the flag.
>
> Anyway, IMO it is better to set the flag before every wait and on
> success. Like below.
>
> Thanks,
> Amir.
>
> --- a/fs/fuse/iomode.c
> +++ b/fs/fuse/iomode.c
> @@ -35,8 +35,8 @@ static int fuse_inode_get_io_cache(struct fuse_inode *fi)
>          * Setting the bit advises new direct-io writes to use an exclusive
>          * lock - without it the wait below might be forever.
>          */
> -       set_bit(FUSE_I_CACHE_IO_MODE, &fi->state);
>         while (!err && fuse_is_io_cache_wait(fi)) {
> +               set_bit(FUSE_I_CACHE_IO_MODE, &fi->state);
>                 spin_unlock(&fi->lock);
>                 err = wait_event_killable(fi->direct_io_waitq,
>                                           !fuse_is_io_cache_wait(fi));
> @@ -53,8 +53,8 @@ static int fuse_inode_get_io_cache(struct fuse_inode *fi)
>          * Enter caching mode or clear the FUSE_I_CACHE_IO_MODE bit if we
>          * failed to enter caching mode and no other caching open exists.
>          */
> -       if (!err)
> -               fi->iocachectr++;
> +       if (!err && fi->iocachectr++ == 0)
> +               set_bit(FUSE_I_CACHE_IO_MODE, &fi->state);
>         else if (fi->iocachectr <= 0)
>                 clear_bit(FUSE_I_CACHE_IO_MODE, &fi->state);
>         return err;

Oops should be:

       if (!err) {
               fi->iocachectr++;
               set_bit(FUSE_I_CACHE_IO_MODE, &fi->state);
       } else if (fi->iocachectr <= 0) {
               clear_bit(FUSE_I_CACHE_IO_MODE, &fi->state);
       }
       return err;
Bernd Schubert Feb. 9, 2024, 12:19 p.m. UTC | #6
On 2/9/24 13:12, Amir Goldstein wrote:
> On Fri, Feb 9, 2024 at 1:48 PM Bernd Schubert
> <bernd.schubert@fastmail.fm> wrote:
>>
>>
>>
>> On 2/9/24 12:21, Bernd Schubert wrote:
>>>
>>>
>>> On 2/9/24 11:50, Miklos Szeredi wrote:
>>>> On Thu, 8 Feb 2024 at 18:09, Amir Goldstein <amir73il@gmail.com> wrote:
>>>>
>>>>>  static int fuse_inode_get_io_cache(struct fuse_inode *fi)
>>>>>  {
>>>>> +       int err = 0;
>>>>> +
>>>>>         assert_spin_locked(&fi->lock);
>>>>> -       if (fi->iocachectr < 0)
>>>>> -               return -ETXTBSY;
>>>>> -       if (fi->iocachectr++ == 0)
>>>>> -               set_bit(FUSE_I_CACHE_IO_MODE, &fi->state);
>>>>> -       return 0;
>>>>> +       /*
>>>>> +        * Setting the bit advises new direct-io writes to use an exclusive
>>>>> +        * lock - without it the wait below might be forever.
>>>>> +        */
>>>>> +       set_bit(FUSE_I_CACHE_IO_MODE, &fi->state);
>>>>> +       while (!err && fuse_is_io_cache_wait(fi)) {
>>>>> +               spin_unlock(&fi->lock);
>>>>> +               err = wait_event_killable(fi->direct_io_waitq,
>>>>> +                                         !fuse_is_io_cache_wait(fi));
>>>>> +               spin_lock(&fi->lock);
>>>>> +       }
>>>>> +       /*
>>>>> +        * Enter caching mode or clear the FUSE_I_CACHE_IO_MODE bit if we
>>>>> +        * failed to enter caching mode and no other caching open exists.
>>>>> +        */
>>>>> +       if (!err)
>>>>> +               fi->iocachectr++;
>>>>> +       else if (fi->iocachectr <= 0)
>>>>> +               clear_bit(FUSE_I_CACHE_IO_MODE, &fi->state);
>>>>
>>>> This seems wrong:  if the current task is killed, and there's anther
>>>> task trying to get cached open mode, then clearing
>>>> FUSE_I_CACHE_IO_MODE will allow new parallel writes, breaking this
>>>> logic.
>>>
>>> This is called holding a spin lock, another task cannot enter here?
>>> Neither can direct-IO, because it is also locked out. The bit helps DIO
>>> code to avoid trying to do parallel DIO without the need to take a spin
>>> lock. When DIO decides it wants to do parallel IO, it first has to get
>>> past fi->iocachectr < 0 - if there is another task trying to do cache
>>> IO, either DIO gets < 0 first and the other cache task has to wait, or
>>> cache tasks gets > 0 and dio will continue with the exclusive lock. Or
>>> do I miss something?
>>
>> Now I see what you mean, there is an unlock and another task might have also already set the bit
>>
>> I think this should do
>>
>> diff --git a/fs/fuse/iomode.c b/fs/fuse/iomode.c
>> index acd0833ae873..7c22edd674cb 100644
>> --- a/fs/fuse/iomode.c
>> +++ b/fs/fuse/iomode.c
>> @@ -41,6 +41,8 @@ static int fuse_inode_get_io_cache(struct fuse_inode *fi)
>>                 err = wait_event_killable(fi->direct_io_waitq,
>>                                           !fuse_is_io_cache_wait(fi));
>>                 spin_lock(&fi->lock);
>> +               if (!err)
>> +                       /* Another interrupted task might have unset it */
>> +                       set_bit(FUSE_I_CACHE_IO_MODE, &fi->state);
>>         }
>>         /*
>>          * Enter caching mode or clear the FUSE_I_CACHE_IO_MODE bit if we
> 
> I think this race can happen even if we remove killable_
> not sure - anyway, with fuse passthrough there is another error
> condition:
> 
>         /*
>          * Check if inode entered passthrough io mode while waiting for parallel
>          * dio write completion.
>          */
>         if (fuse_inode_backing(fi))
>                 err = -ETXTBSY;
> 
> But in this condition, all waiting tasks should abort the wait,
> so it does not seem a problem to clean the flag.
> 
> Anyway, IMO it is better to set the flag before every wait and on
> success. Like below.
> 
> Thanks,
> Amir.
> 
> --- a/fs/fuse/iomode.c
> +++ b/fs/fuse/iomode.c
> @@ -35,8 +35,8 @@ static int fuse_inode_get_io_cache(struct fuse_inode *fi)
>          * Setting the bit advises new direct-io writes to use an exclusive
>          * lock - without it the wait below might be forever.
>          */
> -       set_bit(FUSE_I_CACHE_IO_MODE, &fi->state);
>         while (!err && fuse_is_io_cache_wait(fi)) {
> +               set_bit(FUSE_I_CACHE_IO_MODE, &fi->state);
>                 spin_unlock(&fi->lock);
>                 err = wait_event_killable(fi->direct_io_waitq,
>                                           !fuse_is_io_cache_wait(fi));
> @@ -53,8 +53,8 @@ static int fuse_inode_get_io_cache(struct fuse_inode *fi)
>          * Enter caching mode or clear the FUSE_I_CACHE_IO_MODE bit if we
>          * failed to enter caching mode and no other caching open exists.
>          */
> -       if (!err)
> -               fi->iocachectr++;
> +       if (!err && fi->iocachectr++ == 0)
> +               set_bit(FUSE_I_CACHE_IO_MODE, &fi->state);
>         else if (fi->iocachectr <= 0)
>                 clear_bit(FUSE_I_CACHE_IO_MODE, &fi->state);
>         return err;


Yeah, that is fine with me. Basically every wait signals "please let me
in", instead of doing it a single time only.


Thanks,
Bernd
Miklos Szeredi Feb. 9, 2024, 1:26 p.m. UTC | #7
On Fri, 9 Feb 2024 at 13:12, Amir Goldstein <amir73il@gmail.com> wrote:

> I think this race can happen even if we remove killable_

Without _killable, the loop will exit with iocachectr >= 0, hence the
FUSE_I_CACHE_IO_MODE will not be cleared.

> not sure - anyway, with fuse passthrough there is another error
> condition:
>
>         /*
>          * Check if inode entered passthrough io mode while waiting for parallel
>          * dio write completion.
>          */
>         if (fuse_inode_backing(fi))
>                 err = -ETXTBSY;
>
> But in this condition, all waiting tasks should abort the wait,
> so it does not seem a problem to clean the flag.

Ah, this complicates things.  But I think it's safe to clear
FUSE_I_CACHE_IO_MODE in this case, since other
fuse_inode_get_io_cache() calls will also fail.

> Anyway, IMO it is better to set the flag before every wait and on
> success. Like below.

This would still have  the race, since there will be a window during
which the FUSE_I_CACHE_IO_MODE flag has been cleared and new parallel
writes can start, even though there are one or more waiters for cached
open.

Not that this would be a problem in practice, but I also don't see
removing the _killable being a big issue.

Thanks,
Miklos
Amir Goldstein Feb. 9, 2024, 3:28 p.m. UTC | #8
On Fri, Feb 9, 2024 at 3:27 PM Miklos Szeredi <miklos@szeredi.hu> wrote:
>
> On Fri, 9 Feb 2024 at 13:12, Amir Goldstein <amir73il@gmail.com> wrote:
>
> > I think this race can happen even if we remove killable_
>
> Without _killable, the loop will exit with iocachectr >= 0, hence the
> FUSE_I_CACHE_IO_MODE will not be cleared.
>
> > not sure - anyway, with fuse passthrough there is another error
> > condition:
> >
> >         /*
> >          * Check if inode entered passthrough io mode while waiting for parallel
> >          * dio write completion.
> >          */
> >         if (fuse_inode_backing(fi))
> >                 err = -ETXTBSY;
> >
> > But in this condition, all waiting tasks should abort the wait,
> > so it does not seem a problem to clean the flag.
>
> Ah, this complicates things.  But I think it's safe to clear
> FUSE_I_CACHE_IO_MODE in this case, since other
> fuse_inode_get_io_cache() calls will also fail.
>

Right.

> > Anyway, IMO it is better to set the flag before every wait and on
> > success. Like below.
>
> This would still have  the race, since there will be a window during
> which the FUSE_I_CACHE_IO_MODE flag has been cleared and new parallel
> writes can start, even though there are one or more waiters for cached
> open.
>
> Not that this would be a problem in practice, but I also don't see
> removing the _killable being a big issue.

ok. Remove killable.

Pushed branches fuse_io_mode-090224 and fuse-backing-fd-090224
with requested fixes.

Note that I had to update libfuse fuse-backing-fd branch, because when
removing FOPEN_CACHE_IO, I changed the constant value of
FOPEN_PASSTHOUGH.

Passes my sanity tests.
Bernd, please verify that I did not break anything on your end.

Thanks,
Amir.
Amir Goldstein March 17, 2024, 1:54 p.m. UTC | #9
On Fri, Feb 9, 2024 at 5:28 PM Amir Goldstein <amir73il@gmail.com> wrote:
>
> On Fri, Feb 9, 2024 at 3:27 PM Miklos Szeredi <miklos@szeredi.hu> wrote:
> >
> > On Fri, 9 Feb 2024 at 13:12, Amir Goldstein <amir73il@gmail.com> wrote:
> >
> > > I think this race can happen even if we remove killable_
> >
> > Without _killable, the loop will exit with iocachectr >= 0, hence the
> > FUSE_I_CACHE_IO_MODE will not be cleared.
> >
> > > not sure - anyway, with fuse passthrough there is another error
> > > condition:
> > >
> > >         /*
> > >          * Check if inode entered passthrough io mode while waiting for parallel
> > >          * dio write completion.
> > >          */
> > >         if (fuse_inode_backing(fi))
> > >                 err = -ETXTBSY;
> > >
> > > But in this condition, all waiting tasks should abort the wait,
> > > so it does not seem a problem to clean the flag.
> >
> > Ah, this complicates things.  But I think it's safe to clear
> > FUSE_I_CACHE_IO_MODE in this case, since other
> > fuse_inode_get_io_cache() calls will also fail.
> >
>
> Right.
>
> > > Anyway, IMO it is better to set the flag before every wait and on
> > > success. Like below.
> >
> > This would still have  the race, since there will be a window during
> > which the FUSE_I_CACHE_IO_MODE flag has been cleared and new parallel
> > writes can start, even though there are one or more waiters for cached
> > open.
> >
> > Not that this would be a problem in practice, but I also don't see
> > removing the _killable being a big issue.
>
> ok. Remove killable.
>
> Pushed branches fuse_io_mode-090224 and fuse-backing-fd-090224
> with requested fixes.
>
> Note that I had to update libfuse fuse-backing-fd branch, because when
> removing FOPEN_CACHE_IO, I changed the constant value of
> FOPEN_PASSTHOUGH.
>
> Passes my sanity tests.
> Bernd, please verify that I did not break anything on your end.
>

Miklos,

I see that you decided to drop the waiting for parallel dio logic
in the final version. Decided to simply or found a problem?

Also, FUSE passthrough patch 9/9 ("fuse: auto-invalidate inode
attributes in passthrough mode") was not included in the final version,
so these fstests are failing on non uptodate size/mtime/ctime after
mapped write:

Failures: generic/080 generic/120 generic/207 generic/215

Was it left out by mistake or for a reason?

Thanks,
Amir.

[1] https://lore.kernel.org/linux-fsdevel/20240206142453.1906268-10-amir73il@gmail.com/
Miklos Szeredi March 17, 2024, 7:31 p.m. UTC | #10
On Sun, 17 Mar 2024 at 14:54, Amir Goldstein <amir73il@gmail.com> wrote:>

> I see that you decided to drop the waiting for parallel dio logic
> in the final version. Decided to simply or found a problem?

I don't think I dropped this.  Which one are you thinking?

> Also, FUSE passthrough patch 9/9 ("fuse: auto-invalidate inode
> attributes in passthrough mode") was not included in the final version,
> so these fstests are failing on non uptodate size/mtime/ctime after
> mapped write:
>
> Failures: generic/080 generic/120 generic/207 generic/215
>
> Was it left out by mistake or for a reason?

Yes, this was deliberate.  For plain read/write it's simply
unnecessary, since it should get 100% hit rate on the c/mtime tests.

For mmap we do need something, and that something might be best done
by looking at backing inode times.  But even in that case I'd just
compare the times from a previous version of the backing inode to the
current version, instead of comparing with the cached values.

Thanks,
Miklos
Amir Goldstein March 17, 2024, 8 p.m. UTC | #11
On Sun, Mar 17, 2024 at 9:31 PM Miklos Szeredi <miklos@szeredi.hu> wrote:
>
> On Sun, 17 Mar 2024 at 14:54, Amir Goldstein <amir73il@gmail.com> wrote:>
>
> > I see that you decided to drop the waiting for parallel dio logic
> > in the final version. Decided to simply or found a problem?
>
> I don't think I dropped this.  Which one are you thinking?

My bad. just looked at the wrong diff.

>
> > Also, FUSE passthrough patch 9/9 ("fuse: auto-invalidate inode
> > attributes in passthrough mode") was not included in the final version,
> > so these fstests are failing on non uptodate size/mtime/ctime after
> > mapped write:
> >
> > Failures: generic/080 generic/120 generic/207 generic/215
> >
> > Was it left out by mistake or for a reason?
>
> Yes, this was deliberate.  For plain read/write it's simply
> unnecessary, since it should get 100% hit rate on the c/mtime tests.
>
> For mmap we do need something, and that something might be best done
> by looking at backing inode times.  But even in that case I'd just
> compare the times from a previous version of the backing inode to the
> current version, instead of comparing with the cached values.
>

Well, looking ahead to implementing more passthrough ops, I have
a use case for passthrough of getattr(), so if that will be a possible
configuration in the future, maybe we won't need to worry about
the cached values at all?

Thanks,
Amir.
Miklos Szeredi March 17, 2024, 8:02 p.m. UTC | #12
On Sun, 17 Mar 2024 at 21:00, Amir Goldstein <amir73il@gmail.com> wrote:

> Well, looking ahead to implementing more passthrough ops, I have
> a use case for passthrough of getattr(), so if that will be a possible
> configuration in the future, maybe we won't need to worry about
> the cached values at all?

Yes, if attributes are passed through, then things become simpler.  So
introducing that mode first and seeing if that covers all the use
cases is probably a good first step.

Thanks,
Miklos
diff mbox series

Patch

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 29e18e5a6f6c..eb226457c4bd 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -1335,6 +1335,7 @@  static bool fuse_dio_wr_exclusive_lock(struct kiocb *iocb, struct iov_iter *from
 	struct file *file = iocb->ki_filp;
 	struct fuse_file *ff = file->private_data;
 	struct inode *inode = file_inode(iocb->ki_filp);
+	struct fuse_inode *fi = get_fuse_inode(inode);
 
 	/* server side has to advise that it supports parallel dio writes */
 	if (!(ff->open_flags & FOPEN_PARALLEL_DIRECT_WRITES))
@@ -1346,11 +1347,9 @@  static bool fuse_dio_wr_exclusive_lock(struct kiocb *iocb, struct iov_iter *from
 	if (iocb->ki_flags & IOCB_APPEND)
 		return true;
 
-	/* combination opf page access and direct-io difficult, shared
-	 * locks actually introduce a conflict.
-	 */
-	if (get_fuse_conn(inode)->direct_io_allow_mmap)
-		return true;
+	/* shared locks are not allowed with parallel page cache IO */
+	if (test_bit(FUSE_I_CACHE_IO_MODE, &fi->state))
+		return false;
 
 	/* parallel dio beyond eof is at least for now not supported */
 	if (fuse_io_past_eof(iocb, from))
@@ -1370,10 +1369,14 @@  static void fuse_dio_lock(struct kiocb *iocb, struct iov_iter *from,
 	} else {
 		inode_lock_shared(inode);
 		/*
-		 * Previous check was without inode lock and might have raced,
-		 * check again.
+		 * New parallal dio allowed only if inode is not in caching
+		 * mode and denies new opens in caching mode. This check
+		 * should be performed only after taking shared inode lock.
+		 * Previous past eof check was without inode lock and might
+		 * have raced, so check it again.
 		 */
-		if (fuse_io_past_eof(iocb, from)) {
+		if (fuse_io_past_eof(iocb, from) ||
+		    fuse_file_uncached_io_start(inode) != 0) {
 			inode_unlock_shared(inode);
 			inode_lock(inode);
 			*exclusive = true;
@@ -1386,6 +1389,8 @@  static void fuse_dio_unlock(struct inode *inode, bool exclusive)
 	if (exclusive) {
 		inode_unlock(inode);
 	} else {
+		/* Allow opens in caching mode after last parallel dio end */
+		fuse_file_uncached_io_end(inode);
 		inode_unlock_shared(inode);
 	}
 }
@@ -2521,6 +2526,10 @@  static int fuse_file_mmap(struct file *file, struct vm_area_struct *vma)
 	if (FUSE_IS_DAX(file_inode(file)))
 		return fuse_dax_mmap(file, vma);
 
+	/*
+	 * FOPEN_DIRECT_IO handling is special compared to O_DIRECT,
+	 * as does not allow MAP_SHARED mmap without FUSE_DIRECT_IO_ALLOW_MMAP.
+	 */
 	if (ff->open_flags & FOPEN_DIRECT_IO) {
 		/*
 		 * Can't provide the coherency needed for MAP_SHARED
@@ -2533,6 +2542,8 @@  static int fuse_file_mmap(struct file *file, struct vm_area_struct *vma)
 
 		/*
 		 * First mmap of direct_io file enters caching inode io mode.
+		 * Also waits for parallel dio writers to go into serial mode
+		 * (exclusive instead of shared lock).
 		 */
 		rc = fuse_file_io_mmap(ff, file_inode(file));
 		if (rc)
@@ -3312,6 +3323,7 @@  void fuse_init_file_inode(struct inode *inode, unsigned int flags)
 	fi->writectr = 0;
 	fi->iocachectr = 0;
 	init_waitqueue_head(&fi->page_waitq);
+	init_waitqueue_head(&fi->direct_io_waitq);
 	fi->writepages = RB_ROOT;
 
 	if (IS_ENABLED(CONFIG_FUSE_DAX))
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index 5e5465f6a1ac..dede4378c719 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -129,6 +129,9 @@  struct fuse_inode {
 			/* Waitq for writepage completion */
 			wait_queue_head_t page_waitq;
 
+			/* waitq for direct-io completion */
+			wait_queue_head_t direct_io_waitq;
+
 			/* List of writepage requestst (pending or sent) */
 			struct rb_root writepages;
 		};
diff --git a/fs/fuse/iomode.c b/fs/fuse/iomode.c
index 13faae77aec4..acd0833ae873 100644
--- a/fs/fuse/iomode.c
+++ b/fs/fuse/iomode.c
@@ -12,18 +12,45 @@ 
 #include <linux/file.h>
 #include <linux/fs.h>
 
+/*
+ * Return true if need to wait for new opens in caching mode.
+ */
+static inline bool fuse_is_io_cache_wait(struct fuse_inode *fi)
+{
+	return READ_ONCE(fi->iocachectr) < 0;
+}
+
 /*
  * Request an open in caching mode.
+ * Blocks new parallel dio writes and waits for the in-progress parallel dio
+ * writes to complete.
  * Return 0 if in caching mode.
  */
 static int fuse_inode_get_io_cache(struct fuse_inode *fi)
 {
+	int err = 0;
+
 	assert_spin_locked(&fi->lock);
-	if (fi->iocachectr < 0)
-		return -ETXTBSY;
-	if (fi->iocachectr++ == 0)
-		set_bit(FUSE_I_CACHE_IO_MODE, &fi->state);
-	return 0;
+	/*
+	 * Setting the bit advises new direct-io writes to use an exclusive
+	 * lock - without it the wait below might be forever.
+	 */
+	set_bit(FUSE_I_CACHE_IO_MODE, &fi->state);
+	while (!err && fuse_is_io_cache_wait(fi)) {
+		spin_unlock(&fi->lock);
+		err = wait_event_killable(fi->direct_io_waitq,
+					  !fuse_is_io_cache_wait(fi));
+		spin_lock(&fi->lock);
+	}
+	/*
+	 * Enter caching mode or clear the FUSE_I_CACHE_IO_MODE bit if we
+	 * failed to enter caching mode and no other caching open exists.
+	 */
+	if (!err)
+		fi->iocachectr++;
+	else if (fi->iocachectr <= 0)
+		clear_bit(FUSE_I_CACHE_IO_MODE, &fi->state);
+	return err;
 }
 
 /*
@@ -102,10 +129,13 @@  int fuse_file_uncached_io_start(struct inode *inode)
 void fuse_file_uncached_io_end(struct inode *inode)
 {
 	struct fuse_inode *fi = get_fuse_inode(inode);
+	int uncached_io;
 
 	spin_lock(&fi->lock);
-	fuse_inode_allow_io_cache(fi);
+	uncached_io = fuse_inode_allow_io_cache(fi);
 	spin_unlock(&fi->lock);
+	if (!uncached_io)
+		wake_up(&fi->direct_io_waitq);
 }
 
 /* Open flags to determine regular file io mode */
@@ -155,13 +185,10 @@  int fuse_file_io_open(struct file *file, struct inode *inode)
 
 	/*
 	 * First caching file open enters caching inode io mode.
-	 * First parallel dio open denies caching inode io mode.
 	 */
 	err = 0;
 	if (ff->open_flags & FOPEN_CACHE_IO)
 		err = fuse_file_cached_io_start(inode);
-	else if (ff->open_flags & FOPEN_PARALLEL_DIRECT_WRITES)
-		err = fuse_file_uncached_io_start(inode);
 	if (err)
 		goto fail;