diff mbox series

[1/2] netfs: release the folio lock and put the folio before retrying

Message ID 20220701022947.10716-2-xiubli@redhat.com (mailing list archive)
State New
Headers show
Series netfs, ceph: fix the crash when unlocking the folio | expand

Commit Message

Xiubo Li July 1, 2022, 2:29 a.m. UTC
From: Xiubo Li <xiubli@redhat.com>

The lower layer filesystem should always make sure the folio is
locked and do the unlock and put the folio in netfs layer.

URL: https://tracker.ceph.com/issues/56423
Signed-off-by: Xiubo Li <xiubli@redhat.com>
---
 fs/netfs/buffered_read.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

Comments

Jeff Layton July 1, 2022, 10:38 a.m. UTC | #1
On Fri, 2022-07-01 at 10:29 +0800, xiubli@redhat.com wrote:
> From: Xiubo Li <xiubli@redhat.com>
> 
> The lower layer filesystem should always make sure the folio is
> locked and do the unlock and put the folio in netfs layer.
> 
> URL: https://tracker.ceph.com/issues/56423
> Signed-off-by: Xiubo Li <xiubli@redhat.com>
> ---
>  fs/netfs/buffered_read.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/netfs/buffered_read.c b/fs/netfs/buffered_read.c
> index 42f892c5712e..257fd37c2461 100644
> --- a/fs/netfs/buffered_read.c
> +++ b/fs/netfs/buffered_read.c
> @@ -351,8 +351,11 @@ int netfs_write_begin(struct netfs_inode *ctx,
>  		ret = ctx->ops->check_write_begin(file, pos, len, folio, _fsdata);
>  		if (ret < 0) {
>  			trace_netfs_failure(NULL, NULL, ret, netfs_fail_check_write_begin);
> -			if (ret == -EAGAIN)
> +			if (ret == -EAGAIN) {
> +				folio_unlock(folio);
> +				folio_put(folio);
>  				goto retry;
> +			}
>  			goto error;
>  		}
>  	}

I don't know here... I think it might be better to just expect that when
this function returns an error that the folio has already been unlocked.
Doing it this way will mean that you will lock and unlock the folio a
second time for no reason.

Maybe something like this instead?

diff --git a/fs/netfs/buffered_read.c b/fs/netfs/buffered_read.c
index 42f892c5712e..8ae7b0f4c909 100644
--- a/fs/netfs/buffered_read.c
+++ b/fs/netfs/buffered_read.c
@@ -353,7 +353,7 @@ int netfs_write_begin(struct netfs_inode *ctx,
                        trace_netfs_failure(NULL, NULL, ret, netfs_fail_check_write_begin);
                        if (ret == -EAGAIN)
                                goto retry;
-                       goto error;
+                       goto error_unlocked;
                }
        }
 
@@ -418,6 +418,7 @@ int netfs_write_begin(struct netfs_inode *ctx,
 error:
        folio_unlock(folio);
        folio_put(folio);
+error_unlocked:
        _leave(" = %d", ret);
        return ret;
 }
Xiubo Li July 4, 2022, 1:13 a.m. UTC | #2
On 7/1/22 6:38 PM, Jeff Layton wrote:
> On Fri, 2022-07-01 at 10:29 +0800, xiubli@redhat.com wrote:
>> From: Xiubo Li <xiubli@redhat.com>
>>
>> The lower layer filesystem should always make sure the folio is
>> locked and do the unlock and put the folio in netfs layer.
>>
>> URL: https://tracker.ceph.com/issues/56423
>> Signed-off-by: Xiubo Li <xiubli@redhat.com>
>> ---
>>   fs/netfs/buffered_read.c | 5 ++++-
>>   1 file changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/netfs/buffered_read.c b/fs/netfs/buffered_read.c
>> index 42f892c5712e..257fd37c2461 100644
>> --- a/fs/netfs/buffered_read.c
>> +++ b/fs/netfs/buffered_read.c
>> @@ -351,8 +351,11 @@ int netfs_write_begin(struct netfs_inode *ctx,
>>   		ret = ctx->ops->check_write_begin(file, pos, len, folio, _fsdata);
>>   		if (ret < 0) {
>>   			trace_netfs_failure(NULL, NULL, ret, netfs_fail_check_write_begin);
>> -			if (ret == -EAGAIN)
>> +			if (ret == -EAGAIN) {
>> +				folio_unlock(folio);
>> +				folio_put(folio);
>>   				goto retry;
>> +			}
>>   			goto error;
>>   		}
>>   	}
> I don't know here... I think it might be better to just expect that when
> this function returns an error that the folio has already been unlocked.
> Doing it this way will mean that you will lock and unlock the folio a
> second time for no reason.
>
> Maybe something like this instead?
>
> diff --git a/fs/netfs/buffered_read.c b/fs/netfs/buffered_read.c
> index 42f892c5712e..8ae7b0f4c909 100644
> --- a/fs/netfs/buffered_read.c
> +++ b/fs/netfs/buffered_read.c
> @@ -353,7 +353,7 @@ int netfs_write_begin(struct netfs_inode *ctx,
>                          trace_netfs_failure(NULL, NULL, ret, netfs_fail_check_write_begin);
>                          if (ret == -EAGAIN)
>                                  goto retry;
> -                       goto error;
> +                       goto error_unlocked;
>                  }
>          }
>   
> @@ -418,6 +418,7 @@ int netfs_write_begin(struct netfs_inode *ctx,
>   error:
>          folio_unlock(folio);
>          folio_put(folio);
> +error_unlocked:
>          _leave(" = %d", ret);
>          return ret;
>   }

Then the "afs" won't work correctly:

377 static int afs_check_write_begin(struct file *file, loff_t pos, 
unsigned len,
378                                  struct folio *folio, void **_fsdata)
379 {
380         struct afs_vnode *vnode = AFS_FS_I(file_inode(file));
381
382         return test_bit(AFS_VNODE_DELETED, &vnode->flags) ? -ESTALE : 0;
383 }

The "afs" does nothing with the folio lock.

-- Xiubo
Matthew Wilcox July 4, 2022, 2:10 a.m. UTC | #3
On Mon, Jul 04, 2022 at 09:13:44AM +0800, Xiubo Li wrote:
> On 7/1/22 6:38 PM, Jeff Layton wrote:
> > I don't know here... I think it might be better to just expect that when
> > this function returns an error that the folio has already been unlocked.
> > Doing it this way will mean that you will lock and unlock the folio a
> > second time for no reason.
> > 
> > Maybe something like this instead?
> > 
> > diff --git a/fs/netfs/buffered_read.c b/fs/netfs/buffered_read.c
> > index 42f892c5712e..8ae7b0f4c909 100644
> > --- a/fs/netfs/buffered_read.c
> > +++ b/fs/netfs/buffered_read.c
> > @@ -353,7 +353,7 @@ int netfs_write_begin(struct netfs_inode *ctx,
> >                          trace_netfs_failure(NULL, NULL, ret, netfs_fail_check_write_begin);
> >                          if (ret == -EAGAIN)
> >                                  goto retry;
> > -                       goto error;
> > +                       goto error_unlocked;
> >                  }
> >          }
> > @@ -418,6 +418,7 @@ int netfs_write_begin(struct netfs_inode *ctx,
> >   error:
> >          folio_unlock(folio);
> >          folio_put(folio);
> > +error_unlocked:
> >          _leave(" = %d", ret);
> >          return ret;
> >   }
> 
> Then the "afs" won't work correctly:
> 
> 377 static int afs_check_write_begin(struct file *file, loff_t pos, unsigned
> len,
> 378                                  struct folio *folio, void **_fsdata)
> 379 {
> 380         struct afs_vnode *vnode = AFS_FS_I(file_inode(file));
> 381
> 382         return test_bit(AFS_VNODE_DELETED, &vnode->flags) ? -ESTALE : 0;
> 383 }
> 
> The "afs" does nothing with the folio lock.

It's OK to fix AFS too.
Xiubo Li July 4, 2022, 2:40 a.m. UTC | #4
On 7/4/22 10:10 AM, Matthew Wilcox wrote:
> On Mon, Jul 04, 2022 at 09:13:44AM +0800, Xiubo Li wrote:
>> On 7/1/22 6:38 PM, Jeff Layton wrote:
>>> I don't know here... I think it might be better to just expect that when
>>> this function returns an error that the folio has already been unlocked.
>>> Doing it this way will mean that you will lock and unlock the folio a
>>> second time for no reason.
>>>
>>> Maybe something like this instead?
>>>
>>> diff --git a/fs/netfs/buffered_read.c b/fs/netfs/buffered_read.c
>>> index 42f892c5712e..8ae7b0f4c909 100644
>>> --- a/fs/netfs/buffered_read.c
>>> +++ b/fs/netfs/buffered_read.c
>>> @@ -353,7 +353,7 @@ int netfs_write_begin(struct netfs_inode *ctx,
>>>                           trace_netfs_failure(NULL, NULL, ret, netfs_fail_check_write_begin);
>>>                           if (ret == -EAGAIN)
>>>                                   goto retry;
>>> -                       goto error;
>>> +                       goto error_unlocked;
>>>                   }
>>>           }
>>> @@ -418,6 +418,7 @@ int netfs_write_begin(struct netfs_inode *ctx,
>>>    error:
>>>           folio_unlock(folio);
>>>           folio_put(folio);
>>> +error_unlocked:
>>>           _leave(" = %d", ret);
>>>           return ret;
>>>    }
>> Then the "afs" won't work correctly:
>>
>> 377 static int afs_check_write_begin(struct file *file, loff_t pos, unsigned
>> len,
>> 378                                  struct folio *folio, void **_fsdata)
>> 379 {
>> 380         struct afs_vnode *vnode = AFS_FS_I(file_inode(file));
>> 381
>> 382         return test_bit(AFS_VNODE_DELETED, &vnode->flags) ? -ESTALE : 0;
>> 383 }
>>
>> The "afs" does nothing with the folio lock.
> It's OK to fix AFS too.
>
Okay, will fix it. Thanks!

-- Xiubo
Xiubo Li July 4, 2022, 6:58 a.m. UTC | #5
On 7/1/22 6:38 PM, Jeff Layton wrote:
> On Fri, 2022-07-01 at 10:29 +0800, xiubli@redhat.com wrote:
>> From: Xiubo Li <xiubli@redhat.com>
>>
>> The lower layer filesystem should always make sure the folio is
>> locked and do the unlock and put the folio in netfs layer.
>>
>> URL: https://tracker.ceph.com/issues/56423
>> Signed-off-by: Xiubo Li <xiubli@redhat.com>
>> ---
>>   fs/netfs/buffered_read.c | 5 ++++-
>>   1 file changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/netfs/buffered_read.c b/fs/netfs/buffered_read.c
>> index 42f892c5712e..257fd37c2461 100644
>> --- a/fs/netfs/buffered_read.c
>> +++ b/fs/netfs/buffered_read.c
>> @@ -351,8 +351,11 @@ int netfs_write_begin(struct netfs_inode *ctx,
>>   		ret = ctx->ops->check_write_begin(file, pos, len, folio, _fsdata);
>>   		if (ret < 0) {
>>   			trace_netfs_failure(NULL, NULL, ret, netfs_fail_check_write_begin);
>> -			if (ret == -EAGAIN)
>> +			if (ret == -EAGAIN) {
>> +				folio_unlock(folio);
>> +				folio_put(folio);
>>   				goto retry;
>> +			}
>>   			goto error;
>>   		}
>>   	}
> I don't know here... I think it might be better to just expect that when
> this function returns an error that the folio has already been unlocked.
> Doing it this way will mean that you will lock and unlock the folio a
> second time for no reason.
>
> Maybe something like this instead?
>
> diff --git a/fs/netfs/buffered_read.c b/fs/netfs/buffered_read.c
> index 42f892c5712e..8ae7b0f4c909 100644
> --- a/fs/netfs/buffered_read.c
> +++ b/fs/netfs/buffered_read.c
> @@ -353,7 +353,7 @@ int netfs_write_begin(struct netfs_inode *ctx,
>                          trace_netfs_failure(NULL, NULL, ret, netfs_fail_check_write_begin);
>                          if (ret == -EAGAIN)
>                                  goto retry;
> -                       goto error;
> +                       goto error_unlocked;
>                  }
>          }
>   
> @@ -418,6 +418,7 @@ int netfs_write_begin(struct netfs_inode *ctx,
>   error:
>          folio_unlock(folio);
>          folio_put(folio);
> +error_unlocked:

Should we also put the folio in ceph and afs ? Won't it introduce 
something like use-after-free bug ?

Maybe we should unlock it in ceph and afs and put it in netfs layer.

-- Xiubo



>          _leave(" = %d", ret);
>          return ret;
>   }
>
David Howells July 5, 2022, 1:21 p.m. UTC | #6
Jeff Layton <jlayton@kernel.org> wrote:

> I don't know here... I think it might be better to just expect that when
> this function returns an error that the folio has already been unlocked.
> Doing it this way will mean that you will lock and unlock the folio a
> second time for no reason.

I seem to remember there was some reason you wanted the folio unlocking and
putting.  I guess you need to drop the ref to flush it.

Would it make sense for ->check_write_begin() to be passed a "struct folio
**folio" rather than "struct folio *folio" and then the filesystem can clear
*folio if it disposes of the page?

David
Jeff Layton July 5, 2022, 1:41 p.m. UTC | #7
On Tue, 2022-07-05 at 14:21 +0100, David Howells wrote:
> Jeff Layton <jlayton@kernel.org> wrote:
> 
> > I don't know here... I think it might be better to just expect that when
> > this function returns an error that the folio has already been unlocked.
> > Doing it this way will mean that you will lock and unlock the folio a
> > second time for no reason.
> 
> I seem to remember there was some reason you wanted the folio unlocking and
> putting.  I guess you need to drop the ref to flush it.
> 
> Would it make sense for ->check_write_begin() to be passed a "struct folio
> **folio" rather than "struct folio *folio" and then the filesystem can clear
> *folio if it disposes of the page?
> 

I'd be OK with that too.
Xiubo Li July 6, 2022, 12:58 a.m. UTC | #8
On 7/5/22 9:21 PM, David Howells wrote:
> Jeff Layton <jlayton@kernel.org> wrote:
>
>> I don't know here... I think it might be better to just expect that when
>> this function returns an error that the folio has already been unlocked.
>> Doing it this way will mean that you will lock and unlock the folio a
>> second time for no reason.
> I seem to remember there was some reason you wanted the folio unlocking and
> putting.  I guess you need to drop the ref to flush it.
>
> Would it make sense for ->check_write_begin() to be passed a "struct folio
> **folio" rather than "struct folio *folio" and then the filesystem can clear
> *folio if it disposes of the page?

Yeah, this also sounds good to me.

-- Xiubo


> David
>
diff mbox series

Patch

diff --git a/fs/netfs/buffered_read.c b/fs/netfs/buffered_read.c
index 42f892c5712e..257fd37c2461 100644
--- a/fs/netfs/buffered_read.c
+++ b/fs/netfs/buffered_read.c
@@ -351,8 +351,11 @@  int netfs_write_begin(struct netfs_inode *ctx,
 		ret = ctx->ops->check_write_begin(file, pos, len, folio, _fsdata);
 		if (ret < 0) {
 			trace_netfs_failure(NULL, NULL, ret, netfs_fail_check_write_begin);
-			if (ret == -EAGAIN)
+			if (ret == -EAGAIN) {
+				folio_unlock(folio);
+				folio_put(folio);
 				goto retry;
+			}
 			goto error;
 		}
 	}