diff mbox series

nfsd: remove unneeded EEXIST error check in nfsd_do_file_acquire

Message ID 20240711-nfsd-next-v1-1-f9f944500503@kernel.org (mailing list archive)
State New
Headers show
Series nfsd: remove unneeded EEXIST error check in nfsd_do_file_acquire | expand

Commit Message

Jeff Layton July 11, 2024, 7:11 p.m. UTC
Given that we do the search and insertion while holding the i_lock, I
don't think it's possible for us to get EEXIST here. Remove this case.

Cc: Youzhong Yang <youzhong@gmail.com>
Fixes: c6593366c0bf ("nfsd: don't kill nfsd_files because of lease break error")
Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
This is replacement for PATCH 1/3 in the series I sent yesterday. I
think it makes sense to just eliminate this case.
---
 fs/nfsd/filecache.c | 2 --
 1 file changed, 2 deletions(-)


---
base-commit: ec1772c39fa8dd85340b1a02040806377ffbff27
change-id: 20240711-nfsd-next-c9d17f66e2bd

Best regards,

Comments

Chuck Lever July 12, 2024, 1:32 p.m. UTC | #1
On Thu, Jul 11, 2024 at 03:11:13PM -0400, Jeff Layton wrote:
> Given that we do the search and insertion while holding the i_lock, I
> don't think it's possible for us to get EEXIST here. Remove this case.
> 
> Cc: Youzhong Yang <youzhong@gmail.com>
> Fixes: c6593366c0bf ("nfsd: don't kill nfsd_files because of lease break error")
> Signed-off-by: Jeff Layton <jlayton@kernel.org>
> ---
> This is replacement for PATCH 1/3 in the series I sent yesterday. I
> think it makes sense to just eliminate this case.
> ---
>  fs/nfsd/filecache.c | 2 --
>  1 file changed, 2 deletions(-)
> 
> diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
> index f84913691b78..b9dc7c22242c 100644
> --- a/fs/nfsd/filecache.c
> +++ b/fs/nfsd/filecache.c
> @@ -1038,8 +1038,6 @@ nfsd_file_do_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
>  	if (likely(ret == 0))
>  		goto open_file;
>  
> -	if (ret == -EEXIST)
> -		goto retry;
>  	trace_nfsd_file_insert_err(rqstp, inode, may_flags, ret);
>  	status = nfserr_jukebox;
>  	goto construction_err;
> 
> ---
> base-commit: ec1772c39fa8dd85340b1a02040806377ffbff27
> change-id: 20240711-nfsd-next-c9d17f66e2bd
> 
> Best regards,
> -- 
> Jeff Layton <jlayton@kernel.org>

Youzhong, can you replace 1/3 in Jeff's file cache series and
test again please?
Youzhong Yang July 12, 2024, 1:37 p.m. UTC | #2
It's in progress, I will report back once it's done, most likely late
this afternoon. This time it will have much more nfs load on the
server.

On Fri, Jul 12, 2024 at 9:32 AM Chuck Lever <chuck.lever@oracle.com> wrote:
>
> On Thu, Jul 11, 2024 at 03:11:13PM -0400, Jeff Layton wrote:
> > Given that we do the search and insertion while holding the i_lock, I
> > don't think it's possible for us to get EEXIST here. Remove this case.
> >
> > Cc: Youzhong Yang <youzhong@gmail.com>
> > Fixes: c6593366c0bf ("nfsd: don't kill nfsd_files because of lease break error")
> > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > ---
> > This is replacement for PATCH 1/3 in the series I sent yesterday. I
> > think it makes sense to just eliminate this case.
> > ---
> >  fs/nfsd/filecache.c | 2 --
> >  1 file changed, 2 deletions(-)
> >
> > diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
> > index f84913691b78..b9dc7c22242c 100644
> > --- a/fs/nfsd/filecache.c
> > +++ b/fs/nfsd/filecache.c
> > @@ -1038,8 +1038,6 @@ nfsd_file_do_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
> >       if (likely(ret == 0))
> >               goto open_file;
> >
> > -     if (ret == -EEXIST)
> > -             goto retry;
> >       trace_nfsd_file_insert_err(rqstp, inode, may_flags, ret);
> >       status = nfserr_jukebox;
> >       goto construction_err;
> >
> > ---
> > base-commit: ec1772c39fa8dd85340b1a02040806377ffbff27
> > change-id: 20240711-nfsd-next-c9d17f66e2bd
> >
> > Best regards,
> > --
> > Jeff Layton <jlayton@kernel.org>
>
> Youzhong, can you replace 1/3 in Jeff's file cache series and
> test again please?
>
> --
> Chuck Lever
Youzhong Yang July 12, 2024, 6:30 p.m. UTC | #3
Testing is done with this patch applied. All good, no surprise.

On Fri, Jul 12, 2024 at 9:32 AM Chuck Lever <chuck.lever@oracle.com> wrote:
>
> On Thu, Jul 11, 2024 at 03:11:13PM -0400, Jeff Layton wrote:
> > Given that we do the search and insertion while holding the i_lock, I
> > don't think it's possible for us to get EEXIST here. Remove this case.
> >
> > Cc: Youzhong Yang <youzhong@gmail.com>
> > Fixes: c6593366c0bf ("nfsd: don't kill nfsd_files because of lease break error")
> > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > ---
> > This is replacement for PATCH 1/3 in the series I sent yesterday. I
> > think it makes sense to just eliminate this case.
> > ---
> >  fs/nfsd/filecache.c | 2 --
> >  1 file changed, 2 deletions(-)
> >
> > diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
> > index f84913691b78..b9dc7c22242c 100644
> > --- a/fs/nfsd/filecache.c
> > +++ b/fs/nfsd/filecache.c
> > @@ -1038,8 +1038,6 @@ nfsd_file_do_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
> >       if (likely(ret == 0))
> >               goto open_file;
> >
> > -     if (ret == -EEXIST)
> > -             goto retry;
> >       trace_nfsd_file_insert_err(rqstp, inode, may_flags, ret);
> >       status = nfserr_jukebox;
> >       goto construction_err;
> >
> > ---
> > base-commit: ec1772c39fa8dd85340b1a02040806377ffbff27
> > change-id: 20240711-nfsd-next-c9d17f66e2bd
> >
> > Best regards,
> > --
> > Jeff Layton <jlayton@kernel.org>
>
> Youzhong, can you replace 1/3 in Jeff's file cache series and
> test again please?
>
> --
> Chuck Lever
Chuck Lever July 12, 2024, 6:58 p.m. UTC | #4
On Fri, Jul 12, 2024 at 02:30:04PM -0400, Youzhong Yang wrote:
> Testing is done with this patch applied. All good, no surprise.

Awesome, I've replaced 1/3 with this patch.

These are all applied to a private tree at the moment. I will push
them to the public nfsd-next branch when the 6.11 merge window
closes.


> On Fri, Jul 12, 2024 at 9:32 AM Chuck Lever <chuck.lever@oracle.com> wrote:
> >
> > On Thu, Jul 11, 2024 at 03:11:13PM -0400, Jeff Layton wrote:
> > > Given that we do the search and insertion while holding the i_lock, I
> > > don't think it's possible for us to get EEXIST here. Remove this case.
> > >
> > > Cc: Youzhong Yang <youzhong@gmail.com>
> > > Fixes: c6593366c0bf ("nfsd: don't kill nfsd_files because of lease break error")
> > > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > > ---
> > > This is replacement for PATCH 1/3 in the series I sent yesterday. I
> > > think it makes sense to just eliminate this case.
> > > ---
> > >  fs/nfsd/filecache.c | 2 --
> > >  1 file changed, 2 deletions(-)
> > >
> > > diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
> > > index f84913691b78..b9dc7c22242c 100644
> > > --- a/fs/nfsd/filecache.c
> > > +++ b/fs/nfsd/filecache.c
> > > @@ -1038,8 +1038,6 @@ nfsd_file_do_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
> > >       if (likely(ret == 0))
> > >               goto open_file;
> > >
> > > -     if (ret == -EEXIST)
> > > -             goto retry;
> > >       trace_nfsd_file_insert_err(rqstp, inode, may_flags, ret);
> > >       status = nfserr_jukebox;
> > >       goto construction_err;
> > >
> > > ---
> > > base-commit: ec1772c39fa8dd85340b1a02040806377ffbff27
> > > change-id: 20240711-nfsd-next-c9d17f66e2bd
> > >
> > > Best regards,
> > > --
> > > Jeff Layton <jlayton@kernel.org>
> >
> > Youzhong, can you replace 1/3 in Jeff's file cache series and
> > test again please?
> >
> > --
> > Chuck Lever
NeilBrown July 15, 2024, 12:27 a.m. UTC | #5
On Fri, 12 Jul 2024, Jeff Layton wrote:
> Given that we do the search and insertion while holding the i_lock, I
> don't think it's possible for us to get EEXIST here. Remove this case.

I was going to comment that as rhltable_insert() cannot return -EEXIST
that is an extra reason to discard the check.  But then I looked at the
code an I cannot convince myself that it cannot.
If __rhashtable_insert_fast() finds that tbl->future_tbl is not NULL it
calls rhashtable_insert_slow(), and that seems to fail if the key
already exists.  But it shouldn't for an rhltable, it should just add
the new item to the linked list for that key.

It looks like this has always been broken: adding to an rhltable during
a resize event can cause EEXIST....

Would anyone like to check my work?  I'm surprise that hasn't been
noticed if it is really the case.

NeilBrown


> 
> Cc: Youzhong Yang <youzhong@gmail.com>
> Fixes: c6593366c0bf ("nfsd: don't kill nfsd_files because of lease break error")
> Signed-off-by: Jeff Layton <jlayton@kernel.org>
> ---
> This is replacement for PATCH 1/3 in the series I sent yesterday. I
> think it makes sense to just eliminate this case.
> ---
>  fs/nfsd/filecache.c | 2 --
>  1 file changed, 2 deletions(-)
> 
> diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
> index f84913691b78..b9dc7c22242c 100644
> --- a/fs/nfsd/filecache.c
> +++ b/fs/nfsd/filecache.c
> @@ -1038,8 +1038,6 @@ nfsd_file_do_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
>  	if (likely(ret == 0))
>  		goto open_file;
>  
> -	if (ret == -EEXIST)
> -		goto retry;
>  	trace_nfsd_file_insert_err(rqstp, inode, may_flags, ret);
>  	status = nfserr_jukebox;
>  	goto construction_err;
> 
> ---
> base-commit: ec1772c39fa8dd85340b1a02040806377ffbff27
> change-id: 20240711-nfsd-next-c9d17f66e2bd
> 
> Best regards,
> -- 
> Jeff Layton <jlayton@kernel.org>
> 
>
Jeff Layton July 15, 2024, 12:25 p.m. UTC | #6
On Mon, 2024-07-15 at 10:27 +1000, NeilBrown wrote:
> On Fri, 12 Jul 2024, Jeff Layton wrote:
> > Given that we do the search and insertion while holding the i_lock, I
> > don't think it's possible for us to get EEXIST here. Remove this case.
> 
> I was going to comment that as rhltable_insert() cannot return -EEXIST
> that is an extra reason to discard the check.  But then I looked at the
> code an I cannot convince myself that it cannot.
> If __rhashtable_insert_fast() finds that tbl->future_tbl is not NULL it
> calls rhashtable_insert_slow(), and that seems to fail if the key
> already exists.  But it shouldn't for an rhltable, it should just add
> the new item to the linked list for that key.
> 
> It looks like this has always been broken: adding to an rhltable during
> a resize event can cause EEXIST....
> 
> Would anyone like to check my work?  I'm surprise that hasn't been
> noticed if it is really the case.
> 
> 

I don't know this code well at all, but it looks correct to me:

static void *rhashtable_try_insert(struct rhashtable *ht, const void *key,
                                   struct rhash_head *obj)
{
        struct bucket_table *new_tbl;
        struct bucket_table *tbl;
        struct rhash_lock_head __rcu **bkt;
        unsigned long flags;
        unsigned int hash;
        void *data;

        new_tbl = rcu_dereference(ht->tbl);

        do {
                tbl = new_tbl;
                hash = rht_head_hashfn(ht, tbl, obj, ht->p);
                if (rcu_access_pointer(tbl->future_tbl))
                        /* Failure is OK */
                        bkt = rht_bucket_var(tbl, hash);
                else
                        bkt = rht_bucket_insert(ht, tbl, hash);
                if (bkt == NULL) {
                        new_tbl = rht_dereference_rcu(tbl->future_tbl, ht);
                        data = ERR_PTR(-EAGAIN);
                } else {
                        flags = rht_lock(tbl, bkt);
                        data = rhashtable_lookup_one(ht, bkt, tbl,
                                                     hash, key, obj);
                        new_tbl = rhashtable_insert_one(ht, bkt, tbl,
                                                        hash, obj, data);
                        if (PTR_ERR(new_tbl) != -EEXIST)
                                data = ERR_CAST(new_tbl);

                        rht_unlock(tbl, bkt, flags);
                }
        } while (!IS_ERR_OR_NULL(new_tbl));

        if (PTR_ERR(data) == -EAGAIN)
                data = ERR_PTR(rhashtable_insert_rehash(ht, tbl) ?:
                               -EAGAIN);

        return data;
}

I'm assuming the part we need to worry about is where
rhashtable_insert_one returns -EEXIST.

It holds the rht_lock across the lookup and insert though. So if
rhashtable_insert_one returns -EEXIST, then "data" must be something
valid. In that case, "data" won't be overwritten and it will fall
through and return the pointer to the entry already there.

That said, this logic is really convoluted, so I may have missed
something too.

> > 
> > Cc: Youzhong Yang <youzhong@gmail.com>
> > Fixes: c6593366c0bf ("nfsd: don't kill nfsd_files because of lease break error")
> > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > ---
> > This is replacement for PATCH 1/3 in the series I sent yesterday. I
> > think it makes sense to just eliminate this case.
> > ---
> >  fs/nfsd/filecache.c | 2 --
> >  1 file changed, 2 deletions(-)
> > 
> > diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
> > index f84913691b78..b9dc7c22242c 100644
> > --- a/fs/nfsd/filecache.c
> > +++ b/fs/nfsd/filecache.c
> > @@ -1038,8 +1038,6 @@ nfsd_file_do_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
> >  	if (likely(ret == 0))
> >  		goto open_file;
> >  
> > -	if (ret == -EEXIST)
> > -		goto retry;
> >  	trace_nfsd_file_insert_err(rqstp, inode, may_flags, ret);
> >  	status = nfserr_jukebox;
> >  	goto construction_err;
> > 
> > ---
> > base-commit: ec1772c39fa8dd85340b1a02040806377ffbff27
> > change-id: 20240711-nfsd-next-c9d17f66e2bd
> > 
> > Best regards,
> > -- 
> > Jeff Layton <jlayton@kernel.org>
> > 
> > 
>
Chuck Lever July 15, 2024, 2:06 p.m. UTC | #7
On Mon, Jul 15, 2024 at 08:25:53AM -0400, Jeff Layton wrote:
> On Mon, 2024-07-15 at 10:27 +1000, NeilBrown wrote:
> > On Fri, 12 Jul 2024, Jeff Layton wrote:
> > > Given that we do the search and insertion while holding the i_lock, I
> > > don't think it's possible for us to get EEXIST here. Remove this case.
> > 
> > I was going to comment that as rhltable_insert() cannot return -EEXIST
> > that is an extra reason to discard the check.  But then I looked at the
> > code an I cannot convince myself that it cannot.
> > If __rhashtable_insert_fast() finds that tbl->future_tbl is not NULL it
> > calls rhashtable_insert_slow(), and that seems to fail if the key
> > already exists.  But it shouldn't for an rhltable, it should just add
> > the new item to the linked list for that key.
> > 
> > It looks like this has always been broken: adding to an rhltable during
> > a resize event can cause EEXIST....
> > 
> > Would anyone like to check my work?  I'm surprise that hasn't been
> > noticed if it is really the case.
> > 
> > 
> 
> I don't know this code well at all, but it looks correct to me:
> 
> static void *rhashtable_try_insert(struct rhashtable *ht, const void *key,
>                                    struct rhash_head *obj)
> {
>         struct bucket_table *new_tbl;
>         struct bucket_table *tbl;
>         struct rhash_lock_head __rcu **bkt;
>         unsigned long flags;
>         unsigned int hash;
>         void *data;
> 
>         new_tbl = rcu_dereference(ht->tbl);
> 
>         do {
>                 tbl = new_tbl;
>                 hash = rht_head_hashfn(ht, tbl, obj, ht->p);
>                 if (rcu_access_pointer(tbl->future_tbl))
>                         /* Failure is OK */
>                         bkt = rht_bucket_var(tbl, hash);
>                 else
>                         bkt = rht_bucket_insert(ht, tbl, hash);
>                 if (bkt == NULL) {
>                         new_tbl = rht_dereference_rcu(tbl->future_tbl, ht);
>                         data = ERR_PTR(-EAGAIN);
>                 } else {
>                         flags = rht_lock(tbl, bkt);
>                         data = rhashtable_lookup_one(ht, bkt, tbl,
>                                                      hash, key, obj);
>                         new_tbl = rhashtable_insert_one(ht, bkt, tbl,
>                                                         hash, obj, data);
>                         if (PTR_ERR(new_tbl) != -EEXIST)
>                                 data = ERR_CAST(new_tbl);
> 
>                         rht_unlock(tbl, bkt, flags);
>                 }
>         } while (!IS_ERR_OR_NULL(new_tbl));
> 
>         if (PTR_ERR(data) == -EAGAIN)
>                 data = ERR_PTR(rhashtable_insert_rehash(ht, tbl) ?:
>                                -EAGAIN);
> 
>         return data;
> }
> 
> I'm assuming the part we need to worry about is where
> rhashtable_insert_one returns -EEXIST.
> 
> It holds the rht_lock across the lookup and insert though. So if
> rhashtable_insert_one returns -EEXIST, then "data" must be something
> valid. In that case, "data" won't be overwritten and it will fall
> through and return the pointer to the entry already there.
> 
> That said, this logic is really convoluted, so I may have missed
> something too.

This is the issue I was concerned about after my review: it's
obvious that the rhtable API can return -EEXIST, but it's just
really hard to tell whether the rh/l/table API will ever return
-EEXIST.

As Neil says, the rhtable "hash table full" case should not happen
with rhltable. But can we prove that?

If we are not yet confident, then maybe PATCH 1/3 should replace
the "if (ret == -EEXIST)" with "WARN_ON(ret == -EEXIST)"...? It's
also possible to ask the human(s) who constructed the rhltable
code. :-)


> > > Cc: Youzhong Yang <youzhong@gmail.com>
> > > Fixes: c6593366c0bf ("nfsd: don't kill nfsd_files because of lease break error")
> > > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > > ---
> > > This is replacement for PATCH 1/3 in the series I sent yesterday. I
> > > think it makes sense to just eliminate this case.
> > > ---
> > >  fs/nfsd/filecache.c | 2 --
> > >  1 file changed, 2 deletions(-)
> > > 
> > > diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
> > > index f84913691b78..b9dc7c22242c 100644
> > > --- a/fs/nfsd/filecache.c
> > > +++ b/fs/nfsd/filecache.c
> > > @@ -1038,8 +1038,6 @@ nfsd_file_do_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
> > >  	if (likely(ret == 0))
> > >  		goto open_file;
> > >  
> > > -	if (ret == -EEXIST)
> > > -		goto retry;
> > >  	trace_nfsd_file_insert_err(rqstp, inode, may_flags, ret);
> > >  	status = nfserr_jukebox;
> > >  	goto construction_err;
> > > 
> > > ---
> > > base-commit: ec1772c39fa8dd85340b1a02040806377ffbff27
> > > change-id: 20240711-nfsd-next-c9d17f66e2bd
> > > 
> > > Best regards,
> > > -- 
> > > Jeff Layton <jlayton@kernel.org>
> > > 
> > > 
> > 
> 
> -- 
> Jeff Layton <jlayton@kernel.org>
Youzhong Yang July 29, 2024, 2:26 p.m. UTC | #8
How is this going? any chance to move forward and deal with the EEXIST
case in a future patch? I see no harm in keeping the EEXIST check.

On Mon, Jul 15, 2024 at 10:06 AM Chuck Lever <chuck.lever@oracle.com> wrote:
>
> On Mon, Jul 15, 2024 at 08:25:53AM -0400, Jeff Layton wrote:
> > On Mon, 2024-07-15 at 10:27 +1000, NeilBrown wrote:
> > > On Fri, 12 Jul 2024, Jeff Layton wrote:
> > > > Given that we do the search and insertion while holding the i_lock, I
> > > > don't think it's possible for us to get EEXIST here. Remove this case.
> > >
> > > I was going to comment that as rhltable_insert() cannot return -EEXIST
> > > that is an extra reason to discard the check.  But then I looked at the
> > > code an I cannot convince myself that it cannot.
> > > If __rhashtable_insert_fast() finds that tbl->future_tbl is not NULL it
> > > calls rhashtable_insert_slow(), and that seems to fail if the key
> > > already exists.  But it shouldn't for an rhltable, it should just add
> > > the new item to the linked list for that key.
> > >
> > > It looks like this has always been broken: adding to an rhltable during
> > > a resize event can cause EEXIST....
> > >
> > > Would anyone like to check my work?  I'm surprise that hasn't been
> > > noticed if it is really the case.
> > >
> > >
> >
> > I don't know this code well at all, but it looks correct to me:
> >
> > static void *rhashtable_try_insert(struct rhashtable *ht, const void *key,
> >                                    struct rhash_head *obj)
> > {
> >         struct bucket_table *new_tbl;
> >         struct bucket_table *tbl;
> >         struct rhash_lock_head __rcu **bkt;
> >         unsigned long flags;
> >         unsigned int hash;
> >         void *data;
> >
> >         new_tbl = rcu_dereference(ht->tbl);
> >
> >         do {
> >                 tbl = new_tbl;
> >                 hash = rht_head_hashfn(ht, tbl, obj, ht->p);
> >                 if (rcu_access_pointer(tbl->future_tbl))
> >                         /* Failure is OK */
> >                         bkt = rht_bucket_var(tbl, hash);
> >                 else
> >                         bkt = rht_bucket_insert(ht, tbl, hash);
> >                 if (bkt == NULL) {
> >                         new_tbl = rht_dereference_rcu(tbl->future_tbl, ht);
> >                         data = ERR_PTR(-EAGAIN);
> >                 } else {
> >                         flags = rht_lock(tbl, bkt);
> >                         data = rhashtable_lookup_one(ht, bkt, tbl,
> >                                                      hash, key, obj);
> >                         new_tbl = rhashtable_insert_one(ht, bkt, tbl,
> >                                                         hash, obj, data);
> >                         if (PTR_ERR(new_tbl) != -EEXIST)
> >                                 data = ERR_CAST(new_tbl);
> >
> >                         rht_unlock(tbl, bkt, flags);
> >                 }
> >         } while (!IS_ERR_OR_NULL(new_tbl));
> >
> >         if (PTR_ERR(data) == -EAGAIN)
> >                 data = ERR_PTR(rhashtable_insert_rehash(ht, tbl) ?:
> >                                -EAGAIN);
> >
> >         return data;
> > }
> >
> > I'm assuming the part we need to worry about is where
> > rhashtable_insert_one returns -EEXIST.
> >
> > It holds the rht_lock across the lookup and insert though. So if
> > rhashtable_insert_one returns -EEXIST, then "data" must be something
> > valid. In that case, "data" won't be overwritten and it will fall
> > through and return the pointer to the entry already there.
> >
> > That said, this logic is really convoluted, so I may have missed
> > something too.
>
> This is the issue I was concerned about after my review: it's
> obvious that the rhtable API can return -EEXIST, but it's just
> really hard to tell whether the rh/l/table API will ever return
> -EEXIST.
>
> As Neil says, the rhtable "hash table full" case should not happen
> with rhltable. But can we prove that?
>
> If we are not yet confident, then maybe PATCH 1/3 should replace
> the "if (ret == -EEXIST)" with "WARN_ON(ret == -EEXIST)"...? It's
> also possible to ask the human(s) who constructed the rhltable
> code. :-)
>
>
> > > > Cc: Youzhong Yang <youzhong@gmail.com>
> > > > Fixes: c6593366c0bf ("nfsd: don't kill nfsd_files because of lease break error")
> > > > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > > > ---
> > > > This is replacement for PATCH 1/3 in the series I sent yesterday. I
> > > > think it makes sense to just eliminate this case.
> > > > ---
> > > >  fs/nfsd/filecache.c | 2 --
> > > >  1 file changed, 2 deletions(-)
> > > >
> > > > diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
> > > > index f84913691b78..b9dc7c22242c 100644
> > > > --- a/fs/nfsd/filecache.c
> > > > +++ b/fs/nfsd/filecache.c
> > > > @@ -1038,8 +1038,6 @@ nfsd_file_do_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
> > > >   if (likely(ret == 0))
> > > >           goto open_file;
> > > >
> > > > - if (ret == -EEXIST)
> > > > -         goto retry;
> > > >   trace_nfsd_file_insert_err(rqstp, inode, may_flags, ret);
> > > >   status = nfserr_jukebox;
> > > >   goto construction_err;
> > > >
> > > > ---
> > > > base-commit: ec1772c39fa8dd85340b1a02040806377ffbff27
> > > > change-id: 20240711-nfsd-next-c9d17f66e2bd
> > > >
> > > > Best regards,
> > > > --
> > > > Jeff Layton <jlayton@kernel.org>
> > > >
> > > >
> > >
> >
> > --
> > Jeff Layton <jlayton@kernel.org>
>
> --
> Chuck Lever
Chuck Lever July 29, 2024, 2:40 p.m. UTC | #9
> On Jul 29, 2024, at 10:26 AM, Youzhong Yang <youzhong@gmail.com> wrote:
> 
> How is this going? any chance to move forward and deal with the EEXIST
> case in a future patch? I see no harm in keeping the EEXIST check.

The EEXIST patch has been applied to nfsd-next for a while, along
with the other patches in this series.


> On Mon, Jul 15, 2024 at 10:06 AM Chuck Lever <chuck.lever@oracle.com> wrote:
>> 
>> On Mon, Jul 15, 2024 at 08:25:53AM -0400, Jeff Layton wrote:
>>> On Mon, 2024-07-15 at 10:27 +1000, NeilBrown wrote:
>>>> On Fri, 12 Jul 2024, Jeff Layton wrote:
>>>>> Given that we do the search and insertion while holding the i_lock, I
>>>>> don't think it's possible for us to get EEXIST here. Remove this case.
>>>> 
>>>> I was going to comment that as rhltable_insert() cannot return -EEXIST
>>>> that is an extra reason to discard the check.  But then I looked at the
>>>> code an I cannot convince myself that it cannot.
>>>> If __rhashtable_insert_fast() finds that tbl->future_tbl is not NULL it
>>>> calls rhashtable_insert_slow(), and that seems to fail if the key
>>>> already exists.  But it shouldn't for an rhltable, it should just add
>>>> the new item to the linked list for that key.
>>>> 
>>>> It looks like this has always been broken: adding to an rhltable during
>>>> a resize event can cause EEXIST....
>>>> 
>>>> Would anyone like to check my work?  I'm surprise that hasn't been
>>>> noticed if it is really the case.
>>>> 
>>>> 
>>> 
>>> I don't know this code well at all, but it looks correct to me:
>>> 
>>> static void *rhashtable_try_insert(struct rhashtable *ht, const void *key,
>>>                                   struct rhash_head *obj)
>>> {
>>>        struct bucket_table *new_tbl;
>>>        struct bucket_table *tbl;
>>>        struct rhash_lock_head __rcu **bkt;
>>>        unsigned long flags;
>>>        unsigned int hash;
>>>        void *data;
>>> 
>>>        new_tbl = rcu_dereference(ht->tbl);
>>> 
>>>        do {
>>>                tbl = new_tbl;
>>>                hash = rht_head_hashfn(ht, tbl, obj, ht->p);
>>>                if (rcu_access_pointer(tbl->future_tbl))
>>>                        /* Failure is OK */
>>>                        bkt = rht_bucket_var(tbl, hash);
>>>                else
>>>                        bkt = rht_bucket_insert(ht, tbl, hash);
>>>                if (bkt == NULL) {
>>>                        new_tbl = rht_dereference_rcu(tbl->future_tbl, ht);
>>>                        data = ERR_PTR(-EAGAIN);
>>>                } else {
>>>                        flags = rht_lock(tbl, bkt);
>>>                        data = rhashtable_lookup_one(ht, bkt, tbl,
>>>                                                     hash, key, obj);
>>>                        new_tbl = rhashtable_insert_one(ht, bkt, tbl,
>>>                                                        hash, obj, data);
>>>                        if (PTR_ERR(new_tbl) != -EEXIST)
>>>                                data = ERR_CAST(new_tbl);
>>> 
>>>                        rht_unlock(tbl, bkt, flags);
>>>                }
>>>        } while (!IS_ERR_OR_NULL(new_tbl));
>>> 
>>>        if (PTR_ERR(data) == -EAGAIN)
>>>                data = ERR_PTR(rhashtable_insert_rehash(ht, tbl) ?:
>>>                               -EAGAIN);
>>> 
>>>        return data;
>>> }
>>> 
>>> I'm assuming the part we need to worry about is where
>>> rhashtable_insert_one returns -EEXIST.
>>> 
>>> It holds the rht_lock across the lookup and insert though. So if
>>> rhashtable_insert_one returns -EEXIST, then "data" must be something
>>> valid. In that case, "data" won't be overwritten and it will fall
>>> through and return the pointer to the entry already there.
>>> 
>>> That said, this logic is really convoluted, so I may have missed
>>> something too.
>> 
>> This is the issue I was concerned about after my review: it's
>> obvious that the rhtable API can return -EEXIST, but it's just
>> really hard to tell whether the rh/l/table API will ever return
>> -EEXIST.
>> 
>> As Neil says, the rhtable "hash table full" case should not happen
>> with rhltable. But can we prove that?
>> 
>> If we are not yet confident, then maybe PATCH 1/3 should replace
>> the "if (ret == -EEXIST)" with "WARN_ON(ret == -EEXIST)"...? It's
>> also possible to ask the human(s) who constructed the rhltable
>> code. :-)
>> 
>> 
>>>>> Cc: Youzhong Yang <youzhong@gmail.com>
>>>>> Fixes: c6593366c0bf ("nfsd: don't kill nfsd_files because of lease break error")
>>>>> Signed-off-by: Jeff Layton <jlayton@kernel.org>
>>>>> ---
>>>>> This is replacement for PATCH 1/3 in the series I sent yesterday. I
>>>>> think it makes sense to just eliminate this case.
>>>>> ---
>>>>> fs/nfsd/filecache.c | 2 --
>>>>> 1 file changed, 2 deletions(-)
>>>>> 
>>>>> diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
>>>>> index f84913691b78..b9dc7c22242c 100644
>>>>> --- a/fs/nfsd/filecache.c
>>>>> +++ b/fs/nfsd/filecache.c
>>>>> @@ -1038,8 +1038,6 @@ nfsd_file_do_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
>>>>>  if (likely(ret == 0))
>>>>>          goto open_file;
>>>>> 
>>>>> - if (ret == -EEXIST)
>>>>> -         goto retry;
>>>>>  trace_nfsd_file_insert_err(rqstp, inode, may_flags, ret);
>>>>>  status = nfserr_jukebox;
>>>>>  goto construction_err;
>>>>> 
>>>>> ---
>>>>> base-commit: ec1772c39fa8dd85340b1a02040806377ffbff27
>>>>> change-id: 20240711-nfsd-next-c9d17f66e2bd
>>>>> 
>>>>> Best regards,
>>>>> --
>>>>> Jeff Layton <jlayton@kernel.org>
>>>>> 
>>>>> 
>>>> 
>>> 
>>> --
>>> Jeff Layton <jlayton@kernel.org>
>> 
>> --
>> Chuck Lever

--
Chuck Lever
diff mbox series

Patch

diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
index f84913691b78..b9dc7c22242c 100644
--- a/fs/nfsd/filecache.c
+++ b/fs/nfsd/filecache.c
@@ -1038,8 +1038,6 @@  nfsd_file_do_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
 	if (likely(ret == 0))
 		goto open_file;
 
-	if (ret == -EEXIST)
-		goto retry;
 	trace_nfsd_file_insert_err(rqstp, inode, may_flags, ret);
 	status = nfserr_jukebox;
 	goto construction_err;