
[0/7] lockd: fix races that can result in stuck filelocks

Message ID: 20230303121603.132103-1-jlayton@kernel.org

Message

Jeff Layton March 3, 2023, 12:15 p.m. UTC
I sent the first patch in this series the other day, but didn't get any
responses. Since then I've had time to follow up on the client-side part
of this problem, which eventually also pointed out yet another bug on
the server side. There are also a couple of cleanup patches in here too,
and a patch to add some tracepoints that I found useful while diagnosing
this.

With this set on both client and server, I'm now able to run Yongcheng's
test for an hour straight with no stuck locks.
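
The pattern at issue is easy to sketch in userspace: one process holds a
POSIX lock on a file on an NFSv3 mount, a second process blocks in
F_SETLKW on the same file, and the first process unlocks. The blocked
waiter should then be granted the lock; with these races it can stay
stuck even though the lock is free. A minimal illustrative reproducer of
that pattern (not Yongcheng's actual test; the mount path is a
placeholder):

#include <fcntl.h>
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
        struct flock fl = { .l_type = F_WRLCK, .l_whence = SEEK_SET };
        int fd = open("/mnt/nfs/testfile", O_RDWR | O_CREAT, 0644);

        if (fd < 0 || fcntl(fd, F_SETLK, &fl) < 0) {
                perror("open/F_SETLK");
                return 1;
        }

        if (fork() == 0) {
                /* child: takes its own fd, blocks, then should be granted */
                struct flock cfl = { .l_type = F_WRLCK, .l_whence = SEEK_SET };
                int cfd = open("/mnt/nfs/testfile", O_RDWR);

                /* when the grant is lost, this F_SETLKW never returns */
                if (cfd >= 0 && fcntl(cfd, F_SETLKW, &cfl) == 0)
                        printf("child: lock granted\n");
                _exit(0);
        }

        sleep(2);                       /* let the child block on the lock */
        fl.l_type = F_UNLCK;
        fcntl(fd, F_SETLK, &fl);        /* unlock; server should send GRANTED */
        wait(NULL);
        return 0;
}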

Jeff Layton (7):
  lockd: purge resources held on behalf of nlm clients when shutting
    down
  lockd: remove 2 unused helper functions
  lockd: move struct nlm_wait to lockd.h
  lockd: fix races in client GRANTED_MSG wait logic
  lockd: server should unlock lock if client rejects the grant
  nfs: move nfs_fhandle_hash to common include file
  lockd: add some client-side tracepoints

 fs/lockd/Makefile           |  6 ++-
 fs/lockd/clntlock.c         | 58 +++++++++++---------------
 fs/lockd/clntproc.c         | 42 ++++++++++++++-----
 fs/lockd/host.c             |  1 +
 fs/lockd/svclock.c          | 21 ++++++++--
 fs/lockd/trace.c            |  3 ++
 fs/lockd/trace.h            | 83 +++++++++++++++++++++++++++++++++++++
 fs/nfs/internal.h           | 15 -------
 include/linux/lockd/lockd.h | 29 ++++++-------
 include/linux/nfs.h         | 20 +++++++++
 10 files changed, 200 insertions(+), 78 deletions(-)
 create mode 100644 fs/lockd/trace.c
 create mode 100644 fs/lockd/trace.h

Comments

Chuck Lever III March 3, 2023, 2:41 p.m. UTC | #1
> On Mar 3, 2023, at 7:15 AM, Jeff Layton <jlayton@kernel.org> wrote:
> 
> I sent the first patch in this series the other day, but didn't get any
> responses.

We'll have to work out who will take which patches in this set.
Once fully reviewed, I can take the set if the client maintainers
send Acks for 2-4 and 6-7.

nfsd-next for v6.4 is not yet open. I can work on setting that up
today.


> Since then I've had time to follow up on the client-side part
> of this problem, which eventually also pointed out yet another bug on
> the server side. There are also a couple of cleanup patches in here too,
> and a patch to add some tracepoints that I found useful while diagnosing
> this.
> 
> With this set on both client and server, I'm now able to run Yongcheng's
> test for an hour straight with no stuck locks.
> 
> Jeff Layton (7):
>  lockd: purge resources held on behalf of nlm clients when shutting
>    down
>  lockd: remove 2 unused helper functions
>  lockd: move struct nlm_wait to lockd.h
>  lockd: fix races in client GRANTED_MSG wait logic
>  lockd: server should unlock lock if client rejects the grant
>  nfs: move nfs_fhandle_hash to common include file
>  lockd: add some client-side tracepoints
> 
> fs/lockd/Makefile           |  6 ++-
> fs/lockd/clntlock.c         | 58 +++++++++++---------------
> fs/lockd/clntproc.c         | 42 ++++++++++++++-----
> fs/lockd/host.c             |  1 +
> fs/lockd/svclock.c          | 21 ++++++++--
> fs/lockd/trace.c            |  3 ++
> fs/lockd/trace.h            | 83 +++++++++++++++++++++++++++++++++++++
> fs/nfs/internal.h           | 15 -------
> include/linux/lockd/lockd.h | 29 ++++++-------
> include/linux/nfs.h         | 20 +++++++++
> 10 files changed, 200 insertions(+), 78 deletions(-)
> create mode 100644 fs/lockd/trace.c
> create mode 100644 fs/lockd/trace.h
> 
> -- 
> 2.39.2
> 

--
Chuck Lever

Chuck Lever III March 3, 2023, 6:11 p.m. UTC | #2
> On Mar 3, 2023, at 9:41 AM, Chuck Lever III <chuck.lever@oracle.com> wrote:
> 
> 
> 
>> On Mar 3, 2023, at 7:15 AM, Jeff Layton <jlayton@kernel.org> wrote:
>> 
>> I sent the first patch in this series the other day, but didn't get any
>> responses.
> 
> We'll have to work out who will take which patches in this set.
> Once fully reviewed, I can take the set if the client maintainers
> send Acks for 2-4 and 6-7.
> 
> nfsd-next for v6.4 is not yet open. I can work on setting that up
> today.
> 
> 
>> Since then I've had time to follow up on the client-side part
>> of this problem, which eventually also pointed out yet another bug on
>> the server side. There are also a couple of cleanup patches in here too,
>> and a patch to add some tracepoints that I found useful while diagnosing
>> this.
>> 
>> With this set on both client and server, I'm now able to run Yongcheng's
>> test for an hour straight with no stuck locks.
>> 
>> Jeff Layton (7):
>> lockd: purge resources held on behalf of nlm clients when shutting
>>   down
>> lockd: remove 2 unused helper functions
>> lockd: move struct nlm_wait to lockd.h
>> lockd: fix races in client GRANTED_MSG wait logic
>> lockd: server should unlock lock if client rejects the grant
>> nfs: move nfs_fhandle_hash to common include file
>> lockd: add some client-side tracepoints
>> 
>> fs/lockd/Makefile           |  6 ++-
>> fs/lockd/clntlock.c         | 58 +++++++++++---------------
>> fs/lockd/clntproc.c         | 42 ++++++++++++++-----
>> fs/lockd/host.c             |  1 +
>> fs/lockd/svclock.c          | 21 ++++++++--
>> fs/lockd/trace.c            |  3 ++
>> fs/lockd/trace.h            | 83 +++++++++++++++++++++++++++++++++++++
>> fs/nfs/internal.h           | 15 -------
>> include/linux/lockd/lockd.h | 29 ++++++-------
>> include/linux/nfs.h         | 20 +++++++++
>> 10 files changed, 200 insertions(+), 78 deletions(-)
>> create mode 100644 fs/lockd/trace.c
>> create mode 100644 fs/lockd/trace.h
>> 
>> -- 
>> 2.39.2

I've opened nfsd-next for v6.4 and applied these. I can drop any
that the client maintainers wish to take through their tree or
would prefer to reject.

I noticed that several of these had checkpatch.pl warnings or errors,
and fixed up the issues before applying them.


--
Chuck Lever

Amir Goldstein March 12, 2023, 3:33 p.m. UTC | #3
On Fri, Mar 3, 2023 at 4:54 PM Chuck Lever III <chuck.lever@oracle.com> wrote:
>
>
>
> > On Mar 3, 2023, at 7:15 AM, Jeff Layton <jlayton@kernel.org> wrote:
> >
> > I sent the first patch in this series the other day, but didn't get any
> > responses.
>
> We'll have to work out who will take which patches in this set.
> Once fully reviewed, I can take the set if the client maintainers
> send Acks for 2-4 and 6-7.
>
> nfsd-next for v6.4 is not yet open. I can work on setting that up
> today.
>
>
> > Since then I've had time to follow up on the client-side part
> > of this problem, which eventually also pointed out yet another bug on
> > the server side. There are also a couple of cleanup patches in here too,
> > and a patch to add some tracepoints that I found useful while diagnosing
> > this.
> >
> > With this set on both client and server, I'm now able to run Yongcheng's
> > test for an hour straight with no stuck locks.

My nfstest_lock test occasionally gets into an endless wait loop for the lock in
one of the optests.

AFAIK, this started happening after I upgraded my client machine to v5.15.88.
Does this seem related to the client bug fixes in this patch set?

If so, is this bug a regression? And why are the fixes aimed at v6.4?

Thanks,
Amir.

Chuck Lever III March 12, 2023, 4:44 p.m. UTC | #4
> On Mar 12, 2023, at 11:33 AM, Amir Goldstein <amir73il@gmail.com> wrote:
> 
> On Fri, Mar 3, 2023 at 4:54 PM Chuck Lever III <chuck.lever@oracle.com> wrote:
>> 
>> 
>> 
>>> On Mar 3, 2023, at 7:15 AM, Jeff Layton <jlayton@kernel.org> wrote:
>>> 
>>> I sent the first patch in this series the other day, but didn't get any
>>> responses.
>> 
>> We'll have to work out who will take which patches in this set.
>> Once fully reviewed, I can take the set if the client maintainers
>> send Acks for 2-4 and 6-7.
>> 
>> nfsd-next for v6.4 is not yet open. I can work on setting that up
>> today.
>> 
>> 
>>> Since then I've had time to follow up on the client-side part
>>> of this problem, which eventually also pointed out yet another bug on
>>> the server side. There are also a couple of cleanup patches in here too,
>>> and a patch to add some tracepoints that I found useful while diagnosing
>>> this.
>>> 
>>> With this set on both client and server, I'm now able to run Yongcheng's
>>> test for an hour straight with no stuck locks.
> 
> My nfstest_lock test occasionally gets into an endless wait loop for the lock in
> one of the optests.
> 
> AFAIK, this started happening after I upgraded my client machine to v5.15.88.
> Does this seem related to the client bug fixes in this patch set?

I will let Jeff tackle that question. He did not add a Fixes:
tag, so it's difficult to say off-hand.


> If so, is this bug a regression?

If your test misbehavior is related to these fixes, then probably
yes. But this is the first I've heard of a longer-term problem.


> And why are the fixes aimed at v6.4?

Because these are test failures, not failures seen in real-world
workloads, and because we haven't heard any reports of a regression,
potential or otherwise, until now.

Since they are test failures only, there doesn't seem to be any
urgency to get them into 6.3-rc, so I would prefer to let these sit
in -next for a bit. As we are well aware, patches that
go into -rc are rather aggressively pulled into stable, and I
would like to have some confidence that these fixes do not
introduce further problems.

You are welcome to petition for faster integration. It would
help if you had a positive test result to share with us.

--
Chuck Lever

Jeff Layton March 13, 2023, 10:45 a.m. UTC | #5
On Sun, 2023-03-12 at 17:33 +0200, Amir Goldstein wrote:
> On Fri, Mar 3, 2023 at 4:54 PM Chuck Lever III <chuck.lever@oracle.com> wrote:
> > 
> > 
> > 
> > > On Mar 3, 2023, at 7:15 AM, Jeff Layton <jlayton@kernel.org> wrote:
> > > 
> > > I sent the first patch in this series the other day, but didn't get any
> > > responses.
> > 
> > We'll have to work out who will take which patches in this set.
> > Once fully reviewed, I can take the set if the client maintainers
> > send Acks for 2-4 and 6-7.
> > 
> > nfsd-next for v6.4 is not yet open. I can work on setting that up
> > today.
> > 
> > 
> > > Since then I've had time to follow up on the client-side part
> > > of this problem, which eventually also pointed out yet another bug on
> > > the server side. There are also a couple of cleanup patches in here too,
> > > and a patch to add some tracepoints that I found useful while diagnosing
> > > this.
> > > 
> > > With this set on both client and server, I'm now able to run Yongcheng's
> > > test for an hour straight with no stuck locks.
> 
> My nfstest_lock test occasionally gets into an endless wait loop for the lock in
> one of the optests.
> 
> AFAIK, this started happening after I upgraded my client machine to v5.15.88.
> Does this seem related to the client bug fixes in this patch set?
> 
> If so, is this bug a regression? And why are the fixes aimed at v6.4?
> 

Most of this (lockd) code hasn't changed in well over a decade, so if
this is a regression then it's a very old one. I suppose it's possible
that this regressed after the BKL was removed from this code, but that
was a long time ago now and I'm not sure I can identify a commit that
this fixes.

I'm fine with this going in sooner than v6.4, but given that this has
been broken so long, I didn't see the need to rush.

Cheers,

Amir Goldstein March 13, 2023, 3:14 p.m. UTC | #6
On Mon, Mar 13, 2023 at 12:45 PM Jeff Layton <jlayton@kernel.org> wrote:
>
> On Sun, 2023-03-12 at 17:33 +0200, Amir Goldstein wrote:
> > On Fri, Mar 3, 2023 at 4:54 PM Chuck Lever III <chuck.lever@oracle.com> wrote:
> > >
> > >
> > >
> > > > On Mar 3, 2023, at 7:15 AM, Jeff Layton <jlayton@kernel.org> wrote:
> > > >
> > > > I sent the first patch in this series the other day, but didn't get any
> > > > responses.
> > >
> > > We'll have to work out who will take which patches in this set.
> > > Once fully reviewed, I can take the set if the client maintainers
> > > send Acks for 2-4 and 6-7.
> > >
> > > nfsd-next for v6.4 is not yet open. I can work on setting that up
> > > today.
> > >
> > >
> > > > Since then I've had time to follow up on the client-side part
> > > > of this problem, which eventually also pointed out yet another bug on
> > > > the server side. There are also a couple of cleanup patches in here too,
> > > > and a patch to add some tracepoints that I found useful while diagnosing
> > > > this.
> > > >
> > > > With this set on both client and server, I'm now able to run Yongcheng's
> > > > test for an hour straight with no stuck locks.
> >
> > My nfstest_lock test occasionally gets into an endless wait loop for the lock in
> > one of the optests.

I forgot to mention that the regression is only with nfsversion=3!
Is anyone else running nfstest_lock with nfsversion=3?

> >
> > AFAIK, this started happening after I upgraded my client machine to v5.15.88.
> > Does this seem related to the client bug fixes in this patch set?
> >
> > If so, is this bug a regression? And why are the fixes aimed at v6.4?
> >
>
> Most of this (lockd) code hasn't changed in well over a decade, so if
> this is a regression then it's a very old one. I suppose it's possible
> that this regressed after the BKL was removed from this code, but that
> was a long time ago now and I'm not sure I can identify a commit that
> this fixes.
>
> I'm fine with this going in sooner than v6.4, but given that this has
> been broken so long, I didn't see the need to rush.
>

I don't know what the relation is between the optest regression I am
experiencing and the client and server bugs mentioned in this patch set.
I just re-tested optest01 with several combinations of client-server kernels.
I rebooted both client and server before each test.
The results are a bit odd:

client        server     optest01 result
------------------------------------------------------------------
5.10.109      5.10.109   optest01 completes successfully after <30s
5.15.88       5.15.88    optest01 never completes (see attached log)
5.15.88       5.10.109   optest01 never completes
5.15.88+ [*]  5.15.88    optest01 never completes
5.15.88+      5.10.109   optest01 never completes
5.15.88+      5.15.88+   optest01 completes successfully after ~300s [**]

Unless I missed something with the tests, it looks like:
1.a. There was a regression in the client between 5.10.109 and 5.15.88
1.b. The regression is manifested with both 5.10 and 5.15 servers
2.a. The patches improve the situation (from infinite to 30s per wait)...
2.b. ...but only when applied to both client and server, and...
2.c. The situation is still a lot worse than a 5.10 client with a 5.10 server

Also attached is the NFS[D] Kconfig, which is identical for the tested
5.10 and 5.15 kernels.

Do you need me to provide any traces or any other info?

Thanks,
Amir.

[*] 5.15.88+ stands for 5.15.88 + the patches in this set, which all
apply cleanly
[**] The test takes 300s because every single 30s wait takes the entire 30s:

    DBG1: 15:21:47.118095 - Unlock file (F_UNLCK, F_SETLK) off=0 len=0 range(0, 18446744073709551615)
    DBG3: 15:21:47.119832 - Wait up to 30 secs to check if blocked lock has been granted @253.87
    DBG3: 15:21:48.121296 - Check if blocked lock has been granted @254.87
...
    DBG3: 15:22:14.158314 - Check if blocked lock has been granted @280.90
    DBG3: 15:22:15.017594 - Getting results from blocked lock @281.76
    DBG1: 15:22:15.017832 - Unlock file (F_UNLCK, F_SETLK) off=0 len=0 range(0, 18446744073709551615) on second process @281.76
    PASS: Locking byte range (72 passed, 0 failed)

Jeff Layton March 13, 2023, 7:19 p.m. UTC | #7
On Mon, 2023-03-13 at 17:14 +0200, Amir Goldstein wrote:
> On Mon, Mar 13, 2023 at 12:45 PM Jeff Layton <jlayton@kernel.org> wrote:
> > 
> > On Sun, 2023-03-12 at 17:33 +0200, Amir Goldstein wrote:
> > > On Fri, Mar 3, 2023 at 4:54 PM Chuck Lever III <chuck.lever@oracle.com> wrote:
> > > > 
> > > > 
> > > > 
> > > > > On Mar 3, 2023, at 7:15 AM, Jeff Layton <jlayton@kernel.org> wrote:
> > > > > 
> > > > > I sent the first patch in this series the other day, but didn't get any
> > > > > responses.
> > > > 
> > > > We'll have to work out who will take which patches in this set.
> > > > Once fully reviewed, I can take the set if the client maintainers
> > > > send Acks for 2-4 and 6-7.
> > > > 
> > > > nfsd-next for v6.4 is not yet open. I can work on setting that up
> > > > today.
> > > > 
> > > > 
> > > > > Since then I've had time to follow up on the client-side part
> > > > > of this problem, which eventually also pointed out yet another bug on
> > > > > the server side. There are also a couple of cleanup patches in here too,
> > > > > and a patch to add some tracepoints that I found useful while diagnosing
> > > > > this.
> > > > > 
> > > > > With this set on both client and server, I'm now able to run Yongcheng's
> > > > > test for an hour straight with no stuck locks.
> > > 
> > > My nfstest_lock test occasionally gets into an endless wait loop for the lock in
> > > one of the optests.
> 
> I forgot to mention that the regression is only with nfsversion=3!
> Is anyone else running nfstest_lock with nfsversion=3?
> 
> > > 
> > > AFAIK, this started happening after I upgraded my client machine to v5.15.88.
> > > Does this seem related to the client bug fixes in this patch set?
> > > 
> > > If so, is this bug a regression? And why are the fixes aimed at v6.4?
> > > 
> > 
> > Most of this (lockd) code hasn't changed in well over a decade, so if
> > this is a regression then it's a very old one. I suppose it's possible
> > that this regressed after the BKL was removed from this code, but that
> > was a long time ago now and I'm not sure I can identify a commit that
> > this fixes.
> > 
> > I'm fine with this going in sooner than v6.4, but given that this has
> > been broken so long, I didn't see the need to rush.
> > 
> 
> I don't know what the relation is between the optest regression I am
> experiencing and the client and server bugs mentioned in this patch set.
> I just re-tested optest01 with several combinations of client-server kernels.
> I rebooted both client and server before each test.
> The results are a bit odd:
> 
> client        server     optest01 result
> ------------------------------------------------------------------
> 5.10.109      5.10.109   optest01 completes successfully after <30s
> 5.15.88       5.15.88    optest01 never completes (see attached log)
> 5.15.88       5.10.109   optest01 never completes
> 5.15.88+ [*]  5.15.88    optest01 never completes
> 5.15.88+      5.10.109   optest01 never completes
> 5.15.88+      5.15.88+   optest01 completes successfully after ~300s [**]
> 
> Unless I missed something with the tests, it looks like:
> 1.a. There was a regression in the client between 5.10.109 and 5.15.88
> 1.b. The regression is manifested with both 5.10 and 5.15 servers
> 2.a. The patches improve the situation (from infinite to 30s per wait)...
> 2.b. ...but only when applied to both client and server, and...
> 2.c. The situation is still a lot worse than a 5.10 client with a 5.10 server
> 
> Also attached is the NFS[D] Kconfig, which is identical for the tested
> 5.10 and 5.15 kernels.
> 
> Do you need me to provide any traces or any other info?
> 
> Thanks,
> Amir.
> 
> [*] 5.15.88+ stands for 5.15.88 + the patches in this set, which all
> apply cleanly
> [**] The test takes 300s because every single 30s wait takes the entire 30s:
> 
>     DBG1: 15:21:47.118095 - Unlock file (F_UNLCK, F_SETLK) off=0 len=0 range(0, 18446744073709551615)
>     DBG3: 15:21:47.119832 - Wait up to 30 secs to check if blocked lock has been granted @253.87
>     DBG3: 15:21:48.121296 - Check if blocked lock has been granted @254.87
> ...
>     DBG3: 15:22:14.158314 - Check if blocked lock has been granted @280.90
>     DBG3: 15:22:15.017594 - Getting results from blocked lock @281.76
>     DBG1: 15:22:15.017832 - Unlock file (F_UNLCK, F_SETLK) off=0 len=0 range(0, 18446744073709551615) on second process @281.76
>     PASS: Locking byte range (72 passed, 0 failed)

This sounds like a different problem than what this patchset fixes. This
patchset is really all about signal handling during the wait for a lock.
That sounds more like the wait is just not completing?
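
Concretely, the case in question is a blocking F_SETLKW that takes a
signal mid-wait. A minimal sketch of that case (it assumes some other
process already holds a conflicting lock so that the fcntl() actually
blocks; the path and the 5-second alarm are placeholders):

#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <unistd.h>

static void on_alarm(int sig)
{
        (void)sig;      /* empty handler: just interrupt the syscall */
}

int main(void)
{
        struct sigaction sa = { .sa_handler = on_alarm };  /* no SA_RESTART */
        struct flock fl = { .l_type = F_WRLCK, .l_whence = SEEK_SET };
        int fd = open("/mnt/nfs/testfile", O_RDWR);

        if (fd < 0) {
                perror("open");
                return 1;
        }
        sigaction(SIGALRM, &sa, NULL);
        alarm(5);       /* signal is delivered while we wait for the lock */

        if (fcntl(fd, F_SETLKW, &fl) < 0 && errno == EINTR)
                fprintf(stderr, "lock wait interrupted by signal\n");
        return 0;
}

If a GRANTED callback is already in flight when the waiter gives up like
this, the client ends up rejecting the grant, and the server then has to
unlock (that's the "server should unlock lock if client rejects the
grant" patch); otherwise the lock is left stuck on the server.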

I just kicked off this test in nfstests with vers=3 and I think I see
the same 30s stalls. Coincidentally:

    #define NLMCLNT_POLL_TIMEOUT    (30*HZ)                            

So it does look like something may be going wrong with the lock granting
mechanism. I'll need to do a bit of investigation to figure out what's
going on.
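
As a userspace analogy for the lost-wakeup class of race that would
explain 30s-quantized stalls (plain pthreads, not the lockd code
itself): if the grant can arrive before the waiter has registered for
it, and the waiter sleeps without re-checking state under the lock, it
burns its entire timeout:

#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;
static bool granted;                    /* "GRANTED_MSG has arrived" */

static void *grant_thread(void *arg)    /* stands in for the server */
{
        (void)arg;
        pthread_mutex_lock(&lock);
        granted = true;                 /* may fire before the waiter sleeps */
        pthread_cond_signal(&cond);     /* lost if nobody is waiting yet */
        pthread_mutex_unlock(&lock);
        return NULL;
}

int main(void)
{
        pthread_t t;
        struct timespec deadline;

        pthread_create(&t, NULL, grant_thread, NULL);
        sleep(1);                       /* let the grant win the race */

        clock_gettime(CLOCK_REALTIME, &deadline);
        deadline.tv_sec += 30;          /* NLMCLNT_POLL_TIMEOUT analogue */

        pthread_mutex_lock(&lock);
        while (!granted)                /* without this re-check, we would
                                           sleep out the full 30 seconds */
                pthread_cond_timedwait(&cond, &lock, &deadline);
        pthread_mutex_unlock(&lock);

        printf("granted without waiting out the timeout\n");
        pthread_join(t, NULL);
        return 0;
}

The predicate re-check under the mutex is the point of the sketch: a
condvar signal sent before anyone is waiting is simply lost, so a waiter
that doesn't re-test its wake condition is exactly the kind of waiter
that stalls for its full poll interval.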