diff mbox series

[1/2] lockd: fix server crash on reboot of client holding lock

Message ID 20220118220016.GB16108@fieldses.org (mailing list archive)
State New, archived
Headers show
Series [1/2] lockd: fix server crash on reboot of client holding lock | expand

Commit Message

J. Bruce Fields Jan. 18, 2022, 10 p.m. UTC
From: "J. Bruce Fields" <bfields@redhat.com>

I thought I was iterating over the array when actually the iteration is
over the values contained in the array?

Ugh, keep it simple.

Symptoms were a null deference in vfs_lock_file() when an NFSv3 client
that previously held a lock came back up and sent a notify.

Reported-by: Jonathan Woithe <jwoithe@just42.net>
Fixes: 7f024fcd5c97 ("Keep read and write fds with each nlm_file")
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
---
 fs/lockd/svcsubs.c | 17 +++++++++--------
 1 file changed, 9 insertions(+), 8 deletions(-)

Comments

Jonathan Woithe Jan. 18, 2022, 10:20 p.m. UTC | #1
On Tue, Jan 18, 2022 at 05:00:16PM -0500, Bruce Fields wrote:
> From: "J. Bruce Fields" <bfields@redhat.com>
> 
> I thought I was iterating over the array when actually the iteration is
> over the values contained in the array?
> :

Would you like me to apply this against a 5.15.x kernel and test locally? 
Or should I just wait for a 5.15.x stable series update which includes it?

Regards
  jonathan
J. Bruce Fields Jan. 18, 2022, 10:27 p.m. UTC | #2
On Wed, Jan 19, 2022 at 08:50:50AM +1030, Jonathan Woithe wrote:
> On Tue, Jan 18, 2022 at 05:00:16PM -0500, Bruce Fields wrote:
> > From: "J. Bruce Fields" <bfields@redhat.com>
> > 
> > I thought I was iterating over the array when actually the iteration is
> > over the values contained in the array?
> > :
> 
> Would you like me to apply this against a 5.15.x kernel and test locally? 
> Or should I just wait for a 5.15.x stable series update which includes it?

I'm pretty confident I'm reproducing the same problem you saw, so it'd
be fine to just wait for an update.

(But if you do test these patches, let us know, one more confirmation
never hurts.)

--b.
Chuck Lever Jan. 19, 2022, 4:18 p.m. UTC | #3
> On Jan 18, 2022, at 5:00 PM, Bruce Fields <bfields@fieldses.org> wrote:
> 
> From: "J. Bruce Fields" <bfields@redhat.com>
> 
> I thought I was iterating over the array when actually the iteration is
> over the values contained in the array?
> 
> Ugh, keep it simple.
> 
> Symptoms were a null deference in vfs_lock_file() when an NFSv3 client
> that previously held a lock came back up and sent a notify.
> 
> Reported-by: Jonathan Woithe <jwoithe@just42.net>
> Fixes: 7f024fcd5c97 ("Keep read and write fds with each nlm_file")
> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
> ---
> fs/lockd/svcsubs.c | 17 +++++++++--------
> 1 file changed, 9 insertions(+), 8 deletions(-)
> 
> diff --git a/fs/lockd/svcsubs.c b/fs/lockd/svcsubs.c
> index cb3a7512c33e..54c2e42130ca 100644
> --- a/fs/lockd/svcsubs.c
> +++ b/fs/lockd/svcsubs.c
> @@ -179,19 +179,20 @@ nlm_delete_file(struct nlm_file *file)
> static int nlm_unlock_files(struct nlm_file *file)
> {
> 	struct file_lock lock;
> -	struct file *f;
> 
> 	lock.fl_type  = F_UNLCK;
> 	lock.fl_start = 0;
> 	lock.fl_end   = OFFSET_MAX;
> -	for (f = file->f_file[0]; f <= file->f_file[1]; f++) {
> -		if (f && vfs_lock_file(f, F_SETLK, &lock, NULL) < 0) {
> -			pr_warn("lockd: unlock failure in %s:%d\n",
> -				__FILE__, __LINE__);
> -			return 1;
> -		}
> -	}
> +	if (file->f_file[O_RDONLY] &&
> +	    vfs_lock_file(file->f_file[O_RDONLY], F_SETLK, &lock, NULL))
> +		goto out_err;
> +	if (file->f_file[O_WRONLY] &&
> +	    vfs_lock_file(file->f_file[O_WRONLY], F_SETLK, &lock, NULL))
> +		goto out_err;
> 	return 0;
> +out_err:
> +	pr_warn("lockd: unlock failure in %s:%d\n", __FILE__, __LINE__);
> +	return 1;
> }
> 
> /*
> -- 
> 2.34.1
> 

Hi Bruce, thanks for the fixes. They've passed my basic smoke tests.
I've applied both patches for the very next nfsd PR. See:

git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux.git for-next

--
Chuck Lever
Jonathan Woithe Jan. 31, 2022, 10:20 p.m. UTC | #4
On Wed, Jan 19, 2022 at 04:18:10PM +0000, Chuck Lever III wrote:
> > On Jan 18, 2022, at 5:00 PM, Bruce Fields <bfields@fieldses.org> wrote:
> > 
> > From: "J. Bruce Fields" <bfields@redhat.com>
> > 
> > I thought I was iterating over the array when actually the iteration is
> > over the values contained in the array?
> > 
> > Ugh, keep it simple.
> > 
> > Symptoms were a null deference in vfs_lock_file() when an NFSv3 client
> > that previously held a lock came back up and sent a notify.
> > 
> > Reported-by: Jonathan Woithe <jwoithe@just42.net>
> > Fixes: 7f024fcd5c97 ("Keep read and write fds with each nlm_file")
> > Signed-off-by: J. Bruce Fields <bfields@redhat.com>
> > ---
> > :
> Hi Bruce, thanks for the fixes. They've passed my basic smoke tests.
> I've applied both patches for the very next nfsd PR. See:
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux.git for-next

OOI, is it expected that these fixes will appear in a 5.15.x stable branch
patch at some point?  I've looked at the 5.15.17 and 5.15.18 changelogs and
they don't appear to be mentioned yet.

Regards
  jonathan
Chuck Lever Feb. 1, 2022, 2:10 a.m. UTC | #5
> On Jan 31, 2022, at 5:20 PM, Jonathan Woithe <jwoithe@just42.net> wrote:
> 
> On Wed, Jan 19, 2022 at 04:18:10PM +0000, Chuck Lever III wrote:
>>> On Jan 18, 2022, at 5:00 PM, Bruce Fields <bfields@fieldses.org> wrote:
>>> 
>>> From: "J. Bruce Fields" <bfields@redhat.com>
>>> 
>>> I thought I was iterating over the array when actually the iteration is
>>> over the values contained in the array?
>>> 
>>> Ugh, keep it simple.
>>> 
>>> Symptoms were a null deference in vfs_lock_file() when an NFSv3 client
>>> that previously held a lock came back up and sent a notify.
>>> 
>>> Reported-by: Jonathan Woithe <jwoithe@just42.net>
>>> Fixes: 7f024fcd5c97 ("Keep read and write fds with each nlm_file")
>>> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
>>> ---
>>> :
>> Hi Bruce, thanks for the fixes. They've passed my basic smoke tests.
>> I've applied both patches for the very next nfsd PR. See:
>> 
>> git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux.git for-next
> 
> OOI, is it expected that these fixes will appear in a 5.15.x stable branch
> patch at some point?  I've looked at the 5.15.17 and 5.15.18 changelogs and
> they don't appear to be mentioned yet.

These are in linux-next right now. I intend to send a pull request this
week. I was waiting for some other fixes, but those are not going as
quickly as I hoped, so I'm going to send these two along with another
fix that are all ready now.

Once they are in v5.17-rc, they will be picked up automatically and
applied to open stable branches. If they do not apply cleanly, then
someone will have to construct and test a version of the fixes
specifically for each of the stable branches.

--
Chuck Lever
Jonathan Woithe March 23, 2022, 11:33 p.m. UTC | #6
On Tue, Jan 18, 2022 at 05:27:03PM -0500, Bruce Fields wrote:
> On Wed, Jan 19, 2022 at 08:50:50AM +1030, Jonathan Woithe wrote:
> > On Tue, Jan 18, 2022 at 05:00:16PM -0500, Bruce Fields wrote:
> > > From: "J. Bruce Fields" <bfields@redhat.com>
> > > 
> > > I thought I was iterating over the array when actually the iteration is
> > > over the values contained in the array?
> > > :
> > 
> > Would you like me to apply this against a 5.15.x kernel and test locally? 
> > Or should I just wait for a 5.15.x stable series update which includes it?
> 
> I'm pretty confident I'm reproducing the same problem you saw, so it'd
> be fine to just wait for an update.
> 
> (But if you do test these patches, let us know, one more confirmation
> never hurts.)

The shift back to a 5.15.x kernel ended up being delayed for a while for
various reasons.  The server concerned was eventually upgraded to 5.15.27 on
9 March 2022.  In that time, client machines have been turned on and off and
inevitably the conditions which caused the crash have been exercised many
times (libreoffice, firefox and thunderbird are used daily on almost all of
the clients).  The server has not experienced the crash since the upgrade. 
On the basis of this I think it's fair to consider our problem solved.

Thanks for your quick response to the report and the rapid provision of the
fix.

Regards
  jonathan
J. Bruce Fields March 24, 2022, 6:28 p.m. UTC | #7
On Thu, Mar 24, 2022 at 10:03:23AM +1030, Jonathan Woithe wrote:
> On Tue, Jan 18, 2022 at 05:27:03PM -0500, Bruce Fields wrote:
> > On Wed, Jan 19, 2022 at 08:50:50AM +1030, Jonathan Woithe wrote:
> > > On Tue, Jan 18, 2022 at 05:00:16PM -0500, Bruce Fields wrote:
> > > > From: "J. Bruce Fields" <bfields@redhat.com>
> > > > 
> > > > I thought I was iterating over the array when actually the iteration is
> > > > over the values contained in the array?
> > > > :
> > > 
> > > Would you like me to apply this against a 5.15.x kernel and test locally? 
> > > Or should I just wait for a 5.15.x stable series update which includes it?
> > 
> > I'm pretty confident I'm reproducing the same problem you saw, so it'd
> > be fine to just wait for an update.
> > 
> > (But if you do test these patches, let us know, one more confirmation
> > never hurts.)
> 
> The shift back to a 5.15.x kernel ended up being delayed for a while for
> various reasons.  The server concerned was eventually upgraded to 5.15.27 on
> 9 March 2022.  In that time, client machines have been turned on and off and
> inevitably the conditions which caused the crash have been exercised many
> times (libreoffice, firefox and thunderbird are used daily on almost all of
> the clients).  The server has not experienced the crash since the upgrade. 
> On the basis of this I think it's fair to consider our problem solved.

Thanks for the confirmation.--b.
diff mbox series

Patch

diff --git a/fs/lockd/svcsubs.c b/fs/lockd/svcsubs.c
index cb3a7512c33e..54c2e42130ca 100644
--- a/fs/lockd/svcsubs.c
+++ b/fs/lockd/svcsubs.c
@@ -179,19 +179,20 @@  nlm_delete_file(struct nlm_file *file)
 static int nlm_unlock_files(struct nlm_file *file)
 {
 	struct file_lock lock;
-	struct file *f;
 
 	lock.fl_type  = F_UNLCK;
 	lock.fl_start = 0;
 	lock.fl_end   = OFFSET_MAX;
-	for (f = file->f_file[0]; f <= file->f_file[1]; f++) {
-		if (f && vfs_lock_file(f, F_SETLK, &lock, NULL) < 0) {
-			pr_warn("lockd: unlock failure in %s:%d\n",
-				__FILE__, __LINE__);
-			return 1;
-		}
-	}
+	if (file->f_file[O_RDONLY] &&
+	    vfs_lock_file(file->f_file[O_RDONLY], F_SETLK, &lock, NULL))
+		goto out_err;
+	if (file->f_file[O_WRONLY] &&
+	    vfs_lock_file(file->f_file[O_WRONLY], F_SETLK, &lock, NULL))
+		goto out_err;
 	return 0;
+out_err:
+	pr_warn("lockd: unlock failure in %s:%d\n", __FILE__, __LINE__);
+	return 1;
 }
 
 /*