diff mbox

[v3,1/1] PNFS fix dangling DS mount

Message ID 1500584980.6577.4.camel@primarydata.com (mailing list archive)
State New, archived
Headers show

Commit Message

Trond Myklebust July 20, 2017, 9:09 p.m. UTC
On Thu, 2017-07-20 at 16:14 -0400, Olga Kornievskaia wrote:
> > On Jul 20, 2017, at 3:56 PM, Trond Myklebust <trondmy@primarydata.c
> > om> wrote:
> > 
> > Hi Olga,
> > 
> > Apologies for missing this patch. It was hiding in my 'linux-
> > fsdevel'
> > mailbox, so I didn't recognise it as a NFS patch.
> > 
> 
> Yeah after I mailed it out I realized I cc-ed fsdevel incorrectly.
> 
> > On Fri, 2017-06-30 at 15:52 -0400, Olga Kornievskaia wrote:
> > > There is a regression by commit 8d40b0f14846 ("NFS
> > > filelayout:call
> > > GETDEVICEINFO after pnfs_layout_process completes"). It leaves
> > > the
> > > DS mount dangling.
> > > 
> > > Previously, filelayout_alloc_sec() would call
> > > filelayout_check_layout()
> > > which would call nfs4_find_get_deviceid which ups the count on
> > > the
> > > device_id. It's only called once and it's matched by the
> > > filelayout_free_lseg() that calls nfs4_fl_put_deviceid().
> > > 
> > > After that patch, each read/write ends up calling
> > > nfs4_find_get_deviceid
> > > and there is no balance for that. Instead, do
> > > nfs4_fl_put_deviceid()
> > > in the filelayout's .pg_cleanup and remove it from
> > > filelayout_free_lseg.
> > > 
> > > But we still need a reference to hold over the lifetime of the
> > > segment.
> > > For every new lseg that's created we need to take a reference on
> > > deviceid
> > > that uses it. It will be released in the "free_lseg" routine.
> > 
> > This is what I'm not understanding. If you have a reference in the
> > layout segment, then why do you need to call
> > nfs4_find_get_deviceid()
> > in the read/write code?
> 
> I think I’m probably misunderstanding the question. It sounds to me
> that you asking me for why the commit 8d40b0f14846 was done the way
> it was done (I’d would say it was done as per your suggestion).
> 
> I would say the call to nfs4_find_get_deviceid() has always been in
> the read/write code. It was a part of the pnfs_update_layout() but
> the commit 8d40b0f14846 moved it out of it (so that the layoutget was
> complete) and then the call to the getdeviceinfo would be done. 
> 
> > Isn't it sufficient to change the "pg_init" calls to check whether
> > or
> > not the struct nfs4_filelayout_segment has set a value for dsaddr
> > (that
> > needs to be done with care to avoid races - cmpxchg() is your
> > friend),
> > and then rely on that reference being set for the remainder of the
> > layout segment lifetime?
> 
> Are you suggesting to change when getdeviceinfo is suppose to be
> called?
> 

No. Now that I'm looking at filelayout_check_deviceid(), here is what I
 suggest: we need to ensure that filelayout_check_deviceid() sets the
deviceid once, and only once!

How about something like the following (untested) patch?

8<-------------------------------------------------------
From 1e40ee13950d03ad2a54a0d1dba35f2a04c28ca0 Mon Sep 17 00:00:00 2001
From: Trond Myklebust <trond.myklebust@primarydata.com>
Date: Thu, 20 Jul 2017 17:00:02 -0400
Subject: [PATCH] NFS/filelayout: Fix racy setting of fl->dsaddr in
 filelayout_check_deviceid()

We must set fl->dsaddr once, and once only, even if there are multiple
processes calling filelayout_check_deviceid() for the same layout
segment.

Reported-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
---
 fs/nfs/filelayout/filelayout.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

-- 
2.13.3

-- 
Trond Myklebust
Linux NFS client maintainer, PrimaryData
trond.myklebust@primarydata.com

Comments

Olga Kornievskaia July 21, 2017, 3:46 p.m. UTC | #1
On Thu, Jul 20, 2017 at 5:09 PM, Trond Myklebust
<trondmy@primarydata.com> wrote:
> On Thu, 2017-07-20 at 16:14 -0400, Olga Kornievskaia wrote:
>> > On Jul 20, 2017, at 3:56 PM, Trond Myklebust <trondmy@primarydata.c
>> > om> wrote:
>> >
>> > Hi Olga,
>> >
>> > Apologies for missing this patch. It was hiding in my 'linux-
>> > fsdevel'
>> > mailbox, so I didn't recognise it as a NFS patch.
>> >
>>
>> Yeah after I mailed it out I realized I cc-ed fsdevel incorrectly.
>>
>> > On Fri, 2017-06-30 at 15:52 -0400, Olga Kornievskaia wrote:
>> > > There is a regression by commit 8d40b0f14846 ("NFS
>> > > filelayout:call
>> > > GETDEVICEINFO after pnfs_layout_process completes"). It leaves
>> > > the
>> > > DS mount dangling.
>> > >
>> > > Previously, filelayout_alloc_sec() would call
>> > > filelayout_check_layout()
>> > > which would call nfs4_find_get_deviceid which ups the count on
>> > > the
>> > > device_id. It's only called once and it's matched by the
>> > > filelayout_free_lseg() that calls nfs4_fl_put_deviceid().
>> > >
>> > > After that patch, each read/write ends up calling
>> > > nfs4_find_get_deviceid
>> > > and there is no balance for that. Instead, do
>> > > nfs4_fl_put_deviceid()
>> > > in the filelayout's .pg_cleanup and remove it from
>> > > filelayout_free_lseg.
>> > >
>> > > But we still need a reference to hold over the lifetime of the
>> > > segment.
>> > > For every new lseg that's created we need to take a reference on
>> > > deviceid
>> > > that uses it. It will be released in the "free_lseg" routine.
>> >
>> > This is what I'm not understanding. If you have a reference in the
>> > layout segment, then why do you need to call
>> > nfs4_find_get_deviceid()
>> > in the read/write code?
>>
>> I think I’m probably misunderstanding the question. It sounds to me
>> that you asking me for why the commit 8d40b0f14846 was done the way
>> it was done (I’d would say it was done as per your suggestion).
>>
>> I would say the call to nfs4_find_get_deviceid() has always been in
>> the read/write code. It was a part of the pnfs_update_layout() but
>> the commit 8d40b0f14846 moved it out of it (so that the layoutget was
>> complete) and then the call to the getdeviceinfo would be done.
>>
>> > Isn't it sufficient to change the "pg_init" calls to check whether
>> > or
>> > not the struct nfs4_filelayout_segment has set a value for dsaddr
>> > (that
>> > needs to be done with care to avoid races - cmpxchg() is your
>> > friend),
>> > and then rely on that reference being set for the remainder of the
>> > layout segment lifetime?
>>
>> Are you suggesting to change when getdeviceinfo is suppose to be
>> called?
>>
>
> No. Now that I'm looking at filelayout_check_deviceid(), here is what I
>  suggest: we need to ensure that filelayout_check_deviceid() sets the
> deviceid once, and only once!
>
> How about something like the following (untested) patch?
>
> 8<-------------------------------------------------------
> From 1e40ee13950d03ad2a54a0d1dba35f2a04c28ca0 Mon Sep 17 00:00:00 2001
> From: Trond Myklebust <trond.myklebust@primarydata.com>
> Date: Thu, 20 Jul 2017 17:00:02 -0400
> Subject: [PATCH] NFS/filelayout: Fix racy setting of fl->dsaddr in
>  filelayout_check_deviceid()
>
> We must set fl->dsaddr once, and once only, even if there are multiple
> processes calling filelayout_check_deviceid() for the same layout
> segment.
>
> Reported-by: Olga Kornievskaia <kolga@netapp.com>
> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
> ---
>  fs/nfs/filelayout/filelayout.c | 13 +++++++++++--
>  1 file changed, 11 insertions(+), 2 deletions(-)
>
> diff --git a/fs/nfs/filelayout/filelayout.c b/fs/nfs/filelayout/filelayout.c
> index 080fc6b278bd..44c638b7876c 100644
> --- a/fs/nfs/filelayout/filelayout.c
> +++ b/fs/nfs/filelayout/filelayout.c
> @@ -542,6 +542,10 @@ filelayout_check_deviceid(struct pnfs_layout_hdr *lo,
>         struct nfs4_file_layout_dsaddr *dsaddr;
>         int status = -EINVAL;
>
> +       /* Is the deviceid already set? If so, we're good. */
> +       if (fl->dsaddr != NULL)
> +               return 0;
> +
>         /* find and reference the deviceid */
>         d = nfs4_find_get_deviceid(NFS_SERVER(lo->plh_inode), &fl->deviceid,
>                         lo->plh_lc_cred, gfp_flags);
> @@ -553,8 +557,6 @@ filelayout_check_deviceid(struct pnfs_layout_hdr *lo,
>         if (filelayout_test_devid_unavailable(&dsaddr->id_node))
>                 goto out_put;
>
> -       fl->dsaddr = dsaddr;
> -
>         if (fl->first_stripe_index >= dsaddr->stripe_count) {
>                 dprintk("%s Bad first_stripe_index %u\n",
>                                 __func__, fl->first_stripe_index);
> @@ -570,6 +572,13 @@ filelayout_check_deviceid(struct pnfs_layout_hdr *lo,
>                 goto out_put;
>         }
>         status = 0;
> +
> +       /*
> +        * Atomic compare and xchange to ensure we don't scribble
> +        * over a non-NULL pointer.
> +        */
> +       if (cmpxchg(&fl->dsaddr, NULL, dsaddr) != NULL)
> +               goto out_put;
>  out:
>         return status;
>  out_put:
> --
> 2.13.3
>

No dangling DS after this patch as well.

Question: do we need to worry about that the checks that are done on
the deviceid done after finding one are skipped with this patch?
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/fs/nfs/filelayout/filelayout.c b/fs/nfs/filelayout/filelayout.c
index 080fc6b278bd..44c638b7876c 100644
--- a/fs/nfs/filelayout/filelayout.c
+++ b/fs/nfs/filelayout/filelayout.c
@@ -542,6 +542,10 @@  filelayout_check_deviceid(struct pnfs_layout_hdr *lo,
 	struct nfs4_file_layout_dsaddr *dsaddr;
 	int status = -EINVAL;
 
+	/* Is the deviceid already set? If so, we're good. */
+	if (fl->dsaddr != NULL)
+		return 0;
+
 	/* find and reference the deviceid */
 	d = nfs4_find_get_deviceid(NFS_SERVER(lo->plh_inode), &fl->deviceid,
 			lo->plh_lc_cred, gfp_flags);
@@ -553,8 +557,6 @@  filelayout_check_deviceid(struct pnfs_layout_hdr *lo,
 	if (filelayout_test_devid_unavailable(&dsaddr->id_node))
 		goto out_put;
 
-	fl->dsaddr = dsaddr;
-
 	if (fl->first_stripe_index >= dsaddr->stripe_count) {
 		dprintk("%s Bad first_stripe_index %u\n",
 				__func__, fl->first_stripe_index);
@@ -570,6 +572,13 @@  filelayout_check_deviceid(struct pnfs_layout_hdr *lo,
 		goto out_put;
 	}
 	status = 0;
+
+	/*
+	 * Atomic compare and xchange to ensure we don't scribble
+	 * over a non-NULL pointer.
+	 */
+	if (cmpxchg(&fl->dsaddr, NULL, dsaddr) != NULL)
+		goto out_put;
 out:
 	return status;
 out_put: