diff mbox

[4/4] NFSv4.1: Don't cache deviceids that have no notifications

Message ID 1425931503-37261-5-git-send-email-trond.myklebust@primarydata.com (mailing list archive)
State New, archived
Headers show

Commit Message

Trond Myklebust March 9, 2015, 8:05 p.m. UTC
The spec says that once all layouts that reference a given deviceid
have been returned, then we are only allowed to continue to cache
the deviceid if the metadata server supports notifications.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
---
 fs/nfs/nfs4proc.c | 2 ++
 fs/nfs/pnfs.h     | 2 ++
 fs/nfs/pnfs_dev.c | 9 +++++++++
 3 files changed, 13 insertions(+)

Comments

Christoph Hellwig March 17, 2015, 10:37 a.m. UTC | #1
On Mon, Mar 09, 2015 at 04:05:03PM -0400, Trond Myklebust wrote:
> The spec says that once all layouts that reference a given deviceid
> have been returned, then we are only allowed to continue to cache
> the deviceid if the metadata server supports notifications.

This causes massive performance issues for object and block layout
servers where a GETDEVICEINFO (or rather the client processing of it)
is expensive.  Also it increases the deadlock potential as the
GETDEVICEINFO generally isn't safe for writeback under memory pressure
(and yes, we'll need more fixes in that area).

I've also filed an errata a while ago to update the language in the spec
in this area to be consistent and not enforce this behavior:

http://www.rfc-editor.org/errata_search.php?rfc=5661&eid=4119

I think the right fix is to have a shrinker that allows the nfs client
to retire unused devices on a lru basis under memory pressure.
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Trond Myklebust March 22, 2015, 7:27 p.m. UTC | #2
On Tue, Mar 17, 2015 at 6:37 AM, Christoph Hellwig <hch@infradead.org> wrote:
> On Mon, Mar 09, 2015 at 04:05:03PM -0400, Trond Myklebust wrote:
>> The spec says that once all layouts that reference a given deviceid
>> have been returned, then we are only allowed to continue to cache
>> the deviceid if the metadata server supports notifications.
>
> This causes massive performance issues for object and block layout
> servers where a GETDEVICEINFO (or rather the client processing of it)
> is expensive.  Also it increases the deadlock potential as the
> GETDEVICEINFO generally isn't safe for writeback under memory pressure
> (and yes, we'll need more fixes in that area).
>
> I've also filed an errata a while ago to update the language in the spec
> in this area to be consistent and not enforce this behavior:
>
> http://www.rfc-editor.org/errata_search.php?rfc=5661&eid=4119
>
> I think the right fix is to have a shrinker that allows the nfs client
> to retire unused devices on a lru basis under memory pressure.

No. The right fix is to make the server support device notifications.
Christoph Hellwig March 25, 2015, 8:55 a.m. UTC | #3
On Sun, Mar 22, 2015 at 03:27:32PM -0400, Trond Myklebust wrote:
> No. The right fix is to make the server support device notifications.

And you instantly thrash performane on any server that doesn't, which
is why I sent out the errata that no one objected out.  I'm happy to
open this again on the WG list, but the notification scheme is such a
big hammer that it absolutely isn't worth implementing.  Given how
GETDEVICELIST is specified in 4.1 the original RFC language also is
inconsistent with itself, so that's not an argument for the old
behavior.
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index ba8b2b5e98a1..9ff8c63c9cac 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -7962,6 +7962,8 @@  _nfs4_proc_getdeviceinfo(struct nfs_server *server,
 	status = nfs4_call_sync(server->client, server, &msg, &args.seq_args, &res.seq_res, 0);
 	if (res.notification & ~args.notify_types)
 		dprintk("%s: unsupported notification\n", __func__);
+	if (res.notification != args.notify_types)
+		pdev->nocache = 1;
 
 	dprintk("<-- %s status=%d\n", __func__, status);
 
diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index a1fc16c971a7..b5654e8da936 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -203,6 +203,7 @@  struct pnfs_device {
 	struct page **pages;
 	unsigned int  pgbase;
 	unsigned int  pglen;	/* reply buffer length */
+	unsigned char nocache : 1;/* May not be cached */
 };
 
 #define NFS4_PNFS_GETDEVLIST_MAXNUM 16
@@ -291,6 +292,7 @@  void pnfs_error_mark_layout_for_return(struct inode *inode,
 enum {
 	NFS_DEVICEID_INVALID = 0,       /* set when MDS clientid recalled */
 	NFS_DEVICEID_UNAVAILABLE,	/* device temporarily unavailable */
+	NFS_DEVICEID_NOCACHE,		/* device may not be cached */
 };
 
 /* pnfs_dev.c */
diff --git a/fs/nfs/pnfs_dev.c b/fs/nfs/pnfs_dev.c
index 7e07f4ba4822..2961fcd7a2df 100644
--- a/fs/nfs/pnfs_dev.c
+++ b/fs/nfs/pnfs_dev.c
@@ -149,6 +149,8 @@  nfs4_get_device_info(struct nfs_server *server,
 	 */
 	d = server->pnfs_curr_ld->alloc_deviceid_node(server, pdev,
 			gfp_flags);
+	if (d && pdev->nocache)
+		set_bit(NFS_DEVICEID_NOCACHE, &d->flags);
 
 out_free_pages:
 	for (i = 0; i < max_pages; i++)
@@ -235,6 +237,7 @@  nfs4_delete_deviceid(const struct pnfs_layoutdriver_type *ld,
 		return;
 	}
 	hlist_del_init_rcu(&d->node);
+	clear_bit(NFS_DEVICEID_NOCACHE, &d->flags);
 	spin_unlock(&nfs4_deviceid_lock);
 
 	/* balance the initial ref set in pnfs_insert_deviceid */
@@ -269,6 +272,11 @@  EXPORT_SYMBOL_GPL(nfs4_init_deviceid_node);
 bool
 nfs4_put_deviceid_node(struct nfs4_deviceid_node *d)
 {
+	if (test_bit(NFS_DEVICEID_NOCACHE, &d->flags)) {
+		if (atomic_add_unless(&d->ref, -1, 2))
+			return false;
+		nfs4_delete_deviceid(d->ld, d->nfs_client, &d->deviceid);
+	}
 	if (!atomic_dec_and_test(&d->ref))
 		return false;
 	d->ld->free_deviceid_node(d);
@@ -312,6 +320,7 @@  _deviceid_purge_client(const struct nfs_client *clp, long hash)
 		if (d->nfs_client == clp && atomic_read(&d->ref)) {
 			hlist_del_init_rcu(&d->node);
 			hlist_add_head(&d->tmpnode, &tmp);
+			clear_bit(NFS_DEVICEID_NOCACHE, &d->flags);
 		}
 	rcu_read_unlock();
 	spin_unlock(&nfs4_deviceid_lock);