From patchwork Tue Aug 30 11:48:23 2016
X-Patchwork-Submitter: Jeff Layton
X-Patchwork-Id: 9305263
From: Jeff Layton
To: bfields@fieldses.org
Cc: linux-nfs@vger.kernel.org, linux-kernel@vger.kernel.org,
    Olaf Hering, Linus Torvalds
Subject: [PATCH] nfsd: more robust allocation failure handling in nfsd_reply_cache_init
Date: Tue, 30 Aug 2016 07:48:23 -0400
Message-Id: <1472557703-5985-1-git-send-email-jlayton@redhat.com>

Currently, we try to allocate the cache's hash table as a single, large
chunk, which can fail if no sufficiently large contiguous region of
memory is available. We _do_ try to size it according to the amount of
memory in the box, but if the server is started well after boot time,
the allocation can still fail due to memory fragmentation.

Handle this more gracefully by cutting max_drc_entries in half and
retrying whenever the allocation fails. Only give up once the size of
the attempted allocation would drop below a page.

Reported-by: Olaf Hering
Cc: Linus Torvalds
Signed-off-by: Jeff Layton
---
 fs/nfsd/nfscache.c | 27 +++++++++++++++++++++++----
 1 file changed, 23 insertions(+), 4 deletions(-)

While this would be good to get in, I don't see any particular urgency
here. This seems like it'd be reasonable for v4.9.
diff --git a/fs/nfsd/nfscache.c b/fs/nfsd/nfscache.c
index 54cde9a5864e..b8aaa7a71412 100644
--- a/fs/nfsd/nfscache.c
+++ b/fs/nfsd/nfscache.c
@@ -155,14 +155,12 @@ nfsd_reply_cache_free(struct nfsd_drc_bucket *b, struct svc_cacherep *rp)
 int nfsd_reply_cache_init(void)
 {
-	unsigned int hashsize;
+	unsigned int hashsize, target_hashsize;
 	unsigned int i;
 	int status = 0;
 
 	max_drc_entries = nfsd_cache_size_limit();
 	atomic_set(&num_drc_entries, 0);
-	hashsize = nfsd_hashsize(max_drc_entries);
-	maskbits = ilog2(hashsize);
 
 	status = register_shrinker(&nfsd_reply_cache_shrinker);
 	if (status)
 		return status;
@@ -173,9 +171,30 @@ int nfsd_reply_cache_init(void)
 	if (!drc_slab)
 		goto out_nomem;
 
-	drc_hashtbl = kcalloc(hashsize, sizeof(*drc_hashtbl), GFP_KERNEL);
+	/*
+	 * Attempt to allocate the hashtable, and progressively shrink the
+	 * size as the allocations fail. If the allocation size ends up being
+	 * smaller than a page however, then just give up.
+	 */
+	target_hashsize = nfsd_hashsize(max_drc_entries);
+	hashsize = target_hashsize;
+	do {
+		maskbits = ilog2(hashsize);
+		drc_hashtbl = kcalloc(hashsize, sizeof(*drc_hashtbl),
+				      GFP_KERNEL|__GFP_NOWARN);
+		if (drc_hashtbl)
+			break;
+		max_drc_entries /= 2;
+		hashsize = nfsd_hashsize(max_drc_entries);
+	} while ((hashsize * sizeof(*drc_hashtbl)) >= PAGE_SIZE);
+
 	if (!drc_hashtbl)
 		goto out_nomem;
+
+	if (hashsize != target_hashsize)
+		pr_warn("NFSD: had to shrink reply cache hashtable (wanted %u, got %u)\n",
+			target_hashsize, hashsize);
+
 	for (i = 0; i < hashsize; i++) {
 		INIT_LIST_HEAD(&drc_hashtbl[i].lru_head);
 		spin_lock_init(&drc_hashtbl[i].cache_lock);
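
For anyone who wants to play with the shrink-and-retry strategy outside
the kernel, below is a minimal userspace sketch of the same idea. It is
an illustration only: alloc_hashtable(), struct bucket, and the 1M-entry
target are made up for the example, calloc()/sysconf() stand in for
kcalloc()/PAGE_SIZE, and it halves the entry count directly rather than
recomputing it through nfsd_hashsize() as the patch does.

	#include <stdio.h>
	#include <stdlib.h>
	#include <unistd.h>

	struct bucket {
		void *lru_head;	/* stand-in for the real per-bucket fields */
	};

	/*
	 * Try to allocate 'target' buckets; halve the count on each failure
	 * and stop once the request would drop below one page, mirroring the
	 * do/while loop the patch adds to nfsd_reply_cache_init().
	 */
	static struct bucket *alloc_hashtable(size_t target, size_t *chosen)
	{
		long page_size = sysconf(_SC_PAGESIZE);
		size_t n = target;
		struct bucket *tbl;

		do {
			tbl = calloc(n, sizeof(*tbl));
			if (tbl) {
				*chosen = n;
				return tbl;
			}
			n /= 2;	/* shrink and retry */
		} while (n * sizeof(*tbl) >= (size_t)page_size);

		return NULL;	/* even a page-sized table failed */
	}

	int main(void)
	{
		size_t got = 0;
		struct bucket *tbl = alloc_hashtable(1 << 20, &got);

		if (!tbl)
			return 1;
		if (got != (1 << 20))
			fprintf(stderr, "shrunk table: wanted %u, got %zu\n",
				1u << 20, got);
		free(tbl);
		return 0;
	}

Note that the kernel version also passes __GFP_NOWARN so the expected
failures of the larger attempts don't spam the log with allocation
warnings; the userspace analogue has no equivalent knob.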