From patchwork Fri Sep 20 06:33:16 2019
X-Patchwork-Submitter: NeilBrown
X-Patchwork-Id: 11153791
From: NeilBrown
To: "J. Bruce Fields", "J. Bruce Fields"
Cc: linux-nfs@vger.kernel.org
Date: Fri, 20 Sep 2019 16:33:16 +1000
Subject: [PATCH 1/2 - vers2] nfsd: handle drc over-allocation gracefully.
In-Reply-To: <877e63a3yg.fsf@notabene.neil.brown.name>
References: <1506345704-9486-1-git-send-email-bfields@redhat.com>
 <1506345704-9486-3-git-send-email-bfields@redhat.com>
 <87d0fx9jph.fsf@notabene.neil.brown.name>
 <20190919162211.GA333@pick.fieldses.org>
 <20190919171730.GB333@pick.fieldses.org>
 <20190919184139.GG26654@fieldses.org>
 <877e63a3yg.fsf@notabene.neil.brown.name>
Message-ID: <8736gra34j.fsf@notabene.neil.brown.name>

Currently, if there are more clients than allowed for by the space
allocation in set_max_drc(), we fail a CREATE_SESSION request with
NFS4ERR_DELAY. This means that the client retries indefinitely, which
isn't a user-friendly response.

The RFC requires NFS4ERR_NOSPC, but that would at best result in a
clean failure on the client, which is not much more friendly.

The current space allocation is a best guess and doesn't provide any
guarantees; we could still run out of space when trying to allocate
DRC memory.

So fail more gracefully: always give out at least one slot. If all
clients used all the space in all their slots, we might start getting
memory pressure, but that is possible anyway.

So ensure 'num' is always at least 1, and remove the test for it
being zero.

Signed-off-by: NeilBrown
---
 fs/nfsd/nfs4state.c | 19 +++++++++++++++----
 1 file changed, 15 insertions(+), 4 deletions(-)

Sorry, but I was confused about clamp_t(). If the 'max' is smaller
than the 'min', then the 'max' wins, so 'avail' will never be more
than total_avail/3. I must have believed the comment here rather than
reading the code. So the current code cannot overflow, but I think we
agree that failure is not good. So this patch avoids that failure.

NeilBrown
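
For anyone else puzzled by the same thing, here is a small userspace
sketch (not kernel code; the macros only mimic the kernel's min_t/max_t/
clamp_t semantics, and the sizes are made-up illustrative values) showing
that when the upper bound total_avail/3 drops below slotsize, clamp_t()
returns the upper bound, so the pre-patch code computes zero slots and
fails the CREATE_SESSION:

/* Userspace sketch, not part of the patch.  Values are illustrative. */
#include <stdio.h>

#define max_t(type, a, b) ((type)(a) > (type)(b) ? (type)(a) : (type)(b))
#define min_t(type, a, b) ((type)(a) < (type)(b) ? (type)(a) : (type)(b))
#define clamp_t(type, val, lo, hi) min_t(type, max_t(type, val, lo), hi)

int main(void)
{
	unsigned long slotsize = 2048 + 512;   /* "slightly over 2K" per slot */
	unsigned long total_avail = 4096;      /* DRC space nearly exhausted */
	unsigned long avail = 10 * slotsize;   /* what the client would like */

	/* hi (total_avail/3 = 1365) < lo (slotsize = 2560): hi wins */
	avail = clamp_t(unsigned long, avail, slotsize, total_avail / 3);
	printf("avail = %lu bytes -> %lu slots\n", avail, avail / slotsize);
	/* prints: avail = 1365 bytes -> 0 slots, i.e. the old failure case */
	return 0;
}
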
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 7857942c5ca6..084a30d77359 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -1570,14 +1570,25 @@ static u32 nfsd4_get_drc_mem(struct nfsd4_channel_attrs *ca)
 	unsigned long avail, total_avail;
 
 	spin_lock(&nfsd_drc_lock);
-	total_avail = nfsd_drc_max_mem - nfsd_drc_mem_used;
+	if (nfsd_drc_max_mem > nfsd_drc_mem_used)
+		total_avail = nfsd_drc_max_mem - nfsd_drc_mem_used;
+	else
+		/* We have handed out more space than we chose in
+		 * set_max_drc() to allow.  That isn't really a
+		 * problem as long as that doesn't make us think we
+		 * have lots more due to integer overflow.
+		 */
+		total_avail = 0;
 	avail = min((unsigned long)NFSD_MAX_MEM_PER_SESSION, total_avail);
 	/*
 	 * Never use more than a third of the remaining memory,
-	 * unless it's the only way to give this client a slot:
+	 * unless it's the only way to give this client a slot.
+	 * Give the client one slot even if that would require
+	 * over-allocation; it is better than failure.
 	 */
 	avail = clamp_t(unsigned long, avail, slotsize, total_avail/3);
 	num = min_t(int, num, avail / slotsize);
+	num = max_t(int, num, 1);
 	nfsd_drc_mem_used += num * slotsize;
 	spin_unlock(&nfsd_drc_lock);
 
@@ -3169,10 +3180,10 @@ static __be32 check_forechannel_attrs(struct nfsd4_channel_attrs *ca, struct nfs
 	 * performance.  When short on memory we therefore prefer to
 	 * decrease number of slots instead of their size.  Clients that
 	 * request larger slots than they need will get poor results:
+	 * Note that we always allow at least one slot, because our
+	 * accounting is soft and provides no guarantees either way.
 	 */
 	ca->maxreqs = nfsd4_get_drc_mem(ca);
-	if (!ca->maxreqs)
-		return nfserr_jukebox;
 
 	return nfs_ok;
 }
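
As a side note on the new "integer overflow" comment above, a tiny
userspace sketch (not kernel code; the numbers are made up) shows why the
guard on the subtraction matters: with unsigned arithmetic, handing out
slightly more than the configured limit would otherwise make the
remaining space look enormous.

/* Userspace sketch, illustrative values only. */
#include <stdio.h>

int main(void)
{
	unsigned long drc_max  = 2UL << 20;           /* 2Meg limit */
	unsigned long drc_used = (2UL << 20) + 4096;  /* slightly over-allocated */

	unsigned long naive   = drc_max - drc_used;   /* wraps to a huge value */
	unsigned long guarded = drc_max > drc_used ? drc_max - drc_used : 0;

	printf("naive remaining:   %lu\n", naive);
	printf("guarded remaining: %lu\n", guarded);
	return 0;
}
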
From patchwork Fri Sep 20 06:36:45 2019
X-Patchwork-Submitter: NeilBrown
X-Patchwork-Id: 11153815
From: NeilBrown
To: "J. Bruce Fields", "J. Bruce Fields"
Cc: linux-nfs@vger.kernel.org
Date: Fri, 20 Sep 2019 16:36:45 +1000
Subject: [PATCH 2/2] nfsd: degrade slot-count more gracefully as allocation nears exhaustion.
In-Reply-To: <8736gra34j.fsf@notabene.neil.brown.name>
References: <1506345704-9486-1-git-send-email-bfields@redhat.com>
 <1506345704-9486-3-git-send-email-bfields@redhat.com>
 <87d0fx9jph.fsf@notabene.neil.brown.name>
 <20190919162211.GA333@pick.fieldses.org>
 <20190919171730.GB333@pick.fieldses.org>
 <20190919184139.GG26654@fieldses.org>
 <877e63a3yg.fsf@notabene.neil.brown.name>
 <8736gra34j.fsf@notabene.neil.brown.name>
Message-ID: <87zhiz8oea.fsf@notabene.neil.brown.name>

The original code in nfsd4_get_drc_mem() would hand out 30 slots
(approximately NFSD_MAX_MEM_PER_SESSION bytes at slightly over 2K per
slot) to each requesting client until it ran out of space, then it
would possibly give one last client a reduced allocation, and then
fail the allocation.

Since commit de766e570413 ("nfsd: give out fewer session slots as
limit approaches"), the last 90 slots are given out to about 12
clients with quickly reducing slot counts (better than just 3
clients). This still seems unnecessarily hasty.

The previous patch allows over-allocation so that every client gets
at least one slot, but a single slot might be a bit restrictive.

The requested number of nfsd threads is the best guide we have to the
expected number of clients, so use that as the scaling divisor, with a
lower bound of 8.

256 threads on a 256Meg machine - which is a lot for a tiny machine -
would result in nfsd_drc_max_mem being 2Meg, so 8K (3 slots) would be
available for the first client, and over 200 clients would get more
than 1 slot. So I don't think this change will be too debilitating on
poorly configured machines, though it does mean that a sensible
configuration is a little more important.

Signed-off-by: NeilBrown
---
 fs/nfsd/nfs4state.c | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 084a30d77359..c49af2ba3924 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -1563,11 +1563,12 @@ static inline u32 slot_bytes(struct nfsd4_channel_attrs *ca)
  * re-negotiate active sessions and reduce their slot usage to make
  * room for new connections. For now we just fail the create session.
  */
-static u32 nfsd4_get_drc_mem(struct nfsd4_channel_attrs *ca)
+static u32 nfsd4_get_drc_mem(struct nfsd4_channel_attrs *ca, struct nfsd_net *nn)
 {
 	u32 slotsize = slot_bytes(ca);
 	u32 num = ca->maxreqs;
 	unsigned long avail, total_avail;
+	unsigned int scale_factor;
 
 	spin_lock(&nfsd_drc_lock);
 	if (nfsd_drc_max_mem > nfsd_drc_mem_used)
@@ -1581,12 +1582,18 @@ static u32 nfsd4_get_drc_mem(struct nfsd4_channel_attrs *ca)
 		total_avail = 0;
 	avail = min((unsigned long)NFSD_MAX_MEM_PER_SESSION, total_avail);
 	/*
-	 * Never use more than a third of the remaining memory,
+	 * Never use more than a fraction of the remaining memory,
 	 * unless it's the only way to give this client a slot.
+	 * The chosen fraction is either 1/8 or 1/number of threads,
+	 * whichever is smaller.  This ensures there are adequate
+	 * slots to support multiple clients per thread.
 	 * Give the client one slot even if that would require
 	 * over-allocation; it is better than failure.
 	 */
-	avail = clamp_t(unsigned long, avail, slotsize, total_avail/3);
+	scale_factor = max_t(unsigned int, 8, nn->nfsd_serv->sv_nrthreads);
+
+	avail = clamp_t(unsigned long, avail, slotsize,
+			total_avail/scale_factor);
 	num = min_t(int, num, avail / slotsize);
 	num = max_t(int, num, 1);
 	nfsd_drc_mem_used += num * slotsize;
@@ -3183,7 +3190,7 @@ static __be32 check_forechannel_attrs(struct nfsd4_channel_attrs *ca, struct nfs
 	 * Note that we always allow at least one slot, because our
 	 * accounting is soft and provides no guarantees either way.
 	 */
-	ca->maxreqs = nfsd4_get_drc_mem(ca);
+	ca->maxreqs = nfsd4_get_drc_mem(ca, nn);
 
 	return nfs_ok;
 }
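
For reference, a quick userspace sketch of the arithmetic in the commit
message above (slot size, the DRC limit of roughly 1/128 of RAM, and the
30-slot client request are approximations, not the kernel's exact
values): with 256 nfsd threads on a 256Meg machine, the first client ends
up with about 8K, i.e. 3 slots.

/* Userspace sketch approximating the patched calculation; not kernel code. */
#include <stdio.h>

#define max_t(type, a, b) ((type)(a) > (type)(b) ? (type)(a) : (type)(b))
#define min_t(type, a, b) ((type)(a) < (type)(b) ? (type)(a) : (type)(b))
#define clamp_t(type, val, lo, hi) min_t(type, max_t(type, val, lo), hi)

int main(void)
{
	unsigned long slotsize = 2048 + 512;          /* ~2.5K per slot */
	unsigned long drc_max = (256UL << 20) / 128;  /* 256Meg RAM -> ~2Meg DRC */
	unsigned long total_avail = drc_max;          /* nothing handed out yet */
	unsigned long avail = total_avail;            /* generous session request */
	unsigned int nrthreads = 256;
	unsigned int scale_factor = max_t(unsigned int, 8, nrthreads);
	unsigned int num;

	avail = clamp_t(unsigned long, avail, slotsize,
			total_avail / scale_factor);
	num = min_t(int, 30, avail / slotsize);       /* client asked for 30 slots */
	num = max_t(int, num, 1);                     /* never refuse a slot outright */

	printf("first client: avail = %lu bytes -> %u slots\n", avail, num);
	/* prints: first client: avail = 8192 bytes -> 3 slots */
	return 0;
}
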