From patchwork Sat Sep 30 03:42:45 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Hugh Dickins <hughd@google.com>
X-Patchwork-Id: 13404937
Date: Fri, 29 Sep 2023 20:42:45 -0700 (PDT)
From: Hugh Dickins <hughd@google.com>
X-X-Sender: hugh@ripple.attlocal.net
To: Andrew Morton <akpm@linux-foundation.org>
cc: Tim Chen <tim.c.chen@intel.com>, Dave Chinner <dchinner@redhat.com>,
    "Darrick J. Wong" <djwong@kernel.org>,
    Christian Brauner <brauner@kernel.org>, Carlos Maiolino <cem@kernel.org>,
    Chuck Lever <chuck.lever@oracle.com>, Jan Kara <jack@suse.cz>,
    Matthew Wilcox <willy@infradead.org>,
    Johannes Weiner <hannes@cmpxchg.org>,
    Axel Rasmussen <axelrasmussen@google.com>, linux-fsdevel@vger.kernel.org,
    linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: [PATCH 8/8] shmem,percpu_counter: add _limited_add(fbc, limit,
 amount)
In-Reply-To: <c7441dc6-f3bb-dd60-c670-9f5cbd9f266@google.com>
Message-ID: <bb817848-2d19-bcc8-39ca-ea179af0f0b4@google.com>
References: <c7441dc6-f3bb-dd60-c670-9f5cbd9f266@google.com>
MIME-Version: 1.0

Percpu counter's compare and add are separate functions: without locking
around them (which would defeat their purpose), it has been possible to
overflow the intended limit.  Imagine all the other CPUs fallocating
tmpfs huge pages to the limit, in between this CPU's compare and its add.

I have not seen reports of that happening; but tmpfs's recent addition
of dquot_alloc_block_nodirty() in between the compare and the add makes
it even more likely, and I'd be uncomfortable leaving it unfixed.

Introduce percpu_counter_limited_add(fbc, limit, amount) to prevent it.

I believe this implementation is correct, and slightly more efficient
than the combination of compare and add (taking the lock once rather
than twice when nearing full: within the last 128MiB of a tmpfs volume
on a machine with 128 CPUs and 4KiB pages); but it does beg for a better
design: when nearing full, there is no new batching, yet the costly
percpu counter sum across CPUs still has to be done, while locked.

Follow __percpu_counter_sum()'s example, including cpu_dying_mask as
well as cpu_online_mask: but shouldn't __percpu_counter_compare() and
__percpu_counter_limited_add() then be adding a num_dying_cpus() to
num_online_cpus(), when they calculate the maximum which could be held
across CPUs?  But the times when it matters would be vanishingly rare.

Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Tim Chen <tim.c.chen@intel.com>
Cc: Dave Chinner <dchinner@redhat.com>
Cc: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
---
Tim, Dave, Darrick: I didn't want to waste your time on patches 1-7,
which are just internal to shmem and do not affect this patch (which
applies to v6.6-rc and linux-next as is), but I did want to run this by you.

 include/linux/percpu_counter.h | 23 +++++++++++++++
 lib/percpu_counter.c           | 53 ++++++++++++++++++++++++++++++++++
 mm/shmem.c                     | 10 +++----
 3 files changed, 81 insertions(+), 5 deletions(-)

diff --git a/include/linux/percpu_counter.h b/include/linux/percpu_counter.h
index d01351b1526f..8cb7c071bd5c 100644
--- a/include/linux/percpu_counter.h
+++ b/include/linux/percpu_counter.h
@@ -57,6 +57,8 @@ void percpu_counter_add_batch(struct percpu_counter *fbc, s64 amount,
 			      s32 batch);
 s64 __percpu_counter_sum(struct percpu_counter *fbc);
 int __percpu_counter_compare(struct percpu_counter *fbc, s64 rhs, s32 batch);
+bool __percpu_counter_limited_add(struct percpu_counter *fbc, s64 limit,
+				  s64 amount, s32 batch);
 void percpu_counter_sync(struct percpu_counter *fbc);
 
 static inline int percpu_counter_compare(struct percpu_counter *fbc, s64 rhs)
@@ -69,6 +71,13 @@ static inline void percpu_counter_add(struct percpu_counter *fbc, s64 amount)
 	percpu_counter_add_batch(fbc, amount, percpu_counter_batch);
 }
 
+static inline bool
+percpu_counter_limited_add(struct percpu_counter *fbc, s64 limit, s64 amount)
+{
+	return __percpu_counter_limited_add(fbc, limit, amount,
+					    percpu_counter_batch);
+}
+
 /*
  * With percpu_counter_add_local() and percpu_counter_sub_local(), counts
  * are accumulated in local per cpu counter and not in fbc->count until
@@ -185,6 +194,20 @@ percpu_counter_add(struct percpu_counter *fbc, s64 amount)
 	local_irq_restore(flags);
 }
 
+static inline bool
+percpu_counter_limited_add(struct percpu_counter *fbc, s64 limit, s64 amount)
+{
+	unsigned long flags;
+	s64 count;
+
+	local_irq_save(flags);
+	count = fbc->count + amount;
+	if (count <= limit)
+		fbc->count = count;
+	local_irq_restore(flags);
+	return count <= limit;
+}
+
 /* non-SMP percpu_counter_add_local is the same with percpu_counter_add */
 static inline void
 percpu_counter_add_local(struct percpu_counter *fbc, s64 amount)
diff --git a/lib/percpu_counter.c b/lib/percpu_counter.c
index 9073430dc865..58a3392f471b 100644
--- a/lib/percpu_counter.c
+++ b/lib/percpu_counter.c
@@ -278,6 +278,59 @@ int __percpu_counter_compare(struct percpu_counter *fbc, s64 rhs, s32 batch)
 }
 EXPORT_SYMBOL(__percpu_counter_compare);
 
+/*
+ * Compare counter, and add amount if the total is within limit.
+ * Return true if amount was added, false if it would exceed limit.
+ */
+bool __percpu_counter_limited_add(struct percpu_counter *fbc,
+				  s64 limit, s64 amount, s32 batch)
+{
+	s64 count;
+	s64 unknown;
+	unsigned long flags;
+	bool good;
+
+	if (amount > limit)
+		return false;
+
+	local_irq_save(flags);
+	unknown = batch * num_online_cpus();
+	count = __this_cpu_read(*fbc->counters);
+
+	/* Skip taking the lock when safe */
+	if (abs(count + amount) <= batch &&
+	    fbc->count + unknown <= limit) {
+		this_cpu_add(*fbc->counters, amount);
+		local_irq_restore(flags);
+		return true;
+	}
+
+	raw_spin_lock(&fbc->lock);
+	count = fbc->count + amount;
+
+	/* Skip percpu_counter_sum() when safe */
+	if (count + unknown > limit) {
+		s32 *pcount;
+		int cpu;
+
+		for_each_cpu_or(cpu, cpu_online_mask, cpu_dying_mask) {
+			pcount = per_cpu_ptr(fbc->counters, cpu);
+			count += *pcount;
+		}
+	}
+
+	good = count <= limit;
+	if (good) {
+		count = __this_cpu_read(*fbc->counters);
+		fbc->count += count + amount;
+		__this_cpu_sub(*fbc->counters, count);
+	}
+
+	raw_spin_unlock(&fbc->lock);
+	local_irq_restore(flags);
+	return good;
+}
+
 static int __init percpu_counter_startup(void)
 {
 	int ret;
diff --git a/mm/shmem.c b/mm/shmem.c
index 4f4ab26bc58a..7cb72c747954 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -217,15 +217,15 @@ static int shmem_inode_acct_blocks(struct inode *inode, long pages)
 
 	might_sleep();	/* when quotas */
 	if (sbinfo->max_blocks) {
-		if (percpu_counter_compare(&sbinfo->used_blocks,
-					   sbinfo->max_blocks - pages) > 0)
+		if (!percpu_counter_limited_add(&sbinfo->used_blocks,
+						sbinfo->max_blocks, pages))
 			goto unacct;
 
 		err = dquot_alloc_block_nodirty(inode, pages);
-		if (err)
+		if (err) {
+			percpu_counter_sub(&sbinfo->used_blocks, pages);
 			goto unacct;
-
-		percpu_counter_add(&sbinfo->used_blocks, pages);
+		}
 	} else {
 		err = dquot_alloc_block_nodirty(inode, pages);
 		if (err)