From patchwork Sat Sep 30 03:42:45 2023
X-Patchwork-Submitter: Hugh Dickins
X-Patchwork-Id: 13404937
Date: Fri, 29 Sep 2023 20:42:45 -0700 (PDT)
From: Hugh Dickins
To: Andrew Morton
Cc: Tim Chen, Dave Chinner, "Darrick J. Wong", Christian Brauner,
    Carlos Maiolino, Chuck Lever, Jan Kara, Matthew Wilcox,
    Johannes Weiner, Axel Rasmussen, linux-fsdevel@vger.kernel.org,
    linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: [PATCH 8/8] shmem,percpu_counter: add _limited_add(fbc, limit, amount)

Percpu counter's compare and add are separate functions: without locking
around them (which would defeat their purpose), it has been possible to
overflow the intended limit.  Imagine all the other CPUs fallocating
tmpfs huge pages to the limit, in between this CPU's compare and its add.

I have not seen reports of that happening; but tmpfs's recent addition
of dquot_alloc_block_nodirty() in between the compare and the add makes
it even more likely, and I'd be uncomfortable to leave it unfixed.

Introduce percpu_counter_limited_add(fbc, limit, amount) to prevent it.

I believe this implementation is correct, and slightly more efficient
than the combination of compare and add (taking the lock once rather
than twice when nearing full - the last 128MiB of a tmpfs volume on a
machine with 128 CPUs and 4KiB pages); but it does beg for a better
design - when nearing full, there is no new batching, but the costly
percpu counter sum across CPUs still has to be done, while locked.

Follow __percpu_counter_sum()'s example, including cpu_dying_mask as
well as cpu_online_mask: but shouldn't __percpu_counter_compare() and
__percpu_counter_limited_add() then be adding a num_dying_cpus() to
num_online_cpus(), when they calculate the maximum which could be held
across CPUs?  But the times when it matters would be vanishingly rare.

Signed-off-by: Hugh Dickins
Cc: Tim Chen
Cc: Dave Chinner
Cc: Darrick J. Wong
Reviewed-by: Jan Kara
---
Tim, Dave, Darrick: I didn't want to waste your time on patches 1-7,
which are just internal to shmem, and do not affect this patch (which
applies to v6.6-rc and linux-next as is): but want to run this by you.

 include/linux/percpu_counter.h | 23 +++++++++++++++
 lib/percpu_counter.c           | 53 ++++++++++++++++++++++++++++++++++
 mm/shmem.c                     | 10 +++----
 3 files changed, 81 insertions(+), 5 deletions(-)

diff --git a/include/linux/percpu_counter.h b/include/linux/percpu_counter.h
index d01351b1526f..8cb7c071bd5c 100644
--- a/include/linux/percpu_counter.h
+++ b/include/linux/percpu_counter.h
@@ -57,6 +57,8 @@ void percpu_counter_add_batch(struct percpu_counter *fbc, s64 amount,
 			      s32 batch);
 s64 __percpu_counter_sum(struct percpu_counter *fbc);
 int __percpu_counter_compare(struct percpu_counter *fbc, s64 rhs, s32 batch);
+bool __percpu_counter_limited_add(struct percpu_counter *fbc, s64 limit,
+				  s64 amount, s32 batch);
 void percpu_counter_sync(struct percpu_counter *fbc);
 
 static inline int percpu_counter_compare(struct percpu_counter *fbc, s64 rhs)
@@ -69,6 +71,13 @@ static inline void percpu_counter_add(struct percpu_counter *fbc, s64 amount)
 	percpu_counter_add_batch(fbc, amount, percpu_counter_batch);
 }
 
+static inline bool
+percpu_counter_limited_add(struct percpu_counter *fbc, s64 limit, s64 amount)
+{
+	return __percpu_counter_limited_add(fbc, limit, amount,
+					    percpu_counter_batch);
+}
+
 /*
  * With percpu_counter_add_local() and percpu_counter_sub_local(), counts
  * are accumulated in local per cpu counter and not in fbc->count until
@@ -185,6 +194,20 @@ percpu_counter_add(struct percpu_counter *fbc, s64 amount)
 	local_irq_restore(flags);
 }
 
+static inline bool
+percpu_counter_limited_add(struct percpu_counter *fbc, s64 limit, s64 amount)
+{
+	unsigned long flags;
+	s64 count;
+
+	local_irq_save(flags);
+	count = fbc->count + amount;
+	if (count <= limit)
+		fbc->count = count;
+	local_irq_restore(flags);
+	return count <= limit;
+}
+
 /* non-SMP percpu_counter_add_local is the same with percpu_counter_add */
 static inline void
 percpu_counter_add_local(struct percpu_counter *fbc, s64 amount)
diff --git a/lib/percpu_counter.c b/lib/percpu_counter.c
index 9073430dc865..58a3392f471b 100644
--- a/lib/percpu_counter.c
+++ b/lib/percpu_counter.c
@@ -278,6 +278,59 @@ int __percpu_counter_compare(struct percpu_counter *fbc, s64 rhs, s32 batch)
 }
 EXPORT_SYMBOL(__percpu_counter_compare);
 
+/*
+ * Compare counter, and add amount if the total is within limit.
+ * Return true if amount was added, false if it would exceed limit.
+ */
+bool __percpu_counter_limited_add(struct percpu_counter *fbc,
+				  s64 limit, s64 amount, s32 batch)
+{
+	s64 count;
+	s64 unknown;
+	unsigned long flags;
+	bool good;
+
+	if (amount > limit)
+		return false;
+
+	local_irq_save(flags);
+	unknown = batch * num_online_cpus();
+	count = __this_cpu_read(*fbc->counters);
+
+	/* Skip taking the lock when safe */
+	if (abs(count + amount) <= batch &&
+	    fbc->count + unknown <= limit) {
+		this_cpu_add(*fbc->counters, amount);
+		local_irq_restore(flags);
+		return true;
+	}
+
+	raw_spin_lock(&fbc->lock);
+	count = fbc->count + amount;
+
+	/* Skip percpu_counter_sum() when safe */
+	if (count + unknown > limit) {
+		s32 *pcount;
+		int cpu;
+
+		for_each_cpu_or(cpu, cpu_online_mask, cpu_dying_mask) {
+			pcount = per_cpu_ptr(fbc->counters, cpu);
+			count += *pcount;
+		}
+	}
+
+	good = count <= limit;
+	if (good) {
+		count = __this_cpu_read(*fbc->counters);
+		fbc->count += count + amount;
+		__this_cpu_sub(*fbc->counters, count);
+	}
+
+	raw_spin_unlock(&fbc->lock);
+	local_irq_restore(flags);
+	return good;
+}
+
 static int __init percpu_counter_startup(void)
 {
 	int ret;
diff --git a/mm/shmem.c b/mm/shmem.c
index 4f4ab26bc58a..7cb72c747954 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -217,15 +217,15 @@ static int shmem_inode_acct_blocks(struct inode *inode, long pages)
 
 	might_sleep();	/* when quotas */
 	if (sbinfo->max_blocks) {
-		if (percpu_counter_compare(&sbinfo->used_blocks,
-					   sbinfo->max_blocks - pages) > 0)
+		if (!percpu_counter_limited_add(&sbinfo->used_blocks,
+						sbinfo->max_blocks, pages))
 			goto unacct;
 
 		err = dquot_alloc_block_nodirty(inode, pages);
-		if (err)
+		if (err) {
+			percpu_counter_sub(&sbinfo->used_blocks, pages);
 			goto unacct;
-
-		percpu_counter_add(&sbinfo->used_blocks, pages);
+		}
 	} else {
 		err = dquot_alloc_block_nodirty(inode, pages);
 		if (err)
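
Illustration only, not part of the patch: a single-threaded userspace sketch in C of the window between compare and add, and of how a combined check-and-add closes it. It is not kernel code. struct model_counter, MODEL_NR_CPUS and the model_* helpers are hypothetical names, there is no locking or interrupt masking, and model_compare_ok() uses an exact sum where the kernel first tries an approximate comparison. model_limited_add() is meant to follow the control flow of __percpu_counter_limited_add() in the diff above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define MODEL_NR_CPUS 4	/* hypothetical CPU count for this model */

struct model_counter {
	int64_t count;			/* stands in for fbc->count */
	int32_t percpu[MODEL_NR_CPUS];	/* stands in for *fbc->counters */
};

/* Exact total: global count plus every per-CPU delta. */
static int64_t model_sum(const struct model_counter *c)
{
	int64_t sum = c->count;

	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		sum += c->percpu[cpu];
	return sum;
}

/* Old pattern, step 1: check only, nothing is reserved. */
static bool model_compare_ok(const struct model_counter *c,
			     int64_t limit, int64_t amount)
{
	return model_sum(c) <= limit - amount;
}

/* Old pattern, step 2: batched add, with no knowledge of any limit. */
static void model_add(struct model_counter *c, int cpu,
		      int64_t amount, int32_t batch)
{
	int64_t count = c->percpu[cpu] + amount;

	if (llabs(count) >= batch) {
		c->count += count;	/* fold the local delta into the total */
		c->percpu[cpu] = 0;
	} else {
		c->percpu[cpu] = (int32_t)count;
	}
}

/* New pattern: check and add in one step (the kernel holds fbc->lock). */
static bool model_limited_add(struct model_counter *c, int cpu,
			      int64_t limit, int64_t amount, int32_t batch)
{
	/* Most that could be hiding in per-CPU deltas. */
	int64_t unknown = (int64_t)batch * MODEL_NR_CPUS;
	int64_t count = c->percpu[cpu];

	if (amount > limit)
		return false;

	/* Fast path: far enough below the limit to stay per-CPU. */
	if (llabs(count + amount) <= batch && c->count + unknown <= limit) {
		c->percpu[cpu] += (int32_t)amount;
		return true;
	}

	count = c->count + amount;
	if (count + unknown > limit) {
		/* Too close to call: take the exact sum. */
		for (int i = 0; i < MODEL_NR_CPUS; i++)
			count += c->percpu[i];
	}
	if (count > limit)
		return false;

	/* Fold this CPU's delta and the new amount into the total. */
	c->count += c->percpu[cpu] + amount;
	c->percpu[cpu] = 0;
	return true;
}
```

Starting from a zeroed counter with limit 10 and batch 2, two callers can both pass model_compare_ok(&c, 10, 6) before either calls model_add(), which leaves model_sum() at 12. With model_limited_add() the second request for 6 is refused and the sum stays at 6. On the "last 128MiB" figure in the commit message: assuming the default batch of max(32, 2 * num_online_cpus()), 128 CPUs give batch = 256, so unknown = 256 * 128 = 32768 pages, which at 4KiB per page is 128MiB.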