From patchwork Thu Mar 20 17:39:29 2025
X-Patchwork-Submitter: Suren Baghdasaryan
X-Patchwork-Id: 14024210
Date: Thu, 20 Mar 2025 10:39:29 -0700
In-Reply-To: <20250320173931.1583800-1-surenb@google.com>
References: <20250320173931.1583800-1-surenb@google.com>
Message-ID: <20250320173931.1583800-2-surenb@google.com>
Subject: [RFC 1/3] mm: implement cleancache
From: Suren Baghdasaryan
To: akpm@linux-foundation.org
Cc: willy@infradead.org, david@redhat.com, vbabka@suse.cz, lorenzo.stoakes@oracle.com, liam.howlett@oracle.com, alexandru.elisei@arm.com, peterx@redhat.com, hannes@cmpxchg.org, mhocko@kernel.org, m.szyprowski@samsung.com, iamjoonsoo.kim@lge.com, mina86@mina86.com, axboe@kernel.dk, viro@zeniv.linux.org.uk, brauner@kernel.org, hch@infradead.org, jack@suse.cz, hbathini@linux.ibm.com, sourabhjain@linux.ibm.com, ritesh.list@gmail.com, aneesh.kumar@kernel.org, bhelgaas@google.com, sj@kernel.org, fvdl@google.com, ziy@nvidia.com, yuzhao@google.com, minchan@kernel.org, surenb@google.com, linux-mm@kvack.org, linuxppc-dev@lists.ozlabs.org, linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org, iommu@lists.linux.dev, linux-kernel@vger.kernel.org, Minchan Kim
Cleancache can be thought of as a page-granularity victim cache for clean
pages that the kernel's pageframe replacement algorithm (PFRA) would like
to keep around, but can't since there isn't enough memory. So when the
PFRA "evicts" a page, it first attempts to use cleancache code to put the
data contained in that page into "transcendent memory", memory that is
not directly accessible or addressable by the kernel. Later, when the
system needs to access a page in a file on disk, it first checks
cleancache to see if it already contains it; if it does, the page of data
is copied into the kernel and a disk access is avoided.

The patchset borrows the idea, some code and documentation from the
previous cleancache implementation, but as opposed to being a thin
pass-through layer it now implements the housekeeping code to associate
cleancache pages with their inodes and to handle the page pools donated
by cleancache backends. It also avoids intrusive hooks into filesystem
code, limiting itself to hooks in the mm reclaim and page-in paths and
two hooks to detect filesystem mount/unmount events.

This patch implements the basic cleancache support. Future plans include
large folio support, sysfs statistics and a cleancache page eviction
mechanism.
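As an illustration of the backend contract described above (not part of
the patch itself), a backend might donate a reserved page range and later
reclaim individual folios roughly as sketched below. The example_* names
are hypothetical, error handling is elided, and the calls follow the
Backend API declared in include/linux/cleancache.h in this patch;
mm/gcma.c in the next patch follows essentially the same pattern.

/* Hypothetical backend-side sketch, assuming a pre-reserved PFN range */
#include <linux/cleancache.h>
#include <linux/mm.h>

static int example_pool_id;

static int example_donate(unsigned long start_pfn, unsigned long count)
{
	LIST_HEAD(folios);
	unsigned long i;

	/* Backend must own these pages; their refcount is expected to be 1 */
	for (i = 0; i < count; i++)
		list_add(&page_folio(pfn_to_page(start_pfn + i))->lru, &folios);

	/* On success the returned id identifies this backend's page pool */
	example_pool_id = cleancache_register_backend("example", &folios);
	return example_pool_id < 0 ? example_pool_id : 0;
}

/* Take one donated folio back, dropping any clean page cached in it */
static int example_take_back(struct folio *folio)
{
	return cleancache_backend_get_folio(example_pool_id, folio);
}

/* Return the folio so cleancache can use it for clean pages again */
static int example_give_back(struct folio *folio)
{
	return cleancache_backend_put_folio(example_pool_id, folio);
}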
Signed-off-by: Suren Baghdasaryan Signed-off-by: Minchan Kim --- block/bdev.c | 8 + fs/super.c | 3 + include/linux/cleancache.h | 88 ++++ include/linux/fs.h | 7 + mm/Kconfig | 17 + mm/Makefile | 1 + mm/cleancache.c | 926 +++++++++++++++++++++++++++++++++++++ mm/filemap.c | 63 ++- mm/truncate.c | 21 +- 9 files changed, 1124 insertions(+), 10 deletions(-) create mode 100644 include/linux/cleancache.h create mode 100644 mm/cleancache.c diff --git a/block/bdev.c b/block/bdev.c index 9d73a8fbf7f9..aa00b9da9e0a 100644 --- a/block/bdev.c +++ b/block/bdev.c @@ -28,6 +28,7 @@ #include #include #include +#include #include "../fs/internal.h" #include "blk.h" @@ -95,12 +96,19 @@ static void kill_bdev(struct block_device *bdev) void invalidate_bdev(struct block_device *bdev) { struct address_space *mapping = bdev->bd_mapping; + struct cleancache_filekey key; if (mapping->nrpages) { invalidate_bh_lrus(); lru_add_drain_all(); /* make sure all lru add caches are flushed */ invalidate_mapping_pages(mapping, 0, -1); } + /* + * 99% of the time, we don't need to flush the cleancache on the bdev. + * But, for the strange corners, lets be cautious + */ + cleancache_invalidate_inode(mapping, + cleancache_get_key(mapping->host, &key)); } EXPORT_SYMBOL(invalidate_bdev); diff --git a/fs/super.c b/fs/super.c index 5a7db4a556e3..7e8d668a587e 100644 --- a/fs/super.c +++ b/fs/super.c @@ -31,6 +31,7 @@ #include #include #include +#include #include #include #include @@ -374,6 +375,7 @@ static struct super_block *alloc_super(struct file_system_type *type, int flags, s->s_time_gran = 1000000000; s->s_time_min = TIME64_MIN; s->s_time_max = TIME64_MAX; + cleancache_add_fs(s); s->s_shrink = shrinker_alloc(SHRINKER_NUMA_AWARE | SHRINKER_MEMCG_AWARE, "sb-%s", type->name); @@ -469,6 +471,7 @@ void deactivate_locked_super(struct super_block *s) { struct file_system_type *fs = s->s_type; if (atomic_dec_and_test(&s->s_active)) { + cleancache_remove_fs(s); shrinker_free(s->s_shrink); fs->kill_sb(s); diff --git a/include/linux/cleancache.h b/include/linux/cleancache.h new file mode 100644 index 000000000000..a9161cbf3490 --- /dev/null +++ b/include/linux/cleancache.h @@ -0,0 +1,88 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _LINUX_CLEANCACHE_H +#define _LINUX_CLEANCACHE_H + +#include +#include +#include + +/* super_block->cleancache_id value for an invalid ID */ +#define CLEANCACHE_ID_INVALID -1 + +#define CLEANCACHE_KEY_MAX 6 + +/* + * Cleancache requires every file with a folio in cleancache to have a + * unique key unless/until the file is removed/truncated. 
For some + * filesystems, the inode number is unique, but for "modern" filesystems + * an exportable filehandle is required (see exportfs.h) + */ +struct cleancache_filekey { + union { + ino_t ino; + __u32 fh[CLEANCACHE_KEY_MAX]; + u32 key[CLEANCACHE_KEY_MAX]; + } u; +}; + +#ifdef CONFIG_CLEANCACHE + +/* Hooks into MM and FS */ +struct cleancache_filekey *cleancache_get_key(struct inode *inode, + struct cleancache_filekey *key); +void cleancache_add_fs(struct super_block *sb); +void cleancache_remove_fs(struct super_block *sb); +void cleancache_store_folio(struct folio *folio, + struct cleancache_filekey *key); +bool cleancache_restore_folio(struct folio *folio, + struct cleancache_filekey *key); +void cleancache_invalidate_folio(struct address_space *mapping, + struct folio *folio, + struct cleancache_filekey *key); +void cleancache_invalidate_inode(struct address_space *mapping, + struct cleancache_filekey *key); + +/* + * Backend API + * + * Cleancache does not touch page reference. Page refcount should be 1 when + * page is placed or returned into cleancache and pages obtained from + * cleancache will also have their refcount at 1. + */ +int cleancache_register_backend(const char *name, struct list_head *folios); +int cleancache_backend_get_folio(int area_id, struct folio *folio); +int cleancache_backend_put_folio(int area_id, struct folio *folio); + +#else /* CONFIG_CLEANCACHE */ + +static inline +struct cleancache_filekey *cleancache_get_key(struct inode *inode, + struct cleancache_filekey *key) +{ + return NULL; +} +static inline void cleancache_add_fs(struct super_block *sb) {} +static inline void cleancache_remove_fs(struct super_block *sb) {} +static inline void cleancache_store_folio(struct folio *folio, + struct cleancache_filekey *key) {} +static inline bool cleancache_restore_folio(struct folio *folio, + struct cleancache_filekey *key) +{ + return false; +} +static inline void cleancache_invalidate_folio(struct address_space *mapping, + struct folio *folio, + struct cleancache_filekey *key) {} +static inline void cleancache_invalidate_inode(struct address_space *mapping, + struct cleancache_filekey *key) {} + +static inline int cleancache_register_backend(const char *name, + struct list_head *folios) { return -EOPNOTSUPP; } +static inline int cleancache_backend_get_folio(int area_id, + struct folio *folio) { return -EOPNOTSUPP; } +static inline int cleancache_backend_put_folio(int area_id, + struct folio *folio) { return -EOPNOTSUPP; } + +#endif /* CONFIG_CLEANCACHE */ + +#endif /* _LINUX_CLEANCACHE_H */ diff --git a/include/linux/fs.h b/include/linux/fs.h index 2788df98080f..851544454c9e 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -1407,6 +1407,13 @@ struct super_block { const struct dentry_operations *s_d_op; /* default d_op for dentries */ +#ifdef CONFIG_CLEANCACHE + /* + * Saved identifier for cleancache (CLEANCACHE_ID_INVALID means none) + */ + int cleancache_id; +#endif + struct shrinker *s_shrink; /* per-sb shrinker handle */ /* Number of inodes with nlink == 0 but still referenced */ diff --git a/mm/Kconfig b/mm/Kconfig index 4a4e7b63d30a..d6ebf0fb0432 100644 --- a/mm/Kconfig +++ b/mm/Kconfig @@ -945,6 +945,23 @@ config USE_PERCPU_NUMA_NODE_ID config HAVE_SETUP_PER_CPU_AREA bool +config CLEANCACHE + bool "Enable cleancache to cache clean pages if tmem is present" + help + Cleancache can be thought of as a page-granularity victim cache + for clean pages that the kernel's pageframe replacement algorithm + (PFRA) would like to keep around, but can't since 
there isn't enough + memory. So when the PFRA "evicts" a page, it first attempts to use + cleancache code to put the data contained in that page into + "transcendent memory", memory that is not directly accessible or + addressable by the kernel and is of unknown and possibly + time-varying size. When system wishes to access a page in a file + on disk, it first checks cleancache to see if it already contains + it; if it does, the page is copied into the kernel and a disk + access is avoided. + + If unsure, say N. + config CMA bool "Contiguous Memory Allocator" depends on MMU diff --git a/mm/Makefile b/mm/Makefile index e7f6bbf8ae5f..084dbe9edbc4 100644 --- a/mm/Makefile +++ b/mm/Makefile @@ -148,3 +148,4 @@ obj-$(CONFIG_SHRINKER_DEBUG) += shrinker_debug.o obj-$(CONFIG_EXECMEM) += execmem.o obj-$(CONFIG_TMPFS_QUOTA) += shmem_quota.o obj-$(CONFIG_PT_RECLAIM) += pt_reclaim.o +obj-$(CONFIG_CLEANCACHE) += cleancache.o diff --git a/mm/cleancache.c b/mm/cleancache.c new file mode 100644 index 000000000000..23113c5adfc5 --- /dev/null +++ b/mm/cleancache.c @@ -0,0 +1,926 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * Cleancache frontend + * + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +/* + * Possible lock nesting: + * inode->pages.xa_lock + * free_folios_lock + * + * inode->pages.xa_lock + * fs->hash_lock + * + * Notes: should keep free_folios_lock and fs->hash_lock HARDIRQ-irq-safe + * since inode->pages.xa_lock is HARDIRQ-irq-safe and we take these locks + * while holding inode->pages.xa_lock. This means whenever we take these + * locks while not holding inode->pages.xa_lock, we should disable irqs. + */ + +/* Counters available via /sys/kernel/debug/cleancache */ +static u64 cleancache_hits; +static u64 cleancache_misses; +static u64 cleancache_stores; +static u64 cleancache_failed_stores; +static u64 cleancache_invalidates; + +/* + * @cleancache_inode represents each inode in @cleancache_fs + * + * The cleancache_inode will be freed by RCU when the last page from xarray + * is freed, except for invalidate_inode() case. + */ +struct cleancache_inode { + struct cleancache_filekey key; + struct hlist_node hash; + refcount_t ref_count; + + struct xarray pages; + struct rcu_head rcu; + struct cleancache_fs *fs; +}; + +static struct kmem_cache *slab_inode; + +#define INODE_HASH_BITS 10 + +/* represents each file system instance hosted by the cleancache */ +struct cleancache_fs { + spinlock_t hash_lock; + DECLARE_HASHTABLE(inode_hash, INODE_HASH_BITS); + refcount_t ref_count; +}; + +static DEFINE_IDR(fs_idr); +static DEFINE_SPINLOCK(fs_lock); + +/* Cleancache backend memory pool */ +struct cleancache_pool { + struct list_head free_folios; + spinlock_t free_folios_lock; +}; + +#define CLEANCACHE_MAX_POOLS 64 + +static struct cleancache_pool pools[CLEANCACHE_MAX_POOLS]; +static atomic_t nr_pools = ATOMIC_INIT(0); +static DEFINE_SPINLOCK(pools_lock); + +/* + * If the filesystem uses exportable filehandles, use the filehandle as + * the key, else use the inode number. 
+ */ +struct cleancache_filekey *cleancache_get_key(struct inode *inode, + struct cleancache_filekey *key) +{ + int (*fhfn)(struct inode *inode, __u32 *fh, int *max_len, struct inode *parent); + int len = 0, maxlen = CLEANCACHE_KEY_MAX; + struct super_block *sb = inode->i_sb; + + key->u.ino = inode->i_ino; + if (sb->s_export_op != NULL) { + fhfn = sb->s_export_op->encode_fh; + if (fhfn) { + len = (*fhfn)(inode, &key->u.fh[0], &maxlen, NULL); + if (len <= FILEID_ROOT || len == FILEID_INVALID) + return NULL; + if (maxlen > CLEANCACHE_KEY_MAX) + return NULL; + } + } + return key; +} + +/* page attribute helpers */ +static inline void set_page_pool_id(struct page *page, int id) +{ + page->page_type = id; +} + +static inline int page_pool_id(struct page *page) +{ + return page->page_type; +} + +static inline struct cleancache_pool *page_pool(struct page *page) +{ + return &pools[page_pool_id(page)]; +} + +/* Can be used only when page is isolated */ +static inline void __SetPageCCacheFree(struct page *page) +{ + SetPagePrivate(page); +} + +static inline void SetPageCCacheFree(struct page *page) +{ + lockdep_assert_held(&(page_pool(page)->free_folios_lock)); + __SetPageCCacheFree(page); +} + +static inline void ClearPageCCacheFree(struct page *page) +{ + lockdep_assert_held(&(page_pool(page)->free_folios_lock)); + ClearPagePrivate(page); +} + +static inline int PageCCacheFree(struct page *page) +{ + lockdep_assert_held(&(page_pool(page)->free_folios_lock)); + return PagePrivate(page); +} + +/* Can be used only when page is isolated */ +static void __set_page_inode_offs(struct page *page, + struct cleancache_inode *inode, + unsigned long index) +{ + page->mapping = (struct address_space *)inode; + page->index = index; +} + +static void set_page_inode_offs(struct page *page, struct cleancache_inode *inode, + unsigned long index) +{ + lockdep_assert_held(&(page_pool(page)->free_folios_lock)); + + __set_page_inode_offs(page, inode, index); +} + +static void page_inode_offs(struct page *page, struct cleancache_inode **inode, + unsigned long *index) +{ + lockdep_assert_held(&(page_pool(page)->free_folios_lock)); + + *inode = (struct cleancache_inode *)page->mapping; + *index = page->index; +} + +/* page pool helpers */ +static void add_page_to_pool(struct page *page, struct cleancache_pool *pool) +{ + unsigned long flags; + + VM_BUG_ON(!list_empty(&page->lru)); + + spin_lock_irqsave(&pool->free_folios_lock, flags); + + set_page_inode_offs(page, NULL, 0); + SetPageCCacheFree(page); + list_add(&page_folio(page)->lru, &pool->free_folios); + + spin_unlock_irqrestore(&pool->free_folios_lock, flags); +} + +static struct page *remove_page_from_pool(struct page *page, struct cleancache_pool *pool) +{ + lockdep_assert_held(&pool->free_folios_lock); + VM_BUG_ON(page_pool(page) != pool); + + if (!PageCCacheFree(page)) + return NULL; + + list_del_init(&page->lru); + ClearPageCCacheFree(page); + + return page; +} + +static struct page *pick_page_from_pool(void) +{ + struct cleancache_pool *pool; + struct page *page = NULL; + unsigned long flags; + int count; + + count = atomic_read_acquire(&nr_pools); + for (int i = 0; i < count; i++) { + pool = &pools[i]; + spin_lock_irqsave(&pool->free_folios_lock, flags); + if (!list_empty(&pool->free_folios)) { + struct folio *folio; + + folio = list_last_entry(&pool->free_folios, + struct folio, lru); + page = &folio->page; + WARN_ON(!remove_page_from_pool(page, pool)); + spin_unlock_irqrestore(&pool->free_folios_lock, flags); + break; + } + 
spin_unlock_irqrestore(&pool->free_folios_lock, flags); + } + + return page; +} + +/* FS helpers */ +static struct cleancache_fs *get_fs(int fs_id) +{ + struct cleancache_fs *fs; + + rcu_read_lock(); + fs = idr_find(&fs_idr, fs_id); + if (fs && !refcount_inc_not_zero(&fs->ref_count)) + fs = NULL; + rcu_read_unlock(); + + return fs; +} + +static void put_fs(struct cleancache_fs *fs) +{ + if (refcount_dec_and_test(&fs->ref_count)) + kfree(fs); +} + +/* inode helpers */ +static struct cleancache_inode *alloc_inode(struct cleancache_fs *fs, + struct cleancache_filekey *key) +{ + struct cleancache_inode *inode; + + inode = kmem_cache_alloc(slab_inode, GFP_ATOMIC|__GFP_NOWARN); + if (inode) { + memcpy(&inode->key, key, sizeof(*key)); + xa_init_flags(&inode->pages, XA_FLAGS_LOCK_IRQ); + INIT_HLIST_NODE(&inode->hash); + inode->fs = fs; + refcount_set(&inode->ref_count, 1); + } + + return inode; +} + +static int erase_pages_from_inode(struct cleancache_inode *inode, + bool remove_inode); + +static void inode_free_rcu(struct rcu_head *rcu) +{ + struct cleancache_inode *inode; + + inode = container_of(rcu, struct cleancache_inode, rcu); + erase_pages_from_inode(inode, false); + kmem_cache_free(slab_inode, inode); +} + +static bool get_inode(struct cleancache_inode *inode) +{ + return refcount_inc_not_zero(&inode->ref_count); +} + +static bool put_inode(struct cleancache_inode *inode) +{ + if (!refcount_dec_and_test(&inode->ref_count)) + return false; + + call_rcu(&inode->rcu, inode_free_rcu); + return true; +} + +static void remove_inode_if_empty(struct cleancache_inode *inode) +{ + struct cleancache_fs *fs = inode->fs; + + lockdep_assert_held(&inode->pages.xa_lock); + + if (!xa_empty(&inode->pages)) + return; + + spin_lock(&fs->hash_lock); + if (!WARN_ON(hlist_unhashed(&inode->hash))) + hlist_del_init_rcu(&inode->hash); + spin_unlock(&fs->hash_lock); + /* Caller should have taken an extra refcount to keep inode valid */ + WARN_ON(put_inode(inode)); +} + +static int store_page_in_inode(struct cleancache_inode *inode, + unsigned long index, struct page *page) +{ + struct cleancache_pool *pool = page_pool(page); + unsigned long flags; + int err; + + lockdep_assert_held(&inode->pages.xa_lock); + VM_BUG_ON(!list_empty(&page->lru)); + + spin_lock_irqsave(&pool->free_folios_lock, flags); + + err = xa_err(__xa_store(&inode->pages, index, page, + GFP_ATOMIC|__GFP_NOWARN)); + if (!err) { + set_page_inode_offs(page, inode, index); + VM_BUG_ON_PAGE(PageCCacheFree(page), page); + } + + spin_unlock_irqrestore(&pool->free_folios_lock, flags); + + return err; +} + +static void erase_page_from_inode(struct cleancache_inode *inode, + unsigned long index, struct page *page) +{ + bool removed; + + lockdep_assert_held(&inode->pages.xa_lock); + + removed = __xa_erase(&inode->pages, index); + VM_BUG_ON(!removed || !list_empty(&page->lru)); + + remove_inode_if_empty(inode); +} + +static int erase_pages_from_inode(struct cleancache_inode *inode, bool remove_inode) +{ + XA_STATE(xas, &inode->pages, 0); + unsigned long flags; + struct page *page; + unsigned int ret = 0; + + xas_lock_irqsave(&xas, flags); + + if (!xa_empty(&inode->pages)) { + xas_for_each(&xas, page, ULONG_MAX) { + __xa_erase(&inode->pages, xas.xa_index); + add_page_to_pool(page, page_pool(page)); + ret++; + } + } + if (remove_inode) + remove_inode_if_empty(inode); + + xas_unlock_irqrestore(&xas, flags); + + return ret; +} + +static struct cleancache_inode *find_and_get_inode(struct cleancache_fs *fs, + struct cleancache_filekey *key) +{ + struct 
cleancache_inode *tmp, *inode = NULL; + + rcu_read_lock(); + hash_for_each_possible_rcu(fs->inode_hash, tmp, hash, key->u.ino) { + if (memcmp(&tmp->key, key, sizeof(*key))) + continue; + + /* TODO: should we stop if get fails? */ + if (get_inode(tmp)) { + inode = tmp; + break; + } + } + rcu_read_unlock(); + + return inode; +} + +static struct cleancache_inode *add_and_get_inode(struct cleancache_fs *fs, + struct cleancache_filekey *key) +{ + struct cleancache_inode *inode, *tmp; + unsigned long flags; + + inode = alloc_inode(fs, key); + if (!inode) + return ERR_PTR(-ENOMEM); + + spin_lock_irqsave(&fs->hash_lock, flags); + tmp = find_and_get_inode(fs, key); + if (tmp) { + spin_unlock_irqrestore(&fs->hash_lock, flags); + /* someone already added it */ + put_inode(inode); + put_inode(tmp); + return ERR_PTR(-EEXIST); + } + + hash_add_rcu(fs->inode_hash, &inode->hash, key->u.ino); + get_inode(inode); + spin_unlock_irqrestore(&fs->hash_lock, flags); + + return inode; +} + +/* + * We want to store only workingset pages in the cleancache to increase hit + * ratio so there are four cases: + * + * @page is workingset but cleancache doesn't have it: use new cleancache page + * @page is workingset and cleancache has it: overwrite the stale data + * @page is !workingset and cleancache doesn't have it: just bail out + * @page is !workingset and cleancache has it: remove the stale @page + */ +static bool store_into_inode(struct cleancache_fs *fs, + struct cleancache_filekey *key, + pgoff_t offset, struct page *page) +{ + bool workingset = PageWorkingset(page); + struct cleancache_inode *inode; + struct page *stored_page; + void *src, *dst; + bool ret = false; + +find_inode: + inode = find_and_get_inode(fs, key); + if (!inode) { + if (!workingset) + return false; + + inode = add_and_get_inode(fs, key); + if (IS_ERR_OR_NULL(inode)) { + /* + * Retry if someone just added new inode from under us. + */ + if (PTR_ERR(inode) == -EEXIST) + goto find_inode; + + return false; + } + } + + xa_lock(&inode->pages); + + stored_page = xa_load(&inode->pages, offset); + if (stored_page) { + if (!workingset) { + erase_page_from_inode(inode, offset, stored_page); + add_page_to_pool(stored_page, page_pool(stored_page)); + goto out_unlock; + } + } else { + if (!workingset) + goto out_unlock; + + stored_page = pick_page_from_pool(); + if (!stored_page) + goto out_unlock; + + if (store_page_in_inode(inode, offset, stored_page)) { + add_page_to_pool(stored_page, page_pool(stored_page)); + goto out_unlock; + } + } + + /* Copy the content of the page */ + src = kmap_local_page(page); + dst = kmap_local_page(stored_page); + memcpy(dst, src, PAGE_SIZE); + kunmap_local(dst); + kunmap_local(src); + + ret = true; +out_unlock: + /* + * Remove the inode if it was just created but we failed to add a page. 
+ */ + remove_inode_if_empty(inode); + xa_unlock(&inode->pages); + put_inode(inode); + + return ret; +} + +static bool load_from_inode(struct cleancache_fs *fs, + struct cleancache_filekey *key, + pgoff_t offset, struct page *page) +{ + struct cleancache_inode *inode; + struct page *stored_page; + void *src, *dst; + bool ret = false; + + inode = find_and_get_inode(fs, key); + if (!inode) + return false; + + xa_lock(&inode->pages); + + stored_page = xa_load(&inode->pages, offset); + if (stored_page) { + src = kmap_local_page(stored_page); + dst = kmap_local_page(page); + memcpy(dst, src, PAGE_SIZE); + kunmap_local(dst); + kunmap_local(src); + ret = true; + } + + xa_unlock(&inode->pages); + put_inode(inode); + + return ret; +} + +static bool invalidate_page(struct cleancache_fs *fs, + struct cleancache_filekey *key, pgoff_t offset) +{ + struct cleancache_inode *inode; + struct page *page; + + inode = find_and_get_inode(fs, key); + if (!inode) + return false; + + xa_lock(&inode->pages); + page = xa_load(&inode->pages, offset); + if (page) { + erase_page_from_inode(inode, offset, page); + add_page_to_pool(page, page_pool(page)); + } + xa_unlock(&inode->pages); + put_inode(inode); + + return page != NULL; +} + +static unsigned int invalidate_inode(struct cleancache_fs *fs, + struct cleancache_filekey *key) +{ + struct cleancache_inode *inode; + unsigned int ret; + + inode = find_and_get_inode(fs, key); + if (!inode) + return 0; + + ret = erase_pages_from_inode(inode, true); + put_inode(inode); + + return ret; +} + +/* Hooks into MM and FS */ +void cleancache_add_fs(struct super_block *sb) +{ + int fs_id; + struct cleancache_fs *fs; + + fs = kzalloc(sizeof(struct cleancache_fs), GFP_KERNEL); + if (!fs) + goto err; + + spin_lock_init(&fs->hash_lock); + hash_init(fs->inode_hash); + refcount_set(&fs->ref_count, 1); + + idr_preload(GFP_KERNEL); + spin_lock(&fs_lock); + fs_id = idr_alloc(&fs_idr, fs, 0, 0, GFP_NOWAIT); + spin_unlock(&fs_lock); + idr_preload_end(); + + if (fs_id < 0) { + pr_warn("too many file systems\n"); + goto err_free; + } + + sb->cleancache_id = fs_id; + return; + +err_free: + kfree(fs); +err: + sb->cleancache_id = CLEANCACHE_ID_INVALID; +} + +void cleancache_remove_fs(struct super_block *sb) +{ + int fs_id = sb->cleancache_id; + struct cleancache_inode *inode; + struct cleancache_fs *fs; + struct hlist_node *tmp; + int cursor; + + sb->cleancache_id = CLEANCACHE_ID_INVALID; + fs = get_fs(fs_id); + if (!fs) + return; + + /* + * No need to hold any lock here since this function is called when + * fs is unmounted. IOW, inode insert/delete race cannot happen. 
+ */ + hash_for_each_safe(fs->inode_hash, cursor, tmp, inode, hash) + cleancache_invalidates += invalidate_inode(fs, &inode->key); + synchronize_rcu(); + +#ifdef CONFIG_DEBUG_VM + for (int i = 0; i < HASH_SIZE(fs->inode_hash); i++) + VM_BUG_ON(!hlist_empty(&fs->inode_hash[i])); +#endif + spin_lock(&fs_lock); + idr_remove(&fs_idr, fs_id); + spin_unlock(&fs_lock); + put_fs(fs); + pr_info("removed file system %d\n", fs_id); + + /* free the object */ + put_fs(fs); +} + +/* + * WARNING: This cleancache function might be called with disabled irqs + */ +void cleancache_store_folio(struct folio *folio, + struct cleancache_filekey *key) +{ + struct cleancache_fs *fs; + int fs_id; + + VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio); + + if (!key) + return; + + /* Do not support large folios yet */ + if (folio_test_large(folio)) + return; + + fs_id = folio->mapping->host->i_sb->cleancache_id; + if (fs_id == CLEANCACHE_ID_INVALID) + return; + + fs = get_fs(fs_id); + if (!fs) + return; + + if (store_into_inode(fs, key, folio->index, &folio->page)) + cleancache_stores++; + else + cleancache_failed_stores++; + put_fs(fs); +} + +bool cleancache_restore_folio(struct folio *folio, + struct cleancache_filekey *key) +{ + struct cleancache_fs *fs; + int fs_id; + bool ret; + + if (!key) + return false; + + /* Do not support large folios yet */ + if (folio_test_large(folio)) + return false; + + fs_id = folio->mapping->host->i_sb->cleancache_id; + if (fs_id == CLEANCACHE_ID_INVALID) + return false; + + fs = get_fs(fs_id); + if (!fs) + return false; + + ret = load_from_inode(fs, key, folio->index, &folio->page); + if (ret) + cleancache_hits++; + else + cleancache_misses++; + put_fs(fs); + + return ret; +} + +/* + * WARNING: This cleancache function might be called with disabled irqs + */ +void cleancache_invalidate_folio(struct address_space *mapping, + struct folio *folio, + struct cleancache_filekey *key) +{ + struct cleancache_fs *fs; + int fs_id; + + VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio); + + if (!key) + return; + + /* Do not support large folios yet */ + if (folio_test_large(folio)) + return; + + /* Careful, folio->mapping can be NULL */ + fs_id = mapping->host->i_sb->cleancache_id; + if (fs_id == CLEANCACHE_ID_INVALID) + return; + + fs = get_fs(fs_id); + if (!fs) + return; + + if (invalidate_page(fs, key, folio->index)) + cleancache_invalidates++; + put_fs(fs); +} + +void cleancache_invalidate_inode(struct address_space *mapping, + struct cleancache_filekey *key) +{ + struct cleancache_fs *fs; + int fs_id; + + if (!key) + return; + + fs_id = mapping->host->i_sb->cleancache_id; + if (fs_id == CLEANCACHE_ID_INVALID) + return; + + fs = get_fs(fs_id); + if (!fs) + return; + + cleancache_invalidates += invalidate_inode(fs, key); + put_fs(fs); +} + +/* Backend API */ +/* + * Register a new backend and add its pages for cleancache to use. + * Returns pool id on success or a negative error code on failure. 
+ */ +int cleancache_register_backend(const char *name, struct list_head *folios) +{ + struct cleancache_pool *pool; + unsigned long pool_size = 0; + unsigned long flags; + struct folio *folio; + int pool_id; + + /* pools_lock prevents concurrent registrations */ + spin_lock(&pools_lock); + + pool_id = atomic_read(&nr_pools); + if (pool_id >= CLEANCACHE_MAX_POOLS) { + spin_unlock(&pools_lock); + return -ENOMEM; + } + + pool = &pools[pool_id]; + INIT_LIST_HEAD(&pool->free_folios); + spin_lock_init(&pool->free_folios_lock); + /* Ensure above stores complete before we increase the count */ + atomic_set_release(&nr_pools, pool_id + 1); + + spin_unlock(&pools_lock); + + list_for_each_entry(folio, folios, lru) { + struct page *page; + + /* Do not support large folios yet */ + VM_BUG_ON_FOLIO(folio_test_large(folio), folio); + VM_BUG_ON_FOLIO(folio_ref_count(folio) != 1, folio); + page = &folio->page; + set_page_pool_id(page, pool_id); + __set_page_inode_offs(page, NULL, 0); + __SetPageCCacheFree(page); + pool_size++; + } + + spin_lock_irqsave(&pool->free_folios_lock, flags); + list_splice_init(folios, &pool->free_folios); + spin_unlock_irqrestore(&pool->free_folios_lock, flags); + + pr_info("Registered \'%s\' cleancache backend, pool id %d, size %lu pages\n", + name ? : "none", pool_id, pool_size); + + return pool_id; +} +EXPORT_SYMBOL(cleancache_register_backend); + +int cleancache_backend_get_folio(int pool_id, struct folio *folio) +{ + struct cleancache_inode *inode; + struct cleancache_pool *pool; + unsigned long flags; + unsigned long index; + struct page *page; + + + /* Do not support large folios yet */ + if (folio_test_large(folio)) + return -EOPNOTSUPP; + + page = &folio->page; + /* Does the page belong to the requesting backend */ + if (page_pool_id(page) != pool_id) + return -EINVAL; + + pool = &pools[pool_id]; +again: + spin_lock_irqsave(&pool->free_folios_lock, flags); + + /* If page is free inside the pool, return it */ + if (remove_page_from_pool(page, pool)) { + spin_unlock_irqrestore(&pool->free_folios_lock, flags); + return 0; + } + + /* + * The page is not free, therefore it has to belong to a valid inode. + * Operations on CCacheFree and page->mapping are done under + * free_folios_lock which we are currently holding and CCacheFree + * always gets cleared before page->mapping is set. + */ + page_inode_offs(page, &inode, &index); + if (WARN_ON(!inode || !get_inode(inode))) { + spin_unlock_irqrestore(&pool->free_folios_lock, flags); + return -EINVAL; + } + + spin_unlock_irqrestore(&pool->free_folios_lock, flags); + + xa_lock_irqsave(&inode->pages, flags); + /* + * Retry if the page got erased from the inode but was not added into + * the pool yet. erase_page_from_inode() and add_page_to_pool() happens + * under inode->pages.xa_lock which we are holding, therefore by now + * both operations should have completed. Let's retry. 
+ */ + if (xa_load(&inode->pages, index) != page) { + xa_unlock_irqrestore(&inode->pages, flags); + put_inode(inode); + goto again; + } + + erase_page_from_inode(inode, index, page); + + spin_lock(&pool->free_folios_lock); + set_page_inode_offs(page, NULL, 0); + spin_unlock(&pool->free_folios_lock); + + xa_unlock_irqrestore(&inode->pages, flags); + + put_inode(inode); + + return 0; +} +EXPORT_SYMBOL(cleancache_backend_get_folio); + +int cleancache_backend_put_folio(int pool_id, struct folio *folio) +{ + struct cleancache_pool *pool = &pools[pool_id]; + struct page *page; + + /* Do not support large folios yet */ + if (folio_test_large(folio)) + return -EOPNOTSUPP; + + page = &folio->page; + VM_BUG_ON_PAGE(page_ref_count(page) != 1, page); + VM_BUG_ON(!list_empty(&page->lru)); + /* Reset struct page fields */ + set_page_pool_id(page, pool_id); + INIT_LIST_HEAD(&page->lru); + add_page_to_pool(page, pool); + + return 0; +} +EXPORT_SYMBOL(cleancache_backend_put_folio); + +static int __init init_cleancache(void) +{ + slab_inode = KMEM_CACHE(cleancache_inode, 0); + if (!slab_inode) + return -ENOMEM; + + return 0; +} +core_initcall(init_cleancache); + +#ifdef CONFIG_DEBUG_FS +static int __init cleancache_debugfs_init(void) +{ + struct dentry *root; + + root = debugfs_create_dir("cleancache", NULL); + debugfs_create_u64("hits", 0444, root, &cleancache_hits); + debugfs_create_u64("misses", 0444, root, &cleancache_misses); + debugfs_create_u64("stores", 0444, root, &cleancache_stores); + debugfs_create_u64("failed_stores", 0444, root, &cleancache_failed_stores); + debugfs_create_u64("invalidates", 0444, root, &cleancache_invalidates); + + return 0; +} +late_initcall(cleancache_debugfs_init); +#endif diff --git a/mm/filemap.c b/mm/filemap.c index cc69f174f76b..51dd86d7031f 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -36,6 +36,7 @@ #include #include #include +#include #include #include #include @@ -147,10 +148,20 @@ static void page_cache_delete(struct address_space *mapping, } static void filemap_unaccount_folio(struct address_space *mapping, - struct folio *folio) + struct folio *folio, struct cleancache_filekey *key) { long nr; + /* + * if we're uptodate, flush out into the cleancache, otherwise + * invalidate any existing cleancache entries. We can't leave + * stale data around in the cleancache once our page is gone + */ + if (folio_test_uptodate(folio) && folio_test_mappedtodisk(folio)) + cleancache_store_folio(folio, key); + else + cleancache_invalidate_folio(mapping, folio, key); + VM_BUG_ON_FOLIO(folio_mapped(folio), folio); if (!IS_ENABLED(CONFIG_DEBUG_VM) && unlikely(folio_mapped(folio))) { pr_alert("BUG: Bad page cache in process %s pfn:%05lx\n", @@ -210,6 +221,16 @@ static void filemap_unaccount_folio(struct address_space *mapping, folio_account_cleaned(folio, inode_to_wb(mapping->host)); } +static void ___filemap_remove_folio(struct folio *folio, void *shadow, + struct cleancache_filekey *key) +{ + struct address_space *mapping = folio->mapping; + + trace_mm_filemap_delete_from_page_cache(folio); + filemap_unaccount_folio(mapping, folio, key); + page_cache_delete(mapping, folio, shadow); +} + /* * Delete a page from the page cache and free it. 
Caller has to make * sure the page is locked and that nobody else uses it - or that usage @@ -217,11 +238,7 @@ static void filemap_unaccount_folio(struct address_space *mapping, */ void __filemap_remove_folio(struct folio *folio, void *shadow) { - struct address_space *mapping = folio->mapping; - - trace_mm_filemap_delete_from_page_cache(folio); - filemap_unaccount_folio(mapping, folio); - page_cache_delete(mapping, folio, shadow); + ___filemap_remove_folio(folio, shadow, NULL); } void filemap_free_folio(struct address_space *mapping, struct folio *folio) @@ -246,11 +263,20 @@ void filemap_free_folio(struct address_space *mapping, struct folio *folio) void filemap_remove_folio(struct folio *folio) { struct address_space *mapping = folio->mapping; + struct cleancache_filekey *pkey; + struct cleancache_filekey key; BUG_ON(!folio_test_locked(folio)); + + /* + * cleancache_get_key() uses sb->s_export_op->encode_fh which can + * also take inode->i_lock. Get the key before taking inode->i_lock. + */ + pkey = cleancache_get_key(mapping->host, &key); + spin_lock(&mapping->host->i_lock); xa_lock_irq(&mapping->i_pages); - __filemap_remove_folio(folio, NULL); + ___filemap_remove_folio(folio, NULL, pkey); xa_unlock_irq(&mapping->i_pages); if (mapping_shrinkable(mapping)) inode_add_lru(mapping->host); @@ -316,18 +342,26 @@ static void page_cache_delete_batch(struct address_space *mapping, void delete_from_page_cache_batch(struct address_space *mapping, struct folio_batch *fbatch) { + struct cleancache_filekey *pkey; + struct cleancache_filekey key; int i; if (!folio_batch_count(fbatch)) return; + /* + * cleancache_get_key() uses sb->s_export_op->encode_fh which can + * also take inode->i_lock. Get the key before taking inode->i_lock. + */ + pkey = cleancache_get_key(mapping->host, &key); + spin_lock(&mapping->host->i_lock); xa_lock_irq(&mapping->i_pages); for (i = 0; i < folio_batch_count(fbatch); i++) { struct folio *folio = fbatch->folios[i]; trace_mm_filemap_delete_from_page_cache(folio); - filemap_unaccount_folio(mapping, folio); + filemap_unaccount_folio(mapping, folio, pkey); } page_cache_delete_batch(mapping, fbatch); xa_unlock_irq(&mapping->i_pages); @@ -1865,6 +1899,13 @@ void *filemap_get_entry(struct address_space *mapping, pgoff_t index) out: rcu_read_unlock(); + if (folio && !folio_test_uptodate(folio)) { + struct cleancache_filekey key; + + if (cleancache_restore_folio(folio, cleancache_get_key(mapping->host, &key))) + folio_mark_uptodate(folio); + } + return folio; } @@ -2430,6 +2471,7 @@ static int filemap_update_page(struct kiocb *iocb, struct address_space *mapping, size_t count, struct folio *folio, bool need_uptodate) { + struct cleancache_filekey key; int error; if (iocb->ki_flags & IOCB_NOWAIT) { @@ -2466,6 +2508,11 @@ static int filemap_update_page(struct kiocb *iocb, need_uptodate)) goto unlock; + if (cleancache_restore_folio(folio, + cleancache_get_key(folio->mapping->host, &key))) { + folio_mark_uptodate(folio); + goto unlock; + } error = -EAGAIN; if (iocb->ki_flags & (IOCB_NOIO | IOCB_NOWAIT | IOCB_WAITQ)) goto unlock; diff --git a/mm/truncate.c b/mm/truncate.c index 5d98054094d1..6a981c2e57ca 100644 --- a/mm/truncate.c +++ b/mm/truncate.c @@ -20,6 +20,7 @@ #include #include #include +#include #include #include "internal.h" @@ -190,6 +191,7 @@ int truncate_inode_folio(struct address_space *mapping, struct folio *folio) */ bool truncate_inode_partial_folio(struct folio *folio, loff_t start, loff_t end) { + struct cleancache_filekey key; loff_t pos = folio_pos(folio); unsigned 
int offset, length; struct page *split_at, *split_at2; @@ -218,6 +220,8 @@ bool truncate_inode_partial_folio(struct folio *folio, loff_t start, loff_t end) if (!mapping_inaccessible(folio->mapping)) folio_zero_range(folio, offset, length); + cleancache_invalidate_folio(folio->mapping, folio, + cleancache_get_key(folio->mapping->host, &key)); if (folio_needs_release(folio)) folio_invalidate(folio, offset, length); if (!folio_test_large(folio)) @@ -337,6 +341,7 @@ long mapping_evict_folio(struct address_space *mapping, struct folio *folio) void truncate_inode_pages_range(struct address_space *mapping, loff_t lstart, loff_t lend) { + struct cleancache_filekey key; pgoff_t start; /* inclusive */ pgoff_t end; /* exclusive */ struct folio_batch fbatch; @@ -347,7 +352,7 @@ void truncate_inode_pages_range(struct address_space *mapping, bool same_folio; if (mapping_empty(mapping)) - return; + goto out; /* * 'start' and 'end' always covers the range of pages to be fully @@ -435,6 +440,10 @@ void truncate_inode_pages_range(struct address_space *mapping, truncate_folio_batch_exceptionals(mapping, &fbatch, indices); folio_batch_release(&fbatch); } + +out: + cleancache_invalidate_inode(mapping, + cleancache_get_key(mapping->host, &key)); } EXPORT_SYMBOL(truncate_inode_pages_range); @@ -488,6 +497,10 @@ void truncate_inode_pages_final(struct address_space *mapping) xa_unlock_irq(&mapping->i_pages); } + /* + * Cleancache needs notification even if there are no pages or shadow + * entries. + */ truncate_inode_pages(mapping, 0); } EXPORT_SYMBOL(truncate_inode_pages_final); @@ -643,6 +656,7 @@ int folio_unmap_invalidate(struct address_space *mapping, struct folio *folio, int invalidate_inode_pages2_range(struct address_space *mapping, pgoff_t start, pgoff_t end) { + struct cleancache_filekey key; pgoff_t indices[PAGEVEC_SIZE]; struct folio_batch fbatch; pgoff_t index; @@ -652,7 +666,7 @@ int invalidate_inode_pages2_range(struct address_space *mapping, int did_range_unmap = 0; if (mapping_empty(mapping)) - return 0; + goto out; folio_batch_init(&fbatch); index = start; @@ -713,6 +727,9 @@ int invalidate_inode_pages2_range(struct address_space *mapping, if (dax_mapping(mapping)) { unmap_mapping_pages(mapping, start, end - start + 1, false); } +out: + cleancache_invalidate_inode(mapping, + cleancache_get_key(mapping->host, &key)); return ret; } EXPORT_SYMBOL_GPL(invalidate_inode_pages2_range); From patchwork Thu Mar 20 17:39:30 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Suren Baghdasaryan X-Patchwork-Id: 14024211 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9FA0FC36002 for ; Thu, 20 Mar 2025 17:39:43 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id EDD75280007; Thu, 20 Mar 2025 13:39:40 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id E89EA280006; Thu, 20 Mar 2025 13:39:40 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id CDF8A280007; Thu, 20 Mar 2025 13:39:40 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0016.hostedemail.com [216.40.44.16]) by kanga.kvack.org (Postfix) with ESMTP id AD6FD280006 for ; Thu, 20 Mar 2025 13:39:40 -0400 (EDT) Received: from smtpin30.hostedemail.com (a10.router.float.18 [10.200.18.1]) by 
Date: Thu, 20 Mar 2025 10:39:30 -0700
In-Reply-To: <20250320173931.1583800-1-surenb@google.com>
References: <20250320173931.1583800-1-surenb@google.com>
Message-ID: <20250320173931.1583800-3-surenb@google.com>
Subject: [RFC 2/3] mm: introduce GCMA
From: Suren Baghdasaryan
To: akpm@linux-foundation.org
Cc: willy@infradead.org, david@redhat.com, vbabka@suse.cz, lorenzo.stoakes@oracle.com, liam.howlett@oracle.com, alexandru.elisei@arm.com, peterx@redhat.com, hannes@cmpxchg.org, mhocko@kernel.org, m.szyprowski@samsung.com, iamjoonsoo.kim@lge.com, mina86@mina86.com, axboe@kernel.dk, viro@zeniv.linux.org.uk, brauner@kernel.org, hch@infradead.org, jack@suse.cz, hbathini@linux.ibm.com, sourabhjain@linux.ibm.com, ritesh.list@gmail.com, aneesh.kumar@kernel.org, bhelgaas@google.com, sj@kernel.org, fvdl@google.com, ziy@nvidia.com, yuzhao@google.com, minchan@kernel.org, surenb@google.com, linux-mm@kvack.org, linuxppc-dev@lists.ozlabs.org, linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org, iommu@lists.linux.dev, linux-kernel@vger.kernel.org, Minchan Kim
From: Minchan Kim

This patch introduces GCMA (Guaranteed Contiguous Memory Allocator), a
cleancache backend which reserves some amount of memory at boot and then
donates it to store clean file-backed pages in the cleancache. GCMA aims
to guarantee contiguous memory allocation success as well as low and
deterministic allocation latency.

Notes: Originally, the idea was posted by SeongJae Park and Minchan Kim
[1]. Later Minchan reworked it to be used in Android as a reference for
Android vendors to use [2].

[1] https://lwn.net/Articles/619865/
[2] https://android-review.googlesource.com/q/topic:%22gcma_6.12%22

Signed-off-by: Minchan Kim Signed-off-by: Suren Baghdasaryan --- include/linux/gcma.h | 12 ++++ mm/Kconfig | 15 +++++ mm/Makefile | 1 + mm/gcma.c | 155 +++++++++++++++++++++++++++++++++++++++++++ 4 files changed, 183 insertions(+) create mode 100644 include/linux/gcma.h create mode 100644 mm/gcma.c diff --git a/include/linux/gcma.h b/include/linux/gcma.h new file mode 100644 index 000000000000..2ce40fcc74a5 --- /dev/null +++ b/include/linux/gcma.h @@ -0,0 +1,12 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef __GCMA_H__ +#define __GCMA_H__ + +#include + +int gcma_register_area(const char *name, + unsigned long start_pfn, unsigned long count); +void gcma_alloc_range(unsigned long start_pfn, unsigned long count); +void gcma_free_range(unsigned long start_pfn, unsigned long count); + +#endif /* __GCMA_H__ */ diff --git a/mm/Kconfig b/mm/Kconfig index d6ebf0fb0432..85268ef901b6 100644 --- a/mm/Kconfig +++ b/mm/Kconfig @@ -1002,6 +1002,21 @@ config CMA_AREAS If unsure, leave the default value "8" in UMA and "20" in NUMA. +config GCMA + bool "GCMA (Guaranteed Contiguous Memory Allocator)" + depends on CLEANCACHE + help + This enables the Guaranteed Contiguous Memory Allocator to allow + low latency guaranteed contiguous memory allocations. Memory + reserved by GCMA is donated to cleancache to be used as pagecache + extension. Once GCMA allocation is requested, necessary pages are + taken back from the cleancache and used to satisfy the request. + Cleancache guarantees low latency successful allocation as long + as the total size of GCMA allocations does not exceed the size of + the memory donated to the cleancache. + + If unsure, say "N".
+
 config MEM_SOFT_DIRTY
 	bool "Track memory changes"
 	depends on CHECKPOINT_RESTORE && HAVE_ARCH_SOFT_DIRTY && PROC_FS
diff --git a/mm/Makefile b/mm/Makefile
index 084dbe9edbc4..2173d395d371 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -149,3 +149,4 @@ obj-$(CONFIG_EXECMEM) += execmem.o
 obj-$(CONFIG_TMPFS_QUOTA) += shmem_quota.o
 obj-$(CONFIG_PT_RECLAIM) += pt_reclaim.o
 obj-$(CONFIG_CLEANCACHE) += cleancache.o
+obj-$(CONFIG_GCMA) += gcma.o
diff --git a/mm/gcma.c b/mm/gcma.c
new file mode 100644
index 000000000000..263e63da0c89
--- /dev/null
+++ b/mm/gcma.c
@@ -0,0 +1,155 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * GCMA (Guaranteed Contiguous Memory Allocator)
+ *
+ */
+
+#define pr_fmt(fmt) "gcma: " fmt
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#define MAX_GCMA_AREAS		64
+#define GCMA_AREA_NAME_MAX_LEN	32
+
+struct gcma_area {
+	int area_id;
+	unsigned long start_pfn;
+	unsigned long end_pfn;
+	char name[GCMA_AREA_NAME_MAX_LEN];
+};
+
+static struct gcma_area areas[MAX_GCMA_AREAS];
+static atomic_t nr_gcma_area = ATOMIC_INIT(0);
+static DEFINE_SPINLOCK(gcma_area_lock);
+
+static void alloc_page_range(struct gcma_area *area,
+			     unsigned long start_pfn, unsigned long end_pfn)
+{
+	unsigned long scanned = 0;
+	unsigned long pfn;
+	struct page *page;
+	int err;
+
+	for (pfn = start_pfn; pfn < end_pfn; pfn++) {
+		if (!(++scanned % XA_CHECK_SCHED))
+			cond_resched();
+
+		page = pfn_to_page(pfn);
+		err = cleancache_backend_get_folio(area->area_id, page_folio(page));
+		VM_BUG_ON(err);
+	}
+}
+
+static void free_page_range(struct gcma_area *area,
+			    unsigned long start_pfn, unsigned long end_pfn)
+{
+	unsigned long scanned = 0;
+	unsigned long pfn;
+	struct page *page;
+	int err;
+
+	for (pfn = start_pfn; pfn < end_pfn; pfn++) {
+		if (!(++scanned % XA_CHECK_SCHED))
+			cond_resched();
+
+		page = pfn_to_page(pfn);
+		err = cleancache_backend_put_folio(area->area_id,
+						   page_folio(page));
+		VM_BUG_ON(err);
+	}
+}
+
+int gcma_register_area(const char *name,
+		       unsigned long start_pfn, unsigned long count)
+{
+	LIST_HEAD(folios);
+	int i, area_id;
+	int nr_area;
+	int ret = 0;
+
+	for (i = 0; i < count; i++) {
+		struct folio *folio;
+
+		folio = page_folio(pfn_to_page(start_pfn + i));
+		list_add(&folio->lru, &folios);
+	}
+
+	area_id = cleancache_register_backend(name, &folios);
+	if (area_id < 0)
+		return area_id;
+
+	spin_lock(&gcma_area_lock);
+
+	nr_area = atomic_read(&nr_gcma_area);
+	if (nr_area < MAX_GCMA_AREAS) {
+		struct gcma_area *area = &areas[nr_area];
+
+		area->area_id = area_id;
+		area->start_pfn = start_pfn;
+		area->end_pfn = start_pfn + count;
+		strscpy(area->name, name);
+		/* Ensure above stores complete before we increase the count */
+		atomic_set_release(&nr_gcma_area, nr_area + 1);
+	} else {
+		ret = -ENOMEM;
+	}
+
+	spin_unlock(&gcma_area_lock);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(gcma_register_area);
+
+void gcma_alloc_range(unsigned long start_pfn, unsigned long count)
+{
+	int nr_area = atomic_read_acquire(&nr_gcma_area);
+	unsigned long end_pfn = start_pfn + count;
+	struct gcma_area *area;
+	int i;
+
+	for (i = 0; i < nr_area; i++) {
+		unsigned long s_pfn, e_pfn;
+
+		area = &areas[i];
+		if (area->end_pfn <= start_pfn)
+			continue;
+
+		if (area->start_pfn > end_pfn)
+			continue;
+
+		s_pfn = max(start_pfn, area->start_pfn);
+		e_pfn = min(end_pfn, area->end_pfn);
+		alloc_page_range(area, s_pfn, e_pfn);
+	}
+}
+EXPORT_SYMBOL_GPL(gcma_alloc_range);
+
+void gcma_free_range(unsigned long start_pfn, unsigned long count)
+{
+	int nr_area = atomic_read_acquire(&nr_gcma_area);
+	unsigned long end_pfn = start_pfn + count;
+	struct gcma_area *area;
+	int i;
+
+	for (i = 0; i < nr_area; i++) {
+		unsigned long s_pfn, e_pfn;
+
+		area = &areas[i];
+		if (area->end_pfn <= start_pfn)
+			continue;
+
+		if (area->start_pfn > end_pfn)
+			continue;
+
+		s_pfn = max(start_pfn, area->start_pfn);
+		e_pfn = min(end_pfn, area->end_pfn);
+		free_page_range(area, s_pfn, e_pfn);
+	}
+}
+EXPORT_SYMBOL_GPL(gcma_free_range);
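
[Editor's note] As a usage illustration only: the in-tree caller of this API
arrives in the next patch through CMA, but a hypothetical boot-time user could
donate a reserved physical range to GCMA and later claim part of it back
roughly as sketched below. The pool name, sizes and the memblock-based
reservation are invented for the example; only the three gcma_* calls come
from this patch.

/* Hypothetical example, not part of this patch. */
#include <linux/gcma.h>
#include <linux/memblock.h>
#include <linux/mm.h>
#include <linux/pfn.h>
#include <linux/sizes.h>

static unsigned long my_pool_pfn;

/* Called early during boot, e.g. from arch setup code (illustrative). */
static int __init my_gcma_pool_init(void)
{
	phys_addr_t base = memblock_phys_alloc(SZ_64M, PAGE_SIZE);

	if (!base)
		return -ENOMEM;

	my_pool_pfn = PHYS_PFN(base);
	/* Donated pages back the cleancache until someone claims them. */
	return gcma_register_area("my_pool", my_pool_pfn, SZ_64M >> PAGE_SHIFT);
}

/* Claim the first 4 MiB back as a guaranteed contiguous buffer... */
static void my_pool_grab_buffer(void)
{
	gcma_alloc_range(my_pool_pfn, SZ_4M >> PAGE_SHIFT);
}

/* ...and return it to the cleancache when no longer needed. */
static void my_pool_release_buffer(void)
{
	gcma_free_range(my_pool_pfn, SZ_4M >> PAGE_SHIFT);
}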

From patchwork Thu Mar 20 17:39:31 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Suren Baghdasaryan
X-Patchwork-Id: 14024212
Date: Thu, 20 Mar 2025 10:39:31 -0700
In-Reply-To: <20250320173931.1583800-1-surenb@google.com>
Mime-Version: 1.0
References: <20250320173931.1583800-1-surenb@google.com>
X-Mailer: git-send-email 2.49.0.395.g12beb8f557-goog
Message-ID: <20250320173931.1583800-4-surenb@google.com>
Subject: [RFC 3/3] mm: integrate GCMA with CMA using dt-bindings
From: Suren Baghdasaryan
To: akpm@linux-foundation.org
Cc: willy@infradead.org, david@redhat.com, vbabka@suse.cz,
 lorenzo.stoakes@oracle.com, liam.howlett@oracle.com, alexandru.elisei@arm.com,
 peterx@redhat.com, hannes@cmpxchg.org, mhocko@kernel.org,
 m.szyprowski@samsung.com, iamjoonsoo.kim@lge.com, mina86@mina86.com,
 axboe@kernel.dk, viro@zeniv.linux.org.uk, brauner@kernel.org,
 hch@infradead.org, jack@suse.cz, hbathini@linux.ibm.com,
 sourabhjain@linux.ibm.com, ritesh.list@gmail.com, aneesh.kumar@kernel.org,
 bhelgaas@google.com, sj@kernel.org, fvdl@google.com, ziy@nvidia.com,
 yuzhao@google.com, minchan@kernel.org, surenb@google.com,
 linux-mm@kvack.org, linuxppc-dev@lists.ozlabs.org, linux-block@vger.kernel.org,
 linux-fsdevel@vger.kernel.org, iommu@lists.linux.dev,
 linux-kernel@vger.kernel.org, Minchan Kim

This patch introduces a new "guarantee" property for shared-dma-pool. With
this property, an admin can create a specific memory pool as GCMA-based CMA
if they care about allocation success rate and latency. The downside of GCMA
is that it can host only clean file-backed pages, since it uses cleancache as
its secondary user.
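
[Editor's note] For illustration, a reserved-memory node that opts into GCMA
might look roughly like the sketch below. The node name, addresses and size
are made up for the example; the "shared-dma-pool" compatible and the new
"guarantee" flag are what this patch acts on, and "reusable" is the usual
requirement for CMA-backed pools:

	reserved-memory {
		#address-cells = <2>;
		#size-cells = <2>;
		ranges;

		gcma_pool: linux,gcma@90000000 {
			compatible = "shared-dma-pool";
			reusable;
			reg = <0x0 0x90000000 0x0 0x10000000>;	/* 256 MiB, example only */
			guarantee;
		};
	};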

Signed-off-by: Minchan Kim
Signed-off-by: Suren Baghdasaryan
---
 arch/powerpc/kernel/fadump.c |  2 +-
 include/linux/cma.h          |  2 +-
 kernel/dma/contiguous.c      | 11 ++++++++++-
 mm/cma.c                     | 33 ++++++++++++++++++++++++++-------
 mm/cma.h                     |  1 +
 mm/cma_sysfs.c               | 10 ++++++++++
 6 files changed, 49 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
index 4b371c738213..4eb7be0cdcdb 100644
--- a/arch/powerpc/kernel/fadump.c
+++ b/arch/powerpc/kernel/fadump.c
@@ -111,7 +111,7 @@ void __init fadump_cma_init(void)
 		return;
 	}
 
-	rc = cma_init_reserved_mem(base, size, 0, "fadump_cma", &fadump_cma);
+	rc = cma_init_reserved_mem(base, size, 0, "fadump_cma", &fadump_cma, false);
 	if (rc) {
 		pr_err("Failed to init cma area for firmware-assisted dump,%d\n", rc);
 		/*
diff --git a/include/linux/cma.h b/include/linux/cma.h
index 62d9c1cf6326..3207db979e94 100644
--- a/include/linux/cma.h
+++ b/include/linux/cma.h
@@ -46,7 +46,7 @@ extern int __init cma_declare_contiguous_multi(phys_addr_t size,
 extern int cma_init_reserved_mem(phys_addr_t base, phys_addr_t size,
 					unsigned int order_per_bit,
 					const char *name,
-					struct cma **res_cma);
+					struct cma **res_cma, bool gcma);
 extern struct page *cma_alloc(struct cma *cma, unsigned long count, unsigned int align,
 			      bool no_warn);
 extern bool cma_pages_valid(struct cma *cma, const struct page *pages, unsigned long count);
diff --git a/kernel/dma/contiguous.c b/kernel/dma/contiguous.c
index 055da410ac71..a68b3123438c 100644
--- a/kernel/dma/contiguous.c
+++ b/kernel/dma/contiguous.c
@@ -459,6 +459,7 @@ static int __init rmem_cma_setup(struct reserved_mem *rmem)
 	unsigned long node = rmem->fdt_node;
 	bool default_cma = of_get_flat_dt_prop(node, "linux,cma-default", NULL);
 	struct cma *cma;
+	bool gcma;
 	int err;
 
 	if (size_cmdline != -1 && default_cma) {
@@ -476,7 +477,15 @@ static int __init rmem_cma_setup(struct reserved_mem *rmem)
 		return -EINVAL;
 	}
 
-	err = cma_init_reserved_mem(rmem->base, rmem->size, 0, rmem->name, &cma);
+	gcma = !!of_get_flat_dt_prop(node, "guarantee", NULL);
+#ifndef CONFIG_GCMA
+	if (gcma) {
+		pr_err("Reserved memory: unable to setup GCMA region, GCMA is not enabled\n");
+		return -EINVAL;
+	}
+#endif
+	err = cma_init_reserved_mem(rmem->base, rmem->size, 0, rmem->name,
+				    &cma, gcma);
 	if (err) {
 		pr_err("Reserved memory: unable to setup CMA region\n");
 		return err;
diff --git a/mm/cma.c b/mm/cma.c
index b06d5fe73399..f12cef849e58 100644
--- a/mm/cma.c
+++ b/mm/cma.c
@@ -26,6 +26,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
@@ -165,11 +166,18 @@ static void __init cma_activate_area(struct cma *cma)
 			count = cmr->early_pfn - cmr->base_pfn;
 			bitmap_count = cma_bitmap_pages_to_bits(cma, count);
 			bitmap_set(cmr->bitmap, 0, bitmap_count);
+		} else {
+			count = 0;
 		}
 
-		for (pfn = cmr->early_pfn; pfn < cmr->base_pfn + cmr->count;
-		     pfn += pageblock_nr_pages)
-			init_cma_reserved_pageblock(pfn_to_page(pfn));
+		if (cma->gcma) {
+			gcma_register_area(cma->name, cmr->early_pfn,
+					   cma->count - count);
+		} else {
+			for (pfn = cmr->early_pfn; pfn < cmr->base_pfn + cmr->count;
+			     pfn += pageblock_nr_pages)
+				init_cma_reserved_pageblock(pfn_to_page(pfn));
+		}
 	}
 
 	spin_lock_init(&cma->lock);
@@ -270,7 +278,7 @@ static void __init cma_drop_area(struct cma *cma)
 int __init cma_init_reserved_mem(phys_addr_t base, phys_addr_t size,
 				 unsigned int order_per_bit,
 				 const char *name,
-				 struct cma **res_cma)
+				 struct cma **res_cma, bool gcma)
 {
 	struct cma *cma;
 	int ret;
@@ -301,6 +309,7 @@ int __init cma_init_reserved_mem(phys_addr_t base, phys_addr_t size,
 	cma->ranges[0].count = cma->count;
 	cma->nranges = 1;
 	cma->nid = NUMA_NO_NODE;
+	cma->gcma = gcma;
 
 	*res_cma = cma;
@@ -721,7 +730,8 @@ static int __init __cma_declare_contiguous_nid(phys_addr_t base,
 		base = addr;
 	}
 
-	ret = cma_init_reserved_mem(base, size, order_per_bit, name, res_cma);
+	ret = cma_init_reserved_mem(base, size, order_per_bit, name, res_cma,
+				    false);
 	if (ret)
 		memblock_phys_free(base, size);
 
@@ -815,7 +825,13 @@ static int cma_range_alloc(struct cma *cma, struct cma_memrange *cmr,
 		pfn = cmr->base_pfn + (bitmap_no << cma->order_per_bit);
 		mutex_lock(&cma->alloc_mutex);
-		ret = alloc_contig_range(pfn, pfn + count, MIGRATE_CMA, gfp);
+		if (cma->gcma) {
+			gcma_alloc_range(pfn, count);
+			ret = 0;
+		} else {
+			ret = alloc_contig_range(pfn, pfn + count, MIGRATE_CMA,
+						 gfp);
+		}
 		mutex_unlock(&cma->alloc_mutex);
 		if (ret == 0) {
 			page = pfn_to_page(pfn);
@@ -992,7 +1008,10 @@ bool cma_release(struct cma *cma, const struct page *pages,
 	if (r == cma->nranges)
 		return false;
 
-	free_contig_range(pfn, count);
+	if (cma->gcma)
+		gcma_free_range(pfn, count);
+	else
+		free_contig_range(pfn, count);
 	cma_clear_bitmap(cma, cmr, pfn, count);
 	cma_sysfs_account_release_pages(cma, count);
 	trace_cma_release(cma->name, pfn, pages, count);
diff --git a/mm/cma.h b/mm/cma.h
index 41a3ab0ec3de..c2a5576d7987 100644
--- a/mm/cma.h
+++ b/mm/cma.h
@@ -47,6 +47,7 @@ struct cma {
 	char name[CMA_MAX_NAME];
 	int nranges;
 	struct cma_memrange ranges[CMA_MAX_RANGES];
+	bool gcma;
 #ifdef CONFIG_CMA_SYSFS
 	/* the number of CMA page successful allocations */
 	atomic64_t nr_pages_succeeded;
diff --git a/mm/cma_sysfs.c b/mm/cma_sysfs.c
index 97acd3e5a6a5..4ecc36270a4d 100644
--- a/mm/cma_sysfs.c
+++ b/mm/cma_sysfs.c
@@ -80,6 +80,15 @@ static ssize_t available_pages_show(struct kobject *kobj,
 }
 CMA_ATTR_RO(available_pages);
 
+static ssize_t gcma_show(struct kobject *kobj,
+			 struct kobj_attribute *attr, char *buf)
+{
+	struct cma *cma = cma_from_kobj(kobj);
+
+	return sysfs_emit(buf, "%d\n", cma->gcma);
+}
+CMA_ATTR_RO(gcma);
+
 static void cma_kobj_release(struct kobject *kobj)
 {
 	struct cma *cma = cma_from_kobj(kobj);
@@ -95,6 +104,7 @@ static struct attribute *cma_attrs[] = {
 	&release_pages_success_attr.attr,
 	&total_pages_attr.attr,
 	&available_pages_attr.attr,
+	&gcma_attr.attr,
 	NULL,
 };
 ATTRIBUTE_GROUPS(cma);
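
[Editor's note] To tie the series together, the sketch below shows how an
in-kernel consumer might allocate from a GCMA-backed pool once these patches
are applied. The device, the pool lookup via dev_get_cma_area() and the
16 MiB size are illustrative assumptions; the point is that callers keep
using the regular cma_alloc()/cma_release() API, and CMA internally routes
the request to gcma_alloc_range()/gcma_free_range() when cma->gcma is set.

/* Hypothetical consumer of a "guarantee" shared-dma-pool; not part of the series. */
#include <linux/cma.h>
#include <linux/dma-map-ops.h>	/* dev_get_cma_area() */
#include <linux/mm.h>
#include <linux/sizes.h>

static struct page *grab_guaranteed_buffer(struct device *dev)
{
	struct cma *cma = dev_get_cma_area(dev);	/* the GCMA-backed pool */
	unsigned long count = SZ_16M >> PAGE_SHIFT;

	/*
	 * For a GCMA-backed area this ends up in gcma_alloc_range(): clean
	 * file-backed pages cached in the donated range are dropped and the
	 * allocation is expected to succeed quickly and deterministically.
	 */
	return cma_alloc(cma, count, get_order(SZ_16M), false);
}

static void release_guaranteed_buffer(struct device *dev, struct page *page)
{
	cma_release(dev_get_cma_area(dev), page, SZ_16M >> PAGE_SHIFT);
}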