From patchwork Thu Mar 20 17:39:29 2025
X-Patchwork-Submitter: Suren Baghdasaryan
X-Patchwork-Id: 14024210
Date: Thu, 20 Mar 2025 10:39:29 -0700
In-Reply-To: <20250320173931.1583800-1-surenb@google.com>
References: <20250320173931.1583800-1-surenb@google.com>
Message-ID: <20250320173931.1583800-2-surenb@google.com>
Subject: [RFC 1/3] mm: implement cleancache
From: Suren Baghdasaryan
To: akpm@linux-foundation.org
Cc: willy@infradead.org, david@redhat.com, vbabka@suse.cz, lorenzo.stoakes@oracle.com, liam.howlett@oracle.com, alexandru.elisei@arm.com, peterx@redhat.com, hannes@cmpxchg.org, mhocko@kernel.org, m.szyprowski@samsung.com, iamjoonsoo.kim@lge.com, mina86@mina86.com, axboe@kernel.dk, viro@zeniv.linux.org.uk, brauner@kernel.org, hch@infradead.org, jack@suse.cz, hbathini@linux.ibm.com, sourabhjain@linux.ibm.com, ritesh.list@gmail.com, aneesh.kumar@kernel.org, bhelgaas@google.com, sj@kernel.org, fvdl@google.com, ziy@nvidia.com, yuzhao@google.com, minchan@kernel.org, surenb@google.com, linux-mm@kvack.org, linuxppc-dev@lists.ozlabs.org, linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org, iommu@lists.linux.dev, linux-kernel@vger.kernel.org, Minchan Kim
Cleancache can be thought of as a page-granularity victim cache for clean
pages that the kernel's pageframe replacement algorithm (PFRA) would like
to keep around, but can't since there isn't enough memory. So when the
PFRA "evicts" a page, it first attempts to use cleancache code to put the
data contained in that page into "transcendent memory", memory that is
not directly accessible or addressable by the kernel. Later, when the
system needs to access a page in a file on disk, it first checks
cleancache to see if it already contains it; if it does, the page of data
is copied into the kernel and a disk access is avoided.

The patchset borrows the idea, some code and documentation from the
previous cleancache implementation, but as opposed to being a thin
pass-through layer it now implements the housekeeping code to associate
cleancache pages with their inodes and to handle the page pools donated
by cleancache backends. It also avoids intrusive hooks into filesystem
code, limiting itself to hooks in the mm reclaim and page-in paths and
two hooks to detect filesystem mount/unmount events.

This patch implements the basic cleancache support. Future plans include
large folio support, sysfs statistics and a cleancache page eviction
mechanism.
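As an illustration of the backend contract described above (not part of
the patch itself), a backend might donate a reserved page range and later
reclaim individual folios roughly as sketched below. The example_* names
are hypothetical, error handling is elided, and the calls follow the
Backend API declared in include/linux/cleancache.h in this patch;
mm/gcma.c in the next patch follows essentially the same pattern.

/* Hypothetical backend-side sketch, assuming a pre-reserved PFN range */
#include <linux/cleancache.h>
#include <linux/mm.h>

static int example_pool_id;

static int example_donate(unsigned long start_pfn, unsigned long count)
{
	LIST_HEAD(folios);
	unsigned long i;

	/* Backend must own these pages; their refcount is expected to be 1 */
	for (i = 0; i < count; i++)
		list_add(&page_folio(pfn_to_page(start_pfn + i))->lru, &folios);

	/* On success the returned id identifies this backend's page pool */
	example_pool_id = cleancache_register_backend("example", &folios);
	return example_pool_id < 0 ? example_pool_id : 0;
}

/* Take one donated folio back, dropping any clean page cached in it */
static int example_take_back(struct folio *folio)
{
	return cleancache_backend_get_folio(example_pool_id, folio);
}

/* Return the folio so cleancache can use it for clean pages again */
static int example_give_back(struct folio *folio)
{
	return cleancache_backend_put_folio(example_pool_id, folio);
}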
Signed-off-by: Suren Baghdasaryan Signed-off-by: Minchan Kim --- block/bdev.c | 8 + fs/super.c | 3 + include/linux/cleancache.h | 88 ++++ include/linux/fs.h | 7 + mm/Kconfig | 17 + mm/Makefile | 1 + mm/cleancache.c | 926 +++++++++++++++++++++++++++++++++++++ mm/filemap.c | 63 ++- mm/truncate.c | 21 +- 9 files changed, 1124 insertions(+), 10 deletions(-) create mode 100644 include/linux/cleancache.h create mode 100644 mm/cleancache.c diff --git a/block/bdev.c b/block/bdev.c index 9d73a8fbf7f9..aa00b9da9e0a 100644 --- a/block/bdev.c +++ b/block/bdev.c @@ -28,6 +28,7 @@ #include #include #include +#include #include "../fs/internal.h" #include "blk.h" @@ -95,12 +96,19 @@ static void kill_bdev(struct block_device *bdev) void invalidate_bdev(struct block_device *bdev) { struct address_space *mapping = bdev->bd_mapping; + struct cleancache_filekey key; if (mapping->nrpages) { invalidate_bh_lrus(); lru_add_drain_all(); /* make sure all lru add caches are flushed */ invalidate_mapping_pages(mapping, 0, -1); } + /* + * 99% of the time, we don't need to flush the cleancache on the bdev. + * But, for the strange corners, lets be cautious + */ + cleancache_invalidate_inode(mapping, + cleancache_get_key(mapping->host, &key)); } EXPORT_SYMBOL(invalidate_bdev); diff --git a/fs/super.c b/fs/super.c index 5a7db4a556e3..7e8d668a587e 100644 --- a/fs/super.c +++ b/fs/super.c @@ -31,6 +31,7 @@ #include #include #include +#include #include #include #include @@ -374,6 +375,7 @@ static struct super_block *alloc_super(struct file_system_type *type, int flags, s->s_time_gran = 1000000000; s->s_time_min = TIME64_MIN; s->s_time_max = TIME64_MAX; + cleancache_add_fs(s); s->s_shrink = shrinker_alloc(SHRINKER_NUMA_AWARE | SHRINKER_MEMCG_AWARE, "sb-%s", type->name); @@ -469,6 +471,7 @@ void deactivate_locked_super(struct super_block *s) { struct file_system_type *fs = s->s_type; if (atomic_dec_and_test(&s->s_active)) { + cleancache_remove_fs(s); shrinker_free(s->s_shrink); fs->kill_sb(s); diff --git a/include/linux/cleancache.h b/include/linux/cleancache.h new file mode 100644 index 000000000000..a9161cbf3490 --- /dev/null +++ b/include/linux/cleancache.h @@ -0,0 +1,88 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _LINUX_CLEANCACHE_H +#define _LINUX_CLEANCACHE_H + +#include +#include +#include + +/* super_block->cleancache_id value for an invalid ID */ +#define CLEANCACHE_ID_INVALID -1 + +#define CLEANCACHE_KEY_MAX 6 + +/* + * Cleancache requires every file with a folio in cleancache to have a + * unique key unless/until the file is removed/truncated. 
For some + * filesystems, the inode number is unique, but for "modern" filesystems + * an exportable filehandle is required (see exportfs.h) + */ +struct cleancache_filekey { + union { + ino_t ino; + __u32 fh[CLEANCACHE_KEY_MAX]; + u32 key[CLEANCACHE_KEY_MAX]; + } u; +}; + +#ifdef CONFIG_CLEANCACHE + +/* Hooks into MM and FS */ +struct cleancache_filekey *cleancache_get_key(struct inode *inode, + struct cleancache_filekey *key); +void cleancache_add_fs(struct super_block *sb); +void cleancache_remove_fs(struct super_block *sb); +void cleancache_store_folio(struct folio *folio, + struct cleancache_filekey *key); +bool cleancache_restore_folio(struct folio *folio, + struct cleancache_filekey *key); +void cleancache_invalidate_folio(struct address_space *mapping, + struct folio *folio, + struct cleancache_filekey *key); +void cleancache_invalidate_inode(struct address_space *mapping, + struct cleancache_filekey *key); + +/* + * Backend API + * + * Cleancache does not touch page reference. Page refcount should be 1 when + * page is placed or returned into cleancache and pages obtained from + * cleancache will also have their refcount at 1. + */ +int cleancache_register_backend(const char *name, struct list_head *folios); +int cleancache_backend_get_folio(int area_id, struct folio *folio); +int cleancache_backend_put_folio(int area_id, struct folio *folio); + +#else /* CONFIG_CLEANCACHE */ + +static inline +struct cleancache_filekey *cleancache_get_key(struct inode *inode, + struct cleancache_filekey *key) +{ + return NULL; +} +static inline void cleancache_add_fs(struct super_block *sb) {} +static inline void cleancache_remove_fs(struct super_block *sb) {} +static inline void cleancache_store_folio(struct folio *folio, + struct cleancache_filekey *key) {} +static inline bool cleancache_restore_folio(struct folio *folio, + struct cleancache_filekey *key) +{ + return false; +} +static inline void cleancache_invalidate_folio(struct address_space *mapping, + struct folio *folio, + struct cleancache_filekey *key) {} +static inline void cleancache_invalidate_inode(struct address_space *mapping, + struct cleancache_filekey *key) {} + +static inline int cleancache_register_backend(const char *name, + struct list_head *folios) { return -EOPNOTSUPP; } +static inline int cleancache_backend_get_folio(int area_id, + struct folio *folio) { return -EOPNOTSUPP; } +static inline int cleancache_backend_put_folio(int area_id, + struct folio *folio) { return -EOPNOTSUPP; } + +#endif /* CONFIG_CLEANCACHE */ + +#endif /* _LINUX_CLEANCACHE_H */ diff --git a/include/linux/fs.h b/include/linux/fs.h index 2788df98080f..851544454c9e 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -1407,6 +1407,13 @@ struct super_block { const struct dentry_operations *s_d_op; /* default d_op for dentries */ +#ifdef CONFIG_CLEANCACHE + /* + * Saved identifier for cleancache (CLEANCACHE_ID_INVALID means none) + */ + int cleancache_id; +#endif + struct shrinker *s_shrink; /* per-sb shrinker handle */ /* Number of inodes with nlink == 0 but still referenced */ diff --git a/mm/Kconfig b/mm/Kconfig index 4a4e7b63d30a..d6ebf0fb0432 100644 --- a/mm/Kconfig +++ b/mm/Kconfig @@ -945,6 +945,23 @@ config USE_PERCPU_NUMA_NODE_ID config HAVE_SETUP_PER_CPU_AREA bool +config CLEANCACHE + bool "Enable cleancache to cache clean pages if tmem is present" + help + Cleancache can be thought of as a page-granularity victim cache + for clean pages that the kernel's pageframe replacement algorithm + (PFRA) would like to keep around, but can't since 
there isn't enough + memory. So when the PFRA "evicts" a page, it first attempts to use + cleancache code to put the data contained in that page into + "transcendent memory", memory that is not directly accessible or + addressable by the kernel and is of unknown and possibly + time-varying size. When system wishes to access a page in a file + on disk, it first checks cleancache to see if it already contains + it; if it does, the page is copied into the kernel and a disk + access is avoided. + + If unsure, say N. + config CMA bool "Contiguous Memory Allocator" depends on MMU diff --git a/mm/Makefile b/mm/Makefile index e7f6bbf8ae5f..084dbe9edbc4 100644 --- a/mm/Makefile +++ b/mm/Makefile @@ -148,3 +148,4 @@ obj-$(CONFIG_SHRINKER_DEBUG) += shrinker_debug.o obj-$(CONFIG_EXECMEM) += execmem.o obj-$(CONFIG_TMPFS_QUOTA) += shmem_quota.o obj-$(CONFIG_PT_RECLAIM) += pt_reclaim.o +obj-$(CONFIG_CLEANCACHE) += cleancache.o diff --git a/mm/cleancache.c b/mm/cleancache.c new file mode 100644 index 000000000000..23113c5adfc5 --- /dev/null +++ b/mm/cleancache.c @@ -0,0 +1,926 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * Cleancache frontend + * + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +/* + * Possible lock nesting: + * inode->pages.xa_lock + * free_folios_lock + * + * inode->pages.xa_lock + * fs->hash_lock + * + * Notes: should keep free_folios_lock and fs->hash_lock HARDIRQ-irq-safe + * since inode->pages.xa_lock is HARDIRQ-irq-safe and we take these locks + * while holding inode->pages.xa_lock. This means whenever we take these + * locks while not holding inode->pages.xa_lock, we should disable irqs. + */ + +/* Counters available via /sys/kernel/debug/cleancache */ +static u64 cleancache_hits; +static u64 cleancache_misses; +static u64 cleancache_stores; +static u64 cleancache_failed_stores; +static u64 cleancache_invalidates; + +/* + * @cleancache_inode represents each inode in @cleancache_fs + * + * The cleancache_inode will be freed by RCU when the last page from xarray + * is freed, except for invalidate_inode() case. + */ +struct cleancache_inode { + struct cleancache_filekey key; + struct hlist_node hash; + refcount_t ref_count; + + struct xarray pages; + struct rcu_head rcu; + struct cleancache_fs *fs; +}; + +static struct kmem_cache *slab_inode; + +#define INODE_HASH_BITS 10 + +/* represents each file system instance hosted by the cleancache */ +struct cleancache_fs { + spinlock_t hash_lock; + DECLARE_HASHTABLE(inode_hash, INODE_HASH_BITS); + refcount_t ref_count; +}; + +static DEFINE_IDR(fs_idr); +static DEFINE_SPINLOCK(fs_lock); + +/* Cleancache backend memory pool */ +struct cleancache_pool { + struct list_head free_folios; + spinlock_t free_folios_lock; +}; + +#define CLEANCACHE_MAX_POOLS 64 + +static struct cleancache_pool pools[CLEANCACHE_MAX_POOLS]; +static atomic_t nr_pools = ATOMIC_INIT(0); +static DEFINE_SPINLOCK(pools_lock); + +/* + * If the filesystem uses exportable filehandles, use the filehandle as + * the key, else use the inode number. 
+ */ +struct cleancache_filekey *cleancache_get_key(struct inode *inode, + struct cleancache_filekey *key) +{ + int (*fhfn)(struct inode *inode, __u32 *fh, int *max_len, struct inode *parent); + int len = 0, maxlen = CLEANCACHE_KEY_MAX; + struct super_block *sb = inode->i_sb; + + key->u.ino = inode->i_ino; + if (sb->s_export_op != NULL) { + fhfn = sb->s_export_op->encode_fh; + if (fhfn) { + len = (*fhfn)(inode, &key->u.fh[0], &maxlen, NULL); + if (len <= FILEID_ROOT || len == FILEID_INVALID) + return NULL; + if (maxlen > CLEANCACHE_KEY_MAX) + return NULL; + } + } + return key; +} + +/* page attribute helpers */ +static inline void set_page_pool_id(struct page *page, int id) +{ + page->page_type = id; +} + +static inline int page_pool_id(struct page *page) +{ + return page->page_type; +} + +static inline struct cleancache_pool *page_pool(struct page *page) +{ + return &pools[page_pool_id(page)]; +} + +/* Can be used only when page is isolated */ +static inline void __SetPageCCacheFree(struct page *page) +{ + SetPagePrivate(page); +} + +static inline void SetPageCCacheFree(struct page *page) +{ + lockdep_assert_held(&(page_pool(page)->free_folios_lock)); + __SetPageCCacheFree(page); +} + +static inline void ClearPageCCacheFree(struct page *page) +{ + lockdep_assert_held(&(page_pool(page)->free_folios_lock)); + ClearPagePrivate(page); +} + +static inline int PageCCacheFree(struct page *page) +{ + lockdep_assert_held(&(page_pool(page)->free_folios_lock)); + return PagePrivate(page); +} + +/* Can be used only when page is isolated */ +static void __set_page_inode_offs(struct page *page, + struct cleancache_inode *inode, + unsigned long index) +{ + page->mapping = (struct address_space *)inode; + page->index = index; +} + +static void set_page_inode_offs(struct page *page, struct cleancache_inode *inode, + unsigned long index) +{ + lockdep_assert_held(&(page_pool(page)->free_folios_lock)); + + __set_page_inode_offs(page, inode, index); +} + +static void page_inode_offs(struct page *page, struct cleancache_inode **inode, + unsigned long *index) +{ + lockdep_assert_held(&(page_pool(page)->free_folios_lock)); + + *inode = (struct cleancache_inode *)page->mapping; + *index = page->index; +} + +/* page pool helpers */ +static void add_page_to_pool(struct page *page, struct cleancache_pool *pool) +{ + unsigned long flags; + + VM_BUG_ON(!list_empty(&page->lru)); + + spin_lock_irqsave(&pool->free_folios_lock, flags); + + set_page_inode_offs(page, NULL, 0); + SetPageCCacheFree(page); + list_add(&page_folio(page)->lru, &pool->free_folios); + + spin_unlock_irqrestore(&pool->free_folios_lock, flags); +} + +static struct page *remove_page_from_pool(struct page *page, struct cleancache_pool *pool) +{ + lockdep_assert_held(&pool->free_folios_lock); + VM_BUG_ON(page_pool(page) != pool); + + if (!PageCCacheFree(page)) + return NULL; + + list_del_init(&page->lru); + ClearPageCCacheFree(page); + + return page; +} + +static struct page *pick_page_from_pool(void) +{ + struct cleancache_pool *pool; + struct page *page = NULL; + unsigned long flags; + int count; + + count = atomic_read_acquire(&nr_pools); + for (int i = 0; i < count; i++) { + pool = &pools[i]; + spin_lock_irqsave(&pool->free_folios_lock, flags); + if (!list_empty(&pool->free_folios)) { + struct folio *folio; + + folio = list_last_entry(&pool->free_folios, + struct folio, lru); + page = &folio->page; + WARN_ON(!remove_page_from_pool(page, pool)); + spin_unlock_irqrestore(&pool->free_folios_lock, flags); + break; + } + 
spin_unlock_irqrestore(&pool->free_folios_lock, flags); + } + + return page; +} + +/* FS helpers */ +static struct cleancache_fs *get_fs(int fs_id) +{ + struct cleancache_fs *fs; + + rcu_read_lock(); + fs = idr_find(&fs_idr, fs_id); + if (fs && !refcount_inc_not_zero(&fs->ref_count)) + fs = NULL; + rcu_read_unlock(); + + return fs; +} + +static void put_fs(struct cleancache_fs *fs) +{ + if (refcount_dec_and_test(&fs->ref_count)) + kfree(fs); +} + +/* inode helpers */ +static struct cleancache_inode *alloc_inode(struct cleancache_fs *fs, + struct cleancache_filekey *key) +{ + struct cleancache_inode *inode; + + inode = kmem_cache_alloc(slab_inode, GFP_ATOMIC|__GFP_NOWARN); + if (inode) { + memcpy(&inode->key, key, sizeof(*key)); + xa_init_flags(&inode->pages, XA_FLAGS_LOCK_IRQ); + INIT_HLIST_NODE(&inode->hash); + inode->fs = fs; + refcount_set(&inode->ref_count, 1); + } + + return inode; +} + +static int erase_pages_from_inode(struct cleancache_inode *inode, + bool remove_inode); + +static void inode_free_rcu(struct rcu_head *rcu) +{ + struct cleancache_inode *inode; + + inode = container_of(rcu, struct cleancache_inode, rcu); + erase_pages_from_inode(inode, false); + kmem_cache_free(slab_inode, inode); +} + +static bool get_inode(struct cleancache_inode *inode) +{ + return refcount_inc_not_zero(&inode->ref_count); +} + +static bool put_inode(struct cleancache_inode *inode) +{ + if (!refcount_dec_and_test(&inode->ref_count)) + return false; + + call_rcu(&inode->rcu, inode_free_rcu); + return true; +} + +static void remove_inode_if_empty(struct cleancache_inode *inode) +{ + struct cleancache_fs *fs = inode->fs; + + lockdep_assert_held(&inode->pages.xa_lock); + + if (!xa_empty(&inode->pages)) + return; + + spin_lock(&fs->hash_lock); + if (!WARN_ON(hlist_unhashed(&inode->hash))) + hlist_del_init_rcu(&inode->hash); + spin_unlock(&fs->hash_lock); + /* Caller should have taken an extra refcount to keep inode valid */ + WARN_ON(put_inode(inode)); +} + +static int store_page_in_inode(struct cleancache_inode *inode, + unsigned long index, struct page *page) +{ + struct cleancache_pool *pool = page_pool(page); + unsigned long flags; + int err; + + lockdep_assert_held(&inode->pages.xa_lock); + VM_BUG_ON(!list_empty(&page->lru)); + + spin_lock_irqsave(&pool->free_folios_lock, flags); + + err = xa_err(__xa_store(&inode->pages, index, page, + GFP_ATOMIC|__GFP_NOWARN)); + if (!err) { + set_page_inode_offs(page, inode, index); + VM_BUG_ON_PAGE(PageCCacheFree(page), page); + } + + spin_unlock_irqrestore(&pool->free_folios_lock, flags); + + return err; +} + +static void erase_page_from_inode(struct cleancache_inode *inode, + unsigned long index, struct page *page) +{ + bool removed; + + lockdep_assert_held(&inode->pages.xa_lock); + + removed = __xa_erase(&inode->pages, index); + VM_BUG_ON(!removed || !list_empty(&page->lru)); + + remove_inode_if_empty(inode); +} + +static int erase_pages_from_inode(struct cleancache_inode *inode, bool remove_inode) +{ + XA_STATE(xas, &inode->pages, 0); + unsigned long flags; + struct page *page; + unsigned int ret = 0; + + xas_lock_irqsave(&xas, flags); + + if (!xa_empty(&inode->pages)) { + xas_for_each(&xas, page, ULONG_MAX) { + __xa_erase(&inode->pages, xas.xa_index); + add_page_to_pool(page, page_pool(page)); + ret++; + } + } + if (remove_inode) + remove_inode_if_empty(inode); + + xas_unlock_irqrestore(&xas, flags); + + return ret; +} + +static struct cleancache_inode *find_and_get_inode(struct cleancache_fs *fs, + struct cleancache_filekey *key) +{ + struct 
cleancache_inode *tmp, *inode = NULL; + + rcu_read_lock(); + hash_for_each_possible_rcu(fs->inode_hash, tmp, hash, key->u.ino) { + if (memcmp(&tmp->key, key, sizeof(*key))) + continue; + + /* TODO: should we stop if get fails? */ + if (get_inode(tmp)) { + inode = tmp; + break; + } + } + rcu_read_unlock(); + + return inode; +} + +static struct cleancache_inode *add_and_get_inode(struct cleancache_fs *fs, + struct cleancache_filekey *key) +{ + struct cleancache_inode *inode, *tmp; + unsigned long flags; + + inode = alloc_inode(fs, key); + if (!inode) + return ERR_PTR(-ENOMEM); + + spin_lock_irqsave(&fs->hash_lock, flags); + tmp = find_and_get_inode(fs, key); + if (tmp) { + spin_unlock_irqrestore(&fs->hash_lock, flags); + /* someone already added it */ + put_inode(inode); + put_inode(tmp); + return ERR_PTR(-EEXIST); + } + + hash_add_rcu(fs->inode_hash, &inode->hash, key->u.ino); + get_inode(inode); + spin_unlock_irqrestore(&fs->hash_lock, flags); + + return inode; +} + +/* + * We want to store only workingset pages in the cleancache to increase hit + * ratio so there are four cases: + * + * @page is workingset but cleancache doesn't have it: use new cleancache page + * @page is workingset and cleancache has it: overwrite the stale data + * @page is !workingset and cleancache doesn't have it: just bail out + * @page is !workingset and cleancache has it: remove the stale @page + */ +static bool store_into_inode(struct cleancache_fs *fs, + struct cleancache_filekey *key, + pgoff_t offset, struct page *page) +{ + bool workingset = PageWorkingset(page); + struct cleancache_inode *inode; + struct page *stored_page; + void *src, *dst; + bool ret = false; + +find_inode: + inode = find_and_get_inode(fs, key); + if (!inode) { + if (!workingset) + return false; + + inode = add_and_get_inode(fs, key); + if (IS_ERR_OR_NULL(inode)) { + /* + * Retry if someone just added new inode from under us. + */ + if (PTR_ERR(inode) == -EEXIST) + goto find_inode; + + return false; + } + } + + xa_lock(&inode->pages); + + stored_page = xa_load(&inode->pages, offset); + if (stored_page) { + if (!workingset) { + erase_page_from_inode(inode, offset, stored_page); + add_page_to_pool(stored_page, page_pool(stored_page)); + goto out_unlock; + } + } else { + if (!workingset) + goto out_unlock; + + stored_page = pick_page_from_pool(); + if (!stored_page) + goto out_unlock; + + if (store_page_in_inode(inode, offset, stored_page)) { + add_page_to_pool(stored_page, page_pool(stored_page)); + goto out_unlock; + } + } + + /* Copy the content of the page */ + src = kmap_local_page(page); + dst = kmap_local_page(stored_page); + memcpy(dst, src, PAGE_SIZE); + kunmap_local(dst); + kunmap_local(src); + + ret = true; +out_unlock: + /* + * Remove the inode if it was just created but we failed to add a page. 
+ */ + remove_inode_if_empty(inode); + xa_unlock(&inode->pages); + put_inode(inode); + + return ret; +} + +static bool load_from_inode(struct cleancache_fs *fs, + struct cleancache_filekey *key, + pgoff_t offset, struct page *page) +{ + struct cleancache_inode *inode; + struct page *stored_page; + void *src, *dst; + bool ret = false; + + inode = find_and_get_inode(fs, key); + if (!inode) + return false; + + xa_lock(&inode->pages); + + stored_page = xa_load(&inode->pages, offset); + if (stored_page) { + src = kmap_local_page(stored_page); + dst = kmap_local_page(page); + memcpy(dst, src, PAGE_SIZE); + kunmap_local(dst); + kunmap_local(src); + ret = true; + } + + xa_unlock(&inode->pages); + put_inode(inode); + + return ret; +} + +static bool invalidate_page(struct cleancache_fs *fs, + struct cleancache_filekey *key, pgoff_t offset) +{ + struct cleancache_inode *inode; + struct page *page; + + inode = find_and_get_inode(fs, key); + if (!inode) + return false; + + xa_lock(&inode->pages); + page = xa_load(&inode->pages, offset); + if (page) { + erase_page_from_inode(inode, offset, page); + add_page_to_pool(page, page_pool(page)); + } + xa_unlock(&inode->pages); + put_inode(inode); + + return page != NULL; +} + +static unsigned int invalidate_inode(struct cleancache_fs *fs, + struct cleancache_filekey *key) +{ + struct cleancache_inode *inode; + unsigned int ret; + + inode = find_and_get_inode(fs, key); + if (!inode) + return 0; + + ret = erase_pages_from_inode(inode, true); + put_inode(inode); + + return ret; +} + +/* Hooks into MM and FS */ +void cleancache_add_fs(struct super_block *sb) +{ + int fs_id; + struct cleancache_fs *fs; + + fs = kzalloc(sizeof(struct cleancache_fs), GFP_KERNEL); + if (!fs) + goto err; + + spin_lock_init(&fs->hash_lock); + hash_init(fs->inode_hash); + refcount_set(&fs->ref_count, 1); + + idr_preload(GFP_KERNEL); + spin_lock(&fs_lock); + fs_id = idr_alloc(&fs_idr, fs, 0, 0, GFP_NOWAIT); + spin_unlock(&fs_lock); + idr_preload_end(); + + if (fs_id < 0) { + pr_warn("too many file systems\n"); + goto err_free; + } + + sb->cleancache_id = fs_id; + return; + +err_free: + kfree(fs); +err: + sb->cleancache_id = CLEANCACHE_ID_INVALID; +} + +void cleancache_remove_fs(struct super_block *sb) +{ + int fs_id = sb->cleancache_id; + struct cleancache_inode *inode; + struct cleancache_fs *fs; + struct hlist_node *tmp; + int cursor; + + sb->cleancache_id = CLEANCACHE_ID_INVALID; + fs = get_fs(fs_id); + if (!fs) + return; + + /* + * No need to hold any lock here since this function is called when + * fs is unmounted. IOW, inode insert/delete race cannot happen. 
+ */ + hash_for_each_safe(fs->inode_hash, cursor, tmp, inode, hash) + cleancache_invalidates += invalidate_inode(fs, &inode->key); + synchronize_rcu(); + +#ifdef CONFIG_DEBUG_VM + for (int i = 0; i < HASH_SIZE(fs->inode_hash); i++) + VM_BUG_ON(!hlist_empty(&fs->inode_hash[i])); +#endif + spin_lock(&fs_lock); + idr_remove(&fs_idr, fs_id); + spin_unlock(&fs_lock); + put_fs(fs); + pr_info("removed file system %d\n", fs_id); + + /* free the object */ + put_fs(fs); +} + +/* + * WARNING: This cleancache function might be called with disabled irqs + */ +void cleancache_store_folio(struct folio *folio, + struct cleancache_filekey *key) +{ + struct cleancache_fs *fs; + int fs_id; + + VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio); + + if (!key) + return; + + /* Do not support large folios yet */ + if (folio_test_large(folio)) + return; + + fs_id = folio->mapping->host->i_sb->cleancache_id; + if (fs_id == CLEANCACHE_ID_INVALID) + return; + + fs = get_fs(fs_id); + if (!fs) + return; + + if (store_into_inode(fs, key, folio->index, &folio->page)) + cleancache_stores++; + else + cleancache_failed_stores++; + put_fs(fs); +} + +bool cleancache_restore_folio(struct folio *folio, + struct cleancache_filekey *key) +{ + struct cleancache_fs *fs; + int fs_id; + bool ret; + + if (!key) + return false; + + /* Do not support large folios yet */ + if (folio_test_large(folio)) + return false; + + fs_id = folio->mapping->host->i_sb->cleancache_id; + if (fs_id == CLEANCACHE_ID_INVALID) + return false; + + fs = get_fs(fs_id); + if (!fs) + return false; + + ret = load_from_inode(fs, key, folio->index, &folio->page); + if (ret) + cleancache_hits++; + else + cleancache_misses++; + put_fs(fs); + + return ret; +} + +/* + * WARNING: This cleancache function might be called with disabled irqs + */ +void cleancache_invalidate_folio(struct address_space *mapping, + struct folio *folio, + struct cleancache_filekey *key) +{ + struct cleancache_fs *fs; + int fs_id; + + VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio); + + if (!key) + return; + + /* Do not support large folios yet */ + if (folio_test_large(folio)) + return; + + /* Careful, folio->mapping can be NULL */ + fs_id = mapping->host->i_sb->cleancache_id; + if (fs_id == CLEANCACHE_ID_INVALID) + return; + + fs = get_fs(fs_id); + if (!fs) + return; + + if (invalidate_page(fs, key, folio->index)) + cleancache_invalidates++; + put_fs(fs); +} + +void cleancache_invalidate_inode(struct address_space *mapping, + struct cleancache_filekey *key) +{ + struct cleancache_fs *fs; + int fs_id; + + if (!key) + return; + + fs_id = mapping->host->i_sb->cleancache_id; + if (fs_id == CLEANCACHE_ID_INVALID) + return; + + fs = get_fs(fs_id); + if (!fs) + return; + + cleancache_invalidates += invalidate_inode(fs, key); + put_fs(fs); +} + +/* Backend API */ +/* + * Register a new backend and add its pages for cleancache to use. + * Returns pool id on success or a negative error code on failure. 
+ */ +int cleancache_register_backend(const char *name, struct list_head *folios) +{ + struct cleancache_pool *pool; + unsigned long pool_size = 0; + unsigned long flags; + struct folio *folio; + int pool_id; + + /* pools_lock prevents concurrent registrations */ + spin_lock(&pools_lock); + + pool_id = atomic_read(&nr_pools); + if (pool_id >= CLEANCACHE_MAX_POOLS) { + spin_unlock(&pools_lock); + return -ENOMEM; + } + + pool = &pools[pool_id]; + INIT_LIST_HEAD(&pool->free_folios); + spin_lock_init(&pool->free_folios_lock); + /* Ensure above stores complete before we increase the count */ + atomic_set_release(&nr_pools, pool_id + 1); + + spin_unlock(&pools_lock); + + list_for_each_entry(folio, folios, lru) { + struct page *page; + + /* Do not support large folios yet */ + VM_BUG_ON_FOLIO(folio_test_large(folio), folio); + VM_BUG_ON_FOLIO(folio_ref_count(folio) != 1, folio); + page = &folio->page; + set_page_pool_id(page, pool_id); + __set_page_inode_offs(page, NULL, 0); + __SetPageCCacheFree(page); + pool_size++; + } + + spin_lock_irqsave(&pool->free_folios_lock, flags); + list_splice_init(folios, &pool->free_folios); + spin_unlock_irqrestore(&pool->free_folios_lock, flags); + + pr_info("Registered \'%s\' cleancache backend, pool id %d, size %lu pages\n", + name ? : "none", pool_id, pool_size); + + return pool_id; +} +EXPORT_SYMBOL(cleancache_register_backend); + +int cleancache_backend_get_folio(int pool_id, struct folio *folio) +{ + struct cleancache_inode *inode; + struct cleancache_pool *pool; + unsigned long flags; + unsigned long index; + struct page *page; + + + /* Do not support large folios yet */ + if (folio_test_large(folio)) + return -EOPNOTSUPP; + + page = &folio->page; + /* Does the page belong to the requesting backend */ + if (page_pool_id(page) != pool_id) + return -EINVAL; + + pool = &pools[pool_id]; +again: + spin_lock_irqsave(&pool->free_folios_lock, flags); + + /* If page is free inside the pool, return it */ + if (remove_page_from_pool(page, pool)) { + spin_unlock_irqrestore(&pool->free_folios_lock, flags); + return 0; + } + + /* + * The page is not free, therefore it has to belong to a valid inode. + * Operations on CCacheFree and page->mapping are done under + * free_folios_lock which we are currently holding and CCacheFree + * always gets cleared before page->mapping is set. + */ + page_inode_offs(page, &inode, &index); + if (WARN_ON(!inode || !get_inode(inode))) { + spin_unlock_irqrestore(&pool->free_folios_lock, flags); + return -EINVAL; + } + + spin_unlock_irqrestore(&pool->free_folios_lock, flags); + + xa_lock_irqsave(&inode->pages, flags); + /* + * Retry if the page got erased from the inode but was not added into + * the pool yet. erase_page_from_inode() and add_page_to_pool() happens + * under inode->pages.xa_lock which we are holding, therefore by now + * both operations should have completed. Let's retry. 
+ */ + if (xa_load(&inode->pages, index) != page) { + xa_unlock_irqrestore(&inode->pages, flags); + put_inode(inode); + goto again; + } + + erase_page_from_inode(inode, index, page); + + spin_lock(&pool->free_folios_lock); + set_page_inode_offs(page, NULL, 0); + spin_unlock(&pool->free_folios_lock); + + xa_unlock_irqrestore(&inode->pages, flags); + + put_inode(inode); + + return 0; +} +EXPORT_SYMBOL(cleancache_backend_get_folio); + +int cleancache_backend_put_folio(int pool_id, struct folio *folio) +{ + struct cleancache_pool *pool = &pools[pool_id]; + struct page *page; + + /* Do not support large folios yet */ + if (folio_test_large(folio)) + return -EOPNOTSUPP; + + page = &folio->page; + VM_BUG_ON_PAGE(page_ref_count(page) != 1, page); + VM_BUG_ON(!list_empty(&page->lru)); + /* Reset struct page fields */ + set_page_pool_id(page, pool_id); + INIT_LIST_HEAD(&page->lru); + add_page_to_pool(page, pool); + + return 0; +} +EXPORT_SYMBOL(cleancache_backend_put_folio); + +static int __init init_cleancache(void) +{ + slab_inode = KMEM_CACHE(cleancache_inode, 0); + if (!slab_inode) + return -ENOMEM; + + return 0; +} +core_initcall(init_cleancache); + +#ifdef CONFIG_DEBUG_FS +static int __init cleancache_debugfs_init(void) +{ + struct dentry *root; + + root = debugfs_create_dir("cleancache", NULL); + debugfs_create_u64("hits", 0444, root, &cleancache_hits); + debugfs_create_u64("misses", 0444, root, &cleancache_misses); + debugfs_create_u64("stores", 0444, root, &cleancache_stores); + debugfs_create_u64("failed_stores", 0444, root, &cleancache_failed_stores); + debugfs_create_u64("invalidates", 0444, root, &cleancache_invalidates); + + return 0; +} +late_initcall(cleancache_debugfs_init); +#endif diff --git a/mm/filemap.c b/mm/filemap.c index cc69f174f76b..51dd86d7031f 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -36,6 +36,7 @@ #include #include #include +#include #include #include #include @@ -147,10 +148,20 @@ static void page_cache_delete(struct address_space *mapping, } static void filemap_unaccount_folio(struct address_space *mapping, - struct folio *folio) + struct folio *folio, struct cleancache_filekey *key) { long nr; + /* + * if we're uptodate, flush out into the cleancache, otherwise + * invalidate any existing cleancache entries. We can't leave + * stale data around in the cleancache once our page is gone + */ + if (folio_test_uptodate(folio) && folio_test_mappedtodisk(folio)) + cleancache_store_folio(folio, key); + else + cleancache_invalidate_folio(mapping, folio, key); + VM_BUG_ON_FOLIO(folio_mapped(folio), folio); if (!IS_ENABLED(CONFIG_DEBUG_VM) && unlikely(folio_mapped(folio))) { pr_alert("BUG: Bad page cache in process %s pfn:%05lx\n", @@ -210,6 +221,16 @@ static void filemap_unaccount_folio(struct address_space *mapping, folio_account_cleaned(folio, inode_to_wb(mapping->host)); } +static void ___filemap_remove_folio(struct folio *folio, void *shadow, + struct cleancache_filekey *key) +{ + struct address_space *mapping = folio->mapping; + + trace_mm_filemap_delete_from_page_cache(folio); + filemap_unaccount_folio(mapping, folio, key); + page_cache_delete(mapping, folio, shadow); +} + /* * Delete a page from the page cache and free it. 
Caller has to make * sure the page is locked and that nobody else uses it - or that usage @@ -217,11 +238,7 @@ static void filemap_unaccount_folio(struct address_space *mapping, */ void __filemap_remove_folio(struct folio *folio, void *shadow) { - struct address_space *mapping = folio->mapping; - - trace_mm_filemap_delete_from_page_cache(folio); - filemap_unaccount_folio(mapping, folio); - page_cache_delete(mapping, folio, shadow); + ___filemap_remove_folio(folio, shadow, NULL); } void filemap_free_folio(struct address_space *mapping, struct folio *folio) @@ -246,11 +263,20 @@ void filemap_free_folio(struct address_space *mapping, struct folio *folio) void filemap_remove_folio(struct folio *folio) { struct address_space *mapping = folio->mapping; + struct cleancache_filekey *pkey; + struct cleancache_filekey key; BUG_ON(!folio_test_locked(folio)); + + /* + * cleancache_get_key() uses sb->s_export_op->encode_fh which can + * also take inode->i_lock. Get the key before taking inode->i_lock. + */ + pkey = cleancache_get_key(mapping->host, &key); + spin_lock(&mapping->host->i_lock); xa_lock_irq(&mapping->i_pages); - __filemap_remove_folio(folio, NULL); + ___filemap_remove_folio(folio, NULL, pkey); xa_unlock_irq(&mapping->i_pages); if (mapping_shrinkable(mapping)) inode_add_lru(mapping->host); @@ -316,18 +342,26 @@ static void page_cache_delete_batch(struct address_space *mapping, void delete_from_page_cache_batch(struct address_space *mapping, struct folio_batch *fbatch) { + struct cleancache_filekey *pkey; + struct cleancache_filekey key; int i; if (!folio_batch_count(fbatch)) return; + /* + * cleancache_get_key() uses sb->s_export_op->encode_fh which can + * also take inode->i_lock. Get the key before taking inode->i_lock. + */ + pkey = cleancache_get_key(mapping->host, &key); + spin_lock(&mapping->host->i_lock); xa_lock_irq(&mapping->i_pages); for (i = 0; i < folio_batch_count(fbatch); i++) { struct folio *folio = fbatch->folios[i]; trace_mm_filemap_delete_from_page_cache(folio); - filemap_unaccount_folio(mapping, folio); + filemap_unaccount_folio(mapping, folio, pkey); } page_cache_delete_batch(mapping, fbatch); xa_unlock_irq(&mapping->i_pages); @@ -1865,6 +1899,13 @@ void *filemap_get_entry(struct address_space *mapping, pgoff_t index) out: rcu_read_unlock(); + if (folio && !folio_test_uptodate(folio)) { + struct cleancache_filekey key; + + if (cleancache_restore_folio(folio, cleancache_get_key(mapping->host, &key))) + folio_mark_uptodate(folio); + } + return folio; } @@ -2430,6 +2471,7 @@ static int filemap_update_page(struct kiocb *iocb, struct address_space *mapping, size_t count, struct folio *folio, bool need_uptodate) { + struct cleancache_filekey key; int error; if (iocb->ki_flags & IOCB_NOWAIT) { @@ -2466,6 +2508,11 @@ static int filemap_update_page(struct kiocb *iocb, need_uptodate)) goto unlock; + if (cleancache_restore_folio(folio, + cleancache_get_key(folio->mapping->host, &key))) { + folio_mark_uptodate(folio); + goto unlock; + } error = -EAGAIN; if (iocb->ki_flags & (IOCB_NOIO | IOCB_NOWAIT | IOCB_WAITQ)) goto unlock; diff --git a/mm/truncate.c b/mm/truncate.c index 5d98054094d1..6a981c2e57ca 100644 --- a/mm/truncate.c +++ b/mm/truncate.c @@ -20,6 +20,7 @@ #include #include #include +#include #include #include "internal.h" @@ -190,6 +191,7 @@ int truncate_inode_folio(struct address_space *mapping, struct folio *folio) */ bool truncate_inode_partial_folio(struct folio *folio, loff_t start, loff_t end) { + struct cleancache_filekey key; loff_t pos = folio_pos(folio); unsigned 
int offset, length; struct page *split_at, *split_at2; @@ -218,6 +220,8 @@ bool truncate_inode_partial_folio(struct folio *folio, loff_t start, loff_t end) if (!mapping_inaccessible(folio->mapping)) folio_zero_range(folio, offset, length); + cleancache_invalidate_folio(folio->mapping, folio, + cleancache_get_key(folio->mapping->host, &key)); if (folio_needs_release(folio)) folio_invalidate(folio, offset, length); if (!folio_test_large(folio)) @@ -337,6 +341,7 @@ long mapping_evict_folio(struct address_space *mapping, struct folio *folio) void truncate_inode_pages_range(struct address_space *mapping, loff_t lstart, loff_t lend) { + struct cleancache_filekey key; pgoff_t start; /* inclusive */ pgoff_t end; /* exclusive */ struct folio_batch fbatch; @@ -347,7 +352,7 @@ void truncate_inode_pages_range(struct address_space *mapping, bool same_folio; if (mapping_empty(mapping)) - return; + goto out; /* * 'start' and 'end' always covers the range of pages to be fully @@ -435,6 +440,10 @@ void truncate_inode_pages_range(struct address_space *mapping, truncate_folio_batch_exceptionals(mapping, &fbatch, indices); folio_batch_release(&fbatch); } + +out: + cleancache_invalidate_inode(mapping, + cleancache_get_key(mapping->host, &key)); } EXPORT_SYMBOL(truncate_inode_pages_range); @@ -488,6 +497,10 @@ void truncate_inode_pages_final(struct address_space *mapping) xa_unlock_irq(&mapping->i_pages); } + /* + * Cleancache needs notification even if there are no pages or shadow + * entries. + */ truncate_inode_pages(mapping, 0); } EXPORT_SYMBOL(truncate_inode_pages_final); @@ -643,6 +656,7 @@ int folio_unmap_invalidate(struct address_space *mapping, struct folio *folio, int invalidate_inode_pages2_range(struct address_space *mapping, pgoff_t start, pgoff_t end) { + struct cleancache_filekey key; pgoff_t indices[PAGEVEC_SIZE]; struct folio_batch fbatch; pgoff_t index; @@ -652,7 +666,7 @@ int invalidate_inode_pages2_range(struct address_space *mapping, int did_range_unmap = 0; if (mapping_empty(mapping)) - return 0; + goto out; folio_batch_init(&fbatch); index = start; @@ -713,6 +727,9 @@ int invalidate_inode_pages2_range(struct address_space *mapping, if (dax_mapping(mapping)) { unmap_mapping_pages(mapping, start, end - start + 1, false); } +out: + cleancache_invalidate_inode(mapping, + cleancache_get_key(mapping->host, &key)); return ret; } EXPORT_SYMBOL_GPL(invalidate_inode_pages2_range); From patchwork Thu Mar 20 17:39:30 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Suren Baghdasaryan X-Patchwork-Id: 14024211 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9FA0FC36002 for ; Thu, 20 Mar 2025 17:39:43 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id EDD75280007; Thu, 20 Mar 2025 13:39:40 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id E89EA280006; Thu, 20 Mar 2025 13:39:40 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id CDF8A280007; Thu, 20 Mar 2025 13:39:40 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0016.hostedemail.com [216.40.44.16]) by kanga.kvack.org (Postfix) with ESMTP id AD6FD280006 for ; Thu, 20 Mar 2025 13:39:40 -0400 (EDT) Received: from smtpin30.hostedemail.com (a10.router.float.18 [10.200.18.1]) by 
Date: Thu, 20 Mar 2025 10:39:30 -0700
In-Reply-To: <20250320173931.1583800-1-surenb@google.com>
References: <20250320173931.1583800-1-surenb@google.com>
Message-ID: <20250320173931.1583800-3-surenb@google.com>
Subject: [RFC 2/3] mm: introduce GCMA
From: Suren Baghdasaryan
To: akpm@linux-foundation.org
Cc: willy@infradead.org, david@redhat.com, vbabka@suse.cz, lorenzo.stoakes@oracle.com, liam.howlett@oracle.com, alexandru.elisei@arm.com, peterx@redhat.com, hannes@cmpxchg.org, mhocko@kernel.org, m.szyprowski@samsung.com, iamjoonsoo.kim@lge.com, mina86@mina86.com, axboe@kernel.dk, viro@zeniv.linux.org.uk, brauner@kernel.org, hch@infradead.org, jack@suse.cz, hbathini@linux.ibm.com, sourabhjain@linux.ibm.com, ritesh.list@gmail.com, aneesh.kumar@kernel.org, bhelgaas@google.com, sj@kernel.org, fvdl@google.com, ziy@nvidia.com, yuzhao@google.com, minchan@kernel.org, surenb@google.com, linux-mm@kvack.org, linuxppc-dev@lists.ozlabs.org, linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org, iommu@lists.linux.dev, linux-kernel@vger.kernel.org, Minchan Kim
From: Minchan Kim

This patch introduces GCMA (Guaranteed Contiguous Memory Allocator), a
cleancache backend which reserves some amount of memory at boot and then
donates it to store clean file-backed pages in the cleancache. GCMA aims
to guarantee contiguous memory allocation success as well as low and
deterministic allocation latency.

Notes: Originally, the idea was posted by SeongJae Park and Minchan Kim
[1]. Later Minchan reworked it to be used in Android as a reference for
Android vendors to use [2].

[1] https://lwn.net/Articles/619865/
[2] https://android-review.googlesource.com/q/topic:%22gcma_6.12%22

Signed-off-by: Minchan Kim Signed-off-by: Suren Baghdasaryan --- include/linux/gcma.h | 12 ++++ mm/Kconfig | 15 +++++ mm/Makefile | 1 + mm/gcma.c | 155 +++++++++++++++++++++++++++++++++++++++++++ 4 files changed, 183 insertions(+) create mode 100644 include/linux/gcma.h create mode 100644 mm/gcma.c diff --git a/include/linux/gcma.h b/include/linux/gcma.h new file mode 100644 index 000000000000..2ce40fcc74a5 --- /dev/null +++ b/include/linux/gcma.h @@ -0,0 +1,12 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef __GCMA_H__ +#define __GCMA_H__ + +#include + +int gcma_register_area(const char *name, + unsigned long start_pfn, unsigned long count); +void gcma_alloc_range(unsigned long start_pfn, unsigned long count); +void gcma_free_range(unsigned long start_pfn, unsigned long count); + +#endif /* __GCMA_H__ */ diff --git a/mm/Kconfig b/mm/Kconfig index d6ebf0fb0432..85268ef901b6 100644 --- a/mm/Kconfig +++ b/mm/Kconfig @@ -1002,6 +1002,21 @@ config CMA_AREAS If unsure, leave the default value "8" in UMA and "20" in NUMA. +config GCMA + bool "GCMA (Guaranteed Contiguous Memory Allocator)" + depends on CLEANCACHE + help + This enables the Guaranteed Contiguous Memory Allocator to allow + low latency guaranteed contiguous memory allocations. Memory + reserved by GCMA is donated to cleancache to be used as pagecache + extension. Once GCMA allocation is requested, necessary pages are + taken back from the cleancache and used to satisfy the request. + Cleancache guarantees low latency successful allocation as long + as the total size of GCMA allocations does not exceed the size of + the memory donated to the cleancache. + + If unsure, say "N".
+
 config MEM_SOFT_DIRTY
 	bool "Track memory changes"
 	depends on CHECKPOINT_RESTORE && HAVE_ARCH_SOFT_DIRTY && PROC_FS
diff --git a/mm/Makefile b/mm/Makefile
index 084dbe9edbc4..2173d395d371 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -149,3 +149,4 @@ obj-$(CONFIG_EXECMEM) += execmem.o
 obj-$(CONFIG_TMPFS_QUOTA) += shmem_quota.o
 obj-$(CONFIG_PT_RECLAIM) += pt_reclaim.o
 obj-$(CONFIG_CLEANCACHE) += cleancache.o
+obj-$(CONFIG_GCMA) += gcma.o
diff --git a/mm/gcma.c b/mm/gcma.c
new file mode 100644
index 000000000000..263e63da0c89
--- /dev/null
+++ b/mm/gcma.c
@@ -0,0 +1,155 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * GCMA (Guaranteed Contiguous Memory Allocator)
+ *
+ */
+
+#define pr_fmt(fmt) "gcma: " fmt
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#define MAX_GCMA_AREAS		64
+#define GCMA_AREA_NAME_MAX_LEN	32
+
+struct gcma_area {
+	int area_id;
+	unsigned long start_pfn;
+	unsigned long end_pfn;
+	char name[GCMA_AREA_NAME_MAX_LEN];
+};
+
+static struct gcma_area areas[MAX_GCMA_AREAS];
+static atomic_t nr_gcma_area = ATOMIC_INIT(0);
+static DEFINE_SPINLOCK(gcma_area_lock);
+
+static void alloc_page_range(struct gcma_area *area,
+			     unsigned long start_pfn, unsigned long end_pfn)
+{
+	unsigned long scanned = 0;
+	unsigned long pfn;
+	struct page *page;
+	int err;
+
+	for (pfn = start_pfn; pfn < end_pfn; pfn++) {
+		if (!(++scanned % XA_CHECK_SCHED))
+			cond_resched();
+
+		page = pfn_to_page(pfn);
+		err = cleancache_backend_get_folio(area->area_id, page_folio(page));
+		VM_BUG_ON(err);
+	}
+}
+
+static void free_page_range(struct gcma_area *area,
+			    unsigned long start_pfn, unsigned long end_pfn)
+{
+	unsigned long scanned = 0;
+	unsigned long pfn;
+	struct page *page;
+	int err;
+
+	for (pfn = start_pfn; pfn < end_pfn; pfn++) {
+		if (!(++scanned % XA_CHECK_SCHED))
+			cond_resched();
+
+		page = pfn_to_page(pfn);
+		err = cleancache_backend_put_folio(area->area_id,
+						   page_folio(page));
+		VM_BUG_ON(err);
+	}
+}
+
+int gcma_register_area(const char *name,
+		       unsigned long start_pfn, unsigned long count)
+{
+	LIST_HEAD(folios);
+	int i, area_id;
+	int nr_area;
+	int ret = 0;
+
+	for (i = 0; i < count; i++) {
+		struct folio *folio;
+
+		folio = page_folio(pfn_to_page(start_pfn + i));
+		list_add(&folio->lru, &folios);
+	}
+
+	area_id = cleancache_register_backend(name, &folios);
+	if (area_id < 0)
+		return area_id;
+
+	spin_lock(&gcma_area_lock);
+
+	nr_area = atomic_read(&nr_gcma_area);
+	if (nr_area < MAX_GCMA_AREAS) {
+		struct gcma_area *area = &areas[nr_area];
+
+		area->area_id = area_id;
+		area->start_pfn = start_pfn;
+		area->end_pfn = start_pfn + count;
+		strscpy(area->name, name);
+		/* Ensure above stores complete before we increase the count */
+		atomic_set_release(&nr_gcma_area, nr_area + 1);
+	} else {
+		ret = -ENOMEM;
+	}
+
+	spin_unlock(&gcma_area_lock);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(gcma_register_area);
+
+void gcma_alloc_range(unsigned long start_pfn, unsigned long count)
+{
+	int nr_area = atomic_read_acquire(&nr_gcma_area);
+	unsigned long end_pfn = start_pfn + count;
+	struct gcma_area *area;
+	int i;
+
+	for (i = 0; i < nr_area; i++) {
+		unsigned long s_pfn, e_pfn;
+
+		area = &areas[i];
+		if (area->end_pfn <= start_pfn)
+			continue;
+
+		if (area->start_pfn > end_pfn)
+			continue;
+
+		s_pfn = max(start_pfn, area->start_pfn);
+		e_pfn = min(end_pfn, area->end_pfn);
+		alloc_page_range(area, s_pfn, e_pfn);
+	}
+}
+EXPORT_SYMBOL_GPL(gcma_alloc_range);
+
+void gcma_free_range(unsigned long start_pfn, unsigned long count)
+{
+	int nr_area = atomic_read_acquire(&nr_gcma_area);
+	unsigned long end_pfn = start_pfn + count;
+	struct gcma_area *area;
+	int i;
+
+	for (i = 0; i < nr_area; i++) {
+		unsigned long s_pfn, e_pfn;
+
+		area = &areas[i];
+		if (area->end_pfn <= start_pfn)
+			continue;
+
+		if (area->start_pfn > end_pfn)
+			continue;
+
+		s_pfn = max(start_pfn, area->start_pfn);
+		e_pfn = min(end_pfn, area->end_pfn);
+		free_page_range(area, s_pfn, e_pfn);
+	}
+}
+EXPORT_SYMBOL_GPL(gcma_free_range);
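
[Editor's note] As a usage illustration only: the in-tree caller of this API
arrives in the next patch through CMA, but a hypothetical boot-time user could
donate a reserved physical range to GCMA and later claim part of it back
roughly as sketched below. The pool name, sizes and the memblock-based
reservation are invented for the example; only the three gcma_* calls come
from this patch.

/* Hypothetical example, not part of this patch. */
#include <linux/gcma.h>
#include <linux/memblock.h>
#include <linux/mm.h>
#include <linux/pfn.h>
#include <linux/sizes.h>

static unsigned long my_pool_pfn;

/* Called early during boot, e.g. from arch setup code (illustrative). */
static int __init my_gcma_pool_init(void)
{
	phys_addr_t base = memblock_phys_alloc(SZ_64M, PAGE_SIZE);

	if (!base)
		return -ENOMEM;

	my_pool_pfn = PHYS_PFN(base);
	/* Donated pages back the cleancache until someone claims them. */
	return gcma_register_area("my_pool", my_pool_pfn, SZ_64M >> PAGE_SHIFT);
}

/* Claim the first 4 MiB back as a guaranteed contiguous buffer... */
static void my_pool_grab_buffer(void)
{
	gcma_alloc_range(my_pool_pfn, SZ_4M >> PAGE_SHIFT);
}

/* ...and return it to the cleancache when no longer needed. */
static void my_pool_release_buffer(void)
{
	gcma_free_range(my_pool_pfn, SZ_4M >> PAGE_SHIFT);
}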

From patchwork Thu Mar 20 17:39:31 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Suren Baghdasaryan
X-Patchwork-Id: 14024212
Date: Thu, 20 Mar 2025 10:39:31 -0700
In-Reply-To: <20250320173931.1583800-1-surenb@google.com>
Mime-Version: 1.0
References: <20250320173931.1583800-1-surenb@google.com>
X-Mailer: git-send-email 2.49.0.395.g12beb8f557-goog
Message-ID: <20250320173931.1583800-4-surenb@google.com>
Subject: [RFC 3/3] mm: integrate GCMA with CMA using dt-bindings
From: Suren Baghdasaryan
To: akpm@linux-foundation.org
Cc: willy@infradead.org, david@redhat.com, vbabka@suse.cz,
 lorenzo.stoakes@oracle.com, liam.howlett@oracle.com, alexandru.elisei@arm.com,
 peterx@redhat.com, hannes@cmpxchg.org, mhocko@kernel.org,
 m.szyprowski@samsung.com, iamjoonsoo.kim@lge.com, mina86@mina86.com,
 axboe@kernel.dk, viro@zeniv.linux.org.uk, brauner@kernel.org,
 hch@infradead.org, jack@suse.cz, hbathini@linux.ibm.com,
 sourabhjain@linux.ibm.com, ritesh.list@gmail.com, aneesh.kumar@kernel.org,
 bhelgaas@google.com, sj@kernel.org, fvdl@google.com, ziy@nvidia.com,
 yuzhao@google.com, minchan@kernel.org, surenb@google.com,
 linux-mm@kvack.org, linuxppc-dev@lists.ozlabs.org, linux-block@vger.kernel.org,
 linux-fsdevel@vger.kernel.org, iommu@lists.linux.dev,
 linux-kernel@vger.kernel.org, Minchan Kim

This patch introduces a new "guarantee" property for shared-dma-pool. With
this property, an admin can create a specific memory pool as GCMA-based CMA
if they care about allocation success rate and latency. The downside of GCMA
is that it can host only clean file-backed pages, since it uses cleancache as
its secondary user.
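
[Editor's note] For illustration, a reserved-memory node that opts into GCMA
might look roughly like the sketch below. The node name, addresses and size
are made up for the example; the "shared-dma-pool" compatible and the new
"guarantee" flag are what this patch acts on, and "reusable" is the usual
requirement for CMA-backed pools:

	reserved-memory {
		#address-cells = <2>;
		#size-cells = <2>;
		ranges;

		gcma_pool: linux,gcma@90000000 {
			compatible = "shared-dma-pool";
			reusable;
			reg = <0x0 0x90000000 0x0 0x10000000>;	/* 256 MiB, example only */
			guarantee;
		};
	};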

Signed-off-by: Minchan Kim
Signed-off-by: Suren Baghdasaryan
---
 arch/powerpc/kernel/fadump.c |  2 +-
 include/linux/cma.h          |  2 +-
 kernel/dma/contiguous.c      | 11 ++++++++++-
 mm/cma.c                     | 33 ++++++++++++++++++++++++++-------
 mm/cma.h                     |  1 +
 mm/cma_sysfs.c               | 10 ++++++++++
 6 files changed, 49 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
index 4b371c738213..4eb7be0cdcdb 100644
--- a/arch/powerpc/kernel/fadump.c
+++ b/arch/powerpc/kernel/fadump.c
@@ -111,7 +111,7 @@ void __init fadump_cma_init(void)
 		return;
 	}
 
-	rc = cma_init_reserved_mem(base, size, 0, "fadump_cma", &fadump_cma);
+	rc = cma_init_reserved_mem(base, size, 0, "fadump_cma", &fadump_cma, false);
 	if (rc) {
 		pr_err("Failed to init cma area for firmware-assisted dump,%d\n", rc);
 		/*
diff --git a/include/linux/cma.h b/include/linux/cma.h
index 62d9c1cf6326..3207db979e94 100644
--- a/include/linux/cma.h
+++ b/include/linux/cma.h
@@ -46,7 +46,7 @@ extern int __init cma_declare_contiguous_multi(phys_addr_t size,
 extern int cma_init_reserved_mem(phys_addr_t base, phys_addr_t size,
 					unsigned int order_per_bit,
 					const char *name,
-					struct cma **res_cma);
+					struct cma **res_cma, bool gcma);
 extern struct page *cma_alloc(struct cma *cma, unsigned long count, unsigned int align,
 			      bool no_warn);
 extern bool cma_pages_valid(struct cma *cma, const struct page *pages, unsigned long count);
diff --git a/kernel/dma/contiguous.c b/kernel/dma/contiguous.c
index 055da410ac71..a68b3123438c 100644
--- a/kernel/dma/contiguous.c
+++ b/kernel/dma/contiguous.c
@@ -459,6 +459,7 @@ static int __init rmem_cma_setup(struct reserved_mem *rmem)
 	unsigned long node = rmem->fdt_node;
 	bool default_cma = of_get_flat_dt_prop(node, "linux,cma-default", NULL);
 	struct cma *cma;
+	bool gcma;
 	int err;
 
 	if (size_cmdline != -1 && default_cma) {
@@ -476,7 +477,15 @@ static int __init rmem_cma_setup(struct reserved_mem *rmem)
 		return -EINVAL;
 	}
 
-	err = cma_init_reserved_mem(rmem->base, rmem->size, 0, rmem->name, &cma);
+	gcma = !!of_get_flat_dt_prop(node, "guarantee", NULL);
+#ifndef CONFIG_GCMA
+	if (gcma) {
+		pr_err("Reserved memory: unable to setup GCMA region, GCMA is not enabled\n");
+		return -EINVAL;
+	}
+#endif
+	err = cma_init_reserved_mem(rmem->base, rmem->size, 0, rmem->name,
+				    &cma, gcma);
 	if (err) {
 		pr_err("Reserved memory: unable to setup CMA region\n");
 		return err;
diff --git a/mm/cma.c b/mm/cma.c
index b06d5fe73399..f12cef849e58 100644
--- a/mm/cma.c
+++ b/mm/cma.c
@@ -26,6 +26,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
@@ -165,11 +166,18 @@ static void __init cma_activate_area(struct cma *cma)
 			count = cmr->early_pfn - cmr->base_pfn;
 			bitmap_count = cma_bitmap_pages_to_bits(cma, count);
 			bitmap_set(cmr->bitmap, 0, bitmap_count);
+		} else {
+			count = 0;
 		}
 
-		for (pfn = cmr->early_pfn; pfn < cmr->base_pfn + cmr->count;
-		     pfn += pageblock_nr_pages)
-			init_cma_reserved_pageblock(pfn_to_page(pfn));
+		if (cma->gcma) {
+			gcma_register_area(cma->name, cmr->early_pfn,
+					   cma->count - count);
+		} else {
+			for (pfn = cmr->early_pfn; pfn < cmr->base_pfn + cmr->count;
+			     pfn += pageblock_nr_pages)
+				init_cma_reserved_pageblock(pfn_to_page(pfn));
+		}
 	}
 
 	spin_lock_init(&cma->lock);
@@ -270,7 +278,7 @@ static void __init cma_drop_area(struct cma *cma)
 int __init cma_init_reserved_mem(phys_addr_t base, phys_addr_t size,
 				 unsigned int order_per_bit,
 				 const char *name,
-				 struct cma **res_cma)
+				 struct cma **res_cma, bool gcma)
 {
 	struct cma *cma;
 	int ret;
@@ -301,6 +309,7 @@ int __init cma_init_reserved_mem(phys_addr_t base, phys_addr_t size,
 	cma->ranges[0].count = cma->count;
 	cma->nranges = 1;
 	cma->nid = NUMA_NO_NODE;
+	cma->gcma = gcma;
 
 	*res_cma = cma;
@@ -721,7 +730,8 @@ static int __init __cma_declare_contiguous_nid(phys_addr_t base,
 		base = addr;
 	}
 
-	ret = cma_init_reserved_mem(base, size, order_per_bit, name, res_cma);
+	ret = cma_init_reserved_mem(base, size, order_per_bit, name, res_cma,
+				    false);
 	if (ret)
 		memblock_phys_free(base, size);
 
@@ -815,7 +825,13 @@ static int cma_range_alloc(struct cma *cma, struct cma_memrange *cmr,
 		pfn = cmr->base_pfn + (bitmap_no << cma->order_per_bit);
 		mutex_lock(&cma->alloc_mutex);
-		ret = alloc_contig_range(pfn, pfn + count, MIGRATE_CMA, gfp);
+		if (cma->gcma) {
+			gcma_alloc_range(pfn, count);
+			ret = 0;
+		} else {
+			ret = alloc_contig_range(pfn, pfn + count, MIGRATE_CMA,
+						 gfp);
+		}
 		mutex_unlock(&cma->alloc_mutex);
 		if (ret == 0) {
 			page = pfn_to_page(pfn);
@@ -992,7 +1008,10 @@ bool cma_release(struct cma *cma, const struct page *pages,
 	if (r == cma->nranges)
 		return false;
 
-	free_contig_range(pfn, count);
+	if (cma->gcma)
+		gcma_free_range(pfn, count);
+	else
+		free_contig_range(pfn, count);
 	cma_clear_bitmap(cma, cmr, pfn, count);
 	cma_sysfs_account_release_pages(cma, count);
 	trace_cma_release(cma->name, pfn, pages, count);
diff --git a/mm/cma.h b/mm/cma.h
index 41a3ab0ec3de..c2a5576d7987 100644
--- a/mm/cma.h
+++ b/mm/cma.h
@@ -47,6 +47,7 @@ struct cma {
 	char name[CMA_MAX_NAME];
 	int nranges;
 	struct cma_memrange ranges[CMA_MAX_RANGES];
+	bool gcma;
 #ifdef CONFIG_CMA_SYSFS
 	/* the number of CMA page successful allocations */
 	atomic64_t nr_pages_succeeded;
diff --git a/mm/cma_sysfs.c b/mm/cma_sysfs.c
index 97acd3e5a6a5..4ecc36270a4d 100644
--- a/mm/cma_sysfs.c
+++ b/mm/cma_sysfs.c
@@ -80,6 +80,15 @@ static ssize_t available_pages_show(struct kobject *kobj,
 }
 CMA_ATTR_RO(available_pages);
 
+static ssize_t gcma_show(struct kobject *kobj,
+			 struct kobj_attribute *attr, char *buf)
+{
+	struct cma *cma = cma_from_kobj(kobj);
+
+	return sysfs_emit(buf, "%d\n", cma->gcma);
+}
+CMA_ATTR_RO(gcma);
+
 static void cma_kobj_release(struct kobject *kobj)
 {
 	struct cma *cma = cma_from_kobj(kobj);
@@ -95,6 +104,7 @@ static struct attribute *cma_attrs[] = {
 	&release_pages_success_attr.attr,
 	&total_pages_attr.attr,
 	&available_pages_attr.attr,
+	&gcma_attr.attr,
 	NULL,
 };
 ATTRIBUTE_GROUPS(cma);
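
[Editor's note] To tie the series together, the sketch below shows how an
in-kernel consumer might allocate from a GCMA-backed pool once these patches
are applied. The device, the pool lookup via dev_get_cma_area() and the
16 MiB size are illustrative assumptions; the point is that callers keep
using the regular cma_alloc()/cma_release() API, and CMA internally routes
the request to gcma_alloc_range()/gcma_free_range() when cma->gcma is set.

/* Hypothetical consumer of a "guarantee" shared-dma-pool; not part of the series. */
#include <linux/cma.h>
#include <linux/dma-map-ops.h>	/* dev_get_cma_area() */
#include <linux/mm.h>
#include <linux/sizes.h>

static struct page *grab_guaranteed_buffer(struct device *dev)
{
	struct cma *cma = dev_get_cma_area(dev);	/* the GCMA-backed pool */
	unsigned long count = SZ_16M >> PAGE_SHIFT;

	/*
	 * For a GCMA-backed area this ends up in gcma_alloc_range(): clean
	 * file-backed pages cached in the donated range are dropped and the
	 * allocation is expected to succeed quickly and deterministically.
	 */
	return cma_alloc(cma, count, get_order(SZ_16M), false);
}

static void release_guaranteed_buffer(struct device *dev, struct page *page)
{
	cma_release(dev_get_cma_area(dev), page, SZ_16M >> PAGE_SHIFT);
}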