Message ID | 20220706082016.2603916-4-chao.p.peng@linux.intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | KVM: mm: fd-based approach for supporting KVM guest private memory | expand |
On 06.07.22 10:20, Chao Peng wrote: > This patch introduces memfile_notifier facility so existing memory file > subsystems (e.g. tmpfs/hugetlbfs) can provide memory pages to allow a > third kernel component to make use of memory bookmarked in the memory > file and gets notified when the pages in the memory file become > invalidated. Stupid question, but why is this called "memfile_notifier" and not "memfd_notifier". We're only dealing with memfd's after all ... which are anonymous files essentially. Or what am I missing? Are there any other plans for fs than plain memfd support that I am not aware of? > > It will be used for KVM to use a file descriptor as the guest memory > backing store and KVM will use this memfile_notifier interface to > interact with memory file subsystems. In the future there might be other > consumers (e.g. VFIO with encrypted device memory). > > It consists below components: > - memfile_backing_store: Each supported memory file subsystem can be > implemented as a memory backing store which bookmarks memory and > provides callbacks for other kernel systems (memfile_notifier > consumers) to interact with. > - memfile_notifier: memfile_notifier consumers defines callbacks and > associate them to a file using memfile_register_notifier(). > - memfile_node: A memfile_node is associated with the file (inode) from > the backing store and includes feature flags and a list of registered > memfile_notifier for notifying. > > In KVM usages, userspace is in charge of guest memory lifecycle: it first > allocates pages in memory backing store and then passes the fd to KVM and > lets KVM register memory slot to memory backing store via > memfile_register_notifier. Can we add documentation/description in any form how the different functions exposed in linux/memfile_notifier.h are supposed to be used? Staring at memfile_node_set_flags() and memfile_notifier_invalidate() it's not immediately clear to me who's supposed to call that and under which conditions.
On Fri, Aug 05, 2022 at 03:22:58PM +0200, David Hildenbrand wrote: > On 06.07.22 10:20, Chao Peng wrote: > > This patch introduces memfile_notifier facility so existing memory file > > subsystems (e.g. tmpfs/hugetlbfs) can provide memory pages to allow a > > third kernel component to make use of memory bookmarked in the memory > > file and gets notified when the pages in the memory file become > > invalidated. > > Stupid question, but why is this called "memfile_notifier" and not > "memfd_notifier". We're only dealing with memfd's after all ... which > are anonymous files essentially. Or what am I missing? Are there any > other plans for fs than plain memfd support that I am not aware of? There were some discussions on this in v3. https://lkml.org/lkml/2021/12/28/484 Sean commented it's OK to abstract it from memfd but he also wants the kAPI (name) should not bind to memfd to make room for future non-memfd usages. > > > > > It will be used for KVM to use a file descriptor as the guest memory > > backing store and KVM will use this memfile_notifier interface to > > interact with memory file subsystems. In the future there might be other > > consumers (e.g. VFIO with encrypted device memory). > > > > It consists below components: > > - memfile_backing_store: Each supported memory file subsystem can be > > implemented as a memory backing store which bookmarks memory and > > provides callbacks for other kernel systems (memfile_notifier > > consumers) to interact with. > > - memfile_notifier: memfile_notifier consumers defines callbacks and > > associate them to a file using memfile_register_notifier(). > > - memfile_node: A memfile_node is associated with the file (inode) from > > the backing store and includes feature flags and a list of registered > > memfile_notifier for notifying. > > > > In KVM usages, userspace is in charge of guest memory lifecycle: it first > > allocates pages in memory backing store and then passes the fd to KVM and > > lets KVM register memory slot to memory backing store via > > memfile_register_notifier. > > Can we add documentation/description in any form how the different > functions exposed in linux/memfile_notifier.h are supposed to be used? Yeah, code comments can be added. > > Staring at memfile_node_set_flags() and memfile_notifier_invalidate() > it's not immediately clear to me who's supposed to call that and under > which conditions. I will also amend the commit message. Chao > > -- > Thanks, > > David / dhildenb
On 10.08.22 11:22, Chao Peng wrote: > On Fri, Aug 05, 2022 at 03:22:58PM +0200, David Hildenbrand wrote: >> On 06.07.22 10:20, Chao Peng wrote: >>> This patch introduces memfile_notifier facility so existing memory file >>> subsystems (e.g. tmpfs/hugetlbfs) can provide memory pages to allow a >>> third kernel component to make use of memory bookmarked in the memory >>> file and gets notified when the pages in the memory file become >>> invalidated. >> >> Stupid question, but why is this called "memfile_notifier" and not >> "memfd_notifier". We're only dealing with memfd's after all ... which >> are anonymous files essentially. Or what am I missing? Are there any >> other plans for fs than plain memfd support that I am not aware of? > > There were some discussions on this in v3. > https://lkml.org/lkml/2021/12/28/484 > Sean commented it's OK to abstract it from memfd but he also wants the > kAPI (name) should not bind to memfd to make room for future non-memfd > usages. Sorry, but how is "memfile" any better? memfd abstracted to memfile?! :) I understand Sean's suggestion about abstracting, but if the new name makes it harder to grasp and there isn't really an alternative to memfd in sight, I'm not so sure I enjoy the tried abstraction here. Otherwise we'd have to get creative now and discuss something like "file_population_notifer" or "mapping_population_notifer" and I am not sure that our time is well spent doing so right now. ... as this is kernel-internal, we can always adjust the name as we please later, once we *actually* now what the abstraction should be. Until then I'd suggest to KIS and soft-glue this to memfd. Or am I missing something important?
+Will On Wed, Aug 10, 2022, David Hildenbrand wrote: > On 10.08.22 11:22, Chao Peng wrote: > > On Fri, Aug 05, 2022 at 03:22:58PM +0200, David Hildenbrand wrote: > >> On 06.07.22 10:20, Chao Peng wrote: > >>> This patch introduces memfile_notifier facility so existing memory file > >>> subsystems (e.g. tmpfs/hugetlbfs) can provide memory pages to allow a > >>> third kernel component to make use of memory bookmarked in the memory > >>> file and gets notified when the pages in the memory file become > >>> invalidated. > >> > >> Stupid question, but why is this called "memfile_notifier" and not > >> "memfd_notifier". We're only dealing with memfd's after all ... which > >> are anonymous files essentially. Or what am I missing? Are there any > >> other plans for fs than plain memfd support that I am not aware of? > > > > There were some discussions on this in v3. > > https://lkml.org/lkml/2021/12/28/484 > > Sean commented it's OK to abstract it from memfd but he also wants the > > kAPI (name) should not bind to memfd to make room for future non-memfd > > usages. > > Sorry, but how is "memfile" any better? memfd abstracted to memfile?! :) FWIW, I don't really like the memfile name either. > I understand Sean's suggestion about abstracting, but if the new name > makes it harder to grasp and there isn't really an alternative to memfd > in sight, I'm not so sure I enjoy the tried abstraction here. ARM's pKVM implementation is potentially (hopefully) going to switch to this API (as a consumer) sooner than later. If they anticipate being able to use memfd, then there's unlikely to be a second backing type any time soon. Quentin, Will? > Otherwise we'd have to get creative now and discuss something like > "file_population_notifer" or "mapping_population_notifer" and I am not > sure that our time is well spent doing so right now. > > ... as this is kernel-internal, we can always adjust the name as we > please later, once we *actually* now what the abstraction should be. > Until then I'd suggest to KIS and soft-glue this to memfd. > > Or am I missing something important? I don't think you're missing anything. I'd still prefer a name that doesn't couple KVM to memfd, but it's not a sticking point, and I've never been able to come up with a better name... With a little bit of cleverness I think we can keep the coupling in KVM to a minimum, which is what I really care about.
+CC Fuad On Wednesday 10 Aug 2022 at 14:38:43 (+0000), Sean Christopherson wrote: > > I understand Sean's suggestion about abstracting, but if the new name > > makes it harder to grasp and there isn't really an alternative to memfd > > in sight, I'm not so sure I enjoy the tried abstraction here. > > ARM's pKVM implementation is potentially (hopefully) going to switch to this API > (as a consumer) sooner than later. If they anticipate being able to use memfd, > then there's unlikely to be a second backing type any time soon. > > Quentin, Will? Yep, Fuad is currently trying to port the pKVM mm stuff on top of this series to see how well it fits, so stay tuned. I think there is still some room for discussion around page conversions (private->shared etc), and we'll need a clearer idea of what the code might look like to have a constructive discussion, but so far it does seem like using a memfd (the new private one or perhaps just memfd_secret, to be discussed) + memfd notifiers is a promising option.
On Thu, Aug 11, 2022 at 12:27:56PM +0000, Quentin Perret wrote: > +CC Fuad > > On Wednesday 10 Aug 2022 at 14:38:43 (+0000), Sean Christopherson wrote: > > > I understand Sean's suggestion about abstracting, but if the new name > > > makes it harder to grasp and there isn't really an alternative to memfd > > > in sight, I'm not so sure I enjoy the tried abstraction here. > > > > ARM's pKVM implementation is potentially (hopefully) going to switch to this API > > (as a consumer) sooner than later. If they anticipate being able to use memfd, > > then there's unlikely to be a second backing type any time soon. > > > > Quentin, Will? > > Yep, Fuad is currently trying to port the pKVM mm stuff on top of this > series to see how well it fits, so stay tuned. Good to hear that. >I think there is still > some room for discussion around page conversions (private->shared etc), > and we'll need a clearer idea of what the code might look like to have a > constructive discussion, That's fine. Looking forward to your feedbacks. >but so far it does seem like using a memfd (the > new private one or perhaps just memfd_secret, to be discussed) + memfd > notifiers is a promising option. If it still memfd (even memfd_secret), maybe we can use the name memfd_notifier? Chao
diff --git a/include/linux/memfile_notifier.h b/include/linux/memfile_notifier.h new file mode 100644 index 000000000000..c5d66fd8ba53 --- /dev/null +++ b/include/linux/memfile_notifier.h @@ -0,0 +1,93 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _LINUX_MEMFILE_NOTIFIER_H +#define _LINUX_MEMFILE_NOTIFIER_H + +#include <linux/pfn_t.h> +#include <linux/rculist.h> +#include <linux/spinlock.h> +#include <linux/srcu.h> +#include <linux/fs.h> + +/* memory in the file is inaccessible from userspace (e.g. read/write/mmap) */ +#define MEMFILE_F_USER_INACCESSIBLE BIT(0) +/* memory in the file is unmovable (e.g. via pagemigration)*/ +#define MEMFILE_F_UNMOVABLE BIT(1) +/* memory in the file is unreclaimable (e.g. via kswapd) */ +#define MEMFILE_F_UNRECLAIMABLE BIT(2) + +#define MEMFILE_F_ALLOWED_MASK (MEMFILE_F_USER_INACCESSIBLE | \ + MEMFILE_F_UNMOVABLE | \ + MEMFILE_F_UNRECLAIMABLE) + +struct memfile_node { + struct list_head notifiers; /* registered notifiers */ + unsigned long flags; /* MEMFILE_F_* flags */ +}; + +struct memfile_backing_store { + struct list_head list; + spinlock_t lock; + struct memfile_node* (*lookup_memfile_node)(struct file *file); + int (*get_pfn)(struct file *file, pgoff_t offset, pfn_t *pfn, + int *order); + void (*put_pfn)(pfn_t pfn); +}; + +struct memfile_notifier; +struct memfile_notifier_ops { + void (*invalidate)(struct memfile_notifier *notifier, + pgoff_t start, pgoff_t end); +}; + +struct memfile_notifier { + struct list_head list; + struct memfile_notifier_ops *ops; + struct memfile_backing_store *bs; +}; + +static inline void memfile_node_init(struct memfile_node *node) +{ + INIT_LIST_HEAD(&node->notifiers); + node->flags = 0; +} + +#ifdef CONFIG_MEMFILE_NOTIFIER +/* APIs for backing stores */ +extern void memfile_register_backing_store(struct memfile_backing_store *bs); +extern int memfile_node_set_flags(struct file *file, unsigned long flags); +extern void memfile_notifier_invalidate(struct memfile_node *node, + pgoff_t start, pgoff_t end); +/*APIs for notifier consumers */ +extern int memfile_register_notifier(struct file *file, unsigned long flags, + struct memfile_notifier *notifier); +extern void memfile_unregister_notifier(struct memfile_notifier *notifier); + +#else /* !CONFIG_MEMFILE_NOTIFIER */ +static inline void memfile_register_backing_store(struct memfile_backing_store *bs) +{ +} + +static inline int memfile_node_set_flags(struct file *file, unsigned long flags) +{ + return -EOPNOTSUPP; +} + +static inline void memfile_notifier_invalidate(struct memfile_node *node, + pgoff_t start, pgoff_t end) +{ +} + +static inline int memfile_register_notifier(struct file *file, + unsigned long flags, + struct memfile_notifier *notifier) +{ + return -EOPNOTSUPP; +} + +static inline void memfile_unregister_notifier(struct memfile_notifier *notifier) +{ +} + +#endif /* CONFIG_MEMFILE_NOTIFIER */ + +#endif /* _LINUX_MEMFILE_NOTIFIER_H */ diff --git a/mm/Kconfig b/mm/Kconfig index 169e64192e48..19ab9350f5cb 100644 --- a/mm/Kconfig +++ b/mm/Kconfig @@ -1130,6 +1130,10 @@ config PTE_MARKER_UFFD_WP purposes. It is required to enable userfaultfd write protection on file-backed memory types like shmem and hugetlbfs. +config MEMFILE_NOTIFIER + bool + select SRCU + source "mm/damon/Kconfig" endmenu diff --git a/mm/Makefile b/mm/Makefile index 6f9ffa968a1a..b7e3fb5fa85b 100644 --- a/mm/Makefile +++ b/mm/Makefile @@ -133,3 +133,4 @@ obj-$(CONFIG_PAGE_REPORTING) += page_reporting.o obj-$(CONFIG_IO_MAPPING) += io-mapping.o obj-$(CONFIG_HAVE_BOOTMEM_INFO_NODE) += bootmem_info.o obj-$(CONFIG_GENERIC_IOREMAP) += ioremap.o +obj-$(CONFIG_MEMFILE_NOTIFIER) += memfile_notifier.o diff --git a/mm/memfile_notifier.c b/mm/memfile_notifier.c new file mode 100644 index 000000000000..799d3197903e --- /dev/null +++ b/mm/memfile_notifier.c @@ -0,0 +1,121 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * Copyright (C) 2022 Intel Corporation. + * Chao Peng <chao.p.peng@linux.intel.com> + */ + +#include <linux/memfile_notifier.h> +#include <linux/pagemap.h> +#include <linux/srcu.h> + +DEFINE_STATIC_SRCU(memfile_srcu); +static __ro_after_init LIST_HEAD(backing_store_list); + + +void memfile_notifier_invalidate(struct memfile_node *node, + pgoff_t start, pgoff_t end) +{ + struct memfile_notifier *notifier; + int id; + + id = srcu_read_lock(&memfile_srcu); + list_for_each_entry_srcu(notifier, &node->notifiers, list, + srcu_read_lock_held(&memfile_srcu)) { + if (notifier->ops->invalidate) + notifier->ops->invalidate(notifier, start, end); + } + srcu_read_unlock(&memfile_srcu, id); +} + +void __init memfile_register_backing_store(struct memfile_backing_store *bs) +{ + spin_lock_init(&bs->lock); + list_add_tail(&bs->list, &backing_store_list); +} + +static void memfile_node_update_flags(struct file *file, unsigned long flags) +{ + struct address_space *mapping = file_inode(file)->i_mapping; + gfp_t gfp; + + gfp = mapping_gfp_mask(mapping); + if (flags & MEMFILE_F_UNMOVABLE) + gfp &= ~__GFP_MOVABLE; + else + gfp |= __GFP_MOVABLE; + mapping_set_gfp_mask(mapping, gfp); + + if (flags & MEMFILE_F_UNRECLAIMABLE) + mapping_set_unevictable(mapping); + else + mapping_clear_unevictable(mapping); +} + +int memfile_node_set_flags(struct file *file, unsigned long flags) +{ + struct memfile_backing_store *bs; + struct memfile_node *node; + + if (flags & ~MEMFILE_F_ALLOWED_MASK) + return -EINVAL; + + list_for_each_entry(bs, &backing_store_list, list) { + node = bs->lookup_memfile_node(file); + if (node) { + spin_lock(&bs->lock); + node->flags = flags; + spin_unlock(&bs->lock); + memfile_node_update_flags(file, flags); + return 0; + } + } + + return -EOPNOTSUPP; +} + +int memfile_register_notifier(struct file *file, unsigned long flags, + struct memfile_notifier *notifier) +{ + struct memfile_backing_store *bs; + struct memfile_node *node; + struct list_head *list; + + if (!file || !notifier || !notifier->ops) + return -EINVAL; + if (flags & ~MEMFILE_F_ALLOWED_MASK) + return -EINVAL; + + list_for_each_entry(bs, &backing_store_list, list) { + node = bs->lookup_memfile_node(file); + if (node) { + list = &node->notifiers; + notifier->bs = bs; + + spin_lock(&bs->lock); + if (list_empty(list)) + node->flags = flags; + else if (node->flags ^ flags) { + spin_unlock(&bs->lock); + return -EINVAL; + } + + list_add_rcu(¬ifier->list, list); + spin_unlock(&bs->lock); + memfile_node_update_flags(file, flags); + return 0; + } + } + + return -EOPNOTSUPP; +} +EXPORT_SYMBOL_GPL(memfile_register_notifier); + +void memfile_unregister_notifier(struct memfile_notifier *notifier) +{ + spin_lock(¬ifier->bs->lock); + list_del_rcu(¬ifier->list); + spin_unlock(¬ifier->bs->lock); + + synchronize_srcu(&memfile_srcu); +} +EXPORT_SYMBOL_GPL(memfile_unregister_notifier);