mbox series

[RFC,0/4] Add fbind() and NUMA mempolicy support for KVM guest_memfd

Message ID 20241105164549.154700-1-shivankg@amd.com (mailing list archive)
Headers show
Series Add fbind() and NUMA mempolicy support for KVM guest_memfd | expand

Message

Shivank Garg Nov. 5, 2024, 4:45 p.m. UTC
This patch series introduces fbind() syscall to support NUMA memory
policies for KVM guest_memfd, allowing VMMs to configure memory placement
for guest memory. This addresses the current limitation where guest_memfd
allocations ignore NUMA policies, potentially impacting performance of
memory-locality-sensitive workloads.

Currently, while mbind() enables NUMA policy support for userspace
applications, it cannot be used with guest_memfd as the memory isn't mapped
to userspace. This results in guest memory being allocated randomly across
host NUMA nodes, even when specific policies and node preferences are
specified in QEMU commands.

The fbind() syscall is particularly useful for SEV-SNP guests.

Following suggestions from LPC and review comment [1], I switched from an
IOCTL-based approach [2] to fbind() [3]. The fbind() approach is preferred
as it provides a generic NUMA policy configuration working with any fd,
rather than being tied to guest-memfd.

Testing:
QEMU tree- https://github.com/AMDESE/qemu/tree/guest_memfd_fbind_NUMA
Based upon
Github tree- https://github.com/AMDESE/linux/tree/snp-host-latest
Branch: snp-host-latest commit: 85ef1ac

Example command to run a SEV-SNP guest bound to NUMA Node 0 of the host:
$ qemu-system-x86_64 \
   -enable-kvm \
  ...
   -machine memory-encryption=sev0,vmport=off \
   -object sev-snp-guest,id=sev0,cbitpos=51,reduced-phys-bits=1 \
   -numa node,nodeid=0,memdev=ram0,cpus=0-15 \
   -object memory-backend-memfd,id=ram0,policy=bind,host-nodes=0,size=1024M,share=true,prealloc=false

[1]: https://lore.kernel.org/linux-mm/ZvEga7srKhympQBt@intel.com/
[2]: https://lore.kernel.org/linux-mm/20240919094438.10987-1-shivankg@amd.com
[3]: https://lore.kernel.org/kvm/ZOjpIL0SFH+E3Dj4@google.com/


Shivank Garg (3):
  Introduce fbind syscall
  KVM: guest_memfd: Pass file pointer instead of inode in guest_memfd
    APIs
  KVM: guest_memfd: Enforce NUMA mempolicy if available

Shivansh Dhiman (1):
  mm: Add mempolicy support to the filemap layer

 arch/x86/entry/syscalls/syscall_32.tbl |  1 +
 arch/x86/entry/syscalls/syscall_64.tbl |  1 +
 include/linux/fs.h                     |  3 ++
 include/linux/mempolicy.h              |  3 ++
 include/linux/pagemap.h                | 40 ++++++++++++++++
 include/linux/syscalls.h               |  3 ++
 include/uapi/asm-generic/unistd.h      |  5 +-
 kernel/sys_ni.c                        |  1 +
 mm/Makefile                            |  2 +-
 mm/fbind.c                             | 49 +++++++++++++++++++
 mm/filemap.c                           | 30 ++++++++++--
 mm/mempolicy.c                         | 57 ++++++++++++++++++++++
 virt/kvm/guest_memfd.c                 | 66 +++++++++++++++++++++-----
 13 files changed, 242 insertions(+), 19 deletions(-)
 create mode 100644 mm/fbind.c

Comments

Matthew Wilcox Nov. 5, 2024, 6:55 p.m. UTC | #1
On Tue, Nov 05, 2024 at 04:45:45PM +0000, Shivank Garg wrote:
> This patch series introduces fbind() syscall to support NUMA memory
> policies for KVM guest_memfd, allowing VMMs to configure memory placement
> for guest memory. This addresses the current limitation where guest_memfd
> allocations ignore NUMA policies, potentially impacting performance of
> memory-locality-sensitive workloads.

Why does guest_memfd ignore numa policies?  The pagecache doesn't,
eg in vma_alloc_folio_noprof().