From patchwork Tue Sep 10 23:44:07 2024
Date: Tue, 10 Sep 2024 23:44:07 +0000
Message-ID: <2daf579fa5d2ba223fa3a907c1048d3ea4458a57.1726009989.git.ackerleytng@google.com>
Subject: [RFC PATCH 36/39] KVM: selftests: Refactor vm_mem_add to be more flexible
From: Ackerley Tng
To: tabba@google.com, quic_eberman@quicinc.com, roypat@amazon.co.uk,
 jgg@nvidia.com, peterx@redhat.com, david@redhat.com, rientjes@google.com,
 fvdl@google.com, jthoughton@google.com, seanjc@google.com,
 pbonzini@redhat.com, zhiquan1.li@intel.com, fan.du@intel.com,
 jun.miao@intel.com, isaku.yamahata@intel.com, muchun.song@linux.dev,
 mike.kravetz@oracle.com
Cc: erdemaktas@google.com, vannapurve@google.com, ackerleytng@google.com,
 qperret@google.com, jhubbard@nvidia.com, willy@infradead.org,
 shuah@kernel.org, brauner@kernel.org, bfoster@redhat.com,
 kent.overstreet@linux.dev, pvorel@suse.cz, rppt@kernel.org,
 richard.weiyang@gmail.com, anup@brainfault.org, haibo1.xu@intel.com,
 ajones@ventanamicro.com, vkuznets@redhat.com,
 maciej.wieczor-retman@intel.com, pgonda@google.com, oliver.upton@linux.dev,
 linux-kernel@vger.kernel.org, linux-mm@kvack.org, kvm@vger.kernel.org,
 linux-kselftest@vger.kernel.org, linux-fsdevel@kvack.org

enum vm_mem_backing_src_type encodes too many possibilities along
different axes: (1) whether to mmap from an fd, (2) granularity of
mapping for THP, (3) size of hugetlb mapping. It has also yet to be
extended to support guest_memfd.

Once guest_memfd supports mmap() and we also want to test mmap()ing
from guest_memfd, the number of combinations makes enumerating them in
vm_mem_backing_src_type difficult.

This refactor separates vm_mem_backing_src_type from
userspace_mem_region. For now, vm_mem_backing_src_type remains a
possible way for tests to specify, on the command line, the
combination of backing memory to test.

vm_mem_add() is now the last place where vm_mem_backing_src_type is
interpreted, in order to:

1. Check validity of requested guest_paddr
2. Align mmap_size appropriately based on the mapping's page_size and
   architecture
3. Install memory appropriately according to the mapping's page size

mmap()ing an alias seems to be specific to userfaultfd tests and could
be refactored out of struct userspace_mem_region and localized in
userfaultfd tests in the future.

This paves the way for replacing vm_mem_backing_src_type with multiple
command line flags that would specify backing memory more flexibly.
Future tests are expected to use vm_mem_region_alloc() to allocate a
struct userspace_mem_region, then use more fundamental functions like
vm_mem_region_mmap(), vm_mem_region_madvise_thp(), kvm_create_memfd(),
vm_create_guest_memfd(), and other functions in vm_mem_add() to
flexibly build up struct userspace_mem_region before finally adding
the region to the vm with vm_mem_region_add().

Signed-off-by: Ackerley Tng
---
 .../testing/selftests/kvm/include/kvm_util.h  |  29 +-
 .../testing/selftests/kvm/include/test_util.h |   2 +
 tools/testing/selftests/kvm/lib/kvm_util.c    | 413 +++++++++++-------
 tools/testing/selftests/kvm/lib/test_util.c   |  25 ++
 4 files changed, 319 insertions(+), 150 deletions(-)

diff --git a/tools/testing/selftests/kvm/include/kvm_util.h b/tools/testing/selftests/kvm/include/kvm_util.h
index d336cd0c8f19..1576e7e4aefe 100644
--- a/tools/testing/selftests/kvm/include/kvm_util.h
+++ b/tools/testing/selftests/kvm/include/kvm_util.h
@@ -35,11 +35,26 @@ struct userspace_mem_region {
 	struct sparsebit *protected_phy_pages;
 	int fd;
 	off_t offset;
-	enum vm_mem_backing_src_type backing_src_type;
+	/*
+	 * host_mem is mmap_start aligned upwards to an address suitable for the
+	 * architecture. In most cases, host_mem and mmap_start are the same,
+	 * except for s390x, where the host address must be aligned to 1M (due
+	 * to PGSTEs).
+	 */
+#ifdef __s390x__
+#define S390X_HOST_ADDRESS_ALIGNMENT 0x100000
+#endif
 	void *host_mem;
+	/* host_alias is to mmap_alias as host_mem is to mmap_start */
 	void *host_alias;
 	void *mmap_start;
 	void *mmap_alias;
+	/*
+	 * mmap_size is possibly larger than region.memory_size because in some
+	 * cases, host_mem has to be adjusted upwards (see comment for host_mem
+	 * above). In those cases, mmap_size has to be adjusted upwards so that
+	 * enough memory is available in this memslot.
+	 */
 	size_t mmap_size;
 	struct rb_node gpa_node;
 	struct rb_node hva_node;
@@ -559,6 +574,18 @@ int __vm_set_user_memory_region2(struct kvm_vm *vm, uint32_t slot, uint32_t flag
 				 uint64_t gpa, uint64_t size, void *hva,
 				 uint32_t guest_memfd, uint64_t guest_memfd_offset);
 
+struct userspace_mem_region *vm_mem_region_alloc(struct kvm_vm *vm);
+void *vm_mem_region_mmap(struct userspace_mem_region *region, size_t length,
+			 int flags, int fd, off_t offset);
+void vm_mem_region_install_memory(struct userspace_mem_region *region,
+				  size_t memslot_size, size_t alignment);
+void vm_mem_region_madvise_thp(struct userspace_mem_region *region, int advice);
+int vm_mem_region_install_guest_memfd(struct userspace_mem_region *region,
+				      int guest_memfd);
+void *vm_mem_region_mmap_alias(struct userspace_mem_region *region, int flags,
+			       size_t alignment);
+void vm_mem_region_add(struct kvm_vm *vm, struct userspace_mem_region *region);
+
 void vm_userspace_mem_region_add(struct kvm_vm *vm,
 	enum vm_mem_backing_src_type src_type,
 	uint64_t guest_paddr, uint32_t slot, uint64_t npages,
diff --git a/tools/testing/selftests/kvm/include/test_util.h b/tools/testing/selftests/kvm/include/test_util.h
index 011e757d4e2c..983adeb54c0e 100644
--- a/tools/testing/selftests/kvm/include/test_util.h
+++ b/tools/testing/selftests/kvm/include/test_util.h
@@ -159,6 +159,8 @@ size_t get_trans_hugepagesz(void);
 size_t get_def_hugetlb_pagesz(void);
 const struct vm_mem_backing_src_alias *vm_mem_backing_src_alias(uint32_t i);
 size_t get_backing_src_pagesz(uint32_t i);
+int backing_src_should_madvise(uint32_t i);
+int get_backing_src_madvise_advice(uint32_t i);
 bool is_backing_src_hugetlb(uint32_t i);
 void backing_src_help(const char *flag);
 enum vm_mem_backing_src_type parse_backing_src_type(const char *type_name);
diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
index 56b170b725b3..9bdd03a5da90 100644
--- a/tools/testing/selftests/kvm/lib/kvm_util.c
+++ b/tools/testing/selftests/kvm/lib/kvm_util.c
@@ -774,15 +774,12 @@ void kvm_vm_free(struct kvm_vm *vmp)
 	free(vmp);
 }
 
-int kvm_memfd_alloc(size_t size, bool hugepages)
+int kvm_create_memfd(size_t size, unsigned int flags)
 {
-	int memfd_flags = MFD_CLOEXEC;
-	int fd, r;
-
-	if (hugepages)
-		memfd_flags |= MFD_HUGETLB;
+	int fd;
+	int r;
 
-	fd = memfd_create("kvm_selftest", memfd_flags);
+	fd = memfd_create("kvm_selftest", flags);
 	TEST_ASSERT(fd != -1, __KVM_SYSCALL_ERROR("memfd_create()", fd));
 
 	r = ftruncate(fd, size);
@@ -794,6 +791,16 @@ int kvm_memfd_alloc(size_t size, bool hugepages)
 	return fd;
 }
 
+int kvm_memfd_alloc(size_t size, bool hugepages)
+{
+	int memfd_flags = MFD_CLOEXEC;
+
+	if (hugepages)
+		memfd_flags |= MFD_HUGETLB;
+
+	return kvm_create_memfd(size, memfd_flags);
+}
+
 /*
  * Memory Compare, host virtual to guest virtual
  *
@@ -973,185 +980,293 @@ void vm_set_user_memory_region2(struct kvm_vm *vm, uint32_t slot, uint32_t flags
 		    errno, strerror(errno));
 }
 
+/**
+ * Allocates and returns a struct userspace_mem_region.
+ */
+struct userspace_mem_region *vm_mem_region_alloc(struct kvm_vm *vm)
+{
+	struct userspace_mem_region *region;
 
-/* FIXME: This thing needs to be ripped apart and rewritten. */
-void vm_mem_add(struct kvm_vm *vm, enum vm_mem_backing_src_type src_type,
-		uint64_t guest_paddr, uint32_t slot, uint64_t npages,
-		uint32_t flags, int guest_memfd, uint64_t guest_memfd_offset)
+	/* Allocate and initialize new mem region structure. */
+	region = calloc(1, sizeof(*region));
+	TEST_ASSERT(region != NULL, "Insufficient Memory");
+
+	region->unused_phy_pages = sparsebit_alloc();
+	if (vm_arch_has_protected_memory(vm))
+		region->protected_phy_pages = sparsebit_alloc();
+
+	region->fd = -1;
+	region->region.guest_memfd = -1;
+
+	return region;
+}
+
+static size_t compute_page_size(int mmap_flags, int madvise_advice)
+{
+	if (mmap_flags & MAP_HUGETLB) {
+		int size_flags = (mmap_flags >> MAP_HUGE_SHIFT) & MAP_HUGE_MASK;
+		if (!size_flags)
+			return get_def_hugetlb_pagesz();
+
+		return 1ULL << size_flags;
+	}
+
+	return madvise_advice == MADV_HUGEPAGE ? get_trans_hugepagesz() : getpagesize();
+}
+
+/**
+ * Calls mmap() with @length, @flags, @fd, @offset for @region.
+ *
+ * Think of this as the struct userspace_mem_region wrapper for the mmap()
+ * syscall.
+ */
+void *vm_mem_region_mmap(struct userspace_mem_region *region, size_t length,
+			 int flags, int fd, off_t offset)
+{
+	void *mem;
+
+	if (flags & MAP_SHARED) {
+		TEST_ASSERT(fd != -1,
+			    "Ensure that fd is provided for shared mappings.");
+		TEST_ASSERT(
+			region->fd == fd || region->region.guest_memfd == fd,
+			"Ensure that fd is opened before mmap, and is either "
+			"set up in region->fd or region->region.guest_memfd.");
+	}
+
+	mem = mmap(NULL, length, PROT_READ | PROT_WRITE, flags, fd, offset);
+	TEST_ASSERT(mem != MAP_FAILED, "Couldn't mmap anonymous memory");
+
+	region->mmap_start = mem;
+	region->mmap_size = length;
+	region->offset = offset;
+
+	return mem;
+}
+
+/**
+ * Installs mmap()ed memory in @region->mmap_start as @region->host_mem,
+ * checking constraints.
+ */
+void vm_mem_region_install_memory(struct userspace_mem_region *region,
+				  size_t memslot_size, size_t alignment)
+{
+	TEST_ASSERT(region->mmap_size >= memslot_size,
+		    "mmap()ed memory insufficient for memslot");
+
+	region->host_mem = align_ptr_up(region->mmap_start, alignment);
+	region->region.userspace_addr = (uint64_t)region->host_mem;
+	region->region.memory_size = memslot_size;
+}
+
+
+/**
+ * Calls madvise with @advice for @region.
+ *
+ * Think of this as the struct userspace_mem_region wrapper for the madvise()
+ * syscall.
+ */
+void vm_mem_region_madvise_thp(struct userspace_mem_region *region, int advice)
 {
 	int ret;
-	struct userspace_mem_region *region;
-	size_t backing_src_pagesz = get_backing_src_pagesz(src_type);
-	size_t mem_size = npages * vm->page_size;
-	size_t alignment;
 
-	TEST_REQUIRE_SET_USER_MEMORY_REGION2();
+	TEST_ASSERT(
+		region->host_mem && region->region.memory_size,
+		"vm_mem_region_madvise_thp() must be called after vm_mem_region_install_memory()");
 
-	TEST_ASSERT(vm_adjust_num_guest_pages(vm->mode, npages) == npages,
-		"Number of guest pages is not compatible with the host. "
-		"Try npages=%d", vm_adjust_num_guest_pages(vm->mode, npages));
-
-	TEST_ASSERT((guest_paddr % vm->page_size) == 0, "Guest physical "
-		"address not on a page boundary.\n"
-		" guest_paddr: 0x%lx vm->page_size: 0x%x",
-		guest_paddr, vm->page_size);
-	TEST_ASSERT((((guest_paddr >> vm->page_shift) + npages) - 1)
-		<= vm->max_gfn, "Physical range beyond maximum "
-		"supported physical address,\n"
-		" guest_paddr: 0x%lx npages: 0x%lx\n"
-		" vm->max_gfn: 0x%lx vm->page_size: 0x%x",
-		guest_paddr, npages, vm->max_gfn, vm->page_size);
+	ret = madvise(region->host_mem, region->region.memory_size, advice);
+	TEST_ASSERT(ret == 0, "madvise failed, addr: %p length: 0x%llx",
+		    region->host_mem, region->region.memory_size);
+}
+
+/**
+ * Installs guest_memfd by setting it up in @region.
+ *
+ * Returns the guest_memfd that was installed in the @region.
+ */
+int vm_mem_region_install_guest_memfd(struct userspace_mem_region *region,
+				      int guest_memfd)
+{
+	/*
+	 * Install a unique fd for each memslot so that the fd can be closed
+	 * when the region is deleted without needing to track if the fd is
+	 * owned by the framework or by the caller.
+	 */
+	guest_memfd = dup(guest_memfd);
+	TEST_ASSERT(guest_memfd >= 0, __KVM_SYSCALL_ERROR("dup()", guest_memfd));
+	region->region.guest_memfd = guest_memfd;
+
+	return guest_memfd;
+}
+
+/**
+ * Calls mmap() to create an alias for mmap()ed memory at region->host_mem,
+ * exactly the same size that was mmap()ed.
+ *
+ * This is used mainly for userfaultfd tests.
+ */
+void *vm_mem_region_mmap_alias(struct userspace_mem_region *region, int flags,
+			       size_t alignment)
+{
+	region->mmap_alias = mmap(NULL, region->mmap_size,
+				  PROT_READ | PROT_WRITE, flags, region->fd, 0);
+	TEST_ASSERT(region->mmap_alias != MAP_FAILED,
+		    __KVM_SYSCALL_ERROR("mmap()", (int)(unsigned long)MAP_FAILED));
+
+	region->host_alias = align_ptr_up(region->mmap_alias, alignment);
+
+	return region->host_alias;
+}
+
+static void vm_mem_region_assert_no_duplicate(struct kvm_vm *vm, uint32_t slot,
+					      uint64_t gpa, size_t size)
+{
+	struct userspace_mem_region *region;
 
 	/*
 	 * Confirm a mem region with an overlapping address doesn't
 	 * already exist.
 	 */
-	region = (struct userspace_mem_region *) userspace_mem_region_find(
-		vm, guest_paddr, (guest_paddr + npages * vm->page_size) - 1);
-	if (region != NULL)
-		TEST_FAIL("overlapping userspace_mem_region already "
-			"exists\n"
-			" requested guest_paddr: 0x%lx npages: 0x%lx "
-			"page_size: 0x%x\n"
-			" existing guest_paddr: 0x%lx size: 0x%lx",
-			guest_paddr, npages, vm->page_size,
-			(uint64_t) region->region.guest_phys_addr,
-			(uint64_t) region->region.memory_size);
+	region = userspace_mem_region_find(vm, gpa, gpa + size - 1);
+	if (region != NULL) {
+		TEST_FAIL("overlapping userspace_mem_region already exists\n"
+			  " requested gpa: 0x%lx size: 0x%lx\n"
+			  " existing gpa: 0x%lx size: 0x%lx",
+			  gpa, size,
+			  (uint64_t) region->region.guest_phys_addr,
+			  (uint64_t) region->region.memory_size);
+	}
 
 	/* Confirm no region with the requested slot already exists. */
-	hash_for_each_possible(vm->regions.slot_hash, region, slot_node,
-			       slot) {
+	hash_for_each_possible(vm->regions.slot_hash, region, slot_node, slot) {
 		if (region->region.slot != slot)
 			continue;
 
-		TEST_FAIL("A mem region with the requested slot "
-			"already exists.\n"
-			" requested slot: %u paddr: 0x%lx npages: 0x%lx\n"
-			" existing slot: %u paddr: 0x%lx size: 0x%lx",
-			slot, guest_paddr, npages,
-			region->region.slot,
-			(uint64_t) region->region.guest_phys_addr,
-			(uint64_t) region->region.memory_size);
+		TEST_FAIL("A mem region with the requested slot already exists.\n"
+			  " requested slot: %u paddr: 0x%lx size: 0x%lx\n"
+			  " existing slot: %u paddr: 0x%lx size: 0x%lx",
+			  slot, gpa, size,
+			  region->region.slot,
+			  (uint64_t) region->region.guest_phys_addr,
+			  (uint64_t) region->region.memory_size);
 	}
+}
 
-	/* Allocate and initialize new mem region structure. */
-	region = calloc(1, sizeof(*region));
-	TEST_ASSERT(region != NULL, "Insufficient Memory");
-	region->mmap_size = mem_size;
+/**
+ * Add a @region to @vm. All necessary fields in region->region should already
+ * be populated.
+ *
+ * Think of this as the struct userspace_mem_region wrapper for the
+ * KVM_SET_USER_MEMORY_REGION2 ioctl.
+ */
+void vm_mem_region_add(struct kvm_vm *vm, struct userspace_mem_region *region)
+{
+	uint64_t npages;
+	uint64_t gpa;
+	int ret;
 
-#ifdef __s390x__
-	/* On s390x, the host address must be aligned to 1M (due to PGSTEs) */
-	alignment = 0x100000;
-#else
-	alignment = 1;
-#endif
+	TEST_REQUIRE_SET_USER_MEMORY_REGION2();
 
-	/*
-	 * When using THP mmap is not guaranteed to returned a hugepage aligned
-	 * address so we have to pad the mmap. Padding is not needed for HugeTLB
-	 * because mmap will always return an address aligned to the HugeTLB
-	 * page size.
-	 */
-	if (src_type == VM_MEM_SRC_ANONYMOUS_THP)
-		alignment = max(backing_src_pagesz, alignment);
+	npages = region->region.memory_size / vm->page_size;
+	TEST_ASSERT(vm_adjust_num_guest_pages(vm->mode, npages) == npages,
+		    "Number of guest pages is not compatible with the host. "
+		    "Try npages=%d", vm_adjust_num_guest_pages(vm->mode, npages));
+
+	gpa = region->region.guest_phys_addr;
+	TEST_ASSERT((gpa % vm->page_size) == 0,
+		    "Guest physical address not on a page boundary.\n"
+		    " gpa: 0x%lx vm->page_size: 0x%x",
+		    gpa, vm->page_size);
+	TEST_ASSERT((((gpa >> vm->page_shift) + npages) - 1) <= vm->max_gfn,
+		    "Physical range beyond maximum supported physical address,\n"
+		    " gpa: 0x%lx npages: 0x%lx\n"
+		    " vm->max_gfn: 0x%lx vm->page_size: 0x%x",
+		    gpa, npages, vm->max_gfn, vm->page_size);
+
+	vm_mem_region_assert_no_duplicate(vm, region->region.slot, gpa,
+					  region->region.memory_size);
 
-	TEST_ASSERT_EQ(guest_paddr, align_up(guest_paddr, backing_src_pagesz));
+	ret = __vm_ioctl(vm, KVM_SET_USER_MEMORY_REGION2, &region->region);
+	TEST_ASSERT(ret == 0, "KVM_SET_USER_MEMORY_REGION2 IOCTL failed,\n"
+		    " rc: %i errno: %i\n"
+		    " slot: %u flags: 0x%x\n"
+		    " guest_phys_addr: 0x%lx size: 0x%llx guest_memfd: %d",
+		    ret, errno, region->region.slot, region->region.flags,
+		    gpa, region->region.memory_size,
+		    region->region.guest_memfd);
 
-	/* Add enough memory to align up if necessary */
-	if (alignment > 1)
-		region->mmap_size += alignment;
+	sparsebit_set_num(region->unused_phy_pages, gpa >> vm->page_shift, npages);
 
-	region->fd = -1;
-	if (backing_src_is_shared(src_type))
-		region->fd = kvm_memfd_alloc(region->mmap_size,
-					     src_type == VM_MEM_SRC_SHARED_HUGETLB);
-
-	region->mmap_start = mmap(NULL, region->mmap_size,
-				  PROT_READ | PROT_WRITE,
-				  vm_mem_backing_src_alias(src_type)->flag,
-				  region->fd, 0);
-	TEST_ASSERT(region->mmap_start != MAP_FAILED,
-		    __KVM_SYSCALL_ERROR("mmap()", (int)(unsigned long)MAP_FAILED));
+	/* Add to quick lookup data structures */
+	vm_userspace_mem_region_gpa_insert(&vm->regions.gpa_tree, region);
+	vm_userspace_mem_region_hva_insert(&vm->regions.hva_tree, region);
+	hash_add(vm->regions.slot_hash, &region->slot_node, region->region.slot);
+}
 
-	TEST_ASSERT(!is_backing_src_hugetlb(src_type) ||
-		    region->mmap_start == align_ptr_up(region->mmap_start, backing_src_pagesz),
-		    "mmap_start %p is not aligned to HugeTLB page size 0x%lx",
-		    region->mmap_start, backing_src_pagesz);
+void vm_mem_add(struct kvm_vm *vm, enum vm_mem_backing_src_type src_type,
+		uint64_t guest_paddr, uint32_t slot, uint64_t npages,
+		uint32_t flags, int guest_memfd, uint64_t guest_memfd_offset)
+{
+	struct userspace_mem_region *region;
+	size_t mapping_page_size;
+	size_t memslot_size;
+	int madvise_advice;
+	size_t mmap_size;
+	size_t alignment;
+	int mmap_flags;
+	int memfd;
 
-	/* Align host address */
-	region->host_mem = align_ptr_up(region->mmap_start, alignment);
+	memslot_size = npages * vm->page_size;
+
+	mmap_flags = vm_mem_backing_src_alias(src_type)->flag;
+	madvise_advice = get_backing_src_madvise_advice(src_type);
+	mapping_page_size = compute_page_size(mmap_flags, madvise_advice);
+
+	TEST_ASSERT_EQ(guest_paddr, align_up(guest_paddr, mapping_page_size));
+
+	alignment = mapping_page_size;
+#ifdef __s390x__
+	alignment = max(alignment, S390X_HOST_ADDRESS_ALIGNMENT);
+#endif
 
-	/* As needed perform madvise */
-	if ((src_type == VM_MEM_SRC_ANONYMOUS ||
-	     src_type == VM_MEM_SRC_ANONYMOUS_THP) && thp_configured()) {
-		ret = madvise(region->host_mem, mem_size,
-			      src_type == VM_MEM_SRC_ANONYMOUS ? MADV_NOHUGEPAGE : MADV_HUGEPAGE);
-		TEST_ASSERT(ret == 0, "madvise failed, addr: %p length: 0x%lx src_type: %s",
-			    region->host_mem, mem_size,
-			    vm_mem_backing_src_alias(src_type)->name);
+	region = vm_mem_region_alloc(vm);
+
+	memfd = -1;
+	if (backing_src_is_shared(src_type)) {
+		unsigned int memfd_flags = MFD_CLOEXEC;
+		if (src_type == VM_MEM_SRC_SHARED_HUGETLB)
+			memfd_flags |= MFD_HUGETLB;
+
+		memfd = kvm_create_memfd(memslot_size, memfd_flags);
 	}
+	region->fd = memfd;
+
+	mmap_size = align_up(memslot_size, alignment);
+	vm_mem_region_mmap(region, mmap_size, mmap_flags, memfd, 0);
+	vm_mem_region_install_memory(region, memslot_size, alignment);
 
-	region->backing_src_type = src_type;
+	if (backing_src_should_madvise(src_type) && thp_configured())
+		vm_mem_region_madvise_thp(region, madvise_advice);
+
+	if (backing_src_is_shared(src_type))
+		vm_mem_region_mmap_alias(region, mmap_flags, alignment);
 
 	if (flags & KVM_MEM_GUEST_MEMFD) {
 		if (guest_memfd < 0) {
-			uint32_t guest_memfd_flags = 0;
-			TEST_ASSERT(!guest_memfd_offset,
-				    "Offset must be zero when creating new guest_memfd");
-			guest_memfd = vm_create_guest_memfd(vm, mem_size, guest_memfd_flags);
-		} else {
-			/*
-			 * Install a unique fd for each memslot so that the fd
-			 * can be closed when the region is deleted without
-			 * needing to track if the fd is owned by the framework
-			 * or by the caller.
-			 */
-			guest_memfd = dup(guest_memfd);
-			TEST_ASSERT(guest_memfd >= 0, __KVM_SYSCALL_ERROR("dup()", guest_memfd));
+			TEST_ASSERT(
+				guest_memfd_offset == 0,
+				"Offset must be zero when creating new guest_memfd");
+			guest_memfd = vm_create_guest_memfd(vm, memslot_size, 0);
 		}
 
-		region->region.guest_memfd = guest_memfd;
-		region->region.guest_memfd_offset = guest_memfd_offset;
-	} else {
-		region->region.guest_memfd = -1;
+		vm_mem_region_install_guest_memfd(region, guest_memfd);
 	}
 
-	region->unused_phy_pages = sparsebit_alloc();
-	if (vm_arch_has_protected_memory(vm))
-		region->protected_phy_pages = sparsebit_alloc();
-	sparsebit_set_num(region->unused_phy_pages,
-		guest_paddr >> vm->page_shift, npages);
 	region->region.slot = slot;
 	region->region.flags = flags;
 	region->region.guest_phys_addr = guest_paddr;
-	region->region.memory_size = npages * vm->page_size;
-	region->region.userspace_addr = (uintptr_t) region->host_mem;
-	ret = __vm_ioctl(vm, KVM_SET_USER_MEMORY_REGION2, &region->region);
-	TEST_ASSERT(ret == 0, "KVM_SET_USER_MEMORY_REGION2 IOCTL failed,\n"
-		" rc: %i errno: %i\n"
-		" slot: %u flags: 0x%x\n"
-		" guest_phys_addr: 0x%lx size: 0x%lx guest_memfd: %d",
-		ret, errno, slot, flags,
-		guest_paddr, (uint64_t) region->region.memory_size,
-		region->region.guest_memfd);
-
-	/* Add to quick lookup data structures */
-	vm_userspace_mem_region_gpa_insert(&vm->regions.gpa_tree, region);
-	vm_userspace_mem_region_hva_insert(&vm->regions.hva_tree, region);
-	hash_add(vm->regions.slot_hash, &region->slot_node, slot);
-
-	/* If shared memory, create an alias. */
-	if (region->fd >= 0) {
-		region->mmap_alias = mmap(NULL, region->mmap_size,
-					  PROT_READ | PROT_WRITE,
-					  vm_mem_backing_src_alias(src_type)->flag,
-					  region->fd, 0);
-		TEST_ASSERT(region->mmap_alias != MAP_FAILED,
-			    __KVM_SYSCALL_ERROR("mmap()", (int)(unsigned long)MAP_FAILED));
-
-		/* Align host alias address */
-		region->host_alias = align_ptr_up(region->mmap_alias, alignment);
-	}
+	region->region.guest_memfd_offset = guest_memfd_offset;
+	vm_mem_region_add(vm, region);
 }
 
 void vm_userspace_mem_region_add(struct kvm_vm *vm,
diff --git a/tools/testing/selftests/kvm/lib/test_util.c b/tools/testing/selftests/kvm/lib/test_util.c
index d0a9b5ee0c01..cbcc1e7ad578 100644
--- a/tools/testing/selftests/kvm/lib/test_util.c
+++ b/tools/testing/selftests/kvm/lib/test_util.c
@@ -351,6 +351,31 @@ size_t get_private_mem_backing_src_pagesz(uint32_t i)
 	}
 }
 
+int backing_src_should_madvise(uint32_t i)
+{
+	switch (i) {
+	case VM_MEM_SRC_ANONYMOUS:
+	case VM_MEM_SRC_SHMEM:
+	case VM_MEM_SRC_ANONYMOUS_THP:
+		return true;
+	default:
+		return false;
+	}
+}
+
+int get_backing_src_madvise_advice(uint32_t i)
+{
+	switch (i) {
+	case VM_MEM_SRC_ANONYMOUS:
+	case VM_MEM_SRC_SHMEM:
+		return MADV_NOHUGEPAGE;
+	case VM_MEM_SRC_ANONYMOUS_THP:
+		return MADV_HUGEPAGE;
+	default:
+		return 0;
+	}
+}
+
 bool is_backing_src_hugetlb(uint32_t i)
 {
 	return !!(vm_mem_backing_src_alias(i)->flag & MAP_HUGETLB);
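
The page-size selection in compute_page_size() and the host-address alignment that vm_mem_add() relies on can be illustrated outside the selftest harness. The following is a hypothetical, standalone sketch, not code from this series: sketch_page_size(), sketch_align_up() and sketch_memslot_fits() are illustrative names, and the default hugetlb, THP and base page sizes are passed in as parameters instead of being read from the system as get_def_hugetlb_pagesz(), get_trans_hugepagesz() and getpagesize() do. It assumes Linux/glibc headers that provide MAP_HUGETLB and MADV_HUGEPAGE, and falls back to the kernel's hugetlb size encoding (shift 26, mask 0x3f) if MAP_HUGE_SHIFT or MAP_HUGE_MASK is missing.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Assumption: fall back to the kernel's hugetlb size encoding if libc lacks it. */
#ifndef MAP_HUGE_SHIFT
#define MAP_HUGE_SHIFT 26
#endif
#ifndef MAP_HUGE_MASK
#define MAP_HUGE_MASK 0x3f
#endif

/*
 * Hypothetical stand-in for compute_page_size(): the system page sizes are
 * parameters so the decision logic can run without querying the host.
 */
static size_t sketch_page_size(int mmap_flags, int madvise_advice,
			       size_t def_hugetlb_sz, size_t thp_sz,
			       size_t base_sz)
{
	if (mmap_flags & MAP_HUGETLB) {
		/* Bits 26..31 hold log2 of the requested hugetlb page size. */
		unsigned int log2_sz =
			((unsigned int)mmap_flags >> MAP_HUGE_SHIFT) & MAP_HUGE_MASK;

		/* Zero means no explicit size: use the default hugetlb size. */
		return log2_sz ? (size_t)1 << log2_sz : def_hugetlb_sz;
	}

	/* Without hugetlb, only MADV_HUGEPAGE implies THP-sized pages. */
	return madvise_advice == MADV_HUGEPAGE ? thp_sz : base_sz;
}

/* Round @x up to a power-of-two @alignment, mirroring align_up(). */
static uintptr_t sketch_align_up(uintptr_t x, size_t alignment)
{
	assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
	return (x + alignment - 1) & ~((uintptr_t)alignment - 1);
}

/*
 * Hypothetical check: after host_mem = align_up(mmap_start, alignment), do
 * @memslot_size bytes still lie inside [mmap_start, mmap_start + mmap_size)?
 */
static bool sketch_memslot_fits(uintptr_t mmap_start, size_t mmap_size,
				size_t alignment, size_t memslot_size)
{
	uintptr_t host_mem = sketch_align_up(mmap_start, alignment);

	return host_mem + memslot_size <= mmap_start + mmap_size;
}
```

For example, a 4 MiB memslot in a 4 MiB mapping that starts at a base-page-aligned but not 2 MiB-aligned address does not fit once host_mem is aligned up to 2 MiB, whereas padding the mapping by one extra alignment unit makes it fit. The removed comment in vm_mem_add() gives this as the reason the pre-patch code padded mmap_size for THP (and s390x): mmap() is not guaranteed to return a hugepage-aligned address for THP.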