From patchwork Wed Mar 15 02:17:25 2023
X-Patchwork-Submitter: Anish Moorthy
X-Patchwork-Id: 13175213
Date: Wed, 15 Mar 2023 02:17:25 +0000
In-Reply-To: <20230315021738.1151386-1-amoorthy@google.com>
References: <20230315021738.1151386-1-amoorthy@google.com>
Message-ID: <20230315021738.1151386-2-amoorthy@google.com>
Subject: [WIP Patch v2 01/14] KVM: selftests: Allow many vCPUs and reader threads per UFFD in demand paging test
From: Anish Moorthy
To: seanjc@google.com
Cc: jthoughton@google.com, kvm@vger.kernel.org, Anish Moorthy
X-Mailing-List: kvm@vger.kernel.org

At the moment, demand_paging_test does not support profiling/testing multiple vCPU threads concurrently faulting on a single uffd because

(a) "-u" (run test in userfaultfd mode) creates a uffd for each vCPU's region, so that each uffd services a single vCPU thread.
(b) "-u -o" (userfaultfd mode + overlapped vCPU memory accesses) simply doesn't work: the test tries to register the same memory to multiple uffds, causing an error.
Add support for many vCPUs per UFFD by

(1) Keeping "-u" behavior unchanged.
(2) Making "-u -a" create a single uffd for all of guest memory.
(3) Making "-u -o" implicitly pass "-a", solving the problem in (b).

In cases (2) and (3) all vCPU threads fault on a single uffd. With potentially multiple vCPUs per UFFD, it makes sense to allow configuring the number of reader threads per UFFD as well: add the "-r" flag to do so.

Signed-off-by: Anish Moorthy
Acked-by: James Houghton
---
 .../selftests/kvm/aarch64/page_fault_test.c | 4 +-
 .../selftests/kvm/demand_paging_test.c | 62 +++++++++----
 .../selftests/kvm/include/userfaultfd_util.h | 18 +++-
 .../selftests/kvm/lib/userfaultfd_util.c | 86 +++++++++++++------
 4 files changed, 125 insertions(+), 45 deletions(-)

diff --git a/tools/testing/selftests/kvm/aarch64/page_fault_test.c b/tools/testing/selftests/kvm/aarch64/page_fault_test.c
index df10f1ffa20d9..3b6d228a9340d 100644
--- a/tools/testing/selftests/kvm/aarch64/page_fault_test.c
+++ b/tools/testing/selftests/kvm/aarch64/page_fault_test.c
@@ -376,14 +376,14 @@ static void setup_uffd(struct kvm_vm *vm, struct test_params *p, *pt_uffd = uffd_setup_demand_paging(uffd_mode, 0, pt_args.hva, pt_args.paging_size, - test->uffd_pt_handler); + 1, test->uffd_pt_handler); *data_uffd = NULL; if (test->uffd_data_handler) *data_uffd = uffd_setup_demand_paging(uffd_mode, 0, data_args.hva, data_args.paging_size, - test->uffd_data_handler); + 1, test->uffd_data_handler); } static void free_uffd(struct test_desc *test, struct uffd_desc *pt_uffd,
diff --git a/tools/testing/selftests/kvm/demand_paging_test.c b/tools/testing/selftests/kvm/demand_paging_test.c
index b0e1fc4de9e29..fc9c6ac76660c 100644
--- a/tools/testing/selftests/kvm/demand_paging_test.c
+++ b/tools/testing/selftests/kvm/demand_paging_test.c
@@ -58,7 +58,7 @@ static void vcpu_worker(struct memstress_vcpu_args *vcpu_args) } static int handle_uffd_page_request(int uffd_mode, int uffd, - struct
uffd_msg *msg) { pid_t tid = syscall(__NR_gettid); uint64_t addr = msg->arg.pagefault.address; @@ -77,8 +77,15 @@ static int handle_uffd_page_request(int uffd_mode, int uffd, copy.mode = 0; r = ioctl(uffd, UFFDIO_COPY, &copy); - if (r == -1) { - pr_info("Failed UFFDIO_COPY in 0x%lx from thread %d with errno: %d\n", + /* + * When multiple vCPU threads fault on a single page and there are + * multiple readers for the UFFD, at least one of the UFFDIO_COPYs + * will fail with EEXIST: handle that case without signaling an + * error. + */ + if (r == -1 && errno != EEXIST) { + pr_info( + "Failed UFFDIO_COPY in 0x%lx from thread %d, errno = %d\n", addr, tid, errno); return r; } @@ -89,8 +96,10 @@ static int handle_uffd_page_request(int uffd_mode, int uffd, cont.range.len = demand_paging_size; r = ioctl(uffd, UFFDIO_CONTINUE, &cont); - if (r == -1) { - pr_info("Failed UFFDIO_CONTINUE in 0x%lx from thread %d with errno: %d\n", + /* See the note about EEXISTs in the UFFDIO_COPY branch. */ + if (r == -1 && errno != EEXIST) { + pr_info( + "Failed UFFDIO_CONTINUE in 0x%lx from thread %d, errno = %d\n", addr, tid, errno); return r; } @@ -110,7 +119,9 @@ static int handle_uffd_page_request(int uffd_mode, int uffd, struct test_params { int uffd_mode; + bool single_uffd; useconds_t uffd_delay; + int readers_per_uffd; enum vm_mem_backing_src_type src_type; bool partition_vcpu_memory_access; }; @@ -133,7 +144,8 @@ static void run_test(enum vm_guest_mode mode, void *arg) struct timespec start; struct timespec ts_diff; struct kvm_vm *vm; - int i; + int i, num_uffds = 0; + uint64_t uffd_region_size; vm = memstress_create_vm(mode, nr_vcpus, guest_percpu_mem_size, 1, p->src_type, p->partition_vcpu_memory_access); @@ -146,10 +158,13 @@ static void run_test(enum vm_guest_mode mode, void *arg) memset(guest_data_prototype, 0xAB, demand_paging_size); if (p->uffd_mode) { - uffd_descs = malloc(nr_vcpus * sizeof(struct uffd_desc *)); + num_uffds = p->single_uffd ?
1 : nr_vcpus; + uffd_region_size = nr_vcpus * guest_percpu_mem_size / num_uffds; + + uffd_descs = malloc(num_uffds * sizeof(struct uffd_desc *)); TEST_ASSERT(uffd_descs, "Memory allocation failed"); - for (i = 0; i < nr_vcpus; i++) { + for (i = 0; i < num_uffds; i++) { struct memstress_vcpu_args *vcpu_args; void *vcpu_hva; void *vcpu_alias; @@ -160,8 +175,7 @@ static void run_test(enum vm_guest_mode mode, void *arg) vcpu_hva = addr_gpa2hva(vm, vcpu_args->gpa); vcpu_alias = addr_gpa2alias(vm, vcpu_args->gpa); - prefault_mem(vcpu_alias, - vcpu_args->pages * memstress_args.guest_page_size); + prefault_mem(vcpu_alias, uffd_region_size); /* * Set up user fault fd to handle demand paging @@ -169,7 +183,8 @@ static void run_test(enum vm_guest_mode mode, void *arg) */ uffd_descs[i] = uffd_setup_demand_paging( p->uffd_mode, p->uffd_delay, vcpu_hva, - vcpu_args->pages * memstress_args.guest_page_size, + uffd_region_size, + p->readers_per_uffd, &handle_uffd_page_request); } } @@ -186,7 +201,7 @@ static void run_test(enum vm_guest_mode mode, void *arg) if (p->uffd_mode) { /* Tell the user fault fd handler threads to quit */ - for (i = 0; i < nr_vcpus; i++) + for (i = 0; i < num_uffds; i++) uffd_stop_demand_paging(uffd_descs[i]); } @@ -206,14 +221,19 @@ static void run_test(enum vm_guest_mode mode, void *arg) static void help(char *name) { puts(""); - printf("usage: %s [-h] [-m vm_mode] [-u uffd_mode] [-d uffd_delay_usec]\n" - " [-b memory] [-s type] [-v vcpus] [-o]\n", name); + printf("usage: %s [-h] [-m vm_mode] [-u uffd_mode] [-a]\n" + " [-d uffd_delay_usec] [-r readers_per_uffd] [-b memory]\n" + " [-s type] [-v vcpus] [-o]\n", name); guest_modes_help(); printf(" -u: use userfaultfd to handle vCPU page faults. 
Mode is a\n" " UFFD registration mode: 'MISSING' or 'MINOR'.\n"); + printf(" -a: Use a single userfaultfd for all of guest memory, instead of\n" + " creating one for each region paged by a unique vCPU\n" + " Set implicitly with -o, and no effect without -u.\n"); printf(" -d: add a delay in usec to the User Fault\n" " FD handler to simulate demand paging\n" " overheads. Ignored without -u.\n"); + printf(" -r: Set the number of reader threads per uffd.\n"); printf(" -b: specify the size of the memory region which should be\n" " demand paged by each vCPU. e.g. 10M or 3G.\n" " Default: 1G\n"); @@ -231,12 +251,14 @@ int main(int argc, char *argv[]) struct test_params p = { .src_type = DEFAULT_VM_MEM_SRC, .partition_vcpu_memory_access = true, + .readers_per_uffd = 1, + .single_uffd = false, }; int opt; guest_modes_append_default(); - while ((opt = getopt(argc, argv, "hm:u:d:b:s:v:o")) != -1) { + while ((opt = getopt(argc, argv, "ahom:u:d:b:s:v:r:")) != -1) { switch (opt) { case 'm': guest_modes_cmdline(optarg); @@ -248,6 +270,9 @@ int main(int argc, char *argv[]) p.uffd_mode = UFFDIO_REGISTER_MODE_MINOR; TEST_ASSERT(p.uffd_mode, "UFFD mode must be 'MISSING' or 'MINOR'."); break; + case 'a': + p.single_uffd = true; + break; case 'd': p.uffd_delay = strtoul(optarg, NULL, 0); TEST_ASSERT(p.uffd_delay >= 0, "A negative UFFD delay is not supported."); @@ -265,6 +290,13 @@ int main(int argc, char *argv[]) break; case 'o': p.partition_vcpu_memory_access = false; + p.single_uffd = true; + break; + case 'r': + p.readers_per_uffd = atoi(optarg); + TEST_ASSERT(p.readers_per_uffd >= 1, + "Invalid number of readers per uffd %d: must be >=1", + p.readers_per_uffd); break; case 'h': default: diff --git a/tools/testing/selftests/kvm/include/userfaultfd_util.h b/tools/testing/selftests/kvm/include/userfaultfd_util.h index 877449c345928..92cc1f9ec0686 100644 --- a/tools/testing/selftests/kvm/include/userfaultfd_util.h +++ b/tools/testing/selftests/kvm/include/userfaultfd_util.h @@ -17,18 
+17,30 @@ typedef int (*uffd_handler_t)(int uffd_mode, int uffd, struct uffd_msg *msg); +struct uffd_reader_args { + int uffd_mode; + int uffd; + useconds_t delay; + uffd_handler_t handler; + /* Holds the read end of the pipe for killing the reader. */ + int pipe; +}; + struct uffd_desc { int uffd_mode; int uffd; - int pipefds[2]; useconds_t delay; uffd_handler_t handler; - pthread_t thread; + uint64_t num_readers; + /* Holds the write ends of the pipes for killing the readers. */ + int *pipefds; + pthread_t *readers; + struct uffd_reader_args *reader_args; }; struct uffd_desc *uffd_setup_demand_paging(int uffd_mode, useconds_t delay, void *hva, uint64_t len, - uffd_handler_t handler); + uint64_t num_readers, uffd_handler_t handler); void uffd_stop_demand_paging(struct uffd_desc *uffd); diff --git a/tools/testing/selftests/kvm/lib/userfaultfd_util.c b/tools/testing/selftests/kvm/lib/userfaultfd_util.c index 92cef20902f1f..2723ee1e3e1b2 100644 --- a/tools/testing/selftests/kvm/lib/userfaultfd_util.c +++ b/tools/testing/selftests/kvm/lib/userfaultfd_util.c @@ -27,10 +27,8 @@ static void *uffd_handler_thread_fn(void *arg) { - struct uffd_desc *uffd_desc = (struct uffd_desc *)arg; - int uffd = uffd_desc->uffd; - int pipefd = uffd_desc->pipefds[0]; - useconds_t delay = uffd_desc->delay; + struct uffd_reader_args *reader_args = (struct uffd_reader_args *)arg; + int uffd = reader_args->uffd; int64_t pages = 0; struct timespec start; struct timespec ts_diff; @@ -44,7 +42,7 @@ static void *uffd_handler_thread_fn(void *arg) pollfd[0].fd = uffd; pollfd[0].events = POLLIN; - pollfd[1].fd = pipefd; + pollfd[1].fd = reader_args->pipe; pollfd[1].events = POLLIN; r = poll(pollfd, 2, -1); @@ -92,9 +90,9 @@ static void *uffd_handler_thread_fn(void *arg) if (!(msg.event & UFFD_EVENT_PAGEFAULT)) continue; - if (delay) - usleep(delay); - r = uffd_desc->handler(uffd_desc->uffd_mode, uffd, &msg); + if (reader_args->delay) + usleep(reader_args->delay); + r = 
reader_args->handler(reader_args->uffd_mode, uffd, &msg); if (r < 0) return NULL; pages++; @@ -110,7 +108,7 @@ static void *uffd_handler_thread_fn(void *arg) struct uffd_desc *uffd_setup_demand_paging(int uffd_mode, useconds_t delay, void *hva, uint64_t len, - uffd_handler_t handler) + uint64_t num_readers, uffd_handler_t handler) { struct uffd_desc *uffd_desc; bool is_minor = (uffd_mode == UFFDIO_REGISTER_MODE_MINOR); @@ -118,14 +116,26 @@ struct uffd_desc *uffd_setup_demand_paging(int uffd_mode, useconds_t delay, struct uffdio_api uffdio_api; struct uffdio_register uffdio_register; uint64_t expected_ioctls = ((uint64_t) 1) << _UFFDIO_COPY; - int ret; + int ret, i; PER_PAGE_DEBUG("Userfaultfd %s mode, faults resolved with %s\n", is_minor ? "MINOR" : "MISSING", is_minor ? "UFFDIO_CONINUE" : "UFFDIO_COPY"); uffd_desc = malloc(sizeof(struct uffd_desc)); - TEST_ASSERT(uffd_desc, "malloc failed"); + TEST_ASSERT(uffd_desc, "Failed to malloc uffd descriptor"); + + uffd_desc->pipefds = malloc(sizeof(int) * num_readers); + TEST_ASSERT(uffd_desc->pipefds, "Failed to malloc pipes"); + + uffd_desc->readers = malloc(sizeof(pthread_t) * num_readers); + TEST_ASSERT(uffd_desc->readers, "Failed to malloc reader threads"); + + uffd_desc->reader_args = malloc( + sizeof(struct uffd_reader_args) * num_readers); + TEST_ASSERT(uffd_desc->reader_args, "Failed to malloc reader_args"); + + uffd_desc->num_readers = num_readers; /* In order to get minor faults, prefault via the alias. 
*/ if (is_minor) @@ -148,18 +158,32 @@ struct uffd_desc *uffd_setup_demand_paging(int uffd_mode, useconds_t delay, TEST_ASSERT((uffdio_register.ioctls & expected_ioctls) == expected_ioctls, "missing userfaultfd ioctls"); - ret = pipe2(uffd_desc->pipefds, O_CLOEXEC | O_NONBLOCK); - TEST_ASSERT(!ret, "Failed to set up pipefd"); - uffd_desc->uffd_mode = uffd_mode; uffd_desc->uffd = uffd; uffd_desc->delay = delay; uffd_desc->handler = handler; - pthread_create(&uffd_desc->thread, NULL, uffd_handler_thread_fn, - uffd_desc); - PER_VCPU_DEBUG("Created uffd thread for HVA range [%p, %p)\n", - hva, hva + len); + for (i = 0; i < uffd_desc->num_readers; ++i) { + int pipes[2]; + + ret = pipe2((int *) &pipes, O_CLOEXEC | O_NONBLOCK); + TEST_ASSERT(!ret, "Failed to set up pipefd %i for uffd_desc %p", + i, uffd_desc); + + uffd_desc->pipefds[i] = pipes[1]; + + uffd_desc->reader_args[i].uffd_mode = uffd_mode; + uffd_desc->reader_args[i].uffd = uffd; + uffd_desc->reader_args[i].delay = delay; + uffd_desc->reader_args[i].handler = handler; + uffd_desc->reader_args[i].pipe = pipes[0]; + + pthread_create(&uffd_desc->readers[i], NULL, uffd_handler_thread_fn, + &uffd_desc->reader_args[i]); + + PER_VCPU_DEBUG("Created uffd thread %i for HVA range [%p, %p)\n", + i, hva, hva + len); + } return uffd_desc; } @@ -167,19 +191,31 @@ struct uffd_desc *uffd_setup_demand_paging(int uffd_mode, useconds_t delay, void uffd_stop_demand_paging(struct uffd_desc *uffd) { char c = 0; - int ret; + int i, ret; - ret = write(uffd->pipefds[1], &c, 1); - TEST_ASSERT(ret == 1, "Unable to write to pipefd"); + for (i = 0; i < uffd->num_readers; ++i) { + ret = write(uffd->pipefds[i], &c, 1); + TEST_ASSERT( + ret == 1, "Unable to write to pipefd %i for uffd_desc %p", i, uffd); + } - ret = pthread_join(uffd->thread, NULL); - TEST_ASSERT(ret == 0, "Pthread_join failed."); + for (i = 0; i < uffd->num_readers; ++i) { + ret = pthread_join(uffd->readers[i], NULL); + TEST_ASSERT( + ret == 0, + "Pthread_join failed on 
reader thread %i for uffd_desc %p", i, uffd); } close(uffd->uffd); - close(uffd->pipefds[1]); - close(uffd->pipefds[0]); + for (i = 0; i < uffd->num_readers; ++i) { + close(uffd->pipefds[i]); + close(uffd->reader_args[i].pipe); + } + free(uffd->pipefds); + free(uffd->readers); + free(uffd->reader_args); free(uffd); }

From patchwork Wed Mar 15 02:17:26 2023
X-Patchwork-Submitter: Anish Moorthy
X-Patchwork-Id: 13175212
Date: Wed, 15 Mar 2023 02:17:26 +0000
In-Reply-To: <20230315021738.1151386-1-amoorthy@google.com>
References: <20230315021738.1151386-1-amoorthy@google.com>
Message-ID: <20230315021738.1151386-3-amoorthy@google.com>
Subject: [WIP Patch v2 02/14] KVM: selftests: Use EPOLL in userfaultfd_util reader threads and signal errors via TEST_ASSERT
From: Anish Moorthy
To: seanjc@google.com
Cc: jthoughton@google.com, kvm@vger.kernel.org, Anish Moorthy
X-Mailing-List: kvm@vger.kernel.org

With multiple reader threads POLLing a single UFFD, the test suffers from the thundering herd problem: performance degrades as the number of
reader threads is increased. Solve this issue [1] by switching the polling mechanism to EPOLL + EPOLLEXCLUSIVE.

Also, change the error-handling convention of uffd_handler_thread_fn. Instead of just printing errors and returning early from the polling loop, check for them via TEST_ASSERT. "return NULL" is reserved for a successful exit from uffd_handler_thread_fn, i.e. one triggered by a write to the exit pipe.

Performance samples generated by the command in [2] are given below.

Num Reader Threads, Paging Rate (POLL), Paging Rate (EPOLL)
1       249k    185k
2       201k    235k
4       186k    155k
16      150k    217k
32      89k     198k

[1] Single-vCPU performance does suffer somewhat.
[2] ./demand_paging_test -u MINOR -s shmem -v 4 -o -r

Signed-off-by: Anish Moorthy
Acked-by: James Houghton
---
 .../selftests/kvm/demand_paging_test.c | 1 -
 .../selftests/kvm/lib/userfaultfd_util.c | 76 +++++++++----------
 2 files changed, 37 insertions(+), 40 deletions(-)

diff --git a/tools/testing/selftests/kvm/demand_paging_test.c b/tools/testing/selftests/kvm/demand_paging_test.c
index fc9c6ac76660c..f8c1831614a9d 100644
--- a/tools/testing/selftests/kvm/demand_paging_test.c
+++ b/tools/testing/selftests/kvm/demand_paging_test.c
@@ -13,7 +13,6 @@ #include #include #include -#include #include #include #include
diff --git a/tools/testing/selftests/kvm/lib/userfaultfd_util.c b/tools/testing/selftests/kvm/lib/userfaultfd_util.c
index 2723ee1e3e1b2..863840d340105 100644
--- a/tools/testing/selftests/kvm/lib/userfaultfd_util.c
+++ b/tools/testing/selftests/kvm/lib/userfaultfd_util.c
@@ -16,6 +16,7 @@ #include #include #include +#include #include #include "kvm_util.h" @@ -32,60 +33,56 @@ static void *uffd_handler_thread_fn(void *arg) int64_t pages = 0; struct timespec start; struct timespec ts_diff; + int epollfd; + struct epoll_event evt; + + epollfd = epoll_create(1); + TEST_ASSERT(epollfd >= 0, "Failed to create epollfd."); + + evt.events = EPOLLIN | EPOLLEXCLUSIVE; + evt.data.u32 = 0; + TEST_ASSERT(epoll_ctl(epollfd,
EPOLL_CTL_ADD, uffd, &evt) == 0, + "Failed to add uffd to epollfd"); + + evt.events = EPOLLIN; + evt.data.u32 = 1; + TEST_ASSERT(epoll_ctl(epollfd, EPOLL_CTL_ADD, reader_args->pipe, &evt) == 0, + "Failed to add pipe to epollfd"); clock_gettime(CLOCK_MONOTONIC, &start); while (1) { struct uffd_msg msg; - struct pollfd pollfd[2]; - char tmp_chr; int r; - pollfd[0].fd = uffd; - pollfd[0].events = POLLIN; - pollfd[1].fd = reader_args->pipe; - pollfd[1].events = POLLIN; - - r = poll(pollfd, 2, -1); - switch (r) { - case -1: - pr_info("poll err"); - continue; - case 0: - continue; - case 1: - break; - default: - pr_info("Polling uffd returned %d", r); - return NULL; - } + r = epoll_wait(epollfd, &evt, 1, -1); + TEST_ASSERT( + r == 1, + "Unexpected number of events (%d) returned by epoll, errno = %d", + r, errno); - if (pollfd[0].revents & POLLERR) { - pr_info("uffd revents has POLLERR"); - return NULL; - } + if (evt.data.u32 == 1) { + char tmp_chr; - if (pollfd[1].revents & POLLIN) { - r = read(pollfd[1].fd, &tmp_chr, 1); + TEST_ASSERT(!(evt.events & (EPOLLERR | EPOLLHUP)), + "Reader thread received EPOLLERR or EPOLLHUP on pipe."); + r = read(reader_args->pipe, &tmp_chr, 1); TEST_ASSERT(r == 1, - "Error reading pipefd in UFFD thread\n"); + "Error reading pipefd in uffd reader thread"); return NULL; } - if (!(pollfd[0].revents & POLLIN)) - continue; + TEST_ASSERT(!(evt.events & (EPOLLERR | EPOLLHUP)), + "Reader thread received EPOLLERR or EPOLLHUP on uffd."); r = read(uffd, &msg, sizeof(msg)); if (r == -1) { - if (errno == EAGAIN) - continue; - pr_info("Read of uffd got errno %d\n", errno); - return NULL; + TEST_ASSERT(errno == EAGAIN, + "Error reading from UFFD: errno = %d", errno); + continue; } - if (r != sizeof(msg)) { - pr_info("Read on uffd returned unexpected size: %d bytes", r); - return NULL; - } + TEST_ASSERT(r == sizeof(msg), + "Read on uffd returned unexpected number of bytes (%d)", r); if (!(msg.event & UFFD_EVENT_PAGEFAULT)) continue; @@ -93,8 +90,9 @@ 
static void *uffd_handler_thread_fn(void *arg) if (reader_args->delay) usleep(reader_args->delay); r = reader_args->handler(reader_args->uffd_mode, uffd, &msg); - if (r < 0) - return NULL; + TEST_ASSERT( + r >= 0, + "Reader thread handler function returned negative value %d", r); pages++; }

From patchwork Wed Mar 15 02:17:27 2023
X-Patchwork-Submitter: Anish Moorthy
X-Patchwork-Id: 13175214
Date: Wed, 15 Mar 2023 02:17:27 +0000
In-Reply-To: <20230315021738.1151386-1-amoorthy@google.com>
References: <20230315021738.1151386-1-amoorthy@google.com>
Message-ID: <20230315021738.1151386-4-amoorthy@google.com>
Subject: [WIP Patch v2 03/14] KVM: Allow hva_pfn_fast to resolve read-only faults.
From: Anish Moorthy
To: seanjc@google.com
Cc: jthoughton@google.com, kvm@vger.kernel.org, Anish Moorthy
X-Mailing-List: kvm@vger.kernel.org

hva_to_pfn_fast currently just fails for read-only faults, which is unnecessary. Instead, try pinning the page without passing FOLL_WRITE. This allows read-only faults to (potentially) be resolved without falling back to slow GUP.
Suggested-by: James Houghton Signed-off-by: Anish Moorthy --- virt/kvm/kvm_main.c | 8 +++----- 1 file changed, 3 insertions(+), 5 deletions(-) diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index d255964ec331e..e38ddda05b261 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -2479,7 +2479,7 @@ static inline int check_user_page_hwpoison(unsigned long addr) } /* - * The fast path to get the writable pfn which will be stored in @pfn, + * The fast path to get the pfn which will be stored in @pfn, * true indicates success, otherwise false is returned. It's also the * only part that runs if we can in atomic context. */ @@ -2487,16 +2487,14 @@ static bool hva_to_pfn_fast(unsigned long addr, bool write_fault, bool *writable, kvm_pfn_t *pfn) { struct page *page[1]; - /* * Fast pin a writable pfn only if it is a write fault request * or the caller allows to map a writable pfn for a read fault * request. */ - if (!(write_fault || writable)) - return false; + unsigned int gup_flags = (write_fault || writable) ? 
FOLL_WRITE : 0; - if (get_user_page_fast_only(addr, FOLL_WRITE, page)) { + if (get_user_page_fast_only(addr, gup_flags, page)) { *pfn = page_to_pfn(page[0]); if (writable) From patchwork Wed Mar 15 02:17:28 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Anish Moorthy X-Patchwork-Id: 13175216 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id C5952C6FD1D for ; Wed, 15 Mar 2023 02:18:13 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230423AbjCOCSM (ORCPT ); Tue, 14 Mar 2023 22:18:12 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:43040 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229633AbjCOCSH (ORCPT ); Tue, 14 Mar 2023 22:18:07 -0400 Received: from mail-yw1-x114a.google.com (mail-yw1-x114a.google.com [IPv6:2607:f8b0:4864:20::114a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6C21C23655 for ; Tue, 14 Mar 2023 19:18:05 -0700 (PDT) Received: by mail-yw1-x114a.google.com with SMTP id 00721157ae682-5419fb7d6c7so81793007b3.11 for ; Tue, 14 Mar 2023 19:18:05 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; t=1678846684; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=UKZCXRQdouW4lFONkCtjqZEO064FzEr/Crc/c+grCEc=; b=IG2HUkYVekK6hu1A/87Fasmd0q4In1YsdAzk3ZA8TLvBsBeAvPmTZKhAfgL0627G8t BF6GXB9ZNKYX59VvBfLmHj+qMxvYtHgYmhuIr1X2DKnVjwszEtFd5czYna/0uKwl1nbR 9LUgcobH1h80vq9JWomchvusCOpAz2Jj2tkfjnSXsmveSIDY+xjV55Ll91MMpW4PtRor gzvhzfCDsULSgBrfHiNYWB8xjT80Yn3woCoWhrTQGvSLlr0jETAxS2Vko6O7nbShlvEu VNio9LhvSGs3wRW2e5nj6h/hMjLKrQamNKp4+zkBZU8wnFWvKgqbUXCM8qaDRIlzrnjM YMTg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; 
Date: Wed, 15 Mar 2023 02:17:28 +0000
In-Reply-To: <20230315021738.1151386-1-amoorthy@google.com>
Message-ID: <20230315021738.1151386-5-amoorthy@google.com>
Subject: [WIP Patch v2 04/14] KVM: x86: Add KVM_CAP_X86_MEMORY_FAULT_EXIT and associated kvm_run field
From: Anish Moorthy
To: seanjc@google.com
Cc: jthoughton@google.com, kvm@vger.kernel.org, Anish Moorthy
X-Mailing-List: kvm@vger.kernel.org

Memory fault exits allow KVM to return useful information from KVM_RUN
instead of having to return -EFAULT when a guest memory access goes wrong.
Document the intent and API of the new capability, and introduce helper
functions which will be useful in places where it needs to be implemented.
Also allow the capability to be enabled, even though that won't currently
*do* anything: implementations at the relevant -EFAULT sites will be added
in subsequent commits.
---
 Documentation/virt/kvm/api.rst | 37 ++++++++++++++++++++++++++++++++++
 arch/x86/kvm/x86.c             |  1 +
 include/linux/kvm_host.h       | 16 +++++++++++++++
 include/uapi/linux/kvm.h       | 16 +++++++++++++++
 tools/include/uapi/linux/kvm.h | 15 ++++++++++++++
 virt/kvm/kvm_main.c            | 28 +++++++++++++++++++++++++
 6 files changed, 113 insertions(+)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 62de0768d6aa5..f9ca18bbec879 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -6636,6 +6636,19 @@ array field represents return values. The userspace should update the return
 values of SBI call before resuming the VCPU. For more details on RISC-V SBI
 spec refer, https://github.com/riscv/riscv-sbi-doc.
 
+::
+
+		/* KVM_EXIT_MEMORY_FAULT */
+		struct {
+			__u64 flags;
+			__u64 gpa;
+			__u64 len; /* in bytes */
+		} memory_fault;
+
+Indicates a memory fault on the guest physical address range [gpa, gpa + len).
+flags is a bitfield describing the reason(s) for the fault. See
+KVM_CAP_X86_MEMORY_FAULT_EXIT for more details.
+
 ::
 
 		/* KVM_EXIT_NOTIFY */
@@ -7669,6 +7682,30 @@ This capability is aimed to mitigate the threat that malicious VMs can
 cause CPU stuck (due to event windows don't open up) and make the CPU
 unavailable to host or other VMs.
 
+7.34 KVM_CAP_X86_MEMORY_FAULT_EXIT
+----------------------------------
+
+:Architectures: x86
+:Parameters: args[0] is a bitfield specifying what reasons to exit upon.
+:Returns: 0 on success, -EINVAL if unsupported or if unrecognized exit reason
+          specified.
+
+This capability transforms -EFAULTs returned by KVM_RUN in response to guest
+memory accesses into VM exits (KVM_EXIT_MEMORY_FAULT), with 'gpa' and 'len'
+describing the problematic range of memory and 'flags' describing the reason(s)
+for the fault.
+
+The implementation is currently incomplete. Please notify the maintainers if
+you come across a case where it needs to be implemented.
+
+Through args[0], the capability can be set on a per-exit-reason basis.
+Currently, the only exit reasons supported are
+
+1. KVM_MEMFAULT_REASON_UNKNOWN (1 << 0)
+
+Memory fault exits with a reason of UNKNOWN should not be depended upon: they
+may be added, removed, or reclassified under a stable reason.
+
 8. Other capabilities.
 ======================

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index f706621c35b86..b3c1b2f57e680 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4425,6 +4425,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 	case KVM_CAP_VAPIC:
 	case KVM_CAP_ENABLE_CAP:
 	case KVM_CAP_VM_DISABLE_NX_HUGE_PAGES:
+	case KVM_CAP_X86_MEMORY_FAULT_EXIT:
 		r = 1;
 		break;
 	case KVM_CAP_EXIT_HYPERCALL:

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 8ada23756b0ec..d3ccfead73e42 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -805,6 +805,7 @@ struct kvm {
 	struct notifier_block pm_notifier;
 #endif
 	char stats_id[KVM_STATS_NAME_SIZE];
+	uint64_t memfault_exit_reasons;
 };
 
 #define kvm_err(fmt, ...) \
@@ -2278,4 +2279,19 @@ static inline void kvm_account_pgtable_pages(void *virt, int nr)
 /* Max number of entries allowed for each kvm dirty ring */
 #define KVM_DIRTY_RING_MAX_ENTRIES  65536
 
+/*
+ * If memory fault exits are enabled for any of the reasons given in exit_flags
+ * then sets up a KVM_EXIT_MEMORY_FAULT for the given guest physical address,
+ * length, and flags and returns -1.
+ * Otherwise, returns -EFAULT
+ */
+inline int kvm_memfault_exit_or_efault(
+	struct kvm_vcpu *vcpu, uint64_t gpa, uint64_t len, uint64_t exit_flags);
+
+/*
+ * Checks that all of the bits specified in 'reasons' correspond to known
+ * memory fault exit reasons.
+ */
+bool kvm_memfault_exit_flags_valid(uint64_t reasons);
+
 #endif

diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index d77aef872a0a0..0ba1d7f01346e 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -264,6 +264,7 @@ struct kvm_xen_exit {
 #define KVM_EXIT_RISCV_SBI        35
 #define KVM_EXIT_RISCV_CSR        36
 #define KVM_EXIT_NOTIFY           37
+#define KVM_EXIT_MEMORY_FAULT     38
 
 /* For KVM_EXIT_INTERNAL_ERROR */
 /* Emulate instruction failed. */
@@ -505,6 +506,17 @@ struct kvm_run {
 #define KVM_NOTIFY_CONTEXT_INVALID	(1 << 0)
 			__u32 flags;
 		} notify;
+		/* KVM_EXIT_MEMORY_FAULT */
+		struct {
+			/*
+			 * Indicates a memory fault on the guest physical address range
+			 * [gpa, gpa + len). flags is a bitfield describing the reason(s)
+			 * for the fault.
+			 */
+			__u64 flags;
+			__u64 gpa;
+			__u64 len; /* in bytes */
+		} memory_fault;
 		/* Fix the size of the union. */
 		char padding[256];
 	};
@@ -1184,6 +1196,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_S390_PROTECTED_ASYNC_DISABLE 224
 #define KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP 225
 #define KVM_CAP_PMU_EVENT_MASKED_EVENTS 226
+#define KVM_CAP_X86_MEMORY_FAULT_EXIT 227
 
 #ifdef KVM_CAP_IRQ_ROUTING
@@ -2237,4 +2250,7 @@ struct kvm_s390_zpci_op {
 /* flags for kvm_s390_zpci_op->u.reg_aen.flags */
 #define KVM_S390_ZPCIOP_REGAEN_HOST (1 << 0)
 
+/* Exit reasons for KVM_EXIT_MEMORY_FAULT */
+#define KVM_MEMFAULT_REASON_UNKNOWN (1 << 0)
+
 #endif /* __LINUX_KVM_H */

diff --git a/tools/include/uapi/linux/kvm.h b/tools/include/uapi/linux/kvm.h
index 55155e262646e..2b468345f25c3 100644
--- a/tools/include/uapi/linux/kvm.h
+++ b/tools/include/uapi/linux/kvm.h
@@ -264,6 +264,7 @@ struct kvm_xen_exit {
 #define KVM_EXIT_RISCV_SBI        35
 #define KVM_EXIT_RISCV_CSR        36
 #define KVM_EXIT_NOTIFY           37
+#define KVM_EXIT_MEMORY_FAULT     38
 
 /* For KVM_EXIT_INTERNAL_ERROR */
 /* Emulate instruction failed.
 */
@@ -505,6 +506,17 @@ struct kvm_run {
 #define KVM_NOTIFY_CONTEXT_INVALID	(1 << 0)
 			__u32 flags;
 		} notify;
+		/* KVM_EXIT_MEMORY_FAULT */
+		struct {
+			/*
+			 * Indicates a memory fault on the guest physical address range
+			 * [gpa, gpa + len). flags is a bitfield describing the reason(s)
+			 * for the fault.
+			 */
+			__u64 flags;
+			__u64 gpa;
+			__u64 len; /* in bytes */
+		} memory_fault;
 		/* Fix the size of the union. */
 		char padding[256];
 	};
@@ -2228,4 +2240,7 @@ struct kvm_s390_zpci_op {
 /* flags for kvm_s390_zpci_op->u.reg_aen.flags */
 #define KVM_S390_ZPCIOP_REGAEN_HOST (1 << 0)
 
+/* Exit reasons for KVM_EXIT_MEMORY_FAULT */
+#define KVM_MEMFAULT_REASON_UNKNOWN (1 << 0)
+
 #endif /* __LINUX_KVM_H */

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index e38ddda05b261..00aec43860ff1 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1142,6 +1142,7 @@ static struct kvm *kvm_create_vm(unsigned long type, const char *fdname)
 	spin_lock_init(&kvm->mn_invalidate_lock);
 	rcuwait_init(&kvm->mn_memslots_update_rcuwait);
 	xa_init(&kvm->vcpu_array);
+	kvm->memfault_exit_reasons = 0;
 
 	INIT_LIST_HEAD(&kvm->gpc_list);
 	spin_lock_init(&kvm->gpc_lock);
@@ -4671,6 +4672,14 @@ static int kvm_vm_ioctl_enable_cap_generic(struct kvm *kvm,
 
 		return r;
 	}
+	case KVM_CAP_X86_MEMORY_FAULT_EXIT: {
+		if (!kvm_vm_ioctl_check_extension(kvm, KVM_CAP_X86_MEMORY_FAULT_EXIT))
+			return -EINVAL;
+		else if (!kvm_memfault_exit_flags_valid(cap->args[0]))
+			return -EINVAL;
+		kvm->memfault_exit_reasons = cap->args[0];
+		return 0;
+	}
 	default:
 		return kvm_vm_ioctl_enable_cap(kvm, cap);
 	}
@@ -6172,3 +6181,22 @@ int kvm_vm_create_worker_thread(struct kvm *kvm, kvm_vm_thread_fn_t thread_fn,
 
 	return init_context.err;
 }
+
+inline int kvm_memfault_exit_or_efault(
+	struct kvm_vcpu *vcpu, uint64_t gpa, uint64_t len, uint64_t exit_flags)
+{
+	if (!(vcpu->kvm->memfault_exit_reasons & exit_flags))
+		return -EFAULT;
+	vcpu->run->exit_reason = KVM_EXIT_MEMORY_FAULT;
+	vcpu->run->memory_fault.gpa = gpa;
+	vcpu->run->memory_fault.len = len;
+	vcpu->run->memory_fault.flags = exit_flags;
+	return -1;
+}
+
+bool kvm_memfault_exit_flags_valid(uint64_t reasons)
+{
+	uint64_t valid_flags = KVM_MEMFAULT_REASON_UNKNOWN;
+
+	return !(reasons & ~valid_flags);
+}

From patchwork Wed Mar 15 02:17:29 2023
X-Patchwork-Submitter: Anish Moorthy
X-Patchwork-Id: 13175215
Date: Wed, 15 Mar 2023 02:17:29 +0000
In-Reply-To: <20230315021738.1151386-1-amoorthy@google.com>
Message-ID: <20230315021738.1151386-6-amoorthy@google.com>
Subject: [WIP Patch v2 05/14] KVM: x86: Implement memory fault exit for direct_map
From: Anish Moorthy
To: seanjc@google.com
Cc: jthoughton@google.com, kvm@vger.kernel.org, Anish Moorthy
X-Mailing-List: kvm@vger.kernel.org

TODO: The return value of this function is ignored in
kvm_arch_async_page_ready. Make sure that the side effects of
kvm_memfault_exit_or_efault are acceptable there.
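[Editorial note, not part of the patch] Once exits like the one below reach userspace, the VMM has to turn kvm_run.memory_fault into a range it can populate before re-entering KVM_RUN. A minimal sketch of that step follows; the struct is a simplified stand-in for the new kvm_run.memory_fault field (not the real uapi layout), and the 4 KiB guest page size is an assumption:

```c
#include <stdint.h>

/* Simplified stand-in for kvm_run.memory_fault (illustrative, not the uapi). */
struct memory_fault {
	uint64_t flags;
	uint64_t gpa;
	uint64_t len;	/* in bytes */
};

#define GUEST_PAGE_SIZE 4096ULL	/* assumption: 4 KiB guest pages */

/*
 * Page-align the faulting range reported by KVM before handing it to the
 * populate mechanism. Returns the aligned start address; *out_len receives
 * the aligned length in bytes.
 */
uint64_t fault_range_to_populate(const struct memory_fault *mf,
				 uint64_t *out_len)
{
	uint64_t start = mf->gpa & ~(GUEST_PAGE_SIZE - 1);
	uint64_t end = (mf->gpa + mf->len + GUEST_PAGE_SIZE - 1) &
		       ~(GUEST_PAGE_SIZE - 1);

	*out_len = end - start;
	return start;
}
```

A VMM would then apply MADV_POPULATE_WRITE or UFFDIO_COPY to the host mapping backing the returned range and re-enter KVM_RUN to retry the access.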
---
 arch/x86/kvm/mmu/mmu.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index c8ebe542c565f..0b02e2c360c08 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3193,7 +3193,10 @@ static int direct_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 	}
 
 	if (WARN_ON_ONCE(it.level != fault->goal_level))
-		return -EFAULT;
+		return kvm_memfault_exit_or_efault(
+			vcpu, fault->gfn * PAGE_SIZE,
+			KVM_PAGES_PER_HPAGE(fault->goal_level) * PAGE_SIZE,
+			KVM_MEMFAULT_REASON_UNKNOWN);
 
 	ret = mmu_set_spte(vcpu, fault->slot, it.sptep, ACC_ALL,
 			   base_gfn, fault->pfn, fault);

From patchwork Wed Mar 15 02:17:30 2023
X-Patchwork-Submitter: Anish Moorthy
X-Patchwork-Id: 13175217
Date: Wed, 15 Mar 2023 02:17:30 +0000
In-Reply-To: <20230315021738.1151386-1-amoorthy@google.com>
Message-ID: <20230315021738.1151386-7-amoorthy@google.com>
Subject: [WIP Patch v2 06/14] KVM: x86: Implement memory fault exit for kvm_handle_page_fault
From: Anish Moorthy
To:
seanjc@google.com
Cc: jthoughton@google.com, kvm@vger.kernel.org, Anish Moorthy
X-Mailing-List: kvm@vger.kernel.org

---
 arch/x86/kvm/mmu/mmu.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 0b02e2c360c08..5e0140db384f6 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4375,7 +4375,9 @@ int kvm_handle_page_fault(struct kvm_vcpu *vcpu, u64 error_code,
 #ifndef CONFIG_X86_64
 	/* A 64-bit CR2 should be impossible on 32-bit KVM. */
 	if (WARN_ON_ONCE(fault_address >> 32))
-		return -EFAULT;
+		return kvm_memfault_exit_or_efault(
+			vcpu, fault_address, PAGE_SIZE,
+			KVM_MEMFAULT_REASON_UNKNOWN);
 #endif
 
 	vcpu->arch.l1tf_flush_l1d = true;

From patchwork Wed Mar 15 02:17:31 2023
X-Patchwork-Submitter: Anish Moorthy
X-Patchwork-Id: 13175218
Date: Wed, 15 Mar 2023 02:17:31 +0000
In-Reply-To: <20230315021738.1151386-1-amoorthy@google.com>
Message-ID: <20230315021738.1151386-8-amoorthy@google.com>
Subject: [WIP Patch v2 07/14] KVM:
x86: Implement memory fault exit for setup_vmgexit_scratch
From: Anish Moorthy
To: seanjc@google.com
Cc: jthoughton@google.com, kvm@vger.kernel.org, Anish Moorthy
X-Mailing-List: kvm@vger.kernel.org

---
 arch/x86/kvm/svm/sev.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index c25aeb550cd97..c042d385350de 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -2683,7 +2683,9 @@ static int setup_vmgexit_scratch(struct vcpu_svm *svm, bool sync, u64 len)
 		pr_err("vmgexit: kvm_read_guest for scratch area failed\n");
 		kvfree(scratch_va);
-		return -EFAULT;
+		return kvm_memfault_exit_or_efault(
+			&svm->vcpu, scratch_gpa_beg, len,
+			KVM_MEMFAULT_REASON_UNKNOWN);
 	}
 
 	/*

From patchwork Wed Mar 15 02:17:32 2023
X-Patchwork-Submitter: Anish Moorthy
X-Patchwork-Id: 13175219
Date: Wed, 15 Mar 2023 02:17:32 +0000
In-Reply-To: <20230315021738.1151386-1-amoorthy@google.com>
Message-ID: <20230315021738.1151386-9-amoorthy@google.com>
Subject: [WIP Patch v2 08/14] KVM: x86: Implement memory fault exit for FNAME(fetch)
From: Anish Moorthy
To: seanjc@google.com
Cc: jthoughton@google.com, kvm@vger.kernel.org, Anish Moorthy
X-Mailing-List: kvm@vger.kernel.org

---
 arch/x86/kvm/mmu/paging_tmpl.h | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index 57f0b75c80f9d..ed996dccc03bf 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -717,7 +717,9 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
 	}
 
 	if (WARN_ON_ONCE(it.level != fault->goal_level))
-		return -EFAULT;
+		return kvm_memfault_exit_or_efault(
+			vcpu, fault->gfn * PAGE_SIZE, KVM_PAGES_PER_HPAGE(fault->goal_level) * PAGE_SIZE,
+			KVM_MEMFAULT_REASON_UNKNOWN);
 
 	ret = mmu_set_spte(vcpu, fault->slot, it.sptep, gw->pte_access,
 			   base_gfn, fault->pfn, fault);

From patchwork Wed Mar 15 02:17:33 2023
X-Patchwork-Submitter: Anish Moorthy
X-Patchwork-Id: 13175220
Date: Wed, 15 Mar 2023 02:17:33 +0000
In-Reply-To: <20230315021738.1151386-1-amoorthy@google.com>
References:
<20230315021738.1151386-1-amoorthy@google.com>
Message-ID: <20230315021738.1151386-10-amoorthy@google.com>
Subject: [WIP Patch v2 09/14] KVM: Introduce KVM_CAP_MEMORY_FAULT_NOWAIT without implementation
From: Anish Moorthy
To: seanjc@google.com
Cc: jthoughton@google.com, kvm@vger.kernel.org, Anish Moorthy
X-Mailing-List: kvm@vger.kernel.org

Add documentation, memslot flags, useful helper functions, and the actual new
capability itself.

Memory fault exits on absent mappings are particularly useful for
userfaultfd-based live migration postcopy. When many vCPUs fault upon a
single userfaultfd the faults can take a while to surface to userspace due to
having to contend for uffd wait queue locks. Bypassing the uffd entirely by
triggering a vCPU exit avoids this contention and can improve the fault rate
by as much as 10x.
---
 Documentation/virt/kvm/api.rst | 37 +++++++++++++++++++++++++++++++---
 include/linux/kvm_host.h       |  6 ++++++
 include/uapi/linux/kvm.h       |  3 +++
 tools/include/uapi/linux/kvm.h |  2 ++
 virt/kvm/kvm_main.c            |  7 ++++++-
 5 files changed, 51 insertions(+), 4 deletions(-)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index f9ca18bbec879..4932c0f62eb3d 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -1312,6 +1312,7 @@ yet and must be cleared on entry.
   /* for kvm_userspace_memory_region::flags */
   #define KVM_MEM_LOG_DIRTY_PAGES	(1UL << 0)
   #define KVM_MEM_READONLY	(1UL << 1)
+  #define KVM_MEM_ABSENT_MAPPING_FAULT (1UL << 2)
 
 This ioctl allows the user to create, modify or delete a guest physical
 memory slot. Bits 0-15 of "slot" specify the slot id and this value
@@ -1342,12 +1343,15 @@ It is recommended that the lower 21 bits of guest_phys_addr and userspace_addr
 be identical. This allows large pages in the guest to be backed by large
 pages in the host.
-The flags field supports two flags: KVM_MEM_LOG_DIRTY_PAGES and
-KVM_MEM_READONLY. The former can be set to instruct KVM to keep track of
+The flags field supports three flags:
+
+1. KVM_MEM_LOG_DIRTY_PAGES: can be set to instruct KVM to keep track of
 writes to memory within the slot. See KVM_GET_DIRTY_LOG ioctl to know how to
-use it. The latter can be set, if KVM_CAP_READONLY_MEM capability allows it,
+use it.
+2. KVM_MEM_READONLY: can be set, if KVM_CAP_READONLY_MEM capability allows it,
 to make a new slot read-only. In this case, writes to this memory will be
 posted to userspace as KVM_EXIT_MMIO exits.
+3. KVM_MEM_ABSENT_MAPPING_FAULT: see KVM_CAP_MEMORY_FAULT_NOWAIT for details.
 
 When the KVM_CAP_SYNC_MMU capability is available, changes in the backing of
 the memory region are automatically reflected into the guest. For example, an
@@ -7702,10 +7706,37 @@ Through args[0], the capability can be set on a per-exit-reason basis.
 Currently, the only exit reasons supported are
 
 1. KVM_MEMFAULT_REASON_UNKNOWN (1 << 0)
+2. KVM_MEMFAULT_REASON_ABSENT_MAPPING (1 << 1)
 
 Memory fault exits with a reason of UNKNOWN should not be depended upon: they
 may be added, removed, or reclassified under a stable reason.
 
+7.35 KVM_CAP_MEMORY_FAULT_NOWAIT
+--------------------------------
+
+:Architectures: x86, arm64
+:Returns: -EINVAL.
+
+The presence of this capability indicates that userspace may pass the
+KVM_MEM_ABSENT_MAPPING_FAULT flag to KVM_SET_USER_MEMORY_REGION to cause
+KVM_RUN to populate 'kvm_run.memory_fault' and exit to userspace (*) in
+response to page faults for which the userspace page tables do not contain
+present mappings. Attempting to enable the capability directly will fail.
+
+The 'gpa' and 'len' fields of kvm_run.memory_fault will be set to the starting
+address and length (in bytes) of the faulting page. 'flags' will be set to
+KVM_MEMFAULT_REASON_ABSENT_MAPPING.
+ +Userspace should determine how best to make the mapping present, then take +appropriate action. For instance, in the case of absent mappings this might +involve establishing the mapping for the first time via UFFDIO_COPY/CONTINUE or +faulting the mapping in using MADV_POPULATE_READ/WRITE. After establishing the +mapping, userspace can return to KVM to retry the previous memory access. + +(*) NOTE: On x86, KVM_CAP_X86_MEMORY_FAULT_EXIT must be enabled for the +KVM_MEMFAULT_REASON_ABSENT_MAPPING reason: otherwise userspace will only receive +a -EFAULT from KVM_RUN without any useful information. + 8. Other capabilities. ====================== diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index d3ccfead73e42..c28330f25526f 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -593,6 +593,12 @@ static inline bool kvm_slot_dirty_track_enabled(const struct kvm_memory_slot *sl return slot->flags & KVM_MEM_LOG_DIRTY_PAGES; } +static inline bool kvm_slot_fault_on_absent_mapping( + const struct kvm_memory_slot *slot) +{ + return slot->flags & KVM_MEM_ABSENT_MAPPING_FAULT; +} + static inline unsigned long kvm_dirty_bitmap_bytes(struct kvm_memory_slot *memslot) { return ALIGN(memslot->npages, BITS_PER_LONG) / 8; } diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index 0ba1d7f01346e..2146b27cdd61a 100644 --- a/include/uapi/linux/kvm.h +++ b/include/uapi/linux/kvm.h @@ -102,6 +102,7 @@ struct kvm_userspace_memory_region { */ #define KVM_MEM_LOG_DIRTY_PAGES (1UL << 0) #define KVM_MEM_READONLY (1UL << 1) +#define KVM_MEM_ABSENT_MAPPING_FAULT (1UL << 2) /* for KVM_IRQ_LINE */ struct kvm_irq_level { @@ -1197,6 +1198,7 @@ struct kvm_ppc_resize_hpt { #define KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP 225 #define KVM_CAP_PMU_EVENT_MASKED_EVENTS 226 #define KVM_CAP_X86_MEMORY_FAULT_EXIT 227 +#define KVM_CAP_MEMORY_FAULT_NOWAIT 228 #ifdef KVM_CAP_IRQ_ROUTING @@ -2252,5 +2254,6 @@ struct kvm_s390_zpci_op { /* Exit reasons for
KVM_EXIT_MEMORY_FAULT */ #define KVM_MEMFAULT_REASON_UNKNOWN (1 << 0) +#define KVM_MEMFAULT_REASON_ABSENT_MAPPING (1 << 1) #endif /* __LINUX_KVM_H */ diff --git a/tools/include/uapi/linux/kvm.h b/tools/include/uapi/linux/kvm.h index 2b468345f25c3..1a1707d9f442a 100644 --- a/tools/include/uapi/linux/kvm.h +++ b/tools/include/uapi/linux/kvm.h @@ -102,6 +102,7 @@ struct kvm_userspace_memory_region { */ #define KVM_MEM_LOG_DIRTY_PAGES (1UL << 0) #define KVM_MEM_READONLY (1UL << 1) +#define KVM_MEM_ABSENT_MAPPING_FAULT (1UL << 2) /* for KVM_IRQ_LINE */ struct kvm_irq_level { @@ -2242,5 +2243,6 @@ struct kvm_s390_zpci_op { /* Exit reasons for KVM_EXIT_MEMORY_FAULT */ #define KVM_MEMFAULT_REASON_UNKNOWN (1 << 0) +#define KVM_MEMFAULT_REASON_ABSENT_MAPPING (1 << 1) #endif /* __LINUX_KVM_H */ diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 00aec43860ff1..aa3b59410a356 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -1525,6 +1525,9 @@ static int check_memory_region_flags(const struct kvm_userspace_memory_region *m valid_flags |= KVM_MEM_READONLY; #endif + if (kvm_vm_ioctl_check_extension(NULL, KVM_CAP_MEMORY_FAULT_NOWAIT)) + valid_flags |= KVM_MEM_ABSENT_MAPPING_FAULT; + if (mem->flags & ~valid_flags) return -EINVAL; @@ -6196,7 +6199,9 @@ inline int kvm_memfault_exit_or_efault( bool kvm_memfault_exit_flags_valid(uint64_t reasons) { - uint64_t valid_flags = KVM_MEMFAULT_REASON_UNKNOWN; + uint64_t valid_flags + = KVM_MEMFAULT_REASON_UNKNOWN + | KVM_MEMFAULT_REASON_ABSENT_MAPPING; return !(reasons & ~valid_flags); } From patchwork Wed Mar 15 02:17:34 2023 X-Patchwork-Submitter: Anish Moorthy X-Patchwork-Id: 13175221
Date: Wed, 15 Mar 2023 02:17:34 +0000 In-Reply-To: <20230315021738.1151386-1-amoorthy@google.com> Message-ID: <20230315021738.1151386-11-amoorthy@google.com> Subject: [WIP Patch v2 10/14] KVM: x86: Implement KVM_CAP_MEMORY_FAULT_NOWAIT From: Anish Moorthy To: seanjc@google.com Cc: jthoughton@google.com, kvm@vger.kernel.org, Anish Moorthy When a memslot has the KVM_MEM_ABSENT_MAPPING_FAULT flag set, exit to userspace upon encountering a page fault for which the userspace page tables do not contain a present mapping.
--- arch/x86/kvm/mmu/mmu.c | 33 +++++++++++++++++++++++++-------- arch/x86/kvm/x86.c | 1 + 2 files changed, 26 insertions(+), 8 deletions(-) diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 5e0140db384f6..68bc4ab2bd942 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -3214,7 +3214,9 @@ static void kvm_send_hwpoison_signal(struct kvm_memory_slot *slot, gfn_t gfn) send_sig_mceerr(BUS_MCEERR_AR, (void __user *)hva, PAGE_SHIFT, current); } -static int kvm_handle_error_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault) +static int kvm_handle_error_pfn( + struct kvm_vcpu *vcpu, struct kvm_page_fault *fault, + bool faulted_on_absent_mapping) { if (is_sigpending_pfn(fault->pfn)) { kvm_handle_signal_exit(vcpu); @@ -3234,7 +3236,11 @@ static int kvm_handle_error_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fa return RET_PF_RETRY; } - return -EFAULT; + return kvm_memfault_exit_or_efault( + vcpu, fault->gfn * PAGE_SIZE, PAGE_SIZE, + faulted_on_absent_mapping + ? KVM_MEMFAULT_REASON_ABSENT_MAPPING + : KVM_MEMFAULT_REASON_UNKNOWN); } static int kvm_handle_noslot_fault(struct kvm_vcpu *vcpu, @@ -4209,7 +4215,9 @@ void kvm_arch_async_page_ready(struct kvm_vcpu *vcpu, struct kvm_async_pf *work) kvm_mmu_do_page_fault(vcpu, work->cr2_or_gpa, 0, true); } -static int __kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault) +static int __kvm_faultin_pfn( + struct kvm_vcpu *vcpu, struct kvm_page_fault *fault, + bool fault_on_absent_mapping) { struct kvm_memory_slot *slot = fault->slot; bool async; @@ -4242,9 +4250,15 @@ static int __kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault } async = false; - fault->pfn = __gfn_to_pfn_memslot(slot, fault->gfn, false, false, &async, - fault->write, &fault->map_writable, - &fault->hva); + + fault->pfn = __gfn_to_pfn_memslot( + slot, fault->gfn, + fault_on_absent_mapping, + false, + fault_on_absent_mapping ? 
NULL : &async, + fault->write, &fault->map_writable, + &fault->hva); + if (!async) return RET_PF_CONTINUE; /* *pfn has correct page already */ @@ -4274,16 +4288,19 @@ static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault, unsigned int access) { int ret; + bool fault_on_absent_mapping + = likely(fault->slot) && kvm_slot_fault_on_absent_mapping(fault->slot); fault->mmu_seq = vcpu->kvm->mmu_invalidate_seq; smp_rmb(); - ret = __kvm_faultin_pfn(vcpu, fault); + ret = __kvm_faultin_pfn( + vcpu, fault, fault_on_absent_mapping); if (ret != RET_PF_CONTINUE) return ret; if (unlikely(is_error_pfn(fault->pfn))) - return kvm_handle_error_pfn(vcpu, fault); + return kvm_handle_error_pfn(vcpu, fault, fault_on_absent_mapping); if (unlikely(!fault->slot)) return kvm_handle_noslot_fault(vcpu, fault, access); diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index b3c1b2f57e680..41435324b41d7 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -4426,6 +4426,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext) case KVM_CAP_ENABLE_CAP: case KVM_CAP_VM_DISABLE_NX_HUGE_PAGES: case KVM_CAP_X86_MEMORY_FAULT_EXIT: + case KVM_CAP_MEMORY_FAULT_NOWAIT: r = 1; break; case KVM_CAP_EXIT_HYPERCALL: From patchwork Wed Mar 15 02:17:35 2023 X-Patchwork-Submitter: Anish Moorthy X-Patchwork-Id: 13175224
Date: Wed, 15 Mar 2023 02:17:35 +0000 In-Reply-To: <20230315021738.1151386-1-amoorthy@google.com> Message-ID: <20230315021738.1151386-12-amoorthy@google.com> Subject: [WIP Patch v2 11/14] KVM: arm64: Allow user_mem_abort to return 0 to signal a 'normal' exit From: Anish Moorthy To: seanjc@google.com Cc: jthoughton@google.com, kvm@vger.kernel.org, Anish Moorthy kvm_handle_guest_abort currently just returns 1 if user_mem_abort returns 0. Since 1 is the "resume the guest" code, user_mem_abort is essentially incapable of triggering a "normal" exit: it can only trigger exits by returning a negative value, which indicates an error. Remove the "if (ret == 0) ret = 1;" statement from kvm_handle_guest_abort and refactor user_mem_abort slightly to allow it to trigger 'normal' exits by returning 0.
Signed-off-by: Anish Moorthy --- arch/arm64/kvm/mmu.c | 17 +++++++++++------- 1 file changed, 10 insertions(+), 7 deletions(-) diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c index 7113587222ffe..735044859eb25 100644 --- a/arch/arm64/kvm/mmu.c +++ b/arch/arm64/kvm/mmu.c @@ -1190,7 +1190,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, struct kvm_memory_slot *memslot, unsigned long hva, unsigned long fault_status) { - int ret = 0; + int ret = 1; bool write_fault, writable, force_pte = false; bool exec_fault; bool device = false; @@ -1281,8 +1281,10 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, (logging_active && write_fault)) { ret = kvm_mmu_topup_memory_cache(memcache, kvm_mmu_cache_min_pages(kvm)); - if (ret) + if (ret < 0) return ret; + else + ret = 1; } mmu_seq = vcpu->kvm->mmu_invalidate_seq; @@ -1305,7 +1307,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, write_fault, &writable, NULL); if (pfn == KVM_PFN_ERR_HWPOISON) { kvm_send_hwpoison_signal(hva, vma_shift); - return 0; + return 1; } if (is_error_noslot_pfn(pfn)) return -EFAULT; @@ -1387,6 +1389,9 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, KVM_PGTABLE_WALK_HANDLE_FAULT | KVM_PGTABLE_WALK_SHARED); + if (ret == 0) + ret = 1; + /* Mark the page dirty only if the fault is handled successfully */ - if (writable && !ret) { + if (writable && ret == 1) { kvm_set_pfn_dirty(pfn); @@ -1397,7 +1402,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, read_unlock(&kvm->mmu_lock); kvm_set_pfn_accessed(pfn); kvm_release_pfn_clean(pfn); - return ret != -EAGAIN ? ret : 0; + return ret != -EAGAIN ? ret : 1; } /* Resolve the access fault by making the page young again.
*/ @@ -1549,8 +1554,6 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu) } ret = user_mem_abort(vcpu, fault_ipa, memslot, hva, fault_status); - if (ret == 0) - ret = 1; out: if (ret == -ENOEXEC) { kvm_inject_pabt(vcpu, kvm_vcpu_get_hfar(vcpu)); From patchwork Wed Mar 15 02:17:36 2023 X-Patchwork-Submitter: Anish Moorthy X-Patchwork-Id: 13175222
Date: Wed, 15 Mar 2023 02:17:36 +0000 In-Reply-To: <20230315021738.1151386-1-amoorthy@google.com> Message-ID: <20230315021738.1151386-13-amoorthy@google.com> Subject: [WIP Patch v2 12/14] KVM: arm64: Implement KVM_CAP_MEMORY_FAULT_NOWAIT From: Anish Moorthy To: seanjc@google.com Cc: jthoughton@google.com, kvm@vger.kernel.org, Anish Moorthy When a memslot has the KVM_MEM_ABSENT_MAPPING_FAULT flag set, exit to userspace upon encountering a page fault for which the userspace page tables do not contain a present mapping.
Signed-off-by: Anish Moorthy Acked-by: James Houghton --- arch/arm64/kvm/arm.c | 1 + arch/arm64/kvm/mmu.c | 14 ++++++++++++-- 2 files changed, 13 insertions(+), 2 deletions(-) diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c index 3bd732eaf0872..f8337e757c777 100644 --- a/arch/arm64/kvm/arm.c +++ b/arch/arm64/kvm/arm.c @@ -220,6 +220,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext) case KVM_CAP_VCPU_ATTRIBUTES: case KVM_CAP_PTP_KVM: case KVM_CAP_ARM_SYSTEM_SUSPEND: + case KVM_CAP_MEMORY_FAULT_NOWAIT: r = 1; break; case KVM_CAP_SET_GUEST_DEBUG2: diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c index 735044859eb25..0d04ffc81f783 100644 --- a/arch/arm64/kvm/mmu.c +++ b/arch/arm64/kvm/mmu.c @@ -1206,6 +1206,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, unsigned long vma_pagesize, fault_granule; enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R; struct kvm_pgtable *pgt; + bool exit_on_memory_fault = kvm_slot_fault_on_absent_mapping(memslot); fault_granule = 1UL << ARM64_HW_PGTABLE_LEVEL_SHIFT(fault_level); write_fault = kvm_is_write_fault(vcpu); @@ -1303,8 +1304,17 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, */ smp_rmb(); - pfn = __gfn_to_pfn_memslot(memslot, gfn, false, false, NULL, - write_fault, &writable, NULL); + pfn = __gfn_to_pfn_memslot( + memslot, gfn, exit_on_memory_fault, false, NULL, + write_fault, &writable, NULL); + + if (exit_on_memory_fault && pfn == KVM_PFN_ERR_FAULT) { + vcpu->run->exit_reason = KVM_EXIT_MEMORY_FAULT; + vcpu->run->memory_fault.flags = KVM_MEMFAULT_REASON_ABSENT_MAPPING; + vcpu->run->memory_fault.gpa = gfn << PAGE_SHIFT; + vcpu->run->memory_fault.len = vma_pagesize; + return 0; + } if (pfn == KVM_PFN_ERR_HWPOISON) { kvm_send_hwpoison_signal(hva, vma_shift); return 1; From patchwork Wed Mar 15 02:17:37 2023 X-Patchwork-Submitter: Anish Moorthy X-Patchwork-Id: 13175223
Date: Wed, 15 Mar 2023 02:17:37 +0000 In-Reply-To: <20230315021738.1151386-1-amoorthy@google.com> Message-ID: <20230315021738.1151386-14-amoorthy@google.com> Subject: [WIP Patch v2 13/14] KVM: selftests: Add memslot_flags parameter to memstress_create_vm From: Anish Moorthy To: seanjc@google.com Cc: jthoughton@google.com, kvm@vger.kernel.org, Anish Moorthy Memslot flags aren't currently exposed to the tests, and are just always set to 0. Add a parameter to allow tests to manually set those flags.
Signed-off-by: Anish Moorthy --- tools/testing/selftests/kvm/access_tracking_perf_test.c | 2 +- tools/testing/selftests/kvm/demand_paging_test.c | 6 ++++-- tools/testing/selftests/kvm/dirty_log_perf_test.c | 2 +- tools/testing/selftests/kvm/include/memstress.h | 2 +- tools/testing/selftests/kvm/lib/memstress.c | 4 ++-- .../selftests/kvm/memslot_modification_stress_test.c | 2 +- 6 files changed, 10 insertions(+), 8 deletions(-) diff --git a/tools/testing/selftests/kvm/access_tracking_perf_test.c b/tools/testing/selftests/kvm/access_tracking_perf_test.c index 3c7defd34f567..b51656b408b83 100644 --- a/tools/testing/selftests/kvm/access_tracking_perf_test.c +++ b/tools/testing/selftests/kvm/access_tracking_perf_test.c @@ -306,7 +306,7 @@ static void run_test(enum vm_guest_mode mode, void *arg) struct kvm_vm *vm; int nr_vcpus = params->nr_vcpus; - vm = memstress_create_vm(mode, nr_vcpus, params->vcpu_memory_bytes, 1, + vm = memstress_create_vm(mode, nr_vcpus, params->vcpu_memory_bytes, 1, 0, params->backing_src, !overlap_memory_access); memstress_start_vcpu_threads(nr_vcpus, vcpu_thread_main); diff --git a/tools/testing/selftests/kvm/demand_paging_test.c b/tools/testing/selftests/kvm/demand_paging_test.c index f8c1831614a9d..607cd2846e39c 100644 --- a/tools/testing/selftests/kvm/demand_paging_test.c +++ b/tools/testing/selftests/kvm/demand_paging_test.c @@ -146,8 +146,10 @@ static void run_test(enum vm_guest_mode mode, void *arg) int i, num_uffds = 0; uint64_t uffd_region_size; - vm = memstress_create_vm(mode, nr_vcpus, guest_percpu_mem_size, 1, - p->src_type, p->partition_vcpu_memory_access); + vm = memstress_create_vm( + mode, nr_vcpus, guest_percpu_mem_size, + 1, 0, + p->src_type, p->partition_vcpu_memory_access); demand_paging_size = get_backing_src_pagesz(p->src_type); diff --git a/tools/testing/selftests/kvm/dirty_log_perf_test.c b/tools/testing/selftests/kvm/dirty_log_perf_test.c index e9d6d1aecf89c..6c8749193cfa4 100644 --- 
a/tools/testing/selftests/kvm/dirty_log_perf_test.c +++ b/tools/testing/selftests/kvm/dirty_log_perf_test.c @@ -224,7 +224,7 @@ static void run_test(enum vm_guest_mode mode, void *arg) int i; vm = memstress_create_vm(mode, nr_vcpus, guest_percpu_mem_size, - p->slots, p->backing_src, + p->slots, 0, p->backing_src, p->partition_vcpu_memory_access); pr_info("Random seed: %u\n", p->random_seed); diff --git a/tools/testing/selftests/kvm/include/memstress.h b/tools/testing/selftests/kvm/include/memstress.h index 72e3e358ef7bd..1cba965d2d331 100644 --- a/tools/testing/selftests/kvm/include/memstress.h +++ b/tools/testing/selftests/kvm/include/memstress.h @@ -56,7 +56,7 @@ struct memstress_args { extern struct memstress_args memstress_args; struct kvm_vm *memstress_create_vm(enum vm_guest_mode mode, int nr_vcpus, - uint64_t vcpu_memory_bytes, int slots, + uint64_t vcpu_memory_bytes, int slots, uint32_t slot_flags, enum vm_mem_backing_src_type backing_src, bool partition_vcpu_memory_access); void memstress_destroy_vm(struct kvm_vm *vm); diff --git a/tools/testing/selftests/kvm/lib/memstress.c b/tools/testing/selftests/kvm/lib/memstress.c index 5f1d3173c238c..7589b8cef6911 100644 --- a/tools/testing/selftests/kvm/lib/memstress.c +++ b/tools/testing/selftests/kvm/lib/memstress.c @@ -119,7 +119,7 @@ void memstress_setup_vcpus(struct kvm_vm *vm, int nr_vcpus, } struct kvm_vm *memstress_create_vm(enum vm_guest_mode mode, int nr_vcpus, - uint64_t vcpu_memory_bytes, int slots, + uint64_t vcpu_memory_bytes, int slots, uint32_t slot_flags, enum vm_mem_backing_src_type backing_src, bool partition_vcpu_memory_access) { @@ -207,7 +207,7 @@ struct kvm_vm *memstress_create_vm(enum vm_guest_mode mode, int nr_vcpus, vm_userspace_mem_region_add(vm, backing_src, region_start, MEMSTRESS_MEM_SLOT_INDEX + i, - region_pages, 0); + region_pages, slot_flags); } /* Do mapping for the demand paging memory slot */ diff --git a/tools/testing/selftests/kvm/memslot_modification_stress_test.c 
b/tools/testing/selftests/kvm/memslot_modification_stress_test.c index 9855c41ca811f..0b19ec3ecc9cc 100644 --- a/tools/testing/selftests/kvm/memslot_modification_stress_test.c +++ b/tools/testing/selftests/kvm/memslot_modification_stress_test.c @@ -95,7 +95,7 @@ static void run_test(enum vm_guest_mode mode, void *arg) struct test_params *p = arg; struct kvm_vm *vm; - vm = memstress_create_vm(mode, nr_vcpus, guest_percpu_mem_size, 1, + vm = memstress_create_vm(mode, nr_vcpus, guest_percpu_mem_size, 1, 0, VM_MEM_SRC_ANONYMOUS, p->partition_vcpu_memory_access); From patchwork Wed Mar 15 02:17:38 2023 X-Patchwork-Submitter: Anish Moorthy X-Patchwork-Id: 13175225
Date: Wed, 15 Mar 2023 02:17:38 +0000 In-Reply-To: <20230315021738.1151386-1-amoorthy@google.com> Message-ID: <20230315021738.1151386-15-amoorthy@google.com> Subject: [WIP Patch v2 14/14] KVM: selftests: Handle memory fault exits in demand_paging_test From: Anish Moorthy To: seanjc@google.com Cc: jthoughton@google.com, kvm@vger.kernel.org, Anish Moorthy
X-Mailing-List: kvm@vger.kernel.org

Demonstrate a (very basic) scheme for supporting memory fault exits.

From the vCPU threads:

1. Simply issue UFFDIO_COPY/CONTINUEs in response to memory fault exits,
   with the purpose of establishing the absent mappings. Do so with
   wake_waiters=false to avoid serializing on the userfaultfd wait queue
   locks.

2. When the UFFDIO_COPY/CONTINUE in (1) fails with EEXIST, assume that
   the mapping was already established but is currently absent [A] and
   attempt to populate it using MADV_POPULATE_WRITE.

Issue UFFDIO_COPY/CONTINUEs from the reader threads as well, but with
wake_waiters=true to ensure that any threads sleeping on the uffd are
eventually woken up.

A real VMM would track whether it had already COPY/CONTINUEd pages (e.g.,
via a bitmap) to avoid calls destined to fail with EEXIST. However, even
the naive approach is enough to demonstrate the performance advantages of
KVM_EXIT_MEMORY_FAULT.

[A] In reality it is much likelier that the vCPU thread simply lost a race
to establish the mapping for the page.

Signed-off-by: Anish Moorthy
Acked-by: James Houghton
---
 .../selftests/kvm/demand_paging_test.c        | 220 +++++++++++++-----
 1 file changed, 164 insertions(+), 56 deletions(-)

diff --git a/tools/testing/selftests/kvm/demand_paging_test.c b/tools/testing/selftests/kvm/demand_paging_test.c
index 607cd2846e39c..dce72adcb1632 100644
--- a/tools/testing/selftests/kvm/demand_paging_test.c
+++ b/tools/testing/selftests/kvm/demand_paging_test.c
@@ -15,6 +15,7 @@
 #include
 #include
 #include
+#include
 #include
 #include "kvm_util.h"
@@ -31,6 +32,60 @@ static uint64_t guest_percpu_mem_size = DEFAULT_PER_VCPU_MEM_SIZE;
 static size_t demand_paging_size;
 static char *guest_data_prototype;
+static int num_uffds;
+static size_t uffd_region_size;
+static struct uffd_desc **uffd_descs;
+/*
+ * Delay when demand paging is performed through userfaultfd or directly by
+ * vcpu_worker in the case of a KVM_EXIT_MEMORY_FAULT.
+ */
+static useconds_t uffd_delay;
+static int uffd_mode;
+
+
+static int handle_uffd_page_request(
+	int uffd_mode, int uffd, uint64_t hva, bool is_vcpu
+);
+
+static void madv_write_or_err(uint64_t gpa)
+{
+	int r;
+	void *hva = addr_gpa2hva(memstress_args.vm, gpa);
+
+	r = madvise(hva, demand_paging_size, MADV_POPULATE_WRITE);
+	TEST_ASSERT(
+		r == 0,
+		"MADV_POPULATE_WRITE on hva 0x%lx (gpa 0x%lx) failed with errno %i\n",
+		(uintptr_t) hva, gpa, errno);
+}
+
+static void ready_page(uint64_t gpa)
+{
+	int r, uffd;
+
+	/*
+	 * This test only registers memslot 1 w/ userfaultfd. Any accesses outside
+	 * the registered ranges should fault in the physical pages through
+	 * MADV_POPULATE_WRITE.
+	 */
+	if ((gpa < memstress_args.gpa)
+		|| (gpa >= memstress_args.gpa + memstress_args.size)) {
+		madv_write_or_err(gpa);
+	} else {
+		if (uffd_delay)
+			usleep(uffd_delay);
+
+		uffd = uffd_descs[(gpa - memstress_args.gpa) / uffd_region_size]->uffd;
+
+		r = handle_uffd_page_request(
+			uffd_mode, uffd,
+			(uint64_t) addr_gpa2hva(memstress_args.vm, gpa), true);
+
+		if (r == EEXIST)
+			madv_write_or_err(gpa);
+	}
+}
+
 static void vcpu_worker(struct memstress_vcpu_args *vcpu_args)
 {
 	struct kvm_vcpu *vcpu = vcpu_args->vcpu;
@@ -42,25 +97,37 @@ static void vcpu_worker(struct memstress_vcpu_args *vcpu_args)
 
 	clock_gettime(CLOCK_MONOTONIC, &start);
 
-	/* Let the guest access its memory */
-	ret = _vcpu_run(vcpu);
-	TEST_ASSERT(ret == 0, "vcpu_run failed: %d\n", ret);
-	if (get_ucall(vcpu, NULL) != UCALL_SYNC) {
-		TEST_ASSERT(false,
-			    "Invalid guest sync status: exit_reason=%s\n",
-			    exit_reason_str(run->exit_reason));
-	}
+	while (true) {
+		/* Let the guest access its memory */
+		ret = _vcpu_run(vcpu);
+		TEST_ASSERT(ret == 0 || (run->exit_reason == KVM_EXIT_MEMORY_FAULT),
+			    "vcpu_run failed: %d\n", ret);
+		if (get_ucall(vcpu, NULL) != UCALL_SYNC) {
+
+			if (run->exit_reason == KVM_EXIT_MEMORY_FAULT) {
+				TEST_ASSERT(run->memory_fault.flags == 0,
+					    "Unrecognized flags 0x%llx on memory fault exit",
+					    run->memory_fault.flags);
+				ready_page(run->memory_fault.gpa);
+				continue;
+			}
+
+			TEST_ASSERT(false,
+				    "Invalid guest sync status: exit_reason=%s\n",
+				    exit_reason_str(run->exit_reason));
+		}
 
-	ts_diff = timespec_elapsed(start);
-	PER_VCPU_DEBUG("vCPU %d execution time: %ld.%.9lds\n", vcpu_idx,
-		       ts_diff.tv_sec, ts_diff.tv_nsec);
+		ts_diff = timespec_elapsed(start);
+		PER_VCPU_DEBUG("vCPU %d execution time: %ld.%.9lds\n", vcpu_idx,
+			       ts_diff.tv_sec, ts_diff.tv_nsec);
+		break;
+	}
 }
 
-static int handle_uffd_page_request(int uffd_mode, int uffd,
-				    struct uffd_msg *msg)
+static int handle_uffd_page_request(
+	int uffd_mode, int uffd, uint64_t hva, bool is_vcpu)
 {
 	pid_t tid = syscall(__NR_gettid);
-	uint64_t addr = msg->arg.pagefault.address;
 	struct timespec start;
 	struct timespec ts_diff;
 	int r;
@@ -71,58 +138,81 @@ static int handle_uffd_page_request(int uffd_mode, int uffd,
 		struct uffdio_copy copy;
 
 		copy.src = (uint64_t)guest_data_prototype;
-		copy.dst = addr;
+		copy.dst = hva;
 		copy.len = demand_paging_size;
-		copy.mode = 0;
+		copy.mode = UFFDIO_COPY_MODE_DONTWAKE;
 
-		r = ioctl(uffd, UFFDIO_COPY, &copy);
 		/*
-		 * With multiple vCPU threads fault on a single page and there are
-		 * multiple readers for the UFFD, at least one of the UFFDIO_COPYs
-		 * will fail with EEXIST: handle that case without signaling an
-		 * error.
+		 * With multiple vCPU threads and at least one of multiple reader threads
+		 * or vCPU memory faults, multiple vCPUs accessing an absent page will
+		 * almost certainly cause some thread doing the UFFDIO_COPY here to get
+		 * EEXIST: make sure to allow that case.
 		 */
-		if (r == -1 && errno != EEXIST) {
-			pr_info(
-				"Failed UFFDIO_COPY in 0x%lx from thread %d, errno = %d\n",
-				addr, tid, errno);
-			return r;
-		}
+		r = ioctl(uffd, UFFDIO_COPY, &copy);
+		TEST_ASSERT(
+			r == 0 || errno == EEXIST,
+			"Thread 0x%x failed UFFDIO_COPY on hva 0x%lx, errno = %d",
+			gettid(), hva, errno);
 	} else if (uffd_mode == UFFDIO_REGISTER_MODE_MINOR) {
+		/* The comments in the UFFDIO_COPY branch also apply here. */
 		struct uffdio_continue cont = {0};
 
-		cont.range.start = addr;
+		cont.range.start = hva;
 		cont.range.len = demand_paging_size;
+		cont.mode = UFFDIO_CONTINUE_MODE_DONTWAKE;
 
 		r = ioctl(uffd, UFFDIO_CONTINUE, &cont);
-		/* See the note about EEXISTs in the UFFDIO_COPY branch. */
-		if (r == -1 && errno != EEXIST) {
-			pr_info(
-				"Failed UFFDIO_CONTINUE in 0x%lx from thread %d, errno = %d\n",
-				addr, tid, errno);
-			return r;
-		}
+		TEST_ASSERT(
+			r == 0 || errno == EEXIST,
+			"Thread 0x%x failed UFFDIO_CONTINUE on hva 0x%lx, errno = %d",
+			gettid(), hva, errno);
 	} else {
 		TEST_FAIL("Invalid uffd mode %d", uffd_mode);
 	}
 
+	/*
+	 * If the above UFFDIO_COPY/CONTINUE fails with EEXIST, it will do so without
+	 * waking threads waiting on the UFFD: make sure that happens here.
+	 */
+	if (!is_vcpu) {
+		struct uffdio_range range = {
+			.start = hva,
+			.len = demand_paging_size
+		};
+		r = ioctl(uffd, UFFDIO_WAKE, &range);
+		TEST_ASSERT(
+			r == 0,
+			"Thread 0x%x failed UFFDIO_WAKE on hva 0x%lx, errno = %d",
+			gettid(), hva, errno);
+	}
+
 	ts_diff = timespec_elapsed(start);
 	PER_PAGE_DEBUG("UFFD page-in %d \t%ld ns\n", tid,
 		       timespec_to_ns(ts_diff));
 	PER_PAGE_DEBUG("Paged in %ld bytes at 0x%lx from thread %d\n",
-		       demand_paging_size, addr, tid);
+		       demand_paging_size, hva, tid);
 
 	return 0;
 }
 
+static int handle_uffd_page_request_from_uffd(
+	int uffd_mode, int uffd, struct uffd_msg *msg)
+{
+	TEST_ASSERT(
+		msg->event == UFFD_EVENT_PAGEFAULT,
+		"Received uffd message with event %d != UFFD_EVENT_PAGEFAULT",
+		msg->event);
+	return handle_uffd_page_request(
+		uffd_mode, uffd, msg->arg.pagefault.address, false);
+}
+
 struct test_params {
-	int uffd_mode;
 	bool single_uffd;
-	useconds_t uffd_delay;
 	int readers_per_uffd;
 	enum vm_mem_backing_src_type src_type;
 	bool partition_vcpu_memory_access;
+	bool memfault_exits;
 };
 
 static void prefault_mem(void *alias, uint64_t len)
@@ -139,18 +229,31 @@
 static void run_test(enum vm_guest_mode mode, void *arg)
 {
 	struct test_params *p = arg;
-	struct uffd_desc **uffd_descs = NULL;
 	struct timespec start;
 	struct timespec ts_diff;
 	struct kvm_vm *vm;
-	int i, num_uffds = 0;
-	uint64_t uffd_region_size;
+	int i;
+	uint32_t slot_flags = 0;
+	bool uffd_memfault_exits = uffd_mode && p->memfault_exits;
+
+	if (uffd_memfault_exits) {
+		TEST_ASSERT(kvm_has_cap(KVM_CAP_MEMORY_FAULT_NOWAIT) > 0,
+			    "KVM does not have KVM_CAP_MEMORY_FAULT_NOWAIT");
+		slot_flags = KVM_MEM_ABSENT_MAPPING_FAULT;
+	}
 
 	vm = memstress_create_vm(
 		mode, nr_vcpus, guest_percpu_mem_size,
-		1, 0,
+		1, slot_flags,
 		p->src_type, p->partition_vcpu_memory_access);
 
+	if (uffd_memfault_exits) {
+		if (kvm_has_cap(KVM_CAP_X86_MEMORY_FAULT_EXIT))
+			vm_enable_cap(
+				vm, KVM_CAP_X86_MEMORY_FAULT_EXIT,
+				KVM_MEMFAULT_REASON_ABSENT_MAPPING);
+	}
+
 	demand_paging_size = get_backing_src_pagesz(p->src_type);
 
 	guest_data_prototype = malloc(demand_paging_size);
@@ -158,12 +261,12 @@ static void run_test(enum vm_guest_mode mode, void *arg)
 		   "Failed to allocate buffer for guest data pattern");
 	memset(guest_data_prototype, 0xAB, demand_paging_size);
 
-	if (p->uffd_mode) {
+	if (uffd_mode) {
 		num_uffds = p->single_uffd ? 1 : nr_vcpus;
 		uffd_region_size = nr_vcpus * guest_percpu_mem_size / num_uffds;
 
 		uffd_descs = malloc(num_uffds * sizeof(struct uffd_desc *));
-		TEST_ASSERT(uffd_descs, "Memory allocation failed");
+		TEST_ASSERT(uffd_descs, "Failed to allocate memory of uffd descriptors");
 
 		for (i = 0; i < num_uffds; i++) {
 			struct memstress_vcpu_args *vcpu_args;
@@ -183,10 +286,10 @@ static void run_test(enum vm_guest_mode mode, void *arg)
			 * requests.
			 */
			uffd_descs[i] = uffd_setup_demand_paging(
-				p->uffd_mode, p->uffd_delay, vcpu_hva,
+				uffd_mode, uffd_delay, vcpu_hva,
				uffd_region_size, p->readers_per_uffd,
-				&handle_uffd_page_request);
+				&handle_uffd_page_request_from_uffd);
		}
	}
@@ -200,7 +303,7 @@ static void run_test(enum vm_guest_mode mode, void *arg)
 	ts_diff = timespec_elapsed(start);
 	pr_info("All vCPU threads joined\n");
 
-	if (p->uffd_mode) {
+	if (uffd_mode) {
 		/* Tell the user fault fd handler threads to quit */
 		for (i = 0; i < num_uffds; i++)
 			uffd_stop_demand_paging(uffd_descs[i]);
@@ -215,7 +318,7 @@ static void run_test(enum vm_guest_mode mode, void *arg)
 	memstress_destroy_vm(vm);
 
 	free(guest_data_prototype);
-	if (p->uffd_mode)
+	if (uffd_mode)
 		free(uffd_descs);
 }
@@ -224,7 +327,7 @@ static void help(char *name)
 	puts("");
 	printf("usage: %s [-h] [-m vm_mode] [-u uffd_mode] [-a]\n"
 	       "          [-d uffd_delay_usec] [-r readers_per_uffd] [-b memory]\n"
-	       "          [-s type] [-v vcpus] [-o]\n", name);
+	       "          [-w] [-s type] [-v vcpus] [-o]\n", name);
 	guest_modes_help();
 	printf(" -u: use userfaultfd to handle vCPU page faults. Mode is a\n"
 	       "     UFFD registration mode: 'MISSING' or 'MINOR'.\n");
@@ -235,6 +338,7 @@
 	       "     FD handler to simulate demand paging\n"
 	       "     overheads. Ignored without -u.\n");
 	printf(" -r: Set the number of reader threads per uffd.\n");
+	printf(" -w: Enable kvm cap for memory fault exits.\n");
 	printf(" -b: specify the size of the memory region which should be\n"
 	       "     demand paged by each vCPU. e.g. 10M or 3G.\n"
 	       "     Default: 1G\n");
@@ -254,29 +358,30 @@ int main(int argc, char *argv[])
 		.partition_vcpu_memory_access = true,
 		.readers_per_uffd = 1,
 		.single_uffd = false,
+		.memfault_exits = false,
 	};
 	int opt;
 
 	guest_modes_append_default();
 
-	while ((opt = getopt(argc, argv, "ahom:u:d:b:s:v:r:")) != -1) {
+	while ((opt = getopt(argc, argv, "ahowm:u:d:b:s:v:r:")) != -1) {
 		switch (opt) {
 		case 'm':
 			guest_modes_cmdline(optarg);
 			break;
 		case 'u':
 			if (!strcmp("MISSING", optarg))
-				p.uffd_mode = UFFDIO_REGISTER_MODE_MISSING;
+				uffd_mode = UFFDIO_REGISTER_MODE_MISSING;
 			else if (!strcmp("MINOR", optarg))
-				p.uffd_mode = UFFDIO_REGISTER_MODE_MINOR;
-			TEST_ASSERT(p.uffd_mode, "UFFD mode must be 'MISSING' or 'MINOR'.");
+				uffd_mode = UFFDIO_REGISTER_MODE_MINOR;
+			TEST_ASSERT(uffd_mode, "UFFD mode must be 'MISSING' or 'MINOR'.");
			break;
		case 'a':
			p.single_uffd = true;
			break;
		case 'd':
-			p.uffd_delay = strtoul(optarg, NULL, 0);
-			TEST_ASSERT(p.uffd_delay >= 0, "A negative UFFD delay is not supported.");
+			uffd_delay = strtoul(optarg, NULL, 0);
+			TEST_ASSERT(uffd_delay >= 0, "A negative UFFD delay is not supported.");
			break;
		case 'b':
			guest_percpu_mem_size = parse_size(optarg);
@@ -299,6 +404,9 @@
 			"Invalid number of readers per uffd %d: must be >=1",
 			p.readers_per_uffd);
 			break;
+		case 'w':
+			p.memfault_exits = true;
+			break;
 		case 'h':
 		default:
 			help(argv[0]);
@@ -306,7 +414,7 @@
 		}
 	}
 
-	if (p.uffd_mode == UFFDIO_REGISTER_MODE_MINOR &&
+	if (uffd_mode == UFFDIO_REGISTER_MODE_MINOR &&
 	    !backing_src_is_shared(p.src_type)) {
 		TEST_FAIL("userfaultfd MINOR mode requires shared memory; pick a different -s");
 	}