From patchwork Sun Feb 21 12:04:37 2021
X-Patchwork-Submitter: Elena Afanasova
X-Patchwork-Id: 12097295
From: Elena Afanasova
To: kvm@vger.kernel.org
Cc: stefanha@redhat.com, jag.raman@oracle.com, elena.ufimtseva@oracle.com,
    pbonzini@redhat.com, jasowang@redhat.com, mst@redhat.com,
    cohuck@redhat.com, john.levon@nutanix.com, Elena Afanasova
Subject: [RFC v3 1/5] KVM: add initial support for KVM_SET_IOREGION
Date: Sun, 21 Feb 2021 15:04:37 +0300

This vm ioctl adds or removes an ioregionfd MMIO/PIO region. Guest read
and write accesses are dispatched through the given ioregionfd instead
of returning from ioctl(KVM_RUN).

Signed-off-by: Elena Afanasova
---
v3:
 - add FAST_MMIO bus support
 - add KVM_IOREGION_DEASSIGN flag
 - rename kvm_ioregion read/write file descriptors
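For illustration, a minimal userspace sketch of registering a region with
this ioctl. struct kvm_ioregion, its fields, and KVM_SET_IOREGION are taken
from the patch below; the socketpair() plumbing, the set_ioregion() helper
and vm_fd are illustrative assumptions (and the headers assume a kernel
built with this series), not part of the series itself:

	#include <string.h>
	#include <sys/ioctl.h>
	#include <sys/socket.h>
	#include <linux/kvm.h>

	/* Hypothetical helper: register a 4 KB MMIO ioregion at gpa.
	 * vm_fd is an open KVM VM fd; sv[1] is handed to the device
	 * emulation thread/process, which services guest accesses
	 * over the socket.
	 */
	static int set_ioregion(int vm_fd, __u64 gpa, int sv[2])
	{
		struct kvm_ioregion region;

		if (socketpair(AF_UNIX, SOCK_SEQPACKET, 0, sv) < 0)
			return -1;

		memset(&region, 0, sizeof(region));
		region.guest_paddr = gpa;
		region.memory_size = 0x1000;
		region.user_data = gpa;   /* opaque token echoed back in cmds */
		region.read_fd = sv[0];   /* KVM reads replies from here */
		region.write_fd = sv[0];  /* KVM writes commands here */
		region.flags = 0;         /* MMIO; set KVM_IOREGION_PIO for ports */

		return ioctl(vm_fd, KVM_SET_IOREGION, &region);
	}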
 arch/x86/kvm/Kconfig     |   1 +
 arch/x86/kvm/Makefile    |   1 +
 arch/x86/kvm/x86.c       |   1 +
 include/linux/kvm_host.h |  18 +++
 include/uapi/linux/kvm.h |  25 ++++
 virt/kvm/Kconfig         |   3 +
 virt/kvm/eventfd.c       |  25 ++++
 virt/kvm/eventfd.h       |  14 +++
 virt/kvm/ioregion.c      | 265 +++++++++++++++++++++++++++++++++++++++
 virt/kvm/ioregion.h      |  15 +++
 virt/kvm/kvm_main.c      |  11 ++
 11 files changed, 379 insertions(+)
 create mode 100644 virt/kvm/eventfd.h
 create mode 100644 virt/kvm/ioregion.c
 create mode 100644 virt/kvm/ioregion.h

diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index f92dfd8ef10d..b914ef375199 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -33,6 +33,7 @@ config KVM
 	select HAVE_KVM_IRQ_BYPASS
 	select HAVE_KVM_IRQ_ROUTING
 	select HAVE_KVM_EVENTFD
+	select KVM_IOREGION
 	select KVM_ASYNC_PF
 	select USER_RETURN_NOTIFIER
 	select KVM_MMIO
diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
index b804444e16d4..b3b17dc9f7d4 100644
--- a/arch/x86/kvm/Makefile
+++ b/arch/x86/kvm/Makefile
@@ -12,6 +12,7 @@ KVM := ../../../virt/kvm
 kvm-y			+= $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o \
 				$(KVM)/eventfd.o $(KVM)/irqchip.o $(KVM)/vfio.o
 kvm-$(CONFIG_KVM_ASYNC_PF)	+= $(KVM)/async_pf.o
+kvm-$(CONFIG_KVM_IOREGION)	+= $(KVM)/ioregion.o
 
 kvm-y			+= x86.o emulate.o i8259.o irq.o lapic.o \
 			   i8254.o ioapic.o irq_comm.o cpuid.o pmu.o mtrr.o \
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index e545a8a613b1..ddb28f5ca252 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3739,6 +3739,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 	case KVM_CAP_X86_USER_SPACE_MSR:
 	case KVM_CAP_X86_MSR_FILTER:
 	case KVM_CAP_ENFORCE_PV_FEATURE_CPUID:
+	case KVM_CAP_IOREGIONFD:
 		r = 1;
 		break;
 	case KVM_CAP_SYNC_REGS:
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 7f2e2a09ebbd..f35f0976f5cf 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -470,6 +470,11 @@ struct kvm {
 		struct mutex      resampler_lock;
 	} irqfds;
 	struct list_head ioeventfds;
+#endif
+#ifdef CONFIG_KVM_IOREGION
+	struct list_head ioregions_fast_mmio;
+	struct list_head ioregions_mmio;
+	struct list_head ioregions_pio;
 #endif
 	struct kvm_vm_stat stat;
 	struct kvm_arch arch;
@@ -1262,6 +1267,19 @@ static inline int kvm_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args)
 
 #endif /* CONFIG_HAVE_KVM_EVENTFD */
 
+#ifdef CONFIG_KVM_IOREGION
+void kvm_ioregionfd_init(struct kvm *kvm);
+int kvm_ioregionfd(struct kvm *kvm, struct kvm_ioregion *args);
+
+#else
+
+static inline void kvm_ioregionfd_init(struct kvm *kvm) {}
+static inline int kvm_ioregionfd(struct kvm *kvm, struct kvm_ioregion *args)
+{
+	return -ENOSYS;
+}
+#endif
+
 void kvm_arch_irq_routing_update(struct kvm *kvm);
 
 static inline void kvm_make_request(int req, struct kvm_vcpu *vcpu)
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index ca41220b40b8..a1b1a60571f8 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -732,6 +732,29 @@ struct kvm_ioeventfd {
 	__u8  pad[36];
 };
 
+enum {
+	kvm_ioregion_flag_nr_pio,
+	kvm_ioregion_flag_nr_posted_writes,
+	kvm_ioregion_flag_nr_deassign,
+	kvm_ioregion_flag_nr_max,
+};
+
+#define KVM_IOREGION_PIO (1 << kvm_ioregion_flag_nr_pio)
+#define KVM_IOREGION_POSTED_WRITES (1 << kvm_ioregion_flag_nr_posted_writes)
+#define KVM_IOREGION_DEASSIGN (1 << kvm_ioregion_flag_nr_deassign)
+
+#define KVM_IOREGION_VALID_FLAG_MASK ((1 << kvm_ioregion_flag_nr_max) - 1)
+
+struct kvm_ioregion {
+	__u64 guest_paddr; /* guest physical address */
+	__u64 memory_size; /* bytes */
+	__u64 user_data;
+	__s32 read_fd;
+	__s32 write_fd;
+	__u32 flags;
+	__u8  pad[28];
+};
+
 #define KVM_X86_DISABLE_EXITS_MWAIT          (1 << 0)
 #define KVM_X86_DISABLE_EXITS_HLT            (1 << 1)
 #define KVM_X86_DISABLE_EXITS_PAUSE          (1 << 2)
@@ -1053,6 +1076,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_X86_USER_SPACE_MSR 188
 #define KVM_CAP_X86_MSR_FILTER 189
 #define KVM_CAP_ENFORCE_PV_FEATURE_CPUID 190
+#define KVM_CAP_IOREGIONFD 191
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -1308,6 +1332,7 @@ struct kvm_vfio_spapr_tce {
 					struct kvm_userspace_memory_region)
 #define KVM_SET_TSS_ADDR          _IO(KVMIO,   0x47)
 #define KVM_SET_IDENTITY_MAP_ADDR _IOW(KVMIO,  0x48, __u64)
+#define KVM_SET_IOREGION          _IOW(KVMIO,  0x49, struct kvm_ioregion)
 
 /* enable ucontrol for s390 */
 struct kvm_s390_ucas_mapping {
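As a usage note on the flags above: deassignment reuses the same ioctl. A
hedged sketch (the clear_ioregion() helper and the field values are
illustrative; per the kernel code below, only guest_paddr, memory_size and
the bus implied by the flags are matched on removal):

	#include <sys/ioctl.h>
	#include <linux/kvm.h>

	/* Tear down a previously registered MMIO ioregion. guest_paddr
	 * and memory_size must equal the values used at assignment time.
	 */
	static int clear_ioregion(int vm_fd, __u64 gpa)
	{
		struct kvm_ioregion region = {
			.guest_paddr = gpa,
			.memory_size = 0x1000,
			.flags = KVM_IOREGION_DEASSIGN,
		};

		return ioctl(vm_fd, KVM_SET_IOREGION, &region);
	}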
diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
index 1c37ccd5d402..5e6620bbf000 100644
--- a/virt/kvm/Kconfig
+++ b/virt/kvm/Kconfig
@@ -17,6 +17,9 @@ config HAVE_KVM_EVENTFD
 	bool
 	select EVENTFD
 
+config KVM_IOREGION
+	bool
+
 config KVM_MMIO
 	bool
 
diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
index c2323c27a28b..aadb73903f8b 100644
--- a/virt/kvm/eventfd.c
+++ b/virt/kvm/eventfd.c
@@ -27,6 +27,7 @@
 #include
 #include
+#include "ioregion.h"
 
 #ifdef CONFIG_HAVE_KVM_IRQFD
 
@@ -755,6 +756,23 @@ static const struct kvm_io_device_ops ioeventfd_ops = {
 	.destructor = ioeventfd_destructor,
 };
 
+#ifdef CONFIG_KVM_IOREGION
+/* assumes kvm->slots_lock held */
+bool kvm_eventfd_collides(struct kvm *kvm, int bus_idx,
+			  u64 start, u64 size)
+{
+	struct _ioeventfd *_p;
+
+	list_for_each_entry(_p, &kvm->ioeventfds, list)
+		if (_p->bus_idx == bus_idx &&
+		    overlap(start, size, _p->addr,
+			    !_p->length ? 8 : _p->length))
+			return true;
+
+	return false;
+}
+#endif
+
 /* assumes kvm->slots_lock held */
 static bool
 ioeventfd_check_collision(struct kvm *kvm, struct _ioeventfd *p)
@@ -770,6 +788,13 @@ ioeventfd_check_collision(struct kvm *kvm, struct _ioeventfd *p)
 		       _p->datamatch == p->datamatch))))
 			return true;
 
+#ifdef CONFIG_KVM_IOREGION
+	if (p->bus_idx == KVM_MMIO_BUS || p->bus_idx == KVM_PIO_BUS)
+		if (kvm_ioregion_collides(kvm, p->bus_idx, p->addr,
+					  !p->length ? 8 : p->length))
+			return true;
+#endif
+
 	return false;
 }
 
diff --git a/virt/kvm/eventfd.h b/virt/kvm/eventfd.h
new file mode 100644
index 000000000000..73a621eebae3
--- /dev/null
+++ b/virt/kvm/eventfd.h
@@ -0,0 +1,14 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef __KVM_EVENTFD_H__
+#define __KVM_EVENTFD_H__
+
+#ifdef CONFIG_KVM_IOREGION
+bool kvm_eventfd_collides(struct kvm *kvm, int bus_idx, u64 start, u64 size);
+#else
+static inline bool
+kvm_eventfd_collides(struct kvm *kvm, int bus_idx, u64 start, u64 size)
+{
+	return false;
+}
+#endif
+#endif
diff --git a/virt/kvm/ioregion.c b/virt/kvm/ioregion.c
new file mode 100644
index 000000000000..e09ef3e2c9d7
--- /dev/null
+++ b/virt/kvm/ioregion.c
@@ -0,0 +1,265 @@
+// SPDX-License-Identifier: GPL-2.0-only
+#include
+#include
+#include
+#include "eventfd.h"
+
+void
+kvm_ioregionfd_init(struct kvm *kvm)
+{
+	INIT_LIST_HEAD(&kvm->ioregions_fast_mmio);
+	INIT_LIST_HEAD(&kvm->ioregions_mmio);
+	INIT_LIST_HEAD(&kvm->ioregions_pio);
+}
+
+struct ioregion {
+	struct list_head list;
+	u64 paddr;  /* guest physical address */
+	u64 size;   /* size in bytes */
+	struct file *rf;
+	struct file *wf;
+	u64 user_data; /* opaque token used by userspace */
+	struct kvm_io_device dev;
+	bool posted_writes;
+};
+
+static inline struct ioregion *
+to_ioregion(struct kvm_io_device *dev)
+{
+	return container_of(dev, struct ioregion, dev);
+}
+
+/* assumes kvm->slots_lock held */
+static void
+ioregion_release(struct ioregion *p)
+{
+	if (p->rf)
+		fput(p->rf);
+	fput(p->wf);
+	list_del(&p->list);
+	kfree(p);
+}
+
+static int
+ioregion_read(struct kvm_vcpu *vcpu, struct kvm_io_device *this, gpa_t addr,
+	      int len, void *val)
+{
+	return -EOPNOTSUPP;
+}
+
+static int
+ioregion_write(struct kvm_vcpu *vcpu, struct kvm_io_device *this, gpa_t addr,
+	       int len, const void *val)
+{
+	return -EOPNOTSUPP;
+}
+
+/*
+ * This function is called as KVM is completely shutting down. We do not
+ * need to worry about locking; just nuke anything we have as quickly as
+ * possible.
+ */
+static void
+ioregion_destructor(struct kvm_io_device *this)
+{
+	struct ioregion *p = to_ioregion(this);
+
+	ioregion_release(p);
+}
+
+static const struct kvm_io_device_ops ioregion_ops = {
+	.read = ioregion_read,
+	.write = ioregion_write,
+	.destructor = ioregion_destructor,
+};
+
+static inline struct list_head *
+get_ioregion_list(struct kvm *kvm, enum kvm_bus bus_idx)
+{
+	if (bus_idx == KVM_FAST_MMIO_BUS)
+		return &kvm->ioregions_fast_mmio;
+	if (bus_idx == KVM_MMIO_BUS)
+		return &kvm->ioregions_mmio;
+	/* KVM_PIO_BUS; callers never pass any other bus */
+	return &kvm->ioregions_pio;
+}
+
+/* returns true if [start1, start1+size1) overlaps [start2, start2+size2) */
+inline bool
+overlap(u64 start1, u64 size1, u64 start2, u64 size2)
+{
+	u64 end1 = start1 + size1 - 1;
+	u64 end2 = start2 + size2 - 1;
+
+	return !(end1 < start2 || start1 > end2);
+}
+
+/* assumes kvm->slots_lock held */
+bool
+kvm_ioregion_collides(struct kvm *kvm, int bus_idx,
+		      u64 start, u64 size)
+{
+	struct ioregion *p;
+	struct list_head *ioregions = get_ioregion_list(kvm, bus_idx);
+
+	list_for_each_entry(p, ioregions, list)
+		if (overlap(start, size, p->paddr, !p->size ? 8 : p->size))
+			return true;
+
+	return false;
+}
+
+/* assumes kvm->slots_lock held */
+static bool
+ioregion_collision(struct kvm *kvm, struct ioregion *p, enum kvm_bus bus_idx)
+{
+	if (kvm_ioregion_collides(kvm, bus_idx, p->paddr,
+				  !p->size ? 8 : p->size) ||
+	    kvm_eventfd_collides(kvm, bus_idx, p->paddr,
+				 !p->size ? 8 : p->size))
+		return true;
+
+	return false;
+}
+
+static enum kvm_bus
+get_bus_from_flags(__u32 flags)
+{
+	if (flags & KVM_IOREGION_PIO)
+		return KVM_PIO_BUS;
+	return KVM_MMIO_BUS;
+}
+
+int
+kvm_set_ioregion_idx(struct kvm *kvm, struct kvm_ioregion *args, enum kvm_bus bus_idx)
+{
+	struct ioregion *p;
+	struct file *rfile = NULL, *wfile;
+	int ret = 0;
+
+	wfile = fget(args->write_fd);
+	if (!wfile)
+		return -EBADF;
+	if (args->memory_size) {
+		rfile = fget(args->read_fd);
+		if (!rfile) {
+			fput(wfile);
+			return -EBADF;
+		}
+	}
+	p = kzalloc(sizeof(*p), GFP_KERNEL_ACCOUNT);
+	if (!p) {
+		ret = -ENOMEM;
+		goto fail;
+	}
+
+	INIT_LIST_HEAD(&p->list);
+	p->paddr = args->guest_paddr;
+	p->size = args->memory_size;
+	p->user_data = args->user_data;
+	p->rf = rfile;
+	p->wf = wfile;
+	p->posted_writes = args->flags & KVM_IOREGION_POSTED_WRITES;
+
+	mutex_lock(&kvm->slots_lock);
+
+	if (ioregion_collision(kvm, p, bus_idx)) {
+		ret = -EEXIST;
+		goto unlock_fail;
+	}
+	kvm_iodevice_init(&p->dev, &ioregion_ops);
+	ret = kvm_io_bus_register_dev(kvm, bus_idx, p->paddr, p->size,
+				      &p->dev);
+	if (ret < 0)
+		goto unlock_fail;
+	list_add_tail(&p->list, get_ioregion_list(kvm, bus_idx));
+
+	mutex_unlock(&kvm->slots_lock);
+
+	return 0;
+
+unlock_fail:
+	mutex_unlock(&kvm->slots_lock);
+	kfree(p);
+fail:
+	if (rfile)
+		fput(rfile);
+	fput(wfile);
+
+	return ret;
+}
+
+static int
+kvm_rm_ioregion_idx(struct kvm *kvm, struct kvm_ioregion *args, enum kvm_bus bus_idx)
+{
+	struct ioregion *p, *tmp;
+	int ret = -ENOENT;
+	struct list_head *ioregions = get_ioregion_list(kvm, bus_idx);
+
+	mutex_lock(&kvm->slots_lock);
+
+	list_for_each_entry_safe(p, tmp, ioregions, list) {
+		if (p->paddr == args->guest_paddr &&
+		    p->size == args->memory_size) {
+			kvm_io_bus_unregister_dev(kvm, bus_idx, &p->dev);
+			ioregion_release(p);
+			ret = 0;
+			break;
+		}
+	}
+
+	mutex_unlock(&kvm->slots_lock);
+
+	return ret;
+}
+
+static int
+kvm_set_ioregion(struct kvm *kvm, struct kvm_ioregion *args)
+{
+	int ret;
+	enum kvm_bus bus_idx = get_bus_from_flags(args->flags);
+
+	/* check for range overflow */
+	if (args->guest_paddr + args->memory_size < args->guest_paddr)
+		return -EINVAL;
+	/* If size is ignored only posted writes are allowed */
+	if (!args->memory_size && !(args->flags & KVM_IOREGION_POSTED_WRITES))
+		return -EINVAL;
+
+	ret = kvm_set_ioregion_idx(kvm, args, bus_idx);
+	if (ret)
+		return ret;
+
+	/* If size is ignored, MMIO is also put on a FAST_MMIO bus */
+	if (!args->memory_size && bus_idx == KVM_MMIO_BUS)
+		ret = kvm_set_ioregion_idx(kvm, args, KVM_FAST_MMIO_BUS);
+	if (ret) {
+		kvm_rm_ioregion_idx(kvm, args, bus_idx);
+		return ret;
+	}
+
+	return 0;
+}
+
+static int
+kvm_rm_ioregion(struct kvm *kvm, struct kvm_ioregion *args)
+{
+	enum kvm_bus bus_idx = get_bus_from_flags(args->flags);
+	int ret = kvm_rm_ioregion_idx(kvm, args, bus_idx);
+
+	if (!args->memory_size && bus_idx == KVM_MMIO_BUS)
+		kvm_rm_ioregion_idx(kvm, args, KVM_FAST_MMIO_BUS);
+
+	return ret;
+}
+
+int
+kvm_ioregionfd(struct kvm *kvm, struct kvm_ioregion *args)
+{
+	if (args->flags & ~KVM_IOREGION_VALID_FLAG_MASK)
+		return -EINVAL;
+
+	if (args->flags & KVM_IOREGION_DEASSIGN)
+		return kvm_rm_ioregion(kvm, args);
+
+	return kvm_set_ioregion(kvm, args);
+}
diff --git a/virt/kvm/ioregion.h b/virt/kvm/ioregion.h
new file mode 100644
index 000000000000..23ffa812ec7a
--- /dev/null
+++ b/virt/kvm/ioregion.h
@@ -0,0 +1,15 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef __KVM_IOREGION_H__
+#define __KVM_IOREGION_H__
+
+#ifdef CONFIG_KVM_IOREGION
+bool overlap(u64 start1, u64 size1, u64 start2, u64 size2);
+bool kvm_ioregion_collides(struct kvm *kvm, int bus_idx, u64 start, u64 size);
+#else
+static inline bool
+kvm_ioregion_collides(struct kvm *kvm, int bus_idx, u64 start, u64 size)
+{
+	return false;
+}
+#endif
+#endif
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 2541a17ff1c4..88b92fc3da51 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -747,6 +747,7 @@ static struct kvm *kvm_create_vm(unsigned long type)
 	mmgrab(current->mm);
 	kvm->mm = current->mm;
 	kvm_eventfd_init(kvm);
+	kvm_ioregionfd_init(kvm);
 	mutex_init(&kvm->lock);
 	mutex_init(&kvm->irq_lock);
 	mutex_init(&kvm->slots_lock);
@@ -3708,6 +3709,16 @@ static long kvm_vm_ioctl(struct file *filp,
 		r = kvm_vm_ioctl_set_memory_region(kvm, &kvm_userspace_mem);
 		break;
 	}
+	case KVM_SET_IOREGION: {
+		struct kvm_ioregion data;
+
+		r = -EFAULT;
+		if (copy_from_user(&data, argp, sizeof(data)))
+			goto out;
+
+		r = kvm_ioregionfd(kvm, &data);
+		break;
+	}
 	case KVM_GET_DIRTY_LOG: {
 		struct kvm_dirty_log log;

From patchwork Sun Feb 21 12:04:38 2021
X-Patchwork-Submitter: Elena Afanasova
X-Patchwork-Id: 12097299
From: Elena Afanasova
To: kvm@vger.kernel.org
Cc: stefanha@redhat.com, jag.raman@oracle.com, elena.ufimtseva@oracle.com,
    pbonzini@redhat.com, jasowang@redhat.com, mst@redhat.com,
    cohuck@redhat.com, john.levon@nutanix.com, Elena Afanasova
Subject: [RFC v3 2/5] KVM: x86: add support for ioregionfd signal handling
Date: Sun, 21 Feb 2021 15:04:38 +0300
Message-Id: <575df1656277c55f26e660b7274a7c570b448636.1613828727.git.eafanasova@gmail.com>

The vCPU thread may receive a signal during ioregionfd communication. In
that case ioctl(KVM_RUN) needs to return to userspace, and the next
ioctl(KVM_RUN) must resume the interrupted ioregionfd transaction.

Signed-off-by: Elena Afanasova
---
v3:
 - add FAST_MMIO bus support
 - move ioregion_interrupted flag to ioregion_ctx
 - reorder ioregion_ctx fields
 - rework complete_ioregion operations
 - add signal handling support for crossing a page boundary case
 - fix kvm_arch_vcpu_ioctl_run() should return -EINTR in case
   ioregionfd is interrupted
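To make the userspace contract concrete: when a signal interrupts an
in-flight ioregionfd access, KVM_RUN fails with EINTR (exit reason
KVM_EXIT_INTR) and the next KVM_RUN re-enters complete_userspace_io to
finish the saved transaction. A hedged sketch of the run-loop shape this
implies; the vcpu_loop() helper and surrounding setup are illustrative
assumptions, not from the series:

	#include <errno.h>
	#include <sys/ioctl.h>
	#include <linux/kvm.h>

	/* Illustrative vCPU run loop; vcpu_fd and run come from the
	 * usual KVM_CREATE_VCPU + mmap(KVM_GET_VCPU_MMAP_SIZE) setup.
	 */
	static void vcpu_loop(int vcpu_fd, struct kvm_run *run)
	{
		for (;;) {
			if (ioctl(vcpu_fd, KVM_RUN, 0) < 0) {
				/* Interrupted ioregionfd I/O: handle the
				 * signal, then call KVM_RUN again; the
				 * kernel resumes the saved transaction via
				 * complete_userspace_io.
				 */
				if (errno == EINTR)
					continue;
				break; /* real error */
			}
			switch (run->exit_reason) {
			case KVM_EXIT_INTR:
				continue;
			/* ... handle other exit reasons ... */
			default:
				return;
			}
		}
	}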
 arch/x86/kvm/vmx/vmx.c   |  40 +++++-
 arch/x86/kvm/x86.c       | 272 +++++++++++++++++++++++++++++++++++++--
 include/linux/kvm_host.h |  10 ++
 virt/kvm/kvm_main.c      |  16 ++-
 4 files changed, 317 insertions(+), 21 deletions(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 47b8357b9751..39db31afd27e 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -5357,19 +5357,51 @@ static int handle_ept_violation(struct kvm_vcpu *vcpu)
 	return kvm_mmu_page_fault(vcpu, gpa, error_code, NULL, 0);
 }
 
+#ifdef CONFIG_KVM_IOREGION
+static int complete_ioregion_fast_mmio(struct kvm_vcpu *vcpu)
+{
+	int ret, idx;
+
+	idx = srcu_read_lock(&vcpu->kvm->srcu);
+	ret = kvm_io_bus_write(vcpu, KVM_FAST_MMIO_BUS,
+			       vcpu->ioregion_ctx.addr, 0, NULL);
+	if (ret) {
+		ret = kvm_mmu_page_fault(vcpu, vcpu->ioregion_ctx.addr,
+					 PFERR_RSVD_MASK, NULL, 0);
+		srcu_read_unlock(&vcpu->kvm->srcu, idx);
+		return ret;
+	}
+
+	srcu_read_unlock(&vcpu->kvm->srcu, idx);
+	return kvm_skip_emulated_instruction(vcpu);
+}
+#endif
+
 static int handle_ept_misconfig(struct kvm_vcpu *vcpu)
 {
 	gpa_t gpa;
+	int ret;
 
 	/*
 	 * A nested guest cannot optimize MMIO vmexits, because we have an
 	 * nGPA here instead of the required GPA.
	 */
 	gpa = vmcs_read64(GUEST_PHYSICAL_ADDRESS);
-	if (!is_guest_mode(vcpu) &&
-	    !kvm_io_bus_write(vcpu, KVM_FAST_MMIO_BUS, gpa, 0, NULL)) {
-		trace_kvm_fast_mmio(gpa);
-		return kvm_skip_emulated_instruction(vcpu);
+	if (!is_guest_mode(vcpu)) {
+		ret = kvm_io_bus_write(vcpu, KVM_FAST_MMIO_BUS, gpa, 0, NULL);
+		if (!ret) {
+			trace_kvm_fast_mmio(gpa);
+			return kvm_skip_emulated_instruction(vcpu);
+		}
+
+#ifdef CONFIG_KVM_IOREGION
+		if (unlikely(vcpu->ioregion_ctx.is_interrupted && ret == -EINTR)) {
+			vcpu->run->exit_reason = KVM_EXIT_INTR;
+			vcpu->arch.complete_userspace_io = complete_ioregion_fast_mmio;
+			++vcpu->stat.signal_exits;
+			return ret;
+		}
+#endif
 	}
 
 	return kvm_mmu_page_fault(vcpu, gpa, PFERR_RSVD_MASK, NULL, 0);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index ddb28f5ca252..07a538f02e3b 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5799,19 +5799,33 @@ static int vcpu_mmio_write(struct kvm_vcpu *vcpu, gpa_t addr, int len,
 {
 	int handled = 0;
 	int n;
+	int ret = 0;
+	bool is_apic;
 
 	do {
 		n = min(len, 8);
-		if (!(lapic_in_kernel(vcpu) &&
-		      !kvm_iodevice_write(vcpu, &vcpu->arch.apic->dev, addr, n, v))
-		    && kvm_io_bus_write(vcpu, KVM_MMIO_BUS, addr, n, v))
-			break;
+		is_apic = lapic_in_kernel(vcpu) &&
+			  !kvm_iodevice_write(vcpu, &vcpu->arch.apic->dev,
+					      addr, n, v);
+		if (!is_apic) {
+			ret = kvm_io_bus_write(vcpu, KVM_MMIO_BUS,
+					       addr, n, v);
+			if (ret)
+				break;
+		}
 		handled += n;
 		addr += n;
 		len -= n;
 		v += n;
 	} while (len);
 
+#ifdef CONFIG_KVM_IOREGION
+	if (ret == -EINTR) {
+		vcpu->run->exit_reason = KVM_EXIT_INTR;
+		++vcpu->stat.signal_exits;
+	}
+#endif
+
 	return handled;
 }
 
@@ -5819,14 +5833,20 @@ static int vcpu_mmio_read(struct kvm_vcpu *vcpu, gpa_t addr, int len, void *v)
 {
 	int handled = 0;
 	int n;
+	int ret = 0;
+	bool is_apic;
 
 	do {
 		n = min(len, 8);
-		if (!(lapic_in_kernel(vcpu) &&
-		      !kvm_iodevice_read(vcpu, &vcpu->arch.apic->dev,
-					 addr, n, v))
-		    && kvm_io_bus_read(vcpu, KVM_MMIO_BUS, addr, n, v))
-			break;
+		is_apic = lapic_in_kernel(vcpu) &&
+			  !kvm_iodevice_read(vcpu, &vcpu->arch.apic->dev,
+					     addr, n, v);
+		if (!is_apic) {
+			ret = kvm_io_bus_read(vcpu, KVM_MMIO_BUS,
+					      addr, n, v);
+			if (ret)
+				break;
+		}
 		trace_kvm_mmio(KVM_TRACE_MMIO_READ, n, addr, v);
 		handled += n;
 		addr += n;
@@ -5834,6 +5854,13 @@ static int vcpu_mmio_read(struct kvm_vcpu *vcpu, gpa_t addr, int len, void *v)
 		v += n;
 	} while (len);
 
+#ifdef CONFIG_KVM_IOREGION
+	if (ret == -EINTR) {
+		vcpu->run->exit_reason = KVM_EXIT_INTR;
+		++vcpu->stat.signal_exits;
+	}
+#endif
+
 	return handled;
 }
 
@@ -6229,6 +6256,13 @@ static int emulator_read_write_onepage(unsigned long addr, void *val,
 	if (!ret && ops->read_write_emulate(vcpu, gpa, val, bytes))
 		return X86EMUL_CONTINUE;
 
+#ifdef CONFIG_KVM_IOREGION
+	/* crossing a page boundary case is interrupted */
+	if (vcpu->ioregion_ctx.is_interrupted &&
+	    vcpu->run->exit_reason == KVM_EXIT_INTR)
+		goto out;
+#endif
+
 	/*
 	 * Is this MMIO handled locally?
 	 */
@@ -6236,6 +6270,7 @@ static int emulator_read_write_onepage(unsigned long addr, void *val,
 	if (handled == bytes)
 		return X86EMUL_CONTINUE;
 
+out:
 	gpa += handled;
 	bytes -= handled;
 	val += handled;
@@ -6294,6 +6329,12 @@ static int emulator_read_write(struct x86_emulate_ctxt *ctxt,
 	vcpu->mmio_needed = 1;
 	vcpu->mmio_cur_fragment = 0;
 
+#ifdef CONFIG_KVM_IOREGION
+	if (vcpu->ioregion_ctx.is_interrupted &&
+	    vcpu->run->exit_reason == KVM_EXIT_INTR)
+		return (vcpu->ioregion_ctx.in) ?
+			X86EMUL_IO_NEEDED : X86EMUL_CONTINUE;
+#endif
 	vcpu->run->mmio.len = min(8u, vcpu->mmio_fragments[0].len);
 	vcpu->run->mmio.is_write = vcpu->mmio_is_write = ops->write;
 	vcpu->run->exit_reason = KVM_EXIT_MMIO;
@@ -6411,16 +6452,22 @@ static int kernel_pio(struct kvm_vcpu *vcpu, void *pd)
 
 	for (i = 0; i < vcpu->arch.pio.count; i++) {
 		if (vcpu->arch.pio.in)
-			r = kvm_io_bus_read(vcpu, KVM_PIO_BUS, vcpu->arch.pio.port,
+			r = kvm_io_bus_read(vcpu, KVM_PIO_BUS,
+					    vcpu->arch.pio.port,
 					    vcpu->arch.pio.size, pd);
 		else
 			r = kvm_io_bus_write(vcpu, KVM_PIO_BUS,
-					     vcpu->arch.pio.port, vcpu->arch.pio.size,
-					     pd);
+					     vcpu->arch.pio.port,
+					     vcpu->arch.pio.size, pd);
 		if (r)
 			break;
 		pd += vcpu->arch.pio.size;
 	}
 
+#ifdef CONFIG_KVM_IOREGION
+	if (vcpu->ioregion_ctx.is_interrupted && r == -EINTR)
+		vcpu->ioregion_ctx.pio = i;
+#endif
+
 	return r;
 }
 
@@ -6428,16 +6475,27 @@ static int emulator_pio_in_out(struct kvm_vcpu *vcpu, int size,
 			       unsigned short port, void *val,
 			       unsigned int count, bool in)
 {
+	int ret = 0;
+
 	vcpu->arch.pio.port = port;
 	vcpu->arch.pio.in = in;
 	vcpu->arch.pio.count  = count;
 	vcpu->arch.pio.size = size;
 
-	if (!kernel_pio(vcpu, vcpu->arch.pio_data)) {
+	ret = kernel_pio(vcpu, vcpu->arch.pio_data);
+	if (!ret) {
 		vcpu->arch.pio.count = 0;
 		return 1;
 	}
 
+#ifdef CONFIG_KVM_IOREGION
+	if (ret == -EINTR) {
+		vcpu->run->exit_reason = KVM_EXIT_INTR;
+		++vcpu->stat.signal_exits;
+		return 0;
+	}
+#endif
+
 	vcpu->run->exit_reason = KVM_EXIT_IO;
 	vcpu->run->io.direction = in ? KVM_EXIT_IO_IN : KVM_EXIT_IO_OUT;
 	vcpu->run->io.size = size;
@@ -7141,6 +7199,10 @@ static bool retry_instruction(struct x86_emulate_ctxt *ctxt,
 
 static int complete_emulated_mmio(struct kvm_vcpu *vcpu);
 static int complete_emulated_pio(struct kvm_vcpu *vcpu);
+#ifdef CONFIG_KVM_IOREGION
+static int complete_ioregion_io(struct kvm_vcpu *vcpu);
+static int complete_ioregion_fast_pio(struct kvm_vcpu *vcpu);
+#endif
 
 static void kvm_smm_changed(struct kvm_vcpu *vcpu)
 {
@@ -7405,6 +7467,14 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
 		r = 1;
 		if (inject_emulated_exception(vcpu))
 			return r;
+#ifdef CONFIG_KVM_IOREGION
+	} else if (vcpu->ioregion_ctx.is_interrupted &&
+		   vcpu->run->exit_reason == KVM_EXIT_INTR) {
+		if (vcpu->ioregion_ctx.in)
+			writeback = false;
+		vcpu->arch.complete_userspace_io = complete_ioregion_io;
+		r = 0;
+#endif
 	} else if (vcpu->arch.pio.count) {
 		if (!vcpu->arch.pio.in) {
 			/* FIXME: return into emulator if single-stepping.  */
@@ -7501,6 +7571,12 @@ static int kvm_fast_pio_out(struct kvm_vcpu *vcpu, int size,
 		vcpu->arch.complete_userspace_io =
 			complete_fast_pio_out_port_0x7e;
 		kvm_skip_emulated_instruction(vcpu);
+#ifdef CONFIG_KVM_IOREGION
+	} else if (vcpu->ioregion_ctx.is_interrupted &&
+		   vcpu->run->exit_reason == KVM_EXIT_INTR) {
+		vcpu->arch.pio.linear_rip = kvm_get_linear_rip(vcpu);
+		vcpu->arch.complete_userspace_io = complete_ioregion_fast_pio;
+#endif
 	} else {
 		vcpu->arch.pio.linear_rip = kvm_get_linear_rip(vcpu);
 		vcpu->arch.complete_userspace_io = complete_fast_pio_out;
@@ -7549,6 +7625,14 @@ static int kvm_fast_pio_in(struct kvm_vcpu *vcpu, int size,
 	}
 
 	vcpu->arch.pio.linear_rip = kvm_get_linear_rip(vcpu);
+
+#ifdef CONFIG_KVM_IOREGION
+	if (vcpu->ioregion_ctx.is_interrupted &&
+	    vcpu->run->exit_reason == KVM_EXIT_INTR) {
+		vcpu->arch.complete_userspace_io = complete_ioregion_fast_pio;
+		return 0;
+	}
+#endif
 	vcpu->arch.complete_userspace_io = complete_fast_pio_in;
 
 	return 0;
@@ -9204,6 +9288,162 @@ static int complete_emulated_mmio(struct kvm_vcpu *vcpu)
 	return 0;
 }
 
+#ifdef CONFIG_KVM_IOREGION
+static int complete_ioregion_access(struct kvm_vcpu *vcpu, u8 bus, gpa_t addr,
+				    int len, void *val)
+{
+	if (vcpu->ioregion_ctx.in)
+		return kvm_io_bus_read(vcpu, bus, addr, len, val);
+	else
+		return kvm_io_bus_write(vcpu, bus, addr, len, val);
+}
+
+static int complete_ioregion_mmio(struct kvm_vcpu *vcpu)
+{
+	struct kvm_mmio_fragment *frag;
+	int idx, ret, i, n;
+
+	idx = srcu_read_lock(&vcpu->kvm->srcu);
+	for (i = vcpu->mmio_cur_fragment; i < vcpu->mmio_nr_fragments; i++) {
+		frag = &vcpu->mmio_fragments[i];
+		do {
+			n = min(8u, frag->len);
+			ret = complete_ioregion_access(vcpu, KVM_MMIO_BUS,
+						       frag->gpa, n, frag->data);
+			if (ret < 0)
+				goto do_exit;
+			frag->len -= n;
+			frag->data += n;
+			frag->gpa += n;
+		} while (frag->len);
+		vcpu->mmio_cur_fragment++;
+	}
+
+	vcpu->mmio_needed = 0;
+	if (!vcpu->ioregion_ctx.in) {
+		ret = 1;
+		goto out;
+	}
+
+	vcpu->mmio_read_completed = 1;
+	ret = kvm_emulate_instruction(vcpu, EMULTYPE_NO_DECODE);
+	goto out;
+
+do_exit:
+	if (ret != -EOPNOTSUPP) {
+		vcpu->arch.complete_userspace_io = complete_ioregion_mmio;
+		goto out;
+	}
+
+	/* if ioregion is removed KVM needs to return with KVM_EXIT_MMIO */
+	vcpu->run->exit_reason = KVM_EXIT_MMIO;
+	vcpu->run->mmio.phys_addr = frag->gpa;
+	if (!vcpu->ioregion_ctx.in)
+		memcpy(vcpu->run->mmio.data, frag->data, n);
+	vcpu->run->mmio.len = n;
+	vcpu->run->mmio.is_write = !vcpu->ioregion_ctx.in;
+	vcpu->arch.complete_userspace_io = complete_emulated_mmio;
+	ret = 0;
+out:
+	srcu_read_unlock(&vcpu->kvm->srcu, idx);
+	return ret;
+}
+
+static int complete_ioregion_pio(struct kvm_vcpu *vcpu)
+{
+	int i, idx, ret;
+	unsigned long off;
+
+	idx = srcu_read_lock(&vcpu->kvm->srcu);
+
+	for (i = vcpu->ioregion_ctx.pio; i < vcpu->arch.pio.count; i++) {
+		ret = complete_ioregion_access(vcpu, KVM_PIO_BUS,
+					       vcpu->arch.pio.port,
+					       vcpu->arch.pio.size,
+					       vcpu->ioregion_ctx.val);
+		if (ret < 0)
+			goto do_exit;
+		vcpu->ioregion_ctx.val += vcpu->arch.pio.size;
+	}
+
+	ret = 1;
+	if (vcpu->ioregion_ctx.in)
+		ret = kvm_emulate_instruction(vcpu, EMULTYPE_NO_DECODE);
+	vcpu->arch.pio.count = 0;
+	goto out;
+
+do_exit:
+	if (ret != -EOPNOTSUPP) {
+		vcpu->ioregion_ctx.pio = i;
+		vcpu->arch.complete_userspace_io = complete_ioregion_pio;
+		goto out;
+	}
+
+	/* if ioregion is removed KVM needs to return with KVM_EXIT_IO */
+	off = vcpu->ioregion_ctx.val - vcpu->arch.pio_data;
+	vcpu->run->exit_reason = KVM_EXIT_IO;
+	vcpu->run->io.direction = vcpu->ioregion_ctx.in ?
+				  KVM_EXIT_IO_IN : KVM_EXIT_IO_OUT;
	vcpu->run->io.size = vcpu->arch.pio.size;
+	vcpu->run->io.data_offset = KVM_PIO_PAGE_OFFSET * PAGE_SIZE + off;
+	vcpu->run->io.count = vcpu->arch.pio.count - i;
+	vcpu->run->io.port = vcpu->arch.pio.port;
+	if (vcpu->ioregion_ctx.in)
+		vcpu->arch.complete_userspace_io = complete_emulated_pio;
+	else
+		vcpu->arch.pio.count = 0;
+	ret = 0;
+out:
+	srcu_read_unlock(&vcpu->kvm->srcu, idx);
+	return ret;
+}
+
+static int complete_ioregion_fast_pio(struct kvm_vcpu *vcpu)
+{
+	int idx, ret;
+	u64 val;
+
+	idx = srcu_read_lock(&vcpu->kvm->srcu);
+	ret = complete_ioregion_access(vcpu, KVM_PIO_BUS, vcpu->arch.pio.port,
+				       vcpu->arch.pio.size,
+				       vcpu->ioregion_ctx.val);
+	if (ret < 0)
+		goto do_exit;
+	srcu_read_unlock(&vcpu->kvm->srcu, idx);
+
+	if (vcpu->ioregion_ctx.in) {
+		memcpy(&val, vcpu->ioregion_ctx.val, vcpu->arch.pio.size);
+		kvm_rax_write(vcpu, val);
+	}
+	vcpu->arch.pio.count = 0;
+	return kvm_skip_emulated_instruction(vcpu);
+
+do_exit:
+	if (ret != -EOPNOTSUPP) {
+		vcpu->arch.complete_userspace_io = complete_ioregion_fast_pio;
+		goto out;
+	}
+
+	vcpu->run->exit_reason = KVM_EXIT_IO;
+	vcpu->run->io.direction = vcpu->ioregion_ctx.in ?
+				  KVM_EXIT_IO_IN : KVM_EXIT_IO_OUT;
+	vcpu->run->io.size = vcpu->arch.pio.size;
+	vcpu->run->io.data_offset = KVM_PIO_PAGE_OFFSET * PAGE_SIZE;
+	vcpu->run->io.count = 1;
+	vcpu->run->io.port = vcpu->arch.pio.port;
+	vcpu->arch.complete_userspace_io = vcpu->ioregion_ctx.in ?
+		complete_fast_pio_in : complete_fast_pio_out;
+	ret = 0;
+out:
+	srcu_read_unlock(&vcpu->kvm->srcu, idx);
+	return ret;
+}
+
+static int complete_ioregion_io(struct kvm_vcpu *vcpu)
+{
+	if (vcpu->mmio_needed)
+		return complete_ioregion_mmio(vcpu);
+	if (vcpu->arch.pio.count)
+		return complete_ioregion_pio(vcpu);
+	/* should not be reached; fail the ioctl rather than fall through */
+	return -EINVAL;
+}
+#endif /* CONFIG_KVM_IOREGION */
+
 static void kvm_save_current_fpu(struct fpu *fpu)
 {
 	/*
@@ -9309,6 +9549,12 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
 	else
 		r = vcpu_run(vcpu);
 
+#ifdef CONFIG_KVM_IOREGION
+	if (vcpu->ioregion_ctx.is_interrupted &&
+	    vcpu->run->exit_reason == KVM_EXIT_INTR)
+		r = -EINTR;
+#endif
+
 out:
 	kvm_put_guest_fpu(vcpu);
 	if (kvm_run->kvm_valid_regs)
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index f35f0976f5cf..84f07597d131 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -318,6 +318,16 @@ struct kvm_vcpu {
 #endif
 	bool preempted;
 	bool ready;
+#ifdef CONFIG_KVM_IOREGION
+	struct {
+		u64 addr;
+		void *val;
+		int pio;
+		u8 state; /* SEND_CMD/GET_REPLY */
+		bool in;
+		bool is_interrupted;
+	} ioregion_ctx;
+#endif
 	struct kvm_vcpu_arch arch;
 };
 
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 88b92fc3da51..df387857f51f 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -4193,6 +4193,7 @@ static int __kvm_io_bus_write(struct kvm_vcpu *vcpu, struct kvm_io_bus *bus,
 			      struct kvm_io_range *range, const void *val)
 {
 	int idx;
+	int ret = 0;
 
 	idx = kvm_io_bus_get_first_dev(bus, range->addr, range->len);
 	if (idx < 0)
@@ -4200,9 +4201,12 @@ static int __kvm_io_bus_write(struct kvm_vcpu *vcpu, struct kvm_io_bus *bus,
 
 	while (idx < bus->dev_count &&
 		kvm_io_bus_cmp(range, &bus->range[idx]) == 0) {
-		if (!kvm_iodevice_write(vcpu, bus->range[idx].dev, range->addr,
-					range->len, val))
+		ret = kvm_iodevice_write(vcpu, bus->range[idx].dev, range->addr,
+					 range->len, val);
+		if (!ret)
 			return idx;
+		if (ret < 0 && ret != -EOPNOTSUPP)
+			return ret;
 		idx++;
 	}
 
@@ -4264,6 +4268,7 @@ static int __kvm_io_bus_read(struct kvm_vcpu *vcpu, struct kvm_io_bus *bus,
			     struct kvm_io_range *range, void *val)
 {
 	int idx;
+	int ret = 0;
 
 	idx = kvm_io_bus_get_first_dev(bus, range->addr, range->len);
 	if (idx < 0)
@@ -4271,9 +4276,12 @@ static int __kvm_io_bus_read(struct kvm_vcpu *vcpu, struct kvm_io_bus *bus,
 
 	while (idx < bus->dev_count &&
 		kvm_io_bus_cmp(range, &bus->range[idx]) == 0) {
-		if (!kvm_iodevice_read(vcpu, bus->range[idx].dev, range->addr,
-				       range->len, val))
+		ret = kvm_iodevice_read(vcpu, bus->range[idx].dev, range->addr,
+					range->len, val);
+		if (!ret)
 			return idx;
+		if (ret < 0 && ret != -EOPNOTSUPP)
+			return ret;
 		idx++;
 	}
 

From patchwork Sun Feb 21 12:04:39 2021
X-Patchwork-Submitter: Elena Afanasova
X-Patchwork-Id: 12097297
From: Elena Afanasova
To: kvm@vger.kernel.org
Cc: stefanha@redhat.com, jag.raman@oracle.com, elena.ufimtseva@oracle.com,
    pbonzini@redhat.com, jasowang@redhat.com, mst@redhat.com,
    cohuck@redhat.com, john.levon@nutanix.com, Elena Afanasova
Subject: [RFC v3 3/5] KVM: implement wire protocol
Date: Sun, 21 Feb 2021 15:04:39 +0300
Add ioregionfd blocking read/write operations.

Signed-off-by: Elena Afanasova
---
v3:
 - change wire protocol license
 - remove ioregionfd_cmd info and drop appropriate macros
 - fix ioregionfd state machine
 - add sizeless ioregions support
 - drop redundant check in ioregion_read/write()

 include/uapi/linux/ioregion.h |  30 +++++
 virt/kvm/ioregion.c           | 162 +++++++++++++++++++++++++++++++++-
 2 files changed, 190 insertions(+), 2 deletions(-)
 create mode 100644 include/uapi/linux/ioregion.h

diff --git a/include/uapi/linux/ioregion.h b/include/uapi/linux/ioregion.h
new file mode 100644
index 000000000000..58f9b5ba6186
--- /dev/null
+++ b/include/uapi/linux/ioregion.h
@@ -0,0 +1,30 @@
+/* SPDX-License-Identifier: ((GPL-2.0-only WITH Linux-syscall-note) OR BSD-3-Clause) */
+#ifndef _UAPI_LINUX_IOREGION_H
+#define _UAPI_LINUX_IOREGION_H
+
+/* Wire protocol */
+
+struct ioregionfd_cmd {
+	__u8 cmd;
+	__u8 size_exponent : 4;
+	__u8 resp : 1;
+	__u8 padding[6];
+	__u64 user_data;
+	__u64 offset;
+	__u64 data;
+};
+
+struct ioregionfd_resp {
+	__u64 data;
+	__u8 pad[24];
+};
+
+#define IOREGIONFD_CMD_READ    0
+#define IOREGIONFD_CMD_WRITE   1
+
+#define IOREGIONFD_SIZE_8BIT   0
+#define IOREGIONFD_SIZE_16BIT  1
+#define IOREGIONFD_SIZE_32BIT  2
+#define IOREGIONFD_SIZE_64BIT  3
+
+#endif
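To illustrate the wire protocol above, a hedged sketch of the userspace
side: a device-emulation thread servicing one file descriptor, decoding
each struct ioregionfd_cmd and, when cmd.resp is set, answering with a
struct ioregionfd_resp. The regs[] register file and serve_ioregion()
helper are illustrative assumptions, and <linux/ioregion.h> assumes
headers built from this series:

	#include <stdint.h>
	#include <string.h>
	#include <unistd.h>
	#include <linux/ioregion.h>

	/* Hypothetical device state: a small 64-bit register file. */
	static uint64_t regs[512];

	static void serve_ioregion(int fd)
	{
		struct ioregionfd_cmd cmd;
		struct ioregionfd_resp resp;

		while (read(fd, &cmd, sizeof(cmd)) == sizeof(cmd)) {
			uint32_t bytes = 1u << cmd.size_exponent;
			uint64_t *reg = &regs[(cmd.offset % sizeof(regs)) / 8];

			memset(&resp, 0, sizeof(resp));
			if (cmd.cmd == IOREGIONFD_CMD_WRITE)
				memcpy(reg, &cmd.data, bytes);
			else /* IOREGIONFD_CMD_READ */
				memcpy(&resp.data, reg, bytes);

			/* Posted writes carry cmd.resp == 0: no reply. */
			if (cmd.resp &&
			    write(fd, &resp, sizeof(resp)) != sizeof(resp))
				break;
		}
	}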
diff --git a/virt/kvm/ioregion.c b/virt/kvm/ioregion.c
index e09ef3e2c9d7..1e1c7772d274 100644
--- a/virt/kvm/ioregion.c
+++ b/virt/kvm/ioregion.c
@@ -3,6 +3,7 @@
 #include
 #include
 #include "eventfd.h"
+#include
 
 void
 kvm_ioregionfd_init(struct kvm *kvm)
@@ -40,18 +41,175 @@ ioregion_release(struct ioregion *p)
 	kfree(p);
 }
 
+static bool
+pack_cmd(struct ioregionfd_cmd *cmd, u64 offset, u64 len, u8 opt, u8 resp,
+	 u64 user_data, const void *val)
+{
+	switch (len) {
+	case 0:
+		break;
+	case 1:
+		cmd->size_exponent = IOREGIONFD_SIZE_8BIT;
+		break;
+	case 2:
+		cmd->size_exponent = IOREGIONFD_SIZE_16BIT;
+		break;
+	case 4:
+		cmd->size_exponent = IOREGIONFD_SIZE_32BIT;
+		break;
+	case 8:
+		cmd->size_exponent = IOREGIONFD_SIZE_64BIT;
+		break;
+	default:
+		return false;
+	}
+
+	if (val)
+		memcpy(&cmd->data, val, len);
+	cmd->user_data = user_data;
+	cmd->offset = offset;
+	cmd->cmd = opt;
+	cmd->resp = resp;
+
+	return true;
+}
+
+enum {
+	SEND_CMD,
+	GET_REPLY,
+	COMPLETE
+};
+
+static void
+ioregion_save_ctx(struct kvm_vcpu *vcpu, bool in, gpa_t addr, u8 state, void *val)
+{
+	vcpu->ioregion_ctx.is_interrupted = true;
+	vcpu->ioregion_ctx.val = val;
+	vcpu->ioregion_ctx.state = state;
+	vcpu->ioregion_ctx.addr = addr;
+	vcpu->ioregion_ctx.in = in;
+}
+
 static int
 ioregion_read(struct kvm_vcpu *vcpu, struct kvm_io_device *this, gpa_t addr,
 	      int len, void *val)
 {
-	return -EOPNOTSUPP;
+	struct ioregion *p = to_ioregion(this);
+	union {
+		struct ioregionfd_cmd cmd;
+		struct ioregionfd_resp resp;
+	} buf;
+	int ret = 0;
+	int state = SEND_CMD;
+
+	if (unlikely(vcpu->ioregion_ctx.is_interrupted)) {
+		vcpu->ioregion_ctx.is_interrupted = false;
+
+		switch (vcpu->ioregion_ctx.state) {
+		case SEND_CMD:
+			goto send_cmd;
+		case GET_REPLY:
+			goto get_repl;
+		default:
+			return -EINVAL;
+		}
+	}
+
+send_cmd:
+	memset(&buf, 0, sizeof(buf));
+	if (!pack_cmd(&buf.cmd, addr - p->paddr, len, IOREGIONFD_CMD_READ,
+		      1, p->user_data, NULL))
+		return -EOPNOTSUPP;
+
+	ret = kernel_write(p->wf, &buf.cmd, sizeof(buf.cmd), 0);
+	state = (ret == sizeof(buf.cmd)) ? GET_REPLY : SEND_CMD;
+	if (signal_pending(current) && state == SEND_CMD) {
+		ioregion_save_ctx(vcpu, 1, addr, state, val);
+		return -EINTR;
+	}
+	if (ret != sizeof(buf.cmd)) {
+		ret = (ret < 0) ? ret : -EIO;
+		return (ret == -EAGAIN || ret == -EWOULDBLOCK) ? -EINVAL : ret;
+	}
+	if (!p->rf)
+		return 0;
+
+get_repl:
+	memset(&buf, 0, sizeof(buf));
+	ret = kernel_read(p->rf, &buf.resp, sizeof(buf.resp), 0);
+	state = (ret == sizeof(buf.resp)) ? COMPLETE : GET_REPLY;
+	if (signal_pending(current) && state == GET_REPLY) {
+		ioregion_save_ctx(vcpu, 1, addr, state, val);
+		return -EINTR;
+	}
+	if (ret != sizeof(buf.resp)) {
+		ret = (ret < 0) ? ret : -EIO;
+		return (ret == -EAGAIN || ret == -EWOULDBLOCK) ? -EINVAL : ret;
+	}
+
+	memcpy(val, &buf.resp.data, len);
+
+	return 0;
 }
 
 static int
 ioregion_write(struct kvm_vcpu *vcpu, struct kvm_io_device *this, gpa_t addr,
 	       int len, const void *val)
 {
-	return -EOPNOTSUPP;
+	struct ioregion *p = to_ioregion(this);
+	union {
+		struct ioregionfd_cmd cmd;
+		struct ioregionfd_resp resp;
+	} buf;
+	int ret = 0;
+	int state = SEND_CMD;
+
+	if (unlikely(vcpu->ioregion_ctx.is_interrupted)) {
+		vcpu->ioregion_ctx.is_interrupted = false;
+
+		switch (vcpu->ioregion_ctx.state) {
+		case SEND_CMD:
+			goto send_cmd;
+		case GET_REPLY:
+			goto get_repl;
+		default:
+			return -EINVAL;
+		}
+	}
+
+send_cmd:
+	memset(&buf, 0, sizeof(buf));
+	if (!pack_cmd(&buf.cmd, addr - p->paddr, len, IOREGIONFD_CMD_WRITE,
+		      p->posted_writes ? 0 : 1, p->user_data, val))
+		return -EOPNOTSUPP;
+
+	ret = kernel_write(p->wf, &buf.cmd, sizeof(buf.cmd), 0);
+	state = (ret == sizeof(buf.cmd)) ? GET_REPLY : SEND_CMD;
+	if (signal_pending(current) && state == SEND_CMD) {
+		ioregion_save_ctx(vcpu, 0, addr, state, (void *)val);
+		return -EINTR;
+	}
+	if (ret != sizeof(buf.cmd)) {
+		ret = (ret < 0) ? ret : -EIO;
+		return (ret == -EAGAIN || ret == -EWOULDBLOCK) ? -EINVAL : ret;
+	}
+
+get_repl:
+	if (!p->posted_writes) {
+		memset(&buf, 0, sizeof(buf));
+		ret = kernel_read(p->rf, &buf.resp, sizeof(buf.resp), 0);
+		state = (ret == sizeof(buf.resp)) ? COMPLETE : GET_REPLY;
+		if (signal_pending(current) && state == GET_REPLY) {
+			ioregion_save_ctx(vcpu, 0, addr, state, (void *)val);
+			return -EINTR;
+		}
+		if (ret != sizeof(buf.resp)) {
+			ret = (ret < 0) ? ret : -EIO;
+			return (ret == -EAGAIN || ret == -EWOULDBLOCK) ? -EINVAL : ret;
		}
+	}
+
+	return 0;
 }
 
 /*

From patchwork Sun Feb 21 12:04:40 2021
X-Patchwork-Submitter: Elena Afanasova
X-Patchwork-Id: 12097301
From: Elena Afanasova
To: kvm@vger.kernel.org
Cc: stefanha@redhat.com, jag.raman@oracle.com, elena.ufimtseva@oracle.com,
    pbonzini@redhat.com, jasowang@redhat.com, mst@redhat.com,
    cohuck@redhat.com, john.levon@nutanix.com, Elena Afanasova
Subject: [RFC v3 4/5] KVM: add ioregionfd context
Date: Sun, 21 Feb 2021 15:04:40 +0300
Message-Id: <4436ef071e55d88ff3996b134cc2303053581242.1613828727.git.eafanasova@gmail.com>

Add support for ioregionfd cmds/replies serialization.

Signed-off-by: Elena Afanasova
---
v3:
 - add comment
 - drop kvm_io_bus_finish/prepare()

 virt/kvm/ioregion.c | 164 ++++++++++++++++++++++++++++++++++++--------
 1 file changed, 135 insertions(+), 29 deletions(-)

diff --git a/virt/kvm/ioregion.c b/virt/kvm/ioregion.c
index 1e1c7772d274..d53e3d1cd2ff 100644
--- a/virt/kvm/ioregion.c
+++ b/virt/kvm/ioregion.c
@@ -1,10 +1,39 @@
 // SPDX-License-Identifier: GPL-2.0-only
 #include
-#include
+#include
 #include
 #include "eventfd.h"
 #include
 
+/* ioregions that share the same rfd are serialized so that only one vCPU
+ * thread sends a struct ioregionfd_cmd to userspace at a time. This
+ * ensures that the struct ioregionfd_resp received from userspace will
+ * be processed by the one and only vCPU thread that sent it.
+ *
+ * A waitqueue is used to wake up waiting vCPU threads in order. Most of
+ * the time the waitqueue is unused and the lock is not contended.
+ * For best performance userspace should set up ioregionfds so that there
+ * is no contention (e.g. dedicated ioregionfds for queue doorbell
+ * registers on multi-queue devices).
+ */
+struct ioregionfd {
+	wait_queue_head_t wq;
+	struct file *rf;
+	struct kref kref;
+	bool busy;
+};
+
+struct ioregion {
+	struct list_head list;
+	u64 paddr;  /* guest physical address */
+	u64 size;   /* size in bytes */
+	struct file *wf;
+	u64 user_data; /* opaque token used by userspace */
+	struct kvm_io_device dev;
+	bool posted_writes;
+	struct ioregionfd *ctx;
+};
+
 void
 kvm_ioregionfd_init(struct kvm *kvm)
 {
@@ -13,29 +42,28 @@ kvm_ioregionfd_init(struct kvm *kvm)
 	INIT_LIST_HEAD(&kvm->ioregions_pio);
 }
 
-struct ioregion {
-	struct list_head list;
-	u64 paddr;  /* guest physical address */
-	u64 size;   /* size in bytes */
-	struct file *rf;
-	struct file *wf;
-	u64 user_data; /* opaque token used by userspace */
-	struct kvm_io_device dev;
-	bool posted_writes;
-};
-
 static inline struct ioregion *
 to_ioregion(struct kvm_io_device *dev)
 {
 	return container_of(dev, struct ioregion, dev);
 }
 
+/* assumes kvm->slots_lock held */
+static void ctx_free(struct kref *kref)
+{
+	struct ioregionfd *ctx = container_of(kref, struct ioregionfd, kref);
+
+	kfree(ctx);
+}
+
 /* assumes kvm->slots_lock held */
 static void
 ioregion_release(struct ioregion *p)
 {
-	if (p->rf)
-		fput(p->rf);
+	if (p->ctx) {
+		fput(p->ctx->rf);
+		kref_put(&p->ctx->kref, ctx_free);
+	}
 	fput(p->wf);
 	list_del(&p->list);
 	kfree(p);
@@ -90,6 +118,30 @@ ioregion_save_ctx(struct kvm_vcpu *vcpu, bool in, gpa_t addr, u8 state, void *va
 	vcpu->ioregion_ctx.in = in;
 }
 
+static inline void
+ioregion_lock_ctx(struct ioregionfd *ctx)
+{
+	if (!ctx)
+		return;
+
+	spin_lock(&ctx->wq.lock);
+	wait_event_interruptible_exclusive_locked(ctx->wq, !ctx->busy);
+	ctx->busy = true;
+	spin_unlock(&ctx->wq.lock);
+}
+
+static inline void
+ioregion_unlock_ctx(struct ioregionfd *ctx)
+{
+	if (!ctx)
+		return;
+
+	spin_lock(&ctx->wq.lock);
+	ctx->busy = false;
+	wake_up_locked(&ctx->wq);
+	spin_unlock(&ctx->wq.lock);
+}
+
 static int
 ioregion_read(struct kvm_vcpu *vcpu, struct kvm_io_device *this, gpa_t addr,
 	      int len, void *val)
@@ -115,11 +167,15 @@ ioregion_read(struct kvm_vcpu *vcpu, struct kvm_io_device *this, gpa_t addr,
 		}
 	}
 
+	ioregion_lock_ctx(p->ctx);
+
 send_cmd:
 	memset(&buf, 0, sizeof(buf));
 	if (!pack_cmd(&buf.cmd, addr - p->paddr, len, IOREGIONFD_CMD_READ,
-		      1, p->user_data, NULL))
-		return -EOPNOTSUPP;
+		      1, p->user_data, NULL)) {
+		ret = -EOPNOTSUPP;
+		goto out;
+	}
 
 	ret = kernel_write(p->wf, &buf.cmd, sizeof(buf.cmd), 0);
 	state = (ret == sizeof(buf.cmd)) ? GET_REPLY : SEND_CMD;
@@ -129,14 +185,15 @@ ioregion_read(struct kvm_vcpu *vcpu, struct kvm_io_device *this, gpa_t addr,
 	}
 	if (ret != sizeof(buf.cmd)) {
 		ret = (ret < 0) ? ret : -EIO;
-		return (ret == -EAGAIN || ret == -EWOULDBLOCK) ? -EINVAL : ret;
+		ret = (ret == -EAGAIN || ret == -EWOULDBLOCK) ? -EINVAL : ret;
+		goto out;
 	}
-	if (!p->rf)
+	if (!p->ctx)
 		return 0;
 
get_repl:
 	memset(&buf, 0, sizeof(buf));
-	ret = kernel_read(p->rf, &buf.resp, sizeof(buf.resp), 0);
+	ret = kernel_read(p->ctx->rf, &buf.resp, sizeof(buf.resp), 0);
 	state = (ret == sizeof(buf.resp)) ? COMPLETE : GET_REPLY;
 	if (signal_pending(current) && state == GET_REPLY) {
 		ioregion_save_ctx(vcpu, 1, addr, state, val);
@@ -144,12 +201,17 @@ ioregion_read(struct kvm_vcpu *vcpu, struct kvm_io_device *this, gpa_t addr,
 	}
 	if (ret != sizeof(buf.resp)) {
 		ret = (ret < 0) ? ret : -EIO;
-		return (ret == -EAGAIN || ret == -EWOULDBLOCK) ? -EINVAL : ret;
+		ret = (ret == -EAGAIN || ret == -EWOULDBLOCK) ? -EINVAL : ret;
+		goto out;
 	}
 
 	memcpy(val, &buf.resp.data, len);
+	ret = 0;
 
-	return 0;
+out:
+	ioregion_unlock_ctx(p->ctx);
+
+	return ret;
 }
 
 static int
@@ -177,11 +239,15 @@ ioregion_write(struct kvm_vcpu *vcpu, struct kvm_io_device *this, gpa_t addr,
 		}
 	}
 
+	ioregion_lock_ctx(p->ctx);
+
 send_cmd:
 	memset(&buf, 0, sizeof(buf));
 	if (!pack_cmd(&buf.cmd, addr - p->paddr, len, IOREGIONFD_CMD_WRITE,
-		      p->posted_writes ? 0 : 1, p->user_data, val))
-		return -EOPNOTSUPP;
+		      p->posted_writes ? 0 : 1, p->user_data, val)) {
+		ret = -EOPNOTSUPP;
+		goto out;
+	}
 
 	ret = kernel_write(p->wf, &buf.cmd, sizeof(buf.cmd), 0);
 	state = (ret == sizeof(buf.cmd)) ? GET_REPLY : SEND_CMD;
@@ -191,13 +257,14 @@ ioregion_write(struct kvm_vcpu *vcpu, struct kvm_io_device *this, gpa_t addr,
 	}
 	if (ret != sizeof(buf.cmd)) {
 		ret = (ret < 0) ? ret : -EIO;
-		return (ret == -EAGAIN || ret == -EWOULDBLOCK) ? -EINVAL : ret;
+		ret = (ret == -EAGAIN || ret == -EWOULDBLOCK) ? -EINVAL : ret;
+		goto out;
 	}
 
get_repl:
 	if (!p->posted_writes) {
 		memset(&buf, 0, sizeof(buf));
-		ret = kernel_read(p->rf, &buf.resp, sizeof(buf.resp), 0);
+		ret = kernel_read(p->ctx->rf, &buf.resp, sizeof(buf.resp), 0);
 		state = (ret == sizeof(buf.resp)) ? COMPLETE : GET_REPLY;
 		if (signal_pending(current) && state == GET_REPLY) {
 			ioregion_save_ctx(vcpu, 0, addr, state, (void *)val);
@@ -205,11 +272,16 @@ ioregion_write(struct kvm_vcpu *vcpu, struct kvm_io_device *this, gpa_t addr,
 		}
 		if (ret != sizeof(buf.resp)) {
 			ret = (ret < 0) ? ret : -EIO;
-			return (ret == -EAGAIN || ret == -EWOULDBLOCK) ? -EINVAL : ret;
+			ret = (ret == -EAGAIN || ret == -EWOULDBLOCK) ? -EINVAL : ret;
+			goto out;
 		}
 	}
+	ret = 0;
 
-	return 0;
+out:
+	ioregion_unlock_ctx(p->ctx);
+
+	return ret;
 }
 
 /*
@@ -285,6 +357,33 @@ get_bus_from_flags(__u32 flags)
 	return KVM_MMIO_BUS;
 }
 
+/* assumes kvm->slots_lock held */
+static bool
+ioregion_get_ctx(struct kvm *kvm, struct ioregion *p, struct file *rf, int bus_idx)
+{
+	struct ioregion *_p;
+	struct list_head *ioregions;
+
+	ioregions = get_ioregion_list(kvm, bus_idx);
+	list_for_each_entry(_p, ioregions, list)
+		if (file_inode(_p->ctx->rf)->i_ino == file_inode(rf)->i_ino) {
+			p->ctx = _p->ctx;
+			kref_get(&p->ctx->kref);
+			return true;
+		}
+
+	p->ctx = kzalloc(sizeof(*p->ctx), GFP_KERNEL_ACCOUNT);
+	if (!p->ctx)
+		return false;
+
+	p->ctx->rf = rf;
+	p->ctx->busy = false;
+	init_waitqueue_head(&p->ctx->wq);
+	kref_init(&p->ctx->kref);
+
+	return true;
+}
+
 int
 kvm_set_ioregion_idx(struct kvm *kvm, struct kvm_ioregion *args, enum kvm_bus bus_idx)
 {
@@ -309,11 +408,10 @@ kvm_set_ioregion_idx(struct kvm *kvm, struct kvm_ioregion *args, enum kvm_bus bus_idx)
 	}
 
 	INIT_LIST_HEAD(&p->list);
+	p->wf = wfile;
 	p->paddr = args->guest_paddr;
 	p->size = args->memory_size;
 	p->user_data = args->user_data;
-	p->rf = rfile;
-	p->wf = wfile;
 	p->posted_writes = args->flags & KVM_IOREGION_POSTED_WRITES;
 
 	mutex_lock(&kvm->slots_lock);
@@ -322,6 +420,12 @@ kvm_set_ioregion_idx(struct kvm *kvm, struct kvm_ioregion *args, enum kvm_bus bus_idx)
 		ret = -EEXIST;
 		goto unlock_fail;
 	}
+
+	if (rfile && !ioregion_get_ctx(kvm, p, rfile, bus_idx)) {
+		ret = -ENOMEM;
+		goto unlock_fail;
+	}
+
 	kvm_iodevice_init(&p->dev, &ioregion_ops);
 	ret = kvm_io_bus_register_dev(kvm, bus_idx, p->paddr, p->size,
 				      &p->dev);
@@ -335,6 +439,8 @@ kvm_set_ioregion_idx(struct kvm *kvm, struct kvm_ioregion *args, enum kvm_bus bus_idx)
 
 unlock_fail:
 	mutex_unlock(&kvm->slots_lock);
+	if (p->ctx)
+		kref_put(&p->ctx->kref, ctx_free);
 	kfree(p);
 fail:
 	if (rfile)
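Following on from the serialization comment in this patch: contention on
one ioregionfd context can be avoided by giving each queue doorbell its
own fd pair. A hedged userspace sketch; setup_doorbells(), DOORBELL_BASE,
DOORBELL_STRIDE and the socketpair() plumbing are made-up illustrations,
not part of the series:

	#include <string.h>
	#include <sys/ioctl.h>
	#include <sys/socket.h>
	#include <linux/kvm.h>

	#define DOORBELL_BASE   0xfe000000ull /* made-up guest-physical base */
	#define DOORBELL_STRIDE 0x1000ull

	/* One ioregionfd per queue doorbell of a multi-queue device, so
	 * vCPU threads never wait on each other's in-flight commands.
	 * server_fds[] is handed to per-queue worker threads.
	 */
	static int setup_doorbells(int vm_fd, unsigned int nqueues,
				   int *server_fds)
	{
		for (unsigned int q = 0; q < nqueues; q++) {
			int sv[2];
			struct kvm_ioregion region;

			if (socketpair(AF_UNIX, SOCK_SEQPACKET, 0, sv) < 0)
				return -1;
			server_fds[q] = sv[1];

			memset(&region, 0, sizeof(region));
			region.guest_paddr = DOORBELL_BASE + q * DOORBELL_STRIDE;
			region.memory_size = DOORBELL_STRIDE;
			region.user_data = q;
			region.read_fd = sv[0];
			region.write_fd = sv[0];
			region.flags = KVM_IOREGION_POSTED_WRITES;

			if (ioctl(vm_fd, KVM_SET_IOREGION, &region) < 0)
				return -1;
		}
		return 0;
	}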
From patchwork Sun Feb 21 12:04:41 2021
X-Patchwork-Submitter: Elena Afanasova
X-Patchwork-Id: 12097303
From: Elena Afanasova
To: kvm@vger.kernel.org
Cc: stefanha@redhat.com, jag.raman@oracle.com, elena.ufimtseva@oracle.com,
    pbonzini@redhat.com, jasowang@redhat.com, mst@redhat.com,
    cohuck@redhat.com, john.levon@nutanix.com, Elena Afanasova
Subject: [RFC v3 5/5] KVM: enforce NR_IOBUS_DEVS limit if kmemcg is disabled
Date: Sun, 21 Feb 2021 15:04:41 +0300

ioregionfd relies on kmemcg in order to limit the amount of kernel memory
that userspace can consume. Enforce the hardcoded NR_IOBUS_DEVS limit in
case kmemcg is disabled.

Signed-off-by: Elena Afanasova
---
 virt/kvm/kvm_main.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index df387857f51f..99a828153afd 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -4320,9 +4320,12 @@ int kvm_io_bus_register_dev(struct kvm *kvm, enum kvm_bus bus_idx, gpa_t addr,
 	if (!bus)
 		return -ENOMEM;
 
-	/* exclude ioeventfd which is limited by maximum fd */
-	if (bus->dev_count - bus->ioeventfd_count > NR_IOBUS_DEVS - 1)
-		return -ENOSPC;
+	/* enforce hard limit if kmemcg is disabled and
+	 * exclude ioeventfd which is limited by maximum fd
+	 */
+	if (!memcg_kmem_enabled())
+		if (bus->dev_count - bus->ioeventfd_count > NR_IOBUS_DEVS - 1)
+			return -ENOSPC;
 
 	new_bus = kmalloc(struct_size(bus, range, bus->dev_count + 1),
 			  GFP_KERNEL_ACCOUNT);