@@ -64,6 +64,85 @@ used on that guest. Obviously, whether the guest can really continue
normal execution depends on whether the introspection tool has made any
modifications that require an active KVMI channel.
+All messages (commands or events) have a common header::
+
+ struct kvmi_msg_hdr {
+ __u16 id;
+ __u16 size;
+ __u32 seq;
+ };
+
+The replies have the same header, with the sequence number (``seq``)
+and message id (``id``) matching the command/event.
+
+After ``kvmi_msg_hdr``, ``id``-specific data of ``size`` bytes will
+follow.
+
+The message header and its data must be sent with one ``sendmsg()`` call
+to the socket. This simplifies the receiver loop and avoids
+the reconstruction of messages on the other side.
+
+The wire protocol uses the host native byte-order. The introspection tool
+must check this during the handshake and do the necessary conversion.
+
+A command reply begins with::
+
+ struct kvmi_error_code {
+ __s32 err;
+ __u32 padding;
+	};
+
+followed by the command specific data if the error code ``err`` is zero.
+
+The error code -KVM_EOPNOTSUPP is returned for unsupported commands.
+
+The error code -KVM_EPERM is returned for disallowed commands (see **Hooking**).
+
+The error code is related to the message processing, including unsupported
+commands. For all the other errors (incomplete messages, wrong sequence
+numbers, socket errors etc.) the socket will be closed. The device
+manager should reconnect.
+
+While all commands will have a reply as soon as possible, the replies
+to events will probably be delayed until a set of (new) commands
+completes::
+
+ Host kernel Tool
+ ----------- ----
+ event 1 ->
+ <- command 1
+ command 1 reply ->
+ <- command 2
+ command 2 reply ->
+ <- event 1 reply
+
+If both ends send a message at the same time::
+
+ Host kernel Tool
+ ----------- ----
+ event X -> <- command X
+
+the host kernel will reply to 'command X', regardless of the receive time
+(before or after the 'event X' was sent).
+
+As can be seen below, the wire protocol specifies occasional padding. This
+is to permit working with the data by directly using C structures or to round
+the structure size to a multiple of 8 bytes (64-bit) to improve the copy
+operations that happen during ``recvmsg()`` or ``sendmsg()``. The members
+should have the native alignment of the host (4 bytes on x86). All padding
+must be initialized with zero, otherwise the respective commands will fail
+with -KVM_EINVAL.
+
+To describe the commands/events, we reuse some conventions from api.txt:
+
+ - Architectures: which instruction set architectures provide this command/event
+
+ - Versions: which versions provide this command/event
+
+ - Parameters: incoming message data
+
+ - Returns: outgoing/reply message data
+
Handshake
---------
@@ -99,6 +178,13 @@ commands/events) to KVM, and forget about it. It will be notified by
KVM when the introspection tool closes the file handle (in case of
errors), and should reinitiate the handshake.
+Once the file handle reaches KVM, the introspection tool should use
+the *KVMI_GET_VERSION* command to get the API version and/or
+the *KVMI_CHECK_COMMAND* and *KVMI_CHECK_EVENTS* commands to see which
+commands/events are allowed for this guest. The error code -KVM_EPERM
+will be returned if the introspection tool uses a command or enables an
+event which is disallowed.
+
Unhooking
---------
@@ -65,4 +65,17 @@ enum {
KVMI_NUM_EVENTS
};
+#define KVMI_MSG_SIZE (4096 - sizeof(struct kvmi_msg_hdr))
+
+struct kvmi_msg_hdr {
+ __u16 id;
+ __u16 size;
+ __u32 seq;
+};
+
+struct kvmi_error_code {
+ __s32 err;
+ __u32 padding;
+};
+
#endif /* _UAPI__LINUX_KVMI_H */
@@ -10,13 +10,54 @@
#include <linux/kthread.h>
#include <linux/bitmap.h>
-int kvmi_init(void)
+static struct kmem_cache *msg_cache;
+
+void *kvmi_msg_alloc(void)
+{
+ return kmem_cache_zalloc(msg_cache, GFP_KERNEL);
+}
+
+void *kvmi_msg_alloc_check(size_t size)
+{
+ if (size > KVMI_MSG_SIZE_ALLOC)
+ return NULL;
+ return kvmi_msg_alloc();
+}
+
+void kvmi_msg_free(void *addr)
+{
+ if (addr)
+ kmem_cache_free(msg_cache, addr);
+}
+
+static void kvmi_cache_destroy(void)
{
+ kmem_cache_destroy(msg_cache);
+ msg_cache = NULL;
+}
+
+static int kvmi_cache_create(void)
+{
+ msg_cache = kmem_cache_create("kvmi_msg", KVMI_MSG_SIZE_ALLOC,
+ 4096, SLAB_ACCOUNT, NULL);
+
+ if (!msg_cache) {
+ kvmi_cache_destroy();
+
+ return -1;
+ }
+
return 0;
}
+int kvmi_init(void)
+{
+ return kvmi_cache_create();
+}
+
void kvmi_uninit(void)
{
+ kvmi_cache_destroy();
}
static bool alloc_kvmi(struct kvm *kvm, const struct kvm_introspection *qemu)
@@ -23,6 +23,8 @@
#define kvmi_err(ikvm, fmt, ...) \
kvm_info("%pU ERROR: " fmt, &ikvm->uuid, ## __VA_ARGS__)
+#define KVMI_MSG_SIZE_ALLOC (sizeof(struct kvmi_msg_hdr) + KVMI_MSG_SIZE)
+
#define KVMI_KNOWN_VCPU_EVENTS ( \
BIT(KVMI_EVENT_CR) | \
BIT(KVMI_EVENT_MSR) | \
@@ -91,4 +93,9 @@ void kvmi_sock_shutdown(struct kvmi *ikvm);
void kvmi_sock_put(struct kvmi *ikvm);
bool kvmi_msg_process(struct kvmi *ikvm);
+/* kvmi.c */
+void *kvmi_msg_alloc(void);
+void *kvmi_msg_alloc_check(size_t size);
+void kvmi_msg_free(void *addr);
+
#endif
@@ -8,6 +8,19 @@
#include <linux/net.h>
#include "kvmi_int.h"
+static const char *const msg_IDs[] = {
+};
+
+static bool is_known_message(u16 id)
+{
+ return id < ARRAY_SIZE(msg_IDs) && msg_IDs[id];
+}
+
+static const char *id2str(u16 id)
+{
+ return is_known_message(id) ? msg_IDs[id] : "unknown";
+}
+
bool kvmi_sock_get(struct kvmi *ikvm, int fd)
{
struct socket *sock;
@@ -35,8 +48,231 @@ void kvmi_sock_shutdown(struct kvmi *ikvm)
kernel_sock_shutdown(ikvm->sock, SHUT_RDWR);
}
+static int kvmi_sock_read(struct kvmi *ikvm, void *buf, size_t size)
+{
+ struct kvec i = {
+ .iov_base = buf,
+ .iov_len = size,
+ };
+ struct msghdr m = { };
+ int rc;
+
+ rc = kernel_recvmsg(ikvm->sock, &m, &i, 1, size, MSG_WAITALL);
+
+ if (rc > 0)
+ print_hex_dump_debug("read: ", DUMP_PREFIX_NONE, 32, 1,
+ buf, rc, false);
+
+ if (unlikely(rc != size)) {
+ if (rc >= 0)
+ rc = -EPIPE;
+ else
+ kvmi_err(ikvm, "kernel_recvmsg: %d\n", rc);
+ return rc;
+ }
+
+ return 0;
+}
+
+static int kvmi_sock_write(struct kvmi *ikvm, struct kvec *i, size_t n,
+ size_t size)
+{
+ struct msghdr m = { };
+ int rc, k;
+
+ rc = kernel_sendmsg(ikvm->sock, &m, i, n, size);
+
+ if (rc > 0)
+ for (k = 0; k < n; k++)
+ print_hex_dump_debug("write: ", DUMP_PREFIX_NONE, 32, 1,
+ i[k].iov_base, i[k].iov_len, false);
+
+ if (unlikely(rc != size)) {
+ kvmi_err(ikvm, "kernel_sendmsg: %d\n", rc);
+ if (rc >= 0)
+ rc = -EPIPE;
+ return rc;
+ }
+
+ return 0;
+}
+
+static int kvmi_msg_reply(struct kvmi *ikvm,
+ const struct kvmi_msg_hdr *msg, int err,
+ const void *rpl, size_t rpl_size)
+{
+ struct kvmi_error_code ec;
+ struct kvmi_msg_hdr h;
+ struct kvec vec[3] = {
+ { .iov_base = &h, .iov_len = sizeof(h) },
+ { .iov_base = &ec, .iov_len = sizeof(ec) },
+ { .iov_base = (void *)rpl, .iov_len = rpl_size },
+ };
+ size_t size = sizeof(h) + sizeof(ec) + (err ? 0 : rpl_size);
+ size_t n = err ? ARRAY_SIZE(vec) - 1 : ARRAY_SIZE(vec);
+
+ memset(&h, 0, sizeof(h));
+ h.id = msg->id;
+ h.seq = msg->seq;
+ h.size = size - sizeof(h);
+
+ memset(&ec, 0, sizeof(ec));
+ ec.err = err;
+
+ return kvmi_sock_write(ikvm, vec, n, size);
+}
+
+static int kvmi_msg_vm_reply(struct kvmi *ikvm,
+ const struct kvmi_msg_hdr *msg, int err,
+ const void *rpl, size_t rpl_size)
+{
+ return kvmi_msg_reply(ikvm, msg, err, rpl, rpl_size);
+}
+
+static bool is_command_allowed(struct kvmi *ikvm, int id)
+{
+ return test_bit(id, ikvm->cmd_allow_mask);
+}
+
+/*
+ * These commands are executed on the receiving thread/worker.
+ */
+static int(*const msg_vm[])(struct kvmi *, const struct kvmi_msg_hdr *,
+ const void *) = {
+};
+
+static bool is_vm_message(u16 id)
+{
+ return id < ARRAY_SIZE(msg_vm) && !!msg_vm[id];
+}
+
+static bool is_unsupported_message(u16 id)
+{
+ bool supported;
+
+ supported = is_known_message(id) && is_vm_message(id);
+
+ return !supported;
+}
+
+static int kvmi_consume_bytes(struct kvmi *ikvm, size_t bytes)
+{
+ size_t to_read;
+ u8 buf[1024];
+ int err = 0;
+
+ while (bytes && !err) {
+ to_read = min(bytes, sizeof(buf));
+
+ err = kvmi_sock_read(ikvm, buf, to_read);
+
+ bytes -= to_read;
+ }
+
+ return err;
+}
+
+static struct kvmi_msg_hdr *kvmi_msg_recv(struct kvmi *ikvm, bool *unsupported)
+{
+ struct kvmi_msg_hdr *msg;
+ int err;
+
+ *unsupported = false;
+
+ msg = kvmi_msg_alloc();
+ if (!msg)
+ goto out_err;
+
+ err = kvmi_sock_read(ikvm, msg, sizeof(*msg));
+ if (err)
+ goto out_err;
+
+ if (msg->size > KVMI_MSG_SIZE)
+ goto out_err_msg;
+
+ if (is_unsupported_message(msg->id)) {
+ if (msg->size && kvmi_consume_bytes(ikvm, msg->size) < 0)
+ goto out_err_msg;
+
+ *unsupported = true;
+ return msg;
+ }
+
+ if (msg->size && kvmi_sock_read(ikvm, msg + 1, msg->size) < 0)
+ goto out_err_msg;
+
+ return msg;
+
+out_err_msg:
+ kvmi_err(ikvm, "%s id %u (%s) size %u\n",
+ __func__, msg->id, id2str(msg->id), msg->size);
+
+out_err:
+ kvmi_msg_free(msg);
+
+ return NULL;
+}
+
+static int kvmi_msg_dispatch_vm_cmd(struct kvmi *ikvm,
+ const struct kvmi_msg_hdr *msg)
+{
+ return msg_vm[msg->id](ikvm, msg, msg + 1);
+}
+
+static int kvmi_msg_dispatch(struct kvmi *ikvm,
+ struct kvmi_msg_hdr *msg, bool *queued)
+{
+ int err;
+
+ err = kvmi_msg_dispatch_vm_cmd(ikvm, msg);
+
+ if (err)
+ kvmi_err(ikvm, "%s: msg id: %u (%s), err: %d\n", __func__,
+ msg->id, id2str(msg->id), err);
+
+ return err;
+}
+
+static bool is_message_allowed(struct kvmi *ikvm, __u16 id)
+{
+ if (id == KVMI_EVENT_REPLY)
+ return true;
+
+ /*
+ * Some commands (eg.pause) request events that might be
+ * disallowed. The command is allowed here, but the function
+ * handling the command will return -KVM_EPERM if the event
+ * is disallowed.
+ */
+ return is_command_allowed(ikvm, id);
+}
+
bool kvmi_msg_process(struct kvmi *ikvm)
{
- kvmi_info(ikvm, "TODO: %s", __func__);
- return false;
+ struct kvmi_msg_hdr *msg;
+ bool queued = false;
+ bool unsupported;
+ int err = -1;
+
+ msg = kvmi_msg_recv(ikvm, &unsupported);
+ if (!msg)
+ goto out;
+
+ if (unsupported) {
+ err = kvmi_msg_vm_reply(ikvm, msg, -KVM_EOPNOTSUPP, NULL, 0);
+ goto out;
+ }
+
+ if (!is_message_allowed(ikvm, msg->id)) {
+ err = kvmi_msg_vm_reply(ikvm, msg, -KVM_EPERM, NULL, 0);
+ goto out;
+ }
+
+ err = kvmi_msg_dispatch(ikvm, msg, &queued);
+
+out:
+ if (!queued)
+ kvmi_msg_free(msg);
+
+ return err == 0;
}
Based on the common header used by all messages (struct kvmi_msg_hdr), the worker will read/validate all messages, execute the VM introspection commands (e.g. KVMI_GET_GUEST_INFO) and dispatch to vCPUs the vCPU introspection commands (e.g. KVMI_GET_REGISTERS) and the replies to vCPU events. The vCPU threads will reply to vCPU introspection commands without the help of the receiving worker. Because of the error code header (struct kvmi_error_code) used in any command reply, this worker could respond to any unsupported/disallowed command with an error code. This thread will end when the socket is closed (signaled by userspace/QEMU or the introspection tool) or on the first API error (e.g. wrong message size). Signed-off-by: Adalbert Lazăr <alazar@bitdefender.com> --- Documentation/virtual/kvm/kvmi.rst | 86 +++++++++++ include/uapi/linux/kvmi.h | 13 ++ virt/kvm/kvmi.c | 43 +++++- virt/kvm/kvmi_int.h | 7 + virt/kvm/kvmi_msg.c | 240 ++++++++++++++++++++++++++++- 5 files changed, 386 insertions(+), 3 deletions(-)