[v12,32/77] KVM: introspection: add the read/dispatch message function

Message ID	20211006173113.26445-33-alazar@bitdefender.com
State	New, archived
Headers	Return-Path: <kvm-owner@kernel.org>
From: Adalbert Lazăr <alazar@bitdefender.com>
To: kvm@vger.kernel.org
Cc: virtualization@lists.linux-foundation.org,
 Paolo Bonzini <pbonzini@redhat.com>,
 Sean Christopherson <seanjc@google.com>,
 Vitaly Kuznetsov <vkuznets@redhat.com>,
 Wanpeng Li <wanpengli@tencent.com>,
 Jim Mattson <jmattson@google.com>,
 Joerg Roedel <joro@8bytes.org>,
 Mathieu Tarral <mathieu.tarral@protonmail.com>,
 Tamas K Lengyel <tamas@tklengyel.com>,
 Adalbert Lazăr <alazar@bitdefender.com>
Subject: [PATCH v12 32/77] KVM: introspection: add the read/dispatch message function
Date: Wed, 6 Oct 2021 20:30:28 +0300
Message-Id: <20211006173113.26445-33-alazar@bitdefender.com>
In-Reply-To: <20211006173113.26445-1-alazar@bitdefender.com>
References: <20211006173113.26445-1-alazar@bitdefender.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Precedence: bulk
Series	VM introspection

diff --git a/Documentation/virt/kvm/kvmi.rst b/Documentation/virt/kvm/kvmi.rst
index 59cc33a39f9f..ae6bbf37aef3 100644
--- a/Documentation/virt/kvm/kvmi.rst
+++ b/Documentation/virt/kvm/kvmi.rst
@@ -65,6 +65,74 @@ been used on that guest (if requested). Obviously, whether the guest can
 really continue normal execution depends on whether the introspection
 tool has made any modifications that require an active KVMI channel.

+All messages (commands or events) have a common header::
+
+	struct kvmi_msg_hdr {
+		__u16 id;
+		__u16 size;
+		__u32 seq;
+	};
+
+The replies have the same header, with the sequence number (``seq``)
+and message id (``id``) matching the command/event.
+
+After ``kvmi_msg_hdr``, ``id`` specific data of ``size`` bytes will
+follow.
+
+The message header and its data must be sent with one ``sendmsg()`` call
+to the socket. This simplifies the receiver loop and avoids
+the reconstruction of messages on the other side.
+
+The wire protocol uses the host native byte-order. The introspection tool
+must check this during the handshake and do the necessary conversion.
+
+A command reply begins with::
+
+	struct kvmi_error_code {
+		__s32 err;
+		__u32 padding;
+	}
+
+followed by the command specific data if the error code ``err`` is zero.
+
+The error code -KVM_ENOSYS is returned for unsupported commands.
+
+The error code -KVM_EPERM is returned for disallowed commands (see **Hooking**).
+
+Other error codes can be returned during message handling, but for
+some errors (incomplete messages, wrong sequence numbers, socket errors
+etc.) the socket will be closed. The device manager should reconnect.
+
+When a vCPU thread sends an introspection event, it will wait (and handle
+any related introspection command) until it gets the event reply::
+
+	Host kernel               Introspection tool
+	-----------               ------------------
+	event 1 ->
+	                       <- command 1
+	command 1 reply ->
+	                       <- command 2
+	command 2 reply ->
+	                       <- event 1 reply
+
+As it can be seen below, the wire protocol specifies occasional padding. This
+is to permit working with the data by directly using C structures or to round
+the structure size to a multiple of 8 bytes (64bit) to improve the copy
+operations that happen during ``recvmsg()`` or ``sendmsg()``. The members
+should have the native alignment of the host. All padding must be
+initialized with zero otherwise the respective command will fail with
+-KVM_EINVAL.
+
+To describe the commands/events, we reuse some conventions from api.rst:
+
+  - Architectures: which instruction set architectures provide this command/event
+
+  - Versions: which versions provide this command/event
+
+  - Parameters: incoming message data
+
+  - Returns: outgoing/reply message data
+
 Handshake
 ---------

@@ -99,6 +167,13 @@ In the end, the device manager will pass the file descriptor (plus
 the allowed commands/events) to KVM. It will detect when the socket is
 shutdown and it will reinitiate the handshake.

+Once the file descriptor reaches KVM, the introspection tool should
+use the *KVMI_GET_VERSION* command to get the API version and/or the
+*KVMI_VM_CHECK_COMMAND* and *KVMI_VM_CHECK_EVENT* commands to see which
+commands/events are allowed for this guest. The error code -KVM_EPERM
+will be returned if the introspection tool uses a command or tries to
+enable an event which is disallowed.
+
 Unhooking
 ---------

diff --git a/include/uapi/linux/kvmi.h b/include/uapi/linux/kvmi.h
index 85f8622ddf95..2b37eee82c52 100644
--- a/include/uapi/linux/kvmi.h
+++ b/include/uapi/linux/kvmi.h
@@ -32,4 +32,15 @@ enum {
 	KVMI_NEXT_VCPU_EVENT
 };

+struct kvmi_msg_hdr {
+	__u16 id;
+	__u16 size;
+	__u32 seq;
+};
+
+struct kvmi_error_code {
+	__s32 err;
+	__u32 padding;
+};
+
 #endif /* _UAPI__LINUX_KVMI_H */
diff --git a/tools/testing/selftests/kvm/x86_64/kvmi_test.c b/tools/testing/selftests/kvm/x86_64/kvmi_test.c
index 25bef2164186..6d7802403f16 100644
--- a/tools/testing/selftests/kvm/x86_64/kvmi_test.c
+++ b/tools/testing/selftests/kvm/x86_64/kvmi_test.c
@@ -15,6 +15,7 @@
 #include "processor.h"
 #include "../lib/kvm_util_internal.h"

+#include "linux/kvm_para.h"
 #include "linux/kvmi.h"

 #define VCPU_ID 1
@@ -117,10 +118,109 @@ static void unhook_introspection(struct kvm_vm *vm)
 		errno, strerror(errno));
 }

+static void receive_data(void *dest, size_t size)
+{
+	ssize_t r;
+
+	r = recv(Userspace_socket, dest, size, MSG_WAITALL);
+	TEST_ASSERT(r == size,
+		"recv() failed, expected %zd, result %zd, errno %d (%s)\n",
+		size, r, errno, strerror(errno));
+}
+
+static int receive_cmd_reply(struct kvmi_msg_hdr *req, void *rpl,
+			     size_t rpl_size)
+{
+	struct kvmi_msg_hdr hdr;
+	struct kvmi_error_code ec;
+
+	receive_data(&hdr, sizeof(hdr));
+
+	TEST_ASSERT(hdr.seq == req->seq,
+		"Unexpected messages sequence 0x%x, expected 0x%x\n",
+		hdr.seq, req->seq);
+
+	TEST_ASSERT(hdr.size >= sizeof(ec),
+		"Invalid message size %d, expected %zd bytes (at least)\n",
+		hdr.size, sizeof(ec));
+
+	receive_data(&ec, sizeof(ec));
+
+	if (ec.err) {
+		TEST_ASSERT(hdr.size == sizeof(ec),
+			"Invalid command reply on error\n");
+	} else {
+		TEST_ASSERT(hdr.size == sizeof(ec) + rpl_size,
+			"Invalid command reply\n");
+
+		if (rpl && rpl_size)
+			receive_data(rpl, rpl_size);
+	}
+
+	return ec.err;
+}
+
+static unsigned int new_seq(void)
+{
+	static unsigned int seq;
+
+	return seq++;
+}
+
+static void send_message(int msg_id, struct kvmi_msg_hdr *hdr, size_t size)
+{
+	ssize_t r;
+
+	hdr->id = msg_id;
+	hdr->seq = new_seq();
+	hdr->size = size - sizeof(*hdr);
+
+	r = send(Userspace_socket, hdr, size, 0);
+	TEST_ASSERT(r == size,
+		"send() failed, sending %zd, result %zd, errno %d (%s)\n",
+		size, r, errno, strerror(errno));
+}
+
+static const char *kvm_strerror(int error)
+{
+	switch (error) {
+	case KVM_ENOSYS:
+		return "Invalid system call number";
+	case KVM_EOPNOTSUPP:
+		return "Operation not supported on transport endpoint";
+	case KVM_EAGAIN:
+		return "Try again";
+	default:
+		return strerror(error);
+	}
+}
+
+static int do_command(int cmd_id, struct kvmi_msg_hdr *req,
+		      size_t req_size, void *rpl, size_t rpl_size)
+{
+	send_message(cmd_id, req, req_size);
+	return receive_cmd_reply(req, rpl, rpl_size);
+}
+
+static void test_cmd_invalid(void)
+{
+	int invalid_msg_id = 0xffff;
+	struct kvmi_msg_hdr req;
+	int r;
+
+	r = do_command(invalid_msg_id, &req, sizeof(req), NULL, 0);
+	TEST_ASSERT(r == -KVM_ENOSYS,
+		"Invalid command didn't failed with KVM_ENOSYS, error %d (%s)\n",
+		-r, kvm_strerror(-r));
+}
+
 static void test_introspection(struct kvm_vm *vm)
 {
 	setup_socket();
 	hook_introspection(vm);
+
+	test_cmd_invalid();
+
 	unhook_introspection(vm);
 }

diff --git a/virt/kvm/introspection/kvmi.c b/virt/kvm/introspection/kvmi.c
index 9b5f1b654125..3c51a5f59ac2 100644
--- a/virt/kvm/introspection/kvmi.c
+++ b/virt/kvm/introspection/kvmi.c
@@ -13,11 +13,51 @@
 #define KVMI_NUM_EVENTS __cmp((int)KVMI_NEXT_VM_EVENT, \
 			(int)KVMI_NEXT_VCPU_EVENT, >)

-int kvmi_init(void)
+#define KVMI_MSG_SIZE_ALLOC (sizeof(struct kvmi_msg_hdr) + KVMI_MAX_MSG_SIZE)
+
+static struct kmem_cache *msg_cache;
+
+void *kvmi_msg_alloc(void)
+{
+	return kmem_cache_zalloc(msg_cache, GFP_KERNEL);
+}
+
+void kvmi_msg_free(void *addr)
+{
+	if (addr)
+		kmem_cache_free(msg_cache, addr);
+}
+
+static void kvmi_cache_destroy(void)
+{
+	kmem_cache_destroy(msg_cache);
+	msg_cache = NULL;
+}
+
+static int kvmi_cache_create(void)
 {
+	msg_cache = kmem_cache_create("kvmi_msg", KVMI_MSG_SIZE_ALLOC,
+				      4096, SLAB_ACCOUNT, NULL);
+
+	if (!msg_cache) {
+		kvmi_cache_destroy();
+
+		return -1;
+	}
+
 	return 0;
 }

+bool kvmi_is_command_allowed(struct kvm_introspection *kvmi, u16 id)
+{
+	return id < KVMI_NUM_COMMANDS && test_bit(id, kvmi->cmd_allow_mask);
+}
+
+int kvmi_init(void)
+{
+	return kvmi_cache_create();
+}
+
 int kvmi_version(void)
 {
 	return KVMI_VERSION;
@@ -25,6 +65,7 @@ int kvmi_version(void)

 void kvmi_uninit(void)
 {
+	kvmi_cache_destroy();
 }

 static void kvmi_free(struct kvm *kvm)
diff --git a/virt/kvm/introspection/kvmi_int.h b/virt/kvm/introspection/kvmi_int.h
index c89875bd2bac..206aaf93f8ba 100644
--- a/virt/kvm/introspection/kvmi_int.h
+++ b/virt/kvm/introspection/kvmi_int.h
@@ -7,6 +7,11 @@
 #include <uapi/linux/kvmi.h>

 #define KVMI(kvm) ((kvm)->kvmi)
+/*
+ * This limit is used to accommodate the largest known fixed-length
+ * message.
+ */
+#define KVMI_MAX_MSG_SIZE (4096 * 2 - sizeof(struct kvmi_msg_hdr))

 /* kvmi_msg.c */
 bool kvmi_sock_get(struct kvm_introspection *kvmi, int fd);
@@ -14,4 +19,9 @@ void kvmi_sock_shutdown(struct kvm_introspection *kvmi);
 void kvmi_sock_put(struct kvm_introspection *kvmi);
 bool kvmi_msg_process(struct kvm_introspection *kvmi);

+/* kvmi.c */
+void *kvmi_msg_alloc(void);
+void kvmi_msg_free(void *addr);
+bool kvmi_is_command_allowed(struct kvm_introspection *kvmi, u16 id);
+
 #endif
diff --git a/virt/kvm/introspection/kvmi_msg.c b/virt/kvm/introspection/kvmi_msg.c
index 9387b1427cbe..b72df00ae8a7 100644
--- a/virt/kvm/introspection/kvmi_msg.c
+++ b/virt/kvm/introspection/kvmi_msg.c
@@ -8,6 +8,10 @@
 #include <linux/net.h>
 #include "kvmi_int.h"

+typedef int (*kvmi_vm_msg_fct)(struct kvm_introspection *kvmi,
+			       const struct kvmi_msg_hdr *msg,
+			       const void *req);
+
 bool kvmi_sock_get(struct kvm_introspection *kvmi, int fd)
 {
 	struct socket *sock;
@@ -33,7 +37,162 @@ void kvmi_sock_shutdown(struct kvm_introspection *kvmi)
 	kernel_sock_shutdown(kvmi->sock, SHUT_RDWR);
 }

+static int handle_sock_rc(int rc, size_t size)
+{
+	if (unlikely(rc < 0))
+		return rc;
+	if (unlikely(rc != size))
+		return -EPIPE;
+	return 0;
+}
+
+static int kvmi_sock_read(struct kvm_introspection *kvmi, void *buf,
+			  size_t size)
+{
+	struct kvec vec = { .iov_base = buf, .iov_len = size, };
+	struct msghdr m = { };
+	int rc;
+
+	rc = kernel_recvmsg(kvmi->sock, &m, &vec, 1, size, MSG_WAITALL);
+
+	return handle_sock_rc(rc, size);
+}
+
+static int kvmi_sock_write(struct kvm_introspection *kvmi, struct kvec *vec,
+			   size_t n, size_t size)
+{
+	struct msghdr m = { };
+	int rc;
+
+	rc = kernel_sendmsg(kvmi->sock, &m, vec, n, size);
+
+	return handle_sock_rc(rc, size);
+}
+
+static int kvmi_msg_reply(struct kvm_introspection *kvmi,
+			  const struct kvmi_msg_hdr *msg, int err,
+			  const void *rpl, size_t rpl_size)
+{
+	struct kvmi_error_code ec;
+	struct kvmi_msg_hdr h;
+	struct kvec vec[3] = {
+		{ .iov_base = &h, .iov_len = sizeof(h) },
+		{ .iov_base = &ec, .iov_len = sizeof(ec) },
+		{ .iov_base = (void *)rpl, .iov_len = rpl_size },
+	};
+	size_t size = sizeof(h) + sizeof(ec) + (err ? 0 : rpl_size);
+	size_t n = ARRAY_SIZE(vec) - (err ? 1 : 0);
+
+	memset(&h, 0, sizeof(h));
+	h.id = msg->id;
+	h.seq = msg->seq;
+	h.size = size - sizeof(h);
+
+	memset(&ec, 0, sizeof(ec));
+	ec.err = err;
+
+	return kvmi_sock_write(kvmi, vec, n, size);
+}
+
+static int kvmi_msg_vm_reply(struct kvm_introspection *kvmi,
+			     const struct kvmi_msg_hdr *msg,
+			     int err, const void *rpl,
+			     size_t rpl_size)
+{
+	return kvmi_msg_reply(kvmi, msg, err, rpl, rpl_size);
+}
+
+/*
+ * These commands are executed by the receiving thread.
+ */
+static const kvmi_vm_msg_fct msg_vm[] = {
+};
+
+static kvmi_vm_msg_fct get_vm_msg_handler(u16 id)
+{
+	return id < ARRAY_SIZE(msg_vm) ? msg_vm[id] : NULL;
+}
+
+static bool is_vm_message(u16 id)
+{
+	bool is_vm_msg_id = (id & 1) == 0;
+
+	return is_vm_msg_id && !!get_vm_msg_handler(id);
+}
+
+static bool is_vm_command(u16 id)
+{
+	return is_vm_message(id);
+}
+
+static struct kvmi_msg_hdr *kvmi_msg_recv(struct kvm_introspection *kvmi)
+{
+	struct kvmi_msg_hdr *msg;
+	int err;
+
+	msg = kvmi_msg_alloc();
+	if (!msg)
+		goto out;
+
+	err = kvmi_sock_read(kvmi, msg, sizeof(*msg));
+	if (err)
+		goto out_err;
+
+	if (msg->size) {
+		if (msg->size > KVMI_MAX_MSG_SIZE)
+			goto out_err;
+
+		err = kvmi_sock_read(kvmi, msg + 1, msg->size);
+		if (err)
+			goto out_err;
+	}
+
+	return msg;
+
+out_err:
+	kvmi_msg_free(msg);
+out:
+	return NULL;
+}
+
+static int kvmi_msg_do_vm_cmd(struct kvm_introspection *kvmi,
+			      const struct kvmi_msg_hdr *msg)
+{
+	kvmi_vm_msg_fct fct = get_vm_msg_handler(msg->id);
+
+	return fct(kvmi, msg, msg + 1);
+}
+
+static int kvmi_msg_vm_reply_ec(struct kvm_introspection *kvmi,
+				const struct kvmi_msg_hdr *msg, int ec)
+{
+	return kvmi_msg_vm_reply(kvmi, msg, ec, NULL, 0);
+}
+
+static int kvmi_msg_handle_vm_cmd(struct kvm_introspection *kvmi,
+				  struct kvmi_msg_hdr *msg)
+{
+	if (!kvmi_is_command_allowed(kvmi, msg->id))
+		return kvmi_msg_vm_reply_ec(kvmi, msg, -KVM_EPERM);
+
+	return kvmi_msg_do_vm_cmd(kvmi, msg);
+}
+
 bool kvmi_msg_process(struct kvm_introspection *kvmi)
 {
-	return false;
+	struct kvmi_msg_hdr *msg;
+	int err = -1;
+
+	msg = kvmi_msg_recv(kvmi);
+	if (!msg)
+		goto out;
+
+	if (is_vm_command(msg->id))
+		err = kvmi_msg_handle_vm_cmd(kvmi, msg);
+	else
+		err = kvmi_msg_vm_reply_ec(kvmi, msg, -KVM_ENOSYS);
+
+	kvmi_msg_free(msg);
+out:
+	return err == 0;
 }
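
The framing rules this patch documents in kvmi.rst (an 8-byte header whose ``size`` counts only the bytes after it, and a reply that starts with ``kvmi_error_code`` and carries data only when ``err`` is zero) can be sketched in plain userspace C. This is an illustration and not part of the patch: `frame_command()` and `parse_reply()` are hypothetical helper names, the `stdint.h` types stand in for the kernel's `__u16`/`__u32`/`__s32`, and the sketch works on memory buffers instead of a socket.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Userspace mirrors of the UAPI structures from the patch; the stdint
 * types stand in for the kernel's __u16/__u32/__s32. */
struct kvmi_msg_hdr {
	uint16_t id;
	uint16_t size;	/* number of bytes that follow the header */
	uint32_t seq;
};

struct kvmi_error_code {
	int32_t err;
	uint32_t padding;
};

static_assert(sizeof(struct kvmi_msg_hdr) == 8, "header must be 8 bytes");
static_assert(sizeof(struct kvmi_error_code) == 8, "error code must be 8 bytes");

/* Hypothetical helper: lay out header + payload contiguously so the whole
 * message can go out in one send call. Returns the total length, or 0 if
 * the buffer is too small. */
static size_t frame_command(void *buf, size_t cap, uint16_t id, uint32_t seq,
			    const void *payload, uint16_t payload_size)
{
	struct kvmi_msg_hdr hdr = { .id = id, .size = payload_size, .seq = seq };
	size_t total = sizeof(hdr) + payload_size;

	if (total > cap)
		return 0;

	memcpy(buf, &hdr, sizeof(hdr));
	if (payload_size)
		memcpy((char *)buf + sizeof(hdr), payload, payload_size);

	return total;
}

/* Hypothetical helper: validate a complete reply held in buf. Returns 0 and
 * fills *err (and *data, NULL on error replies) when the reply is well
 * formed, -1 otherwise. The checks follow receive_cmd_reply() in the
 * selftest, plus an id comparison that the documentation implies. */
static int parse_reply(const void *buf, size_t len,
		       const struct kvmi_msg_hdr *req, size_t rpl_size,
		       int32_t *err, const void **data)
{
	struct kvmi_msg_hdr hdr;
	struct kvmi_error_code ec;

	if (len < sizeof(hdr) + sizeof(ec))
		return -1;

	memcpy(&hdr, buf, sizeof(hdr));
	if (hdr.id != req->id || hdr.seq != req->seq)
		return -1;
	if (len != sizeof(hdr) + hdr.size)
		return -1;

	memcpy(&ec, (const char *)buf + sizeof(hdr), sizeof(ec));
	if (ec.err) {
		/* Error replies carry the error code and nothing else. */
		if (hdr.size != sizeof(ec))
			return -1;
		*data = NULL;
	} else {
		if (hdr.size != sizeof(ec) + rpl_size)
			return -1;
		*data = (const char *)buf + sizeof(hdr) + sizeof(ec);
	}

	*err = ec.err;
	return 0;
}
```

A caller would pass the buffer from `frame_command()` to a single `send()` and, after reading the reply in full, hand it to `parse_reply()` together with the original request header. The sketch assumes both ends share the same byte order, which the documentation says the introspection tool must verify during the handshake.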

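The dispatch order in `kvmi_msg_process()` can also be looked at in isolation: an id counts as a VM command only if it is even and has a handler in the table, anything else gets -KVM_ENOSYS, and only then is the allow mask consulted for -KVM_EPERM. The sketch below is a userspace approximation under stated assumptions: `classify()`, `demo_handler()`, the `enum verdict` values and the 32-bit `allow_mask` are hypothetical stand-ins (the patch uses `test_bit()` on `kvmi->cmd_allow_mask`), and the table here has one entry at id 2 only so that every path can be exercised.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical outcome labels for the three paths in kvmi_msg_process(). */
enum verdict { DISPATCH, REPLY_EPERM, REPLY_ENOSYS };

typedef int (*vm_handler)(const void *req);

static int demo_handler(const void *req)
{
	(void)req;
	return 0;
}

/* Stand-in for msg_vm[]: the table in the patch is still empty, this one
 * registers a single made-up command at id 2. */
static const vm_handler handlers[] = {
	[2] = demo_handler,
};

static vm_handler get_handler(uint16_t id)
{
	return id < ARRAY_SIZE(handlers) ? handlers[id] : NULL;
}

/* Same test as is_vm_message(): even id and a registered handler. */
static bool is_vm_command(uint16_t id)
{
	return (id & 1) == 0 && get_handler(id) != NULL;
}

/* Unknown ids are rejected before the permission check, as in the patch. */
static enum verdict classify(uint16_t id, uint32_t allow_mask)
{
	if (!is_vm_command(id))
		return REPLY_ENOSYS;
	/* id is below ARRAY_SIZE(handlers) here, so the shift is in range. */
	if (!(allow_mask & (UINT32_C(1) << id)))
		return REPLY_EPERM;
	return DISPATCH;
}
```

With the empty `msg_vm[]` table from this patch, the first check fails for every id, which is why the selftest expects -KVM_ENOSYS for message id 0xffff.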