
[v8,41/81] KVM: introspection: add the read/dispatch message function

Message ID 20200330101308.21702-42-alazar@bitdefender.com (mailing list archive)
State New, archived
Series VM introspection

Commit Message

Adalbert Lazăr March 30, 2020, 10:12 a.m. UTC
Based on the common header (struct kvmi_msg_hdr), the receiving thread
will read and validate all messages, execute the VM introspection
commands (e.g. KVMI_VM_GET_INFO) and dispatch to the vCPU threads the
vCPU introspection commands (e.g. KVMI_VCPU_GET_REGISTERS) and the
replies to vCPU events.

The vCPU threads will reply to vCPU introspection commands without the
help of the receiving thread.

The receiving thread will end when the socket is closed (by userspace or
by the introspection tool) or on the first API error (e.g. a wrong
message size).
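
For reference, the receiving thread added by an earlier patch in this
series is expected to loop roughly like this (a sketch, not the exact
code):

	/* runs until the socket is closed or a message is malformed */
	while (kvmi_msg_process(kvmi))
		;
	kvmi_sock_shutdown(kvmi);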

Signed-off-by: Adalbert Lazăr <alazar@bitdefender.com>
---
 Documentation/virt/kvm/kvmi.rst               |  86 ++++++++++
 include/uapi/linux/kvmi.h                     |  22 +++
 .../testing/selftests/kvm/x86_64/kvmi_test.c  |  98 ++++++++++++
 virt/kvm/introspection/kvmi.c                 |  38 ++++-
 virt/kvm/introspection/kvmi_int.h             |   4 +
 virt/kvm/introspection/kvmi_msg.c             | 149 +++++++++++++++++-
 6 files changed, 395 insertions(+), 2 deletions(-)

Patch

diff --git a/Documentation/virt/kvm/kvmi.rst b/Documentation/virt/kvm/kvmi.rst
index 2ee37c03585a..efde4b771586 100644
--- a/Documentation/virt/kvm/kvmi.rst
+++ b/Documentation/virt/kvm/kvmi.rst
@@ -65,6 +65,85 @@  used on that guest. Obviously, whether the guest can really continue
 normal execution depends on whether the introspection tool has made any
 modifications that require an active KVMI channel.
 
+All messages (commands or events) have a common header::
+
+	struct kvmi_msg_hdr {
+		__u16 id;
+		__u16 size;
+		__u32 seq;
+	};
+
+The replies have the same header, with the sequence number (``seq``)
+and message id (``id``) matching the command/event.
+
+After ``kvmi_msg_hdr``, ``id``-specific data of ``size`` bytes will
+follow.
+
+The message header and its data must be sent with one ``sendmsg()`` call
+to the socket. This simplifies the receiver loop and avoids
+the reconstruction of messages on the other side.
+
+The wire protocol uses the host's native byte order. The introspection
+tool must check this during the handshake and do any necessary
+conversions.
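
A rough sketch (illustrative only, not part of this patch) of how an
introspection tool might send a command, assuming a connected socket
``fd``; the command ids (e.g. *KVMI_VM_GET_INFO*) are defined by later
patches in this series::

	struct {
		struct kvmi_msg_hdr hdr;
		/* command-specific data, if any, follows the header */
	} req = {};

	req.hdr.id   = KVMI_VM_GET_INFO;  /* example command id */
	req.hdr.seq  = 1;                 /* unique, matched against the reply */
	req.hdr.size = sizeof(req) - sizeof(req.hdr);

	/* the header and its data must go out in a single call */
	if (send(fd, &req, sizeof(req), 0) != sizeof(req))
		/* handle the error, e.g. reconnect */;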
+
+A command reply begins with::
+
+	struct kvmi_error_code {
+		__s32 err;
+		__u32 padding;
+	}
+
+followed by the command-specific data if the error code ``err`` is zero.
+
+The error code -KVM_ENOSYS is returned for unsupported commands.
+
+The error code -KVM_EPERM is returned for disallowed commands (see **Hooking**).
+
+The error code refers to the processing of the message itself, including
+unsupported commands. For all other errors (incomplete messages, wrong
+sequence numbers, socket errors etc.) the socket will be closed and the
+device manager should reconnect.
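
Reading the reply could look roughly like this (same assumptions as
above); the command-specific data is read only when ``err`` is zero::

	struct kvmi_msg_hdr hdr;
	struct kvmi_error_code ec;

	recv(fd, &hdr, sizeof(hdr), MSG_WAITALL);  /* hdr.seq matches the request */
	recv(fd, &ec, sizeof(ec), MSG_WAITALL);

	if (ec.err == 0) {
		/* read hdr.size - sizeof(ec) bytes of command-specific data */
	} else if (ec.err == -KVM_EPERM) {
		/* the command is not allowed for this guest */
	} else {
		/* e.g. -KVM_ENOSYS for an unsupported command */
	}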
+
+While all commands will be replied to as soon as possible, the replies
+to events may be delayed until a set of (new) commands completes::
+
+   Host kernel               Tool
+   -----------               ----
+   event 1 ->
+                             <- command 1
+   command 1 reply ->
+                             <- command 2
+   command 2 reply ->
+                             <- event 1 reply
+
+If both ends send a message at the same time::
+
+   Host kernel               Tool
+   -----------               ----
+   event X ->                <- command X
+
+the host kernel will still reply to 'command X', regardless of whether it
+was received before or after 'event X' was sent.
+
+As can be seen below, the wire protocol specifies occasional padding. This
+is to permit working with the data by directly using C structures or to
+round the structure size to a multiple of 8 bytes (64-bit) to improve the
+copy operations that happen during ``recvmsg()`` or ``sendmsg()``. The
+members should have the native alignment of the host (4 bytes on x86).
+All padding must be initialized to zero, otherwise the respective commands
+will fail with -KVM_EINVAL.
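
Because of this, a command is typically zero-initialized before its
fields are filled in, for example (sketch with a hypothetical command
structure)::

	struct {
		struct kvmi_msg_hdr hdr;
		struct kvmi_example_cmd cmd;  /* hypothetical payload */
	} req;

	memset(&req, 0, sizeof(req));         /* clears all padding bytes */
	/* fill in req.hdr and req.cmd, then send as shown above */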
+
+To describe the commands/events, we reuse some conventions from api.txt:
+
+  - Architectures: which instruction set architectures provide this command/event
+
+  - Versions: which versions provide this command/event
+
+  - Parameters: incoming message data
+
+  - Returns: outgoing/reply message data
+
 Handshake
 ---------
 
@@ -99,6 +178,13 @@  In the end, the device manager will pass the file handle (plus the allowed
 commands/events) to KVM. It will detect when the socket is shutdown
 and it will reinitiate the handshake.
 
+Once the file handle reaches KVM, the introspection tool should
+use the *KVMI_GET_VERSION* command to get the API version and/or the
+*KVMI_VM_CHECK_COMMAND* and *KVMI_VM_CHECK_EVENT* commands to see which
+commands/events are allowed for this guest. The error code -KVM_EPERM
+will be returned if the introspection tool uses a command or enables an
+event which is disallowed.
+
 Unhooking
 ---------
 
diff --git a/include/uapi/linux/kvmi.h b/include/uapi/linux/kvmi.h
index d7b18ffef4fa..6fdaa92393a4 100644
--- a/include/uapi/linux/kvmi.h
+++ b/include/uapi/linux/kvmi.h
@@ -18,4 +18,26 @@  enum {
 	KVMI_NUM_EVENTS
 };
 
+struct kvmi_msg_hdr {
+	__u16 id;
+	__u16 size;
+	__u32 seq;
+};
+
+/*
+ * kvmi_msg_hdr.size is limited to KVMI_MSG_SIZE.
+ * The kernel side will close the socket if userspace
+ * uses a bigger value.
+ * This limit is used to accommodate the biggest known message,
+ * the commands to read/write a 4K page from/to guest memory.
+ */
+enum {
+	KVMI_MSG_SIZE = (4096 * 2 - sizeof(struct kvmi_msg_hdr))
+};
+
+struct kvmi_error_code {
+	__s32 err;
+	__u32 padding;
+};
+
 #endif /* _UAPI__LINUX_KVMI_H */
diff --git a/tools/testing/selftests/kvm/x86_64/kvmi_test.c b/tools/testing/selftests/kvm/x86_64/kvmi_test.c
index d1d02e067393..4c1fe67c8e35 100644
--- a/tools/testing/selftests/kvm/x86_64/kvmi_test.c
+++ b/tools/testing/selftests/kvm/x86_64/kvmi_test.c
@@ -15,6 +15,7 @@ 
 #include "processor.h"
 #include "../lib/kvm_util_internal.h"
 
+#include "linux/kvm_para.h"
 #include "linux/kvmi.h"
 
 #define VCPU_ID         5
@@ -82,10 +83,107 @@  static void unhook_introspection(struct kvm_vm *vm)
 		errno, strerror(errno));
 }
 
+static void receive_data(void *dest, size_t size)
+{
+	ssize_t r;
+
+	r = recv(Userspace_socket, dest, size, MSG_WAITALL);
+	TEST_ASSERT(r == size,
+		"recv() failed, expected %d, result %d, errno %d (%s)\n",
+		size, r, errno, strerror(errno));
+}
+
+static int receive_cmd_reply(struct kvmi_msg_hdr *req, void *rpl,
+			     size_t rpl_size)
+{
+	struct kvmi_msg_hdr hdr;
+	struct kvmi_error_code ec;
+
+	receive_data(&hdr, sizeof(hdr));
+
+	TEST_ASSERT(hdr.seq == req->seq,
+		"Unexpected message sequence 0x%x, expected 0x%x\n",
+		hdr.seq, req->seq);
+
+	TEST_ASSERT(hdr.size >= sizeof(ec),
+		"Invalid message size %d, expected %d bytes (at least)\n",
+		hdr.size, sizeof(ec));
+
+	receive_data(&ec, sizeof(ec));
+
+	if (ec.err) {
+		TEST_ASSERT(hdr.size == sizeof(ec),
+			"Invalid command reply on error\n");
+	} else {
+		TEST_ASSERT(hdr.size == sizeof(ec) + rpl_size,
+			"Invalid command reply\n");
+
+		if (rpl && rpl_size)
+			receive_data(rpl, rpl_size);
+	}
+
+	return ec.err;
+}
+
+static unsigned int new_seq(void)
+{
+	static unsigned int seq;
+
+	return seq++;
+}
+
+static void send_message(int msg_id, struct kvmi_msg_hdr *hdr, size_t size)
+{
+	ssize_t r;
+
+	hdr->id = msg_id;
+	hdr->seq = new_seq();
+	hdr->size = size - sizeof(*hdr);
+
+	r = send(Userspace_socket, hdr, size, 0);
+	TEST_ASSERT(r == size,
+		"send() failed, sending %d, result %d, errno %d (%s)\n",
+		size, r, errno, strerror(errno));
+}
+
+static const char *kvm_strerror(int error)
+{
+	switch (error) {
+	case KVM_ENOSYS:
+		return "Invalid system call number";
+	case KVM_EOPNOTSUPP:
+		return "Operation not supported on transport endpoint";
+	default:
+		return strerror(error);
+	}
+}
+
+static int do_command(int cmd_id, struct kvmi_msg_hdr *req,
+		      size_t req_size, void *rpl, size_t rpl_size)
+{
+	send_message(cmd_id, req, req_size);
+	return receive_cmd_reply(req, rpl, rpl_size);
+}
+
+static void test_cmd_invalid(void)
+{
+	int invalid_msg_id = 0xffff;
+	struct kvmi_msg_hdr req;
+	int r;
+
+	r = do_command(invalid_msg_id, &req, sizeof(req), NULL, 0);
+	TEST_ASSERT(r == -KVM_ENOSYS,
+		"Invalid command didn't fail with KVM_ENOSYS, error %d (%s)\n",
+		-r, kvm_strerror(-r));
+}
+
 static void test_introspection(struct kvm_vm *vm)
 {
 	setup_socket();
 	hook_introspection(vm);
+
+	test_cmd_invalid();
+
 	unhook_introspection(vm);
 }
 
diff --git a/virt/kvm/introspection/kvmi.c b/virt/kvm/introspection/kvmi.c
index 95b08a40d814..88d29408fbf1 100644
--- a/virt/kvm/introspection/kvmi.c
+++ b/virt/kvm/introspection/kvmi.c
@@ -8,13 +8,49 @@ 
 #include "kvmi_int.h"
 #include <linux/kthread.h>
 
+#define KVMI_MSG_SIZE_ALLOC (sizeof(struct kvmi_msg_hdr) + KVMI_MSG_SIZE)
+
+static struct kmem_cache *msg_cache;
+
+void *kvmi_msg_alloc(void)
+{
+	return kmem_cache_zalloc(msg_cache, GFP_KERNEL);
+}
+
+void kvmi_msg_free(void *addr)
+{
+	if (addr)
+		kmem_cache_free(msg_cache, addr);
+}
+
+static void kvmi_cache_destroy(void)
+{
+	kmem_cache_destroy(msg_cache);
+	msg_cache = NULL;
+}
+
+static int kvmi_cache_create(void)
+{
+	msg_cache = kmem_cache_create("kvmi_msg", KVMI_MSG_SIZE_ALLOC,
+				      4096, SLAB_ACCOUNT, NULL);
+
+	if (!msg_cache) {
+		kvmi_cache_destroy();
+
+		return -1;
+	}
+
+	return 0;
+}
+
 int kvmi_init(void)
 {
-	return 0;
+	return kvmi_cache_create();
 }
 
 void kvmi_uninit(void)
 {
+	kvmi_cache_destroy();
 }
 
 static void free_kvmi(struct kvm *kvm)
diff --git a/virt/kvm/introspection/kvmi_int.h b/virt/kvm/introspection/kvmi_int.h
index 1c9cc15ab4d9..36f5e504e791 100644
--- a/virt/kvm/introspection/kvmi_int.h
+++ b/virt/kvm/introspection/kvmi_int.h
@@ -24,4 +24,8 @@  void kvmi_sock_shutdown(struct kvm_introspection *kvmi);
 void kvmi_sock_put(struct kvm_introspection *kvmi);
 bool kvmi_msg_process(struct kvm_introspection *kvmi);
 
+/* kvmi.c */
+void *kvmi_msg_alloc(void);
+void kvmi_msg_free(void *addr);
+
 #endif
diff --git a/virt/kvm/introspection/kvmi_msg.c b/virt/kvm/introspection/kvmi_msg.c
index f9e66274fb43..02fc5d95fef6 100644
--- a/virt/kvm/introspection/kvmi_msg.c
+++ b/virt/kvm/introspection/kvmi_msg.c
@@ -33,7 +33,154 @@  void kvmi_sock_shutdown(struct kvm_introspection *kvmi)
 	kernel_sock_shutdown(kvmi->sock, SHUT_RDWR);
 }
 
+static int kvmi_sock_read(struct kvm_introspection *kvmi, void *buf,
+			  size_t size)
+{
+	struct kvec i = {
+		.iov_base = buf,
+		.iov_len = size,
+	};
+	struct msghdr m = { };
+	int rc;
+
+	rc = kernel_recvmsg(kvmi->sock, &m, &i, 1, size, MSG_WAITALL);
+
+	if (unlikely(rc != size && rc >= 0))
+		rc = -EPIPE;
+
+	return rc >= 0 ? 0 : rc;
+}
+
+static int kvmi_sock_write(struct kvm_introspection *kvmi, struct kvec *i,
+			   size_t n, size_t size)
+{
+	struct msghdr m = { };
+	int rc;
+
+	rc = kernel_sendmsg(kvmi->sock, &m, i, n, size);
+
+	if (unlikely(rc != size && rc >= 0))
+		rc = -EPIPE;
+
+	return rc >= 0 ? 0 : rc;
+}
+
+static int kvmi_msg_reply(struct kvm_introspection *kvmi,
+			  const struct kvmi_msg_hdr *msg, int err,
+			  const void *rpl, size_t rpl_size)
+{
+	struct kvmi_error_code ec;
+	struct kvmi_msg_hdr h;
+	struct kvec vec[3] = {
+		{ .iov_base = &h, .iov_len = sizeof(h) },
+		{ .iov_base = &ec, .iov_len = sizeof(ec) },
+		{ .iov_base = (void *)rpl, .iov_len = rpl_size },
+	};
+	size_t size = sizeof(h) + sizeof(ec) + (err ? 0 : rpl_size);
+	size_t n = err ? ARRAY_SIZE(vec) - 1 : ARRAY_SIZE(vec);
+
+	memset(&h, 0, sizeof(h));
+	h.id = msg->id;
+	h.seq = msg->seq;
+	h.size = size - sizeof(h);
+
+	memset(&ec, 0, sizeof(ec));
+	ec.err = err;
+
+	return kvmi_sock_write(kvmi, vec, n, size);
+}
+
+static int kvmi_msg_vm_reply(struct kvm_introspection *kvmi,
+			     const struct kvmi_msg_hdr *msg,
+			     int err, const void *rpl,
+			     size_t rpl_size)
+{
+	return kvmi_msg_reply(kvmi, msg, err, rpl, rpl_size);
+}
+
+static bool is_command_allowed(struct kvm_introspection *kvmi, u16 id)
+{
+	return id < KVMI_NUM_COMMANDS && test_bit(id, kvmi->cmd_allow_mask);
+}
+
+/*
+ * These commands are executed by the receiving thread/worker.
+ */
+static int(*const msg_vm[])(struct kvm_introspection *,
+			    const struct kvmi_msg_hdr *, const void *) = {
+};
+
+static bool is_vm_command(u16 id)
+{
+	return id < ARRAY_SIZE(msg_vm) && !!msg_vm[id];
+}
+
+static struct kvmi_msg_hdr *kvmi_msg_recv(struct kvm_introspection *kvmi)
+{
+	struct kvmi_msg_hdr *msg;
+	int err;
+
+	msg = kvmi_msg_alloc();
+	if (!msg)
+		goto out_err;
+
+	err = kvmi_sock_read(kvmi, msg, sizeof(*msg));
+	if (err)
+		goto out_err;
+
+	if (msg->size) {
+		if (msg->size > KVMI_MSG_SIZE)
+			goto out_err;
+
+		err = kvmi_sock_read(kvmi, msg + 1, msg->size);
+		if (err)
+			goto out_err;
+	}
+
+	return msg;
+
+out_err:
+	kvmi_msg_free(msg);
+
+	return NULL;
+}
+
+static int kvmi_msg_dispatch_vm_cmd(struct kvm_introspection *kvmi,
+				    const struct kvmi_msg_hdr *msg)
+{
+	return msg_vm[msg->id](kvmi, msg, msg + 1);
+}
+
+static bool is_message_allowed(struct kvm_introspection *kvmi, u16 id)
+{
+	return is_command_allowed(kvmi, id);
+}
+
+static int kvmi_msg_vm_reply_ec(struct kvm_introspection *kvmi,
+				const struct kvmi_msg_hdr *msg, int ec)
+{
+	return kvmi_msg_vm_reply(kvmi, msg, ec, NULL, 0);
+}
+
 bool kvmi_msg_process(struct kvm_introspection *kvmi)
 {
-	return false;
+	struct kvmi_msg_hdr *msg;
+	int err = -1;
+
+	msg = kvmi_msg_recv(kvmi);
+	if (!msg)
+		goto out;
+
+	if (is_vm_command(msg->id)) {
+		if (is_message_allowed(kvmi, msg->id))
+			err = kvmi_msg_dispatch_vm_cmd(kvmi, msg);
+		else
+			err = kvmi_msg_vm_reply_ec(kvmi, msg, -KVM_EPERM);
+	} else {
+		err = kvmi_msg_vm_reply_ec(kvmi, msg, -KVM_ENOSYS);
+	}
+
+	kvmi_msg_free(msg);
+out:
+	return err == 0;
 }