From patchwork Mon Nov 6 11:51:57 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Hildenbrand X-Patchwork-Id: 10043391 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id 10124602BF for ; Mon, 6 Nov 2017 11:52:18 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id EF7BD29B9F for ; Mon, 6 Nov 2017 11:52:17 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id E33D929BA7; Mon, 6 Nov 2017 11:52:17 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-6.9 required=2.0 tests=BAYES_00,RCVD_IN_DNSWL_HI autolearn=unavailable version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 4E4F229B9F for ; Mon, 6 Nov 2017 11:52:17 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752455AbdKFLwD (ORCPT ); Mon, 6 Nov 2017 06:52:03 -0500 Received: from mx1.redhat.com ([209.132.183.28]:38640 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752276AbdKFLwC (ORCPT ); Mon, 6 Nov 2017 06:52:02 -0500 Received: from smtp.corp.redhat.com (int-mx05.intmail.prod.int.phx2.redhat.com [10.5.11.15]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id D6F0449035; Mon, 6 Nov 2017 11:52:01 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mx1.redhat.com D6F0449035 Authentication-Results: ext-mx09.extmail.prod.ext.phx2.redhat.com; dmarc=none (p=none dis=none) header.from=redhat.com Authentication-Results: ext-mx09.extmail.prod.ext.phx2.redhat.com; spf=fail smtp.mailfrom=david@redhat.com Received: from [10.36.117.230] (ovpn-117-230.ams2.redhat.com [10.36.117.230]) by smtp.corp.redhat.com (Postfix) with ESMTP id BB4895D6A8; Mon, 6 Nov 2017 11:51:58 +0000 (UTC) Subject: Re: BUG: sleeping function called from invalid context at ./include/linux/uaccess.h:LINE To: syzbot , borntraeger@de.ibm.com, hpa@zytor.com, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, lstoakes@gmail.com, mhocko@suse.com, mingo@redhat.com, pbonzini@redhat.com, rkrcmar@redhat.com, syzkaller-bugs@googlegroups.com, tglx@linutronix.de, x86@kernel.org References: <001a11447f06b06409055cd621ed@google.com> From: David Hildenbrand Organization: Red Hat GmbH Cc: Nick Desaulniers Message-ID: <367fa416-1adb-53db-4295-c25cec17b893@redhat.com> Date: Mon, 6 Nov 2017 12:51:57 +0100 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.4.0 MIME-Version: 1.0 In-Reply-To: <001a11447f06b06409055cd621ed@google.com> Content-Language: en-US X-Scanned-By: MIMEDefang 2.79 on 10.5.11.15 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.38]); Mon, 06 Nov 2017 11:52:02 +0000 (UTC) Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP On 31.10.2017 12:34, syzbot wrote: > Hello, > > syzkaller hit the following crash on > 91dfed74eabcdae9378131546c446442c29bf769 > git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/master > compiler: gcc (GCC) 7.1.1 20170620 > .config is attached > Raw console output is attached. > C reproducer is attached > syzkaller reproducer is attached. See https://goo.gl/kgGztJ > for information about syzkaller reproducers > > > in_atomic(): 1, irqs_disabled(): 0, pid: 2909, name: syzkaller879109 > 2 locks held by syzkaller879109/2909: > #0: (&vcpu->mutex){+.+.}, at: [] vcpu_load+0x1c/0x70 > arch/x86/kvm/../../../virt/kvm/kvm_main.c:154 > #1: (&kvm->srcu){....}, at: [] vcpu_enter_guest > arch/x86/kvm/x86.c:6983 [inline] > #1: (&kvm->srcu){....}, at: [] vcpu_run > arch/x86/kvm/x86.c:7061 [inline] > #1: (&kvm->srcu){....}, at: [] > kvm_arch_vcpu_ioctl_run+0x1bc2/0x58b0 arch/x86/kvm/x86.c:7222 > CPU: 1 PID: 2909 Comm: syzkaller879109 Not tainted 4.13.0-rc4-next-20170811 > #1 > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 > Call Trace: > __dump_stack lib/dump_stack.c:16 [inline] > dump_stack+0x194/0x257 lib/dump_stack.c:52 > ___might_sleep+0x2b2/0x470 kernel/sched/core.c:6014 > __might_sleep+0x95/0x190 kernel/sched/core.c:5967 > __might_fault+0xab/0x1d0 mm/memory.c:4383 > __copy_from_user include/linux/uaccess.h:71 [inline] > __kvm_read_guest_page+0x58/0xa0 > arch/x86/kvm/../../../virt/kvm/kvm_main.c:1771 > kvm_vcpu_read_guest_page+0x44/0x60 > arch/x86/kvm/../../../virt/kvm/kvm_main.c:1791 > kvm_read_guest_virt_helper+0x76/0x140 arch/x86/kvm/x86.c:4407 > kvm_read_guest_virt_system+0x3c/0x50 arch/x86/kvm/x86.c:4466 > segmented_read_std+0x10c/0x180 arch/x86/kvm/emulate.c:819 > em_fxrstor+0x27b/0x410 arch/x86/kvm/emulate.c:4022 In em_fxrstor, we do a get_fpu(), which in return disables preemption. With preempt_disable(), we do a segmented_read_std()->kvm_vcpu_read_guest_page(), triggering the warning. > x86_emulate_insn+0x55d/0x3c50 arch/x86/kvm/emulate.c:5471 > x86_emulate_instruction+0x411/0x1ca0 arch/x86/kvm/x86.c:5698 > kvm_mmu_page_fault+0x18b/0x2c0 arch/x86/kvm/mmu.c:4854 > handle_ept_violation+0x1fc/0x5e0 arch/x86/kvm/vmx.c:6400 > vmx_handle_exit+0x281/0x1ab0 arch/x86/kvm/vmx.c:8718 > vcpu_enter_guest arch/x86/kvm/x86.c:6999 [inline] > vcpu_run arch/x86/kvm/x86.c:7061 [inline] > kvm_arch_vcpu_ioctl_run+0x1cee/0x58b0 arch/x86/kvm/x86.c:7222 > kvm_vcpu_ioctl+0x64c/0x1010 arch/x86/kvm/../../../virt/kvm/kvm_main.c:2591 > vfs_ioctl fs/ioctl.c:45 [inline] > do_vfs_ioctl+0x1b1/0x1520 fs/ioctl.c:685 > SYSC_ioctl fs/ioctl.c:700 [inline] > SyS_ioctl+0x8f/0xc0 fs/ioctl.c:691 > entry_SYSCALL_64_fastpath+0x1f/0xbe I don't really see a way to avoid two fxstate variables. Unloading the fpu in between the fxstore/fxrstr could lead to host values getting overwritten. Loading/saving the fpu in kvm_arch_vcpu_ioctl_run() would most probably also not work, as the relevant portions of fxregs_state would not get saved/restored. So the preemption would still be needed. So all I can offer for now is the following (untested, can send as proper patch if needed): From f32d06c8c621c6d68e073e9bdb81a6280b6c9544 Mon Sep 17 00:00:00 2001 From: David Hildenbrand Date: Mon, 6 Nov 2017 12:35:39 +0100 Subject: [PATCH v1] KVM: x86: fix em_fxstor sleeping while in atomic Commit 9d643f63128b tried to optimize the stack size, but introduced a guest memory access which might sleep while in atomic. Let's undo that part of the commit but keep the cleanups. Reported by syzbot: in_atomic(): 1, irqs_disabled(): 0, pid: 2909, name: syzkaller879109 2 locks held by syzkaller879109/2909: #0: (&vcpu->mutex){+.+.}, at: [] vcpu_load+0x1c/0x70 arch/x86/kvm/../../../virt/kvm/kvm_main.c:154 #1: (&kvm->srcu){....}, at: [] vcpu_enter_guest arch/x86/kvm/x86.c:6983 [inline] #1: (&kvm->srcu){....}, at: [] vcpu_run arch/x86/kvm/x86.c:7061 [inline] #1: (&kvm->srcu){....}, at: [] kvm_arch_vcpu_ioctl_run+0x1bc2/0x58b0 arch/x86/kvm/x86.c:7222 CPU: 1 PID: 2909 Comm: syzkaller879109 Not tainted 4.13.0-rc4-next-20170811 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:16 [inline] dump_stack+0x194/0x257 lib/dump_stack.c:52 ___might_sleep+0x2b2/0x470 kernel/sched/core.c:6014 __might_sleep+0x95/0x190 kernel/sched/core.c:5967 __might_fault+0xab/0x1d0 mm/memory.c:4383 __copy_from_user include/linux/uaccess.h:71 [inline] __kvm_read_guest_page+0x58/0xa0 arch/x86/kvm/../../../virt/kvm/kvm_main.c:1771 kvm_vcpu_read_guest_page+0x44/0x60 arch/x86/kvm/../../../virt/kvm/kvm_main.c:1791 kvm_read_guest_virt_helper+0x76/0x140 arch/x86/kvm/x86.c:4407 kvm_read_guest_virt_system+0x3c/0x50 arch/x86/kvm/x86.c:4466 segmented_read_std+0x10c/0x180 arch/x86/kvm/emulate.c:819 em_fxrstor+0x27b/0x410 arch/x86/kvm/emulate.c:4022 x86_emulate_insn+0x55d/0x3c50 arch/x86/kvm/emulate.c:5471 x86_emulate_instruction+0x411/0x1ca0 arch/x86/kvm/x86.c:5698 kvm_mmu_page_fault+0x18b/0x2c0 arch/x86/kvm/mmu.c:4854 handle_ept_violation+0x1fc/0x5e0 arch/x86/kvm/vmx.c:6400 vmx_handle_exit+0x281/0x1ab0 arch/x86/kvm/vmx.c:8718 vcpu_enter_guest arch/x86/kvm/x86.c:6999 [inline] vcpu_run arch/x86/kvm/x86.c:7061 [inline] kvm_arch_vcpu_ioctl_run+0x1cee/0x58b0 arch/x86/kvm/x86.c:7222 kvm_vcpu_ioctl+0x64c/0x1010 arch/x86/kvm/../../../virt/kvm/kvm_main.c:2591 vfs_ioctl fs/ioctl.c:45 [inline] do_vfs_ioctl+0x1b1/0x1520 fs/ioctl.c:685 SYSC_ioctl fs/ioctl.c:700 [inline] SyS_ioctl+0x8f/0xc0 fs/ioctl.c:691 entry_SYSCALL_64_fastpath+0x1f/0xbe RIP: 0033:0x437fc9 RSP: 002b:00007ffc7b4d5ab8 EFLAGS: 00000206 ORIG_RAX: 0000000000000010 RAX: ffffffffffffffda RBX: 00000000004002b0 RCX: 0000000000437fc9 RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 0000000000000005 RBP: 0000000000000086 R08: 0000000000000000 R09: 0000000020ae8000 R10: 0000000000009120 R11: 0000000000000206 R12: 0000000000000000 R13: 0000000000000004 R14: 0000000000000004 R15: 0000000020077000 Fixes: 9d643f63128b ("KVM: x86: avoid large stack allocations in em_fxrstor") Signed-off-by: David Hildenbrand --- arch/x86/kvm/emulate.c | 16 +++++++++------- 1 file changed, 9 insertions(+), 7 deletions(-) diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c index fb0055953fbc..d87f01a2d6f4 100644 --- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -4002,7 +4002,7 @@ static int em_fxsave(struct x86_emulate_ctxt *ctxt) static int em_fxrstor(struct x86_emulate_ctxt *ctxt) { - struct fxregs_state fx_state; + struct fxregs_state fx_state, fx_old; int rc; size_t size; @@ -4010,19 +4010,21 @@ static int em_fxrstor(struct x86_emulate_ctxt *ctxt) if (rc != X86EMUL_CONTINUE) return rc; + size = fxstate_size(ctxt); + rc = segmented_read_std(ctxt, ctxt->memop.addr.mem, &fx_state, size); + if (rc != X86EMUL_CONTINUE) + return rc; + ctxt->ops->get_fpu(ctxt); - size = fxstate_size(ctxt); if (size < __fxstate_size(16)) { - rc = asm_safe("fxsave %[fx]", , [fx] "+m"(fx_state)); + rc = asm_safe("fxsave %[fx]", , [fx] "+m"(fx_old)); if (rc != X86EMUL_CONTINUE) goto out; + memcpy(((void *)&fx_state) + size, ((void *)&fx_old) + size, + __fxstate_size(16) - size); } - rc = segmented_read_std(ctxt, ctxt->memop.addr.mem, &fx_state, size); - if (rc != X86EMUL_CONTINUE) - goto out; - if (fx_state.mxcsr >> 16) { rc = emulate_gp(ctxt, 0); goto out;