From patchwork Thu Sep 21 18:52:37 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Eric Biggers X-Patchwork-Id: 9964711 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id E4E3B600C5 for ; Thu, 21 Sep 2017 18:54:24 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id D805E2952D for ; Thu, 21 Sep 2017 18:54:24 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id CC53E2952E; Thu, 21 Sep 2017 18:54:24 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-4.1 required=2.0 tests=BAYES_00, DKIM_ADSP_CUSTOM_MED, DKIM_SIGNED, FREEMAIL_FROM, RCVD_IN_DNSWL_MED, T_DKIM_INVALID autolearn=ham version=3.3.1 Received: from mother.openwall.net (mother.openwall.net [195.42.179.200]) by mail.wl.linuxfoundation.org (Postfix) with SMTP id 6DAAA29530 for ; Thu, 21 Sep 2017 18:54:23 +0000 (UTC) Received: (qmail 3579 invoked by uid 550); 21 Sep 2017 18:54:21 -0000 Mailing-List: contact kernel-hardening-help@lists.openwall.com; run by ezmlm Precedence: bulk List-Post: List-Help: List-Unsubscribe: List-Subscribe: List-ID: Delivered-To: mailing list kernel-hardening@lists.openwall.com Received: (qmail 3536 invoked from network); 21 Sep 2017 18:54:19 -0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=9mz/D4x0BHqEb/iK838s2GV74fQdof+iAyhxb7r+kEg=; b=Rr+sjcNIQe7GH0MoW5kte5aiTEXg5PRkFnj4+12cIkhpGzqaXFY08Gr//WC/pMAj1E h7iFgh7+bWh192J1VHtxu2FXXiLRFYaJN2xcs6nrqZe+Tsfvz4SzgHGdbMmKwGkgt6oR M6Y8jfntuh6AYCUDpVCshIh3ySpWJA+KxtEfbJBXgpIjXV0XqzdNGetd1S0cVjzOPno5 8/uRURz0h93hyDuo+78/qOoDQ2s7QNbaTsEgsdQerhLGAgZLkOHiqt4LIDECZEh2UpPu 5SmsHf/dfF4kq0wSfIImfkIpnyiV5X+JOjAhbs1ngHAhjM/KXqpYBqJXlDBlJFamPNJa YPJQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=9mz/D4x0BHqEb/iK838s2GV74fQdof+iAyhxb7r+kEg=; b=JIntMt2E3quUtrxZeYJA5rZVf/g7qwDV9QNBWIhUAo5n1xsBXsgNyHqJNKKvs2IpqF whme7+ViWC1eaJHHuJKHW6okkL7tpeYFKE03nyC2bz/MwZf3vpbGUrG1UHQSnYvmcCDf +3dSrzfaAjAw8Yk1QGk32Du0sA92gKqmmDD6fo8pvCHRzFGVhSNfTTNi/etXZ8xb5Hco aHQdsWfjeKOVO4vXr4zjXLL3JUXeI04hJY5scyS5e4Vm2rwbjrroOdR2HeFzpE6MuiWA 8sYRZOYEYh/yT2X8ue0ZkvSU3XRV18+EQhyW9yVYfE0OOlkS+aKgdkqAPhxWBfdQgAbD KDEg== X-Gm-Message-State: AHPjjUjKPP27r5oWu1pGbjIkOEBLivM49qCbUfxA6Fa2Ioyxl4dyTrgg UtXp0dx3hNIu6QDL5RxlSYA= X-Google-Smtp-Source: AOwi7QD5aEo8Mh2x9u6oB0LebcCJ0uwJ8rFQ3YRVnmKPC5SQJbJFRO1yRXM5CSVfnrskh4xi3nZMDw== X-Received: by 10.99.124.30 with SMTP id x30mr6810461pgc.52.1506020047485; Thu, 21 Sep 2017 11:54:07 -0700 (PDT) From: Eric Biggers To: x86@kernel.org Cc: linux-kernel@vger.kernel.org, kernel-hardening@lists.openwall.com, Andy Lutomirski , Dave Hansen , Dmitry Vyukov , Fenghua Yu , Ingo Molnar , Kevin Hao , Oleg Nesterov , Wanpeng Li , Yu-cheng Yu , Michael Halcrow , Eric Biggers , stable@vger.kernel.org Date: Thu, 21 Sep 2017 11:52:37 -0700 Message-Id: <20170921185239.88398-2-ebiggers3@gmail.com> X-Mailer: git-send-email 2.14.1.821.g8fa685d3b7-goog In-Reply-To: <20170921185239.88398-1-ebiggers3@gmail.com> References: <20170921185239.88398-1-ebiggers3@gmail.com> Subject: [kernel-hardening] [PATCH v3 1/3] x86/fpu: don't let userspace set bogus xcomp_bv X-Virus-Scanned: ClamAV using ClamSMTP From: Eric Biggers On x86, userspace can use the ptrace() or rt_sigreturn() system calls to set a task's extended state (xstate) or "FPU" registers. ptrace() can set them for another task using the PTRACE_SETREGSET request with NT_X86_XSTATE, while rt_sigreturn() can set them for the current task. In either case, registers can be set to any value, but the kernel assumes that the XSAVE area itself remains valid in the sense that the CPU can restore it. However, in the case where the kernel is using the uncompacted xstate format (which it does whenever the XSAVES instruction is unavailable), it was possible for userspace to set the xcomp_bv field in the xstate_header to an arbitrary value. However, all bits in that field are reserved in the uncompacted case, so when switching to a task with nonzero xcomp_bv, the XRSTOR instruction failed with a #GP fault. This caused the WARN_ON_FPU(err) in copy_kernel_to_xregs() to be hit. In addition, since the error is otherwise ignored, the FPU registers from the task previously executing on the CPU were leaked. Fix the bug by checking that the user-supplied value of xcomp_bv is 0 in the uncompacted case, and returning an error otherwise. The reason for validating xcomp_bv rather than simply overwriting it with 0 is that we want userspace to see an error if it (incorrectly) provides an XSAVE area in compacted format rather than in uncompacted format. Note that as before, in case of error we clear the task's FPU state. This is perhaps non-ideal, especially for PTRACE_SETREGSET; it might be better to return an error before changing anything. But it seems the "clear on error" behavior is fine for now, and it's a little tricky to do otherwise because it would mean we couldn't simply copy the full userspace state into kernel memory in one __copy_from_user(). This bug was found by syzkaller, which hit the above-mentioned WARN_ON_FPU(): WARNING: CPU: 1 PID: 0 at ./arch/x86/include/asm/fpu/internal.h:373 __switch_to+0x5b5/0x5d0 CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.13.0 #453 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 task: ffff9ba2bc8e42c0 task.stack: ffffa78cc036c000 RIP: 0010:__switch_to+0x5b5/0x5d0 RSP: 0000:ffffa78cc08bbb88 EFLAGS: 00010082 RAX: 00000000fffffffe RBX: ffff9ba2b8bf2180 RCX: 00000000c0000100 RDX: 00000000ffffffff RSI: 000000005cb10700 RDI: ffff9ba2b8bf36c0 RBP: ffffa78cc08bbbd0 R08: 00000000929fdf46 R09: 0000000000000001 R10: 0000000000000000 R11: 0000000000000000 R12: ffff9ba2bc8e42c0 R13: 0000000000000000 R14: ffff9ba2b8bf3680 R15: ffff9ba2bf5d7b40 FS: 00007f7e5cb10700(0000) GS:ffff9ba2bf400000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000004005cc CR3: 0000000079fd5000 CR4: 00000000001406e0 Call Trace: Code: 84 00 00 00 00 00 e9 11 fd ff ff 0f ff 66 0f 1f 84 00 00 00 00 00 e9 e7 fa ff ff 0f ff 66 0f 1f 84 00 00 00 00 00 e9 c2 fa ff ff <0f> ff 66 0f 1f 84 00 00 00 00 00 e9 d4 fc ff ff 66 66 2e 0f 1f Here is a C reproducer. The expected behavior is that the program spin forever with no output. However, on a buggy kernel running on a processor with the "xsave" feature but without the "xsaves" feature (e.g. Sandy Bridge through Broadwell for Intel), within a second or two the program reports that the xmm registers were corrupted, i.e. were not restored correctly. With CONFIG_X86_DEBUG_FPU=y it also hits the above kernel warning. #define _GNU_SOURCE #include #include #include #include #include #include #include #include int main(void) { int pid = fork(); uint64_t xstate[512]; struct iovec iov = { .iov_base = xstate, .iov_len = sizeof(xstate) }; if (pid == 0) { bool tracee = true; for (int i = 0; i < sysconf(_SC_NPROCESSORS_ONLN) && tracee; i++) tracee = (fork() != 0); uint32_t xmm0[4] = { [0 ... 3] = tracee ? 0x00000000 : 0xDEADBEEF }; asm volatile(" movdqu %0, %%xmm0\n" " mov %0, %%rbx\n" "1: movdqu %%xmm0, %0\n" " mov %0, %%rax\n" " cmp %%rax, %%rbx\n" " je 1b\n" : "+m" (xmm0) : : "rax", "rbx", "xmm0"); printf("BUG: xmm registers corrupted! tracee=%d, xmm0=%08X%08X%08X%08X\n", tracee, xmm0[0], xmm0[1], xmm0[2], xmm0[3]); } else { usleep(100000); ptrace(PTRACE_ATTACH, pid, 0, 0); wait(NULL); ptrace(PTRACE_GETREGSET, pid, NT_X86_XSTATE, &iov); xstate[65] = -1; ptrace(PTRACE_SETREGSET, pid, NT_X86_XSTATE, &iov); ptrace(PTRACE_CONT, pid, 0, 0); wait(NULL); } return 1; } Note: the program only tests for the bug using the ptrace() system call. The bug can also be reproduced using the rt_sigreturn() system call, but only when called from a 32-bit program, since for 64-bit programs the kernel restores the FPU state from the signal frame by doing XRSTOR directly from userspace memory (with proper error checking). Fixes: 0b29643a5843 ("x86/xsaves: Change compacted format xsave area header") Reported-by: Dmitry Vyukov Reviewed-by: Kees Cook Acked-by: Dave Hansen Cc: Andy Lutomirski Cc: Fenghua Yu Cc: Ingo Molnar Cc: Kevin Hao Cc: Oleg Nesterov Cc: Wanpeng Li Cc: Yu-cheng Yu Cc: [v3.17+] Signed-off-by: Eric Biggers Reviewed-by: Rik van Riel --- arch/x86/kernel/fpu/regset.c | 9 +++++++-- arch/x86/kernel/fpu/signal.c | 4 ++++ 2 files changed, 11 insertions(+), 2 deletions(-) diff --git a/arch/x86/kernel/fpu/regset.c b/arch/x86/kernel/fpu/regset.c index b188b16841e3..8ab1a1f4d1c1 100644 --- a/arch/x86/kernel/fpu/regset.c +++ b/arch/x86/kernel/fpu/regset.c @@ -131,11 +131,16 @@ int xstateregs_set(struct task_struct *target, const struct user_regset *regset, fpu__activate_fpstate_write(fpu); - if (boot_cpu_has(X86_FEATURE_XSAVES)) + if (boot_cpu_has(X86_FEATURE_XSAVES)) { ret = copyin_to_xsaves(kbuf, ubuf, xsave); - else + } else { ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf, xsave, 0, -1); + /* xcomp_bv must be 0 when using uncompacted format */ + if (!ret && xsave->header.xcomp_bv) + ret = -EINVAL; + } + /* * In case of failure, mark all states as init: */ diff --git a/arch/x86/kernel/fpu/signal.c b/arch/x86/kernel/fpu/signal.c index 83c23c230b4c..169d6e001d32 100644 --- a/arch/x86/kernel/fpu/signal.c +++ b/arch/x86/kernel/fpu/signal.c @@ -329,6 +329,10 @@ static int __fpu__restore_sig(void __user *buf, void __user *buf_fx, int size) } else { err = __copy_from_user(&fpu->state.xsave, buf_fx, state_size); + + /* xcomp_bv must be 0 when using uncompacted format */ + if (!err && fpu->state.xsave.header.xcomp_bv) + err = -EINVAL; } if (err || __copy_from_user(&env, buf, sizeof(env))) {