From patchwork Thu Feb 21 23:44:37 2019
X-Patchwork-Submitter: "Edgecombe, Rick P"
X-Patchwork-Id: 10824869
From: Rick Edgecombe <rick.p.edgecombe@intel.com>
To: Andy Lutomirski, Ingo Molnar
Cc: linux-kernel@vger.kernel.org, x86@kernel.org, hpa@zytor.com,
 Thomas Gleixner, Borislav Petkov, Nadav Amit, Dave Hansen,
 Peter Zijlstra, linux_dti@icloud.com, linux-integrity@vger.kernel.org,
 linux-security-module@vger.kernel.org, akpm@linux-foundation.org,
 kernel-hardening@lists.openwall.com, linux-mm@kvack.org,
 will.deacon@arm.com, ard.biesheuvel@linaro.org,
 kristen@linux.intel.com, deneen.t.dock@intel.com, Nadav Amit,
 Kees Cook, Dave Hansen, Masami Hiramatsu, Rick Edgecombe
Subject: [PATCH v3 06/20] x86/alternative: Use temporary mm for text poking
Date: Thu, 21 Feb 2019 15:44:37 -0800
Message-Id: <20190221234451.17632-7-rick.p.edgecombe@intel.com>
In-Reply-To: <20190221234451.17632-1-rick.p.edgecombe@intel.com>
References: <20190221234451.17632-1-rick.p.edgecombe@intel.com>

From: Nadav Amit

text_poke() can potentially compromise security as it sets temporary
PTEs in the fixmap. These PTEs might be used to rewrite the kernel code
from other cores, accidentally or maliciously, if an attacker gains the
ability to write to kernel memory.

Moreover, since remote TLBs are not flushed after the temporary PTEs
are removed, the time window in which the code is writable is not
limited if the fixmap PTEs - maliciously or accidentally - are cached
in the TLB.

To address these potential security hazards, use a temporary mm for
patching the code.
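Concretely, the sequence implemented by the new __text_poke() (see the
diff below) is roughly the following. This is a condensed sketch for
illustration only, not the actual patch: the cross-page case, KASAN
hooks, compiler barriers and error checks are omitted, IRQs are assumed
to be disabled already, the address is assumed to be in the core kernel
(vmalloc addresses need vmalloc_to_page() instead), and poking_mm /
poking_addr are assumed to have been set up at boot:

	/* Sketch: patch kernel text through a CPU-local temporary mm. */
	static void sketch_poke(void *addr, const void *opcode, size_t len)
	{
		pgprot_t pgprot = __pgprot(pgprot_val(PAGE_KERNEL) & ~_PAGE_GLOBAL);
		temp_mm_state_t prev;
		spinlock_t *ptl;
		pte_t *ptep;

		/* 1) Map the target page at poking_addr, in poking_mm only. */
		ptep = get_locked_pte(poking_mm, poking_addr, &ptl);
		set_pte_at(poking_mm, poking_addr, ptep,
			   mk_pte(virt_to_page(addr), pgprot));

		/* 2) Switch this CPU to poking_mm; other CPUs never see the PTE. */
		prev = use_temporary_mm(poking_mm);

		/* 3) Write through the temporary, writable alias. */
		memcpy((u8 *)poking_addr + offset_in_page(addr), opcode, len);

		/* 4) Unmap, switch back, and flush only this mm's TLB entries. */
		pte_clear(poking_mm, poking_addr, ptep);
		unuse_temporary_mm(prev);
		flush_tlb_mm_range(poking_mm, poking_addr,
				   poking_addr + PAGE_SIZE, PAGE_SHIFT, false);
		pte_unmap_unlock(ptep, ptl);
	}

Because the mapping lives only in poking_mm and the mm switch is
per-CPU, the writable alias is visible solely on the patching CPU, and
only between steps 2 and 4.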
Finally, text_poke() is also not conservative enough when mapping
pages: it always maps two pages, even when a single one is sufficient.
So be more conservative, and do not map more than needed.

Cc: Andy Lutomirski
Cc: Kees Cook
Cc: Dave Hansen
Cc: Masami Hiramatsu
Acked-by: Peter Zijlstra (Intel)
Signed-off-by: Nadav Amit
Signed-off-by: Rick Edgecombe
---
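For reference, temp_mm_state_t, use_temporary_mm() and
unuse_temporary_mm() used by __text_poke() below are introduced by an
earlier patch in this series; they have roughly the following shape.
This is a sketch reconstructed from how they are used here, so the
actual definitions may differ in detail:

	typedef struct {
		struct mm_struct *prev;
	} temp_mm_state_t;

	/*
	 * Switch this CPU to @mm without involving other CPUs. Must be
	 * called with IRQs disabled so the loaded mm cannot change
	 * underneath us.
	 */
	static inline temp_mm_state_t use_temporary_mm(struct mm_struct *mm)
	{
		temp_mm_state_t state;

		lockdep_assert_irqs_disabled();
		state.prev = this_cpu_read(cpu_tlbstate.loaded_mm);
		switch_mm_irqs_off(NULL, mm, current);

		return state;
	}

	static inline void unuse_temporary_mm(temp_mm_state_t prev)
	{
		lockdep_assert_irqs_disabled();
		switch_mm_irqs_off(NULL, prev.prev, current);
	}

The key property is that switch_mm_irqs_off() changes the loaded mm
only on the current CPU, so the poking mappings never become visible to
other cores.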
 arch/x86/include/asm/fixmap.h |   2 -
 arch/x86/kernel/alternative.c | 108 +++++++++++++++++++++++++++-------
 arch/x86/xen/mmu_pv.c         |   2 -
 3 files changed, 86 insertions(+), 26 deletions(-)

diff --git a/arch/x86/include/asm/fixmap.h b/arch/x86/include/asm/fixmap.h
index 50ba74a34a37..9da8cccdf3fb 100644
--- a/arch/x86/include/asm/fixmap.h
+++ b/arch/x86/include/asm/fixmap.h
@@ -103,8 +103,6 @@ enum fixed_addresses {
 #ifdef CONFIG_PARAVIRT
 	FIX_PARAVIRT_BOOTMAP,
 #endif
-	FIX_TEXT_POKE1,	/* reserve 2 pages for text_poke() */
-	FIX_TEXT_POKE0, /* first page is last, because allocation is backward */
 #ifdef CONFIG_X86_INTEL_MID
 	FIX_LNW_VRTC,
 #endif
diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index ae05fbb50171..cfe5bfe06f9d 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -11,6 +11,7 @@
 #include <linux/stop_machine.h>
 #include <linux/slab.h>
 #include <linux/kdebug.h>
+#include <linux/mmu_context.h>
 #include <asm/text-patching.h>
 #include <asm/alternative.h>
 #include <asm/sections.h>
@@ -683,41 +684,104 @@ __ro_after_init unsigned long poking_addr;
 
 static void *__text_poke(void *addr, const void *opcode, size_t len)
 {
+	bool cross_page_boundary = offset_in_page(addr) + len > PAGE_SIZE;
+	struct page *pages[2] = {NULL};
+	temp_mm_state_t prev;
 	unsigned long flags;
-	char *vaddr;
-	struct page *pages[2];
-	int i;
+	pte_t pte, *ptep;
+	spinlock_t *ptl;
+	pgprot_t pgprot;
 
 	/*
-	 * While boot memory allocator is runnig we cannot use struct
-	 * pages as they are not yet initialized.
+	 * While boot memory allocator is running we cannot use struct pages as
+	 * they are not yet initialized. There is no way to recover.
 	 */
 	BUG_ON(!after_bootmem);
 
 	if (!core_kernel_text((unsigned long)addr)) {
 		pages[0] = vmalloc_to_page(addr);
-		pages[1] = vmalloc_to_page(addr + PAGE_SIZE);
+		if (cross_page_boundary)
+			pages[1] = vmalloc_to_page(addr + PAGE_SIZE);
 	} else {
 		pages[0] = virt_to_page(addr);
 		WARN_ON(!PageReserved(pages[0]));
-		pages[1] = virt_to_page(addr + PAGE_SIZE);
+		if (cross_page_boundary)
+			pages[1] = virt_to_page(addr + PAGE_SIZE);
 	}
-	BUG_ON(!pages[0]);
+	/*
+	 * If something went wrong, crash and burn since recovery paths are not
+	 * implemented.
+	 */
+	BUG_ON(!pages[0] || (cross_page_boundary && !pages[1]));
+
 	local_irq_save(flags);
-	set_fixmap(FIX_TEXT_POKE0, page_to_phys(pages[0]));
-	if (pages[1])
-		set_fixmap(FIX_TEXT_POKE1, page_to_phys(pages[1]));
-	vaddr = (char *)fix_to_virt(FIX_TEXT_POKE0);
-	memcpy(&vaddr[(unsigned long)addr & ~PAGE_MASK], opcode, len);
-	clear_fixmap(FIX_TEXT_POKE0);
-	if (pages[1])
-		clear_fixmap(FIX_TEXT_POKE1);
-	local_flush_tlb();
-	sync_core();
-	/* Could also do a CLFLUSH here to speed up CPU recovery; but
-	   that causes hangs on some VIA CPUs. */
-	for (i = 0; i < len; i++)
-		BUG_ON(((char *)addr)[i] != ((char *)opcode)[i]);
+
+	/*
+	 * Map the page without the global bit, as TLB flushing is done with
+	 * flush_tlb_mm_range(), which is intended for non-global PTEs.
+	 */
+	pgprot = __pgprot(pgprot_val(PAGE_KERNEL) & ~_PAGE_GLOBAL);
+
+	/*
+	 * The lock is not really needed, but this allows to avoid open-coding.
+	 */
+	ptep = get_locked_pte(poking_mm, poking_addr, &ptl);
+
+	/*
+	 * This must not fail; preallocated in poking_init().
+	 */
+	VM_BUG_ON(!ptep);
+
+	pte = mk_pte(pages[0], pgprot);
+	set_pte_at(poking_mm, poking_addr, ptep, pte);
+
+	if (cross_page_boundary) {
+		pte = mk_pte(pages[1], pgprot);
+		set_pte_at(poking_mm, poking_addr + PAGE_SIZE, ptep + 1, pte);
+	}
+
+	/*
+	 * Loading the temporary mm behaves as a compiler barrier, which
+	 * guarantees that the PTE will be set at the time memcpy() is done.
+	 */
+	prev = use_temporary_mm(poking_mm);
+
+	kasan_disable_current();
+	memcpy((u8 *)poking_addr + offset_in_page(addr), opcode, len);
+	kasan_enable_current();
+
+	/*
+	 * Ensure that the PTE is only cleared after the instructions of memcpy
+	 * were issued by using a compiler barrier.
+	 */
+	barrier();
+
+	pte_clear(poking_mm, poking_addr, ptep);
+	if (cross_page_boundary)
+		pte_clear(poking_mm, poking_addr + PAGE_SIZE, ptep + 1);
+
+	/*
+	 * Loading the previous page-table hierarchy requires a serializing
+	 * instruction that already allows the core to see the updated version.
+	 * Xen-PV is assumed to serialize execution in a similar manner.
+	 */
+	unuse_temporary_mm(prev);
+
+	/*
+	 * Flushing the TLB might involve IPIs, which would require enabled
+	 * IRQs, but not if the mm is not used, as it is in this point.
+	 */
+	flush_tlb_mm_range(poking_mm, poking_addr, poking_addr +
+			   (cross_page_boundary ? 2 : 1) * PAGE_SIZE,
+			   PAGE_SHIFT, false);
+
+	/*
+	 * If the text does not match what we just wrote then something is
+	 * fundamentally screwy; there's nothing we can really do about that.
+	 */
+	BUG_ON(memcmp(addr, opcode, len));
+
+	pte_unmap_unlock(ptep, ptl);
 	local_irq_restore(flags);
 	return addr;
 }
diff --git a/arch/x86/xen/mmu_pv.c b/arch/x86/xen/mmu_pv.c
index 0f4fe206dcc2..82b181fcefe5 100644
--- a/arch/x86/xen/mmu_pv.c
+++ b/arch/x86/xen/mmu_pv.c
@@ -2319,8 +2319,6 @@ static void xen_set_fixmap(unsigned idx, phys_addr_t phys, pgprot_t prot)
 #elif defined(CONFIG_X86_VSYSCALL_EMULATION)
 	case VSYSCALL_PAGE:
 #endif
-	case FIX_TEXT_POKE0:
-	case FIX_TEXT_POKE1:
 		/* All local page mappings */
 		pte = pfn_pte(phys, prot);
 		break;
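The VM_BUG_ON(!ptep) above relies on the PTE for poking_addr having
been preallocated, which this series does at boot in poking_init(). A
rough sketch of what that preallocation involves is below; it is
simplified, the helper name copy_init_mm() is an assumption about the
rest of the series, and the real version also randomizes poking_addr:

	void __init poking_init(void)
	{
		spinlock_t *ptl;
		pte_t *ptep;

		/* A private copy of init_mm, never used by userspace. */
		poking_mm = copy_init_mm();
		BUG_ON(!poking_mm);

		/*
		 * Place the poking address in the userspace half of the
		 * address space so it cannot alias kernel text.
		 */
		poking_addr = TASK_UNMAPPED_BASE;

		/*
		 * Walk (and allocate) the page-table levels down to the PTE
		 * now, so that __text_poke() can rely on get_locked_pte()
		 * succeeding later with IRQs disabled.
		 */
		ptep = get_locked_pte(poking_mm, poking_addr, &ptl);
		BUG_ON(!ptep);
		pte_unmap_unlock(ptep, ptl);
	}

Doing the allocation once at boot keeps __text_poke() allocation-free,
which is what allows it to run under local_irq_save().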