From patchwork Wed Dec 19 21:33:30 2018
X-Patchwork-Submitter: Igor Stoppa
X-Patchwork-Id: 10738187
From: Igor Stoppa
To: Andy Lutomirski, Matthew Wilcox, Peter Zijlstra, Dave Hansen,
 Mimi Zohar
Cc: igor.stoppa@huawei.com, Nadav Amit, Kees Cook,
 linux-integrity@vger.kernel.org, kernel-hardening@lists.openwall.com,
 linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: [PATCH 04/12] __wr_after_init: x86_64: __wr_op
Date: Wed, 19 Dec 2018 23:33:30 +0200
Message-Id: <20181219213338.26619-5-igor.stoppa@huawei.com>
X-Mailer: git-send-email 2.19.1
In-Reply-To: <20181219213338.26619-1-igor.stoppa@huawei.com>
References: <20181219213338.26619-1-igor.stoppa@huawei.com>

Architecture-specific implementation of the core write rare operation.

The implementation is based on code from Andy Lutomirski and Nadav Amit
for patching the text on x86 [here goes reference to commits, once
merged].

Write protected data is modified through an alternate mapping of the
same pages, as writable. The mapping is persistent, but it is active
only on the core performing the write rare operation, and only for the
duration of that operation.

Local interrupts are disabled while the alternate mapping is active. On
a preemptible system this could, in theory, introduce an unpredictable
delay; in practice, the amount of data to be altered is likely to be
far smaller than a page.

Signed-off-by: Igor Stoppa
CC: Andy Lutomirski
CC: Nadav Amit
CC: Matthew Wilcox
CC: Peter Zijlstra
CC: Kees Cook
CC: Dave Hansen
CC: Mimi Zohar
CC: linux-integrity@vger.kernel.org
CC: kernel-hardening@lists.openwall.com
CC: linux-mm@kvack.org
CC: linux-kernel@vger.kernel.org
---
(A sketch of the assumed temporary-mm primitives and an example caller
follow the patch, for reference.)

 arch/x86/Kconfig     |   1 +
 arch/x86/mm/Makefile |   2 +
 arch/x86/mm/prmem.c  | 111 +++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 114 insertions(+)
 create mode 100644 arch/x86/mm/prmem.c

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 8689e794a43c..e5e4fc4fa5c2 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -32,6 +32,7 @@ config X86_64
 	select SWIOTLB
 	select X86_DEV_DMA_OPS
 	select ARCH_HAS_SYSCALL_WRAPPER
+	select ARCH_HAS_PRMEM
 
 #
 # Arch settings
diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile
index 4b101dd6e52f..66652de1e2c7 100644
--- a/arch/x86/mm/Makefile
+++ b/arch/x86/mm/Makefile
@@ -53,3 +53,5 @@ obj-$(CONFIG_PAGE_TABLE_ISOLATION)	+= pti.o
 obj-$(CONFIG_AMD_MEM_ENCRYPT)	+= mem_encrypt.o
 obj-$(CONFIG_AMD_MEM_ENCRYPT)	+= mem_encrypt_identity.o
 obj-$(CONFIG_AMD_MEM_ENCRYPT)	+= mem_encrypt_boot.o
+
+obj-$(CONFIG_PRMEM)		+= prmem.o
diff --git a/arch/x86/mm/prmem.c b/arch/x86/mm/prmem.c
new file mode 100644
index 000000000000..fc367551e736
--- /dev/null
+++ b/arch/x86/mm/prmem.c
@@ -0,0 +1,111 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * prmem.c: Memory Protection Library
+ *
+ * (C) Copyright 2017-2018 Huawei Technologies Co. Ltd.
+ * Author: Igor Stoppa
+ */
+
+#include <linux/mm.h>
+#include <linux/uaccess.h>
+#include <linux/spinlock.h>
+#include <linux/prmem.h>
+#include <asm/pgtable.h>
+#include <asm/mmu_context.h>
+#include <asm/kaslr.h>
+
+static __ro_after_init bool wr_ready;
+static __ro_after_init struct mm_struct *wr_poking_mm;
+static __ro_after_init unsigned long wr_poking_base;
+
+/*
+ * The following two variables are statically allocated by the linker
+ * script at the boundaries of the memory region (rounded up to
+ * multiples of PAGE_SIZE) reserved for __wr_after_init.
+ */
+extern long __start_wr_after_init;
+extern long __end_wr_after_init;
+
+static inline bool is_wr_after_init(unsigned long ptr, __kernel_size_t size)
+{
+	unsigned long start = (unsigned long)&__start_wr_after_init;
+	unsigned long end = (unsigned long)&__end_wr_after_init;
+	unsigned long low = ptr;
+	unsigned long high = ptr + size;
+
+	return likely(start <= low && low <= high && high <= end);
+}
+
+void *__wr_op(unsigned long dst, unsigned long src, __kernel_size_t len,
+	      enum wr_op_type op)
+{
+	temporary_mm_state_t prev;
+	unsigned long offset;
+	unsigned long wr_poking_addr;
+
+	/* Confirm that the writable mapping exists. */
+	if (WARN_ONCE(!wr_ready, "No writable mapping available"))
+		return (void *)dst;
+
+	if (WARN_ONCE(op >= WR_OPS_NUMBER, "Invalid WR operation.") ||
+	    WARN_ONCE(!is_wr_after_init(dst, len), "Invalid WR range."))
+		return (void *)dst;
+
+	offset = dst - (unsigned long)&__start_wr_after_init;
+	wr_poking_addr = wr_poking_base + offset;
+	local_irq_disable();
+	prev = use_temporary_mm(wr_poking_mm);
+
+	if (op == WR_MEMCPY)
+		copy_to_user((void __user *)wr_poking_addr, (void *)src, len);
+	else if (op == WR_MEMSET)
+		memset_user((void __user *)wr_poking_addr, (u8)src, len);
+
+	unuse_temporary_mm(prev);
+	local_irq_enable();
+	return (void *)dst;
+}
+
+struct mm_struct *copy_init_mm(void);
+void __init wr_poking_init(void)
+{
+	unsigned long start = (unsigned long)&__start_wr_after_init;
+	unsigned long end = (unsigned long)&__end_wr_after_init;
+	unsigned long i;
+	unsigned long wr_range;
+
+	wr_poking_mm = copy_init_mm();
+	if (WARN_ONCE(!wr_poking_mm, "No alternate mapping available."))
+		return;
+
+	wr_range = round_up(end - start, PAGE_SIZE);
+
+	/* Randomize the poking address base, within the user address range. */
+	wr_poking_base = TASK_UNMAPPED_BASE +
+		(kaslr_get_random_long("Write Rare Poking") & PAGE_MASK) %
+		(TASK_SIZE - (TASK_UNMAPPED_BASE + wr_range));
+
+	/* Create alternate mapping for the entire wr_after_init range. */
+	for (i = start; i < end; i += PAGE_SIZE) {
+		struct page *page;
+		spinlock_t *ptl;
+		pte_t pte;
+		pte_t *ptep;
+		unsigned long wr_poking_addr;
+
+		page = virt_to_page(i);
+		if (WARN_ONCE(!page, "WR memory without physical page"))
+			return;
+		wr_poking_addr = i - start + wr_poking_base;
+
+		/* The lock is not needed, but using get_locked_pte()
+		 * avoids open-coding the page table walk. */
+		ptep = get_locked_pte(wr_poking_mm, wr_poking_addr, &ptl);
+		if (WARN_ONCE(!ptep, "No pte for writable mapping"))
+			return;
+
+		pte = mk_pte(page, PAGE_KERNEL);
+		set_pte_at(wr_poking_mm, wr_poking_addr, ptep, pte);
+		spin_unlock(ptl);
+	}
+	wr_ready = true;
+}
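
For reference: temporary_mm_state_t, use_temporary_mm() and
unuse_temporary_mm() are not defined by this patch; they are assumed
from the x86 text-poking work by Andy Lutomirski and Nadav Amit that
the commit message refers to. A minimal sketch of what those
primitives do, assuming the x86 cpu_tlbstate / switch_mm_irqs_off()
interfaces (illustration only, not this patch's code):

	/*
	 * Sketch of the assumed temporary-mm primitives; the real
	 * definitions belong to the text-poking series, not this patch.
	 */
	typedef struct {
		struct mm_struct *prev;
	} temporary_mm_state_t;

	static inline temporary_mm_state_t use_temporary_mm(struct mm_struct *mm)
	{
		temporary_mm_state_t state;

		lockdep_assert_irqs_disabled();
		/* Remember the mm currently loaded on this CPU... */
		state.prev = this_cpu_read(cpu_tlbstate.loaded_mm);
		/* ...and load the poking mm, without touching current->mm. */
		switch_mm_irqs_off(NULL, mm, current);
		return state;
	}

	static inline void unuse_temporary_mm(temporary_mm_state_t prev)
	{
		lockdep_assert_irqs_disabled();
		switch_mm_irqs_off(NULL, prev.prev, current);
	}

Switching the loaded mm rather than current->mm means the writable
alias is reachable only on this CPU, and only between the two calls;
with local interrupts disabled, nothing else can run on the CPU while
the alias is mapped.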
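For context, an example of how a write-rare variable would be updated
through this primitive. wr_memcpy() is the generic wrapper built on
top of __wr_op() elsewhere in this series; the variable and function
names below are made up for the example:

	/* Hypothetical caller -- names are illustrative only. */
	static int allowed_mode __wr_after_init;

	static void set_allowed_mode(int new_mode)
	{
		/*
		 * A direct assignment would fault, since the variable is
		 * mapped read-only after init; the write goes through the
		 * alternate mapping, via the wr_memcpy() wrapper of
		 * __wr_op(..., WR_MEMCPY).
		 */
		wr_memcpy(&allowed_mode, &new_mode, sizeof(allowed_mode));
	}

Reads need no wrapper: the variable stays readable through its normal,
read-only kernel mapping.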