From patchwork Thu Apr 25 21:45:48 2019
X-Patchwork-Submitter: Mike Rapoport
X-Patchwork-Id: 10917791
From: Mike Rapoport
To: linux-kernel@vger.kernel.org
Cc: Alexandre Chartre , Andy Lutomirski , Borislav Petkov , Dave Hansen , "H.
Peter Anvin" , Ingo Molnar , James Bottomley , Jonathan Adams , Kees Cook , Paul Turner , Peter Zijlstra , Thomas Gleixner , linux-mm@kvack.org, linux-security-module@vger.kernel.org, x86@kernel.org, Mike Rapoport Subject: [RFC PATCH 1/7] x86/cpufeatures: add X86_FEATURE_SCI Date: Fri, 26 Apr 2019 00:45:48 +0300 X-Mailer: git-send-email 2.7.4 In-Reply-To: <1556228754-12996-1-git-send-email-rppt@linux.ibm.com> References: <1556228754-12996-1-git-send-email-rppt@linux.ibm.com> X-TM-AS-GCONF: 00 x-cbid: 19042521-0012-0000-0000-000003150D15 X-IBM-AV-DETECTION: SAVI=unused REMOTE=unused XFE=unused x-cbparentid: 19042521-0013-0000-0000-0000214D68D7 Message-Id: <1556228754-12996-2-git-send-email-rppt@linux.ibm.com> X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:,, definitions=2019-04-25_18:,, signatures=0 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 priorityscore=1501 malwarescore=0 suspectscore=1 phishscore=0 bulkscore=0 spamscore=0 clxscore=1015 lowpriorityscore=0 mlxscore=0 impostorscore=0 mlxlogscore=706 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1810050000 definitions=main-1904250133 Sender: owner-linux-security-module@vger.kernel.org Precedence: bulk List-ID: X-Virus-Scanned: ClamAV using ClamSMTP The X86_FEATURE_SCI will be set when system call isolation is enabled. Signed-off-by: Mike Rapoport --- arch/x86/include/asm/cpufeatures.h | 1 + arch/x86/include/asm/disabled-features.h | 8 +++++++- 2 files changed, 8 insertions(+), 1 deletion(-) diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h index 6d61225..a01c6dd 100644 --- a/arch/x86/include/asm/cpufeatures.h +++ b/arch/x86/include/asm/cpufeatures.h @@ -221,6 +221,7 @@ #define X86_FEATURE_ZEN ( 7*32+28) /* "" CPU is AMD family 0x17 (Zen) */ #define X86_FEATURE_L1TF_PTEINV ( 7*32+29) /* "" L1TF workaround PTE inversion */ #define X86_FEATURE_IBRS_ENHANCED ( 7*32+30) /* Enhanced IBRS */ +#define X86_FEATURE_SCI ( 7*32+31) /* "" System call isolation */ /* Virtualization flags: Linux defined, word 8 */ #define X86_FEATURE_TPR_SHADOW ( 8*32+ 0) /* Intel TPR Shadow */ diff --git a/arch/x86/include/asm/disabled-features.h b/arch/x86/include/asm/disabled-features.h index a5ea841..79947f0 100644 --- a/arch/x86/include/asm/disabled-features.h +++ b/arch/x86/include/asm/disabled-features.h @@ -62,6 +62,12 @@ # define DISABLE_PTI (1 << (X86_FEATURE_PTI & 31)) #endif +#ifdef CONFIG_SYSCALL_ISOLATION +# define DISABLE_SCI 0 +#else +# define DISABLE_SCI (1 << (X86_FEATURE_SCI & 31)) +#endif + /* * Make sure to add features to the correct mask */ @@ -72,7 +78,7 @@ #define DISABLED_MASK4 (DISABLE_PCID) #define DISABLED_MASK5 0 #define DISABLED_MASK6 0 -#define DISABLED_MASK7 (DISABLE_PTI) +#define DISABLED_MASK7 (DISABLE_PTI|DISABLE_SCI) #define DISABLED_MASK8 0 #define DISABLED_MASK9 (DISABLE_MPX|DISABLE_SMAP) #define DISABLED_MASK10 0 From patchwork Thu Apr 25 21:45:49 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mike Rapoport X-Patchwork-Id: 10917817 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 197F714D5 for ; Thu, 25 Apr 2019 21:46:55 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 0A40228D41 for ; Thu, 25 Apr 2019 21:46:55 +0000 (UTC) Received: by 
mail.wl.linuxfoundation.org (Postfix, from userid 486) id F14A028D43; Thu, 25 Apr 2019 21:46:54 +0000 (UTC)
From: Mike Rapoport
To: linux-kernel@vger.kernel.org
Cc: Alexandre Chartre, Andy Lutomirski, Borislav Petkov, Dave Hansen, "H. Peter Anvin", Ingo Molnar, James Bottomley, Jonathan Adams, Kees Cook, Paul Turner, Peter Zijlstra, Thomas Gleixner, linux-mm@kvack.org, linux-security-module@vger.kernel.org, x86@kernel.org, Mike Rapoport
Subject: [RFC PATCH 2/7] x86/sci: add core implementation for system call isolation
Date: Fri, 26 Apr 2019 00:45:49 +0300
In-Reply-To: <1556228754-12996-1-git-send-email-rppt@linux.ibm.com>
References: <1556228754-12996-1-git-send-email-rppt@linux.ibm.com>
Message-Id: <1556228754-12996-3-git-send-email-rppt@linux.ibm.com>

When enabled, system call isolation (SCI) allows execution of system calls
with reduced page tables. These page tables are almost identical to the user
page tables in PTI. The only addition is the code page containing the system
call entry function that will continue execution after the context switch.
Unlike the PTI page tables, there is no sharing at the higher levels and the
entire hierarchy of the SCI page tables is cloned.

The SCI page tables are created when a system call that requires isolation is
executed for the first time. Whenever a system call should run in the isolated
environment, the context is switched to the SCI page tables. Any further
access to kernel memory generates a page fault. The page fault handler can
verify that the access is safe and grant it, or kill the task otherwise.

The initial SCI implementation allows access to any kernel data, but it limits
access to code in the following way:

 * calls and jumps to known code symbols without an offset are allowed
 * calls and jumps into a known symbol with an offset are allowed only if that
   symbol was already accessed and the offset is within the next page
 * all other code accesses are blocked

After the isolated system call finishes, the mappings created during its
execution are cleared. The entire SCI page table is lazily freed at task
exit() time.
Signed-off-by: Mike Rapoport --- arch/x86/include/asm/sci.h | 55 ++++ arch/x86/mm/Makefile | 1 + arch/x86/mm/init.c | 2 + arch/x86/mm/sci.c | 608 +++++++++++++++++++++++++++++++++++++++++++++ include/linux/sched.h | 5 + include/linux/sci.h | 12 + 6 files changed, 683 insertions(+) create mode 100644 arch/x86/include/asm/sci.h create mode 100644 arch/x86/mm/sci.c create mode 100644 include/linux/sci.h diff --git a/arch/x86/include/asm/sci.h b/arch/x86/include/asm/sci.h new file mode 100644 index 0000000..0b56200 --- /dev/null +++ b/arch/x86/include/asm/sci.h @@ -0,0 +1,55 @@ +// SPDX-License-Identifier: GPL-2.0 +#ifndef _ASM_X86_SCI_H +#define _ASM_X86_SCI_H + +#ifdef CONFIG_SYSCALL_ISOLATION + +struct sci_task_data { + pgd_t *pgd; + unsigned long cr3_offset; + unsigned long backtrace_size; + unsigned long *backtrace; + unsigned long ptes_count; + pte_t **ptes; +}; + +struct sci_percpu_data { + unsigned long sci_syscall; + unsigned long sci_cr3_offset; +}; + +DECLARE_PER_CPU_PAGE_ALIGNED(struct sci_percpu_data, cpu_sci); + +void sci_check_boottime_disable(void); + +int sci_init(struct task_struct *tsk); +void sci_exit(struct task_struct *tsk); + +bool sci_verify_and_map(struct pt_regs *regs, unsigned long addr, + unsigned long hw_error_code); +void sci_clear_data(void); + +static inline void sci_switch_to(struct task_struct *next) +{ + this_cpu_write(cpu_sci.sci_syscall, next->in_isolated_syscall); + if (next->sci) + this_cpu_write(cpu_sci.sci_cr3_offset, next->sci->cr3_offset); +} + +#else /* CONFIG_SYSCALL_ISOLATION */ + +static inline void sci_check_boottime_disable(void) {} + +static inline bool sci_verify_and_map(struct pt_regs *regs,unsigned long addr, + unsigned long hw_error_code) +{ + return true; +} + +static inline void sci_clear_data(void) {} + +static inline void sci_switch_to(struct task_struct *next) {} + +#endif /* CONFIG_SYSCALL_ISOLATION */ + +#endif /* _ASM_X86_SCI_H */ diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile index 4b101dd..9a728b7 100644 --- a/arch/x86/mm/Makefile +++ b/arch/x86/mm/Makefile @@ -49,6 +49,7 @@ obj-$(CONFIG_X86_INTEL_MPX) += mpx.o obj-$(CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS) += pkeys.o obj-$(CONFIG_RANDOMIZE_MEMORY) += kaslr.o obj-$(CONFIG_PAGE_TABLE_ISOLATION) += pti.o +obj-$(CONFIG_SYSCALL_ISOLATION) += sci.o obj-$(CONFIG_AMD_MEM_ENCRYPT) += mem_encrypt.o obj-$(CONFIG_AMD_MEM_ENCRYPT) += mem_encrypt_identity.o diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c index f905a23..b6e2db4 100644 --- a/arch/x86/mm/init.c +++ b/arch/x86/mm/init.c @@ -22,6 +22,7 @@ #include #include #include +#include /* * We need to define the tracepoints somewhere, and tlb.c @@ -648,6 +649,7 @@ void __init init_mem_mapping(void) unsigned long end; pti_check_boottime_disable(); + sci_check_boottime_disable(); probe_page_size_mask(); setup_pcid(); diff --git a/arch/x86/mm/sci.c b/arch/x86/mm/sci.c new file mode 100644 index 0000000..e7ddec1 --- /dev/null +++ b/arch/x86/mm/sci.c @@ -0,0 +1,608 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright(c) 2019 IBM Corporation. All rights reserved. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of version 2 of the GNU General Public License as + * published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, but + * WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * General Public License for more details. 
+ * + * Author: Mike Rapoport + * + * This code is based on pti.c, see it for the original copyrights + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#undef pr_fmt +#define pr_fmt(fmt) "SCI: " fmt + +#define SCI_MAX_PTES 256 +#define SCI_MAX_BACKTRACE 64 + +__visible DEFINE_PER_CPU_PAGE_ALIGNED(struct sci_percpu_data, cpu_sci); + +/* + * Walk the shadow copy of the page tables to PMD level (optionally) + * trying to allocate page table pages on the way down. + * + * Allocation failures are not handled here because the entire page + * table will be freed in sci_free_pagetable. + * + * Returns a pointer to a PMD on success, or NULL on failure. + */ +static pmd_t *sci_pagetable_walk_pmd(struct mm_struct *mm, + pgd_t *pgd, unsigned long address) +{ + p4d_t *p4d; + pud_t *pud; + + p4d = p4d_alloc(mm, pgd, address); + if (!p4d) + return NULL; + + pud = pud_alloc(mm, p4d, address); + if (!pud) + return NULL; + + return pmd_alloc(mm, pud, address); +} + +/* + * Walk the shadow copy of the page tables to PTE level (optionally) + * trying to allocate page table pages on the way down. + * + * Returns a pointer to a PTE on success, or NULL on failure. + */ +static pte_t *sci_pagetable_walk_pte(struct mm_struct *mm, + pgd_t *pgd, unsigned long address) +{ + pmd_t *pmd = sci_pagetable_walk_pmd(mm, pgd, address); + + if (!pmd) + return NULL; + + if (__pte_alloc(mm, pmd)) + return NULL; + + return pte_offset_kernel(pmd, address); +} + +/* + * Clone a single page mapping + * + * The new mapping in the @target_pgdp is always created for base + * page. If the orinal page table has the page at @addr mapped at PMD + * level, we anyway create at PTE in the target page table and map + * only PAGE_SIZE. + */ +static pte_t *sci_clone_page(struct mm_struct *mm, + pgd_t *pgdp, pgd_t *target_pgdp, + unsigned long addr) +{ + pte_t *pte, *target_pte, ptev; + pgd_t *pgd, *target_pgd; + p4d_t *p4d; + pud_t *pud; + pmd_t *pmd; + + pgd = pgd_offset_pgd(pgdp, addr); + if (pgd_none(*pgd)) + return NULL; + + p4d = p4d_offset(pgd, addr); + if (p4d_none(*p4d)) + return NULL; + + pud = pud_offset(p4d, addr); + if (pud_none(*pud)) + return NULL; + + pmd = pmd_offset(pud, addr); + if (pmd_none(*pmd)) + return NULL; + + target_pgd = pgd_offset_pgd(target_pgdp, addr); + + if (pmd_large(*pmd)) { + pgprot_t flags; + unsigned long pfn; + + /* + * We map only PAGE_SIZE rather than the entire huge page. + * The PTE will have the same pgprot bits as the origial PMD + */ + flags = pte_pgprot(pte_clrhuge(*(pte_t *)pmd)); + pfn = pmd_pfn(*pmd) + pte_index(addr); + ptev = pfn_pte(pfn, flags); + } else { + pte = pte_offset_kernel(pmd, addr); + if (pte_none(*pte) || !(pte_flags(*pte) & _PAGE_PRESENT)) + return NULL; + + ptev = *pte; + } + + target_pte = sci_pagetable_walk_pte(mm, target_pgd, addr); + if (!target_pte) + return NULL; + + *target_pte = ptev; + + return target_pte; +} + +/* + * Clone a range keeping the same leaf mappings + * + * If the range has holes they are simply skipped + */ +static int sci_clone_range(struct mm_struct *mm, + pgd_t *pgdp, pgd_t *target_pgdp, + unsigned long start, unsigned long end) +{ + unsigned long addr; + + /* + * Clone the populated PMDs which cover start to end. These PMD areas + * can have holes. 
+ */ + for (addr = start; addr < end;) { + pte_t *pte, *target_pte; + pgd_t *pgd, *target_pgd; + pmd_t *pmd, *target_pmd; + p4d_t *p4d; + pud_t *pud; + + /* Overflow check */ + if (addr < start) + break; + + pgd = pgd_offset_pgd(pgdp, addr); + if (pgd_none(*pgd)) + return 0; + + p4d = p4d_offset(pgd, addr); + if (p4d_none(*p4d)) + return 0; + + pud = pud_offset(p4d, addr); + if (pud_none(*pud)) { + addr += PUD_SIZE; + continue; + } + + pmd = pmd_offset(pud, addr); + if (pmd_none(*pmd)) { + addr += PMD_SIZE; + continue; + } + + target_pgd = pgd_offset_pgd(target_pgdp, addr); + + if (pmd_large(*pmd)) { + target_pmd = sci_pagetable_walk_pmd(mm, target_pgd, + addr); + if (!target_pmd) + return -ENOMEM; + + *target_pmd = *pmd; + + addr += PMD_SIZE; + continue; + } else { + pte = pte_offset_kernel(pmd, addr); + if (pte_none(*pte)) { + addr += PAGE_SIZE; + continue; + } + + target_pte = sci_pagetable_walk_pte(mm, target_pgd, + addr); + if (!target_pte) + return -ENOMEM; + + *target_pte = *pte; + + addr += PAGE_SIZE; + } + } + + return 0; +} + +/* + * we have to map the syscall entry because we'll fault there after + * CR3 switch and before the verifier is able to detect this as proper + * access + */ +extern void do_syscall_64(unsigned long nr, struct pt_regs *regs); +unsigned long syscall_entry_addr = (unsigned long)do_syscall_64; + +static void sci_reset_backtrace(struct sci_task_data *sci) +{ + memset(sci->backtrace, 0, sci->backtrace_size); + sci->backtrace[0] = syscall_entry_addr; + sci->backtrace_size = 1; +} + +static inline void sci_sync_user_pagetable(struct task_struct *tsk) +{ + pgd_t *u_pgd = kernel_to_user_pgdp(tsk->mm->pgd); + pgd_t *sci_pgd = tsk->sci->pgd; + + down_write(&tsk->mm->mmap_sem); + memcpy(sci_pgd, u_pgd, PGD_KERNEL_START * sizeof(pgd_t)); + up_write(&tsk->mm->mmap_sem); +} + +static int sci_free_pte_range(struct mm_struct *mm, pmd_t *pmd) +{ + pte_t *ptep = pte_offset_kernel(pmd, 0); + + pmd_clear(pmd); + pte_free(mm, virt_to_page(ptep)); + mm_dec_nr_ptes(mm); + + return 0; +} + +static int sci_free_pmd_range(struct mm_struct *mm, pud_t *pud) +{ + pmd_t *pmd, *pmdp; + int i; + + pmdp = pmd_offset(pud, 0); + + for (i = 0, pmd = pmdp; i < PTRS_PER_PMD; i++, pmd++) + if (!pmd_none(*pmd) && !pmd_large(*pmd)) + sci_free_pte_range(mm, pmd); + + pud_clear(pud); + pmd_free(mm, pmdp); + mm_dec_nr_pmds(mm); + + return 0; +} + +static int sci_free_pud_range(struct mm_struct *mm, p4d_t *p4d) +{ + pud_t *pud, *pudp; + int i; + + pudp = pud_offset(p4d, 0); + + for (i = 0, pud = pudp; i < PTRS_PER_PUD; i++, pud++) + if (!pud_none(*pud)) + sci_free_pmd_range(mm, pud); + + p4d_clear(p4d); + pud_free(mm, pudp); + mm_dec_nr_puds(mm); + + return 0; +} + +static int sci_free_p4d_range(struct mm_struct *mm, pgd_t *pgd) +{ + p4d_t *p4d, *p4dp; + int i; + + p4dp = p4d_offset(pgd, 0); + + for (i = 0, p4d = p4dp; i < PTRS_PER_P4D; i++, p4d++) + if (!p4d_none(*p4d)) + sci_free_pud_range(mm, p4d); + + pgd_clear(pgd); + p4d_free(mm, p4dp); + + return 0; +} + +static int sci_free_pagetable(struct task_struct *tsk, pgd_t *sci_pgd) +{ + struct mm_struct *mm = tsk->mm; + pgd_t *pgd, *pgdp = sci_pgd; + +#ifdef SCI_SHARED_PAGE_TABLES + int i; + + for (i = KERNEL_PGD_BOUNDARY; i < PTRS_PER_PGD; i++) { + if (i >= pgd_index(VMALLOC_START) && + i < pgd_index(__START_KERNEL_map)) + continue; + pgd = pgdp + i; + sci_free_p4d_range(mm, pgd); + } +#else + for (pgd = pgdp + KERNEL_PGD_BOUNDARY; pgd < pgdp + PTRS_PER_PGD; pgd++) + if (!pgd_none(*pgd)) + sci_free_p4d_range(mm, pgd); +#endif + + + return 0; +} + 
+static int sci_pagetable_init(struct task_struct *tsk, pgd_t *sci_pgd) +{ + struct mm_struct *mm = tsk->mm; + pgd_t *k_pgd = mm->pgd; + pgd_t *u_pgd = kernel_to_user_pgdp(k_pgd); + unsigned long stack = (unsigned long)tsk->stack; + unsigned long addr; + unsigned int cpu; + pte_t *pte; + int ret; + + /* copy the kernel part of user visible page table */ + ret = sci_clone_range(mm, u_pgd, sci_pgd, CPU_ENTRY_AREA_BASE, + CPU_ENTRY_AREA_BASE + CPU_ENTRY_AREA_MAP_SIZE); + if (ret) + goto err_free_pagetable; + + ret = sci_clone_range(mm, u_pgd, sci_pgd, + (unsigned long) __entry_text_start, + (unsigned long) __irqentry_text_end); + if (ret) + goto err_free_pagetable; + + ret = sci_clone_range(mm, mm->pgd, sci_pgd, + stack, stack + THREAD_SIZE); + if (ret) + goto err_free_pagetable; + + ret = -ENOMEM; + for_each_possible_cpu(cpu) { + addr = (unsigned long)&per_cpu(cpu_sci, cpu); + pte = sci_clone_page(mm, k_pgd, sci_pgd, addr); + if (!pte) + goto err_free_pagetable; + } + + /* plus do_syscall_64 */ + pte = sci_clone_page(mm, k_pgd, sci_pgd, syscall_entry_addr); + if (!pte) + goto err_free_pagetable; + + return 0; + +err_free_pagetable: + sci_free_pagetable(tsk, sci_pgd); + return ret; +} + +static int sci_alloc(struct task_struct *tsk) +{ + struct sci_task_data *sci; + int err = -ENOMEM; + + if (!static_cpu_has(X86_FEATURE_SCI)) + return 0; + + if (tsk->sci) + return 0; + + sci = kzalloc(sizeof(*sci), GFP_KERNEL); + if (!sci) + return err; + + sci->ptes = kcalloc(SCI_MAX_PTES, sizeof(*sci->ptes), GFP_KERNEL); + if (!sci->ptes) + goto free_sci; + + sci->backtrace = kcalloc(SCI_MAX_BACKTRACE, sizeof(*sci->backtrace), + GFP_KERNEL); + if (!sci->backtrace) + goto free_ptes; + + sci->pgd = (pgd_t *)get_zeroed_page(GFP_KERNEL); + if (!sci->pgd) + goto free_backtrace; + + err = sci_pagetable_init(tsk, sci->pgd); + if (err) + goto free_pgd; + + sci_reset_backtrace(sci); + + tsk->sci = sci; + + return 0; + +free_pgd: + free_page((unsigned long)sci->pgd); +free_backtrace: + kfree(sci->backtrace); +free_ptes: + kfree(sci->ptes); +free_sci: + kfree(sci); + return err; +} + +int sci_init(struct task_struct *tsk) +{ + if (!tsk->sci) { + int err = sci_alloc(tsk); + + if (err) + return err; + } + + sci_sync_user_pagetable(tsk); + + return 0; +} + +void sci_exit(struct task_struct *tsk) +{ + struct sci_task_data *sci = tsk->sci; + + if (!static_cpu_has(X86_FEATURE_SCI)) + return; + + if (!sci) + return; + + sci_free_pagetable(tsk, tsk->sci->pgd); + free_page((unsigned long)sci->pgd); + kfree(sci->backtrace); + kfree(sci->ptes); + kfree(sci); +} + +void sci_clear_data(void) +{ + struct sci_task_data *sci = current->sci; + int i; + + if (WARN_ON(!sci)) + return; + + for (i = 0; i < sci->ptes_count; i++) + pte_clear(NULL, 0, sci->ptes[i]); + + memset(sci->ptes, 0, sci->ptes_count); + sci->ptes_count = 0; + + sci_reset_backtrace(sci); +} + +static void sci_add_pte(struct sci_task_data *sci, pte_t *pte) +{ + int i; + + for (i = sci->ptes_count - 1; i >= 0; i--) + if (pte == sci->ptes[i]) + return; + sci->ptes[sci->ptes_count++] = pte; +} + +static void sci_add_rip(struct sci_task_data *sci, unsigned long rip) +{ + int i; + + for (i = sci->backtrace_size - 1; i >= 0; i--) + if (rip == sci->backtrace[i]) + return; + + sci->backtrace[sci->backtrace_size++] = rip; +} + +static bool sci_verify_code_access(struct sci_task_data *sci, + struct pt_regs *regs, unsigned long addr) +{ + char namebuf[KSYM_NAME_LEN]; + unsigned long offset, size; + const char *symbol; + char *modname; + + + /* instruction fetch outside kernel or 
module text */ + if (!(is_kernel_text(addr) || is_module_text_address(addr))) + return false; + + /* no symbol matches the address */ + symbol = kallsyms_lookup(addr, &size, &offset, &modname, namebuf); + if (!symbol) + return false; + + /* BPF or ftrace? */ + if (symbol != namebuf) + return false; + + /* access in the middle of a function */ + if (offset) { + int i = 0; + + for (i = sci->backtrace_size - 1; i >= 0; i--) { + unsigned long rip = sci->backtrace[i]; + + /* allow jumps to the next page of already mapped one */ + if ((addr >> PAGE_SHIFT) == ((rip >> PAGE_SHIFT) + 1)) + return true; + } + + return false; + } + + sci_add_rip(sci, regs->ip); + + return true; +} + +bool sci_verify_and_map(struct pt_regs *regs, unsigned long addr, + unsigned long hw_error_code) +{ + struct task_struct *tsk = current; + struct mm_struct *mm = tsk->mm; + struct sci_task_data *sci = tsk->sci; + pte_t *pte; + + /* run out of room for metadata, can't grant access */ + if (sci->ptes_count >= SCI_MAX_PTES || + sci->backtrace_size >= SCI_MAX_BACKTRACE) + return false; + + /* only code access is checked */ + if (hw_error_code & X86_PF_INSTR && + !sci_verify_code_access(sci, regs, addr)) + return false; + + pte = sci_clone_page(mm, mm->pgd, sci->pgd, addr); + if (!pte) + return false; + + sci_add_pte(sci, pte); + + return true; +} + +void __init sci_check_boottime_disable(void) +{ + char arg[5]; + int ret; + + if (!cpu_feature_enabled(X86_FEATURE_PCID)) { + pr_info("System call isolation requires PCID\n"); + return; + } + + /* Assume SCI is disabled unless explicitly overridden. */ + ret = cmdline_find_option(boot_command_line, "sci", arg, sizeof(arg)); + if (ret == 2 && !strncmp(arg, "on", 2)) { + setup_force_cpu_cap(X86_FEATURE_SCI); + pr_info("System call isolation is enabled\n"); + return; + } + + pr_info("System call isolation is disabled\n"); +} diff --git a/include/linux/sched.h b/include/linux/sched.h index f9b43c9..cdcdb07 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1202,6 +1202,11 @@ struct task_struct { unsigned long prev_lowest_stack; #endif +#ifdef CONFIG_SYSCALL_ISOLATION + unsigned long in_isolated_syscall; + struct sci_task_data *sci; +#endif + /* * New fields for task_struct should be added above here, so that * they are included in the randomized portion of task_struct. 
diff --git a/include/linux/sci.h b/include/linux/sci.h new file mode 100644 index 0000000..7a6beac --- /dev/null +++ b/include/linux/sci.h @@ -0,0 +1,12 @@ +// SPDX-License-Identifier: GPL-2.0 +#ifndef _LINUX_SCI_H +#define _LINUX_SCI_H + +#ifdef CONFIG_SYSCALL_ISOLATION +#include +#else +static inline int sci_init(struct task_struct *tsk) { return 0; } +static inline void sci_exit(struct task_struct *tsk) {} +#endif + +#endif /* _LINUX_SCI_H */ From patchwork Thu Apr 25 21:45:50 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mike Rapoport X-Patchwork-Id: 10917815 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 8243514B6 for ; Thu, 25 Apr 2019 21:46:51 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 74DAB28D41 for ; Thu, 25 Apr 2019 21:46:51 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 697D628D43; Thu, 25 Apr 2019 21:46:51 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-6.9 required=2.0 tests=BAYES_00,RCVD_IN_DNSWL_HI autolearn=unavailable version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id C908A28D41 for ; Thu, 25 Apr 2019 21:46:50 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2388197AbfDYVqX (ORCPT ); Thu, 25 Apr 2019 17:46:23 -0400 Received: from mx0a-001b2d01.pphosted.com ([148.163.156.1]:32928 "EHLO mx0a-001b2d01.pphosted.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2388185AbfDYVqW (ORCPT ); Thu, 25 Apr 2019 17:46:22 -0400 Received: from pps.filterd (m0098393.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.16.0.27/8.16.0.27) with SMTP id x3PLZJV0079832 for ; Thu, 25 Apr 2019 17:46:21 -0400 Received: from e06smtp07.uk.ibm.com (e06smtp07.uk.ibm.com [195.75.94.103]) by mx0a-001b2d01.pphosted.com with ESMTP id 2s3k8jv6sg-1 (version=TLSv1.2 cipher=AES256-GCM-SHA384 bits=256 verify=NOT) for ; Thu, 25 Apr 2019 17:46:21 -0400 Received: from localhost by e06smtp07.uk.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted for from ; Thu, 25 Apr 2019 22:46:19 +0100 Received: from b06cxnps3075.portsmouth.uk.ibm.com (9.149.109.195) by e06smtp07.uk.ibm.com (192.168.101.137) with IBM ESMTP SMTP Gateway: Authorized Use Only! 
From: Mike Rapoport
To: linux-kernel@vger.kernel.org
Cc: Alexandre Chartre, Andy Lutomirski, Borislav Petkov, Dave Hansen, "H. Peter Anvin", Ingo Molnar, James Bottomley, Jonathan Adams, Kees Cook, Paul Turner, Peter Zijlstra, Thomas Gleixner, linux-mm@kvack.org, linux-security-module@vger.kernel.org, x86@kernel.org, Mike Rapoport
Subject: [RFC PATCH 3/7] x86/entry/64: add infrastructure for switching to isolated syscall context
Date: Fri, 26 Apr 2019 00:45:50 +0300
In-Reply-To: <1556228754-12996-1-git-send-email-rppt@linux.ibm.com>
References: <1556228754-12996-1-git-send-email-rppt@linux.ibm.com>
Message-Id: <1556228754-12996-4-git-send-email-rppt@linux.ibm.com>

Isolated system calls use a separate page table that does not map the entire
kernel. Exception and interrupt entries should switch the context to the full
kernel page tables and then restore it to continue the execution of the
isolated system call.
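In C-like terms, the entry-code changes below behave roughly as follows around an exception or interrupt taken while an isolated system call is running (an editorial sketch only; the patch implements this as the SAVE_AND_SWITCH_SCI_TO_KERNEL_CR3 and RESTORE_SCI_CR3 assembly macros):

	/* on exception/interrupt entry from kernel mode */
	unsigned long saved_cr3 = 0;

	if (this_cpu_read(cpu_sci.sci_syscall)) {
		unsigned long cr3 = __read_cr3();

		/* the SCI PCID bit marks that the reduced page table is active */
		if (cr3 & (1UL << X86_CR3_SCI_PCID_BIT)) {
			saved_cr3 = cr3;
			/* the per-cpu offset turns the SCI CR3 back into the kernel CR3 */
			write_cr3(cr3 + this_cpu_read(cpu_sci.sci_cr3_offset));
		}
	}

	/* ... handle the exception/interrupt with the full kernel mapping ... */

	/* before returning to the interrupted isolated system call */
	if (saved_cr3)
		write_cr3(saved_cr3);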
Signed-off-by: Mike Rapoport --- arch/x86/entry/calling.h | 65 ++++++++++++++++++++++++++++++++++ arch/x86/entry/entry_64.S | 13 +++++-- arch/x86/include/asm/processor-flags.h | 8 +++++ arch/x86/include/asm/tlbflush.h | 8 ++++- arch/x86/kernel/asm-offsets.c | 7 ++++ 5 files changed, 98 insertions(+), 3 deletions(-) diff --git a/arch/x86/entry/calling.h b/arch/x86/entry/calling.h index efb0d1b..766e74e 100644 --- a/arch/x86/entry/calling.h +++ b/arch/x86/entry/calling.h @@ -187,6 +187,56 @@ For 32-bit we have the following conventions - kernel is built with #endif .endm +#ifdef CONFIG_SYSCALL_ISOLATION + +#define SCI_PCID_BIT X86_CR3_SCI_PCID_BIT + +#define THIS_CPU_sci_syscall \ + PER_CPU_VAR(cpu_sci) + SCI_SYSCALL + +#define THIS_CPU_sci_cr3_offset \ + PER_CPU_VAR(cpu_sci) + SCI_CR3_OFFSET + +.macro SAVE_AND_SWITCH_SCI_TO_KERNEL_CR3 scratch_reg:req save_reg:req + ALTERNATIVE "jmp .Ldone_\@", "", X86_FEATURE_SCI + movq THIS_CPU_sci_syscall, \scratch_reg + cmpq $0, \scratch_reg + je .Ldone_\@ + movq %cr3, \scratch_reg + bt $SCI_PCID_BIT, \scratch_reg + jc .Lsci_context_\@ + xorq \save_reg, \save_reg + jmp .Ldone_\@ +.Lsci_context_\@: + movq \scratch_reg, \save_reg + addq THIS_CPU_sci_cr3_offset, \scratch_reg + movq \scratch_reg, %cr3 +.Ldone_\@: +.endm + +.macro RESTORE_SCI_CR3 scratch_reg:req save_reg:req + ALTERNATIVE "jmp .Ldone_\@", "", X86_FEATURE_SCI + movq THIS_CPU_sci_syscall, \scratch_reg + cmpq $0, \scratch_reg + je .Ldone_\@ + movq \save_reg, \scratch_reg + cmpq $0, \scratch_reg + je .Ldone_\@ + xorq \save_reg, \save_reg + movq \scratch_reg, %cr3 +.Ldone_\@: +.endm + +#else /* CONFIG_SYSCALL_ISOLATION */ + +.macro SAVE_AND_SWITCH_SCI_TO_KERNEL_CR3 scratch_reg:req save_reg:req +.endm + +.macro RESTORE_SCI_CR3 scratch_reg:req save_reg:req +.endm + +#endif /* CONFIG_SYSCALL_ISOLATION */ + #ifdef CONFIG_PAGE_TABLE_ISOLATION /* @@ -264,6 +314,21 @@ For 32-bit we have the following conventions - kernel is built with ALTERNATIVE "jmp .Ldone_\@", "", X86_FEATURE_PTI movq %cr3, \scratch_reg movq \scratch_reg, \save_reg + +#ifdef CONFIG_SYSCALL_ISOLATION + /* + * Test the SCI PCID bit. If set, then the SCI page tables are + * active. If clear CR3 has either the kernel or user page + * table active. + */ + ALTERNATIVE "jmp .Lcheck_user_pt_\@", "", X86_FEATURE_SCI + bt $SCI_PCID_BIT, \scratch_reg + jnc .Lcheck_user_pt_\@ + addq THIS_CPU_sci_cr3_offset, \scratch_reg + movq \scratch_reg, %cr3 + jmp .Ldone_\@ +.Lcheck_user_pt_\@: +#endif /* * Test the user pagetable bit. If set, then the user page tables * are active. If clear CR3 already has the kernel page table diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S index 1f0efdb..3cef67b 100644 --- a/arch/x86/entry/entry_64.S +++ b/arch/x86/entry/entry_64.S @@ -543,7 +543,7 @@ ENTRY(interrupt_entry) ENCODE_FRAME_POINTER 8 testb $3, CS+8(%rsp) - jz 1f + jz .Linterrupt_entry_kernel /* * IRQ from user mode. @@ -559,12 +559,17 @@ ENTRY(interrupt_entry) CALL_enter_from_user_mode -1: +.Linterrupt_entry_done: ENTER_IRQ_STACK old_rsp=%rdi save_ret=1 /* We entered an interrupt context - irqs are off: */ TRACE_IRQS_OFF ret + +.Linterrupt_entry_kernel: + SAVE_AND_SWITCH_SCI_TO_KERNEL_CR3 scratch_reg=%rax save_reg=%r14 + jmp .Linterrupt_entry_done + END(interrupt_entry) _ASM_NOKPROBE(interrupt_entry) @@ -656,6 +661,8 @@ retint_kernel: */ TRACE_IRQS_IRETQ + RESTORE_SCI_CR3 scratch_reg=%rax save_reg=%r14 + GLOBAL(restore_regs_and_return_to_kernel) #ifdef CONFIG_DEBUG_ENTRY /* Assert that pt_regs indicates kernel mode. 
*/ @@ -1263,6 +1270,8 @@ ENTRY(error_entry) * for these here too. */ .Lerror_kernelspace: + SAVE_AND_SWITCH_SCI_TO_KERNEL_CR3 scratch_reg=%rax save_reg=%r14 + leaq native_irq_return_iret(%rip), %rcx cmpq %rcx, RIP+8(%rsp) je .Lerror_bad_iret diff --git a/arch/x86/include/asm/processor-flags.h b/arch/x86/include/asm/processor-flags.h index 02c2cbd..eca9e17 100644 --- a/arch/x86/include/asm/processor-flags.h +++ b/arch/x86/include/asm/processor-flags.h @@ -53,4 +53,12 @@ # define X86_CR3_PTI_PCID_USER_BIT 11 #endif +#ifdef CONFIG_SYSCALL_ISOLATION +# if defined(X86_CR3_PTI_PCID_USER_BIT) +# define X86_CR3_SCI_PCID_BIT (X86_CR3_PTI_PCID_USER_BIT - 1) +# else +# define X86_CR3_SCI_PCID_BIT 11 +# endif +#endif + #endif /* _ASM_X86_PROCESSOR_FLAGS_H */ diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h index f4204bf..dc69cc4 100644 --- a/arch/x86/include/asm/tlbflush.h +++ b/arch/x86/include/asm/tlbflush.h @@ -54,7 +54,13 @@ # define PTI_CONSUMED_PCID_BITS 0 #endif -#define CR3_AVAIL_PCID_BITS (X86_CR3_PCID_BITS - PTI_CONSUMED_PCID_BITS) +#ifdef CONFIG_SYSCALL_ISOLATION +# define SCI_CONSUMED_PCID_BITS 1 +#else +# define SCI_CONSUMED_PCID_BITS 0 +#endif + +#define CR3_AVAIL_PCID_BITS (X86_CR3_PCID_BITS - PTI_CONSUMED_PCID_BITS - SCI_CONSUMED_PCID_BITS) /* * ASIDs are zero-based: 0->MAX_AVAIL_ASID are valid. -1 below to account diff --git a/arch/x86/kernel/asm-offsets.c b/arch/x86/kernel/asm-offsets.c index 168543d..f2c9cd3f 100644 --- a/arch/x86/kernel/asm-offsets.c +++ b/arch/x86/kernel/asm-offsets.c @@ -18,6 +18,7 @@ #include #include #include +#include #ifdef CONFIG_XEN #include @@ -105,4 +106,10 @@ static void __used common(void) OFFSET(TSS_sp0, tss_struct, x86_tss.sp0); OFFSET(TSS_sp1, tss_struct, x86_tss.sp1); OFFSET(TSS_sp2, tss_struct, x86_tss.sp2); + +#ifdef CONFIG_SYSCALL_ISOLATION + /* system calls isolation */ + OFFSET(SCI_SYSCALL, sci_percpu_data, sci_syscall); + OFFSET(SCI_CR3_OFFSET, sci_percpu_data, sci_cr3_offset); +#endif } From patchwork Thu Apr 25 21:45:51 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mike Rapoport X-Patchwork-Id: 10917813 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id A96B614B6 for ; Thu, 25 Apr 2019 21:46:48 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 9B4FD28D41 for ; Thu, 25 Apr 2019 21:46:48 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 8F5A028D43; Thu, 25 Apr 2019 21:46:48 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-6.9 required=2.0 tests=BAYES_00,RCVD_IN_DNSWL_HI autolearn=unavailable version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 2245628D41 for ; Thu, 25 Apr 2019 21:46:48 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2388210AbfDYVq0 (ORCPT ); Thu, 25 Apr 2019 17:46:26 -0400 Received: from mx0a-001b2d01.pphosted.com ([148.163.156.1]:48588 "EHLO mx0a-001b2d01.pphosted.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2388204AbfDYVqZ (ORCPT ); Thu, 25 Apr 2019 17:46:25 -0400 Received: from pps.filterd (m0098404.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com 
(8.16.0.27/8.16.0.27) with SMTP id x3PLdOhb090800 for ; Thu, 25 Apr 2019 17:46:24 -0400
From: Mike Rapoport
To: linux-kernel@vger.kernel.org
Cc: Alexandre Chartre, Andy Lutomirski, Borislav Petkov, Dave Hansen, "H. Peter Anvin", Ingo Molnar, James Bottomley, Jonathan Adams, Kees Cook, Paul Turner, Peter Zijlstra, Thomas Gleixner, linux-mm@kvack.org, linux-security-module@vger.kernel.org, x86@kernel.org, Mike Rapoport
Subject: [RFC PATCH 4/7] x86/sci: hook up isolated system call entry and exit
Date: Fri, 26 Apr 2019 00:45:51 +0300
In-Reply-To: <1556228754-12996-1-git-send-email-rppt@linux.ibm.com>
References: <1556228754-12996-1-git-send-email-rppt@linux.ibm.com>
Message-Id: <1556228754-12996-5-git-send-email-rppt@linux.ibm.com>

When a system call is required to run in an isolated context, CR3 is switched
to the SCI page table and a per-cpu variable holds the offset from the
original CR3. This offset is used to switch back to the full kernel context
when a trap occurs during an isolated system call.
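The "offset" is literally the arithmetic difference between the two CR3 values, which lets the entry code recover the kernel CR3 from per-cpu data alone. A condensed sketch of the enter path (an illustration distilled from sci_syscall_enter()/sci_syscall_exit() in arch/x86/entry/common.c in this patch, not a verbatim quote):

	unsigned long kernel_cr3, sci_cr3, asid;

	kernel_cr3 = __read_cr3();		/* full kernel page table + ASID */
	asid = kernel_cr3 & ~PAGE_MASK;

	sci_cr3	 = build_cr3(current->sci->pgd, 0) & PAGE_MASK;
	sci_cr3	|= asid | (1 << X86_CR3_SCI_PCID_BIT);	/* tag the SCI context */

	current->sci->cr3_offset = kernel_cr3 - sci_cr3;
	this_cpu_write(cpu_sci.sci_cr3_offset, current->sci->cr3_offset);
	this_cpu_write(cpu_sci.sci_syscall, 1);

	write_cr3(sci_cr3);			/* run the system call isolated */

	/* on any trap: kernel_cr3 == sci_cr3 + cr3_offset, see the entry macros */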
Signed-off-by: Mike Rapoport --- arch/x86/entry/common.c | 61 ++++++++++++++++++++++++++++++++++++++++++++ arch/x86/kernel/process_64.c | 5 ++++ kernel/exit.c | 3 +++ 3 files changed, 69 insertions(+) diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c index 7bc105f..8f2a6fd 100644 --- a/arch/x86/entry/common.c +++ b/arch/x86/entry/common.c @@ -25,12 +25,14 @@ #include #include #include +#include #include #include #include #include #include +#include #define CREATE_TRACE_POINTS #include @@ -269,6 +271,50 @@ __visible inline void syscall_return_slowpath(struct pt_regs *regs) } #ifdef CONFIG_X86_64 + +#ifdef CONFIG_SYSCALL_ISOLATION +static inline bool sci_required(unsigned long nr) +{ + return false; +} + +static inline unsigned long sci_syscall_enter(unsigned long nr) +{ + unsigned long sci_cr3, kernel_cr3; + unsigned long asid; + + kernel_cr3 = __read_cr3(); + asid = kernel_cr3 & ~PAGE_MASK; + + sci_cr3 = build_cr3(current->sci->pgd, 0) & PAGE_MASK; + sci_cr3 |= (asid | (1 << X86_CR3_SCI_PCID_BIT)); + + current->in_isolated_syscall = 1; + current->sci->cr3_offset = kernel_cr3 - sci_cr3; + + this_cpu_write(cpu_sci.sci_syscall, 1); + this_cpu_write(cpu_sci.sci_cr3_offset, current->sci->cr3_offset); + + write_cr3(sci_cr3); + + return kernel_cr3; +} + +static inline void sci_syscall_exit(unsigned long cr3) +{ + if (cr3) { + write_cr3(cr3); + current->in_isolated_syscall = 0; + this_cpu_write(cpu_sci.sci_syscall, 0); + sci_clear_data(); + } +} +#else +static inline bool sci_required(unsigned long nr) { return false; } +static inline unsigned long sci_syscall_enter(unsigned long nr) { return 0; } +static inline void sci_syscall_exit(unsigned long cr3) {} +#endif + __visible void do_syscall_64(unsigned long nr, struct pt_regs *regs) { struct thread_info *ti; @@ -286,10 +332,25 @@ __visible void do_syscall_64(unsigned long nr, struct pt_regs *regs) */ nr &= __SYSCALL_MASK; if (likely(nr < NR_syscalls)) { + unsigned long sci_cr3 = 0; + nr = array_index_nospec(nr, NR_syscalls); + + if (sci_required(nr)) { + int err = sci_init(current); + + if (err) { + regs->ax = err; + goto err_return_from_syscall; + } + sci_cr3 = sci_syscall_enter(nr); + } + regs->ax = sys_call_table[nr](regs); + sci_syscall_exit(sci_cr3); } +err_return_from_syscall: syscall_return_slowpath(regs); } #endif diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c index 6a62f4a..b8aa624 100644 --- a/arch/x86/kernel/process_64.c +++ b/arch/x86/kernel/process_64.c @@ -55,6 +55,8 @@ #include #include #include +#include + #ifdef CONFIG_IA32_EMULATION /* Not included via unistd.h */ #include @@ -581,6 +583,9 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p) switch_to_extra(prev_p, next_p); + /* update syscall isolation per-cpu data */ + sci_switch_to(next_p); + #ifdef CONFIG_XEN_PV /* * On Xen PV, IOPL bits in pt_regs->flags have no effect, and diff --git a/kernel/exit.c b/kernel/exit.c index 2639a30..8e81353 100644 --- a/kernel/exit.c +++ b/kernel/exit.c @@ -62,6 +62,7 @@ #include #include #include +#include #include #include @@ -859,6 +860,8 @@ void __noreturn do_exit(long code) tsk->exit_code = code; taskstats_exit(tsk, group_dead); + sci_exit(tsk); + exit_mm(); if (group_dead) From patchwork Thu Apr 25 21:45:52 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mike Rapoport X-Patchwork-Id: 10917811 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by 
pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id BF9B81575 for ; Thu, 25 Apr 2019 21:46:46 +0000 (UTC)
From: Mike Rapoport
To: linux-kernel@vger.kernel.org
Cc: Alexandre Chartre, Andy Lutomirski, Borislav Petkov, Dave Hansen, "H. Peter Anvin", Ingo Molnar, James Bottomley, Jonathan Adams, Kees Cook, Paul Turner, Peter Zijlstra, Thomas Gleixner, linux-mm@kvack.org, linux-security-module@vger.kernel.org, x86@kernel.org, Mike Rapoport
Subject: [RFC PATCH 5/7] x86/mm/fault: hook up SCI verification
Date: Fri, 26 Apr 2019 00:45:52 +0300
In-Reply-To: <1556228754-12996-1-git-send-email-rppt@linux.ibm.com>
References: <1556228754-12996-1-git-send-email-rppt@linux.ibm.com>
Message-Id: <1556228754-12996-6-git-send-email-rppt@linux.ibm.com>

If a system call runs in an isolated context, its accesses to kernel code and
data will be verified by the SCI subsystem.

Signed-off-by: Mike Rapoport
---
 arch/x86/mm/fault.c | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 9d5c75f..baa2a2f 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -18,6 +18,7 @@
 #include		/* faulthandler_disabled()	*/
 #include		/* efi_recover_from_page_fault()*/
 #include
+#include		/* sci_verify_and_map()		*/
 #include		/* boot_cpu_has, ...		*/
 #include		/* dotraplinkage, ...		*/
@@ -1254,6 +1255,30 @@ static int fault_in_kernel_space(unsigned long address)
 	return address >= TASK_SIZE_MAX;
 }
 
+#ifdef CONFIG_SYSCALL_ISOLATION
+static int sci_fault(struct pt_regs *regs, unsigned long hw_error_code,
+		     unsigned long address)
+{
+	struct task_struct *tsk = current;
+
+	if (!tsk->in_isolated_syscall)
+		return 0;
+
+	if (!sci_verify_and_map(regs, address, hw_error_code)) {
+		this_cpu_write(cpu_sci.sci_syscall, 0);
+		no_context(regs, hw_error_code, address, SIGKILL, 0);
+	}
+
+	return 1;
+}
+#else
+static inline int sci_fault(struct pt_regs *regs, unsigned long hw_error_code,
+			    unsigned long address)
+{
+	return 0;
+}
+#endif
+
 /*
  * Called for all faults where 'address' is part of the kernel address
  * space.
Might get called for faults that originate from *code* that @@ -1301,6 +1326,9 @@ do_kern_addr_fault(struct pt_regs *regs, unsigned long hw_error_code, if (kprobes_fault(regs)) return; + if (sci_fault(regs, hw_error_code, address)) + return; + /* * Note, despite being a "bad area", there are quite a few * acceptable reasons to get here, such as erratum fixups From patchwork Thu Apr 25 21:45:53 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mike Rapoport X-Patchwork-Id: 10917803 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id F3CD314D5 for ; Thu, 25 Apr 2019 21:46:34 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id E448428D41 for ; Thu, 25 Apr 2019 21:46:34 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id D829C28D43; Thu, 25 Apr 2019 21:46:34 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-6.9 required=2.0 tests=BAYES_00,RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 8A7DB28D41 for ; Thu, 25 Apr 2019 21:46:34 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2388205AbfDYVqd (ORCPT ); Thu, 25 Apr 2019 17:46:33 -0400 Received: from mx0b-001b2d01.pphosted.com ([148.163.158.5]:48918 "EHLO mx0a-001b2d01.pphosted.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S2388239AbfDYVqc (ORCPT ); Thu, 25 Apr 2019 17:46:32 -0400 Received: from pps.filterd (m0098417.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.16.0.27/8.16.0.27) with SMTP id x3PLY7DR082543 for ; Thu, 25 Apr 2019 17:46:31 -0400 Received: from e06smtp02.uk.ibm.com (e06smtp02.uk.ibm.com [195.75.94.98]) by mx0a-001b2d01.pphosted.com with ESMTP id 2s3kxcjrdc-1 (version=TLSv1.2 cipher=AES256-GCM-SHA384 bits=256 verify=NOT) for ; Thu, 25 Apr 2019 17:46:31 -0400 Received: from localhost by e06smtp02.uk.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted for from ; Thu, 25 Apr 2019 22:46:29 +0100 Received: from b06cxnps3075.portsmouth.uk.ibm.com (9.149.109.195) by e06smtp02.uk.ibm.com (192.168.101.132) with IBM ESMTP SMTP Gateway: Authorized Use Only! 
From: Mike Rapoport
To: linux-kernel@vger.kernel.org
Cc: Alexandre Chartre, Andy Lutomirski, Borislav Petkov, Dave Hansen, "H. Peter Anvin", Ingo Molnar, James Bottomley, Jonathan Adams, Kees Cook, Paul Turner, Peter Zijlstra, Thomas Gleixner, linux-mm@kvack.org, linux-security-module@vger.kernel.org, x86@kernel.org, Mike Rapoport
Subject: [RFC PATCH 6/7] security: enable system call isolation in kernel config
Date: Fri, 26 Apr 2019 00:45:53 +0300
In-Reply-To: <1556228754-12996-1-git-send-email-rppt@linux.ibm.com>
References: <1556228754-12996-1-git-send-email-rppt@linux.ibm.com>
Message-Id: <1556228754-12996-7-git-send-email-rppt@linux.ibm.com>

Add the SYSCALL_ISOLATION Kconfig option to enable building the SCI
infrastructure.

Signed-off-by: Mike Rapoport
---
 security/Kconfig | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/security/Kconfig b/security/Kconfig
index e4fe2f3..0c6929a 100644
--- a/security/Kconfig
+++ b/security/Kconfig
@@ -65,6 +65,16 @@ config PAGE_TABLE_ISOLATION
 	  See Documentation/x86/pti.txt for more details.
 
+config SYSCALL_ISOLATION
+	bool "System call isolation"
+	default n
+	depends on PAGE_TABLE_ISOLATION && !X86_PAE
+	help
+	  This is an experimental feature to allow executing system
+	  calls in an isolated address space.
+
+	  If you are unsure how to answer this question, answer N.
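For completeness, even with this option built in the feature stays off by default: sci_check_boottime_disable() earlier in the series only force-enables X86_FEATURE_SCI when PCID is available and "sci=on" is passed on the kernel command line. A typical way to try the series (a summary, not part of the patch):

	# build time
	CONFIG_PAGE_TABLE_ISOLATION=y
	CONFIG_SYSCALL_ISOLATION=y

	# boot time: append to the kernel command line (requires a CPU with PCID)
	sci=on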
From patchwork Thu Apr 25 21:45:54 2019
X-Patchwork-Submitter: Mike Rapoport
X-Patchwork-Id: 10917809
From: Mike Rapoport
To: linux-kernel@vger.kernel.org
Cc: Alexandre Chartre, Andy Lutomirski, Borislav Petkov, Dave Hansen,
    "H. Peter Anvin", Ingo Molnar, James Bottomley, Jonathan Adams,
    Kees Cook, Paul Turner, Peter Zijlstra, Thomas Gleixner,
    linux-mm@kvack.org, linux-security-module@vger.kernel.org,
    x86@kernel.org, Mike Rapoport
Subject: [RFC PATCH 7/7] sci: add example system calls to exercise SCI
Date: Fri, 26 Apr 2019 00:45:54 +0300
Message-Id: <1556228754-12996-8-git-send-email-rppt@linux.ibm.com>
In-Reply-To: <1556228754-12996-1-git-send-email-rppt@linux.ibm.com>
References: <1556228754-12996-1-git-send-email-rppt@linux.ibm.com>
Peter Anvin" , Ingo Molnar , James Bottomley , Jonathan Adams , Kees Cook , Paul Turner , Peter Zijlstra , Thomas Gleixner , linux-mm@kvack.org, linux-security-module@vger.kernel.org, x86@kernel.org, Mike Rapoport Subject: [RFC PATCH 7/7] sci: add example system calls to exercse SCI Date: Fri, 26 Apr 2019 00:45:54 +0300 X-Mailer: git-send-email 2.7.4 In-Reply-To: <1556228754-12996-1-git-send-email-rppt@linux.ibm.com> References: <1556228754-12996-1-git-send-email-rppt@linux.ibm.com> X-TM-AS-GCONF: 00 x-cbid: 19042521-0020-0000-0000-000003360C05 X-IBM-AV-DETECTION: SAVI=unused REMOTE=unused XFE=unused x-cbparentid: 19042521-0021-0000-0000-000021887A43 Message-Id: <1556228754-12996-8-git-send-email-rppt@linux.ibm.com> X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:,, definitions=2019-04-25_18:,, signatures=0 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 priorityscore=1501 malwarescore=0 suspectscore=1 phishscore=0 bulkscore=0 spamscore=0 clxscore=1015 lowpriorityscore=0 mlxscore=0 impostorscore=0 mlxlogscore=999 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1810050000 definitions=main-1904250133 Sender: owner-linux-security-module@vger.kernel.org Precedence: bulk List-ID: X-Virus-Scanned: ClamAV using ClamSMTP Signed-off-by: Mike Rapoport --- arch/x86/entry/common.c | 6 +++- arch/x86/entry/syscalls/syscall_64.tbl | 3 ++ kernel/Makefile | 2 +- kernel/sci-examples.c | 52 ++++++++++++++++++++++++++++++++++ 4 files changed, 61 insertions(+), 2 deletions(-) create mode 100644 kernel/sci-examples.c diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c index 8f2a6fd..be0e1a7 100644 --- a/arch/x86/entry/common.c +++ b/arch/x86/entry/common.c @@ -275,7 +275,11 @@ __visible inline void syscall_return_slowpath(struct pt_regs *regs) #ifdef CONFIG_SYSCALL_ISOLATION static inline bool sci_required(unsigned long nr) { - return false; + if (!static_cpu_has(X86_FEATURE_SCI)) + return false; + if (nr < __NR_get_answer) + return false; + return true; } static inline unsigned long sci_syscall_enter(unsigned long nr) diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl index f0b1709..a25e838 100644 --- a/arch/x86/entry/syscalls/syscall_64.tbl +++ b/arch/x86/entry/syscalls/syscall_64.tbl @@ -343,6 +343,9 @@ 332 common statx __x64_sys_statx 333 common io_pgetevents __x64_sys_io_pgetevents 334 common rseq __x64_sys_rseq +335 64 get_answer __x64_sys_get_answer +336 64 sci_write_dmesg __x64_sys_sci_write_dmesg +337 64 sci_write_dmesg_bad __x64_sys_sci_write_dmesg_bad # # x32-specific system call numbers start at 512 to avoid cache impact diff --git a/kernel/Makefile b/kernel/Makefile index 6aa7543..d6441d0 100644 --- a/kernel/Makefile +++ b/kernel/Makefile @@ -10,7 +10,7 @@ obj-y = fork.o exec_domain.o panic.o \ extable.o params.o \ kthread.o sys_ni.o nsproxy.o \ notifier.o ksysfs.o cred.o reboot.o \ - async.o range.o smpboot.o ucount.o + async.o range.o smpboot.o ucount.o sci-examples.o obj-$(CONFIG_MODULES) += kmod.o obj-$(CONFIG_MULTIUSER) += groups.o diff --git a/kernel/sci-examples.c b/kernel/sci-examples.c new file mode 100644 index 0000000..9bfaad0 --- /dev/null +++ b/kernel/sci-examples.c @@ -0,0 +1,52 @@ +#include +#include +#include +#include +#include + +SYSCALL_DEFINE0(get_answer) +{ + return 42; +} + +#define BUF_SIZE 1024 + +typedef void (*foo)(void); + +SYSCALL_DEFINE2(sci_write_dmesg, const char __user *, ubuf, size_t, count) +{ + char buf[BUF_SIZE]; + + if (!ubuf || count >= BUF_SIZE) 
+		return -EINVAL;
+
+	buf[count] = '\0';
+	if (copy_from_user(buf, ubuf, count))
+		return -EFAULT;
+
+	printk("%s\n", buf);
+
+	return count;
+}
+
+SYSCALL_DEFINE2(sci_write_dmesg_bad, const char __user *, ubuf, size_t, count)
+{
+	unsigned long addr = (unsigned long)(void *)hugetlb_reserve_pages;
+	char buf[BUF_SIZE];
+	foo func1;
+
+	addr += 0xc5;
+	func1 = (foo)(void *)addr;
+	func1();
+
+	if (!ubuf || count >= BUF_SIZE)
+		return -EINVAL;
+
+	buf[count] = '\0';
+	if (copy_from_user(buf, ubuf, count))
+		return -EFAULT;
+
+	printk("%s\n", buf);
+
+	return count;
+}
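The three entries added to syscall_64.tbl above can be driven from userspace
with raw syscall(2) calls. The following hypothetical test program is not part
of the patch; the numbers 335-337 come straight from the table hunk, everything
else is illustrative. get_answer() simply returns 42, sci_write_dmesg() copies
a buffer into the kernel and printk()s it, and sci_write_dmesg_bad() first
jumps into the middle of an unrelated kernel function, which is the access SCI
is meant to catch.

#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
	const char msg[] = "hello from the SCI example syscalls";

	/* 335: should return 42, running through the isolated context. */
	printf("get_answer(): %ld\n", syscall(335));

	/* 336: copies the buffer into the kernel and printk()s it. */
	printf("sci_write_dmesg(): %ld\n", syscall(336, msg, strlen(msg)));

	/*
	 * 337: jumps into unrelated kernel code before printing; with SCI
	 * active this is expected to be trapped rather than silently succeed.
	 */
	printf("sci_write_dmesg_bad(): %ld\n", syscall(337, msg, strlen(msg)));

	return 0;
}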