From patchwork Mon May 4 14:49:33 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Alexandre Chartre
X-Patchwork-Id: 11526395
From: Alexandre Chartre
To: rkrcmar@redhat.com, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de,
 hpa@zytor.com, dave.hansen@linux.intel.com, luto@kernel.org,
 peterz@infradead.org, x86@kernel.org, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org
Cc: pbonzini@redhat.com, konrad.wilk@oracle.com, jan.setjeeilers@oracle.com,
 liran.alon@oracle.com, junaids@google.com, graf@amazon.de,
 rppt@linux.vnet.ibm.com, kuzuno@gmail.com, mgross@linux.intel.com,
 alexandre.chartre@oracle.com
Subject: [RFC v4][PATCH part-1 1/7] mm/x86: Introduce kernel Address Space Isolation (ASI)
Date: Mon, 4 May 2020 16:49:33 +0200
Message-Id: <20200504144939.11318-2-alexandre.chartre@oracle.com>
X-Mailer: git-send-email 2.18.2
In-Reply-To: <20200504144939.11318-1-alexandre.chartre@oracle.com>
References: <20200504144939.11318-1-alexandre.chartre@oracle.com>

Introduce core functions and structures for implementing Address Space
Isolation (ASI). Kernel address space isolation provides the ability to
run some kernel code with a reduced kernel address space.

An address space isolation is defined with a struct asi structure and
associated with an ASI type and a pagetable.

Signed-off-by: Alexandre Chartre
---
 arch/x86/include/asm/asi.h | 88 ++++++++++++++++++++++++++++++++++++++
 arch/x86/mm/Makefile       |  1 +
 arch/x86/mm/asi.c          | 60 ++++++++++++++++++++++++++
 security/Kconfig           | 10 +++++
 4 files changed, 159 insertions(+)
 create mode 100644 arch/x86/include/asm/asi.h
 create mode 100644 arch/x86/mm/asi.c

diff --git a/arch/x86/include/asm/asi.h b/arch/x86/include/asm/asi.h
new file mode 100644
index 000000000000..844a81fb84d2
--- /dev/null
+++ b/arch/x86/include/asm/asi.h
@@ -0,0 +1,88 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef ARCH_X86_MM_ASI_H
+#define ARCH_X86_MM_ASI_H
+
+#ifdef CONFIG_ADDRESS_SPACE_ISOLATION
+
+/*
+ * An Address Space Isolation (ASI) is defined with a struct asi and
+ * associated with an ASI type (struct asi_type). All ASIs of the same
+ * type reference the same ASI type.
+ *
+ * An ASI type has a unique PCID prefix (a value in the range [1, 255])
+ * which is used to define the PCID used for the ASI CR3 value. The
+ * first four bits of the ASI PCID come from the kernel PCID (a value
+ * between 1 and 6, see TLB_NR_DYN_ASIDS). The remaining 8 bits are
+ * filled with the ASI PCID prefix.
+ *
+ *   ASI PCID = (ASI Type PCID Prefix << 4) | Kernel PCID
+ *
+ * The ASI PCID is used to optimize TLB flushing when switching between
+ * the kernel and ASI pagetables. The optimization is valid only when
+ * a task switches between ASI of different types. If a task switches
+ * between different ASIs with the same type then the ASI TLB the task
+ * is switching to will always be flushed.
+ */
+
+#define ASI_PCID_PREFIX_SHIFT	4
+#define ASI_PCID_PREFIX_MASK	0xff0
+#define ASI_KERNEL_PCID_MASK	0x00f
+
+/*
+ * We use bit 12 of a pagetable pointer (and so of the CR3 value) as
+ * a way to know if a pointer/CR3 is referencing a full kernel page
+ * table or an ASI page table.
+ *
+ * A full kernel pagetable is always located on the first half of an
+ * 8K buffer, while an ASI pagetable is always located on the second
+ * half of an 8K buffer.
+ */
+#define ASI_PGTABLE_BIT		PAGE_SHIFT
+#define ASI_PGTABLE_MASK	(1 << ASI_PGTABLE_BIT)
+
+#ifndef __ASSEMBLY__
+
+#include
+
+struct asi_type {
+	int			pcid_prefix;	/* PCID prefix */
+};
+
+/*
+ * Macro to define and declare an ASI type.
+ *
+ * Declaring an ASI type will also define an inline function
+ * (asi_create_<name>()) to easily create an ASI of the
+ * specified type.
+ */
+#define DEFINE_ASI_TYPE(name, pcid_prefix)			\
+	struct asi_type asi_type_ ## name = {			\
+		pcid_prefix,					\
+	};							\
+	EXPORT_SYMBOL(asi_type_ ## name)
+
+#define DECLARE_ASI_TYPE(name)				\
+	extern struct asi_type asi_type_ ## name;	\
+	DECLARE_ASI_CREATE(name)
+
+#define DECLARE_ASI_CREATE(name)			\
+static inline struct asi *asi_create_ ## name(void)	\
+{							\
+	return asi_create(&asi_type_ ## name);		\
+}
+
+struct asi {
+	struct asi_type		*type;		/* ASI type */
+	pgd_t			*pagetable;	/* ASI pagetable */
+	unsigned long		base_cr3;	/* base ASI CR3 */
+};
+
+extern struct asi *asi_create(struct asi_type *type);
+extern void asi_destroy(struct asi *asi);
+extern void asi_set_pagetable(struct asi *asi, pgd_t *pagetable);
+
+#endif	/* __ASSEMBLY__ */
+
+#endif	/* CONFIG_ADDRESS_SPACE_ISOLATION */
+
+#endif
diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile
index 98f7c6fa2eaa..e57af263e870 100644
--- a/arch/x86/mm/Makefile
+++ b/arch/x86/mm/Makefile
@@ -48,6 +48,7 @@ obj-$(CONFIG_NUMA_EMU) += numa_emulation.o
 obj-$(CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS)	+= pkeys.o
 obj-$(CONFIG_RANDOMIZE_MEMORY)			+= kaslr.o
 obj-$(CONFIG_PAGE_TABLE_ISOLATION)		+= pti.o
+obj-$(CONFIG_ADDRESS_SPACE_ISOLATION)		+= asi.o
 
 obj-$(CONFIG_AMD_MEM_ENCRYPT)	+= mem_encrypt.o
 obj-$(CONFIG_AMD_MEM_ENCRYPT)	+= mem_encrypt_identity.o
diff --git a/arch/x86/mm/asi.c b/arch/x86/mm/asi.c
new file mode 100644
index 000000000000..0a0ac9d6d078
--- /dev/null
+++ b/arch/x86/mm/asi.c
@@ -0,0 +1,60 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2019, 2020, Oracle and/or its affiliates.
+ *
+ * Kernel Address Space Isolation (ASI)
+ */
+
+#include
+#include
+
+#include
+#include
+
+struct asi *asi_create(struct asi_type *type)
+{
+	struct asi *asi;
+
+	if (!type)
+		return NULL;
+
+	asi = kzalloc(sizeof(*asi), GFP_KERNEL);
+	if (!asi)
+		return NULL;
+
+	asi->type = type;
+
+	return asi;
+}
+EXPORT_SYMBOL(asi_create);
+
+void asi_destroy(struct asi *asi)
+{
+	kfree(asi);
+}
+EXPORT_SYMBOL(asi_destroy);
+
+void asi_set_pagetable(struct asi *asi, pgd_t *pagetable)
+{
+	/*
+	 * Check that the specified pagetable is properly aligned to be
+	 * used as an ASI pagetable. If not, the pagetable is ignored
+	 * and entering/exiting ASI will do nothing.
+	 */
+	if (!(((unsigned long)pagetable) & ASI_PGTABLE_MASK)) {
+		WARN(1, "ASI %p: invalid ASI pagetable", asi);
+		asi->pagetable = NULL;
+		return;
+	}
+	asi->pagetable = pagetable;
+
+	/*
+	 * Initialize the invariant part of the ASI CR3 value. We will
+	 * just have to complete the PCID with the kernel PCID before
+	 * using it.
+	 */
+	asi->base_cr3 = __sme_pa(asi->pagetable) |
+		(asi->type->pcid_prefix << ASI_PCID_PREFIX_SHIFT);
+
+}
+EXPORT_SYMBOL(asi_set_pagetable);
diff --git a/security/Kconfig b/security/Kconfig
index cd3cc7da3a55..d98197eb260c 100644
--- a/security/Kconfig
+++ b/security/Kconfig
@@ -65,6 +65,16 @@ config PAGE_TABLE_ISOLATION
 
 	  See Documentation/x86/pti.rst for more details.
 
+config ADDRESS_SPACE_ISOLATION
+	bool "Allow code to run with a reduced kernel address space"
+	default y
+	depends on (X86_64 || X86_PAE) && !UML
+	help
+	   This feature provides the ability to run some kernel code
+	   with a reduced kernel address space. This can be used to
+	   mitigate speculative execution attacks which are able to
+	   leak data between sibling CPU hyper-threads.
+
 config SECURITY_INFINIBAND
 	bool "Infiniband Security Hooks"
 	depends on SECURITY && INFINIBAND

From patchwork Mon May 4 14:49:34 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Alexandre Chartre
X-Patchwork-Id: 11526387
From: Alexandre Chartre
To: rkrcmar@redhat.com, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de,
 hpa@zytor.com, dave.hansen@linux.intel.com, luto@kernel.org,
 peterz@infradead.org, x86@kernel.org, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org
Cc: pbonzini@redhat.com, konrad.wilk@oracle.com, jan.setjeeilers@oracle.com,
 liran.alon@oracle.com, junaids@google.com, graf@amazon.de,
 rppt@linux.vnet.ibm.com, kuzuno@gmail.com, mgross@linux.intel.com,
 alexandre.chartre@oracle.com
Subject: [RFC v4][PATCH part-1 2/7] mm/asi: ASI entry/exit interface
Date: Mon, 4 May 2020 16:49:34 +0200
Message-Id: <20200504144939.11318-3-alexandre.chartre@oracle.com>
X-Mailer: git-send-email 2.18.2
In-Reply-To: <20200504144939.11318-1-alexandre.chartre@oracle.com>
References: <20200504144939.11318-1-alexandre.chartre@oracle.com>

Address Space Isolation (ASI) is entered by calling asi_enter() which
switches the kernel page-table to the ASI page-table. Isolation is then
exited by calling asi_exit() which switches the page-table back to the
original kernel page-table.

The ASI being used and its state are tracked in a per-cpu ASI session
structure (struct asi_session).

Signed-off-by: Alexandre Chartre
---
 arch/x86/include/asm/asi.h         |  4 ++
 arch/x86/include/asm/asi_session.h | 17 ++++++
 arch/x86/include/asm/mmu_context.h | 19 ++++++-
 arch/x86/include/asm/tlbflush.h    | 12 ++++
 arch/x86/mm/asi.c                  | 90 ++++++++++++++++++++++++++++++
 5 files changed, 140 insertions(+), 2 deletions(-)
 create mode 100644 arch/x86/include/asm/asi_session.h

diff --git a/arch/x86/include/asm/asi.h b/arch/x86/include/asm/asi.h
index 844a81fb84d2..29b745ab393e 100644
--- a/arch/x86/include/asm/asi.h
+++ b/arch/x86/include/asm/asi.h
@@ -44,6 +44,8 @@
 
 #include
 
+#include
+
 struct asi_type {
 	int			pcid_prefix;	/* PCID prefix */
 };
@@ -80,6 +82,8 @@ struct asi {
 extern struct asi *asi_create(struct asi_type *type);
 extern void asi_destroy(struct asi *asi);
 extern void asi_set_pagetable(struct asi *asi, pgd_t *pagetable);
+extern int asi_enter(struct asi *asi);
+extern void asi_exit(struct asi *asi);
 
 #endif	/* __ASSEMBLY__ */
 
diff --git a/arch/x86/include/asm/asi_session.h b/arch/x86/include/asm/asi_session.h
new file mode 100644
index 000000000000..9d39c936a4ee
--- /dev/null
+++ b/arch/x86/include/asm/asi_session.h
@@ -0,0 +1,17 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef ARCH_X86_MM_ASI_SESSION_H
+#define ARCH_X86_MM_ASI_SESSION_H
+
+#ifdef CONFIG_ADDRESS_SPACE_ISOLATION
+
+struct asi;
+
+struct asi_session {
+	struct asi		*asi;		/* ASI for this session */
+	unsigned long		isolation_cr3;	/* cr3 when ASI is active */
+	unsigned long		original_cr3;	/* cr3 before entering ASI */
+};
+
+#endif	/* CONFIG_ADDRESS_SPACE_ISOLATION */
+
+#endif
diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
index 4e55370e48e8..9b03bad00b81 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -13,6 +13,7 @@
 #include
 #include
 #include
+#include
 
 extern atomic64_t last_mm_ctx_id;
 
@@ -234,8 +235,22 @@ static inline bool arch_vma_access_permitted(struct vm_area_struct *vma,
  */
 static inline unsigned long __get_current_cr3_fast(void)
 {
-	unsigned long cr3 = build_cr3(this_cpu_read(cpu_tlbstate.loaded_mm)->pgd,
-		this_cpu_read(cpu_tlbstate.loaded_mm_asid));
+	unsigned long cr3;
+
+	/*
+	 * If isolation is active then we need to return the CR3 for the
+	 * currently active ASI. This value is stored in the isolation_cr3
+	 * field of the ASI session.
+	 */
+	if (IS_ENABLED(CONFIG_ADDRESS_SPACE_ISOLATION) &&
+	    this_cpu_read(cpu_asi_session.asi)) {
+		cr3 = this_cpu_read(cpu_asi_session.isolation_cr3);
+		/* CR3 read never returns with the NOFLUSH bit */
+		cr3 &= ~X86_CR3_PCID_NOFLUSH;
+	} else {
+		cr3 = build_cr3(this_cpu_read(cpu_tlbstate.loaded_mm)->pgd,
+				this_cpu_read(cpu_tlbstate.loaded_mm_asid));
+	}
 
 	/* For now, be very restrictive about when this can be called. */
 	VM_WARN_ON(in_nmi() || preemptible());
diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
index 6f66d841262d..241058ff63ba 100644
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -12,6 +12,7 @@
 #include
 #include
 #include
+#include
 
 /*
  * The x86 feature is called PCID (Process Context IDentifier). It is similar
@@ -239,9 +240,20 @@ struct tlb_state {
 	 * context 0.
 	 */
 	struct tlb_context ctxs[TLB_NR_DYN_ASIDS];
+
+#ifdef CONFIG_ADDRESS_SPACE_ISOLATION
+	/*
+	 * The ASI session tracks the ASI being used and its state.
+	 */
+	struct asi_session asi_session;
+#endif
 };
 DECLARE_PER_CPU_SHARED_ALIGNED(struct tlb_state, cpu_tlbstate);
 
+#ifdef CONFIG_ADDRESS_SPACE_ISOLATION
+#define cpu_asi_session	(cpu_tlbstate.asi_session)
+#endif
+
 /*
  * Blindly accessing user memory from NMI context can be dangerous
  * if we're in the middle of switching the current user task or
diff --git a/arch/x86/mm/asi.c b/arch/x86/mm/asi.c
index 0a0ac9d6d078..9fbc92122ce2 100644
--- a/arch/x86/mm/asi.c
+++ b/arch/x86/mm/asi.c
@@ -10,6 +10,8 @@
 
 #include
 #include
+#include
+#include
 
 struct asi *asi_create(struct asi_type *type)
 {
@@ -58,3 +60,91 @@ void asi_set_pagetable(struct asi *asi, pgd_t *pagetable)
 
 }
 EXPORT_SYMBOL(asi_set_pagetable);
+
+static void asi_switch_to_asi_cr3(struct asi *asi)
+{
+	unsigned long original_cr3, asi_cr3;
+	struct asi_session *asi_session;
+	u16 pcid;
+
+	WARN_ON(!irqs_disabled());
+
+	original_cr3 = __get_current_cr3_fast();
+
+	/* build the ASI cr3 value */
+	asi_cr3 = asi->base_cr3;
+	if (boot_cpu_has(X86_FEATURE_PCID)) {
+		pcid = original_cr3 & ASI_KERNEL_PCID_MASK;
+		asi_cr3 |= pcid;
+	}
+
+	/* get the ASI session ready for entering ASI */
+	asi_session = &get_cpu_var(cpu_asi_session);
+	asi_session->asi = asi;
+	asi_session->original_cr3 = original_cr3;
+	asi_session->isolation_cr3 = asi_cr3;
+
+	/* Update CR3 to immediately enter ASI */
+	native_write_cr3(asi_cr3);
+}
+
+static void asi_switch_to_kernel_cr3(struct asi *asi)
+{
+	struct asi_session *asi_session;
+	unsigned long original_cr3;
+
+	WARN_ON(!irqs_disabled());
+
+	original_cr3 = this_cpu_read(cpu_asi_session.original_cr3);
+	if (boot_cpu_has(X86_FEATURE_PCID))
+		original_cr3 |= X86_CR3_PCID_NOFLUSH;
+	native_write_cr3(original_cr3);
+
+	asi_session = &get_cpu_var(cpu_asi_session);
+	asi_session->asi = NULL;
+}
+
+int asi_enter(struct asi *asi)
+{
+	struct asi *current_asi;
+	unsigned long flags;
+
+	/*
+	 * We can re-enter isolation, but only with the same ASI (we don't
+	 * support nesting isolation).
+	 */
+	current_asi = this_cpu_read(cpu_asi_session.asi);
+	if (current_asi) {
+		if (current_asi != asi) {
+			WARN_ON(1);
+			return -EBUSY;
+		}
+		return 0;
+	}
+
+	local_irq_save(flags);
+	asi_switch_to_asi_cr3(asi);
+	local_irq_restore(flags);
+
+	return 0;
+}
+EXPORT_SYMBOL(asi_enter);
+
+void asi_exit(struct asi *asi)
+{
+	struct asi *current_asi;
+	unsigned long flags;
+
+	current_asi = this_cpu_read(cpu_asi_session.asi);
+	if (!current_asi) {
+		/* already exited */
+		return;
+	}
+
+	WARN_ON(current_asi != asi);
+
+	local_irq_save(flags);
+	asi_switch_to_kernel_cr3(asi);
+	local_irq_restore(flags);
+}
+EXPORT_SYMBOL(asi_exit);

From patchwork Mon May 4 14:49:35 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Alexandre Chartre
X-Patchwork-Id: 11526385
From: Alexandre Chartre
To: rkrcmar@redhat.com, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de,
 hpa@zytor.com, dave.hansen@linux.intel.com, luto@kernel.org,
 peterz@infradead.org, x86@kernel.org, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org
Cc: pbonzini@redhat.com, konrad.wilk@oracle.com, jan.setjeeilers@oracle.com,
 liran.alon@oracle.com, junaids@google.com, graf@amazon.de,
 rppt@linux.vnet.ibm.com, kuzuno@gmail.com, mgross@linux.intel.com,
 alexandre.chartre@oracle.com
Subject: [RFC v4][PATCH part-1 3/7] mm/asi: Improve TLB flushing when switching to an ASI pagetable
Date: Mon, 4 May 2020 16:49:35 +0200
Message-Id: <20200504144939.11318-4-alexandre.chartre@oracle.com>
X-Mailer: git-send-email 2.18.2
In-Reply-To: <20200504144939.11318-1-alexandre.chartre@oracle.com>
References: <20200504144939.11318-1-alexandre.chartre@oracle.com>

When switching to an ASI pagetable, the TLB doesn't need to be flushed
if the same pagetable was previously used with the same PCID. So, to
avoid unnecessary TLB flushing, we track which pagetables are used with
the different ASI PCIDs. If an ASI PCID is being used with a different
ASI pagetable, or if we have a new generation of the same ASI pagetable,
then the TLB needs to be flushed.

This behavior is similar to the context tracking done when switching mm.

Signed-off-by: Alexandre Chartre
---
 arch/x86/include/asm/asi.h | 23 +++++++++++++++++++++++
 arch/x86/mm/asi.c          | 34 ++++++++++++++++++++++++++++++++--
 2 files changed, 55 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/asi.h b/arch/x86/include/asm/asi.h
index 29b745ab393e..bcfb68e8e392 100644
--- a/arch/x86/include/asm/asi.h
+++ b/arch/x86/include/asm/asi.h
@@ -46,8 +46,26 @@
 
 #include
 
+/*
+ * ASI_TLB_NR_DYN_ASIDS is the same as TLB_NR_DYN_ASIDS. We can't directly
+ * use TLB_NR_DYN_ASIDS because asi.h and tlbflush.h can't both include
+ * each other.
+ */
+#define ASI_TLB_NR_DYN_ASIDS	6
+
+struct asi_tlb_pgtable {
+	u64 id;
+	u64 gen;
+};
+
+struct asi_tlb_state {
+	struct asi_tlb_pgtable	tlb_pgtables[ASI_TLB_NR_DYN_ASIDS];
+};
+
 struct asi_type {
 	int			pcid_prefix;	/* PCID prefix */
+	struct asi_tlb_state	*tlb_state;	/* percpu ASI TLB state */
+	atomic64_t		last_pgtable_id; /* last id for this type */
 };
 
 /*
@@ -58,8 +76,11 @@ struct asi_type {
  * specified type.
  */
 #define DEFINE_ASI_TYPE(name, pcid_prefix)			\
+	DEFINE_PER_CPU(struct asi_tlb_state, asi_tlb_ ## name);	\
 	struct asi_type asi_type_ ## name = {			\
 		pcid_prefix,					\
+		&asi_tlb_ ## name,				\
+		ATOMIC64_INIT(1),				\
 	};							\
 	EXPORT_SYMBOL(asi_type_ ## name)
 
@@ -76,6 +97,8 @@ static inline struct asi *asi_create_ ## name(void) \
 struct asi {
 	struct asi_type		*type;		/* ASI type */
 	pgd_t			*pagetable;	/* ASI pagetable */
+	u64			pgtable_id;	/* ASI pagetable ID */
+	atomic64_t		pgtable_gen;	/* ASI pagetable generation */
 	unsigned long		base_cr3;	/* base ASI CR3 */
 };
 
diff --git a/arch/x86/mm/asi.c b/arch/x86/mm/asi.c
index 9fbc92122ce2..cf0d122a3c72 100644
--- a/arch/x86/mm/asi.c
+++ b/arch/x86/mm/asi.c
@@ -25,6 +25,8 @@ struct asi *asi_create(struct asi_type *type)
 		return NULL;
 
 	asi->type = type;
+	asi->pgtable_id = atomic64_inc_return(&type->last_pgtable_id);
+	atomic64_set(&asi->pgtable_gen, 0);
 
 	return asi;
 }
@@ -61,6 +63,33 @@ void asi_set_pagetable(struct asi *asi, pgd_t *pagetable)
 }
 EXPORT_SYMBOL(asi_set_pagetable);
 
+/*
+ * Update ASI TLB flush information for the specified ASI CR3 value.
+ * Return an updated ASI CR3 value that indicates whether the TLB
+ * needs to be flushed.
+ */ +static unsigned long asi_update_flush(struct asi *asi, unsigned long asi_cr3) +{ + struct asi_tlb_pgtable *tlb_pgtable; + struct asi_tlb_state *tlb_state; + s64 pgtable_gen; + u16 pcid; + + pcid = asi_cr3 & ASI_KERNEL_PCID_MASK; + tlb_state = get_cpu_ptr(asi->type->tlb_state); + tlb_pgtable = &tlb_state->tlb_pgtables[pcid - 1]; + pgtable_gen = atomic64_read(&asi->pgtable_gen); + if (tlb_pgtable->id == asi->pgtable_id && + tlb_pgtable->gen == pgtable_gen) { + asi_cr3 |= X86_CR3_PCID_NOFLUSH; + } else { + tlb_pgtable->id = asi->pgtable_id; + tlb_pgtable->gen = pgtable_gen; + } + + return asi_cr3; +} + static void asi_switch_to_asi_cr3(struct asi *asi) { unsigned long original_cr3, asi_cr3; @@ -72,10 +101,11 @@ static void asi_switch_to_asi_cr3(struct asi *asi) original_cr3 = __get_current_cr3_fast(); /* build the ASI cr3 value */ - asi_cr3 = asi->base_cr3; if (boot_cpu_has(X86_FEATURE_PCID)) { pcid = original_cr3 & ASI_KERNEL_PCID_MASK; - asi_cr3 |= pcid; + asi_cr3 = asi_update_flush(asi, asi->base_cr3 | pcid); + } else { + asi_cr3 = asi->base_cr3; } /* get the ASI session ready for entering ASI */ From patchwork Mon May 4 14:49:36 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alexandre Chartre X-Patchwork-Id: 11526391 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id DAB3B1668 for ; Mon, 4 May 2020 14:51:38 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 8B030206D9 for ; Mon, 4 May 2020 14:51:38 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=oracle.com header.i=@oracle.com header.b="kahOWj8e" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 8B030206D9 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=oracle.com 
From: Alexandre Chartre
To: rkrcmar@redhat.com, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, hpa@zytor.com, dave.hansen@linux.intel.com, luto@kernel.org, peterz@infradead.org, x86@kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: pbonzini@redhat.com, konrad.wilk@oracle.com, jan.setjeeilers@oracle.com, liran.alon@oracle.com, junaids@google.com, graf@amazon.de, rppt@linux.vnet.ibm.com, kuzuno@gmail.com, mgross@linux.intel.com, alexandre.chartre@oracle.com
Subject: [RFC v4][PATCH part-1 4/7] mm/asi: Interrupt ASI on interrupt/exception/NMI
Date: Mon, 4 May 2020 16:49:36 +0200
Message-Id: <20200504144939.11318-5-alexandre.chartre@oracle.com>
X-Mailer: git-send-email 2.18.2
In-Reply-To: <20200504144939.11318-1-alexandre.chartre@oracle.com>
References: <20200504144939.11318-1-alexandre.chartre@oracle.com>

If an interrupt/exception/NMI is triggered while using ASI then ASI is interrupted and the system switches back to the (kernel) page-table used before entering ASI. When the interrupt/exception/NMI handler returns then ASI is resumed by switching back to the ASI page-table.
Signed-off-by: Alexandre Chartre Reported-by: kernel test robot --- arch/x86/entry/calling.h | 26 +++++- arch/x86/entry/entry_64.S | 22 ++++++ arch/x86/include/asm/asi.h | 122 +++++++++++++++++++++++++++++ arch/x86/include/asm/asi_session.h | 7 ++ arch/x86/include/asm/mmu_context.h | 3 +- arch/x86/kernel/asm-offsets.c | 5 ++ arch/x86/mm/asi.c | 67 ++++++++++++++-- 7 files changed, 242 insertions(+), 10 deletions(-) diff --git a/arch/x86/entry/calling.h b/arch/x86/entry/calling.h index 0789e13ece90..ca23b79adecf 100644 --- a/arch/x86/entry/calling.h +++ b/arch/x86/entry/calling.h @@ -6,6 +6,7 @@ #include #include #include +#include /* @@ -172,7 +173,30 @@ For 32-bit we have the following conventions - kernel is built with .endif .endm -#ifdef CONFIG_PAGE_TABLE_ISOLATION +#if defined(CONFIG_ADDRESS_SPACE_ISOLATION) + +/* + * For now, ASI is not compatible with PTI. + */ + +.macro SWITCH_TO_KERNEL_CR3 scratch_reg:req +.endm + +.macro SWITCH_TO_USER_CR3_NOSTACK scratch_reg:req scratch_reg2:req +.endm + +.macro SWITCH_TO_USER_CR3_STACK scratch_reg:req +.endm + +.macro SAVE_AND_SWITCH_TO_KERNEL_CR3 scratch_reg:req save_reg:req + ASI_INTERRUPT_AND_SAVE_CR3 \scratch_reg \save_reg +.endm + +.macro RESTORE_CR3 scratch_reg:req save_reg:req + ASI_RESUME_AND_RESTORE_CR3 \save_reg +.endm + +#elif defined(CONFIG_PAGE_TABLE_ISOLATION) /* * PAGE_TABLE_ISOLATION PGDs are 8k. 
Flip bit 12 to switch between the two diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S index 0e9504fabe52..ac47da63a29f 100644 --- a/arch/x86/entry/entry_64.S +++ b/arch/x86/entry/entry_64.S @@ -573,7 +573,15 @@ SYM_CODE_START(interrupt_entry) CALL_enter_from_user_mode +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + jmp 2f +#endif 1: +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + /* Interrupt address space isolation if it is active */ + ASI_INTERRUPT scratch_reg=%rdi +2: +#endif ENTER_IRQ_STACK old_rsp=%rdi save_ret=1 /* We entered an interrupt context - irqs are off: */ TRACE_IRQS_OFF @@ -673,6 +681,10 @@ retint_kernel: jnz 1f call preempt_schedule_irq 1: +#endif +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + ASI_PREPARE_RESUME + ASI_RESUME scratch_reg=%rdi #endif /* * The iretq could re-enable interrupts: @@ -1238,6 +1250,9 @@ SYM_CODE_START_LOCAL(paranoid_entry) * This is also why CS (stashed in the "iret frame" by the * hardware at entry) can not be used: this may be a return * to kernel code, but with a user CR3 value. + * + * If ASI is enabled, this also handles the case where we are + * using an ASI CR3 value. */ SAVE_AND_SWITCH_TO_KERNEL_CR3 scratch_reg=%rax save_reg=%r14 @@ -1313,6 +1328,13 @@ SYM_CODE_START_LOCAL(error_entry) .Lerror_entry_done_lfence: FENCE_SWAPGS_KERNEL_ENTRY +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + /* + * Interrupt address space isolation if it is active. This will restore + * the original kernel CR3. 
+ */ + ASI_INTERRUPT scratch_reg=%rdi +#endif .Lerror_entry_done: ret diff --git a/arch/x86/include/asm/asi.h b/arch/x86/include/asm/asi.h index bcfb68e8e392..d240954b2f85 100644 --- a/arch/x86/include/asm/asi.h +++ b/arch/x86/include/asm/asi.h @@ -108,6 +108,128 @@ extern void asi_set_pagetable(struct asi *asi, pgd_t *pagetable); extern int asi_enter(struct asi *asi); extern void asi_exit(struct asi *asi); +#else /* __ASSEMBLY__ */ + +#include +#include +#include +#include +#include + +#define THIS_ASI_SESSION_asi \ + PER_CPU_VAR(cpu_tlbstate + TLB_STATE_asi) +#define THIS_ASI_SESSION_isolation_cr3 \ + PER_CPU_VAR(cpu_tlbstate + TLB_STATE_asi_isolation_cr3) +#define THIS_ASI_SESSION_original_cr3 \ + PER_CPU_VAR(cpu_tlbstate + TLB_STATE_asi_original_cr3) +#define THIS_ASI_SESSION_idepth \ + PER_CPU_VAR(cpu_tlbstate + TLB_STATE_asi_idepth) + +.macro SET_NOFLUSH_BIT reg:req + bts $X86_CR3_PCID_NOFLUSH_BIT, \reg +.endm + +/* + * Switch CR3 to the original kernel CR3 value. This is used when + * interrupting ASI. + */ +.macro ASI_SWITCH_TO_KERNEL_CR3 scratch_reg:req + /* + * KERNEL pages can always resume with NOFLUSH as we do + * explicit flushes. + */ + movq THIS_ASI_SESSION_original_cr3, \scratch_reg + ALTERNATIVE "", "SET_NOFLUSH_BIT \scratch_reg", X86_FEATURE_PCID + movq \scratch_reg, %cr3 +.endm + +/* + * Interrupt ASI, when there's an interrupt or exception while we + * were running with ASI. + */ +.macro ASI_INTERRUPT scratch_reg:req + movq THIS_ASI_SESSION_asi, \scratch_reg + testq \scratch_reg, \scratch_reg + jz .Lasi_interrupt_done_\@ + incl THIS_ASI_SESSION_idepth + cmp $1, THIS_ASI_SESSION_idepth + jne .Lasi_interrupt_done_\@ + ASI_SWITCH_TO_KERNEL_CR3 \scratch_reg +.Lasi_interrupt_done_\@: +.endm + +.macro ASI_PREPARE_RESUME + call asi_prepare_resume +.endm + +/* + * Resume ASI, after it was interrupted by an interrupt or an exception.
+ */ +.macro ASI_RESUME scratch_reg:req + movq THIS_ASI_SESSION_asi, \scratch_reg + testq \scratch_reg, \scratch_reg + jz .Lasi_resume_done_\@ + decl THIS_ASI_SESSION_idepth + jnz .Lasi_resume_done_\@ + movq THIS_ASI_SESSION_isolation_cr3, \scratch_reg + mov \scratch_reg, %cr3 +.Lasi_resume_done_\@: +.endm + +/* + * Interrupt ASI, special processing when ASI is interrupted by a NMI + * or a paranoid interrupt/exception. + */ +.macro ASI_INTERRUPT_AND_SAVE_CR3 scratch_reg:req save_reg:req + movq %cr3, \save_reg + /* + * Test the ASI PCID bits. If set, then an ASI page table + * is active. If clear, CR3 already has the kernel page table + * active. + */ + bt $ASI_PGTABLE_BIT, \save_reg + jnc .Ldone_\@ + incl THIS_ASI_SESSION_idepth + ASI_SWITCH_TO_KERNEL_CR3 \scratch_reg +.Ldone_\@: +.endm + +/* + * Resume ASI, special processing when ASI is resumed from a NMI + * or a paranoid interrupt/exception. + */ +.macro ASI_RESUME_AND_RESTORE_CR3 save_reg:req + + ALTERNATIVE "jmp .Lwrite_cr3_\@", "", X86_FEATURE_PCID + + bt $ASI_PGTABLE_BIT, \save_reg + jnc .Lrestore_kernel_cr3_\@ + + /* + * Restore ASI CR3. We need to update TLB flushing + * information. + */ + movq THIS_ASI_SESSION_asi, %rdi + movq \save_reg, %rsi + call asi_update_flush + movq %rax, THIS_ASI_SESSION_isolation_cr3 + decl THIS_ASI_SESSION_idepth + movq %rax, %cr3 + jmp .Ldone_\@ + +.Lrestore_kernel_cr3_\@: + /* + * Restore kernel CR3. KERNEL pages can always resume + * with NOFLUSH as we do explicit flushes. 
+ */ + SET_NOFLUSH_BIT \save_reg + +.Lwrite_cr3_\@: + movq \save_reg, %cr3 + +.Ldone_\@: +.endm + #endif /* __ASSEMBLY__ */ #endif /* CONFIG_ADDRESS_SPACE_ISOLATION */ diff --git a/arch/x86/include/asm/asi_session.h b/arch/x86/include/asm/asi_session.h index 9d39c936a4ee..85968f7e8f32 100644 --- a/arch/x86/include/asm/asi_session.h +++ b/arch/x86/include/asm/asi_session.h @@ -10,6 +10,13 @@ struct asi_session { struct asi *asi; /* ASI for this session */ unsigned long isolation_cr3; /* cr3 when ASI is active */ unsigned long original_cr3; /* cr3 before entering ASI */ + /* + * The interrupt depth (idepth) tracks interrupt (actually + * interrupt/exception/NMI) nesting. ASI is interrupted on + * the first interrupt, and it is resumed when that interrupt + * handler returns. + */ + unsigned int idepth; /* interrupt depth */ }; #endif /* CONFIG_ADDRESS_SPACE_ISOLATION */ diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h index 9b03bad00b81..b8c81e7b197a 100644 --- a/arch/x86/include/asm/mmu_context.h +++ b/arch/x86/include/asm/mmu_context.h @@ -243,7 +243,8 @@ static inline unsigned long __get_current_cr3_fast(void) * field of the ASI session. 
*/ if (IS_ENABLED(CONFIG_ADDRESS_SPACE_ISOLATION) && - this_cpu_read(cpu_asi_session.asi)) { + this_cpu_read(cpu_asi_session.asi) && + !this_cpu_read(cpu_asi_session.idepth)) { cr3 = this_cpu_read(cpu_asi_session.isolation_cr3); /* CR3 read never returns with the NOFLUSH bit */ cr3 &= ~X86_CR3_PCID_NOFLUSH; diff --git a/arch/x86/kernel/asm-offsets.c b/arch/x86/kernel/asm-offsets.c index 3ca07ad552ae..4c08a688b4b9 100644 --- a/arch/x86/kernel/asm-offsets.c +++ b/arch/x86/kernel/asm-offsets.c @@ -94,6 +94,11 @@ static void __used common(void) /* TLB state for the entry code */ OFFSET(TLB_STATE_user_pcid_flush_mask, tlb_state, user_pcid_flush_mask); + OFFSET(TLB_STATE_asi, tlb_state, asi_session.asi); + OFFSET(TLB_STATE_asi_isolation_cr3, tlb_state, + asi_session.isolation_cr3); + OFFSET(TLB_STATE_asi_original_cr3, tlb_state, asi_session.original_cr3); + OFFSET(TLB_STATE_asi_idepth, tlb_state, asi_session.idepth); /* Layout info for cpu_entry_area */ OFFSET(CPU_ENTRY_AREA_entry_stack, cpu_entry_area, entry_stack_page); diff --git a/arch/x86/mm/asi.c b/arch/x86/mm/asi.c index cf0d122a3c72..c91ba82a095b 100644 --- a/arch/x86/mm/asi.c +++ b/arch/x86/mm/asi.c @@ -68,7 +68,7 @@ EXPORT_SYMBOL(asi_set_pagetable); * Return an updated ASI CR3 value which specified if TLB needs to * be flushed or not. */ -static unsigned long asi_update_flush(struct asi *asi, unsigned long asi_cr3) +unsigned long asi_update_flush(struct asi *asi, unsigned long asi_cr3) { struct asi_tlb_pgtable *tlb_pgtable; struct asi_tlb_state *tlb_state; @@ -90,7 +90,24 @@ static unsigned long asi_update_flush(struct asi *asi, unsigned long asi_cr3) return asi_cr3; } -static void asi_switch_to_asi_cr3(struct asi *asi) + +/* + * Switch to the ASI pagetable. + * + * If schedule is ASI_SWITCH_NOW, then immediately switch to the ASI + * pagetable by updating the CR3 register with the ASI CR3 value. 
+ * Otherwise, if schedule is ASI_SWITCH_ON_RESUME, prepare everything + * for switching to the ASI pagetable but do not update the CR3 register + * yet. This will be done by the next ASI_RESUME call. + */ + +enum asi_switch_schedule { + ASI_SWITCH_NOW, + ASI_SWITCH_ON_RESUME, +}; + +static void asi_switch_to_asi_cr3(struct asi *asi, + enum asi_switch_schedule schedule) { unsigned long original_cr3, asi_cr3; struct asi_session *asi_session; @@ -114,8 +131,16 @@ static void asi_switch_to_asi_cr3(struct asi *asi) asi_session->original_cr3 = original_cr3; asi_session->isolation_cr3 = asi_cr3; - /* Update CR3 to immediately enter ASI */ - native_write_cr3(asi_cr3); + if (schedule == ASI_SWITCH_ON_RESUME) { + /* + * Defer the CR3 update to the next ASI resume by setting + * the interrupt depth to 1. + */ + asi_session->idepth = 1; + } else { + /* Update CR3 to immediately enter ASI */ + native_write_cr3(asi_cr3); + } } static void asi_switch_to_kernel_cr3(struct asi *asi) @@ -132,6 +157,7 @@ static void asi_switch_to_kernel_cr3(struct asi *asi) asi_session = &get_cpu_var(cpu_asi_session); asi_session->asi = NULL; + asi_session->idepth = 0; } int asi_enter(struct asi *asi) @@ -153,7 +179,7 @@ int asi_enter(struct asi *asi) } local_irq_save(flags); - asi_switch_to_asi_cr3(asi); + asi_switch_to_asi_cr3(asi, ASI_SWITCH_NOW); local_irq_restore(flags); return 0; @@ -162,8 +188,10 @@ EXPORT_SYMBOL(asi_enter); void asi_exit(struct asi *asi) { + struct asi_session *asi_session; struct asi *current_asi; unsigned long flags; + int idepth; current_asi = this_cpu_read(cpu_asi_session.asi); if (!current_asi) { @@ -173,8 +201,31 @@ void asi_exit(struct asi *asi) WARN_ON(current_asi != asi); - local_irq_save(flags); - asi_switch_to_kernel_cr3(asi); - local_irq_restore(flags); + idepth = this_cpu_read(cpu_asi_session.idepth); + if (!idepth) { + local_irq_save(flags); + asi_switch_to_kernel_cr3(asi); + local_irq_restore(flags); + } else { + /* + * ASI was interrupted so we already switched back
+ * to the kernel page table and we just + * need to clear the ASI session. + */ + asi_session = &get_cpu_var(cpu_asi_session); + asi_session->asi = NULL; + asi_session->idepth = 0; + } } EXPORT_SYMBOL(asi_exit); + +void asi_prepare_resume(void) +{ + struct asi_session *asi_session; + + asi_session = &get_cpu_var(cpu_asi_session); + if (!asi_session->asi || asi_session->idepth > 1) + return; + + asi_switch_to_asi_cr3(asi_session->asi, ASI_SWITCH_ON_RESUME); +}

From patchwork Mon May 4 14:49:37 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Alexandre Chartre
X-Patchwork-Id: 11526389
From: Alexandre Chartre
To: rkrcmar@redhat.com, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, hpa@zytor.com, dave.hansen@linux.intel.com, luto@kernel.org, peterz@infradead.org, x86@kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: pbonzini@redhat.com, konrad.wilk@oracle.com, jan.setjeeilers@oracle.com, liran.alon@oracle.com, junaids@google.com, graf@amazon.de, rppt@linux.vnet.ibm.com, kuzuno@gmail.com, mgross@linux.intel.com, alexandre.chartre@oracle.com
Subject: [RFC v4][PATCH part-1 5/7] mm/asi: Exit/enter ASI when task enters/exits scheduler
Date: Mon, 4 May 2020 16:49:37 +0200
Message-Id: <20200504144939.11318-6-alexandre.chartre@oracle.com>
X-Mailer: git-send-email 2.18.2
In-Reply-To: <20200504144939.11318-1-alexandre.chartre@oracle.com>
References: <20200504144939.11318-1-alexandre.chartre@oracle.com>

Exit ASI as soon as a task enters the scheduler (__schedule()); otherwise ASI will likely fault quickly, for example when accessing run queues. The task returns to ASI when it is scheduled again.

Signed-off-by: Alexandre Chartre
---
 arch/x86/include/asm/asi.h | 3 ++
 arch/x86/mm/asi.c | 67 ++++++++++++++++++++++++++++++++++++++
 include/linux/sched.h | 9 +++++
 kernel/sched/core.c | 17 ++++++++++
 4 files changed, 96 insertions(+)

diff --git a/arch/x86/include/asm/asi.h b/arch/x86/include/asm/asi.h
index d240954b2f85..a0733f1e4a67 100644
--- a/arch/x86/include/asm/asi.h
+++ b/arch/x86/include/asm/asi.h
@@ -102,6 +102,9 @@ struct asi { unsigned long base_cr3; /* base ASI CR3 */ }; +void asi_schedule_out(struct task_struct *task); +void asi_schedule_in(struct task_struct *task); + extern struct asi *asi_create(struct asi_type *type); extern void asi_destroy(struct asi *asi); extern void asi_set_pagetable(struct asi *asi, pgd_t *pagetable);
diff --git a/arch/x86/mm/asi.c b/arch/x86/mm/asi.c
index c91ba82a095b..3795582c66d8 100644
--- a/arch/x86/mm/asi.c
+++ b/arch/x86/mm/asi.c
@@ -229,3 +229,70 @@ void asi_prepare_resume(void) asi_switch_to_asi_cr3(asi_session->asi, ASI_SWITCH_ON_RESUME); } + +void asi_schedule_out(struct task_struct *task) +{ + struct asi_session *asi_session; + unsigned long flags; + struct asi *asi; + + asi = this_cpu_read(cpu_asi_session.asi); + if (!asi) + return; + + /* + * Save the ASI session. + * + * Exit the session if it hasn't been interrupted, otherwise + * just save the session state.
+ */ + local_irq_save(flags); + if (!this_cpu_read(cpu_asi_session.idepth)) { + asi_switch_to_kernel_cr3(asi); + task->asi_session.asi = asi; + task->asi_session.idepth = 0; + } else { + asi_session = &get_cpu_var(cpu_asi_session); + task->asi_session = *asi_session; + asi_session->asi = NULL; + asi_session->idepth = 0; + } + local_irq_restore(flags); +} + +void asi_schedule_in(struct task_struct *task) +{ + struct asi_session *asi_session; + unsigned long flags; + struct asi *asi; + + asi = task->asi_session.asi; + if (!asi) + return; + + /* + * At this point, the CPU shouldn't be using ASI because the + * ASI session is expected to be cleared in asi_schedule_out(). + */ + WARN_ON(this_cpu_read(cpu_asi_session.asi)); + + /* + * Restore ASI. + * + * If the task was scheduled out while using ASI, then the ASI + * is already setup and we can immediately switch to ASI page + * table. + * + * Otherwise, if the task was scheduled out while ASI was + * interrupted, just restore the ASI session. + */ + local_irq_save(flags); + if (!task->asi_session.idepth) { + asi_switch_to_asi_cr3(asi, ASI_SWITCH_NOW); + } else { + asi_session = &get_cpu_var(cpu_asi_session); + *asi_session = task->asi_session; + } + task->asi_session.asi = NULL; + local_irq_restore(flags); +} diff --git a/include/linux/sched.h b/include/linux/sched.h index 4418f5cb8324..ea86bda713ee 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -10,6 +10,7 @@ #include #include +#include #include #include @@ -1289,6 +1290,14 @@ struct task_struct { unsigned long prev_lowest_stack; #endif +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + /* + * ASI session is saved here when the task is scheduled out + * while an ASI session was active or interrupted. + */ + struct asi_session asi_session; +#endif + /* * New fields for task_struct should be added above here, so that * they are included in the randomized portion of task_struct. 
diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 9a2fbf98fd6f..140071cfa25d 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -14,6 +14,7 @@ #include #include +#include #include "../workqueue_internal.h" #include "../../fs/io-wq.h" @@ -3241,6 +3242,7 @@ static struct rq *finish_task_switch(struct task_struct *prev) } tick_nohz_task_switch(); + return rq; } @@ -4006,6 +4008,14 @@ static void __sched notrace __schedule(bool preempt) struct rq *rq; int cpu; + /* + * If the task is using ASI then exit it right away otherwise the + * ASI will likely quickly fault, for example when accessing run + * queues. + */ + if (IS_ENABLED(CONFIG_ADDRESS_SPACE_ISOLATION)) + asi_schedule_out(current); + cpu = smp_processor_id(); rq = cpu_rq(cpu); prev = rq->curr; @@ -4087,6 +4097,13 @@ static void __sched notrace __schedule(bool preempt) } balance_callback(rq); + + /* + * Now the task will resume execution, we can safely return to + * its ASI if one was in used. + */ + if (IS_ENABLED(CONFIG_ADDRESS_SPACE_ISOLATION)) + asi_schedule_in(current); } void __noreturn do_task_dead(void) From patchwork Mon May 4 14:49:38 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alexandre Chartre X-Patchwork-Id: 11526393 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id B993015E6 for ; Mon, 4 May 2020 14:51:43 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 6A928206B9 for ; Mon, 4 May 2020 14:51:43 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=oracle.com header.i=@oracle.com header.b="XZKYWr18" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 6A928206B9 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=oracle.com Authentication-Results: 
mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 32A9F8E0026; Mon, 4 May 2020 10:51:42 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 2B33C8E0024; Mon, 4 May 2020 10:51:42 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 1A3498E0026; Mon, 4 May 2020 10:51:42 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0184.hostedemail.com [216.40.44.184]) by kanga.kvack.org (Postfix) with ESMTP id F214F8E0024 for ; Mon, 4 May 2020 10:51:41 -0400 (EDT) Received: from smtpin20.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay04.hostedemail.com (Postfix) with ESMTP id B061116966 for ; Mon, 4 May 2020 14:51:41 +0000 (UTC) X-FDA: 76779325602.20.pies06_83241cd9d2c37 X-Spam-Summary: 1,0,0,,d41d8cd98f00b204,alexandre.chartre@oracle.com,,RULES_HIT:30003:30025:30029:30045:30051:30054:30062:30064:30069:30070:30075:30080:30091,0,RBL:156.151.31.85:@oracle.com:.lbl8.mailshell.net-62.18.0.100 64.10.201.10,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:ft,MSBL:0,DNSBL:neutral,Custom_rules:0:0:0,LFtime:23,LUA_SUMMARY:none X-HE-Tag: pies06_83241cd9d2c37 X-Filterd-Recvd-Size: 12512 Received: from userp2120.oracle.com (userp2120.oracle.com [156.151.31.85]) by imf33.hostedemail.com (Postfix) with ESMTP for ; Mon, 4 May 2020 14:51:39 +0000 (UTC) Received: from pps.filterd (userp2120.oracle.com [127.0.0.1]) by userp2120.oracle.com (8.16.0.42/8.16.0.42) with SMTP id 044ElhN0024557; Mon, 4 May 2020 14:51:30 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=from : to : cc : subject : date : message-id : in-reply-to : references; s=corp-2020-01-29; bh=TS1kNnb45s8+qP1Fov+EsfUHmkKg6QnEgbUAWnEhWdk=; 
From: Alexandre Chartre
To: rkrcmar@redhat.com, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, hpa@zytor.com, dave.hansen@linux.intel.com, luto@kernel.org, peterz@infradead.org, x86@kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: pbonzini@redhat.com, konrad.wilk@oracle.com, jan.setjeeilers@oracle.com, liran.alon@oracle.com, junaids@google.com, graf@amazon.de, rppt@linux.vnet.ibm.com, kuzuno@gmail.com, mgross@linux.intel.com, alexandre.chartre@oracle.com
Subject: [RFC v4][PATCH part-1 6/7] mm/asi: ASI fault handler
Date: Mon, 4 May 2020 16:49:38 +0200
Message-Id: <20200504144939.11318-7-alexandre.chartre@oracle.com>
X-Mailer: git-send-email 2.18.2
In-Reply-To:
 <20200504144939.11318-1-alexandre.chartre@oracle.com>
References: <20200504144939.11318-1-alexandre.chartre@oracle.com>

Add an ASI fault handler and options to define the handler behavior.
Depending on the ASI type, the ASI fault handler either aborts the
isolation and retries the faulting instruction with the full kernel
page-table, or preserves the isolation and processes the fault like
any regular fault.

If isolation is aborted, the location and address of the fault can be
logged, optionally with a stack trace.
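
The policy described above can be modeled in user space roughly as
follows. This is only a sketch for illustration, not kernel code: the
model_* names, the small log size and the boolean return of
model_log_fault() are assumptions made for the example, and locking,
per-CPU access and printing are left out.

```c
/* User-space model of the ASI fault policy; NOT the kernel code. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_FAULT_LOG_SIZE	4	/* the patch uses 128 entries */
#define MODEL_FAULT_LOG_KERNEL	0x01	/* log kernel faults */
#define MODEL_FAULT_LOG_USER	0x02	/* log user faults */

struct model_fault_log {
	unsigned long ip;	/* faulting instruction, 0 = free slot */
	unsigned long count;	/* how many times it faulted */
};

struct model_asi {
	bool fault_abort;	/* per-type policy in the patch */
	int log_policy;		/* mask of MODEL_FAULT_LOG_* */
	struct model_fault_log log[MODEL_FAULT_LOG_SIZE];
};

struct model_session {
	struct model_asi *asi;	/* NULL when isolation is not active */
	int idepth;		/* interrupt nesting depth */
};

/*
 * Record a fault. Returns true only when a new entry is stored; a
 * known fault just bumps its counter, and a filtered fault or a full
 * log leaves the table unchanged.
 */
static bool model_log_fault(struct model_asi *asi, unsigned long ip,
			    int origin)
{
	size_t i;

	assert(ip != 0);	/* 0 is reserved to mark a free slot */

	if (!(asi->log_policy & origin))
		return false;

	for (i = 0; i < MODEL_FAULT_LOG_SIZE; i++) {
		if (asi->log[i].ip == ip) {
			asi->log[i].count++;
			return false;
		}
		if (asi->log[i].ip == 0) {
			asi->log[i].ip = ip;
			asi->log[i].count = 1;
			return true;
		}
	}
	return false;		/* log is full */
}

/*
 * Returns true when the fault is consumed by aborting the isolation
 * (the faulting instruction is then retried without ASI), false when
 * the regular fault handler should process it.
 */
static bool model_asi_fault(struct model_session *s, unsigned long ip,
			    int origin)
{
	if (s->asi == NULL || s->idepth > 1)
		return false;	/* not a fault taken while using ASI */

	if (!s->asi->fault_abort)
		return false;	/* preserve isolation */

	model_log_fault(s->asi, ip, origin);
	s->asi = NULL;		/* abort isolation */
	s->idepth = 0;
	return true;
}
```

With fault_abort set, the first fault at a given instruction is logged
and the session is torn down; with fault_abort clear, the session is
left untouched and nothing is logged.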

Signed-off-by: Alexandre Chartre
---
 arch/x86/include/asm/asi.h | 42 ++++++++++++++++-
 arch/x86/mm/asi.c          | 95 ++++++++++++++++++++++++++++++++++++++
 arch/x86/mm/fault.c        | 20 ++++++++
 3 files changed, 156 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/asi.h b/arch/x86/include/asm/asi.h
index a0733f1e4a67..b8d7b936cd19 100644
--- a/arch/x86/include/asm/asi.h
+++ b/arch/x86/include/asm/asi.h
@@ -66,6 +66,7 @@ struct asi_type {
 	int pcid_prefix; /* PCID prefix */
 	struct asi_tlb_state *tlb_state; /* percpu ASI TLB state */
 	atomic64_t last_pgtable_id; /* last id for this type */
+	bool fault_abort; /* abort ASI on fault? */
 };
 
 /*
@@ -75,12 +76,13 @@ struct asi_type {
  * (asi_create_()) to easily create an ASI of the
  * specified type.
  */
-#define DEFINE_ASI_TYPE(name, pcid_prefix) \
+#define DEFINE_ASI_TYPE(name, pcid_prefix, fault_abort) \
 	DEFINE_PER_CPU(struct asi_tlb_state, asi_tlb_ ## name); \
 	struct asi_type asi_type_ ## name = { \
 		pcid_prefix, \
 		&asi_tlb_ ## name, \
 		ATOMIC64_INIT(1), \
+		fault_abort \
 	}; \
 	EXPORT_SYMBOL(asi_type_ ## name)
 
@@ -94,16 +96,49 @@ static inline struct asi *asi_create_ ## name(void) \
 	return asi_create(&asi_type_ ## name); \
 }
 
+/* ASI fault log size */
+#define ASI_FAULT_LOG_SIZE 128
+
+/*
+ * Options to specify the fault log policy when a fault occurs
+ * while using ASI.
+ *
+ * When set, ASI_FAULT_LOG_KERNEL|USER log the address and location
+ * of the fault. In addition, if ASI_FAULT_LOG_STACK is set, the stack
+ * trace where the fault occurred is also logged.
+ *
+ * Faults are logged only for ASIs with a type which aborts ASI on an
+ * ASI fault (see fault_abort in struct asi_type).
+ */
+#define ASI_FAULT_LOG_KERNEL 0x01 /* log kernel faults */
+#define ASI_FAULT_LOG_USER 0x02 /* log user faults */
+#define ASI_FAULT_LOG_STACK 0x04 /* log stack trace */
+
+enum asi_fault_origin {
+	ASI_FAULT_KERNEL = ASI_FAULT_LOG_KERNEL,
+	ASI_FAULT_USER = ASI_FAULT_LOG_USER,
+};
+
+struct asi_fault_log {
+	unsigned long address; /* fault address */
+	unsigned long count; /* fault count */
+};
+
 struct asi {
 	struct asi_type *type; /* ASI type */
 	pgd_t *pagetable; /* ASI pagetable */
 	u64 pgtable_id; /* ASI pagetable ID */
 	atomic64_t pgtable_gen; /* ASI pagetable generation */
 	unsigned long base_cr3; /* base ASI CR3 */
+	spinlock_t fault_lock; /* protect fault_log_* */
+	struct asi_fault_log fault_log[ASI_FAULT_LOG_SIZE];
+	int fault_log_policy; /* fault log policy */
 };
 
 void asi_schedule_out(struct task_struct *task);
 void asi_schedule_in(struct task_struct *task);
+bool asi_fault(struct pt_regs *regs, unsigned long error_code,
+	       unsigned long address, enum asi_fault_origin fault_origin);
 
 extern struct asi *asi_create(struct asi_type *type);
 extern void asi_destroy(struct asi *asi);
@@ -111,6 +146,11 @@ extern void asi_set_pagetable(struct asi *asi, pgd_t *pagetable);
 extern int asi_enter(struct asi *asi);
 extern void asi_exit(struct asi *asi);
 
+static inline void asi_set_log_policy(struct asi *asi, int policy)
+{
+	asi->fault_log_policy = policy;
+}
+
 #else /* __ASSEMBLY__ */
 
 #include
diff --git a/arch/x86/mm/asi.c b/arch/x86/mm/asi.c
index 3795582c66d8..a4a5d35fb779 100644
--- a/arch/x86/mm/asi.c
+++ b/arch/x86/mm/asi.c
@@ -6,6 +6,7 @@
  */
 
 #include
+#include
 #include
 #include
 
@@ -13,6 +14,97 @@
 #include
 #include
 
+static void asi_log_fault(struct asi *asi, struct pt_regs *regs,
+			  unsigned long error_code, unsigned long address,
+			  enum asi_fault_origin fault_origin)
+{
+	int i;
+
+	/*
+	 * Log information about the fault only if this is a fault
+	 * we don't know about yet (and the fault log is not full).
+	 */
+	spin_lock(&asi->fault_lock);
+	if (!(asi->fault_log_policy & fault_origin)) {
+		spin_unlock(&asi->fault_lock);
+		return;
+	}
+	for (i = 0; i < ASI_FAULT_LOG_SIZE; i++) {
+		if (asi->fault_log[i].address == regs->ip) {
+			asi->fault_log[i].count++;
+			spin_unlock(&asi->fault_lock);
+			return;
+		}
+		if (!asi->fault_log[i].address) {
+			asi->fault_log[i].address = regs->ip;
+			asi->fault_log[i].count = 1;
+			break;
+		}
+	}
+
+	if (i >= ASI_FAULT_LOG_SIZE) {
+		pr_warn("ASI %p: fault log buffer is full [%d]\n",
+			asi, i);
+	}
+
+	pr_info("ASI %p: PF#%d (%ld) at %pS on %px\n", asi, i,
+		error_code, (void *)regs->ip, (void *)address);
+
+	if (asi->fault_log_policy & ASI_FAULT_LOG_STACK)
+		show_stack(NULL, (unsigned long *)regs->sp);
+
+	spin_unlock(&asi->fault_lock);
+}
+
+bool asi_fault(struct pt_regs *regs, unsigned long error_code,
+	       unsigned long address, enum asi_fault_origin fault_origin)
+{
+	struct asi_session *asi_session;
+
+	/*
+	 * If address space isolation was active when the fault occurred
+	 * then the page fault handler has interrupted the isolation
+	 * (exception handlers interrupt isolation very early) and switched
+	 * CR3 back to its original kernel value. So we can safely retrieve
+	 * the CPU ASI session.
+	 */
+	asi_session = &get_cpu_var(cpu_asi_session);
+
+	/*
+	 * If address space isolation is not active, or we have a fault
+	 * after isolation was aborted then this was not a fault while
+	 * using ASI and we don't handle it.
+	 */
+	if (!asi_session->asi || asi_session->idepth > 1)
+		return false;
+
+	/*
+	 * We have a fault while the CPU is using address space isolation.
+	 * Depending on the ASI fault policy, either:
+	 *
+	 * - Abort the isolation. The ASI used when the fault occurred is
+	 *   aborted, and the faulty instruction is immediately retried.
+	 *   The fault is not processed by the system fault handler. The
+	 *   fault handler will return immediately, the system will not
+	 *   restore the ASI pagetable and will continue to run with the
+	 *   full kernel pagetable.
+	 *
+	 * - Or preserve the isolation. The system fault handler will
+	 *   process the fault like any regular fault. The ASI pagetable
+	 *   will be restored after the fault has been handled and the
+	 *   system fault handler returns.
+	 */
+	if (asi_session->asi->type->fault_abort) {
+		asi_log_fault(asi_session->asi, regs, error_code,
+			      address, fault_origin);
+		asi_session->asi = NULL;
+		asi_session->idepth = 0;
+		return true;
+	}
+
+	return false;
+}
+
 struct asi *asi_create(struct asi_type *type)
 {
 	struct asi *asi;
@@ -27,6 +119,9 @@ struct asi *asi_create(struct asi_type *type)
 	asi->type = type;
 	asi->pgtable_id = atomic64_inc_return(&type->last_pgtable_id);
 	atomic64_set(&asi->pgtable_gen, 0);
+	spin_lock_init(&asi->fault_lock);
+	/* by default, log ASI kernel faults */
+	asi->fault_log_policy = ASI_FAULT_LOG_KERNEL;
 
 	return asi;
 }
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index a51df516b87b..fa278030df65 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -30,6 +30,7 @@
 #include /* store_idt(), ... */
 #include /* exception stack */
 #include /* VMALLOC_START, ... */
+#include /* asi_fault() */
 
 #define CREATE_TRACE_POINTS
 #include
@@ -1257,6 +1258,15 @@ do_kern_addr_fault(struct pt_regs *regs, unsigned long hw_error_code,
 	 */
 	WARN_ON_ONCE(hw_error_code & X86_PF_PK);
 
+	/*
+	 * Check if the fault occurs with ASI and if the ASI handler
+	 * handles it.
+	 */
+	if (IS_ENABLED(CONFIG_ADDRESS_SPACE_ISOLATION) &&
+	    asi_fault(regs, hw_error_code, address, ASI_FAULT_KERNEL)) {
+		return;
+	}
+
 	/*
 	 * We can fault-in kernel-space virtual memory on-demand. The
 	 * 'reference' page table is init_mm.pgd.
@@ -1312,6 +1322,16 @@ void do_user_addr_fault(struct pt_regs *regs,
 	vm_fault_t fault, major = 0;
 	unsigned int flags = FAULT_FLAG_DEFAULT;
 
+
+	/*
+	 * Check if the fault occurs with ASI and if the ASI handler
+	 * handles it.
+	 */
+	if (IS_ENABLED(CONFIG_ADDRESS_SPACE_ISOLATION) &&
+	    asi_fault(regs, hw_error_code, address, ASI_FAULT_USER)) {
+		return;
+	}
+
 	tsk = current;
 	mm = tsk->mm;
 

From patchwork Mon May 4 14:49:39 2020
X-Patchwork-Submitter: Alexandre Chartre
X-Patchwork-Id: 11526397
From: Alexandre Chartre
To: rkrcmar@redhat.com, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, hpa@zytor.com, dave.hansen@linux.intel.com, luto@kernel.org, peterz@infradead.org, x86@kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: pbonzini@redhat.com, konrad.wilk@oracle.com, jan.setjeeilers@oracle.com, liran.alon@oracle.com, junaids@google.com, graf@amazon.de, rppt@linux.vnet.ibm.com, kuzuno@gmail.com, mgross@linux.intel.com, alexandre.chartre@oracle.com
Subject: [RFC v4][PATCH part-1 7/7] mm/asi: Implement PTI with ASI
Date: Mon, 4 May 2020 16:49:39 +0200
Message-Id: <20200504144939.11318-8-alexandre.chartre@oracle.com>
X-Mailer: git-send-email 2.18.2
In-Reply-To: <20200504144939.11318-1-alexandre.chartre@oracle.com>
References: <20200504144939.11318-1-alexandre.chartre@oracle.com>

ASI supersedes PTI. If both CONFIG_ADDRESS_SPACE_ISOLATION and
CONFIG_PAGE_TABLE_ISOLATION are set then PTI is implemented using ASI.

For each user process, a "user" ASI is then defined with the PTI
pagetable. The user ASI is used when running userland code, and it is
exited when entering a syscall. The user ASI is re-entered when the
syscall returns to userland.

As with any ASI, interrupts/exceptions/NMIs will interrupt the ASI, and
the ASI will resume when the interrupt/exception/NMI has completed.
Faults won't abort the user ASI as user faults are handled by the
kernel before returning to userland.

Signed-off-by: Alexandre Chartre
---
 arch/x86/entry/calling.h        | 13 ++++++++++++-
 arch/x86/entry/common.c         | 29 ++++++++++++++++++++++++-----
 arch/x86/entry/entry_64.S       |  6 ++++++
 arch/x86/include/asm/asi.h      |  9 +++++++++
 arch/x86/include/asm/tlbflush.h | 11 +++++++++--
 arch/x86/mm/asi.c               |  9 +++++++++
 arch/x86/mm/pti.c               | 28 ++++++++++++++++++++--------
 include/linux/mm_types.h        |  5 +++++
 kernel/fork.c                   | 17 +++++++++++++++++
 9 files changed, 111 insertions(+), 16 deletions(-)

diff --git a/arch/x86/entry/calling.h b/arch/x86/entry/calling.h
index ca23b79adecf..e452fce1435f 100644
--- a/arch/x86/entry/calling.h
+++ b/arch/x86/entry/calling.h
@@ -176,16 +176,27 @@ For 32-bit we have the following conventions - kernel is built with
 #if defined(CONFIG_ADDRESS_SPACE_ISOLATION)
 
 /*
- * For now, ASI is not compatible with PTI.
+ * ASI supersedes the entry points used by PTI. If both
+ * CONFIG_ADDRESS_SPACE_ISOLATION and CONFIG_PAGE_TABLE_ISOLATION are
+ * set then PTI is implemented using ASI.
  */
 
 .macro SWITCH_TO_KERNEL_CR3 scratch_reg:req
+	ALTERNATIVE "jmp .Lend_\@", "", X86_FEATURE_PTI
+	ASI_INTERRUPT \scratch_reg
+.Lend_\@:
 .endm
 
 .macro SWITCH_TO_USER_CR3_NOSTACK scratch_reg:req scratch_reg2:req
+	ALTERNATIVE "jmp .Lend_\@", "", X86_FEATURE_PTI
+	ASI_RESUME \scratch_reg
+.Lend_\@:
 .endm
 
 .macro SWITCH_TO_USER_CR3_STACK scratch_reg:req
+	ALTERNATIVE "jmp .Lend_\@", "", X86_FEATURE_PTI
+	ASI_RESUME \scratch_reg
+.Lend_\@:
 .endm
 
 .macro SAVE_AND_SWITCH_TO_KERNEL_CR3 scratch_reg:req save_reg:req
diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c
index 76735ec813e6..752b6672d455 100644
--- a/arch/x86/entry/common.c
+++ b/arch/x86/entry/common.c
@@ -35,6 +35,7 @@
 #include
 #include
 #include
+#include
 
 #define CREATE_TRACE_POINTS
 #include
@@ -50,6 +51,13 @@ __visible inline void enter_from_user_mode(void)
 static inline void enter_from_user_mode(void) {}
 #endif
 
+static inline void syscall_enter(void)
+{
+	/* syscall enter has interrupted ASI, now exit ASI */
+	asi_exit(current->mm->user_asi);
+	enter_from_user_mode();
+}
+
 static void do_audit_syscall_entry(struct pt_regs *regs, u32 arch)
 {
 #ifdef CONFIG_X86_64
@@ -225,6 +233,17 @@ __visible inline void prepare_exit_to_usermode(struct pt_regs *regs)
 	mds_user_clear_cpu_buffers();
 }
 
+static inline void prepare_syscall_return(struct pt_regs *regs)
+{
+	prepare_exit_to_usermode(regs);
+
+	/*
+	 * Syscall return will resume ASI, prepare resume to enter
+	 * user ASI.
+	 */
+	asi_deferred_enter(current->mm->user_asi);
+}
+
 #define SYSCALL_EXIT_WORK_FLAGS \
 	(_TIF_SYSCALL_TRACE | _TIF_SYSCALL_AUDIT | \
 	 _TIF_SINGLESTEP | _TIF_SYSCALL_TRACEPOINT)
@@ -276,7 +295,7 @@ __visible inline void syscall_return_slowpath(struct pt_regs *regs)
 		syscall_slow_exit_work(regs, cached_flags);
 
 	local_irq_disable();
-	prepare_exit_to_usermode(regs);
+	prepare_syscall_return(regs);
 }
 
 #ifdef CONFIG_X86_64
@@ -284,7 +303,7 @@ __visible void do_syscall_64(unsigned long nr, struct pt_regs *regs)
 {
 	struct thread_info *ti;
 
-	enter_from_user_mode();
+	syscall_enter();
 	local_irq_enable();
 	ti = current_thread_info();
 	if (READ_ONCE(ti->flags) & _TIF_WORK_SYSCALL_ENTRY)
@@ -343,7 +362,7 @@ static __always_inline void do_syscall_32_irqs_on(struct pt_regs *regs)
 /* Handles int $0x80 */
 __visible void do_int80_syscall_32(struct pt_regs *regs)
 {
-	enter_from_user_mode();
+	syscall_enter();
 	local_irq_enable();
 	do_syscall_32_irqs_on(regs);
 }
@@ -366,7 +385,7 @@ __visible long do_fast_syscall_32(struct pt_regs *regs)
 	 */
 	regs->ip = landing_pad;
 
-	enter_from_user_mode();
+	syscall_enter();
 
 	local_irq_enable();
 
@@ -388,7 +407,7 @@ __visible long do_fast_syscall_32(struct pt_regs *regs)
 		/* User code screwed up. */
 		local_irq_disable();
 		regs->ax = -EFAULT;
-		prepare_exit_to_usermode(regs);
+		prepare_syscall_return(regs);
 		return 0;	/* Keep it simple: use IRET. */
 	}
 
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index ac47da63a29f..003c945dd6b0 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -627,6 +627,9 @@ ret_from_intr:
 .Lretint_user:
 	mov	%rsp,%rdi
 	call	prepare_exit_to_usermode
+#ifdef CONFIG_ADDRESS_SPACE_ISOLATION
+	ASI_PREPARE_RESUME
+#endif
 	TRACE_IRQS_ON
 
 SYM_INNER_LABEL(swapgs_restore_regs_and_return_to_usermode, SYM_L_GLOBAL)
@@ -1491,6 +1494,9 @@ SYM_CODE_START(nmi)
 	movq	%rsp, %rdi
 	movq	$-1, %rsi
 	call	do_nmi
+#ifdef CONFIG_ADDRESS_SPACE_ISOLATION
+	ASI_PREPARE_RESUME
+#endif
 
 	/*
 	 * Return back to user mode. We must *not* do the normal exit
diff --git a/arch/x86/include/asm/asi.h b/arch/x86/include/asm/asi.h
index b8d7b936cd19..ac0594d4f549 100644
--- a/arch/x86/include/asm/asi.h
+++ b/arch/x86/include/asm/asi.h
@@ -62,6 +62,10 @@ struct asi_tlb_state {
 	struct asi_tlb_pgtable tlb_pgtables[ASI_TLB_NR_DYN_ASIDS];
 };
 
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+#define ASI_PCID_PREFIX_USER 0x80 /* user ASI */
+#endif
+
 struct asi_type {
 	int pcid_prefix; /* PCID prefix */
 	struct asi_tlb_state *tlb_state; /* percpu ASI TLB state */
@@ -139,6 +143,7 @@ void asi_schedule_out(struct task_struct *task);
 void asi_schedule_in(struct task_struct *task);
 bool asi_fault(struct pt_regs *regs, unsigned long error_code,
 	       unsigned long address, enum asi_fault_origin fault_origin);
+void asi_deferred_enter(struct asi *asi);
 
 extern struct asi *asi_create(struct asi_type *type);
 extern void asi_destroy(struct asi *asi);
@@ -146,6 +151,10 @@ extern void asi_set_pagetable(struct asi *asi, pgd_t *pagetable);
 extern int asi_enter(struct asi *asi);
 extern void asi_exit(struct asi *asi);
 
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+DECLARE_ASI_TYPE(user);
+#endif
+
 static inline void asi_set_log_policy(struct asi *asi, int policy)
 {
 	asi->fault_log_policy = policy;
diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
index 241058ff63ba..db114deeb763 100644
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -390,6 +390,8 @@ extern void initialize_tlbstate_and_flush(void);
  */
 static inline void invalidate_user_asid(u16 asid)
 {
+	struct asi_tlb_state *tlb_state;
+
 	/* There is no user ASID if address space separation is off */
 	if (!IS_ENABLED(CONFIG_PAGE_TABLE_ISOLATION))
 		return;
@@ -404,8 +406,13 @@ static inline void invalidate_user_asid(u16 asid)
 	if (!static_cpu_has(X86_FEATURE_PTI))
 		return;
 
-	__set_bit(kern_pcid(asid),
-		  (unsigned long *)this_cpu_ptr(&cpu_tlbstate.user_pcid_flush_mask));
+	if (IS_ENABLED(CONFIG_ADDRESS_SPACE_ISOLATION)) {
+		tlb_state = get_cpu_ptr(asi_type_user.tlb_state);
+		tlb_state->tlb_pgtables[asid].id = 0;
+	} else {
+		__set_bit(kern_pcid(asid),
+			  (unsigned long *)this_cpu_ptr(&cpu_tlbstate.user_pcid_flush_mask));
+	}
 }
 
 /*
diff --git a/arch/x86/mm/asi.c b/arch/x86/mm/asi.c
index a4a5d35fb779..b63a0a883293 100644
--- a/arch/x86/mm/asi.c
+++ b/arch/x86/mm/asi.c
@@ -14,6 +14,10 @@
 #include
 #include
 
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+DEFINE_ASI_TYPE(user, ASI_PCID_PREFIX_USER, false);
+#endif
+
 static void asi_log_fault(struct asi *asi, struct pt_regs *regs,
 			  unsigned long error_code, unsigned long address,
 			  enum asi_fault_origin fault_origin)
@@ -314,6 +318,11 @@ void asi_exit(struct asi *asi)
 }
 EXPORT_SYMBOL(asi_exit);
 
+void asi_deferred_enter(struct asi *asi)
+{
+	asi_switch_to_asi_cr3(asi, ASI_SWITCH_ON_RESUME);
+}
+
 void asi_prepare_resume(void)
 {
 	struct asi_session *asi_session;
diff --git a/arch/x86/mm/pti.c b/arch/x86/mm/pti.c
index 843aa10a4cb6..a1d09c163709 100644
--- a/arch/x86/mm/pti.c
+++ b/arch/x86/mm/pti.c
@@ -430,6 +430,18 @@ static void __init pti_clone_p4d(unsigned long addr)
 	*user_p4d = *kernel_p4d;
 }
 
+static void __init pti_map_va(unsigned long va)
+{
+	phys_addr_t pa = per_cpu_ptr_to_phys((void *)va);
+	pte_t *target_pte;
+
+	target_pte = pti_user_pagetable_walk_pte(va);
+	if (WARN_ON(!target_pte))
+		return;
+
+	*target_pte = pfn_pte(pa >> PAGE_SHIFT, PAGE_KERNEL);
+}
+
 /*
  * Clone the CPU_ENTRY_AREA and associated data into the user space visible
  * page table.
@@ -457,15 +469,15 @@ static void __init pti_clone_user_shared(void)
 		 * is set up.
 		 */
 
-		unsigned long va = (unsigned long)&per_cpu(cpu_tss_rw, cpu);
-		phys_addr_t pa = per_cpu_ptr_to_phys((void *)va);
-		pte_t *target_pte;
-
-		target_pte = pti_user_pagetable_walk_pte(va);
-		if (WARN_ON(!target_pte))
-			return;
+		pti_map_va((unsigned long)&per_cpu(cpu_tss_rw, cpu));
 
-		*target_pte = pfn_pte(pa >> PAGE_SHIFT, PAGE_KERNEL);
+		if (IS_ENABLED(CONFIG_ADDRESS_SPACE_ISOLATION)) {
+			/*
+			 * Map the ASI session. We need to always be able
+			 * to access the ASI session.
+			 */
+			pti_map_va((unsigned long)&per_cpu(cpu_tlbstate, cpu));
+		}
 	}
 }
 
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 4aba6c0c2ba8..e2c6d63f39e5 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -25,6 +25,7 @@
 
 struct address_space;
 struct mem_cgroup;
+struct asi;
 
 /*
  * Each physical page in the system has a struct page associated with
@@ -534,6 +535,10 @@ struct mm_struct {
 		atomic_long_t hugetlb_usage;
 #endif
 		struct work_struct async_put_work;
+#if defined(CONFIG_ADDRESS_SPACE_ISOLATION) && defined(CONFIG_PAGE_TABLE_ISOLATION)
+		/* ASI used for user address space */
+		struct asi *user_asi;
+#endif
 	} __randomize_layout;
 
 	/*
diff --git a/kernel/fork.c b/kernel/fork.c
index 8c700f881d92..f245f9a4c55d 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -101,6 +101,7 @@
 #include
 #include
 #include
+#include
 
 #include
 
@@ -698,6 +699,10 @@ void __mmdrop(struct mm_struct *mm)
 	mmu_notifier_subscriptions_destroy(mm);
 	check_mm(mm);
 	put_user_ns(mm->user_ns);
+	if (IS_ENABLED(CONFIG_ADDRESS_SPACE_ISOLATION) &&
+	    IS_ENABLED(CONFIG_PAGE_TABLE_ISOLATION)) {
+		asi_destroy(mm->user_asi);
+	}
 	free_mm(mm);
 }
 EXPORT_SYMBOL_GPL(__mmdrop);
@@ -1049,6 +1054,18 @@ static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p,
 	if (init_new_context(p, mm))
 		goto fail_nocontext;
 
+	if (IS_ENABLED(CONFIG_ADDRESS_SPACE_ISOLATION) &&
+	    IS_ENABLED(CONFIG_PAGE_TABLE_ISOLATION)) {
+		/*
+		 * If we have PTI and ASI then use ASI to switch between
+		 * user and kernel spaces, so create an ASI for this mm.
+		 */
+		mm->user_asi = asi_create_user();
+		if (!mm->user_asi)
+			goto fail_nocontext;
+		asi_set_pagetable(mm->user_asi, kernel_to_user_pgdp(mm->pgd));
+	}
+
 	mm->user_ns = get_user_ns(user_ns);
 	return mm;
 